CN114168730A - Consumption tendency analysis method based on BiLSTM and SVM - Google Patents


Info

Publication number
CN114168730A
CN114168730A
Authority
CN
China
Prior art keywords
user
sentence
consumption
judging
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111416830.9A
Other languages
Chinese (zh)
Inventor
贾海涛
唐小龙
周焕来
乔磊崖
林思远
陈泓秀
张博阳
王俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yituo Communications Group Co ltd
Original Assignee
Yituo Communications Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yituo Communications Group Co ltd filed Critical Yituo Communications Group Co ltd
Priority to CN202111416830.9A
Publication of CN114168730A
Legal status: Pending

Classifications

    • G06F16/353 — Information retrieval of unstructured textual data; clustering; classification into predefined classes
    • G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/2411 — Classification based on the proximity to a decision surface, e.g. support vector machines
    • G06F18/254 — Fusion techniques of classification results, e.g. of results related to same input data
    • G06F18/259 — Fusion by voting
    • G06F40/295 — Natural language analysis; named entity recognition
    • G06N3/044 — Neural networks; recurrent networks, e.g. Hopfield networks
    • G06N3/047 — Probabilistic or stochastic networks
    • G06N3/084 — Learning methods; backpropagation, e.g. using gradient descent
    • G06Q30/0201 — Market modelling; market analysis; collecting market data
    • G06Q30/0202 — Market predictions or forecasting for commercial activities
    • G06Q30/0254 — Targeted advertisements based on statistics
    • G06Q30/0282 — Rating or review of business operators or products


Abstract

The invention discloses a consumption tendency analysis method based on BiLSTM and SVM, comprising the following steps: determining whether a single sentence is related to consumption; judging the type of commodity appearing in the sentence through a Bag-of-words model; judging the emotional attitude (support or opposition) expressed in the sentence through BiLSTM and SVM models; combining the commodity type and the emotional attitude to judge the consumption tendency expressed in the sentence; and performing these operations on each sentence of a text in turn, then aggregating the results to obtain the user's consumption tendency over the whole text. With the invention, a user's tendency in online consumption, such as pursuing quality or valuing low prices, can be analyzed from the user's online posts, reviews and similar text, so that commodities can be recommended to the user more accurately.

Description

Consumption tendency analysis method based on BiLSTM and SVM
Technical Field
The invention relates to a consumption tendency analysis method based on BiLSTM and SVM, and belongs to the field of natural language processing.
Background
Today, with the rapid development of the Internet and information technology, electronic commerce has become a comparatively advanced business model that continues to grow vigorously. In electronic commerce, buyers and sellers conduct various business activities online, supported by information technology and application modes such as servers and browsers, enabling online shopping, online payment and other commercial activities for customers.
Personalized recommendation is a newer e-commerce service mode. Its main goal is to analyze users' personal preferences, personalities and habits in light of their needs, accurately provide services or information of interest, and thereby better mitigate the problems caused by missing information and information overload.
Existing solutions mainly include user-based collaborative filtering (UserCF) and item-based collaborative filtering (ItemCF). UserCF recommends items liked by users who share interests and preferences with the target user; ItemCF recommends items similar to those the user previously liked. However, UserCF cannot immediately make personalized recommendations after a new user has acted on only a few items, because the user similarity table is computed offline at intervals, and the algorithm itself struggles to provide recommendation explanations that convince the user; ItemCF cannot recommend new items to a user until the item similarity table is updated offline. Moreover, neither algorithm easily incorporates the side features of items, and neither analyzes the user's online posts and reviews, so important information they may contain is ignored.
Meanwhile, plain sentiment analysis can only judge whether a person holds a positive or negative attitude; because it does not analyze commodity characteristics, it cannot yield user preferences. The invention therefore provides a consumption tendency analysis method based on BiLSTM and SVM, aiming to analyze a user's consumption tendency so that commodities can be recommended to the user more accurately.
Disclosure of Invention
The invention provides a consumption tendency analysis method based on BiLSTM and SVM. The aim of the invention is to provide a method for analyzing a user's consumption tendency, so that commodities can be recommended to the user more accurately.
The technical scheme of the invention is as follows:
step one, data preprocessing is carried out, and whether a single sentence is related to consumption or not is judged through a Bag-of-words model. And if one word in the preprocessed words exists in the consumption-related dictionary, judging that the words are related to consumption.
And step two, carrying out named entity identification by using LTP to find the commodity in the LTP. Then, the type of the commodities appearing in the sentence is judged, and the commodities with the two tendencies are represented by positive scores and negative scores respectively.
And step three, judging the emotional attitude of the character in the sentence through the BilSTM and SVM models, namely supporting or resisting. Firstly, acquiring depth word vector characteristics of emotion classification by using a BilSTM model, and then classifying and judging the depth word vector characteristics by using an SVM model.
And step four, multiplying the scores of the first two steps to judge the consumption tendency of the user in the sentence, namely which product is preferred to buy.
And step five, counting and judging the consumption tendency of the user in the whole text.
The invention has the following beneficial effects. It provides a method for analyzing a user's consumption tendency, addressing two problems: conventional collaborative filtering algorithms struggle to incorporate the side features of items and do not analyze the user's online posts and reviews; and plain sentiment analysis can only judge whether a person's attitude is positive or negative, does not analyze commodity characteristics, and so cannot yield user preferences. Based on the BiLSTM and SVM models, the method jointly analyzes commodity characteristics and the user's emotional attitude to obtain the user's preferences, so that commodities can be recommended to the user more accurately.
Drawings
FIG. 1 is an overall block diagram of the algorithm of the present invention;
FIG. 2 is a block diagram of the bidirectional recurrent neural network BiLSTM;
FIG. 3 illustrates the max-pooling process;
FIG. 4 shows the training process of the BiLSTM-based word vector model for emotion classification;
FIG. 5 is a schematic diagram of judging the emotional attitude of the user in a sentence.
Detailed Description
The idea of the algorithm will be described below, and specific steps of the algorithm will be given.
Step one: determine whether a single sentence is related to consumption
First, read the data and preprocess the input with jieba: word segmentation and removal of stop words and stop symbols. A user-defined dictionary is used to avoid erroneous segmentation. After preprocessing, read the text line by line into a words list. Create a merchandish_list and store all e-commerce thesaurus entries in it in turn: trade names, store names, platforms, holidays, idioms, consumption-sensitive words, and so on. Reading the list uses the Bag-of-words model, i.e. the document is converted into an unordered representation by ignoring grammar and word order.
The principle of the Bag-of-words model is as follows. For a corpus of documents $C=\{doc_1, doc_2, \dots, doc_m\}$, all tokens are collected into a large lexicon $L_c$; in this patent the lexicon corresponds to the e-commerce thesaurus merchandish_list. An arbitrary text $doc_i$ with word segmentation result $W_i$ is represented as a vector $V_i$ with $|V_i|=\mathrm{len}(L_c)$. If the $j$-th entry of the lexicon occurs in $W_i$, the vector component $V_{ij}$ is its term frequency $tf_{ij}$; otherwise it is 0:

$$V_{ij}=\begin{cases} tf_{ij}, & L_c[j]\in W_i \\ 0, & \text{otherwise} \end{cases}$$
The e-commerce thesaurus merchandish_list is searched while reading the words list, and a variable flag stores the number of consumption-related words in the sentence. Each time a word of words is also found in merchandish_list, the flag value is incremented by one. If flag is 0, the sentence is judged not to involve e-commerce; if flag is non-zero, the sentence is judged to involve e-commerce.
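As a minimal sketch of this step (not the patent's actual code), the relevance check reduces to counting thesaurus hits; the thesaurus entries below are illustrative stand-ins for the real merchandish_list:

```python
# Hypothetical sketch of step one: count how many words of a segmented
# sentence appear in the e-commerce thesaurus; a non-zero count means
# the sentence is consumption-related. Entries are illustrative only.

MERCHANDISE_LIST = {"手机", "买", "店铺", "双十一", "优惠券", "buy", "price"}

def is_consumption_related(words):
    """Return (flag, related): flag counts thesaurus hits."""
    flag = sum(1 for w in words if w in MERCHANDISE_LIST)
    return flag, flag != 0

flag, related = is_consumption_related(["我", "想", "买", "手机"])
```

Sentences with flag == 0 are discarded before the later steps run.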
Step two: the type of the commodity appearing in the sentence is judged.
When processing the input text, named entity recognition is performed with LTP. After the data preprocessing above, part-of-speech tagging is applied to the segmented sentence with the Postagger tool and stored in the parameter postags; LTP adopts the BIESO tagging scheme, where B marks an entity-initial word, I an entity-internal word, E an entity-final word, S a single-word entity, and O a word that is not part of a named entity. Named entity recognition is performed after part-of-speech tagging with Postagger. The named entity types provided by LTP are: person name (Nh), place name (Ns) and organization name (Ni). After initialization, Pyltp's named entity recognition model ner_model is loaded to recognize the data, and the results are stored in the parameter netags.
After recognition succeeds, the data is stored as a ternary list zip(words, postags, netags): the first element words is the word, the second postags its part of speech, and the third netags its named entity type. The commodities in the input sentence are then extracted according to a commodity dictionary, which covers the commodities the input may involve.
The consumption attitude of the persons in the ternary list zip(words, postags, netags) is judged by searching a positive dictionary and a negative dictionary. The positive (pos_dic) and negative (neg_dic) dictionaries are two-column lists: the first column is the word and the second its emotion weight. A positive weight means a positive attitude, and the more pronounced the attitude, the larger the weight; a negative weight means a negative attitude, and the more pronounced the attitude, the larger the weight's absolute value. The input sequence is searched against both dictionaries, and every occurrence of a positive or negative word contributes its weight, accumulated into the parameter score.
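The dictionary lookup can be sketched as follows; the pos_dic/neg_dic entries and weights here are made up for illustration (the patent's real dictionaries are far larger):

```python
# Hypothetical sketch of the sentiment-weight accumulation: each word
# found in the positive or negative dictionary adds its emotion weight
# to the sentence's score. Dictionary contents are illustrative.

pos_dic = {"喜欢": 2.0, "好": 1.0}    # positive words, positive weights
neg_dic = {"讨厌": -2.0, "差": -1.0}  # negative words, negative weights

def sentence_score(words):
    score = 0.0
    for w in words:
        score += pos_dic.get(w, 0.0) + neg_dic.get(w, 0.0)
    return score
```

A positive accumulated score signals a supportive attitude, a negative one an opposing attitude.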
Step three: determine the emotional attitude of the user in the sentence
3.1 Building the BiLSTM depth word vector feature model
The model is built with the Keras interface of TensorFlow. It consists of an input layer, a word embedding layer, a BiLSTM layer, an aggregation layer, a max-pooling layer, a fully connected layer and a classification layer, where the output of each layer is the input of the next. The model yields depth word vector features for emotion classification; the structure of the BiLSTM model is shown in FIG. 2.
3.1.1 input layer
This layer is the input part of the model: a piece of text T from the corpus is fed in for subsequent processing. Suppose there is a document whose text T is "the new package feels good". After data preprocessing, the input text may be represented as T = ['new', 'package', 'feel', 'also', 'good'].
3.1.2 word embedding layer
The corpus is trained with word2vec to obtain a context vector list; the word vector of each word of the input text is looked up in this list, and the vectors are stacked. In this way the input sequence T can be expressed as:

$$Z=\begin{bmatrix} z_{11} & z_{12} & \cdots & z_{1m}\\ z_{21} & z_{22} & \cdots & z_{2m}\\ \vdots & \vdots & & \vdots\\ z_{n1} & z_{n2} & \cdots & z_{nm} \end{bmatrix}$$

where the $i$-th row of $Z$ is the $m$-dimensional word vector of the $i$-th word of the input text T.
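The lookup itself is simple table indexing; in this sketch a tiny hand-made table stands in for a word2vec-trained context vector list, and unknown words map to zero vectors (an assumption, since the patent does not specify out-of-vocabulary handling):

```python
# Illustrative embedding lookup: replace each token of the preprocessed
# text with its m-dimensional vector, producing the matrix Z row by row.
# The table and its values are made up for demonstration.

embedding_table = {"new": [0.1, 0.2], "package": [0.0, 0.5],
                   "feel": [0.3, 0.1], "good": [0.9, 0.4]}

def embed(tokens, table, m=2):
    # unknown words fall back to the zero vector of dimension m
    return [table.get(tok, [0.0] * m) for tok in tokens]

Z = embed(["new", "package", "feel", "also", "good"], embedding_table)
```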
3.1.3 BiLSTM layer
This layer performs feature extraction. Two LSTM networks read the same input in opposite directions, which better captures long-range dependencies within the sentence and the deep semantics of the text as a whole. The advantage of the LSTM lies in its three gate functions, the input gate, forget gate and output gate, which control the network's memory. The forward computation of a single LSTM memory cell at time t is as follows.
Forget gate mechanism:

$$f_t=\sigma(W_f\cdot[h_{t-1},x_t]+b_f)$$

Input gate mechanism:

$$i_t=\sigma(W_i\cdot[h_{t-1},x_t]+b_i)$$

$$\tilde{C}_t=\tanh(W_C\cdot[h_{t-1},x_t]+b_C)$$

$$C_t=f_t\odot C_{t-1}+i_t\odot\tilde{C}_t$$

Output gate mechanism:

$$o_t=\sigma(W_o\cdot[h_{t-1},x_t]+b_o)$$

$$Z_t=o_t\odot\tanh(C_t)$$

where $\{W_*,b_*\}$ is the parameter set learned in training and $\sigma(x)=1/(1+e^{-x})$ is the sigmoid activation.
Here $f_t$, $i_t$ and $o_t$ denote the activations of the forget gate, input gate and output gate of the memory cell at time t; $h_{t-1}$ and $x_t$ denote the previous cell output and the current input; $C_t$ denotes the internal state of the memory cell at time t; and $Z_t$ denotes the output of the memory cell at time t. On this basis, the layer computes:
$$\overrightarrow{h}_t=\mathrm{LSTM}_f(x_t),\qquad \overleftarrow{h}_t=\mathrm{LSTM}_b(x_t)$$

where $\mathrm{LSTM}_f$ and $\mathrm{LSTM}_b$ denote the forward and backward passes of the LSTM network, and $\overrightarrow{h}_t$, $\overleftarrow{h}_t$ are the output vectors of the forward and backward LSTM at time t. After the bidirectional LSTM layer, the embedded matrix Z becomes:

$$Z_f=\begin{bmatrix}\overrightarrow{h}_1\\ \vdots\\ \overrightarrow{h}_n\end{bmatrix}\in\mathbb{R}^{n\times c},\qquad Z_b=\begin{bmatrix}\overleftarrow{h}_1\\ \vdots\\ \overleftarrow{h}_n\end{bmatrix}\in\mathbb{R}^{n\times c}$$
the column number c here represents the number of neurons in the LSTM unit.
3.1.4 Aggregation layer
This layer concatenates the forward and backward output vectors produced by the previous layer:

$$Z_t=[\overrightarrow{h}_t;\overleftarrow{h}_t]\in\mathbb{R}^{2c}$$

After the aggregation layer, the outputs $Z_f$ and $Z_b$ of the previous layer are integrated into the form:

$$Z=\begin{bmatrix}Z_1\\ \vdots\\ Z_n\end{bmatrix}\in\mathbb{R}^{n\times 2c}$$
3.1.5 Max-pooling layer
This layer performs a max-pooling operation to extract the most significant feature values in the vectors, which to some extent reduces the impact of data sparsity on classifier performance. Moreover, since different input texts contain different numbers of words, the pooling operation also yields a fixed-length feature vector $M_t$, computed as:

$$M_t=\max_{1\le i\le c} Z_t(i)$$

The concrete operation is shown in FIG. 3: the left rectangle represents the matrix obtained from the aggregation layer; with a pooling window of width and height 2 and stride 2, pooling turns the original matrix into the matrix shown by the right rectangle.
Thus, the feature extraction work of the text data of one document is completed.
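The 2x2, stride-2 pooling of FIG. 3 can be sketched in a few lines; the input values are illustrative:

```python
# Illustrative 2x2 max pooling with stride 2, as depicted in FIG. 3:
# each non-overlapping 2x2 block of the input matrix is reduced to its
# maximum value, halving both dimensions.

def max_pool_2x2(mat):
    rows, cols = len(mat), len(mat[0])
    return [[max(mat[r][c], mat[r][c + 1], mat[r + 1][c], mat[r + 1][c + 1])
             for c in range(0, cols - 1, 2)]
            for r in range(0, rows - 1, 2)]

pooled = max_pool_2x2([[1, 3, 2, 4],
                       [5, 0, 1, 1],
                       [0, 2, 6, 8],
                       [3, 1, 7, 2]])
# each 2x2 block collapses to its maximum
```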
3.1.6 Fully connected layer
The process above describes the feature extraction of BiLSTM; at the fully connected layer, the features of all documents are gathered into the depth word vector feature M finally used for emotion classification:

$$M=\{M_1,M_2,\dots,M_n\}$$

where $M_i$ ($1\le i\le n$) denotes the depth word vector feature of the $i$-th document.
3.1.7 Classification layer
During training, the classification layer applies a softmax function to the feature M output by the fully connected layer to produce the emotional tendency category (positive 1, neutral 0, negative -1); the network parameters are updated by gradient descent with the backpropagation algorithm.
3.2 Training the BiLSTM depth word vector feature model
The word-embedding results are fed into the model, with the number of iterations and the batch size set. Once the gradient updates of the whole BiLSTM neural network model have converged, the depth word vector features (i.e. the features output by the fully connected layer) can be extracted and used as emotion classification features; the specific processing algorithm is shown in FIG. 4.
3.3 Classification and discrimination of depth word vector features based on SVM model
After the gradient updates of the BiLSTM neural network model converge, the depth word vector features, i.e. the output of the fully connected layer, are obtained as emotion classification features; the SVM then performs model training and classification judgment on the depth word vector features of the training-set and test-set samples. FIG. 5 shows the overall scheme for judging the emotional attitude of the user in a sentence.
The SVM is a classification algorithm based on the structural risk minimization principle: it finds a separating plane that places the data of the two classes on opposite sides. As a predictor with good generalization ability, the SVM is widely used in face recognition, text classification and other fields. However, the emotion classification task here is a three-class problem, for which a single traditional SVM is not directly applicable; therefore one-vs-one SVMs are used, with one SVM classifier designed between each pair of classes, three classifiers in total. When judging the emotional tendency of an unknown sample testcase, the three SVM classifiers vote, and the class with the most votes is the sample's emotion category. The voting procedure is as follows.
pos, Neu and Neg are training samples of positive, neutral and negative categories respectively, and generate three classifier classifiers after training1,classifier2,classifier3
classifier1=SVM(Pos,Neu)
classifier2=SVM(Pos,Neg)
classifier3=SVM(Neu,Neg)
Initializing Pos Neu Neg 0, and predicting the emotion classification of the unknown sample testcase according to the following formula:
Figure BDA0003374242880000061
Figure BDA0003374242880000062
Figure BDA0003374242880000063
the final emotion category of testcase, namely, table, is:
lable=max(Pos,Neg,Neu)
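The voting scheme can be sketched with stand-in pairwise classifiers; plain callables substitute here for the trained SVMs, purely for illustration:

```python
# One-vs-one voting over three pairwise classifiers. Each stand-in
# classifier is a callable returning one of the two classes it was
# (hypothetically) trained on; real ones would be pairwise SVMs.

POS, NEU, NEG = "Pos", "Neu", "Neg"

def vote(testcase, clf_pos_neu, clf_pos_neg, clf_neu_neg):
    votes = {POS: 0, NEU: 0, NEG: 0}
    votes[clf_pos_neu(testcase)] += 1   # classifier1: Pos vs Neu
    votes[clf_pos_neg(testcase)] += 1   # classifier2: Pos vs Neg
    votes[clf_neu_neg(testcase)] += 1   # classifier3: Neu vs Neg
    return max(votes, key=votes.get)    # label with the most votes

label = vote("x", lambda t: POS, lambda t: POS, lambda t: NEU)
```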
step four: calculating and judging consumption tendency of user in sentence
Let the input matrix after data preprocessing have m rows, with the longest row of length n. From the preceding steps, the consumption weight of the jth row (sentence) is

$$score_j=\sum_k w_{jk}$$

i.e. the sum of the consumption weights $w_{jk}$ of all persons in that row. From step three, the emotional tendency of the jth sentence is $x_j$:

$$x_j=\mathrm{emotendency}(line_j),\quad j\in\{1,\dots,m\},\ x_j\in\{-1,0,1\}$$

Let the consumption emotional tendency of the jth sentence be $output_j$; then

$$output_j=score_j\times x_j$$

That is, the consumption emotional tendency of the jth sentence is the product of the sentence's consumption weight and its emotional tendency. A positive output indicates a positive emotional attitude in the sentence, and the larger the result, the more pronounced the attitude; a negative output indicates a negative attitude, and the larger its absolute value, the more pronounced; an output near 0 indicates a neutral attitude.
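In miniature, and assuming the emotional tendency takes the values 1/0/-1 produced by the classification layer, step four reduces to a single multiplication:

```python
# Step four in miniature: a sentence's consumption tendency is the
# product of its commodity-weight sum (positive for one commodity type,
# negative for the other) and its emotional tendency (1/0/-1).
# The numeric values below are illustrative.

def sentence_tendency(commodity_weight_sum, emotion):
    return commodity_weight_sum * emotion

# quality-type commodity (positive weight) + positive attitude
out = sentence_tendency(2.5, 1)
```

The sign of the result indicates which commodity type the sentence leans toward; its magnitude indicates how strongly.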
Step five: calculating and judging consumption tendency of users in the whole text
The whole text is split into sentences; the consumption tendency of each single sentence is computed via step four in turn, and from these per-sentence results the user's consumption tendency over the whole text is obtained.
Two thresholds a and b are set, with 0 < a < b < 1, and the proportion of sentences showing a given tendency (say tendency A) in the full text is computed. If the proportion lies between b and 1, the user is considered inclined toward A; if it lies between a and b, the user's consumption tendency is considered neutral; if it lies between 0 and a, the user is considered averse to tendency A. The values of the thresholds a and b are determined experimentally according to accuracy.
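A sketch of the full-text decision, with placeholder thresholds a = 0.3 and b = 0.6 (the patent determines them experimentally):

```python
# Step five in miniature: classify the whole text by the proportion of
# sentences showing tendency A (positive per-sentence outputs here).
# Threshold values are placeholders, to be tuned experimentally.

def text_tendency(sentence_outputs, a=0.3, b=0.6):
    ratio = sum(1 for o in sentence_outputs if o > 0) / len(sentence_outputs)
    if ratio > b:
        return "prefers A"
    if ratio >= a:
        return "neutral"
    return "dislikes A"
```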
Techniques not described in detail herein are known in the art.
The above embodiments merely illustrate the technical ideas and features of the present invention; their purpose is to enable those skilled in the art to understand and implement the invention, not to limit its scope of protection. All equivalent changes and modifications made according to the spirit of the present invention shall fall within the protection scope of the present invention.

Claims (4)

1. A consumption tendency analysis method based on BiLSTM and SVM, characterized by comprising the following steps:
Step one: preprocess the data and judge through a Bag-of-words model whether a single sentence is related to consumption. The input is first preprocessed with jieba: word segmentation and removal of stop words and stop symbols. If any preprocessed word exists in the consumption-related dictionary, the sentence is judged related to consumption; otherwise it is judged unrelated and no further operation is performed.
Step two: judge the type of the commodities appearing in the sentence. Named entity recognition is performed with LTP to find the commodities. Commodities of the two tendencies are then represented by positive and negative scores respectively; for example, a quality-oriented commodity is given a positive score and a budget-priced commodity a negative score.
Step three: judge through the BiLSTM and SVM models whether the emotional attitude in the sentence is support or opposition. The BiLSTM model first extracts depth word vector features for emotion classification, and the SVM model then classifies and judges those features.
Step four: compute and judge the consumption tendency of the user in the sentence, i.e. which kind of product the user prefers to buy. The scores of the previous two steps are multiplied; a positive result indicates that the user prefers the product type represented by positive scores, and vice versa.
Step five: count and judge the user's consumption tendency over the whole text. The text is split into sentences and the consumption tendency of each sentence is computed via step four in turn. Two thresholds are set: when the proportion of sentences inclined toward a certain product exceeds the larger threshold, the user is judged to prefer that product; when it falls below the smaller threshold, the user is judged to dislike it; when it lies between the two thresholds, the user is judged neutral.
2. The method of claim 1, wherein in step two, after the type of commodity appearing in the sentence is determined, a positive or negative score is assigned according to its tendency. For example, to analyze whether a user pursues quality or low price, two commodity vocabularies are compiled, one for quality-type goods and one for budget-type goods. Each vocabulary has two columns: the first is the word and the second is its score. Quality-type scores are positive, and the higher the price, the larger the score; budget-type scores are negative, and the lower the price, the larger the absolute value of the score. After word segmentation, if a commodity from either vocabulary appears, its corresponding score is retrieved.
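The two-column vocabulary described in claim 2 can be kept as plain text, one word/score pair per line, and loaded into a lookup table; the entries below are illustrative examples, not the patent's actual vocabularies:

```python
# Illustrative two-column vocabulary (word, score) combining quality-type
# entries (positive) and budget-type entries (negative), as in claim 2.
RAW_VOCAB = """旗舰手机 3
名牌包 2
平价手机 -3
特价包 -2"""

def load_vocab(text):
    """Parse 'word score' lines into a dict usable for the step-two lookup."""
    vocab = {}
    for line in text.splitlines():
        word, score = line.split()
        vocab[word] = int(score)
    return vocab
```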
3. The method of claim 1, wherein in step three a bidirectional long short-term memory network (BiLSTM) and an SVM determine the emotional attitude of the sentence, i.e. positive or negative. The BiLSTM model is first trained until its gradient updates converge, and the features output by its fully-connected layer are taken as the deep word-vector features for emotion classification. An SVM then performs model training and classification judgment on the deep word-vector features of the samples in the training and test sets.
4. The method of claim 1, wherein in step four the scores of the two previous steps are multiplied. A positive result indicates that the user prefers the goods represented by positive scores, and the larger the result, the more marked the attitude; a negative result indicates that the user prefers the goods represented by negative scores, and the larger the absolute value of the result, the more marked the attitude.
CN202111416830.9A 2021-11-26 2021-11-26 Consumption tendency analysis method based on BilSTM and SVM Pending CN114168730A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111416830.9A CN114168730A (en) 2021-11-26 2021-11-26 Consumption tendency analysis method based on BilSTM and SVM

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111416830.9A CN114168730A (en) 2021-11-26 2021-11-26 Consumption tendency analysis method based on BilSTM and SVM

Publications (1)

Publication Number Publication Date
CN114168730A true CN114168730A (en) 2022-03-11

Family

ID=80480828

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111416830.9A Pending CN114168730A (en) 2021-11-26 2021-11-26 Consumption tendency analysis method based on BilSTM and SVM

Country Status (1)

Country Link
CN (1) CN114168730A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015043075A1 (en) * 2013-09-29 2015-04-02 Guangdong University of Technology Microblog-oriented emotional entity search system
US20150095330A1 (en) * 2013-10-01 2015-04-02 TCL Research America Inc. Enhanced recommender system and method
WO2015103695A1 (en) * 2014-01-10 2015-07-16 Cluep Inc. Systems, devices, and methods for automatic detection of feelings in text
US9336192B1 (en) * 2012-11-28 2016-05-10 Lexalytics, Inc. Methods for analyzing text
CN107153642A (en) * 2017-05-16 2017-09-12 华北电力大学 A kind of analysis method based on neural network recognization text comments Sentiment orientation
CN110879938A (en) * 2019-11-14 2020-03-13 中国联合网络通信集团有限公司 Text emotion classification method, device, equipment and storage medium
CN111914096A (en) * 2020-07-06 2020-11-10 同济大学 Public transport passenger satisfaction evaluation method and system based on public opinion knowledge graph
KR20210094461A (en) * 2020-01-21 2021-07-29 김종호 System and method extracting information according to experience of product

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Jianbing; Liu Licai: "Sentiment analysis algorithm for film review texts based on an improved neural network", Computer Engineering and Science, no. 12, 15 December 2019 (2019-12-15) *

Similar Documents

Publication Publication Date Title
CN110609897B (en) Multi-category Chinese text classification method integrating global and local features
CN107608956B (en) Reader emotion distribution prediction algorithm based on CNN-GRNN
CN102831184B (en) According to the method and system text description of social event being predicted to social affection
CN111260437B (en) Product recommendation method based on commodity-aspect-level emotion mining and fuzzy decision
CN110489523B (en) Fine-grained emotion analysis method based on online shopping evaluation
CN114330354B (en) Event extraction method and device based on vocabulary enhancement and storage medium
CN112487189B (en) Implicit discourse text relation classification method for graph-volume network enhancement
CN110119849B (en) Personality trait prediction method and system based on network behaviors
CN112862569B (en) Product appearance style evaluation method and system based on image and text multi-modal data
CN114238577B (en) Multi-task learning emotion classification method integrating multi-head attention mechanism
CN107818173B (en) Vector space model-based Chinese false comment filtering method
CN111966888B (en) Aspect class-based interpretability recommendation method and system for fusing external data
CN112182145A (en) Text similarity determination method, device, equipment and storage medium
CN109584006A (en) A kind of cross-platform goods matching method based on depth Matching Model
CN113326374A (en) Short text emotion classification method and system based on feature enhancement
CN111353044A (en) Comment-based emotion analysis method and system
CN114942974A (en) E-commerce platform commodity user evaluation emotional tendency classification method
CN111061939A (en) Scientific research academic news keyword matching recommendation method based on deep learning
CN111400449A (en) Regular expression extraction method and device
Kim et al. Accurate and prompt answering framework based on customer reviews and question-answer pairs
CN111666410B (en) Emotion classification method and system for commodity user comment text
CN113761910A (en) Comment text fine-grained emotion analysis method integrating emotional characteristics
CN111414755A (en) Network emotion analysis method based on fine-grained emotion dictionary
CN115906824A (en) Text fine-grained emotion analysis method, system, medium and computing equipment
CN114168730A (en) Consumption tendency analysis method based on BilSTM and SVM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination