CN109829166A - Homestay customer opinion mining method based on a character-level convolutional neural network
- Publication number: CN109829166A
- Application number: CN201910117188.0A
- Authority: CN (China)
- Prior art keywords: homestay, topic, text, character, convolutional neural
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Landscapes: Information Retrieval, DB Structures and FS Structures Therefor
Abstract
The invention discloses a homestay customer opinion mining method based on a character-level convolutional neural network, comprising the following steps: building a web crawler, collecting all homestay reviews and establishing a homestay dictionary; performing feature extraction and vectorization on the text with TF-IDF and carrying out visual topic clustering; constructing a homestay topic dictionary; finding the evaluation entries corresponding to each topic in the clause-split text; performing weakly supervised pre-classification based on naive Bayes; constructing a convolutional neural network with one-dimensional convolution kernels for feature extraction to obtain sentiment polarities; and visualizing the sentiment polarities and validating the model. The method can mine the sentiments and user needs hidden in personalized reviews from a large volume of review data containing noise and fake reviews, which helps the decision-making of businesses and individual users. Taking a data-driven perspective, the method can also uncover customer satisfaction under each topic, and the results can provide suggestions for homestay operators and regulators.
Description
Technical field
The present invention relates to the field of homestay customer opinion mining, and more particularly to a homestay customer opinion mining method based on a character-level convolutional neural network.
Background art
Customer opinion mining is the analysis of customer needs and opinions. Analyzing customer reviews helps improve and iterate homestay services, and because homestay services are intangible, online homestay reviews have more influence than other information sources. Improving service quality through customer opinion mining is therefore key to quickly building a competitive advantage. There are two mainstream approaches to customer opinion mining. The first analyzes structured data, such as questionnaires, Likert scales and semantic differential scales, to obtain perceivable, valid attributes. The second analyzes unstructured data, that is, it analyzes the characteristics of the data itself through natural language processing and visualization techniques. Large amounts of opinion-bearing text can be obtained from review websites, forums, blogs and social media, and with the help of a sentiment analysis system this unstructured information can be automatically converted into structured data that captures opinions expressed about products, services, brands, politics, people or other topics.
Homestay reviews are highly time-sensitive, have topics that are independent of context, state clear viewpoints, are short, and are casually worded. Existing customer opinion mining approaches still fall short in efficiently mining the customer viewpoints and sentiments hidden in noise and cannot meet practical needs. The present invention therefore proposes a homestay customer opinion mining method based on a character-level convolutional neural network to address these shortcomings of the prior art.
Summary of the invention
In view of the above problems, the method of the present invention can mine the sentiments and user needs hidden in personalized reviews from a large volume of review data containing noise and fake reviews, which helps the decision-making of businesses and individual users. Taking a data-driven perspective, the method can also uncover customer satisfaction under each topic, and the results can provide suggestions for homestay operators and regulators. The method is highly general and has practical value for consumers, operators and regulators.
The present invention proposes a homestay customer opinion mining method based on a character-level convolutional neural network, comprising the following steps:
Step 1: online homestay review collection and preprocessing. Build a web crawler, collect all homestay reviews and establish a homestay dictionary; then, using the part-of-speech tagging function of the open-source LTP toolkit from Harbin Institute of Technology, replace punctuation marks with newline characters so that the topic sentences in each review are split apart, forming topic evaluation texts;
Step 2: topic clustering. After feature extraction and vectorization of the topic evaluation texts with TF-IDF, use pyLDAvis to perform visual topic clustering on the homestay reviews and obtain a visualized clustering result; then select the initial number of topics k according to the topic selection criterion of high intra-cluster similarity and low inter-cluster similarity, obtain an initial model, and compute the correlation between the topics t;
Step 3: build a homestay topic dictionary with the aid of homestay standards documents and the visualized clustering result;
Step 4: by attribute-word matching, find the evaluation entries corresponding to each topic in the clause-split topic evaluation texts, then count the number of evaluation entries for each topic;
Step 5: weakly supervised pre-classification based on naive Bayes. The web crawler automatically labels part of the original reviews that have no follow-up review. Let k be the number of keywords in a review and j the number of classes, with two sentiment classes. The posterior probability of an evaluation is computed from the word-frequency vector of its text, and pre-classification is considered successful when the output probability is greater than 0.5;
Step 6: sentiment analysis of homestay reviews based on C-CNN-SA. Treat the unstructured reviews at the character level as the raw signal; deduplicate the characters and sort them in descending order of frequency to build a character table; vectorize each review by looking up the position ID of every character in the table; construct a convolutional neural network with one-dimensional convolution kernels for feature extraction; obtain the sentiment polarity from the output of the softmax function; and print the parameters of the model with the Keras neural network tool;
Step 7: visualize the sentiment polarities obtained after feature extraction by the one-dimensional-kernel convolutional neural network, compare the customer opinion tendencies under multiple topics, and make targeted improvements based on the comparison to raise overall homestay satisfaction;
Step 8: model validation. Using ten-fold cross-validation, run 10 experiments under identical conditions, and use the average test-set accuracy, average precision, average recall and average F value as evaluation metrics to verify the validity of the model.
A further improvement is that the TF-IDF formula in step 2 is given by formula (1):
The distribution of a feature term across different classes and within a single class, together with the position of the feature word, affects how well it discriminates between texts; a term contributes differently to discrimination when it appears at different positions in a text document. The weights of the feature words are computed with the TF-IDF method, and the improved IDF of word w in class c_t is given by formula (2):
In formulas (1) and (2), N is the total number of text documents, T is the total number of terms, x is the number of text documents containing term t, y is the number of text documents in class c_t, and k is the number of text documents outside c_t that contain term t.
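Formulas (1) and (2) do not appear in this text, so the sketch below shows only the textbook TF-IDF weighting, with tf = count / document length and idf = log(N / df). It is an assumption for illustration and not the improved class-aware IDF of formula (2); the tokenized sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Textbook form, assumed for illustration:
    tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical tokenized topic evaluation texts.
docs = [
    ["room", "clean", "room"],
    ["owner", "warm"],
    ["room", "near", "metro"],
]
weights = tf_idf(docs)
```

In this toy corpus "clean" outweighs "room" in the first document even though "room" occurs twice, because "room" also appears in another document and so has a lower IDF.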
A further improvement is that the topic relevance in step 2 is computed as shown in formula (3):
Relevance(term_w | topic_t) = λ * p(w | t) + (1 - λ) * p(w | t) / p(w)    (3)
In formula (3), the relevance of a word to a topic is adjusted by the parameter λ. If λ is close to 1, words w that occur more frequently under topic t are more relevant to topic t; if λ is close to 0, words w that are more special and more exclusive to topic t are more relevant to topic t. Adjusting the value of λ thus changes the relevance of the term term_w to the topic topic_t.
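The effect of λ can be seen in a small sketch that implements formula (3) exactly as written above. The two words and their probabilities are hypothetical values chosen so that one word is frequent but common everywhere and the other is rarer but nearly exclusive to the topic.

```python
def relevance(p_w_given_t, p_w, lam):
    """Formula (3): lam * p(w|t) + (1 - lam) * p(w|t) / p(w)."""
    return lam * p_w_given_t + (1 - lam) * p_w_given_t / p_w

# Hypothetical (p(w|t), p(w)) pairs for two words under one topic.
words = {
    "room": (0.10, 0.08),       # frequent in the topic, but common everywhere
    "elevator": (0.02, 0.004),  # rarer, but almost exclusive to the topic
}

def rank(lam):
    """Order the words from most to least relevant for a given lambda."""
    return sorted(words, key=lambda w: relevance(*words[w], lam), reverse=True)

rank_frequent = rank(1.0)   # lambda = 1 ranks purely by p(w|t)
rank_exclusive = rank(0.0)  # lambda = 0 ranks purely by p(w|t) / p(w)
```

With λ = 1 the frequent word "room" ranks first; with λ = 0 the exclusive word "elevator" overtakes it, which matches the behavior described for the two extremes.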
A further improvement is that in step 2 the value of k is first chosen with reference to homestay standards documents; then, experimentally, starting from k=6 and increasing k step by step to reduce the overlap between topics, the smallest k at which the topics no longer overlap is taken as the number of topics, which is then used for selecting the topic attribute words.
A further improvement is that in step 5 the output probability is computed as shown in formula (4):
To reject fake reviews and increase the accuracy of sentiment analysis, pre-classification is used as a data-cleaning step. During pre-classification, the labels 0 and 1 represent negative and positive sentiment respectively. Texts with an output probability greater than 0.9 are taken as high-confidence positive texts, and texts with an output probability less than 0.1 are taken as high-confidence negative texts.
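The confidence filter can be sketched with a small multinomial naive Bayes classifier. Formula (4) does not appear in this text, so the add-one (Laplace) smoothing used here is an assumption, and the tiny training set is hypothetical; only the 0.9 and 0.1 thresholds and the 0/1 labels come from the description above.

```python
import math
from collections import Counter

def train_nb(samples):
    """Fit a multinomial naive Bayes model from (tokens, label) pairs, label in {0, 1}."""
    class_docs = Counter(label for _, label in samples)
    word_counts = {0: Counter(), 1: Counter()}
    for tokens, label in samples:
        word_counts[label].update(tokens)
    vocab = set(word_counts[0]) | set(word_counts[1])
    return class_docs, word_counts, vocab

def positive_probability(model, tokens):
    """Posterior P(label = 1 | tokens), using add-one smoothing (an assumption)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    log_scores = {}
    for label in (0, 1):
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalize the two log scores into a probability.
    top = max(log_scores.values())
    exp = {label: math.exp(score - top) for label, score in log_scores.items()}
    return exp[1] / (exp[0] + exp[1])

def pre_classify(model, tokens, hi=0.9, lo=0.1):
    """Return 1 or 0 for high-confidence positive or negative text, else None."""
    p = positive_probability(model, tokens)
    if p > hi:
        return 1
    if p < lo:
        return 0
    return None

# Hypothetical training set standing in for the automatically labeled reviews.
samples = [
    (["clean", "warm", "great"], 1),
    (["great", "clean", "quiet"], 1),
    (["dirty", "noisy", "bad"], 0),
    (["bad", "dirty", "cold"], 0),
]
model = train_nb(samples)
```

Texts for which `pre_classify` returns `None` fall between the two thresholds and would be dropped from the cleaned training data under this reading of the step.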
A further improvement is that step 6 follows the pixel-level processing approach used in image processing. Assume the size of the dictionary is n. A character table is built and each review is vectorized using the IDs of its characters, then fed into a one-layer convolutional neural network. The input layer uses an Embedding layer to join the character vectors of all characters in a sentence into a sentence matrix. A padding length of 200 is used, which covers 99% of the text lengths; with "pre" (head) zero-padding, zeros are filled in at the front when a text is shorter than this length. The character weights of the Embedding layer are set to be updated during training. Feature extraction is then carried out with a one-dimensional convolution kernel (Convolution1D), followed by one global max pooling layer and two fully connected layers. Finally, the softmax probability of the positive label is output as the sentiment polarity, and the parameters of the model are printed with the Keras neural network tool.
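The feature-extraction stage described above, a one-dimensional convolution followed by global max pooling, can be sketched in plain Python without Keras. The embedding values and the kernel are hypothetical, and the ReLU activation is an assumption, since the text does not name an activation function.

```python
def conv1d(seq, kernel, bias=0.0):
    """Slide one kernel over a sequence of feature vectors (no padding), then apply ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        total = bias + sum(
            x * w
            for vec, kernel_vec in zip(window, kernel)
            for x, w in zip(vec, kernel_vec)
        )
        out.append(max(0.0, total))  # ReLU is assumed here
    return out

def global_max_pool(feature_map):
    """Keep only the strongest response of the kernel over the whole sentence."""
    return max(feature_map)

# Hypothetical 2-dimensional embeddings for a 5-character sentence.
sentence = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# One hypothetical kernel of width 2 that responds to [1, 0] followed by [0, 1].
kernel = [[1.0, -1.0], [-1.0, 1.0]]

feature_map = conv1d(sentence, kernel)
pooled = global_max_pool(feature_map)
```

The kernel fires only at the position where its two-character pattern occurs, and pooling reduces the variable-length feature map to a single value per kernel, which is what lets the fully connected layers receive a fixed-size input.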
A further improvement is that in step 6 the character vectors of a sentence are built from individual characters, so no word segmentation is performed.
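The character-level vectorization can be sketched as follows: characters are deduplicated, sorted by descending frequency to form the character table, and each review becomes a zero-padded list of position IDs. Reserving ID 0 for padding and unknown characters, breaking frequency ties by code point, and keeping the last `max_len` characters of an over-long text are assumptions of this sketch; the sample reviews are hypothetical, and `max_len=6` is used only to keep the output short (the description uses 200).

```python
from collections import Counter

def build_char_table(reviews):
    """Map each distinct character to an ID, most frequent first (IDs start at 1)."""
    freq = Counter(ch for review in reviews for ch in review)
    # Sort by descending frequency; ties are broken by code point for determinism.
    ordered = sorted(freq, key=lambda ch: (-freq[ch], ch))
    return {ch: i + 1 for i, ch in enumerate(ordered)}

def vectorize(review, table, max_len=200):
    """Look up character IDs, keep at most the last max_len, and left-pad with zeros."""
    ids = [table.get(ch, 0) for ch in review][-max_len:]
    return [0] * (max_len - len(ids)) + ids

# Hypothetical reviews: "room clean", "room very small", "owner warm".
reviews = ["房间干净", "房间很小", "老板热情"]
table = build_char_table(reviews)
vec = vectorize("房间干净", table, max_len=6)
```

Because every character is looked up directly, no word segmenter is involved at any point, which is the property this improvement relies on.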
The beneficial effects of the invention are as follows. The method can mine the sentiments and user needs hidden in personalized reviews from a large volume of review data containing noise and fake reviews, which helps the decision-making of businesses and individual users. Taking a data-driven perspective, the method can also uncover customer satisfaction under each topic, and the results can provide suggestions for homestay operators and regulators. By improving the opinion mining algorithm to address the scarcity of homestay corpora, the invention proposes a sentiment analysis algorithm with visual topic extraction and weakly supervised pre-training that is suited to homestay reviews and can perform implicit-feature topic extraction and sentiment analysis on online homestay reviews. Model validation allows the validity of the model to be verified accurately. The method is highly general and has practical value for consumers, operators and regulators.
Brief description of the drawings
Fig. 1 is a flow diagram of the present invention.
Fig. 2 is a schematic diagram of the LDA probabilistic model of the present invention.
Fig. 3 is a schematic diagram of the model structure of the method of the present invention.
Fig. 4 is a schematic diagram of the model parameters of the method of the present invention.
Fig. 5 is a schematic diagram of the homestay topic visualization in an embodiment of the present invention.
Fig. 6 is a schematic diagram of the share of reviews under each topic in an embodiment of the present invention.
Fig. 7 is a schematic diagram of the service sentiment polarity distribution in an embodiment of the present invention.
Fig. 8 is a schematic diagram of the customer opinion visualization under each topic in an embodiment of the present invention.
Fig. 9 is a schematic diagram of the experience sentiment polarity distribution in an embodiment of the present invention.
Fig. 10 is a schematic diagram of the characteristic sentiment polarity distribution in an embodiment of the present invention.
Fig. 11 is a schematic diagram of the facility sentiment polarity distribution in an embodiment of the present invention.
Fig. 12 is a schematic diagram of the traffic sentiment polarity distribution in an embodiment of the present invention.
Fig. 13 is a schematic diagram of the price sentiment polarity distribution in an embodiment of the present invention.
Fig. 14 is a schematic diagram of the environment sentiment polarity distribution in an embodiment of the present invention.
Fig. 15 is a schematic diagram of the food and drink sentiment polarity distribution in an embodiment of the present invention.
Specific embodiments
To make the technical means, objectives and effects of the invention easy to understand, the invention is further explained below with reference to specific embodiments.
Referring to Figs. 1, 2, 3, 4, 5, 6, 7 and 8, this embodiment proposes a homestay customer opinion mining method based on a character-level convolutional neural network, comprising the following steps:
Step 1: online homestay review collection and preprocessing. A web crawler is built to crawl the Chongqing homestay section of Ctrip, and all homestay reviews posted in that section between July 26, 2016 and July 26, 2018 are collected to establish a homestay dictionary. 100 topic attribute words are constructed, and fewer than 100 reviews would impair topic extraction, so only reviews of homestays that have ratings and more than 100 reviewing users are selected. The qualified corpus finally compiled contains 81,810 entries, including 10,000 unlabeled follow-up reviews. After the homestay dictionary is established, punctuation marks are replaced with newline characters using the part-of-speech tagging function of the open-source LTP toolkit from Harbin Institute of Technology, and the topic sentences in each review are split apart to form the texts. For example, "The owner is warm, the room is clean and tidy, and the inn is inside the scenic area" is decomposed into three topic evaluations: "the owner is warm", "the room is clean and tidy" and "and the inn is inside the scenic area";
Step 2: topic clustering. After feature extraction and vectorization of the topic evaluation texts with TF-IDF, pyLDAvis is used to perform visual topic clustering on the homestay reviews and obtain a visualized clustering result; the initial number of topics k is then selected according to the topic selection criterion of high intra-cluster similarity and low inter-cluster similarity, an initial model is obtained, and the correlation between the topics t is computed. The TF-IDF formula is given by formula (1):
The distribution of a feature term across different classes and within a single class, together with the position of the feature word, affects how well it discriminates between texts; a term contributes differently to discrimination when it appears at different positions in a text document. The weights of the feature words are computed with the TF-IDF method, and the improved IDF of word w in class c_t is given by formula (2):
In formulas (1) and (2), N is the total number of text documents, T is the total number of terms, x is the number of text documents containing term t, y is the number of text documents in class c_t, and k is the number of text documents outside c_t that contain term t;
The topic relevance is computed as shown in formula (3):
Relevance(term_w | topic_t) = λ * p(w | t) + (1 - λ) * p(w | t) / p(w)    (3)
In formula (3), the relevance of a word to a topic is adjusted by the parameter λ. If λ is close to 1, words w that occur more frequently under topic t are more relevant to topic t; if λ is close to 0, words w that are more special and more exclusive to topic t are more relevant to topic t. Adjusting the value of λ thus changes the relevance of the term term_w to the topic topic_t. In Fig. 5, the circles on the left represent the different topics and the distance between circles reflects the similarity between topics, which helps estimate the number of customer opinion topics for online homestays. When a topic is selected, the right panel shows the words closest to that topic, and the meaning of the topic can be summarized from the meanings these words express. With reference to homestay standards documents, the topic attribute words are selected experimentally by starting from K=6 and increasing K step by step. When the number of topics is K=8, the topics overlap little and are evenly distributed, which gives the best result. Taking the 8th topic as an example, its topic words include "surrounding environment", "elevator", "bedding", "garden", "desk" and "road", and inspection shows that the word "independent" is followed by "toilet". Summarizing these topic words shows that LDA topic 8 covers two evaluation topics, "environment" and "facility". The other 7 topics are summarized in the same way, so that the topics cover the evaluations as fully as possible;
With the aid of homestay standards documents and the visualized clustering, a homestay topic dictionary is built; the resulting homestay topics and topic attribute words are shown in Table 1 below:
Table 1: Topic attribute word set
By attribute-word matching, the evaluation entries corresponding to each topic are found after clause splitting, and the number of evaluation entries for each topic is counted. In the homestay reviews, customer attention to facility, service, environment, traffic, food and drink, characteristic, price and experience decreases in that order, with relatively few reviews on price and experience, as shown in Fig. 6;
Step 3: build a homestay topic dictionary with the aid of homestay standards documents and the visualized clustering result;
Step 4: by attribute-word matching, find the evaluation entries corresponding to each topic in the clause-split topic evaluation texts, then count the number of evaluation entries for each topic;
Step 5: weakly supervised pre-classification based on naive Bayes. The web crawler automatically labels part of the original reviews that have no follow-up review. Let k be the number of keywords in a review and j the number of classes, with two sentiment classes. The posterior probability of an evaluation is computed from the word-frequency vector of its text, and pre-classification is considered successful when the output probability is greater than 0.5. The output probability is computed as shown in formula (4):
To reject fake reviews and increase the accuracy of sentiment analysis, pre-classification is used as a data-cleaning step. During pre-classification, the labels 0 and 1 represent negative and positive sentiment respectively. Texts with an output probability greater than 0.9 are taken as high-confidence positive texts, and texts with an output probability less than 0.1 are taken as high-confidence negative texts;
Step 6: sentiment analysis of homestay reviews based on C-CNN-SA. Character-level text is treated as the raw signal; the characters are deduplicated and sorted in descending order of frequency to build a character table; each review is vectorized by looking up the position ID of every character in the table; and a convolutional neural network with one-dimensional convolution kernels is constructed for feature extraction to obtain the sentiment polarity. Following the pixel-level processing approach used in image processing, the input layer (InputLayer) takes the character vectors of each evaluation, and the output layer produces the sentiment polarity through a softmax output; the model structure is shown in Fig. 3. Assume the size of the dictionary is n. A character table is built and each review is vectorized using the IDs of its characters, then fed into a one-layer convolutional neural network. The input layer (InputLayer) uses the embedding_1 layer to join the character vectors of all characters in a sentence into a sentence matrix. Statistics show that a character matrix length of 200 covers 99% of the text lengths (input: 200). Variable-length texts are handled with "pre" (head) zero-padding: when a text is shorter than 200, zeros are filled in at the front. The character weights of the Embedding layer are set to be updated during training. Feature extraction is then carried out with Convolution1D (conv1d_1), followed by one global max pooling layer (GlobalMaxPooling1D) and two fully connected layers (dense_1 and dense_2). Finally, the softmax probability of the positive label is output as the sentiment polarity, and the parameters of the model are printed with the Keras neural network tool; the specific parameters are shown in Fig. 4. The character vectors of a sentence are built from individual characters, so no word segmentation is performed;
In Fig. 6, the abscissa indicates the sentiment tendency of the evaluations under the corresponding topic. The sentiment score of each review falls within [0, 1], and the abscissa step is set to 0.01. Scores closer to 1 (the right part) represent stronger positive sentiment, and scores closer to 0 (the left part) represent stronger negative sentiment; when the output probability is in the middle, the sentiment is considered neutral, with a sentiment value of 0. The ordinate indicates the number of reviews under the corresponding topic;
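The binning behind such a chart can be sketched as a simple histogram over [0, 1]. The sample scores are hypothetical, and placing a score of exactly 1.0 in the last bin is an assumption of this sketch; the step width is a parameter, so a coarser step such as 0.2 can be produced the same way.

```python
def histogram(scores, step=0.01):
    """Count scores in [0, 1] per bin of width `step`; a score of 1.0 goes in the last bin."""
    n_bins = round(1 / step)
    counts = [0] * n_bins
    for score in scores:
        index = min(int(score * n_bins), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical sentiment scores for the reviews under one topic.
scores = [0.03, 0.48, 0.91, 0.97, 1.0]
coarse = histogram(scores, step=0.2)   # 5 bins
fine = histogram(scores, step=0.01)    # 100 bins
```

A distribution whose mass sits in the rightmost bins, as in `coarse` here, corresponds to a chart skewed toward positive sentiment.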
Step 7: the sentiment polarities obtained after feature extraction by the one-dimensional-kernel convolutional neural network are visualized, the customer opinion tendencies under multiple topics are compared, and targeted improvements are made based on the comparison to raise overall homestay satisfaction. By aggregating the results of Fig. 6, the sentiment trend distribution under each topic is obtained, forming a customer review sentiment polarity chart for each topic, 8 topic sentiment trend charts in total, as shown in Fig. 7. From the topic-level customer sentiment analysis results, it can be concluded that Chongqing homestays are rated highly on topics such as "service", "traffic", "experience", "environment" and "price", with the charts clearly skewed toward the right. This is inseparable from Chongqing's development strategy centered on the service industry and its geographic position as a "city of mountains and rivers": Chongqing has convenient transport and is built along the mountains, its tourism index has risen year by year, and it attracts large numbers of visitors from elsewhere who find the homestay experience novel. As for price, Chongqing is in the southwest, where consumption is slightly lower than in the eastern regions, and its affordable prices are well received by customers. On "food and drink", "characteristic" and "facility", however, customers voice stronger complaints. This may be because Chongqing cuisine is predominantly spicy while most visitors come from elsewhere and may not be used to the local diet. Most homestays are located near scenic spots and, for cost reasons, invest less in facilities, so customers have more complaints about them; facilities could later be upgraded through cooperation with the scenic spots. A sentiment polarity chart can only show the customer opinion tendency under a single topic. For further analysis, sentiment visualization is performed with a step of 0.2 to compare the customer opinion tendencies under multiple topics, with the abscissa indicating the evaluation topic and the ordinate the sentiment proportion, so that the customer opinions on multiple topics can be compared at once. The chart is consistent with the single-topic sentiment polarities: customers have more complaints about homestay "facility" and "food and drink", and satisfaction with them is lower. Targeted improvements can then be made accordingly to raise overall homestay satisfaction, as shown in Fig. 8;
Step 8: model validation. Using ten-fold cross-validation, 10 experiments are run under identical conditions, and the average test-set accuracy, average precision, average recall and average F value are used as evaluation metrics to verify the validity of the model. The training set consists of 36,000 texts filtered by the weak trainer and manually selected, and the test set consists of 12,000 manually labeled reviews. Four algorithms, decision tree (DT), naive Bayes (NB), support vector machine (SVM) and RNN (LSTM), are compared experimentally, along with variants with and without weakly supervised pre-training. C-CNN-SA denotes the model of this document, CNN-W denotes the character-level CNN without weak-classifier pre-classification, CNN-N denotes the standard word-level CNN, CNN-S denotes the word-level CNN after stop-word removal, and C-RNN denotes the character-level LSTM. The evaluation results are shown in Table 2:
Table 2: Model evaluation data
As can be seen from Table 2, adding the preprocessing step improves test-set accuracy by 2%. In sentiment classification, the improved model of the present method achieves somewhat higher classification accuracy than traditional word-level models. For sentiment classification of short texts, character-level granularity is more accurate than word-level granularity; because the corpus texts are short, stop-word filtering may lose textual information and degrade classification performance. Treating character-level text as the raw input signal and applying a one-dimensional convolutional neural network directly for feature extraction makes it unnecessary, for short texts, to consider word-level meaning, which simplifies the engineering of sentiment analysis.
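The evaluation protocol of step 8 can be sketched as follows: precision, recall and F value are computed per fold and averaged over ten folds. The interleaved fold assignment, the `predict` callback and the toy labels are hypothetical; any classifier that can be trained on the training indices and return labels for the test indices would fit.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive class; 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx); fold i tests every k-th sample starting at i."""
    for i in range(k):
        test = list(range(i, n_samples, k))
        held_out = set(test)
        train = [j for j in range(n_samples) if j not in held_out]
        yield train, test

def cross_validate(labels, predict, k=10):
    """Average precision, recall and F1 over k folds.

    `predict(train_idx, test_idx)` is a hypothetical callback that trains on
    train_idx and returns predicted labels for test_idx.
    """
    totals = [0.0, 0.0, 0.0]
    for train, test in k_fold_indices(len(labels), k):
        y_true = [labels[j] for j in test]
        y_pred = predict(train, test)
        for m, value in enumerate(precision_recall_f1(y_true, y_pred)):
            totals[m] += value
    return tuple(total / k for total in totals)

# Toy data: 10 positive then 10 negative labels, and a baseline that always predicts 1.
labels = [1] * 10 + [0] * 10
avg_precision, avg_recall, avg_f1 = cross_validate(
    labels, lambda train, test: [1] * len(test)
)
```

The always-positive baseline reaches perfect recall but only 0.5 precision on this toy data, which illustrates why the protocol reports several metrics rather than a single one.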
The method of the present invention can mine the sentiments and user needs hidden in personalized reviews from a large volume of review data containing noise and fake reviews, which helps the decision-making of businesses and individual users. Taking a data-driven perspective, the method can also uncover customer satisfaction under each topic, and the results can provide suggestions for homestay operators and regulators. By improving the opinion mining algorithm to address the scarcity of homestay corpora, the invention proposes a sentiment analysis algorithm with visual topic extraction and weakly supervised pre-training that is suited to homestay reviews and can perform implicit-feature topic extraction and sentiment analysis on online homestay reviews. Model validation allows the validity of the model to be verified accurately. The method is highly general and has practical value for consumers, operators and regulators.
The basic principles, main features and advantages of the invention have been shown and described above. Those skilled in the art should understand that the invention is not limited to the above embodiments; the embodiments and the description merely illustrate the principles of the invention. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all such changes and improvements fall within the scope of the claimed invention. The scope of protection claimed is defined by the appended claims and their equivalents.
Claims (7)
1. A homestay customer opinion mining method based on a character-level convolutional neural network, characterized by comprising the following steps:
Step 1: Online homestay review collection and preprocessing. Build a web crawler, collect all homestay reviews, and build a homestay dictionary; then, using the part-of-speech tagging function of the open-source LTP toolkit from Harbin Institute of Technology, replace punctuation marks with newline characters, decompose the topic sentences in the reviews, and form topic evaluation texts;
Step 2: Topic clustering. After performing feature extraction and vectorization on the topic evaluation texts with TF-IDF, use pyLDAvis to perform visual topic clustering on the homestay reviews and obtain a visual clustering result; then, following the topic selection criterion of high similarity within a cluster and low similarity between clusters, select the initial text document number k, obtain an initial model, and calculate the relevance for each topic t;
Step 3: Use homestay normative documents and the visual clustering result to assist in building a homestay topic dictionary;
Step 4: By attribute-word matching, find the evaluation items that correspond to each topic in the sentence-split topic evaluation texts, then count the evaluation items of each topic;
Step 5: Weakly supervised pre-classification based on naive Bayes. Use the web crawler to automatically label part of the original reviews that have no follow-up review; assuming that k is the number of keywords in a review, that j is the number of classes, and that a review carries one of two sentiment classes, calculate the posterior probability of a review from its term-frequency vector; a review whose output probability is greater than 0.5 is considered successfully pre-classified;
Step 6: Homestay review sentiment analysis based on C-CNN-SA. Treat the unstructured character-level reviews as raw signals, deduplicate the characters, sort them in descending order of character frequency to build a character table, and vectorize each review by looking up the position ID of each character in the character table; build a convolutional neural network with one-dimensional convolution kernels for feature extraction, obtain the sentiment polarity from the output of a softmax function, and print the parameters of the model with the Keras neural network tool;
Step 7: Visualize the sentiment polarities obtained after feature extraction by the convolutional neural network with one-dimensional convolution kernels, compare the customer opinion tendencies under the multiple topics, and make targeted improvements according to the compared tendencies, thereby raising the overall satisfaction level of the homestay;
Step 8: Model validation. Under equal conditions, carry out 10 experiments using ten-fold cross-validation as the model evaluation method, and validate the effectiveness of the model using the average test-set accuracy, average precision, average recall, and average F value as evaluation indices.
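The evaluation in Step 8 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the round-robin fold assignment, and the `train_and_predict` callback are all assumptions introduced here, and the F value is taken to be the usual F1 score.

```python
from statistics import mean


def k_fold_indices(n_samples, n_folds=10):
    """Yield (train_idx, test_idx) pairs; fold i takes every n_folds-th sample."""
    indices = list(range(n_samples))
    for fold in range(n_folds):
        test_idx = indices[fold::n_folds]
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        yield train_idx, test_idx


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels 0 (negative) / 1 (positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def cross_validate(samples, labels, train_and_predict, n_folds=10):
    """Average the four metrics over the folds.

    train_and_predict(train_x, train_y, test_x) is a caller-supplied
    (hypothetical) callback that fits a model and returns predicted labels.
    """
    per_fold = []
    for train_idx, test_idx in k_fold_indices(len(samples), n_folds):
        predictions = train_and_predict(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [samples[i] for i in test_idx],
        )
        per_fold.append(binary_metrics([labels[i] for i in test_idx], predictions))
    return {name: mean(fold[name] for fold in per_fold) for name in per_fold[0]}
```

In practice the samples would normally be shuffled before the folds are cut, so that each fold contains both sentiment classes.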
2. The homestay customer opinion mining method based on a character-level convolutional neural network according to claim 1, characterized in that: the TF-IDF formula in Step 2 is as shown in formula (1):
The distribution of a feature term across different classes and its positional factor within a class both affect how well the term discriminates between texts; when a term appears at different positions in a text document, its contribution to discrimination differs. The weight of a feature word is calculated with the TF-IDF method, and the improved IDF of word w in class c_t is calculated as shown in formula (2):
In formula (1) and formula (2), N is the total number of text documents and T is the total number of terms, where the number of text documents containing term t is x, the number of text documents in class c_t is y, and the number of text documents outside c_t that contain term t is k.
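For orientation, the textbook TF-IDF weighting can be sketched as below. Formulas (1) and (2) are not reproduced in this text, so the sketch deliberately shows only the standard form (term frequency times log(N / document frequency)) and not the improved class-aware IDF of the claim; the function name and the token-list input format are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Textbook weighting, NOT the improved IDF of formula (2):
    tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

A term that appears in every document receives weight 0 under this form, which is why class-aware refinements such as the one claimed here are sometimes preferred.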
3. The homestay customer opinion mining method based on a character-level convolutional neural network according to claim 1, characterized in that: the topic relevance in Step 2 is calculated as shown in formula (3):
Relevance(term_w | topic_t) = λ * p(w | t) + (1 - λ) * p(w | t) / p(w)    (3)
In formula (3), the relevance of a word to a topic is adjusted by the parameter λ. If λ is close to 1, a word w that occurs more frequently under topic t is more relevant to topic t; if λ is close to 0, a word w that is more distinctive and more exclusive to topic t is more relevant to topic t. The relevance of the domain word term_w to the topic topic_t is changed by adjusting the value of λ.
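Formula (3) can be evaluated directly, as in the following sketch. It follows the formula literally as written in the claim (the relevance measure published with LDAvis applies logarithms to both terms); the function names and the dictionary inputs are assumptions made for illustration.

```python
def relevance(p_w_given_t, p_w, lam):
    """Formula (3): lam * p(w|t) + (1 - lam) * p(w|t) / p(w)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if p_w <= 0.0:
        raise ValueError("p_w must be positive")
    return lam * p_w_given_t + (1.0 - lam) * p_w_given_t / p_w


def rank_terms(topic_probs, corpus_probs, lam):
    """Sort the terms of one topic by descending relevance.

    topic_probs maps term -> p(w|t); corpus_probs maps term -> p(w).
    """
    return sorted(
        topic_probs,
        key=lambda w: relevance(topic_probs[w], corpus_probs[w], lam),
        reverse=True,
    )
```

With λ = 1 the ranking is driven by in-topic frequency alone, while with λ = 0 it is driven by how exclusive a term is to the topic, which matches the two limiting cases described in the claim.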
4. The homestay customer opinion mining method based on a character-level convolutional neural network according to claim 1, characterized in that: in Step 2, the value of the text document number k is first chosen with reference to the homestay normative documents, and the experiments then take k = 6 as the baseline; k is increased step by step so as to reduce the overlap between topics, and the smallest k at which the topics are observed not to overlap is taken as the number of topics, which is then used to select the topic attribute words.
5. The homestay customer opinion mining method based on a character-level convolutional neural network according to claim 1, characterized in that: the output probability in Step 5 is calculated as shown in formula (4):
To reject false reviews and increase the accuracy of the sentiment analysis, pre-classification is used as a data-cleaning step. During pre-classification the labels 0 and 1 are used, representing negative and positive respectively; a text whose output probability is greater than 0.9 is taken as a high-confidence positive text, and a text whose output probability is less than 0.1 is taken as a high-confidence negative text.
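The pre-classification and confidence filtering can be sketched as follows. Formula (4) is not reproduced in this text, so the posterior below is the textbook multinomial naive Bayes with add-one smoothing, used here as a stand-in; the function names are assumptions, and the sketch assumes both classes occur in the training labels.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Fit a multinomial naive Bayes model with add-one smoothing.

    docs is a list of token lists; labels holds 0 (negative) or 1 (positive).
    """
    vocab = {tok for doc in docs for tok in doc}
    counts = {0: Counter(), 1: Counter()}
    class_totals = Counter(labels)
    for doc, label in zip(docs, labels):
        counts[label].update(doc)
    return vocab, counts, class_totals


def positive_probability(model, doc):
    """Posterior probability that doc is positive (label 1)."""
    vocab, counts, class_totals = model
    n_docs = sum(class_totals.values())
    log_scores = {}
    for label in (0, 1):
        total_tokens = sum(counts[label].values())
        score = math.log(class_totals[label] / n_docs)
        for tok in doc:
            if tok in vocab:  # ignore tokens never seen in training
                score += math.log((counts[label][tok] + 1) / (total_tokens + len(vocab)))
        log_scores[label] = score
    # Normalize in log space to avoid underflow on long reviews.
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    return exp_scores[1] / (exp_scores[0] + exp_scores[1])


def keep_confident(probabilities, low=0.1, high=0.9):
    """Map each probability to 1, 0, or None (discarded as low confidence)."""
    return [1 if p > high else 0 if p < low else None for p in probabilities]
```

Reviews mapped to `None` fall between the two thresholds and would be left out of the cleaned training set.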
6. The homestay customer opinion mining method based on a character-level convolutional neural network according to claim 1, characterized in that: Step 6 first follows the pixel-level processing scheme used in image processing. Assuming the size of the dictionary is n, each review is vectorized with the IDs of its characters by building a character table, and is then fed into a one-layer convolutional neural network for processing. In the input layer, an Embedding layer splices the character vectors of all characters of a sentence into a sentence matrix. A pad length of 200 is used, which covers 99% of the text lengths, together with "pre" (head) zero-padding: when a text is too short, zeros are filled in at the front. The character weights of the Embedding layer are configured to be updated during training. Feature extraction is then performed with the one-dimensional convolution kernel Convolution1D, followed by sampling in one global max pooling layer and by two fully connected layers. Finally, the softmax probability of the positive label is output as the sentiment polarity, and the parameters of the model are printed with the Keras neural network tool.
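The character-table lookup and head zero-padding can be sketched in plain Python as follows. The function names are assumptions; so are the choices to reserve ID 0 for padding, to map unseen characters to 0, and to keep the tail of reviews longer than the pad length, none of which is specified in the claim.

```python
from collections import Counter


def build_char_table(reviews):
    """Map each distinct character to an ID, most frequent character first.

    IDs start at 1 so that 0 stays free for padding (an assumption).
    """
    freq = Counter(ch for review in reviews for ch in review)
    # Sort by descending frequency; break ties by character for determinism.
    ordered = sorted(freq, key=lambda ch: (-freq[ch], ch))
    return {ch: idx for idx, ch in enumerate(ordered, start=1)}


def vectorize(review, char_table, max_len=200):
    """Turn a review into max_len IDs, zero-padded at the front ("pre")."""
    ids = [char_table.get(ch, 0) for ch in review]  # unknown characters -> 0
    ids = ids[-max_len:]  # assumption: over-long reviews keep their tail
    return [0] * (max_len - len(ids)) + ids
```

Because every character is looked up individually, no word segmentation step is needed before vectorization; the resulting fixed-length ID lists are what an embedding layer would consume.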
7. The homestay customer opinion mining method based on a character-level convolutional neural network according to claim 6, characterized in that: in Step 6, the character vectors of all characters of a sentence correspond to individual characters, so no word segmentation is performed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910117188.0A CN109829166B (en) | 2019-02-15 | 2019-02-15 | People and host customer opinion mining method based on character-level convolutional neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910117188.0A CN109829166B (en) | 2019-02-15 | 2019-02-15 | People and host customer opinion mining method based on character-level convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109829166A true CN109829166A (en) | 2019-05-31 |
CN109829166B CN109829166B (en) | 2022-12-27 |
Family
ID=66862072
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910117188.0A Active CN109829166B (en) | 2019-02-15 | 2019-02-15 | People and host customer opinion mining method based on character-level convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109829166B (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110347828A (en) * | 2019-06-26 | 2019-10-18 | 西南交通大学 | A kind of Metro Passenger demand dynamic acquisition method and its obtain system |
CN110688451A (en) * | 2019-08-15 | 2020-01-14 | 中国平安人寿保险股份有限公司 | Evaluation information processing method, evaluation information processing device, computer device, and storage medium |
CN110838287A (en) * | 2019-10-16 | 2020-02-25 | 中国第一汽车股份有限公司 | Corpus processing method and device of chat robot in vehicle-mounted environment and storage medium |
CN111027553A (en) * | 2019-12-23 | 2020-04-17 | 武汉唯理科技有限公司 | Character recognition method for circular seal |
CN111159409A (en) * | 2019-12-31 | 2020-05-15 | 腾讯科技(深圳)有限公司 | Text classification method, device, equipment and medium based on artificial intelligence |
CN111309859A (en) * | 2020-01-21 | 2020-06-19 | 上饶市中科院云计算中心大数据研究院 | Scenic spot network public praise emotion analysis method and device |
CN111445271A (en) * | 2020-03-31 | 2020-07-24 | 携程计算机技术(上海)有限公司 | Model generation method, and prediction method, system, device and medium for cheating hotel |
CN112070856A (en) * | 2020-09-16 | 2020-12-11 | 重庆师范大学 | Limited angle C-arm CT image reconstruction method based on non-subsampled contourlet transform |
CN112784776A (en) * | 2021-01-26 | 2021-05-11 | 山西三友和智慧信息技术股份有限公司 | BPD facial emotion recognition method based on improved residual error network |
CN113778454A (en) * | 2021-09-22 | 2021-12-10 | 重庆海云捷迅科技有限公司 | Automatic evaluation method and system for artificial intelligence experiment platform |
CN116385029A (en) * | 2023-04-20 | 2023-07-04 | 深圳市天下房仓科技有限公司 | Hotel bill detection method, system, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107038480A (en) * | 2017-05-12 | 2017-08-11 | 东华大学 | A kind of text sentiment classification method based on convolutional neural networks |
CN107391483A (en) * | 2017-07-13 | 2017-11-24 | 武汉大学 | A kind of comment on commodity data sensibility classification method based on convolutional neural networks |
CN108345587A (en) * | 2018-02-14 | 2018-07-31 | 广州大学 | A kind of the authenticity detection method and system of comment |
CN109033089A (en) * | 2018-09-06 | 2018-12-18 | 北京京东尚科信息技术有限公司 | Sentiment analysis method and apparatus |
CN109308317A (en) * | 2018-09-07 | 2019-02-05 | 浪潮软件股份有限公司 | A kind of hot spot word extracting method of the non-structured text based on cluster |
Non-Patent Citations (3)
Title |
---|
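Koyel Chakraborty et al.: "Sentiment Analysis on a Set of Movie Reviews Using Deep Learning Techniques", Social Network Analytics: Computational Research Methods and Techniques * |
周敬一 et al.: "Sentiment analysis of Chinese movie reviews based on deep learning", Journal of Shanghai University (Natural Science Edition) * |
秦鹏 et al.: "User behavior inference based on naive Bayes web page classification", Journal of Shenyang University of Technology * |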
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110347828A (en) * | 2019-06-26 | 2019-10-18 | 西南交通大学 | A kind of Metro Passenger demand dynamic acquisition method and its obtain system |
CN110688451A (en) * | 2019-08-15 | 2020-01-14 | 中国平安人寿保险股份有限公司 | Evaluation information processing method, evaluation information processing device, computer device, and storage medium |
CN110838287A (en) * | 2019-10-16 | 2020-02-25 | 中国第一汽车股份有限公司 | Corpus processing method and device of chat robot in vehicle-mounted environment and storage medium |
CN111027553A (en) * | 2019-12-23 | 2020-04-17 | 武汉唯理科技有限公司 | Character recognition method for circular seal |
CN111159409B (en) * | 2019-12-31 | 2023-06-02 | 腾讯科技(深圳)有限公司 | Text classification method, device, equipment and medium based on artificial intelligence |
CN111159409A (en) * | 2019-12-31 | 2020-05-15 | 腾讯科技(深圳)有限公司 | Text classification method, device, equipment and medium based on artificial intelligence |
CN111309859A (en) * | 2020-01-21 | 2020-06-19 | 上饶市中科院云计算中心大数据研究院 | Scenic spot network public praise emotion analysis method and device |
CN111445271A (en) * | 2020-03-31 | 2020-07-24 | 携程计算机技术(上海)有限公司 | Model generation method, and prediction method, system, device and medium for cheating hotel |
CN112070856A (en) * | 2020-09-16 | 2020-12-11 | 重庆师范大学 | Limited angle C-arm CT image reconstruction method based on non-subsampled contourlet transform |
CN112784776A (en) * | 2021-01-26 | 2021-05-11 | 山西三友和智慧信息技术股份有限公司 | BPD facial emotion recognition method based on improved residual error network |
CN112784776B (en) * | 2021-01-26 | 2022-07-08 | 山西三友和智慧信息技术股份有限公司 | BPD facial emotion recognition method based on improved residual error network |
CN113778454A (en) * | 2021-09-22 | 2021-12-10 | 重庆海云捷迅科技有限公司 | Automatic evaluation method and system for artificial intelligence experiment platform |
CN113778454B (en) * | 2021-09-22 | 2024-02-20 | 重庆海云捷迅科技有限公司 | Automatic evaluation method and system for artificial intelligent experiment platform |
CN116385029A (en) * | 2023-04-20 | 2023-07-04 | 深圳市天下房仓科技有限公司 | Hotel bill detection method, system, electronic equipment and storage medium |
CN116385029B (en) * | 2023-04-20 | 2024-01-30 | 深圳市天下房仓科技有限公司 | Hotel bill detection method, system, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109829166B (en) | 2022-12-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109829166A (en) | People place customer input method for digging based on character level convolutional neural networks | |
CN106919689B (en) | Professional domain knowledge mapping dynamic fixing method based on definitions blocks of knowledge | |
CN102929861B (en) | Method and system for calculating text emotion index | |
CN107025299B (en) | A kind of financial public sentiment cognitive method based on weighting LDA topic models | |
CN108108849A (en) | A kind of microblog emotional Forecasting Methodology based on Weakly supervised multi-modal deep learning | |
CN109241255A (en) | A kind of intension recognizing method based on deep learning | |
CN109933664A (en) | A kind of fine granularity mood analysis improved method based on emotion word insertion | |
CN107169001A (en) | A kind of textual classification model optimization method based on mass-rent feedback and Active Learning | |
CN101004737A (en) | Individualized document processing system based on keywords | |
CN105468713A (en) | Multi-model fused short text classification method | |
CN101599071A (en) | The extraction method of conversation text topic | |
CN110674274A (en) | Knowledge graph construction method for food safety regulation question-answering system | |
CN108280155A (en) | The problem of based on short-sighted frequency, retrieves feedback method, device and its equipment | |
CN110134947A (en) | A kind of sensibility classification method and system based on uneven multi-source data | |
CN103150333A (en) | Opinion leader identification method in microblog media | |
CN1687924A (en) | Method for producing internet personage information search engine | |
CN108388554A (en) | Text emotion identifying system based on collaborative filtering attention mechanism | |
CN110990670B (en) | Growth incentive book recommendation method and recommendation system | |
CN107145514A (en) | Chinese sentence pattern sorting technique based on decision tree and SVM mixed models | |
CN109492105A (en) | A kind of text sentiment classification method based on multiple features integrated study | |
CN106777193A (en) | A kind of method for writing specific contribution automatically | |
CN110633367A (en) | Seven-emotion classification method based on emotion dictionary and microblog text data | |
CN110059190A (en) | A kind of user's real-time point of view detection method based on social media content and structure | |
CN113934835B (en) | Retrieval type reply dialogue method and system combining keywords and semantic understanding representation | |
CN109582783A (en) | Hot topic detection method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||