CN109829166A - Homestay customer opinion mining method based on character-level convolutional neural networks - Google Patents


Info

Publication number
CN109829166A
Authority
CN
China
Prior art keywords
homestay
topic
text
character
convolutional neural
Legal status
Granted
Application number
CN201910117188.0A
Other languages
Chinese (zh)
Other versions
CN109829166B (en)
Inventor
杨有
张振
罗凌
余平
尚晋
Current Assignee
Chongqing Normal University
Original Assignee
Chongqing Normal University
Application filed by Chongqing Normal University
Priority to CN201910117188.0A
Publication of CN109829166A
Application granted
Publication of CN109829166B
Legal status: Active
Anticipated expiration


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a homestay customer opinion mining method based on character-level convolutional neural networks, comprising the following steps: building a web crawler to collect all homestay reviews and build a homestay dictionary; extracting features from and vectorizing the text with TF-IDF and performing visual topic clustering; constructing a homestay topic dictionary; counting, after sentence splitting, the number of evaluations corresponding to each topic in the text; performing weakly supervised pre-classification based on naive Bayes; building a convolutional neural network with one-dimensional convolution kernels for feature extraction to obtain the sentiment polarity; and visualizing the sentiment polarity and validating the model. The method can mine the sentiment and user needs hidden in large volumes of noisy and fake personalized review data, which helps the decision-making of businesses and individual users. Moreover, taking a data-driven perspective, the method can uncover customer satisfaction under each topic, and the results can provide recommendations for homestay operators and regulators.

Description

Homestay customer opinion mining method based on character-level convolutional neural networks
Technical field
The present invention relates to the field of homestay customer opinion mining, and more particularly to a homestay customer opinion mining method based on character-level convolutional neural networks.
Background art
Customer opinion mining is the analysis of customer needs and opinions, and analyzing customer reviews supports the improvement and iteration of homestay services. Because homestay services are intangible, online homestay reviews carry more influence than other information sources, so improving service quality through customer opinion mining is key to rapidly building a competitive advantage. There are two mainstream approaches to customer opinion mining. The first targets structured data: based on structured data such as questionnaires, Likert scales, and semantic differential scales, it obtains perceivable, valid attributes. The second targets unstructured data: it analyzes the characteristics of the data itself with natural language processing and visualization techniques. Large amounts of opinion-bearing text can be obtained from review websites, forums, blogs, and social media, and with the help of a sentiment analysis system this unstructured information can be converted automatically into structured data that captures the opinions expressed about products, services, brands, politics, people, or other topics.
Homestay reviews are time-sensitive, have independent contextual topics, express clear viewpoints, are short, and are written casually. Existing customer opinion mining approaches still fall short in efficiently mining the customer viewpoints and sentiment hidden in noise and cannot meet practical needs. The present invention therefore proposes a homestay customer opinion mining method based on character-level convolutional neural networks to address these shortcomings of the prior art.
Summary of the invention
In view of the above problems, the method of the present invention can mine the sentiment and user needs hidden in these personalized reviews from large volumes of noisy and fake review data, which helps the decision-making of businesses and individual users. At the same time, from a data-driven perspective, the method can uncover customer satisfaction under each topic, and the results can provide recommendations for homestay operators and regulators. The method is highly general and has practical value for consumers, operators, and regulators.
The present invention proposes a homestay customer opinion mining method based on character-level convolutional neural networks, comprising the following steps:
Step 1: online homestay review collection and preprocessing. A web crawler is built to collect all homestay reviews and build a homestay dictionary. Then, using the open-source LTP part-of-speech tagging function from Harbin Institute of Technology, punctuation marks are replaced with newlines so that the topic sentences in the reviews are decomposed into topic evaluation texts;
Step 2: topic clustering. After feature extraction and vectorization of the topic evaluation texts with TF-IDF, pyLDAvis is used to perform visual topic clustering of the homestay reviews and obtain a visual clustering result. The number of topics k is then selected by the criterion of high intra-cluster similarity and low inter-cluster similarity to obtain an initial model, and the term relevance for each topic t is calculated;
Step 3: the homestay topic dictionary is constructed with the help of homestay normative documents and the visual clustering result;
Step 4: after sentence splitting, the evaluations corresponding to each topic in the topic evaluation texts are found by attribute word matching, and the number of evaluations for each topic is counted;
Step 5: weakly supervised pre-classification based on naive Bayes. The web crawler automatically labels a portion of the original reviews that have no follow-up reviews. Let k be the number of keywords in a review and j the class index; evaluations fall into two sentiment classes. The posterior probability of an evaluation is computed from its term-frequency vector, and an output probability greater than 0.5 is regarded as a successful pre-classification;
Step 6: homestay review sentiment analysis based on C-CNN-SA. Unstructured character-level reviews are taken as the raw signal: characters are deduplicated and sorted in descending order of frequency to build a character table, and each review is vectorized by looking up the position ID of each character in the table. A convolutional neural network with one-dimensional convolution kernels is built for feature extraction, the sentiment polarity is output through a softmax function, and the parameters of the model are printed with the Keras neural network tool;
Step 7: the sentiment polarities obtained after feature extraction by the one-dimensional convolutional network are visualized to compare customer opinion tendencies under multiple topics, and targeted improvements are then made according to these tendencies to raise overall homestay satisfaction;
Step 8: model validation. Using ten-fold cross-validation, 10 experiments are performed under identical conditions, and the average test-set accuracy, average precision, average recall, and average F-score are used as evaluation metrics to verify the validity of the model.
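Step 8 relies on ten-fold cross-validation. As a minimal sketch, assuming the samples are addressed by integer index (the function name `ten_fold_indices` is hypothetical and not from the source), the folds can be generated once with a fixed seed so that every compared model is evaluated on the same splits:

```python
import random


def ten_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Fixed seed: identical folds for every model under comparison.
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)  # fold sizes differ by at most one
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size
```

Each of the 10 (train, test) pairs would then drive one training and evaluation run.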
As a further improvement, the TF-IDF formula in step 2 is shown in formula (1):
The distribution of feature items across different classes, and the positions of feature words within a class, affect how well they discriminate between texts: when a term appears at different positions in a text document, its contribution to discrimination differs. The TF-IDF method is used to compute the weights of feature words, and the improved IDF of word w in class c_t is shown in formula (2):
In formulas (1) and (2), N is the total number of text documents, T is the total number of terms, x is the number of text documents containing term t, y is the number of text documents in class c_t, and k is the number of text documents outside c_t that contain term t.
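Formulas (1) and (2) are not reproduced in this text. For orientation only, the sketch below computes the common textbook TF-IDF weight tf(t, d) · log(N / x), using the N and x defined above; it is an assumption that formula (1) takes this form, and the class-aware improved IDF of formula (2) is deliberately not reconstructed. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: (count of t in d / length of d) * log(N / x_t),
    where N is the number of documents and x_t the number containing t.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # each term counted once per document
    weights = []
    for doc in documents:
        counts = Counter(doc)
        length = len(doc)
        weights.append({
            term: (count / length) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```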
As a further improvement, the topic relevance in step 2 is calculated as shown in formula (3):
relevance(term_w | topic_t) = λ·p(w|t) + (1 − λ)·p(w|t)/p(w)    (3)
In formula (3), the parameter λ adjusts the relevance of a word to a topic: when λ is close to 1, words w that occur more frequently under topic t are more relevant to t; when λ is close to 0, words w that are more specific and exclusive to topic t are more relevant to t. Changing λ therefore changes the relevance of domain word term_w to topic topic_t.
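Formula (3) can be evaluated directly. The sketch below follows the formula exactly as printed; note that the LDAvis relevance measure on which pyLDAvis is based applies logarithms to both terms, so scores from this sketch will not match pyLDAvis output numerically. The helper names and toy probabilities are illustrative assumptions.

```python
def relevance(p_w_given_t, p_w, lam):
    """Formula (3) as printed: lam * p(w|t) + (1 - lam) * p(w|t) / p(w)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    return lam * p_w_given_t + (1.0 - lam) * p_w_given_t / p_w


def rank_terms(topic_term_probs, term_probs, lam, top_n=5):
    """Return a topic's terms sorted by relevance, highest first."""
    scores = {w: relevance(p, term_probs[w], lam) for w, p in topic_term_probs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

With λ = 1 the ranking follows p(w|t) alone; with λ = 0 it favors words that are rare overall but concentrated in topic t.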
As a further improvement, in step 2 the number of topics k is chosen first with reference to the homestay normative documents and then experimentally: starting from k = 6 and increasing k step by step to reduce the overlap between topics, the smallest k at which the topics no longer overlap is taken as the number of topics and used to select the topic attribute words.
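The source does not state a numeric overlap criterion. One possible reading, sketched here as an assumption, measures overlap as the largest Jaccard similarity between the top-word sets of any two topics and returns the smallest k, starting from 6, whose overlap does not exceed a threshold. `top_words_by_k` is a hypothetical input holding, for each fitted k, one set of top words per topic.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def max_topic_overlap(topics):
    """Largest overlap between the top-word sets of any two topics."""
    return max((jaccard(a, b) for a, b in combinations(topics, 2)), default=0.0)


def select_topic_count(top_words_by_k, start_k=6, max_overlap=0.0):
    """Smallest k >= start_k whose topics overlap at most max_overlap, else None."""
    candidates = sorted(k for k in top_words_by_k if k >= start_k)
    for k in candidates:
        if max_topic_overlap(top_words_by_k[k]) <= max_overlap:
            return k
    return None
```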
As a further improvement, the output probability in step 5 is calculated as shown in formula (4):
To reject fake reviews and improve the accuracy of sentiment analysis, pre-classification is used as data cleaning. During pre-classification, labels 0 and 1 denote negative and positive respectively; texts with an output probability greater than 0.9 are taken as high-confidence positive texts, and texts with an output probability less than 0.1 as high-confidence negative texts.
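Formula (4) is not reproduced here. A minimal sketch of the weakly supervised pre-classification, assuming a standard two-class multinomial naive Bayes with add-one smoothing over tokenized reviews (the class and function names are hypothetical), keeps only texts whose positive probability exceeds 0.9 or falls below 0.1:

```python
import math
from collections import Counter


class NaiveBayesPresorter:
    """Two-class (0 = negative, 1 = positive) multinomial naive Bayes."""

    def fit(self, docs, labels):
        self.vocab = {w for doc in docs for w in doc}
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in (0, 1)}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.totals = {c: sum(self.word_counts[c].values()) for c in (0, 1)}
        return self

    def _log_joint(self, doc, c):
        # log P(c) + sum of log P(w | c), with add-one (Laplace) smoothing.
        prior = self.class_counts[c] / sum(self.class_counts.values())
        denom = self.totals[c] + len(self.vocab)
        score = math.log(prior)
        for w in doc:
            score += math.log((self.word_counts[c][w] + 1) / denom)
        return score

    def positive_probability(self, doc):
        """Posterior P(positive | doc), normalized in log space for stability."""
        neg, pos = self._log_joint(doc, 0), self._log_joint(doc, 1)
        m = max(neg, pos)
        return math.exp(pos - m) / (math.exp(neg - m) + math.exp(pos - m))


def presort(model, docs, hi=0.9, lo=0.1):
    """Keep high-confidence texts only: p > hi -> label 1, p < lo -> label 0."""
    kept = []
    for doc in docs:
        p = model.positive_probability(doc)
        if p > hi:
            kept.append((doc, 1))
        elif p < lo:
            kept.append((doc, 0))
    return kept
```

Texts between the two thresholds are simply dropped, which is what makes the step act as data cleaning rather than labeling.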
As a further improvement, step 6 draws on pixel-level processing in image processing. Assuming the dictionary size is n, each review is vectorized with character IDs from the character table and then fed into a one-layer convolutional neural network. In the input layer, an Embedding layer concatenates the character vectors of all characters in the sentence into a sentence matrix. A pad length of 200 covers 99% of text lengths; when a text is shorter, zeros are filled in at the front ("pre" mode), and the character weights of the Embedding layer are set to be updated during training. Features are then extracted with one-dimensional convolution kernels (Convolution1D), followed by one global max pooling layer and two fully connected layers; finally, the softmax probability of the positive label is output as the sentiment polarity. The parameters of the model are printed with the Keras neural network tool.
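The character table and pre-padding described above can be sketched in plain Python. Reserving ID 0 for padding and unknown characters, and breaking frequency ties by character, are assumptions; the padding and truncation behavior is meant to mirror Keras's `pad_sequences` with `padding="pre"` and its default `truncating="pre"`.

```python
from collections import Counter


def build_char_table(texts):
    """Map each distinct character to an ID in descending order of frequency.

    ID 0 is reserved for padding (and, in this sketch, unknown characters).
    """
    freq = Counter(ch for text in texts for ch in text)
    ordered = sorted(freq, key=lambda ch: (-freq[ch], ch))  # ties broken by character
    return {ch: i + 1 for i, ch in enumerate(ordered)}


def vectorize(text, table, max_len=200):
    """Look up character IDs, then pre-truncate and pre-pad to max_len."""
    ids = [table.get(ch, 0) for ch in text]
    if len(ids) > max_len:
        ids = ids[len(ids) - max_len:]  # drop from the front, like truncating="pre"
    return [0] * (max_len - len(ids)) + ids  # zeros in front, like padding="pre"
```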
As a further improvement, in step 6 the character vectors are built from the individual characters of the sentence, so no word segmentation is performed.
The beneficial effects of the invention are as follows: the method can mine the sentiment and user needs hidden in these personalized reviews from large volumes of noisy and fake review data, which helps the decision-making of businesses and individual users; from a data-driven perspective, it can uncover customer satisfaction under each topic, and the results can provide recommendations for homestay operators and regulators. By improving the opinion mining algorithm to address the scarcity of homestay corpora, the invention proposes a sentiment analysis algorithm with visual topic extraction and weakly supervised pre-training suited to homestay reviews, enabling latent feature topic extraction and sentiment analysis of online homestay reviews, while the validation model can accurately verify the model's validity. The method is highly general and has practical value for consumers, operators, and regulators.
Description of the drawings
Fig. 1 is a flow diagram of the present invention.
Fig. 2 is a schematic diagram of the LDA probabilistic model of the present invention.
Fig. 3 is a schematic diagram of the model structure of the method of the present invention.
Fig. 4 is a schematic diagram of the model parameters of the method of the present invention.
Fig. 5 is a schematic diagram of the homestay topic visualization in an embodiment of the present invention.
Fig. 6 is a schematic diagram of the share of reviews for each topic in an embodiment of the present invention.
Fig. 7 is a schematic diagram of the service sentiment polarity distribution in an embodiment of the present invention.
Fig. 8 is a schematic diagram of the visualization of customer opinions under each topic in an embodiment of the present invention.
Fig. 9 is a schematic diagram of the experience sentiment polarity distribution in an embodiment of the present invention.
Fig. 10 is a schematic diagram of the characteristics sentiment polarity distribution in an embodiment of the present invention.
Fig. 11 is a schematic diagram of the facilities sentiment polarity distribution in an embodiment of the present invention.
Fig. 12 is a schematic diagram of the transportation sentiment polarity distribution in an embodiment of the present invention.
Fig. 13 is a schematic diagram of the price sentiment polarity distribution in an embodiment of the present invention.
Fig. 14 is a schematic diagram of the environment sentiment polarity distribution in an embodiment of the present invention.
Fig. 15 is a schematic diagram of the food and drink sentiment polarity distribution in an embodiment of the present invention.
Detailed description of the embodiments
To make the technical means, objectives, and effects of the invention easy to understand, the present invention is further explained below with reference to specific embodiments.
Referring to Figs. 1 to 8, this embodiment provides a homestay customer opinion mining method based on character-level convolutional neural networks, comprising the following steps:
Step 1: online homestay review collection and preprocessing. A web crawler is built for the Chongqing homestay section of Ctrip, and all homestay reviews in that section between July 26, 2016 and July 26, 2018 are collected to build the homestay dictionary. The topic attribute word set contains 100 words, and a review count below 100 would affect topic extraction, so only homestays with more than 100 reviewing users and a rating were selected. The qualifying corpus totals 81,810 reviews, including 10,000 unlabeled follow-up reviews. After the homestay dictionary is built, the open-source LTP part-of-speech tagging function from Harbin Institute of Technology is used to replace punctuation marks with newlines and decompose the topic sentences in the reviews into texts. For example, "The owner is warm, the room is neat and tidy, and the inn is inside the scenic area" is decomposed into three topic evaluations: "the owner is warm", "the room is neat and tidy", and "and the inn is inside the scenic area";
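The clause-splitting step can be approximated without LTP. The sketch below substitutes a regular expression for the LTP-based processing described above (an assumption, not the source's tooling): Chinese and ASCII clause punctuation is replaced with newlines, and the non-empty lines are kept as topic evaluations.

```python
import re

# Clause punctuation treated as split points (an assumed set, Chinese and ASCII).
CLAUSE_PUNCT = re.compile(r"[，。！？；、,.!?;～~]+")


def split_into_topic_clauses(review):
    """Replace clause punctuation with newlines and return the non-empty clauses."""
    text = CLAUSE_PUNCT.sub("\n", review)
    return [line.strip() for line in text.split("\n") if line.strip()]
```

Applied to the English rendering of the example review, this yields the same three evaluations listed above.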
Step 2: topic clustering. After feature extraction and vectorization of the topic evaluation texts with TF-IDF, pyLDAvis is used to perform visual topic clustering of the homestay reviews and obtain a visual clustering result. The number of topics k is then selected by the criterion of high intra-cluster similarity and low inter-cluster similarity to obtain an initial model, and the term relevance for each topic t is calculated. The TF-IDF formula is shown in formula (1):
The distribution of feature items across different classes, and the positions of feature words within a class, affect how well they discriminate between texts: when a term appears at different positions in a text document, its contribution to discrimination differs. The TF-IDF method is used to compute the weights of feature words, and the improved IDF of word w in class c_t is shown in formula (2):
In formulas (1) and (2), N is the total number of text documents, T is the total number of terms, x is the number of text documents containing term t, y is the number of text documents in class c_t, and k is the number of text documents outside c_t that contain term t;
The topic relevance is calculated as shown in formula (3):
relevance(term_w | topic_t) = λ·p(w|t) + (1 − λ)·p(w|t)/p(w)    (3)
In formula (3), the parameter λ adjusts the relevance of a word to a topic: when λ is close to 1, words w that occur more frequently under topic t are more relevant to t; when λ is close to 0, words w that are more specific and exclusive to topic t are more relevant to t. Changing λ therefore changes the relevance of domain word term_w to topic topic_t. In Fig. 5, the circles on the left represent different topics, and the distance between circles reflects the similarity between topics, which helps estimate the number of customer opinion topics in online homestay reviews. When a topic is selected, the right panel shows the words most closely associated with it, and the meaning of the topic can be summarized by generalizing the meanings of these words. With reference to the homestay normative documents, topic attribute words are selected experimentally by starting from K = 6 and increasing K step by step. With K = 8 topics, the topics overlap little and are evenly distributed, giving the best result. Selecting topic 8, its topic words include "surrounding environment", "elevator", "bedding", "garden", "desk", and "road", and inspection shows that the word "independent" is followed by "bathroom". Summarizing these topic words shows that topic 8 covers two topics, "environment" and "facilities". The remaining 7 topics are summarized in the same way, so that the topics achieve maximal coverage of the evaluations;
The homestay topic dictionary is constructed with the help of the homestay normative documents and the visual clustering; the resulting homestay topics and topic attribute words are shown in Table 1 below:
Table 1 Topic attribute word set
After sentence splitting, the evaluations corresponding to each topic are found by attribute word matching, and the number of evaluations for each topic is counted. In homestay reviews, customer attention decreases in the order facilities, service, environment, transportation, food and drink, characteristics, price, and experience, with relatively few reviews addressing price and experience, as shown in Fig. 6;
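Attribute word matching can be sketched as substring lookup, which suits unsegmented Chinese text. Table 1 is not reproduced in this text, so the topic-to-word mapping below is a hypothetical excerpt, and counting a clause once for every topic it matches is an assumption.

```python
from collections import Counter

# Hypothetical excerpt of a topic dictionary; Table 1 itself is not reproduced.
TOPIC_WORDS = {
    "facilities": {"elevator", "bedding", "desk", "bathroom"},
    "environment": {"garden", "surroundings", "view"},
    "service": {"owner", "staff", "host"},
}


def count_topic_evaluations(clauses, topic_words=TOPIC_WORDS):
    """Count how many clauses mention each topic's attribute words."""
    counts = Counter()
    for clause in clauses:
        text = clause.lower()
        for topic, words in topic_words.items():
            # Substring matching needs no word segmentation.
            if any(word in text for word in words):
                counts[topic] += 1  # a clause counts once per matching topic
    return counts
```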
Step 3: the homestay topic dictionary is constructed with the help of homestay normative documents and the visual clustering result;
Step 4: after sentence splitting, the evaluations corresponding to each topic in the topic evaluation texts are found by attribute word matching, and the number of evaluations for each topic is counted;
Step 5: weakly supervised pre-classification based on naive Bayes. The web crawler automatically labels a portion of the original reviews that have no follow-up reviews. Let k be the number of keywords in a review and j the class index; evaluations fall into two sentiment classes. The posterior probability of an evaluation is computed from its term-frequency vector, and an output probability greater than 0.5 is regarded as a successful pre-classification. The output probability is calculated as shown in formula (4):
To reject fake reviews and improve the accuracy of sentiment analysis, pre-classification is used as data cleaning. During pre-classification, labels 0 and 1 denote negative and positive respectively; texts with an output probability greater than 0.9 are taken as high-confidence positive texts, and texts with an output probability less than 0.1 as high-confidence negative texts;
Step 6: homestay review sentiment analysis based on C-CNN-SA. Character-level text is treated as the raw signal: characters are deduplicated and sorted in descending order of frequency to build a character table, each review is vectorized by looking up the position ID of each character in the table, and a convolutional neural network with one-dimensional convolution kernels is built for feature extraction to obtain the sentiment polarity. Drawing on pixel-level processing in image processing, the input layer (InputLayer) is the character vector of each evaluation, and the output layer outputs the sentiment polarity through softmax; the model structure is shown in Fig. 3. Assuming the dictionary size is n, each review is vectorized with character IDs from the character table and then fed into a convolutional neural network. In the input layer (InputLayer), the Embedding layer (embedding_1) concatenates the character vectors of all characters in the sentence into a sentence matrix. Statistics show that a character sequence length of 200 covers 99% of text lengths (input: 200); variable-length texts shorter than 200 are zero-padded at the front ("pre" mode), and the character weights of the Embedding layer are set to be updated during training. Convolution1D (conv1d_1) then performs feature extraction, followed by one global max pooling layer (GlobalMaxPooling1D) and two fully connected layers (dense_1 and dense_2); finally, the softmax probability of the positive label is output as the sentiment polarity. The model parameters, printed with the Keras neural network tool, are shown in Fig. 4. Because the character vectors are built from the individual characters of the sentence, no word segmentation is performed;
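To make the data flow of Fig. 3 concrete, the following is a framework-free sketch of one forward pass: embedding lookup, a valid one-dimensional convolution, global max pooling, two dense layers, and a softmax whose positive-class probability is the sentiment polarity. The ReLU activations, the parameter layout, and all function names are assumptions; it is not the trained Keras model.

```python
import math


def conv1d(seq, kernels, bias):
    """Valid 1-D convolution with ReLU (assumed activation).

    seq: L vectors of size d; kernels: F filters, each w vectors of size d.
    Returns an (L - w + 1) x F feature map.
    """
    width = len(kernels[0])
    out = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        row = []
        for f, kernel in enumerate(kernels):
            s = bias[f] + sum(x * k for vec, kvec in zip(window, kernel)
                              for x, k in zip(vec, kvec))
            row.append(max(0.0, s))
        out.append(row)
    return out


def global_max_pool(feature_map):
    """Maximum over positions for each filter, as in GlobalMaxPooling1D."""
    return [max(col) for col in zip(*feature_map)]


def dense(x, weights, bias):
    """Fully connected layer; weights holds one row per output unit."""
    return [b + sum(xi * wi for xi, wi in zip(x, w)) for w, b in zip(weights, bias)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def forward(ids, params):
    """Return P(positive) for one vectorized review."""
    seq = [params["embedding"][i] for i in ids]  # embedding lookup
    pooled = global_max_pool(conv1d(seq, params["kernels"], params["conv_bias"]))
    hidden = [max(0.0, v) for v in dense(pooled, params["w1"], params["b1"])]
    probs = softmax(dense(hidden, params["w2"], params["b2"]))
    return probs[1]  # index 1 taken as the positive label (assumed)
```

In a trained model these weights would come from Keras; the point here is only the order of operations from character IDs to a polarity in [0, 1].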
In Fig. 6, the abscissa indicates the sentiment tendency of evaluations on the corresponding topic. The sentiment score of each review lies in [0, 1], and the abscissa step size is 0.01: scores closer to 1 (right half) represent stronger positive sentiment, and scores closer to 0 (left half) represent stronger negative sentiment. When the output probability lies in the middle, the sentiment is considered neutral and the sentiment value is 0. The ordinate indicates the number of reviews on the corresponding topic;
Step 7: the sentiment polarities obtained after feature extraction by the one-dimensional convolutional network are visualized to compare customer opinion tendencies under multiple topics, and targeted improvements are made accordingly to raise overall homestay satisfaction. Aggregating the results of Fig. 6 gives the sentiment distribution under each topic and forms a customer review sentiment polarity chart for each of the 8 topics, as shown in Fig. 7. The topic-level sentiment analysis of homestay customers shows that Chongqing homestays are rated highly on topics such as "service", "transportation", "experience", "environment", and "price", with the distributions clearly skewed toward the right half. This is closely related to Chongqing's strategy of developing the service industry and to its geography as a "city of mountains and rivers": transportation in Chongqing is convenient, buildings are built along the mountainsides, and the tourism index has risen year by year, attracting large numbers of out-of-town tourists who find the homestay experience novel. As for price, Chongqing is in the southwest, where consumption is slightly lower than in eastern regions, so customers praise the good value. The sentiment analysis of "food and drink", "characteristics", and "facilities", however, shows stronger customer criticism. This may be because Chongqing cuisine is predominantly spicy and most visitors are out-of-town tourists who may not be accustomed to it. In addition, homestays are mostly located near scenic spots and, for cost reasons, invest less in facilities, which draws more criticism; facilities could later be updated through cooperation with the scenic spots. Because a sentiment polarity chart only shows the customer opinion tendency under a single topic, a further analysis visualizes sentiment with a step size of 0.2 to compare the tendencies across topics: the abscissa indicates the evaluated topic and the ordinate the share of each sentiment level, so the customer opinions on multiple topics can be compared at once. The result agrees with the single-topic sentiment polarity charts: customers are more critical of homestay "facilities" and "food and drink", where satisfaction is lower, and targeted improvements can later be made there to raise overall homestay satisfaction, as shown in Fig. 8;
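The cross-topic comparison behind Fig. 8 amounts to bucketing each topic's sentiment scores and normalizing the counts. In the sketch below, a step of 0.2 corresponds to `n_buckets=5` (a step of 0.01 would be `n_buckets=100`); clamping a score of exactly 1.0 into the last bucket, and the function names, are assumptions.

```python
def bucket_shares(scores, n_buckets=5):
    """Fraction of scores in each of n_buckets equal-width bins over [0, 1]."""
    counts = [0] * n_buckets
    for s in scores:
        # A score of exactly 1.0 would index past the end, so clamp it.
        idx = min(int(s * n_buckets), n_buckets - 1)
        counts[idx] += 1
    total = len(scores)
    return [c / total for c in counts] if total else [0.0] * n_buckets


def compare_topics(scores_by_topic, n_buckets=5):
    """Per-topic sentiment shares, ready for a stacked bar chart."""
    return {topic: bucket_shares(s, n_buckets) for topic, s in scores_by_topic.items()}
```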
Step 8: model validation. Using ten-fold cross-validation, 10 experiments are performed under identical conditions, and the average test-set accuracy, average precision, average recall, and average F-score are used as evaluation metrics to verify the validity of the model. The training set consists of 36,000 texts filtered by the weak classifier and selected by hand, and the test set consists of 12,000 manually labeled reviews. Four algorithms, decision tree (DT), naive Bayes (NB), support vector machine (SVM), and RNN (LSTM), with and without weakly supervised pre-training, are used for comparison. C-CNN-SA denotes the model of this method; CNN-W denotes a character-level CNN without weak-classifier pre-classification; CNN-N denotes a standard word-level CNN; CNN-S denotes a word-level CNN after stop-word removal; and C-RNN denotes a character-level LSTM. The evaluation results are shown in Table 2:
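The averaged metrics named in step 8 can be computed per fold and then averaged over the 10 runs. This is a minimal binary-classification sketch, assuming label 1 is the positive class and that the F value is the F1 score; the function names are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one test fold."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def average_over_folds(fold_metrics):
    """Mean of each metric across folds."""
    return {name: sum(m[name] for m in fold_metrics) / len(fold_metrics)
            for name in fold_metrics[0]}
```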
Table 2 Model evaluation data
Table 2 shows that adding the preprocessing step improves test-set accuracy by 2%. For sentiment classification, the improved model of this method achieves higher classification accuracy than traditional word-level models. For short-text sentiment classification, character-level granularity achieves higher accuracy than word level, possibly because the corpus texts are short and stop-word filtering may lose textual information, degrading classification performance. Treating character-level text as the raw input signal and extracting features directly with a one-dimensional convolutional neural network makes it unnecessary, for short texts, to consider word-level meaning, which simplifies the engineering of sentiment analysis.
The method of the present invention can mine the sentiment and user needs hidden in these personalized reviews from large volumes of noisy and fake review data, which helps the decision-making of businesses and individual users. From a data-driven perspective, it can uncover customer satisfaction under each topic, and the results can provide recommendations for homestay operators and regulators. By improving the opinion mining algorithm to address the scarcity of homestay corpora, it proposes a sentiment analysis algorithm with visual topic extraction and weakly supervised pre-training suited to homestay reviews, enabling latent feature topic extraction and sentiment analysis of online homestay reviews, while the validation model can accurately verify the model's validity. The method is highly general and has practical value for consumers, operators, and regulators.
The basic principles, main features, and advantages of the invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited to the above embodiments; the above embodiments and description only illustrate the principles of the invention, and various changes and improvements may be made without departing from the spirit and scope of the invention, all of which fall within the claimed scope of protection. The claimed scope of the invention is defined by the appended claims and their equivalents.

Claims (7)

1. the people place customer input method for digging based on character level convolutional neural networks, which comprises the following steps:
Step 1: online people place comment acquisition and pretreatment construct web crawlers, acquire whole people place comments and establish out people place word Then allusion quotation is substituted punctuation mark using newline using Harbin Institute of Technology's open source LTP part-of-speech tagging function, by the master in comment Topic sentence is decomposed, and is formed theme and is evaluated text;
Step 2: topic clustering. After feature extraction and vectorization of the topic evaluation texts with TF-IDF, use pyLDAvis to perform visual topic clustering of the homestay reviews and obtain a visual clustering result. Following the topic-selection criterion of high intra-cluster similarity and low inter-cluster similarity, select the number of topics k to obtain an initial model, then compute the relevance of terms to each topic t.
Step 3: build a homestay topic dictionary with the aid of homestay standard documents and the visual clustering result.
Step 4: by attribute-word matching, find the evaluation items in the split topic evaluation texts that correspond to each topic, then count the number of evaluation items for each topic.
Step 5: weakly supervised pre-classification based on naive Bayes. The web crawler automatically labels a portion of the original reviews that have no follow-up reviews. Let k be the number of keywords in a review and j the class index, with reviews falling into two sentiment classes. The posterior probability of a review is computed from a word-frequency vectorization of the text, and a review whose output probability is greater than 0.5 is considered successfully pre-classified.
Step 6: sentiment analysis of homestay reviews based on C-CNN-SA. Unstructured reviews are treated as raw signals at the character level: characters are deduplicated and sorted by descending frequency to build a character table, and each review is vectorized by looking up the position ID of each of its characters in that table. A convolutional neural network with one-dimensional convolution kernels extracts features, the sentiment polarity is obtained from a softmax output, and the parameters of the model are printed with the Keras neural network tool.
Step 7: visualize the sentiment polarities obtained after feature extraction by the one-dimensional-kernel convolutional neural network, compare customer opinion tendencies across multiple topics, and make targeted improvements based on that comparison, thereby raising overall homestay satisfaction.
Step 8: model verification. Under identical conditions, run 10 experiments using ten-fold cross-validation, and validate the model using the average test-set accuracy, average precision, average recall and average F-score as evaluation metrics.
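Steps 1 and 4 can be illustrated with a small Python sketch. It is not the patented implementation: clause splitting here uses a regular expression over common punctuation instead of LTP part-of-speech tagging, and `TOPIC_DICT`, its attribute words and the helper names are hypothetical placeholders for the topic dictionary built in step 3. Each clause is counted once for every topic whose attribute words it contains.

```python
import re
from collections import Counter

# Hypothetical topic dictionary (step 3): topic -> attribute words.
# In the method it is built from homestay standards and pyLDAvis clusters.
TOPIC_DICT = {
    "location": ["位置", "交通", "地铁"],
    "hygiene": ["干净", "卫生", "整洁"],
    "service": ["老板", "服务", "热情"],
}

# Chinese and ASCII punctuation treated as clause boundaries (an assumption;
# the method identifies punctuation via LTP part-of-speech tagging).
PUNCT = r"[，。！？；、,.!?;~\s]+"


def split_clauses(review: str) -> list[str]:
    """Step 1 (simplified): replace punctuation with newlines, then split."""
    text = re.sub(PUNCT, "\n", review)
    return [clause for clause in text.split("\n") if clause]


def count_topic_items(reviews: list[str]) -> Counter:
    """Step 4: count evaluation items per topic by attribute-word matching."""
    counts = Counter()
    for review in reviews:
        for clause in split_clauses(review):
            for topic, words in TOPIC_DICT.items():
                if any(word in clause for word in words):
                    counts[topic] += 1
    return counts


print(split_clauses("位置很好，离地铁近。房间干净！"))
# ['位置很好', '离地铁近', '房间干净']
print(count_topic_items(["位置很好，离地铁近。房间干净！", "老板很热情"]))
```

A substring match is the simplest form of attribute-word matching; a real pipeline would more likely match against segmented words to avoid accidental hits inside longer words.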
2. The homestay customer opinion mining method based on a character-level convolutional neural network according to claim 1, characterized in that the TF-IDF formula in step 2 is as shown in formula (1):
The distribution of a feature term across different categories, and its position within a category, affect how well it discriminates between texts: when a term appears at different positions in a text document, its contribution to discrimination differs. The TF-IDF method is used to compute the weights of feature words, and the improved IDF of word w in class c_t is given by formula (2):
In formulas (1) and (2), N is the total number of text documents and T is the total number of terms; x is the number of text documents containing term t, y is the number of text documents in class c_t, and k is the number of text documents outside c_t that contain term t.
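Formulas (1) and (2) are not reproduced in this text, so the sketch below shows only the conventional TF-IDF weighting as a reference point: term frequency multiplied by log(N/x), with N and x as defined above. It does not implement the claim's improved, class- and position-aware IDF of formula (2). Documents are assumed to be already segmented into token lists.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Conventional TF-IDF (not the claim's improved IDF).

    tf(t, d) = count of t in d / length of d
    idf(t)   = log(N / x), N = number of documents,
               x = number of documents containing t
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # each document counts once per term

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


w = tf_idf([["房间", "干净", "房间"], ["房间", "吵"]])
print(w[0]["房间"])  # 0.0: the term occurs in every document
```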
3. The homestay customer opinion mining method based on a character-level convolutional neural network according to claim 1, characterized in that the topic relevance in step 2 is calculated as shown in formula (3):
$$\mathrm{Relevance}(\mathrm{term}_w \mid \mathrm{topic}_t) = \lambda \, p(w \mid t) + (1-\lambda)\,\frac{p(w \mid t)}{p(w)} \tag{3}$$
In formula (3), the relevance of a word to a topic is adjusted by the parameter λ. If λ is close to 1, words w that occur more frequently under topic t are more relevant to t; if λ is close to 0, words w that are more specific and exclusive to topic t are more relevant to it. Adjusting λ therefore changes the relevance between a domain word term_w and a topic topic_t.
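Formula (3) can be evaluated directly. The sketch follows the claim as written; note that the relevance measure popularized by LDAvis/pyLDAvis applies a logarithm to both terms, which the claim omits. The function names and the probabilities in the example are illustrative assumptions, chosen only to show how λ reorders a topic's terms.

```python
def relevance(p_w_given_t: float, p_w: float, lam: float) -> float:
    """Formula (3) as written in the claim:
    lam * p(w|t) + (1 - lam) * p(w|t) / p(w)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie in [0, 1]")
    return lam * p_w_given_t + (1.0 - lam) * p_w_given_t / p_w


def rank_terms(topic_word: dict[str, float],
               corpus_word: dict[str, float],
               lam: float) -> list[str]:
    """Rank a topic's terms by relevance, most relevant first."""
    return sorted(
        topic_word,
        key=lambda w: relevance(topic_word[w], corpus_word[w], lam),
        reverse=True,
    )


# Hypothetical p(w|t) for one topic and corpus-wide p(w).
topic = {"房间": 0.30, "海景": 0.10}
corpus = {"房间": 0.20, "海景": 0.01}
print(rank_terms(topic, corpus, 1.0))  # ['房间', '海景']: frequent terms first
print(rank_terms(topic, corpus, 0.0))  # ['海景', '房间']: exclusive terms first
```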
4. The homestay customer opinion mining method based on a character-level convolutional neural network according to claim 1, characterized in that in step 2 the value of k is determined as follows: with reference to homestay standard documents, k = 6 is first taken as the experimental baseline; k is then increased step by step to reduce the intersection between topics, and the smallest k at which the topics are observed not to overlap is taken as the number of topics and used to select the topic attribute words.
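The search for k in claim 4 can be sketched as a loop. The claim judges topic overlap visually in pyLDAvis; here, purely as an assumption, two topics are said to overlap when their top-word sets intersect. `fit_top_words` is a hypothetical callable that fits a topic model with k topics and returns each topic's top words, and the upper bound `k_max` is likewise an illustrative assumption.

```python
from itertools import combinations
from typing import Callable


def choose_num_topics(fit_top_words: Callable[[int], list[set[str]]],
                      k_start: int = 6,
                      k_max: int = 20) -> int:
    """Return the smallest k >= k_start whose topics share no top words.

    Overlap test (assumption): two topics overlap if their top-word
    sets have a non-empty intersection.
    """
    for k in range(k_start, k_max + 1):
        topics = fit_top_words(k)
        if all(not (a & b) for a, b in combinations(topics, 2)):
            return k
    raise RuntimeError("no overlap-free k found up to k_max")


# Hypothetical stand-in for a fitted topic model.
def fake_fit(k: int) -> list[set[str]]:
    if k < 8:
        return [{"房间", "干净"}, {"房间", "安静"}]
    return [{"房间", "干净"}, {"位置", "地铁"}, {"老板", "热情"}]


print(choose_num_topics(fake_fit))  # 8
```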
5. The homestay customer opinion mining method based on a character-level convolutional neural network according to claim 1, characterized in that the output probability in step 5 is calculated as shown in formula (4):
To reject fake reviews and improve the accuracy of sentiment analysis, pre-classification is used for data cleaning. The labels 0 and 1 are used during pre-classification, representing negative and positive respectively; texts with an output probability greater than 0.9 are taken as high-confidence positive texts, and texts with an output probability less than 0.1 as high-confidence negative texts.
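Formula (4) is not reproduced in this text, so the sketch below assumes a standard multinomial naive Bayes with add-one (Laplace) smoothing to produce the output probability, then applies the 0.9 and 0.1 cut-offs described above to keep only high-confidence positive (label 1) and negative (label 0) texts. The class and function names, the smoothing choice and the skipping of unseen words are assumptions; both classes must be present in the training data.

```python
import math
from collections import Counter


class NaiveBayesPresorter:
    """Multinomial naive Bayes over word counts; 1 = positive, 0 = negative."""

    def fit(self, docs: list[list[str]], labels: list[int]) -> "NaiveBayesPresorter":
        self.vocab = {w for d in docs for w in d}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in (0, 1):
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(w for d in class_docs for w in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def prob_positive(self, doc: list[str]) -> float:
        """Posterior P(label = 1 | doc) with add-one smoothing."""
        v = len(self.vocab)
        log_post = {}
        for c in (0, 1):
            s = self.log_prior[c]
            for w in doc:
                if w in self.vocab:  # words unseen in training are skipped
                    s += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            log_post[c] = s
        m = max(log_post.values())  # subtract the max for numerical stability
        e = {c: math.exp(s - m) for c, s in log_post.items()}
        return e[1] / (e[0] + e[1])


def presort(model: NaiveBayesPresorter, docs: list[list[str]],
            hi: float = 0.9, lo: float = 0.1) -> list[tuple[list[str], int]]:
    """Keep only high-confidence texts: p > hi -> 1, p < lo -> 0."""
    kept = []
    for d in docs:
        p = model.prob_positive(d)
        if p > hi:
            kept.append((d, 1))
        elif p < lo:
            kept.append((d, 0))
    return kept


train = [["干净", "舒服"], ["干净", "热情"], ["脏", "吵"], ["吵", "差"]]
model = NaiveBayesPresorter().fit(train, [1, 1, 0, 0])
print(presort(model, [["干净", "干净", "舒服"], ["吵", "吵", "脏"], ["干净", "吵"]]))
# The ambiguous third text (p = 0.5) is dropped.
```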
6. The homestay customer opinion mining method based on a character-level convolutional neural network according to claim 1, characterized in that step 6 borrows the pixel-level processing scheme used in image processing. Assuming the dictionary size is n, a character table is built, each review is vectorized using character IDs, and the result is fed into a single-layer convolutional neural network. In the input layer, an Embedding layer turns all characters of a sentence into character vectors, which are concatenated into a sentence matrix. A padding length of 200 is used, which covers 99% of text lengths; with "pre" zero-padding, texts shorter than this are filled with zeros at the front, and the character weights of the Embedding layer are set to be updated during training. A one-dimensional convolution kernel (Convolution1D) then extracts features, followed by one global max-pooling layer and two fully connected layers. The final output is the softmax probability of the positive label, taken as the sentiment polarity, and the parameters of the model are printed with the Keras neural network tool.
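The character-table and vectorization part of claim 6 can be sketched without any deep-learning dependency. Assumptions: IDs start at 1 so that 0 is reserved for padding, frequency ties are broken by character order, characters missing from the table are skipped, and over-long texts keep their last 200 characters (mirroring the default `truncating="pre"` of Keras `pad_sequences`). Function names are hypothetical.

```python
from collections import Counter


def build_char_table(texts: list[str]) -> dict[str, int]:
    """Deduplicate characters and sort them by descending frequency.

    IDs start at 1 so that 0 can serve as the padding value (assumption).
    """
    freq = Counter(ch for text in texts for ch in text)
    # Sort by frequency (descending), then by character for deterministic ties.
    ordered = sorted(freq, key=lambda ch: (-freq[ch], ch))
    return {ch: i + 1 for i, ch in enumerate(ordered)}


def vectorize(text: str, table: dict[str, int], maxlen: int = 200) -> list[int]:
    """Map characters to IDs and 'pre'-pad with zeros to maxlen.

    No word segmentation is needed: each character is its own token.
    """
    if maxlen <= 0:
        raise ValueError("maxlen must be positive")
    ids = [table[ch] for ch in text if ch in table]
    ids = ids[-maxlen:]  # keep the last maxlen IDs, like truncating="pre"
    return [0] * (maxlen - len(ids)) + ids


table = build_char_table(["好好", "好干净"])
print(table["好"])                       # 1: the most frequent character
print(vectorize("好干", table, maxlen=5))  # zeros first, then the two IDs
```

The resulting ID sequences would then feed the network described in the claim, for example Keras `Embedding`, `Conv1D`, `GlobalMaxPooling1D` and two `Dense` layers ending in softmax; that model is not shown here.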
7. The homestay customer opinion mining method based on a character-level convolutional neural network according to claim 6, characterized in that, when the character vectors of all characters of a sentence are built in step 6, each character is treated as an individual unit, so no word segmentation is performed.
CN201910117188.0A 2019-02-15 2019-02-15 People and host customer opinion mining method based on character-level convolutional neural network Active CN109829166B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910117188.0A CN109829166B (en) 2019-02-15 2019-02-15 People and host customer opinion mining method based on character-level convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910117188.0A CN109829166B (en) 2019-02-15 2019-02-15 People and host customer opinion mining method based on character-level convolutional neural network

Publications (2)

Publication Number Publication Date
CN109829166A true CN109829166A (en) 2019-05-31
CN109829166B CN109829166B (en) 2022-12-27

Family

ID=66862072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910117188.0A Active CN109829166B (en) 2019-02-15 2019-02-15 People and host customer opinion mining method based on character-level convolutional neural network

Country Status (1)

Country Link
CN (1) CN109829166B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347828A (en) * 2019-06-26 2019-10-18 西南交通大学 A kind of Metro Passenger demand dynamic acquisition method and its obtain system
CN110688451A (en) * 2019-08-15 2020-01-14 中国平安人寿保险股份有限公司 Evaluation information processing method, evaluation information processing device, computer device, and storage medium
CN110838287A (en) * 2019-10-16 2020-02-25 中国第一汽车股份有限公司 Corpus processing method and device of chat robot in vehicle-mounted environment and storage medium
CN111027553A (en) * 2019-12-23 2020-04-17 武汉唯理科技有限公司 Character recognition method for circular seal
CN111159409A (en) * 2019-12-31 2020-05-15 腾讯科技(深圳)有限公司 Text classification method, device, equipment and medium based on artificial intelligence
CN111309859A (en) * 2020-01-21 2020-06-19 上饶市中科院云计算中心大数据研究院 Scenic spot network public praise emotion analysis method and device
CN111445271A (en) * 2020-03-31 2020-07-24 携程计算机技术(上海)有限公司 Model generation method, and prediction method, system, device and medium for cheating hotel
CN112070856A (en) * 2020-09-16 2020-12-11 重庆师范大学 Limited angle C-arm CT image reconstruction method based on non-subsampled contourlet transform
CN112784776A (en) * 2021-01-26 2021-05-11 山西三友和智慧信息技术股份有限公司 BPD facial emotion recognition method based on improved residual error network
CN113778454A (en) * 2021-09-22 2021-12-10 重庆海云捷迅科技有限公司 Automatic evaluation method and system for artificial intelligence experiment platform
CN116385029A (en) * 2023-04-20 2023-07-04 深圳市天下房仓科技有限公司 Hotel bill detection method, system, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038480A (en) * 2017-05-12 2017-08-11 东华大学 A kind of text sentiment classification method based on convolutional neural networks
CN107391483A (en) * 2017-07-13 2017-11-24 武汉大学 A kind of comment on commodity data sensibility classification method based on convolutional neural networks
CN108345587A (en) * 2018-02-14 2018-07-31 广州大学 A kind of the authenticity detection method and system of comment
CN109033089A (en) * 2018-09-06 2018-12-18 北京京东尚科信息技术有限公司 Sentiment analysis method and apparatus
CN109308317A (en) * 2018-09-07 2019-02-05 浪潮软件股份有限公司 A kind of hot spot word extracting method of the non-structured text based on cluster

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KOYEL CHAKRABORTY等: "Sentiment Analysis on a Set of Movie Reviews Using Deep Learning Techniques", 《SOCIAL NETWORK ANALYTICS:COMPUTATIONAL RESEARCH METHODS AND TECHNIQUES》 *
周敬一等: "基于深度学习的中文影评情感分析", 《上海大学学报(自然科学版)》 *
秦鹏等: "基于朴素贝叶斯网页分类的用户行为推衍", 《沈阳工业大学学报》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347828A (en) * 2019-06-26 2019-10-18 西南交通大学 A kind of Metro Passenger demand dynamic acquisition method and its obtain system
CN110688451A (en) * 2019-08-15 2020-01-14 中国平安人寿保险股份有限公司 Evaluation information processing method, evaluation information processing device, computer device, and storage medium
CN110838287A (en) * 2019-10-16 2020-02-25 中国第一汽车股份有限公司 Corpus processing method and device of chat robot in vehicle-mounted environment and storage medium
CN111027553A (en) * 2019-12-23 2020-04-17 武汉唯理科技有限公司 Character recognition method for circular seal
CN111159409B (en) * 2019-12-31 2023-06-02 腾讯科技(深圳)有限公司 Text classification method, device, equipment and medium based on artificial intelligence
CN111159409A (en) * 2019-12-31 2020-05-15 腾讯科技(深圳)有限公司 Text classification method, device, equipment and medium based on artificial intelligence
CN111309859A (en) * 2020-01-21 2020-06-19 上饶市中科院云计算中心大数据研究院 Scenic spot network public praise emotion analysis method and device
CN111445271A (en) * 2020-03-31 2020-07-24 携程计算机技术(上海)有限公司 Model generation method, and prediction method, system, device and medium for cheating hotel
CN112070856A (en) * 2020-09-16 2020-12-11 重庆师范大学 Limited angle C-arm CT image reconstruction method based on non-subsampled contourlet transform
CN112784776A (en) * 2021-01-26 2021-05-11 山西三友和智慧信息技术股份有限公司 BPD facial emotion recognition method based on improved residual error network
CN112784776B (en) * 2021-01-26 2022-07-08 山西三友和智慧信息技术股份有限公司 BPD facial emotion recognition method based on improved residual error network
CN113778454A (en) * 2021-09-22 2021-12-10 重庆海云捷迅科技有限公司 Automatic evaluation method and system for artificial intelligence experiment platform
CN113778454B (en) * 2021-09-22 2024-02-20 重庆海云捷迅科技有限公司 Automatic evaluation method and system for artificial intelligent experiment platform
CN116385029A (en) * 2023-04-20 2023-07-04 深圳市天下房仓科技有限公司 Hotel bill detection method, system, electronic equipment and storage medium
CN116385029B (en) * 2023-04-20 2024-01-30 深圳市天下房仓科技有限公司 Hotel bill detection method, system, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109829166B (en) 2022-12-27

Similar Documents

Publication Publication Date Title
CN109829166A (en) People place customer input method for digging based on character level convolutional neural networks
CN106919689B (en) Professional domain knowledge mapping dynamic fixing method based on definitions blocks of knowledge
CN102929861B (en) Method and system for calculating text emotion index
CN107025299B (en) A kind of financial public sentiment cognitive method based on weighting LDA topic models
CN108108849A (en) A kind of microblog emotional Forecasting Methodology based on Weakly supervised multi-modal deep learning
CN109241255A (en) A kind of intension recognizing method based on deep learning
CN109933664A (en) A kind of fine granularity mood analysis improved method based on emotion word insertion
CN107169001A (en) A kind of textual classification model optimization method based on mass-rent feedback and Active Learning
CN101004737A (en) Individualized document processing system based on keywords
CN105468713A (en) Multi-model fused short text classification method
CN101599071A (en) The extraction method of conversation text topic
CN110674274A (en) Knowledge graph construction method for food safety regulation question-answering system
CN108280155A (en) The problem of based on short-sighted frequency, retrieves feedback method, device and its equipment
CN110134947A (en) A kind of sensibility classification method and system based on uneven multi-source data
CN103150333A (en) Opinion leader identification method in microblog media
CN1687924A (en) Method for producing internet personage information search engine
CN108388554A (en) Text emotion identifying system based on collaborative filtering attention mechanism
CN110990670B (en) Growth incentive book recommendation method and recommendation system
CN107145514A (en) Chinese sentence pattern sorting technique based on decision tree and SVM mixed models
CN109492105A (en) A kind of text sentiment classification method based on multiple features integrated study
CN106777193A (en) A kind of method for writing specific contribution automatically
CN110633367A (en) Seven-emotion classification method based on emotion dictionary and microblog text data
CN110059190A (en) A kind of user's real-time point of view detection method based on social media content and structure
CN113934835B (en) Retrieval type reply dialogue method and system combining keywords and semantic understanding representation
CN109582783A (en) Hot topic detection method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant