CN110413780A - Text emotion analysis method, device, storage medium and electronic equipment - Google Patents

Text emotion analysis method, device, storage medium and electronic equipment

Info

Publication number
CN110413780A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910639049.4A
Other languages
Chinese (zh)
Other versions
CN110413780B (en)
Inventor
周谧
张志�
贺洋
朱珊珊
胡梦
杨爱峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Hefei Polytechnic University
Original Assignee
Hefei Polytechnic University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Polytechnic University
Priority to CN201910639049.4A
Publication of CN110413780A
Application granted
Publication of CN110413780B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0623 Item investigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Computational Linguistics (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a text emotion analysis method, device, storage medium, and electronic equipment. It relates to the technical field of data mining and addresses the technical problem that emotion type analysis of product comment texts using the prior art yields inaccurate results. The method comprises: obtaining a text to be analyzed, the text being a user's text comment information on a product; classifying each word segment of the text to be analyzed based on a target feature dictionary to obtain a feature comment set, where the target feature dictionary comprises multiple groups of specific feature sets, obtained by agglomerative hierarchical clustering, that describe different attributes of the product; and inputting the feature comment set into a classification model for discriminating the emotion type of comments, to obtain the emotion type information of the text to be analyzed output by the classification model.

Description

Text emotion analysis method, device, storage medium and electronic equipment
Technical field
The present invention relates to the technical field of data mining, and in particular to a text emotion analysis method, device, storage medium, and electronic equipment.
Background
With the rapid development of Internet applications, not only have the business processes of enterprises changed dramatically, but consumers also find it easier to express opinions or comments on enterprise products. Comment mining, an important research direction in unstructured information mining, mainly concerns analyzing the sentiment orientation of online comments. Based on massive online comments, an enterprise can identify product development needs, investigate consumers' opinions and attitudes toward a product, and understand the product's strengths and weaknesses, which helps guide the enterprise in improving its products and service quality. In the prior art, emotion analysis of product text comments uses a small number of comment samples, and the feature dictionary is usually constructed by subjective manual specification. The resulting analysis cannot objectively reflect consumers' attitudes toward the product, so the accuracy and credibility of the emotion type analysis results for product comment texts are low.
Summary of the invention
In view of the deficiencies of the prior art, the present invention provides a text emotion analysis method, device, storage medium, and electronic equipment, which solve the technical problem that emotion type analysis of product comment texts using the prior art yields inaccurate results.
To achieve the above object, the present invention is realized by the following technical solutions:
A first aspect of the present invention provides a text emotion analysis method, the method comprising:
obtaining a text to be analyzed, the text to be analyzed being a user's text comment information on a product;
classifying each word segment of the text to be analyzed based on a target feature dictionary to obtain a feature comment set, the target feature dictionary comprising multiple groups of specific feature sets, obtained by agglomerative hierarchical clustering, that describe different attributes of the product; and
inputting the feature comment set into a classification model for discriminating the emotion type of comments, to obtain the emotion type information of the text to be analyzed output by the classification model.
Optionally, the target feature dictionary is constructed as follows:
randomly obtaining multiple pieces of text comment information;
pre-processing the text comment information to obtain a candidate feature set, the candidate feature set being the set of feature words whose degree of association with the product topic is greater than an association threshold;
converting each feature word of the candidate feature set into a corresponding multidimensional word vector;
clustering the multidimensional word vectors according to a preset clustering condition to obtain a feature cluster set;
determining, according to the feature cluster set, multiple groups of general feature sets describing different attributes of the product, each general feature set comprising a preset number of general feature words describing a single attribute; and
constructing the target feature dictionary according to the similarity between each multidimensional word vector of the candidate feature set and each general feature word of the general feature sets.
Optionally, the classification model is constructed as follows:
labeling the emotion type of each cluster in the feature cluster set to obtain an emotion type label for each cluster; and
training a classification model on the feature cluster set with emotion type labels, to obtain a target classification model for discriminating the emotion type of comments.
Optionally, training the classification model on the feature cluster set with emotion type labels, to obtain the target classification model for discriminating the emotion type of comments, comprises:
training multiple base classifiers, using each cluster in the feature cluster set as input training sample data and the emotion type label of each cluster in the training set as output training sample data;
training the multiple base classifiers according to the AdaBoost algorithm to obtain multiple intermediate classifiers, wherein the weight of each base classifier in an intermediate classifier is adaptively adjustable; and
integrating the multiple intermediate classifiers based on evidential reasoning to obtain the target classification model, wherein the weight of each intermediate classifier in the target classification model is adaptively adjustable.
Optionally, inputting the feature comment set into the classification model for discriminating the emotion type of comments, to obtain the emotion type information of the text to be analyzed output by the classification model, comprises:
determining the weight distribution of each intermediate classifier in the target classification model; and
inputting the feature comment set into the target classification model, which integrates the multiple intermediate classifiers according to the weight distribution, and outputting the emotion type information of the text to be analyzed.
Optionally, the method further comprises:
updating the target feature dictionary according to the feature comment set.
A second aspect of the present invention provides a text emotion analysis device, the device comprising:
an acquisition module, configured to obtain a text to be analyzed, the text to be analyzed being a user's text comment information on a product;
a classification module, configured to classify each word segment of the text to be analyzed based on a target feature dictionary to obtain a feature comment set, the target feature dictionary comprising multiple groups of specific feature sets, obtained by agglomerative hierarchical clustering, that describe different attributes of the product; and
a determining module, configured to input the feature comment set into a classification model for discriminating the emotion type of comments, to obtain the emotion type information of the text to be analyzed output by the classification model.
Optionally, the device further comprises:
an update module, configured to update the target feature dictionary according to the feature comment set.
A third aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the text emotion analysis method provided by the first aspect of the present invention.
A fourth aspect of the present invention provides electronic equipment, comprising:
a memory storing a computer program; and
a processor configured to execute the computer program in the memory, to implement the steps of the text emotion analysis method provided by the first aspect of the present invention.
The present invention provides a text emotion analysis method, device, storage medium, and electronic equipment. Compared with the prior art, the invention has the following advantages:
Constructing the target feature dictionary by processing feature data with agglomerative hierarchical clustering reduces the limitations of a manually defined feature dictionary and improves the typicality and representativeness of the target feature dictionary. Consequently, the feature comment set obtained by classifying the word segments of the text to be analyzed based on this target feature dictionary has higher feature semantic similarity in each dimension. Inputting this feature comment set into the classification model achieves a better emotion type classification effect and improves the accuracy and reliability of the emotion type analysis results for product comment texts.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a text emotion analysis method according to an exemplary embodiment;
Fig. 2 is a flowchart of a target feature dictionary construction method according to an exemplary embodiment;
Fig. 3 is a flowchart of a classification model construction method according to an exemplary embodiment;
Fig. 4 is a block diagram of a text emotion analysis device according to an exemplary embodiment;
Fig. 5 is a block diagram of electronic equipment provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
By providing a text emotion analysis method, the embodiments of the present application solve the technical problem that emotion type analysis of product comment texts using the prior art yields inaccurate results, and improve the accuracy and credibility of the emotion type analysis results for comment texts. The technical solutions provided by the present application are described in detail below with reference to the drawings and specific embodiments.
Embodiment 1:
Refer to Fig. 1, which is a flowchart of a text emotion analysis method according to an exemplary embodiment. As shown in Fig. 1, the text emotion analysis method comprises the following steps:
Step S11, obtaining the text to be analyzed.
With the development of online business models, more and more consumers use the Internet to browse goods, discuss how products are used, and post evaluations of products. This makes it possible to obtain consumer demands from Internet data and to optimize product development requirements. For example, text comment information related to a product can be crawled from forums or e-commerce websites using web crawler technology; this text comment information is the text to be analyzed.
Step S12, classified to obtain feature critiques collection based on each participle of the target signature dictionary to text to be analyzed.
Wherein, target signature dictionary, which can be, is not belonged to by what Agglomerative Hierarchical Clustering method was handled for describing product The multiple groups specific feature set of property, every group of specific feature set includes the multiple correlated characteristic words closest with particular community feature. Word segmentation processing is carried out for text to be analyzed, and based on target signature dictionary to each participle obtained after text to be analyzed participle The method and step classified specifically: correspond multiple groups specific feature set and establish feature critiques subset, traverse text to be analyzed This each participle, successively judges whether each participle appears in multiple groups special characteristic concentration, if so, by text addition pair In the feature critiques subset answered, is completed until text comments information all in text to be analyzed is classified, obtain multiple groups feature and comment The feature critiques collection constituted by subset.For example, target signature dictionary SD includes m group specific feature set SD={ sd1,sd2,…sdk, K=1,2 ... m }, corresponding every group of specific feature set establishes m feature critiques subset { f1,f2,…fk, k=1,2 ... m }, traversal Each participle of text to be analyzed, if participle A appears in specific feature set sd3In, then it will add comprising the text to be sorted for segmenting A Enter corresponding feature critiques subset f3In, it is analysed to text comments information all in text in this way and classifies, Obtain feature critiques collection fea_comments={ f1,f2,…fk, k=1,2 ... m }.
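The traversal described in step S12 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `classify_comments`, the representation of a comment as a `(text, tokens)` pair, and the representation of the target feature dictionary as a list of sets are all assumptions made for the example, and word segmentation is assumed to have been done beforehand.

```python
def classify_comments(comments, target_dictionary):
    """Assign each comment to the feature comment subsets it matches.

    comments: list of (text, tokens) pairs (hypothetical input format).
    target_dictionary: list of sets of feature words, one set per attribute
    (sd_1 ... sd_m in the description).
    Returns a list of m lists of comment texts (f_1 ... f_m).
    """
    subsets = [[] for _ in target_dictionary]
    for text, tokens in comments:
        token_set = set(tokens)
        for k, feature_set in enumerate(target_dictionary):
            # A comment joins subset k if any of its tokens is in sd_k.
            if token_set & feature_set:
                subsets[k].append(text)
    return subsets


target_dictionary = [{"screen", "display"}, {"battery"}]
comments = [
    ("The screen is sharp", ["screen", "sharp"]),
    ("Battery drains fast", ["battery", "drains", "fast"]),
    ("Great screen, weak battery", ["screen", "battery"]),
    ("Arrived late", ["arrived", "late"]),
]
fea_comments = classify_comments(comments, target_dictionary)
```

In this sketch a comment that mentions several attributes lands in several subsets, and a comment that matches no feature word is left out of the feature comment set.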
Step S13, inputting the feature comment set into the classification model for discriminating the emotion type of comments, to obtain the emotion type information of the text to be analyzed output by the classification model.
Illustratively, the feature comment set fea_comments = $\{f_k \mid k = 1, 2, \ldots, m\}$ obtained in step S12 is input into the classification model for discriminating the emotion type of comments, which yields the emotion type information of the m feature comment subsets. The emotion types include positive emotion, neutral emotion, and negative emotion. The distribution of consumers' emotions toward the product can then be analyzed further.
With the above method, constructing the target feature dictionary by processing feature data with agglomerative hierarchical clustering reduces the limitations of a manually defined feature dictionary and improves the typicality and representativeness of the target feature dictionary. Consequently, the feature comment set obtained by classifying the word segments of the text to be analyzed based on this target feature dictionary has higher feature semantic similarity in each dimension. Inputting this feature comment set into the classification model achieves a better emotion type classification effect and improves the accuracy and reliability of the emotion type analysis results for product comment texts.
Embodiment 2:
Refer to Fig. 2, which is a flowchart of a target feature dictionary construction method according to an exemplary embodiment. The method can be applied to the text emotion analysis method provided by Embodiment 1, where the dictionary is used to classify each word segment of the text to be analyzed to obtain the feature comment set. As shown in Fig. 2, the target feature dictionary construction method comprises the following steps:
Step S21, randomly obtaining multiple pieces of text comment information.
A sufficient amount of text comment information related to the product is crawled from network platforms such as forums or e-commerce websites. Preferably, the Internet data sources can be chosen according to the optimization direction of the enterprise's product development project.
Step S22, pre-processing the text comment information to obtain a candidate feature set.
The candidate feature set is the set of feature words whose degree of association with the product topic is greater than an association threshold. Pre-processing the content of the text comment information to obtain the candidate feature set includes data cleaning operations such as filtering short texts, segmenting the comment content into words, removing stop words, filtering irrelevant characters, and tagging the part of speech of each word segment.
Specifically, text comment information whose word count is less than a preset word count threshold N is first excluded from the sample. The remaining text comment information is then segmented into words, and stop words are removed to obtain a comment text set. A part-of-speech tagging tool is used to tag the part of speech of each word segment in the comment text set; for example, common nouns, proper nouns, and abbreviations are extracted from the segmented text and stored in a noun feature set n_features. The frequency of each word in n_features is counted, and word segments whose frequency is lower than an integer M are filtered out, to reduce low-frequency nouns caused by unstable segmentation results and low-frequency nouns of low representativeness. The nouns remaining after this filtering are all stored in a coarse feature set rn_features. Further, to ensure that the nouns in rn_features are highly correlated with the product feature, the degree of association between each noun in rn_features and the product feature can be calculated based on pointwise mutual information (PMI). For example, the pointwise mutual information Web_PMI can be calculated by the following formula:
Here hit(feature) and hit(product) denote the occurrence probabilities returned by searching a search engine for the word feature and for product, respectively, and the joint hit term for feature and product denotes the probability that the two occur together in the results returned by the search engine.
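The formula itself is not reproduced in this text. A form consistent with the definitions above, assuming the standard definition of pointwise mutual information, a base-2 logarithm, and the notation hit(feature, product) for the joint term (the base and the notation are assumptions, not taken from the source), would be:

```latex
\mathrm{Web\_PMI}(\mathit{feature}, \mathit{product})
  = \log_2 \frac{\mathrm{hit}(\mathit{feature}, \mathit{product})}
                {\mathrm{hit}(\mathit{feature}) \, \mathrm{hit}(\mathit{product})}
```

Under this form, a positive value means the noun and the product co-occur more often than they would if they were independent.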
Nouns in the coarse feature set rn_features whose Web_PMI value with the product feature is greater than a threshold α are stored in the candidate feature set an_features. This filters out nouns with a low degree of association with the product topic and improves the typicality and representativeness of the feature dictionary.
Note that the preset word count threshold N, the integer M, and the threshold α can be adjusted to suit the actual needs of the project and are not specifically limited here.
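The three filters of step S22 (length threshold N, frequency threshold M, association threshold α) can be sketched as below. This is an illustrative sketch under stated assumptions: segmentation and part-of-speech tagging are assumed to have been done already, the function names and the `hits` mapping are hypothetical, and `web_pmi` uses the standard PMI form with a base-2 logarithm because the source formula is not shown.

```python
import math
from collections import Counter


def filter_short(token_lists, n_min):
    """Drop comments with fewer than n_min tokens (threshold N)."""
    return [tokens for tokens in token_lists if len(tokens) >= n_min]


def frequent_nouns(noun_lists, m_min):
    """Keep nouns whose frequency is at least m_min (integer M)."""
    counts = Counter(noun for nouns in noun_lists for noun in nouns)
    return {noun for noun, count in counts.items() if count >= m_min}


def web_pmi(p_feature, p_product, p_joint):
    """Standard PMI with a base-2 log (assumed form of Web_PMI)."""
    if p_feature <= 0 or p_product <= 0 or p_joint <= 0:
        return float("-inf")
    return math.log2(p_joint / (p_feature * p_product))


def candidate_features(rn_features, hits, p_product, alpha):
    """Keep nouns whose Web_PMI with the product exceeds alpha.

    hits maps each noun to (p_feature, p_joint); this layout is hypothetical.
    """
    return {
        noun
        for noun in rn_features
        if web_pmi(hits[noun][0], p_product, hits[noun][1]) > alpha
    }


noun_lists = [["screen", "battery"], ["screen"], ["box"]]
rn_features = frequent_nouns(noun_lists, m_min=2)
hits = {"screen": (0.1, 0.04), "box": (0.1, 0.01)}
an_features = candidate_features({"screen", "box"}, hits, p_product=0.2, alpha=0.5)
```

How the occurrence probabilities are obtained from a search engine is left open here, as it is in the description.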
Step S23, converting each feature word of the candidate feature set into a corresponding multidimensional word vector.
Illustratively, all of the text comment information can be used as the training corpus for a Word2vec (word to vector) neural network model. The trained model then converts each feature word in the candidate feature set an_features into a multidimensional word vector. Each dimension of a word vector carries a feature with some semantic and grammatical interpretation, and word vectors can be used to compute the similarity between words.
Step S24, clustering the multidimensional word vectors according to a preset clustering condition to obtain a feature cluster set.
In this embodiment, the multidimensional word vectors can be clustered using agglomerative hierarchical clustering: each multidimensional word vector is taken as an initial cluster, and these initial clusters are merged according to a given criterion until a preset termination condition is reached. For example, to cluster a set of m multidimensional word vectors $D = \{d_1, d_2, \ldots, d_m\}$, the number of initial clusters equals the number of word vectors, giving $C = \{c_1, c_2, \ldots, c_m\}$. Pairs of clusters whose distance is less than a preset distance threshold are merged, and the clustering process ends when the number of clusters reaches a preset value q, yielding the feature cluster set $C = \{c_1, c_2, \ldots, c_q\}$. Specifically, the merge operation on two clusters proceeds as follows: compute the center point of each cluster; represent each cluster by its center point and compute the Euclidean distance between every pair of clusters; merge the two clusters whose distance is less than the preset distance threshold; and repeat these steps until the number of clusters reaches the preset value, giving the feature cluster set after clustering.
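The merge loop of step S24 can be sketched in plain Python as follows. The sketch assumes one particular reading of the description: at each iteration the closest pair of clusters (by Euclidean distance between center points) is merged, provided that distance is below the threshold. The function names and the default `max_dist` are illustrative only.

```python
import math


def centroid(cluster):
    """Center point of a cluster of equal-length vectors."""
    dim = len(cluster[0])
    return [sum(vec[i] for vec in cluster) / len(cluster) for i in range(dim)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerate(vectors, q, max_dist=float("inf")):
    """Merge clusters bottom-up until q clusters remain.

    Stops early if the closest pair is not closer than max_dist
    (the preset distance threshold in the description).
    """
    clusters = [[vec] for vec in vectors]  # each vector starts as its own cluster
    while len(clusters) > q:
        centers = [centroid(c) for c in clusters]
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = euclidean(centers[i], centers[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        if best is None or best[0] >= max_dist:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
clusters = agglomerate(vectors, q=2)
```

This brute-force search recomputes all pairwise distances on every pass, which is adequate for illustration but slow for large vocabularies.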
Step S25, determining, according to the feature cluster set, multiple groups of general feature sets describing different attributes of the product, each general feature set comprising a preset number of general feature words describing a single attribute.
In one possible implementation, a general feature dictionary is constructed from the feature cluster set before the target feature dictionary is constructed. Specifically, from the feature cluster set obtained in step S24, a certain number of word segments that are semantically closest to a given product attribute are selected as the general feature words of that attribute; these feature words constitute the general feature set of the attribute. Repeating this for each attribute yields multiple groups of general feature sets for the product's attributes, which together constitute the general feature dictionary.
Step S26, constructing the target feature dictionary according to the similarity between each multidimensional word vector of the candidate feature set and each general feature word of the general feature sets.
The target feature dictionary contains more related feature words for each product attribute. Illustratively, for the m general feature dictionaries $CD = \{cd_k \mid k = 1, 2, \ldots, m\}$ constructed in step S25, m empty dictionaries $SD = \{sd_k \mid k = 1, 2, \ldots, m\}$ are established correspondingly. Each general feature word in general feature dictionary $cd_k$ is traversed, and the similarity between the multidimensional word vector of the general feature word and that of each feature word in the candidate feature set an_features is computed. For example, the similarity Sim(X, Y) between each multidimensional word vector of the candidate feature set and each general feature word of the general feature sets can be calculated by the following formula:
where X and Y denote the two multidimensional word vectors, and n is the dimension of the word vectors.
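The formula itself is not reproduced in this text. One common choice that matches the stated symbols, assumed here rather than taken from the source, is cosine similarity over the n components $x_i$ and $y_i$ of X and Y:

```latex
\mathrm{Sim}(X, Y)
  = \frac{\sum_{i=1}^{n} x_i \, y_i}
         {\sqrt{\sum_{i=1}^{n} x_i^{2}} \; \sqrt{\sum_{i=1}^{n} y_i^{2}}}
```

With this choice the similarity lies in [-1, 1], and values close to 1 indicate word vectors pointing in nearly the same direction.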
If the similarity between the two is greater than a similarity threshold β, the multidimensional word vector is added to the empty dictionary $sd_k$ corresponding to general feature dictionary $cd_k$, forming the specific feature set for that attribute. All multidimensional word vectors are classified into the m empty dictionaries SD in this way. If the number of multidimensional word vectors whose similarity to the general feature words in $cd_k$ is greater than the threshold β is less than a minimum required number x, the multidimensional word vectors can be sorted in descending order of similarity and the top x added to the corresponding empty dictionary. In this way, the multiple groups of specific feature sets SD describing different attributes of the product, i.e., the target feature dictionary, are constructed.
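The threshold-with-fallback rule of step S26 can be sketched as follows. Several choices here are assumptions made for the example: cosine similarity stands in for Sim(X, Y), a candidate word is scored against an attribute by its best similarity to any general feature word of that attribute, and the names `build_target_dictionary`, `general_sets`, and `candidates` are hypothetical.

```python
import math


def cosine(x, y):
    """Cosine similarity of two equal-length vectors (assumed form of Sim)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y) if norm_x and norm_y else 0.0


def build_target_dictionary(general_sets, candidates, beta, x_min):
    """Build one specific feature set per general feature set.

    general_sets: list of non-empty lists of general feature word vectors.
    candidates: dict mapping candidate feature word -> word vector.
    beta: similarity threshold; x_min: minimum number of words per set.
    """
    target = []
    for general in general_sets:
        # Score each candidate by its best match among the general feature words.
        scores = {
            word: max(cosine(vec, g) for g in general)
            for word, vec in candidates.items()
        }
        chosen = [word for word, score in scores.items() if score > beta]
        if len(chosen) < x_min:
            # Fallback: take the x_min most similar candidates.
            ranked = sorted(scores, key=scores.get, reverse=True)
            chosen = ranked[:x_min]
        target.append(set(chosen))
    return target


general_sets = [[[1.0, 0.0]], [[0.0, 1.0]]]
candidates = {
    "screen": [0.9, 0.1],
    "display": [0.8, 0.3],
    "battery": [0.1, 0.9],
}
sd = build_target_dictionary(general_sets, candidates, beta=0.9, x_min=1)
sd_strict = build_target_dictionary(general_sets, candidates, beta=0.999, x_min=1)
```

The second call shows the fallback: no candidate exceeds the stricter threshold, so each set is filled with its single most similar word.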
During text emotion analysis, after the text to be analyzed is obtained, each word segment of the text is classified based on the target feature dictionary constructed by the method of this embodiment to obtain the feature comment set. The feature comment set is then input into the classification model for discriminating the emotion type of comments, which outputs the emotion type information of the text to be analyzed.
Optionally, the feature comment set of the text to be analyzed can also be used to update the target feature dictionary, to cope with the continually changing language of the Internet. This avoids misclassification of feature words caused by an outdated feature dictionary lacking new vocabulary, which would ultimately affect the accuracy of the classification model's output.
With the above method, a multidimensional attribute feature dictionary can be constructed for an enterprise's products by extracting features from massive data, enabling multi-faceted performance evaluation of a product. Note that in this embodiment the feature words of the text comment information are represented as feature vectors carrying semantic and grammatical information, which is suitable for processing large volumes of text. In addition, the multidimensional word vectors are clustered level by level using agglomerative hierarchical clustering, so that the feature words in the resulting general feature sets and target feature dictionary are more closely associated with one another. This avoids the subjective, one-sided influence on analysis results that a researcher's knowledge structure and level bring in existing research. Inputting the feature comment set obtained from the target feature dictionary into the classification model achieves a better emotion type classification effect and improves the accuracy and reliability of the emotion type analysis results for product comment texts.
Embodiment 3:
Refer to Fig. 3, which is a flowchart of a classification model construction method according to an exemplary embodiment. The method can be applied to the text emotion analysis method provided by Embodiment 1 or Embodiment 2, where the model is used to discriminate the emotion type of text comment information according to the input feature comment set. As shown in Fig. 3, the classification model construction method comprises the following steps:
Step S31, labeling the emotion type of each cluster in the feature cluster set to obtain an emotion type label for each cluster;
Step S32, training a classification model on the feature cluster set with emotion type labels, to obtain a target classification model for discriminating the emotion type of comments.
Specifically, training the classification model on the feature cluster set with emotion type labels, to obtain the target classification model for discriminating the emotion type of comments, comprises: training multiple base classifiers, using each cluster in the feature cluster set as input training sample data and the emotion type label of each cluster in the training set as output training sample data; training the multiple base classifiers according to the AdaBoost algorithm to obtain multiple intermediate classifiers, where the weight of each base classifier in an intermediate classifier is adaptively adjustable; and then integrating the multiple intermediate classifiers based on evidential reasoning to obtain the target classification model, where the weight of each intermediate classifier in the target classification model is adaptively adjustable.
AdaBoost algorithm is further improved in the present embodiment, is specifically theed improvement is that: being calculated based on K-means Method clusters text comments information to obtain the feature clustering gathering including k gathering, to the feelings of each feature clustering gathering Sense type information is labeled, and obtains the affective style label of every cluster, and the above k feature including affective style label is gathered Class gathering is as training dataset T={ T1,T2,…TkTo Naive Bayes Classifier, logistic regression classifier, supporting vector The model parameter of machine classifier is trained test respectively, obtains k Naive Bayes Classifier, k logistic regression classifier And k support vector machine classifier, and determine that the highest classifier of accuracy rate is as basic classifier in each classifier. Three obtained fundamental classifier is constructed to the emotional semantic classification integrated model of basic classification linear combination according to AdaBoost algorithm, Middle classification device i.e. in the present embodiment, wherein the weight of each fundamental classifier in middle classification device can automatic adjusument.
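The selection of one base classifier per classifier type can be sketched as below. The sketch assumes the trained models are available as plain prediction functions and that a labeled test set exists; the names `accuracy` and `select_base_classifiers` and the toy lambdas are hypothetical stand-ins, not part of the patent.

```python
def accuracy(predict, samples):
    """Fraction of (features, label) pairs that `predict` labels correctly."""
    hits = sum(1 for features, label in samples if predict(features) == label)
    return hits / len(samples)


def select_base_classifiers(candidates, test_samples):
    """Keep the most accurate trained model of each classifier type.

    candidates: {type name: [predict functions, e.g. one per cluster T_i]}
    Returns {type name: (best predict function, its accuracy)}.
    """
    selected = {}
    for kind, predictors in candidates.items():
        scored = [(accuracy(p, test_samples), p) for p in predictors]
        best_acc, best_predict = max(scored, key=lambda pair: pair[0])
        selected[kind] = (best_predict, best_acc)
    return selected


# Toy stand-ins for trained models: each maps a single feature score to a label.
test_samples = [(0.9, "positive"), (0.2, "negative"),
                (0.6, "positive"), (0.4, "negative")]
candidates = {
    "naive_bayes": [lambda x: "positive" if x > 0.5 else "negative",
                    lambda x: "positive"],
    "logistic_regression": [lambda x: "positive" if x > 0.8 else "negative",
                            lambda x: "positive" if x > 0.3 else "negative"],
}
selected = select_base_classifiers(candidates, test_samples)
```

In practice the candidates would be the k trained models of each type, and the selected models would then be passed to the boosting step.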
Specifically, the AdaBoost algorithm is an iterative process that adaptively changes the distribution of the training samples. First, the weight of each sample in the training data set is initialized. A base classifier is then trained on the weighted training sample data, and the sample weights are adjusted according to the base classifier's classification error rate on the training sample data: for example, the weights of correctly classified samples are reduced and the weights of misclassified samples are increased. The sample weights of the training data are updated according to a preset rule, and training of base classifiers continues on the re-weighted training sample data, yielding a group of intermediate classifiers. A linear combination of the intermediate classifiers can then be constructed to obtain the target classification model.
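The re-weighting loop described above can be illustrated with a minimal sketch of standard binary AdaBoost. It assumes one-dimensional inputs, labels in {+1, -1}, and threshold "stumps" as weak learners; the patent instead uses naive Bayes, logistic regression, and support vector machine classifiers, and its preset update rule is not specified, so the exponential update below is the textbook rule rather than the patent's.

```python
import math


def train_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best threshold rule."""
    best = None
    for thr in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= thr else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, polarity)
    return best


def adaboost(xs, ys, rounds=3):
    """Train `rounds` stumps, re-weighting the samples after each round."""
    n = len(xs)
    weights = [1.0 / n] * n  # initialize with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        err, thr, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this weak learner
        ensemble.append((alpha, thr, polarity))
        # Shrink weights of correctly classified samples, grow the others.
        weights = [w * math.exp(-alpha * y * (polarity if x >= thr else -polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * (polarity if x >= thr else -polarity)
                for alpha, thr, polarity in ensemble)
    return 1 if score >= 0 else -1


xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
```

No single stump separates this toy data, but the weighted vote of three stumps classifies every training point correctly.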
In one possible implementation, the weight of each intermediate classifier in the target classification model can be calculated as follows: compute the cosine similarity between each sample to be analyzed in the feature comment set and the centroid of each of the k clusters; then, from these cosine similarities, compute the normalized weight of each intermediate classifier, matched to its cluster according to a preset rule. Notably, to emphasize the weight of the intermediate classifier most similar to the sample to be analyzed, the weight normalization is modified so that the more similar the sample is to a cluster centroid, the larger the weight of the intermediate classifier corresponding to that cluster.
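The similarity-based weighting can be sketched as follows. The patent does not state its preset rule or how the normalization is modified, so the `sharpness` exponent used here to emphasize the most similar cluster is purely an assumption, as are the function names and the toy centroids.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classifier_weights(sample, centroids, sharpness=2.0):
    """Normalized weight per intermediate classifier (one per cluster centroid).

    Negative similarities are clipped to zero; raising to `sharpness` > 1
    (an assumed choice) gives the closest cluster a larger share.
    """
    sims = [max(cosine_similarity(sample, c), 0.0) for c in centroids]
    boosted = [s ** sharpness for s in sims]
    total = sum(boosted)
    if total == 0:
        return [1.0 / len(centroids)] * len(centroids)  # fall back to uniform
    return [b / total for b in boosted]


# Hypothetical centroids of k = 3 clusters and one sample vector.
centroids = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = classifier_weights([0.9, 0.1], centroids)
```

The sample is closest in direction to the first centroid, so the first intermediate classifier receives the largest weight.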
In addition, this embodiment uses an evidential reasoning algorithm to fuse the results of the trained intermediate classifiers. The prediction probability that each intermediate classifier assigns to the feature comment set is taken as the initial belief degree; the weight of each intermediate classifier is computed from the similarity between the feature comment set and the feature cluster set in the training sample data; and the accuracy of each intermediate classifier is taken as its reliability. The evidential reasoning rule is then applied to fuse the results and compute the sentiment score. Illustratively, the fusion rule of the evidential reasoning method is defined as follows:
Assume a frame of discernment Θ = {θ_1, θ_2, …, θ_n} and two pieces of evidence e_1 and e_2, with each base classifier regarded as a piece of evidence. Each piece of evidence has a corresponding basic probability assignment; the corresponding evidence weights are w_1 and w_2, and the corresponding evidence reliabilities are r_1 and r_2, respectively. This gives: [formula not reproduced in this text]
Here, P(Θ) is the power set of the frame of discernment; β_{θ,i} denotes the belief degree with which the i-th piece of evidence supports the assessed object being at grade θ, and can be understood as the probability that the i-th base classifier outputs class θ; the hybrid weight [formula not reproduced in this text] combines the evidence weight and the evidence reliability. The two pieces of evidence are then combined as follows: [formula not reproduced in this text]
The above rule satisfies two conditions [formulas not reproduced in this text].
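The formulas referred to above are missing from this text. For orientation only, the evidential reasoning rule as it is commonly stated in the literature can be written with the symbols defined above. This is an assumption about the general form, not a reproduction of the patent's formulas; the symbols \tilde{w}_i, \tilde{m}, m_{\theta,i}, \hat{m} and \beta_{\theta,e(2)} are notational choices made here.

```latex
% Hybrid weight and weighted belief distribution with reliability
\tilde{w}_i = \frac{w_i}{1 + w_i - r_i}, \qquad
\tilde{m}_{\theta,i} =
\begin{cases}
0, & \theta = \emptyset \\
\tilde{w}_i\,\beta_{\theta,i}, & \theta \subseteq \Theta,\ \theta \neq \emptyset \\
1 - \tilde{w}_i, & \theta = P(\Theta)
\end{cases}

% Combination of two pieces of evidence, with m_{\theta,i} = w_i \beta_{\theta,i}
\hat{m}_{\theta,e(2)} =
\bigl[(1 - r_2)\,m_{\theta,1} + (1 - r_1)\,m_{\theta,2}\bigr]
+ \sum_{B \cap C = \theta} m_{B,1}\,m_{C,2}

\beta_{\theta,e(2)} =
\begin{cases}
0, & \theta = \emptyset \\
\dfrac{\hat{m}_{\theta,e(2)}}{\sum_{D \subseteq \Theta} \hat{m}_{D,e(2)}},
  & \theta \subseteq \Theta,\ \theta \neq \emptyset
\end{cases}

% Properties of the combined belief degrees
0 \le \beta_{\theta,e(2)} \le 1, \qquad
\sum_{\theta \subseteq \Theta} \beta_{\theta,e(2)} = 1
```

In this form, a piece of evidence with low reliability r_i leaves more residual support on P(Θ), so it constrains the combined result less.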
Based on the above evidential reasoning formulas, when the evidence is extended to n pieces, that is, when n base classifiers serve as evidence e_1, e_2, …, e_n, the fusion result of the n pieces of evidence can be obtained by applying the combination iteratively. The final fusion result gives the probability of each sentiment type, and the sentiment type with the highest probability is taken as the sentiment orientation of the text to be analyzed.
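The iterative fusion of n pieces of evidence can be sketched as below. The sketch follows the recursive form of the evidential reasoning rule as it is commonly stated in the literature; the patent's own formulas are not reproduced in this text and may differ. It further assumes that every classifier outputs probabilities over single sentiment types only (no belief assigned to subsets), and the function name `er_fuse` and the sample numbers are hypothetical.

```python
def er_fuse(beliefs, weights, reliabilities):
    """Fuse n probability distributions over the same sentiment types.

    beliefs[i][k]: probability that classifier i assigns to sentiment type k.
    weights[i], reliabilities[i]: weight w_i and reliability r_i of classifier i.
    Returns the fused probability of each sentiment type.
    """
    n_classes = len(beliefs[0])
    # First piece of evidence: weighted masses plus residual support on P(Theta).
    masses = [weights[0] * b for b in beliefs[0]]
    residual = 1.0 - reliabilities[0]
    for dist, w, r in zip(beliefs[1:], weights[1:], reliabilities[1:]):
        new = [w * b for b in dist]
        combined = [(1.0 - r) * masses[k] + residual * new[k] + masses[k] * new[k]
                    for k in range(n_classes)]
        combined_residual = (1.0 - r) * residual
        total = sum(combined) + combined_residual
        if total == 0:
            raise ValueError("evidence is fully reliable and fully conflicting")
        masses = [c / total for c in combined]
        residual = combined_residual / total
    total = sum(masses)
    if total == 0:
        raise ValueError("no support for any sentiment type")
    return [m / total for m in masses]


# Two classifiers, two sentiment types (for example positive and negative).
fused = er_fuse(beliefs=[[0.7, 0.3], [0.6, 0.4]],
                weights=[0.5, 0.5],
                reliabilities=[0.8, 0.9])
predicted_type = max(range(len(fused)), key=lambda k: fused[k])
```

Both classifiers lean toward the first sentiment type, so the fused distribution does as well, and `predicted_type` selects it as the sentiment orientation.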
During text sentiment analysis, the text to be analyzed is obtained, and each segmented word of the text is classified based on the target feature dictionary constructed by the method of Embodiment 2 to obtain the feature comment set. The weight distribution of the intermediate classifiers in the target classification model is determined from the similarity between the feature comment set and each cluster and from the reliability of each intermediate classifier. The feature comment set is then input into the target classification model for determining comment sentiment type, constructed by the method of this embodiment, which outputs the sentiment type information of the text to be analyzed.
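The first step of this flow, classifying the segmented words of a comment against the target feature dictionary, can be sketched as follows. The layout of the dictionary (attribute name mapped to a set of feature words), the exact-match lookup, the whitespace tokenization, and the function name are all assumptions made for illustration; a real system would use a proper word segmenter and the dictionary built in Embodiment 2.

```python
def build_feature_comment_set(tokens, feature_dictionary):
    """Group the tokens of one comment by the product attribute they describe."""
    # Invert the dictionary: feature word -> attribute name.
    word_to_attribute = {word: attribute
                         for attribute, words in feature_dictionary.items()
                         for word in words}
    feature_comments = {}
    for token in tokens:
        attribute = word_to_attribute.get(token)
        if attribute is not None:  # tokens outside the dictionary are ignored
            feature_comments.setdefault(attribute, []).append(token)
    return feature_comments


# Hypothetical target feature dictionary with two product attributes.
feature_dictionary = {
    "battery": {"battery", "charge", "endurance"},
    "screen": {"screen", "display", "resolution"},
}
tokens = "the battery drains fast but the display is sharp".split()
feature_comments = build_feature_comment_set(tokens, feature_dictionary)
```

The resulting per-attribute groups would then be vectorized and passed to the target classification model.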
In the above scheme, the feature cluster set obtained by K-means clustering is used to train the base classifiers and intermediate classifiers, and on this basis the evidential reasoning algorithm is used to build the classification model for determining comment sentiment type. Because clustering groups feature texts that are similar, it reduces the effect on classification results of the traditional AdaBoost algorithm's tendency to overemphasize misclassified samples while ignoring similarities between different samples. Meanwhile, training each base classifier on the feature cluster set increases the diversity of the base classifiers over the training sample data, so the classification ability of each base classifier can be fully exploited. Fusing the classification results by evidential reasoning, taking into account the reliability and weight of each base classifier, enables the classification model to achieve a better classification effect and improves the accuracy and reliability of sentiment type analysis of product comment text.
The present invention can be widely applied to attribute feature mining and sentiment analysis for a variety of physical and non-physical products. It guides enterprises in remedying product defects according to consumers' evaluations of multi-dimensional product performance and satisfaction, and can effectively improve the execution efficiency of enterprise product projects and their conformity with demand.
Embodiment 4:
Referring to Fig. 4, Fig. 4 is a block diagram of a text sentiment analysis device according to an exemplary embodiment. The device can be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in Fig. 4, the text sentiment analysis device 400 includes:
an acquisition module 41, configured to obtain text to be analyzed, the text to be analyzed being a user's text comment information about a product;
a classification module 42, configured to classify each segmented word of the text to be analyzed based on a target feature dictionary to obtain a feature comment set, the target feature dictionary including multiple specific feature sets, obtained by agglomerative hierarchical clustering, for describing different attributes of the product;
a determination module 43, configured to input the feature comment set into a classification model for determining comment sentiment type and obtain the sentiment type information of the text to be analyzed output by the classification model.
Optionally, the device further includes an update module 44, configured to update the target feature dictionary according to the feature comment set.
With the above device, the target feature dictionary is built by processing feature data with agglomerative hierarchical clustering, which reduces the limitations of manually defined feature dictionaries and makes the target feature dictionary more typical and representative. Consequently, the feature comment set obtained by segmenting the text to be analyzed based on this target feature dictionary has higher semantic similarity among features in each dimension. Inputting this feature comment set into the classification model can achieve a good sentiment type classification effect, improving the accuracy and reliability of sentiment type analysis of product comment text.
Embodiment 5:
Fig. 5 is a block diagram of an electronic device 500 provided by an embodiment of the present invention. As shown in Fig. 5, the electronic device 500 may include a processor 501 and a memory 502. The electronic device 500 may further include one or more of a multimedia component 503, an input/output (I/O) interface 504, and a communication component 505.
The processor 501 controls the overall operation of the electronic device 500 to complete all or some of the steps of the above text sentiment analysis method. The memory 502 stores various types of data to support operation of the electronic device 500; such data may include, for example, instructions for any application or method operating on the electronic device 500 as well as application-related data, such as the target feature dictionary and the weights of the base classifiers in the classification model. The memory 502 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The multimedia component 503 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may be further stored in the memory 502 or sent through the communication component 505. The audio component further includes at least one speaker for outputting audio signals. The I/O interface 504 provides an interface between the processor 501 and other interface modules, which may be a keyboard, a mouse, buttons, and the like. The buttons may be virtual or physical. The communication component 505 is used for wired or wireless communication between the electronic device 500 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, 4G, or 5G, or a combination of one or more of them; accordingly, the communication component 505 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 500 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for executing the above text sentiment analysis method.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided; the program instructions, when executed by a processor, implement the steps of the above text sentiment analysis method. For example, the computer-readable storage medium may be the above memory 502 including program instructions, which can be executed by the processor 501 of the electronic device 500 to complete the above text sentiment analysis method.
The present invention also provides a computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the steps of the text sentiment analysis method provided by the present invention.
As for the device in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article, or device. Unless further limited, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A text sentiment analysis method, characterized in that the method comprises:
obtaining text to be analyzed, the text to be analyzed being a user's text comment information about a product;
classifying each segmented word of the text to be analyzed based on a target feature dictionary to obtain a feature comment set, the target feature dictionary comprising multiple specific feature sets, obtained by agglomerative hierarchical clustering, for describing different attributes of the product;
inputting the feature comment set into a classification model for determining comment sentiment type, and obtaining the sentiment type information of the text to be analyzed output by the classification model.
2. The method according to claim 1, characterized in that the target feature dictionary is constructed as follows:
randomly obtaining multiple pieces of text comment information;
preprocessing the text comment information to obtain a candidate feature set, the candidate feature set being the set of feature words whose degree of association with the product topic is greater than an association threshold;
converting each feature word of the candidate feature set into a corresponding multi-dimensional word vector;
clustering the multi-dimensional word vectors according to a preset clustering condition to obtain a feature cluster set;
determining, from the feature cluster set, multiple general feature sets for describing different attributes of the product, each general feature set comprising a preset number of common feature words describing a single attribute;
constructing the target feature dictionary according to the similarity between each multi-dimensional word vector of the candidate feature set and each common feature word of the general feature sets.
3. The method according to claim 2, characterized in that the classification model is constructed as follows:
labeling the sentiment type of each cluster in the feature cluster set to obtain a sentiment type label for each cluster;
training a classification model on the feature cluster set carrying the sentiment type labels to obtain a target classification model for determining comment sentiment type.
4. The method according to claim 3, characterized in that training a classification model on the feature cluster set carrying the sentiment type labels to obtain a target classification model for determining comment sentiment type comprises:
taking each cluster in the feature cluster set as input training sample data and the sentiment type label of each cluster in the training set as output training sample data, and training to obtain multiple base classifiers;
training the multiple base classifiers according to the AdaBoost algorithm to obtain multiple intermediate classifiers, wherein the weight of each base classifier in the intermediate classifiers is adaptively adjustable;
integrating the multiple intermediate classifiers based on evidential reasoning to obtain the target classification model, wherein the weight of each intermediate classifier in the target classification model is adaptively adjustable.
5. The method according to claim 4, characterized in that inputting the feature comment set into a classification model for determining comment sentiment type and obtaining the sentiment type information of the text to be analyzed output by the classification model comprises:
determining the weight distribution of the intermediate classifiers in the target classification model;
inputting the feature comment set into the target classification model integrated from the multiple intermediate classifiers according to the weight distribution, and outputting the sentiment type information of the text to be analyzed.
6. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
updating the target feature dictionary according to the feature comment set.
7. A text sentiment analysis device, characterized in that the device comprises:
an acquisition module, configured to obtain text to be analyzed, the text to be analyzed being a user's text comment information about a product;
a classification module, configured to classify each segmented word of the text to be analyzed based on a target feature dictionary to obtain a feature comment set, the target feature dictionary comprising multiple specific feature sets, obtained by agglomerative hierarchical clustering, for describing different attributes of the product;
a determination module, configured to input the feature comment set into a classification model for determining comment sentiment type and obtain the sentiment type information of the text to be analyzed output by the classification model.
8. The device according to claim 7, characterized in that the device further comprises:
an update module, configured to update the target feature dictionary according to the feature comment set.
9. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
10. An electronic device, characterized by comprising:
a memory storing a computer program;
a processor, configured to execute the computer program in the memory to implement the steps of the method according to any one of claims 1 to 6.
CN201910639049.4A 2019-07-16 2019-07-16 Text emotion analysis method and electronic equipment Active CN110413780B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910639049.4A CN110413780B (en) 2019-07-16 2019-07-16 Text emotion analysis method and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910639049.4A CN110413780B (en) 2019-07-16 2019-07-16 Text emotion analysis method and electronic equipment

Publications (2)

Publication Number Publication Date
CN110413780A true CN110413780A (en) 2019-11-05
CN110413780B CN110413780B (en) 2022-02-22

Family

ID=68361586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910639049.4A Active CN110413780B (en) 2019-07-16 2019-07-16 Text emotion analysis method and electronic equipment

Country Status (1)

Country Link
CN (1) CN110413780B (en)

Also Published As

Publication number Publication date
CN110413780B (en) 2022-02-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant