CN110263344A - A kind of text emotion analysis method, device and equipment based on mixed model - Google Patents

A kind of text emotion analysis method, device and equipment based on mixed model

Info

Publication number
CN110263344A
Authority
CN
China
Prior art keywords
text
model
emotion
analysis
entry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910554825.0A
Other languages
Chinese (zh)
Other versions
CN110263344B (en)
Inventor
李兆钧
丁永兵
雷小平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chuangyou Digital Technology Guangdong Co Ltd
Original Assignee
Mingchuang Excellent Products (hengqin) Enterprise Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mingchuang Excellent Products (hengqin) Enterprise Management Co Ltd filed Critical Mingchuang Excellent Products (hengqin) Enterprise Management Co Ltd
Priority to CN201910554825.0A priority Critical patent/CN110263344B/en
Publication of CN110263344A publication Critical patent/CN110263344A/en
Application granted granted Critical
Publication of CN110263344B publication Critical patent/CN110263344B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3347Query execution using vector based model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374Thesaurus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Computing Systems (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

This application discloses a text emotion analysis method, device and equipment based on a mixed model. The method uses a mixed model that combines an algorithm model with a sentiment dictionary, and performs two analyses on the text to be analyzed at the same time: a first emotional orientation analysis based on the algorithm model and a second emotional orientation analysis based on the sentiment dictionary. When the second emotional orientation analysis outputs a recognition result, that result is output as the first final emotion recognition result; when it outputs no recognition result, the result of the first emotional orientation analysis is output as the second final emotion recognition result. The method thus obtains the coverage of the algorithm model's baseline results and can also be tuned according to the precision of the sentiment dictionary, which improves overall accuracy and coverage and reduces the degree of manual participation. This solves the technical problem that existing sentiment analysis methods cannot simultaneously achieve high analysis accuracy, reduce manual participation and expand result coverage.

Description

A kind of text emotion analysis method, device and equipment based on mixed model
Technical field
This application relates to the technical field of text emotion analysis, and in particular to a text emotion analysis method, device and equipment based on a mixed model.
Background
Text emotion analysis, also known as opinion mining or tendency analysis, is in brief the process of analyzing, processing, summarizing and reasoning about subjective text that carries emotional coloring. The internet has produced a large volume of valuable user-generated comments on people, events, products and the like. These comments express people's various emotional colorings and emotional tendencies. On this basis, potential users can browse these subjective comments to understand public opinion on a given event or product.
Existing sentiment analysis methods fall into two classes. The first is based on a sentiment dictionary: it identifies the emotional tendency by counting the sentiment words matched in the text. This method requires a great deal of manual effort to build a large sentiment dictionary up front, and still more to maintain the dictionary afterwards. The second is based on an algorithm model: a model is trained on a large number of labeled training samples and then used to predict new samples. This method requires labeling many training samples up front; the training data usually suffer from class imbalance, which can affect the prediction results; and because internet language evolves rapidly, training samples must be added continually to keep the model up to date. The advantage of the algorithm-model approach is that modeling can be completed quickly to obtain a baseline result, but its analysis accuracy is not high and it is difficult to tune. The dictionary-based approach has comparatively higher accuracy, but requires more manual participation and gives insufficient result coverage.
Summary of the invention
The embodiments of this application provide a text emotion analysis method, device and equipment based on a mixed model, to solve the technical problem that existing sentiment analysis methods cannot simultaneously achieve high analysis accuracy, reduce manual participation and expand result coverage.
In view of this, a first aspect of this application provides a text emotion analysis method based on a mixed model, comprising the following steps:
101. Input the text to be analyzed into a preset algorithm model to perform a first emotional orientation analysis, and at the same time perform a second emotional orientation analysis on the text to be analyzed according to a preset sentiment dictionary model;
102. Determine whether the second emotional orientation analysis outputs a recognition result; if so, output the result of the second emotional orientation analysis as the first final emotion recognition result; otherwise, output the result of the first emotional orientation analysis as the second final emotion recognition result.
Preferably, before step 101 the method further includes:
S10. Train a word2vec model on the text samples in a text corpus composed of large-scale social network text collected from social networks, to obtain a word vector library;
S11. Convert all word vectors of the same text in the text corpus into a text vector representation of that text, divide the text vector representations into a training set and a validation set, and train several classes of algorithm models; the algorithm model that satisfies a preset condition is taken as the preset algorithm model.
Preferably, before step 101 the method further includes:
S2. Screen and filter the text samples in the text corpus, calculate the tendency probability of each entry in the text samples, and obtain a positive and negative emotion weight dictionary model composed of positive entries, negative entries and the weights of both; this positive and negative emotion weight dictionary model is taken as the preset sentiment dictionary model.
Preferably, step S2 specifically includes:
S21. Perform word segmentation, text filtering and noise removal on the text samples in the text corpus;
S22. For each entry of the text samples, calculate the probability that a text sample in which the entry occurs is negative and the probability that it is positive;
S23. Output the entries in the text samples whose negative probability or positive probability is greater than a preset threshold. For a negative entry, whose negative probability is greater than the preset threshold, take the opposite number of the negative probability as the first emotional tendency weight of the negative entry; for a positive entry, whose positive probability is greater than the preset threshold, take the positive probability as the second emotional tendency weight of the positive entry. This yields a positive and negative emotion weight dictionary model composed of the positive entries, the first emotional tendency weights, the negative entries and the second emotional tendency weights, and this positive and negative emotion weight dictionary model is taken as the preset sentiment dictionary model.
Preferably, the several classes of algorithm models include: a logistic regression classification model, a support vector machine classification model, a naive Bayes classification model, a random forest classification model, a GBDT classification model and an xgboost classification model.
Preferably, the parameters of the preset algorithm model are optimized by ten-fold cross-validation.
Preferably, step S11 specifically includes:
S110. Average all word vectors of the same text sample in the text corpus dimension by dimension, to obtain the text vector representation of that text sample;
S111. Divide the text vector representations into a training set and a validation set and train several classes of algorithm models; the algorithm model that satisfies a preset condition is taken as the preset algorithm model.
A second aspect of this application provides a text emotion analysis device based on a mixed model, comprising the following modules:
A mixed model analysis module, configured to input the text to be analyzed into a preset algorithm model to perform a first emotional orientation analysis, and at the same time perform a second emotional orientation analysis on the text to be analyzed according to a preset sentiment dictionary model;
A recognition output module, configured to determine whether the second emotional orientation analysis outputs a recognition result; if so, to output the result of the second emotional orientation analysis as the first final emotion recognition result; otherwise, to output the result of the first emotional orientation analysis as the second final emotion recognition result.
Preferably, the mixed model analysis module is further configured to:
Train a word2vec model on the text samples in a text corpus composed of large-scale social network text collected from social networks, to obtain a word vector library;
Convert all word vectors of the same text in the text corpus into a text vector representation of that text, divide the text vector representations into a training set and a validation set, and train several classes of algorithm models; the algorithm model that satisfies a preset condition is taken as the preset algorithm model;
Screen and filter the text samples in the text corpus, calculate the tendency probability of each entry in the text samples, and obtain a positive and negative emotion weight dictionary model composed of positive entries, negative entries and the weights of both; this positive and negative emotion weight dictionary model is taken as the preset sentiment dictionary model.
A third aspect of this application provides text emotion analysis equipment based on a mixed model, the equipment comprising a processor and a memory:
The memory is configured to store program code and transfer the program code to the processor;
The processor is configured to execute, according to instructions in the program code, the text emotion analysis method based on a mixed model of the first aspect.
As can be seen from the above technical solutions, the embodiments of this application have the following advantages:
This application provides a text emotion analysis method based on a mixed model, comprising the following steps: 101. input the text to be analyzed into a preset algorithm model to perform a first emotional orientation analysis, and at the same time perform a second emotional orientation analysis on the text to be analyzed according to a preset sentiment dictionary model; 102. determine whether the second emotional orientation analysis outputs a recognition result; if so, output the result of the second emotional orientation analysis as the first final emotion recognition result; otherwise, output the result of the first emotional orientation analysis as the second final emotion recognition result. The method uses a mixed model that combines an algorithm model with a sentiment dictionary, and performs the first emotional orientation analysis based on the algorithm model and the second emotional orientation analysis based on the sentiment dictionary on the text to be analyzed at the same time. When the second emotional orientation analysis outputs a recognition result, that result is output as the first final emotion recognition result; when it outputs no recognition result, the result of the first emotional orientation analysis is output as the second final emotion recognition result. The method thus obtains the coverage of the algorithm model's baseline results and can also be tuned according to the precision of the sentiment dictionary, which improves overall accuracy and coverage and reduces the degree of manual participation. This solves the technical problem that existing sentiment analysis methods cannot simultaneously achieve high analysis accuracy, reduce manual participation and expand result coverage.
Brief description of the drawings
Fig. 1 is a schematic flowchart of one embodiment of the text emotion analysis method based on a mixed model provided by this application;
Fig. 2 is a schematic flowchart of another embodiment of the text emotion analysis method based on a mixed model provided by this application;
Fig. 3 is a schematic structural diagram of one embodiment of the text emotion analysis device based on a mixed model provided by this application.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solution of this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort shall fall within the scope of protection of this application.
For ease of understanding, referring to Fig. 1, one embodiment of the text emotion analysis method based on a mixed model provided by this application includes:
Step 101: input the text to be analyzed into a preset algorithm model to perform a first emotional orientation analysis, and at the same time perform a second emotional orientation analysis on the text to be analyzed according to a preset sentiment dictionary model.
It should be noted that this embodiment addresses two problems in the prior art: the low accuracy of sentiment analysis that uses an algorithm model alone, and the high manual participation and limited result coverage of sentiment analysis that uses a sentiment dictionary alone. The two sentiment analysis models are used together, and the text to be analyzed undergoes sentiment analysis by the algorithm model and by the sentiment dictionary model at the same time.
Step 102: determine whether the second emotional orientation analysis outputs a recognition result; if so, output the result of the second emotional orientation analysis as the first final emotion recognition result; otherwise, output the result of the first emotional orientation analysis as the second final emotion recognition result.
It should be noted that, when the second emotional orientation analysis outputs a recognition result, the sentiment analysis result of the sentiment dictionary model is taken as the final text emotion recognition result, because its accuracy is higher than that of the algorithm model. However, because the result coverage of the sentiment dictionary model is limited, the sentiment analysis of the text to be analyzed may output no result. To make up for this defect, in this embodiment the sentiment analysis result output by the algorithm model for the text to be analyzed is then output as the final text emotion recognition result.
The text emotion analysis method provided in this embodiment uses a mixed model that combines an algorithm model with a sentiment dictionary, and performs the first emotional orientation analysis based on the algorithm model and the second emotional orientation analysis based on the sentiment dictionary on the text to be analyzed at the same time. When the second emotional orientation analysis outputs a recognition result, that result is output as the first final emotion recognition result; when it outputs no recognition result, the result of the first emotional orientation analysis is output as the second final emotion recognition result. The method thus obtains the coverage of the algorithm model's baseline results and can also be tuned according to the precision of the sentiment dictionary, which improves overall accuracy and coverage and reduces the degree of manual participation. This solves the technical problem that existing sentiment analysis methods cannot simultaneously achieve high analysis accuracy, reduce manual participation and expand result coverage.
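The decision rule of steps 101 and 102 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names, the string labels and the two toy models are all hypothetical, and the sketch assumes the dictionary model signals "no recognition result" by returning None.

```python
from typing import Callable, Optional

# Illustrative labels (assumption); the source only distinguishes positive from negative.
POSITIVE = "positive"
NEGATIVE = "negative"


def hybrid_sentiment(
    text: str,
    algorithm_model: Callable[[str], str],
    dictionary_model: Callable[[str], Optional[str]],
) -> str:
    """Return the dictionary result when one exists, otherwise the algorithm result."""
    first_result = algorithm_model(text)    # first emotional orientation analysis
    second_result = dictionary_model(text)  # second emotional orientation analysis
    if second_result is not None:
        return second_result                # dictionary result takes priority
    return first_result                     # fall back to the algorithm model


# Toy stand-ins (assumptions) so the sketch runs on its own.
def toy_algorithm_model(text: str) -> str:
    return POSITIVE


def toy_dictionary_model(text: str) -> Optional[str]:
    return NEGATIVE if "terrible" in text else None


print(hybrid_sentiment("terrible service", toy_algorithm_model, toy_dictionary_model))
print(hybrid_sentiment("arrived on time", toy_algorithm_model, toy_dictionary_model))
```

The first call is covered by the toy dictionary and returns its result; the second is not covered, so the algorithm model's result is returned instead.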
For ease of understanding, referring to Fig. 2, another embodiment of the text emotion analysis method based on a mixed model provided by this application includes:
Step 201: train a word2vec model on the text samples in a text corpus composed of large-scale social network text collected from social networks, to obtain a word vector library.
It should be noted that, in this embodiment, large-scale social network text is collected from social networks as the text samples of the text corpus. Because language on the internet takes many forms and refers to many kinds of objects, the text samples of the corpus need to be preprocessed by word segmentation, text filtering and removal of noise text. In addition, word vectors are trained on large-scale open-domain Chinese corpora that are open-sourced in the industry, and these word vectors are used as initial values to initialize the network structure of a skip-gram model (one of the word2vec models). With the preprocessed text samples as training samples, the word vector model (the skip-gram model above) is trained by stochastic gradient descent, and the word vectors obtained from this training serve as the word vector library used in subsequent modeling.
In this embodiment, because social network text data are large in scale and the word vector model involves many training parameters, and stochastic gradient descent can greatly speed up model training, it is used to optimize the objective function of the word vector model. The cross-entropy loss function is chosen as the objective function for model training.
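For reference, the cross-entropy objective of a skip-gram model and the stochastic gradient descent update are commonly written as below. This is the standard textbook formulation with a full softmax, given here as an assumption; the source states only that a cross-entropy loss is optimized by stochastic gradient descent and does not give the formula. The symbols (T for the number of training words, c for the context window size, V for the vocabulary size, v and u for input and output word vectors, and the learning rate \eta) are not from the source.

```latex
\[
J(\theta) = -\frac{1}{T}\sum_{t=1}^{T}\;\sum_{\substack{-c \le j \le c \\ j \ne 0}}
\log p\left(w_{t+j} \mid w_t;\theta\right),
\qquad
p\left(w_O \mid w_I;\theta\right) =
\frac{\exp\left(u_{w_O}^{\top} v_{w_I}\right)}{\sum_{w=1}^{V}\exp\left(u_{w}^{\top} v_{w_I}\right)}
\]
\[
\theta \leftarrow \theta - \eta\,\nabla_{\theta} J_t(\theta)
\]
```

Here J_t denotes the loss on a single training sample, so each update uses one sample (or a small batch) rather than the whole corpus, which is what makes training fast on large corpora.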
Step 202: average all word vectors of the same text sample in the text corpus dimension by dimension to obtain the text vector representation of that text sample, divide the text vector representations into a training set and a validation set, and train several classes of algorithm models; the algorithm model that satisfies a preset condition is taken as the preset algorithm model.
It should be noted that, in this embodiment, after the word vector library has been obtained by training the word vector model, all word vectors of the same text are averaged dimension by dimension to obtain the text vector representation of that text. For example, suppose a text has two words w1 and w2 whose word vectors are the 200-dimensional vectors (x1, x2, …, x200) and (y1, y2, …, y200). Averaging the corresponding dimensions gives ((x1+y1)/2, (x2+y2)/2, …, (x200+y200)/2), which is the vector representation of the text.
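The dimension-wise averaging described above can be sketched as follows. The function name and the toy vectors are hypothetical, and skipping words that have no entry in the word vector library is an assumption; the source does not say how out-of-vocabulary words are handled.

```python
from typing import Dict, List, Sequence


def text_vector(tokens: Sequence[str], word_vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of all known tokens, dimension by dimension."""
    # Assumption: tokens missing from the word vector library are skipped.
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        raise ValueError("no token of the text has a word vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy 3-dimensional vectors stand in for the 200-dimensional ones in the text.
toy_vectors = {"w1": [1.0, 2.0, 3.0], "w2": [3.0, 4.0, 5.0]}
print(text_vector(["w1", "w2"], toy_vectors))  # [2.0, 3.0, 4.0]
```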
The text vector representations are used as training samples. To address the imbalance between positive and negative samples, an algorithm such as SMOTE is used for random oversampling, and the samples are divided into a training set and a validation set. Machine learning classification algorithm models are then trained, and ten-fold cross-validation is used to select the optimal parameters of each model: each time, 9/10 of the training samples are used for model training and 1/10 for model validation, and the average of the 10 results is taken as the final result, so that the training data are fully used to select the most suitable model parameters. The classification algorithm models selected in this embodiment are a logistic regression classification model, a support vector machine classification model, a naive Bayes classification model, a random forest classification model, a GBDT classification model and an xgboost classification model. The performance of these classification algorithm models is measured with indicators such as AUC, Precision, Recall and F1, and the output of the model with the best indicators is taken as the output of the algorithm model.
The AUC (area under the curve) indicator measures the ranking performance of a model, that is, the probability that a positive sample is ranked ahead of a negative sample. Precision is the proportion of samples judged positive by the model that are actually positive. Recall is the proportion of actual positive samples that the model correctly finds. F1 is the harmonic mean of Precision and Recall. Precision and Recall trade off against each other: when the decision criterion is strict, Precision rises and Recall falls, and vice versa.
The model is selected according to AUC, because one characteristic of AUC is that it is insensitive to the distribution of positive and negative samples. In the usage scenario of social network language, positive and negative samples are usually very unbalanced, so evaluating models with AUC avoids the sample imbalance problem. After the model is selected, the cutoff for the model's decision still has to be determined. AUC cannot determine this, because it considers only ranking performance; Precision, Recall or F1 is needed instead. Which of the three to use depends on the application: where the goal is to "find as many of the positive samples as possible", choose by Recall; where the goal is that "what is found should be positive samples as far as possible", choose by Precision; where both need to be balanced, choose by F1.
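The four indicators can be computed directly from their definitions, as in the sketch below. It is a plain-Python illustration with hypothetical function names and toy data, not the evaluation code of the source; labels are assumed to be 1 for positive samples and 0 for negative samples.

```python
from typing import List, Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, Recall and F1 for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def apply_cutoff(scores: Sequence[float], cutoff: float) -> List[int]:
    """Turn model scores into hard 0/1 decisions at the given cutoff."""
    return [1 if s >= cutoff else 0 for s in scores]


# Toy data (assumption): two positive and three negative samples.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.1]

print(auc(labels, scores))
for cutoff in (0.5, 0.85):
    print(cutoff, precision_recall_f1(labels, apply_cutoff(scores, cutoff)))
```

On this toy data, raising the cutoff from 0.5 to 0.85 raises Precision from 2/3 to 1 and lowers Recall from 1 to 1/2, which is the trade-off described above; AUC does not depend on the cutoff at all.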
Step 203: perform word segmentation, text filtering and noise removal on the text samples in the text corpus.
Step 204: for each entry of the text samples, calculate the probability that a text sample in which the entry occurs is negative and the probability that it is positive.
Step 205: output the entries in the text samples whose negative probability or positive probability is greater than a preset threshold. For a negative entry, whose negative probability is greater than the preset threshold, take the opposite number of the negative probability as the first emotional tendency weight of the negative entry; for a positive entry, whose positive probability is greater than the preset threshold, take the positive probability as the second emotional tendency weight of the positive entry. This yields a positive and negative emotion weight dictionary model composed of the positive entries, the first emotional tendency weights, the negative entries and the second emotional tendency weights, and this positive and negative emotion weight dictionary model is taken as the preset sentiment dictionary model.
Steps 203 to 205 are carried out at the same time as steps 201 to 202.
It should be noted that, in this embodiment, steps 203 to 205 are the process of training the sentiment dictionary model and analyzing the text to be analyzed. Because language on social networks takes many forms and refers to many kinds of objects, the text samples of the text corpus need to be divided into sentences; each sentence is then searched for the target word, and the sentences containing the target word are extracted. If a sentiment word is preceded by a negating prefix word such as "not" or "no", its meaning is reversed, so sentences containing a prefix negation word need to be filtered out first. The conditional probability P(Y | w) is calculated according to a statistical model, where Y denotes that the text is negative and w denotes that the word occurs; the formula gives the probability that the text is negative when the word w occurs. The probability can be calculated by a parameter estimation method, for example maximum likelihood estimation, but is not limited to that method. All entries are sorted by P(Y | w) in descending order; the larger the value, the more likely a text in which the entry occurs is negative. Taking 0.9 as an example threshold, the entries whose P(Y | w) is greater than the threshold are output. The output entry list is then screened, either manually based on business experience or by machine recognition, to filter out entries with poor interpretability. Several entries that can be judged negative as soon as they are encountered are selected to build a negative early-warning word list. The remaining words take the opposite number of the probability as their emotional tendency weight, so the weight of a negative entry is negative. Correspondingly, the weights of positive entries can be obtained, but the weight of a positive entry is the probability of the word itself rather than its opposite number, so the weight of a positive entry is positive.
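A minimal sketch of how such a weight dictionary could be built is shown below. It estimates P(Y | w) by maximum likelihood as the fraction of texts containing w that are labeled negative. Several points are assumptions not stated in the source: labels are binary, so the positive probability is taken as 1 - P(Y | w); each word is counted once per text; the texts are already segmented and filtered; and the manual screening step and the negative early-warning word list are omitted. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

NEGATIVE, POSITIVE = 0, 1  # illustrative label encoding (assumption)


def build_weight_dictionary(
    samples: Iterable[Tuple[List[str], int]],
    threshold: float = 0.9,
) -> Dict[str, float]:
    """Map each retained entry to a signed weight: -P(negative|w) or +P(positive|w)."""
    seen: Counter = Counter()           # number of texts containing the word
    seen_negative: Counter = Counter()  # number of negative texts containing the word
    for tokens, label in samples:
        for word in set(tokens):        # count each word once per text
            seen[word] += 1
            if label == NEGATIVE:
                seen_negative[word] += 1

    weights: Dict[str, float] = {}
    for word, n in seen.items():
        p_negative = seen_negative[word] / n  # maximum likelihood estimate of P(Y | w)
        p_positive = 1.0 - p_negative         # assumes binary labels
        if p_negative > threshold:
            weights[word] = -p_negative       # negative entry: opposite number of the probability
        elif p_positive > threshold:
            weights[word] = p_positive        # positive entry: the probability itself
    return weights


# Toy, already-segmented samples (assumption).
samples = [
    (["slow", "refund"], NEGATIVE),
    (["slow", "broken"], NEGATIVE),
    (["great", "fast"], POSITIVE),
    (["great", "refund"], POSITIVE),
]
print(build_weight_dictionary(samples))
```

In the toy data, "refund" appears in one negative and one positive text, so neither probability exceeds 0.9 and the word is left out of the dictionary. With a real corpus, a minimum occurrence count would likely be needed so that rare words do not receive extreme probabilities; that refinement is not described in the source.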
Therefore, when the sentiment dictionary is used to evaluate the emotion of a text in this embodiment, the text is first divided into sentences and the sentences containing the target word are selected. The positive words and negative words contained in those sentences are then retrieved and the algebraic sum of their weights is calculated. Next, it is determined whether the text contains a prefix negation word; if so, the opposite number of the algebraic sum is taken. Finally, the sign of the algebraic sum is checked: if it is positive, the emotional tendency of the text is positive; otherwise it is negative.
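The evaluation procedure above can be sketched as follows. The sketch makes several assumptions that the source does not specify: sentences are split on common end-of-sentence punctuation, words are separated by whitespace (real Chinese text would need a word segmenter), the negation check is applied to the selected sentences, and the function returns None when no dictionary entry is found, which stands for the "no recognition result" case. All names and the toy dictionary are hypothetical.

```python
import re
from typing import Dict, Optional, Set


def dictionary_sentiment(
    text: str,
    target: str,
    weights: Dict[str, float],
    negation_words: Set[str],
) -> Optional[str]:
    """Score the sentences that mention the target word with a signed weight dictionary."""
    # Split into sentences and keep only those containing the target word.
    sentences = [s.split() for s in re.split(r"[.!?。！？]+", text)]
    tokens = [t for sentence in sentences if target in sentence for t in sentence]

    # Collect the weights of the positive and negative entries that occur.
    hits = [weights[t] for t in tokens if t in weights]
    if not hits:
        return None  # assumption: no matched entry means no recognition result

    total = sum(hits)  # algebraic sum of the weights
    if any(t in negation_words for t in tokens):
        total = -total  # a prefix negation word reverses the tendency
    return "positive" if total > 0 else "negative"


# Toy dictionary and negation list (assumptions).
toy_weights = {"great": 0.95, "slow": -0.92}
toy_negations = {"not"}

print(dictionary_sentiment("the phone is great", "phone", toy_weights, toy_negations))
print(dictionary_sentiment("the phone is not great", "phone", toy_weights, toy_negations))
print(dictionary_sentiment("the phone arrived today", "phone", toy_weights, toy_negations))
```

The third call returns None because no dictionary entry occurs in the selected sentence; in the mixed model this is the situation in which the result of the algorithm model is output instead.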
Step 206 is analysed to text input Predistribution Algorithm model the first emotional orientation analysis of progress, while according to pre- It sets sentiment dictionary model and the second emotional orientation analysis is carried out to text to be analyzed.
It should be noted that the step 206 in the embodiment of the present application is consistent with the step 101 in a upper embodiment, herein No longer repeated.
Step 207 judges whether the second emotional orientation analysis has recognition result output, if so, the second emotion of output The result of proneness analysis is as the first final emotion recognition as a result, otherwise, the result of the first emotional orientation analysis of output is made For the second final emotion recognition result.
It should be noted that the step 207 in the embodiment of the present application is consistent with the step 102 in a upper embodiment, herein No longer repeated.
For ease of understanding, referring to Fig. 3, this application also provides an embodiment of a text emotion analysis device based on a mixed model, comprising:
A mixed model analysis module 301, configured to input the text to be analyzed into the preset algorithm model to perform a first emotional orientation analysis, and at the same time to perform a second emotional orientation analysis on the text to be analyzed according to the preset sentiment dictionary model.
An identification output module 302, configured to determine whether the second emotional orientation analysis outputs a recognition result; if so, to output the result of the second emotional orientation analysis as the first final emotion recognition result; otherwise, to output the result of the first emotional orientation analysis as the second final emotion recognition result.
Further, the mixed model analysis module 301 is also configured to:
Perform word2vec model training on the text samples in a text corpus composed of large-scale social network text collected from social networks, to obtain a word vector library;
Convert the word vector representations of all words in the same text in the text corpus into a text vector representation of that text, divide the text vector representations into a training set and a validation set to train several types of algorithm models, and take the algorithm model that meets a preset condition as the preset algorithm model;
Screen and filter the text samples in the text corpus, calculate the orientation probability of each entry in the text samples to obtain a positive and negative emotion weight dictionary model composed of the positive entries, the negative entries, and the weights of both, and take the positive and negative emotion weight dictionary model as the preset sentiment dictionary model.
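The text vector conversion and model selection performed by module 301 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the text vector is taken as the per-dimension average of the word vectors, as the claimed method specifies for this conversion; the names `text_vector` and `select_model` are hypothetical; the word vector library is modeled as a plain dictionary rather than a trained word2vec model; and the unspecified "preset condition" is assumed here to be a minimum validation accuracy.

```python
def text_vector(tokens, word_vectors):
    """Average the word vectors of a text dimension by dimension.

    word_vectors: {word: list of floats}, a stand-in for a word vector library.
    Returns None if no token has a vector.
    """
    vectors = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def select_model(candidates, train, validation, min_accuracy):
    """Train each candidate and keep the best one that meets the condition.

    candidates: {name: fit}, where fit(train) returns a predict(x) callable.
    train, validation: lists of (text_vector, label) pairs.
    min_accuracy: assumed form of the "preset condition".
    Returns (name, predict, accuracy), or None if no candidate qualifies.
    """
    best = None
    for name, fit in candidates.items():
        predict = fit(train)
        correct = sum(1 for x, y in validation if predict(x) == y)
        accuracy = correct / len(validation)
        if accuracy >= min_accuracy and (best is None or accuracy > best[2]):
            best = (name, predict, accuracy)
    return best
```

In practice each `fit` callable would wrap one of the classifier types named in the claims, such as logistic regression or random forest; they are left abstract here so that the sketch does not depend on any particular library.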
This application also provides text emotion analysis equipment based on a mixed model, the equipment comprising a processor and a memory;
The memory is configured to store program code and to transmit the program code to the processor;
The processor is configured to execute, according to the instructions in the program code, the text emotion analysis method based on a mixed model described in the foregoing method embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (10)

1. A text emotion analysis method based on a mixed model, characterized by comprising the following steps:
101. inputting a text to be analyzed into a preset algorithm model to perform a first emotional orientation analysis, and at the same time performing a second emotional orientation analysis on the text to be analyzed according to a preset sentiment dictionary model;
102. determining whether the second emotional orientation analysis outputs a recognition result; if so, outputting the result of the second emotional orientation analysis as a first final emotion recognition result; otherwise, outputting the result of the first emotional orientation analysis as a second final emotion recognition result.
2. The text emotion analysis method based on a mixed model according to claim 1, characterized in that, before step 101, the method further comprises:
S10. performing word2vec model training on the text samples in a text corpus composed of large-scale social network text collected from social networks, to obtain a word vector library;
S11. converting the word vector representations of all words in the same text in the text corpus into a text vector representation of that text, dividing the text vector representations into a training set and a validation set to train several types of algorithm models, and taking the algorithm model that meets a preset condition as the preset algorithm model.
3. The text emotion analysis method based on a mixed model according to claim 1 or 2, characterized in that, before step 101, the method further comprises:
S2. screening and filtering the text samples in the text corpus, calculating the orientation probability of each entry in the text samples to obtain a positive and negative emotion weight dictionary model composed of the positive entries, the negative entries, and the weights of both, and taking the positive and negative emotion weight dictionary model as the preset sentiment dictionary model.
4. The text emotion analysis method based on a mixed model according to claim 3, characterized in that step S2 specifically comprises:
S21. performing word segmentation, text filtering, and noise removal on the text samples in the text corpus;
S22. calculating, for each entry in the text samples, the probability that a text sample in which the entry occurs is negative and the probability that such a text sample is positive;
S23. outputting the entries in the text samples whose negative probability or positive probability is greater than a preset threshold; for a negative entry whose negative probability is greater than the preset threshold, taking the opposite number of the negative probability as a first sentiment orientation weight of the negative entry; for a positive entry whose positive probability is greater than the preset threshold, taking the positive probability as a second sentiment orientation weight of the positive entry; obtaining the positive and negative emotion weight dictionary model composed of the negative entries, the first sentiment orientation weights, the positive entries, and the second sentiment orientation weights; and taking the positive and negative emotion weight dictionary model as the preset sentiment dictionary model.
5. The text emotion analysis method based on a mixed model according to claim 2, characterized in that the several types of algorithm models include: a logistic regression classification model, a support vector machine classification model, a naive Bayes classification model, a random forest classification model, a GBDT classification model, and an xgboost classification model.
6. The text emotion analysis method based on a mixed model according to claim 2, characterized in that the model parameters of the preset algorithm model are optimized by ten-fold cross-validation.
7. The text emotion analysis method based on a mixed model according to claim 2, characterized in that step S11 specifically comprises:
S110. averaging, dimension by dimension, all word vectors of the same text sample in the text corpus to obtain the text vector representation of that text sample;
S111. dividing the text vector representations into a training set and a validation set to train several types of algorithm models, and taking the algorithm model that meets a preset condition as the preset algorithm model.
8. A text emotion analysis device based on a mixed model, characterized by comprising the following modules:
a mixed model analysis module, configured to input a text to be analyzed into a preset algorithm model to perform a first emotional orientation analysis, and at the same time to perform a second emotional orientation analysis on the text to be analyzed according to a preset sentiment dictionary model;
an identification output module, configured to determine whether the second emotional orientation analysis outputs a recognition result; if so, to output the result of the second emotional orientation analysis as a first final emotion recognition result; otherwise, to output the result of the first emotional orientation analysis as a second final emotion recognition result.
9. The text emotion analysis device based on a mixed model according to claim 8, characterized in that the mixed model analysis module is further configured to:
perform word2vec model training on the text samples in a text corpus composed of large-scale social network text collected from social networks, to obtain a word vector library;
convert the word vector representations of all words in the same text in the text corpus into a text vector representation of that text, divide the text vector representations into a training set and a validation set to train several types of algorithm models, and take the algorithm model that meets a preset condition as the preset algorithm model;
screen and filter the text samples in the text corpus, calculate the orientation probability of each entry in the text samples to obtain a positive and negative emotion weight dictionary model composed of the positive entries, the negative entries, and the weights of both, and take the positive and negative emotion weight dictionary model as the preset sentiment dictionary model.
10. Text emotion analysis equipment based on a mixed model, characterized in that the equipment comprises a processor and a memory:
the memory is configured to store program code and to transmit the program code to the processor;
the processor is configured to execute, according to the instructions in the program code, the text emotion analysis method based on a mixed model according to any one of claims 1 to 7.
CN201910554825.0A 2019-06-25 2019-06-25 Text emotion analysis method, device and equipment based on hybrid model Active CN110263344B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910554825.0A CN110263344B (en) 2019-06-25 2019-06-25 Text emotion analysis method, device and equipment based on hybrid model

Publications (2)

Publication Number Publication Date
CN110263344A true CN110263344A (en) 2019-09-20
CN110263344B CN110263344B (en) 2022-04-19

Family

ID=67921343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910554825.0A Active CN110263344B (en) 2019-06-25 2019-06-25 Text emotion analysis method, device and equipment based on hybrid model

Country Status (1)

Country Link
CN (1) CN110263344B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080249764A1 (en) * 2007-03-01 2008-10-09 Microsoft Corporation Smart Sentiment Classifier for Product Reviews
CN104572616A (en) * 2014-12-23 2015-04-29 北京锐安科技有限公司 Method and device for identifying text orientation
CN105095183A (en) * 2014-05-22 2015-11-25 株式会社日立制作所 Text emotional tendency determination method and system
CN106776574A (en) * 2016-12-28 2017-05-31 Tcl集团股份有限公司 User comment text method for digging and device
US20170169008A1 (en) * 2015-12-15 2017-06-15 Le Holdings (Beijing) Co., Ltd. Method and electronic device for sentiment classification
CN107066553A (en) * 2017-03-24 2017-08-18 北京工业大学 A kind of short text classification method based on convolutional neural networks and random forest
US20170337182A1 (en) * 2016-05-23 2017-11-23 Ricoh Company, Ltd. Evaluation element recognition method, evaluation element recognition apparatus, and evaluation element recognition system
US20170351971A1 (en) * 2016-06-07 2017-12-07 International Business Machines Corporation Method and apparatus for informative training repository building in sentiment analysis model learning and customaization
CN108052505A (en) * 2017-12-26 2018-05-18 上海智臻智能网络科技股份有限公司 Text emotion analysis method and device, storage medium, terminal
CN108717406A (en) * 2018-05-10 2018-10-30 平安科技(深圳)有限公司 Text mood analysis method, device and storage medium
US20180349355A1 (en) * 2017-05-31 2018-12-06 Beijing Baidu Netcom Science And Technology Co., Ltd. Artificial Intelligence Based Method and Apparatus for Constructing Comment Graph
CN109492226A (en) * 2018-11-10 2019-03-19 上海文军信息技术有限公司 A method of it improving the low text of Sentiment orientation accounting and prejudges accuracy rate
CN109840328A (en) * 2019-02-28 2019-06-04 上海理工大学 Deep learning comment on commodity text emotion trend analysis method

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
CHANDAN PRASAD GUPTA等: "Detecting Sentiment in Nepali texts: A bootstrap approach for Sentiment Analysis of texts in the Nepali language", 《2015 INTERNATIONAL CONFERENCE ON COGNITIVE COMPUTING AND INFORMATION PROCESSING(CCIP)》 *
DAVID B. BRACEWELL: "Semi-Automatic WordNet Based Emotion Dictionary Construction", 《2010 NINTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS》 *
XUEYAN LIU等: "Social Network Influence Propagation Model Based on Emotion Analysis", 《2018 14TH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG)》 *
何跃等: "结合话题相关性的热点话题情感倾向研究", 《数据分析与知识发现》 *
文俊等: "基于协同迭代及动态词库扩展的文本情感倾向分类算法", 《成都信息工程学院学报》 *
毕秋敏等: "一种主动学习和协同训练相结合的半监督微博情感分类方法", 《现代图书情报技术》 *
赵军等: "一种改进的融合关联词典的微博倾向性分析方法", 《数据采集与处理》 *
韩飞等: "一种结合随机游走和粗糙决策的文本分类方法", 《小型微型计算机系统》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177390A (en) * 2019-12-30 2020-05-19 南京三百云信息科技有限公司 Accident vehicle identification method and device based on hybrid model
CN111831824A (en) * 2020-07-16 2020-10-27 民生科技有限责任公司 Public opinion positive and negative face classification method
CN111831824B (en) * 2020-07-16 2024-02-09 民生科技有限责任公司 Public opinion positive and negative surface classification method

Also Published As

Publication number Publication date
CN110263344B (en) 2022-04-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201120

Address after: Room 011, first floor, no.2429, Xingang East Road, Haizhu District, Guangzhou City, Guangdong Province (office only)

Applicant after: CHUANGYOU digital technology (Guangdong) Co., Ltd

Address before: 519000 -41072, 105 room 6, Baohua Road, Hengqin New District, Zhuhai, Guangdong (centralized office area)

Applicant before: MINISO (HENGQIN) ENTERPRISE MANAGEMENT Co.,Ltd.

GR01 Patent grant