CN108647205A - Fine-grained sentiment analysis model building method, device, and readable storage medium - Google Patents

Info

Publication number
CN108647205A
CN108647205A CN201810414228.3A
Authority
CN
China
Prior art keywords
word
clause
label
emotion
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810414228.3A
Other languages
Chinese (zh)
Other versions
CN108647205B (en)
Inventor
刘志煌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN201810414228.3A priority Critical patent/CN108647205B/en
Publication of CN108647205A publication Critical patent/CN108647205A/en
Application granted granted Critical
Publication of CN108647205B publication Critical patent/CN108647205B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a fine-grained sentiment analysis model building method, a device, and a readable storage medium. During fine-grained sentiment analysis, attribute words and sentiment words are extracted using class sequence rules, which improves the accuracy of attribute-word and sentiment-word extraction, lets the mined class sequence rules change as the training text of the application domain changes, improves the generalization ability of the constructed fine-grained sentiment analysis model, and gives the constructed model good scalability. The class sequence rules resolve long-distance dependencies between sentiment words and attribute words, that is, between evaluation objects and evaluating words, and a neural network with an attention mechanism extracts the sentiment-bearing context of each attribute word, thereby realizing fine-grained sentiment analysis.

Description

Fine-grained sentiment analysis model building method, device, and readable storage medium
Technical field
The present invention relates to the field of computer technology, and more particularly to a fine-grained sentiment analysis model building method, a device, and a readable storage medium.
Background art
Fine-grained sentiment analysis, also known as aspect-level sentiment analysis, is a subcategory of text sentiment analysis that usually judges the sentiment expressed toward attribute features in review text. Compared with document-level or sentence-level sentiment analysis, fine-grained sentiment analysis can recognize sentiment toward the specific attribute features of a product more precisely, and the resulting analysis provides more detailed evaluation information, so it has greater reference significance and value.
The first step of fine-grained sentiment analysis is opinion target extraction, which obtains the product attributes consumers care about from a mass of user product reviews. For example, in the hotel review "The service is fine, the facilities are quite good, but the room sound insulation is really too poor", the product attributes the consumer cares about are "service", "facilities", and "sound insulation", so the extraction of these feature words plays a crucial role in fine-grained sentiment analysis. Opinion target extraction falls into two approaches. The first relies on dictionaries, templates, and manually formulated rules for text in a specific domain, for example first determining candidate targets and then filtering the candidate set with part-of-speech rules to obtain accurate evaluation objects. Because this approach depends on dictionaries and rules formulated by language experts, it scales poorly and generalizes weakly; network neologisms and other words not included in the sentiment dictionary cannot be recognized well, so the extraction quality is unsatisfactory. The second approach treats the extraction of evaluation elements as a sequence labeling problem, for example extracting them with sequence labeling methods such as conditional random fields or hidden Markov models, but this approach cannot resolve long-distance dependencies between evaluating words and evaluation objects. It follows that judging sentiment orientation with existing opinion target extraction methods suffers from poor scalability and weak generalization ability, and cannot resolve long-distance dependencies between evaluating words and evaluation objects.
Summary of the invention
The main purpose of the present invention is to provide a fine-grained sentiment analysis model building method, a device, and a readable storage medium, aiming to solve the technical problems that existing methods for judging the sentiment orientation of text scale poorly, generalize weakly, and cannot resolve long-distance dependencies between evaluating words and evaluation objects.
To achieve the above object, the present invention provides a fine-grained sentiment analysis model building method, which includes the steps of:
after a first preset quantity of clauses to be trained is obtained, performing a word segmentation operation on the clauses to be trained, and adding a part-of-speech tag to each word in the segmented clauses;
obtaining a second preset quantity of attribute words and sentiment words from the clauses to be trained, adding an attribute-word label to each attribute word and a sentiment-word label to each sentiment word, and determining the part-of-speech sequence corresponding to each clause to be trained;
mining target rules from the part-of-speech sequences that contain the attribute-word label and/or the sentiment-word label, and extracting the attribute word set and the sentiment word set in the clauses to be trained according to the target rules;
adding a sentiment category label to each attribute word in the attribute word set according to the corresponding sentiment word in the sentiment word set;
vectorizing each attribute word in the attribute word set and the context corresponding to each attribute word, to obtain the word vectors of the attribute word and its context;
taking the word vectors of the attribute word and its context as the input of a multilayer neural network with an attention mechanism, and the sentiment category label corresponding to the attribute word as the output of that network, so as to build the fine-grained sentiment analysis model.
Preferably, after the step of taking the word vectors of the attribute word and its context as the input of the multilayer neural network with the attention mechanism and the sentiment category label corresponding to the attribute word as its output so as to build the fine-grained sentiment analysis model, the method further includes:
obtaining a third preset quantity of clauses to be tested, and extracting the attribute words in the clauses to be tested according to the target rules;
vectorizing the attribute word and corresponding context of each clause to be tested, inputting the result into the fine-grained sentiment analysis model, and obtaining the sentiment category label of the attribute word in each clause to be tested;
comparing the sentiment category label of each tested attribute word with the preset sentiment category label of that attribute word, and determining from the comparison result the accuracy with which the fine-grained sentiment analysis model analyzes the sentiment type of text.
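The evaluation described here amounts to comparing predicted sentiment category labels against preset ground-truth labels. A minimal sketch, with hypothetical label values and a function name of my own choosing:

```python
def evaluate_accuracy(predicted_labels, preset_labels):
    """Compare predicted sentiment category labels with the preset
    (ground-truth) labels and return the model's accuracy."""
    assert len(predicted_labels) == len(preset_labels)
    correct = sum(p == t for p, t in zip(predicted_labels, preset_labels))
    return correct / len(predicted_labels)

# Hypothetical labels for attribute words in four test clauses.
predicted = ["positive", "positive", "negative", "negative"]
preset    = ["positive", "negative", "negative", "negative"]
print(evaluate_accuracy(predicted, preset))  # 0.75
```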
Preferably, the step of performing the word segmentation operation on the clauses to be trained after the first preset quantity of clauses is obtained, and adding a part-of-speech tag to each segmented word, includes:
after the first preset quantity of clauses to be trained is obtained, removing irrelevant characters and stop words from the clauses, and segmenting the clauses with a word segmentation algorithm to obtain the segmented clauses to be trained;
adding a part-of-speech tag to each word of the segmented clauses to be trained.
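The patent does not name a segmentation algorithm; a production system would use a Chinese segmenter with POS tagging (e.g. jieba's posseg mode). The toy longest-match segmenter below, with a hand-built five-word lexicon and stop-word removal, merely illustrates the shape of this step's output:

```python
# Toy stand-in for the segmentation step. LEXICON and STOPWORDS are
# illustrative assumptions, not the patent's resources.
STOPWORDS = {"的", "了"}
LEXICON = [  # (word, POS tag): n = noun, d = adverb, a = adjective, y = particle
    ("隔音", "n"), ("实在", "d"), ("太", "d"), ("差", "a"), ("了", "y"),
]

def segment_and_tag(clause):
    """Greedy longest-match segmentation over the toy lexicon, dropping
    stop words, returning (word, pos) pairs."""
    result, i = [], 0
    words = sorted(LEXICON, key=lambda w: -len(w[0]))  # longest first
    while i < len(clause):
        for w, pos in words:
            if clause.startswith(w, i):
                if w not in STOPWORDS:
                    result.append((w, pos))
                i += len(w)
                break
        else:
            i += 1  # skip characters not covered by the lexicon
    return result

print(segment_and_tag("隔音实在太差了"))
# [('隔音', 'n'), ('实在', 'd'), ('太', 'd'), ('差', 'a')]
```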
Preferably, the step of determining the part-of-speech sequence corresponding to each clause to be trained includes:
detecting whether the clause to be trained carries the attribute-word label and the sentiment-word label;
if the clause to be trained carries the attribute-word label and the sentiment-word label, replacing the attribute word in the clause with the attribute-word label, replacing the sentiment word in the clause with the sentiment-word label, and composing the part-of-speech sequence of the clause from the part-of-speech tags, attribute-word labels, and sentiment-word labels corresponding to the words in the clause;
if the clause to be trained does not carry the attribute-word label and the sentiment-word label, composing the part-of-speech sequence of the clause from the part-of-speech tags corresponding to the words in the clause.
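This replacement step can be sketched directly; the label strings "ATTR" and "SENT" are placeholders I chose for the attribute-word and sentiment-word labels, which the patent does not spell out:

```python
def build_pos_sequence(tagged_clause, attribute_words, sentiment_words):
    """Compose a clause's part-of-speech sequence, replacing labeled
    attribute words with 'ATTR' and labeled sentiment words with 'SENT';
    unlabeled words keep their ordinary POS tags."""
    seq = []
    for word, pos in tagged_clause:
        if word in attribute_words:
            seq.append("ATTR")
        elif word in sentiment_words:
            seq.append("SENT")
        else:
            seq.append(pos)
    return seq

clause = [("隔音", "n"), ("实在", "d"), ("太", "d"), ("差", "a")]
print(build_pos_sequence(clause, {"隔音"}, {"差"}))
# ['ATTR', 'd', 'd', 'SENT']
```

A clause carrying neither label simply yields its plain POS tags, matching the second branch above.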
Preferably, the step of mining target rules from the part-of-speech sequences containing the attribute-word label and/or the sentiment-word label includes:
determining the target part-of-speech sequences in the part-of-speech sequences that contain the attribute-word label and/or the sentiment-word label;
counting a first sequence quantity of target part-of-speech sequences that satisfy a same rule, and counting a second sequence quantity of part-of-speech sequences other than the target part-of-speech sequences that satisfy a rule to be determined, wherein the rule to be determined is the rule satisfied by the target part-of-speech sequences counted in the first sequence quantity;
calculating a support from the total number of part-of-speech sequences and the first sequence quantity, and calculating a confidence from the second sequence quantity and the first sequence quantity;
if the support is greater than or equal to a preset support threshold, and the confidence is greater than or equal to a preset confidence threshold, taking the rule to be determined as a target rule.
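A sketch of this class-sequence-rule mining under one reading of the counts: "first" is the number of labeled (target) sequences matching a candidate rule, "second" the number of other sequences matching it, support is computed from "first" and the total, and confidence from "first" and "second". The thresholds and the subsequence-matching semantics (order preserved, gaps allowed) are illustrative assumptions:

```python
def mine_target_rules(sequences, labeled_idx, candidate_rules,
                      support_rate=0.02, conf_threshold=0.6):
    """Keep a candidate rule as a target rule when both its support and
    its confidence reach their thresholds."""
    def matches(rule, seq):
        # rule is a subsequence of seq (order preserved, gaps allowed)
        it = iter(seq)
        return all(item in it for item in rule)

    total = len(sequences)
    # support threshold = clause count x preset support rate, so
    # comparing counts is equivalent to first / total >= support_rate
    min_support_count = total * support_rate
    target_rules = []
    for rule in candidate_rules:
        first = sum(matches(rule, sequences[i]) for i in labeled_idx)
        second = sum(matches(rule, sequences[i])
                     for i in range(total) if i not in labeled_idx)
        confidence = first / (first + second) if first + second else 0.0
        if first >= min_support_count and confidence >= conf_threshold:
            target_rules.append(rule)
    return target_rules

seqs = [["ATTR", "d", "SENT"], ["ATTR", "d", "d", "SENT"],
        ["n", "d", "a"], ["v", "n"]]
print(mine_target_rules(seqs, labeled_idx={0, 1},
                        candidate_rules=[["ATTR", "d", "SENT"]]))
# [['ATTR', 'd', 'SENT']]
```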
Preferably, before the step of taking the rule to be determined as a target rule if the support is greater than or equal to the preset support threshold and the confidence is greater than or equal to the preset confidence threshold, the method further includes:
obtaining the number of clauses to be trained and a preset support rate;
calculating the product of the clause count and the preset support rate, and using the product as the preset support threshold.
Preferably, the step of extracting the attribute word set and the sentiment word set in the clauses to be trained according to the target rules includes:
determining the clauses to be trained to which the attribute-word label and/or the sentiment-word label have been added, denoted target clauses;
matching the part-of-speech sequences of the clauses other than the target clauses against the target rules, so as to extract the attribute word set and the sentiment word set in the clauses to be trained.
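The matching step can be illustrated as follows. The mapping of label slots to POS classes ("ATTR" matches nouns, "SENT" matches adjectives) is a simplification I introduce here; the patent only says unlabeled clauses are matched against the target rules:

```python
SLOT_POS = {"ATTR": "n", "SENT": "a"}  # assumed POS classes for the labels

def extract_by_rule(tagged_clause, rule):
    """Match an unlabeled clause against one target rule (in order, gaps
    allowed); words falling into the 'ATTR'/'SENT' slots become candidate
    attribute and sentiment words."""
    attrs, sents, j = set(), set(), 0
    for word, pos in tagged_clause:
        if j == len(rule):
            break
        slot = rule[j]
        if SLOT_POS.get(slot, slot) == pos:
            if slot == "ATTR":
                attrs.add(word)
            elif slot == "SENT":
                sents.add(word)
            j += 1
    return (attrs, sents) if j == len(rule) else (set(), set())

clause = [("服务", "n"), ("很", "d"), ("好", "a")]  # an unlabeled clause
print(extract_by_rule(clause, ["ATTR", "d", "SENT"]))
# ({'服务'}, {'好'})
```

Running every target rule over every unlabeled clause and taking the union of the results yields the attribute word set and sentiment word set described above.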
Preferably, the step of taking the word vectors of the attribute word and its context as the input of the multilayer neural network with the attention mechanism and the sentiment category label corresponding to the attribute word as its output, so as to build the fine-grained sentiment analysis model, includes:
taking the word vector of the attribute word and the word vectors of its context as the input of the attention layer of the first neural network layer of the attention mechanism, and obtaining the context information relevant to the sentiment of the attribute word;
summing, in the linear layer of the first neural network layer, the word vectors corresponding to the sentiment-relevant context information and the word vector of the attribute word, to obtain a summed result;
taking the summed result as the input of the next neural network layer and the sentiment category label corresponding to the attribute word as the output of the multilayer neural network, obtaining the parameters of the fine-grained sentiment analysis model, and building the fine-grained sentiment analysis model from those parameters.
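The layered computation above can be sketched with NumPy. Dot-product attention, the embedding dimension, and the three sentiment classes are illustrative assumptions (the patent does not fix the scoring function), and the training that would fit the parameters to the sentiment category labels is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_layer(attr_vec, context_vecs):
    """First layer: score each context word vector against the attribute
    word vector, attend over the context, then sum the attended context
    with the attribute vector (the 'linear layer' summation)."""
    scores = context_vecs @ attr_vec           # dot-product attention
    weights = softmax(scores)
    attended_context = weights @ context_vecs  # sentiment-relevant context
    return attended_context + attr_vec         # summed result

def sentiment_scores(summed, W, b):
    """Next layer: map the summed vector to sentiment category scores."""
    return softmax(W @ summed + b)

d, n_ctx, n_classes = 8, 5, 3        # embedding dim, context words, labels
attr = rng.normal(size=d)            # attribute word vector
ctx = rng.normal(size=(n_ctx, d))    # context word vectors
W, b = rng.normal(size=(n_classes, d)), np.zeros(n_classes)

probs = sentiment_scores(attention_layer(attr, ctx), W, b)
print(probs.shape, round(float(probs.sum()), 6))  # (3,) 1.0
```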
In addition, to achieve the above object, the present invention also provides a fine-grained sentiment analysis model building device, which includes a memory, a processor, and a fine-grained sentiment analysis model building program stored on the memory and runnable on the processor; when the fine-grained sentiment analysis model building program is executed by the processor, the steps of the fine-grained sentiment analysis model building method described above are realized.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium on which a fine-grained sentiment analysis model building program is stored; when the fine-grained sentiment analysis model building program is executed by a processor, the steps of the fine-grained sentiment analysis model building method described above are realized.
During fine-grained sentiment analysis, the present invention extracts attribute words and sentiment words using class sequence rules, which improves the accuracy of attribute-word and sentiment-word extraction, lets the mined class sequence rules change with the training text of the application domain, improves the generalization ability of the constructed fine-grained sentiment analysis model, and gives the constructed model good scalability. The class sequence rules resolve long-distance dependencies between sentiment words and attribute words, that is, between evaluation objects (attribute words) and evaluating words (sentiment words), and a neural network with an attention mechanism extracts the sentiment-bearing context of each attribute word, realizing fine-grained sentiment analysis.
Description of the drawings
Fig. 1 is a schematic structural diagram of the hardware running environment involved in an embodiment of the present invention;
Fig. 2 is a flow diagram of a first embodiment of the fine-grained sentiment analysis model building method of the present invention;
Fig. 3 is a flow diagram of a second embodiment of the fine-grained sentiment analysis model building method of the present invention.
The realization of the objects, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description of the embodiments
It should be understood that the specific embodiments described herein merely illustrate the present invention and are not intended to limit it.
As shown in Fig. 1, Fig. 1 is a schematic structural diagram of the hardware running environment involved in an embodiment of the present invention.
It should be noted that Fig. 1 may be the structural schematic diagram of the hardware running environment of the fine-grained sentiment analysis model building device. The fine-grained sentiment analysis model building device of the embodiment of the present invention may be a terminal device such as a PC or a portable computer.
As shown in Fig. 1, the fine-grained sentiment analysis model building device may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002, where the communication bus 1002 realizes the connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and optionally may also include standard wired and wireless interfaces. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a stable non-volatile memory such as a magnetic disk memory, and optionally may also be a storage device independent of the aforementioned processor 1001.
It will be understood by those skilled in the art that the device structure shown in Fig. 1 does not constitute a limitation on the fine-grained sentiment analysis model building device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently.
As shown in Fig. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a fine-grained sentiment analysis model building program. The operating system is a program that manages and controls the hardware and software resources of the device and supports the running of the fine-grained sentiment analysis model building program and other software or programs.
In the fine-grained sentiment analysis model building device shown in Fig. 1, the user interface 1003 is mainly used to obtain the user's addition instructions, the instruction to obtain the clauses to be trained, and so on; the network interface 1004 is mainly used to connect to a background server and exchange data with it, such as searching for the answer corresponding to a question to be answered; and the processor 1001 may be used to call the fine-grained sentiment analysis model building program stored in the memory 1005 and perform the following operations:
after a first preset quantity of clauses to be trained is obtained, performing a word segmentation operation on the clauses to be trained, and adding a part-of-speech tag to each word in the segmented clauses;
obtaining a second preset quantity of attribute words and sentiment words from the clauses to be trained, adding an attribute-word label to each attribute word and a sentiment-word label to each sentiment word, and determining the part-of-speech sequence corresponding to each clause to be trained;
mining target rules from the part-of-speech sequences that contain the attribute-word label and/or the sentiment-word label, and extracting the attribute word set and the sentiment word set in the clauses to be trained according to the target rules;
adding a sentiment category label to each attribute word in the attribute word set according to the corresponding sentiment word in the sentiment word set;
vectorizing each attribute word in the attribute word set and the context corresponding to each attribute word, to obtain the word vectors of the attribute word and its context;
taking the word vectors of the attribute word and its context as the input of a multilayer neural network with an attention mechanism, and the sentiment category label corresponding to the attribute word as the output of that network, so as to build the fine-grained sentiment analysis model.
Further, after the step of taking the word vectors of the attribute word and its context as the input of the multilayer neural network with the attention mechanism and the sentiment category label corresponding to the attribute word as its output so as to build the fine-grained sentiment analysis model, the processor 1001 may also be used to call the fine-grained sentiment analysis model building program stored in the memory 1005 and perform the following steps:
obtaining a third preset quantity of clauses to be tested, and extracting the attribute words in the clauses to be tested according to the target rules;
vectorizing the attribute word and corresponding context of each clause to be tested, inputting the result into the fine-grained sentiment analysis model, and obtaining the sentiment category label of the attribute word in each clause to be tested;
comparing the sentiment category label of each tested attribute word with the preset sentiment category label of that attribute word, and determining from the comparison result the accuracy with which the fine-grained sentiment analysis model analyzes the sentiment type of text.
Further, the step of performing the word segmentation operation on the clauses to be trained after the first preset quantity of clauses is obtained, and adding a part-of-speech tag to each segmented word, includes:
after the first preset quantity of clauses to be trained is obtained, removing irrelevant characters and stop words from the clauses, and segmenting the clauses with a word segmentation algorithm to obtain the segmented clauses to be trained;
adding a part-of-speech tag to each word of the segmented clauses to be trained.
Further, the step of determining the part-of-speech sequence corresponding to each clause to be trained includes:
detecting whether the clause to be trained carries the attribute-word label and the sentiment-word label;
if the clause to be trained carries the attribute-word label and the sentiment-word label, replacing the attribute word in the clause with the attribute-word label, replacing the sentiment word in the clause with the sentiment-word label, and composing the part-of-speech sequence of the clause from the part-of-speech tags, attribute-word labels, and sentiment-word labels corresponding to the words in the clause;
if the clause to be trained does not carry the attribute-word label and the sentiment-word label, composing the part-of-speech sequence of the clause from the part-of-speech tags corresponding to the words in the clause.
Further, the step of mining target rules from the part-of-speech sequences containing the attribute-word label and/or the sentiment-word label includes:
determining the target part-of-speech sequences in the part-of-speech sequences that contain the attribute-word label and/or the sentiment-word label;
counting a first sequence quantity of target part-of-speech sequences that satisfy a same rule, and counting a second sequence quantity of part-of-speech sequences other than the target part-of-speech sequences that satisfy a rule to be determined, wherein the rule to be determined is the rule satisfied by the target part-of-speech sequences counted in the first sequence quantity;
calculating a support from the total number of part-of-speech sequences and the first sequence quantity, and calculating a confidence from the second sequence quantity and the first sequence quantity;
if the support is greater than or equal to a preset support threshold, and the confidence is greater than or equal to a preset confidence threshold, taking the rule to be determined as a target rule.
Further, before the step of taking the rule to be determined as a target rule if the support is greater than or equal to the preset support threshold and the confidence is greater than or equal to the preset confidence threshold, the processor 1001 may also be used to call the fine-grained sentiment analysis model building program stored in the memory 1005 and perform the following steps:
obtaining the number of clauses to be trained and a preset support rate;
calculating the product of the clause count and the preset support rate, and using the product as the preset support threshold.
Further, the step of extracting the attribute word set and the sentiment word set in the clauses to be trained according to the target rules includes:
determining the clauses to be trained to which the attribute-word label and/or the sentiment-word label have been added, denoted target clauses;
matching the part-of-speech sequences of the clauses other than the target clauses against the target rules, so as to extract the attribute word set and the sentiment word set in the clauses to be trained.
Further, the step of taking the word vectors of the attribute word and its context as the input of the multilayer neural network with the attention mechanism and the sentiment category label corresponding to the attribute word as its output, so as to build the fine-grained sentiment analysis model, includes:
taking the word vector of the attribute word and the word vectors of its context as the input of the attention layer of the first neural network layer of the attention mechanism, and obtaining the context information relevant to the sentiment of the attribute word;
summing, in the linear layer of the first neural network layer, the word vectors corresponding to the sentiment-relevant context information and the word vector of the attribute word, to obtain a summed result;
taking the summed result as the input of the next neural network layer and the sentiment category label corresponding to the attribute word as the output of the multilayer neural network, obtaining the parameters of the fine-grained sentiment analysis model, and building the fine-grained sentiment analysis model from those parameters.
Based on the above structure, embodiments of the fine-grained sentiment analysis model building method are proposed. The method is applied to the fine-grained sentiment analysis model building device, which may be a terminal device such as a PC or a portable computer. For brevity of description, the executing subject, namely the fine-grained sentiment analysis model building device, is omitted in the following embodiments of the method.
It is the flow diagram of fine granularity sentiment analysis model building method first embodiment of the present invention with reference to Fig. 2, Fig. 2.
An embodiment of the present invention provides embodiments of the fine-grained sentiment analysis model building method. It should be noted that although a logical order is shown in the flowchart, in some cases the steps may be performed in an order different from the order shown or described here.
The fine-grained sentiment analysis model building method includes:
Step S10: after a first preset quantity of clauses to be trained is obtained, performing a word segmentation operation on the clauses to be trained, and adding a part-of-speech tag to each word in the segmented clauses to be trained.
After the first preset quantity of clauses to be trained is obtained, the word segmentation operation is performed on each clause to be trained, to obtain the words in the clause. After the words in each clause to be trained are obtained, a part-of-speech tag is added to each word in the segmented clause to be trained.
Further, after an instruction to build the fine-grained sentiment analysis model is detected, a data set to be trained is crawled from the network according to the build instruction; specifically, e-commerce reviews, news comments, Taobao reviews and the like can be crawled from the network. After the data set to be trained is crawled, the punctuation marks, newline characters and the like in each sentence of the data set are identified and removed, to obtain the clauses to be trained in the data set, and a first preset quantity of clauses to be trained is selected for building the fine-grained sentiment analysis model. The first preset quantity can be set according to specific needs and is not specifically limited in this embodiment. During the identification of the punctuation marks and newline characters in each sentence of the data set to be trained, each sentence in the data set can be compared with pre-set punctuation marks and newline characters, so as to identify the punctuation marks and newline characters in each sentence of the data set.
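As a minimal Python sketch of this cleaning step, assuming a hypothetical punctuation set standing in for the embodiment's pre-set punctuation marks and newline characters (the function name and sample text are illustrative only):

```python
import re

def split_into_clauses(raw_text):
    """Split crawled review text into candidate clauses to be trained by
    removing newline characters and cutting on common punctuation."""
    text = raw_text.replace("\n", "")
    # Hypothetical set of full-width and half-width separators.
    clauses = re.split(r"[,。!?;,.!?;]", text)
    return [c.strip() for c in clauses if c.strip()]

reviews = "房间很舒服,服务很好。价格不便宜!"
print(split_into_clauses(reviews))
# ['房间很舒服', '服务很好', '价格不便宜']
```

Each resulting clause then goes through segmentation and tagging as described below.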
Further, step S10 includes:
Step a: after the first preset quantity of clauses to be trained is obtained, removing the irrelevant characters and stop words in the clauses to be trained, and performing the word segmentation operation on the clauses to be trained by a segmentation algorithm, to obtain the segmented clauses to be trained.
Further, after the first preset quantity of clauses to be trained is obtained, the irrelevant characters and stop words in the clauses to be trained are removed to obtain the processed clauses to be trained, and the word segmentation operation is performed on the processed clauses to be trained by a segmentation algorithm, to obtain the segmented clauses to be trained. Segmentation algorithms include but are not limited to dictionary-based segmentation algorithms, statistics-based segmentation algorithms, rule-based segmentation algorithms and the jieba segmentation algorithm. It should be noted that when the jieba segmentation algorithm is used to segment the clauses to be trained, the search-engine segmentation mode or the accurate mode of the jieba algorithm may be used.
During the removal of the irrelevant characters and stop words in the clauses to be trained, each word in the clause to be trained is compared with the words in a pre-set irrelevant-character database and stop-word database, and the words in the clause that appear in the irrelevant-character database or the stop-word database are removed. It can be understood that the irrelevant-character database and the stop-word database are pre-set and contain common irrelevant characters and stop words; for example, the irrelevant-character database contains "/", "~", "%" and the like, and the stop-word database contains interjections such as "oh" and "uh".
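A minimal sketch of this filtering, with hypothetical stand-ins for the pre-set irrelevant-character and stop-word databases (the contents shown are illustrative, not the embodiment's actual databases):

```python
# Hypothetical stand-ins for the pre-set databases.
IRRELEVANT_CHARS = set("/~%@^")
STOP_WORDS = {"嗯", "哦", "啊", "the", "a"}

def clean_clause(words):
    """Drop tokens found in the irrelevant-character or stop-word
    databases, keeping the remaining words in order."""
    return [w for w in words
            if w not in STOP_WORDS
            and not all(ch in IRRELEVANT_CHARS for ch in w)]

print(clean_clause(["嗯", "房间", "/", "很", "舒服"]))
# ['房间', '很', '舒服']
```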
Step b: adding a part-of-speech tag to each word of the segmented clauses to be trained.
After the segmented clauses to be trained are obtained, a part-of-speech tag is added to each word of each segmented clause to be trained. It should be noted that during the addition of the part-of-speech tags, the user may enter an adding instruction in the fine-grained sentiment analysis model building device, i.e. the tags are added manually by the user; or part-of-speech tagging may be performed automatically by the fine-grained sentiment analysis model building device, using the jieba segmentation algorithm, LTP part-of-speech tagging or the like. It can be understood that adding a part-of-speech tag to each word means indicating the part of speech of each word, i.e. determining whether each word is a noun, an adverb, an adjective, etc.
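A sketch of the tagging step under the assumption of a tiny hand-written POS lexicon; in practice the embodiment uses jieba's POS tagger or LTP rather than such a table, so the lexicon and placeholder tag below are purely illustrative:

```python
# Hypothetical mini POS lexicon: n = noun, d = adverb, a = adjective.
POS_LEXICON = {"房间": "n", "服务": "n", "很": "d", "舒服": "a", "好": "a"}

def tag_clause(words):
    """Attach a part-of-speech tag to each segmented word; words not in
    the lexicon get the placeholder tag 'x'."""
    return [(w, POS_LEXICON.get(w, "x")) for w in words]

print(tag_clause(["房间", "很", "舒服"]))
# [('房间', 'n'), ('很', 'd'), ('舒服', 'a')]
```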
Further, to unify the expression specification and improve the speed and accuracy of building the fine-grained sentiment analysis model, if traditional Chinese words exist in the clauses to be trained, the traditional Chinese words are uniformly converted into the corresponding simplified Chinese words; if both upper-case and lower-case English exist in the clauses to be trained, the upper-case English in the clauses is converted into the corresponding lower-case English, or the lower-case English is converted into the corresponding upper-case English, so that only upper-case English or only lower-case English exists in the clauses to be trained.
Step S20: obtaining a second preset quantity of attribute words and emotion words in the clauses to be trained, adding an attribute-word label for each attribute word and an emotion-word label for each emotion word, and determining the part-of-speech sequence corresponding to each clause to be trained.
A second preset quantity of attribute words and emotion words is obtained from the clauses to be trained; an attribute-word label is added for each attribute word, and an emotion-word label for each emotion word. It can be understood that the second preset quantity is smaller than the first preset quantity and can be determined according to the size of the first preset quantity and the actual application scenario, e.g. set to 10, 20 or 32. It should be noted that the second preset quantity is the seed for determining the attribute word set and the emotion word set; therefore only a small number of attribute words and emotion words need to be obtained, and the attribute word set and emotion word set of the clauses to be trained are then obtained through class sequential rules. In this embodiment, the user triggers an adding instruction in the fine-grained sentiment analysis model building device, and the device adds the attribute-word labels and the emotion-word labels according to the adding instruction, i.e. the labels are added manually by the user. The specific form of the attribute-word label and emotion-word label is not limited in this embodiment; for example, "#" may denote the attribute-word label and "*" the emotion-word label. For instance, if a feature clause is "the room is very comfortable", then "room" is determined to be the attribute word and "comfortable" the emotion word, "#" is added for "room" and "*" for "comfortable"; if a feature clause is "the service is very good", then "service" is the attribute word and "good" the emotion word, "#" is added for "service" and "*" for "good". After the attribute-word label and emotion-word label are added for a feature clause, that feature clause carries type labels.
After the attribute-word labels and emotion-word labels are added for the feature clauses among the clauses to be trained, the part-of-speech sequence corresponding to each clause to be trained is determined. If a clause to be trained contains one of the second preset quantity of attribute words and emotion words, its part-of-speech sequence carries the attribute-word label and emotion-word label; if a clause to be trained is an ordinary clause, i.e. contains none of the second preset quantity of attribute words and emotion words, its part-of-speech sequence carries only part-of-speech tags.
Further, the step of determining the part-of-speech sequence corresponding to each clause to be trained includes:
Step c: detecting whether the clause to be trained carries the attribute-word label and the emotion-word label.
Step d: if the clause to be trained carries the attribute-word label and the emotion-word label, replacing the attribute word in the clause with the attribute-word label and the emotion word in the clause with the emotion-word label, and combining the part-of-speech tags, attribute-word label and emotion-word label corresponding to each word in the clause into the part-of-speech sequence of the clause.
Step e: if the clause to be trained does not carry the attribute-word label and the emotion-word label, combining the part-of-speech tags corresponding to each word in the clause into the part-of-speech sequence of the clause.
Further, the specific process of determining the part-of-speech sequence of each clause to be trained may be as follows. It is detected whether each clause to be trained carries the attribute-word label and the emotion-word label. If the clause to be trained carries the attribute-word label and the emotion-word label, the attribute word in the clause is replaced by the attribute-word label, and the emotion word in the clause by the emotion-word label. It can be understood that a clause to be trained only needs the attribute-word replacement if it carries one of the second preset quantity of attribute words, and only needs the emotion-word replacement if it carries one of the corresponding emotion words. The part-of-speech tags, attribute-word label and emotion-word label corresponding to each word in the clause are then combined into the part-of-speech sequence of the clause. If the clause to be trained carries neither the attribute-word label nor the emotion-word label, the label positions are left empty, i.e. the part-of-speech tags corresponding to each word in the clause are directly combined into the part-of-speech sequence of the clause.
For example, after part-of-speech tagging of the clauses to be trained "the room is very comfortable" and "the service is very good", the results are "room/n, very/d, comfortable/a" and "service/n, very/d, good/a", where "n" denotes a noun, "d" an adverb and "a" an adjective; the corresponding part-of-speech sequences are "#/n, /d, */a" and "#/n, /d, */a", both expressed as "#nd*a". If a clause to be trained is "the price is not cheap" but does not contain the attribute-word label and emotion-word label, then after part-of-speech tagging of this clause the result is "price/n, not/d, cheap/a"; the corresponding part-of-speech sequence is "/n, /d, /a", expressed as "nda".
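The construction of these sequences can be sketched as follows, assuming single-character POS tags and using the example's "#"/"*" label convention (the function name and sample words are illustrative):

```python
def to_pos_sequence(tagged, attribute_words=(), emotion_words=()):
    """Build a clause's part-of-speech sequence, prefixing '#' for
    labelled attribute words and '*' for labelled emotion words,
    as in 'room is very comfortable' -> '#nd*a'."""
    parts = []
    for word, pos in tagged:
        if word in attribute_words:
            parts.append("#" + pos)
        elif word in emotion_words:
            parts.append("*" + pos)
        else:
            parts.append(pos)
    return "".join(parts)

tagged = [("房间", "n"), ("很", "d"), ("舒服", "a")]
print(to_pos_sequence(tagged, {"房间"}, {"舒服"}))  # '#nd*a'
print(to_pos_sequence([("价格", "n"), ("不", "d"), ("便宜", "a")]))  # 'nda'
```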
Step S30: mining target rules according to the part-of-speech sequences containing the attribute-word label and/or the emotion-word label, and extracting the attribute word set and emotion word set in the clauses to be trained according to the target rules.
After the part-of-speech sequence of each clause to be trained is obtained, target rules are mined from the part-of-speech sequences containing the attribute-word label and/or the emotion-word label, and the attribute word set and emotion word set in the clauses to be trained are extracted according to the mined target rules. It should be noted that after a target rule is mined, in any part-of-speech sequence satisfying the target rule, the word at the position of the attribute-word label is recorded as an attribute word, and the word at the position of the emotion-word label as an emotion word. For example, when a clause to be trained satisfies the mined target rule "<#<n><d>*<a>>", the word corresponding to "n" in that clause is determined to be an attribute word, and the word corresponding to "a" an emotion word.
Specifically, the target rules containing type labels are mined from the part-of-speech sequences by class sequential rules (Class Sequential Rules, CSR), i.e. target rules are mined from the part-of-speech sequences containing the attribute-word label and/or the emotion-word label. A class sequential rule is a rule composed of type labels (i.e. attribute-word labels and/or emotion-word labels) and part-of-speech sequence data; the two constitute a mapping relation, formalized as X → Y. The mapping relation is described as follows: X is a part-of-speech sequence, expressed as <s1x1s2x2…sixi>, where S denotes the part-of-speech sequence library, which is a set of tuples <sid, s>; sid is the identifier of a part-of-speech sequence in the library, e.g. the sid of the first part-of-speech sequence in the library is 1 and that of the second is 2; s denotes a part-of-speech sequence; and xi denotes the possible class corresponding to this part-of-speech sequence. Y is the part-of-speech sequence containing type labels, expressed as <s1c1s2c2…sici>, where ci ∈ C and 1 ≤ i ≤ r; s is defined as above, ci is the determined type label, and C = {c1, c2, …, cr} is the set of type labels. In this embodiment, CSR requires the presence of part-of-speech sequences carrying the attribute-word label and/or the emotion-word label. After the attribute-word labels and emotion-word labels are determined, CSR mines the part-of-speech sequences satisfying a preset support threshold and a preset confidence threshold as the target rules.
Further, the step of mining target rules according to the part-of-speech sequences containing the attribute-word label and/or the emotion-word label includes:
Step f: determining the target part-of-speech sequences containing the attribute-word label and/or the emotion-word label among the part-of-speech sequences.
Step g: calculating a first sequence quantity of target part-of-speech sequences satisfying the same rule, and determining, among the part-of-speech sequences other than the target part-of-speech sequences, a second sequence quantity of sequences satisfying the rule to be determined, where the rule to be determined is the rule satisfied by the first sequence quantity of target part-of-speech sequences.
Further, the specific process of mining target rules according to the part-of-speech sequences containing the attribute-word label and/or the emotion-word label is as follows. The part-of-speech sequences containing the attribute-word label and/or the emotion-word label among the part-of-speech sequences corresponding to the clauses to be trained are determined and recorded as target part-of-speech sequences, and the quantity of target part-of-speech sequences satisfying the same rule is calculated and recorded as the first sequence quantity. During this calculation, whether target part-of-speech sequences satisfy the same rule is determined according to their forms of expression: if the forms of expression of two part-of-speech sequences are consistent, the two sequences satisfy the same rule. For example, when three part-of-speech sequences are "<abd*gh>", "<#kea>" and "<ab*fgh>" respectively, it can be determined that "<abd*gh>" and "<ab*fgh>" both satisfy the rule "<<ab>x<gh>> → <<ab>*<gh>>", while "<#kea>" does not satisfy the rule "<<ab>*<gh>>". It should be noted that each letter in a rule or part-of-speech sequence denotes the part of speech of the word at the corresponding position.
After the first sequence quantity of target part-of-speech sequences satisfying the same rule is calculated, the rule satisfied by those target part-of-speech sequences is recorded as the rule to be determined; among the part-of-speech sequences corresponding to the clauses to be trained other than the target part-of-speech sequences, the sequences satisfying the rule to be determined are determined, and their quantity is recorded as the second sequence quantity. For example, when the part-of-speech sequences are "<abeghk>" and "<d#kb>", it can be determined that "<abeghk>" satisfies the rule to be determined "<<ab>*<gh>>" while "<d#kb>" does not. It can be understood that since the part-of-speech sequences counted in the second sequence quantity contain no attribute-word label or emotion-word label, when calculating how many sequences without such labels satisfy the rule to be determined, the attribute-word label and/or emotion-word label in the rule need not be considered.
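One way to sketch this matching step is to encode a rule's sequence part as a regular expression; the encoding below (stripping labels before matching, and using "." for the rule's wildcard slot) is an assumption for illustration, not the patent's actual matching procedure:

```python
import re

def satisfies(seq, pattern):
    """Check a part-of-speech sequence against a mined rule pattern,
    ignoring the '#'/'*' labels so that unlabelled sequences can also
    be matched against the rule's sequence part."""
    bare = seq.replace("#", "").replace("*", "")
    return re.search(pattern, bare) is not None

# Pattern for the rule <<ab>x<gh>>: 'ab', exactly one tag, then 'gh'.
rule = r"ab.gh"
print(satisfies("abd*gh", rule))   # True
print(satisfies("ab*fgh", rule))   # True
print(satisfies("#kea", rule))     # False
print(satisfies("abeghk", rule))   # True
```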
Step h: calculating a support according to the total sequence quantity of the part-of-speech sequences and the first sequence quantity, and calculating a confidence according to the second sequence quantity and the first sequence quantity.
After the first sequence quantity and the second sequence quantity are calculated, the total quantity of part-of-speech sequences of the clauses to be trained is calculated, and the first sequence quantity is divided by the total sequence quantity to obtain the support corresponding to the target rule; the first sequence quantity and the second sequence quantity are added to obtain their sum, and the first sequence quantity is divided by the sum of the first sequence quantity and the second sequence quantity to obtain the confidence corresponding to the target rule.
Step i: if the support is greater than or equal to the preset support threshold and the confidence is greater than or equal to the preset confidence threshold, taking the rule to be determined as a target rule.
After the support and confidence of the rule to be determined are calculated, it is judged whether the calculated support is greater than or equal to the preset support threshold, and whether the calculated confidence is greater than or equal to the preset confidence threshold. If the calculated support is greater than or equal to the preset support threshold and the calculated confidence is greater than or equal to the preset confidence threshold, the rule to be determined is taken as a target rule; if the calculated support is less than the preset support threshold and/or the calculated confidence is less than the preset confidence threshold, the rule to be determined is not taken as a target rule. The preset support threshold and preset confidence threshold can be set according to specific needs, and are not specifically limited in this embodiment.
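The support and confidence computations of steps h and i can be sketched directly from the definitions above; the function name and the sample counts and thresholds are illustrative assumptions:

```python
def mine_rule_stats(labelled_matches, unlabelled_matches, total_sequences):
    """support = first sequence quantity / total sequence quantity;
    confidence = first sequence quantity / (first + second sequence
    quantity), following the embodiment's definitions."""
    support = labelled_matches / total_sequences
    confidence = labelled_matches / (labelled_matches + unlabelled_matches)
    return support, confidence

# E.g. 4 labelled sequences match the rule, 1 unlabelled sequence
# matches, and there are 20 part-of-speech sequences in total.
support, confidence = mine_rule_stats(4, 1, 20)
print(support, confidence)  # 0.2 0.8

MIN_SUPPORT, MIN_CONFIDENCE = 0.1, 0.6   # hypothetical thresholds
print(support >= MIN_SUPPORT and confidence >= MIN_CONFIDENCE)  # True
```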
Further, the step of extracting the attribute word set and emotion word set in the clauses to be trained according to the target rules includes:
Step j: determining the clauses among the clauses to be trained to which the attribute-word label and/or the emotion-word label has been added, and recording them as target clauses.
Step k: matching the part-of-speech sequences of the clauses to be trained other than the target clauses against the target rules, to extract the attribute word set and emotion word set in the clauses to be trained.
Further, the process of extracting the attribute word set and emotion word set in the clauses to be trained according to the target rules is as follows. The clauses among the clauses to be trained to which the attribute-word label and/or emotion-word label has been added are determined and recorded as target clauses, and the part-of-speech sequences of the clauses other than the target clauses are matched against the determined target rules, to obtain the attribute words and emotion words in the clauses other than the target clauses, thereby extracting the attribute word set and emotion word set from the clauses to be trained. For example, if the part-of-speech sequence "<fabeghk>" of a clause to be trained satisfies the target rule "<<ab>*<gh>>", it can be determined that in that clause the word corresponding to the part of speech "e" is an emotion word.
In the resulting attribute word set and emotion word set, there may be attribute words and emotion words corresponding to those added in the target clauses, as well as attribute words and emotion words not added in the target clauses.
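A sketch of step k's extraction, assuming single-character POS tags so that positions in the POS string map directly to word indices; the regex group names "attr"/"emo" and the rule encoding are illustrative assumptions:

```python
import re

def extract_by_rule(tagged_clause, rule_pattern):
    """Given a tagged clause and a rule pattern whose named groups mark
    the attribute ('#') and emotion ('*') slots, return the matched
    attribute word and emotion word, or (None, None) if no match."""
    pos_string = "".join(pos for _, pos in tagged_clause)
    m = re.search(rule_pattern, pos_string)
    if not m:
        return None, None
    attr = tagged_clause[m.start("attr")][0]
    emo = tagged_clause[m.start("emo")][0]
    return attr, emo

# Encoding of the rule <#<n><d>*<a>>: the noun is the attribute word,
# the adjective the emotion word.
rule = r"(?P<attr>n)d(?P<emo>a)"
tagged = [("价格", "n"), ("不", "d"), ("便宜", "a")]
print(extract_by_rule(tagged, rule))  # ('价格', '便宜')
```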
Further, after the attribute word set, or the attribute word set and emotion word set, is extracted, corresponding attribute-word labels and emotion-word labels are added to the corresponding part-of-speech sequences. In order to obtain a complete attribute word set and emotion word set, after the corresponding attribute-word labels and emotion-word labels are added to the part-of-speech sequences, the process returns to the rule-mining step S30; each time the rule-mining step is re-executed, the preset support threshold is set larger than in the previous execution, so as to ensure the accuracy of the mined target rules, and thereby make the attribute words and emotion words extracted by the mined target rules more accurate. In this embodiment, the class sequential rule mining borrows the idea of semi-supervised learning: through multiple rounds of iterative training, similar to "snowballing", new training sets (clauses to be trained that satisfy the target rules) are continuously labelled and rules are iteratively mined, after which the final attribute word set and emotion word set are obtained, better guaranteeing the precision and recall of CSR. Moreover, since what is mined are rules and part-of-speech sequences are highly general, the generalization performance of CSR is very good.
Step S40: adding a sentiment category label to each attribute word in the attribute word set according to the corresponding emotion word in the emotion word set.
After the emotion word set and attribute word set are obtained, a sentiment category label is added to each attribute word in the attribute word set according to its corresponding emotion word in the emotion word set. In this embodiment there are two sentiment category labels: one for positive sentiment, whose label may be set to 1, and one for negative sentiment, whose label may be set to -1. It can be understood that the forms of the labels corresponding to positive and negative sentiment are not restricted to 1 and -1. In other embodiments, the sentiment category labels may be divided into 3 or 4 kinds; for example, 3 kinds: good, medium and poor. It should be noted that in this embodiment one attribute word has one corresponding emotion word; therefore, when an attribute word in a clause to be trained is determined, since attribute words and emotion words are in correspondence, the emotion word corresponding to the attribute word can be determined accordingly.
During the addition of sentiment category labels to the attribute words in the attribute word set, the user may enter an adding instruction in the fine-grained sentiment analysis model building device, i.e. the sentiment category labels are added manually by the user; or vocabularies corresponding to the different sentiment category labels may be pre-set in the fine-grained sentiment analysis model building device, and the emotion word corresponding to each attribute word is compared with the pre-set sentiment-category vocabularies to determine the sentiment category label of each attribute word. For example, the pre-set positive-sentiment vocabulary may contain "comfortable, good, cheap, worthwhile, big", and the negative-sentiment vocabulary "small, poor, not worthwhile, rotten, low", etc. When the emotion word corresponding to an attribute word is determined to be in the positive-sentiment vocabulary, the label corresponding to positive sentiment is added for that attribute word; when the emotion word corresponding to an attribute word is determined to be in the negative-sentiment vocabulary, the label corresponding to negative sentiment is added for that attribute word.
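This lexicon-based labelling can be sketched as follows, with hypothetical polarity lexicons standing in for the pre-set vocabularies (the label 0 for unknown words is an added assumption, not stated in the embodiment):

```python
# Hypothetical polarity lexicons standing in for the pre-set vocabularies.
POSITIVE_WORDS = {"舒服", "好", "便宜", "值", "大"}
NEGATIVE_WORDS = {"小", "差", "不值", "烂", "低"}

def label_polarity(attribute_word, emotion_word):
    """Assign the sentiment category label to an attribute word from its
    paired emotion word: 1 for positive, -1 for negative, 0 if unknown."""
    if emotion_word in POSITIVE_WORDS:
        return (attribute_word, 1)
    if emotion_word in NEGATIVE_WORDS:
        return (attribute_word, -1)
    return (attribute_word, 0)

print(label_polarity("房间", "舒服"))  # ('房间', 1)
print(label_polarity("价格", "低"))    # ('价格', -1)
```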
Step S50: performing vectorized representation on each attribute word in the attribute word set and the contextual information corresponding to each attribute word, to obtain the term vectors corresponding to the attribute words and the contextual information.
After the attribute word set in the clauses to be trained is obtained, the contextual information corresponding to the attribute word in each clause to be trained is determined, and the attribute words and the corresponding contextual information are represented as vectors, to obtain the term vector of each attribute word and of its corresponding contextual information. The contextual information consists of the context words relevant to the attribute word. Specifically, the term vectors of the attribute words and the corresponding contextual information can be obtained by the word2vec tool. word2vec can be trained efficiently on dictionaries of the order of millions of entries and data sets of hundreds of millions of items, and its training result, the term vectors (word embeddings), can measure the similarity between words well. word2vec mainly comprises two models, CBOW (Continuous Bag of Words) and Skip-Gram. CBOW infers the target word from the original statement: the CBOW model is equivalent to multiplying the vector of a bag of words by an embedding matrix to obtain a continuous embedding vector. Skip-Gram is the opposite: it infers the original statement from the target word. It can be understood that in this embodiment the language processing tool may also be another tool that can realize the same functions as word2vec.
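To illustrate how term vectors measure word similarity, a toy sketch with hand-made vectors standing in for trained word2vec embeddings (in the embodiment these would come from CBOW or Skip-Gram training on the corpus; the vectors below are purely illustrative):

```python
import numpy as np

# Toy vectors standing in for trained word2vec embeddings.
vectors = {
    "房间": np.array([0.9, 0.1, 0.2]),
    "客房": np.array([0.8, 0.2, 0.1]),
    "便宜": np.array([0.1, 0.9, 0.7]),
}

def cosine(u, v):
    """Cosine similarity, the usual way embeddings measure relatedness."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Near-synonyms should score higher than unrelated words.
print(cosine(vectors["房间"], vectors["客房"]))  # high (~0.99)
print(cosine(vectors["房间"], vectors["便宜"]))  # low (~0.30)
```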
Step S60: using the term vectors corresponding to the attribute words and the contextual information as the input of the multilayer neural network of the attention mechanism, and using the sentiment category labels corresponding to the attribute words as the output of the multilayer neural network of the attention mechanism, to build the fine-grained sentiment analysis model.
After the term vector of each attribute word in the attribute word set and of its corresponding contextual information is obtained, the term vectors corresponding to the attribute words and the contextual information are used as the input of the multilayer neural network of the attention mechanism; the emotion word in the same clause as each attribute word input into the multilayer neural network of the attention mechanism is determined, and the sentiment category label of that emotion word is used as the final output of the multilayer neural network of the attention mechanism. That is, with the sentiment category label corresponding to the attribute word as the output of the multilayer neural network of the attention mechanism, each parameter in the sentiment analysis model is obtained, and the fine-grained sentiment analysis model is built. The number of neural network layers in the attention mechanism can be set according to specific needs, e.g. 3, 4 or 6 layers.
Further, step S60 includes:
Step l: using the term vector of the attribute word and the term vectors corresponding to the contextual information as the input of the attention layer of the first neural network layer of the attention mechanism, to obtain the contextual information relevant to the sentiment of the attribute word.
Further, after the term vector of the attribute word and the term vectors of the corresponding contextual information are obtained, they are used as the input of the attention layer in the first neural network layer of the attention mechanism, to obtain the contextual information relevant to the sentiment of the attribute word. Each neural network layer contains an attention layer and a linear layer. After the term vectors corresponding to the attribute word are input into the attention layer of the neural network, the contextual information relevant to the sentiment of the attribute word is output.
Step m: summing, in the linear layer of the first neural network layer, the term vectors of the contextual information relevant to the sentiment of the attribute word and the term vector of the attribute word, to obtain a summation result.
After the contextual information relevant to the sentiment of the attribute word is obtained, it is recorded as the relevant information, and the term vectors of the relevant information are determined from the term vectors of the contextual information relevant to the sentiment of the attribute word. The term vector of the attribute word and the term vectors of the relevant information are used as the input of the linear layer of the first-layer neural network; in the linear layer of the first-layer neural network, the term vectors of the relevant information and the corresponding term vector of the attribute word are linearly summed to obtain the summation result. It can be understood that this summation result is the output of the linear layer of the first-layer neural network. The relevant information is words relevant to the sentiment of the attribute word, or compressed, numericalized information, used to represent the information relevant to the sentiment of the attribute word.
As having 6, respectively A, B, C, D, E and F when some corresponding contextual information of attribute word.If according to first floor nerve Network notices that the output result of layer determines that this 5 contextual informations of A, C, D, E and F are relevant information, then by this 5 relevant informations The term vector of corresponding term vector and attribute word inputs in the linear layer of first floor neural network.
Step n: take the summation result as the input of the next neural network layer, and take the sentiment category label corresponding to the attribute word as the output result of the multi-layer neural network of the attention mechanism, so as to obtain the parameters of the fine-grained sentiment analysis model and build the model from those parameters.
After the summation result is obtained, its word vector is used as the input of the attention layer of the next neural network layer in the attention mechanism. The contextual information even more relevant to the sentiment of the attribute word is determined from the output of that attention layer, and its word vectors, together with the word vector of the attribute word, are input to the linear layer of that network layer. This iterates layer by layer (the output of one layer's linear layer becomes the input of the next layer's attention layer), until the output result of the last neural network layer of the attention mechanism is the sentiment category label corresponding to the attribute word. By continually training on the attribute words in the attribute word set, the parameters of the attention mechanism, i.e. the parameters of the fine-grained sentiment analysis model, are obtained. It can be understood that once the parameters of the fine-grained sentiment analysis model have been obtained, the model has been built successfully.
It should be noted that, so that the constructed fine-grained sentiment analysis model can judge the sentiment of the clause corresponding to an attribute word, this embodiment builds a multi-layer neural network for sentiment scoring using a deep learning method, and introduces an attention mechanism to focus on the contextual information relevant to the sentiment of the attribute word, extracting the corresponding sentiment-orientation information with which the sentiment judgment of the attribute word is made.
The attention mechanism extracts, layer by layer, the contextual information most important for classifying the sentiment of the attribute word. Different words matter to a sentence to different degrees: stop-words, for instance, occur in many sentences, so their TF-IDF (term frequency-inverse document frequency) values are very small and they contribute little to the sentence. Using TF-IDF values as term weights, the word vectors can be summed, weighted by their corresponding words' values, to obtain a sentence vector that reflects each word's contribution to the sentence representation. To obtain the contextual information relevant to the sentiment of the attribute word, the attention mechanism is therefore essentially a weighted average: given the word vectors Vi (i = 1, 2, ..., n) of n m-dimensional attribute words, the word-vector information of all attribute words is integrated so as to obtain word vectors as relevant as possible to the sentiment category labels learned by the fine-grained sentiment analysis model. In this embodiment, to improve the accuracy of this integration, the attention mechanism computes the weights of the attribute word and of the contextual information relevant to the sentiment of the attribute word.
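The TF-IDF-weighted sentence vector described above may be sketched as follows; the toy corpus, the random word vectors and the function name are illustrative assumptions, not part of the patent:

```python
import math
import numpy as np

def tfidf_sentence_vector(sentence, corpus, word_vecs):
    """Sentence vector = TF-IDF-weighted sum of word vectors, normalized
    by the total weight, so that words appearing in many documents
    (e.g. stop-words, whose IDF is low) contribute little."""
    n_docs = len(corpus)
    dim = len(next(iter(word_vecs.values())))
    vec, total = np.zeros(dim), 0.0
    for w in set(sentence):
        tf = sentence.count(w) / len(sentence)          # term frequency
        df = sum(1 for doc in corpus if w in doc)       # document frequency
        idf = math.log(n_docs / (1 + df)) + 1           # smoothed inverse df
        weight = tf * idf
        vec += weight * word_vecs[w]
        total += weight
    return vec / total

corpus = [["the", "service", "is", "great"],
          ["the", "price", "is", "high"],
          ["great", "price"]]
rng = np.random.default_rng(42)
word_vecs = {w: rng.normal(size=4) for doc in corpus for w in doc}
sv = tfidf_sentence_vector(corpus[0], corpus, word_vecs)
```

Frequent words such as "the" receive a smaller IDF and therefore a smaller share of the sentence vector than content words such as "service".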
To compute the respective weights of the attribute word and the contextual information, a scoring function F(x) is first designed. Its input is the word vector of the attribute word and the word vectors of the context, and its output is a score, i.e. the corresponding weight. The score is computed from the degree of correlation between the input word vector and the object attended to by the attention mechanism: the higher the correlation between the word vector input to the scoring function and the attended object, the larger the score. It should be understood that in this embodiment the attended object is the attribute word. The word vector of the attribute word is denoted Vi, the word vector of the context is denoted Vt, and b is a parameter set in the neural network. The input of the activation function combines these features, where the activation function may be tanh, relu, or a similar function. The corresponding scoring function can be: F(x) = activation(W1Vi + W2Vt + b).
After the scores have been obtained from the scoring function, a classification activation function is applied to them to output the corresponding weights. For the first neural network layer, the most important information is the attribute word itself, so the input of the first layer of the attention mechanism is the word vector of the attribute word, and the output is the contextual information relevant to the sentiment of the given attribute word. The word vector of the attribute word and the weighted sum of the context word vectors produced by the first-layer neural network then form the input of the attention layer of the next neural network layer. Through the multi-layer neural network, the contextual information most important to the sentiment of the attribute word is extracted, and finally the contextual information is classified by sentiment, i.e. it is determined whether the sentiment is derogatory or commendatory, and the sentiment category label is added.
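The scoring, weighting and layer-by-layer iteration described above can be sketched with a minimal numpy illustration; the shapes, weight initializations and number of layers are assumptions for demonstration, not the patent's actual network, and the weight names W1, W2, b follow the formula F(x) = activation(W1Vi + W2Vt + b):

```python
import numpy as np

def score(query, context, W1, W2, b):
    """F(x) = tanh(W1*Vi + W2*Vt + b), reduced to one scalar per context word."""
    return np.tanh(query @ W1.T + context @ W2.T + b).sum(axis=1)

def attention_layer(query, context, W1, W2, b):
    """Softmax the scores into weights, return the weighted average of context."""
    s = score(query, context, W1, W2, b)
    weights = np.exp(s - s.max()) / np.exp(s - s.max()).sum()
    return weights @ context

rng = np.random.default_rng(0)
m, n = 8, 6                               # vector dimension, context size
attr_vec = rng.normal(size=m)             # Vi: word vector of the attribute word
context = rng.normal(size=(n, m))         # Vt: word vectors of the context
W1, W2 = rng.normal(size=(m, m)), rng.normal(size=(m, m))
b = rng.normal(size=m)

query = attr_vec
for _ in range(3):                        # three stacked attention layers
    relevant = attention_layer(query, context, W1, W2, b)
    query = relevant + attr_vec           # linear layer: sum with attribute vector
# a final classifier on `query` would emit the sentiment category label
```

The output of each layer's linear summation becomes the query of the next attention layer, mirroring the loop iteration described for steps l to n.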
In this embodiment, during fine-grained sentiment analysis the attribute words and sentiment words are extracted with class sequence rules, which improves the accuracy of attribute word and sentiment word extraction, lets the mined class sequence rules (the target rules in this embodiment) change with the training texts of the application field, improves the generalization ability of the constructed fine-grained sentiment analysis model, and gives the constructed model good scalability. The class sequence rules solve the problem of long-distance dependency between sentiment words and attribute words, i.e. between evaluated objects and evaluating words, and the attention-mechanism neural network extracts the contextual information relevant to the sentiment of the attribute word, thereby realizing fine-grained sentiment analysis.
Further, a second embodiment of the fine-grained sentiment analysis model construction method of the present invention is proposed.
The second embodiment differs from the first embodiment in that, referring to Fig. 3, the fine-grained sentiment analysis model construction method further includes:
Step S70: obtain a third preset quantity of clauses to be tested, and extract the attribute words in the clauses to be tested according to the target rules.
After the fine-grained sentiment analysis model has been built successfully, a third preset quantity of clauses to be tested is obtained, and the attribute words in the clauses to be tested are extracted according to the target rules. The third preset quantity may be equal to or smaller than the first preset quantity. Further, the sentiment words in the clauses to be tested can also be extracted according to the target rules. The principle of extracting the attribute words and sentiment words from the clauses to be tested according to the target rules is consistent with that of extracting the attribute word set from the clauses to be trained according to the target rules, and is not repeated in this embodiment.
It should be noted that after the clauses to be tested have been obtained, the irrelevant characters, stop-words and so on in them need to be removed, a word segmentation operation needs to be performed on them, and a part-of-speech label needs to be added for each word in them. The detailed process is consistent with the handling of the clauses to be trained in the first embodiment and is not repeated in this embodiment.
Step S80: perform vectorized representation of the attribute words of each clause to be tested and the corresponding contextual information, input them into the fine-grained sentiment analysis model, and correspondingly obtain the sentiment category labels of the attribute words in the clauses to be tested.
After the attribute words of each clause to be tested have been obtained, the contextual information corresponding to each attribute word in the clause is determined, and the attribute words of the clauses to be tested and the corresponding contextual information are vectorized to obtain the word vectors corresponding to the attribute words and the contextual information. These word vectors are then input into the constructed fine-grained sentiment analysis model to obtain the sentiment category label corresponding to each clause to be tested. The output of the fine-grained sentiment analysis model is the sentiment category label of the clause to be tested that corresponds to the input attribute word.
Step S90: compare the sentiment category labels corresponding to the attribute words of the clauses to be tested with the preset sentiment category labels of those attribute words, and determine, according to the comparison results, the accuracy of the fine-grained sentiment analysis model in analyzing text sentiment types.
After the sentiment category labels of the clauses to be tested have been obtained, the sentiment category labels corresponding to the attribute words of the clauses to be tested are compared with the preset sentiment category labels of those attribute words to obtain comparison results, and the accuracy of the fine-grained sentiment analysis model in analyzing text sentiment types is determined from the comparison results. The preset sentiment category labels of the attribute words of the clauses to be tested are set in advance by the user. For a clause to be tested, if the sentiment category label output by the fine-grained sentiment analysis model is consistent with the preset sentiment category label, the model's sentiment analysis of that clause can be judged correct; if they are inconsistent, the model's sentiment analysis of that clause can be judged wrong. The accuracy of the fine-grained sentiment analysis model in analyzing text sentiment types is then determined from the numbers of correctly and incorrectly analyzed clauses among all the clauses to be tested.
For example, suppose there are 100 clauses to be tested and the fine-grained sentiment analysis model analyzes the attribute words of 83 of them correctly, i.e. the sentiment category labels it outputs for those 83 attribute words are consistent with the corresponding preset sentiment category labels, while it analyzes the remaining 17 incorrectly, i.e. the sentiment category labels it outputs for those 17 attribute words are inconsistent with the corresponding preset labels. Then the accuracy of the fine-grained sentiment analysis model in analyzing text sentiment types is determined to be 83%.
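The accuracy computation in the example above reduces to a label comparison; a trivial sketch (label values are illustrative assumptions):

```python
def sentiment_accuracy(predicted_labels, preset_labels):
    """Fraction of test clauses whose predicted sentiment category label
    matches the preset (user-provided) label."""
    assert len(predicted_labels) == len(preset_labels)
    hits = sum(p == t for p, t in zip(predicted_labels, preset_labels))
    return hits / len(predicted_labels)

# 83 of 100 predictions agree with the preset labels -> accuracy 0.83
preds = ["commendatory"] * 83 + ["derogatory"] * 17
truth = ["commendatory"] * 100
acc = sentiment_accuracy(preds, truth)
print(acc)  # 0.83
```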
In this embodiment, after the fine-grained sentiment analysis model has been built successfully, the accuracy of the constructed model in analyzing text sentiment types is obtained by testing with the clauses to be tested, so that when the user finds that accuracy relatively low, more clauses to be trained can be obtained to train the fine-grained sentiment analysis model and thereby improve the accuracy of the constructed model in analyzing text sentiment types.
Further, a third embodiment of the fine-grained sentiment analysis model construction method of the present invention is proposed.
The third embodiment differs from the first and second embodiments in that the fine-grained sentiment analysis model construction method further includes:
Step o: obtain the clause quantity of the clauses to be trained and a preset support rate.
Step p: compute the product of the clause quantity and the preset support rate, and use the product as the preset support threshold.
In mining class sequence rules, a class sequence rule (CSR) first determines the class and then mines the target rules according to the class. In a class sequence rule, the left side is a sequence pattern and the right side is the corresponding type label; this mapping binds the sequence pattern to the class information. The goal of CSR mining is to find sequence patterns highly correlated with the class information, i.e. to mine the rules connecting sequence patterns with classes. The characteristic of class sequence rule mining algorithms is thus that they are supervised and the classes are given in advance. Sequential pattern mining algorithms such as GSP (Generalized Sequential Patterns) and PrefixSpan can be used for CSR mining. The PrefixSpan algorithm, based on frequent-pattern mining, mines the frequent sequential patterns that satisfy the minimum support (i.e. the preset support threshold). At the same time, considering that sequence lengths differ greatly across sequence patterns, using a single fixed minimum support for class sequence rule mining is inappropriate: to mine low-frequency sequences, the support threshold would have to be lowered, which would introduce a large number of rules generated by high-frequency words, i.e. introduce noise.
To avoid this problem, the present embodiment uses a multiple-minimum-support strategy: the clause quantity of the clauses to be trained and the preset support rate are obtained, the product of the clause quantity and the preset support rate is computed, and the product is used as the preset support threshold, i.e. as the minimum support. For example, when the preset support rate is a and the clause quantity of the clauses to be trained is n, the preset support threshold is min_sup = a × n, where extensive experiments show that the value of a can be set between 0.01 and 0.1.
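Under the strategy above (min_sup = a × n), filtering candidate sequence patterns against the threshold might look like the following sketch; the pattern counts and the support rate are illustrative assumptions:

```python
def min_support_threshold(n_clauses, support_rate):
    """min_sup = a * n, with a typically chosen between 0.01 and 0.1."""
    return support_rate * n_clauses

def frequent_patterns(pattern_counts, n_clauses, support_rate):
    """Keep only candidate POS-tag sequences whose occurrence count
    reaches the minimum support threshold."""
    min_sup = min_support_threshold(n_clauses, support_rate)
    return {p: c for p, c in pattern_counts.items() if c >= min_sup}

# occurrence counts of candidate POS-tag sequences over 1000 training clauses
counts = {("n", "d", "a"): 120, ("n", "a"): 300, ("d", "v"): 12}
kept = frequent_patterns(counts, n_clauses=1000, support_rate=0.05)
# with a = 0.05 and n = 1000, min_sup = 50, so ("d", "v") is pruned
```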
It should be noted that the larger a is, the higher the precision of the mined target rules, and iterating the mining several times can ensure the recall of the mined target rules. Pure combination items that contain type labels, such as the part-of-speech sequence "#/n, /d, */a", are extracted individually to obtain the target rules, where the type labels are the attribute word label and the sentiment word label. A pure combination item is a part-of-speech sequence formed by the words of the same clause to be trained; a part-of-speech sequence formed by the words of different clauses to be trained is a non-pure combination item; and a part-of-speech sequence that contains words of the same clause to be trained and also words of different clauses to be trained is a mixed item. Pure combination items, non-pure combination items and mixed items are distinguished by the intervals between the part-of-speech sequences.
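A class sequence rule such as "#/n, /d, */a" pairs a part-of-speech pattern with type labels, "#" marking the attribute word position and "*" the sentiment word position. A hedged sketch of applying such a rule to a POS-tagged clause follows; the list representation of rules, the greedy subsequence matching and the example clause are illustrative assumptions, not the patent's exact matching procedure:

```python
def apply_rule(tagged_words, rule):
    """Match `rule` (a list of (marker, pos) pairs, marker in {'#', '*', ''})
    as a subsequence of the clause's POS tags; return the words captured at
    the '#' (attribute word) and '*' (sentiment word) positions, or None."""
    attr = senti = None
    i = 0
    for word, pos in tagged_words:
        if i < len(rule) and pos == rule[i][1]:
            if rule[i][0] == "#":
                attr = word
            elif rule[i][0] == "*":
                senti = word
            i += 1
    return (attr, senti) if i == len(rule) else None

# rule "#/n, /d, */a": noun attribute word, adverb, adjective sentiment word
rule = [("#", "n"), ("", "d"), ("*", "a")]
clause = [("service", "n"), ("is", "v"), ("very", "d"), ("good", "a")]
print(apply_rule(clause, rule))  # ('service', 'good')
```

Because the rule is matched as a subsequence rather than a contiguous span, it tolerates intervening words, which is how class sequence rules address long-distance dependency between attribute words and sentiment words.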
In addition, an embodiment of the present invention also proposes a computer-readable storage medium on which a fine-grained sentiment analysis model construction program is stored; when the fine-grained sentiment analysis model construction program is executed by a processor, the steps of the fine-grained sentiment analysis model construction method described above are implemented.
The specific implementations of the computer-readable storage medium of the present invention are essentially the same as the embodiments of the fine-grained sentiment analysis model construction method described above, and are not described in detail here.
It should be noted that, herein, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element limited by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes the element.
The serial numbers of the embodiments of the present invention are for description only and do not represent the merits of the embodiments.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disk), including several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of the invention. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included in the protection scope of the present invention.

Claims (10)

1. A fine-grained sentiment analysis model construction method, characterized in that the fine-grained sentiment analysis model construction method comprises the following steps:
when a first preset quantity of clauses to be trained is obtained, performing a word segmentation operation on the clauses to be trained, and adding a part-of-speech label for each word in the segmented clauses to be trained;
obtaining a second preset quantity of attribute words and sentiment words from the clauses to be trained, adding an attribute word label for the attribute words and a sentiment word label for the sentiment words, and determining the part-of-speech sequence corresponding to each clause to be trained;
mining target rules from the part-of-speech sequences containing the attribute word label and/or the sentiment word label, and extracting the attribute word set and the sentiment word set in the clauses to be trained according to the target rules;
adding a sentiment category label to the corresponding attribute word in the attribute word set according to each sentiment word in the sentiment word set;
performing vectorized representation of each attribute word in the attribute word set and the contextual information corresponding to each attribute word, to obtain the word vectors corresponding to the attribute words and the contextual information;
taking the word vectors corresponding to the attribute words and the contextual information as the input of the multi-layer neural network of the attention mechanism, and taking the sentiment category labels corresponding to the attribute words as the output results of the multi-layer neural network of the attention mechanism, so as to build the fine-grained sentiment analysis model.
2. The fine-grained sentiment analysis model construction method of claim 1, characterized in that after the step of taking the word vectors corresponding to the attribute words and the contextual information as the input of the multi-layer neural network of the attention mechanism and taking the sentiment category labels corresponding to the attribute words as the output results of the multi-layer neural network of the attention mechanism so as to build the fine-grained sentiment analysis model, the method further comprises:
obtaining a third preset quantity of clauses to be tested, and extracting the attribute words in the clauses to be tested according to the target rules;
performing vectorized representation of the attribute words of each clause to be tested and the corresponding contextual information, inputting them into the fine-grained sentiment analysis model, and correspondingly obtaining the sentiment category labels of the attribute words of the clauses to be tested;
comparing the sentiment category labels corresponding to the attribute words of the clauses to be tested with the preset sentiment category labels of the attribute words of the clauses to be tested, and determining, according to the comparison results, the accuracy of the fine-grained sentiment analysis model in analyzing text sentiment types.
3. The fine-grained sentiment analysis model construction method of claim 1, characterized in that the step of, when a first preset quantity of clauses to be trained is obtained, performing a word segmentation operation on the clauses to be trained and adding a part-of-speech label for each word in the segmented clauses to be trained comprises:
when a first preset quantity of clauses to be trained is obtained, removing the irrelevant characters and stop-words in the clauses to be trained, and performing a word segmentation operation on the clauses to be trained by a word segmentation algorithm, to obtain the segmented clauses to be trained;
adding a part-of-speech label for each word of the segmented clauses to be trained.
4. The fine-grained sentiment analysis model construction method of claim 1, characterized in that the step of determining the part-of-speech sequence corresponding to each clause to be trained comprises:
detecting whether the clauses to be trained carry the attribute word label and the sentiment word label;
if a clause to be trained carries the attribute word label and the sentiment word label, replacing the attribute word in the clause to be trained with the attribute word label and the sentiment word in the clause to be trained with the sentiment word label, and combining the part-of-speech labels corresponding to the words in the clause to be trained with the attribute word label and the sentiment word label into the part-of-speech sequence of the clause to be trained;
if a clause to be trained does not carry the attribute word label and the sentiment word label, combining the part-of-speech labels corresponding to the words in the clause to be trained into the part-of-speech sequence of the clause to be trained.
5. The fine-grained sentiment analysis model construction method of claim 1, characterized in that the step of mining target rules from the part-of-speech sequences containing the attribute word label and/or the sentiment word label comprises:
determining the target part-of-speech sequences containing the attribute word label and/or the sentiment word label among the part-of-speech sequences;
calculating the first sequence quantity of target part-of-speech sequences satisfying the same rule, and determining the second sequence quantity of part-of-speech sequences, other than the target part-of-speech sequences, satisfying the rule to be determined, wherein the rule to be determined is the rule satisfied by the first sequence quantity of target part-of-speech sequences;
calculating the support according to the total sequence quantity of the part-of-speech sequences and the first sequence quantity, and calculating the confidence according to the second sequence quantity and the first sequence quantity;
if the support is greater than or equal to a preset support threshold and the confidence is greater than or equal to a preset confidence threshold, using the rule to be determined as a target rule.
6. The fine-grained sentiment analysis model construction method of claim 5, characterized in that before the step of, if the support is greater than or equal to a preset support threshold and the confidence is greater than or equal to a preset confidence threshold, using the rule to be determined as a target rule, the method further comprises:
obtaining the clause quantity of the clauses to be trained and a preset support rate;
calculating the product of the clause quantity and the preset support rate, and using the product as the preset support threshold.
7. The fine-grained sentiment analysis model construction method of claim 1, characterized in that the step of extracting the attribute word set and the sentiment word set in the clauses to be trained according to the target rules comprises:
determining the clauses to be trained to which the attribute word label and/or the sentiment word label have been added, and recording them as target clauses;
matching the part-of-speech sequences of the clauses to be trained other than the target clauses against the target rules, so as to extract the attribute word set and the sentiment word set in the clauses to be trained.
8. The fine-grained sentiment analysis model construction method of any one of claims 1 to 7, characterized in that the step of taking the word vectors corresponding to the attribute words and the contextual information as the input of the multi-layer neural network of the attention mechanism and taking the sentiment category labels corresponding to the attribute words as the output results of the multi-layer neural network of the attention mechanism so as to build the fine-grained sentiment analysis model comprises:
taking the word vector of the attribute word and the word vectors corresponding to the contextual information as the input of the attention layer of the first neural network layer of the attention mechanism, to obtain the contextual information relevant to the sentiment of the attribute word;
summing, in the linear layer of the first neural network layer, the word vectors corresponding to the contextual information relevant to the sentiment of the attribute word and the word vector of the attribute word, to obtain a summation result;
taking the summation result as the input of the next neural network layer, and taking the sentiment category label corresponding to the attribute word as the output result of the multi-layer neural network of the attention mechanism, to obtain the parameters of the fine-grained sentiment analysis model and build the fine-grained sentiment analysis model according to the parameters.
9. A fine-grained sentiment analysis model construction device, characterized in that the fine-grained sentiment analysis model construction device comprises a memory, a processor, and a fine-grained sentiment analysis model construction program stored on the memory and executable on the processor; when the fine-grained sentiment analysis model construction program is executed by the processor, the steps of the fine-grained sentiment analysis model construction method of any one of claims 1 to 8 are implemented.
10. A computer-readable storage medium, characterized in that a fine-grained sentiment analysis model construction program is stored on the computer-readable storage medium; when the fine-grained sentiment analysis model construction program is executed by a processor, the steps of the fine-grained sentiment analysis model construction method of any one of claims 1 to 8 are implemented.
CN201810414228.3A 2018-05-02 2018-05-02 Fine-grained emotion analysis model construction method and device and readable storage medium Active CN108647205B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810414228.3A CN108647205B (en) 2018-05-02 2018-05-02 Fine-grained emotion analysis model construction method and device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810414228.3A CN108647205B (en) 2018-05-02 2018-05-02 Fine-grained emotion analysis model construction method and device and readable storage medium

Publications (2)

Publication Number Publication Date
CN108647205A true CN108647205A (en) 2018-10-12
CN108647205B CN108647205B (en) 2022-02-15

Family

ID=63748579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810414228.3A Active CN108647205B (en) 2018-05-02 2018-05-02 Fine-grained emotion analysis model construction method and device and readable storage medium

Country Status (1)

Country Link
CN (1) CN108647205B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103646088A (en) * 2013-12-13 2014-03-19 合肥工业大学 Product comment fine-grained emotional element extraction method based on CRFs and SVM
CN106372058A (en) * 2016-08-29 2017-02-01 中译语通科技(北京)有限公司 Short text emotion factor extraction method and device based on deep learning
CN106557463A (en) * 2016-10-31 2017-04-05 东软集团股份有限公司 Sentiment analysis method and device
CN107491531A (en) * 2017-08-18 2017-12-19 华南师范大学 Chinese network comment sensibility classification method based on integrated study framework

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
TAO CHEN: "Learning user and product distributed representations using a sequence model for sentiment analysis", IEEE Computational Intelligence Magazine *
SONG YA: "Design and implementation of a rule-based comparative opinion mining system", China Masters' Theses Full-text Database *
YANG HUI: "Research on Web text opinion mining and implicit sentiment orientation", China Doctoral Dissertations Full-text Database, Information Science and Technology *
ZHENG CHENG: "Sentiment classification of Chinese microblogs based on class sequential rules", Computer Engineering *

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109410011A (en) * 2018-11-05 2019-03-01 深圳市鹏朗贸易有限责任公司 Adopt the recommended method, terminal and computer readable storage medium of product
CN109670170B (en) * 2018-11-21 2023-04-07 东软集团股份有限公司 Professional vocabulary mining method and device, readable storage medium and electronic equipment
CN109670170A (en) * 2018-11-21 2019-04-23 东软集团股份有限公司 Specialized vocabulary method for digging, device, readable storage medium storing program for executing and electronic equipment
CN109614499A (en) * 2018-11-22 2019-04-12 阿里巴巴集团控股有限公司 A kind of dictionary generating method, new word discovery method, apparatus and electronic equipment
CN109614499B (en) * 2018-11-22 2023-02-17 创新先进技术有限公司 Dictionary generation method, new word discovery method, device and electronic equipment
CN109684634A (en) * 2018-12-17 2019-04-26 北京百度网讯科技有限公司 Sentiment analysis method, apparatus, equipment and storage medium
CN109684634B (en) * 2018-12-17 2023-07-25 北京百度网讯科技有限公司 Emotion analysis method, device, equipment and storage medium
CN109710934A (en) * 2018-12-26 2019-05-03 南京云问网络技术有限公司 Customer service quality surveillance algorithm based on emotion
CN109710762A (en) * 2018-12-26 2019-05-03 南京云问网络技术有限公司 A kind of short text clustering method merging various features weight
CN109710762B (en) * 2018-12-26 2023-08-01 南京云问网络技术有限公司 Short text clustering method integrating multiple feature weights
CN109858035A (en) * 2018-12-29 2019-06-07 深兰科技(上海)有限公司 A kind of sensibility classification method, device, electronic equipment and readable storage medium storing program for executing
CN109766557A (en) * 2019-01-18 2019-05-17 河北工业大学 A kind of sentiment analysis method, apparatus, storage medium and terminal device
CN109766557B (en) * 2019-01-18 2023-07-18 河北工业大学 Emotion analysis method and device, storage medium and terminal equipment
CN111723199A (en) * 2019-03-19 2020-09-29 北京沃东天骏信息技术有限公司 Text classification method and device and computer readable storage medium
CN112446201A (en) * 2019-08-12 2021-03-05 北京国双科技有限公司 Text comment property determination method and device
CN110457480A (en) * 2019-08-16 2019-11-15 国网天津市电力公司 The construction method of fine granularity sentiment classification model based on interactive attention mechanism
CN112445907A (en) * 2019-09-02 2021-03-05 顺丰科技有限公司 Text emotion classification method, device and equipment and storage medium
CN111126046B (en) * 2019-12-06 2023-07-14 腾讯云计算(北京)有限责任公司 Sentence characteristic processing method and device and storage medium
CN111126046A (en) * 2019-12-06 2020-05-08 腾讯云计算(北京)有限责任公司 Statement feature processing method and device and storage medium
CN111159412B (en) * 2019-12-31 2023-05-12 腾讯科技(深圳)有限公司 Classification method, classification device, electronic equipment and readable storage medium
CN111177392A (en) * 2019-12-31 2020-05-19 腾讯云计算(北京)有限责任公司 Data processing method and device
CN111159412A (en) * 2019-12-31 2020-05-15 腾讯科技(深圳)有限公司 Classification method and device, electronic equipment and readable storage medium
CN111143569A (en) * 2019-12-31 2020-05-12 腾讯科技(深圳)有限公司 Data processing method and device and computer readable storage medium
CN111222344A (en) * 2020-01-03 2020-06-02 支付宝(杭州)信息技术有限公司 Method and device for training neural network and electronic equipment
CN111222344B (en) * 2020-01-03 2023-07-18 支付宝(杭州)信息技术有限公司 Method and device for training neural network and electronic equipment
CN111353303A (en) * 2020-05-25 2020-06-30 腾讯科技(深圳)有限公司 Word vector construction method and device, electronic equipment and storage medium
CN111353303B (en) * 2020-05-25 2020-08-25 腾讯科技(深圳)有限公司 Word vector construction method and device, electronic equipment and storage medium
CN112784048B (en) * 2021-01-26 2023-03-28 海尔数字科技(青岛)有限公司 Method, device and equipment for emotion analysis of user questions and storage medium
CN112784048A (en) * 2021-01-26 2021-05-11 海尔数字科技(青岛)有限公司 Method, device and equipment for emotion analysis of user questions and storage medium
CN113221551A (en) * 2021-05-28 2021-08-06 复旦大学 Fine-grained emotion analysis method based on sequence generation
US11366965B1 (en) * 2021-10-29 2022-06-21 Jouf University Sentiment analysis using bag-of-phrases for Arabic text dialects

Also Published As

Publication number Publication date
CN108647205B (en) 2022-02-15

Similar Documents

Publication Publication Date Title
CN108647205A (en) Fine granularity sentiment analysis model building method, equipment and readable storage medium storing program for executing
CN107943847B (en) Business connection extracting method, device and storage medium
Li et al. Screen2vec: Semantic embedding of gui screens and gui components
US9489625B2 (en) Rapid development of virtual personal assistant applications
CN108399158A (en) Attribute sensibility classification method based on dependency tree and attention mechanism
CN108363790A (en) For the method, apparatus, equipment and storage medium to being assessed
CN110362817A (en) A kind of viewpoint proneness analysis method and system towards product attribute
CN109408823B (en) A kind of specific objective sentiment analysis method based on multi-channel model
CN108595519A (en) Focus incident sorting technique, device and storage medium
CN106855853A (en) Entity relation extraction system based on deep neural network
CN106355446B (en) A kind of advertisement recommender system of network and mobile phone games
CN108108468A (en) A kind of short text sentiment analysis method and apparatus based on concept and text emotion
CN111694937A (en) Interviewing method and device based on artificial intelligence, computer equipment and storage medium
CN109766557A (en) A kind of sentiment analysis method, apparatus, storage medium and terminal device
CN112988963B (en) User intention prediction method, device, equipment and medium based on multi-flow nodes
CN108875769A (en) Data mask method, device and system and storage medium
CN110114755A (en) Behavioural characteristic in example programming uses
CN107807968A (en) Question and answer system, method and storage medium based on Bayesian network
CN110309114A (en) Processing method, device, storage medium and the electronic device of media information
Gao et al. Text classification research based on improved Word2vec and CNN
US11928418B2 (en) Text style and emphasis suggestions
CN103678318A (en) Multi-word unit extraction method and equipment and artificial neural network training method and equipment
CN116956896A (en) Text analysis method, system, electronic equipment and medium based on artificial intelligence
Gao et al. Chatbot or Chat-Blocker: Predicting chatbot popularity before deployment
CN110347806A (en) Original text discriminating method, device, equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant