CN108647205B - Fine-grained emotion analysis model construction method and device and readable storage medium - Google Patents


Info

Publication number
CN108647205B
Authority
CN
China
Prior art keywords
word
emotion
attribute
trained
words
Prior art date
Legal status
Active
Application number
CN201810414228.3A
Other languages
Chinese (zh)
Other versions
CN108647205A (en)
Inventor
刘志煌
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date
Filing date
Publication date
Application filed by WeBank Co Ltd
Priority to CN201810414228.3A
Publication of CN108647205A
Application granted
Publication of CN108647205B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/205 Parsing
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method, a device and a readable storage medium for constructing a fine-grained sentiment analysis model. During fine-grained sentiment analysis, attribute words and sentiment words are extracted with class sequential rules, which improves extraction accuracy. Because the mined class sequential rules change as the training text of the application domain changes, the constructed model generalizes better and is readily extensible. The class sequential rules also address the long-distance dependence between sentiment words and attribute words, that is, between the evaluation words and the evaluation object, while an attention-based neural network extracts the sentiment context related to each attribute word, thereby achieving fine-grained sentiment analysis.

Description

Fine-grained emotion analysis model construction method and device and readable storage medium
Technical Field
The invention relates to the field of computer technology, and in particular to a method, a device and a readable storage medium for constructing a fine-grained emotion analysis model.
Background
Fine-grained sentiment analysis, also called attribute-level sentiment analysis, is a branch of text sentiment analysis that judges the sentiment expressed toward individual attribute features in review texts. Compared with document-level or sentence-level sentiment analysis, fine-grained analysis identifies sentiment more specifically with respect to the relevant attribute features of a product, and its results provide more detailed evaluation information, which gives them greater reference value.
The first step of fine-grained sentiment analysis is evaluation object extraction, which obtains the product attributes that consumers care about from large volumes of user product reviews. For example, in the hotel review "Good service, the facilities are fair, but the room's sound insulation is really poor", the product attributes of interest to consumers are "service", "facility" and "sound insulation", so extracting such feature words is crucial to fine-grained sentiment analysis. Evaluation object extraction methods fall into two classes. The first relies on dictionaries, templates and domain-specific text, with manually formulated rules used to extract fine-grained evaluation elements; for example, candidate objects are identified first and then filtered with part-of-speech rules to obtain accurate evaluation objects. Because this approach depends on dictionaries and on rules written by linguistic experts, it extends poorly, generalizes weakly, and cannot reliably recognize newly coined internet words that are absent from sentiment dictionaries, so its extraction quality is poor. The second treats evaluation element extraction as a sequence labeling problem, for example using conditional random fields or hidden Markov models, but such methods cannot handle the long-distance dependence between an evaluation word and its evaluation object. Consequently, judging sentiment orientation on the basis of conventional evaluation object extraction leaves three problems unsolved: poor extensibility, weak generalization, and long-distance dependence between evaluation words and evaluation objects.
Disclosure of Invention
The main objective of the invention is to provide a method, a device and a readable storage medium for constructing a fine-grained sentiment analysis model, so as to solve the technical problems that existing methods for judging the sentiment orientation of text extend poorly, generalize weakly, and cannot handle the long-distance dependence between an evaluation word and an evaluation object.
To achieve the above objective, the present invention provides a fine-grained emotion analysis model construction method, which includes the following steps:
after a first preset number of clauses to be trained are obtained, performing word segmentation on the clauses to be trained, and adding a part-of-speech tag to each word of the segmented clauses;
acquiring a second preset number of attribute words and emotion words from the clauses to be trained, adding attribute word labels to the attribute words and emotion word labels to the emotion words, and determining the part-of-speech sequence corresponding to each clause to be trained;
mining target rules from the part-of-speech sequences that contain the attribute word labels and/or the emotion word labels, and extracting an attribute word set and an emotion word set from the clauses to be trained according to the target rules;
adding an emotion category label to each attribute word in the attribute word set according to the emotion word in the emotion word set that corresponds to it;
vectorizing each attribute word in the attribute word set together with its context information to obtain the word vectors of each attribute word and of its context;
and taking the word vectors of the attribute words and their context information as the input of an attention-based multi-layer neural network, and the emotion category labels of the attribute words as its output, to construct the fine-grained emotion analysis model.
Preferably, after the step of constructing the fine-grained emotion analysis model with the word vectors of the attribute words and their context as the input of the attention-based multi-layer neural network and the emotion category labels of the attribute words as its output, the method further includes:
acquiring a third preset number of clauses to be tested, and extracting the attribute words in the clauses to be tested according to the target rules;
vectorizing the attribute words of each clause to be tested together with their context information, and inputting the resulting vectors into the fine-grained emotion analysis model to obtain the emotion category label of each attribute word in the clauses to be tested;
and comparing the obtained emotion category label of each attribute word in the clauses to be tested with its preset emotion category label, and determining from the comparison results the accuracy with which the fine-grained emotion analysis model identifies the emotion type of text.
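The comparison step above amounts to label accuracy over the attribute words of the test clauses. The following Python sketch is illustrative only: it assumes the predicted and preset labels arrive as parallel lists, and the category names ("positive", "negative", "neutral") are hypothetical, since the patent does not enumerate its emotion categories.

```python
from typing import Sequence


def label_accuracy(predicted: Sequence[str], preset: Sequence[str]) -> float:
    """Fraction of test attribute words whose predicted emotion label equals the preset label."""
    if len(predicted) != len(preset):
        raise ValueError("predicted and preset label lists must have the same length")
    if not preset:
        raise ValueError("cannot compute accuracy on an empty test set")
    correct = sum(1 for p, g in zip(predicted, preset) if p == g)
    return correct / len(preset)


if __name__ == "__main__":
    # Hypothetical labels for four attribute words from the clauses to be tested.
    predicted = ["positive", "positive", "negative", "neutral"]
    preset = ["positive", "neutral", "negative", "neutral"]
    print(label_accuracy(predicted, preset))  # 0.75
```

Plain accuracy is the simplest reading of "accuracy" here; per-category precision and recall would be a natural extension if the categories are imbalanced.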
Preferably, the step of performing word segmentation on the clauses to be trained after the first preset number of clauses are obtained, and adding a part-of-speech tag to each word of the segmented clauses, includes:
after the first preset number of clauses to be trained are obtained, removing irrelevant characters and stop words from the clauses, and segmenting the clauses with a word segmentation algorithm to obtain the segmented clauses to be trained;
and adding part-of-speech tags to the words of the segmented clauses to be trained.
Preferably, the step of determining the part-of-speech sequence corresponding to each clause to be trained includes:
detecting whether the clause to be trained carries the attribute word labels and the emotion word labels;
if it does, replacing the attribute words in the clause with the attribute word labels and the emotion words with the emotion word labels, and composing the part-of-speech sequence of the clause from the part-of-speech tags, attribute word labels and emotion word labels of its words;
and if it does not, composing the part-of-speech sequence of the clause from the part-of-speech tags of its words.
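Both branches above reduce to a per-word substitution: a labeled word contributes its label, an unlabeled word contributes its part-of-speech tag. A minimal Python sketch, assuming each segmented word arrives as a (word, tag, label) triple; the label strings "#A" and "#E" and the jieba-style tags (n noun, d adverb, a adjective) are hypothetical choices for illustration, since the patent fixes no label format.

```python
from typing import List, Optional, Sequence, Tuple

# (word, part-of-speech tag, optional word label). "#A" / "#E" are hypothetical
# stand-ins for the attribute word label and the emotion word label.
Token = Tuple[str, str, Optional[str]]


def pos_sequence(clause: Sequence[Token]) -> List[str]:
    """Compose a clause's part-of-speech sequence.

    A labeled word contributes its label in place of its POS tag; an unlabeled
    word contributes its POS tag. A clause without labels therefore yields a
    plain POS sequence, which covers both branches of the step.
    """
    return [label if label is not None else pos for _, pos, label in clause]


if __name__ == "__main__":
    labeled = [("隔音", "n", "#A"), ("真的", "d", None), ("太", "d", None), ("差", "a", "#E")]
    unlabeled = [("服务", "n", None), ("不错", "a", None)]
    print(pos_sequence(labeled))    # ['#A', 'd', 'd', '#E']
    print(pos_sequence(unlabeled))  # ['n', 'a']
```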
Preferably, the step of mining target rules from the part-of-speech sequences containing the attribute word labels and/or the emotion word labels includes:
determining, among the part-of-speech sequences, the target part-of-speech sequences that contain the attribute word labels and/or the emotion word labels;
counting a first number, namely the number of target part-of-speech sequences that conform to the same rule, and determining a second number, namely the number of part-of-speech sequences other than the target sequences that conform to a rule to be determined, where the rule to be determined is the rule shared by the target part-of-speech sequences counted in the first number;
calculating the support from the total number of part-of-speech sequences and the first number, and the confidence from the second number and the first number;
and if the support is greater than or equal to a preset support threshold and the confidence is greater than or equal to a preset confidence threshold, taking the rule to be determined as a target rule.
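The text says which counts feed the support and the confidence but gives no formulas. The sketch below therefore assumes the standard class sequential rule definitions: for a rule X → c, the support count is the number of sequences of class c that contain X as an ordered, gapped subsequence; the support is that count divided by the total number of sequences; and the confidence is that count divided by the number of sequences of any class that contain X. The class names "opinion" and "other" are hypothetical, and candidate pattern generation (for example with a sequential pattern miner such as PrefixSpan) is omitted.

```python
from typing import List, Sequence, Tuple

# Each entry pairs a part-of-speech sequence with a class. The hypothetical class
# "opinion" marks clauses that carried attribute/emotion labels; "other" the rest.
SequenceDB = List[Tuple[Sequence[str], str]]


def contains_pattern(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator past the first match, preserving order.
    return all(item in remaining for item in pattern)


def rule_measures(pattern: Sequence[str], target_class: str, db: SequenceDB) -> Tuple[int, float, float]:
    """Support count, support ratio and confidence of the rule `pattern -> target_class`."""
    covered_classes = [cls for seq, cls in db if contains_pattern(seq, pattern)]
    rule_count = sum(1 for cls in covered_classes if cls == target_class)
    support = rule_count / len(db) if db else 0.0
    confidence = rule_count / len(covered_classes) if covered_classes else 0.0
    return rule_count, support, confidence


def is_target_rule(pattern: Sequence[str], target_class: str, db: SequenceDB,
                   min_support_rate: float, min_confidence: float) -> bool:
    """Keep the rule if its support count reaches (clause count x preset support rate)
    and its confidence reaches the preset confidence threshold."""
    rule_count, _, confidence = rule_measures(pattern, target_class, db)
    return rule_count >= len(db) * min_support_rate and confidence >= min_confidence


if __name__ == "__main__":
    db: SequenceDB = [
        (["n", "d", "d", "a"], "opinion"),
        (["n", "d", "a"], "opinion"),
        (["n", "v", "d", "a"], "opinion"),
        (["n", "d", "a"], "other"),
        (["v", "n"], "other"),
    ]
    print(rule_measures(("n", "d", "a"), "opinion", db))            # (3, 0.6, 0.75)
    print(is_target_rule(("n", "d", "a"), "opinion", db, 0.4, 0.7))  # True
```

In this toy database the pattern n, d, a occurs in four sequences, three of them of the target class, so the rule survives a 0.7 confidence threshold but would be rejected at 0.8.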
Preferably, before the step of taking the rule to be determined as a target rule, the method further includes:
acquiring the number of clauses to be trained and a preset support rate;
and calculating the product of the number of clauses and the preset support rate, and taking the product as the preset support threshold.
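Because this threshold is a count, comparing a rule's support count with it is equivalent to comparing the rule's support ratio with the preset support rate. Writing N for the number of clauses to be trained, r for the preset support rate and n_X for the support count of a rule, with illustrative numbers that are not taken from the patent:

$$\theta_{\text{sup}} = N \cdot r, \qquad N = 2000,\; r = 0.01 \;\Rightarrow\; \theta_{\text{sup}} = 20$$

$$n_X \ge N \cdot r \iff \frac{n_X}{N} \ge r \qquad (N > 0)$$

So with 2000 training clauses and a 1% support rate, a rule must be supported by at least 20 clauses.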
Preferably, the step of extracting the attribute word set and the emotion word set from the clauses to be trained according to the target rules includes:
determining the clauses to be trained to which the attribute word labels and/or the emotion word labels have been added, and marking them as target clauses;
and matching the part-of-speech sequences of the remaining clauses to be trained against the target rules, so as to extract the attribute word set and the emotion word set from the clauses to be trained.
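The patent does not say how a rule containing label positions is matched against a clause that has no labels. One plausible reading, sketched below in Python as an assumption, is that each label position in a rule also records the part-of-speech tag it expects, so that the word found at that position in an unlabeled clause is extracted as an attribute or emotion word. The rule encoding is hypothetical, and the greedy leftmost match binds the first qualifying word, which may not always be the intended one.

```python
from typing import List, Optional, Sequence, Set, Tuple

# Hypothetical rule encoding: each item is (required POS tag, slot), where slot "A"
# marks an attribute-word position, "E" an emotion-word position, None a plain POS item.
RuleItem = Tuple[str, Optional[str]]
TaggedClause = Sequence[Tuple[str, str]]  # (word, POS tag) pairs


def apply_rule(clause: TaggedClause, rule: Sequence[RuleItem]) -> Optional[Tuple[List[str], List[str]]]:
    """Greedy leftmost, gapped match of `rule` against `clause`.

    Returns (attribute words, emotion words) bound to the rule's slots,
    or None if the clause's POS sequence does not contain the rule.
    """
    attributes: List[str] = []
    emotions: List[str] = []
    i = 0
    for pos_needed, slot in rule:
        # Skip words until one carries the required POS tag.
        while i < len(clause) and clause[i][1] != pos_needed:
            i += 1
        if i == len(clause):
            return None
        word = clause[i][0]
        if slot == "A":
            attributes.append(word)
        elif slot == "E":
            emotions.append(word)
        i += 1
    return attributes, emotions


def extract_word_sets(clauses: Sequence[TaggedClause],
                      rules: Sequence[Sequence[RuleItem]]) -> Tuple[Set[str], Set[str]]:
    """Union of attribute and emotion words found by any rule in any unlabeled clause."""
    attribute_set: Set[str] = set()
    emotion_set: Set[str] = set()
    for clause in clauses:
        for rule in rules:
            match = apply_rule(clause, rule)
            if match is not None:
                attribute_set.update(match[0])
                emotion_set.update(match[1])
    return attribute_set, emotion_set


if __name__ == "__main__":
    rule = [("n", "A"), ("d", None), ("a", "E")]
    clauses = [[("服务", "n"), ("非常", "d"), ("好", "a")], [("设施", "n"), ("一般", "a")]]
    print(extract_word_sets(clauses, [rule]))  # ({'服务'}, {'好'})
```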
Preferably, the step of taking the word vectors of the attribute words and their context as the input of the attention-based multi-layer neural network and the emotion category labels of the attribute words as its output, to construct the fine-grained emotion analysis model, includes:
taking the word vector of the attribute word and the word vectors of its context as the input of the attention layer of the first neural network layer, to obtain the context information related to the emotion of the attribute word;
summing, in the linear layer of the first neural network layer, the word vectors of this emotion-related context information and the word vector of the attribute word to obtain a summation result;
and taking the summation result as the input of the next neural network layer and the emotion category label of the attribute word as the output of the attention-based multi-layer neural network, obtaining each parameter of the fine-grained emotion analysis model, and constructing the model from these parameters.
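The layer described above resembles the multi-hop attention used in memory-network approaches to aspect-level sentiment classification: attention over the context vectors, keyed by the attribute vector, yields an attended context, which is summed with a linear map of the attribute vector and passed to the next layer. The plain-Python forward pass below is a sketch under stated assumptions: dot-product attention scoring, a linear transform applied to the query before the sum, all vectors of one dimension, and a softmax classifier on top. All function names are hypothetical, and training (which would typically fit the matrices by minimizing cross-entropy against the emotion category labels) is omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w: Matrix, v: Sequence[float]) -> Vector:
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def attention_hop(query: Sequence[float], context: Sequence[Sequence[float]],
                  w_linear: Matrix) -> Tuple[Vector, Vector]:
    """One layer: attention over the context vectors, then sum with a linear map of the query."""
    # Assumed scoring function: dot product between each context vector and the query.
    scores = [sum(q * c for q, c in zip(query, ctx)) for ctx in context]
    weights = softmax(scores)
    # Attention-weighted context: the emotion-related context of the attribute word.
    attended = [sum(a * ctx[j] for a, ctx in zip(weights, context)) for j in range(len(query))]
    # Linear layer: add the linearly transformed query to the attended context.
    projected = matvec(w_linear, query)
    return [x + y for x, y in zip(attended, projected)], weights


def forward(attribute_vec: Sequence[float], context: Sequence[Sequence[float]],
            hop_matrices: Sequence[Matrix], w_out: Matrix) -> Vector:
    """Stack hops (each hop's output is the next hop's query), then a softmax over categories."""
    h = list(attribute_vec)
    for w_linear in hop_matrices:
        h, _ = attention_hop(h, context, w_linear)
    return softmax(matvec(w_out, h))


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    attribute = [1.0, 0.0]                 # toy attribute-word vector
    context = [[1.0, 0.0], [0.0, 1.0]]     # toy context-word vectors
    _, weights = attention_hop(attribute, context, identity)
    print(weights)  # the first context word, aligned with the attribute, gets more weight
    print(forward(attribute, context, [identity, identity], [[1.0, -1.0], [-1.0, 1.0], [0.0, 0.0]]))
```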
In addition, to achieve the above objective, the present invention further provides a fine-grained emotion analysis model construction device, including a memory, a processor, and a fine-grained emotion analysis model construction program stored in the memory and executable on the processor; when executed by the processor, the program implements the steps of the fine-grained emotion analysis model construction method described above.
Further, to achieve the above objective, the present invention also provides a computer-readable storage medium storing a fine-grained emotion analysis model construction program which, when executed by a processor, implements the steps of the fine-grained emotion analysis model construction method described above.
In the present invention, attribute words and emotion words are extracted with class sequential rules during fine-grained emotion analysis, which improves extraction accuracy. Because the mined class sequential rules change with the training text of the application domain, the constructed fine-grained emotion analysis model generalizes better and is readily extensible. The class sequential rules also address the long-distance dependence between emotion words and attribute words, that is, between the evaluation words (emotion words) and the evaluation object (attribute words), while an attention-based neural network extracts the emotion context related to each attribute word, thereby realizing fine-grained emotion analysis.
Drawings
FIG. 1 is a schematic diagram of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a first embodiment of a fine-grained emotion analysis model construction method according to the present invention;
FIG. 3 is a flowchart illustrating a second embodiment of the fine-grained emotion analysis model construction method according to the present invention.
The realization of the objectives, the functional features and the advantages of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Fig. 1 is a schematic structural diagram of the hardware operating environment of the fine-grained emotion analysis model construction device according to an embodiment of the present invention.
The fine-grained emotion analysis model construction device in this embodiment may be a terminal device such as a PC (personal computer) or a portable computer.
As shown in Fig. 1, the fine-grained emotion analysis model construction device may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005 and a communication bus 1002. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory (e.g., a magnetic disk memory), and may optionally be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the structure shown in Fig. 1 does not limit the fine-grained emotion analysis model construction device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in Fig. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module and a fine-grained emotion analysis model construction program. The operating system manages and controls the hardware and software resources of the device and supports the running of the fine-grained emotion analysis model construction program and other software or programs.
In the device shown in Fig. 1, the user interface 1003 is mainly used to receive a user's adding instructions, instructions to acquire clauses to be trained, and the like; the network interface 1004 is mainly used to connect to a background server and exchange data with it, for example, searching for an answer corresponding to a question to be answered; and the processor 1001 may be configured to invoke the fine-grained emotion analysis model construction program stored in the memory 1005 and perform the following operations:
after a first preset number of clauses to be trained are obtained, performing word segmentation on the clauses to be trained, and adding a part-of-speech tag to each word of the segmented clauses;
acquiring a second preset number of attribute words and emotion words from the clauses to be trained, adding attribute word labels to the attribute words and emotion word labels to the emotion words, and determining the part-of-speech sequence corresponding to each clause to be trained;
mining target rules from the part-of-speech sequences that contain the attribute word labels and/or the emotion word labels, and extracting an attribute word set and an emotion word set from the clauses to be trained according to the target rules;
adding an emotion category label to each attribute word in the attribute word set according to the emotion word in the emotion word set that corresponds to it;
vectorizing each attribute word in the attribute word set together with its context information to obtain the word vectors of each attribute word and of its context;
and taking the word vectors of the attribute words and their context information as the input of an attention-based multi-layer neural network, and the emotion category labels of the attribute words as its output, to construct the fine-grained emotion analysis model.
Further, after the step of constructing the fine-grained emotion analysis model with the word vectors of the attribute words and their context as the input of the attention-based multi-layer neural network and the emotion category labels of the attribute words as its output, the processor 1001 may be further configured to invoke the fine-grained emotion analysis model construction program stored in the memory 1005 and execute the following steps:
acquiring a third preset number of clauses to be tested, and extracting the attribute words in the clauses to be tested according to the target rules;
vectorizing the attribute words of each clause to be tested together with their context information, and inputting the resulting vectors into the fine-grained emotion analysis model to obtain the emotion category label of each attribute word in the clauses to be tested;
and comparing the obtained emotion category label of each attribute word in the clauses to be tested with its preset emotion category label, and determining from the comparison results the accuracy with which the fine-grained emotion analysis model identifies the emotion type of text.
Further, the step of performing word segmentation on the clauses to be trained after the first preset number of clauses are obtained, and adding a part-of-speech tag to each word of the segmented clauses, includes:
after the first preset number of clauses to be trained are obtained, removing irrelevant characters and stop words from the clauses, and segmenting the clauses with a word segmentation algorithm to obtain the segmented clauses to be trained;
and adding part-of-speech tags to the words of the segmented clauses to be trained.
Further, the step of determining the part-of-speech sequence corresponding to each clause to be trained includes:
detecting whether the clause to be trained carries the attribute word labels and the emotion word labels;
if it does, replacing the attribute words in the clause with the attribute word labels and the emotion words with the emotion word labels, and composing the part-of-speech sequence of the clause from the part-of-speech tags, attribute word labels and emotion word labels of its words;
and if it does not, composing the part-of-speech sequence of the clause from the part-of-speech tags of its words.
Further, the step of mining target rules from the part-of-speech sequences containing the attribute word labels and/or the emotion word labels includes:
determining, among the part-of-speech sequences, the target part-of-speech sequences that contain the attribute word labels and/or the emotion word labels;
counting a first number, namely the number of target part-of-speech sequences that conform to the same rule, and determining a second number, namely the number of part-of-speech sequences other than the target sequences that conform to a rule to be determined, where the rule to be determined is the rule shared by the target part-of-speech sequences counted in the first number;
calculating the support from the total number of part-of-speech sequences and the first number, and the confidence from the second number and the first number;
and if the support is greater than or equal to a preset support threshold and the confidence is greater than or equal to a preset confidence threshold, taking the rule to be determined as a target rule.
Further, before the step of taking the rule to be determined as a target rule if the support is greater than or equal to the preset support threshold and the confidence is greater than or equal to the preset confidence threshold, the processor 1001 may be further configured to invoke the fine-grained emotion analysis model construction program stored in the memory 1005 and execute the following steps:
acquiring the number of clauses to be trained and a preset support rate;
and calculating the product of the number of clauses and the preset support rate, and taking the product as the preset support threshold.
Further, the step of extracting the attribute word set and the emotion word set from the clauses to be trained according to the target rules includes:
determining the clauses to be trained to which the attribute word labels and/or the emotion word labels have been added, and marking them as target clauses;
and matching the part-of-speech sequences of the remaining clauses to be trained against the target rules, so as to extract the attribute word set and the emotion word set from the clauses to be trained.
Further, the step of taking the word vectors of the attribute words and their context as the input of the attention-based multi-layer neural network and the emotion category labels of the attribute words as its output, to construct the fine-grained emotion analysis model, includes:
taking the word vector of the attribute word and the word vectors of its context as the input of the attention layer of the first neural network layer, to obtain the context information related to the emotion of the attribute word;
summing, in the linear layer of the first neural network layer, the word vectors of this emotion-related context information and the word vector of the attribute word to obtain a summation result;
and taking the summation result as the input of the next neural network layer and the emotion category label of the attribute word as the output of the attention-based multi-layer neural network, obtaining each parameter of the fine-grained emotion analysis model, and constructing the model from these parameters.
Based on the above structure, embodiments of the fine-grained emotion analysis model construction method are provided. The method is applied to a fine-grained emotion analysis model construction device, which may be a terminal device such as a PC (personal computer) or a portable computer. For brevity, the execution subject (the fine-grained emotion analysis model construction device) is omitted in the following embodiments of the method.
Referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of a fine-grained emotion analysis model construction method according to the present invention.
Although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in a different order.
The fine-grained emotion analysis model construction method comprises the following steps:
step S10, after a first preset number of to-be-trained clauses are obtained, performing word segmentation operation on the to-be-trained clauses, and adding part-of-speech labels to each word in the to-be-trained clauses after word segmentation.
After the first preset number of clauses to be trained are obtained, word segmentation is performed on them to obtain the words of each clause, and part-of-speech tags are then added to the words of the segmented clauses.
Further, after an instruction to construct a fine-grained emotion analysis model is detected, a data set to be trained is crawled from the network according to the instruction; specifically, merchant reviews, news comments, Taobao product reviews and the like may be crawled. After the data set is crawled, the punctuation marks, line breaks and the like in each of its sentences are identified, each sentence is split at these marks, and the marks are removed, which yields the clauses to be trained; a first preset number of these clauses are then selected to construct the fine-grained emotion analysis model. The first preset number may be set according to specific needs and is not specifically limited in this embodiment. Punctuation marks and line breaks can be identified by comparing each sentence of the data set with preset punctuation marks and line breaks.
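Identifying preset punctuation marks and line breaks and cutting sentences into clauses can be sketched with a regular-expression split. The delimiter set below is an assumption (common Chinese and ASCII punctuation plus line breaks), and `limit` stands in for the first preset number of clauses to keep; the function name is hypothetical.

```python
import re
from typing import Iterable, List, Optional

# Assumed delimiter set: common Chinese and ASCII punctuation plus line breaks.
CLAUSE_DELIMITERS = re.compile(r"[，。！？；：、,.!?;:\r\n]+")


def split_into_clauses(sentences: Iterable[str], limit: Optional[int] = None) -> List[str]:
    """Split crawled sentences at punctuation and line breaks, dropping the delimiters.

    Empty pieces (e.g. after a trailing "!") are discarded. `limit` plays the role
    of the first preset number of clauses to keep.
    """
    clauses: List[str] = []
    for sentence in sentences:
        for piece in CLAUSE_DELIMITERS.split(sentence):
            piece = piece.strip()
            if piece:
                clauses.append(piece)
                if limit is not None and len(clauses) >= limit:
                    return clauses
    return clauses


if __name__ == "__main__":
    review = "Good service, fair facilities, but the sound insulation is really poor!"
    print(split_into_clauses([review]))
    # ['Good service', 'fair facilities', 'but the sound insulation is really poor']
```

Splitting on the ASCII period also breaks decimals such as "3.5"; a production delimiter set would need to handle such cases.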
Further, step S10 includes:
Step a, after a first preset number of clauses to be trained are obtained, remove irrelevant characters and stop words from the clauses, and segment the clauses with a word segmentation algorithm to obtain the segmented clauses to be trained.
Further, after a first preset number of clauses to be trained are obtained, irrelevant characters and stop words are removed from them to obtain processed clauses, which are then segmented with a word segmentation algorithm. Word segmentation algorithms include, but are not limited to, dictionary-based, statistics-based, and rule-based algorithms and the Jieba segmentation algorithm. When the Jieba algorithm is used to segment the clauses, either its search engine mode or its accurate mode may be used.
To remove irrelevant characters and stop words, each word in a clause to be trained is compared with the words in a preset irrelevant character database and a preset stop word database, and any word that also appears in either database is removed from the clause. Both databases are preset and contain commonly used irrelevant characters and stop words; for example, the irrelevant character database contains characters such as "/" and "%", and the stop word database contains common function words such as "and".
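The sketch below illustrates step a with a dictionary-based segmenter (forward maximum matching), one of the algorithm families listed above. The contents of `IRRELEVANT_CHARS`, `STOP_WORDS`, and `DICTIONARY` are hypothetical placeholders. Irrelevant characters are stripped before segmentation and stop words are filtered afterwards, because stop words can only be compared once words exist; that ordering is an assumption. With the Jieba library, `jieba.cut(text)` (accurate mode) or `jieba.cut_for_search(text)` (search engine mode) would take the place of `forward_max_match`.

```python
# Hypothetical preset databases and segmentation dictionary.
IRRELEVANT_CHARS = {"/", "%"}
STOP_WORDS = {"的", "了"}
DICTIONARY = {"房间", "很", "舒适", "服务", "好", "价格", "不", "便宜"}

def remove_irrelevant(clause):
    """Drop every character listed in the irrelevant character database."""
    return "".join(ch for ch in clause if ch not in IRRELEVANT_CHARS)

def forward_max_match(text, dictionary, max_len=4):
    """Dictionary-based segmentation: greedily take the longest dictionary word."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # fall back to a single character when no dictionary word fits
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words

def preprocess(clause):
    """Strip irrelevant characters, segment, then remove stop words."""
    words = forward_max_match(remove_irrelevant(clause), DICTIONARY)
    return [w for w in words if w not in STOP_WORDS]

print(preprocess("房间/很舒适的"))  # ['房间', '很', '舒适']
```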
Step b, add a part-of-speech tag to each word of the segmented clauses to be trained.
After the segmented clauses to be trained are obtained, a part-of-speech tag is added to each of their words. The tags can be added manually: the user enters an adding instruction in the fine-grained emotion analysis model construction device and tags each word by hand. Alternatively, the device can tag the words automatically, for example with a Chinese character encoding algorithm or LTP part-of-speech tagging. Adding a part-of-speech tag to a word means recording its part of speech, that is, determining whether it is a noun, an adverb, an adjective, and so on.
Further, to unify the representation standard and improve the speed and accuracy of building the fine-grained emotion analysis model, any traditional Chinese characters in a clause to be trained are converted into the corresponding simplified characters. If a clause contains both uppercase and lowercase English letters, the uppercase letters are converted to lowercase (or the lowercase letters to uppercase), so that the clause contains English letters of only one case.
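To make the `word/tag` format used in the later examples concrete, here is a lookup-based tagger. It is only a stand-in for a real tagger such as LTP; the lexicon `POS_LEXICON` and the fallback tag `unk` are assumptions.

```python
# Hypothetical tag lexicon: n = noun, d = adverb, a = adjective.
POS_LEXICON = {"房间": "n", "服务": "n", "价格": "n", "很": "d", "不": "d",
               "舒适": "a", "好": "a", "便宜": "a"}

def tag_words(words, lexicon, default="unk"):
    """Attach a part-of-speech tag to each segmented word as (word, tag) pairs."""
    return [(w, lexicon.get(w, default)) for w in words]

def format_tagged(tagged):
    """Render tagged words in the 'word/tag, word/tag' style of the text."""
    return ", ".join(f"{w}/{t}" for w, t in tagged)

print(format_tagged(tag_words(["房间", "很", "舒适"], POS_LEXICON)))  # 房间/n, 很/d, 舒适/a
```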
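A sketch of the normalization step, assuming a tiny hand-written traditional-to-simplified table (a real system would need a complete table, for example the one provided by OpenCC) and assuming lowercase is the chosen case. `TRAD_TO_SIMP` and `normalize` are hypothetical names.

```python
# Illustrative traditional -> simplified pairs only; far from complete.
TRAD_TO_SIMP = str.maketrans({"務": "务", "價": "价", "適": "适", "間": "间"})

def normalize(clause, to_lower=True):
    """Convert traditional characters, then force a single English letter case."""
    simplified = clause.translate(TRAD_TO_SIMP)
    return simplified.lower() if to_lower else simplified.upper()

print(normalize("房間很舒適 WiFi"))  # 房间很舒适 wifi
```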
Step S20, obtain a second preset number of attribute words and emotion words from the clauses to be trained, add attribute word labels to the attribute words and emotion word labels to the emotion words, and determine the part-of-speech sequence corresponding to each clause to be trained.
A second preset number of attribute words and emotion words are obtained from the clauses to be trained; attribute word labels are added to the attribute words and emotion word labels to the emotion words. The second preset number is smaller than the first preset number and can be chosen according to the size of the first preset number and the application, for example 10, 20, or 32. The second preset number serves as the seed for determining the attribute word set and the emotion word set: only a small number of attribute words and emotion words need to be labeled, and the full attribute word set and emotion word set of the clauses to be trained are then obtained through class sequential rules. In this embodiment, the user triggers an adding instruction in the fine-grained emotion analysis model construction device, and the device adds attribute word labels and emotion word labels according to that instruction; that is, the labels are added manually by the user. The concrete form of the labels is not limited; for example, the attribute word label may be "#" and the emotion word label may be "*". If a feature clause is "the room is very comfortable", "room" is determined to be an attribute word and "comfortable" an emotion word, so "#" is added to "room" and "*" to "comfortable"; if a feature clause is "the service is very good", "service" is the attribute word and "good" the emotion word, so "#" is added to "service" and "*" to "good". Once these labels have been added, the feature clauses carry type labels.
After attribute word labels and emotion word labels have been added to the feature clauses among the clauses to be trained, the part-of-speech sequence of each clause to be trained is determined. If a clause contains attribute words and emotion words from the second preset number, its part-of-speech sequence carries attribute word labels and emotion word labels; if it is an ordinary clause that contains none of them, its part-of-speech sequence carries only part-of-speech tags.
Further, the step of determining the part-of-speech sequence corresponding to each clause to be trained includes:
Step c, detect whether the clause to be trained carries attribute word labels and emotion word labels.
Step d, if the clause to be trained carries attribute word labels and emotion word labels, replace the attribute words in the clause with the attribute word label and the emotion words with the emotion word label, and combine the part-of-speech tags, attribute word labels, and emotion word labels of all words in the clause into its part-of-speech sequence.
Step e, if the clause to be trained carries no attribute word labels or emotion word labels, combine the part-of-speech tags of all words in the clause into its part-of-speech sequence.
Further, the part-of-speech sequence of each clause to be trained may be determined as follows. First, detect whether the clause carries attribute word labels and emotion word labels. If it does, replace the attribute words in the clause with the attribute word label and the emotion words with the emotion word label; only the attribute words and emotion words belonging to the second preset number need to be replaced. Then combine the part-of-speech tags, attribute word labels, and emotion word labels of all words in the clause into its part-of-speech sequence. If the clause carries neither label, no label positions are filled, and the part-of-speech sequence is combined directly from the part-of-speech tags of its words.
For example, tagging the clauses to be trained "the room is very comfortable" and "the service is very good" gives "room/n, very/d, comfortable/a" and "service/n, very/d, good/a", where "n" denotes a noun, "d" an adverb, and "a" an adjective. The corresponding part-of-speech sequences are both "#/n, /d, */a", expressed as "#d*". If a clause to be trained is "the price is not cheap" and it carries no attribute word or emotion word labels, tagging gives "price/n, not/d, cheap/a"; the corresponding part-of-speech sequence is "/n, /d, /a", expressed as "nda".
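Steps c to e can be sketched as follows, assuming the labels are "#" and "*", that a word counts as labeled when it belongs to the seed sets, and that a labeled position shows the label in place of its part-of-speech tag in the compact expression (unlabeled positions keep their tag, as in "nda"). `pos_sequence` and the seed sets are hypothetical.

```python
ATTRIBUTE_LABEL = "#"
EMOTION_LABEL = "*"

def pos_sequence(tagged, attribute_words, emotion_words):
    """Replace seed attribute/emotion words by their labels; keep other tags."""
    items = []
    for word, tag in tagged:
        if word in attribute_words:
            items.append(ATTRIBUTE_LABEL)
        elif word in emotion_words:
            items.append(EMOTION_LABEL)
        else:
            items.append(tag)  # clause position without a label: tag only
    return items

seed_attributes = {"房间", "服务"}
seed_emotions = {"舒适", "好"}
labeled = [("房间", "n"), ("很", "d"), ("舒适", "a")]
plain = [("价格", "n"), ("不", "d"), ("便宜", "a")]
print("".join(pos_sequence(labeled, seed_attributes, seed_emotions)))  # #d*
print("".join(pos_sequence(plain, seed_attributes, seed_emotions)))    # nda
```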
Step S30, mine target rules from the part-of-speech sequences that contain attribute word labels and/or emotion word labels, and extract the attribute word set and the emotion word set of the clauses to be trained according to the target rules.
After the part-of-speech sequence of each clause to be trained is obtained, target rules are mined from the sequences containing attribute word labels and/or emotion word labels, and the attribute word set and emotion word set are extracted from the clauses according to the mined rules. Once a target rule has been mined, in each part-of-speech sequence that satisfies it, the word at the position of the attribute word label is marked as an attribute word and the word at the position of the emotion word label is marked as an emotion word. For example, if a clause to be trained satisfies a mined target rule of the form "<<n><d><a>>" in which n carries the attribute word label and a carries the emotion word label, the word tagged n in the clause is determined to be an attribute word and the word tagged a an emotion word.
Specifically, target rules containing type labels are mined from the part-of-speech sequences through class sequential rules (CSR); that is, the target rules are mined from part-of-speech sequences containing attribute word labels and/or emotion word labels. A class sequential rule consists of a type label (an attribute word label and/or an emotion word label) and part-of-speech sequence data, between which it establishes a mapping, formally expressed as $X \rightarrow Y$. Here $X$ is a part-of-speech sequence expressed as $\langle S_1 x_1 S_2 x_2 \ldots S_i x_i \rangle$, where $S$ denotes the part-of-speech sequence database, a set of tuples $\langle sid, s \rangle$; $sid$ is the index of a part-of-speech sequence in the database (for example, the first sequence has $sid = 1$ and the second $sid = 2$), $s$ is the part-of-speech sequence itself, and $x_i$ denotes a possible category of the sequence. $Y$ is a part-of-speech sequence containing type labels, expressed as $\langle S_1 c_1 S_2 c_2 \ldots S_i c_i \rangle$, where $S$ is defined as above, $c_r \in C$ ($1 \le i \le r$) is a determined type label, and $C = \{c_1, c_2, \ldots, c_r\}$ is the set of type labels. In this embodiment, CSR requires part-of-speech sequences that carry attribute word labels and/or emotion word labels. After the attribute word labels and emotion word labels are determined, CSR mines the rules whose part-of-speech sequences meet a preset support threshold and a preset confidence threshold and takes them as target rules.
Further, the step of mining the target rule according to the part of speech sequence containing the attribute word labels and/or the emotion word labels comprises the following steps:
Step f, determine, among the part-of-speech sequences, the target part-of-speech sequences that contain attribute word labels and/or emotion word labels.
Step g, count the target part-of-speech sequences that conform to the same rule (the first sequence number), and count the part-of-speech sequences outside the target sequences that conform to the rule to be determined (the second sequence number), where the rule to be determined is the rule that the target sequences counted in the first sequence number conform to.
Further, target rules are mined from the part-of-speech sequences containing attribute word labels and/or emotion word labels as follows. Among the part-of-speech sequences of the clauses to be trained, those containing attribute word labels and/or emotion word labels are marked as target part-of-speech sequences, and the number of target sequences that conform to the same rule is counted as the first sequence number. Whether two target sequences conform to the same rule is judged from their expression forms: if the expression forms are consistent, the two sequences conform to the same rule. For example, given the three sequences "<abd*gh>", "<#kea>", and "<ab*fgh>", both "<abd*gh>" and "<ab*fgh>" conform to the rule "<<ab> x <gh>> → <<ab> * <gh>>", where x marks the labeled position, whereas "<#kea>" does not. The letters in rules and part-of-speech sequences denote the parts of speech of the words at the corresponding positions.
After the first sequence number is counted, the rule that the corresponding target sequences conform to is recorded as the rule to be determined. Among the part-of-speech sequences of the clauses to be trained, the sequences other than the target sequences that conform to the rule to be determined are then identified, and their number is recorded as the second sequence number. For example, given the sequences "<abeghk>" and "<d#kb>", "<abeghk>" conforms to the rule to be determined "<<ab> x <gh>>", whereas "<d#kb>" does not. Because the sequences counted in the second sequence number carry no attribute word labels or emotion word labels, the labels in the rule to be determined are ignored when counting them.
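The counting in step g can be sketched with an ordered subsequence test in which gaps are allowed. This reading of "conforms to a rule" is an assumption: the rule is written as a flat list, `SLOT` marks the labeled position, and the slot must hold the class label in a labeled (target) sequence but may hold any part of speech (the wildcard x) in an unlabeled one. Under this reading, `<abeghk>` matches the wildcard form of `<<ab> x <gh>>`. All names are hypothetical.

```python
SLOT = None          # marks the labeled position of a rule
LABELS = {"#", "*"}  # attribute word label and emotion word label

def matches(sequence, pattern, slot_item=None):
    """Ordered subsequence test with gaps allowed.

    SLOT must equal `slot_item` when one is given (labeled sequences);
    otherwise it matches any single item (the wildcard x)."""
    pos = 0
    for element in pattern:
        while pos < len(sequence):
            item = sequence[pos]
            pos += 1
            if element is SLOT:
                if slot_item is None or item == slot_item:
                    break
            elif item == element:
                break
        else:
            return False  # ran out of items before matching this element
    return True

def count_sequences(sequences, pattern, label):
    """Return (first, second): labeled sequences matching with `label` at the
    slot, and unlabeled sequences matching with a wildcard slot."""
    first = second = 0
    for seq in sequences:
        if LABELS & set(seq):
            first += matches(seq, pattern, slot_item=label)
        else:
            second += matches(seq, pattern)
    return first, second

rule = ["a", "b", SLOT, "g", "h"]  # <<ab> x <gh>> -> <<ab> * <gh>>
database = [list(s) for s in ["abd*gh", "#kea", "ab*fgh", "abeghk", "d#kb"]]
print(count_sequences(database, rule, "*"))  # (2, 1)
```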
Step h, compute the support from the total number of part-of-speech sequences and the first sequence number, and compute the confidence from the first and second sequence numbers.
After the first and second sequence numbers are obtained, the total number of part-of-speech sequences of the clauses to be trained is counted, and the first sequence number is divided by this total to obtain the support of the rule. The first and second sequence numbers are added, and the first sequence number is divided by their sum to obtain the confidence of the rule. That is, with $n_1$ and $n_2$ the first and second sequence numbers and $N$ the total number of sequences, $\text{support} = n_1 / N$ and $\text{confidence} = n_1 / (n_1 + n_2)$.
Step i, if the support is greater than or equal to a preset support threshold and the confidence is greater than or equal to a preset confidence threshold, take the rule to be determined as a target rule.
After the support and confidence of the rule to be determined are computed, they are compared with the preset support threshold and the preset confidence threshold. If both are greater than or equal to their thresholds, the rule to be determined is taken as a target rule; if the support is below its threshold and/or the confidence is below its threshold, it is not. Both thresholds can be set as needed and are not specifically limited in this embodiment.
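Steps h and i reduce to two ratios and a threshold test. The sketch assumes support = first / total and confidence = first / (first + second), as described in step h; the function names and the threshold values in the example are hypothetical.

```python
def support(first, total):
    """Share of all part-of-speech sequences that are labeled and match the rule."""
    return first / total if total else 0.0

def confidence(first, second):
    """Share of all matching sequences that carry the rule's label."""
    matched = first + second
    return first / matched if matched else 0.0

def is_target_rule(first, second, total, min_support, min_confidence):
    """Step i: keep the rule only if both thresholds are met."""
    return (support(first, total) >= min_support
            and confidence(first, second) >= min_confidence)

# Example counts: 2 labeled matches, 1 unlabeled match, 6 sequences in total.
print(support(2, 6), confidence(2, 1))    # 0.3333333333333333 0.6666666666666666
print(is_target_rule(2, 1, 6, 0.3, 0.6))  # True
```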
Further, the step of extracting the attribute word set and the emotion word set in the clause to be trained according to the target rule comprises the following steps:
Step j, determine the clauses to be trained to which attribute word labels and/or emotion word labels have been added, and mark them as target clauses.
Step k, match the part-of-speech sequences of the clauses other than the target clauses against the target rules to extract the attribute word set and the emotion word set from the clauses to be trained.
Further, the attribute word set is extracted according to the target rules as follows: the clauses to be trained to which attribute word labels and/or emotion word labels have been added are marked as target clauses; the part-of-speech sequences of the remaining clauses are matched against the target rules to find the attribute words and emotion words in them; and the attribute word set and emotion word set are thereby extracted from the clauses to be trained. For example, if the part-of-speech sequence "<fabeghk>" of a clause to be trained satisfies the target rule "<<ab> x <gh>> → <<ab> * <gh>>", the word tagged "e" occupies the labeled position of the rule and is extracted as an emotion word.
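Step k can be sketched as matching a clause's tag sequence against a target rule and reading off the word at the rule's labeled position. The greedy leftmost match, the `extract_into` helper, and the placeholder words `w0` to `w6` are assumptions; when a rule can match a clause in several ways, the text does not say which occurrence to use.

```python
SLOT = None  # the labeled position in a rule

def slot_index(tags, pattern):
    """Greedy left-to-right subsequence match; return the index matched by SLOT."""
    pos, found = 0, None
    for element in pattern:
        # advance until the current tag satisfies this pattern element
        while pos < len(tags) and not (element is SLOT or tags[pos] == element):
            pos += 1
        if pos == len(tags):
            return None  # the pattern cannot be completed
        if element is SLOT:
            found = pos
        pos += 1
    return found

def extract_into(word_sets, tagged_clause, pattern, label):
    """Add the word at the rule's labeled position to the set for `label`."""
    tags = [tag for _, tag in tagged_clause]
    index = slot_index(tags, pattern)
    if index is not None:
        word_sets[label].add(tagged_clause[index][0])
    return word_sets

rule = ["a", "b", SLOT, "g", "h"]
clause = list(zip(["w0", "w1", "w2", "w3", "w4", "w5", "w6"], "fabeghk"))
print(extract_into({"#": set(), "*": set()}, clause, rule, "*"))
# {'#': set(), '*': {'w3'}}
```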
The attribute words and emotion words of the target clauses themselves may either be added to the extracted attribute word set and emotion word set or be left out.
Further, after the attribute word set, or both the attribute word set and the emotion word set, have been extracted, the corresponding attribute word labels and emotion word labels are added to the part-of-speech sequences. To obtain complete attribute and emotion word sets, execution then returns to step S30; on each return, the preset support threshold is set higher than the one used in the previous execution of step S30, which keeps the mined target rules accurate and thus makes the attribute words and emotion words extracted through them more accurate. In this embodiment, class sequential rule mining is a semi-supervised learning method: through multiple rounds of iterative training, in a snowball-like fashion, new training data (clauses to be trained that satisfy the target rules) are continuously labeled and rules are mined iteratively until the final attribute word set and emotion word set are obtained, which helps ensure the precision and recall of CSR. Moreover, because what is mined are rules, which generalize well over part-of-speech sequences, CSR has good generalization performance.
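The snowball loop can be outlined as below. `mine_rules` and `extract` are hypothetical callables standing in for the CSR mining and matching steps; only the control flow (raise the support threshold every round, stop when a round finds nothing new) follows the text. The toy miner ignores the growing labeled set, which a real implementation would feed back into mining.

```python
def snowball(mine_rules, extract, initial_support, step, max_rounds=10):
    """Alternate rule mining and extraction with a rising support threshold.

    mine_rules(min_support, found) -> rules; extract(rules) -> set of (word, label).
    Stops when a round adds no new (word, label) pairs or max_rounds is reached."""
    found = set()
    threshold = initial_support
    for _ in range(max_rounds):
        rules = mine_rules(threshold, found)
        new_pairs = extract(rules) - found
        if not new_pairs:
            break
        found |= new_pairs
        threshold += step  # stricter support threshold on every return to mining
    return found

# Toy stand-ins (hypothetical): fixed candidate rules with fixed support values.
CANDIDATES = {"r1": 0.5, "r2": 0.35}
RULE_OUTPUT = {"r1": {("room", "#")}, "r2": {("cheap", "*")}}

def toy_mine(threshold, _found):
    return [r for r, s in CANDIDATES.items() if s >= threshold]

def toy_extract(rules):
    out = set()
    for r in rules:
        out |= RULE_OUTPUT[r]
    return out

print(sorted(snowball(toy_mine, toy_extract, initial_support=0.3, step=0.1)))
# [('cheap', '*'), ('room', '#')]
```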
Step S40, add an emotion category label to each attribute word in the attribute word set according to its corresponding emotion word in the emotion word set.
After the emotion word set and attribute word set are obtained, an emotion category label is added to each attribute word according to its corresponding emotion word. In this embodiment there are two emotion category labels: commendatory (positive) emotion, whose label may be set to 1, and derogatory (negative) emotion, whose label may be set to -1. The label values are not limited to 1 and -1. In other embodiments there may be three or four emotion category labels; for example, three labels such as good, medium, and bad. In this embodiment each attribute word has one corresponding emotion word, so once an attribute word in a clause to be trained is determined, its emotion word can be determined from this correspondence.
Emotion category labels can be added to the attribute words manually: the user enters an adding instruction in the fine-grained emotion analysis model construction device and labels the clauses to be trained by hand. Alternatively, words corresponding to the different emotion category labels are preset in the device, and the emotion word corresponding to each attribute word is compared with these preset words to determine the attribute word's label. For example, suppose the preset words for commendatory emotion are "comfortable, good, cheap, worth, large" and so on, and those for derogatory emotion are "small, bad, not worth, rotten, low" and so on. If the emotion word corresponding to an attribute word is one of the commendatory words, the commendatory label is added to the attribute word; if it is one of the derogatory words, the derogatory label is added.
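A sketch of the lexicon-based alternative, using the example words above as the preset lexicons and 1 / -1 as labels. Attribute words whose emotion word is in neither lexicon are left unlabeled (`None`) for manual review; that fallback, and the lack of any negation handling, are assumptions of this sketch.

```python
# Example lexicons taken from the word lists in the text.
POSITIVE = {"comfortable", "good", "cheap", "worth", "large"}
NEGATIVE = {"small", "bad", "not worth", "rotten", "low"}

def label_attributes(pairs, positive=POSITIVE, negative=NEGATIVE):
    """Map each attribute word to 1 (commendatory), -1 (derogatory), or None."""
    labels = {}
    for attribute, emotion in pairs.items():
        if emotion in positive:
            labels[attribute] = 1
        elif emotion in negative:
            labels[attribute] = -1
        else:
            labels[attribute] = None  # in neither lexicon: leave for manual labeling
    return labels

print(label_attributes({"room": "comfortable", "screen": "small", "price": "fair"}))
# {'room': 1, 'screen': -1, 'price': None}
```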
Step S50, vectorize each attribute word in the attribute word set and its corresponding context information to obtain the word vectors of each attribute word and its context information.
After the attribute word set is obtained, the context information of each attribute word in the clauses to be trained is determined, and the attribute words and their context information are vectorized to obtain their word vectors. The context information consists of the context words related to the attribute word. Specifically, the word vectors can be obtained with the word2vec tool. word2vec can be trained efficiently on vocabularies of millions of words and data sets of billions of words, and the word vectors (word embeddings) it produces measure word-to-word similarity well. word2vec consists mainly of two models, CBOW (Continuous Bag of Words) and Skip-Gram. CBOW predicts a target word from its surrounding context; it is equivalent to multiplying a bag-of-words vector of the context by an embedding matrix to obtain a continuous embedding vector. Skip-Gram does the opposite and predicts the surrounding context from the target word. In this embodiment, other tools that provide the same function as word2vec may also be used.
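The statement that CBOW multiplies a bag-of-words vector by an embedding matrix can be checked numerically: a count vector over the vocabulary times the embedding matrix equals the sum of the context words' embedding rows, and dividing by the number of context words gives the averaged hidden layer of CBOW. The toy matrix `E` is an assumption; in practice, gensim's `Word2Vec` trains CBOW with `sg=0` and Skip-Gram with `sg=1`.

```python
def bow_times_matrix(bow, embedding):
    """Row vector `bow` (length V) times matrix `embedding` (V x d)."""
    dim = len(embedding[0])
    return [sum(bow[v] * embedding[v][j] for v in range(len(bow))) for j in range(dim)]

def cbow_hidden(context_ids, embedding):
    """Average of the context embeddings, computed via the bag-of-words product."""
    bow = [0] * len(embedding)
    for i in context_ids:
        bow[i] += 1  # count vector of the context words
    summed = bow_times_matrix(bow, embedding)
    return [x / len(context_ids) for x in summed]

E = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # toy 3-word vocabulary, 2-d embeddings
print(cbow_hidden([0, 2], E))  # [1.5, 1.0]
```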
Step S60, take the word vectors of the attribute words and their context information as the input of a multi-layer neural network with an attention mechanism, and take the emotion category labels of the attribute words as its output, to construct the fine-grained emotion analysis model.
After the word vectors of each attribute word and its context information are obtained, they are fed into the multi-layer attention neural network, which identifies the emotion word in the same clause as the attribute word; the emotion category label of that emotion word, that is, the emotion category label of the attribute word, is taken as the final output of the network. Training in this way yields the parameters of the fine-grained emotion analysis model and thus constructs the model. The number of neural network layers in the attention mechanism can be set as needed, for example 3, 4, or 6.
Further, step S60 includes:
Step l, take the word vector of the attribute word and the word vectors of its context information as the input of the attention layer of the first neural network layer of the attention mechanism, to obtain the context information related to the emotion of the attribute word.
Further, after the word vectors of the attribute word and of its context information are obtained, they are used as the input of the attention layer in the first neural network layer, which outputs the context information related to the attribute word's emotion. Each neural network layer contains one attention layer and one linear layer; when the word vectors are input into a layer's attention layer, it outputs the context information related to the attribute word's emotion.
Step m, in the linear layer of the first neural network layer, sum the word vectors of the context information related to the attribute word's emotion with the word vector of the attribute word to obtain a summation result.
After the context information related to the attribute word's emotion is obtained, it is recorded as related information, and its word vectors are taken from the context word vectors. The word vector of the attribute word and the word vectors of the related information are input into the linear layer of the first neural network layer, where they are linearly summed to obtain the summation result, which is the output of that linear layer. The related information consists of words related to the attribute word's emotion, or of numerically represented compressed information, and represents the information relevant to that emotion.
For example, suppose an attribute word has six pieces of context information: A, B, C, D, E, and F. If the output of the first layer's attention layer identifies A, C, D, E, and F as related information, the word vectors of these five pieces of related information and the word vector of the attribute word are input into the linear layer of the first neural network layer.
Step n, take the summation result as the input of the next neural network layer and the emotion category label of the attribute word as the output of the multi-layer attention network, obtain the parameters of the fine-grained emotion analysis model, and construct the model from these parameters.
After the summation result is obtained, its word vector is used as the input of the attention layer of the next neural network layer. The output of that attention layer determines the context information more relevant to the attribute word's emotion, whose word vectors are input together with the attribute word's vector into that layer's linear layer. This repeats layer by layer (the output of each layer's linear layer becomes the input of the next layer's attention layer), so that the output of the last layer is the emotion category label of the attribute word. The parameters of the attention network are trained continuously on the attribute words in the attribute word set to obtain the parameters of the fine-grained emotion analysis model; once they are obtained, the model has been constructed.
To enable the constructed model to judge the emotion of the clause corresponding to an attribute word, this embodiment uses deep learning to build a multi-layer neural network for emotion scoring: attention focuses on the context information related to the attribute word's emotion, the corresponding emotional tendency information is extracted, and this information is used to judge the attribute word's emotion.
The attention mechanism extracts, layer by layer, the context information that matters for classifying the attribute word's emotion. Different words matter to a sentence to different degrees: a stop word, for example, appears in many sentences, so its TF-IDF (term frequency-inverse document frequency) value is small and its contribution to a sentence is negligible. Using TF-IDF values as word weights, the word vectors are summed with their corresponding weights to obtain a sentence vector that reflects each word's contribution. Likewise, the attention mechanism for obtaining context information related to the attribute word's emotion is essentially a weighted average: given $n$ word vectors $V_i$ ($i = 1, 2, \ldots, n$) of dimension $m$, it integrates them into a single vector that captures as much as possible of the information related to the emotion category labels learned by the fine-grained emotion analysis model. In this embodiment, to improve the accuracy of the integrated vector, the attention mechanism computes the weights of the attribute words and of the context information related to their emotion.
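The TF-IDF weighting described above can be sketched as follows. Term frequency is normalized by clause length and IDF is log(N / df) without smoothing; both are common textbook choices assumed here, not taken from the text. A word that appears in every clause, like a stop word, receives weight 0.

```python
import math

def tf_idf(docs):
    """Return one {word: tf-idf} dict per tokenized clause."""
    n = len(docs)
    df = {}  # number of clauses containing each word
    for doc in docs:
        for word in set(doc):
            df[word] = df.get(word, 0) + 1
    weights = []
    for doc in docs:
        counts = {}
        for word in doc:
            counts[word] = counts.get(word, 0) + 1
        weights.append({w: (c / len(doc)) * math.log(n / df[w]) for w, c in counts.items()})
    return weights

def sentence_vector(doc, weight, vectors):
    """Weighted sum of the word vectors of every token in the clause."""
    dim = len(next(iter(vectors.values())))
    out = [0.0] * dim
    for word in doc:
        for j in range(dim):
            out[j] += weight[word] * vectors[word][j]
    return out

docs = [["the", "room", "comfortable"], ["the", "service", "good"]]
weights = tf_idf(docs)
print(weights[0]["the"])  # 0.0
```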
To calculate the weights of the attribute words and the context information, a scoring function F(x) is designed. It takes the word vector of the attribute word and the word vector of a context word as input and outputs a score, i.e. the corresponding weight. The score measures how strongly the input word vector is correlated with the object the attention mechanism attends to. The higher the correlation, the larger the score. In this embodiment, the attended object is the attribute word. The word vector of the attribute word is denoted Vi, the word vector of the context is denoted Vt, and W1, W2 and b are parameters of the neural network. The activation function therefore takes several features as input and may be, for example, tanh or ReLU. The corresponding scoring function may be: $F(x) = \mathrm{activation}(W_1 V_i + W_2 V_t + b)$.
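A minimal sketch of the scoring function follows. It assumes W1 and W2 are m-dimensional weight vectors, so each score is a scalar. It also assumes tanh as the activation and softmax as the function that turns scores into weights. The patent does not name the normalizing function, and the names `score` and `attend` are illustrative.

```python
import numpy as np


def score(v_i, v_t, w1, w2, b, activation=np.tanh):
    """F = activation(w1 . v_i + w2 . v_t + b), with w1 and w2 as m-dim vectors."""
    return activation(w1 @ v_i + w2 @ v_t + b)


def softmax(x):
    """Numerically stable softmax: subtract the max before exponentiating."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attend(v_i, context_vectors, w1, w2, b):
    """Score every context vector against attribute vector v_i, normalize the
    scores into weights, and return (weights, weighted sum of the contexts)."""
    scores = np.array([score(v_i, v_t, w1, w2, b) for v_t in context_vectors])
    weights = softmax(scores)
    summary = weights @ np.stack(context_vectors)
    return weights, summary


v_i = np.array([1.0, 0.0])                               # attribute word vector
contexts = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # context word vectors
weights, summary = attend(v_i, contexts, np.zeros(2), np.array([1.0, 0.0]), 0.0)
print(weights)  # the first context word receives the larger weight
```

The returned `summary` is the attention-weighted average of the context vectors. Under these assumptions it is the quantity passed on to the next layer.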
After the score of each input is obtained from the scoring function, a classification activation function is applied to the scores to output the corresponding weights. For the first neural network layer, the most important information is the attribute word. The first layer of the attention mechanism therefore takes the word vector of the attribute word as input and outputs the context information relevant to that attribute word's emotion. The word vector of the attribute word and the word vectors of the context information obtained by the first layer are then weighted and summed to form the input of the attention layer of the next neural network layer. Extraction through the multilayer neural network yields the context information most relevant to the emotion of the attribute word. Finally, this context information is classified by emotion, i.e. it is determined whether the emotion is negative (derogatory) or positive, and the corresponding emotion category label is added.
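The layer-by-layer loop can be sketched as a multi-hop forward pass. This is an illustrative, untrained NumPy model built on several assumptions, none of them taken from the patent: the number of hops, the random initialization, the concatenation-based scorer, and the final softmax classifier. It also assumes that each hop's linear-layer output plus the attended context becomes the next hop's query.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


class MultiHopAttention:
    """Untrained forward pass; every parameter here would be learned in training."""

    def __init__(self, dim, n_hops, n_classes, seed=0):
        rng = np.random.default_rng(seed)
        self.n_hops = n_hops
        # per-hop scoring weights over [query; context word] (the W1, W2 of F(x))
        self.w_att = rng.normal(scale=0.1, size=(n_hops, 2 * dim))
        self.b_att = np.zeros(n_hops)
        # per-hop linear layer applied to the query
        self.w_lin = rng.normal(scale=0.1, size=(n_hops, dim, dim))
        # final classifier over emotion category labels
        self.w_out = rng.normal(scale=0.1, size=(n_classes, dim))
        self.b_out = np.zeros(n_classes)

    def forward(self, aspect, context):
        """aspect: (dim,) attribute word vector; context: (n, dim) context vectors.
        Returns a probability for each emotion category label."""
        query = aspect  # the first hop attends with the attribute word itself
        for h in range(self.n_hops):
            pairs = np.hstack([np.tile(query, (len(context), 1)), context])
            weights = softmax(np.tanh(pairs @ self.w_att[h] + self.b_att[h]))
            attended = weights @ context  # emotion-relevant context summary
            # linear layer output plus attended context becomes the next query
            query = self.w_lin[h] @ query + attended
        return softmax(self.w_out @ query + self.b_out)


model = MultiHopAttention(dim=4, n_hops=3, n_classes=2)  # e.g. negative / positive
probs = model.forward(np.ones(4), np.eye(4))
print(probs.shape)  # (2,)
```

In a real model the parameters would be learned by backpropagation on the labeled attribute words. The sketch only shows how information flows from hop to hop.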
In this fine-grained emotion analysis process, attribute words and emotion words are extracted with class sequence rules, which improves the accuracy of their extraction. The mined class sequence rules (the target rules of this embodiment) change as the texts to be trained in the application field change. This improves the generalization ability of the constructed fine-grained emotion analysis model and lets it extend well to new domains. The class sequence rules also address the long-distance dependence between emotion words and attribute words, i.e. between the evaluation object and the evaluation words. Meanwhile, the attention-based neural network extracts the emotional context information related to the attribute words, which achieves fine-grained emotion analysis.
Further, a second embodiment of the fine-grained emotion analysis model construction method is provided.
The second embodiment of the fine-grained emotion analysis model construction method differs from the first embodiment in that, referring to Fig. 3, the method further includes:
step S70, obtaining a third preset number of clauses to be tested, and extracting attribute words in the clauses to be tested according to the target rule.
Once the fine-grained emotion analysis model has been constructed, a third preset number of clauses to be tested are obtained, and their attribute words are extracted according to the target rule. The third preset number may be equal to or smaller than the first preset number. The emotion words in the clauses to be tested can likewise be extracted according to the target rule. Attribute words and emotion words are extracted from the clauses to be tested on the same principle as the attribute word set is extracted from the clauses to be trained, so the details are not repeated in this embodiment.
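Extracting words with a target rule amounts to matching the rule against a clause's part-of-speech sequence. The sketch below assumes a hypothetical rule format: a list of (POS tag, role) pairs, where the role marks the attribute-word or emotion-word slot. The rule is matched as a subsequence, so other words may appear in between, which is how class sequence rules can bridge distant attribute and emotion words. The POS tags n, d and a (noun, adverb, adjective) follow the "#/n,/d,/a" example given in the third embodiment. The English words are illustrative only.

```python
def match_rule(tagged_clause, rule):
    """tagged_clause: list of (word, pos) pairs.
    rule: list of (pos, role) pairs, role in {None, "attr", "emo"}.
    Matches the rule as a subsequence (gaps allowed), taking the earliest match.
    Returns {"attr": [...], "emo": [...]}, or None if the rule does not match."""
    found = {"attr": [], "emo": []}
    i = 0
    for pos, role in rule:
        # skip words until one with the required part of speech is found
        while i < len(tagged_clause) and tagged_clause[i][1] != pos:
            i += 1
        if i == len(tagged_clause):
            return None
        if role is not None:
            found[role].append(tagged_clause[i][0])
        i += 1
    return found


rule = [("n", "attr"), ("d", None), ("a", "emo")]  # noun ... adverb ... adjective
clause = [("the", "r"), ("screen", "n"), ("is", "v"), ("very", "d"), ("clear", "a")]
print(match_rule(clause, rule))  # {'attr': ['screen'], 'emo': ['clear']}
```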
It should be noted that, once the clauses to be tested are obtained, irrelevant characters, stop words and the like are removed from them. The clauses are then segmented into words, and a part-of-speech tag is added to each word. The process is the same as for the clauses to be trained in the first embodiment and is not described again here.
Step S80: vectorize the attribute words of each clause to be tested together with their context information, and input the resulting vectors into the fine-grained emotion analysis model to obtain the emotion category labels of the attribute words in the clauses to be tested.
After the attribute words of each clause to be tested are obtained, the context information corresponding to each attribute word is determined. The attribute words and their context information are vectorized, and the resulting word vectors are input into the constructed fine-grained emotion analysis model. The model outputs the emotion category label of the clause to be tested for each input attribute word.
Step S90: compare the emotion category labels of the attribute words of the clauses to be tested with their preset emotion category labels, and use the comparison result to determine the accuracy of the fine-grained emotion analysis model in analyzing text emotion types.
Once the emotion category labels of the clauses to be tested are obtained, they are compared with the preset emotion category labels of the corresponding attribute words, which the user sets in advance. For a given clause to be tested, if the label output by the model agrees with the preset label, the model's emotion analysis of that clause is judged correct. If it does not agree, the analysis is judged wrong. The model's accuracy in analyzing text emotion types is then determined from the numbers of correct and wrong analyses over all clauses to be tested.
For example, suppose there are 100 clauses to be tested and the model correctly analyzes the emotion of the attribute words in 83 of them. In other words, the emotion category labels it outputs for those 83 attribute words agree with the preset labels, while the labels for the remaining 17 do not. The accuracy of the fine-grained emotion analysis model in analyzing text emotion types is then 83/100 = 83%.
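The accuracy calculation reduces to the fraction of matching labels. Below is a minimal sketch, in which the function name `accuracy` and the label strings are chosen for illustration.

```python
def accuracy(predicted, preset):
    """Fraction of attribute words whose predicted label equals the preset label."""
    if len(predicted) != len(preset) or not preset:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == g for p, g in zip(predicted, preset))
    return correct / len(preset)


preset = ["positive"] * 100
predicted = ["positive"] * 83 + ["negative"] * 17
print(accuracy(predicted, preset))  # 0.83
```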
In this embodiment, once the fine-grained emotion analysis model has been constructed, it is tested on the clauses to be tested to measure its accuracy in analyzing text emotion types. If the user finds this accuracy too low, more clauses to be trained can be obtained to improve the accuracy of the constructed model.
Further, a third embodiment of the fine-grained emotion analysis model construction method is provided.
The third embodiment of the fine-grained emotion analysis model construction method differs from the first or second embodiment in that the method further includes:
Step o: obtain the number of clauses in the clauses to be trained and a preset support rate.
Step p: calculate the product of the number of clauses and the preset support rate, and use the product as the preset support degree threshold.
When mining class sequence rules (CSR), the classes are determined first, and the target rules are then mined according to those classes. In a class sequence rule, the left-hand side is a sequence pattern and the right-hand side is the corresponding type label, and a mapping relation binds the two together. The goal of CSR mining is to find sequence patterns that are highly correlated with the class information, i.e. to mine rules that map sequence patterns to classes. CSR mining is therefore supervised, with the classes given in advance. Sequence pattern mining algorithms such as GSP (Generalized Sequential Patterns) and PrefixSpan can be used for CSR mining. Here, PrefixSpan, which is based on frequent pattern mining, is used to mine the frequent sequence patterns that meet the minimum support degree (i.e. the preset support degree threshold). However, sequence lengths vary widely, so a single fixed minimum support degree is unsuitable for class sequence rule mining. Mining low-frequency sequences would require lowering the support threshold, which would bring in a large number of rules generated by high-frequency words and hence introduce noise.
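For reference, a compact PrefixSpan sketch is shown below for sequences whose elements are single items, such as POS tags or type labels. It is a textbook simplification, not the patent's implementation. Support is counted as the number of sequences that contain a pattern, and each sequence is projected at the first occurrence of the extending item. Items must be mutually comparable because candidates are visited in sorted order.

```python
def prefixspan(db, min_sup):
    """db: list of sequences (lists of items); min_sup: minimum support count.
    Returns a list of (pattern tuple, support count) for all frequent patterns."""
    results = []

    def mine(prefix, projected):
        # support of each candidate item = number of projected suffixes containing it
        counts = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1
        for item, count in sorted(counts.items()):
            if count < min_sup:
                continue
            pattern = prefix + [item]
            results.append((tuple(pattern), count))
            # project each suffix just after the first occurrence of the item
            next_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(pattern, next_projected)

    mine([], db)
    return results


db = [["n", "d", "a"], ["n", "a"], ["d", "a"]]
for pattern, count in prefixspan(db, min_sup=2):
    print(pattern, count)
```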
To avoid this problem, this embodiment uses a multiple minimum support strategy. The number of clauses in the clauses to be trained and a preset support rate are obtained, and their product is used as the preset support degree threshold, i.e. the minimum support degree. If the preset support rate is a and the number of clauses to be trained is n, the preset support threshold is min_sup = a × n. Extensive experiments indicate that a should lie between 0.01 and 0.1.
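The support threshold and the rule test can be sketched as follows. The sketch assumes the conventional class sequence rule definitions. Support is the count of sequences that contain the pattern and carry the rule's class, and confidence is that count divided by the number of sequences containing the pattern. The counting in claim 5 may differ in detail. Each clause contributes one sequence, so `len(db)` plays the role of n. The class labels "C1"/"C2" and the rate 0.5 in the toy example are hypothetical; the text suggests a rate between 0.01 and 0.1 on real corpora.

```python
def is_subsequence(pattern, seq):
    """True if the pattern's items occur in seq in order (gaps allowed)."""
    it = iter(seq)
    return all(item in it for item in pattern)


def min_support(n_clauses, rate):
    """Preset support threshold min_sup = a * n."""
    return rate * n_clauses


def rule_stats(pattern, label, db):
    """db: list of (sequence, class label) pairs, one per clause.
    Returns (support count, confidence) of the rule pattern -> label."""
    covered = [lab for seq, lab in db if is_subsequence(pattern, seq)]
    hits = sum(1 for lab in covered if lab == label)
    confidence = hits / len(covered) if covered else 0.0
    return hits, confidence


def accept_rule(pattern, label, db, rate, min_conf):
    """Keep the rule only if both support and confidence reach their thresholds."""
    hits, confidence = rule_stats(pattern, label, db)
    return hits >= min_support(len(db), rate) and confidence >= min_conf


db = [(["n", "d", "a"], "C1"), (["n", "a"], "C1"),
      (["n", "d", "a"], "C2"), (["v"], "C2")]
print(rule_stats(["n", "a"], "C1", db))  # support count 2, confidence 2/3
```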
It should be noted that a larger value of a gives higher precision in the mined target rules, while repeating the mining over several iterations preserves their recall. For example, part-of-speech sequences of pure combination items that contain type labels, such as "#/n,/d,/a", are extracted separately to obtain target rules. The type labels are the attribute word labels and the emotion word labels. A pure combination item is a part-of-speech sequence formed by words from the same clause to be trained. A non-pure combination item is one formed by words from different clauses to be trained. A mixed item contains both words from the same clause to be trained and words from different clauses to be trained. Pure combination items, non-pure combination items and mixed items are distinguished in order to distinguish the intervals between part-of-speech sequences.
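Claim 4 describes how each clause's part-of-speech sequence is built: labeled attribute and emotion words are replaced by their labels, and other words are represented by their part-of-speech tags. The sketch below assumes a "label/POS" token format inferred from the "#/n,/d,/a" example. It uses "#" as the attribute word label and a hypothetical "*" as the emotion word label, and the English words are illustrative only.

```python
ATTR_LABEL = "#"  # attribute word label, as in the "#/n" example
EMO_LABEL = "*"   # hypothetical emotion word label


def pos_sequence(tagged_clause, attribute_words, emotion_words):
    """Build a clause's part-of-speech sequence: labeled words become
    "label/pos" and all other words become "/pos"."""
    tokens = []
    for word, pos in tagged_clause:
        if word in attribute_words:
            tokens.append(f"{ATTR_LABEL}/{pos}")
        elif word in emotion_words:
            tokens.append(f"{EMO_LABEL}/{pos}")
        else:
            tokens.append(f"/{pos}")
    return tokens


clause = [("screen", "n"), ("very", "d"), ("clear", "a")]
print(",".join(pos_sequence(clause, {"screen"}, set())))      # #/n,/d,/a
print(",".join(pos_sequence(clause, {"screen"}, {"clear"})))  # #/n,/d,*/a
```

A clause without labeled words reduces to its plain part-of-speech tags, which matches the unlabeled case in claim 4.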
Furthermore, an embodiment of the present invention also provides a computer-readable storage medium storing a fine-grained emotion analysis model construction program. When executed by a processor, the program implements the steps of the fine-grained emotion analysis model construction method described above.
The specific implementation manner of the computer-readable storage medium of the present invention is substantially the same as that of each embodiment of the fine-grained emotion analysis model construction method, and is not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for constructing a fine-grained emotion analysis model is characterized by comprising the following steps:
after a first preset number of clauses to be trained are obtained, performing word segmentation operation on the clauses to be trained, and adding part-of-speech labels to each word in the clauses to be trained after word segmentation;
acquiring a second preset number of attribute words and emotion words from the clauses to be trained, adding attribute word labels to the attribute words, adding emotion word labels to the emotion words, and determining part-of-speech sequences corresponding to the clauses to be trained;
mining a target rule according to a part-of-speech sequence containing the attribute word labels and/or the emotion word labels, and extracting an attribute word set and an emotion word set in the clause to be trained according to the target rule, wherein the target rule is obtained by mining in the part-of-speech sequence based on a class sequence rule;
adding emotion category labels to the attribute words in the attribute word set according to the emotion words in the emotion word set that correspond to them;
vectorizing each attribute word in the attribute word set and the context information corresponding to each attribute word to obtain a word vector corresponding to each attribute word and the context information;
and taking the word vector corresponding to the attribute word and the context information as the input of the multilayer neural network of the attention mechanism, and taking the emotion category label corresponding to the attribute word as the output result of the multilayer neural network of the attention mechanism to construct a fine-grained emotion analysis model.
2. The fine-grained emotion analysis model construction method according to claim 1, wherein, after the step of constructing the fine-grained emotion analysis model by using the word vectors corresponding to the attribute words and the context information as the input of the multilayer neural network of the attention mechanism and using the emotion category labels corresponding to the attribute words as the output result of the multilayer neural network of the attention mechanism, the method further comprises:
acquiring a third preset number of clauses to be tested, and extracting attribute words in the clauses to be tested according to the target rule;
vectorizing and expressing the attribute words and the corresponding context information of each clause to be tested, and inputting the vectorized and expressed attribute words and the corresponding context information into the fine-grained emotion analysis model to correspondingly obtain emotion category labels of the attribute words of the clauses to be tested;
and comparing the emotion category labels of the attribute words of the clauses to be tested with the preset emotion category labels of those attribute words, and determining, from the comparison result, the accuracy of the fine-grained emotion analysis model in analyzing text emotion types.
3. The fine-grained emotion analysis model construction method according to claim 1, wherein the step of performing word segmentation on the clause to be trained after a first preset number of clauses to be trained are obtained, and adding part-of-speech tags to each word in the clauses to be trained after word segmentation comprises:
after a first preset number of clauses to be trained are obtained, removing irrelevant characters and stop words in the clauses to be trained, and performing word segmentation operation on the clauses to be trained through a word segmentation algorithm to obtain the clauses to be trained after word segmentation;
and adding part-of-speech labels to the words of the clauses to be trained after word segmentation.
4. The fine-grained emotion analysis model construction method of claim 1, wherein the step of determining the part-of-speech sequence corresponding to each of the clauses to be trained comprises:
detecting whether the clause to be trained carries the attribute word label and the emotion word label;
if the clause to be trained carries the attribute word labels and the emotion word labels, replacing the attribute words in the clause to be trained with the attribute word labels and the emotion words with the emotion word labels, and combining the part-of-speech sequence of the clause to be trained from the part-of-speech labels, attribute word labels and emotion word labels corresponding to the words in the clause;
and if the clause to be trained does not carry the attribute word label and the emotion word label, combining the part of speech sequence of the clause to be trained according to the part of speech labels corresponding to the words in the clause to be trained.
5. The fine-grained emotion analysis model construction method according to claim 1, wherein the step of mining the target rule according to the part-of-speech sequence containing the attribute word labels and/or the emotion word labels comprises:
determining a target part-of-speech sequence containing the attribute word labels and/or the emotion word labels in the part-of-speech sequence;
calculating a first sequence number, being the number of target part-of-speech sequences that conform to the same rule, and determining a second sequence number, being the number of part-of-speech sequences other than the target part-of-speech sequences that conform to a rule to be determined, wherein the rule to be determined is the rule to which the target part-of-speech sequences counted in the first sequence number conform;
calculating the support degree from the total number of part-of-speech sequences and the first sequence number, and calculating the confidence degree from the second sequence number and the first sequence number;
and if the support degree is greater than or equal to a preset support degree threshold value and the confidence degree is greater than or equal to a preset confidence degree threshold value, taking the rule to be determined as a target rule.
6. The fine-grained emotion analysis model construction method according to claim 5, wherein, before the step of taking the rule to be determined as the target rule if the support degree is greater than or equal to the preset support degree threshold and the confidence degree is greater than or equal to the preset confidence degree threshold, the method further comprises:
acquiring the number of clauses in the clauses to be trained and a preset support rate;
and calculating the product of the number of clauses and the preset support rate, and taking the product as the preset support degree threshold.
7. The fine-grained emotion analysis model construction method of claim 1, wherein the step of extracting the attribute word set and emotion word set in the clause to be trained according to the target rule comprises:
determining the clauses added with the attribute word labels and/or the emotion word labels in the clauses to be trained, and marking as target clauses;
and matching part-of-speech sequences of other clauses except the target clause in the clause to be trained with the target rule so as to extract the attribute word set and the emotion word set from the clause to be trained.
8. The fine-grained emotion analysis model construction method according to any one of claims 1 to 7, wherein the step of constructing a fine-grained emotion analysis model by using word vectors corresponding to the attribute words and the context information as inputs of the multi-layer neural network of the attention mechanism and using emotion category labels corresponding to the attribute words as output results of the multi-layer neural network of the attention mechanism comprises:
taking the word vector of the attribute word and the word vectors corresponding to the context information as the input of the attention layer of the first neural network layer of the attention mechanism to obtain the context information related to the emotion of the attribute word;
summing, in the linear layer of the first neural network layer, the word vectors corresponding to the context information related to the emotion of the attribute word and the word vector of the attribute word to obtain a summation result;
and taking the summation result as the input of the next layer of neural network, taking the emotion category label corresponding to the attribute word as the output result of the multilayer neural network of the attention mechanism, obtaining each parameter in the fine-grained emotion analysis model, and constructing the fine-grained emotion analysis model according to the parameters.
9. A fine-grained emotion analysis model construction apparatus comprising a memory, a processor, and a fine-grained emotion analysis model construction program stored on the memory and executable on the processor, the fine-grained emotion analysis model construction program when executed by the processor implementing the steps of the fine-grained emotion analysis model construction method according to any one of claims 1 to 8.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a fine-grained emotion analysis model construction program which, when executed by a processor, implements the steps of the fine-grained emotion analysis model construction method according to any one of claims 1 to 8.
CN201810414228.3A 2018-05-02 2018-05-02 Fine-grained emotion analysis model construction method and device and readable storage medium Active CN108647205B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810414228.3A CN108647205B (en) 2018-05-02 2018-05-02 Fine-grained emotion analysis model construction method and device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810414228.3A CN108647205B (en) 2018-05-02 2018-05-02 Fine-grained emotion analysis model construction method and device and readable storage medium

Publications (2)

Publication Number Publication Date
CN108647205A CN108647205A (en) 2018-10-12
CN108647205B CN108647205B (en) 2022-02-15

Family

ID=63748579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810414228.3A Active CN108647205B (en) 2018-05-02 2018-05-02 Fine-grained emotion analysis model construction method and device and readable storage medium

Country Status (1)

Country Link
CN (1) CN108647205B (en)

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109410011A (en) * 2018-11-05 2019-03-01 深圳市鹏朗贸易有限责任公司 Adopt the recommended method, terminal and computer readable storage medium of product
CN109670170B (en) * 2018-11-21 2023-04-07 东软集团股份有限公司 Professional vocabulary mining method and device, readable storage medium and electronic equipment
CN109614499B (en) * 2018-11-22 2023-02-17 创新先进技术有限公司 Dictionary generation method, new word discovery method, device and electronic equipment
CN109684634B (en) * 2018-12-17 2023-07-25 北京百度网讯科技有限公司 Emotion analysis method, device, equipment and storage medium
CN109710762B (en) * 2018-12-26 2023-08-01 南京云问网络技术有限公司 Short text clustering method integrating multiple feature weights
CN109710934B (en) * 2018-12-26 2023-07-07 南京云问网络技术有限公司 Customer service quality supervision algorithm based on emotion
CN109858035A (en) * 2018-12-29 2019-06-07 深兰科技(上海)有限公司 A kind of sensibility classification method, device, electronic equipment and readable storage medium storing program for executing
CN109766557B (en) * 2019-01-18 2023-07-18 河北工业大学 Emotion analysis method and device, storage medium and terminal equipment
CN111723199A (en) * 2019-03-19 2020-09-29 北京沃东天骏信息技术有限公司 Text classification method and device and computer readable storage medium
CN112446201B (en) * 2019-08-12 2024-08-02 北京国双科技有限公司 Method and device for determining comment properties of text
CN110457480B (en) * 2019-08-16 2023-07-28 国网天津市电力公司 Construction method of fine granularity emotion classification model based on interactive attention mechanism
CN112445907B (en) * 2019-09-02 2024-10-15 顺丰科技有限公司 Text emotion classification method, device, equipment and storage medium
CN111126046B (en) * 2019-12-06 2023-07-14 腾讯云计算(北京)有限责任公司 Sentence characteristic processing method and device and storage medium
CN111143569B (en) * 2019-12-31 2023-05-02 腾讯科技(深圳)有限公司 Data processing method, device and computer readable storage medium
CN111177392A (en) * 2019-12-31 2020-05-19 腾讯云计算(北京)有限责任公司 Data processing method and device
CN111159412B (en) * 2019-12-31 2023-05-12 腾讯科技(深圳)有限公司 Classification method, classification device, electronic equipment and readable storage medium
CN111222344B (en) * 2020-01-03 2023-07-18 支付宝(杭州)信息技术有限公司 Method and device for training neural network and electronic equipment
CN111241847B (en) * 2020-01-15 2024-07-26 深圳前海微众银行股份有限公司 Method and device for identifying emotion reasons of conversation
CN111353303B (en) * 2020-05-25 2020-08-25 腾讯科技(深圳)有限公司 Word vector construction method and device, electronic equipment and storage medium
CN112784048B (en) * 2021-01-26 2023-03-28 海尔数字科技(青岛)有限公司 Method, device and equipment for emotion analysis of user questions and storage medium
CN113221551B (en) * 2021-05-28 2022-07-29 复旦大学 Fine-grained sentiment analysis method based on sequence generation
US11366965B1 (en) * 2021-10-29 2022-06-21 Jouf University Sentiment analysis using bag-of-phrases for Arabic text dialects

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103646088A (en) * 2013-12-13 2014-03-19 合肥工业大学 Product comment fine-grained emotional element extraction method based on CRFs and SVM
CN106372058A (en) * 2016-08-29 2017-02-01 中译语通科技(北京)有限公司 Short text emotion factor extraction method and device based on deep learning
CN106557463A (en) * 2016-10-31 2017-04-05 东软集团股份有限公司 Sentiment analysis method and device
CN107491531A (en) * 2017-08-18 2017-12-19 华南师范大学 Chinese network comment sensibility classification method based on integrated study framework

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
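Learning user and product distributed representations using a sequence model for sentiment analysis; Tao Chen; IEEE Computational Intelligence Magazine; 2016-08-31; Vol. 11, No. 3; pp. 34-44 *
Research on Web Text Opinion Mining and Implicit Sentiment Orientation (Web文本观点挖掘及隐含情感倾向的研究); Yang Hui; China Doctoral Dissertations Full-text Database, Information Science and Technology; 2012-05-15 (No. 5); I138-124 *
Chinese Microblog Sentiment Classification Based on Class Sequential Rules (基于类序列规则的中文微博情感分类); Zheng Cheng; Computer Engineering (计算机工程); 2016-02-28; Vol. 42, No. 2; pp. 184-194 *
Design and Implementation of a Rule-Based Comparative Opinion Mining System (基于规则的比较观点挖掘系统的设计与实现); Song Ya; China Master's Theses Full-text Database; 2015-04-01; no pages listed *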

Also Published As

Publication number Publication date
CN108647205A (en) 2018-10-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant