WO2019174423A1 - Entity sentiment analysis method and related apparatus - Google Patents

Entity sentiment analysis method and related apparatus

Info

Publication number
WO2019174423A1
WO2019174423A1 (PCT/CN2019/073665, CN2019073665W)
Authority
WO
WIPO (PCT)
Prior art keywords
vector
text
predicted
speech sequence
target entity
Prior art date
Application number
PCT/CN2019/073665
Other languages
French (fr)
Chinese (zh)
Inventor
王天祎
Original Assignee
北京国双科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京国双科技有限公司
Publication of WO2019174423A1 publication Critical patent/WO2019174423A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3344 - Query execution using natural language analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning

Definitions

  • The present invention relates to the field of text analysis technology, and in particular, to an entity sentiment analysis method and related apparatus.
  • Text sentiment analysis mainly reflects users' emotional orientation in social media toward certain events, people, companies, products, and the like.
  • Entity sentiment analysis refers to analyzing the sentiment tendency toward certain entities in a text, rather than the tendency of the whole text; the advantage of this is that the analysis of sentiment objects is more fine-grained.
  • the present invention has been made in order to provide an entity sentiment analysis method and related apparatus that overcome the above problems or at least partially solve the above problems.
  • a method of entity sentiment analysis including:
  • The entity sentiment prediction model is constructed based on a first principle; the first principle includes: iteratively updating the parameters in the neural network algorithm until the prediction result obtained by using the neural network algorithm with updated parameters to predict the feature vector of the training text is equivalent to the manual annotation result; the feature vector of the training text is obtained according to the vector of the part-of-speech sequence of the training text and the vector of the target entity in the part-of-speech sequence of the training text.
  • the obtaining a vector of each participle in the part of speech sequence of the text to be predicted and a vector of the target entity including:
  • a vector of the word segment corresponding to the target entity in the text to be predicted is used as a vector of a target entity in the part-of-speech sequence of the text to be predicted.
  • it also includes:
  • the multiplying the word vector and the attenuation factor of each participle of the part-of-speech sequence of the text to be predicted to obtain a vector of each participle in the part-of-speech sequence of the text to be predicted includes:
  • If the text to be predicted contains multiple word segments corresponding to the target entity, the average of the vectors of those word segments is used as the vector of the target entity in the part-of-speech sequence of the text to be predicted.
  • Using the entity sentiment prediction model to predict the vector of each word segment in the part-of-speech sequence of the text to be predicted and the vector of the target entity, so as to obtain a prediction result of the sentiment orientation of the target entity in the text to be predicted, includes:
  • the feature vector is processed by using a softmax function to obtain a probability output vector, wherein the probability output vector includes: a probability value of the target entity in the text to be predicted, respectively, under the sentiment orientation of the preset category.
  • the construction process of the entity sentiment prediction model includes:
  • the feature vector is processed by using a softmax function to obtain a probability output vector, wherein the probability output vector includes: a probability value of the target entity in the training text under the sentiment orientation of the preset category;
  • The first parameter is updated according to the optimized loss function until the probability output vector obtained by predicting the training text with the feature vector computed from the updated first parameter is equivalent to the manual labeling category; wherein the first parameter comprises the first matrix, the softmax function, and the vector of each participle in the part-of-speech sequence of the training text;
  • the updated second parameter is used as the entity sentiment prediction model; wherein the second parameter comprises: the first matrix and the softmax function.
  • An entity sentiment analysis device includes:
  • An obtaining unit configured to obtain a text to be predicted
  • a word segmentation unit configured to perform word segmentation processing on the text to be predicted, to obtain a part-of-speech sequence of the text to be predicted
  • a generating unit configured to obtain a vector of each participle in the part of speech sequence of the text to be predicted and a vector of the target entity
  • a prediction unit configured to predict, by using an entity sentiment prediction model, a vector of each participle in the part-of-speech sequence of the text to be predicted and a vector of the target entity, to obtain a prediction result of the sentiment orientation of the target entity in the text to be predicted
  • The entity sentiment prediction model is constructed based on a first principle; the first principle comprises: iteratively updating parameters in the neural network algorithm until the prediction result obtained by predicting the feature vector of the training text with the neural network algorithm after updating the parameters is equivalent to the manual annotation result; the feature vector of the training text is obtained according to the vector of the part-of-speech sequence of the training text and the vector of the target entity in the part-of-speech sequence of the training text.
  • the generating unit includes:
  • a first obtaining unit configured to respectively obtain a word vector of each participle in the part of speech sequence of the text to be predicted
  • a second obtaining unit configured to multiply a word vector and an attenuation factor of each participle of the part-of-speech sequence of the text to be predicted to obtain a vector of each participle in the part-of-speech sequence of the text to be predicted;
  • a storage medium comprising a stored program, wherein the device in which the storage medium is located is controlled to perform the entity sentiment analysis method according to any one of the above.
  • A processor for running a program, wherein the program, when running, executes the entity sentiment analysis method according to any one of the above.
  • The text to be predicted is subjected to word segmentation processing to obtain the part-of-speech sequence of the text to be predicted; the vector of each participle in the part-of-speech sequence and the vector of the target entity are then obtained, and the entity sentiment prediction model predicts on the vector of each participle in the part-of-speech sequence of the text to be predicted and the vector of the target entity, so that a prediction result of the sentiment orientation of the target entity in the text to be predicted is obtained.
  • In this process, words are not manually selected and word features are not manually extracted, which solves the problem that manual word selection and manually provided word features affect the accuracy of the sentiment orientation result.
  • FIG. 1 is a flowchart showing a process of constructing an entity sentiment prediction model disclosed in an embodiment of the present invention.
  • FIG. 2 is a flowchart showing a specific implementation manner of step S102 disclosed in the embodiment of the present invention.
  • FIG. 3 is a flowchart of a method for analyzing entity sentiment according to an embodiment of the present invention
  • FIG. 4 is a flowchart showing a specific implementation manner of step S303 disclosed in the embodiment of the present invention.
  • FIG. 5 is a flowchart of a specific implementation manner of step S304 disclosed in the embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of an entity sentiment analysis apparatus according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a generating unit disclosed in an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a prediction unit disclosed in an embodiment of the present invention.
  • The entity sentiment prediction model needs to be used to predict the text to be predicted. Therefore, before performing the entity sentiment analysis method disclosed in the embodiment of the present application, the entity sentiment prediction model needs to be constructed first.
  • the process of constructing the entity sentiment prediction model includes:
  • a training document is prepared, and the training document includes at least one training text.
  • the training text is a user's evaluation statement about certain events, people, businesses, products, and so on.
  • LTP (Language Technology Platform)
  • the part-of-speech sequence includes the word segmentation sequence and the part-of-speech result.
  • the word segmentation sequence includes each segmentation word obtained after segmentation of the training text; the part-of-speech result includes the part of speech of each segmentation word.
  • the training text is: the car front face design is mighty and domineering.
  • the obtained word segmentation sequence is [car, front face, design, mighty, domineering]
  • the part of speech result is [n, n, v, a, n]
  • n represents a general noun;
  • v represents a verb;
  • a represents an adjective.
  • each participle in the part of speech sequence of the training text needs to be expressed by using a feature vector. Therefore, it is necessary to obtain a vector of the word segmentation for each participle in the part of speech sequence of the training text.
  • the training text includes a target entity, and the part-of-speech sequence after the word segmentation processing of the training text also includes a word segmentation corresponding to the target entity. Therefore, the vector of the word segmentation corresponding to the target entity in the part of speech sequence of the training text is a vector of the target entity.
  • the step includes:
  • For each participle in the part-of-speech sequence of the training text, the word vector model is searched and the word vector of the current participle in the word vector model is obtained.
  • Specifically, open-source tool software is used to segment each text sentence in a text library, and word vector training is then performed on the segmented sentences, that is, the word vector model is generated.
  • The text library includes an industry corpus and a general corpus, where the general corpus refers to a text library that is independent of the industry.
  • The role of the word vector model is to map words into a vector space of a certain dimension, in which the similarity between words can be represented.
  • the word vector model contains the low-frequency long tail words appearing in the corpus (low-frequency long-tail words refer to words whose frequency is lower than a certain threshold in all vocabulary), and are collectively recorded as UNK (unknown keyword). UNK has a unique word vector in the word vector model.
  • If a participle is not found in the word vector model, the UNK word vector is used as the word vector of that participle, as sketched below.
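  • A minimal Python sketch of this lookup, assuming a hypothetical dictionary-based word vector model (the names, dimensions, and example words are illustrative only and not from the patent):

```python
import numpy as np

# Hypothetical pre-trained word vector model: a mapping from word to vector.
# "UNK" holds the shared vector for low-frequency long-tail / out-of-vocabulary words.
rng = np.random.default_rng(0)
word_vectors = {
    "car":        rng.random(100),
    "front face": rng.random(100),
    "design":     rng.random(100),
    "UNK":        rng.random(100),
}

def lookup_word_vector(token: str) -> np.ndarray:
    """Return the word vector of a participle, falling back to the UNK vector."""
    return word_vectors.get(token, word_vectors["UNK"])

segmented = ["car", "front face", "design", "mighty", "domineering"]
vectors = [lookup_word_vector(w) for w in segmented]  # the last two fall back to UNK here
```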
  • The part of speech of each participle in the part-of-speech sequence of the training text may also lead to a different sentiment orientation of the target entity. Therefore, the part-of-speech vector of each participle in the part-of-speech sequence of the training text can also be obtained.
  • Specifically, a random vector of a certain dimension is used for each part of speech. For example, if there are five kinds of parts of speech [a, b, c, d, e], the random vector Va can be used to represent a; similarly, the random vector Vb is used to represent b, and the dimensions of Va and Vb can be arbitrarily specified. For each participle in the part-of-speech sequence of the training text, the corresponding part-of-speech vector can be obtained according to its part of speech.
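  • A small sketch of such a random part-of-speech encoding follows; the tag set, dimension, and seed are arbitrary assumptions:

```python
import numpy as np

POS_TAGS = ["a", "b", "c", "d", "e"]   # example part-of-speech tags from the text above
POS_DIM = 10                           # dimension can be arbitrarily specified

# One fixed random vector per part of speech (Va, Vb, ...).
rng = np.random.default_rng(1)
pos_vectors = {tag: rng.random(POS_DIM) for tag in POS_TAGS}

def pos_vector(tag: str) -> np.ndarray:
    """Return the part-of-speech vector corresponding to a participle's POS tag."""
    return pos_vectors[tag]
```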
  • the word package to which the word segment belongs also affects the judgment of the sentiment orientation of the target entity.
  • Some participles of the training text may not find a corresponding word vector in the word vector model.
  • In this case, the word-packet vector can serve as a comprehensive representation of the participle.
  • Therefore, a word-packet vector of each participle in the part-of-speech sequence of the training text can also be obtained.
  • Specifically, the membership relationship between each participle in the part-of-speech sequence of the training text and the industry-domain word packages is encoded to obtain the word-packet vector of each participle. For example, it is determined whether each participle in the part-of-speech sequence of the training text is in the entity word package and whether it is in the evaluation word package, and the judgment results are encoded to obtain the word-packet vector of each participle (see the sketch below).
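  • A minimal sketch of such membership encoding, with hypothetical entity and evaluation word packages (the package contents are invented for illustration):

```python
import numpy as np

# Hypothetical industry-domain word packages.
entity_word_package = {"car", "front face", "engine"}
evaluation_word_package = {"mighty", "domineering", "ugly"}

def word_packet_vector(token: str) -> np.ndarray:
    """Encode whether the participle belongs to the entity / evaluation word packages."""
    return np.array([
        1.0 if token in entity_word_package else 0.0,
        1.0 if token in evaluation_word_package else 0.0,
    ])

word_packet_vector("front face")  # -> array([1., 0.])
```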
  • the distance of each participle in the part-of-speech sequence of the training text relative to the target entity may have a different influence on the sentiment orientation of the target entity.
  • The farther a participle is from the target entity, the smaller its influence on the sentiment orientation of the target entity. Therefore, it is also necessary to obtain a vector of the distance of each word segment relative to the target entity in the part-of-speech sequence of the training text.
  • Specifically, the position of each participle is compared with the position of the target entity to obtain the relative distance of each participle to the target entity, which is then encoded as a vector.
  • the target entity is the front face design
  • the distance between each participle from the target entity is [-2,-1,0,0,1,2,3]
  • the distance sequence is encoded, and -2, -1, 0, 1, 2, and 3 are respectively encoded into vectors of a certain dimension, and a vector of relative target entity distances of each participle is obtained.
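  • The following sketch shows one way to compute the signed distances and to encode each distinct distance value as a fixed random vector; the encoding scheme and dimension are assumptions, since the description only states that the distance sequence is encoded into vectors of a certain dimension:

```python
import numpy as np

def relative_distances(tokens, entity_positions):
    """Signed distance of each participle to the nearest participle of the target entity."""
    return [min((i - p for p in entity_positions), key=abs) for i in range(len(tokens))]

DIST_DIM = 8
rng = np.random.default_rng(2)
distance_vectors = {}  # each distinct distance value (-2, -1, 0, 1, ...) gets a fixed encoding

def distance_vector(d: int) -> np.ndarray:
    if d not in distance_vectors:
        distance_vectors[d] = rng.random(DIST_DIM)
    return distance_vectors[d]
```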
  • After obtaining the word vector, the part-of-speech vector, the word-packet vector, and the relative-target-entity-distance vector of each participle in the part-of-speech sequence of the training text, it is also necessary to combine them.
  • Combining the word vector, the part-of-speech vector, the word-packet vector, and the relative-target-entity-distance vector of each participle yields the initial vector of each participle in the part-of-speech sequence of the training text.
  • Specifically, for each participle, the word vector, the part-of-speech vector, the word-packet vector, and the vector of the relative target entity distance are spliced and combined to form the initial vector of the participle, as sketched below.
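  • A concatenation sketch reusing the hypothetical helpers from the sketches above; the dimensions are illustrative and splicing is assumed to mean simple concatenation:

```python
import numpy as np

def initial_vector(token: str, pos_tag: str, distance: int) -> np.ndarray:
    """Splice the four feature vectors of a participle into its initial vector."""
    return np.concatenate([
        lookup_word_vector(token),   # word vector (with UNK fallback)
        pos_vector(pos_tag),         # part-of-speech vector
        word_packet_vector(token),   # word-packet membership vector
        distance_vector(distance),   # relative-target-entity-distance vector
    ])
```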
  • S1022: Multiply the word vector of each participle in the part-of-speech sequence of the training text by an attenuation factor to obtain the vector of each participle in the part-of-speech sequence of the training text.
  • Specifically, the vector of each participle in the part-of-speech sequence of the training text is multiplied by its corresponding attenuation factor to obtain the vector of that participle.
  • The segmentation length of each training text in the training document is counted, and it is determined whether extra-long outlier texts exist in the training document. Specifically, the mean and standard deviation of the segmentation lengths of the training texts are calculated; a training text whose segmentation length exceeds the mean by more than a certain multiple of the standard deviation is an extra-long outlier text. The specific multiple can be set according to the actual situation.
  • The segmentation length of the training text with the longest segmentation length in the training document is used as the length of the part-of-speech sequence of the training document. If extra-long outlier texts are determined to exist in the training document, the longest segmentation length among the remaining training texts (excluding the extra-long outlier texts) is used as the length of the part-of-speech sequence of the training document. In addition, the extra-long outlier texts in the training document are truncated according to the length of the part-of-speech sequence of the training document. Specifically, centered on the target entity in the training text, the sequence is extended forward and backward until its segmentation length reaches the length of the part-of-speech sequence of the training document.
  • For example, there are 10 training texts in the training document with unequal segmentation lengths; if the longest training text has a segmentation length of 50, then 50 is the length of the part-of-speech sequence of the training document. If the training document contains a training text with a segmentation length of 1000, that training text is an extra-long outlier text (see the sketch below).
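  • A hedged sketch of the outlier test and the entity-centered truncation; the multiple k and the alternating forward/backward extension order are assumptions consistent with, but not dictated by, the description:

```python
import numpy as np

def is_extra_long_outlier(length: int, lengths: list[int], k: float = 3.0) -> bool:
    """A text is an extra-long outlier if its length exceeds mean + k * std (k is configurable)."""
    return length > np.mean(lengths) + k * np.std(lengths)

def truncate_around_entity(tokens: list[str], entity_positions: list[int], max_len: int) -> list[str]:
    """Truncate an extra-long text to max_len participles, centered on the target entity."""
    center = sum(entity_positions) // len(entity_positions)
    left = right = center
    while (right - left + 1) < max_len:
        if left > 0:
            left -= 1                                   # extend toward the beginning
        if (right - left + 1) < max_len and right < len(tokens) - 1:
            right += 1                                  # extend toward the end
        if left == 0 and right == len(tokens) - 1:      # whole text already covered
            break
    return tokens[left:right + 1]
```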
  • S1023: A vector of the participle of the training text corresponding to the target entity is used as the vector of the target entity.
  • If multiple word segments in the training text correspond to the target entity, the average of the vectors of those word segments is used as the vector of the target entity.
  • The vector of the target entity may further be combined onto the vector of each participle in the part-of-speech sequence of the training text to obtain the final vector of each participle in the part-of-speech sequence of the training text.
  • The vector of each participle in the part-of-speech sequence of the training text is processed with the attention layer of HAN (Hierarchical Attention Networks) to obtain the weight of each participle. Specifically, if a participle is far from the target entity, its influence on the sentiment toward the target entity is small and its weight is weakened; otherwise, its weight is strengthened. A simplified sketch of such an attention layer is given below.
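  • A minimal numpy sketch of an additive-attention weighted average; the learned projection inside the HAN attention layer is reduced here to a single context vector, so this is an illustrative simplification rather than the exact network described in the patent:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_weighted_average(token_vectors: list[np.ndarray], context: np.ndarray):
    """Score each participle vector against a context vector, normalize with softmax,
    and return the weighted average of the sequence plus the per-participle weights."""
    H = np.stack(token_vectors)            # (seq_len, dim)
    scores = np.tanh(H) @ context          # (seq_len,)
    weights = softmax(scores)              # attention weight of each participle
    return weights @ H, weights            # weighted sequence vector, weights
```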
  • the vector of the target entity in the part of speech sequence of the training text is multiplied by the first matrix to obtain a derived vector of the target entity.
  • The first matrix is an m × m matrix, where m is the dimension of the vector of the target entity in the part-of-speech sequence of the training text.
  • The specific values of the first matrix are randomly initialized; for example, each value can be drawn from a uniform distribution over the range -0.1 to 0.1.
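  • A short sketch of this step; the dimension and seed are arbitrary, and in training the first matrix would be a learnable parameter rather than a fixed array:

```python
import numpy as np

m = 128                                               # dimension of the target-entity vector (example)
rng = np.random.default_rng(42)
first_matrix = rng.uniform(-0.1, 0.1, size=(m, m))    # random init, uniform in [-0.1, 0.1]

def derived_entity_vector(entity_vector: np.ndarray) -> np.ndarray:
    """Multiply the target-entity vector by the first matrix to obtain its derived vector."""
    return first_matrix @ entity_vector
```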
  • The weighted vector of the part-of-speech sequence of the training text may be used as the feature vector, or the derived vector of the target entity may be used as the feature vector; alternatively, based on the weighted vector of the part-of-speech sequence of the training text, the derived vector of the target entity may be added or subtracted to obtain the feature vector.
  • Optionally, the derived vector of the target entity may be selected as the feature vector.
  • The feature vector obtained in this way allows the derived vector of the target entity to act on the weighted vector of the part-of-speech sequence of the training text.
  • steps S103-S104 may be repeatedly performed several times, wherein the number of times to be repeatedly executed may be set according to actual requirements.
  • Specifically, the feature vector obtained the last time step S104 was performed is used as the vector of the target entity at the next execution of steps S103 and S104, so that the latest weighted vector of the part-of-speech sequence of the training text and the latest derived vector of the target entity are obtained; a new feature vector is then obtained according to the latest weighted vector of the part-of-speech sequence of the training text and/or the latest derived vector of the target entity.
  • S106 Process the feature vector by using a softmax function to obtain a probability output vector.
  • The probability output vector includes probability values of three categories: positive, neutral, and negative. Positive indicates that the training text is positive toward the target entity; negative indicates that the training text is negative toward the target entity; neutral indicates that the training text is neutral toward the target entity.
  • the probability value of each category is used to indicate the probability that the entity emotion of the target entity of the training text belongs to the corresponding category.
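  • A hedged sketch of this softmax step; the projection weights W and bias b are assumed trainable parameters of the softmax layer, which the description refers to collectively as the softmax function:

```python
import numpy as np

CLASSES = ["positive", "neutral", "negative"]

def probability_output_vector(feature_vector: np.ndarray, W: np.ndarray, b: np.ndarray) -> dict:
    """Project the feature vector to three logits and apply softmax to obtain the
    probability of each preset sentiment category."""
    logits = W @ feature_vector + b          # W: (3, feature_dim), b: (3,)
    e = np.exp(logits - np.max(logits))
    probs = e / e.sum()
    return dict(zip(CLASSES, probs))
```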
  • The sentiment of the training text toward the target entity is manually identified and labeled as one of the three categories (positive, neutral, or negative), yielding the manual labeling category of the training text.
  • For example, for the training text "the front face design of the car is mighty and domineering",
  • the target entity is "front face design",
  • and the sentiment is positive. Therefore, the identifier of the manual labeling category of the training text can be [1, 0, 0].
  • the first parameter includes a vector of each of the attention layer, the first matrix, the softmax function, and the part of speech of the training text.
  • For the manner of obtaining the vector of each participle in the part-of-speech sequence of the training text, refer to the content of step S102 in the embodiment corresponding to FIG. 1, which is not repeated herein.
  • The loss function can be optimized by a stochastic gradient descent method or the Adam optimization algorithm, etc., to obtain an optimized loss function, and the parameters are then updated recursively layer by layer according to the optimized loss function, as sketched below.
  • Here, "equivalent" means that, from the perspective of those skilled in the art, the probability output vector can be regarded as equivalent to the manual labeling category of the training text, even if the two are not exactly the same.
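  • The following PyTorch sketch illustrates one cross-entropy training step of this kind. It is a hedged simplification: the exact network structure, the feature-vector combination used (here, the weighted sequence vector plus the derived entity vector), and which parameters are treated as trainable (the description also updates the participle vectors themselves) are assumptions, and all names are invented for illustration:

```python
import torch
import torch.nn as nn

class EntitySentimentModel(nn.Module):
    """Toy stand-in for the attention layer, the first matrix, and the softmax layer."""
    def __init__(self, dim: int, num_classes: int = 3):
        super().__init__()
        self.context = nn.Parameter(torch.randn(dim))                        # attention context vector
        self.first_matrix = nn.Parameter(torch.empty(dim, dim).uniform_(-0.1, 0.1))
        self.classifier = nn.Linear(dim, num_classes)                        # softmax layer (produces logits)

    def forward(self, token_vecs: torch.Tensor, entity_vec: torch.Tensor) -> torch.Tensor:
        scores = torch.tanh(token_vecs) @ self.context                       # (seq_len,)
        weights = torch.softmax(scores, dim=0)                               # attention weights
        weighted = weights @ token_vecs                                      # weighted sequence vector
        derived = self.first_matrix @ entity_vec                             # derived entity vector
        return self.classifier(weighted + derived)                           # feature vector -> logits

model = EntitySentimentModel(dim=128)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)                    # Adam, one of the named options
loss_fn = nn.CrossEntropyLoss()                                              # cross-entropy with the manual label

def train_step(token_vecs: torch.Tensor, entity_vec: torch.Tensor, label: int) -> float:
    optimizer.zero_grad()
    logits = model(token_vecs, entity_vec)
    loss = loss_fn(logits.unsqueeze(0), torch.tensor([label]))               # label: 0=positive, 1=neutral, 2=negative
    loss.backward()                                                          # propagate gradients layer by layer
    optimizer.step()                                                         # update the parameters
    return loss.item()
```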
  • the updated second parameter is used as an entity sentiment prediction model.
  • the second parameter includes: the attention layer, the first matrix, and the softmax function.
  • After the entity sentiment prediction model is constructed, entity sentiment analysis can be performed on the text to be predicted.
  • the entity sentiment analysis method includes:
  • the text to be predicted is a user's evaluation statement about certain events, people, businesses, products, and the like.
  • the text to be predicted is obtained to analyze the sentiment orientation of the text with respect to the target entity in the text.
  • the open source tool software is also used for word segmentation, and the part-of-speech sequence of the corresponding word segmentation is obtained.
  • the specific implementation process of this step refer to the content of step S101 in the embodiment corresponding to FIG. 1, and details are not described herein again.
  • the step includes:
  • S3031 Obtain a word vector of each participle in the part of speech sequence of the text to be predicted, respectively.
  • A part-of-speech vector, a word-packet vector, and a relative-target-entity-distance vector of each participle in the part-of-speech sequence of the text to be predicted may also be obtained.
  • For the manner in which the word vector, the part-of-speech vector, the word-packet vector, and the relative-target-entity-distance vector are obtained, refer to the content of step S1021 in the embodiment corresponding to FIG. 1.
  • S3032 Multiply a word vector of each part of the part of speech of the text to be predicted and an attenuation factor to obtain a vector of each part of the part of speech of the text to be predicted.
  • For the specific implementation of this step, refer to the content of step S1022 in the embodiment corresponding to FIG. 1, and details are not described herein again.
  • a vector of the word segmentation corresponding to the target entity in the text to be predicted is used as a vector of the target entity.
  • If multiple word segments in the text to be predicted correspond to the target entity, the average of the vectors of those word segments is used as the vector of the target entity.
  • S304: Predict, by using the entity sentiment prediction model, the vector of each participle in the part-of-speech sequence of the text to be predicted and the vector of the target entity, to obtain a prediction result of the sentiment orientation of the target entity in the text to be predicted; wherein the entity sentiment prediction model is constructed based on the first principle; the first principle includes: iteratively updating the parameters in the neural network algorithm until the prediction result obtained by predicting the feature vector of the training text with the neural network algorithm after updating the parameters is equivalent to the manual annotation result; the feature vector of the training text is obtained according to the vector of the part-of-speech sequence of the training text and the vector of the target entity in the part-of-speech sequence of the training text.
  • a vector of each participle in the part-of-speech sequence of the text to be predicted and a vector of the target entity are obtained.
  • predicting, by the entity sentiment prediction model, a vector of each participle in the part-of-speech sequence of the text to be predicted and a vector of the target entity to obtain a prediction result of the sentiment orientation of the target entity in the text to be predicted.
  • In the above process, the text to be predicted is subjected to word segmentation to obtain a part-of-speech sequence, and the vector of each participle in the part-of-speech sequence and the vector of the target entity are obtained, instead of manually selecting words and extracting word features, which solves the problem that manual word selection and manually provided word features affect the accuracy of the sentiment orientation result.
  • step S304 includes:
  • S3041 Perform weighted average processing on a vector of each participle in the part of speech sequence of the text to be predicted, and obtain a vector weighted by the part of speech sequence of the text to be predicted.
  • For the specific implementation of this step, refer to the content of step S103 in the embodiment corresponding to FIG. 1, and details are not described herein again.
  • S3042 Multiply a vector of the target entity in the part-of-speech sequence of the text to be predicted by a first matrix to obtain a derived vector of the target entity.
  • The first matrix is the first matrix corresponding to the entity sentiment prediction model in step S109 in the embodiment of FIG. 1.
  • For the specific implementation of this step, refer to the content of step S104 in the embodiment corresponding to FIG. 1, and details are not described herein again.
  • S3043: Obtain a feature vector according to the vector weighted by the part-of-speech sequence of the text to be predicted, and/or the derived vector of the target entity in the part-of-speech sequence of the text to be predicted.
  • For the specific implementation of this step, refer to the content of step S105 in the embodiment corresponding to FIG. 1, and details are not described herein again.
  • S3044 Processing the feature vector by using a softmax function to obtain a probability output vector.
  • the softmax function is a softmax function corresponding to the entity sentiment prediction model in step S109 in the embodiment of FIG. 1.
  • For the specific processing using the softmax function, refer to the content of step S106 in the embodiment corresponding to FIG. 1, and details are not described herein again.
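  • As a hedged end-to-end illustration of steps S3041 to S3044, the sketch below reuses the hypothetical EntitySentimentModel from the training sketch above, assuming the participle vectors of the text to be predicted have already been computed and match the model's dimension:

```python
import torch

def predict_sentiment(model, token_vecs: torch.Tensor, entity_positions: list[int]) -> dict:
    """token_vecs: (seq_len, dim) tensor of participle vectors for the text to be predicted.
    entity_positions: indices of the participles that make up the target entity."""
    entity_vec = token_vecs[entity_positions].mean(dim=0)   # average when the entity spans several participles
    with torch.no_grad():
        logits = model(token_vecs, entity_vec)              # attention, first matrix, softmax layer
        probs = torch.softmax(logits, dim=0)
    return dict(zip(["positive", "neutral", "negative"], probs.tolist()))

# Usage (hypothetical): probs = predict_sentiment(model, token_vecs, entity_positions=[1, 2])
```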
  • the entity sentiment analysis apparatus includes:
  • the obtaining unit 601 is configured to obtain the text to be predicted.
  • the word segmentation unit 602 is configured to perform word segmentation processing on the to-be-predicted text to obtain a part-of-speech sequence of the to-be-predicted text.
  • the generating unit 603 is configured to obtain a vector of each participle in the part of speech sequence of the text to be predicted and a vector of the target entity.
  • the generating unit 603, referring to FIG. 7, includes:
  • the first obtaining unit 6031 is configured to respectively obtain a word vector of each participle of the part of speech sequence of the text to be predicted.
  • the second obtaining unit 6032 is configured to multiply the word vector of each part of the part of speech of the text to be predicted and the attenuation factor to obtain a vector of each part of the part of speech of the text to be predicted.
  • the generating sub-unit 6033 is configured to use a vector of the word segmentation corresponding to the target entity in the text to be predicted as a vector of the target entity in the part-of-speech sequence of the text to be predicted.
  • When the text to be predicted contains multiple word segments corresponding to the target entity, the generating sub-unit 6033, in using the vector of the word segmentation corresponding to the target entity in the text to be predicted as the vector of the target entity in the part-of-speech sequence of the text to be predicted, is specifically configured to: use the average value of the vectors of the plurality of word segments corresponding to the target entity in the text to be predicted as the vector of the target entity in the part-of-speech sequence of the text to be predicted.
  • the entity sentiment analysis apparatus further includes:
  • a third obtaining unit configured to obtain any one or combination of a part of speech vector, a word packet vector, and a vector of a relative target entity distance of each part of the part of speech of the text to be predicted.
  • A combining unit, configured to combine the word vector of each participle in the part-of-speech sequence of the text to be predicted with any one or combination of the obtained part-of-speech vector, word-packet vector, and relative-target-entity-distance vector of each participle in the part-of-speech sequence of the text to be predicted, to obtain the initial vector of each participle in the part-of-speech sequence of the text to be predicted.
  • the second obtaining unit 6032 performs multiplication of the word vector and the attenuation factor of each participle of the part-of-speech sequence of the text to be predicted to obtain a vector of each participle in the part-of-speech sequence of the text to be predicted. Specifically, the method uses: multiplying an initial vector and an attenuation factor of each participle in the part-of-speech sequence of the text to be predicted to obtain a vector of each participle in the part-of-speech sequence of the text to be predicted.
  • a prediction unit 604 configured to predict, by using an entity sentiment prediction model, a vector of each participle in the part-of-speech sequence of the text to be predicted and a vector of the target entity, to obtain a prediction of the sentiment orientation of the target entity in the text to be predicted a result;
  • the entity sentiment prediction model is constructed based on a first principle; the first principle comprises: iteratively updating parameters in the neural network algorithm until a feature vector of the training text is obtained by using a neural network algorithm after updating the parameter
  • the prediction result obtained by the prediction is equivalent to the manual annotation result;
  • the feature vector of the training text is obtained according to the vector of the part-of-speech sequence of the training text and the vector of the target entity in the part-of-speech sequence of the training text.
  • the prediction unit 604, as shown in FIG. 8, includes:
  • the first calculating unit 6041 is configured to perform weighted averaging processing on the vector of each participle in the part of speech sequence of the text to be predicted, to obtain a vector weighted by the part of speech sequence of the text to be predicted.
  • the second calculating unit 6042 is configured to multiply a vector of the target entity in the part-of-speech sequence of the text to be predicted by the first matrix to obtain a derived vector of the target entity.
  • the third calculating unit 6043 is configured to obtain a feature vector according to the vector weighted by the part of speech sequence of the text to be predicted, and/or the derived vector of the target entity in the part of speech sequence of the text to be predicted.
  • the fourth calculating unit 6044 is configured to process the feature vector by using a softmax function to obtain a probability output vector.
  • the text to be predicted is subjected to word segmentation by the word segmentation unit to obtain a part of speech sequence, and the vector of each word segment in the part of speech sequence and the vector of the target entity are obtained by the generating unit, instead of manually selecting words and extracting word features.
  • In an embodiment, the entity sentiment analysis apparatus may further process the training text to obtain the entity sentiment prediction model.
  • the word segmentation unit 602 is further configured to perform word segmentation processing on the training text to obtain a part-of-speech sequence of the training text.
  • the generating unit 603 is further configured to obtain a vector of each participle in the part of speech sequence of the training text and a vector of the target entity.
  • The first calculating unit 6041 is further configured to perform weighted averaging processing on the vector of each participle in the part-of-speech sequence of the training text, to obtain the weighted vector of the part-of-speech sequence of the training text.
  • the second calculating unit 6042 is further configured to multiply a vector of the target entity in the part-of-speech sequence of the training text by the first matrix to obtain a derived vector of the target entity in the part-of-speech sequence of the training text.
  • the third calculating unit 6043 is further configured to obtain a feature vector according to the weighted vector of the part of speech of the training text, and/or the derived vector of the target entity in the part of speech sequence of the training text.
  • the fourth calculating unit 6044 is further configured to process the feature vector by using a softmax function to obtain a probability output vector.
  • The entity sentiment analysis apparatus further includes: an operation unit configured to perform a cross-entropy operation on the probability output vector and the manual annotation category of the training text to obtain a loss function;
  • an optimization unit configured to optimize the loss function;
  • an updating unit configured to update the first parameter according to the optimized loss function, until the probability output vector obtained when the fourth calculating unit 6044 predicts the training text by using the feature vector obtained with the updated first parameter is equivalent to the manual annotation category of the training text; wherein the first parameter comprises the first matrix, the softmax function, and the vector of each participle in the part-of-speech sequence of the training text.
  • a building unit configured to use the updated second parameter as an entity sentiment prediction model; wherein the second parameter comprises: the first matrix and the softmax function.
  • In one embodiment, the entity sentiment analysis apparatus includes a processor and a memory; the above-mentioned obtaining unit, word segmentation unit, generating unit, prediction unit, and the like are all stored in the memory as program units, and the processor executes the above program units stored in the memory to implement the corresponding functions.
  • The processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory.
  • One or more kernels may be set, and the sentiment analysis processing of the text to be predicted is implemented by adjusting kernel parameters, so as to obtain the prediction result of the sentiment orientation of the target entity in the text to be predicted.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory (flash RAM), the memory including at least one Memory chip.
  • Embodiments of the present invention provide a storage medium on which a program is stored, which is implemented by a processor to implement the entity sentiment analysis method.
  • An embodiment of the present invention provides a processor, where the processor is configured to run a program, where the program executes the method of entity sentiment analysis.
  • the embodiment of the invention provides a device, which may be a server, a PC, a PAD, a mobile phone or the like.
  • the device includes a processor, a memory, and a program stored on the memory and executable on the processor, and the processor performs the following steps when executing the program:
  • a method of entity sentiment analysis including:
  • The entity sentiment prediction model is constructed based on a first principle; the first principle includes: iteratively updating the parameters in the neural network algorithm until the prediction result obtained by using the neural network algorithm with updated parameters to predict the feature vector of the training text is equivalent to the manual annotation result; the feature vector of the training text is obtained according to the vector of the part-of-speech sequence of the training text and the vector of the target entity in the part-of-speech sequence of the training text.
  • the obtaining a vector of each participle in the part of speech sequence of the text to be predicted and a vector of the target entity including:
  • a vector of the word segment corresponding to the target entity in the text to be predicted is used as a vector of a target entity in the part-of-speech sequence of the text to be predicted.
  • the entity sentiment analysis method further includes:
  • the multiplying the word vector and the attenuation factor of each participle of the part-of-speech sequence of the text to be predicted to obtain a vector of each participle in the part-of-speech sequence of the text to be predicted includes:
  • If the text to be predicted contains multiple word segments corresponding to the target entity, the average of the vectors of those word segments is used as the vector of the target entity in the part-of-speech sequence of the text to be predicted.
  • Using the entity sentiment prediction model to predict the vector of each word segment in the part-of-speech sequence of the text to be predicted and the vector of the target entity, so as to obtain a prediction result of the sentiment orientation of the target entity in the text to be predicted, includes:
  • the feature vector is processed by using a softmax function to obtain a probability output vector, wherein the probability output vector includes: a probability value of the target entity in the text to be predicted, respectively, under the sentiment orientation of the preset category.
  • the process of constructing the entity sentiment prediction model includes:
  • the feature vector is processed by using a softmax function to obtain a probability output vector, wherein the probability output vector includes: a probability value of the target entity in the training text under the sentiment orientation of the preset category;
  • The first parameter is updated according to the optimized loss function until the probability output vector obtained by predicting the training text with the feature vector computed from the updated first parameter is equivalent to the manual labeling category; wherein the first parameter comprises the first matrix, the softmax function, and the vector of each participle in the part-of-speech sequence of the training text;
  • the updated second parameter is used as the entity sentiment prediction model; wherein the second parameter comprises: the first matrix and the softmax function.
  • The invention also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:
  • a method of entity sentiment analysis including:
  • The entity sentiment prediction model is constructed based on a first principle; the first principle includes: iteratively updating the parameters in the neural network algorithm until the prediction result obtained by using the neural network algorithm with updated parameters to predict the feature vector of the training text is equivalent to the manual annotation result; the feature vector of the training text is obtained according to the vector of the part-of-speech sequence of the training text and the vector of the target entity in the part-of-speech sequence of the training text.
  • the obtaining a vector of each participle in the part of speech sequence of the text to be predicted and a vector of the target entity including:
  • a vector of the word segment corresponding to the target entity in the text to be predicted is used as a vector of a target entity in the part-of-speech sequence of the text to be predicted.
  • the entity sentiment analysis method further includes:
  • the multiplying the word vector and the attenuation factor of each participle of the part-of-speech sequence of the text to be predicted to obtain a vector of each participle in the part-of-speech sequence of the text to be predicted includes:
  • If the text to be predicted contains multiple word segments corresponding to the target entity, the average of the vectors of those word segments is used as the vector of the target entity in the part-of-speech sequence of the text to be predicted.
  • Using the entity sentiment prediction model to predict the vector of each word segment in the part-of-speech sequence of the text to be predicted and the vector of the target entity, so as to obtain a prediction result of the sentiment orientation of the target entity in the text to be predicted, includes:
  • the feature vector is processed by using a softmax function to obtain a probability output vector, wherein the probability output vector includes: probability values of the target entity in the text to be predicted under each preset category of sentiment orientation.
  • the process of constructing the entity sentiment prediction model includes:
  • the feature vector is processed by using a softmax function to obtain a probability output vector, wherein the probability output vector includes: a probability value of the target entity in the training text under the sentiment orientation of the preset category;
  • The first parameter is updated according to the optimized loss function until the probability output vector obtained by predicting the training text with the feature vector computed from the updated first parameter is equivalent to the manual labeling category; wherein the first parameter comprises the first matrix, the softmax function, and the vector of each participle in the part-of-speech sequence of the training text;
  • the updated second parameter is used as the entity sentiment prediction model; wherein the second parameter comprises: the first matrix and the softmax function.
  • embodiments of the present application can be provided as a method, system, or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment in combination of software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device.
  • The instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • The instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media includes both permanent and non-persistent, removable and non-removable media.
  • Information storage can be implemented by any method or technology.
  • the information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by a computing device.
  • As defined herein, computer readable media does not include transitory computer readable media, such as modulated data signals and carrier waves.
  • embodiments of the present application can be provided as a method, system, or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment in combination of software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Disclosed in the present invention are an entity sentiment analysis method and a related apparatus. The entity sentiment analysis method comprises: performing word segmentation processing on a text to be predicted to obtain a part-of-speech sequence of the text to be predicted; obtaining the vector of each participle in the part-of-speech sequence of the text to be predicted and the vector of a target entity; and predicting the vector of each participle in the part-of-speech sequence of the text to be predicted and the vector of the target entity by means of an entity sentiment prediction model, so as to obtain a prediction result of the sentiment tendency of the target entity in the text to be predicted. In this process, the text to be predicted is subjected to word segmentation processing to obtain its part-of-speech sequence, and the vector of each participle in the part-of-speech sequence and the vector of the target entity are obtained, instead of manually selecting words and extracting word features, which solves the problem that manual word selection and manually provided word features affect the accuracy of the sentiment tendency result.

Description

Entity sentiment analysis method and related device
This application claims priority to Chinese Patent Application No. 201810217282.9, filed with the China Patent Office on March 16, 2018 and entitled "Entity Sentiment Analysis Method and Related Device", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of text analysis technology, and in particular, to an entity sentiment analysis method and related apparatus.
Background Art
Text sentiment analysis mainly reflects users' emotional orientation in social media toward certain events, people, companies, products, and the like. Entity sentiment analysis refers to analyzing the sentiment tendency toward certain entities in a text, rather than the tendency of the whole text; the advantage of this is that the analysis of sentiment objects is more fine-grained.
Existing solutions generally rely mainly on manually extracted features fed into traditional machine learning classification algorithms. Specifically, words around the target entity in the text are manually selected, and the features of these words are extracted and input to a classifier; the classifier then performs sentiment analysis to obtain the sentiment tendency result of the text toward the target entity.
Manually selecting words and extracting their features makes the feature extraction process highly subjective, which affects the accuracy of the sentiment tendency results.
Summary of the Invention
In view of the above problems, the present invention is proposed in order to provide an entity sentiment analysis method and related apparatus that overcome the above problems or at least partially solve the above problems.
An entity sentiment analysis method, including:
obtaining a text to be predicted;
performing word segmentation processing on the text to be predicted to obtain a part-of-speech sequence of the text to be predicted;
obtaining a vector of each participle in the part-of-speech sequence of the text to be predicted and a vector of the target entity;
predicting, by using an entity sentiment prediction model, the vector of each participle in the part-of-speech sequence of the text to be predicted and the vector of the target entity, to obtain a prediction result of the sentiment orientation of the target entity in the text to be predicted; wherein: the entity sentiment prediction model is constructed based on a first principle; the first principle includes: iteratively updating the parameters in the neural network algorithm until the prediction result obtained by using the neural network algorithm with updated parameters to predict the feature vector of the training text is equivalent to the manual annotation result; the feature vector of the training text is obtained according to the vector of the part-of-speech sequence of the training text and the vector of the target entity in the part-of-speech sequence of the training text.
Optionally, the obtaining a vector of each participle in the part-of-speech sequence of the text to be predicted and a vector of the target entity includes:
obtaining a word vector of each participle in the part-of-speech sequence of the text to be predicted, respectively;
multiplying the word vector of each participle in the part-of-speech sequence of the text to be predicted by an attenuation factor to obtain the vector of each participle in the part-of-speech sequence of the text to be predicted;
using the vector of the word segment corresponding to the target entity in the text to be predicted as the vector of the target entity in the part-of-speech sequence of the text to be predicted.
Optionally, the method further includes:
obtaining any one or combination of a part-of-speech vector, a word-packet vector, and a relative-target-entity-distance vector of each participle in the part-of-speech sequence of the text to be predicted;
combining the word vector of each participle in the part-of-speech sequence of the text to be predicted with any one or combination of the obtained part-of-speech vector, word-packet vector, and relative-target-entity-distance vector of each participle in the part-of-speech sequence of the text to be predicted, to obtain an initial vector of each participle in the part-of-speech sequence of the text to be predicted;
wherein the multiplying the word vector of each participle in the part-of-speech sequence of the text to be predicted by an attenuation factor to obtain the vector of each participle in the part-of-speech sequence of the text to be predicted includes:
multiplying the initial vector of each participle in the part-of-speech sequence of the text to be predicted by the attenuation factor to obtain the vector of each participle in the part-of-speech sequence of the text to be predicted.
Optionally, if there are multiple word segments corresponding to the target entity in the text to be predicted, the average of the vectors of the multiple word segments corresponding to the target entity in the text to be predicted is used as the vector of the target entity in the part-of-speech sequence of the text to be predicted.
Optionally, the predicting, by using the entity sentiment prediction model, the vector of each participle in the part-of-speech sequence of the text to be predicted and the vector of the target entity, to obtain a prediction result of the sentiment orientation of the target entity in the text to be predicted includes:
performing weighted average processing on the vector of each participle in the part-of-speech sequence of the text to be predicted to obtain a weighted vector of the part-of-speech sequence of the text to be predicted;
multiplying the vector of the target entity in the part-of-speech sequence of the text to be predicted by a first matrix to obtain a derived vector of the target entity;
obtaining a feature vector according to the weighted vector of the part-of-speech sequence of the text to be predicted, and/or the derived vector of the target entity in the part-of-speech sequence of the text to be predicted;
processing the feature vector by using a softmax function to obtain a probability output vector, wherein the probability output vector includes: probability values of the target entity in the text to be predicted under each preset category of sentiment orientation.
Optionally,
the construction process of the entity sentiment prediction model includes:
performing word segmentation on the training text to obtain a part-of-speech sequence of the training text;
obtaining a vector of each word segment in the part-of-speech sequence of the training text and a vector of the target entity;
performing weighted averaging on the vectors of the word segments in the part-of-speech sequence of the training text, to obtain a weighted vector of the part-of-speech sequence of the training text;
multiplying the vector of the target entity in the part-of-speech sequence of the training text by a first matrix, to obtain a derived vector of the target entity in the part-of-speech sequence of the training text;
obtaining a feature vector according to the weighted vector of the part-of-speech sequence of the training text and/or the derived vector of the target entity in the part-of-speech sequence of the training text;
processing the feature vector with a softmax function to obtain a probability output vector, wherein the probability output vector includes the probability values of the target entity in the training text under each of the preset categories of sentiment orientation;
performing a cross-entropy operation on the probability output vector and the manually annotated category of the training text, to obtain a loss function;
optimizing the loss function, and updating a first parameter according to the optimized loss function, until the probability output vector obtained by predicting the training text with the feature vector derived from the updated first parameter is equivalent to the manually annotated category of the training text; wherein the first parameter includes the first matrix, the softmax function, and the vector of each word segment in the part-of-speech sequence of the training text;
using the updated second parameter as the entity sentiment prediction model, wherein the second parameter includes the first matrix and the softmax function.
An entity sentiment analysis apparatus, including:
an obtaining unit, configured to obtain text to be predicted;
a word segmentation unit, configured to perform word segmentation on the text to be predicted, to obtain a part-of-speech sequence of the text to be predicted;
a generating unit, configured to obtain a vector of each word segment in the part-of-speech sequence of the text to be predicted and a vector of the target entity;
a prediction unit, configured to use an entity sentiment prediction model to predict the vector of each word segment in the part-of-speech sequence of the text to be predicted and the vector of the target entity, to obtain a prediction result of the sentiment orientation of the target entity in the text to be predicted; wherein the entity sentiment prediction model is constructed based on a first principle; the first principle includes: iteratively updating the parameters in the neural network algorithm until the prediction result obtained by using the neural network algorithm with updated parameters to predict the feature vector of the training text is equivalent to the manual annotation result; and the feature vector of the training text is obtained according to the vector of the part-of-speech sequence of the training text and the vector of the target entity in the part-of-speech sequence of the training text.
Optionally, the generating unit includes:
a first obtaining unit, configured to obtain a word vector of each word segment in the part-of-speech sequence of the text to be predicted;
a second obtaining unit, configured to multiply the word vector of each word segment in the part-of-speech sequence of the text to be predicted by an attenuation factor, to obtain the vector of each word segment in the part-of-speech sequence of the text to be predicted;
a generating subunit, configured to use the vector of the word segment corresponding to the target entity in the text to be predicted as the vector of the target entity in the part-of-speech sequence of the text to be predicted.
A storage medium, including a stored program, wherein, when the program runs, a device on which the storage medium resides is controlled to perform the entity sentiment analysis method according to any one of the above.
A processor, configured to run a program, wherein the entity sentiment analysis method according to any one of the above is performed when the program runs.
By means of the above technical solution, in the entity sentiment analysis method and related apparatus provided by the present invention, the text to be predicted is segmented to obtain its part-of-speech sequence, the vector of each word segment in the part-of-speech sequence and the vector of the target entity are then obtained, and the entity sentiment prediction model predicts the vector of each word segment in the part-of-speech sequence of the text to be predicted and the vector of the target entity, thereby obtaining the prediction result of the sentiment orientation of the target entity in the text to be predicted. Because, in the above process, the text to be predicted is segmented to obtain the part-of-speech sequence and the vector of each word segment and the vector of the target entity are derived from that sequence, rather than words being selected and word features being extracted manually, the problem that manual word selection and manually provided word features affect the accuracy of the sentiment orientation result is solved.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented in accordance with the contents of the specification, and in order that the above and other objects, features and advantages of the present invention may become more apparent, specific embodiments of the present invention are set forth below.
BRIEF DESCRIPTION OF THE DRAWINGS
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be construed as limiting the present invention. Throughout the drawings, the same reference numerals denote the same components. In the drawings:
FIG. 1 is a flowchart of the construction process of the entity sentiment prediction model disclosed in an embodiment of the present invention;
FIG. 2 is a flowchart of a specific implementation of step S102 disclosed in an embodiment of the present invention;
FIG. 3 is a flowchart of the entity sentiment analysis method disclosed in an embodiment of the present invention;
FIG. 4 is a flowchart of a specific implementation of step S303 disclosed in an embodiment of the present invention;
FIG. 5 is a flowchart of a specific implementation of step S304 disclosed in an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of the entity sentiment analysis apparatus disclosed in an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of the generating unit disclosed in an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of the prediction unit disclosed in an embodiment of the present invention.
DETAILED DESCRIPTION
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be thoroughly understood and its scope fully conveyed to those skilled in the art.
In the embodiments of the present application, an entity sentiment prediction model is used to predict the text to be predicted. Therefore, before the entity sentiment analysis method disclosed in the embodiments of the present application is performed, the entity sentiment prediction model needs to be constructed first.
Referring to FIG. 1, the construction process of the entity sentiment prediction model includes:
S101: Perform word segmentation on the training text to obtain a part-of-speech sequence of the training text.
A training document is prepared, and the training document includes at least one training text. A training text is a user's evaluation statement about an event, a person, an enterprise, a product, or the like.
For each training text in the training document, open-source tool software such as LTP (Language Technology Platform of Harbin Institute of Technology) is used to perform word segmentation and to obtain the part-of-speech sequence of the resulting word segments, where the part-of-speech sequence includes a word segmentation sequence and a part-of-speech result; the word segmentation sequence includes the word segments obtained by segmenting the training text, and the part-of-speech result includes the part of speech of each word segment. For example, the training text is 汽车前脸设计威武霸气 ("the car's front-face design is mighty and imposing"). After word segmentation of this training text, the obtained word segmentation sequence is [car, front face, design, mighty, imposing] and the part-of-speech result is [n, n, v, a, n], where n denotes a general noun, v a verb, and a an adjective.
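The segmentation and part-of-speech tagging of step S101 can be illustrated with a short sketch. The embodiment names LTP as the open-source tool; the sketch below uses jieba's part-of-speech tagger only as a readily available stand-in, and the exact segments and tags it returns will differ from the example above depending on the tool and its dictionaries.

```python
# A minimal sketch of step S101: segment one training text and collect the
# word segmentation sequence plus the part-of-speech result.
import jieba.posseg as pseg

def segment_with_pos(text):
    """Return ([word, ...], [pos_tag, ...]) for one sentence."""
    pairs = list(pseg.cut(text))          # each item has .word and .flag
    words = [p.word for p in pairs]
    pos_tags = [p.flag for p in pairs]
    return words, pos_tags

if __name__ == "__main__":
    words, tags = segment_with_pos("汽车前脸设计威武霸气")
    print(words)   # e.g. ['汽车', '前脸', '设计', '威武', '霸气']
    print(tags)    # tags depend on the tool and its dictionary
```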
S102: Obtain a vector of each word segment in the part-of-speech sequence of the training text and a vector of the target entity.
Each word segment in the part-of-speech sequence of the training text needs to be expressed as a feature vector. Therefore, for each word segment in the part-of-speech sequence of the training text, the vector of that word segment needs to be obtained. The training text contains the target entity, and the part-of-speech sequence obtained by segmenting the training text also contains the word segment(s) corresponding to the target entity. Therefore, the vector of the word segment corresponding to the target entity in the part-of-speech sequence of the training text is the vector of the target entity.
Optionally, in one implementation of step S102, referring to FIG. 2, this step includes:
S1021: Obtain the word vector of each word segment in the part-of-speech sequence of the training text.
For each word segment in the part-of-speech sequence of the training text, a lookup is performed in a word vector model to obtain the word vector of the current word segment in the word vector model.
Open-source tool software is used to segment every text sentence in a text corpus, and word-vector training is performed on the segmented corpus to generate the word vector model. The text corpus includes an industry corpus and a general corpus, where the general corpus refers to a text corpus that is independent of any particular industry. The function of the word vector model is to map words into a space of a certain dimensionality in which the similarity between words can be represented. Meanwhile, the word vector model covers the low-frequency long-tail words that appear in the corpus (low-frequency long-tail words are words whose frequency across the whole vocabulary is below a certain threshold), which are uniformly recorded as UNK (unknown keyword); UNK shares a single, unique word vector in the word vector model.
If a word segment in the part-of-speech sequence of the training text has no corresponding word vector in the word vector model, the UNK word vector is used as the word vector of that word segment.
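The UNK fallback described above can be sketched as follows. The embedding table here is a plain dictionary with assumed contents and dimensionality, standing in for whatever word-vector model was trained on the corpus; only the lookup-with-fallback logic is the point.

```python
# A minimal sketch: every word segment is looked up in a trained word-vector
# table, and any segment missing from the table (e.g. a low-frequency
# long-tail word) falls back to the single shared UNK vector.
import numpy as np

DIM = 100
rng = np.random.default_rng(0)
word_vectors = {
    "汽车": rng.normal(size=DIM),
    "设计": rng.normal(size=DIM),
    "UNK": rng.normal(size=DIM),   # unique vector shared by all unknown words
}

def lookup(word, table=word_vectors):
    """Return the word vector, or the shared UNK vector if the word is absent."""
    return table.get(word, table["UNK"])

sentence = ["汽车", "前脸", "设计"]   # "前脸" is assumed missing from the table
vectors = np.stack([lookup(w) for w in sentence])
print(vectors.shape)                  # (3, 100)
```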
It should also be noted that differences in the part of speech of the word segments in the part-of-speech sequence of the training text can also lead to differences in the sentiment orientation towards the target entity. Therefore, the part-of-speech vector of each word segment in the part-of-speech sequence of the training text can also be obtained.
Specifically, each part of speech is assigned a random vector of a certain dimensionality. For example, if there are five parts of speech [a, b, c, d, e], then a can be represented by a random vector Va and, similarly, b by a random vector Vb, where the dimensionality of Va and Vb can be specified arbitrarily. For each word segment in the part-of-speech sequence of the training text, the corresponding part-of-speech vector can then be obtained according to its part of speech.
Similarly, the word bag to which a word segment belongs also affects the judgment of the sentiment orientation towards the target entity, especially when a word segment in the part-of-speech sequence of the training text has no corresponding word vector in the word vector model; the word-bag vector allows the word segment to be reflected more comprehensively. Therefore, the word-bag vector of each word segment in the part-of-speech sequence of the training text can also be obtained.
Specifically, the membership relation between each word segment in the part-of-speech sequence of the training text and the industry-domain word bags is encoded to obtain the word-bag vector of each word segment in the part-of-speech sequence of the training text. For example, it is determined whether each word segment in the part-of-speech sequence of the training text is in the entity word bag and whether it is in the evaluation word bag, and the determination results are encoded to obtain the word-bag vector of each word segment in the part-of-speech sequence of the training text.
The distance of each word segment in the part-of-speech sequence of the training text from the target entity also influences the sentiment orientation towards the target entity to different degrees. In general, the farther a word segment is from the target entity, the smaller its influence on the sentiment orientation towards the target entity. Therefore, the relative-distance-to-target-entity vector of each word segment in the part-of-speech sequence of the training text also needs to be obtained.
For each word segment in the part-of-speech sequence of the training text, its distance from the target entity is encoded to obtain the relative-distance-to-target-entity vector of that word segment. For example, for [car, front face, design, mighty, imposing] with the target entity being the front-face design, the distance of each word segment from the target entity is [-2, -1, 0, 0, 1, 2, 3]; this distance sequence is encoded, i.e., each of the values -2, -1, 0, 1, 2, 3 is encoded into a vector of a certain dimensionality, yielding the relative-distance-to-target-entity vector of each word segment.
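The relative-distance feature might be sketched as below. The token positions, entity span, and vector dimensionality are illustrative assumptions; the sketch simply assigns each signed distance value its own small randomly initialized vector, which is one plausible reading of the encoding described above.

```python
# A minimal sketch of the relative-distance feature: each word segment gets a
# signed token distance to the target-entity span (0 inside the span), and
# each distinct distance value is mapped to a small random vector.
import numpy as np

def relative_distances(n_tokens, entity_start, entity_end):
    """Signed token distance to the entity span [entity_start, entity_end]."""
    dists = []
    for i in range(n_tokens):
        if i < entity_start:
            dists.append(i - entity_start)   # negative: before the entity
        elif i > entity_end:
            dists.append(i - entity_end)     # positive: after the entity
        else:
            dists.append(0)                  # inside the entity span
    return dists

rng = np.random.default_rng(0)
DIST_DIM = 8
distance_embedding = {}                      # lazily created distance vectors

def encode_distance(d):
    if d not in distance_embedding:
        distance_embedding[d] = rng.normal(size=DIST_DIM)
    return distance_embedding[d]

tokens = ["汽车", "前脸", "设计", "威武", "霸气"]
dists = relative_distances(len(tokens), entity_start=1, entity_end=2)
print(dists)                                 # [-1, 0, 0, 1, 2]
dist_vectors = np.stack([encode_distance(d) for d in dists])
print(dist_vectors.shape)                    # (5, 8)
```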
If any one or any combination of the part-of-speech vector, the word-bag vector, and the relative-distance-to-target-entity vector of each word segment in the part-of-speech sequence of the training text is obtained, the word vector of each word segment in the part-of-speech sequence of the training text also needs to be combined with the obtained part-of-speech vector, word-bag vector, and relative-distance-to-target-entity vector to obtain the initial vector of each word segment in the part-of-speech sequence of the training text.
Specifically, for each word segment in the part-of-speech sequence of the training text, its word vector, part-of-speech vector, word-bag vector, and relative-distance-to-target-entity vector are concatenated to form the initial vector of that word segment.
S1022: Multiply the word vector of each word segment in the part-of-speech sequence of the training text by an attenuation factor, to obtain the vector of each word segment in the part-of-speech sequence of the training text.
The attenuation factor of each word segment is calculated according to its relative distance from the target entity in the part-of-speech sequence of the training text. Specifically, the attenuation factor e is calculated as e = 1 - d/N, where d is the absolute distance of the current word segment from the target entity and N is the length of the part-of-speech sequence of the training document.
The word vector of each word segment in the part-of-speech sequence of the training text is multiplied by the corresponding attenuation factor to obtain the vector of that word segment.
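A minimal sketch of this attenuation step, using the formula e = 1 - d/N given above; the vectors, distances, and sequence length are illustrative values.

```python
# Scale each word-segment vector by its attenuation factor e = 1 - d / N.
import numpy as np

def apply_attenuation(word_vectors, distances, seq_length):
    """word_vectors: (n, dim); distances: signed distances; seq_length: N."""
    factors = np.array([1.0 - abs(d) / seq_length for d in distances])
    return word_vectors * factors[:, None]   # scale each row by its factor

rng = np.random.default_rng(0)
vecs = rng.normal(size=(5, 100))             # 5 word segments, dimension 100
dists = [-1, 0, 0, 1, 2]
scaled = apply_attenuation(vecs, dists, seq_length=50)
print(scaled.shape)                          # (5, 100)
```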
It should also be noted that the word-segmentation length of each training text in the training document is counted, and it is determined whether the training document contains any overlong outlier-length text. Specifically, the mean and standard deviation of the word-segmentation lengths of the training texts are calculated; an overlong outlier-length text is a training text whose word-segmentation length exceeds the mean by more than a certain multiple of the standard deviation. The specific multiple can be set according to the actual situation.
If it is determined that no overlong outlier-length text exists in the training document, the word-segmentation length of the training text with the longest word-segmentation length in the training document is used as the length of the part-of-speech sequence of the training document. If it is determined that overlong outlier-length texts exist in the training document, the word-segmentation length of the longest training text among the remaining training texts (i.e., excluding the overlong outlier-length texts) is used as the length of the part-of-speech sequence of the training document, and the overlong outlier-length texts in the training document are truncated according to the length of the part-of-speech sequence of the training document. Specifically, taking the target entity in the training text as the center, the text is extended forwards and backwards until the word-segmentation length reaches the length of the part-of-speech sequence of the training document.
For example, if there are 10 training texts in the training document with unequal word-segmentation lengths and the longest one has a word-segmentation length of 50, then 50 is taken as the length of the part-of-speech sequence of the training document. If the training document contains a training text whose word-segmentation length is 1000, that training text is an overlong outlier-length text.
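The outlier handling and truncation described above might be sketched as follows. The cutoff of one standard deviation above the mean and the centering logic are illustrative assumptions, since the embodiment leaves the exact multiple to be set according to the actual situation.

```python
# A minimal sketch: texts longer than mean + k * std are treated as overlong
# outliers, the sequence length comes from the longest remaining text, and
# outliers are truncated around the target entity. k = 1 is illustrative.
import numpy as np

def sequence_length(token_lengths, k=1.0):
    lengths = np.asarray(token_lengths, dtype=float)
    limit = lengths.mean() + k * lengths.std()
    kept = lengths[lengths <= limit]
    return int(kept.max())

def truncate_around_entity(tokens, entity_index, max_len):
    """Keep up to max_len tokens centered on the target entity."""
    half = max_len // 2
    start = max(0, entity_index - half)
    return tokens[start:start + max_len]

lengths = [42, 50, 37, 45, 1000]             # the last text is an outlier
max_len = sequence_length(lengths)
print(max_len)                               # 50
long_text = [f"w{i}" for i in range(1000)]
print(len(truncate_around_entity(long_text, entity_index=600, max_len=max_len)))  # 50
```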
It should also be noted that, if the initial vector of each word segment in the part-of-speech sequence of the training text has been obtained, the vector of each word segment in the part-of-speech sequence of the training text is obtained by multiplying the initial vector of each word segment in the part-of-speech sequence of the training text by the attenuation factor.
S1023: Use the vector of the word segment corresponding to the target entity in the training text as the vector of the target entity.
It should be noted that, if the training text contains multiple word segments corresponding to the target entity, the average of the vectors of the multiple word segments corresponding to the target entity in the training text is used as the vector of the target entity.
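A minimal sketch of step S1023, covering both the single-segment case and the averaging over multiple segments; the indices and dimensionality are illustrative.

```python
# Build the target-entity vector from the word-segment vectors.
import numpy as np

def entity_vector(segment_vectors, entity_indices):
    """segment_vectors: (n, dim); entity_indices: positions of entity tokens."""
    picked = segment_vectors[list(entity_indices)]
    return picked.mean(axis=0)                 # average over the entity span

rng = np.random.default_rng(0)
vecs = rng.normal(size=(5, 100))
print(entity_vector(vecs, [2]).shape)          # single-token entity -> (100,)
print(entity_vector(vecs, [1, 2]).shape)       # two-token entity    -> (100,)
```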
S103: Perform weighted averaging on the vectors of the word segments in the part-of-speech sequence of the training text, to obtain a weighted vector of the part-of-speech sequence of the training text.
The vector of the target entity is combined with the vector of each word segment in the part-of-speech sequence of the training text, and the attention layer of HAN (Hierarchical Attention Networks) is used to compute, from the vector of each word segment in the part-of-speech sequence of the training text, the weight of each word segment. Specifically, if a word segment is far from the target entity, it has little influence on the sentiment towards the target entity and does not need much attention, so its weight is weakened; otherwise, its weight is strengthened.
According to the weight of each word segment in the part-of-speech sequence of the training text, weighted averaging is performed on the vectors of the word segments in the part-of-speech sequence of the training text to obtain the weighted vector of the part-of-speech sequence of the training text.
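The attention-based weighting of step S103 might look roughly like the sketch below. The bilinear scoring matrix W is an assumed, randomly initialized stand-in for the trained parameters of the HAN-style attention layer; the point is only that each word segment is scored against the entity vector, the scores are normalized with softmax, and the segment vectors are averaged with those weights.

```python
# A minimal attention-style pooling over word-segment vectors.
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def attention_pool(segment_vectors, entity_vec, W):
    """segment_vectors: (n, dim); entity_vec: (dim,); W: (dim, dim)."""
    scores = segment_vectors @ W @ entity_vec   # one score per word segment
    weights = softmax(scores)                   # less relevant words get low weight
    return weights @ segment_vectors, weights   # weighted average, plus the weights

rng = np.random.default_rng(0)
dim = 100
vecs = rng.normal(size=(5, dim))
entity = vecs[[1, 2]].mean(axis=0)
W = rng.normal(scale=0.1, size=(dim, dim))
pooled, weights = attention_pool(vecs, entity, W)
print(pooled.shape, weights.round(3))           # (100,) and 5 weights summing to 1
```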
S104: Multiply the vector of the target entity in the part-of-speech sequence of the training text by a first matrix, to obtain a derived vector of the target entity.
That is, the vector of the target entity in the part-of-speech sequence of the training text is multiplied by the first matrix, and the result is the derived vector of the target entity.
It should also be noted that the first matrix is an m×m matrix, where m is the dimensionality of the vector of the target entity in the part-of-speech sequence of the training text. The specific values of the first matrix are randomly initialized; each value can be drawn uniformly from the interval -0.1 to 0.1.
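A minimal sketch of step S104 under the stated initialization: an m×m matrix with entries drawn uniformly from [-0.1, 0.1], multiplied with the entity vector.

```python
# Randomly initialize the first matrix and compute the derived entity vector.
import numpy as np

rng = np.random.default_rng(0)
m = 100                                        # dimensionality of the entity vector
first_matrix = rng.uniform(-0.1, 0.1, size=(m, m))

entity_vec = rng.normal(size=m)
derived_vec = first_matrix @ entity_vec        # derived vector of the target entity
print(derived_vec.shape)                       # (100,)
```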
S105: Obtain a feature vector according to the weighted vector of the part-of-speech sequence of the training text and/or the derived vector of the target entity.
The weighted vector of the part-of-speech sequence of the training text may be used as the feature vector, or the derived vector of the target entity may be used as the feature vector, or the derived vector of the target entity may be added to or subtracted from the weighted vector of the part-of-speech sequence of the training text to obtain the feature vector.
Specifically, if the word segment corresponding to the target entity in the training text itself carries sentiment orientation, the derived vector of the target entity may be chosen as the feature vector. In addition, adding the derived vector of the target entity to, or subtracting it from, the weighted vector of the part-of-speech sequence of the training text yields a feature vector in which the derived vector of the target entity acts on the weighted vector of the part-of-speech sequence of the training text.
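One way to sketch the alternatives of step S105 is a simple mode switch; the mode names are illustrative, not part of the embodiment.

```python
# Build the feature vector from the weighted sequence vector and/or the
# derived entity vector, as described in step S105.
import numpy as np

def build_feature_vector(weighted_seq_vec, derived_entity_vec, mode="sum"):
    if mode == "sequence":
        return weighted_seq_vec
    if mode == "entity":
        return derived_entity_vec
    if mode == "sum":
        return weighted_seq_vec + derived_entity_vec
    if mode == "difference":
        return weighted_seq_vec - derived_entity_vec
    raise ValueError(f"unknown mode: {mode}")

rng = np.random.default_rng(0)
seq_vec, ent_vec = rng.normal(size=100), rng.normal(size=100)
print(build_feature_vector(seq_vec, ent_vec, mode="sum").shape)   # (100,)
```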
Optionally, in another embodiment of the present application, steps S103 to S104 may be repeated several times, where the number of repetitions can be set according to actual requirements.
Specifically, the feature vector obtained in the previous round of step S104 is used as the vector of the target entity in the next execution of steps S103 and S104, yielding the latest weighted vector of the part-of-speech sequence of the training text and the latest derived vector of the target entity; a new feature vector is then obtained according to the latest weighted vector of the part-of-speech sequence of the training text and/or the latest derived vector of the target entity.
S106: Process the feature vector with a softmax function to obtain a probability output vector.
The probability output vector includes the probability values of three categories: positive, neutral, and negative. Positive indicates that the sentiment of the training text towards the target entity is positive; negative indicates that it is negative; neutral indicates that it is neutral. The probability value of each category represents the probability that the entity sentiment of the training text towards the target entity belongs to that category.
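A minimal sketch of step S106. The projection from the feature vector to the three categories is an assumed, randomly initialized output layer; in the embodiment these weights belong to the trained parameters.

```python
# Project the feature vector to three sentiment categories and normalize.
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
dim, n_classes = 100, 3
W_out = rng.normal(scale=0.1, size=(n_classes, dim))
b_out = np.zeros(n_classes)

feature_vec = rng.normal(size=dim)
probs = softmax(W_out @ feature_vec + b_out)   # [P(positive), P(neutral), P(negative)]
print(probs, probs.sum())                      # three probabilities summing to 1
```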
S107: Perform a cross-entropy operation on the probability output vector and the manually annotated category of the training text, to obtain a loss function.
For each training text in the training document, the sentiment of the training text towards the target entity is identified manually and annotated with one of the three categories (positive, neutral, negative), which yields the manually annotated category of the training text. For example, for the training text 汽车前脸设计威武霸气 ("the car's front-face design is mighty and imposing"), the target entity is the front-face design and the sentiment is positive, so the manually annotated category of the training text can be identified as [1, 0, 0].
A cross-entropy operation is performed on the probability output vector and the manually annotated category of the training text; the resulting loss function indicates the difference between the probability output vector and the manually annotated category of the training text.
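The cross-entropy computation of step S107 for a single training text can be sketched directly from the definition; the probability values here are made up for illustration.

```python
# Cross entropy between the probability output vector and the one-hot
# manually annotated category, e.g. [1, 0, 0] for a positive annotation.
import numpy as np

def cross_entropy(probs, one_hot_label, eps=1e-12):
    probs = np.clip(probs, eps, 1.0)           # avoid log(0)
    return -float(np.sum(one_hot_label * np.log(probs)))

probs = np.array([0.7, 0.2, 0.1])              # model output for one training text
label = np.array([1, 0, 0])                    # manually annotated: positive
print(round(cross_entropy(probs, label), 4))   # ~0.3567
```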
S108: Optimize the loss function, and update the parameters according to the optimized loss function, until the probability output vector obtained by predicting the training text with the feature vector derived from the updated first parameter is equivalent to the manually annotated category of the training text.
The first parameter includes the attention layer, the first matrix, the softmax function, and the vector of each word segment in the part-of-speech sequence of the training text. For the way in which the vector of each word segment in the part-of-speech sequence of the training text is obtained, reference can be made to the content of step S102 in the embodiment corresponding to FIG. 1, and it is not repeated here.
The loss function can be optimized by the stochastic gradient descent method, the Adam optimization algorithm, or the like, and the updated parameters are obtained layer by layer by back-propagating according to the optimized loss function.
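A minimal sketch of the update loop, restricted for brevity to an assumed softmax output layer: the gradient of the softmax cross-entropy loss with respect to the logits is the probability vector minus the one-hot label, and plain stochastic gradient descent is applied. In the embodiment the same back-propagation covers all components of the first parameter (the attention layer, the first matrix, and the word-segment vectors), and the Adam optimizer may be used instead of plain SGD.

```python
# Gradient-descent update of an output layer under softmax cross entropy.
import numpy as np

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
dim, n_classes, lr = 100, 3, 0.1
W_out = rng.normal(scale=0.1, size=(n_classes, dim))

feature_vec = rng.normal(size=dim)
label = np.array([1.0, 0.0, 0.0])              # manually annotated category

for step in range(100):
    probs = softmax(W_out @ feature_vec)
    grad_logits = probs - label                # d(cross entropy)/d(logits)
    W_out -= lr * np.outer(grad_logits, feature_vec)

print(softmax(W_out @ feature_vec).round(3))   # converges towards [1, 0, 0]
```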
It should also be noted that, in this step, "equivalent" means that, from the perspective of a person skilled in the art, the probability output vector can be regarded as equivalent to the manually annotated category of the training text, which includes the case where the two are not exactly identical.
S109: Use the updated second parameter as the entity sentiment prediction model, where the second parameter includes the attention layer, the first matrix, and the softmax function.
Based on the entity sentiment prediction model constructed by the method of the above embodiment, entity sentiment analysis can be performed on text to be predicted. Specifically, referring to FIG. 3, the entity sentiment analysis method includes:
S301: Obtain the text to be predicted.
The text to be predicted is a user's evaluation statement about an event, a person, an enterprise, a product, or the like. The text to be predicted is obtained in order to analyze its sentiment orientation towards the target entity in the text.
S302: Perform word segmentation on the text to be predicted, to obtain a part-of-speech sequence of the text to be predicted.
For the text to be predicted, open-source tool software is likewise used to perform word segmentation and to obtain the part-of-speech sequence of the resulting word segments. For the specific execution process of this step, reference can be made to the content of step S101 in the embodiment corresponding to FIG. 1, and it is not repeated here.
S303: Obtain a vector of each word segment in the part-of-speech sequence of the text to be predicted and a vector of the target entity.
Optionally, in one implementation of step S303, referring to FIG. 4, this step includes:
S3031: Obtain the word vector of each word segment in the part-of-speech sequence of the text to be predicted.
Optionally, in addition to the word vector of each word segment in the part-of-speech sequence of the text to be predicted, the part-of-speech vector, word-bag vector, and relative-distance-to-target-entity vector of each word segment in the part-of-speech sequence of the text to be predicted can also be obtained.
The manner of obtaining the word vector, part-of-speech vector, word-bag vector, and relative-distance-to-target-entity vector of a word segment can be found in the content of step S1021 in the embodiment corresponding to FIG. 1.
S3032: Multiply the word vector of each word segment in the part-of-speech sequence of the text to be predicted by an attenuation factor, to obtain the vector of each word segment in the part-of-speech sequence of the text to be predicted.
For the specific implementation of this step, reference can be made to the content of step S1022 in the embodiment corresponding to FIG. 1, and it is not repeated here.
S3033: Use the vector of the word segment corresponding to the target entity in the text to be predicted as the vector of the target entity.
Optionally, in another embodiment of the present application, if the text to be predicted contains multiple word segments corresponding to the target entity, the average of the vectors of the multiple word segments corresponding to the target entity in the text to be predicted is used as the vector of the target entity.
S304: Use the entity sentiment prediction model to predict the vector of each word segment in the part-of-speech sequence of the text to be predicted and the vector of the target entity, to obtain the prediction result for the target entity of the text to be predicted; wherein the entity sentiment prediction model is constructed based on a first principle; the first principle includes: iteratively updating the parameters in the neural network algorithm until the prediction result obtained by using the neural network algorithm with updated parameters to predict the feature vector of the training text is equivalent to the manual annotation result; and the feature vector of the training text is obtained according to the vector of the part-of-speech sequence of the training text and the vector of the target entity in the part-of-speech sequence of the training text.
In the entity sentiment analysis method disclosed in this embodiment, the text to be predicted is segmented to obtain its part-of-speech sequence, the vector of each word segment in the part-of-speech sequence and the vector of the target entity are then obtained, and the entity sentiment prediction model predicts the vector of each word segment in the part-of-speech sequence of the text to be predicted and the vector of the target entity, thereby obtaining the prediction result of the sentiment orientation of the target entity in the text to be predicted. Because, in the above process, word segmentation produces the part-of-speech sequence and the vectors of the word segments and of the target entity are derived from that sequence, rather than words being selected and word features being extracted manually, the problem that manual word selection and manually provided word features affect the accuracy of the sentiment orientation result is solved.
Optionally, in another embodiment of the present application, referring to FIG. 5, step S304 includes:
S3041: Perform weighted averaging on the vectors of the word segments in the part-of-speech sequence of the text to be predicted, to obtain a weighted vector of the part-of-speech sequence of the text to be predicted.
For the specific implementation of this step, reference can be made to the content of step S103 in the embodiment corresponding to FIG. 1, and it is not repeated here.
S3042: Multiply the vector of the target entity in the part-of-speech sequence of the text to be predicted by a first matrix, to obtain a derived vector of the target entity.
The first matrix is the first matrix of the entity sentiment prediction model in step S109 of the embodiment corresponding to FIG. 1. For the specific implementation of this step, reference can be made to the content of step S104 in the embodiment corresponding to FIG. 1, and it is not repeated here.
S3043: Obtain a feature vector according to the weighted vector of the part-of-speech sequence of the text to be predicted and/or the derived vector of the target entity in the part-of-speech sequence of the text to be predicted.
For the specific implementation of this step, reference can be made to the content of step S105 in the embodiment corresponding to FIG. 1, and it is not repeated here.
S3044: Process the feature vector with a softmax function to obtain a probability output vector.
The softmax function is the softmax function of the entity sentiment prediction model in step S109 of the embodiment corresponding to FIG. 1. For the specific implementation of this step, reference can be made to the content of step S106 in the embodiment corresponding to FIG. 1, and it is not repeated here.
Another embodiment of the present application further discloses an entity sentiment analysis apparatus; for the specific working process of each unit included therein, reference can be made to the content of the embodiment corresponding to FIG. 3. Specifically, referring to FIG. 6, the entity sentiment analysis apparatus includes:
an obtaining unit 601, configured to obtain the text to be predicted;
a word segmentation unit 602, configured to perform word segmentation on the text to be predicted, to obtain a part-of-speech sequence of the text to be predicted;
a generating unit 603, configured to obtain a vector of each word segment in the part-of-speech sequence of the text to be predicted and a vector of the target entity.
Optionally, in another embodiment of the present application, the generating unit 603, referring to FIG. 7, includes:
a first obtaining unit 6031, configured to obtain a word vector of each word segment in the part-of-speech sequence of the text to be predicted;
a second obtaining unit 6032, configured to multiply the word vector of each word segment in the part-of-speech sequence of the text to be predicted by an attenuation factor, to obtain the vector of each word segment in the part-of-speech sequence of the text to be predicted;
a generating subunit 6033, configured to use the vector of the word segment corresponding to the target entity in the text to be predicted as the vector of the target entity in the part-of-speech sequence of the text to be predicted.
For the specific working process of each unit in the generating unit 603 disclosed in this embodiment, reference can be made to the content of the embodiment corresponding to FIG. 4 above, and it is not repeated here.
Optionally, in another embodiment of the present application, if the text to be predicted contains multiple word segments corresponding to the target entity, then, when using the vector of the word segment corresponding to the target entity in the text to be predicted as the vector of the target entity in the part-of-speech sequence of the text to be predicted, the generating subunit 6033 is specifically configured to use the average of the vectors of the multiple word segments corresponding to the target entity in the text to be predicted as the vector of the target entity in the part-of-speech sequence of the text to be predicted.
Optionally, in another embodiment of the present application, the entity sentiment analysis apparatus further includes:
a third obtaining unit, configured to obtain any one or a combination of the part-of-speech vector, the word-bag vector, and the relative-distance-to-target-entity vector of each word segment in the part-of-speech sequence of the text to be predicted;
a combining unit, configured to combine the word vector of each word segment in the part-of-speech sequence of the text to be predicted with any one or a combination of the obtained part-of-speech vector, word-bag vector, and relative-distance-to-target-entity vector of that word segment, to obtain an initial vector of each word segment in the part-of-speech sequence of the text to be predicted.
When multiplying the word vector of each word segment in the part-of-speech sequence of the text to be predicted by the attenuation factor to obtain the vector of each word segment in the part-of-speech sequence of the text to be predicted, the second obtaining unit 6032 is specifically configured to multiply the initial vector of each word segment in the part-of-speech sequence of the text to be predicted by the attenuation factor, to obtain the vector of each word segment in the part-of-speech sequence of the text to be predicted.
The apparatus further includes a prediction unit 604, configured to use an entity sentiment prediction model to predict the vector of each word segment in the part-of-speech sequence of the text to be predicted and the vector of the target entity, to obtain a prediction result of the sentiment orientation of the target entity in the text to be predicted; wherein the entity sentiment prediction model is constructed based on a first principle; the first principle includes: iteratively updating the parameters in the neural network algorithm until the prediction result obtained by using the neural network algorithm with updated parameters to predict the feature vector of the training text is equivalent to the manual annotation result; and the feature vector of the training text is obtained according to the vector of the part-of-speech sequence of the training text and the vector of the target entity in the part-of-speech sequence of the training text.
Optionally, in another embodiment of the present application, the prediction unit 604, as shown in FIG. 8, includes:
a first calculating unit 6041, configured to perform weighted averaging on the vectors of the word segments in the part-of-speech sequence of the text to be predicted, to obtain a weighted vector of the part-of-speech sequence of the text to be predicted;
a second calculating unit 6042, configured to multiply the vector of the target entity in the part-of-speech sequence of the text to be predicted by a first matrix, to obtain a derived vector of the target entity;
a third calculating unit 6043, configured to obtain a feature vector according to the weighted vector of the part-of-speech sequence of the text to be predicted and/or the derived vector of the target entity in the part-of-speech sequence of the text to be predicted;
a fourth calculating unit 6044, configured to process the feature vector with a softmax function to obtain a probability output vector.
For the specific working process of each unit in the prediction unit 604 disclosed in this embodiment, reference can be made to the content of the embodiment corresponding to FIG. 5 above, and it is not repeated here.
In this embodiment, the text to be predicted is segmented by the word segmentation unit to obtain the part-of-speech sequence, and the generating unit obtains the vector of each word segment in the part-of-speech sequence and the vector of the target entity, rather than words being selected and word features being extracted manually, which solves the problem that manual word selection and manually provided word features affect the accuracy of the sentiment orientation result.
Optionally, in another embodiment of the present application, the entity sentiment analysis apparatus may further perform prediction on training text to obtain the entity sentiment prediction model.
Specifically, the word segmentation unit 602 is further configured to perform word segmentation on the training text to obtain a part-of-speech sequence of the training text.
The generating unit 603 is further configured to obtain a vector of each word segment in the part-of-speech sequence of the training text and a vector of the target entity.
The first calculating unit 6041 is further configured to perform weighted averaging on the vectors of the word segments in the part-of-speech sequence of the training text, to obtain a weighted vector of the part-of-speech sequence of the training text.
The second calculating unit 6042 is further configured to multiply the vector of the target entity in the part-of-speech sequence of the training text by a first matrix, to obtain a derived vector of the target entity in the part-of-speech sequence of the training text.
The third calculating unit 6043 is further configured to obtain a feature vector according to the weighted vector of the part-of-speech sequence of the training text and/or the derived vector of the target entity in the part-of-speech sequence of the training text.
The fourth calculating unit 6044 is further configured to process the feature vector with a softmax function to obtain a probability output vector.
Moreover, the entity sentiment analysis apparatus further includes: an operation unit, configured to perform a cross-entropy operation on the probability output vector and the manually annotated category of the training text, to obtain a loss function;
an optimization unit, configured to optimize the loss function;
an updating unit, configured to update the first parameter according to the optimized loss function, until the probability output vector obtained by the fourth calculating unit 6044 by predicting the training text with the feature vector derived from the updated first parameter is substantially equivalent to the manually annotated category of the training text; wherein the first parameter includes the first matrix, the softmax function, and the vector of each word segment in the part-of-speech sequence of the training text;
a construction unit, configured to use the updated second parameter as the entity sentiment prediction model; wherein the second parameter includes the first matrix and the softmax function.
For the specific working process of each unit in the above embodiment, reference can be made to the content of the embodiment corresponding to FIG. 1 above, and it is not repeated here.
The entity sentiment analysis apparatus includes a processor and a memory. The obtaining unit, the word segmentation unit, the generating unit, the prediction unit, and the other units described above are all stored in the memory as program units, and the processor executes the program units stored in the memory to implement the corresponding functions.
The processor contains a kernel, and the kernel retrieves the corresponding program unit from the memory. One or more kernels can be provided, and the sentiment analysis process for the text to be predicted is implemented by adjusting kernel parameters, so as to obtain the prediction result of the sentiment orientation of the target entity in the text to be predicted.
The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory among computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
An embodiment of the present invention provides a storage medium on which a program is stored, where the program, when executed by a processor, implements the entity sentiment analysis method.
An embodiment of the present invention provides a processor, where the processor is configured to run a program, and the entity sentiment analysis method is performed when the program runs.
An embodiment of the present invention provides a device; the device herein may be a server, a PC, a PAD, a mobile phone, or the like. The device includes a processor, a memory, and a program that is stored on the memory and executable on the processor, and the processor implements the following steps when executing the program:
一种实体情感分析方法,包括:A method of entity sentiment analysis, including:
获取待预测文本;Get the text to be predicted;
对所述待预测文本进行分词处理,得到所述待预测文本的词性序列;Performing word segmentation processing on the text to be predicted to obtain a part-of-speech sequence of the text to be predicted;
获得所述待预测文本的词性序列中的每一个分词的向量和目标实体的向量;Obtaining a vector of each participle in the part of speech sequence of the text to be predicted and a vector of the target entity;
利用实体情感预测模型对所述待预测文本的词性序列中的每一个分词的向量和目标实体的向量进行预测,得到所述待预测文本中目标实体的情感倾向性的预测结果;其中:所述实体情感预测模型基于第一原理构建得到;所述第一原理包括:迭代更新所述神经网络算法中的参数,直到利用更新参数后的神经网络算法对训练文本的特征向量进行预测而得到的预测结果等同于人工标注结果;所述训练文本的特征向量,依据所述训练文本的词性序列的向量和所述训练文本的词性序列中的目标实体的向量得到。Using a physical sentiment prediction model to predict a vector of each participle in the part-of-speech sequence of the text to be predicted and a vector of the target entity, to obtain a prediction result of the sentiment orientation of the target entity in the text to be predicted; wherein: The entity sentiment prediction model is constructed based on the first principle; the first principle includes: iteratively updating the parameters in the neural network algorithm until the prediction using the neural network algorithm that updates the parameters to predict the feature vector of the training text The result is equivalent to the manual annotation result; the feature vector of the training text is obtained according to the vector of the part of speech sequence of the training text and the vector of the target entity in the part of speech sequence of the training text.
可选地，所述获得所述待预测文本的词性序列中的每一个分词的向量和目标实体的向量，包括：Optionally, the obtaining of the vector of each word segment in the part-of-speech sequence of the text to be predicted and the vector of the target entity includes:
分别获得所述待预测文本的词性序列中的每一个分词的词向量;Obtaining a word vector of each of the participles in the part of speech sequence of the text to be predicted;
将所述待预测文本的词性序列中的每一个分词的词向量和衰减因子相乘,得到所述待预测文本的词性序列中的每一个分词的向量;Multiplying a word vector of each word segment in the part-of-speech sequence of the text to be predicted and an attenuation factor to obtain a vector of each word segment in the part-of-speech sequence of the text to be predicted;
将所述待预测文本中对应所述目标实体的分词的向量,作为所述待预测文本的词性序列中的目标实体的向量。A vector of the word segment corresponding to the target entity in the text to be predicted is used as a vector of a target entity in the part-of-speech sequence of the text to be predicted.
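As an illustration of the two steps above, the sketch below multiplies each word vector by an attenuation factor and takes the vector of the word segment matching the target entity as the entity vector. The distance-based decay formula and the `embed` lookup are assumptions for the example; the text does not fix either, and averaging over several matching segments is covered by a later optional paragraph.

```python
# Illustrative only: `embed` is an assumed word-embedding lookup, and the
# 1/(1+distance) decay is one possible choice of attenuation factor.
# Assumes the target entity occurs at least once in the token list.
import numpy as np

def decayed_token_vectors(tokens, target_entity, embed):
    entity_positions = [i for i, w in enumerate(tokens) if w == target_entity]
    vectors = []
    for i, w in enumerate(tokens):
        distance = min(abs(i - p) for p in entity_positions)  # distance to nearest entity mention
        decay = 1.0 / (1.0 + distance)                        # assumed attenuation factor
        vectors.append(decay * embed(w))                      # word vector * attenuation factor
    entity_vec = vectors[entity_positions[0]]                 # vector of the segment matching the entity
    return vectors, entity_vec
```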
可选地,所述实体情感分析方法还包括:Optionally, the entity sentiment analysis method further includes:
获得所述待预测文本的词性序列中的每一个分词的词性向量、词包向量、以及相对目标实体距离的向量中的任意一个或组合；Obtaining any one or a combination of the part-of-speech vector, the bag-of-words vector, and the vector of the distance relative to the target entity for each word segment in the part-of-speech sequence of the text to be predicted;
组合所述待预测文本的词性序列中的每一个分词的词向量、以及获得的待预测文本的词性序列中的每一个分词的词性向量、词包向量、以及相对目标实体距离的向量中的任意一个或组合，得到所述待预测文本的词性序列中的每一个分词的初始向量；Combining the word vector of each word segment in the part-of-speech sequence of the text to be predicted with the obtained any one or combination of the part-of-speech vector, the bag-of-words vector, and the vector of the distance relative to the target entity of each word segment, to obtain an initial vector of each word segment in the part-of-speech sequence of the text to be predicted;
其中，所述将所述待预测文本的词性序列中的每一个分词的词向量和衰减因子相乘，得到所述待预测文本的词性序列中的每一个分词的向量，包括：Wherein the multiplying of the word vector of each word segment in the part-of-speech sequence of the text to be predicted by the attenuation factor, to obtain the vector of each word segment in the part-of-speech sequence of the text to be predicted, includes:
将所述待预测文本的词性序列中的每一个分词的初始向量和衰减因子相乘,得到所述待预测文本的词性序列中的每一个分词的向量。Multiplying an initial vector and an attenuation factor of each participle in the part-of-speech sequence of the text to be predicted to obtain a vector of each participle in the part-of-speech sequence of the text to be predicted.
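Where these optional features are used, the per-segment initial vector can be formed by combining the features before the attenuation factor is applied. A minimal sketch follows, assuming concatenation as the combination, a one-hot part-of-speech encoding, and a normalized scalar distance feature; none of these specifics is prescribed by the text.

```python
# Sketch under assumptions: one-hot POS tags and a 1/(1+distance) distance
# feature; the bag-of-words vector `bow_vec` is taken as given.
import numpy as np

def initial_vector(word_vec, pos_tag, bow_vec, distance, pos_tagset):
    pos_vec = np.zeros(len(pos_tagset))
    pos_vec[pos_tagset.index(pos_tag)] = 1.0       # part-of-speech vector
    dist_vec = np.array([1.0 / (1.0 + distance)])  # vector of distance to the target entity
    # Combined initial vector; it is then multiplied by the attenuation factor
    return np.concatenate([word_vec, pos_vec, bow_vec, dist_vec])
```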
可选地，若所述待预测文本中对应所述目标实体的分词包括多个，则将所述待预测文本中对应所述目标实体的多个分词的向量的平均值作为所述待预测文本的词性序列中的目标实体的向量。Optionally, if the text to be predicted includes a plurality of word segments corresponding to the target entity, the average of the vectors of the plurality of word segments corresponding to the target entity in the text to be predicted is used as the vector of the target entity in the part-of-speech sequence of the text to be predicted.
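For the multi-segment case just described, the averaging rule amounts to an element-wise mean; a minimal sketch (positions of the matching segments are assumed to be known):

```python
import numpy as np

def entity_vector(segment_vectors, entity_positions):
    # Element-wise average of the vectors of all segments matching the target entity
    return np.mean([segment_vectors[p] for p in entity_positions], axis=0)
```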
可选地，所述利用实体情感预测模型对所述待预测文本的词性序列中的每一个分词的向量和目标实体的向量进行预测，得到所述待预测文本中目标实体的情感倾向性的预测结果，包括：Optionally, the using of the entity sentiment prediction model to predict the vector of each word segment in the part-of-speech sequence of the text to be predicted and the vector of the target entity, to obtain the prediction result of the sentiment orientation of the target entity in the text to be predicted, includes:
对所述待预测文本的词性序列中的每一个分词的向量做加权平均处理,得到所述待预测文本的词性序列加权后的向量;Performing weighted averaging processing on the vector of each participle in the part of speech sequence of the text to be predicted, and obtaining a vector weighted by the part of speech sequence of the text to be predicted;
将所述待预测文本的词性序列中的目标实体的向量与第一矩阵做乘,得到所述目标实体的派生向量;Multiplying a vector of the target entity in the part of speech sequence of the text to be predicted by a first matrix to obtain a derived vector of the target entity;
依据所述待预测文本的词性序列加权后的向量,和/或,所述待预测文本的词性序列中的目标实体的派生向量,得到特征向量;And obtaining a feature vector according to the vector weighted by the part of speech sequence of the text to be predicted, and/or the derived vector of the target entity in the part of speech sequence of the text to be predicted;
采用softmax函数处理所述特征向量,得到概率输出向量,其中,所述概率输出向量包括:所述待预测文本中目标实体分别在预设种类别的情感倾向性下的概率值。The feature vector is processed by using a softmax function to obtain a probability output vector, wherein the probability output vector includes: a probability value of the target entity in the text to be predicted, respectively, under the sentiment orientation of the preset category.
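Putting the prediction steps above together, a numpy sketch could look like the following. The per-segment weights, the first matrix `W_e`, and the output parameters `W_o`, `b_o` stand for model parameters learned during training; concatenating the two intermediate vectors is an assumption where the text says "and/or".

```python
# Illustrative forward pass, not the definitive implementation.
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def predict(segment_vectors, entity_vec, weights, W_e, W_o, b_o):
    segments = np.stack(segment_vectors)                         # (n_segments, d)
    weighted_avg = np.average(segments, axis=0, weights=weights) # weighted part-of-speech-sequence vector
    derived = W_e @ entity_vec                                   # derived vector of the target entity
    feature_vec = np.concatenate([weighted_avg, derived])        # feature vector
    return softmax(W_o @ feature_vec + b_o)                      # probability per preset sentiment category
```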
可选地,所述实体情感预测模型的构建过程,包括:Optionally, the process of constructing the entity sentiment prediction model includes:
对训练文本进行分词处理,得到所述训练文本的词性序列;Performing word segmentation processing on the training text to obtain a part-of-speech sequence of the training text;
获得所述训练文本的词性序列中的每一个分词的向量和目标实体的向量;Obtaining a vector of each participle in the part of speech sequence of the training text and a vector of the target entity;
对所述训练文本的词性序列中的每一个分词的向量做加权平均处理,得到所述训练文本的词性序列加权后的向量;Performing weighted averaging processing on the vector of each participle in the part of speech of the training text to obtain a vector weighted by the part of speech of the training text;
将所述训练文本的词性序列中的目标实体的向量与第一矩阵做乘,得到所述训练文本的词性序列中的目标实体的派生向量;Multiplying a vector of the target entity in the part-of-speech sequence of the training text with a first matrix to obtain a derived vector of the target entity in the part-of-speech sequence of the training text;
依据所述训练文本的词性序列加权后的向量,和/或所述训练文本的词性序列中的目标实体的派生向量,得到特征向量;Obtaining a feature vector according to the weighted vector of the part of speech of the training text, and/or the derived vector of the target entity in the part of speech sequence of the training text;
采用softmax函数处理所述特征向量，得到概率输出向量，其中，所述概率输出向量包括：所述训练文本中目标实体分别在预设种类别的情感倾向性下的概率值；Processing the feature vector with a softmax function to obtain a probability output vector, wherein the probability output vector includes the probability values of the target entity in the training text under each of the preset categories of sentiment orientation;
将所述概率输出向量与所述训练文本的人工标注类别进行交叉熵运算,获得损失函数;Performing a cross-entropy operation on the probability output vector and the artificial annotation category of the training text to obtain a loss function;
优化所述损失函数，并根据所述优化后的损失函数更新第一参数，直至利用更新后的第一参数得到的特征向量对所述训练文本进行预测得到的概率输出向量与所述训练文本的人工标注类别等同为止；其中，所述第一参数包括所述第一矩阵、所述softmax函数以及所述训练文本的词性序列中的每一个分词的向量；Optimizing the loss function, and updating a first parameter according to the optimized loss function, until the probability output vector obtained by predicting the training text with the feature vector derived from the updated first parameter is equivalent to the manually annotated category of the training text; wherein the first parameter includes the first matrix, the softmax function, and the vector of each word segment in the part-of-speech sequence of the training text;
将所述更新后的第二参数作为实体情感预测模型;其中,所述第二参数包括:所述第一矩阵和所述softmax函数。And using the updated second parameter as an entity sentiment prediction model; wherein the second parameter comprises: the first matrix and the softmax function.
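A hedged sketch of this construction process using PyTorch autograd follows. The module layout, optimizer, and hyper-parameters are illustrative assumptions, and the stopping condition "equivalent to the manual annotation" is approximated by a fixed number of epochs of cross-entropy training; the learnable per-segment vectors are likewise omitted for brevity.

```python
# Illustrative training sketch, not the definitive implementation.
import torch
import torch.nn as nn

class EntitySentimentModel(nn.Module):
    def __init__(self, dim, entity_dim, n_classes):
        super().__init__()
        self.first_matrix = nn.Linear(dim, entity_dim, bias=False)  # the "first matrix"
        self.out = nn.Linear(dim + entity_dim, n_classes)           # softmax output layer

    def forward(self, weighted_seq_vec, entity_vec):
        derived = self.first_matrix(entity_vec)                 # derived vector of the target entity
        feature = torch.cat([weighted_seq_vec, derived], dim=-1)  # feature vector
        return self.out(feature)  # logits; softmax is applied inside the cross-entropy loss

def train(model, batches, epochs=10, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()  # cross-entropy against the manually annotated class
    for _ in range(epochs):
        for seq_vec, ent_vec, label in batches:  # batched tensors: (B, d), (B, d), (B,)
            opt.zero_grad()
            loss = loss_fn(model(seq_vec, ent_vec), label)
            loss.backward()  # optimize the loss and update the parameters
            opt.step()
    return model
```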
本发明还提供了一种计算机程序产品，当在数据处理设备上执行时，适于执行初始化有如下方法步骤的程序：The present invention further provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps:
一种实体情感分析方法,包括:A method of entity sentiment analysis, including:
获取待预测文本;Get the text to be predicted;
对所述待预测文本进行分词处理,得到所述待预测文本的词性序列;Performing word segmentation processing on the text to be predicted to obtain a part-of-speech sequence of the text to be predicted;
获得所述待预测文本的词性序列中的每一个分词的向量和目标实体的向量;Obtaining a vector of each participle in the part of speech sequence of the text to be predicted and a vector of the target entity;
利用实体情感预测模型对所述待预测文本的词性序列中的每一个分词的向量和目标实体的向量进行预测，得到所述待预测文本中目标实体的情感倾向性的预测结果；其中：所述实体情感预测模型基于第一原理构建得到；所述第一原理包括：迭代更新所述神经网络算法中的参数，直到利用更新参数后的神经网络算法对训练文本的特征向量进行预测而得到的预测结果等同于人工标注结果；所述训练文本的特征向量，依据所述训练文本的词性序列的向量和所述训练文本的词性序列中的目标实体的向量得到。Using an entity sentiment prediction model to predict the vector of each word segment in the part-of-speech sequence of the text to be predicted and the vector of the target entity, to obtain a prediction result of the sentiment orientation of the target entity in the text to be predicted; wherein the entity sentiment prediction model is constructed based on a first principle; the first principle includes: iteratively updating the parameters in the neural network algorithm until the prediction result, obtained by predicting the feature vector of the training text with the neural network algorithm using the updated parameters, is equivalent to the manual annotation result; the feature vector of the training text is obtained according to the vector of the part-of-speech sequence of the training text and the vector of the target entity in the part-of-speech sequence of the training text.
可选地，所述获得所述待预测文本的词性序列中的每一个分词的向量和目标实体的向量，包括：Optionally, the obtaining of the vector of each word segment in the part-of-speech sequence of the text to be predicted and the vector of the target entity includes:
分别获得所述待预测文本的词性序列中的每一个分词的词向量;Obtaining a word vector of each of the participles in the part of speech sequence of the text to be predicted;
将所述待预测文本的词性序列中的每一个分词的词向量和衰减因子相乘，得到所述待预测文本的词性序列中的每一个分词的向量；Multiplying the word vector of each word segment in the part-of-speech sequence of the text to be predicted by an attenuation factor, to obtain the vector of each word segment in the part-of-speech sequence of the text to be predicted;
将所述待预测文本中对应所述目标实体的分词的向量,作为所述待预测文本的词性序列中的目标实体的向量。A vector of the word segment corresponding to the target entity in the text to be predicted is used as a vector of a target entity in the part-of-speech sequence of the text to be predicted.
可选地,所述实体情感分析方法还包括:Optionally, the entity sentiment analysis method further includes:
获得所述待预测文本的词性序列中的每一个分词的词性向量、词包向量、以及相对目标实体距离的向量中的任意一个或组合；Obtaining any one or a combination of the part-of-speech vector, the bag-of-words vector, and the vector of the distance relative to the target entity for each word segment in the part-of-speech sequence of the text to be predicted;
组合所述待预测文本的词性序列中的每一个分词的词向量、以及获得的待预测文本的词性序列中的每一个分词的词性向量、词包向量、以及相对目标实体距离的向量中的任意一个或组合，得到所述待预测文本的词性序列中的每一个分词的初始向量；Combining the word vector of each word segment in the part-of-speech sequence of the text to be predicted with the obtained any one or combination of the part-of-speech vector, the bag-of-words vector, and the vector of the distance relative to the target entity of each word segment, to obtain an initial vector of each word segment in the part-of-speech sequence of the text to be predicted;
其中，所述将所述待预测文本的词性序列中的每一个分词的词向量和衰减因子相乘，得到所述待预测文本的词性序列中的每一个分词的向量，包括：Wherein the multiplying of the word vector of each word segment in the part-of-speech sequence of the text to be predicted by the attenuation factor, to obtain the vector of each word segment in the part-of-speech sequence of the text to be predicted, includes:
将所述待预测文本的词性序列中的每一个分词的初始向量和衰减因子相乘,得到所述待预测文本的词性序列中的每一个分词的向量。Multiplying an initial vector and an attenuation factor of each participle in the part-of-speech sequence of the text to be predicted to obtain a vector of each participle in the part-of-speech sequence of the text to be predicted.
可选地，若所述待预测文本中对应所述目标实体的分词包括多个，则将所述待预测文本中对应所述目标实体的多个分词的向量的平均值作为所述待预测文本的词性序列中的目标实体的向量。Optionally, if the text to be predicted includes a plurality of word segments corresponding to the target entity, the average of the vectors of the plurality of word segments corresponding to the target entity in the text to be predicted is used as the vector of the target entity in the part-of-speech sequence of the text to be predicted.
可选地，所述利用实体情感预测模型对所述待预测文本的词性序列中的每一个分词的向量和目标实体的向量进行预测，得到所述待预测文本中目标实体的情感倾向性的预测结果，包括：Optionally, the using of the entity sentiment prediction model to predict the vector of each word segment in the part-of-speech sequence of the text to be predicted and the vector of the target entity, to obtain the prediction result of the sentiment orientation of the target entity in the text to be predicted, includes:
对所述待预测文本的词性序列中的每一个分词的向量做加权平均处理,得到所述待预测文本的词性序列加权后的向量;Performing weighted averaging processing on the vector of each participle in the part of speech sequence of the text to be predicted, and obtaining a vector weighted by the part of speech sequence of the text to be predicted;
将所述待预测文本的词性序列中的目标实体的向量与第一矩阵做乘,得到所述目标实体的派生向量;Multiplying a vector of the target entity in the part of speech sequence of the text to be predicted by a first matrix to obtain a derived vector of the target entity;
依据所述待预测文本的词性序列加权后的向量,和/或,所述待预测文本的词性序列中的目标实体的派生向量,得到特征向量;And obtaining a feature vector according to the vector weighted by the part of speech sequence of the text to be predicted, and/or the derived vector of the target entity in the part of speech sequence of the text to be predicted;
采用softmax函数处理所述特征向量，得到概率输出向量，其中，所述概率输出向量包括：所述待预测文本中目标实体分别在预设种类别的情感倾向性下的概率值。Processing the feature vector with a softmax function to obtain a probability output vector, wherein the probability output vector includes the probability values of the target entity in the text to be predicted under each of the preset categories of sentiment orientation.
可选地,所述实体情感预测模型的构建过程,包括:Optionally, the process of constructing the entity sentiment prediction model includes:
对训练文本进行分词处理,得到所述训练文本的词性序列;Performing word segmentation processing on the training text to obtain a part-of-speech sequence of the training text;
获得所述训练文本的词性序列中的每一个分词的向量和目标实体的向量;Obtaining a vector of each participle in the part of speech sequence of the training text and a vector of the target entity;
对所述训练文本的词性序列中的每一个分词的向量做加权平均处理,得到所述训练文本的词性序列加权后的向量;Performing weighted averaging processing on the vector of each participle in the part of speech of the training text to obtain a vector weighted by the part of speech of the training text;
将所述训练文本的词性序列中的目标实体的向量与第一矩阵做乘,得到所述训练文本的词性序列中的目标实体的派生向量;Multiplying a vector of the target entity in the part-of-speech sequence of the training text with a first matrix to obtain a derived vector of the target entity in the part-of-speech sequence of the training text;
依据所述训练文本的词性序列加权后的向量,和/或所述训练文本的词性序列中的目标实体的派生向量,得到特征向量;Obtaining a feature vector according to the weighted vector of the part of speech of the training text, and/or the derived vector of the target entity in the part of speech sequence of the training text;
采用softmax函数处理所述特征向量,得到概率输出向量,其中,所述概率输出向量包括:所述训练文本中目标实体分别在预设种类别的情感倾向性下的概率值;The feature vector is processed by using a softmax function to obtain a probability output vector, wherein the probability output vector includes: a probability value of the target entity in the training text under the sentiment orientation of the preset category;
将所述概率输出向量与所述训练文本的人工标注类别进行交叉熵运算,获得损失函数;Performing a cross-entropy operation on the probability output vector and the artificial annotation category of the training text to obtain a loss function;
优化所述损失函数，并根据所述优化后的损失函数更新第一参数，直至利用更新后的第一参数得到的特征向量对所述训练文本进行预测得到的概率输出向量与所述训练文本的人工标注类别等同为止；其中，所述第一参数包括所述第一矩阵、所述softmax函数以及所述训练文本的词性序列中的每一个分词的向量；Optimizing the loss function, and updating a first parameter according to the optimized loss function, until the probability output vector obtained by predicting the training text with the feature vector derived from the updated first parameter is equivalent to the manually annotated category of the training text; wherein the first parameter includes the first matrix, the softmax function, and the vector of each word segment in the part-of-speech sequence of the training text;
将所述更新后的第二参数作为实体情感预测模型;其中,所述第二参数包括:所述第一矩阵和所述softmax函数。And using the updated second parameter as an entity sentiment prediction model; wherein the second parameter comprises: the first matrix and the softmax function.
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art will appreciate that embodiments of the present application can be provided as a method, system, or computer program product. Thus, the present application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment in combination of software and hardware. Moreover, the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器，使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中，使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品，该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。These computer program instructions may also be stored in a computer readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, so that the instructions stored in the computer readable memory produce an article of manufacture including an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上，使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理，从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
在一个典型的配置中,计算设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
存储器可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。存储器是计算机可读介质的示例。The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory. Memory is an example of a computer readable medium.
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括，但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带，磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质，可用于存储可以被计算设备访问的信息。按照本文中的界定，计算机可读介质不包括暂存电脑可读媒体(transitory media)，如调制的数据信号和载波。Computer readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information may be computer readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer readable media do not include transitory computer readable media (transitory media), such as modulated data signals and carrier waves.
还需要说明的是，术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含，从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素，而且还包括没有明确列出的其他要素，或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下，由语句“包括一个……”限定的要素，并不排除在包括要素的过程、方法、商品或者设备中还存在另外的相同要素。It should also be noted that the terms "comprise", "include", or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
本领域技术人员应明白,本申请的实施例可提供为方法、系统或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。Those skilled in the art will appreciate that embodiments of the present application can be provided as a method, system, or computer program product. Thus, the present application can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment in combination of software and hardware. Moreover, the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
以上仅为本申请的实施例而已,并不用于限制本申请。对于本领域技术人员来说,本申请可以有各种更改和变化。凡在本申请的精神和原理之内所作的任何修改、等同替换、改进等,均应包含在本申请的权利要求范围之内。The above is only an embodiment of the present application and is not intended to limit the application. Various changes and modifications can be made to the present application by those skilled in the art. Any modifications, equivalents, improvements, etc. made within the spirit and scope of the present application are intended to be included within the scope of the appended claims.

Claims (10)

  1. 一种实体情感分析方法,其特征在于,包括:A method of entity sentiment analysis, characterized in that it comprises:
    获取待预测文本;Get the text to be predicted;
    对所述待预测文本进行分词处理,得到所述待预测文本的词性序列;Performing word segmentation processing on the text to be predicted to obtain a part-of-speech sequence of the text to be predicted;
    获得所述待预测文本的词性序列中的每一个分词的向量和目标实体的向量;Obtaining a vector of each participle in the part of speech sequence of the text to be predicted and a vector of the target entity;
    利用实体情感预测模型对所述待预测文本的词性序列中的每一个分词的向量和目标实体的向量进行预测，得到所述待预测文本中目标实体的情感倾向性的预测结果；其中：所述实体情感预测模型基于第一原理构建得到；所述第一原理包括：迭代更新所述神经网络算法中的参数，直到利用更新参数后的神经网络算法对训练文本的特征向量进行预测而得到的预测结果等同于人工标注结果；所述训练文本的特征向量，依据所述训练文本的词性序列的向量和所述训练文本的词性序列中的目标实体的向量得到。Using an entity sentiment prediction model to predict the vector of each word segment in the part-of-speech sequence of the text to be predicted and the vector of the target entity, to obtain a prediction result of the sentiment orientation of the target entity in the text to be predicted; wherein the entity sentiment prediction model is constructed based on a first principle; the first principle includes: iteratively updating the parameters in the neural network algorithm until the prediction result, obtained by predicting the feature vector of the training text with the neural network algorithm using the updated parameters, is equivalent to the manual annotation result; the feature vector of the training text is obtained according to the vector of the part-of-speech sequence of the training text and the vector of the target entity in the part-of-speech sequence of the training text.
  2. 根据权利要求1所述的方法,其特征在于,所述获得所述待预测文本的词性序列中的每一个分词的向量和目标实体的向量,包括:The method according to claim 1, wherein the obtaining a vector of each participle of the part of speech sequence of the text to be predicted and a vector of the target entity comprises:
    分别获得所述待预测文本的词性序列中的每一个分词的词向量;Obtaining a word vector of each of the participles in the part of speech sequence of the text to be predicted;
    将所述待预测文本的词性序列中的每一个分词的词向量和衰减因子相乘,得到所述待预测文本的词性序列中的每一个分词的向量;Multiplying a word vector of each word segment in the part-of-speech sequence of the text to be predicted and an attenuation factor to obtain a vector of each word segment in the part-of-speech sequence of the text to be predicted;
    将所述待预测文本中对应所述目标实体的分词的向量,作为所述待预测文本的词性序列中的目标实体的向量。A vector of the word segment corresponding to the target entity in the text to be predicted is used as a vector of a target entity in the part-of-speech sequence of the text to be predicted.
  3. 根据权利要求2所述的方法,其特征在于,还包括:The method of claim 2, further comprising:
    获得所述待预测文本的词性序列中的每一个分词的词性向量、词包向量、以及相对目标实体距离的向量中的任意一个或组合；Obtaining any one or a combination of the part-of-speech vector, the bag-of-words vector, and the vector of the distance relative to the target entity for each word segment in the part-of-speech sequence of the text to be predicted;
    组合所述待预测文本的词性序列中的每一个分词的词向量、以及获得的待预测文本的词性序列中的每一个分词的词性向量、词包向量、以及相对目标实体距离的向量中的任意一个或组合，得到所述待预测文本的词性序列中的每一个分词的初始向量；Combining the word vector of each word segment in the part-of-speech sequence of the text to be predicted with the obtained any one or combination of the part-of-speech vector, the bag-of-words vector, and the vector of the distance relative to the target entity of each word segment, to obtain an initial vector of each word segment in the part-of-speech sequence of the text to be predicted;
    其中,所述将所述待预测文本的词性序列中的每一个分词的词向量和衰减因子相乘,得到所述待预测文本的词性序列中的每一个分词的向量,包括:The multiplying the word vector and the attenuation factor of each participle of the part-of-speech sequence of the text to be predicted to obtain a vector of each participle in the part-of-speech sequence of the text to be predicted includes:
    将所述待预测文本的词性序列中的每一个分词的初始向量和衰减因子相乘,得到所述待预测文本的词性序列中的每一个分词的向量。Multiplying an initial vector and an attenuation factor of each participle in the part-of-speech sequence of the text to be predicted to obtain a vector of each participle in the part-of-speech sequence of the text to be predicted.
  4. 根据权利要求2所述的方法，其特征在于，若所述待预测文本中对应所述目标实体的分词包括多个，则将所述待预测文本中对应所述目标实体的多个分词的向量的平均值作为所述待预测文本的词性序列中的目标实体的向量。The method according to claim 2, characterized in that, if the text to be predicted includes a plurality of word segments corresponding to the target entity, the average of the vectors of the plurality of word segments corresponding to the target entity in the text to be predicted is used as the vector of the target entity in the part-of-speech sequence of the text to be predicted.
  5. 根据权利要求1所述的方法，其特征在于，所述利用实体情感预测模型对所述待预测文本的词性序列中的每一个分词的向量和目标实体的向量进行预测，得到所述待预测文本中目标实体的情感倾向性的预测结果，包括：The method according to claim 1, characterized in that the using of the entity sentiment prediction model to predict the vector of each word segment in the part-of-speech sequence of the text to be predicted and the vector of the target entity, to obtain the prediction result of the sentiment orientation of the target entity in the text to be predicted, includes:
    对所述待预测文本的词性序列中的每一个分词的向量做加权平均处理,得到所述待预测文本的词性序列加权后的向量;Performing weighted averaging processing on the vector of each participle in the part of speech sequence of the text to be predicted, and obtaining a vector weighted by the part of speech sequence of the text to be predicted;
    将所述待预测文本的词性序列中的目标实体的向量与第一矩阵做乘,得到所述目标实体的派生向量;Multiplying a vector of the target entity in the part of speech sequence of the text to be predicted by a first matrix to obtain a derived vector of the target entity;
    依据所述待预测文本的词性序列加权后的向量,和/或,所述待预测文本的词性序列中的目标实体的派生向量,得到特征向量;And obtaining a feature vector according to the vector weighted by the part of speech sequence of the text to be predicted, and/or the derived vector of the target entity in the part of speech sequence of the text to be predicted;
    采用softmax函数处理所述特征向量,得到概率输出向量,其中,所述概率输出向量包括:所述待预测文本中目标实体分别在预设种类别的情感倾向性下的概率值。The feature vector is processed by using a softmax function to obtain a probability output vector, wherein the probability output vector includes: a probability value of the target entity in the text to be predicted, respectively, under the sentiment orientation of the preset category.
  6. 根据权利要求1所述的方法,其特征在于,所述实体情感预测模型的构建过程,包括:The method according to claim 1, wherein the constructing process of the entity sentiment prediction model comprises:
    对训练文本进行分词处理,得到所述训练文本的词性序列;Performing word segmentation processing on the training text to obtain a part-of-speech sequence of the training text;
    获得所述训练文本的词性序列中的每一个分词的向量和目标实体的向量;Obtaining a vector of each participle in the part of speech sequence of the training text and a vector of the target entity;
    对所述训练文本的词性序列中的每一个分词的向量做加权平均处理,得到所述训练文本的词性序列加权后的向量;Performing weighted averaging processing on the vector of each participle in the part of speech of the training text to obtain a vector weighted by the part of speech of the training text;
    将所述训练文本的词性序列中的目标实体的向量与第一矩阵做乘,得到所述训练文本的词性序列中的目标实体的派生向量;Multiplying a vector of the target entity in the part-of-speech sequence of the training text with a first matrix to obtain a derived vector of the target entity in the part-of-speech sequence of the training text;
    依据所述训练文本的词性序列加权后的向量，和/或所述训练文本的词性序列中的目标实体的派生向量，得到特征向量；Obtaining a feature vector according to the weighted vector of the part of speech of the training text, and/or the derived vector of the target entity in the part of speech of the training text;
    采用softmax函数处理所述特征向量,得到概率输出向量,其中,所述概率输出向量包括:所述训练文本中目标实体分别在预设种类别的情感倾向性下的概率值;The feature vector is processed by using a softmax function to obtain a probability output vector, wherein the probability output vector includes: a probability value of the target entity in the training text under the sentiment orientation of the preset category;
    将所述概率输出向量与所述训练文本的人工标注类别进行交叉熵运算,获得损失函数;Performing a cross-entropy operation on the probability output vector and the artificial annotation category of the training text to obtain a loss function;
    优化所述损失函数，并根据所述优化后的损失函数更新第一参数，直至利用更新后的第一参数得到的特征向量对所述训练文本进行预测得到的概率输出向量与所述训练文本的人工标注类别等同为止；其中，所述第一参数包括所述第一矩阵、所述softmax函数以及所述训练文本的词性序列中的每一个分词的向量；Optimizing the loss function, and updating a first parameter according to the optimized loss function, until the probability output vector obtained by predicting the training text with the feature vector derived from the updated first parameter is equivalent to the manually annotated category of the training text; wherein the first parameter includes the first matrix, the softmax function, and the vector of each word segment in the part-of-speech sequence of the training text;
    将所述更新后的第二参数作为实体情感预测模型;其中,所述第二参数包括:所述第一矩阵和所述softmax函数。And using the updated second parameter as an entity sentiment prediction model; wherein the second parameter comprises: the first matrix and the softmax function.
  7. 一种实体情感分析装置,其特征在于,包括:An entity sentiment analysis apparatus, comprising:
    获取单元,用于获取待预测文本;An obtaining unit, configured to obtain a text to be predicted;
    分词单元,用于对所述待预测文本进行分词处理,得到所述待预测文本的词性序列;a word segmentation unit, configured to perform word segmentation processing on the text to be predicted, to obtain a part-of-speech sequence of the text to be predicted;
    生成单元,用于获得所述待预测文本的词性序列中的每一个分词的向量和目标实体的向量;a generating unit, configured to obtain a vector of each participle in the part of speech sequence of the text to be predicted and a vector of the target entity;
    预测单元,用于利用实体情感预测模型对所述待预测文本的词性序列中的每一个分词的向量和目标实体的向量进行预测,得到所述待预测文本中目标实体的情感倾向性的预测结果;其中,所述实体情感预测模型基于第一原理构建得到;所述第一原理包括:迭代更新所述神经网络算法中的参数,直至利用更新参数后的神经网络算法对训练文本的特征向量进行预测而得到的预测结果等同于人工标注结果;所述训练文本的特征向量,依据所述训练文本的词性序列的向量和所述训练文本的词性序列中的目标实体的向量得到。a prediction unit, configured to predict, by using an entity sentiment prediction model, a vector of each participle in the part-of-speech sequence of the text to be predicted and a vector of the target entity, to obtain a prediction result of the sentiment orientation of the target entity in the text to be predicted Wherein the entity sentiment prediction model is constructed based on a first principle; the first principle comprises: iteratively updating parameters in the neural network algorithm until the feature vector of the training text is performed by using a neural network algorithm after updating the parameters The prediction result obtained by the prediction is equivalent to the manual annotation result; the feature vector of the training text is obtained according to the vector of the part-of-speech sequence of the training text and the vector of the target entity in the part-of-speech sequence of the training text.
  8. 根据权利要求7所述的装置,其特征在于,所述生成单元,包括:The device according to claim 7, wherein the generating unit comprises:
    第一获得单元，用于分别获得所述待预测文本的词性序列中的每一个分词的词向量；a first obtaining unit, configured to respectively obtain a word vector of each participle in the part of speech sequence of the text to be predicted;
    第二获得单元,用于将所述待预测文本的词性序列中的每一个分词的词向量和衰减因子相乘,得到所述待预测文本的词性序列中的每一个分词的向量;a second obtaining unit, configured to multiply a word vector and an attenuation factor of each participle of the part-of-speech sequence of the text to be predicted to obtain a vector of each participle in the part-of-speech sequence of the text to be predicted;
    生成子单元,用于将所述待预测文本中对应所述目标实体的分词的向量,作为所述待预测文本的词性序列中的目标实体的向量。And generating a subunit, configured to use a vector of the word segment corresponding to the target entity in the text to be predicted as a vector of the target entity in the part of speech sequence of the text to be predicted.
  9. 一种存储介质，其特征在于，所述存储介质包括存储的程序，其中，在所述程序运行时控制所述存储介质所在设备执行如权利要求1-6中任一项所述的实体情感分析方法。A storage medium, characterized in that the storage medium includes a stored program, wherein, when the program runs, a device on which the storage medium is located is controlled to perform the entity sentiment analysis method according to any one of claims 1-6.
  10. 一种处理器,其特征在于,所述处理器用于运行程序,其中,所述程序运行时执行如权利要求1-6中任一项所述的实体情感分析方法。A processor, wherein the processor is configured to execute a program, wherein the program is executed to perform the entity sentiment analysis method according to any one of claims 1-6.
PCT/CN2019/073665 2018-03-16 2019-01-29 Entity sentiment analysis method and related apparatus WO2019174423A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810217282.9 2018-03-16
CN201810217282.9A CN110287477B (en) 2018-03-16 2018-03-16 Entity emotion analysis method and related device

Publications (1)

Publication Number Publication Date
WO2019174423A1 true WO2019174423A1 (en) 2019-09-19

Family

ID=67907347

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/073665 WO2019174423A1 (en) 2018-03-16 2019-01-29 Entity sentiment analysis method and related apparatus

Country Status (2)

Country Link
CN (1) CN110287477B (en)
WO (1) WO2019174423A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111241832A (en) * 2020-01-15 2020-06-05 北京百度网讯科技有限公司 Core entity labeling method and device and electronic equipment
CN111309864A (en) * 2020-02-11 2020-06-19 安徽理工大学 User group emotional tendency migration dynamic analysis method for microblog hot topics
CN111538835A (en) * 2020-03-30 2020-08-14 东南大学 Social media emotion classification method and device based on knowledge graph
CN111552810A (en) * 2020-04-24 2020-08-18 深圳数联天下智能科技有限公司 Entity extraction and classification method and device, computer equipment and storage medium
CN111783453A (en) * 2020-07-01 2020-10-16 支付宝(杭州)信息技术有限公司 Method and device for processing emotion information of text
CN112069324A (en) * 2020-08-27 2020-12-11 北京灵汐科技有限公司 Classified label adding method, device, equipment and storage medium
US20210118024A1 (en) * 2019-10-21 2021-04-22 Salesforce.Com, Inc. Multi-label product categorization
CN112749275A (en) * 2020-05-22 2021-05-04 腾讯科技(深圳)有限公司 Data processing method and equipment
CN113569559A (en) * 2021-07-23 2021-10-29 北京智慧星光信息技术有限公司 Short text entity emotion analysis method and system, electronic equipment and storage medium
CN113723089A (en) * 2020-05-25 2021-11-30 阿里巴巴集团控股有限公司 Word segmentation model training method, word segmentation method, data processing method and data processing device
CN113849651A (en) * 2021-09-28 2021-12-28 平安科技(深圳)有限公司 Document-level emotional tendency-based emotion classification method, device, equipment and medium
CN115392260A (en) * 2022-10-31 2022-11-25 暨南大学 Social media tweet emotion analysis method facing specific target

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112579768A (en) * 2019-09-30 2021-03-30 北京国双科技有限公司 Emotion classification model training method, text emotion classification method and text emotion classification device
CN110990531B (en) * 2019-11-28 2024-04-02 北京声智科技有限公司 Text emotion recognition method and device
CN113378562B (en) * 2020-03-10 2023-09-19 中国移动通信集团辽宁有限公司 Word segmentation processing method, device, computing equipment and storage medium
CN111324739B (en) * 2020-05-15 2020-08-28 支付宝(杭州)信息技术有限公司 Text emotion analysis method and system
CN114386411A (en) * 2020-10-16 2022-04-22 北京金山数字娱乐科技有限公司 Relationship extraction method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281694A (en) * 2014-10-13 2015-01-14 安徽华贞信息科技有限公司 Analysis system of emotional tendency of text
CN104298665A (en) * 2014-10-16 2015-01-21 苏州大学 Identification method and device of evaluation objects of Chinese texts
KR20160077446A (en) * 2014-12-23 2016-07-04 고려대학교 산학협력단 Method for extracting semantic entity topic
CN107038154A (en) * 2016-11-25 2017-08-11 阿里巴巴集团控股有限公司 A kind of text emotion recognition methods and device
CN107608956A (en) * 2017-09-05 2018-01-19 广东石油化工学院 A kind of reader's mood forecast of distribution algorithm based on CNN GRNN

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9129008B1 (en) * 2008-11-10 2015-09-08 Google Inc. Sentiment-based classification of media content
CN101593204A (en) * 2009-06-05 2009-12-02 北京大学 A kind of emotion tendency analysis system based on news comment webpage
CN106776581B (en) * 2017-02-21 2020-01-24 浙江工商大学 Subjective text emotion analysis method based on deep learning
CN107168945B (en) * 2017-04-13 2020-07-14 广东工业大学 Bidirectional cyclic neural network fine-grained opinion mining method integrating multiple features

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281694A (en) * 2014-10-13 2015-01-14 安徽华贞信息科技有限公司 Analysis system of emotional tendency of text
CN104298665A (en) * 2014-10-16 2015-01-21 苏州大学 Identification method and device of evaluation objects of Chinese texts
KR20160077446A (en) * 2014-12-23 2016-07-04 고려대학교 산학협력단 Method for extracting semantic entity topic
CN107038154A (en) * 2016-11-25 2017-08-11 阿里巴巴集团控股有限公司 A kind of text emotion recognition methods and device
CN107608956A (en) * 2017-09-05 2018-01-19 广东石油化工学院 A kind of reader's mood forecast of distribution algorithm based on CNN GRNN

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11507989B2 (en) * 2019-10-21 2022-11-22 Salesforce, Inc. Multi-label product categorization
US20210118024A1 (en) * 2019-10-21 2021-04-22 Salesforce.Com, Inc. Multi-label product categorization
CN111241832B (en) * 2020-01-15 2023-08-15 北京百度网讯科技有限公司 Core entity labeling method and device and electronic equipment
CN111241832A (en) * 2020-01-15 2020-06-05 北京百度网讯科技有限公司 Core entity labeling method and device and electronic equipment
CN111309864A (en) * 2020-02-11 2020-06-19 安徽理工大学 User group emotional tendency migration dynamic analysis method for microblog hot topics
CN111309864B (en) * 2020-02-11 2022-08-26 安徽理工大学 User group emotional tendency migration dynamic analysis method for microblog hot topics
CN111538835B (en) * 2020-03-30 2023-05-23 东南大学 Social media emotion classification method and device based on knowledge graph
CN111538835A (en) * 2020-03-30 2020-08-14 东南大学 Social media emotion classification method and device based on knowledge graph
CN111552810A (en) * 2020-04-24 2020-08-18 深圳数联天下智能科技有限公司 Entity extraction and classification method and device, computer equipment and storage medium
CN111552810B (en) * 2020-04-24 2024-03-19 深圳数联天下智能科技有限公司 Entity extraction and classification method, entity extraction and classification device, computer equipment and storage medium
CN112749275B (en) * 2020-05-22 2024-05-14 腾讯科技(深圳)有限公司 Data processing method and device
CN112749275A (en) * 2020-05-22 2021-05-04 腾讯科技(深圳)有限公司 Data processing method and equipment
CN113723089A (en) * 2020-05-25 2021-11-30 阿里巴巴集团控股有限公司 Word segmentation model training method, word segmentation method, data processing method and data processing device
CN113723089B (en) * 2020-05-25 2023-12-26 阿里巴巴集团控股有限公司 Word segmentation model training method, word segmentation method and data processing method and device
CN111783453A (en) * 2020-07-01 2020-10-16 支付宝(杭州)信息技术有限公司 Method and device for processing emotion information of text
CN111783453B (en) * 2020-07-01 2024-05-21 支付宝(杭州)信息技术有限公司 Text emotion information processing method and device
CN112069324A (en) * 2020-08-27 2020-12-11 北京灵汐科技有限公司 Classified label adding method, device, equipment and storage medium
CN113569559B (en) * 2021-07-23 2024-02-02 北京智慧星光信息技术有限公司 Short text entity emotion analysis method, system, electronic equipment and storage medium
CN113569559A (en) * 2021-07-23 2021-10-29 北京智慧星光信息技术有限公司 Short text entity emotion analysis method and system, electronic equipment and storage medium
CN113849651B (en) * 2021-09-28 2024-04-09 平安科技(深圳)有限公司 Emotion classification method, device, equipment and medium based on document-level emotion tendencies
CN113849651A (en) * 2021-09-28 2021-12-28 平安科技(深圳)有限公司 Document-level emotional tendency-based emotion classification method, device, equipment and medium
CN115392260B (en) * 2022-10-31 2023-04-07 暨南大学 Social media tweet emotion analysis method facing specific target
CN115392260A (en) * 2022-10-31 2022-11-25 暨南大学 Social media tweet emotion analysis method facing specific target

Also Published As

Publication number Publication date
CN110287477A (en) 2019-09-27
CN110287477B (en) 2021-05-25

Similar Documents

Publication Publication Date Title
WO2019174423A1 (en) Entity sentiment analysis method and related apparatus
WO2019174422A1 (en) Method for analyzing entity association relationship, and related apparatus
Treviso et al. Efficient methods for natural language processing: A survey
US20170116203A1 (en) Method of automated discovery of topic relatedness
CN109299228B (en) Computer-implemented text risk prediction method and device
CN110619044B (en) Emotion analysis method, system, storage medium and equipment
TW201822098A (en) Computer device and method for predicting market demand of commodities
US20220027738A1 (en) Distributed synchronous training architecture using stale weights
WO2014126657A1 (en) Latent semantic analysis for application in a question answer system
US11580119B2 (en) System and method for automatic persona generation using small text components
CN111783993A (en) Intelligent labeling method and device, intelligent platform and storage medium
US10067983B2 (en) Analyzing tickets using discourse cues in communication logs
WO2017075980A1 (en) Information pushing method and apparatus
US10614109B2 (en) Natural language processing keyword analysis
WO2014073206A1 (en) Information-processing device and information-processing method
CN102789473A (en) Identifier retrieval method and equipment
JP6770709B2 (en) Model generator and program for machine learning.
US9460086B2 (en) Method and apparatus for performing bilingual word alignment
US20230351121A1 (en) Method and system for generating conversation flows
JP6436086B2 (en) Classification dictionary generation device, classification dictionary generation method, and program
Visser et al. Sentiment and intent classification of in-text citations using bert
CN116578400A (en) Multitasking data processing method and device
CN107729509B (en) Discourse similarity determination method based on recessive high-dimensional distributed feature representation
CN116151235A (en) Article generating method, article generating model training method and related equipment
JP2017538226A (en) Scalable web data extraction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19766754

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19766754

Country of ref document: EP

Kind code of ref document: A1