CN111241842A - Text analysis method, device and system

Text analysis method, device and system

Info

Publication number
CN111241842A
Authority
CN
China
Prior art keywords
vector
entity
information
text data
text
Prior art date
Legal status
Granted
Application number
CN201811426073.1A
Other languages
Chinese (zh)
Other versions
CN111241842B (en)
Inventor
宋凯嵩 (Song Kaisong)
孙常龙 (Sun Changlong)
林君 (Lin Jun)
刘晓钟 (Liu Xiaozhong)
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201811426073.1A
Publication of CN111241842A
Application granted
Publication of CN111241842B
Legal status: Active

Abstract

The invention discloses a text analysis method, device and system. The method includes: acquiring text data, where the text data includes entity information and attribute information of at least one product; processing the text data based on a multi-view emotion analysis model to predict the emotion information included in the text data, where the multi-view emotion analysis model is used to predict the emotion information of a product on a preset attribute according to the entity information and the attribute information, and the emotion information represents the emotional tendency expressed by the text data; and outputting the emotion information included in the text data. The method and device solve the technical problem in the prior art that text can be semantically analyzed only from a single dimension, which makes the semantic analysis result inaccurate.

Description

Text analysis method, device and system
Technical Field
The invention relates to the field of language analysis, in particular to a text analysis method, a text analysis device and a text analysis system.
Background
Emotion analysis (sentiment analysis), also known as tendency analysis, is the process of analyzing, processing, summarizing, and reasoning about subjective text with emotional coloring. It divides text into two classes (positive or negative), or more, according to the meaning and emotional information the text expresses.
The purpose of emotion analysis is generally to determine the attitude of a speaker or author toward a topic, that is, the speaker's or author's emotional state. The Internet generates a great deal of valuable, user-contributed comment information about people, events, products, and the like. This comment information expresses people's various emotional colors and tendencies, such as happiness, anger, sadness, joy, criticism, and praise. By browsing these subjective comments, potential users can learn what public opinion thinks of a certain event or product.
Current emotion analysis methods generally include dictionary-based, network-based, and corpus-based methods, among which the corpus-based method is the most widely applied. It uses machine-learning techniques to classify emotion: a classification model is generally first made to learn the patterns in training data, and the trained model is then used to make predictions on test data. However, this method can currently only perform emotion analysis on text from a single dimension, so it is difficult to obtain an accurate analysis result.
For the problem in the prior art that text can be semantically analyzed only from a single dimension, which makes the semantic analysis result inaccurate, no effective solution has yet been proposed.
Disclosure of Invention
The embodiments of the invention provide a text analysis method, device and system, which at least solve the technical problem in the prior art that text can be semantically analyzed only from a single dimension, making the semantic analysis result inaccurate.
According to one aspect of the embodiments of the invention, a text analysis method is provided, including: acquiring text data, where the text data includes entity information and attribute information of at least one product; processing the text data based on a multi-view emotion analysis model to predict the emotion information included in the text data, where the multi-view emotion analysis model is used to predict the emotion information of a product on a preset attribute according to the entity information and the attribute information, and the emotion information represents the emotional tendency expressed by the text data; and outputting the emotion information included in the text data.
According to another aspect of the embodiments of the invention, another text analysis method is provided, including: displaying acquired text data, where the text data includes entity information and attribute information of at least one product; and outputting the emotion information included in the text data, where a multi-view emotion analysis model is used to predict the emotion information of the product on a preset attribute according to the entity information and the attribute information, and the emotion information represents the emotional tendency expressed by the text data.
According to another aspect of the embodiments of the invention, a text analysis apparatus is provided, including: an acquisition module for acquiring text data, where the text data includes entity information and attribute information of at least one product; a prediction module for processing the text data based on a multi-view emotion analysis model and predicting the emotion information included in the text data, where the multi-view emotion analysis model is used to predict the emotion information of a product on a preset attribute according to the entity information and the attribute information, and the emotion information represents the emotional tendency expressed by the text data; and an output module for outputting the emotion information included in the text data.
According to another aspect of the embodiments of the invention, another text analysis apparatus is provided, including: a display module for displaying acquired text data, where the text data includes entity information and attribute information of at least one product; and an output module for outputting the emotion information included in the text data, where a multi-view emotion analysis model is used to predict the emotion information of the product on a preset attribute according to the entity information and the attribute information, and the emotion information represents the emotional tendency expressed by the text data.
According to another aspect of the embodiments of the invention, a text analysis system is provided, including: a processor; and a memory coupled to the processor and configured to provide the processor with instructions for the following processing steps: acquiring text data, where the text data includes entity information and attribute information of at least one product; processing the text data based on a multi-view emotion analysis model to predict the emotion information included in the text data, where the multi-view emotion analysis model is used to predict the emotion information of a product on a preset attribute according to the entity information and the attribute information, and the emotion information represents the emotional tendency expressed by the text data; and outputting the emotion information included in the text data.
In the embodiments of the invention, the attribute information of interest can be preset, and a multi-view emotion analysis model is used to predict the emotion information of the product on the preset attribute based on the entity information and attribute information of the text data, so that multi-dimensional analysis of the text data is achieved without adding extra processing models and the accuracy of the text analysis is improved. The embodiments of the application therefore solve the technical problem in the prior art that text can be semantically analyzed only from a single dimension, making the semantic analysis result inaccurate.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing a text analysis method;
fig. 2 is a flowchart of a text analysis method according to embodiment 1 of the present application;
FIG. 3 is a flow chart of using a multi-view emotion analysis model according to embodiment 1 of the present application;
FIG. 4 is a schematic diagram of an alternative multi-view emotion analysis model according to embodiment 1 of the present application;
fig. 5 is a flowchart of a text analysis method according to embodiment 2 of the present application;
FIG. 6 is a schematic diagram of a display interface of a multi-view emotion analysis system according to embodiment 2 of the present application;
fig. 7 is a schematic view of a text analysis apparatus according to embodiment 3 of the present application;
fig. 8 is a schematic view of a text analysis apparatus according to embodiment 4 of the present application; and
fig. 9 is a block diagram of a computer terminal according to embodiment 5 of the present application.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Example 1
In accordance with an embodiment of the present invention, an embodiment of a text analysis method is provided. It should be noted that the steps illustrated in the flowchart of the drawings may be executed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps illustrated or described may be performed in an order different from that presented herein.
The method provided by the first embodiment of the present application may be executed on a mobile terminal, a computer terminal, or a similar computing device. Fig. 1 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing the text analysis method. As shown in fig. 1, the computer terminal 10 (or mobile device 10) may include one or more processors 102 (shown as 102a, 102b, ..., 102n; the processors 102 may include, but are not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission module 106 for communication functions. In addition, it may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and does not limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1.
It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuit may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). As referred to in the embodiments of the application, the data processing circuit acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as program instructions/data storage devices corresponding to the text analysis method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, implementing the text analysis method. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
It should be noted here that, in some alternative embodiments, the computer device (or mobile device) shown in fig. 1 may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both. It should be noted that fig. 1 is only one particular example and is intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.
Under the above operating environment, the present application provides a method for analyzing text as shown in fig. 2. Fig. 2 is a flowchart of a text analysis method according to embodiment 1 of the present application.
Step S21, obtaining text data, wherein the text data includes entity information and attribute information of at least one product.
Specifically, the text data may be comments from forums, shopping websites, or e-commerce platforms. For example, the text data may be a user's evaluation of a product on a shopping website, a user's evaluation of an automobile in an automobile forum, or a user's evaluation of a restaurant on a review website.
The entity information in the text data represents the name or brand of the product in the text data, i.e., the object that the text data evaluates; the attribute information in the text data evaluates a characteristic of the product, or of the product's brand, on a certain attribute.
In an alternative embodiment, taking an e-commerce platform scenario as an example, the text data may be users' comments on entities in the platform, and in a comment the subject of the comment may be the entity of the text. For example, for the comment "this pair of nike sports shoes is very comfortable", the entity in the text is "nike" or "nike sports shoes", and the attribute information is "very comfortable".
Step S23, processing the text data based on the multi-view emotion analysis model and predicting the emotion information included in the text data, where the multi-view emotion analysis model is used to predict the emotion information of the product on a preset attribute according to the entity information and the attribute information, and the emotion information represents the emotional tendency expressed by the text data.
Specifically, the preset attribute may be a pre-specified attribute: in the process of analyzing the text data, the user sets the attributes on which the emotional tendency of the entity needs to be obtained. The emotion information is a representation of emotional tendency and, optionally, may be output as one of three categories: positive, negative, and neutral.
When the text data is a comment on the entity, the object generating the text data may be a user filling in the comment on a shopping website; by analyzing the emotional tendency of the comments filled in by users, the emotional tendency information of the entity on one or more attributes can be obtained.
In an alternative embodiment, still taking the text data "this pair of nike sports shoes is very comfortable" as an example, the entity in the text is "nike" or "nike sports shoes", and the attribute information is "very comfortable". The multi-view emotion analysis model analyzes the emotional tendency of "nike" and "nike sports shoes" on preset attributes based on the entity information and attribute information of the text data. For example, to obtain users' emotion information about the comfort of nike sports shoes from comments, the attribute can be set to "comfort"; if users' emotion information about the air permeability of nike sports shoes is also required, the attribute can be set to "air permeability"; and so on, so that users' emotion information on various attributes can be obtained.
The multi-view emotion analysis model may be implemented as a neural network model and is specifically used to analyze the emotional tendency of text data from multiple dimensions. In an optional embodiment, the multi-view emotion analysis model analyzes the emotional tendency of the text data from two dimensions, entity and attribute, so that the obtained emotional tendency is not merely toward one entity or one attribute alone, but is the emotional tendency of an entity on a certain attribute. The result is therefore more accurate, more comprehensive, and more targeted, and is more instructive to the manufacturer of the entity.
Fig. 3 is a flowchart of using a multi-view emotion analysis model according to embodiment 1 of the present application. With reference to fig. 3, the multi-view emotion analysis model is embedded in a multi-view emotion analysis system. When the multi-view emotion analysis system is started, it is first determined whether to train the multi-view emotion analysis model or to use the model to predict the emotional tendency of text data; for example, it may be determined whether a comment is to be predicted.
If the judgment result is negative, the multi-view emotion analysis model needs to be trained: comment data can be collected from a product comment pool, the collected comment data are preprocessed and used to construct training samples, and the multi-view emotion analysis model is trained with the training samples, thereby constructing the multi-view emotion analysis model.
Specifically, comment text data of one or more domains (such as women's clothing, shoes, beauty products, and the like) may be selected. The comment text data specifically includes: the product comment text, the entities of the comment, and the attributes of the entities, where the entities may be extracted from the comment by an entity extraction tool (e.g., Stanford's natural language processing tool set) and the attributes of the entities may be preset manually. These three pieces of information are stored in a database as triples and used as training samples to train the emotion information analysis module.
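As an illustrative sketch (not taken from the patent), such triples could be kept as simple records; the extract_entities helper and the field names below are hypothetical placeholders for whichever extraction tool and storage layout are actually used.
```python
# Illustrative sketch only: storing (comment text, entity, attribute) triples plus
# an emotion label as training samples. extract_entities() and the field names are
# hypothetical placeholders.

def extract_entities(comment_text):
    # Placeholder for a real entity-extraction tool; here the entity is hard-coded.
    return ["nike sports shoes"]

def build_training_samples(annotated_comments):
    """annotated_comments: (comment text, preset attribute, emotion label) tuples."""
    samples = []
    for text, attribute, label in annotated_comments:
        for entity in extract_entities(text):
            samples.append({
                "text": text,            # product comment text
                "entity": entity,        # entity of the comment
                "attribute": attribute,  # preset attribute of the entity
                "label": label,          # positive / negative / neutral
            })
    return samples

if __name__ == "__main__":
    raw = [("this pair of nike sports shoes is very comfortable", "comfort", "positive")]
    for sample in build_training_samples(raw):
        print(sample)
```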
When selecting comments for training, in order to alleviate possible sparsity of entities or attributes in the comments and to ensure stable and reliable model performance, each entity and each attribute of an entity should have no fewer than a preset number of comments, for example 10.
The training samples include the triple data corresponding to each comment and the label corresponding to that triple data, i.e., the emotion information of the comment. The triple data of a comment is input to the multi-view emotion analysis model to obtain the prediction result output by the model; a loss function is determined from the prediction result and the label corresponding to the comment; and with the loss function as the objective function to be minimized, the network parameters of the multi-view emotion analysis model are adjusted, yielding an accurate multi-view emotion analysis model.
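The patent does not name a particular loss or optimizer; the sketch below assumes a cross-entropy loss and a gradient-based optimizer purely for illustration, and replaces the multi-view model with a placeholder PyTorch module operating on already-encoded triples.
```python
# Minimal training-loop sketch under assumed choices (cross-entropy loss, Adam).
# The real multi-view emotion analysis model is stood in for by a placeholder
# module that maps a pre-encoded triple to three emotion classes.
import torch
import torch.nn as nn

class PlaceholderMultiViewModel(nn.Module):
    def __init__(self, feature_dim=200, num_classes=3):
        super().__init__()
        self.fc = nn.Linear(feature_dim, num_classes)

    def forward(self, triple_features):
        return self.fc(triple_features)            # unnormalised class scores

model = PlaceholderMultiViewModel()
criterion = nn.CrossEntropyLoss()                  # assumed loss function
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

features = torch.randn(8, 200)                     # dummy batch of encoded triples
labels = torch.randint(0, 3, (8,))                 # emotion labels: 0/1/2

for epoch in range(5):
    optimizer.zero_grad()
    logits = model(features)
    loss = criterion(logits, labels)               # compare prediction with labels
    loss.backward()                                # minimise the objective ...
    optimizer.step()                               # ... by adjusting network parameters
```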
If the judgment result is yes, the emotional tendency of the comment needs to be predicted with multi-view emotion analysis. A new comment to be predicted (i.e., a comment that does not belong to the training samples) is obtained and input to the data preprocessing module for preprocessing. Since the training samples and new comments may be preprocessed by the same preprocessing module, after the preprocessing module outputs the new comment, the above judgment is needed again to determine whether it is used for prediction: if so, the multi-view emotion analysis model is used to perform emotion analysis on the new comment; if not, the multi-view emotion analysis model is constructed.
In step S25, emotion information included in the text data is output.
Specifically, after the text data is processed by the multi-view emotion analysis model, the processing result is output, and emotion information included in the text data is used for expressing the emotional tendency of a product in the text data on a preset attribute.
In an alternative embodiment, for the text data "nike sports shoes are comfortable", the obtained emotion information is that the entity "nike sports shoes" has a "positive" tendency on the attribute "comfort"; for the text "XX brand clothes fade easily", the obtained emotion information may be that the entity "XX brand clothes" has a "negative" tendency on the attribute "color"; still taking "XX brand clothes fade easily" as an example, the obtained emotion information may also be that the entity "XX brand clothes" has a "neutral" tendency on the attribute "comfort".
It should be noted that, in the prior art, emotion analysis of a comment text can only output an analysis result along a single dimension. For example, for the text data "nike sports shoes are comfortable", only the entity can be analyzed, the result being the emotional tendency of the text data toward the entity "nike" or "nike sports shoes"; or only the attribute "very comfortable" is analyzed, the result being the emotional tendency of the text data on that attribute. However, the text data to be analyzed is diverse, especially product comments. If only the user's emotional tendency toward the product is obtained, the result is too coarse: it remains unclear on which aspect (i.e., attribute) the product was actually rated "positive" or "negative". Conversely, if only the evaluation on a certain attribute is obtained, it is unclear which product is being evaluated. The accuracy of a single-dimension analysis result is therefore poor, and it can hardly provide precise guidance for improving the product.
In the embodiment of the present application, the attribute information of interest can be preset, and the multi-view emotion analysis model predicts the emotion information of the product on the preset attribute based on the entity information and attribute information of the text data, thereby realizing multi-dimensional analysis of the text data without adding extra processing models and improving the accuracy of the text analysis.
The embodiment of the application therefore solves the technical problem in the prior art that text can be semantically analyzed only from a single dimension, making the semantic analysis result inaccurate.
As an optional embodiment, processing the text data in the multi-view emotion analysis model, and predicting emotion information included in the text data includes: acquiring a semantic vector of the text data, an entity vector corresponding to the entity information and an attribute vector corresponding to the attribute information based on the text data; updating the entity vector through the semantic vector to generate a first text vector; updating the attribute vector through the semantic vector to generate a second text vector; and after the first text vector and the second text vector are combined, determining the emotion information of the text data according to a combined result through a classifier.
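To make the four steps above concrete, the sketch below strings them together with toy NumPy stand-ins; every function body here is a deliberate simplification of the modules described later in this embodiment, not the patent's actual implementation.
```python
# High-level sketch of the optional embodiment above, with toy NumPy stand-ins
# for encoding, updating, and classification.
import numpy as np

def encode(text, entity, attribute, dim=100):
    rng = np.random.default_rng(0)
    semantic = rng.normal(size=dim)        # semantic vector of the text data
    entity_vec = rng.normal(size=dim)      # entity vector
    attribute_vec = rng.normal(size=dim)   # attribute vector
    return semantic, entity_vec, attribute_vec

def update(vec, semantic):
    # Simplest possible update: superimpose the semantic vector on the input vector.
    return vec + semantic

def classify(combined):
    logits = combined[:3]                  # placeholder scoring
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return ["positive", "negative", "neutral"][int(np.argmax(probs))]

semantic, entity_vec, attribute_vec = encode("nike shoes are comfortable", "nike", "comfort")
first_text_vec = update(entity_vec, semantic)       # entity vector updated by semantic vector
second_text_vec = update(attribute_vec, semantic)   # attribute vector updated by semantic vector
combined = np.concatenate([first_text_vec, second_text_vec])
print(classify(combined))
```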
After the text data, the entities in the text data, and the preset attributes are determined, the corresponding vectors can be obtained by vectorizing each of them.
The semantic vector in this scheme can have a fixed length, and a vector containing semantic information can be extracted from the sentence vector of the text data. When the text data includes only one entity, the vector obtained by vectorizing that entity is the word vector; when the text data includes multiple entities, the word vector of the text data can be obtained by linearly combining the vectorized results of the entities. Preset attributes are processed in the same way: when there is only one attribute, the vector obtained by vectorizing the attribute is the attribute vector; when there are multiple preset attributes, the vectorized results of the attributes can be linearly combined to obtain the attribute vector of the text data.
In this scheme, the entity vector is updated with the semantic vector, so that the word-vector representation is more abstract and more accurate; analyzing the tendency of the text data with the first text vector therefore yields a more accurate result.
In an alternative embodiment, the semantic vector and the entity vector may be directly superimposed to generate an entity vector carrying context information.
In another alternative embodiment, a preset fusion component can assign different weights to the semantic vector and the entity vector according to their importance, and the first text vector is then generated by weighted combination.
The attribute vector is likewise updated with the semantic vector, so that the word-vector representation is more abstract and more accurate; analyzing the tendency of the text data with the second text vector therefore yields a more accurate result. Similarly, there are many ways to update the attribute vector with the semantic vector, for example: directly superimposing the semantic vector and the attribute vector, or fusing them with a preset fusion component.
In the above step, combining the first text vector and the second text vector may mean directly splicing the two text vectors; for example, combining a 100-dimensional first text vector and a 100-dimensional second text vector gives a 200-dimensional vector. The classifier may be a classifier formed by a softmax function.
The emotion information can represent the user's attitude toward a certain attribute of the brand or product represented by the entity, thereby revealing shortcomings of the brand or product and helping the merchant improve it. Emotion information can include positive, negative, and neutral. Taking the softmax classifier as an example, softmax outputs the relative probabilities that the text belongs to the different emotion classes, and the emotion class with the largest relative probability is finally determined to be the emotion information of the text data.
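A minimal sketch of that softmax step; the scores are made up, and only the mechanics of choosing the highest-probability emotion class are illustrated.
```python
# Sketch of the softmax classification step: relative probabilities over the three
# emotion classes, with the highest-probability class taken as the result.
import numpy as np

def softmax(scores):
    exp = np.exp(scores - np.max(scores))   # subtract the max for numerical stability
    return exp / exp.sum()

classes = ["positive", "negative", "neutral"]
scores = np.array([2.1, 0.3, 0.6])          # made-up classifier scores
probs = softmax(scores)
print(dict(zip(classes, probs.round(3))))
print("predicted emotion:", classes[int(np.argmax(probs))])
```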
In an optional embodiment, the steps in the above embodiment may be performed by the multi-view emotion analysis model: the model obtains the entity vector and the attribute vector, superimposes the semantic vector of the text data on the entity vector and the attribute vector to update them, and predicts the emotion information of the text according to the updated entity word vector and attribute word vector.
In a more specific embodiment, an e-commerce website is taken as an example. The emotion information analysis task in this example is to analyze users' emotion information about XX brand clothing, and the analyzed content includes: the color of the garment, the elasticity of the garment, the tailoring of the garment, the design of the garment, the comfort of the garment, the degree of deformation of the garment, and the size of the garment. All comments on the XX brand clothing on the e-commerce website are collected, and each comment is treated as text data for emotion analysis. In the analysis process, the object in each comment is the entity of the text, the analyzed contents are the preset attributes, and the comment text, the entities in the comment, and the preset attributes form triple data. The triple data is input into the multi-view emotion analysis model, which performs emotion analysis on the comment in the manner provided in steps S21-S25, and the emotion information of each comment on each attribute is obtained.
It should be noted that the emotion analysis result may include positive, negative, and neutral. A single comment on the e-commerce website can hardly cover all of the analyzed contents, that is, all of the preset attributes; therefore, for attributes not mentioned in the text, the output emotion information may be "neutral".
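Purely as an illustration of the per-attribute result this walkthrough produces, with "neutral" as the default for unmentioned attributes; the keyword rules and the data layout below are assumptions, not the model itself.
```python
# Illustrative result layout: each comment receives one emotion label per preset
# attribute, defaulting to "neutral" for attributes the comment never mentions.
# The keyword rules stand in for the multi-view emotion analysis model.
ATTRIBUTES = ["color", "elasticity", "tailoring", "design",
              "comfort", "deformation", "size"]

def analyse(comment):
    result = {attribute: "neutral" for attribute in ATTRIBUTES}
    if "fade" in comment:
        result["color"] = "negative"
    if "comfortable" in comment:
        result["comfort"] = "positive"
    return result

print(analyse("XX brand clothes fade easily"))
```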
As an optional embodiment, acquiring a semantic vector of text data, an entity vector corresponding to entity information, and an attribute vector corresponding to attribute information based on the text data includes: acquiring triple information of text data, wherein the triple information comprises: the text content of the text data, entity words representing entity information and attribute words representing preset attributes; vectorizing the triple information to obtain statement vectors of text data, entity word vectors of entity words and attribute word vectors of preset attributes; carrying out linear combination coding on the entity word vectors to obtain entity vectors; carrying out linear combination coding on the attribute word vectors to obtain attribute vectors; and generating a semantic vector according to the statement vector, the entity vector and the attribute vector.
Specifically, the text content of the text data can be obtained directly from comments on a website, and the entity words in the text data can be extracted with the Stanford natural language processing tool set. The text content, entity words, and attribute words are vectorized and input into the emotion analysis model to obtain the semantic vector of the text data, the entity vector representing the entities in the text data, and the attribute vector representing the preset attributes.
Specifically, the entity word vector may be a vector obtained by directly vectorizing an entity word, for example with word2vec, and the attribute word vector is likewise a vector obtained by directly vectorizing a preset attribute, for example with word2vec. The entity vector is obtained by linear combination coding of the entity word vectors, and the attribute vector is obtained in the same way by linear combination coding of the attribute word vectors. Through linear combination coding, entity words or attribute words of different word counts are combined into vectors of fixed dimension, which facilitates subsequent computation.
Fig. 4 is a schematic diagram of an alternative multi-view emotion analysis model according to embodiment 1 of the present application. With reference to fig. 4, the entity memory coding module is configured to perform linear combination coding on the entity word vectors to obtain the entity vector: the entity words are vectorized to obtain the entity word vectors, and the entity memory coding module linearly combines them to obtain the entity vector.
Still referring to fig. 4, the aspect memory coding module is configured to perform linear combination coding on the attribute word vectors to obtain the attribute vector: the attribute words are vectorized to obtain the attribute word vectors, and the aspect memory coding module linearly combines them to obtain the attribute vector.
As shown in fig. 4, the sentence vector, the entity vector output by the entity memory coding module, and the attribute vector output by the aspect memory coding module are processed by the context memory coding module to obtain the semantic vector of the text data.
As an alternative embodiment, the method further includes: preprocessing the text data, wherein the preprocessing comprises any one or more of the following items: word segmentation processing, root reduction and noise elimination.
Specifically, the above steps may be executed by the data acquisition and preprocessing module, the data acquisition module directly acquires the comments as the text information, and the preprocessing module preprocesses the text information.
Word segmentation divides the text data into individual words; the segmentation can be performed with a bigram segmentation method. Root reduction reduces inflected words to their roots. Noise elimination removes noise tokens such as special characters and special symbols from the segmentation results.
In an optional embodiment, word segmentation may be performed on the text information first, followed by root reduction and noise elimination, thereby obtaining the text content.
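A rough preprocessing sketch in that spirit; the character-bigram cut (most natural for Chinese comment text), the trivial stemming rule, and the noise list are all simplified stand-ins for whatever the data preprocessing module actually uses.
```python
# Rough preprocessing sketch: bigram-style segmentation, a trivial root-reduction
# rule, and noise elimination. All three steps are simplified stand-ins.
import re

NOISE = {"@", "#", "$", "~"}

def segment_bigrams(text):
    # Toy character-bigram segmentation, in the spirit of bigram word cutting.
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + 2]) for i in range(len(chars) - 1)]

def reduce_root(token):
    # Trivial stemming rule, purely illustrative.
    return token[:-1] if token.endswith("s") else token

def remove_noise(tokens):
    return [t for t in tokens if t not in NOISE and not re.fullmatch(r"\W+", t)]

def preprocess(text):
    tokens = segment_bigrams(text)
    tokens = [reduce_root(t) for t in tokens]
    return remove_noise(tokens)

print(preprocess("nike shoes are comfortable!"))
```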
As an alternative embodiment, when the entity information includes a plurality of entity words, performing linear combination coding on the entity word vectors to obtain the entity vector includes: obtaining the average value of the elements of the multiple entity word vectors at the same position; and determining the average value at each position as the element of the entity vector at that position, thereby obtaining the entity vector from the elements at all positions.
In a possible case, the entity word may include a plurality of words, and in order to use the entity word in a vector representation with a fixed dimension, the plurality of words of the entity need to be subjected to linear combination coding.
In the above scheme, the position in a vector refers to the row and column index in the vector. In an alternative embodiment, the average of the elements in the first row and first column of the two vectors is taken as the element in the first row and first column of the entity vector, the average of the elements in the first row and second column is taken as the element in the first row and second column of the entity vector, and so on, thereby obtaining the entity vector.
Here, it should be noted that the attribute words may be a plurality of words, and when the attribute words are a plurality of words, the attribute words need to be linearly combined and encoded, and the manner of linearly combining and encoding the attribute words may be the same as the manner of linearly combining and encoding the entity words.
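A small sketch of the linear combination coding just described, taking the element-wise average of several word vectors; the same routine can serve for multi-word attributes, and the example vectors are arbitrary.
```python
# Element-wise average of word vectors: the value at each position of the entity
# (or attribute) vector is the mean of the values at that position across the
# individual word vectors, giving a fixed-dimension representation.
import numpy as np

def combine_word_vectors(word_vectors):
    stacked = np.stack(word_vectors)       # shape: (num_words, dim)
    return stacked.mean(axis=0)            # average per position

# "nike" + "sports shoes" as two arbitrary 4-dimensional word vectors.
v_nike = np.array([0.2, 0.4, -0.1, 0.0])
v_shoes = np.array([0.6, 0.0, 0.3, -0.2])
entity_vector = combine_word_vectors([v_nike, v_shoes])
print(entity_vector)                       # [ 0.4  0.2  0.1 -0.1]
```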
As an alternative embodiment, generating the semantic vector from the sentence vector, the entity vector, and the attribute vector includes: connecting the sentence vector, the entity vector, and the attribute vector to obtain a connection result; acquiring a first position weight of each context word of the entity information relative to the entity information and a second position weight of each context word of the attribute information relative to the attribute information, and weighting the connection result with the first position weight and the second position weight to obtain a weighted result, where the first position weight is determined according to the distance between a context word and the entity information, and the second position weight is determined according to the distance between a context word and the attribute information; encoding the weighted result through a long short-term memory (LSTM) network layer to obtain the hidden state vector of each vector; and determining the hidden state vectors as the semantic vector.
In the above scheme, the sentence vectors, the entity vectors and the attribute vectors are connected, and the sentence vectors, the entity vectors and the attribute vectors may be directly spliced in a predetermined order.
Specifically, the entity information may be entity words in the text data, the attribute information may be attribute words corresponding to preset attributes, and the vector output by the interaction layer is weighted by using the first position weight and the second position weight, so as to introduce the position information into the semantic vector.
In an alternative embodiment, the weights corresponding to different distances may be set manually. Each vector output by the interaction layer represents a word in the text data. Taking one of the vectors as an example, the distance between the word and the entity information is obtained, and the weight corresponding to that distance is determined to be the first position weight of the word; the distance between the word and the attribute information is then obtained, and the weight corresponding to that distance is determined to be the second position weight of the word. The vector corresponding to the word is multiplied by the first position weight and the second position weight to weight it. Finally, the weighted vectors are processed by the LSTM layer to obtain the hidden-vector representations.
In an alternative embodiment, as shown in fig. 4, the above steps can be performed by the context memory coding module in the multi-view emotion analysis model. The context memory coding module includes an interaction layer, a position attention layer, and an LSTM layer. The interaction layer connects the sentence vector, the entity vector, and the attribute vector into a joint representation. The position attention layer determines weights according to the distance between the context words in the text data and the entity and the attribute, and uses the determined weights to weight the output of the interaction layer. The LSTM layer encodes the output of the position attention layer with a unidirectional single-layer LSTM, thereby obtaining a series of hidden-state representations h1, h2, ..., hl, where h1 represents the semantics of the first word, h2 represents the semantics of the first and second words, and so on.
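A compact PyTorch sketch of a context-coding step of this kind; the reciprocal weighting formula, the dimensions, and the way the entity and attribute vectors are attached to each word are assumptions, since the text above does not spell them out.
```python
# Sketch of a context memory coding step: connect each word vector with the entity
# and attribute vectors, weight it by its position relative to the entity and the
# attribute, then encode the sequence with a unidirectional single-layer LSTM.
# The 1/(1+distance) weighting and all dimensions are assumptions.
import torch
import torch.nn as nn

dim, seq_len = 16, 6
entity_pos, attribute_pos = 1, 4                    # word indices of entity / attribute

words = torch.randn(seq_len, dim)                   # word vectors of the sentence
entity_vec = torch.randn(dim)
attribute_vec = torch.randn(dim)

# Interaction layer: connect each word with the entity and attribute vectors.
joint = torch.cat(
    [words, entity_vec.expand(seq_len, dim), attribute_vec.expand(seq_len, dim)],
    dim=-1,
)                                                   # (seq_len, 3 * dim)

# Position attention: weights inversely proportional to distance.
positions = torch.arange(seq_len, dtype=torch.float32)
w_entity = 1.0 / (1.0 + (positions - entity_pos).abs())
w_attribute = 1.0 / (1.0 + (positions - attribute_pos).abs())
weighted = joint * (w_entity * w_attribute).unsqueeze(-1)

# Unidirectional single-layer LSTM producing hidden states h1 ... hl.
lstm = nn.LSTM(input_size=3 * dim, hidden_size=dim, num_layers=1)
hidden_states, _ = lstm(weighted.unsqueeze(1))      # (seq_len, batch=1, dim)
semantic = hidden_states.squeeze(1)                 # hidden-state representations
print(semantic.shape)                               # torch.Size([6, 16])
```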
As an alternative embodiment, obtaining the first position weight of a context word of an entity word relative to the entity word includes: obtaining the distance between the context word of the entity information and the entity information; and determining the first position weight according to that distance, where the distance and the first position weight are inversely proportional.
In a product comment, a word closer to the entity can generally express the semantics of the text data better than a word farther from it, so the distance and the first position weight can be set to be inversely proportional.
By introducing the position information of words, this scheme determines the importance of a word in the text data according to its position, so that the obtained semantic vector is more accurate.
The second position weight is obtained in the same way as the first position weight: the distance between a context word of the attribute information and the attribute information is obtained, and the second position weight is determined according to that distance, where the distance and the second position weight are inversely proportional.
As an alternative embodiment, generating the first text vector includes: acquiring a fusion component and a preset number of iteration layers, where the fusion component is used to fuse the entity vector and the semantic vector; obtaining the entity semantic vector of the current layer through an attention mechanism according to the entity vector and the semantic vector of the current layer; fusing the entity semantic vector of the current layer and the entity vector of the current layer with the fusion component to obtain the first text vector of the current layer; and detecting whether the current layer has reached the preset number of iteration layers: if so, the first text vector of the current layer is determined to be the first text vector; otherwise, the first text vector of the current layer is passed on as the entity vector of the next layer to the fusion component of the next layer.
The scheme adopts a multilayer iteration mode to update the entity vector.
In an alternative embodiment, as shown in fig. 4, hop1, hop2, and hop3 in the entity memory updating module are the iteration layers described above, and each iteration layer includes a fusion component G, which performs the fusion operation on the entity semantic vector and the entity vector of that layer. First, in hop1, the attention mechanism operates on the entity vector VECe and the hidden state vectors (h1, h2, ..., hn) to obtain the entity semantic vector of hop1; the G component in hop1 then fuses the entity vector and the entity semantic vector to generate the VECe updated by hop1, which is input to hop2.
In hop2, in the same way, the entity semantic vector of hop2 is computed from the input VECe and the hidden state vectors (h1, h2, ..., hn), and the G component in hop2 fuses the entity vector and the entity semantic vector to generate the VECe updated by hop2. The VECe updated by hop2 is input to hop3, and after hop3 performs the same operations, the entity vector updated by the entity memory updating module is output.
The second text vector can be generated in the same way as the first text vector: the attribute semantic vector of the current layer is obtained through the attention mechanism according to the attribute vector and the semantic vector of the current layer; the fusion component fuses the attribute semantic vector of the current layer and the attribute vector of the current layer to obtain the second text vector of the current layer; and it is detected whether the current layer has reached the preset number of iteration layers: if so, the second text vector of the current layer is determined to be the second text vector; otherwise, the second text vector of the current layer is passed on as the attribute vector of the next layer to the fusion component of the next layer.
Still referring to fig. 4, hop1, hop2, and hop3 in the aspect memory updating module are the iteration layers described above, and each iteration layer includes a fusion component G, which may be the same as the G in the entity memory updating module and which performs the fusion operation on the attribute semantic vector and the attribute vector of each layer. First, in hop1, the attention mechanism operates on the attribute vector VECa and the hidden state vectors (h1, h2, ..., hn) to obtain the attribute semantic vector of hop1; the G component in hop1 then fuses the attribute vector and the attribute semantic vector to generate the VECa updated by hop1, which is input to hop2.
In hop2, in the same way, the attribute semantic vector of hop2 is computed from the input VECa and the hidden state vectors (h1, h2, ..., hn), and the G component in hop2 fuses the attribute vector and the attribute semantic vector to generate the VECa updated by hop2. The VECa updated by hop2 is input to hop3, and after hop3 performs the same operations, the attribute vector updated by the aspect memory updating module is output.
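A NumPy sketch of this three-hop updating loop, with plain dot-product attention and a simple averaging stand-in for the fusion component G (the real G is the gated fusion described in the following paragraphs); all dimensions are arbitrary.
```python
# Sketch of the hop-based memory update: at each hop, attention over the hidden
# states (h1 ... hn) produces a semantic vector, which is fused with the current
# memory vector (VECe or VECa) and passed to the next hop. The dot-product
# attention and the averaging fusion stand-in are assumptions.
import numpy as np

def attention(query, hidden_states):
    scores = hidden_states @ query                 # one score per hidden state
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ hidden_states                 # weighted sum of h1 ... hn

def simple_fuse(memory, semantic):
    return 0.5 * memory + 0.5 * semantic           # placeholder for component G

def memory_update(vec, hidden_states, hops=3):
    for _ in range(hops):                          # hop1, hop2, hop3
        hop_semantic = attention(vec, hidden_states)
        vec = simple_fuse(vec, hop_semantic)       # updated VEC fed to the next hop
    return vec

rng = np.random.default_rng(0)
hidden = rng.normal(size=(6, 16))                  # h1 ... h6
vec_e = rng.normal(size=16)                        # entity vector VECe
print(memory_update(vec_e, hidden).shape)          # (16,)
```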
As an optional embodiment, fusing the entity semantic vector of the current layer and the entity vector of the current layer with the fusion component to obtain the first text vector of the current layer includes: inputting the entity semantic vector of the current layer into a first hyperbolic tangent function to obtain a first operation result, and inputting the entity vector of the current layer into a second hyperbolic tangent function to obtain a second operation result; inputting the entity semantic vector of the current layer and the entity vector of the current layer together into a logistic (sigmoid) function to obtain a third operation result; weighting the first operation result by the third operation result to obtain a fourth operation result; weighting the second operation result by (1 minus the third operation result) to obtain a fifth operation result; and summing the fourth operation result and the fifth operation result to obtain the first text vector of the current layer.
In an alternative embodiment, referring to fig. 4, the tanh in fig. 4 is the hyperbolic tangent function and the δ in fig. 4 is the logistic (sigmoid) function. In the G component, the semantic vector of the current layer is input to the tanh on the left (the first hyperbolic tangent function) and to the δ function in the middle, and the entity vector of the current layer is input to the tanh on the right (the second hyperbolic tangent function) and to the δ function in the middle. The tanh on the left produces the first operation result from the semantic vector of the current layer, the tanh on the right produces the second operation result from the entity vector of the current layer, and the δ function produces the third operation result from the semantic vector and the entity vector of the current layer.
The third operation result computed by the δ function is used as the weight of the first operation result, and (1 minus the third operation result) is used as the weight of the second operation result; the weighted results are summed to obtain the first text vector of the current layer.
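A direct sketch of that gate: two tanh branches and a sigmoid (δ) gate whose output weighs one branch and whose complement weighs the other. The linear projections used below are an assumption, since the exact parameterisation is not given above.
```python
# Sketch of the fusion component G: gate = sigmoid(W_g [sem; ent]) blends
# tanh(W_s sem) (first operation result) with tanh(W_e ent) (second operation
# result) as gate * first + (1 - gate) * second. The linear projections W_s,
# W_e, W_g are assumed; only the tanh / sigmoid gating structure is described above.
import numpy as np

rng = np.random.default_rng(0)
dim = 16
W_s = rng.normal(scale=0.1, size=(dim, dim))
W_e = rng.normal(scale=0.1, size=(dim, dim))
W_g = rng.normal(scale=0.1, size=(dim, 2 * dim))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(semantic_vec, entity_vec):
    first = np.tanh(W_s @ semantic_vec)                                # first operation result
    second = np.tanh(W_e @ entity_vec)                                 # second operation result
    gate = sigmoid(W_g @ np.concatenate([semantic_vec, entity_vec]))   # third operation result
    fourth = gate * first                                              # fourth operation result
    fifth = (1.0 - gate) * second                                      # fifth operation result
    return fourth + fifth                                              # text vector of this layer

print(fuse(rng.normal(size=dim), rng.normal(size=dim)).shape)          # (16,)
```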
As an optional embodiment, combining the first text vector and the second text vector and then determining the emotion information of the text data from the combined result through a classifier includes: splicing the first text vector and the second text vector to obtain a splicing result; processing the splicing result through a fully connected layer; and determining, through the classifier, the emotion information of the text data according to the processing result of the fully connected layer.
Specifically, the classifier may be a softmax classifier. In this step, the first text vector is the update result of the entity vector and the second text vector is the update result of the attribute vector. The first text vector and the second text vector are concatenated, and the resulting splicing result contains both the entity information and the attribute information of the original text data; the fully connected layer and the classifier then operate on this splicing result, so the obtained emotion information necessarily reflects emotional tendency along the two dimensions of entity and attribute.
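A short PyTorch sketch of this final stage, reusing the 100-dimensional text vectors from the earlier example; the layer sizes are illustrative.
```python
# Final classification stage: splice the two text vectors, pass the result through
# a fully connected layer, and apply softmax over the three emotion classes.
import torch
import torch.nn as nn

first_text_vec = torch.randn(100)        # updated entity representation
second_text_vec = torch.randn(100)       # updated attribute representation

spliced = torch.cat([first_text_vec, second_text_vec])    # 200-dimensional splicing result
fc = nn.Linear(200, 3)                                     # fully connected layer
probs = torch.softmax(fc(spliced), dim=-1)                 # positive / negative / neutral

classes = ["positive", "negative", "neutral"]
print(classes[int(torch.argmax(probs))])
```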
As an optional embodiment, the text data includes a comment text of the object in the preset website on the product.
In an optional embodiment, the preset website may be a shopping website, the object may be a user of the shopping website, the user of the shopping website reviews the product on the shopping website after purchasing the product, and the text data may be review text generated when the user reviews the product.
In another optional embodiment, the preset website may also be a review website, the object may also be a user of the review website, and the text data may be a review text generated by the user reviewing the product at the review website.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
According to an embodiment of the present invention, an embodiment of a text analysis method is further provided. Fig. 5 is a flowchart of a text analysis method according to embodiment 2 of the present application; as shown in fig. 5, the method includes the following steps:
step S51, displaying the acquired text data, wherein the text data includes entity information and attribute information of at least one product.
Specifically, the text data may be comments from forums, shopping websites, or e-commerce platforms. For example, the text data may be users' evaluations of entities on a shopping website, users' evaluations of cars in a car forum, or users' evaluations of restaurants on a review website.
The entity information in the text data represents the name or brand of the product in the text data, i.e., the object that the text data evaluates; the attribute information in the text data evaluates a characteristic of the product, or of the product's brand, on a certain attribute.
In an optional embodiment, taking an e-commerce platform scenario as an example, the text data may be users' comments on entities in the platform, and in a comment the subject of the comment may be the entity of the text. For example, for the comment "this pair of nike sports shoes is very comfortable", the entity in the text is "nike" or "nike sports shoes", and the attribute information is "very comfortable".
Step S53, outputting the emotion information included in the text data, where the multi-view emotion analysis model is used to predict the emotion information of the entity on a preset attribute according to the entity information and the attribute information, and the emotion information represents the emotional tendency expressed by the text data.
Specifically, the preset attribute may be a pre-specified attribute: in the process of analyzing the text data, the user sets the attributes on which the emotional tendency of the entity needs to be obtained. The emotion information is a representation of emotional tendency and, optionally, may be output as one of three categories: positive, negative, and neutral.
When the text data is a comment on a product, the object generating the text data may be the user who submits the comment on the shopping website; the emotional tendency information of the user toward one or more attributes of the product is obtained by analyzing the emotional tendency of the comment the user submitted.
In an alternative embodiment, still taking the text data "this type of nike sports shoes is very comfortable" as an example, the entities in this text are "nike" and "nike sports shoes", and the attribute information is "very comfortable". The multi-view emotion analysis model analyzes the emotional tendencies of "nike" and "nike sports shoes" on the preset attributes based on the entity information and attribute information of the text data. For example, if users' emotion information about the comfort of nike sports shoes needs to be obtained from the comments, the attribute can be set to "comfort"; if users' emotion information about the breathability of nike sports shoes also needs to be obtained, the attribute can be set to "breathability"; and so on, various kinds of emotion information of users on different attributes can be obtained.
The multi-view emotion analysis model can be implemented by a neural network model and is specifically used for analyzing the emotional tendency of text data from multiple dimensions. In an optional embodiment, the multi-view emotion analysis model analyzes the emotional tendency of the text data from two dimensions, namely entity and attribute, so that the obtained emotional tendency is neither specific to an entity alone nor specific to an attribute alone, but to the entity on a certain attribute. The obtained emotional tendency is therefore more accurate, more comprehensive and more targeted, and more instructive to the manufacturer of the entity.
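For ease of understanding, the following is a minimal interface sketch (not the claimed model itself) showing that a multi-view prediction is conditioned on an (entity, attribute) pair rather than on the text alone; the model object and its predict() signature are assumptions used only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Opinion:
    entity: str
    attribute: str
    sentiment: str  # "positive", "negative" or "neutral"

def predict_multi_view(model, text: str, entity: str, attribute: str) -> Opinion:
    """Ask the multi-view model for the sentiment of `entity` on the preset `attribute`."""
    label = model.predict(text=text, entity=entity, attribute=attribute)  # assumed interface
    return Opinion(entity, attribute, label)

# The same review can be queried under different preset attributes, e.g. "comfort"
# and "breathability", yielding one emotion label per (entity, attribute) pair.
```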
After the text data is processed by the multi-view emotion analysis model, the processing result is output, where the emotion information included in the text data is used for expressing the emotional tendency of a product in the text data on a preset attribute.
In an alternative embodiment, for the text data "nike sports shoes are comfortable", the obtained emotion information is that the entity "nike sports shoes" has a "positive" tendency on the attribute "comfort"; taking the text "XX brand clothes are easy to fade" as an example, the obtained emotion information may be that the entity "XX brand clothes" has a "negative" tendency on the attribute "color"; still taking "XX brand clothes are easy to fade" as an example, the obtained emotion information may also be that the entity "XX brand clothes" has a "neutral" tendency on the attribute "comfort".
It should be noted that, in the prior art, when performing emotion analysis on a comment text, only a single-dimensional analysis result can be output. For example, for the text data "nike sports shoes are comfortable", either only the entity can be analyzed, the result being the emotional tendency of the text data toward the entity "nike" or "nike sports shoes", or only the attribute "very comfortable" can be analyzed, the result being the emotional tendency of the text data on that attribute. However, the text data to be analyzed has diverse characteristics, especially product comments. If only the user's emotional tendency toward the product is obtained, the result is too coarse, and it remains unclear in which aspect (i.e., on which attribute) the product was actually evaluated as "positive" or "negative"; if only the evaluation of the text data on a certain attribute is obtained, it remains unclear which product is being evaluated. The accuracy of a single-dimensional analysis result is therefore poor, and it can hardly provide accurate guidance for improving the product.
In the embodiment of the application, the attribute information to be known can be preset, and the multi-view emotion analysis model is adopted, so that the emotion information of the product on the preset attribute is predicted based on the entity information and the attribute information of the text data, thereby realizing the multi-dimensional analysis of the text data without adding a processing model, and improving the accuracy of the text analysis.
Therefore, the embodiment of the application solves the technical problem that in the prior art, the semantic analysis result of the text is inaccurate because the text can be subjected to the semantic analysis only from a single dimension.
As an alternative embodiment, the method further includes: receiving model parameters of a multi-view emotion analysis model; and displaying the model parameters of the multi-view emotion analysis model.
Specifically, the model parameters may include parameters such as the initialization mode of the multi-view emotion analysis model, the learning rate during training, and the depth of the computation layers.
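For illustration only, such model parameters might be grouped as in the following sketch; the field names and default values are assumptions, not the exact parameter set of the multi-view emotion analysis model.

```python
from dataclasses import dataclass

@dataclass
class ModelConfig:
    init_mode: str = "xavier_uniform"   # parameter initialization mode
    learning_rate: float = 1e-3         # learning rate during training
    num_fusion_layers: int = 3          # depth of the computation (iteration) layers
    hidden_size: int = 128              # LSTM hidden state size

config = ModelConfig()  # values a background administrator could edit and display
```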
The above text analysis method can be implemented by a multi-view emotion analysis system that runs the multi-view emotion analysis model, and the multi-view emotion analysis system provides at least two types of interactive interfaces. Fig. 6 is a schematic diagram of a display interface of the multi-view emotion analysis system according to embodiment 2 of the application. As shown in fig. 6, one type of interface is a product comment display interface presented to the user; when product comments are displayed to the user, they can be displayed by product name or by attribute name, thereby providing a multi-dimensional comment display mode. The other type is a background management interface presented to background administrators, which may include a model training setting interface for the background administrators to set model parameters, and a system management interface for the background administrators to restrict user permissions and avoid leakage of sensitive information.
Example 3
According to an embodiment of the present invention, there is also provided a text analysis apparatus for implementing the text analysis method in embodiment 1, and fig. 7 is a schematic diagram of a text analysis apparatus according to embodiment 3 of the present application, as shown in fig. 7, the apparatus 700 includes:
an obtaining module 702 is configured to obtain text data, where the text data includes entity information and attribute information of at least one product.
And the predicting module 704 is used for processing the text data based on the multi-view emotion analysis model and predicting emotion information included in the text data, wherein the multi-view emotion analysis model is used for predicting emotion information of a product on a preset attribute according to the entity information and the attribute information, and the emotion information is used for representing emotion tendency information represented by the text data.
And an output module 706, configured to output emotion information included in the text data.
It should be noted here that the obtaining module 702, the predicting module 704 and the outputting module 706 correspond to steps S21 to S25 in embodiment 1; the three modules are the same as the corresponding steps in terms of implementation examples and application scenarios, but are not limited to the disclosure of embodiment 1. It should be noted that the above modules, as part of the apparatus, may be run in the computer terminal 10 provided in the first embodiment.
As an alternative embodiment, the prediction module comprises: the first obtaining submodule is used for obtaining semantic vectors of the text data, entity vectors corresponding to the entity information and attribute vectors corresponding to the attribute information based on the text data; the first generation submodule is used for updating the entity vector through the semantic vector to generate a first text vector; the second generation submodule is used for updating the attribute vector through the semantic vector to generate a second text vector; and the combination submodule is used for determining the emotion information of the text data according to the combination result through the classifier after the first text vector and the second text vector are combined.
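For ease of understanding, the data flow through these sub-modules can be sketched as follows; encode_semantics, update_by_semantics and classify are placeholders for the sub-modules described above and below, not the patented implementation.

```python
def predict_emotion(sentence_vec, entity_vec, attribute_vec,
                    encode_semantics, update_by_semantics, classify):
    semantic = encode_semantics(sentence_vec, entity_vec, attribute_vec)  # semantic vector
    first_text_vec = update_by_semantics(entity_vec, semantic)            # entity view
    second_text_vec = update_by_semantics(attribute_vec, semantic)        # attribute view
    return classify(first_text_vec, second_text_vec)                      # combine and classify
```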
As an alternative embodiment, the first obtaining sub-module includes: a first obtaining unit, configured to obtain triple information of text data, where the triple information includes: the text content of the text data, entity words representing entity information and attribute words representing preset attributes; the processing unit is used for vectorizing the triple information to obtain statement vectors of text data, entity word vectors of entity words and attribute word vectors of preset attributes; the first coding unit is used for carrying out linear combination coding on the entity word vectors to obtain entity vectors; the second coding unit is used for carrying out linear combination coding on the attribute word vectors to obtain attribute vectors; and the generating unit is used for generating a semantic vector according to the statement vector, the entity vector and the attribute vector.
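A minimal sketch of the vectorization step is given below; the embedding table and the tokenization of the triple are assumptions made only for illustration.

```python
import numpy as np

def lookup(words, embeddings, dim=300):
    """Map each word to its embedding vector; unknown words fall back to zero vectors."""
    return np.stack([embeddings.get(w, np.zeros(dim)) for w in words])

# statement_vectors      = lookup(sentence_words, embeddings)    # one vector per sentence word
# entity_word_vectors    = lookup(entity_words, embeddings)      # entity word vectors
# attribute_word_vectors = lookup(attribute_words, embeddings)   # attribute word vectors
```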
As an optional embodiment, the first obtaining sub-module further includes: the preprocessing unit is used for preprocessing the text data, wherein the preprocessing comprises any one or more of the following items: word segmentation processing, root reduction and noise elimination.
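A toy preprocessing sketch is shown below; the punctuation filter and the English Porter stemmer are stand-ins (assumptions) for the noise elimination and root reduction steps, and a Chinese corpus would instead require a word segmenter.

```python
import re
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def preprocess(text: str):
    text = re.sub(r"[^\w\s]", " ", text)        # noise elimination: drop punctuation and symbols
    tokens = text.lower().split()               # word segmentation (whitespace as a stand-in)
    return [stemmer.stem(t) for t in tokens]    # root reduction
```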
As an alternative embodiment, in the case that the entity information includes a plurality of entity words, the first encoding unit includes: the first obtaining subunit is used for obtaining the average value of elements of the plurality of entity word vectors at the same position; and the determining subunit is used for determining the average value of each position as the element of the entity vector at each position, and obtaining the entity vector according to the element at each position.
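When several entity words are present, the entity vector can thus be formed by element-wise averaging, as in the following numpy sketch.

```python
import numpy as np

def encode_entity(entity_word_vectors):
    """entity_word_vectors: array-like of shape (num_entity_words, dim)."""
    return np.mean(np.asarray(entity_word_vectors), axis=0)  # average of the elements at each position
```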
As an alternative embodiment, the generating unit includes: the connection subunit is used for connecting the statement vector, the entity vector and the attribute vector to obtain a connection result; the second obtaining submodule is used for obtaining a first position weight of the context words of the entity information relative to the entity information and a second position weight of the context words of the attribute information relative to the attribute information, and performing weighted representation on the connection result by using the first position weight and the second position weight to obtain a weighted result, wherein the first position weight is determined according to the distance between the context words of the entity information and the entity information, and the second position weight is determined according to the distance between the context words of the attribute information and the attribute information; the coding subunit is used for coding the weighting result through the long-term and short-term memory network layer to obtain a hidden state vector of each vector; and the determining subunit is used for determining the hidden state vector as a semantic vector.
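The encoding step can be sketched with a standard LSTM layer as follows; the use of PyTorch and the vector sizes are illustrative assumptions, and the position weighting is assumed to have been applied already.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=300, hidden_size=128, batch_first=True)

def encode_semantics(weighted_inputs: torch.Tensor) -> torch.Tensor:
    """weighted_inputs: (batch, seq_len, 300), the weighted connection result."""
    hidden_states, _ = lstm(weighted_inputs)  # one hidden state vector per input vector
    return hidden_states                      # taken as the semantic vectors
```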
As an alternative embodiment, the second obtaining sub-module includes: a first acquisition unit, configured to obtain the distance between the context words of the entity information and the entity information; and a second acquisition unit, configured to determine the first position weight according to the distance between the context words of the entity information and the entity information, wherein the distance and the first position weight are in an inverse proportion relation.
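A possible position weight that decreases with the distance is sketched below; the exact form 1/(1+d) is an assumption consistent with the inverse-proportion relation stated above.

```python
def position_weight(word_index: int, entity_indices: list) -> float:
    d = min(abs(word_index - i) for i in entity_indices)  # distance to the nearest entity word
    return 1.0 / (1.0 + d)                                 # closer context words receive larger weights
```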
As an alternative embodiment, the first generation submodule includes: a third obtaining unit, configured to obtain a fusion component and a preset number of iteration layers, where the fusion component is configured to fuse an entity vector and a semantic vector; the fourth acquisition unit is used for acquiring the entity semantic vector of the current layer through an attention mechanism according to the entity words and the semantic vector of the current layer; a fifth obtaining unit, configured to fuse, by a fusion component, the semantic vector of the current layer and the entity word vector of the current layer to obtain a first text vector of the current layer; and the detection unit is used for detecting whether the current layer reaches a preset iteration layer number, if so, determining that the first text vector of the current layer is the first text vector, and otherwise, continuously outputting the first text vector of the current layer to the next layer of fusion component as the entity vector of the next layer.
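The layer-by-layer update can be sketched as the following loop; attend stands for the attention mechanism and fuse for the fusion component (a sketch of which follows the next paragraph), both being placeholders.

```python
def build_first_text_vector(entity_vec, semantic_vectors, num_layers, attend, fuse):
    current = entity_vec
    for _ in range(num_layers):                              # preset number of iteration layers
        entity_semantic = attend(current, semantic_vectors)  # entity semantic vector of the current layer
        current = fuse(entity_semantic, current)             # first text vector of the current layer
    return current                                           # last layer's output is the first text vector
```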
As an alternative embodiment, the fifth obtaining unit includes: the second acquisition subunit is used for inputting the entity semantic vector of the current layer into a first hyperbolic tangent function to obtain a first operation result; the third obtaining subunit is configured to input the entity vector of the current layer into a second hyperbolic tangent function to obtain a second operation result; the fourth obtaining subunit is configured to input the entity semantic vector of the current layer and the entity vector of the current layer together into a logistic sigmoid function to obtain a third operation result; the fifth obtaining subunit is configured to weight the first operation result with the third operation result to obtain a fourth operation result; the sixth obtaining subunit is configured to subtract the third operation result from 1 and then weight it with the second operation result to obtain a fifth operation result; and the seventh obtaining subunit is configured to sum the fourth operation result and the fifth operation result to obtain the first text vector of the current layer.
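Reading the sigmoid output as a gate, the fusion can be sketched as follows; the weight matrices W1, W2, Wg and the bias bg are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(entity_semantic, entity_vec, W1, W2, Wg, bg):
    a = np.tanh(W1 @ entity_semantic)                                     # first operation result
    b = np.tanh(W2 @ entity_vec)                                          # second operation result
    g = sigmoid(Wg @ np.concatenate([entity_semantic, entity_vec]) + bg)  # third operation result (gate)
    return g * a + (1.0 - g) * b  # fourth + fifth operation results: the current layer's first text vector
```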
As an optional embodiment, the combining sub-module comprises a splicing unit, a full-connection processing unit and a determining unit, wherein the splicing unit is used for splicing the first text vector and the second text vector to obtain a splicing result; the full-connection processing unit is used for processing the splicing result through the full connection layer; and the determining unit is used for determining the emotion information of the text data according to the processing result of the full connection layer through the classifier.
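The combination step can be sketched in PyTorch as follows; the layer sizes and the use of softmax are illustrative assumptions.

```python
import torch
import torch.nn as nn

fc = nn.Linear(2 * 128, 64)    # full connection layer over the splicing result
classifier = nn.Linear(64, 3)  # positive / negative / neutral

def classify(first_text_vec: torch.Tensor, second_text_vec: torch.Tensor) -> torch.Tensor:
    combined = torch.cat([first_text_vec, second_text_vec], dim=-1)  # splicing result
    hidden = torch.relu(fc(combined))                                # processed by the full connection layer
    return torch.softmax(classifier(hidden), dim=-1)                 # emotion class probabilities
```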
Example 4
According to an embodiment of the present invention, there is further provided a text analysis apparatus for implementing the text analysis method according to embodiment 2, and fig. 8 is a schematic diagram of a text analysis apparatus according to embodiment 4 of the present application, and as shown in fig. 8, the apparatus 800 includes:
a display module 802, configured to display the obtained text data, where the text data includes entity information and attribute information of at least one product.
The output module 804 is configured to output emotion information included in the text data, where the multi-view emotion analysis model is configured to predict emotion information of the product on a preset attribute according to the entity information and the attribute information, and the emotion information is used to represent emotion tendentiousness information represented by the text data.
It should be noted here that the display module 802 and the output module 804 correspond to steps S51 to S53 in embodiment 2; the two modules are the same as the corresponding steps in terms of implementation examples and application scenarios, but are not limited to the disclosure of embodiment 2. It should be noted that the above modules, as part of the apparatus, may be run in the computer terminal 10 provided in the first embodiment.
Example 5
The embodiment of the invention can provide a computer terminal which can be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute program codes of the following steps in the text analysis method: acquiring text data, wherein the text data comprises entity information and attribute information of at least one product; processing the text data based on a multi-view emotion analysis model, and predicting emotion information included in the text data, wherein the multi-view emotion analysis model is used for predicting emotion information of a product on a preset attribute according to entity information and attribute information, and the emotion information is used for representing emotion tendency information represented by the text data; and outputting the emotion information included in the text data.
Alternatively, fig. 9 is a block diagram of a computer terminal according to embodiment 5 of the present application. As shown in fig. 9, the computer terminal a may include: one or more processors 902 (only one of which is shown), memory 904, and a peripherals interface 906.
The memory may be configured to store software programs and modules, such as program instructions/modules corresponding to the text analysis method and apparatus in the embodiments of the present invention, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, so as to implement the text analysis method. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located from the processor, and these remote memories may be connected to terminal a through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: acquiring text data, wherein the text data comprises entity information and attribute information of at least one product; processing the text data based on a multi-view emotion analysis model, and predicting emotion information included in the text data, wherein the multi-view emotion analysis model is used for predicting emotion information of a product on a preset attribute according to entity information and attribute information, and the emotion information is used for representing emotion tendency information represented by the text data; and outputting the emotion information included in the text data.
Optionally, the processor may further execute the program code of the following steps: acquiring a semantic vector of the text data, an entity vector corresponding to the entity information and an attribute vector corresponding to the attribute information based on the text data; updating the entity vector through the semantic vector to generate a first text vector; updating the attribute vector through the semantic vector to generate a second text vector; and after the first text vector and the second text vector are combined, determining the emotion information of the text data according to a combined result through a classifier.
Optionally, the processor may further execute the program code of the following steps: acquiring triple information of text data, wherein the triple information comprises: the text content of the text data, entity words representing entity information and attribute words representing preset attributes; vectorizing the triple information to obtain statement vectors of text data, entity word vectors of entity words and attribute word vectors of preset attributes; carrying out linear combination coding on the entity word vectors to obtain entity vectors; carrying out linear combination coding on the attribute word vectors to obtain attribute vectors; and generating a semantic vector according to the statement vector, the entity vector and the attribute vector.
Optionally, the processor may further execute the program code of the following steps: preprocessing the text data, wherein the preprocessing comprises any one or more of the following items: word segmentation processing, root reduction and noise elimination.
Optionally, the processor may further execute the program code of the following steps: under the condition that the entity information comprises a plurality of entity words, obtaining the average value of elements of a plurality of entity word vectors at the same position; and determining the average value of each position as the element of the entity vector at each position, and obtaining the entity vector according to the element at each position.
Optionally, the processor may further execute the program code of the following steps: connecting the statement vector, the entity vector and the attribute vector to obtain a connection result; acquiring a first position weight of a context word of the entity information relative to the entity information and a second position weight of the context word of the attribute information relative to the attribute information, and performing weighting representation on a connection result by using the first position weight and the second position weight to obtain a weighting result, wherein the first position weight is determined according to the distance between the context word of the entity information and the entity information, and the second position weight is determined according to the distance between the context word of the attribute information and the attribute information; coding the weighting result through a long-term and short-term memory network layer to obtain a hidden state vector of each vector; and determining the hidden state vector as a semantic vector.
Optionally, the processor may further execute the program code of the following steps: obtaining the distance between the context words of the entity information and the entity information; and determining a first position weight according to the distance between the context words of the entity information and the entity information, wherein the distance and the first position weight are in an inverse proportion relation.
Optionally, the processor may further execute the program code of the following steps: acquiring a fusion component and a preset iteration layer number, wherein the fusion component is used for fusing an entity vector and a semantic vector; obtaining an entity semantic vector of the current layer through an attention mechanism according to the entity words and semantic vectors of the current layer; fusing the semantic vector of the current layer and the entity word vector of the current layer by a fusion component to obtain a first text vector of the current layer; and detecting whether the current layer reaches a preset iteration layer number, if so, determining that the first text vector of the current layer is a first text vector, and otherwise, continuously outputting the first text vector of the current layer to a next layer fusion component as an entity vector of the next layer.
Optionally, the processor may further execute the program code of the following steps: inputting the entity semantic vector of the current layer into a first hyperbolic tangent function to obtain a first operation result, and inputting the entity vector of the current layer into a second hyperbolic tangent function to obtain a second operation result; inputting the entity semantic vector of the current layer and the entity vector of the current layer together into a logistic sigmoid function to obtain a third operation result; weighting the first operation result with the third operation result to obtain a fourth operation result; after subtracting the third operation result from 1, weighting it with the second operation result to obtain a fifth operation result; and summing the fourth operation result and the fifth operation result to obtain the first text vector of the current layer.
Optionally, the processor may further execute the program code of the following steps: splicing the first text vector and the second text vector to obtain a splicing result; processing the splicing result through the full connecting layer; and determining the emotion information of the text data according to the processing result of the full connection layer through the classifier.
Optionally, the processor may further execute the program code of the following steps: the text data comprises comment texts of objects in the preset website on the products.
The embodiment of the invention provides a text analysis method. The attribute information needing to be known can be preset, a multi-view emotion analysis model is adopted, and emotion information of a product on the preset attribute is predicted based on entity information and attribute information of the text data, so that multi-dimensional analysis of the text data is achieved without adding a processing model, and the accuracy of the text analysis is improved. Therefore, the embodiment of the application solves the technical problem that in the prior art, the semantic analysis result of the text is inaccurate because the text can be subjected to the semantic analysis only from a single dimension.
It can be understood by those skilled in the art that the structure shown in fig. 9 is only an illustration, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 9 does not limit the structure of the above electronic device. For example, the computer terminal may also include more or fewer components (e.g., a network interface, a display device, etc.) than shown in fig. 9, or have a different configuration from that shown in fig. 9.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 6
The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store a program code executed by the text analysis method provided in the first embodiment.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring text data, wherein the text data comprises entity information and attribute information of at least one product; processing the text data based on a multi-view emotion analysis model, and predicting emotion information included in the text data, wherein the multi-view emotion analysis model is used for predicting emotion information of a product on a preset attribute according to entity information and attribute information, and the emotion information is used for representing emotion tendency information to be represented by the text data; and outputting the emotion information included in the text data.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be regarded as falling within the protection scope of the present invention.

Claims (16)

1. A method of analyzing text, comprising:
acquiring text data, wherein the text data comprises entity information and attribute information of at least one product;
processing the text data based on a multi-view emotion analysis model, and predicting emotion information included in the text data, wherein the multi-view emotion analysis model is used for predicting emotion information of the product on preset attributes according to the entity information and the attribute information, and the emotion information is used for representing emotion tendency information represented by the text data;
and outputting the emotion information included in the text data.
2. The method of claim 1, wherein processing the text data based on a multi-view emotion analysis model to predict emotion information included in the text data comprises:
acquiring a semantic vector of the text data, an entity vector corresponding to the entity information and an attribute vector corresponding to the attribute information based on the text data;
updating the entity vector through the semantic vector to generate a first text vector;
updating the attribute vector through the semantic vector to generate a second text vector;
and after the first text vector and the second text vector are combined, determining the emotion information of the text data according to a combined result through a classifier.
3. The method of claim 2, wherein obtaining a semantic vector of the text data, an entity vector corresponding to the entity information, and an attribute vector corresponding to the attribute information based on the text data comprises:
acquiring triple information of the text data, wherein the triple information comprises: the text content of the text data, entity words representing the entity information and attribute words representing the preset attributes;
vectorizing the triple information to obtain statement vectors of the text data, and entity word vectors of the entity words and attribute word vectors of the preset attributes;
carrying out linear combination coding on the entity word vector to obtain the entity vector;
carrying out linear combination coding on the attribute word vectors to obtain the attribute vectors;
generating the semantic vector from the statement vector, the entity vector, and the attribute vector.
4. The method of claim 3, wherein the method further comprises:
preprocessing the text data, wherein the preprocessing comprises any one or more of the following items: word segmentation processing, root reduction and noise elimination.
5. The method of claim 3, wherein, in the case that the entity information includes a plurality of entity words, performing linear combination coding on the entity word vector to obtain the entity vector comprises:
obtaining the average value of elements of the entity word vectors at the same position;
and determining the average value of each position as the element of the entity vector at each position, and obtaining the entity vector according to the element at each position.
6. The method of claim 3, wherein generating the semantic vector from the statement vector, the entity vector, and the attribute vector comprises:
connecting the statement vector, the entity vector and the attribute vector to obtain a connection result;
acquiring a first position weight of a context word of the entity information relative to the entity information and a second position weight of the context word of the attribute information relative to the attribute information, and performing weighted representation on the connection result by using the first position weight and the second position weight to obtain a weighted result, wherein the first position weight is determined according to the distance between the context word of the entity information and the entity information, and the second position weight is determined according to the distance between the context word of the attribute information and the attribute information;
coding the weighting result through a long-term and short-term memory network layer to obtain a hidden state vector of each vector;
determining the latent state vector as the semantic vector.
7. The method of claim 6, wherein obtaining the first position weight of the context words of the entity information relative to the entity information comprises:
obtaining the distance between the context words of the entity information and the entity information;
determining the first location weight according to a distance between the context words of the entity information and the entity information, wherein the distance is in an inverse proportional relationship with the first location weight.
8. The method of claim 2, wherein updating the entity vector with the semantic vector generates a first text vector comprising:
acquiring a fusion component and a preset iteration layer number, wherein the fusion component is used for fusing the entity vector and the semantic vector;
obtaining the entity semantic vector of the current layer through an attention mechanism according to the entity words and the semantic vector of the current layer;
fusing the semantic vector of the current layer and the entity word vector of the current layer by a fusion component to obtain a first text vector of the current layer;
and detecting whether the current layer reaches the preset iteration layer number, if so, determining that the first text vector of the current layer is the first text vector, otherwise, continuously outputting the first text vector of the current layer as the entity vector of the next layer to the next layer fusion component.
9. The method of claim 8, wherein fusing, by the fusion component, the semantic vector of the current layer and the entity word vector of the current layer to obtain the first text vector of the current layer comprises:
inputting the entity semantic vector of the current layer into a first hyperbolic tangent function to obtain a first operation result,
inputting the entity vector of the current layer into a second hyperbolic tangent function to obtain a second operation result;
inputting the entity semantic vector of the current layer and the entity vector of the current layer together into a logistic sigmoid function to obtain a third operation result;
weighting the first operation result and the third operation result to obtain a fourth operation result;
after subtracting the third operation result from 1, weighting it with the second operation result to obtain a fifth operation result;
and summing the fourth operation result and the fifth operation result to obtain a first text vector of the current layer.
10. The method of claim 2, wherein combining the first text vector and the second text vector and determining emotion information of the text data according to a result of the combining by a classifier comprises:
splicing the first text vector and the second text vector to obtain a splicing result;
processing the splicing result through a full connecting layer;
and determining the emotion information of the text data according to the processing result of the full connection layer through the classifier.
11. The method of claim 1, wherein the text data comprises text of comments made to the product by objects in a predetermined website.
12. A method of analyzing text, comprising:
displaying the acquired text data, wherein the text data comprises entity information and attribute information of at least one product;
and outputting emotion information included in the text data, wherein the multi-view emotion analysis model is used for predicting emotion information of the product on preset attributes according to the entity information and the attribute information, and the emotion information is used for representing emotion tendency information represented by the text data.
13. The method of claim 12, wherein the method further comprises:
receiving model parameters of the multi-view emotion analysis model;
and displaying the model parameters of the multi-view emotion analysis model.
14. An apparatus for analyzing text, comprising:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring text data, and the text data comprises entity information and attribute information of at least one product;
the prediction module is used for processing the text data based on a multi-view emotion analysis model and predicting emotion information included in the text data, wherein the multi-view emotion analysis model is used for predicting emotion information of the product on preset attributes according to the entity information and the attribute information, and the emotion information is used for representing emotion tendency information represented by the text data;
and the output module is used for outputting the emotion information included in the text data.
15. An apparatus for analyzing text, comprising:
the display module is used for displaying the acquired text data, wherein the text data comprises entity information and attribute information of at least one product;
and the output module is used for outputting the emotion information included in the text data, wherein the multi-view emotion analysis model is used for predicting the emotion information of the product on the preset attribute according to the entity information and the attribute information, and the emotion information is used for representing the emotion tendency information represented by the text data.
16. A system for analyzing text, comprising:
a processor; and
a memory coupled to the processor for providing instructions to the processor for processing the following processing steps:
acquiring text data, wherein the text data comprises entity information and attribute information of at least one product;
processing the text data based on a multi-view emotion analysis model, and predicting emotion information included in the text data, wherein the multi-view emotion analysis model is used for predicting emotion information of the product on preset attributes according to the entity information and the attribute information, and the emotion information is used for representing emotion tendency information represented by the text data;
and outputting the emotion information included in the text data.
CN201811426073.1A 2018-11-27 2018-11-27 Text analysis method, device and system Active CN111241842B (en)



