CN113553855A - Viewpoint role labeling method and device, computer equipment and medium - Google Patents

Viewpoint role labeling method and device, computer equipment and medium

Info

Publication number
CN113553855A
Authority
CN
China
Prior art keywords
lstm
labeled
word sequence
linguistic data
inputting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010339904.2A
Other languages
Chinese (zh)
Inventor
章波
张月
王睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202010339904.2A priority Critical patent/CN113553855A/en
Publication of CN113553855A publication Critical patent/CN113553855A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The disclosure provides a viewpoint role labeling method and apparatus, a computer device, and a medium. The method includes: inputting a corpus to be labeled into a syntactic model; and inputting the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled, together with the corpus to be labeled, into a bidirectional long short-term memory model and a conditional random field model connected in series, so as to obtain the viewpoint roles of the corpus to be labeled. The disclosure thus provides a viewpoint role labeling approach that does not rely on SRL assistance yet achieves performance comparable to SRL-assisted approaches.

Description

Viewpoint role labeling method and device, computer equipment and medium
Technical Field
The present disclosure relates to the field of big data, and more particularly, to a method, an apparatus, a computer device, and a medium for annotating a viewpoint role.
Background
Opinion mining and sentiment analysis have a wide range of practical applications in the big data domain, such as social media monitoring and general e-commerce applications. In particular, fine-grained analysis of opinions and emotions is key to understanding politicians' standpoints, customer reviews, marketing trends, and other subjective information. Viewpoint role labeling (also known as opinion role labeling, ORL) is a form of fine-grained sentiment analysis widely used in big data mining.
ORL automatically annotates, for a sentence, paragraph, or article to be labeled, the opinion holder (who is commenting), the opinion expression (how the comment is worded), and the opinion target (what is being commented on). After labeling, the labeled content may be further analyzed by a subsequent semantic analysis model or the like to generate various decisions, such as delivering network resources that match the users' opinions.
In the prior art, in order to improve the performance of ORL, a semantic role labeling (SRL) model is generally used to assist ORL. That is, the corpus to be labeled is input into the ORL model on the one hand and into the SRL model on the other hand, and some of the semantic information produced by the SRL model while generating semantic role labels is fed back into the ORL model, so that more accurate labeling is achieved. A viewpoint role labeled purely by ORL, without considering the semantics between words, tends to be inaccurate due to the lack of semantic analysis. Therefore, the performance of ORL can be greatly improved by using semantic information generated by SRL.
BRIEF SUMMARY OF THE PRESENT DISCLOSURE
In view of the above, the present disclosure is directed to providing a viewpoint role labeling approach that does not rely on SRL assistance yet achieves performance comparable to SRL-assisted approaches.
To achieve this object, according to one aspect of the present disclosure, there is provided a viewpoint role labeling method including:
inputting a corpus to be labeled into a syntactic model;
and inputting the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled, together with the corpus to be labeled, into a bidirectional long short-term memory model (Bi-LSTM) and a conditional random field model (CRF) connected in series, so as to obtain the viewpoint roles of the corpus to be labeled.
Optionally, the syntactic model includes a Bi-LSTM encoding layer, a scoring layer, and a decoding layer connected in series, where the Bi-LSTM encoding layer generates, for the corpus to be labeled, a word sequence representation that takes into account the semantic relations between each word and its surrounding words, the scoring layer generates, from the word sequence representation, a probability matrix of the dependency probabilities of the words in the corpus to be labeled, and the decoding layer generates a syntax tree from the probability matrix; and the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled is the output of the Bi-LSTM encoding layer.
Optionally, the step of inputting the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled, together with the corpus to be labeled, into the Bi-LSTM and the CRF connected in series includes:
inputting the hidden-state word sequence representation and the corpus to be labeled into the Bi-LSTM;
inputting the probability matrix into a graph encoder together with the feature sequence output by the Bi-LSTM;
and outputting the encoding result output by the graph encoder to the CRF to obtain the viewpoint roles of the corpus to be labeled.
Optionally, the step of inputting the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled, together with the corpus to be labeled, into the Bi-LSTM and the CRF connected in series includes:
inputting the hidden-state word sequence representation and the corpus to be labeled into the Bi-LSTM;
inputting the syntax tree into a graph encoder together with the feature sequence output by the Bi-LSTM;
and outputting the encoding result output by the graph encoder to the CRF to obtain the viewpoint roles of the corpus to be labeled.
Optionally, the graph encoder is a graph convolution network.
Optionally, the Bi-LSTM encoding layer includes a plurality of LSTM sub-layers, and the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled is at least one of the following:
the hidden-state word sequence representation output by a designated LSTM sub-layer among the plurality of LSTM sub-layers;
a weighted sum of the hidden-state word sequence representations output by a designated subset of the plurality of LSTM sub-layers;
a weighted sum of the hidden-state word sequence representations output by all of the plurality of LSTM sub-layers.
Optionally, the syntactic model further includes a first embedding layer before the Bi-LSTM encoding layer, and the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled is the word sequence representation output by the first embedding layer.
Optionally, the step of inputting the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled, together with the corpus to be labeled, into the Bi-LSTM and the CRF connected in series includes: inputting the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled, together with the word sequence representation obtained by passing the corpus to be labeled through a second embedding layer, into the Bi-LSTM and the CRF connected in series.
Optionally, inputting the syntax tree into a graph encoder together with the feature sequence output by the Bi-LSTM includes: converting the syntax tree into a 0-1 connection matrix, and inputting the 0-1 connection matrix and the feature sequence output by the Bi-LSTM into the graph encoder.
Optionally, the probability matrix is generated by the scoring layer by:
generating, from the word sequence representation and for each word of the corpus to be labeled, dependency probability scores between the word and the other words in the corpus to be labeled;
normalizing the dependency probability scores;
and generating the probability matrix from the normalized dependency probability scores between each word and the other words in the corpus to be labeled.
According to an aspect of the present disclosure, there is provided a viewpoint role labeling apparatus including:
a syntactic model input unit, configured to input a corpus to be labeled into a syntactic model;
and a label obtaining unit, configured to input the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled, together with the corpus to be labeled, into a Bi-LSTM and a CRF connected in series, so as to obtain the viewpoint roles of the corpus to be labeled.
Optionally, the syntactic model includes a Bi-LSTM encoding layer, a scoring layer, and a decoding layer connected in series, where the Bi-LSTM encoding layer generates, for the corpus to be labeled, a word sequence representation that takes into account the semantic relations between each word and its surrounding words, the scoring layer generates, from the word sequence representation, a probability matrix of the dependency probabilities of the words in the corpus to be labeled, and the decoding layer generates a syntax tree from the probability matrix; and the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled is the output of the Bi-LSTM encoding layer.
Optionally, the annotation obtaining unit is further configured to:
inputting the hidden-state word sequence representation and the corpus to be labeled into the Bi-LSTM;
inputting the probability matrix into a graph encoder together with the feature sequence output by the Bi-LSTM;
and outputting the encoding result output by the graph encoder to the CRF to obtain the viewpoint roles of the corpus to be labeled.
Optionally, the annotation obtaining unit is further configured to:
inputting the hidden-state word sequence representation and the corpus to be labeled into the Bi-LSTM;
inputting the syntax tree into a graph encoder together with the feature sequence output by the Bi-LSTM;
and outputting the encoding result output by the graph encoder to the CRF to obtain the viewpoint roles of the corpus to be labeled.
Optionally, the graph encoder is a graph convolution network.
Optionally, the Bi-LSTM encoding layer includes a plurality of LSTM sub-layers, and the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled is at least one of the following:
the hidden-state word sequence representation output by a designated LSTM sub-layer among the plurality of LSTM sub-layers;
a weighted sum of the hidden-state word sequence representations output by a designated subset of the plurality of LSTM sub-layers;
a weighted sum of the hidden-state word sequence representations output by all of the plurality of LSTM sub-layers.
Optionally, the syntactic model further includes a first embedding layer before the Bi-LSTM encoding layer, and the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled is the word sequence representation output by the first embedding layer.
Optionally, the label obtaining unit is further configured to: input the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled, together with the word sequence representation obtained by passing the corpus to be labeled through a second embedding layer, into the Bi-LSTM and the CRF connected in series.
Optionally, inputting the syntax tree into a graph encoder together with the feature sequence output by the Bi-LSTM includes: converting the syntax tree into a 0-1 connection matrix, and inputting the 0-1 connection matrix and the feature sequence output by the Bi-LSTM into the graph encoder.
Optionally, the probability matrix is generated by the scoring layer by:
generating, from the word sequence representation and for each word of the corpus to be labeled, dependency probability scores between the word and the other words in the corpus to be labeled;
normalizing the dependency probability scores;
and generating the probability matrix from the normalized dependency probability scores between each word and the other words in the corpus to be labeled.
According to an aspect of the present disclosure, there is provided a computer device including: a memory for storing computer executable code; a processor for executing the computer executable code to implement the method as described above.
According to an aspect of the present disclosure, there is provided a computer-readable medium comprising computer-executable code which, when executed by a processor, implements a method as described above.
The embodiments of the present disclosure introduce syntactic information, rather than SRL semantic information, into viewpoint role labeling for the first time, and find that, compared with using SRL semantic information to assist ORL, syntactic information is also very useful for the ORL task; its introduction greatly improves model performance. In the embodiments of the present disclosure, the corpus to be labeled is input into a syntactic model. The syntactic model generates a hidden-state word sequence representation in the process of obtaining the syntactic structure of the corpus to be labeled. This hidden-state word sequence representation, together with the corpus to be labeled, is input into an ORL model to obtain the viewpoint roles of the corpus to be labeled. Therefore, during labeling, the words in the corpus are not labeled in isolation; the syntactic relations between words are also considered. Like SRL semantic information, these syntactic relations improve labeling accuracy and thus model performance.
Drawings
The foregoing and other objects, features, and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which refers to the accompanying drawings in which:
FIGS. 1A-1B illustrate interface states when the viewpoint role labeling method of the disclosed embodiments is applied;
FIG. 2 illustrates a flow diagram of a viewpoint role labeling method according to one embodiment of the present disclosure;
FIG. 3 illustrates an interaction model diagram for viewpoint role labeling, highlighting the introduction of the hidden-state word sequence representation generated in the syntactic model into viewpoint role labeling, according to one embodiment of the present disclosure;
FIG. 4 illustrates an interaction model diagram for viewpoint role labeling, highlighting the introduction of the probability matrix generated in the syntactic model into viewpoint role labeling, according to one embodiment of the present disclosure;
FIG. 5 illustrates an interaction model diagram for viewpoint role labeling, highlighting the introduction of the syntax tree generated in the syntactic model into viewpoint role labeling, according to one embodiment of the present disclosure;
FIG. 6 illustrates an example of a probability matrix according to one embodiment of the present disclosure;
FIG. 7 illustrates a block diagram of a viewpoint role labeling apparatus according to one embodiment of the present disclosure;
FIG. 8 shows a block diagram of a computer device according to one embodiment of the present disclosure.
Detailed Description
The present disclosure is described below based on embodiments, but it is not limited to these embodiments. In the following detailed description, some specific details are set forth. It will be apparent to those skilled in the art that the present disclosure may be practiced without these specific details. Well-known methods, procedures, and flows have not been described in detail so as not to obscure the present disclosure. The figures are not necessarily drawn to scale.
Viewpoint role labeling can be used in various scenarios. For example, fine-grained analysis can be performed on customer reviews on a website, which helps to understand customers' standpoints, to evaluate the related goods, and to recommend appropriate goods to corresponding users, and so on. In a scenario of analyzing customer reviews on a website, a page prompting input of a sentence to be labeled may be displayed as shown in FIG. 1A. The customer review to be analyzed is copied onto the page by a backend maintainer, for example "Wang X: this pair of shoes is well made, cost-effective, and delivery is fast". In FIG. 1B, the user is presented with the results of viewpoint role labeling of the customer review.
Viewpoint role labeling automatically annotates, for a sentence, paragraph, or article to be labeled, the opinion holder (who is commenting), the opinion expression (how the comment is worded), and the opinion target (what is being commented on). In the labeling result shown in FIG. 1B, "Wang X" is the opinion holder, "this pair of shoes" is the opinion target, and "well made, cost-effective, and delivery is fast" is the opinion expression. After labeling, the labeled results can be further analyzed by a subsequent decision model or the like to generate various decisions, such as delivering the commented goods to matching users.
According to one embodiment of the present disclosure, a viewpoint role labeling method is provided. When viewpoint role labeling is used by a website to analyze comments of users on the website and the like, the method is performed by a web server. When viewpoint role labeling is used by a user, for example to gain a fuller understanding of an issue, the method is performed by the user's terminal, on which an application implementing the viewpoint role labeling method of the embodiments of the present disclosure may be installed.
As shown in fig. 2, the method includes:
step 110, inputting a corpus to be labeled into a syntactic model;
step 120, inputting the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled, together with the corpus to be labeled, into a bidirectional long short-term memory model (Bi-LSTM) and a conditional random field model (CRF) connected in series, to obtain the viewpoint roles of the corpus to be labeled.
The above steps are described in detail below.
In step 110, the corpus to be labeled is input into a syntactic model.
The corpus to be labeled refers to the sentence, paragraph, article, or the like whose viewpoint roles are to be labeled, such as "Wang X: this pair of shoes is well made, cost-effective, and delivery is fast".
A syntactic model is a model used to determine the sentence components (e.g., subject, predicate, object, etc.) of a corpus. A currently common syntactic model with good performance is shown on the left side of FIGS. 3-5, and comprises a Bi-LSTM encoding layer 201, a scoring layer 202, and a decoding layer 203.
A first embedding layer (not shown) may be provided before the Bi-LSTM encoding layer 201. When the Bi-LSTM encoding layer 201 processes data, it usually does so in matrix form to improve efficiency and to allow forward propagation through the neural network, so the input corpus is mapped to dense vectors of fixed dimensionality, i.e., a dimension-raising operation, which makes the data convenient for subsequent neural network processing. Specifically, each word of the corpus may correspond to an m-dimensional vector; since the corpus has a plurality of words, say n of them, the corpus is mapped into an m × n matrix, i.e., the embedded sequence representation 204 of the corpus to be labeled. For example, if each word corresponds to a 10-dimensional vector and there are 8 words in the corpus, the corpus becomes a 10 × 8 matrix after passing through the embedding layer.
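As a concrete illustration of this mapping, the following minimal sketch embeds an 8-word corpus into 10-dimensional vectors. It assumes PyTorch and a hypothetical vocabulary; the names and sizes are illustrative only and are not prescribed by the disclosure.

```python
# Minimal sketch of the first embedding layer (illustrative assumptions: PyTorch,
# a hypothetical vocabulary of 1000 words, m = 10 dimensions, n = 8 words).
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 10
embedding = nn.Embedding(vocab_size, embed_dim)

word_ids = torch.tensor([[4, 17, 9, 256, 3, 88, 41, 7]])  # one corpus of 8 word ids
embedded = embedding(word_ids)  # shape (1, 8, 10): one 10-dimensional vector per word,
                                # i.e. the 10 x 8 embedded sequence representation 204
```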
In the embedded sequence representation 204 of the corpus to be labeled, each word is represented only by a code related to the word itself, without considering its semantic relations with the surrounding words. After passing through the Bi-LSTM encoding layer 201, the representation of each word is based not only on the word itself but also on its semantic relations with the surrounding words. Therefore, the sequence representation generated by the Bi-LSTM encoding layer 201 is called the hidden-state word sequence representation 205; it contains partial semantic information, which can assist the viewpoint role labeling of the embodiments of the present disclosure.
The long short-term memory model (LSTM) is a special form of recurrent neural network (RNN). RNNs are neural networks that mimic the way the human brain processes information. The signals received by the human brain can be divided into external signals (perception such as vision and hearing) and internal signals (ideas already present in the mind), and at different times the brain causes a person to produce new actions (external output) and new ideas (internal output). An RNN is in fact a chain of forward networks, each cell representing a point in time, and each RNN cell has two inputs and two outputs, representing respectively the input and output of the network's internal state and the input and output in response to external stimuli (the internal state can be regarded as information to be passed on to the next moment). This structure mimics the human brain well in theory, but is not so simple to train in practice: RNN training is usually accompanied by vanishing gradients and exploding gradients. LSTM was proposed to solve this problem.
The original purpose of the LSTM design is to preserve information over time by introducing a memory cell, i.e., recording historical information, where what is recorded is controlled. This introduces the concept of three control gates, which control what the LSTM cell should write, read, and output, respectively. Thus, the current input, the previous state, and the current output determine how the gates are set, and the three gates then control the output and the update of the current state.
A forward LSTM combined with a backward LSTM forms a Bi-LSTM. For example, to encode the sentence "I love China", the forward LSTM takes the words "I", "love", and "China" in order and produces three vectors, while the backward LSTM takes "China", "love", and "I" in order and produces three vectors; the final output is obtained by splicing the two sets of results. Bi-LSTM performs more stably than LSTM.
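The forward/backward composition can be sketched as follows. This is a minimal example assuming PyTorch; setting `bidirectional=True` makes nn.LSTM run a forward and a backward pass and concatenate the two per-word outputs, which corresponds to the splicing described above.

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim = 10, 32
bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

embedded = torch.randn(1, 3, embed_dim)   # e.g. the 3 words of "I love China"
hidden_seq, _ = bilstm(embedded)          # shape (1, 3, 2 * hidden_dim): forward and
                                          # backward states spliced for each word
```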
In addition, the Bi-LSTM encoding layer 201 may be a single Bi-LSTM layer or may include a plurality of LSTM sub-layers. Each LSTM sub-layer outputs an intermediate hidden-state word sequence representation, and the topmost LSTM sub-layer outputs the final hidden-state word sequence representation.
After the Bi-LSTM encoding layer 201 outputs the hidden-state word sequence representation 205, the representation 205 enters the scoring layer 202 for scoring. The scoring layer 202 generates, for each word of the corpus to be labeled and according to the input hidden-state word sequence representation 205, dependency probability scores between that word and the other words in the corpus to be labeled. For example, for "I am a Chinese person", the dependency probability scores of "I" with each of the other three words "am", "Chinese", and "person" are determined; the dependency probability scores of "am" with each of the other three words "I", "Chinese", and "person" are determined; and so on. These dependency probability scores are then normalized, i.e., for each word, the sum of its dependency probability scores with the other words is made equal to 1. The probability matrix is generated from the normalized dependency probability scores of each word with the other words in the corpus to be labeled. The left side of FIG. 6 is an example of a probability matrix; the right side of FIG. 6 is a dependency graph in which each word is represented by a dot, the dependency between two words is represented by a bidirectional arrow between them, and the numerical value labeled at the arrow is the normalized dependency probability score.
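One possible scoring layer can be sketched as below. This is a minimal, hedged example: it uses a plain dot-product scorer and row-wise softmax normalization, which is only one way to realize the scoring and normalization steps described above; the disclosure does not mandate this particular scoring function.

```python
import torch
import torch.nn.functional as F

hidden_seq = torch.randn(1, 4, 64)        # hidden-state word sequence representation 205, 4 words
scores = torch.matmul(hidden_seq, hidden_seq.transpose(1, 2))     # (1, 4, 4) pairwise scores
scores = scores.masked_fill(torch.eye(4, dtype=torch.bool), float("-inf"))  # no self-dependency
prob_matrix = F.softmax(scores, dim=-1)   # each row sums to 1, as in the normalization step
```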
The probability matrix is then input to the decoding layer 203, for example a Viterbi decoder. The Viterbi decoder uses the Viterbi decoding algorithm to obtain an optimal syntax tree. In the syntax tree, a plurality of candidate syntactic analyses (branches) are formed, each branch carrying a probability, and finally the best branch is selected as the syntactic analysis. For example, for "I eat an apple at the office", one branch of the syntax tree may be "noun - preposition - noun - verb - noun", and another branch "noun - verb - noun"; if the former has a higher probability than the latter, the former is selected as the syntactic analysis result.
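As a greatly simplified stand-in for the decoding step (real dependency decoding, Viterbi or otherwise, searches over whole trees rather than individual cells), the sketch below merely picks, for every word, its most probable head from the probability matrix. It is illustrative only and is not the decoder of the disclosure.

```python
import torch

# prob_matrix: row i holds the normalized dependency scores of word i with the other words
prob_matrix = torch.tensor([[0.0, 0.7, 0.3],
                            [0.6, 0.0, 0.4],
                            [0.2, 0.8, 0.0]])
heads = prob_matrix.argmax(dim=-1)  # greedy head choice per word; a stand-in, not the patented decoder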
In step 120, the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled is input, together with the corpus to be labeled, into a bidirectional long short-term memory model (Bi-LSTM) and a conditional random field model (CRF) connected in series, so as to obtain the viewpoint roles of the corpus to be labeled.
A conventional viewpoint role labeling model comprises the Bi-LSTM model 209 and the CRF 210, as shown on the right side of FIG. 3.
Through the processing of the Bi-LSTM model 209, a better representation of the input corpus to be labeled is obtained: the Bi-LSTM model 209 extracts the features of the corpus well and outputs a feature representation of it, and in the labeling stage the output is conventionally processed by a softmax. However, this approach is of limited effectiveness when there are strong dependencies between the output tags. In practical sequence labeling tasks (such as labeling viewpoint roles), the neural network structure depends heavily on the data, and the size and quality of the data also strongly affect training; hence methods combining existing linear statistical models with neural network structures have appeared, among which the combination of LSTM and CRF works best. In brief, the Bi-LSTM model 209 is followed by the CRF 210: the former solves the problem of extracting sequence features, and the latter makes effective use of sentence-level label information to solve the sequence labeling problem. Under the LSTM + CRF model, the output labels are no longer independent labels, but the best label sequence.
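To make the LSTM + CRF combination concrete, the sketch below shows the decoding side: Bi-LSTM emission scores per word are combined with tag-transition scores, and the best tag sequence is chosen by Viterbi search instead of per-word softmax. It assumes PyTorch, a hypothetical tag set of 5 tags, and a transition matrix that would normally be learned; none of these names are prescribed by the disclosure.

```python
import torch
import torch.nn as nn

num_tags, hidden_dim = 5, 64                      # hypothetical tag set size
bilstm = nn.LSTM(100, hidden_dim // 2, batch_first=True, bidirectional=True)
emit = nn.Linear(hidden_dim, num_tags)            # per-word emission scores
transitions = nn.Parameter(torch.randn(num_tags, num_tags))  # CRF tag-transition scores

def viterbi_decode(emissions, transitions):
    """emissions: (seq_len, num_tags). Returns the highest-scoring tag sequence."""
    seq_len, n_tags = emissions.shape
    score = emissions[0]                          # best score of paths ending in each tag
    backpointers = []
    for t in range(1, seq_len):
        # score[i] + transitions[i, j] + emissions[t, j] for previous tag i and next tag j
        total = score.unsqueeze(1) + transitions + emissions[t].unsqueeze(0)
        score, best_prev = total.max(dim=0)
        backpointers.append(best_prev)
    path = [score.argmax().item()]
    for best_prev in reversed(backpointers):
        path.append(best_prev[path[-1]].item())
    return list(reversed(path))

features, _ = bilstm(torch.randn(1, 8, 100))      # feature sequence output by the Bi-LSTM
tags = viterbi_decode(emit(features)[0], transitions)
```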
In a conventional viewpoint role labeling model, the Bi-LSTM model 209 may be preceded by a second embedding layer (not shown). Similar to the first embedding layer, the second embedding layer also maps the input corpus into dense vectors of fixed dimensionality, i.e., a dimension-raising operation, which makes the data convenient for subsequent neural network processing. In the conventional model, only the word sequence representation obtained by passing the corpus to be labeled through the second embedding layer is input into the Bi-LSTM model 209 and the CRF 210 connected in series. In the embodiment of the present disclosure, however, the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled is input, together with the word sequence representation obtained from the second embedding layer, into the Bi-LSTM model 209 and the CRF 210 connected in series. Because the hidden-state word sequence representation reflects the latent relations between words and, like SRL semantic information, can improve labeling accuracy, introducing it and inputting it together with the second-embedding-layer word sequence representation into the Bi-LSTM model 209 and the CRF 210 connected in series improves labeling performance.
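One straightforward way to input the two representations together, as described above, is to concatenate them word by word before the ORL Bi-LSTM. The sketch below assumes the two sequences are aligned per word; concatenation is only one reasonable choice and is not specifically prescribed by the disclosure.

```python
import torch

syntax_hidden = torch.randn(1, 8, 64)   # hidden-state word sequence representation 205
orl_embedded = torch.randn(1, 8, 100)   # output of the second embedding layer, same 8 words

orl_input = torch.cat([orl_embedded, syntax_hidden], dim=-1)  # (1, 8, 164), fed to Bi-LSTM 209
```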
As described above, the Bi-LSTM encoding layer may include a plurality of LSTM sub-layers. Thus, the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled may be the hidden-state word sequence representation output by the topmost of the plurality of LSTM sub-layers, which is also the hidden-state word sequence representation output by the Bi-LSTM encoding layer as a whole. It may also be that of an LSTM sub-layer other than the topmost one (such as the bottommost sub-layer or the second sub-layer from the bottom). It may also be a weighted sum of the hidden-state word sequence representations output by a designated subset of the plurality of LSTM sub-layers. For example, if the Bi-LSTM encoding layer has 3 LSTM sub-layers in total and the weights of the 1st and 2nd sub-layers from the bottom are set to 0.6 and 0.4 respectively, the representation is the weighted sum, with these 2 weights, of the hidden-state word sequence representations output by those 2 LSTM sub-layers. The hidden-state word sequence representation may also be a weighted sum over all of the LSTM sub-layers. For example, if the Bi-LSTM encoding layer has 3 LSTM sub-layers in total and the weights of the 1st, 2nd, and 3rd sub-layers from the bottom are set to 0.5, 0.3, and 0.2 respectively, the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled is the weighted sum, with these 3 weights, of the hidden-state word sequence representations output by the 3 LSTM sub-layers. The weighted-sum approach reflects the contribution of each LSTM sub-layer in a principled way and improves labeling quality.
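The weighted-sum variant can be sketched directly from the numbers in the example above (0.5 / 0.3 / 0.2 for three sub-layers). The weights here are fixed for illustration; in practice they could equally be learned parameters.

```python
import torch

# Hidden-state word sequence representations from 3 LSTM sub-layers, bottom to top (illustrative shapes).
layer_outputs = [torch.randn(1, 8, 64) for _ in range(3)]
weights = [0.5, 0.3, 0.2]

weighted_repr = sum(w * h for w, h in zip(weights, layer_outputs))  # (1, 8, 64)
```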
As shown in FIG. 3, the syntactic model offers 3 kinds of syntactic information that can be used for viewpoint role labeling: the hidden-state word sequence representation 205, the probability matrix 206, and the syntax tree 207. The hidden-state word sequence representation 205 reflects the dependency relations between the words of the input corpus and therefore carries semantic relations between words, which can be used to assist viewpoint role labeling. The probability matrix 206 reflects the normalized dependency probability scores of each word of the input corpus with the other words and therefore also carries semantic relations between words, which can be used to assist viewpoint role labeling. The syntax tree 207 reflects the syntactic structure of the input corpus finally recognized by the syntactic model and therefore likewise carries semantic relations between words, which can be used to assist viewpoint role labeling. The use of the hidden-state word sequence representation 205 to assist viewpoint role labeling has been described in detail above.
As shown in FIG. 4, a graph encoder 212 is introduced when the probability matrix 206 is used to assist viewpoint role labeling. The probability matrix 206 is input into the graph encoder 212 together with the feature sequence output by the Bi-LSTM model 209. The encoding result output by the graph encoder 212 is then output to the CRF 210 to obtain the viewpoint roles of the corpus to be labeled.
Convolutional neural networks are widely used in computer vision and natural language processing, but their objects of study are limited to data with a regular spatial structure, such as images, which are regular square grids, and speech, which is a regular one-dimensional sequence. Such data structures can be represented by one- or two-dimensional matrices, which convolutional neural networks process very efficiently. However, for data without a regular spatial structure, such as the graphs abstracted from recommendation systems, electronic transactions, computational geometry, brain signals, molecular structures, and so on, the connections of each node differ: some nodes have three connections, some have two. These are irregular data structures and cannot be processed by a convolutional neural network. Graph encoders 212 were developed for this purpose, one of the more influential being the graph convolutional network.
A graph has two basic characteristics. First, each node has its own feature information. For example, a risk-control rule may be established to check whether the registered address, the IP address, and the delivery address of a user's transaction are consistent; if this feature information does not match, the system judges that the user carries a certain fraud risk. This is an application of graph node feature information. Second, each node in the graph also has structural information. If a certain IP node is connected to a large number of transaction nodes during a certain period of time, that is, a large number of edges extend from that IP node, the risk-control system may determine that the IP address is risky. This is an application of graph node structural information. In graph data, the feature information and the structural information of nodes must be considered at the same time; if they were extracted by manual rules, many hidden and complex patterns would inevitably be lost, whereas a graph convolutional neural network can automatically learn both the feature information and the structural information of the graph.
In addition to the graph convolutional network, the graph encoder 212 may be a tree gated recurrent unit (TreeGRU), a tree long short-term memory model (TreeLSTM), or the like, or may use only partial trees, such as the shortest dependency path (SDP); the graph convolutional neural network, however, performs better.
The probability matrix 206 and the syntax tree 207 are graph-structured data; if either is introduced to assist viewpoint role labeling, it is input into a graph encoder 212 such as a graph convolutional network. At the same time, the feature sequence output by the Bi-LSTM model 209 is input into the graph encoder 212 together with it. The output of the graph encoder 212 thus reflects both the effect of the probability matrix 206 and that of the original corpus to be labeled. The encoding result output by the graph encoder 212 is output to the CRF 210, thereby obtaining the viewpoint roles 211 of the corpus to be labeled.
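A single graph-convolution step over the probability matrix can be sketched as follows. This is a minimal, hedged example of a one-layer GCN of the form ReLU(A · H · W), where A stands in for the normalized probability matrix 206 and H for the feature sequence output by the Bi-LSTM model 209; an actual implementation of the graph encoder may be deeper or gated, which the disclosure does not specify.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_words, feat_dim, out_dim = 8, 128, 128
prob_matrix = torch.softmax(torch.randn(n_words, n_words), dim=-1)  # stands in for matrix 206
features = torch.randn(1, n_words, feat_dim)                        # feature sequence from Bi-LSTM 209

gcn_weight = nn.Linear(feat_dim, out_dim, bias=False)
encoded = F.relu(prob_matrix @ gcn_weight(features))  # (1, n_words, out_dim), passed on to CRF 210
```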
As shown in FIG. 5, a graph encoder 212 is likewise employed when the syntax tree 207 is used to assist viewpoint role labeling. Instead of inputting the probability matrix 206 into the graph encoder 212 together with the feature sequence output by the Bi-LSTM model 209, this embodiment inputs the syntax tree 207 into the graph encoder 212 together with the feature sequence output by the Bi-LSTM model 209. The encoding result output by the graph encoder 212 is then output to the CRF 210 to obtain the viewpoint roles 211 of the corpus to be labeled.
Since the graph structure of the syntax tree is not directly usable, in one embodiment it can be converted into a more direct graph structure, namely a 0-1 connection matrix, before being input to the graph encoder 212. That is, the syntax tree 207 is converted into a 0-1 connection matrix, which is input into the graph encoder 212 together with the feature sequence output by the Bi-LSTM model 209.
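If the syntax tree is represented, as is common, by a head index for each word, its conversion into a 0-1 connection matrix can be sketched as below. A symmetric matrix with self-loops is one reasonable choice of assumptions here; the disclosure only requires a 0-1 matrix and does not fix these details.

```python
import torch

heads = [1, -1, 1, 2]           # head word index for each of 4 words; -1 marks the root
n = len(heads)

adj = torch.zeros(n, n)         # the 0-1 connection matrix
for child, head in enumerate(heads):
    if head >= 0:
        adj[child, head] = 1.0
        adj[head, child] = 1.0  # make the connection symmetric
adj += torch.eye(n)             # optional self-loops, common in graph convolution
```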
In addition, in one embodiment, the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled may not be any sub-layer hidden-state word sequence representation generated inside the Bi-LSTM encoding layer 201 or a weighted sum thereof, but rather the word sequence representation output by the first embedding layer, that is, the word sequence representation input to the Bi-LSTM encoding layer 201. Although this representation has not passed through the Bi-LSTM encoding layer 201, the scoring layer 202, or the decoding layer 203, the embedding layer preceding the Bi-LSTM encoding layer 201 is trained as a whole together with the Bi-LSTM encoding layer 201, the scoring layer 202, and the decoding layer 203, so it can still reflect certain syntactic information or semantic relation information between words, and therefore it can also help viewpoint role labeling.
In summary, the embodiments of the present disclosure introduce syntactic information, rather than SRL semantic information, into viewpoint role labeling for the first time, and find that, compared with using SRL semantic information to assist ORL, syntactic information is also very useful for the ORL task; its introduction greatly improves model performance. In the embodiments of the present disclosure, the corpus to be labeled is input into a syntactic model. The syntactic model generates a hidden-state word sequence representation in the process of obtaining the syntactic structure of the corpus to be labeled. This hidden-state word sequence representation, together with the corpus to be labeled, is input into an ORL model to obtain the viewpoint roles of the corpus to be labeled. Therefore, during labeling, the words in the corpus are not labeled in isolation; the syntactic relations between words are also considered. Like SRL semantic information, these syntactic relations improve labeling accuracy and thus model performance.
As shown in FIG. 7, according to an embodiment of the present disclosure, there is provided a viewpoint role labeling apparatus 300 including:
a syntactic model input unit 310, configured to input a corpus to be labeled into a syntactic model;
and a label obtaining unit 320, configured to input the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled, together with the corpus to be labeled, into a bidirectional long short-term memory model (Bi-LSTM) and a conditional random field model (CRF) connected in series, so as to obtain the viewpoint roles of the corpus to be labeled.
Optionally, the syntactic model includes a Bi-LSTM encoding layer, a scoring layer, and a decoding layer connected in series, where the Bi-LSTM encoding layer generates, for the corpus to be labeled, a word sequence representation that takes into account the semantic relations between each word and its surrounding words, the scoring layer generates, from the word sequence representation, a probability matrix of the dependency probabilities of the words in the corpus to be labeled, and the decoding layer generates a syntax tree from the probability matrix; and the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled is the output of the Bi-LSTM encoding layer.
Optionally, the label obtaining unit 320 is further configured to:
input the hidden-state word sequence representation and the corpus to be labeled into the Bi-LSTM;
input the probability matrix into a graph encoder together with the feature sequence output by the Bi-LSTM;
and output the encoding result output by the graph encoder to the CRF to obtain the viewpoint roles of the corpus to be labeled.
Optionally, the label obtaining unit 320 is further configured to:
input the hidden-state word sequence representation and the corpus to be labeled into the Bi-LSTM;
input the syntax tree into a graph encoder together with the feature sequence output by the Bi-LSTM;
and output the encoding result output by the graph encoder to the CRF to obtain the viewpoint roles of the corpus to be labeled.
Optionally, the graph encoder is a graph convolution network.
Optionally, the Bi-LSTM encoding layer includes a plurality of LSTM sub-layers, and the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled is at least one of the following:
the hidden-state word sequence representation output by a designated LSTM sub-layer among the plurality of LSTM sub-layers;
a weighted sum of the hidden-state word sequence representations output by a designated subset of the plurality of LSTM sub-layers;
a weighted sum of the hidden-state word sequence representations output by all of the plurality of LSTM sub-layers.
Optionally, the syntactic model further includes a first embedding layer before the Bi-LSTM encoding layer, and the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled is the word sequence representation output by the first embedding layer.
Optionally, the label obtaining unit is further configured to: input the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled, together with the word sequence representation obtained by passing the corpus to be labeled through a second embedding layer, into the Bi-LSTM and the CRF connected in series.
Optionally, inputting the syntax tree into a graph encoder together with the feature sequence output by the Bi-LSTM includes: converting the syntax tree into a 0-1 connection matrix, and inputting the 0-1 connection matrix and the feature sequence output by the Bi-LSTM into the graph encoder.
Optionally, the probability matrix is generated by the scoring layer by:
generating, from the word sequence representation and for each word of the corpus to be labeled, dependency probability scores between the word and the other words in the corpus to be labeled;
normalizing the dependency probability scores;
and generating the probability matrix from the normalized dependency probability scores between each word and the other words in the corpus to be labeled.
The details of the above-mentioned viewpoint role labeling apparatus 300 have been described in detail in the above method embodiments, and are not repeated herein.
Viewpoint role labeling according to one embodiment of the present disclosure can be implemented by the computer device 800 of FIG. 8. A computer device 800 according to an embodiment of the disclosure is described below with reference to FIG. 8. The computer device 800 shown in FIG. 8 is only an example and should not impose any limitation on the functionality or scope of use of the embodiments of the present disclosure.
As shown in fig. 8, computer device 800 is in the form of a general purpose computing device. The components of computer device 800 may include, but are not limited to: the at least one processing unit 810, the at least one memory unit 820, and a bus 830 that couples the various system components including the memory unit 820 and the processing unit 810.
Wherein the storage unit stores program code that is executable by the processing unit 810 to cause the processing unit 810 to perform the steps of the various exemplary embodiments of the present disclosure described in the description section of the above exemplary methods of the present specification. For example, the processing unit 810 may perform the various steps as shown in fig. 2.
The storage unit 820 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 8201 and/or a cache memory unit 8202, and may further include a read-only memory unit (ROM) 8203.
The storage unit 820 may also include a program/utility 8204 having a set (at least one) of program modules 8205, such program modules 8205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 830 may be any of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The computer device 800 may also communicate with one or more external devices 700 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the computer device 800, and/or with any devices (e.g., router, modem, etc.) that enable the computer device 800 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 850. Also, computer device 800 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet) via network adapter 860. As shown, the network adapter 860 communicates with the other modules of the computer device 800 via a bus 830. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the computer device 800, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
It should be understood that the above-described are only preferred embodiments of the present disclosure, and are not intended to limit the present disclosure, since many variations of the embodiments described herein will occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
It should be understood that the embodiments in this specification are described in a progressive manner, and that the same or similar parts in the various embodiments may be referred to one another, with each embodiment being described with emphasis instead of the other embodiments.
It should be understood that the above description describes particular embodiments of the present specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
It should be understood that an element described herein in the singular or shown in the figures only represents that the element is limited in number to one. Furthermore, modules or elements described or illustrated herein as separate may be combined into a single module or element, and modules or elements described or illustrated herein as single may be split into multiple modules or elements.
It is also to be understood that the terms and expressions employed herein are used as terms of description and not of limitation, and that the embodiment or embodiments of the specification are not limited to those terms and expressions. The use of such terms and expressions is not intended to exclude any equivalents of the features shown and described (or portions thereof), and it is recognized that various modifications may be made within the scope of the claims. Other modifications, variations, and alternatives are also possible. Accordingly, the claims should be looked to in order to cover all such equivalents.

Claims (22)

1. A viewpoint role labeling method, comprising:
inputting a corpus to be labeled into a syntactic model;
and inputting the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled, together with the corpus to be labeled, into a bidirectional long short-term memory model (Bi-LSTM) and a conditional random field model (CRF) connected in series, so as to obtain the viewpoint roles of the corpus to be labeled.
2. The method as claimed in claim 1, wherein the syntactic model includes a Bi-LSTM encoding layer, a scoring layer, and a decoding layer connected in series, the Bi-LSTM encoding layer generates, for the corpus to be labeled, a word sequence representation that takes into account the semantic relations between each word and its surrounding words, the scoring layer generates, from the word sequence representation, a probability matrix of the dependency probabilities of the words in the corpus to be labeled, and the decoding layer generates a syntax tree from the probability matrix;
and the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled is the output of the Bi-LSTM encoding layer.
3. The method according to claim 2, wherein said inputting the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled, together with the corpus to be labeled, into the Bi-LSTM and the CRF connected in series comprises:
inputting the hidden-state word sequence representation and the corpus to be labeled into the Bi-LSTM;
inputting the probability matrix into a graph encoder together with the feature sequence output by the Bi-LSTM;
and outputting the encoding result output by the graph encoder to the CRF to obtain the viewpoint roles of the corpus to be labeled.
4. The method according to claim 2, wherein said inputting the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled, together with the corpus to be labeled, into the Bi-LSTM and the CRF connected in series comprises:
inputting the hidden-state word sequence representation and the corpus to be labeled into the Bi-LSTM;
inputting the syntax tree into a graph encoder together with the feature sequence output by the Bi-LSTM;
and outputting the encoding result output by the graph encoder to the CRF to obtain the viewpoint roles of the corpus to be labeled.
5. The method of claim 3 or 4, wherein the graph encoder is a graph convolution network.
6. The method of claim 2, wherein the Bi-LSTM encoding layer comprises a plurality of LSTM sub-layers, and the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled is at least one of the following:
the hidden-state word sequence representation output by a designated LSTM sub-layer among the plurality of LSTM sub-layers;
a weighted sum of the hidden-state word sequence representations output by a designated subset of the plurality of LSTM sub-layers;
a weighted sum of the hidden-state word sequence representations output by all of the plurality of LSTM sub-layers.
7. The method as claimed in claim 2, wherein the syntactic model further includes a first embedding layer before the Bi-LSTM encoding layer, and the hidden-state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled is the word sequence representation output by the first embedding layer.
8. The method according to claim 1, wherein inputting the hidden state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled, together with the corpus to be labeled, into the Bi-LSTM and the CRF connected in series comprises:
inputting the hidden state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled, together with the word sequence representation obtained by passing the corpus to be labeled through a second embedding layer, into the Bi-LSTM and the CRF connected in series.
9. The method according to claim 4, wherein inputting the syntax tree, together with the feature sequence output by the Bi-LSTM, into a graph encoder comprises: converting the syntax tree into a 0-1 connection matrix, and inputting the 0-1 connection matrix and the feature sequence output by the Bi-LSTM into the graph encoder.
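A small helper illustrating the claim-9 conversion of a syntax tree into a 0-1 connection matrix; representing the tree as one head index per word and adding symmetric head-dependent edges are assumptions of this sketch.

```python
import torch

def tree_to_connection_matrix(heads):
    # heads[i] is the head word index of word i, with -1 marking the root
    n = len(heads)
    adj = torch.zeros(n, n)
    for dep, head in enumerate(heads):
        if head >= 0:
            adj[dep, head] = 1.0    # dependent -> head edge
            adj[head, dep] = 1.0    # symmetric edge for undirected graph encoding
    return adj

# a 4-word example in which words 0, 2 and 3 all depend on word 1 (the root)
print(tree_to_connection_matrix([1, -1, 1, 1]))
```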
10. The method according to claim 2, wherein the probability matrix is generated by the scoring layer by:
generating, for each word of the corpus to be labeled and according to the word sequence representation, dependency probability scores between the word and the other words in the corpus to be labeled;
normalizing the dependency probability scores;
and generating the probability matrix from the normalized dependency probability scores between each word and the other words in the corpus to be labeled.
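With a softmax over candidate heads as the assumed normalization, the scoring-layer steps of claim 10 reduce to the small function below; the raw pairwise scores are taken as given here.

```python
import torch

def dependency_probability_matrix(scores):
    # scores[i, j]: raw dependency score of word i taking word j as head,
    # produced from the word sequence representation; softmax is the assumed normalization
    return torch.softmax(scores, dim=-1)   # each row sums to 1

raw = torch.randn(6, 6)                     # raw pairwise scores for a 6-word corpus
print(dependency_probability_matrix(raw).sum(dim=-1))   # rows sum to ~1.0
```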
11. A viewpoint role labeling apparatus, comprising:
a syntactic model input unit, configured to input a corpus to be labeled into a syntactic model;
and a label obtaining unit, configured to input the hidden state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled, together with the corpus to be labeled, into a bidirectional long short-term memory model (Bi-LSTM) and a conditional random field model (CRF) connected in series, so as to obtain the viewpoint role of the corpus to be labeled.
12. The apparatus according to claim 11, wherein the syntactic model comprises a Bi-LSTM encoding layer, a scoring layer, and a decoding layer connected in series, the Bi-LSTM encoding layer generates, for the corpus to be labeled, a word sequence representation that takes into account the semantic relations between each word and its preceding and following words, the scoring layer generates, according to the word sequence representation, a probability matrix of the dependency probabilities between the words in the corpus to be labeled, and the decoding layer generates a syntax tree according to the probability matrix;
and the hidden state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled is the output of the Bi-LSTM encoding layer.
13. The apparatus according to claim 12, wherein the label obtaining unit is further configured to:
input the hidden state word sequence representation and the corpus to be labeled into the Bi-LSTM;
input the probability matrix, together with the feature sequence output by the Bi-LSTM, into a graph encoder;
and input the encoding result output by the graph encoder into the CRF to obtain the viewpoint role of the corpus to be labeled.
14. The apparatus according to claim 12, wherein the label obtaining unit is further configured to:
input the hidden state word sequence representation and the corpus to be labeled into the Bi-LSTM;
input the syntax tree, together with the feature sequence output by the Bi-LSTM, into a graph encoder;
and input the encoding result output by the graph encoder into the CRF to obtain the viewpoint role of the corpus to be labeled.
15. The apparatus according to claim 13 or 14, wherein the graph encoder is a graph convolutional network.
16. The apparatus according to claim 12, wherein the Bi-LSTM encoding layer comprises a plurality of LSTM sub-layers, and the hidden state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled is at least one of:
a hidden state word sequence representation output by a designated LSTM sub-layer among the plurality of LSTM sub-layers;
a weighted sum of the hidden state word sequence representations output by a designated subset of the plurality of LSTM sub-layers;
a weighted sum of the hidden state word sequence representations output by all of the plurality of LSTM sub-layers.
17. The apparatus according to claim 12, wherein the syntactic model further comprises a first embedding layer preceding the Bi-LSTM encoding layer, and the hidden state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled is the word sequence representation output by the first embedding layer.
18. The apparatus according to claim 11, wherein the label obtaining unit is further configured to:
input the hidden state word sequence representation obtained by the syntactic model in the process of obtaining the syntactic structure of the corpus to be labeled, together with the word sequence representation obtained by passing the corpus to be labeled through a second embedding layer, into the Bi-LSTM and the CRF connected in series.
19. The apparatus according to claim 14, wherein inputting the syntax tree, together with the feature sequence output by the Bi-LSTM, into a graph encoder comprises: converting the syntax tree into a 0-1 connection matrix, and inputting the 0-1 connection matrix and the feature sequence output by the Bi-LSTM into the graph encoder.
20. The apparatus according to claim 12, wherein the probability matrix is generated by the scoring layer by:
generating, for each word of the corpus to be labeled and according to the word sequence representation, dependency probability scores between the word and the other words in the corpus to be labeled;
normalizing the dependency probability scores;
and generating the probability matrix from the normalized dependency probability scores between each word and the other words in the corpus to be labeled.
21. A computer device, comprising:
a memory for storing computer executable code;
a processor for executing the computer executable code to implement the method of any one of claims 1-10.
22. A computer-readable medium comprising computer-executable code that, when executed by a processor, performs the method of any one of claims 1-10.
CN202010339904.2A 2020-04-26 2020-04-26 Viewpoint role labeling method and device, computer equipment and medium Pending CN113553855A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010339904.2A CN113553855A (en) 2020-04-26 2020-04-26 Viewpoint role labeling method and device, computer equipment and medium

Publications (1)

Publication Number Publication Date
CN113553855A true CN113553855A (en) 2021-10-26

Family

ID=78101564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010339904.2A Pending CN113553855A (en) 2020-04-26 2020-04-26 Viewpoint role labeling method and device, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN113553855A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1998010353A2 (en) * 1996-09-03 1998-03-12 The Takshele Corporation Computer-executed, three-dimensional graphical resource management process and system
CN106649542A (en) * 2015-11-03 2017-05-10 百度(美国)有限责任公司 Systems and methods for visual question answering
CN108460013A (en) * 2018-01-30 2018-08-28 大连理工大学 A kind of sequence labelling model based on fine granularity vocabulary representation model
CN108628828A (en) * 2018-04-18 2018-10-09 国家计算机网络与信息安全管理中心 A kind of joint abstracting method of viewpoint and its holder based on from attention
CN108628829A (en) * 2018-04-23 2018-10-09 苏州大学 Automatic treebank method for transformation based on tree-like Recognition with Recurrent Neural Network and system
US20200004822A1 (en) * 2018-06-30 2020-01-02 Wipro Limited Method and device for extracting attributes associated with centre of interest from natural language sentences
CN109948162A (en) * 2019-03-25 2019-06-28 北京理工大学 The production text snippet method of fusion sequence grammer annotation framework
CN110309511A (en) * 2019-07-04 2019-10-08 哈尔滨工业大学 Multitask language analysis system and method based on shared expression
CN110532558A (en) * 2019-08-29 2019-12-03 杭州涂鸦信息技术有限公司 A kind of more intension recognizing methods and system based on the parsing of sentence structure deep layer

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
BASTINGS, Jasmijn et al.: "Graph Convolutional Encoders for Syntax-aware Neural Machine Translation", Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, 30 September 2017 (2017-09-30), pages 1957-1967 *
MARCHEGGIANI, D. et al.: "Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling", EMNLP 2017: Conference on Empirical Methods in Natural Language Processing, 30 September 2017 (2017-09-30), pages 1506-1515 *
ZHANG, Bo et al.: "Syntax-aware opinion role labeling with dependency graph convolutional networks", Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 31 July 2020 (2020-07-31), pages 3249-3258 *
ZHUANG, Chuanzhi et al.: "A Survey of Relation Extraction Research Based on Deep Learning", Journal of Chinese Information Processing, vol. 33, no. 12, 15 December 2019 (2019-12-15), pages 1-18 *
YUAN, Yanhong: "Research on Text Classification Based on the Word2Vec Language Model and Graph Kernel Design", China Masters' Theses Full-text Database, Information Science and Technology, no. 02, 15 February 2017 (2017-02-15), pages 138-4649 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114611463A (en) * 2022-05-10 2022-06-10 天津大学 Dependency analysis-oriented crowdsourcing labeling method and device

Similar Documents

Publication Publication Date Title
Yao et al. An improved LSTM structure for natural language processing
Gallant et al. Representing objects, relations, and sequences
US20200042596A1 (en) On-Device Neural Networks for Natural Language Understanding
CN109509556A (en) Knowledge mapping generation method, device, electronic equipment and computer-readable medium
CN108780464A (en) Method and system for handling input inquiry
CN110990555B (en) End-to-end retrieval type dialogue method and system and computer equipment
US20230394247A1 (en) Human-machine collaborative conversation interaction system and method
CN112100375A (en) Text information generation method and device, storage medium and equipment
CN110046356A (en) Label is embedded in the application study in the classification of microblogging text mood multi-tag
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN116628186B (en) Text abstract generation method and system
CN111597815A (en) Multi-embedded named entity identification method, device, equipment and storage medium
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN108875024B (en) Text classification method and system, readable storage medium and electronic equipment
Yao Attention-based BiLSTM neural networks for sentiment classification of short texts
Alshammari et al. TAQS: an Arabic question similarity system using transfer learning of BERT with BILSTM
CN114417823A (en) Aspect level emotion analysis method and device based on syntax and graph convolution network
CN115269768A (en) Element text processing method and device, electronic equipment and storage medium
CN113553855A (en) Viewpoint role labeling method and device, computer equipment and medium
CN111767720A (en) Title generation method, computer and readable storage medium
Yan et al. Grape diseases and pests named entity recognition based on BiLSTM-CRF
WO2023137903A1 (en) Reply statement determination method and apparatus based on rough semantics, and electronic device
CN114169447B (en) Event detection method based on self-attention convolution bidirectional gating cyclic unit network
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism
Karpagam et al. Deep learning approaches for answer selection in question answering system for conversation agents

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination