CN113255360A - Document rating method and device based on hierarchical self-attention network

Info

Publication number
CN113255360A
Authority
CN
China
Prior art keywords
comment
self
features
attention
sentence
Prior art date
Legal status
Pending
Application number
CN202110418139.8A
Other languages
Chinese (zh)
Inventor
李欣
赵志云
葛自发
孙小宁
张冰
万欣欣
袁钟怡
赵忠华
孙立远
付培国
王禄恒
王晴
Current Assignee
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Application filed by National Computer Network and Information Security Management Center
Priority to CN202110418139.8A
Publication of CN113255360A

Classifications

    • G06F40/30 Semantic analysis
    • G06F40/126 Character encoding
    • G06F40/216 Parsing using statistical methods
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06N3/04 Neural network architecture, e.g. interconnection topology
    • G06N3/08 Neural network learning methods

Abstract

An embodiment of the invention discloses a document rating method and device based on a hierarchical self-attention network. The method comprises the following steps: obtaining a comment text of a target document, wherein the comment text contains a plurality of comments and each comment contains a plurality of sentences; extracting the features of each word in each sentence; extracting, based on a self-attention mechanism, the features of each sentence in each comment from the features of all words contained in that sentence; extracting, based on a self-attention mechanism, the features of each comment from the features of the sentences it contains; extracting, based on a self-attention mechanism, the features of the comment text from the features of the plurality of comments; and generating a rating result for the target document according to the features of the comment text. With the method and device, the deep semantic information contained in the comment text of a target document can be fully captured, and a rating result for the target document can then be given automatically.

Description

Document rating method and device based on hierarchical self-attention network
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a document rating method and device based on a hierarchical self-attention network, electronic equipment and a storage medium.
Background
With the popularization of the mobile internet, netizens have become accustomed to expressing opinions and suggestions online, including evaluations of commodities on e-commerce websites, evaluations of policies on social media, and even review opinions on documents in document rating work. These evaluations contain rich subjective information and emotional orientation. Text sentiment analysis aims to analyze the positive or negative polarity of the evaluation of an object in text, extract key information elements from unstructured text comments, and abstract and classify the descriptive terms. The main tasks of current sentiment analysis include word-level sentiment analysis, sentence- and document-level sentiment analysis, and target-level sentiment analysis. Word-level sentiment analysis studies how to endow words with sentiment information and assign them positive or negative sentiment labels; sentence-level sentiment analysis assigns a sentiment label to a whole sentence; and target-level sentiment analysis considers specific target entities and synthesizes the sentiment analysis of each attribute in the entity's attribute set. The main tasks of sentiment recognition comprise evaluation keyword extraction and evaluation keyword classification. Keyword extraction technology is the core of feature extraction from whole-text information: keywords are the words that best express the central content of a document, and keyword extraction is an important branch of text mining as well as fundamental work for text classification research. Within natural language processing, text comment rating prediction is a rapidly developing technology with broad application prospects.
Currently, most existing research treats text comment rating prediction as a multi-class classification/regression task and performs prediction through supervised machine learning. In this process, most studies focus on how to efficiently extract effective features; however, feature engineering is time-consuming and labor-intensive work. In recent years, with the development of neural networks, various deep-learning-based text classification models have been proposed to learn features automatically from text data and to mine richer textual information. They mainly include: text classification methods based on Recurrent Neural Networks (RNN), classification models based on Convolutional Neural Networks (CNN), the Hierarchical Attention Network (HAN) that introduces an attention mechanism, large-scale pre-trained models based on the Transformer, and the like.
Although these models bring clear improvements in text corpus feature extraction, problems remain in information extraction, and they cannot be applied directly to the tasks of automatic document scoring and decision recommendation.
First, RNN-based models alone work well in NLP tasks that require understanding long-range semantics, but because of their sequential processing of text they are prone to losing associations with earlier textual information. CNNs work well under the very important condition of detecting local and position-invariant patterns and are less sequential than RNNs, but the computational cost of obtaining the relationships between words in a sentence also grows with sentence length, so CNNs face a problem similar to RNNs: a computational bottleneck. In the present task, these models cannot capture deep textual semantic information.
Second, although the hierarchical network model that introduces an attention mechanism provides a hierarchical framework and combines well with deep learning frameworks, it does not address the academic document rating recommendation task and cannot be applied directly to the rating task, for two reasons. First, its layering of document rating data is unreasonable, so deep semantic information in the rating data cannot be fully captured, and a final accept-or-reject binary decision cannot be given automatically. Second, the model is better suited to processing shorter sentences, and its performance on the longer sentences typical of document reviews is not ideal.
Finally, large-scale pre-trained language models based on the Transformer perform excellently in many classification tasks, overcome to some extent the limitation that the computational cost of CNNs and RNNs grows with sentence length, and achieve greater parallelization, but they have no mature, suitable application in the document recommendation task at present.
Disclosure of Invention
It is an object of embodiments of the present invention to address at least the above problems and/or disadvantages and to provide at least the advantages described hereinafter.
The embodiment of the invention provides a document rating method and device based on a hierarchical self-attention network, electronic equipment and a storage medium, which can fully capture deep semantic information contained in a comment text of a target document and further automatically give a rating result aiming at the target document.
In a first aspect, a document rating method based on a hierarchical self-attention network is provided, which includes:
obtaining a comment text of a target document, wherein the comment text contains a plurality of comments, and each comment contains a plurality of sentences;
extracting the characteristics of each word in each sentence;
extracting the characteristics of each sentence in each comment from the characteristics of all words contained in each sentence in each comment on the basis of a self-attention mechanism;
extracting features of each comment from features of the plurality of sentences contained in each comment based on a self-attention mechanism;
extracting features of the comment text from features of the plurality of comments based on a self-attention mechanism;
and generating a rating result of the target document according to the characteristics of the comment text.
Optionally, the extracting features of words in each sentence includes:
and extracting the characteristics of each word in each sentence based on a GloVe pre-training model, a BERT pre-training model or a SciBERT pre-training model.
Optionally, the extracting, based on the self-attention mechanism, the features of each sentence in each comment from the features of all words contained in each sentence in each comment includes:
processing the characteristics of all words contained in each sentence in each comment based on a first self-attention model, extracting the relation between each word and other words in the sentence where the word is located, and obtaining the characteristics of each word contained in each sentence in each comment based on context perception;
and extracting the characteristics of each sentence in each comment from the characteristics of all words contained in each sentence in each comment based on context perception.
Optionally, the extracting features of each sentence in each comment from the context-awareness-based features of all words contained in each sentence in each comment includes:
processing the context-aware-based features of all words contained in each sentence in each comment based on a second self-attention model, and determining the attention weight of the context-aware-based features of each word contained in each sentence in each comment, wherein the attention weight of the context-aware-based features of each word contained in each sentence in each comment is used for representing the importance degree of each word contained in each sentence in each comment in the sentence;
and according to the attention weight of the context-aware-based features of all words contained in each sentence in each comment, performing weighted connection on the context-aware-based features of all words contained in each sentence in each comment to obtain the features of each sentence in each comment.
Optionally, the extracting, based on the self-attention mechanism, features of each comment from features of the plurality of sentences contained in each comment includes:
processing the characteristics of the sentences contained in the comments based on a third self-attention model, extracting the relation between each sentence in each comment and other sentences in the comment, and obtaining the characteristics of each sentence in each comment based on context perception;
extracting features of each comment from context-aware based features of the plurality of sentences contained by each comment.
Optionally, the extracting features of the comments from the context-awareness-based features of the sentences contained in the comments includes:
processing the context-aware-based features of the sentences contained in the comments based on a fourth self-attention model, and determining the attention weight of the context-aware-based features of the sentences in the comments, wherein the attention weight of the context-aware-based features of the sentences in the comments is used for representing the importance degree of the sentences in the comments;
and according to the attention weight of the context-aware-based features of the sentences contained in each comment, performing weighted connection on the context-aware-based features of the sentences contained in each comment to obtain the features of each comment.
Optionally, the extracting features of the comment text from the features of the plurality of comments based on a self-attention mechanism includes:
processing the characteristics of the comments based on a fifth self-attention model, extracting the relationship between each comment and other comments in the comment text, and obtaining the characteristics of each comment based on context perception;
extracting features of the comment text from context-aware based features of the plurality of comments.
Optionally, the extracting the feature of the comment text from the context-awareness-based features of the plurality of comments includes:
processing the context-awareness-based features of the comments based on a sixth self-attention model, and determining the attention weight of the context-awareness-based features of the comments, wherein the attention weight of the context-awareness-based features of the comments is used for representing the importance degree of the comments in the comment text;
and according to the attention weight of the feature based on the context perception of the comments, performing weighted connection on the feature based on the context perception of the comments to obtain the feature of the comment text.
Optionally, before the features of the multiple comments are processed based on the fifth self-attention model to extract the relationship between each comment and the other comments in the comment text and obtain the context-aware feature of each comment, the method further includes:
processing the features of the plurality of comments based on a bidirectional GRU neural network model.
Optionally, the first self-attention model is a bidirectional self-attention model and/or the second self-attention model is a multidimensional source2token self-attention model.
Optionally, the third self-attention model is a bidirectional self-attention model, and/or the fourth self-attention model is a multidimensional source2token self-attention model.
Optionally, the fifth self-attention model is a bidirectional self-attention model, and/or the sixth self-attention model is a multidimensional source2token self-attention model.
Optionally, after extracting the feature of the comment text from the features of the plurality of comments based on the self-attention mechanism, the method further includes:
and generating a rating result of each comment aiming at the target document according to the characteristics of each comment.
In a second aspect, a document rating apparatus based on a hierarchical self-attention network is provided, including:
the comment text acquisition module is used for acquiring a comment text of a target document, wherein the comment text contains a plurality of comments, and each comment contains a plurality of sentences;
the word feature extraction module is used for extracting the features of all words in all sentences;
the sentence characteristic extraction module is used for extracting the characteristics of each sentence in each comment from the characteristics of all words contained in each sentence in each comment on the basis of a self-attention mechanism;
the comment feature extraction module is used for extracting features of comments from the features of the sentences contained in the comments based on a self-attention mechanism;
a comment text feature extraction module, configured to extract features of the comment text from the features of the plurality of comments based on a self-attention mechanism;
and the rating result generating module is used for generating a rating result of the target document according to the characteristics of the comment text.
In a third aspect, an electronic device is provided, including: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method described above.
In a fourth aspect, a storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the method described above.
The embodiment of the invention at least comprises the following beneficial effects:
According to the document rating method and device based on the hierarchical self-attention network provided by the embodiment of the invention, a comment text of a target document is first obtained, where the comment text contains a plurality of comments and each comment contains a plurality of sentences; the features of each word in each sentence are extracted; based on a self-attention mechanism, the features of each sentence in each comment are extracted from the features of all words contained in that sentence; the features of each comment are then extracted from the features of the sentences it contains; the features of the comment text are extracted from the features of the plurality of comments; and finally a rating result of the target document is generated according to the features of the comment text. The method and device divide the comment text of the target document into three levels, namely within a sentence, within a comment, and between comments, and perform feature extraction at each level based on a self-attention mechanism, fully extracting the deep semantic information of the whole comment text and thereby realizing automatic rating of the target document.
Additional advantages, objects, and features of embodiments of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of embodiments of the invention.
Drawings
FIG. 1 is a schematic diagram of a hierarchical self-attention network according to an embodiment of the present invention;
FIG. 2 is a flow diagram of a method for document rating based on a hierarchical self-attention network according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a document rating device based on a hierarchical self-attention network according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described in further detail with reference to the accompanying drawings so that those skilled in the art can implement the embodiments of the invention with reference to the description.
The existing hierarchical network model that introduces an attention mechanism cannot be applied directly to the document rating task, mainly because its layering of the rating data is unreasonable: only two levels are considered, the word level and the level within a single review, while the relationships between comments are ignored, so deep semantic information in the rating data cannot be fully captured and a document rating result cannot be given automatically. Research has shown, however, that the rating data of a document has three levels, namely within a sentence, within a comment, and between comments. Different comments appear to be independent of one another, but in fact a game-like relationship between positive and negative opinions exists among them, so integrating the different comments is important for a comprehensive analysis of the document rating data and for rating the document. On this basis, the embodiment of the invention provides a document rating method based on a hierarchical self-attention network, which divides the comment text of a target document into three levels, namely within a sentence, within a comment, and between comments, and performs feature extraction at each level based on a self-attention mechanism, so that the deep semantic information of the whole comment text can be fully extracted and automatic rating of the target document realized. In addition, the method adopts a self-attention mechanism in place of the traditional attention mechanism, so that information conforming to human emotional cognition is captured more accurately.
The embodiment of the invention divides the comment text into three levels: within a sentence, within a comment, and between comments. Based on this division, the embodiment provides a hierarchical self-attention network for analyzing and processing document rating data with this three-layer structure. As shown in fig. 1, in some embodiments the hierarchical self-attention network is implemented on a Transformer-style framework with three encoders: a sentence-level encoder, an intra-comment encoder, and an inter-comment encoder. The three encoders encode layer by layer, effectively capturing the semantic information of each layer and finally providing a more accurate vector representation of the comment text features. The automatic document rating process is described below in conjunction with the hierarchical self-attention network shown in fig. 1.
Fig. 2 is a flowchart of a document rating method based on a hierarchical self-attention network according to an embodiment of the present invention, where the method is performed by a system with processing capability, a server device, or a document rating apparatus based on a hierarchical self-attention network. As shown in fig. 2, the method includes:
Step 210, obtaining a comment text of a target document, wherein the comment text contains a plurality of comments, and each comment contains a plurality of sentences.
Here, the comment text of the target document is rating data of the target document, and the comment text may contain a plurality of comments, each of which contains a plurality of sentences. It should be understood that each sentence further contains at least one word. The target document may be a scholarly paper or other type of document. When an academic paper is taken as a target document, the document rating method based on the hierarchical self-attention network provided by the embodiment of the invention can realize automatic review of the academic paper.
Step 220, extracting the characteristics of each word in each sentence.
In some embodiments, the extracting of the features of the words in each sentence includes: extracting the features of each word in each sentence based on a GloVe pre-training model, a BERT pre-training model or a SciBERT pre-training model. Specifically, a GloVe, BERT or SciBERT pre-training model is first loaded, and the words in each sentence are then converted into vectors to obtain the word feature vector of each word in each sentence. Here, the word feature vector of each word is used to indicate that word's features.
The GloVe pre-training model generates word feature vectors from a co-occurrence matrix and can capture both local and global information during computation, so it captures word-level semantic information well. The BERT and SciBERT pre-training models encode on the basis of the Transformer and can take contextual features fully into account. Selecting such a pre-training model to extract the features of the words contained in each sentence of each comment therefore yields more accurate word feature vectors.
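By way of illustration, the following Python sketch shows this encoding step with the HuggingFace transformers library; the checkpoint name is the publicly released SciBERT model from AllenAI, and taking the last hidden states as per-word features is an assumption, since the embodiment does not prescribe a particular implementation.

    import torch
    from transformers import AutoTokenizer, AutoModel

    # SciBERT checkpoint published by AllenAI; GloVe or plain BERT could be
    # substituted here, as the embodiment allows.
    name = "allenai/scibert_scivocab_uncased"
    tokenizer = AutoTokenizer.from_pretrained(name)
    encoder = AutoModel.from_pretrained(name)

    sentence = "The proposed method is novel but the evaluation is limited."
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = encoder(**inputs)

    # Shape (1, seq_len, hidden): one feature vector per (sub)word token,
    # used as the word features fed to the sentence-level encoder.
    word_features = out.last_hidden_state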
Next, the features of all words in each sentence obtained in step 220 are used as input to the sentence-level encoder of the hierarchical self-attention network, which performs feature extraction for each sentence in each comment.
Step 230, extracting the features of each sentence in each comment from the features of all words contained in each sentence in each comment based on the self-attention mechanism.
In practical document rating applications, the sentences of the comments in the processed comment text are relatively long, and different words in the same sentence can be far apart; that is, the semantic relationships between words in a sentence exhibit long-range dependencies. The embodiment of the invention therefore uses the first self-attention model to embed words in a context-dependent way, effectively capturing the relationships between words at different positions in a sentence and extracting word features that contain more semantic information. In some embodiments, the extracting features of each of the sentences in each of the comments from features of all words contained in each of the sentences in each of the comments based on a self-attention mechanism includes:
Step S11: processing the features of all words contained in each sentence in each comment based on the first self-attention model, extracting the relationship between each word and the other words in its sentence, and obtaining the context-aware features of each word contained in each sentence in each comment.
In some examples, the first self-attention model may be a self-attention network (SAN). Compared with traditional CNN and RNN models, a SAN is more flexible in modeling both long-range and local dependencies. Preferably, the first self-attention model is a bidirectional self-attention model (Bi-SAN). The bidirectional self-attention model learns in both directions through a forward self-attention network and a backward self-attention network to capture contextual information within the sentence, computing for each word a new context-aware vector representation that contains more semantic information. This means the bidirectional model can provide more accurate word feature vectors than an ordinary SAN. In addition, a Bi-SAN is faster and more space-efficient, which helps improve the efficiency of the document rating task.
Specifically, the word feature vectors, obtained from the pre-training model, of all words contained in each sentence of each comment are input to the bidirectional self-attention model (Bi-SAN). The Bi-SAN provides a forward self-attention network and a backward self-attention network; two attention matrices are constructed and attention probability distributions are computed in the two directions, the weighted word-vector combinations for the two directions are obtained from the computed distributions, and the two results are then concatenated to give a refined vector of twice the dimensionality. The result of this computation is the context-aware feature vector of each word.
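The embodiment does not give code, but the following PyTorch sketch illustrates one plausible reading of the Bi-SAN step: ordinary scaled dot-product attention with a strictly forward mask and a strictly backward mask, whose outputs are concatenated. The class and variable names are illustrative, and dot-product scoring is a simplification of the multi-dimensional token2token attention used in the DiSAN literature.

    import torch
    import torch.nn as nn

    class DirectionalSelfAttention(nn.Module):
        # One branch of the Bi-SAN: each token attends only to tokens before
        # it (forward branch) or after it (backward branch).
        def __init__(self, dim, forward_dir=True):
            super().__init__()
            self.q = nn.Linear(dim, dim)
            self.k = nn.Linear(dim, dim)
            self.v = nn.Linear(dim, dim)
            self.forward_dir = forward_dir

        def forward(self, x):                      # x: (batch, seq, dim)
            scores = self.q(x) @ self.k(x).transpose(1, 2) / x.size(-1) ** 0.5
            n = x.size(1)
            # Allowed positions: strictly earlier (forward) or later (backward).
            mask = torch.ones(n, n, device=x.device).tril(-1).bool()
            if not self.forward_dir:
                mask = mask.T
            scores = scores.masked_fill(~mask, float("-inf"))
            # Boundary tokens have no valid positions; zero out their rows.
            attn = torch.softmax(scores, dim=-1).nan_to_num(0.0)
            return attn @ self.v(x)

    class BiSAN(nn.Module):
        # Concatenating both directions yields the double-dimension, refined,
        # context-aware vector described above.
        def __init__(self, dim):
            super().__init__()
            self.fwd = DirectionalSelfAttention(dim, forward_dir=True)
            self.bwd = DirectionalSelfAttention(dim, forward_dir=False)

        def forward(self, x):
            return torch.cat([self.fwd(x), self.bwd(x)], dim=-1)  # (batch, seq, 2*dim)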
Step S12: extracting the features of each sentence in each comment from the context-aware features of all words contained in that sentence.
In practical applications, different words in the same sentence influence the sentence differently; that is, the importance of different words in the same sentence differs. The embodiment of the invention uses the second self-attention model to determine the attention weight of the context-aware feature of each word contained in each sentence of each comment, and connects the context-aware features of all words in each sentence by weighting with these attention weights, so that the resulting sentence feature reflects the sentence's semantic information more accurately. In some embodiments, the extracting features of each sentence in each comment from context-awareness-based features of all words contained in each sentence in each comment includes:
step S121, based on the second self-attention model, processing the context-aware-based features of all the words contained in each sentence in each comment, and determining the attention weight of the context-aware-based features of each word contained in each sentence in each comment, where the attention weight of the context-aware-based features of each word contained in each sentence in each comment is used to represent the importance degree of each word contained in each sentence in each comment in the sentence in which the word is located.
And step S122, carrying out weighted connection on the context-aware-based characteristics of all words contained in each sentence in each comment according to the attention weight of the context-aware-based characteristics of all words contained in each sentence in each comment, and obtaining the characteristics of each sentence in each comment.
Specifically, the second self-attention model assigns a learnable self-attention weight to the context-aware feature vector of each word contained in each sentence, computes attention scores, constructs an attention matrix using the softmax function, and finally outputs, according to the attention weights in the attention matrix, the weighted sum of the context-aware feature vectors of all words in each sentence as the feature vector of that sentence.
In some examples, the second self-attention model may be a self-attention network (SAN). Preferably, the second self-attention model is a multi-dimensional source2token self-attention model. Traditional attention models compute a single similarity score per word from its embedding and cannot distinguish the meanings of the same word in different contexts. A multi-dimensional self-attention model computes a score for each feature of each word, so it can select the features that best describe the specific meaning of a word in a given context, yielding sentence features that better match the sentence's semantics. In addition, the source2token module of the multi-dimensional source2token self-attention model explores the dependency between each word and the whole sentence, finally compressing the sentence's features into a single vector expression.
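A sketch of the multi-dimensional source2token step follows, continuing the PyTorch sketch above; the two-layer scoring network with an ELU activation follows the DiSAN formulation and is an assumption here.

    class Source2Token(nn.Module):
        # Computes one score per feature per token (multi-dimensional),
        # normalises the scores over the sequence, and compresses the
        # sequence into a single weighted-sum vector.
        def __init__(self, dim):
            super().__init__()
            self.score = nn.Sequential(nn.Linear(dim, dim), nn.ELU(),
                                       nn.Linear(dim, dim))

        def forward(self, x):                      # x: (batch, seq, dim)
            weights = torch.softmax(self.score(x), dim=1)  # over the sequence
            return (weights * x).sum(dim=1)        # (batch, dim)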
In fig. 1, the sentence-level encoder is implemented by a bidirectional self-attention model and a multidimensional source2token self-attention model. Here, [w_{i,j,1}, w_{i,j,2}, …, w_{i,j,L}] denotes the feature vectors of the 1st to Lth words of the jth sentence of the ith comment in the comment text, [we_{i,j,1}, we_{i,j,2}, …, we_{i,j,L}] denotes the context-aware feature vectors generated from them, s_{i,j} denotes the feature vector of the jth sentence of the ith comment, and s_{i,N} denotes the feature vector of the Nth sentence of the ith comment. As can be seen from fig. 1, for the jth sentence of any comment, after the features of all its words are input to the sentence-level encoder, the bidirectional self-attention model (Bi-SAN) regenerates a context-aware feature vector for each word; the multidimensional source2token self-attention model then processes these context-aware feature vectors to obtain an attention matrix, from which the attention weight of each word's context-aware feature vector is determined; and the weighted sum of all the context-aware word feature vectors and their attention weights yields the feature vector s_{i,j} of the jth sentence. The other sentences of the ith comment are processed in the same way until the feature vectors [s_{i,1}, s_{i,2}, …, s_{i,N}] of all its sentences are obtained. In this way, the sentences of every comment in the comment text are processed by the sentence-level encoder to obtain a feature vector for each sentence of each comment.
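Composing the two modules sketched above gives a sentence-level encoder of the kind the figure describes (a sketch only; the dimensions are illustrative):

    class SentenceEncoder(nn.Module):
        # Bi-SAN -> context-aware word vectors; source2token -> one sentence
        # vector per sentence.
        def __init__(self, word_dim):
            super().__init__()
            self.bisan = BiSAN(word_dim)
            self.s2t = Source2Token(2 * word_dim)

        def forward(self, words):                  # (batch, L, word_dim)
            return self.s2t(self.bisan(words))     # (batch, 2 * word_dim)

    # e.g. 32 sentences of 40 words with 768-dim BERT features -> (32, 1536)
    s = SentenceEncoder(768)(torch.randn(32, 40, 768))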
Next, the feature vectors of all sentences in each comment obtained by the sentence-level encoder are used as input to the intra-comment encoder for the next stage of processing.
Step 240, extracting features of each comment from the features of the sentences contained in each comment based on a self-attention mechanism.
In practical applications, comment texts are usually long, so the relationships between different sentences within a comment, and the degree to which one sentence influences another, need to be learned and mined. In some embodiments, said extracting features of each comment from features of the plurality of sentences contained in each comment based on a self-attention mechanism includes:
step S21, based on the third self-attention model, processing the features of the sentences contained in each comment, extracting the relationship between each sentence in each comment and other sentences in the comment, and obtaining the features of each sentence in each comment based on context perception.
In some examples, the third self-attention model may be a self-attention network (SAN). Compared with traditional CNN and RNN models, a SAN is more flexible in modeling both long-range and local dependencies. Preferably, the third self-attention model is a bidirectional self-attention model (Bi-SAN). The bidirectional self-attention model learns in both directions through a forward self-attention network and a backward self-attention network to capture contextual information within the comment, computing for each sentence a new context-aware vector representation that contains more semantic information. This means the bidirectional model can provide more accurate sentence feature vectors than an ordinary SAN. In addition, a Bi-SAN is faster and more space-efficient, which helps improve the efficiency of the document rating task.
Specifically, the feature vectors of all sentences in each comment are input to the bidirectional self-attention model (Bi-SAN). The Bi-SAN provides a forward self-attention network and a backward self-attention network; two attention matrices are constructed and attention probability distributions are computed in the two directions, the weighted sentence-vector combinations for the two directions are obtained from the computed distributions, and the two results are then concatenated to give a refined vector of twice the dimensionality. The result of this computation is the context-aware feature vector of each sentence in each comment.
Step S22: extracting the features of each comment from the context-aware features of the plurality of sentences contained in that comment.
In practical applications, a comment in a comment text usually contains several sentences with differing information content, and different sentences influence the comment differently; that is, the importance of different sentences in the same comment differs. The embodiment of the invention uses the fourth self-attention model to assign each sentence an initial weight and, through learning, to assign higher weights to the informative sentences that have more influence on the output, so that the resulting comment feature reflects the comment's semantic information more accurately. In some embodiments, said extracting features of each comment from context-aware based features of the plurality of sentences contained in each comment comprises:
step S221, based on the fourth self-attention model, processing the context-aware-based features of the multiple sentences contained in each comment, and determining an attention weight of the context-aware-based features of each sentence in each comment, where the attention weight of the context-aware-based features of each sentence in each comment is used to represent an importance degree of each sentence in the comment.
Step S222, performing weighted connection on the context-aware features of the multiple sentences included in each comment according to the attention weight of the context-aware features of the multiple sentences included in each comment, so as to obtain the features of each comment.
Specifically, the fourth self-attention model assigns a learnable self-attention weight to the context-aware feature vector of each sentence in each comment, computes attention scores, constructs an attention matrix using the softmax function, and finally outputs, according to the attention weights in the attention matrix, the weighted sum of the context-aware feature vectors of all sentences in each comment as the feature vector of that comment.
In some examples, the fourth self-attention model may be a self-attention network (SAN). Preferably, the fourth self-attention model is a multi-dimensional source2token self-attention model. The multi-dimensional self-attention model computes scores for each feature of each sentence, so it can select the features that best describe the specific meaning of the sentence, yielding comment features that better match the comment's semantics. In addition, the source2token module of the multi-dimensional source2token self-attention model explores the dependency between each sentence and the whole comment, finally compressing the comment's features into a single vector expression.
In fig. 1, the intra-comment encoder is likewise implemented by a bidirectional self-attention model and a multidimensional source2token self-attention model. Here, [s_{i,1}, s_{i,2}, …, s_{i,N}] denotes the feature vectors of the 1st to Nth sentences of the ith comment in the comment text, [se_{i,1}, se_{i,2}, …, se_{i,N}] denotes the context-aware feature vectors generated from them, r_i denotes the feature vector of the ith comment, and r_M denotes the feature vector of the Mth comment. As can be seen from fig. 1, for the ith comment in the comment text, after the feature vectors of all its sentences are input to the intra-comment encoder, the bidirectional self-attention model (Bi-SAN) regenerates a context-aware feature vector for each sentence; the multidimensional source2token self-attention model then processes these to obtain an attention matrix, from which the attention weight of each sentence's context-aware feature vector is determined; and the weighted sum of all the context-aware sentence feature vectors and their attention weights yields the feature vector r_i of the ith comment. The other comments in the comment text are processed in the same way until the feature vectors [r_1, r_2, …, r_M] of all comments are obtained.
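Structurally, the intra-comment encoder repeats the sentence-level composition one level up; a sketch reusing the modules above (dimensions illustrative):

    class CommentEncoder(nn.Module):
        # Sentence vectors (batch, N, sent_dim) -> one comment vector r_i.
        def __init__(self, sent_dim):
            super().__init__()
            self.bisan = BiSAN(sent_dim)
            self.s2t = Source2Token(2 * sent_dim)

        def forward(self, sents):                  # (batch, N, sent_dim)
            return self.s2t(self.bisan(sents))     # (batch, 2 * sent_dim)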
Next, the feature vectors of all comments obtained by the intra-comment encoder are used as input to the inter-comment encoder for the next stage of processing.
Step 250, extracting the feature of the comment text from the features of the comments based on a self-attention mechanism.
In practical applications, the comment text is usually long and contains several comments, and a game-like relationship between positive and negative opinions exists among different comments, so integrating the different comments is very important for a comprehensive analysis of the document rating data and for rating the document. The embodiment of the invention therefore uses the fifth self-attention model to effectively capture the relationships between different comments and their degree of mutual influence, extracting comment features that contain contextual association information. In some embodiments, said extracting features of said comment text from features of said plurality of comments based on a self-attention mechanism comprises:
step S31, based on the fifth self-attention model, processing the features of the comments, extracting the relationship between each comment and other comments in the comment text, and obtaining the feature of each comment based on context awareness.
In some examples, the fifth self-attention model may be a self-attention network (SAN). Compared with traditional CNN and RNN models, a SAN is more flexible in modeling both long-range and local dependencies. Preferably, the fifth self-attention model is a bidirectional self-attention model (Bi-SAN). The bidirectional self-attention model learns in both directions through a forward self-attention network and a backward self-attention network to capture contextual information within the comment text, computing for each comment a new context-aware vector representation that contains more semantic information. This means the bidirectional model can provide more accurate comment feature vectors than an ordinary SAN. In addition, a Bi-SAN is faster and more space-efficient, which helps improve the efficiency of the document rating task.
Specifically, the feature vectors of all comments in the comment text are input to the bidirectional self-attention model (Bi-SAN). The Bi-SAN provides a forward self-attention network and a backward self-attention network; two attention matrices are constructed and attention probability distributions are computed in the two directions, the weighted comment-vector combinations for the two directions are obtained from the computed distributions, and the two results are then concatenated to give a refined vector of twice the dimensionality. The result of this computation is the context-aware feature vector of each comment.
In addition, considering that the relationships between comments are more complicated than those within a sentence or within a comment, the features of all comments extracted in step 240 are preprocessed with a recurrent neural network model before the fifth self-attention model processes them, so that the features of the comment text can be extracted more fully.
In some examples, before the features of the plurality of comments are processed based on the fifth self-attention model to extract the relationship between each comment and the other comments in the comment text and obtain the context-aware features of each comment, the method further includes: processing the features of the plurality of comments based on a bidirectional GRU neural network model.
The recurrent neural network model may be an LSTM or a GRU model; a bidirectional GRU neural network model is preferred. The bidirectional GRU helps extract, for each comment, semantic information contained in the other comments. Compared with the LSTM, the GRU has a simpler structure and a shorter training time, so model complexity can be reduced while the dependency relationships are still captured well. Furthermore, based on the bidirectional GRU, a more compact and effective vector representation of each comment, consistent with its semantics and syntax, can be obtained; the bidirectional self-attention model is then used to extract the relationships between comments. These two models together enhance the integration of different comments and markedly improve the performance of document rating decisions.
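A minimal sketch of this pre-encoding step, reusing the PyTorch conventions above (the hidden size is an illustrative assumption):

    class CommentPreEncoder(nn.Module):
        # Bi-GRU over the sequence of comment vectors, so each comment's
        # representation also reflects the other comments.
        def __init__(self, dim, hidden=256):
            super().__init__()
            self.gru = nn.GRU(dim, hidden, batch_first=True, bidirectional=True)

        def forward(self, comments):               # (batch, M, dim)
            out, _ = self.gru(comments)            # (batch, M, 2*hidden)
            return out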
Step S32: extracting the features of the comment text from the context-aware features of the plurality of comments.
In practical applications, different comments in the comment text affect the document rating decision to different degrees. The embodiment of the invention therefore uses the sixth self-attention model to assign higher attention weights to the comments that have more influence on the overall document rating decision, so as to extract features that reflect the semantic information of the comment text more accurately. In some embodiments, said extracting features of said comment text from context-aware based features of said plurality of comments comprises:
step S321, based on a sixth self-attention model, processing the features based on context awareness of the multiple comments, and determining an attention weight of the features based on context awareness of each comment, where the attention weight of the features based on context awareness of each comment is used to represent an importance degree of each comment in the comment text.
Step S322, performing weighted connection on the features of the comments based on the context perception according to the attention weights of the features of the comments based on the context perception to obtain the features of the comment text.
Specifically, the sixth self-attention model assigns a learnable self-attention weight to the context-aware feature vector of each comment, computes attention scores, constructs an attention matrix using the softmax function, and finally outputs, according to the attention weights in the attention matrix, the weighted sum of the context-aware feature vectors of all comments as the feature vector of the comment text.
In some examples, the sixth self-attention model may be a self-attention network (SAN). Preferably, the sixth self-attention model is a multi-dimensional source2token self-attention model. The multi-dimensional self-attention model computes scores for each feature of each comment, so it can select the features that best describe the specific meaning of the comment, yielding features that better match the semantics of the comment text. In addition, the source2token module of the multi-dimensional source2token self-attention model explores the dependency between each comment and the whole comment text, finally compressing the features of the comment text into a single vector expression.
In fig. 1, the inter-comment encoder is implemented by a bidirectional GRU neural network model, a bidirectional self-attention model, and a multidimensional source2token self-attention model. Here, [r_1, r_2, …, r_M] denotes the feature vectors of the 1st to Mth comments in the comment text, [re_1, re_2, …, re_M] denotes the context-aware feature vectors generated from them, and rs denotes the feature vector of the comment text. As can be seen from fig. 1, after the feature vectors of all comments contained in the comment text are input to the inter-comment encoder, they are first processed by the bidirectional GRU neural network model to obtain new feature vectors; the bidirectional self-attention model (Bi-SAN) then regenerates a context-aware feature vector for each comment; the multidimensional source2token self-attention model processes these to obtain an attention matrix, from which the attention weight of each comment's context-aware feature vector is determined; and the weighted sum of all the context-aware comment feature vectors and their attention weights yields the feature vector rs of the comment text.
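Composing the three modules gives an inter-comment encoder of the kind the figure describes, reusing the sketches above (dimensions illustrative):

    class InterCommentEncoder(nn.Module):
        # Bi-GRU pre-encoding -> Bi-SAN context-aware comment vectors ->
        # source2token compression into the comment-text vector rs.
        def __init__(self, comment_dim, hidden=256):
            super().__init__()
            self.pre = CommentPreEncoder(comment_dim, hidden)
            self.bisan = BiSAN(2 * hidden)
            self.s2t = Source2Token(4 * hidden)

        def forward(self, comments):               # (batch, M, comment_dim)
            return self.s2t(self.bisan(self.pre(comments)))  # (batch, 4*hidden)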
Step 260, generating a rating result of the target document according to the features of the comment text.
Specifically, a fully-connected layer based on the Softmax function may be constructed, the features of the comment text obtained in step 250 are input into the fully-connected layer, and a binary prediction for the document rating decision is finally output. The two classes of the document rating decision may be recommended and not recommended, which is not specifically limited by the embodiment of the present invention. The features of the comment text may be represented by the feature vector of the comment text.
As shown in fig. 1, the feature vector rs of the comment text is input into the fully-connected layer, and after processing by the fully-connected layer the rating prediction result for the target document is output.
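A sketch of this decision layer, continuing the example above (class labels and all sizes are assumptions):

    # Decision layer: softmax over two classes (recommend / not recommend).
    encoder = InterCommentEncoder(comment_dim=3072, hidden=256)
    head = nn.Linear(4 * 256, 2)

    reviews = torch.randn(8, 5, 3072)              # 8 documents, 5 comments each
    rs = encoder(reviews)                          # (8, 1024)
    probs = torch.softmax(head(rs), dim=-1)        # (8, 2)
    decision = probs.argmax(dim=-1)                # 0 = not recommended, 1 = recommended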
In some embodiments, after extracting the feature of the comment text from the features of the plurality of comments based on the self-attention mechanism, the method further comprises: generating, according to the features of each comment, a rating result of that comment for the target document. Specifically, a fully-connected layer based on the Softmax function may be constructed, taking the features of each comment of the comment text as input and outputting a rating prediction for each comment, so as to infer each reviewer's numeric score for the target document. The numeric score may range from 1 to 10, which is not specifically limited by the embodiment of the present invention.
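One way to realise this, stated as an assumption (the embodiment only specifies a Softmax-based fully-connected layer over each comment's features), is to treat the 1-10 score as a 10-class prediction over each context-aware comment vector:

    # Per-reviewer score head; the 1024-dim comment vectors match the
    # Bi-SAN output inside the inter-comment encoder sketch above.
    score_head = nn.Linear(4 * 256, 10)            # 10 classes: scores 1..10
    comment_vecs = torch.randn(8, 5, 4 * 256)      # context-aware comment vectors
    scores = score_head(comment_vecs).softmax(dim=-1).argmax(dim=-1) + 1  # in 1..10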
In summary, in the document rating method based on the hierarchical self-attention network provided by the embodiment of the present invention, a comment text of a target document is first obtained, where the comment text contains a plurality of comments and each comment contains a plurality of sentences; the features of each word in each sentence are extracted; based on a self-attention mechanism, the features of each sentence in each comment are extracted from the features of all words contained in that sentence; based on the self-attention mechanism, the features of each comment are extracted from the features of the sentences it contains; based on the self-attention mechanism, the features of the comment text are extracted from the features of the plurality of comments; and finally, a rating result of the target document is generated according to the features of the comment text. The method divides the comment text of the target document into three levels, namely within a sentence, within a comment, and between comments, performs feature extraction at each level based on a self-attention mechanism, and fully extracts the deep semantic information of the whole comment text, thereby realizing automatic rating of the target document.
FIG. 3 is a schematic structural diagram of a document rating device based on a hierarchical self-attention network according to an embodiment of the present invention. As shown in fig. 3, the document rating apparatus 300 based on the hierarchical self-attention network includes: a comment text obtaining module 310, configured to obtain a comment text of a target document, where the comment text includes multiple comments, and each comment includes multiple sentences; a word feature extraction module 320, configured to extract features of words in each sentence; a sentence feature extraction module 330, configured to extract features of each sentence in each comment from features of all words included in each sentence in each comment based on a self-attention mechanism; a comment feature extraction module 340, configured to extract features of each comment from features of the plurality of sentences included in each comment based on a self-attention mechanism; a comment text feature extraction module 350, configured to extract features of the comment text from the features of the plurality of comments based on a self-attention mechanism; and the rating result generating module 360 is configured to generate a rating result of the target document according to the feature of the comment text.
Fig. 4 shows an electronic device of an embodiment of the invention. As shown in fig. 4, the electronic device 400 includes: at least one processor 410, and a memory 420 communicatively coupled to the at least one processor 410, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method.
Specifically, the memory 420 and the processor 410 are connected via the bus 430 and may be a general-purpose memory and processor, which are not specifically limited here; when the processor 410 executes a computer program stored in the memory 420, the operations and functions described in the embodiments of the present invention in conjunction with figs. 1 to 3 can be performed.
An embodiment of the present invention further provides a storage medium on which a computer program is stored; when executed by a processor, the program implements the method described above. For specific implementation, reference may be made to the method embodiments, which are not repeated here.
While embodiments of the present invention have been disclosed above, they are not limited to the applications listed in the description and the embodiments; they are fully applicable to a variety of fields suited to the embodiments of the present invention. Additional modifications will readily occur to those skilled in the art. Therefore, without departing from the general concept defined by the claims and their equivalents, the embodiments of the invention are not limited to the specific details and illustrations shown and described herein.

Claims (10)

1. A document rating method based on a hierarchical self-attention network, characterized by comprising the following steps:
obtaining a comment text of a target document, wherein the comment text contains a plurality of comments, and each comment contains a plurality of sentences;
extracting features of each word in each sentence;
extracting features of each sentence in each comment from the features of all words contained in each sentence in each comment based on a self-attention mechanism;
extracting features of each comment from the features of the plurality of sentences contained in each comment based on a self-attention mechanism;
extracting features of the comment text from the features of the plurality of comments based on a self-attention mechanism;
and generating a rating result of the target document according to the features of the comment text.
2. The document rating method based on a hierarchical self-attention network according to claim 1, wherein the extracting features of each sentence in each comment from the features of all words contained in each sentence in each comment based on a self-attention mechanism comprises:
processing the features of all words contained in each sentence in each comment based on a first self-attention model, and extracting the relationship between each word and the other words in the sentence where the word is located, to obtain context-aware features of each word contained in each sentence in each comment;
and extracting the features of each sentence in each comment from the context-aware features of all words contained in each sentence in each comment.
3. The document rating method based on a hierarchical self-attention network according to claim 2, wherein the extracting the features of each sentence in each comment from the context-aware features of all words contained in each sentence in each comment comprises:
processing the context-aware features of all words contained in each sentence in each comment based on a second self-attention model, and determining an attention weight for the context-aware features of each word, wherein the attention weight is used to represent the importance of each word within its sentence;
and performing a weighted combination of the context-aware features of all words contained in each sentence in each comment according to their attention weights, to obtain the features of each sentence in each comment.
4. The document rating method based on a hierarchical self-attention network according to claim 3, wherein the extracting features of each comment from the features of the plurality of sentences contained in each comment based on a self-attention mechanism comprises:
processing the features of the plurality of sentences contained in each comment based on a third self-attention model, and extracting the relationship between each sentence and the other sentences in the comment where it is located, to obtain context-aware features of each sentence in each comment;
and extracting the features of each comment from the context-aware features of the plurality of sentences contained in each comment.
5. The document rating method based on a hierarchical self-attention network according to claim 4, wherein the extracting the features of each comment from the context-aware features of the plurality of sentences contained in each comment comprises:
processing the context-aware features of the plurality of sentences contained in each comment based on a fourth self-attention model, and determining an attention weight for the context-aware features of each sentence, wherein the attention weight is used to represent the importance of each sentence within its comment;
and performing a weighted combination of the context-aware features of the plurality of sentences contained in each comment according to their attention weights, to obtain the features of each comment.
6. The document rating method based on a hierarchical self-attention network according to claim 5, wherein the extracting features of the comment text from the features of the plurality of comments based on a self-attention mechanism comprises:
processing the features of the plurality of comments based on a fifth self-attention model, and extracting the relationship between each comment and the other comments in the comment text, to obtain context-aware features of each comment;
and extracting the features of the comment text from the context-aware features of the plurality of comments.
7. The document rating method based on a hierarchical self-attention network according to claim 6, wherein the extracting the features of the comment text from the context-aware features of the plurality of comments comprises:
processing the context-aware features of the plurality of comments based on a sixth self-attention model, and determining an attention weight for the context-aware features of each comment, wherein the attention weight is used to represent the importance of each comment within the comment text;
and performing a weighted combination of the context-aware features of the plurality of comments according to their attention weights, to obtain the features of the comment text.
8. The document rating method based on a hierarchical self-attention network according to claim 6, wherein before the processing of the features of the plurality of comments based on the fifth self-attention model to extract the relationship between each comment and the other comments in the comment text and obtain context-aware features of each comment, the method further comprises:
processing the features of the plurality of comments based on a bidirectional GRU neural network model.
9. The document rating method based on a hierarchical self-attention network according to claim 7, wherein the first self-attention model is a bidirectional self-attention model, and/or the second self-attention model is a multi-dimensional source2token self-attention model;
and/or the third self-attention model is a bidirectional self-attention model, and/or the fourth self-attention model is a multi-dimensional source2token self-attention model;
and/or the fifth self-attention model is a bidirectional self-attention model, and/or the sixth self-attention model is a multi-dimensional source2token self-attention model.
10. A document rating apparatus based on a hierarchical self-attention network, comprising:
a comment text acquisition module, configured to acquire a comment text of a target document, wherein the comment text contains a plurality of comments, and each comment contains a plurality of sentences;
a word feature extraction module, configured to extract features of each word in each sentence;
a sentence feature extraction module, configured to extract features of each sentence in each comment from the features of all words contained in each sentence in each comment based on a self-attention mechanism;
a comment feature extraction module, configured to extract features of each comment from the features of the plurality of sentences contained in each comment based on a self-attention mechanism;
a comment text feature extraction module, configured to extract features of the comment text from the features of the plurality of comments based on a self-attention mechanism;
and a rating result generating module, configured to generate a rating result of the target document according to the features of the comment text.
CN202110418139.8A 2021-04-19 2021-04-19 Document rating method and device based on hierarchical self-attention network Pending CN113255360A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110418139.8A CN113255360A (en) 2021-04-19 2021-04-19 Document rating method and device based on hierarchical self-attention network

Publications (1)

Publication Number Publication Date
CN113255360A true CN113255360A (en) 2021-08-13

Family

ID=77221017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110418139.8A Pending CN113255360A (en) 2021-04-19 2021-04-19 Document rating method and device based on hierarchical self-attention network

Country Status (1)

Country Link
CN (1) CN113255360A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108363753A (en) * 2018-01-30 2018-08-03 南京邮电大学 Comment text sentiment classification model is trained and sensibility classification method, device and equipment
CN109145112A (en) * 2018-08-06 2019-01-04 北京航空航天大学 A kind of comment on commodity classification method based on global information attention mechanism
CN109558487A (en) * 2018-11-06 2019-04-02 华南师范大学 Document Classification Method based on the more attention networks of hierarchy
CN110517121A (en) * 2019-09-23 2019-11-29 重庆邮电大学 Method of Commodity Recommendation and the device for recommending the commodity based on comment text sentiment analysis
CN111046233A (en) * 2019-12-24 2020-04-21 浙江大学 Video label determination method based on video comment text

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116562304A (en) * 2023-07-06 2023-08-08 广东亚齐信息技术股份有限公司 File intelligent open identification method based on artificial intelligence and multidimensional semantic understanding
CN116562304B (en) * 2023-07-06 2024-03-01 广东亚齐信息技术股份有限公司 File intelligent open identification method based on artificial intelligence and multidimensional semantic understanding

Similar Documents

Publication Publication Date Title
CN108363753B (en) Comment text emotion classification model training and emotion classification method, device and equipment
US20220147836A1 (en) Method and device for text-enhanced knowledge graph joint representation learning
CN110059188B (en) Chinese emotion analysis method based on bidirectional time convolution network
CN110321563B (en) Text emotion analysis method based on hybrid supervision model
CN110825845A (en) Hierarchical text classification method based on character and self-attention mechanism and Chinese text classification method
CN109214006B (en) Natural language reasoning method for image enhanced hierarchical semantic representation
CN110287323B (en) Target-oriented emotion classification method
CN110502753A (en) A kind of deep learning sentiment analysis model and its analysis method based on semantically enhancement
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN112256866A (en) Text fine-grained emotion analysis method based on deep learning
CN111538841B (en) Comment emotion analysis method, device and system based on knowledge mutual distillation
CN113822340A (en) Image-text emotion recognition method based on attention mechanism
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN113987167A (en) Dependency perception graph convolutional network-based aspect-level emotion classification method and system
CN114428850A (en) Text retrieval matching method and system
Ma et al. A transformer-based model with self-distillation for multimodal emotion recognition in conversations
Chaudhuri Visual and text sentiment analysis through hierarchical deep learning networks
CN115018941A (en) Text-to-image generation algorithm based on improved version text parser
CN113076425B (en) Event related viewpoint sentence classification method for microblog comments
Liu et al. Unveiling consumer preferences in automotive reviews through aspect-based opinion generation
Mahima et al. A text-based hybrid approach for multiple emotion detection using contextual and semantic analysis
CN113901228A (en) Cross-border national text classification method and device fusing domain knowledge graph
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
CN113255360A (en) Document rating method and device based on hierarchical self-attention network
CN116758558A (en) Cross-modal generation countermeasure network-based image-text emotion classification method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20210813)