CN116187339B - Automatic composition scoring method based on feature semantic fusion of double-tower model - Google Patents


Info

Publication number
CN116187339B
Authority
CN
China
Prior art keywords
composition
representation
semantic
level
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310104079.1A
Other languages
Chinese (zh)
Other versions
CN116187339A (en)
Inventor
刘杰
王炯
张磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Capital Normal University
Original Assignee
Capital Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Capital Normal University filed Critical Capital Normal University
Priority to CN202310104079.1A priority Critical patent/CN116187339B/en
Publication of CN116187339A publication Critical patent/CN116187339A/en
Application granted granted Critical
Publication of CN116187339B publication Critical patent/CN116187339B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/268Morphological analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an automatic composition scoring method based on feature-semantic fusion in a double-tower model, which comprises the following steps: constructing an initial double-tower model; obtaining a composition part-of-speech representation from non-target subject composition data, obtaining word-level composition feature representations through a convolutional neural network based on the part-of-speech representation, and obtaining sentence-level part-of-speech representations through a recurrent neural network based on the word-level feature representations; obtaining composition semantic representations from the non-target subject composition data, and obtaining sentence-level semantic representations with a recurrent neural network based on the composition semantic representations; merging the sentence-level part-of-speech representation and the sentence-level semantic representation, obtaining a composition representation using an attention mechanism, and training the initial double-tower model based on the composition representation and manual composition features to obtain a trained double-tower model; and obtaining target subject composition data, inputting it into the trained double-tower model, and obtaining the composition score.

Description

Automatic composition scoring method based on feature semantic fusion of double-tower model
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a composition automatic scoring method for feature semantic fusion based on a double-tower model.
Background
Automatic scoring of compositions is an important task in the field of natural language processing; its aim is to score compositions automatically by machine instead of by human raters. An automatic composition scoring system has broad application prospects in the education field: provided the scoring remains scientific and reasonable, using a machine improves scoring efficiency and reduces the subjectivity of manual scoring, thereby guaranteeing the fairness of scoring as much as possible.
Currently, with the rise of deep learning technology, most mainstream automatic composition scoring systems are end-to-end methods based on deep learning. Taghipour et al. use convolutional and recurrent neural networks to learn semantic representations of compositions directly and predict composition scores. Dong et al. use a hierarchical modeling approach that builds word-level, sentence-level, and document-level representations of a composition in turn through a multi-layer neural network to predict the final result; Dong et al. later introduced attention mechanisms into their hierarchical model to distinguish the contributions of different words and sentences to the document-level representation, highlighting the representations of key words or sentences to improve the prediction. With the development of large-scale pre-trained models in recent years, in particular the fine-tuned BERT model, better results have been shown in various downstream tasks, so applying pre-trained models to automatic composition scoring is a development trend. Uto et al. combine feature engineering with BERT, incorporating feature vectors into the neural network model to predict composition scores; Yang et al. propose a BERT-based automatic scoring method that uses two losses with dynamically adjusted weights to optimize the scoring results.
Although pre-trained neural network models have raised scoring accuracy to a higher level, acquiring labeled data in real application scenarios requires significant human and material resources. Using existing labeled data to score unlabeled data is therefore better suited to practical application, i.e., cross-subject (cross-prompt) scoring research.
In existing research, Jin et al. achieve cross-topic score prediction with a two-stage combination of scores. In the first stage, pseudo labels {0,1} are assigned to the lowest- and highest-quality target-topic articles by extracting prompt-independent features and training a supervised model on non-target-topic articles. In the second stage, the pseudo-labeled target-prompt articles are used as training data for a neural network with topic-specific features to predict scores for all target-topic articles. Although this approach can achieve good performance when no labeled target-subject articles exist, it still requires a large number of unlabeled target-subject articles to assign pseudo labels. Ridley et al., using only the articles of non-target subjects as training data, propose a more generalized representation of the text based on part of speech: each token is replaced by its part-of-speech tag, for example "I" is labeled as a noun and "done" as a verb, so that a passage of the composition becomes a sequence such as noun → verb → adverb → noun. However, because this method extracts only the part-of-speech representation and ignores the semantic representation of the article, it cannot explain why articles with the same part-of-speech representation receive different scores; the text representation lacks this scoring interpretability.
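The part-of-speech view of a composition described above can be illustrated with a small sketch. The tag dictionary and the `pos_sequence` helper below are hypothetical stand-ins for a real POS tagger, not the actual implementation of either cited method:

```python
# Minimal sketch of a prompt-independent part-of-speech representation.
# TAG_LOOKUP is a hypothetical stand-in for a real POS tagger's output.
TAG_LOOKUP = {
    "i": "PRON", "we": "PRON", "done": "VERB", "finished": "VERB",
    "quickly": "ADV", "homework": "NOUN", "the": "DET",
}

def pos_sequence(tokens):
    """Replace each token with its part-of-speech tag, preserving order."""
    return [TAG_LOOKUP.get(t.lower(), "NOUN") for t in tokens]

seq = pos_sequence(["I", "quickly", "finished", "the", "homework"])
print(" -> ".join(seq))  # PRON -> ADV -> VERB -> DET -> NOUN
```

Because the tag sequence discards the actual words, the same representation is produced for essays on any prompt, which is what makes it usable for cross-topic training.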
Disclosure of Invention
The invention aims to provide an automatic composition scoring method based on feature-semantic fusion in a double-tower model, so as to remedy the lack of semantic features when only the part-of-speech representation of articles is used in cross-topic automatic composition scoring, and to improve the weighting of important words, thereby improving the effect of cross-topic automatic composition scoring.
In order to achieve the above purpose, the invention provides a composition automatic scoring method for feature semantic fusion based on a double-tower model, which comprises the following steps:
constructing an initial double-tower model, obtaining a composition part-of-speech representation according to non-target subject composition data, obtaining word-level feature representations through a convolutional neural network based on the composition part-of-speech representation, and obtaining sentence-level part-of-speech representations through a recurrent neural network based on the word-level feature representations;
acquiring composition semantic representations according to the non-target subject composition data, and acquiring sentence-level composition semantic representations through a recurrent neural network based on the composition semantic representations;
merging the sentence-level part-of-speech representation and the sentence-level semantic representation, acquiring a composition representation using an attention mechanism, training the initial double-tower model based on the composition representation and the manual composition features, and acquiring a trained double-tower model;
and obtaining the target subject composition data, inputting it into the trained double-tower model, and obtaining the composition score.
Optionally, the method for obtaining the composition part-of-speech representation according to the non-target subject composition data comprises the following steps: acquiring a part-of-speech representation at the POS feature layer of a non-target subject composition, denoted P_pos = [[p_{1,j}, p_{2,j}, p_{3,j}, …, p_{n,j}], …, [p_{i,j+n}, p_{i+1,j+n}, p_{i+2,j+n}, …, p_{i+n-1,j+n}]], wherein P_pos is the part-of-speech representation and i and j are subscripts of the part-of-speech representation vector.
Optionally, the method for obtaining the feature representation of the word level by using the convolutional neural network based on the composition part-of-speech representation is as follows:
W_word = [w_1, w_2, w_3, …, w_j, …, w_n], with w_j = f(W_c · p_{j:j+hw-1} + b_c)
wherein W_word is the word-level representation, w_j is the feature representation of the j-th word obtained by the convolution, j is the vector subscript, n is the number of words, W_c is the weight matrix of the convolution kernels (with window size hw), and b_c is the bias.
Optionally, the method for obtaining the sentence-level part-of-speech representation through the recurrent neural network based on the word-level feature representations is as follows:
a_i = tanh(W_a · w_i + b_a)
α_i = exp(u_a · a_i) / Σ_j exp(u_a · a_j)
s = Σ_i α_i · w_i
h_i = LSTM(s_i, h_{i-1})
wherein W_a is the weight matrix at the word level, u_a is the weight vector at the word level, b_a is the bias vector, a_i and α_i are the attention vector and the attention weight of the i-th word, s is the sentence-level part-of-speech representation, h_i is the output of the recurrent neural network at the i-th time step, and w_i is the part-of-speech feature vector.
Optionally, the method for acquiring the semantic representation of the sentence level through the recurrent neural network based on the composition semantic representation comprises the following steps:
extracting document-level semantic information from the semantic features of a non-target subject composition to obtain a complete article-level representation, and then extracting sentence-level information through a recurrent neural network to obtain the sentence-level semantic representation.
Optionally, the method for fusing the sentence-level part-of-speech representation and the sentence-level semantic representation and obtaining a composition representation using an attention mechanism comprises:
f_t = [s_t ; h_t]
u_t = tanh(W_q · f_t + b_q)
α_t = exp(q_s · u_t) / Σ_j exp(q_s · u_j)
P = Σ_t α_t · f_t
wherein f_t is the composition sentence representation after fusing the part-of-speech representation and the semantic representation, P is the fused composition representation, s_t is the sentence-level part-of-speech representation, h_t is the sentence-level semantic representation, W_q and q_s are weight matrices of the attention calculation, b_q is a bias vector, and α_t is the attention weight of the t-th sentence.
Optionally, the composition manual feature includes: length-based features, syntax-based features, word-based features, and article-readability-based features.
The invention has the following technical effects: the disclosed automatic composition scoring method based on feature-semantic fusion in a double-tower model remedies the lack of semantics-related information in cross-topic automatic composition scoring and resolves the unclear assignment of importance weights in the part-of-speech representation, so that important parts of speech play a larger role in scoring, finally improving the cross-topic automatic composition scoring results.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:
FIG. 1 is a schematic diagram of a neural network feature extraction model of a composition automatic scoring method for feature semantic fusion based on a double-tower model in an embodiment of the invention;
fig. 2 is a flow chart of a composition automatic scoring method for feature semantic fusion based on a double-tower model according to an embodiment of the invention.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
As shown in fig. 1-2, in this example, a composition automatic scoring method for feature semantic fusion based on a dual-tower model is provided, which includes the following steps:
constructing an initial double-tower model, obtaining a composition part-of-speech representation according to non-target subject composition data, obtaining word-level feature representations through a convolutional neural network based on the composition part-of-speech representation, and obtaining sentence-level part-of-speech representations through a recurrent neural network based on the word-level feature representations;
acquiring composition semantic representations according to the non-target subject composition data, and acquiring sentence-level composition semantic representations through a recurrent neural network based on the composition semantic representations;
merging the sentence-level part-of-speech representation and the sentence-level semantic representation, acquiring a composition representation using an attention mechanism, training the initial double-tower model based on the composition representation and the manual composition features, and acquiring a trained double-tower model;
and obtaining the target subject composition data, inputting it into the trained double-tower model, and obtaining the composition score.
First, the word-embedding layer for the part-of-speech representation of a non-target subject composition is used to obtain the part-of-speech representation, where the word embedding is expressed as
P_pos = [[p_{1,j}, p_{2,j}, p_{3,j}, …, p_{n,j}], …, [p_{i,j+n}, p_{i+1,j+n}, p_{i+2,j+n}, …, p_{i+n-1,j+n}]], wherein P_pos is the part-of-speech representation and i and j are subscripts of the part-of-speech representation vector.
Then the word-level CNN layer (word-level convolutional neural network layer) is used to extract word-level features from the part-of-speech representation; the extracted word-level representation is
W_word = [w_1, w_2, w_3, …, w_j, …, w_n], with w_j = f(W_c · p_{j:j+hw-1} + b_c)
wherein W_word is the word-level representation, w_j is the feature representation of the j-th word, j is the vector subscript, n is the number of words, W_c is the weight matrix of the convolution kernels (with window size hw), and b_c is the bias. After the word-level representation is obtained, attention is used to highlight the important words, and an LSTM is used to obtain the sentence-level representation. The calculation formulas are as follows:
a_i = tanh(W_a · w_i + b_a)
α_i = exp(u_a · a_i) / Σ_j exp(u_a · a_j)
s = Σ_i α_i · w_i
h_i = LSTM(s_i, h_{i-1})
wherein W_a is the weight matrix at the word level, u_a is the weight vector at the word level, b_a is the bias vector, a_i and α_i are the attention vector and the attention weight of the i-th word, s is the sentence-level part-of-speech representation, h_i is the output of the recurrent neural network at the i-th time step, and w_i is the part-of-speech feature vector.
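The word-level CNN followed by attention pooling described above can be sketched in NumPy. All sizes and the random weights below are illustrative assumptions, not the patent's actual settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 7 words, 5-dim POS embeddings, window 3, 4 filters.
n_words, emb, win, filters = 7, 5, 3, 4
P = rng.normal(size=(n_words, emb))          # POS embeddings of one sentence
Wc = rng.normal(size=(filters, win * emb))   # convolution kernel weights
bc = np.zeros(filters)

# 1-D convolution: each window of `win` POS embeddings -> one feature vector.
W_word = np.stack([
    np.tanh(Wc @ P[j:j + win].reshape(-1) + bc)
    for j in range(n_words - win + 1)
])

# Attention pooling over word-level features -> sentence representation s.
Wa = rng.normal(size=(filters, filters))
ba = np.zeros(filters)
ua = rng.normal(size=filters)
a = np.tanh(W_word @ Wa.T + ba)              # attention vectors a_i
alpha = np.exp(a @ ua)
alpha /= alpha.sum()                         # attention weights sum to 1
s = alpha @ W_word                           # sentence-level representation
print(s.shape)  # (4,)
```

The pooled vector `s` would then be the input of one time step of the sentence-level LSTM.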
On the right side of the model diagram is the flow for acquiring semantic representations. First, document-level semantic information is extracted using a Transformer, and a complete article-level representation is obtained from the first output position of the Transformer model, i.e., the non-target-prompt Transformer embedding (the word-embedding layer of the Transformer for the non-target subject). Then, to ensure that the sentence-level part-of-speech information and the sentence-level semantic information can be aligned, sentence-level information is extracted using an LSTM. The specific extraction formulas are as follows:
i_t = σ(W_i · x_t + U_i · h_{t-1} + b_i)
f_t = σ(W_f · x_t + U_f · h_{t-1} + b_f)
g_t = tanh(W_g · x_t + U_g · h_{t-1} + b_g)
o_t = σ(W_o · x_t + U_o · h_{t-1} + b_o)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
h_t = o_t ⊙ tanh(c_t)
wherein x_t is the t-th sentence representation taken from the Transformer and h_t is the hidden state at time t generated by the LSTM; i_t, f_t, and o_t are the input, forget, and output gates, and c_t is the cell state. W_i, W_f, W_g, W_o, U_i, U_f, U_g, and U_o are all weight matrices learned by the neural network, and b_i, b_f, b_g, b_o are bias vectors.
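A single LSTM time step of this kind can be checked with a minimal NumPy function; the dimensions and random parameters below are illustrative only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step with input, forget, and output gates."""
    Wi, Ui, bi, Wf, Uf, bf, Wg, Ug, bg, Wo, Uo, bo = params
    i = sigmoid(Wi @ x_t + Ui @ h_prev + bi)       # input gate
    f = sigmoid(Wf @ x_t + Uf @ h_prev + bf)       # forget gate
    g = np.tanh(Wg @ x_t + Ug @ h_prev + bg)       # candidate cell state
    c = f * c_prev + i * g                         # new cell state
    o = sigmoid(Wo @ x_t + Uo @ h_prev + bo)       # output gate
    h = o * np.tanh(c)                             # hidden state
    return h, c

rng = np.random.default_rng(1)
d_in, d_h = 6, 4                                   # hypothetical sizes
params = [rng.normal(scale=0.1, size=shape)
          for shape in [(d_h, d_in), (d_h, d_h), d_h] * 4]
h = c = np.zeros(d_h)
for t in range(3):                                 # three sentence vectors
    h, c = lstm_step(rng.normal(size=d_in), h, c, params)
print(h.shape)  # (4,)
```

Running the step over the sequence of sentence vectors yields one hidden state h_t per sentence, which is the sentence-level semantic representation used by the fusion stage.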
The two representations are then fused: the sentence-level part-of-speech representation is combined with the sentence-level semantic representation as follows:
f_t = [s_t ; h_t]
u_t = tanh(W_q · f_t + b_q)
α_t = exp(q_s · u_t) / Σ_j exp(q_s · u_j)
P = Σ_t α_t · f_t
wherein f_t is the composition sentence representation after fusing the part-of-speech representation and the semantic representation, s_t is the part-of-speech representation, h_t is the semantic representation, W_q and q_s are weight matrices of the attention calculation, b_q is a bias vector, and α_t is the attention weight of the t-th sentence. The fused composition representation P is obtained using the attention mechanism, then passed to a fully connected layer where it is combined with the manual composition features, and finally the score of the composition is predicted.
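The fusion-and-attention step can be sketched as follows; the dimensions and random weights are hypothetical stand-ins for learned parameters:

```python
import numpy as np

rng = np.random.default_rng(2)
n_sent, d = 4, 6
s = rng.normal(size=(n_sent, d))      # sentence-level POS representations s_t
ht = rng.normal(size=(n_sent, d))     # sentence-level semantic representations h_t

f = np.concatenate([s, ht], axis=1)   # fuse by concatenation: f_t = [s_t ; h_t]

Wq = rng.normal(size=(2 * d, 2 * d))  # attention weight matrix
bq = np.zeros(2 * d)
q = rng.normal(size=2 * d)            # attention query vector

u = np.tanh(f @ Wq.T + bq)
alpha = np.exp(u @ q)
alpha /= alpha.sum()                  # one attention weight per sentence
P = alpha @ f                         # fused composition representation
print(P.shape)  # (12,)
```

`P` is a single vector per essay; in the method described above it would be concatenated with the handcrafted features and passed through a fully connected layer to predict the score.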
The method also directly extracts some handcrafted features of the composition, which expand the feature representation of the article during automatic scoring. As shown in fig. 1, these composition features are combined with the composition representation produced by the model, and the result is finally passed through a sigmoid activation function to calculate the final score. The composition features used comprise several different types; feature fusion combines the composition features that are most helpful for scoring, making the feature differences between compositions with different scores more obvious. The specific manual feature classes used are shown in table 1.
TABLE 1
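The four kinds of manual features named above (length-based, syntax-based, word-based, and readability-based) can be approximated with a simple extractor. The concrete features below are illustrative examples of each class, not the patent's exact feature set:

```python
import re

def manual_features(essay: str) -> dict:
    """Hypothetical handcrafted features, one per feature class named above."""
    words = re.findall(r"[A-Za-z']+", essay)
    sents = [x for x in re.split(r"[.!?]+", essay) if x.strip()]
    n_words, n_sents = len(words), max(len(sents), 1)
    return {
        "n_words": n_words,                                # length-based
        "avg_sentence_len": n_words / n_sents,             # syntax-based proxy
        "type_token_ratio":                                # word-based
            len({w.lower() for w in words}) / max(n_words, 1),
        "avg_word_len":                                    # readability proxy
            sum(map(len, words)) / max(n_words, 1),
    }

feats = manual_features("I finished the homework. It was easy!")
print(feats["n_words"])  # 7
```

In the model these scalar features would be normalized and concatenated with the learned composition representation before the final scoring layer.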
Compared with the prior art, the method has the following beneficial technical effects: it overcomes the lack of semantics-related information in cross-topic automatic composition scoring and strengthens the importance weights of the part-of-speech representation, so that important parts of speech contribute more to the score, finally improving the results of cross-topic automatic composition scoring.
To demonstrate the advantages of the method in composition scoring, experiments were performed on the Automated Student Assessment Prize (ASAP) automated essay scoring dataset. This dataset is the largest and most commonly used publicly available composition scoring dataset, and is widely used to evaluate the scoring accuracy of automatic composition scoring models. The ASAP dataset consists of 8 subsets of subject articles covering three composition genres: argumentative, source-dependent response, and narrative. To demonstrate the usability of the method, it was tested on datasets of the various genres; the statistics of the data used are shown in table 2.
TABLE 2
To more clearly demonstrate the advantages of the present method, it is compared with the following mainstream baseline methods: 1) CNN-CNN-MoT, 2) CNN-LSTM-ATT, 3) TDNN-mod, 4) PAES-mod (the suffix -mod denotes model results reproduced in the same experimental environment). To ensure consistency with related work, the invention uses the Quadratic Weighted Kappa (QWK) coefficient as the measure of scoring accuracy. The QWK coefficient measures the consistency between two sets of scores (machine scores and manual scores); it typically lies between 0 and 1, with higher values indicating higher consistency, and a value below 0 means the consistency is even lower than that of random scoring. Experiment settings: the pretrained BERT used in the invention is the "bert-base-uncased" model, through which the semantic vector representation of the composition is constructed. The invention adopts prompt-wise 8-fold cross validation: each time, the articles of the other topics are used as the training set and the articles of the target topic are used as the test set. The specific settings of the experimental parameters are shown in table 3.
TABLE 3
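The QWK metric used for evaluation is standard; a minimal implementation for integer score ranges, shown here as a sketch rather than the exact evaluation code used in the experiments:

```python
from collections import Counter

def quadratic_weighted_kappa(a, b, min_rating, max_rating):
    """QWK between two integer score lists on [min_rating, max_rating]."""
    n = max_rating - min_rating + 1
    obs = [[0.0] * n for _ in range(n)]        # observed score matrix
    for x, y in zip(a, b):
        obs[x - min_rating][y - min_rating] += 1
    hist_a, hist_b = Counter(a), Counter(b)
    total = len(a)
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2    # quadratic disagreement weight
            expected = hist_a[i + min_rating] * hist_b[j + min_rating] / total
            num += w * obs[i][j]
            den += w * expected
    return 1.0 - num / den

print(quadratic_weighted_kappa([1, 2, 3], [1, 2, 3], 1, 3))  # 1.0 (perfect agreement)
```

Perfect agreement yields 1.0, chance-level agreement yields 0, and systematic disagreement yields a negative value, matching the interpretation given above.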
As shown in table 4, the QWK scores of the different models on all subjects of the ASAP dataset are listed; table 4 compares the QWK results for score prediction on the ASAP dataset.
TABLE 4
It can be seen from table 4 that the proposed method achieves optimal performance on the compositions of multiple topics, and it is also optimal in terms of the average result.
The foregoing is merely a preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily conceivable by those skilled in the art within the technical scope of the present application should be covered in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (6)

1. The automatic composition scoring method for feature semantic fusion based on the double-tower model is characterized by comprising the following steps of:
constructing an initial double-tower model, obtaining a composition part-of-speech representation according to non-target subject composition data, obtaining word-level feature representations through a convolutional neural network based on the composition part-of-speech representation, and obtaining sentence-level part-of-speech representations through a recurrent neural network based on the word-level feature representations;
acquiring composition semantic representations according to the non-target subject composition data, and acquiring sentence-level composition semantic representations through a recurrent neural network based on the composition semantic representations;
merging the sentence-level part-of-speech representation and the sentence-level semantic representation, acquiring a composition representation using an attention mechanism, training the initial double-tower model based on the composition representation and the manual composition features, and acquiring a trained double-tower model; the method for fusing the sentence-level part-of-speech representation and the sentence-level semantic representation and obtaining a composition representation using an attention mechanism comprises:
f_t = [s_t ; h_t]
u_t = tanh(W_q · f_t + b_q)
α_t = exp(q_s · u_t) / Σ_j exp(q_s · u_j)
P = Σ_t α_t · f_t
wherein f_t is the composition sentence representation after fusing the part-of-speech representation and the semantic representation, P is the fused composition representation, s_t is the sentence-level part-of-speech representation, h_t is the sentence-level semantic representation, W_q and q_s are weight matrices of the attention calculation, b_q is a bias vector, and α_t is the attention weight of the t-th sentence;
and obtaining the composition data of the target subject, inputting the composition data of the target subject into the trained double-tower model, and obtaining the composition score.
2. The method for automatically scoring a composition based on feature semantic fusion of a double-tower model according to claim 1, wherein the method for obtaining the composition part-of-speech representation from non-target subject composition data comprises: acquiring the composition part-of-speech representation at the POS feature layer of the non-target subject composition data, denoted as P_pos = [[p_{1,j}, p_{2,j}, p_{3,j}, …, p_{n,j}], …, [p_{i,j+n}, p_{i+1,j+n}, p_{i+2,j+n}, …, p_{i+n-1,j+n}]], wherein P_pos is the part-of-speech representation, i and j are subscripts of the part-of-speech representation vector, and n is the number of words.
3. The automatic composition scoring method for feature semantic fusion based on a double-tower model according to claim 2, wherein the method for acquiring the word-level feature representations through the convolutional neural network based on the composition part-of-speech representation is as follows:
W_word = [w_1, w_2, w_3, …, w_j, …, w_n], with w_j = f(W_c · p_{j:j+hw-1} + b_c)
wherein W_word is the word-level representation, w_j is the feature representation of the j-th word obtained by the convolution, j is the vector subscript, n is the number of words, W_c is the weight matrix of the convolution kernels (with window size hw), and b_c is the bias.
4. The automatic composition scoring method for feature semantic fusion based on a double-tower model as claimed in claim 3, wherein the method for obtaining the sentence-level part-of-speech representation through the recurrent neural network based on the word-level feature representations is as follows:
a_i = tanh(W_a · w_i + b_a)
α_i = exp(u_a · a_i) / Σ_j exp(u_a · a_j)
s = Σ_i α_i · w_i
h_i = LSTM(s_i, h_{i-1})
wherein W_a is the weight matrix at the word level, u_a is the weight vector at the word level, b_a is the bias vector, a_i and α_i are the attention vector and the attention weight of the i-th word, s is the sentence-level part-of-speech representation, h_i is the output of the recurrent neural network at the i-th time step, and w_i is the part-of-speech feature vector.
5. The automatic composition scoring method for feature semantic fusion based on a dual-tower model as recited in claim 4, wherein the method for obtaining semantic representations at sentence level through a recurrent neural network based on the composition semantic representations comprises:
extracting document-level semantic information based on the semantic features of the non-target subject composition data to obtain a complete article-level composition semantic representation, and extracting sentence-level information through a recurrent neural network to obtain the sentence-level semantic representation.
6. The automatic composition scoring method for feature semantic fusion based on a double-tower model as claimed in claim 1, wherein the manual composition features comprise: length-based features, syntax-based features, word-based features, and article-readability-based features.
CN202310104079.1A 2023-02-13 2023-02-13 Automatic composition scoring method based on feature semantic fusion of double-tower model Active CN116187339B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310104079.1A CN116187339B (en) 2023-02-13 2023-02-13 Automatic composition scoring method based on feature semantic fusion of double-tower model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310104079.1A CN116187339B (en) 2023-02-13 2023-02-13 Automatic composition scoring method based on feature semantic fusion of double-tower model

Publications (2)

Publication Number Publication Date
CN116187339A CN116187339A (en) 2023-05-30
CN116187339B true CN116187339B (en) 2024-03-01

Family

ID=86437840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310104079.1A Active CN116187339B (en) 2023-02-13 2023-02-13 Automatic composition scoring method based on feature semantic fusion of double-tower model

Country Status (1)

Country Link
CN (1) CN116187339B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117648921B (en) * 2024-01-29 2024-05-03 山东财经大学 Cross-theme composition automatic evaluation method and system based on paired double-layer countermeasure alignment

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102831558A (en) * 2012-07-20 2012-12-19 Guilin University of Electronic Technology System and method for automatically scoring college English compositions without manual pre-scoring
CN107133211A (en) * 2017-04-26 2017-09-05 Renmin University of China Composition scoring method based on an attention mechanism
CN111966797A (en) * 2020-07-23 2020-11-20 Tianjin University Machine reading comprehension method using word vectors enriched with semantic information
CN112527968A (en) * 2020-12-22 2021-03-19 Datang Ronghe Communication Co., Ltd. Composition review method and system based on a neural network
CN113435179A (en) * 2021-06-24 2021-09-24 iFlytek Co., Ltd. Composition evaluation method, device, equipment and storage medium
CN114330338A (en) * 2022-01-13 2022-04-12 Northeast Electric Power University Programming language identification system and method fusing associated information
CN115455178A (en) * 2022-08-17 2022-12-09 Guangdong University of Foreign Studies Method, system and medium for automatic cross-topic composition scoring
CN115510230A (en) * 2022-09-20 2022-12-23 Inner Mongolia University of Technology Mongolian sentiment analysis method based on multi-dimensional feature fusion and a contrastive reinforcement learning mechanism
CN115659954A (en) * 2022-10-31 2023-01-31 Beijing University of Technology Automatic composition scoring method based on multi-stage learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Prompt Agnostic Essay Scorer: A Domain Generalization Approach to Cross-prompt Automated Essay Scoring; Robert Ridley et al.; https://arxiv.org/abs/2008.01441; pp. 1-16 *
A survey of automated essay scoring technology; Ding Gejian et al.; Journal of Computer Applications; Vol. 42, Issue S1; pp. 386-390 *
Design and implementation of an intelligent Chinese essay evaluation system based on deep learning; China Master's Theses Full-text Database, Information Science and Technology series, Issue 8; full text *

Also Published As

Publication number Publication date
CN116187339A (en) 2023-05-30

Similar Documents

Publication Publication Date Title
CN107133211B (en) Composition scoring method based on attention mechanism
CN108829722A A Dual-Attention relation classification method and system based on distant supervision
CN109635109A Sentence classification method based on LSTM, combining part of speech and a multi-attention mechanism
CN106294322A A Chinese zero-anaphora resolution method based on LSTM
CN107392147A An image-to-sentence conversion method based on an improved generative adversarial network
CN110516244A An automatic sentence completion method based on BERT
CN116187339B (en) Automatic composition scoring method based on feature semantic fusion of double-tower model
CN103631770B Entity linguistic relationship analysis method, and machine translation apparatus and method
CN107679225A A keyword-based reply generation method
CN109033073A Textual entailment recognition method and device
Al-Ghezi et al. Automatic rating of spontaneous speech for low-resource languages
CN110222344A A composition element analysis algorithm for primary school composition teaching
Chersoni et al. Are word embeddings really a bad fit for the estimation of thematic fit?
Nieder et al. Comprehension, production and processing of maltese plurals in the discriminative lexicon
Pochiraju et al. Extractive Summarization and Multiple Choice Question Generation using XLNet
CN105895075B Method and system for improving the prosodic naturalness of synthesized speech
CN113392629A Personal pronoun resolution method based on a pre-trained model
CN106844333A A sentence analysis method and system based on semantic and syntactic structure
Alberts Meeting them halfway: Altering language conventions to facilitate human-robot interaction
Ito et al. A Framework for Haiku Generation from a Narrative.
Marfani et al. Analysis of Learners’ Sentiments on MOOC Forums using Natural Language Processing Techniques
Lin An Automatic Assessment and Optimization Algorithm for English Translation Software Combining Deep Learning and Natural Language Processing
Escobar-Grisales et al. and J. R. Orozco-Arroyave
Ruiz-Dolz et al. NLAS-multi: A Multilingual Corpus of Automatically Generated Natural Language Argumentation Schemes
Kim Let's Make an Artificial Learner to Analyze Learners' Language!

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant