CN108519975B - Composition scoring method, device and storage medium

Info

Publication number
CN108519975B
CN108519975B (application CN201810287644.1A)
Authority
CN
China
Prior art keywords
vector
composition
text
score
scored
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810287644.1A
Other languages
Chinese (zh)
Other versions
CN108519975A (en)
Inventor
陆勇毅
秦龙
徐书尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING SINGSOUND INTELLIGENT TECHNOLOGY Co.,Ltd.
Original Assignee
Beijing Singsound Education Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Singsound Education Technology Co ltd filed Critical Beijing Singsound Education Technology Co ltd
Priority to CN201810287644.1A priority Critical patent/CN108519975B/en
Publication of CN108519975A publication Critical patent/CN108519975A/en
Application granted granted Critical
Publication of CN108519975B publication Critical patent/CN108519975B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques

Abstract

The embodiment of the invention provides a composition scoring method, a composition scoring device and a storage medium, and relates to the technical field of automatic composition scoring. Wherein the method comprises the following steps: processing the composition text to be scored to obtain a word vector and a text length vector of the composition text; inputting the word vector into a first neural network model for training to obtain a first output vector, and averaging the first output vector to obtain a first mean vector; inputting the first mean vector and the text length vector into a classifier to perform score grade division on the composition to be scored to obtain a probability distribution vector; inputting the first output vector and the word vector into a second neural network model for training to obtain a second output vector, and averaging the second output vector to obtain a second mean vector; and calculating to obtain the final score of the composition to be scored according to the score vector obtained by inputting the second mean vector into the regressor and the probability distribution vector. The composition is scored by the method, so that the automatic composition scoring effect is greatly improved.

Description

Composition scoring method, device and storage medium
Technical Field
The invention relates to the technical field of automatic composition scoring, in particular to a composition scoring method, a composition scoring device and a storage medium.
Background
With the recent development of natural language processing and deep learning technology, neural network algorithms have come into wide use across natural language processing tasks, and Automated Essay Scoring (AES) systems based on neural network algorithms have been reported in succession. Most existing neural-network-based AES systems represent the semantics of each word in an article by an embedding to obtain a word vector, then perform deep feature extraction with a recurrent neural network, a convolutional neural network, or a combination of the two, and finally score the article by classifying or regressing over the extracted features.
However, existing AES systems only perform simple classification or regression on the features output by the convolutional or recurrent neural network, and cannot well characterize global information such as article length. In fact, the emphasis of the features should differ for articles of different levels, and during scoring the relevant features should be weighed according to the level of the article to produce a comprehensive score. Therefore, the scoring effect of existing AES systems is not ideal for tasks requiring fine-grained scoring.
Disclosure of Invention
In order to overcome the above-mentioned deficiencies in the prior art, the present invention aims to provide a composition scoring method, device and storage medium, wherein the composition scoring method extracts word vectors and text length vectors of composition texts, and then performs coarse classification and fine scoring on the composition texts through a neural network model to obtain final scores of the composition texts, thereby greatly improving composition scoring effects.
In order to achieve the above object, the preferred embodiment of the present invention adopts the following technical solutions:
a composition scoring method, the method comprising:
processing the composition text to be scored to obtain a feature vector of the composition text, wherein the feature vector comprises a word vector and a text length vector;
inputting the word vector into a first neural network model for training to obtain a first output vector, and averaging the first output vector to obtain a first mean vector;
inputting the first mean vector and the text length vector into a classifier to perform score grade division on the composition to be scored, and obtaining probability distribution vectors of the text of the composition to be scored on each score grade;
inputting the first output vector and the word vector into a second neural network model for training to obtain a second output vector, and averaging the second output vector to obtain a second mean vector;
and calculating to obtain the final score of the composition to be scored according to the score vector obtained by inputting the second mean vector into the regressor and the probability distribution vector.
Optionally, the method further includes a step of adjusting mapping parameters of the classifier and the regressor, where the step includes:
acquiring the score grade, the score vector, the probability distribution vector and the final score of the composition to be scored;
calculating the difference value of the final score and the manual score of the composition text through the score grade, the score vector, the probability distribution vector, the final score and the loss function;
optimizing the loss function, and adjusting the mapping parameters of the classifier and the regressor to reduce the difference between the final score and the manual score.
Further, when the feature vector is a word vector, the step of processing the composition text to be scored to obtain the feature vector of the composition text comprises:
acquiring initial vectors for representing all words in composition texts;
and training the initial vector for representing each word in the composition text through a word vector model to obtain a word vector of each word in the composition text.
Further, when the feature vector is a text length vector, the step of processing the composition text to be scored to obtain the feature vector of the composition text comprises:
according to a preset text length, carrying out length grade division on the composition text to be evaluated to enable each length grade to correspond to one embedded vector;
acquiring an initial vector for representing the length of a text;
and operating through the embedded vector and the initial vector for expressing the text length to obtain a text length vector of the composition text.
Further, the step of inputting the first mean vector and the text length vector into a classifier to perform score level division on the composition to be scored, and obtaining probability distribution vectors of the text of the composition to be scored on each score level includes:
dividing the composition to be scored into a plurality of score grades according to the manual scoring labels;
inputting the first mean vector and the text length vector into a classifier;
acquiring mapping parameters of the classifier;
and calculating to obtain a probability distribution vector through a softmax function and the mapping parameters of the classifier.
Further, the step of calculating the score of the composition to be scored according to the score vector obtained by inputting the second mean vector into the regressor and the probability distribution vector comprises:
inputting the second mean vector into a regressor;
obtaining a mapping parameter of the regressor;
obtaining a score vector of a composition to be scored through a sigmoid function and the mapping parameters of the regressor;
and calculating the score vector and the probability distribution vector to obtain the final score of the composition to be scored.
The embodiment of the invention also provides a composition scoring device, which comprises:
the text processing module is used for processing the composition text to be scored to obtain a feature vector of the composition text, wherein the feature vector comprises a word vector and a text length vector;
the first training module is used for inputting the word vector into a first neural network model for training to obtain a first output vector, and averaging the first output vector to obtain a first mean vector;
the grading division module is used for inputting the first mean vector and the text length vector into a classifier to perform score grading on the composition to be scored, and obtaining probability distribution vectors of the text of the composition to be scored on each score grade;
the second training module is used for inputting the first output vector and the word vector into a second neural network model for training to obtain a second output vector, and averaging the second output vector to obtain a second mean vector;
and the calculating module is used for calculating to obtain the final score of the composition to be scored according to the score vector obtained by inputting the second mean vector into the regressor and the probability distribution vector.
Further, when the feature vector is a word vector, the text processing module is specifically configured to:
acquiring initial vectors for representing all words in composition texts;
and training the initial vector through a word vector model to obtain a word vector of each word in the composition text.
Further, when the feature vector is a text length vector, the text processing module is specifically configured to:
according to the text length, carrying out length grade division on the composition text to be scored, and enabling each length grade to correspond to one embedded vector;
acquiring an initial vector for representing the length of a text;
and operating through the embedded vector and the initial vector for expressing the text length to obtain a text length vector of the composition text.
The embodiment of the invention also provides a storage medium, which comprises a computer program, and the computer program controls the equipment where the storage medium is located to execute the composition scoring method when running.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only show some embodiments of the present invention, and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a block diagram of an electronic device according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating the steps of a composition scoring method according to an embodiment of the present invention;
fig. 3 is a schematic diagram illustrating a mapping parameter adjustment step in the composition scoring method according to the embodiment of the present invention;
fig. 4 is a schematic diagram illustrating a step of obtaining word vectors of composition texts in the composition scoring method according to the embodiment of the present invention;
fig. 5 is a schematic diagram illustrating a step of obtaining a text length vector of a composition text in the composition scoring method according to the embodiment of the present invention;
fig. 6 is a schematic diagram illustrating a step of obtaining a probability distribution vector in the composition scoring method according to the embodiment of the present invention;
fig. 7 is a schematic diagram illustrating a step of obtaining a final score of a composition to be scored in the composition scoring method according to the embodiment of the present invention;
fig. 8 is a schematic diagram of a composition scoring device according to an embodiment of the present invention.
Reference numerals: 100 - electronic device; 111 - memory; 112 - memory controller; 113 - processor; 70 - composition scoring device; 701 - text processing module; 702 - first training module; 703 - grading module; 704 - second training module; 705 - calculation module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the embodiments of the present invention, it should be noted that the terms "first", "second", and the like are named only for distinguishing different features, so as to facilitate description of the present invention and simplify description, but do not indicate or imply relative importance, and thus, should not be construed as limiting the present invention.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Fig. 1 is a block diagram of an electronic device 100 according to a preferred embodiment of the invention. The electronic device 100 may include a composition scoring device 70, a memory 111, a memory controller 112, and a processor 113.
The memory 111, the memory controller 112 and the processor 113 are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The composition scoring device 70 may include at least one software function module which may be stored in the memory 111 in the form of software or firmware (firmware) or solidified in an Operating System (OS) of the electronic device 100. The processor 113 is used for executing executable modules stored in the memory 111, such as software functional modules and computer programs included in the composition scoring device 70.
The memory 111 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), or the like. The memory 111 is used for storing a program, and the processor 113 executes the program after receiving an execution instruction. Access to the memory 111 by the processor 113 and possibly other components may be under the control of the memory controller 112.
The processor 113 may be an integrated circuit chip having signal processing capabilities. The processor 113 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed by it. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
First embodiment
Referring to fig. 2, fig. 2 is a schematic diagram illustrating steps of the composition scoring method according to the embodiment, the method includes the following steps:
step S10, processing the composition text to be scored to obtain the feature vector of the composition text, wherein the feature vector comprises a word vector and a text length vector.
In this embodiment, a neural network model is used to process the composition text, and the word vectors of the composition text and feature vectors such as the text length vector representing the length of the composition text are obtained through encoding. The composition text may be, but is not limited to, an English text or a Chinese text; it may be written in any single language.
Step S20, inputting the word vector into a first neural network model for training to obtain a first output vector, and averaging the first output vector to obtain a first mean vector.
In this embodiment, the word vector obtained in step S10 is input into a first neural network for training to obtain a first output vector, and then the first output vector is averaged to obtain a first mean vector.
Step S30, inputting the first mean vector and the text length vector into a classifier to perform score grade division on the composition to be scored, and obtaining probability distribution vectors of the composition text to be scored on each score grade.
In this embodiment, the first mean vector obtained in step S20 and the text length vector obtained in step S10 are input into a classifier, and the composition to be scored is subjected to preliminary score ranking, and at the same time, a probability distribution vector of the text of the composition to be scored at each score ranking is obtained.
And step S40, inputting the first output vector and the word vector into a second neural network model for training to obtain a second output vector, and averaging the second output vector to obtain a second mean vector.
In this embodiment, the first output vector obtained in step S20 and the word vector obtained in step S10 are input to a second neural network model for training to obtain a second output vector, and then the second output vector is averaged to obtain a second mean vector.
And step S50, calculating to obtain the final score of the composition to be scored according to the score vector obtained by inputting the second mean vector into the regressor and the probability distribution vector.
In this embodiment, the second mean vector obtained in step S40 is input into the regressor to obtain a score vector of the composition text, and then the score vector is operated with the probability distribution vector obtained in step S30 to obtain a final score of the composition text.
Optionally, in this embodiment, the first neural network model and the second neural network model may be, but are not limited to, a recurrent neural network (RNN), a long short-term memory network (LSTM), or a gated recurrent unit (GRU). In a preferred embodiment of the present invention, both the first and the second neural network model use a long short-term memory network (LSTM).
Referring to fig. 3, optionally, in order to improve the accuracy of the automatic composition scoring, the method further includes a step S60 of adjusting the mapping parameters of the classifier and the regressor, the step includes the following sub-steps:
step S601, obtaining the score grade, the score vector, the probability distribution vector and the final score of the composition to be scored.
In the present embodiment, the score levels and the probability distribution vectors described in step S601 may be obtained by step S30 described above; the score vector and the final score may be obtained by the above-described step S50.
Step S602, calculating the difference value between the final score and the manual score of the composition text through the score grade, the score vector, the probability distribution vector, the final score and the loss function.
In this embodiment, the smaller the difference between the final score and the manual score of the composition text is, the higher the accuracy of the automatic composition scoring model is.
Step S603, optimizing the loss function, and adjusting the mapping parameters of the classifier and the regressor to reduce the difference between the final score and the manual score.
In the preferred embodiment of the present invention, the loss function is optimized by a gradient descent algorithm to obtain mapping parameters in the classifier and the regressor, so as to adjust the mapping parameters in the classifier and the regressor and improve the accuracy of composition scoring.
The loss function can be expressed as:
[The loss function is rendered as an image in the source and is not recoverable here; per the surrounding description, it combines a classification term over the probability distribution vector prob with a regression term over the score vector and final score S, weighted by a coefficient λ with 0 ≤ λ ≤ 1.]
wherein y is the score grade of the composition to be scored, score is the score vector of the composition to be scored, prob is the probability distribution vector of the composition to be scored, M is the number of regressors, and S is the final score of the composition to be scored.
In the preferred embodiment of the present invention, Dropout is used as the regularization method with the retention probability set to 0.5, and the optimization algorithm is Adam. The retention probability is a hyper-parameter of the Dropout algorithm used to mitigate overfitting; its specific value can be adjusted according to the actual situation.
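Since the exact loss formula survives only as an image placeholder above, the following Python sketch shows one plausible reading of the description: a λ-weighted combination of a classification loss over prob and a regression loss on the final score S, optimized with Adam. The function names, tensor shapes, and the exact form of the combination are assumptions of this sketch, not the patent's formula.

```python
import torch
import torch.nn.functional as F

def combined_loss(prob, S, level_label, human_score, lam=0.5):
    """Assumed form: lam * classification term + (1 - lam) * regression term.

    prob:         (1, M) probability distribution over score levels
    S:            (1,)   final score from the dot product prob . scores^T
    level_label:  (1,)   manual score level (long tensor), the classifier target
    human_score:  float  manual score, the regression target
    """
    cls_loss = F.nll_loss(torch.log(prob), level_label)   # cross-entropy over score levels
    reg_loss = ((S - human_score) ** 2).mean()            # squared error vs. the manual score
    return lam * cls_loss + (1.0 - lam) * reg_loss

# Gradient-descent optimization as described above; `model` stands for the whole
# scoring network (both LSTMs, the classifier, and the regressors):
# optimizer = torch.optim.Adam(model.parameters())
```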
Referring to fig. 4, in the embodiment of the present invention, when obtaining word vectors of a composition text through a neural network model, step S10 includes step S101 of obtaining word vectors of a composition text, and step S101 specifically includes the following sub-steps:
in step S1011, initial vectors representing respective words in the composition text are acquired.
In the preferred embodiment of the present invention, a one-hot vector is used as the initial vector of each word to represent each word in the composition text; for example, the text is represented as X = {w1, w2, …, wn}, where wi is the one-hot vector corresponding to the i-th word and n is the text length.
Step S1012, training the initial vector for representing each word in the composition text through a word vector model, to obtain a word vector of each word in the composition text.
In the preferred embodiment of the present invention, the one-hot vector of each word obtained in step S1011 is input into a word2vec or GloVe word vector model; an embedding matrix formed by the word vectors pre-trained with word2vec or GloVe is then obtained; and finally the word vector corresponding to each word's one-hot vector is obtained by encoding through the embedding matrix. The method for obtaining the word vectors can be represented as:
L1 = X · E,  E ∈ ℝ^{|V|×D}

wherein L1 is the word-vector matrix of the composition text, E is the embedding matrix formed by the word vectors obtained by pre-training the word vector model, |V| is the size of the vocabulary, and D is the dimension of the word vectors.
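As a minimal Python sketch of the lookup L1 = X · E: multiplying a one-hot row of X by E simply selects a row of E, so an index lookup computes the same product without materializing X. The sizes |V| = 5000 and D = 50 and the toy word indices are illustrative assumptions, not values from the patent.

```python
import torch
import torch.nn as nn

V, D = 5000, 50                 # |V|: vocabulary size, D: word-vector dimension (assumed)
embedding = nn.Embedding(V, D)  # E; in practice its weights would be initialized
                                # from word2vec- or GloVe-pretrained vectors

word_ids = torch.tensor([[12, 407, 3055]])  # index form of the one-hot vectors w1..wn
L1 = embedding(word_ids)                    # word vectors, shape (1, n, D) = (1, 3, 50)
```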
Referring to fig. 5, in the embodiment of the present invention, when the text length vector of the composition text is obtained through the neural network model, step S10 further includes step S102 of obtaining the length vector of the composition text, where the step includes:
and S1021, performing length grade division on the composition text to be evaluated according to a preset text length, so that each length grade corresponds to one embedded vector.
In the preferred embodiment of the present invention, in order to better represent the text length, a text length embedding method is adopted: the composition texts are divided into several length levels according to the text lengths of the training data, and the correspondence between composition length and length level can be adjusted according to the actual situation. For example, 0-75 words are defined as level 1, 76-150 words as level 2, 151-250 words as level 3, 251-350 words as level 4, and 351 words and above as level 5, so that each length level r corresponds to an embedding vector qr.
In step S1022, an initial vector representing the length of the text is acquired.
In the preferred embodiment of the present invention, a one-hot vector is used as the initial vector to represent the text length of the composition text.
In step S1023, a text length vector of the composition text is obtained by performing an operation on the embedded vector and an initial vector indicating the length of the text.
In the present embodiment, a text length vector of the composition text is obtained by operating the embedded vector representing the text length obtained in step S1021 and the initial vector of the text length obtained in step S1022.
The calculation of the text length vector can be expressed as:

q = l · Q,  Q = {q1, q2, …, qr} ∈ ℝ^{r×d}

wherein q is the text length vector, l is the one-hot vector indicating the length of the composition text, Q is the embedding matrix whose rows q1, q2, …, qr correspond to the text length levels, r is the number of text length levels, and d is the dimension of the text length vector.
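A sketch of the length embedding q = l · Q under the example levels above (0-75 words is level 1, …, 351 words and above is level 5); r = 5 and d = 16 are illustrative assumptions, and as with the word vectors, the one-hot product reduces to a row lookup.

```python
import torch
import torch.nn as nn

r, d = 5, 16                          # r: number of length levels, d: length-vector dim (assumed)
length_embedding = nn.Embedding(r, d)  # Q = {q1, ..., qr}, one vector per length level

def length_level(n_words: int) -> int:
    """Map a word count to a 0-based length level using the example boundaries."""
    for level, upper in enumerate([75, 150, 250, 350]):
        if n_words <= upper:
            return level
    return 4                           # 351 words and above

q = length_embedding(torch.tensor([length_level(320)]))  # 320 words -> level 4 (index 3), shape (1, 16)
```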
Referring to fig. 6, further, in the embodiment of the present invention, the step S30 of performing score level division on the composition to be scored to obtain probability distribution vectors of texts of the composition to be scored at each score level specifically includes:
step S301, dividing the composition to be scored into a plurality of score grades according to the manual scoring labels.
In the present embodiment, composition texts are divided into levels by their manual scores, such as poor, medium, good, and excellent. The composition scoring model can then use these levels as labels to preliminarily classify the composition text to be scored.
Step S302, inputting the first mean vector and the text length vector into a classifier.
In the present embodiment, the first mean vector is obtained through the above step S20, and the text length vector is obtained through the above step S10.
Step S303, obtaining the mapping parameters of the classifier.
In this embodiment, the mapping parameters of the classifier may be obtained by optimizing the loss function in step S60, and the mapping parameters of the classifier may be adjusted by optimizing the loss function, so as to improve the accuracy of automatic composition scoring.
And step S304, calculating a probability distribution vector through a softmax function and the mapping parameters of the classifier.
In this embodiment, before the score levels of the composition text to be scored are divided, the word vectors of the composition text obtained in step S10 are first input into the first neural network model, and a first output vector is obtained after LSTM processing; an MoT (mean over time) operation then averages the first output vector to obtain the first mean vector; finally, the first mean vector is concatenated with the text length vector obtained in step S10 and used as the input of the classifier, which performs a preliminary score level division of the composition text according to the manual scoring labels and calculates the probability distribution vector of the composition text over the score levels.
In the above process, the method for obtaining the first output vector and the first mean vector may be represented as:
L2 = LSTM(L1),  h = MoT(L2) = (1/T) Σ_{t=1}^{T} L2,t

wherein L1 is the word-vector matrix of the composition text, L2 is the first output vector produced by the LSTM, h is the first mean vector obtained by averaging the first output vector over time, and T is the length of the composition text (i.e., the number of words in the composition text).
The above probability distribution vector can be obtained by the following formula:
prob = softmax([h, q] · Wcls + bcls)

wherein prob is the probability distribution vector of the composition text over the M score levels, h is the first mean vector, q is the text length vector of the composition text, and Wcls and bcls are the mapping parameters of the classifier, which can be obtained by optimizing the loss function.
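Putting the branch together, a hedged PyTorch sketch of steps S20 and S30: LSTM over the word vectors, MoT pooling, concatenation with the length vector, then the softmax classifier. The sizes, and using a single nn.Linear to hold Wcls and bcls, are assumptions of this sketch.

```python
import torch
import torch.nn as nn

D, d, H, M = 50, 16, 64, 4         # word dim, length dim, LSTM hidden size, score levels (assumed)

lstm1 = nn.LSTM(input_size=D, hidden_size=H, batch_first=True)
classifier = nn.Linear(H + d, M)   # holds the mapping parameters Wcls and bcls

def coarse_classify(L1, q):
    """L1: (1, T, D) word vectors; q: (1, d) text length vector."""
    L2, _ = lstm1(L1)              # first output vector, shape (1, T, H)
    h = L2.mean(dim=1)             # MoT: average over the T time steps -> first mean vector
    logits = classifier(torch.cat([h, q], dim=1))   # [h, q] . Wcls + bcls
    prob = torch.softmax(logits, dim=1)             # distribution over the M score levels
    return L2, h, prob

L1 = torch.randn(1, 120, D)        # a toy 120-word composition
q = torch.randn(1, d)              # its (toy) text length vector
L2, h, prob = coarse_classify(L1, q)
```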
Referring to fig. 7, further, in the embodiment of the present invention, the step S50 of calculating the score of the composition to be scored according to the score vector and the probability distribution vector obtained by the second mean vector input regressor specifically includes:
step S501, inputting the second mean vector into a regressor.
In the preferred embodiment of the present invention, in order to score compositions at different score levels more finely, a plurality of regressors are used to process the composition texts, with the number of regressors equal to the number of score levels; each regressor handles the composition texts of one score level and attends to different features, thereby realizing fine-grained scoring and improving the accuracy of composition scoring. The second mean vector is obtained in step S40, and the process can be represented as:
L3=LSTM([L1,L2])
a=MoT(L3)
wherein L1 is the word-vector matrix of the composition text, L2 is the first output vector, L3 is the second output vector obtained by applying the second neural network model (an LSTM) to the concatenation of L1 and L2, and a is the second mean vector obtained by averaging the second output vector over time.
Step S502, obtaining the mapping parameters of the regressor.
In this embodiment, the mapping parameters of the regressor may be obtained by optimizing the loss function in step S60, and the mapping parameters of the regressor may be adjusted by optimizing the loss function, so as to improve the accuracy of the automatic composition scoring.
Step S503, obtaining a score vector of the composition to be scored through the sigmoid function and the mapping parameters of the regressor.
And step S504, calculating the score vector and the probability distribution vector to obtain the final score of the composition to be scored.
In this embodiment, after obtaining the second mean vector, the second mean vector is used as an input of the regressor, and a score vector of the composition text is obtained through sigmoid function calculation, where the calculation method may be represented as:
scorei = sigmoid(a · Wi + bi),  i = 1, …, M

wherein a is the second mean vector, Wi and bi are the mapping parameters of the i-th regressor, scorei is the score produced by the i-th regressor for the composition text, and M is the number of regressors (i.e., the number of composition score levels). The mapping parameters of the regressors can be adjusted by optimizing the loss function, further improving the accuracy of composition scoring.
In the embodiment of the present invention, the final score of the composition text can be obtained by performing a dot product operation on the score vector and the probability distribution vector obtained in the above process, and the calculation formula is as follows:
S = prob · scoresᵀ,  scores = {score1, score2, …, scoreM}

wherein prob is the probability distribution vector obtained in the above process, scores is the score vector obtained in the above process, and S is the final score of the composition text.
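Continuing the toy setup above, a sketch of the fine-scoring branch (steps S40 and S50): the second LSTM consumes the concatenation [L1, L2], MoT pooling yields a, one sigmoid regressor per score level produces scores, and the dot product with prob gives S. Implementing the M regressors as one nn.Linear whose rows act as the Wi, bi pairs is an assumption of this sketch.

```python
lstm2 = nn.LSTM(input_size=D + H, hidden_size=H, batch_first=True)
regressors = nn.Linear(H, M)       # row i holds Wi, bias entry i holds bi

def fine_score(L1, L2, prob):
    L3, _ = lstm2(torch.cat([L1, L2], dim=2))  # second output vector, (1, T, H)
    a = L3.mean(dim=1)                         # MoT: second mean vector, (1, H)
    scores = torch.sigmoid(regressors(a))      # score vector, one entry per score level
    S = (prob * scores).sum(dim=1)             # dot product S = prob . scores^T
    return scores, S

scores, S = fine_score(L1, L2, prob)           # final score of the toy composition
```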
Second embodiment
Referring to fig. 8, the present embodiment provides a composition scoring device 70, which includes:
the text processing module 701 is configured to process a composition text to be scored to obtain a feature vector of the composition text, where the feature vector includes a word vector and a text length vector;
a first training module 702, configured to input the word vector into a first neural network model for training to obtain a first output vector, and average the first output vector to obtain a first mean vector;
a grading module 703, configured to input the first mean vector and the text length vector into a classifier to perform score grading on the composition to be scored, so as to obtain probability distribution vectors of texts of the composition to be scored at each score grade;
a second training module 704, configured to input the first output vector and the word vector into a second neural network model for training to obtain a second output vector, and average the second output vector to obtain a second mean vector;
the calculating module 705 is configured to calculate a final score of the composition to be scored according to the score vector obtained by inputting the second mean vector into the regressor and the probability distribution vector.
In this embodiment, the composition text may be, but is not limited to, an English text or a Chinese text; it may be written in any single language.
Further, when the feature vector processed by the text processing module 701 is a word vector, the text processing module 701 is specifically configured to:
acquiring initial vectors for representing all words in composition texts;
and training the initial vector through a word vector model to obtain a word vector of each word in the composition text.
Further, when the feature vector processed by the text processing module 701 is a text length vector, the text processing module 701 is specifically configured to:
according to the text length, carrying out length grade division on the composition text to be scored, and enabling each length grade to correspond to one embedded vector;
acquiring an initial vector for representing the length of a text;
and operating through the embedded vector and the initial vector for expressing the text length to obtain a text length vector of the composition text.
In the preferred embodiment of the present invention, a one-hot vector is used as the initial vector for representing each word in the composition text and the length of the composition text. The text length grade can be divided according to specific situations or actual requirements.
The embodiment of the invention also provides a storage medium, wherein the storage medium comprises a computer program, and the computer program controls the electronic equipment where the storage medium is located to execute the composition scoring method when running.
In summary, the embodiments of the present invention provide a composition scoring method, apparatus and storage medium. The method comprises the steps of extracting word vectors and text length vectors representing the lengths of composition texts from the composition texts through a neural network model, and performing coarse classification processing and fine grading processing on the composition texts by combining a classifier, a regressor and the neural network model; firstly, dividing the composition text into a plurality of score grades through a classifier, and simultaneously obtaining probability distribution vectors of the composition text at each score grade; then, finely grading the composition texts with different score grades aiming at different features through a plurality of regressors to obtain score vectors of the composition texts; and finally, calculating to obtain the final score of the composition text through the probability distribution vector and the score vector. The composition scoring method provided by the embodiment of the invention greatly improves the effect of automatic composition scoring by extracting the word vector and the text length vector of the composition text and combining the coarse classification and fine scoring methods to comprehensively score the composition.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein, and any reference signs in the claims are not intended to be construed as limiting the claim concerned.

Claims (10)

1. A composition scoring method, comprising:
processing the composition text to be scored to obtain a feature vector of the composition text, wherein the feature vector comprises a word vector and a text length vector, and the text length vector is used for representing the length of the composition text;
inputting the word vector into a first neural network model for training to obtain a first output vector, and carrying out MoT operation on the first output vector for averaging to obtain a first mean vector;
inputting the first mean vector and the text length vector into a classifier to perform score grade division on the composition to be scored, and obtaining probability distribution vectors of the text of the composition to be scored on each score grade;
inputting the first output vector and the word vector into a second neural network model for training to obtain a second output vector, and carrying out MoT operation on the second output vector for averaging to obtain a second mean value vector;
and performing dot product operation on the score vector obtained by inputting the second mean vector into the regressor and the probability distribution vector to obtain the final score of the composition to be scored.
2. The composition scoring method of claim 1, further comprising the step of adjusting mapping parameters of the classifier and the regressor, the step comprising:
acquiring the score grade, the score vector, the probability distribution vector and the final score of the composition to be scored;
calculating the difference value of the final score and the manual score of the composition text through the score grade, the score vector, the probability distribution vector, the final score and the loss function;
optimizing the loss function, and adjusting the mapping parameters of the classifier and the regressor to reduce the difference between the final score and the manual score.
3. The composition scoring method according to claim 1 or 2, wherein when the feature vector is a word vector, the step of processing the composition text to be scored to obtain the feature vector of the composition text comprises:
acquiring initial vectors for representing all words in composition texts;
and training the initial vector for representing each word in the composition text through a word vector model to obtain a word vector of each word in the composition text.
4. The composition scoring method according to claim 1 or 2, wherein when the feature vector is a text length vector, the step of processing the composition text to be scored to obtain the feature vector of the composition text comprises:
according to a preset text length, carrying out length grade division on the composition text to be evaluated to enable each length grade to correspond to one embedded vector;
acquiring an initial vector for representing the length of a text;
and operating through the embedded vector and the initial vector for expressing the text length to obtain a text length vector of the composition text.
5. The composition scoring method according to claim 2, wherein the step of inputting the first mean vector and the text length vector into the classifier to score the composition to be scored and obtain the probability distribution vector of the text of the composition to be scored on each score level comprises:
dividing the composition to be scored into a plurality of score grades according to the manual scoring labels;
inputting the first mean vector and the text length vector into a classifier;
acquiring mapping parameters of the classifier;
and calculating to obtain a probability distribution vector through a softmax function and the mapping parameters of the classifier.
6. The composition scoring method according to claim 2, wherein the step of performing dot product operation on the score vector obtained by inputting the second mean vector into the regressor and the probability distribution vector to obtain the score of the composition to be scored comprises:
inputting the second mean vector into a regressor;
obtaining a mapping parameter of the regressor;
obtaining a score vector of a composition to be scored through a sigmoid function and the mapping parameters of the regressor;
and performing dot product operation on the fraction vector and the probability distribution vector to obtain the final score of the composition to be scored.
7. A composition scoring device, the device comprising:
the text processing module is used for processing the composition text to be scored to obtain a feature vector of the composition text, wherein the feature vector comprises a word vector and a text length vector, and the text length vector is used for representing the length of the composition text;
the first training module is used for inputting the word vector into a first neural network model for training to obtain a first output vector, and performing MoT operation on the first output vector for averaging to obtain a first mean value vector;
the grading division module is used for inputting the first mean vector and the text length vector into a classifier to perform score grading on the composition to be scored, and obtaining probability distribution vectors of the text of the composition to be scored on each score grade;
the second training module is used for inputting the first output vector and the word vector into a second neural network model for training to obtain a second output vector, and performing MoT operation on the second output vector for averaging to obtain a second mean vector;
and the calculating module is used for performing dot product operation on the score vector obtained by inputting the second mean vector into the regressor and the probability distribution vector to obtain the final score of the composition to be scored.
8. The composition scoring device of claim 7, wherein when the feature vector is a word vector, the text processing module is specifically configured to:
acquiring initial vectors for representing all words in composition texts;
and training the initial vector through a word vector model to obtain a word vector of each word in the composition text.
9. The composition scoring device of claim 7, wherein when the feature vector is a text length vector, the text processing module is specifically configured to:
according to the text length, carrying out length grade division on the composition text to be scored, and enabling each length grade to correspond to one embedded vector;
acquiring an initial vector for representing the length of a text;
and operating through the embedded vector and the initial vector for expressing the text length to obtain a text length vector of the composition text.
10. A storage medium comprising a computer program which, when executed, controls an apparatus in which the storage medium is located to perform the composition scoring method of any one of claims 1-6.
CN201810287644.1A 2018-04-03 2018-04-03 Composition scoring method, device and storage medium Active CN108519975B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810287644.1A CN108519975B (en) 2018-04-03 2018-04-03 Composition scoring method, device and storage medium

Publications (2)

Publication Number Publication Date
CN108519975A CN108519975A (en) 2018-09-11
CN108519975B 2021-09-28

Family

ID=63431745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810287644.1A Active CN108519975B (en) 2018-04-03 2018-04-03 Composition scoring method, device and storage medium

Country Status (1)

Country Link
CN (1) CN108519975B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109471915B (en) * 2018-10-09 2021-07-06 科大讯飞股份有限公司 Text evaluation method, device and equipment and readable storage medium
CN110162777B (en) * 2019-04-01 2020-05-19 广东外语外贸大学 Picture-drawing composition automatic scoring method and system
CN110717023B (en) * 2019-09-18 2023-11-07 平安科技(深圳)有限公司 Method and device for classifying interview answer text, electronic equipment and storage medium
CN111061870B (en) * 2019-11-25 2023-06-06 腾讯科技(深圳)有限公司 Article quality evaluation method and device
CN111581379B (en) * 2020-04-28 2022-03-25 电子科技大学 Automatic composition scoring calculation method based on composition question-deducting degree
CN111581392B (en) * 2020-04-28 2022-07-05 电子科技大学 Automatic composition scoring calculation method based on statement communication degree
CN112183065A (en) * 2020-09-16 2021-01-05 北京思源智通科技有限责任公司 Text evaluation method and device, computer readable storage medium and terminal equipment
CN112561334A (en) * 2020-12-16 2021-03-26 咪咕文化科技有限公司 Grading method and device for reading object, electronic equipment and storage medium


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102279844A (en) * 2011-08-31 2011-12-14 中国科学院自动化研究所 Method and system for automatically testing Chinese composition
CN102831558A (en) * 2012-07-20 2012-12-19 桂林电子科技大学 System and method for automatically scoring college English compositions independent of manual pre-scoring
CN107506360A (en) * 2016-06-14 2017-12-22 科大讯飞股份有限公司 A kind of essay grade method and system
CN107133211A (en) * 2017-04-26 2017-09-05 中国人民大学 A kind of composition methods of marking based on notice mechanism

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Attention-based Recurrent Convolutional Neural Network for Automatic Essay Scoring; Fei Dong; Proceedings of the 21st Conference on Computational Natural Language Learning; 20170804; 153-162 *
Automatic Text Scoring Using Neural Networks; Dimitrios Alikaniotis; https://arxiv.org/abs/1606.04289; 20160614; 1-11 *
Utilizing Latent Semantic Word Representations for Automated Essay Scoring; Cancan Jin; 2015 IEEE 12th Intl Conf on Ubiquitous Intelligence and Computing; 20160721; 1101-1108 *
Automatic Scoring Method for Subjective Mathematics Questions Based on Bi-LSTM; Liu Yixue; Education Management; 20180120; 109-113 *
Research on Automatic Essay Scoring Models and Methods; Chen Shanshan; China Master's Theses Full-text Database, Information Science and Technology; 20180215 (No. 2); I138-2830 *
Research on Automatic Evaluation Methods for History Short-Answer Questions of the College Entrance Examination; Yang Jingyun; China Master's Theses Full-text Database, Social Sciences II; 20170215 (No. 2); H130-208 *

Also Published As

Publication number Publication date
CN108519975A (en) 2018-09-11

Similar Documents

Publication Publication Date Title
CN108519975B (en) Composition scoring method, device and storage medium
US11657602B2 (en) Font identification from imagery
CN110619369B (en) Fine-grained image classification method based on feature pyramid and global average pooling
CN109886335B (en) Classification model training method and device
US20200027002A1 (en) Category learning neural networks
CN111738251B (en) Optical character recognition method and device fused with language model and electronic equipment
CN108470172B (en) Text information identification method and device
CN105144239B (en) Image processing apparatus, image processing method
CN106156777B (en) Text picture detection method and device
CN110807314A (en) Text emotion analysis model training method, device and equipment and readable storage medium
CN113657483A (en) Model training method, target detection method, device, equipment and storage medium
WO2020168754A1 (en) Prediction model-based performance prediction method and device, and storage medium
CN111831826A (en) Training method, classification method and device of cross-domain text classification model
US20170039451A1 (en) Classification dictionary learning system, classification dictionary learning method and recording medium
CN111611386A (en) Text classification method and device
CN113836303A (en) Text type identification method and device, computer equipment and medium
CN116994021A (en) Image detection method, device, computer readable medium and electronic equipment
WO2017188048A1 (en) Preparation apparatus, preparation program, and preparation method
CN113128565B (en) Automatic image annotation system and device oriented to agnostic pre-training annotation data
KR102437193B1 (en) Apparatus and method for parallel deep neural networks trained by resized images with multiple scaling factors
CN115713669B (en) Image classification method and device based on inter-class relationship, storage medium and terminal
EP4220555A1 (en) Training method and apparatus for image segmentation model, image segmentation method and apparatus, and device
US20210334938A1 (en) Image processing learning program, image processing program, information processing apparatus, and image processing system
CN115080864A (en) Artificial intelligence based product recommendation method and device, computer equipment and medium
CN113887422A (en) Table picture content extraction method, device and equipment based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 155, bungalow 17, No. 12, Jiancai Chengzhong Road, Xisanqi, Haidian District, Beijing 100096

Patentee after: BEIJING SINGSOUND INTELLIGENT TECHNOLOGY Co.,Ltd.

Address before: 100000 No. 38, 2f, block B, building 1, yard 2, Yongcheng North Road, Haidian District, Beijing

Patentee before: BEIJING SINGSOUND EDUCATION TECHNOLOGY CO.,LTD.