CN108519975A - Essay scoring method, device and storage medium - Google Patents
- Publication number
- CN108519975A CN108519975A CN201810287644.1A CN201810287644A CN108519975A CN 108519975 A CN108519975 A CN 108519975A CN 201810287644 A CN201810287644 A CN 201810287644A CN 108519975 A CN108519975 A CN 108519975A
- Authority
- CN
- China
- Prior art keywords
- vector
- text
- composition
- allocated
- evaluated
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
Embodiments of the present invention provide an essay scoring method, device and storage medium, relating to the field of automatic essay scoring. The method includes: processing an essay text to be scored to obtain its word vectors and a text length vector; inputting the word vectors into a first neural network model for training to obtain a first output vector, and averaging the first output vector to obtain a first mean vector; inputting the first mean vector and the text length vector into a classifier to divide the essay to be scored into score grades, obtaining a probability distribution vector; inputting the first output vector and the word vectors into a second neural network model for training to obtain a second output vector, and averaging the second output vector to obtain a second mean vector; and computing the final score of the essay to be scored from the score vector produced by regressors from the second mean vector, together with the probability distribution vector. Scoring essays in this way greatly improves automatic essay scoring performance.
Description
Technical field
The present invention relates to the field of automatic essay scoring, and in particular to an essay scoring method, device and storage medium.
Background technology
With the development of natural language processing and deep learning in recent years, neural network algorithms have been widely applied to natural language processing tasks, and Automatic Essay Scoring (AES) systems based on neural networks have been reported in succession. Most existing neural AES systems first obtain word vectors through an embedding layer to characterize the semantics of each word in the essay, then use a recurrent neural network, a convolutional neural network, or a combination of the two to extract deep features, and finally classify or regress on those features to score the essay.
However, existing AES systems only perform a simple classification or regression on the features output by the convolutional or recurrent network, and cannot well characterize global information such as essay length. In fact, essays of different levels should emphasize different features; during scoring, the relevant features should be weighed according to the essay's level. Therefore, for tasks that require fine-grained scoring, the performance of existing AES systems is unsatisfactory.
Invention content
To overcome the above deficiencies in the prior art, the purpose of the present invention is to provide an essay scoring method, device and storage medium, wherein the essay scoring method extracts the word vectors and text length vector of the essay text, then performs coarse classification and fine scoring on the essay text through neural network models to obtain the essay's final score, greatly improving essay scoring performance.
To achieve the above goals, the technical solution adopted by the preferred embodiments of the present invention is as follows:
An essay scoring method, the method including:
processing an essay text to be scored to obtain feature vectors of the essay text, wherein the feature vectors include word vectors and a text length vector;
inputting the word vectors into a first neural network model for training to obtain a first output vector, and averaging the first output vector to obtain a first mean vector;
inputting the first mean vector and the text length vector into a classifier to divide the essay text to be scored into score grades, obtaining the probability distribution vector of the essay text over the score grades;
inputting the first output vector and the word vectors into a second neural network model for training to obtain a second output vector, and averaging the second output vector to obtain a second mean vector;
computing the final score of the essay text to be scored from the score vector obtained by inputting the second mean vector into regressors, together with the probability distribution vector.
Optionally, the method further includes a step of adjusting the mapping parameters of the classifier and the regressors, the step including:
obtaining the score grade, score vector, probability distribution vector and final score of the essay text to be scored;
computing the difference between the final score of the essay text and its human score from the score grade, score vector, probability distribution vector, final score and a loss function;
optimizing the loss function and adjusting the mapping parameters of the classifier and the regressors so that the difference between the final score and the human score decreases.
Further, when the feature vector is a word vector, the step of processing the essay text to be scored to obtain feature vectors of the essay text includes:
obtaining initial vectors representing each word in the essay text;
training the initial vectors representing each word in the essay text with a word embedding model to obtain the word vector of each word in the essay text.
Further, when the feature vector is a text length vector, the step of processing the essay text to be scored to obtain feature vectors of the essay text includes:
dividing the essay text to be scored into length grades according to preset text lengths, each length grade corresponding to one embedded vector;
obtaining an initial vector representing the text length;
computing with the embedded vectors and the initial vector representing the text length to obtain the text length vector of the essay text.
Further, the step of inputting the first mean vector and the text length vector into the classifier to divide the essay text to be scored into score grades and obtain the probability distribution vector of the essay text over the score grades includes:
dividing the essay text to be scored into multiple score grades according to human scoring labels;
inputting the first mean vector and the text length vector into the classifier;
obtaining the mapping parameters of the classifier;
computing the probability distribution vector with the softmax function and the classifier's mapping parameters.
Further, the step of computing the score of the essay text to be scored from the score vector obtained by the regressors from the second mean vector and the probability distribution vector includes:
inputting the second mean vector into the regressors;
obtaining the mapping parameters of the regressors;
obtaining the score vector of the essay text to be scored with the sigmoid function and the regressors' mapping parameters;
computing with the score vector and the probability distribution vector to obtain the final score of the essay text to be scored.
Embodiments of the present invention also provide an essay scoring device, the device including:
a text processing module for processing the essay text to be scored to obtain feature vectors of the essay text, wherein the feature vectors include word vectors and a text length vector;
a first training module for inputting the word vectors into a first neural network model for training to obtain a first output vector, and averaging the first output vector to obtain a first mean vector;
a grade division module for inputting the first mean vector and the text length vector into a classifier to divide the essay text to be scored into score grades, obtaining the probability distribution vector of the essay text over the score grades;
a second training module for inputting the first output vector and the word vectors into a second neural network model for training to obtain a second output vector, and averaging the second output vector to obtain a second mean vector;
a computing module for computing the final score of the essay text to be scored from the score vector obtained by the regressors from the second mean vector and the probability distribution vector.
Further, when the feature vector is a word vector, the text processing module is specifically used to:
obtain initial vectors representing each word in the essay text;
train the initial vectors with a word embedding model to obtain the word vector of each word in the essay text.
Further, when the feature vector is a text length vector, the text processing module is specifically used to:
divide the essay text to be scored into length grades according to text length, each length grade corresponding to one embedded vector;
obtain an initial vector representing the text length;
compute with the embedded vectors and the initial vector representing the text length to obtain the text length vector of the essay text.
Embodiments of the present invention also provide a storage medium, the storage medium including a computer program which, when run, controls the device on which the storage medium resides to execute the above essay scoring method.
Description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and are therefore not to be construed as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 is the block diagram of the electronic device provided by an embodiment of the present invention;
Fig. 2 is the step diagram of the essay scoring method provided by an embodiment of the present invention;
Fig. 3 is the diagram of the mapping-parameter adjustment step in the essay scoring method provided by an embodiment of the present invention;
Fig. 4 is the diagram of the step of obtaining the word vectors of the essay text in the essay scoring method;
Fig. 5 is the diagram of the step of obtaining the text length vector of the essay text in the essay scoring method;
Fig. 6 is the diagram of the step of obtaining the probability distribution vector in the essay scoring method;
Fig. 7 is the diagram of the step of obtaining the final score of the essay to be scored in the essay scoring method;
Fig. 8 is the module diagram of the essay scoring device provided by an embodiment of the present invention.
Reference numerals: 100 - electronic device; 111 - memory; 112 - storage controller; 113 - processor; 70 - essay scoring device; 701 - text processing module; 702 - first training module; 703 - grade division module; 704 - second training module; 705 - computing module.
Specific implementation mode
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only a part, rather than all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings herein can be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the claimed scope of the present invention, but merely represents selected embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined and explained in subsequent drawings.
In the description of the embodiments of the present invention, it should be noted that naming such as "first" and "second" is used only to distinguish different features for convenience and simplicity of description, and does not indicate or imply relative importance; it therefore cannot be construed as limiting the present invention.
Some embodiments of the present invention are described in detail below with reference to the drawings. In the absence of conflict, the following embodiments and the features in them can be combined with each other.
Referring to Fig. 1, a block diagram of an electronic device 100 provided by preferred embodiments of the present invention is shown. The electronic device 100 may include an essay scoring device 70, a memory 111, a storage controller 112 and a processor 113.
The memory 111, storage controller 112 and processor 113 are electrically connected to each other directly or indirectly to realize data transmission or interaction; for example, these elements can be electrically connected through one or more communication buses or signal lines. The essay scoring device 70 may include at least one software function module stored in the memory 111 in the form of software or firmware, or solidified in the operating system (OS) of the electronic device 100. The processor 113 is used to execute executable modules stored in the memory 111, such as the software function modules and computer programs included in the essay scoring device 70.
The memory 111 may be, but is not limited to, Random Access Memory (RAM), Read Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), etc. The memory 111 is used to store programs, and the processor 113 executes the programs after receiving execution instructions. Access to the memory 111 by the processor 113 and other possible components is carried out under the control of the storage controller 112.
The processor 113 may be an integrated circuit chip with signal processing capability. It may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. It can implement or execute the methods, steps and logic diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or any conventional processor.
First embodiment
Referring to Fig. 2, Fig. 2 is the step diagram of the essay scoring method provided by this embodiment. The method includes the following steps:
Step S10: process the essay text to be scored to obtain feature vectors of the essay text, wherein the feature vectors include word vectors and a text length vector.
In this embodiment, the essay text is processed with a neural network model, and the feature vectors characterizing the essay, such as the word vectors of the essay text and the text length vector characterizing the essay's length, are obtained through encoding. The essay text may be, but is not limited to, English text or Chinese text; it may be text written in any language.
Step S20: input the word vectors into the first neural network model for training to obtain the first output vector, and average the first output vector to obtain the first mean vector.
In this embodiment, the word vectors obtained in step S10 above are input into the first neural network for training to obtain the first output vector, which is then averaged to obtain the first mean vector.
Step S30: input the first mean vector and the text length vector into the classifier to divide the essay text to be scored into score grades, obtaining the probability distribution vector of the essay text over the score grades.
In this embodiment, the first mean vector obtained in step S20 and the text length vector obtained in step S10 are input into the classifier to perform a preliminary score-grade division of the essay to be scored, and at the same time the probability distribution vector of the essay text over the score grades is obtained.
Step S40: input the first output vector and the word vectors into the second neural network model for training to obtain the second output vector, and average the second output vector to obtain the second mean vector.
In this embodiment, the first output vector obtained in step S20 and the word vectors obtained in step S10 are input into the second neural network model for training to obtain the second output vector, which is then averaged to obtain the second mean vector.
Step S50: compute the final score of the essay text to be scored from the score vector obtained by inputting the second mean vector into the regressors, together with the probability distribution vector.
In this embodiment, the second mean vector obtained in step S40 is input into the regressors to obtain the score vector of the essay text; the score vector is then combined with the probability distribution vector obtained in step S30 to obtain the final score of the essay text.
Optionally, in this embodiment, the first neural network model and the second neural network model may be, but are not limited to, a Recurrent Neural Network (RNN), a Long Short-Term Memory network (LSTM), or a Gated Recurrent Unit (GRU). In a preferred embodiment of the present invention, both the first and second neural network models use an LSTM.
Referring to Fig. 3, optionally, in order to improve the accuracy of automatic essay scoring, the method further includes a step S60 of adjusting the mapping parameters of the classifier and the regressors, which includes the following sub-steps:
Step S601: obtain the score grade, score vector, probability distribution vector and final score of the essay text to be scored.
In this embodiment, the score grade and probability distribution vector in step S601 can be obtained through step S30 above; the score vector and final score can be obtained through step S50 above.
Step S602: compute the difference between the final score of the essay text and its human score from the score grade, score vector, probability distribution vector, final score and a loss function.
In this embodiment, the smaller the difference between the final score of the essay text and the human score, the higher the accuracy of the automatic essay scoring model.
Step S603: optimize the loss function and adjust the mapping parameters of the classifier and the regressors so that the difference between the final score and the human score decreases.
In this embodiment, the loss function includes two parts: the loss in the classifier and the loss in the regressors. In this preferred embodiment, the loss function is optimized by a gradient descent algorithm, which yields the mapping parameters in the classifier and the regressors; adjusting these mapping parameters improves the accuracy of essay scoring.
The loss function can be expressed as a weighted combination of the two parts, for example:
L = λ·L_cls(y, prob) + (1 − λ)·L_reg(S, score), where 0 ≤ λ ≤ 1
wherein y is the score grade of the essay text to be scored, score is the score vector of the essay text to be scored, prob is the probability distribution vector of the essay to be scored, M is the number of regressors, and S is the final score of the essay text to be scored.
In a preferred embodiment of the present invention, Dropout is used as the regularization method with the retention probability set to 0.5, and the optimization algorithm is Adam. The retention probability is a hyperparameter of the Dropout algorithm, used to address the overfitting problem during regularization; its value can be adjusted according to the actual situation.
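The two-part loss described above can be sketched in a few lines. The exact functional form is not given in the description, so the cross-entropy and squared-error terms below are an assumed instantiation of the classifier loss and regressor loss, mixed by the weight λ (`lam`):

```python
import math

def combined_loss(grade, prob, human_score, final_score, lam=0.5):
    """Weighted sum of an assumed classifier loss (negative log-likelihood
    of the true score grade) and an assumed regressor loss (squared error
    between the final score and the human score); 0 <= lam <= 1."""
    l_cls = -math.log(prob[grade])                  # classification part
    l_reg = (final_score - human_score) ** 2        # regression part
    return lam * l_cls + (1 - lam) * l_reg
```

In practice the gradient of this quantity with respect to the classifier's and regressors' mapping parameters is what a gradient descent optimizer such as Adam would follow.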
Referring to Fig. 4, in an embodiment of the present invention, when obtaining the word vectors of the essay text through the neural network model, step S10 includes a step S101 of obtaining the word vectors of the essay text, and step S101 specifically includes the following sub-steps:
Step S1011: obtain initial vectors representing each word in the essay text.
In a preferred embodiment of the present invention, one-hot vectors are used as the initial vectors representing each word in the essay text. For example, the words in the text are expressed as X = {w1, w2, ..., wn}, where wi denotes the one-hot vector corresponding to the i-th word and n is the text length.
Step S1012: train the initial vectors representing each word in the essay text with a word embedding model to obtain the word vector of each word in the essay text.
In a preferred embodiment of the present invention, the one-hot vector of each word obtained in step S1011 is input into a word2vec or GloVe embedding model; the embedding matrix formed by the word vectors pre-trained by word2vec or GloVe is obtained, and finally encoding with this embedding matrix yields the word vector corresponding to each word's one-hot vector in the essay text. The word vectors can be obtained as:
L1 = X·E
wherein L1 denotes the word vectors of the essay text, E is the embedding matrix (of size |V| × D) formed by the word vectors pre-trained by the embedding model, |V| denotes the vocabulary size, and D is the dimension of the word vectors.
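Since each row of X is one-hot, the product X·E amounts to selecting one row of the embedding matrix per word. A minimal sketch of this lookup, where the vocabulary and the embedding matrix are illustrative stand-ins for a pretrained word2vec or GloVe model:

```python
def lookup_word_vectors(tokens, vocab, embedding_matrix):
    """L1 = X·E: each word's one-hot row selects the matching row of the
    |V| x D embedding matrix E, so the matrix product reduces to direct
    row indexing. vocab maps each word to its row index in E."""
    return [embedding_matrix[vocab[word]] for word in tokens]
```

Real systems would additionally handle out-of-vocabulary words, e.g. with a reserved unknown-word row.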
Referring to Fig. 5, in embodiments of the present invention, when obtaining the text length vector of the essay text through the neural network model, step S10 further includes a step S102 of obtaining the length vector of the essay text, which specifically includes:
Step S1021: divide the essay text to be scored into length grades according to preset text lengths, each length grade corresponding to one embedded vector.
In a preferred embodiment of the present invention, in order to better characterize text length, a text-length embedding method is used: essay texts are divided into several length grades according to the text lengths of the training data. The correspondence between essay length and length grade can be adjusted according to the actual situation; for example, 0-75 words are defined as grade 1, 76-150 words as grade 2, 151-250 words as grade 3, 251-350 words as grade 4, and more than 350 words as grade 5, with each length grade corresponding to one embedded vector qr.
Step S1022: obtain an initial vector representing the text length.
In a preferred embodiment of the present invention, a one-hot vector is used as the initial vector representing the text length of the essay text.
Step S1023: compute with the embedded vectors and the initial vector representing the text length to obtain the text length vector of the essay text.
In this embodiment, the embedded vectors characterizing the text length obtained in step S1021 and the initial vector of the text length obtained in step S1022 are combined to obtain the text length vector of the essay text. The text length vector can be computed as:
q = l·Q
wherein q is the text length vector, l is the one-hot vector representing the essay text length grade, Q is the matrix of embedded vectors corresponding to the length grades, Q = {q1, q2, ..., qr}, r is the number of length grades, and d is the dimension of the text length vector.
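The bucketing and lookup above can be sketched as follows, using the example cut-offs from the description (the cut-offs and the contents of Q are tunable; the values below are purely illustrative):

```python
def length_grade(num_words, boundaries=(75, 150, 250, 350)):
    """Map an essay's word count to a length grade 1..5 using the example
    thresholds (0-75 -> 1, 76-150 -> 2, 151-250 -> 3, 251-350 -> 4,
    above 350 -> 5)."""
    for grade, upper in enumerate(boundaries, start=1):
        if num_words <= upper:
            return grade
    return len(boundaries) + 1

def length_vector(num_words, Q):
    """q = l·Q: the one-hot length-grade vector l selects the grade's
    embedded vector from Q = {q1, ..., qr}, i.e. a simple row lookup."""
    return Q[length_grade(num_words) - 1]
```

As with the word embeddings, the one-hot multiplication collapses to indexing; the rows of Q would be learned jointly with the rest of the model.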
Referring to Fig. 6, further, in embodiments of the present invention, step S30 of dividing the essay text to be scored into score grades and obtaining the probability distribution vector of the essay text over the score grades specifically includes:
Step S301: divide the essay text to be scored into multiple score grades according to human scoring labels.
In this embodiment, essay texts are classified by human scoring, e.g. as poor, fair, good or excellent. The essay scoring model can then use this standard as labels to perform a preliminary classification of the essay text to be scored.
Step S302: input the first mean vector and the text length vector into the classifier.
In this embodiment, the first mean vector is obtained through step S20 above, and the text length vector is obtained through step S10 above.
Step S303: obtain the mapping parameters of the classifier.
In this embodiment, the mapping parameters of the classifier can be obtained by optimizing the loss function in step S60 above, and they can be adjusted through this optimization to improve the accuracy of automatic essay scoring.
Step S304: compute the probability distribution vector with the softmax function and the classifier's mapping parameters.
In this embodiment, before the essay text to be scored is divided into score grades, the word vectors of the essay text obtained in step S10 above are first input into the first neural network model, and the first output vector is obtained after LSTM processing; a MoT (mean over time) operation is then applied to the first output vector, averaging it to obtain the first mean vector; finally, the first mean vector and the text length vector obtained in step S10 are concatenated as the input of the classifier, the essay text is given a preliminary score-grade division according to the human scoring labels, and the probability distribution vector of the essay text over the score grades is computed.
In the above process, the first output vector and the first mean vector can be obtained as:
L2 = LSTM(L1), h = MoT(L2)
wherein L1 denotes the word vectors of the essay text, L2 is the first output vector obtained by LSTM processing, h is the first mean vector obtained by averaging the first output vector, and T is the length of the essay text (the number of words in the essay).
The above probability distribution vector can be obtained by the following formula:
prob = softmax([h, q]·Wcls + bcls)
wherein prob is the probability distribution vector of the essay text over the score grades, h is the first mean vector above, q is the text length vector of the essay text, Wcls and bcls are the mapping parameters of the classifier, and M is the number of score grades. The mapping parameters can be obtained by optimizing the loss function.
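The classifier step can be sketched as below; `W` and `b` stand in for the mapping parameters Wcls and bcls, which in the method would come from optimizing the loss function rather than being fixed as here:

```python
import math

def classify(h, q, W, b):
    """prob = softmax([h, q]·Wcls + bcls): concatenate the first mean
    vector h with the text length vector q, apply an affine map, and
    normalize over the M score grades. W has len(h)+len(q) rows and
    M columns; b has M entries."""
    x = h + q                                       # concatenation [h, q]
    logits = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
              for j in range(len(b))]
    m = max(logits)                                 # numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The output sums to 1 and gives the preliminary grade distribution used later to weight the regressors.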
Refering to Fig. 7, further, in embodiments of the present invention, is inputted according to the second mean vector and return point that device obtains
The step S50 that the score to be evaluated for being allocated as text is calculated with ProbabilityDistribution Vector for number vector is specifically included:
Second mean vector is inputted and returns device by step S501.
In a preferred embodiment of the present invention, considering that composition texts at different score levels should be judged on different features, multiple regressors are used to process the composition text so that each score level can be scored more finely. The number of regressors equals the number of score levels described above, so that each regressor handles composition texts of a different score level and attends to different features, realizing fine-grained scoring and improving the accuracy of composition scoring. The second mean vector obtained in step S40 above can be expressed as:
L3 = LSTM([L1, L2])
a = MoT(L3)
where L1 is the word vector of the composition text, L2 is the first output vector described above, L3 is the second output vector obtained by applying the LSTM operation of the second neural network model to [L1, L2], and a is the second mean vector obtained by averaging the second output vector (mean over time, MoT).
Step S502: obtain the mapping parameters of the regressors.
In this embodiment, the mapping parameters of the regressors can be obtained by optimizing the loss function in step S60 above, and they can be further adjusted by optimizing the loss function so as to improve the accuracy of automatic composition scoring.
Step S503: obtain the score vector of the composition text to be scored from the sigmoid function and the mapping parameters of the regressors.
Step S504: combine the score vector with the probability distribution vector to obtain the final score of the composition text to be scored.
In this embodiment, after the second mean vector is obtained, it is used as the input of the regressors, and the score vector of the composition text can be calculated through the sigmoid function:
score_i = sigmoid(a·W_i + b_i)
where a is the second mean vector described above, W_i and b_i are the mapping parameters of the i-th regressor, score_i is the component of the score vector produced by that regressor, and M is the number of regressors (equal to the number of score levels). The mapping parameters of the regressors can be adjusted by optimizing the loss function to further improve the accuracy of composition scoring.
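The per-level regression score_i = sigmoid(a·W_i + b_i) can be sketched by stacking the M regressors' parameters into one matrix; dimensions and random parameters are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

D, M = 4, 5                   # hidden size and number of regressors / score levels
rng = np.random.default_rng(2)

a = rng.normal(size=D)        # second mean vector a
W = rng.normal(size=(M, D))   # row i holds the mapping parameters W_i
b = np.zeros(M)               # biases b_i

scores = sigmoid(W @ a + b)   # score_i in (0, 1) for each score level
```

Each component lies strictly between 0 and 1, which is what makes the later dot product with the probability distribution behave like an expected score.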
In the embodiment of the present invention, taking the dot product of the score vector and the probability distribution vector obtained above yields the final score of the composition text:
S = prob · scoresᵀ
where scores = {score_1, score_2, ..., score_M}
and prob is the probability distribution vector obtained above, scores is the score vector obtained above, and S is the final score of the composition text.
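The final formula S = prob · scoresᵀ is an expectation of the per-level scores under the classifier's distribution. A minimal sketch with hand-picked illustrative values (not from the patent):

```python
import numpy as np

# prob: distribution over M = 5 score levels from the classifier;
# scores: per-level scores from the 5 regressors (both values illustrative).
prob = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
scores = np.array([0.15, 0.35, 0.55, 0.75, 0.95])

S = float(prob @ scores)   # S = prob · scoresᵀ = 0.55 here
```

Because prob sums to 1 and every score_i lies in (0, 1), S is always a weighted average of the level scores.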
Second embodiment
Referring to Fig. 8, this embodiment provides a composition scoring apparatus 70, which includes:
a text processing module 701, configured to process the composition text to be scored and obtain the feature vectors of the composition text, wherein the feature vectors include a word vector and a text-length vector;
a first training module 702, configured to feed the word vector into the first neural network model for training to obtain the first output vector, and to average the first output vector to obtain the first mean vector;
a level division module 703, configured to feed the first mean vector and the text-length vector into the classifier to divide the composition text to be scored into score levels, obtaining the probability distribution vector of the composition text over the score levels;
a second training module 704, configured to feed the first output vector and the word vector into the second neural network model for training to obtain the second output vector, and to average the second output vector to obtain the second mean vector;
a computing module 705, configured to calculate the final score of the composition text to be scored from the probability distribution vector and the score vector obtained by feeding the second mean vector into the regressors.
In this embodiment, the composition text may be, but is not limited to, an English text or a Chinese text; it can be a composition in any language.
Further, when the feature vector processed by the text processing module 701 is the word vector, the text processing module 701 is specifically configured to:
obtain the initial vector representing each word in the composition text;
train the initial vector through a word-vector model to obtain the word vector of each word in the composition text.
Further, when the feature vector processed by the text processing module 701 is the text-length vector, the text processing module 701 is specifically configured to:
divide the composition text to be scored into length grades according to text length, each length grade corresponding to one embedded vector;
obtain the initial vector representing the text length;
combine the embedded vector with the initial vector representing the text length to obtain the text-length vector of the composition text.
In a preferred embodiment of the present invention, one-hot vectors are used as the initial vectors representing each word in the composition text and the text length of the composition text. The text-length grades can be divided according to specific circumstances or actual demand.
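The one-hot initial vectors and the per-grade embedded vector described above can be sketched as follows. Vocabulary size, grade count, and indices are illustrative assumptions; the embedding matrix would normally be learned.

```python
import numpy as np

V, K = 10, 4                   # vocabulary size and number of length grades
word_idx, len_idx = 3, 1       # illustrative word / length-grade indices

word_init = np.eye(V)[word_idx]   # one-hot initial vector for a word
len_init = np.eye(K)[len_idx]     # one-hot initial vector for the text length

# One embedded vector per length grade; combining the one-hot initial
# vector with the embedding matrix selects that grade's embedded vector.
E = np.random.default_rng(3).normal(size=(K, 2))
length_vector = len_init @ E      # equals E[len_idx]
```

Multiplying a one-hot vector by the embedding matrix is equivalent to a table lookup, which is why one-hot initial vectors are a convenient starting representation.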
The embodiment of the present invention also provides a storage medium. The storage medium includes a computer program, and when the computer program runs, it controls the electronic equipment where the storage medium is located to execute the composition scoring method described above.
In conclusion, the embodiments of the present invention provide a composition scoring method, apparatus, and storage medium. The method extracts, through neural network models, the word vector of the composition text and the text-length vector characterizing its length, and combines a classifier, regressors, and neural network models to perform coarse classification and fine scoring of the composition text. First, the classifier divides the composition text into multiple score levels and obtains the probability distribution vector of the composition text over those levels. Then, multiple regressors finely score the composition texts of different score levels with respect to different features, producing the score vector of the composition text. Finally, the final score of the composition text is calculated from the probability distribution vector and the score vector. By extracting the word vector and text-length vector of the composition text and combining coarse classification with fine scoring, the composition scoring method provided by the embodiments of the present invention scores compositions comprehensively and greatly improves the effect of automatic composition scoring.
It is apparent to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments and can be realized in other specific forms without departing from its spirit or essential attributes. Therefore, from whatever point of view, the embodiments are to be considered illustrative and not restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and it is intended that all variations falling within the meaning and scope of equivalents of the claims be included within the present invention. Any reference signs in the claims shall not be construed as limiting the claims involved.
Claims (10)
1. A composition scoring method, characterized in that the method includes:
processing the composition text to be scored to obtain the feature vectors of the composition text, wherein the feature vectors include a word vector and a text-length vector;
feeding the word vector into a first neural network model for training to obtain a first output vector, and averaging the first output vector to obtain a first mean vector;
feeding the first mean vector and the text-length vector into a classifier to divide the composition text to be scored into score levels, obtaining the probability distribution vector of the composition text over the score levels;
feeding the first output vector and the word vector into a second neural network model for training to obtain a second output vector, and averaging the second output vector to obtain a second mean vector;
calculating the final score of the composition text to be scored from the probability distribution vector and the score vector obtained by feeding the second mean vector into regressors.
2. The composition scoring method according to claim 1, characterized in that the method further includes a step of adjusting the mapping parameters of the classifier and the regressors, which includes:
obtaining the score levels, the score vector, the probability distribution vector, and the final score of the composition text to be scored;
calculating, from the score levels, the score vector, the probability distribution vector, the final score, and a loss function, the difference between the final score of the composition text and the manual score;
optimizing the loss function and adjusting the mapping parameters of the classifier and the regressors so that the difference between the final score and the manual score is reduced.
3. The composition scoring method according to claim 1 or 2, characterized in that, when the feature vector is the word vector, the step of processing the composition text to be scored to obtain the feature vectors of the composition text includes:
obtaining the initial vector representing each word in the composition text;
training the initial vector representing each word in the composition text through a word-vector model to obtain the word vector of each word in the composition text.
4. The composition scoring method according to claim 1 or 2, characterized in that, when the feature vector is the text-length vector, the step of processing the composition text to be scored to obtain the feature vectors of the composition text includes:
dividing the composition text to be scored into length grades according to a preset text length, each length grade corresponding to one embedded vector;
obtaining the initial vector representing the text length;
combining the embedded vector with the initial vector representing the text length to obtain the text-length vector of the composition text.
5. The composition scoring method according to claim 2, characterized in that the step of feeding the first mean vector and the text-length vector into the classifier to divide the composition text to be scored into score levels and obtain the probability distribution vector of the composition text over the score levels includes:
dividing the composition text to be scored into multiple score levels according to manual-score labels;
feeding the first mean vector and the text-length vector into the classifier;
obtaining the mapping parameters of the classifier;
calculating the probability distribution vector from the softmax function and the mapping parameters of the classifier.
6. The composition scoring method according to claim 2, characterized in that the step of calculating the score of the composition text to be scored from the probability distribution vector and the score vector obtained by feeding the second mean vector into the regressors includes:
feeding the second mean vector into the regressors;
obtaining the mapping parameters of the regressors;
obtaining the score vector of the composition text to be scored from the sigmoid function and the mapping parameters of the regressors;
combining the score vector with the probability distribution vector to obtain the final score of the composition text to be scored.
7. A composition scoring apparatus, characterized in that the apparatus includes:
a text processing module, configured to process the composition text to be scored and obtain the feature vectors of the composition text, wherein the feature vectors include a word vector and a text-length vector;
a first training module, configured to feed the word vector into a first neural network model for training to obtain a first output vector, and to average the first output vector to obtain a first mean vector;
a level division module, configured to feed the first mean vector and the text-length vector into a classifier to divide the composition text to be scored into score levels, obtaining the probability distribution vector of the composition text over the score levels;
a second training module, configured to feed the first output vector and the word vector into a second neural network model for training to obtain a second output vector, and to average the second output vector to obtain a second mean vector;
a computing module, configured to calculate the final score of the composition text to be scored from the probability distribution vector and the score vector obtained by feeding the second mean vector into regressors.
8. The composition scoring apparatus according to claim 7, characterized in that, when the feature vector is the word vector, the text processing module is specifically configured to:
obtain the initial vector representing each word in the composition text;
train the initial vector through a word-vector model to obtain the word vector of each word in the composition text.
9. The composition scoring apparatus according to claim 7, characterized in that, when the feature vector is the text-length vector, the text processing module is specifically configured to:
divide the composition text to be scored into length grades according to text length, each length grade corresponding to one embedded vector;
obtain the initial vector representing the text length;
combine the embedded vector with the initial vector representing the text length to obtain the text-length vector of the composition text.
10. A storage medium, characterized in that the storage medium includes a computer program, and when the computer program runs, it controls the equipment where the storage medium is located to execute the composition scoring method according to any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810287644.1A CN108519975B (en) | 2018-04-03 | 2018-04-03 | Composition scoring method, device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108519975A true CN108519975A (en) | 2018-09-11 |
CN108519975B CN108519975B (en) | 2021-09-28 |
Family
ID=63431745
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810287644.1A Active CN108519975B (en) | 2018-04-03 | 2018-04-03 | Composition scoring method, device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108519975B (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109471915A (en) * | 2018-10-09 | 2019-03-15 | 科大讯飞股份有限公司 | A kind of text evaluation method, device, equipment and readable storage medium storing program for executing |
CN110162777A (en) * | 2019-04-01 | 2019-08-23 | 广东外语外贸大学 | One kind seeing figure writing type Automated Essay Scoring method and system |
CN111061870A (en) * | 2019-11-25 | 2020-04-24 | 三角兽(北京)科技有限公司 | Article quality evaluation method and device |
CN111581379A (en) * | 2020-04-28 | 2020-08-25 | 电子科技大学 | Automatic composition scoring calculation method based on composition question-deducting degree |
CN111581392A (en) * | 2020-04-28 | 2020-08-25 | 电子科技大学 | Automatic composition scoring calculation method based on statement communication degree |
CN112183065A (en) * | 2020-09-16 | 2021-01-05 | 北京思源智通科技有限责任公司 | Text evaluation method and device, computer readable storage medium and terminal equipment |
WO2021051586A1 (en) * | 2019-09-18 | 2021-03-25 | 平安科技(深圳)有限公司 | Interview answer text classification method, device, electronic apparatus and storage medium |
CN112561334A (en) * | 2020-12-16 | 2021-03-26 | 咪咕文化科技有限公司 | Grading method and device for reading object, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102279844A (en) * | 2011-08-31 | 2011-12-14 | 中国科学院自动化研究所 | Method and system for automatically testing Chinese composition |
CN102831558A (en) * | 2012-07-20 | 2012-12-19 | 桂林电子科技大学 | System and method for automatically scoring college English compositions independent of manual pre-scoring |
CN107133211A (en) * | 2017-04-26 | 2017-09-05 | 中国人民大学 | A kind of composition methods of marking based on notice mechanism |
CN107506360A (en) * | 2016-06-14 | 2017-12-22 | 科大讯飞股份有限公司 | A kind of essay grade method and system |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102279844A (en) * | 2011-08-31 | 2011-12-14 | 中国科学院自动化研究所 | Method and system for automatically testing Chinese composition |
CN102831558A (en) * | 2012-07-20 | 2012-12-19 | 桂林电子科技大学 | System and method for automatically scoring college English compositions independent of manual pre-scoring |
CN107506360A (en) * | 2016-06-14 | 2017-12-22 | 科大讯飞股份有限公司 | A kind of essay grade method and system |
CN107133211A (en) * | 2017-04-26 | 2017-09-05 | 中国人民大学 | A kind of composition methods of marking based on notice mechanism |
Non-Patent Citations (6)
Title |
---|
CANCAN JIN: "Utilizing Latent Semantic Word Representations for Automated Essay Scoring", 2015 IEEE 12th Intl Conf on Ubiquitous Intelligence and Computing *
DIMITRIOS ALIKANIOTIS: "Automatic Text Scoring Using Neural Networks", https://arxiv.org/abs/1606.04289 *
FEI DONG: "Attention-based Recurrent Convolutional Neural Network for Automatic Essay Scoring", Proceedings of the 21st Conference on Computational Natural Language Learning *
LIU Yixue: "An automatic scoring method for subjective mathematics questions based on Bi-LSTM", Education Management *
YANG Jingyun: "Research on automatic evaluation methods for short-answer history questions in the college entrance examination", China Master's Theses Full-text Database, Social Sciences II *
CHEN Shanshan: "Research on automatic essay scoring models and methods", China Master's Theses Full-text Database, Information Science and Technology *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109471915A (en) * | 2018-10-09 | 2019-03-15 | 科大讯飞股份有限公司 | A kind of text evaluation method, device, equipment and readable storage medium storing program for executing |
CN110162777A (en) * | 2019-04-01 | 2019-08-23 | 广东外语外贸大学 | One kind seeing figure writing type Automated Essay Scoring method and system |
CN110162777B (en) * | 2019-04-01 | 2020-05-19 | 广东外语外贸大学 | Picture-drawing composition automatic scoring method and system |
WO2021051586A1 (en) * | 2019-09-18 | 2021-03-25 | 平安科技(深圳)有限公司 | Interview answer text classification method, device, electronic apparatus and storage medium |
CN111061870A (en) * | 2019-11-25 | 2020-04-24 | 三角兽(北京)科技有限公司 | Article quality evaluation method and device |
CN111581379A (en) * | 2020-04-28 | 2020-08-25 | 电子科技大学 | Automatic composition scoring calculation method based on composition question-deducting degree |
CN111581392A (en) * | 2020-04-28 | 2020-08-25 | 电子科技大学 | Automatic composition scoring calculation method based on statement communication degree |
CN111581379B (en) * | 2020-04-28 | 2022-03-25 | 电子科技大学 | Automatic composition scoring calculation method based on composition question-deducting degree |
CN111581392B (en) * | 2020-04-28 | 2022-07-05 | 电子科技大学 | Automatic composition scoring calculation method based on statement communication degree |
CN112183065A (en) * | 2020-09-16 | 2021-01-05 | 北京思源智通科技有限责任公司 | Text evaluation method and device, computer readable storage medium and terminal equipment |
CN112561334A (en) * | 2020-12-16 | 2021-03-26 | 咪咕文化科技有限公司 | Grading method and device for reading object, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108519975B (en) | 2021-09-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108519975A (en) | Composition methods of marking, device and storage medium | |
US20210042580A1 (en) | Model training method and apparatus for image recognition, network device, and storage medium | |
CN109765462A (en) | Fault detection method, device and the terminal device of transmission line of electricity | |
CN109325547A (en) | Non-motor vehicle image multi-tag classification method, system, equipment and storage medium | |
WO2019179403A1 (en) | Fraud transaction detection method based on sequence width depth learning | |
CN106326984A (en) | User intention identification method and device and automatic answering system | |
CN109948149A (en) | A kind of file classification method and device | |
CN111178537B (en) | Feature extraction model training method and device | |
CN106909972A (en) | A kind of learning method of sensing data calibrating patterns | |
CN110348563A (en) | The semi-supervised training method of neural network, device, server and storage medium | |
CN112785005A (en) | Multi-target task assistant decision-making method and device, computer equipment and medium | |
CN109299246A (en) | A kind of file classification method and device | |
CN109635755A (en) | Face extraction method, apparatus and storage medium | |
CN113919401A (en) | Modulation type identification method and device based on constellation diagram characteristics and computer equipment | |
CN107688651A (en) | The emotion of news direction determination process, electronic equipment and computer-readable recording medium | |
CN106557566A (en) | A kind of text training method and device | |
CN110162769A (en) | Text subject output method and device, storage medium and electronic device | |
CN105117330B (en) | CNN code test methods and device | |
CN116432023A (en) | Novel power system fault classification method based on sample transfer learning | |
CN110428012A (en) | Brain method for establishing network model, brain image classification method, device and electronic equipment | |
CN110503600A (en) | Feature point detecting method, device, electronic equipment and readable storage medium storing program for executing | |
CN113947140A (en) | Training method of face feature extraction model and face feature extraction method | |
CN114970357A (en) | Energy-saving effect evaluation method, system, device and storage medium | |
CN115249281A (en) | Image occlusion and model training method, device, equipment and storage medium | |
CN106803233A (en) | The optimization method of perspective image conversion |
Legal Events
Date | Code | Title | Description
---|---|---|---
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CP03 | Change of name, title or address | |
Address after: Room 155, bungalow 17, No. 12, Jiancai Chengzhong Road, Xisanqi, Haidian District, Beijing 100096 Patentee after: BEIJING SINGSOUND INTELLIGENT TECHNOLOGY Co.,Ltd. Address before: 100000 No. 38, 2f, block B, building 1, yard 2, Yongcheng North Road, Haidian District, Beijing Patentee before: BEIJING SINGSOUND EDUCATION TECHNOLOGY CO.,LTD. |