CN108519975A - Composition scoring method, device and storage medium - Google Patents

Composition scoring method, device and storage medium

Info

Publication number
CN108519975A
CN108519975A (application CN201810287644.1A)
Authority
CN
China
Prior art keywords
vector
text
composition
scoring
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810287644.1A
Other languages
Chinese (zh)
Other versions
CN108519975B (en)
Inventor
陆勇毅
秦龙
徐书尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING SINGSOUND INTELLIGENT TECHNOLOGY Co.,Ltd.
Original Assignee
Beijing Pre Education Science And Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Pre Education Science And Technology Co Ltd
Priority to CN201810287644.1A (granted as CN108519975B)
Publication of CN108519975A
Application granted
Publication of CN108519975B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

An embodiment of the present invention provides a composition scoring method, device and storage medium, relating to the technical field of automatic essay scoring. The method includes: processing a composition text to be scored to obtain a word vector and a text-length vector of the composition text; inputting the word vector into a first neural network model for training to obtain a first output vector, and averaging the first output vector to obtain a first mean vector; inputting the first mean vector and the text-length vector into a classifier to divide the composition to be scored into score grades, obtaining a probability distribution vector; inputting the first output vector and the word vector into a second neural network model for training to obtain a second output vector, and averaging the second output vector to obtain a second mean vector; and calculating the final score of the composition text to be scored from the score vector obtained by inputting the second mean vector into the regressors and from the probability distribution vector. Scoring compositions by this method greatly improves the effectiveness of automatic composition scoring.

Description

Composition scoring method, device and storage medium
Technical field
The present invention relates to the technical field of automatic essay scoring, and in particular to a composition scoring method, device and storage medium.
Background technology
With the development of natural language processing and deep learning in recent years, neural network algorithms have been widely applied to natural language processing tasks, and automatic essay scoring (AES) systems based on neural network algorithms have been reported one after another. Most existing neural-network AES systems first obtain word vectors through an embedding to characterize the semantics of each word in an essay, then perform deep feature extraction using a recurrent neural network, a convolutional neural network, or a combination of the two, and finally classify or regress on the extracted features to score the essay.
However, existing AES systems only perform a simple classification or regression on the features output by the convolutional or recurrent neural network, and cannot well characterize global information such as essay length. In fact, different features should be emphasized for essays of different levels, and the scoring process should integrate the corresponding features according to the level of the essay. Therefore, for tasks that require fine-grained scoring, existing AES systems score unsatisfactorily.
Summary of the invention
In order to overcome the above deficiencies in the prior art, the purpose of the present invention is to provide a composition scoring method, device and storage medium, wherein the composition scoring method extracts the word vector and text-length vector of a composition text, then performs coarse classification and fine scoring on the composition text through neural network models to obtain the final score of the composition text, greatly improving the effectiveness of composition scoring.
To achieve the above purpose, the technical solutions adopted by the preferred embodiments of the present invention are as follows:
A composition scoring method, the method including:
processing a composition text to be scored to obtain a feature vector of the composition text, wherein the feature vector includes a word vector and a text-length vector;
inputting the word vector into a first neural network model for training to obtain a first output vector, and averaging the first output vector to obtain a first mean vector;
inputting the first mean vector and the text-length vector into a classifier to divide the composition text to be scored into score grades, and obtaining a probability distribution vector of the composition text to be scored over the score grades;
inputting the first output vector and the word vector into a second neural network model for training to obtain a second output vector, and averaging the second output vector to obtain a second mean vector;
calculating the final score of the composition text to be scored from the score vector obtained by inputting the second mean vector into the regressors and from the probability distribution vector.
Optionally, the method further includes a step of adjusting the mapping parameters of the classifier and the regressors, the step including:
obtaining the score grade, score vector, probability distribution vector and final score of the composition text to be scored;
calculating the difference between the final score of the composition text and the manual score through the score grade, score vector, probability distribution vector, final score and a loss function;
optimizing the loss function, and adjusting the mapping parameters of the classifier and the regressors so that the difference between the final score and the manual score decreases.
Further, when the feature vector is the word vector, the step of processing the composition text to be scored to obtain the feature vector of the composition text includes:
obtaining an initial vector for representing each word in the composition text;
training the initial vector for representing each word in the composition text through a word-vector model to obtain the word vector of each word in the composition text.
Further, when the feature vector is the text-length vector, the step of processing the composition text to be scored to obtain the feature vector of the composition text includes:
dividing the composition text to be scored into length grades according to a preset text length, so that each length grade corresponds to one embedded vector;
obtaining an initial vector for representing the text length;
performing an operation on the embedded vector and the initial vector for representing the text length to obtain the text-length vector of the composition text.
Further, the step of inputting the first mean vector and the text-length vector into the classifier to divide the composition text to be scored into score grades and obtain the probability distribution vector of the composition text to be scored over the score grades includes:
dividing the composition text to be scored into multiple score grades according to manual scoring labels;
inputting the first mean vector and the text-length vector into the classifier;
obtaining the mapping parameters of the classifier;
calculating the probability distribution vector through a softmax function and the mapping parameters of the classifier.
Further, the step of calculating the score of the composition text to be scored from the score vector obtained by the regressors and from the probability distribution vector includes:
inputting the second mean vector into the regressors;
obtaining the mapping parameters of the regressors;
obtaining the score vector of the composition text to be scored through a sigmoid function and the mapping parameters of the regressors;
performing an operation on the score vector and the probability distribution vector to obtain the final score of the composition text to be scored.
An embodiment of the present invention also provides a composition scoring device, the device including:
a text processing module, configured to process the composition text to be scored and obtain the feature vector of the composition text, wherein the feature vector includes the word vector and the text-length vector;
a first training module, configured to input the word vector into the first neural network model for training to obtain the first output vector, and to average the first output vector to obtain the first mean vector;
a grade division module, configured to input the first mean vector and the text-length vector into the classifier to divide the composition text to be scored into score grades, and to obtain the probability distribution vector of the composition text to be scored over the score grades;
a second training module, configured to input the first output vector and the word vector into the second neural network model for training to obtain the second output vector, and to average the second output vector to obtain the second mean vector;
a computing module, configured to calculate the final score of the composition text to be scored from the score vector obtained by inputting the second mean vector into the regressors and from the probability distribution vector.
Further, when the feature vector is the word vector, the text processing module is specifically configured to:
obtain the initial vector for representing each word in the composition text;
train the initial vector through the word-vector model to obtain the word vector of each word in the composition text.
Further, when the feature vector is the text-length vector, the text processing module is specifically configured to:
divide the composition text to be scored into length grades according to the text length, so that each length grade corresponds to one embedded vector;
obtain the initial vector for representing the text length;
perform an operation on the embedded vector and the initial vector for representing the text length to obtain the text-length vector of the composition text.
An embodiment of the present invention also provides a storage medium including a computer program; when the computer program runs, it controls the device where the storage medium resides to execute the above composition scoring method.
Description of the drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and should therefore not be regarded as limiting its scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.
Fig. 1 is a block diagram of an electronic device provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of the steps of the composition scoring method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the mapping-parameter adjustment step in the composition scoring method provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the steps of obtaining the word vector of the composition text in the composition scoring method provided by an embodiment of the present invention;
Fig. 5 is a schematic diagram of the steps of obtaining the text-length vector of the composition text in the composition scoring method provided by an embodiment of the present invention;
Fig. 6 is a schematic diagram of the steps of obtaining the probability distribution vector in the composition scoring method provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of the steps of obtaining the final score of the composition text to be scored in the composition scoring method provided by an embodiment of the present invention;
Fig. 8 is a schematic diagram of the modules of the composition scoring device provided by an embodiment of the present invention.
Reference numerals: 100 - electronic device; 111 - memory; 112 - storage controller; 113 - processor; 70 - composition scoring device; 701 - text processing module; 702 - first training module; 703 - grade division module; 704 - second training module; 705 - computing module.
Detailed description of the embodiments
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. The components of the embodiments of the present invention, as generally described and illustrated in the drawings herein, can be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments of the present invention provided in the accompanying drawings is not intended to limit the claimed scope of the present invention, but merely represents selected embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope protected by the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined and explained in subsequent drawings.
In the description of the embodiments of the present invention, it should be noted that naming such as "first" and "second" is only used to distinguish different features for the convenience of describing and simplifying the present invention, and does not indicate or imply relative importance; it therefore cannot be construed as limiting the present invention.
Some embodiments of the present invention are described in detail below in conjunction with the accompanying drawings. In the absence of conflict, the following embodiments and the features in the embodiments can be combined with each other.
Referring to Fig. 1, a block diagram of an electronic device 100 provided by a preferred embodiment of the present invention is shown. The electronic device 100 may include a composition scoring device 70, a memory 111, a storage controller 112 and a processor 113.
The memory 111, the storage controller 112 and the processor 113 are electrically connected to each other, directly or indirectly, to realize the transmission or interaction of data. For example, these elements can be electrically connected to each other through one or more communication buses or signal lines. The composition scoring device 70 may include at least one software function module that can be stored in the memory 111 in the form of software or firmware, or solidified in the operating system (OS) of the electronic device 100. The processor 113 is used to execute the executable modules stored in the memory 111, such as the software function modules and computer programs included in the composition scoring device 70.
The memory 111 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), etc. The memory 111 is used to store a program, and the processor 113 executes the program after receiving an execution instruction. Access to the memory 111 by the processor 113 and other possible components can be carried out under the control of the storage controller 112.
The processor 113 may be an integrated circuit chip with signal processing capability. The processor 113 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), etc.; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component, and can implement or execute the methods, steps and logic diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc.
First embodiment
Referring to Fig. 2, Fig. 2 is a schematic diagram of the steps of the composition scoring method provided by this embodiment. The method includes the following steps:
Step S10: process the composition text to be scored to obtain a feature vector of the composition text, wherein the feature vector includes a word vector and a text-length vector.
In this embodiment, the composition text is processed using a neural network model, and feature vectors characterizing the composition text, such as the word vectors of the words in the composition text and the text-length vector characterizing the length of the composition text, are obtained by encoding. The composition text may be, but is not limited to, English text or Chinese text; it may be text written in any language.
Step S20: input the word vector into a first neural network model for training to obtain a first output vector, and average the first output vector to obtain a first mean vector.
In this embodiment, the word vector obtained in the above step S10 is input into the first neural network for training to obtain the first output vector, and the first output vector is then averaged to obtain the first mean vector.
Step S30: input the first mean vector and the text-length vector into a classifier to divide the composition text to be scored into score grades, and obtain the probability distribution vector of the composition text to be scored over the score grades.
In this embodiment, the first mean vector obtained in step S20 and the text-length vector obtained in step S10 are input into the classifier to perform a preliminary score-grade division of the composition to be scored, while the probability distribution vector of the composition text to be scored over the score grades is obtained.
Step S40: input the first output vector and the word vector into a second neural network model for training to obtain a second output vector, and average the second output vector to obtain a second mean vector.
In this embodiment, the first output vector obtained in step S20 and the word vector obtained in step S10 are input into the second neural network model for training to obtain the second output vector, and the second output vector is then averaged to obtain the second mean vector.
Step S50: calculate the final score of the composition text to be scored from the score vector obtained by inputting the second mean vector into the regressors and from the probability distribution vector.
In this embodiment, the second mean vector obtained in step S40 is input into the regressors to obtain the score vector of the composition text, and an operation is then performed on the score vector and the probability distribution vector obtained in step S30 to obtain the final score of the composition text.
Optionally, in this embodiment, the first neural network model and the second neural network model may be, but are not limited to, a recurrent neural network (RNN), a long short-term memory network (LSTM), or a gated recurrent unit (GRU). In a preferred embodiment of the present invention, the first neural network model and the second neural network model use the long short-term memory network LSTM.
Referring to Fig. 3, optionally, in order to improve the accuracy of automatic composition scoring, the method further includes a step S60 of adjusting the mapping parameters of the classifier and the regressors. Step S60 includes the following sub-steps:
Step S601: obtain the score grade, score vector, probability distribution vector and final score of the composition text to be scored.
In this embodiment, the score grade and the probability distribution vector described in step S601 can be obtained through the above step S30; the score vector and the final score can be obtained through the above step S50.
Step S602: calculate the difference between the final score of the composition text and the manual score through the score grade, score vector, probability distribution vector, final score and a loss function.
In this embodiment, the smaller the difference between the final score of the composition text and the manual score, the higher the accuracy of the automatic composition scoring model.
Step S603: optimize the loss function, and adjust the mapping parameters of the classifier and the regressors so that the difference between the final score and the manual score decreases.
In this embodiment, the above loss function includes two parts: the loss in the classifier and the loss in the regressors. In a preferred embodiment, the loss function is optimized by a gradient descent algorithm, the mapping parameters in the classifier and the regressors are obtained, and the mapping parameters in the classifier and the regressors are adjusted to improve the accuracy of composition scoring.
The loss function can be expressed as a λ-weighted combination of the classification loss and the regression loss; the original formula is given as an image in the patent, and the form below is a reconstruction consistent with the surrounding definitions:
Loss = -λ Σ_{i=1}^{M} y_i log(prob_i) + (1 - λ) (S - S*)^2, where 0 ≤ λ ≤ 1,
wherein y is the score-grade label of the composition text to be scored, score is the score vector of the composition text to be scored (entering the loss through S), prob is the probability distribution vector of the composition to be scored, M is the number of regressors, S is the final score of the composition text to be scored, and S* is the manual score.
In a preferred embodiment of the present invention, Dropout is used as the regularization method with the keep probability set to 0.5, and the optimization algorithm is Adam. The keep probability is a hyperparameter of the Dropout algorithm used to counter overfitting during regularization; its specific value can be adjusted according to the actual situation.
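For concreteness, a minimal sketch of this two-part objective and training setup follows (PyTorch and the function names are assumptions; the patent only specifies the λ weighting, Dropout with keep probability 0.5, and Adam):

```python
import torch
import torch.nn.functional as F

def joint_loss(prob, grade_label, final_score, manual_score, lam=0.5):
    """lambda-weighted sum of the classifier loss and the regressor loss, 0 <= lam <= 1."""
    cls_loss = F.nll_loss(torch.log(prob + 1e-8), grade_label)  # cross-entropy on score-grade labels
    reg_loss = F.mse_loss(final_score, manual_score)            # squared error against the manual score
    return lam * cls_loss + (1.0 - lam) * reg_loss

# Regularization and optimizer named in the text; note that PyTorch's Dropout takes the
# drop probability, so a keep probability of 0.5 corresponds to p=0.5 here.
dropout = torch.nn.Dropout(p=0.5)
# optimizer = torch.optim.Adam(model.parameters())  # 'model' would bundle the modules sketched below
```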
Referring to Fig. 4, in an embodiment of the present invention, when the word vector of the composition text is obtained through a neural network model, step S10 includes a step S101 of obtaining the word vector of the composition text, and step S101 specifically includes the following sub-steps:
Step S1011: obtain the initial vector for representing each word in the composition text.
In a preferred embodiment of the present invention, a one-hot vector is used as the initial vector for representing each word in the composition text. For example, each word in the text is expressed as X = {w1, w2, ..., wn}, wherein wi denotes the one-hot vector corresponding to the i-th word and n is the text length.
Step S1012: train the initial vector for representing each word in the composition text through a word-vector model to obtain the word vector of each word in the composition text.
In a preferred embodiment of the present invention, the one-hot vector of each word obtained in step S1011 is input into a word2vec or GloVe word-vector model, the embedding matrix composed of the word vectors pre-trained by word2vec or GloVe is obtained, and encoding is finally performed through the embedding matrix to obtain the word vector corresponding to the one-hot vector of each word in the composition text. The word vectors can be obtained as:
L1 = XE,
wherein L1 is the word-vector sequence of the composition text, E is the embedding matrix of size |V| × D composed of the word vectors pre-trained by the word-vector model, |V| denotes the size of the vocabulary, and D is the dimension of the word vectors.
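A minimal sketch of this encoding (PyTorch assumed; the one-hot product XE is implemented as the equivalent index lookup, and the vocabulary size and dimension are placeholder values):

```python
import torch
import torch.nn as nn

V, D = 30000, 300                      # vocabulary size |V| and word-vector dimension D (assumed values)
E = torch.randn(V, D)                  # stand-in for the word2vec/GloVe pre-trained embedding matrix
embedding = nn.Embedding.from_pretrained(E, freeze=False)

word_ids = torch.tensor([[12, 845, 3, 77]])  # indices of words w1..wn (each row of one-hot X selects one index)
L1 = embedding(word_ids)                     # L1 = XE, shape (batch, n, D)
```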
Referring to Fig. 5, in an embodiment of the present invention, when the text-length vector of the composition text is obtained through a neural network model, step S10 further includes a step S102 of obtaining the length vector of the composition text. Step S102 specifically includes:
Step S1021: divide the composition text to be scored into length grades according to a preset text length, so that each length grade corresponds to one embedded vector.
In a preferred embodiment of the present invention, in order to better characterize the text length, a text-length embedding method is used, and the composition texts are divided into several length grades according to the text lengths of the training data. The correspondence between composition text length and length grade can be adjusted according to the actual situation; for example, 0-75 words are defined as grade 1, 76-150 words as grade 2, 151-250 words as grade 3, 251-350 words as grade 4, and 351 words and above as grade 5, so that each length grade corresponds to one embedded vector q_r.
Step S1022: obtain the initial vector for representing the text length.
In a preferred embodiment of the present invention, a one-hot vector is used as the initial vector for representing the text length of the composition text.
Step S1023: perform an operation on the embedded vector and the initial vector for representing the text length to obtain the text-length vector of the composition text.
In this embodiment, an operation is performed on the embedded vector characterizing the text length obtained in step S1021 and the initial vector of the text length obtained in step S1022 to obtain the text-length vector of the composition text.
The text-length vector can be computed as (the original formula is given as an image in the patent; the form below is a reconstruction consistent with the surrounding definitions):
q = lQ,
wherein q is the text-length vector, l is the one-hot vector for representing the length grade of the composition text, Q is the matrix of the embedded vectors corresponding to the length grades, Q = {q1, q2, ..., qr}, r is the number of text-length grades, and d is the dimension of the text-length vector.
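Under the grade boundaries given above, a minimal sketch of the length embedding (PyTorch assumed; the embedding dimension d is a placeholder):

```python
import torch
import torch.nn as nn

def length_grade(num_words: int) -> int:
    """Map a word count to a length grade per the boundaries above."""
    for upper, grade in [(75, 1), (150, 2), (250, 3), (350, 4)]:
        if num_words <= upper:
            return grade
    return 5                                   # 351 words and above

r, d = 5, 50                                   # number of length grades r and embedding dimension d (assumed)
Q = nn.Embedding(r, d)                         # one embedded vector q_r per length grade
q = Q(torch.tensor([length_grade(180) - 1]))   # q = lQ realized as an index lookup; shape (1, d)
```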
Referring to Fig. 6, further, in an embodiment of the present invention, the step S30 of dividing the composition text to be scored into score grades and obtaining the probability distribution vector of the composition text to be scored over the score grades specifically includes:
Step S301: divide the composition text to be scored into multiple score grades according to manual scoring labels.
In this embodiment, composition texts are classified by manual scoring, for example as poor, medium, good and excellent. The composition scoring model can then use this standard as labels to perform a preliminary classification of the composition text to be scored.
Step S302: input the first mean vector and the text-length vector into the classifier.
In this embodiment, the first mean vector is obtained through the above step S20, and the text-length vector is obtained through the above step S10.
Step S303: obtain the mapping parameters of the classifier.
In this embodiment, the mapping parameters of the classifier can be obtained by optimizing the loss function in the above step S60, and the mapping parameters of the classifier can be adjusted by optimizing the loss function, thereby improving the accuracy of automatic composition scoring.
Step S304: calculate the probability distribution vector through the softmax function and the mapping parameters of the classifier.
In this embodiment, before the score-grade division of the composition text to be scored, the word vector of the composition text obtained in the above step S10 is first input into the first neural network model, and the first output vector is obtained after LSTM processing; an MoT (mean over time) operation is then performed on the first output vector to average it and obtain the first mean vector; finally, the first mean vector and the text-length vector obtained in step S10 are concatenated as the input of the classifier, a preliminary score-grade division of the composition text is performed according to the manual scoring labels, and the probability distribution vector of the composition text over the score grades is calculated.
In the above process, the first output vector and the first mean vector can be obtained as:
L2 = LSTM(L1), h = MoT(L2) = (1/T) Σ_{t=1}^{T} L2_t,
wherein L1 is the word-vector sequence of the composition text, L2 is the first output vector obtained by LSTM processing, h is the first mean vector obtained by averaging the first output vector, and T is the length of the composition text (the number of words in the composition text).
The above probability distribution vector can be obtained by the following formula:
prob = softmax([h, q] W_cls + b_cls),
wherein prob is the probability distribution vector of the composition text over the score grades, h is the above first mean vector, q is the text-length vector of the composition text, W_cls and b_cls are the mapping parameters of the classifier, and M is the number of score grades. The mapping parameters can be obtained by optimizing the loss function.
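A minimal sketch of this classification branch (PyTorch assumed; the hidden size H and grade count M are placeholders):

```python
import torch
import torch.nn as nn

D, H, d, M = 300, 128, 50, 4               # word-vector dim, LSTM hidden size, length-embedding dim, score grades (assumed)
lstm1 = nn.LSTM(D, H, batch_first=True)    # first neural network model
classifier = nn.Linear(H + d, M)           # mapping parameters W_cls and b_cls

L1 = torch.randn(1, 120, D)                # word vectors of a 120-word composition (T = 120)
q = torch.randn(1, d)                      # text-length vector from the length embedding

L2, _ = lstm1(L1)                          # first output vector, shape (1, T, H)
h = L2.mean(dim=1)                         # MoT: mean over time, giving the first mean vector
prob = torch.softmax(classifier(torch.cat([h, q], dim=1)), dim=1)  # prob = softmax([h, q] W_cls + b_cls)
```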
Referring to Fig. 7, further, in an embodiment of the present invention, the step S50 of calculating the score of the composition text to be scored from the score vector obtained by the regressors from the second mean vector and from the probability distribution vector specifically includes:
Step S501: input the second mean vector into the regressors.
In a preferred embodiment of the present invention, considering that the features to which attention should be paid differ for composition texts of different score grades, in order to score the composition texts of each score grade more finely, multiple regressors are used to process the composition text. The number of regressors is the same as the number of score grades above, so that each regressor processes the composition texts of a different score grade and attends to different features, realizing fine scoring and improving the accuracy of composition scoring. The second mean vector is obtained through the above step S40, and the process can be expressed as:
L3 = LSTM([L1, L2]),
a = MoT(L3),
wherein L1 is the word-vector sequence of the above composition text, L2 is the above first output vector, L3 is the second output vector obtained by performing the LSTM operation on L1 and L2 with the second neural network model, and a is the second mean vector obtained by averaging the second output vector.
Step S502: obtain the mapping parameters of the regressors.
In this embodiment, the mapping parameters of the regressors can be obtained by optimizing the loss function in the above step S60, and the mapping parameters of the regressors can be adjusted by optimizing the loss function, thereby improving the accuracy of automatic composition scoring.
Step S503: obtain the score vector of the composition text to be scored through the sigmoid function and the mapping parameters of the regressors.
Step S504: perform an operation on the score vector and the probability distribution vector to obtain the final score of the composition text to be scored.
In this embodiment, after the second mean vector is obtained, it is used as the input of the regressors, and the score vector of the composition text can be calculated through the sigmoid function. This process can be expressed as:
score_i = sigmoid(a W_i + b_i),
wherein a is the above second mean vector, W_i and b_i are the mapping parameters of the i-th regressor, score_i is the score of the composition text obtained after the scoring processing of the i-th regressor, and M is the number of regressors (the number of score grades of the composition text). The mapping parameters of the regressors can be adjusted by optimizing the loss function to further improve the accuracy of composition scoring.
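A minimal sketch of this regression branch (PyTorch assumed, continuing the dimensions of the classifier sketch; the M per-grade regressors are stacked into one linear layer):

```python
import torch
import torch.nn as nn

D, H, M = 300, 128, 4                        # dimensions as in the classifier sketch (assumed)
lstm2 = nn.LSTM(D + H, H, batch_first=True)  # second neural network model, fed the concatenation [L1, L2]
regressors = nn.Linear(H, M)                 # the W_i and b_i of the M sigmoid regressors, stacked

L1 = torch.randn(1, 120, D)                  # word vectors of the composition text
L2 = torch.randn(1, 120, H)                  # first output vector produced by the first LSTM

L3, _ = lstm2(torch.cat([L1, L2], dim=2))    # L3 = LSTM([L1, L2])
a = L3.mean(dim=1)                           # a = MoT(L3), the second mean vector
scores = torch.sigmoid(regressors(a))        # score_i = sigmoid(a W_i + b_i), shape (1, M)
```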
In an embodiment of the present invention, a dot-product operation is performed on the above score vector and the probability distribution vector obtained in the above process to obtain the final score of the composition text. The calculation formula is:
S = prob · scores^T,
where scores = {score_1, score_2, ..., score_M},
wherein prob is the probability distribution vector obtained in the above process, scores is the score vector obtained in the above process, and S is the final score of the composition text.
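Continuing the sketches above (prob and scores each of shape (1, M)), the final score is the single dot product:

```python
S = (prob * scores).sum(dim=1)   # S = prob · scores^T, the final score of the composition text
```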
Second embodiment
Referring to Fig. 8, this embodiment provides a composition scoring device 70, the device including:
a text processing module 701, configured to process the composition text to be scored and obtain the feature vector of the composition text, wherein the feature vector includes the word vector and the text-length vector;
a first training module 702, configured to input the word vector into the first neural network model for training to obtain the first output vector, and to average the first output vector to obtain the first mean vector;
a grade division module 703, configured to input the first mean vector and the text-length vector into the classifier to divide the composition text to be scored into score grades, and to obtain the probability distribution vector of the composition text to be scored over the score grades;
a second training module 704, configured to input the first output vector and the word vector into the second neural network model for training to obtain the second output vector, and to average the second output vector to obtain the second mean vector;
a computing module 705, configured to calculate the final score of the composition text to be scored from the score vector obtained by inputting the second mean vector into the regressors and from the probability distribution vector.
In this embodiment, the composition text may be, but is not limited to, English text or Chinese text; it may be text written in any language.
Further, when the feature vector processed by the text processing module 701 is the word vector, the text processing module 701 is specifically configured to:
obtain the initial vector for representing each word in the composition text;
train the initial vector through the word-vector model to obtain the word vector of each word in the composition text.
Further, when the feature vector processed by the text processing module 701 is the text-length vector, the text processing module 701 is specifically configured to:
divide the composition text to be scored into length grades according to the text length, so that each length grade corresponds to one embedded vector;
obtain the initial vector for representing the text length;
perform an operation on the embedded vector and the initial vector for representing the text length to obtain the text-length vector of the composition text.
In a preferred embodiment of the present invention, one-hot vectors are used as the initial vectors for representing each word in the composition text and for representing the composition text length. The text-length grades can be divided according to specific circumstances or actual needs.
An embodiment of the present invention also provides a storage medium including a computer program; when the computer program runs, it controls the electronic device where the storage medium resides to execute the above composition scoring method.
In summary, embodiments of the present invention provide a composition scoring method, device and storage medium. The method extracts, through neural network models, the word vectors of the composition text and the text-length vector characterizing the composition text length, and performs coarse classification processing and fine scoring processing on the composition text by combining the classifier, the regressors and the neural network models. First, the composition text is divided into multiple score grades by the classifier, and the probability distribution vector of the composition text over the score grades is obtained; then, the composition texts of different score grades are finely scored for different features by multiple regressors to obtain the score vector of the composition text; finally, the final score of the composition text is calculated from the probability distribution vector and the score vector. The composition scoring method provided by the embodiments of the present invention extracts the word vectors and text-length vector of the composition text and comprehensively scores the composition by combining coarse classification and fine scoring, greatly improving the effectiveness of automatic composition scoring.
It is obvious to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from its spirit or essential characteristics. Therefore, from whichever point of view, the embodiments should be regarded as exemplary and non-restrictive; the scope of the present invention is defined by the appended claims rather than by the above description, and it is intended that all changes falling within the meaning and scope of equivalents of the claims are included in the present invention. Any reference signs in the claims shall not be construed as limiting the claims involved.

Claims (10)

1. A composition scoring method, characterized in that the method includes:
processing a composition text to be scored to obtain a feature vector of the composition text, wherein the feature vector includes a word vector and a text-length vector;
inputting the word vector into a first neural network model for training to obtain a first output vector, and averaging the first output vector to obtain a first mean vector;
inputting the first mean vector and the text-length vector into a classifier to divide the composition text to be scored into score grades, and obtaining a probability distribution vector of the composition text to be scored over the score grades;
inputting the first output vector and the word vector into a second neural network model for training to obtain a second output vector, and averaging the second output vector to obtain a second mean vector;
calculating the final score of the composition text to be scored from the score vector obtained by inputting the second mean vector into the regressors and from the probability distribution vector.
2. The composition scoring method according to claim 1, characterized in that the method further includes a step of adjusting the mapping parameters of the classifier and the regressors, the step including:
obtaining the score grade, score vector, probability distribution vector and final score of the composition text to be scored;
calculating the difference between the final score of the composition text and the manual score through the score grade, score vector, probability distribution vector, final score and a loss function;
optimizing the loss function, and adjusting the mapping parameters of the classifier and the regressors so that the difference between the final score and the manual score decreases.
3. The composition scoring method according to claim 1 or 2, characterized in that, when the feature vector is the word vector, the step of processing the composition text to be scored to obtain the feature vector of the composition text includes:
obtaining an initial vector for representing each word in the composition text;
training the initial vector for representing each word in the composition text through a word-vector model to obtain the word vector of each word in the composition text.
4. The composition scoring method according to claim 1 or 2, characterized in that, when the feature vector is the text-length vector, the step of processing the composition text to be scored to obtain the feature vector of the composition text includes:
dividing the composition text to be scored into length grades according to a preset text length, so that each length grade corresponds to one embedded vector;
obtaining an initial vector for representing the text length;
performing an operation on the embedded vector and the initial vector for representing the text length to obtain the text-length vector of the composition text.
5. The composition scoring method according to claim 2, characterized in that the step of inputting the first mean vector and the text-length vector into the classifier to divide the composition text to be scored into score grades and obtain the probability distribution vector of the composition text to be scored over the score grades includes:
dividing the composition text to be scored into multiple score grades according to manual scoring labels;
inputting the first mean vector and the text-length vector into the classifier;
obtaining the mapping parameters of the classifier;
calculating the probability distribution vector through a softmax function and the mapping parameters of the classifier.
6. The composition scoring method according to claim 2, characterized in that the step of calculating the score of the composition text to be scored from the score vector obtained by inputting the second mean vector into the regressors and from the probability distribution vector includes:
inputting the second mean vector into the regressors;
obtaining the mapping parameters of the regressors;
obtaining the score vector of the composition text to be scored through a sigmoid function and the mapping parameters of the regressors;
performing an operation on the score vector and the probability distribution vector to obtain the final score of the composition text to be scored.
7. A composition scoring device, characterized in that the device includes:
a text processing module, configured to process the composition text to be scored and obtain the feature vector of the composition text, wherein the feature vector includes the word vector and the text-length vector;
a first training module, configured to input the word vector into the first neural network model for training to obtain the first output vector, and to average the first output vector to obtain the first mean vector;
a grade division module, configured to input the first mean vector and the text-length vector into the classifier to divide the composition text to be scored into score grades, and to obtain the probability distribution vector of the composition text to be scored over the score grades;
a second training module, configured to input the first output vector and the word vector into the second neural network model for training to obtain the second output vector, and to average the second output vector to obtain the second mean vector;
a computing module, configured to calculate the final score of the composition text to be scored from the score vector obtained by inputting the second mean vector into the regressors and from the probability distribution vector.
8. The composition scoring device according to claim 7, characterized in that, when the feature vector is the word vector, the text processing module is specifically configured to:
obtain the initial vector for representing each word in the composition text;
train the initial vector through the word-vector model to obtain the word vector of each word in the composition text.
9. The composition scoring device according to claim 7, characterized in that, when the feature vector is the text-length vector, the text processing module is specifically configured to:
divide the composition text to be scored into length grades according to the text length, so that each length grade corresponds to one embedded vector;
obtain the initial vector for representing the text length;
perform an operation on the embedded vector and the initial vector for representing the text length to obtain the text-length vector of the composition text.
10. A storage medium, characterized in that the storage medium includes a computer program; when the computer program runs, it controls the device where the storage medium resides to execute the composition scoring method according to any one of claims 1-6.
CN201810287644.1A 2018-04-03 2018-04-03 Composition scoring method, device and storage medium Active CN108519975B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810287644.1A CN108519975B (en) 2018-04-03 2018-04-03 Composition scoring method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810287644.1A CN108519975B (en) 2018-04-03 2018-04-03 Composition scoring method, device and storage medium

Publications (2)

Publication Number Publication Date
CN108519975A true CN108519975A (en) 2018-09-11
CN108519975B CN108519975B (en) 2021-09-28

Family

ID=63431745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810287644.1A Active CN108519975B (en) 2018-04-03 2018-04-03 Composition scoring method, device and storage medium

Country Status (1)

Country Link
CN (1) CN108519975B (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109471915A (en) * 2018-10-09 2019-03-15 科大讯飞股份有限公司 A kind of text evaluation method, device, equipment and readable storage medium storing program for executing
CN110162777A (en) * 2019-04-01 2019-08-23 广东外语外贸大学 One kind seeing figure writing type Automated Essay Scoring method and system
CN111061870A (en) * 2019-11-25 2020-04-24 三角兽(北京)科技有限公司 Article quality evaluation method and device
CN111581379A (en) * 2020-04-28 2020-08-25 电子科技大学 Automatic composition scoring calculation method based on composition question-deducting degree
CN111581392A (en) * 2020-04-28 2020-08-25 电子科技大学 Automatic composition scoring calculation method based on statement communication degree
CN112183065A (en) * 2020-09-16 2021-01-05 北京思源智通科技有限责任公司 Text evaluation method and device, computer readable storage medium and terminal equipment
WO2021051586A1 (en) * 2019-09-18 2021-03-25 平安科技(深圳)有限公司 Interview answer text classification method, device, electronic apparatus and storage medium
CN112561334A (en) * 2020-12-16 2021-03-26 咪咕文化科技有限公司 Grading method and device for reading object, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102279844A (en) * 2011-08-31 2011-12-14 中国科学院自动化研究所 Method and system for automatically testing Chinese composition
CN102831558A (en) * 2012-07-20 2012-12-19 桂林电子科技大学 System and method for automatically scoring college English compositions independent of manual pre-scoring
CN107133211A (en) * 2017-04-26 2017-09-05 中国人民大学 A kind of composition methods of marking based on notice mechanism
CN107506360A (en) * 2016-06-14 2017-12-22 科大讯飞股份有限公司 A kind of essay grade method and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102279844A (en) * 2011-08-31 2011-12-14 中国科学院自动化研究所 Method and system for automatically testing Chinese composition
CN102831558A (en) * 2012-07-20 2012-12-19 桂林电子科技大学 System and method for automatically scoring college English compositions independent of manual pre-scoring
CN107506360A (en) * 2016-06-14 2017-12-22 科大讯飞股份有限公司 A kind of essay grade method and system
CN107133211A (en) * 2017-04-26 2017-09-05 中国人民大学 A kind of composition methods of marking based on notice mechanism

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
CANCAN JIN: "Utilizing Latent Semantic Word Representations for Automated Essay Scoring", 《 2015 IEEE 12TH INTL CONF ON UBIQUITOUS INTELLIGENCE AND COMPUTING》 *
DIMITRIOS ALIKANIOTIS: "Automatic Text Scoring Using Neural Networks", 《HTTPS://ARXIV.ORG/ABS/1606.04289》 *
FEI DONG: "Attention-based Recurrent Convolutional Neural Network for Automatic Essay Scoring", 《PROCEEDINGS OF THE 21ST CONFERENCE ON COMPUTATIONAL NATURAL LANGUAGE LEARNING》 *
刘逸雪: "Automatic scoring method for subjective mathematics questions based on Bi-LSTM", 《教育管理》 *
杨靖云: "Research on automatic evaluation methods for short-answer history questions in the college entrance examination", 《中国优秀硕士学位论文全文数据库社会科学Ⅱ辑》 *
陈珊珊: "Research on automatic essay scoring models and methods", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109471915A (en) * 2018-10-09 2019-03-15 科大讯飞股份有限公司 A kind of text evaluation method, device, equipment and readable storage medium storing program for executing
CN110162777A (en) * 2019-04-01 2019-08-23 广东外语外贸大学 One kind seeing figure writing type Automated Essay Scoring method and system
CN110162777B (en) * 2019-04-01 2020-05-19 广东外语外贸大学 Picture-drawing composition automatic scoring method and system
WO2021051586A1 (en) * 2019-09-18 2021-03-25 平安科技(深圳)有限公司 Interview answer text classification method, device, electronic apparatus and storage medium
CN111061870A (en) * 2019-11-25 2020-04-24 三角兽(北京)科技有限公司 Article quality evaluation method and device
CN111581379A (en) * 2020-04-28 2020-08-25 电子科技大学 Automatic composition scoring calculation method based on composition question-deducting degree
CN111581392A (en) * 2020-04-28 2020-08-25 电子科技大学 Automatic composition scoring calculation method based on statement communication degree
CN111581379B (en) * 2020-04-28 2022-03-25 电子科技大学 Automatic composition scoring calculation method based on composition question-deducting degree
CN111581392B (en) * 2020-04-28 2022-07-05 电子科技大学 Automatic composition scoring calculation method based on statement communication degree
CN112183065A (en) * 2020-09-16 2021-01-05 北京思源智通科技有限责任公司 Text evaluation method and device, computer readable storage medium and terminal equipment
CN112561334A (en) * 2020-12-16 2021-03-26 咪咕文化科技有限公司 Grading method and device for reading object, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN108519975B (en) 2021-09-28

Similar Documents

Publication Publication Date Title
CN108519975A (en) Composition scoring method, device and storage medium
US20210042580A1 (en) Model training method and apparatus for image recognition, network device, and storage medium
CN109765462A (en) Fault detection method and device for power transmission lines, and terminal device
CN109325547A (en) Non-motor vehicle image multi-label classification method, system, device and storage medium
WO2019179403A1 (en) Fraud transaction detection method based on sequence-based wide and deep learning
CN106326984A (en) User intention identification method and device and automatic answering system
CN109948149A (en) A text classification method and device
CN111178537B (en) Feature extraction model training method and device
CN106909972A (en) A learning method for sensor data calibration models
CN110348563A (en) Semi-supervised training method for neural networks, device, server and storage medium
CN112785005A (en) Multi-target task assistant decision-making method and device, computer equipment and medium
CN109299246A (en) A text classification method and device
CN109635755A (en) Face extraction method, apparatus and storage medium
CN113919401A (en) Modulation type identification method and device based on constellation diagram features, and computer equipment
CN107688651A (en) Method for determining the sentiment orientation of news, electronic device and computer-readable recording medium
CN106557566A (en) A text training method and device
CN110162769A (en) Text topic output method and device, storage medium and electronic device
CN105117330B (en) CNN code testing method and device
CN116432023A (en) Novel power system fault classification method based on sample transfer learning
CN110428012A (en) Method for establishing a brain network model, brain image classification method, device and electronic equipment
CN110503600A (en) Feature point detection method and device, electronic device and readable storage medium
CN113947140A (en) Training method of a face feature extraction model and face feature extraction method
CN114970357A (en) Energy-saving effect evaluation method, system, device and storage medium
CN115249281A (en) Image occlusion and model training method, device, equipment and storage medium
CN106803233A (en) Optimization method for perspective image conversion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: Room 155, bungalow 17, No. 12, Jiancai Chengzhong Road, Xisanqi, Haidian District, Beijing 100096

Patentee after: BEIJING SINGSOUND INTELLIGENT TECHNOLOGY Co.,Ltd.

Address before: 100000 No. 38, 2f, block B, building 1, yard 2, Yongcheng North Road, Haidian District, Beijing

Patentee before: BEIJING SINGSOUND EDUCATION TECHNOLOGY CO.,LTD.