CN107967318A - Automatic scoring method and system for Chinese short-text subjective questions using an LSTM neural network - Google Patents
Automatic scoring method and system for Chinese short-text subjective questions using an LSTM neural network
- Publication number
- CN107967318A CN107967318A CN201711177862.1A CN201711177862A CN107967318A CN 107967318 A CN107967318 A CN 107967318A CN 201711177862 A CN201711177862 A CN 201711177862A CN 107967318 A CN107967318 A CN 107967318A
- Authority
- CN
- China
- Prior art keywords
- text
- answer
- answer text
- semantic feature
- matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
Abstract
The present invention is an automatic scoring method for Chinese short-text subjective questions using an LSTM neural network, comprising: (1) segmenting the answer text into words and converting the text into a word sequence; (2) obtaining the vector representation of each word in the answer text and building an answer-text mapping matrix; (3) running the LSTM neural network over the answer-text mapping matrix and collecting the outputs of all or some of the hidden layers to obtain the semantic feature matrix of the answer text; (4) down-sampling the semantic feature matrix with a pooling algorithm to obtain the semantic feature vector of the answer text; (5) feeding the semantic feature vector of the answer text to a classifier to predict the category of the answer text; (6) determining the score of the answer text according to a preset many-to-one mapping between answer-text categories and scores. The present invention does not depend on a standard answer for the subjective question, effectively mines the semantic information of the answer text, and realizes automatic scoring of Chinese short-text subjective questions.
Description
Technical field
The present invention relates to the technical field of automatic grading, and specifically to a method and system for automatically scoring Chinese short-text subjective questions using a Long Short-Term Memory (LSTM) neural network. It can be applied to the automatic scoring of Chinese natural-language answers to translation, short-answer, true/false, and text-image conversion questions, and ultimately to homework and exam grading and to the evaluation of student learning.
Background art
Subjective questions occupy a very important position in subject learning and teaching. Their greatest advantage is that they can measure a variety of relatively complex performance objectives and better assess students' creative thinking and expressive abilities. Subjective questions have therefore become one of the most widely used question types in course teaching and testing. However, the heavy, mechanical work of grading subjective questions consumes a great deal of teachers' time and energy, while students hope to receive real-time feedback on their answers. Both teachers and students therefore urgently hope that computers can realize objective, effective, time- and labor-saving automatic grading of subjective questions. The realization of automatic scoring of subjective questions has very important practical significance. First, it can greatly improve the efficiency of the grading stage and effectively lighten teachers' workload. Second, it can reduce the influence of factors such as the grader's subjective preferences, physical condition, and psychological state on scoring accuracy. Third, it can provide real-time feedback for online learners, saving waiting time and improving learning efficiency. Finally, it can be applied to adaptive learning and adaptive assessment tasks, and is a key technology for realizing intelligent tutoring systems.
In daily teaching and examinations, short-text subjective questions mainly include translation, short-answer, and true/false types. Their characteristics are: (1) answers are given in natural language; (2) answers are short, usually no more than one paragraph; (3) students cannot obtain the answer from the question stem and must understand, apply, and transfer domain knowledge; (4) scoring focuses on examining the content of the answer text rather than features such as writing style or rhetorical devices; (5) questions are open and varied, and may be closed, semi-closed, or open-ended. For a computer to automatically score short-text subjective questions, it must be able to "understand" the semantic information of the text at a deeper level. In addition, because answer texts are short, the statistical information a computer can extract from them (such as word co-occurrence and contextual information) is limited, and traditional statistics-based natural language processing methods and models face problems such as data sparseness and semantic sensitivity. Therefore, obtaining accurate automatic scores for subjective questions remains a great challenge and a technical problem urgently awaiting a solution.
As one of the key technologies of intelligent education, automatic scoring of subjective questions occupies a very important position in the field of educational technology. A survey of existing work at home and abroad shows that a general scoring framework mainly consists of the following four modules, as shown in Fig. 1:
Module (1): Build a database. The database contains data such as exam questions, standard answers, scoring criteria, and student answers.
Module (2): Preprocessing. The answer text is segmented into words, deduplicated, stripped of stop words, part-of-speech tagged, and so on.
Module (3): Build a scoring model. This module contains two submodules, which influence and constrain each other:
A. Feature extraction: use natural language processing techniques based on rules, statistics, or neural networks to extract text features and vectorize the answer text.
B. Modeling: build a scoring model using methods such as concept mapping, information extraction, corpus-based methods, and machine learning.
Module (4): Scoring. A new student answer text is first processed by module (2) and then fed into the model built by module (3), which predicts a label for the new answer; the final score of the answer is then derived from the predicted label.
In the above automatic scoring framework, the core module is the model-building module (module (3)). The mainstream methods can be divided into the following four classes:
(1) Concept matching: the main idea is to treat the standard answer as several key concepts or combinations of key concepts, and to score a student answer according to whether these key concepts are present in it. This method is more suitable for question types with clear answers and relatively short answers. Typical systems include ATM (Automatic Text Marker) and C-rater.
(2) Information extraction: the main idea is that an answer text usually contains certain specific viewpoints, and these viewpoints can be located and modeled with templates; the degree of template match between the student answer and the standard answer is the basis for scoring. First, structured information represented as tuples is extracted from unstructured data; then pattern-matching algorithms such as regular expressions or parse trees perform the match. Typical systems include AutoMark, WebLSA (Web-based Language Assessment System), and Auto-marking.
(3) Corpus-based methods: these methods extract statistical features from a large-scale text corpus and use them to compute the text similarity between the student answer and the standard answer; the student answer is then scored according to the similarity. A common method is Latent Semantic Analysis (LSA). The scoring performance of corpus-based methods is proportional to the scale of the corpus. Typical systems include Atenea and SAMText (Short Answer Measurement of Text).
(4) Machine learning: the main idea is to convert the short-text scoring problem into a text classification or clustering problem. First, natural language processing techniques are used to extract features of the student answer and to represent the text as vectors. The extracted features mainly include text features of the answer and similarity features between the answer and the standard answer. Then, taking the score of a student answer as its class label, a machine learning algorithm builds a classification model over the extracted features to obtain a scoring model. Common classification algorithms are k-nearest neighbors, logistic regression, naive Bayes, and support vector machines. Typical systems include e-Examiner and CAM (Content Assessment Module).
For automatic scoring of Chinese subjective questions, the main problems of the current mainstream technology are as follows:
(1) The above methods are mainly used for automatic scoring of English subjective questions. Because of the great differences between Chinese and English natural language processing techniques, these methods are difficult to transplant to the automatic scoring of Chinese subjective questions.
(2) The above methods target closed questions, i.e., questions that have a standard answer. In actual teaching and examinations, however, many questions have no standard answer. In Chinese language exams, for example, the scoring criterion for some questions is "any reasonable answer is acceptable" or "award the score if the meaning is correct". For such questions, which have no standard answer and relatively fuzzy scoring criteria, the above algorithms are not applicable.
(3) The above methods rely heavily on traditional language models, their methods for extracting text representations are complex, and they cannot solve problems such as the data sparseness and semantic sensitivity caused by the shortness of the text.
In recent years, deep learning algorithms have achieved remarkable results in the field of natural language processing (NLP). Compared with traditional language models, models based on deep learning can better mine the semantic information of words, phrases, sentences, and passages. In particular, the Recurrent Neural Network (RNN) has been widely used in natural language processing tasks because it is suited to modeling sequential information, and has achieved good results. An RNN with LSTM units solves the long-range dependency and vanishing-gradient problems of the traditional RNN, and has therefore attracted the attention of many scholars.
Summary of the invention
The task of the present invention is to overcome the deficiencies of the prior art. Considering the characteristics of the automatic scoring problem for Chinese short-text subjective questions, the challenges it faces, and the advantages of LSTM neural networks in language modeling, a method and system for automatically scoring Chinese short-text subjective questions using an LSTM neural network are proposed, which can automatically score Chinese short-text subjective questions without depending on a standard answer.
The present invention converts the automatic scoring problem for Chinese subjective questions into a text classification problem. Pre-trained word vectors are used to represent the student answer text; a Long Short-Term Memory (LSTM) neural network then extracts the semantic feature vector of the text, which is used to train a classifier that predicts the category of the answer text; finally, the score of the answer is determined according to a predetermined mapping between categories and scores. The present invention introduces the LSTM neural network into automatic scoring methods for Chinese short-text subjective questions for the first time, a new application of LSTM neural networks in this area. The method of the invention solves the automatic scoring problem for Chinese subjective questions and reduces the scoring algorithm's dependence on standard answers. Moreover, compared with traditional automatic scoring methods for subjective questions, the present invention takes the sequential relationship of words in context into account when vectorizing the answer text, performs semantic extension of the answer text, and yields an answer-text semantic feature vector of controllable dimension, which effectively solves the data sparseness and semantic sensitivity problems caused by the shortness of the text.
The automatic scoring method for Chinese short-text subjective questions using an LSTM neural network provided by the present invention comprises the following steps:
Step 1: Perform word segmentation on the answer text of the subjective question, converting the answer text into a word sequence;
Step 2: Obtain the vector representation of each word in the answer text and build the answer-text mapping matrix;
Step 3: Run the LSTM neural network over the answer-text mapping matrix and collect the outputs of all or some of the hidden layers to obtain the semantic feature matrix of the answer text;
Step 4: Down-sample the semantic feature matrix with a pooling algorithm to obtain the semantic feature vector of the answer text;
Step 5: Feed the semantic feature vector of the answer text obtained in step 4 to a multiclass logistic regression classifier to predict the category of the answer text;
Step 6: Determine the score of the answer text according to the preset mapping between answer-text categories and scores.
The following technical features can further provide some preferred technical solutions for the above automatic scoring method.
In step 2, each word in the answer text is looked up in a preset dictionary to obtain its vector representation; the answer-text mapping matrix is then built according to the order in which the words appear in the answer text. Individual words of the answer text that do not appear in the dictionary are treated as stop words and discarded.
In step 3, the LSTM neural network performs computation on the answer-text mapping matrix M to extract the semantic features of the answer text and generate its semantic feature matrix H; the matrix H is composed of the output vectors of all or some of the hidden layers of the LSTM neural network.
In step 3, the answer-text mapping matrix M is input to the LSTM neural network as follows: at each time step, one column of matrix M is input to the LSTM neural network; the column vectors of M enter in ascending order of column index, which effectively preserves the word-order information of the answer text.
The LSTM neural network model parameters and classifier model parameters in steps 3 and 5 are obtained during training of the scoring model: the cross entropy between the target probability distribution and the actual probability distribution is minimized as the objective function, gradient descent is used to compute the batch sample error, and back-propagation is used to update the LSTM neural network model parameters and the classifier model parameters.
The semantic feature vector generated in step 4 is the vectorized semantic representation of the input answer text; it contains the word-order information of the answer text and the association information between the words and the semantics of the text.
The pooling algorithm used in step 4 may be chosen from different pooling methods: max pooling, min pooling, or average pooling.
In step 6, the relationship between answer-text categories and scores is many-to-one; that is, answer texts of different categories are allowed to obtain the same score, but answer texts of the same category are not allowed to obtain different scores.
The present invention also provides an automatic scoring system for Chinese short-text subjective questions using an LSTM neural network. The automatic scoring system includes an input module, a data processing module, a semantic feature extraction module, a scoring module, and a dictionary module, wherein:
the input module delivers the answer text to the data processing module;
the data processing module segments the input answer text, builds the corresponding answer-text mapping matrix, and delivers the answer-text mapping matrix to the semantic feature extraction module;
the semantic feature extraction module obtains the semantic feature vector of the answer text; it includes an LSTM neural network layer and a pooling layer. The answer-text mapping matrix is input to the LSTM neural network, and the outputs of all or some of the hidden layers are collected to obtain the answer-text semantic feature matrix; a pooling operation is then applied to the semantic feature matrix to obtain the answer-text semantic feature vector, which is sent to the scoring module;
the scoring module determines the score of the answer text: the answer-text semantic feature vector is fed to a multiclass logistic regression classifier to predict the category of the answer text, the predicted category is then mapped to the score of the answer text according to the preset mapping, and the scoring result is output;
the dictionary module stores pre-trained words and their corresponding vector representations (word vectors) in the form of a data table, for the data processing module to call.
The advantages of the automatic scoring method for Chinese subjective questions using an LSTM neural network proposed by the present invention are:
(1) The present invention does not depend on a standard answer; it learns semantic features automatically from existing student answer texts and converts the automatic scoring problem for subjective questions into a short-text classification problem.
(2) The present invention introduces the LSTM neural network into automatic scoring methods for Chinese short-text subjective questions for the first time, a new application of LSTM to automatic scoring of short-text subjective questions.
(3) The present invention initializes the dictionary with a set of pre-trained word vectors, introducing a large amount of external auxiliary information and extending the semantic information of the answer text, which effectively solves the problem of insufficient contextual information caused by the shortness of the answer text.
(4) The present invention models the word sequence of the answer text with an LSTM neural network, preserving the temporal order in which words appear in the text during modeling and effectively mining the word-order features of the context.
(5) The present invention does not depend on complex language models; the semantic features of the answer text are extracted by the LSTM neural network, which effectively mines the semantic information and the association information between words in the answer text, mitigates the semantic sensitivity problem of short texts, and improves the performance of automatic scoring of subjective questions.
(6) The present invention captures the many-to-one relationship between answer-text categories and final scores through a predetermined mapping, fully considering the situation where student answers of several different categories may receive the same score.
Brief description of the drawings
Fig. 1 is the general framework of existing automatic scoring algorithms for subjective questions;
Fig. 2 is a flow chart of the automatic scoring of Chinese short-text subjective questions proposed by the present invention;
Fig. 3 is a flow chart of the training of the answer-text classification model of the present invention;
Fig. 4 is a screenshot of the word vectors of the test data set of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in more detail below with reference to specific embodiments and the accompanying drawings.
The present invention proposes an automatic scoring method for Chinese short-text subjective questions using an LSTM neural network, including the steps of Chinese word segmentation, building the answer-text mapping matrix, extracting the semantic feature vector, text classification, and scoring; the flow chart is shown in Fig. 2. The specific steps are as follows:
Step 1: Segment the input answer text into words, obtaining a word sequence S = {w1, w2, …, wN}, where wi is the i-th word in the word sequence S, i = 1, 2, …, N, and N is the number of words contained in S.
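As an illustration, this segmentation step can be sketched in Python. The jieba segmenter is the one named later in this embodiment; the sample answer text here is a hypothetical illustration, not taken from the patent's data set, and the sketch falls back to a character-level split when jieba is unavailable.

```python
# Sketch of step 1: segment a Chinese answer text into the word sequence S.
try:
    import jieba                      # the segmenter used later in this embodiment
    lcut = jieba.lcut
except ImportError:                   # fall back to a character split for the sketch
    lcut = list

def segment(answer_text):
    """Return the word sequence S = [w1, ..., wN] for an answer text."""
    return [w for w in lcut(answer_text) if w.strip()]

# hypothetical sample answer, not from the patent's data set
S = segment("鲁王按自己的生活方式养鸟")
print(len(S) > 0)                     # True
```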
Step 2: Initialize the dictionary Dict with a set of pre-trained word vectors, introducing auxiliary information relevant to the scoring of subjective questions.
Step 3: Build the answer-text mapping matrix. By looking up the dictionary, obtain the vector representation of every word of the answer text that appears in the dictionary Dict, and build the text mapping matrix M according to the order in which the words appear in the text. The specific formula is:
M = Dict[index(S)] = [v1, v2, …, vN]
where index(w) is the index function of the word w in the dictionary Dict, and vi is the word vector corresponding to the word wi in Dict. For individual words of the answer text that do not appear in the dictionary Dict, this embodiment adopts the method of direct discarding, treating words not included in the dictionary as stop words.
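The dictionary lookup and matrix construction above can be sketched as follows. The tiny four-word dictionary and d = 3 dimensions are illustrative placeholders; the embodiment itself uses roughly 470,000 pre-trained vectors of 200 dimensions.

```python
# Sketch of steps 2-3: build the answer-text mapping matrix M from a dictionary
# of pre-trained word vectors. The dictionary contents are hypothetical.
import numpy as np

dict_vectors = {                       # Dict: word -> d-dimensional vector
    "鸟": np.array([0.1, 0.2, 0.3]),
    "习性": np.array([0.4, 0.5, 0.6]),
    "养": np.array([0.7, 0.8, 0.9]),
    "鲁王": np.array([0.2, 0.1, 0.0]),
}

def build_mapping_matrix(S, dictionary):
    """M = [v1, ..., vN]; out-of-dictionary words are discarded as stop words."""
    vectors = [dictionary[w] for w in S if w in dictionary]
    return np.column_stack(vectors)    # one column per word, preserving word order

S = ["鲁王", "按", "习性", "养", "鸟"]  # "按" is absent from Dict and is dropped
M = build_mapping_matrix(S, dict_vectors)
print(M.shape)                         # (3, 4): d rows, one column per kept word
```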
Step 4: Perform computation on the mapping matrix M of the answer text using the LSTM neural network, extract the semantic features of the answer text, and generate the answer-text semantic feature matrix H; the matrix H is composed of the output vectors of all or some of the hidden layers of the LSTM neural network. The specific calculation formulas are:
i_t = σ(W_i v_t + U_i h_{t-1} + b_i)
o_t = σ(W_o v_t + U_o h_{t-1} + b_o)
f_t = σ(W_f v_t + U_f h_{t-1} + b_f)
u_t = tanh(W_u v_t + U_u h_{t-1} + b_u)
c_t = i_t ⊙ u_t + f_t ⊙ c_{t-1}
h_t = o_t ⊙ tanh(c_t)
t = 1, 2, …, T
where the meanings of the symbols are as listed in Table 1.
Table 1. Symbol definitions
t | time step
M | answer-text mapping matrix
v_t | d-dimensional vector, the input at time step t, the t-th column of matrix M
W_k | q × d matrix, k = i, o, f, u, input weight matrix
U_k | q × q matrix, k = i, o, f, u, recurrent weight matrix
b_k | q-dimensional vector, k = i, o, f, u, bias
i_t | input gate at time step t
o_t | output gate at time step t
f_t | forget gate at time step t
u_t | candidate value at time step t
c_t | internal hidden-layer state at time step t
h_t | q-dimensional vector, output of the hidden layer at time step t
⊙ | element-wise multiplication
σ | the function 1/(1 + e^(-x))
tanh | hyperbolic tangent function
T | number of LSTM neural network nodes
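The recurrence of step 4 can be transcribed directly into NumPy as a minimal sketch. The weight values below are random placeholders, and q and d are small illustrative numbers rather than the embodiment's 128 units and 200-dimensional vectors.

```python
# Sketch of step 4: the LSTM equations above, applied column-by-column to M.
import numpy as np

rng = np.random.default_rng(0)
d, q, T = 3, 4, 5                      # input dim, hidden dim, sequence length
W = {k: rng.normal(size=(q, d)) * 0.1 for k in "iofu"}   # input weights W_k
U = {k: rng.normal(size=(q, q)) * 0.1 for k in "iofu"}   # recurrent weights U_k
b = {k: np.zeros(q) for k in "iofu"}                     # biases b_k

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(M):
    """Return H, whose columns are the hidden outputs h_1 ... h_T."""
    h = np.zeros(q)
    c = np.zeros(q)
    H = []
    for t in range(M.shape[1]):        # columns of M enter in word order
        v = M[:, t]
        i = sigma(W["i"] @ v + U["i"] @ h + b["i"])    # input gate i_t
        o = sigma(W["o"] @ v + U["o"] @ h + b["o"])    # output gate o_t
        f = sigma(W["f"] @ v + U["f"] @ h + b["f"])    # forget gate f_t
        u = np.tanh(W["u"] @ v + U["u"] @ h + b["u"])  # candidate u_t
        c = i * u + f * c              # internal state c_t
        h = o * np.tanh(c)             # hidden output h_t
        H.append(h)
    return np.column_stack(H)

M = rng.normal(size=(d, T))            # a placeholder mapping matrix
H = lstm_forward(M)
print(H.shape)                         # (4, 5): one q-dim column per time step
```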
Step 5: Down-sample the semantic feature matrix H of the answer text with a pooling algorithm to obtain the semantic feature vector l_T of the answer text. This semantic feature vector is the vectorized semantic representation of the input answer text.
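The pooling of step 5 reduces each row of H over the time axis; the three pooling variants named in the method can be sketched on a toy matrix:

```python
# Sketch of step 5: pool the semantic feature matrix H (q x T) down to the
# semantic feature vector. The toy values are illustrative only.
import numpy as np

H = np.array([[0.1, 0.9, 0.4],
              [0.7, 0.2, 0.5]])        # toy feature matrix, q = 2, T = 3

l_max = H.max(axis=1)                  # max pooling: elementwise max over time
l_min = H.min(axis=1)                  # min pooling
l_avg = H.mean(axis=1)                 # average pooling

print(l_max)                           # [0.9 0.7]
```

The embodiment below uses max pooling, which keeps the strongest activation of each feature across the whole word sequence.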
Step 6: Feed the semantic feature vector of the answer text obtained in step 5 to a multiclass logistic regression classifier, and predict the category Y_S of the answer text. The specific calculation formulas are as follows:
p(y_j | M; Θ) = exp([W_l l_T + b_l]_j) / Σ_k exp([W_l l_T + b_l]_k)
Y_S = argmax_{y_j ∈ Y} p(y_j | M; Θ)
where l_T is the answer-text semantic feature vector obtained with the LSTM neural network, y_j is the j-th category, W_l is the |Y| × q classifier weight matrix, b_l is the |Y|-dimensional classifier bias, M is the mapping matrix corresponding to the training sample S, Θ is the parameter set of the LSTM neural network, i.e. Θ = {W_k, U_k, b_k, k = i, o, f, u}, Y is the predefined category set of the training samples, |Y| is the number of elements in Y, and [A]_k denotes the k-th row of matrix A.
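The softmax classification of step 6 can be sketched as follows. W_l, b_l, and the category count |Y| = 3 are illustrative placeholders, not trained parameters.

```python
# Sketch of step 6: multiclass logistic regression (softmax) over l_T.
import numpy as np

rng = np.random.default_rng(1)
q, n_classes = 4, 3                    # feature dimension q and |Y|
W_l = rng.normal(size=(n_classes, q))  # |Y| x q classifier weight matrix
b_l = np.zeros(n_classes)              # |Y|-dimensional classifier bias

def predict_category(l_T):
    z = W_l @ l_T + b_l
    p = np.exp(z - z.max())            # softmax, shifted for numerical stability
    p /= p.sum()
    return int(np.argmax(p)), p        # predicted category Y_S and p(y_j | M)

l_T = rng.normal(size=q)               # placeholder pooled feature vector
Y_S, probs = predict_category(l_T)
print(Y_S, round(probs.sum(), 6))      # a category index and probability mass 1.0
```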
The parameters of the LSTM neural network described in steps 4 and 6 and the parameters of the multiclass logistic regression classifier are obtained in the LSTM neural network training stage, as shown in Fig. 3. The specific training steps are as follows:
(1) When training the model, the objective function used by the present invention is the cross entropy between the target probability distribution and the actual probability distribution, to be minimized. The objective function is defined as:
J(Θ̂) = -(1/C) Σ_{i=1}^{C} log p(y* | M_i; Θ̂) + α‖Θ̂‖²
where y* and M_i are respectively the correct category and the mapping matrix corresponding to the training sample S_i, C is the number of training samples, the parameter set Θ̂ consists of the LSTM parameters Θ together with the classifier parameters W_l and b_l, and α is the regularization factor.
(2) The batch sample error of the objective function J(Θ̂) is computed with gradient descent, and the parameter set Θ̂ is updated with back-propagation (BP). The update formula is:
Θ̂ ← Θ̂ - λ ∂J(Θ̂)/∂Θ̂
where λ is the learning rate.
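One gradient-descent update of this kind can be sketched for the classifier parameters W_l and b_l alone; updating the LSTM parameters by back-propagation follows the same Θ̂ ← Θ̂ - λ ∂J/∂Θ̂ rule. All data and weight values here are random placeholders, and the regularization term is omitted for brevity.

```python
# Sketch of the training update: one batch gradient-descent step on the
# softmax cross-entropy objective for the classifier parameters only.
import numpy as np

rng = np.random.default_rng(2)
q, n_classes, C = 4, 3, 8              # feature dim, |Y|, batch size
W_l = rng.normal(size=(n_classes, q)) * 0.1
b_l = np.zeros(n_classes)
lam = 0.1                              # learning rate λ

X = rng.normal(size=(C, q))            # pooled feature vectors l_T of a batch
y = rng.integers(0, n_classes, size=C) # correct categories y*

def softmax(Z):
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def cross_entropy(W, b):
    P = softmax(X @ W.T + b)
    return -np.log(P[np.arange(C), y]).mean()

# analytic gradient of the softmax cross entropy, then one descent step
P = softmax(X @ W_l.T + b_l)
P[np.arange(C), y] -= 1.0              # dJ/dz for each sample
grad_W = P.T @ X / C
grad_b = P.mean(axis=0)
before = cross_entropy(W_l, b_l)
W_l -= lam * grad_W
b_l -= lam * grad_b
after = cross_entropy(W_l, b_l)
print(after < before)                  # the objective decreases: True
```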
Step 7: Determine the score G(S) of the answer text according to the category to which the given answer text S belongs. The specific calculation formula is as follows:
G(S) = F(Y_S)
where F(·) is a given function whose codomain is the set {0, 1, …, K}, and K is the full score of the question to which the answer text S corresponds.
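The many-to-one mapping F can be sketched as a lookup table. The category-to-score table below is a hypothetical example for a question with full score K = 2: different categories may share a score, but one category never maps to two scores.

```python
# Sketch of step 7: the mapping F from predicted category Y_S to score G(S).
CATEGORY_TO_SCORE = {
    0: 0,   # e.g. off-topic answers
    1: 0,   # e.g. irrelevant answers -- a different category, the same score
    2: 1,   # e.g. partially correct answers
    3: 2,   # e.g. fully correct answers
}

def G(category):
    """Score of an answer text given its predicted category Y_S."""
    return CATEGORY_TO_SCORE[category]

print(G(3), G(1))   # 2 0
```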
To verify the actual technical effect of the present invention, the inventors carried out a test taking a set of sixth-grade primary school reading comprehension answers as an example. The exam question is as follows:
Please read the passage and answer the question.
Long, long ago, a rare and beautiful bird came to rest in the mountain forest on the outskirts of the State of Lu. The king of Lu had it brought to the ancestral temple to be raised; it was given the mellowest fine wine to drink, fed delicious beef and mutton, and entertained with the beautiful music of the "Nine Shao". But the bird was dazed and sorrowful, dared not eat a piece of meat or drink a cup of wine, and died within three days.
This is raising a bird by one's own way of life, not raising a bird by the habits of a bird! To raise a bird by its habits, one should let it dwell in the remote, deep forests, play on land and shoals, soar over rivers, lakes, and seas, rest in the company of flocks of birds, and live freely and comfortably in the forest. Birds hate to hear human voices, so why add all that noise and clamor? Playing the grand music of the "Nine Shao" for a bird in a magnificent palace hall is simply laughable. If the "Nine Shao" were played in the open wilderness, birds hearing it would fly far away, beasts hearing it would flee in fright, and fish hearing it would dive deep into the water and hide; but people hearing it would crowd around to watch. Fish can only survive in water, but a person who goes into the water will drown. Humans and fish have different habits, and their likes and dislikes are inherently different. Therefore, the sages of former generations did not demand that humans and birds have the same abilities, nor did they make them do the same things.
It can be understood that doing things must conform to the laws of nature for happiness to be kept for long. What you think is best, others may find unpleasant, perhaps even very unpleasant. Measuring others by one's own standard is sometimes wrong; acting on one's own assumptions often leads to failure.
The original text is adapted from Zhuangzi, "Perfect Happiness".
Based on the author's viewpoint, use your own words to explain the reason why the king of the State of Lu failed in raising the bird:
--------------------------------------------------------
The full score of this question is 2 points. A total of 533 valid student answers were collected; each answer was scored independently by 2 teachers, with a scoring agreement of kappa = 0.755. The student answer texts contain 24 Chinese words on average, and the longest text contains 78 words; excerpts of the student answer texts are shown in Table 2. The inventors trained the word representation vectors on Wikipedia, Sogou News, and a collection of primary school student compositions gathered by the inventors, obtaining more than 470,000 word vectors of 200 dimensions each, as shown in Fig. 4. In this embodiment, the training tool for the word representation vectors is the Python open-source tool gensim; gensim is described in detail at https://radimrehurek.com/gensim/models/word2vec.html.
Table 2. Excerpts of answer texts
The hardware environment used in this experiment: Ubuntu 64-bit operating system, Intel i7 processor, CPU frequency 3.41 GHz, 16 GB memory. Chinese word segmentation uses the open-source tool "jieba" Chinese word segmentation; detailed introductions to jieba can be found at https://pypi.python.org/pypi/jieba/ and in "NLP Chinese Natural Language Processing Principles and Practice" by Zheng Jie. In the LSTM neural network, 128 LSTM units are used, the number of training rounds is epoch = 30, and the max pooling algorithm is used. After the LSTM neural network vectorizes the semantic features of the student answer texts, a multiclass logistic regression classifier performs the classification, and the classification labels are then converted into scores according to the established rule. Suppose a student answer text S is predicted to belong to class Y_S; then G(S) is the predicted score of this answer. In this embodiment, the classification label of a student answer text is the score of that answer, i.e.:
G(S) = F(Y_S) = Y_S
80% of the samples in the data set were used as the training set and the remaining samples formed the test set; that is, 426 samples were randomly selected for model training and the remaining 107 samples were used for model testing. After cross-validation, the scoring accuracy is 80.1% and the kappa value is 0.686, close to the scoring consistency between human graders. It can be seen that, using only a small amount of student answer data, an automatic scoring model with good performance can be built without depending on standard-answer information.
In conclusion the present invention proposes a kind of Chinese short text subjective item automatic scoring side using LSTM neutral nets
Method, can fully excavate the semantic information in answer text, realizes to Chinese short essay in the case of independent of code of points
The automatic scoring of this subjective item.The experiment test of truthful data collection shows, according to the LSTM neutral nets of the method for the present invention training
, can be with the realization Chinese short text subjective item automatic scoring of better quality with good text classification performance.
Finally, it should be noted that the above embodiments are only used to describe the technical solution of the present invention, not to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (9)
- 1. A Chinese short-text subjective-question automatic scoring method using LSTM neural networks, characterized by comprising the following steps: Step 1: perform word segmentation on the answer text of the subjective question, converting the answer text into a word sequence; Step 2: obtain the vectorized representation of each word in the answer text, and build the answer text mapping matrix; Step 3: process the answer text mapping matrix with an LSTM neural network, take the outputs of all or some of the hidden layers, and obtain the semantic feature matrix of the answer text; Step 4: down-sample the semantic feature matrix with a pooling algorithm to obtain the semantic feature vector of the answer text; Step 5: feed the semantic feature vector of the answer text obtained in step 4 to a multinomial (softmax) logistic regression classifier, and predict the class of the answer text; Step 6: determine the score of the answer text according to the preset mapping between answer text classes and scores.
- 2. The method according to claim 1, characterized in that in step 2, each word in the answer text is searched for in a preset dictionary to obtain the word's vectorized representation, and the answer text mapping matrix is then built according to the order in which the words appear in the answer text; any word in the answer text that does not appear in the dictionary is treated as a stop word and discarded.
- 3. The method according to claim 1, characterized in that in step 3, the LSTM neural network processes the answer text mapping matrix M to extract the semantic features of the answer text and generate the answer text semantic feature matrix H, the matrix H being composed of the output vectors of all or some of the hidden layers of the LSTM neural network.
- 4. The method according to claim 1, characterized in that in step 3, the answer text mapping matrix M is input to the LSTM neural network as follows: one column of the matrix M is input to the LSTM neural network at each time step, the column vectors of M being fed in ascending order of their column indices, which effectively preserves the word-order information of the answer text.
- 5. The method according to claim 1, characterized in that the LSTM neural network model parameters and the classifier model parameters in steps 3 and 5 are obtained during the training of the scoring model, taking as the objective function the minimization of the cross-entropy between the target probability distribution and the actual probability distribution; the batch sample error is calculated by gradient descent, and the LSTM neural network model parameters and the classifier model parameters are updated by back-propagation.
- 6. The method according to claim 1, characterized in that the semantic feature vector generated in step 4 is the vectorized semantic representation of the input answer text, which contains the word-order information of the answer text and the associations between the words and the text semantics.
- 7. The method according to claim 1, characterized in that the pooling algorithm used in step 4 is max pooling, min pooling, or average pooling.
- 8. The method according to claim 1, characterized in that in step 6 the relation between answer text classes and scores is many-to-one, that is, answer texts of different classes may receive the same score, but answer texts of the same class may not receive different scores.
- 9. A Chinese short-text subjective-question automatic scoring system using LSTM neural networks, characterized in that the scoring system comprises an input module, a data processing module, a semantic feature extraction module, a scoring module, and a lexicon module, wherein: the input module is configured to deliver the answer text to the data processing module; the data processing module is configured to segment the input answer text, build the corresponding answer text mapping matrix, and deliver the answer text mapping matrix to the semantic feature extraction module; the semantic feature extraction module is configured to obtain the semantic feature vector of the answer text, and comprises an LSTM neural network layer and a pooling layer: the answer text mapping matrix is input to the LSTM neural network to obtain the outputs of some or all hidden layers in the network, yielding the answer text semantic feature matrix; a pooling operation is then applied to the answer text semantic feature matrix to obtain the answer text semantic feature vector, which is sent to the scoring module; the scoring module is configured to determine the score of the answer text: the answer text semantic feature vector is given to a multinomial (softmax) logistic regression classifier, the class of the answer text is predicted, the predicted answer text class is then mapped to the score of the answer text according to the preset mapping relation, and the scoring result is output; the lexicon module stores pre-trained words and their corresponding vectorized representations in the form of a data table, for the data processing module to call.
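The training objective of claim 5 — minimizing the cross-entropy between the target and predicted distributions by gradient descent — can be sketched for the softmax classifier alone (in the full model the LSTM parameters would be updated through the same loss via back-propagation). The data and dimensions here are synthetic, used only to show the update rule.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "semantic feature vectors" and 3 score classes; in the patent
# these vectors would come from the pooled LSTM hidden states.
X = rng.normal(size=(120, 4))
true_W = rng.normal(size=(3, 4))
y = (X @ true_W.T).argmax(axis=1)                 # linearly separable labels

W = np.zeros((3, 4))
onehot = np.eye(3)[y]                              # target probability distribution
for _ in range(500):                               # batch gradient descent
    logits = X @ W.T
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)              # predicted distribution
    # gradient of the mean cross-entropy  -sum(onehot * log P)  w.r.t. W
    W -= 0.5 * (P - onehot).T @ X / len(X)

accuracy = ((X @ W.T).argmax(axis=1) == y).mean()
```

The `(P - onehot)` term is exactly the softmax cross-entropy gradient, which is why cross-entropy pairs so cleanly with gradient descent and back-propagation in claim 5.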
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711177862.1A CN107967318A (en) | 2017-11-23 | 2017-11-23 | A kind of Chinese short text subjective item automatic scoring method and system using LSTM neutral nets |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107967318A true CN107967318A (en) | 2018-04-27 |
Family
ID=62000419
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711177862.1A Pending CN107967318A (en) | 2017-11-23 | 2017-11-23 | A kind of Chinese short text subjective item automatic scoring method and system using LSTM neutral nets |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107967318A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104834747A (en) * | 2015-05-25 | 2015-08-12 | 中国科学院自动化研究所 | Short text classification method based on convolution neutral network |
CN106980608A (en) * | 2017-03-16 | 2017-07-25 | 四川大学 | A kind of Chinese electronic health record participle and name entity recognition method and system |
CN107133211A (en) * | 2017-04-26 | 2017-09-05 | 中国人民大学 | A kind of composition methods of marking based on notice mechanism |
CN107301246A (en) * | 2017-07-14 | 2017-10-27 | 河北工业大学 | Chinese Text Categorization based on ultra-deep convolutional neural networks structural model |
- 2017-11-23: CN CN201711177862.1A patent/CN107967318A/en active Pending
Cited By (47)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108763411A (en) * | 2018-05-23 | 2018-11-06 | 北京师范大学 | A kind of combination short text clustering and the subjective item of recommendation mechanisms read and make comments system and method |
CN108897723B (en) * | 2018-06-29 | 2022-08-02 | 北京百度网讯科技有限公司 | Scene conversation text recognition method and device and terminal |
CN108897723A (en) * | 2018-06-29 | 2018-11-27 | 北京百度网讯科技有限公司 | The recognition methods of scene dialog text, device and terminal |
CN108960319A (en) * | 2018-06-29 | 2018-12-07 | 哈尔滨工业大学 | It is a kind of to read the candidate answers screening technique understood in modeling towards global machine |
CN108875074B (en) * | 2018-07-09 | 2021-08-10 | 北京慧闻科技发展有限公司 | Answer selection method and device based on cross attention neural network and electronic equipment |
CN108875074A (en) * | 2018-07-09 | 2018-11-23 | 北京慧闻科技发展有限公司 | Based on answer selection method, device and the electronic equipment for intersecting attention neural network |
CN110858218A (en) * | 2018-08-13 | 2020-03-03 | 宋曜廷 | Automatic scoring method and system for divergent thinking test |
CN110858218B (en) * | 2018-08-13 | 2023-06-30 | 宋曜廷 | Automatic scoring method and system for divergent thinking test |
CN109146296A (en) * | 2018-08-28 | 2019-01-04 | 南京葡萄诚信息科技有限公司 | A kind of artificial intelligence assessment talent's method |
CN109242090A (en) * | 2018-08-28 | 2019-01-18 | 电子科技大学 | A kind of video presentation and description consistency discrimination method based on GAN network |
CN110991161B (en) * | 2018-09-30 | 2023-04-18 | 北京国双科技有限公司 | Similar text determination method, neural network model obtaining method and related device |
CN110991161A (en) * | 2018-09-30 | 2020-04-10 | 北京国双科技有限公司 | Similar text determination method, neural network model obtaining method and related device |
CN109388806A (en) * | 2018-10-26 | 2019-02-26 | 北京布本智能科技有限公司 | A kind of Chinese word cutting method based on deep learning and forgetting algorithm |
CN109388806B (en) * | 2018-10-26 | 2023-06-27 | 北京布本智能科技有限公司 | Chinese word segmentation method based on deep learning and forgetting algorithm |
CN109670168B (en) * | 2018-11-14 | 2023-04-18 | 华南师范大学 | Short answer automatic scoring method, system and storage medium based on feature learning |
CN109670168A (en) * | 2018-11-14 | 2019-04-23 | 华南师范大学 | Short answer automatic scoring method, system and storage medium based on feature learning |
CN109815491B (en) * | 2019-01-08 | 2023-08-08 | 平安科技(深圳)有限公司 | Answer scoring method, device, computer equipment and storage medium |
CN109815491A (en) * | 2019-01-08 | 2019-05-28 | 平安科技(深圳)有限公司 | Answer methods of marking, device, computer equipment and storage medium |
CN110134948A (en) * | 2019-04-23 | 2019-08-16 | 北京淇瑀信息科技有限公司 | A kind of Financial Risk Control method, apparatus and electronic equipment based on text data |
CN110309503A (en) * | 2019-05-21 | 2019-10-08 | 昆明理工大学 | A kind of subjective item Rating Model and methods of marking based on deep learning BERT--CNN |
CN110245860B (en) * | 2019-06-13 | 2022-08-23 | 桂林电子科技大学 | Automatic scoring method based on virtual experiment platform |
CN110245860A (en) * | 2019-06-13 | 2019-09-17 | 桂林电子科技大学 | A method of the automatic scoring based on Virtual Experiment Platform Based |
CN110162797A (en) * | 2019-06-21 | 2019-08-23 | 北京百度网讯科技有限公司 | Article quality determining method and device |
CN110162797B (en) * | 2019-06-21 | 2023-04-07 | 北京百度网讯科技有限公司 | Article quality detection method and device |
CN110457674B (en) * | 2019-06-25 | 2021-05-14 | 西安电子科技大学 | Text prediction method for theme guidance |
CN110457674A (en) * | 2019-06-25 | 2019-11-15 | 西安电子科技大学 | A kind of text prediction method of theme guidance |
WO2021000675A1 (en) * | 2019-07-04 | 2021-01-07 | 平安科技(深圳)有限公司 | Method and apparatus for machine reading comprehension of chinese text, and computer device |
CN110309267A (en) * | 2019-07-08 | 2019-10-08 | 哈尔滨工业大学 | Semantic retrieving method and system based on pre-training model |
CN110413741B (en) * | 2019-08-07 | 2022-04-05 | 山东山大鸥玛软件股份有限公司 | Subjective question-oriented intelligent paper marking method |
CN110413741A (en) * | 2019-08-07 | 2019-11-05 | 山东山大鸥玛软件股份有限公司 | A kind of intelligently reading method towards subjective item |
CN111221939A (en) * | 2019-11-22 | 2020-06-02 | 华中师范大学 | Grading method and device and electronic equipment |
CN111221939B (en) * | 2019-11-22 | 2023-09-08 | 华中师范大学 | Scoring method and device and electronic equipment |
CN111104881B (en) * | 2019-12-09 | 2023-12-01 | 科大讯飞股份有限公司 | Image processing method and related device |
CN111104881A (en) * | 2019-12-09 | 2020-05-05 | 科大讯飞股份有限公司 | Image processing method and related device |
CN111079641A (en) * | 2019-12-13 | 2020-04-28 | 科大讯飞股份有限公司 | Answering content identification method, related device and readable storage medium |
CN112988921A (en) * | 2019-12-13 | 2021-06-18 | 北京四维图新科技股份有限公司 | Method and device for identifying map information change |
CN111079641B (en) * | 2019-12-13 | 2024-04-16 | 科大讯飞股份有限公司 | Answer content identification method, related device and readable storage medium |
CN111191578A (en) * | 2019-12-27 | 2020-05-22 | 北京新唐思创教育科技有限公司 | Automatic scoring method, device, equipment and storage medium |
CN111241392A (en) * | 2020-01-07 | 2020-06-05 | 腾讯科技(深圳)有限公司 | Method, device, equipment and readable storage medium for determining popularity of article |
CN111241392B (en) * | 2020-01-07 | 2024-01-26 | 腾讯科技(深圳)有限公司 | Method, apparatus, device and readable storage medium for determining popularity of article |
CN111724813A (en) * | 2020-06-17 | 2020-09-29 | 东莞理工学院 | LSTM-based piano playing automatic scoring method |
CN112085985A (en) * | 2020-08-20 | 2020-12-15 | 安徽七天教育科技有限公司 | Automatic student answer scoring method for English examination translation questions |
CN112287083A (en) * | 2020-10-29 | 2021-01-29 | 北京乐学帮网络技术有限公司 | Evaluation method and device, computer equipment and storage device |
CN112634689A (en) * | 2020-12-24 | 2021-04-09 | 广州奇大教育科技有限公司 | Application method of regular expression in automatic subjective question changing in computer teaching |
CN113392642B (en) * | 2021-06-04 | 2023-06-02 | 北京师范大学 | Automatic labeling system and method for child care cases based on meta learning |
CN113392642A (en) * | 2021-06-04 | 2021-09-14 | 北京师范大学 | System and method for automatically labeling child-bearing case based on meta-learning |
CN113111154A (en) * | 2021-06-11 | 2021-07-13 | 北京世纪好未来教育科技有限公司 | Similarity evaluation method, answer search method, device, equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107967318A (en) | A kind of Chinese short text subjective item automatic scoring method and system using LSTM neutral nets | |
CN107967257B (en) | Cascading composition generating method | |
CN108363743B (en) | Intelligent problem generation method and device and computer readable storage medium | |
US9779085B2 (en) | Multilingual embeddings for natural language processing | |
CN103823794B (en) | A kind of automatization's proposition method about English Reading Comprehension test query formula letter answer | |
CN104794169B (en) | A kind of subject terminology extraction method and system based on sequence labelling model | |
CN109977199B (en) | Reading understanding method based on attention pooling mechanism | |
CN106569998A (en) | Text named entity recognition method based on Bi-LSTM, CNN and CRF | |
CN108829678A (en) | Name entity recognition method in a kind of Chinese international education field | |
CN110222163A (en) | A kind of intelligent answer method and system merging CNN and two-way LSTM | |
CN112149421A (en) | Software programming field entity identification method based on BERT embedding | |
CN106682089A (en) | RNNs-based method for automatic safety checking of short message | |
CN111368082A (en) | Emotion analysis method for domain adaptive word embedding based on hierarchical network | |
CN110287298A (en) | A kind of automatic question answering answer selection method based on question sentence theme | |
CN113486645A (en) | Text similarity detection method based on deep learning | |
Cai | Automatic essay scoring with recurrent neural network | |
Lilja | Automatic essay scoring of Swedish essays using neural networks | |
CN113011196A (en) | Concept-enhanced representation and one-way attention-containing subjective question automatic scoring neural network model | |
CN110705306B (en) | Evaluation method for consistency of written and written texts | |
CN114579706B (en) | Automatic subjective question review method based on BERT neural network and multi-task learning | |
CN116049349A (en) | Small sample intention recognition method based on multi-level attention and hierarchical category characteristics | |
CN114462389A (en) | Automatic test paper subjective question scoring method | |
Luo | Automatic short answer grading using deep learning | |
CN112036170A (en) | Neural zero sample fine-grained entity classification method based on type attention | |
CN110083825A (en) | A kind of Laotian sentiment analysis method based on GRU model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180427 |