CN107967318A - A Chinese short-text subjective question automatic scoring method and system using LSTM neural networks - Google Patents

A Chinese short-text subjective question automatic scoring method and system using LSTM neural networks

Info

Publication number
CN107967318A
Authority
CN
China
Prior art keywords
text
answer
answer text
semantic feature
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711177862.1A
Other languages
Chinese (zh)
Inventor
余胜泉
杨熙
黄俞卫
庄福振
张立山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Normal University
Original Assignee
Beijing Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Normal University filed Critical Beijing Normal University
Priority to CN201711177862.1A priority Critical patent/CN107967318A/en
Publication of CN107967318A publication Critical patent/CN107967318A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374Thesaurus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification

Abstract

The present invention is a Chinese short-text subjective question automatic scoring method using an LSTM neural network, including: (1) segmenting the answer text and converting it into a word sequence; (2) obtaining the vectorized representation of each word in the answer text and building the answer text mapping matrix; (3) processing the answer text mapping matrix with the LSTM neural network and taking the outputs of all or some of the hidden layers to obtain the semantic feature matrix of the answer text; (4) down-sampling the semantic feature matrix with a pooling algorithm to obtain the semantic feature vector of the answer text; (5) feeding the semantic feature vector of the answer text to a classifier to predict the category of the answer text; (6) determining the score of the answer text according to a preset mapping between answer text categories and scores, taking into account the many-to-one relationship between the two. The present invention does not depend on a standard answer for the subjective question, effectively mines the semantic information of the answer text, and realizes automatic scoring of Chinese short-text subjective questions.

Description

A Chinese short-text subjective question automatic scoring method and system using LSTM neural networks
Technical field
The present invention relates to the technical field of automatic scoring. Specifically, it is a Chinese short-text subjective question automatic scoring method and system using a long short-term memory (LSTM, Long Short-Term Memory) neural network. It can be applied to the automatic scoring of questions answered in Chinese natural language, such as translation, short-answer, true/false, and picture-to-text questions, and ultimately to homework and examination marking and to the evaluation of student learning.
Background technology
Subjective questions occupy a very important position in subject learning and teaching. Their greatest advantage is that they can measure complex performance objectives and better examine students' creative thinking and expressive abilities. Subjective questions have thus become one of the most widely used question types in course teaching and testing. However, the heavy, mechanical work of marking subjective questions takes up a great deal of teachers' time and energy, while students hope for real-time feedback on their performance. Both teachers and students therefore urgently hope that objective, effective, time-saving and labor-saving automatic marking of subjective questions can be realized by computer. Automatic scoring of subjective questions has great practical significance. First, it can greatly improve the efficiency of teachers' marking and effectively lighten their teaching workload. Second, it can reduce the influence of factors such as the marker's subjective preferences, physical condition, and psychological state on scoring accuracy. Third, it can provide real-time feedback for online learners, saving waiting time and improving learning efficiency. Finally, it can be applied to adaptive learning and adaptive assessment tasks, and is a key technology for realizing intelligent tutoring systems.
In daily teaching and examinations, short-text subjective questions mainly include translation, short-answer, and true/false types. Their characteristics are: (1) they are answered in natural language; (2) the answers are short, usually no more than one paragraph; (3) students cannot obtain the answer from the question stem and must understand, apply, and transfer domain knowledge; (4) scoring focuses on the content of the answer text rather than on non-content features such as writing style or rhetorical devices; (5) the questions are open and varied, and may be closed, semi-closed, or open. For a computer to score short-text subjective questions automatically, it must "understand" the semantic information of the text at a deeper level. In addition, because answer texts are short, the statistical information a computer can extract from them (such as word co-occurrence and contextual information) is limited, and traditional statistics-based natural language processing methods and models face problems such as data sparseness and semantic sensitivity. Accurate automatic scoring of subjective questions therefore remains a great challenge and an urgent technical problem.
As a key technology of intelligent education, automatic scoring of subjective questions occupies a very important position in the field of educational technology. A survey of domestic and foreign research shows that the general scoring framework mainly consists of the following four modules, as shown in Fig. 1:
Module (1): Establishing the database. The database contains relevant data such as the questions, standard answers, scoring criteria, and student answers.
Module (2): Preprocessing. The answer text is segmented, deduplicated, stripped of stop words, part-of-speech tagged, and so on.
Module (3): Establishing the scoring model. This module contains two submodules, which influence and constrain each other:
A. Feature extraction: using natural language processing techniques based on rules, statistics, or neural networks, text features are extracted and the answer text is vectorized.
B. Modeling: a scoring model is established using methods such as concept mapping, information extraction, corpus-based methods, and machine learning.
Module (4): Scoring. A new student answer text is first processed by module (2) and then fed into the model established by module (3), which predicts a label for it; the final score of the answer is then given according to the predicted label.
In the above automatic scoring framework, the core module is the model-building module (module (3)). The mainstream methods can be divided into the following four classes:
(1) Concept matching: the standard answer is treated as several key concepts or combinations of key concepts, and scoring depends on whether these key concepts appear in the student's answer. This method is suitable for question types with clear and relatively short answers. Typical systems include ATM (Automatic Text Marker) and C-rater.
(2) Information extraction: the answer text is assumed to contain certain specific points, which can usually be located and modeled with templates; the degree to which the student's answer matches the templates of the standard answer is the basis for marking. First, structured information represented as tuples is extracted from the unstructured data; then pattern matching is performed with algorithms such as regular expressions or parse trees. Typical systems include AutoMark, WebLSA (Web-based Language Assessment System), and Auto-marking.
(3) Corpus-based methods: statistical features are extracted from a large-scale text corpus and used to compute the text similarity between the student's answer and the standard answer; the student's answer is scored according to the degree of similarity. A common method is latent semantic analysis (LSA, Latent Semantic Analysis). The scoring performance of corpus-based methods is proportional to the scale of the corpus. Typical systems include Atenea and SAMText (Short Answer Measurement of Text).
(4) Machine learning: the short-text scoring problem is converted into a text classification or clustering problem. First, features of the student's answer are extracted with natural language processing techniques and the text is vectorized; the extracted features mainly include text features of the answer and similarity features between the answer and the standard answer. Then, with the score of the student's answer as the class label, a classification model is trained on the extracted features with a machine learning algorithm to obtain the scoring model. Common classification algorithms include k-nearest neighbors, logistic regression, naive Bayes, and support vector machines. Typical systems include e-Examiner and CAM (Content Assessment Module).
For automatic scoring of Chinese subjective questions, the main problems of the current mainstream techniques are as follows:
(1) The above methods are mainly used for automatic scoring of English subjective questions. Because of the great differences between Chinese and English natural language processing techniques, they are difficult to transplant to the automatic scoring of Chinese subjective questions.
(2) The above methods target closed questions, i.e. questions that have a standard answer. In actual teaching and examinations, however, many questions have no standard answer. In Chinese language examinations, for example, the scoring criterion for some questions is "any reasonable answer scores" or "any answer with the right meaning scores". For such questions, which have no standard answer and whose scoring criteria are relatively fuzzy, the above algorithms are not applicable.
(3) The above methods rely heavily on traditional language models; their methods for extracting text feature representations are complicated and cannot solve the data sparseness and semantic sensitivity problems caused by the short length of the texts.
In recent years, deep learning (Deep Learning) algorithms have achieved remarkable results in the field of natural language processing (NLP, Natural Language Processing). Compared with traditional language models, models based on deep learning can better mine the semantic information of words, phrases, sentences, and discourse. In particular, recurrent neural networks (RNN, Recurrent Neural Network) are widely used in natural language processing tasks because they are suited to modeling sequential information, and have achieved good results. RNNs with LSTM units solve the long-range dependency and vanishing gradient problems of traditional RNNs, and have therefore attracted the attention of many scholars.
The content of the invention
The task of the present invention is to overcome the deficiencies of the prior art. Considering the characteristics of the Chinese short-text subjective question automatic scoring problem, the challenges it faces, and the advantages of LSTM neural networks in language modeling, the present invention proposes a Chinese short-text subjective question automatic scoring method and system using LSTM neural networks, which can score Chinese short-text subjective questions automatically without depending on a standard answer.
The present invention converts the automatic scoring of Chinese subjective questions into a text classification problem. Pretrained word vectors are used to represent the student's answer text; a long short-term memory (LSTM, Long Short-Term Memory) neural network then extracts the semantic feature vector of the text, which is used to train a classifier that predicts the category of the answer text; finally, the score of the answer is determined according to a predetermined mapping between categories and scores. The present invention introduces LSTM neural networks into Chinese short-text subjective question automatic scoring for the first time, a new application of LSTM neural networks in this field. The method solves the automatic scoring problem for Chinese subjective questions and reduces the scoring algorithm's dependence on a standard answer. Moreover, compared with traditional subjective question scoring methods, the present invention considers the sequential relationship of words in context when vectorizing the answer text, semantically extends the answer text, and produces a semantic feature vector of controllable dimension, which effectively solves the data sparseness and semantic sensitivity problems caused by short texts.
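The pipeline described above can be sketched end to end as follows. This is a minimal illustration only: the toy vocabulary, the dimensions, the category-to-score mapping, and the randomly initialized (untrained) weights all stand in for the pretrained word vectors and the trained LSTM and classifier parameters of the actual method.

```python
import numpy as np

rng = np.random.default_rng(0)
d, q, n_classes = 4, 5, 3          # word-vector dim, hidden dim, answer categories

# Steps 1-2: a segmented toy answer and a toy pretrained-word-vector dictionary
words = ["鸟", "按", "自己", "喜好", "养"]
dictionary = {w: rng.normal(size=d) for w in words}
M = np.column_stack([dictionary[w] for w in words])   # mapping matrix, d x N

# Step 3: run an LSTM over the columns of M (random, untrained weights)
W = rng.normal(size=(4, q, d))     # input weights for gates i, o, f and candidate u
U = rng.normal(size=(4, q, q))     # recurrent weights
b = np.zeros((4, q))               # biases
sig = lambda x: 1 / (1 + np.exp(-x))

h = np.zeros(q); c = np.zeros(q); H = []
for v in M.T:                      # one column of M per time step, in word order
    i, o, f = (sig(W[k] @ v + U[k] @ h + b[k]) for k in range(3))
    u = np.tanh(W[3] @ v + U[3] @ h + b[3])
    c = i * u + f * c
    h = o * np.tanh(c)
    H.append(h)
H = np.column_stack(H)             # semantic feature matrix, q x N

# Step 4: max pooling over time gives the semantic feature vector
l = H.max(axis=1)

# Step 5: multiclass logistic regression (softmax) predicts the category
Wl = rng.normal(size=(n_classes, q)); bl = np.zeros(n_classes)
z = Wl @ l + bl
p = np.exp(z - z.max()); p /= p.sum()
label = int(p.argmax())

# Step 6: preset many-to-one mapping from category to score
score_of = {0: 0, 1: 1, 2: 2}
print(label, score_of[label])
```

Each stage mirrors one step of the method; in practice only the weights change (learned during training), not the data flow.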
The Chinese short-text subjective question automatic scoring method using LSTM neural networks provided by the present invention includes the following steps:
Step 1: perform word segmentation on the answer text of the subjective question, converting the answer text into a word sequence;
Step 2: obtain the vectorized representation of each word in the answer text and build the answer text mapping matrix;
Step 3: process the answer text mapping matrix with the LSTM neural network and take the outputs of all or some of the hidden layers to obtain the semantic feature matrix of the answer text;
Step 4: down-sample the semantic feature matrix with a pooling algorithm (Pooling) to obtain the semantic feature vector of the answer text;
Step 5: feed the semantic feature vector of the answer text obtained in step 4 to a multiclass logistic regression classifier to predict the category of the answer text;
Step 6: determine the score of the answer text according to the preset mapping between answer text categories and scores.
The following technical features further provide some preferred technical solutions for the above automatic scoring method.
In step 2, each word of the answer text is looked up in a preset dictionary to obtain its vectorized representation, and the answer text mapping matrix is built according to the order in which the words occur in the answer text. Individual words of the answer text that do not appear in the dictionary are treated as stop words and discarded.
In step 3, the answer text mapping matrix M is processed by the LSTM neural network to extract the semantic features of the answer text and generate its semantic feature matrix H, which is composed of the output vectors of all or some of the hidden layers of the LSTM neural network.
In step 3, the answer text mapping matrix M is input to the LSTM neural network as follows: at each time step, one column of M is input, and the columns of M are input in ascending order of column index, which effectively preserves the word-order information of the answer text.
The LSTM neural network model parameters in step 3 and the classifier model parameters in step 5 are obtained during the training of the scoring model: the cross entropy between the target probability distribution and the actual probability distribution is minimized as the objective function, the error of each batch of samples is computed with gradient descent, and backpropagation is used to update the LSTM neural network model parameters and the classifier model parameters.
The semantic feature vector generated in step 4 is the vectorized semantic representation of the input answer text; it contains the word-order information of the answer text and the association information between the words and the text semantics.
Different pooling methods may be selected for the pooling algorithm used in step 4, such as max pooling (max-pooling), min pooling (min-pooling), or average pooling.
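The three pooling options can be illustrated on a toy semantic feature matrix. Each reduces the q × N matrix to a q-dimensional vector regardless of the answer length N, which is what makes the feature dimension controllable; the matrix values below are invented for the example.

```python
import numpy as np

H = np.array([[1.0, 4.0, 2.0],
              [3.0, 0.5, 5.0]])      # toy semantic feature matrix, q = 2, N = 3

max_pool  = H.max(axis=1)            # max-pooling:     [4.0, 5.0]
min_pool  = H.min(axis=1)            # min-pooling:     [1.0, 0.5]
mean_pool = H.mean(axis=1)           # average pooling: [7/3, 17/6]

print(max_pool, min_pool, mean_pool)
```

Whichever method is chosen, the result is one value per hidden dimension, pooled over all time steps.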
In step 6, the relationship between answer text categories and scores is many-to-one: answer texts of different categories may obtain the same score, but answer texts of the same category may not obtain different scores.
The present invention also provides a Chinese short-text subjective question automatic scoring system using LSTM neural networks. The automatic scoring system includes an input module, a data processing module, a semantic feature extraction module, a scoring module, and a lexicon module, in which:
the input module passes the answer text to the data processing module;
the data processing module segments the input answer text, builds the corresponding answer text mapping matrix, and passes the answer text mapping matrix to the semantic feature extraction module;
the semantic feature extraction module obtains the semantic feature vector of the answer text; it comprises an LSTM neural network layer and a pooling layer: the answer text mapping matrix is input to the LSTM neural network, the outputs of some or all of the hidden layers are taken to obtain the semantic feature matrix of the answer text, the semantic feature matrix is then pooled to obtain the semantic feature vector of the answer text, and the vector is sent to the scoring module;
the scoring module determines the score of the answer text: the semantic feature vector of the answer text is fed to a multiclass logistic regression classifier, the category of the answer text is predicted, the predicted category is mapped to the score of the answer text according to the preset mapping, and the scoring result is output;
the lexicon module stores the pretrained words and their vectorized representations (word vectors) in the form of a data table, to be called by the data processing module.
The advantages of the Chinese subjective question automatic scoring method using LSTM neural networks proposed by the present invention are:
(1) The present invention does not depend on a standard answer; it learns semantic features automatically from existing student answer texts and converts the subjective question automatic scoring problem into a short-text classification problem.
(2) The present invention introduces LSTM neural networks into Chinese short-text subjective question automatic scoring for the first time, a new application of LSTM in this field.
(3) The present invention initializes the dictionary with a set of pretrained word vectors, introducing a large amount of external auxiliary information and extending the semantic information of the answer text, which effectively solves the problem of insufficient contextual information caused by the short answer texts.
(4) The present invention models the word sequence of the answer text with an LSTM neural network; the temporal order in which words occur in the text is preserved in the modeling process, effectively mining the word-order features of the context.
(5) The present invention does not depend on a complicated language model; the semantic features of the answer text are extracted by the LSTM neural network, which effectively mines the semantic information and the associations between words in the answer text, alleviates the semantic sensitivity problem of short texts, and improves the performance of subjective question automatic scoring.
(6) The present invention uses a predetermined mapping from answer text categories to final scores to capture the many-to-one relationship between the two, fully accounting for the case in which student answers of several different categories receive the same score.
Brief description of the drawings
Fig. 1 is the general framework of existing subjective question automatic scoring algorithms;
Fig. 2 is the flow chart of the Chinese short-text subjective question automatic scoring proposed by the present invention;
Fig. 3 is the training flow chart of the answer text classification model of the present invention;
Fig. 4 is a screenshot of the word vectors of the test data set of the present invention.
Embodiment
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in more detail below with reference to specific embodiments and the accompanying drawings.
The present invention proposes a Chinese short-text subjective question automatic scoring method using LSTM neural networks, which includes Chinese word segmentation, building the answer text mapping matrix, extracting the semantic feature vector, text classification, and scoring; its flow chart is shown in Fig. 2. The specific steps are as follows:
Step 1: segment the input answer text to obtain the word sequence S = {w_1, w_2, …, w_N}, where w_i is the i-th word in the word sequence S, i = 1, 2, …, N, and N is the number of words in S.
Step 2: initialize the dictionary Dict with a set of pretrained word vectors, introducing auxiliary information relevant to subjective question scoring.
Step 3: build the answer text mapping matrix. By looking up the dictionary, the vector representations of all words of the answer text that appear in the dictionary Dict are obtained, and the text mapping matrix M is built according to the order in which the words occur in the text. The specific formula is:
M = Dict[index(S)] = [v_1, v_2, …, v_N]
where index(w) is the index function of word w in the dictionary Dict, and v_i is the word vector of word w_i in Dict. For individual words of the answer text that do not appear in the dictionary Dict, this embodiment adopts the method of discarding them directly, treating words not included in the dictionary as stop words.
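A minimal sketch of this lookup-and-stack construction follows, with an invented three-word dictionary; the discarding of out-of-dictionary words matches the stop-word treatment of the embodiment.

```python
import numpy as np

# Toy dictionary Dict: word -> word vector (d = 3); real word vectors are pretrained
Dict = {"鲁王": np.array([0.1, 0.2, 0.3]),
        "按":   np.array([0.4, 0.5, 0.6]),
        "养鸟": np.array([0.7, 0.8, 0.9])}

S = ["鲁王", "按", "自己的", "喜好", "养鸟"]   # segmented answer; two words are OOV

# Out-of-dictionary words are treated as stop words and discarded; the column
# order of M preserves the order of occurrence of the retained words.
M = np.column_stack([Dict[w] for w in S if w in Dict])

print(M.shape)   # (3, 3): d rows, one column per retained word
```

Note that M has one column per retained word, so its width varies with the answer length; the LSTM in the next step consumes the columns one per time step.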
Step 4: process the mapping matrix M of the answer text with the LSTM neural network, extracting the semantic features of the answer text and generating the answer text semantic feature matrix H, which is composed of the output vectors of all or some of the hidden layers of the LSTM neural network. The specific calculation formulas are:
i_t = σ(W_i v_t + U_i h_{t-1} + b_i)
o_t = σ(W_o v_t + U_o h_{t-1} + b_o)
f_t = σ(W_f v_t + U_f h_{t-1} + b_f)
u_t = tanh(W_u v_t + U_u h_{t-1} + b_u)
c_t = i_t ⊙ u_t + f_t ⊙ c_{t-1}
h_t = o_t ⊙ tanh(c_t)
t = 1, 2, …, T
The meanings of the symbols in the formulas are listed in Table 1.
Table 1. Definitions of symbols
t: time step
M: answer text mapping matrix
v_t: d-dimensional vector, the input at time t, the t-th column of matrix M
W_k: q × d matrix, k = i, o, f, u, the input weight matrices
U_k: q × q matrix, k = i, o, f, u, the recurrent weight matrices
b_k: q-dimensional vector, k = i, o, f, u, the biases
i_t: the input gate at time t
o_t: the output gate at time t
f_t: the forget gate at time t
u_t: the candidate value at time t
c_t: the internal state of the hidden layer at time t
h_t: q-dimensional vector, the output of the hidden layer at time t
⊙: element-wise multiplication
σ: the sigmoid function 1/(1 + e^(-x))
tanh: the hyperbolic tangent function
T: the number of LSTM time steps
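The recurrence above can be transcribed directly into code. The sketch below uses small random weights and toy dimensions (d = 3, q = 4, N = 6) purely to show the data flow; in the actual method the weights are learned during training.

```python
import numpy as np

rng = np.random.default_rng(42)
d, q = 3, 4                                   # input and hidden dimensions
M = rng.normal(size=(d, 6))                   # answer text mapping matrix, N = 6

# One weight triple (W_k, U_k, b_k) per gate k in {i, o, f, u}
W = {k: rng.normal(size=(q, d)) for k in "iofu"}
U = {k: rng.normal(size=(q, q)) for k in "iofu"}
b = {k: np.zeros(q) for k in "iofu"}

sigma = lambda x: 1 / (1 + np.exp(-x))

h = np.zeros(q)
c = np.zeros(q)
outputs = []
for v in M.T:                                 # v_t = column t of M, in word order
    i = sigma(W["i"] @ v + U["i"] @ h + b["i"])    # input gate i_t
    o = sigma(W["o"] @ v + U["o"] @ h + b["o"])    # output gate o_t
    f = sigma(W["f"] @ v + U["f"] @ h + b["f"])    # forget gate f_t
    u = np.tanh(W["u"] @ v + U["u"] @ h + b["u"])  # candidate value u_t
    c = i * u + f * c                              # internal state c_t
    h = o * np.tanh(c)                             # hidden-layer output h_t
    outputs.append(h)

H = np.column_stack(outputs)                  # semantic feature matrix, q x N
print(H.shape)                                # (4, 6)
```

Collecting every h_t gives the full semantic feature matrix H; taking only some time steps corresponds to using part of the hidden-layer outputs, as the method allows.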
Step 5: down-sample the semantic feature matrix H of the answer text with a pooling algorithm (Pooling) to obtain the semantic feature vector l_T of the answer text. The semantic feature vector is the vectorized semantic representation of the input answer text.
Step 6: feed the semantic feature vector of the answer text obtained in step 5 to the multiclass logistic regression classifier, and predict the category Y_S of the answer text. The specific calculation formula is as follows:
Y_S = argmax_{y_j ∈ Y} p(y_j | M; Θ, W_l, b_l),  p(y_j | M; Θ, W_l, b_l) = exp([W_l l_T + b_l]_j) / Σ_{k=1}^{|Y|} exp([W_l l_T + b_l]_k)
where l_T is the answer text semantic feature vector obtained with the LSTM neural network, y_j is the j-th category, W_l is the |Y| × q classifier weight matrix, b_l is the |Y|-dimensional classifier bias, M is the mapping matrix corresponding to the training sample S, Θ = {W_k, U_k, b_k, k = i, o, f, u} is the parameter set of the LSTM neural network, Y is the predefined category set of the training samples, |Y| is the number of elements in Y, and [A]_k denotes the k-th row of matrix A.
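The prediction step can be sketched as a stabilized softmax followed by an argmax. The feature vector and classifier parameters below are random stand-ins for the trained values, and the dimensions are toy values chosen for the example.

```python
import numpy as np

q, n_classes = 4, 3
rng = np.random.default_rng(1)

l_T = rng.normal(size=q)                      # semantic feature vector from pooling
W_l = rng.normal(size=(n_classes, q))         # |Y| x q classifier weight matrix
b_l = np.zeros(n_classes)                     # |Y|-dimensional classifier bias

z = W_l @ l_T + b_l
p = np.exp(z - z.max())                       # subtract max for numerical stability
p /= p.sum()                                  # softmax: category probabilities

Y_S = int(np.argmax(p))                       # predicted answer category
print(Y_S, p)
```

Subtracting z.max() before exponentiating leaves the probabilities unchanged but avoids overflow for large logits.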
The parameters of the LSTM neural network in steps 4 and 6 and the parameters of the multiclass logistic regression classifier are obtained in the LSTM neural network training stage, as shown in Fig. 3. The specific training steps are as follows:
(1) When training the model, the objective function used by the present invention is the minimization of the cross entropy between the target probability distribution and the actual probability distribution. The objective function is defined as:
J(Θ̃) = −(1/C) Σ_{i=1}^{C} log p(y_i* | M_i; Θ̃) + α‖Θ̃‖²
where y_i* and M_i are the correct category and the mapping matrix of training sample S_i respectively, C is the number of training samples, Θ̃ = {Θ, W_l, b_l} is the full parameter set, and α is the regularization factor.
(2) The batch sample error of the objective function J(Θ̃) is computed with gradient descent, and the parameter set Θ̃ is updated with backpropagation (BP, Back Propagation). The update formula is:
Θ̃ ← Θ̃ − λ ∂J(Θ̃)/∂Θ̃
where λ is the learning rate.
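One such gradient-descent update can be sketched for the classifier parameters alone, where the gradient of the regularized softmax cross entropy is analytic; the gradients for the LSTM parameters, obtained by backpropagation through time, are omitted for brevity. The sample, dimensions, α, and λ below are toy values for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
q, n_classes = 4, 3
alpha, lam = 1e-4, 0.05                       # regularization factor, learning rate

l_T = rng.normal(size=q)                      # feature vector of one training sample
y_true = 2                                    # correct category of the sample
W_l = rng.normal(size=(n_classes, q))
b_l = np.zeros(n_classes)

def loss(W_l, b_l):
    """Regularized cross entropy for the single sample, plus its softmax output."""
    z = W_l @ l_T + b_l
    p = np.exp(z - z.max()); p /= p.sum()
    return -np.log(p[y_true]) + alpha * (W_l ** 2).sum(), p

before, p = loss(W_l, b_l)

# Analytic gradients: d(cross entropy)/dz = p - onehot(y_true)
onehot = np.eye(n_classes)[y_true]
grad_W = np.outer(p - onehot, l_T) + 2 * alpha * W_l
grad_b = p - onehot

W_l = W_l - lam * grad_W                      # gradient-descent update
b_l = b_l - lam * grad_b
after, _ = loss(W_l, b_l)
print(before, after)                          # the loss decreases
```

Because the objective is convex in (W_l, b_l), a sufficiently small step along the negative gradient is guaranteed to decrease it, as the printed values confirm.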
Step 7: determine the score G(S) of the answer text according to the category of the given answer text S. The specific calculation formula is as follows:
G(S) = F(Y_S)
where F(·) is a given function whose codomain is the set {0, 1, …, K}, and K is the full mark of the question to which answer text S corresponds.
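The mapping F can be as simple as a lookup table. The categories and scores below are invented for illustration (four categories onto full mark K = 2); they show the many-to-one property that different categories may share a score while each category maps to exactly one score.

```python
# Hypothetical many-to-one mapping F from answer category to score (K = 2)
F = {0: 0,          # e.g. off-topic answers
     1: 1,          # one kind of partially correct answer
     2: 1,          # a different kind of partially correct answer
     3: 2}          # fully correct answers

def G(category: int) -> int:
    """Score of an answer text whose predicted category is `category`."""
    return F[category]

print([G(c) for c in range(4)])   # [0, 1, 1, 2]
```

Categories 1 and 2 both receive the score 1 (many-to-one is allowed), but no category maps to more than one score, matching the constraint stated in step 6 of the method.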
To verify the actual technical effect of the present invention, the inventors carried out a test with a grade-6 primary school reading comprehension question as an example. The question is as follows:
Read the passage and answer the question.
Long, long ago, a rare and beautiful bird came to rest in the woods on the outskirts of the State of Lu. The king of Lu had it brought into the ancestral temple to be kept there: it was given the mellowest fine wine to drink, fed delicious beef and mutton, and entertained with the beautiful music of the "Nine Shao". But the bird grew dizzy, anxious, and sad; it dared not eat a piece of meat or drink a cup of wine, and within three days it was dead.
This is keeping a bird according to one's own way of life, not keeping a bird according to the bird's own nature! To keep a bird according to its nature, one should let it dwell in remote, thickly forested mountains, play on land and on the shoals, soar over rivers, lakes, and seas, rest in the company of its flock, and live freely and at ease in the woods. Birds hate even the sound of human voices; why, then, bring one into the noise and bustle of a magnificent palace and graciously play it the "Nine Shao"? How laughable. If the "Nine Shao" were performed in the open wilds, the birds would fly far away on hearing it, the beasts would flee in fright, and the fish would dive deep to hide, yet people would crowd around to watch. Fish can only survive in water, but a person who goes into the water will drown. Humans and fish have different habits; their natures are inherently different. Therefore, the sages of former generations did not demand that humans and birds have the same abilities, nor did they make them do the same things.
It can be seen that things must be done in accordance with natural law for happiness to last. What you think best, others may dislike, or even find unbearable. Measuring others by one's own standards is often wrong; acting on one's own assumptions often leads to failure.
Adapted from the "Perfect Happiness" chapter of the Zhuangzi (《庄子·至乐》)
Based on the author's viewpoint, explain in your own words why the king of Lu failed in keeping the bird:
--------------------------------------------------------
The question is worth 2 points. A total of 533 valid student answers were collected; each answer was scored independently by two teachers, with a scoring agreement of kappa = 0.755. The student answer texts contain 24 Chinese words on average, and the longest text contains 78 words; excerpts of the student answers are shown in Table 2. The inventors trained the word representation vectors on Wikipedia, Sogou news, and a collection of primary school student compositions gathered by the inventors, yielding more than 470,000 word vectors of 200 dimensions each, as shown in Fig. 4. In this embodiment, the word representation vectors were trained with the Python open-source tool gensim, which is documented in detail at https://radimrehurek.com/gensim/models/word2vec.html.
Table 2. Excerpts of answer texts
The hardware environment used in this experiment: Ubuntu 64-bit operating system, Intel i7 processor, CPU frequency 3.41 GHz, 16 GB memory. Chinese word segmentation uses the open-source tool "jieba" ("stammerer") Chinese word segmentation, which is described in detail at https://pypi.python.org/pypi/jieba/ and in Zheng Jie's book "NLP Chinese Natural Language Processing: Principles and Practice". The LSTM neural network uses 128 LSTM units, the number of training epochs is epoch = 30, and the max pooling algorithm is used. After the LSTM neural network produces the semantic feature vectorization of the student answer texts, a multiclass logistic regression classifier performs the classification, and the classification labels are then converted into scores according to the established rule. Suppose a student answer text S is predicted to belong to class Y_S; then G(S) is the predicted score of this answer. In this embodiment, the classification label of a student answer text is the score of that answer, i.e.:
G(S) = Y_S
80% of the samples in the data set were used as the training set and the remaining samples formed the test set: 426 samples were randomly selected for model training and the remaining 107 samples were used for model testing. The scoring accuracy after cross-validation is 80.1%, and the kappa value is 0.686, close to the marking agreement between human markers. It can be seen that with only a small amount of student answer data, an automatic scoring model with good performance can be built without depending on standard answer information.
In conclusion the present invention proposes a kind of Chinese short text subjective item automatic scoring side using LSTM neutral nets Method, can fully excavate the semantic information in answer text, realizes to Chinese short essay in the case of independent of code of points The automatic scoring of this subjective item.The experiment test of truthful data collection shows, according to the LSTM neutral nets of the method for the present invention training , can be with the realization Chinese short text subjective item automatic scoring of better quality with good text classification performance.
Finally, it should be noted that the above embodiments are only intended to describe the technical solution of the present invention and not to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (9)

  1. A Chinese short-text subjective-question automatic scoring method using an LSTM neural network, characterized in that it comprises the following steps:
    Step 1: perform word segmentation on the answer text of a subjective question, converting the answer text into a word sequence;
    Step 2: obtain the vectorized representation of each word in the answer text, and build an answer text mapping matrix;
    Step 3: process the answer text mapping matrix with the LSTM neural network and take the outputs of all or part of the hidden layers to obtain the semantic feature matrix of the answer text;
    Step 4: down-sample the semantic feature matrix with a pooling algorithm to obtain the semantic feature vector of the answer text;
    Step 5: feed the semantic feature vector of the answer text obtained in Step 4 into a multiclass logistic regression classifier to predict the class of the answer text;
    Step 6: determine the score of the answer text according to a preset mapping between answer text classes and scores.
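A minimal, self-contained sketch of Steps 1 through 4 follows. The two-word embedding table, the 3-dimensional vectors, and the averaging "LSTM" stand-in are hypothetical placeholders for the real segmented text, the 200-dimensional pre-trained vectors, and the trained LSTM network; only the data flow (word sequence, mapping matrix, hidden outputs, pooling) mirrors the claim:

```python
# Steps 1-2: word sequence -> mapping matrix M (one column per word).
embeddings = {"好": [0.1, 0.5, -0.2], "学习": [0.4, -0.3, 0.6]}  # hypothetical
words = ["好", "学习", "未知词"]  # "未知词" is out-of-vocabulary
columns = [embeddings[w] for w in words if w in embeddings]  # OOV discarded as a stop word
M = [[col[d] for col in columns] for d in range(3)]  # rows = dims, cols = words

# Step 3: feed the columns of M in column-index order; this stand-in
# just blends each input column into a running "hidden-layer output".
def fake_lstm_step(x, h):
    return [0.5 * a + 0.5 * b for a, b in zip(x, h)]

h = [0.0, 0.0, 0.0]
H = []  # semantic feature matrix: one hidden output per time step
for t in range(len(columns)):
    h = fake_lstm_step([M[d][t] for d in range(3)], h)
    H.append(h)

# Step 4: max pooling over time yields the semantic feature vector.
feature_vector = [max(row[d] for row in H) for d in range(3)]
```
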
  2. The method according to claim 1, characterized in that in Step 2, each word in the answer text is searched for in a preset dictionary to obtain the vectorized representation of that word, and the answer text mapping matrix is then built according to the order in which the words appear in the answer text; individual words in the answer text that do not appear in the dictionary are treated as stop words and discarded.
  3. The method according to claim 1, characterized in that in Step 3, the answer text mapping matrix M is processed by the LSTM neural network to extract the semantic features of the answer text and generate the answer text semantic feature matrix H, where H is composed of the output vectors of all or part of the hidden layers of the LSTM neural network.
  4. The method according to claim 1, characterized in that in Step 3, the answer text mapping matrix M is input to the LSTM neural network as follows: at each time step one column of M is input to the LSTM neural network, the column vectors of M being fed in ascending order of their column indices, which effectively preserves the word-order information of the answer text.
  5. The method according to claim 1, characterized in that the LSTM neural network model parameters and classifier model parameters in Step 3 and Step 5 are obtained during the training of the scoring model: the cross entropy between the target probability distribution and the actual probability distribution is minimized as the objective function, batch sample errors are computed with gradient descent, and the LSTM neural network model parameters and classifier model parameters are updated by back propagation.
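The cross-entropy objective named in claim 5 can be written out concretely. The predicted distribution below is a hypothetical classifier output and the target is a one-hot score label; the patent's actual batch sizes and learning rates are not specified here:

```python
import math

def cross_entropy(target, predicted):
    """Cross entropy H(p, q) = -sum_k p_k * log(q_k) between a target
    distribution p and a predicted distribution q."""
    eps = 1e-12  # guard against log(0)
    return -sum(p * math.log(q + eps) for p, q in zip(target, predicted))

target = [0.0, 0.0, 1.0]     # one-hot: the answer's true score class is 2
predicted = [0.1, 0.2, 0.7]  # hypothetical classifier output
loss = cross_entropy(target, predicted)
# Training minimizes this loss over batches with gradient descent and
# updates LSTM and classifier parameters by back propagation.
```
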
  6. The method according to claim 1, characterized in that the semantic feature vector generated in Step 4 is the vectorized semantic feature representation of the input answer text, and this representation contains the word-order information of the answer text as well as the associations between words and text semantics.
  7. The method according to claim 1, characterized in that the pooling algorithm used in Step 4 is max pooling, min pooling, or average pooling.
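The three pooling choices in claim 7 differ only in the reduction applied across time steps. A minimal sketch over a hypothetical 2-step, 3-dimensional feature matrix (not real hidden-layer outputs):

```python
def pool(H, reduce_fn):
    """Reduce a semantic feature matrix H (one row per time step)
    to a single vector, dimension by dimension."""
    dims = len(H[0])
    return [reduce_fn([row[d] for row in H]) for d in range(dims)]

H = [[0.2, -0.5, 0.9],
     [0.8,  0.1, 0.3]]  # hypothetical hidden-layer outputs

max_pooled = pool(H, max)                        # max pooling (used in the embodiment)
min_pooled = pool(H, min)                        # min pooling
avg_pooled = pool(H, lambda v: sum(v) / len(v))  # average pooling
```
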
  8. The method according to claim 1, characterized in that in Step 6 the relation between answer text classes and scores is many-to-one: answer texts of different classes are allowed to receive the same score, but answer texts of the same class are not allowed to receive different scores.
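The many-to-one class-to-score relation in claim 8 is simply a function from class labels to scores. The five classes and their scores below are hypothetical:

```python
# Hypothetical mapping: five answer classes, three possible scores.
# Different classes may share a score (many-to-one), but each class
# maps to exactly one score, so equal classes can never score differently.
class_to_score = {0: 0, 1: 0, 2: 1, 3: 2, 4: 2}

def score_of(label):
    """Map a predicted class label to its preset score."""
    return class_to_score[label]
```
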
  9. A Chinese short-text subjective-question automatic scoring system using an LSTM neural network, characterized in that the scoring system comprises an input module, a data processing module, a semantic feature extraction module, a scoring module, and a dictionary module, wherein:
    the input module is used to transmit the answer text to the data processing module;
    the data processing module is used to segment the input answer text, build the corresponding answer text mapping matrix, and transmit the answer text mapping matrix to the semantic feature extraction module;
    the semantic feature extraction module is used to obtain the semantic feature vector of the answer text; it comprises an LSTM neural network layer and a pooling layer: the answer text mapping matrix is input to the LSTM neural network to obtain the outputs of some or all hidden layers in the network, yielding the answer text semantic feature matrix; a pooling operation is then applied to the semantic feature matrix to obtain the answer text semantic feature vector, which is sent to the scoring module;
    the scoring module is used to determine the score of the answer text: the answer text semantic feature vector is fed into the multiclass logistic regression classifier, the class of the answer text is predicted, the predicted class is then mapped to the score of the answer text according to a preset mapping relation, and the scoring result is output;
    the dictionary module stores pre-trained words and their corresponding vectorized representations in the form of a data table, providing data for the data processing module to call.
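The module decomposition of claim 9 can be sketched as cooperating objects. The two-word dictionary, the max-pooling feature extractor, and the threshold "classifier" below are hypothetical stand-ins for the pre-trained vector table, the LSTM-plus-pooling pipeline, and the trained multiclass logistic regression classifier:

```python
class DictionaryModule:
    """Stores pre-trained words and their vector representations."""
    def __init__(self, table):
        self.table = table
    def lookup(self, word):
        return self.table.get(word)

class DataProcessingModule:
    """Segments the answer text and builds the mapping matrix."""
    def __init__(self, dictionary):
        self.dictionary = dictionary
    def build_matrix(self, words):
        vectors = [self.dictionary.lookup(w) for w in words]
        return [v for v in vectors if v is not None]  # drop OOV words

class SemanticFeatureModule:
    """Stand-in for the LSTM + pooling layers: max-pools the matrix."""
    def extract(self, matrix):
        dims = len(matrix[0])
        return [max(row[d] for row in matrix) for d in range(dims)]

class ScoringModule:
    """Maps the feature vector to a class, then the class to a score."""
    def __init__(self, class_to_score):
        self.class_to_score = class_to_score
    def score(self, feature_vector):
        label = 0 if sum(feature_vector) < 1.0 else 1  # hypothetical classifier
        return self.class_to_score[label]

dictionary = DictionaryModule({"好": [0.4, 0.2], "学习": [0.7, 0.1]})
processor = DataProcessingModule(dictionary)
features = SemanticFeatureModule()
scorer = ScoringModule({0: 0, 1: 2})

matrix = processor.build_matrix(["好", "学习"])
result = scorer.score(features.extract(matrix))
```
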
CN201711177862.1A 2017-11-23 2017-11-23 A Chinese short-text subjective-question automatic scoring method and system using LSTM neural networks Pending CN107967318A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711177862.1A CN107967318A (en) 2017-11-23 2017-11-23 A Chinese short-text subjective-question automatic scoring method and system using LSTM neural networks

Publications (1)

Publication Number Publication Date
CN107967318A true CN107967318A (en) 2018-04-27

Family

ID=62000419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711177862.1A Pending CN107967318A (en) A Chinese short-text subjective-question automatic scoring method and system using LSTM neural networks

Country Status (1)

Country Link
CN (1) CN107967318A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834747A (en) * 2015-05-25 2015-08-12 中国科学院自动化研究所 Short text classification method based on convolutional neural networks
CN106980608A (en) * 2017-03-16 2017-07-25 四川大学 A kind of Chinese electronic health record participle and name entity recognition method and system
CN107133211A (en) * 2017-04-26 2017-09-05 中国人民大学 A kind of composition methods of marking based on notice mechanism
CN107301246A (en) * 2017-07-14 2017-10-27 河北工业大学 Chinese Text Categorization based on ultra-deep convolutional neural networks structural model

Cited By (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763411A (en) * 2018-05-23 2018-11-06 北京师范大学 A subjective-question marking and commenting system and method combining short-text clustering and a recommendation mechanism
CN108897723B (en) * 2018-06-29 2022-08-02 北京百度网讯科技有限公司 Scene conversation text recognition method and device and terminal
CN108897723A (en) * 2018-06-29 2018-11-27 北京百度网讯科技有限公司 The recognition methods of scene dialog text, device and terminal
CN108960319A (en) * 2018-06-29 2018-12-07 哈尔滨工业大学 It is a kind of to read the candidate answers screening technique understood in modeling towards global machine
CN108875074B (en) * 2018-07-09 2021-08-10 北京慧闻科技发展有限公司 Answer selection method and device based on cross attention neural network and electronic equipment
CN108875074A (en) * 2018-07-09 2018-11-23 北京慧闻科技发展有限公司 Based on answer selection method, device and the electronic equipment for intersecting attention neural network
CN110858218A (en) * 2018-08-13 2020-03-03 宋曜廷 Automatic scoring method and system for divergent thinking test
CN110858218B (en) * 2018-08-13 2023-06-30 宋曜廷 Automatic scoring method and system for divergent thinking test
CN109146296A (en) * 2018-08-28 2019-01-04 南京葡萄诚信息科技有限公司 A kind of artificial intelligence assessment talent's method
CN109242090A (en) * 2018-08-28 2019-01-18 电子科技大学 A kind of video presentation and description consistency discrimination method based on GAN network
CN110991161B (en) * 2018-09-30 2023-04-18 北京国双科技有限公司 Similar text determination method, neural network model obtaining method and related device
CN110991161A (en) * 2018-09-30 2020-04-10 北京国双科技有限公司 Similar text determination method, neural network model obtaining method and related device
CN109388806A (en) * 2018-10-26 2019-02-26 北京布本智能科技有限公司 A kind of Chinese word cutting method based on deep learning and forgetting algorithm
CN109388806B (en) * 2018-10-26 2023-06-27 北京布本智能科技有限公司 Chinese word segmentation method based on deep learning and forgetting algorithm
CN109670168B (en) * 2018-11-14 2023-04-18 华南师范大学 Short answer automatic scoring method, system and storage medium based on feature learning
CN109670168A (en) * 2018-11-14 2019-04-23 华南师范大学 Short answer automatic scoring method, system and storage medium based on feature learning
CN109815491B (en) * 2019-01-08 2023-08-08 平安科技(深圳)有限公司 Answer scoring method, device, computer equipment and storage medium
CN109815491A (en) * 2019-01-08 2019-05-28 平安科技(深圳)有限公司 Answer methods of marking, device, computer equipment and storage medium
CN110134948A (en) * 2019-04-23 2019-08-16 北京淇瑀信息科技有限公司 A kind of Financial Risk Control method, apparatus and electronic equipment based on text data
CN110309503A (en) * 2019-05-21 2019-10-08 昆明理工大学 A kind of subjective item Rating Model and methods of marking based on deep learning BERT--CNN
CN110245860B (en) * 2019-06-13 2022-08-23 桂林电子科技大学 Automatic scoring method based on virtual experiment platform
CN110245860A (en) * 2019-06-13 2019-09-17 桂林电子科技大学 An automatic scoring method based on a virtual experiment platform
CN110162797A (en) * 2019-06-21 2019-08-23 北京百度网讯科技有限公司 Article quality determining method and device
CN110162797B (en) * 2019-06-21 2023-04-07 北京百度网讯科技有限公司 Article quality detection method and device
CN110457674B (en) * 2019-06-25 2021-05-14 西安电子科技大学 Text prediction method for theme guidance
CN110457674A (en) * 2019-06-25 2019-11-15 西安电子科技大学 A kind of text prediction method of theme guidance
WO2021000675A1 (en) * 2019-07-04 2021-01-07 平安科技(深圳)有限公司 Method and apparatus for machine reading comprehension of chinese text, and computer device
CN110309267A (en) * 2019-07-08 2019-10-08 哈尔滨工业大学 Semantic retrieving method and system based on pre-training model
CN110413741B (en) * 2019-08-07 2022-04-05 山东山大鸥玛软件股份有限公司 Subjective question-oriented intelligent paper marking method
CN110413741A (en) * 2019-08-07 2019-11-05 山东山大鸥玛软件股份有限公司 A subjective-question-oriented intelligent marking method
CN111221939A (en) * 2019-11-22 2020-06-02 华中师范大学 Grading method and device and electronic equipment
CN111221939B (en) * 2019-11-22 2023-09-08 华中师范大学 Scoring method and device and electronic equipment
CN111104881B (en) * 2019-12-09 2023-12-01 科大讯飞股份有限公司 Image processing method and related device
CN111104881A (en) * 2019-12-09 2020-05-05 科大讯飞股份有限公司 Image processing method and related device
CN111079641A (en) * 2019-12-13 2020-04-28 科大讯飞股份有限公司 Answering content identification method, related device and readable storage medium
CN112988921A (en) * 2019-12-13 2021-06-18 北京四维图新科技股份有限公司 Method and device for identifying map information change
CN111079641B (en) * 2019-12-13 2024-04-16 科大讯飞股份有限公司 Answer content identification method, related device and readable storage medium
CN111191578A (en) * 2019-12-27 2020-05-22 北京新唐思创教育科技有限公司 Automatic scoring method, device, equipment and storage medium
CN111241392A (en) * 2020-01-07 2020-06-05 腾讯科技(深圳)有限公司 Method, device, equipment and readable storage medium for determining popularity of article
CN111241392B (en) * 2020-01-07 2024-01-26 腾讯科技(深圳)有限公司 Method, apparatus, device and readable storage medium for determining popularity of article
CN111724813A (en) * 2020-06-17 2020-09-29 东莞理工学院 LSTM-based piano playing automatic scoring method
CN112085985A (en) * 2020-08-20 2020-12-15 安徽七天教育科技有限公司 Automatic student answer scoring method for English examination translation questions
CN112287083A (en) * 2020-10-29 2021-01-29 北京乐学帮网络技术有限公司 Evaluation method and device, computer equipment and storage device
CN112634689A (en) * 2020-12-24 2021-04-09 广州奇大教育科技有限公司 Application method of regular expression in automatic subjective question changing in computer teaching
CN113392642B (en) * 2021-06-04 2023-06-02 北京师范大学 Automatic labeling system and method for child care cases based on meta learning
CN113392642A (en) * 2021-06-04 2021-09-14 北京师范大学 System and method for automatically labeling child-bearing case based on meta-learning
CN113111154A (en) * 2021-06-11 2021-07-13 北京世纪好未来教育科技有限公司 Similarity evaluation method, answer search method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN107967318A (en) A Chinese short-text subjective-question automatic scoring method and system using LSTM neural networks
CN107967257B (en) Cascading composition generating method
CN108363743B (en) Intelligent problem generation method and device and computer readable storage medium
US9779085B2 (en) Multilingual embeddings for natural language processing
CN103823794B (en) An automatic question-generation method for short-answer questions in English reading comprehension tests
CN104794169B (en) A kind of subject terminology extraction method and system based on sequence labelling model
CN109977199B (en) Reading understanding method based on attention pooling mechanism
CN106569998A (en) Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN108829678A (en) Name entity recognition method in a kind of Chinese international education field
CN110222163A (en) A kind of intelligent answer method and system merging CNN and two-way LSTM
CN112149421A (en) Software programming field entity identification method based on BERT embedding
CN106682089A (en) RNNs-based method for automatic safety checking of short message
CN111368082A (en) Emotion analysis method for domain adaptive word embedding based on hierarchical network
CN110287298A (en) A kind of automatic question answering answer selection method based on question sentence theme
CN113486645A (en) Text similarity detection method based on deep learning
Cai Automatic essay scoring with recurrent neural network
Lilja Automatic essay scoring of Swedish essays using neural networks
CN113011196A (en) Concept-enhanced representation and one-way attention-containing subjective question automatic scoring neural network model
CN110705306B (en) Evaluation method for consistency of written and written texts
CN114579706B (en) Automatic subjective question review method based on BERT neural network and multi-task learning
CN116049349A (en) Small sample intention recognition method based on multi-level attention and hierarchical category characteristics
CN114462389A (en) Automatic test paper subjective question scoring method
Luo Automatic short answer grading using deep learning
CN112036170A (en) Neural zero sample fine-grained entity classification method based on type attention
CN110083825A (en) A kind of Laotian sentiment analysis method based on GRU model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180427