CN107967318A - A Chinese short-text subjective question automatic scoring method and system using LSTM neural networks - Google Patents

A Chinese short-text subjective question automatic scoring method and system using LSTM neural networks

Info

Publication number
CN107967318A
Authority
CN
China
Prior art keywords
text
answer
answer text
semantic feature
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711177862.1A
Other languages
Chinese (zh)
Inventor
余胜泉
杨熙
黄俞卫
庄福振
张立山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Normal University
Original Assignee
Beijing Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Normal University filed Critical Beijing Normal University
Priority to CN201711177862.1A priority Critical patent/CN107967318A/en
Publication of CN107967318A publication Critical patent/CN107967318A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374Thesaurus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification

Abstract

The present invention is a Chinese short-text subjective question automatic scoring method using an LSTM neural network, including: (1) segmenting the answer text and converting it into a word sequence; (2) obtaining the vectorized representation of each word in the answer text and building the answer text mapping matrix; (3) processing the answer text mapping matrix with the LSTM neural network and taking the outputs of all or some of the hidden layers to obtain the semantic feature matrix of the answer text; (4) down-sampling the semantic feature matrix with a pooling algorithm to obtain the semantic feature vector of the answer text; (5) feeding the semantic feature vector of the answer text to a classifier to predict the category of the answer text; (6) determining the score of the answer text according to a preset mapping between answer text categories and scores, taking into account the many-to-one relationship between the two. The present invention does not depend on a standard answer for the subjective question, effectively mines the semantic information of the answer text, and realizes automatic scoring of Chinese short-text subjective questions.

Description

A Chinese short-text subjective question automatic scoring method and system using LSTM neural networks
Technical field
The present invention relates to the technical field of automatic scoring. Specifically, it is a Chinese short-text subjective question automatic scoring method and system using a long short-term memory (LSTM, Long Short-Term Memory) neural network. It can be applied to the automatic scoring of questions answered in Chinese natural language, such as translation, short-answer, true/false, and picture-to-text questions, and ultimately to homework and examination marking and to the evaluation of student learning.
Background technology
Subjective questions occupy a very important position in subject learning and teaching. Their greatest advantage is that they can measure complex performance objectives and better examine students' creative thinking and expressive abilities. Subjective questions have thus become one of the most widely used question types in course teaching and testing. However, the heavy, mechanical work of marking subjective questions takes up a great deal of teachers' time and energy, while students hope for real-time feedback on their performance. Both teachers and students therefore urgently hope that objective, effective, time-saving and labor-saving automatic marking of subjective questions can be realized by computer. Automatic scoring of subjective questions has great practical significance. First, it can greatly improve the efficiency of teachers' marking and effectively lighten their teaching workload. Second, it can reduce the influence of factors such as the marker's subjective preferences, physical condition, and psychological state on scoring accuracy. Third, it can provide real-time feedback for online learners, saving waiting time and improving learning efficiency. Finally, it can be applied to adaptive learning and adaptive assessment tasks, and is a key technology for realizing intelligent tutoring systems.
In daily teaching and examinations, short-text subjective questions mainly include translation, short-answer, and true/false types. Their characteristics are: (1) they are answered in natural language; (2) the answers are short, usually no more than one paragraph; (3) students cannot obtain the answer from the question stem and must understand, apply, and transfer domain knowledge; (4) scoring focuses on the content of the answer text rather than on non-content features such as writing style or rhetorical devices; (5) the questions are open and varied, and may be closed, semi-closed, or open. For a computer to score short-text subjective questions automatically, it must "understand" the semantic information of the text at a deeper level. In addition, because answer texts are short, the statistical information a computer can extract from them (such as word co-occurrence and contextual information) is limited, and traditional statistics-based natural language processing methods and models face problems such as data sparseness and semantic sensitivity. Accurate automatic scoring of subjective questions therefore remains a great challenge and an urgent technical problem.
As a key technology of intelligent education, automatic scoring of subjective questions occupies a very important position in the field of educational technology. A survey of domestic and foreign research shows that the general scoring framework mainly consists of the following four modules, as shown in Fig. 1:
Module (1): Establishing the database. The database contains relevant data such as the questions, standard answers, scoring criteria, and student answers.
Module (2): Preprocessing. The answer text is segmented, deduplicated, stripped of stop words, part-of-speech tagged, and so on.
Module (3): Establishing the scoring model. This module contains two submodules, which influence and constrain each other:
A. Feature extraction: using natural language processing techniques based on rules, statistics, or neural networks, text features are extracted and the answer text is vectorized.
B. Modeling: a scoring model is established using methods such as concept mapping, information extraction, corpus-based methods, and machine learning.
Module (4): Scoring. A new student answer text is first processed by module (2) and then fed into the model established by module (3), which predicts a label for it; the final score of the answer is then given according to the predicted label.
In the above automatic scoring framework, the core module is the model-building module (module (3)). The mainstream methods can be divided into the following four classes:
(1) Concept matching: the standard answer is treated as several key concepts or combinations of key concepts, and scoring depends on whether these key concepts appear in the student's answer. This method is suitable for question types with clear and relatively short answers. Typical systems include ATM (Automatic Text Marker) and C-rater.
(2) Information extraction: the answer text is assumed to contain certain specific points, which can usually be located and modeled with templates; the degree to which the student's answer matches the templates of the standard answer is the basis for marking. First, structured information represented as tuples is extracted from the unstructured data; then pattern matching is performed with algorithms such as regular expressions or parse trees. Typical systems include AutoMark, WebLSA (Web-based Language Assessment System), and Auto-marking.
(3) Corpus-based methods: statistical features are extracted from a large-scale text corpus and used to compute the text similarity between the student's answer and the standard answer; the student's answer is scored according to the degree of similarity. A common method is latent semantic analysis (LSA, Latent Semantic Analysis). The scoring performance of corpus-based methods is proportional to the scale of the corpus. Typical systems include Atenea and SAMText (Short Answer Measurement of Text).
(4) Machine learning: the short-text scoring problem is converted into a text classification or clustering problem. First, features of the student's answer are extracted with natural language processing techniques and the text is vectorized; the extracted features mainly include text features of the answer and similarity features between the answer and the standard answer. Then, with the score of the student's answer as the class label, a classification model is trained on the extracted features with a machine learning algorithm to obtain the scoring model. Common classification algorithms include k-nearest neighbors, logistic regression, naive Bayes, and support vector machines. Typical systems include e-Examiner and CAM (Content Assessment Module).
For automatic scoring of Chinese subjective questions, the main problems of the current mainstream techniques are as follows:
(1) The above methods are mainly used for automatic scoring of English subjective questions. Because of the great differences between Chinese and English natural language processing techniques, they are difficult to transplant to the automatic scoring of Chinese subjective questions.
(2) The above methods target closed questions, i.e. questions that have a standard answer. In actual teaching and examinations, however, many questions have no standard answer. In Chinese language examinations, for example, the scoring criterion for some questions is "any reasonable answer scores" or "any answer with the right meaning scores". For such questions, which have no standard answer and whose scoring criteria are relatively fuzzy, the above algorithms are not applicable.
(3) The above methods rely heavily on traditional language models; their methods for extracting text feature representations are complicated and cannot solve the data sparseness and semantic sensitivity problems caused by the short length of the texts.
In recent years, deep learning (Deep Learning) algorithms have achieved remarkable results in the field of natural language processing (NLP, Natural Language Processing). Compared with traditional language models, models based on deep learning can better mine the semantic information of words, phrases, sentences, and discourse. In particular, recurrent neural networks (RNN, Recurrent Neural Network) are widely used in natural language processing tasks because they are suited to modeling sequential information, and have achieved good results. RNNs with LSTM units solve the long-range dependency and vanishing gradient problems of traditional RNNs, and have therefore attracted the attention of many scholars.
The content of the invention
The task of the present invention is to overcome the deficiencies of the prior art. Considering the characteristics of the Chinese short-text subjective question automatic scoring problem, the challenges it faces, and the advantages of LSTM neural networks in language modeling, the present invention proposes a Chinese short-text subjective question automatic scoring method and system using LSTM neural networks, which can score Chinese short-text subjective questions automatically without depending on a standard answer.
The present invention converts the automatic scoring of Chinese subjective questions into a text classification problem. Pretrained word vectors are used to represent the student's answer text; a long short-term memory (LSTM, Long Short-Term Memory) neural network then extracts the semantic feature vector of the text, which is used to train a classifier that predicts the category of the answer text; finally, the score of the answer is determined according to a predetermined mapping between categories and scores. The present invention introduces LSTM neural networks into Chinese short-text subjective question automatic scoring for the first time, a new application of LSTM neural networks in this field. The method solves the automatic scoring problem for Chinese subjective questions and reduces the scoring algorithm's dependence on a standard answer. Moreover, compared with traditional subjective question scoring methods, the present invention considers the sequential relationship of words in context when vectorizing the answer text, semantically extends the answer text, and produces a semantic feature vector of controllable dimension, which effectively solves the data sparseness and semantic sensitivity problems caused by short texts.
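The pipeline described above can be sketched end to end as follows. This is a minimal illustration only: the toy vocabulary, the dimensions, the category-to-score mapping, and the randomly initialized (untrained) weights all stand in for the pretrained word vectors and the trained LSTM and classifier parameters of the actual method.

```python
import numpy as np

rng = np.random.default_rng(0)
d, q, n_classes = 4, 5, 3          # word-vector dim, hidden dim, answer categories

# Steps 1-2: a segmented toy answer and a toy pretrained-word-vector dictionary
words = ["鸟", "按", "自己", "喜好", "养"]
dictionary = {w: rng.normal(size=d) for w in words}
M = np.column_stack([dictionary[w] for w in words])   # mapping matrix, d x N

# Step 3: run an LSTM over the columns of M (random, untrained weights)
W = rng.normal(size=(4, q, d))     # input weights for gates i, o, f and candidate u
U = rng.normal(size=(4, q, q))     # recurrent weights
b = np.zeros((4, q))               # biases
sig = lambda x: 1 / (1 + np.exp(-x))

h = np.zeros(q); c = np.zeros(q); H = []
for v in M.T:                      # one column of M per time step, in word order
    i, o, f = (sig(W[k] @ v + U[k] @ h + b[k]) for k in range(3))
    u = np.tanh(W[3] @ v + U[3] @ h + b[3])
    c = i * u + f * c
    h = o * np.tanh(c)
    H.append(h)
H = np.column_stack(H)             # semantic feature matrix, q x N

# Step 4: max pooling over time gives the semantic feature vector
l = H.max(axis=1)

# Step 5: multiclass logistic regression (softmax) predicts the category
Wl = rng.normal(size=(n_classes, q)); bl = np.zeros(n_classes)
z = Wl @ l + bl
p = np.exp(z - z.max()); p /= p.sum()
label = int(p.argmax())

# Step 6: preset many-to-one mapping from category to score
score_of = {0: 0, 1: 1, 2: 2}
print(label, score_of[label])
```

Each stage mirrors one step of the method; in practice only the weights change (learned during training), not the data flow.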
The Chinese short-text subjective question automatic scoring method using LSTM neural networks provided by the present invention includes the following steps:
Step 1: perform word segmentation on the answer text of the subjective question, converting the answer text into a word sequence;
Step 2: obtain the vectorized representation of each word in the answer text and build the answer text mapping matrix;
Step 3: process the answer text mapping matrix with the LSTM neural network and take the outputs of all or some of the hidden layers to obtain the semantic feature matrix of the answer text;
Step 4: down-sample the semantic feature matrix with a pooling algorithm (Pooling) to obtain the semantic feature vector of the answer text;
Step 5: feed the semantic feature vector of the answer text obtained in step 4 to a multiclass logistic regression classifier to predict the category of the answer text;
Step 6: determine the score of the answer text according to the preset mapping between answer text categories and scores.
The following technical features further provide some preferred technical solutions for the above automatic scoring method.
In step 2, each word of the answer text is looked up in a preset dictionary to obtain its vectorized representation, and the answer text mapping matrix is built according to the order in which the words occur in the answer text. Individual words of the answer text that do not appear in the dictionary are treated as stop words and discarded.
In step 3, the answer text mapping matrix M is processed by the LSTM neural network to extract the semantic features of the answer text and generate its semantic feature matrix H, which is composed of the output vectors of all or some of the hidden layers of the LSTM neural network.
In step 3, the answer text mapping matrix M is input to the LSTM neural network as follows: at each time step, one column of M is input, and the columns of M are input in ascending order of column index, which effectively preserves the word-order information of the answer text.
The LSTM neural network model parameters in step 3 and the classifier model parameters in step 5 are obtained during the training of the scoring model: the cross entropy between the target probability distribution and the actual probability distribution is minimized as the objective function, the error of each batch of samples is computed with gradient descent, and backpropagation is used to update the LSTM neural network model parameters and the classifier model parameters.
The semantic feature vector generated in step 4 is the vectorized semantic representation of the input answer text; it contains the word-order information of the answer text and the association information between the words and the text semantics.
Different pooling methods may be selected for the pooling algorithm used in step 4, such as max pooling (max-pooling), min pooling (min-pooling), or average pooling.
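The three pooling options can be illustrated on a toy semantic feature matrix. Each reduces the q × N matrix to a q-dimensional vector regardless of the answer length N, which is what makes the feature dimension controllable; the matrix values below are invented for the example.

```python
import numpy as np

H = np.array([[1.0, 4.0, 2.0],
              [3.0, 0.5, 5.0]])      # toy semantic feature matrix, q = 2, N = 3

max_pool  = H.max(axis=1)            # max-pooling:     [4.0, 5.0]
min_pool  = H.min(axis=1)            # min-pooling:     [1.0, 0.5]
mean_pool = H.mean(axis=1)           # average pooling: [7/3, 17/6]

print(max_pool, min_pool, mean_pool)
```

Whichever method is chosen, the result is one value per hidden dimension, pooled over all time steps.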
In step 6, the relationship between answer text categories and scores is many-to-one: answer texts of different categories may obtain the same score, but answer texts of the same category may not obtain different scores.
The present invention also provides a Chinese short-text subjective question automatic scoring system using LSTM neural networks. The automatic scoring system includes an input module, a data processing module, a semantic feature extraction module, a scoring module, and a lexicon module, in which:
the input module passes the answer text to the data processing module;
the data processing module segments the input answer text, builds the corresponding answer text mapping matrix, and passes the answer text mapping matrix to the semantic feature extraction module;
the semantic feature extraction module obtains the semantic feature vector of the answer text; it comprises an LSTM neural network layer and a pooling layer: the answer text mapping matrix is input to the LSTM neural network, the outputs of some or all of the hidden layers are taken to obtain the semantic feature matrix of the answer text, the semantic feature matrix is then pooled to obtain the semantic feature vector of the answer text, and the vector is sent to the scoring module;
the scoring module determines the score of the answer text: the semantic feature vector of the answer text is fed to a multiclass logistic regression classifier, the category of the answer text is predicted, the predicted category is mapped to the score of the answer text according to the preset mapping, and the scoring result is output;
the lexicon module stores the pretrained words and their vectorized representations (word vectors) in the form of a data table, to be called by the data processing module.
The advantages of the Chinese subjective question automatic scoring method using LSTM neural networks proposed by the present invention are:
(1) The present invention does not depend on a standard answer; it learns semantic features automatically from existing student answer texts and converts the subjective question automatic scoring problem into a short-text classification problem.
(2) The present invention introduces LSTM neural networks into Chinese short-text subjective question automatic scoring for the first time, a new application of LSTM in this field.
(3) The present invention initializes the dictionary with a set of pretrained word vectors, introducing a large amount of external auxiliary information and extending the semantic information of the answer text, which effectively solves the problem of insufficient contextual information caused by the short answer texts.
(4) The present invention models the word sequence of the answer text with an LSTM neural network; the temporal order in which words occur in the text is preserved in the modeling process, effectively mining the word-order features of the context.
(5) The present invention does not depend on a complicated language model; the semantic features of the answer text are extracted by the LSTM neural network, which effectively mines the semantic information and the associations between words in the answer text, alleviates the semantic sensitivity problem of short texts, and improves the performance of subjective question automatic scoring.
(6) The present invention uses a predetermined mapping from answer text categories to final scores to capture the many-to-one relationship between the two, fully accounting for the case in which student answers of several different categories receive the same score.
Brief description of the drawings
Fig. 1 is the general framework of existing subjective question automatic scoring algorithms;
Fig. 2 is the flow chart of the Chinese short-text subjective question automatic scoring proposed by the present invention;
Fig. 3 is the training flow chart of the answer text classification model of the present invention;
Fig. 4 is a screenshot of the word vectors of the test data set of the present invention.
Embodiment
In order to make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in more detail below with reference to specific embodiments and the accompanying drawings.
The present invention proposes a Chinese short-text subjective question automatic scoring method using LSTM neural networks, which includes Chinese word segmentation, building the answer text mapping matrix, extracting the semantic feature vector, text classification, and scoring; its flow chart is shown in Fig. 2. The specific steps are as follows:
Step 1: segment the input answer text to obtain the word sequence S = {w_1, w_2, …, w_N}, where w_i is the i-th word in the word sequence S, i = 1, 2, …, N, and N is the number of words in S.
Step 2: initialize the dictionary Dict with a set of pretrained word vectors, introducing auxiliary information relevant to subjective question scoring.
Step 3: build the answer text mapping matrix. By looking up the dictionary, the vector representations of all words of the answer text that appear in the dictionary Dict are obtained, and the text mapping matrix M is built according to the order in which the words occur in the text. The specific formula is:
M = Dict[index(S)] = [v_1, v_2, …, v_N]
where index(w) is the index function of word w in the dictionary Dict, and v_i is the word vector of word w_i in Dict. For individual words of the answer text that do not appear in the dictionary Dict, this embodiment adopts the method of discarding them directly, treating words not included in the dictionary as stop words.
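A minimal sketch of this lookup-and-stack construction follows, with an invented three-word dictionary; the discarding of out-of-dictionary words matches the stop-word treatment of the embodiment.

```python
import numpy as np

# Toy dictionary Dict: word -> word vector (d = 3); real word vectors are pretrained
Dict = {"鲁王": np.array([0.1, 0.2, 0.3]),
        "按":   np.array([0.4, 0.5, 0.6]),
        "养鸟": np.array([0.7, 0.8, 0.9])}

S = ["鲁王", "按", "自己的", "喜好", "养鸟"]   # segmented answer; two words are OOV

# Out-of-dictionary words are treated as stop words and discarded; the column
# order of M preserves the order of occurrence of the retained words.
M = np.column_stack([Dict[w] for w in S if w in Dict])

print(M.shape)   # (3, 3): d rows, one column per retained word
```

Note that M has one column per retained word, so its width varies with the answer length; the LSTM in the next step consumes the columns one per time step.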
Step 4: process the mapping matrix M of the answer text with the LSTM neural network, extracting the semantic features of the answer text and generating the answer text semantic feature matrix H, which is composed of the output vectors of all or some of the hidden layers of the LSTM neural network. The specific calculation formulas are:
i_t = σ(W_i v_t + U_i h_{t-1} + b_i)
o_t = σ(W_o v_t + U_o h_{t-1} + b_o)
f_t = σ(W_f v_t + U_f h_{t-1} + b_f)
u_t = tanh(W_u v_t + U_u h_{t-1} + b_u)
c_t = i_t ⊙ u_t + f_t ⊙ c_{t-1}
h_t = o_t ⊙ tanh(c_t)
t = 1, 2, …, T
The meanings of the symbols in the formulas are listed in Table 1.
Table 1. Definitions of symbols
t: time step
M: answer text mapping matrix
v_t: d-dimensional vector, the input at time t, the t-th column of matrix M
W_k: q × d matrix, k = i, o, f, u, the input weight matrices
U_k: q × q matrix, k = i, o, f, u, the recurrent weight matrices
b_k: q-dimensional vector, k = i, o, f, u, the biases
i_t: the input gate at time t
o_t: the output gate at time t
f_t: the forget gate at time t
u_t: the candidate value at time t
c_t: the internal state of the hidden layer at time t
h_t: q-dimensional vector, the output of the hidden layer at time t
⊙: element-wise multiplication
σ: the sigmoid function 1/(1 + e^(-x))
tanh: the hyperbolic tangent function
T: the number of LSTM time steps
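The recurrence above can be transcribed directly into code. The sketch below uses small random weights and toy dimensions (d = 3, q = 4, N = 6) purely to show the data flow; in the actual method the weights are learned during training.

```python
import numpy as np

rng = np.random.default_rng(42)
d, q = 3, 4                                   # input and hidden dimensions
M = rng.normal(size=(d, 6))                   # answer text mapping matrix, N = 6

# One weight triple (W_k, U_k, b_k) per gate k in {i, o, f, u}
W = {k: rng.normal(size=(q, d)) for k in "iofu"}
U = {k: rng.normal(size=(q, q)) for k in "iofu"}
b = {k: np.zeros(q) for k in "iofu"}

sigma = lambda x: 1 / (1 + np.exp(-x))

h = np.zeros(q)
c = np.zeros(q)
outputs = []
for v in M.T:                                 # v_t = column t of M, in word order
    i = sigma(W["i"] @ v + U["i"] @ h + b["i"])    # input gate i_t
    o = sigma(W["o"] @ v + U["o"] @ h + b["o"])    # output gate o_t
    f = sigma(W["f"] @ v + U["f"] @ h + b["f"])    # forget gate f_t
    u = np.tanh(W["u"] @ v + U["u"] @ h + b["u"])  # candidate value u_t
    c = i * u + f * c                              # internal state c_t
    h = o * np.tanh(c)                             # hidden-layer output h_t
    outputs.append(h)

H = np.column_stack(outputs)                  # semantic feature matrix, q x N
print(H.shape)                                # (4, 6)
```

Collecting every h_t gives the full semantic feature matrix H; taking only some time steps corresponds to using part of the hidden-layer outputs, as the method allows.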
Step 5: down-sample the semantic feature matrix H of the answer text with a pooling algorithm (Pooling) to obtain the semantic feature vector l_T of the answer text. The semantic feature vector is the vectorized semantic representation of the input answer text.
Step 6: feed the semantic feature vector of the answer text obtained in step 5 to the multiclass logistic regression classifier, and predict the category Y_S of the answer text. The specific calculation formula is as follows:
Y_S = argmax_{y_j ∈ Y} p(y_j | M; Θ, W_l, b_l),  p(y_j | M; Θ, W_l, b_l) = exp([W_l l_T + b_l]_j) / Σ_{k=1}^{|Y|} exp([W_l l_T + b_l]_k)
where l_T is the answer text semantic feature vector obtained with the LSTM neural network, y_j is the j-th category, W_l is the |Y| × q classifier weight matrix, b_l is the |Y|-dimensional classifier bias, M is the mapping matrix corresponding to the training sample S, Θ = {W_k, U_k, b_k, k = i, o, f, u} is the parameter set of the LSTM neural network, Y is the predefined category set of the training samples, |Y| is the number of elements in Y, and [A]_k denotes the k-th row of matrix A.
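The prediction step can be sketched as a stabilized softmax followed by an argmax. The feature vector and classifier parameters below are random stand-ins for the trained values, and the dimensions are toy values chosen for the example.

```python
import numpy as np

q, n_classes = 4, 3
rng = np.random.default_rng(1)

l_T = rng.normal(size=q)                      # semantic feature vector from pooling
W_l = rng.normal(size=(n_classes, q))         # |Y| x q classifier weight matrix
b_l = np.zeros(n_classes)                     # |Y|-dimensional classifier bias

z = W_l @ l_T + b_l
p = np.exp(z - z.max())                       # subtract max for numerical stability
p /= p.sum()                                  # softmax: category probabilities

Y_S = int(np.argmax(p))                       # predicted answer category
print(Y_S, p)
```

Subtracting z.max() before exponentiating leaves the probabilities unchanged but avoids overflow for large logits.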
The parameters of the LSTM neural network in steps 4 and 6 and the parameters of the multiclass logistic regression classifier are obtained in the LSTM neural network training stage, as shown in Fig. 3. The specific training steps are as follows:
(1) When training the model, the objective function used by the present invention is the minimization of the cross entropy between the target probability distribution and the actual probability distribution. The objective function is defined as:
J(Θ̃) = −(1/C) Σ_{i=1}^{C} log p(y_i* | M_i; Θ̃) + α‖Θ̃‖²
where y_i* and M_i are the correct category and the mapping matrix of training sample S_i respectively, C is the number of training samples, Θ̃ = {Θ, W_l, b_l} is the full parameter set, and α is the regularization factor.
(2) The batch sample error of the objective function J(Θ̃) is computed with gradient descent, and the parameter set Θ̃ is updated with backpropagation (BP, Back Propagation). The update formula is:
Θ̃ ← Θ̃ − λ ∂J(Θ̃)/∂Θ̃
where λ is the learning rate.
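One such gradient-descent update can be sketched for the classifier parameters alone, where the gradient of the regularized softmax cross entropy is analytic; the gradients for the LSTM parameters, obtained by backpropagation through time, are omitted for brevity. The sample, dimensions, α, and λ below are toy values for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
q, n_classes = 4, 3
alpha, lam = 1e-4, 0.05                       # regularization factor, learning rate

l_T = rng.normal(size=q)                      # feature vector of one training sample
y_true = 2                                    # correct category of the sample
W_l = rng.normal(size=(n_classes, q))
b_l = np.zeros(n_classes)

def loss(W_l, b_l):
    """Regularized cross entropy for the single sample, plus its softmax output."""
    z = W_l @ l_T + b_l
    p = np.exp(z - z.max()); p /= p.sum()
    return -np.log(p[y_true]) + alpha * (W_l ** 2).sum(), p

before, p = loss(W_l, b_l)

# Analytic gradients: d(cross entropy)/dz = p - onehot(y_true)
onehot = np.eye(n_classes)[y_true]
grad_W = np.outer(p - onehot, l_T) + 2 * alpha * W_l
grad_b = p - onehot

W_l = W_l - lam * grad_W                      # gradient-descent update
b_l = b_l - lam * grad_b
after, _ = loss(W_l, b_l)
print(before, after)                          # the loss decreases
```

Because the objective is convex in (W_l, b_l), a sufficiently small step along the negative gradient is guaranteed to decrease it, as the printed values confirm.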
Step 7: determine the score G(S) of the answer text according to the category of the given answer text S. The specific calculation formula is as follows:
G(S) = F(Y_S)
where F(·) is a given function whose codomain is the set {0, 1, …, K}, and K is the full mark of the question to which answer text S corresponds.
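The mapping F can be as simple as a lookup table. The categories and scores below are invented for illustration (four categories onto full mark K = 2); they show the many-to-one property that different categories may share a score while each category maps to exactly one score.

```python
# Hypothetical many-to-one mapping F from answer category to score (K = 2)
F = {0: 0,          # e.g. off-topic answers
     1: 1,          # one kind of partially correct answer
     2: 1,          # a different kind of partially correct answer
     3: 2}          # fully correct answers

def G(category: int) -> int:
    """Score of an answer text whose predicted category is `category`."""
    return F[category]

print([G(c) for c in range(4)])   # [0, 1, 1, 2]
```

Categories 1 and 2 both receive the score 1 (many-to-one is allowed), but no category maps to more than one score, matching the constraint stated in step 6 of the method.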
To verify the actual technical effect of the present invention, the inventors carried out a test with a grade-6 primary school reading comprehension question as an example. The question is as follows:
Read the passage and answer the question.
Long, long ago, a rare and beautiful bird came to rest in the woods on the outskirts of the State of Lu. The king of Lu had it brought into the ancestral temple to be kept there: it was given the mellowest fine wine to drink, fed delicious beef and mutton, and entertained with the beautiful music of the "Nine Shao". But the bird grew dizzy, anxious, and sad; it dared not eat a piece of meat or drink a cup of wine, and within three days it was dead.
This is keeping a bird according to one's own way of life, not keeping a bird according to the bird's own nature! To keep a bird according to its nature, one should let it dwell in remote, thickly forested mountains, play on land and on the shoals, soar over rivers, lakes, and seas, rest in the company of its flock, and live freely and at ease in the woods. Birds hate even the sound of human voices; why, then, bring one into the noise and bustle of a magnificent palace and graciously play it the "Nine Shao"? How laughable. If the "Nine Shao" were performed in the open wilds, the birds would fly far away on hearing it, the beasts would flee in fright, and the fish would dive deep to hide, yet people would crowd around to watch. Fish can only survive in water, but a person who goes into the water will drown. Humans and fish have different habits; their natures are inherently different. Therefore, the sages of former generations did not demand that humans and birds have the same abilities, nor did they make them do the same things.
It can be seen that things must be done in accordance with natural law for happiness to last. What you think best, others may dislike, or even find unbearable. Measuring others by one's own standards is often wrong; acting on one's own assumptions often leads to failure.
Adapted from the "Perfect Happiness" chapter of the Zhuangzi (《庄子·至乐》)
Based on the author's viewpoint, explain in your own words why the king of Lu failed in keeping the bird:
--------------------------------------------------------
The question is worth 2 points. A total of 533 valid student answers were collected; each answer was scored independently by two teachers, with a scoring agreement of kappa = 0.755. The student answer texts contain 24 Chinese words on average, and the longest text contains 78 words; excerpts of the student answers are shown in Table 2. The inventors trained the word representation vectors on Wikipedia, Sogou news, and a collection of primary school student compositions gathered by the inventors, yielding more than 470,000 word vectors of 200 dimensions each, as shown in Fig. 4. In this embodiment, the word representation vectors were trained with the Python open-source tool gensim, which is documented in detail at https://radimrehurek.com/gensim/models/word2vec.html.
Table 2. Excerpts of answer texts
The hardware environment used in this experiment: Ubuntu 64-bit operating system, Intel i7 processor, CPU frequency 3.41 GHz, 16 GB memory. Chinese word segmentation uses the open-source tool "jieba" ("stammerer") Chinese word segmentation, which is described in detail at https://pypi.python.org/pypi/jieba/ and in Zheng Jie's book "NLP Chinese Natural Language Processing: Principles and Practice". The LSTM neural network uses 128 LSTM units, the number of training epochs is epoch = 30, and the max pooling algorithm is used. After the LSTM neural network produces the semantic feature vectorization of the student answer texts, a multiclass logistic regression classifier performs the classification, and the classification labels are then converted into scores according to the established rule. Suppose a student answer text S is predicted to belong to class Y_S; then G(S) is the predicted score of this answer. In this embodiment, the classification label of a student answer text is the score of that answer, i.e.:
G(S) = Y_S
80% of the samples in the data set were used as the training set and the remaining samples formed the test set: 426 samples were randomly selected for model training and the remaining 107 samples were used for model testing. The scoring accuracy after cross-validation is 80.1%, and the kappa value is 0.686, close to the marking agreement between human markers. It can be seen that with only a small amount of student answer data, an automatic scoring model with good performance can be built without depending on standard answer information.
In conclusion the present invention proposes a kind of Chinese short text subjective item automatic scoring side using LSTM neutral nets Method, can fully excavate the semantic information in answer text, realizes to Chinese short essay in the case of independent of code of points The automatic scoring of this subjective item.The experiment test of truthful data collection shows, according to the LSTM neutral nets of the method for the present invention training , can be with the realization Chinese short text subjective item automatic scoring of better quality with good text classification performance.
Finally, it should be noted that the above embodiments are only intended to describe the technical solution of the present invention and not to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (9)

  1. A Chinese short-text subjective-question automatic scoring method using an LSTM neural network, characterized in that it comprises the following steps:
    Step 1: perform word segmentation on the answer text of a subjective question, converting the answer text into a word sequence;
    Step 2: obtain the vectorized representation of each word in the answer text, and build an answer text mapping matrix;
    Step 3: process the answer text mapping matrix with the LSTM neural network and take the outputs of all or part of the hidden layers to obtain the semantic feature matrix of the answer text;
    Step 4: down-sample the semantic feature matrix with a pooling algorithm to obtain the semantic feature vector of the answer text;
    Step 5: feed the semantic feature vector of the answer text obtained in Step 4 into a multiclass logistic regression classifier to predict the class of the answer text;
    Step 6: determine the score of the answer text according to a preset mapping between answer text classes and scores.
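A minimal, self-contained sketch of Steps 1 through 4 follows. The two-word embedding table, the 3-dimensional vectors, and the averaging "LSTM" stand-in are hypothetical placeholders for the real segmented text, the 200-dimensional pre-trained vectors, and the trained LSTM network; only the data flow (word sequence, mapping matrix, hidden outputs, pooling) mirrors the claim:

```python
# Steps 1-2: word sequence -> mapping matrix M (one column per word).
embeddings = {"好": [0.1, 0.5, -0.2], "学习": [0.4, -0.3, 0.6]}  # hypothetical
words = ["好", "学习", "未知词"]  # "未知词" is out-of-vocabulary
columns = [embeddings[w] for w in words if w in embeddings]  # OOV discarded as a stop word
M = [[col[d] for col in columns] for d in range(3)]  # rows = dims, cols = words

# Step 3: feed the columns of M in column-index order; this stand-in
# just blends each input column into a running "hidden-layer output".
def fake_lstm_step(x, h):
    return [0.5 * a + 0.5 * b for a, b in zip(x, h)]

h = [0.0, 0.0, 0.0]
H = []  # semantic feature matrix: one hidden output per time step
for t in range(len(columns)):
    h = fake_lstm_step([M[d][t] for d in range(3)], h)
    H.append(h)

# Step 4: max pooling over time yields the semantic feature vector.
feature_vector = [max(row[d] for row in H) for d in range(3)]
```
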
  2. The method according to claim 1, characterized in that in Step 2, each word in the answer text is searched for in a preset dictionary to obtain the vectorized representation of that word, and the answer text mapping matrix is then built according to the order in which the words appear in the answer text; individual words in the answer text that do not appear in the dictionary are treated as stop words and discarded.
  3. The method according to claim 1, characterized in that in Step 3, the answer text mapping matrix M is processed by the LSTM neural network to extract the semantic features of the answer text and generate the answer text semantic feature matrix H, where H is composed of the output vectors of all or part of the hidden layers of the LSTM neural network.
  4. The method according to claim 1, characterized in that in Step 3, the answer text mapping matrix M is input to the LSTM neural network as follows: at each time step one column of M is input to the LSTM neural network, the column vectors of M being fed in ascending order of their column indices, which effectively preserves the word-order information of the answer text.
  5. The method according to claim 1, characterized in that the LSTM neural network model parameters and classifier model parameters in Step 3 and Step 5 are obtained during the training of the scoring model: the cross entropy between the target probability distribution and the actual probability distribution is minimized as the objective function, batch sample errors are computed with gradient descent, and the LSTM neural network model parameters and classifier model parameters are updated by back propagation.
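The cross-entropy objective named in claim 5 can be written out concretely. The predicted distribution below is a hypothetical classifier output and the target is a one-hot score label; the patent's actual batch sizes and learning rates are not specified here:

```python
import math

def cross_entropy(target, predicted):
    """Cross entropy H(p, q) = -sum_k p_k * log(q_k) between a target
    distribution p and a predicted distribution q."""
    eps = 1e-12  # guard against log(0)
    return -sum(p * math.log(q + eps) for p, q in zip(target, predicted))

target = [0.0, 0.0, 1.0]     # one-hot: the answer's true score class is 2
predicted = [0.1, 0.2, 0.7]  # hypothetical classifier output
loss = cross_entropy(target, predicted)
# Training minimizes this loss over batches with gradient descent and
# updates LSTM and classifier parameters by back propagation.
```
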
  6. The method according to claim 1, characterized in that the semantic feature vector generated in Step 4 is the vectorized semantic feature representation of the input answer text, and this representation contains the word-order information of the answer text as well as the associations between words and text semantics.
  7. The method according to claim 1, characterized in that the pooling algorithm used in Step 4 is max pooling, min pooling, or average pooling.
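The three pooling choices in claim 7 differ only in the reduction applied across time steps. A minimal sketch over a hypothetical 2-step, 3-dimensional feature matrix (not real hidden-layer outputs):

```python
def pool(H, reduce_fn):
    """Reduce a semantic feature matrix H (one row per time step)
    to a single vector, dimension by dimension."""
    dims = len(H[0])
    return [reduce_fn([row[d] for row in H]) for d in range(dims)]

H = [[0.2, -0.5, 0.9],
     [0.8,  0.1, 0.3]]  # hypothetical hidden-layer outputs

max_pooled = pool(H, max)                        # max pooling (used in the embodiment)
min_pooled = pool(H, min)                        # min pooling
avg_pooled = pool(H, lambda v: sum(v) / len(v))  # average pooling
```
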
  8. The method according to claim 1, characterized in that in Step 6 the relation between answer text classes and scores is many-to-one: answer texts of different classes are allowed to receive the same score, but answer texts of the same class are not allowed to receive different scores.
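The many-to-one class-to-score relation in claim 8 is simply a function from class labels to scores. The five classes and their scores below are hypothetical:

```python
# Hypothetical mapping: five answer classes, three possible scores.
# Different classes may share a score (many-to-one), but each class
# maps to exactly one score, so equal classes can never score differently.
class_to_score = {0: 0, 1: 0, 2: 1, 3: 2, 4: 2}

def score_of(label):
    """Map a predicted class label to its preset score."""
    return class_to_score[label]
```
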
  9. A Chinese short-text subjective-question automatic scoring system using an LSTM neural network, characterized in that the scoring system comprises an input module, a data processing module, a semantic feature extraction module, a scoring module, and a dictionary module, wherein:
    the input module is used to transmit the answer text to the data processing module;
    the data processing module is used to segment the input answer text, build the corresponding answer text mapping matrix, and transmit the answer text mapping matrix to the semantic feature extraction module;
    the semantic feature extraction module is used to obtain the semantic feature vector of the answer text; it comprises an LSTM neural network layer and a pooling layer: the answer text mapping matrix is input to the LSTM neural network to obtain the outputs of some or all hidden layers in the network, yielding the answer text semantic feature matrix; a pooling operation is then applied to the semantic feature matrix to obtain the answer text semantic feature vector, which is sent to the scoring module;
    the scoring module is used to determine the score of the answer text: the answer text semantic feature vector is fed into the multiclass logistic regression classifier, the class of the answer text is predicted, the predicted class is then mapped to the score of the answer text according to a preset mapping relation, and the scoring result is output;
    the dictionary module stores pre-trained words and their corresponding vectorized representations in the form of a data table, providing data for the data processing module to call.
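The module decomposition of claim 9 can be sketched as cooperating objects. The two-word dictionary, the max-pooling feature extractor, and the threshold "classifier" below are hypothetical stand-ins for the pre-trained vector table, the LSTM-plus-pooling pipeline, and the trained multiclass logistic regression classifier:

```python
class DictionaryModule:
    """Stores pre-trained words and their vector representations."""
    def __init__(self, table):
        self.table = table
    def lookup(self, word):
        return self.table.get(word)

class DataProcessingModule:
    """Segments the answer text and builds the mapping matrix."""
    def __init__(self, dictionary):
        self.dictionary = dictionary
    def build_matrix(self, words):
        vectors = [self.dictionary.lookup(w) for w in words]
        return [v for v in vectors if v is not None]  # drop OOV words

class SemanticFeatureModule:
    """Stand-in for the LSTM + pooling layers: max-pools the matrix."""
    def extract(self, matrix):
        dims = len(matrix[0])
        return [max(row[d] for row in matrix) for d in range(dims)]

class ScoringModule:
    """Maps the feature vector to a class, then the class to a score."""
    def __init__(self, class_to_score):
        self.class_to_score = class_to_score
    def score(self, feature_vector):
        label = 0 if sum(feature_vector) < 1.0 else 1  # hypothetical classifier
        return self.class_to_score[label]

dictionary = DictionaryModule({"好": [0.4, 0.2], "学习": [0.7, 0.1]})
processor = DataProcessingModule(dictionary)
features = SemanticFeatureModule()
scorer = ScoringModule({0: 0, 1: 2})

matrix = processor.build_matrix(["好", "学习"])
result = scorer.score(features.extract(matrix))
```
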
CN201711177862.1A 2017-11-23 2017-11-23 A Chinese short-text subjective-question automatic scoring method and system using LSTM neural networks Pending CN107967318A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711177862.1A CN107967318A (en) 2017-11-23 2017-11-23 A Chinese short-text subjective-question automatic scoring method and system using LSTM neural networks

Publications (1)

Publication Number Publication Date
CN107967318A true CN107967318A (en) 2018-04-27

Family

ID=62000419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711177862.1A Pending CN107967318A (en) A Chinese short-text subjective-question automatic scoring method and system using LSTM neural networks

Country Status (1)

Country Link
CN (1) CN107967318A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104834747A (en) * 2015-05-25 2015-08-12 中国科学院自动化研究所 Short text classification method based on convolutional neural networks
CN106980608A (en) * 2017-03-16 2017-07-25 四川大学 A kind of Chinese electronic health record participle and name entity recognition method and system
CN107133211A (en) * 2017-04-26 2017-09-05 中国人民大学 A kind of composition methods of marking based on notice mechanism
CN107301246A (en) * 2017-07-14 2017-10-27 河北工业大学 Chinese Text Categorization based on ultra-deep convolutional neural networks structural model

Cited By (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763411A (en) * 2018-05-23 2018-11-06 北京师范大学 A subjective-question marking and commenting system and method combining short-text clustering and a recommendation mechanism
CN108897723B (en) * 2018-06-29 2022-08-02 北京百度网讯科技有限公司 Scene conversation text recognition method and device and terminal
CN108897723A (en) * 2018-06-29 2018-11-27 北京百度网讯科技有限公司 The recognition methods of scene dialog text, device and terminal
CN108960319A (en) * 2018-06-29 2018-12-07 哈尔滨工业大学 It is a kind of to read the candidate answers screening technique understood in modeling towards global machine
CN108875074B (en) * 2018-07-09 2021-08-10 北京慧闻科技发展有限公司 Answer selection method and device based on cross attention neural network and electronic equipment
CN108875074A (en) * 2018-07-09 2018-11-23 北京慧闻科技发展有限公司 Based on answer selection method, device and the electronic equipment for intersecting attention neural network
CN110858218A (en) * 2018-08-13 2020-03-03 宋曜廷 Automatic scoring method and system for divergent thinking test
CN110858218B (en) * 2018-08-13 2023-06-30 宋曜廷 Automatic scoring method and system for divergent thinking test
CN109146296A (en) * 2018-08-28 2019-01-04 南京葡萄诚信息科技有限公司 A kind of artificial intelligence assessment talent's method
CN109242090A (en) * 2018-08-28 2019-01-18 电子科技大学 A kind of video presentation and description consistency discrimination method based on GAN network
CN110991161B (en) * 2018-09-30 2023-04-18 北京国双科技有限公司 Similar text determination method, neural network model obtaining method and related device
CN110991161A (en) * 2018-09-30 2020-04-10 北京国双科技有限公司 Similar text determination method, neural network model obtaining method and related device
CN109388806A (en) * 2018-10-26 2019-02-26 北京布本智能科技有限公司 A kind of Chinese word cutting method based on deep learning and forgetting algorithm
CN109388806B (en) * 2018-10-26 2023-06-27 北京布本智能科技有限公司 Chinese word segmentation method based on deep learning and forgetting algorithm
CN109670168B (en) * 2018-11-14 2023-04-18 华南师范大学 Short answer automatic scoring method, system and storage medium based on feature learning
CN109670168A (en) * 2018-11-14 2019-04-23 华南师范大学 Short answer automatic scoring method, system and storage medium based on feature learning
CN109815491B (en) * 2019-01-08 2023-08-08 平安科技(深圳)有限公司 Answer scoring method, device, computer equipment and storage medium
CN109815491A (en) * 2019-01-08 2019-05-28 平安科技(深圳)有限公司 Answer methods of marking, device, computer equipment and storage medium
CN110134948A (en) * 2019-04-23 2019-08-16 北京淇瑀信息科技有限公司 A kind of Financial Risk Control method, apparatus and electronic equipment based on text data
CN110309503A (en) * 2019-05-21 2019-10-08 昆明理工大学 A kind of subjective item Rating Model and methods of marking based on deep learning BERT--CNN
CN110245860B (en) * 2019-06-13 2022-08-23 桂林电子科技大学 Automatic scoring method based on virtual experiment platform
CN110245860A (en) * 2019-06-13 2019-09-17 桂林电子科技大学 An automatic scoring method based on a virtual experiment platform
CN110162797A (en) * 2019-06-21 2019-08-23 北京百度网讯科技有限公司 Article quality determining method and device
CN110162797B (en) * 2019-06-21 2023-04-07 北京百度网讯科技有限公司 Article quality detection method and device
CN110457674B (en) * 2019-06-25 2021-05-14 西安电子科技大学 Text prediction method for theme guidance
CN110457674A (en) * 2019-06-25 2019-11-15 西安电子科技大学 A kind of text prediction method of theme guidance
WO2021000675A1 (en) * 2019-07-04 2021-01-07 平安科技(深圳)有限公司 Method and apparatus for machine reading comprehension of chinese text, and computer device
CN110309267A (en) * 2019-07-08 2019-10-08 哈尔滨工业大学 Semantic retrieving method and system based on pre-training model
CN110413741B (en) * 2019-08-07 2022-04-05 山东山大鸥玛软件股份有限公司 Subjective question-oriented intelligent paper marking method
CN110413741A (en) * 2019-08-07 2019-11-05 山东山大鸥玛软件股份有限公司 A subjective-question-oriented intelligent marking method
CN111221939A (en) * 2019-11-22 2020-06-02 华中师范大学 Grading method and device and electronic equipment
CN111221939B (en) * 2019-11-22 2023-09-08 华中师范大学 Scoring method and device and electronic equipment
CN111104881B (en) * 2019-12-09 2023-12-01 科大讯飞股份有限公司 Image processing method and related device
CN111104881A (en) * 2019-12-09 2020-05-05 科大讯飞股份有限公司 Image processing method and related device
CN111079641A (en) * 2019-12-13 2020-04-28 科大讯飞股份有限公司 Answering content identification method, related device and readable storage medium
CN112988921A (en) * 2019-12-13 2021-06-18 北京四维图新科技股份有限公司 Method and device for identifying map information change
CN111079641B (en) * 2019-12-13 2024-04-16 科大讯飞股份有限公司 Answer content identification method, related device and readable storage medium
CN111191578A (en) * 2019-12-27 2020-05-22 北京新唐思创教育科技有限公司 Automatic scoring method, device, equipment and storage medium
CN111241392A (en) * 2020-01-07 2020-06-05 腾讯科技(深圳)有限公司 Method, device, equipment and readable storage medium for determining popularity of article
CN111241392B (en) * 2020-01-07 2024-01-26 腾讯科技(深圳)有限公司 Method, apparatus, device and readable storage medium for determining popularity of article
CN111724813A (en) * 2020-06-17 2020-09-29 东莞理工学院 LSTM-based piano playing automatic scoring method
CN112085985A (en) * 2020-08-20 2020-12-15 安徽七天教育科技有限公司 Automatic student answer scoring method for English examination translation questions
CN112287083A (en) * 2020-10-29 2021-01-29 北京乐学帮网络技术有限公司 Evaluation method and device, computer equipment and storage device
CN112634689A (en) * 2020-12-24 2021-04-09 广州奇大教育科技有限公司 Application method of regular expression in automatic subjective question changing in computer teaching
CN113392642B (en) * 2021-06-04 2023-06-02 北京师范大学 Automatic labeling system and method for child care cases based on meta learning
CN113392642A (en) * 2021-06-04 2021-09-14 北京师范大学 System and method for automatically labeling child-bearing case based on meta-learning
CN113111154A (en) * 2021-06-11 2021-07-13 北京世纪好未来教育科技有限公司 Similarity evaluation method, answer search method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN107967318A (en) A Chinese short-text subjective-question automatic scoring method and system using LSTM neural networks
CN107967257B (en) Cascading composition generating method
CN108363743B (en) Intelligent problem generation method and device and computer readable storage medium
US9779085B2 (en) Multilingual embeddings for natural language processing
CN103823794B (en) An automatic question-generation method for short-answer questions in English reading comprehension tests
CN104794169B (en) A kind of subject terminology extraction method and system based on sequence labelling model
CN109977199B (en) Reading understanding method based on attention pooling mechanism
CN106569998A (en) Text named entity recognition method based on Bi-LSTM, CNN and CRF
CN108829678A (en) Name entity recognition method in a kind of Chinese international education field
CN110222163A (en) A kind of intelligent answer method and system merging CNN and two-way LSTM
CN112149421A (en) Software programming field entity identification method based on BERT embedding
CN106682089A (en) RNNs-based method for automatic safety checking of short message
CN111368082A (en) Emotion analysis method for domain adaptive word embedding based on hierarchical network
CN110287298A (en) A kind of automatic question answering answer selection method based on question sentence theme
CN113486645A (en) Text similarity detection method based on deep learning
Cai Automatic essay scoring with recurrent neural network
Lilja Automatic essay scoring of Swedish essays using neural networks
CN113011196A (en) Concept-enhanced representation and one-way attention-containing subjective question automatic scoring neural network model
CN110705306B (en) Evaluation method for consistency of written and written texts
CN114579706B (en) Automatic subjective question review method based on BERT neural network and multi-task learning
CN116049349A (en) Small sample intention recognition method based on multi-level attention and hierarchical category characteristics
CN114462389A (en) Automatic test paper subjective question scoring method
Luo Automatic short answer grading using deep learning
CN112036170A (en) Neural zero sample fine-grained entity classification method based on type attention
CN110083825A (en) A kind of Laotian sentiment analysis method based on GRU model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180427