CN1700200A - English composition automatic scoring system - Google Patents


Publication number
CN1700200A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200510040305
Other languages
Chinese (zh)
Inventor
梁茂成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual

Abstract

The invention discloses an automatic scoring system for English compositions. It comprises a training set formed from a collection of English compositions, text feature items, a regression equation, and a computer with input and output devices. The training set is stored in the computer through the input device. The text feature items are information obtained by text analysis of the compositions in the training set and serve as the independent variables. The regression equation is a computation method established with a statistical model by multiple regression analysis, with the human scores as the dependent variable and the text feature items as the independent variables; the result is displayed by the computer.

Description

English composition automatic scoring system
I. Technical field
The present invention relates to a system for automatically scoring written work, specifically an automatic scoring system for English compositions.
II. Background art
At present there is no system in China that automatically scores English compositions, and no technique for scoring the English compositions of Chinese students has been reported internationally either. International research on the automatic scoring of English compositions centers on three software systems. All three use human scores to train a machine scoring model: they extract numerous text feature items from a composition and compute its score by statistical regression. The three systems are PEG (developed at Duke University), IEA (developed at the University of Colorado) and E-rater (developed by Educational Testing Service). None of them, however, was designed for automatically scoring the English compositions of Chinese students. Their general operating principles are essentially the same, but the text feature items they extract differ and are kept confidential. Judging from the few published research reports, PEG and IEA appear to have been designed mainly for scoring compositions by students whose native language is English, while E-rater was designed mainly for scoring the essays in the GMAT examination. Which specific text feature items each system uses as variables in its scoring model cannot be determined.
The quality of English writing should generally be evaluated in three respects: language, content and organization. Language quality in turn is usually evaluated in terms of fluency, accuracy and complexity, with complexity considered at both the word level and the sentence level. Because the existing foreign scoring systems do not follow these principles for judging second-language writing, they are poorly targeted at the English compositions of Chinese students and work poorly on them. They are suited only to scoring compositions by native speakers of English, or only to scoring the compositions of one particular examination.
The three automatic scoring systems above therefore share the following shortcomings:
1. The English compositions of Chinese students have characteristics of their own. The three systems are poorly targeted when used to score Chinese students automatically and cannot objectively reflect the level of a composition.
2. None of the three systems analyzes the characteristics of student English compositions comprehensively. PEG analyzes only the most basic text features of a composition, such as text length and average word length, and leaves other variables unanalyzed. IEA uses Latent Semantic Analysis, a technique from information retrieval, and mainly analyzes the content of a composition. E-rater uses natural language processing to analyze syntactic features, topic relevance and rhetorical structure, and likewise leaves other specific variables unanalyzed.
III. Summary of the invention
The purpose of the present invention is to overcome the shortcomings of the automatic scoring systems described above and to provide an automatic scoring system suited to the English compositions of Chinese students. The system combines characteristics from several aspects of a composition as the basis for judgment, scores the English compositions of Chinese students automatically, and makes large-scale scoring of English compositions possible.
The purpose of the invention is achieved by the following technical solution:
An automatic scoring system for English compositions, characterized in that it comprises a training set formed from a collection of English compositions, text feature items, a regression equation, and a computer with input and output devices. The training set is stored in the computer through the input device. The text feature items are information obtained by text analysis of the compositions in the training set, and this information serves as the independent variables. The regression equation is a computation method established with a statistical model by multiple regression analysis, taking the human scores of the compositions in the training set as the dependent variable together with the independent variables. A composition to be scored is entered into the computer and analyzed; the resulting text feature items, as independent variables, are evaluated with the regression equation to produce the score, which is displayed on the computer's output device.
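The regression step described above can be illustrated with a minimal sketch. It assumes ordinary least squares via NumPy, which the patent does not name; the function names `fit_regression` and `predict`, the two features, and the toy scores are all hypothetical, and the toy scores were constructed to be exactly linear so that the fitted coefficients are easy to check.

```python
import numpy as np

def fit_regression(features, human_scores):
    """Fit score = b0 + b1*x1 + ... + bk*xk by ordinary least squares."""
    X = np.asarray(features, dtype=float)
    y = np.asarray(human_scores, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, feature_row):
    """Evaluate the fitted regression equation for one composition."""
    return float(coef[0] + np.dot(coef[1:], feature_row))

# Hypothetical training set: (number of word types, average word length).
train_features = [[120, 4.1], [150, 4.3], [180, 4.2], [210, 4.6], [240, 4.8]]
# Synthetic human scores generated as 1 + 0.02*x1 + 1.0*x2 (illustration only).
train_scores = [7.5, 8.3, 8.8, 9.8, 10.6]

coef = fit_regression(train_features, train_scores)
new_score = predict(coef, [200, 4.5])
```

With real data the human scores would not lie exactly on a plane, and the coefficients would be whatever the least-squares fit yields.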
In the present invention, the text feature items cover the language quality, content quality and organization quality that characterize a composition. Language quality comprises fluency, lexical complexity, syntactic complexity and accuracy; content quality comprises the relevance of the content and the coherence of the content; organization quality comprises discourse structure and paragraph arrangement.
The independent variables in the present invention are the following 14: the number of word types, reflecting fluency; the average word length, the standard deviation of word length and the nominalization ratio, reflecting lexical complexity; the mean sentence length and the number of gerunds, reflecting syntactic complexity; the number of recurring word clusters, the preposition frequency error, the definite article frequency error and the noun-to-pronoun ratio, reflecting accuracy; the content similarity, reflecting the relevance of the content; the number of procedural vocabulary items, reflecting the coherence of the content; the number of discourse conjuncts, reflecting discourse structure; and the paragraph count error, reflecting paragraph arrangement.
The variables are defined as follows:
1) Number of word types: the number of word types contained in the text.
2) Average word length: the average length of all words in the text, measured as the number of letters in each word.
3) Standard deviation of word length: the standard deviation of the lengths of the words in the text, measured as the number of letters in each word.
4) Nominalization ratio: the ratio of nominalized words in the text (words ending in -ion, -ment, etc.) to the total number of words.
5) Mean sentence length: the average length of all sentences in the text, measured as the number of words in each sentence.
6) Number of gerunds: the number of words in the text ending in -ing.
7) Number of recurring word clusters: the number of times that word clusters of 3 to 4 words which occur more than 3 times in the best set of the training set (the top-scoring quarter of the sampled compositions) occur in the text.
8) Preposition frequency error: the absolute value of the difference between the proportion of prepositions (number of prepositions divided by the total number of words) and 13.21%.
9) Definite article frequency error: the absolute value of the difference between the proportion of definite articles (number of definite articles divided by the total number of words) and 6.5%.
10) Noun-to-pronoun ratio: the ratio of the total number of nouns to the total number of personal pronouns in the text.
11) Content similarity: the words of a term-document matrix are weighted according to the Okapi term weighting scheme, the matrix is then subjected to singular value decomposition (SVD), and after the matrix is reconstructed the semantic similarity between each text and the best set of the training set is computed with the dot product. The Okapi term weighting scheme is:
12) Number of procedural vocabulary items: the number of procedural vocabulary items contained in the text. The procedural vocabulary list was compiled by the patent applicant.
13) Number of discourse conjuncts: the number of discourse conjuncts contained in the text. The list of discourse conjuncts was compiled by the patent applicant.
14) Paragraph count error: the absolute value of the difference between the average number of paragraphs of the compositions in the best set of the training set and the actual number of paragraphs of the text.
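Several of the surface variables defined above can be computed with a few lines of code. The following is a rough sketch only: the tokenizer, the sentence splitter, the short `PREPOSITIONS` list, the use of the population standard deviation, and the function name `surface_features` are all assumptions made for illustration, since the patent does not specify its text analysis tools. The reference values 13.21% and 6.5% are taken from the definitions above.

```python
import re
import statistics

# Hypothetical, deliberately short preposition list (illustration only).
PREPOSITIONS = {"in", "on", "at", "of", "to", "for", "with", "by", "from", "about"}

def surface_features(text):
    """Compute a subset of the surface variables for one composition."""
    words = re.findall(r"[a-z]+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text contains no words")
    total = len(words)
    lengths = [len(w) for w in words]
    prep_ratio = sum(w in PREPOSITIONS for w in words) / total
    the_ratio = words.count("the") / total
    return {
        "word_types": len(set(words)),
        "avg_word_length": sum(lengths) / total,
        # Population standard deviation; the patent does not say which form is used.
        "word_length_sd": statistics.pstdev(lengths),
        "nominalization_ratio": sum(w.endswith(("ion", "ment")) for w in words) / total,
        "mean_sentence_length": total / len(sentences),
        "ing_word_count": sum(w.endswith("ing") for w in words),
        "preposition_freq_error": abs(prep_ratio - 0.1321),
        "definite_article_freq_error": abs(the_ratio - 0.065),
    }

sample = "The development of education is important. Students are learning in the school."
features = surface_features(sample)
```

Variables that need part-of-speech information or the applicant's own word lists (noun-to-pronoun ratio, procedural vocabulary, discourse conjuncts) are left out because those resources are not given in the text.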
The automatic scoring process in the present invention relies mainly on building the scoring model. The core of the scoring model is the three modules of language quality, content quality and organization quality, together with the independent variables in each module.
First, a batch of compositions is collected from a large-scale examination as research material, and several experienced raters are organized to score the batch by hand. The scored compositions serve as the training set for building the scoring model.
In the model-building stage, natural language processing, corpus annotation and statistics techniques, and information retrieval techniques are used to analyze the compositions and extract a large number of text feature items. Correlation analysis is then carried out to determine the independent variables of the model. With the human scores as the dependent variable, multiple regression analysis is performed to establish the regression model and finally obtain the regression equation. The independent variables are text feature items that reflect the language, content and organization of a composition. The analyses carried out so far have determined the core of the present invention, namely three scoring modules and 14 independent variables. The three scoring modules are language quality, content quality and organization quality. The 14 independent variables are: number of word types, average word length, standard deviation of word length, nominalization ratio, mean sentence length, number of gerunds, number of recurring word clusters, preposition frequency error, definite article frequency error, noun-to-pronoun ratio, content similarity, number of procedural vocabulary items, number of discourse conjuncts, and paragraph count error.
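Among the information retrieval techniques used here, the content similarity variable (term-document matrix, singular value decomposition, reconstruction, similarity to the best set) can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: raw term counts stand in for the Okapi weighting, whose formula is not reproduced in this text; the dot product is normalized to a cosine; the rank of the reconstruction and the function name `lsa_similarity` are hypothetical.

```python
import re
import numpy as np

def lsa_similarity(best_set, essay, rank=2):
    """Mean similarity between an essay and the best-set essays in a rank-reduced space."""
    docs = list(best_set) + [essay]
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-document matrix of raw counts (assumption: no Okapi weighting applied).
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokenized):
        for w in doc:
            A[index[w], j] += 1
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(rank, len(s))
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]  # rank-k reconstruction of the matrix
    target = A_k[:, -1]                   # last column is the essay being scored
    sims = []
    for j in range(len(best_set)):
        ref = A_k[:, j]
        denom = np.linalg.norm(ref) * np.linalg.norm(target)
        sims.append(float(ref @ target / denom) if denom else 0.0)
    return sum(sims) / len(sims)

# Hypothetical miniature best set and two candidate essays.
best = ["computers help students learn english",
        "students use computers to learn english writing"]
on_topic = lsa_similarity(best, "students learn english with computers")
off_topic = lsa_similarity(best, "the weather was rainy and cold yesterday")
```

In this toy example the essay that shares vocabulary with the best set receives the higher similarity, which is the behavior the variable is meant to capture.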
In the automatic scoring stage, the composition to be scored is first analyzed to extract the variables; the values of the variables are then substituted into the regression equation to obtain the machine score.
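Substituting the variable values into the regression equation amounts to a weighted sum plus an intercept. The sketch below shows that single step; the coefficient values, the intercept, the variable names and the function name `machine_score` are hypothetical placeholders, since the patent does not disclose the fitted equation.

```python
def machine_score(features, coefficients, intercept):
    """Evaluate score = intercept + sum(coefficient * variable value)."""
    missing = set(coefficients) - set(features)
    if missing:
        raise KeyError(f"missing variables: {sorted(missing)}")
    return intercept + sum(coefficients[name] * features[name] for name in coefficients)

# Hypothetical regression equation with three of the variables (illustration only).
coefficients = {"word_types": 0.05, "mean_sentence_length": 0.1, "paragraph_count_error": -0.5}
intercept = 2.0

# Variable values extracted from one composition to be scored (made-up numbers).
extracted = {"word_types": 100, "mean_sentence_length": 15, "paragraph_count_error": 1}
score = machine_score(extracted, coefficients, intercept)
```

Keeping the coefficients in a dictionary keyed by variable name makes it explicit which extracted value goes with which term of the equation.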
On the one hand, the present invention analyzes the compositions in the training set and extracts a large number of text feature items in order to determine the independent variables of the model; on the other hand, it takes the human scores as the dependent variable and performs multiple regression analysis to obtain the regression equation. The composition to be scored is then analyzed, the variables are extracted, and their values are substituted into the regression equation to produce the machine score. Compared with existing manual scoring, the present invention consumes fewer resources and scores with high reliability, and it is suited to automatically scoring the English compositions of Chinese students.
IV. Description of the drawings
Fig. 1 is a flowchart of the automatic scoring of English compositions in the present invention;
Fig. 2 is a diagram of the analysis of English composition quality in the present invention.
V. Preferred embodiment
In the automatic scoring system for English compositions of the present invention, the electronic texts of a group of English compositions, for example 50, are first collected and assembled into a training set, which is stored in the computer through the input device. Text analysis tools and statistical analysis tools are installed on the computer: the text analysis tools extract variables from the electronic texts of the compositions, and the statistical analysis tools perform correlation analysis and build the regression model. Compositions are then sampled at random from the training set and scored by several human raters to obtain the dependent variable. The sampled compositions are analyzed by computer to extract the text feature items, 14 in all, as shown in the following table:
Quality aspect | Sub-aspect | Variable
Language quality | Fluency | Number of word types
Language quality | Lexical complexity | Average word length
Language quality | Lexical complexity | Standard deviation of word length
Language quality | Lexical complexity | Nominalization ratio
Language quality | Syntactic complexity | Mean sentence length
Language quality | Syntactic complexity | Number of gerunds
Language quality | Accuracy | Number of recurring word clusters
Language quality | Accuracy | Preposition frequency error
Language quality | Accuracy | Definite article frequency error
Language quality | Accuracy | Noun-to-pronoun ratio
Content quality | Relevance of content | Content similarity
Content quality | Coherence of content | Number of procedural vocabulary items
Organization quality | Discourse structure | Number of discourse conjuncts
Organization quality | Paragraph arrangement | Paragraph count error
The correlation between each text feature item and the human scores is then analyzed. The text feature items that correlate significantly are taken as the independent variables and the mean of the human scores as the dependent variable, and multiple regression analysis yields the regression equation. A composition to be scored is entered into the computer, the variables are extracted from its electronic text and substituted into the regression equation, and the machine score of the composition is obtained. The score can be displayed on the computer's output device.
The present invention makes large-scale machine scoring of Chinese students' English compositions possible, with low resource consumption and high scoring reliability.
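The selection of text feature items by correlation with the human scores might look like the sketch below. It computes Pearson's r in plain Python and keeps the features whose absolute correlation reaches a cutoff; using a fixed cutoff `min_abs_r` in place of a formal significance test is an assumption, as are the function names and the toy data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_x * var_y)

def select_features(feature_table, mean_human_scores, min_abs_r=0.3):
    """Keep the features whose |r| with the mean human score reaches the cutoff."""
    return [name for name, values in feature_table.items()
            if abs(pearson_r(values, mean_human_scores)) >= min_abs_r]

# Hypothetical data for five sampled compositions.
mean_scores = [6, 7, 8, 9, 10]
table = {
    "word_types": [100, 120, 140, 160, 180],   # rises with the score
    "unrelated_feature": [3, 5, 1, 5, 3],      # no linear relation to the score
    "paragraph_count_error": [3, 2, 2, 1, 0],  # falls as the score rises
}
selected = select_features(table, mean_scores)
```

The selected features would then be passed to the multiple regression as independent variables; negatively correlated features are kept because the regression can assign them negative coefficients.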

Claims (4)

1. An automatic scoring system for English compositions, characterized in that: it comprises a training set formed from a collection of English compositions, text feature items, a regression equation, and a computer with input and output devices; the training set is stored in the computer through the input device; the text feature items are information obtained by text analysis of the compositions in the training set, and this information serves as the independent variables; the regression equation is a computation method established with a statistical model by multiple regression analysis, taking the human scores of the compositions in the training set as the dependent variable together with the independent variables; a composition to be scored is entered into the computer and analyzed, and the resulting text feature items, as independent variables, are evaluated with the regression equation to produce the score, which is displayed on the computer's output device.
2. The automatic scoring system for English compositions according to claim 1, characterized in that: the independent variables obtained by text analysis of the compositions in the training set cover the language quality, content quality and organization quality that characterize a composition.
3. The automatic scoring system for English compositions according to claim 2, characterized in that: the language quality comprises fluency, lexical complexity, syntactic complexity and accuracy; the content quality comprises the relevance of the content and the coherence of the content; the organization quality comprises discourse structure and paragraph arrangement.
4. The automatic scoring system for English compositions according to claim 1, characterized in that: the independent variables comprise the number of word types, reflecting fluency; the average word length, the standard deviation of word length and the nominalization ratio, reflecting lexical complexity; the mean sentence length and the number of gerunds, reflecting syntactic complexity; the number of recurring word clusters, the preposition frequency error, the definite article frequency error and the noun-to-pronoun ratio, reflecting accuracy; the content similarity, reflecting the relevance of the content; the number of procedural vocabulary items, reflecting the coherence of the content; the number of discourse conjuncts, reflecting discourse structure; and the paragraph count error, reflecting paragraph arrangement.
CN 200510040305 2005-05-30 2005-05-30 English composition automatic scoring system Pending CN1700200A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200510040305 CN1700200A (en) 2005-05-30 2005-05-30 English composition automatic scoring system

Publications (1)

Publication Number Publication Date
CN1700200A true CN1700200A (en) 2005-11-23

Family

ID=35476268

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication