CN112580333A - English composition scoring method aiming at image recognition - Google Patents

English composition scoring method aiming at image recognition

Info

Publication number
CN112580333A
CN112580333A (application CN202011515007.9A)
Authority
CN
China
Prior art keywords
text
composition
words
module
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011515007.9A
Other languages
Chinese (zh)
Inventor
侯冲
李哲
陈家海
叶家鸣
吴波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Seven Day Education Technology Co ltd
Original Assignee
Anhui Seven Day Education Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Seven Day Education Technology Co ltd filed Critical Anhui Seven Day Education Technology Co ltd
Priority to CN202011515007.9A priority Critical patent/CN112580333A/en
Publication of CN112580333A publication Critical patent/CN112580333A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention discloses an English composition scoring method for image recognition, and relates to the field of text scoring. For the problem of scoring English compositions recognized from scanned paper, a solution is proposed in which multiple modules extract features and an XGBoost model maps the features to a final score. The scheme comprises the following modules: a basic score module, a semantic score module, a topic score module, a writing score module and an integrated score module. Each module extracts features with traditional NLP processing, deep learning, machine learning and other methods, so that student compositions can be analyzed thoroughly, and scoring accuracy is improved by integrating the weight of each feature.

Description

English composition scoring method aiming at image recognition
Technical Field
The invention belongs to the technical field of text scoring, and particularly relates to a composition scoring method based on multi-dimensional analysis.
Background
Since the 1930s and 1940s, many emerging technologies have appeared, among them the computer and the internet. As the saying goes, "science and technology is the first productivity", and the development of internet technology has driven change across industries. Online exam marking and automatic correction are products of the internet's impact on the education industry. Automatic online marking now covers all school stages from primary to senior high school and subjects such as Chinese, mathematics and English. English is usually the earliest subject researchers tackle, and composition correction is central to it. Although research on automatic English composition scoring goes back decades, scoring accuracy is still not satisfactory, and further research and improvement of automatic English composition correction is much needed.
Existing composition scoring methods mainly score from shallow features or from the similarity between the composition text and already-scored compositions. Extracting simple shallow features from the composition text and fitting a regression with machine learning against the score distribution of each feature can fit composition scores to some extent, but the features are too simple to reflect the composition comprehensively, so the error is large. Scoring by similarity to scored compositions is sometimes more accurate, but once the composition topic differs too much from the scored corpus, scoring quality drops off sharply. Much work remains to be done on composition scoring.
Disclosure of Invention
The technical problem to be solved is as follows:
the problem that the scores of the identified composition texts are not accurate enough is solved, and a method for scoring the composition texts based on multi-module omnibearing analysis and identification is provided.
The technical scheme is as follows:
To achieve this purpose, the English composition scoring method for image recognition uses multiple modules to extract features of different dimensions; after expansion calculation of these features, an XGBoost model is trained on all the features together, and the trained model finally predicts the composition score. The method comprises four feature modules, covering basic, topic, writing and semantic features, and an integrated prediction module.
Preferably, the basic feature module is specifically described as follows: traditional natural language processing is used to count character-level text features, including the numbers of characters, punctuation marks, sentences, paragraphs, words and unique words, the average word length, the average sentence length, the numbers of stop words and non-stop words, the numbers of nouns, verbs, adjectives, adverbs, prepositions and conjunctions, and the word frequency distribution of each word. To count the words of each part of speech, the composition text is tagged with the open-source tool spaCy and the tagged words of each part of speech are counted. To compute the word frequency distribution, the frequency of each word in the existing composition data is counted, all words are sorted by frequency, and the sorted words are divided into 13 forward and 9 reverse rank ranges: the forward ranges are [0:100] (the 100 words with the highest frequency), [100:300] (the 100th to 300th words, i.e. 200 words), [300:600], [600:1000], [1000:1500], [1500:2000], [2000:3000], [3000:5000], [5000:8000], [8000:12000], [12000:17000], [17000:25000] and [25000:-1]; the reverse ranges are [-1:-100] (the 100 words with the lowest frequency), [-100:-200], [-200:-400], [-400:-700], [-700:-1000], [-1500:-2000], [-2000:-3000], [-3000:-5000] and [-5000:-10000]. The number of words of the composition text falling into each range is then counted as the word frequency distribution feature, and an XGBoost model is trained after all basic features are integrated.
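As an illustration only (not code from the patent), the character-level statistics above can be sketched in plain Python; the tokenization here is deliberately naive, and the spaCy part-of-speech counts are only indicated in a comment:

```python
import re
import string

def shallow_features(text):
    """Count character-level statistics of a composition text.

    A simplified sketch: a real pipeline would use proper tokenization,
    stop-word lists, and spaCy part-of-speech tags for the noun/verb/
    adjective/adverb/preposition/conjunction counts.
    """
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    paragraphs = [p for p in text.split("\n") if p.strip()]
    return {
        "n_chars": len(text),
        "n_punct": sum(ch in string.punctuation for ch in text),
        "n_sentences": len(sentences),
        "n_paragraphs": len(paragraphs),
        "n_words": len(words),
        "n_unique_words": len({w.lower() for w in words}),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "avg_sent_len": len(words) / max(len(sentences), 1),
    }
```

The returned dictionary corresponds to the first ten of the sixteen basic dimensions; the six part-of-speech counts would be appended after tagging.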
Preferably, the topic feature module is specifically described as follows: the recognized composition texts are divided into two classes, off-topic and on-topic. Each composition text is combined with the other student compositions in the same examination to form text pairs; pairs consisting of one off-topic and one on-topic composition are manually labeled 1 and all other pairs are labeled 0, forming a text-pair training set. The similarity of each text pair is computed with three schemes. In the first, 50-dimensional GloVe word vectors embed the text into a vector matrix, a BiLSTM extracts a semantic feature matrix from the text matrix as the front input layer of a Siamese network, the semantic feature matrices of the two texts are combined by element-wise addition and element-wise subtraction, and the two results are concatenated and passed through a fully connected layer and a softmax layer to output a similarity value. In the second, taking an examination as a unit, TF-IDF selects a keyword set from the preprocessed text set, one-hot vectors are built from the keyword sets, and the similarity of the text pair is computed with cosine similarity. In the last, spaCy first tags the text with parts of speech, the nouns in the text are counted, the high-frequency nouns are selected as the keyword set, one-hot vectors are built, and the similarity of the text pair is computed with cosine similarity. With these three schemes, the mean, variance, median, maximum and minimum of the similarity between each composition and the other compositions in the examination are computed as the expanded feature data of the off-topic module.
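The one-hot keyword schemes above can be illustrated with a small hypothetical example; the keyword list and texts below are invented for the sketch:

```python
import math

def onehot(keywords, text_words):
    """One-hot vector over a fixed keyword set: 1 if the keyword occurs."""
    present = set(text_words)
    return [1 if k in present else 0 for k in keywords]

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical keyword set extracted by TF-IDF or high-frequency nouns
keywords = ["school", "teacher", "holiday", "travel"]
a = onehot(keywords, "my teacher at school".split())
b = onehot(keywords, "my holiday travel to school".split())
sim = cosine(a, b)  # similarity between the two compositions
```

The same `cosine` call serves both the TF-IDF scheme and the high-frequency-noun scheme; only the keyword selection differs.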
Preferably, the writing feature module is specifically described as follows: the composition region image is first scaled into 224×224 patches, which are labeled manually; according to the attractiveness of the handwriting they are divided into four classes, 1, 2, 3 and 4 (corresponding to "poor", "normal", "good" and "excellent"). A densely connected convolutional network (DenseNet) is trained as the classification model and predicts the writing class of each 224×224 patch. The DenseNet is configured as follows:
(1) learning rate: 0.01, with decay rate 0.9;
(2) optimizer: Adagrad;
(3) batch size: 32;
(4) epochs: 50;
Preferably, the semantic module is specifically described as follows: GloVe word vectors embed the text, a BiLSTM extracts a semantic feature matrix from the text matrix, and the matrix is passed into a fully connected layer and a softmax layer to output the corresponding score.
Preferably, the integrated prediction module is specifically described as follows: the multi-dimensional features output by the basic, topic, writing and semantic modules are integrated as the input of an XGBoost model for final prediction.
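The integration step can be pictured as simple feature concatenation. The block below uses random placeholder features and invented per-module dimensionalities (38/15/4/1); the actual XGBoost fit is left as a comment, since the patent does not give its hyperparameters:

```python
import numpy as np

# Hypothetical per-module feature blocks for 5 compositions
basic = np.random.rand(5, 38)     # e.g. 16 shallow + 22 frequency dims
topic = np.random.rand(5, 15)     # e.g. 3 schemes x 5 summary statistics
writing = np.random.rand(5, 4)    # e.g. DenseNet class probabilities
semantic = np.random.rand(5, 1)   # e.g. BiLSTM score

# Concatenate all module outputs into one design matrix
X = np.hstack([basic, topic, writing, semantic])

# The patent then trains an XGBoost model on X, e.g.:
# model = xgboost.XGBRegressor().fit(X, scores)
```

Each row of `X` is one composition; each module contributes a contiguous block of columns.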
An English composition scoring method aiming at image recognition comprises the following specific steps:
Step one, data preparation: recognized composition texts are prepared by examination (only examinations with more than 50 candidates are used), and each examination is guaranteed to contain an off-topic composition; 4000 scanned images of composition regions are prepared, covering as many answer sheet types of the application scenario as possible;
Step two, shallow feature extraction: each text is preprocessed and the following features are extracted: the numbers of characters, punctuation marks, sentences, paragraphs, words and unique words, the average word length, the average sentence length, and the numbers of stop words and non-stop words; the composition text is part-of-speech tagged with spaCy and the numbers of nouns, verbs, adjectives, adverbs, prepositions and conjunctions are counted; the word frequencies of all composition texts are counted and sorted, and the sorted words are divided into 13 forward and 9 reverse rank ranges, the forward ranges being [0:100] (the 100 words with the highest frequency), [100:300] (the 100th to 300th words, i.e. 200 words), [300:600], [600:1000], [1000:1500], [1500:2000], [2000:3000], [3000:5000], [5000:8000], [8000:12000], [12000:17000], [17000:25000] and [25000:-1], and the reverse ranges being [-1:-100] (the 100 words with the lowest frequency), [-100:-200], [-200:-400], [-400:-700], [-700:-1000], [-1500:-2000], [-2000:-3000], [-3000:-5000] and [-5000:-10000]; the number of words of the composition text falling into each range is then counted.
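A possible reading of the rank-range feature is sketched below; the toy corpus is invented, and ties in word frequency are broken arbitrarily:

```python
from collections import Counter

# Forward rank ranges from the patent: slice bounds into the
# frequency-sorted vocabulary ([25000:-1] means "25000th to last").
FORWARD = [(0, 100), (100, 300), (300, 600), (600, 1000), (1000, 1500),
           (1500, 2000), (2000, 3000), (3000, 5000), (5000, 8000),
           (8000, 12000), (12000, 17000), (17000, 25000), (25000, None)]

def rank_bin_features(corpus_words, text_words, ranges=FORWARD):
    """Count how many words of a text fall into each frequency-rank range."""
    freq = Counter(corpus_words)
    vocab = [w for w, _ in freq.most_common()]   # sorted by descending frequency
    rank = {w: i for i, w in enumerate(vocab)}
    feats = []
    for lo, hi in ranges:
        upper = hi if hi is not None else len(vocab)
        feats.append(sum(1 for w in text_words
                         if w in rank and lo <= rank[w] < upper))
    return feats
```

The reverse ranges would index the same sorted vocabulary from the low-frequency end in the same way.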
Step three, basic model training: the shallow features extracted in step two are integrated and an XGBoost model is trained to obtain the basic scoring model;
Step four, text pair construction: taking an examination as a unit, the composition texts in each examination are paired two by two; the label of a pair consisting of one marked off-topic composition and one on-topic composition is set to 1 and all other labels to 0, and equal numbers of pairs with label 1 and label 0 are drawn from each examination as training data;
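Step four might look like the following sketch, where the essay texts and off-topic labels are hypothetical inputs:

```python
from itertools import combinations

def build_pairs(essays, off_topic_ids):
    """Pair every two essays in one examination.

    Label is 1 iff exactly one essay of the pair is off-topic, else 0.
    essays: dict id -> text; off_topic_ids: set of ids marked off-topic.
    """
    pairs = []
    for i, j in combinations(sorted(essays), 2):
        label = 1 if (i in off_topic_ids) != (j in off_topic_ids) else 0
        pairs.append((essays[i], essays[j], label))
    return pairs
```

Balancing (drawing equal numbers of label-1 and label-0 pairs) would be done on the returned list before training.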
Step five, word-vector-level similarity training: GloVe word vectors are used for word embedding, the embeddings are fed into a BiLSTM + Siamese network structure, and a text similarity model is trained with the network configured as follows:
(1) learning rate: 0.001;
(2) optimizer: Adagrad;
(3) BiLSTM combination mode: element-wise addition;
(4) Siamese combination of the two matrices: element-wise subtraction, then concatenation;
(5) batch size: 256;
(6) epochs: 3;
Step six, TF-IDF similarity calculation: taking an examination as a unit, TF-IDF is computed over the text set to extract keywords, and the similarity between every two texts is calculated from the keywords with cosine similarity;
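A minimal TF-IDF plus cosine similarity, written here with the standard library rather than any particular NLP toolkit, could look like:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Minimal TF-IDF over a list of tokenized documents (one exam's texts)."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))      # document frequency
    vocab = sorted(df)
    idf = {w: math.log(n / df[w]) for w in vocab}
    return vocab, [[Counter(d)[w] / len(d) * idf[w] for w in vocab]
                   for d in docs]

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

In practice keyword selection would keep only the top-weighted terms, but the pairwise cosine step is the same.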
Step seven, high-frequency noun similarity calculation: taking an examination as a unit, the texts are part-of-speech tagged with spaCy, noun frequencies are counted, the high-frequency nouns of each examination's texts are taken as its keywords, and the similarity between every two texts is calculated from the keywords with cosine similarity;
Step eight, off-topic model training: with the similarity calculations of steps five, six and seven, the mean, variance, median, maximum and minimum of each text's similarity to the other texts are computed as feature data, and an XGBoost model is trained;
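The five summary statistics of step eight map directly onto Python's statistics module:

```python
from statistics import mean, median, pvariance

def expand(sims):
    """Five summary statistics of one essay's similarities to the others."""
    return [mean(sims), pvariance(sims), median(sims), max(sims), min(sims)]
```

Applying `expand` to each of the three similarity schemes yields fifteen features per essay for the off-topic XGBoost model.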
Step nine, deep feature module: GloVe word embedding is applied to the text, a BiLSTM layer extracts text sequence features, these are fed into a fully connected network, and a softmax layer outputs the score, with the network configured as follows:
(1) learning rate: 0.001;
(2) optimizer: Adagrad;
(3) BiLSTM combination mode: element-wise addition;
(4) batch size: 256;
(5) epochs: 100;
Step ten, writing data preparation: the 4000 composition images are scaled to 224×224 and classified into 1, 2, 3 and 4 (corresponding to "poor", "normal", "good" and "excellent") according to whether the handwriting is attractive, with 1000 images per class.
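Step ten's resizing would normally be done with PIL or OpenCV; purely as a sketch, a nearest-neighbour resize over a NumPy array behaves the same way:

```python
import numpy as np

def resize_nearest(img, size=224):
    """Nearest-neighbour resize of a grayscale image array to size x size.

    A stand-in for the PIL/OpenCV resize a real pipeline would use.
    """
    h, w = img.shape
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return img[rows][:, cols]
```

The resulting 224×224 arrays are what the DenseNet classifier of step eleven consumes.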
Step eleven, writing model training: a densely connected convolutional network (DenseNet) performs four-class training on the labeled patches, and the model is saved;
Step twelve, integrated method flow: the scores and features of the basic scoring, off-topic scoring, writing scoring and semantic scoring modules are integrated, and the score is predicted with an XGBoost integrated training model.
Advantageous effects
The invention provides an English composition scoring method for image recognition with the following beneficial effects: it mainly addresses the low scoring accuracy of English compositions recognized from images, and proposes a scheme of extracting features in multiple ways and integrating them into a training model to predict the score.
Drawings
FIG. 1 is a system architecture diagram of the present invention;
FIG. 2 is a table illustrating the basic features of the present invention;
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the invention is described in further detail below. The embodiments described herein are merely illustrative and are not intended to limit the invention.
As shown in Figure 1, the overall technical scheme of the invention, an English composition scoring method for image recognition, takes the basic, semantic, topic and writing modules as the feature extraction stage, then performs expansion calculation on the feature values and feeds them to an XGBoost model to predict the score.
A basic module: traditional natural language processing is used to count character-level text features, including the numbers of characters, punctuation marks, sentences, paragraphs, words and unique words, the average word length, the average sentence length, the numbers of stop words and non-stop words, the numbers of nouns, verbs, adjectives, adverbs, prepositions and conjunctions, and the word frequency distribution of each word. To count the words of each part of speech, the composition text is tagged with the open-source tool spaCy and the tagged words of each part of speech are counted. To compute the word frequency distribution, the frequency of each word in the existing composition data is counted, all words are sorted by frequency, and the sorted words are divided into 13 forward and 9 reverse rank ranges: the forward ranges are [0:100] (the 100 words with the highest frequency), [100:300] (the 100th to 300th words, i.e. 200 words), [300:600], [600:1000], [1000:1500], [1500:2000], [2000:3000], [3000:5000], [5000:8000], [8000:12000], [12000:17000], [17000:25000] and [25000:-1]; the reverse ranges are [-1:-100] (the 100 words with the lowest frequency), [-100:-200], [-200:-400], [-400:-700], [-700:-1000], [-1500:-2000], [-2000:-3000], [-3000:-5000] and [-5000:-10000]. The number of words of the composition text in each range is then counted as the word frequency distribution feature, and an XGBoost model is trained after all basic features are integrated; the basic feature description is shown in Figure 2.
A semantic module: GloVe word vectors embed the text, a BiLSTM extracts a semantic feature matrix from the text matrix, and the matrix is passed into a fully connected layer and a softmax layer to output the corresponding score.
A topic module: the recognized composition texts are divided into two classes, off-topic and on-topic. Each composition text is combined with the other student compositions in the same examination to form text pairs; pairs consisting of one off-topic and one on-topic composition are manually labeled 1 and all other pairs are labeled 0, forming a text-pair training set. The similarity of each text pair is computed with three schemes. In the first, 50-dimensional GloVe word vectors embed the text into a vector matrix, a BiLSTM extracts a semantic feature matrix from the text matrix as the front input layer of a Siamese network, the semantic feature matrices of the two texts are combined by element-wise addition and element-wise subtraction, and the two results are concatenated and passed through a fully connected layer and a softmax layer to output a similarity value. In the second, taking an examination as a unit, TF-IDF selects a keyword set from the preprocessed text set, one-hot vectors are built from the keyword sets, and the similarity of the text pair is computed with cosine similarity. In the last, spaCy tags the text with parts of speech, the nouns in the text are counted, the high-frequency nouns are selected as the keyword set, one-hot vectors are built, and the similarity of the text pair is computed with cosine similarity. With these three schemes, the mean, variance, median, maximum and minimum of the similarity between each composition and the other compositions in the examination are computed as the expanded feature data of the off-topic module.
A writing module: the composition region image is first scaled into 224×224 patches, which are labeled manually; according to the attractiveness of the handwriting they are divided into four classes, 1, 2, 3 and 4 (corresponding to "poor", "normal", "good" and "excellent"); a densely connected convolutional network (DenseNet) is trained as the classification model and predicts the writing class of each 224×224 patch.
An English composition scoring method aiming at image recognition comprises the following specific steps:
Step one, data preparation: recognized composition texts are prepared by examination (only examinations with more than 50 candidates are used), and each examination is guaranteed to contain an off-topic composition; 4000 scanned images of composition regions are prepared, covering as many answer sheet types of the application scenario as possible;
Step two, shallow feature extraction: each text is preprocessed and the following features are extracted: the numbers of characters, punctuation marks, sentences, paragraphs, words and unique words, the average word length, the average sentence length, and the numbers of stop words and non-stop words; the composition text is part-of-speech tagged with spaCy and the numbers of nouns, verbs, adjectives, adverbs, prepositions and conjunctions are counted; the word frequencies of all composition texts are counted and sorted, and the sorted words are divided into 13 forward and 9 reverse rank ranges, the forward ranges being [0:100] (the 100 words with the highest frequency), [100:300] (the 100th to 300th words, i.e. 200 words), [300:600], [600:1000], [1000:1500], [1500:2000], [2000:3000], [3000:5000], [5000:8000], [8000:12000], [12000:17000], [17000:25000] and [25000:-1], and the reverse ranges being [-1:-100] (the 100 words with the lowest frequency), [-100:-200], [-200:-400], [-400:-700], [-700:-1000], [-1500:-2000], [-2000:-3000], [-3000:-5000] and [-5000:-10000]; the number of words of the composition text falling into each range is then counted.
Step three, basic model training: the shallow features extracted in step two are integrated and an XGBoost model is trained to obtain the basic scoring model;
Step four, text pair construction: taking an examination as a unit, the composition texts in each examination are paired two by two; the label of a pair consisting of one marked off-topic composition and one on-topic composition is set to 1 and all other labels to 0, and equal numbers of pairs with label 1 and label 0 are drawn from each examination as training data;
Step five, word-vector-level similarity training: GloVe word vectors are used for word embedding, the embeddings are fed into a BiLSTM + Siamese network structure, and a text similarity model is trained with the network configured as follows:
(1) learning rate: 0.001;
(2) optimizer: Adagrad;
(3) BiLSTM combination mode: element-wise addition;
(4) Siamese combination of the two matrices: element-wise subtraction, then concatenation;
(5) batch size: 256;
(6) epochs: 3;
Step six, TF-IDF similarity calculation: taking an examination as a unit, TF-IDF is computed over the text set to extract keywords, and the similarity between every two texts is calculated from the keywords with cosine similarity;
Step seven, high-frequency noun similarity calculation: taking an examination as a unit, the texts are part-of-speech tagged with spaCy, noun frequencies are counted, the high-frequency nouns of each examination's texts are taken as its keywords, and the similarity between every two texts is calculated from the keywords with cosine similarity;
Step eight, off-topic model training: with the similarity calculations of steps five, six and seven, the mean, variance, median, maximum and minimum of each text's similarity to the other texts are computed as feature data, and an XGBoost model is trained;
Step nine, deep feature module: GloVe word embedding is applied to the text, a BiLSTM layer extracts text sequence features, these are fed into a fully connected network, and a softmax layer outputs the score, with the network configured as follows:
(1) learning rate: 0.001;
(2) optimizer: Adagrad;
(3) BiLSTM combination mode: element-wise addition;
(4) batch size: 256;
(5) epochs: 100;
Step ten, writing data preparation: the 4000 composition images are scaled to 224×224 and classified into 1, 2, 3 and 4 (corresponding to "poor", "normal", "good" and "excellent") according to whether the handwriting is attractive, with 1000 images per class.
Step eleven, writing model training: a densely connected convolutional network (DenseNet) performs four-class training on the labeled patches, and the model is saved;
Step twelve, integrated method flow: the scores and features of the basic scoring, off-topic scoring, writing scoring and semantic scoring modules are integrated, and the score is predicted with an XGBoost integrated training model.
The method provides an English composition scoring method for image recognition; when simulating a teacher's scoring, it comprehensively considers factors such as semantics, writing and topic, computing a final score with statistical, deep learning and machine learning techniques, and thus predicts composition scores more accurately.
The above description is provided only for explanation and should not be construed as limiting the invention; any modification, equivalent replacement or improvement made within the spirit and principle of the present invention falls within its scope.

Claims (7)

1. The English composition scoring method for image recognition is characterized in that features of four aspects, basic text, topic content, deep semantics and handwriting in the image, are extracted from the composition text, and a final score is calculated with XGBoost integration. The method mainly comprises: a basic feature module, a topic feature module, a writing feature module, a semantic feature module, and an integrated prediction module.
2. The method for scoring English compositions according to claim 1, wherein the basic feature module is specifically described as: text features are extracted at the character level, mainly comprising 16-dimensional features, the numbers of characters, punctuation marks, sentences, paragraphs, words and unique words, the average word length, the average sentence length, and the numbers of stop words, non-stop words, nouns, verbs, adjectives, adverbs, prepositions and conjunctions, plus 22-dimensional word frequency distribution features.
3. The method for scoring English compositions according to claim 2, wherein the 22-dimensional word frequency distribution features are specifically described as: word frequencies are counted over the existing composition data set, the words are ordered from high to low frequency, and the ordered words are divided into 13 forward and 9 reverse rank ranges, the forward ranges being [0:100] (the 100 words with the highest frequency), [100:300] (the 100th to 300th words, i.e. 200 words), [300:600], [600:1000], [1000:1500], [1500:2000], [2000:3000], [3000:5000], [5000:8000], [8000:12000], [12000:17000], [17000:25000] and [25000:-1], and the reverse ranges being [-1:-100] (the 100 words with the lowest frequency), [-100:-200], [-200:-400], [-400:-700], [-700:-1000], [-1500:-2000], [-2000:-3000], [-3000:-5000] and [-5000:-10000]; the number of words of the text falling into each of the 22 ranges is counted as the 22-dimensional word frequency feature.
4. The method for scoring English compositions according to claim 1, wherein the topic feature module is specifically described as: GloVe word vectors embed the text, and the similarity of two texts is calculated by extracting text matrix features with a BiLSTM combined with a Siamese network structure; keywords of the examination's composition text set are extracted with TF-IDF, and the similarity of two texts is calculated with cosine similarity; the text is part-of-speech tagged with spaCy, high-frequency nouns of the composition text set are counted as keywords, and the similarity of two texts is calculated with cosine similarity; every two composition texts in the examination are compared by the three similarity schemes, and the similarity results of each composition text against the other composition texts are accumulated as that text's topic features.
5. The English composition scoring method for image recognition according to claim 1, wherein the writing feature module is specifically described as follows: the composition picture is scaled to 224 × 224; the 224 × 224 images are labeled with four categories, "poor", "normal", "good" and "excellent"; and a DenseNet161 classification model is trained on the labeled data and used to predict a writing score for a composition picture.
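The claim specifies only the four quality categories and the DenseNet161 classifier; one plausible way to turn the classifier's class probabilities into a scalar writing score is the expected value over per-category scores. The 1-4 mapping below is a hypothetical choice, not stated in the patent:

```python
def writing_score(class_probs, category_scores=(1.0, 2.0, 3.0, 4.0)):
    """Expected writing score from the four-class output probabilities
    ('poor', 'normal', 'good', 'excellent'), in classifier output order.
    The numeric category scores are an illustrative assumption."""
    return sum(p * s for p, s in zip(class_probs, category_scores))
```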
6. The English composition scoring method according to claim 1, wherein the semantic feature module is specifically described as follows: the text is embedded with GloVe word vectors; a BiLSTM extracts the text feature matrix, which is fed into a fully connected layer; and text classification is performed with softmax.
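The classification head of claim 6 ends in a softmax over the fully connected layer's logits; a numerically stable stand-alone version (the BiLSTM and fully connected layer themselves are omitted):

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before
    exponentiating so large logits cannot overflow."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```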
7. The method according to claim 1, wherein the integrated prediction module is specifically described as follows: the multidimensional features output by the four modules, namely the basic module, the topic module, the writing module and the semantic module, are concatenated as the input of an XGBoost model to produce the final prediction.
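A minimal sketch of the fusion step in claim 7; only the feature concatenation is shown, with the XGBoost training call itself omitted since the claim gives no hyperparameters:

```python
def fuse_features(basic, topic, writing, semantic):
    """Concatenate the four modules' output vectors into one feature row
    for the final gradient-boosted predictor (XGBoost in the claim)."""
    return [float(x) for vec in (basic, topic, writing, semantic) for x in vec]
```

Rows produced this way would form the training matrix passed to the boosted model together with the human-assigned scores.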
CN202011515007.9A 2020-12-21 2020-12-21 English composition scoring method aiming at image recognition Pending CN112580333A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011515007.9A CN112580333A (en) 2020-12-21 2020-12-21 English composition scoring method aiming at image recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011515007.9A CN112580333A (en) 2020-12-21 2020-12-21 English composition scoring method aiming at image recognition

Publications (1)

Publication Number Publication Date
CN112580333A true CN112580333A (en) 2021-03-30

Family

ID=75136556

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011515007.9A Pending CN112580333A (en) 2020-12-21 2020-12-21 English composition scoring method aiming at image recognition

Country Status (1)

Country Link
CN (1) CN112580333A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929674A (en) * 2019-11-29 2020-03-27 安徽七天教育科技有限公司 Writing rating method for composition area in test paper scanning image
CN111079582A (en) * 2019-11-29 2020-04-28 安徽七天教育科技有限公司 Image recognition English composition running question judgment method
CN111104789A (en) * 2019-11-22 2020-05-05 华中师范大学 Text scoring method, device and system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LÜ XIN; CHENG YUXIA: "Research on an Intelligent Evaluation Framework for English Compositions Based on Semantic Similarity and the XGBoost Algorithm", Journal of Zhejiang University (Science Edition), no. 03, 15 May 2020 (2020-05-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343672A (en) * 2021-06-21 2021-09-03 哈尔滨工业大学 Unsupervised bilingual dictionary construction method based on corpus merging
CN113343672B (en) * 2021-06-21 2022-12-16 哈尔滨工业大学 Unsupervised bilingual dictionary construction method based on corpus merging

Similar Documents

Publication Publication Date Title
CN108287822B (en) Chinese similarity problem generation system and method
CN108363743B (en) Intelligent problem generation method and device and computer readable storage medium
CN110598203B (en) Method and device for extracting entity information of military design document combined with dictionary
CN112115238A (en) Question-answering method and system based on BERT and knowledge base
CN111709242B (en) Chinese punctuation mark adding method based on named entity recognition
CN111694927B (en) Automatic document review method based on improved word shift distance algorithm
Li et al. Word embedding and text classification based on deep learning methods
CN109033064B (en) Primary school Chinese composition corpus label automatic extraction method based on text abstract
CN111309891B (en) System for reading robot to automatically ask and answer questions and application method thereof
CN106446147A (en) Emotion analysis method based on structuring features
CN113065341A (en) Automatic labeling and classifying method for environmental complaint report text
Hughes Automatically acquiring a classification of words
Nugraha et al. Typographic-based data augmentation to improve a question retrieval in short dialogue system
CN107894976A (en) A kind of mixing language material segmenting method based on Bi LSTM
Shashirekha et al. CoLI-machine learning approaches for code-mixed language identification at the word level in Kannada-English texts
CN113934814A (en) Automatic scoring method for subjective questions of ancient poetry
CN113761128A (en) Event key information extraction method combining domain synonym dictionary and pattern matching
CN112580333A (en) English composition scoring method aiming at image recognition
CN112860781A (en) Mining and displaying method combining vocabulary collocation extraction and semantic classification
CN111881685A (en) Small-granularity strategy mixed model-based Chinese named entity identification method and system
CN110705306B (en) Evaluation method for consistency of written and written texts
CN111079582A (en) Image recognition English composition running question judgment method
CN115952794A (en) Chinese-Tai cross-language sensitive information recognition method fusing bilingual sensitive dictionary and heterogeneous graph
CN115630140A (en) English reading material difficulty judgment method based on text feature fusion
CN114579706A (en) Automatic subjective question evaluation method based on BERT neural network and multitask learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination