CN112580333A - English composition scoring method aiming at image recognition - Google Patents

English composition scoring method aiming at image recognition

Info

Publication number
CN112580333A
CN112580333A (application CN202011515007.9A)
Authority
CN
China
Prior art keywords
text
composition
words
module
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011515007.9A
Other languages
Chinese (zh)
Inventor
侯冲
李哲
陈家海
叶家鸣
吴波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Anhui Seven Day Education Technology Co ltd
Original Assignee
Anhui Seven Day Education Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Anhui Seven Day Education Technology Co ltd filed Critical Anhui Seven Day Education Technology Co ltd
Priority to CN202011515007.9A priority Critical patent/CN112580333A/en
Publication of CN112580333A publication Critical patent/CN112580333A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The invention discloses an English composition scoring method for image recognition, and relates to the field of text scoring. For the problem of scoring English compositions recognized from scanned paper, a solution is proposed in which multiple modules extract features and an XGBoost model maps the features to a final score. The scheme comprises the following modules: a basic score module, a semantic score module, a topic score module, a writing score module and an integrated score module. Each module extracts features with traditional NLP processing, deep learning, machine learning and other methods, so that student compositions can be analyzed thoroughly, and scoring accuracy is improved by integrating the weight of each feature.

Description

English composition scoring method aiming at image recognition
Technical Field
The invention belongs to the technical field of text scoring, and particularly relates to a composition scoring method based on multi-dimensional analysis.
Background
Since the 1930s and 1940s, many emerging technologies have appeared, among them the computer and the internet. As the saying goes, "science and technology is the first productivity", and the development of internet technology has driven change across industries. Online exam marking and automatic correction are products of the internet's impact on the education industry. Automatic online marking now covers all school stages from primary to senior high school and subjects such as Chinese, mathematics and English. English is usually the earliest subject researchers tackle, and composition correction is central to it. Although research on automatic English composition scoring goes back decades, scoring accuracy is still not satisfactory, and further research and improvement of automatic English composition correction is much needed.
Existing composition scoring methods mainly score from shallow features or from the similarity between the composition text and already-scored compositions. Extracting simple shallow features from the composition text and fitting a regression with machine learning against the score distribution of each feature can fit composition scores to some extent, but the features are too simple to reflect the composition comprehensively, so the error is large. Scoring by similarity to scored compositions is sometimes more accurate, but once the composition topic differs too much from the scored corpus, scoring quality drops off sharply. Much work remains to be done on composition scoring.
Disclosure of Invention
The technical problem to be solved is as follows:
the problem that the scores of the identified composition texts are not accurate enough is solved, and a method for scoring the composition texts based on multi-module omnibearing analysis and identification is provided.
The technical scheme is as follows:
To achieve this purpose, the English composition scoring method for image recognition uses multiple modules to extract features of different dimensions; after expansion calculation of these features, an XGBoost model is trained on all the features together, and the trained model finally predicts the composition score. The method comprises four feature modules, covering basic, topic, writing and semantic features, and an integrated prediction module.
Preferably, the basic feature module is specifically described as follows: traditional natural language processing is used to count character-level text features, including the numbers of characters, punctuation marks, sentences, paragraphs, words and unique words, the average word length, the average sentence length, the numbers of stop words and non-stop words, the numbers of nouns, verbs, adjectives, adverbs, prepositions and conjunctions, and the word frequency distribution of each word. To count the words of each part of speech, the composition text is tagged with the open-source tool spaCy and the tagged words of each part of speech are counted. To compute the word frequency distribution, the frequency of each word in the existing composition data is counted, all words are sorted by frequency, and the sorted words are divided into 13 forward and 9 reverse rank ranges: the forward ranges are [0:100] (the 100 words with the highest frequency), [100:300] (the 100th to 300th words, i.e. 200 words), [300:600], [600:1000], [1000:1500], [1500:2000], [2000:3000], [3000:5000], [5000:8000], [8000:12000], [12000:17000], [17000:25000] and [25000:-1]; the reverse ranges are [-1:-100] (the 100 words with the lowest frequency), [-100:-200], [-200:-400], [-400:-700], [-700:-1000], [-1500:-2000], [-2000:-3000], [-3000:-5000] and [-5000:-10000]. The number of words of the composition text falling into each range is then counted as the word frequency distribution feature, and an XGBoost model is trained after all basic features are integrated.
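As an illustration only (not code from the patent), the character-level statistics above can be sketched in plain Python; the tokenization here is deliberately naive, and the spaCy part-of-speech counts are only indicated in a comment:

```python
import re
import string

def shallow_features(text):
    """Count character-level statistics of a composition text.

    A simplified sketch: a real pipeline would use proper tokenization,
    stop-word lists, and spaCy part-of-speech tags for the noun/verb/
    adjective/adverb/preposition/conjunction counts.
    """
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    paragraphs = [p for p in text.split("\n") if p.strip()]
    return {
        "n_chars": len(text),
        "n_punct": sum(ch in string.punctuation for ch in text),
        "n_sentences": len(sentences),
        "n_paragraphs": len(paragraphs),
        "n_words": len(words),
        "n_unique_words": len({w.lower() for w in words}),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
        "avg_sent_len": len(words) / max(len(sentences), 1),
    }
```

The returned dictionary corresponds to the first ten of the sixteen basic dimensions; the six part-of-speech counts would be appended after tagging.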
Preferably, the topic feature module is specifically described as follows: the recognized composition texts are divided into two classes, off-topic and on-topic. Each composition text is combined with the other student compositions in the same examination to form text pairs; pairs consisting of one off-topic and one on-topic composition are manually labeled 1 and all other pairs are labeled 0, forming a text-pair training set. The similarity of each text pair is computed with three schemes. In the first, 50-dimensional GloVe word vectors embed the text into a vector matrix, a BiLSTM extracts a semantic feature matrix from the text matrix as the front input layer of a Siamese network, the semantic feature matrices of the two texts are combined by element-wise addition and element-wise subtraction, and the two results are concatenated and passed through a fully connected layer and a softmax layer to output a similarity value. In the second, taking an examination as a unit, TF-IDF selects a keyword set from the preprocessed text set, one-hot vectors are built from the keyword sets, and the similarity of the text pair is computed with cosine similarity. In the last, spaCy first tags the text with parts of speech, the nouns in the text are counted, the high-frequency nouns are selected as the keyword set, one-hot vectors are built, and the similarity of the text pair is computed with cosine similarity. With these three schemes, the mean, variance, median, maximum and minimum of the similarity between each composition and the other compositions in the examination are computed as the expanded feature data of the off-topic module.
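The one-hot keyword schemes above can be illustrated with a small hypothetical example; the keyword list and texts below are invented for the sketch:

```python
import math

def onehot(keywords, text_words):
    """One-hot vector over a fixed keyword set: 1 if the keyword occurs."""
    present = set(text_words)
    return [1 if k in present else 0 for k in keywords]

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical keyword set extracted by TF-IDF or high-frequency nouns
keywords = ["school", "teacher", "holiday", "travel"]
a = onehot(keywords, "my teacher at school".split())
b = onehot(keywords, "my holiday travel to school".split())
sim = cosine(a, b)  # similarity between the two compositions
```

The same `cosine` call serves both the TF-IDF scheme and the high-frequency-noun scheme; only the keyword selection differs.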
Preferably, the writing feature module is specifically described as follows: the composition region image is first scaled into 224×224 patches, which are labeled manually; according to the attractiveness of the handwriting they are divided into four classes, 1, 2, 3 and 4 (corresponding to "poor", "normal", "good" and "excellent"). A densely connected convolutional network (DenseNet) is trained as the classification model and predicts the writing class of each 224×224 patch. The DenseNet is configured as follows:
(1) learning rate: 0.01, with decay rate 0.9;
(2) optimizer: Adagrad;
(3) batch size: 32;
(4) epochs: 50;
Preferably, the semantic module is specifically described as follows: GloVe word vectors embed the text, a BiLSTM extracts a semantic feature matrix from the text matrix, and the matrix is passed into a fully connected layer and a softmax layer to output the corresponding score.
Preferably, the integrated prediction module is specifically described as follows: the multi-dimensional features output by the basic, topic, writing and semantic modules are integrated as the input of an XGBoost model for final prediction.
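The integration step can be pictured as simple feature concatenation. The block below uses random placeholder features and invented per-module dimensionalities (38/15/4/1); the actual XGBoost fit is left as a comment, since the patent does not give its hyperparameters:

```python
import numpy as np

# Hypothetical per-module feature blocks for 5 compositions
basic = np.random.rand(5, 38)     # e.g. 16 shallow + 22 frequency dims
topic = np.random.rand(5, 15)     # e.g. 3 schemes x 5 summary statistics
writing = np.random.rand(5, 4)    # e.g. DenseNet class probabilities
semantic = np.random.rand(5, 1)   # e.g. BiLSTM score

# Concatenate all module outputs into one design matrix
X = np.hstack([basic, topic, writing, semantic])

# The patent then trains an XGBoost model on X, e.g.:
# model = xgboost.XGBRegressor().fit(X, scores)
```

Each row of `X` is one composition; each module contributes a contiguous block of columns.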
An English composition scoring method aiming at image recognition comprises the following specific steps:
Step one, data preparation: recognized composition texts are prepared by examination (only examinations with more than 50 candidates are used), and each examination is guaranteed to contain an off-topic composition; 4000 scanned images of composition regions are prepared, covering as many answer sheet types of the application scenario as possible;
Step two, shallow feature extraction: each text is preprocessed and the following features are extracted: the numbers of characters, punctuation marks, sentences, paragraphs, words and unique words, the average word length, the average sentence length, and the numbers of stop words and non-stop words; the composition text is part-of-speech tagged with spaCy and the numbers of nouns, verbs, adjectives, adverbs, prepositions and conjunctions are counted; the word frequencies of all composition texts are counted and sorted, and the sorted words are divided into 13 forward and 9 reverse rank ranges, the forward ranges being [0:100] (the 100 words with the highest frequency), [100:300] (the 100th to 300th words, i.e. 200 words), [300:600], [600:1000], [1000:1500], [1500:2000], [2000:3000], [3000:5000], [5000:8000], [8000:12000], [12000:17000], [17000:25000] and [25000:-1], and the reverse ranges being [-1:-100] (the 100 words with the lowest frequency), [-100:-200], [-200:-400], [-400:-700], [-700:-1000], [-1500:-2000], [-2000:-3000], [-3000:-5000] and [-5000:-10000]; the number of words of the composition text falling into each range is then counted.
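A possible reading of the rank-range feature is sketched below; the toy corpus is invented, and ties in word frequency are broken arbitrarily:

```python
from collections import Counter

# Forward rank ranges from the patent: slice bounds into the
# frequency-sorted vocabulary ([25000:-1] means "25000th to last").
FORWARD = [(0, 100), (100, 300), (300, 600), (600, 1000), (1000, 1500),
           (1500, 2000), (2000, 3000), (3000, 5000), (5000, 8000),
           (8000, 12000), (12000, 17000), (17000, 25000), (25000, None)]

def rank_bin_features(corpus_words, text_words, ranges=FORWARD):
    """Count how many words of a text fall into each frequency-rank range."""
    freq = Counter(corpus_words)
    vocab = [w for w, _ in freq.most_common()]   # sorted by descending frequency
    rank = {w: i for i, w in enumerate(vocab)}
    feats = []
    for lo, hi in ranges:
        upper = hi if hi is not None else len(vocab)
        feats.append(sum(1 for w in text_words
                         if w in rank and lo <= rank[w] < upper))
    return feats
```

The reverse ranges would index the same sorted vocabulary from the low-frequency end in the same way.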
Step three, basic model training: the shallow features extracted in step two are integrated and an XGBoost model is trained to obtain the basic scoring model;
Step four, text pair construction: taking an examination as a unit, the composition texts in each examination are paired two by two; the label of a pair consisting of one marked off-topic composition and one on-topic composition is set to 1 and all other labels to 0, and equal numbers of pairs with label 1 and label 0 are drawn from each examination as training data;
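Step four might look like the following sketch, where the essay texts and off-topic labels are hypothetical inputs:

```python
from itertools import combinations

def build_pairs(essays, off_topic_ids):
    """Pair every two essays in one examination.

    Label is 1 iff exactly one essay of the pair is off-topic, else 0.
    essays: dict id -> text; off_topic_ids: set of ids marked off-topic.
    """
    pairs = []
    for i, j in combinations(sorted(essays), 2):
        label = 1 if (i in off_topic_ids) != (j in off_topic_ids) else 0
        pairs.append((essays[i], essays[j], label))
    return pairs
```

Balancing (drawing equal numbers of label-1 and label-0 pairs) would be done on the returned list before training.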
Step five, word-vector-level similarity training: GloVe word vectors are used for word embedding, the embeddings are fed into a BiLSTM + Siamese network structure, and a text similarity model is trained with the network configured as follows:
(1) learning rate: 0.001;
(2) optimizer: Adagrad;
(3) BiLSTM combination mode: element-wise addition;
(4) Siamese combination of the two matrices: element-wise subtraction, then concatenation;
(5) batch size: 256;
(6) epochs: 3;
Step six, TF-IDF similarity calculation: taking an examination as a unit, TF-IDF is computed over the text set to extract keywords, and the similarity between every two texts is calculated from the keywords with cosine similarity;
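A minimal TF-IDF plus cosine similarity, written here with the standard library rather than any particular NLP toolkit, could look like:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Minimal TF-IDF over a list of tokenized documents (one exam's texts)."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))      # document frequency
    vocab = sorted(df)
    idf = {w: math.log(n / df[w]) for w in vocab}
    return vocab, [[Counter(d)[w] / len(d) * idf[w] for w in vocab]
                   for d in docs]

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

In practice keyword selection would keep only the top-weighted terms, but the pairwise cosine step is the same.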
Step seven, high-frequency noun similarity calculation: taking an examination as a unit, the texts are part-of-speech tagged with spaCy, noun frequencies are counted, the high-frequency nouns of each examination's texts are taken as its keywords, and the similarity between every two texts is calculated from the keywords with cosine similarity;
Step eight, off-topic model training: with the similarity calculations of steps five, six and seven, the mean, variance, median, maximum and minimum of each text's similarity to the other texts are computed as feature data, and an XGBoost model is trained;
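The five summary statistics of step eight map directly onto Python's statistics module:

```python
from statistics import mean, median, pvariance

def expand(sims):
    """Five summary statistics of one essay's similarities to the others."""
    return [mean(sims), pvariance(sims), median(sims), max(sims), min(sims)]
```

Applying `expand` to each of the three similarity schemes yields fifteen features per essay for the off-topic XGBoost model.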
Step nine, deep feature module: GloVe word embedding is applied to the text, a BiLSTM layer extracts text sequence features, these are fed into a fully connected network, and a softmax layer outputs the score, with the network configured as follows:
(1) learning rate: 0.001;
(2) optimizer: Adagrad;
(3) BiLSTM combination mode: element-wise addition;
(4) batch size: 256;
(5) epochs: 100;
Step ten, writing data preparation: the 4000 composition images are scaled to 224×224 and classified into 1, 2, 3 and 4 (corresponding to "poor", "normal", "good" and "excellent") according to whether the handwriting is attractive, with 1000 images per class.
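Step ten's resizing would normally be done with PIL or OpenCV; purely as a sketch, a nearest-neighbour resize over a NumPy array behaves the same way:

```python
import numpy as np

def resize_nearest(img, size=224):
    """Nearest-neighbour resize of a grayscale image array to size x size.

    A stand-in for the PIL/OpenCV resize a real pipeline would use.
    """
    h, w = img.shape
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return img[rows][:, cols]
```

The resulting 224×224 arrays are what the DenseNet classifier of step eleven consumes.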
Step eleven, writing model training: a densely connected convolutional network (DenseNet) performs four-class training on the labeled patches, and the model is saved;
Step twelve, integrated method flow: the scores and features of the basic scoring, off-topic scoring, writing scoring and semantic scoring modules are integrated, and the score is predicted with an XGBoost integrated training model.
Advantageous effects
The invention provides an English composition scoring method for image recognition with the following beneficial effects: it mainly addresses the low scoring accuracy of English compositions recognized from images, and proposes a scheme of extracting features in multiple ways and integrating them into a training model to predict the score.
Drawings
FIG. 1 is a system architecture diagram of the present invention;
FIG. 2 is a table illustrating the basic features of the present invention;
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the invention is described in further detail below. The embodiments described herein are merely illustrative and are not intended to limit the invention.
As shown in Figure 1, the overall technical scheme of the invention, an English composition scoring method for image recognition, takes the basic, semantic, topic and writing modules as the feature extraction stage, then performs expansion calculation on the feature values and feeds them to an XGBoost model to predict the score.
A basic module: traditional natural language processing is used to count character-level text features, including the numbers of characters, punctuation marks, sentences, paragraphs, words and unique words, the average word length, the average sentence length, the numbers of stop words and non-stop words, the numbers of nouns, verbs, adjectives, adverbs, prepositions and conjunctions, and the word frequency distribution of each word. To count the words of each part of speech, the composition text is tagged with the open-source tool spaCy and the tagged words of each part of speech are counted. To compute the word frequency distribution, the frequency of each word in the existing composition data is counted, all words are sorted by frequency, and the sorted words are divided into 13 forward and 9 reverse rank ranges: the forward ranges are [0:100] (the 100 words with the highest frequency), [100:300] (the 100th to 300th words, i.e. 200 words), [300:600], [600:1000], [1000:1500], [1500:2000], [2000:3000], [3000:5000], [5000:8000], [8000:12000], [12000:17000], [17000:25000] and [25000:-1]; the reverse ranges are [-1:-100] (the 100 words with the lowest frequency), [-100:-200], [-200:-400], [-400:-700], [-700:-1000], [-1500:-2000], [-2000:-3000], [-3000:-5000] and [-5000:-10000]. The number of words of the composition text in each range is then counted as the word frequency distribution feature, and an XGBoost model is trained after all basic features are integrated; the basic feature description is shown in Figure 2.
A semantic module: GloVe word vectors embed the text, a BiLSTM extracts a semantic feature matrix from the text matrix, and the matrix is passed into a fully connected layer and a softmax layer to output the corresponding score.
A topic module: the recognized composition texts are divided into two classes, off-topic and on-topic. Each composition text is combined with the other student compositions in the same examination to form text pairs; pairs consisting of one off-topic and one on-topic composition are manually labeled 1 and all other pairs are labeled 0, forming a text-pair training set. The similarity of each text pair is computed with three schemes. In the first, 50-dimensional GloVe word vectors embed the text into a vector matrix, a BiLSTM extracts a semantic feature matrix from the text matrix as the front input layer of a Siamese network, the semantic feature matrices of the two texts are combined by element-wise addition and element-wise subtraction, and the two results are concatenated and passed through a fully connected layer and a softmax layer to output a similarity value. In the second, taking an examination as a unit, TF-IDF selects a keyword set from the preprocessed text set, one-hot vectors are built from the keyword sets, and the similarity of the text pair is computed with cosine similarity. In the last, spaCy tags the text with parts of speech, the nouns in the text are counted, the high-frequency nouns are selected as the keyword set, one-hot vectors are built, and the similarity of the text pair is computed with cosine similarity. With these three schemes, the mean, variance, median, maximum and minimum of the similarity between each composition and the other compositions in the examination are computed as the expanded feature data of the off-topic module.
A writing module: the composition region image is first scaled into 224×224 patches, which are labeled manually; according to the attractiveness of the handwriting they are divided into four classes, 1, 2, 3 and 4 (corresponding to "poor", "normal", "good" and "excellent"); a densely connected convolutional network (DenseNet) is trained as the classification model and predicts the writing class of each 224×224 patch.
An English composition scoring method aiming at image recognition comprises the following specific steps:
Step one, data preparation: recognized composition texts are prepared by examination (only examinations with more than 50 candidates are used), and each examination is guaranteed to contain an off-topic composition; 4000 scanned images of composition regions are prepared, covering as many answer sheet types of the application scenario as possible;
Step two, shallow feature extraction: each text is preprocessed and the following features are extracted: the numbers of characters, punctuation marks, sentences, paragraphs, words and unique words, the average word length, the average sentence length, and the numbers of stop words and non-stop words; the composition text is part-of-speech tagged with spaCy and the numbers of nouns, verbs, adjectives, adverbs, prepositions and conjunctions are counted; the word frequencies of all composition texts are counted and sorted, and the sorted words are divided into 13 forward and 9 reverse rank ranges, the forward ranges being [0:100] (the 100 words with the highest frequency), [100:300] (the 100th to 300th words, i.e. 200 words), [300:600], [600:1000], [1000:1500], [1500:2000], [2000:3000], [3000:5000], [5000:8000], [8000:12000], [12000:17000], [17000:25000] and [25000:-1], and the reverse ranges being [-1:-100] (the 100 words with the lowest frequency), [-100:-200], [-200:-400], [-400:-700], [-700:-1000], [-1500:-2000], [-2000:-3000], [-3000:-5000] and [-5000:-10000]; the number of words of the composition text falling into each range is then counted.
Step three, basic model training: the shallow features extracted in step two are integrated and an XGBoost model is trained to obtain the basic scoring model;
Step four, text pair construction: taking an examination as a unit, the composition texts in each examination are paired two by two; the label of a pair consisting of one marked off-topic composition and one on-topic composition is set to 1 and all other labels to 0, and equal numbers of pairs with label 1 and label 0 are drawn from each examination as training data;
Step five, word-vector-level similarity training: GloVe word vectors are used for word embedding, the embeddings are fed into a BiLSTM + Siamese network structure, and a text similarity model is trained with the network configured as follows:
(1) learning rate: 0.001;
(2) optimizer: Adagrad;
(3) BiLSTM combination mode: element-wise addition;
(4) Siamese combination of the two matrices: element-wise subtraction, then concatenation;
(5) batch size: 256;
(6) epochs: 3;
Step six, TF-IDF similarity calculation: taking an examination as a unit, TF-IDF is computed over the text set to extract keywords, and the similarity between every two texts is calculated from the keywords with cosine similarity;
Step seven, high-frequency noun similarity calculation: taking an examination as a unit, the texts are part-of-speech tagged with spaCy, noun frequencies are counted, the high-frequency nouns of each examination's texts are taken as its keywords, and the similarity between every two texts is calculated from the keywords with cosine similarity;
Step eight, off-topic model training: with the similarity calculations of steps five, six and seven, the mean, variance, median, maximum and minimum of each text's similarity to the other texts are computed as feature data, and an XGBoost model is trained;
Step nine, deep feature module: GloVe word embedding is applied to the text, a BiLSTM layer extracts text sequence features, these are fed into a fully connected network, and a softmax layer outputs the score, with the network configured as follows:
(1) learning rate: 0.001;
(2) optimizer: Adagrad;
(3) BiLSTM combination mode: element-wise addition;
(4) batch size: 256;
(5) epochs: 100;
Step ten, writing data preparation: the 4000 composition images are scaled to 224×224 and classified into 1, 2, 3 and 4 (corresponding to "poor", "normal", "good" and "excellent") according to whether the handwriting is attractive, with 1000 images per class.
Step eleven, writing model training: a densely connected convolutional network (DenseNet) performs four-class training on the labeled patches, and the model is saved;
Step twelve, integrated method flow: the scores and features of the basic scoring, off-topic scoring, writing scoring and semantic scoring modules are integrated, and the score is predicted with an XGBoost integrated training model.
The method provides an English composition scoring method for image recognition; when simulating a teacher's scoring, it comprehensively considers factors such as semantics, writing and topic, computing a final score with statistical, deep learning and machine learning techniques, and thus predicts composition scores more accurately.
The above description is provided only for explanation and should not be construed as limiting the invention; any modification, equivalent replacement or improvement made within the spirit and principle of the present invention falls within its scope.

Claims (7)

1. The English composition scoring method for image recognition is characterized in that features of four aspects, basic text, topic content, deep semantics and handwriting in the image, are extracted from the composition text, and a final score is calculated with XGBoost integration. The method mainly comprises: a basic feature module, a topic feature module, a writing feature module, a semantic feature module, and an integrated prediction module.
2. The method for scoring English compositions according to claim 1, wherein the basic feature module is specifically described as: text features are extracted at the character level, mainly comprising 16-dimensional features, the numbers of characters, punctuation marks, sentences, paragraphs, words and unique words, the average word length, the average sentence length, and the numbers of stop words, non-stop words, nouns, verbs, adjectives, adverbs, prepositions and conjunctions, plus 22-dimensional word frequency distribution features.
3. The method for scoring English compositions according to claim 2, wherein the 22-dimensional word frequency distribution features are specifically described as: word frequencies are counted over the existing composition data set, the words are ordered from high to low frequency, and the ordered words are divided into 13 forward and 9 reverse rank ranges, the forward ranges being [0:100] (the 100 words with the highest frequency), [100:300] (the 100th to 300th words, i.e. 200 words), [300:600], [600:1000], [1000:1500], [1500:2000], [2000:3000], [3000:5000], [5000:8000], [8000:12000], [12000:17000], [17000:25000] and [25000:-1], and the reverse ranges being [-1:-100] (the 100 words with the lowest frequency), [-100:-200], [-200:-400], [-400:-700], [-700:-1000], [-1500:-2000], [-2000:-3000], [-3000:-5000] and [-5000:-10000]; the number of words of the text falling into each of the 22 ranges is counted as the 22-dimensional word frequency feature.
4. The method for scoring English compositions according to claim 1, wherein the topic feature module is specifically described as: GloVe word vectors embed the text, and the similarity of two texts is calculated by extracting text matrix features with a BiLSTM combined with a Siamese network structure; keywords of the examination's composition text set are extracted with TF-IDF, and the similarity of two texts is calculated with cosine similarity; the text is part-of-speech tagged with spaCy, high-frequency nouns of the composition text set are counted as keywords, and the similarity of two texts is calculated with cosine similarity; every two composition texts in the examination are compared by the three similarity schemes, and the similarity results of each composition text against the other composition texts are accumulated as that text's topic features.
5. The English composition scoring method for image recognition according to claim 1, wherein the writing feature module is specifically described as follows: the composition picture is scaled to 224 × 224; the 224 × 224 images are labeled with four categories, "poor", "normal", "good" and "excellent"; and a DenseNet161 classification model is trained on the labeled data and used to predict a writing score for a composition picture.
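The claim specifies only the four quality categories and the DenseNet161 classifier; one plausible way to turn the classifier's class probabilities into a scalar writing score is the expected value over per-category scores. The 1-4 mapping below is a hypothetical choice, not stated in the patent:

```python
def writing_score(class_probs, category_scores=(1.0, 2.0, 3.0, 4.0)):
    """Expected writing score from the four-class output probabilities
    ('poor', 'normal', 'good', 'excellent'), in classifier output order.
    The numeric category scores are an illustrative assumption."""
    return sum(p * s for p, s in zip(class_probs, category_scores))
```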
6. The English composition scoring method according to claim 1, wherein the semantic feature module is specifically described as follows: the text is embedded with GloVe word vectors; a BiLSTM extracts the text feature matrix, which is fed into a fully connected layer; and text classification is performed with softmax.
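The classification head of claim 6 ends in a softmax over the fully connected layer's logits; a numerically stable stand-alone version (the BiLSTM and fully connected layer themselves are omitted):

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max logit before
    exponentiating so large logits cannot overflow."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]
```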
7. The method according to claim 1, wherein the integrated prediction module is specifically described as follows: the multidimensional features output by the four modules, namely the basic module, the topic module, the writing module and the semantic module, are concatenated as the input of an XGBoost model to produce the final prediction.
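A minimal sketch of the fusion step in claim 7; only the feature concatenation is shown, with the XGBoost training call itself omitted since the claim gives no hyperparameters:

```python
def fuse_features(basic, topic, writing, semantic):
    """Concatenate the four modules' output vectors into one feature row
    for the final gradient-boosted predictor (XGBoost in the claim)."""
    return [float(x) for vec in (basic, topic, writing, semantic) for x in vec]
```

Rows produced this way would form the training matrix passed to the boosted model together with the human-assigned scores.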
CN202011515007.9A 2020-12-21 2020-12-21 English composition scoring method aiming at image recognition Pending CN112580333A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011515007.9A CN112580333A (en) 2020-12-21 2020-12-21 English composition scoring method aiming at image recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011515007.9A CN112580333A (en) 2020-12-21 2020-12-21 English composition scoring method aiming at image recognition

Publications (1)

Publication Number Publication Date
CN112580333A true CN112580333A (en) 2021-03-30

Family

ID=75136556

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011515007.9A Pending CN112580333A (en) 2020-12-21 2020-12-21 English composition scoring method aiming at image recognition

Country Status (1)

Country Link
CN (1) CN112580333A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929674A (en) * 2019-11-29 2020-03-27 安徽七天教育科技有限公司 Writing rating method for composition area in test paper scanning image
CN111079582A (en) * 2019-11-29 2020-04-28 安徽七天教育科技有限公司 Image recognition English composition running question judgment method
CN111104789A (en) * 2019-11-22 2020-05-05 华中师范大学 Text scoring method, device and system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LÜ XIN; CHENG YUXIA: "Research on an Intelligent Evaluation Framework for English Compositions Based on Semantic Similarity and the XGBoost Algorithm", Journal of Zhejiang University (Science Edition), no. 03, 15 May 2020 (2020-05-15) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343672A (en) * 2021-06-21 2021-09-03 哈尔滨工业大学 Unsupervised bilingual dictionary construction method based on corpus merging
CN113343672B (en) * 2021-06-21 2022-12-16 哈尔滨工业大学 Unsupervised bilingual dictionary construction method based on corpus merging

Similar Documents

Publication Publication Date Title
CN108287822B (en) Chinese similarity problem generation system and method
CN108363743B (en) Intelligent problem generation method and device and computer readable storage medium
CN110598203B (en) Method and device for extracting entity information of military design document combined with dictionary
CN112115238A (en) Question-answering method and system based on BERT and knowledge base
CN111709242B (en) Chinese punctuation mark adding method based on named entity recognition
CN111694927B (en) Automatic document review method based on improved word shift distance algorithm
Li et al. Word embedding and text classification based on deep learning methods
CN109033064B (en) Primary school Chinese composition corpus label automatic extraction method based on text abstract
CN111309891B (en) System for reading robot to automatically ask and answer questions and application method thereof
CN106446147A (en) Emotion analysis method based on structuring features
CN113065341A (en) Automatic labeling and classifying method for environmental complaint report text
Hughes Automatically acquiring a classification of words
Nugraha et al. Typographic-based data augmentation to improve a question retrieval in short dialogue system
CN107894976A (en) A kind of mixing language material segmenting method based on Bi LSTM
Shashirekha et al. CoLI-machine learning approaches for code-mixed language identification at the word level in Kannada-English texts
CN113934814A (en) Automatic scoring method for subjective questions of ancient poetry
CN113761128A (en) Event key information extraction method combining domain synonym dictionary and pattern matching
CN112580333A (en) English composition scoring method aiming at image recognition
CN112860781A (en) Mining and displaying method combining vocabulary collocation extraction and semantic classification
CN111881685A (en) Small-granularity strategy mixed model-based Chinese named entity identification method and system
CN110705306B (en) Evaluation method for consistency of written and written texts
CN111079582A (en) Image recognition English composition running question judgment method
CN115952794A (en) Chinese-Tai cross-language sensitive information recognition method fusing bilingual sensitive dictionary and heterogeneous graph
CN115630140A (en) English reading material difficulty judgment method based on text feature fusion
CN114579706A (en) Automatic subjective question evaluation method based on BERT neural network and multitask learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination