CN109933668B - Hierarchical evaluation modeling method for readability of simplified Chinese text - Google Patents


Info

Publication number
CN109933668B
CN109933668B CN201910206775.7A CN201910206775A CN109933668B CN 109933668 B CN109933668 B CN 109933668B CN 201910206775 A CN201910206775 A CN 201910206775A CN 109933668 B CN109933668 B CN 109933668B
Authority
CN
China
Prior art keywords
text
readability
difficulty
features
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910206775.7A
Other languages
Chinese (zh)
Other versions
CN109933668A (en)
Inventor
李虹
李苗苗
李燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Normal University
Original Assignee
Beijing Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Normal University
Priority to CN201910206775.7A
Publication of CN109933668A
Application granted
Publication of CN109933668B
Current legal status: Active
Anticipated expiration

Abstract

The invention belongs to the field of Chinese language data processing, and particularly relates to a hierarchical evaluation modeling method for the readability of simplified Chinese texts. The method comprises the following steps: creating a standard corpus; extracting text features; and constructing a readability formula and evaluating its effect. Building on conventional Chinese readability formulas, the invention selects text features at three levels (Chinese characters, vocabulary, and sentences) and constructs a graded Chinese text readability formula suitable for native simplified-Chinese readers at the primary school stage.

Description

Hierarchical evaluation modeling method for readability of simplified Chinese text
Technical Field
The invention belongs to the field of Chinese language data processing, and particularly relates to a hierarchical evaluation modeling method for readability of simplified Chinese texts.
Background
In the modern information society, the number of children's books is growing exponentially, and teachers and parents are troubled by how to select good, suitable books for children from this vast sea of titles. According to the zone of proximal development theory, reading materials for children should be slightly more difficult than the children's current developmental level, but not excessively so, in order to train and improve their reading ability. If the selected material is too difficult, children's reading efficiency suffers and they may come to avoid reading; material that is too simple bores children and makes them lose interest, so that neither reading habits nor reading ability are developed. At present, most existing book grading systems are led by publishers, are not grounded in solid theoretical research, and have not had their effectiveness verified by empirical research; they are therefore not sufficiently scientific, enjoy little public trust or influence, and offer limited guidance for young readers. To match children's reading ability with book difficulty, it is necessary both to assess children's reading ability accurately and to develop an objective and efficient Chinese text readability formula that assesses text difficulty accurately; this is one of the difficult and actively studied problems in current graded reading research.
A readability formula extracts quantifiable text features that affect reading difficulty and expresses the functional relationship between those features and text difficulty mathematically. Dozens of readability formulas and leveling systems currently exist for English, such as the Lexile readability formula in the United States, the A-Z leveling method, and the Oxford Reading Tree series in the United Kingdom. These have high accuracy and a wide range of application, and large graded reading systems have been built on them, which has greatly promoted the cultivation of reading ability and reading habits among English-speaking children.
Because Chinese and English differ greatly, readability formulas developed for English cannot be applied directly to Chinese text. However, only 7 Chinese readability formulas with a retrievable mathematical expression currently exist; they mainly target learners of traditional Chinese or Chinese language teaching, and most do not provide clear grade division standards, so they offer limited guidance for reading selection by primary school students in mainland China. Therefore, creating a text readability formula for native simplified-Chinese readers at the primary school level remains a challenging frontier task.
Disclosure of Invention
The invention aims to provide a simplified Chinese text readability grading evaluation modeling method.
The method for modeling simplified Chinese text readability by hierarchical evaluation according to the specific embodiment of the invention comprises the following steps:
selecting suitable texts to establish a standard corpus and labeling each text with a grade level;
extracting text features:
defining text difficulty features at the character, word, and sentence levels, performing word segmentation and character, word, and sentence annotation on the texts in the standard corpus, calculating the difficulty feature values of each text, and then selecting an optimal feature set from the text difficulty features;
constructing a text readability grading evaluation formula:
dividing the texts in the standard corpus into a training text set and a test text set,
taking the labeled grade of the training text set as the dependent variable $Y$ and the optimal feature set as the independent variables ($X_1, X_2, X_3$), and fitting a linear regression model to obtain the following readability grading evaluation formula:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$$
wherein $Y_i$ represents the readability level (1-12) of the text; $X_{1i}$, $X_{2i}$ and $X_{3i}$ represent the values of the three features of the optimal feature set for that text; $\beta_0$ is a constant representing the intercept; $\beta_1$, $\beta_2$ and $\beta_3$ are partial regression coefficients, each representing the change in $Y$ when $X_1$, $X_2$ or $X_3$ changes by one unit with the other variables held constant; and $\varepsilon_i$ is the error term;
and evaluating the readability formula against the test text set.
According to the hierarchical evaluation modeling method for the readability of simplified Chinese text of the embodiment of the invention, in the step of extracting the text features, the NLPIR Chinese word segmentation system is used to perform word segmentation and part-of-speech tagging on the text.
According to the hierarchical evaluation modeling method for the readability of simplified Chinese text of the embodiment of the invention, the optimal feature set is selected through the following steps:
calculating the correlation between each text difficulty feature and the text difficulty level, and sorting the text difficulty features in descending order of the absolute value of the correlation coefficient;
adding the text difficulty features one at a time, in the sorted order, to a candidate feature set and establishing a regression equation;
and selecting, through collinearity judgment, the text difficulty features to be retained in the candidate feature set to obtain the optimal feature set.
According to the hierarchical evaluation modeling method for the readability of simplified Chinese text of the embodiment of the invention, the text difficulty features to be retained in the candidate feature set are selected through collinearity judgment as follows:
if, for the text difficulty features $X_1, X_2, \dots, X_k$ in the candidate feature set, there exist numbers $\lambda_1, \lambda_2, \dots, \lambda_k$, not all zero, such that $\lambda_1 X_1 + \lambda_2 X_2 + \dots + \lambda_k X_k = 0$, the candidate feature set has a collinearity problem; in that case the two text difficulty features causing the collinearity are identified and, with the other features unchanged, the $\Delta R^2$ obtained by adding each of the two features is compared, and the feature with the larger $\Delta R^2$ is retained in the candidate feature set; if the candidate feature set has no collinearity problem, the $\Delta R^2$ after adding the feature is calculated, and the feature is retained in the candidate feature set if $\Delta R^2 > 2\%$ and deleted otherwise;
and repeating the above steps until all the text difficulty features in the candidate feature set have been traversed.
According to the hierarchical evaluation modeling method for the readability of simplified Chinese text of the embodiment of the invention, the readability grading evaluation formula is constructed as follows:
the labeled grade of the training text set is taken as the dependent variable $Y$ and the optimal feature set as the independent variables ($X_1, X_2, X_3$). Let $Y$ vary with $X_1, X_2, X_3$ in a linear relationship:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i \quad (i = 1, 2, 3, \dots, n).$$
Suppose $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ are the estimates of the parameters $\beta_0, \beta_1, \beta_2, \beta_3$ respectively; the regression value of $Y$ can then be expressed as:
$$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i}.$$
The residual $e_i$ between the observed value $Y_i$ and the regression value $\hat Y_i$ is
$$e_i = Y_i - \hat Y_i.$$
According to the method of least squares, $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ should minimize the sum of squared deviations between all observed values $Y_i$ and regression values $\hat Y_i$, i.e. make
$$Q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)^2$$
attain its minimum.
According to the extremum principle for multivariate functions, the first-order partial derivatives of $Q$ with respect to $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ are taken and set equal to zero, i.e.
$$\frac{\partial Q}{\partial \hat\beta_0} = -2\sum_{i=1}^{n} e_i = 0, \qquad \frac{\partial Q}{\partial \hat\beta_j} = -2\sum_{i=1}^{n} X_{ji}\, e_i = 0 \quad (j = 1, 2, 3).$$
In matrix form:
$$\begin{pmatrix} \sum e_i \\ \sum X_{1i} e_i \\ \sum X_{2i} e_i \\ \sum X_{3i} e_i \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.$$
Because
$$\begin{pmatrix} \sum e_i \\ \sum X_{1i} e_i \\ \sum X_{2i} e_i \\ \sum X_{3i} e_i \end{pmatrix} = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\ X_{31} & X_{32} & \cdots & X_{3n} \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = X'e,$$
it follows that $X'e = 0$.
Let $\hat\beta = (\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3)'$ be the vector of estimates. Multiplying both sides of the sample regression model
$$Y = X\hat\beta + e$$
by the transpose $X'$ of the sample observation matrix $X$ gives
$$X'Y = X'X\hat\beta + X'e,$$
which yields the system of normal equations
$$X'Y = X'X\hat\beta.$$
Since there is no multicollinearity and $X'X$ is a square matrix of order 4, $X'X$ has full rank and its inverse $(X'X)^{-1}$ exists, thus
$$\hat\beta = (X'X)^{-1}X'Y,$$
which is the OLS estimator of $\beta$, from which the estimates $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ are obtained.
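The least-squares estimate described above can be sketched numerically as follows. This is a minimal illustration, not the patent's implementation: the function name `fit_ols` and the synthetic data are assumptions, and the normal equations $X'X\hat\beta = X'Y$ are solved as a linear system rather than by forming $(X'X)^{-1}$ explicitly.

```python
import numpy as np

def fit_ols(features, grades):
    """Estimate (b0, b1, ..., bk) by solving the normal equations X'X b = X'Y."""
    features = np.asarray(features, dtype=float)
    grades = np.asarray(grades, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept.
    X = np.column_stack([np.ones(len(features)), features])
    # Solving the system avoids computing the inverse of X'X explicitly.
    return np.linalg.solve(X.T @ X, X.T @ grades)

# Synthetic data (not from the patent): grades generated exactly from known coefficients.
rng = np.random.default_rng(0)
toy_features = rng.normal(size=(50, 3))
true_beta = np.array([2.0, 0.5, -1.0, 3.0])
toy_grades = true_beta[0] + toy_features @ true_beta[1:]
beta_hat = fit_ols(toy_features, toy_grades)

# The residuals are orthogonal to every column of X, i.e. X'e = 0.
X_design = np.column_stack([np.ones(50), toy_features])
residuals = toy_grades - X_design @ beta_hat
```

Because the toy grades contain no noise, `beta_hat` recovers `true_beta` up to floating-point error; with real, noisy grade labels the estimate would only approximate the underlying coefficients.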
According to the hierarchical evaluation modeling method for the readability of simplified Chinese text of the embodiment of the invention, the readability grading evaluation formula is evaluated against the test text set through the following steps:
calculating the correlation coefficient $r$ between the values $Y_{\text{obs}}$ computed by the readability formula and the actual values $Y_{\text{actual}}$ of the test text set;
calculating the proportion of variance $R^2$ in the test text set data explained by the readability formula, $R^2 = r^2$;
calculating the adjacent accuracy: a text is considered correctly evaluated if $|Y_{\text{obs}} - Y_{\text{actual}}| \le 1$, and the adjacent accuracy is the proportion of correctly evaluated texts among all texts in the test text set;
calculating the root mean square error:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_{\text{obs},i} - Y_{\text{actual},i}\right)^2};$$
the readability grading evaluation formula is judged to be more accurate the closer $r$ is to 1 (with $0 < r < 1$), the closer $R^2$ is to 1 (with $0 < R^2 < 1$), the closer the adjacent accuracy is to 1, and the smaller the root mean square error.
The invention has the following beneficial effects:
based on the characteristics of Chinese, the invention provides a hierarchical evaluation modeling method that analyzes the difficulty features of Chinese texts automatically at three levels (Chinese characters, vocabulary, and syntax), ensuring the objectivity of text difficulty evaluation;
based on statistical principles, features are selected after a comprehensive analysis of 44 text features, which simplifies the model, avoids multicollinearity, and improves the interpretability of the model while maintaining prediction accuracy;
the invention constructs a Chinese readability formula and a text grading system that can be combined with Chinese reading ability assessment, so that a graded reading system suited to Chinese can ultimately be established and promoted, students' reading ability can be matched effectively with book difficulty, and the reading ability of children and adolescents can be developed on a scientific basis.
Drawings
FIG. 1 shows a flow chart of a hierarchical assessment method of the present invention;
FIG. 2 shows a flow chart of optimal feature set selection.
Detailed Description
Example 1
As shown in FIG. 1, the hierarchical evaluation modeling method for the readability of simplified Chinese text of the present invention comprises the following steps:
1. Establishing a gold-standard corpus, i.e. defining the dependent variable
1.1 Selecting suitable texts
The invention mainly targets reading materials for primary school children in mainland China, so the texts are taken from four editions of primary school Chinese textbooks widely used in mainland China, published respectively by the People's Education, Beijing University, Jiangsu Education, and Southwest University publishing houses. Each publisher provides one set (12 volumes), for a total of 48 volumes, and each volume carries clear grade information (the volume number) that can be used as the grade level of its texts.
1.2 Screening texts
Because classical Chinese differs greatly from modern Chinese in syntax and word meaning, and because modern poetry lacks punctuation, which makes sentence-level text features difficult to compute, texts such as classical poems, classical Chinese prose, and modern poems were removed by manual inspection. The final gold-standard corpus contains 1478 texts totaling 801,550 characters; details are shown in Table 1.
TABLE 1 Standard corpus
[Table 1 is an image in the source and is not reproduced here.]
1.3 text rating labels
Each text is labeled with a level from 1 to 12 according to the volume of the textbook in which it appears (each of the six grades is divided into a first and a second semester, giving 12 volumes in total).
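As a small illustration of this labeling scheme, the mapping from school grade and semester to a level of 1 to 12 might look like the sketch below. The function name `volume_level` and the convention that the first semester is numbered 1 are assumptions made for the example; the patent only states that the volume number serves as the label.

```python
def volume_level(grade, semester):
    """Map (grade 1-6, semester 1-2) to a textbook volume number 1-12."""
    if grade not in range(1, 7) or semester not in (1, 2):
        raise ValueError("grade must be 1-6 and semester must be 1 or 2")
    # Two volumes per grade: the first semester gives an odd level, the second an even one.
    return 2 * (grade - 1) + semester

first_volume = volume_level(1, 1)   # first semester of grade 1
last_volume = volume_level(6, 2)    # second semester of grade 6
```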
2. Extracting text features, i.e. defining the independent variables
2.1 Defining text features
The invention defines 44 text difficulty features at three levels (characters, words, and sentences); the feature names and definitions are given in Table 2:
TABLE 2 Text feature summary
[Table 2 is an image in the source and is not reproduced here.]
2.2 text preprocessing
The NLPIR Chinese word segmentation system (from NLPIR.org, the Natural Language Processing and Information Retrieval sharing platform) is used to perform word segmentation and part-of-speech tagging on the texts; the segmentation and tagging accuracy of this system reaches 98.45%.
2.3 text feature computation
2.3.1 Count the numbers of characters and words, the numbers of distinct character and word types, and the number of punctuation marks in the text;
2.3.2 Look up the characters and words in reference tables such as a Chinese character stroke count table and a word difficulty level table to obtain the relevant information for each character and word;
2.3.3 Count the part-of-speech distribution of the vocabulary;
2.3.4 Using the operational definitions of the 44 features in Table 2 and the results of 2.3.1 to 2.3.3, compute the 44 feature values for each text.
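The counting steps above can be illustrated with a short sketch that starts from text which has already been segmented and tagged. Everything specific here is an assumption for illustration: the function name `basic_counts`, the `(word, tag)` input format, the tag names (`"w"` for punctuation, and `"p"`, `"c"`, `"u"`, `"y"` as function-word tags), and the definition of the function word ratio as function words divided by all words. The operational definitions in Table 2 are not available in the text, so they may differ.

```python
def basic_counts(tagged_words, punctuation_tag="w", function_tags=("p", "c", "u", "y")):
    """Compute simple character- and word-level counts from (word, tag) pairs."""
    # Separate real words from punctuation tokens.
    words = [word for word, tag in tagged_words if tag != punctuation_tag]
    chars = [ch for word in words for ch in word]
    n_function = sum(1 for _, tag in tagged_words if tag in function_tags)
    return {
        "char_count": len(chars),
        "char_types": len(set(chars)),      # number of distinct characters
        "word_count": len(words),
        "word_types": len(set(words)),      # number of distinct words
        "punctuation_count": len(tagged_words) - len(words),
        "function_word_ratio": n_function / len(words) if words else 0.0,
    }

# Hypothetical tagger output for one short sentence.
sample = [("我", "r"), ("在", "p"), ("学校", "n"), ("看", "v"), ("书", "n"), ("。", "w")]
counts = basic_counts(sample)
```

In a full pipeline the `(word, tag)` pairs would come from the segmenter, and the remaining features would need the stroke count and difficulty tables mentioned in 2.3.2.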
2.4 selecting an optimal feature set
2.4.1 Calculate the correlation coefficient ($r$) between each of the 44 features ($X_1, X_2, X_3, \dots, X_{44}$) and the text difficulty level ($Y$):
$$r_j = \frac{\sum_{i=1}^{n}\left(X_{ji} - \bar X_j\right)\left(Y_i - \bar Y\right)}{n\,\sigma_{X_j}\,\sigma_Y}$$
where $j = 1, 2, 3, \dots, 44$; $n = 1478$; $\sigma_{X_j}$ and $\sigma_Y$ are the standard deviations of $X_j$ and $Y$; $X_{ji}$ is the score of the $i$th text on the $j$th text feature; $Y_i$ is the text difficulty level of the $i$th text; $\bar X_j$ is the mean score of all texts on the $j$th text feature; and $\bar Y$ is the mean of the $Y$ values of all texts.
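The correlation step in 2.4.1 and the ranking that follows it can be sketched as below. This is an illustration under assumptions: the function name and the toy data are invented, and population standard deviations (dividing by $n$) are used, which gives the same coefficient as any other consistent normalization.

```python
import numpy as np

def rank_features_by_correlation(X, y):
    """Return Pearson r of each column of X with y, and column indices sorted by |r| descending."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = X - X.mean(axis=0)
    y_centered = y - y.mean()
    # Covariance divided by the product of (population) standard deviations.
    # A constant feature column would give a zero denominator and must be removed beforehand.
    r = (x_centered * y_centered[:, None]).sum(axis=0) / (len(y) * X.std(axis=0) * y.std())
    order = np.argsort(-np.abs(r), kind="stable")
    return r, order

# Toy data (not from the patent): 6 texts, 3 features, levels 1..6.
y_toy = np.array([1, 2, 3, 4, 5, 6])
X_toy = np.array([[2, 6, 1], [1, 5, 1], [4, 4, 1], [3, 3, 2], [6, 2, 1], [5, 1, 1]])
r_toy, order_toy = rank_features_by_correlation(X_toy, y_toy)
```

The second toy feature is an exact decreasing function of the level, so it ranks first even though its coefficient is negative; ranking by absolute value is what lets strongly negative predictors enter the candidate set early.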
2.4.2 Sort the 44 features in descending order of the absolute value of the correlation coefficient ($r$), add the features one at a time, in that order, to the candidate feature set, and establish the regression equation
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i$$
where $Y_i$ is the difficulty level of the $i$th text; $X_{1i}, X_{2i}, \dots, X_{ki}$ are that text's scores on the $k$ features of the candidate feature set; $\beta_0$ is a constant representing the intercept; and $\beta_1, \beta_2, \dots, \beta_k$ are partial regression coefficients, each representing the change in $Y$ when the corresponding variable $X_1, X_2, \dots, X_k$ changes by one unit with the other variables held constant.
2.4.3 making collinearity decisions
If, for the features $X_1, X_2, \dots, X_k$ currently in the candidate feature set, there exist constants $\lambda_1, \lambda_2, \dots, \lambda_k, \mu$, not all zero, such that $\lambda_1 X_1 + \lambda_2 X_2 + \dots + \lambda_k X_k + \mu = 0$, the candidate feature set is judged to have a collinearity problem. Conversely, if no constants $\lambda_1, \lambda_2, \dots, \lambda_k, \mu$, not all zero, make the equation hold, there is no collinearity problem.
When the candidate feature set has a collinearity problem, the pairwise correlation coefficients among the $k$ features $X_1, X_2, \dots, X_k$ are calculated; if the correlation coefficient between two features is greater than 0.75, those two features are deemed to be the collinear pair.
Suppose features $X_{k-1}$ and $X_k$ are collinear. First, a regression model $M_0$ without these two features is established: $Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_{k-2} X_{k-2,i} + \varepsilon_i$ (parameters as defined in 2.4.2), and the multiple coefficient of determination of the model is calculated:
$$R^2 = \frac{\sum_{i=1}^{n}\left(\hat Y_i - \bar Y\right)^2}{\sum_{i=1}^{n}\left(Y_i - \bar Y\right)^2}$$
where $\hat Y_i$ is the value calculated for each text by the regression model, $Y_i$ is the actual value of $Y$, and $\bar Y$ is the mean of the $Y$ values.
Then, on the basis of model $M_0$, features $X_{k-1}$ and $X_k$ are added separately to establish model $M_1$: $Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_{k-2} X_{k-2,i} + \beta_{k-1} X_{k-1,i} + \varepsilon_i$ and model $M_2$: $Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_{k-2} X_{k-2,i} + \beta_k X_{ki} + \varepsilon_i$ (parameters as defined in 2.4.2), and the multiple coefficients of determination $R^2_{M_1}$ and $R^2_{M_2}$ of models $M_1$ and $M_2$ are obtained in the same way. Finally, the increase in $R^2$ of model $M_1$ and model $M_2$ relative to model $M_0$ is calculated: $\Delta R^2_{M_1} = R^2_{M_1} - R^2_{M_0}$; $\Delta R^2_{M_2} = R^2_{M_2} - R^2_{M_0}$, and all features of the model with the larger $\Delta R^2$ are retained in the candidate feature set.
If the candidate feature set has no collinearity problem, the $\Delta R^2$ after adding the feature is calculated; if $\Delta R^2 > 2\%$, the feature is retained in the candidate feature set, otherwise it is deleted.
2.4.4 Repeat steps 2.4.2 to 2.4.3 until all features have been traversed; the flow chart is shown in FIG. 2.
2.4.5 The optimal feature set is finally obtained; it contains three features: the number of character types, the average difficulty of character types according to the literacy table, and the function word ratio.
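One possible reading of the selection loop in steps 2.4.1 to 2.4.4 is sketched below. It is an interpretation, not the patent's code: the function names, the toy data, and the choice to detect the collinear pair directly through the 0.75 pairwise-correlation rule are assumptions, and $R^2$ is computed as $1 - \text{RSS}/\text{TSS}$, which equals the ratio given in 2.4.3 for a least-squares fit with an intercept.

```python
import numpy as np

def r_squared(X, y, columns):
    """R^2 of a least-squares fit of y on the given columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in columns])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = y - design @ beta
    return 1.0 - (residual @ residual) / np.sum((y - y.mean()) ** 2)

def select_features(X, y, min_gain=0.02, max_pair_corr=0.75):
    """Greedy selection: rank by |r| with y, then apply the collinearity and delta-R^2 rules."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    strength = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    order = sorted(range(X.shape[1]), key=lambda j: -strength[j])
    selected = []
    for j in order:
        # Look for an already selected feature that is collinear with the newcomer.
        rival = next((s for s in selected
                      if abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) > max_pair_corr), None)
        if rival is None:
            gain = r_squared(X, y, selected + [j]) - r_squared(X, y, selected)
            if gain > min_gain:
                selected.append(j)
        else:
            # Compare the delta R^2 of each collinear feature over the model without both.
            others = [s for s in selected if s != rival]
            base = r_squared(X, y, others)
            gain_new = r_squared(X, y, others + [j]) - base
            gain_old = r_squared(X, y, others + [rival]) - base
            if gain_new > gain_old:
                selected[selected.index(rival)] = j
    return selected

# Synthetic data (not from the patent): column 2 nearly duplicates column 0, column 3 is noise.
rng = np.random.default_rng(1)
a, b, noise = rng.normal(size=200), rng.normal(size=200), rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)
X_demo = np.column_stack([a, b, c, noise])
y_demo = 2 * a + b + 0.1 * rng.normal(size=200)
selected = select_features(X_demo, y_demo)
```

On this toy data the loop keeps exactly one of the two near-duplicate columns plus the independent predictor and drops the noise column, which mirrors how 44 candidate features can shrink to a small optimal set.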
3. Establishing readability formula and evaluating formula effect
3.1 determining training and test text sets
The texts in each volume of the Chinese textbooks are randomly divided into a training text set and a test text set such that, for every edition and every volume, the ratio of training texts to test texts is 1:1.
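A stratified 1:1 split of this kind might be sketched as follows. The record layout (dictionaries with `edition` and `volume` keys), the function name `split_half`, the fixed seed, and the rule that an odd leftover text goes to the training set are all assumptions for the example.

```python
import random
from collections import defaultdict

def split_half(texts, seed=0):
    """Split texts 1:1 into train and test within each (edition, volume) group."""
    groups = defaultdict(list)
    for text in texts:
        groups[(text["edition"], text["volume"])].append(text)
    rng = random.Random(seed)
    train, test = [], []
    for key in sorted(groups):
        members = groups[key][:]
        rng.shuffle(members)
        half = (len(members) + 1) // 2   # an odd leftover text goes to training
        train.extend(members[:half])
        test.extend(members[half:])
    return train, test

# Hypothetical corpus index: 2 editions x 2 volumes x 4 texts each.
toy_corpus = [{"id": i, "edition": e, "volume": v}
              for e in ("A", "B") for v in (1, 2) for i in range(4)]
train_set, test_set = split_half(toy_corpus)
```

Splitting inside each edition and volume keeps the level distribution of the two sets aligned, so the test set covers all 12 levels in the same proportions as the training set.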
3.2 establishing readability formulas
The labeled grade of the training text set is taken as the dependent variable $Y$, and the optimal feature set determined in step 2.4 (number of character types, average difficulty of character types according to the literacy table, and function word ratio) as the independent variables ($X_1, X_2, X_3$); a linear regression model is used to construct the readability formula, as follows:
Let $Y$ vary with $X_1, X_2, X_3$ in a linear relationship, formulated as follows:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$$
where $Y_i$ represents the readability level of the text; $X_{1i}, X_{2i}, X_{3i}$ are respectively the text's number of character types, average difficulty of character types according to the literacy table, and function word ratio; $\beta_0$ is a constant representing the intercept; and $\beta_1, \beta_2, \beta_3$ are partial regression coefficients, each representing the change in $Y$ when $X_1$, $X_2$ or $X_3$ changes by one unit with the other variables held constant.
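Suppose $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ are the estimates of the parameters $\beta_0, \beta_1, \beta_2, \beta_3$ respectively; the regression value of $Y$ can then be expressed as:
$$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i}.$$
The residual $e_i$ between the observed value $Y_i$ and the regression value $\hat Y_i$ is
$$e_i = Y_i - \hat Y_i.$$
According to the method of least squares, $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ should minimize the sum of squared deviations between all observed values $Y_i$ and regression values $\hat Y_i$, i.e. make
$$Q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)^2$$
attain its minimum.
According to the extremum principle for multivariate functions, the first-order partial derivatives of $Q$ with respect to $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ are taken and set equal to zero, i.e.
$$\frac{\partial Q}{\partial \hat\beta_0} = -2\sum_{i=1}^{n} e_i = 0, \qquad \frac{\partial Q}{\partial \hat\beta_j} = -2\sum_{i=1}^{n} X_{ji}\, e_i = 0 \quad (j = 1, 2, 3).$$
After rearrangement and simplification, the matrix form is
$$\begin{pmatrix} \sum e_i \\ \sum X_{1i} e_i \\ \sum X_{2i} e_i \\ \sum X_{3i} e_i \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.$$
Because
$$\begin{pmatrix} \sum e_i \\ \sum X_{1i} e_i \\ \sum X_{2i} e_i \\ \sum X_{3i} e_i \end{pmatrix} = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\ X_{31} & X_{32} & \cdots & X_{3n} \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = X'e,$$
it follows that $X'e = 0$.
Let $\hat\beta = (\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3)'$ be the vector of estimates. Multiplying both sides of the sample regression model
$$Y = X\hat\beta + e$$
by the transpose $X'$ of the sample observation matrix $X$ gives
$$X'Y = X'X\hat\beta + X'e,$$
which yields the system of normal equations
$$X'Y = X'X\hat\beta.$$
Since there is no multicollinearity and $X'X$ is a square matrix of order 4, $X'X$ has full rank and its inverse $(X'X)^{-1}$ exists, thus
$$\hat\beta = (X'X)^{-1}X'Y,$$
which is the OLS estimator of $\beta$.
Finally, the estimates $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ are obtained.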
The resulting readability formula is:
Grade level = -4.84 + 0.01 × (number of character types) + 3.34 × (average difficulty of character types according to the literacy table) + 7.83 × (function word ratio).
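Applying the fitted formula to a new text could look like the sketch below. The coefficients are those stated above; the function names, the input values in the example, and the step that rounds and clamps the continuous output to an integer level between 1 and 12 are assumptions, since the text does not say how the regression output is turned into a discrete level.

```python
def predict_grade_level(char_types, avg_char_difficulty, function_word_ratio):
    """Evaluate the fitted readability formula; returns a continuous grade-level score."""
    return (-4.84
            + 0.01 * char_types
            + 3.34 * avg_char_difficulty
            + 7.83 * function_word_ratio)

def to_level(raw_score):
    """Assumed post-processing: round to the nearest integer and clamp to 1..12."""
    return min(12, max(1, round(raw_score)))

# Hypothetical feature values for one text (not taken from the patent's corpus).
raw = predict_grade_level(char_types=200, avg_char_difficulty=1.5, function_word_ratio=0.2)
level = to_level(raw)
```

With these made-up inputs the raw score is -4.84 + 2.00 + 5.01 + 1.566 = 3.736, which the assumed rounding step maps to level 4.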
3.3 readability formula evaluation
The readability formula is evaluated against the test text set, specifically as follows:
3.3.1 Calculate $r$: the correlation coefficient between the values computed by the readability formula ($Y_{\text{obs}}$) and the actual values of the test text set ($Y_{\text{actual}}$), using the same formula as in 2.4.1, namely
$$r = \frac{\sum_{i=1}^{n}\left(Y_{\text{obs},i} - \bar Y_{\text{obs}}\right)\left(Y_{\text{actual},i} - \bar Y_{\text{actual}}\right)}{n\,\sigma_{Y_{\text{obs}}}\,\sigma_{Y_{\text{actual}}}}$$
where $n = 1478$; $\sigma_{Y_{\text{obs}}}$ and $\sigma_{Y_{\text{actual}}}$ are the standard deviations of $Y_{\text{obs}}$ and $Y_{\text{actual}}$ respectively; $Y_{\text{obs},i}$ is the difficulty level of the $i$th text calculated by the readability formula; $Y_{\text{actual},i}$ is the actual text difficulty level of the $i$th text; $\bar Y_{\text{obs}}$ is the mean of all observed text difficulty levels; and $\bar Y_{\text{actual}}$ is the mean of all actual text difficulty levels. The value of $r$ ranges from 0 to 1, and the closer it is to 1, the better the readability formula.
3.3.2 Calculate $R^2$: $R^2$ is an important index of the regression result and represents the proportion of variance in the difficulty values of the test text set explained by the readability formula, $R^2 = r^2$.
$R^2$ ranges from 0 to 1, and the closer it is to 1, the better the readability formula.
3.3.3 Calculate the adjacent accuracy: a prediction counts as adjacent-accurate when the observed value differs from the actual value by at most one level. For example, if the actual value of a text is 3, observed values of 2, 3 or 4 are all counted as correct. The adjacent accuracy is the proportion of texts with $|Y_{\text{obs}} - Y_{\text{actual}}| \le 1$; it ranges from 0 to 1, and the closer it is to 1, the better the readability formula.
3.3.4 Root mean square error: the root mean square error is the square root of the mean squared deviation between the observed values and the actual values, calculated as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_{\text{obs},i} - Y_{\text{actual},i}\right)^2}$$
The smaller the value, the better.
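The four indices of section 3.3 can be computed together, as in the following sketch. The function name `evaluate` and the toy level values are assumptions; the adjacent-accuracy rule uses a tolerance of one level, as in the example given in 3.3.3.

```python
import numpy as np

def evaluate(predicted, actual):
    """Compute r, R^2, adjacent accuracy, and RMSE for predicted vs. actual levels."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    r = float(np.corrcoef(predicted, actual)[0, 1])
    difference = predicted - actual
    return {
        "r": r,
        "r_squared": r ** 2,
        # Share of texts whose predicted level is within one level of the actual level.
        "adjacent_accuracy": float(np.mean(np.abs(difference) <= 1)),
        "rmse": float(np.sqrt(np.mean(difference ** 2))),
    }

# Toy levels (not from the patent): the errors are 0, 1, -2 and 0.
scores = evaluate([1, 3, 4, 8], [1, 2, 6, 8])
```

In this toy case three of the four predictions fall within one level of the actual level, so the adjacent accuracy is 0.75, and the RMSE is the square root of 5/4.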
The indices of the readability formula constructed by the invention are shown in Table 3:
TABLE 3 Readability formula indices
[Table 3 is an image in the source and is not reproduced here.]
These results indicate that the Chinese readability formula constructed by the method can be used to predict the difficulty of Chinese texts at the primary school stage and to assign difficulty levels from 1 to 12.

Claims (4)

1. A hierarchical evaluation modeling method for the readability of simplified Chinese text, characterized by comprising the following steps:
selecting suitable texts to establish a standard corpus and labeling each text with a grade level;
extracting text features:
defining text difficulty features at the character, word, and sentence levels, performing word segmentation and character, word, and sentence annotation on the texts in the standard corpus, calculating the difficulty feature values of each text, and then selecting an optimal feature set from the text difficulty features;
constructing a text readability grading evaluation formula:
dividing the texts in the standard corpus into a training text set and a test text set,
taking the labeled grade of the training text set as the dependent variable $Y$ and the optimal feature set as the independent variables ($X_1, X_2, X_3$), and fitting a linear regression model to obtain the following readability grading evaluation formula:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$$
wherein $\beta_0$ is a constant representing the intercept, and $\beta_1$, $\beta_2$ and $\beta_3$ are partial regression coefficients, each representing the change in $Y$ when $X_1$, $X_2$ or $X_3$ changes by one unit with the other variables held constant,
evaluating the readability grading evaluation formula against the test text set,
wherein,
the optimal feature set is selected by:
calculating the correlation coefficient between each text difficulty feature and the text difficulty level, and sorting the text difficulty features by the absolute value of the correlation coefficient;
adding the difficulty features one at a time, in the sorted order, to a candidate feature set and establishing a regression equation;
selecting, through collinearity judgment, the text difficulty features to be retained in the candidate feature set to obtain the optimal feature set,
wherein,
the text difficulty features to be retained in the candidate feature set are selected through collinearity judgment as follows:
if, for the text difficulty features $X_1, X_2, \dots, X_k$ in the candidate feature set, there exist numbers $\lambda_1, \lambda_2, \dots, \lambda_k$, not all zero, such that $\lambda_1 X_1 + \lambda_2 X_2 + \dots + \lambda_k X_k = 0$, the candidate feature set has a collinearity problem; in that case the two text difficulty features causing the collinearity are identified and, with the other features unchanged, the $\Delta R^2$ obtained by adding each of the two features is compared, and the feature with the larger $\Delta R^2$ is retained in the candidate feature set; if the candidate feature set has no collinearity problem, the $\Delta R^2$ after adding the feature is calculated, and the text difficulty feature is retained in the candidate feature set if $\Delta R^2 > 2\%$ and deleted otherwise;
and repeating the above steps until all the text difficulty features in the candidate feature set have been traversed.
2. The hierarchical evaluation modeling method for the readability of simplified Chinese text according to claim 1, wherein in the step of extracting the text features, the NLPIR Chinese word segmentation system is used to perform word segmentation and part-of-speech tagging on the text.
3. The hierarchical evaluation modeling method for the readability of simplified Chinese text according to claim 1, wherein the readability grading evaluation formula is constructed as follows:
the labeled grade of the training text set is taken as the dependent variable $Y$ and the optimal feature set as the independent variables ($X_1, X_2, X_3$); let $Y$ vary with $X_1, X_2, X_3$ in a linear relationship:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i \quad (i = 1, 2, 3, \dots, n);$$
suppose $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ are the estimates of the parameters $\beta_0, \beta_1, \beta_2, \beta_3$ respectively; the regression value of $Y$ can then be expressed as:
$$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i};$$
the residual $e_i$ between the observed value $Y_i$ and the regression value $\hat Y_i$ is
$$e_i = Y_i - \hat Y_i;$$
according to the method of least squares, $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ should minimize the sum of squared deviations between all observed values $Y_i$ and regression values $\hat Y_i$, i.e. make
$$Q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)^2$$
attain its minimum;
according to the extremum principle for multivariate functions, the first-order partial derivatives of $Q$ with respect to $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ are taken and set equal to zero, i.e.
$$\frac{\partial Q}{\partial \hat\beta_0} = -2\sum_{i=1}^{n} e_i = 0, \qquad \frac{\partial Q}{\partial \hat\beta_j} = -2\sum_{i=1}^{n} X_{ji}\, e_i = 0 \quad (j = 1, 2, 3);$$
in matrix form:
$$\begin{pmatrix} \sum e_i \\ \sum X_{1i} e_i \\ \sum X_{2i} e_i \\ \sum X_{3i} e_i \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix};$$
because
$$\begin{pmatrix} \sum e_i \\ \sum X_{1i} e_i \\ \sum X_{2i} e_i \\ \sum X_{3i} e_i \end{pmatrix} = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ X_{11} & X_{12} & \cdots & X_{1n} \\ X_{21} & X_{22} & \cdots & X_{2n} \\ X_{31} & X_{32} & \cdots & X_{3n} \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = X'e,$$
it follows that $X'e = 0$;
let $\hat\beta = (\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3)'$ be the vector of estimates; multiplying both sides of the sample regression model
$$Y = X\hat\beta + e$$
by the transpose $X'$ of the sample observation matrix $X$ gives
$$X'Y = X'X\hat\beta + X'e,$$
which yields the system of normal equations
$$X'Y = X'X\hat\beta;$$
since there is no multicollinearity and $X'X$ is a square matrix of order 4, $X'X$ has full rank and its inverse $(X'X)^{-1}$ exists, thus
$$\hat\beta = (X'X)^{-1}X'Y,$$
which is the OLS estimator of $\beta$, from which the estimates $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ are obtained.
4. The hierarchical evaluation modeling method for readability of simplified Chinese text according to claim 1, wherein the simplified Chinese text readability hierarchical evaluation formula is evaluated against the test text set by the following steps:
calculating the correlation coefficient $r$ between the value $Y_{\text{observed}}$ computed by the readability formula and the actual value $Y_{\text{actual}}$ of the test text set;
calculating the proportion of variance $R^2$ in the test text set data that is explained by the readability formula, $R^2 = r^2$;
calculating the adjacent accuracy: for each text the difference $Y_{\text{observed}} - Y_{\text{actual}}$ is computed, and if its absolute value is not more than 1 the evaluation is deemed correct; the adjacent accuracy is the proportion of correctly evaluated texts in the total number of texts in the test text set;
calculating the root mean square error:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left(Y_{\text{observed},i} - Y_{\text{actual},i}\right)^2},$$

where $N$ is the number of texts in the test text set;
the closer $r$ is to 1 (with $0 < r < 1$), the closer $R^2$ is to 1 (with $0 < R^2 < 1$), the closer the adjacent accuracy is to 1, and the smaller the root mean square error, the more accurate the readability hierarchical evaluation formula is judged to be.
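The four evaluation measures of claim 4 can be sketched as follows. This is an illustrative sketch only: the function names and the toy grade values are hypothetical, and the adjacent accuracy is assumed to compare the absolute difference between predicted and actual grade against a tolerance of 1.

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Pearson correlation coefficient between predicted and actual grades."""
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)


def adjacent_accuracy(predicted: Sequence[float], actual: Sequence[float],
                      tolerance: float = 1.0) -> float:
    """Share of texts whose predicted grade is within `tolerance` of the label."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return hits / len(actual)


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error between predicted and actual grades."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Made-up grades for six hypothetical test texts.
toy_actual = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toy_predicted = [1.5, 1.5, 3.0, 5.5, 4.5, 6.0]

r = pearson_r(toy_predicted, toy_actual)
metrics = {
    "r": r,
    "R2": r * r,  # explained variance, taken as r squared per the claim
    "adjacent_accuracy": adjacent_accuracy(toy_predicted, toy_actual),
    "rmse": rmse(toy_predicted, toy_actual),
}
print(metrics)
```

In this toy set one text is off by 1.5 grades, so five of the six predictions count as correct under the tolerance of 1.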
CN201910206775.7A 2019-03-19 2019-03-19 Hierarchical evaluation modeling method for readability of simplified Chinese text Active CN109933668B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910206775.7A CN109933668B (en) 2019-03-19 2019-03-19 Hierarchical evaluation modeling method for readability of simplified Chinese text

Publications (2)

Publication Number Publication Date
CN109933668A CN109933668A (en) 2019-06-25
CN109933668B CN109933668B (en) 2021-03-26

Family

ID=66987605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910206775.7A Active CN109933668B (en) 2019-03-19 2019-03-19 Hierarchical evaluation modeling method for readability of simplified Chinese text

Country Status (1)

Country Link
CN (1) CN109933668B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103207854A (en) * 2012-01-11 2013-07-17 宋曜廷 Chinese text readability measuring system and method thereof
CN105068993A (en) * 2015-07-31 2015-11-18 成都思戴科科技有限公司 Method for evaluating text difficulty
CN106951406A (en) * 2017-03-13 2017-07-14 广西大学 A kind of stage division of the Chinese reading ability based on text language variable
CN107609591A (en) * 2017-09-13 2018-01-19 深圳市悦好教育科技有限公司 A kind of books stage division and system
CN107977449A (en) * 2017-12-14 2018-05-01 广东外语外贸大学 A kind of linear model approach estimated for simplified form of Chinese Character readability
CN108389147A (en) * 2018-02-26 2018-08-10 浙江创课教育科技有限公司 Item difficulty hierarchical processing method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
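Improvements in predicting children's overall reading ability by modeling variability in evaluators' subjective judgments; Matthew P. Black, et al.; 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2012-08-31; pp. 5069-5072 *
Research on a linear-regression-based method for predicting the readability of Chinese text (基于线性回归的中文文本可读性预测方法研究); Sun Gang (孙刚); China Master's Theses Full-text Database, Information Science and Technology; 2016-03-15; pp. 23-50 *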

Also Published As

Publication number Publication date
CN109933668A (en) 2019-06-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information
Inventor after: Li Hong; Liu Miaomiao; Li Yan
Inventor before: Li Hong; Li Miaomiao; Li Yan