CN109933668A - Hierarchical evaluation modeling method for readability of simplified Chinese text - Google Patents
- Publication number
- CN109933668A (application number CN201910206775.7A)
- Authority
- CN
- China
- Prior art keywords
- text
- feature
- difficulty
- readability
- classified estimation
- Legal status
- Granted
Landscapes
- Machine Translation (AREA)
Abstract
The invention belongs to the field of Chinese language data processing and specifically relates to a hierarchical evaluation modeling method for the readability of simplified Chinese text. The method comprises the following steps:

- creating a standard corpus;
- extracting text features;
- constructing a readability formula and evaluating its effectiveness.

Building on existing Chinese readability formulas, the invention selects text features at three levels: character, word and sentence. It uses them to construct a graded readability formula for Chinese text that suits native simplified-Chinese readers in primary school.
Description
Technical field
The invention belongs to the field of Chinese language data processing and specifically relates to a hierarchical evaluation modeling method for the readability of simplified Chinese text.
Background art
In the information society, the number of children's books is growing exponentially. Picking books suited to a particular child from this vast supply has become a problem for teachers and parents.

According to the zone of proximal development theory, the difficulty of children's reading material should be slightly above the children's current level of development, but not far above it. Only then does the material train and improve children's reading ability. Material that is too difficult undermines children's sense of reading self-efficacy and drives them away from reading. Material that is too simple bores them, so they lose interest, and the goals of building reading habits and improving reading ability are not achieved.

Existing book grading systems are mostly led by publishers. They lack both a solid theoretical basis and empirical research verifying their validity. As a result, they are scientifically weak, have low public credibility and little influence, and offer limited guidance for young readers.

Matching children's reading ability with book difficulty requires two things: accurately assessing children's reading ability, and developing an objective, efficient Chinese text readability formula that accurately evaluates text difficulty. This is one of the hardest and most active problems in current graded-reading research.
A readability formula is a mathematical method that extracts quantifiable text features affecting reading difficulty and determines the functional relationship between these features and text difficulty.

The English-speaking world currently has more than ten readability formulas. It also has grading systems such as the Lexile framework and A-Z levelling in the United States and the Oxford Reading Tree series in Britain. These are highly accurate and widely applied. Large graded-reading systems have been built on them, and these systems have played a major role in developing English-speaking children's reading ability and reading habits.
Because Chinese and English differ greatly, readability formulas from the English-speaking world cannot be applied directly to Chinese text. Only seven Chinese readability formulas with published mathematical expressions can currently be found. They mainly target learners of traditional Chinese characters or the teaching of Chinese as a foreign language, and most do not provide a specific grading standard. Their value for selecting reading material for primary school pupils in mainland China is therefore limited.

Creating a text readability formula for native simplified-Chinese readers in primary school remains a challenging frontier task.
Summary of the invention
The purpose of the present invention is to provide a hierarchical evaluation modeling method for the readability of simplified Chinese text.
The hierarchical evaluation modeling method for the readability of simplified Chinese text according to a specific embodiment of the present invention comprises the following steps:

Select suitable texts to establish a standard corpus, and annotate each text with a grade level.

Extract text features:

- Define text-difficulty features at the character, word and sentence levels.
- Segment the texts in the standard corpus into words and annotate characters, words and sentences.
- Compute the difficulty feature values of every text.
- Select the optimal subset of the text-difficulty features.

Construct the text readability grading formula:

- Divide the texts in the standard corpus into a training set and a test set.
- Take the annotated grades of the training set as the dependent variable Y and the optimal feature set as the independent variables (X1, X2, X3).
- Fit a linear regression model to obtain the readability grading formula:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \mu_i$$

In this formula:

- $Y_i$ is the readability grade (1-12) of text i.
- $X_{1i}$, $X_{2i}$ and $X_{3i}$ are the values of the three optimal features for that text.
- $\beta_0$ is a constant representing the intercept.
- $\beta_1$, $\beta_2$ and $\beta_3$ are partial regression coefficients. Each is the change in Y when X1, X2 or X3 changes by one unit while the other variables are held constant.
- $\mu_i$ is the random error term.

Using the test set as reference, evaluate the readability formula.
In the text-feature extraction step of the method according to a specific embodiment of the present invention, the NLPIR Chinese word segmentation system segments the text into words and tags parts of speech.
In the method according to a specific embodiment of the present invention, the optimal feature set is selected by the following steps:

1. Compute the correlation coefficient between each text-difficulty feature and the text difficulty grade. Sort the features in descending order of the absolute value of the correlation coefficient.
2. Following this order, add the text-difficulty features one at a time to a candidate feature set and establish a regression equation.
3. Use a collinearity judgment to decide which text-difficulty features remain in the candidate feature set. The features that remain form the optimal feature set.
In the method according to a specific embodiment of the present invention, the collinearity judgment decides which text-difficulty features remain in the candidate feature set as follows.

Consider the text-difficulty features $X_1, X_2, \ldots, X_k$ in the candidate set. If there exist numbers $\lambda_1, \lambda_2, \ldots, \lambda_k$, not all zero, and a constant $\mu$ such that $\lambda_1 X_1 + \lambda_2 X_2 + \cdots + \lambda_k X_k + \mu = 0$, then collinearity exists in the candidate set.

- When collinearity exists, find the two text-difficulty features involved. Keeping the other features unchanged, compare the $\Delta R^2$ obtained by adding each of them. Retain the feature with the larger $\Delta R^2$ in the candidate set.
- When there is no collinearity, compute the $\Delta R^2$ after the feature is added. If $\Delta R^2 > 2\%$, retain the feature in the candidate set; otherwise discard it.

Repeat the above steps until all text-difficulty features have been traversed.
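In the method according to a specific embodiment of the present invention, the simplified Chinese text readability grading formula is constructed as follows.

Take the annotated grades of the training set as the dependent variable Y and the optimal feature set as the independent variables (X1, X2, X3). Suppose Y varies with X1, X2 and X3 and the relationship is linear:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \mu_i \quad (i = 1, 2, 3, \ldots, n)$$

Let $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ be the least-squares estimates of the parameters $\beta_0, \beta_1, \beta_2, \beta_3$. The regression value of Y is then

$$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i}$$

and the residual $e_i$ between the observation $Y_i$ and the regression value $\hat Y_i$ is

$$e_i = Y_i - \hat Y_i$$

By the least-squares principle, the estimates should minimize the sum of squared deviations between all observations $Y_i$ and their regression values $\hat Y_i$, that is, they should minimize

$$Q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)^2$$

By the extremum principle for functions of several variables, take the first-order partial derivative of Q with respect to each of $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ and set it equal to zero:

$$\frac{\partial Q}{\partial \hat\beta_0} = -2\sum_{i=1}^{n} e_i = 0, \qquad \frac{\partial Q}{\partial \hat\beta_j} = -2\sum_{i=1}^{n} e_i X_{ji} = 0 \quad (j = 1, 2, 3)$$

In matrix form these conditions read $X'e = 0$, where $e = (e_1, \ldots, e_n)'$ and the model is written as

$$Y = X\beta + \mu, \qquad Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & X_{11} & X_{21} & X_{31} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & X_{1n} & X_{2n} & X_{3n} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix}$$

Let $\hat\beta = (\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3)'$ be the vector of estimates. Multiply both sides of the fitted model $Y = X\hat\beta + e$ on the left by $X'$, the transpose of the sample observation matrix X. This gives $X'Y = X'X\hat\beta + X'e$. Because $X'e = 0$, the result is the system of normal equations

$$X'X\hat\beta = X'Y$$

Since there is no multicollinearity, $X'X$ is a 4 × 4 square matrix of full rank, so its inverse $(X'X)^{-1}$ exists. Therefore

$$\hat\beta = (X'X)^{-1}X'Y$$

is the OLS estimator of β, from which the estimated coefficients are obtained.

In the method according to a specific embodiment of the present invention, the simplified Chinese text readability grading formula is evaluated against the test set by the following steps: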
1. Compute the correlation coefficient r between the values $Y_{\text{obs}}$ predicted by the readability formula and the actual values $Y_{\text{act}}$ of the test set.
2. Compute $R^2 = r^2$, the proportion of the variation in the test-set data explained by the readability formula.
3. Compute the adjacent accuracy. A text counts as correctly assessed if $|Y_{\text{obs}} - Y_{\text{act}}| \le 1$. The adjacent accuracy is the number of correctly assessed texts divided by the total number of texts in the test set.
4. Compute the root-mean-square error:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_{\text{obs},i} - Y_{\text{act},i}\right)^2}$$

The readability grading formula is judged more accurate when:

- r (0 < r < 1) is closer to 1;
- R² (0 < R² < 1) is closer to 1;
- the adjacent accuracy (at most 1) is closer to 1;
- the root-mean-square error is smaller.
Beneficial effects of the present invention:
Based on the characteristics of Chinese, the present invention provides a modeling method that analyses the difficulty features of Chinese text at three levels (character, vocabulary and syntax) and grades the text automatically. This ensures the objectivity of text difficulty evaluation.

Based on statistical principles, the present invention optimizes the feature set after a comprehensive analysis of 44 text features. This simplifies the model, avoids multicollinearity, and improves the interpretability of the model while maintaining prediction accuracy.

The Chinese readability formula and text grading system constructed by the present invention can be combined with Chinese reading-ability assessment. Together they can be used to establish and promote a graded reading system with Chinese characteristics. Such a system matches students' reading ability effectively with book difficulty and scientifically advances the reading ability of children and adolescents.
Brief description of the drawings

Fig. 1 is a flow chart of the grading evaluation method of the invention.

Fig. 2 is a flow chart of the optimal feature set selection.
Detailed description of the embodiments
Embodiment 1
As shown in Fig. 1, the hierarchical evaluation modeling method for the readability of simplified Chinese text of the invention comprises the following steps:

1. Establish a gold standard corpus, i.e. define the dependent variable

1.1 Select appropriate texts
The choice of standard corpus must fit the intended use of the readability formula. The present invention mainly targets reading material for primary school children in mainland China. The texts are therefore taken from the four editions of primary school Chinese textbooks most widely used in mainland China, published by:

- People's Education Press;
- Beijing Normal University Publishing House;
- Jiangsu Education Publishing House;
- Southwest Normal University Press.

One full set (12 volumes) is taken from each publisher, 48 volumes in total. Each volume carries specific grade information (its volume number), which serves as the grade of its texts.
1.2 Screen texts

Classical Chinese and Modern Chinese differ considerably in syntax and word meaning. Modern poetry has no punctuation, which makes sentence-level features hard to count. Texts such as classical poems, classical prose and modern poems were therefore removed by manual inspection.

The final gold standard corpus contains 1478 texts totalling 801,550 characters. Details are given in Table 1.
Table 1 Standard corpus
1.3 Annotate text grades

Each text is annotated with a grade from 1 to 12 according to the textbook volume in which it appears. Each school year has an upper-term and a lower-term volume, giving 12 volumes over six grades.
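The annotation rule in 1.3 amounts to numbering the volumes in teaching order. The sketch below assumes the upper-term volume of grade g is level 2g - 1 and the lower-term volume is level 2g. The function name is hypothetical.

```python
def volume_to_level(grade: int, term: str) -> int:
    """Map a textbook volume (school grade 1-6, 'upper' or 'lower' term) to a level 1-12.

    Assumption: volumes are numbered in teaching order, so the upper-term volume of
    grade g is level 2g - 1 and the lower-term volume is level 2g.
    """
    if not 1 <= grade <= 6:
        raise ValueError("primary school grades run from 1 to 6")
    if term not in ("upper", "lower"):
        raise ValueError("term must be 'upper' or 'lower'")
    return 2 * grade - 1 if term == "upper" else 2 * grade


print(volume_to_level(1, "upper"), volume_to_level(3, "lower"), volume_to_level(6, "lower"))
```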
2. Extract text features, i.e. define the independent variables

2.1 Define text features

The present invention defines a total of 44 text-difficulty features at three levels: character, word and sentence. The feature names and definitions are given in Table 2:

Table 2 Summary of text features
2.2 Text preprocessing

Texts are segmented into words and tagged for part of speech with the NLPIR Chinese word segmentation system, from NLPIR.org (the natural language processing and information retrieval sharing platform). The system's segmentation and tagging accuracy reaches 98.45%.
2.3 Compute text features

2.3.1 Count the following in each text: the number of characters, the number of words, the number of character types (distinct characters), the number of word types (distinct words), and the number of punctuation marks.

2.3.2 Look up the characters and words in reference tables, such as the character stroke-count table and the character and word difficulty-level tables, to obtain the relevant information for each character and word.

2.3.3 Count the part-of-speech distribution of the vocabulary.

2.3.4 From the operational definitions of the 44 features in Table 2 and the results of 2.3.1 to 2.3.3, obtain the 44 feature values of each text.
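The counting in 2.3.1 and the part-of-speech statistics in 2.3.3 can be sketched as follows. This is a minimal illustration, not the patent's implementation. It assumes the segmenter has already produced a list of (token, POS-tag) pairs. The hypothetical `FUNCTION_POS_PREFIXES` set of tags treated as function words is also an assumption, because the exact feature definitions are in Table 2, which is not reproduced here.

```python
import unicodedata

# Hypothetical choice: POS-tag prefixes (ICTCLAS/NLPIR-style) counted as function
# words: adverbs, prepositions, conjunctions, auxiliaries, interjections, modal
# particles, onomatopoeia. The patent's own definition is given in its Table 2.
FUNCTION_POS_PREFIXES = ("d", "p", "c", "u", "e", "y", "o")


def is_han(ch: str) -> bool:
    """True for characters in the basic CJK Unified Ideographs block (a simplification)."""
    return "\u4e00" <= ch <= "\u9fff"


def is_punct(token: str) -> bool:
    """True if every character of a non-empty token is Unicode punctuation."""
    return bool(token) and all(unicodedata.category(ch).startswith("P") for ch in token)


def basic_counts(tagged):
    """Surface counts for one text; `tagged` is a list of (token, pos_tag) pairs."""
    words = [(w, p) for w, p in tagged if not is_punct(w)]
    chars = [ch for w, _ in words for ch in w if is_han(ch)]
    n_words = len(words)
    n_function = sum(1 for _, p in words if p.startswith(FUNCTION_POS_PREFIXES))
    return {
        "char_count": len(chars),                   # character tokens
        "word_count": n_words,                      # word tokens
        "char_types": len(set(chars)),              # distinct characters
        "word_types": len({w for w, _ in words}),   # distinct words
        "punct_count": sum(1 for w, _ in tagged if is_punct(w)),
        "function_word_ratio": n_function / n_words if n_words else 0.0,
    }


sample = [("我", "rr"), ("和", "cc"), ("妈妈", "n"), ("去", "v"),
          ("了", "ule"), ("公园", "n"), ("。", "wj")]
print(basic_counts(sample))
```

Features that depend on the stroke-count and difficulty tables of 2.3.2 would add dictionary lookups on `chars` and `words`. They are omitted because those tables are not given here.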
2.4 Select the optimal feature set

2.4.1 Compute the correlation coefficient r between each of the 44 features ($X_1, X_2, X_3, \ldots, X_{44}$) and the text difficulty grade Y:

$$r_j = \frac{\sum_{i=1}^{n}\left(X_{ji} - \bar X_j\right)\left(Y_i - \bar Y\right)}{n\,\sigma_{X_j}\,\sigma_Y}$$

where:

- j = 1, 2, 3, ..., 44 and n = 1478;
- $\sigma_{X_j}$ and $\sigma_Y$ are the standard deviations of $X_j$ and Y;
- $X_{ji}$ is the score of text i on text feature j;
- $Y_i$ is the text difficulty grade of text i;
- $\bar X_j$ is the mean score of all texts on feature j;
- $\bar Y$ is the mean Y value of all texts.
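Step 2.4.1, together with the ranking by |r| used in 2.4.2, can be sketched as follows. This is a minimal NumPy illustration with hypothetical function names. It uses population standard deviations so that the n in the denominator matches the formula above.

```python
import numpy as np


def pearson_r(x, y):
    """r = sum((x - x_mean) * (y - y_mean)) / (n * sigma_x * sigma_y), population std."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    return float(((x - x.mean()) * (y - y.mean())).sum() / (n * x.std() * y.std()))


def rank_features(X, y, names):
    """Return (name, r) pairs sorted by |r| from largest to smallest."""
    X = np.asarray(X, dtype=float)
    scores = [(name, pearson_r(X[:, j], y)) for j, name in enumerate(names)]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Toy data: 4 texts (rows), 3 features (columns), grades 1-4.
X = [[1, 4, 1], [2, 3, 2], [3, 2, 1], [4, 1, 2]]
y = [1, 2, 3, 4]
print(rank_features(X, y, ["a", "b", "c"]))
```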
2.4.2 Sort the 44 features in descending order of the absolute value of their correlation coefficient r. Following this order, add one feature at a time to the candidate feature set and establish the regression equation

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \mu_i$$

where:

- $Y_i$ is the difficulty grade of text i;
- $X_{1i}, X_{2i}, \ldots, X_{ki}$ are the scores of text i on the k features in the candidate set;
- $\beta_0$ is a constant representing the intercept;
- $\beta_1, \beta_2, \ldots, \beta_k$ are partial regression coefficients. Each is the change in Y when the corresponding variable $X_1, X_2, \ldots, X_k$ changes by one unit while the other variables are held constant.
2.4.3 Judge collinearity

Consider the features $X_1, X_2, \ldots, X_k$ currently in the candidate set. If there exist constants $\lambda_1, \lambda_2, \ldots, \lambda_k$ and $\mu$, not all zero, such that $\lambda_1 X_1 + \lambda_2 X_2 + \cdots + \lambda_k X_k + \mu = 0$, collinearity is judged to exist in the candidate set. Conversely, if no such constants can be found, there is no collinearity.

When collinearity exists in the candidate set, compute the pairwise correlation coefficients among the k candidate features $X_1, X_2, \ldots, X_k$, using the same method as in 2.4.1. If the correlation coefficient between two features exceeds 0.75, those two features are judged to be collinear.
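The pairwise screen of 2.4.3 can be sketched as follows. This is an illustration only, and the function name is hypothetical. It compares the absolute value of r with 0.75, on the assumption that a strong negative correlation should also count as collinear; the text itself only says "greater than 0.75".

```python
import numpy as np


def collinear_pairs(X, names, threshold=0.75):
    """Return (name_a, name_b, r) for feature pairs whose |r| exceeds the threshold."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)  # columns are features
    k = corr.shape[0]
    pairs = []
    for a in range(k):
        for b in range(a + 1, k):
            if abs(corr[a, b]) > threshold:
                pairs.append((names[a], names[b], float(corr[a, b])))
    return pairs


X = [[1, 2, 1], [2, 4, 0], [3, 6, 1], [4, 8.5, 0]]
print(collinear_pairs(X, ["a", "b", "c"]))  # only the pair ("a", "b") is flagged
```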
Suppose features $X_{k-1}$ and $X_k$ are collinear. First build the regression model M0 without these two features:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_{k-2} X_{k-2,i} + \mu_i$$

with parameters as in 2.4.2. Then compute its coefficient of multiple determination

$$R^2 = \frac{\sum_{i=1}^{n}\left(\hat Y_i - \bar Y\right)^2}{\sum_{i=1}^{n}\left(Y_i - \bar Y\right)^2}$$

where $\hat Y_i$ is the Y value of text i computed from the regression model, $Y_i$ is the actual Y value, and $\bar Y$ is the mean of Y.

Next, starting from the features of M0, add $X_{k-1}$ and $X_k$ separately to build models M1 and M2 (parameters as in 2.4.2):

- M1: $Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_{k-2} X_{k-2,i} + \beta_{k-1} X_{k-1,i} + \mu_i$
- M2: $Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_{k-2} X_{k-2,i} + \beta_k X_{ki} + \mu_i$

Obtain their coefficients of multiple determination $R^2_{M1}$ and $R^2_{M2}$ in the same way. Finally, compute the increase in R² of each model relative to M0: $\Delta R^2_{M1} = R^2_{M1} - R^2_{M0}$ and $\Delta R^2_{M2} = R^2_{M2} - R^2_{M0}$. Keep all the features of the model with the larger $\Delta R^2$ in the candidate feature set.

If there is no collinearity in the candidate set, compute the $\Delta R^2$ after the feature is added. If $\Delta R^2 > 2\%$, retain the feature in the candidate set; otherwise discard it.

2.4.4 Repeat steps 2.4.2 to 2.4.3 until all features have been traversed. See Fig. 2 for the flow chart.
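Steps 2.4.2 to 2.4.4 can be sketched as a greedy loop. This is a minimal sketch under stated assumptions, not the inventors' code:

- NumPy's least-squares solver stands in for whatever regression software was used.
- Pairwise |r| > 0.75 serves as the collinearity test, as in 2.4.3.
- When a new feature collides with several already-selected features, only the most strongly correlated partner is resolved.
- The function names are hypothetical.

```python
import numpy as np


def r_squared(X, y, cols):
    """R^2 = sum((yhat - ybar)^2) / sum((y - ybar)^2) for OLS of y on X[:, cols] plus intercept."""
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ beta
    return float(((y_hat - y.mean()) ** 2).sum() / ((y - y.mean()) ** 2).sum())


def abs_corr(a, b):
    return abs(np.corrcoef(a, b)[0, 1])


def select_features(X, y, corr_threshold=0.75, min_gain=0.02):
    """Greedy selection in the spirit of 2.4.2-2.4.4; returns kept column indices."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # 2.4.2: visit features in descending order of |r| with the grade.
    order = sorted(range(X.shape[1]), key=lambda j: abs_corr(X[:, j], y), reverse=True)
    selected = []
    for f in order:
        partners = [g for g in selected if abs_corr(X[:, g], X[:, f]) > corr_threshold]
        if partners:
            # 2.4.3, collinear case: compare M1 (base + g) with M2 (base + f) against M0.
            g = max(partners, key=lambda c: abs_corr(X[:, c], X[:, f]))
            base = [c for c in selected if c != g]            # model M0
            r2_m0 = r_squared(X, y, base)
            gain_g = r_squared(X, y, base + [g]) - r2_m0      # Delta R^2 of M1
            gain_f = r_squared(X, y, base + [f]) - r2_m0      # Delta R^2 of M2
            if gain_f > gain_g:
                selected = base + [f]
        else:
            # No collinearity: keep f only if it raises R^2 by more than 2 %.
            gain = r_squared(X, y, selected + [f]) - r_squared(X, y, selected)
            if gain > min_gain:
                selected.append(f)
    return selected


rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = x0 + 0.05 * rng.normal(size=200)   # nearly a copy of x0 (collinear)
x2 = rng.normal(size=200)               # unrelated to the grade
y = 3 * x0 + 0.5 * rng.normal(size=200)
print(select_features(np.column_stack([x0, x1, x2]), y))
```

With an empty `selected` list, `X[:, []]` has zero columns. The model then reduces to the intercept only, whose R² is 0, so the first feature is judged purely on its own contribution.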
2.4.5 This yields the optimal feature set. In the present invention it contains three features:

- word types;
- average difficulty of character types in the literacy list;
- function-word ratio.
3. Construct the readability formula and evaluate its effect

3.1 Determine the training and test sets

The texts of each textbook edition are randomly divided into a training set and a test set. Within each edition and each volume, the ratio of training texts to test texts is 1:1.
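Step 3.1 is a stratified split and can be sketched as follows. This is an illustration under assumptions:

- Each text is a dict with hypothetical `version` and `volume` keys.
- The fixed seed exists only for reproducibility.
- When a group has an odd number of texts, the extra text goes to the test set, a choice the patent does not specify.

```python
import random
from collections import defaultdict


def split_half_per_volume(texts, seed=0):
    """Split texts 1:1 into (train, test) inside every (version, volume) group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for t in texts:
        groups[(t["version"], t["volume"])].append(t)
    train, test = [], []
    for key in sorted(groups):          # deterministic group order
        items = groups[key][:]
        rng.shuffle(items)
        half = len(items) // 2          # odd groups give the extra text to the test set
        train.extend(items[:half])
        test.extend(items[half:])
    return train, test


texts = [{"id": f"{v}-{vol}-{i}", "version": v, "volume": vol}
         for v in ("A", "B") for vol in (1, 2, 3) for i in range(4)]
train, test = split_half_per_volume(texts)
print(len(train), len(test))
```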
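3.2 Establish the readability formula

Take the grade labels of the training set as the dependent variable Y. Take the optimal feature set determined in step 2.4 (word types, average difficulty of character types in the literacy list, and function-word ratio) as the independent variables (X1, X2, X3). Then construct the readability formula with a linear regression model, as follows.

Suppose Y varies with X1, X2 and X3 and the relationship is linear:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \mu_i$$

where:

- $Y_i$ is the readability grade of text i;
- $X_{1i}$, $X_{2i}$ and $X_{3i}$ are the values of word types, average difficulty of character types in the literacy list, and function-word ratio for that text;
- $\beta_0$ is a constant representing the intercept;
- $\beta_1, \beta_2, \beta_3$ are partial regression coefficients. Each is the change in Y when X1, X2 or X3 changes by one unit while the other variables are held constant.

Let $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ be the least-squares estimates of the parameters $\beta_0, \beta_1, \beta_2, \beta_3$. The regression value of Y is then

$$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i}$$

and the residual $e_i$ between the observation $Y_i$ and the regression value $\hat Y_i$ is

$$e_i = Y_i - \hat Y_i$$

By the least-squares principle, the estimates should minimize the sum of squared deviations between all observations $Y_i$ and their regression values $\hat Y_i$, that is, they should minimize

$$Q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)^2$$

By the extremum principle for functions of several variables, take the first-order partial derivative of Q with respect to each of $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ and set it equal to zero:

$$\frac{\partial Q}{\partial \hat\beta_0} = -2\sum_{i=1}^{n} e_i = 0, \qquad \frac{\partial Q}{\partial \hat\beta_j} = -2\sum_{i=1}^{n} e_i X_{ji} = 0 \quad (j = 1, 2, 3)$$

After rearranging, these conditions read $X'e = 0$ in matrix form, where $e = (e_1, \ldots, e_n)'$ and the model is written as

$$Y = X\beta + \mu, \qquad Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & X_{11} & X_{21} & X_{31} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & X_{1n} & X_{2n} & X_{3n} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix}$$

Let $\hat\beta = (\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3)'$ be the vector of estimates. Multiply both sides of the fitted model $Y = X\hat\beta + e$ on the left by $X'$, the transpose of the sample observation matrix X. This gives $X'Y = X'X\hat\beta + X'e$. Because $X'e = 0$, the result is the system of normal equations

$$X'X\hat\beta = X'Y$$

Since there is no multicollinearity, $X'X$ is a 4 × 4 square matrix of full rank, so its inverse $(X'X)^{-1}$ exists. Therefore

$$\hat\beta = (X'X)^{-1}X'Y$$

is the OLS estimator of β.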
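The closed-form estimator above can be checked numerically. The NumPy sketch below (function name hypothetical) builds X with a leading column of ones and solves the normal equations X'X β̂ = X'Y. It calls `np.linalg.solve` instead of forming $(X'X)^{-1}$ explicitly. This gives the same β̂ whenever X'X is invertible and is numerically better behaved.

```python
import numpy as np


def ols_normal_equations(X, y):
    """Solve (X'X) beta_hat = X'Y for beta_hat = (b0, b1, ..., bk).

    X: n-by-k feature matrix without the intercept column; y: length-n vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # leading ones column -> beta_0
    xtx = design.T @ design                         # X'X: 4 x 4 when k = 3
    xty = design.T @ y                              # X'Y
    # Raises LinAlgError if X'X is singular, i.e. under perfect multicollinearity.
    return np.linalg.solve(xtx, xty)


rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1] + 0.5 * X[:, 2]   # noise-free synthetic data
print(ols_normal_equations(X, y))                   # recovers the coefficients
```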
The estimates obtained are $\hat\beta_0 = -4.84$, $\hat\beta_1 = 0.01$, $\hat\beta_2 = 3.34$ and $\hat\beta_3 = 7.83$, so the final readability formula is:

Grade = -4.84 + 0.01 × word types + 3.34 × average difficulty of character types in the literacy list + 7.83 × function-word ratio.
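Applying the fitted formula is a single weighted sum. The sketch below assumes the three inputs are on the same scales used in fitting. In particular, it assumes the function-word ratio is a fraction between 0 and 1, which this section does not state. The hypothetical `to_level` helper rounds and clamps the score to levels 1-12; that step is also an assumption, not part of the patent.

```python
def predict_level(word_types: float, avg_char_difficulty: float,
                  function_word_ratio: float) -> float:
    """Raw grade score from the fitted formula in 3.2."""
    return (-4.84 + 0.01 * word_types + 3.34 * avg_char_difficulty
            + 7.83 * function_word_ratio)


def to_level(score: float) -> int:
    """Hypothetical post-processing: round to the nearest level and clamp to 1-12."""
    return min(12, max(1, round(score)))


score = predict_level(200, 2.0, 0.3)   # 200 word types, difficulty 2.0, 30 % function words
print(score, to_level(score))
```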
3.3 Evaluate the readability formula

Using the test set as reference, the readability formula is evaluated as follows.

3.3.1 Compute r. Compute the correlation coefficient between the values predicted by the readability formula ($Y_{\text{obs}}$) and the actual values of the test set ($Y_{\text{act}}$), using the same formula as in 2.4.1:

$$r = \frac{\sum_{i=1}^{n}\left(Y_{\text{obs},i} - \bar Y_{\text{obs}}\right)\left(Y_{\text{act},i} - \bar Y_{\text{act}}\right)}{n\,\sigma_{Y_{\text{obs}}}\,\sigma_{Y_{\text{act}}}}$$

where:

- n = 1478;
- $\sigma_{Y_{\text{obs}}}$ and $\sigma_{Y_{\text{act}}}$ are the standard deviations of $Y_{\text{obs}}$ and $Y_{\text{act}}$;
- $Y_{\text{obs},i}$ is the text difficulty grade of text i computed by the readability formula;
- $Y_{\text{act},i}$ is the actual text difficulty grade of text i;
- $\bar Y_{\text{obs}}$ is the mean of the predicted grades of all texts;
- $\bar Y_{\text{act}}$ is the mean of the actual grades of all texts.

r ranges from 0 to 1; the closer it is to 1, the better the readability formula performs.

3.3.2 Compute R². R² is an important indicator of regression quality. It is the proportion of the variation in the test-set difficulty values explained by the readability formula, and R² = r². R² ranges from 0 to 1; the closer it is to 1, the better the formula performs.

3.3.3 Compute the adjacent accuracy. A prediction that differs from the actual value by at most one level is counted as correct. For example, if a text's actual value is 3, a predicted value of 2, 3 or 4 counts as correct. The adjacent accuracy is the proportion of texts with $|Y_{\text{obs}} - Y_{\text{act}}| \le 1$. It ranges from 0 to 1; the closer it is to 1, the better the formula performs.

3.3.4 Compute the root-mean-square error. The RMSE measures the size of the deviation between predicted and actual values:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_{\text{obs},i} - Y_{\text{act},i}\right)^2}$$

The smaller its value, the better.
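The four indicators of 3.3 can be computed together. The pure-Python sketch below (function name hypothetical) assumes `predicted` holds the formula's unrounded outputs and `actual` the annotated grades of the test set. r uses population standard deviations, matching the n in the denominator of the 3.3.1 formula.

```python
import math


def evaluate(predicted, actual):
    """Return r, R^2 = r^2, adjacent accuracy (|obs - act| <= 1) and RMSE."""
    n = len(actual)
    mp = sum(predicted) / n
    ma = sum(actual) / n
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted) / n)   # population std
    sa = math.sqrt(sum((a - ma) ** 2 for a in actual) / n)
    r = sum((p - mp) * (a - ma) for p, a in zip(predicted, actual)) / (n * sp * sa)
    adjacent = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= 1) / n
    rmse = math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)
    return {"r": r, "r2": r * r, "adjacent_accuracy": adjacent, "rmse": rmse}


print(evaluate([1, 3, 3, 6], [1, 2, 4, 4]))
```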
The indicators of the readability formula constructed by the present invention are shown in Table 3:

Table 3 Indicators of the readability formula

The results show that the Chinese readability formula constructed by the present invention can predict the difficulty of Chinese texts for the primary school period and calibrate that difficulty on a 1-12 grade scale.
Claims (6)
1. A hierarchical evaluation modeling method for the readability of simplified Chinese text, characterized in that the method comprises the following steps:

Selecting suitable texts to establish a standard corpus, and annotating each text with a grade level;

Extracting text features:

Defining text-difficulty features at the character, word and sentence levels; segmenting the texts in the standard corpus into words and annotating characters, words and sentences; computing the difficulty feature values of every text; and then selecting the optimal subset of the text-difficulty features;

Constructing the text readability grading formula:

Dividing the texts in the standard corpus into a training set and a test set;

Taking the annotated grades of the training set as the dependent variable Y and the optimal feature set as the independent variables (X1, X2, X3), and using a linear regression model to obtain the readability grading formula:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \mu_i$$

wherein $\beta_0$ is a constant representing the intercept, and $\beta_1$, $\beta_2$ and $\beta_3$ are partial regression coefficients, each representing the change in Y when X1, X2 or X3 changes by one unit while the other variables are held constant;

Evaluating the readability grading formula using the test set as reference.
2. The hierarchical evaluation modeling method for the readability of simplified Chinese text according to claim 1, characterized in that, in the text-feature extraction step, the NLPIR Chinese word segmentation system segments the text into words and tags parts of speech.
3. The hierarchical evaluation modeling method for the readability of simplified Chinese text according to claim 1, characterized in that the optimal feature set is selected by the following steps:

Computing the correlation coefficient between each text-difficulty feature and the text difficulty grade, and sorting the features by the absolute value of the correlation coefficient;

Following this order, adding the difficulty features one at a time to a candidate feature set and establishing a regression equation;

Using a collinearity judgment to decide which text-difficulty features remain in the candidate feature set, the remaining features forming the optimal feature set.
4. The hierarchical evaluation modeling method for the readability of simplified Chinese text according to claim 1, characterized in that the collinearity judgment decides which text-difficulty features remain in the candidate feature set as follows:

If, for the text-difficulty features $X_1, X_2, \ldots, X_k$ in the candidate set, there exist numbers $\lambda_1, \lambda_2, \ldots, \lambda_k$, not all zero, and a constant $\mu$ such that $\lambda_1 X_1 + \lambda_2 X_2 + \cdots + \lambda_k X_k + \mu = 0$, then collinearity exists in the candidate set. In that case, the two text-difficulty features involved in the collinearity are identified; keeping the other features unchanged, the $\Delta R^2$ obtained when each of the two is added is compared; and the feature with the larger $\Delta R^2$ is retained in the candidate set. If there is no collinearity in the candidate set, the $\Delta R^2$ after the feature is added is computed; if $\Delta R^2 > 2\%$, the text-difficulty feature is retained in the candidate set, otherwise it is discarded;

The above steps are repeated until all text-difficulty features have been traversed.
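5. The hierarchical evaluation modeling method for the readability of simplified Chinese text according to claim 1, characterized in that the readability grading formula is constructed as follows:

The annotated grades of the training set are taken as the dependent variable Y and the optimal feature set as the independent variables (X1, X2, X3). Y is assumed to vary with X1, X2 and X3 with a linear relationship:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \mu_i \quad (i = 1, 2, 3, \ldots, n)$$

Letting $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ be the least-squares estimates of the parameters $\beta_0, \beta_1, \beta_2, \beta_3$, the regression value of Y is

$$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i}$$

and the residual $e_i$ between the observation $Y_i$ and the regression value $\hat Y_i$ is

$$e_i = Y_i - \hat Y_i$$

By the least-squares principle, the estimates minimize the sum of squared deviations between all observations $Y_i$ and their regression values $\hat Y_i$, that is, they minimize

$$Q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n}\left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)^2$$

By the extremum principle for functions of several variables, the first-order partial derivative of Q with respect to each of $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ is set equal to zero:

$$\frac{\partial Q}{\partial \hat\beta_0} = -2\sum_{i=1}^{n} e_i = 0, \qquad \frac{\partial Q}{\partial \hat\beta_j} = -2\sum_{i=1}^{n} e_i X_{ji} = 0 \quad (j = 1, 2, 3)$$

In matrix form these conditions read $X'e = 0$, where $e = (e_1, \ldots, e_n)'$ and the model is written as

$$Y = X\beta + \mu, \qquad Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & X_{11} & X_{21} & X_{31} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & X_{1n} & X_{2n} & X_{3n} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix}$$

Letting $\hat\beta = (\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3)'$ be the vector of estimates, both sides of the fitted model $Y = X\hat\beta + e$ are multiplied on the left by $X'$, the transpose of the sample observation matrix X, giving $X'Y = X'X\hat\beta + X'e$. Because $X'e = 0$, the system of normal equations is obtained:

$$X'X\hat\beta = X'Y$$

Since there is no multicollinearity, $X'X$ is a 4 × 4 square matrix of full rank, so its inverse $(X'X)^{-1}$ exists, and

$$\hat\beta = (X'X)^{-1}X'Y$$

is the OLS estimator of β, from which the estimated coefficients are obtained.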
6. The hierarchical evaluation modeling method for the readability of simplified Chinese text according to claim 1, characterized in that, using the test set as reference, the simplified Chinese text readability grading formula is evaluated by the following steps:

Computing the correlation coefficient r between the values $Y_{\text{obs}}$ predicted by the readability formula and the actual values $Y_{\text{act}}$ of the test set;

Computing $R^2 = r^2$, the proportion of the variation in the test-set data explained by the readability formula;

Computing the adjacent accuracy, wherein a text is counted as correctly assessed if $|Y_{\text{obs}} - Y_{\text{act}}| \le 1$, and the adjacent accuracy is the number of correctly assessed texts divided by the total number of texts in the test set;

Computing the root-mean-square error:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_{\text{obs},i} - Y_{\text{act},i}\right)^2}$$

wherein the readability grading formula is judged more accurate the closer r (0 < r < 1) is to 1, the closer R² (0 < R² < 1) is to 1, the closer the adjacent accuracy (at most 1) is to 1, and the smaller the root-mean-square error.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910206775.7A CN109933668B (en) | 2019-03-19 | 2019-03-19 | Hierarchical evaluation modeling method for readability of simplified Chinese text |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910206775.7A CN109933668B (en) | 2019-03-19 | 2019-03-19 | Hierarchical evaluation modeling method for readability of simplified Chinese text |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109933668A true CN109933668A (en) | 2019-06-25 |
CN109933668B CN109933668B (en) | 2021-03-26 |
Family
ID=66987605
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910206775.7A Active CN109933668B (en) | 2019-03-19 | 2019-03-19 | Hierarchical evaluation modeling method for readability of simplified Chinese text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109933668B (en) |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103207854A (en) * | 2012-01-11 | 2013-07-17 | 宋曜廷 | Chinese text readability measuring system and method thereof |
CN103530523A (en) * | 2013-10-23 | 2014-01-22 | 北京师范大学 | Modeling method for child linguistic competence development evaluation |
CN103544393A (en) * | 2013-10-23 | 2014-01-29 | 北京师范大学 | Method for tracking development of language abilities of children |
CN105068993A (en) * | 2015-07-31 | 2015-11-18 | 成都思戴科科技有限公司 | Method for evaluating text difficulty |
US20160357753A1 (en) * | 2015-06-07 | 2016-12-08 | Apple Inc. | Reader application system utilizing article scoring and clustering |
CN106601041A (en) * | 2016-12-15 | 2017-04-26 | 邵宏锋 | Reading information grading analysis processing system |
CN106951406A (en) * | 2017-03-13 | 2017-07-14 | 广西大学 | Method for grading Chinese reading ability based on textual linguistic variables |
CN107609591A (en) * | 2017-09-13 | 2018-01-19 | 深圳市悦好教育科技有限公司 | Book grading method and system |
CN107657559A (en) * | 2017-08-25 | 2018-02-02 | 北京享阅教育科技有限公司 | Chinese reading ability comparison method and system |
US20180107645A1 (en) * | 2016-10-13 | 2018-04-19 | SkywriterRX, Inc. | Book analysis and recommendation |
CN107977449A (en) * | 2017-12-14 | 2018-05-01 | 广东外语外贸大学 | Linear model method for evaluating the readability of simplified Chinese text |
CN107977362A (en) * | 2017-12-11 | 2018-05-01 | 中山大学 | Method for grading Chinese texts and calculating Chinese text difficulty scores |
CN108389147A (en) * | 2018-02-26 | 2018-08-10 | 浙江创课教育科技有限公司 | Item difficulty hierarchical processing method and system |
CN108984531A (en) * | 2018-07-23 | 2018-12-11 | 深圳市悦好教育科技有限公司 | Method and system for determining book reading difficulty based on Chinese-language teaching materials |
Non-Patent Citations (3)
Title |
---|
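Matthew P. Black et al.: "Improvements in predicting children's overall reading ability by modeling variability in evaluators' subjective judgments", 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) * |
Sun Gang (孙刚): "Research on a Chinese text readability prediction method based on linear regression", China Master's Theses Full-text Database, Information Science and Technology * |
Zuo Hong (左虹) et al.: "A study of Chinese text readability formulas for intermediate-level students from Europe and America", Chinese Teaching in the World (世界汉语教学) * |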
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110472236A (en) * | 2019-07-23 | 2019-11-19 | 浙江大学城市学院 | Bidirectional GRU text readability assessment method based on an attention mechanism |
CN111797499A (en) * | 2020-06-02 | 2020-10-20 | 黑龙江省农业科学院绥化分院 | Multi-objective optimization method for crop breeding |
CN111797499B (en) * | 2020-06-02 | 2023-12-15 | 黑龙江省农业科学院绥化分院 | Crop breeding multi-objective optimization method |
CN112115701A (en) * | 2020-09-07 | 2020-12-22 | 北京语言大学 | News reading text readability evaluation method and system |
CN112836275A (en) * | 2021-02-08 | 2021-05-25 | 哈尔滨工业大学 | Stadium emergency evacuation sign readability evaluation system based on fuzzy theory and control method thereof |
CN113408295A (en) * | 2021-06-22 | 2021-09-17 | 深圳证券信息有限公司 | Text readability evaluation method, computer device and computer storage medium |
CN113408295B (en) * | 2021-06-22 | 2023-02-28 | 深圳证券信息有限公司 | Text readability evaluation method, computer device and computer storage medium |
CN113569556A (en) * | 2021-07-28 | 2021-10-29 | 怀化学院 | Classification method for text difficulty in children's reading tests based on the Rous model |
CN113569556B (en) * | 2021-07-28 | 2024-04-02 | 怀化学院 | Grading method for text difficulty in children's reading tests based on the Ross model |
CN113934850A (en) * | 2021-11-02 | 2022-01-14 | 北京语言大学 | Chinese text readability evaluation method and system fusing text distribution law characteristics |
CN113934850B (en) * | 2021-11-02 | 2022-06-17 | 北京语言大学 | Chinese text readability evaluation method and system fusing text distribution law characteristics |
CN115147013A (en) * | 2022-08-31 | 2022-10-04 | 南京复保科技有限公司 | Method and device for calculating readability of insurance product, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109933668B (en) | 2021-03-26 |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CB03 | Change of inventor or designer information | Inventors after: Li Hong, Liu Miaomiao, Li Yan. Inventors before: Li Hong, Li Miaomiao, Li Yan. |