CN109933668A - Hierarchical evaluation modeling method for readability of simplified Chinese text

Info

Publication number
CN109933668A
Authority
CN
China
Prior art keywords
text
feature
difficulty
readability
classified estimation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910206775.7A
Other languages
Chinese (zh)
Other versions
CN109933668B (en)
Inventor
李虹
李苗苗
李燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Normal University
Original Assignee
Beijing Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-03-19
Filing date
2019-03-19
Publication date
2019-06-25
Application filed by Beijing Normal University
Priority to CN201910206775.7A
Publication of CN109933668A
Application granted
Publication of CN109933668B
Legal status: Active
Anticipated expiration

Abstract

The invention belongs to the field of Chinese language data processing and relates in particular to a hierarchical evaluation modeling method for the readability of simplified Chinese text. The method comprises the following steps: creating a standard corpus; extracting text features; and constructing a readability formula and assessing its performance. Building on existing Chinese readability formulas, the invention selects text features at three levels (Chinese character, word, and sentence) and constructs a Chinese text readability formula with graded levels that is suited to native readers of simplified Chinese in primary school.

Description

Hierarchical evaluation modeling method for readability of simplified Chinese text
Technical field
The invention belongs to the field of Chinese language data processing and relates in particular to a hierarchical evaluation modeling method for the readability of simplified Chinese text.
Background art
In today's information society the number of children's books is growing exponentially, and picking books that suit a given child out of this vast supply has become a problem for teachers and parents. According to the theory of the zone of proximal development, the difficulty of children's reading material should be slightly above the child's current level of development, but not too far above it; only then can reading train and improve the child's reading ability. Material that is too difficult damages the child's sense of efficacy in reading and leads the child to avoid reading, while material that is too easy is boring, erodes interest in reading, and neither builds reading habits nor improves reading ability. Most existing book-leveling systems are led by publishers. They rest on no solid theoretical research, and no empirical research has verified their validity, so they are insufficiently scientific, have little public credibility or influence, and offer limited guidance for young readers. To match children's reading ability with book difficulty, reading ability must be assessed accurately and, at the same time, an objective and efficient Chinese text readability formula must be developed to assess text difficulty accurately. This is one of the difficult and actively studied problems in current research on leveled reading.
A readability formula is a mathematical expression that extracts certain quantifiable text features that affect reading difficulty and determines the functional relationship between those features and text difficulty. More than ten readability formulas or leveling systems exist for English, such as the Lexile readability formula and the A-Z leveling system of the United States and the Oxford Reading Tree series of the United Kingdom. These are accurate and widely applied, large leveled-reading systems have been built on them, and they have played a major role in developing English-speaking children's reading ability and reading habits.
Because Chinese and English differ greatly, readability formulas from the English-speaking world cannot be applied directly to Chinese text. Only seven Chinese readability formulas with a published mathematical formula can currently be found. They are aimed mainly at readers of traditional Chinese characters or at teaching Chinese as a foreign language, and most of them provide no specific grading standard, so they offer limited guidance for selecting reading material for primary school pupils in mainland China. Creating a text readability formula for native simplified Chinese readers in primary school therefore remains a challenging, frontier task.
Summary of the invention
The purpose of the present invention is to provide a hierarchical evaluation modeling method for the readability of simplified Chinese text.
The hierarchical evaluation modeling method for the readability of simplified Chinese text according to a specific embodiment of the present invention comprises the following steps:
Selecting suitable texts to establish a standard corpus, and labeling each text with a grade level;
Extracting text features:
defining text difficulty features at the character, word, and sentence levels; performing word segmentation and part-of-speech tagging on the texts in the standard corpus; calculating the difficulty feature values of each text; and then selecting the optimal feature set of text difficulty features;
Constructing the hierarchical text readability evaluation formula:
dividing the texts in the standard corpus into a training text set and a test text set;
taking the labeled grade levels of the training text set as the dependent variable $Y$ and the optimal feature set as the independent variables ($X_1$, $X_2$, $X_3$), and using a linear regression model to obtain the hierarchical readability evaluation formula:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$, where $Y_i$ is the readability level (1-12) of text $i$; $X_{1i}$, $X_{2i}$, and $X_{3i}$ are the values of the three features of the optimal feature set for that text; $\beta_0$ is a constant, the intercept; $\beta_1$, $\beta_2$, and $\beta_3$ are partial regression coefficients, each giving the change in $Y$ when $X_1$, $X_2$, or $X_3$ changes by one unit while the other variables are held constant; and $\varepsilon_i$ is the error term;
Assessing the readability formula against the test text set.
In the hierarchical evaluation modeling method according to a specific embodiment of the present invention, the text feature extraction step uses the NLPIR Chinese word segmentation system to perform word segmentation and part-of-speech tagging on the texts.
In the hierarchical evaluation modeling method according to a specific embodiment of the present invention, the optimal feature set is selected through the following steps:
Calculating the correlation of each text difficulty feature with the text grade level, and sorting the text difficulty features in descending order of the absolute value of the correlation coefficient;
Following that order, adding text difficulty features one at a time to a candidate feature set and establishing a regression equation;
Using a collinearity test to decide which text difficulty features remain in the candidate feature set, thereby obtaining the optimal feature set.
In the hierarchical evaluation modeling method according to a specific embodiment of the present invention, the collinearity test decides which text difficulty features remain in the candidate feature set as follows:
If, for the text difficulty features $X_1, X_2, \dots, X_k$ in the candidate feature set, there exist numbers $\lambda_1, \lambda_2, \dots, \lambda_k$, not all zero, such that $\lambda_1 X_1 + \lambda_2 X_2 + \dots + \lambda_k X_k + \mu = 0$ for some constant $\mu$, then collinearity exists in the candidate feature set. In that case the two text difficulty features involved in the collinearity are identified, the $\Delta R^2$ obtained by adding each of the two features while all other features are kept unchanged is compared, and the feature with the larger $\Delta R^2$ is retained in the candidate feature set. If there is no collinearity in the candidate feature set, the $\Delta R^2$ obtained by adding the feature is calculated; if $\Delta R^2 > 2\%$, the feature is retained in the candidate feature set, otherwise it is removed;
The above steps are repeated until all text difficulty features have been traversed.
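In the hierarchical evaluation modeling method according to a specific embodiment of the present invention, the hierarchical readability evaluation formula for simplified Chinese text is constructed as follows:
The labeled grade levels of the training text set are taken as the dependent variable $Y$ and the optimal feature set as the independent variables ($X_1$, $X_2$, $X_3$). Suppose $Y$ varies with $X_1$, $X_2$, $X_3$ and the relationship is linear: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$ ($i = 1, 2, 3, \dots, n$). Let $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ be the least-squares estimates of the parameters $\beta_0, \beta_1, \beta_2, \beta_3$. The regression value of $Y$ can then be expressed as:
$$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i}$$
The residual $e_i$ between the observed value $Y_i$ and the regression value $\hat Y_i$ is
$$e_i = Y_i - \hat Y_i$$
According to the least-squares method, $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ should minimize the sum of squared deviations between all observed values $Y_i$ and regression values $\hat Y_i$, that is, minimize
$$Q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)^2$$
According to the extremum principle for functions of several variables, the first-order partial derivatives of $Q$ with respect to $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ are taken and set equal to zero:
$$\frac{\partial Q}{\partial \hat\beta_0} = -2\sum_{i=1}^{n} (Y_i - \hat Y_i) = 0, \qquad \frac{\partial Q}{\partial \hat\beta_j} = -2\sum_{i=1}^{n} (Y_i - \hat Y_i)\,X_{ji} = 0 \quad (j = 1, 2, 3)$$
In matrix form these conditions read
$$X'e = 0$$
where $X$ is the $n \times 4$ sample observation matrix whose first column is all ones and $e$ is the residual vector.
Let $\hat\beta = (\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3)'$ be the vector of estimates. Multiplying both sides of the sample regression model $Y = X\hat\beta + e$ on the left by the transpose $X'$ of the sample observation matrix $X$ gives $X'Y = X'X\hat\beta + X'e$, and because $X'e = 0$ this yields the normal equations
$$X'Y = X'X\hat\beta$$
Since there is no multicollinearity and $X'X$ is a square matrix of order 4, $X'X$ has full rank and its inverse $(X'X)^{-1}$ exists. Thus $\hat\beta = (X'X)^{-1}X'Y$ is the OLS estimator of $\beta$, which yields the coefficient estimates $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$.
In the hierarchical evaluation modeling method according to a specific embodiment of the present invention, the hierarchical readability evaluation formula for simplified Chinese text is assessed against the test text set through the following steps: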
Calculating the correlation $r$ between the predicted values $Y_{\text{pred}}$ computed with the readability formula and the actual values $Y_{\text{actual}}$ of the test text set;
Calculating the proportion of variance $R^2$ in the test text set data that the readability formula explains, $R^2 = r^2$;
Calculating the adjacent accuracy: for each text the absolute deviation $|Y_{\text{pred}} - Y_{\text{actual}}|$ is computed, and the assessment is counted as correct if the deviation is no greater than 1; the proportion of correctly assessed texts among all texts in the test text set is the adjacent accuracy;
Calculating the root-mean-square error:
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_{\text{pred},i} - Y_{\text{actual},i}\right)^2}$$
The hierarchical readability evaluation formula is judged more accurate when:
$0 < r < 1$ and $r$ is closer to 1, and
$0 < R^2 < 1$ and $R^2$ is closer to 1, and
the adjacent accuracy, which is at most 1, is closer to 1, and
the root-mean-square error is smaller.
Beneficial effects of the present invention:
Based on the characteristics of the Chinese language, the invention provides a modeling method that analyzes the difficulty features of Chinese text at three levels (Chinese character, word, and syntax) and performs automated hierarchical evaluation, which ensures the objectivity of text difficulty assessment;
Based on statistical principles, the invention optimizes the features after a comprehensive analysis of 44 text features, which simplifies the model, avoids multicollinearity, and improves the interpretability of the model while preserving predictive accuracy;
The Chinese readability formula and text leveling system constructed by the invention can be combined with assessments of Chinese reading ability to build and promote a leveled reading system with Chinese characteristics, effectively matching students' reading ability with book difficulty and supporting, on a scientific basis, the development of reading ability in children and adolescents.
Brief description of the drawings
Fig. 1 shows the flow chart of the hierarchical evaluation method of the invention;
Fig. 2 shows the flow chart for selecting the optimal feature set.
Detailed description of embodiments
Embodiment 1
As shown in Fig. 1, the hierarchical evaluation modeling method for the readability of simplified Chinese text of the invention comprises the following steps:
1. Establishing a gold-standard corpus, i.e. defining the dependent variable
1.1 Selecting appropriate texts
The choice of standard corpus must fit the intended use of the readability formula. The present invention is aimed mainly at reading material for primary school children in mainland China, so the texts are taken from four widely used editions of primary school Chinese language textbooks in mainland China, published by People's Education Press, Beijing Normal University Press, Jiangsu Education Publishing House, and Southwest China Normal University Press. One set (12 volumes) is taken from each publisher, 48 volumes in total. Each volume carries explicit level information (its volume number), which can serve as the grade level of its texts.
1.2 Screening texts
Classical and modern Chinese differ considerably in syntax and word meaning, and modern poems lack punctuation, which makes sentence-level text features hard to compute. Classical poems, classical prose, modern poems, and similar texts are therefore removed by manual inspection. The final gold-standard corpus contains 1,478 texts totaling 801,550 words; details are given in Table 1.
Table 1. Standard corpus
1.3 Labeling text grade levels
Each text is labeled with a level from 1 to 12 according to the volume of the textbook in which it appears (each school year has a first-semester and a second-semester volume, giving 12 volumes over six school years).
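The labeling rule in 1.3 can be sketched as a small function. This is a minimal illustration that assumes volumes are numbered consecutively, so that school year 1, first semester is level 1 and school year 6, second semester is level 12; the function name `grade_label` and its arguments are hypothetical and do not come from the patent.

```python
def grade_label(school_year: int, semester: int) -> int:
    """Map a textbook volume (school year 1-6, semester 1-2) to a level 1-12.

    Assumption: volumes are numbered consecutively, two per school year.
    """
    if not (1 <= school_year <= 6 and semester in (1, 2)):
        raise ValueError("expected school year 1-6 and semester 1 or 2")
    return 2 * (school_year - 1) + semester


if __name__ == "__main__":
    # First volume, a middle volume, and the last volume.
    for year, sem in [(1, 1), (3, 2), (6, 2)]:
        print(year, sem, grade_label(year, sem))
```

Every text in a given volume receives the same label, so the label is a property of the volume and not of the individual text.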
2. Extracting text features, i.e. defining the independent variables
2.1 Defining text features
The invention defines a total of 44 text difficulty features at three levels (character, word, and sentence). The feature names and definitions are given in Table 2:
Table 2. Summary of text features
2.2 Text preprocessing
The NLPIR Chinese word segmentation system (from NLPIR.org, the Natural Language Processing and Information Retrieval sharing platform) is used for word segmentation and part-of-speech tagging of the texts; the system's segmentation and tagging accuracy reaches 98.45%.
2.3 Calculating text features
2.3.1 Count the number of characters, words, character types, word types, and punctuation marks in each text;
2.3.2 Look up each character and word in reference tables such as the character and word tables, the Chinese character stroke-count table, and the character and word difficulty-level tables to obtain the relevant information for each character and word;
2.3.3 Compute the part-of-speech distribution of the vocabulary;
2.3.4 Using the operational definitions of the 44 features in Table 2 and the results of 2.3.1 to 2.3.3, obtain the 44 feature values for each text.
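The counting step in 2.3.1 and the part-of-speech statistics in 2.3.3 can be sketched as follows. The sketch assumes the text has already been segmented and tagged and is available as (word, tag) pairs; it does not call NLPIR. The punctuation set, the tag set treated as function words, and all names are illustrative assumptions, since the operational definitions of Table 2 are not reproduced here.

```python
# Assumed punctuation marks and assumed function-word tags (illustrative only).
PUNCTUATION = set("，。！？；：、")
FUNCTION_POS = {"p", "c", "u", "y"}  # preposition, conjunction, particle, modal


def basic_features(tagged_tokens):
    """Compute simple counts from a list of (word, pos_tag) pairs."""
    words = [(w, pos) for w, pos in tagged_tokens if w not in PUNCTUATION]
    chars = [ch for w, _ in words for ch in w]
    n_function = sum(1 for _, pos in words if pos in FUNCTION_POS)
    return {
        "char_count": len(chars),                    # running characters
        "word_count": len(words),                    # running words
        "char_types": len(set(chars)),               # distinct characters
        "word_types": len({w for w, _ in words}),    # distinct words
        "punct_count": sum(1 for w, _ in tagged_tokens if w in PUNCTUATION),
        "function_word_ratio": n_function / len(words) if words else 0.0,
    }


# A made-up sentence with made-up tags, for illustration only.
sample = [("我", "r"), ("喜欢", "v"), ("妈妈", "n"), ("的", "u"), ("书", "n"),
          ("，", "w"), ("我", "r"), ("也", "d"), ("喜欢", "v"), ("画画", "v"),
          ("。", "w")]
features = basic_features(sample)
print(features)
```

Features that depend on the lookup tables of 2.3.2, such as difficulty levels, would need those tables as additional inputs and are left out of this sketch.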
2.4 Selecting the optimal feature set
2.4.1 Calculate the correlation coefficient ($r$) between each of the 44 features ($X_1, X_2, X_3, \dots, X_{44}$) and the text grade level ($Y$), specifically
$$r_j = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(X_{ji} - \bar X_j\right)\left(Y_i - \bar Y\right)}{\sigma_{X_j}\,\sigma_Y}$$
where $j = 1, 2, 3, \dots, 44$; $n = 1478$; $\sigma_{X_j}$ and $\sigma_Y$ are the standard deviations of $X_j$ and $Y$; $X_{ji}$ is the score of text $i$ on text feature $j$; $Y_i$ is the grade level of text $i$; $\bar X_j$ is the mean score of all texts on text feature $j$; and $\bar Y$ is the mean $Y$ value of all texts.
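A minimal sketch of step 2.4.1 and of the ranking used in 2.4.2: compute Pearson's $r$ between each feature and the grade level, then sort the features by $|r|$ in descending order. The feature names and values are toy data, and the helper names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


def rank_features(features, grades):
    """Return (name, r) pairs sorted by |r| from largest to smallest."""
    scored = {name: pearson_r(vals, grades) for name, vals in features.items()}
    return sorted(scored.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy data: five texts with three invented features.
grades = [1, 2, 3, 4, 5]
toy_features = {
    "c": [1, 1, 2, 1, 1],
    "a": [2, 4, 6, 8, 10],
    "b": [5, 3, 4, 1, 2],
}
ranking = rank_features(toy_features, grades)
for name, r in ranking:
    print(name, round(r, 3))
```

Sorting by the absolute value means a strongly negative feature (here `b`) ranks ahead of a weakly positive one.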
2.4.2 Sort the 44 features in descending order of the absolute value of the correlation coefficient ($r$), add features one at a time in that order to the candidate feature set, and establish the regression equation $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i$,
where $Y_i$ is the grade level of text $i$; $X_{1i}, X_{2i}, \dots, X_{ki}$ are the scores of that text on the $k$ features in the candidate feature set; $\beta_0$ is a constant, the intercept; and $\beta_1, \beta_2, \dots, \beta_k$ are partial regression coefficients, each giving the change in $Y$ when $X_1, X_2, \dots, X_k$ changes by one unit while the other variables are held constant.
2.4.3 Testing for collinearity
If at this point, for the features $X_1, X_2, \dots, X_k$ in the candidate feature set, there exist constants $\lambda_1, \lambda_2, \dots, \lambda_k, \mu$, not all zero, such that $\lambda_1 X_1 + \lambda_2 X_2 + \dots + \lambda_k X_k + \mu = 0$, collinearity is judged to exist in the candidate feature set. Conversely, if this equation has no solution, i.e. no constants $\lambda_1, \lambda_2, \dots, \lambda_k, \mu$, not all zero, can be found that satisfy it, there is no collinearity problem.
When collinearity exists in the candidate feature set, the pairwise correlation coefficients among the $k$ features $X_1, X_2, \dots, X_k$ are calculated (by the same method as in 2.4.1). If the correlation coefficient between two features is greater than 0.75, those two features are judged to be collinear.
Suppose features $X_{k-1}$ and $X_k$ are collinear. First establish the regression model $M_0$ without these two features: $Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_{k-2} X_{k-2,i} + \varepsilon_i$ (parameters as in 2.4.2), and compute the multiple coefficient of determination of the model
$$R^2_{M_0} = \frac{\sum_{i=1}^{n}\left(\hat Y_i - \bar Y\right)^2}{\sum_{i=1}^{n}\left(Y_i - \bar Y\right)^2}$$
where $\hat Y_i$ is the $Y$ value of each text computed from the regression model, $Y_i$ is the actual $Y$ value, and $\bar Y$ is the mean $Y$ value;
Then add features $X_{k-1}$ and $X_k$ separately to the features of model $M_0$ to establish model $M_1$: $Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_{k-2} X_{k-2,i} + \beta_{k-1} X_{k-1,i} + \varepsilon_i$ and model $M_2$: $Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_{k-2} X_{k-2,i} + \beta_k X_{ki} + \varepsilon_i$ (parameters as in 2.4.2), and obtain the multiple coefficients of determination $R^2_{M_1}$ and $R^2_{M_2}$ of models $M_1$ and $M_2$ in the same way. Finally, compute the increase in $R^2$ of models $M_1$ and $M_2$ relative to model $M_0$: $\Delta R^2_{M_1} = R^2_{M_1} - R^2_{M_0}$; $\Delta R^2_{M_2} = R^2_{M_2} - R^2_{M_0}$, and retain in the candidate feature set all features of the model with the larger $\Delta R^2$.
If there is no collinearity problem in the candidate feature set, compute the $\Delta R^2$ obtained by adding the feature; if $\Delta R^2 > 2\%$, the feature is retained in the candidate feature set, otherwise it is removed.
2.4.4 Repeat steps 2.4.2 to 2.4.3 until all features have been traversed; the flow chart is shown in Fig. 2.
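The loop of 2.4.2 to 2.4.4 can be sketched as below. This is a simplified reading and not the patent's exact procedure: collinearity is detected only through the pairwise test (absolute correlation above 0.75, where using the absolute value is an assumption), and a feature is compared against at most one already-retained rival. The toy data and all names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(columns, y):
    """R^2 of an OLS fit of y on the given feature columns plus an intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    return sum((f - mean_y) ** 2 for f in fitted) / sum((v - mean_y) ** 2 for v in y)


def select_features(features, y, r_limit=0.75, gain_limit=0.02):
    """Forward selection with a pairwise collinearity check (simplified)."""
    order = sorted(features, key=lambda f: abs(pearson_r(features[f], y)), reverse=True)
    kept = []
    for name in order:
        rival = next((k for k in kept
                      if abs(pearson_r(features[k], features[name])) > r_limit), None)
        if rival is None:
            # No collinearity: keep the feature only if it adds more than 2% R^2.
            base = r_squared([features[k] for k in kept], y)
            gain = r_squared([features[k] for k in kept + [name]], y) - base
            if gain > gain_limit:
                kept.append(name)
        else:
            # Collinear pair: both models share M0, so comparing R^2 compares delta R^2.
            others = [k for k in kept if k != rival]
            r2_rival = r_squared([features[k] for k in others + [rival]], y)
            r2_new = r_squared([features[k] for k in others + [name]], y)
            if r2_new > r2_rival:
                kept[kept.index(rival)] = name
    return kept


# Toy data: "a2" is collinear with "a"; "c" adds almost nothing.
toy = {
    "a": [1, 1, 1, 1, -1, -1, -1, -1],
    "a2": [1.5, 0.5, 0.5, 1.5, -1.5, -0.5, -0.5, -1.5],
    "b": [1, 1, -1, -1, 1, 1, -1, -1],
    "c": [1, -1, 1, -1, 1, -1, 1, -1],
}
levels = [11.6, 11.4, 7.6, 7.4, 5.6, 5.4, 1.6, 1.4]
selected = select_features(toy, levels)
print(selected)
```

On the toy data, `a2` loses to `a` because it explains less variance, `b` is kept because it adds about 31% to $R^2$, and `c` is dropped because it adds far less than 2%.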
2.4.5 The optimal feature set is thus obtained. In the present invention it contains three features: word types, average difficulty of literacy character types, and function word ratio.
3. Constructing the readability formula and assessing its performance
3.1 Determining the training and test text sets
The texts in each Chinese language textbook are randomly divided into a training text set and a test text set such that, within every edition and every volume, the ratio of training texts to test texts is 1:1.
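A stratified 1:1 split of this kind can be sketched as follows, assuming each text is a record that carries its edition and volume. The record layout, the fixed seed, and the function name are illustrative assumptions.

```python
import random
from collections import defaultdict


def split_corpus(texts, seed=0):
    """Split texts 1:1 into training and test sets within each (edition, volume)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for text in texts:
        groups[(text["edition"], text["volume"])].append(text)
    train, test = [], []
    for key in sorted(groups):
        members = groups[key][:]
        rng.shuffle(members)          # random assignment inside the stratum
        half = len(members) // 2
        train.extend(members[:half])
        test.extend(members[half:])
    return train, test


# Toy corpus: 2 editions x 2 volumes x 4 texts.
corpus = [{"id": f"{e}-{v}-{i}", "edition": e, "volume": v}
          for e in ("A", "B") for v in (1, 2) for i in range(4)]
train_set, test_set = split_corpus(corpus)
print(len(train_set), len(test_set))
```

When a stratum has an odd number of texts, this sketch puts the extra text in the test set, so the ratio is exactly 1:1 only for even-sized strata.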
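3.2 Establishing the readability formula
The labeled grade levels of the training text set are taken as the dependent variable $Y$ and the optimal feature set determined in step 2.4 (word types, average difficulty of literacy character types, and function word ratio) as the independent variables ($X_1$, $X_2$, $X_3$). A linear regression model is used to construct the readability formula, as follows:
Suppose $Y$ varies with $X_1$, $X_2$, $X_3$ and the relationship is linear, expressed as:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$,
where $Y_i$ is the readability level of text $i$; $X_{1i}$, $X_{2i}$, $X_{3i}$ are the values of word types, average difficulty of literacy character types, and function word ratio for that text; $\beta_0$ is a constant, the intercept; and $\beta_1$, $\beta_2$, $\beta_3$ are partial regression coefficients, each giving the change in $Y$ when $X_1$, $X_2$, or $X_3$ changes by one unit while the other variables are held constant.
Let $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ be the least-squares estimates of the parameters $\beta_0, \beta_1, \beta_2, \beta_3$. The regression value of $Y$ can then be expressed as:
$$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i}$$
The residual $e_i$ between the observed value $Y_i$ and the regression value $\hat Y_i$ is
$$e_i = Y_i - \hat Y_i$$
According to the least-squares method, $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ should minimize the sum of squared deviations between all observed values $Y_i$ and regression values $\hat Y_i$, that is, minimize
$$Q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)^2$$
According to the extremum principle for functions of several variables, the first-order partial derivatives of $Q$ with respect to $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ are taken and set equal to zero:
$$\frac{\partial Q}{\partial \hat\beta_0} = -2\sum_{i=1}^{n} (Y_i - \hat Y_i) = 0, \qquad \frac{\partial Q}{\partial \hat\beta_j} = -2\sum_{i=1}^{n} (Y_i - \hat Y_i)\,X_{ji} = 0 \quad (j = 1, 2, 3)$$
After rearranging and simplifying, the matrix form of these conditions is
$$X'e = 0$$
where $X$ is the $n \times 4$ sample observation matrix whose first column is all ones and $e$ is the residual vector.
Let $\hat\beta = (\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3)'$ be the vector of estimates. Multiplying both sides of the sample regression model $Y = X\hat\beta + e$ on the left by the transpose $X'$ of the sample observation matrix $X$ gives $X'Y = X'X\hat\beta + X'e$, and because $X'e = 0$ this yields the normal equations
$$X'Y = X'X\hat\beta$$
Since there is no multicollinearity and $X'X$ is a square matrix of order 4, $X'X$ has full rank and its inverse $(X'X)^{-1}$ exists. Thus $\hat\beta = (X'X)^{-1}X'Y$ is the OLS estimator of $\beta$.
Finally, the estimates are obtained: $\hat\beta_0 = -4.84$, $\hat\beta_1 = 0.01$, $\hat\beta_2 = 3.34$, $\hat\beta_3 = 7.83$.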
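The estimator $\hat\beta = (X'X)^{-1}X'Y$ can be illustrated in a few lines of standard-library Python. As a design choice the sketch solves the normal equations $X'X\hat\beta = X'Y$ by Gaussian elimination instead of forming the inverse explicitly, which gives the same result when $X'X$ has full rank. The data are invented so that the true coefficients are known in advance; they are not the patent's corpus.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(feature_rows, y):
    """Return [b0, b1, ..., bk] solving the normal equations X'X b = X'Y."""
    rows = [[1.0] + list(r) for r in feature_rows]   # prepend the intercept column
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(r[p] * yi for r, yi in zip(rows, y)) for p in range(k)]
    return solve(xtx, xty)


# Invented data generated exactly from Y = 1 + 2*X1 - 1*X2 + 0.5*X3.
X = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (0, 1, 1), (2, 1, 3)]
Y = [3.0, 0.0, 1.5, 2.0, 0.5, 5.5]
beta_hat = fit_ols(X, Y)
print([round(b, 6) for b in beta_hat])
```

Because the toy responses contain no noise, the recovered coefficients match the generating ones up to floating-point error; with real data the fit would only minimize $Q$.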
The resulting readability formula is:
Grade level = -4.84 + 0.01 × word types + 3.34 × average difficulty of literacy character types + 7.83 × function word ratio.
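Applying the published formula is a single arithmetic expression, sketched below. The coefficients are those of the formula above; the parameter names, the example feature values, and the rounding and clipping in `to_grade` are assumptions added for illustration, since the text does not state how a fractional result is mapped to a level from 1 to 12.

```python
def readability_level(word_types, literacy_char_difficulty, function_word_ratio):
    """Evaluate the readability formula; returns an unrounded level."""
    return (-4.84
            + 0.01 * word_types
            + 3.34 * literacy_char_difficulty
            + 7.83 * function_word_ratio)


def to_grade(level):
    """Assumed post-processing: round to the nearest integer and clip to 1-12."""
    return max(1, min(12, round(level)))


# Made-up feature values for one text.
raw = readability_level(word_types=200, literacy_char_difficulty=1.5,
                        function_word_ratio=0.2)
print(round(raw, 3), to_grade(raw))
```

For these invented inputs the terms are 2.0, 5.01, and 1.566, which together with the intercept give 3.736.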
3.3 Assessing the readability formula
The readability formula is assessed against the test text set through the following steps:
3.3.1 Calculate $r$: the correlation coefficient between the predicted values computed with the readability formula ($Y_{\text{pred}}$) and the actual values of the test text set ($Y_{\text{actual}}$), using the same formula as in 2.4.1, specifically
$$r = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(Y_{\text{pred},i} - \bar Y_{\text{pred}}\right)\left(Y_{\text{actual},i} - \bar Y_{\text{actual}}\right)}{\sigma_{Y_{\text{pred}}}\,\sigma_{Y_{\text{actual}}}}$$
where $n = 1478$; $\sigma_{Y_{\text{pred}}}$ and $\sigma_{Y_{\text{actual}}}$ are the standard deviations of $Y_{\text{pred}}$ and $Y_{\text{actual}}$; $Y_{\text{pred},i}$ is the grade level of text $i$ computed with the readability formula; $Y_{\text{actual},i}$ is the actual grade level of text $i$; $\bar Y_{\text{pred}}$ is the mean predicted grade level of all texts; and $\bar Y_{\text{actual}}$ is the mean actual grade level of all texts. The value of $r$ lies between 0 and 1; the closer it is to 1, the better the readability formula performs.
3.3.2 Calculate $R^2$: $R^2$ is an important index of regression performance and represents the proportion of variance in the difficulty values of the test text set that the readability formula explains, $R^2 = r^2$.
$R^2$ lies between 0 and 1; the closer it is to 1, the better the readability formula performs.
3.3.3 Calculate the adjacent accuracy: a prediction that differs from the actual value by at most one level is counted as correct. For example, if the actual level of a text is 3, predicted values of 2, 3, or 4 are all counted as correct. The adjacent accuracy is the proportion of texts for which $|Y_{\text{pred}} - Y_{\text{actual}}| \le 1$. It lies between 0 and 1; the closer it is to 1, the better the readability formula performs.
3.3.4 Root-mean-square error: the root-mean-square error is the square root of the mean squared deviation between the predicted and actual values, calculated as:
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_{\text{pred},i} - Y_{\text{actual},i}\right)^2}$$
The smaller its value, the better.
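The four indices of 3.3 can be computed together, as in the sketch below. It compares unrounded predictions with actual levels, which is an assumption; the toy numbers and the function names are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def evaluate(pred, actual):
    """Return r, R^2 = r^2, adjacent accuracy, and RMSE for a test set."""
    n = len(pred)
    r = pearson_r(pred, actual)
    # A text counts as correct when the prediction is within one level.
    adjacent = sum(1 for p, a in zip(pred, actual) if abs(p - a) <= 1) / n
    rmse = math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / n)
    return {"r": r, "r2": r * r, "adjacent_accuracy": adjacent, "rmse": rmse}


# Toy test set: five predicted levels and the corresponding actual levels.
predicted = [1.2, 2.9, 4.1, 6.8, 9.5]
actual_levels = [1, 3, 6, 7, 10]
metrics = evaluate(predicted, actual_levels)
for name, value in metrics.items():
    print(name, round(value, 4))
```

In this toy set only the third text misses by more than one level, so four of the five predictions count as correct.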
The performance indices of the readability formula constructed by the invention are shown in Table 3:
Table 3. Performance indices of the readability formula
The results show that the Chinese readability formula constructed by the invention can be used to predict the difficulty of Chinese text at the primary school stage and to assign difficulty levels from 1 to 12.

Claims (6)

1. A hierarchical evaluation modeling method for the readability of simplified Chinese text, characterized in that the hierarchical evaluation modeling method comprises the following steps:
selecting suitable texts to establish a standard corpus, and labeling each text with a grade level;
extracting text features:
defining text difficulty features at the character, word, and sentence levels; performing word segmentation and part-of-speech tagging on the texts in the standard corpus; calculating the difficulty feature values of each text; and then selecting the optimal feature set of text difficulty features;
constructing the hierarchical text readability evaluation formula:
dividing the texts in the standard corpus into a training text set and a test text set;
taking the labeled grade levels of the training text set as the dependent variable $Y$ and the optimal feature set as the independent variables ($X_1$, $X_2$, $X_3$), and using a linear regression model to obtain the hierarchical readability evaluation formula:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$,
where $\beta_0$ is a constant, the intercept, and $\beta_1$, $\beta_2$, and $\beta_3$ are partial regression coefficients, each giving the change in $Y$ when $X_1$, $X_2$, or $X_3$ changes by one unit while the other variables are held constant;
assessing the hierarchical readability evaluation formula against the test text set.
2. The hierarchical evaluation modeling method for the readability of simplified Chinese text according to claim 1, characterized in that, in the text feature extraction step, the NLPIR Chinese word segmentation system is used to perform word segmentation and part-of-speech tagging on the texts.
3. The hierarchical evaluation modeling method for the readability of simplified Chinese text according to claim 1, characterized in that the optimal feature set is selected through the following steps:
calculating the correlation coefficient between each text difficulty feature and the text grade level, and sorting the text difficulty features by the absolute value of the correlation coefficient;
following that order, adding difficulty features one at a time to a candidate feature set and establishing a regression equation;
using a collinearity test to decide which text difficulty features remain in the candidate feature set, thereby obtaining the optimal feature set.
4. The hierarchical evaluation modeling method for the readability of simplified Chinese text according to claim 1, characterized in that the collinearity test decides which text difficulty features remain in the candidate feature set as follows:
if, for the text difficulty features $X_1, X_2, \dots, X_k$ in the candidate feature set, there exist numbers $\lambda_1, \lambda_2, \dots, \lambda_k$, not all zero, such that $\lambda_1 X_1 + \lambda_2 X_2 + \dots + \lambda_k X_k + \mu = 0$ for some constant $\mu$, then collinearity exists in the candidate feature set; in that case the two text difficulty features involved in the collinearity are identified, the $\Delta R^2$ obtained by adding each of the two features separately while all other features are kept unchanged is compared, and the feature with the larger $\Delta R^2$ is retained in the candidate feature set; if there is no collinearity in the candidate feature set, the $\Delta R^2$ obtained by adding the feature is calculated, and if $\Delta R^2 > 2\%$ the text difficulty feature is retained in the candidate feature set, otherwise it is removed;
the above steps are repeated until all text difficulty features have been traversed.
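5. The hierarchical evaluation modeling method for the readability of simplified Chinese text according to claim 1, characterized in that the hierarchical readability evaluation formula is constructed as follows:
the labeled grade levels of the training text set are taken as the dependent variable $Y$ and the optimal feature set as the independent variables ($X_1$, $X_2$, $X_3$); supposing $Y$ varies with $X_1$, $X_2$, $X_3$ and the relationship is linear: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \varepsilon_i$ ($i = 1, 2, 3, \dots, n$), let $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ be the least-squares estimates of the parameters $\beta_0, \beta_1, \beta_2, \beta_3$; the regression value of $Y$ can then be expressed as:
$$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i}$$
the residual $e_i$ between the observed value $Y_i$ and the regression value $\hat Y_i$ is
$$e_i = Y_i - \hat Y_i$$
according to the least-squares method, $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ should minimize the sum of squared deviations between all observed values $Y_i$ and regression values $\hat Y_i$, that is, minimize
$$Q = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i} - \hat\beta_3 X_{3i}\right)^2$$
according to the extremum principle for functions of several variables, the first-order partial derivatives of $Q$ with respect to $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$ are taken and set equal to zero:
$$\frac{\partial Q}{\partial \hat\beta_0} = -2\sum_{i=1}^{n} (Y_i - \hat Y_i) = 0, \qquad \frac{\partial Q}{\partial \hat\beta_j} = -2\sum_{i=1}^{n} (Y_i - \hat Y_i)\,X_{ji} = 0 \quad (j = 1, 2, 3)$$
in matrix form these conditions read
$$X'e = 0$$
where $X$ is the $n \times 4$ sample observation matrix whose first column is all ones and $e$ is the residual vector;
letting $\hat\beta = (\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3)'$ be the vector of estimates, multiplying both sides of the sample regression model $Y = X\hat\beta + e$ on the left by the transpose $X'$ of the sample observation matrix $X$ gives $X'Y = X'X\hat\beta + X'e$, and because $X'e = 0$ this yields the normal equations
$$X'Y = X'X\hat\beta$$
since there is no multicollinearity and $X'X$ is a square matrix of order 4, $X'X$ has full rank and its inverse $(X'X)^{-1}$ exists, so that $\hat\beta = (X'X)^{-1}X'Y$ is the OLS estimator of $\beta$,
which yields the coefficient estimates $\hat\beta_0, \hat\beta_1, \hat\beta_2, \hat\beta_3$.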
6. The hierarchical evaluation modeling method for the readability of simplified Chinese text according to claim 1, characterized in that the hierarchical readability evaluation formula for simplified Chinese text is assessed against the test text set through the following steps:
calculating the correlation coefficient $r$ between the predicted values $Y_{\text{pred}}$ computed with the readability formula and the actual values $Y_{\text{actual}}$ of the test text set;
calculating the proportion of variance $R^2$ in the test text set data that the readability formula explains, $R^2 = r^2$;
calculating the adjacent accuracy: for each text the absolute deviation $|Y_{\text{pred}} - Y_{\text{actual}}|$ is computed, and the assessment is counted as correct if the deviation is no greater than 1; the proportion of correctly assessed texts among all texts in the test text set is the adjacent accuracy;
calculating the root-mean-square error:
$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_{\text{pred},i} - Y_{\text{actual},i}\right)^2}$$
the hierarchical readability evaluation formula being judged more accurate when:
$0 < r < 1$ and $r$ is closer to 1, and
$0 < R^2 < 1$ and $R^2$ is closer to 1, and
the adjacent accuracy, which is at most 1, is closer to 1, and
the root-mean-square error is smaller.
CN201910206775.7A 2019-03-19 2019-03-19 Hierarchical evaluation modeling method for readability of simplified Chinese text Active CN109933668B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910206775.7A CN109933668B (en) 2019-03-19 2019-03-19 Hierarchical evaluation modeling method for readability of simplified Chinese text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910206775.7A CN109933668B (en) 2019-03-19 2019-03-19 Hierarchical evaluation modeling method for readability of simplified Chinese text

Publications (2)

Publication Number Publication Date
CN109933668A (en) 2019-06-25
CN109933668B (en) 2021-03-26

Family

ID=66987605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910206775.7A Active CN109933668B (en) 2019-03-19 2019-03-19 Hierarchical evaluation modeling method for readability of simplified Chinese text

Country Status (1)

Country Link
CN (1) CN109933668B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103207854A (en) * 2012-01-11 2013-07-17 宋曜廷 Chinese text readability measuring system and method thereof
CN103530523A (en) * 2013-10-23 2014-01-22 北京师范大学 Modeling method for child linguistic competence development evaluation
CN103544393A (en) * 2013-10-23 2014-01-29 北京师范大学 Method for tracking development of language abilities of children
CN105068993A (en) * 2015-07-31 2015-11-18 成都思戴科科技有限公司 Method for evaluating text difficulty
US20160357753A1 (en) * 2015-06-07 2016-12-08 Apple Inc. Reader application system utilizing article scoring and clustering
CN106601041A (en) * 2016-12-15 2017-04-26 邵宏锋 Reading information grading analysis processing system
CN106951406A (en) * 2017-03-13 2017-07-14 广西大学 A kind of stage division of the Chinese reading ability based on text language variable
CN107609591A (en) * 2017-09-13 2018-01-19 深圳市悦好教育科技有限公司 A kind of books stage division and system
CN107657559A (en) * 2017-08-25 2018-02-02 北京享阅教育科技有限公司 A kind of Chinese reading capability comparison method and system
US20180107645A1 (en) * 2016-10-13 2018-04-19 SkywriterRX, Inc. Book analysis and recommendation
CN107977449A (en) * 2017-12-14 2018-05-01 广东外语外贸大学 A kind of linear model approach estimated for simplified form of Chinese Character readability
CN107977362A (en) * 2017-12-11 2018-05-01 中山大学 A kind of method defined the level for Chinese text and calculate the scoring of Chinese text difficulty
CN108389147A (en) * 2018-02-26 2018-08-10 浙江创课教育科技有限公司 Item difficulty hierarchical processing method and system
CN108984531A (en) * 2018-07-23 2018-12-11 深圳市悦好教育科技有限公司 Books reading difficulty method and system based on language teaching material

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
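Matthew P. Black et al.: "Improvements in predicting children's overall reading ability by modeling variability in evaluators' subjective judgments", 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) *
Sun Gang: "Research on Chinese text readability prediction methods based on linear regression", China Masters' Theses Full-text Database, Information Science and Technology *
Zuo Hong et al.: "A study of a Chinese text readability formula for intermediate-level European and American international students", Chinese Teaching in the World (《世界汉语教学》) *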

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472236A (en) * 2019-07-23 2019-11-19 浙江大学城市学院 A kind of two-way GRU text readability appraisal procedure based on attention mechanism
CN111797499A (en) * 2020-06-02 2020-10-20 黑龙江省农业科学院绥化分院 Multi-objective optimization method for crop breeding
CN111797499B (en) * 2020-06-02 2023-12-15 黑龙江省农业科学院绥化分院 Crop breeding multi-objective optimization method
CN112115701A (en) * 2020-09-07 2020-12-22 北京语言大学 News reading text readability evaluation method and system
CN112836275A (en) * 2021-02-08 2021-05-25 哈尔滨工业大学 Stadium emergency evacuation sign readability evaluation system based on fuzzy theory and control method thereof
CN113408295A (en) * 2021-06-22 2021-09-17 深圳证券信息有限公司 Text readability evaluation method, computer device and computer storage medium
CN113408295B (en) * 2021-06-22 2023-02-28 深圳证券信息有限公司 Text readability evaluation method, computer device and computer storage medium
CN113569556A (en) * 2021-07-28 2021-10-29 怀化学院 Rous model-based classification method for text difficulty in reading test of children
CN113569556B (en) * 2021-07-28 2024-04-02 怀化学院 Grading method for children reading test text difficulty based on Ross model
CN113934850A (en) * 2021-11-02 2022-01-14 北京语言大学 Chinese text readability evaluation method and system fusing text distribution law characteristics
CN113934850B (en) * 2021-11-02 2022-06-17 北京语言大学 Chinese text readability evaluation method and system fusing text distribution law characteristics
CN115147013A (en) * 2022-08-31 2022-10-04 南京复保科技有限公司 Method and device for calculating readability of insurance product, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109933668A (en) The classified estimation modeling method of simplified Chinese language text readability
CN107967318A (en) A kind of Chinese short text subjective item automatic scoring method and system using LSTM neutral nets
CN106528656A (en) Student history and real-time learning state parameter-based course recommendation realization method and system
Jayakodi et al. WordNet and cosine similarity based classifier of exam questions using bloom’s taxonomy
CN107977362A (en) A kind of method defined the level for Chinese text and calculate the scoring of Chinese text difficulty
CN105740382A (en) Aspect classification method for short comment texts
Pong-Inwong et al. Teaching senti-lexicon for automated sentiment polarity definition in teaching evaluation
CN105786898B (en) A kind of construction method and device of domain body
KR102484007B1 (en) Method and system for estimating a reading index using automatic analysis program for text of korean language
Dascalu et al. Age of exposure: A model of word learning
CN110472236A (en) A kind of two-way GRU text readability appraisal procedure based on attention mechanism
CN114780723B (en) Portrayal generation method, system and medium based on guide network text classification
Agarwal et al. Autoeval: A nlp approach for automatic test evaluation system
CN113934850B (en) Chinese text readability evaluation method and system fusing text distribution law characteristics
Dascălu et al. Towards an integrated approach for evaluating textual complexity for learning purposes
CN112115701B (en) News reading text readability evaluation method and system
CN104794168B (en) A kind of Knowledge Relation method and system
Botarleanu et al. Age of Exposure 2.0: Estimating word complexity using iterative models of word embeddings
CN107329951A (en) Build name entity mark resources bank method, device, storage medium and computer equipment
Pang Chinese readability analysis and its applications on the internet
Pavlekovic et al. Modeling children’s mathematical gift by neural networks and logistic regression
Karadeniz et al. Sustainability in Urban and Regional Planning Education in Turkey.
Wheeler et al. Exploring rater accuracy using unfolding models combined with topic models: Incorporating supervised latent Dirichlet allocation
Ke et al. Autoscoring essays based on complex networks
Limongelli et al. Enriching didactic similarity measures of concept maps by a deep learning based approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Li Hong, Liu Miaomiao, Li Yan
Inventor before: Li Hong, Li Miaomiao, Li Yan