CN101520773A - Method for measuring cognitive difficulty of text - Google Patents

Info

Publication number
CN101520773A
Authority
CN
China
Prior art keywords
text
boolean expression
keyword
cognitive difficulty
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN200910048309A
Other languages
Chinese (zh)
Inventor
方宁
骆祥峰
徐炜民
刘方方
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Shanghai for Science and Technology
Original Assignee
University of Shanghai for Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Shanghai for Science and Technology
Priority to CN200910048309A
Publication of CN101520773A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)

Abstract

The invention relates to a method for measuring the cognitive difficulty of a text. Firstly, a single text is defined as a boolean expression consisting of a plurality of key words and sentences, and then the boolean expression is subjected to logical operation to measure the cognitive difficulty of the text. The core of the method is that the understanding process of the text is abstracted into a cognitive process of concept learning, and the cognitive difficulty of the text is measured by performing the logical operation on objects (the sentences) and attributes (the key words) in the concept. The method can calculate the logical relation among the sentences in the text, thereby being convenient for a computer to process.

Description

Method for measuring the cognitive difficulty of text
Technical field:
The present invention relates to a method for measuring the cognitive difficulty of a text and, more specifically, to a method that measures the cognitive difficulty of a text by computing the logical relations between its sentences.
Background art:
Understanding a text depends on grasping the logical relations between its sentences. Traditional text analysis stops at grammatical analysis and simple semantic analysis; quantitative analysis of the logical relations between sentences is rare. The present invention measures the cognitive difficulty of a text from the perspective of cognitive science, thereby providing technical support for machine-based text understanding.
Summary of the invention:
The object of the invention is to address the limitations of current machine analysis of text by providing a method for measuring the cognitive difficulty of a text. In essence, the process of understanding a text is abstracted as a concept-learning process, and the cognitive difficulty of the text is measured by logical operations on the objects (sentences) and attributes (keywords) of the concept.
To achieve this object, the concept of the invention is as follows: extract the keywords of a text, that is, its significant nouns and verbs; record which keywords occur and which do not occur in each sentence, forming a Boolean expression; reduce this Boolean expression by whatever means to its simplest form, that is, the form containing the fewest variables; then calculate the cognitive difficulty of the text.
In accordance with the above inventive concept, the invention adopts the following technical solution:
A method for measuring the cognitive difficulty of a text, characterized by the following steps:
(1) identify the keywords and sentences in a text, the keywords being the nouns and verbs that carry important meaning in the text;
(2) construct a Boolean expression according to whether each keyword occurs in each sentence;
(3) perform logical operations on the Boolean expression to calculate the cognitive difficulty of the text.
The Boolean expression in step (2) is a sum of terms. Each term represents one sentence and is a product of variables, and each variable represents one keyword. If a keyword occurs in the corresponding sentence, the variable is "1"; if it does not occur, the variable is "0". A sentence is thus treated as a product of keywords and a text as a sum of sentences, so the text is represented by a single Boolean expression, the initial Boolean expression.
The logical operation in step (3) is logic minimization of the Boolean expression, carried out until the simplest Boolean expression is obtained.
The simplest Boolean expression is the shortest one, that is, the one containing the fewest variables; it is likewise a sum of terms.
The cognitive difficulty of the text is the number of variables in the simplest Boolean expression divided by the number of variables in the initial Boolean expression.
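The ratio just described can be written compactly. In the sketch below, the symbols D, N_min and N_init are labels introduced here for illustration and are not notation from the patent; both counts include every occurrence of a variable, complemented or not, as in the worked examples of the embodiment.

```latex
D = \frac{N_{\min}}{N_{\mathrm{init}}}, \qquad 0 \le D \le 1
```

Here N_init is the number of variable occurrences in the initial Boolean expression and N_min the number in the simplest equivalent expression. The upper bound holds because the initial expression is itself one candidate for the simplest form.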
Compared with the prior art, the invention has the following evident substantive features and notable advantages: it first defines a single text as a Boolean expression composed of keywords and sentences, and then performs logical operations on this expression to measure the cognitive difficulty of the text. Its core is the abstraction of text understanding as the cognitive process of concept learning. Because cognitive difficulty is measured by logical operations on the objects (sentences) and attributes (keywords) of the concept, the method can compute the logical relations between the sentences of a text and is therefore convenient for computer processing.
Embodiment:
A preferred embodiment of the invention is as follows. The method for measuring the cognitive difficulty of a text comprises these steps:
1. Identify the keywords and sentences in a text; the keywords are the nouns and verbs that carry important meaning in the text.
2. Construct a Boolean expression according to whether each keyword occurs in each sentence. This Boolean expression is a sum of terms; each term represents one sentence and is a product of variables, and each variable represents one keyword. If a keyword occurs in the corresponding sentence, the variable is "1"; if it does not occur, the variable is "0". A sentence is treated as a product of keywords and a text as a sum of sentences, so the text can be represented by one Boolean expression (the initial Boolean expression).
Suppose there are two keywords, represented by the variables a and b. The term ab denotes a sentence in which both keywords a and b occur; the two-term expression a+b denotes a text whose first sentence contains keyword a and whose second sentence contains keyword b; the variable a' denotes that keyword a does not occur.
3. Perform logical operations on the Boolean expression to calculate the cognitive difficulty of the text, as follows:
(1) Apply logic minimization to the Boolean expression until the simplest Boolean expression is obtained. The simplest Boolean expression is the shortest one, that is, the one containing the fewest variables. It is a sum of terms, each of which is a product of variables.
(2) Divide the number of variables in the simplest Boolean expression by the number of variables in the initial Boolean expression to obtain the cognitive difficulty of the text.
Suppose a text of two sentences has the initial Boolean expression ab+ab', which contains 4 variables. It reduces as ab+ab' = a(b+b') = a, so the simplest Boolean expression contains 1 variable and the cognitive difficulty of the text is 1/4. Suppose another text of two sentences has the expression ab+a'b'. No shorter equivalent expression exists, so the simplest Boolean expression also has length 4 and the cognitive difficulty of the text is 1. Finding the shortest expression equivalent to a given expression is a very hard problem; in practice, expressions are reduced with approximate techniques such as factoring.
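The two worked cases above can be reproduced with a small program. The following is a minimal sketch, not the patent's procedure: it swaps the approximate techniques mentioned in the text for an exhaustive search that is only practical for a handful of keywords. The tuple encoding of terms and the function names are assumptions made for illustration.

```python
from itertools import combinations, product


def minterms_of(cube):
    """Assignments covered by a term; cube[i] is 1, 0, or None (variable absent)."""
    choices = [(v,) if v is not None else (0, 1) for v in cube]
    return set(product(*choices))


def literal_count(terms):
    """Number of variable occurrences in a sum of product terms."""
    return sum(v is not None for t in terms for v in t)


def minimal_literal_count(terms, n):
    """Fewest variable occurrences of any equivalent sum of products (brute force)."""
    on_set = set().union(*(minterms_of(t) for t in terms))
    # Implicants: product terms that are true only where the expression is true.
    implicants = [c for c in product((0, 1, None), repeat=n)
                  if minterms_of(c) <= on_set]
    best = None
    for k in range(1, len(implicants) + 1):
        for subset in combinations(implicants, k):
            covered = set().union(*(minterms_of(c) for c in subset))
            if covered == on_set:
                size = literal_count(subset)
                if best is None or size < best:
                    best = size
    return best


def cognitive_difficulty(terms, n):
    """Ratio of minimal to initial variable occurrences."""
    return minimal_literal_count(terms, n) / literal_count(terms)


# ab + ab' : variables are ordered (a, b); 1 = plain, 0 = complemented.
first = [(1, 1), (1, 0)]
# ab + a'b'
second = [(1, 1), (0, 0)]
print(cognitive_difficulty(first, 2), cognitive_difficulty(second, 2))
```

The search enumerates every subset of implicants, so its cost grows very quickly with the number of keywords, which is consistent with the text's remark that exact minimization is hard.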
Two comparative examples follow:
1. Suppose there is a dialogue between two people, A and B.
A: Can you tell me the time?
B: The milkman has just left.
We extract two keywords from the dialogue (set in bold in the original dialogue), namely time (denoted a) and milkman (denoted b).
2. The initial Boolean expression of this dialogue is ab'+a'b.
3. The initial Boolean expression of this dialogue contains 4 variables. Because it cannot be reduced, the simplest Boolean expression also contains 4 variables. The cognitive difficulty of the text is therefore 1, indicating that the dialogue is hard to understand.
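The construction of the initial Boolean expression for this dialogue can be sketched as follows. The patent does not say how keywords are detected, so the lowercase whole-word lookup used here, along with the function names, is an assumption made purely for illustration.

```python
import re


def initial_expression(sentences, keywords):
    """One term per sentence: maps each variable to True (keyword present) or False."""
    terms = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        terms.append({var: (kw in words) for var, kw in keywords.items()})
    return terms


def format_expression(terms):
    """Render terms in the patent's notation, with ' marking an absent keyword."""
    return " + ".join(
        "".join(var if present else var + "'" for var, present in sorted(t.items()))
        for t in terms
    )


dialogue = ["Can you tell me the time?", "The milkman has just left."]
keywords = {"a": "time", "b": "milkman"}  # variable name -> keyword

expression = format_expression(initial_expression(dialogue, keywords))
print(expression)
```

Each sentence contributes one product term containing every variable, plain or complemented, which is why the two-sentence dialogue yields four variable occurrences.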
The steps of a second example of the logic-based method for measuring cognitive difficulty are as follows:
1. We now add a piece of background knowledge to the dialogue:
A: Can you tell me the time?
B: The milkman has just left.
Background knowledge: The time the milkman leaves is 6 am.
2. The initial Boolean expression of this dialogue is ab'+a'b+ab.
3. The initial Boolean expression, now including the background knowledge, contains 6 variables. A heuristic method reduces it to the simplest Boolean expression a+b, which contains 2 variables. The cognitive difficulty of the dialogue with background knowledge is therefore 0.333, showing that adding background knowledge reduces the cognitive difficulty of a text.
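The reduction in this example can be checked with a truth table. The sketch below only verifies that the two expressions agree on every assignment and recomputes the ratio; it does not show that a+b is the shortest possible form, and the function names are illustrative assumptions.

```python
from itertools import product


def initial(a, b):
    # ab' + a'b + ab
    return (a and not b) or (not a and b) or (a and b)


def simplified(a, b):
    # a + b
    return a or b


# Compare the two expressions on all four assignments of (a, b).
equivalent = all(
    initial(a, b) == simplified(a, b)
    for a, b in product((False, True), repeat=2)
)

# 2 variable occurrences after reduction, 6 before.
difficulty = round(2 / 6, 3)
print(equivalent, difficulty)
```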
Comparing the two examples shows that the measure defined by the invention does reflect the cognitive difficulty of actual text.

Claims (5)

1. A method for measuring the cognitive difficulty of a text, characterized by the following steps:
(1) identify the keywords and sentences in a text, said keywords being the nouns and verbs that carry important meaning in the text;
(2) construct a Boolean expression according to whether each said keyword occurs in each sentence;
(3) perform logical operations on said Boolean expression to calculate the cognitive difficulty of the text.
2. The method for measuring the cognitive difficulty of a text according to claim 1, characterized in that the Boolean expression in said step (2) is a sum of terms, each said term representing one sentence and being a product of variables, each said variable representing one said keyword; if a keyword occurs in the corresponding sentence, said variable is "1"; if the keyword does not occur, said variable is "0"; a sentence is treated as a product of keywords and a text as a sum of sentences, so that the text is represented by one Boolean expression, the initial Boolean expression.
3. The method for measuring the cognitive difficulty of a text according to claim 2, characterized in that the logical operation performed on the Boolean expression in said step (3) is logic minimization of the Boolean expression, carried out until the simplest Boolean expression is obtained.
4. The method for measuring the cognitive difficulty of a text according to claim 3, characterized in that said simplest Boolean expression is the shortest Boolean expression, containing the fewest said variables; said simplest Boolean expression is a sum of said terms.
5. The method for measuring the cognitive difficulty of a text according to claim 4, characterized in that the cognitive difficulty of the text is obtained by dividing the number of said variables in said simplest Boolean expression by the number of variables in said initial Boolean expression.
CN200910048309A 2009-03-26 2009-03-26 Method for measuring cognitive difficulty of text Pending CN101520773A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910048309A CN101520773A (en) 2009-03-26 2009-03-26 Method for measuring cognitive difficulty of text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910048309A CN101520773A (en) 2009-03-26 2009-03-26 Method for measuring cognitive difficulty of text

Publications (1)

Publication Number Publication Date
CN101520773A true CN101520773A (en) 2009-09-02

Family

ID=41081369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910048309A Pending CN101520773A (en) 2009-03-26 2009-03-26 Method for measuring cognitive difficulty of text

Country Status (1)

Country Link
CN (1) CN101520773A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105068993A (en) * 2015-07-31 2015-11-18 成都思戴科科技有限公司 Method for evaluating text difficulty
CN105068993B (en) * 2015-07-31 2018-08-07 成都思戴科科技有限公司 A method of assessment text difficulty

Similar Documents

Publication Publication Date Title
Sharma et al. Prediction of Indian election using sentiment analysis on Hindi Twitter
Zesch et al. Wisdom of crowds versus wisdom of linguists–measuring the semantic relatedness of words
CN105005553B (en) Short text Sentiment orientation analysis method based on sentiment dictionary
CN104881402B (en) The method and device of Chinese network topics comment text semantic tendency analysis
CN102693279B (en) Method, device and system for fast calculating comment similarity
CN105302794B (en) A kind of Chinese finger event recognition method and system together
Krug et al. Rule-based coreference resolution in German historic novels
Pakray et al. Textual entailment using lexical and syntactic similarity
WO2014065392A1 (en) Information extraction system, information extraction method, and information extraction program
Zhang et al. Term recognition using conditional random fields
JP2014219872A (en) Utterance selecting device, method and program, and dialog device and method
Li et al. Comparison of current semantic similarity methods in wordnet
Liu et al. Chinese syntactic and typological properties based on dependency syntactic treebanks
Ansari Sentiment polarity classification using structural features
Bel et al. The use of sequences of linguistic categories in forensic written text comparison revisited
Krishna et al. A hybrid method for query based automatic summarization system
CN101520773A (en) Method for measuring cognitive difficulty of text
Bopche et al. Grammar checking system using rule based morphological process for an Indian language
Zhao et al. A CRF sequence labeling approach to Chinese punctuation prediction
Nazar et al. A co-occurrence taxonomy from a general language corpus
Guo et al. Web-based chinese term extraction in the field of study
Wali et al. Using standardized lexical semantic knowledge to measure similarity
CN102609413A (en) Control method and system for semantically enhanced relationship measure among word pairs
El-Shishtawy et al. A lemma based evaluator for semitic language text summarization systems
Asadifar et al. Hcqa: Hybrid and complex question answering on textual corpus and knowledge graph

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Open date: 20090902