CN101567004B - English text automatic abstracting method based on eye tracking - Google Patents
- Publication number
- CN101567004B
- Authority
- CN
- China
- Prior art keywords
- user
- speech
- text
- sentence
- document
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to an automatic summarization method for English text based on eye tracking. Existing methods cannot generate personalized text summaries for different readers. The method comprises the following steps: obtaining the time a user attends to each word of an electronic text while reading it, using an eye-tracking device or a camera; predicting the user's interest in each sentence based on text similarity; and generating a personalized summary by combining the user interest with an automatic text summarization algorithm. The method effectively incorporates user interest into the automatic summarization of English text, so that the final summary is closer to the content the user expects, enabling summarization software to provide better personalized service to the user.
Description
Technical field
The invention belongs to the fields of computer information retrieval and human-computer interfaces, and relates to a personalized automatic summarization method for English text based on eye tracking.
Background technology
A large number of research projects and results already address the automatic summarization of English text by computer, covering both general documents and documents from specific knowledge domains. For example, Richard Alterman et al. published "Summarization in the Small" in 1986 in "Advances in Cognitive Science" and "Text Summarization" in 1992 in the "Encyclopedia of Artificial Intelligence"; Klaus Zechner presented "Automatic Generation of Concise Summaries of Spoken Dialogues in Unrestricted Domains" at the SIGIR 2001 conference; L. Zuang et al. presented "Movie Review Mining and Summarization" at the CIKM 2006 conference; and Wong et al. presented "Extractive Summarization Using Supervised and Semi-Supervised Learning" at the Coling 2008 conference. Existing systems include the MEAD summarization system developed by Radev et al. in 2003 and the GIN system developed in 2007 by the CLAIR research group at the University of Michigan, Ann Arbor, USA. None of these methods produces personalized text summaries for different readers, so they cannot satisfy individual readers' needs.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a personalized automatic summarization method for English text based on eye tracking.
The method of the invention comprises the following steps:
Step 1) Obtain the time the user attends to each word in the text while reading the electronic document, as follows:
(a) Initialize the user attention time of every word in the text to 0.
(b) Every 0.1 seconds, obtain the focal position (x, y) of the user's eyes on the screen through an eye tracker or a camera. Obtaining the on-screen focal position of the user's eyes with an eye tracker or a camera is a mature existing technique.
(c) For each word wi in the text, whose position on the current screen is (xi, yi), the increment AT(wi) of its user attention time after this interval is:
where kx and ky are respectively the average width and average height of a word of the text on the screen, and AT(wi) is measured in seconds.
(d) Repeat steps (b) and (c) until the user has finished reading the electronic document, which yields the user attention time of each word in the text.
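Steps (a) to (d) amount to a sampling loop that accumulates a per-word time total. The sketch below illustrates that loop in Python. The increment formula of step (c) is not reproduced in this text, so the Gaussian falloff in `attention_increment` is purely an assumed stand-in, chosen only so that a word under the gaze point gains the full 0.1 s per sample and a distant word gains almost nothing. The function names and the dictionary of word positions are likewise hypothetical.

```python
import math

SAMPLE_INTERVAL_S = 0.1  # sampling interval stated in step (b)


def attention_increment(gaze, word_pos, kx, ky, dt=SAMPLE_INTERVAL_S):
    """Increment of attention time for one word after one sample.

    ASSUMPTION: a Gaussian falloff scaled by the average word width kx
    and height ky. This is an illustrative stand-in, not the formula
    of step (c), which is missing from the text.
    """
    x, y = gaze
    xi, yi = word_pos
    d2 = ((x - xi) / kx) ** 2 + ((y - yi) / ky) ** 2
    return dt * math.exp(-0.5 * d2)


def accumulate_attention(gaze_samples, word_positions, kx, ky):
    """Accumulate attention time (seconds) per word over all gaze samples.

    gaze_samples: iterable of (x, y) screen positions, one per 0.1 s.
    word_positions: dict mapping a word identifier to its (xi, yi).
    """
    totals = {w: 0.0 for w in word_positions}        # step (a)
    for gaze in gaze_samples:                        # step (b)
        for w, pos in word_positions.items():        # step (c)
            totals[w] += attention_increment(gaze, pos, kx, ky)
    return totals                                    # step (d)


if __name__ == "__main__":
    positions = {"eye": (100.0, 50.0), "far": (900.0, 400.0)}
    samples = [(100.0, 50.0)] * 3  # three samples fixated on "eye"
    print(accumulate_attention(samples, positions, kx=40.0, ky=20.0))
```

Keying the totals by a word identifier rather than by the word string lets repeated occurrences of the same word be tracked separately if a caller wants that; the method itself does not say which granularity is intended.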
Step 2) Predict the user interest degree of every sentence in the text based on text similarity, as follows:
(e) Compute the semantic similarity Sim(wi, wj) between any two words wi and wj in the text; this similarity is a real number in the range [0, 1]. The computation uses the method proposed by Y. Li et al. in "An approach for measuring semantic similarity between words using multiple information sources", published in 2003 in IEEE Transactions on Knowledge and Data Engineering.
(f) For any word w in the text, select the k words in the text that are most similar to it, where k = min(10, n) and n is the number of distinct words in the text. Denoting the k selected words w1, w2, ..., wk, predict the user interest degree of word w by formula (1):
where γ is a constant that controls how much weight the value of Sim() carries; ε is a positive integer constant that prevents the denominator of formula (1) from being 0; and the function δ(), which filters out low-similarity terms, is defined as:
(g) The user interest degree I(s) of any sentence s in the text is the sum of the user interest degrees of all distinct words in that sentence.
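The neighbour selection in step (f) can be sketched independently of formula (1). The Python example below picks the k = min(10, n) most similar words for a given word. The similarity function is passed in as a parameter; `char_jaccard` is only a toy stand-in used to make the example runnable and is not the measure of Y. Li et al. Whether w itself counts among the candidates is not stated in the text; this sketch keeps it in, which guarantees that k words are always available. All names are hypothetical.

```python
def char_jaccard(a, b):
    """Toy similarity in [0, 1]: Jaccard overlap of character sets.

    ASSUMPTION: placeholder only, standing in for a real semantic
    similarity measure.
    """
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def top_k_similar(word, words, sim, k_max=10):
    """Return the k words most similar to `word`, with k = min(k_max, n).

    words: all words of the text (duplicates allowed).
    sim:   callable sim(w1, w2) -> float in [0, 1].
    """
    distinct = list(dict.fromkeys(words))   # distinct words, order kept
    k = min(k_max, len(distinct))           # k = min(10, n)
    ranked = sorted(distinct, key=lambda w: sim(word, w), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    text_words = ["gaze", "gate", "text", "gate"]
    print(top_k_similar("gaze", text_words, char_jaccard))
```

Because Python's `sorted` is stable, words with equal similarity keep their order of first appearance in the text.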
Step 3) Generate the personalized summary by combining the user interest degree with an automatic text summarization algorithm, as follows:
(h) Let the summary length required by the user be c% of the text length, and use a semantic-analysis-based automatic text summarization algorithm to obtain a summary with compression ratio c%. The semantic-analysis-based algorithm may be any mature existing method, such as Word AutoSummarize or MEAD.
(i) For each sentence s in the text, compute the offset $I_{offset}(s)$ of its user interest degree:
where I(si) is the user interest degree of sentence si; s1, s2, ..., sm are all the sentences in the text; and m is the total number of sentences in the text. If sentence s appears in the summary obtained in step (h), λ(s) is 1; otherwise λ(s) is 0. k is a free parameter in the range 0 to 1.
(j) Compute the adjusted user interest degree $I_{adj}(s)$ of each sentence s in the text:
$I_{adj}(s) = I(s) + I_{offset}(s)$
(k) Rank all sentences s in the text from highest to lowest adjusted user interest degree and select the top c% as the summary of the text.
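Steps (j) and (k) reduce to adding two per-sentence scores and keeping the highest-ranked c%. The following is a minimal Python sketch, assuming that the offsets of step (i) have already been computed elsewhere (their formula is not reproduced in this text), that the number of kept sentences is rounded up with at least one sentence retained, and that the selected sentences are returned in document order. None of these details is specified by the method, and `select_summary` is a hypothetical name.

```python
import math


def select_summary(interest, offset, c_percent):
    """Return indices of the sentences chosen for the summary.

    interest:  list of I(s), one value per sentence, in document order.
    offset:    list of the step (i) offsets, same length (taken as given).
    c_percent: requested summary length as a percentage of the text.
    """
    # Step (j): adjusted interest is the sum of interest and offset.
    adjusted = [i + o for i, o in zip(interest, offset)]

    # ASSUMPTION: round up and keep at least one sentence.
    n_keep = max(1, math.ceil(len(adjusted) * c_percent / 100))

    # Step (k): rank from highest to lowest adjusted interest.
    ranked = sorted(range(len(adjusted)),
                    key=lambda idx: adjusted[idx], reverse=True)

    # ASSUMPTION: present the chosen sentences in document order.
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    I = [0.2, 0.9, 0.1, 0.5]
    I_off = [0.4, 0.0, 0.0, 0.0]
    print(select_summary(I, I_off, c_percent=50))
```

In the usage lines the first sentence overtakes the fourth only because of its offset, which is the effect the adjustment in step (j) is meant to have on the final ranking.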
The method of the invention effectively incorporates the user's preferences into the automatic summarization of English text, so that the final summary is closer to the content the user expects, enabling automatic summarization software to provide better personalized service to the user.
Description of drawings
Fig. 1 is a flow diagram of an embodiment of the method of the invention.
Embodiment
As shown in Fig. 1, the English text automatic summarization method based on eye tracking comprises the following modules: eye tracking device 10, user attention time sample collection 20, user interest degree prediction 30, traditional automatic text summarization method 40, user interest degree adjustment 50, and automatic text summarization result 60. The specific steps are as follows:
Step 1) Obtain the time the user attends to each word in the text while reading the electronic document, as follows:
(a) Initialize the user attention time of every word in the text to 0.
(b) Every 0.1 seconds, obtain the focal position (x, y) of the user's eyes on the screen through the eye tracking device. The eye tracking device is assembled from an ordinary camera (Logitech QuickCam Notebook Pro) paired with the open-source eye tracking system opengazer.
(c) For each word wi in the text, whose position on the current screen is (xi, yi), the increment AT(wi) of its user attention time after this interval is:
where kx and ky are respectively the average width and average height of a word of the text on the screen, and AT(wi) is measured in seconds.
(d) Repeat steps (b) and (c) until the user has finished reading the electronic document, which yields the user attention time of each word in the text. The user attention time sample collection module 20 records the eye focus position obtained by the eye tracking system at each instant and accumulates the user attention time of each word in the text.
Step 2) Predict the user interest degree of every sentence in the text based on text similarity, as follows:
(e) Compute the semantic similarity Sim(wi, wj) between any two words wi and wj in the text. The computation uses the method proposed by Y. Li et al. in "An approach for measuring semantic similarity between words using multiple information sources", published in 2003 in IEEE Transactions on Knowledge and Data Engineering.
(f) For any word w in the text, select the k words in the text that are most similar to it, where k = min(10, n) and n is the number of distinct words in the text. Denoting the k selected words w1, w2, ..., wk, predict the user interest degree of word w by formula (1):
where γ is a constant that controls how much weight the value of Sim() carries; ε is a positive integer constant that prevents the denominator of formula (1) from being 0; and the function δ(), which filters out low-similarity terms, is defined as:
(g) The user interest degree I(s) of any sentence s in the text is the sum of the user interest degrees of all distinct words in that sentence.
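Step (g) can be illustrated with a few lines of Python. The sketch assumes that the per-word interest values have already been predicted by formula (1) and are available in a dictionary, that words are compared exactly as written (no stemming or case folding), and that a word missing from the dictionary contributes 0. These choices and the name `sentence_interest` are illustrative assumptions, not part of the described method.

```python
def sentence_interest(sentence_words, word_interest):
    """Sum the interest of the distinct words of one sentence.

    sentence_words: list of the words in the sentence (may repeat).
    word_interest:  dict mapping word -> predicted interest degree.
    """
    # set() ensures that a repeated word is counted only once.
    return sum(word_interest.get(w, 0.0) for w in set(sentence_words))


if __name__ == "__main__":
    interest = {"eye": 0.5, "tracking": 0.25, "the": 0.0}
    sentence = ["the", "eye", "eye", "tracking"]
    print(sentence_interest(sentence, interest))
```

Counting each distinct word once means that a long sentence does not gain interest merely by repeating a word the reader dwelt on.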
Step 3) Generate the personalized summary by combining the user interest degree with an automatic text summarization algorithm, as follows:
(h) Let the summary length required by the user be c% of the text length, and use the MEAD English automatic text summarization method to obtain a summary with compression ratio c%.
(i) For each sentence s in the text, compute the offset $I_{offset}(s)$ of its user interest degree:
where I(si) is the user interest degree of sentence si; s1, s2, ..., sm are all the sentences in the text; and m is the total number of sentences in the text. k is a parameter whose value the user may set in the range [0, 1]; it represents the share of the automatic summarization result that is determined by the user attention time information obtained from the eye tracking device. If k = 1, the summary is determined entirely by user attention time; if k = 0, the summary is entirely independent of user attention time, which is equivalent to using the MEAD system directly. If sentence s appears in the summary obtained in step (h), λ(s) is 1; otherwise λ(s) is 0. The default value of k is 0.5.
(j) Compute the adjusted user interest degree $I_{adj}(s)$ of each sentence s in the text:
$I_{adj}(s) = I(s) + I_{offset}(s)$
(k) Rank all sentences s in the text from highest to lowest adjusted user interest degree and select the top c% as the summary of the text.
Using this embodiment, automatic summaries were produced for 60 science and technology articles published in the electronic edition of "Science". Their recall, precision, and F ratio (F-rate) at compression ratios of 10%, 20%, and 30% are compared as follows with the summaries produced by two systems that use traditional automatic summarization methods, MS Word AutoSummarize and MEAD:
It can be seen that the method of the invention improves on the existing methods at all three compression ratios.
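The three measures named in the comparison can be computed from sentence-level overlap between a system summary and a reference summary. The Python sketch below assumes that both summaries are given as sets of sentence indices and that the F ratio is the balanced harmonic mean of precision and recall; the text states neither. The indices in the usage lines are invented for illustration only and are not results from the evaluation.

```python
def precision_recall_f(selected, reference):
    """Return (precision, recall, F) for an extractive summary.

    selected:  sentence indices chosen by the system.
    reference: sentence indices of the reference summary.
    ASSUMPTION: F is the balanced F-measure 2PR / (P + R).
    """
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)          # sentences in both summaries
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f = 2 * precision * recall / (precision + recall)
    return precision, recall, f


if __name__ == "__main__":
    # Hypothetical indices, for illustration only.
    p, r, f = precision_recall_f({0, 1, 4}, {1, 4, 7, 9})
    print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```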
Claims (1)
1. An English text automatic summarization method based on eye tracking, characterized in that the specific steps of the method are:
Step 1) Obtain the time the user attends to each word in the text while reading the electronic document, as follows:
(a) initialize the user attention time of every word in the text to 0;
(b) every 0.1 seconds, obtain the focal position (x, y) of the user's eyes on the screen through an eye tracker or a camera;
(c) for each word wi in the text, whose position on the current screen is (xi, yi), the increment AT(wi) of its user attention time after this interval is:
where kx and ky are respectively the average width and average height of a word of the text on the screen;
(d) repeat steps (b) and (c) until the user has finished reading the electronic document, which yields the user attention time of each word in the text;
Step 2) Predict the user interest degree of every sentence in the text based on text similarity, as follows:
(e) compute the semantic similarity Sim(wi, wj) between any two words wi and wj in the text; this similarity is a real number in the range [0, 1];
(f) for any word w in the document, select the k words in the document that are most similar to it, where k = min(10, n) and n is the number of distinct words in the document; denoting the k selected words w1, w2, ..., wk, predict the user interest degree of word w by formula (1):
where γ is a constant, ε is a positive integer constant, and the function δ() is defined as:
(g) the user interest degree I(s) of any sentence s in the document is the sum of the user interest degrees of all distinct words in that sentence;
Step 3) Generate the personalized summary by combining the user interest degree with an automatic text summarization algorithm, as follows:
(h) let the summary length required by the user be c% of the document length, and use a semantic-analysis-based automatic text summarization algorithm to obtain a summary with compression ratio c%;
(i) for each sentence s in the document, compute the offset $I_{offset}(s)$ of its user interest degree:
where I(si) is the user interest degree of sentence si; s1, s2, ..., sm are all the sentences in the document; and m is the total number of sentences in the document; if sentence s appears in the summary obtained in step (h), λ(s) is 1; otherwise λ(s) is 0; k is a free parameter in the range 0 to 1;
(j) compute the adjusted user interest degree $I_{adj}(s)$ of each sentence s in the document:
$I_{adj}(s) = I(s) + I_{offset}(s)$
(k) rank all sentences s in the document from highest to lowest adjusted user interest degree and select the top c% as the summary of the document.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009100960607A CN101567004B (en) | 2009-02-06 | 2009-02-06 | English text automatic abstracting method based on eye tracking |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2009100960607A CN101567004B (en) | 2009-02-06 | 2009-02-06 | English text automatic abstracting method based on eye tracking |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101567004A (en) | 2009-10-28 |
CN101567004B (en) | 2012-05-30 |
Family
ID=41283157
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2009100960607A Expired - Fee Related CN101567004B (en) | 2009-02-06 | 2009-02-06 | English text automatic abstracting method based on eye tracking |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101567004B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103186565B (en) * | 2011-12-28 | 2017-02-22 | 中国移动通信集团浙江有限公司 | Method and device for judging user preference according to web browsing behavior of user |
CN103823849A (en) * | 2014-02-11 | 2014-05-28 | 百度在线网络技术(北京)有限公司 | Method and device for acquiring entries |
CN106469176B (en) * | 2015-08-20 | 2019-08-16 | 百度在线网络技术(北京)有限公司 | Method and apparatus for extracting a text summary |
CN109983755A (en) * | 2016-06-30 | 2019-07-05 | 北方公司 | Image capture systems, devices, and methods that autofocus based on eye tracking |
US10636181B2 (en) | 2018-06-20 | 2020-04-28 | International Business Machines Corporation | Generation of graphs based on reading and listening patterns |
US11158206B2 (en) | 2018-09-20 | 2021-10-26 | International Business Machines Corporation | Assisting learners based on analytics of in-session cognition |
CN110287413A (en) * | 2019-06-19 | 2019-09-27 | 掌阅科技股份有限公司 | Display method for e-book description information and electronic device |
CN110941712B (en) * | 2019-11-21 | 2022-09-20 | 清华大学深圳国际研究生院 | User-level personalized text abstract generation method and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003094044A1 (en) * | 2002-05-03 | 2003-11-13 | Hyperbolex Limited | Electronic document indexing system and method |
TW200608242A (en) * | 2004-08-30 | 2006-03-01 | Advance Multimedia Internet Technology Inc | Ontology automatic construction method and system based on episode net |
CN1755696A (en) * | 2004-09-29 | 2006-04-05 | 株式会社东芝 | System and method for creating document abstract |
CN101320387A (en) * | 2008-07-11 | 2008-12-10 | 浙江大学 | Web page text and image ranking method based on user caring time |
Also Published As
Publication number | Publication date |
---|---|
CN101567004A (en) | 2009-10-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101567004B (en) | English text automatic abstracting method based on eye tracking | |
CN105426514B (en) | Personalized mobile application APP recommended method | |
CN109284506A (en) | A kind of user comment sentiment analysis system and method based on attention convolutional neural networks | |
CN102033880A (en) | Marking method and device based on structured data acquisition | |
CN103207860A (en) | Method and device for extracting entity relationships of public sentiment events | |
WO2012111226A1 (en) | Time-series document summarization device, time-series document summarization method and computer-readable recording medium | |
US11593555B1 (en) | Systems and methods for determining consensus values | |
Rubinstein | Historical corpora meet the digital humanities: the Jerusalem corpus of emergent modern Hebrew | |
Hlubík et al. | Inserting punctuation to asr output in a real-time production environment | |
Tang et al. | DuReader_robust: A Chinese dataset towards evaluating robustness and generalization of machine reading comprehension in real-world applications | |
Friginal et al. | Exploring mega-corpora: Google Ngram viewer and the corpus of historical American English | |
Khalid et al. | Topic detection from conversational dialogue corpus with parallel dirichlet allocation model and elbow method | |
Ashraf et al. | Author profiling on bi-lingual tweets | |
Hatmi et al. | Named Entity Recognition in Speech Transcripts following an Extended Taxonomy. | |
CN110019556A (en) | A kind of topic news acquisition methods, device and its equipment | |
Cao et al. | Question answering on lecture videos: a multifaceted approach | |
CN112463922A (en) | Risk user identification method and storage medium | |
Falahati Qadimi Fumani et al. | Inconsistent transliteration of Iranian university names: a hazard to Iran’s ranking in ISI Web of Science | |
Gonzales | Sociolinguistic analysis with missing metadata? Leveraging linguistic and semiotic resources through deep learning to investigate English variation and change on Twitter | |
Zhu et al. | Chinese text summarization based on fine-tuned GPT2 | |
CN115391522A (en) | Text topic modeling method and system based on social platform metadata | |
Matsumoto et al. | Supporting human recollection of the impressive events using the number of photos | |
Chirkova et al. | Modeling change in contact settings: A case study of phonological convergence | |
Giarelis et al. | GreekT5: A Series of Greek Sequence-to-Sequence Models for News Summarization | |
Nie et al. | Social Emotion Analysis System for Online News |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20120530; Termination date: 20130206 |