CN101567004B - English text automatic abstracting method based on eye tracking - Google Patents


Info

Publication number
CN101567004B
Authority
CN
China
Prior art keywords
user
speech
text
sentence
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009100960607A
Other languages
Chinese (zh)
Other versions
CN101567004A (en
Inventor
徐颂华
江浩
刘智满
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN2009100960607A
Publication of CN101567004A
Application granted
Publication of CN101567004B
Expired - Fee Related (current legal status)
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an English text automatic abstracting method based on eye tracking. Existing methods cannot generate personalized text summaries for different readers. The method comprises the following steps: using an eye tracking device or a camera to obtain the user's attention time for every word in a text while the user reads the electronic text; predicting the user's interest in every sentence based on text similarity; and generating a personalized automatic summary by combining the user interest with a text summarization algorithm. The method effectively incorporates user interest into English text automatic summarization, so that the final summary is closer to the content the user expects, allowing automatic summarization software to provide better personalized service.

Description

English text automatic abstracting method based on eye tracking
Technical field
The invention belongs to the fields of computer information retrieval and human-computer interfaces, and relates to a personalized English text automatic abstracting method based on eye tracking.
Background technology
A number of research projects have addressed computer summarization of English text, covering both general documents and documents from specific knowledge domains. Examples include "Summarization in the Small" by Richard Alterman et al., published in Advances in Cognitive Science in 1986, and "Text Summarization", published by Alterman in the Encyclopedia of Artificial Intelligence in 1992; "Automatic Generation of Concise Summaries of Spoken Dialogues in Unrestricted Domains" by Klaus Zechner at SIGIR 2001; "Movie Review Mining and Summarization" by L. Zhuang et al. at CIKM 2006; and "Extractive Summarization Using Supervised and Semi-Supervised Learning" by Wong et al. at COLING 2008. Implemented systems include the MEAD summarization system developed by Radev et al. in 2003 and the GIN system developed in 2007 by the CLAIR research group at the University of Michigan, Ann Arbor. None of these methods produces personalized summaries for different readers, so they cannot meet individual readers' needs.
Summary of the invention
The objective of the invention is to overcome the shortcomings of the prior art by providing a personalized English text automatic abstracting method based on eye tracking.
The method of the invention comprises the following steps:
Step 1) Obtain the user's attention time for every word in the text while the user reads the electronic document, as follows:
(a) Initialize the user attention time of every word in the text to 0.
(b) Every 0.1 seconds, obtain the user's gaze position (x, y) on the screen through an eye tracker or a camera. Obtaining the on-screen gaze position with an eye tracker or a camera is an established technique.
(c) Let (x_i, y_i) be the position of each word w_i of the text on the current screen. At each sampling instant, the user attention time of the word increases by AT(w_i):
$$AT(w_i) = 0.1\,\exp\left(-\frac{(x_i - x)^2}{2k_x^2} - \frac{(y_i - y)^2}{2k_y^2}\right)$$
where k_x and k_y are the average on-screen width and average height of the words in the text, and AT(w_i) is measured in seconds.
(d) Repeat steps (b) and (c) until the user has finished reading the electronic document, yielding the user attention time of each word in the text.
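The attention-time bookkeeping of steps (a) to (d) can be sketched in Python as below. This is a minimal illustration, not the patented implementation: gaze samples are assumed to arrive as a list of (x, y) screen coordinates, one per 0.1 s interval, the page layout is assumed static (no scrolling), and the names `accumulate_attention` and `word_positions` are hypothetical. Each key of `word_positions` identifies one on-screen word; the source does not say how repeated occurrences of the same word are merged.

```python
import math
from typing import Dict, Iterable, Tuple

SAMPLE_INTERVAL = 0.1  # seconds between gaze samples, as in step (b)


def accumulate_attention(
    gaze_samples: Iterable[Tuple[float, float]],
    word_positions: Dict[str, Tuple[float, float]],
    k_x: float,
    k_y: float,
) -> Dict[str, float]:
    """Accumulate per-word attention time AT(w_i) from 0.1 s gaze samples.

    gaze_samples: gaze points (x, y), one per sampling interval (assumed input).
    word_positions: hypothetical map from a word id to its screen position (x_i, y_i).
    k_x, k_y: average on-screen word width and height.
    """
    # Step (a): every word starts with zero attention time.
    attention = {word: 0.0 for word in word_positions}
    # Steps (b)-(d): one update of every word per gaze sample.
    for x, y in gaze_samples:
        for word, (x_i, y_i) in word_positions.items():
            # Step (c): Gaussian-weighted share of the 0.1 s interval.
            attention[word] += SAMPLE_INTERVAL * math.exp(
                -((x_i - x) ** 2) / (2 * k_x ** 2)
                - ((y_i - y) ** 2) / (2 * k_y ** 2)
            )
    return attention
```

Each sample spreads its 0.1 s across nearby words with a Gaussian whose spread matches the average word size, so a fixation between two words credits both rather than only the nearest one.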
Step 2) Predict the user's interest in every sentence of the text based on text similarity, as follows:
(e) Compute the semantic similarity Sim(w_i, w_j) between any two words w_i and w_j in the text; this similarity is a real number in [0, 1]. The computation follows Y. Li et al., "An approach for measuring semantic similarity between words using multiple information sources", IEEE Transactions on Knowledge and Data Engineering, 2003.
(f) For any word w in the text, select the k words in the text with the highest similarity to w, where k = min(10, n) and n is the number of distinct words in the text. Denote the selected words w_1, w_2, ..., w_k; the user interest of w is then predicted by Eq. (1):
$$I(w) = \frac{\sum_{i=1}^{k} AT(w_i)\,\mathrm{Sim}^{\gamma}(w_i, w)\,\delta(w_i, w)}{\sum_{i=1}^{k} \mathrm{Sim}^{\gamma}(w_i, w)\,\delta(w_i, w) + \epsilon} \qquad (1)$$
where γ is a constant that controls the weight given to the value of Sim(); ε is a positive integer constant that keeps the denominator of Eq. (1) from being 0; and the function δ(), which filters out words of low similarity, is defined as:
$$\delta(w_i, w) = \begin{cases} 1 & \text{if } \mathrm{Sim}^{\gamma}(w_i, w) > 0.01 \\ 0 & \text{otherwise} \end{cases}$$
(g) The user interest I(s) of any sentence s in the text is the sum of the user interest of all distinct words in s.
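Steps (e) to (g) can be sketched as follows. For step (e), the sketch uses our reading of the word-level measure in Li et al. (2003), which multiplies a path-length term e^(-αl) by a subsumer-depth term tanh(βh) computed over WordNet; here a hand-built toy is-a tree stands in for WordNet, and the values of α and β are illustrative assumptions rather than the paper's tuned parameters. For steps (f) and (g), γ = 1 and ε = 1 are assumed defaults (the source gives no values), w itself is treated as one of its own candidate neighbours, and all function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional

# --- Step (e): word similarity on a toy is-a tree (stand-in for WordNet) ---
ALPHA = 0.2   # path-length decay; illustrative assumption
BETA = 0.45   # subsumer-depth scaling; illustrative assumption


def ancestors(node: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return [node, parent, ..., root] for a single-rooted tree."""
    chain = [node]
    while parent[chain[-1]] is not None:
        chain.append(parent[chain[-1]])
    return chain


def tree_similarity(w1: str, w2: str, parent: Dict[str, Optional[str]]) -> float:
    """exp(-ALPHA*l) * tanh(BETA*h): l = path length, h = depth of the common subsumer."""
    up1, up2 = ancestors(w1, parent), ancestors(w2, parent)
    subsumer = next(a for a in up1 if a in up2)  # lowest common ancestor
    l = up1.index(subsumer) + up2.index(subsumer)
    h = len(ancestors(subsumer, parent)) - 1  # root has depth 0 (assumption)
    return math.exp(-ALPHA * l) * math.tanh(BETA * h)


# --- Steps (f)-(g): Eq. (1) and sentence interest ---
def word_interest(
    w: str,
    vocabulary: List[str],
    attention: Dict[str, float],
    sim: Callable[[str, str], float],
    gamma: float = 1.0,
    epsilon: float = 1.0,
) -> float:
    """Predict I(w) from the k = min(10, n) distinct words most similar to w."""
    k = min(10, len(vocabulary))
    # The k most similar words; w itself is a candidate (assumption).
    neighbours = sorted(vocabulary, key=lambda v: sim(v, w), reverse=True)[:k]
    num = den = 0.0
    for w_i in neighbours:
        weight = sim(w_i, w) ** gamma
        if weight > 0.01:  # delta(w_i, w) = 1
            num += attention.get(w_i, 0.0) * weight
            den += weight
    return num / (den + epsilon)


def sentence_interest(
    words: List[str],
    vocabulary: List[str],
    attention: Dict[str, float],
    sim: Callable[[str, str], float],
    **kwargs: float,
) -> float:
    """I(s): sum of I(w) over the distinct words of sentence s."""
    return sum(word_interest(w, vocabulary, attention, sim, **kwargs) for w in set(words))
```

For example, with `parent = {"entity": None, "animal": "entity", "dog": "animal", "cat": "animal"}`, `tree_similarity("dog", "cat", parent)` is e^(-0.4)·tanh(0.45) ≈ 0.28. Any similarity function with values in [0, 1] can be passed to `word_interest` as `sim`.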
Step 3) Combine the user interest with a text summarization algorithm to generate a personalized summary, as follows:
(h) Set the summary length required by the user to c% of the text length, and use a semantic-analysis-based text summarization algorithm to obtain a summary at compression ratio c%. Any existing mature semantic-analysis-based summarization algorithm can be used, such as Word AutoSummarize or MEAD.
(i) For each sentence s in the text, compute the offset of its user interest, I_offset(s):
$$I_{\mathrm{offset}}(s) = (1 - k)\,\max_{i=1,\dots,m}\{I(s_i)\}\,\lambda(s)$$
where I(s_i) is the user interest of sentence s_i, s_1, s_2, ..., s_m are all the sentences in the text, and m is the total number of sentences in the text. λ(s) is 1 if sentence s appears in the summary obtained in step (h), and 0 otherwise. k is a free parameter in the range 0 to 1 (distinct from the k of step (f)).
(j) Compute the adjusted user interest I_adj(s) of each sentence s in the text:
$$I_{\mathrm{adj}}(s) = I(s) + I_{\mathrm{offset}}(s)$$
(k) Rank all sentences s in the text by adjusted user interest from high to low, and select the top c% of sentences as the summary of the text.
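Steps (i) to (k) reduce to a re-ranking, sketched below under stated assumptions. The conventional summarizer of step (h) (MEAD or Word AutoSummarize) is not reproduced; its output is passed in as a set of sentence indices. The summary length is assumed to be counted in sentences and rounded up, and the selected sentences are returned in document order; neither detail is specified in the source. The function name `personalized_summary` is hypothetical.

```python
import math
from typing import List, Set


def personalized_summary(
    interest: List[float],
    baseline_summary: Set[int],
    c: float,
    k: float = 0.5,
) -> List[int]:
    """Steps (i)-(k): shift baseline sentences upward, re-rank, keep the top c%.

    interest: I(s) for each sentence, indexed 0..m-1.
    baseline_summary: indices chosen by the conventional summarizer in step (h).
    c: compression ratio in percent; k: free parameter in [0, 1].
    """
    m = len(interest)
    max_interest = max(interest)
    adjusted = []
    for idx, i_s in enumerate(interest):
        lam = 1.0 if idx in baseline_summary else 0.0  # lambda(s)
        offset = (1 - k) * max_interest * lam           # I_offset(s)
        adjusted.append(i_s + offset)                   # I_adj(s)
    # Assumption: summary length is counted in sentences, rounded up.
    n_keep = max(1, math.ceil(m * c / 100))
    ranked = sorted(range(m), key=lambda idx: adjusted[idx], reverse=True)
    return sorted(ranked[:n_keep])  # restore document order
```

With interest `[0.2, 0.9, 0.4, 0.1]`, baseline `{2}` and c = 25 (one sentence kept), k = 0 returns `[2]`, because the baseline sentence's offset equals the maximum interest, while k = 1 returns `[1]`, the sentence with the highest user interest. This mirrors the two extremes of k described for the embodiment.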
By effectively incorporating the user's preferences into English text automatic summarization, the method brings the final summary closer to the content the user expects, allowing automatic summarization software to provide better personalized service.
Description of drawings
Fig. 1 is a flow chart of an embodiment of the method of the invention.
Embodiment
As shown in Fig. 1, the English text automatic abstracting method based on eye tracking comprises the following modules: eye tracking device 10, user attention time sample collection 20, user interest prediction 30, conventional text automatic summarization method 40, user interest adjustment 50, and text summarization result 60. The specific steps are as follows:
Step 1) Obtain the user's attention time for every word in the text while the user reads the electronic document, as follows:
(a) Initialize the user attention time of every word in the text to 0.
(b) Every 0.1 seconds, obtain the user's gaze position (x, y) on the screen through the eye tracking device. The eye tracking device is built from an ordinary camera (Logitech QuickCam Notebook Pro) combined with the open-source eye tracking system Opengazer.
(c) Let (x_i, y_i) be the position of each word w_i of the text on the current screen. At each sampling instant, the user attention time of the word increases by AT(w_i):
$$AT(w_i) = 0.1\,\exp\left(-\frac{(x_i - x)^2}{2k_x^2} - \frac{(y_i - y)^2}{2k_y^2}\right)$$
where k_x and k_y are the average on-screen width and average height of the words in the text, and AT(w_i) is measured in seconds.
(d) Repeat steps (b) and (c) until the user has finished reading the electronic document, yielding the user attention time of each word in the text. The user attention time sample collection module 20 records every gaze position obtained by the eye tracking system and accumulates the user attention time of each word in the text.
Step 2) Predict the user's interest in every sentence of the text based on text similarity, as follows:
(e) Compute the semantic similarity Sim(w_i, w_j) between any two words w_i and w_j in the text, using the method of Y. Li et al., "An approach for measuring semantic similarity between words using multiple information sources", IEEE Transactions on Knowledge and Data Engineering, 2003.
(f) For any word w in the text, select the k words in the text with the highest similarity to w, where k = min(10, n) and n is the number of distinct words in the text. Denote the selected words w_1, w_2, ..., w_k; the user interest of w is then predicted by Eq. (1):
$$I(w) = \frac{\sum_{i=1}^{k} AT(w_i)\,\mathrm{Sim}^{\gamma}(w_i, w)\,\delta(w_i, w)}{\sum_{i=1}^{k} \mathrm{Sim}^{\gamma}(w_i, w)\,\delta(w_i, w) + \epsilon} \qquad (1)$$
where γ is a constant that controls the weight given to the value of Sim(); ε is a positive integer constant that keeps the denominator of Eq. (1) from being 0; and the function δ(), which filters out words of low similarity, is defined as:
$$\delta(w_i, w) = \begin{cases} 1 & \text{if } \mathrm{Sim}^{\gamma}(w_i, w) > 0.01 \\ 0 & \text{otherwise} \end{cases}$$
(g) The user interest I(s) of any sentence s in the text is the sum of the user interest of all distinct words in s.
Step 3) Combine the user interest with a text summarization algorithm to generate a personalized summary, as follows:
(h) Set the summary length required by the user to c% of the text length, and use the MEAD English text automatic summarization method to obtain a summary at compression ratio c%.
(i) For each sentence s in the text, compute the offset of its user interest, I_offset(s):
$$I_{\mathrm{offset}}(s) = (1 - k)\,\max_{i=1,\dots,m}\{I(s_i)\}\,\lambda(s)$$
where I(s_i) is the user interest of sentence s_i, s_1, s_2, ..., s_m are all the sentences in the text, and m is the total number of sentences in the text. λ(s) is 1 if sentence s appears in the summary obtained in step (h), and 0 otherwise. k is a free parameter that the user can set to any value in [0, 1], with a default of 0.5; it represents the share of the summary result determined by the user attention time obtained from the eye tracking device. If k = 1, the summary is determined entirely by user attention time; if k = 0, the summary is independent of user attention time, which is equivalent to using the MEAD system directly.
(j) Compute the adjusted user interest I_adj(s) of each sentence s in the text:
$$I_{\mathrm{adj}}(s) = I(s) + I_{\mathrm{offset}}(s)$$
(k) Rank all sentences s in the text by adjusted user interest from high to low, and select the top c% of sentences as the summary of the text.
The summaries produced by this embodiment for 60 science and technology articles published in the electronic edition of Science were compared with those produced by two conventional automatic summarization systems, MS Word AutoSummarize and MEAD, in terms of recall, precision, and F-rate at compression ratios of 10%, 20%, and 30%:
[Table: recall, precision, and F-rate of the three methods at 10%, 20%, and 30% compression; image not reproduced]
The results show that the method of the invention outperforms the existing methods at all three compression ratios.
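The source does not define how recall, precision, and F-rate were computed or how the reference summaries were obtained. A common sentence-level reading, assumed here, compares the set of sentences a system selects with a reference set and takes F as the harmonic mean of recall and precision. The function `prf` is hypothetical.

```python
from typing import Set, Tuple


def prf(system: Set[int], reference: Set[int]) -> Tuple[float, float, float]:
    """Sentence-level recall, precision and F (harmonic mean); an assumed definition."""
    overlap = len(system & reference)
    recall = overlap / len(reference) if reference else 0.0
    precision = overlap / len(system) if system else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    f = 2 * recall * precision / (recall + precision)
    return recall, precision, f
```

For example, `prf({1, 2, 3}, {2, 3, 4, 5})` gives recall 0.5, precision 2/3, and F = 4/7 ≈ 0.571.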

Claims (1)

1. An English text automatic abstracting method based on eye tracking, characterized in that the method comprises the following steps:
Step 1) Obtain the user's attention time for every word in the text while the user reads the electronic document, as follows:
(a) Initialize the user attention time of every word in the text to 0;
(b) Every 0.1 seconds, obtain the user's gaze position (x, y) on the screen through an eye tracker or a camera;
(c) Let (x_i, y_i) be the position of each word w_i of the text on the current screen; at each sampling instant, the user attention time of the word increases by AT(w_i):
$$AT(w_i) = 0.1\,\exp\left(-\frac{(x_i - x)^2}{2k_x^2} - \frac{(y_i - y)^2}{2k_y^2}\right)$$
where k_x and k_y are the average on-screen width and average height of the words in the text;
(d) Repeat steps (b) and (c) until the user has finished reading the electronic document, yielding the user attention time of each word in the text;
Step 2) Predict the user's interest in every sentence of the text based on text similarity, as follows:
(e) Compute the semantic similarity Sim(w_i, w_j) between any two words w_i and w_j in the text; this similarity is a real number in [0, 1];
(f) For any word w in the document, select the k words in the document with the highest similarity to w, where k = min(10, n) and n is the number of distinct words in the document. Denote the selected words w_1, w_2, ..., w_k; the user interest of w is predicted by Eq. (1):
$$I(w) = \frac{\sum_{i=1}^{k} AT(w_i)\,\mathrm{Sim}^{\gamma}(w_i, w)\,\delta(w_i, w)}{\sum_{i=1}^{k} \mathrm{Sim}^{\gamma}(w_i, w)\,\delta(w_i, w) + \epsilon} \qquad (1)$$
where γ is a constant, ε is a positive integer constant, and the function δ() is defined as:
$$\delta(w_i, w) = \begin{cases} 1 & \text{if } \mathrm{Sim}^{\gamma}(w_i, w) > 0.01 \\ 0 & \text{otherwise} \end{cases}$$
(g) The user interest I(s) of any sentence s in the document is the sum of the user interest of all distinct words in s;
Step 3) Combine the user interest with a text summarization algorithm to generate a personalized summary, as follows:
(h) Set the summary length required by the user to c% of the document length, and use a semantic-analysis-based text summarization algorithm to obtain a summary at compression ratio c%;
(i) For each sentence s in the document, compute the offset of its user interest, I_offset(s):
$$I_{\mathrm{offset}}(s) = (1 - k)\,\max_{i=1,\dots,m}\{I(s_i)\}\,\lambda(s)$$
where I(s_i) is the user interest of sentence s_i, s_1, s_2, ..., s_m are all the sentences in the document, and m is the total number of sentences in the document; λ(s) is 1 if sentence s appears in the summary obtained in step (h), and 0 otherwise; k is a free parameter in the range 0 to 1;
(j) Compute the adjusted user interest I_adj(s) of each sentence s in the document:
$$I_{\mathrm{adj}}(s) = I(s) + I_{\mathrm{offset}}(s)$$
(k) Rank all sentences s in the document by adjusted user interest from high to low, and select the top c% of sentences as the summary of the document.
CN2009100960607A 2009-02-06 2009-02-06 English text automatic abstracting method based on eye tracking Expired - Fee Related CN101567004B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100960607A CN101567004B (en) 2009-02-06 2009-02-06 English text automatic abstracting method based on eye tracking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2009100960607A CN101567004B (en) 2009-02-06 2009-02-06 English text automatic abstracting method based on eye tracking

Publications (2)

Publication Number Publication Date
CN101567004A CN101567004A (en) 2009-10-28
CN101567004B true CN101567004B (en) 2012-05-30

Family

ID=41283157

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100960607A Expired - Fee Related CN101567004B (en) 2009-02-06 2009-02-06 English text automatic abstracting method based on eye tracking

Country Status (1)

Country Link
CN (1) CN101567004B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103186565B (en) * 2011-12-28 2017-02-22 中国移动通信集团浙江有限公司 Method and device for judging user preference according to web browsing behavior of user
CN103823849A (en) * 2014-02-11 2014-05-28 百度在线网络技术(北京)有限公司 Method and device for acquiring entries
CN106469176B (en) * 2015-08-20 2019-08-16 百度在线网络技术(北京)有限公司 It is a kind of for extracting the method and apparatus of text snippet
CN109983755A (en) * 2016-06-30 2019-07-05 北方公司 The image capture system focused automatically, device and method are tracked based on eyes
US10636181B2 (en) 2018-06-20 2020-04-28 International Business Machines Corporation Generation of graphs based on reading and listening patterns
US11158206B2 (en) 2018-09-20 2021-10-26 International Business Machines Corporation Assisting learners based on analytics of in-session cognition
CN110287413A (en) * 2019-06-19 2019-09-27 掌阅科技股份有限公司 The display methods and electronic equipment of e-book description information
CN110941712B (en) * 2019-11-21 2022-09-20 清华大学深圳国际研究生院 User-level personalized text abstract generation method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003094044A1 (en) * 2002-05-03 2003-11-13 Hyperbolex Limited Electronic document indexing system and method
TW200608242A (en) * 2004-08-30 2006-03-01 Advance Multimedia Internet Technology Inc Ontology automatic construction method and system based on episode net
CN1755696A (en) * 2004-09-29 2006-04-05 株式会社东芝 System and method for creating document abstract
CN101320387A (en) * 2008-07-11 2008-12-10 浙江大学 Web page text and image ranking method based on user caring time


Also Published As

Publication number Publication date
CN101567004A (en) 2009-10-28

Similar Documents

Publication Publication Date Title
CN101567004B (en) English text automatic abstracting method based on eye tracking
CN105426514B (en) Personalized mobile application APP recommended method
CN109284506A (en) A kind of user comment sentiment analysis system and method based on attention convolutional neural networks
CN102033880A (en) Marking method and device based on structured data acquisition
CN103207860A (en) Method and device for extracting entity relationships of public sentiment events
WO2012111226A1 (en) Time-series document summarization device, time-series document summarization method and computer-readable recording medium
US11593555B1 (en) Systems and methods for determining consensus values
Rubinstein Historical corpora meet the digital humanities: the Jerusalem corpus of emergent modern Hebrew
Hlubík et al. Inserting punctuation to asr output in a real-time production environment
Tang et al. DuReader_robust: A Chinese dataset towards evaluating robustness and generalization of machine reading comprehension in real-world applications
Friginal et al. Exploring mega-corpora: Google Ngram viewer and the corpus of historical American English
Khalid et al. Topic detection from conversational dialogue corpus with parallel dirichlet allocation model and elbow method
Ashraf et al. Author profiling on bi-lingual tweets
Hatmi et al. Named Entity Recognition in Speech Transcripts following an Extended Taxonomy.
CN110019556A (en) A kind of topic news acquisition methods, device and its equipment
Cao et al. Question answering on lecture videos: a multifaceted approach
CN112463922A (en) Risk user identification method and storage medium
Falahati Qadimi Fumani et al. Inconsistent transliteration of Iranian university names: a hazard to Iran’s ranking in ISI Web of Science
Gonzales Sociolinguistic analysis with missing metadata? Leveraging linguistic and semiotic resources through deep learning to investigate English variation and change on Twitter
Zhu et al. Chinese text summarization based on fine-tuned GPT2
CN115391522A (en) Text topic modeling method and system based on social platform metadata
Matsumoto et al. Supporting human recollection of the impressive events using the number of photos
Chirkova et al. Modeling change in contact settings: A case study of phonological convergence
Giarelis et al. GreekT5: A Series of Greek Sequence-to-Sequence Models for News Summarization
Nie et al. Social Emotion Analysis System for Online News

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120530

Termination date: 20130206