US9875218B2 - Document summarization - Google Patents

Publication number
US9875218B2
Authority
US
United States
Prior art keywords
reader
reading speed
document
interest
reading
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related, expires
Application number
US14/166,098
Other versions
US20150213120A1 (en)
Inventor
Diptiman Dasgupta
Radha M. De
Indrajit Poddar
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to US14/166,098
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION: assignment of assignors' interest (see document for details). Assignors: DASGUPTA, DIPTIMAN; DE, RADHA M.; PODDAR, INDRAJIT
Priority to US14/487,530 (US9852111B2)
Publication of US20150213120A1
Application granted
Publication of US9875218B2
Expired - Fee Related (current legal status)
Adjusted expiration

Classifications

    • G06F17/211
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/93 Document management systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G06F17/2745
    • G06F17/30011
    • G06F17/30719
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering

Definitions

  • the present invention relates to summarization of documents.
  • the problem of identifying a gist of a document is conventionally referred to as the text summarization or document summarization problem.
  • Traditional document-summarization techniques focus on the central idea of the text of the document.
  • Various computer algorithms have been developed to automatically generate the summary of the document.
  • a computer implemented method, system and a computer program product for summarizing a document includes receiving a reading speed of the reader, determining a summary length of a summary of the document based on the received reading speed of the reader, and generating a summary of the document having the determined summary length.
  • FIG. 1 illustrates a block diagram of a computing system for implementing embodiments of the present invention.
  • FIG. 2 illustrates a block diagram of a system for implementing embodiments of the present invention.
  • FIG. 3 illustrates a matrix of reading speeds and corresponding summary lengths.
  • FIG. 4 illustrates a flowchart depicting steps to be performed for implementing an embodiment of the present invention.
  • embodiments of the present invention may be embodied as a system, method or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware embodiments that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, embodiments of the present invention may take the form of a computer program product, embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
  • the computer readable medium may be a computer readable signal medium or a computer readable storage medium.
  • a computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof.
  • a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for carrying out operations for embodiments of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • LAN: local area network
  • WAN: wide area network
  • Internet Service Provider: for example, AT&T, MCI, Sprint, EarthLink, MSN, GTE, etc.
  • Embodiments of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • the computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • FIG. 1 illustrates a block diagram of a computing system for implementing an embodiment of the present invention.
  • the computing system includes a computing device 110 , which in turn includes a processing unit 112 , a system memory 114 , and a system bus 116 that couples various system components including the system memory 114 to the processing unit 112 .
  • the system bus 116 may be any of several types of bus architectures, including a memory bus, a memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures, such as PCI.
  • the system memory 114 includes a Read Only Memory (ROM) 118 and a Random Access Memory (RAM) 120 .
  • ROM: Read Only Memory
  • RAM: Random Access Memory
  • a Basic Input/Output System (BIOS) 122 containing the basic routines that help to transfer information between elements within the computing device 110 , such as during start-up, is stored in the ROM 118 .
  • the computing device 110 further includes a Hard Disk Drive (HDD) 124 as computer-readable storage media.
  • the HDD 124 is connected to the system bus 116 by an HDD interface 126 .
  • the HDD 124 provides a non-volatile storage for computer-readable instructions, data structures, program modules, and other data for the computing device 110 .
  • Although the exemplary environment described herein employs the HDD 124 , it should be appreciated by those skilled in the art that other types of computer-readable storage media, which can store data that is accessible by computer, such as RAM, ROM, removable magnetic disks, removable optical disks, and the like may also be used in the exemplary operating environment.
  • a number of program modules may be stored on the HDD 124 , including an operating system 128 , one or more application programs 130 , other program modules 132 , program data 134 , and a database system 136 .
  • the operating system 128 , the one or more application programs 130 , the other program modules 132 and program data 134 may be loaded onto the system memory 114 and specifically onto the RAM 120 during the functioning of the computing device 110 .
  • a user may provide commands and information through input devices, such as a keyboard, and receive output through peripheral output devices, such as monitor, speaker, printer, etc. These input and output devices are often connected to the processing unit 112 through an I/O adapter 140 coupled to the system bus 116 .
  • the computing device 110 may be connected to a remote computing device 142 through a network interface card 144 .
  • the network connections shown are exemplary, and any conventional means 141 of establishing communications links between the computers, such as a local area network, wide area network or wireless connection, may be used.
  • program modules depicted relative to the computing device 110 may be stored in a remote memory 146 .
  • the remote computing device 142 may be a personal computer, a router, a server, a network PC, a peer device, or other common network device.
  • the hardware in FIG. 1 is a basic computing system and may vary.
  • the architecture of the aforementioned computing device is not limiting and is only depicted as an example on which an embodiment of the present invention may be implemented.
  • Other types of computing systems, such as a smart phone or a web-kiosk, are well within the intended scope on which an embodiment of the present invention may be implemented.
  • FIG. 2 illustrates a block diagram of a system for implementing embodiments of the present invention.
  • a reading device 201 used by a reader 202 for summarizing a document may be a device comprising a computing system as shown in FIG. 1 .
  • the reading device 201 may comprise various modules to perform various operations as shown distinctively in boxes in FIG. 2 and described hereinafter.
  • Each of the hereinafter described modules, which comprise computer program codes, may be configured to operate in conjunction with each other and incorporated in a software application running on the reading device 201 for summarizing the document.
  • the reader 202 through an input module 203 seeks to summarize the selected document.
  • the input module 203 may comprise a Graphical User Interface (GUI) to enable the reader 202 to select or receive a document for summarization. Selecting the document may include selecting the document stored within an internal memory of the reading device 201 or from an external memory accessible through the reading device 201 .
  • the input module 203 may additionally provide a framework to receive a reading speed of the reader 202 . The method of receiving the reading speed of the reader 202 may vary according to embodiments. For example, the input module 203 may require the reader 202 to input a reading speed.
  • the reading speed of a user is automatically determined according to how a particular user interacts with an e-reader, browser, e-mail system, etc. For example, assume that an e-reader displays one page at a time, and that each page contains 100 words. Assume further that a particular user turns to a next displayed page on the e-reader every 60 seconds. Thus, the reading speed for this user is 100 words per minute, which is automatically determined by a system detecting that each 100 word page is turned (i.e., replaced on the e-reader's display with a new 100 word page) every minute.
  • each page may or may not contain exactly 100 words, but the system is able to determine exactly how many words are on each page, as well as how long a reader stays on each page before turning it, thus enabling the system to calculate the user's reading speed (in words-per-minute).
  • a similar process is used to track how long a user stays on a webpage having a known number of words before switching to a new webpage; how long a user displays an e-mail having a known number of words before minimizing/closing the e-mail; etc.
  • the input module 203 may be configured to automatically retrieve a pre-determined reading speed of the reader.
  • the pre-determined reading speed of the reader may be stored in a memory in communication with a computer implementing an embodiment of the invention.
  • the pre-determined reading speed of the reader may be determined through a reading speed recorder module 204 .
  • the reading speed recorder module 204 may be a part of the aforementioned software application for summarizing the document or a separate software application running independently in the reading device 201 or in a separate computing device as shown in FIG. 1 .
  • the aforementioned determination of the reading speed of the reader may be through a suitable computer program code or algorithm embedded within the reading speed recorder module 204 and known to a person skilled in the art.
  • a summarization module 205 receives an input to summarize the document through the input module 203 along with the reading speed of the reader 202 .
  • the summarization module may comprise a pre-defined computer implemented algorithm known to a person skilled in the art to summarize the document.
  • the summarization module 205 before generating a summary of the document, determines a summary length of the document.
  • the aforementioned computer implemented algorithms known to a person skilled in the art for generating the summary of the document may be modified to generate a summary of the document having a specific summary length based on the reading speed of the reader 202 .
  • the summary length is determined, according to an embodiment, from a table of reading speeds and corresponding summary lengths as shown in FIG. 3.
  • the reading speed received by the summarization module 205 from the input module 203 is searched by the summarization module 205 in the aforementioned table to determine the appropriate summary length corresponding thereto.
  • the document is summarized having the aforementioned summary length as determined.
  • the reading speeds may be expressed in paragraphs, pages, or words per unit time, and the corresponding summary lengths may be expressed in words.
  • the summarized document is subsequently displayed to the reader 202 through a display unit 206 of the reading device 201 .
  • FIG. 4 illustrates flowcharts depicting steps to be performed for implementing an embodiment of the present invention.
  • a reader seeks to summarize a selected document in a reading device.
  • the reading device may be a device comprising a computing system as shown in FIG. 1 .
  • a reading speed of the reader is received by the reading device before the document is summarized.
  • the method of receiving the reading speed of the reader may vary according to embodiments.
  • the reader may input a reading speed.
  • the reading device may be configured to automatically retrieve a pre-determined recorded reading speed of the reader stored in a memory in communication with the reading device.
  • a summary length of the summary to be generated of the document is determined.
  • the summary length is determined, according to an embodiment, from a table of reading speeds and corresponding summary length as shown in FIG. 3 stored in the reading device or accessible to the reading device.
  • the reading speed received by the reading device is searched in the aforementioned matrix to determine the appropriate summary length corresponding thereto.
  • the reading speeds may be expressed in paragraphs, pages, or words per unit time, and the corresponding summary lengths may be expressed in words.
  • a summary is generated having the determined summary length using a pre-defined computer implemented algorithm known to a person skilled in the art to summarize the document.
  • the generated summary of the document is displayed to the reader through a display unit of the reading device.
  • the document is summarized based on the reading speed of the reader.
  • the present invention has been described as determining a length of a document summary according to a reading speed of a reader.
  • the document summary is further customized according to identified interests of the reader. For example, assume that a document describes several topics, including “how to invest in stocks”, “current geopolitical issues”, and “popular culture” (i.e., current art, music, movies, etc.). Assume further that data mining shows that the reader is primarily interested in stock investments. This data mining can be performed by examining databases (e.g., browsing histories, e-mail folders, etc.) in order to determine what genres of e-books, webpages, etc. have been read by the reader; the content of e-mail and other electronic documents (e.g., blog postings) that the reader has generated; an educational and/or employment background of the reader; etc.
  • the summary of the document is modified to reflect the reader's interests.
  • the document summary is modified to describe information from the document that relates to stock investments.
  • different identified interests of the reader are weighted (e.g., according to the frequency of readings/writings of the reader on different topics of interest), such that the summary of the document is modified to reflect these weights.
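  • for example, if the reader's interests are weighted 70% toward stock investments, 20% toward current geopolitical issues, and 10% toward popular culture, the length/content of the summary of the document will also reflect this same 70/20/10 breakdown. That is, 70% of the summary is devoted to (i.e., describes) stock investments, 20% of the summary is devoted to current geopolitical issues, and 10% of the summary is devoted to popular culture.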
  • the interests (weighted or unweighted) of the reader are received by inputs from the reader.
  • a profile of the reader can be generated by the reader selecting and/or otherwise inputting different areas of interest to the reader. This profile is then used to modify/customize summaries of documents read by that particular reader.
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
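
The table lookup and the interest weighting described above can be sketched together as follows. This is a minimal illustration only, not the patented implementation: the text does not reproduce the values of the FIG. 3 table, so the `SPEED_TO_LENGTH` entries are invented, and the names `summary_length_for` and `allocate_by_interest` are hypothetical. The sketch also assumes that each table row applies from its minimum reading speed up to the next row, and that fractional words are handed out by largest remainder; neither choice is stated in the source.

```python
from __future__ import annotations

from bisect import bisect_right

# Hypothetical stand-in for the FIG. 3 table. Each row is
# (minimum reading speed in words per minute, summary length in words).
# The numbers are invented for illustration and must stay sorted by speed.
SPEED_TO_LENGTH = [(0, 100), (150, 200), (250, 300), (400, 500)]


def summary_length_for(speed_wpm: float) -> int:
    """Look up the summary length for a reading speed in the table above."""
    if speed_wpm < 0:
        raise ValueError("reading speed cannot be negative")
    thresholds = [minimum for minimum, _ in SPEED_TO_LENGTH]
    # Index of the last row whose minimum speed is <= speed_wpm.
    row = bisect_right(thresholds, speed_wpm) - 1
    return SPEED_TO_LENGTH[row][1]


def allocate_by_interest(total_words: int, weights: dict[str, int]) -> dict[str, int]:
    """Split a word budget across topics in proportion to interest weights."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("interest weights must sum to a positive value")
    exact = {topic: total_words * w / total_weight for topic, w in weights.items()}
    shares = {topic: int(value) for topic, value in exact.items()}
    # Give any words lost to rounding down to the largest fractional parts.
    leftover = max(total_words - sum(shares.values()), 0)
    by_remainder = sorted(exact, key=lambda t: exact[t] - shares[t], reverse=True)
    for topic in by_remainder[:leftover]:
        shares[topic] += 1
    return shares


if __name__ == "__main__":
    length = summary_length_for(180)
    print(length)
    print(allocate_by_interest(length, {"stocks": 70, "geopolitics": 20, "culture": 10}))
```

With the 70/20/10 weights from the example, a 200-word budget splits into 140, 40, and 20 words. A real system would then pass each per-topic budget to whatever summarization algorithm it uses.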

Abstract

A computer implemented method, system and computer program product are provided for summarizing a document, which includes receiving a reading speed of a reader, determining a summary length of a summary of the document based on the received reading speed of the reader, and generating a summary of the document having the determined summary length.

Description

BACKGROUND
The present invention relates to summarization of documents.
The problem of identifying a gist of a document is conventionally referred to as the text summarization or document summarization problem. Traditional document-summarization techniques focus on the central idea of the text of the document. Various computer algorithms have been developed to automatically generate the summary of the document. However, there is a need to have a desired length of the generated summary of the document based on the reading habits of a reader of the document.
SUMMARY
A computer implemented method, system and computer program product for summarizing a document are provided, which includes receiving a reading speed of a reader, determining a summary length of a summary of the document based on the received reading speed of the reader, and generating a summary of the document having the determined summary length.
BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
FIG. 1 illustrates a block diagram of a computing system for implementing embodiments of the present invention.
FIG. 2 illustrates a block diagram of a system for implementing embodiments of the present invention.
FIG. 3 illustrates a matrix of reading speeds and corresponding summary lengths.
FIG. 4 illustrates a flowchart depicting steps to be performed for implementing an embodiment of the present invention.
DETAILED DESCRIPTION
As will be appreciated by one skilled in the art, embodiments of the present invention may be embodied as a system, method or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware embodiments that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, embodiments of the present invention may take the form of a computer program product, embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for embodiments of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Embodiments of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
FIG. 1 illustrates a block diagram of a computing system for implementing an embodiment of the present invention. The computing system includes a computing device 110, which in turn includes a processing unit 112, a system memory 114, and a system bus 116 that couples various system components including the system memory 114 to the processing unit 112. The system bus 116 may be any of several types of bus architectures, including a memory bus, a memory controller, a peripheral bus, and a local bus using any of a variety of bus architectures, such as PCI. The system memory 114 includes a Read Only Memory (ROM) 118 and a Random Access Memory (RAM) 120. A Basic Input/Output System (BIOS) 122, containing the basic routines that help to transfer information between elements within the computing device 110, such as during start-up, is stored in the ROM 118. The computing device 110 further includes a Hard Disk Drive (HDD) 124 as computer-readable storage media. The HDD 124 is connected to the system bus 116 by an HDD interface 126. The HDD 124 provides a non-volatile storage for computer-readable instructions, data structures, program modules, and other data for the computing device 110. Although the exemplary environment described herein employs the HDD 124, it should be appreciated by those skilled in the art that other types of computer-readable storage media, which can store data that is accessible by computer, such as RAM, ROM, removable magnetic disks, removable optical disks, and the like may also be used in the exemplary operating environment.
A number of program modules may be stored on the HDD 124, including an operating system 128, one or more application programs 130, other program modules 132, program data 134, and a database system 136. The operating system 128, the one or more application programs 130, the other program modules 132 and program data 134 may be loaded onto the system memory 114 and specifically onto the RAM 120 during the functioning of the computing device 110. A user may provide commands and information through input devices, such as a keyboard, and receive output through peripheral output devices, such as monitor, speaker, printer, etc. These input and output devices are often connected to the processing unit 112 through an I/O adapter 140 coupled to the system bus 116.
In a networked environment, the computing device 110 may be connected to a remote computing device 142 through a network interface card 144. It will be appreciated that the network connections shown are exemplary, and any conventional means 141 of establishing communications links between the computers, such as a local area network, wide area network or wireless connection, may be used. In a networked environment, program modules depicted relative to the computing device 110, or its components, may be stored in a remote memory 146. The remote computing device 142 may be a personal computer, a router, a server, a network PC, a peer device, or other common network device.
Those of ordinary skill in the art will appreciate that the hardware in FIG. 1 is a basic computing system and may vary. The architecture of the aforementioned computing device is not limiting and is only depicted as an example on which an embodiment of the present invention may be implemented. Other types of computing systems, such as a smart phone or a web-kiosk, are well within the intended scope on which an embodiment of the present invention may be implemented.
FIG. 2 illustrates a block diagram of a system for implementing embodiments of the present invention. A reading device 201 used by a reader 202 for summarizing a document may be a device comprising a computing system as shown in FIG. 1. The reading device 201, according to an embodiment, may comprise various modules to perform various operations as shown distinctively in boxes in FIG. 2 and described hereinafter. Each of the hereinafter described modules, which comprise computer program codes, may be configured to operate in conjunction with each other and incorporated in a software application running on the reading device 201 for summarizing the document. The reader 202 through an input module 203 seeks to summarize the selected document. The input module 203 may comprise a Graphical User Interface (GUI) to enable the reader 202 to select or receive a document for summarization. Selecting the document may include selecting the document stored within an internal memory of the reading device 201 or from an external memory accessible through the reading device 201. The input module 203 may additionally provide a framework to receive a reading speed of the reader 202. The method of receiving the reading speed of the reader 202 may vary according to embodiments. For example, the input module 203 may require the reader 202 to input a reading speed.
In one embodiment, the reading speed of a user is automatically determined according to how a particular user interacts with an e-reader, browser, e-mail system, etc. For example, assume that an e-reader displays one page at a time, and that each page contains 100 words. Assume further that a particular user turns to a next displayed page on the e-reader every 60 seconds. Thus, the reading speed for this user is 100 words per minute, which is automatically determined by a system detecting that each 100-word page is turned (i.e., replaced on the e-reader's display with a new 100-word page) every minute. Of course, each page may or may not contain exactly 100 words, but the system is able to determine exactly how many words are on each page, as well as how long a reader stays on each page before turning it, thus enabling the system to calculate the user's reading speed (in words per minute). A similar process is used to track how long a user stays on a webpage having a known number of words before switching to a new webpage; how long a user displays an e-mail having a known number of words before minimizing/closing the e-mail; etc.
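The page-turn calculation described above can be sketched as follows. This is a minimal illustration only: the `PageView` record and the `estimate_words_per_minute` function are hypothetical names that do not appear in the specification, and the sketch assumes the system already knows each page's word count and how long it was displayed.

```python
from dataclasses import dataclass


@dataclass
class PageView:
    """One page, webpage, or e-mail the reader displayed (hypothetical record)."""
    word_count: int         # number of words shown, known to the system
    seconds_on_page: float  # time displayed before the reader moved on


def estimate_words_per_minute(views: list[PageView]) -> float:
    """Return total words displayed divided by total minutes spent reading."""
    total_words = sum(v.word_count for v in views)
    total_seconds = sum(v.seconds_on_page for v in views)
    if total_seconds <= 0:
        raise ValueError("no reading time recorded")
    return total_words / (total_seconds / 60.0)


# The example from the text: a 100-word page turned every 60 seconds.
speed = estimate_words_per_minute([PageView(100, 60.0)])
print(speed)  # 100.0
```

Summing over all observed pages, rather than averaging per-page speeds, handles pages with differing word counts, which the text notes is the usual case.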
Alternatively, the input module 203 may be configured to automatically retrieve a pre-determined reading speed of the reader. The pre-determined reading speed of the reader may be stored in a memory in communication with a computer implementing an embodiment of the invention. The pre-determined reading speed of the reader may be determined through a reading speed recorder module 204. The reading speed recorder module 204 may be a part of the aforementioned software application for summarizing the document or a separate software application running independently in the reading device 201 or in a separate computing device as shown in FIG. 1. The aforementioned determination of the reading speed of the reader may be through a suitable computer program code or algorithm embedded within the reading speed recorder module 204 and known to a person skilled in the art.
A summarization module 205 receives an input to summarize the document through the input module 203 along with the reading speed of the reader 202. The summarization module 205 may comprise a pre-defined computer-implemented algorithm known to a person skilled in the art to summarize the document. The summarization module 205, before generating a summary of the document, determines a summary length of the document. The aforementioned computer-implemented algorithms known to a person skilled in the art for generating the summary of the document may be modified to generate a summary having a specific summary length based on the reading speed of the reader 202. The summary length is determined, according to an embodiment, from a table of reading speeds and corresponding summary lengths, as shown in FIG. 3, stored in the reading device 201 or accessible to the reading device 201. The summarization module 205 looks up the reading speed received from the input module 203 in the aforementioned table to determine the corresponding summary length. The document is then summarized to the summary length so determined. According to an embodiment, the reading speeds may be expressed in paragraphs, pages, or words per unit time, and the corresponding summary lengths may be expressed in words.
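A table lookup of this kind might be sketched as below. The contents of the FIG. 3 table are not reproduced in this text, so every threshold and length in `SPEED_TO_LENGTH` is an invented placeholder, and `summary_length_for` is a hypothetical name. The only property taken from the document is that a faster reading speed maps to a longer summary, as the claims recite.

```python
from bisect import bisect_right

# Hypothetical stand-in for the FIG. 3 table. Each row is
# (minimum reading speed in words per minute, summary length in words).
# The numbers are illustrative assumptions, not values from the patent.
SPEED_TO_LENGTH = [
    (0, 100),
    (150, 200),
    (250, 300),
    (350, 400),
]


def summary_length_for(reading_speed_wpm: float) -> int:
    """Look up the summary length for the band containing the reading speed."""
    if reading_speed_wpm < 0:
        raise ValueError("reading speed must be non-negative")
    thresholds = [speed for speed, _ in SPEED_TO_LENGTH]
    # bisect_right finds the first threshold above the speed; step back one row.
    row = bisect_right(thresholds, reading_speed_wpm) - 1
    return SPEED_TO_LENGTH[row][1]


print(summary_length_for(100))  # 100
print(summary_length_for(150))  # 200
```

Banding the speeds keeps the table small. An implementation could instead interpolate between rows, which the specification neither requires nor rules out.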
The summarized document is subsequently displayed to the reader 202 through a display unit 206 of the reading device 201.
FIG. 4 illustrates flowcharts depicting steps to be performed for implementing an embodiment of the present invention. At step 401, a reader seeks to summarize a selected document in a reading device. The reading device may be a device comprising a computing system as shown in FIG. 1. At step 402, a reading speed of the reader is received by the reading device before the document is summarized. The method of receiving the reading speed of the reader may vary according to embodiments. For example, the reader may input a reading speed. Alternatively, the reading device may be configured to automatically retrieve a pre-determined recorded reading speed of the reader stored in a memory in communication with the reading device.
At step 403, a summary length of the summary to be generated of the document is determined. The summary length is determined, according to an embodiment, from a table of reading speeds and corresponding summary lengths, as shown in FIG. 3, stored in the reading device or accessible to the reading device. The reading speed received by the reading device is looked up in the aforementioned table to determine the corresponding summary length. According to an embodiment, the reading speeds may be expressed in paragraphs, pages, or words per unit time, and the corresponding summary lengths may be expressed in words.
At step 404, a summary having the determined summary length is generated using a pre-defined computer-implemented algorithm known to a person skilled in the art to summarize the document.
At step 405, the generated summary of the document is displayed to the reader through a display unit of the reading device.
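Steps 402 through 405 can be strung together as in the sketch below. The specification leaves the summarization algorithm to the skilled person, so `truncate_to_length` is only a naive placeholder that keeps leading sentences within the word budget; it is not the patent's method. `summarize_for_reader` and its `length_for_speed` parameter are likewise hypothetical names introduced for illustration.

```python
import re
from typing import Callable


def truncate_to_length(document: str, max_words: int) -> str:
    """Placeholder summarizer (an assumption): keep whole leading sentences
    while the running word count stays within max_words."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    kept, used = [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if used + n > max_words:
            break
        kept.append(sentence)
        used += n
    return " ".join(kept)


def summarize_for_reader(
    document: str,
    reading_speed_wpm: float,                     # step 402: speed received
    length_for_speed: Callable[[float], int],     # step 403: table lookup
    summarizer: Callable[[str, int], str] = truncate_to_length,
) -> str:
    summary_length = length_for_speed(reading_speed_wpm)  # step 403
    return summarizer(document, summary_length)           # step 404


doc = "One two three. Four five six seven. Eight nine."
# Step 405: display the result; here the lookup is a fixed 7-word budget.
print(summarize_for_reader(doc, 200, lambda wpm: 7))
# One two three. Four five six seven.
```

Passing the lookup and the summarizer as callables mirrors the text's separation between determining the length and generating the summary, so either part can be swapped without touching the other.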
According to the aforementioned embodiments, the document is summarized based on the reading speed of the reader.
While the present invention has been described as determining a length of a document summary according to a reading speed of a reader, in one embodiment the document summary is further customized according to identified interests of the reader. For example, assume that a document describes several topics, including “how to invest in stocks”, “current geopolitical issues”, and “popular culture” (i.e., current art, music, movies, etc.). Assume further that data mining shows that the reader is primarily interested in stock investments. This data mining can be performed by examining databases (e.g., browsing histories, e-mail folders, etc.) in order to determine what genres of e-books, webpages, etc. have been read by the reader; the content of e-mail and other electronic documents (e.g., blog postings) that the reader has generated; an educational and/or employment background of the reader; etc. Once the primary interest of the reader is ascertained by such data mining, then the summary of the document is modified to reflect the reader's interests. Thus, if the primary interest of the reader is stock investments, then the document summary is modified to describe information from the document that relates to stock investments.
In one embodiment, different identified interests of the reader are weighted (e.g., according to the frequency of readings/writings of the reader on different topics of interest), such that the summary of the document is modified to reflect these weights. Thus, in the document example above, if a reader has a reading/writing history of which 70% is related to stock investments, 20% is related to current geopolitical issues, and 10% is related to popular culture, then the length/content of the summary of the document will also reflect this same 70/20/10 breakdown. That is, 70% of the summary is devoted to (i.e., describes) stock investments, 20% of the summary is devoted to current geopolitical issues, and 10% of the summary is devoted to popular culture.
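The 70/20/10 breakdown can be illustrated with a short sketch that splits a summary's word budget in proportion to the reader's history. The function name `allocate_words`, the topic counts, and the 200-word budget are assumptions for illustration; the largest-remainder rounding is one reasonable way to make the parts sum exactly to the budget and is not taken from the patent.

```python
def allocate_words(topic_counts: dict[str, int], summary_length: int) -> dict[str, int]:
    """Split summary_length words across topics in proportion to how often
    each topic appears in the reader's reading/writing history."""
    total = sum(topic_counts.values())
    if total <= 0:
        raise ValueError("reading history is empty")
    # Integer floor of each topic's proportional share.
    alloc = {t: c * summary_length // total for t, c in topic_counts.items()}
    # Hand out words lost to rounding, largest fractional remainder first.
    leftover = summary_length - sum(alloc.values())
    by_remainder = sorted(
        topic_counts,
        key=lambda t: (topic_counts[t] * summary_length) % total,
        reverse=True,
    )
    for topic in by_remainder[:leftover]:
        alloc[topic] += 1
    return alloc


history = {"stock investments": 70, "geopolitical issues": 20, "popular culture": 10}
print(allocate_words(history, 200))
# {'stock investments': 140, 'geopolitical issues': 40, 'popular culture': 20}
```

Working in integer counts rather than floating-point percentages avoids rounding drift, so the component lengths always add up to the determined summary length.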
In another embodiment of the present invention, the interests (weighted or unweighted) of the reader are received by inputs from the reader. For example, a profile of the reader can be generated by the reader selecting and/or otherwise inputting different areas of interest to the reader. This profile is then used to modify/customize summaries of documents read by that particular reader.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (17)

What is claimed is:
1. A system comprising:
a processor and a computer readable memory, wherein the processor retrieves and executes instructions from the computer readable memory to perform a processor-implemented method comprising:
receiving a reading speed of a reader, wherein the reader is a human reader, wherein the reading speed is based on a first source;
determining a summary length of a summary of a document based on the received reading speed of the reader, wherein the document is a second source that differs from the first source, wherein a first reading speed is faster than a second reading speed, wherein the first reading speed results in a first summary length of the summary and the second reading speed results in a second summary length of the summary, and wherein the first summary length is longer than the second summary length;
generating a summary of the document having the determined summary length;
identifying an interest of the reader;
modifying the summary of the document according to the interest of the reader in order to include, in the summary of the document, content from the document that is of interest to the reader, wherein the reader has multiple interests;
weighting each interest from the multiple interests based on a reading history of the reader, wherein each interest is assigned a weight based on a percentage of the reading history of the reader that is devoted to said each interest;
generating a weight ratio of interests of the reader from the multiple interests based on the percentage of the reading history of the reader that is devoted to said each interest;
generating components of the summary based on the weight ratio of interests of the reader; and
modifying the summary of the document to match the weight ratio such that a ratio of lengths of the components of the summary matches the weight ratio of the interests of the reader.
2. The system of claim 1, wherein the processor-implemented method further comprises:
receiving the reading speed of the reader as an input from the reader.
3. The system of claim 1, wherein the processor-implemented method further comprises:
receiving the reading speed of the reader by automatically retrieving a pre-determined reading speed of the reader stored in a memory in communication with the system.
4. The system of claim 1, wherein the processor-implemented method further comprises:
determining the summary length from a pre-defined table of reading speeds and corresponding summary lengths.
5. A computer program product for summarizing a document, the computer program product comprising a non-transitory computer readable storage medium having program code embodied therewith, the program code readable and executable by a processor to perform a method comprising:
receiving a reading speed of a reader, wherein the reader is a human reader, wherein the reading speed is based on a first source;
determining a summary length of a summary of the document based on the received reading speed of the reader, wherein the document is a second source that differs from the first source, wherein a first reading speed is faster than a second reading speed, wherein the first reading speed results in a first summary length of the summary and the second reading speed results in a second summary length of the summary, and wherein the first summary length is longer than the second summary length;
generating a summary of the document having the determined summary length;
identifying an interest of the reader;
modifying the summary of the document according to the interest of the reader in order to include, in the summary of the document, content from the document that is of interest to the reader, wherein the reader has multiple interests;
weighting each interest from the multiple interests based on a reading history of the reader, wherein each interest is assigned a weight based on a percentage of the reading history of the reader that is devoted to said each interest;
generating a weight ratio of interests of the reader from the multiple interests based on the percentage of the reading history of the reader that is devoted to said each interest;
generating components of the summary based on the weight ratio of interests of the reader; and
modifying the summary of the document to match the weight ratio such that a ratio of lengths of the components of the summary matches the weight ratio of the interests of the reader.
6. The computer program product of claim 5, wherein the method further comprises:
automatically retrieving a pre-determined reading speed of the reader stored in a memory.
7. The computer program product of claim 5, wherein the method further comprises:
determining the summary length from a pre-defined table of reading speeds and corresponding summary lengths.
8. The system of claim 1, wherein the first source is an input from the reader that specifies the reading speed of the reader.
9. The system of claim 1, wherein the first source is a document that is read by the reader solely in order to determine the reading speed of the reader.
10. The system of claim 1, wherein the first source is memory that stores a pre-determined reading speed of the reader.
11. The system of claim 1, wherein the processor-implemented method further comprises:
determining the interest of the reader by examining a browsing history of the reader.
12. The system of claim 1, wherein the processor-implemented method further comprises:
determining the interest of the reader by examining web log (blog) postings by the reader.
13. The system of claim 1, wherein the processor-implemented method further comprises:
determining the interest of the reader by examining an educational background of the reader.
14. The system of claim 1, wherein the processor-implemented method further comprises:
determining the interest of the reader by examining an employment background of the reader.
15. The system of claim 1, wherein the processor-implemented method further comprises:
determining the reading speed of the reader based on a frequency of the reader turning a page having a known number of words on an e-reader.
16. The system of claim 1, wherein the processor-implemented method further comprises:
determining the reading speed of the reader based on a length of time that the reader stays on a first webpage having a known number of words before switching to a second webpage.
17. The system of claim 1, wherein the processor-implemented method further comprises:
determining the reading speed of the reader based on a length of time that the reader displays an e-mail having a known number of words before closing the e-mail.
US14/166,098 2014-01-28 2014-01-28 Document summarization Expired - Fee Related US9875218B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/166,098 US9875218B2 (en) 2014-01-28 2014-01-28 Document summarization
US14/487,530 US9852111B2 (en) 2014-01-28 2014-09-16 Document summarization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US14/166,098 US9875218B2 (en) 2014-01-28 2014-01-28 Document summarization

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/487,530 Continuation US9852111B2 (en) 2014-01-28 2014-09-16 Document summarization

Publications (2)

Publication Number Publication Date
US20150213120A1 US20150213120A1 (en) 2015-07-30
US9875218B2 true US9875218B2 (en) 2018-01-23

Family

ID=53679199

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/166,098 Expired - Fee Related US9875218B2 (en) 2014-01-28 2014-01-28 Document summarization
US14/487,530 Expired - Fee Related US9852111B2 (en) 2014-01-28 2014-09-16 Document summarization

Country Status (1)

Country Link
US (2) US9875218B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210248326A1 (en) * 2020-02-12 2021-08-12 Samsung Electronics Co., Ltd. Electronic apparatus and control method thereof

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10382823B2 (en) * 2016-03-28 2019-08-13 Oath Inc. Video content deep diving
US11003703B1 (en) 2017-12-31 2021-05-11 Zignal Labs, Inc. System and method for automatic summarization of content
US11640420B2 (en) 2017-12-31 2023-05-02 Zignal Labs, Inc. System and method for automatic summarization of content with event based analysis
US11755915B2 (en) 2018-06-13 2023-09-12 Zignal Labs, Inc. System and method for quality assurance of media analysis
US11356476B2 (en) 2018-06-26 2022-06-07 Zignal Labs, Inc. System and method for social network analysis
US11037356B2 (en) 2018-09-24 2021-06-15 Zignal Labs, Inc. System and method for executing non-graphical algorithms on a GPU (graphics processing unit)
US11126646B2 (en) * 2020-01-13 2021-09-21 International Business Machines Corporation Implicit and explicit cognitive analyses for data content comprehension
US11500912B2 (en) * 2020-09-18 2022-11-15 Ascender AI LLC Search engine UI systems and processes

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5752228A (en) * 1995-05-31 1998-05-12 Sanyo Electric Co., Ltd. Speech synthesis apparatus and read out time calculating apparatus to finish reading out text
US5924108A (en) 1996-03-29 1999-07-13 Microsoft Corporation Document summarizer for word processors
US6424362B1 (en) 1995-09-29 2002-07-23 Apple Computer, Inc. Auto-summary of document content
WO2003017142A1 (en) 2001-08-13 2003-02-27 International Business Machines Corporation Summarizing and clustering to classify documents conceptually
US7194693B2 (en) 2002-10-29 2007-03-20 International Business Machines Corporation Apparatus and method for automatically highlighting text in an electronic document
US7395501B2 (en) 1997-12-22 2008-07-01 Ricoh Company, Ltd. Techniques for annotating portions of a document relevant to concepts of interest
WO2010002275A2 (en) 2008-07-04 2010-01-07 Isoundtrack Limited Method and system for making and playing soundtracks
US7711737B2 (en) 2005-09-12 2010-05-04 Microsoft Corporation Multi-document keyphrase extraction using partial mutual information
US7861149B2 (en) 2006-03-09 2010-12-28 Microsoft Corporation Key phrase navigation map for document navigation
US20120117475A1 (en) 2010-11-09 2012-05-10 Palo Alto Research Center Incorporated System And Method For Generating An Information Stream Summary Using A Display Metric
US8229949B2 (en) 2008-07-16 2012-07-24 Kabushiki Kaisha Toshiba Apparatus, method and program product for presenting next search keyword
US20120210203A1 (en) * 2010-06-03 2012-08-16 Rhonda Enterprises, Llc Systems and methods for presenting a content summary of a media item to a user based on a position within the media item
US20130054786A1 (en) 2011-08-29 2013-02-28 Google Inc. Using eBook Reading Data To Generate Time-Based Information
US20130100139A1 (en) * 2010-07-05 2013-04-25 Cognitive Media Innovations (Israel) Ltd. System and method of serial visual content presentation
US8769008B1 (en) * 2007-12-07 2014-07-01 The New York Times Company Method and system for providing preference based content to a location aware mobile device
US20140188766A1 (en) * 2012-07-12 2014-07-03 Spritz Technology Llc Tracking content through serial presentation
US20140234826A1 (en) * 2011-09-07 2014-08-21 Carmel-Haifa University Economic Corp. Ltd. System and method for evaluating and training academic skills
US20140331125A1 (en) * 2013-05-06 2014-11-06 The Speed Reading Group, Chamber Of Commerce Number: 60482605 Methods, systems, and media for guiding user reading on a screen
US20150277552A1 (en) * 2014-03-25 2015-10-01 Weerapan Wilairat Eye tracking enabled smart closed captioning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
G. Drzadzewski et al., "Exploring and Analyzing Documents With OLAP", ACM, Proceedings of the 5th Ph.D. Workshop on Information and Knowledge, New York, 2012, pp. 33-40.
V. Qazvinian et al., "Generating Extractive Summaries of Scientific Paradigms", AI Access Foundation, Journal of Artificial Intelligence Research 46, 2013, pp. 165-201.

Also Published As

Publication number Publication date
US20150213120A1 (en) 2015-07-30
US9852111B2 (en) 2017-12-26
US20150212977A1 (en) 2015-07-30

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW Y

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:DASGUPTA, DIPTIMAN;DE, RADHA M.;PODDAR, INDRAJIT;SIGNING DATES FROM 20131224 TO 20131226;REEL/FRAME:032063/0418

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20220123