US20210042383A1 - Natural Language Processing Techniques for Generating a Document Summary - Google Patents
- Publication number
- US20210042383A1 (U.S. application Ser. No. 16/531,151)
- Authority
- US
- United States
- Prior art keywords
- sentence
- decision
- extracted
- abstracted
- editing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/345—Summarisation for human users
- G06F16/93—Document management systems
- G06F17/24; G06F17/27 (legacy codes)
- G06F40/166—Editing, e.g. inserting or deleting
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/30—Semantic analysis
- G06F40/56—Natural language generation
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
Definitions
- the present disclosure relates to natural language processing techniques, including a system for automatically generating a summary from an original text document.
- a system for generating a summary of a text document includes a processor configured to generate an initial summary of an original document.
- the initial summary includes a selection of extracted sentences copied from the original document.
- the processor processes the extracted sentence to generate an abstracted sentence, and generates vector representations of the extracted sentence, the abstracted sentence, the original document, and the current summary.
- the vector representations are then input to a decision network to compute an editing decision.
- the editing decision is selected from a group of possible decisions that includes a decision to add the extracted sentence and a decision to add the abstracted sentence.
- the processor also updates the current summary based on the editing decision.
- a method of generating a summary of a text document includes generating an initial summary of an original document, wherein the initial summary includes a selection of extracted sentences copied from the original document.
- the method also includes performing a set of actions for each extracted sentence of the initial summary.
- the actions include processing the extracted sentence to generate an abstracted sentence and generating vector representations of the extracted sentence, the abstracted sentence, the original document, and the current summary.
- the actions also include inputting the vector representations to a decision network to compute an editing decision, wherein the editing decision is selected from a group of possible decisions that includes a decision to add the extracted sentence and a decision to add the abstracted sentence.
- the actions also include updating the current summary based on the editing decision.
- a computer program product for generating a summary of a text document can include a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se.
- the program instructions can be executable by a processor to cause the processor to generate an initial summary of an original document, wherein the initial summary includes a selection of extracted sentences copied from the original document.
- the program instructions can be executable by the processor to process the extracted sentence to generate an abstracted sentence, and generate vector representations of the extracted sentence, the abstracted sentence, the original document, and the current summary.
- the program instructions can be executable by the processor to input the vector representations to a decision network to compute an editing decision, wherein the editing decision is selected from a group of possible decisions that includes a decision to add the extracted sentence and a decision to add the abstracted sentence.
- the program instructions can be executable by the processor to update the current summary based on the editing decision.
- FIG. 1 is a block diagram of a system for generating document summaries, according to an embodiment described herein;
- FIG. 2 is a block diagram of an example system for generating document summaries, according to an embodiment described herein;
- FIG. 3 is an example computing device configured to generate document summaries, according to an embodiment described herein;
- FIG. 4 is a process flow diagram of an example method of automatically generating a document summary, according to an embodiment described herein;
- FIG. 5 depicts an illustrative cloud computing environment according to an embodiment described herein.
- FIG. 6 depicts a set of functional abstraction layers provided by a cloud computing environment according to an embodiment described herein.
- the present disclosure describes natural language processing techniques for generating textual summaries of original text documents.
- the original documents may be any suitable type of document written by a human author, including news articles, scientific papers, essays, business documents, and others.
- a system in accordance with embodiments processes the original document automatically (i.e., without human involvement) to condense the original document into a shorter version (i.e., a summary).
- the original document is condensed while trying to preserve the main essence of the original text and keeping the generated summary as readable as possible.
- An abstracted sentence may be generated by applying natural language paraphrasing and/or compression on a given text.
- an abstracted sentence may be generated using an encoder-decoder (sequence-to-sequence) technique, with the original text sequence being encoded while the abstracted sentence is the decoded sequence.
- abstracted sentences may provide better readability.
- the accuracy of a summary with only abstracted sentences may tend to decline over large textual inputs, and sometimes results in higher redundancy.
- the system described herein can generate a summary that is a combination of extracted and abstracted sentences.
- the system may include a trained artificial neural network that receives a set of inputs related to an initial summary.
- the initial summary includes a set of extracted sentences that have been extracted from an original document.
- For each extracted sentence, the network generates a decision about what to add to the summary.
- the decision can be a decision to add the extracted sentence to the summary, or to add an abstracted version of the extracted sentence to the summary.
- the network can also generate a decision to discard the sentence.
- FIG. 1 is a block diagram of a system for generating document summaries, according to an embodiment described herein.
- the system 100 may be implemented by hardware or a combination of hardware and software.
- the system 100 may be implemented by the computing device 300 of FIG. 3 .
- the input to the system 100 is a full document 102 and the output of the system is a summary 104 of the full document 102 .
- the system 100 is configured to perform an iterative process wherein, for selected sentences of the full document 102 , an editing decision is made regarding whether to add the extracted sentence to the summary, add an abstracted version of the extracted sentence to the summary, or discard the extracted sentence.
- Each extracted sentence is processed by an abstractor 116 that generates the corresponding abstracted representation of the extracted sentence.
- the extracted sentences and abstracted sentences are further processed to generate vector representations of each sentence, i.e., the extracted sentence vector 110 and the abstracted sentence vector 112 .
- Each vector is a vector set of n real numbers.
- the sentence is processed to extract the sentence's tokens (e.g., words or phrases). Each token identified in the sentence is mapped to a corresponding position in the vector. For example, the corresponding position in the vector can be incremented each time a particular word or phrase mapped to that position occurs in the sentence.
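The token-counting encoding described above can be sketched as follows; the vocabulary mapping and the sentence are illustrative stand-ins, not part of the disclosure:

```python
def sentence_to_count_vector(sentence, vocab):
    """Map each token in the sentence to its corresponding position in the
    vector and increment that position once per occurrence of the token."""
    vec = [0.0] * len(vocab)
    for token in sentence.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    return vec

# Illustrative vocabulary: token -> vector position
vocab = {"the": 0, "cat": 1, "sat": 2, "mat": 3}
v = sentence_to_count_vector("The cat sat on the mat", vocab)  # -> [2.0, 1.0, 1.0, 1.0]
```

Tokens outside the vocabulary (here, "on") are simply skipped in this sketch.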
- embedded word vectors are generated by mapping each word to m-dimensional pre-trained word vectors. The embedded word vectors may then be input to a convolutional sentence encoder to generate the vector representation of the sentence.
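A minimal sketch of this second approach, assuming toy embeddings and randomly initialized filters in place of real pre-trained word vectors and learned convolution weights:

```python
import math
import random

random.seed(0)

def conv_sentence_encoder(words, embed, filters, window=2):
    """Embed each word with m-dimensional vectors, slide a convolution
    window over the word sequence, apply ReLU, and max-pool over time to
    get one value per filter (an n-dimensional sentence vector, n = number
    of filters)."""
    seq = [embed[w] for w in words]               # list of m-dim word vectors
    pooled = []
    for filt in filters:                          # each filter: window*m weights
        best = -math.inf
        for t in range(len(seq) - window + 1):
            flat = [x for vec in seq[t:t + window] for x in vec]
            act = max(0.0, sum(w * x for w, x in zip(filt, flat)))  # ReLU
            best = max(best, act)
        pooled.append(best)
    return pooled

# Toy embeddings (m=3) and filters (n=4); real systems would load
# pre-trained embeddings and learned filter weights.
words = ["summaries", "are", "short", "texts"]
embed = {w: [random.uniform(-1, 1) for _ in range(3)] for w in words}
filters = [[random.uniform(-1, 1) for _ in range(2 * 3)] for _ in range(4)]
vec = conv_sentence_encoder(words, embed, filters)
```

The resulting vector has one entry per filter, all non-negative because of the ReLU before pooling.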
- the input to the ANN 106 also includes two auxiliary representations, a full document representation 118 and a summary representation 120 .
- the full document representation 118 provides a global context for making editing decisions, and will remain unchanged throughout the process of generating the summary for the full document.
- the summary representation 120 is a representation of the summary 104 that has been generated after the previous editing decision.
- Both the full document representation 118 and the summary representation 120 are vector sets of the same dimension as the extracted sentence vector 110 and the abstracted sentence vector 112 (i.e., n real numbers). The generation of both the full document representation 118 and the summary representation 120 is described further in relation to FIG. 2 .
- the four input vectors are input to the ANN 106 and the output of the ANN 106 is an editing decision 108 regarding the input sentence. If the editing decision 108 is a decision to add the extracted sentence to the summary 104 , the extracted sentence is added as the next sentence in the summary 104 . If the editing decision is a decision to add the abstracted sentence to the summary, the abstracted sentence is added as the next sentence in the summary 104 instead of the extracted sentence. If the editing decision is a decision to discard the sentence, no sentence is added and the summary remains unchanged from the previous iteration.
- the summary representation 120 is updated and the next iteration of the process begins with the next sentence of the initial summary generated by the extractor 114 .
- the process continues until an editing decision has been made with regard to each sentence of the initial summary.
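The iteration described above can be sketched as follows; `abstract_fn` and `decide_fn` are hypothetical stand-ins for the abstractor 116 and the ANN 106:

```python
def edit_summary(extracted, abstract_fn, decide_fn):
    """For each extracted sentence of the initial summary, in selection
    order: generate its abstracted version, obtain an editing decision
    ('E' = keep extracted, 'A' = keep abstracted, 'R' = reject), and
    update the summary accordingly."""
    summary = []
    for sent in extracted:
        abstracted = abstract_fn(sent)
        decision = decide_fn(sent, abstracted, summary)
        if decision == "E":
            summary.append(sent)        # add the extracted sentence
        elif decision == "A":
            summary.append(abstracted)  # add the abstracted sentence instead
        # 'R': discard; the summary remains unchanged this iteration
    return summary

# Toy stand-ins: the "abstractor" keeps only the first word, and the
# "decision network" rejects very short sentences and abstracts long ones.
sents = ["Alpha beta gamma delta.", "Short.", "Epsilon zeta eta theta iota."]
abstract = lambda s: s.split(".")[0].split()[0] + "."
decide = lambda s, a, cur: "R" if len(s.split()) < 3 else ("A" if len(s.split()) > 4 else "E")
result = edit_summary(sents, abstract, decide)  # -> ["Alpha beta gamma delta.", "Epsilon."]
```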
- the resulting summary 104 can then be stored to a storage memory and associated with the original document.
- FIG. 2 is a block diagram of an example system for generating document summaries, according to an embodiment described herein.
- the system shown in FIG. 2 is a more detailed example of the system 100 described in relation to FIG. 1 .
- the editorial process described herein is performed over an initial summary (S) 200 , whose sentences were selected by the extractor 114 from a full document (D) 102 .
- the process performed by the system 100 edits the summary 200 to generate the higher quality summary 104 (denoted S′).
- the editorial process may be implemented by iterating over sentences of the initial summary 200 according to the selection order of the extractor 114 .
- s_i^e and s_i^a refer to the original (i.e., extracted) and paraphrased (i.e., abstracted) versions of a given sentence s_i ∈ S, respectively.
- e_i and a_i refer to the corresponding mathematical representations of s_i^e and s_i^a, such that e_i represents the extracted sentence vector 110 and a_i represents the abstracted sentence vector 112.
- Both e_i and a_i are vector sets of n real numbers (e_i ∈ ℝ^n and a_i ∈ ℝ^n).
- Both e i and a i may be generated by a sentence representation engine, which parses each sentence and maps each word or phrase (i.e., token) to a corresponding position of the vector, then maps each word to m-dimensional pre-trained word vectors, and inputs the resulting embedded word vectors into a convolutional sentence encoder, as explained above.
- the extractor 114 may be any suitable type of extractor.
- the extractor consists of two main subcomponents, an encoder and a sentence selector.
- the encoder can encode each sentence into its corresponding vector representation, e i , using a hierarchical representation.
- the hierarchical representation may be a combination of a temporal convolutional model followed by a bidirectional Long Short-Term Memory (biLSTM) encoder.
- the sentence selector can use an artificial neural network, such as a Multilayer Perceptron (MLP) or Pointer Network, to identify which sentences to add to the initial summary.
- the sentence selector may calculate a selection likelihood for each sentence, P(s i ), according to a selection policy P( ⁇ ), and select the sentences for inclusion within the initial summary 200 based on the likelihood.
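One plausible form of this selection step, assuming a softmax over raw extractor scores as the selection policy P(·) and a fixed summary length k (both are assumptions, since the disclosure leaves the policy open):

```python
import math

def select_sentences(scores, k):
    """Turn raw extractor scores into selection likelihoods P(s_i) via a
    softmax, then keep the k most likely sentences, returned in their
    original document order."""
    mx = max(scores)
    exps = [math.exp(s - mx) for s in scores]   # shift for numerical stability
    z = sum(exps)
    probs = [e / z for e in exps]
    ranked = sorted(range(len(scores)), key=lambda i: -probs[i])[:k]
    return sorted(ranked), probs

# Illustrative scores for a four-sentence document; keep the top 2.
idx, probs = select_sentences([2.0, 0.5, 1.5, -1.0], 2)  # idx -> [0, 2]
```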
- the abstractor 116 may be any suitable type of encoder-decoder, such as any type of sequence-to-sequence (seq2seq) model.
- the abstractor 116 may be an encoder-aligner-decoder with a copy mechanism. The abstractor 116 operates by encoding the sentence into a vector representation and then decoding the resulting vector back into a textual representation of the sentence, which is the paraphrased or abstracted sentence s i a .
- the abstractor 116 may be applied to each extracted sentence individually to generate the corresponding abstracted sentence.
- the abstractor 116 may be applied to a group of three consecutive sentences (s_−^e, s_i^e, s_+^e) to generate the abstracted sentence, s_i^a, where s_−^e and s_+^e denote the sentences that precede and succeed s_i^e in D, respectively. This allows the generation of an abstractive version of s_i^e (i.e., s_i^a) that benefits from a wider local context.
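A sketch of building these three-sentence inputs; padding the document boundaries with empty strings is an assumption, as the text does not say how the first and last sentences are handled:

```python
def context_windows(doc_sentences, indices):
    """For each selected sentence index i, build the triple
    (preceding sentence, selected sentence, succeeding sentence) used as
    the abstractor's input; an empty string stands in at the boundaries."""
    triples = []
    for i in indices:
        prev_s = doc_sentences[i - 1] if i > 0 else ""
        next_s = doc_sentences[i + 1] if i + 1 < len(doc_sentences) else ""
        triples.append((prev_s, doc_sentences[i], next_s))
    return triples

triples = context_windows(["First.", "Second.", "Third."], [0, 2])
```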
- the word attention applied by the abstractor 116 may be enhanced using the extractor's decisions, which are given by extractor's sentence selection policy P( ⁇ ).
- C_wj represents the original attention value of word w_j.
- the attention applied to the word may be biased according to the selection likelihood calculated for the sentence by the extractor 114 .
- This biasing may be implemented according to the following formula, where Z is a normalization term:
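The formula itself is not reproduced in this excerpt; the sketch below assumes the bias simply scales each attention value C_wj by the selection likelihood P(s_i) of the sentence the word came from and renormalizes by Z so the values again sum to one:

```python
def bias_attention(attn, sent_probs, word_to_sent):
    """Bias word attention by sentence selection likelihood (assumed form):
    scale each C_wj by P(s_i) for the word's source sentence, then divide
    by the normalization term Z (the sum of the scaled values)."""
    scaled = [c * sent_probs[word_to_sent[j]] for j, c in enumerate(attn)]
    z = sum(scaled)                      # normalization term Z
    return [s / z for s in scaled]

# Two words with equal attention, drawn from sentences with selection
# likelihoods 0.8 and 0.2; the bias shifts attention toward the first.
biased = bias_attention([0.5, 0.5], [0.8, 0.2], [0, 1])
```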
- both sentence versions are encoded in a similar manner.
- To encode s_i^a into the vector representation, a_i, the abstracted sentence, s_i^a, is first inserted into the whole document, D, in place of its corresponding extracted sentence, s_i^e.
- s_i^a is treated as if it were an ordinary sentence within the whole document, while the rest of the document remains untouched.
- The abstracted sentence is then encoded using the extractor's encoder in a similar way to that in which sentence s_i^e was encoded. This results in a representation, a_i, that provides a comparable alternative to e_i, whose encoding is likely to be affected by similar contextual grounds.
- the full document representation 118 is a vector set of n real numbers (d ∈ ℝ^n).
- the full document representation 118 may be computed by first calculating the mean, μ, of all of the extracted sentence vectors in the full document according to the following formula, wherein N is the number of sentences in the full document: μ = (1/N) Σ_{i=1}^{N} e_i
- W_d is an n-by-n matrix of real numbers (W_d ∈ ℝ^{n×n})
- b_d is a vector set of n real numbers (b_d ∈ ℝ^n), which is used as a biasing factor.
- W d and b d are learnable parameters that can be identified through a training process.
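A sketch of this computation; the tanh nonlinearity applied after the affine map W_d, b_d is an assumption, as the excerpt only names the learnable parameters:

```python
import math

def document_representation(sentence_vecs, W_d, b_d):
    """Average the N extracted-sentence vectors into the mean vector mu,
    then apply the learned affine map W_d, b_d (tanh assumed) to produce
    the n-dimensional full document representation d."""
    n = len(sentence_vecs[0])
    N = len(sentence_vecs)
    mu = [sum(v[j] for v in sentence_vecs) / N for j in range(n)]
    return [math.tanh(sum(W_d[r][j] * mu[j] for j in range(n)) + b_d[r])
            for r in range(n)]

# Toy case: two 2-dimensional sentence vectors, identity W_d, zero b_d,
# so d is just tanh of the mean vector.
d = document_representation([[1.0, 1.0], [0.0, 0.0]],
                            [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```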
- the summary representation 120 (denoted g_{i−1}).
- the summary representation 120 is a vector set representing the summary, S′, that has been generated by the editor so far.
- the summary representation 120 is a vector set of n real numbers (g_{i−1} ∈ ℝ^n), which is generated based on the editing decisions.
- the summary representation is recalculated based on the editing decision.
- the summary representation, g i may be updated according to the following formula:
- h_i is the vector representation of the selected sentence, or a vector set of n zeros if the sentence was discarded (i.e., h_i ∈ {e_i, a_i, 0}, depending on the editing decision that was made in the iteration that was just completed).
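The update formula is not reproduced in this excerpt; the sketch below assumes a simple recurrent form g_i = tanh(W_g·h_i + U_g·g_{i−1} + b_g), with W_g, U_g, and b_g as hypothetical learned parameters:

```python
import math

def update_summary_rep(g_prev, h_i, W_g, U_g, b_g):
    """One iteration of the summary-representation update: h_i is the
    vector of the sentence just added (e_i or a_i), or all zeros if the
    sentence was discarded. The recurrent form used here is an assumed
    stand-in for the formula omitted from this excerpt."""
    n = len(g_prev)
    return [math.tanh(sum(W_g[r][j] * h_i[j] for j in range(n))
                      + sum(U_g[r][j] * g_prev[j] for j in range(n))
                      + b_g[r])
            for r in range(n)]

# Toy 1-dimensional case with identity weights and zero bias.
g1 = update_summary_rep([0.0], [1.0], [[1.0]], [[1.0]], [0.0])
```

With a discarded sentence (h_i all zeros) and a zero previous state, the representation stays at zero, matching the "summary unchanged" case.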
- the ANN 106 includes two fully-connected layers, a first layer 204 denoted W c , and a second layer 206 denoted V.
- W_c is a matrix of real numbers of dimensions m by 4n (W_c ∈ ℝ^{m×4n})
- V is a matrix of real numbers of dimensions 3 by m (V ∈ ℝ^{3×m})
- b_c is a vector set of m real numbers (b_c ∈ ℝ^m) representing bias values applied to the first layer
- b is a vector set of 3 real numbers (b ∈ ℝ^3) representing bias values applied to the second layer.
- m may be equal to 512 and n may be equal to 512. However, it will be appreciated that other dimensions may be used.
- the ANN 106 computes three outputs, each one associated with a different editing decision.
- the first output 208 (denoted E) is a likelihood value for the decision to add the extracted sentence to the summary 104
- the second output 210 (denoted A) is a likelihood value for the decision to add the abstracted sentence to the summary 104
- the third output 212 (denoted R) is a likelihood value for the decision to reject the sentence.
- the system 100 then chooses the editing decision (denoted δ_i) based on which output has the highest likelihood value (denoted p(δ_i)). In other words, the output with the highest probability value is chosen as the editing decision for the current iteration (δ_i ∈ {E, A, R}, depending on which of E, A, and R is highest).
- the system 100 then appends the corresponding sentence version (i.e., either s_i^e or s_i^a) to the summary S′, or, if the editing decision is R, the sentence s_i is discarded.
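The two-layer decision computation can be sketched as follows, assuming a tanh activation on the first layer and a softmax over the three outputs (the excerpt gives the dimensions but not the activations):

```python
import math

def decision_network(e_i, a_i, d, g_prev, W_c, b_c, V, b):
    """Concatenate the four input vectors (4n values), pass them through
    the first fully-connected layer W_c, b_c (tanh assumed), then through
    the second layer V, b, and softmax the three outputs into likelihoods
    for E (extracted), A (abstracted), and R (reject)."""
    x = e_i + a_i + d + g_prev                      # concatenation, length 4n
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bc)
              for row, bc in zip(W_c, b_c)]         # m hidden values
    logits = [sum(v * h for v, h in zip(row, hidden)) + bb
              for row, bb in zip(V, b)]             # 3 output logits
    mx = max(logits)
    exps = [math.exp(l - mx) for l in logits]       # stable softmax
    z = sum(exps)
    return dict(zip("EAR", [ex / z for ex in exps]))

# Toy dimensions n=1, m=2; the weights are illustrative, not learned.
probs = decision_network([1.0], [0.0], [0.0], [0.0],
                         W_c=[[1, 0, 0, 0], [0, 1, 0, 0]], b_c=[0.0, 0.0],
                         V=[[1, 0], [0, 1], [0, 0]], b=[0.0, 0.0, 0.0])
```

The editing decision is then the output with the highest likelihood, e.g. `max(probs, key=probs.get)`.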
- a predicted summary is generated using one of the documents from the training data, and the loss function for the predicted summary is computed based on a comparison of the summary with the corresponding author-generated summary.
- the trained ANN 106 may be evaluated against additional documents and summaries of the training data and eventually stored for later use in generating document summaries as described above.
- the loss function is a “soft” version of a cross entropy loss function.
- the cross entropy loss function provides an indication of the loss of information resulting from the predicted summary as opposed to other possible summaries that could have been generated.
- a soft label estimation is performed for the predicted summary.
- the soft label estimation may be performed by computing a quality score, r, for all of the possible summaries that could have been generated (r(S′)).
- the quality score is used to evaluate the quality of any given summary, S′.
- each row represents one of the possible summaries, S′, for an initial summary with three sentences.
- Each potential summary (labeled 0-7) is represented as a sequence of hypothetical editing decisions δ_j ∈ {E, A, R}.
- the middle three columns represent the hypothetical editing decisions for each of the sentences of the corresponding potential summary, wherein 0 indicates that the extracted sentence is included in the summary, and 1 indicates that the abstracted sentence is included in the summary.
- the summaries in which a sentence is discarded are not shown. However, it will be appreciated that the actual table would include 27 rows of potential summaries.
- a soft label y( ⁇ i ) is computed for each editing decision of the predicted summary using the quality scores.
- the soft label y( ⁇ i ) may be referred to as the gain, which is the benefit gained if making the editing decision, ⁇ i .
- the soft label, y( ⁇ i ) is the average of all of the alternative quality scores that are included by the editing decision, ⁇ i , divided by a normalization factor, which is the average of all of the quality scores.
- the soft label for deciding on the first sentence would equal the average of quality scores r_0, r_1, r_2, and r_3, divided by the normalization factor.
- the soft label for this decision would equal the average of quality scores r_4, r_5, r_6, and r_7, divided by the normalization factor.
- the soft label computed for each sentence will depend on the decisions made for all preceding sentences. For example, assuming that the editing decision for the first sentence is 0, the soft labels for deciding on the second sentence would be calculated by averaging quality scores r_0 and r_1 for keeping the extracted sentence, or r_2 and r_3 for keeping the abstracted sentence. The above process can be written as:
- y(δ_i) = r̄(δ*_1, …, δ*_{i−1}, δ_i) / Σ_{δ_j ∈ {E, A, R}} r̄(δ*_1, …, δ*_{i−1}, δ_j)
- r̄(δ*_1, …, δ*_{i−1}, δ_i) denotes the average quality score value obtained by decision sequences that start with the prefix (δ*_1, …, δ*_{i−1}, δ_i), where δ*_1, …, δ*_{i−1} are the editing decisions already made in previous iterations.
- For brevity, r̄(δ*_1, …, δ*_{i−1}, δ_i) may be abbreviated as r̄(δ_i).
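A sketch of the soft-label computation; the quality-score table is illustrative, with uniform scores so each first-sentence decision receives a label of 1/3:

```python
from itertools import product

def soft_label(r, prefix, decision, options=("E", "A", "R")):
    """Soft label y(delta_i): the average quality score r-bar over all
    decision sequences that start with (prefix..., decision), normalized
    by the sum of the same averages over all three candidate decisions."""
    def avg(pfx):
        matches = [seq for seq in r if seq[:len(pfx)] == pfx]
        return sum(r[seq] for seq in matches) / len(matches)
    num = avg(prefix + (decision,))
    den = sum(avg(prefix + (o,)) for o in options)
    return num / den

# Illustrative table: a quality score for every sequence of 2 editing
# decisions (3^2 = 9 rows); uniform scores yield a label of 1/3 each.
r = {seq: 1.0 for seq in product("EAR", repeat=2)}
y = soft_label(r, (), "E")
```

Passing a non-empty `prefix` conditions the label on the decisions already made for preceding sentences, as described above.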
- the loss function for the predicted summary may be computed according to the following formula: ℒ(S′) = −Σ_i y(δ_i) · log p(δ_i)
- ℒ(S′) refers to the loss function for predicted summary, S′, which is computed based on each of the editing decisions, δ_i.
- the soft label for each editing decision is multiplied by the log of the probability, p(δ_i), which is the probability that the ANN 106 assigns to the decision.
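This soft cross-entropy can be sketched directly:

```python
import math

def soft_cross_entropy_loss(soft_labels, probs):
    """Soft cross-entropy over a predicted summary: for each editing
    decision, multiply its soft label y(delta_i) by the log of the
    probability p(delta_i) the network assigned to it, sum over the
    decisions, and negate."""
    return -sum(y * math.log(p) for y, p in zip(soft_labels, probs))

# One decision with full soft label and assigned probability 0.5:
loss = soft_cross_entropy_loss([1.0], [0.5])  # equals log(2)
```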
- the computing device 300 may include a processor 302 that is adapted to execute stored instructions, and a memory device 304 that provides temporary memory space for the operations of those instructions during execution.
- the processor can be a single-core processor, multi-core processor, computing cluster, or any number of other configurations.
- the memory 304 can include random access memory (RAM), read only memory, flash memory, or any other suitable memory systems.
- the processor 302 may also be linked through the system interconnect 306 to a display interface 312 adapted to connect the computing device 300 to a display device 314 .
- the display device 314 may include a display screen that is a built-in component of the computing device 300 .
- the display device 314 may also include a computer monitor, television, or projector, among others, that is externally connected to the computing device 300 .
- a network interface controller (NIC) 316 may be adapted to connect the computing device 300 through the system interconnect 306 to the network 318 .
- the NIC 316 can transmit data using any suitable interface or protocol, such as the internet small computer system interface, among others.
- the network 318 may be a cellular network, a radio network, a wide area network (WAN), a local area network (LAN), or the Internet, among others.
- a remote device 320 may connect to the computing device 300 through the network 318 .
- the processor 302 can be linked through the system interconnect 306 to the storage device 322 , which can include training data 324 and a network trainer 326 .
- the network trainer 326 is configured to generate the trained ANN 106 as described above in relation to FIG. 2 .
- the trained ANN 106 can be used in a summary generator 328 to generate document summaries as shown in FIGS. 1 and 2 .
- the storage device 322 can also include a set of text documents 330 .
- the summary generator 328 may receive a selection of one or more documents 330 from a user and automatically generate summaries 332 corresponding to each of the selected documents.
- FIG. 3 is not intended to indicate that the computing device 300 is to include all of the components shown in FIG. 3 . Rather, the computing device 300 can include fewer or additional components not illustrated in FIG. 3 (e.g., additional memory components, embedded controllers, modules, additional network interfaces, etc.). Furthermore, any of the functionalities of the network trainer 326 and the summary generator 328 may be partially, or entirely, implemented in hardware and/or in the processor 302 . For example, the functionality may be implemented with an application specific integrated circuit, with logic implemented in an embedded controller, or with logic implemented in the processor 302 , among others.
- the functionalities are implemented with logic, wherein the logic, as referred to herein, can include any suitable hardware (e.g., a processor, among others), software (e.g., an application, among others), firmware, or any suitable combination of hardware, software, and firmware.
- FIG. 4 is a process flow diagram summarizing an example method of automatically generating a document summary.
- the method 400 can be implemented with any suitable computing device, such as the computing device 300 of FIG. 3 implementing the system 100 described in relation to FIGS. 1 and 2 .
- an initial summary is generated from an original document.
- the initial summary is a selection of extracted sentences copied from the whole document. Blocks 404 - 410 may be iteratively repeated for each extracted sentence of the initial summary, starting with the first extracted sentence.
- the extracted sentence is processed to generate a corresponding abstracted sentence.
- the abstracted sentence is a paraphrasing of the extracted sentence and may include less text and less information compared to the extracted sentence.
- the abstracted sentence may be generated by an encoder-aligner-decoder, or other suitable techniques.
- vector representations are computed for the extracted sentence, abstracted sentence, the whole document, and the current summary as it exists after the previous iteration of the process.
- the generation of the whole document representation and summary representation are discussed further above in relation to FIGS. 1 and 2 .
- the vector representations from block 406 are input to a decision network such as the ANN 106 of FIGS. 1 and 2 .
- the output of the decision network is an editing decision that determines whether the extracted sentence is added to the summary or the abstracted sentence is added to the summary instead of the extracted sentence.
- the decision network may also be configured to generate an editing decision to discard the extracted sentence.
- the summary is updated based on the editing decision.
- the summary may be updated by adding the extracted sentence, adding the abstracted sentence, or adding neither sentence and maintaining the summary in its current form if the editing decision is to discard the sentence.
- the process flow diagram of FIG. 4 is not intended to indicate that the operations of the method 400 are to be executed in any particular order, or that all of the operations of the method 400 are to be included in every case. Additionally, the method 400 can include additional operations. Additional variations on the above method 400 may be made within the scope of the described subject matter.
- the present invention may be a system, a method, and/or a computer program product.
- the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
- the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
- the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
- a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
- a computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
- Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
- the network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
- a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
- Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
- the computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
- the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
- electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
- These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
- the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions.
- the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- cloud computing environment 500 comprises one or more cloud computing nodes 502 with which local computing devices used by cloud consumers, such as, for example, personal digital assistant (PDA) or cellular telephone 504 A, desktop computer 504 B, laptop computer 504 C, and/or automobile computer system 504 N may communicate.
- Nodes 502 may communicate with one another. They may be grouped (not shown) physically or virtually, in one or more networks, such as Private, Community, Public, or Hybrid clouds as described hereinabove, or a combination thereof.
- This allows cloud computing environment 500 to offer infrastructure, platforms and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device.
- computing devices 504 A-N shown in FIG. 5 are intended to be illustrative only and that computing nodes 502 and cloud computing environment 500 can communicate with any type of computerized device over any type of network and/or network addressable connection (e.g., using a web browser).
- Referring now to FIG. 6, a set of functional abstraction layers provided by cloud computing environment 500 (FIG. 5) is shown. It should be understood in advance that the components, layers, and functions shown in FIG. 6 are intended to be illustrative only and embodiments of the invention are not limited thereto. As depicted, the following layers and corresponding functions are provided.
- Hardware and software layer 600 includes hardware and software components.
- hardware components include mainframes, in one example IBM® zSeries® systems; RISC (Reduced Instruction Set Computer) architecture based servers, in one example IBM pSeries® systems; IBM xSeries® systems; IBM BladeCenter® systems; storage devices; networks and networking components.
- software components include network application server software, in one example IBM WebSphere® application server software; and database software, in one example IBM DB2® database software.
- (IBM, zSeries, pSeries, xSeries, BladeCenter, WebSphere, and DB2 are trademarks of International Business Machines Corporation registered in many jurisdictions worldwide.)
- Virtualization layer 602 provides an abstraction layer from which the following examples of virtual entities may be provided: virtual servers; virtual storage; virtual networks, including virtual private networks; virtual applications and operating systems; and virtual clients.
- management layer 604 may provide the functions described below.
- Resource provisioning provides dynamic procurement of computing resources and other resources that are utilized to perform tasks within the cloud computing environment.
- Metering and Pricing provide cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may comprise application software licenses.
- Security provides identity verification for cloud consumers and tasks, as well as protection for data and other resources.
- User portal provides access to the cloud computing environment for consumers and system administrators.
- Service level management provides cloud computing resource allocation and management such that required service levels are met.
- Service Level Agreement (SLA) planning and fulfillment provide pre-arrangement for, and procurement of, cloud computing resources for which a future requirement is anticipated in accordance with an SLA.
- Workloads layer 606 provides examples of functionality for which the cloud computing environment may be utilized. Examples of workloads and functions which may be provided from this layer include: mapping and navigation; software development and lifecycle management; virtual classroom education delivery; data analytics processing; transaction processing; and natural language processing.
Abstract
A system for generating a summary of a text document is disclosed. In some examples, the system includes a processor configured to generate an initial summary of an original document. The initial summary includes a selection of extracted sentences copied from the original document. For each extracted sentence of the initial summary, the processor processes the extracted sentence to generate an abstracted sentence, and generates vector representations of the extracted sentence, the abstracted sentence, the original document, and the current summary. The vector representations are then input to a decision network to compute an editing decision. The editing decision is selected from a group of possible decisions that includes a decision to add the extracted sentence and a decision to add the abstracted sentence. The processor also updates the current summary based on the editing decision.
Description
- The present disclosure relates to natural language processing techniques, including a system for automatically generating a summary from an original text document.
- According to an embodiment described herein, a system for generating a summary of a text document includes a processor configured to generate an initial summary of an original document. The initial summary includes a selection of extracted sentences copied from the original document. For each extracted sentence of the initial summary, the processor processes the extracted sentence to generate an abstracted sentence, and generates vector representations of the extracted sentence, the abstracted sentence, the original document, and the current summary. The vector representations are then input to a decision network to compute an editing decision. The editing decision is selected from a group of possible decisions that includes a decision to add the extracted sentence and a decision to add the abstracted sentence. The processor also updates the current summary based on the editing decision.
- In some embodiments, a method of generating a summary of a text document includes generating an initial summary of an original document, wherein the initial summary includes a selection of extracted sentences copied from the original document. The method also includes performing a set of actions for each extracted sentence of the initial summary. The actions include processing the extracted sentence to generate an abstracted sentence and generating vector representations of the extracted sentence, the abstracted sentence, the original document, and the current summary. The actions also include inputting the vector representations to a decision network to compute an editing decision, wherein the editing decision is selected from a group of possible decisions that includes a decision to add the extracted sentence and a decision to add the abstracted sentence. The actions also include updating the current summary based on the editing decision.
- In yet another embodiment, a computer program product for generating a summary of a text document can include a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se. The program instructions can be executable by a processor to cause the processor to generate an initial summary of an original document, wherein the initial summary includes a selection of extracted sentences copied from the original document. For each extracted sentence of the initial summary, the program instructions can be executable by the processor to process the extracted sentence to generate an abstracted sentence, and generate vector representations of the extracted sentence, the abstracted sentence, the original document, and the current summary. The program instructions can be executable by the processor to input the vector representations to a decision network to compute an editing decision, wherein the editing decision is selected from a group of possible decisions that includes a decision to add the extracted sentence and a decision to add the abstracted sentence. The program instructions can be executable by the processor to update the current summary based on the editing decision.
-
FIG. 1 is a block diagram of a system for generating document summaries, according to an embodiment described herein; -
FIG. 2 is a block diagram of an example system for generating document summaries, according to an embodiment described herein; -
FIG. 3 is an example computing device configured to generate document summaries; -
FIG. 4 is a process flow diagram summarizing an example method of automatically generating a document summary; -
FIG. 5 depicts an illustrative cloud computing environment according to an embodiment described herein; and -
FIG. 6 depicts a set of functional abstraction layers provided by a cloud computing environment according to an embodiment described herein. - The present disclosure describes natural language processing techniques for generating textual summaries of original text documents. The original documents may be any suitable type of document written by a human author, including news articles, scientific papers, essays, business documents, and others. A system in accordance with embodiments processes the original document automatically (i.e., without human involvement) to condense the original document into a shorter version (i.e., a summary). The original document is condensed while trying to preserve the main essence of the original text and keeping the generated summary as readable as possible.
- The summarization process described herein can generate a summary that is a combination of extracted and abstracted sentences. Extracted sentences are portions of the original text document that are copied directly from the original document and imported into the summary unchanged. Building a summary from only extracted sentences keeps the extracted fragments untouched, allowing the preservation of important features, such as key phrases, facts, opinions, and the like. However, a summary with only extracted sentences tends to be less fluent, coherent, and readable, and may sometimes include superfluous text.
- To improve the quality of the summary, it may be useful in some cases to replace an extracted sentence with an abstracted sentence. An abstracted sentence may be generated by applying natural language paraphrasing and/or compression to a given text. For example, an abstracted sentence may be generated using an encoder-decoder (sequence-to-sequence) technique, with the original text sequence being encoded while the abstracted sentence is the decoded sequence. In some cases, abstracted sentences may provide better readability. However, the accuracy of a summary with only abstracted sentences may tend to decline over large textual inputs, and abstraction sometimes results in higher redundancy.
- To improve the overall readability of the summary while maintaining accuracy and reducing redundancy, the system described herein can generate a summary that is a combination of extracted and abstracted sentences. As described more fully below, the system may include a trained artificial neural network that receives a set of inputs related to an initial summary. The initial summary includes a set of extracted sentences that have been extracted from an original document. For each extracted sentence, the network generates a decision about what to add to the summary. For example, the decision can be a decision to add the extracted sentence to the summary, or to add an abstracted version of the extracted sentence to the summary. In some embodiments, the network can also generate a decision to discard the sentence.
-
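The per-sentence decision process described above can be sketched as a simple loop. Here, `decide` is only a stand-in for the trained decision network; all names and the toy decision rule are illustrative, not taken from the patent.

```python
def edit_summary(extracted, abstracted, decide):
    """Iterate over the initial summary's sentences; for each one, keep the
    extracted version ('E'), keep the abstracted version ('A'), or reject
    the sentence ('R'), per the editing decision."""
    summary = []
    for ext_sent, abs_sent in zip(extracted, abstracted):
        decision = decide(ext_sent, abs_sent, summary)
        if decision == "E":
            summary.append(ext_sent)
        elif decision == "A":
            summary.append(abs_sent)
        # 'R': discard the sentence; the summary is unchanged this iteration
    return summary

# Toy decision rule standing in for the trained network: keep the shorter version.
toy_decide = lambda e, a, s: "A" if len(a) < len(e) else "E"
result = edit_summary(["a long extracted sentence", "short"],
                      ["short abstract", "an abstracted alternative"],
                      toy_decide)
```

In the real system the decision also depends on the document and summary context vectors, as described below; the loop structure itself is unchanged.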
FIG. 1 is a block diagram of a system for generating document summaries, according to an embodiment described herein. The system 100 may be implemented by hardware or a combination of hardware and software. For example, the system 100 may be implemented by the computing device 300 of FIG. 3. The input to the system 100 is a full document 102 and the output of the system is a summary 104 of the full document 102. The system 100 is configured to perform an iterative process wherein, for selected sentences of the full document 102, an editing decision is made regarding whether to add the extracted sentence to the summary, add an abstracted version of the extracted sentence to the summary, or discard the extracted sentence.
- The system 100 includes a trained artificial neural network (ANN) 106, which is configured to output the editing decision 108 with regard to each sentence input to the system 100. In some embodiments, the ANN 106 is a two-layer, fully-connected neural network. Additional details of an example ANN 106 and training process are described in relation to FIG. 2.
- The data input to the ANN 106 for each editing decision includes a representation of the extracted sentence, referred to herein as the extracted sentence vector 110, and a representation of the corresponding abstracted sentence, referred to herein as the abstracted sentence vector 112. To generate the input to the system 100, the full document 102 is processed by an extractor 114 that generates an initial summary by determining which sentences to include in the initial summary and which sentences to exclude. The initial summary, S, is a set of extracted sentences that have been identified by the extractor 114 for inclusion in the initial summary.
- Each extracted sentence is processed by an abstractor 116 that generates the corresponding abstracted representation of the extracted sentence. The extracted sentences and abstracted sentences are further processed to generate vector representations of each sentence, i.e., the extracted sentence vector 110 and the abstracted sentence vector 112. Each vector is a vector of n real numbers. To generate the vector representation of a sentence, the sentence is processed to extract the sentence's tokens (e.g., words or phrases). Each token identified in the sentence is mapped to a corresponding position in the vector. For example, the corresponding position in the vector can be incremented each time a particular word or phrase mapped to that position occurs in the sentence. Next, embedded word vectors are generated by mapping each word to m-dimensional pre-trained word vectors. The embedded word vectors may then be input to a convolutional sentence encoder to generate the vector representation of the sentence.
- The input to the ANN 106 also includes two auxiliary representations, a full document representation 118 and a summary representation 120. The full document representation 118 provides a global context for making editing decisions, and will remain unchanged throughout the process of generating the summary for the full document. The summary representation 120 is a representation of the summary 104 that has been generated after the previous editing decision. Both the full document representation 118 and the summary representation 120 are vectors of the same dimension as the extracted sentence vector 110 and the abstracted sentence vector 112 (i.e., n real numbers). The generation of both the full document representation 118 and the summary representation 120 is described further in relation to FIG. 2.
- The four input vectors are input to the ANN 106, and the output of the ANN 106 is an editing decision 108 regarding the input sentence. If the editing decision 108 is a decision to add the extracted sentence to the summary 104, the extracted sentence is added as the next sentence in the summary 104. If the editing decision is a decision to add the abstracted sentence to the summary, the abstracted sentence is added as the next sentence in the summary 104 instead of the extracted sentence. If the editing decision is a decision to discard the sentence, no sentence is added and the summary remains unchanged from the previous iteration.
- After the editing decision is determined, the summary representation 120 is updated and the next iteration of the process begins with the next sentence of the initial summary generated by the extractor 114. The process continues until an editing decision has been made with regard to each sentence of the initial summary. The resulting summary 104 can then be stored to a storage memory and associated with the original document.
- It will be appreciated that the above description is a summary of the techniques described herein and that many additional sub-processes may be performed to generate the summary. A more detailed description of an example summary generation system is described in relation to FIG. 2.
-
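The token-to-vector pipeline described above (tokens mapped to pre-trained embeddings, then a convolutional sentence encoder) can be sketched roughly as follows. The tiny vocabulary, embedding size, and mean-window convolution are illustrative stand-ins for the patent's actual encoder, not a reproduction of it.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2, "mat": 3}     # illustrative vocabulary
m = 4                                                # embedding dimension (toy value)
word_vectors = rng.normal(size=(len(vocab), m))      # stand-in for pre-trained vectors

def encode_sentence(tokens, window=2):
    """Map tokens to embeddings, apply a toy 1-D convolution (mean over a
    sliding window of consecutive word vectors), then max-over-time pooling
    to obtain a fixed-size sentence vector."""
    ids = [vocab[t] for t in tokens if t in vocab]
    embedded = word_vectors[ids]                     # shape (sentence_len, m)
    windows = [embedded[i:i + window].mean(axis=0)
               for i in range(max(1, len(embedded) - window + 1))]
    return np.max(windows, axis=0)                   # shape (m,)

sentence_vec = encode_sentence(["the", "cat", "sat"])
```

The key property this sketch shares with the described encoder is that sentences of any length map to vectors of one fixed dimension, so extracted and abstracted versions can be compared directly.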
FIG. 2 is a block diagram of an example system for generating document summaries, according to an embodiment described herein. The system shown in FIG. 2 is a more detailed example of the system 100 described in relation to FIG. 1. The editorial process described herein is performed over an initial summary (S) 200, whose sentences were selected by the extractor 114 from a full document (D) 102. The process performed by the system 100 edits the summary 200 to generate the higher quality summary 104 (denoted S′). The editorial process may be implemented by iterating over sentences of the initial summary 200 according to the selection order of the extractor 114.
- As used herein, s_i^e and s_i^a refer to the original (i.e., extracted) and paraphrased (i.e., abstracted) versions of a given sentence s_i ∈ S, respectively. Additionally, e_i and a_i refer to the corresponding mathematical representations of s_i^e and s_i^a, such that e_i represents the extracted sentence vector 110 and a_i represents the abstracted sentence vector 112. Both e_i and a_i are vectors of n real numbers (e_i ∈ ℝ^n and a_i ∈ ℝ^n). Both e_i and a_i may be generated by a sentence representation engine, which parses each sentence and maps each word or phrase (i.e., token) to a corresponding position of the vector, then maps each word to m-dimensional pre-trained word vectors, and inputs the resulting embedded word vectors into a convolutional sentence encoder, as explained above.
- The extractor 114 may be any suitable type of extractor. In some embodiments, the extractor consists of two main subcomponents, an encoder and a sentence selector. The encoder can encode each sentence into its corresponding vector representation, e_i, using a hierarchical representation. For example, the hierarchical representation may be a combination of a temporal convolutional model followed by a bidirectional Long Short-Term Memory (biLSTM) encoder. The sentence selector can use an artificial neural network, such as a Multilayer Perceptron (MLP) or Pointer Network, to identify which sentences to add to the initial summary. The sentence selector may calculate a selection likelihood for each sentence, P(s_i), according to a selection policy P(·), and select the sentences for inclusion within the initial summary 200 based on the likelihood.
- The abstractor 116 may be any suitable type of encoder-decoder, such as any type of sequence-to-sequence (seq2seq) model. In some embodiments, the abstractor 116 may be an encoder-aligner-decoder with a copy mechanism. The abstractor 116 operates by encoding the sentence into a vector representation and then decoding the resulting vector back into a textual representation of the sentence, which is the paraphrased or abstracted sentence s_i^a.
- The abstractor 116 may be applied to each extracted sentence individually to generate the corresponding abstracted sentence. In some embodiments, instead of applying the abstractor 116 on single extracted sentences, the abstractor 116 may be applied to a group of three consecutive sentences (s_−^e, s_i^e, s_+^e) to generate the abstracted sentence, s_i^a, where s_−^e and s_+^e denote the sentences that precede and succeed s_i^e in D, respectively. This allows the generation of an abstractive version of s_i^e (i.e., s_i^a) that benefits from a wider local context. In addition, the word attention applied by the abstractor 116 may be enhanced using the extractor's decisions, which are given by the extractor's sentence selection policy P(·). For example, C_{w_j} represents the original attention value of a word, w_j. For each given word w_j ∈ s, where s ∈ {s_−^e, s_i^e, s_+^e}, the attention applied to the word may be biased according to the selection likelihood calculated for the sentence by the extractor 114. This biasing may be implemented according to the following formula, where Z is a normalization term:
- Ĉ_{w_j} = (1/Z) · C_{w_j} · P(s)
- In order to have a proper comparison between the extracted sentence vectors, e_i, and the abstracted sentence vectors, a_i, both sentence versions are encoded in a similar manner. To achieve a proper encoding of the abstracted sentence, s_i^a, into the vector representation, a_i, the abstracted sentence, s_i^a, is first inserted into the whole document, D, in place of its corresponding extracted sentence, s_i^e. In this way, s_i^a is treated as if it were an ordinary sentence within the whole document, while the rest of the document remains untouched. The vector representation of the abstracted sentence is then encoded using the extractor's encoder in a similar way in which sentence s_i^e was encoded. This results in a representation, a_i, that provides a comparable alternative to e_i, whose encoding is likely to be affected by similar contextual grounds.
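The attention biasing described above, which scales each word's attention value by the selection likelihood P(s) of the sentence containing it and then renormalizes by Z, can be sketched as follows; the function and argument names are illustrative.

```python
import numpy as np

def bias_attention(attn, sentence_likelihood):
    """attn: original attention values C_wj, one per word.
    sentence_likelihood: P(s) of the sentence each word belongs to.
    Scale each attention value by its sentence's likelihood, then divide by
    Z (the sum of the scaled values) so the weights again sum to 1."""
    biased = (np.asarray(attn, dtype=float)
              * np.asarray(sentence_likelihood, dtype=float))
    return biased / biased.sum()

# Three words: one from a strongly selected sentence (P = 0.9), two from a
# weakly selected one (P = 0.1).
weights = bias_attention([0.5, 0.3, 0.2], [0.9, 0.1, 0.1])
```

After biasing, attention mass concentrates on words from sentences the extractor considered salient, which is the stated purpose of the enhancement.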
- Another input to the
ANN 106 is the full document representation 118 (denoted d). Thefull document representation 118 is a vector set of n real numbers (d ∈ n). Thefull document representation 118 may be computed by first calculating the mean, ē, of all of the extracted sentence vectors in the full document according to the following formula, wherein N is the number of sentences in the full document: -
- The full document representation, d, may then be computed using the following formula:
-
d = tanh(W_d ē + b_d)
- The next input to the
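The two formulas above, the mean ē of the extracted sentence vectors followed by d = tanh(W_d ē + b_d), can be sketched with NumPy. The dimensions and the randomly initialized W_d and b_d are placeholders for the trained parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 8, 5                              # toy sentence-vector dimension and count
E = rng.normal(size=(N, n))              # extracted sentence vectors e_1..e_N
W_d = rng.normal(size=(n, n)) * 0.1      # learnable weight matrix (untrained here)
b_d = np.zeros(n)                        # learnable bias (untrained here)

e_bar = E.mean(axis=0)                   # mean of all extracted sentence vectors
d = np.tanh(W_d @ e_bar + b_d)           # full document representation, shape (n,)
```

Because tanh squashes its input, every component of d lies in (-1, 1), giving the global-context vector the same scale regardless of document length.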
ANN 106 is the summary representation 120 (denoted gi−1). Thesummary representation 120 is a vector set representing the summary, S′, that has been generated by the editor so far. Thesummary representation 120 is a vector set of n real numbers (gi−1 ∈ n), which is generated based on the editing decisions. In the first iteration, gi−1 may be set to vector set of all zeros (g0={right arrow over (0)}). After each iteration, the summary representation is recalculated based on the editing decision. The summary representation, gi, may be updated according to the following formula: -
g i =g i−1+tan h(W g h i) - In the above formula, hi is the vector representation of the selected sentence or a vector set of n zeros if the sentence was discarded (i.e., hi ∈ {ei,ai,{right arrow over (0)}}, depending on the editing decision that was made in the iteration that was just completed).
- In the example system of
FIG. 2 , theANN 106 includes two fully-connected layers, afirst layer 204 denoted Wc, and asecond layer 206 denoted V. In this example, Wc is a matrix of real numbers of the dimensions m by 4n (Wc ∈ m×4n), V is a matrix of real numbers of the dimensions 3 by m (V ∈ 3×m), bc is a matrix of m real numbers (bc ∈ m) representing bias values applied to the first layer, and b is a vector set of 3 real numbers (b ∈ 3) representing bias values applied to the second layer. In some embodiments, m may be equal to 512 and n may be equal to 512. However, it will be appreciated that other dimensions may be used. - Given the four representations d, ei, ai, and gi−1 as an input, the editor's decision for each sentence si ∈ S is implemented using the
ANN 106, as follows: -
softmax(V tan h(Wc[ei·ai·gi−1·d]+bc)+b) - In the above equation [·] denotes a vectors concatenation, and the values for Wc, V, bc, and b are learnable parameters that can be determined using a training process as described below. In each step, i, the
ANN 106 computes three outputs, each one associated with a different editing decision. The first output 208 (denoted E) is a likelihood value for the decision to add the extracted sentence to thesummary 104, the second output 210 (denoted A) is a likelihood value for the decision to add the abstracted sentence to thesummary 104, and the third output 212 (denoted R) is a likelihood value for the decision to reject the sentence. Thesystem 100 then chooses the editing decision (denoted πi) based on which output has the highest likelihood value (denoted p(πi). In other words, the output with the highest probability value is chosen as the editing decision for the current iteration (πi ∈ {E, A, R} depending on which of E, A, and R is higher). Thesystem 100 then appends the corresponding sentence version (i.e., either si e or si a) to the summary S′, or if the editing decision is R the sentence si is discarded. - The
system 100 described above is able to capture various complex interactions between the different inputs. For example, thesystem 100 may learn that by choosing any one of the two candidate sentence versions, based on the current local context, the generated summary would be more fluent. As another example, thesystem 100 may learn that given the global context, one of the sentence versions may better fit in terms of the amount of salient information it may contain. Finally, based on the interaction between both sentence versions with either of the local and global contexts (and among the last two), thesystem 100 may learn that both sentence versions may only add superfluous or redundant information to the summary, and therefore, decide to reject both. - Network Training
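A minimal NumPy sketch of the two-layer decision network and the summary-representation update described above. The randomly initialized parameters stand in for the learned W_c, b_c, V, b, and W_g, and the toy dimensions are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                       # subtract max for numerical stability
    ez = np.exp(z)
    return ez / ez.sum()

def editing_decision(e_i, a_i, g_prev, d, W_c, b_c, V, b):
    """Concatenate the four input vectors, apply the hidden layer with tanh,
    then the output layer and softmax over the three decisions E, A, R.
    Returns the chosen decision and the three likelihood values."""
    x = np.concatenate([e_i, a_i, g_prev, d])          # shape (4n,)
    probs = softmax(V @ np.tanh(W_c @ x + b_c) + b)    # likelihoods of E, A, R
    return "EAR"[int(np.argmax(probs))], probs

# Toy dimensions: n = 4 (sentence vectors), hidden size m = 6.
rng = np.random.default_rng(2)
n, m = 4, 6
e_i, a_i, g_prev, d = (rng.normal(size=n) for _ in range(4))
W_c, b_c = rng.normal(size=(m, 4 * n)), np.zeros(m)
V, b = rng.normal(size=(3, m)), np.zeros(3)

decision, probs = editing_decision(e_i, a_i, g_prev, d, W_c, b_c, V, b)

# Summary-representation update g_i = g_{i-1} + tanh(W_g h_i), where h_i is
# the vector of the kept sentence version, or zeros if the sentence was rejected.
W_g = rng.normal(size=(n, n))
h_i = {"E": e_i, "A": a_i, "R": np.zeros(n)}[decision]
g_i = g_prev + np.tanh(W_g @ h_i)
```

In training, the softmax outputs would be compared against the soft labels described in the next section rather than used greedily as here.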
- The training of the
ANN 106 may be performed using a set of training data that includes a group of documents and associated summaries. For the training data, the documents and their corresponding summaries have been written by human authors. The training data may be obtained by any suitable database of documents. - The weights and biases of the
ANN 106, i.e., the learnable parameters Wc, Wd, Wg, V, bc, bd, and b, are adjusted in an iterative process to obtain a solution that minimizes a loss function. During each iteration of the training process, the learnable parameters may be adjusted using a teacher forcing method, in which the ground truth label from the training data is used as input to the network instead using of the output of the network as the input for the next iteration. After adjusting the learnable parameters of the ANN, a predicted summary is generated using one of the documents from the training data, and the loss function for the predicted summary is computed based on a comparison of the summary with the corresponding author-generated summary. After a suitable number of iterations or after the loss function is below a threshold, for example, the trainedANN 106 may be evaluated against additional documents and summaries of the training data and eventually stored for later use in generating document summaries as described above. - In some embodiments, the loss function is a “soft” version of a cross entropy loss function. The cross entropy loss function provides an indication of the loss of information resulting from the predicted summary as opposed to other possible summaries that could have been generated. To generate the loss function for a particular predicted summary, a soft label estimation is performed for the predicted summary. The soft label estimation may be performed by computing a quality score, r, for all of the possible summaries that could have been generated (r(S′)). The quality score is used to evaluate the quality of any given summary, S′. Overall, for a given initial summary, S, with ι sentences, there are 3ιpossible summaries, S′, and a quality score is generated for each one. The following table is an example of the quality scores computed for an initial summary with 3 sentences.
TABLE 1: quality scores

Potential Summary | Sentence 0 | Sentence 1 | Sentence 2 | Quality Score
---|---|---|---|---
0 | 0 | 0 | 0 | r0
1 | 0 | 0 | 1 | r1
2 | 0 | 1 | 0 | r2
3 | 0 | 1 | 1 | r3
4 | 1 | 0 | 0 | r4
5 | 1 | 0 | 1 | r5
6 | 1 | 1 | 0 | r6
7 | 1 | 1 | 1 | r7

- In the table, each row represents one of the possible summaries, S′, for an initial summary with three sentences. Each potential summary (labeled 0-7) is represented as a sequence of hypothetical editing decisions πj ∈ {E, A, R}. The middle three columns give the hypothetical editing decision for each sentence of the corresponding potential summary, wherein 0 indicates that the extracted sentence is included in the summary and 1 indicates that the abstracted sentence is included in the summary. For the sake of simplicity, the summaries in which a sentence is discarded are not shown. However, it will be appreciated that the full table would include 27 rows of potential summaries.
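As an illustration of how a table like Table 1 can be produced, the sketch below enumerates every potential summary and scores it against a reference. This is a minimal, assumed implementation: unigram recall stands in for a full ROUGE scorer, and the discard option is omitted, mirroring the simplified table.

```python
from itertools import product

def unigram_recall(candidate: str, reference: str) -> float:
    """Toy quality score: fraction of reference words that appear in the
    candidate (a ROUGE-1-recall-style stand-in, not a full ROUGE scorer)."""
    cand_words = set(candidate.lower().split())
    ref_words = reference.lower().split()
    if not ref_words:
        return 0.0
    return sum(1 for w in ref_words if w in cand_words) / len(ref_words)

def quality_scores(extracted, abstracted, reference):
    """Compute r(S') for every potential summary, as in Table 1: decision 0
    keeps the extracted sentence, 1 keeps the abstracted one.  With the
    discard option included, the product would range over three values."""
    scores = {}
    for decisions in product((0, 1), repeat=len(extracted)):
        summary = " ".join(abstracted[i] if d else extracted[i]
                           for i, d in enumerate(decisions))
        scores[decisions] = unigram_recall(summary, reference)
    return scores
```

For a three-sentence initial summary this yields eight entries keyed (0, 0, 0) through (1, 1, 1), one per row of Table 1.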
- The right column shows the corresponding quality score computed for each potential summary. In some embodiments, the quality score may be a ROUGE score. The quality scores may be calculated by comparing the actual human-authored summary to each of the potential summaries, although any suitable technique may be used to generate the quality scores. The quality score reflects the degree to which the information content of the hypothetical summary matches the information content of the human-authored summary.
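Building on such a table of quality scores, the soft-label estimation and soft cross-entropy loss developed in the following paragraphs can be sketched as below. Two points are hedged readings rather than the patent's specification: the normalization factor is taken to be the average of all quality scores, and the loss carries the conventional leading minus sign of a loss to be minimized.

```python
import math

def soft_label(scores, prefix, decision):
    """y(pi_i): average quality score of the potential summaries whose
    decision sequences start with `prefix` followed by `decision`,
    divided by the average of all quality scores (normalization)."""
    i = len(prefix)
    consistent = [r for d, r in scores.items()
                  if d[:i] == tuple(prefix) and d[i] == decision]
    normalization = sum(scores.values()) / len(scores)
    return (sum(consistent) / len(consistent)) / normalization

def soft_cross_entropy(soft_labels, probs):
    """Loss for one predicted summary: each soft label is multiplied by
    the log of the probability the decision network assigned to that
    decision, summed over sentences, and divided by the sentence count."""
    count = len(soft_labels)
    return -sum(y * math.log(p) for y, p in zip(soft_labels, probs)) / count
```

With the eight scores of Table 1, `soft_label(scores, (), 0)` averages r0 through r3, and `soft_label(scores, (0,), 1)` averages r2 and r3, matching the worked example in the text.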
- A soft label y(πi) is computed for each editing decision of the predicted summary using the quality scores. The soft label y(πi) may be referred to as the gain, i.e., the benefit gained by making the editing decision, πi. In this example, the soft label, y(πi), is the average of all of the quality scores of the potential summaries that are consistent with the editing decision, πi, divided by a normalization factor, which is the average of all of the quality scores. In other words, assuming that the editing decision for the first sentence is 0 (keep extracted), the soft label for that decision would equal the average of quality scores r0, r1, r2, and r3, divided by the normalization factor. If the editing decision for the first sentence is 1 (keep abstracted), the soft label for that decision would equal the average of quality scores r4, r5, r6, and r7, divided by the normalization factor. The soft label computed for each sentence depends on the decisions made for all preceding sentences. For example, assuming that the editing decision for the first sentence is 0, the soft labels for deciding on the second sentence would be calculated by averaging quality scores r0 and r1 for keep extracted, or r2 and r3 for keep abstracted, and dividing by the normalization factor. The above process can be written as:
- y(πi) = r̄(π*1, . . . , π*i−1, πi)/r̄, where r̄ denotes the average of all of the quality scores.
- In the above equation, π* = (π*1, . . . , π*ι) denotes the optimal decision sequence, i.e., the decision sequence that maximizes the quality score, r. For i ∈ {1, 2, . . . , ι}, r̄(π*1, . . . , π*i−1, πi) denotes the average quality score obtained by decision sequences that start with the prefix (π*1, . . . , π*i−1, πi). For i=1, r̄(π*1, . . . , π*i−1, πi) = r̄(πi). - Using the estimated soft labels computed for each editing decision of the predicted summary, the loss function for the predicted summary may be computed according to the following formula:
- ℒ(π|S) = −(1/ι) Σi y(πi) log p(πi), where the sum runs over the ι sentences of the initial summary.
- In the above equation, ℒ(π|S) refers to the loss function for the predicted summary, S′, which is computed based on each of the editing decisions, π. According to the above formula, the soft label for each editing decision is multiplied by the log of the probability, p(πi), which is the probability that the
ANN 106 assigns to the decision. These values are then summed over the sentences of the initial summary and divided by the number of sentences in the initial summary. - With reference now to
FIG. 3, an example computing device is depicted that can generate document summaries. The computing device 300 may be, for example, a server, desktop computer, laptop computer, tablet computer, or smartphone. In some examples, the computing device 300 may be a cloud computing node. The computing device 300 may be described in the general context of computer system executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types. The computing device 300 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media, including memory storage devices. - The
computing device 300 may include a processor 302 that is adapted to execute stored instructions and a memory device 304 to provide temporary memory space for the operations of those instructions. The processor 302 can be a single-core processor, a multi-core processor, a computing cluster, or any number of other configurations. The memory 304 can include random access memory (RAM), read only memory, flash memory, or any other suitable memory systems. - The
processor 302 may be connected through a system interconnect 306 (e.g., PCI®, PCI-Express®, etc.) to an input/output (I/O) device interface 308 adapted to connect the computing device 300 to one or more I/O devices 310. The I/O devices 310 may include, for example, a keyboard and a pointing device, wherein the pointing device may include a touchpad or a touchscreen, among others. The I/O devices 310 may be built-in components of the computing device 300, or may be devices that are externally connected to the computing device 300. - The
processor 302 may also be linked through the system interconnect 306 to a display interface 312 adapted to connect the computing device 300 to a display device 314. The display device 314 may include a display screen that is a built-in component of the computing device 300. The display device 314 may also include a computer monitor, television, or projector, among others, that is externally connected to the computing device 300. In addition, a network interface controller (NIC) 316 may be adapted to connect the computing device 300 through the system interconnect 306 to the network 318. In some embodiments, the NIC 316 can transmit data using any suitable interface or protocol, such as the internet small computer system interface, among others. The network 318 may be a cellular network, a radio network, a wide area network (WAN), a local area network (LAN), or the Internet, among others. A remote device 320 may connect to the computing device 300 through the network 318. - In some examples, the
processor 302 can be linked through the system interconnect 306 to the storage device 322, which can include training data 324 and a network trainer 326. The network trainer 326 is configured to generate the trained ANN 106 as described above in relation to FIG. 2. The trained ANN 106 can be used in a summary generator 328 to generate document summaries as shown in FIGS. 1 and 2. The storage device 322 can also include a set of text documents 330. The summary generator 328 may receive a selection of one or more documents 330 from a user and automatically generate summaries 332 corresponding to each of the selected documents. - It is to be understood that the block diagram of
FIG. 3 is not intended to indicate that the computing device 300 is to include all of the components shown in FIG. 3. Rather, the computing device 300 can include fewer or additional components not illustrated in FIG. 3 (e.g., additional memory components, embedded controllers, modules, additional network interfaces, etc.). Furthermore, any of the functionalities of the network trainer 326 and the summary generator 328 may be partially, or entirely, implemented in hardware and/or in the processor 302. For example, the functionality may be implemented with an application specific integrated circuit, logic implemented in an embedded controller, or logic implemented in the processor 302, among others. In some embodiments, the functionalities are implemented with logic, wherein the logic, as referred to herein, can include any suitable hardware (e.g., a processor, among others), software (e.g., an application, among others), firmware, or any suitable combination of hardware, software, and firmware. -
FIG. 4 is a process flow diagram summarizing an example method of automatically generating a document summary. The method 400 can be implemented with any suitable computing device, such as the computing device 300 of FIG. 3 implementing the system 100 described in relation to FIGS. 1 and 2. - At
block 402, an initial summary is generated from an original document. The initial summary is a selection of extracted sentences copied from the whole document. Blocks 404-410 may be iteratively repeated for each extracted sentence of the initial summary, starting with the first extracted sentence. - At
block 404, the extracted sentence is processed to generate a corresponding abstracted sentence. The abstracted sentence is a paraphrase of the extracted sentence and may include less text and less information than the extracted sentence. The abstracted sentence may be generated by an encoder-aligner-decoder or another suitable technique. - At
block 406, vector representations are computed for the extracted sentence, the abstracted sentence, the whole document, and the current summary as it exists after the previous iteration of the process. The generation of the whole-document representation and the summary representation is discussed above in relation to FIGS. 1 and 2. - At
block 408, the vector representations from block 406 are input to a decision network such as the ANN 106 of FIGS. 1 and 2. The output of the decision network is an editing decision that determines whether the extracted sentence or the abstracted sentence is added to the summary. In some embodiments, the decision network may also be configured to generate an editing decision to discard the extracted sentence. - At
block 410, the summary is updated based on the editing decision. In accordance with the editing decision, the summary may be updated by adding the extracted sentence, adding the abstracted sentence, or adding neither sentence and maintaining the summary in its current form if the editing decision is to discard the sentence. - The process flow diagram of
FIG. 4 is not intended to indicate that the operations of the method 400 are to be executed in any particular order, or that all of the operations of the method 400 are to be included in every case. Additionally, the method 400 can include additional operations. Additional variations on the above method 400 may be made within the scope of the described subject matter. - The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
- The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
- Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
- Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
- Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
- These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
- The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
- The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical functions. In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
- Referring now to
FIG. 5, an illustrative cloud computing environment 500 is depicted. As shown, the cloud computing environment 500 comprises one or more cloud computing nodes 502 with which local computing devices used by cloud consumers, such as, for example, a personal digital assistant (PDA) or cellular telephone 504A, a desktop computer 504B, a laptop computer 504C, and/or an automobile computer system 504N, may communicate. The nodes 502 may communicate with one another. They may be grouped (not shown) physically or virtually, in one or more networks, such as Private, Community, Public, or Hybrid clouds as described hereinabove, or a combination thereof. This allows the cloud computing environment 500 to offer infrastructure, platforms, and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device. It is understood that the types of computing devices 504A-N shown in FIG. 5 are intended to be illustrative only and that the computing nodes 502 and the cloud computing environment 500 can communicate with any type of computerized device over any type of network and/or network addressable connection (e.g., using a web browser). - Referring now to
FIG. 6, a set of functional abstraction layers provided by the cloud computing environment 500 (FIG. 5) is shown. It should be understood in advance that the components, layers, and functions shown in FIG. 6 are intended to be illustrative only and embodiments of the invention are not limited thereto. As depicted, the following layers and corresponding functions are provided. - Hardware and
software layer 600 includes hardware and software components. Examples of hardware components include mainframes, in one example IBM® zSeries® systems; RISC (Reduced Instruction Set Computer) architecture based servers, in one example IBM pSeries® systems; IBM xSeries® systems; IBM BladeCenter® systems; storage devices; networks and networking components. Examples of software components include network application server software, in one example IBM WebSphere® application server software; and database software, in one example IBM DB2® database software. (IBM, zSeries, pSeries, xSeries, BladeCenter, WebSphere, and DB2 are trademarks of International Business Machines Corporation registered in many jurisdictions worldwide). -
Virtualization layer 602 provides an abstraction layer from which the following examples of virtual entities may be provided: virtual servers; virtual storage; virtual networks, including virtual private networks; virtual applications and operating systems; and virtual clients. In one example, management layer 604 may provide the functions described below. Resource provisioning provides dynamic procurement of computing resources and other resources that are utilized to perform tasks within the cloud computing environment. Metering and Pricing provide cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may comprise application software licenses. Security provides identity verification for cloud consumers and tasks, as well as protection for data and other resources. User portal provides access to the cloud computing environment for consumers and system administrators. Service level management provides cloud computing resource allocation and management such that required service levels are met. Service Level Agreement (SLA) planning and fulfillment provide pre-arrangement for, and procurement of, cloud computing resources for which a future requirement is anticipated in accordance with an SLA. -
Workloads layer 606 provides examples of functionality for which the cloud computing environment may be utilized. Examples of workloads and functions which may be provided from this layer include: mapping and navigation; software development and lifecycle management; virtual classroom education delivery; data analytics processing; transaction processing; and natural language processing. - The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
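To tie the description together, the overall flow of FIG. 4 can be sketched end to end. This is an illustrative sketch, not the patented implementation: word-frequency scoring stands in for the pointer-network selection likelihood of block 402, and `abstract_fn` and `decide_fn` are hypothetical stand-ins for the encoder-aligner-decoder (block 404) and the trained decision network (blocks 406-408).

```python
from collections import Counter

def initial_summary(sentences, k=3):
    """Block 402: score each sentence by the document-level frequency of
    its words (a crude stand-in for a learned selection likelihood) and
    copy the top-k sentences, preserving document order."""
    counts = Counter(w.lower() for s in sentences for w in s.split())
    def score(s):
        words = s.split()
        return sum(counts[w.lower()] for w in words) / max(len(words), 1)
    top = sorted(sorted(range(len(sentences)),
                        key=lambda i: score(sentences[i]),
                        reverse=True)[:k])
    return [sentences[i] for i in top]

def edit_summary(extracted, abstract_fn, decide_fn):
    """Blocks 404-410: for each extracted sentence, generate an abstracted
    paraphrase, ask the decision function which version (if either) to
    keep, and update the running summary accordingly."""
    summary = []
    for sent in extracted:
        abstracted = abstract_fn(sent)                   # block 404
        decision = decide_fn(sent, abstracted, summary)  # blocks 406-408
        if decision == "extract":                        # block 410
            summary.append(sent)
        elif decision == "abstract":
            summary.append(abstracted)
        # "discard": leave the summary unchanged
    return summary
```

In the patented system, `decide_fn` would encode the extracted sentence, the abstracted sentence, the whole document, and the running summary into vector representations (block 406) before the decision network produces the editing decision; those steps are collapsed into a single callable here.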
Claims (23)
1. A system for generating a summary of a text document, the system comprising:
a processor to:
generate an initial summary of an original document, the initial summary comprising a selection of extracted sentences copied from the original document; and
for each extracted sentence of the initial summary:
process the extracted sentence to generate an abstracted sentence;
generate vector representations of the extracted sentence, the abstracted sentence, the original document, and the current summary;
input the vector representations to a decision network to compute an editing decision, wherein the editing decision is selected from a group of possible decisions comprising a decision to add the extracted sentence and a decision to add the abstracted sentence, and wherein the decision network is trained according to a cross-entropy loss function, wherein the cross-entropy loss function is computed based on a soft label computed for each editing decision of a predicted summary; and
update the current summary based on the editing decision.
2. The system of claim 1, wherein the group of possible decisions comprises a decision to discard the extracted sentence and the abstracted sentence.
3. (canceled)
4. The system of claim 1, wherein the soft label is computed by computing quality scores for each editing decision, wherein the soft label is the average of all alternative quality scores included by the editing decision divided by a normalization factor.
5. The system of claim 1, wherein to generate the abstracted sentence, the processor is to apply an encoder-decoder to three consecutive extracted sentences of the initial summary.
6. The system of claim 5, wherein to generate the abstracted sentence, the processor is to bias the word attention applied by the encoder-decoder according to a selection likelihood calculated for each extracted sentence during the selection of extracted sentences for the initial summary.
7. The system of claim 1, wherein to generate the vector representation of the abstracted sentence, the processor is to insert the abstracted sentence into the original document and then encode the abstracted sentence into the vector representation.
8. The system of claim 1, wherein the processor is to update the vector representation of the current summary based on the editing decision.
9. The system of claim 1, wherein to generate the initial summary of the original document, the processor is to calculate a selection likelihood for each sentence of the original document using a pointer network.
10. A method of generating a summary of a text document, the method comprising:
generating an initial summary of an original document, the initial summary comprising a selection of extracted sentences copied from the original document; and
for each extracted sentence of the initial summary:
processing the extracted sentence to generate an abstracted sentence;
generating vector representations of the extracted sentence, the abstracted sentence, the original document, and the current summary;
inputting the vector representations to a decision network to compute an editing decision, wherein the editing decision is selected from a group of possible decisions comprising a decision to add the extracted sentence and a decision to add the abstracted sentence; and
updating the current summary based on the editing decision and updating the vector representation of the current summary based on the editing decision.
11. The method of claim 10, wherein the group of possible decisions comprises a decision to discard the extracted sentence and the abstracted sentence.
12. The method of claim 10, wherein the decision network is trained according to a cross-entropy loss function, wherein the cross-entropy loss function is computed based on a soft label computed for each editing decision of a predicted summary.
13. The method of claim 12, wherein the soft label is computed by computing quality scores for each editing decision, wherein the soft label is the average of all alternative quality scores included by the editing decision divided by a normalization factor.
14. The method of claim 10, wherein generating the abstracted sentence comprises applying an encoder-decoder to three consecutive extracted sentences of the initial summary.
15. The method of claim 14, wherein generating the abstracted sentence further comprises biasing the word attention applied by the encoder-decoder according to a selection likelihood calculated for each extracted sentence during the selection of extracted sentences for the initial summary.
16. The method of claim 10, wherein generating the vector representation of the abstracted sentence comprises inserting the abstracted sentence into the original document and then encoding the abstracted sentence into the vector representation.
17. (canceled)
18. The method of claim 10, wherein generating the initial summary of the original document comprises calculating a selection likelihood for each sentence of the original document using a pointer network.
19. A computer program product for generating a summary of a text document comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, and wherein the program instructions are executable by a processor to cause the processor to:
generate an initial summary of an original document, the initial summary comprising a selection of extracted sentences copied from the original document; and
for each extracted sentence of the initial summary:
process the extracted sentence to generate an abstracted sentence;
generate vector representations of the extracted sentence, the abstracted sentence, the original document, and the current summary;
input the vector representations to a decision network to compute an editing decision, wherein the editing decision is selected from a group of possible decisions comprising a decision to add the extracted sentence and a decision to add the abstracted sentence, wherein the decision network is trained according to a cross-entropy loss function computed based on a soft label computed for each editing decision of a predicted summary, wherein the soft label is computed by computing quality scores for each editing decision, and wherein the soft label is the average of all alternative quality scores included by the editing decision divided by a normalization factor; and
update the current summary based on the editing decision.
20. (canceled)
21. The computer program product of claim 19, wherein generating the abstracted sentence comprises applying an encoder-decoder to three consecutive extracted sentences of the initial summary.
22. The computer program product of claim 19, wherein generating the abstracted sentence further comprises biasing the word attention applied by the encoder-decoder according to a selection likelihood calculated for each extracted sentence during the selection of extracted sentences for the initial summary.
23. The computer program product of claim 19, wherein generating the vector representation of the abstracted sentence comprises inserting the abstracted sentence into the original document and then encoding the abstracted sentence into the vector representation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/531,151 US10902191B1 (en) | 2019-08-05 | 2019-08-05 | Natural language processing techniques for generating a document summary |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/531,151 US10902191B1 (en) | 2019-08-05 | 2019-08-05 | Natural language processing techniques for generating a document summary |
Publications (2)
Publication Number | Publication Date |
---|---|
US10902191B1 US10902191B1 (en) | 2021-01-26 |
US20210042383A1 true US20210042383A1 (en) | 2021-02-11 |
Family
ID=74191056
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/531,151 Active US10902191B1 (en) | 2019-08-05 | 2019-08-05 | Natural language processing techniques for generating a document summary |
Country Status (1)
Country | Link |
---|---|
US (1) | US10902191B1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11423206B2 (en) * | 2020-11-05 | 2022-08-23 | Adobe Inc. | Text style and emphasis suggestions |
US11443538B2 (en) * | 2019-10-16 | 2022-09-13 | Tata Consultancy Services Limited | System and method for machine assisted documentation in medical writing |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210073480A1 (en) * | 2019-09-05 | 2021-03-11 | Netflix, Inc. | Automatic preprocessing for black box translation |
CN113673241B (en) * | 2021-08-03 | 2024-04-09 | 之江实验室 | Text abstract generation framework system and method based on example learning |
CN113626584A (en) * | 2021-08-12 | 2021-11-09 | 中电积至(海南)信息技术有限公司 | Automatic text abstract generation method, system, computer equipment and storage medium |
CN113836295A (en) * | 2021-09-28 | 2021-12-24 | 平安科技(深圳)有限公司 | Text abstract extraction method, system, terminal and storage medium |
US11620320B1 (en) * | 2021-10-13 | 2023-04-04 | Dell Products L.P. | Document summarization through iterative filtering of unstructured text data of documents |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
NO316480B1 (en) * | 2001-11-15 | 2004-01-26 | Forinnova As | Method and system for textual examination and discovery |
US7698339B2 (en) * | 2004-08-13 | 2010-04-13 | Microsoft Corporation | Method and system for summarizing a document |
JP2011501258A (en) * | 2007-10-10 | 2011-01-06 | アイティーアイ・スコットランド・リミテッド | Information extraction apparatus and method |
US11074495B2 (en) * | 2013-02-28 | 2021-07-27 | Z Advanced Computing, Inc. (Zac) | System and method for extremely efficient image and pattern recognition and artificial intelligence platform |
US9483730B2 (en) * | 2012-12-07 | 2016-11-01 | At&T Intellectual Property I, L.P. | Hybrid review synthesis |
US20180018392A1 (en) * | 2015-04-29 | 2018-01-18 | Hewlett-Packard Development Company, L.P. | Topic identification based on functional summarization |
US10810240B2 (en) * | 2015-11-06 | 2020-10-20 | RedShred LLC | Automatically assessing structured data for decision making |
US10762283B2 (en) * | 2015-11-20 | 2020-09-01 | Adobe Inc. | Multimedia document summarization |
CN108280112B (en) * | 2017-06-22 | 2021-05-28 | 腾讯科技(深圳)有限公司 | Abstract generation method and device and computer equipment |
CN109783795B (en) * | 2017-11-14 | 2022-05-06 | 深圳市腾讯计算机系统有限公司 | Method, device and equipment for obtaining abstract and computer readable storage medium |
CN108090049B (en) * | 2018-01-17 | 2021-02-05 | 山东工商学院 | Multi-document abstract automatic extraction method and system based on sentence vectors |
US10303771B1 (en) * | 2018-02-14 | 2019-05-28 | Capital One Services, Llc | Utilizing machine learning models to identify insights in a document |
CA3042921A1 (en) * | 2018-05-10 | 2019-11-10 | Royal Bank Of Canada | Machine natural language processing for summarization and sentiment analysis |
US10803253B2 (en) * | 2018-06-30 | 2020-10-13 | Wipro Limited | Method and device for extracting point of interest from natural language sentences |
US10977291B2 (en) * | 2018-08-03 | 2021-04-13 | Intuit Inc. | Automated document extraction and classification |
US10831806B2 (en) * | 2018-10-29 | 2020-11-10 | International Business Machines Corporation | Query-based extractive summarization |
KR102540774B1 (en) * | 2018-12-04 | 2023-06-08 | 한국전자통신연구원 | Sentence embedding method and apparatus using subword embedding and skip-thought model |
- 2019-08-05: US application US16/531,151 filed; issued as patent US10902191B1 (status: Active)
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11443538B2 (en) * | 2019-10-16 | 2022-09-13 | Tata Consultancy Services Limited | System and method for machine assisted documentation in medical writing |
US11423206B2 (en) * | 2020-11-05 | 2022-08-23 | Adobe Inc. | Text style and emphasis suggestions |
US11928418B2 (en) | 2020-11-05 | 2024-03-12 | Adobe Inc. | Text style and emphasis suggestions |
Also Published As
Publication number | Publication date |
---|---|
US10902191B1 (en) | 2021-01-26 |
Similar Documents
Publication | Title |
---|---|
US10902191B1 (en) | Natural language processing techniques for generating a document summary |
US11734509B2 (en) | Controllable style-based text transformation | |
US11093707B2 (en) | Adversarial training data augmentation data for text classifiers | |
US20210050014A1 (en) | Generating dialogue responses utilizing an independent context-dependent additive recurrent neural network | |
US11455473B2 (en) | Vector representation based on context | |
US20210303803A1 (en) | Text style transfer using reinforcement learning | |
US10902208B2 (en) | Personalized interactive semantic parsing using a graph-to-sequence model | |
US11151324B2 (en) | Generating completed responses via primal networks trained with dual networks | |
US11775839B2 (en) | Frequently asked questions and document retrieval using bidirectional encoder representations from transformers (BERT) model trained on generated paraphrases | |
US11189269B2 (en) | Adversarial training data augmentation for generating related responses | |
US20220027707A1 (en) | Subgraph guided knowledge graph question generation | |
US20200293775A1 (en) | Data labeling for deep-learning models | |
US11132513B2 (en) | Attention-based natural language processing | |
US20200110797A1 (en) | Unsupervised text style transfer system for improved online social media experience | |
US11176333B2 (en) | Generation of sentence representation | |
US11281867B2 (en) | Performing multi-objective tasks via primal networks trained with dual networks | |
CN115310408A (en) | Transformer based encoding in conjunction with metadata | |
US11450111B2 (en) | Deterministic learning video scene detection | |
US11663412B2 (en) | Relation extraction exploiting full dependency forests | |
US20220405473A1 (en) | Machine learning for training nlp agent | |
US20230267342A1 (en) | Iterative answer and supplemental information extraction for machine reading comprehension | |
US20230169389A1 (en) | Domain adaptation | |
US20220245460A1 (en) | Adaptive self-adversarial negative sampling for graph neural network training | |
JP2023002475A (en) | Computer system, computer program and computer-implemented method (causal knowledge identification and extraction) | |
US11182155B2 (en) | Defect description generation for a software product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:FEIGENBLAT, GUY;KONOPNICKI, DAVID;MOROSHKO, EDWARD;AND OTHERS;SIGNING DATES FROM 20190730 TO 20190731;REEL/FRAME:049952/0847 |
|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |