CN111274792A

CN111274792A - Method and system for generating abstract of text

Info

Publication number: CN111274792A
Application number: CN202010065621.3A
Authority: CN
Inventors: 王欣晟; 周继恩; 陆堃彪
Original assignee: China Unionpay Co Ltd
Current assignee: China Unionpay Co Ltd
Priority date: 2020-01-20
Filing date: 2020-01-20
Publication date: 2020-06-12
Anticipated expiration: 2040-01-20
Also published as: CN111274792B

Abstract

The invention provides a method for generating an abstract of a text, which comprises the following steps: preprocessing the text; marking parts of speech of words in the text and determining a grammatical structure existing in the text; determining a category of semantic roles for the word based on the part of speech and the grammar structure; extracting the abstract of the text from the clauses of the text according to a preset algorithm; and adjusting the summary.

Description

Method and system for generating abstract of text

Technical Field

The invention relates to the field of text processing, and in particular to a method and a system for generating an abstract of a text.

Background

Currently, there are a number of methods for extracting a summary of a text. One example is extraction based on simple statistics: the method scores each sentence in the text according to the frequency of occurrence of the individual words in the sentence, ranks the sentences by importance, and takes the most important sentences as the summary of the text.

However, this type of method has the disadvantage that the minimum unit of the extracted summary is one sentence. In Chinese, a sentence may be a compound sentence formed of a plurality of clauses connected by commas, enumeration commas, or semicolons. Thus, the sentences selected as the summary may still be very long and still take a lot of time to read.

Disclosure of Invention

One aspect of the present invention provides a method for generating a summary of a text, comprising: preprocessing the text; marking parts of speech of words in the text and determining a grammatical structure existing in the text; determining a category of semantic roles for the word based on the part of speech and the grammar structure; extracting the abstract of the text from the clauses of the text according to a preset algorithm; and adjusting the summary.

Another aspect of the present invention provides a method for generating a summary of a text, including: preprocessing the text; extracting a first abstract of the text from sentences of the text according to a preset algorithm; marking the part of speech of the words in the first abstract and determining a grammatical structure existing in the first abstract; determining a category of semantic roles for the word based on the part of speech and the grammar structure; extracting a second abstract of the first abstract from the clauses of the first abstract according to a preset algorithm; and adjusting the second summary.

Yet another aspect of the present invention provides a system for generating a summary of a text, comprising: a text pre-processing system for pre-processing the text; a part-of-speech tagging and syntactic structure analysis system for tagging parts of speech of words in the text and determining syntactic structures present in the text; a semantic role tagging system for determining a category of a semantic role for the word based on the part of speech and the syntactic structure; a single sentence abstract extraction system for extracting the abstract of the text from the clauses of the text according to a preset algorithm; and an abstract adjustment system for adjusting the abstract.

Yet another aspect of the present invention provides a system for generating a summary of a text, comprising: a text pre-processing system for pre-processing the text; a single sentence abstract extraction system for extracting a first abstract of the text from sentences of the text according to a predetermined algorithm; a part-of-speech tagging and syntactic structure analysis system for tagging parts of speech of words in the first abstract and determining syntactic structures existing in the first abstract; a semantic role tagging system for determining a category of a semantic role for the word based on the part of speech and the syntactic structure; a single sentence abstract extraction system for extracting a second abstract of the first abstract from the clauses of the first abstract according to the predetermined algorithm; and an abstract adjustment system for adjusting the second abstract.

The present invention also provides a computer readable medium having stored thereon computer readable instructions which, when executed by a computer, are capable of performing a method according to embodiments of the present invention.

Embodiments of the invention can filter out the non-text information in rich text in HTML format, extract the abstract of the text by treating periods, question marks, exclamation marks and the like as sentence boundaries, and, by labeling semantic roles, extract the main body of each clause in a compound sentence as the abstract (the main body being the words in the clause whose semantic role categories are subject, object, time and place). Embodiments of the present invention can also extract a further abstract from a single-sentence abstract and optimize that further abstract.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which the same or corresponding reference numerals indicate the same or corresponding parts.

FIG. 1 shows a general schematic of a system for summarization according to an embodiment of the invention.

FIG. 2 illustrates a schematic diagram of determining related words from predicates according to an embodiment of the invention.

Detailed Description

The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

The following detailed description of embodiments of the invention refers to the accompanying drawings.

FIG. 1 shows a general schematic of a system for summarization according to an embodiment of the invention. As shown in the figure, the system comprises a text preprocessing system, a single sentence abstract extracting system, a part-of-speech tagging and syntactic structure analyzing system, a semantic role tagging system and an abstract generating system. Each system will be described separately below.

Text preprocessing system

The text pre-processing system may implement the following steps:

(1) Extracting the plain text.

The text pre-processing system may use regular expressions to extract plain text content from rich text in HTML format. In some embodiments, since multimedia information such as pictures, attachments and the like is included in the rich text in the HTML format, regular expressions can be used to filter out tags in the rich text, so as to extract plain text information therein. In some embodiments, this step may not be performed if the text to be abstracted is already plain text.
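The tag-filtering step can be sketched as follows. This is a minimal illustration using Python's standard `re` and `html` modules; the function name `extract_plain_text` and the exact patterns are assumptions, since the text only states that regular expressions are used.

```python
import html
import re

# Hypothetical patterns: the source only says "regular expressions" are used.
_SCRIPT_STYLE = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>",
                           re.IGNORECASE | re.DOTALL)
_TAG = re.compile(r"<[^>]+>")
_SPACES = re.compile(r"\s+")


def extract_plain_text(rich_text: str) -> str:
    """Strip script/style blocks and all remaining tags from HTML rich text."""
    text = _SCRIPT_STYLE.sub("", rich_text)  # drop embedded code and styles
    text = _TAG.sub("", text)                # drop tags such as <img>, <p>, <a>
    text = html.unescape(text)               # turn &amp; and similar back into characters
    return _SPACES.sub(" ", text).strip()    # collapse leftover whitespace


sample = '<div><p>中国银联推出ETC发行平台。</p><img src="x.png"><script>var a = 1;</script></div>'
plain = extract_plain_text(sample)
```

Tags are replaced with an empty string rather than a space, which suits Chinese text where words are not separated by spaces; for space-delimited languages a space may be the better replacement.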

(2) Segmenting the text.

Unlike English, which naturally separates words with spaces, Chinese has no similar symbol to delimit words, so the text must be segmented into words for the subsequent steps. In some embodiments, the text pre-processing system may use the jieba tokenizer to segment the text.

(3) Removing stop words.

There are many stop words in Chinese, for example frequently used particles. These stop words generally carry little semantic weight and may degrade the quality of the generated summary. Stop words may therefore be removed before the abstract is extracted.
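Steps (2) and (3) can be sketched together as below. The text names jieba for segmentation; to stay dependency-free, this sketch substitutes a simple forward maximum matching segmenter, which is not how jieba works internally. The vocabulary, the stop word list and both function names are illustrative assumptions.

```python
def segment(text, vocabulary, max_len=4):
    """Forward maximum matching: take the longest dictionary word at each position.

    A stand-in for a real tokenizer such as jieba (assumption for illustration).
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in vocabulary:
                tokens.append(piece)
                i += size
                break
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop tokens that appear in the stop word list."""
    return [token for token in tokens if token not in stop_words]


# Hypothetical vocabulary and stop word list.
vocabulary = {"中国", "银联", "推出", "发行", "平台"}
stop_words = {"了", "的"}

tokens = segment("中国银联推出了发行平台", vocabulary)
content_tokens = remove_stop_words(tokens, stop_words)
```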

Single sentence abstract extraction system

The single sentence abstract extraction system may, for a piece of text that includes a plurality of sentences (sentences are also referred to as "single sentences"), extract the single sentence with the highest weight value and take it as the abstract of the text. In some embodiments, the single sentence may be extracted with the existing TextRank algorithm, whose specific steps are as follows:

(1) From the individual sentences and the results of the word segmentation, a graph G = (V, E) can be constructed, consisting of a set of vertices V and a set of edges E, E being a subset of V × V. For text, each vertex in V is a sentence delimited by a period, question mark or exclamation mark. For example, V_i represents the i-th sentence in a paragraph, and W(V_i) represents the weight of the i-th sentence, and the specific calculation formula is as follows:

where sentences represents the number of clauses in a paragraph. d is a damping factor between 0 and 1, which represents the probability of jumping from a given vertex to another random vertex in the graph, and is typically set to 0.85. w_ji indicates how similar the j-th sentence is to the i-th sentence.
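For reference, the weighted update rule of the published TextRank algorithm can be written with the same symbols as below. This is the standard form from the TextRank literature, not a reproduction of the formula in this document; the sets In(V_i) (vertices with an edge into V_i) and Out(V_j) (vertices reached by an edge from V_j) are not named in the text and are introduced here as assumptions.

```latex
W(V_i) = (1 - d) + d \sum_{V_j \in \mathrm{In}(V_i)}
         \frac{w_{ji}}{\sum_{V_k \in \mathrm{Out}(V_j)} w_{jk}} \, W(V_j)
```

A common variant divides the first term by the number of sentences, giving (1 - d) / sentences, so that the weights sum to 1. Whether the formula intended here uses that normalization is an assumption.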

TextRank computes the weight of each sentence as follows:

1) Initializing a weight for each sentence.

2) Obtaining the final weight of each sentence through iterative calculation.

The TextRank algorithm finds the single sentence with the highest weight in the text, and the found single sentence is taken as the abstract. In some embodiments, if the text includes multiple single sentences, the found single sentence may be used as the abstract. In some embodiments, if the text includes only one single sentence, the single sentence abstract extraction system can be omitted, and the part-of-speech tagging and syntactic structure analysis system and the semantic role labeling system can be applied to the text directly.

In other embodiments, other algorithms such as Luhn, Edmundson, LSA, LexRank, or TF-IDF may be used to extract the summary from the text.
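The initialize-then-iterate procedure can be sketched as follows. This is a minimal illustration, not the implementation used in the embodiments: the word-overlap similarity, the (1 - d) / n normalization, the convergence tolerance and all function names are assumptions.

```python
import math


def similarity(a, b):
    """Word-overlap similarity between two token lists (assumed measure)."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # avoids log(1) + log(1) = 0 in the denominator
    overlap = len(set(a) & set(b))
    return overlap / (math.log(len(a)) + math.log(len(b)))


def textrank(sentences, d=0.85, max_iter=100, tol=1e-6):
    """Return one weight per sentence; each sentence is a list of tokens."""
    n = len(sentences)
    if n == 0:
        return []
    # w[j][i] is the similarity of sentence j to sentence i (no self-loops).
    w = [[0.0 if i == j else similarity(sentences[j], sentences[i])
          for i in range(n)] for j in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0 / n] * n  # step 1: initial weight for each sentence
    for _ in range(max_iter):  # step 2: iterate until the weights settle
        new_scores = []
        for i in range(n):
            rank = sum(w[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if out_sum[j] > 0)
            new_scores.append((1 - d) / n + d * rank)
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def best_sentence(sentences):
    """Index of the sentence with the highest weight."""
    scores = textrank(sentences)
    return max(range(len(scores)), key=scores.__getitem__)


toy = [["etc", "platform", "launch"],
       ["etc", "platform", "card"],
       ["etc", "card", "toll"],
       ["weather", "is", "nice"]]
top = best_sentence(toy)
```

In the toy input, the second sentence shares the most words with the others and is selected; the unrelated fourth sentence keeps only the baseline weight.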

Part-of-speech tagging and syntactic structure analysis system

After the step of segmenting the text, or after the step of extracting a single sentence using the single sentence abstract extraction system, the part-of-speech tagging and syntactic structure analysis system may determine the part of speech of each word based on existing methods such as conditional random fields, and then determine the syntactic structure of the text from the obtained parts of speech using existing dependency parsing methods. An exemplary syntactic structure is shown in Table 1 below.

TABLE 1

Semantic role labeling system

The semantic role labeling system can use a machine learning model to find the predicates in the single sentence based on the parts of speech and the grammatical structure, classify all words related to a predicate in each grammatical structure of the single sentence, and analyze which semantic roles each word may belong to. The machine learning model can also score each word for each semantic role it may belong to, set a threshold for each semantic role, and assign a semantic role to the words whose scores exceed its threshold. In some embodiments, the machine learning model may be a classification model. In some embodiments, the classification model may be a maximum entropy model (also known as a maximum entropy classifier; see, for example, the article "Semantic Role Labeling Based on Maximum Entropy Classifiers" published in the Journal of Software). In other embodiments, other classification models may be used instead of the maximum entropy model to achieve the same effect.

In some embodiments, the semantic role annotation system may implement the following steps:

(1) Based on the parts of speech, the verbs in each grammatical structure of a single sentence are taken as predicates.

(2) Related words are determined from the predicates. For example, a word connected to the predicate by a grammatical structure may be determined as a related word. FIG. 2 shows an example of determining related words based on a predicate, in which the predicate is "punishment" and the words connected to the predicate by a grammatical structure are "party center", "want", "insistence" and "crime".

(3) The actual semantic roles of the related words are determined based on the maximum entropy model.

As described above, step (3) may include classifying the related words to determine which semantic roles they may belong to. Generally, each semantic role corresponds to certain parts of speech; for example, A0 (which usually represents the actor of the action, i.e., the subject) corresponds to a noun or pronoun and cannot be a verb.

In addition, step (3) may further include scoring each related word and assigning a semantic role to the related words whose scores exceed a preset threshold.
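The three steps can be sketched as below, under several assumptions: the part-of-speech tags ("n", "v", "r", "nt", "ns"), the role-to-tag table, the thresholds and the `toy_score` function are all hypothetical. In particular, `toy_score` is a position-based stand-in for the trained maximum entropy classifier, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    pos: str   # assumed tag set: "n" noun, "v" verb, "r" pronoun, "nt" time, "ns" place
    head: int  # index of the syntactic head, -1 for the root


# Hypothetical constraints: each role only accepts certain parts of speech.
ALLOWED_POS = {"A0": {"n", "r"}, "A1": {"n", "r"}, "TMP": {"nt"}, "LOC": {"ns"}}
THRESHOLDS = {"A0": 0.5, "A1": 0.5, "TMP": 0.5, "LOC": 0.5}


def find_predicates(tokens):
    """Step (1): verbs are taken as predicates."""
    return [i for i, token in enumerate(tokens) if token.pos == "v"]


def related_words(tokens, predicate):
    """Step (2): words attached to the predicate in the dependency structure."""
    return [i for i, token in enumerate(tokens) if token.head == predicate]


def label_roles(tokens, predicate, score):
    """Step (3): keep the best-scoring role above its threshold for each word."""
    roles = {}
    for i in related_words(tokens, predicate):
        candidates = [(score(tokens, predicate, i, role), role)
                      for role, allowed in ALLOWED_POS.items()
                      if tokens[i].pos in allowed]
        candidates = [(s, r) for s, r in candidates if s > THRESHOLDS[r]]
        if candidates:
            roles[i] = max(candidates)[1]
    return roles


def toy_score(tokens, predicate, index, role):
    """Stand-in scorer: nouns before the predicate look like A0, after it like A1."""
    if role == "A0":
        return 0.9 if index < predicate else 0.1
    if role == "A1":
        return 0.9 if index > predicate else 0.1
    return 0.8


clause = [Token("China UnionPay", "n", 2), Token("June 2019", "nt", 2),
          Token("launched", "v", -1), Token("platform", "n", 2)]
predicates = find_predicates(clause)
roles = label_roles(clause, predicates[0], toy_score)
```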

Semantic roles may include, but are not limited to: A0 (typically the actor of the action, i.e., the subject), A1 (typically the entity affected by the action, i.e., the object), ADV (adverbial), TMP (time), LOC (location), MNR (manner), BNE (beneficiary), CND (condition), DIR (direction), DGR (degree), EXT (extent), FRQ (frequency), PRP (purpose or cause), etc.

Abstract generation system

The summary generation system may perform the following steps:

(1) In the single sentence extracted as the abstract (the case where the text includes a plurality of single sentences), or in the single sentence itself (the case where the text is a single sentence), separators such as commas and semicolons are changed to periods, so that each clause of the single sentence becomes an individual sentence.

(2) The TextRank algorithm is used to extract an abstract from all the individual sentences (i.e., using the single sentence abstract extraction system described above).

(3) The extracted abstract is adjusted according to certain rules.
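Step (1) of this list can be sketched as follows. The separator set (full-width and ASCII commas and semicolons) and the function name are assumptions. The enumeration comma "、" is deliberately left alone, since a list such as "北京、上海" would otherwise be broken into fragments. The resulting individual sentences can then be ranked with whichever sentence-ranking routine is used for step (2).

```python
import re

# Assumed separator set: commas and semicolons, full-width and ASCII.
_CLAUSE_SEPARATORS = re.compile(r"[，,；;]")
_SENTENCE_ENDS = re.compile(r"[。！？!?]")


def split_into_individual_sentences(single_sentence: str):
    """Turn clause separators into periods, then split into individual sentences."""
    as_periods = _CLAUSE_SEPARATORS.sub("。", single_sentence)
    parts = _SENTENCE_ENDS.split(as_periods)
    return [part.strip() for part in parts if part.strip()]


compound = "中国银联自2019年6月推出ETC发行平台，截至今年10月，该平台已与北京、上海的ETC发行方对接；车主可快速申请ETC卡。"
clauses = split_into_individual_sentences(compound)
```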

Experiments show that the abstract extracted according to the step (2) can have the following problems:

(a) The single sentence originally extracted as the abstract, or the single sentence itself, has a definite subject, but the abstract extracted in this step uses a pronoun to refer to that subject;

(b) The single sentence originally extracted as the abstract, or the single sentence itself, has a definite subject, but the subject is omitted in the abstract extracted in this step;

(c) The single sentence originally extracted as the abstract, or the single sentence itself, contains time words or place words, but the abstract extracted in this step lacks them.

Therefore, the method of the present invention can also adjust the summary. For example, semantic roles are added to the abstracts extracted at this step based on the following rules:

(a) If the first word of the abstract extracted in this step is a pronoun (e.g., "he", "she", "they", "it", "this", etc.), the pronoun is replaced with the semantic role A1 of the atomic sentence preceding the abstract (i.e., the individual sentence immediately before the current one);

(b) If the first word of the abstract extracted in this step is not a noun or pronoun, the semantic role A0 of the preceding atomic sentence is added to the beginning of the abstract extracted in this step (if the A0 of the preceding atomic sentence is empty, the search continues backward through earlier atomic sentences until an A0 is found);

(c) If the abstract extracted in this step does not contain a time word (TMP) and/or a location word (LOC), while the single sentence originally extracted as the abstract, or the single sentence itself, contains one, that TMP and/or LOC is extracted and added to the beginning of the abstract extracted in this step.
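Rules (a) to (c) can be sketched as below. The `Clause` structure, the part-of-speech tags and the function names are hypothetical. As a simplifying assumption, every lookup walks backward through the preceding individual sentences and takes the nearest filled role, and a missing TMP or LOC is placed before the subject.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Clause:
    words: List[str]
    first_pos: str             # tag of the first word: "n" noun, "r" pronoun, other
    a0: Optional[str] = None   # subject
    a1: Optional[str] = None   # object
    tmp: Optional[str] = None  # time
    loc: Optional[str] = None  # location


def nearest(clauses, index, role):
    """Walk backward from the clause before `index` until `role` is filled."""
    for clause in reversed(clauses[:index]):
        value = getattr(clause, role)
        if value:
            return value
    return None


def adjust(clauses, index):
    """Apply rules (a)-(c) to the clause chosen as the abstract."""
    chosen = clauses[index]
    words = list(chosen.words)
    if chosen.first_pos == "r":
        # Rule (a): replace a leading pronoun with the nearest preceding A1.
        obj = nearest(clauses, index, "a1")
        if obj:
            words[0] = obj
    elif chosen.first_pos != "n":
        # Rule (b): prepend the nearest preceding A0 when the subject is missing.
        subject = nearest(clauses, index, "a0")
        if subject:
            words.insert(0, subject)
    # Rule (c): prepend a missing TMP or LOC found in an earlier clause.
    prefix = []
    for role in ("tmp", "loc"):
        if getattr(chosen, role) is None:
            value = nearest(clauses, index, role)
            if value:
                prefix.append(value)
    return prefix + words


example = [
    Clause(["China UnionPay", "launched", "the ETC issuing platform"], "n",
           a0="China UnionPay", a1="platform", tmp="June 2019"),
    Clause(["as of", "October this year"], "p", tmp="October this year"),
    Clause(["has been connected with ETC issuers in 8 provinces and cities"], "d",
           loc="Beijing"),
]
adjusted = adjust(example, 2)
```

With the third clause chosen as the abstract, the sketch prepends the nearest time word and the nearest subject, giving an abstract that states when and by whom the action was performed.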

The overall scheme of the invention will be described below as an embodiment with reference to a piece of text.

The text is as follows:

"China UnionPay launched the ETC issuing platform in June 2019, and as of October this year, the platform had been connected with ETC issuers in the 8 provinces and cities of Beijing, Shanghai, Guangdong, Henan, Zhejiang, Jilin, Hubei and Tianjin, vehicle owners in the above areas can quickly apply for the ETC card provided by the local issuer through the UnionPay ETC issuing platform, and can enjoy non-inductive payment of highway tolls after binding a designated UnionPay card for deduction. In addition to the above areas, UnionPay is also cooperating and connecting with expressways in multiple places, and related functions will subsequently go online, so as to actively respond to the specific requirements of the "Implementation Plan for Deepening the Reform of the Toll Road System and Canceling Expressway Provincial Boundary Toll Stations" issued by the General Office of the State Council, and to provide convenient ETC application and payment services for vehicle owners.

The UnionPay ETC issuing platform supports users in initiating applications through various online and offline channels, and users in Beijing, Shanghai, Guangdong, Henan, Zhejiang, Jilin, Hubei and other areas can open the Cloud Flash APP to apply online. Taking the application for the Yuetong card in Guangdong as an example, a vehicle owner goes from the "Preferential" home page of the Cloud Flash APP to more "Life Choice" options, enters the "Intelligent Passing" ETC service, quickly enters the vehicle information and binds a designated UnionPay card, can choose on-site pickup or mailing of the OBU device after the application is completed, and can quickly pass through highway ETC lanes after self-service installation and activation. Users in the Tianjin area can bring their identity card and driving license to an offline service outlet of the Tianjin expressway network toll center to apply for ETC, which can be used after on-site installation and activation, with tolls identified and deducted automatically without stopping. On the basis of providing convenient ETC application services for users, UnionPay and its industry partners have successively launched promotional activities for non-inductive UnionPay ETC toll payment in multiple places.

Going forward, China UnionPay will continue to uphold its open and cooperative service philosophy, take "benefiting people, facilitating people and benefiting people" as its aim, further leverage its platform advantages, support the issuance and popularization of ETC, and promote rapid access in more regions. In addition, UnionPay will carry out ETC promotion and issuance with partners in fields such as parking, refueling, logistics, insurance and communications, providing more convenient travel services for users."

The weight value of each sentence is obtained according to the TextRank algorithm as follows:

[0.16841363, 0.11995843, 0.11883978, 0.13119057, 0.11465486, 0.12456722, 0.10932548, 0.11305002]

According to the weight values, the first sentence of the text, namely "China UnionPay launched the ETC issuing platform in June 2019, and as of October this year, the platform had been connected with ETC issuers in the 8 provinces and cities of Beijing, Shanghai, Guangdong, Henan, Zhejiang, Jilin, Hubei and Tianjin, vehicle owners in the above areas can quickly apply for the ETC card provided by the local issuer through the UnionPay ETC issuing platform, and can enjoy non-inductive payment of highway tolls after binding a designated UnionPay card for deduction.", may be taken as the abstract.
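The selection can be checked with a few lines of arithmetic. The eight values below are the listed weights; reading the fused fifth and sixth numbers as two separate eight-decimal values is an assumption, supported by the fact that the text has eight sentences and that the resulting weights sum to approximately 1.

```python
# Weights as listed for the eight sentences of the example text.
# Splitting "0.114654860.12456722" into two values is an assumption.
weights = [0.16841363, 0.11995843, 0.11883978, 0.13119057,
           0.11465486, 0.12456722, 0.10932548, 0.11305002]

# Index of the highest-weighted sentence (0-based).
best_index = max(range(len(weights)), key=weights.__getitem__)
total = sum(weights)
```

The highest weight belongs to index 0, i.e., the first sentence, which matches the sentence selected as the abstract.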

Labeling the semantic roles in this abstract once yields the following results:

The first clause: predicate "launch", A0 "China UnionPay", TMP "June 2019", A1 "platform".

The second clause: predicate "as of", TMP "October this year".

The third clause: predicate "connect", LOC "Beijing", LOC "Shanghai", LOC "Guangdong", LOC "Henan", LOC "Zhejiang", LOC "Jilin", LOC "Hubei", and LOC "Tianjin".

The fourth clause: A0 "vehicle owners in the above areas", MNR "through", ADV "quickly", predicate "apply", A1 "card"; A0 "local issuer", predicate "provide", A1 "card".

The fifth clause: predicate "deduct", ADV "binding a designated UnionPay card".

The sixth clause: predicate "enjoy", ADV "just", A1 "payment".

The abstract obtained by the second extraction is: has been connected with ETC issuers in the 8 provinces and cities of Beijing, Shanghai, Guangdong, Henan, Zhejiang, Jilin, Hubei and Tianjin.

"China UnionPay" is the agent (A0) nearest to the clause containing the second abstract, and "October this year" is the time (TMP) nearest to that clause, so the abstract generated after adding the semantic roles is: In October this year, China UnionPay has been connected with ETC issuers in the 8 provinces and cities of Beijing, Shanghai, Guangdong, Henan, Zhejiang, Jilin, Hubei and Tianjin.

The system, method and apparatus of the embodiments of the present invention can be implemented as pure software (e.g., a software program written in Java), as pure hardware (e.g., a dedicated ASIC chip or FPGA chip), or as a system combining software and hardware (e.g., a firmware system storing fixed code or a system with a general-purpose memory and a processor), as desired.

Moreover, while the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.

It should be noted that although several software devices/modules and sub-devices/modules implementing the above method are mentioned in the detailed description above, such a division is not mandatory. Indeed, the features and functionality of two or more of the devices described above may be embodied in one device/module according to embodiments of the invention. Conversely, the features and functions of one device/module described above may be further divided so as to be embodied by a plurality of devices/modules.

While the spirit and principles of the invention have been described with reference to several particular embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, nor does the division into aspects imply that features in these aspects cannot be combined to advantage; this division is for convenience of presentation only. The invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims

1. A method for generating a summary of text, comprising:

preprocessing the text;

marking parts of speech of words in the text and determining a grammatical structure existing in the text;

determining a category of semantic roles for the word based on the part of speech and the grammar structure;

extracting the abstract of the text from the clauses of the text according to a preset algorithm; and

adjusting the abstract.

2. The method of claim 1, wherein the step of pre-processing the text comprises:

performing word segmentation on the text to obtain the word.

3. The method of claim 2, wherein the step of pre-processing the text further comprises:

removing stop words in the text; or

extracting the text from rich text in HTML format using regular expressions.

4. The method of claim 1, wherein the step of determining a category for a semantic role for the term based on the part of speech and the syntactic structure further comprises:

taking verbs in the grammar structure as predicates based on the parts-of-speech;

determining related words related to the predicates according to the predicates; and

determining a category of semantic roles for the related terms based on a classification model.

5. The method of claim 4, wherein the step of determining a category for the semantic role of the related term based on a classification model comprises:

classifying each word in the related words according to the category of the semantic role, and determining the category to which each word can be subordinate; and

for any one of the categories, evaluating a score of each word subordinate to that category, and determining the words whose scores are higher than a threshold as having the semantic role of that category.

6. The method of claim 1, wherein the step of adjusting the summary comprises:

if the part of speech of the first word in the abstract is a pronoun, searching the word which is closest to the abstract and has the semantic role category as an object in the sentence before the abstract in the text, and replacing the first word with the word of which the semantic role category is an object; or

if the part of speech of the first word in the abstract is not a noun or a pronoun, searching the word which is closest to the abstract and has the category of the semantic role as the subject in the sentence before the abstract in the text, and adding the word which has the category of the semantic role as the subject to the beginning of the abstract.

7. The method of claim 1, wherein the step of adjusting the summary comprises:

if the words of which the category of the semantic role is time and/or place are not contained in the abstract and the words of which the category of the semantic role is time and/or place exist in the text, the words of which the category of the semantic role is time and/or place are added to the beginning of the abstract.

8. A method for generating a summary of text, comprising:

preprocessing the text;

extracting a first abstract of the text from sentences of the text according to a preset algorithm;

marking the part of speech of the words in the first abstract and determining a grammatical structure existing in the first abstract;

extracting a second abstract of the first abstract from the clauses of the first abstract according to the preset algorithm; and

adjusting the second abstract.

9. The method of claim 8, wherein the step of pre-processing the text comprises:

performing word segmentation on the text to obtain the word.

10. The method of claim 9, wherein the step of pre-processing the text further comprises:

removing stop words in the text; or

11. The method of claim 8, wherein the step of determining a category for a semantic role for the term based on the part of speech and the grammar structure further comprises:

12. The method of claim 11, wherein the step of determining a category for the semantic role of the related term based on a classification model comprises:

13. The method of claim 8, wherein the step of adjusting the second summary comprises:

if the part of speech of the first word in the second abstract is a pronoun, searching the word which is closest to the second abstract and has the semantic role category as an object in the first abstract and in a clause before the second abstract, and replacing the first word with the word of which the semantic role category is an object; or

if the part of speech of the first word in the second abstract is not a noun or a pronoun, searching the word which is closest to the second abstract and has the semantic role of the subject in the clauses before the second abstract in the first abstract, and adding the word which has the semantic role of the subject to the beginning of the second abstract.

14. The method of claim 8, wherein the step of adjusting the second summary comprises:

if the words of which the category of the semantic role is time and/or place are not contained in the second abstract and the words of which the category of the semantic role is time and/or place exist in the first abstract, the words of which the category of the semantic role is time and/or place are added to the beginning of the second abstract.

15. A system for generating a summary of text, comprising:

a text pre-processing system for pre-processing the text;

a part-of-speech tagging and syntactic structure analysis system for tagging parts-of-speech of words in the text and determining syntactic structures present in the text;

a semantic role tagging system for determining a category of a semantic role for the word based on the part of speech and the syntactic structure;

a single sentence abstract extraction system for extracting the abstract of the text from the clauses of the text according to a preset algorithm; and

an abstract adjustment system for adjusting the abstract.

16. The system of claim 15, wherein the text pre-processing system comprises:

means for segmenting the text to obtain the words.

17. The system of claim 16, wherein the text pre-processing system further comprises:

means for removing stop words in the text; or

means for extracting the text from rich text in HTML format using regular expressions.

18. The system of claim 15, wherein the semantic role annotation system comprises:

means for determining a predicate from the grammar structure based on the part-of-speech;

means for determining a related word related to the predicate from the predicate;

means for determining a category of semantic role for the related term based on a classification model.

19. The system of claim 18, wherein the means for determining the category of the semantic role for the related term based on a classification model further comprises:

a module for classifying each of the related terms by category of semantic role and determining a category to which each term can depend; and

a module for evaluating, for any one of the categories, the score of each word subordinate to that category, and determining the words whose scores are higher than a threshold as having the semantic role of that category.

20. The system of claim 15, wherein the summary adjustment system comprises:

a module for, if the part of speech of the first word in the abstract is a pronoun, searching the clauses before the abstract in the text for the word which is closest to the abstract and whose semantic role category is object, and replacing the first word with the word whose semantic role category is object; or

a module for, if the part of speech of the first word in the abstract is not a noun or a pronoun, searching the sentence before the abstract in the text for the word which is closest to the abstract and whose semantic role category is subject, and adding the word whose semantic role category is subject to the beginning of the abstract.

21. The system of claim 15, wherein the summary adjustment system further comprises:

means for adding words of semantic role category of time and/or place to the beginning of the abstract if the words of semantic role category of time and/or place are not included in the abstract and words of semantic role category of time and/or place are present in the text.

22. A system for generating a summary of text, comprising:

a text pre-processing system for pre-processing the text;

a single sentence abstract extraction system for extracting a first abstract of the text in a sentence of the text according to a predetermined algorithm;

a part-of-speech tagging and syntactic structure analysis system for tagging parts of speech of words in the first abstract and determining syntactic structures existing in the first abstract;

a single sentence abstract extraction system for extracting a second abstract of the first abstract from the clauses of the first abstract according to the predetermined algorithm; and

an abstract adjustment system for adjusting the second abstract.

23. The system of claim 22, wherein the text pre-processing system comprises:

means for segmenting the text to obtain the words.

24. The system of claim 23, wherein the text pre-processing system further comprises:

means for removing stop words in the text; or

25. The system of claim 22, wherein the semantic role annotation system comprises:

means for determining a related word related to the predicate from the predicate; and

26. The system of claim 25, wherein the means for determining the category of the semantic role for the related term based on a classification model further comprises:

27. The system of claim 22, wherein the summary adjustment system comprises:

a module for, if the part of speech of the first word in the second abstract is a pronoun, searching the clauses before the second abstract in the first abstract for the word which is closest to the second abstract and whose semantic role category is object, and replacing the first word with the word whose semantic role category is object; or

a module for, if the part of speech of the first word in the second abstract is not a noun or a pronoun, searching the clauses before the second abstract in the first abstract for the word which is closest to the second abstract and whose semantic role is subject, and adding the word whose semantic role is subject to the beginning of the second abstract.

28. The system of claim 22, wherein the summary adjustment system comprises:

means for adding words of semantic role category of time and/or place to the beginning of the second summary if the words of semantic role category of time and/or place are not included in the second summary and words of semantic role category of time and/or place are present in the first summary.

29. A computer readable medium having computer readable instructions stored thereon which, when executed by a computer, are capable of performing the method of any one of claims 1-4.