WO2016121048A1

WO2016121048A1 - Text generation device and text generation method

Info

Publication number: WO2016121048A1
Application number: PCT/JP2015/052478
Authority: WO
Inventors: 佐藤　美沙; 利昇三好; 利彦柳瀬; 芳樹丹羽; 孝介柳井
Original assignee: 株式会社日立製作所
Priority date: 2015-01-29
Filing date: 2015-01-29
Publication date: 2016-08-04

Abstract

A text generation device is provided with: (1) an input unit used to input text to be processed and theme information; (2) a replacement target expression extraction unit for extracting, as a replacement target expression, one or more from one or more unique expressions included in the text on the basis of the theme information, and specifying a keyword that expresses the theme information; (3) a candidate generation unit for generating a plurality of candidate expressions as replacement candidates for abstracting the replacement target expressions by using dictionary information accumulated in advance; (4) a first evaluation unit for outputting a first evaluation result obtained by evaluating the candidate expressions by using the dictionary information; and (5) a post-conversion text generation unit for generating post-conversion text by replacing the replacement target expression by the candidate expression which is highly valued as the first evaluation result.

Description

Sentence generating apparatus and method

The present invention relates to a sentence generation apparatus that abstracts a sentence or a text (a set of sentences) given by a user, and to a method executed by the apparatus.

In fields where new text is generated by editing existing text, such as automatic summarization, and in the field called concept-to-text generation, whose purpose is to generate text from a concept, there are methods that generate a sentence whose meaning differs from the original by replacing keywords.

In the following Patent Document 1, a recommendation sentence is generated from a sentence example by replacing a keyword. Specifically, first, a sentence example is selected based on a keyword designated by the user, and the keyword in the sentence example is associated with the input keyword. The degree of similarity between corresponding keywords is measured, and when the degree of similarity is medium, the target sentence is obtained by replacing the keyword in the sentence example with the keyword specified by the user.

Patent Document 1: JP 2001-256222 A

In a discussion, it is preferable to state concisely an assertion sentence with a certain level of abstraction that represents the content to be claimed, rather than only a specific case, because this makes the text clear and easy to understand. For example, rather than the text “We should continue to promote economic assistance in the future. Malaria is endemic every year and many people die.”, it is desirable to produce an abstracted statement such as “malaria is prevalent and many people have died”.

Therefore, an assertion sentence can be generated by replacing a specific expression in the sentence with a more abstract expression. A specific expression represents an entity, and an abstract expression represents a higher-level concept of that entity. For example, given the sentence “Malaria is endemic every year in Myanmar”, the assertion “Malaria is endemic every year in developing countries” can be generated by replacing “Myanmar” with “developing countries”.

However, an entity has multiple corresponding superordinate concepts, which differ in level and direction of abstraction. When a specific expression is replaced with a superordinate concept expression to abstract a sentence, the superordinate concept expression must be selected appropriately, taking into account whether the content of the replaced sentence is correct and whether it fits the context of the preceding and following sentences.

The technique described in Patent Document 1 concerns the generation of a recommendation sentence and cannot abstract a sentence. In that technique, the term input by the user is used as the replacement term as it is. Furthermore, when there are a plurality of replacement destination candidates, the technique does not select the replacement destination term automatically.

The present invention has been made in view of the above and provides a mechanism for automatically generating an appropriately abstracted sentence or text from a given sentence or text while maintaining the correctness of its content.

A sentence generation system which is one of the inventions for solving the above problem has the following sections.
(1) Input section used to input sentence and theme information to be processed
(2) A replacement target expression extraction unit that extracts one or more of one or more unique expressions included in the sentence based on the theme information as a replacement target expression and specifies a keyword representing the theme information
(3) A candidate generation unit that generates a plurality of candidate expressions that are replacement candidates that abstract the replacement target expression using dictionary information stored in advance.
(4) A first evaluation unit that outputs a first evaluation result obtained by evaluating the candidate expression using the dictionary information.
(5) A post-conversion sentence generation unit that generates a post-conversion sentence by replacing the replacement target expression with the candidate expression having a high evaluation in the first evaluation result

According to the present invention, a replacement target expression included in a text can be replaced with a candidate expression that is appropriate in relation to the input theme information, and a more abstract post-conversion text that is easy to understand can be generated automatically. Problems, configurations, and effects other than those described above will become apparent from the following description of embodiments.

FIG. 1 is a diagram illustrating the hardware configuration of the sentence generation device according to the first embodiment.
FIG. 2 is a diagram illustrating the functional configuration of the sentence generation device according to the first embodiment.
FIG. 3 is a diagram showing an example of the entity information table.
FIG. 4 is a diagram showing an example of the data structure of the entity information table.
FIG. 5 is a diagram showing the functional configuration of the first evaluation unit.
FIG. 6 is a diagram explaining the content of the score calculated by the first evaluation score calculation unit.
FIG. 7 is a diagram showing the functional configuration of the second evaluation unit.
FIG. 8 is a flowchart explaining the processing procedure executed by the sentence generation device according to the first embodiment.
FIG. 9 is a flowchart explaining the processing procedure executed by the first evaluation unit.
FIG. 10 is a flowchart explaining the processing procedure executed by the second evaluation unit.
FIG. 11 is a diagram showing the hardware configuration of the sentence generation device according to the second embodiment.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. The embodiments of the present invention are not limited to the forms described below, and various modifications are possible within the scope of the technical idea. In the following embodiments, the case where English or Japanese documents are processed is mainly described. However, other languages such as Chinese can be handled by the same procedure if the language-specific processing is replaced.

(1) First Embodiment
This embodiment describes a sentence generation device that receives a text consisting of one or more sentences together with text representing the theme information of that text, performs appropriate replacement, and outputs a generalized text. For example, given the keyword “malaria” and the text “We should continue to promote economic assistance in the future. Malaria is endemic and many people die in Myanmar every year”, the device replaces the specific expression “Myanmar” with the general expression “developing countries” and outputs “We should continue to promote economic assistance in the future. Malaria is endemic and many people die in developing countries every year”.

(1-1) Hardware Configuration
The sentence generation device is implemented in hardware using an ordinary computer. FIG. 1 shows an example of a specific hardware configuration. The sentence generation device includes an input device 110, an output device 120, an arithmetic device 130, a memory 140 that stores various data and programs, a storage device 150 that stores various data and programs, a network device 160 that controls communication with external devices, and a bus 170 connecting them. When only the data in the storage device 150 is used, the network device 160 is not necessary. When the device is operated remotely via a network, the input device 110 and the output device 120 can be omitted.

(1-2) Functional Block Configuration
FIG. 2 shows the functions of the program executed by the arithmetic device 130 of the sentence generation device. The input unit 210 receives the text to be replaced (which may consist of a single sentence) and the theme information specified by the user. The input device 110 (a keyboard, a mouse or other input device, a GUI screen, etc.) is used to input the text and theme information to the input unit 210. The entity extraction unit 220 performs language analysis on the input text and theme information and identifies the specific expression to be replaced as an entity. The “entity extraction unit” is also referred to as a “replacement target expression extraction unit”.

The entity information table 230 stores information on replacement destination candidates for entities. The entity information table 230 is stored as a file in the memory 140 or the storage device 150. The candidate generation unit 240 refers to the entity information table 230 and generates replacement destination candidates for each extracted entity. The first evaluation unit 250 calculates a first evaluation score for each generated candidate using the entity information table 230. The first evaluation score is calculated for each sentence. The second evaluation unit 260 calculates a second evaluation score for each candidate from the viewpoint of the entire text (a plurality of sentences). The evaluation by the second evaluation unit 260 may be limited to candidates that were rated highly by the first evaluation unit 250. The post-conversion text generation unit 270 determines the replacement destination candidate based on the first evaluation score and the second evaluation score and generates the final text using the determined candidate. When the conversion target is a single sentence, the post-conversion text generation unit 270 is also referred to as a “post-conversion sentence generation unit”. The output unit 280 presents (displays) the generated text (abstracted text) to the user through the output device 120.

(1-3) Description of Each Functional Unit
Hereinafter, the specific processing executed by each unit is described individually.

(1-3-1) Entity Extraction Unit
The entity extraction unit 220 first identifies the keyword that represents the theme, based on the input text and theme. When the theme is input as a keyword, the input is used as the keyword as it is. When the theme is input as a sentence, the keyword is identified from the expressions in that sentence. Specifically, language analysis is performed on the input theme and specific expressions are extracted; the specific expression that appears most often is taken as the keyword. Alternatively, an expression that appears in both the text and the theme is extracted and used as the keyword.

Next, the entity extraction unit 220 performs linguistic analysis on the input sentence, and extracts one or more specific expressions included in the sentence. Among the extracted specific expressions, those that are not keywords are used as specific expressions to be replaced (also referred to as “entities” or “replacement target expressions”). A specific expression that represents a date / number is an entity. There may be multiple entities in a sentence.

When the keyword “malaria” and the text “Economic assistance should continue to be promoted. Malaria is endemic every year and many people die in Myanmar” are input to the sentence generation device, “Myanmar” and “Malaria” are extracted from the text as specific expressions, and “Myanmar”, which is not a keyword, is extracted as the entity (replacement target expression).

[Entity information table]
FIG. 3 shows a conceptual diagram of the entity information table 230. The entity information table 230 is a dictionary (dictionary information) that stores one or more pairs of entities and their abstract expressions. A circle in the cell indicates that the entity in the corresponding column can take a candidate expression of the corresponding row. By referring to the entity information table 230, it is possible to examine abstract expressions that an entity can take. On the contrary, by referring to the entity information table 230, it is possible to examine entities that can take a certain abstract expression.

FIG. 4 shows an example of the data structure of the entity information table 230. As shown in FIG. 4, the entity information table 230 is a dictionary whose key is the character string of a specific expression and whose value is the entity represented by that specific expression. An entity consists of a class and candidates. The candidates field holds the multiple abstract expressions of the entity, and the class field holds the class to which the entity belongs. The class is a semantic classification such as “person name”, “location”, or “organization name”. Each entity may also have a synonym expression field so that data for the same entity is not scattered across multiple entries in the entity information table 230. Each abstract expression can be given a score according to how frequently it co-occurs with the corresponding specific expression. For “Myanmar”, candidate expressions such as “country with government”, “developing country”, “humid area”, “South Asia”, and “country” are acquired.
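The data structure described above can be sketched in Python as follows. This is only an illustration under assumptions: the names `Entity`, `entity_table`, `abstract_expressions`, and `entities_taking` are hypothetical, the numeric scores are invented, and the synonym entry is an illustrative value that does not come from FIG. 4.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """One entry of the entity information table (field names are illustrative)."""
    cls: str                                        # semantic class, e.g. "location"
    candidates: dict = field(default_factory=dict)  # abstract expression -> co-occurrence score
    synonyms: list = field(default_factory=list)    # alternative surface forms

# The table maps a surface string to the entities it may denote.
# Scores and the synonym below are made-up values for illustration.
entity_table = {
    "Myanmar": [
        Entity(
            cls="location",
            candidates={
                "country with government": 0.4,
                "developing country": 0.9,
                "humid area": 0.6,
                "South Asia": 0.3,
                "country": 0.8,
            },
            synonyms=["Burma"],
        )
    ]
}

def abstract_expressions(table, surface):
    """Return every abstract expression that an entity named `surface` can take."""
    found = []
    for entity in table.get(surface, []):
        found.extend(entity.candidates)
    return found

def entities_taking(table, expression):
    """Reverse lookup: surface strings whose entities can take `expression`."""
    return [s for s, ents in table.items()
            if any(expression in e.candidates for e in ents)]
```

The two helper functions correspond to the two directions of lookup mentioned for the table: from an entity to its abstract expressions, and from an abstract expression to the entities that can take it.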

[How to deal with ambiguity in the entity information table]
The same string may represent multiple different entities. To distinguish between them, each entity has a class field, as described above. For example, “Nokia” may denote either the Finnish city “Nokia” or “Nokia”, the telecommunications equipment manufacturer that sells mobile phones and the like. When “Nokia” denotes the Finnish city, the class is “place name” and the candidates include “city” and “Europe”. When it denotes the telecommunications equipment manufacturer, the class is “organization name” and the candidates include “telecommunications equipment manufacturer” and “company”. Storing the class and the candidates separately for each entity in this way makes it possible to distinguish the entities.
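As a minimal sketch of this disambiguation, the lookup can be keyed by the pair of surface string and class. The layout and the function names `candidates_for` and `classes_of` are assumptions made for illustration; only the Nokia values come from the example above.

```python
# Hypothetical table keyed by (surface string, class); contents follow the Nokia example.
entity_table = {
    ("Nokia", "place name"): ["city", "Europe"],
    ("Nokia", "organization name"): ["telecommunications equipment manufacturer", "company"],
}

def candidates_for(surface, cls):
    """Return the abstract expressions for the entity identified by string and class."""
    return entity_table.get((surface, cls), [])

def classes_of(surface):
    """List the classes under which a surface string is registered."""
    return sorted(c for (s, c) in entity_table if s == surface)
```

With this layout, the class obtained from specific expression recognition selects which of the two “Nokia” entries supplies the candidates.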

[How to automatically generate an entity information table]
The entity information table 230 can be created by manually assigning an entity and its abstract expressions to each specific expression. However, it is difficult to manually add abstract expressions to all of a large number of specific expressions. Therefore, a relation extraction technique is used to automatically extract entities and relation information about them from plain text, and abstract expressions are assigned from the acquired relation information.

For example, from the sentence “Nokia is a Finnish telecommunications corporation.”, relation information linking “Nokia” to “a Finnish telecommunications corporation” can be extracted. From this relation information, “a Finnish telecommunications corporation” can be stored in the entity information table 230 as an abstract expression of “Nokia”. Further, by removing the modifiers from “a Finnish telecommunications corporation”, “a corporation” can be obtained as an abstract expression with a higher level of abstraction. Alternatively, “a country which has casinos” can be obtained as an abstract expression of “Turkey” from the possessive relation in “Turkish casinos”. These processes are also realized by a program executed by the arithmetic device 130.
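The idea of deriving abstract expressions from an “X is a Y” sentence can be illustrated with a toy pattern. This is a sketch only: the regular expression, the function names, and the rule that the head noun is the last word are assumptions, and the relation extraction technique actually used is not specified at this level of detail.

```python
import re

# Toy pattern for "<X> is a/an <Y>." sentences; real relation extraction is far richer.
IS_A = re.compile(r"^(?P<entity>[A-Z]\w*) is (?P<abstract>an? [\w\s]+?)\.?$")

def extract_is_a(sentence):
    """Return (entity, abstract expression), or None if the pattern does not match."""
    m = IS_A.match(sentence.strip())
    return (m.group("entity"), m.group("abstract")) if m else None

def strip_modifiers(abstract):
    """Keep only the article and the head noun (assumed here to be the last word)."""
    words = abstract.split()
    return f"{words[0]} {words[-1]}"
```

Applied to the Nokia sentence, `extract_is_a` yields the pair of “Nokia” and “a Finnish telecommunications corporation”, and `strip_modifiers` reduces the latter to “a corporation”.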

(1-3-2) Candidate Generation Unit
The candidate generation unit 240 refers to the entity information table 230 and generates a plurality of candidate expressions that are candidates for replacing each entity. The specific expression to be replaced is itself included among the candidate expressions, so that the option of not replacing it is also preserved.
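A minimal sketch of this step, assuming a simplified table that maps an entity string directly to a list of abstract expressions (the name `generate_candidates` and the table contents are illustrative):

```python
def generate_candidates(entity, table):
    """Return replacement candidates for `entity`, keeping the entity itself as an option."""
    candidates = list(table.get(entity, []))
    if entity not in candidates:
        candidates.append(entity)  # the "do not replace" option stays available
    return candidates

# Simplified stand-in for the entity information table.
table = {"Myanmar": ["developing country", "humid area", "country"]}
```

An entity that is missing from the table yields only itself, so later stages leave it unchanged.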

(1-3-3) First Evaluation Unit
FIG. 5 shows the functional configuration of the first evaluation unit 250. The first evaluation unit 250 gives each candidate expression of an entity a first evaluation result that takes the content of the sentence into account. First, the similar case sentence search unit 251 acquires, from the sentence text data 252, a plurality of similar case sentences, each representing a case similar to the case represented by the sentence that contains the specific expression to be replaced (“entity” or “replacement target expression”). The sentence text data 252 may be text data stored in advance or text data on the Web. Similar case sentences can be acquired by using an associative search engine to search for similar sentences with a query consisting of the words of the sentence excluding the specific expression to be replaced (replacement target expression).

For each acquired similar case sentence, the similar case entity extraction unit 253 extracts the entity in the similar case sentence that corresponds to the entity in the input sentence. For example, suppose that for the input sentence “Malaria is endemic every year and many people die in Myanmar”, the similar case sentence “Malaria is endemic every year and many people die in Cambodia” is acquired. In this case, the similar case entity extraction unit 253 extracts “Cambodia” as the entity corresponding to “Myanmar”, and “Myanmar” and “Cambodia” are similar case entities. The similar case entity extraction unit 253 is also referred to as a “corresponding expression extraction unit” in this specification.
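The query construction and the correspondence between entities can be sketched as follows. This is a simplification under assumptions: words are split on whitespace, the set `known_entities` stands in for specific expression recognition, and the counterpart is taken to be the first known entity in the similar sentence that does not occur in the query. The function names are hypothetical.

```python
def build_query(sentence, entity):
    """Remove the replacement target expression from the words of the sentence."""
    return [w for w in sentence.rstrip(".").split() if w != entity]

def corresponding_entity(input_sentence, entity, similar_sentence, known_entities):
    """Pick the known entity in the similar sentence that fills the input entity's slot.

    Assumption: the similar sentence shares the query words, so a known entity
    that is not part of the query is taken as the counterpart.
    """
    query = set(build_query(input_sentence, entity))
    for word in similar_sentence.rstrip(".").split():
        if word in known_entities and word not in query:
            return word
    return None
```

For the Myanmar and Cambodia sentences above, “Malaria” is skipped because it is part of the query, and “Cambodia” is returned as the counterpart of “Myanmar”.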

The first evaluation score calculation unit 254 calculates a numerical score representing the accuracy of each replacement candidate expression of the extracted entity. The operation of the first evaluation score calculation unit 254 is described with reference to FIG. 6. Apart from its bottom row and rightmost column, the table in FIG. 6 is an excerpt of the entity information table 230. It shows the column of the replacement target entity (replacement target expression) and the rows of all candidate expressions that the replacement target entity can take. A circle in a cell indicates that the entity in that column can take the candidate expression in that row.

The bottom row of the table indicates whether the entity in that column has been extracted as a similar case entity. The rightmost column of the table shows the calculation result of the first evaluation score (first evaluation result) for each candidate expression. The first evaluation score calculation unit 254 gives the first evaluation result to each candidate expression of an entity so as to reflect two viewpoints: (1) an abstract expression shared by more similar case entities receives a higher score, and (2) an abstract expression that is not shared by entities other than similar case entities receives a higher score. Specifically, a score representing the accuracy of replacement with the abstract expression a is calculated by the following formula.

First evaluation result (a) = harmonic mean of P(a) and R(a)
where the evaluations P(a) and R(a) are given as follows.
・Evaluation P(a) = (number of similar case entities having a as an abstract expression) / (number of all entities having a as an abstract expression)
・Evaluation R(a) = (number of similar case entities having a as an abstract expression) / (number of all similar case entities)

In this example, the highest score of 1.0 is given to “developing countries”, and the next score of 0.8 is given to “humid areas”. The first evaluation score calculation unit 254 is also referred to as a “score calculation unit” in this specification.
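The score can be sketched in Python as below. The sketch assumes that R(a) is the share of all similar case entities that can take a, so that P(a) and R(a) behave like precision and recall. The function name and the table contents are illustrative: the data were chosen so that the scores equal the 1.0 and 0.8 quoted above, and they do not reproduce FIG. 6.

```python
def first_evaluation(a, table, similar_entities):
    """Harmonic mean of P(a) and R(a) for the abstract expression `a`.

    table: entity -> set of abstract expressions the entity can take.
    similar_entities: entities extracted from the similar case sentences.
    """
    similar = set(similar_entities)
    having_a = {e for e, exprs in table.items() if a in exprs}
    similar_having_a = having_a & similar
    if not similar_having_a:
        return 0.0  # also covers empty `having_a` and empty `similar`
    p = len(similar_having_a) / len(having_a)
    r = len(similar_having_a) / len(similar)
    return 2 * p * r / (p + r)

# Illustrative data (an assumption, not the contents of FIG. 6).
table = {
    "Myanmar": {"developing country", "humid area", "country"},
    "Cambodia": {"developing country", "humid area", "country"},
    "Kenya": {"developing country", "country"},
    "Japan": {"country"},
    "Norway": {"country"},
}
similar = ["Myanmar", "Cambodia", "Kenya"]
```

With these data, “developing country” is taken by exactly the three similar case entities (P = R = 1), while “humid area” is taken by only two of the three (P = 1, R = 2/3), and “country” is penalized because entities outside the similar cases also take it.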

Incidentally, the calculation method of the first evaluation result is not limited to this. For example, the similar case sentence search unit 251 may simultaneously search for sentences that deny similar cases and use them for calculating the first evaluation result. The similar case entity extraction unit 253 extracts “similar case negative entity” in which occurrence of the similar case is denied for the sentence that denies the similar case. The abstract representations that similar case negative entities can take are inappropriate when replacing the original case text. Therefore, the following formula is used by adding a case classification to the calculation formula of the first evaluation result.
・First evaluation result P(a) = (number of similar case entities having a as an abstract expression) / (number of all entities having a as an abstract expression)
However, if (number of similar case negative entities having a as an abstract expression) > 1, the first evaluation result P(a) = 0.

This makes it possible to perform abstraction that remains accurate even when there are entities for which the occurrence of the similar case is denied.
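The case split can be illustrated as follows. The function name and the data are assumptions made for illustration; the threshold is kept exactly as written in the formula above.

```python
def precision_with_negation(a, table, similar_entities, negative_entities):
    """P(a) with the case split: zero when the negative-entity condition holds."""
    having_a = {e for e, exprs in table.items() if a in exprs}
    if len(having_a & set(negative_entities)) > 1:  # threshold as written in the text
        return 0.0
    if not having_a:
        return 0.0
    return len(having_a & set(similar_entities)) / len(having_a)

# Illustrative data (an assumption, not taken from the figures).
table = {
    "Myanmar": {"developing country", "country"},
    "Cambodia": {"developing country", "country"},
    "Japan": {"country"},
    "Norway": {"country"},
}
similar = ["Myanmar", "Cambodia"]
negative = ["Japan", "Norway"]  # entities for which the similar case is denied
```

Here “country” is rejected because both negative entities can take it, whereas “developing country” keeps its score.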

[Update of entity information table by first evaluation unit]
When the entity information table 230 contains little information and there is little correspondence information between entities and candidate expressions, the first evaluation unit 250 gives a high evaluation to most candidate expressions. Therefore, when the variation of the first evaluation result P(a) calculated for each abstract expression is smaller than a predetermined threshold, the entity information table 230 is updated with reference to other text data. Specifically, a sentence is searched for in which a candidate expression that a certain entity can take co-occurs with another entity. The first evaluation unit 250 extracts the syntactic relation concerning the entity from that sentence by executing the relation extraction technique described above. This function is referred to as a “relation extraction unit” in this specification. When an appropriate relation is extracted, the first evaluation unit 250 adds to the entity information table 230 the information that the corresponding entity can take the corresponding candidate expression. This function is referred to as a “dictionary information update unit” in this specification. In this way, the correspondence information between entities and candidate expressions can be increased.

[If there are multiple entities in the statement]
There may be a case where a plurality of entities exist in one sentence input by the user. In this case, for each individual entity, the first evaluation unit 250 generates a provisional sentence in which the other entities are replaced with candidate expressions. For the provisional sentences, generated in a number equal to the number of entities in the sentence, the same processing as that for calculating the first evaluation result P(a) for one entity is executed. When there are a plurality of entities, the first evaluation result P(a) is given not to each candidate expression separately but to a combination of candidate expressions. This function of generating combinations is referred to as a “combination generation unit” in this specification.
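Enumerating combinations of candidate expressions can be sketched with `itertools.product`. The function names, the example sentence, and the use of plain string replacement are assumptions for illustration; the sketch applies a whole combination at once rather than reproducing the per-entity provisional sentences exactly.

```python
from itertools import product

def candidate_combinations(candidates_per_entity):
    """Yield one dict per combination, mapping each entity to a chosen candidate."""
    entities = list(candidates_per_entity)
    for choice in product(*(candidates_per_entity[e] for e in entities)):
        yield dict(zip(entities, choice))

def provisional_sentence(sentence, replacements):
    """Apply a combination to the sentence by plain string replacement."""
    for entity, candidate in replacements.items():
        sentence = sentence.replace(entity, candidate)
    return sentence

# Hypothetical example with two entities in one sentence.
candidates = {
    "Nokia": ["a company", "Nokia"],
    "Myanmar": ["developing countries", "Myanmar"],
}
```

Each yielded dictionary is one combination to be scored, and the number of combinations is the product of the candidate counts per entity.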

(1-3-4) Second Evaluation Unit
FIG. 7 shows the functional configuration of the second evaluation unit 260. The second evaluation unit 260 gives each candidate expression a second evaluation result that takes into account the context derived from the content of the entire text. First, the important word extraction unit 261 extracts important words from the input text. The important words can be extracted by a technique such as TF-IDF (Term Frequency-Inverse Document Frequency). The synonym expansion unit 262 acquires and outputs synonyms of a given word. Synonyms can be acquired by methods such as a synonym dictionary or Word2Vec. Here, synonym expansion is performed on the important words extracted by the important word extraction unit 261 and on each candidate expression given from the first evaluation unit 250.

The second evaluation score calculation unit 263 calculates the degree of co-occurrence with an important word in the input sentence for each candidate expression and outputs it as a second evaluation score (second evaluation result). The degree of co-occurrence refers to the relationship between words that are likely to co-occur in general sentences. For example, the co-occurrence degree can be obtained by the number of hits when a search is performed using a word / word combination as a query in a Web search engine. By using the co-occurrence degree, it is possible to measure whether each candidate expression is an abstraction according to the context of the input sentence. When calculating the degree of co-occurrence, a word expanded by the previous synonym expansion may be used.
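As a rough sketch, the degree of co-occurrence can be approximated by counting sentences in a corpus that contain both expressions. The small corpus below is invented and merely stands in for the hit counts of a Web search engine; the function names are hypothetical.

```python
def cooccurrence(candidate, important_word, corpus):
    """Count corpus sentences that contain both expressions (case-insensitive)."""
    c, w = candidate.lower(), important_word.lower()
    return sum(1 for s in corpus if c in s.lower() and w in s.lower())

def second_evaluation(candidate, important_words, corpus):
    """Sum the co-occurrence counts over all important words of the input text."""
    return sum(cooccurrence(candidate, w, corpus) for w in important_words)

# Made-up corpus standing in for Web search hit counts.
corpus = [
    "Economic assistance to developing countries has increased.",
    "Developing countries rely on economic assistance for health care.",
    "Humid areas have more mosquitoes.",
]
```

Under this toy corpus, a candidate that tends to appear together with “economic assistance” receives the higher score.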

Of “developing countries” and “humid areas”, a higher context appropriateness score (second evaluation result) is given to “developing countries”, which has a high degree of co-occurrence with the important word “economic assistance” in the input text. When there are a plurality of entities in the sentence, a second evaluation score (second evaluation result) is calculated for each combination of candidate expressions.

(1-3-5) Post-conversion Sentence Generation Unit
The post-conversion sentence generation unit 270 generates a post-conversion text (or post-conversion sentence) by replacing each entity with the candidate expression that received a high evaluation from both the first evaluation unit 250 and the second evaluation unit 260. To produce a natural sentence, operations such as changing the candidate expression from singular to plural and capitalizing the first letter of the sentence are also performed.

When the evaluation results of the first evaluation unit 250 and the second evaluation unit 260 are the same for a plurality of candidate expressions, those candidate expressions are regarded as equally suitable replacements, and the sentence may be generated using any of them. Alternatively, the selection may be made using criteria such as a smaller number of words in the candidate expression or the score of the candidate expression stored in the entity information table 230.
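The selection with tie-breaking and the capitalization step can be sketched as follows. The sketch assumes that the two evaluation results have already been combined into one number per candidate (how they are combined is not specified here), applies the tie-break criteria in one fixed order chosen for illustration, and omits the singular-to-plural conversion. The function names and data are hypothetical.

```python
def choose_candidate(scores, table_scores):
    """Pick the best candidate; break ties by fewer words, then by table score."""
    return max(
        scores,
        key=lambda c: (scores[c], -len(c.split()), table_scores.get(c, 0.0)),
    )

def generate_sentence(sentence, entity, candidate):
    """Replace the entity and capitalize the first letter of the result."""
    replaced = sentence.replace(entity, candidate)
    return replaced[:1].upper() + replaced[1:]

# Illustrative values: three candidates tied on the combined evaluation.
scores = {
    "developing countries": 1.8,
    "humid areas": 1.8,
    "countries with a government": 1.8,
}
table_scores = {"developing countries": 0.9, "humid areas": 0.6}
```

In this example the four-word candidate loses on length, and the remaining tie is resolved by the score stored in the table.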

(1-4) Process Flow
(1-4-1) Process Overview
With reference to FIG. 8, the process flow executed by the sentence generation device when generating an assertion sentence is described.

Step S800
The user uses the input device 110 to input a sentence to be replaced and a theme of the sentence. The input sentence and theme are analyzed through the arithmetic unit 130 and given to the entity extraction unit 220.

Step S801
The entity extraction unit 220 extracts a specific expression from each of the input sentence and the theme information, and specifies a specific expression (entity) to be replaced and a keyword representing the theme information.

Step S802
The candidate generating unit 240 refers to the entity information table 230 for each entity specified in step S801, and acquires a plurality of replacement candidate expressions. For a specific expression in a sentence, a class can be acquired as a result of the specific expression recognition. The candidate generation unit 240 acquires information from the entity information table 230 using the character string and class of the unique expression, and acquires a plurality of candidate expressions to be replaced.

Step S803
The first evaluation unit 250 calculates a first evaluation result for the candidate expression generated by the candidate generation unit 240. That is, the first evaluation unit 250 assigns an accuracy score to each candidate expression.

Step S804
The second evaluation unit 260 calculates a second evaluation result for the candidate expression generated by the candidate generation unit 240. That is, the second evaluation unit 260 assigns a context appropriateness score.

Step S805
The post-conversion sentence generation unit 270 uses the candidate expression with the highest evaluation result to replace the entity, and generates a post-conversion sentence.

(1-4-2) Processing of First Evaluation Unit
Details of the processing executed by the first evaluation unit 250 (step S803) are described with reference to FIG. 9.

Step S901
The similar case sentence search unit 251 creates a character string obtained by removing the entity from the target sentence to be replaced as a query.

Step S902
The similar case sentence search unit 251 gives the query created in step S901 to the associative search engine and acquires a plurality of similar case sentences representing cases similar to the case represented by the input sentence.

Step S903
The similar case entity extraction unit 253 performs language analysis on each similar case sentence and extracts specific expressions in the same way as in step S801.

Step S904
The similar case entity extraction unit 253 associates the specific expressions with each other and selects, from the specific expressions in each similar case sentence, the one corresponding to the entity.

Step S905
The similar case entity extraction unit 253 acquires candidate expressions from the entity information table 230 for the corresponding specific expressions in the same way as in step S802.

Step S906
For each candidate expression generated by the candidate generation unit 240, the first evaluation score calculation unit 254 counts the number of corresponding specific expressions in the similar case sentences that have the same candidate expression, and outputs the count as the accuracy score of that candidate expression. Whether candidate expressions are the same can be determined by character string matching.

Step S907
The first evaluation score calculation unit 254 ranks the candidate expressions using the calculated accuracy score. By leaving only candidates with a certain rank or higher or a score or higher, it is possible to select a highly accurate candidate. When the candidate expression is narrowed down to one by the evaluation based on the accuracy score, the evaluation by the second evaluation unit 260 can be omitted.

(1-4-3) Processing of Second Evaluation Unit

Step S1001
First, the important word extraction unit 261 generates combinations of candidate expressions.

Step S1002
Next, the important word extraction unit 261 extracts words other than specific expressions from the input text. However, frequent words such as “of” and “a” are excluded.

Step S1003
The synonym expansion unit 262 uses WordNet to expand the words included in the candidate expressions and in the word set of the input text into synonyms.

Step S1004
The second evaluation score calculation unit 263 counts the overlap between the candidate expression after synonym expansion and the word set extracted in the preceding steps, and outputs the count as the context appropriateness score.
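The word extraction, synonym expansion, and overlap counting can be sketched as below. To keep the example self-contained, a tiny hand-written synonym dictionary stands in for WordNet, and the stop word list is likewise an assumption; the names `expand` and `context_score` are hypothetical.

```python
STOPWORDS = {"of", "a", "the", "in", "and", "is", "to"}

# Toy synonym dictionary standing in for WordNet; entries are illustrative.
SYNONYMS = {
    "assistance": {"aid", "support"},
    "developing": {"emerging"},
}

def expand(words):
    """Return the words together with their synonyms."""
    expanded = set(words)
    for w in words:
        expanded |= SYNONYMS.get(w, set())
    return expanded

def context_score(candidate, text, named_entities):
    """Count words shared by the expanded candidate and the expanded text words."""
    text_words = {
        w.strip(".,").lower() for w in text.split()
    } - STOPWORDS - {e.lower() for e in named_entities}
    candidate_words = {w.lower() for w in candidate.split()} - STOPWORDS
    return len(expand(candidate_words) & expand(text_words))
```

For instance, with the text “Economic assistance should continue. Malaria is endemic in Myanmar.”, a candidate containing the word “aid” overlaps with the expansion of “assistance” and therefore scores higher than “humid areas”.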

(1-5) Summary
By executing the above processing operations, it is possible to generate a sentence (or text) in which a specific expression in a sentence representing a specific case is replaced with an expression having a more general meaning. The assertion sentence generated by this replacement corresponds to the case sentence. Therefore, a new text can be constructed by, for example, placing the generated sentence before the case sentence and inserting a conjunction that introduces an example at the beginning of the case sentence. In this way, a persuasive text can be constructed automatically, because the assertion is stated clearly, the argument is easy to follow, and a supporting case is shown.

(2) Second Embodiment
In this embodiment, the replacement for generating a claim sentence is applied to a sentence set in which sentences extracted from a plurality of texts are arranged in an appropriate order. In other respects, this embodiment is the same as the first embodiment.

FIG. 11 shows an overall view of the sentence generation system used in the present embodiment. The system includes a sentence generation device 1100 and a data management device 1101. When a topic is input, the sentence generation device 1100 outputs a descriptive sentence that describes an opinion on the topic. The data management device 1101 stores data that has been processed in advance and is accessible from the sentence generation device 1100.

The sentence generation device 1100 sequentially executes nine processing functions. First, the input unit 1102 receives a topic from the user. Next, the topic analysis unit 1103 analyzes the topic and determines the polarity of the topic and the keyword used for the search. The search unit 1104 searches for an article using a keyword and an issue word indicating an issue in the debate. The issue determination unit 1105 classifies the output articles and determines an issue to be used when generating an opinion. The sentence extraction unit 1106 extracts a sentence describing the issue from the output article. The sentence rearrangement unit 1107 generates a sentence by rearranging the extracted sentences. The evaluation unit 1108 evaluates the generated sentence. The replacement unit 1109 inserts appropriate conjunctions, deletes unnecessary expressions, and replaces some unique expressions with abstract expressions according to theme information. The output unit 1110 outputs the sentence with the highest evaluation as a descriptive sentence describing an opinion.

The replacement unit 1109 in the present embodiment has a configuration in which input information is added to the configuration described in the first embodiment. In the following, processing functions added to the first embodiment will be described.

The rearranged sentence set is input as the sentence to the input unit 210 used in this embodiment, and the topic, an analysis result of the topic analysis unit 1103, or a keyword used as a query by the search unit 1104 is input as the theme information.

The similar case search unit 251 of the first evaluation unit 250 used in the present embodiment can use the output of the search unit 1104 as a search target. Since each sentence has a document as an extraction source, the information in the entity information table can be updated by extracting the relationship from the document.

The second evaluation unit 260 used in the present embodiment can include topic information in a target whose co-occurrence with candidate expressions is measured. Since each sentence has a document as an extraction source, the degree of co-occurrence with an important word in the document can be included in the evaluation.

On the other hand, the data management device 1101 includes an interface unit 1111, a structuring unit 1112, and four databases 1113 to 1116. The interface unit 1111, together with the structuring unit 1112, provides access to the data managed in the databases. The text data DB 1113 holds text data such as news articles. The text annotation data DB 1114 holds annotation data assigned to the text data in the text data DB 1113. The search index DB 1115 holds an index that makes the text data DB 1113 and the text annotation data DB 1114 searchable. The issue ontology DB 1116 is a database that links issues often discussed in debates with their related words.
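How a search index and an issue ontology could work together in an article search can be sketched as follows. This is a hypothetical illustration: the inverted-index layout, the function names `build_index` and `search`, the sample articles, and the rule that an article must contain the keyword and at least one issue word are all assumptions, not details given in the text.

```python
from collections import defaultdict


def build_index(documents):
    """Build a simple inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token.strip(".,")].add(doc_id)
    return index


def search(index, keyword, issue_words):
    """Return ids of documents containing the keyword and any issue word."""
    keyword_hits = index.get(keyword.lower(), set())
    issue_hits = set().union(*(index.get(w.lower(), set()) for w in issue_words))
    return sorted(keyword_hits & issue_hits)


# Illustrative data only; real text data would be news articles.
documents = {
    1: "Casinos increase gambling addiction.",
    2: "Casinos create jobs and tax revenue.",
    3: "Smoking causes disease.",
}
# Issue ontology: each issue is linked to its related words.
issue_ontology = {
    "health": ["addiction", "disease"],
    "economy": ["jobs", "revenue"],
}

index = build_index(documents)
health_articles = search(index, "casinos", issue_ontology["health"])
```

Here `health_articles` contains only article 1, since it is the only article that mentions both "casinos" and a word linked to the "health" issue.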

As described above, according to the present embodiment, even when a claim sentence is generated for a sentence set in which sentences extracted from a plurality of texts are arranged in an appropriate order, a sentence whose content is more generalized can ultimately be output.

(3) Other Embodiments
The present invention is not limited to the above-described embodiments and includes various modifications. For example, the above-described embodiments have been described in detail for easy understanding of the present invention, and it is not always necessary to include all the configurations described. Further, a part of the configuration of an above-described embodiment may be deleted, a known technique may be added to it, or a part of it may be replaced by a known technique.

In addition, each of the above-described configurations, functions, processing units, processing means, and the like may be realized by hardware by designing a part or all of them with, for example, an integrated circuit. Each of the above-described configurations, functions, and the like may be realized by the processor interpreting and executing a program that realizes each function (that is, in software). Information such as programs, tables, and files that realize each function can be stored in a storage device such as a memory, a hard disk, or an SSD (Solid State Drive), or a storage medium such as an IC card, an SD card, or a DVD. Control lines and information lines indicate what is considered necessary for the description, and do not represent all control lines and information lines necessary for the product. In practice, it can be considered that almost all components are connected to each other.

110 Input Device
120 Output Device
130 Arithmetic Unit (Central Processing Unit: CPU)
140 Memory
150 Storage Device
160 Network Device
170 Bus
210 Input Unit
220 Entity Extraction Unit
230 Entity Information Table
240 Candidate Generation Unit
250 First Evaluation Unit
260 Second Evaluation Unit
270 Converted Text Generation Unit
280 Output Unit
251 Similar Case Sentence Search Unit
252 Sentence Text Data
253 Similar Case Entity Extraction Unit
254 First Evaluation Score Calculation Unit
261 Important Word Extraction Unit
262 Synonym Expansion Unit
263 Second Evaluation Score Calculation Unit

Claims

An input unit used for inputting a sentence to be processed and theme information;
A replacement target expression extracting unit that extracts one or more of one or more specific expressions included in the sentence based on the theme information as a replacement target expression and identifies a keyword representing the theme information;
A candidate generation unit that generates a plurality of candidate expressions that are replacement candidates that abstract the replacement target expression using dictionary information stored in advance;
A first evaluation unit that outputs a first evaluation result obtained by evaluating the candidate expression using the dictionary information;
A sentence generation apparatus comprising: a post-conversion sentence generation unit that generates a post-conversion sentence by replacing the replacement target expression with the candidate expression having a high evaluation in the first evaluation result.
The sentence generation device according to claim 1,
A second evaluation unit that outputs a second evaluation result obtained by evaluating the candidate expression based on a relationship with other specific expressions included in a plurality of sentences constituting the sentence;
The sentence generation device, wherein the post-conversion sentence generation unit generates the post-conversion sentence by replacing the replacement target expression with the candidate expression that is highly evaluated in both the first evaluation result and the second evaluation result.
The sentence generation device according to claim 1,
The first evaluation unit includes:
A similar case sentence search unit for searching a similar case sentence representing a case similar to the case represented by the sentence;
A corresponding expression extraction unit that extracts a plurality of corresponding replacement target expressions in which the same case as the replacement target expression is generated from the similar case sentence;
A sentence generation device, comprising: a score calculation unit that generates a plurality of replacement candidate expressions of the corresponding replacement target expression from the dictionary information and gives an accuracy score to the candidate expression.
The sentence generation apparatus according to claim 2,
The second evaluation unit is
An important word extraction unit for extracting an important word in the sentence;
A synonym expansion unit for acquiring a synonym expression of the important word and the candidate expression;
A sentence generation device comprising: a score calculation unit that uses the synonym expression to give a context appropriateness score to the candidate expression.
The sentence generation apparatus according to claim 2,
A combination generating unit that generates a combination of the candidate expressions when a plurality of replacement target expressions are extracted;
The sentence generation apparatus, wherein the first evaluation unit and the second evaluation unit output an evaluation with respect to the combination of the candidate expressions.
The sentence generation device according to claim 1,
The first evaluation unit includes:
A relationship extraction unit that extracts a syntactic relationship of the replacement target expression from a sentence;
And a dictionary information updating unit that updates the dictionary information based on the relationship.
In a sentence generation method executed in a sentence generation apparatus having an arithmetic device and a storage device,
The arithmetic unit is
A process for receiving input of a sentence to be processed and theme information;
A process of extracting one or more of one or more specific expressions included in the sentence based on the theme information as a replacement target expression;
Processing for specifying a keyword representing the theme information;
Processing to generate a plurality of candidate expressions that are replacement candidates for abstracting the replacement target expression using dictionary information stored in advance;
A process of outputting a first evaluation result obtained by evaluating the candidate expression using the dictionary information;
A sentence generation method that executes a process of generating a post-conversion sentence by replacing the replacement target expression with the candidate expression having a high evaluation in the first evaluation result.
The sentence generation method according to claim 7,
The arithmetic unit is
Further executing a process of outputting a second evaluation result obtained by evaluating the candidate expression based on a relationship with other specific expressions included in a plurality of sentences constituting the sentence;
The sentence generation method, wherein the process of generating the post-conversion sentence generates the post-conversion sentence by replacing the replacement target expression with the candidate expression that is highly evaluated in both the first evaluation result and the second evaluation result.
The sentence generation method according to claim 7,
The process of outputting the first evaluation result includes
A process of searching for a similar case sentence representing a case similar to the case represented by the sentence;
A process of extracting a plurality of corresponding replacement target expressions in which the same case as the replacement target expression is generated from the similar case sentence;
A sentence generation method comprising: generating a plurality of replacement candidate expressions for the corresponding replacement target expression from the dictionary information and assigning an accuracy score to the candidate expression.
The sentence generation method according to claim 8,
The process of outputting the second evaluation result includes
Processing to extract important words in the sentence;
Obtaining a synonym expression of the important word and the candidate expression;
And a process of assigning a context appropriateness score to the candidate expression using the synonym expression.
The sentence generation method according to claim 8,
When a plurality of replacement target expressions are extracted, the method further includes a process of generating a combination of the candidate expressions,
The sentence generating method, wherein the process of outputting the first evaluation result and the process of outputting the second evaluation result output an evaluation with respect to the combination of the candidate expressions.
The sentence generation method according to claim 7,
The arithmetic unit is
A process of extracting syntactical relationships for the replacement target expression from the sentence;
A sentence generation method further comprising: updating the dictionary information based on the relationship.