WO2016051551A1 - Sentence Generation System - Google Patents
- Publication number
- WO2016051551A1 (PCT/JP2014/076237)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sentence
- issue
- unit
- agenda
- word
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/55—Rule-based translation
- G06F40/56—Natural language generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/40—Processing or translation of natural language
- G06F40/42—Data-driven translation
- G06F40/44—Statistical methods, e.g. probability models
Definitions
- The present invention relates to a system that automatically generates an opinion statement on an agenda.
- The usefulness of … is increasing. One such system that has recently attracted the most attention is the question-answering system.
- Patent Document 1 discloses a method of constructing a question-answering system by determining a lexical answer type (LAT), performing a search, and analyzing the search results using the LAT.
- LAT: lexical answer type.
- With the technique of Patent Document 1, however, it is difficult to generate an answer when, as in a debate, there is no single correct answer and the opinion changes depending on one's position.
- In a debate, a statement expressing an opinion on the agenda has no simple correct answer, and the values to be weighed differ depending on the position taken. Therefore, even an analysis using the LAT cannot narrow the answer down to one.
- In addition, because the technique of Patent Document 1 only outputs a single noun phrase or sentence as the answer, it cannot output a text composed of multiple sentences.
- The present invention has been made in view of the above, and its object is to have a system automatically generate a statement expressing an opinion on an agenda, of the kind used in a debate.
- The present application includes a plurality of means for solving the above problems.
- To give one example, a sentence generation system that outputs an opinion statement on an agenda comprises: an input unit to which the agenda is input; an agenda analysis unit that analyzes the agenda and determines its polarity and the keyword used for searching; a search unit that searches for articles using the keyword and issue words indicating the issues in the discussion; an issue determination unit that determines the issue to be used when generating the opinion statement; a sentence extraction unit that extracts sentences describing the issue from the articles output by the search unit; a sentence rearrangement unit that generates texts by rearranging the sentences; an evaluation unit that evaluates the texts; a paraphrase unit that inserts appropriate conjunctions into the texts; and an output unit that outputs, as the opinion statement, the text with the highest evaluation among the plurality of texts.
- Likewise, a sentence generation method for outputting an opinion statement on an agenda comprises: a first step in which the agenda is input; a second step of analyzing the agenda and determining its polarity and the keyword used for searching; a third step of searching for articles using the keyword and issue words indicating the issues in the discussion; a fourth step of determining the issue to be used when generating the opinion statement; a fifth step of extracting sentences describing the issue from the articles output in the third step; a sixth step of generating texts by rearranging the sentences; a seventh step of evaluating the texts; an eighth step of inserting appropriate conjunctions into the texts; and a ninth step of outputting, as the opinion statement, the text with the highest evaluation among the plurality of texts.
- FIG. 1 shows the sentence generation system according to the present invention, and FIG. 2 shows text annotation data.
- The sentence generation system according to the first embodiment of the present invention consists of a generation system combining nine modules and a data management system.
- As an example of a specific hardware configuration, as shown in FIG. 12, the system comprises an input device 1202, an output device 1202, a memory 1205 storing programs that execute each module, and a storage device 1207 containing the text data DB, the text annotation data DB 113, and so on.
- FIG. 1 shows the overall system.
- Reference numeral 100 denotes a generation system that, when an agenda is input, outputs a descriptive sentence that describes an opinion on the agenda.
- Reference numeral 101 denotes a data management system, which stores data that has been processed in advance and is accessible from the system 100.
- The input unit 102 receives an agenda from the user. It may also accept an input specifying whether to generate an affirmative or a negative opinion on the agenda. Making the position of the text to be generated explicit in this way allows the system to be used in discussions such as debates.
- The agenda analysis unit 103 analyzes the agenda and determines its polarity and the keyword used for searching.
- The search unit 104 searches for articles using the keyword and issue words that indicate the issues in the debate. For example, when the agenda is “Casinos should be prohibited”, the keyword is the noun phrase “casino”. Determining the polarity further makes it possible to decide whether to use positive or negative issue words together with “casino”.
- Here, “issue word” refers to any word in the issue ontology shown in FIG. 3; “issue” on its own refers specifically to the “words representing the values at stake in the discussion” listed in column 300.
- A “promotion word” refers to “an event that promotes the issue”, listed in column 301.
- A “suppression word” refers to “an event that suppresses the issue”, listed in column 302.
- In this example, the search is performed using “casino” as the keyword and suppression words as the issue words.
- Because the agenda is negative toward “casino”, suppression words are used as the issue words.
- FIG. 3 lists several suppression words. Searching with the issue words combined with the keyword in this way retrieves articles that argue whether casinos are good or bad.
- Using only the keyword extracted from the agenda would return many articles that need not be considered in the debate, such as promotional articles about casinos or blog posts that merely describe impressions of a casino visit, so an appropriate search would not be possible.
- The issue determination unit 105 classifies the retrieved articles and determines the issue to be used when generating an opinion.
- The sentence extraction unit 106 extracts sentences describing the issue from the retrieved articles.
- The sentence rearrangement unit 107 generates texts by rearranging the extracted sentences.
- The evaluation unit 108 evaluates the generated texts.
- The paraphrase unit 109 inserts appropriate conjunctions and deletes unnecessary expressions.
- The output unit 110 outputs the text with the highest evaluation as the statement expressing the opinion.
- The data management system 101 includes four databases (DBs) and an interface/structuring unit 111.
- The interface unit 111 provides a means of accessing the data managed in the databases.
- The text data DB 112 holds text data such as news articles, and the text annotation data DB 113 holds annotation data attached to the text data in the text data DB 112.
- The search index DB 114 holds indexes that make the text data DB 112 and the text annotation data DB 113 searchable.
- The issue ontology DB 115 is a database that links issues frequently argued in debates with related words.
- The text data DB 112 stores text data such as news articles. Sentences suitable for constructing an opinion statement are extracted from this text data, and the statement is generated by arranging the extracted sentences. The text data DB 112 is therefore the source of the sentences that make up the output statement.
- The text data DB 112 is built by crawling English and Japanese news articles from the Internet. Each item is managed with an identifier such as doc_id.
- The text annotation data DB 113 accumulates the annotation data attached to the text data in the text data DB 112.
- FIG. 2 shows an example of text annotation data.
- “id” is the annotation's unique identifier.
- “doc_id” is the id of the news article in the text data DB 112 to which the annotation belongs.
- “Annotation” is the annotation type. The types are described later.
- “Begin” is the start position of the annotation; in the example of FIG. 2, the annotation starts at the 24th character of the article whose doc_id is 001122.
- “End” is the end position of the annotation; in the example of FIG. 2, the annotation ends at the 29th character of the article whose doc_id is 001122.
- “Ref” is a reference to another annotation; in the example of FIG. 2, this annotation has a link named “arg0” to the annotation with id 125123 and a link named “arg1” to the annotation with id 125124.
- “Attr” holds the annotation's attributes as an arbitrary hash value.
- The annotation types are positive, negative, promote, promote_arg0, promote_arg1, suppress, suppress_arg0, and suppress_arg1.
- “positive” marks something with positive value; natural-language expressions include benefit, ethic, and health. “negative” marks something with negative value; examples are disease, crime, and risk.
- “promote” marks an expression of promotion, such as increase, invoke, or improve.
- promote_arg0 is the subject of the promotion, and promote_arg1 is the event being promoted. When a “promote” annotation is attached as described above, its arguments are identified from the surrounding syntactic information and annotated.
- suppress_arg0 is the subject of the suppression, and suppress_arg1 is the event being suppressed. Likewise, when a “suppress” annotation is attached, suppress_arg0 is identified from the surrounding syntactic information and annotated.
- Annotations can be generated by applying rules prepared in advance to the syntactic analysis results of the text data, as described above. They can also be generated with a sequence-labeling machine-learning technique, using a tool such as CRF++.
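The annotation record described for FIG. 2 can be modelled as a small data structure. The sketch below is a minimal illustration, assuming 0-based character offsets with an exclusive `end` (the text does not say which convention FIG. 2 uses). The class name `Annotation`, the helper `span`, and the example id 125122 and type "promote" are hypothetical; only doc_id 001122, offsets 24/29, and the arg0/arg1 links come from the description.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Annotation types listed in the description.
ANNOTATION_TYPES = {
    "positive", "negative",
    "promote", "promote_arg0", "promote_arg1",
    "suppress", "suppress_arg0", "suppress_arg1",
}


@dataclass
class Annotation:
    id: int          # unique identifier of this annotation
    doc_id: str      # id of the annotated article in the text data DB 112
    annotation: str  # annotation type, one of ANNOTATION_TYPES
    begin: int       # start offset in the article (0-based here, an assumption)
    end: int         # end offset, treated as exclusive (an assumption)
    ref: Dict[str, int] = field(default_factory=dict)   # named links, e.g. {"arg0": 125123}
    attr: Dict[str, Any] = field(default_factory=dict)  # arbitrary key-value attributes

    def __post_init__(self) -> None:
        # Reject unknown types and inverted spans early.
        if self.annotation not in ANNOTATION_TYPES:
            raise ValueError(f"unknown annotation type: {self.annotation}")
        if self.begin > self.end:
            raise ValueError("begin must not exceed end")

    def span(self, text: str) -> str:
        """Return the annotated substring of the article text."""
        return text[self.begin:self.end]


# FIG. 2-style record; the id 125122 and the type "promote" are hypothetical.
example = Annotation(id=125122, doc_id="001122", annotation="promote",
                     begin=24, end=29, ref={"arg0": 125123, "arg1": 125124})
```

Storing `ref` as a name-to-id map keeps the predicate/argument links (promote to promote_arg0, and so on) resolvable with a simple lookup in the annotation DB.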
- The search index DB 114 holds index data that makes the text data DB 112 and the text annotation data DB 113 searchable.
- The index data includes an index for keyword search. For similarity search, statistics of the characteristic words of each text are computed with, for example, TF-IDF, and the resulting vectors are stored as a similarity-search index.
- Using software such as Solr, a search index can be generated automatically by passing the text data or text annotation data to Solr's index-generation API.
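As a rough illustration of the similarity-search index described above, the sketch below computes sparse TF-IDF vectors with the plain tf × log(N/df) weighting and compares them by cosine similarity. It is a simplified stand-in, not Solr's actual scoring, and the function names `tfidf_vectors` and `cosine` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one sparse TF-IDF vector (word -> weight) per tokenized document."""
    n = len(docs)
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))          # document frequency: each word counted once per doc
    vectors = []
    for doc in docs:
        tf = Counter(doc)            # raw term frequency within this document
        vectors.append({w: c * math.log(n / df[w]) for w, c in tf.items()})
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

A word that occurs in every document gets weight 0 under this weighting, which is why such vectors emphasize the characteristic words of each article.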
- The issue ontology DB 115 is a database that links issues often argued in debates with related words.
- FIG. 3 shows an example of the issue ontology stored in the issue ontology DB 115.
- Column 300 lists values that are often at issue in debates.
- Column 301 lists things that promote each value.
- Column 302 lists things that suppress each value.
- For example, exercise, doctors, organ donation, and medicine promote the value of health.
- Junk food, tobacco, alcohol, and smoking suppress the value of health.
- The issue ontology is a small database of about 50 rows at most and is created manually with reference to past debates.
- The interface unit 111 provides access to the text data DB 112, the text annotation data DB 113, the search index DB 114, and the issue ontology DB 115, and is implemented with a technique such as REST.
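The issue ontology and the choice between promotion and suppression words can be sketched as follows. The dictionary layout, the function names, and the Boolean query syntax are assumptions made for illustration; only the health row comes from the FIG. 3 example described above, and taking the union of every row's words is one possible reading of how issue words are selected for a search.

```python
from typing import Dict, List

# One row of the FIG. 3 issue ontology as described in the text; other rows omitted.
ISSUE_ONTOLOGY: Dict[str, Dict[str, List[str]]] = {
    "health": {
        "promote": ["exercise", "doctor", "organ donation", "medicine"],
        "suppress": ["junk food", "tobacco", "alcohol", "smoking"],
    },
}


def issue_words(polarity: int,
                ontology: Dict[str, Dict[str, List[str]]] = ISSUE_ONTOLOGY) -> List[str]:
    """Suppression words for a negative agenda (polarity < 0), promotion words otherwise."""
    kind = "suppress" if polarity < 0 else "promote"
    words: List[str] = []
    for row in ontology.values():
        words.extend(row[kind])
    return words


def build_query(keyword: str, polarity: int) -> str:
    """Hypothetical Boolean query: the keyword AND at least one issue word."""
    alternatives = " OR ".join(f'"{w}"' for w in issue_words(polarity))
    return f'"{keyword}" AND ({alternatives})'
```

For the casino agenda, whose polarity is negative, `build_query("casino", -1)` requires “casino” plus at least one suppression word, which is what filters out promotional or travel-blog articles.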
- The input unit 102 receives the agenda from the user.
- The agenda is entered through a GUI such as a Web browser.
- An example agenda is “We should ban smoking in train stations” (i.e., should smoking be prohibited in train stations?). Settings such as the number of output candidates, described later, may also be entered.
- FIG. 4 is a flowchart showing the operation of the agenda analysis unit 103.
- First, the POS tags of the words in the agenda are estimated using OpenNLP or a similar tool, and the agenda sentence is parsed to produce a parse tree.
- In S401, the central verb is extracted.
- The parse tree is traversed to find verbs, and the verb closest to ROOT is extracted as the central verb.
- The number of negations such as “not” that appear before the verb is reached is counted: an odd count means the verb is negated, and an even count (as in a double negative) means it is not.
- In the example, “ban” is extracted as the verb, and no negation applies to it, so the agenda contains no negation.
- In S402, a dictionary is consulted to determine the polarity of the agenda.
- The dictionary stores verbs that take a positive stance toward their object, such as accept and agree, separately from verbs that take a negative stance toward their object, such as ban and abandon.
- Looking up “ban” in the dictionary identifies it as a verb with a negative stance. Combining this with the presence or absence of the negation found earlier yields the final polarity toward the agenda's theme.
- In this example, the polarity is determined to be negative.
- The polarity determined here is the polarity toward the noun phrase extracted in the next step, S403.
- In S403, the noun phrase that forms the theme of the agenda is extracted. Starting from ROOT, only subtrees whose syntactic tags are “ROOT”, “S”, “NP”, “VP”, or “SBAR” are traversed, and the noun phrase reached this way is extracted. For example, for the agenda “We should ban smoking in train stations.”, “smoking” is extracted.
- In S404, context information is extracted.
- Words whose POS tag is CC, FW, JJ, JJR, JJS, NN, NNP, NNPS, NNS, RP, VB, VBD, VBG, VBN, VBP, or VBZ and that were not already extracted in S401 or S403 are taken as context information. For example, for the agenda “We should ban smoking in train stations.”, “train” and “station” are extracted.
- In S405, synonyms are expanded.
- Synonyms of the words extracted in S401, S403, and S404 are obtained from a dictionary; WordNet may be used as the dictionary.
- For example, “smoking” was extracted as the noun phrase, and “smoke” and “fume” are obtained as its synonyms.
- Synonyms are likewise obtained for the central verb extracted in S401 and for the context words extracted in S404.
- In this way, the agenda analysis unit 103 extracts from the agenda the central verb, the polarity, the theme noun phrase, the context information, and their synonyms, all of which are used in the later stages.
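The polarity rule (negation parity combined with a stance dictionary) and the context-word filter can be sketched as below. This is a minimal illustration under stated assumptions: the parse-tree walk is replaced by a plain list of the words met on the way to the central verb, the stance dictionary holds only the example verbs from the text, the negation set is a hypothetical minimum, and all function names are hypothetical.

```python
from typing import List, Set, Tuple

# Stance dictionary built only from the example verbs in the text (not exhaustive).
POSITIVE_STANCE = {"accept", "agree"}
NEGATIVE_STANCE = {"ban", "abandon"}
NEGATION_WORDS = {"not", "n't"}  # hypothetical minimal negator list

# POS tags kept as context information in S404, as listed in the text.
CONTEXT_TAGS = {"CC", "FW", "JJ", "JJR", "JJS", "NN", "NNP", "NNPS", "NNS",
                "RP", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}


def agenda_polarity(path_to_verb: List[str], central_verb: str) -> int:
    """S402: +1 if the agenda supports its theme, -1 if it opposes it.

    path_to_verb stands in for the words met while tracing the parse
    tree down to the central verb (a simplification of the tree walk).
    """
    # Odd number of negations -> negated; even (e.g. double negative) -> not negated.
    negated = sum(w.lower() in NEGATION_WORDS for w in path_to_verb) % 2 == 1
    verb = central_verb.lower()
    if verb in POSITIVE_STANCE:
        stance = 1
    elif verb in NEGATIVE_STANCE:
        stance = -1
    else:
        raise KeyError(f"verb not in stance dictionary: {central_verb}")
    return -stance if negated else stance


def context_words(tagged: List[Tuple[str, str]], already: Set[str]) -> List[str]:
    """S404: words with a context POS tag that were not extracted in S401/S403."""
    return [w for w, tag in tagged if tag in CONTEXT_TAGS and w.lower() not in already]
```

For “We should ban smoking in train stations”, `agenda_polarity(["We", "should"], "ban")` gives -1 (negative toward “smoking”), while inserting a single “not” before the verb flips it to +1.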
- FIG. 5 is a flowchart showing the operation of the search unit 104.
- In S500, articles containing the noun phrase extracted from the agenda are searched for in the text data DB 112 using the keyword search index in the search index DB 114, and the top 1000 are extracted.
- In S501, articles containing both the noun phrase and the context information extracted from the agenda are searched for in the text data DB 112 using the keyword search index in the search index DB 114, and the top 1000 are extracted.
- In other words, S501 is the S500 search with the context information added as keywords.
- … TF-IDF, which is a statistic of the characteristic words in the topic …
- Each article is then scored as: Score = (number of occurrences of the noun phrase extracted from the agenda) + (number of occurrences of words in the issue ontology) - (article age)
- The 100 highest-scoring articles are output. Raising the score of articles in which these words occur frequently makes it possible to find articles closely related to the agenda or the issues. Scoring the age of articles as well favors articles that reflect newer data, which increases the persuasiveness of the final output text.
- FIG. 6 is a flowchart showing the operation of the issue determination unit 105.
- The flowchart in FIG. 6 is executed for each article output by the search unit 104.
- The process loops over every issue k in the issue ontology.
- Examples of k are health, fortune, and safety.
- For issue k, the TF-IDF values in the article are obtained for the word k itself, for the words in the issue ontology representing things that promote k, and for the words representing things that suppress k. In practice, these values are already contained in the TF-IDF vectors used for similarity search by the search unit 104, so they are read from the search index DB 114.
- Because TF-IDF assigns a value to each word, this yields the TF-IDF values of k, of the words representing things that promote k, and of the words representing things that suppress k.
- The sum of these TF-IDF values is computed and denoted S_k.
- In step S603, the loop ends.
- The k that maximizes S_k is taken as the issue of the article. This issue indicates which value the article as a whole mainly focuses on. Determining the issue for each article and then grouping sentences by issue in a later stage makes it possible to generate an opinion statement with a consistent claim, which is why the issue is determined per article.
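The per-article issue decision reduces to an argmax over the ontology rows. The sketch below assumes the article's TF-IDF values are already available as a word-to-weight map (the text reads them from the search index DB 114); the function names and the "safety" row used in the usage note are hypothetical. Multi-word ontology entries such as “organ donation” would need phrase-level weights, which this sketch does not handle.

```python
from typing import Dict, List

Ontology = Dict[str, Dict[str, List[str]]]


def issue_score(tfidf: Dict[str, float], issue: str, row: Dict[str, List[str]]) -> float:
    """S_k: sum of the article's TF-IDF values for k and its promotion/suppression words."""
    words = [issue, *row["promote"], *row["suppress"]]
    return sum(tfidf.get(w, 0.0) for w in words)  # words absent from the article count as 0


def article_issue(tfidf: Dict[str, float], ontology: Ontology) -> str:
    """Return the issue k maximizing S_k (ties go to the first issue in the ontology)."""
    return max(ontology, key=lambda k: issue_score(tfidf, k, ontology[k]))
```

With a hypothetical two-row ontology and an article whose weights are {"smoking": 0.8, "tobacco": 0.3, "crime": 0.5}, S_health is about 1.1 against 0.5 for a “safety” row built around “crime”, so the article is assigned to health.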
- FIG. 7 is a flowchart showing the operation of the sentence extraction unit 106.
- In S700, an empty list is created to store the sentences output by the sentence extraction unit 106.
- In S701, the process loops over every sentence of every article output by the issue determination unit.
- Each sentence is scored against the conditions shown in FIG. 8: column 800 gives the condition ID, column 801 the condition, and column 802 the points awarded when the condition is satisfied.
- For each condition in FIG. 8 that is satisfied, the corresponding points are added.
- The sentence's score is the sum of all the points. For example, a sentence that satisfies only conditions #1 and #4 scores 6. In S703, if the score is 5 or more, the sentence is added to the list created in S700.
- In step S704, the loop ends.
- In step S705, the sentences in the list are output from the sentence extraction unit 106.
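The scoring loop of S700 to S705 can be sketched as below. The actual conditions and points of FIG. 8 are not reproduced in this text, so the three conditions here are hypothetical placeholders (tied to the smoking example); only the summing of points and the threshold of 5 come from the description. Function names are also hypothetical.

```python
from typing import Callable, List, Tuple

Condition = Tuple[Callable[[str], bool], int]  # (predicate, points awarded if it holds)


def sentence_score(sentence: str, conditions: List[Condition]) -> int:
    """Sum the points of every condition the sentence satisfies (FIG. 8 style)."""
    return sum(points for test, points in conditions if test(sentence))


def extract_sentences(articles: List[List[str]], conditions: List[Condition],
                      threshold: int = 5) -> List[str]:
    kept: List[str] = []                 # S700: empty output list
    for sentences in articles:           # S701: loop over all sentences of all articles
        for s in sentences:
            if sentence_score(s, conditions) >= threshold:  # S703: keep if score >= 5
                kept.append(s)
    return kept                          # S705: output the list (loop ended at S704)


# Hypothetical conditions standing in for FIG. 8.
HYPOTHETICAL_CONDITIONS: List[Condition] = [
    (lambda s: "smoking" in s.lower(), 3),  # mentions the agenda's noun phrase
    (lambda s: "health" in s.lower(), 3),   # mentions the article's issue
    (lambda s: len(s.split()) <= 40, 1),    # not overly long
]
```

Representing each condition as a predicate with its points keeps the rule table editable as data, mirroring the ID/condition/score columns of FIG. 8.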
- FIG. 9 is a flowchart showing the operation of the sentence rearrangement unit 107.
- First, the sentences are grouped by issue.
- The issue determination unit 105 has estimated the issue of each article, so the sentences are grouped using the issue of the article they were extracted from as the key. For example, if the issue determination unit 105 determined only five issues, the sentences extracted by the sentence extraction unit 106 are divided into five groups. In step S901, the process loops over all groups. In step S902, every sentence in the group is labeled as a claim, a reason, or an example.
- A machine learning method can be used for this labeling. For example, each sentence can be converted into a feature vector with a known method such as bag-of-words and classified with a machine learning method such as an SVM.
- The sentence rearrangement unit 107 thus generates statements on several issues. The evaluation unit 108 then evaluates these statements, and only at that point is the issue of the final output, that is, the position or values of the system's statement, decided. Because each statement uses only sentences extracted from articles judged to share the same issue, it argues from a consistent standpoint.
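Grouping by issue and ordering by label can be sketched as follows. Two parts are assumptions not stated in the text: the claim, reason, example ordering used for rearrangement, and the keyword rule `toy_label`, which is only a hypothetical stand-in for the bag-of-words plus SVM classifier described above. All names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

LABELS = ("claim", "reason", "example")  # assumed output order


def group_by_issue(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (sentence, issue of its source article) pairs by issue."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sentence, issue in pairs:
        groups[issue].append(sentence)
    return dict(groups)


def arrange(group: List[str], label: Callable[[str], str]) -> List[str]:
    """Order sentences claim -> reason -> example; sorted() keeps ties in input order."""
    return sorted(group, key=lambda s: LABELS.index(label(s)))


def toy_label(sentence: str) -> str:
    """Hypothetical keyword rule standing in for the bag-of-words + SVM classifier."""
    low = sentence.lower()
    if low.startswith("for example"):
        return "example"
    if "because" in low:
        return "reason"
    return "claim"
```

Each group yields one candidate statement, so the number of candidates passed to the evaluation unit equals the number of issues found.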
- FIG. 10 is a flowchart showing the operation of the evaluation unit 108.
- The evaluation unit 108 treats each text produced by rearrangement as a candidate statement of opinion on the agenda, evaluates it, and outputs the highly rated texts as the final output.
- In S1000, the quality of each generated statement is evaluated with a language model, in a manner similar to statistical machine translation. Specifically, discussion statements written by people are collected in advance and modeled with a known method such as an n-gram language model or a neural network language model.
- The evaluation method is not limited to this; generated statements can also be evaluated with other known methods, heuristic rules, or criteria.
- The evaluation unit 108 receives as many statements as there are issue groups formed by the sentence rearrangement unit 107.
- In step S1001, the three statements with the highest evaluation values are finally output.
- Three statements are output so that users of the system can grasp their content quickly.
- The system may also be configured so that this number can be changed, which allows it to be adapted to the user's level of knowledge.
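As one concrete instance of the n-gram option mentioned for S1000, the sketch below trains an add-one (Laplace) smoothed bigram model on tokenized human-written statements and keeps the top k candidates by per-token log-probability. The length normalization, the smoothing choice, and the names `BigramLM` and `top_k` are assumptions for illustration; a neural language model could be substituted.

```python
import math
from collections import Counter
from typing import List


class BigramLM:
    """Add-one smoothed bigram model over tokenized statements."""

    def __init__(self, corpus: List[List[str]]):
        self.unigrams: Counter = Counter()  # counts of each token used as a context
        self.bigrams: Counter = Counter()   # counts of (previous, next) token pairs
        for tokens in corpus:
            padded = ["<s>"] + tokens + ["</s>"]
            self.unigrams.update(padded[:-1])
            self.bigrams.update(zip(padded, padded[1:]))
        self.vocab_size = len({w for t in corpus for w in t} | {"</s>"})

    def logprob(self, tokens: List[str]) -> float:
        """Natural-log probability of the token sequence, including the end marker."""
        padded = ["<s>"] + tokens + ["</s>"]
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.unigrams[a] + self.vocab_size))
            for a, b in zip(padded, padded[1:])
        )


def top_k(candidates: List[List[str]], lm: BigramLM, k: int = 3) -> List[List[str]]:
    """S1001: keep the k candidates with the highest per-token log-probability."""
    return sorted(candidates, key=lambda c: lm.logprob(c) / (len(c) + 1), reverse=True)[:k]
```

Dividing by the number of bigrams keeps longer statements from being penalized merely for their length; the default `k=3` matches the three statements output in S1001.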
- FIG. 11 is a flowchart showing the operation of the paraphrase unit 109.
- In S1100, broken anaphoric references are corrected. Specifically, for each sentence of the statement, coreference analysis is run on the source article using OpenNLP, mentioned above. From the results, the nouns or proper nouns that pronouns in the statement's sentences refer to are found and substituted for the pronouns.
- Next, conjunctions are supplied. For each pair of consecutive sentences in the statement, any connective at the start of the second sentence is removed first. A conjunction is then predicted by an SVM whose feature vector is the bag-of-words vector of the first sentence concatenated with the bag-of-words vector of the second.
- Phrases containing proper nouns are also deleted: in sentences labeled as claims by the sentence rearrangement unit 107, any phrase containing a proper noun is removed.
- The output unit 110 presents the statement, the final output of the system, to the user on a display or similar device.
- Besides showing it on a display, the system can also output it as synthesized speech.
- In a debate, the affirmative and negative sides state their opinions aloud, so speech output can give the user a more realistic experience.
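The conjunction step described above, removing a leading connective and building the concatenated bag-of-words features, can be sketched as below. The connective list, the whitespace tokenization, and the function names are hypothetical; the SVM itself is omitted, and `pair_features` only produces the vector that would be fed to such a classifier.

```python
from typing import List

# Hypothetical connective list; the text does not enumerate one.
CONNECTIVES = {"however", "therefore", "moreover", "also", "furthermore"}


def strip_leading_connective(sentence: str) -> str:
    """Remove a connective at the start of the subsequent sentence and re-capitalize."""
    words = sentence.split()
    if words and words[0].rstrip(",").lower() in CONNECTIVES:
        rest = " ".join(words[1:])
        return rest[:1].upper() + rest[1:]
    return sentence


def bow(sentence: str, vocab: List[str]) -> List[int]:
    """Bag-of-words counts over a fixed vocabulary (punctuation stripped crudely)."""
    tokens = [t.strip(".,").lower() for t in sentence.split()]
    return [tokens.count(w) for w in vocab]


def pair_features(prev: str, nxt: str, vocab: List[str]) -> List[int]:
    """Concatenate BoW(previous sentence) and BoW(next sentence) into one feature vector."""
    return bow(prev, vocab) + bow(nxt, vocab)
```

Stripping the original connective before prediction keeps the classifier from simply copying a conjunction that may no longer fit once sentences from different articles are placed side by side.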
- The sentence generation system described in the present embodiment outputs an opinion statement on an agenda and comprises: an input unit to which the agenda is input; an agenda analysis unit that analyzes the agenda and determines its polarity and the keyword used for searching; a search unit that searches for articles using the keyword and issue words indicating the issues in the discussion; an issue determination unit that determines the issue to be used when generating the opinion statement; a sentence extraction unit that extracts sentences describing the issue from the articles output by the search unit; a sentence rearrangement unit that generates texts by rearranging the sentences; an evaluation unit that evaluates the texts; a paraphrase unit that inserts appropriate conjunctions into the texts; and an output unit that outputs, as the opinion statement, the text with the highest evaluation among the plurality of texts.
- The sentence generation method described in the present embodiment outputs an opinion statement on an agenda and comprises: a first step in which the agenda is input; a second step of analyzing the agenda and determining its polarity and the keyword used for searching; a third step of searching for articles using the keyword and issue words indicating the issues in the discussion; a fourth step of determining the issue to be used when generating the opinion statement; a fifth step of extracting sentences describing the issue from the articles output in the third step; a sixth step of generating texts by rearranging the sentences; a seventh step of evaluating the texts; an eighth step of inserting appropriate conjunctions into the texts; and a ninth step of outputting, as the opinion statement, the text with the highest evaluation among the plurality of texts.
- Reference signs: 100 … generation system, 101 … data management system, 102 … input unit, 103 … agenda analysis unit, 104 … search unit, 105 … issue determination unit, 106 … sentence extraction unit, 107 … sentence rearrangement unit, 108 … evaluation unit, 109 … paraphrase unit, 110 … output unit, 111 … interface unit, 112 … text data database, 113 … text annotation data database, 114 … search index database, 115 … issue ontology database.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
[First Embodiment]
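The sentence generation system according to the first embodiment of the present invention is described below. This system consists of a generation system combining nine modules and a data management system. As an example of a specific hardware configuration, as shown in FIG. 12, it comprises an input device 1202, an output device 1202, a memory 1205 storing programs that execute each module, and a storage device 1207 containing the text data DB, the text annotation data DB 113, and so on.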
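Here, because the agenda is negative toward “casino”, suppression words are used as the issue words. FIG. 3 lists several suppression words; in the search, “casino”, “… Searching with the issue words combined with the keyword in this way retrieves articles that discuss whether casinos are good or bad. With only the keyword extracted from the agenda, the search results would include many articles that need not be considered in a debate, such as promotional articles about casinos or blog posts that describe only impressions of a casino visit, so an appropriate search cannot be performed.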
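Score = (number of occurrences of the noun phrase extracted from the agenda)
 + (number of occurrences of words in the issue ontology)
 - (age of the article)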
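Here, the age of an article is measured from the most recent year: if the latest year is 2014, an article published in 2014 has age 0, one published in 2013 has age 1, and one published in 2012 has age 2. Next, in S504, the 100 highest-scoring articles are output. Raising the score of articles in which these words occur frequently makes it possible to find articles closely related to the agenda and the issues. Scoring the age of articles as well makes it possible to find articles that reflect more recent data, which increases the persuasiveness of the final output text.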
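The score formula and the article-age rule described above can be sketched as follows. Assumptions: articles are already tokenized, the noun phrase is matched as a single token (multi-word phrases and synonyms are ignored), and the `Article` type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Article:
    tokens: List[str]  # tokenized article text (tokenization assumed)
    year: int          # publication year


def article_score(article: Article, noun_phrase: str,
                  ontology_words: Set[str], latest_year: int) -> int:
    """Score = noun-phrase occurrences + issue-ontology word occurrences - article age."""
    np_count = article.tokens.count(noun_phrase)
    onto_count = sum(1 for t in article.tokens if t in ontology_words)
    age = latest_year - article.year  # latest year -> 0, one year older -> 1, ...
    return np_count + onto_count - age


def top_articles(articles: List[Article], noun_phrase: str,
                 ontology_words: Set[str], latest_year: int, n: int = 100) -> List[Article]:
    """S504: keep the n highest-scoring articles."""
    return sorted(articles,
                  key=lambda a: article_score(a, noun_phrase, ontology_words, latest_year),
                  reverse=True)[:n]
```

Because age enters linearly with weight 1, one extra year of age cancels exactly one additional keyword occurrence; a different weighting would be a design choice not covered by the description.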
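101 … data management system,
102 … input unit,
103 … agenda analysis unit,
104 … search unit,
105 … issue determination unit,
106 … sentence extraction unit,
107 … sentence rearrangement unit,
108 … evaluation unit,
109 … paraphrase unit,
110 … output unit,
111 … interface unit,
112 … text data database,
113 … text annotation data database,
114 … search index database,
115 … issue ontology database.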
Claims (10)
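- A sentence generation system that outputs an opinion statement on an agenda, comprising:
an input unit to which the agenda is input;
an agenda analysis unit that analyzes the agenda and determines a polarity of the agenda and a keyword used for searching;
a search unit that searches for articles using the keyword and an issue word indicating an issue in a discussion;
an issue determination unit that determines the issue to be used when generating the opinion statement;
a sentence extraction unit that extracts, from the articles output by the search unit, sentences describing the issue;
a sentence rearrangement unit that generates texts by rearranging the sentences;
an evaluation unit that evaluates the texts;
a paraphrase unit that inserts an appropriate conjunction into the texts; and
an output unit that outputs, as the opinion statement, the text with the highest evaluation among the plurality of texts. - The sentence generation system according to claim 1,
wherein the issue determination unit determines the issue for each article by classifying the articles output by the search unit. - The sentence generation system according to claim 1, further comprising:
a storage unit that stores text data of the articles searched by the search unit,
annotation data attached to the text data,
a search index generated from the text data and the annotation data, and
an issue ontology that associates the issue with suppression words, which are words meaning suppression of the issue, and issue promotion words, which are words meaning promotion of the issue; and
an interface unit that exchanges data with the search unit, the issue determination unit, the sentence extraction unit, the sentence rearrangement unit, the evaluation unit, and the paraphrase unit. - The sentence generation system according to claim 3,
wherein the agenda analysis unit determines, by determining the polarity of the agenda, whether to use the suppression words or the promotion words as the keyword. - The sentence generation system according to claim 3,
wherein the storage unit further stores an evaluation model, and
the evaluation unit computes a likelihood of each of the plurality of texts under the evaluation model and outputs the text with the highest likelihood as the opinion statement. - A sentence generation method for outputting an opinion statement on an agenda, comprising:
a first step in which the agenda is input;
a second step of analyzing the agenda and determining a polarity of the agenda and a keyword used for searching;
a third step of searching for articles using the keyword and an issue word indicating an issue in a discussion;
a fourth step of determining the issue to be used when generating the opinion statement;
a fifth step of extracting, from the articles output in the third step, sentences describing the issue;
a sixth step of generating texts by rearranging the sentences;
a seventh step of evaluating the texts;
an eighth step of inserting an appropriate conjunction into the texts; and
a ninth step of outputting, as the opinion statement, the text with the highest evaluation among the plurality of texts. - The sentence generation method according to claim 6,
wherein, in the fourth step, the issue is determined for each article by classifying the articles output in the third step. - The sentence generation method according to claim 6,
wherein, in the third step, a search is performed on a storage unit that stores:
text data of the articles to be searched,
annotation data attached to the text data,
a search index generated from the text data and the annotation data, and
an issue ontology that associates the issue with suppression words, which are words meaning suppression of the issue, and issue promotion words, which are words meaning promotion of the issue. - The sentence generation method according to claim 8,
wherein, in the second step, whether to use the suppression words or the promotion words as the keyword is determined by determining the polarity of the agenda. - The sentence generation method according to claim 8,
wherein the storage unit further stores an evaluation model, and
in the seventh step, a likelihood of each of the plurality of texts under the evaluation model is computed, and the text with the highest likelihood is output as the opinion statement.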
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2016551422A JP6466952B2 (ja) | 2014-10-01 | 2014-10-01 | 文章生成システム |
US15/507,390 US10496756B2 (en) | 2014-10-01 | 2014-10-01 | Sentence creation system |
CN201480080943.8A CN106663087B (zh) | 2014-10-01 | 2014-10-01 | 文章生成系统 |
EP14903477.9A EP3203383A4 (en) | 2014-10-01 | 2014-10-01 | Text generation system |
PCT/JP2014/076237 WO2016051551A1 (ja) | 2014-10-01 | 2014-10-01 | 文章生成システム |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2014/076237 WO2016051551A1 (ja) | 2014-10-01 | 2014-10-01 | 文章生成システム |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016051551A1 (ja) | 2016-04-07 |
Family
ID=55629642
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2014/076237 WO2016051551A1 (ja) | 2014-10-01 | 2014-10-01 | 文章生成システム |
Country Status (5)
Country | Link |
---|---|
US (1) | US10496756B2 (ja) |
EP (1) | EP3203383A4 (ja) |
JP (1) | JP6466952B2 (ja) |
CN (1) | CN106663087B (ja) |
WO (1) | WO2016051551A1 (ja) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109885821A (zh) * | 2019-03-05 | 2019-06-14 | 中国联合网络通信集团有限公司 | 基于人工智能的文章撰写方法及装置、计算机存储介质 |
WO2019160098A1 (ja) * | 2018-02-16 | 2019-08-22 | 日本電信電話株式会社 | 議論構造拡張装置、議論構造拡張方法、プログラム、及びデータ構造 |
JP2019215825A (ja) * | 2018-06-14 | 2019-12-19 | 株式会社日立製作所 | 情報処理装置および情報処理方法 |
WO2020137696A1 (ja) * | 2018-12-26 | 2020-07-02 | 日本電信電話株式会社 | 発話文生成モデル学習装置、発話文収集装置、発話文生成モデル学習方法、発話文収集方法、及びプログラム |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10546063B2 (en) * | 2016-12-13 | 2020-01-28 | International Business Machines Corporation | Processing of string inputs utilizing machine learning |
WO2018230551A1 (ja) * | 2017-06-16 | 2018-12-20 | 新日鉄住金ソリューションズ株式会社 | 情報処理装置、情報処理方法及びプログラム |
CN108694160B (zh) | 2018-05-15 | 2021-01-22 | 北京三快在线科技有限公司 | 文章生成方法、设备及存储介质 |
KR102242392B1 (ko) * | 2019-04-26 | 2021-04-20 | 주식회사 엔씨소프트 | 퀴즈 생성 장치 및 퀴즈 생성 방법 |
CN110096710B (zh) * | 2019-05-09 | 2022-12-30 | 董云鹏 | 一种文章分析及自论证的方法 |
CN110245339B (zh) * | 2019-06-20 | 2023-04-18 | 北京百度网讯科技有限公司 | 文章生成方法、装置、设备和存储介质 |
CN110717041B (zh) * | 2019-09-19 | 2023-10-03 | 太极计算机股份有限公司 | 一种案件检索方法及系统 |
US11361759B2 (en) * | 2019-11-18 | 2022-06-14 | Streamingo Solutions Private Limited | Methods and systems for automatic generation and convergence of keywords and/or keyphrases from a media |
US11443211B2 (en) | 2020-01-08 | 2022-09-13 | International Business Machines Corporation | Extracting important sentences from documents to answer hypothesis that include causes and consequences |
CN111859982B (zh) * | 2020-06-19 | 2024-04-26 | 北京百度网讯科技有限公司 | 语言模型的训练方法、装置、电子设备及可读存储介质 |
CN113609263B (zh) * | 2021-09-30 | 2022-01-25 | 网娱互动科技(北京)股份有限公司 | 一种文章自动生成方法和系统 |
KR20240055290A (ko) * | 2022-10-20 | 2024-04-29 | 주식회사 아이팩토리 | 자연어 생성 모델을 이용하여 텍스트를 자동으로 생성하는 기능을 갖는 문서 작성 장치, 방법, 컴퓨터 프로그램, 컴퓨터로 판독 가능한 기록매체, 서버 및 시스템 |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7930302B2 (en) * | 2006-11-22 | 2011-04-19 | Intuit Inc. | Method and system for analyzing user-generated content |
US8239189B2 (en) * | 2008-02-26 | 2012-08-07 | Siemens Enterprise Communications Gmbh & Co. Kg | Method and system for estimating a sentiment for an entity |
US20090265307A1 (en) * | 2008-04-18 | 2009-10-22 | Reisman Kenneth | System and method for automatically producing fluent textual summaries from multiple opinions |
US8332394B2 (en) | 2008-05-23 | 2012-12-11 | International Business Machines Corporation | System and method for providing question and answers with deferred type evaluation |
CN101620596B (zh) * | 2008-06-30 | 2012-02-15 | 东北大学 | 一种面向查询的多文档自动摘要方法 |
CN101667194A (zh) * | 2009-09-29 | 2010-03-10 | 北京大学 | 基于用户评论文本特征的自动摘要方法及其自动摘要系统 |
CN102262632B (zh) * | 2010-05-28 | 2014-03-19 | 国际商业机器公司 | 进行文本处理的方法和系统 |
CN102279846A (zh) * | 2010-06-10 | 2011-12-14 | 英业达股份有限公司 | 文章辅助写作系统及其方法 |
CN101980196A (zh) * | 2010-10-25 | 2011-02-23 | 中国农业大学 | 文章比对方法与装置 |
CN103917968A (zh) * | 2011-08-15 | 2014-07-09 | 平等传媒有限公司 | 用于管理具有交互式评论流的评论网络的系统和方法 |
2014
- 2014-10-01: EP14903477.9A, patent EP3203383A4 (en), Ceased
- 2014-10-01: CN201480080943.8A, patent CN106663087B (zh), Active
- 2014-10-01: US15/507,390, patent US10496756B2 (en), Active
- 2014-10-01: JP2016551422A, patent JP6466952B2 (ja), Active
- 2014-10-01: PCT/JP2014/076237, patent WO2016051551A1 (ja), Application Filing
Non-Patent Citations (4)
Title |
---|
AKIHIRO SAITO ET AL.: "Natural Sentence Generation for Serendipitous Question Answering Systems", REPORTS OF THE 67TH MEETING OF SPECIAL INTEREST GROUP ON SPOKEN LANGUAGE UNDERSTANDING AND DIALOGUE PROCESSING (SIG-SLUD-B203), 25 January 2013 (2013-01-25), pages 1 - 6, XP008184898, ISSN: 0918-5682 * |
ATSUSHI FUJII: "Opinion Reader: A System for Summarizing and Visualizing Subjective Information Towards Supporting Decision Making", THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS, vol. J91-D, no. 2, 1 February 2008 (2008-02-01), pages 459 - 470, XP008184175, ISSN: 1880-4535 * |
See also references of EP3203383A4 * |
YUKI ARAI: "Opinion Retrieval from Blogs using Topic Dependence Opinion Models", SYMPOSIUM ON DATABASES AND WEB INFORMATION SYSTEMS, IPSJ SYMPOSIUM SERIES, vol. 2007, no. 3, 27 November 2007 (2007-11-27), pages 1 - 7, XP008184173, ISSN: 1882-0840 * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019160098A1 (ja) * | 2018-02-16 | 2019-08-22 | 日本電信電話株式会社 | 議論構造拡張装置、議論構造拡張方法、プログラム、及びデータ構造 |
JP2019144692A (ja) * | 2018-02-16 | 2019-08-29 | 日本電信電話株式会社 | 議論構造拡張装置、議論構造拡張方法、プログラム、及びデータ構造 |
JP2019215825A (ja) * | 2018-06-14 | 2019-12-19 | 株式会社日立製作所 | 情報処理装置および情報処理方法 |
JP7117168B2 (ja) | 2018-06-14 | 2022-08-12 | 株式会社日立製作所 | 情報処理装置および情報処理方法 |
WO2020137696A1 (ja) * | 2018-12-26 | 2020-07-02 | 日本電信電話株式会社 | 発話文生成モデル学習装置、発話文収集装置、発話文生成モデル学習方法、発話文収集方法、及びプログラム |
JP2020106905A (ja) * | 2018-12-26 | 2020-07-09 | 日本電信電話株式会社 | 発話文生成モデル学習装置、発話文収集装置、発話文生成モデル学習方法、発話文収集方法、及びプログラム |
JP7156010B2 (ja) | 2018-12-26 | 2022-10-19 | 日本電信電話株式会社 | 発話文生成モデル学習装置、発話文収集装置、発話文生成モデル学習方法、発話文収集方法、及びプログラム |
CN109885821A (zh) * | 2019-03-05 | 2019-06-14 | 中国联合网络通信集团有限公司 | 基于人工智能的文章撰写方法及装置、计算机存储介质 |
Also Published As
Publication number | Publication date |
---|---|
EP3203383A1 (en) | 2017-08-09 |
US10496756B2 (en) | 2019-12-03 |
JP6466952B2 (ja) | 2019-02-06 |
CN106663087B (zh) | 2019-08-16 |
EP3203383A4 (en) | 2018-06-20 |
CN106663087A (zh) | 2017-05-10 |
US20170286408A1 (en) | 2017-10-05 |
JPWO2016051551A1 (ja) | 2017-06-01 |
Similar Documents
Publication | Title |
---|---|
JP6466952B2 (ja) | 文章生成システム |
CN109241538B (zh) | 基于关键词和动词依存的中文实体关系抽取方法 |
CN104636466B (zh) | 一种面向开放网页的实体属性抽取方法和系统 |
KR101136007B1 (ko) | 문서 감성 분석 시스템 및 그 방법 |
JP6676109B2 (ja) | 発話文生成装置とその方法とプログラム |
Al-Ghadhban et al. | Arabic sarcasm detection in Twitter |
Qiu et al. | Advanced sentiment classification of Tibetan microblogs on smart campuses based on multi-feature fusion |
JP6830971B2 (ja) | 文章生成のためのデータを生成するシステム及び方法 |
CN114528919A (zh) | 自然语言处理方法、装置及计算机设备 |
Das et al. | Sentiment analysis of movie reviews using POS tags and term frequencies |
Lynch et al. | The translator’s visibility: Detecting translatorial fingerprints in contemporaneous parallel translations |
Keersmaekers | A computational approach to the Greek papyri: Developing a corpus to study variation and change in the post-classical Greek complementation system |
Bassa et al. | GerIE-An Open Information Extraction System for the German Language. |
Duarte | Sentiment analysis on Twitter for the Portuguese language |
Diamantini et al. | Semantic disambiguation in a social information discovery system |
JP5697164B2 (ja) | 対象文から直接的に導出できないカテゴリのタグを付与するタグ付けプログラム、装置、方法及びサーバ |
Boonpa et al. | Relationship extraction from Thai children's tales for generating illustration |
Siddiqui | Sarcasm detection from Twitter database using text mining algorithms |
JP5506482B2 (ja) | 固有表現抽出装置、文字列−固有表現クラス対データベース作成装置、固有表現抽出方法、文字列−固有表現クラス対データベース作成方法、プログラム |
Muralidharan et al. | Analyzing ELearning platform reviews using sentimental evaluation with SVM classifier |
Yadav et al. | Design of sentiment analysis system for Hindi content |
JP5744150B2 (ja) | 発話生成装置、方法、及びプログラム |
Alotaibi | Sentiment analysis in Arabic: An overview |
Chen et al. | Microblog User Emotion Analysis Method Based on Improved Hierarchical Attention Mechanism and BiLSTM |
Sheikh et al. | Implementing Sentiment Analysis on Real-Time Twitter Data |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 14903477; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2016551422; Country of ref document: JP; Kind code of ref document: A |
REEP | Request for entry into the European phase | Ref document number: 2014903477; Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 2014903477; Country of ref document: EP |
WWE | WIPO information: entry into national phase | Ref document number: 15507390; Country of ref document: US |
NENP | Non-entry into the national phase | Ref country code: DE |