TWI737101B - Question-answering learning method and question-answering learning system using the same and computer program product thereof - Google Patents


Info

Publication number
TWI737101B
TWI737101B
Authority
TW
Taiwan
Prior art keywords
sentences
marked
sentence
module
complementary
Prior art date
Application number
TW108148096A
Other languages
Chinese (zh)
Other versions
TW202125342A (en)
Inventor
沈民新
范耀中
洪惠蘭
Original Assignee
財團法人工業技術研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 財團法人工業技術研究院 filed Critical 財團法人工業技術研究院
Priority to TW108148096A priority Critical patent/TWI737101B/en
Publication of TW202125342A publication Critical patent/TW202125342A/en
Application granted granted Critical
Publication of TWI737101B publication Critical patent/TWI737101B/en

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the question-answering learning method includes the following steps. A plurality of classifiers are created according to N1 labeled sentences among N sentences. Each classifier determines the sentence type of each of the N2 unlabeled sentences among the N sentences. According to the degree of inconsistency among the sentence types determined by the classifiers, N3 sentences are selected from the N2 unlabeled sentences, wherein the sentence types the classifiers determine for each of the N3 sentences are inconsistent. N4 identified sentences to be labeled, which are complementary to one another, are then selected from the N3 sentences. After the N4 identified sentences are labeled, a plurality of updated classifiers are re-created according to the N1 labeled sentences and the N4 identified sentences. At least one of the classifiers from before the re-creation is added to the updated classifiers as a member of the updated classifiers.

Description

Question-and-answer learning method, question-and-answer learning system using the same, and computer program product thereof

The present disclosure relates to a learning method and a learning system using the same, and in particular to a question-and-answer learning method and a question-and-answer learning system using the same.

In conventional question-answering learning methods, a large number of sentences are usually classified manually, and a question-answering classification model is then built from that classification. Question-answering systems are also commonly called automatic question-answering systems, dialogue systems, conversation systems, automatic customer-service systems, customer-service robots, text-based interactive assistants, instant-messaging bots, and so on. In subsequent question-answering learning, new sentences that the classification model cannot classify are usually all handed over to human annotators to label their sentence types (i.e., to provide a corresponding answer for each new sentence). Such a method, however, consumes a large amount of manual labor, and the resulting question-answering accuracy does not necessarily improve steadily.

Therefore, reducing manual processing time while steadily improving question-answering accuracy has become one of the goals of practitioners in this technical field.

An embodiment of the present disclosure provides a question-and-answer learning method, which includes the following steps. A classifier generation module builds a classifier module from the N1 labeled sentences among N sentences; the classifier module contains a plurality of classifiers, each representing a different question-answering classification model, where N and N1 are positive integers. Each of the classifiers determines the sentence type of each of the N2 unlabeled sentences among the N sentences, where N2 is a positive integer. In a consistency evaluation step, a consistency evaluation module selects N3 sentences from the N2 unlabeled sentences according to the degree of consistency among the classifiers' judgments, where the classifiers' judgments on each of the N3 sentences are inconsistent and N3 is a positive integer. In a complementarity evaluation step, a complementarity evaluation module selects, from the N3 sentences, N4 sentences that are complementary to one another as the confirmed sentences to be labeled, where N4 is a positive integer. After the N4 confirmed sentences are labeled, the classifier generation module re-creates the classifiers of the classifier module from the N1 labeled sentences and the N4 confirmed sentences. Finally, a classifier evaluation module adds at least one of the pre-rebuild classifiers to the classifier module as a member of the classifier module.

Another embodiment of the present disclosure provides a question-and-answer learning system. The question-and-answer learning system includes a classifier generation module, a consistency evaluation module, a complementarity evaluation module, and a classifier evaluation module. The classifier generation module is configured to build a classifier module from the N1 labeled sentences among N sentences; the classifier module contains a plurality of classifiers, each representing a different question-answering classification model, where N and N1 are positive integers. Each of the classifiers determines the sentence type of each of the N2 unlabeled sentences among the N sentences, where N2 is a positive integer. The consistency evaluation module is configured to, in a consistency evaluation step, select N3 sentences from the N2 unlabeled sentences according to the degree of consistency among the classifiers' judgments, where the classifiers' judgments on each of the N3 sentences are inconsistent and N3 is a positive integer. The complementarity evaluation module is configured to, in a complementarity evaluation step, select from the N3 sentences N4 sentences that are complementary to one another as the confirmed sentences to be labeled, where N4 is a positive integer. The classifier generation module is further configured to, after the N4 confirmed sentences are labeled, re-create the classifiers of the classifier module from the N1 labeled sentences and the N4 confirmed sentences. The classifier evaluation module is configured to add at least one of the pre-rebuild classifiers to the classifier module as a member of the classifier module.

Another embodiment of the present disclosure provides a computer program product. The computer program product is loaded into a question-and-answer learning system to execute a question-and-answer learning method that includes the following steps. A classifier generation module builds a classifier module from the N1 labeled sentences among N sentences; the classifier module contains a plurality of classifiers, each representing a different question-answering classification model, where N and N1 are positive integers. Each of the classifiers determines the sentence type of each of the N2 unlabeled sentences among the N sentences, where N2 is a positive integer. In a consistency evaluation step, a consistency evaluation module selects N3 sentences from the N2 unlabeled sentences according to the degree of consistency among the classifiers' judgments, where the classifiers' judgments on each of the N3 sentences are inconsistent and N3 is a positive integer. In a complementarity evaluation step, a complementarity evaluation module selects, from the N3 sentences, N4 sentences that are complementary to one another as the confirmed sentences to be labeled, where N4 is a positive integer. After the N4 confirmed sentences are labeled, the classifier generation module re-creates the classifiers of the classifier module from the N1 labeled sentences and the N4 confirmed sentences. Finally, a classifier evaluation module adds at least one of the pre-rebuild classifiers to the classifier module as a member of the classifier module.

For a better understanding of the above and other aspects of the present disclosure, embodiments are described in detail below with reference to the accompanying drawings:

100: question-and-answer learning system

110: classifier generation module

120: classifier module

130: consistency evaluation module

140: complementarity evaluation module

150: database

160: classifier evaluation module

C1, C2: classifiers

S110~S160, S141~S149: steps

N, N1, N2, N3, N4: quantities

qi, q1~q100: sentences

FIG. 1 is a functional block diagram of a question-and-answer learning system according to an embodiment of the present disclosure.

FIG. 2 is a flowchart of an embodiment of the question-and-answer learning method of the question-and-answer learning system of FIG. 1.

FIG. 3 is a schematic diagram of sentence-complementarity judgment according to an embodiment of the present disclosure.

FIG. 4 is a flowchart of sentence-complementarity judgment according to an embodiment of the present disclosure.

FIG. 5 is a schematic diagram of an embodiment of classifier re-creation in the question-and-answer learning system 100 of FIG. 1.

Referring to FIG. 1, which shows a functional block diagram of a question-and-answer learning system 100 according to an embodiment of the present disclosure, the question-and-answer learning system 100 includes a classifier generation module 110, a classifier module 120, a consistency evaluation module 130, a complementarity evaluation module 140, a database 150, and a classifier evaluation module 160.

At least one of the classifier generation module 110, the classifier module 120, the consistency evaluation module 130, the complementarity evaluation module 140, and the classifier evaluation module 160 may be implemented as one or a combination of a chip formed by a semiconductor process, a circuit, a circuit board, and a recording medium storing program code. At least two of these modules may be integrated into a single module, or at least one of them may be integrated into a processor or a controller. In addition, the database 150 may be stored in a storage module, such as a memory.

The classifier generation module 110 builds a classifier module 120 from the N1 labeled sentences among the N sentences qi, where the classifier module 120 contains a plurality of classifiers C1, each representing a different question-answering classification model, and N and N1 are positive integers. Each classifier determines the sentence type of each of the N2 unlabeled sentences among the N sentences, where N2 is a positive integer. In a consistency evaluation step, the consistency evaluation module 130 selects N3 sentences from the N2 unlabeled sentences according to the degree of consistency among the judgments of the classifiers C1, where the judgments of the classifiers C1 on each of the N3 sentences are inconsistent and N3 is a positive integer. In a complementarity evaluation step, the complementarity evaluation module 140 selects, from the N3 sentences, N4 sentences that are complementary to one another as the confirmed sentences to be labeled, where N4 is a positive integer. The content words and/or meanings of the N4 sentences are, for example, mutually non-repetitive, mutually dissimilar, mutually non-entailing, and/or mutually non-generating. After the N4 confirmed sentences are labeled, the classifier generation module 110 re-creates a plurality of classifiers C2 of the classifier module 120 from the N1 labeled sentences and the N4 confirmed sentences (an embodiment of the classifiers C2 is shown in FIG. 5). The classifier evaluation module 160 adds at least one of the pre-rebuild classifiers C1 to the classifier module 120 as a member of the classifier module 120. In one embodiment, N1 is less than N, N2 is less than N, N3 is not greater than N2, and N4 is not greater than N3; however, the embodiments of the present disclosure are not limited thereto. In addition, a "sentence" herein is, for example, a natural-language declarative sentence, a natural-language question, a spoken sentence, an interrogative sentence, a direct statement, or a sentence of any other grammatical form or pattern.
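The ensemble-update step described above (re-creating the classifiers and then keeping at least one pre-rebuild classifier C1 as a member of the classifier module) can be sketched as follows. The disclosure does not specify how the retained classifier is chosen, so ranking the old members by validation accuracy is an assumption of this sketch, and the function names are hypothetical:

```python
def accuracy(clf, validation):
    """Fraction of (sentence, label) validation pairs the classifier gets right."""
    return sum(clf(sentence) == label for sentence, label in validation) / len(validation)

def update_ensemble(old_classifiers, new_classifiers, validation, keep=1):
    """Return the re-created classifiers C2 plus `keep` retained members of C1.

    Assumption: the pre-rebuild classifiers are ranked by validation accuracy
    and the best ones are retained; the patent only requires that at least one
    pre-rebuild classifier be added back as a member.
    """
    ranked = sorted(old_classifiers,
                    key=lambda clf: accuracy(clf, validation),
                    reverse=True)
    return list(new_classifiers) + ranked[:keep]
```

One design consequence of retaining a pre-rebuild member is that an update can never discard all previously learned behavior at once, which matches the disclosure's claim of stable accuracy across iterative updates.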

In addition, the classifier generation module 110 may be trained using neural network (NN), deep neural network (DNN), or support vector machine (SVM) techniques. The question-and-answer learning system 100 can actively pick a small number of sentences from the unlabeled sentences, have them labeled manually, and then feed them back into the classifier generation module 110 for re-learning; based on the manual labeling results, the classifier generation module 110 builds at least one classifier C2. In other words, the question-and-answer learning method of the question-and-answer learning system 100 is a machine active-learning method.
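As a rough illustration of how a committee of classifiers might be built from the labeled sentences, the following dependency-free sketch trains several 1-nearest-neighbour members on bootstrap resamples so that they can disagree on uncertain inputs. The disclosure contemplates NN, DNN, or SVM models; the 1-NN stand-in, the Jaccard similarity, and all names here are assumptions of the sketch:

```python
import random

def tokens(sentence):
    """Crude whitespace tokenizer (a stand-in for real text preprocessing)."""
    return set(sentence.lower().split())

def similarity(a, b):
    """Jaccard overlap between the word sets of two sentences."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / max(1, len(ta | tb))

def make_1nn(train):
    """1-nearest-neighbour classifier: predict the label of the closest example."""
    def classify(sentence):
        best_sentence, best_label = max(train, key=lambda ex: similarity(sentence, ex[0]))
        return best_label
    return classify

def build_committee(labeled, n_members=3, seed=0):
    """Build several classifiers, each on a bootstrap resample of the labeled
    sentences, so the members represent different classification models."""
    rng = random.Random(seed)
    committee = []
    for _ in range(n_members):
        sample = [rng.choice(labeled) for _ in labeled]
        committee.append(make_1nn(sample))
    return committee
```

Bootstrap resampling is one common way to obtain committee members that genuinely differ; in the patented system each member would instead be a separately trained NN/DNN/SVM model.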

Compared with labeling all N sentences, in the question-and-answer learning system of the disclosed embodiments a human annotator only needs to label the sentence types of the N4 sentences selected by the question-and-answer learning system 100 (for example, by providing a response to each sentence, such as an answer sentence, an answer, a function menu, a pattern, and/or a picture), where the value of N4 is smaller than the value of N, which saves a great deal of manual labeling time. Moreover, because the N4 sentences selected by the system have low consistency with, and high complementarity to, previously labeled sentences, the probability of manually re-labeling sentences of an already-covered sentence type is reduced, and the overall question-answering accuracy of the system 100 is improved. Furthermore, because at least one pre-rebuild classifier is added as a member of the rebuilt classifier module, the question-answering accuracy can keep improving steadily across iterative updates, reducing the chance that an update degrades system performance and thereby making the question-answering system easier to manage.

For example, in one case the value of N is 10000, N1 is 100, N2 is, for example, 100 sentences among the remaining 9900 (N − N1 = 9900), N3 is 8, and N4 is 3. In this example, a human annotator only needs to label 3 sentences (the value of N4); because these 3 sentences have low consistency with, and high complementarity to, previously labeled sentences, the labels are meaningful and effectively improve the overall question-answering accuracy of the system 100. The embodiments of the present disclosure do not limit the aforementioned values of N, N1, N2, N3, and N4, which may be larger or smaller.

The process by which the question-and-answer learning system 100 selects the N4 sentences is further explained below. Referring to FIG. 2, a flowchart of an embodiment of the question-and-answer learning method of the question-and-answer learning system 100 of FIG. 1 is shown.

In step S110, the classifier generation module 110 builds the classifier module 120 from the N1 labeled sentences among the N sentences, where the classifier module 120 contains a plurality of classifiers C1, each representing a different question-answering classification model, and N and N1 are positive integers with N1 less than N. In other words, the active-learning question-answering method of the disclosed embodiments need not label all N sentences one by one; it only needs to select and label a much smaller number of meaningful sentences (namely, the N4 sentences that can improve question-answering accuracy).

In step S120, each classifier C1 determines the sentence type of each of the N2 unlabeled sentences among the N sentences, where N2 is a positive integer less than N.

Taking N2 = 100 as an example, each classifier C1 determines the sentence type of each of the unlabeled sentences q1~q100. Taking sentence q1 as an example, each classifier C1 judges its sentence type: when all classifiers C1 unanimously judge that sentence q1 belongs to the same sentence type, sentence q1 is defined as consistent (or fully consistent); when not all classifiers C1 agree that sentence q1 belongs to the same sentence type, sentence q1 is defined as inconsistent.

In step S130, the consistency evaluation step, the consistency evaluation module 130 selects N3 sentences from the N2 unlabeled sentences according to the degree of consistency among the judgments of the classifiers C1, where the judgments of the classifiers C1 on each of the N3 sentences are inconsistent, and N3 is a positive integer not greater than N2. For a given sentence, the more distinct sentence types the classifiers C1 predict for it, the higher the sentence's inconsistency (the lower its consistency); conversely, the fewer distinct sentence types they predict, the lower its inconsistency (the higher its consistency).
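Step S130 can be sketched by counting how many distinct sentence types the committee predicts for each unlabeled sentence and keeping only the disputed ones, most inconsistent first. Treating each classifier as a callable and using the distinct-type count as the inconsistency score are assumptions of this sketch (the disclosure only requires that the selected sentences receive inconsistent judgments):

```python
def inconsistency(committee, sentence):
    """Number of distinct sentence types the committee predicts.

    A value of 1 means all classifiers agree; larger values mean lower
    consistency, matching the description above.
    """
    return len({clf(sentence) for clf in committee})

def select_inconsistent(committee, unlabeled, n3):
    """Keep only sentences the committee disagrees on, most inconsistent first,
    and return at most n3 of them (the N3 sentences of step S130)."""
    disputed = [s for s in unlabeled if inconsistency(committee, s) > 1]
    disputed.sort(key=lambda s: inconsistency(committee, s), reverse=True)
    return disputed[:n3]
```

The sort here also prepares the ordering that step S141 needs, since the complementarity evaluation walks the N3 sentences from lowest to highest consistency.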

The following description takes as an example the case in which 8 sentences, q1~q8, among the unlabeled sentences q1~q100 are inconsistent.

In step S140, the complementarity evaluation step, the complementarity evaluation module 140 selects, from the N3 sentences q1~q8, N4 sentences that are complementary to one another as the confirmed sentences to be labeled, where N4 is a positive integer not greater than N3. Depending on how complementary the N3 sentences q1~q8 are to one another, the number of confirmed sentences to be labeled that are actually selected may be equal to or less than N3.

For example, referring to FIGS. 3 and 4, FIG. 3 is a schematic diagram of sentence-complementarity judgment according to an embodiment of the present disclosure, and FIG. 4 is a flowchart of sentence-complementarity judgment according to an embodiment of the present disclosure.

In step S141, the complementarity evaluation module 140 sorts the N3 sentences by their degree of consistency. For example, as shown in FIG. 3, the complementarity evaluation module 140 arranges the sentences from lowest to highest consistency as q4, q6, q3, q8, q5, q2, q7, and q1.

Then, in step S142, as shown in FIG. 3, in the first selection round the complementarity evaluation module 140 picks the most inconsistent sentence, q4, as a confirmed sentence to be labeled.

Then, in step S143, the complementarity evaluation module 140 sets the initial value of i to 2.

Then, in step S144, in the second selection round, the complementarity evaluation module 140 compares the complementarity between the sentence with the next-highest inconsistency among the N3 sentences and each confirmed sentence to be labeled. For example, the complementarity evaluation module 140 compares the i-th sentence (the one with the i-th highest inconsistency), q6, against the confirmed sentence q4.

Then, in step S145, the complementarity evaluation module 140 judges whether the sentence with the next-highest (e.g., i-th highest) inconsistency is complementary to every confirmed sentence to be labeled. For example, the module judges whether sentence q6 is complementary to the confirmed sentence q4; when it is, the flow proceeds to step S146. When the sentence with the next-highest inconsistency is not complementary to every confirmed sentence, the complementarity evaluation module 140 does not add it as a member of the confirmed sentences to be labeled. For example, when sentence q6 is not complementary to the confirmed sentence q4, the module does not take q6 as a confirmed sentence (e.g., it ignores q6), and the flow proceeds to step S147.

In step S146, when the sentence with the next-highest (e.g., i-th highest) inconsistency is complementary to every confirmed sentence to be labeled, the complementarity evaluation module 140 adds it as a member of the confirmed sentences. For example, as shown in FIG. 3, when sentence q6 is complementary to the confirmed sentence q4, the module selects q6 as a confirmed sentence to be labeled.

Then, in step S147, the complementarity evaluation module 140 judges whether the number of confirmed sentences has reached N4 or whether no sentences remain to be considered (for example, i already equals N3). If either condition holds, the flow proceeds to step S149; otherwise, the flow proceeds to step S148, where the value of i is incremented, and then returns to step S144 to judge the complementarity of the next sentence.

As shown in FIG. 3, because the second selection round yields only 2 confirmed sentences, fewer than N4 (e.g., 3), the complementarity judgment continues with the next sentence. For example, after incrementing i (i = 3), the complementarity evaluation module 140 compares the i-th sentence (the one with the 3rd-highest inconsistency), q3, against each of the confirmed sentences q4 and q6. As shown in the figure, although q3 is complementary to q4, it is not complementary to the other confirmed sentence q6, so the complementarity evaluation module 140 ignores (or discards) q3 and does not add it as a confirmed sentence to be labeled.

As shown in FIG. 3, because the third selection round still yields only 2 confirmed sentences, fewer than N4 (N4 being, e.g., 3), the complementarity judgment continues with the next sentence. For example, after incrementing i (i = 4), the complementarity evaluation module 140 compares the i-th sentence (the one with the 4th-highest inconsistency), q8, against each of the confirmed sentences q4 and q6. As shown in the figure, because q8 is complementary to both confirmed sentences q4 and q6, the module selects q8 as a confirmed sentence to be labeled.

As shown in FIG. 3, in the fourth selection round the number of confirmed sentences has reached N4 (e.g., 3), or no sentences remain to be considered, so the flow proceeds to step S149 and the complementarity judgment stops. A human annotator then only needs to label these confirmed sentences. Compared with the larger quantities N, N1, N2, and N3, only the relatively small number of N4 confirmed sentences must be labeled manually, which saves a great deal of processing time. Moreover, because these confirmed sentences have high complementarity to (or are complementary with) and low consistency with (or are inconsistent with) the other labeled sentences, adding them to the question-and-answer learning system 100 effectively and steadily improves question-answering accuracy.
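The selection loop of steps S141–S149 can be sketched as a single greedy pass over the sentences already ranked by decreasing inconsistency. The pairwise complementarity test is passed in as a function, since the disclosure allows several ways to compute it; the function and variable names are assumptions of this sketch:

```python
def select_complementary(ranked, is_complementary, n4):
    """Greedy pass of steps S141-S149.

    Walk the sentences in order of decreasing inconsistency, keep a sentence
    only if it is complementary to every sentence kept so far, and stop once
    n4 confirmed sentences are collected (or the list is exhausted).
    """
    chosen = []
    for sentence in ranked:
        if not chosen:
            chosen.append(sentence)                       # S142: take the most inconsistent first
        elif all(is_complementary(sentence, c) for c in chosen):
            chosen.append(sentence)                       # S146: complementary to all confirmed
        # otherwise S145 falls through: the sentence is ignored
        if len(chosen) == n4:                             # S147: quota reached
            break
    return chosen
```

With the FIG. 3 ordering q4, q6, q3, q8, … and the complementarity relations described above (q3 complementary to q4 but not to q6; q8 complementary to both), this pass confirms q4, q6, and q8 and skips q3, matching the example.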

The method of judging whether two sentences are complementary is described below.

二語句係互補表示二語句彼此沒有重複的資訊量，亦即，內容字詞及/或文意係互不重複、互不相似、互不蘊涵及/或互不生成。資訊量的重複程度值係根據不同文字分析方式而採取二元值(以0表示不蘊涵，而以1表示蘊涵)、比例數值或機率值。例如，利用文字蘊涵識別二語句q1與q2間的邏輯推理關係，若語句q1可推論出語句q2的完整意思(即語句q1語義蘊涵語句q2)，表示語句q1的資訊量已包含語句q2的資訊量，則二語句q1與q2為不互補，相反的即為互補。另外，二語句的重複資訊量可利用重複字詞量測，若語句q1的字詞和語句q2的字詞重複比例愈多時(例如是60%，然本發明實施例不受此限)，表示二語句q1與q2的重複資訊量高，則二語句q1與q2的互補程度愈低(例如是40%，然本發明實施例不受此限)。此外，字詞重複分析更可透過近義詞、反義詞、關連詞、相似詞、語義詞網、本體詞網、專有名詞識別、詞嵌入等方式擴充，亦即，若語句q1與語句q2的語意相似度愈高，表示二語句q1與q2的互補程度愈低。或者，二語句的重複資訊量可利用語言模型量測，若語句q1的語言模型可生成語句q2的機率值愈高(例如是60%，然本發明實施例不受此限)，表示二語句q1與q2的重複資訊量高，則二語句q1與q2的互補程度愈低(例如是40%，然本發明實施例不受此限)。 Two sentences being complementary means that they carry no redundant information relative to each other; that is, their content words and/or meanings do not repeat, resemble, entail, or generate one another. Depending on the text-analysis method, the redundancy value may be a binary value (0 for non-entailment, 1 for entailment), a ratio, or a probability. For example, textual entailment can identify the logical inference relationship between two sentences q1 and q2: if sentence q1 can infer the complete meaning of sentence q2 (that is, q1 semantically entails q2), the information in q1 already covers the information in q2, so q1 and q2 are not complementary; otherwise they are complementary. Alternatively, the redundant information between two sentences can be measured by word repetition: the higher the proportion of words shared by q1 and q2 (for example, 60%, although the embodiments of the disclosure are not limited thereto), the more redundant information the two sentences share and the lower their complementarity (for example, 40%, although the embodiments are not limited thereto). Word-repetition analysis can further be extended with synonyms, antonyms, related words, similar words, semantic word nets, ontology word nets, named-entity recognition, word embeddings, and so on; that is, the higher the semantic similarity between q1 and q2, the lower their complementarity. As another alternative, redundancy can be measured with a language model: the higher the probability that a language model of q1 generates q2 (for example, 60%, although the embodiments are not limited thereto), the more redundant information the two sentences share and the lower their complementarity (for example, 40%, although the embodiments are not limited thereto).
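As a minimal sketch of the word-overlap measure above (a high repeat ratio implying low complementarity), one could score complementarity as one minus the word-overlap ratio. The naive whitespace tokenization below is an assumption for illustration; a real system would use Chinese word segmentation plus the synonym/embedding expansions mentioned in the text.

```python
def complementarity(q1: str, q2: str) -> float:
    """1 minus the word-overlap (Jaccard) ratio of two sentences."""
    w1, w2 = set(q1.split()), set(q2.split())
    if not w1 or not w2:
        return 1.0  # an empty sentence shares nothing with the other
    repeat_ratio = len(w1 & w2) / len(w1 | w2)
    return 1.0 - repeat_ratio

# Mostly repeated words -> low complementarity score.
print(complementarity("taipei 101 has a bus stop",
                      "taipei 101 has a metro stop"))
```

Identical sentences score 0.0 (fully redundant) and sentences with no shared words score 1.0 (fully complementary), matching the proportional interpretation in the text.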

以語句q1為「台北101有公車站」，而語句q2為「可以搭公共交通到達台北101」來說，語句q1蘊涵語句q2，其中二者的識別結果可以1表示蘊涵(二元值)(資訊重複，即為不互補)，或以機率值例如是90%表示蘊含的可能性高(資訊重複程度高，即為互補程度低)。 Take sentence q1 as "Taipei 101 has a bus stop" and sentence q2 as "Taipei 101 can be reached by public transportation": q1 entails q2. The recognition result can be expressed as the binary value 1, indicating entailment (redundant information, hence not complementary), or as a probability value such as 90%, indicating a high likelihood of entailment (high redundancy, hence low complementarity).

綜上，二語句的互補程度可依據下列之一或組合方式判斷：(1).二語句中字詞的重複程度、字詞的同義詞或近義詞或反義詞或關聯詞的重複程度、字詞的關聯詞網的重複程度、字詞的詞義相近程度、字詞的上下位本體概念的相似程度、字詞的關聯詞的相似程度、字詞的關聯詞網的圖相似程度；(2).二語句中片語、子句或專有名詞的重複程度；(3).二語句中片語、子句或專有名詞的相似程度；(4).二語句詞嵌入(詞向量)的相似程度；(5).二語句的句型的相似程度；(6).二語句的語意相似程度；(7).二語句的蘊涵關係；(8).二語句的蘊涵機率；(9).二語句的語言模型的相似程度。 In summary, the complementarity of two sentences can be judged by one or a combination of the following: (1) the degree of word repetition between the two sentences, including repetition of synonyms, near-synonyms, antonyms, or related words, overlap of related-word networks, closeness of word senses, similarity of hypernym/hyponym ontology concepts, similarity of related words, and graph similarity of related-word networks; (2) the degree of repetition of phrases, clauses, or proper nouns; (3) the similarity of phrases, clauses, or proper nouns; (4) the similarity of word embeddings (word vectors); (5) the similarity of sentence patterns; (6) the semantic similarity of the two sentences; (7) the entailment relationship between the two sentences; (8) the entailment probability of the two sentences; (9) the similarity of the language models of the two sentences.

在一實施例中，互補程度評估模組140可依據語句的同一概念(或同義)詞與句型來判斷二語句的互補程度。例如，當二語句分別為「如何去台北」及「如何去天龍國」時，由於「台北」及「天龍國」屬於同一概念(或同義)詞且二語句的句型相同，因此互補程度評估模組140將此二語句判斷為不互補(互補程度低)。在另一例子中，當二語句分別為「如何去台北」及「台北有什麼活動」時，由於二語句的句型不同，因此互補程度評估模組140將此二語句判斷為互補(互補程度高)。綜上，互補程度評估模組140係採用句子文意分析技術判斷二語句的互補程度。 In one embodiment, the complementarity evaluation module 140 can judge the complementarity of two sentences from their same-concept (or synonymous) words and their sentence patterns. For example, when the two sentences are "How to get to Taipei" and "How to get to Tianlongguo", since "Taipei" and "Tianlongguo" are same-concept (or synonymous) words and the two sentences share the same pattern, the complementarity evaluation module 140 judges them as not complementary (low complementarity). In another example, when the two sentences are "How to get to Taipei" and "What events are there in Taipei", the sentence patterns differ, so the complementarity evaluation module 140 judges them as complementary (high complementarity). In short, the complementarity evaluation module 140 uses sentence-meaning analysis techniques to judge the complementarity of two sentences.
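The same-concept-word plus sentence-pattern check in this embodiment can be sketched as follows. The tiny concept lexicon and the English stand-ins for the example sentences are illustrative assumptions, not part of the patent.

```python
# Hypothetical concept lexicon: both place names map to one concept tag.
CONCEPTS = {"taipei": "<PLACE>", "tianlongguo": "<PLACE>"}

def pattern(sentence: str) -> tuple:
    """Replace same-concept words with their tag to expose the sentence pattern."""
    return tuple(CONCEPTS.get(w, w) for w in sentence.lower().split())

def complementary_by_pattern(q1: str, q2: str) -> bool:
    """Identical patterns mean the same question template: not complementary."""
    return pattern(q1) != pattern(q2)

print(complementary_by_pattern("how to go to Taipei",
                               "how to go to Tianlongguo"))  # → False
print(complementary_by_pattern("how to go to Taipei",
                               "what events are in Taipei"))  # → True
```

The first pair normalizes to the same "how to go to &lt;PLACE&gt;" template and is judged not complementary; the second pair differs in pattern and is judged complementary, matching the two examples in the text.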

接著，在第2圖之步驟S150中，請同時參照第5圖，其繪示第1圖之問答學習系統100的分類器更新的示意圖。在人工標記所選之N4個確選待標記語句後，分類器產生模組110依據已標記之N1個語句與N4個確選待標記語句，重新建立分類器模組120之數個分類器C2。此外，分類器產生模組110可預先將前一個疊代次(每重建一批分類器的過程稱為一個疊代次)的分類器C1儲存在資料庫150中。 Next, in step S150 of FIG. 2, please also refer to FIG. 5, which is a schematic diagram of the classifier update of the question-answering learning system 100 of FIG. 1. After the selected N4 confirmed sentences are manually marked, the classifier generation module 110 rebuilds the several classifiers C2 of the classifier module 120 according to the marked N1 sentences and the N4 confirmed sentences. In addition, the classifier generation module 110 may first store the classifiers C1 of the previous iteration (each round of rebuilding a batch of classifiers is called one iteration) in the database 150.

接著，在步驟S160中，如第5圖所示，分類器評估模組160將此些重建前之分類器C1之至少一者加入分類器模組120中，以做為分類器模組120之成員。換言之，分類器評估模組160可將前一個疊代次所產生的數個分類器C1之至少一者加入此些目前疊代次所產生的數個分類器C2中，以作為分類器模組120之數個分類器的成員。由於不同分類器表示不同的問答分類模型，因此加入前一個疊代次所產生的數個分類器C1能夠將新確選待標記語句對於問答準確率的影響縮減在分類器C1未涵蓋的範圍，降低更新分類器後問答準確度效能不穩定或下降的可能性，達到穩定提升問答學習系統100的問答準確率。 Next, in step S160, as shown in FIG. 5, the classifier evaluation module 160 adds at least one of the pre-rebuild classifiers C1 to the classifier module 120 as a member of the classifier module 120. In other words, the classifier evaluation module 160 may add at least one of the several classifiers C1 produced in the previous iteration to the several classifiers C2 produced in the current iteration, as members of the classifier module 120. Since different classifiers represent different question-answering classification models, adding classifiers C1 from the previous iteration confines the impact of the newly confirmed sentences on question-answering accuracy to the range not covered by the classifiers C1, lowering the chance that accuracy becomes unstable or drops after the classifiers are updated, and thus steadily improving the question-answering accuracy of the question-answering learning system 100.
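A minimal sketch of a voting classifier module that retains previous-iteration members when rebuilt is shown below. The class and method names are illustrative, and the lambdas stand in for trained classifiers, which in the actual system would be full question-answering classification models.

```python
from collections import Counter

class EnsembleQA:
    """Classifier module sketch: members vote on a sentence type, and each
    rebuild may keep the previous round's members to stabilize accuracy."""

    def __init__(self, members):
        self.members = list(members)

    def predict(self, sentence):
        # Majority vote over all current members.
        votes = Counter(member(sentence) for member in self.members)
        return votes.most_common(1)[0][0]

    def rebuild(self, new_members, keep_previous=True):
        # New classifiers C2 plus (optionally) the retained C1 members.
        kept = list(self.members) if keep_previous else []
        self.members = list(new_members) + kept

# Two round-1 members vote "transport"; the rebuild adds one new member.
ens = EnsembleQA([lambda s: "transport", lambda s: "transport"])
ens.rebuild([lambda s: "event"])
print(len(ens.members), ens.predict("q"))  # → 3 transport
```

Because the two retained members outvote the single new one, the new sentence's influence stays confined, loosely mirroring the stabilizing effect described for step S160.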

以下說明分類器評估模組160加入前一個疊代次的至少一分類器C1的多個實施例。分類器評估模組160可根據分類器C1的分類正確率、疊代次數、疊代次數衰減率、分類器模組成員上限或下限值、保留每一疊代次所有分類器、或上述條件之組合。 The following describes several embodiments in which the classifier evaluation module 160 adds at least one classifier C1 from the previous iteration. The classifier evaluation module 160 may decide which classifiers to retain according to the classification accuracy of the classifiers C1, the number of iterations, a decay rate over iterations, an upper or lower bound on the number of classifier-module members, a rule of retaining all classifiers of every iteration, or a combination of the above conditions.

舉例來說，在一實施例中，分類器評估模組160決定分類器模組120成員的組成方式為：保留以前所有疊代次的所有分類器C1。例如，若第一疊代次的分類器模組成員有4個分類器，則第二疊代次有8個分類器，其中4個是第一疊代次的全部4個分類器，而第三疊代次的分類器模組成員則有12個分類器，其中4個是第一疊代次的4個分類器而另4個是第二疊代次的4個分類器。 For example, in one embodiment, the classifier evaluation module 160 determines the membership of the classifier module 120 as: retain all classifiers C1 of all previous iterations. For instance, if the classifier module of the first iteration has 4 classifiers, the second iteration has 8 classifiers, 4 of which are all 4 classifiers of the first iteration, and the classifier module of the third iteration has 12 classifiers, 4 of which are the 4 classifiers of the first iteration and another 4 of which are the 4 classifiers of the second iteration.

在另一實施例中，分類器評估模組160決定分類器模組120成員的組成方式為：只保留前一疊代次的所有分類器C1。例如，若第一疊代次的分類器模組成員有4個分類器，則第二疊代次有8個分類器，其中4個是第一疊代次的4個分類器，而第三疊代次的分類器模組成員仍為8個分類器，其中4個是第二疊代次的4個分類器。 In another embodiment, the classifier evaluation module 160 determines the membership of the classifier module 120 as: retain only all classifiers C1 of the immediately preceding iteration. For example, if the classifier module of the first iteration has 4 classifiers, the second iteration has 8 classifiers, 4 of which are the 4 classifiers of the first iteration, while the classifier module of the third iteration still has 8 classifiers, 4 of which are the 4 classifiers of the second iteration.

在其它實施例中，分類器評估模組160決定分類器模組120成員的組成方式為：只保留前一疊代次中分類器C1分類正確率為排序的前n名者，其中n為介於該前一疊代次的分類器數量的1%~50%之間的數量；或者，n為介於1~前一疊代次的分類器數量的任意正整數。舉例來說，若第一疊代次的分類器模組成員有4個分類器，則第二疊代次有6個分類器，其中2個是第一疊代次的4個分類器中分類正確率排序的前2名者。 In other embodiments, the classifier evaluation module 160 determines the membership of the classifier module 120 as: retain only the top n classifiers C1 of the previous iteration ranked by classification accuracy, where n is a number between 1% and 50% of the number of classifiers in the previous iteration; alternatively, n is any positive integer between 1 and the number of classifiers in the previous iteration. For example, if the classifier module of the first iteration has 4 classifiers, the second iteration has 6 classifiers, 2 of which are the top 2 by classification accuracy among the 4 classifiers of the first iteration.
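The top-n retention rule in this embodiment might be sketched as below. `ratio=0.5` is just one point in the 1%-50% range the embodiment allows, and the string classifiers stand in for real trained models.

```python
def retain_top_n(prev_classifiers, accuracies, ratio=0.5):
    """Keep the top-n previous-iteration classifiers by accuracy,
    where n is a fraction of the previous round's size (at least 1)."""
    n = max(1, int(len(prev_classifiers) * ratio))
    # Rank the previous round's classifiers by accuracy, best first.
    ranked = sorted(zip(prev_classifiers, accuracies),
                    key=lambda pair: pair[1], reverse=True)
    return [clf for clf, _ in ranked[:n]]

# Of the first iteration's 4 classifiers, the 2 most accurate survive.
print(retain_top_n(["c1", "c2", "c3", "c4"], [0.7, 0.9, 0.6, 0.8]))
# → ['c2', 'c4']
```

Adding these 2 survivors to 4 newly trained classifiers yields the 6-member second-iteration module of the example above.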

綜上，在本揭露實施例的問答學習方法中，人工只要針對相對少量確選待標記語句進行標記，因此能夠節省大量人工處理工時。此外，由於學習疊代過程中能保留前次之分類器，確選待標記語句相較於其它已標記語句的互補程度高(或互補)且一致性程度低(或不一致)，因此能夠確保每一疊代次所挑選的確選待標記語句在加入問答學習系統中後，都能穩定持續提升問答準確率。 In summary, in the question-answering learning method of the disclosed embodiments, only a relatively small number of confirmed sentences to be marked require manual marking, which saves a great deal of manual effort. Moreover, because classifiers from the previous round can be retained during the learning iterations, and the confirmed sentences have high complementarity with (or are complementary to) the other marked sentences and a low degree of consistency (or are inconsistent), each iteration's selected sentences, once added to the question-answering learning system, steadily and continuously improve question-answering accuracy.

綜上所述，雖然本揭露已以實施例揭露如上，然其並非用以限定本揭露。本揭露所屬技術領域中具有通常知識者，在不脫離本揭露之精神和範圍內，當可作各種之更動與潤飾。因此，本揭露之保護範圍當視後附之申請專利範圍所界定者為準。 In summary, although the disclosure has been described above by way of embodiments, they are not intended to limit the disclosure. Those with ordinary knowledge in the technical field to which the disclosure pertains may make various changes and modifications without departing from the spirit and scope of the disclosure. Therefore, the protection scope of the disclosure shall be defined by the appended claims.

100:問答學習系統 100: Question and Answer Learning System

110:分類器產生模組 110: Classifier generation module

120:分類器模組 120: classifier module

130:一致性程度評估模組 130: Consistency Evaluation Module

140:互補程度評估模組 140: Complementarity Evaluation Module

150:資料庫 150: database

160:分類器評估模組 160: classifier evaluation module

C1:分類器 C1: classifier

N1:數量 N1: Quantity

qi、q1~q100：語句 qi, q1~q100: sentences

Claims (17)

一種問答學習方法,包括:一分類器產生模組依據N個語句中已標記的N1個該語句建立一分類器模組,該分類器模組包含複數個分類器,各該分類器表示不同的問答分類模型,其中N及N1為正整數;該些分類器之各者判斷該N個語句中未標記之N2個該語句之每一者的所屬之語句類型,其中N2為正整數;於一致性程度評估步驟中,一致性程度評估模組依據該些分類器的判斷結果的一致性程度,從該未標記之N2個語句中挑選出N3個該語句,其中該些分類器對各該N3個語句的判斷結果係不一致,且N3為正整數;於互補程度評估步驟中,一互補程度評估模組從該N3個語句中挑選出彼此互補的N4個語句做為複數個確選待標記語句,其中N4為正整數;在標記該些確選待標記語句後,該分類器產生模組依據該已標記之N1個語句與該些確選待標記語句,重新建立該分類器模組之該些分類器;以及一分類器評估模組將該些重新建立前之分類器之至少一者加入該分類器模組中,以做為該分類器模組之成員;其中,互補的二語句彼此沒有重複的資訊量。 A question-and-answer learning method, including: a classifier generation module creates a classifier module based on N1 of the N sentences that have been marked. The classifier module includes a plurality of classifiers, each of which represents a different Question and answer classification model, where N and N1 are positive integers; each of these classifiers judges the sentence type of each of the N2 unmarked sentences in the N sentences, where N2 is a positive integer; in agreement In the sexual degree evaluation step, the consistency degree evaluation module selects N3 sentences from the unmarked N2 sentences according to the consistency degree of the judgment results of the classifiers, and the classifiers perform the same for each of the N3 sentences. The judgment results of the sentences are inconsistent, and N3 is a positive integer; in the complementary degree evaluation step, a complementary degree evaluation module selects N4 sentences complementary to each other from the N3 sentences as plural sentences to be marked for selection , Where N4 is a positive integer; after marking the selected sentences to be marked, the classifier generation module re-creates the classifier module based on the marked N1 sentences and the selected sentences to be marked Some classifiers; and a classifier evaluation module adds at least one of the classifiers before the re-creation to the classifier module as a member of the classifier module; wherein the two complementary sentences are each other There is no repetitive amount of information. 
如申請專利範圍第1項所述之問答學習方法,更包括: 依一致性程度由高至低的順序,排序該N3個語句。 For example, the question-and-answer learning method described in item 1 of the scope of patent application includes: The N3 sentences are sorted in the order of the degree of consistency from high to low. 如申請專利範圍第1項所述之問答學習方法,其中該互補程度評估步驟包括:該互補程度評估模組挑選該N3個語句中不一致性最高之該語句做為該些確選待標記語句之成員。 For example, the question and answer learning method described in item 1 of the scope of patent application, wherein the complementary degree evaluation step includes: the complementary degree evaluation module selects the sentence with the highest inconsistency among the N3 sentences as the selected sentence to be marked member. 如申請專利範圍第3項所述之問答學習方法,其中該互補程度評估步驟包括:該互補程度評估模組比較該N3個語句中不一致性次高之該語句與該些確選待標記語句之每一者的互補程度;當該不一致性次高之語句相較於該些確選待標記語句之每一者皆互補時,該互補程度評估模組將該不一致性次高之語句做為該些確選待標記語句之成員。 For example, the question and answer learning method described in item 3 of the scope of patent application, wherein the complementary degree evaluation step includes: the complementary degree evaluation module compares the sentence with the second highest inconsistency among the N3 sentences and the sentence to be marked. The degree of complementarity of each; when the sentence with the second highest inconsistency is complementary to each of the sentences to be marked for selection, the complementary degree evaluation module regards the sentence with the second highest inconsistency as the sentence These confirm the members of the sentence to be marked. 
如申請專利範圍第4項所述之問答學習方法,其中該互補程度評估步驟包括:當該不一致性次高之語句相較於該些確選待標記語句之任一者不互補時,該互補程度評估模組不將該不一致性次高之語句做為該些確選待標記語句之成員。 For example, the question-and-answer learning method described in item 4 of the scope of patent application, wherein the complementary degree evaluation step includes: when the sentence with the second highest inconsistency is not complementary to any one of the selected sentences to be marked, the complementary The degree evaluation module does not use the sentence with the second highest inconsistency as a member of the sentences to be marked. 如申請專利範圍第1項所述之問答學習方法,其中該互補程度評估步驟包括:(a)該互補程度評估模組挑選該N3個語句中不一致性最高之該語句做為該些確選待標記語句之成員; (b)該互補程度評估模組比較該N3個語句中不一致性第i高之該語句與該些確選待標記語句之每一者的互補程度,其中i的初始值等於2;(c)當該不一致性第i高之語句相較於該些確選待標記語句之每一者皆互補時,該互補程度評估模組將該不一致性第i高之語句做為該些確選待標記語句之成員;以及(d)當i之值不等於N4時,累加i之值,然後重複執行步驟(b)~(d)。 For example, the question and answer learning method described in item 1 of the scope of patent application, wherein the complementary degree evaluation step includes: (a) the complementary degree evaluation module selects the sentence with the highest inconsistency among the N3 sentences as the candidates for confirmation Mark the members of the sentence; (b) The complementarity evaluation module compares the complementarity of the i-th highest inconsistency among the N3 sentences with each of the sentences to be marked, where the initial value of i is equal to 2; (c) When the sentences with the i-th highest inconsistency are complementary to each of the sentences to be marked for selection, the complementary degree evaluation module uses the sentences with the i-th highest inconsistency as the sentences to be marked for selection. The members of the statement; and (d) when the value of i is not equal to N4, add up the value of i, and then repeat steps (b) ~ (d). 
如申請專利範圍第6項所述之問答學習方法,其中該互補程度評估步驟包括:該互補程度評估模組判斷該些確選待標記語句之數量是否已達N4;以及當該些確選待標記語句之數量達到N4或已無語句可挑選,該互補程度評估模組停止執行該互補程度評估步驟。 For example, the question-and-answer learning method described in item 6 of the scope of patent application, wherein the complementary degree evaluation step includes: the complementary degree evaluation module judges whether the number of selected sentences to be marked has reached N4; If the number of marked sentences reaches N4 or there are no sentences to choose from, the complementary degree evaluation module stops executing the complementary degree evaluation step. 如申請專利範圍第6項所述之問答學習方法,其中步驟(c)包括:分析該不一致性第i高之語句之文意與該些確選待標記語句之每一者之文意;以及當該不一致性第i高之語句之該文意與該些確選待標記語句之任一者之該文意互補時,該互補程度評估模組將該不一致性第i高之語句做為該些確選待標記語句之成員。 The question-and-answer learning method described in item 6 of the scope of patent application, wherein step (c) includes: analyzing the meaning of the sentence with the highest inconsistency and the meaning of each of the sentences to be marked; and When the context of the sentence with the i-th highest inconsistency is complementary to the context of any one of the selected sentences to be marked, the complementarity evaluation module uses the sentence with the i-th highest inconsistency as the sentence These confirm the members of the sentence to be marked. 一種問答學習系統,包括: 一分類器產生模組,用以依據N個語句中已標記的N1個該語句建立一分類器模組,該分類器模組包含複數個分類器,各該分類器表示不同的問答分類模型,其中N及N1為正整數;該些分類器之各者判斷該N個語句中未標記之N2個該語句之每一者的所屬之語句類型,其中N2為正整數;一致性程度評估模組用以:於一致性程度評估步驟中,依據該些分類器的判斷結果的一致性程度,從該未標記之N2個語句中挑選出N3個該語句,其中該些分類器對各該N3個語句的判斷結果係不一致,其中N3為正整數;一互補程度評估模組用以:於一互補程度評估步驟中,從該N3個語句中挑選出彼此互補的N4個語句做為複數個確選待標記語句,其中N4為正整數;其中,該分類器產生模組更用以:在標記該些確選待標記語句後,依據該已標記之N1個語句與該些確選待標記語句,重新建立該分類器模組之該些分類器;以及其中,該問答學習系統更包括:一分類器評估模組用以:將該些重新建立前之分類器之至少一者加入該分類器模組中,以做為該分類器模組之成員;其中,互補的二語句彼此沒有重複的資訊量。 A question-and-answer learning system, including: A classifier generation module is used to create a classifier module based on the marked N1 of the N sentences. 
The classifier module includes a plurality of classifiers, each of which represents a different question and answer classification model, Where N and N1 are positive integers; each of the classifiers judges the sentence type of each of the N2 unmarked sentences in the N sentences, where N2 is a positive integer; the consistency evaluation module Used for: In the consistency evaluation step, according to the consistency of the judgment results of the classifiers, N3 sentences are selected from the N2 unlabeled sentences, wherein the classifiers perform each of the N3 sentences The judgment results of sentences are inconsistent, where N3 is a positive integer; a complementary degree evaluation module is used to: in a complementary degree evaluation step, select N4 sentences that are complementary to each other from the N3 sentences as plural confirmations The sentence to be marked, where N4 is a positive integer; wherein, the classifier generation module is further used to: after marking the sentences to be marked for selection, according to the marked N1 sentences and the sentences to be marked for selection, Re-establish the classifiers of the classifier module; and wherein, the question and answer learning system further includes: a classifier evaluation module for adding at least one of the classifiers before the re-establishment to the classifier module In the group, as a member of the classifier module; among them, the two complementary sentences have no overlapping information. 如申請專利範圍第9項所述之問答學習系統,其中互補程度評估模組更用以:依不一致性由高至低的順序,排序該N3個語句。 For example, in the question-and-answer learning system described in item 9 of the scope of patent application, the complementary degree evaluation module is further used to sort the N3 sentences in the order of inconsistency from high to low. 
如申請專利範圍第9項所述之問答學習系統,其中互補程度評估模組更用以:挑選該N3個語句中不一致性最高之該語句做為該些確選待標記語句之成員。 For example, in the question and answer learning system described in item 9 of the scope of patent application, the complementary degree evaluation module is further used to select the sentence with the highest inconsistency among the N3 sentences as the members of the sentences to be marked. 如申請專利範圍第11項所述之問答學習系統,其中互補程度評估模組更用以:比較該N3個語句中不一致性次高之該語句與該些確選待標記語句之每一者的互補程度;當該不一致性次高之語句相較於該些確選待標記語句之每一者皆互補時,將該不一致性次高之語句做為該些確選待標記語句之成員。 For example, the question-and-answer learning system described in item 11 of the scope of patent application, wherein the degree of complementarity assessment module is further used to: compare the sentence with the second highest inconsistency among the N3 sentences with each of the sentences to be marked. Complementary degree; when the sentences with the second highest inconsistency are complementary to each of the sentences to be marked for selection, the sentences with the second highest inconsistency are used as members of the sentences to be marked for selection. 如申請專利範圍第12項所述之問答學習系統,其中互補程度評估模組更用以:當該不一致性次高之語句相較於該些確選待標記語句之任一者不互補時,不將該不一致性次高之語句做為該些確選待標記語句之成員。 For example, the question and answer learning system described in item 12 of the scope of patent application, wherein the degree of complementarity assessment module is further used: when the sentence with the second highest inconsistency is not complementary to any one of the selected sentences to be marked, The sentence with the second highest inconsistency is not used as a member of the sentences to be marked. 
如申請專利範圍第9項所述之問答學習系統,其中互補程度評估模組更用以:(a)挑選該N3個語句中不一致性最高之該語句做為該些確選待標記語句之成員;(b)比較該N3個語句中不一致性第i高之該語句與該些確選待標記語句之每一者的互補程度,其中i的初始值等於2; (c)當該不一致性第i高之語句相較於該些確選待標記語句之每一者皆互補時,將該不一致性第i高之語句做為該些確選待標記語句之成員;以及(d)當i之值不等於N4時,累加i之值,然後重複執行步驟(b)~(d)。 For example, the question-and-answer learning system described in item 9 of the scope of patent application, wherein the degree of complementarity assessment module is further used to: (a) select the sentence with the highest inconsistency among the N3 sentences as the members of the selected sentences to be marked (B) Compare the complementarity of the i-th highest inconsistency among the N3 sentences with each of the sentences to be marked, where the initial value of i is equal to 2; (c) When the i-th sentence with the highest inconsistency is complementary to each of the sentences to be marked for selection, use the sentence with the highest inconsistency as the member of the sentences to be marked for selection ; And (d) When the value of i is not equal to N4, accumulate the value of i, and then repeat steps (b) ~ (d). 如申請專利範圍第14項所述之問答學習系統,其中互補程度評估模組更用以:判斷該些確選待標記語句之數量是否已達N4或已無語句可挑選;以及當該些確選待標記語句之數量達到N4,停止執行該互補程度評估步驟。 For example, in the question-and-answer learning system described in item 14 of the scope of patent application, the complementary degree evaluation module is further used to determine whether the number of sentences to be marked has reached N4 or there are no sentences to be selected; and when the sentences are confirmed The number of selected sentences to be marked reaches N4, and the execution of the complementary degree evaluation step is stopped. 
如申請專利範圍第14項所述之問答學習系統，其中互補程度評估模組於步驟(c)中更用以：分析該不一致性第i高之語句之文意與該些確選待標記語句之每一者之文意；以及當該不一致性第i高之語句之該文意與該些確選待標記語句之任一者之該文意互補時，該互補程度評估模組將該不一致性第i高之語句做為該些確選待標記語句之成員。 The question-answering learning system described in claim 14, wherein in step (c) the complementarity evaluation module is further used to: analyze the meaning of the sentence with the i-th highest inconsistency and the meaning of each of the confirmed sentences to be marked; and, when the meaning of the sentence with the i-th highest inconsistency is complementary to the meaning of any of the confirmed sentences to be marked, take the sentence with the i-th highest inconsistency as a member of the confirmed sentences to be marked. 一種電腦程式產品，用以載入於一問答學習系統，以執行一問答學習方法，該問答學習方法包括：一分類器產生模組依據N個語句中已標記的N1個該語句建立一分類器
The question-and-answer learning method includes: a classifier generation module creates a classifier based on N1 of the N sentences that have been marked Module, the classifier module includes a plurality of classifiers, each of which represents a different question and answer classification model, where N and N1 are positive integers; Each of the classifiers judges the sentence type of each of the N2 unmarked sentences in the N sentences, where N2 is a positive integer; in the consistency evaluation step, the consistency evaluation module According to the consistency of the judgment results of the classifiers, N3 sentences are selected from the unlabeled N2 sentences, wherein the judgment results of the N3 sentences by the classifiers are inconsistent, and N3 is positive Integer; in the complementary degree evaluation step, a complementary degree evaluation module selects N4 sentences that are complementary to each other from the N3 sentences as plural sentences to be marked for confirmation, where N4 is a positive integer; in marking the confirmation After the sentences to be marked are selected, the classifier generation module re-creates the classifiers of the classifier module based on the marked N1 sentences and the sentences to be marked; and a classifier evaluation module will At least one of the classifiers before the re-creation is added to the classifier module as a member of the classifier module; wherein the two complementary sentences do not have overlapping information with each other.
TW108148096A 2019-12-27 2019-12-27 Question-answering learning method and question-answering learning system using the same and computer program product thereof TWI737101B (en)


Publications (2)

Publication Number Publication Date
TW202125342A TW202125342A (en) 2021-07-01
TWI737101B true TWI737101B (en) 2021-08-21

Family

ID=77908452


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7603330B2 (en) * 2006-02-01 2009-10-13 Honda Motor Co., Ltd. Meta learning for question classification
WO2014008272A1 (en) * 2012-07-02 2014-01-09 Microsoft Corporation Learning-based processing of natural language questions
CN109101537A (en) * 2018-06-27 2018-12-28 北京慧闻科技发展有限公司 More wheel dialogue data classification methods, device and electronic equipment based on deep learning
CN110321418A (en) * 2019-06-06 2019-10-11 华中师范大学 A kind of field based on deep learning, intention assessment and slot fill method


