TW201619850A - Question processing system and method thereof - Google Patents

Info

Publication number
TW201619850A
Authority
TW
Taiwan
Prior art keywords
question
candidate
natural language
processing
correction
Prior art date
Application number
TW103140400A
Other languages
Chinese (zh)
Other versions
TWI553491B (en)
Inventor
沈民新
邱中人
張如瑩
張俊盛
Original Assignee
財團法人工業技術研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 財團法人工業技術研究院
Priority to TW103140400A (TWI553491B)
Priority to CN201410782497.7A (CN105760359B)
Publication of TW201619850A
Application granted
Publication of TWI553491B

Abstract

A question processing system and method are provided. The system comprises a question construction module, which has a typo processing unit and a misused word processing unit, and a question category processing module. The typo processing unit detects and corrects typos or Martian language in a natural language question to generate a corrected question that conforms to the question intent of the natural language question. The misused word processing unit analyzes the collocation relation of at least two phrases of the corrected question and amends misused words in the corrected question to generate at least one candidate question that conforms to the question intent. The question category processing module analyzes the candidate question to generate its question category. The disclosure thereby provides good fault tolerance.

Description

Question processing system and method thereof

The disclosure relates to a question processing system and a method thereof, and in particular to a fault-tolerant question processing system and method.

Conventional search engines and question answering systems lack fault tolerance for questions. When a user enters a natural language question containing typos, Martian language, misused words, or missing words, the search engine or question answering system may misjudge the question intent of the natural language question and return a wrong answer to the user.

Figs. 1A to 1C show lists of prior-art natural language questions containing Martian language, sound-alike typos, and shape-alike typos, respectively. The questions in the figures contain many improper expressions (underlined), such as the Martian language (e.g., Zhuyin text) in Fig. 1A, the sound-alike typos in Fig. 1B, and the shape-alike typos in Fig. 1C. Such Martian language and typos markedly reduce the accuracy of the answers that a search engine or question answering system returns for the natural language question.

Figs. 2A to 2G show web pages on which a prior-art search engine or automatic question answering system provides answers to natural language questions containing a keyword group 11, a typo 13, Martian language 14, or a misused word 15.

In the search engine of Fig. 2A (e.g., Google), for a Chinese natural language question such as 「日本戰嶺台灣幾年」, the search engine can correct keyword group 11 「戰嶺」 to keyword group 12 「佔領」 ("occupy") and search directly for 「日本佔領台灣幾年」 ("how many years did Japan occupy Taiwan"), but it still cannot find the correct answer for 「幾年」 ("how many years").

In the search engine of Fig. 2B (e.g., Google), for a Chinese natural language question such as 「楊傳廣是那一足的」, the search engine cannot correct the typo 13 「那一足的」 and therefore cannot find an appropriate answer.

In the search engine of Fig. 2C (e.g., Google), for a Chinese natural language question such as 「鄭成功的ㄐㄩˋ點?」, the search engine likewise cannot correct the Martian language 14 (here Zhuyin text) 「ㄐㄩˋ」 and therefore cannot find the correct answer.

In the search engine of Fig. 2D (e.g., Google), for an English natural language question such as "rice plented", the search engine can correct keyword group 11 "rice plented" to "rice planted" and search for it, but it still cannot find an appropriate answer.

In the search engine of Fig. 2E (e.g., Google), for an English natural language question such as "whom is taiwan president", the search engine can search with "who" substituted for keyword group 11 "whom", but it still cannot find the correct answer.

In the automatic question answering system of Fig. 2F (e.g., WolframAlpha), for an English natural language question such as "where does rice live", the system cannot correct the misused word 15 "rice live" and therefore returns a wrong answer 16.

In the automatic question answering system of Fig. 2G (e.g., WolframAlpha), for an English natural language question such as "Where is the Taiwan President", the system likewise cannot correct the misused word 15 "Where" and therefore returns a wrong answer 16.

Therefore, how to overcome the above problems of the prior art has become an issue that urgently needs to be solved.

The disclosure provides a question processing system and method with good fault tolerance, so as to improve the accuracy of the answers returned for natural language questions.

The question processing system of the disclosure is applied to an electronic device having a processor, a memory, and an operating system. The system comprises a question construction module, which has a typo processing unit and a misused word processing unit, and a question category processing module. The typo processing unit detects and corrects typos or Martian language in a natural language question to generate a corrected question that conforms to the question intent of the natural language question. The misused word processing unit analyzes the collocation relation of at least two phrases of the corrected question and, based on that relation, corrects misused words in the corrected question to generate at least one candidate question that conforms to the question intent. The question category processing module analyzes the candidate question to generate the question category of the candidate question.

The question processing method of the disclosure is applied to an electronic device having a processor, a memory, and an operating system. The method comprises: detecting and correcting typos or Martian language in a natural language question to generate a corrected question that conforms to the question intent of the natural language question; analyzing the collocation relation of at least two phrases of the corrected question and, based on that relation, correcting misused words in the corrected question to generate at least one candidate question that conforms to the question intent; and analyzing the candidate question to generate the question category of the candidate question.

In the above question processing system and method, a missing word processing unit may analyze the missing words of the candidate question and retrieve at least one collocate from a corpus or a synonym/near-synonym thesaurus to supply the missing words of the corrected question, thereby generating the candidate question.

As can be seen from the above, the question processing system and method of the disclosure mainly use the typo processing unit, the misused word processing unit, and the missing word processing unit of the question construction module to correct, respectively, the typos, Martian language, misused words, and missing words of a natural language question, and use the question category processing module to analyze the question category of the natural language question.

The disclosure thereby provides good fault tolerance: it tolerates typos, Martian language, misused words, and missing words in the natural language question, reduces the impact of errors in analyzing the question intent, and thus improves the accuracy of the answers returned for the natural language question.

11, 12: keyword groups
13: typo
14: Martian language
15: misused word
16: answer
2: question processing system
20: user interface
21: question construction module
211: typo processing unit
211a: translation model
211b: language model
212: misused word processing unit
213: missing word processing unit
214: keyword group extraction unit
22: question category processing module
23: corpus
24: synonym/near-synonym thesaurus
25: knowledge base
26: passage retrieval module
261: document
262: passage
27: answer processing module
271: answer
41: natural language question
42: candidate question
43, 44: keyword groups
S31 to S36: steps

Figs. 1A to 1C show lists of prior-art natural language questions containing Martian language, sound-alike typos, and shape-alike typos, respectively; Figs. 2A to 2G show web pages on which a prior-art search engine or automatic question answering system provides answers to natural language questions containing keyword groups, typos, Martian language, or misused words; Fig. 3 is a block diagram of the question processing system of the disclosure; Fig. 4 is a flowchart of the question processing method of the disclosure; and Fig. 5 is a schematic diagram of an embodiment of the question processing system and method of the disclosure.

The following describes implementations of the disclosure by way of specific embodiments. Those skilled in the art can readily understand other advantages and effects of the disclosure from the content of this specification, and the disclosure may also be practiced or applied through other, different embodiments.

Fig. 3 is a block diagram of the question processing system 2 of the disclosure. As shown, the question processing system 2 can be applied to an electronic device having a processor, a memory, and an operating system, and mainly comprises a user interface (UI) 20, a question construction module 21 having a typo processing unit 211 and a misused word processing unit 212, and a question category processing module 22. The electronic device may be a personal computer, tablet computer, notebook computer, web server, cloud server, mobile phone, smartphone, or the like.

The user interface 20 allows a user to enter a natural language question, which may be in Chinese, English, or any other language. The typo processing unit 211 detects and corrects typos or Martian language in the natural language question to generate a corrected question that conforms to the question intent of the natural language question.

Specifically, the typo processing unit 211 may detect and correct the typos or Martian language of the natural language question by a typo detection method or a Martian language translation method, and may have a translation model 211a and a language model 211b.

The translation model 211a provides correction data for the typos or Martian language of the natural language question, for example: (1) sound-alike or shape-alike erroneous characters; (2) feature values of the characters in a phrase (such as same side component, same leading component, radical difference, radical stroke-count difference, side-component stroke-count difference, Zhuyin difference, or tone-mark difference); and (3) easily confused characters (e.g., 躁 vs. 燥).

The language model 211b corrects the typos or Martian language of the natural language question according to the correction data to generate the corrected question. The language model 211b may be an n-gram-based statistical language model (SLM), a neural network-based language model (NNLM), or the like. The language model 211b may have a decoder that converts a single Chinese character, a single Zhuyin symbol, or a complete string of Zhuyin symbols into the original character, a sound-alike/shape-alike character, or the character corresponding to the Zhuyin.

The typos mentioned above may be, for example, the sound-alike typos shown in prior-art Fig. 1B or the shape-alike typos shown in Fig. 1C, and may occur in the question words or function words of the natural language question. Function words may be determiners (e.g., "this", "a", "my"), pronouns (e.g., "you", "I", "he"), prepositions/postpositions (e.g., "on", "under", "for"), or conjunctions (e.g., "and", "or", "if"). The Martian language may be, for example, the Zhuyin text shown in prior-art Fig. 1A, or emoticon symbols (e.g., *, #, !).

For example, for a Chinese natural language question, the typo processing unit 211 can detect the typo 「舍」 and the Martian language 「ㄉ」 in the user's question 「水稻住在舍麼ㄉ」 and, according to the question intent, correct 「舍」 and 「ㄉ」 to the proper characters 「什」 and 「地」, respectively. That is, the natural language question 「水稻住在舍麼ㄉ」 is corrected to the corrected question 「水稻住在什麼地」 (literally, "what place does rice live in").
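One way to picture how a translation model and a language model could cooperate on this example is a small noisy-channel search, sketched below in Python. This is only an illustration under stated assumptions: the confusion table, the bigram scores, the back-off value, and the function names are all hypothetical and are not taken from the patent.

```python
from itertools import product

# Hypothetical confusion table standing in for translation model 211a:
# each observed symbol maps to candidate characters with a channel probability.
CONFUSIONS = {
    "舍": {"舍": 0.6, "什": 0.4},  # sound-alike pair
    "ㄉ": {"地": 0.5, "的": 0.5},  # Zhuyin symbol expanded to characters
}

# Hypothetical bigram scores standing in for language model 211b.
BIGRAM = {("什", "麼"): 0.9, ("舍", "麼"): 0.01, ("麼", "地"): 0.3, ("麼", "的"): 0.2}
DEFAULT = 0.05  # assumed back-off score for unseen bigrams


def lm_score(chars):
    """Multiply bigram scores over adjacent characters."""
    score = 1.0
    for a, b in zip(chars, chars[1:]):
        score *= BIGRAM.get((a, b), DEFAULT)
    return score


def correct(question):
    """Return the rewrite that maximizes channel probability times LM score."""
    options = [list(CONFUSIONS.get(ch, {ch: 1.0}).items()) for ch in question]
    best, best_score = question, 0.0
    for combo in product(*options):
        chars = [c for c, _ in combo]
        channel = 1.0
        for _, p in combo:
            channel *= p
        score = channel * lm_score(chars)
        if score > best_score:
            best, best_score = "".join(chars), score
    return best


print(correct("水稻住在舍麼ㄉ"))
```

The exhaustive search over all combinations is chosen for clarity; a real decoder would more likely use beam search or Viterbi decoding over the same scores.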

The misused word processing unit 212 analyzes the collocation relation of at least two phrases of the corrected question and, based on that relation, corrects the misused words of the corrected question to generate at least one candidate question that conforms to the question intent.

In detail, the misused word processing unit 212 analyzes whether the question intent conflicts with the context of at least two phrases of the corrected question. When a conflict occurs, it retrieves at least one first collocate from the corpus 23 or the synonym/near-synonym thesaurus 24 according to the context, and uses the first collocate to correct the misused word of the corrected question, thereby generating the candidate question, so that the context of the phrases of the candidate question is free of conflict and conforms to the question intent.

For example, the misused word processing unit 212 determines that, in the corrected question 「水稻住在什麼地」, the three phrases 「水稻」 ("rice"), 「住」 ("live"), and 「地」 ("place") collocate poorly and conflict with one another, because 「水稻」 is not normally used with 「住」 in the same question, so the phrase 「住」 must be a misused word. The misused word processing unit 212 can therefore, based on the collocation relation of the three phrases, retrieve at least one first collocate 「種植」 or 「栽種」 ("plant") to correct the phrase 「住」. That is, the corrected question 「水稻住在什麼地」 is amended to a candidate question that conforms to the question intent, such as 「水稻種植在什麼地」 or 「水稻栽植在什麼地」.
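The collocation check in this example can be sketched as a lookup of co-occurrence counts followed by a ranked substitution. The Python below is a minimal illustration only: the counts, the alternatives table, the `MIN_COUNT` threshold, and the function name are hypothetical stand-ins for the corpus 23 and thesaurus 24, whose contents the patent does not specify.

```python
# Hypothetical co-occurrence counts standing in for corpus 23.
COOCCUR = {
    ("水稻", "種植"): 120,
    ("水稻", "栽植"): 80,
    ("水稻", "住"): 0,
}

# Hypothetical replacement candidates standing in for thesaurus 24.
ALTERNATIVES = {"住": ["種植", "栽植", "居住"]}

MIN_COUNT = 5  # assumed threshold below which a pair counts as a conflict


def fix_misused(subject, verb):
    """Return replacement verbs, best first, or [verb] when there is no conflict."""
    if COOCCUR.get((subject, verb), 0) >= MIN_COUNT:
        return [verb]
    scored = [(COOCCUR.get((subject, alt), 0), alt) for alt in ALTERNATIVES.get(verb, [])]
    ranked = [alt for count, alt in sorted(scored, reverse=True) if count >= MIN_COUNT]
    return ranked or [verb]


# Build candidate questions from the corrected question.
candidates = [f"水稻{verb}在什麼地" for verb in fix_misused("水稻", "住")]
print(candidates)
```

Raw counts are used here for brevity; an association measure such as pointwise mutual information would be a common alternative.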

The question category processing module 22 analyzes the candidate question to generate the question category of the candidate question. The question category may be any of various types, such as person, event, time, place, object, quantity, quality, speed, height, or size.

The question construction module 21 may also have a missing word processing unit 213, which analyzes the missing words of the candidate question and retrieves at least one second collocate from the corpus 23 or the synonym/near-synonym thesaurus 24. The second collocate is used to supply the missing words of the corrected question and thereby generate the candidate question, so that the context of the phrases of the candidate question is complete and conforms to the question intent.

For example, the missing word processing unit 213 determines that, in the candidate question 「水稻種植在什麼地」 or 「水稻栽植在什麼地」, 「地」 should mean 「地方」 ("place") or 「地區」 ("region"). It therefore retrieves at least one second collocate 「地方」 or 「地區」 to amend 「地」 and appends a question mark 「?」, supplying the missing words of the corrected question to generate complete candidate questions. That is, the candidate question 「水稻種植在什麼地」 or 「水稻栽植在什麼地」 is amended to 「水稻種植在什麼地方?」, 「水稻種植在什麼地區?」, 「水稻栽植在什麼地方?」, 「水稻栽植在什麼地區?」, and so on.
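The completion step in this example amounts to expanding a truncated final word and appending a question mark. A minimal Python sketch follows; the `EXPANSIONS` table and the function name are hypothetical, and a real system would draw the expansions from the corpus 23 or the thesaurus 24 rather than from a hard-coded table.

```python
# Hypothetical expansion table standing in for corpus 23 / thesaurus 24.
EXPANSIONS = {"地": ["地方", "地區"]}


def complete(candidate):
    """Expand a truncated final word and append a question mark."""
    results = []
    for short, fulls in EXPANSIONS.items():
        if candidate.endswith(short):
            stem = candidate[: -len(short)]
            results.extend(f"{stem}{full}?" for full in fulls)
    # Fall back to the unchanged candidate when nothing applies.
    return results or [candidate]


for question in ["水稻種植在什麼地", "水稻栽植在什麼地"]:
    print(complete(question))
```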

The at least one candidate question may be a plurality of top-priority candidate questions, and the question category processing module 22 may analyze these top-priority candidate questions according to a question classification model and the knowledge base 25 to generate their question categories.

The question classification model may include hybrid approaches, regular expression rules, a machine learning classifier, a support vector machine (SVM), a maximum entropy classifier, a decision tree classifier, or the like.

The knowledge base 25 can provide data on the question categories corresponding to the top-priority candidate questions. For example, if the top-priority candidate question is 「至聖先師是哪一位?」 ("Who is the Greatest Sage and First Teacher?"), the knowledge base 25 gives the question category "person". The knowledge base 25 can also provide rules for the question categories of the top-priority candidate questions. For example, if the character 「有」 is followed or preceded by 「哪些人」, 「哪位」, or 「哪幾位」 (forms of "which people"), the knowledge base 25 gives the question category "person"; if the character 「要」 is followed by 「多久」 ("how long"), the knowledge base 25 gives the question category "time".
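The lookup entry and the two rules quoted above map naturally onto a dictionary plus regular expressions, one of the classification options the text lists. The Python sketch below is an assumption about how such data could be laid out, not the patent's data format; it also assumes that "followed or preceded by" means direct adjacency.

```python
import re

# Direct lookup entries, mirroring the knowledge-base example in the text.
KNOWN = {"至聖先師是哪一位?": "person"}

# Rules mirroring the two examples in the text (adjacency is an assumption).
WHO = "(?:哪些人|哪位|哪幾位)"
RULES = [
    (re.compile(f"有{WHO}|{WHO}有"), "person"),
    (re.compile("要多久"), "time"),
]


def classify(question):
    """Return a question category, or None when no entry or rule applies."""
    if question in KNOWN:
        return KNOWN[question]
    for pattern, category in RULES:
        if pattern.search(question):
            return category
    return None


print(classify("班上有哪些人"))
print(classify("到台北要多久"))
```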

The question category processing module 22 may re-rank the top-priority candidate questions according to their confidence scores and select, from among them, the one that exceeds a predetermined confidence score threshold and has the highest confidence score as the first-priority candidate question.

For example, the question category processing module 22 may re-rank the candidate questions 「水稻種植在什麼地方?」, 「水稻種植在什麼地區?」, 「水稻栽植在什麼地方?」, and 「水稻栽植在什麼地區?」, and take 「水稻栽植在什麼地區?」 ("In what region is rice planted?") as the first-priority candidate question.
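The re-ranking and threshold test can be sketched in a few lines of Python. The confidence scores and the threshold value below are invented for illustration; the patent does not say how the scores are computed or what threshold is used.

```python
def pick_first_priority(scored, threshold):
    """Sort (question, confidence) pairs and pick the top one above the threshold.

    Returns (ranked, best); best is None when no candidate exceeds the threshold.
    """
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    best = ranked[0][0] if ranked and ranked[0][1] > threshold else None
    return ranked, best


# Hypothetical confidence scores for the four candidate questions.
candidates = [
    ("水稻種植在什麼地方?", 0.71),
    ("水稻種植在什麼地區?", 0.78),
    ("水稻栽植在什麼地方?", 0.74),
    ("水稻栽植在什麼地區?", 0.83),
]

ranked, best = pick_first_priority(candidates, threshold=0.5)
print(best)
```

Returning `None` when nothing clears the threshold leaves it to the caller to decide what to do in that case, which the text does not describe.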

The question construction module 21 may have a keyword group extraction unit 214, which generates at least one keyword group or a question construction result from the first-priority candidate question. For example, from the first-priority candidate question 「水稻栽植在什麼地區?」, the keyword group extraction unit 214 can generate three keyword groups 「水稻」 ("rice"), 「栽種」 ("plant"), and 「地區」 ("region"), or a single question construction result 「水稻栽種地區」 ("rice planting region").
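A simple way to approximate this keyword extraction is to drop function and question words from a segmented question. The Python sketch below assumes the question has already been segmented into tokens and uses a hypothetical stop list; it also keeps the verb exactly as it appears in the question, so it does not reproduce any synonym substitution.

```python
# Assumed stop list of function words, question words, and punctuation.
STOP = {"在", "什麼", "?"}


def extract_keywords(tokens):
    """Keep the tokens that are not in the stop list, preserving order."""
    return [token for token in tokens if token not in STOP]


def build_query(tokens):
    """Join the remaining keywords into a single question construction result."""
    return "".join(extract_keywords(tokens))


tokens = ["水稻", "栽植", "在", "什麼", "地區", "?"]  # pre-segmented input (assumption)
print(extract_keywords(tokens))
print(build_query(tokens))
```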

The question processing system 2 may include a passage retrieval module 26 and an answer processing module 27. The passage retrieval module 26 retrieves, from at least one document 261, a passage 262 that matches the keyword groups or the question construction result of the first-priority candidate question. The answer processing module 27 extracts from the passage 262 an answer 271 that matches the question category of the first-priority candidate question, so that the answer 271 (optionally together with the passage 262) is displayed on the user interface 20.
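The two modules described here can be illustrated with keyword-overlap retrieval followed by a category-typed lookup. The Python below is a toy sketch: the sample passages, the typed lexicon, and both function names are made up for illustration, and the patent does not specify the retrieval or extraction algorithms.

```python
def retrieve(passages, keywords):
    """Return the passage containing the most keywords, or None if none match."""
    scored = [(sum(k in p for k in keywords), p) for p in passages]
    hits, best = max(scored, key=lambda item: item[0], default=(0, None))
    return best if hits > 0 else None


# Hypothetical typed lexicon used to find an answer of the right category.
LEXICON = {"place": ["嘉南平原", "台北"], "person": ["孔子"]}


def extract_answer(passage, category):
    """Return the first lexicon term of the given category found in the passage."""
    for term in LEXICON.get(category, []):
        if term in passage:
            return term
    return None


# Made-up sample passages standing in for passages of documents 261.
passages = ["水稻主要栽植在嘉南平原。", "小麥是一種穀物。"]
passage = retrieve(passages, ["水稻", "栽植", "地區"])
print(passage, extract_answer(passage, "place"))
```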

Fig. 4 is a flowchart of the question processing method of the disclosure, and Fig. 5 is a schematic diagram of an embodiment of the question processing system 2 and its method. Please also refer to the question processing system 2 of Fig. 3 above.

The question processing method of the disclosure can be applied to an electronic device having a processor, a memory, and an operating system; the electronic device may be a personal computer, tablet computer, notebook computer, web server, cloud server, mobile phone, smartphone, or the like. The method mainly comprises the following steps:

(1) As shown in step S31 of Fig. 4 and in Fig. 5, in a knowledge question answering system for elementary school students, the user first enters the natural language question 41 「水稻住在舍麼ㄉ」 through the user interface 20, and the question processing system 2 receives the natural language question 41. The method then proceeds to step S32.

(2) As shown in step S32 of Fig. 4, the typo processing unit 211 of the question construction module 21 detects and corrects the typos or Martian language of the natural language question 41 to generate a corrected question that conforms to the question intent of the natural language question 41.

Specifically, the typo processing unit 211 may detect and correct the typos or Martian language of the natural language question 41 by a typo detection method or a Martian language translation method. The typo processing unit 211 may have a translation model 211a and a language model 211b: the translation model 211a provides correction data for the typos or Martian language of the natural language question 41, and the language model 211b corrects them according to the correction data to generate the corrected question.

For example, the typo processing unit 211 can detect the typo 「舍」 and the Martian language 「ㄉ」 in the natural language question 「水稻住在舍麼ㄉ」 and, according to the question intent of the natural language question 41, correct 「舍」 and 「ㄉ」 to the proper characters 「什」 and 「地」, respectively. That is, the natural language question 「水稻住在舍麼ㄉ」 is corrected to the corrected question 「水稻住在什麼地」. The method then proceeds to step S33.

(3) As shown in step S33 of Fig. 4, the misused word processing unit 212 of the question construction module 21 analyzes the collocation relation of at least two phrases of the corrected question and corrects the misused words of the corrected question based on that relation. In addition, the missing word processing unit 213 of the question construction module 21 may supply the missing words of the corrected question to generate one or more top-priority candidate questions.

In detail, the misused word processing unit 212 analyzes whether the question intent conflicts with the context of at least two phrases of the corrected question. When a conflict occurs, it retrieves at least one first collocate from the corpus 23 or the synonym/near-synonym thesaurus 24 according to the context, and corrects the misused word of the corrected question according to the first collocate to generate the candidate question, so that the context of the phrases of the candidate question is free of conflict and conforms to the question intent.

For example, the misused word processing unit 212 determines that, in the corrected question 「水稻住在什麼地」, the three phrases 「水稻」, 「住」, and 「地」 collocate poorly and conflict with one another, because 「水稻」 is not normally used with 「住」 in the same question, so the phrase 「住」 must be a misused word. The misused word processing unit 212 can therefore, based on the collocation relation of the three phrases, retrieve at least one first collocate 「種植」 or 「栽種」 to correct the phrase 「住」. That is, the corrected question 「水稻住在什麼地」 is amended to a candidate question that conforms to the question intent, such as 「水稻種植在什麼地」 or 「水稻栽植在什麼地」.

The missing word processing unit 213 analyzes the missing words of the candidate question and retrieves at least one second collocate from the corpus 23 or the synonym/near-synonym thesaurus 24. The second collocate is used to supply the missing words of the corrected question and thereby generate the candidate question, so that the context of the phrases of the candidate question is complete and conforms to the question intent.

For example, the missing word processing unit 213 determines that, in the candidate question 「水稻種植在什麼地」 or 「水稻栽植在什麼地」, 「地」 should mean 「地方」 or 「地區」. It therefore retrieves at least one second collocate 「地方」 or 「地區」 to amend 「地」 and appends a question mark 「?」, supplying the missing words of the corrected question to generate complete candidate questions. That is, the candidate question 「水稻種植在什麼地」 or 「水稻栽植在什麼地」 is amended to 「水稻種植在什麼地方?」, 「水稻種植在什麼地區?」, 「水稻栽植在什麼地方?」, 「水稻栽植在什麼地區?」, and so on. The method then proceeds to step S34.

(4) As shown in step S34 of FIG. 4, the question category processing module 22 analyzes the highest-priority candidate questions according to the question classification model and the knowledge base 25 to generate the question category of each of those candidate questions.

In addition, the question category processing module 22 may reorder the highest-priority candidate questions according to their confidence scores and select, from among them, the one that exceeds a predetermined confidence score threshold and has the highest confidence score as the first-priority candidate question.

For example, the question category processing module 22 may reorder the candidate questions "水稻種植在什麼地方?", "水稻種植在什麼地區?", "水稻栽植在什麼地方?" and "水稻栽植在什麼地區?" and take "水稻栽植在什麼地區?" ("In what region is rice planted?") as the first-priority candidate question. This is the candidate question 42 shown in FIG. 5 in the prompt "我猜你想問“水稻栽植在什麼地區?”" ("I guess you want to ask 'In what region is rice planted?'"). The process then proceeds to step S35.
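The reordering and threshold test can be sketched as below. The confidence scores in `CANDIDATES` and the threshold values are invented for illustration; the patent does not state how the scores are computed.

```python
# Candidate questions paired with hypothetical confidence scores.
CANDIDATES = [
    ("水稻種植在什麼地方?", 0.62),
    ("水稻種植在什麼地區?", 0.71),
    ("水稻栽植在什麼地方?", 0.58),
    ("水稻栽植在什麼地區?", 0.83),
]


def pick_first_priority(scored_candidates, threshold):
    """Sort by confidence, highest first, and return (best, ranking).

    `best` is None when even the top candidate does not exceed the threshold.
    """
    ranking = sorted(scored_candidates, key=lambda pair: pair[1], reverse=True)
    if ranking and ranking[0][1] > threshold:
        return ranking[0][0], ranking
    return None, ranking


best, ranking = pick_first_priority(CANDIDATES, threshold=0.5)
print(best)
```

Returning `None` when no candidate clears the threshold leaves room for the caller to fall back, for instance by asking the user to rephrase; that fallback is an assumption and is not described in the source.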

(5) As shown in step S35 of FIG. 4, the keyword group extraction unit 214 of the question construction module 21 generates at least one keyword group or a question construction result from the first-priority candidate question. For example, from the first-priority candidate question "水稻栽植在什麼地區?" it generates the keyword group 43 "稻" (rice) and the keyword group 44 "栽種" (plant) shown in FIG. 5, or it generates the question construction result "水稻栽植地區" ("rice planting region"). The process then proceeds to step S36.

(6) As shown in step S36 of FIG. 4, the paragraph retrieval module 26 retrieves, from at least one document 261, a paragraph 262 that matches the keyword group or the question construction result of the first-priority candidate question. The answer processing module 27 then extracts from the paragraph 262 an answer 271 that matches the question category of the first-priority candidate question, and the answer 271 (optionally together with the paragraph 262) is displayed on the user interface 20.

For example, a paragraph matching the keyword group 43 "稻" (rice) and the keyword group 44 "栽種" (plant) is retrieved from the Wikipedia document shown in FIG. 5, and the answer is displayed on the user interface 20: "After being widely planted in mainland China, rice gradually spread west to India and was introduced to southern Europe in the Middle Ages. Today half of the world's population eats rice, mainly in Asia, southern Europe, tropical America and parts of Africa." The answer may be a paragraph such as this one, or it may be a short answer such as "中國大陸" (mainland China).
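The paragraph retrieval in step S36 can be approximated by counting keyword matches. The sketch below is only a toy: it uses plain substring matching, the sample paragraphs in `PARAGRAPHS` are made up for the demonstration, and the answer extraction performed by the answer processing module 27 is not modeled.

```python
# Made-up sample paragraphs standing in for the paragraphs of a document 261.
PARAGRAPHS = [
    "稻是一種穀類作物。",
    "水稻在中國大陸廣為栽種後,逐漸向西傳播到印度。",
]


def retrieve_paragraph(paragraphs, keywords):
    """Return the paragraph containing the most keywords, or None if none match."""
    def hits(paragraph):
        return sum(1 for keyword in keywords if keyword in paragraph)

    best = max(paragraphs, key=hits, default=None)
    if best is None or hits(best) == 0:
        return None
    return best


print(retrieve_paragraph(PARAGRAPHS, ["稻", "栽種"]))
```

Both sample paragraphs mention "稻", but only the second also contains "栽種", so it is the one returned.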

Likewise, the question processing system 2 of FIG. 3 and the question processing method of FIG. 4 can be applied to a natural language question in another language, such as English. The following example illustrates this briefly.

(1) As in FIG. 3 and step S31 of FIG. 4 above, the user inputs the natural language question 41 "What does rice live?" through the user interface 20, and the question processing system 2 receives the natural language question 41.

(2) As in FIG. 3 and step S32 of FIG. 4 above, the typo processing unit 211 of the question construction module 21 detects and corrects any typos or Martian language in the natural language question 41 "What does rice live?" to generate a corrected question that matches the question intent of the natural language question 41.

Because the typo processing unit 211 detects no typos or Martian language in the natural language question 41 "What does rice live?", and the question already matches its question intent, the natural language question 41 can be used directly as the corrected question "What does rice live?".

(3) As in FIG. 3 and step S33 of FIG. 4 above, the misused word processing unit 212 of the question construction module 21 analyzes the collocation relation of at least two phrases of the corrected question, "Where", "does" and "live", and according to the collocation relation corrects the misused word "live" to the correct word "grown" or "planted".

At the same time, the missing word processing unit 213 of the question construction module 21 may fill in the missing words of the corrected question to generate one or more highest-priority candidate questions. Because the corrected question "What does rice live?" contains no missing words, the missing word processing unit 213 can generate the one or more highest-priority candidate questions directly, for example "where does rice grown?" and "where is rice planted?".

(4) As in FIG. 3 and step S34 of FIG. 4 above, the question category processing module 22 analyzes the highest-priority candidate questions according to the question classification model and the knowledge base 25 to generate their question categories; for example, the question category is "where".
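The source attributes classification to a question classification model and the knowledge base 25 without describing their internals. As a deliberately simple stand-in, the sketch below assigns a category by looking for the first interrogative word; the word list `QUESTION_WORDS` and the `"unknown"` fallback are assumptions made for this illustration.

```python
# Interrogative words treated as question categories (an illustrative list).
QUESTION_WORDS = ("where", "when", "who", "what", "why", "how")


def classify_question(question):
    """Return the first interrogative word found in the question, else 'unknown'."""
    tokens = question.lower().rstrip("?").split()
    for token in tokens:
        if token in QUESTION_WORDS:
            return token
    return "unknown"


for q in ("where does rice grown?", "where is rice planted?"):
    print(q, "->", classify_question(q))
```

A trained classifier would also handle questions whose category is not signaled by a single word; this rule-based version only covers the pattern used in the example.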

In addition, the question category processing module 22 may reorder the highest-priority candidate questions according to their confidence scores and select, from among them, the one that exceeds a predetermined confidence score threshold and has the highest confidence score as the first-priority candidate question; for example, the first-priority candidate question is "where does rice grown?".

(5) As in FIG. 3 and step S35 of FIG. 4 above, the keyword group extraction unit 214 of the question construction module 21 generates at least one keyword group or a question construction result from the first-priority candidate question; for example, the keyword groups are "where", "rice" and "grown", or the question construction result is "where rice grown".

(6) As in FIG. 3 and step S36 of FIG. 4 above, the paragraph retrieval module 26 retrieves, from at least one document 261, a paragraph 262 that matches the keyword group or the question construction result of the first-priority candidate question. The answer processing module 27 then extracts from the paragraph 262 an answer 271 that matches the question category of the first-priority candidate question, and the answer 271 (optionally together with the paragraph 262) is displayed on the user interface 20.

As the foregoing shows, the question processing system and method of the present disclosure rely mainly on the typo processing unit, the misused word processing unit and the missing word processing unit of the question construction module to correct, respectively, the typos and Martian language, the misused words, and the missing words of a natural language question, and on the question category processing module to determine the question category of the natural language question.

The present disclosure thereby provides good fault tolerance: it tolerates typos, Martian language, misused words and missing words in the natural language question, reduces the impact of errors in analyzing the question intent of the natural language question, and in turn raises the accuracy of the answers returned for the natural language question.

The embodiments above merely illustrate the principles, features and effects of the present disclosure and are not intended to limit its practicable scope. Anyone skilled in the art may modify and alter the embodiments without departing from the spirit and scope of the present disclosure. Any equivalent changes and modifications made using what the present disclosure teaches shall still be covered by the claims below. The scope of protection of the present disclosure is therefore as set forth in the claims.

2‧‧‧question processing system

20‧‧‧user interface

21‧‧‧question construction module

211‧‧‧typo processing unit

211a‧‧‧translation model

211b‧‧‧language model

212‧‧‧misused word processing unit

213‧‧‧missing word processing unit

214‧‧‧keyword group extraction unit

22‧‧‧question category processing module

23‧‧‧corpus

24‧‧‧synonym/near-synonym thesaurus

25‧‧‧knowledge base

26‧‧‧paragraph retrieval module

261‧‧‧document

262‧‧‧paragraph

27‧‧‧answer processing module

271‧‧‧answer

Claims (19)

1. A question processing system applied to an electronic device having a processor, a memory and an operating system, the question processing system comprising: a question construction module having: a typo processing unit that detects and corrects typos or Martian language in a natural language question to generate a corrected question that matches the question intent of the natural language question; and a misused word processing unit that analyzes the collocation relation of at least two phrases of the corrected question and, according to the collocation relation, corrects a misused word of the corrected question to generate at least one candidate question that matches the question intent; and a question category processing module that analyzes the candidate question to generate a question category of the candidate question.

2. The question processing system of claim 1, further comprising a user interface through which a user inputs the natural language question.

3. The question processing system of claim 1, wherein the typo processing unit detects and corrects the typos or Martian language in the natural language question according to a typo detection method or a Martian language translation method.

4. The question processing system of claim 1, wherein the typo processing unit has a translation model and a language model, the translation model provides correction data for the typos or Martian language in the natural language question, and the language model corrects the typos or Martian language in the natural language question according to the correction data to generate the corrected question.

5. The question processing system of claim 1, wherein the misused word processing unit analyzes the question intent and the context of the phrases of the corrected question and, according to the context, retrieves at least one first collocation from a corpus or a synonym/near-synonym thesaurus to correct the misused word of the corrected question and generate the candidate question.

6. The question processing system of claim 1, wherein the question construction module further has a missing word processing unit that analyzes the candidate question for missing words and retrieves at least one second collocation from a corpus or a synonym/near-synonym thesaurus to fill in the missing words of the corrected question and generate the candidate question.

7. The question processing system of claim 1, wherein the at least one candidate question is a plurality of highest-priority candidate questions, and the question category processing module analyzes the highest-priority candidate questions by means of a question classification model and a knowledge base to generate the question categories of the highest-priority candidate questions.

8. The question processing system of claim 7, wherein the question category processing module further reorders the highest-priority candidate questions according to their confidence scores and selects, from among them, the one with the highest confidence score as a first-priority candidate question.

9. The question processing system of claim 8, wherein the question construction module further has a keyword group extraction unit that generates at least one keyword group or a question construction result from the first-priority candidate question.

10. The question processing system of claim 9, further comprising a paragraph retrieval module and an answer processing module, wherein the paragraph retrieval module retrieves from a document a paragraph that matches the keyword group or the question construction result of the first-priority candidate question, and the answer processing module extracts from the paragraph an answer that matches the question category of the first-priority candidate question.

11. A question processing method applied to an electronic device having a processor, a memory and an operating system, the question processing method comprising: detecting and correcting typos or Martian language in a natural language question to generate a corrected question that matches the question intent of the natural language question; analyzing the collocation relation of at least two phrases of the corrected question and, according to the collocation relation, correcting a misused word of the corrected question to generate at least one candidate question that matches the question intent; and analyzing the candidate question to generate a question category of the candidate question.

12. The question processing method of claim 11, further comprising detecting and correcting the typos or Martian language in the natural language question according to a typo detection method or a Martian language translation method.

13. The question processing method of claim 11, further comprising providing correction data for the typos or Martian language in the natural language question, and correcting the typos or Martian language in the natural language question according to the correction data to generate the corrected question.

14. The question processing method of claim 11, further comprising analyzing the question intent and the context of the phrases of the corrected question and, according to the context, retrieving at least one first collocation from a corpus or a synonym/near-synonym thesaurus to correct the misused word of the corrected question and generate the candidate question.

15. The question processing method of claim 11, further comprising analyzing the candidate question for missing words, and retrieving at least one second collocation from a corpus or a synonym/near-synonym thesaurus to fill in the missing words of the corrected question and generate the candidate question.

16. The question processing method of claim 11, wherein the at least one candidate question comprises a plurality of highest-priority candidate questions, and the highest-priority candidate questions are analyzed by means of a question classification model and a knowledge base to generate the question categories of the highest-priority candidate questions.

17. The question processing method of claim 16, further comprising reordering the highest-priority candidate questions according to their confidence scores and selecting, from among them, the one with the highest confidence score as a first-priority candidate question.

18. The question processing method of claim 17, further comprising generating at least one keyword group or a question construction result from the first-priority candidate question.

19. The question processing method of claim 18, further comprising retrieving from a document a paragraph that matches the keyword group or the question construction result of the first-priority candidate question, and extracting from the paragraph an answer that matches the question category of the first-priority candidate question.
TW103140400A 2014-11-21 2014-11-21 Question processing system and method thereof TWI553491B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
TW103140400A TWI553491B (en) 2014-11-21 2014-11-21 Question processing system and method thereof
CN201410782497.7A CN105760359B (en) 2014-11-21 2014-12-17 Question processing system and method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW103140400A TWI553491B (en) 2014-11-21 2014-11-21 Question processing system and method thereof

Publications (2)

Publication Number Publication Date
TW201619850A true TW201619850A (en) 2016-06-01
TWI553491B TWI553491B (en) 2016-10-11

Family

ID=56335582

Family Applications (1)

Application Number Title Priority Date Filing Date
TW103140400A TWI553491B (en) 2014-11-21 2014-11-21 Question processing system and method thereof

Country Status (2)

Country Link
CN (1) CN105760359B (en)
TW (1) TWI553491B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI678686B (en) * 2018-08-23 2019-12-01 國立臺灣師範大學 Interactive education method and teaching electronic device
TWI823091B (en) * 2020-05-28 2023-11-21 日商杰富意鋼鐵股份有限公司 information retrieval system

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6819990B2 (en) * 2016-08-16 2021-01-27 国立研究開発法人情報通信研究機構 Dialogue system and computer programs for it
CN106776501A (en) * 2016-12-13 2017-05-31 深圳爱拼信息科技有限公司 A kind of automatic method for correcting of text wrong word and server
CN108573696B (en) * 2017-03-10 2021-03-30 北京搜狗科技发展有限公司 Voice recognition method, device and equipment
CN107688608A (en) * 2017-07-28 2018-02-13 合肥美的智能科技有限公司 Intelligent sound answering method, device, computer equipment and readable storage medium storing program for executing
CN110598222B (en) * 2019-09-12 2023-05-30 北京金山数字娱乐科技有限公司 Language processing method and device, training method and device of language processing system

Family Cites Families (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10269204A (en) * 1997-03-28 1998-10-09 Matsushita Electric Ind Co Ltd Method and device for automatically proofreading chinese document
CN1228565A (en) * 1997-07-18 1999-09-15 睿扬资讯股份有限公司 Computer file automatic error detection and error correction device and its method
CN1442787A (en) * 2002-03-01 2003-09-17 何万贯 Composition revise and write system
JP2004127003A (en) * 2002-10-03 2004-04-22 Nippon Telegr & Teleph Corp <Ntt> Question-answering method, question-answering device, question-answering program, and storage medium
JP4039282B2 (en) * 2003-03-17 2008-01-30 富士ゼロックス株式会社 Natural language processing system, natural language processing method, and computer program
TWI226560B (en) * 2003-12-31 2005-01-11 Lin Guei Mei Information system with natural language parsing ability and processing method thereof
US7254774B2 (en) * 2004-03-16 2007-08-07 Microsoft Corporation Systems and methods for improved spell checking
CN100416570C (en) * 2006-09-22 2008-09-03 浙江大学 FAQ based Chinese natural language ask and answer method
CN101206673A (en) * 2007-12-25 2008-06-25 北京科文书业信息技术有限公司 Intelligent error correcting system and method in network searching process
CN101287228A (en) * 2008-05-26 2008-10-15 北京捷讯畅达科技发展有限公司 Phoneticizing error correcting technique and device applying to query by short message service of mobile phone
CN101287229A (en) * 2008-05-26 2008-10-15 北京捷讯畅达科技发展有限公司 Natural language processing technique and device applying to query by short message service of mobile phone
CN101373532A (en) * 2008-07-10 2009-02-25 昆明理工大学 FAQ Chinese request-answering system implementing method in tourism field
CN101414310A (en) * 2008-10-17 2009-04-22 山西大学 Method and apparatus for searching natural language
CN101727271B (en) * 2008-10-22 2012-11-14 北京搜狗科技发展有限公司 Method and device for providing error correcting prompt and input method system
CN101847140B (en) * 2009-03-23 2012-04-18 中国科学院计算技术研究所 Wrongly-written or mispronounced character processing method and system
CN101630312A (en) * 2009-08-19 2010-01-20 腾讯科技(深圳)有限公司 Clustering method for question sentences in question-and-answer platform and system thereof
CN102456001B (en) * 2010-10-27 2014-11-26 北京四维图新科技股份有限公司 Method and device for checking wrongly written characters
CN102737042B (en) * 2011-04-08 2015-03-25 北京百度网讯科技有限公司 Method and device for establishing question generation model, and question generation method and device
CN103927329B (en) * 2014-03-19 2017-03-29 北京奇虎科技有限公司 A kind of instant search method and system

Also Published As

Publication number Publication date
TWI553491B (en) 2016-10-11
CN105760359B (en) 2020-03-20
CN105760359A (en) 2016-07-13

Similar Documents

Publication Publication Date Title
TWI553491B (en) Question processing system and method thereof
US10176804B2 (en) Analyzing textual data
CN106537370B (en) Method and system for robust tagging of named entities in the presence of source and translation errors
Derczynski et al. Microblog-genre noise and impact on semantic annotation accuracy
US9910886B2 (en) Visual representation of question quality
US7584092B2 (en) Unsupervised learning of paraphrase/translation alternations and selective application thereof
KR102025968B1 (en) Phrase-based dictionary extraction and translation quality evaluation
US7546235B2 (en) Unsupervised learning of paraphrase/translation alternations and selective application thereof
US7552046B2 (en) Unsupervised learning of paraphrase/translation alternations and selective application thereof
KR102491172B1 (en) Natural language question-answering system and learning method
US10896222B1 (en) Subject-specific data set for named entity resolution
CN103324621B (en) A kind of Thai text spelling correcting method and device
KR101509727B1 (en) Apparatus for creating alignment corpus based on unsupervised alignment and method thereof, and apparatus for performing morphological analysis of non-canonical text using the alignment corpus and method thereof
KR101500617B1 (en) Method and system for Context-sensitive Spelling Correction Rules using Korean WordNet
KR102100951B1 (en) System for generating question-answer data for maching learning based on maching reading comprehension
US10997223B1 (en) Subject-specific data set for named entity resolution
CN103970765A (en) Error correcting model training method and device, and text correcting method and device
US20180157646A1 (en) Command transformation method and system
CN112417102A (en) Voice query method, device, server and readable storage medium
WO2017166626A1 (en) Normalization method, device and electronic equipment
Tachicart et al. Lexical differences and similarities between Moroccan dialect and Arabic
Ganfure et al. Design and implementation of morphology based spell checker
CN102609410B (en) Authority file auxiliary writing system and authority file generating method
Muhamad et al. Proposal: A hybrid dictionary modelling approach for malay tweet normalization
Chiu et al. Chinese spell checking based on noisy channel model