TWI511124B - Selection method based on speech recognition and mobile terminal device and information system using the same

Info

Publication number: TWI511124B
Application number: TW102121404A
Authority: TW
Inventors: Guo-feng Zhang
Original assignee: Via Tech Inc
Priority date: 2012-12-31
Filing date: 2013-06-17
Publication date: 2015-12-01
Also published as: CN103280218A; CN106847278A; TW201426736A; CN103021403A

Description

Selection method based on speech recognition, and mobile terminal device and information system using the same

The present invention relates to a selection method and to a mobile terminal device and an information system using the same, and more particularly to a selection method based on speech recognition and to a mobile terminal device and an information system using the same.

In natural language understanding (NLU) by computers, a specific grammar is usually used to capture the intent or information in a user's input sentence. Therefore, if the database stores enough data on user input sentences, a reasonable judgment can be made.

One existing approach uses a built-in fixed word list to capture the user's input sentence. The fixed word list contains the specific terms used for particular intents or information, and the user must express an intent or information in those specific terms for the system to recognize it correctly. However, forcing the user to memorize every specific term in the fixed word list is hardly user-friendly. For example, an existing implementation based on a fixed word list requires a user asking about the weather to say: "What is the weather in Shanghai (or Beijing) tomorrow (or the day after tomorrow)?" If the user instead asks about the weather with a more natural, colloquial expression such as "How is Shanghai tomorrow?", the word "weather" does not appear in the sentence, so the existing technology interprets it as "there is a place in Shanghai called Tomorrow", which clearly fails to capture the user's real intent. Moreover, the kinds of sentences users employ are highly varied and change frequently, and users sometimes even enter erroneous sentences, in which case the input sentence must be captured by fuzzy matching. A fixed word list that provides only rigid input rules therefore performs even worse.
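By way of non-limiting illustration, the rigidity of a fixed word list can be sketched as follows. The pattern `FIXED_PATTERN` and the function `match_fixed_list` are hypothetical names introduced only for this sketch, and the English phrasing stands in for the original Chinese sentences; this is not an implementation taken from any existing system.

```python
import re

# Hypothetical fixed word list: only this exact phrasing is accepted.
FIXED_PATTERN = re.compile(
    r"^What is the weather in (Shanghai|Beijing) "
    r"(tomorrow|the day after tomorrow)\?$"
)


def match_fixed_list(sentence: str):
    """Return (city, day) if the sentence follows the fixed phrasing, else None."""
    m = FIXED_PATTERN.match(sentence)
    return (m.group(1), m.group(2)) if m else None


# The prescribed sentence is recognized ...
recognized = match_fixed_list("What is the weather in Shanghai tomorrow?")
# ... but a natural, colloquial variant is not.
missed = match_fixed_list("How is Shanghai tomorrow?")
```

Under these assumptions, the colloquial question yields no match at all, which is the failure mode described above.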

Furthermore, when natural language understanding is used to handle multiple types of user intent, some different intents share the same grammatical structure. For example, when the user's input sentence is "I want to see Romance of the Three Kingdoms" (三國演義), the user may want to watch the film or read the book. In such a case, two possible intents are usually matched and offered to the user to choose from. In many cases, however, offering unnecessary possible intents for the user to choose from is redundant and inefficient. For example, when the user's input sentence is "I want to watch Super Star Avenue" (超級星光大道), matching the user's intent to reading a book or viewing a painting titled Super Star Avenue is quite unnecessary, because Super Star Avenue is a television program.

In addition, the search results obtained from a full-text search are generally unstructured data. The information in unstructured data is scattered and unrelated. For example, the web search results obtained after entering keywords into a search engine such as Google or Baidu are unstructured data, because the user must read through the results item by item to find the useful information among them. This not only wastes the user's time but may also miss the desired information, which greatly limits its practicality.

The present invention provides a selection method based on speech recognition, and a mobile terminal device and an information system using the same, which make operation more convenient for the user.

The present invention provides a selection method based on speech recognition, including: receiving a first voice input; performing speech recognition on the first voice input to generate a first keyword; generating at least one first reported answer according to the first keyword; when the number of selected first reported answers is 1, performing a corresponding operation according to the data type of the selected first reported answer; when the number of selected first reported answers is greater than 1, displaying a first candidate list containing the first reported answers and receiving a second voice input; performing speech recognition on the second voice input to generate a second keyword; and selecting a second reported answer, according to the second keyword, from the first reported answers displayed in the first candidate list.
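By way of non-limiting illustration, the two-round selection flow described above can be sketched as follows. Speech recognition is assumed to have already turned each voice input into a keyword string. The names `Answer`, `select_by_voice`, `search`, and `ask_second_keyword`, as well as the matching rule used for the second keyword (title substring or data type), are assumptions made for this sketch and are not specified by the source.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Answer:
    title: str
    data_type: str  # hypothetical type label, e.g. "book", "tv", "movie"


def select_by_voice(
    first_keyword: str,
    search: Callable[[str], List[Answer]],
    ask_second_keyword: Callable[[List[Answer]], str],
) -> Optional[Answer]:
    """Return the answer finally selected, or None if nothing matches."""
    first_answers = search(first_keyword)
    if not first_answers:
        return None
    if len(first_answers) == 1:
        # Exactly one answer: the caller can act on its data type directly.
        return first_answers[0]
    # More than one answer: show the candidate list and obtain the keyword
    # recognized from the second voice input.
    second_keyword = ask_second_keyword(first_answers)
    if not second_keyword:
        return None
    for answer in first_answers:
        if second_keyword in answer.title or second_keyword == answer.data_type:
            return answer
    return None
```

The caller would then dispatch on `data_type` of the returned answer (for example, open a reader for a book or start playback for a film); that dispatch is left out because the source does not enumerate the operations here.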

The present invention provides a mobile terminal device including a voice receiving unit, a display unit, a storage unit, and a data processing unit. The voice receiving unit receives a first voice input and a second voice input. The display unit displays a candidate list containing reported answers. The storage unit stores a plurality of data items. The data processing unit is coupled to the voice receiving unit, the display unit, and the storage unit. The data processing unit performs speech recognition on the first voice input to generate a first keyword, and selects the corresponding first reported answers according to the first keyword. When the number of selected first reported answers is 1, the data processing unit performs a corresponding operation according to the type of the selected first reported answer. When the number of selected first reported answers is greater than 1, the data processing unit controls the display unit to display a first candidate list containing the first reported answers. The data processing unit performs speech recognition on the second voice input to generate a second keyword, and selects a second reported answer from the first reported answers of the first candidate list according to the second keyword.

The present invention provides an information system including a server and a mobile terminal device. The server stores a plurality of data items and has a speech recognition function. The mobile terminal device includes a voice receiving unit, a display unit, and a data processing unit. The voice receiving unit receives a first voice input and a second voice input. The display unit displays a candidate list containing reported answers. The data processing unit is coupled to the voice receiving unit, the display unit, and the server. The data processing unit performs speech recognition on the first voice input through the server to generate a first keyword, and the server selects the corresponding first reported answers according to the first keyword and transmits them to the data processing unit. When the number of selected first reported answers is 1, the data processing unit performs a corresponding operation according to the type of the selected first reported answer. When the number of selected first reported answers is greater than 1, the data processing unit controls the display unit to display a first candidate list containing the first reported answers, the data processing unit performs speech recognition on the second voice input through the server to generate a second keyword, and the server selects a second reported answer from the first reported answers of the first candidate list according to the second keyword and transmits it to the data processing unit.

The present invention provides a selection method based on speech recognition, including: searching a structured database according to the first keyword to obtain at least one first reported answer; when the number of selected first reported answers is greater than 1, displaying a first candidate list containing the first reported answers; after displaying the first candidate list, receiving a second voice input and performing speech recognition on the second voice input to generate a second keyword; and selecting a second reported answer from the first reported answers of the first candidate list according to the second user intent.

Based on the above, the selection method based on speech recognition, and the mobile terminal device and information system of the embodiments of the present invention, perform speech recognition and natural language processing on the first voice input and the second voice input to determine the keywords corresponding to them, and select among the reported answers according to those keywords. The convenience of user operation is thereby improved.

To make the above features and advantages of the present invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

100, 520, 520’, 720, 720’‧‧‧natural language understanding system

102, 503, 503’, 703, 902, 902’‧‧‧request information

104‧‧‧analysis result

106‧‧‧possible intent syntax data

108, 509, 509’, 711, 904, 904’‧‧‧keyword

110‧‧‧response result

112‧‧‧intent data

114‧‧‧confirmed intent syntax data

116‧‧‧analysis result output module

200‧‧‧retrieval system

220‧‧‧structured database

240‧‧‧search engine

260‧‧‧retrieval interface unit

280‧‧‧guidance data storage device

300‧‧‧natural language processor

302, 832, 834, 836, 838‧‧‧record

304‧‧‧title field

306‧‧‧content field

308‧‧‧sub-field

310‧‧‧guidance field

312‧‧‧value field

314‧‧‧source field

316‧‧‧popularity field

318, 852, 854‧‧‧preference field

320, 862, 864‧‧‧dislike field

400‧‧‧knowledge-assisted understanding module

500, 500’, 700, 700’‧‧‧natural language dialogue system

501, 701‧‧‧voice input

507, 507’, 707‧‧‧voice response

510, 710‧‧‧voice sampling module

511, 511’, 711, 906, 906’‧‧‧reported answer

513, 513’, 713‧‧‧voice

522, 722‧‧‧speech recognition module

524, 724‧‧‧natural language processing module

526, 726‧‧‧speech synthesis module

530, 740‧‧‧speech synthesis database

702‧‧‧integrated speech processing module

715‧‧‧user preference data

717‧‧‧user preference record

730‧‧‧characteristic database

872, 874‧‧‧field

900, 1010‧‧‧mobile terminal device

908, 908’‧‧‧candidate list

910, 1011‧‧‧voice receiving unit

920, 1013‧‧‧data processing unit

930, 1015‧‧‧display unit

940‧‧‧storage unit

1000‧‧‧information system

1020‧‧‧server

SP1‧‧‧first voice

SP2‧‧‧second voice

1200, 1300‧‧‧voice control system

1210‧‧‧auxiliary activation device

1212, 1222‧‧‧wireless transmission module

1214‧‧‧trigger module

1216‧‧‧wireless rechargeable battery

12162‧‧‧battery unit

12164‧‧‧wireless charging module

1220, 1320‧‧‧mobile terminal device

1221‧‧‧voice system

1224‧‧‧voice sampling module

1226‧‧‧speech synthesis module

1227‧‧‧voice output interface

1228‧‧‧communication module

1230‧‧‧(cloud) server

1232‧‧‧speech understanding module

12322‧‧‧speech recognition module

12324‧‧‧speech processing module

S410~S450‧‧‧steps of a retrieval method according to an embodiment of the present invention

S510~S590‧‧‧steps of the operation of a natural language understanding system according to an embodiment of the present invention

S602, S604, S606, S608, S610, S612‧‧‧steps of a method for correcting a voice response

S802~S890‧‧‧steps of a natural language dialogue method according to an embodiment of the present invention

S1100~S1190‧‧‧steps of a selection method based on speech recognition according to an embodiment of the present invention

S1402~S1412‧‧‧steps of a voice control method according to an embodiment of the present invention

FIG. 1 is a block diagram of a natural language understanding system according to an embodiment of the present invention.

FIG. 2 is a schematic diagram of the results of a natural language processor's analysis of various request information from a user, according to an embodiment of the present invention.

FIG. 3A is a schematic diagram of a plurality of records having a specific data structure stored in a structured database, according to an embodiment of the present invention.

FIG. 3B is a schematic diagram of a plurality of records having a specific data structure stored in a structured database, according to another embodiment of the present invention.

FIG. 3C is a schematic diagram of the guidance data stored in a guidance data storage device, according to an embodiment of the present invention.

FIG. 4A is a flowchart of a retrieval method according to an embodiment of the present invention.

FIG. 4B is a flowchart of the operation of a natural language understanding system according to another embodiment of the present invention.

FIG. 5A is a block diagram of a natural language dialogue system according to an embodiment of the present invention.

FIG. 5B is a block diagram of a natural language understanding system according to an embodiment of the present invention.

FIG. 5C is a block diagram of a natural language dialogue system according to another embodiment of the present invention.

FIG. 6 is a flowchart of a method for correcting a voice response according to an embodiment of the present invention.

FIG. 7A is a block diagram of a natural language dialogue system according to an embodiment of the present invention.

FIG. 7B is a block diagram of a natural language dialogue system according to another embodiment of the present invention.

FIG. 8A is a flowchart of a natural language dialogue method according to an embodiment of the present invention.

FIG. 8B is a schematic diagram of a plurality of records having a specific data structure stored in a structured database, according to yet another embodiment of the present invention.

FIG. 9 is a system diagram of a mobile terminal device according to an embodiment of the present invention.

FIG. 10 is a system diagram of an information system according to an embodiment of the present invention.

FIG. 11 is a flowchart of a selection method based on speech recognition according to an embodiment of the present invention.

FIG. 12 is a block diagram of a voice control system according to an embodiment of the present invention.

FIG. 13 is a block diagram of a voice control system according to another embodiment of the present invention.

FIG. 14 is a flowchart of a voice control method according to an embodiment of the present invention.

Existing implementations that use a fixed word list can provide only rigid input rules and are poorly equipped to interpret users' varied input sentences. As a result, they often misjudge the user's intent and fail to find the required information, or output unnecessary information to the user because of that poor judgment. In addition, existing search engines can only offer users scattered and weakly related search results, so users must spend time inspecting the results one by one to filter out what they need, which wastes time and may miss the required information. To address these problems of the prior art, the present invention proposes a method and system for retrieving structured data, in which the structured data provides specific fields for storing different types of data elements, so that when a user searches by natural speech input, the user's intent can be determined quickly and correctly, and the required information can be provided to the user or more precise information can be offered for selection.

FIG. 1 is a block diagram of a natural language understanding system according to an embodiment of the present invention. As shown in FIG. 1, the natural language understanding system 100 includes a retrieval system 200, a natural language processor 300, and a knowledge-assisted understanding module 400. The knowledge-assisted understanding module 400 is coupled to the natural language processor 300 and the retrieval system 200. The retrieval system 200 further includes a structured database 220, a search engine 240, and a retrieval interface unit 260, where the search engine 240 is coupled to the structured database 220 and the retrieval interface unit 260. In this embodiment the retrieval system 200 includes the retrieval interface unit 260, but this does not limit the present invention; some embodiments may have no retrieval interface unit 260 and instead use other means to have the search engine 240 perform a full-text search on the structured database 220.

When the user issues request information 102 to the natural language understanding system 100, the natural language processor 300 analyzes the request information 102 and sends the resulting possible intent syntax data 106 to the knowledge-assisted understanding module 400, where the possible intent syntax data 106 contains a keyword 108 and intent data 112. The knowledge-assisted understanding module 400 then extracts the keyword 108 from the possible intent syntax data 106, sends it to the retrieval system 200, and stores the intent data 112 inside the knowledge-assisted understanding module 400. The search engine 240 in the retrieval system 200 performs a full-text search on the structured database 220 according to the keyword 108 and returns the response result 110 of the full-text search to the knowledge-assisted understanding module 400. Next, the knowledge-assisted understanding module 400 compares the stored intent data 112 against the response result 110 and sends the resulting confirmed intent syntax data 114 to the analysis result output module 116. The analysis result output module 116 then transmits the analysis result 104 to a server (not shown) according to the confirmed intent syntax data 114, and the server looks up the data the user needs and sends it to the user. It should be noted that the analysis result 104 may contain the keyword 108, and may also output part of the information of a record containing the keyword 108 (for example, a record of FIG. 3A/3B), such as the number of the record 302, or all of its information. In addition, the analysis result 104 may be converted directly into speech by the server and output to the user, or it may undergo specific processing before the corresponding speech is output to the user (the manner of this "specific processing" and the content and information involved are detailed later). Those skilled in the art can design the information output by the retrieval system 200 according to actual needs, and the present invention places no limitation on this.
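By way of non-limiting illustration, the role of the knowledge-assisted understanding module 400 in the flow above can be sketched as follows. The sketch assumes, purely for illustration, that the response result 110 can be reduced to a set of intent labels supported by the search; the source does not define the response format at this point. The names `PossibleIntent`, `confirm_intents`, and `full_text_search` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass(frozen=True)
class PossibleIntent:
    intent: str   # stands in for intent data 112, e.g. "readbook"
    keyword: str  # stands in for keyword 108, e.g. a title


def confirm_intents(
    possible: List[PossibleIntent],
    full_text_search: Callable[[str], Set[str]],
) -> List[PossibleIntent]:
    """Keep only the possible intents that the search response supports."""
    confirmed = []
    for candidate in possible:
        # Only the keyword is sent to the retrieval system; the intent
        # data stays with the module and is compared afterwards.
        response = full_text_search(candidate.keyword)
        if candidate.intent in response:
            confirmed.append(candidate)
    return confirmed
```

A caller would pass in whatever search function the retrieval system exposes; here it is an injected callable so that the comparison step can be read in isolation.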

The analysis result output module 116 may be combined with other modules as appropriate. For example, in one embodiment it may be incorporated into the knowledge-assisted understanding module 400; in another embodiment it may be separate from the natural language understanding system 100 and located in a server (for example, one that contains the natural language understanding system 100), in which case the server directly receives the confirmed intent syntax data 114 and processes it. In addition, the knowledge-assisted understanding module 400 may store the intent data 112 in a storage device inside the module, in the natural language understanding system 100, in a server (for example, one that contains the natural language understanding system 100), or in any storage that the knowledge-assisted understanding module 400 can access; the present invention places no limitation on this. Furthermore, the retrieval system 200, the natural language processor 300, and the knowledge-assisted understanding module 400 of the natural language understanding system 100 may be implemented in hardware, software, firmware, or any combination of these, and the present invention places no limitation on this either.

The natural language understanding system 100 may be located in a cloud server, in a server on a local area network, or even in a personal computer, a mobile computing device (such as a notebook computer), or a mobile communication device (such as a mobile phone). The components of the natural language understanding system 100 or the retrieval system 200 need not be placed in the same machine; depending on actual needs, they may be distributed across different devices or systems and connected through various communication protocols. For example, the natural language processor 300 and the knowledge-assisted understanding module 400 may be placed in the same smartphone while the retrieval system 200 is placed in a cloud server; or the retrieval interface unit 260, the natural language processor 300, and the knowledge-assisted understanding module 400 may be placed in the same notebook computer while the search engine 240 and the structured database 220 are placed in another server on the local area network. In addition, when the whole natural language understanding system 100 is located in a server (whether a cloud server or a local area network server), the retrieval system 200, the natural language processor 300, and the knowledge-assisted understanding module 400 may be placed in different host computers, with the main server system coordinating the transfer of messages and data among them. Of course, two or all of the retrieval system 200, the natural language processor 300, and the knowledge-assisted understanding module 400 may also be combined in a single host computer according to actual needs; the present invention places no limitation on this configuration.

In embodiments of the present invention, the user can issue request information to the natural language processor 300 in various ways, for example by spoken voice input or by a text description. For instance, if the natural language understanding system 100 is located in a server (not shown) in the cloud or on a local area network, the user can first enter the request information 102 on a mobile device (such as a mobile phone, PDA, tablet computer, or similar system), and the request information 102 is then transmitted through a telecommunication operator to the natural language understanding system 100 in the server, where the natural language processor 300 analyzes it. Finally, after the user's intent has been confirmed, the analysis result output module 116 passes the corresponding analysis result 104 to the server for processing, and the information requested by the user is sent back to the user's mobile device. For example, the request information 102 may be a question that the user wants the natural language understanding system 100 to answer (such as "How is the weather in Shanghai tomorrow?"). Once the natural language understanding system 100 determines that the user's intent is to query tomorrow's weather in Shanghai, the queried weather data is sent to the user as the analysis result 104 through the analysis result output module 116. In addition, if the user's instruction to the natural language understanding system 100 is "I want to watch Let the Bullets Fly" (讓子彈飛) or "I want to listen to The Days We Spent Together" (一起走過的日子), then because "Let the Bullets Fly" or "The Days We Spent Together" may belong to different domains, the natural language processor 300 analyzes the user's request information 102 into one or more pieces of possible intent syntax data 106, each including a keyword 108 and intent data 112, and the user's intent is then confirmed by a full-text search of the structured database 220 in the retrieval system 200.

More specifically, when the user's request information 102 is "How is the weather in Shanghai tomorrow?", the natural language processor 300 can, after analysis, generate one piece of possible intent syntax data 106: "<queryweather>, <city>=Shanghai, <time>=tomorrow".
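By way of non-limiting illustration, the angle-bracket notation of the possible intent syntax data in the example above can be read mechanically. The sketch below assumes that the first tag names the intent and that each subsequent `<name>=value` pair is a slot, with commas separating the parts; the source shows only examples of this notation and does not define a formal grammar, and `parse_intent_syntax` is a hypothetical helper.

```python
import re
from typing import Dict, Tuple

# One "<tag>" optionally followed by "=value" (value runs up to the next comma).
_TAG = re.compile(r"<([^<>]+)>(?:=([^,]*))?")


def parse_intent_syntax(text: str) -> Tuple[str, Dict[str, str]]:
    """Split '<intent>,<slot>=value,...' into the intent name and a slot dict."""
    matches = _TAG.findall(text)
    if not matches:
        raise ValueError(f"no <tag> found in {text!r}")
    intent = matches[0][0]
    slots = {name: value.strip() for name, value in matches[1:]}
    return intent, slots


intent, slots = parse_intent_syntax("<queryweather>,<city>=Shanghai,<time>=tomorrow")
```

Because the pattern does not restrict the characters inside a tag or value (other than angle brackets and commas), the same helper also accepts the original Chinese slot names and values.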

In one embodiment, if the natural language understanding system 100 considers the user's intent to be sufficiently clear, it can output the user's intent (namely, querying tomorrow's weather in Shanghai) directly as the analysis result 104 to the server through the analysis result output module 116, and the server can look up the weather specified by the user and send it to the user. As another example, when the user's request information 102 is "I want to see Romance of the Three Kingdoms", the natural language processor 300 can, after analysis, generate three pieces of possible intent syntax data 106: "<readbook>, <bookname>=Romance of the Three Kingdoms"; "<watchTV>, <TVname>=Romance of the Three Kingdoms"; and "<watchfilm>, <filmname>=Romance of the Three Kingdoms".

This is because the keyword 108 in the possible intent syntax data 106 (namely "Romance of the Three Kingdoms") may belong to different domains, namely books (<readbook>), television series (<watchTV>), and films (<watchfilm>). A single piece of request information 102 can therefore be analyzed into multiple pieces of possible intent syntax data 106, and further analysis by the knowledge-assisted understanding module 400 is needed to confirm the user's intent. As another example, if the user enters "I want to see Let the Bullets Fly", then because "Let the Bullets Fly" may be the title of a film or of a book, at least the following two pieces of possible intent syntax data 106 may result: "<readbook>, <bookname>=Let the Bullets Fly" and "<watchfilm>, <filmname>=Let the Bullets Fly", which belong to the book and film domains respectively. The possible intent syntax data 106 must then be analyzed further by the knowledge-assisted understanding module 400 to obtain the confirmed intent syntax data 114, which expresses the definite intent of the user's request information. When the knowledge-assisted understanding module 400 analyzes the possible intent syntax data 106, it can send the keyword 108 (for example, "Romance of the Three Kingdoms" or "Let the Bullets Fly" above) to the retrieval system 200 through the retrieval interface unit 260. The structured database 220 in the retrieval system 200 stores a plurality of records having a specific data structure, and the search engine 240 can perform a full-text search on the structured database 220 using the keyword 108 received through the retrieval interface unit 260 and return the response result 110 obtained from the full-text search to the knowledge-assisted understanding module 400, which can then derive the confirmed intent syntax data 114 from the response result 110. The details of performing a full-text search on the structured database 220 to determine the confirmed intent syntax data 114 are described more fully below with reference to FIG. 3A, FIG. 3B, and the related paragraphs.

Under the concept of the present invention, the natural language understanding system 100 first extracts the keyword 108 from the user's request information 102 and uses the full-text search results from the structured database 220 to determine the domain attribute of the keyword 108. For example, the input "我要看三國演義" ("I want to see Romance of the Three Kingdoms") above yields possible intent grammar data 106 belonging to three domains (books, TV series, and films), which are then analyzed further to confirm the user's definite intent. The user can therefore express an intent or information easily in colloquial language without having to memorize specific terms, such as those of the fixed word list in existing approaches.

FIG. 2 is a schematic diagram of the results of the natural language processor 300 analyzing various request information from a user, according to an embodiment of the present invention.

As shown in FIG. 2, when the user's request information 102 is "明天上海的天氣怎麼樣啊" ("How will the weather in Shanghai be tomorrow?"), the natural language processor 300 can, after analysis, produce the following possible intent grammar data 106: "<queryweather>,<city>=上海,<time>=明天" (city: Shanghai; time: tomorrow).

Here the intent data 112 is "<queryweather>" and the keywords 108 are "上海" (Shanghai) and "明天" (tomorrow). Because the analysis by the natural language processor 300 yields only one set of intent grammar data 106 (query weather, <queryweather>), in one embodiment the knowledge-assisted understanding module 400 can take the keywords 108 "上海" and "明天" directly as the analysis result 104 and send them to the server to query weather information (for example tomorrow's weather overview for Shanghai, including weather conditions, temperature, and so on), without performing a full-text search on the structured database 220 to determine the user's intent. Of course, in one embodiment a full-text search on the structured database 220 can still be performed for a more precise determination of user intent; those skilled in the art may vary this according to actual needs.

In addition, when the user's request information 102 is "我要看讓子彈飛" ("I want to see Let the Bullets Fly"), two possible intent grammar data 106 can be produced: "<readbook>,<bookname>=讓子彈飛" and "<watchfilm>,<filmname>=讓子彈飛". They carry two corresponding intent data 112, "<readbook>" and "<watchfilm>", and two identical keywords 108, "讓子彈飛", indicating that the intent may be to read the book "讓子彈飛" or to watch the film "讓子彈飛". To confirm the user's intent, the knowledge-assisted understanding module 400 sends the keyword 108 "讓子彈飛" to the retrieval interface unit 260, and the search engine 240 then uses this keyword to perform a full-text search on the structured database 220 to determine whether "讓子彈飛" is a book title or a film title, thereby confirming the user's intent.

Furthermore, when the user's request information 102 is "我想聽一起走過的日子" ("I want to listen to 一起走過的日子", literally "the days we went through together", which can also be parsed as "the 日子 of 一起走過"), two possible intent grammar data 106 can be produced: "<playmusic>,<singer>=一起走過,<songname>=日子" and "<playmusic>,<songname>=一起走過的日子". They share the same intent data 112, "<playmusic>", and carry two sets of keywords 108, "一起走過" with "日子" and "一起走過的日子", indicating that the intent may be to listen to the song "日子" sung by a singer named "一起走過", or to listen to the song "一起走過的日子". The knowledge-assisted understanding module 400 can then send the first set of keywords 108 ("一起走過" and "日子") and the second set ("一起走過的日子") to the retrieval interface unit 260 to check whether there is a song "日子" performed by a singer "一起走過" (the user intent implied by the first set) or a song "一起走過的日子" (the user intent implied by the second set), thereby confirming the user's intent. The present invention is not, however, limited to the formats and names shown here for the possible intent grammar data and intent data.

FIG. 3A is a schematic diagram of multiple records having a specific data structure stored in the structured database 220, according to an embodiment of the present invention.

Generally, in some existing full-text search approaches, the search results obtained are unstructured data (for example, results from Google or Baidu). Because the items of information in such results are scattered and unrelated, the user must inspect them one by one, which limits practicality. Under the concept of the present invention, by contrast, a structured database effectively improves search efficiency and accuracy, because the value data contained within each record of the disclosed structured database are related to one another and together express the attributes of that record. Thus, when the search engine performs a full-text search on the structured database and a record's value data matches a keyword, the guide data corresponding to that value data can be output and used to confirm the intent of the request information. The implementation details are described further through the following examples.

In an embodiment of the present invention, each record 302 stored in the structured database 220 includes a title field 304 and a content field 306. The title field 304 contains multiple subfields 308, each of which includes a guide field 310 and a value field 312; the guide field 310 stores guide data and the value field 312 stores value data. Taking record 1 in FIG. 3A as an example, the three subfields 308 in its title field 304 store "singerguid：劉德華", "songnameguid：一起走過的日子", and "songtypeguid：港臺，粵語，流行". The guide fields 310 of these subfields 308 store the guide data "singerguid", "songnameguid", and "songtypeguid", and the corresponding value fields 312 store the value data "劉德華" (Andy Lau), "一起走過的日子", and "港臺，粵語，流行" (Hong Kong/Taiwan, Cantonese, pop). The guide data "singerguid" indicates that the domain type of the value data "劉德華" is singer name (singer); "songnameguid" indicates that the domain type of "一起走過的日子" is song name (song); and "songtypeguid" indicates that the domain type of "港臺，粵語，流行" is song type (song type). Each guide data may in practice be represented by a different specific string of digits or characters; the present invention is not limited in this respect. The content field 306 of record 1 stores the lyrics of the song "一起走過的日子" or other data (such as the composer and lyricist). The actual data in each record's content field 306 is not the focus of the present invention, however, and is therefore shown only schematically in FIG. 3A.

In the foregoing embodiment, each record includes a title field 304 and a content field 306, and each subfield 308 in the title field 304 includes a guide field 310 and a value field 312. This does not limit the present invention: some embodiments may omit the content field 306, and some may even omit the guide field 310.

In addition, in an embodiment of the present invention, a first special character is stored between the data of adjacent subfields 308 to separate them, and a second special character is stored between the data of the guide field 310 and the value field 312 to separate the two. For example, as shown in FIG. 3A, "singerguid" and "劉德華", "songnameguid" and "一起走過的日子", and "songtypeguid" and "港臺，粵語，流行" are each separated by the second special character "：", while the subfields 308 of record 1 are separated by the first special character "∣". The present invention is not, however, limited to "：" or "∣" as the separating special characters.
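The delimiter scheme can be illustrated with a short Python sketch. It is not taken from the patent: the function name `parse_title_field` and the choice to build the record 1 title string in code are assumptions made for illustration, and only the two separators and the three guide/value pairs come from the description of FIG. 3A.

```python
from typing import List, Tuple

SUBFIELD_SEP = "∣"      # first special character: separates subfields 308
GUIDE_VALUE_SEP = "："   # second special character: separates guide data from value data

def parse_title_field(title: str) -> List[Tuple[str, str]]:
    """Split a title field 304 into (guide data, value data) pairs."""
    pairs = []
    for subfield in title.split(SUBFIELD_SEP):
        # Split only on the first separator so the value may contain anything else.
        guide, value = subfield.split(GUIDE_VALUE_SEP, 1)
        pairs.append((guide, value))
    return pairs

# Title field of record 1, assembled from the pairs named in the text.
record1_pairs = [
    ("singerguid", "劉德華"),
    ("songnameguid", "一起走過的日子"),
    ("songtypeguid", "港臺，粵語，流行"),
]
record1_title = SUBFIELD_SEP.join(GUIDE_VALUE_SEP.join(p) for p in record1_pairs)

parsed = parse_title_field(record1_title)
for guide, value in parsed:
    print(guide, "->", value)
```

A parser of this kind recovers both the value data to compare against a keyword and the guide data to return on a match, which is why the two separators must not occur inside the guide data itself.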

On the other hand, in an embodiment of the present invention, each subfield 308 in the title field 304 may have a fixed length. For example, each subfield 308 may be fixed at 32 characters, of which the guide field 310 may occupy a fixed 7 or 8 bits (enough to index at most 128 or 256 different guide data). Because the numbers of bits needed for the first and second special characters can also be fixed, whatever remains of the subfield 308 after deducting the bits taken by the guide field 310, the first special character, and the second special character can be used entirely to store the value data of the value field 312. Moreover, the length of the subfield 308 is fixed, its contents can be stored, as shown in FIG. 3A, in the order guide field 310 (an index to the guide data), first special character, value data of the value field 312, second special character, and the sizes of these four items are fixed as noted above. An implementation can therefore skip the bits of the guide field 310 (for example the first 7 or 8 bits) and the bits of the second special character (for example one more character, that is, 8 bits), and deduct the bits occupied by the first special character (for example the last character, 8 bits), to obtain the value data of the value field 312 directly. For example, the value data "劉德華" is taken directly from the first subfield 308 of record 1; in this case 32-3=29 characters remain for storing the value data of the value field 312, where the 3 (that is, 1+1+1) is the one character each taken by the guide data of the guide field 310, the first special character, and the second special character. The required domain-type determination then follows. After the currently extracted value data has been compared (whether or not the comparison succeeded), the value data of the next subfield 308 can be extracted in the same way (for example, "一起走過的日子" is taken directly from the second subfield 308 of record 1) and compared for domain type. This extraction can start from record 1; after all value data of record 1 has been compared, the value data of the first subfield 308 in the title field 304 of record 2 (for example "馮小剛") is extracted and compared. The comparison procedure continues until the value data of all records has been compared.
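The offset arithmetic above (32-3=29) can be sketched in Python as follows. This is a sketch under stated assumptions, not the patented implementation: it works on characters rather than raw bits, stores the guide index as a single character, pads short values with spaces, and lays each subfield out as guide index, "：", value, "∣" following the record 1 example of FIG. 3A. The names `pack_subfield` and `value_at` are hypothetical.

```python
SUBFIELD_LEN = 32   # fixed length of each subfield 308, in characters
GUIDE_LEN = 1       # guide index (7 or 8 bits) stored as one character
SEP_LEN = 1         # each special character takes one character
VALUE_LEN = SUBFIELD_LEN - GUIDE_LEN - 2 * SEP_LEN   # 32 - 3 = 29
PAD = " "           # hypothetical padding for short value data

def pack_subfield(guide_index: int, value: str) -> str:
    """Build one fixed-length subfield: guide index, '：', padded value, '∣'."""
    if not 0 <= guide_index < 256:
        raise ValueError("guide index must fit in 8 bits")
    if len(value) > VALUE_LEN:
        raise ValueError("value data too long for the subfield")
    return chr(guide_index) + "：" + value.ljust(VALUE_LEN, PAD) + "∣"

def value_at(title: str, k: int) -> str:
    """Return the value data of the k-th subfield (0-based) using offsets only."""
    # Skip k whole subfields, then the guide index and the separator after it.
    start = k * SUBFIELD_LEN + GUIDE_LEN + SEP_LEN
    return title[start:start + VALUE_LEN].rstrip(PAD)

title = pack_subfield(0, "劉德華") + pack_subfield(1, "一起走過的日子")
print(value_at(title, 0))
print(value_at(title, 1))
```

Because every offset is a constant, no scanning for separators is needed; the trade-off is that value data longer than 29 characters cannot be stored in this layout.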

It should be noted that the length of the subfield 308 and the numbers of bits used by the guide field 310, the first special character, and the second special character may vary with the actual application; the present invention does not restrict them. Extracting value data by comparison as described above is only one embodiment and does not limit the present invention; another embodiment may use full-text search instead. In addition, skipping the guide field 310, the second special character, and the first special character can be achieved with bit shifting (for example, division), and can be implemented in hardware, software, or a combination of both, as those skilled in the art may vary according to actual needs. In another embodiment of the present invention, each subfield 308 in the title field 304 may have a fixed length, the guide field 310 within the subfield 308 may have another fixed length, and the title field 304 may omit the first and second special characters. Because the lengths of each subfield 308 and each guide field 310 are fixed, the guide data or value data in each subfield 308 can be extracted directly by skipping a specific number of bits or by bit shifting (for example, division).
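As a small illustration of why bit shifting and division are interchangeable here, the Python sketch below packs a guide index and a value into one non-negative integer and separates them both ways. The 8-bit and 24-bit widths and the function names are assumptions chosen for the example, not values given in the text.

```python
GUIDE_BITS = 8    # width of the guide field, as in the 8-bit case above
VALUE_BITS = 24   # hypothetical width of the value portion

def split_by_shift(packed: int):
    """Separate guide index and value with a right shift and a mask."""
    return packed >> VALUE_BITS, packed & ((1 << VALUE_BITS) - 1)

def split_by_division(packed: int):
    """Same separation with integer division and remainder by 2**VALUE_BITS."""
    return divmod(packed, 1 << VALUE_BITS)

packed = (5 << VALUE_BITS) | 0xABCD   # guide index 5, value 0xABCD
print(split_by_shift(packed))
print(split_by_division(packed))
```

For non-negative integers the two functions return the same pair, so a hardware implementation can use shifts while a software implementation may use whichever form is more convenient.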

It should be noted that, because the subfield 308 has a fixed length as mentioned above, a counter can be used in the natural language understanding system 100 (or in a server containing it) to record which subfield 308 of a record is currently being compared. Another counter can store the position of the record being compared. For example, with a first counter indicating the position of the record currently being compared and a second counter indicating the position of the subfield currently being compared, if the third subfield 308 of record 2 in FIG. 3A is being compared (that is, "filenameguid：華誼兄弟"), the first counter holds 2 (record 2 is being compared) and the second counter holds 3 (the third subfield 308 is being compared). Furthermore, storing the guide data of the guide field 310 in only 7 or 8 bits is intended to leave most of the subfield 308 for value data; the actual guide data can be read from the guide data storage device 280 of the retrieval system 200 using those 7 or 8 bits as an index. The guide data there is stored as a table, but any other form accessible to the retrieval system 200 can be used in the present invention. In practice, then, besides extracting the value data directly for comparison, the guide data can be fetched directly from the values of the two counters when a match occurs and sent to the knowledge-assisted understanding module 400 as the response result 110. For example, when the second subfield 308 of record 6 (that is, "songnameguid：背叛") matches, the first and second counters hold 6 and 2 respectively, so these two values can be used to consult the guide data storage device 280 shown in FIG. 3C, where subfield 2 of record 6 yields the guide data "songnameguid". In one embodiment, the length of the subfield 308 is fixed and all of its bits are used to store value data, so the guide field, the first special character, and the second special character can be removed entirely. The search engine 240 then only needs to know that every fixed number of bits begins another subfield 308 and increment the second counter by one (and, of course, increment the first counter by one each time it moves to the next record), which leaves more bits for storing value data.
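The two-counter lookup can be sketched in Python as below. The two-record database and the table contents are toy data invented for illustration (they are not the records of FIG. 3A or the table of FIG. 3C), and `find_guide` is a hypothetical name; the sketch assumes the variant in which subfields hold value data only and the guide data lives in a separate table keyed by the counter values.

```python
from typing import Dict, List, Optional, Tuple

# Toy records: each inner list holds the value data of one record's subfields.
records: List[List[str]] = [
    ["劉德華", "一起走過的日子", "港臺，粵語，流行"],   # record 1
    ["蕭敬騰", "背叛"],                                 # record 2 (toy)
]

# Toy stand-in for the guide data storage device 280:
# (record number, subfield number) -> guide data, both 1-based.
guide_table: Dict[Tuple[int, int], str] = {
    (1, 1): "singerguid", (1, 2): "songnameguid", (1, 3): "songtypeguid",
    (2, 1): "singerguid", (2, 2): "songnameguid",
}

def find_guide(keyword: str) -> Optional[str]:
    """Scan all value data; on a match, look up guide data by the two counters."""
    record_counter = 0
    for values in records:
        record_counter += 1          # first counter: which record
        subfield_counter = 0
        for value in values:
            subfield_counter += 1    # second counter: which subfield
            if value == keyword:
                return guide_table[(record_counter, subfield_counter)]
    return None

print(find_guide("背叛"))
```

Keeping the guide data out of the subfields in this way frees the whole fixed-length subfield for value data, at the cost of one table lookup per match.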

Another example illustrates how, when the comparison produces a match, the matching record 110 is returned to the knowledge-assisted understanding module 400 for further processing. With the data structure of the record 302 described above, in an embodiment of the present invention, when the user's request information 102 is "我要看讓子彈飛" ("I want to see Let the Bullets Fly"), two possible intent grammar data 106 can be produced: "<readbook>,<bookname>=讓子彈飛" and "<watchfilm>,<filmname>=讓子彈飛". The search engine 240 uses the keyword 108 "讓子彈飛" received by the retrieval interface unit 260 to perform a full-text search on the title fields 304 of the records stored in the structured database 220 of FIG. 3A. The search finds record 5, whose title field 304 stores the value data "讓子彈飛", so a match results. The retrieval system 200 then returns the guide data "filmnameguid", which corresponds to the keyword 108 "讓子彈飛" in the title field 304 of record 5, to the knowledge-assisted understanding module 400 as the response result 110. Because the title field of record 5 contains the guide data "filmnameguid" for the value data "讓子彈飛", the knowledge-assisted understanding module 400 can compare this guide data with the intent data 112 "<watchfilm>" and "<readbook>" previously stored in the possible intent grammar data 106 and determine that the determined intent grammar data 114 for this request is "<watchfilm>,<filmname>=讓子彈飛" (because both contain "film"). In other words, the data "讓子彈飛" described in the user's request information 102 is a film title, and the intent of the request information 102 is to watch the film "讓子彈飛" rather than to read a book.

A further example follows. When the user's request information 102 is "我想聽一起走過的日子" ("I want to listen to 一起走過的日子"), two possible intent grammar data 106 can be produced: "<playmusic>,<singer>=一起走過,<songname>=日子" and "<playmusic>,<songname>=一起走過的日子". The search engine 240 uses the two sets of keywords 108 received by the retrieval interface unit 260, namely "一起走過" with "日子" and "一起走過的日子", to perform a full-text search on the title fields 304 of the records stored in the structured database 220 of FIG. 3A. The search finds no match for the first set of keywords 108 in the title field 304 of any record, but finds record 1 for the second set, "一起走過的日子". The retrieval system 200 therefore returns the guide data "songnameguid", which corresponds to the second set of keywords 108 in the title field 304 of record 1, to the knowledge-assisted understanding module 400 as the matching record 110. After receiving the guide data "songnameguid" for the value data "一起走過的日子", the knowledge-assisted understanding module 400 compares it with the intent data 112 (that is, <singer>, <songname>, and so on) in the possible intent grammar data 106 (that is, "<playmusic>,<singer>=一起走過,<songname>=日子" and "<playmusic>,<songname>=一起走過的日子"). It finds that the request information 102 describes no singer name but does describe a song named "一起走過的日子" (because only <songname> matches). From this comparison the knowledge-assisted understanding module 400 determines that the determined intent grammar data 114 for the request information 102 is "<playmusic>,<songname>=一起走過的日子" and that the intent of the request information 102 is to listen to the song "一起走過的日子".
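The comparison step in the two examples above can be sketched in Python as follows. The text only says that the guide data is compared with the intent data; the specific rule used here (strip the "guid" suffix and require it to equal the slot name, so that "songnameguid" confirms `<songname>`) is an assumption made for illustration, as are the names `slot_for_guide` and `resolve` and the tuple representation of a candidate.

```python
from typing import Dict, List, Optional, Tuple

# A candidate is (intent data, {slot name: keyword}).
Candidate = Tuple[str, Dict[str, str]]

def slot_for_guide(guide: str) -> str:
    """Map guide data such as 'songnameguid' to a slot name such as 'songname'."""
    return guide[:-len("guid")] if guide.endswith("guid") else guide

def resolve(candidates: List[Candidate], matches: Dict[str, str]) -> Optional[Candidate]:
    """Return the first candidate whose every slot is confirmed by the search.

    `matches` maps each matched value data to the guide data returned for it.
    """
    for intent, slots in candidates:
        if all(slot_for_guide(matches.get(keyword, "")) == slot
               for slot, keyword in slots.items()):
            return intent, slots
    return None

film_candidates = [
    ("readbook", {"bookname": "讓子彈飛"}),
    ("watchfilm", {"filmname": "讓子彈飛"}),
]
song_candidates = [
    ("playmusic", {"singer": "一起走過", "songname": "日子"}),
    ("playmusic", {"songname": "一起走過的日子"}),
]

print(resolve(film_candidates, {"讓子彈飛": "filmnameguid"}))
print(resolve(song_candidates, {"一起走過的日子": "songnameguid"}))
```

Under this rule the first call selects the `watchfilm` candidate and the second selects the candidate with the single `songname` slot, mirroring the two outcomes described in the text.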

In another embodiment of the present invention, the response result 110 obtained by the search may be a full-match record that completely matches the keywords 108 or a partial-match record that partially matches them. For example, if the user's request information 102 is "我想聽蕭敬騰的背叛" ("I want to listen to 背叛 ('Betrayal') by 蕭敬騰 (Jam Hsiao)"), the natural language processor 300 likewise produces two possible intent grammar data 106 after analysis: "<playmusic>,<singer>=蕭敬騰,<songname>=背叛" and "<playmusic>,<songname>=蕭敬騰的背叛". It sends two sets of keywords 108, "蕭敬騰" with "背叛" and "蕭敬騰的背叛", to the retrieval interface unit 260, and the search engine 240 then uses them to perform a full-text search on the title fields 304 of the records 302 stored in the structured database 220 of FIG. 3A. In this search the second set of keywords 108, "蕭敬騰的背叛", matches no record, but the first set, "蕭敬騰" and "背叛", matches records 6 and 7. Because the first set of keywords 108 matches only the value data "蕭敬騰" in record 6 and not its other value data "楊宗緯" (Aska Yang) and "曹格" (Gary Chaw), record 6 is a partial-match record. (Note that record 5 for the request "我要看讓子彈飛" and record 1 for the request "我想聽一起走過的日子" above are also partial-match records.) The keywords "蕭敬騰" and "背叛" completely match the value data of record 7 (both keywords of the first set match), so record 7 is a full-match record.

In an embodiment of the present invention, when the retrieval interface unit 260 outputs multiple matching records 110 to the knowledge-assisted understanding module 400, it can output the full-match records (all value data matched) and the partial-match records (only some value data matched) in order, with full-match records taking priority over partial-match records. Thus, when the retrieval interface unit 260 outputs the matching records 110 for records 6 and 7, record 7 has a higher output priority than record 6, because all of record 7's value data, "蕭敬騰" and "背叛", produce matches, whereas record 6 also contains "楊宗緯" and "曹格", which do not. In other words, the better a record stored in the structured database 220 matches the keywords 108 in the request information 102, the earlier it is output, so that the user can review or select the corresponding determined intent grammar data 114. In another embodiment, the matching record 110 corresponding to the highest-priority record can be output directly and used as the determined intent grammar data 114.

The foregoing does not limit the present invention. Another embodiment may output a result as soon as any matching record is found, without priority ordering, to speed up the search (for example, for the request information 102 "我想聽蕭敬騰的背叛", as soon as record 6 is retrieved and produces a match, the guide data corresponding to record 6 is output as the matching record 110). In another embodiment, the processing corresponding to the highest-priority record can be executed directly and its result provided to the user. For example, when the highest-priority result is playing the film 三國演義, the film can be played for the user directly; likewise, when the highest-priority result is 背叛 as sung by 蕭敬騰, that song can be played for the user directly. It should be noted that these are illustrations only and do not limit the present invention.
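The priority rule (full matches before partial matches) can be sketched in Python as follows. The record contents are simplified stand-ins for records 6 and 7, reduced to singer and song name values and treating each singer as a separate value; the function names and the set-based comparison are assumptions for illustration rather than the method of the patent.

```python
from typing import Dict, List, Set, Tuple

def match_kind(record_values: Set[str], keywords: Set[str]) -> str:
    """Classify a record as 'exact', 'partial', or 'none' for a keyword set."""
    matched = record_values & keywords
    if not matched:
        return "none"
    # Exact means every value data in the record was matched by some keyword.
    return "exact" if matched == record_values else "partial"

def rank(records: Dict[int, Set[str]], keywords: Set[str]) -> List[Tuple[int, str]]:
    """Return matching record numbers, full matches before partial matches."""
    priority = {"exact": 0, "partial": 1}
    hits = [(number, match_kind(values, keywords)) for number, values in records.items()]
    hits = [hit for hit in hits if hit[1] != "none"]
    return sorted(hits, key=lambda hit: priority[hit[1]])

# Simplified stand-ins for records 6 and 7 (song type and other fields omitted).
toy_records = {
    6: {"蕭敬騰", "楊宗緯", "曹格", "背叛"},
    7: {"蕭敬騰", "背叛"},
}

ranking = rank(toy_records, {"蕭敬騰", "背叛"})
print(ranking)
```

Taking only the first element of `ranking` corresponds to the embodiment that outputs the highest-priority record directly, while returning on the first hit without sorting corresponds to the faster, unordered embodiment.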

In still another embodiment of the invention, if the user's request information 102 is "I want to listen to Andy Lau's Betrayal", one of the possible intent grammar data 106 is "<playmusic>, <singer>=Andy Lau, <songname>=Betrayal". If the search interface unit 260 feeds the keywords 108 "Andy Lau" and "Betrayal" into the search engine 240 together, no matching result is found in the database of FIG. 3A. In yet another embodiment of the invention, the search interface unit 260 may feed the keywords 108 "Andy Lau" and "Betrayal" into the search engine 240 separately, and thereby determine that "Andy Lau" is a singer name (guidance data singerguid) and that "Betrayal" is a song name (guidance data songnameguid) whose singer may be Cao Ge, or Xiao Jingteng, Yang Zongwei, and Cao Ge singing together. At this point, the natural language understanding system 100 may further prompt the user: "Is the song Betrayal sung by Xiao Jingteng?" (based on the matching result of record 7), or "Is it sung together by Xiao Jingteng, Yang Zongwei, and Cao Ge?" (based on the matching result of record 6).

In still another embodiment of the invention, the records stored in the structured database 220 may further include a source field 314 and a heat field 316. The database shown in FIG. 3B includes, in addition to the fields of FIG. 3A, a source field 314, a heat field 316, a favorite field 318, and an aversion field 320. The source field 314 of each record stores a source value indicating which structured database the record comes from (only the structured database 220 is shown in the figure, but more structured databases may exist in practice), or which user or server provided it. The natural language understanding system 100 may then search the structured database of a particular source according to the preferences the user revealed in previous request information 102 (for example, the heat value of a record is incremented by one whenever a full-text search using the keywords 108 in the request information 102 matches that record). The heat field 316 of each record 302 stores the search heat value or popularity value of the record 302 (for example, the number of times, or the probability, that the record was matched within a given period by a single user, a specific user group, or all users), for the knowledge-assisted understanding module 400 to consult when determining the user's intent. The use of the favorite field 318 and the aversion field 320 is described in detail later. Specifically, when the user's request information 102 is "I want to see Romance of the Three Kingdoms", the natural language processor 300, after analysis, may generate several possible intent grammar data 106: "<readbook>, <bookname>=Romance of the Three Kingdoms"; "<watchTV>, <TVname>=Romance of the Three Kingdoms"; and "<watchfilm>, <filmname>=Romance of the Three Kingdoms".

If the natural language understanding system 100 determines from the history of the user's request information 102 (for example, by using the heat field 316 to store the number of times the record 302 was selected by a given user) that most of the user's requests are to watch films, it may search the structured database that stores film records (in which case the source value in the source field 314 is the code of the structured database that stores film records), and may therefore preferentially determine "<watchfilm>, <filmname>=Romance of the Three Kingdoms" to be the determined intent grammar data 114. For example, in an embodiment, the heat field 316 of a record 302 may be incremented by one each time the record is matched, serving as the user's history. When a full-text search is then performed with the keyword 108 "Romance of the Three Kingdoms", the record 302 with the highest value in the heat field 316 can be picked from all the matching results and used to judge the user's intent. In an embodiment, if the natural language understanding system 100 determines that, among the search results for the keyword 108 "Romance of the Three Kingdoms", the record corresponding to the television program "Romance of the Three Kingdoms" has the highest search heat value stored in its heat field 316, it may preferentially determine "<watchTV>, <TVname>=Romance of the Three Kingdoms" to be the determined intent grammar data 114. The values stored in the heat field 316 may be changed by the computer system on which the natural language understanding system 100 runs; the invention places no limitation on this. The value of the heat field 316 may also decrease over time, to indicate that the user's interest in a given record 302 has gradually cooled; the invention places no limitation on this either.
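The handling of the heat field 316 can be illustrated with a short Python sketch. The function names, the dictionary used as storage, the sample match history, and in particular the multiplicative decay factor are all assumptions made for illustration; the text only states that the value is incremented on a match and may decrease over time.

```python
from typing import Dict, Iterable


def record_match(heat: Dict[int, float], record_id: int) -> None:
    """Add one to the heat value of a record each time it is matched."""
    heat[record_id] = heat.get(record_id, 0) + 1


def decay(heat: Dict[int, float], factor: float = 0.5) -> None:
    """Lower every heat value; the multiplicative factor is an assumption."""
    for record_id in heat:
        heat[record_id] *= factor


def hottest(heat: Dict[int, float], matched_ids: Iterable[int]) -> int:
    """Pick the matched record with the highest heat value."""
    return max(matched_ids, key=lambda record_id: heat.get(record_id, 0))


heat: Dict[int, float] = {}
for record_id in (1, 2, 2, 3, 3, 3):  # hypothetical match history
    record_match(heat, record_id)
top = hottest(heat, [1, 2, 3])  # record 3 has been matched most often
decay(heat)                     # later, interest in every record cools
```

Decay is applied to all records uniformly here; a real system might instead age each record according to when it was last matched.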

As another example, in another embodiment, a user may be particularly fond of watching the Romance of the Three Kingdoms TV series during a certain period. Because the series may be long and the user cannot finish it in a short time, the user may select it repeatedly within a short period (assuming the value in the heat field 316 is incremented by one on each match), so that a given record 302 is matched repeatedly; all of this can be learned by analyzing the data in the heat field 316. Furthermore, in another embodiment, a telecommunications operator may use the heat field 316 to indicate how popular the data provided by a given source is, with the code of that data provider stored in the source field 314. For example, suppose a certain provider of the "Romance of the Three Kingdoms TV series" has the highest probability of being selected. When a user inputs the request information 102 "I want to see Romance of the Three Kingdoms", a full-text search of the database of FIG. 3B finds three matching results: reading the book Romance of the Three Kingdoms (record 8), watching the Romance of the Three Kingdoms TV series (record 9), and watching the Romance of the Three Kingdoms film (record 10). However, because the data in the heat field 316 show that watching the Romance of the Three Kingdoms TV series is currently the most popular option (that is, the heat field values of records 8, 9, and 10 are 2, 5, and 8, respectively), the guidance data of record 10 is provided first, output as the matching record 110 to the knowledge-assisted understanding module 400, as the top-priority option for determining the user's intent. In an embodiment, the data of the source field 314 may be displayed to the user at the same time, so that the user can judge whether the TV series he or she wants to watch is offered by a particular provider. Note that the data stored in the source field 314, and the way it is changed, may likewise be handled by the computer system on which the natural language understanding system 100 runs; the invention places no limitation on this. Note also that, as those skilled in the art will appreciate, the information stored in the heat field 316, favorite field 318, and aversion field 320 of FIG. 3B may be further divided into two parts, one related to the individual user and one related to all users. The heat field 316, favorite field 318, and aversion field 320 information related to the individual user is stored on the user's mobile phone, while the server stores the heat field 316, favorite field 318, and aversion field 320 information related to all users. In this way, preference information that concerns only the individual user's choices or intents is stored only on the user's personal mobile communication device (for example, a mobile phone, tablet computer, or small notebook computer), while the server stores the information related to users as a whole. This not only saves storage space on the server but also preserves the privacy of the user's personal preferences.

Evidently, the numerical data contained in each record of the structured database disclosed by the invention are related to one another (for example, the numerical data "Andy Lau", "Days Passed Together", and "Hong Kong and Taiwan, Cantonese, pop" in record 1 all describe features of record 1), and these numerical data together express the intent, toward that record, of request information from a user (for example, a match on "Days Passed Together" indicates that the user's intent may be to access the data of record 1). Thus, when the search engine performs a full-text search on the structured database and the numerical data of a record is matched, the guidance data corresponding to that numerical data can be output (for example, "songnameguid" is output as the response result 110), and the intent of the request information can then be confirmed (for example, by comparison in the knowledge-assisted understanding module 400).

Based on what is disclosed or taught in the above exemplary embodiments, FIG. 4A is a flowchart of a retrieval method according to an embodiment of the invention. Referring to FIG. 4A, the retrieval method of this embodiment includes the following steps. A structured database is provided, and the structured database stores a plurality of records (step S410). At least one keyword is received (step S420). A full-text search is performed on the title fields of the plurality of records using the keyword (step S430); for example, the keyword 108 is input to the search interface unit 260 so that the search engine 240 performs a full-text search on the title fields 304 of the records 302 stored in the structured database 220, where the search may proceed as described for FIG. 3A or FIG. 3B, or in any manner that does not depart from their spirit. Whether the full-text search has a matching result is determined (step S440); for example, the search engine 240 determines whether the full-text search for the keyword 108 has a matching result. If there is a matching result, the exact match records and the partial match records are output in order (step S450); for example, if records in the structured database 220 match the keyword 108, the search interface unit 260 outputs, in order, the guidance data of the exact match records and of the partial match records that match the keyword 108 (obtainable through the guidance data storage device 280 of FIG. 3C) as the response result 110 to the knowledge-assisted understanding module 400, with the exact match records having a higher priority than the partial match records.

On the other hand, if there is no matching result, the user may be notified directly that the match failed and the flow ended; the user may be notified that no matching result was found and asked for further input; or possible options may be listed for the user to choose from (as in the earlier example in which the full-text search for "Andy Lau" and "Betrayal" produced no matching result) (step S460).
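The retrieval flow of FIG. 4A, including the no-match branch, can be sketched as follows. This is a hedged illustration only: the `retrieve` function, the dictionary layout of the database, the returned status strings, and the rule that a record matches when it contains every keyword are assumptions, not details given in the text.

```python
from typing import Dict, List, Tuple

# Each record id maps to (guidance data, value) pairs from its title field.
Database = Dict[int, List[Tuple[str, str]]]


def retrieve(database: Database, keywords: List[str]) -> dict:
    wanted = set(keywords)                          # S420: receive the keywords
    exact, partial = [], []
    for record_id, title in database.items():       # S430: full-text search
        values = {value for _, value in title}
        if not wanted <= values:
            continue                                 # a keyword is missing
        guidance = [guid for guid, value in title if value in wanted]
        hit = {"record": record_id, "guidance": guidance}
        (exact if values == wanted else partial).append(hit)
    if not exact and not partial:                    # S440: any matching result?
        return {"status": "no_match", "results": []}     # S460: report failure
    return {"status": "ok", "results": exact + partial}  # S450: exact first


database: Database = {  # S410: the structured database (sample records 6 and 7)
    6: [("singerguid", "Xiao Jingteng"), ("singerguid", "Yang Zongwei"),
        ("singerguid", "Cao Ge"), ("songnameguid", "Betrayal")],
    7: [("singerguid", "Xiao Jingteng"), ("songnameguid", "Betrayal")],
}
found = retrieve(database, ["Xiao Jingteng", "Betrayal"])
missing = retrieve(database, ["Andy Lau", "Betrayal"])
```

In this sketch the caller decides what to do with a `no_match` status, for instance asking the user for further input or listing possible options.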

The foregoing steps do not limit the invention, and some steps may be omitted or removed. For example, in another embodiment of the invention, step S440 may be performed by a match determination module (not shown) located outside the retrieval system 200. In yet another embodiment of the invention, step S450 may be omitted from the retrieval system 200, and the action of outputting the exact match records and the partial match records in order may instead be performed by a match result output module (not shown) located outside the retrieval system 200.

Based on what is disclosed or taught in the above exemplary embodiments, FIG. 4B is a flowchart of the operation of the natural language understanding system 100 according to another embodiment of the invention. Referring to FIG. 4B, the operation of the natural language understanding system 100 in this embodiment includes the following steps. Request information is received (step S510); for example, the user sends request information 102 having voice content or text content to the natural language understanding system 100. A structured database is provided, and the structured database stores a plurality of records (step S520). The request information is converted into grammar form (step S530); for example, the natural language processor 300 analyzes the user's request information 102 and converts it into the corresponding possible intent grammar data 106. The possible attributes of the keyword are identified (step S540); for example, the knowledge-assisted understanding module 400 identifies the possible attributes of at least one keyword 108 in the possible intent grammar data 106, e.g., the keyword 108 "Romance of the Three Kingdoms" may be a book, a film, or a television program. A full-text search is performed on the title fields 304 of the plurality of records using the keyword 108 (step S550); for example, the keyword 108 is input to the search interface unit 260 so that the search engine 240 performs a full-text search on the title fields 304 of the records stored in the structured database 220. Whether the full-text search has a matching result is determined (step S560); for example, the search engine 240 determines whether the full-text search for the keyword 108 has a matching result. If there is a matching result, the guidance data corresponding to the exact match records and the partial match records are output in order as the response result 110 (step S570); for example, if records in the structured database 220 match the keyword 108, the search interface unit 260 outputs, in order, the guidance data corresponding to the exact match records and the partial match records that match the keyword 108 as the response result 110, with the exact match records having a higher priority than the partial match records. Finally, the corresponding determined intent grammar data are output in order (step S580); for example, the knowledge-assisted understanding module 400 outputs the corresponding determined intent grammar data 114 according to the exact match records and partial match records that were output in order.
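The way matched guidance data narrows the possible intents into determined intents (steps S550 to S580) can be sketched as below. Everything here is a simplified assumption for illustration: the `determine_intents` function, the guidance names "TVnameguid" and "filmnameguid" (modeled on "songnameguid"), the rule that a slot is confirmed by a guidance name equal to the slot name plus "guid", and the sample title fields, which deliberately contain no book record.

```python
from typing import List, Tuple

# (intent, slot) pairs taken from the possible intent grammar data
PossibleIntent = Tuple[str, str]


def determine_intents(possible: List[PossibleIntent], keyword: str,
                      title_fields: List[Tuple[str, str]]) -> List[str]:
    # S550 to S570: collect the guidance data of title fields matching the keyword
    matched_guidance = {guid for guid, value in title_fields if value == keyword}
    # S580: keep only the possible intents whose slot is confirmed by guidance data
    return [intent for intent, slot in possible if slot + "guid" in matched_guidance]


possible = [("readbook", "bookname"), ("watchTV", "TVname"), ("watchfilm", "filmname")]
title_fields = [  # hypothetical (guidance data, value) pairs in the database
    ("TVnameguid", "Romance of the Three Kingdoms"),
    ("filmnameguid", "Romance of the Three Kingdoms"),
    ("songnameguid", "Betrayal"),
]
determined = determine_intents(possible, "Romance of the Three Kingdoms", title_fields)
print(determined)
```

Because the sample data holds no book titled "Romance of the Three Kingdoms", the "readbook" intent is dropped and only the TV and film intents remain.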

On the other hand, if no matching result is produced in step S560, the situation may be handled in a manner similar to step S460: the user may be notified directly that the match failed and the flow ended; the user may be notified that no matching result was found and asked for further input; or possible options may be listed for the user to choose from (as in the earlier example in which the full-text search for "Andy Lau" and "Betrayal" produced no matching result) (step S590).

The foregoing steps do not limit the invention, and some steps may be omitted or removed.

In summary, the invention extracts the keywords contained in the user's request information and performs a full-text search on the title fields of records having a specific data structure in the structured database. If a matching result is produced, the domain to which the keyword belongs can be determined, and the intent the user expressed in the request information can thereby be established.

Next, the application of the above structured database to speech recognition is described further. The first application described is a natural language dialogue system that corrects an erroneous voice response according to the user's voice input and goes on to find other possible answers to report to the user.

As mentioned above, today's mobile communication devices can already provide a natural language dialogue function that lets the user communicate with the device by voice. In current voice dialogue systems, however, when the user's voice input is ambiguous, the same utterance may signify several different intents or purposes, so the system tends to output a voice response that does not fit the voice input. As a result, in many dialogue situations the user has difficulty obtaining a voice response that matches his or her intent. To address this, the invention proposes a method for correcting a voice response and a natural language dialogue system, in which the natural language dialogue system can correct an erroneous voice response according to the user's voice input and go on to find other possible answers to report to the user. To make the content of the invention clearer, embodiments are given below as examples by which the invention can actually be implemented.

FIG. 5A is a block diagram of a natural language dialogue system according to an embodiment of the invention. Referring to FIG. 5A, the natural language dialogue system 500 includes a speech sampling module 510, a natural language understanding system 520, and a speech synthesis database 530. In an embodiment, the speech sampling module 510 receives a first voice input 501 (for example, speech from the user) and parses it to generate first request information 503. The natural language understanding system 520 then parses the first request information 503 to obtain a first keyword 509 and finds a first reported answer 511 that fits the first request information 503. (As described for FIG. 1, the first request information 503 can be processed in the same way as the request information 102: the request information 102 is analyzed to produce possible intent grammar data 106; the keywords 108 therein are used for a full-text search of the structured database 220 to obtain the response result 110; the response result 110 is compared with the intent data 112 in the possible intent grammar data 106 to produce the determined intent grammar data 114; and the analysis result output module 116 finally sends out the analysis result 104, which can serve as the first reported answer 511 in FIG. 5A.) Based on the first reported answer 511, the natural language understanding system 520 performs a corresponding speech query on the speech synthesis database 530. (Because the analysis result 104 serving as the first reported answer 511 may contain data from the fully or partially matched record 302, such as the guidance data stored in the guidance field 310, the numerical data in the value field 312, and the data in the content field 306, these data can be used for the speech query.) It then uses the retrieved first speech 513 to produce a first voice response 507 corresponding to the first voice input 501, which is output to the user. If the user considers that the first voice response 507 output by the natural language understanding system 520 does not fit the first request information 503 in the first voice input 501, the user inputs another voice input, for example a second voice input 501', to indicate this. The natural language understanding system 520 processes the second voice input 501' in the same way as the first voice input 501 to generate second request information 503', then parses the second request information 503', obtains a second keyword 509' from it, finds a second reported answer 511' that fits the second request information 503', finds the corresponding second speech 513', and finally produces from the second speech 513' a corresponding second voice response 507', which is output to the user to correct the first reported answer 511. Evidently, the natural language understanding system 520 can be based on the natural language understanding system 100 of FIG. 1, with new modules added (explained with reference to FIG. 5B below), to achieve the purpose of correcting an erroneous voice response according to the user's voice input.

The components of the natural language dialogue system 500 may be arranged in the same machine. For example, the speech sampling module 510 and the natural language understanding system 520 may be arranged in the same electronic device. The electronic device may be a mobile communication device such as a cell phone, a personal digital assistant (PDA) phone, or a smartphone; a pocket PC; a tablet PC; a notebook computer; a personal computer; or any other electronic device that has a communication function or has communication software installed; its scope is not limited here. The electronic device may run the Android operating system, a Microsoft operating system, the Linux operating system, and so on, without limitation. Of course, the components of the natural language dialogue system 500 need not be placed in the same machine; they may be distributed across different devices or systems and connected through various communication protocols. For example, the natural language understanding system 520 may be located on a cloud server or on a server in a local area network. In addition, the components of the natural language understanding system 520 may themselves be distributed across different machines; for example, they may be located in the same machine as the speech sampling module 510 or in a different one.

In this embodiment, the speech sampling module 510 receives voice input. The speech sampling module 510 may be an audio-receiving device such as a microphone, and the first voice input 501 and the second voice input 501' may be speech from the user.

In addition, the natural language understanding system 520 of this embodiment may be implemented as a hardware circuit composed of one or more logic gates. Alternatively, in another embodiment of the invention, the natural language understanding system 520 may be implemented in computer program code. For example, the natural language understanding system 520 may be implemented as code segments, written in a programming language, within an application, an operating system, a driver, or the like; these code segments are stored in a storage unit and executed by a processing unit (not shown in FIG. 5A). To help those skilled in the art further understand the natural language understanding system 520 of this embodiment, examples are described below. These are examples only and do not limit the invention; the invention may be implemented in hardware, software, firmware, or any combination of the three.

FIG. 5B is a block diagram of the natural language understanding system 520 according to an embodiment of the invention. Referring to FIG. 5B, the natural language understanding system 520 of this embodiment may include a speech recognition module 522, a natural language processing module 524, and a speech synthesis module 526. The speech recognition module 522 receives the request information sent from the speech sampling module 510, for example the first request information 503 obtained by parsing the first voice input 501, and extracts one or more first keywords 509 (for example, the keywords 108 of FIG. 1A, or phrases). The natural language processing module 524 then parses these first keywords 509 to obtain a candidate list containing at least one reported answer. (The processing is the same as for FIG. 5A: for example, the retrieval system 200 of FIG. 1A performs a full-text search on the structured database 220; after the response result 110 is obtained and compared with the intent data 112, the determined intent grammar data 114 is produced; and the reported answers are generated from the analysis result 104 sent out by the analysis result output module 116.) From all the reported answers in the candidate list, it selects one that better fits the first voice input 501 as the first reported answer 511 (for example, by picking an exact match record). Because the first reported answer 511 is an answer obtained by analysis inside the natural language understanding system 520, it must still be converted into speech before it can be output to the user for the user to judge. The speech synthesis module 526 therefore queries the speech synthesis database 530 according to the first reported answer 511. The speech synthesis database 530 records, for example, text and the corresponding speech information, which enables the speech synthesis module 526 to find the first speech 513 corresponding to the first reported answer 511 and synthesize the first voice response 507 from it. The speech synthesis module 526 can then output the synthesized first voice response 507 to the user through a voice output interface (not shown), such as a speaker or earphones. Note that, when querying the speech synthesis database 530 according to the first reported answer 511, the speech synthesis module 526 may first need to convert the format of the first reported answer 511 and then make the call through the interface specified by the speech synthesis database 530. Whether format conversion is needed when calling the speech synthesis database 530 depends on how the speech synthesis database 530 itself is defined; this is a technique well known to those skilled in the art and is not described in detail here.

As an example, if the user's first voice input 501 is "I want to watch Romance of the Three Kingdoms", the speech recognition module 522 receives from the voice sampling module 510 the first request information 503 obtained by parsing the first voice input 501, and extracts a first keyword 509 that contains, for example, "Romance of the Three Kingdoms". The natural language processing module 524 then parses this first keyword 509 (for example, the retrieval system 200 of FIG. 1A performs a full-text search on the structured database 220, the response result 110 is compared against the intent data 112 to produce the determined intent grammar data 114, and the analysis result output module 116 sends out the analysis result 104). This produces reported answers for three intent options involving "Romance of the Three Kingdoms", which are combined into a candidate list (assuming each intent option has only one reported answer, classified as "read the book", "watch the TV series", and "watch the film" respectively). From these three reported answers, the one with the highest value in the popularity field 316 (for example, record 10 of FIG. 3B) is selected as the first reported answer 511. In one embodiment, the action corresponding to the answer with the highest value in the popularity field 316 may be executed directly (for example, as mentioned earlier, directly playing "Betrayal" (背叛) sung by Jam Hsiao (蕭敬騰) for the user); the invention is not limited in this respect.
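Selecting the first reported answer by the popularity field can be sketched as a simple maximum over the candidate list. The dictionary layout and the numeric popularity values below are invented for illustration; only the idea of choosing the record with the highest popularity value comes from the text.

```python
# Hypothetical candidate list: one reported answer per intent option.
candidates = [
    {"intent": "readbook", "title": "Romance of the Three Kingdoms", "popularity": 5},
    {"intent": "watchTV", "title": "Romance of the Three Kingdoms", "popularity": 8},
    {"intent": "watchfilm", "title": "Romance of the Three Kingdoms", "popularity": 12},
]


def select_by_popularity(candidates: list[dict]) -> dict:
    """Return the reported answer whose popularity value is highest."""
    return max(candidates, key=lambda c: c["popularity"])


first_reported_answer = select_by_popularity(candidates)
```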

In addition, the natural language processing module 524 can determine whether the previous first reported answer 511 was correct by parsing a subsequently received second voice input 501' (which is fed into the voice sampling module 510 in the same way as the earlier voice input 501). The second voice input 501' is the user's reply to the first voice response 507 previously provided to the user, so it contains information on whether the user considers that first voice response 507 correct. If analysis of the second voice input 501' shows that the user considers the first reported answer 511 incorrect, the natural language processing module 524 can select another reported answer from the candidate list as the second reported answer 511'. For example, it removes the first reported answer 511 from the candidate list and picks a second reported answer 511' from the remaining answers. The speech synthesis module 526 then finds the second speech 513' corresponding to the second reported answer 511', synthesizes it into the second voice response 507', and plays it to the user.

Continuing the earlier example in which the user says "I want to watch Romance of the Three Kingdoms": if the user actually wants the TV series, then the option previously output to the user, record 10 of FIG. 3B (the film), is not what the user wants. The user may then say "I want to watch the Romance of the Three Kingdoms TV series" (explicitly naming the TV series) or "I don't want to watch the Romance of the Three Kingdoms film" (merely rejecting the current option) as the second voice input 501'. After the second voice input 501' is parsed to obtain its second request information 503' (or second keyword 509'), the second keyword 509' in the second request information 503' is found to contain "TV series" (an explicit indication) or "not the film" (a rejection of the current option), so the first reported answer 511 is judged not to meet the user's needs. Another reported answer can then be selected from the candidate list as the second reported answer 511', and the corresponding second voice response 507' is output. For example, the second voice response 507' may be "I will now play the Romance of the Three Kingdoms TV series for you" (if the user explicitly asked for the TV series), or "Which option would you like?" (if the user only rejected the current option), presented together with the other options in the candidate list for the user to choose from (for example, the reported answer with the second-highest value in the popularity field 316 may be selected as the second reported answer 511').

In another embodiment, the second voice input 501' may contain a selection. For example, when the three options "Read the Romance of the Three Kingdoms book", "Watch the Romance of the Three Kingdoms TV series", and "Watch the Romance of the Three Kingdoms film" are displayed for the user to choose from, the user may say "I want to watch the film" as the second voice input 501'. After the second request information 503' of the second voice input 501' is analyzed and the user's intent is found (for example, the second keyword 509' shows that the user selected "watch the film"), the second voice response 507' "I will now play the Romance of the Three Kingdoms film for you" is output and the film is played directly for the user. Likewise, if the user says "I want the third option" (assuming that the option the user selects is reading the book), the application corresponding to the third option is executed: the second voice response 507' "You want to read the Romance of the Three Kingdoms book" is output, and the e-book of Romance of the Three Kingdoms is displayed to the user.

In this embodiment, the speech recognition module 522, the natural language processing module 524, and the speech synthesis module 526 of the natural language understanding system 520 may be located in the same machine as the voice sampling module 510. In other embodiments, the speech recognition module 522, the natural language processing module 524, and the speech synthesis module 526 may be distributed across different machines (for example, computer systems, servers, or similar devices/systems). For example, in the natural language understanding system 520' shown in FIG. 5C, the speech synthesis module 526 may be located in the same machine 502 as the voice sampling module 510, while the speech recognition module 522 and the natural language processing module 524 may be located in another machine. In the architecture of FIG. 5C, the natural language processing module 524 transmits the first reported answer 511 or the second reported answer 511' to the speech synthesis module 526, which then sends it to the speech synthesis database to find the corresponding first speech 513 or second speech 513', which serves as the basis for generating the first voice response 507 or the second voice response 507'.

FIG. 6 is a flowchart of a method for correcting the first voice response 507 according to an embodiment of the invention. In this method, when the user considers that the first voice response 507 currently being played does not match the first request information 503 that the user previously input, the user provides a second voice input 501', which is fed into the voice sampling module 510. When the natural language understanding system 520 then analyzes it and learns that the first voice response 507 previously played to the user does not match the user's intent, the natural language understanding system 520 can output a second voice response 507' to correct the original first voice response 507. For ease of explanation, only the natural language dialogue system 500 of FIG. 5A is used as an example here, but the method of this embodiment also applies to the natural language dialogue system 500' of FIG. 5C.

Referring to FIG. 5A and FIG. 6 together, in step S602 the voice sampling module 510 receives the first voice input 501. The first voice input 501 is, for example, speech from the user, and may carry the user's first request information 503. Specifically, the first voice input 501 from the user may be a question, a command, or other request information, such as "I want to watch Romance of the Three Kingdoms", "I want to listen to the music of Wang Qing Shui (忘情水)", or "What is the temperature today".

In step S604, the natural language understanding system 520 parses at least one first keyword 509 included in the first voice input 501 to obtain a candidate list, where the candidate list has one or more reported answers. For example, when the user's first voice input 501 is "I want to watch Romance of the Three Kingdoms", the first keywords 509 obtained by the natural language understanding system 520 after analysis are, for example, "Romance of the Three Kingdoms" and "watch". As another example, when the user's first voice input 501 is "I want to listen to the song Wang Qing Shui", the first keywords 509 obtained after analysis are, for example, "Wang Qing Shui", "listen", and "song".

Next, the natural language understanding system 520 queries the structured database 220 with the first keyword 509 and obtains at least one search result (for example, the analysis result 104 of FIG. 1), which serves as a reported answer in the candidate list. The way the first reported answer 511 is selected from multiple reported answers may be as described for FIG. 1A and is not repeated here. The first keyword 509 may span different knowledge domains (such as films, books, music, or games), and a single knowledge domain may be further divided into multiple categories (for example, different authors of the same film or book title, different singers of the same song title, or different versions of the same game title). For a given first keyword 509, the natural language understanding system 520 may therefore find one or more search results related to it in the structured database (for example, the analysis result 104). Each search result may include guide data related to the first keyword 509, plus other data. For example, when a full-text search is performed on the structured database 220 of FIGS. 3A and 3B with "Jam Hsiao" (蕭敬騰) and "Betrayal" (背叛) as the keywords 108, two matching results are obtained, such as records 6 and 7 of FIG. 3A, which contain the guide data "singerguid" and "songnameguid" respectively; this guide data is the data stored in the guide field 310. The other data is, for example, the keywords in a search result other than those related to the first keyword 509. For example, when a full-text search on the structured database 220 of FIG. 3A with "The Days We Spent Together" (一起走過的日子) as the keyword yields record 1 as the match, "Andy Lau" (劉德華) and "Hong Kong/Taiwan, Cantonese, pop" are the other data. Seen from another angle, when the first voice input 501 contains multiple first keywords 509, the user's first request information 503 is more specific, so the natural language understanding system 520 is better able to find search results close to the first request information 503.

For example, when the first keyword 509 is "Romance of the Three Kingdoms" (for example, when the user's voice input is "I want to watch Romance of the Three Kingdoms"), the natural language understanding system 520 may, after analysis, produce three items of possible intent grammar data 106 (as shown in FIG. 1): "<readbook>,<bookname>=Romance of the Three Kingdoms"; "<watchTV>,<TVname>=Romance of the Three Kingdoms"; and "<watchfilm>,<filmname>=Romance of the Three Kingdoms".
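The expansion of one keyword into several possible intent grammar strings can be sketched as follows. The intent and slot names are taken from the three strings quoted above; the template table and the function name `possible_intents` are assumptions made for this illustration.

```python
# Intent name -> slot name, following the three grammar strings in the text.
INTENT_TEMPLATES = {
    "readbook": "bookname",
    "watchTV": "TVname",
    "watchfilm": "filmname",
}


def possible_intents(keyword: str) -> list[str]:
    """Expand one keyword into one grammar string per candidate intent."""
    return [f"<{intent}>,<{slot}>={keyword}"
            for intent, slot in INTENT_TEMPLATES.items()]


grammar_data = possible_intents("Romance of the Three Kingdoms")
```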

The search results found are therefore records relating to "...'Romance of the Three Kingdoms'...'book'" (intent data <readbook>), "...'Romance of the Three Kingdoms'...'TV series'" (intent data <watchTV>), and "...'Romance of the Three Kingdoms'...'film'" (intent data <watchfilm>), for example records 8, 9, and 10 of FIG. 3B, where "book", "TV series", and "film" each indicate the corresponding user intent. As another example, when the first keywords 509 are "Wang Qing Shui" and "music" (for example, when the user's voice input is "I want to listen to the music of Wang Qing Shui"), the natural language understanding system 520 may produce the following possible intent grammar data after analysis: "<playmusic>,<songname>=Wang Qing Shui". The search results found are, for example, the record relating to "...'Wang Qing Shui'...'Andy Lau'" (such as record 11 of FIG. 3B) and the record relating to "...'Wang Qing Shui'...'Li Yijun' (李翊君)" (such as record 12 of FIG. 3B), where "Andy Lau" and "Li Yijun" are the intent data corresponding to the user. In other words, each search result may include the first keyword 509 and the intent data related to it. Based on the search results found, the natural language understanding system 520 converts the data they contain into reported answers and records those answers in the candidate list for use in subsequent steps.

In step S606, the natural language understanding system 520 selects at least one first reported answer 511 from the candidate list and outputs the corresponding first voice response 507 based on it. In this embodiment, the natural language understanding system 520 may arrange the reported answers in the candidate list in priority order, select a reported answer from the candidate list according to that order, and output the first voice response 507 accordingly.

For example, when the first keyword 509 is "Romance of the Three Kingdoms", suppose the natural language understanding system 520 finds the most records relating to "...'Romance of the Three Kingdoms'...'book'" (that is, priority is determined by the number of records found), followed by records relating to "...'Romance of the Three Kingdoms'...'music'", and the fewest records relating to "...'Romance of the Three Kingdoms'...'TV series'". The natural language understanding system 520 then takes "the Romance of the Three Kingdoms book" as the first reported answer (the highest-priority answer), "the Romance of the Three Kingdoms music" as the second reported answer (the second-priority answer), and "the Romance of the Three Kingdoms TV series" as the third reported answer (the third-priority answer). If more than one record relates to "the Romance of the Three Kingdoms book", the first reported answer 511 can further be chosen according to a priority order (for example, by the number of times each record has been selected); the details were described earlier and are not repeated here.
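Ordering the answer categories by how many records were found can be sketched with a counter. The record counts below are invented for illustration, and the `category` key and `rank_categories` name are assumptions of this sketch rather than anything specified in the text.

```python
from collections import Counter


def rank_categories(search_results: list[dict]) -> list[str]:
    """Order categories by number of matching records, most records first."""
    counts = Counter(result["category"] for result in search_results)
    return [category for category, _ in counts.most_common()]


# Hypothetical search results: 3 book records, 2 music records, 1 TV record.
search_results = (
    [{"category": "book"}] * 3
    + [{"category": "music"}] * 2
    + [{"category": "TV series"}]
)
priority_order = rank_categories(search_results)
```

The first entry of `priority_order` would correspond to the first reported answer, the second entry to the second reported answer, and so on.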

Next, in step S608, the voice sampling module 510 receives the second voice input 501', and the natural language understanding system 520 parses it and determines whether the previously selected first reported answer 511 is correct. Here, the voice sampling module 510 parses the second voice input 501' to extract the second keyword 509' it contains, where the second keyword 509' is, for example, a keyword further provided by the user (such as a time, an intent, or a knowledge domain). When the second keyword 509' in the second voice input 501' does not match the intent data associated with the first reported answer 511, the natural language understanding system 520 determines that the previously selected first reported answer 511 is incorrect. How to determine whether the second request information 503' of the second voice input 501' confirms or rejects the first voice response 507 was described earlier and is not repeated here.

Furthermore, the second voice input 501' parsed by the natural language understanding system 520 may or may not include an explicit second keyword 509'. For example, the voice sampling module 510 may receive from the user "I don't mean the Romance of the Three Kingdoms book" (case A), "I don't mean the Romance of the Three Kingdoms book, I mean the Romance of the Three Kingdoms TV series" (case B), or "I mean the Romance of the Three Kingdoms TV series" (case C). The second keywords 509' in case A are, for example, "not", "Romance of the Three Kingdoms", "book"; those in case B are, for example, "not", "Romance of the Three Kingdoms", "book", "is", "Romance of the Three Kingdoms", "TV series"; and those in case C are, for example, "is", "Romance of the Three Kingdoms", "TV series". For ease of explanation, only cases A, B, and C are listed here, but this embodiment is not limited to them.

Next, the natural language understanding system 520 uses the second keyword 509' included in the second voice input 501' to determine whether the intent data associated with the first reported answer 511 is correct. That is, if the first reported answer 511 is "the Romance of the Three Kingdoms book" and the second keywords 509' are "Romance of the Three Kingdoms" and "TV series", the natural language understanding system 520 determines that the intent data associated with the first reported answer 511 (the user wants the "book") does not match the second keyword 509' from the user's second voice input 501' (the user wants the "TV series"), and thus that the first reported answer 511 is incorrect. Similarly, if the reported answer is "the Romance of the Three Kingdoms book" and the second keywords 509' are "not", "Romance of the Three Kingdoms", "book", the natural language understanding system 520 also determines that the first reported answer 511 is incorrect.
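The correctness check for cases A, B, and C can be sketched as below. Representing the second keywords as (polarity, category) pairs is an assumption of this sketch; the text only lists the keywords themselves and does not say how negation is paired with a category.

```python
def first_answer_is_correct(first_category: str,
                            second_keywords: list[tuple[str, str]]) -> bool:
    """Judge the first reported answer against the user's follow-up keywords.

    Each keyword pair is (polarity, category) with polarity "is" or "not".
    """
    for polarity, category in second_keywords:
        # The user explicitly rejects the category that was reported.
        if polarity == "not" and category == first_category:
            return False
        # The user explicitly asks for a different category.
        if polarity == "is" and category != first_category:
            return False
    return True


case_a = [("not", "book")]
case_b = [("not", "book"), ("is", "TV series")]
case_c = [("is", "TV series")]
```

With a first reported answer in the "book" category, all three cases are judged incorrect, while a confirmation such as `[("is", "book")]` is judged correct.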

When the natural language understanding system 520, after parsing the second voice input 501', determines that the previously output first voice response 507 is correct, then as shown in step S610 it produces a response corresponding to the second voice input 501'. For example, if the second voice input 501' from the user is "Yes, the Romance of the Three Kingdoms book", the natural language understanding system 520 may output the second voice response 507' "Opening the Romance of the Three Kingdoms book for you". Alternatively, while playing the second voice response 507', the natural language understanding system 520 may directly load the content of the Romance of the Three Kingdoms book through a processing unit (not shown).

However, when the natural language understanding system 520, after parsing the second voice input 501', determines that the previously output first voice response 507 (that is, the reported answer 511) is incorrect, then as shown in step S612 it selects a reported answer other than the first reported answer 511 from the candidate list and outputs the second voice response 507' based on that selection. If the second voice input 501' provided by the user contains no explicit second keyword 509' (as in case A above), the natural language understanding system 520 may select the second-priority reported answer from the candidate list according to the priority order. If the second voice input 501' does contain an explicit second keyword 509' (as in cases B and C above), the natural language understanding system 520 may select the corresponding reported answer from the candidate list directly according to the second keyword 509' indicated by the user.

On the other hand, if the second voice input 501' provided by the user contains an explicit second keyword 509' (as in cases B and C above) but the natural language understanding system 520 finds no reported answer in the candidate list matching that second keyword 509', the natural language understanding system 520 outputs a third voice response, such as "That book cannot be found" or "I don't know".
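The three branches of step S612 (no explicit keyword, explicit keyword with a match, explicit keyword without a match) can be sketched as one function. The function name, the category strings, and the exact response wording are assumptions for illustration; the candidate list is assumed to be already sorted by priority.

```python
from typing import Optional


def second_voice_response(candidates: list[str], first: str,
                          requested: Optional[str] = None) -> str:
    """Choose the follow-up response after the first answer was rejected."""
    remaining = [c for c in candidates if c != first]
    if requested is None:
        # Case A: no explicit keyword, fall back to the next priority.
        if remaining:
            return f"Do you mean the {remaining[0]}?"
        return "I don't know"
    if requested in remaining:
        # Cases B and C: the user named the category they want.
        return f"Opening the {requested}"
    # Explicit keyword with no matching candidate: third voice response.
    return "I don't know"


ordered_candidates = ["book", "TV series", "film"]
```

For instance, a request for "comic" against this list falls through to the third voice response, since no candidate in that category exists.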

To help those skilled in the art further understand the method of correcting a voice response and the natural language dialogue system of this embodiment, a further example is described in detail below.

First, suppose the first voice input 501 received by the voice sampling module 510 is "I want to watch Romance of the Three Kingdoms" (step S602). The natural language understanding system 520 then parses out the first keywords 509 "watch" and "Romance of the Three Kingdoms" and obtains a candidate list with multiple reported answers, each of which has associated keywords and other data (the other data may be stored in the content field 306 of FIGS. 3A/3B or as part of the value field 312 of each record 302) (step S604), as shown in Table 1 (assuming the search results contain only one entry each for the book, TV series, music, and film of Romance of the Three Kingdoms).

Next, the natural language understanding system 520 selects the required reported answer from the candidate list. Suppose it selects reported answers in order and takes reported answer a as the first reported answer 511. The natural language understanding system 520 then outputs, for example, "Play the Romance of the Three Kingdoms book?" as the first voice response 507 (step S606).

At this point, if the second voice input 501' received by the voice sampling module 510 is "Yes" (step S608), the natural language understanding system 520 determines that reported answer a is correct, outputs another voice response, "Please wait" (that is, the second voice response 507'), and loads the content of the Romance of the Three Kingdoms book through a processing unit (not shown) (step S610).

However, if the second voice input 501' received by the voice sampling module 510 is "I don't mean the Romance of the Three Kingdoms book" (step S608), the natural language understanding system 520 determines that reported answer a is incorrect and selects another reported answer from the remaining answers b to e in the candidate list as the second reported answer 511', for example reported answer b, "Play the Romance of the Three Kingdoms TV series?". If the user goes on to answer "Not the TV series", the natural language understanding system 520 selects one of reported answers c to e to report. Moreover, if all of reported answers a to e in the candidate list have been reported to the user and none of them matches the user's voice input 501, the natural language understanding system 520 outputs the voice response 507 "No data found" (step S612).
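The walk through steps S606 to S612 can be sketched as a loop that proposes each candidate in turn until one is accepted or the list is exhausted. The function name, the "yes"/"no" reply encoding, and the candidate labels are assumptions made for the example; the contents of Table 1 are not reproduced here.

```python
from typing import Optional


def correction_dialogue(candidates: list[str],
                        user_replies: list[str]) -> tuple[Optional[str], list[str]]:
    """Propose candidates in order; stop at the first one the user accepts."""
    replies = iter(user_replies)
    transcript = []
    for candidate in candidates:
        transcript.append(f"Play the Romance of the Three Kingdoms {candidate}?")
        # A missing reply is treated as a rejection in this sketch.
        if next(replies, "no") == "yes":
            transcript.append("Please wait")
            return candidate, transcript
    # Every candidate was reported and rejected.
    transcript.append("No data found")
    return None, transcript


accepted, spoken = correction_dialogue(["book", "TV series", "music", "film"],
                                       ["no", "yes"])
```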

In another embodiment, if in step S608 the voice sampling module 510 receives the second voice input 501' "I mean the Romance of the Three Kingdoms comic", then because the candidate list contains no reported answer about a comic, the natural language understanding system 520 directly outputs the second voice response 507' "No data found".

Based on the above, the natural language understanding system 520 can output the corresponding first voice response 507 according to the first voice input 501 from the user. When the first voice response 507 output by the natural language understanding system 520 does not match the request information 503 of the user's first voice input 501, the natural language understanding system 520 can correct the originally output first voice response 507 and, based on the second voice input 501' subsequently provided by the user, output a second voice response 507' that better matches the user's first request information 503. Thus, if the user is not satisfied with the answer provided by the natural language understanding system 520, the system can correct it automatically and report a new voice response to the user, making conversation between the user and the natural language dialogue system 500 more convenient.

It is worth mentioning that, in steps S606 and S612 of FIG. 6, the natural language understanding system 520 may also sort the reported answers in the candidate list according to different methods of evaluating priority, select a reported answer from the candidate list in that priority order, and then output the voice response corresponding to the selected reported answer.

For example, the natural language understanding system 520 may rank the first reported answers 511 in the candidate list according to the usage habits of the general public, with answers that people use more often ranked higher. Taking the first keyword 509 "Romance of the Three Kingdoms" again as an example, suppose the reported answers found by the natural language understanding system 520 are the TV series, the book, and the music of Romance of the Three Kingdoms. If people who mention "Romance of the Three Kingdoms" usually mean the book, fewer mean the TV series, and fewer still mean the music (for example, when the value stored in the popularity field 316 in FIG. 3C represents the matches of all users, the value of the popularity field 316 is highest in the "book" record of "Romance of the Three Kingdoms"), the natural language understanding system 520 ranks the reported answers in the order "book", "TV series", "music". That is, the natural language understanding system 520 preferentially selects "the book Romance of the Three Kingdoms" as the first reported answer 511 and outputs the first voice response 507 according to this first reported answer 511.

In addition, the natural language understanding system 520 may determine the priority of the reported answers according to user habits. Specifically, the natural language understanding system 520 may record the voice inputs it has received from the user (including the first voice input 501, the second voice input 501', or any other voice input from the user) in a feature database (for example, as shown in FIG. 7A/7B), which may be stored in a storage device such as a hard disk. The feature database may record data about the user's preferences and habits, such as the first keyword 509 obtained when the natural language understanding system 520 parses the user's voice input 501 and the response records generated by the natural language understanding system 520. The storage and retrieval of user preference/habit data is further described later with reference to FIG. 7A/7B/8. Moreover, in an embodiment, when the value stored in the popularity field 316 in FIG. 3C is related to the user's habits (for example, the number of matches), the value of the popularity field 316 can be used to determine the user's usage habits or priority order. Therefore, when selecting a reported answer, the natural language understanding system 520 can rank the reported answers according to the user habits and other information recorded in the feature database 730, so as to output a voice response 507 that better matches the user's voice input 501. For example, in FIG. 3B, the values stored in the popularity field 316 of records 8/9/10 are 2/5/8, which may indicate that the "book", "TV series", and "movie" of "Romance of the Three Kingdoms" have been matched 2/5/8 times respectively, so the reported answer corresponding to "the movie Romance of the Three Kingdoms" is selected first.
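As a minimal sketch of this ranking, the records below reuse the values 2/5/8 given for records 8/9/10. The `Record` class and its attribute names are assumptions made for illustration; only the popularity values and categories come from the description.

```python
from dataclasses import dataclass

@dataclass
class Record:
    record_id: int
    title: str
    category: str
    popularity: int  # stands in for the popularity field 316

def rank_by_popularity(records):
    """Order records so that the most frequently matched one comes first."""
    return sorted(records, key=lambda r: r.popularity, reverse=True)

TITLE = "Romance of the Three Kingdoms"
records = [
    Record(8, TITLE, "book", 2),
    Record(9, TITLE, "TV series", 5),
    Record(10, TITLE, "movie", 8),
]

ranked = rank_by_popularity(records)
first_reported_answer = ranked[0]  # the "movie" record
```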

On the other hand, the natural language understanding system 520 may also select a reported answer according to user habits. For example, suppose that in conversations with the natural language understanding system 520 the user often says "I want to read the book Romance of the Three Kingdoms", less often says "I want to watch the TV series Romance of the Three Kingdoms", and even less often says "I want to listen to the music of Romance of the Three Kingdoms" (for example, the user dialogue database contains 20 records about "the book Romance of the Three Kingdoms" (as shown, for example, in the preference field 318 of record 8 in FIG. 3B), 8 records about "the TV series Romance of the Three Kingdoms" (as shown, for example, in the preference field 318 of record 9 in FIG. 3B), and 1 record about "the music of Romance of the Three Kingdoms"). The priority order of the reported answers in the candidate list is then "the book Romance of the Three Kingdoms", "the TV series Romance of the Three Kingdoms", and "the music of Romance of the Three Kingdoms". That is, when the first keyword 509 is "Romance of the Three Kingdoms", the natural language understanding system 520 selects "the book Romance of the Three Kingdoms" as the first reported answer 511 and outputs the first voice response 507 according to this first reported answer 511.
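The counting in this example can be sketched as follows, assuming the user dialogue database is reduced to a flat list with one category label per recorded utterance. The list layout and the function name `rank_by_user_habit` are hypothetical; the counts 20, 8, and 1 are taken from the example.

```python
from collections import Counter

def rank_by_user_habit(history, candidates):
    """Order candidates by how often the user has asked for each category."""
    counts = Counter(history)  # categories never mentioned count as 0
    return sorted(candidates, key=lambda c: counts[c], reverse=True)

# 20 requests for the book, 8 for the TV series, 1 for the music.
history = ["book"] * 20 + ["TV series"] * 8 + ["music"] * 1
candidates = ["music", "TV series", "book"]

ranked = rank_by_user_habit(history, candidates)
```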

It is worth mentioning that the natural language understanding system 520 may also determine the priority of the reported answers according to user preferences. Specifically, the user dialogue database may also record keywords the user has expressed, such as "like", "idol", "dislike", or "hate". The natural language understanding system 520 can therefore sort the reported answers in the candidate list according to the number of times these keywords have been recorded. For example, a reported answer that is associated with "like" more often is selected earlier, whereas a reported answer that is associated with "dislike" more often is selected later.

For example, suppose that in conversations with the natural language understanding system 520 the user often says "I hate watching the TV series Romance of the Three Kingdoms", less often says "I hate listening to the music of Romance of the Three Kingdoms", and even less often says "I hate reading the book Romance of the Three Kingdoms" (for example, the user dialogue database contains 20 records of "I hate watching the TV series Romance of the Three Kingdoms" (which may be recorded, for example, in the dislike field 320 of record 9 in FIG. 3B), 8 records of "I hate listening to the music of Romance of the Three Kingdoms", and 1 record of "I hate reading the book Romance of the Three Kingdoms" (recorded, for example, in the dislike field 320 of record 8 in FIG. 3B)). The priority order of the reported answers in the candidate list is then, in sequence, "the book Romance of the Three Kingdoms", "the TV series Romance of the Three Kingdoms", and "the music of Romance of the Three Kingdoms". That is, when the first keyword 509 is "Romance of the Three Kingdoms", the natural language understanding system 520 selects the book Romance of the Three Kingdoms as the first reported answer 511 and outputs the first voice response 507 according to this first reported answer 511. In an embodiment, a "dislike field 320" may be added alongside the popularity field 316 of FIG. 3B to record the user's "degree of dislike". In another embodiment, when the user's "dislike" of a record is parsed, the popularity field 316 (or the preference field 318) of the corresponding record may be directly decremented by one (or by another value), so that the user's preference is recorded without adding a field. Various ways of recording user preferences can be applied in the embodiments of the present invention, and the present invention is not limited in this respect. Other embodiments that record and use user habit data, and that provide responses and reported answers according to user/public usage habits and preferences, are explained in more detail later with reference to FIG. 7A/7B/8.
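The two ways of recording dislike mentioned above (a separate dislike field 320, or decrementing the popularity field 316) can be sketched as follows. The combined score `popularity - dislike` is an assumption made for this illustration; the description does not specify how the two fields are combined, and all names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Record:
    category: str
    popularity: int = 0  # stands in for the popularity field 316
    dislike: int = 0     # stands in for the dislike field 320

def note_dislike(record, times=1, separate_field=True):
    """Record that the user expressed dislike for this record."""
    if separate_field:
        record.dislike += times     # embodiment with an extra field
    else:
        record.popularity -= times  # embodiment without an extra field

def rank(records):
    """Assumed scoring: higher popularity minus dislike ranks first."""
    return sorted(records, key=lambda r: r.popularity - r.dislike, reverse=True)

def build(separate_field):
    records = {c: Record(c) for c in ("book", "TV series", "music")}
    note_dislike(records["TV series"], 20, separate_field)
    note_dislike(records["music"], 8, separate_field)
    note_dislike(records["book"], 1, separate_field)
    return rank(list(records.values()))

with_field = build(separate_field=True)
without_field = build(separate_field=False)
```

Under this assumed scoring both embodiments produce the same ranking, and the least disliked record, the book, is selected first.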

On the other hand, the natural language understanding system 520 may also determine the priority of at least one reported answer according to a voice input that the user entered before the natural language dialogue system 500 provided a reported answer (for example, before the first voice input 501 is played, when the user does not yet know which reported answers the natural language dialogue system 500 will offer for selection). That is, if a voice input (for example, a fourth voice input) is received by the voice sampling module 510 before the first voice input 501 is played, the natural language understanding system 520 can parse the fourth keyword in the fourth voice input, preferentially select from the candidate list a fourth reported answer that matches the fourth keyword, and output a fourth voice response according to this fourth reported answer.

For example, suppose the natural language understanding system 520 first receives the first voice input 501 "I want to watch a TV series" and, shortly afterwards (for example, a few seconds later), receives the fourth voice input 501 "Just play Romance of the Three Kingdoms for me". The natural language understanding system 520 can then recognize the first keyword 509 "TV series" in the first voice input 501 and subsequently recognize "Romance of the Three Kingdoms" as the fourth keyword. The natural language understanding system 520 therefore selects from the candidate list the reported answer about both "Romance of the Three Kingdoms" and "TV series", and outputs a fourth voice response to the user according to this fourth reported answer.
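A minimal sketch of how keywords from consecutive voice inputs might be accumulated to narrow the candidate list is given below. Representing each reported answer as a set of terms is an assumption, and the second TV series title is a hypothetical entry added only to show the narrowing.

```python
def select_with_context(candidates, keywords):
    """Return the candidates that contain every keyword gathered so far."""
    return [c for c in candidates if keywords <= c]

TITLE = "Romance of the Three Kingdoms"
candidates = [
    frozenset({TITLE, "book"}),
    frozenset({TITLE, "TV series"}),
    frozenset({TITLE, "music"}),
    frozenset({"Journey to the West", "TV series"}),  # hypothetical entry
]

keywords = set()
keywords |= {"TV series"}  # from "I want to watch a TV series"
after_first = select_with_context(candidates, keywords)

keywords |= {TITLE}        # from the later input naming the title
after_second = select_with_context(candidates, keywords)
```

After the first input two TV series remain; after the second input only the record matching both "Romance of the Three Kingdoms" and "TV series" remains.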

Based on the above, the natural language understanding system 520 can take the voice input from the user and, with reference to information such as public usage habits, user preferences, user habits, or the preceding and following dialogue with the user, output to the user a voice response that better matches the request information of the voice input. The natural language understanding system 520 can prioritize the reported answers in the candidate list by different ranking methods, such as public usage habits, user preferences, user habits, or the preceding and following dialogue with the user. Thus, when the voice input from the user is ambiguous, the natural language understanding system 520 can refer to public usage habits, user preferences, user habits, or the preceding and following dialogue with the user to determine the intent behind the user's voice input 501 (for example, the attribute or knowledge domain of the keyword 509 in the first voice input 501). In other words, if a reported answer is close to the intent that the user has expressed or that people generally mean, the natural language understanding system 520 gives priority to that reported answer. As a result, the voice response output by the natural language dialogue system 500 better matches the user's request information.

In summary, in the method for correcting a voice response and the natural language dialogue system of this embodiment, the natural language dialogue system outputs the first voice response 507 corresponding to the first voice input 501 from the user. When the first voice response 507 does not match the first request information 503 or the first keyword 509 of the user's first voice input 501, the natural language dialogue system can correct the first voice response 507 and, according to the second voice input 501' subsequently provided by the user, select a second voice response 507' that better meets the user's needs. In addition, the natural language dialogue system can preferentially select a more appropriate reported answer according to public usage habits, user preferences, user habits, or the preceding and following dialogue with the user, and output the corresponding voice response to the user. In this way, if the user is not satisfied with the answer provided by the natural language dialogue system, the system can automatically correct it according to the request information the user states each time and report a new voice response to the user, which makes conversations with the natural language dialogue system more convenient.

Next, an example is described in which the architecture and components of the natural language understanding system 100, the structured database 220, and so on are applied to provide responses and reported answers according to the dialogue scenario and context with the user, user habits, public usage habits, and user preferences.

FIG. 7A is a block diagram of a natural language dialogue system according to an embodiment of the present invention. Referring to FIG. 7A, the natural language dialogue system 700 includes a voice sampling module 710, a natural language understanding system 720, a feature database 730, and a speech synthesis database 740. In fact, the voice sampling module 710 in FIG. 7A is the same as the voice sampling module 510 of FIG. 5A, and the natural language understanding system 720 is the same as the natural language understanding system 520, so they perform the same functions. In addition, when analyzing the request information 703, the natural language understanding system 720 can also obtain the user's intent by performing a full-text search on the structured database 220 of FIG. 1; this technique has been described above with reference to FIG. 1 and is not repeated here. The feature database 730 stores the user preference data 715 sent by the natural language understanding system 720 and provides the user preference record 717 to the natural language understanding system 720, as detailed later. The speech synthesis database 740 is equivalent to the speech synthesis database 530 and is used to provide voice output to the user. In this embodiment, the voice sampling module 710 receives the voice input 701 (that is, the first/second voice input 501/501' of FIG. 5A/B, which is the voice from the user), and the natural language understanding system 720 parses the request information 703 in the voice input (that is, the first/second request information 503/503' of FIG. 5A/B) and outputs the corresponding voice response 707 (that is, the first/second voice response 507/507' of FIG. 5A/B). The components of the natural language dialogue system 700 may be arranged in the same machine; the present invention is not limited in this respect.

The natural language understanding system 720 receives, from the voice sampling module 710, the request information 703 obtained by parsing the voice input 701. According to one or more keywords 709 in the voice input 701, the natural language understanding system 720 generates a candidate list containing at least one reported answer, finds in the candidate list the one that best matches the keywords 709 as the reported answer 711, queries the speech synthesis database 740 to find the speech 713 corresponding to the reported answer 711, and finally outputs the voice response 707 according to the speech 713. The natural language understanding system 720 of this embodiment may be implemented as a hardware circuit composed of one or more logic gates, or as computer program code; these are examples only and not limitations.
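The flow from keywords 709 to voice response 707 can be sketched as follows. This is an illustration under stated assumptions: reported answers are plain strings, the speech synthesis database 740 is modelled as a dictionary from answer text to an audio file name, and falling back to the answer text when no speech is found is a choice made for this sketch. All names are hypothetical.

```python
def pick_answer(keywords, candidates):
    """Return the candidate containing the most keywords (reported answer 711)."""
    if not candidates:
        return None
    # max() keeps the first candidate when several tie.
    return max(candidates, key=lambda c: sum(k in c for k in keywords))

def respond(keywords, candidates, synthesis_db):
    """Look up the speech 713 for the chosen answer."""
    answer = pick_answer(keywords, candidates)
    if answer is None:
        return None
    # Assumed fallback: return the text itself if no speech is stored.
    return synthesis_db.get(answer, answer)

TITLE = "Romance of the Three Kingdoms"
candidates = [f"{TITLE} book", f"{TITLE} TV series", f"{TITLE} music"]
synthesis_db = {f"{TITLE} TV series": "tv_series.wav"}  # hypothetical entry

speech = respond([TITLE, "TV series"], candidates, synthesis_db)
```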

FIG. 7B is a block diagram of a natural language dialogue system 700' according to another embodiment of the present invention. The natural language understanding system 720' of FIG. 7B may include a speech recognition module 722 and a natural language processing module 724, and the voice sampling module 710 may be combined with a speech synthesis module 726 in an integrated speech processing module 702. The speech recognition module 722 receives, from the voice sampling module 710, the request information 703 obtained by parsing the voice input 701 and converts it into one or more keywords 709. The natural language processing module 724 then processes these keywords 709 to obtain at least one candidate list and selects from the candidate list the one that best matches the voice input 701 as the reported answer 711. Because the reported answer 711 is an answer derived internally by the natural language understanding system 720, it must be converted into text or speech before it can be output to the user. The speech synthesis module 726 therefore queries the speech synthesis database 740 according to the reported answer 711. The speech synthesis database 740 records, for example, text and its corresponding speech information, so that the speech synthesis module 726 can find the speech 713 corresponding to the reported answer 711 and synthesize the voice response 707. The speech synthesis module 726 can then output the synthesized speech to the user through a voice output interface (not shown), such as a loudspeaker or earphones. Note that in FIG. 7A the speech synthesis module 726 is incorporated into the natural language understanding system 720 (as in the architecture of FIG. 5B, although the speech synthesis module 726 is not shown in FIG. 7A), and the speech synthesis module uses the reported answer 711 to query the speech synthesis database 740 and obtain the speech 713 from which the voice response 707 is synthesized.

In this embodiment, the speech recognition module 722, the natural language processing module 724, and the speech synthesis module 726 of the natural language understanding system 720 are respectively equivalent to the speech recognition module 522, the natural language processing module 524, and the speech synthesis module 526 of FIG. 5B and provide the same functions. The speech recognition module 722, the natural language processing module 724, and the speech synthesis module 726 may be arranged in the same machine as the voice sampling module 710. In other embodiments, they may be distributed across different machines (for example, computer systems, servers, or similar devices/systems). For example, in the natural language understanding system 720' shown in FIG. 7B, the speech synthesis module 726 may be arranged in the same machine 702 as the voice sampling module 710, while the speech recognition module 722 and the natural language processing module 724 may be arranged in another machine. Note that in the architecture of FIG. 7B, because the speech synthesis module 726 and the voice sampling module 710 are arranged in the machine 702, the natural language understanding system 720 must transmit the reported answer 711 to the machine 702, where the speech synthesis module 726 sends the reported answer 711 to the speech synthesis database 740 to find the corresponding speech 713 from which the voice response 707 is generated. In addition, when the speech synthesis module 726 calls the speech synthesis database 740 with the reported answer 711, it may first need to convert the format of the reported answer 711 and then make the call through the interface specified by the speech synthesis database 740. This is well known to those skilled in the art and is not described in detail here.

The natural language dialogue method is described below with reference to the natural language dialogue system 700 of FIG. 7A. FIG. 8 is a flowchart of a natural language dialogue method according to an embodiment of the present invention. For convenience, only the natural language dialogue system 700 of FIG. 7A is used as an example, but the natural language dialogue method of this embodiment also applies to the natural language dialogue system 700' of FIG. 7B. Whereas FIG. 5/6 deal with automatically correcting the output information according to the user's voice input, FIG. 7A/7B/8 deal with recording user preference data 715 in the feature database 730, selecting one reported answer 711 from the candidate list accordingly, and playing the corresponding speech to the user. The embodiments of FIG. 5/6 and FIG. 7A/7B/8 may be used alternatively or together; the present invention is not limited in this respect.

Referring to FIG. 7A and FIG. 8 together, in step S810 the voice sampling module 710 receives the voice input 701. The voice input 701 is, for example, the user's voice and may carry the user's request information 703. Specifically, the voice input 701 from the user may be a question, a command, or other request information, such as the earlier examples "I want to see Romance of the Three Kingdoms", "I want to listen to the music of Wang Qing Shui", or "What is the temperature today?". Note that steps S802-S806 are the flow in which the natural language dialogue system 700 stores user preference data 715 for the user's earlier voice inputs, and the subsequent steps S810-S840 operate on the user preference data 715 already stored in the feature database 730. Steps S802-S806 are detailed later; the operations of steps S820-S840 are described first.

In step S820, the natural language understanding system 720 parses at least one keyword 709 included in the first voice input 701 to obtain a candidate list, which has one or more reported answers. In detail, the natural language understanding system 720 parses the voice input 701 and obtains one or more keywords 709 of the voice input 701. For example, when the user's voice input 701 is "I want to see Romance of the Three Kingdoms", the keywords 709 obtained by the natural language understanding system 720 are, for example, "Romance of the Three Kingdoms" and "see" (as described above, further analysis is needed to determine whether the user wants a book, a TV series, or a movie). As another example, when the user's voice input 701 is "I want to listen to the song Wang Qing Shui", the keywords 709 obtained are, for example, "Wang Qing Shui", "listen", and "song" (as described above, it can be further analyzed whether the user wants the version sung by Andy Lau (劉德華) or by Li Yijun (李翊君)). Next, the natural language understanding system 720 can perform a full-text search of the structured database according to the keywords 709 and obtain at least one search result (which may be at least one of the records in FIG. 3A/3B) to serve as a reported answer in the candidate list. A keyword 709 may belong to different knowledge domains (such as movies, books, music, or games), and a single knowledge domain may be further divided into multiple categories (such as different authors of the same movie or book title, different singers of the same song title, or different versions of the same game title). For a given keyword 709, the natural language understanding system 720 may therefore obtain, after analysis (for example, a full-text search of the structured database 220), one or more search results related to that keyword 709, each containing the keyword 709 and other information besides the keyword 709 (the content of the other information is shown in Table 1). From another point of view, when the first voice input 701 entered by the user has multiple keywords 709, the user's request information 703 is more specific, so the natural language understanding system 720 is better able to find search results close to the request information 703 (because if the natural language understanding system 720 can find an exact match, that match should be the option the user wants).

For example, when the keyword 709 is "Romance of the Three Kingdoms", the search results found by the natural language understanding system 720 are, for example, the records "... 'Romance of the Three Kingdoms' ... 'TV series'" and "... 'Romance of the Three Kingdoms' ... 'book'" (where "TV series" and "book" are the user intents indicated by the response results). As another example, when the keywords 709 are "Wang Qing Shui" and "music", the user intents found by the natural language understanding system 720 may be the records "... 'Wang Qing Shui' ... 'music' ... 'Andy Lau'" and "... 'Wang Qing Shui' ... 'music' ... 'Li Yijun'", where "Andy Lau" and "Li Yijun" are the search results that indicate the user intent. In other words, after the natural language understanding system 720 performs a full-text search of the structured database 220, each search result may include the keyword 709 and other data related to the keyword 709 (as shown in Table 1), and the natural language understanding system 720 converts the search results it finds into a candidate list containing at least one reported answer for use in subsequent steps.
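The way additional keywords narrow the search results can be sketched with a simple substring match over a few in-memory records. This stands in for the full-text search of the structured database 220 only loosely: the record layout, field names, and matching rule are assumptions made for illustration.

```python
def full_text_search(records, keywords):
    """Return the records whose field values contain every keyword."""
    results = []
    for record in records:
        text = " ".join(str(value) for value in record.values())
        if all(keyword in text for keyword in keywords):
            results.append(record)
    return results

TITLE = "Romance of the Three Kingdoms"
RECORDS = [
    {"title": TITLE, "category": "TV series"},
    {"title": TITLE, "category": "book"},
    {"title": "Wang Qing Shui", "category": "music", "singer": "Andy Lau"},
    {"title": "Wang Qing Shui", "category": "music", "singer": "Li Yijun"},
]

by_title = full_text_search(RECORDS, [TITLE])
by_song = full_text_search(RECORDS, ["Wang Qing Shui", "music"])
exact = full_text_search(RECORDS, ["Wang Qing Shui", "music", "Andy Lau"])
```

A single keyword leaves two candidates in each case, while the three-keyword query leaves exactly one record, which mirrors the remark that more keywords make the request more specific.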

In step S830, the natural language understanding system 720 selects a return answer 711 from the candidate list according to the user preference record 717 sent by the feature database 730 (for example, a result compiled from the user preference data 715 stored therein, as explained later), and outputs a voice response 707 according to the return answer 711. In this embodiment, the natural language understanding system 720 may select the return answer 711 from the candidate list according to a priority order (the possible bases for the priority order are detailed below). Then, in step S840, the voice response 707 is output according to the return answer 711.

For example, in one embodiment the priority order may be based on the number of search results. When the keyword 709 is "Romance of the Three Kingdoms", suppose the natural language understanding system 720 finds after analysis that the structured database 220 contains the most records of the form "... 'Romance of the Three Kingdoms' ... 'book'", fewer records of the form "... 'Romance of the Three Kingdoms' ... 'music'", and the fewest records of the form "... 'Romance of the Three Kingdoms' ... 'TV series'". The natural language understanding system 720 then takes the records related to "Romance of the Three Kingdoms books" as the first-priority return answer (for example, by arranging all records about "Romance of the Three Kingdoms books" into a candidate list, which may be sorted by the value of the popularity field 316), the records related to "Romance of the Three Kingdoms music" as the second-priority return answer, and the records related to "Romance of the Three Kingdoms TV series" as the third-priority return answer. It should be noted that, besides the number of search results, the priority order may also be based on user preferences, user habits, or the usage habits of the general public, as detailed later.
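Ranking by the number of matching records can be sketched with a simple counter. The function name and the sample hits below are assumptions made for illustration; the text only states that categories with more records receive higher priority.

```python
from collections import Counter


def rank_categories_by_record_count(hits):
    """Order categories from most to fewest matching records."""
    counts = Counter(category for _title, category in hits)
    return [category for category, _count in counts.most_common()]


# Made-up search hits as (title, category) pairs.
hits = (
    [("Romance of the Three Kingdoms", "book")] * 3
    + [("Romance of the Three Kingdoms", "music")] * 2
    + [("Romance of the Three Kingdoms", "TV series")]
)
priority = rank_categories_by_record_count(hits)
print(priority)
```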

To help those skilled in the art further understand the natural language dialogue method and the natural language dialogue system of this embodiment, another embodiment is described in detail below.

First, suppose the first voice input 701 received by the voice sampling module 710 is "I want to watch Romance of the Three Kingdoms" (step S810). The natural language understanding system 720 can then parse out the keywords 709 "'watch', 'Romance of the Three Kingdoms'" and obtain a candidate list with multiple return answers, each of which has related keywords and other information (step S820), as shown in Table 1 above.

Next, the natural language understanding system 720 selects a return answer from the candidate list. Suppose the natural language understanding system 720 selects return answer a in the candidate list (see Table 1) as the first return answer 711; it then outputs, for example, "Play the Romance of the Three Kingdoms book?" as the voice response 707 (steps S830 to S840).

As described above, the natural language understanding system 720 may also rank the return answers in the candidate list using different methods of evaluating priority, and output the voice response 707 corresponding to the return answer 711 accordingly. For example, the natural language understanding system 720 may determine the user's preferences from multiple conversation records with the user (for example, from the positive and negative terms used by the user, as mentioned earlier); that is, it may use the user preference record 717 to determine the priority of the return answer 711. Before explaining how the user's positive and negative terms are used, the following describes how the user preference data 715 stores the likes, dislikes, and habits of the user or of the general public.

The storage of the user preference data 715 is now described with reference to steps S802 to S806. In one embodiment, before the voice input 701 is received in step S810, multiple voice inputs (that is, the previous conversation history) are received in step S802, the user preference data 715 is extracted from these previous voice inputs 701 (step S804), and the data is then stored in the feature database 730. In fact, the user preference data 715 may also be stored in the structured database 220 (in other words, the feature database 730 may be merged into the structured database 220). For example, in one embodiment, the popularity field 316 of FIG. 3B may be used directly to record the user's preferences; the recording method for the popularity field 316 was described earlier (for example, when a record 302 is matched, its popularity field is incremented by one) and is not repeated here. Alternatively, separate fields may be added to the structured database 220 to store the user preference data 715, for example keyed on a keyword (such as "Romance of the Three Kingdoms") and combined with the user's preferences (for example, when the user uses a positive term such as "like" or a negative term such as "dislike", the value of the preference field 318 or the dislike field 320 of FIG. 3B, respectively, is incremented by one), after which the preference counts (for example, the numbers of positive and negative terms) are tallied. Thus, when the natural language understanding system 720 queries the structured database 220 for the user preference record 717, it can directly query the values of the preference field 318 and the dislike field 320 (that is, how many positive and negative terms have been counted) and judge the user's preferences from them (in other words, the tallies of positive and negative terms are sent to the natural language understanding system 720 as the user preference record 717).

The following describes the case where the user preference data 715 is stored in the feature database 730 (that is, the feature database 730 is not merged into the structured database 220). In one embodiment, the user preference data 715 may be stored as a mapping from a keyword to the user's "preference" for that keyword. For example, the preference field 852 and the dislike field 862 of FIG. 8B may record the individual user's likes and dislikes for a given keyword, and the preference field 854 and the dislike field 864 may record the general public's likes and dislikes for the same set of keywords. In FIG. 8B, for example, record 832 stores the keywords "'Romance of the Three Kingdoms', 'book'" with values of 20 and 1 in the preference field 852 and the dislike field 862, respectively; record 834 stores the keywords "'Romance of the Three Kingdoms', 'TV series'" with values of 8 and 20; and record 836 stores the keywords "'Romance of the Three Kingdoms', 'music'" with values of 1 and 8. These values represent the individual user's like and dislike data for the related keywords (a higher value in the preference field 852 means the item is liked more, and a higher value in the dislike field 862 means it is disliked more). In addition, record 832 has values of 5 and 3 in the preference field 854 and the dislike field 864, record 834 has values of 80 and 20, and record 836 has values of 2 and 10; these values represent the general public's like and dislike data for the related keywords (referred to as a "preference indication" for short). The values of the preference field 852 and the dislike field 862 can then be increased according to the user's preferences. For example, when the user says "I want to watch the Romance of the Three Kingdoms TV series", the natural language understanding system 720 may combine the "keywords" "'Romance of the Three Kingdoms', 'TV series'" with a "preference indication" that increments the preference field into user preference data 715 and send it to the feature database 730, which then increments the value of the preference field 852 of record 834 by one (because wanting to watch "'Romance of the Three Kingdoms', 'TV series'" indicates that the user's liking has increased). With user preference data recorded in this way, when the user later inputs a related keyword, for example "I want to watch Romance of the Three Kingdoms", the natural language understanding system 720 can use the keyword "Romance of the Three Kingdoms" to find the three related records 832/834/836 in the feature database 730 of FIG. 8B, and the feature database 730 can return the values of the preference field 852 and the dislike field 862 to the natural language understanding system 720 as the user preference record 717, which the natural language understanding system 720 uses as the basis for judging the individual user's preferences. The feature database 730 may also return the values of the preference field 854 and the dislike field 864 as the user preference record 717, in which case the user preference record 717 serves as the basis for judging the general public's preferences. The present invention does not limit whether the user preference record 717 represents the preferences of the individual user or of the general public.
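The record layout and the update/query cycle described above can be sketched as follows. The counter values are the FIG. 8B numbers quoted in the text; the class name, attribute names, and helper functions are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class PreferenceRecord:
    keywords: tuple
    user_like: int = 0       # field 852
    user_dislike: int = 0    # field 862
    public_like: int = 0     # field 854
    public_dislike: int = 0  # field 864


ROTK = "Romance of the Three Kingdoms"

# Values taken from the FIG. 8B example in the text.
records = [
    PreferenceRecord((ROTK, "book"), 20, 1, 5, 3),         # record 832
    PreferenceRecord((ROTK, "TV series"), 8, 20, 80, 20),  # record 834
    PreferenceRecord((ROTK, "music"), 1, 8, 2, 10),        # record 836
]


def add_user_like(records, keywords):
    """Increment the user's like counter on the record with these keywords."""
    for rec in records:
        if set(rec.keywords) == set(keywords):
            rec.user_like += 1
            return rec
    return None


def query_user_preferences(records, keyword):
    """Return {category: (user_like, user_dislike)} for matching records."""
    return {
        rec.keywords[1]: (rec.user_like, rec.user_dislike)
        for rec in records
        if keyword in rec.keywords
    }


# "I want to watch the Romance of the Three Kingdoms TV series"
add_user_like(records, (ROTK, "TV series"))
preference_record = query_user_preferences(records, ROTK)
print(preference_record)
```

Returning the public counters (fields 854/864) instead would follow the same pattern with the other two attributes.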

In another embodiment, the values of the preference field 852 and the dislike field 862 may also serve as a basis for judging the habits of the user or of the general public. For example, after receiving the user preference record 717, the natural language understanding system 720 may first determine the difference between the values of the preference field 852/854 and the dislike field 862/864. If the two values differ by more than a threshold, the user is accustomed to conversing in a particular way. For example, when the value of the preference field 852 exceeds the value of the dislike field 862 by 10 or more, the user particularly likes to use "positive terms" in conversation (this is one way of recording a "user habit"), so in this case the natural language understanding system 720 may select the return answer based on the preference field 852 alone. When the natural language understanding system 720 uses the values of the preference field 854 and the dislike field 864 stored in the feature database 730, what is judged is the preference record of all users of the feature database 730, and the result can serve as a reference for the usage habits of the general public. It should be noted that the user preference record 717 returned by the feature database 730 to the natural language understanding system 720 may contain both the individual user's preference record (for example, the values of the preference field 852 and the dislike field 862) and the general public's preference record (for example, the values of the preference field 854 and the dislike field 864); the present invention does not limit this.
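The threshold test on the two counters can be sketched as below. The threshold of 10 comes from the example in the text; the function name is hypothetical, and the symmetric "negative" branch is an assumption, since the text only spells out the positive case.

```python
def habit_from_counts(like_count, dislike_count, threshold=10):
    """Classify a speaking habit from the gap between the two counters."""
    diff = like_count - dislike_count
    if diff >= threshold:
        return "positive"   # rank by the like counter (field 852) alone
    if -diff >= threshold:
        return "negative"   # assumed mirror case, not stated in the text
    return "none"


# Counter pairs (field 852, field 862) from the FIG. 8B example.
print(habit_from_counts(20, 1))
print(habit_from_counts(8, 20))
print(habit_from_counts(1, 8))
```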

As for storing the user preference data 715 obtained from the current voice input, the natural language dialogue system 700 may store the user preference data 715 obtained from the user's voice input when the candidate list is generated in step S820 (whether by full match or partial match). For example, in step S820, whenever a keyword produces a match in the structured database 220, the user can be judged to have a preference for the matching result, so the "keywords" and a "preference indication" may be sent to the feature database 730, and after the corresponding record is found there, the value of its preference field 852/854 or dislike field 862/864 is changed (for example, when the user says "I want to read the Romance of the Three Kingdoms book", the value of the preference field 852/854 of record 832 in FIG. 8B may be incremented by one). In yet another embodiment, the natural language dialogue system 700 may store the user preference data 715 only after the user has selected a return answer in step S830. In addition, if no corresponding keyword is found in the feature database 730, a new record may be created to store the user preference data 715. For example, when the user says "I listen to Andy Lau's Wang Qing Shui" and the keywords "'Andy Lau', 'Wang Qing Shui'" are generated, if no corresponding record is found in the feature database 730 at storage time, a new record 838 is created in the feature database 730 and the value of its preference field 852/854 is incremented by one. The storage timing and storage methods described above are for illustration only. Those skilled in the art may modify the embodiments shown according to the actual application, but all equivalent modifications that do not depart from the spirit of the present invention remain within the claims of the present invention.
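The "update the record if it exists, otherwise create it" behavior can be sketched with a dictionary keyed on the keyword set. The dictionary layout, the `store_preference` name, and the romanized song title are assumptions made for this illustration only.

```python
def store_preference(feature_db, keywords, liked=True):
    """Increment the like or dislike counter, creating the record if missing."""
    key = frozenset(keywords)
    record = feature_db.setdefault(key, {"like": 0, "dislike": 0})
    record["like" if liked else "dislike"] += 1
    return record


# Existing record (like/dislike values from the FIG. 8B example, record 832).
feature_db = {
    frozenset({"Romance of the Three Kingdoms", "book"}): {"like": 20, "dislike": 1},
}

# A match on an existing record increments its like counter.
store_preference(feature_db, ["Romance of the Three Kingdoms", "book"])
# Unknown keywords create a new record (record 838 in the text).
store_preference(feature_db, ["Andy Lau", "Wang Qing Shui"])
print(len(feature_db))
```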

Furthermore, although the format of records 832-838 stored in the feature database 730 shown in FIG. 8B differs from the record format of the structured database 220 (for example, as shown in FIG. 3A/3B/3C), the present invention does not limit the storage format of the records. Moreover, although the above embodiments describe only how the preference field 852/854 and the dislike field 862/864 are stored and used, in another embodiment additional fields 872/874 may be provided in the feature database 730 to store other habits of the user and of the general public, respectively, such as the number of times the data corresponding to the record has been downloaded, cited, recommended, commented on, or referred. In another embodiment, these download, citation, recommendation, comment, or referral counts may instead be accumulated in the preference field 852/854 and the dislike field 862/864. For example, each time the user gives a record a good review or refers it to others, the value of the preference field 852/854 is incremented by one, and if the user gives a record a bad review, the value of the dislike field 862/864 is incremented by one. The present invention does not limit the number of records or the way field values are recorded. It should be noted that, as those skilled in the art will appreciate, because the preference field 852, the dislike field 862, and so on in FIG. 8B relate only to the individual user's choices and preferences, this personal choice, like, and dislike information may be stored in the user's mobile communication device, while the information relating to all users, such as the preference field 854 and the dislike field 864, is stored on the server. This saves server storage space and also preserves the privacy of the user's personal preferences.

The user's actual usage is further described below with reference to FIG. 7A and FIG. 8B. Based on the conversation content of multiple voice inputs 701, suppose that in conversations with the natural language understanding system 720 the user often says "I hate watching the Romance of the Three Kingdoms TV series", less often says "I hate listening to the Romance of the Three Kingdoms music", and still less often says "I hate listening to the Romance of the Three Kingdoms book". For example, the feature database 730 holds 20 records of "I hate watching the Romance of the Three Kingdoms TV series" (that is, in FIG. 8B the negative-term count for "Romance of the Three Kingdoms" plus "TV series" is 20), 8 records of "I hate listening to the Romance of the Three Kingdoms music" (the negative-term count for "Romance of the Three Kingdoms" plus "music" is 8), and 1 record of "I hate listening to the Romance of the Three Kingdoms book" (the negative-term count for "Romance of the Three Kingdoms" plus "book" is 1). Because the user preference record 717 returned from the feature database 730 contains these three negative-term counts (20, 8, and 1), the natural language understanding system 720 orders the return answers 711 in the candidate list as "Romance of the Three Kingdoms book", "Romance of the Three Kingdoms music", and "Romance of the Three Kingdoms TV series". That is, when the keyword 709 is "Romance of the Three Kingdoms", the natural language understanding system 720 selects the Romance of the Three Kingdoms book as the return answer 711 and outputs the voice response 707 according to this return answer 711. It should be noted that although the above uses only the tallies of negative terms used by the user to set the priority order, in another embodiment the tallies of positive terms used by the user may be used alone (for example, as mentioned earlier, when the value of the preference field 852 exceeds the value of the dislike field 862 by more than a threshold).
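Ordering the candidates by the negative-term counts alone amounts to an ascending sort. The counts are the ones quoted in the text (20, 8, and 1); the function and variable names are hypothetical.

```python
def rank_by_fewest_negatives(negative_counts):
    """Least-disliked category first."""
    return sorted(negative_counts, key=negative_counts.get)


# Negative-term counts quoted in the text.
negative_counts = {"TV series": 20, "music": 8, "book": 1}
priority = rank_by_fewest_negatives(negative_counts)
print(priority)
```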

It is worth mentioning that the natural language understanding system 720 may also determine the priority of the return answers based on both the positive and the negative terms used by the user. Specifically, the feature database 730 may also record keywords the user has expressed, such as "like" and "idol" (positive terms) or "dislike" and "hate" (negative terms). Therefore, besides comparing the difference between the number of times the user uses "like" and "dislike", the natural language understanding system 720 may rank the return answers in the candidate list directly by the counts of positive and negative terms corresponding to those keywords (that is, by comparing whether positive or negative terms are cited more often). For example, if a return answer is associated more often with "like" (that is, positive terms are cited more often, or the value of the preference field 852 is larger), that return answer is selected first. Conversely, if a return answer is associated more often with "dislike" (that is, negative terms are cited more often, or the value of the dislike field 862 is larger), it is selected later. The natural language understanding system 720 can thus arrange all return answers into a candidate list in this priority order. Because some users may prefer positive terms (for example, the value of the preference field 852 is especially large) while others habitually use negative terms (for example, the value of the dislike field 862 is especially large), the user preference record 717 in the above embodiment reflects each user's usage habits, so options better matched to the user's habits can be offered for selection.
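One way to combine both counters is to sort by the net score (likes minus dislikes). The text does not fix a formula, so the subtraction is an assumption of this sketch; the counter values are the individual-user numbers (fields 852 and 862) from the FIG. 8B example.

```python
def rank_by_net_preference(preferences):
    """Sort categories by (likes - dislikes), highest first."""
    return sorted(
        preferences,
        key=lambda cat: preferences[cat][0] - preferences[cat][1],
        reverse=True,
    )


# category -> (like count, dislike count), from the FIG. 8B example.
preferences = {"book": (20, 1), "TV series": (8, 20), "music": (1, 8)}
ranked = rank_by_net_preference(preferences)
print(ranked)
```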

In addition, the natural language understanding system 720 may rank the return answers 711 in the candidate list according to the usage habits of the general public, with answers that the public uses more often ranked first (for example, recorded using the popularity field 316 of FIG. 3C). For example, when the keyword 709 is "Romance of the Three Kingdoms", suppose the return answers found by the natural language understanding system 720 are, for example, the Romance of the Three Kingdoms TV series, book, and music. If people who mention "Romance of the Three Kingdoms" usually mean the TV series, fewer mean the movie, and still fewer mean the book (for example, in FIG. 8B the related records have values of 80, 40, and 5 in the preference field 854), then the natural language understanding system 720 ranks the return answers 711 for "TV series", "movie", and "book" in that priority order. That is, the natural language understanding system 720 preferentially selects the "Romance of the Three Kingdoms TV series" as the return answer 711 and outputs the voice response 707 according to this return answer 711. The "answers most often used by the public first" ordering may be recorded using the popularity field 316 of FIG. 3C; the recording method was disclosed in the paragraphs relating to FIG. 3C and is not repeated here.

In addition, the natural language understanding system 720 may determine the priority of the return answer 711 according to the user's frequency of use. Specifically, because the natural language understanding system 720 can record the voice inputs 701 previously received from the user in the feature database 730, the feature database 730 can record response information such as the keywords 709 obtained when the natural language understanding system 720 parsed the user's voice inputs 701 and all the return answers 711 the natural language understanding system 720 has generated. Therefore, when later selecting a return answer 711, the natural language understanding system 720 can use the response information recorded in the feature database 730 (for example, the user's likes, dislikes, and habits, or even those of the general public) to find, in priority order, the return answer 711 that better matches the user's intent (as determined from the user's voice input), and output the corresponding voice response. The "priority of the return answer 711 according to user habits" may likewise be recorded using the popularity field 316 of FIG. 3C; the recording method was disclosed in the paragraphs relating to FIG. 3C and is not repeated here.

In summary, the natural language understanding system 720 may store the user preference attributes (such as positive and negative terms), user habits, and the usage habits of the general public described above in the feature database 730 (step S806). That is, in steps S802, S804, and S806, the user preference data 715 is learned from the user's previous conversation history and added to the feature database 730. User habits and the usage habits of the general public are also stored in the feature database 730, so that the natural language understanding system 720 can use the rich information in the feature database 730 (such as the user preference record 717) to give the user more accurate responses.

Step S830 is now described in further detail. After the voice input is received in step S810 and the keyword 709 of the voice input is parsed in step S820 to obtain the candidate list, the natural language understanding system 720 determines the priority order of the at least one return answer according to the user preference record 717, which reflects user preferences, user habits, or the usage habits of the general public (step S880). As described above, the priority order may be based on the number of records found or on the positive and negative terms of the user or of the general public. Next, a return answer 711 is selected from the candidate list according to the priority order (step S890); as described above, the return answer selected may be the one with the highest degree of match or the one with the highest priority. Then, the voice response 707 is output according to the return answer 711 (step S840).
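The three sub-steps (S880, S890, S840) can be sketched as three small functions. The scoring dictionary, the function names, and the wording of the spoken prompt are assumptions made for this illustration; the text does not prescribe a scoring formula.

```python
def determine_priority(candidates, preference_record):
    """Step S880: order candidates by their score in the preference record."""
    return sorted(
        candidates,
        key=lambda cand: preference_record.get(cand, 0),
        reverse=True,
    )


def select_return_answer(ranked):
    """Step S890: take the highest-priority candidate, if any."""
    return ranked[0] if ranked else None


def build_voice_response(title, answer):
    """Step S840: produce the text to be spoken."""
    if answer is None:
        return f"Sorry, nothing was found for {title}."
    return f"Play the {title} {answer}?"


title = "Romance of the Three Kingdoms"
candidates = ["TV series", "book", "music"]
# Hypothetical scores (likes minus dislikes) for each category.
preference_record = {"book": 19, "music": -7, "TV series": -12}

ranked = determine_priority(candidates, preference_record)
answer = select_return_answer(ranked)
response = build_voice_response(title, answer)
print(response)
```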

On the other hand, the natural language understanding system 720 may also determine the priority of the at least one return answer according to a voice input 701 entered earlier by the user. That is, if another voice input 701 (for example, the fourth voice input mentioned earlier) is received by the voice sampling module 710 before the voice response 707 is played, the natural language understanding system 720 may parse the keyword in this voice input 701 (that is, the fourth keyword 709 in the fourth voice input), preferentially select from the candidate list the return answer that matches this keyword as the return answer 711, and output the voice response 707 according to this return answer 711.

For example, suppose the natural language understanding system 720 first receives the voice input 701 "I want to watch a TV series" and, a few seconds later, receives the voice input 701 "Just play Romance of the Three Kingdoms for me". The natural language understanding system 720 can recognize the keyword "TV series" (the first keyword) in the first voice input 701 and then recognize the keyword "Romance of the Three Kingdoms" (the fourth keyword) in the later one. The natural language understanding system 720 therefore selects from the candidate list the return answer whose intent data relates to both "Romance of the Three Kingdoms" and "TV series", and outputs the voice response 707 to the user based on this return answer 711.
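Combining the keywords of the two consecutive voice inputs can be sketched as a set-containment check over the candidate list. Representing each candidate as a tuple of keywords, and the `select_with_context` name, are assumptions of this sketch.

```python
def select_with_context(candidates, earlier_keywords, later_keywords):
    """Return the first candidate that contains the keywords of both inputs."""
    wanted = set(earlier_keywords) | set(later_keywords)
    for candidate in candidates:
        if wanted <= set(candidate):
            return candidate
    return None


ROTK = "Romance of the Three Kingdoms"
candidates = [(ROTK, "book"), (ROTK, "TV series"), (ROTK, "music")]

# "I want to watch a TV series" followed by a request for the title itself.
answer = select_with_context(candidates, ["TV series"], [ROTK])
print(answer)
```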

Based on the above, the natural language understanding system 720 can take the voice input from the user and consult information such as public usage habits, user preferences, user habits, or the user's preceding and following dialogue, so as to output to the user a voice response 707 that better matches the request information 703 of the voice input 701. The natural language understanding system 720 may prioritize the return answers in the candidate list according to different ordering criteria, such as public usage habits, user preferences, user habits, or the user's surrounding dialogue. Thus, when the voice input 701 from the user is ambiguous, the natural language understanding system 720 can consult these sources to determine the intent meant by the voice input 701 (for example, the attribute or knowledge domain of the keyword 709 in the voice input). In other words, if a return answer 711 is close to an intent that the user has expressed or that the public generally means, the natural language understanding system 720 gives priority to that return answer 711. As a result, the voice response 707 output by the natural language dialogue system 700 better matches the user's request information 703.

It should be noted that although the characteristic database 730 and the structured database 220 are described above as separate databases, the two may be merged into one; those skilled in the art may choose according to the actual application.

In summary, the present invention provides a natural language dialogue method and system in which the natural language dialogue system outputs a corresponding voice response according to a voice input from a user. The natural language dialogue system of the present invention can also preferentially select a more appropriate return answer according to public usage habits, user preferences, user habits, or the user's surrounding dialogue, and output a voice response to the user accordingly, thereby making the dialogue between the user and the natural language dialogue system more convenient.

Next, an example is described in which the architecture and components such as the natural language understanding system 100 and the structured database 220 are applied as follows: according to the number of return answers obtained by analyzing the request information of the user's voice input, the system decides whether to operate directly according to the data type or to ask the user for a further instruction, and once only one return answer remains, it may operate directly according to the data type. The benefit of offering the user this choice is that the system need not filter the return answers on the user's behalf. Instead, it presents the candidate list containing the return answers directly to the user and lets the user decide, by selecting a return answer, which software to run or which service to use, thereby providing a user-friendly interface.

FIG. 9 is a system schematic diagram of a mobile terminal device according to an embodiment of the invention. Referring to FIG. 9, in this embodiment the mobile terminal device 900 includes a voice receiving unit 910, a data processing unit 920, a display unit 930, and a storage unit 940. The data processing unit 920 is coupled to the voice receiving unit 910, the display unit 930, and the storage unit 940. The voice receiving unit 910 receives the first input voice SP1 and the second input voice SP2 and transmits them to the data processing unit 920. The first input voice SP1 and the second input voice SP2 may be the voice inputs 501 and 701. The display unit 930 is controlled by the data processing unit 920 to display the first/second candidate list 908/908'. The storage unit 940 stores a plurality of data, which may include the data of the structured database 220 or the characteristic database 730 described above; details are not repeated here. In addition, the storage unit 940 may be any type of memory in a server or computer system, such as dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, or read-only memory (ROM). The invention is not limited in this respect, and those skilled in the art may choose according to actual needs.

In this embodiment, the data processing unit 920 acts like the natural language understanding system 100 of FIG. 1. It performs speech recognition on the first input voice SP1 to generate first request information 902, analyzes the first request information 902 with natural language processing to generate a first keyword 904 corresponding to the first input voice SP1, and uses the first keyword 904 to find first return answers 906 (for example, the return answers 511/711) in the data of the storage unit 940 (for example, the search engine 240 performs a full-text search of the structured database 220 according to the keyword 108). When exactly one first return answer 906 is found, the data processing unit 920 may directly perform the operation corresponding to the type of the data for that first return answer 906. When more than one first return answer 906 is found, the data processing unit 920 may arrange the first return answers 906 into a first candidate list 908 and control the display unit 930 to display the first candidate list 908 to the user. With the first candidate list 908 displayed for the user to make a further selection, the data processing unit 920 receives the second input voice SP2, performs speech recognition on it to generate second request information 902', performs natural language processing on the second request information 902' to generate a second keyword 904' corresponding to the second input voice SP2, and selects the corresponding part of the first candidate list 908 according to the second keyword 904'. Each of the first keyword 904 and the second keyword 904' may consist of multiple keywords. The second input voice SP2 may be analyzed to produce the second request information 902' and the second keyword 904' in the same way the second voice input is analyzed in FIGS. 5A and 7A, so the details are not repeated.
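The branch on the number of return answers described above can be sketched in a few lines of Python. The function name and the returned tuples are hypothetical, and the zero-answer branch is an assumption, since the description only covers the cases of exactly one and more than one answer.

```python
def decide_next_step(return_answers):
    """Choose between operating directly and showing a candidate list."""
    if len(return_answers) == 1:
        # Exactly one answer: operate on it according to its data type.
        return ("operate", return_answers[0])
    if len(return_answers) > 1:
        # Several answers: arrange them into a candidate list for display.
        return ("show_candidate_list", list(return_answers))
    # Assumption: the description does not say what happens with no answers.
    return ("no_match", None)


single = decide_next_step(["weather: Beijing"])
several = decide_next_step(["weather: Beijing", "weather: Shanghai"])
```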

Similarly, when the number of second return answers 906' is 1, the data processing unit 920 performs the operation corresponding to the type of that second return answer 906'; when the number of second return answers 906' is greater than 1, the data processing unit 920 arranges the second return answers 906' into a second candidate list 908' and controls the display unit 930 to display it. The corresponding part is then selected according to the user's next input voice, and the corresponding operation is performed according to the number of subsequent return answers. This follows by analogy from the description above and is not repeated here.

More specifically, the data processing unit 920 compares the records 302 of the structured database 220 (for example, the value data of each sub-field 308 in the title field 304) with the first keyword 904 corresponding to the first input voice SP1 (as described above for FIG. 1 and FIGS. 3A, 3B, and 3C). When a record 302 of the structured database 220 at least partially matches the first keyword 904 of the first input voice SP1, that record 302 is treated as a matching result produced by the first input voice SP1 (for example, the matching results of FIGS. 3A/3B). If the data type is a music file, the record 302 may include the song title, singer, album title, release date, play order, and so on; if the data type is a video file, the record 302 may include the film title, release date, staff (including cast), and so on; if the data type is a web page file, the record 302 may include the website name, web page type, corresponding user account, and so on; if the data type is a picture file, the record 302 may include the picture name, picture information, and so on; if the data type is a business card file, the record 302 may include the contact name, contact phone number, contact address, and so on. These records 302 are examples only; the content of a record 302 depends on the actual application, and the embodiments of the invention are not limited to them.
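A minimal sketch of the "at least partially matches" test follows. It assumes each record is a dictionary with a `type` and a `fields` mapping, and it scores a record by the share of keywords found in any field value; both the layout and the scoring rule are illustrative assumptions, not the patent's own data format.

```python
def match_records(records, keywords):
    """Return (record, score) pairs for records matching at least one keyword."""
    results = []
    for record in records:
        values = [str(value) for value in record["fields"].values()]
        # Count the keywords that occur in at least one field value.
        hits = sum(1 for kw in keywords if any(kw in value for value in values))
        if hits > 0:
            results.append((record, hits / len(keywords)))
    return results


records = [
    {"type": "business_card",
     "fields": {"name": "Zhang Wei", "phone": "13900000001", "address": "Beijing"}},
    {"type": "business_card",
     "fields": {"name": "Li Na", "phone": "13800000002", "address": "Beijing"}},
    {"type": "music",
     "fields": {"song": "Morning Song", "singer": "Zhang Li"}},
]
matches = match_records(records, ["Zhang", "Beijing"])
scores = [score for _, score in matches]
```

Here a score of 1.0 corresponds to a full match and anything lower to a partial match.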

Next, the data processing unit 920 may determine whether the second keyword 904' corresponding to the second input voice SP2 contains an ordinal term that indicates a position (for example, "I want the third option" or "I choose the third one"). When the second keyword 904' contains an ordinal term, the data processing unit 920 selects the data at the corresponding position in the first candidate list 908 according to the ordinal term. When the second keyword 904' contains no ordinal term, the user may be naming a particular first return answer 906 in the first candidate list 908 directly. In that case the data processing unit 920 compares the record 302 corresponding to each first return answer 906 in the first candidate list 908 with the second keyword 904' to determine the degree of correspondence between each first return answer 906 and the second input voice SP2, and then decides from these degrees whether any first return answer 906 in the first candidate list 908 corresponds to the second input voice SP2. In an embodiment of the invention, the data processing unit 920 may decide whether a first return answer 906 in the first candidate list 908 corresponds to the second input voice SP2 according to the degree to which that answer corresponds to the second keyword 904' (for example, a full match or the extent of a partial match), which simplifies the selection flow. The data processing unit 920 may select the data with the highest degree of correspondence as the data corresponding to the second input voice SP2.
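The two selection paths above, by ordinal term or by degree of correspondence, can be sketched as follows. The English ordinal patterns, the helper names, and the keyword-share score are assumptions for illustration; the original examples are spoken Chinese, and the description does not define how the degree of correspondence is computed.

```python
import re

# Hypothetical ordinal vocabulary for English keywords.
ORDINAL_WORDS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}
ORDINAL_PATTERN = re.compile(r"\b(\d+)(?:st|nd|rd|th)\b")


def ordinal_position(keywords):
    """Return the 1-based position named by an ordinal term, or None."""
    for kw in keywords:
        text = kw.lower()
        found = ORDINAL_PATTERN.search(text)
        if found:
            return int(found.group(1))
        for word, position in ORDINAL_WORDS.items():
            if re.search(rf"\b{word}\b", text):
                return position
    return None


def select_from_candidates(candidate_list, keywords):
    """Select by position if an ordinal term is present, else by best match."""
    if not keywords:
        return []
    position = ordinal_position(keywords)
    if position is not None:
        if 1 <= position <= len(candidate_list):
            return [candidate_list[position - 1]]
        return []
    scored = []
    for item in candidate_list:
        text = " ".join(str(value) for value in item.values()).lower()
        # Degree of correspondence: share of keywords found in the item text.
        score = sum(kw.lower() in text for kw in keywords) / len(keywords)
        scored.append((score, item))
    best = max((score for score, _ in scored), default=0)
    return [item for score, item in scored if score == best and score > 0]


weather_list = [
    {"city": "Shanghai", "kind": "weather"},
    {"city": "Beijing", "kind": "weather"},
    {"city": "Guangzhou", "kind": "weather"},
]
by_ordinal = select_from_candidates(weather_list, ["3rd"])
by_content = select_from_candidates(weather_list, ["Beijing", "weather"])
```

A bare number such as "139" has no ordinal suffix, so it falls through to the content comparison rather than being read as a position.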

For example, if the first input voice SP1 is "How is the weather today", then after speech recognition and natural language processing the first keyword 904 corresponding to the first input voice SP1 includes "today" and "weather". The data processing unit 920 therefore reads the data for today's weather and displays it through the display unit 930 as the first candidate list 908. If the second input voice SP2 is then "I want to see the 3rd item" or "I choose the 3rd one", the second keyword 904' includes "the 3rd", which is interpreted as an ordinal term. The data processing unit 920 therefore reads the third item in the first candidate list 908 (that is, the third first return answer 906 in the list) and displays the corresponding weather information through the display unit 930. Alternatively, if the second input voice SP2 is "I want to see the weather in Beijing" or "I choose the weather in Beijing", the second keyword 904' includes "Beijing" and "weather", so the data processing unit 920 reads the data for Beijing in the first candidate list 908. When this selection corresponds to exactly one first return answer 906, the corresponding weather information is displayed directly through the display unit 930; when it corresponds to more than one first return answer 906, a further second candidate list 908' (containing at least one second return answer 906') is displayed for the user to choose from.

If the first input voice SP1 is "I want to call Old Zhang", then after speech recognition and natural language processing the first keyword 904 includes "phone" and "Zhang". The data processing unit 920 therefore reads the contact data for the surname "Zhang" (by performing a full-text search of the structured database 220 and then obtaining the detailed data of the matching records 302) and displays a first candidate list 908 of these contacts (the first return answers 906) through the display unit 930. If the second input voice SP2 is then "the 3rd Old Zhang" or "I choose the 3rd one", the second keyword 904' includes "the 3rd", which is interpreted as an ordinal term, so the data processing unit 920 reads the third item in the first candidate list 908 (the third first return answer 906) and dials according to the selected data. Alternatively, if the second input voice SP2 is "I choose the one starting with 139", the second keyword 904' includes "139" and "starting with". Here "139" is not interpreted as an ordinal term, so the data processing unit 920 reads the contact data in the first candidate list 908 whose phone number starts with 139. If the second input voice SP2 is "I want the Old Zhang in Beijing", the second keyword 904' includes "Beijing" and "Zhang", and the data processing unit 920 reads the contact data in the first candidate list 908 whose address is in Beijing. When exactly one first return answer 906 is selected, the call is dialed according to the selected data; when more than one first return answer 906 is selected, the selected first return answers 906 become the second return answers 906' and are arranged into a second candidate list 908' that is displayed for the user to choose from.
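The contact example can be sketched as a filter over the first candidate list followed by the count check. The contact entries, the `phone_prefix` and `city` parameters, and the returned strings are all hypothetical; the sketch assumes the second keyword has already been interpreted as either a number prefix or a place name.

```python
def narrow_contacts(candidates, phone_prefix=None, city=None):
    """Keep the contacts that satisfy the criteria from the second keyword."""
    selected = candidates
    if phone_prefix is not None:
        selected = [c for c in selected if c["phone"].startswith(phone_prefix)]
    if city is not None:
        selected = [c for c in selected if city in c["address"]]
    return selected


def next_step(selected):
    """Dial when one contact remains, otherwise show a second candidate list."""
    if len(selected) == 1:
        return f"dial {selected[0]['phone']}"
    if len(selected) > 1:
        return "show second candidate list"
    return "no match"  # assumption: this case is not covered by the description


# Made-up contacts standing in for the first candidate list.
first_candidate_list = [
    {"name": "Zhang Wei", "phone": "13900000001", "address": "Beijing"},
    {"name": "Zhang Min", "phone": "13600000002", "address": "Beijing"},
    {"name": "Zhang Hao", "phone": "13700000003", "address": "Shanghai"},
]
by_prefix = next_step(narrow_contacts(first_candidate_list, phone_prefix="139"))
by_city = next_step(narrow_contacts(first_candidate_list, city="Beijing"))
```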

If the first input voice SP1 is "I want to find a restaurant", then after speech recognition and natural language processing the first keyword 904 includes "restaurant", and the data processing unit 920 reads all first return answers 906 that correspond to restaurants. Because this instruction is not specific, the first candidate list 908 (containing the first return answers 906 for all restaurant data) is displayed to the user through the display unit 930, and the system waits for a further instruction. If the user then enters "the 3rd restaurant" or "I choose the 3rd one" through the second input voice SP2, the second keyword 904' includes "the 3rd", which is interpreted as an ordinal term, so the data processing unit 920 reads the third item in the first candidate list 908 and displays the selected data. Alternatively, if the second input voice SP2 is "I choose the nearest one", the second keyword 904' includes "nearest", so the data processing unit 920 reads the restaurant data in the first candidate list 908 whose address is closest to the user. If the second input voice SP2 is "I want a restaurant in Beijing", the second keyword 904' includes "Beijing" and "restaurant", so the data processing unit 920 reads the restaurant data in the first candidate list 908 whose address is in Beijing. When exactly one first return answer 906 is selected, the selected data is displayed; when more than one first return answer 906 is selected, the selected first return answers 906 become the second return answers 906' and are arranged into a second candidate list 908' that is displayed for the user to choose from.
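For the "nearest" case, the description does not say how distance to the user is measured. The sketch below assumes each restaurant carries hypothetical planar coordinates `xy` in kilometres and uses straight-line distance; a real system would more likely work from geographic coordinates or route distance.

```python
import math


def nearest_restaurant(candidates, user_xy):
    """Return the candidate with the smallest straight-line distance to the user."""
    def distance(restaurant):
        dx = restaurant["xy"][0] - user_xy[0]
        dy = restaurant["xy"][1] - user_xy[1]
        return math.hypot(dx, dy)

    return min(candidates, key=distance)


# Made-up restaurant entries with hypothetical coordinates.
restaurant_list = [
    {"name": "Noodle Bar", "xy": (4.0, 3.0)},
    {"name": "Dumpling Shop", "xy": (1.0, 1.0)},
    {"name": "Duck House", "xy": (6.0, 8.0)},
]
closest = nearest_restaurant(restaurant_list, user_xy=(0.0, 0.0))
```

Note that `min` raises `ValueError` on an empty list, so a caller would check that the candidate list is non-empty first.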

Based on the above, the data processing unit 920 performs the operation that corresponds to the type of the data of the selected first return answer 906 (or second return answer 906'). For example, when the data corresponding to the selected first return answer 906 is a music file, the data processing unit 920 plays the music; when the selected data is a video file, the data processing unit 920 plays the video; when the selected data is a web page file, the data processing unit 920 displays it; when the selected data is a picture file, the data processing unit 920 displays the picture; and when the selected data is a business card file, the data processing unit 920 dials according to the selected data.
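The mapping from data type to operation lends itself to a dispatch table. In the sketch below the type names, handler functions, and returned strings are placeholders; real handlers would call a media player, a browser, or the dialer.

```python
def play_music(data):
    return f"playing music: {data['title']}"


def play_video(data):
    return f"playing video: {data['title']}"


def show_web_page(data):
    return f"displaying web page: {data['title']}"


def show_picture(data):
    return f"displaying picture: {data['title']}"


def dial_contact(data):
    return f"dialing {data['phone']}"


# Hypothetical type names mapped to the operations listed in the paragraph above.
OPERATIONS = {
    "music": play_music,
    "video": play_video,
    "web_page": show_web_page,
    "picture": show_picture,
    "business_card": dial_contact,
}


def operate(selected):
    """Perform the operation that corresponds to the selected data's type."""
    handler = OPERATIONS.get(selected["type"])
    if handler is None:
        raise ValueError(f"no operation for data type {selected['type']!r}")
    return handler(selected)


music_result = operate({"type": "music", "title": "Morning Song"})
card_result = operate({"type": "business_card", "phone": "13900000001"})
```

Keeping the mapping in one table means a new data type only needs one new handler and one new entry.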

FIG. 10 is a system schematic diagram of an information system according to an embodiment of the invention. Referring to FIGS. 9 and 10, in this embodiment the information system 1000 includes a mobile terminal device 1010 and a server 1020. The server 1020 may be a cloud server, a local area network server, or a similar device, but the embodiments of the invention are not limited to these. The mobile terminal device 1010 includes a voice receiving unit 1011, a data processing unit 1013, and a display unit 1015. The data processing unit 1013 is coupled to the voice receiving unit 1011, the display unit 1015, and the server 1020. The mobile terminal device 1010 may be a mobile communication device such as a cell phone, a personal digital assistant (PDA) phone, or a smartphone; the invention is not limited in this respect either. The voice receiving unit 1011 functions like the voice receiving unit 910, and the display unit 1015 functions like the display unit 930. The server 1020 stores a plurality of data and has a speech recognition function.

In this embodiment, the data processing unit 1013 uses the server 1020 to perform speech recognition on the first input voice SP1 to generate the first request information 902, and natural language processing is then performed on the first request information 902 to generate the first keyword 904 corresponding to the first input voice SP1. The server 1020 performs a full-text search of the structured database 220 according to the first keyword 904 to find the first return answers 906 and transmits them to the data processing unit 1013. When the number of first return answers 906 is 1, the data processing unit 1013 performs the operation corresponding to the data type of that first return answer 906. When the number is greater than 1, the data processing unit 1013 arranges the selected first return answers 906 into the first candidate list 908, controls the display unit 1015 to display it to the user, and waits for a further instruction. After the user enters that instruction, the data processing unit 1013 uses the server 1020 to perform speech recognition on the second input voice SP2 to generate the second request information 902', and the second request information 902' is analyzed with natural language processing to generate the second keyword 904' corresponding to the second input voice SP2. The server 1020 then selects from the first candidate list 908, according to the second keyword 904', the corresponding first return answers 906 as the second return answers 906' and transmits them to the data processing unit 1013. Similarly, when the number of second return answers 906' is 1, the data processing unit 1013 performs the operation corresponding to the type of the data for that second return answer 906'; when the number of second return answers 906' is greater than 1, the data processing unit 1013 arranges the selected second return answers 906' into a second candidate list 908' and controls the display unit 1015 to display it to the user for further selection. The server 1020 then selects the corresponding part according to subsequent input voices, and the data processing unit 1013 performs the corresponding operation according to the number of selected data items. This follows by analogy from the description above and is not repeated here.

It should be noted that, in one embodiment, if the number of first return answers 906 selected according to the first keyword 904 corresponding to the first input voice SP1 is 1, the operation corresponding to that data may be performed directly. In another embodiment, a prompt may first be output to inform the user that the operation corresponding to the selected first return answer 906 is about to be performed. In yet another embodiment, when the number of second return answers 906' selected according to the second keyword 904' corresponding to the second input voice SP2 is 1, the operation corresponding to that data may likewise be performed directly. In a further embodiment, a prompt may first be output to inform the user that the operation corresponding to the selected data is about to be performed. The invention does not limit any of these.

More specifically, the server 1020 compares each record 302 of the structured database 220 with the first keyword 904 corresponding to the first input voice SP1. When a record 302 at least partially matches the first keyword 904, that record 302 is treated as data matched by the first input voice SP1 and becomes one of the first return answers 906. If more than one first return answer 906 is selected according to the first keyword 904, the user may enter a further instruction through the second input voice SP2. That instruction may contain an order (indicating which item of the displayed information to select), may directly name one item of the displayed information (for example, by stating the content of an item), or may require the user's intent to be inferred from the instruction (for example, if the user asks for the nearest restaurant, the "nearest" restaurant is shown to the user). The server 1020 therefore determines whether the second keyword 904' corresponding to the second input voice SP2 contains an ordinal term. When it does, the server 1020 selects the first return answer 906 at the corresponding position in the first candidate list 908 according to the ordinal term. When it does not, the server 1020 compares each first return answer 906 in the first candidate list 908 with the second keyword 904' to determine the degree of correspondence between each first return answer 906 and the second input voice SP2, and may decide from these degrees whether a first return answer 906 in the first candidate list 908 corresponds to the second input voice SP2. In an embodiment of the invention, the server 1020 may determine which first return answers 906 in the first candidate list 908 correspond to the second input voice SP2 according to the degree of correspondence between each first return answer 906 and the second keyword 904', which simplifies the selection flow. The server 1020 may select the first return answer 906 with the highest degree of correspondence as the one corresponding to the second input voice SP2.

FIG. 11 is a flowchart of a speech recognition based selection method according to an embodiment of the present invention. Referring to FIG. 11, in this embodiment the first input speech SP1 is received (step S1100), speech recognition is performed on it to generate the first request information 902 (step S1110), and the first request information 902 is analyzed with natural language processing to generate the first keyword 904 corresponding to the first input speech (step S1120). Next, the corresponding first report answers 906 are selected from a plurality of data items according to the first keyword 904 (step S1130), and it is determined whether the number of selected first report answers 906 is 1 (step S1140). When the number is 1, that is, when the result of step S1140 is “yes”, the operation corresponding to the data type of the first report answer 906 is performed (step S1150). When the number is greater than 1, that is, when the result of step S1140 is “no”, the first candidate list 908 is displayed according to the selected first report answers 906 and the second input speech SP2 is received (step S1160); speech recognition is performed on the second input speech to generate the second request information 902' (step S1170); and the second request information 902' is analyzed with natural language processing to generate the second keyword 904' corresponding to the second input speech (step S1180). Then the corresponding answers are selected from the first report answers 906 in the first candidate list 908 according to the second request information 902' (step S1190), and the flow returns to step S1140 to determine again whether the number of selected first report answers 906 is 1. The order of the steps above is given for illustration, and embodiments of the present invention are not limited to it. For details of the steps, refer to the embodiments of FIG. 9 and FIG. 10; they are not repeated here.
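The control flow of FIG. 11 can be summarized in a short sketch. It assumes that speech recognition and natural language processing (steps S1100 to S1120 and S1160 to S1180) have already reduced each utterance to a list of keywords, and it uses a simple substring test as the matching rule. The names `narrow` and `selection_loop`, and the decision to stop when no answer matches or no further input arrives, are assumptions for illustration only.

```python
from typing import Callable, Iterable, List, Optional


def narrow(answers: List[str], keywords: List[str]) -> List[str]:
    """Keep the answers that at least partially match any keyword."""
    return [a for a in answers
            if any(k.lower() in a.lower() for k in keywords)]


def selection_loop(records: List[str],
                   keyword_inputs: Iterable[List[str]],
                   show: Callable[[List[str]], None] = print) -> Optional[str]:
    """Repeat selection until exactly one answer remains."""
    inputs = iter(keyword_inputs)
    first = next(inputs, None)             # keywords of the first input speech
    if first is None:
        return None
    answers = narrow(records, first)       # S1130: select report answers
    while len(answers) != 1:               # S1140: exactly one answer?
        if not answers:
            return None                    # assumption: no match ends the flow
        show(answers)                      # S1160: display the candidate list
        second = next(inputs, None)        # S1160-S1180: next utterance
        if second is None:
            return None                    # assumption: user gave no more input
        answers = narrow(answers, second)  # S1190: select within the list
    return answers[0]                      # S1150: act on the single answer
```

A caller would pass the record texts and an iterable of keyword lists, one per utterance, and then perform the operation that matches the data type of the returned answer.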

In summary, the speech recognition based selection method, mobile terminal device, and information system of the embodiments of the present invention perform speech recognition and natural language processing on the first input speech and the second input speech to determine the keywords corresponding to each, and then select among the report answers according to those keywords. This makes operation more convenient for the user.

Next, an operation example is described in which the architecture and components disclosed in the present invention, such as the natural language understanding system 100 and the structured database 220, are combined with an auxiliary activation device.

FIG. 12 is a block diagram of a voice control system according to an embodiment of the present invention. Referring to FIG. 12, the voice control system 1200 includes an auxiliary activation device 1210, a mobile terminal device 1220, and a server 1230. In this embodiment, the auxiliary activation device 1210 activates the voice system of the mobile terminal device 1220 by means of a wireless transmission signal, so that the mobile terminal device 1220 communicates with the server 1230 according to a voice signal.

In detail, the auxiliary activation device 1210 includes a first wireless transmission module 1212 and a trigger module 1214, where the trigger module 1214 is coupled to the first wireless transmission module 1212. The first wireless transmission module 1212 is, for example, a device supporting a communication protocol such as Wireless Fidelity (Wi-Fi), Worldwide Interoperability for Microwave Access (WiMAX), Bluetooth, ultra-wideband (UWB), or radio-frequency identification (RFID); it can emit a wireless transmission signal and pair with another wireless transmission module to establish a wireless link. The trigger module 1214 is, for example, a button or a key. In this embodiment, when the user presses the trigger module 1214, a trigger signal is generated; the first wireless transmission module 1212 receives the trigger signal and starts up, then emits a wireless transmission signal and transmits it to the mobile terminal device 1220. In an embodiment, the auxiliary activation device 1210 may be a Bluetooth headset.

It should be noted that although some existing hands-free headsets/microphones are also designed to activate certain functions of the mobile terminal device 1220, in another embodiment of the present invention the auxiliary activation device 1210 may differ from such a headset/microphone. Such a headset/microphone connects to the mobile terminal device and takes the place of the earphone/microphone on the mobile terminal device 1220 for listening and talking, with the activation function as an add-on. By contrast, the auxiliary activation device 1210 of the present invention is used “only” to turn on the voice system in the mobile terminal device 1220 and has no listening/talking function, so its internal circuit design can be simpler and its cost lower. In other words, the auxiliary activation device 1210 is a separate device from the hands-free headset/microphone; a user may have both a hands-free headset/microphone and the auxiliary activation device 1210 of the present invention.

In addition, the auxiliary activation device 1210 may take the form of an item within easy reach of the user, such as an ornament (a ring, watch, earring, necklace, or pair of glasses), that is, any kind of portable personal item, or of a mounted component such as a driving accessory placed on the steering wheel; it is not limited to these examples. In other words, the auxiliary activation device 1210 is an “everyday” device whose internal arrangement lets the user easily reach the trigger module 1214 to turn on the voice system. For example, when the auxiliary activation device 1210 takes the form of a ring, the user can simply move a finger to press the trigger module 1214 on the ring. Likewise, when the auxiliary activation device 1210 is built into a driving accessory, the user can easily trigger the trigger module 1214 of the accessory while driving. Furthermore, compared with the discomfort of wearing a headset/microphone to listen and talk, the auxiliary activation device 1210 of the present invention can turn on the voice system in the mobile terminal device 1220 and even turn on the speakerphone function (described in detail later), so that the user can listen and talk directly through the mobile terminal device 1220 without wearing a headset/microphone. Moreover, these “everyday” auxiliary activation devices 1210 are items the user would wear or use anyway, so they cause no unfamiliarity or discomfort and need no time to get used to. For example, when a user who is cooking in the kitchen needs to place a call with a mobile phone left in the living room, the user, if wearing an auxiliary activation device 1210 in the form of a ring, necklace, or watch, can simply touch it to turn on the voice system and ask a friend about the details of a recipe. Although some existing headsets/microphones with an activation function can achieve the same goal, the user does not need to call a friend every time he or she cooks, so wearing a headset/microphone while cooking just to be able to control the mobile terminal device at any moment is quite inconvenient.

In other embodiments, the auxiliary activation device 1210 may further be provided with a wirelessly rechargeable battery 1216 for powering the first wireless transmission module 1212. More specifically, the wirelessly rechargeable battery 1216 includes a battery unit 12162 and a wireless charging module 12164, where the wireless charging module 12164 is coupled to the battery unit 12162. The wireless charging module 12164 can receive energy supplied by a wireless power supply device (not shown) and convert it into electric power to charge the battery unit 12162. In this way, the first wireless transmission module 1212 of the auxiliary activation device 1210 can conveniently be kept powered through the wirelessly rechargeable battery 1216.

The mobile terminal device 1220, for its part, is for example a cell phone, a personal digital assistant (PDA) phone, a smartphone, or a Pocket PC, tablet PC, or notebook computer with communication software installed. The mobile terminal device 1220 may be any portable mobile device with a communication function; its scope is not limited here. The mobile terminal device 1220 may run an Android, Microsoft, or Linux operating system, among others, and is not limited to these.

The mobile terminal device 1220 includes a second wireless transmission module 1222. The second wireless transmission module 1222 can be paired with the first wireless transmission module 1212 of the auxiliary activation device 1210 and uses the corresponding wireless communication protocol (for example Wi-Fi, WiMAX, Bluetooth, ultra-wideband, or RFID) to establish a wireless link with the first wireless transmission module 1212. Note that the terms “first” wireless transmission module 1212 and “second” wireless transmission module 1222 only indicate that the wireless transmission modules are located in different devices and are not intended to limit the present invention.

In other embodiments, the mobile terminal device 1220 further includes a voice system 1221 coupled to the second wireless transmission module 1222, so that after the user triggers the trigger module 1214 of the auxiliary activation device 1210, the voice system 1221 can be activated wirelessly through the first wireless transmission module 1212 and the second wireless transmission module 1222. In an embodiment, the voice system 1221 may include a voice sampling module 1224, a speech synthesis module 1226, and a voice output interface 1227. The voice sampling module 1224 receives voice signals from the user and is, for example, an audio input device such as a microphone. The speech synthesis module 1226 can query a speech synthesis database that records, for example, text and its corresponding speech, so that the speech synthesis module 1226 can find the speech corresponding to a given text message and synthesize the text message into speech. The speech synthesis module 1226 can then output the synthesized speech through the voice output interface 1227 to play it to the user. The voice output interface 1227 is, for example, a speaker or an earphone.

In addition, the mobile terminal device 1220 may be provided with a communication module 1228. The communication module 1228 is, for example, a component capable of transmitting and receiving wireless signals, such as a radio-frequency transceiver. More specifically, the communication module 1228 allows the user to answer or place calls through the mobile terminal device 1220 or to use other services provided by a telecommunications carrier. In this embodiment, the communication module 1228 can receive response information from the server 1230 over the Internet and, according to the response information, establish a call connection between the mobile terminal device 1220 and at least one electronic device, where the electronic device is, for example, another mobile terminal device (not shown).

The server 1230 is, for example, a web server or a cloud server, and has a speech understanding module 1232. In this embodiment, the speech understanding module 1232 includes a speech recognition module 12322 and a speech processing module 12324, where the speech processing module 12324 is coupled to the speech recognition module 12322. The speech recognition module 12322 receives the voice signal sent from the voice sampling module 1224 and converts it into a plurality of semantic segments (for example, keywords or phrases). The speech processing module 12324 then parses what these semantic segments denote (for example, intent, time, or place) and thereby determines the meaning expressed in the voice signal. The speech processing module 12324 also generates corresponding response information according to the parsing result. In this embodiment, the speech understanding module 1232 may be implemented as a hardware circuit composed of one or more logic gates, or as computer program code. It is worth mentioning that in another embodiment the speech understanding module 1232 may be located in the mobile terminal device 1320, as in the voice control system 1300 shown in FIG. 13. The speech understanding module 1232 of the server 1230 may operate in the manner of the natural language understanding system 100 of FIG. 1A or the natural language dialogue systems 500/700/700' of FIGS. 5A/7A/7B.

The voice control method is described below with reference to the voice control system 1200. FIG. 14 is a flowchart of a voice control method according to an embodiment of the present invention. Referring to FIG. 12 and FIG. 14 together, in step S1402 the auxiliary activation device 1210 sends a wireless transmission signal to the mobile terminal device 1220. In detail, when the first wireless transmission module 1212 of the auxiliary activation device 1210 is triggered by receiving a trigger signal, the auxiliary activation device 1210 sends the wireless transmission signal to the mobile terminal device 1220. Specifically, when the user presses the trigger module 1214 of the auxiliary activation device 1210, the trigger module 1214 generates the trigger signal, which causes the first wireless transmission module 1212 to send the wireless transmission signal to the second wireless transmission module 1222 of the mobile terminal device 1220, so that the first wireless transmission module 1212 links with the second wireless transmission module 1222 through the wireless communication protocol. The auxiliary activation device 1210 is used only to turn on the voice system in the mobile terminal device 1220 and has no listening/talking function, so its internal circuit design can be simpler and its cost lower. In other words, the auxiliary activation device 1210 is a separate device from the hands-free headset/microphone that typically accompanies a mobile terminal device 1220; a user may have both a hands-free headset/microphone and the auxiliary activation device 1210 of the present invention.

It is worth mentioning that the auxiliary activation device 1210 may take the form of an item within easy reach of the user, such as a ring, watch, earring, necklace, pair of glasses, or other portable personal item, or of a mounted component such as a driving accessory placed on the steering wheel; it is not limited to these examples. In other words, the auxiliary activation device 1210 is an “everyday” device whose internal arrangement lets the user easily reach the trigger module 1214 to turn on the voice system 1221. Using the auxiliary activation device 1210 of the present invention, the voice system 1221 in the mobile terminal device 1220 can therefore be turned on, and the speakerphone function can even be turned on as well (described in detail later), so that the user can listen and talk directly through the mobile terminal device 1220 without wearing a headset/microphone. Moreover, these “everyday” auxiliary activation devices 1210 are items the user would wear or use anyway, so they cause no unfamiliarity or discomfort in use.

In addition, the first wireless transmission module 1212 and the second wireless transmission module 1222 can each be in a sleep mode or a working mode. In the sleep mode the wireless transmission module is off: it does not receive or detect wireless transmission signals and cannot link with other wireless transmission modules. In the working mode the wireless transmission module is on: it can continuously detect wireless transmission signals, or send a wireless transmission signal at any time, and can link with other wireless transmission modules. When the trigger module 1214 is triggered while the first wireless transmission module 1212 is in the sleep mode, the trigger module 1214 wakes the first wireless transmission module 1212 so that it enters the working mode and sends the wireless transmission signal to the second wireless transmission module 1222, allowing the first wireless transmission module 1212 to link with the second wireless transmission module 1222 of the mobile terminal device 1220 through the wireless communication protocol.

On the other hand, to keep the first wireless transmission module 1212 from staying in the working mode and consuming too much power, if the trigger module 1214 is not triggered again within a preset time (for example, 5 minutes) after the first wireless transmission module 1212 enters the working mode, the first wireless transmission module 1212 goes from the working mode back to the sleep mode and drops its link with the second wireless transmission module 1222 of the mobile terminal device 1220.
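The sleep/working behavior of the first wireless transmission module can be modeled as a small state machine. The sketch below is a simplified illustration under stated assumptions: the class name `WirelessModule`, the polling method `tick`, the injectable clock, and the choice to restart the idle timer on every trigger are all hypothetical and are not taken from the description, which only states the preset time (for example, 5 minutes).

```python
import time


class WirelessModule:
    """Toy model of a module that sleeps after an idle timeout."""

    def __init__(self, idle_timeout_s: float = 300.0, clock=time.monotonic):
        self.idle_timeout_s = idle_timeout_s  # preset time, e.g. 5 minutes
        self._clock = clock                   # injectable for testing
        self.mode = "sleep"
        self._last_trigger = 0.0

    def on_trigger(self) -> str:
        """Wake the module if needed and emit the wireless signal."""
        self.mode = "working"
        self._last_trigger = self._clock()    # assumption: timer restarts
        return "wireless-transmission-signal"

    def tick(self) -> str:
        """Fall back to sleep once the idle timeout has elapsed."""
        idle = self._clock() - self._last_trigger
        if self.mode == "working" and idle >= self.idle_timeout_s:
            self.mode = "sleep"               # link to the terminal is dropped
        return self.mode
```

Passing the clock in as a parameter keeps the model testable without waiting for real time to pass; a real device would more likely use a hardware timer interrupt than polling.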

Then, in step S1404, the second wireless transmission module 1222 of the mobile terminal device 1220 receives the wireless transmission signal in order to activate the voice system 1221. Next, in step S1406, when the second wireless transmission module 1222 detects the wireless transmission signal, the mobile terminal device 1220 activates the voice system 1221, and the voice sampling module 1224 of the voice system 1221 can start receiving voice signals such as “What is the temperature today?”, “Call Lao Wang.”, or “Please look up a phone number.”

In step S1408, the voice sampling module 1224 sends the voice signal to the speech understanding module 1232 in the server 1230, so that the speech understanding module 1232 parses the voice signal and generates response information. More specifically, the speech recognition module 12322 in the speech understanding module 1232 receives the voice signal from the voice sampling module 1224 and divides it into a plurality of semantic segments, and the speech processing module 12324 performs speech understanding on these semantic segments to generate the response information that answers the voice signal.

In another embodiment of the present invention, the mobile terminal device 1220 can further receive the response information generated by the speech processing module 12324 and, accordingly, output the content of the response information through the voice output interface 1227 or carry out the operation that the response information instructs. In step S1410, the speech synthesis module 1226 of the mobile terminal device 1220 receives the response information generated by the speech understanding module 1232 and performs speech synthesis on its content (for example, words or phrases) to produce a voice response. In step S1412, the voice output interface 1227 receives and outputs the voice response.

For example, when the user presses the trigger module 1214 of the auxiliary activation device 1210, the first wireless transmission module 1212 sends the wireless transmission signal to the second wireless transmission module 1222, and the mobile terminal device 1220 activates the voice sampling module 1224 of the voice system 1221. Suppose the voice signal from the user is a question, such as “What is the temperature today?” The voice sampling module 1224 receives the voice signal and sends it to the speech understanding module 1232 in the server 1230 for parsing, and the speech understanding module 1232 sends the resulting response information back to the mobile terminal device 1220. If the content of the response information is “30℃”, the speech synthesis module 1226 synthesizes the “30℃” message into a voice response, and the voice output interface 1227 announces the voice response to the user.

In another embodiment, suppose the voice signal from the user is a command, such as “Call Lao Wang.” The speech understanding module 1232 can recognize the command as a request to place a call to Lao Wang. The speech understanding module 1232 then generates new response information, such as “Please confirm: call Lao Wang?”, and sends it to the mobile terminal device 1220. The speech synthesis module 1226 synthesizes this new response information into a voice response and announces it to the user through the voice output interface 1227. When the user then replies with an affirmative answer such as “Yes”, the voice sampling module 1224 likewise receives the voice signal and sends it to the server 1230 for the speech understanding module 1232 to parse. After parsing, the speech understanding module 1232 records dialing instruction information in the response information and sends it to the mobile terminal device 1220. The communication module 1228 then looks up the phone number of “Lao Wang” in the contact information recorded in the phone database and establishes a call connection between the mobile terminal device 1220 and another electronic device, that is, it dials “Lao Wang”.
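How the mobile terminal device might act on response information can be sketched as follows. The description does not define the format of the response information, so the `Response` structure, its field names, and the `handle_response` function are hypothetical; the speech output and dialing functions are passed in as plain callables standing in for the speech synthesis path and the communication module.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Response:
    """Assumed shape of the response information sent by the server."""
    text: str = ""                      # content to synthesize and play
    dial_contact: Optional[str] = None  # contact name if dialing is instructed


def handle_response(resp: Response,
                    phone_book: Dict[str, str],
                    speak: Callable[[str], None],
                    dial: Callable[[str], None]) -> bool:
    """Play any spoken content, then carry out a dialing instruction."""
    if resp.text:
        speak(resp.text)                # synthesis and voice output path
    if resp.dial_contact is not None:
        # The number is looked up in the phone database on the device.
        number = phone_book.get(resp.dial_contact)
        if number is None:
            return False                # contact not found; nothing is dialed
        dial(number)                    # communication module places the call
    return True
```

In the “Call Lao Wang.” example, the confirmation prompt would arrive as a response with only `text` set, and the response that follows the user's “Yes” would carry `dial_contact`.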

In other embodiments, the operating method described above may also be carried out with the voice control system 1300 or another similar system instead of the voice control system 1200; it is not limited to the embodiments above.

In summary, in the voice control system and method of this embodiment, the auxiliary activation device can wirelessly turn on the voice function of the mobile terminal device. The auxiliary activation device may take the form of an “everyday” item within easy reach of the user, such as an ornament (a ring, watch, earring, necklace, or pair of glasses), that is, any kind of portable personal item, or of a mounted component such as a driving accessory placed on the steering wheel; it is not limited to these examples. Compared with the discomfort of wearing a separate hands-free headset/microphone, using the auxiliary activation device 1210 of the present invention to turn on the voice system in the mobile terminal device 1220 is therefore more convenient.

It should be noted that the server 1230 with the speech understanding module may be a web server or a cloud server, and a cloud server may raise concerns about the user's privacy. For example, the user would have to upload the complete contact list to the cloud server in order to perform contact-related operations such as placing calls or sending text messages. Even if the cloud server uses an encrypted connection and processes the data on the fly without storing it, the user's concerns are hard to eliminate. Accordingly, another voice control method and its corresponding voice interaction system are provided below, in which the mobile terminal device can use voice interaction services with the cloud server without uploading the complete contact list. To make the content of the present invention clearer, the following embodiments are given as examples by which the present invention can actually be implemented.

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Any person having ordinary skill in the art may make some changes and refinements without departing from the spirit and scope of the present invention. Therefore, the scope of protection of the present invention shall be defined by the appended claims.

S1100, S1110, S1120, S1130, S1140, S1150, S1160, S1170, S1180, S1190‧‧‧steps of the voice recognition-based selection method according to an embodiment of the present invention

Claims

A voice recognition-based selection method, comprising: receiving a first voice input; performing voice recognition on the first voice input to generate a first keyword; performing natural language processing on the first keyword to generate a first user intent corresponding to the first voice input; selecting at least one first return answer according to the first user intent; when the number of selected first return answers is 1, performing a corresponding operation according to the type of the selected first return answer; when the number of selected first return answers is greater than 1, displaying a first candidate list including the first return answers; receiving a second voice input and performing voice recognition on the second voice input to generate a second keyword; performing natural language processing on the second keyword to generate a second user intent corresponding to the second voice input; and selecting a second return answer from the first return answers of the first candidate list according to the second user intent.

The voice recognition-based selection method of claim 1, wherein the step of selecting the first return answer according to the first user intent comprises: comparing records stored in a structured database with the first user intent; and when a record at least partially matches the first user intent, regarding the record as the first return answer corresponding to the first voice input.
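By way of non-limiting illustration only, the comparison recited above might be sketched in Python as follows. The sketch assumes the first user intent is represented as a set of field/value constraints and treats a record as a first return answer when at least one field matches. The record layout, the field names, and the helpers `match_count` and `select_return_answers` are hypothetical and are not taken from the specification.

```python
from typing import Dict, List

Record = Dict[str, str]


def match_count(record: Record, intent: Dict[str, str]) -> int:
    """Count the intent fields that the record satisfies.

    A field matches when the wanted value appears (case-insensitively)
    inside the record's value for that field. This matching rule is an
    assumption made for the sketch.
    """
    return sum(
        1
        for field, wanted in intent.items()
        if wanted.lower() in record.get(field, "").lower()
    )


def select_return_answers(database: List[Record], intent: Dict[str, str]) -> List[Record]:
    """Keep every record that matches the intent at least partially."""
    return [rec for rec in database if match_count(rec, intent) > 0]


# Hypothetical structured database and first user intent.
database = [
    {"type": "music", "title": "Blue Sky", "artist": "Alice"},
    {"type": "music", "title": "Red River", "artist": "Bob"},
    {"type": "contact", "title": "Alice Chen", "artist": ""},
]
intent = {"type": "music", "artist": "alice"}
answers = select_return_answers(database, intent)
```

Here the first record matches both fields and the second matches only the type field, so both are kept as candidates, while the third record matches nothing and is discarded.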

The voice recognition-based selection method of claim 1, wherein the step of selecting the second return answer from the first return answers in the first candidate list according to the second user intent comprises: determining whether the second user intent includes a sequential vocabulary indicating an order; when the second user intent includes the sequential vocabulary indicating the order, selecting, according to the sequential vocabulary, the first return answer located at the corresponding position from the first candidate list; when the second user intent does not include the sequential vocabulary indicating the order, comparing the record corresponding to each of the first return answers in the first candidate list with the second user intent; and determining, according to the comparison result, which first return answer in the first candidate list corresponds to the second voice input.

The voice recognition-based selection method of claim 3, wherein the step of determining, according to the comparison result, which first return answer in the first candidate list corresponds to the second voice input comprises: selecting the first return answer with the highest degree of matching as the one corresponding to the second voice input.
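By way of non-limiting illustration only, the ordinal-or-matching selection recited above might be sketched in Python as follows. The sketch assumes English ordinal words stand in for the sequential vocabulary and uses word overlap with each candidate's title as the degree of matching. The table `ORDINALS` and the helpers `find_ordinal`, `overlap`, and `select_from_candidates` are hypothetical names introduced for this example.

```python
import re
from typing import List, Optional

# Hypothetical sequential vocabulary mapped to list positions.
ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4, "last": -1}


def find_ordinal(words: List[str]) -> Optional[int]:
    """Return the list index named by the first ordinal word, if any."""
    for word in words:
        if word in ORDINALS:
            return ORDINALS[word]
    return None


def overlap(words: List[str], title: str) -> int:
    """Degree of matching: number of distinct words shared with the title."""
    title_words = set(re.findall(r"\w+", title.lower()))
    return len(set(words) & title_words)


def select_from_candidates(candidates: List[str], second_input: str) -> Optional[str]:
    """Pick one candidate by position if an ordinal is present, else by best match."""
    words = re.findall(r"\w+", second_input.lower())
    index = find_ordinal(words)
    if index is not None:
        # Sequential vocabulary present: select by position, if it exists.
        if -len(candidates) <= index < len(candidates):
            return candidates[index]
        return None
    if not candidates:
        return None
    # No sequential vocabulary: choose the candidate with the highest overlap.
    best = max(candidates, key=lambda title: overlap(words, title))
    return best if overlap(words, best) > 0 else None


candidates = ["Blue Sky", "Red River", "Green Field"]
by_position = select_from_candidates(candidates, "the second one")
by_match = select_from_candidates(candidates, "play green field")
```

One limitation of this simplified rule is that a candidate whose own title contains an ordinal word would be selected by position instead of by match; a fuller implementation would need to disambiguate that case.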

The voice recognition-based selection method of claim 1, wherein the step of performing the corresponding operation according to the type of the selected first return answer comprises: when the type of the selected first return answer is a music file, performing music playback according to the selected first return answer; when the type of the selected first return answer is a video file, performing video playback according to the selected first return answer; when the type of the selected first return answer is a webpage file, displaying the selected first return answer; when the type of the selected first return answer is a picture file, performing picture display according to the selected first return answer; and when the type of the selected first return answer is a business card file, dialing according to the selected first return answer.
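By way of non-limiting illustration only, the type-dependent operation recited above might be sketched in Python as a dispatch table. The type labels, the record fields, and the handler functions are hypothetical; each handler only returns a description of the action a real device would carry out.

```python
from typing import Callable, Dict

Answer = Dict[str, str]


def play_music(answer: Answer) -> str:
    return f"playing music: {answer['title']}"


def play_video(answer: Answer) -> str:
    return f"playing video: {answer['title']}"


def show_webpage(answer: Answer) -> str:
    return f"displaying webpage: {answer['title']}"


def show_picture(answer: Answer) -> str:
    return f"displaying picture: {answer['title']}"


def dial_contact(answer: Answer) -> str:
    return f"dialing {answer['number']} ({answer['title']})"


# Hypothetical mapping from the type of a return answer to its operation.
HANDLERS: Dict[str, Callable[[Answer], str]] = {
    "music": play_music,
    "video": play_video,
    "webpage": show_webpage,
    "picture": show_picture,
    "business_card": dial_contact,
}


def perform_operation(answer: Answer) -> str:
    """Run the operation that corresponds to the answer's type."""
    handler = HANDLERS.get(answer["type"])
    if handler is None:
        raise ValueError(f"unsupported answer type: {answer['type']}")
    return handler(answer)


result = perform_operation({"type": "music", "title": "Blue Sky"})
```

A table of handlers keeps the selection logic independent of the set of supported types, so a new type can be added without touching `perform_operation`.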

A mobile terminal device, comprising: a voice receiving unit, receiving a first voice input and a second voice input; a display unit; a storage unit, storing a plurality of data; and a data processing unit, coupled to the voice receiving unit, the display unit, and the storage unit, wherein the data processing unit performs voice recognition on the first voice input to generate a first keyword, performs natural language processing on the first keyword to generate a first user intent corresponding to the first voice input, and selects a first return answer according to the first user intent; when the number of selected first return answers is 1, the data processing unit performs a corresponding operation according to the type of the selected first return answer; when the number of selected first return answers is greater than 1, the data processing unit controls the display unit to display a first candidate list including the first return answers; and the data processing unit performs voice recognition on the second voice input to generate a second keyword, performs natural language processing on the second keyword to generate a second user intent corresponding to the second voice input, and selects a second return answer from the first return answers in the first candidate list according to the second user intent.

The mobile terminal device of claim 6, wherein the data processing unit compares the record corresponding to each of the first return answers with the first user intent, and when a record at least partially matches the first user intent, the record is regarded as the first return answer corresponding to the first voice input.

The mobile terminal device of claim 6, wherein the data processing unit determines whether the second user intent includes a sequential vocabulary indicating an order; when the second user intent includes the sequential vocabulary indicating the order, the data processing unit selects, according to the sequential vocabulary, the first return answer located at the corresponding position from the first candidate list; and when the second user intent does not include the sequential vocabulary indicating the order, the data processing unit compares the record corresponding to each of the first return answers in the first candidate list with the second user intent to determine a degree of matching between each first return answer and the second voice input, and determines, according to the degree of matching, which first return answer in the first candidate list corresponds to the second voice input.

The mobile terminal device of claim 8, wherein the data processing unit selects the first return answer with the highest degree of matching as the one corresponding to the second voice input.

The mobile terminal device of claim 6, wherein when the type of the selected first return answer is a music file, the data processing unit performs music playback according to the selected first return answer; when the type of the selected first return answer is a video file, the data processing unit performs video playback according to the selected first return answer; when the type of the selected first return answer is a webpage file, the data processing unit performs display according to the selected first return answer; when the type of the selected first return answer is a picture file, the data processing unit performs picture display according to the selected first return answer; and when the type of the selected first return answer is a business card file, the data processing unit dials according to the selected first return answer.

An information system, comprising: a server, storing a plurality of data and having a voice recognition function; and a mobile terminal device, comprising: a voice receiving unit, receiving a first voice input and a second voice input; a display unit; and a data processing unit, coupled to the voice receiving unit, the display unit, and the server, wherein the data processing unit performs voice recognition on the first voice input through the server to generate a first keyword, natural language processing is performed on the first keyword to generate a first user intent corresponding to the first voice input, and the server selects at least one corresponding first return answer from the records included in a structured database according to the first user intent and transmits it to the data processing unit; when the number of selected first return answers is 1, the data processing unit performs a corresponding operation according to the type of the selected first return answer; when the number of selected first return answers is greater than 1, the data processing unit controls the display unit to display, according to the selected first return answers, a first candidate list including the first return answers; and the data processing unit performs voice recognition on the second voice input through the server to generate a second keyword, natural language processing is performed on the second keyword to generate a second user intent corresponding to the second voice input, and the server selects a second return answer from the first return answers in the first candidate list according to the second user intent and transmits it to the data processing unit.

The information system of claim 11, wherein the server compares the record of each of the first return answers with the first user intent, and when a record at least partially matches the first user intent, the record is regarded as the first return answer corresponding to the first voice input.

The information system of claim 11, wherein the server determines whether the second user intent includes a sequential vocabulary indicating an order; when the second user intent includes the sequential vocabulary indicating the order, the server selects, according to the sequential vocabulary, the first return answer located at the corresponding position from the first candidate list; and when the second user intent does not include the sequential vocabulary indicating the order, the server compares each of the records in the first candidate list with the second user intent to determine a degree of matching between each first return answer and the second voice input, and determines, according to the degree of matching, which first return answer in the first candidate list corresponds to the second voice input.

The information system of claim 13, wherein the server selects the first return answer with the highest degree of matching as the one corresponding to the second voice input.

The information system of claim 11, wherein when the type of the selected first return answer is a music file, the data processing unit performs music playback according to the selected first return answer; when the type of the selected first return answer is a video file, the data processing unit performs video playback according to the selected first return answer; when the type of the selected first return answer is a webpage file, the data processing unit performs display according to the selected first return answer; when the type of the selected first return answer is a picture file, the data processing unit performs picture display according to the selected first return answer; and when the type of the selected first return answer is a business card file, the data processing unit dials according to the selected first return answer.

A voice recognition-based selection method, comprising: performing voice recognition on a first voice input to generate a first keyword; searching a structured database according to the first keyword to obtain at least one first return answer; when the number of selected first return answers is greater than 1, displaying a first candidate list including the first return answers; receiving a second voice input after displaying the first candidate list, and performing voice recognition on the second voice input to generate a second keyword; and selecting a second return answer from the first return answers of the first candidate list according to the second keyword.

The voice recognition-based selection method of claim 16, wherein the step of searching the structured database according to the first keyword to obtain the at least one first return answer comprises: when a record of the structured database at least partially matches the first keyword, regarding the record as the first return answer corresponding to the first voice input.

The voice recognition-based selection method of claim 16, wherein the step of selecting the second return answer from the first return answers in the first candidate list according to the second keyword comprises: when the second keyword includes a sequential vocabulary indicating an order, selecting, according to the sequential vocabulary, the first return answer located at the corresponding position from the first candidate list; when the second keyword does not include the sequential vocabulary indicating the order, comparing the record corresponding to each of the first return answers in the first candidate list with the second keyword; and determining, according to the comparison result, which first return answer in the first candidate list corresponds to the second voice input.

The voice recognition-based selection method of claim 18, wherein the step of determining, according to the comparison result, which first return answer in the first candidate list corresponds to the second voice input comprises: selecting the first return answer with the highest degree of matching as the one corresponding to the second voice input.

The voice recognition-based selection method of claim 16, wherein the step of performing the corresponding operation according to the type of the selected first return answer comprises: when the type of the selected first return answer is a music file, performing music playback according to the selected first return answer; when the type of the selected first return answer is a video file, performing video playback according to the selected first return answer; when the type of the selected first return answer is a webpage file, displaying the selected first return answer; when the type of the selected first return answer is a picture file, performing picture display according to the selected first return answer; and when the type of the selected first return answer is a business card file, dialing according to the selected first return answer.