TWI780502B

TWI780502B - Speech recognition system, command generation system, and speech recognition method thereof

Info

Publication number: TWI780502B
Application number: TW109136554A
Authority: TW
Inventors: 張國峰; 洪士昇; 汪青
Original assignee: 威盛電子股份有限公司
Priority date: 2020-09-25
Filing date: 2020-10-21
Publication date: 2022-10-11
Also published as: TW202213086A; CN112216278A

Abstract

A speech recognition system, a command generation system, and a speech recognition method thereof are provided. The speech recognition system is adapted to communicate with an application system. The application system receives a speech input. The speech recognition system includes a speech recognition module, a natural speech understanding system, and a command generation system. The speech recognition module receives the speech input provided by the application system and recognizes the speech input to generate speech information. The natural speech understanding system is coupled to the speech recognition module and understands the speech information to generate a semantic analysis result. The command generation system uses the semantic analysis result to compare against a selection item in the interface content of a current user interface, and then outputs a control command to the application system according to the comparison result.

Description

Speech recognition system, command generation system and speech recognition method thereof

The present invention relates to speech recognition technology, and in particular to a speech recognition system, a command generation system, and a speech recognition method thereof.

As speech recognition technology evolves, more and more application systems are being equipped with a speech recognition function to make them easier to operate. In particular, for a virtual reality (VR) system or an augmented reality (AR) system, a speech recognition function can greatly improve ease of operation and the user experience. However, speech recognition often consumes a large amount of system resources, which increases the construction cost of the system and may even reduce its running speed.

Another problem is that traditional speech recognition is implemented by compiling an automatic speech recognition grammar (ASR grammar). The user must therefore compile every phrasing and every piece of content available for voice selection into the ASR grammar, and only text that fully conforms to the ASR grammar can be matched. In other words, the traditional approach imposes a heavy workload on system developers and is not flexible enough in use. Moreover, implementing speech recognition in a VR or AR system may also require changing the internal system settings of the application system, which increases the complexity and cost of setting up the system.

The present invention provides a speech recognition system, a command generation system, and a speech recognition method thereof, which can provide a convenient speech recognition function.

The speech recognition system of the present invention is adapted to communicate with an application system. The application system is used to receive a speech input. The speech recognition system includes a speech recognition module, a natural speech understanding system, and a command generation system. The speech recognition module receives the speech input provided by the application system and recognizes it to generate speech information. The natural speech understanding system is coupled to the speech recognition module and understands the speech information to generate a semantic analysis result. The command generation system is coupled to the natural speech understanding system; it uses the semantic analysis result to compare against a selection item in the interface content of a current user interface, and outputs a control command to the application system according to the comparison result.

In an embodiment of the present invention, the command generation system includes a comparison module and a command confirmation module. The comparison module receives the semantic analysis result and uses it to compare against the selection item in the interface content of the current user interface, so as to generate the comparison result. The command confirmation module is coupled to the comparison module and converts the comparison result according to a command format to output the control command.

In an embodiment of the present invention, the application system is used to display the current user interface. After the application system receives the control command output by the speech recognition system, it selects the selection item in the interface content of the current user interface according to the control command, and then switches to display a next user interface or performs a specific operation.

In an embodiment of the present invention, the natural speech understanding system includes a natural language processor, a knowledge-assisted understanding module, a retrieval system, and an analysis result output module. The natural language processor is coupled to the speech recognition module and receives the speech information to generate possible-intent grammar data. The knowledge-assisted understanding module is coupled to the natural language processor and stores intent data of the possible-intent grammar data. The retrieval system is coupled to the knowledge-assisted understanding module; it receives keywords of the possible-intent grammar data from the knowledge-assisted understanding module and returns a response result generated according to those keywords, so that the knowledge-assisted understanding module can generate determined-intent grammar data according to the response result. The analysis result output module is coupled to the knowledge-assisted understanding module and the command generation system, and outputs the semantic analysis result according to the determined-intent grammar data.

The command generation system of the present invention is adapted to communicate with an application system. The application system is used to receive a speech input. The command generation system includes a comparison module and a command confirmation module. The comparison module receives a semantic analysis result corresponding to the speech input and uses it to compare against a selection item in the interface content of a current user interface, so as to generate a comparison result. The command confirmation module is coupled to the comparison module and converts the comparison result according to a command format to output a control command to the application system.

The speech recognition method of the present invention is suitable for a speech recognition system. The speech recognition system communicates with an application system, and the application system is used to receive a speech input. The speech recognition method includes the following steps: receiving the speech input provided by the application system; recognizing the speech input to generate speech information; understanding the speech information to generate a semantic analysis result; and comparing the semantic analysis result to output a control command to the application system.

A speech recognition method of the present invention is suitable for a command generation system. The command generation system is adapted to communicate with an application system, and the application system is used to receive a speech input. The speech recognition method includes the following steps: receiving a semantic analysis result corresponding to the speech input; using the semantic analysis result to compare against a selection item in the interface content of a current user interface, so as to generate a comparison result; and confirming the comparison result according to a command format to output a control command to the application system.

Based on the above, the speech recognition system, command generation system, and speech recognition method of the present invention can recognize the speech input provided by the application system and return the corresponding control command to the application system, so that the application system can perform the corresponding operation according to the control command. Therefore, in addition to providing a convenient speech recognition function, the present invention also reduces the system resources that the application system needs for speech recognition.

To make the above features and advantages of the present invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

To make the content of the present invention easier to understand, the following embodiments are given as examples by which the present invention can actually be implemented. In addition, wherever possible, elements, components, and steps with the same reference numerals in the drawings and embodiments represent the same or similar parts.

FIG. 1 is a schematic diagram of a speech recognition system according to an embodiment of the invention. Referring to FIG. 1, the speech recognition system 100 is adapted to communicate with the application system 200, and the communication between them may be wired or wireless. In this embodiment, the application system 200 includes a speech receiving module 210 and a command execution module 220. The application system 200 receives the speech input 101 provided by the user through the speech receiving module 210 and transmits the speech input 101 to the speech recognition system 100. The speech recognition system 100 performs speech recognition on the speech input 101 provided by the application system 200 to generate a corresponding command, and the speech recognition system 100 returns the command to the command execution module 220 of the application system 200, so that the command execution module 220 performs the related operation of the application system 200. In other words, the speech recognition system 100 of this embodiment can be paired with any application system to provide a speech recognition function.

In this embodiment, the application system 200 may be, for example, a game program or an application program running on or installed in a virtual reality (VR) device or an augmented reality (AR) device, and the user can control related operations in the game program by voice. The VR or AR device may include hardware such as a processing circuit, a memory, and a voice sensing device, so that the processing circuit can execute or access the related modules or programs in the memory and thereby implement at least the speech receiving function, the command execution function, and the application execution function of the present invention. In this embodiment, the speech recognition system 100 may be built, for example, in a cloud server or a local host device to provide speech recognition and processing functions. The speech recognition system 100 may also include another processing circuit and another memory, so that this processing circuit can execute or access the related modules or programs in this memory and thereby implement at least the speech recognition function of the present invention.

In this embodiment, the speech recognition system 100 includes a speech recognition module 110, a natural language understanding system 120, a command generation system 130, and a storage device 140. The speech recognition module 110 is coupled to the speech receiving module 210 of the application system 200 and to the natural language understanding system 120. The command generation system 130 is coupled to the natural language understanding system 120 and the storage device 140. FIG. 2 is a flowchart of a speech recognition method according to an embodiment of the invention. Following the method of FIG. 2, the speech recognition system 100 of FIG. 1 can execute steps S210 to S240 to implement the speech recognition function. In step S210, the speech recognition module 110 of the speech recognition system 100 receives the speech input 101 provided by the application system 200. In step S220, the speech recognition module 110 recognizes the speech input 101 to generate speech information 102. In this embodiment, the speech recognition module 110 can convert the signal of the speech input 101 into speech information 102 (that is, data) that a computer can process and analyze.

In step S230, the natural language understanding system 120 receives the speech information 102 output by the speech recognition module 110 and understands the speech information 102 to generate a semantic analysis result 103. In step S240, the command generation system 130 receives the semantic analysis result 103 output by the natural language understanding system 120 and compares the semantic analysis result 103 to output a control command 105 to the application system 200. In this embodiment, the storage device 140 can provide the interface content 104 of the user interface currently displayed by the application system 200 to the command generation system 130, so that the command generation system 130 can compare the semantic analysis result 103 with the interface content 104 of the current user interface and generate the control command 105 for the command execution module 220 of the application system 200. The command execution module 220 can then make the application system 200 display a next user interface or perform a specific operation according to the control command 105. The storage device 140 may be any type of memory in a server or computer system, such as dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, or read-only memory (ROM). The present invention is not limited in this respect, and those skilled in the art can select one according to actual needs.
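As an illustration only, steps S210 to S240 can be sketched as a chain of three stages. The function names, the dictionary layout of the semantic analysis result, and the `SELECT_ITEM` command string are hypothetical and are not taken from the embodiment; the stand-in stages below simply treat the audio bytes as text.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical shape of a semantic analysis result: a list of candidates.
SemanticResult = List[Dict[str, str]]


def run_pipeline(
    speech_input: bytes,
    recognize: Callable[[bytes], str],
    understand: Callable[[str], SemanticResult],
    generate_command: Callable[[SemanticResult], Optional[str]],
) -> Optional[str]:
    """Chain steps S210 to S240: speech input in, control command out."""
    speech_information = recognize(speech_input)       # S210, S220
    semantic_result = understand(speech_information)   # S230
    return generate_command(semantic_result)           # S240


# Toy stand-ins for modules 110, 120 and 130 (assumptions, not real engines).
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio is already text


def fake_understand(text: str) -> SemanticResult:
    return [{"intent": "select", "keyword": text.strip().lower()}]


def fake_generate_command(semantic_result: SemanticResult) -> Optional[str]:
    items = {"role selection": 1, "level selection": 2}
    for candidate in semantic_result:
        if candidate["keyword"] in items:
            return f"SELECT_ITEM {items[candidate['keyword']]}"
    return None


command = run_pipeline(
    b"Role Selection", fake_recognize, fake_understand, fake_generate_command
)
print(command)  # SELECT_ITEM 1
```

Because each stage is passed in as a callable, the application system only needs to supply the speech input and consume the returned command, which mirrors the division of labor described above.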

In this embodiment, the natural language understanding system 120 may, for example, convert the speech information 102 into text information and normalize the text information to generate a semantic analysis result 103 that has an intended object. The command generation system 130 can then generate the control command 105 corresponding to the intended object and provide it to the application system 200, so that the application system 200 can execute the control command 105 to display a next user interface or perform a specific operation. Therefore, with the speech recognition method of this embodiment, the application system 200 does not need to spend additional system resources on speech recognition, which effectively saves the system resources that the application system 200 would otherwise need to recognize the user's speech input.

It should be noted that the semantic analysis result 103 output by the natural language understanding system 120 may include one or more pieces of possible semantic data, and each piece of semantic data may include a keyword and intent data. In other words, the user can express the selection intent colloquially, for example by saying the full name, abbreviation, or alias of a selection item, and the speech recognition system 100 of this embodiment will generate the corresponding control command without the user having to say the complete specific name. The manner in which the natural language understanding system 120 generates the semantic analysis result 103 is illustrated below with the embodiment of FIG. 6.
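A minimal sketch of such a semantic analysis result is given below, assuming that each piece of semantic data is a simple keyword and intent pair. The class and field names are hypothetical; the embodiment does not specify a data layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SemanticData:
    keyword: str  # what the user referred to, e.g. a name or an alias
    intent: str   # what the user wants done with it (hypothetical label)


@dataclass
class SemanticAnalysisResult:
    # One or more possible interpretations of the same utterance.
    candidates: List[SemanticData] = field(default_factory=list)

    def keywords(self) -> List[str]:
        """Keywords of all candidates, in the order they were produced."""
        return [candidate.keyword for candidate in self.candidates]


result = SemanticAnalysisResult(
    [
        SemanticData(keyword="role selection", intent="select"),
        SemanticData(keyword="role", intent="select"),
    ]
)
print(result.keywords())  # ['role selection', 'role']
```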

FIG. 3 is a schematic diagram of a command generation system according to an embodiment of the invention. FIG. 4 is a flowchart of a speech recognition method according to another embodiment of the invention. Referring to FIG. 1, FIG. 3, and FIG. 4, the command generation system 130 of FIG. 1 is an application system interface, and the user can edit the command generation system 130 so that the speech recognition system 100 is applicable to the corresponding application system 200. The command generation system 130 may have the system architecture shown in FIG. 3. In this embodiment, the command generation system 130 includes a comparison module 131, a command confirmation module 132, a temporary storage device 133, an access module 134, and an item acquisition module 135. The comparison module 131 is coupled to the command confirmation module 132 and the item acquisition module 135. The temporary storage device 133 is coupled to the command confirmation module 132 and the access module 134. The access module 134 is coupled to the item acquisition module 135 and the storage device 140. In this embodiment, the temporary storage device 133 may be, for example, dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, or read-only memory (ROM). The present invention is not limited in this respect, and those skilled in the art can select one according to actual needs.

Following the speech recognition method of FIG. 4, the command generation system 130 of FIG. 3 can execute steps S410 to S450 to implement the speech recognition and command generation functions. In step S410, the temporary storage device 133 receives the interface number 301 of the current user interface provided by the application system 200. In step S420, the access module 134 generates the interface content 303 of the current user interface according to the interface number 301. In this embodiment, the access module 134 can use the interface number 301 to access the interface data preloaded in the storage device 140, so as to obtain the interface content 303 of the current user interface displayed by the application system 200.

In step S430, the item acquisition module 135 receives the interface content 303 of the current user interface provided by the access module 134, acquires the selection item 304 from the interface content 303, and outputs the selection item 304 to the comparison module 131. In step S440, the comparison module 131 uses the semantic analysis result 103 to compare against the selection item 304 of the current user interface to generate a comparison result 305. It should be noted that the selection item 304 may include an item name and a plurality of reference keywords corresponding to the item name. That is, the item acquisition module 135 can extract the item name of the selection item 304 and its reference keywords from the interface content 303 of the current user interface, and the comparison module 131 can check whether the semantic analysis result 103 matches either the item name or one of the reference keywords to generate the comparison result 305. In other words, as long as the semantic analysis result 103, which the natural language understanding system 120 produces from the user's speech input, matches the item name or one of the reference keywords, the comparison module 131 can output, for example, a comparison result 305 containing the corresponding item number. The reference keywords may be, for example, abbreviations or aliases of the item name.
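The comparison of step S440 can be sketched as follows. This is a minimal illustration that assumes exact string equality between a candidate keyword and an item name or reference keyword; the embodiment does not state the matching rule, and the names `SelectionItem` and `compare` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SelectionItem:
    item_id: int                         # item number reported on a match
    name: str                            # full item name
    reference_keywords: Tuple[str, ...]  # abbreviations or aliases


def compare(
    candidate_keywords: List[str], items: List[SelectionItem]
) -> Optional[int]:
    """Return the item number of the first item whose name or reference
    keyword equals one of the candidate keywords, or None if none match."""
    for keyword in candidate_keywords:
        for item in items:
            if keyword == item.name or keyword in item.reference_keywords:
                return item.item_id
    return None


items = [
    SelectionItem(0, "instructions for use", ("usage", "first", "operation")),
    SelectionItem(1, "role selection", ("casting", "second", "role")),
]

print(compare(["casting"], items))         # 1 (alias match)
print(compare(["role selection"], items))  # 1 (full-name match)
print(compare(["settings"], items))        # None (no match)
```

Candidate keywords are tried in order, so a semantic analysis result with several possible interpretations resolves to the first one that names an item on the current interface.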

In step S450, the command confirmation module 132 converts the comparison result 305 according to the command format 307 and outputs, for example, a control command 306 containing the corresponding item number. In this embodiment, the command format 307 is the form of command that the application system 200 can accept, and the command confirmation module 132 outputs the control command 306 to the application system 200 through the temporary storage device 133. Therefore, when the application system 200 is displaying the current user interface and receives the control command 306 output by the speech recognition system 100, the application system 200 can select the selection item in the interface content 303 of the current user interface according to the control command 306, so that the application system 200 can, for example, switch to display a next user interface or perform a specific operation according to the item number obtained above. Accordingly, the speech recognition method of this embodiment enables the command generation system 130 to effectively recognize the user's speech input and generate the corresponding control command.
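The conversion of step S450 can be sketched as below, assuming that the command format is stored as a template string with an `{item_id}` placeholder. Both the template syntax and the two example formats are hypothetical; the embodiment only requires that the output be in a form the application system can accept.

```python
import json
from typing import Optional


def confirm_command(item_id: Optional[int], command_format: str) -> Optional[str]:
    """Render the matched item number into the application's command form.
    Returns None when the comparison found no matching item."""
    if item_id is None:
        return None
    return command_format.format(item_id=item_id)


# Two hypothetical formats for two different application systems.
plain_format = "SELECT_ITEM {item_id}"
json_format = '{{"action": "select", "item": {item_id}}}'

print(confirm_command(4, plain_format))  # SELECT_ITEM 4
print(confirm_command(4, json_format))   # {"action": "select", "item": 4}
print(json.loads(confirm_command(4, json_format))["item"])  # 4
```

Keeping the format as data rather than code is one way to let the same command generation system serve application systems that expect different command forms.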

In addition, because the speech recognition system 100 can be applied to various application systems, the user only needs to edit the speech recognition system 100 and does not need to change the application system. For example, the speech recognition system 100 can first operate in an edit mode (or be edited through a software development kit (SDK) of the speech recognition system), so that the interface content 104 of the user interfaces displayed by the application system 200 and the command format 302 are written into the storage device 140 in advance through the temporary storage device 133 and the access module 134. Then, when the speech recognition system 100 operates in a working mode, the command generation system 130 can receive the interface number 301 of the current user interface through the temporary storage device 133, and the access module 134 can read the storage device 140 according to the interface number 301 to obtain the interface content 303 of the corresponding current user interface. The command confirmation module 132 can obtain the command format 307 through the access module 134. That is, the speech recognition system 100 of this embodiment can be paired with various application systems to provide effective speech recognition and voice selection functions.
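The two operating modes can be sketched with a small in-memory store standing in for the storage device 140. The class, its method names, the interface numbers, and the rule that writes are rejected in working mode are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


class InterfaceStore:
    """Toy stand-in for storage device 140 (hypothetical API)."""

    def __init__(self) -> None:
        self._interfaces: Dict[int, List[dict]] = {}
        self._command_format: Optional[str] = None
        self.mode = "edit"

    def _require_edit_mode(self) -> None:
        if self.mode != "edit":
            raise RuntimeError("data can only be written in edit mode")

    def register_interface(self, interface_number: int, items: List[dict]) -> None:
        """Edit mode: preload the content of one user interface."""
        self._require_edit_mode()
        self._interfaces[interface_number] = items

    def set_command_format(self, command_format: str) -> None:
        """Edit mode: preload the command form the application accepts."""
        self._require_edit_mode()
        self._command_format = command_format

    def start_working(self) -> None:
        self.mode = "working"

    def interface_content(self, interface_number: int) -> List[dict]:
        """Working mode: look up interface content by interface number."""
        return self._interfaces[interface_number]

    def command_format(self) -> Optional[str]:
        return self._command_format


store = InterfaceStore()
store.register_interface(
    1,  # hypothetical interface number
    [
        {"id": 0, "title": "instructions for use"},
        {"id": 1, "title": "role selection"},
    ],
)
store.set_command_format("SELECT_ITEM {item_id}")
store.start_working()

titles = [item["title"] for item in store.interface_content(1)]
print(titles)  # ['instructions for use', 'role selection']
```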

FIG. 5 is a schematic diagram of user interfaces of an application system according to an embodiment of the invention, showing examples of user interfaces that the application system 200 of FIG. 1 may display. Referring to FIG. 1, FIG. 3, and FIG. 5, the application system 200 may, for example, execute a virtual reality game program. Taking a game program as an example, it should first be explained that the game developer can create one or more data sets for each user interface that may be displayed in the game, where each user interface may include one or more item names. For each item name, the game developer can create a data set that includes an item number, the column number and line number at which the item name is located on the interface, and a plurality of reference keywords corresponding to the item name. Therefore, when the speech recognition system 100 is connected to the game program, the game program can input the created data sets to the temporary storage device 133 of the command generation system 130, where they are stored. The access module 134 can then read the temporary storage device 133 and store the data sets in the storage device 140 of the speech recognition system 100.

Next, assume that the application system 200 first displays the user interface 510 shown in FIG. 5. The interface content of the user interface 510 includes an interface name 511 (home page) and a plurality of selection items 512 to 514. The interface content 303 that the access module 134 reads from the storage device 140 may include, for example, the data of the selection items 512 to 514. The item acquisition module 135 can extract the item names of the selection items 512 to 514 and the reference keywords corresponding to each item name from the interface content of the user interface 510, and output the item names and their reference keywords to the comparison module 131 for comparison. In this example, the comparison module 131 can obtain the following data from the item acquisition module 135:

{id=0; column=0; line=0; title="Instructions for use"; alias="usage", "first", "third to last", "one", "use", "operation", …}

{id=1; column=0; line=1; title="Role selection"; alias="casting", "second", "second to last", "role", "character", …}

{id=2; column=0; line=2; title=“關卡選擇”; alias=“關卡”, “第三”, “倒數第一”，“戰鬥”, “打仗”,…}{id=2; column=0; line=2; title="level selection"; alias="level", "third", "last one", "battle", "war",…}

Here, "id" is the item label, "column" is the column number, "line" is the line number, "title" is the item name, and "alias" lists the reference keywords. In this example, when the user wants to select the selection item 513, the user may say its full name "role selection", its abbreviation "casting", or its ordinal position "second" or "second to last", among others. Any of these lets the comparison module 131 match the selection item 513 and output the corresponding comparison result 305 to the command confirmation module 132, which then generates the corresponding control command 306 for the command execution module 220 of the application system 200. The application system 200 then switches to display the next user interface 520.
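The comparison described for the user interface 510 can be sketched in Python as follows. This is only an illustrative sketch: the `SelectionItem` class, the `match_item` function and the case-insensitive exact comparison are assumptions, and the English strings stand in for the spoken Chinese keywords of the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SelectionItem:
    id: int        # item label
    column: int    # column number on the interface
    line: int      # line number on the interface
    title: str     # item name
    alias: List[str] = field(default_factory=list)  # reference keywords


# Data sets of user interface 510, rendered in English for illustration.
HOME_ITEMS = [
    SelectionItem(0, 0, 0, "Instructions for use",
                  ["usage", "first", "third from last", "one", "use", "operation"]),
    SelectionItem(1, 0, 1, "Role selection",
                  ["casting", "second", "second to last", "role", "character"]),
    SelectionItem(2, 0, 2, "Level selection",
                  ["level", "third", "last", "battle", "war"]),
]


def match_item(semantic_result: str, items: List[SelectionItem]) -> Optional[SelectionItem]:
    """Return the first item whose title or alias equals the semantic analysis result."""
    spoken = semantic_result.casefold()
    for item in items:
        if spoken == item.title.casefold():
            return item
        if any(spoken == keyword.casefold() for keyword in item.alias):
            return item
    return None


hit = match_item("casting", HOME_ITEMS)
print(hit.id if hit else None)
```

A full name, an abbreviation and an ordinal all resolve to the same item label, which is the value the command confirmation module would need in order to build the control command.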

Next, when the application system 200 displays the user interface 520, the interface content of the user interface 520 includes an interface name 521 (Role selection) and a plurality of selection items 522 to 524. In this example, the comparison module 131 obtains the following data from the item acquisition module 135:

{id=3; column=0; line=0; title="Zhao Yun"; alias="Zhao Zilong", "first", "third from last", "one", "Zhao", "Zilong", …}

{id=4; column=0; line=1; title="Guan Yu"; alias="Guan Yunchang", "second", "second to last", "Guan", "Yunchang", …}

{id=5; column=0; line=2; title="Cao Cao"; alias="Cao Mengde", "third", "last", "three", "Cao", "Mengde", …}

In this example, the user may say the full name "Zhao Yun" of the selection item 522, its abbreviation "Zhao", its alias "Zhao Zilong", or its ordinal position "first" or "third from last", among others. Any of these lets the comparison module 131 determine that the user has chosen the selection item 522 and output the corresponding comparison result 305 to the command confirmation module 132, which then generates the corresponding control command 306 for the command execution module 220 of the application system 200. The application system 200 then switches to display the next user interface 530.

Next, when the application system 200 displays the user interface 530, the interface content of the user interface 530 includes an interface name 531 (Weapon selection) and a plurality of selection items 532 to 534. In this example, the comparison module 131 obtains the following data from the item acquisition module 135:

{id=6; column=0; line=0; title="Qinghong Sword"; alias="sword", "Qinghong", "first", "third from last", "one", …}

{id=7; column=0; line=1; title="Spear"; alias="spear", "second", "second to last", "two", …}

{id=8; column=0; line=2; title="Broadsword"; alias="blade", "third", "last", "three", …}

In this example, the user may say the full name "Qinghong Sword" of the selection item 532, its abbreviation "Qinghong", or its ordinal position "first", among others. Any of these lets the comparison module 131 determine that the user has chosen the selection item 532 and output the corresponding comparison result 305 to the command confirmation module 132, which then generates the corresponding control command 306 for the command execution module 220 of the application system 200. The application system 200 then performs the subsequent specific operation in the game program.

However, the speech input provided by the user is not limited to the full name, abbreviation, alias or ordinal forms described above. In one embodiment, the comparison module 131 may instead extract directly from the semantic analysis result 103 the ordinal information of the selection items on the current user interface (counted either from the start or from the end), extract the line number or column number of a selection item, or perform pinyin matching based on the semantic analysis result 103 to find selection items whose item name matches at its beginning, at its end, or in a substring. The comparison module 131 may also output a comparison result 305 that covers several successfully matched selection items, so that the application system 200 can execute multiple control commands simultaneously or in sequence.
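The alternative matching strategies in this paragraph can be illustrated with a short Python sketch. It is a sketch under stated assumptions: the helper names are hypothetical, plain case-insensitive substring matching stands in for the pinyin matching of the description, and the English titles stand in for the Chinese item names.

```python
from typing import List, NamedTuple


class Item(NamedTuple):
    id: int
    column: int
    line: int
    title: str


# Items of user interface 530 (English renderings, for illustration only).
WEAPONS = [
    Item(6, 0, 0, "Qinghong Sword"),
    Item(7, 0, 1, "Spear"),
    Item(8, 0, 2, "Broadsword"),
]


def by_ordinal(items: List[Item], n: int, from_end: bool = False) -> List[Item]:
    """Select the n-th item (1-based), counted from the start or from the end."""
    if not 1 <= n <= len(items):
        return []
    ordered = sorted(items, key=lambda it: (it.line, it.column))
    return [ordered[-n] if from_end else ordered[n - 1]]


def by_position(items: List[Item], line: int, column: int) -> List[Item]:
    """Select items located at the given line and column."""
    return [it for it in items if it.line == line and it.column == column]


def by_fragment(items: List[Item], fragment: str) -> List[Item]:
    """Select every item whose name contains the fragment (beginning, end or middle)."""
    needle = fragment.casefold()
    return [it for it in items if needle in it.title.casefold()]


print([it.id for it in by_ordinal(WEAPONS, 1, from_end=True)])
print([it.id for it in by_fragment(WEAPONS, "sword")])
```

Each helper returns a list, so a single spoken fragment can match several selection items at once, which corresponds to the case in which the comparison result covers multiple matched items.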

FIG. 6 is a schematic diagram of a natural speech understanding system according to an embodiment of the present invention. In some embodiments, the natural language understanding system of the present invention may, for example, adopt the architecture of the natural language understanding system disclosed in the Chinese invention patent with publication number CN103761242B, but the present invention is not limited thereto. In other embodiments, the natural language understanding system may adopt any other system architecture capable of generating the semantic analysis results described in the embodiments of the present invention. Referring to FIG. 1 and FIG. 6, the natural language understanding system 620 of FIG. 6 is one implementation example of the natural language understanding system 120 of FIG. 1. In this embodiment, the natural language understanding system 620 includes a natural language processor 621, a knowledge-assisted understanding module 622, a retrieval system 624 and an analysis result output module 629. The knowledge-assisted understanding module 622 is coupled to the natural language processor 621 and the retrieval system 624, and includes intent data 623. The retrieval system 624 includes a structured database 625, a search engine 626, an instruction data storage device 627 and a retrieval interface unit 628, where the search engine 626 is coupled to the structured database 625, the instruction data storage device 627 and the retrieval interface unit 628.

In this embodiment, with reference to Table 1 below, when the natural language understanding system 620 receives the speech information 102 provided by the speech recognition module 110 of FIG. 1 (for example, when the user says "I want Zilong" (我要子龍) as request information while the user interface 520 of FIG. 5 is displayed), the natural language processor 621 analyzes the speech information 102 to generate possible intent grammar data 603. The natural language processor 621 sends the possible intent grammar data 603 to the knowledge-assisted understanding module 622, where the possible intent grammar data 603 contains a keyword 604 and intent data 623. Because the keyword 604 (for example, "Zilong") may belong to different domains (for example, the role selection domain ＜roleselect＞ and the film domain ＜watchfilm＞), one piece of speech information 102 may be analyzed into several pieces of possible intent grammar data 603 (for example, "＜roleselect＞,＜rolename＞=Zilong" or "＜watchfilm＞,＜filmname＞=Zilong"). Further analysis by the knowledge-assisted understanding module 622 is therefore needed to confirm the user's intent. In this embodiment, the knowledge-assisted understanding module 622 extracts the keyword 604 (for example, "Zilong") from the possible intent grammar data 603 and sends it to the retrieval interface unit 628 of the retrieval system 624, which searches the structured database 625 through the search engine 626 to confirm whether a role name or a film title "Zilong" exists. In addition, the natural language processor 621 stores the intent data 623 in the knowledge-assisted understanding module 622.

Table 1
Speech information (request information) | Possible intent grammar data | Intent data | Keyword
I want Zilong (我要子龍) | ＜roleselect＞,＜rolename＞=Zilong | ＜roleselect＞ | Zilong (子龍)
I want Zilong (我要子龍) | ＜watchfilm＞,＜filmname＞=Zilong | ＜watchfilm＞ | Zilong (子龍)

In other words, in this embodiment the natural language understanding system 620 first extracts the keyword 604 from the possible intent grammar data 603, determines the domain attribute of the keyword 604 from the full-text search result of the structured database 625, and then further analyzes and confirms the user's definite intent. The user can therefore express an intent or a request in a colloquial manner without having to memorize specific terms, such as the terms of a fixed word list used in existing approaches.

In this embodiment, the structured database 625 in the retrieval system 624 may, for example, store a plurality of records. The search engine 626 in the retrieval system 624 performs a full-text search on the structured database 625 according to the keyword 604 and, after the user's intent is confirmed, returns the response result 605 of the full-text search to the knowledge-assisted understanding module 622. For example, assuming that the structured database 625 stores a record whose title field contains "rolenameguid:Zhao Zilong", and that no record has a title field storing "filmnameduid:Zhao Zilong", the response result 605 will be "rolenameguid".

In this embodiment, the retrieval interface unit 628 obtains instruction data from the instruction data storage device 627 through the search engine 626, and outputs, in order, the instruction data of the fully matched records and of the partially matched records for the keyword 604 as the response result 605 to the knowledge-assisted understanding module 622, where fully matched records take priority over partially matched records. The knowledge-assisted understanding module 622 then compares the stored intent data 623 with the response result 605 and sends the resulting determined intent grammar data 606 to the analysis result output module 629. For example, after comparing the response result 605 with the possible intent grammar data 603, the module determines that the user's intent is "＜roleselect＞,＜rolename＞=Zhao Zilong".
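The disambiguation flow of FIG. 6 can be sketched in Python under a few stated assumptions. The `search` and `resolve` functions, the list-of-strings record layout and the `INSTRUCTION_TO_INTENT` mapping are all hypothetical; the description only states that fully matched records take priority over partially matched ones and that the response result is compared with the stored intent data.

```python
from typing import List, Optional, Tuple

# Title fields of the records in the structured database (example from the description).
RECORDS = ["rolenameguid:Zhao Zilong"]

# Possible intent grammar data: (intent, slot, keyword), as in Table 1.
CANDIDATES = [
    ("roleselect", "rolename", "Zilong"),
    ("watchfilm", "filmname", "Zilong"),
]

# Hypothetical mapping from instruction data to the intent it confirms.
INSTRUCTION_TO_INTENT = {"rolenameguid": "roleselect", "filmnameduid": "watchfilm"}


def search(keyword: str, records: List[str]) -> List[Tuple[str, str]]:
    """Return (instruction data, value) pairs, full matches before partial matches."""
    full, partial = [], []
    for title in records:
        instruction, _, value = title.partition(":")
        if value == keyword:
            full.append((instruction, value))
        elif keyword in value:
            partial.append((instruction, value))
    return full + partial


def resolve(candidates, records) -> Optional[Tuple[str, str, str]]:
    """Pick the candidate whose intent is confirmed by a matching record."""
    for intent, slot, keyword in candidates:
        for instruction, value in search(keyword, records):
            if INSTRUCTION_TO_INTENT.get(instruction) == intent:
                return (intent, slot, value)
    return None


print(resolve(CANDIDATES, RECORDS))
```

In this sketch the keyword "Zilong" only partially matches the stored role name, yet the film candidate is still ruled out because no film record matches at all.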

However, in another embodiment of the present invention, with reference to Table 2 below, each record stored in the structured database 625 may further include, for example, a popularity field, a like field or a dislike field. Assume that the possible intent grammar data 603 contains two entries whose keywords are homophones (for example, "＜roleselect＞,＜rolename＞=Zilong (子龍)" and "＜roleselect＞,＜rolename＞=Zilong (紫龍)"). When the search engine 626 of the retrieval system 624 performs the full-text search and finds that two records match (assuming that the structured database 625 stores two records whose title fields contain "rolenameguid:Zhao Zilong (趙子龍)" and "rolenameguid:Zilong (紫龍)" respectively), the search engine 626 may further evaluate the popularity field, the like field and the dislike field of the two records.

The search engine 626 may, for example, determine the semantic analysis result 103 according to the popularity field: the popularity value of "Zhao Zilong" (8) is higher than that of "Zilong (紫龍)" (2), so the search engine 626 takes "Zhao Zilong" as the semantic analysis result 103. Alternatively, the search engine 626 may decide according to the like field: the like value of "Zhao Zilong" (20) is higher than that of "Zilong (紫龍)" (5), so "Zhao Zilong" is taken as the semantic analysis result 103. Alternatively, the search engine 626 may decide according to the dislike field: the dislike value of "Zhao Zilong" (1) is lower than that of "Zilong (紫龍)" (20), so "Zhao Zilong" is again taken as the semantic analysis result 103. In yet another embodiment, the search engine 626 may combine at least one of the popularity field, the like field and the dislike field rather than rely on a single criterion. For example, if the popularity values of the two records are equal, the search engine 626 further compares the like values, or it compares the sums of the popularity and like values.

Table 2
Record | Title field | Content field | Popularity field | Like field | Dislike field
1 | rolenameguid:Zhao Zilong (趙子龍) | Character selection | 8 | 20 | 1
2 | rolenameguid:Zilong (紫龍) | Character selection | 2 | 5 | 20
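The ranking criteria of this embodiment can be written out as a small Python sketch using the values of Table 2. The `Record` class and the function names are assumptions made for illustration; the description does not state how the search engine 626 implements the comparison.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Record:
    title: str
    content: str
    popularity: int
    like: int
    dislike: int


# The two records of Table 2.
TABLE_2 = [
    Record("rolenameguid:趙子龍", "Character selection", 8, 20, 1),
    Record("rolenameguid:紫龍", "Character selection", 2, 5, 20),
]


def pick_by_popularity(records: List[Record]) -> Record:
    return max(records, key=lambda r: r.popularity)


def pick_by_like(records: List[Record]) -> Record:
    return max(records, key=lambda r: r.like)


def pick_by_dislike(records: List[Record]) -> Record:
    # A lower dislike value is better.
    return min(records, key=lambda r: r.dislike)


def pick_popularity_then_like(records: List[Record]) -> Record:
    # Popularity first; the like value breaks ties.
    return max(records, key=lambda r: (r.popularity, r.like))


def pick_by_sum(records: List[Record]) -> Record:
    # Combined criterion: sum of the popularity and like values.
    return max(records, key=lambda r: r.popularity + r.like)


print(pick_by_popularity(TABLE_2).title)
```

With the values of Table 2, every criterion selects the first record, which matches the outcome described for each of the three fields.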

The analysis result output module 629 can therefore output, according to the determined intent grammar data 606, a semantic analysis result 103 that names a specific intended object. Because the natural language understanding system 620 can judge full matches and partial matches after a full-text search for the keyword 604, it outputs an appropriate semantic analysis result 103. For example, from the received determined intent grammar data 606 "＜roleselect＞,＜rolename＞=Zhao Zilong" it confirms that the user wants to select Zhao Yun, and therefore outputs the semantic analysis result 103 "Zhao Yun" to the command generation system 130. In some embodiments of the present invention, the user can thus provide more colloquial or more flexible speech input, and a speech recognition system equipped with the natural language understanding system 620 of this embodiment can effectively and accurately return the corresponding control command to the application system, providing an effective voice selection function.

In summary, the speech recognition system, the command generation system and the speech recognition method of the present invention provide the speech recognition function through a separate system located outside the application system, and return the corresponding control command to the application system. They can also effectively recognize colloquial speech input provided by the user. The present invention therefore reduces the system resources that the application system needs for speech recognition, and realizes a convenient and flexible voice selection function.

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Anyone with ordinary knowledge in the technical field may make minor changes and modifications without departing from the spirit and scope of the present invention. The scope of protection of the present invention is therefore defined by the appended claims.

100: speech recognition system
101: speech input
102: speech information
103: semantic analysis result
104: interface content
105: control command
110: speech recognition module
120, 620: natural language understanding system
130: command generation system
131: comparison module
132: command confirmation module
133: temporary storage device
134: access module
135: item acquisition module
140: storage device
200: application system
210: speech receiving module
220: command execution module
301: interface number
302, 307: command format
303: interface content
304: selection item
305: comparison result
306: control command
511, 521, 531: interface name
512~514, 522~524, 532~534: selection items
603: possible intent grammar data
604: keyword
605: response result
606: determined intent grammar data
621: natural language processor
622: knowledge-assisted understanding module
623: intent data
624: retrieval system
625: structured database
626: search engine
627: instruction data storage device
628: retrieval interface unit
629: analysis result output module
S210~S240, S410~S450: steps

FIG. 1 is a schematic diagram of a speech recognition system according to an embodiment of the present invention. FIG. 2 is a flowchart of a speech recognition method according to an embodiment of the present invention. FIG. 3 is a schematic diagram of a command generation system according to an embodiment of the present invention. FIG. 4 is a flowchart of a speech recognition method according to another embodiment of the present invention. FIG. 5 is a schematic diagram of a user interface of an application system according to an embodiment of the present invention. FIG. 6 is a schematic diagram of a natural speech understanding system according to an embodiment of the present invention.


Claims

A voice recognition system, adapted to communicate with an application system, and the application system is used to receive a voice input, wherein the voice recognition system includes: a voice recognition module, used to receive the voice input provided by the application system , and recognize the speech input to generate speech information; a natural speech understanding system, coupled to the speech recognition module, and used to understand the speech information, to generate a semantic analysis result; and an instruction generation system, coupled to The natural speech understanding system is used to use the semantic analysis result to compare a selection item in an interface content of a current user interface displayed by the application system, and output a control command according to a comparison result to the application system.

The speech recognition system as described in claim 1, wherein the instruction generation system includes: a comparison module, used to receive the semantic analysis result, and use the semantic analysis result to compare the interface in the current user interface The selection item in the content is used to generate the comparison result; and an instruction confirmation module is coupled to the comparison module and used for converting the comparison result according to an instruction format to output the control instruction.

The speech recognition system as described in claim 2, wherein the interface content of the current user interface includes an item name of the selection item, and an item label and a plurality of reference keywords corresponding to the item name, and the comparison module compares whether the semantic analysis result matches one of the item name and the reference keywords to generate the comparison result.

The speech recognition system as described in claim 3, wherein the command generation system further includes: a temporary storage device for receiving an interface number of the current user interface provided by the application system; an access module coupled to connected to the temporary storage device, and used to generate the interface content of the current user interface according to the interface number; and an item acquisition module, coupled to the access module and the comparison module, and used to obtain from Obtain the item name of the selected item, the item label corresponding to the item name, and the reference keywords from the interface content of the current user interface, so as to output the item name of the selected item and the item name corresponding to the item name The item label and the reference keywords are sent to the comparison module.

The speech recognition system according to claim 4, wherein the temporary storage device is further coupled to the command confirmation module, and the command confirmation module outputs the control command to the application system through the temporary storage device.

The speech recognition system as described in claim 4, further comprising: a storage device coupled to the access module of the command generation system, wherein the access module is used to access the storage device to obtain the interface content of the current user interface.

The speech recognition system according to claim 6, wherein the interface content of the current user interface is pre-written into the storage device through the temporary storage device and the access module.

The speech recognition system as described in claim 6, wherein the instruction format is written into the storage device through the temporary storage device and the access module in advance, and the instruction confirmation module obtains the instruction through the access module Format.

The speech recognition system as described in claim 4, wherein the application system is used to display the current user interface, and when the application system receives the control command output by the speech recognition system, the application system selects, according to the control command, the selection item in the interface content of the current user interface, and changes to display a next user interface or performs a specific operation.

The speech recognition system as described in claim 9, wherein when the application system changes to display the next user interface, the application system outputs the next interface number of the next user interface to the command generation system, so that the The access module of the command generation system acquires the next interface content of the next user interface through the storage device, and provides it to the item acquisition module.

The speech recognition system as described in claim 1, wherein the natural speech understanding system includes: a natural language processor, coupled to the speech recognition module, and used to receive the speech information to generate possible intent grammar data; a knowledge-assisted understanding module, coupled to the natural language processor, and storing intent data of the possible intent grammar data; a retrieval system, coupled to the knowledge-assisted understanding module, and used to receive a keyword of the possible intent grammar data provided by the knowledge-assisted understanding module, so as to generate a response result to the knowledge-assisted understanding module according to the keyword, so that the knowledge-assisted understanding module generates determined intent grammar data according to the response result; and an analysis result output module, used to output the semantic analysis result according to the determined intent grammar data.

The speech recognition system as described in claim 11, wherein the retrieval system includes: a retrieval interface unit coupled to the knowledge aided comprehension module; a search engine coupled to the retrieval interface unit; an instruction data storage device coupled to the search engine; and a structured database coupled to the search engine and storing a plurality of records, wherein the retrieval interface unit can search the records in the structured database through the search engine according to the keyword , and according to a search result, a corresponding instruction data is acquired through the instruction data storage device as the response result.

The speech recognition system as claimed in claim 12, wherein the search engine performs a full-text search on the records, and determines a domain attribute of the keyword according to a full-text search result.

The speech recognition system as described in claim 13, wherein the records each include at least one of a popularity field, a like field and a dislike field, and the search engine determines the search result by comparing the values of at least one of the popularity field, the like field and the dislike field of the respective records.

The speech recognition system as claimed in claim 12, wherein the knowledge aided comprehension module compares the response result and the intent data to generate the definite intent grammar data.

An instruction generation system is adapted to communicate with an application system, and the application system is used to receive a voice input, wherein the instruction generation system includes: a comparison module, used to receive a semantic analysis corresponding to the voice input result, and use the semantic analysis result to compare a selected item in an interface content of a current user interface displayed by the application system to generate a comparison result; and an instruction confirmation module, coupled to the The comparison module is used for converting the comparison result according to an instruction format, and outputting a control instruction to the application system.

The command generation system as described in claim 16, wherein the interface content of the current user interface includes an item name of the selection item, and an item label and a plurality of reference keywords corresponding to the item name, and the comparison module compares whether the semantic analysis result matches one of the item name and the reference keywords to generate the comparison result.

The command generation system as described in claim 17 further includes: a temporary storage device for receiving an interface number of the current user interface provided by the application system; An access module, coupled to the temporary storage device, and used to generate the interface content of the current user interface according to the interface number; and an item acquisition module, coupled to the access module and the comparison module, and used to obtain the item name of the selected item, the item label corresponding to the item name, and the reference keywords from the interface content of the current user interface, so as to output the item of the selected item name and the item label corresponding to the item name and the reference keywords to the comparison module.

The command generation system as claimed in claim 18, wherein the temporary storage device is further coupled to the command confirmation module, and the command confirmation module outputs the control command to the application system through the temporary storage device.

The command generation system as described in claim 18, wherein a storage device is coupled to the access module of the command generation system, and the access module is used to access the storage device to obtain the interface content of the current user interface.

The command generating system as described in claim 20, wherein the interface content of the current user interface is pre-written into the storage device through the temporary storage device and the access module.

The command generating system as described in claim 20, wherein the command format is written into the storage device in advance through the temporary storage device and the access module, and the command confirmation module obtains the command format through the access module.

The command generation system as described in claim 18, wherein the application system is used to display the current user interface, and when the application system receives the control command output by the command generation system, the application system selects the selected item in the interface content of the current user interface according to the control command, and changes to display a next user interface or performs a specific operation.

The command generation system as described in claim 23, wherein when the application system changes to display the next user interface, the application system outputs a next interface number of the next user interface to the command generation system, so that the access module of the command generation system acquires a next interface content of the next user interface through the storage device, and provides it to the item acquisition module.
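The lookup path described in the preceding system claims (an interface number is received, the access module fetches the interface content from a storage device, and the item acquisition module extracts item names, item labels, and reference keywords) might be sketched as follows. This is only an illustrative sketch: the class names, the in-memory dictionary standing in for the storage device, and all sample data are assumptions, not details given in the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str            # item name shown in the user interface
    label: str           # item label corresponding to the item name
    keywords: tuple = () # reference keywords for the item


# Hypothetical storage device: interface number -> interface content.
STORAGE = {
    1: [Item("Settings", "item_01", ("options", "preferences")),
        Item("Play", "item_02", ("start", "begin"))],
    2: [Item("Volume", "item_01", ("sound", "audio level"))],
}


class AccessModule:
    """Fetches interface content from the storage device by interface number."""

    def __init__(self, storage):
        self.storage = storage

    def interface_content(self, interface_number):
        # An unknown interface number yields empty content (an assumption).
        return self.storage.get(interface_number, [])


class ItemAcquisitionModule:
    """Extracts (item name, item label, reference keywords) for each item."""

    def __init__(self, access):
        self.access = access

    def items(self, interface_number):
        content = self.access.interface_content(interface_number)
        return [(item.name, item.label, item.keywords) for item in content]
```

When the application system switches to a next user interface, it would pass the next interface number to `items`, and the same lookup would return that interface's selectable items.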

A speech recognition method suitable for a speech recognition system, wherein the speech recognition system communicates with an application system, and the application system is used to receive a speech input, wherein the speech recognition method includes: receiving the speech input; recognizing the speech input to generate speech information; understanding the speech information to generate a semantic analysis result; and using the semantic analysis result to compare a selected item in an interface content of a current user interface displayed by the application system, and outputting a control command to the application system according to a comparison result.

The speech recognition method according to claim 25, wherein the step of outputting the control command to the application system includes: converting the comparison result according to an instruction format, and outputting the control command.

The speech recognition method as described in claim 26, wherein the interface content of the current user interface includes an item name of the selected item, an item label corresponding to the item name, and a plurality of reference keywords, and the step of using the semantic analysis result to compare the selected item of the current user interface to generate the comparison result includes: comparing whether the semantic analysis result matches one of the item name and the reference keywords to generate the comparison result.

The speech recognition method as described in claim 27, further comprising: receiving an interface number of the current user interface provided by the application system; generating the interface content of the current user interface according to the interface number; and obtaining the item name of the selected item, the item label corresponding to the item name, and the reference keywords from the interface content of the current user interface.

The speech recognition method as described in claim 28, wherein the application system is used to display the current user interface, and when the application system receives the control command output by the speech recognition system, the application system selects the selected item in the interface content of the current user interface according to the control command, and changes to display a next user interface or performs a specific operation.

The speech recognition method as described in claim 25, wherein the step of understanding the speech information to generate the semantic analysis result includes: generating possible intention grammar data according to the speech information; generating a response result according to a keyword of the possible intention grammar data; generating definite intention grammar data according to the response result; and outputting the semantic analysis result according to the definite intention grammar data.

The speech recognition method as described in claim 30, wherein the step of generating the response result according to the keyword includes: searching a plurality of records according to the keyword, and obtaining corresponding instruction data according to a search result as the response result.

The speech recognition method as described in claim 31, wherein the step of searching the records according to the keyword includes: performing a full-text search on the records, and determining a field attribute of the keyword according to a full-text search result.

The speech recognition method as described in claim 32, wherein the step of searching the records according to the keyword further includes: comparing at least one of the values to determine the search result.

The speech recognition method as claimed in claim 31, wherein the step of outputting the semantic analysis result according to the definite intention grammar data includes: comparing the response result and the possible intention grammar data to generate the definite intention grammar data.
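The understanding steps recited in the method claims above (possible intention grammar data, a keyword search over records, a field attribute determined from a full-text search, and definite intention grammar data) could be illustrated roughly as below. The claims do not define data shapes or matching rules, so the record fields (`song`, `artist`), the toy "play X" grammar, and the substring match are all assumptions made for the sake of a runnable sketch.

```python
# Hypothetical records; the field names and values are illustrative only.
RECORDS = [
    {"song": "Yesterday", "artist": "The Beatles"},
    {"song": "Imagine", "artist": "John Lennon"},
]


def possible_intent(speech_info):
    """Build possible intention grammar data that carries a keyword.

    Toy rule (an assumption): "play X" gives intent "play" and keyword X.
    """
    words = speech_info.strip().split(maxsplit=1)
    if len(words) == 2 and words[0].lower() == "play":
        return {"intent": "play", "keyword": words[1]}
    return {"intent": "unknown", "keyword": speech_info.strip()}


def search_records(keyword, records):
    """Full-text search; the field that matches is the keyword's field attribute."""
    needle = keyword.lower()
    for record in records:
        for field_name, value in record.items():
            if needle in value.lower():
                return {"field": field_name, "record": record}
    return None


def understand(speech_info, records=RECORDS):
    """Return a semantic analysis result, or None if no record matches."""
    possible = possible_intent(speech_info)
    response = search_records(possible["keyword"], records)
    if response is None:
        return None
    # Definite intention: the possible intention confirmed by the response.
    return {
        "intent": possible["intent"],
        "field": response["field"],
        "target": response["record"][response["field"]],
    }
```

Here the search result resolves the ambiguity in the spoken keyword: the same "play ..." utterance becomes a request for a song or for an artist depending on which field the keyword was found in.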

A speech recognition method suitable for an instruction generating system, wherein the instruction generating system is adapted to communicate with an application system, and the application system is used to receive a voice input, wherein the speech recognition method includes: receiving a semantic analysis result corresponding to the voice input; using the semantic analysis result to compare a selected item in an interface content of a current user interface displayed by the application system to generate a comparison result; and converting the comparison result according to an instruction format, and outputting a control instruction to the application system.

The speech recognition method as described in claim 35, wherein the interface content of the current user interface includes an item name of the selected item and a plurality of reference keywords corresponding to the item name, and the step of using the semantic analysis result to compare the selected item in the current user interface to generate the comparison result includes: comparing whether the semantic analysis result matches one of the item name and the reference keywords to generate the comparison result.

The speech recognition method as described in claim 35, further comprising: receiving an interface number of the current user interface provided by the application system; generating the interface content of the current user interface according to the interface number; and obtaining the selected item from the interface content of the current user interface.

The speech recognition method as described in claim 37, wherein the application system is used to display the current user interface, and when the application system receives the control instruction output by the instruction generation system, the application system selects the selected item in the interface content of the current user interface according to the control instruction, and changes to display a next user interface or performs a specific operation.
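As a rough illustration of the comparison and conversion steps recited for the instruction generating system, the sketch below matches a semantic analysis result against each item's name and reference keywords and then renders the match through an instruction format. The claims do not specify the matching rule or the instruction format, so the case-insensitive equality test, the `SELECT {item_label}` template, and the function names are hypothetical.

```python
def compare(semantic_result, items):
    """Return the first item whose name or reference keyword matches, else None.

    `items` is a list of (item_name, item_label, reference_keywords) tuples.
    Case-insensitive equality is an assumed matching rule.
    """
    spoken = semantic_result.strip().lower()
    for name, label, keywords in items:
        candidates = [name, *keywords]
        if any(spoken == candidate.lower() for candidate in candidates):
            return {"item_name": name, "item_label": label}
    return None


# Hypothetical instruction format; the source does not define one.
INSTRUCTION_FORMAT = "SELECT {item_label}"


def to_control_instruction(comparison_result, instruction_format=INSTRUCTION_FORMAT):
    """Convert a comparison result into a control instruction string."""
    if comparison_result is None:
        return None  # nothing matched, so no instruction is issued
    return instruction_format.format(**comparison_result)
```

Because only the items of the current user interface are compared, the candidate set stays small, which is consistent with the stated aim of limiting the resources spent on recognition.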