TWI780502B - Speech recognition system, command generation system, and speech recognition method thereof - Google Patents
- Publication number
- TWI780502B
- Authority
- TW
- Taiwan
- Prior art keywords
- user interface
- speech recognition
- module
- interface
- application system
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/54—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- User Interface Of Digital Computer (AREA)
- Telephonic Communication Services (AREA)
- Selective Calling Equipment (AREA)
Abstract
Description
The present invention relates to speech recognition technology, and in particular to a speech recognition system, a command generation system, and a speech recognition method thereof.
As speech recognition technology has evolved, application systems of all kinds have begun to incorporate speech recognition to make them more convenient to operate. For virtual reality (VR) and augmented reality (AR) systems in particular, a speech recognition function can greatly improve ease of operation and the user experience. However, speech recognition often consumes a large amount of system resources, which increases the cost of building the system and may even slow it down.
Another problem is that conventional speech recognition is implemented by compiling an automatic speech recognition grammar (ASR grammar). The user must therefore compile every phrasing and every piece of content available for voice selection into the ASR grammar, and only text that fully conforms to the grammar can be matched. In other words, conventional speech recognition methods impose a heavy workload on system developers and are not flexible enough in use. Moreover, implementing speech recognition in a VR or AR system may also require changing the internal system settings of the application system, which increases the complexity and cost of setup.
The present invention provides a speech recognition system, a command generation system, and a speech recognition method thereof that can provide a convenient speech recognition function.
The speech recognition system of the present invention is adapted to communicate with an application system. The application system is configured to receive a speech input. The speech recognition system includes a speech recognition module, a natural language understanding system, and a command generation system. The speech recognition module receives the speech input provided by the application system and recognizes it to generate speech information. The natural language understanding system is coupled to the speech recognition module and interprets the speech information to generate a semantic analysis result. The command generation system is coupled to the natural language understanding system; it uses the semantic analysis result to compare against a selection item in the interface content of a current user interface, and outputs a control command to the application system according to the comparison result.
In an embodiment of the present invention, the command generation system includes a comparison module and a command confirmation module. The comparison module receives the semantic analysis result and uses it to compare against the selection item in the interface content of the current user interface to generate the comparison result. The command confirmation module is coupled to the comparison module and converts the comparison result according to a command format to output the control command.
In an embodiment of the present invention, the application system displays the current user interface. After the application system receives the control command output by the speech recognition system, it selects the selection item in the interface content of the current user interface according to the control command, and either switches to display a next user interface or performs a specific operation.
In an embodiment of the present invention, the natural language understanding system includes a natural language processor, a knowledge-assisted understanding module, a retrieval system, and an analysis result output module. The natural language processor is coupled to the speech recognition module and receives the speech information to generate possible-intent grammar data. The knowledge-assisted understanding module is coupled to the natural language processor and stores the intent data of the possible-intent grammar data. The retrieval system is coupled to the knowledge-assisted understanding module; it receives a keyword of the possible-intent grammar data from the knowledge-assisted understanding module and returns a response result generated from that keyword, so that the knowledge-assisted understanding module can generate determined-intent grammar data according to the response result. The analysis result output module is coupled to the knowledge-assisted understanding module and the command generation system, and outputs the semantic analysis result according to the determined-intent grammar data.
The command generation system of the present invention is adapted to communicate with an application system, which is configured to receive a speech input. The command generation system includes a comparison module and a command confirmation module. The comparison module receives a semantic analysis result corresponding to the speech input and uses it to compare against a selection item in the interface content of a current user interface to generate a comparison result. The command confirmation module is coupled to the comparison module; it converts the comparison result according to a command format and outputs a control command to the application system.
A speech recognition method of the present invention is suitable for a speech recognition system. The speech recognition system communicates with an application system, and the application system is configured to receive a speech input. The method includes the following steps: receiving the speech input provided by the application system; recognizing the speech input to generate speech information; interpreting the speech information to generate a semantic analysis result; and comparing the semantic analysis result to output a control command to the application system.
Another speech recognition method of the present invention is suitable for a command generation system. The command generation system is adapted to communicate with an application system, and the application system is configured to receive a speech input. The method includes the following steps: receiving a semantic analysis result corresponding to the speech input; using the semantic analysis result to compare against a selection item in the interface content of a current user interface to generate a comparison result; and confirming the comparison result according to a command format to output a control command to the application system.
Based on the above, the speech recognition system, command generation system, and speech recognition method of the present invention can recognize the speech input provided by the application system and return the corresponding control command to the application system, so that the application system can perform the corresponding operation according to the control command. Therefore, in addition to providing a convenient speech recognition function, the present invention can reduce the system resources that the application system needs for speech recognition.
To make the above features and advantages of the present invention easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
To make the content of the present invention easier to understand, the following embodiments are given as examples by which the present invention can actually be implemented. Wherever possible, elements, components, and steps with the same reference numerals in the drawings and embodiments denote the same or similar parts.
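FIG. 1 is a schematic diagram of a speech recognition system according to an embodiment of the present invention. Referring to FIG. 1, the speech recognition system 100 is adapted to communicate with the application system 200, and the two may communicate in a wired or wireless manner. In this embodiment, the application system 200 includes a speech receiving module 210 and a command execution module 220. The application system 200 receives a speech input 101 provided by a user through the speech receiving module 210 and transmits the speech input 101 to the speech recognition system 100. In this embodiment, the speech recognition system 100 can perform speech recognition on the speech input 101 provided by the application system 200 to generate a corresponding command, and the speech recognition system 100 returns that command to the command execution module 220 of the application system 200 so that the command execution module 220 performs the related operation of the application system 200. In other words, the speech recognition system 100 of this embodiment can be paired with any application system to provide a speech recognition function.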
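In this embodiment, the application system 200 may be, for example, a game program or application that runs on or is installed in a virtual reality (VR) device or an augmented reality (AR) device, and the user can control related operations in the game program by voice. The VR or AR device may include hardware such as a processing circuit, a memory, and a speech sensing device, so that the processing circuit can execute or access the relevant modules or programs in the memory to implement at least the speech receiving function, the command execution function, and the application execution function of the present invention. In this embodiment, the speech recognition system 100 may be deployed, for example, on a cloud server or a local host device to provide speech recognition and processing. The speech recognition system 100 may also include another processing circuit and another memory, so that the other processing circuit can execute or access the relevant modules or programs in the other memory to implement at least the speech recognition function of the present invention.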
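In this embodiment, the speech recognition system 100 includes a speech recognition module 110, a natural language understanding system 120, a command generation system 130, and a storage device 140. The speech recognition module 110 is coupled to the speech receiving module 210 of the application system 200 and to the natural language understanding system 120. The command generation system 130 is coupled to the natural language understanding system 120 and the storage device 140. FIG. 2 is a flowchart of a speech recognition method according to an embodiment of the present invention. Following the method of FIG. 2, the speech recognition system 100 of FIG. 1 can perform steps S210 to S240 of FIG. 2 to implement the speech recognition function. In step S210, the speech recognition module 110 of the speech recognition system 100 receives the speech input 101 provided by the application system 200. In step S220, the speech recognition module 110 recognizes the speech input 101 to generate speech information 102. In this embodiment, the speech recognition module 110 can convert the signal of the speech input 101 into speech information 102 (that is, data) that a computer can process and analyze.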
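In step S230, the natural language understanding system 120 receives the speech information 102 output by the speech recognition module 110 and interprets it to generate a semantic analysis result 103. In step S240, the command generation system 130 receives the semantic analysis result 103 output by the natural language understanding system 120 and compares it to output a control command 105 to the application system 200. In this embodiment, the storage device 140 can provide the interface content 104 of the user interface currently displayed by the application system 200 to the command generation system 130, so that the command generation system 130 can compare the semantic analysis result 103 with the interface content 104 of the current user interface and generate the control command 105 for the command execution module 220 of the application system 200. In this embodiment, the command execution module 220 can, according to the control command 105, cause the application system 200 to display the next user interface or perform a specific operation. The storage device 140 may be any type of memory in a server or computer system, such as dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, or read-only memory (ROM). The present invention is not limited in this respect, and a person skilled in the art may choose according to actual needs.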
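In this embodiment, the natural language understanding system 120 may, for example, convert the speech information 102 into text information and normalize the text information to produce a semantic analysis result 103 that has an intent object. The command generation system 130 can then generate a control command 105 corresponding to the intent object and provide it to the application system 200, so that the application system 200 can execute the control command 105 to display the next user interface or perform a specific operation. The speech recognition method of this embodiment therefore spares the application system 200 from spending additional system resources on speech recognition, effectively saving the resources it would otherwise need to recognize the user's speech input.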
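The flow of steps S210 to S240 can be pictured as three chained stages that live outside the application system. The following Python sketch is only an illustration under stated assumptions: the class name `SpeechRecognitionSystem`, the three injected callables, and the `"select:1"` command string are hypothetical and do not come from the patent, and the lambdas are toy stand-ins for real recognition and understanding components.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SpeechRecognitionSystem:
    """Hypothetical wrapper that chains the three stages described in the text."""
    recognize: Callable[[bytes], str]                 # S220: speech input -> speech information
    understand: Callable[[str], str]                  # S230: speech information -> semantic result
    generate_command: Callable[[str], Optional[str]]  # S240: semantic result -> control command

    def handle(self, speech_input: bytes) -> Optional[str]:
        # S210: the speech input arrives from the application system.
        speech_info = self.recognize(speech_input)
        semantic_result = self.understand(speech_info)
        # The returned command (or None) is what goes back to the application.
        return self.generate_command(semantic_result)


# Toy stand-ins so the sketch runs; real stages would be far more involved.
system = SpeechRecognitionSystem(
    recognize=lambda audio: audio.decode("utf-8"),
    understand=lambda text: text.strip().lower(),
    generate_command=lambda intent: "select:1" if intent == "role selection" else None,
)
```

The application system in this picture only sends audio and receives a command, which is the resource-saving point the paragraph above makes.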
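It is worth noting that the semantic analysis result 103 output by the natural language understanding system 120 may include one or more pieces of possible semantic data, and the semantic data may include keywords and intent data. In other words, the user can express a selection intent colloquially, for example by the full name, abbreviation, or alias of a selection item, and the speech recognition system 100 of this embodiment will generate the corresponding control command without the user having to say a complete, specific name. How the natural language understanding system 120 generates the semantic analysis result 103 is illustrated below with the embodiment of FIG. 6.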
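FIG. 3 is a schematic diagram of a command generation system according to an embodiment of the present invention. FIG. 4 is a flowchart of a speech recognition method according to another embodiment of the present invention. Referring to FIG. 1, FIG. 3, and FIG. 4, the command generation system 130 of FIG. 1 is an application system interface; by editing the command generation system 130, the user can adapt the speech recognition system 100 to the corresponding application system 200. The command generation system 130 may have the system architecture shown in FIG. 3. In this embodiment, the command generation system 130 includes a comparison module 131, a command confirmation module 132, a temporary storage device 133, an access module 134, and an item acquisition module 135. The comparison module 131 is coupled to the command confirmation module 132 and the item acquisition module 135. The temporary storage device 133 is coupled to the command confirmation module 132 and the access module 134. The access module 134 is coupled to the item acquisition module 135 and the storage device 140. In this embodiment, the temporary storage device 133 may be, for example, dynamic random access memory (DRAM), static random access memory (SRAM), flash memory, or read-only memory (ROM). The present invention is not limited in this respect, and a person skilled in the art may choose according to actual needs.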
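Following the speech recognition method of FIG. 4, the command generation system 130 of FIG. 3 can perform steps S410 to S450 of FIG. 4 to implement speech recognition and command generation. In step S410, the temporary storage device 133 receives the interface number 301 of the current user interface from the application system 200. In step S420, the access module 134 produces the interface content 303 of the current user interface according to the interface number 301. In this embodiment, the access module 134 can use the interface number 301 to access interface data preloaded in the storage device 140 and so obtain the interface content 303 of the current user interface displayed by the application system 200.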
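In step S430, the item acquisition module 135 receives the interface content 303 of the current user interface from the access module 134, obtains the selection items 304 from that interface content, and outputs the selection items 304 to the comparison module 131. In step S440, the comparison module 131 uses the semantic analysis result 103 to compare against the selection items 304 of the current user interface to generate a comparison result 305. It is worth noting that a selection item 304 may include an item name and multiple reference keywords corresponding to the item name. That is, the item acquisition module 135 can extract from the interface content 303 the item name of each selection item 304 and the reference keywords corresponding to that item name. The comparison module 131 can then check whether the semantic analysis result 103 matches the item name or any one of the reference keywords to generate the comparison result 305. In other words, as long as the semantic analysis result 103 that the natural language understanding system 120 produces from the user's speech input matches the item name or one of the reference keywords, the comparison module 131 can output, for example, a comparison result 305 containing the corresponding item number. The reference keywords may be, for example, abbreviations or aliases of the item name.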
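In step S450, the command confirmation module 132 converts the comparison result 305 according to the command format 307 and outputs, for example, a control command 306 containing the corresponding item number. In this embodiment, the command format 307 is the form of command that the application system 200 can accept, and the command confirmation module 132 outputs the control command 306 to the application system 200 through the temporary storage device 133. Thus, while the application system 200 is displaying the current user interface, once it receives the control command 306 output by the speech recognition system 100, it can select the selection item in the interface content 303 of the current user interface according to the control command 306 and, for example, use the item number obtained above to switch to the next user interface or perform a specific operation. In this way, the speech recognition method of this embodiment allows the command generation system 130 to recognize the user's speech input effectively and generate the corresponding control command.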
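Step S450 amounts to filling an application-specific template with the matched item number. The sketch below is an assumption-laden illustration: the patent does not specify what a command format looks like, so the JSON-like template `GAME_FORMAT`, the placeholder name `item_id`, and the function `confirm_command` are all hypothetical.

```python
from string import Template
from typing import Optional


def confirm_command(item_id: Optional[int], command_format: Template) -> Optional[str]:
    """Convert a comparison result (an item number) into a control command.

    Returns None when the comparison found no matching selection item.
    """
    if item_id is None:
        return None
    # The command format is whatever form the application system accepts;
    # here it is modelled as a template with one $item_id placeholder.
    return command_format.substitute(item_id=item_id)


# Hypothetical command format for a game that accepts JSON-like commands.
GAME_FORMAT = Template('{"action": "select", "item": $item_id}')
```

Keeping the format as data rather than code mirrors the text's point that one speech recognition system can serve different application systems by swapping the stored command format.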
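In addition, because the speech recognition system 100 can be applied to a variety of application systems, the user only needs to edit the speech recognition system 100 and does not need to modify the application system. For example, the speech recognition system 100 may first operate in an edit mode (or be edited through the software development kit (SDK) of the speech recognition system), in which the interface content 104 of the user interfaces displayed by the application system 200 and the command format 302 are written in advance into the storage device 140 through the temporary storage device 133 and the access module 134. Then, when the speech recognition system 100 operates in a working mode, the command generation system 130 can receive the interface number 301 of the current user interface through the temporary storage device 133, and the access module 134 can read the storage device 140 according to the interface number 301 to obtain the corresponding interface content 303. The command confirmation module 132 can obtain the command format 307 through the access module 134. In other words, the speech recognition system 100 of this embodiment can be paired with a variety of application systems to provide effective speech recognition and voice selection.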
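The split between edit mode and working mode can be sketched as a small store that is written once and then read by interface number. This is a minimal illustration only: the class `InterfaceStore`, its method names, and the interface number used in the usage note are hypothetical, and an in-memory dictionary stands in for the storage device 140.

```python
from typing import Dict, List, Optional


class InterfaceStore:
    """Hypothetical stand-in for the storage device plus access module."""

    def __init__(self) -> None:
        self._contents: Dict[int, List[dict]] = {}
        self._command_format: Optional[str] = None

    def write(self, interface_id: int, items: List[dict], command_format: str) -> None:
        # Edit mode: preload the interface content and the command format.
        self._contents[interface_id] = list(items)
        self._command_format = command_format

    def read(self, interface_id: int) -> List[dict]:
        # Working mode: look up the interface content by interface number.
        # An unknown interface number yields an empty list of items.
        return self._contents.get(interface_id, [])

    @property
    def command_format(self) -> Optional[str]:
        return self._command_format
```

In use, a developer would call `write(1, items, fmt)` once per user interface during editing; at run time the application system only has to report the interface number, and `read(1)` returns the items to compare against.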
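FIG. 5 is a schematic diagram of user interfaces of an application system according to an embodiment of the present invention. FIG. 5 shows examples of user interfaces that the application system 200 of FIG. 1 might display. Referring to FIG. 1, FIG. 3, and FIG. 5, the application system 200 may, for example, run a virtual reality game program. Taking a game program as an example, it should first be explained that the game developer can create one or more data sets for each user interface that may be displayed in the game, where each user interface may include one or more item names. For each item name, the game developer can create a data set that includes the item number, the column number and row number at which the item name is located on the interface, and multiple reference keywords corresponding to the item name. When the speech recognition system 100 is connected to the game program, the game program can input the data sets into the temporary storage device 133 of the command generation system 130, where they are stored. The access module 134 can then read the temporary storage device 133 and store the data sets in the storage device 140 of the speech recognition system 100.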
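Next, assume that the application system 200 first displays the user interface 510 of FIG. 5. The interface content of the user interface 510 includes an interface name 511 (Home) and multiple selection items 512 to 514. The interface content 303 that the access module 134 reads from the storage device 140 may include, for example, the data of the selection items 512 to 514. It is worth noting that the item acquisition module 135 can extract from the interface content of the user interface 510 the item names of the selection items 512 to 514 and the reference keywords corresponding to each item name, and output the item names and their reference keywords to the comparison module 131 for comparison. In this example, the comparison module 131 can obtain the following data content from the item acquisition module 135: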
{id=0; column=0; line=0; title="Instructions for use" (使用說明); alias="use" (使用), "first" (第一), "third from last" (倒數第三), "one" (一), "use" (用), "operation" (操作), …}
{id=1; column=0; line=1; title="Role selection" (角色選擇); alias="casting" (選角), "second" (第二), "second from last" (倒數第二), "role" (角色), "character" (人物), …}
{id=2; column=0; line=2; title="Level selection" (關卡選擇); alias="level" (關卡), "third" (第三), "last" (倒數第一), "battle" (戰鬥), "war" (打仗), …}
Here, "id" is the item number, "column" is the column number, "line" is the row number, "title" is the item name, and "alias" lists the reference keywords. In this example, when the user wants to select the selection item 513, the user can say, for example, the full name "Role selection" (角色選擇), the abbreviation "casting" (選角), or an ordinal such as "second" (第二) or "second from last" (倒數第二). Any of these lets the comparison module 131 match the selection item 513 and output the corresponding comparison result 305 to the command confirmation module 132, which then generates the corresponding control command 306 for the command execution module 220 of the application system 200. The application system 200 can then switch to display the next user interface 520.
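The comparison against item names and reference keywords can be sketched as a lookup over the data sets listed above. The following is a minimal sketch, assuming exact string equality as the matching rule; the names `SelectionItem`, `compare`, and `HOME_ITEMS` are hypothetical, while the titles and aliases are the original Chinese strings from the home-page example (the elided "…" entries are simply left out).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SelectionItem:
    item_id: int
    title: str                                         # item name
    aliases: List[str] = field(default_factory=list)   # reference keywords


def compare(semantic_result: str, items: List[SelectionItem]) -> Optional[int]:
    """Return the item number whose name or reference keyword matches, else None."""
    for item in items:
        if semantic_result == item.title or semantic_result in item.aliases:
            return item.item_id
    return None


# Home-page items from the example above (column/line fields omitted for brevity).
HOME_ITEMS = [
    SelectionItem(0, "使用說明", ["使用", "第一", "倒數第三", "一", "用", "操作"]),
    SelectionItem(1, "角色選擇", ["選角", "第二", "倒數第二", "角色", "人物"]),
    SelectionItem(2, "關卡選擇", ["關卡", "第三", "倒數第一", "戰鬥", "打仗"]),
]
```

With this data, the full name, the abbreviation 選角, and the ordinal 第二 all resolve to item number 1, which is the behaviour the paragraph describes for selection item 513.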
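Next, when the application system 200 displays the user interface 520, the interface content of the user interface 520 includes an interface name 521 (Role selection) and multiple selection items 522 to 524. In this example, the comparison module 131 can obtain the following data content from the item acquisition module 135: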
{id=3; column=0; line=0; title="Zhao Yun" (趙雲); alias="Zhao Zilong" (趙子龍), "first" (第一), "third from last" (倒數第三), "one" (一), "Zhao" (趙), "Zilong" (子龍), …}
{id=4; column=0; line=1; title="Guan Yu" (關羽); alias="Guan Yunchang" (關雲長), "second" (第二), "second from last" (倒數第二), "Guan" (關), "Yunchang" (雲長), …}
{id=5; column=0; line=2; title="Cao Cao" (曹操); alias="Cao Mengde" (曹孟德), "third" (第三), "last" (倒數第一), "three" (三), "Cao" (曹), "Mengde" (孟德), …}
In this example, the user can say, for example, the full name "Zhao Yun" (趙雲), the abbreviation "Zhao" (趙), the alias "Zhao Zilong" (趙子龍), or an ordinal such as "first" (第一) or "third from last" (倒數第三). Any of these lets the comparison module 131 determine that the user has chosen the selection item 522 and output the corresponding comparison result 305 to the command confirmation module 132, which then generates the corresponding control command 306 for the command execution module 220 of the application system 200. The application system 200 can then switch to display the next user interface 530.
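Next, when the application system 200 displays the user interface 530, the interface content of the user interface 530 includes an interface name 531 (Weapon selection) and multiple selection items 532 to 534. In this example, the comparison module 131 can obtain the following data content from the item acquisition module 135: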
{id=6; column=0; line=0; title="Qinghong Sword" (青虹劍); alias="sword" (劍), "Qinghong" (青虹), "first" (第一), "third from last" (倒數第三), "one" (一), …}
{id=7; column=0; line=1; title="Spear" (長槍); alias="spear" (槍), "second" (第二), "second from last" (倒數第二), "two" (二), …}
{id=8; column=0; line=2; title="Broadsword" (大刀); alias="blade" (刀), "third" (第三), "last" (倒數第一), "three" (三), …}
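In this example, the user can say, for example, the full name "Qinghong Sword" (青虹劍), the abbreviation "Qinghong" (青虹), or the ordinal "first" (第一). Any of these lets the comparison module 131 determine that the user has chosen the selection item 532 and output the corresponding comparison result 305 to the command confirmation module 132, which then generates the corresponding control command 306 for the command execution module 220 of the application system 200. The application system 200 can then perform the subsequent specific operation in the game program.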
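However, the speech input provided by the user is not limited to the full name, abbreviation, alias, or ordinal forms described above. In an embodiment, the comparison module 131 may instead extract ordinal information about the selection items of the current user interface directly from the semantic analysis result 103 (counted either from the start or from the end), extract a row number or column number of the selection items directly from the semantic analysis result 103, or perform pinyin matching directly on the semantic analysis result 103 to find selection items whose item name matches at its beginning, at its end, or as a substring. The comparison module 131 may also output a comparison result 305 covering multiple successfully matched selection items, so that the application system 200 can execute multiple control commands simultaneously or in sequence.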
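Two of these fallback strategies are easy to illustrate: resolving an ordinal counted from either end, and finding every item whose name contains the spoken fragment. The sketch below is a simplification under stated assumptions: the function names are hypothetical, the ordinal is assumed to have already been parsed into an integer, and plain character matching is used in place of the pinyin matching the text mentions.

```python
from typing import List, Optional


def ordinal_to_index(n: int, count: int, from_end: bool = False) -> Optional[int]:
    """Map a 1-based ordinal ("first", "second from last", ...) to a 0-based index."""
    if not 1 <= n <= count:
        return None
    return count - n if from_end else n - 1


def match_titles(fragment: str, titles: List[str]) -> List[int]:
    """Return the indices of all titles that contain the fragment.

    A substring test also covers matches at the beginning or end of a title.
    More than one index may be returned, which corresponds to a comparison
    result covering several selection items.
    """
    if not fragment:
        return []
    return [i for i, title in enumerate(titles) if fragment in title]
```

For a three-item menu, "third from last" resolves to index 0, which agrees with the alias 倒數第三 attached to the first item in each of the example data sets.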
FIG. 6 is a schematic diagram of a natural language understanding system according to an embodiment of the present invention. It should be noted that, in some embodiments, the natural language understanding system of the present invention may adopt, for example, the architecture of the natural language understanding system in Chinese invention patent CN103761242B, but the present invention is not limited thereto. In other embodiments, the natural language understanding system may adopt other system architectures capable of producing the semantic analysis results described in the embodiments of the present invention. Referring to FIG. 1 and FIG. 6, the natural language understanding system 620 of FIG. 6 is one implementation example of the natural language understanding system 120 of FIG. 1, but the natural language understanding system of the present invention is not limited thereto. In this embodiment, the natural language understanding system 620 includes a natural language processor 621, a knowledge-assisted understanding module 622, a retrieval system 624, and an analysis result output module 629. The knowledge-assisted understanding module 622 is coupled to the natural language processor 621 and the retrieval system 624. The knowledge-assisted understanding module 622 includes intent data 623. The retrieval system 624 includes a structured database 625, a search engine 626, an indication data storage device 627, and a retrieval interface unit 628, where the search engine 626 is coupled to the structured database 625, the indication data storage device 627, and the retrieval interface unit 628.
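In this embodiment, with reference to Table 1, when the natural language understanding system 620 receives the speech information 102 provided by the speech recognition module 110 of FIG. 1 (for example, request information produced when the user says "我要子龍" ("I want Zilong") while the user interface 520 of FIG. 5 is displayed), the natural language processor 621 can analyze the speech information 102 to generate possible-intent grammar data 603. The natural language processor 621 can send the possible-intent grammar data 603 to the knowledge-assisted understanding module 622, where the possible-intent grammar data 603 includes a keyword 604 and intent data 623. Because the keyword 604 in the possible-intent grammar data 603 (for example, "子龍") may belong to different domains (for example, the role selection domain (<roleselect>) and the film domain (<readfilm>)), one piece of speech information 102 can be analyzed into multiple pieces of possible-intent grammar data 603 (for example, "<roleselect>,<rolename>=子龍" or "<watchfilm>,<filmname>=子龍"). The knowledge-assisted understanding module 622 is therefore needed for further analysis to confirm the user's intent. In this embodiment, the knowledge-assisted understanding module 622 can extract the keyword 604 (for example, "子龍") from the possible-intent grammar data 603 and send it to the retrieval interface unit 628 of the retrieval system 624, which can search the structured database 625 through the search engine 626 to confirm whether a role name or film title "子龍" exists. In addition, the natural language processor 621 stores the intent data 623 in the knowledge-assisted understanding module 622.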
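In other words, in this embodiment, the natural language understanding system 620 can first extract the keyword 604 from the possible-intent grammar data 603, use the full-text search results from the structured database 625 to determine the domain to which the keyword 604 belongs, and then analyze further to confirm the user's actual intent. The user can therefore express an intent or information easily and colloquially without having to memorize specific terms, such as the terms in the fixed word lists used by existing approaches.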
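In this embodiment, the structured database 625 in the retrieval system 624 may store, for example, multiple records. The search engine 626 in the retrieval system 624 performs a full-text search of the structured database 625 according to the keyword 604 and, after the user's intent is confirmed, returns the response result 605 of the full-text search to the knowledge-assisted understanding module 622. For example, assume the structured database 625 stores a record whose title field contains "rolenameguid:趙子龍" and no record whose title field contains "filmnameduid:趙子龍"; the response result 605 will then be "rolenameguid".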
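In this embodiment, the retrieval interface unit 628 can obtain indication data from the indication data storage device 627 through the search engine 626. The retrieval interface unit 628 outputs, in order, the indication data of the records that fully match the keyword 604 and then of the records that partially match it, and sends them as the response result 605 to the knowledge-assisted understanding module 622; fully matching records take priority over partially matching records. The knowledge-assisted understanding module 622 can then compare the response result 605 against the stored intent data 623 and send the resulting determined-intent grammar data 606 to the analysis result output module 629. For example, after comparing the response result 605 with the possible-intent grammar data 603, it determines that the user's intent should be "<roleselect>,<rolename>=趙子龍".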
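The interplay between retrieval and intent determination can be sketched as follows. This is a hedged illustration, not the patented implementation: the functions `retrieve` and `determine_intent`, the tuple layout of a record, and the mapping from indication data to intent labels are hypothetical, and a substring test stands in for full-text search. The strings `rolenameguid`, `filmnameduid`, `<roleselect>`, and `<watchfilm>` are spelled as they appear in the text.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, str]  # (title text, indication data)


def retrieve(keyword: str, records: List[Record]) -> List[str]:
    """Return indication data of full matches first, then of partial matches."""
    full = [ind for title, ind in records if title == keyword]
    partial = [ind for title, ind in records if title != keyword and keyword in title]
    return full + partial


def determine_intent(
    keyword: str, possible_intents: Dict[str, str], records: List[Record]
) -> Optional[str]:
    """Pick the first possible intent that the response result supports."""
    for indication in retrieve(keyword, records):
        if indication in possible_intents:
            return possible_intents[indication]
    return None


# One role record exists; no film record does (as in the example in the text).
RECORDS: List[Record] = [("趙子龍", "rolenameguid")]
POSSIBLE_INTENTS = {"rolenameguid": "<roleselect>", "filmnameduid": "<watchfilm>"}
```

Because only a role record contains the keyword 子龍, `determine_intent("子龍", POSSIBLE_INTENTS, RECORDS)` settles on the role selection intent, mirroring the disambiguation described above.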
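However, in another embodiment of the present invention, with reference to Table 2, each record stored in the structured database 625 may further include information such as a popularity field, a preference field, or a dislike field. Assume the possible-intent grammar data 603 contains two entries (for example, "<roleselect>,<rolename>=子龍" or "<roleselect>,<rolename>=紫龍", which are homophones, both pronounced "Zilong"). After the search engine 626 of the retrieval system 624 performs the full-text search, if two records match the search (assume the structured database 625 stores two records whose title fields contain "rolenameguid:趙子龍" and "rolenameguid:紫龍" respectively), the search engine 626 can further examine the popularity field, preference field, and dislike field of the two records. For example, the search engine 626 can decide the semantic analysis result 103 by the popularity value: if the popularity of "趙子龍" (8) is higher than that of "紫龍" (2), the search engine 626 takes "趙子龍" as the semantic analysis result 103. Alternatively, it can decide by the preference value: if the preference value of "趙子龍" (20) is higher than that of "紫龍" (5), it takes "趙子龍" as the semantic analysis result 103. Or it can decide by the dislike value: if the dislike value of "趙子龍" (1) is lower than that of "紫龍" (20), it takes "趙子龍" as the semantic analysis result 103. In yet another embodiment, the search engine 626 may consider at least one of the popularity, preference, and dislike fields in combination rather than a single criterion; for example, if "趙子龍" and "紫龍" have the same popularity value, the search engine 626 further compares their preference values, or it adds the popularity and preference values together and compares the sums.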
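One way to combine the three fields is a lexicographic comparison: popularity first, preference as the tie-breaker, and lower dislike as the final tie-breaker. The sketch below shows that one choice only; the text equally allows a single criterion or a sum of fields. The names `Record` and `pick_record` are hypothetical, and the field values are the ones quoted in the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    title: str
    popularity: int
    preference: int
    dislike: int


def pick_record(records: List[Record]) -> Optional[Record]:
    """Choose among homophone matches.

    Higher popularity wins; ties fall back to higher preference, then to
    lower dislike. Returns None when there are no candidates.
    """
    return max(
        records,
        key=lambda r: (r.popularity, r.preference, -r.dislike),
        default=None,
    )


CANDIDATES = [
    Record("紫龍", popularity=2, preference=5, dislike=20),
    Record("趙子龍", popularity=8, preference=20, dislike=1),
]
```

With the example values, every criterion favours 趙子龍, so the result is the same whichever field is consulted first.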
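The analysis result output module 629 can therefore output, according to the determined-intent grammar data 606, a semantic analysis result 103 that has a specific intent object. Because the natural language understanding system 620 can distinguish full matches from partial matches after a full-text search on the keyword 604, it can output an appropriate semantic analysis result 103. For example, from the received determined-intent grammar data 606 "<roleselect>,<rolename>=趙子龍" it confirms that the user wants to select Zhao Yun, so it outputs "趙雲" as the semantic analysis result 103 and sends it to the command generation system 130. Consequently, in some embodiments of the present invention, the user can provide more colloquial or more flexible forms of speech input, and a speech recognition system equipped with the natural language understanding system 620 of this embodiment can effectively and accurately return the corresponding control command to the application system, providing an effective voice selection function.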
In summary, the speech recognition system, command generation system, and speech recognition method of the present invention can provide speech recognition through a separate system located outside the application system and return the corresponding control command to the application system. They can also perform effective speech recognition on colloquial speech input provided by the user. The present invention can therefore effectively reduce the system resources that the application system needs for speech recognition and can provide a convenient and flexible voice selection function.
Although the present invention has been disclosed above by way of embodiments, they are not intended to limit it. Anyone with ordinary knowledge in the technical field may make minor changes and refinements without departing from the spirit and scope of the present invention; the scope of protection of the present invention is therefore defined by the appended claims.
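100: speech recognition system
101: speech input
102: speech information
103: semantic analysis result
104: interface content
105: control command
110: speech recognition module
120, 620: natural language understanding system
130: command generation system
131: comparison module
132: command confirmation module
133: temporary storage device
134: access module
135: item acquisition module
140: storage device
200: application system
210: speech receiving module
220: command execution module
301: interface number
302, 307: command format
303: interface content
304: selection item
305: comparison result
306: control command
511, 521, 531: interface name
512~514, 522~524, 532~534: selection items
603: possible-intent grammar data
604: keyword
605: response result
606: determined-intent grammar data
621: natural language processor
622: knowledge-assisted understanding module
623: intent data
624: retrieval system
625: structured database
626: search engine
627: indication data storage device
628: retrieval interface unit
629: analysis result output module
S210~S240, S410~S450: steps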
FIG. 1 is a schematic diagram of a speech recognition system according to an embodiment of the present invention.
FIG. 2 is a flowchart of a speech recognition method according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a command generation system according to an embodiment of the present invention.
FIG. 4 is a flowchart of a speech recognition method according to another embodiment of the present invention.
FIG. 5 is a schematic diagram of user interfaces of an application system according to an embodiment of the present invention.
FIG. 6 is a schematic diagram of a natural language understanding system according to an embodiment of the present invention.
Claims (38)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011026628.0A CN112216278A (en) | 2020-09-25 | 2020-09-25 | Speech recognition system, instruction generation system and speech recognition method thereof |
CN202011026628.0 | 2020-09-25 |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202213086A TW202213086A (en) | 2022-04-01 |
TWI780502B true TWI780502B (en) | 2022-10-11 |
Family
ID=74051202
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW109136554A TWI780502B (en) | 2020-09-25 | 2020-10-21 | Speech recognition system, command generation system, and speech recognition method thereof |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112216278A (en) |
TW (1) | TWI780502B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180114234A1 (en) * | 2016-10-26 | 2018-04-26 | SignifAI Inc. | Systems and methods for monitoring and analyzing computer and network activity |
WO2018093806A1 (en) * | 2016-11-15 | 2018-05-24 | JIBO, Inc. | Embodied dialog and embodied speech authoring tools for use with an expressive social robot |
TWI690811B (en) * | 2019-03-26 | 2020-04-11 | 中華電信股份有限公司 | Intelligent Online Customer Service Convergence Core System |
US20200234700A1 (en) * | 2017-07-14 | 2020-07-23 | Cognigy Gmbh | Method for conducting dialog between human and computer |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7689417B2 (en) * | 2006-09-04 | 2010-03-30 | Fortemedia, Inc. | Method, system and apparatus for improved voice recognition |
CN104615052A (en) * | 2015-01-15 | 2015-05-13 | 深圳乐投卡尔科技有限公司 | Android vehicle navigation global voice control device and Android vehicle navigation global voice control method |
CN109830239B (en) * | 2017-11-21 | 2021-07-06 | 群光电子股份有限公司 | Speech processing device, speech recognition input system, and speech recognition input method |
CN108877796A (en) * | 2018-06-14 | 2018-11-23 | 合肥品冠慧享家智能家居科技有限责任公司 | The method and apparatus of voice control smart machine terminal operation |
CN110232919A (en) * | 2019-06-19 | 2019-09-13 | 北京智合大方科技有限公司 | Real-time voice stream extracts and speech recognition system and method |
CN110895931A (en) * | 2019-10-17 | 2020-03-20 | 苏州意能通信息技术有限公司 | VR (virtual reality) interaction system and method based on voice recognition |
2020
- 2020-09-25: CN application CN202011026628.0A, published as CN112216278A (status: pending)
- 2020-10-21: TW application TW109136554A, published as TWI780502B (status: active)
Also Published As
Publication number | Publication date |
---|---|
CN112216278A (en) | 2021-01-12 |
TW202213086A (en) | 2022-04-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP7150770B2 (en) | Interactive method, device, computer-readable storage medium, and program | |
KR102313473B1 (en) | Provides command bundle suggestions for automated assistants | |
CN108369580B (en) | Language and domain independent model based approach to on-screen item selection | |
US6708162B1 (en) | Method and system for unifying search strategy and sharing search output data across multiple program modules | |
US20140282375A1 (en) | Generating Program Fragments Using Keywords and Context Information | |
US10157201B2 (en) | Method and system for searching for and providing information about natural language query having simple or complex sentence structure | |
JP2020149053A (en) | Methods, apparatuses, and storage media for generating training corpus | |
US20200285528A1 (en) | Application program interface lists | |
US10242670B2 (en) | Syntactic re-ranking of potential transcriptions during automatic speech recognition | |
CN114880346B (en) | Data processing method, related assembly and acceleration processor | |
JP7300435B2 (en) | Methods, apparatus, electronics, and computer-readable storage media for voice interaction | |
CN109408799B (en) | Semantic decision method and system | |
JP2019185737A (en) | Search method and electronic device using the same | |
KR100490406B1 (en) | Apparatus and method for processing voice command | |
TWI780502B (en) | Speech recognition system, command generation system, and speech recognition method thereof | |
US20230141200A1 (en) | Labeled knowledge graph based priming of a natural language model providing user access to programmatic functionality through natural language input | |
CN109753557B (en) | Answer output method, device, equipment and storage medium of question-answering system | |
US10831442B2 (en) | Digital assistant user interface amalgamation | |
US9529901B2 (en) | Hierarchical linguistic tags for documents | |
CN109903754B (en) | Method, device and memory device for speech recognition | |
US7099886B2 (en) | Method and apparatus for identifying programming object attributes | |
CN106682221B (en) | Question-answer interaction response method and device and question-answer system | |
JP5041802B2 (en) | Query analysis server, evaluation viewpoint word database, and phrase database generation method | |
US20100250548A1 (en) | Information terminal equipped with content search system | |
WO2023206703A1 (en) | Event slot extraction method and apparatus, storage medium and electronic apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | GD4A | Issue of patent certificate for granted invention patent | |