TWI754816B

TWI754816B - Method for processing voice command and system thereof

Info

Publication number: TWI754816B
Application number: TW108115524A
Authority: TW
Inventors: 李嘉銘; 林宛儒
Original assignee: 玉山商業銀行股份有限公司
Priority date: 2019-05-06
Filing date: 2019-05-06
Publication date: 2022-02-11
Also published as: TW202042089A

Abstract

A method for processing a voice command and a system thereof are provided. The method is applied to a service host that receives a voice message from a terminal device via a network. In the service host, the terminal device can be identified from the packet source, and the registered users associated with the terminal device can be obtained. The voice message is compared with the voiceprint of a registered user to verify the user. A question and its answer are obtained to form a question voice for the user to answer, and the terminal device then transmits the user's reply voice to the service host. The system verifies the user's identity based on the voiceprint comparison and determines from the reply voice whether the user is a real person. Once the system confirms that the user is a real person, it provides a voice service according to the voice message.

Description

Voice command processing method and system

The invention relates to technology for controlling a remote service by voice, and in particular to a method and a system that, when processing a voice command, both determine whether the speaker is a real person and authenticate the speaker's identity.

Voice assistants running on computer devices such as mobile devices or smart speakers are increasingly mature. A user can speak a voice command directly to the device, which converts it into voice packets and sends them to a server. The server performs semantic analysis to obtain the meaning of the speech, and a software program in the server can either look up a reply voice in a database or use artificial intelligence (AI) to respond to the user's speech.

A voice assistant is generally used to obtain information by spoken inquiry, such as asking about the weather, nearby restaurants, or movie times, or searching for information online. Querying confidential personal data by voice, or performing actions that require identity verification, calls for more rigorous authentication, yet the prior art offers no solution that effectively verifies identity and confirms that the user is a real person (not a robot).

The disclosure describes a voice command processing method and a system that implements it. The system includes a service host that provides services corresponding to voice commands the user produces at a terminal and replies to the user by voice, so that the user can obtain information without manual operation. In particular, the voice command processing also performs user identity authentication and a procedure for confirming that the user is a real person.

According to an embodiment of the voice command processing system, the system provides a service host that can be equipped with a database. The database mainly contains the voiceprint data of multiple users and can also store a question bank used for interacting with users. The service host can be implemented as a single computer host, a cluster, or a server system, and the voice command processing method runs in the service host.

According to an embodiment of the voice command processing method, the service host receives a voice message from a terminal device, for example a voice message that a user produces through a smart speaker or a computer device. A voiceprint comparison can then be performed on the voice message to identify the user who issued it. After that, one or more questions and answers can be generated from the data the user provided at registration, or from a preset question bank, to form one or more question voices put to the user.

The service host then transmits the question voice to the terminal device over the network, and the terminal device plays it. When the user responds to each question voice, the service host receives the corresponding reply voice. The service host can thus obtain the voiceprint comparison result, which is used to determine whether the speaker is the user registered with the service host and so confirm the user's identity, and can obtain the reply voice for each question voice, which is used to determine whether the speaker is a real person.

Further, in the method, a packet source can be parsed from the voice message, for example the media access control (MAC) address or the network address of the terminal device, so that the service host can identify the terminal device registered with it from the packet source. The terminal device is associated with one or more users registered with the service host, and the voiceprint comparison then determines which of these users is speaking.

Preferably, the questions and answers are preset in a question bank, or one or more questions and answers specific to the user are generated from the data the user provided when registering with the service host.

Preferably, the terminal device is a smart speaker located near the user, or a computer device that processes voice messages, such as the user's mobile device.

Further, when the service host receives the voice message, it can perform semantic analysis to determine the service item the user requests and provide a service voice corresponding to that service item. The service voice is likewise played through the terminal device.

For a further understanding of the features and technical content of the present invention, refer to the following detailed description and drawings. The drawings are provided for reference and illustration only and are not intended to limit the present invention.

The following specific embodiments illustrate implementations of the present invention, and those skilled in the art can understand its advantages and effects from the content disclosed in this specification. The present invention can be implemented or applied through other specific embodiments, and the details in this specification can be modified and changed based on different viewpoints and applications without departing from the concept of the present invention. The accompanying drawings are simplified schematic illustrations and are not drawn to actual scale. The following embodiments describe the related technical content of the present invention in further detail, but the disclosed content is not intended to limit the scope of protection of the present invention.

It should be understood that although terms such as "first", "second", and "third" may be used herein to describe various elements or signals, these elements or signals should not be limited by these terms. The terms are used mainly to distinguish one element from another, or one signal from another. In addition, the term "or" as used herein may, depending on the circumstances, include any one of the associated listed items or any combination of them.

The disclosure describes technology for controlling a remote service by voice. The proposed voice command processing method and system let a user obtain, by voice and within a security-verified procedure, services provided by a remote system, for example checking the balance of the user's own financial account, transferring funds, looking up data, or sending messages. During the process, voiceprint features of the speech are used to authenticate the user's identity, and the system can at the same time determine whether the speech comes from a real person rather than from an impersonator or an artificial intelligence (AI) robot. The system that runs the method can therefore use the voice information both to determine whether the speaker is a real person and to verify the identity of the registered user, and can provide the requested service on that basis.

Referring first to FIG. 1, a schematic diagram of a scenario in which the voice command processing system is applied, a user 10 is shown with a terminal device for processing voice commands nearby. The terminal device processes voice messages uttered by the user 10 and forwards them over a network to a specific destination, and it also processes voice messages received from outside. The terminal device may be the smart speaker 12 shown in the figure, or a mobile device, personal computer, or other computer device operated by the user 10 that can receive, process, and play voice messages.

In this example, the user 10 is within the range in which the smart speaker 12 can pick up voice messages, for example in a living room or bedroom. The smart speaker 12 acts as an Internet of Things (IoT) device that is always connected to the network, so the system can serve several people's voices through a single smart speaker 12; that is, through the smart speaker 12 the system can provide voice services corresponding to the voice messages of different users. The example shows a connection to an external service host 14. Through the smart speaker 12, the user 10 can issue a spoken service request to the service host 14 and can also receive voice messages sent by the service host 14.

In the illustrated example, the smart speaker 12 can connect through a specific application programming interface (such as OpenAPI) to a financial institution's website, shown here as the service host 14, to check an account balance. When the user 10 says "I want to check my account balance", the smart speaker 12 receives the voice message. If it is a voice command that the smart speaker 12 can handle internally, the speaker processes it with its own software, for example checking the weather by voice (by connecting to a specific server), consulting the user's personal calendar, looking up information, or playing music. If the voice command concerns a service item handled by the specific service host 14, the voice message is packaged together with information about the smart speaker 12 (device information, network information, and so on), optionally including user identification information and other specific information, into a voice packet that is sent over the network to the service host 14. The service host 14 then processes the voice message, which includes extracting the terminal device's information from the voice packet and deriving the request for a specific service through semantic analysis of the voice message.
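The packaging of a voice message with terminal device information can be sketched as follows. This is a minimal illustration only: the source does not specify a packet format, so the `VoicePacket` class, its field names, and the JSON/base64 encoding are all assumptions chosen for readability.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class VoicePacket:
    """Hypothetical voice packet: recorded speech plus terminal device information."""
    device_mac: str   # MAC address of the terminal device (assumed field name)
    device_ip: str    # network address of the terminal device (assumed field name)
    audio: bytes      # raw recorded speech

    def to_wire(self) -> bytes:
        """Serialize to JSON bytes; the audio is base64-encoded so it survives JSON."""
        body = {
            "device_mac": self.device_mac,
            "device_ip": self.device_ip,
            "audio_b64": base64.b64encode(self.audio).decode("ascii"),
        }
        return json.dumps(body).encode("utf-8")

    @classmethod
    def from_wire(cls, raw: bytes) -> "VoicePacket":
        """Rebuild the packet on the service host side."""
        body = json.loads(raw.decode("utf-8"))
        return cls(
            device_mac=body["device_mac"],
            device_ip=body["device_ip"],
            audio=base64.b64decode(body["audio_b64"]),
        )
```

In this sketch the terminal device would call `to_wire()` before sending, and the service host would call `from_wire()` to recover both the audio and the device information used later for source identification.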

As this example shows, the reply voice generated by the service host 14 is played to the user 10 through the smart speaker 12, and can also be used to confirm the identity of the user 10, for example by asking "What is your ID number?". The user 10 may answer "Axxxxxxx". This voice message is likewise packaged and sent through the smart speaker 12 to the service host 14, which authenticates the user's identity against the data recorded in its database; this can include identifying the user by voiceprint. Finally, if authentication succeeds, the service host 14 generates a reply voice and the smart speaker 12 responds: "The balance is XXXX".

The scenario above is one way of applying the voice command processing system. FIG. 2 shows a schematic diagram of an embodiment of the system architecture that runs the voice command processing method.

In this system architecture, a terminal device 22 is provided on the side of the user 20, and one terminal device 22 can serve one or more users 20. The terminal device 22 may be a smart speaker or a computer device running a voice assistant program. The terminal device 22 connects to the service host 26 through the network 24. The service host 26 may be a server, cluster, or server system related to financial services. It provides multiple service items and determines the requested service item by semantic analysis of the voice message uttered by the user 20.

The service host 26 has a database 28, which can store a question bank and voiceprint data for multiple users. The question bank is mainly used to generate one or more questions and answers; besides questions from the question bank, random questions from outside it, such as math problems, can also be used. The service host 26 puts one or more question voices to the user 20. In addition to using the question bank, the service host 26 can generate questions specific to a given user 20 from that user's data, so that the system can question the user 20 and judge from the responses whether the user is a real person. The questions asked by the service host 26 can be preset questions in the question bank that already have preset answers, for example one or more math, history, current affairs, or science questions. The user 20 answers, and the system processes the answers to judge whether the user is a real person. The questions can also relate to the user 20, based for example on the birthday, address, contact person, or education that the user 20 entered when registering with the service host 26. The system likewise processes the answers of the user 20 to judge whether the user is a real person.
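The three question sources described above (preset question bank, random math problem, user registration data) can be sketched as follows. The question bank contents, the function names, and the profile dictionary layout are hypothetical; the source does not define how questions are stored or selected.

```python
import random
from typing import Dict, Optional, Tuple

# Hypothetical preset question bank: (question, expected answer) pairs.
QUESTION_BANK = [
    ("How many days are there in a week?", "7"),
    ("How many months are there in a year?", "12"),
]


def make_math_question(rng: random.Random) -> Tuple[str, str]:
    """Form a random addition problem that is not stored in the question bank."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return (f"What is {a} plus {b}?", str(a + b))


def make_question(rng: random.Random,
                  profile: Optional[Dict[str, str]] = None) -> Tuple[str, str]:
    """Pick one question from the bank, a random math problem, or the user's data."""
    kinds = ["bank", "math"]
    if profile:  # user-specific questions need registration data
        kinds.append("profile")
    kind = rng.choice(kinds)
    if kind == "bank":
        return rng.choice(QUESTION_BANK)
    if kind == "math":
        return make_math_question(rng)
    field = rng.choice(sorted(profile))
    return (f"What is your registered {field}?", profile[field])
```

Passing a seeded `random.Random` keeps the sketch reproducible in tests; a deployed system would presumably want unpredictable questions.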

In the service host 26, a processor executes a specific set of software instructions to run the voice command processing method. By function, the service host 26 may include a source identification unit 263, which parses the information recorded in the voice packet to identify its source. When a terminal device 22 registers with the service host 26, a list is created in the database. If the network address (IP address) or media access control address (MAC address) of the terminal device 22 is obtained from the voice packet, it can be compared against the registration data to obtain the corresponding one or more users.
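The lookup performed by the source identification unit can be sketched as a simple table from device address to bound users. The registry contents, the user identifiers, and the function name are hypothetical; a real service host would read this from its database.

```python
from typing import Dict, List

# Hypothetical registration list: terminal device MAC address -> bound users.
DEVICE_REGISTRY: Dict[str, List[str]] = {
    "AA:BB:CC:DD:EE:FF": ["user_a", "user_b"],
}


def users_for_source(mac_address: str,
                     registry: Dict[str, List[str]] = DEVICE_REGISTRY) -> List[str]:
    """Return the users bound to the device, or an empty list if it is unregistered."""
    key = mac_address.strip().upper()  # normalize case and stray whitespace
    return list(registry.get(key, []))
```

An empty result means the packet came from an unregistered device, in which case no voiceprint comparison candidates exist.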

The service host 26 includes a voiceprint comparison unit 265, a software program that processes the received voice message by sampling the audio signal, reducing noise to remove interference, and applying gain control in order to extract voiceprint features. Comparing these features with the voiceprint data created when the user 20 registered with the service host 26 authenticates the identity of the user 20.
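The comparison step can be illustrated with a minimal sketch. The source does not say how voiceprint features are represented or compared, so this example assumes that feature extraction has already produced fixed-length numeric vectors and swaps in cosine similarity with an arbitrary threshold of 0.8; the function names and the threshold are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_voiceprint(sample: Sequence[float],
                     enrolled: Dict[str, Sequence[float]],
                     candidates: List[str],
                     threshold: float = 0.8) -> Optional[str]:
    """Return the best-matching candidate at or above the threshold, else None."""
    best_user, best_score = None, threshold
    for user in candidates:
        reference = enrolled.get(user)
        if reference is None:
            continue  # candidate has no enrolled voiceprint
        score = cosine_similarity(sample, reference)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user
```

Restricting `candidates` to the users bound to the sending device keeps the comparison small, which matches the efficiency argument made for device-based lookup.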

The service host 26 includes a real-person question bank unit 267, a software program that obtains questions and answers from the database 28 and forms question voices put to the user 20. These are played to the user 20 through the terminal device 22 so that the system can determine whether the voice message was uttered by a real person. The unit can also work with the voiceprint comparison unit 265, which samples the reply voice of the user 20 to obtain voiceprint features and thereby authenticate the identity of the user 20.

The service host 26 uses its identity authentication unit 269 to authenticate the user 20. In the voice command processing system, the one or more users associated with the terminal device 22 can first be identified from the terminal device's information, so that only these users' voiceprint data need be compared, which makes identity authentication more efficient.

The service host 26 may itself be the server that provides a specific service, although embodiments in which another server provides the service are also covered. The service host 26 provides the service corresponding to the voice command the user produces at the terminal and replies to the user by voice, so that the user can obtain the service without manual operation. The voice command processing can perform user identity authentication and real-person confirmation at the same time.

The voice command processing method executed in the service host is shown in the flowchart of FIG. 3. The method is applied in the voice command processing system, which provides one or more questions and answers. Besides the question bank stored in the database, through which the service host interacts with the user, these can include random questions. The users' voiceprint data in the database is used to authenticate the user's identity. The method proceeds as follows. The order of the steps does not limit the implementation of the invention; embodiments that change the order of steps to achieve the same purpose are also covered.

First, in step S301, the service host receives a voice message from a terminal device. In step S303, a software program in the service host performs a voiceprint comparison to identify the user who uttered the voice message. In step S305, the software program running in the service host provides one or more questions and answers, either taken from the question bank in the database or randomly formed, to form the question voice put to the user. Questions can be selected from the question bank, can be random questions such as math problems, or can be formed from information associated with the user. The number of questions is not fixed and can depend on how the user answers, continuing until it has been determined whether the user is a real person.

Next, in step S307, the question voice is converted into network packets and transmitted to the terminal device, which converts them back to speech and plays it. The user responds to the question voice, producing a reply voice that the terminal device sends back to the service host. In step S309, the service host receives over the network the reply voice corresponding to the question voice.

In step S311, voiceprint features are extracted from the reply voice and can be used to confirm the user's identity, and at the same time the reply voice or voices are used to judge whether the user is a real person.
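The source leaves open how the reply voices are judged. One simple possibility, shown here purely as an assumption, is to transcribe each reply to text (transcription is not shown) and require that enough replies match the expected answers. The function names and the `min_correct` ratio are hypothetical.

```python
from typing import Sequence


def normalize(text: str) -> str:
    """Lowercase and keep only letters and digits so minor formatting is ignored."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def is_real_person(expected_answers: Sequence[str],
                   reply_texts: Sequence[str],
                   min_correct: float = 1.0) -> bool:
    """Judge the exchange by the fraction of replies that match the expected answers.

    Assumption: replies were already transcribed to text. An unanswered or
    mismatched question count fails the check outright.
    """
    if not expected_answers or len(expected_answers) != len(reply_texts):
        return False
    correct = sum(
        normalize(expected) == normalize(reply)
        for expected, reply in zip(expected_answers, reply_texts)
    )
    return correct / len(expected_answers) >= min_correct
```

A correct answer alone does not prove liveness, which is presumably why the method pairs this check with voiceprint comparison on the same reply voice.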

Applying the above method, the system can further let the user issue a command by voice. The service host performs semantic analysis to determine the service item corresponding to the voice command, in particular a service item that requires identity authentication, and the system then uses the voice messages to perform real-person judgment and user authentication.

FIG. 4 shows a flowchart of an embodiment that applies the voice command processing method.

The voice command processing method is applied in the service host. In step S401, a voice message is received from a terminal device. In this embodiment, in step S403, a packet source is parsed from the voice message, for example the terminal device's network hardware information (such as its MAC address), to determine whom the service is for. One way to do this is to look up the one or more users associated with the terminal device, as in step S405. The association information records which users registered with the service host the terminal device maps to. The user list can be created when the service is registered with the service host, so that the terminal device is bound to one or more users and the service targets for that device are limited to them.

In step S407, the service host can also perform a voiceprint comparison on the received voice information. The comparison result is used to determine whether the speaker is the user registered with the service host, and thereby confirm the user's identity. Notably, when the service host first obtains the one or more users associated with the terminal device from the packet source, the voiceprint comparison is limited to those users, which makes the whole authentication procedure more efficient.

In step S409, the voiceprint comparison result is used to determine whether the received voice message comes from one of the users bound to the terminal device, for identity authentication. In step S411, it is determined whether identity authentication has passed. If it has not (No), then in step S413 the service host can generate an ending voice that informs the user through the terminal device.

If identity authentication has passed (Yes), step S415 is executed: semantic analysis is performed on the received voice message to determine the requested service item. In step S417, when the system determines that the service item to be provided requires further confirmation that the user is a real person, the real-person judgment procedure is performed. One or more questions and answers are generated from the question bank stored in the database, or as random questions from outside the question bank, to form one or more question voices put to the user. Voice messages may be sent to the terminal device several times; the terminal device plays each question voice and the user answers. Then, in step S419, the service host receives the reply voice corresponding to each question voice and can determine whether the user is a real person (step S421). If the exchange indicates that the speaker is not a real person (No), step S413 is executed and the service host can generate an ending voice that informs the user through the terminal device.

Otherwise, if the voice source is confirmed to be a real person (Yes), then in step S423 the service host provides the service item determined in the steps above, for example a financial account balance inquiry. It generates a service voice and transmits it over the network to the terminal device, which plays it to the user.

Note that the order of step S415, step S417, and the voiceprint comparison of step S407 is not restricted, and these steps may also be performed simultaneously.
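The decision flow of FIG. 4 can be summarized in a short sketch. Each stage is passed in as a callable so that the sketch stays independent of any particular recognition technique; the function name, parameter names, and `Outcome` values are hypothetical, and the step order shown is only one of the orders the description allows.

```python
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Outcome(Enum):
    ENDED = "ended"    # S413: ending voice is played
    SERVED = "served"  # S423: service voice is played


def run_flow(
    mac_address: str,
    audio: bytes,
    lookup_users: Callable[[str], List[str]],
    match_user: Callable[[bytes, List[str]], Optional[str]],
    parse_service: Callable[[bytes], str],
    requires_liveness: Callable[[str], bool],
    check_liveness: Callable[[str], bool],
) -> Tuple[Outcome, Optional[str]]:
    """Walk steps S401 to S423 once and report the outcome and service item."""
    users = lookup_users(mac_address)        # S403, S405: packet source -> bound users
    user = match_user(audio, users)          # S407, S409: voiceprint comparison
    if user is None:                         # S411: authentication failed
        return Outcome.ENDED, None           # S413
    service = parse_service(audio)           # S415: semantic analysis
    if requires_liveness(service) and not check_liveness(user):  # S417 to S421
        return Outcome.ENDED, None           # S413
    return Outcome.SERVED, service           # S423
```

Because the stages are injected, reordering or parallelizing them, as the description permits, only changes this one function.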

FIG. 5 describes an embodiment of the voice command processing method as a multi-party flow involving the user 51, the terminal device 52, the telecommunications server 53, and the service host 54. The terminal device 52, for example an Internet of Things device, obtains network service through the telecommunications server 53 and connects to the service host 54. The service host 54 can be a single computer host or a system of multiple servers; it authenticates the identity of the user 51, performs real-person judgment, and provides the service.

The flow begins when the user 51 utters a voice command. The resulting voice message is received by the terminal device 52 (step S501), which forms a voice packet and sends it to the telecommunications server 53 (step S502), which in turn forwards the voice packet to the service host 54 (step S503).

The service host 54 parses the source from the voice packet, performs semantic analysis and identity recognition, and generates questions. One or more questions and answers can be generated randomly or taken from the question bank. Because the user's identity and the requested service item can be obtained from the voice message the user 51 originally uttered, questions and answers can also be generated with reference to the user data and the service item (step S504). In this embodiment, the question-and-answer exchange with the user 51 takes place at this stage. The question message is sent to the telecommunications server 53 (step S505), formed into message packets, and forwarded to the terminal device 52 (step S506), which plays it to the user 51 (step S507). The user 51 replies, and the resulting voice message is received by the terminal device 52 (step S508) and forwarded through the telecommunications server 53 (step S509) to the service host 54 (step S510).

The service host 54 verifies the reply voice produced by the user 51. If the user is confirmed to be a real person and identity authentication has been completed in the steps above, the service host responds to the requested service item (step S511) by forming a service message, which is forwarded via the telecommunications server 53 (step S512) to the terminal device 52 (step S513). On the user side, the terminal device 52 plays the service message (step S514), completing this voice service flow.

According to one embodiment, the user can then continue to issue voice messages to obtain services within a time limit set by the system. In another embodiment, the software program in the service host runs continuously and applies the voice command processing method each time the user speaks, which includes continuing to use the voice messages to verify that the speaker is a real person and to verify the user's identity, and providing the service item requested in the speech.
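The time-limited embodiment can be sketched as a session record that stays valid for a fixed window after verification. The class name, field names, and the 120-second default are hypothetical; the source only says that the system sets a time limit.

```python
from dataclasses import dataclass


@dataclass
class VerifiedSession:
    """Hypothetical record of a user who passed identity and real-person checks."""
    user_id: str
    verified_at: float           # timestamp in seconds when verification succeeded
    ttl_seconds: float = 120.0   # assumed system-defined time limit

    def is_active(self, now: float) -> bool:
        """True while `now` falls inside the window that starts at verification."""
        return 0 <= now - self.verified_at < self.ttl_seconds
```

A caller could take both timestamps from `time.monotonic()` so that wall-clock adjustments do not extend or cut short the window. While `is_active` returns true, further voice messages could be served without repeating the question exchange; the second embodiment would instead re-verify on every message.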

It should be noted that the flows described in the above embodiments are not limited to the step order given. Any flow that achieves the same purpose through simple substitution is covered by the present invention. For example, the order of the real-person determination, voiceprint comparison, and identity authentication steps performed in the service host can be adjusted according to actual conditions.

The above disclosure describes only preferred feasible embodiments of the present invention and does not thereby limit the scope of its claims. All equivalent technical changes made using the contents of the specification and drawings of the present invention therefore fall within the scope of the claims of the present invention.

10: User
12: Smart speaker
14: Service host
20: User
22: Terminal device
24: Network
26: Service host
263: Source identification unit
265: Voiceprint comparison unit
267: Real-person question bank unit
269: Identity authentication unit
28: Database
51: User
52: Terminal device
53: Telecommunication server
54: Service host
Steps S301-S311: Voice command processing flow
Steps S401-S423: Voice command application flow
Steps S501-S514: Voice command processing flow

FIG. 1 is a schematic diagram of a scenario in which the voice command processing system is applied;

FIG. 2 is a schematic diagram of an embodiment of a system architecture that runs the voice command processing method;

FIG. 3 is a flowchart of an embodiment in which the service host executes the voice command processing method;

FIG. 4 is a flowchart of an embodiment of applying the voice command processing method;

FIG. 5 is a flowchart of an embodiment of the voice command processing method.

10: User

12: Smart speaker

14: Service host

Claims

A voice command processing method, applied to a service host, comprising: receiving a voice message from a terminal device, and comparing the voice message against voiceprint data created when a plurality of users registered with the service host, so as to confirm and authenticate a user who issued the voice message, wherein a semantic analysis of the voice message yields a request for a service item, a service voice corresponding to the service item is provided, and the service voice is played through the terminal device; parsing a packet source from the voice message and, according to the packet source, obtaining the terminal device registered with the service host; obtaining the user who issued the voice message according to the result of the voiceprint comparison, and providing one or more questions and answers for the user to form one or more question voices posed to the user, wherein the one or more questions and answers include questions and answers preset in a question bank of a database, or one or more questions and answers corresponding to the user that are generated from the data provided when the user registered with the service host; transmitting the one or more question voices to the terminal device of the user, the terminal device playing the one or more question voices; and after the user responds to each question voice, receiving a reply voice corresponding to each question voice; wherein, when the result of the voiceprint comparison is obtained, it is determined whether the user is a user registered with the service host, so as to confirm the identity of the user, and the reply voice corresponding to each question voice is obtained, wherein the service host requires the user to answer a question with a preset answer in the question bank, or to answer a question that was generated when registering with the service host and is related to the user, and the reply voice produced by the user is then verified to determine whether the user is a real person; wherein, if the source of the user's voice is confirmed to be a real person, the service host responds to the service item to provide the service and forms a service message, the service message is forwarded to the terminal device through a telecommunication server, and the terminal device plays the service voice of the service message to the user; and wherein the voice message received by the terminal device is a command issued by the user by voice, and the service host performs the semantic analysis to obtain the service item corresponding to the voice command.

The voice command processing method according to claim 1, wherein a network address or a media access control address of the terminal device is obtained from the voice message so as to identify the terminal device registered with the service host, and the registration data is compared to obtain the corresponding one or more users.

A voice command processing system, comprising: a service host provided with a database in which voiceprint data corresponding to a plurality of users is recorded, a voice command processing method running in the service host, the method comprising: receiving a voice message from a terminal device, and comparing the voice message against the voiceprint data created when the plurality of users registered with the service host, so as to confirm and authenticate a user who issued the voice message, wherein a semantic analysis of the voice message yields a request for a service item, a service voice corresponding to the service item is provided, and the service voice is played through the terminal device; parsing a packet source from the voice message and, according to the packet source, obtaining the terminal device registered with the service host; obtaining the user who issued the voice message according to the result of the voiceprint comparison, and providing one or more questions and answers for the user to form one or more question voices posed to the user, wherein the one or more questions and answers include questions and answers preset in a question bank of the database, or one or more questions and answers corresponding to the user that are generated from the data provided when the user registered with the service host; transmitting the one or more question voices to the terminal device of the user, the terminal device playing the one or more question voices; and after the user responds to each question voice, receiving a reply voice corresponding to each question voice; wherein, when the result of the voiceprint comparison is obtained, it is determined whether the user is a user registered with the service host, so as to confirm the identity of the user, and the reply voice corresponding to each question voice is obtained, wherein the service host requires the user to answer a question with a preset answer in the question bank, or to answer a question related to the user, and the reply voice produced by the user is verified to determine whether the user is a real person; wherein, if the source of the user's voice is confirmed to be a real person, the service host responds to the service item to provide the service and forms a service message, the service message is forwarded to the terminal device through a telecommunication server, and the terminal device plays the service voice of the service message to the user; and wherein the voice message received by the terminal device is a command issued by the user by voice, and the service host performs the semantic analysis to obtain the service item corresponding to the voice command.

The voice command processing system of claim 3, wherein the terminal device is a smart speaker located near the user, or a computer device for processing the voice message.

The voice command processing system of claim 3, wherein a network address or a media access control address of the terminal device is obtained from the voice message so as to identify the terminal device registered with the service host, and the registration data is compared to obtain the corresponding one or more users.