TW201717192A

TW201717192A - Electronic apparatus and voice trigger method therefor

Info

Publication number: TW201717192A
Application number: TW105113280A
Authority: TW
Inventors: 王建傑; 林珩之
Original assignee: 絡達科技股份有限公司
Priority date: 2015-11-03
Filing date: 2016-04-28
Publication date: 2017-05-16
Also published as: TWI639153B

Abstract

A voice trigger method for an electronic apparatus is provided. The voice trigger method includes the following steps. A current voice signal is received. A current voice feature is extracted from the current voice signal. Whether a previous voice feature of a previous voice signal is stored is determined. When the previous voice feature is stored, at least one of adjusting a confidence threshold and determining whether to wake up the electronic apparatus is performed according to whether the previous voice feature corresponds to a False Acceptance situation or a False Rejection situation, and according to the similarity between the current voice feature and the previous voice feature.

Description

Electronic device and method for waking up by voice recognition

The present invention relates to an electronic device and a wake-up method thereof, and more particularly to an electronic device and a method for waking it up through voice recognition.

In recent years, advances in technology have allowed users to control electronic devices by voice; for example, a user can wake up an electronic device by voice. However, various factors often lead to False Acceptance (the electronic device is woken up when this is not intended) or False Rejection (the user tries to wake up the electronic device by voice but fails). For example, noisy ambient sound may wake up the electronic device unintentionally. Or the user is talking with someone else, and the speech unintentionally wakes up the electronic device. Or the user's accent prevents the electronic device from being woken up. In general, addressing these problems requires on-line adaptation of the keyword group speech model or pre-training a keyword group speech model for a specific user, but on-line adaptation and pre-training are highly complex and do not meet cost considerations. Moreover, if an error occurs during adaptation or pre-training, the keyword group speech model may become unusable. Therefore, how to effectively reduce false acceptance and false rejection, and thereby increase the probability of successfully waking up the electronic device, is one of the goals the industry is working toward.

The present invention relates to an electronic device and a method for waking it up through voice recognition, which can adjust the accuracy with which the electronic device is woken up through voice recognition.

According to one aspect of the present invention, a method for waking up an electronic device through voice recognition is provided. The method includes the following steps. A current sound signal is received. A current sound feature of the current sound signal is extracted. Whether a previous sound feature of a previous sound signal is stored is determined. When the previous sound feature is stored, at least one of adjusting a confidence threshold and deciding whether to wake up the electronic device is performed according to whether the previous sound feature corresponds to a false acceptance or a false rejection, and according to a similarity between the current sound feature and the previous sound feature.

According to another aspect of the present invention, an electronic device is provided. The electronic device includes a storage device, a sound receiving device, and a processor. The sound receiving device is configured to receive a current sound signal. The processor is configured to extract a current sound feature of the current sound signal and to determine whether a previous sound feature of a previous sound signal is stored in the storage device. When the storage device stores the previous sound feature, the processor performs at least one of adjusting a confidence threshold and deciding whether to wake up the electronic device according to whether the previous sound feature corresponds to a false acceptance or a false rejection, and according to a similarity between the current sound feature and the previous sound feature.

In order to provide a better understanding of the above and other aspects of the present invention, preferred embodiments are described in detail below with reference to the accompanying drawings:

S101~S110, S201~S214, S301~S314, S401~S411, S501~S515‧‧‧Process steps

100‧‧‧Electronic device

101‧‧‧Storage device

102‧‧‧Sound receiving device

103‧‧‧Processor

104‧‧‧User interface

1021‧‧‧Microphone

1022‧‧‧Voice activity detection circuit

1023‧‧‧Analog-to-digital converter

FIG. 1 is a block diagram of an electronic device according to an embodiment of the present invention.

FIG. 2 is a block diagram of an electronic device according to another embodiment of the present invention.

FIG. 3 is a flowchart of a method for waking up an electronic device through voice recognition according to an embodiment of the present invention.

FIG. 4 is a schematic diagram of an example of a keyword group speech model.

FIG. 5 is a schematic diagram of the Viterbi algorithm.

FIGS. 6A and 6B are flowcharts of a method for waking up an electronic device through voice recognition according to another embodiment of the present invention.

FIG. 7 is a schematic diagram of the dynamic time warping algorithm.

FIGS. 8A and 8B are flowcharts of a method for waking up an electronic device through voice recognition according to another embodiment of the present invention.

FIG. 9 is a flowchart of a method for waking up an electronic device through voice recognition according to another embodiment of the present invention.

FIGS. 10A and 10B are flowcharts of a method for activating a specific function of an electronic device through voice recognition according to another embodiment of the present invention.

Please refer to FIG. 1, which is a block diagram of an electronic device 100 according to an embodiment of the present invention. The electronic device 100 includes a storage device 101, a sound receiving device 102, a processor 103, and optionally a user interface 104. The storage device 101 stores data and is, for example, a memory. The sound receiving device 102 receives sound and converts the analog sound signal into a digital sound signal. The storage device 101 and the sound receiving device 102 are coupled to the processor 103. The processor 103 receives the digital sound signal output by the sound receiving device 102, extracts the sound feature of the digital sound signal, accesses the storage device 101, and manages the data stored in the storage device 101. The user interface 104 receives input from the user and provides output information; the user interface 104 is, for example, a touch panel.

Please refer to FIG. 2, which is a block diagram of the electronic device 100 according to another embodiment of the present invention. In this embodiment, the sound receiving device 102 can be implemented, for example, by a microphone 1021, a voice activity detection (VAD) circuit 1022, and an analog-to-digital converter 1023. The microphone 1021 receives sound. The analog-to-digital converter 1023 converts the analog sound signal into a digital sound signal. The voice activity detection circuit 1022 detects sound and, when sound is detected, sends a signal to the processor 103. For example, when the voice activity detection circuit 1022 detects sound, it sends a signal (for example, an interrupt) to the processor 103 to wake up or notify the processor 103, so that the processor 103 processes the digital sound signal output by the analog-to-digital converter 1023.

Please refer to FIGS. 1 and 3. FIG. 3 is a flowchart of a method for waking up the electronic device 100 through voice recognition according to an embodiment of the present invention. In this embodiment, the processor 103 can determine whether the current sound feature of the current sound signal corresponds to a false acceptance or a false rejection. When it does, the current sound feature is stored for later use.

In step S101, the sound receiving device 102 receives a current sound signal. The current sound signal is obtained, for example, from the sound of the user currently speaking. In step S102, the processor 103 extracts the current sound feature of the current sound signal.

In step S103, the processor 103 compares the current sound feature with a keyword group speech model through a matching algorithm to obtain a confidence score. That is, the processor 103 measures how similar the current sound feature is to the keyword group of the keyword group speech model, and uses the resulting confidence score to decide whether the current sound feature may wake up the electronic device 100. When the current sound feature is highly similar to the keyword group of the keyword group speech model, the confidence score is high. This means that what the user said is the same as or very similar to the keyword group, so the user probably intends to wake up the electronic device 100 by voice. When the current sound feature has a low similarity to the keyword group of the keyword group speech model, the confidence score is low. This means that what the user said differs greatly from the keyword group, so the user is not trying to wake up the electronic device 100 by voice.

In an embodiment, the matching algorithm may be the Viterbi algorithm. Please refer to FIGS. 4 and 5. FIG. 4 is a schematic diagram of the keyword group speech model, and FIG. 5 is a schematic diagram of an example of the Viterbi algorithm. Assume that the keyword group speech model includes six states, in order sil, State 0, State 1, State 2, State 3, and sil. The arrows on each state represent state transitions, and each state transition has a transition probability (not shown). In FIG. 5, the horizontal axis shows the frames fr0~fr12 of the current sound, and the vertical axis shows the states s0~s5, which correspond respectively to the six states sil, State 0, State 1, State 2, State 3, and sil of the keyword group speech model. The processor 103 can use the Viterbi algorithm to find an optimal path that maximizes the score of the entire path; the sum of the scores of the cells along that optimal path is the confidence score. In another embodiment, the matching algorithm may be any algorithm, as long as it can compute the degree of similarity between the current sound feature and the keyword group in the keyword group speech model.
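The path search described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the scoring used in the patent: it assumes a left-to-right model in which each state either repeats or advances to the next state, and it assumes log-domain scores. The function name `viterbi_confidence` and the inputs `emit`, `log_stay`, and `log_next` are hypothetical.

```python
NEG_INF = float("-inf")


def viterbi_confidence(emit, log_stay, log_next):
    """Return the score of the best left-to-right path through the model.

    emit[t][s]  : log-score of frame t under state s (assumed input)
    log_stay[s] : log transition score for staying in state s (assumed)
    log_next[s] : log transition score for moving from s to s + 1 (assumed)
    The path must start in the first state and end in the last state.
    """
    n_frames, n_states = len(emit), len(emit[0])
    best = [NEG_INF] * n_states
    best[0] = emit[0][0]
    for t in range(1, n_frames):
        prev = best
        best = [NEG_INF] * n_states
        for s in range(n_states):
            stay = prev[s] + log_stay[s]
            move = prev[s - 1] + log_next[s - 1] if s > 0 else NEG_INF
            # Keep only the better predecessor, then add this cell's score.
            best[s] = max(stay, move) + emit[t][s]
    return best[-1]
```

With 13 frames and six states, as in FIG. 5, `emit` would be a 13-by-6 table, and the returned value would play the role of the confidence score compared against the confidence threshold.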

After the confidence score is obtained, the method proceeds to step S104, in which the processor 103 determines whether the confidence score is greater than or equal to a confidence threshold. The confidence threshold represents how easily the current sound feature can wake up the electronic device 100. The lower the confidence threshold, the less similar the current sound feature needs to be to the keyword group in the keyword group speech model in order to wake up the electronic device 100, so the electronic device 100 is woken up more easily. The higher the confidence threshold, the more similar the current sound feature must be to the keyword group in order to wake up the electronic device 100, so the electronic device 100 is woken up less easily.

When the confidence score is greater than or equal to the confidence threshold, the method proceeds to step S105; when the confidence score is less than the confidence threshold, the method proceeds to step S106. In step S105, the processor 103 wakes up the electronic device 100. After the electronic device 100 is woken up, the user can operate it in various ways. For example, the user can operate the electronic device 100 by voice. In another embodiment, the user operates the electronic device 100 through buttons or through the user interface 104 provided by the electronic device 100, for example by touching a touch screen. In step S106, the processor 103 does not wake up the electronic device 100.

After the processor 103 determines that the confidence score is greater than or equal to the confidence threshold and wakes up the electronic device 100 (that is, after steps S104 and S105), the method proceeds to step S107, in which the processor 103 determines whether the current sound feature corresponds to a false acceptance. A false acceptance means that the electronic device 100 was woken up when this was not intended. One way to determine this is to check whether the user turns off the woken electronic device 100 within a specific time after the current sound feature wakes it up. If the user does so, the user did not want to wake up the electronic device 100, yet it was woken up by the current sound feature, so the case can be judged a false acceptance. The method then proceeds to step S108, in which the processor 103 stores the current sound feature in the storage device 101 and records that it corresponds to a false acceptance.

After the processor 103 determines that the confidence score is less than the confidence threshold and does not wake up the electronic device 100 (that is, after steps S104 and S106), the method proceeds to step S109, in which the processor 103 determines whether the current sound feature corresponds to a false rejection. A false rejection means that the user tried to wake up the electronic device 100 by voice but failed. One way to determine this is to check whether the current sound feature did not wake up the electronic device 100 while the confidence score obtained by comparing it with the keyword group speech model is less than the confidence threshold by a difference within a predetermined range. In that case, because the confidence score is below but very close to the confidence threshold, the user presumably wanted to wake up the electronic device 100, yet it was not woken up by the current sound feature, so the case can be judged a false rejection. The method then proceeds to step S110, in which the processor 103 stores the current sound feature in the storage device 101 and records that it corresponds to a false rejection.
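Steps S104 to S110 can be summarized in a short sketch. It is illustrative only: the function name `classify_attempt`, the boolean `turned_off_quickly` (standing for "the user turned the device off within the specific time"), and the parameter `margin` (standing for the predetermined range) are assumptions introduced here, not names from the patent.

```python
def classify_attempt(score, threshold, turned_off_quickly, margin):
    """Return (wake, label), where label is 'FA', 'FR', or None.

    'FA' marks a presumed false acceptance and 'FR' a presumed false
    rejection; a labeled feature would then be stored (S108 / S110).
    """
    if score >= threshold:
        # S105: wake up. S107: a quick switch-off suggests a false acceptance.
        return True, ("FA" if turned_off_quickly else None)
    # S106: do not wake up. S109: a near miss suggests a false rejection.
    near_miss = (threshold - score) <= margin
    return False, ("FR" if near_miss else None)
```

In practice the switch-off can only be observed some time after the wake-up decision, so a real implementation would likely label the stored feature later; the single call here just keeps the sketch compact.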

Please refer to FIGS. 1, 3, 6A, and 6B. FIGS. 6A and 6B are flowcharts of a method for waking up the electronic device 100 through voice recognition according to another embodiment of the present invention. In this embodiment, the processor 103 further determines whether a previous sound feature of a previous sound signal is stored in the storage device 101 and, when the storage device 101 stores the previous sound feature, decides whether to wake up the electronic device 100 according to the previous sound feature and the current sound feature.

After the sound receiving device 102 receives a current sound signal and the processor 103 extracts a current sound feature of the current sound signal (that is, after steps S201 and S202), the method proceeds to step S211.

In step S211, the processor 103 determines whether a previous sound feature of a previous sound signal is stored in the storage device 101. When the storage device 101 stores the previous sound feature, the method proceeds to step S212; when it does not, the method proceeds to step S203.

In step S212, the processor 103 calculates a similarity between the current sound feature and the previous sound feature. That is, because the storage device 101 stores the previous sound feature, the processor 103 decides whether to wake up the electronic device 100 according to the previous sound feature and the current sound feature. In an embodiment, the processor 103 can calculate the similarity between the previous sound feature and the current sound feature through the dynamic time warping (DTW) algorithm. Please refer to FIG. 7, which is a schematic diagram of the dynamic time warping algorithm. In FIG. 7, $P = P_1, \ldots, P_s, \ldots, P_k$ with $P_s = (i_s, j_s)$, where $P$ is the warping function. The processor 103 can use the dynamic time warping algorithm to calculate the shortest distance between the current sound feature vector and the previous sound feature vector, and the similarity is obtained from this shortest distance (a shorter distance indicating that the two features are more alike). In another embodiment, the processor 103 can calculate the similarity between the previous sound feature and the current sound feature through any algorithm, as long as it can compute the similarity between two sound features; no limitation is imposed here.
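The dynamic time warping computation mentioned above can be sketched as follows. This is a textbook DTW under assumed choices, not the patent's exact procedure: the local cost is taken to be the Euclidean distance between frame vectors, and the helper `similarity_from_distance`, which maps a distance to a score that grows as the features become more alike, is a hypothetical convention added so that the result can be compared against a similarity threshold.

```python
def dtw_distance(a, b):
    """Minimum cumulative distance between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative distance aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j],      # repeat frame of b
                                 cost[i][j - 1],      # repeat frame of a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def similarity_from_distance(distance):
    """Map a distance in [0, inf) to a similarity in (0, 1] (assumed mapping)."""
    return 1.0 / (1.0 + distance)
```

For example, `dtw_distance([[0.0], [1.0], [2.0]], [[0.0], [1.0], [1.0], [2.0]])` is `0.0`, because the repeated middle frame is absorbed by the warping, and the corresponding similarity is `1.0`.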

After the similarity is calculated, the method proceeds to step S213, in which the processor 103 determines whether the similarity between the current sound feature and the previous sound feature is greater than or equal to a similarity threshold. When the similarity is greater than or equal to the similarity threshold, the current sound feature is highly similar to the previous sound feature; that is, what the user is saying now is highly similar to what the user said the previous time. When the similarity is less than the similarity threshold, the current sound feature has a low similarity to the previous sound feature; that is, what the user is saying now has little in common with what the user said the previous time. When the similarity is greater than or equal to the similarity threshold, the method proceeds to step S214; when the similarity is less than the similarity threshold, the method proceeds to step S203.

In step S214, the processor 103 decides whether to wake up the electronic device 100 according to whether the previous sound feature corresponds to a false acceptance or a false rejection. That is, because the current sound feature is highly similar to the previous sound feature, the processor 103 decides whether to wake up the electronic device 100 according to the previous sound feature.

When the previous sound feature corresponds to a false acceptance and the similarity between the current sound feature and the previous sound feature is greater than or equal to the similarity threshold (that is, the two features are highly similar), the processor 103 does not wake up the electronic device 100. Because the previous sound feature corresponds to a false acceptance, what the user said the previous time was not meant to wake up the electronic device 100, yet it did. Therefore, when the previous sound feature corresponds to a false acceptance and what the user is saying now is similar to what was said the previous time, the processor 103 can conclude that the user does not currently want to wake up the electronic device 100, and so does not wake it up.

When the previous sound feature corresponds to a false rejection and the similarity between the current sound feature and the previous sound feature is greater than or equal to the similarity threshold (that is, the two features are highly similar), the processor 103 wakes up the electronic device 100. Because the previous sound feature corresponds to a false rejection, what the user said the previous time was in fact meant to wake up the electronic device 100, yet it failed to do so. Therefore, when the previous sound feature corresponds to a false rejection and what the user is saying now is similar to what was said the previous time, the processor 103 can conclude that the user currently wants to wake up the electronic device 100, and so wakes it up.
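The decision made in steps S211 to S214 can be sketched as a small function. The name `decide_from_history` and the labels `'FA'` and `'FR'` are hypothetical; returning `None` is an assumed way of signalling that the method should fall back to scoring against the keyword group speech model (step S203).

```python
def decide_from_history(similarity, similarity_threshold, previous_label):
    """Return True (wake up), False (do not wake up), or None (use the model).

    previous_label is 'FA' or 'FR' for a stored previous sound feature,
    or None when no previous sound feature is stored (S211).
    """
    if previous_label is None or similarity < similarity_threshold:
        return None  # S203: compare against the keyword group speech model
    # S214: a repeated false rejection wakes the device; a repeated
    # false acceptance does not.
    return previous_label == "FR"
```

A caller would only run the more expensive model comparison when this function returns `None`.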

In step S203, because the storage device 101 does not store a previous sound feature, or because the similarity between the current sound feature and the previous sound feature is less than the similarity threshold, the processor 103 does not decide whether to wake up the electronic device 100 according to the previous sound feature and the current sound feature. Instead, it decides according to the current sound feature and the keyword group speech model, as follows. In step S203, the processor 103 compares the current sound feature with the keyword group speech model through the matching algorithm to obtain a confidence score. Next, in step S204, the processor 103 determines whether the confidence score is greater than or equal to the confidence threshold in order to decide whether to wake up the electronic device 100 (step S205) or not (step S206), and then determines whether the current sound feature corresponds to a false acceptance (step S207) or a false rejection (step S209). When the current sound feature corresponds to a false acceptance or a false rejection, the current sound feature is stored (step S208 or step S210).

In this way, when the storage device 101 stores a previous sound feature, the processor 103 can decide whether to wake up the electronic device 100 from the similarity between the current sound feature and the previous sound feature together with the situation to which the previous sound feature corresponds (steps S211 to S214), without comparing the current sound feature with the keyword group speech model (step S203). This reduces the amount of computation and improves the efficiency and accuracy of waking up the electronic device 100 by voice.

Please refer to FIGS. 1, 6A, 6B, 8A, and 8B. FIGS. 8A and 8B are flowcharts of a method for waking up the electronic device 100 through voice recognition according to another embodiment of the present invention. In this embodiment, the processor 103 can adjust the confidence threshold according to whether the previous sound feature corresponds to a false acceptance or a false rejection.

Steps S301, S302, S311, S312, and S313 are similar to steps S201, S202, S211, S212, and S213 of FIG. 6A. The difference is that, after the processor 103 determines in step S313 that the similarity between the current sound feature and the previous sound feature is greater than or equal to the similarity threshold, the method proceeds to step S314, in which the processor 103 adjusts the confidence threshold according to whether the previous sound feature corresponds to a false acceptance or a false rejection.

In detail, in step S314, when the previous sound feature corresponds to a false acceptance and the similarity between the current sound feature and the previous sound feature is greater than or equal to the similarity threshold (that is, the two features are highly similar), the processor 103 raises the confidence threshold. The reason is that, because the previous sound feature corresponds to a false acceptance, what the user said the previous time was not meant to wake up the electronic device 100, yet it did. In this case, the confidence threshold is very likely too low, making it too easy for a sound feature to wake up the electronic device 100, so the confidence threshold is raised to reduce the occurrence of false acceptance.

When the previous sound feature corresponds to a false rejection and the similarity between the current sound feature and the previous sound feature is greater than or equal to the similarity threshold (that is, the two features are highly similar), the processor 103 lowers the confidence threshold. The reason is that, because the previous sound feature corresponds to a false rejection, what the user said the previous time was in fact meant to wake up the electronic device 100, yet it failed to do so. In this case, the confidence threshold is very likely too high, making it too hard for a sound feature to wake up the electronic device 100, so the confidence threshold is lowered to reduce the occurrence of false rejection.
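The adjustment in step S314 can be sketched as follows. The text does not say by how much the confidence threshold is changed, so the fixed `step` and the clamping bounds `lo` and `hi` are assumptions, as are the function name `adjust_threshold` and the labels `'FA'` and `'FR'`.

```python
def adjust_threshold(threshold, previous_label, step=0.05, lo=0.0, hi=1.0):
    """Raise the threshold after a repeated FA, lower it after a repeated FR.

    Intended to be called only when the current sound feature is similar
    enough to the stored previous sound feature (S313).
    """
    if previous_label == "FA":
        threshold += step   # make waking up harder
    elif previous_label == "FR":
        threshold -= step   # make waking up easier
    # Keep the threshold inside an assumed valid range.
    return min(hi, max(lo, threshold))
```

Clamping is a design choice added here so that repeated adjustments cannot push the threshold to a value that makes waking up impossible or unconditional.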

After step S314 is performed, the process proceeds to step S303. The processor 103 determines whether to wake up the electronic device 100 according to the current sound feature and the keyword group speech model, as follows. In step S303, the processor 103 compares the current sound feature with the keyword group speech model through a matching algorithm to obtain a confidence score. Next, in step S304, the processor 103 determines whether the confidence score is greater than or equal to the confidence threshold, in order to decide to wake up the electronic device 100 (step S305) or not to wake up the electronic device 100 (step S306), and then determines whether the current sound feature corresponds to a false acceptance (step S307) or a false rejection (step S309). When the current sound feature corresponds to a false acceptance or a false rejection, the current sound feature is stored (step S308 or step S310).

In this embodiment, when the storage device 101 stores the previous sound feature, the processor 103 can adjust the confidence threshold according to the similarity between the current sound feature and the previous sound feature and according to the situation to which the previous sound feature corresponds. Raising the confidence threshold reduces false acceptances; lowering the confidence threshold reduces false rejections. In this way, adjusting the confidence threshold effectively reduces false acceptances and false rejections, increasing the probability and accuracy of waking up the electronic device 100 when the user intends to.

Please refer to FIGS. 1, 3 and 9. FIG. 9 is a flowchart of a method for waking up the electronic device 100 through voice recognition according to another embodiment of the present invention. In this embodiment, the processor 103 can adjust the confidence threshold according to user interaction. Steps S401 to S410 are similar to steps S101 to S110 of FIG. 3. The difference is that the process proceeds to step S411 after the processor 103 determines that the current sound feature corresponds to a false acceptance and stores the current sound feature in the storage device 101 (steps S407 and S408), or after the processor 103 determines that the current sound feature corresponds to a false rejection and stores the current sound feature in the storage device 101 (steps S409 and S410). In step S411, the processor 103 can adjust the confidence threshold according to user interaction. For example, the processor 103 can determine whether the electronic device 100 is turned off by the user immediately after each time it is woken up (that is, a false acceptance), or whether the user repeatedly tries to wake up the electronic device 100 with the same sound content but fails every time (that is, a false rejection). If such a situation occurs consecutively, the confidence threshold is very likely too low or too high. The processor 103 can therefore decide whether to adjust the confidence threshold according to whether the number of consecutive occurrences is excessive.

Specifically, the processor 103 determines whether the number of consecutive false acceptances or the number of consecutive false rejections is greater than a count threshold, in order to decide whether the confidence threshold needs to be adjusted. The count threshold can be defined by the designer. When the number of consecutive false acceptances is greater than the count threshold, the confidence threshold is very likely too low, so that sound features wake up the electronic device 100 too easily; the processor 103 then raises the confidence threshold to reduce the occurrence of false acceptance. When the number of consecutive false rejections is greater than the count threshold, the confidence threshold is very likely too high, so that sound features do not easily wake up the electronic device 100; the processor 103 then lowers the confidence threshold to reduce the occurrence of false rejection.
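The count-based rule of step S411 might be sketched as below. The class name `StreakAdjuster`, the string labels, the adjustment size `step`, and the decision to reset a streak after an adjustment or after a correct result are assumptions made for illustration; the description only says that the designer defines the count threshold and that the confidence threshold is raised or lowered once the consecutive count exceeds it.

```python
class StreakAdjuster:
    """Tracks consecutive false acceptances/rejections and adjusts the threshold."""

    def __init__(self, threshold, count_threshold, step):
        self.threshold = threshold              # confidence threshold
        self.count_threshold = count_threshold  # designer-defined count threshold
        self.step = step                        # hypothetical adjustment size
        self.false_accepts = 0
        self.false_rejects = 0

    def record(self, outcome):
        """Record 'false_acceptance', 'false_rejection', or 'correct'."""
        if outcome == "false_acceptance":
            self.false_accepts += 1
            self.false_rejects = 0
        elif outcome == "false_rejection":
            self.false_rejects += 1
            self.false_accepts = 0
        else:
            # Assumption: any correct result breaks both streaks.
            self.false_accepts = 0
            self.false_rejects = 0

        if self.false_accepts > self.count_threshold:
            self.threshold += self.step   # too easy to wake: raise
            self.false_accepts = 0
        elif self.false_rejects > self.count_threshold:
            self.threshold -= self.step   # too hard to wake: lower
            self.false_rejects = 0
        return self.threshold
```

With a count threshold of 2, the third consecutive false acceptance is the first one that triggers an adjustment, because the description requires the count to be greater than the count threshold.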

In this way, the processor 103 can adjust the confidence threshold according to user interaction to reduce the occurrence of false acceptance or false rejection.

Please refer to FIGS. 1, 10A and 10B. FIGS. 10A and 10B show a flowchart of a method for activating a specific function of the electronic device 100 through voice recognition according to another embodiment of the present invention. In this embodiment, after the electronic device 100 is woken up, the processor 103 can further activate a specific function of the electronic device 100 through voice recognition. That is, after the electronic device 100 is woken up, the sound receiving device 102 further receives a current sound signal. Next, the processor 103 determines whether a previous sound feature of a previous sound signal is stored in the storage device 101. When the storage device 101 stores the previous sound feature of the previous sound signal, the processor 103 decides whether to activate the specific function of the electronic device 100 according to the previous sound feature. When the storage device 101 does not store the previous sound feature of the previous sound signal, the processor 103 decides whether to activate the specific function of the electronic device 100 according to the current sound feature.

In step S501, the electronic device 100 is woken up. In one embodiment, the user wakes up the electronic device 100 by pressing a button or by touching the touch screen. In another embodiment, the user wakes up the electronic device 100 by voice; the method for waking up the electronic device 100 by voice is as described above and is not repeated here.

In step S502, the sound receiving device 102 receives a current sound signal. The user can speak a specific voice command to activate a specific function of the electronic device 100. For example, the specific voice command may include at least one of "Pairing", "Check battery", and "Am I connected". The current sound signal is the sound signal corresponding to the specific voice command spoken by the user. In step S503, the processor 103 extracts a current sound feature from the current sound signal.

Next, in step S504, the processor 103 determines whether a previous sound feature of a previous sound signal is stored in the storage device 101. When the storage device 101 stores the previous sound feature, the process proceeds to step S505; when the storage device 101 does not store the previous sound feature, the process proceeds to step S508.

In step S505, the processor 103 calculates a similarity between the current sound feature and the previous sound feature. After the similarity is calculated, the process proceeds to step S506, in which the processor 103 determines whether the similarity between the current sound feature and the previous sound feature is greater than or equal to a similarity threshold. When the similarity is greater than or equal to the similarity threshold, the process proceeds to step S507; when the similarity is less than the similarity threshold, the process proceeds to step S508.
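The claims name Dynamic Time Warping (DTW) as one way to calculate the similarity. A minimal sketch of steps S505 and S506 under that choice is given below. The representation of a sound feature as a sequence of equal-length numeric frames, the Euclidean frame distance, and the mapping `1 / (1 + distance)` from DTW distance to a similarity value are assumptions for illustration only; the source does not specify them.

```python
import math


def dtw_distance(a, b):
    """Classic DTW distance between two sequences of feature frames."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best accumulated cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def similarity(a, b):
    """Hypothetical mapping from DTW distance to a similarity in (0, 1]."""
    return 1.0 / (1.0 + dtw_distance(a, b))


def is_similar(a, b, similarity_threshold):
    """Step S506: compare the similarity with the similarity threshold."""
    return similarity(a, b) >= similarity_threshold
```

Because DTW allows one sequence to be stretched in time against the other, the same utterance spoken more slowly can still yield a small distance and therefore a high similarity.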

In step S507, the processor 103 decides whether to activate the specific function of the electronic device 100 according to whether the previous sound feature corresponds to a false acceptance or a false rejection. When the previous sound feature corresponds to a false acceptance and the similarity between the current sound feature and the previous sound feature is greater than or equal to the similarity threshold, the processor 103 does not activate the specific function of the electronic device 100. Because the previous sound feature corresponds to a false acceptance, what the user said the previous time was not intended to activate the specific function of the electronic device 100, yet the specific function was activated. Therefore, when the previous sound feature corresponds to a false acceptance and what the user is saying now is similar to what was said the previous time, the processor 103 can determine that the user does not currently want to activate the specific function of the electronic device 100, and so does not activate it. When the previous sound feature corresponds to a false rejection and the similarity between the current sound feature and the previous sound feature is greater than or equal to the similarity threshold, the processor 103 activates the specific function of the electronic device 100. Because the previous sound feature corresponds to a false rejection, what the user said the previous time was in fact intended to activate the specific function of the electronic device 100, yet the specific function was not activated. Therefore, when the previous sound feature corresponds to a false rejection and what the user is saying now is similar to what was said the previous time, the processor 103 can determine that the user currently wants to activate the specific function of the electronic device 100, and so activates it. For example, when the specific voice command corresponding to the current sound feature is "Pairing", the previous sound feature corresponds to a false rejection, and the similarity between the current sound feature and the previous sound feature is greater than or equal to the similarity threshold, the processor 103 activates, according to the current sound feature, the function of confirming whether the wireless communication pairing between the electronic device 100 and another electronic device is successful.
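The branch taken in steps S504 to S507 can be summarized in a short sketch. The function name `decide_from_previous`, the string labels, and the use of `None` to signal "fall back to model matching in step S508" are assumptions introduced here for illustration.

```python
def decide_from_previous(previous_outcome, similarity, similarity_threshold):
    """Decide activation from the stored previous sound feature.

    Returns True to activate the specific function, False to not activate it,
    and None when the decision must fall back to comparing the current sound
    feature with the keyword group speech model (step S508).
    """
    if previous_outcome is None:
        # Step S504: no previous sound feature is stored.
        return None
    if similarity < similarity_threshold:
        # Step S506: the current input does not resemble the previous one.
        return None
    if previous_outcome == "false_acceptance":
        # The similar previous input was wrongly accepted: do not activate.
        return False
    if previous_outcome == "false_rejection":
        # The similar previous input was wrongly rejected: activate.
        return True
    return None
```

A caller would run the model-matching path only when this function returns `None`, which is where the efficiency gain described for this embodiment comes from.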

In step S508, the processor 103 compares the current sound feature with the keyword group speech model through a matching algorithm to obtain a confidence score. That is, when the storage device 101 does not store the previous sound feature, or when the similarity between the current sound feature and the previous sound feature is low, the processor 103 does not rely on the previous sound feature to decide whether to activate the specific function of the electronic device 100. Instead, the processor 103 compares the current sound feature with the keyword group speech model to decide whether to activate the specific function of the electronic device 100 according to the current sound feature.

After the confidence score is obtained in step S508, the process proceeds to step S509, in which the processor 103 determines whether the confidence score is greater than or equal to the confidence threshold. When the confidence score is greater than or equal to the confidence threshold, the process proceeds to step S513; when the confidence score is less than the confidence threshold, the process proceeds to step S510.

In step S513, the processor 103 activates the specific function of the electronic device 100 according to the current sound feature. For example, according to the current sound feature, the processor 103 activates at least one of the following: the function of confirming whether the wireless communication pairing between the electronic device 100 and another electronic device is successful, the function of checking the battery level of the electronic device 100, and the function of checking whether the electronic device 100 is connected to a network. More specifically, if the specific voice command corresponding to the current sound feature received by the sound receiving device 102 is "Pairing", the processor 103 activates the function of confirming whether the wireless communication pairing between the electronic device 100 and another electronic device is successful. If the specific voice command is "Check battery", the processor 103 activates the function of checking the battery level of the electronic device 100. If the specific voice command is "Am I connected", the processor 103 activates the function of checking whether the electronic device 100 is connected to a network.

Next, in step S514, the processor 103 determines whether the current sound feature corresponds to a false acceptance. A false acceptance is a situation in which the processor 103 activates the specific function of the electronic device 100 when this was not intended. One way to make this determination is to check whether the user turns off the specific function within a specific time after the processor 103 activates it according to the current sound feature. If the user turns off the specific function within the specific time, the user did not want to activate it, yet it was activated by the current sound feature, so the situation can be judged to be a false acceptance. Next, in step S515, the processor 103 stores the current sound feature in the storage device 101 and records that the current sound feature corresponds to a false acceptance.

In step S510, the processor 103 does not activate the specific function of the electronic device 100 according to the current sound feature. Next, in step S511, the processor 103 determines whether the current sound feature corresponds to a false rejection. A false rejection is a situation in which the user wants to activate the specific function of the electronic device 100 with the current sound but fails to do so. One way to make this determination is to check that the current sound feature did not activate the specific function and that the confidence score obtained by comparing the current sound feature with the keyword group speech model is less than the confidence threshold by a difference within a predetermined range. In this situation, because the confidence score is less than the confidence threshold and the difference is within the predetermined range (that is, the confidence score is very close to the confidence threshold), the user is taken to have wanted to activate the specific function with the current sound, yet the specific function was not activated by the current sound feature, so the situation can be judged to be a false rejection. Next, in step S512, the processor 103 stores the current sound feature in the storage device 101 and records that the current sound feature corresponds to a false rejection.
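The model-matching path of steps S509 to S515 can be sketched as a single function. The name `match_and_label`, the parameter `margin` standing for the predetermined range, the boolean `turned_off_quickly` standing for "the user turned the function off within the specific time", and the inclusive comparison against `margin` are assumptions for illustration; the confidence score itself is taken as given, since the matching algorithm is outside this sketch.

```python
def match_and_label(score, threshold, margin, turned_off_quickly=False):
    """Return (activated, label) for one input on the model-matching path.

    `label` is 'false_acceptance', 'false_rejection', or None. A non-None
    label means the current sound feature would be stored with that label.
    """
    if score >= threshold:
        # Steps S513 to S515: activate, then check the user's reaction.
        label = "false_acceptance" if turned_off_quickly else None
        return True, label
    # Steps S510 to S512: do not activate; a near miss counts as a
    # false rejection (assumption: the range boundary is inclusive).
    label = "false_rejection" if threshold - score <= margin else None
    return False, label
```

A score well below the threshold is treated as a correct rejection and nothing is stored, which keeps unrelated sounds from influencing later decisions.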

In this embodiment, voice recognition is applied to activating a specific function of the electronic device 100. After the electronic device 100 is woken up, the user can activate a specific function of the electronic device 100 through voice recognition. The processor 103 can activate the specific function according to the current sound feature, and can record whether the current sound feature activated the specific function because of a false acceptance or failed to activate it because of a false rejection. Therefore, when the electronic device 100 receives the user's next sound and a previous sound feature has been stored, the processor 103 can decide whether to activate the specific function according to the similarity between the sound feature of the next sound and the stored sound feature, and according to the situation to which the stored sound feature corresponds, without comparing against the keyword group speech model. This improves the efficiency of activating the specific function of the electronic device 100 by voice.

In the method for waking up an electronic device through voice recognition disclosed in the above embodiments of the present invention, when the storage device stores a previous sound feature, whether to wake up the electronic device can be decided according to whether the previous sound feature corresponds to a false acceptance or a false rejection and according to the similarity between the current sound feature and the previous sound feature, which improves the efficiency of waking up the electronic device. In addition, the present invention can adjust the confidence threshold according to whether the previous sound feature corresponds to a false acceptance or a false rejection and according to the similarity between the current sound feature and the previous sound feature, so as to reduce the probability of false acceptance or false rejection. False acceptance and false rejection are reduced without online adaptation of the keyword group speech model and without pre-training the keyword group speech model. The adjustment is therefore of low complexity, and it effectively reduces false acceptance and false rejection, increasing the probability and accuracy of successfully waking up the electronic device.

In summary, although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the present invention. A person having ordinary skill in the art to which the present invention pertains can make various changes and modifications without departing from the spirit and scope of the present invention. Therefore, the scope of protection of the present invention is defined by the appended claims.

S101, S102, S103, S104, S105, S106, S107, S108, S109, S110‧‧‧process steps

Claims

A method for waking up an electronic device through voice recognition, comprising: receiving a current voice signal; performing a voice recognition wake-up algorithm; receiving and determining a user feedback; and adjusting the voice recognition wake-up algorithm.

A method for waking up an electronic device through voice recognition, comprising: receiving a current sound signal; extracting a current sound feature of the current sound signal; and determining whether a previous sound feature of a previous sound signal is stored; wherein when the previous sound feature is stored, at least one of adjusting a confidence threshold and determining whether to wake up the electronic device is performed according to whether the previous sound feature corresponds to a false acceptance or a false rejection and according to a similarity between the current sound feature and the previous sound feature.

The method of claim 2, wherein when the previous sound feature corresponds to the false acceptance and the similarity is greater than or equal to a similarity threshold, the confidence threshold is raised or the electronic device is not woken up.

The method of claim 2, wherein when the previous sound feature corresponds to the false rejection and the similarity is greater than or equal to a similarity threshold, the confidence threshold is lowered or the electronic device is woken up.

The method of claim 2, wherein the similarity is calculated by a Dynamic Time Warping (DTW) algorithm.

The method of claim 2, wherein when the previous sound feature of the previous sound signal is not stored, the method further comprises: comparing the current sound feature with a keyword group speech model through a matching algorithm to obtain a confidence score; determining whether the confidence score is greater than or equal to the confidence threshold; when the confidence score is greater than or equal to the confidence threshold, waking up the electronic device; and when the confidence score is less than the confidence threshold, not waking up the electronic device.

The method of claim 6, wherein after waking up the electronic device, the method further comprises: determining whether the current sound feature corresponds to a false acceptance; wherein when the current sound feature corresponds to the false acceptance, the current sound feature is stored and the confidence threshold is adjusted based on a user interaction.

The method of claim 6, wherein after not waking up the electronic device, the method further comprises: determining whether the current sound feature corresponds to a false rejection; wherein when the current sound feature corresponds to the false rejection, the current sound feature is stored and the confidence threshold is adjusted based on a user interaction.

The method of claim 8, wherein the step of determining whether the current sound feature corresponds to the false rejection comprises: determining whether the difference between the confidence score and the confidence threshold is within a predetermined range; and when the difference between the confidence score and the confidence threshold is within the predetermined range, determining that the current sound feature corresponds to the false rejection.

The method of claim 6, wherein the matching algorithm is a Viterbi algorithm.

The method of claim 7 or 8, wherein the user interaction comprises a number of consecutive false acceptances or a number of consecutive false rejections, and the step of adjusting the confidence threshold based on the user interaction comprises: determining whether the number of consecutive false acceptances is greater than a count threshold or determining whether the number of consecutive false rejections is greater than the count threshold; when the number of consecutive false acceptances is greater than the count threshold, raising the confidence threshold; and when the number of consecutive false rejections is greater than the count threshold, lowering the confidence threshold.

An electronic device comprising: a storage device; a sound receiving device for receiving a current sound signal; and a processor for extracting a current sound feature of the current sound signal and determining whether the storage device stores a previous sound feature of a previous sound signal, wherein when the storage device stores the previous sound feature, the processor performs at least one of adjusting a confidence threshold and determining whether to wake up the electronic device, according to whether the previous sound feature corresponds to a false acceptance or a false rejection and according to a similarity between the current sound feature and the previous sound feature.

The electronic device of claim 12, wherein the processor is configured to raise the confidence threshold or not wake up the electronic device when the previous sound feature corresponds to the false acceptance and the similarity is greater than or equal to a similarity threshold.

The electronic device of claim 12, wherein the processor is configured to lower the confidence threshold or wake up the electronic device when the previous sound feature corresponds to the false rejection and the similarity is greater than or equal to a similarity threshold.

The electronic device of claim 12, wherein the processor is configured to calculate the similarity through a Dynamic Time Warping (DTW) algorithm.

The electronic device of claim 12, wherein the processor is configured to compare the current sound feature with a keyword group speech model through a matching algorithm to obtain a confidence score, and to determine whether the confidence score is greater than or equal to the confidence threshold; wherein when the confidence score is greater than or equal to the confidence threshold, the processor wakes up the electronic device, and when the confidence score is less than the confidence threshold, the processor does not wake up the electronic device.

The electronic device of claim 16, wherein after the processor wakes up the electronic device, the processor is further configured to determine whether the current sound feature corresponds to a false acceptance, and when the current sound feature corresponds to the false acceptance, the processor stores the current sound feature in the storage device and adjusts the confidence threshold based on a user interaction.

The electronic device of claim 16, wherein after the processor does not wake up the electronic device, the processor is further configured to determine whether the current sound feature corresponds to a false rejection, and when the current sound feature corresponds to the false rejection, the processor stores the current sound feature in the storage device and adjusts the confidence threshold based on a user interaction.

The electronic device of claim 18, wherein the processor determines whether the difference between the confidence score and the confidence threshold is within a predetermined range, and when the difference between the confidence score and the confidence threshold is within the predetermined range, the processor determines that the current sound feature corresponds to the false rejection.

The electronic device of claim 16, wherein the matching algorithm is a Viterbi algorithm.

The electronic device of claim 17 or 18, wherein the user interaction comprises a number of consecutive false acceptances or a number of consecutive false rejections, and the processor is further configured to determine whether the number of consecutive false acceptances is greater than a count threshold or whether the number of consecutive false rejections is greater than the count threshold, to raise the confidence threshold when the number of consecutive false acceptances is greater than the count threshold, and to lower the confidence threshold when the number of consecutive false rejections is greater than the count threshold.