TW201329785A - Interactive voice command editing system and method

Info

Publication number: TW201329785A
Application number: TW101101370A
Authority: TW
Inventors: Chi-Tien Chiu; Hsien-Cheng Liao
Original assignee: Ind Tech Res Inst
Priority date: 2012-01-13
Filing date: 2012-01-13
Publication date: 2013-07-16
Also published as: TWI457788B

Abstract

The present invention relates to an interactive voice command editing method. The method comprises: recording a new action; comparing the new action with historical action information to determine whether the new action is the same as a historical action; establishing a new voice command and generating a new voice command feature vector; and verifying the new voice command feature vector against a predetermined threshold value in order to store the new voice command.

Description

Interactive voice instruction construction system and method

The present invention relates to an interactive voice command construction method and system, and more particularly to an interactive voice command construction method and system that can generate a corresponding voice command from a new action.

In recent years, consumer electronics have become increasingly widespread in daily life, and the number of users has grown substantially. Applications include personal handheld systems, tablet computers, in-vehicle electronics, robots, and electronic toys integrated with everyday equipment. The convenience these products offer makes the human-machine interface all the more important. The purpose of the present invention is to make human-machine interaction convenient.

When people use these electronic products, the most intuitive approaches are to operate them by hand or to use voice commands. Operating these products, however, often requires a series of related actions, which can make users reluctant to use them. Alternatively, the voice commands are fixed by the device and are not easy for the user to modify. The present invention represents such a series of actions with an intuitive voice command, letting the user associate a new action with a voice command of their own to make operation more convenient.

In one embodiment, the present invention provides an interactive voice command construction method, including: recording a new action, which includes determining the start of the action; and comparing the new action with historical action information to determine whether an action identical to the new action exists. If the action is new, a voice command is added, and the voice command is verified against a preset threshold value to decide whether it should be created.

In one embodiment, the present invention provides an interactive voice command construction system, including: an action start/end detection module that determines the start of a new action; an action recording module for recording a new action; a database for storing historical action information and historical voice commands; an action comparison module that receives the new action and compares it with the historical action information to determine whether an action identical to the new action exists; a voice command addition module that adds a new voice command corresponding to the new action; and a voice command verification module that verifies the new voice command against the historical voice commands using a threshold value to decide whether the new voice command is added.

To give the examiner a fuller understanding of the features, objects, and functions of the present invention, the detailed structure and design rationale of the device are described below.

FIG. 1 shows an interactive voice command construction system 1 according to an embodiment of the present invention. The system includes an action start/end detection module 10, an action recording module 11, a database 12, an action comparison module 13, a voice command addition module 14, and a voice command verification module 15. The action start/end detection module 10 uses signals from a sensing device or camera device to build an action rule table, with which it determines the start and end of an action. The action recording module 11 captures and records a new action by means of the sensing device or camera device; the new action may be the motion of an object such as a vehicle or a toy, and the sensing and camera devices may include cameras, sensors, and the like. The database 12 stores historical action information and historical voice commands. The action comparison module 13 receives the new action and compares it with the historical action information to determine whether an action identical to the new action exists. If the historical action information contains no identical action, a voice input is provided for the new action so that the voice command addition module 14 creates a new voice command and generates a new voice command feature value.

The voice command verification module 15 verifies the new voice command feature value against a preset threshold value associated with the historical voice commands. The threshold may be set in advance to an empirical value or produced by another algorithm. A log-likelihood algorithm yields a score gap between the new voice command feature value and the historical voice command feature values in the database 12, and this gap is compared with the preset threshold to decide whether the new voice command is created. If the log-likelihood score gap is greater than the preset threshold, the new voice command is stored in the database 12, and the voice command verification module 15 adds the new feature value to the historical voice commands in the database 12, where it becomes part of the range of historical log-likelihood scores and serves as a basis for computing the score gap of the next new voice command. Otherwise, if the gap is less than the preset threshold, the voice command verification module 15 discards the new voice command or overwrites the action corresponding to an existing historical voice command.

In addition, if the historical action information does contain an action identical to the new action, the action comparison module 13 abandons creation of the new voice command or changes an existing voice command. When the new action is not directly comparable and cannot be compared directly with the historical action information, an algorithm is further applied to identify the new action; the algorithm may be a neural network algorithm, a decision tree algorithm, a support vector machine (SVM), and so on. The neural network algorithms referred to here include rote learning, learning by instruction, learning by analogy, learning by induction, and so on.

FIG. 2 is a flowchart of the action comparison in an interactive voice command construction method according to an embodiment of the present invention; the flow can be described together with the interactive voice command construction system of FIG. 1. First (step S201), the action start/end detection module 10 and the action recording module 11 capture and record a new action by means of a sensing device or camera device. The new action may be the motion of an object such as a vehicle or a toy, and the sensing and camera devices may include cameras, sensors, and the like. Next (step S202), the action comparison module 13 compares the new action with the historical action information stored in the database 12 to determine (step S203) whether an action identical to the new action exists. If the historical action information contains no identical action (step S204), the voice command addition module 14 is activated and a voice input is provided for the new action to create a new voice command and generate a new voice command feature value. Otherwise (step S205), if the historical action information contains an identical action, creation of the new voice command is abandoned and a new action is recorded again, or an existing voice command is changed, in which case the voice command addition module 14 is activated. When the new action is not directly comparable and cannot be compared directly with the historical action information, an algorithm is further applied to identify the new action; the algorithm may be a neural network algorithm, a decision tree algorithm, a support vector machine (SVM), and so on. The neural network algorithms referred to here include rote learning, learning by instruction, learning by analogy, learning by induction, and so on.
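The branch taken in steps S202 to S205 can be illustrated with a minimal Python sketch. The names `ActionDatabase` and `compare_action`, and the encoding of an action as a tuple of primitive steps, are assumptions made for illustration; the specification does not prescribe a data format, and the fallback classifier for actions that are not directly comparable is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class ActionDatabase:
    """Hypothetical stand-in for database 12: recorded action -> voice command label."""
    commands_by_action: dict = field(default_factory=dict)


def compare_action(db, new_action):
    """Steps S202-S205 for a directly comparable action.

    new_action is a sequence of primitive steps (an assumed encoding).
    Returns ("add_command", None) when no identical action is stored (S204),
    or ("already_known", label) when an identical action exists (S205).
    """
    key = tuple(new_action)
    if key in db.commands_by_action:
        return ("already_known", db.commands_by_action[key])
    return ("add_command", None)


db = ActionDatabase({("forward", "left", "forward"): "turn left"})
print(compare_action(db, ["forward", "left", "forward"]))
print(compare_action(db, ["forward", "right", "forward"]))
```

In the "already_known" case the caller would then ask the user whether to change the existing command or record another action, as step S205 describes.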

FIG. 3 is a flowchart of voice command verification in an interactive voice command construction method according to an embodiment of the present invention; it can be described together with the action comparison flowchart of FIG. 2 and the interactive voice command construction system of FIG. 1. After the voice command addition module 14 has been activated and a voice input for the new action has produced a new voice command and its feature value, the new voice command can be verified (step S301). Next (step S302), using a log-likelihood algorithm, the gap between the log-likelihood scores of the historical voice command feature values and the new voice command feature value is compared with a preset threshold to verify the new feature value. If the gap is greater than the preset threshold, the new voice command feature value is taken as a new voice command parameter, and (step S303) the new voice command is stored in the database 12 as one of the bases for computing the log-likelihood score gap of the next new voice command. Otherwise, if the gap is less than the preset threshold, (step S304) the new voice command is discarded, or the action corresponding to an existing historical voice command is overwritten and stored in the database 12, and (step S305) a new voice command is input again.
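The threshold test of steps S302 to S304 can be sketched as follows. The specification does not say how the log-likelihood scores are computed, so this sketch assumes each stored command is modeled as an isotropic Gaussian centered on its feature vector, and it defines the score gap as the new feature's score under its own model minus its score under a historical model. The function names and the threshold value are hypothetical.

```python
import math


def log_likelihood(x, mean, var=1.0):
    """Log-likelihood of feature vector x under an isotropic Gaussian N(mean, var * I)."""
    d = len(x)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, mean))
    return -0.5 * (d * math.log(2 * math.pi * var) + sq_dist / var)


def score_gap(new_feat, historical_feats):
    """Smallest gap between the new command's own score and its score under each stored model."""
    own = log_likelihood(new_feat, new_feat)
    return min(own - log_likelihood(new_feat, h) for h in historical_feats)


def verify_and_store(new_feat, historical_feats, threshold):
    """S302-S304: store the new feature if its score gap exceeds the threshold, else reject it."""
    if not historical_feats or score_gap(new_feat, historical_feats) > threshold:
        historical_feats.append(list(new_feat))  # S303: becomes a basis for later comparisons
        return True
    return False  # S304: caller discards the command and may prompt for re-input (S305)


history = [[3.0, 4.0]]
print(verify_and_store([0.0, 0.0], history, threshold=2.0))
print(verify_and_store([0.1, 0.0], history, threshold=2.0))
```

Under these assumptions a large gap means the new command sounds unlike every stored command, which is why only commands above the threshold are kept.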

In addition, the voice command addition module 14 of the present invention can create both speaker-dependent and speaker-independent voice commands. A speaker-dependent voice command captures characteristics of the user's own voice; the most common algorithm is dynamic time warping (DTW). For speech recognition, the DTW algorithm uses two similar audio recordings to create a voice command: the feature parameters of the recordings are extracted first, and the DTW algorithm then builds the model parameters of the voice command. A speaker-independent voice command can use a speech acoustic model trained with the hidden Markov model (HMM) algorithm, with the user recording audio to adjust the model parameters and create the voice command. A common recognition method is the Viterbi re-estimation algorithm: feature parameters are first extracted from the user's recorded command audio and are then combined with the original HMM model to adapt new voice command model parameters.
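As an illustration of the DTW step, the sketch below computes the classic dynamic time warping alignment cost between two feature sequences. It is a simplification under stated assumptions: each frame is a single number rather than a feature vector, the local cost is the absolute difference, and the function name `dtw_distance` is hypothetical. The specification does not give its own DTW formulation.

```python
def dtw_distance(a, b):
    """Dynamic time warping alignment cost between two sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cost of aligning the first i frames of a with the first j frames of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # frame of a repeated against the same frame of b
                cost[i][j - 1],      # frame of b repeated against the same frame of a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


take_one = [1, 2, 3]
take_two = [1, 2, 2, 3]  # same contour, spoken slightly slower
print(dtw_distance(take_one, take_two))
```

Two recordings of the same command that differ only in speaking rate align at low cost, which is what makes DTW suitable for building a speaker-dependent template from two similar takes.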

The interactive voice command construction system and method of the present invention can be applied, in one embodiment, to an interactive voice command construction system for a toy car. In this application, the user operates the toy car to create a voice command. When the toy car performs an "S"-shaped maneuver, the action start/end detection module 10 and the action recording module 11 record the sequence of actions the user has just performed. The action comparison module 13 then compares the sequence with the database 12 to determine whether it is a new action. If it is, the system activates the voice command addition module 14 for voice command input. Otherwise, if the action already exists, the system asks the user whether to change the existing voice command or to abandon creation of the voice command and wait again for a "new action" to occur and be recorded. The action of the toy car can be represented by the rotational positions of the servo motors on its axles. Once the new action is established, the system can prompt the user to record and input a voice command. The user can then input the voice command that is to represent the new action; the voice command may consist of one or more spoken recordings, such as "Buddy, zigzag" (夥計蛇行). The voice command addition module 14 then uses a speech recognition algorithm to convert the user's voice input into a set of voice model parameters, which the voice command verification module 15 verifies against the voice commands in the database 12. The voice command verification module 15 uses a threshold value as the verification criterion: when the verification value is greater than the set threshold, the new voice command is created and the system maps it to the new action. Thereafter, when the user wants the car to perform the S-shaped maneuver, the user can input the voice command directly, which speeds up operation.
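Because the toy car's action is represented by servo rotational positions, two recordings of the same maneuver will rarely match exactly. The sketch below shows one way such recordings might be compared with a tolerance. The frame layout (one tuple of servo angles per time step), the tolerance of 5 degrees, and the name `same_action` are all assumptions for illustration and are not taken from the specification.

```python
def same_action(recorded, stored, tolerance_deg=5.0):
    """True if both recordings have the same number of frames and every servo
    angle differs by at most tolerance_deg.

    Each recording is a list of frames; each frame is a tuple of servo angles
    in degrees (assumed to list the same servos in the same order).
    """
    if len(recorded) != len(stored):
        return False
    return all(
        abs(r - s) <= tolerance_deg
        for frame_r, frame_s in zip(recorded, stored)
        for r, s in zip(frame_r, frame_s)
    )


# Hypothetical S-shaped maneuver: steering angles of two front servos over five frames.
s_curve = [(0, 0), (20, 20), (0, 0), (-20, -20), (0, 0)]
noisy_repeat = [(1, -1), (22, 19), (0, 2), (-18, -21), (0, 0)]
mirrored = [(0, 0), (-20, -20), (0, 0), (20, 20), (0, 0)]

print(same_action(noisy_repeat, s_curve))
print(same_action(mirrored, s_curve))
```

A fixed-length comparison like this is the simplest option; recordings of different durations would need resampling or an alignment method before comparison.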

FIG. 4 shows another embodiment, in which the present invention is applied to interactive voice command construction for image processing on a computer system. As shown in FIG. 4, when the user wants to shrink and rotate picture 1 to obtain picture 2, the user performs a shrink action on picture 1, then performs a rotate action to obtain picture 2, and saves the result. The action comparison module 13 of the interactive voice command construction system then compares the sequence of actions the user has just performed with the database 12; if the sequence is found to be a new action, the user is asked whether to add a voice command. The user can input the voice command that is to represent the new action. The voice command addition module 14 converts the user's voice input into a set of voice models, which the voice command verification module 15 compares with the voice commands in the database 12. When the comparison shows that the voice command is new, the system maps the new voice command to the new action. Thereafter, when the user wants to process a picture in the same way, the user can input the voice command directly, which speeds up processing.
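Once a voice command is mapped to a recorded action sequence, issuing the command amounts to replaying that sequence. The following sketch models this for the picture example. The picture is reduced to a hypothetical `(width, height, angle)` state, and the scale factor, rotation angle, command label, and function names are illustrative assumptions; the specification gives none of these values.

```python
def apply_action(state, action):
    """Apply one recorded editing action to a (width, height, angle) picture state."""
    width, height, angle = state
    kind, value = action
    if kind == "scale":
        return (width * value, height * value, angle)
    if kind == "rotate":
        return (width, height, (angle + value) % 360)
    raise ValueError(f"unknown action: {kind}")


def replay(state, actions):
    """Replay a stored action sequence in order and return the final state."""
    for action in actions:
        state = apply_action(state, action)
    return state


# Hypothetical mapping from a verified voice command label to its recorded actions.
macros = {"shrink and turn": [("scale", 0.5), ("rotate", 90)]}

picture_one = (800, 600, 0)
picture_two = replay(picture_one, macros["shrink and turn"])
print(picture_two)
```

Storing the sequence rather than the resulting picture is what lets the same command be reused on any later picture.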

The present invention can dynamically generate voice commands from the user's operations. When the user completes an action, the system compares it against the action database; if the action is new, the system issues a request to add a voice command. The user then immediately records a voice command through a recording device, and the system creates the new voice command via the voice command verification mechanism.

The above describes only exemplary embodiments of the present invention and should not be used to limit its scope. All equivalent changes and modifications made in accordance with the claims of the present invention remain within the scope covered by this patent. The examiner's careful consideration and approval are respectfully requested.

1 ... Interactive voice command construction system

10 ... Action start/end detection module

11 ... Action recording module

12 ... Database

13 ... Action comparison module

14 ... Voice command addition module

15 ... Voice command verification module

S201~S205 ... Steps

S301~S305 ... Steps

FIG. 1 shows an interactive voice command construction system 1 according to an embodiment of the present invention.

FIG. 2 is a flowchart of the action comparison in an interactive voice command construction method according to an embodiment of the present invention.

FIG. 3 is a flowchart of voice command verification in an interactive voice command construction method according to an embodiment of the present invention.

FIG. 4 shows an embodiment in which the present invention is applied to interactive voice command construction for image processing on a computer system.

Claims

An interactive voice command construction method, comprising: recording a new action; and comparing the new action with historical action information to determine whether an action identical to the new action exists.

The interactive voice command construction method of claim 1, further comprising: if the historical action information contains no action identical to the new action, inputting a voice according to the new action to create a new voice command and generate a new voice command feature value.

The interactive voice command construction method of claim 2, further comprising: verifying the new voice command feature value according to a preset threshold value associated with the historical voice commands.

The interactive voice command construction method of claim 3, further comprising: if the log-likelihood score gap between the new voice command feature value and the historical voice command feature values is greater than the preset threshold value, storing the new voice command and using the new voice command feature value as part of the range for the log-likelihood score gap of the next new voice command.

The interactive voice command construction method of claim 3, further comprising: if the log-likelihood score gap between the new voice command feature value and the historical voice command feature values is less than the preset threshold value, discarding the new voice command or overwriting the action corresponding to an existing historical voice command.

The interactive voice command construction method of claim 2, further comprising: if the historical action information contains an action identical to the new action, abandoning creation of the new voice command or changing an existing voice command.

The interactive voice command construction method of claim 1, wherein when the new action cannot be compared directly with the historical action information, an algorithm is further applied to identify the new action.

The interactive voice command construction method of claim 7, wherein the algorithm is a neural network algorithm, a decision tree algorithm, or a support vector machine.

An interactive voice command construction system, comprising: an action start/end detection module for determining the start of a new action; an action recording module for recording a new action; a database for storing historical action information and historical voice commands; and an action comparison module that receives the new action and compares the new action with the historical action information to determine whether an action identical to the new action exists.

The interactive voice command construction system of claim 9, further comprising: a voice command addition module, wherein if the historical action information contains no action identical to the new action, a voice is input according to the new action so that the voice command addition module creates a new voice command and generates a new voice command feature value; and a voice command verification module that verifies the new voice command feature value according to a preset threshold value and the gap between the log-likelihood scores of the historical voice command feature values and the new voice command feature value.

The interactive voice command construction system of claim 10, wherein if the log-likelihood score gap between the new voice command feature value and the historical voice command feature values is greater than the preset threshold value, the new voice command is stored in the database, and the voice command verification module uses the new voice command feature value as part of the range for the log-likelihood score gap of the next new voice command.

The interactive voice command construction system of claim 10, wherein if the log-likelihood score gap between the new voice command feature value and the historical voice command feature values is less than the preset threshold value, the voice command verification module discards the new voice command or overwrites the action corresponding to an existing historical voice command.

The interactive voice command construction system of claim 9, wherein if the historical action information contains an action identical to the new action, the action comparison module abandons creation of the new voice command or changes an existing voice command.

The interactive voice command construction system of claim 9, wherein when the new action cannot be compared directly with the historical action information, an algorithm is further applied to identify the new action.

The interactive voice command construction system of claim 14, wherein the algorithm is a neural network algorithm, a decision tree algorithm, or a support vector machine.