201201197
VI. Description of the Invention:
[Technical Field of the Invention]
[0001] The present invention relates to a method of sound recognition, and more particularly to a method of determining which sound source an audio signal comes from.
[Prior Art]
[0002] Because the sounds produced by different creatures or people are distinctive, recognizing a sound makes it possible to determine which creature or person produced it, and this distinctiveness allows sound recognition to be applied to anti-theft systems, biometric identification systems, and the like.
The more commonly used sound recognition techniques identify a sound by the distances between the corresponding parameters of two sounds; the most common of these is the metric-based sound recognition method.
In this method, one sample audio is taken from the plurality of sample audio signals in an audio database and compared for similarity against an audio to be identified. (The term "audio" here refers to a digital sound signal: a sound source is an analog signal, which an analog-to-digital converter converts into a digital signal. Unless otherwise explained, "audio" in this specification therefore means a digital sound signal.) The method compares the distance between each audio parameter of the audio to be identified and each audio parameter of the sample audio, and from these distances computes a similarity value used to judge whether the audio to be identified and the sample audio come from the same sound source.
099121320 Form No. A0101 0992037567-0
The metric-based sound recognition method computes the distance between each audio parameter of the audio to be identified and each audio parameter of the sample audio. Because the number of audio parameters is large, computing these distances takes considerable time. Moreover, the sound produced by a creature, person, or object is not identical every time it is produced: the energy of the sound at that moment, or a noisy environment, makes the audio parameters vary irregularly, which gives the metric-based method a relatively high recognition error rate. The metric-based method also relies solely on the similarity value to judge whether the audio to be identified and the sample audio are the same audio, with no other means of verification, which further raises its recognition error rate.
Because the conventional technique has the above problems, its sound recognition accuracy is inadequate.
The inventor, having considerable expertise in the field of acoustics and a commitment to sound recognition, therefore set out to study how to overcome the shortcomings of the conventional technique described above.
[Summary of the Invention]
[0003] In view of the problems of the prior art, the inventor believed an improved method was needed and, after repeated design, experiment, and reflection, arrived at a method of sound recognition that remedies those shortcomings.
The present invention is a method of sound recognition that determines what kind of sound an audio to be identified is, comprising the following steps:
(A) Obtain an audio database containing a plurality of sample audio signals;
(B) Take any one sample audio from the plurality of sample audio signals, and use a classifier to classify the audio to be identified against that sample audio, obtaining a similarity value;
(C) Set a similarity threshold. When the similarity value is higher than the similarity threshold, judge the audio to be identified and the sample audio to be the same audio; if the similarity value is lower than the similarity threshold, return to step (B).
The invention compares the audio to be identified for similarity against each sample audio in the audio database. The sample audio includes the voices of different people speaking, the calls of various animals, the sounds of objects colliding, and so on, so that the kind of sound the audio to be identified represents can be determined. Because similar or identical audio signals are hard to separate by classification, the result of using the classifier to classify the audio to be identified against any sample audio in the database can serve as the basis for the similarity value.
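As a rough illustration of how steps (A) to (C) fit together, the loop below walks the database and stops at the first sample whose similarity value exceeds the threshold. It is only a sketch under stated assumptions: the names `identify` and `toy_similarity`, the representation of audio as a list of numbers, and the threshold 0.9 are all hypothetical, and `toy_similarity` is a placeholder rather than the classifier-based similarity value the specification describes.

```python
from typing import Callable, Optional, Sequence, Tuple

Audio = Sequence[float]  # hypothetical: audio represented as a list of parameter values


def identify(
    unknown: Audio,
    database: Sequence[Tuple[str, Audio]],
    similarity: Callable[[Audio, Audio], float],
    threshold: float,
) -> Optional[str]:
    """Steps (A)-(C): return the label of the first sample judged to be the same audio."""
    for label, sample in database:       # step (B): take one sample from the database
        score = similarity(unknown, sample)
        if score > threshold:            # step (C): above the threshold -> same audio
            return label
        # otherwise return to step (B) with the next sample
    return None                          # database exhausted without a match


def toy_similarity(a: Audio, b: Audio) -> float:
    """Placeholder score in (0, 1]; NOT the classifier-based value of the specification."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return 1.0 / (1.0 + abs(mean_a - mean_b))


# Step (A): a tiny hypothetical database of labelled sample audio.
database = [("dog", [5.0, 5.2, 4.9]), ("door", [1.0, 1.1, 0.9])]
result = identify([1.05, 0.95, 1.0], database, toy_similarity, threshold=0.9)
no_match = identify([100.0], database, toy_similarity, threshold=0.9)
```

Passing the similarity function in as an argument keeps the loop independent of which classifier produces the similarity value.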
The classifier may be a support vector machine, a nearest-neighbor classifier, K-means, or the like. In the preferred embodiment the classifier is a support vector machine, which classifies the audio to be identified against the sample audio and yields a classification line that separates the audio parameters of the audio to be identified from the audio parameters of the sample audio. The classification of these audio parameters is then checked. When, during the check, an audio parameter of the audio to be identified is judged to be a sample audio parameter, a first classification error rate is obtained. Unless otherwise stated in this specification, the number of sample audio parameters is denoted A and the number of audio parameters of the audio to be identified is denoted B. The first classification error rate is therefore calculated as:
    first classification error rate = (A judged to be B after the check) / A
When, during the check, a sample audio parameter is judged to be an audio parameter of the audio to be identified, a second classification error rate is obtained, calculated as:
    second classification error rate = (B judged to be A after the check) / B
Because the audio parameters of the same sound are largely alike, the classifier cannot accurately find a classification line that cleanly separates the parameters of the audio to be identified from the sample audio parameters, so both the first classification error rate and the second classification error rate take high values.
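The two classification error rates described above can be computed by counting misjudged parameters once a classification line is available. The sketch below assumes, hypothetically, that each audio parameter is a single number and that the line is given as a predicate `is_sample`; it also assumes each rate is normalized by the size of the parameter set being checked, which is one reading of the formulas in this specification.

```python
from typing import Callable, Sequence, Tuple


def classification_error_rates(
    unknown_params: Sequence[float],
    sample_params: Sequence[float],
    is_sample: Callable[[float], bool],
) -> Tuple[float, float]:
    """Return (first, second) classification error rates.

    first  = fraction of to-be-identified parameters judged to be sample parameters
    second = fraction of sample parameters judged to be to-be-identified parameters
    """
    unknown_as_sample = sum(1 for p in unknown_params if is_sample(p))
    sample_as_unknown = sum(1 for p in sample_params if not is_sample(p))
    return (
        unknown_as_sample / len(unknown_params),
        sample_as_unknown / len(sample_params),
    )


# Hypothetical classification line: values below 0.5 are judged to be "sample".
rates = classification_error_rates(
    unknown_params=[0.2, 0.6, 0.7, 0.9],
    sample_params=[0.1, 0.3, 0.4, 0.8],
    is_sample=lambda p: p < 0.5,
)
```

In this toy data one of the four parameters in each set falls on the wrong side of the line, so both rates come out to 0.25.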
In the preferred embodiment, the similarity threshold of step (C) is a classification error rate threshold. When the first classification error rate and the second classification error rate are both higher than the classification error rate threshold, the audio to be identified and the sample audio are judged to be the same audio, and the kind of sound the audio to be identified represents is thereby found. If only the first classification error rate is higher than the threshold, or only the second classification error rate is higher than the threshold, or neither is higher than the threshold, the audio to be identified and the sample audio are judged to be different sounds, and the audio to be identified continues to be compared against other sample audio until the kind of sound it represents is determined.
Because the invention judges whether two signals are the same sound by finding the classification line between the parameters of the audio to be identified and the sample audio parameters, it judges similarity better than the conventional technique, which classifies by taking the minimum of the distances between the individual parameters, and it takes less time to reach a judgment. In addition, after the classification line is found, the method checks each audio parameter against that line, which further improves the accuracy of the sound judgment.
[Embodiments]
[0004] The following text, with the aid of the drawings, describes the structure, features, and embodiments of the invention so that the examiner may understand the invention further.
The present invention is a method of sound recognition that determines what kind of sound a sound source is by the following steps:
Referring to the first figure, in step (A) an audio database is obtained. The audio database contains a plurality of sample audio signals, which include sounds produced by people, animals, machines, or other objects. An analog-to-digital converter converts these sound sources into audio, which is stored as sample audio and serves as the basis for judging what kind of sound a sound source to be identified is.
Referring to the first figure, in step (B) an audio to be identified is obtained. It may be a sound produced by a person, an animal, a machine, or another object, converted into audio by an analog-to-digital converter. Any one sample audio is taken from the audio database and the audio to be identified is compared with it for similarity, to judge whether the two are the same sound. The similarity comparison uses a classifier to classify the audio to be identified against the sample audio. Because two similar or identical sounds are hard for the classifier to separate, classifying the audio to be identified against the sample audio produces a similarity value from which it can be judged whether they are the same sound. The classifier may be a nearest-neighbor classifier, a support vector machine, a GMM, K-means, or any other machine able to classify two or more sets of data.
Referring to the first figure, in step (C) a similarity threshold is set. When the similarity value is higher than the similarity threshold, the audio to be identified and the sample audio are judged to be the same audio, thereby achieving sound recognition.
A support vector machine operates on two different sets of input data: it finds the boundary between their parameters to obtain a classification line that distinguishes and classifies the parameters of the two sets.
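The idea of fitting a classification line, checking every parameter against it, and treating high error rates as evidence of the same sound can be sketched as follows. To stay self-contained, the sketch swaps in a nearest-mean linear classifier as a stand-in for the support vector machine; the 2-D parameters, the function names, and the threshold 0.3 are all hypothetical, so this is an illustration of the principle rather than the method of the specification.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # hypothetical 2-D audio parameter


def centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def sq_dist(p: Point, q: Point) -> float:
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def error_rates(unknown: Sequence[Point], sample: Sequence[Point]) -> Tuple[float, float]:
    """Fit a nearest-mean line (stand-in for the SVM), then check every parameter."""
    cu, cs = centroid(unknown), centroid(sample)
    # The line is the perpendicular bisector of the two centroids; a parameter
    # on the line or on the wrong side of it counts as an error.
    first = sum(sq_dist(p, cs) <= sq_dist(p, cu) for p in unknown) / len(unknown)
    second = sum(sq_dist(p, cu) <= sq_dist(p, cs) for p in sample) / len(sample)
    return first, second


def same_sound(unknown: Sequence[Point], sample: Sequence[Point], threshold: float) -> bool:
    """Both error rates must exceed the threshold for the sounds to be judged the same."""
    first, second = error_rates(unknown, sample)
    return first > threshold and second > threshold


unknown = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
overlapping = [(1.0, 0.0), (3.0, 0.0), (1.0, 2.0), (3.0, 2.0)]    # hard to separate
far_away = [(10.0, 0.0), (12.0, 0.0), (10.0, 2.0), (12.0, 2.0)]   # easy to separate

rates_overlapping = error_rates(unknown, overlapping)
rates_far = error_rates(unknown, far_away)
```

With these toy points the overlapping pair gives error rates of 0.5 and 0.5 and is judged the same sound at a threshold of 0.3, while the well-separated pair gives 0.0 and 0.0 and is judged different.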
The first embodiment of the invention therefore uses a support vector machine as the classifier in step (B). Referring to the drawings, the audio to be identified comprises a plurality of to-be-identified audio parameters (2) and the sample audio comprises a plurality of sample audio parameters (3). The support vector machine finds the classification line (1), which divides the to-be-identified audio parameters (2) and the sample audio parameters (3) into two classes. The classification line (1) is then used to check each to-be-identified audio parameter (2) and each sample audio parameter (3).
Suppose that during the check a parameter that was originally a to-be-identified audio parameter (2) is judged to be a sample audio parameter (3). Referring to the third-B figure, the to-be-identified audio parameters (2) lie to the right of the classification line (1) and are drawn as circles, and the sample audio parameters (3) lie to its left and are drawn as squares. A circle on the right of the classification line (1) moving in the direction of the dashed arrow to the left of the line represents a parameter that was originally a to-be-identified audio parameter (2) but is judged after the check to be a sample audio parameter (3). Counting these misjudged parameters gives a first classification error rate, calculated as:
    first classification error rate = (A judged to be B after the check) / A
Conversely, suppose that during the check a parameter that was originally a sample audio parameter (3) is judged to be a to-be-identified audio parameter (2). Referring to the third-B figure, a square on the left of the classification line (1) moving in the direction of the dashed arrow to the right of the line represents a parameter that was originally a sample audio parameter (3) but is judged after the check to be a to-be-identified audio parameter (2). Counting these misjudged parameters gives a second classification error rate, calculated as:
    second classification error rate = (B judged to be A after the check) / B
Referring to the second figure, in the first embodiment the similarity value consists of the first classification error rate and the second classification error rate. When the support vector machine classifies the audio to be identified against the sample audio and the two are the same sound, it is hard to find a classification line (1) that separates them, so the check produces high classification error rates; the classification error rates can therefore serve as the measure of similarity. Accordingly, the similarity threshold of step (C) is a classification error rate threshold (0). When the first classification error rate and the second classification error rate are both higher than the classification error rate threshold (0), the audio to be identified and the sample audio are judged to be the same sound. If only the first classification error rate is higher than the threshold (0), or only the second classification error rate is higher than the threshold (0), or neither is higher than the threshold (0), the audio to be identified and the sample audio are judged to be different sounds. Requiring both error rates to exceed the classification error rate threshold (0) before the two are judged the same sound gives the invention a higher sound recognition accuracy.
Referring to the second figure, the second embodiment of the invention adds a preliminary step before step (A). The preliminary step first takes a dialogue audio (6), which may come from a radio broadcast, a meeting record, an ordinary conversation record, or another record containing a plurality of different speakers; the record is converted into the dialogue audio (6) by an analog-to-digital converter. The overlapping segments and silent segments of the different speakers' audio are first taken out of the dialogue audio (6), to help locate an audio boundary (7) between different speakers.
The audio boundary (7) may be found with a nearest-neighbor classifier, a support vector machine, or the like; the preferred embodiment uses a support vector machine to find the audio boundary (7).
Referring to the second figure, a first detection window (4) and a second detection window (5) are set. The two windows each detect the dialogue audio (6) over the same unit of time, yielding a first audio parameter and a second audio parameter respectively. The support vector machine then classifies the first audio parameter against the second audio parameter to judge whether the dialogue audio (6) detected by the first detection window (4) and by the second detection window (5) is the audio of the same speaker. The principle of this judgment has been described above and is not repeated here. The result is a first classification error rate curve (8) and a second classification error rate curve (9). If the dialogue audio (6) detected by the first detection window (4) and that detected by the second detection window (5) are judged to be the audio of different speakers, the audio boundary (7) is set; the audio boundary (7) passes through the edge of the second detection window (5) adjacent to the first detection window (4) and divides the dialogue audio (6) into the to-be-identified audio of different speakers. The first detection window (4) begins detecting at the start time of the dialogue audio (6), and the second detection window (5) begins detecting immediately adjacent to the first detection window (4).
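The two-window scan can be sketched as below. Several things here are assumptions rather than details from the specification: each frame of the dialogue audio is reduced to one hypothetical number, a nearest-mean line stands in for the support vector machine, the window pair advances by one window width per step, and the threshold 0.2 is arbitrary. A boundary is placed at the edge of the second window that adjoins the first whenever the two windows are judged to hold different speakers.

```python
from typing import List, Sequence, Tuple


def error_rates(first_win: Sequence[float], second_win: Sequence[float]) -> Tuple[float, float]:
    """Error rates of a nearest-mean line (stand-in for the SVM) between two windows."""
    c1 = sum(first_win) / len(first_win)
    c2 = sum(second_win) / len(second_win)
    # A parameter on the line or on the wrong side of it counts as an error.
    e1 = sum((p - c2) ** 2 <= (p - c1) ** 2 for p in first_win) / len(first_win)
    e2 = sum((p - c1) ** 2 <= (p - c2) ** 2 for p in second_win) / len(second_win)
    return e1, e2


def find_boundaries(frames: Sequence[float], width: int, threshold: float) -> List[int]:
    """Slide two adjacent windows over the frames and collect speaker boundaries."""
    boundaries: List[int] = []
    start = 0                                   # first window starts at the start time
    while start + 2 * width <= len(frames):
        first_win = frames[start:start + width]
        second_win = frames[start + width:start + 2 * width]   # adjacent to the first
        e1, e2 = error_rates(first_win, second_win)
        same_speaker = e1 > threshold and e2 > threshold
        if not same_speaker:
            boundaries.append(start + width)    # edge of the second window next to the first
        start += width                          # assumed hop: one window width
    return boundaries


# Hypothetical frames: one speaker near 1-3, then another near 7-9.
frames = [1, 2, 1, 3, 2, 2, 1, 3, 9, 8, 9, 7, 8, 9, 7, 9]
boundaries = find_boundaries(frames, width=4, threshold=0.2)
```

On this toy input the scan reports a single boundary at frame index 8, where the values jump from the first speaker's range to the second's.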
Scanning the dialogue audio (6) with the first detection window (4) and the second detection window (5) yields at least one audio boundary (7), which divides the dialogue audio (6) into the to-be-identified audio of a plurality of different speakers; steps (A) to (C) then judge which speaker each to-be-identified audio belongs to.
The third embodiment of the invention builds on the second embodiment. Referring to the second figure, any one to-be-identified audio is taken from the plurality of to-be-identified audio segments and, using the support vector machine, compared for similarity against the other to-be-identified audio segments. Segments whose similarity value after comparison is higher than the similarity threshold are marked as the voice of the same speaker, and the audio blocks of each same speaker in the dialogue audio (6) are labeled, so that steps (A) to (C) need only identify the to-be-identified audio of distinct speakers, reducing the time needed to judge which speaker each to-be-identified audio belongs to.
The fourth embodiment of the invention further provides a step (D), in which a speech recognizer converts the spoken content of the dialogue audio (6) into a text file, so that the embodiment can be applied to automatic meeting minutes: the meeting is recorded with a recorder, the method of the invention finds what each person said, and the speech recognizer converts each person's speech into text. The meeting then need not be recorded manually, which shortens the time the meeting requires.
In summary, the invention is industrially applicable, was not published or in public use before the application was filed, was not known to the public, and is non-obvious; it meets the requirements for patentability, and a patent application is hereby filed in accordance with the law.
The foregoing describes only a preferred embodiment of the invention in industrial use; all equivalent changes made within the scope of the claims of the invention fall within the scope claimed in this case.
[Brief Description of the Drawings]
[0005] The first figure is a flow chart of the steps of the invention.
The second figure is a schematic diagram of an embodiment of the invention.
The third-A figure is schematic diagram (1) of the operation of the support vector machine of the invention.
The third-B figure is schematic diagram (2) of the operation of the support vector machine of the invention.
[Description of Main Reference Symbols]
[0006] (0) classification error rate threshold
(1) classification line
(2) to-be-identified audio parameter
(3) sample audio parameter
(4) first detection window
(5) second detection window
(6) dialogue audio
(7) audio boundary
(8) first classification error rate curve
(9) second classification error rate curve