JP5091202B2

JP5091202B2 - Identification method that can identify any language without using samples

Info

Publication number: JP5091202B2
Application number: JP2009180750A
Authority: JP
Inventors: 黎自奮; 李台珍; 黎世聰; 黎世宏; 寥麗娟
Original assignee: Shih Hon Li; Tai Jan Lee Li; Tze Fen Li
Current assignee: Shih Hon Li; Tai Jan Lee Li; Tze Fen Li
Priority date: 2009-08-03
Filing date: 2009-08-03
Publication date: 2012-12-05
Anticipated expiration: 2029-08-03
Also published as: JP2011033879A

Abstract

PROBLEM TO BE SOLVED: To provide an identification method that improves the features of continuous sounds so that sentences in any language are identified correctly, and that can thereby identify all languages, such as Taiwanese Chinese, English, Japanese, German, French, Korean, Russian, Cantonese and Taiwanese, without using samples.

SOLUTION: A continuous sound (word) consists of one or more short single sounds. The features of known continuous sounds of any language are derived from unknown continuous sounds of any language. Each unknown continuous sound is represented by a matrix and scattered in a 144-dimensional space; the features of the known continuous sounds are placed in the same 144-dimensional space and are simulated and calculated from the features of the unknown continuous sounds that surround them. Twelve elastic frames of equal length, without filters and without overlap, convert the sound wave of a continuous sound of any length into a 12×12 matrix, which is then compared and identified by a Bayesian identification method.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to an identification method that can identify any language without using samples. In particular, a continuous sound contains one or more syllables (single sounds), and all languages can be identified without samples of the continuous sounds. Twelve stretchable frames of equal length, without filters and without overlap, convert the sound wave of a continuous sound of any length into a 12×12 matrix of linear predictive cepstrum coefficients (LPCC). An unknown word or continuous sound is represented by such a 12×12 LPCC matrix, and each 12×12 matrix is regarded as one vector in a 144-dimensional space, so the vectors of many unknown words or continuous sounds are scattered across that space. When a speaker utters a known continuous sound, the features of that known continuous sound are simulated and calculated from the features (LPCC) of the surrounding unknown words or continuous sounds. The method comprises normalizing the sound wave of a continuous sound with the 12 stretchable frames; a Bayesian comparison that finds, in the raw database, the known continuous sound corresponding to a speaker's unknown word or continuous sound; dividing a speaker's unknown sentence into D unknown words or continuous sounds; and a window screening that selects one known sentence as the speaker's unknown sentence.

When a continuous sound is uttered, its pronunciation is represented by a sound wave. A sound wave is a system that changes nonlinearly with time: the sound wave of a continuous sound contains dynamic characteristics that themselves vary nonlinearly and continuously with time. When a homologous continuous sound is uttered, it has the same series of homologous dynamic characteristics, stretched and contracted nonlinearly in time. The homologous dynamic characteristics therefore appear in the same order but at different times, and it is very difficult to align them at the same time positions. In addition, there are a great many similar continuous sounds, which makes identification even harder.
(Hereinafter, "in-phase" means "same language" (for example, "Japanese").)

A computerized language identification system must first extract the linguistic information carried by the sound wave, that is, its dynamic characteristics, and filter out noise unrelated to language. For example, voice timbre, pitch, and the speaker's psychological, physiological and emotional state are irrelevant to speech recognition and are removed first. Next, the homologous features of homologous continuous sounds are aligned at homologous time positions. This series of features is represented by a sequence of feature vectors of equal length, called the feature pattern of the continuous sound. In current speech recognition systems, generating feature patterns of uniform size is too complicated and too time-consuming, because the homologous features of homologous continuous sounds are very hard to align at the same time positions. English in particular is harder to identify.

A general method for identifying sentences or names involves five main tasks: dividing an unknown sentence or name into D unknown words or continuous sounds; extracting features; normalizing the features (so that the feature patterns have the same size and the homologous features of homologous words or continuous sounds are aligned at the same time positions); identifying the unknown words or continuous sounds; and finding the matching sentence or name in a sentence or name database. The features of the sound wave of a continuous sound are commonly expressed using energy, zero crossings, extreme count, formants, linear predictive cepstrum coefficients (LPCC) and mel-frequency cepstrum coefficients (MFCC).

Of these, the linear predictive cepstrum coefficients (LPCC) and the mel-frequency cepstrum coefficients (MFCC) are the most effective and the most widely used. The LPCC are the most reliable, stable and accurate linguistic features of a continuous sound. A linear regression model represents the sound wave of the continuous sound, the regression coefficients are computed by least-squares estimation, and the estimates are then converted into the cepstrum to give the LPCC.

The mel-frequency cepstrum coefficients (MFCC) are obtained by transforming the sound wave into the frequency domain with the Fourier transform and then approximating the auditory system on the mel frequency scale. According to the paper "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences" by S. B. Davis and P. Mermelstein, published in IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. 28, No. 4 (1980), MFCC features used with dynamic time warping (DTW) give a higher recognition rate than LPCC features. However, in several speech recognition experiments (including the inventors' earlier inventions), LPCC features used with a Bayes classifier gave a higher recognition rate than MFCC features and also took less time.

Many methods have already been used for language identification, for example dynamic time warping (DTW), vector quantization, and the hidden Markov model (HMM). If homologous pronunciations differ in their timing, dynamic time warping stretches the homologous features to the same time positions while comparing. Its recognition rate is very high, but pulling homologous features to the same positions is very difficult and the warping takes too long for practical use. Vector quantization is both inaccurate and slow when a large number of continuous sounds must be identified. The hidden Markov model identifies well, but the method is complicated and has too many unknown parameters to estimate, so estimation and identification take time.

In the paper "Speech recognition of mandarin monosyllables", published in Pattern Recognition, vol. 36 (2003), T. F. Li (黎自奮) used a Bayes classifier and, on the same database, compressed sequences of LPCC vectors of various lengths into classification patterns of equal size. The recognition results were better than those obtained with the HMM method in the paper "The recognition of mandarin monosyllables based on the discrete hidden Markov model" by Y. K. Chen, C. Y. Liu, G. H. Chiang and M. T. Lin, published in Proceedings of Telecommunication Symposium, Taiwan (1990). However, the compression process is complicated and time-consuming, it is very difficult to compress the homologous features of homologous continuous sounds to homologous time positions, and similar continuous sounds are extremely hard to distinguish.

To address these drawbacks, the speech recognition method of the present invention derives, on theoretical grounds, a feature extraction method that follows naturally from the fact that the speech features of a sound wave change nonlinearly with time. The sound wave of a continuous sound is first normalized and then converted into a feature pattern of uniform size that is sufficient to represent the continuous sound. Homologous continuous sounds have homologous features at homologous time positions within their feature patterns, and no unknown parameters or thresholds need to be tuned by hand or by experiment. A simple Bayes classifier compares the classification pattern of an unknown word or continuous sound with the standard patterns of the known continuous sounds in the continuous sound feature database, with no need for further compression, warping, or searching for homologous features to compare. The method can therefore complete feature extraction, feature normalization and identification quickly.

The problem to be solved by the present invention is to provide an identification method capable of identifying any language without using samples.

To solve the above problem, the present invention provides the following identification method capable of identifying any language without using samples.
The most important object of the present invention is to simulate and calculate the features of any known continuous sound in any language from the features of a large number of unknown words or continuous sounds. The invention can therefore construct the features of a continuous sound in any language without samples, and so identify the various languages accurately without samples. Specifically, for any known continuous sound in any language, the invention uses the Bayesian distance to find N unknown word or continuous sound matrices in the 144-dimensional space and uses them to simulate and calculate the known continuous sound. The features of every known continuous sound can thus be constructed without samples of that sound, so any language can be identified.
The present invention provides a language identification method that can delete sound waves that contain no speech.
The present invention provides a method for normalizing the sound wave of a continuous sound and extracting its features. It uses E stretchable frames of equal length, without overlap and without filters, whose length is freely adjusted to the length of the sound wave so that together they cover the whole wave. The series of dynamic characteristics that change nonlinearly with time within the sound wave is converted into a feature pattern of uniform size, and the feature patterns of homologous continuous sounds have homologous features at homologous time positions. Identification is immediate, so real-time computer recognition can be achieved.
The present invention provides a simple and effective Bayesian method for identifying unknown words or continuous sounds that minimizes the probability of misidentification, requires little computation, identifies quickly, and has a high recognition rate.
The present invention provides a method for extracting the features of a continuous sound. The sound wave of a continuous sound has dynamic characteristics that change nonlinearly with time. The invention approximates the nonlinearly changing sound wave with a regression model that changes linearly with time, and produces the least-squares estimates of the unknown regression coefficients (the LPC vector).
The present invention uses all sample points of the sound wave that contain speech. A small number, E = 12, of equal-length stretchable frames, without filters and without overlap, covers the features of all sample points. A continuous sound is not discarded because its sound wave is too short, and no sample points are deleted or compressed because it is too long. As long as human hearing can distinguish the continuous sound, the invention can extract its features. The method thus uses every sample point that contains speech and extracts as many speech features as possible. Because the E = 12 stretchable frames do not overlap and are few in number, the time needed for feature extraction and for computing the linear predictive cepstrum coefficients (LPCC) is greatly reduced.
The identification method of the present invention can identify continuous sounds that are spoken too quickly or too slowly. When speech is too fast, the sound wave of a continuous sound is very short. By shortening the stretchable frames, the invention covers the short sound wave with the same number E of equal-length stretchable frames and produces E LPCC vectors. As long as a human can distinguish the short sound, these E LPCC vectors effectively represent its feature pattern. When speech is too slow, the sound wave is longer, the stretchable frames lengthen, and the resulting E LPCC vectors effectively represent the long sound.
The present invention provides a method for stabilizing and adjusting the features of all known continuous sounds in the database, so that in the 144-dimensional space the features of each continuous sound occupy their own position and region, which allows accurate identification.
To identify a sentence or name, the unknown sentence or name is first divided into D unknown continuous sounds, and for each unknown word or continuous sound the Bayesian method selects the F most similar known continuous sounds from the continuous sound feature database. A sentence is thus represented by D × F known continuous sounds. Because segmentation is difficult, the sentence may be divided into too many or too few unknown words or continuous sounds. The invention therefore compares each known continuous sound in a sentence or name against the F similar known continuous sounds in the three rows before and after the corresponding unknown word or continuous sound: for every sentence or name in the sentence and name database, a 3 × F window of similar known continuous sounds is used to screen each known continuous sound, and the most probable sentence or name is then selected from the database. The method is simple and its success rate is very high (it identified 70 English sentences and names and 407 Taiwanese Chinese sentences and names).
The present invention provides two techniques for correcting the features of continuous sounds so that unknown words or continuous sounds and unknown sentences or names are identified successfully.
In the present invention, a Taiwanese Chinese single sound is treated as a continuous sound of just one syllable, and the features of Chinese and of foreign languages are all represented by matrices of the same size. The invention can therefore identify multiple languages at the same time.
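The 3 × F window screening described above can be sketched as follows. This is only an illustration under stated assumptions: the names `window_score` and `best_sentence`, the use of plain string labels for continuous sounds, and the scoring rule (the fraction of words in a known sentence that appear in the window) are all hypothetical and are not taken from the patent text.

```python
def window_score(sentence, candidates):
    """Fraction of the known sentence's words found in a 3 x F window.

    sentence   : list of known continuous-sound labels (assumed non-empty)
    candidates : D rows, each holding the F most similar known labels
                 for one unknown word or continuous sound
    """
    hits = 0
    for i, word in enumerate(sentence):
        # Rows i-1, i and i+1 tolerate one extra or one missing segment.
        rows = candidates[max(0, i - 1):i + 2]
        if any(word in row for row in rows):
            hits += 1
    return hits / len(sentence)


def best_sentence(sentences, candidates):
    """Return the known sentence with the highest window score."""
    return max(sentences, key=lambda s: window_score(s, candidates))
```

In this sketch a segmentation error that inserts one spurious row shifts later words by one position, and the window of three rows still lets them match.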

The identification method of the present invention, which can identify any language without using samples, improves the features of continuous sounds so that sentences in any language are identified correctly. It can therefore identify all languages, such as Taiwanese Chinese, English, Japanese, German, French, Korean, Russian, Cantonese and Taiwanese, without using samples.

A flowchart showing the construction process of the permanent known continuous sound database, the known continuous sound feature database, and the sentence and name database.
A flowchart showing the process of the method for identifying an unknown sentence or name.
A diagram showing the method for identifying 384 Taiwanese Chinese single sounds, 1 German, 1 Japanese and 2 Taiwanese.
A diagram showing the method for identifying 154 English and 1 German.
A diagram showing the method for identifying 269 Taiwanese Chinese single sounds and 3 Taiwanese.
A diagram showing that the sentence and name database contains 70 English sentences and 407 Chinese sentences and names.
A Visual Basic identification diagram showing the method for simultaneously identifying English and Taiwanese Chinese sentences and names.
A Visual Basic identification diagram showing the method for simultaneously identifying English and Taiwanese Chinese sentences and names.

The best mode for carrying out the present invention is described in detail below with reference to the drawings.

FIG. 1 and FIG. 2 illustrate the process carried out by the present invention.
FIG. 1 shows the construction process of three databases: the permanent known continuous sound database, the known continuous sound feature database, and the sentence and name database.
The continuous sound feature database contains the standard patterns of all known continuous sounds, which represent the features of those sounds.
First, a known continuous sound, or a sentence or name 1, is input (a sentence or name is divided into multiple continuous sounds) and enters the receiver 20 in the form of a continuous sound wave 10.
The digital converter 30 converts the continuous sound wave into a sequence of digital sample points.

The preprocessor 45 has the following two deletion methods.
The variance of the sample points within a fixed time window and the variance of general noise are calculated. If the former is smaller than the latter, the window contains no speech and should be deleted.
The sum of the distances between consecutive sample points within a fixed time window and the corresponding sum for general noise are calculated. If the former is smaller than the latter, the window contains no speech and should be deleted.
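The two deletion tests can be sketched roughly as follows. The function names, the window size, and the idea of passing the noise reference values in as parameters are assumptions made for illustration; the text does not say how the noise statistics are obtained, and this sketch applies both tests to every window although the text presents them as two separate methods.

```python
from statistics import pvariance


def path_length(window):
    """Sum of absolute differences between consecutive sample points."""
    return sum(abs(b - a) for a, b in zip(window, window[1:]))


def remove_silence(samples, window_size, noise_variance, noise_path):
    """Keep only the windows that look like speech.

    A window is dropped when its variance is below the noise variance or
    when its summed sample-to-sample distance is below the noise value.
    """
    kept = []
    for start in range(0, len(samples), window_size):
        window = samples[start:start + window_size]
        if len(window) < 2:
            continue  # too short to measure; treated as silence here
        if pvariance(window) < noise_variance:
            continue
        if path_length(window) < noise_path:
            continue
        kept.extend(window)
    return kept
```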

After passing through the preprocessor 45, the sequence consists of the sample points of the known continuous sound.
The sound wave is first normalized and then its features are extracted: all sample points of the known continuous sound are divided into E equal time segments.
Each time segment forms one frame.
A continuous sound has E equal-length frames 50 in total, with no filter and no overlap.
The length of the E frames is freely adjusted according to the total number of sample points of the continuous sound so that the frames cover all sample points.
The frames are therefore called stretchable frames: their length can stretch or shrink freely, but all E stretchable frames have the same length.
This differs from Hamming windows, which apply a filter, overlap by half, and have a fixed length that cannot be freely adjusted to the length of the wave.
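A minimal sketch of the stretchable (elastic) frames, assuming the sample points are held in a Python list. The function name `elastic_frames` is hypothetical. When the number of sample points is not a multiple of E, this sketch lets frame lengths differ by at most one sample, which is an assumption; the text only states that the frames have the same length.

```python
def elastic_frames(samples, num_frames=12):
    """Split all sample points into num_frames contiguous frames.

    The frames do not overlap, no window function (filter) is applied,
    and together they cover every sample point exactly once.
    """
    n = len(samples)
    if n < num_frames:
        raise ValueError("need at least one sample point per frame")
    # Integer boundaries spread the samples as evenly as possible.
    bounds = [i * n // num_frames for i in range(num_frames + 1)]
    return [samples[bounds[i]:bounds[i + 1]] for i in range(num_frames)]
```

A short utterance simply yields short frames and a long one yields long frames, so the number of frames, and therefore the size of the feature pattern, stays fixed.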

The sound wave of a continuous sound changes nonlinearly with time, and it contains speech dynamic features that also change nonlinearly with time. Because the frames do not overlap, the present invention uses relatively few (E = 12) stretchable frames to cover the whole sound wave of the continuous sound. Since a sample point can be estimated from the preceding sample points, a regression model that changes linearly with time is used to closely approximate the nonlinearly changing sound wave, and the unknown regression coefficients are estimated by the least-squares method. Each frame yields one set of least-squares estimates of the unknown coefficients, called the linear predictive coding (LPC) vector.
The LPC vector is then converted into the more stable linear predictive cepstrum coefficients (LPCC). The sequence of speech dynamic features that change nonlinearly with time within the sound wave of a continuous sound is converted, in the present invention, into E LPCC vectors 60 of equal size.
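The per-frame computation can be sketched as below. This is an illustration, not the patent's stated procedure: it assumes the autocorrelation formulation of the least-squares problem solved with the Levinson-Durbin recursion, and the standard recursion from LPC to cepstrum coefficients for the predictor s(n) ≈ Σ a_k s(n−k). The function names and the default order P = 12 (chosen to match the 12×12 matrix) are assumptions.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of sample points."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc_vector(frame, order=12):
    """LPC coefficients a[1..order] via the Levinson-Durbin recursion."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:
            break  # silent or fully predicted frame: remaining terms stay 0
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1.0 - k * k
    return a[1:]


def lpc_to_lpcc(a):
    """Cepstrum coefficients c[1..P] from LPC coefficients a[1..P]."""
    p = len(a)
    c = [0.0] * (p + 1)
    for m in range(1, p + 1):
        c[m] = a[m - 1] + sum((k / m) * c[k] * a[m - k - 1]
                              for k in range(1, m))
    return c[1:]


def lpcc_vector(frame, order=12):
    """One LPCC vector for one frame."""
    return lpc_to_lpcc(lpc_vector(frame, order))
```

Applying `lpcc_vector` to each of the E = 12 frames of a continuous sound would give the E × P matrix that the text treats as a single 144-dimensional vector.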

To extract the features of a known continuous sound, a permanent known continuous sound database is first prepared. Each known continuous sound is pronounced once by a speaker whose pronunciation is standard and clear. If heavily accented or non-standard speech is to be identified, the sounds are pronounced by such a speaker instead. Every known continuous sound is converted into an E × P LPCC matrix and stored in the permanent known continuous sound database. To extract the features of a known continuous sound in the permanent known continuous sound database, a database of unknown words or continuous sounds is prepared first.

There are two kinds of unknown word or continuous sound database: one in which the unknown words or continuous sounds have samples, and one in which they do not. For a raw database with samples, the mean and variance of each unknown word or continuous sound are computed first. In that database, the Bayesian distance is used to find the N unknown words or continuous sounds nearest to the known continuous sound. The weighted average of the N means of those N unknown sounds and the LPCC of the known continuous sound (N + 1 values in all) is taken as the mean of the known continuous sound, and the weighted average of the N variances of the N unknown sounds is taken as its variance. These E × P matrices of means and variances are the initial feature values of the known continuous sound and are stored in the continuous sound feature database (79).

If the raw database of unknown sounds has no samples, the minimum absolute distance is used to find the N unknown words or continuous sounds around the known continuous sound in the unknown word or continuous sound database. The LPCC of the known continuous sound and of the N unknown words or continuous sounds are treated as (N + 1) values. The weighted average of these (N + 1) values is taken as the mean of the known continuous sound, and their variance as its variance. These E × P matrices of means and variances represent the initial features of the known sound and are stored in the known continuous sound feature database (79).
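The no-sample case can be sketched as follows, treating each E × P LPCC matrix as a flat list of numbers. The function names are hypothetical, and the weighting scheme (one weight for the known sound, weight 1 for each neighbour) is an assumption, since the text does not specify the weights.

```python
from statistics import pvariance


def absolute_distance(x, y):
    """Sum of absolute differences between two flattened LPCC matrices."""
    return sum(abs(a - b) for a, b in zip(x, y))


def simulate_feature(known, unknown_pool, n_neighbors, known_weight=1.0):
    """Initial mean and variance of a known sound from its N neighbours.

    known        : flattened LPCC matrix of the known continuous sound
    unknown_pool : flattened LPCC matrices of unknown continuous sounds
    """
    nearest = sorted(unknown_pool,
                     key=lambda u: absolute_distance(known, u))[:n_neighbors]
    group = [known] + nearest                       # the (N + 1) values
    weights = [known_weight] + [1.0] * len(nearest)
    total = sum(weights)
    mean = [sum(w * v[d] for w, v in zip(weights, group)) / total
            for d in range(len(known))]
    variance = [pvariance([v[d] for v in group]) for d in range(len(known))]
    return mean, variance
```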

In the known continuous sound feature database, if the Bayesian distance between the mean of a known continuous sound and the LPCC of the same known continuous sound in the permanent known continuous sound database is not the smallest in the feature database, the Bayesian distance is used to find the N known continuous sounds in the feature database whose Bayesian distances to the LPCC of that known continuous sound are the N smallest. The weighted average of the N means of these N known continuous sounds and the LPCC of the known continuous sound is taken as its new mean, and the weighted average of the N variances of the N known continuous sounds is taken as its new variance. This procedure is repeated several times to compute a new mean and variance for every known continuous sound in the feature database. The final E × P matrices of new means and variances are called the standard pattern, which represents the known continuous sound and is stored in the feature database (80). The known continuous sounds in the feature database are then used to build the sentence and name database (85).

FIG. 2 shows the procedure for identifying an unknown sentence or name. An unknown sentence or name (2) is input to the speech identification method of the present invention and enters the receiver (20) as a set of unknown words or continuous sound waves (11). The digital converter (30) converts it into a sequence of sound-wave sample points. The sound wave of the unknown sentence or name is divided into the sound waves of D unknown words or continuous sounds (40). The preprocessor (45) shown in FIG. 1 then deletes the portions of the sound wave that contain no speech. Next, the sound wave of each unknown word or continuous sound is normalized and its features are extracted: all sample points that carry speech in each unknown word or continuous sound of the sentence or name are divided into E equal time periods, and each period forms one stretchable frame (50). Each continuous sound thus has E stretchable frames in total, which use no filter, do not overlap, stretch or contract freely, and cover all sample points.
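The division into E stretchable frames can be illustrated with a short sketch. The function name is hypothetical, and the handling of lengths that are not a multiple of E (frame lengths differing by at most one sample) is an assumption, since the text only says that the frames are equal, do not overlap, and cover all sample points.

```python
import numpy as np

def stretchable_frames(samples, n_frames=12):
    """Split a 1-D waveform into n_frames consecutive, non-overlapping frames.

    The frame length scales with the waveform length, so a short and a long
    utterance both yield exactly n_frames frames.
    """
    samples = np.asarray(samples, dtype=float)
    # np.array_split allows uneven division: the frames differ by at most one sample.
    return np.array_split(samples, n_frames)
```

The waveform should contain at least `n_frames` samples, otherwise some frames come back empty.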

Within each frame, a sample point can be estimated from the preceding signal, so the least squares method is used to estimate the unknown regression coefficients. The set of least squares estimates produced in each frame is called the linear predictive coding (LPC) vector. The LPC vectors are normally distributed, and each LPC vector is further converted into a more stable linear prediction cepstrum coefficient (LPCC) vector (60). The E LPCC vectors of an unknown word or continuous sound form its feature pattern, called the classification pattern (90), which has the same size as the standard pattern of a known continuous sound. A sentence has D classification patterns in total, representing its D unknown words or continuous sounds (90). If a known continuous sound is this unknown word or continuous sound, the means in its standard pattern are the values closest to the LPCC of the unknown classification pattern. The simplified Bayes identification method of the present invention therefore compares the classification pattern of the unknown word or continuous sound with the standard pattern of every known continuous sound in the continuous sound feature database (80) (100).
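As a rough illustration of the per-frame computation, the sketch below fits the regression coefficients by least squares and converts them to cepstrum coefficients. The function names are hypothetical. The LPC-to-LPCC recursion is the standard textbook one for the model x[n] = a_1 x[n-1] + ... + a_P x[n-P] + e[n]; it is assumed, not confirmed, to match the patent's own equations, which are not reproduced in this section.

```python
import numpy as np

def lpc_least_squares(frame, order):
    """Least squares estimate of a_1..a_order in x[n] ~ sum_k a_k * x[n-k]."""
    x = np.asarray(frame, dtype=float)
    # Column k holds x[n-k] for n = order .. len(x)-1.
    A = np.column_stack([x[order - k: len(x) - k] for k in range(1, order + 1)])
    b = x[order:]
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs

def lpc_to_lpcc(a, n_ceps):
    """Standard recursion: c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k}."""
    p = len(a)
    c = np.zeros(n_ceps)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:  # a_j is treated as zero for j > p
                acc += (k / n) * c[k - 1] * a[n - k - 1]
        c[n - 1] = acc
    return c
```

Applying both functions to each of the E = 12 frames with P = 12 coefficients and stacking the results row by row would give the 12×12 LPCC matrix the text refers to.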

If a known continuous sound is the unknown word or continuous sound, then, to save computation time, all LPCC in the classification pattern of the unknown word or continuous sound are assumed to be independent and normally distributed, with means and variances estimated by the means and variances in the standard pattern of the known continuous sound. The simplified Bayes method computes the distance between the LPCC of the unknown word or continuous sound and the means of the known continuous sound, adjusted by the variances of the known continuous sound; the resulting value expresses the similarity between the unknown word or continuous sound and that known continuous sound. For each unknown word or continuous sound, the F known continuous sounds with the highest similarity are selected as its candidates. An unknown sentence or name is therefore represented by D×F known continuous sounds (110).
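Under the independent normal assumption stated above, one natural distance is the negative log-likelihood up to a constant: the sum of ln(sigma) plus the squared deviation from the mean divided by twice the variance. The sketch below uses that form as an assumption and shows how the F closest known sounds might be picked; the function names and the `labels` argument are hypothetical.

```python
import numpy as np

def bayes_distance(pattern, mean, var):
    """Smaller value = more similar. Assumes independent normal LPCC entries."""
    pattern, mean, var = (np.asarray(v, dtype=float) for v in (pattern, mean, var))
    return float(np.sum(0.5 * np.log(var) + 0.5 * (pattern - mean) ** 2 / var))

def top_f_candidates(pattern, means, variances, labels, f):
    """Return the labels of the f known sounds closest to the classification pattern."""
    d = [bayes_distance(pattern, m, v) for m, v in zip(means, variances)]
    order = np.argsort(d)[:f]
    return [labels[i] for i in order]
```

Calling `top_f_candidates` once per unknown word yields the D rows of F candidates that make up the D×F matrix.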

When an unknown sentence or name is divided into D unknown words or continuous sounds, it is difficult to split it exactly into the continuous sounds it actually contains. Sometimes one continuous sound is split into two; sometimes two continuous sounds are pronounced so closely together that the computer treats them as one. D is therefore not necessarily the true number of continuous sounds uttered by the speaker, and a given row of F similar known continuous sounds does not necessarily contain the speaker's continuous sound. To identify an unknown sentence or name, every known sentence and name in the sentence and name database (85) is tested. To test whether a known sentence or name is the speaker's sentence or name, its known continuous sounds are compared, starting from the first, with the similar continuous sounds in three adjacent rows of the D×F matrix (for the first known continuous sound, of course, only two rows, the current one and the following one, can be compared). The 3×F window (three adjacent rows of similar known continuous sounds) is then moved (120) to look for the second known continuous sound of the sentence, and so on until all known continuous sounds of the sentence have been tested.

The sentence or name with the highest probability in the sentence and name database is taken as the speaker's sentence or name; the probability is the number of known continuous sounds of the tested sentence or name found in the 3×F windows divided by the number of continuous sounds in the tested sentence or name (130). To save time, only the sentences or names in the database whose length is roughly the same as that of the unknown sentence or name (D unknown words or continuous sounds) need be selected for comparison. If a sentence or name cannot be identified, the Bayes classifier is used to find the N most similar continuous sounds in the feature database (79) and improve the features of the continuous sounds in the sentence, and the identification then always succeeds.
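The 3×F window screening and the resulting probability can be sketched as below. This is an illustrative reading with hypothetical names: each known sentence is a list of word labels, and `candidates` is the D×F matrix given as D rows of F labels.

```python
def window_score(sentence, candidates):
    """Fraction of the sentence's words found in rows i-1, i, i+1 of the D x F matrix."""
    d = len(candidates)
    hits = 0
    for i, word in enumerate(sentence):
        lo, hi = max(0, i - 1), min(d, i + 2)  # the window is clipped at both ends
        window = {w for row in candidates[lo:hi] for w in row}
        if word in window:
            hits += 1
    return hits / len(sentence)

def best_sentence(sentences, candidates):
    """Pick the known sentence with the highest window score."""
    return max(sentences, key=lambda s: window_score(s, candidates))
```

Because the window spans the neighbouring rows, a sentence can still score well when segmentation produced one word too many or too few, which is the situation the preceding paragraph describes.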

The details are as follows.
After a continuous sound is input to the speech identification method, its sound wave is converted into a sequence of signal sample points, and the sample points that carry no speech are deleted. The present invention provides two methods for this. The first computes the variance of the sample points within a fixed time window. The second computes the sum of the distances between adjacent sample points within the window. In theory the first method is better, since a sample variance larger than the noise variance indicates that speech is present. In identifying continuous sounds with the present invention, however, both methods give the same identification rate, and the second saves time.
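Both deletion criteria can be sketched in a few lines. The function name, the window length, and the threshold are hypothetical; the text only says that a window counts as speech when its activity exceeds what noise would produce.

```python
import numpy as np

def drop_silent_windows(samples, window, threshold, method="diff"):
    """Keep only the windows whose activity exceeds the threshold.

    method="variance": variance of the samples in the window (first method).
    method="diff":     sum of |x[n+1] - x[n]| within the window (second method).
    """
    x = np.asarray(samples, dtype=float)
    kept = []
    for start in range(0, len(x), window):
        chunk = x[start:start + window]
        if method == "variance":
            activity = chunk.var()
        else:
            activity = np.abs(np.diff(chunk)).sum()
        if activity > threshold:
            kept.append(chunk)
    return np.concatenate(kept) if kept else np.empty(0)
```

The second criterion needs only subtractions and absolute values per sample, which is consistent with the remark that it saves time.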

After the sample points without speech are deleted, the remaining sample points constitute all the sample points of the continuous sound. The sound wave is first normalized and then its features are extracted: all sample points are divided into E equal time periods, each forming one frame. A continuous sound thus has E stretchable frames of equal length, which use no filter, do not overlap, stretch or contract freely, and cover all sample points. The sample points within a stretchable frame vary nonlinearly with time and are difficult to describe with a mathematical model. However, J. Makhoul, in "Linear Prediction: A Tutorial Review", Proceedings of the IEEE, Vol. 63, No. 4 (1975), explains that a sample point is linearly related to the preceding sample points, so a regression model that varies linearly with time can be used to estimate these nonlinearly varying sample points.

Then, the last linear prediction cepstrum coefficients (LPCC) are approximately 0. A continuous sound is characterized by E LPCC vectors, that is, it is represented by one E×P matrix of LPCC, and it contains one or more syllables.

(3) In the same way, equations (8-15) are used to compute the E linear prediction cepstrum coefficient (LPCC) vectors of the sound wave of an unknown word or continuous sound, giving an E×P LPCC matrix of the same size, which is called the classification pattern of the unknown word or continuous sound.

(5) To extract the features of a known continuous sound, a database of unknown words or continuous sounds is first prepared. There are two kinds of such database: one has samples of the unknown words or continuous sounds, and the other has none. For a raw database with samples, the mean and variance of each unknown word or continuous sound are computed first. The Bayes distance is then used to find, in this database, the N unknown words or continuous sounds nearest to the known continuous sound. The weighted average of the N+1 values formed by the N means of these unknown sounds and the LPCC of the known continuous sound is taken as the mean of the known continuous sound, and the weighted average of the N variances of the N unknown sounds is taken as its variance. This E×P matrix of means and variances is the initial feature (79) of the known continuous sound and is stored in the continuous sound feature database.

If the unknown word or continuous sound database has no samples, the N unknown words or continuous sounds nearest to the known continuous sound are found in it using the minimum absolute-value distance. The LPCC of the known continuous sound and of the N unknown words or continuous sounds are treated as (N+1) numbers. The weighted average of these (N+1) numbers is taken as the mean of the known continuous sound, and their variance as its variance. This E×P matrix of means and variances represents the initial features of the known continuous sound and is stored in the known continuous sound feature database (79).

In the known continuous sound feature database, if the Bayes distance between the mean of a known continuous sound and the LPCC of the same known continuous sound in the permanent known continuous sound database is not the smallest in the feature database, the Bayes distance is used to find the N known continuous sounds in the feature database whose Bayes distances to the LPCC of that known continuous sound are the N smallest. The weighted average of the N means of these known continuous sounds and the LPCC of that known continuous sound becomes its new mean, and the weighted average of their N variances becomes its new variance. This procedure is repeated several times to compute a new mean and variance for every known continuous sound in the feature database. The final E×P matrix of new means and variances is called the standard pattern; it represents the known continuous sound and is stored in the feature database (80). The known continuous sounds in the feature database are then used to build the sentence and name database (85).

(7) To demonstrate that the present invention can identify all languages at the same time, speech identification experiments were conducted with two speakers.
(a) First, a database of unknown words or continuous sounds was built. The monosyllable database was purchased from Academia Sinica in Taiwan. It contains 388 Taiwanese Chinese monosyllables (FIG. 3), all pronounced by women, with 6 to 99 samples each; many of the monosyllables sound almost the same.
(b) Using the method of section (2), all samples were converted into E×P LPCC matrices, 12,400 matrices in total.
(c) For each of the 388 Taiwanese Chinese monosyllables, the mean and variance were computed from its samples.
(d) The 388 Taiwanese Chinese monosyllables were shuffled at random, and the 388 monosyllables with their sample means and variances were used as the database of 388 unknown words or continuous sounds (a Taiwanese Chinese monosyllable is a continuous sound with only one syllable).
(e) Next, one man and one woman each pronounced 654 Taiwanese Chinese monosyllables, 154 English words, 1 German word, 1 Japanese word, and 3 Taiwanese words once, giving two permanent databases of 813 known continuous sounds each. Each continuous sound is represented by an E×P LPCC matrix.
(f) For each of the 813 known continuous sounds in the permanent database, the Bayes distance (20) was used to find the N = 15 nearest unknown words or continuous sounds among the 388. The weighted average of the N+1 values formed by the LPCC of the known continuous sound and the sample means of the N unknown words or continuous sounds was taken as the mean of the known continuous sound, and the weighted average of the sample variances of the N unknown words or continuous sounds as its variance. This 12×12 matrix of means and variances is called the initial feature (79) of the known continuous sound and is stored in the known continuous sound feature database. The feature database thus contains 813 12×12 matrices of means and variances (80).
(g) In the feature database, if the Bayes distance between the mean of a known continuous sound and the LPCC of the same known continuous sound in the permanent database is not the smallest, the Bayes distance over the 813 continuous sound features is used to find the N = 15 nearest known continuous sounds. The weighted average of the N means of these continuous sounds and the LPCC of the known continuous sound is taken as its new mean, and the weighted average of the N variances as its new variance. The new mean and variance are recomputed several times. The final 12×12 matrix of means and variances is called the standard pattern; it represents the features of the known continuous sound and is stored in the known continuous sound feature database (80).
The following continuous sound identification tests were carried out. The identification rate depends on the speaker; because many sounds are very similar, a result is counted as correct if the true sound is among the top three candidates.
(1) Identification of 384 Taiwanese Chinese monosyllables, 1 German word, 1 Japanese word, and 2 Taiwanese words (see FIG. 3): very high identification rate.
(2) Identification of 154 English words and 1 German word (see FIG. 4): very high identification rate.
(3) Simultaneous identification of 154 English words, 388 Taiwanese Chinese monosyllables, 1 German word, 1 Japanese word, and 2 Taiwanese words: very high identification rate.
(4) Identification of 654 Taiwanese Chinese monosyllables, 1 German word, 1 Japanese word, and 3 Taiwanese words (see FIG. 5): high identification rate, but not as high as in the three cases above.

(8) To identify a speaker's sentence or name, a database of English and Taiwanese Chinese sentences and names was first built. Every continuous sound in each sentence or name is taken from the known English words and Taiwanese Chinese monosyllables in the continuous sound feature database (384+154), combined freely. The 154 English words form 70 English sentences and names, and the 384 Taiwanese Chinese monosyllables form 407 Taiwanese Chinese sentences and names (see FIG. 6).
The identification method is as follows.
(a) The unknown sentence or name is divided into D unknown words or continuous sounds. For each unit time window, the sum of the distances between adjacent sample points is computed. If the sum is too small, the window is noise or silence. If too many adjacent windows without a speech signal accumulate (longer than the time between two syllables of one continuous sound), they are all noise or silence and mark the boundary between two continuous sounds, where the signal should be split. This yields D unknown words or continuous sounds in total. Each is then converted into an E×P LPCC matrix using processes 45, 50, 60, and 90 of FIG. 2. For each unknown word or continuous sound, the Bayes classifier (20) selects the F most similar known continuous sounds from the English and Taiwanese Chinese feature database (the F candidates may include both English and Taiwanese Chinese (FIG.)). The unknown sentence or name is represented by the D×F most similar known continuous sounds.
(b) The speaker's sentence or name is searched for in the sentence and name database: from the 477 English and Taiwanese Chinese sentences and names, those consisting of (D±1) known continuous sounds are selected.
(c) If a candidate sentence or name selected from the database has the same length as the speaker's sentence or name, that is, D continuous sounds, its D known continuous sounds are compared in order with the D rows of F similar known continuous sounds, checking whether each row of F similar sounds contains the corresponding known continuous sound of the candidate. If every row contains the corresponding known continuous sound, D continuous sounds are identified correctly, and the candidate is the speaker's sentence or name.
(d) If the candidate sentence or name has D−1 or D+1 known continuous sounds, or if the number of correctly identified continuous sounds in (c) is not D, the present invention screens with a 3×F window. The i-th known continuous sound of the candidate sentence or name (in the database) is compared with the similar known continuous sounds in three adjacent rows of the D×F matrix (rows i−1, i, and i+1), and the number of known continuous sounds of the candidate found in the D×F matrix in this way is counted. This count is divided by the total D to give the probability of the candidate, and the sentence or name with the highest probability in the database is selected as the speaker's utterance.
(e) If a sentence or name is identified incorrectly, then one or more of the D unknown words or continuous sounds are necessarily missing from their F similar known continuous sounds. The Bayes classifier (20) is used to find the top N = 15 known continuous sounds among the (155+384) known continuous sounds, and the weighted average of the LPCC of the N similar continuous sounds and of the unknown word or continuous sound is computed to improve the unknown word or continuous sound. The D unknown words or continuous sounds then fall within their F similar known continuous sounds, and the retest always succeeds.
The following English and Taiwanese Chinese sentence and name identification tests were carried out. Almost all identifications were correct, though the results vary by speaker.
(1) Identification of 70 English sentences and names: very good.
(2) Identification of 407 Taiwanese Chinese sentences and names: very good.
(3) Identification of 70 English sentences and names together with 407 Taiwanese Chinese sentences and names: very good.

The present invention has been verified through numerous tests to achieve its intended purpose. Its function is excellent, it was not found in any publication before filing, and it has not been in public use. The present invention therefore satisfies the novelty requirement for a patent, is a substantial advance over conventional products of the same kind, is highly practical, meets the needs of society, and has great industrial value.

1 Build a permanent database of known continuous sounds: pronounce a continuous sound or a sentence, and divide the sentence into a number of known continuous sounds.
10 Continuous sound wave
20 Receiver
30 Sound-wave digital converter
45 Noise removal
50 Normalize the sound wave with E stretchable frames.
60 Compute the linear prediction cepstrum coefficient (LPCC) vectors by the least squares method.
70 For each known continuous sound (permanent database), use the Bayes distance (absolute-value distance) to find the N nearest unknown words or continuous sounds in the unknown word or continuous sound database.
79 For each known continuous sound (permanent database), compute the weighted average of the surrounding N unknown words or continuous sounds and the LPCC of the known continuous sound, and store the initial features of the known continuous sound in the feature database. Then, in the feature database, use the Bayes distance to compute the weighted average of N known continuous sounds and the LPCC of the known continuous sound, repeating the computation several times. The final weighted averages (E×P means and variances) represent the standard pattern of the known continuous sound.
80 The known continuous sound feature database contains the standard patterns of all means and variances.
85 Use the continuous sounds of the known continuous sound feature database to build the sentence and name database of the sentences and names to be identified.
2 Input an unknown sentence or name.
11 A set of unknown words or continuous sound waves
40 Divide a sentence or name into D unknown words or continuous sounds.
90 The LPCC matrices of the D unknown words or continuous sounds represent the D classification patterns of the unknown words or continuous sounds.
100 Use the Bayes classifier to compare the standard pattern of every known continuous sound with the classification pattern of the unknown word or continuous sound.
110 For each unknown word or continuous sound in the sentence or name, find the F nearest known continuous sounds; the sentence or name is represented by D×F most similar known continuous sounds in total.
120 In the sentence and name database, screen each known continuous sound of every sentence and name using the similar known continuous sounds in a 3×F window.
130 In the sentence and name database, find the single most likely sentence or name.

Claims

A method for identifying utterances in any language, with the following steps:
(1) providing an unprocessed database that consists of unknown words or continuous sounds of an arbitrary language and has a plurality of samples, or an unprocessed database that consists of unknown words or continuous sounds of an arbitrary language and has no samples,
wherein the plurality of samples are uttered by the same speaker as the unknown word or continuous sound and consist of at least a plurality of words or continuous sounds;
(2) providing a permanent database of known words pronounced by a speaker with standard, clear utterance or by a subject;
(3) using a processor to delete noise and a time frame without a speech signal from the speech waveform;
(4) Normalize the total length of the utterance waveform of one word or continuous sound, and use E = 12 stretchable frames without filters and without overlap, Transforming to ExP = 12 × 12 identically sized matrices of linear prediction cepstrum coefficients (LPCC);
(5) calculating an average value and a variance value of LPCC of samples from a plurality of samples in an unprocessed database having the plurality of samples;
(6) using a simplified Bayes classifier to find, from the unprocessed database having the plurality of samples, the N unknown words, each having an LPCC sample mean and variance, with the N Bayesian distances closest to a known word in the permanent database, or
finding, from the unprocessed database having no samples, the N unknown words with the N absolute distances closest to the known word in the permanent database;
(7) calculating the mean and the variance of the LPCC of the known word from (N + 1) data, namely the N LPCC sample means of the N unknown words that have the N Bayesian distances closest to the known word in the unprocessed database having the plurality of samples, and the LPCC of the known word in the permanent database,
representing the ExP = 12 × 12 matrix of LPCC means and variances of the known word as the feature of the known word, called the standard pattern, and storing the standard patterns of the known words, including known words of several different languages, in a word database,
and creating the necessary sentences and names from the known words in the word database and storing them in the sentence and name database;
(8) if the unknown words or continuous sounds in the unprocessed database have no samples, treating the N LPCCs of the N unknown words that have the N shortest absolute distances to the known word in the permanent database, together with the LPCC of the known word in the permanent database, as (N + 1) data,
calculating the mean and variance values of the (N + 1) data,
and storing in the word database the ExP = 12 × 12 matrix of LPCC mean and variance values as the feature of the known word, referred to as its standard pattern;
(9) normalizing the waveform length of an input unknown word or continuous sound using E = 12 stretchable frames without filters and without overlap,
and converting the entire waveform into an identically sized ExP = 12 × 12 matrix of LPCCs, referred to as the classification pattern of the unknown word;
(10) matching the standard pattern of each known word in the word database against the classification pattern of the input unknown word,
and, using a simplified Bayes classifier, finding in the word database the known word with the shortest Bayes distance to the unknown word;
(11) dividing one unknown sentence or name into D unknown words;
(12) for each of the D unknown words, using a Bayes classifier to find in the word database the F known words most similar to that unknown word,
and representing the unknown sentence or name by a DxF matrix of similar known words in several languages;
(13) matching the DxF matrix of similar known words displaying the unknown sentence or name with all known sentences and names in the sentence and name database;
And searching the sentence and name database for a known sentence or name most likely to be the unknown sentence or name,
(14) improving the features of the unknown words in the input unknown sentence or name so as to ensure that the input unknown sentence or name is correctly identified.
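
As an illustration of steps (6) and (8) for a database without samples, the following Python sketch picks the N unknown words with the smallest absolute distance to a known word and computes the mean and variance of the resulting (N + 1) data. The function names, the flattening of each 12 × 12 LPCC matrix into a 144-element list, and the use of the population variance are assumptions made for the example; the claim does not specify them.

```python
from statistics import fmean, pvariance


def abs_distance(x, y):
    """Sum of absolute differences between two flattened LPCC matrices."""
    return sum(abs(a - b) for a, b in zip(x, y))


def standard_pattern(known, unknowns, n):
    """Return (mean, variance) of the known word and its n nearest unknown words.

    known    -- flattened LPCC matrix of the known word (list of floats)
    unknowns -- list of flattened LPCC matrices of unknown words
    n        -- number of nearest unknown words to use
    """
    nearest = sorted(unknowns, key=lambda u: abs_distance(known, u))[:n]
    data = nearest + [known]                     # the (N + 1) data
    # zip(*data) walks the data element by element across all N + 1 vectors.
    mean = [fmean(col) for col in zip(*data)]
    var = [pvariance(col) for col in zip(*data)]  # population variance (assumed)
    return mean, var
```

With real data, `known` and each entry of `unknowns` would hold 144 values; the sketch works for any common length.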

The method of claim 1 for identifying utterances in any language, wherein step (3) further includes:
(A) within a unit time frame, calculating the variance of the sample points of the speech signal and the variance of the sample points of the noise, and deleting the unit time frame if the variance of the speech signal sample points is smaller than the variance of the noise sample points;
(B) within a unit time frame, calculating the sum of absolute distances between adjacent sample points of the speech signal and the sum of absolute distances between adjacent sample points of the noise, and deleting the unit time frame if the sum for the speech signal is smaller than the sum for the noise.
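
The two deletion tests in (A) and (B) can be sketched in Python as follows. This is a minimal illustration: the function names, the fixed `frame_len`, and the use of a separately recorded noise segment (`noise`, assumed to be at least one frame long) as the reference are assumptions, not details taken from the claim.

```python
def variance(xs):
    """Population variance of the sample points in one frame."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def abs_diff_sum(xs):
    """Sum of absolute distances between adjacent sample points."""
    return sum(abs(b - a) for a, b in zip(xs, xs[1:]))


def drop_silent_frames(signal, noise, frame_len, measure=variance):
    """Delete every unit time frame whose measure is smaller than the noise's.

    measure is either `variance` (test A) or `abs_diff_sum` (test B).
    """
    kept = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        # Compare against a noise segment of the same length as the frame,
        # so a shorter final frame is judged on equal terms.
        noise_ref = noise[:len(frame)]
        if measure(frame) >= measure(noise_ref):
            kept.extend(frame)
    return kept
```

Passing `measure=abs_diff_sum` switches the same loop from test (A) to test (B).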

The method of claim 1 for identifying utterances in any language without using samples, wherein step (4) further includes:
(A) dividing the full waveform length of one word or continuous sound into E = 12 equal sections, each section forming a stretchable frame without filters and without overlap, so that the E = 12 equal-length frames cover the full waveform length and can stretch or contract with it;
(B) within each stretchable frame, using a linear regression model with P = 12 regression coefficients to estimate the nonlinear time-varying waveform, and using the least squares method to produce P = 12 linear predictive coding (LPC) coefficients;
(C) using Durbin's recurrence equations with the N sample points in each frame to carry out the least squares computation;

(D) representing the word or continuous sound by the E = 12 LPCC vectors, that is, by an ExP = 12 × 12 matrix of LPCCs.
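
The feature extraction described in this claim can be sketched in Python as below. The recurrence equations themselves are not reproduced in this text, so the sketch assumes the textbook autocorrelation form of the Levinson-Durbin recursion and the standard LPC-to-cepstrum recursion; all function names are hypothetical, and frames are only approximately equal when the waveform length is not a multiple of 12.

```python
E, P = 12, 12  # number of frames and coefficients per frame, as in the claim


def split_frames(wave, e=E):
    """Split wave into e contiguous, non-overlapping frames covering its full length."""
    n = len(wave)
    bounds = [(k * n) // e for k in range(e + 1)]
    return [wave[bounds[k]:bounds[k + 1]] for k in range(e)]


def autocorr(frame, p=P):
    """Autocorrelation R(0..p) of one frame, with no window or filter applied."""
    n = len(frame)
    return [sum(frame[t] * frame[t + i] for t in range(n - i)) for i in range(p + 1)]


def levinson_durbin(r, p=P):
    """LPC coefficients a[1..p] such that s[n] is predicted by sum_j a[j] * s[n - j]."""
    a = [0.0] * (p + 1)
    err = r[0]
    for i in range(1, p + 1):
        if err <= 0:  # degenerate frame (e.g. all zeros): leave remaining terms at 0
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lpc_to_lpcc(a):
    """Cepstrum recursion: c[i] = a[i] + sum_{j<i} (j / i) * c[j] * a[i - j]."""
    c = [0.0] * len(a)
    for i in range(1, len(a) + 1):
        c[i - 1] = a[i - 1] + sum((j / i) * c[j - 1] * a[i - j - 1] for j in range(1, i))
    return c


def lpcc_matrix(wave, e=E, p=P):
    """E x P matrix of LPCCs; the shape does not depend on the waveform length."""
    return [lpc_to_lpcc(levinson_durbin(autocorr(f, p), p)) for f in split_frames(wave, e)]
```

Because the frame length scales with the waveform, a short and a long utterance of the same word both map to a 12 × 12 matrix, which is what allows them to be compared element by element.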

The method of claim 1 for identifying utterances in any language without using samples, wherein step (5) further includes:
(A) dividing the full waveform length of an unknown word or continuous sound into E = 12 equal sections, each section forming a stretchable frame without filters and without overlap;
(B) within each of the E = 12 stretchable frames, using a linear regression model with P = 12 regression coefficients to estimate the nonlinear time-varying waveform, and using the least squares method to produce the LPC vector;
(C) carrying out the least squares method using Durbin's recurrence equations;

(E) using the ExP = 12 × 12 LPCC matrices of the samples of an unknown word or continuous sound having a plurality of samples to calculate the LPCC mean and variance values of that unknown word or continuous sound, and storing the mean and variance values in the unprocessed database having the plurality of samples.
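
The mean and variance calculation in (E) amounts to an elementwise statistic over the LPCC matrices of the samples. A minimal Python sketch follows; the function name and the choice of the population variance (dividing by the number of samples) are assumptions, since the claim does not state which variance estimator is used.

```python
def mean_variance(matrices):
    """Elementwise mean and variance over a list of equally sized matrices.

    matrices -- list of E x P matrices (lists of lists), one per sample
    Returns (mean, var), each an E x P matrix.
    """
    k = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    mean = [[sum(m[j][l] for m in matrices) / k for l in range(cols)]
            for j in range(rows)]
    var = [[sum((m[j][l] - mean[j][l]) ** 2 for m in matrices) / k
            for l in range(cols)]
           for j in range(rows)]
    return mean, var
```

For the method described here each matrix would be 12 × 12, giving 144 means and 144 variances per unknown word.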

The method of claim 1 for identifying utterances in any language without using samples, wherein step (6) further includes:

(E) using a simplified Bayes classifier to match the known word in the permanent database against all unknown words or continuous sounds in the unprocessed database having the plurality of samples;

(G) taking the logarithm of f(x | ω_i) and deleting unnecessary constants, so that the Bayes classifier expresses similarity as a Bayes distance;

(H) for each unknown word ω_i, i = 1, ..., M, calculating the Bayes distance l(ω_i) from the known word X to the unknown word ω_i as in (G);
(I) selecting, in the unprocessed database having the plurality of samples, the N unknown words with the N shortest Bayes distances l(ω_i) to the known word X in the permanent database, and using the LPCC mean and variance values of the samples of these N unknown words surrounding the known word X to calculate the feature values of the known word, referred to as the standard pattern of the known word.
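
Steps (G) to (I) can be illustrated with the following Python sketch. The density f(x | ω_i) is not reproduced in this text, so the sketch assumes independent normal features, for which the negative log-density with constants dropped is the sum of ln σ plus half the squared standardized deviation. The function names and the dictionary layout of the database are hypothetical, and all variances are assumed to be strictly positive.

```python
import math


def bayes_distance(x, mean, var):
    """-ln f(x | w) without the constant term, assuming independent normal features.

    x, mean, var -- flattened feature vectors of equal length; var must be > 0.
    """
    return sum(0.5 * math.log(v) + 0.5 * (xi - m) ** 2 / v
               for xi, m, v in zip(x, mean, var))


def nearest_unknown_words(x, unknown_db, n):
    """Names of the n unknown words with the smallest Bayes distance to x.

    unknown_db -- dict mapping a word label to its (mean, var) pair
    """
    ranked = sorted(unknown_db, key=lambda name: bayes_distance(x, *unknown_db[name]))
    return ranked[:n]
```

A smaller distance means a higher likelihood, so sorting in ascending order places the most similar unknown words first.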

The method of claim 1 for identifying utterances in any language, wherein step (11) further includes:
(A) for the speech signal and the noise within a unit time frame, calculating the sum of absolute distances between adjacent sample points, and treating the unit time frame as a unit time frame without speech signal if the sum for the speech signal is smaller than the sum for the noise;
(B) if the run of unit time frames without speech signal is longer than the time between two syllables within one word, locating there a boundary between two unknown words, and dividing the unknown sentence or name into D unknown words at the boundaries;
(C) normalizing the waveform of each of the D unknown words with E = 12 stretchable frames without filters and without overlap, calculating within each frame the LPC vector and the LPCC vector so that each unknown word is represented by a 12 × 12 matrix of LPCCs, and representing the unknown sentence or name by D 12 × 12 matrices of LPCCs.
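
The segmentation in (A) and (B) can be sketched in Python as follows. The parameters `noise_level` (the noise's sum of absolute distances over one unit frame, assumed to be measured beforehand) and `syllable_gap_frames` (the longest pause, in unit frames, still counted as lying between two syllables of one word) are hypothetical names introduced for the example.

```python
def abs_diff_sum(xs):
    """Sum of absolute distances between adjacent sample points."""
    return sum(abs(b - a) for a, b in zip(xs, xs[1:]))


def split_words(signal, frame_len, noise_level, syllable_gap_frames):
    """Divide a sentence waveform into words at sufficiently long silent runs.

    A unit frame counts as silent when its abs_diff_sum is below noise_level.
    A run of more than syllable_gap_frames silent frames marks a word boundary;
    shorter silent runs are dropped but do not split the word.
    """
    words, current, silent_run = [], [], 0
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        if abs_diff_sum(frame) < noise_level:
            silent_run += 1
            if silent_run > syllable_gap_frames and current:
                words.append(current)   # close the word before the boundary
                current = []
        else:
            silent_run = 0
            current.extend(frame)
    if current:
        words.append(current)
    return words
```

The length of the returned list is the number D of unknown words; each entry can then be converted into its own 12 × 12 LPCC matrix as in (C).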

The method of claim 1 for identifying utterances in any language without using samples, wherein step (12) further includes:
(A) finding the F known words most similar to each unknown word;
(B) representing the unknown sentence or name by a DxF matrix of similar known words belonging to different languages.
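
A minimal Python sketch of building the DxF matrix is given below. It assumes the same independent-normal Bayes distance as a stand-in for the classifier, since the distance formula is not reproduced in this text; the function names and the dictionary layout of the word database are hypothetical.

```python
import math


def bayes_distance(x, mean, var):
    """Negative log-likelihood without constants, assuming independent normal features."""
    return sum(0.5 * math.log(v) + 0.5 * (xi - m) ** 2 / v
               for xi, m, v in zip(x, mean, var))


def top_f_matrix(unknown_words, word_db, f):
    """Return a D x F matrix: row d lists the f known words closest to unknown word d.

    unknown_words -- list of D flattened classification patterns
    word_db       -- dict mapping a known word to its (mean, var) standard pattern
    """
    return [
        sorted(word_db, key=lambda w: bayes_distance(x, *word_db[w]))[:f]
        for x in unknown_words
    ]
```

Each row is ordered from most to least similar, so the first entry of row d is the single best match for the d-th unknown word.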

The method of claim 1 for identifying utterances in any language without using samples, wherein step (13) further includes:
(A) selecting from the sentence and name database, as matching candidates, the sentences and names having (D−1), D or (D+1) known words;
(B) for a matching known sentence or name having D words, comparing each of its D known words in turn with the corresponding one of the D columns of the F most similar known words;
(C) if each column of the F most similar words contains, in order, the corresponding word of the matching sentence or name, determining that the matching sentence or name is the unknown sentence or name;
(D) if the number of correctly identified words in (C) is not D, or if the matching sentence or name has (D−1) or (D+1) known words, using a 3xF sorting window of three successive columns of the F most similar words in the DxF matrix to screen each known word of the matching sentence or name,
using the (i−1)-th, i-th and (i+1)-th columns of the F most similar known words to compare with the i-th known word of the matching sentence or name,
using the first two columns of the F most similar words to compare with the first known word of the matching sentence or name, and moving the 3xF sorting window from the first column to the last column,
and counting the number of known words of the matching sentence or name found in the 3xF sorting window;
(E) selecting, as the unknown sentence or name, the matching sentence or name with the highest probability, the probability being the number of known words of the matching sentence or name found in the 3xF sorting window divided by the total number of words in the matching sentence or name.
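
The window screening in (D) and (E) can be sketched in Python as follows. The function names are hypothetical, and as a simplification the sketch applies the 3xF window to every candidate sentence, including those with exactly D words; a sentence that passes the exact test in (C) also scores 1.0 under the window.

```python
def window_score(cands, sentence):
    """Fraction of the sentence's words found inside their 3 x F window.

    cands    -- D lists, each holding the F most similar known words for one
                unknown word (the D x F matrix)
    sentence -- list of known words of a candidate sentence or name
    """
    d = len(cands)
    hits = 0
    for i, word in enumerate(sentence):
        # Positions i-1, i, i+1, clipped to the matrix; for the first word
        # this leaves only the first two positions.
        lo, hi = max(0, i - 1), min(d, i + 2)
        if any(word in cands[k] for k in range(lo, hi)):
            hits += 1
    return hits / len(sentence)


def best_sentence(cands, sentences):
    """Pick the candidate with D-1, D or D+1 words and the highest window score."""
    d = len(cands)
    eligible = [s for s in sentences if s and d - 1 <= len(s) <= d + 1]
    return max(eligible, key=lambda s: window_score(cands, s), default=None)
```

Looking one position to either side lets the match survive a word that was dropped or wrongly split during segmentation, which is why sentences of length D−1 and D+1 remain eligible.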

The method of claim 1 for identifying utterances in any language without using samples, wherein step (14) further includes:
(a) if the unknown sentence or name is not correctly identified, finding the word ω of the unknown sentence or name that is not among its F most similar words;
(b) finding in the word database the N known words closest to the word ω by Bayes distance, represented by N matrices of LPCC mean and variance values {μ_ijl, σ²_ijl}, i = 1, ..., N,
calculating the weighted averages of the N matrices,
and taking the weighted averages {μ_jl, σ²_jl}, j = 1, ..., E, l = 1, ..., P, as new feature values,
replacing the standard pattern of the word ω in the word database, that is, storing the new feature values of the word ω in the word database as the new standard pattern of the word ω;
(c) finding in the word database the N known words closest to the word ω by Bayes distance, represented by N matrices of LPCC mean and variance values {μ_ijl, σ²_ijl}, i = 1, ..., N,

and replacing the standard pattern of the word ω, storing the new standard pattern of the word ω in the word database.
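
The replacement of a standard pattern by a weighted average of its N nearest known words can be sketched in Python as below. The weights are not recoverable from this text, so equal weights are assumed by default and exposed as a parameter; the function names and the dictionary layout of the word database are hypothetical.

```python
def weighted_average(patterns, weights=None):
    """Weighted average of N (mean, var) patterns, element by element.

    patterns -- list of (mean, var) pairs of flattened matrices
    weights  -- optional list of N non-negative weights (equal weights assumed if omitted)
    """
    if weights is None:
        weights = [1.0] * len(patterns)
    total = sum(weights)
    size = len(patterns[0][0])
    mean = [sum(w * p[0][k] for w, p in zip(weights, patterns)) / total
            for k in range(size)]
    var = [sum(w * p[1][k] for w, p in zip(weights, patterns)) / total
           for k in range(size)]
    return mean, var


def refresh_pattern(word_db, word, neighbours, weights=None):
    """Replace the standard pattern of `word` with the average of its neighbours.

    neighbours -- labels of the N known words closest to `word`
    """
    word_db[word] = weighted_average([word_db[n] for n in neighbours], weights)
    return word_db[word]
```

The list of neighbours would come from a nearest-word search by Bayes distance over the word database, as in steps (b) and (c).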