JPH06214595A

JPH06214595A - Voice recognition method

Info

Publication number: JPH06214595A
Application number: JP5007396A
Authority: JP
Inventors: Makoto Shosakai; 誠庄境; Kunihiko Owa; 邦彦尾和; Kazuya Takeda; 一哉武田; Shingo Kuroiwa; 眞吾黒岩
Original assignee: Kokusai Denshin Denwa KK; Asahi Chemical Industry Co Ltd
Current assignee: KDDI Corp; Asahi Chemical Industry Co Ltd
Priority date: 1993-01-20
Filing date: 1993-01-20
Publication date: 1994-08-05

Abstract

PURPOSE: To eliminate the waiting time of each arithmetic processor and to shorten the frame length by repeatedly executing a third arithmetic process every frame together with a new first arithmetic process. CONSTITUTION: A second arithmetic processor 20 repeatedly executes a second arithmetic process every frame, delayed by at least one frame time after a first arithmetic process of a first arithmetic processor 10. The processor 10 repeatedly executes, together with the first arithmetic process, a third arithmetic process that is delayed by at least one frame time relative to the second arithmetic process. The series of first to third arithmetic processes that the two processors perform on one item of data thus spans multiple frame times, yet each processor executes its assigned processing every frame and operates on the data that are input in time series.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition method that executes speech recognition processing at a fixed period, such as a frame, and more particularly to a speech recognition method in which a plurality of digital processors share the processing.

[0002]

[Prior Art] Conventionally, in a typical speech recognition method, speech is converted into an electrical signal, features of the signal waveform are extracted, and those features are compared with predetermined standard patterns. The identification label (an identification number for a phoneme or phonological unit) attached to the most similar standard pattern is output as the speech recognition result.

[0003] In such a speech recognition method, features are extracted from the speech signal at a fixed period (called the frame period), and speech recognition is performed based on the result. Extracting features of the signal waveform and calculating the similarity to the standard patterns (distance calculation) involve complex computation, so the arithmetic processing takes time. Recently, therefore, speech recognition apparatuses have been proposed that divide the arithmetic processing among a plurality of digital processors and shorten the frame period in order to improve recognition accuracy.

[0004] FIG. 2 shows a partial circuit configuration of a conventional speech recognition apparatus of this type. In FIG. 2, a buffer 15 is provided between a first digital (arithmetic) processor (DSP) 10 and a second DSP 20.

[0005] In this configuration, the first DSP 10 executes a first arithmetic process for speech recognition using feature parameters extracted from the speech signal, and transfers the result P(t) to the buffer 15. The buffer 15 temporarily stores the result P(t) to adjust the transfer timing, and delivers the stored result to the second DSP 20 as P′(t).

[0006] The second DSP 20 uses this result P′(t) to perform a second arithmetic process for speech recognition. Its result Q(t) is temporarily stored in the buffer 15 and then delivered to the first DSP 10 as Q′(t). The first DSP 10 uses the second result Q′(t) to perform a third arithmetic process and outputs the speech recognition result.

[0007] This series of processes is executed within one frame. For reference, FIG. 3 shows the execution timing of the above processing. As is clear from FIG. 3, the processing executed within one frame consists of: computation by the first DSP 10 → transfer to the buffer 15 → computation by the second DSP 20 → transfer to the buffer 15 → computation by the first DSP 10. The length of one frame is therefore the sum of these processing times.

[0008]

[Problem to Be Solved by the Invention] While one of the first and second DSPs 10 and 20 is executing its arithmetic processing, the other DSP is waiting (see FIG. 3). In addition, because the processing speed of a DSP is limited, the frame length cannot be reduced beyond a certain point even at the DSP's maximum processing speed. The conventional method therefore had the problem that shortening the frame length is difficult.
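The waiting time described in [0007] and [0008] can be illustrated with a small sketch. It assumes that each DSP is idle whenever it is not running its own arithmetic process, including during buffer transfers. The function name and the durations are hypothetical and are not taken from the patent.

```python
def conventional_frame(t_first, t_transfer_p, t_second, t_transfer_q, t_third):
    """Sequential scheme of FIG. 3: every step runs back to back in one frame.

    Returns (frame_length, dsp1_idle, dsp2_idle), all in the same time unit.
    """
    # One frame is the sum of all five steps listed in [0007].
    frame_length = t_first + t_transfer_p + t_second + t_transfer_q + t_third
    # Assumption: a DSP is idle whenever it is not computing.
    dsp1_idle = frame_length - (t_first + t_third)
    dsp2_idle = frame_length - t_second
    return frame_length, dsp1_idle, dsp2_idle


# Hypothetical step durations in microseconds.
frame_length, dsp1_idle, dsp2_idle = conventional_frame(300, 50, 400, 50, 200)
print(frame_length, dsp1_idle, dsp2_idle)  # 1000 500 600
```

With these illustrative numbers, each DSP spends at least half of every frame waiting.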

[0009] In view of the above, an object of the present invention is to provide a speech recognition method that, when speech recognition processing is divided among a plurality of DSPs, can reduce the DSP waiting time and shorten the frame length.

[0010]

[Means for Solving the Problem] To achieve this object, the present invention provides a speech recognition method in which a first arithmetic processor executes a first arithmetic process, a second arithmetic processor executes a second arithmetic process using the result of the first arithmetic process, and the first arithmetic processor executes a third arithmetic process using the result of the second arithmetic process, the processing related to speech recognition being executed repeatedly for each frame. The method is characterized in that the second arithmetic processor repeatedly executes the second arithmetic process for each frame, delayed by at least one frame time after the first arithmetic process of the first arithmetic processor, and in that the first arithmetic processor repeatedly executes, for each frame, the third arithmetic process, delayed by at least one frame time from the second arithmetic process, together with a new first arithmetic process.

[0011]

[Operation] In the present invention, the series of processes from the first to the third arithmetic process, which the two arithmetic processors execute in connection with one item of data, spans a plurality of frame times. Each arithmetic processor nevertheless executes the processing assigned to it every frame, operating on the data that are input in time series.

[0012]

[Embodiment] An embodiment of the present invention is described in detail below with reference to the drawings.

[0013] FIG. 1 shows the processing sequence of the embodiment of the present invention.

[0014] The circuit configuration of this embodiment is the same as that of the conventional example in FIG. 2, and the processing performed by each component circuit is also the same as in the conventional example. In this embodiment, the processing order is changed so that the multiple processes conventionally executed within one frame are executed with successive delays of one frame each. This eliminates the DSP waiting time while still outputting a speech recognition result every frame. The processing sequence of this embodiment is described below.

[0015] In FIG. 1, when the feature parameters used for speech recognition are input to the first DSP 10 at time t, the first DSP 10 executes the first arithmetic process and passes the result P(t) to the buffer 15. At the next time t+1, the result for time t is delivered to the second DSP 20 as P′(t). At time t+2, the second DSP 20 performs the second arithmetic process using the first DSP 10's result for time t. Its result Q(t) is transferred to the buffer 15 and, at time t+3, delivered from the buffer 15 to the first DSP 10. At time t+4, the first DSP executes the third arithmetic process on the second result Q′(t) and outputs the speech recognition result. At time t+4 the first DSP also performs the first arithmetic process, as in the conventional method, but the feature parameters used for that first arithmetic process are the data input at time t+4. This is the processing sequence by which the feature parameters of a given time t are converted into a speech recognition result at time t+4.
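The sequence in [0015] can be sketched as a software pipeline that advances one stage per frame. This is only an illustration under stated assumptions: `first_op`, `second_op`, and `third_op` are hypothetical stand-ins for the three arithmetic processes, the four state variables model the buffer hand-offs, and one loop iteration represents one frame.

```python
def first_op(x):
    """Hypothetical stand-in for the first arithmetic process (DSP 10)."""
    return ("P", x)


def second_op(p):
    """Hypothetical stand-in for the second arithmetic process (DSP 20)."""
    return ("Q", p[1])


def third_op(q):
    """Hypothetical stand-in for the third arithmetic process (DSP 10)."""
    return ("result", q[1])


def run_pipeline(features):
    """Return one output per frame; None until the pipeline has filled."""
    p_buf = p_at_dsp2 = q_buf = q_at_dsp1 = None
    outputs = []
    for x in features:
        # DSP 10, third process: consumes Q'(t-4), delivered in the last frame.
        out = third_op(q_at_dsp1) if q_at_dsp1 is not None else None
        # Advance every stage by one frame. The right-hand side is evaluated
        # entirely from the previous frame's state before any assignment.
        p_buf, p_at_dsp2, q_buf, q_at_dsp1 = (
            first_op(x),                                    # P(t) to buffer
            p_buf,                                          # P'(t-1) to DSP 20
            second_op(p_at_dsp2) if p_at_dsp2 is not None else None,  # Q(t-2)
            q_buf,                                          # Q'(t-3) to DSP 10
        )
        outputs.append(out)
    return outputs


outputs = run_pipeline(range(7))
print(outputs[4])  # ('result', 0): the input of frame 0 is recognized at frame 4
```

After the first four frames, each iteration emits one result, which matches the statement that the data input at time t yields a recognition result at time t+4 while a new result is still produced every frame.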

[0016] The first DSP 10 repeats the first and third arithmetic processes at each time step. The first DSP can therefore output a speech recognition result at fixed intervals.

[0017] In this embodiment, data transfers between the components are synchronized frame by frame. Compared with the conventional example, in which the data transfers take place within a single frame, neither the first DSP 10 nor the second DSP 20 has to wait while data is exchanged with the other side. The frame length can therefore be shortened by the amount of waiting time saved.

[0018] More specifically, the conventional frame length is the time for the first and third arithmetic processes of the first DSP 10, plus the data transfer time of the buffer 15, plus the time for the second arithmetic process of the second DSP 20. In this embodiment, the frame length is the largest of these three values: the time for the first and third arithmetic processes, the data transfer time of the buffer 15, or the time for the second arithmetic process of the second DSP 20.
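The comparison in [0018] amounts to a sum versus a maximum over the same three durations. A minimal sketch follows; the function names and the example durations are hypothetical.

```python
def conventional_frame_length(t_first_and_third, t_transfer, t_second):
    """Conventional scheme: the three durations add up within one frame."""
    return t_first_and_third + t_transfer + t_second


def pipelined_frame_length(t_first_and_third, t_transfer, t_second):
    """Embodiment: the stages overlap, so the slowest one sets the frame."""
    return max(t_first_and_third, t_transfer, t_second)


# Hypothetical durations in microseconds.
times = (500, 100, 400)
print(conventional_frame_length(*times))  # 1000
print(pipelined_frame_length(*times))     # 500
```

With these illustrative values the frame length is halved; the actual gain depends on how evenly the load is split between the two DSPs.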

[0019] In addition to this embodiment, the following variations can be implemented.

[0020] (1) This embodiment provides a data transfer buffer between the first DSP 10 and the second DSP 20, but a buffer is not strictly required; data can also be exchanged directly between the first DSP 10 and the second DSP 20. In that case, the two processors exchange communication control signals for the data transfer. Also, in this embodiment the buffer 15 serves both for input to and output from the first DSP 10, but two buffers may be used, one for input and one for output.

[0021] (2) This embodiment takes as its example the case in which two DSPs execute the speech recognition processing. The present invention can also be applied when two or more DSPs are connected in series and a DSP other than the most downstream one performs arithmetic processing again using the result of the most downstream DSP.

[0022] (3) In this embodiment, three processes related to speech recognition are executed by two DSPs, but the number of processes executed by the two DSPs may be three or more. In that case, the frame timings of the processes that the two DSPs execute on the related data being exchanged are made to differ.

[0023]

[Effects of the Invention] As described above, the present invention eliminates the waiting time of each arithmetic processor and allows the frame length to be shortened, which contributes to improved speech recognition accuracy.

[Brief description of drawings]

FIG. 1 is an explanatory diagram showing the processing sequence of the embodiment of the present invention.

FIG. 2 is a block diagram showing a partial configuration of a speech recognition apparatus.

FIG. 3 is an explanatory diagram showing the processing sequence of the conventional example.

[Explanation of symbols]

10 First digital processor (first DSP)
15 Buffer
20 Second digital processor (second DSP)

Continued from the front page
(72) Inventor: Kazuya Takeda, c/o Kokusai Denshin Denwa KK, 2-3-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo
(72) Inventor: Shingo Kuroiwa, c/o Kokusai Denshin Denwa KK, 2-3-2 Nishi-Shinjuku, Shinjuku-ku, Tokyo

Claims

[Claims]

1. A speech recognition method in which a first arithmetic processor executes a first arithmetic process, a second arithmetic processor executes a second arithmetic process using the result of the first arithmetic process, and the first arithmetic processor executes a third arithmetic process using the result of the second arithmetic process, the processing related to speech recognition being executed repeatedly for each frame, characterized in that the second arithmetic processor repeatedly executes the second arithmetic process for each frame, delayed by at least one frame time after the first arithmetic process of the first arithmetic processor, and the first arithmetic processor repeatedly executes, for each frame, the third arithmetic process, delayed by at least one frame time from the second arithmetic process, together with a new first arithmetic process.