JPS59119397A - Voice recognition equipment - Google Patents

Voice recognition equipment

Info

Publication number
JPS59119397A
JPS59119397A, JP57231861A, JP23186182A
Authority
JP
Japan
Prior art keywords
recognition
silent period
section
input
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP57231861A
Other languages
Japanese (ja)
Inventor
繁 佐々木
晋太 木村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd filed Critical Fujitsu Ltd
Priority to JP57231861A priority Critical patent/JPS59119397A/en
Publication of JPS59119397A publication Critical patent/JPS59119397A/en
Pending legal-status Critical Current

Abstract

(57) [Abstract] Because this publication contains application data filed before the introduction of electronic filing, no abstract data is recorded.

Description

[Detailed Description of the Invention]

(A) Technical Field of the Invention

The present invention relates to a speech recognition apparatus, and in particular to one in which the apparatus as a whole is divided into an analysis section, a recognition section, and an input/output management section so that processing results are handed over from stage to stage, and in which the transfer between the analysis section and the recognition section is speeded up.

(B) Technical Background and Problems

Conventionally, a speech recognition apparatus is generally configured to (i) perform frequency analysis of the incoming speech and generate a plurality of pieces of time-series feature information, (ii) match these against the standard feature time series stored in a dictionary to determine the category of the incoming speech, and (iii) output the recognition result to an input/output device. For such an apparatus, a configuration has been considered in which a microprocessor is provided for each of the processes (i), (ii), and (iii), and the result of each process is handed over to the next in sequence.

When such a configuration is considered, the question arises of what unit should be used when handing over the result of each process. That is, the problems in handing over processing results when processes (i), (ii), and (iii) are assigned respectively to an analysis section, a recognition section, and an input/output management section are explained with reference to FIG. 1.

In FIG. 1, assume the input speech has the form shown in the figure. The illustrated waveform may be taken to represent input speech containing a short silence inside a word, as in a word such as "atta" (あった). As is well known, the analysis section performs frequency analysis of the input speech and generates time-series feature information for each frequency band; at the same time it monitors, for example, the energy of the input speech and checks the length of any silent period that falls below the illustrated threshold. If, for example, a silent period lasts 0.3 seconds or more, it is regarded as an inter-word silence; if it is shorter, it is attributed to a word-internal silence of the kind mentioned above, and the word is judged to be still continuing.
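Read concretely, the rule described above amounts to a simple threshold test. The following minimal sketch (an illustration only; the constant names, the energy floor, and the function names are assumptions, not terms from the patent) expresses the 0.3-second criterion in Python:

WORD_GAP_SEC = 0.3       # inter-word silence threshold given in the description
ENERGY_FLOOR = 0.01      # assumed energy level below which a frame counts as silent

def is_silent(frame_energy, floor=ENERGY_FLOOR):
    """A frame is treated as silent when its energy falls below the assumed floor."""
    return frame_energy < floor

def classify_gap(gap_sec, word_gap_sec=WORD_GAP_SEC):
    """A gap of 0.3 s or more is an inter-word silence; anything shorter is a pause inside a word."""
    return "inter_word" if gap_sec >= word_gap_sec else "within_word"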

Since the silent period (A) shown in FIG. 1 is shorter than 0.3 seconds, the analysis section regards the word as still continuing. It is only in the illustrated silent period (B) that 0.3 seconds of silence is first detected, so a word boundary is judged to have arrived; the time-series feature information of each frequency band accumulated up to that point is then divided into, for example, 16 segments and handed to the recognition section as 16 steps of information. In the case shown in FIG. 1, the recognition section therefore receives the 16 steps of information and starts processing only after the above 0.3 seconds have elapsed. The recognition result is then handed over to the input/output management section.
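For illustration, the conventional timing just described can be sketched as follows (a simplified, assumption-based rendering rather than anything taken from the patent: resample_to_steps, recognize, and the frame representation are all hypothetical). The essential point is that the recognizer is invoked only after the full 0.3-second gap has already elapsed:

def resample_to_steps(frames, steps=16):
    """Reduce a variable-length sequence of feature frames to a fixed number of steps."""
    if not frames:
        return []
    return [frames[int(i * len(frames) / steps)] for i in range(steps)]

def conventional_handover(frames_up_to_gap, recognize):
    # Called only once 0.3 s of silence has been observed, so the recognizer
    # sits idle during the whole inter-word gap -- the delay discussed below.
    analysis_data = resample_to_steps(frames_up_to_gap, steps=16)
    return recognize(analysis_data)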

As described above, when a scheme is adopted in which processing results are handed over in sequence among the analysis section, the recognition section, and the input/output management section, the handover generally has to take the form shown in FIG. 1. When on-line speech recognition is required, however, this 0.3-second delay becomes a problem.

(C) Object and Constitution of the Invention

The present invention aims to solve the above problem. The speech recognition apparatus of the present invention performs frequency analysis of the incoming speech to generate a plurality of pieces of time-series feature information, matches them against the standard feature time series stored in a dictionary to determine the category of the incoming speech, and outputs the recognition result to an input/output device. The apparatus comprises an analysis section that generates the plurality of pieces of time-series feature information, a recognition section that determines the category, and an input/output management section that performs the output, and carries out pipeline processing in which the result obtained by the analysis section is handed to the recognition section and the result obtained by the recognition section is handed to the input/output management section. The analysis section is provided with monitoring means for monitoring the length of silent periods in the incoming speech; at the start of a silent period it transfers to the recognition section the time-series feature information based on the processing results obtained up to that point, and then notifies the recognition section whether or not that silent period was an inter-word silence; if it was not, the time-series feature information is transferred anew at the next silent period. The recognition section starts recognition processing on the basis of the received time-series feature information and, upon the notification concerning the inter-word silence, decides whether to continue or abort that recognition processing. The invention is explained below with reference to the drawings.
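Read as a protocol, the constitution described above reduces to three kinds of messages flowing from the analysis section to the recognition section. The names in the sketch below (TRANSFER, CONTINUE, CANCEL) are illustrative assumptions; the text itself speaks only of a transfer of feature information, a notification about the inter-word silence, and a continue-or-stop decision:

from enum import Enum, auto

class AnalysisMessage(Enum):
    TRANSFER = auto()   # time-series feature data, sent at the start of a silent period
    CONTINUE = auto()   # the silence reached 0.3 s: it was an inter-word gap, keep the result
    CANCEL = auto()     # speech resumed within 0.3 s: the gap was inside a word, discard the work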

(D) Embodiment of the Invention

FIG. 2 shows the configuration of one embodiment of the present invention, and FIG. 3 is an explanatory diagram, corresponding to FIG. 1, that illustrates the processing of this embodiment.

In FIG. 2, reference numeral 1 denotes the analysis section; 2, the recognition section; 3, the input/output management section; 4, a network node that mediates the transfers between the sections; 5, an analysis circuit that extracts time-series feature information for each of a plurality of frequency bands; 6, an analysis processor that performs the processing and control within the analysis section 1; 7, silent-period monitoring means; 8, a recognition processor that performs the processing and control related to recognition; 9, a dictionary and matching circuit; 10, an input/output processor that performs processing and control such as outputting characters on the basis of the recognition result; and 11, an input/output device.
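As a rough structural sketch (class and attribute names are assumptions made for illustration; the patent only names the blocks 1-11 and their roles), the components could be grouped as follows:

from dataclasses import dataclass, field

@dataclass
class SilenceMonitor:          # (7) silent-period monitoring means
    threshold_sec: float = 0.3

@dataclass
class AnalysisSection:         # (1): analysis circuit (5) and analysis processor (6)
    monitor: SilenceMonitor = field(default_factory=SilenceMonitor)

@dataclass
class RecognitionSection:      # (2): recognition processor (8), dictionary & matching circuit (9)
    dictionary: dict = field(default_factory=dict)

@dataclass
class IOManagementSection:     # (3): input/output processor (10) driving the I/O device (11)
    device_name: str = "console"

@dataclass
class NetworkNode:             # (4): mediates all transfers between the three sections
    analysis: AnalysisSection
    recognition: RecognitionSection
    io: IOManagementSection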

The operation is explained below with reference also to FIG. 3.

While the analysis circuit 5 is extracting the plurality of pieces of time-series feature information described above, the analysis processor 6 uses the monitoring means 7 to check for the presence of a silent period. When the illustrated silent period (A) begins, the analysis processor 6 assumes at that instant that an inter-word silence has started, divides the time-series feature information obtained so far into, for example, 16 steps of information as described above (analysis data 1 in the figure), and transfers it to the recognition processor 8 of the recognition section 2. The recognition processor 8 immediately starts the matching process and continues it; if it finishes early, it transfers the result to the input/output processor 10 of the input/output management section 3. If, during this time, the silent period of the input speech ends within 0.3 seconds, as in the illustrated period (A), the analysis processor 6 issues a cancel command to the recognition processor 8 and, if necessary, to the input/output processor 10, whereupon the recognition processor 8 and the input/output processor 10 discard the processing done so far.
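One possible shape for the recognition side of this exchange is sketched below (again an assumption-based illustration: the handler names and the callable match_fn standing in for the dictionary and matching circuit 9 are hypothetical). Matching begins as soon as the data arrives, the partial result is discarded on a cancel, and it is forwarded to the input/output processor 10 on a continue:

class RecognitionProcessorSketch:
    def __init__(self, match_fn, io_processor):
        self.match_fn = match_fn            # stand-in for the dictionary & matching circuit (9)
        self.io_processor = io_processor    # stand-in for the input/output processor (10)
        self.pending_result = None

    def on_transfer(self, analysis_data):
        # Matching starts immediately, i.e. during the silent period itself.
        self.pending_result = self.match_fn(analysis_data)

    def on_cancel(self):
        # Speech resumed within 0.3 s: the gap was inside a word, so discard the work.
        self.pending_result = None

    def on_continue(self):
        # The gap really was a word boundary: commit the result downstream.
        if self.pending_result is not None:
            self.io_processor(self.pending_result)
            self.pending_result = None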

Next, when a silent period begins as in the illustrated period (B), the analysis processor 6 divides the information from the illustrated analysis start point t1 into 16 steps of information in the same way as above (analysis data 2 in the figure) and transfers it to the recognition processor 8 of the recognition section 2, which starts processing immediately. If, as in the illustrated silent period (B), the silence continues for 0.3 seconds or more, the analysis processor 6 issues a continue command to the recognition processor 8 and, if necessary, to the input/output processor 10, so that the processing is carried through to completion. Naturally, once the 0.3 seconds of the illustrated silent period have elapsed, the speech corresponding to the next word may begin to arrive.
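Putting the two cases together, the decision logic of the analysis processor 6 could take the form sketched below (an illustration under the same assumptions as the earlier sketches: it reuses resample_to_steps and a recognizer object exposing on_transfer, on_cancel, and on_continue, and uses per-frame energies as a stand-in for the real feature frames):

def analysis_loop(frame_energies, frame_sec, recognizer,
                  energy_floor=0.01, word_gap_sec=0.3):
    buffered = []          # feature frames of the current word (energies used as a stand-in)
    silent_sec = 0.0
    transferred = False    # a speculative transfer was already made for this silent period

    for energy in frame_energies:
        if energy < energy_floor:                      # silent frame
            silent_sec += frame_sec
            if not transferred and buffered:
                # Silence onset: hand the data over at once (analysis data 1 / 2 in FIG. 3).
                recognizer.on_transfer(resample_to_steps(buffered, steps=16))
                transferred = True
            if transferred and silent_sec >= word_gap_sec:
                recognizer.on_continue()               # 0.3 s reached: a genuine word boundary
                buffered, transferred = [], False
        else:                                          # speech frame
            if transferred:
                recognizer.on_cancel()                 # the gap ended within 0.3 s
                transferred = False
            silent_sec = 0.0
            buffered.append(energy)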

(E) Effects of the Invention

As described above, according to the present invention, the speech recognition apparatus is divided into an analysis section, a recognition section, and an input/output management section, information can be transferred efficiently between these sections, and the time delay in on-line processing can be eliminated.

[Brief Description of the Drawings]

FIG. 1 is an explanatory diagram illustrating the problem that becomes apparent when the processing within the apparatus is partitioned and the processing results are transferred in sequence; FIG. 2 shows the configuration of one embodiment of the present invention; and FIG. 3 is an explanatory diagram, corresponding to FIG. 1, that illustrates the processing of this embodiment. In the figures, 1 is the analysis section, 2 the recognition section, 3 the input/output management section, 4 the network node, and 6, 8, and 10 are the processors.

Patent applicant: Fujitsu Limited. Agent: patent attorney Hiroshi Mori (and one other).

Claims (1)

[Claims]

A speech recognition apparatus configured to perform frequency analysis of incoming speech to generate a plurality of pieces of time-series feature information, to match them against standard feature time series stored in a dictionary to determine the category of the incoming speech, and to output the recognition result to an input/output device, the apparatus comprising an analysis section that generates the plurality of pieces of time-series feature information, a recognition section that determines the category, and an input/output management section that performs the output, and carrying out pipeline processing in which the result obtained by the analysis section is handed to the recognition section and the result obtained by the recognition section is handed to the input/output management section; wherein the analysis section is provided with monitoring means for monitoring the length of silent periods in the incoming speech and is configured to transfer to the recognition section, at the start of a silent period, the time-series feature information based on the processing results obtained up to that point, then to notify the recognition section whether or not that silent period was an inter-word silence, and, if it was not, to transfer the time-series feature information anew at the next silent period; and wherein the recognition section is configured to start recognition processing on the basis of the received time-series feature information and to decide, upon the notification concerning the inter-word silence, whether to continue or abort that recognition processing.
JP57231861A 1982-12-25 1982-12-25 Voice recognition equipment Pending JPS59119397A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP57231861A JPS59119397A (en) 1982-12-25 1982-12-25 Voice recognition equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP57231861A JPS59119397A (en) 1982-12-25 1982-12-25 Voice recognition equipment

Publications (1)

Publication Number Publication Date
JPS59119397A true JPS59119397A (en) 1984-07-10

Family

ID=16930164

Family Applications (1)

Application Number Title Priority Date Filing Date
JP57231861A Pending JPS59119397A (en) 1982-12-25 1982-12-25 Voice recognition equipment

Country Status (1)

Country Link
JP (1) JPS59119397A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6157982A (en) * 1984-08-29 1986-03-25 ソニー株式会社 Automobile
JPS6479798A (en) * 1987-09-21 1989-03-24 Toshiba Corp Voice order recognition equipment
US5799274A (en) * 1995-10-09 1998-08-25 Ricoh Company, Ltd. Speech recognition system and method for properly recognizing a compound word composed of a plurality of words

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS52144205A (en) * 1976-05-27 1977-12-01 Nec Corp Voice recognition unit
JPS5748798A (en) * 1980-09-08 1982-03-20 Mitsubishi Electric Corp Word voice recognizing device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS52144205A (en) * 1976-05-27 1977-12-01 Nec Corp Voice recognition unit
JPS5748798A (en) * 1980-09-08 1982-03-20 Mitsubishi Electric Corp Word voice recognizing device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS6157982A (en) * 1984-08-29 1986-03-25 ソニー株式会社 Automobile
JPS6479798A (en) * 1987-09-21 1989-03-24 Toshiba Corp Voice order recognition equipment
US5799274A (en) * 1995-10-09 1998-08-25 Ricoh Company, Ltd. Speech recognition system and method for properly recognizing a compound word composed of a plurality of words

Similar Documents

Publication Publication Date Title
US4567606A (en) Data processing apparatus and method for use in speech recognition
US6633941B2 (en) Reduced networking interrupts
US8712757B2 (en) Methods and apparatus for monitoring communication through identification of priority-ranked keywords
CN108962283A (en) A kind of question terminates the determination method, apparatus and electronic equipment of mute time
JPH0785208B2 (en) An alarm timer for users of interactive systems
US7747444B2 (en) Multiple sound fragments processing and load balancing
JPH08195763A (en) Voice communications channel of network
US4423290A (en) Speech synthesizer with capability of discontinuing to provide audible output
JPS59119397A (en) Voice recognition equipment
US4641342A (en) Voice input system
JP2009021923A (en) Voice communication apparatus
US11361258B2 (en) System and method for call timing and analysis
US6678354B1 (en) System and method for determining number of voice processing engines capable of support on a data processing system
CN112802457A (en) Method, device, equipment and storage medium for voice recognition
US7788097B2 (en) Multiple sound fragments processing and load balancing
JPH10240284A (en) Method and device for voice detection
JP2743810B2 (en) Voice response device capable of testing with a load close to actual load
US20030229491A1 (en) Single sound fragment processing
JPH06250815A (en) Voice mail terminal
CN117354580A (en) Live video audio silencing method and device and electronic equipment
JPH09198077A (en) Speech recognition device
CN110933151A (en) Processing method and first electronic device
JPH0631997B2 (en) Output holding circuit of voice detector
JPS6195397A (en) Voice pattern collation system for voice recognition equipment
JPH07154576A (en) Facsimile equipment