JPS62195699A

JPS62195699A - Continuous numerical voice recognition

Info

Publication number: JPS62195699A
Application number: JP61036949A
Authority: JP
Inventors: 広田　敦子; 山田　興三
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1986-02-21
Filing date: 1986-02-21
Publication date: 1987-08-28

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a continuous digit speech recognition method for use in speech recognition.

(Prior Art) Methods for recognizing strings of continuously spoken digits have been under development and have been put to practical use. One example is the recognition method disclosed in Japanese Unexamined Patent Publication No. Sho 58-189895. Among continuous speech recognition methods, it applies DP matching specifically to digits. By limiting the number of digits, it recognizes continuous input of digits read out one by one (digit-by-digit reading).

この従来提案された連続数字音声認識方法を第２図を参
照して簡単に説明する。This conventionally proposed continuous digit speech recognition method will be briefly explained with reference to FIG.

In FIG. 2, 1 is the input terminal for continuous digit speech.

2 is an A/D converter, 3 is a buffer, 4 is a feature extraction unit, 5 is a standard pattern memory, 6 is a distance calculation unit, 7 is a continuous DP processing unit, 8 is a logical operation unit, 9 is a power analysis unit, 10 is a limiter, and 11 is an output terminal.

This method first computes the short-time power of the input speech at regular intervals. It then expands the time series of short-time power values in orthogonal functions and uses this expansion to obtain the number of morae (syllables) in the input speech. Assuming the input is a string of digits, it estimates the number of digits by adding 1 to the mora count, dividing by 2 and discarding the fractional part, and performs recognition on that basis.
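Once the mora count is known, the prior-art digit-count estimate is simple integer arithmetic. The sketch below is illustrative only. The frame length and hop in `short_time_power` are hypothetical parameters. The orthogonal-function expansion that actually yields the mora count is not reproduced, so the count is passed in directly.

```python
def short_time_power(samples, frame_len, hop):
    """Mean-square power of each analysis frame (a trailing partial frame is dropped)."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, hop)
    ]


def estimate_digit_count(mora_count):
    """Prior-art rule: digits = floor((morae + 1) / 2)."""
    if mora_count < 0:
        raise ValueError("mora_count must be non-negative")
    return (mora_count + 1) // 2


# "ro-ku go-o na-na ni-i" (6572 read digit by digit) has 8 morae.
print(estimate_digit_count(8))  # 4
```

Floor division implements "add 1, divide by 2, discard the fraction" directly. This fits the fact that most digit readings, such as "ro-ku", "na-na" or a lengthened "go-o", occupy about two morae.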

(Problems to be Solved by the Invention) However, the conventional method described above has several problems. First, because it uses continuous DP matching, the matching section and related hardware become extremely large. This makes the method impractical from an economic point of view.

Second, limiting the number of digits reduces the number of matching operations. Compared with methods that have no such limit, this shortens recognition time to some extent. For practical continuous digit recognition, however, a method limited in digit count and restricted to digit-by-digit reading has a very narrow range of use, roughly limited to entering a telephone number such as "zero-nii-goo-hachi".

Third, and related to the second point, a digit string carrying a unit is always spoken with place-value words. Examples of units are "yen" for an amount of money or "km" for a distance, and the place-value form is "X-man X-sen X-hyaku X-juu X" (X ten-thousands, X thousands, X hundreds, X tens, X). General digit strings therefore include place-value strings as well as digit-by-digit strings, and an algorithm dedicated to digit-by-digit reading naturally has difficulty recognizing them. A continuous digit recognition method that meets users' future needs will consequently require a system that combines both capabilities and responds immediately to the input speech.

In view of these problems, an object of the present invention is to provide a continuous digit recognition method for numbers with an arbitrary number of digits spoken continuously. The method detects in a short time whether the input speech uses place-value reading or digit-by-digit reading, and then performs the recognition processing appropriate to each case with high accuracy.

(Means for Solving the Problems) To achieve this object, the continuous digit speech recognition method of the present invention handles the case where N digits (N being any positive integer) are spoken continuously. It first performs an input-form determination that detects whether the input speech uses place-value reading ("X-man X-sen X-hyaku X-juu X") or digit-by-digit reading.

For this determination, the speech power data of the speech section delimited by the start frame and the end frame is divided into blocks, one per word candidate. The end frame of the speech power data is then detected. From that point, distances to the place-word standard patterns are computed only for the word candidates in the first and second blocks, counted from the end frame toward the start frame. Recognition processing suited to the input form is then performed according to the resulting determination.

Furthermore, when the input form is determined to be place-value reading, the place-value digit string is recognized by computing distances between the speech patterns and the standard patterns. This calculation refers to a rule table that stores rules specific to place-value reading.

(Operation) With this configuration, when continuous digit speech with an arbitrary number of digits is input, only some word candidates are matched against the place-word standard patterns. These are the candidates up to the second block from the end frame of the speech section, so the matching time is shortened.

Furthermore, the input form of the speech is determined and recognition suited to that form is performed, so recognition accuracy is high. Recognition also does not depend on the number of digits, so the range of application is extremely wide.

In addition, speech power data of place-value digits is recognized with reference to a rule table storing rules specific to place-value reading. This simplifies distance calculation and determination.

(Embodiments) Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is an explanatory diagram of the continuous digit recognition method of the present invention. It is a block diagram of one embodiment of an apparatus that carries out the method.

In FIG. 1, 100 is an input terminal, 200 is a frequency analysis unit, and 300 is a logarithmic conversion unit. 400 is a speech section determination unit, 500 is a spectrum conversion unit, and 600 is a resampling unit.

700 is an input-form recognition and determination unit. It consists of a blocking unit 710, a distance calculation unit 720, a "place" standard pattern memory 730 and a form determination unit 740. It also includes a control unit 760 that commands and controls the processing in units 710, 720, 730 and 740.

900 is a place-value recognition processing unit consisting of a distance calculation unit 910, a standard pattern memory 920, a determination unit 930 and a rule table 940. 1000 is a digit-by-digit recognition processing unit consisting of a distance calculation unit 1010, a standard pattern memory 1020 and a determination unit 1030. 810 is an output terminal.

In this configuration, continuously spoken digit speech is input at the input terminal 100. The speech signal from terminal 100 is fed to the frequency analysis unit 200, which analyzes it into quantized signals for a plurality of frequency bands. The frequency-analyzed signal is sent to the logarithmic conversion unit 300 to obtain logarithmic spectrum information and full-band power information.
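The sketch below shows, under assumptions, what the logarithmic conversion unit 300 might produce for one frame. It assumes the frequency analysis unit 200 delivers linear band energies from a filter bank. The number of bands, the dB scale and the floor value are assumptions, not taken from the patent.

```python
import math


def log_spectrum_and_power(band_energies, floor=1e-10):
    """Convert one frame of filter-bank energies into a dB log spectrum
    and a full-band power value (also in dB)."""
    # The floor avoids log10(0) for silent bands.
    log_spectrum = [10.0 * math.log10(max(e, floor)) for e in band_energies]
    full_band_power = 10.0 * math.log10(max(sum(band_energies), floor))
    return log_spectrum, full_band_power


spec, power = log_spectrum_and_power([1.0, 10.0, 100.0])
print(spec)             # [0.0, 10.0, 20.0]
print(round(power, 2))  # 20.45
```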

The spectrum information and power information are sent to the speech section determination unit 400, and the spectrum information alone is also sent to the spectrum conversion unit 500. The analysis data varies from speaker to speaker. The spectrum conversion unit 500 normalizes utterance intensity and sound-source characteristics by subtracting the least-squares approximating straight line of the speech spectrum from this data.
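The normalization in the spectrum conversion unit 500 can be sketched as fitting a least-squares line to one frame's log spectrum over band index and subtracting it. Treating band index as the x-axis is an assumption, since the patent does not say how the frequency axis is scaled.

```python
def normalize_spectrum(log_spectrum):
    """Subtract the least-squares straight line (fitted over band index)
    from a log spectrum, removing overall level and spectral tilt."""
    n = len(log_spectrum)
    if n < 2:
        return [0.0] * n  # a single band has no tilt; only the level is removed
    x_mean = (n - 1) / 2
    y_mean = sum(log_spectrum) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(log_spectrum))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [y - (intercept + slope * x) for x, y in enumerate(log_spectrum)]
```

A spectrum that is already a straight line normalizes to all zeros. Only deviations from the overall tilt, such as formant peaks, survive.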

The speech section determination unit 400 detects the start frame (STFR) and end frame (EDFR) of the continuously spoken digit signal. These two frames delimit the speech section. Its speech power data is sent to the resampling unit 600 together with the information from the spectrum conversion unit 500.
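The patent does not say how unit 400 finds STFR and EDFR. A common approach, used here purely as an assumption, is to threshold the frame power. Requiring a minimum run of frames above the threshold makes the detector ignore short noise bursts. `threshold_db` and `min_frames` are hypothetical parameters.

```python
def detect_speech_section(power_db, threshold_db, min_frames=3):
    """Return (STFR, EDFR) frame indices of the speech section, or None.

    A frame counts as speech when its power is at or above threshold_db;
    the section must contain at least min_frames consecutive speech frames.
    """
    above = [p >= threshold_db for p in power_db]
    n = len(above)
    # First frame that starts a run of min_frames speech frames.
    start = next(
        (i for i in range(n - min_frames + 1) if all(above[i:i + min_frames])),
        None,
    )
    if start is None:
        return None
    # Last frame that ends such a run (guaranteed to exist once start was found).
    end = next(
        j for j in range(n - 1, min_frames - 2, -1)
        if all(above[j - min_frames + 1:j + 1])
    )
    return start, end


print(detect_speech_section([0, 0, 50, 55, 60, 58, 0, 0], threshold_db=40))  # (2, 5)
```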

The resampling unit 600 normalizes the time axis of the speech power data. Time-axis normalization is a conventionally known technique. In the linear matching method, the speech section is divided in time into a fixed number of equal intervals and resampled. That number is determined by the conditions of the recognition apparatus.
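The sketch below illustrates linear time-axis normalization by mapping the detected section onto a fixed number of equally spaced points. Linear interpolation between neighbouring frames is a design choice made here. The patent only states that the section is divided into equal intervals and resampled.

```python
def linear_resample(frames, num_points):
    """Resample a variable-length sequence of feature frames to num_points
    equally spaced points along the time axis (linear interpolation)."""
    n = len(frames)
    if n == 0 or num_points < 2:
        raise ValueError("need at least one frame and num_points >= 2")
    out = []
    for k in range(num_points):
        pos = k * (n - 1) / (num_points - 1)  # position on the original time axis
        i = int(pos)
        frac = pos - i
        j = min(i + 1, n - 1)
        out.append([(1 - frac) * a + frac * b for a, b in zip(frames[i], frames[j])])
    return out


print(linear_resample([[0.0], [10.0]], 3))  # [[0.0], [5.0], [10.0]]
```

Because every utterance ends up with the same number of points, patterns of different lengths can be compared by a simple frame-by-frame distance.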

The speech power data resampled by the resampling unit 600 is first sent to the blocking unit 710 in the input-form recognition and determination unit 700. Unit 700 determines the form in which the N-digit string (N being any positive integer) in the speech power data was input. The continuously spoken digits may include place-value words, as in "X-sen X-hyaku X-juu X", or they may simply be read digit by digit, as in "ichi ni san yon". This determination process is described below.

FIG. 3 shows, as an example, the speech power of a 4-digit number spoken with place values and spoken digit by digit. Time is on the horizontal axis and speech power on the vertical axis. The upper plot shows the power for "rokusen gohyaku nanajuu ni" (6,572 with place values). The lower plot shows the power for "roku go(o) nana ni(i)" (6572, digit by digit).

The input form is determined from speech power data in units of blocks. The blocking unit 710 therefore divides the data into blocks at points that are likely to be word boundaries.

The broken lines in FIG. 3 are called pointers. They mark the block boundaries taken as the edges of word sections. The segment between two pointers is the speech pattern of a word candidate, such as a digit or a place word. The arrows between the place-value and digit-by-digit power plots show how the pointers of the two plots correspond in position.
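The blocking step can be sketched as placing pointers at low-power dips and cutting the frame sequence there. The dip rule used here, a local power minimum below a hypothetical `dip_threshold`, is an assumption. The patent only says the pointers fall at points of small power that are likely word boundaries.

```python
def find_pointers(power, dip_threshold):
    """Frame indices of local power minima below dip_threshold (candidate word boundaries)."""
    return [
        i for i in range(1, len(power) - 1)
        if power[i] <= power[i - 1] and power[i] < power[i + 1] and power[i] < dip_threshold
    ]


def split_into_blocks(frames, pointers):
    """Cut frames at each pointer; each resulting block is one word candidate."""
    bounds = [0] + list(pointers) + [len(frames)]
    return [frames[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


power = [5, 8, 2, 7, 9, 1, 6, 4]
pointers = find_pointers(power, dip_threshold=3)
print(pointers)                                     # [2, 5]
print(split_into_blocks(list(range(8)), pointers))  # [[0, 1], [2, 3, 4], [5, 6, 7]]
```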

The distance calculation unit 720 then computes distances between two sets of patterns. One set is the speech patterns of the word candidates produced by the blocking unit 710. The other is the "place" standard patterns stored in the "place" standard pattern memory 730. Memory 730 holds standard patterns for place words such as "man" (ten thousand), "sen" (thousand), "hyaku" (hundred) and "juu" (ten).

For some places, the pronunciation changes with the preceding digit. For example, "hyaku" (hundred) becomes "byaku" or "pyaku". These variant patterns are also stored.
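One way to use the stored variant patterns is to treat each place word as a class with several templates and take the smallest distance over its variants. The Euclidean distance, the two-dimensional toy vectors and the function names below are assumptions for illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: place word -> list of variant templates (fixed-length vectors).
PlaceTemplates = Dict[str, List[List[float]]]


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_place(pattern: List[float], templates: PlaceTemplates) -> Tuple[Optional[str], float]:
    """Return the place word whose closest variant is nearest to pattern, and that distance."""
    best_label, best_dist = None, math.inf
    for label, variants in templates.items():
        for variant in variants:  # e.g. "hyaku", "byaku", "pyaku" all count as "hyaku"
            d = euclidean(pattern, variant)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label, best_dist


templates = {
    "sen": [[0.0, 1.0], [0.1, 0.9]],    # toy vectors standing in for sen, zen
    "hyaku": [[1.0, 0.0], [0.9, 0.2]],  # toy vectors standing in for hyaku, byaku
}
print(nearest_place([0.92, 0.18], templates)[0])  # hyaku
```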

To match these standard patterns against the speech patterns of the word candidates, the blocking unit 710 first detects the end frame (EDFR) of the blocks of speech power data. These blocks are separated by pointers placed at low-power points. Detecting this end frame confirms the end of one continuous utterance.

Once this end frame is detected, the pointers are searched backward toward the start frame (STFR). The search proceeds block by block from the end-frame side. It matches the speech pattern of every block, that is, every word candidate between pointers, against the "place" standard patterns.

The minimum of the summed distances to the "place" standard patterns is output to the form determination unit 740. The speech power data of the section for which distances were computed is output along with it. Unit 740 uses this minimum total distance to decide whether the input digit string uses place-value or digit-by-digit reading.

In this way, the method computes distances to the "place" standard patterns for all blocks between pointers from the start frame (STFR) to the end frame (EDFR).

However, when a number is actually spoken with place values, a place word always appears in the first or second block counted from the end frame (EDFR). For example, reading backward: ni → juu → nana → hyaku .... Examining only the lowest place from the end-frame side is therefore enough to tell place-value digits from digit-by-digit digits. This suggests a further way to shorten the matching time: match only the first and second blocks from the end frame (EDFR) against the "place" standard patterns, and output the determination result.
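The shortened determination can be sketched as follows. Each block is assumed to have been reduced to a fixed-length pattern vector. The patent does not give the exact criterion, so the decision rule here is an assumption: the smallest place-word distance in the last two blocks is compared against a hypothetical `threshold`.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_input_form(blocks, place_templates, threshold, n_tail=2):
    """Return FLAG = 1 (place-value reading) or 0 (digit-by-digit reading).

    blocks: word-candidate patterns in time order, each a fixed-length vector.
    place_templates: {place word: [variant template vectors]}.
    Only the last n_tail blocks (nearest the end frame) are matched.
    """
    if not blocks:
        raise ValueError("no word candidates")
    tail = blocks[-n_tail:]  # first and second blocks counted from EDFR
    best = min(
        euclidean(block, template)
        for block in tail
        for variants in place_templates.values()
        for template in variants
    )
    return 1 if best < threshold else 0


templates = {"juu": [[0.0, 1.0]], "hyaku": [[1.0, 1.0]]}
print(classify_input_form([[0.5, 0.5], [0.0, 1.0], [0.9, 0.1]], templates, 0.2))  # 1
print(classify_input_form([[0.9, 0.1], [0.8, 0.2], [1.0, 0.0]], templates, 0.2))  # 0
```

The cost of the check no longer grows with the number of digits, because at most two blocks are matched however long the utterance is.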

Within the input-form recognition and determination unit 700, the control unit 760, for example, detects the end frame and searches the pointers. It also issues the commands that specify which blocks are to be matched in order to shorten the matching time.

If the determination result is a place-value number, the determination unit 740 sets a flag FLAG = 1 and attaches it to the speech power data of the section it received. If the result is a digit-by-digit number, it sets FLAG = 0 and attaches that instead. The result is sent to the recognition processing unit 800 at the next stage.
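The flag hand-off between unit 740 and the recognition processing unit 800 can be sketched as a tagged record plus a dispatcher. The class and function names are hypothetical, and the two recognizers are passed in as callables.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaggedSection:
    flag: int                # 1 = place-value reading, 0 = digit-by-digit reading
    power_data: List[float]  # speech power data of the section


def tag_section(power_data, is_place_value):
    """What unit 740 does: attach FLAG = 1 or 0 to the section's data."""
    return TaggedSection(flag=1 if is_place_value else 0, power_data=power_data)


def dispatch(section, place_value_recognizer, digit_recognizer):
    """What unit 800 does: route FLAG = 1 data to the place-value path (900)
    and FLAG = 0 data to the digit-by-digit path (1000)."""
    if section.flag == 1:
        return place_value_recognizer(section.power_data)
    if section.flag == 0:
        return digit_recognizer(section.power_data)
    raise ValueError(f"unexpected flag {section.flag}")
```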

In the recognition processing unit 800, speech power data with FLAG = 1 is sent to the place-value recognition processing unit 900. This is the data for input speech determined to use place-value reading.

The distance calculation unit 910 computes distances against the patterns in the standard pattern memory 920. This distance calculation is described below.

FIG. 4 shows the contents of the standard pattern memory 920. It is an example of the standard patterns needed to handle reading variations that depend on the combination of place and digit.

As shown in FIG. 4, memory 920 holds standard patterns for each digit from 1 to 9. These are in addition to the place-word patterns in the "place" standard pattern memory 730, which the input-form recognition and determination unit 700 uses for its distance calculation.

FIG. 4 also shows characteristic patterns that differ from the patterns of digits spoken alone. The geminated form "ro(tsu)" (as in roppyaku) occurs only in the hundreds place. Likewise, "ha(tsu)" (as in hassen and happyaku) occurs only in the thousands or hundreds place. In addition to these characteristic patterns, place-value digit strings follow certain specific rules, shown in FIG. 5.

FIG. 5 shows examples of the relationship between digits and place-word readings from the thousands place down to the ones place. Digit readings are shown in romanized form.

The boxes contain the place words and specify the variant readings of each place. The solid lines attached to a box connect it to the digits that take that reading. For example, the thousands place can be read "SEN" or "ZEN", but it is read "ZEN" only after the digit 3 ("SAN"). The hundreds place is subdivided further, and some of the digit patterns themselves take special forms.
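The FIG. 5 rules can be sketched as a lookup table of (place, digit) pairs whose readings differ from the default "digit + place word". The entries below are standard Japanese number readings (sanzen, hassen, sanbyaku, roppyaku, happyaku). They are not a transcription of FIG. 5. The names `EXCEPTIONS`, `read_number` and `is_allowed` are hypothetical, and only numbers up to 9,999 are covered.

```python
DIGITS = {1: "ichi", 2: "ni", 3: "san", 4: "yon", 5: "go",
          6: "roku", 7: "nana", 8: "hachi", 9: "kyuu"}
PLACES = {1000: "sen", 100: "hyaku", 10: "juu"}

# (place, digit) -> (digit reading, place reading) where the default does not apply.
EXCEPTIONS = {
    (1000, 1): ("", "sen"), (1000, 3): ("san", "zen"), (1000, 8): ("has", "sen"),
    (100, 1): ("", "hyaku"), (100, 3): ("san", "byaku"),
    (100, 6): ("rop", "pyaku"), (100, 8): ("hap", "pyaku"),
    (10, 1): ("", "juu"),
}


def place_reading(place, digit):
    """Rule-table lookup: readings of a digit followed by a place word."""
    return EXCEPTIONS.get((place, digit), (DIGITS[digit], PLACES[place]))


def read_number(n):
    """Place-value reading of 1..9999 built from the rule table."""
    if not 1 <= n <= 9999:
        raise ValueError("only 1..9999 is covered by this sketch")
    words = []
    for place in (1000, 100, 10):
        d = n // place % 10
        if d:
            words.append("".join(place_reading(place, d)))
    if n % 10:
        words.append(DIGITS[n % 10])
    return " ".join(words)


def is_allowed(place, digit_reading, place_word):
    """True if some digit produces this (digit reading, place word) pair at this place."""
    return any(place_reading(place, d) == (digit_reading, place_word) for d in DIGITS)


print(read_number(6572))  # rokusen gohyaku nanajuu ni
```

A recognizer that consults `is_allowed` never has to score combinations such as "go" + "zen", because the table rules them out in advance.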

The rule table 940 in the place-value recognition processing unit 900 stores the characteristic reading rules for place-value digit strings shown in FIG. 5. The distance calculation unit 910 matches the word candidates already produced by the blocking unit 710 and sends the distance results to the determination unit 930. The determination unit 930 refers to the rules in the rule table 940 and separates places from digits. This shortens distance calculation and determination time and improves recognition accuracy.

In this way, the total distance is computed only for the limited set of allowed combinations. The category with the smallest total distance is output to the output terminal 810 as the recognition result.
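The restricted search can be sketched for a single digit block followed by a place block. Only (digit reading, place word) pairs permitted by the rule table are scored, and the pair with the smallest summed distance wins. The dictionary inputs and the name `best_pair` are assumptions.

```python
def best_pair(digit_dists, place_dists, allowed):
    """Pick the allowed (digit reading, place word) pair with the smallest summed distance.

    digit_dists: {digit reading: distance for the digit block}
    place_dists: {place word: distance for the following place block}
    allowed:     set of (digit reading, place word) pairs from the rule table
    """
    candidates = [
        (digit_dists[d] + place_dists[p], d, p)
        for d, p in allowed
        if d in digit_dists and p in place_dists
    ]
    if not candidates:
        raise ValueError("no allowed combination among the candidates")
    total, digit, place = min(candidates)
    return digit, place, total


allowed = {("san", "zen"), ("go", "sen")}
print(best_pair({"san": 0.4, "go": 0.3}, {"zen": 0.2, "sen": 0.5}, allowed)[:2])  # ('san', 'zen')
```

In the example, the unconstrained best combination would be "go" + "zen" (total 0.5). The table does not permit it, so "san" + "zen" is returned instead.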

Speech power data with FLAG = 0, on the other hand, is sent to the digit-by-digit recognition processing unit 1000. This is the data for input speech determined to use digit-by-digit reading.

The distance calculation unit 1010 takes the digit-word candidates one digit at a time. For each, it computes the distance to the standard patterns stored in the standard pattern memory 1020 and sends the results to the determination unit 1030.

Unlike the standard pattern memory 920, the standard pattern memory 1020 stores only the limited set of digit-reading patterns used in digit-by-digit reading. The determination unit 1030 outputs the category name with the smallest total distance from the output terminal 810 as the recognition result.

The distance calculation unit 1010 and the determination unit 1030 can operate, for example, according to the "speech pattern matching method" proposed by the present applicant in Japanese Patent Application No. Sho 59-58440. That method limits the slope of the matching path in the steady-state portions of speech. This restricts the path in the transient portions to the range of expected speaking rates.

Correspondingly, a path-setting process, provisionally called optimization, is repeated several times.
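The slope-limited matching of Application No. Sho 59-58440 is not reproduced here. As a stand-in, the sketch below uses plain dynamic time warping (DTW). An optional band (`window`, a hypothetical parameter) limits how far the path may leave the diagonal. Each block is then labelled with its nearest digit template.

```python
import math


def frame_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(x, y, window=None):
    """Normalized DTW distance between two frame sequences.

    window: optional half-width of a band around the diagonal that limits
    how far the warping path may stray (None = unconstrained). A band
    narrower than about 1 frame can make the end point unreachable (inf).
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("empty sequence")
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        lo, hi = 1, m
        if window is not None:
            center = i * m / n  # diagonal position for row i
            lo = max(1, math.ceil(center - window))
            hi = min(m, math.floor(center + window))
        for j in range(lo, hi + 1):
            cost = frame_distance(x[i - 1], y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] / (n + m)


def recognize_digit_by_digit(blocks, digit_templates, window=None):
    """Label each word-candidate block with the nearest digit template."""
    return [
        min(digit_templates, key=lambda label: dtw_distance(block, digit_templates[label], window))
        for block in blocks
    ]


templates = {"ichi": [[0.0], [1.0], [0.0]], "ni": [[1.0], [1.0], [1.0]]}
blocks = [[[0.0], [1.0], [1.0], [0.0]], [[1.0], [1.0]]]
print(recognize_digit_by_digit(blocks, templates))  # ['ichi', 'ni']
```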

As the above description shows, the method of the present invention only requires one thing of an implementation. When continuous digit speech is input, it must determine whether the speech is a number with place values or a digit-by-digit number, and then perform the corresponding speech recognition processing. The specific configuration is not limited to the embodiment described above.

The processing of the main components 400, 500, 600, 710, 720, 740, 760, 910, 930, 1010 and 1030 shown in FIG. 1 can be performed by a CPU (central processing unit).

(Effects of the Invention) As the above description makes clear, the present invention handles an arbitrary number spoken continuously in two steps. It first detects in a short time whether the input speech uses place-value or digit-by-digit reading. It then performs the recognition processing appropriate to each case.

As a result, continuously spoken digits can be recognized with high accuracy, and improved recognition rates and simpler computation can be expected. Because the invention combines digit-by-digit and place-value continuous digit recognition, it can also be applied to continuous digit recognition devices that meet users' future needs, giving it a wide range of applications.

[Brief explanation of drawings]

FIG. 1 is a block diagram explaining one embodiment of the continuous digit speech recognition method of the present invention.
FIG. 2 is a block diagram explaining a conventional continuous digit speech recognition method.
FIG. 3 is a speech power diagram for a 4-digit number spoken with place values and spoken digit by digit.
FIG. 4 shows the contents of the standard pattern memory.
FIG. 5 shows the relationship (rules) between digits and place values.

100: input terminal; 200: frequency analysis unit; 300: logarithmic conversion unit; 400: speech section determination unit; 500: spectrum conversion unit; 600: resampling unit; 700: input-form recognition and determination unit; 710: blocking unit; 720: distance calculation unit; 730: "place" standard pattern memory; 740: form determination unit; 760: control unit; 800: recognition processing unit; 810: output terminal; 900: place-value recognition processing unit; 910: distance calculation unit; 920: standard pattern memory; 930: determination unit; 940: rule table; 1000: digit-by-digit recognition processing unit; 1010: distance calculation unit; 1020: standard pattern memory; 1030: determination unit.

Patent applicant: Oki Electric Industry Co., Ltd.

Claims

[Claims]

(1) A continuous digit speech recognition method in which place-word standard patterns are used to recognize whether the digit string in input speech is spoken with place values or digit by digit, and the input speech is recognized according to the resulting determination, characterized in that:
when recognizing input speech consisting of a number with an arbitrary number of digits spoken continuously, the recognition and determination of the input form divides the speech power data of the speech section delimited by the start frame and the end frame into blocks, one per word candidate;
the end frame is then detected, and from the time of detection, distances to the place-word standard patterns are computed only for the word candidates in the first and second blocks counted from the end frame toward the start frame; and
when the input form is determined to be place-value reading, distances between the speech patterns and the standard patterns are computed using a rule table storing rules specific to place-value reading, and a final recognition result is output.