JPS62195700A

JPS62195700A - Continuous numerical voice recognition

Info

Publication number: JPS62195700A
Application number: JP61036950A
Authority: JP
Inventors: 広田　敦子; 山田　興三
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1986-02-21
Filing date: 1986-02-21
Publication date: 1987-08-28

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a continuous digit speech recognition method in the field of speech recognition.

(Prior Art) Continuous digit speech recognition methods, which recognize and determine input speech consisting of continuously uttered digits, have been under development and have been put to practical use. For example, the recognition method disclosed in Japanese Unexamined Patent Publication No. 58-189895 deals specifically with digits among continuous speech recognition methods: it uses the DP matching technique and limits the number of digits in order to recognize continuous input of digits read out one by one.

This conventionally proposed continuous digit speech recognition method is briefly described with reference to FIG. 2.

In FIG. 2, 1 is the input terminal for continuous digit speech, 2 is an A/D converter, 3 is a buffer, 4 is a feature extraction unit, 5 is a standard pattern memory, 6 is a distance calculation unit, 7 is a continuous DP processing unit, 8 is a logical operation processing unit, 9 is a power analysis unit, 10 is a limiter, and 11 is an output terminal.

This method obtains the short-time power of the input speech at fixed intervals, expands the time series of short-time power values in orthogonal functions, and from that expansion obtains the number of morae (syllables) making up the input speech. It then adds 1 to the mora count, divides the result by 2, and truncates the fractional part; the resulting integer is taken as the estimated number of digits when the input speech is a digit string, and recognition is performed on that basis.
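The digit-count estimate described above amounts to a single integer division. The following is a minimal sketch of that arithmetic only; the function name `estimate_digit_count` is hypothetical, and the mora count is assumed to have been obtained already by some other means.

```python
def estimate_digit_count(mora_count: int) -> int:
    """Estimate the number of digits from a mora count.

    Follows the rule in the text: add 1, divide by 2, truncate the fraction.
    The name and signature are illustrative, not taken from the patent.
    """
    if mora_count < 0:
        raise ValueError("mora_count must be non-negative")
    # Integer floor division performs the "divide by 2 and truncate" step.
    return (mora_count + 1) // 2


if __name__ == "__main__":
    for morae in (1, 2, 7, 8):
        print(morae, estimate_digit_count(morae))
```

The rule treats each spoken digit as occupying roughly two morae, which is why it applies only to digit-by-digit readings.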

(Problems to be Solved by the Invention) However, the conventional method described above has two problems. First, because it uses continuous DP matching, the matching unit and related circuitry become very large, which makes the method unrealistic from an economic standpoint. Second, the limit on the number of digits does have the advantage of reducing the number of matching operations, and thus shortening recognition time to some extent, compared with a method that has no such limit. When practical use of continuous digit recognition is considered, however, a method that is limited in digit count and accepts only digit-by-digit reading has a very narrow range of application (roughly, entering a telephone number such as "zero ni go hachi"). Furthermore, in connection with this second point, when a number accompanied by a unit (for example "yen" for an amount of money or "km" for a distance) is spoken aloud, it is always uttered with place-value words, in the form "○ man ○ sen ○ hyaku ○ juu ○". It is therefore naturally difficult to recognize general digit strings, which include not only digit-by-digit readings but other forms as well, with an algorithm dedicated to digit-by-digit reading as described above. Consequently, a continuous digit recognition method that meets users' future needs is expected to require a system that combines both functions and can respond immediately to either form of input speech.

In view of the problems described above, the object of the present invention is to provide a continuous digit recognition method that, when a number with any number of digits is uttered continuously, detects whether the input speech is a place-value reading or a digit-by-digit reading and then performs the recognition processing appropriate to each form with high accuracy.

(Means for Solving the Problems) To achieve this object, the continuous digit speech recognition method of the present invention performs an input form recognition determination: when an N-digit number (N being any positive integer) is uttered continuously, it detects whether the input speech is a place-value reading ("○ man ○ sen ○ hyaku ○ juu ○") or a digit-by-digit reading. This determination is made by the input form recognition determination unit.

Next, according to the result of this determination, recognition processing suited to the input form, place-value or digit-by-digit, is performed, so that the continuous digit speech is recognized with high accuracy. This processing is performed by the recognition processing unit.

(Operation) With this configuration, when continuous digit speech with any number of digits is input, the input form recognition determination unit determines whether it is a place-value reading or a digit-by-digit reading. When it is determined to be a place-value reading, the speech is recognized by the place-value recognition processing unit; when it is determined to be a digit-by-digit reading, the speech is recognized by the digit-by-digit recognition processing unit.

Because the input form of the input speech is determined and speech recognition suited to that form is then performed, recognition accuracy is high; and because recognition does not depend on the number of digits, the range of application is very wide.

(Embodiments) Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is an explanatory diagram of the continuous digit recognition method of the present invention, showing as a block diagram one embodiment of the configuration of an apparatus that carries out the method.

In FIG. 1, 100 is an input terminal, 200 is a frequency analysis unit, and 300 is a logarithmic conversion unit. 400 is a speech section determination unit, 500 is a spectrum conversion unit, and 600 is a resampling unit. 700 is an input form recognition determination unit, consisting of a blocking unit 710, a distance calculation unit 720, a "place" standard pattern memory 730, a form determination unit 740, and a control unit 780 that commands and controls the processing in units 710, 720, 730, and 740. 900 is a place-value recognition processing unit, consisting of a distance calculation unit 910, a standard pattern memory 920, a determination unit 930, and a rule table 940. 1000 is a digit-by-digit recognition processing unit, consisting of a distance calculation unit 1010, a standard pattern memory 1020, and a determination unit 1030. 810 is an output terminal.

In this configuration, continuously uttered digit speech is input to the input terminal 100. The speech signal from the input terminal 100 is fed to the frequency analysis unit 200, which analyzes it into quantized signals corresponding to a plurality of frequency bands. The frequency-analyzed signal is sent to the logarithmic conversion unit 300, which yields logarithmic spectrum information and full-band power information.

The spectrum information and power information are sent to the speech section determination unit 400, and the spectrum information alone is sent to the spectrum conversion unit 500. The spectrum conversion unit 500 normalizes utterance intensity and sound-source characteristics, which vary from speaker to speaker, by subtracting the least-squares approximation line of the speech spectrum from the analysis data.
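The normalization described above (subtracting the least-squares line fitted to the spectrum) can be sketched as follows. This is an illustration under assumptions, not the patent's implementation: the function name `normalize_spectrum` is hypothetical, and the spectrum of one frame is assumed to be a list of log-magnitude values indexed by frequency channel.

```python
def normalize_spectrum(log_spectrum):
    """Subtract the least-squares straight line from one frame's log spectrum.

    The line is fitted over the channel index 0..n-1 (an assumption; the
    patent does not state the fitting variable). Removing it takes out the
    overall level (intercept) and spectral tilt (slope).
    """
    n = len(log_spectrum)
    if n < 2:
        return [0.0] * n
    xs = range(n)
    mean_x = (n - 1) / 2
    mean_y = sum(log_spectrum) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, log_spectrum))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual after removing the fitted line.
    return [y - (slope * x + intercept) for x, y in zip(xs, log_spectrum)]


if __name__ == "__main__":
    print(normalize_spectrum([2.0, 1.0, 4.0, 3.0]))
```

A spectrum that is itself a straight line normalizes to all zeros, and the residual of any spectrum sums to zero, so frames differing only in loudness or tilt become comparable.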

The speech section determination unit 400 detects the start frame (STFR) and end frame (EDFR) of the uttered continuous digit speech signal. The speech power data of the speech section bounded by the determined start frame (STFR) and end frame (EDFR) is sent to the resampling unit 600 together with the information from the spectrum conversion unit 500. The resampling unit 600 normalizes the time axis of the speech power data. Time-axis normalization is a well-known technique; in the linear matching method, the speech section is divided at equal time intervals into a fixed number of points, determined by the conditions of the recognition device, and resampled.
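Linear time-axis normalization of this kind can be sketched as below. The function name `resample_linear` and the use of linear interpolation between neighbouring frames are assumptions made for illustration; the text only states that the section is divided at equal intervals into a fixed number of points.

```python
def resample_linear(frames, target_len):
    """Resample a sequence of per-frame values to exactly target_len points.

    Sample positions are spaced equally from the first frame to the last;
    values between frames are linearly interpolated (an assumption).
    """
    n = len(frames)
    if n == 0 or target_len <= 0:
        raise ValueError("need at least one frame and a positive target length")
    if target_len == 1:
        return [frames[0]]
    out = []
    for k in range(target_len):
        pos = k * (n - 1) / (target_len - 1)  # fractional index into frames
        i = int(pos)
        frac = pos - i
        if i + 1 < n:
            out.append(frames[i] * (1 - frac) + frames[i + 1] * frac)
        else:
            out.append(frames[i])  # last point coincides with the end frame
    return out


if __name__ == "__main__":
    print(resample_linear([0.0, 10.0], 3))
```

After this step every utterance has the same number of frames regardless of speaking rate, so patterns can be compared point by point.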

Determination of input form

The speech power data resampled by the resampling unit 600 is first sent to the blocking unit 710 in the input form recognition determination unit 700. The input form recognition determination unit 700 determines in what form the N-digit number (N being any positive integer) input as speech power data was uttered: that is, whether place-value words, as in "○ sen ○ hyaku ○ juu ○", are contained in the continuously uttered number, or whether it is a simple digit-by-digit reading such as "ichi ni san yon". This determination process is described below.

FIG. 3 is a speech power diagram showing, as an example, the speech power when a four-digit number is uttered as a place-value reading and as a digit-by-digit reading; the horizontal axis is time and the vertical axis is speech power. The upper trace shows the power when the number is uttered with place values as "roku-sen go-hyaku nana-juu ni" (six thousand five hundred seventy-two), and the lower trace shows the power when it is read digit by digit as "roku go nana ni" (6572).

Because the input form is determined from the speech power data in units of segmented blocks, the blocking unit 710 divides the data into blocks at points that are likely to be word boundaries.
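One plausible way to place such boundaries, sketched here purely as an assumption, is to put a pointer at each local minimum of the power curve that falls below a threshold. The names `find_pointers` and `split_blocks` and the threshold parameter are hypothetical; the patent does not specify the boundary criterion beyond low-power positions.

```python
def find_pointers(power, threshold):
    """Return indices of local power minima below threshold (candidate word boundaries)."""
    pointers = []
    for i in range(1, len(power) - 1):
        is_dip = power[i] <= power[i - 1] and power[i] < power[i + 1]
        if is_dip and power[i] < threshold:
            pointers.append(i)
    return pointers


def split_blocks(power, pointers):
    """Cut the power sequence into blocks at the given pointer indices."""
    bounds = [0] + list(pointers) + [len(power)]
    return [power[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    power = [5, 9, 6, 1, 7, 9, 2, 8, 4]
    ptrs = find_pointers(power, threshold=3)
    print(ptrs, split_blocks(power, ptrs))
```

Each resulting block is a word candidate that can then be compared against stored patterns.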

The broken lines in FIG. 3 are called pointers. They mark the boundary positions that blocking has identified as word boundaries, and the span between two pointers is the speech pattern of a word candidate such as a digit or a place-value word. The arrows between the speech power trace for the place-value reading and the trace for the digit-by-digit reading show the positional correspondence between their pointers.

Next, the distance calculation unit 720 computes the distance between the speech pattern of each block of speech power data produced by the blocking unit 710 (the speech pattern of a word candidate) and the "place" standard patterns stored in the "place" standard pattern memory 730. The "place" standard pattern memory 730 stores standard patterns for place-value words such as "oku" (hundred million), "man" (ten thousand), "sen" (thousand), "hyaku" (hundred), and so on.

For some places, the pronunciation changes with the digit that precedes the place; for example, "hyaku" (hundred) becomes "byaku" or "pyaku". Such variant patterns are also stored.

Matching between these standard patterns and the word-candidate speech patterns proceeds as follows. First, the end frame (EDFR) of the blocks of speech power data, which the blocking unit 710 has delimited by pointers at low-power positions, is detected. Detecting this end frame confirms the end of one continuous utterance.

Next, the pointers are searched from the detected end frame back toward the start frame (STFR). In this search, the speech patterns of all blocks, that is, the word-candidate speech patterns between all adjacent pointers, are matched against the "place" standard patterns in order, starting from the end-frame side.

The minimum of the summed distances obtained by the distance calculations against the "place" standard patterns, together with the speech power data of the speech section on which the distances were computed, is output to the form determination unit 740. The form determination unit 740 decides from this minimum summed distance whether the input number is a place-value reading or a digit-by-digit reading.

Thus, in this method, the distance to the "place" standard patterns is computed for every block between pointers, from the start frame (STFR) to the end frame (EDFR).

However, when a number is actually uttered with place values, a place-value word always falls in the first or second block counted from the end frame (EDFR) (for example, counting back from the end: "ni", "juu", "nana", "hyaku", and so on). Noting this, the two forms can be distinguished by examining only the lowest place from the end-frame side. As a way of further shortening matching time, it is therefore also conceivable to match only the first and second blocks counted from the end frame (EDFR) against the "place" standard patterns and output the determination result.
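The shortened determination described above can be sketched as follows. This is a toy illustration under stated assumptions: blocks and standard patterns are assumed to be equal-length numeric vectors, the distance is a plain sum of squared differences, and the decision uses a fixed threshold on the minimum distance. The names `distance`, `detect_input_form`, and `tail_blocks`, and the threshold itself, are hypothetical; the patent says only that the decision is made from the minimum distance value.

```python
def distance(a, b):
    """Sum of squared differences between two equal-length patterns (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("patterns must have equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def detect_input_form(blocks, place_patterns, threshold, tail_blocks=2):
    """Return (flag, best_distance); flag is 1 for place-value, 0 for digit-by-digit.

    Only the last `tail_blocks` blocks, counted from the end frame, are
    compared against the "place" standard patterns.
    """
    if not blocks or not place_patterns:
        raise ValueError("blocks and place_patterns must be non-empty")
    tail = blocks[-tail_blocks:]
    best = min(
        distance(block, pattern)
        for block in reversed(tail)          # search from the end-frame side
        for pattern in place_patterns.values()
    )
    flag = 1 if best <= threshold else 0
    return flag, best


if __name__ == "__main__":
    places = {"juu": [1.0, 2.0, 1.0], "hyaku": [3.0, 1.0, 3.0]}
    spoken = [[0.0, 5.0, 0.0], [1.0, 2.1, 1.0], [4.0, 4.0, 4.0]]
    print(detect_input_form(spoken, places, threshold=0.5))
```

Setting `tail_blocks` to the total number of blocks gives the exhaustive variant, in which every block from the start frame to the end frame is compared.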

In the input form recognition determination unit 700, the detection of the end frame, the pointer search, and the commands and control that specify which blocks to match in order to shorten matching time are handled by, for example, the control unit 780.

In the form determination unit 740, if the result is a place-value reading, the flag (FLAG) is set to 1 and flag = 1 is attached to the speech power data of the speech section that was sent; if the result is a digit-by-digit reading, the flag (FLAG) is set to 0 and flag = 0 is attached to the speech power data. The result is sent to the recognition processing unit 800 in the next stage.

Recognition processing

In the recognition processing unit 800, speech power data with flag = 1, that is, data for which the input speech was determined to be a place-value reading, is sent to the place-value recognition processing unit 900.

The distance calculation unit 910 computes distances against the standard patterns in the standard pattern memory 920. This distance calculation is described below.

First, FIG. 4 shows the contents of the standard pattern memory 920. The figure shows an example of the standard patterns needed to handle the variant readings that arise from combinations of place and digit. In addition to the "place" standard patterns held in the "place" standard pattern memory 730, which was used for distance calculation in the input form recognition determination unit 700, the memory holds standard patterns for each of the digits 1 to 9, as shown in FIG. 4. Furthermore, as FIG. 4 makes clear, there are characteristic patterns that differ from the speech patterns of digits uttered in isolation: for example, the contracted form "ro" (of "roku", six) occurs only in the hundreds place, and the contracted form "ha" (of "hachi", eight) occurs only in the thousands or hundreds place. In addition to these characteristic patterns, numbers uttered with place values follow certain specific rules, shown in FIG. 5.

FIG. 5 shows an example of the relationship between digits and the readings of places, from the thousands place to the ones place. In the figure, the readings of the digits are written in romaji. The places are shown in boxes, which specify the variant readings of each place, and the solid lines attached to each box connect it to the digits that take that reading. For example, the thousands place can be read "sen" (SEN) or "zen" (ZEN), but the only digit after which the thousands place is read "zen" (ZEN) is 3 (SAN). The hundreds place is divided further, and it can be seen that some of the digit patterns themselves also take special forms.

The rule table 940 in the place-value recognition processing unit 900 stores the characteristic reading rules for numbers with place values shown in FIG. 5. The distance calculation unit 910 matches the word candidates already blocked by the blocking unit 710 and sends the distance results to the determination unit 930. The determination unit 930 consults the rules stored in the rule table 940 during the calculation and separates places from digits, thereby shortening distance calculation time and determination time and improving recognition accuracy.

In this way the total distance is obtained only for the restricted set of combinations, and the category with the smallest distance value is output to the output terminal 810 as the recognition result.

On the other hand, speech power data with flag = 0, that is, data for which the input speech was determined to be a digit-by-digit reading, is sent to the digit-by-digit recognition processing unit 1000.

The distance calculation unit 1010 computes, one digit at a time for each blocked digit-word candidate, the distance to the standard patterns stored in the standard pattern memory 1020, and sends the results to the determination unit 1030.

Unlike the standard pattern memory 920, the standard pattern memory 1020 stores only the limited set of digit reading patterns that occur in digit-by-digit reading. The determination unit 1030 outputs the category name determined from the total distance from the output terminal 810 as the recognition result. The detailed operation of the distance calculation unit 1010 and the determination unit 1030 of the digit-by-digit recognition processing unit 1000 can follow, for example, the "speech pattern matching method" proposed in Japanese Patent Application No. 59-58440, filed by the applicant of the present invention, in which limiting the slope of the path in stationary portions of the speech restricts the path in transient portions to the range expected for the speaking rate.

Correspondingly, a path-setting process provisionally called optimization is repeated multiple times.

As can be understood from the above description, the method of the present invention requires only that, when continuous digit speech is input, it is first determined whether the speech is a place-value reading or a digit-by-digit reading and the corresponding speech recognition processing is then performed. The specific configuration is not limited to the embodiment described above.

The processing of the main components shown in FIG. 1, namely 400, 500, 600, 710, 720, 740, 780, 910, 930, 1010, and 1030, can be performed by a CPU (central processing unit).

(Effects of the Invention) As is clear from the above description, according to the present invention, when an arbitrary number is uttered continuously, the method detects whether the input speech is a place-value reading or a digit-by-digit reading and then performs the recognition processing appropriate to each. The uttered continuous digit speech can therefore be recognized accurately, so an improved recognition rate and simpler computation can be expected. Moreover, because the method combines the functions of both the digit-by-digit and the place-value recognition methods, it can also be applied to continuous digit recognition devices that meet users' future needs, and its range of application is wide.

[Brief explanation of drawings]

FIG. 1 is a block diagram for explaining an embodiment of the continuous digit speech recognition method of the present invention. FIG. 2 is a block diagram for explaining the conventional continuous digit speech recognition method. FIG. 3 is a speech power diagram for a four-digit number uttered as a place-value reading and as a digit-by-digit reading. FIG. 4 is a diagram showing the contents of the standard pattern memory. FIG. 5 is a diagram showing the relationship (rules) between digits and places.

100: input terminal
200: frequency analysis unit
300: logarithmic conversion unit
400: speech section determination unit
500: spectrum conversion unit
600: resampling unit
700: input form recognition determination unit
710: blocking unit
720: distance calculation unit
730: "place" standard pattern memory
740: form determination unit
780: control unit
800: recognition processing unit
810: output terminal
900: place-value recognition processing unit
910: distance calculation unit
920: standard pattern memory
930: determination unit
940: rule table
1000: digit-by-digit recognition processing unit
1010: distance calculation unit
1020: standard pattern memory
1030: determination unit

Patent applicant: Oki Electric Industry Co., Ltd.
Agent: Patent attorney Takashi Ogaki

Claims

[Claims]

(1) A continuous digit speech recognition method, characterized in that, when recognizing input speech consisting of a continuously uttered number with any number of digits, an input form recognition determination is made as to whether the number in the input speech is a place-value reading or a digit-by-digit reading, and place-value recognition processing or digit-by-digit recognition processing is performed on the input speech according to the result of that determination.