JPS58223198A - Syllable inputting system - Google Patents

Syllable inputting system

Info

Publication number
JPS58223198A
JPS58223198A JP57106570A JP10657082A JPS58223198A JP S58223198 A JPS58223198 A JP S58223198A JP 57106570 A JP57106570 A JP 57106570A JP 10657082 A JP10657082 A JP 10657082A JP S58223198 A JPS58223198 A JP S58223198A
Authority
JP
Japan
Prior art keywords
power
syllable
voice
output terminal
section
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP57106570A
Other languages
Japanese (ja)
Other versions
JPH0259480B2 (en)
Inventor
浜田 洋
良平 中津
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP57106570A priority Critical patent/JPS58223198A/en
Publication of JPS58223198A publication Critical patent/JPS58223198A/en
Publication of JPH0259480B2 publication Critical patent/JPH0259480B2/ja
Granted legal-status Critical Current

Links

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

This invention relates to a syllable input system that recognizes speech uttered one syllable at a time and outputs a kana character string, and more particularly to a syllable input system that detects speech intervals from the input speech.

<Prior Art>

Fig. 1 shows an example of speech-interval detection in a conventional syllable input system. For each frame of the input speech, the speech power a is computed and compared with a threshold b determined in advance from the power measured when no speech is being input.

If, as a result of this comparison, frames at or above the threshold continue for a predetermined number of frames (nB frames) or more, the frame at which the speech power first exceeded the threshold is taken as the start c of the speech interval. After the start has been detected, if frames whose speech power is at or below the threshold continue for a predetermined number of frames (nE frames) or more, a silent interval is judged to have been detected, and the frame at which the speech power first fell below the threshold is taken as the end d of the speech interval.
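The conventional threshold-and-run-length procedure can be sketched as follows. This is an illustrative reading of the rules above, not code from the patent; the frame powers, the threshold b, and the run lengths nB and nE are hypothetical inputs (the patent fixes no concrete values):

```python
def detect_interval(powers, b, n_b, n_e):
    """Return (start, end) frame indices of one speech interval, or None.

    start: first frame of the run of >= n_b frames above threshold b.
    end:   first frame of the run of >= n_e frames at or below b.
    """
    start = None
    run = 0
    for i, p in enumerate(powers):
        if start is None:
            if p >= b:
                run += 1
                if run >= n_b:
                    start = i - run + 1  # frame where power first exceeded b
                    run = 0
            else:
                run = 0
        else:
            if p <= b:
                run += 1
                if run >= n_e:
                    return (start, i - run + 1)  # frame where power first fell below b
            else:
                run = 0
    return None
```

With a toy power sequence `[0, 0, 5, 5, 5, 0, 0, 0, 0]`, threshold 1, nB = 2, and nE = 3, this yields the interval (2, 5): speech starts at frame 2 and ends at frame 5.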

The conventional method described above has the drawback that the end of a syllable is not detected unless the next syllable is uttered only after a pause of nE frames or more. Syllables therefore cannot be uttered in rapid succession; the user is forced to insert a pause between syllables, which is burdensome, and because these inter-syllable pauses are long, input becomes much slower than the normal speaking rate and utterance is awkward.

<Summary of the Invention>

To overcome these drawbacks, this invention detects the end of a speech interval using not only silent-interval information but also information about valleys (dips) in the speech power. Speech intervals can then be detected without long pauses between syllables, making high-speed syllable input possible: the speaker still utters each syllable separately, but need not insert pauses, so input can proceed at roughly the normal speaking rate.

<Embodiment>

Fig. 2 is a block diagram showing the configuration of one embodiment of this invention. The speech input signal from microphone 1 is band-limited by band-pass filter 2 and then converted into a digital signal by A/D converter 3.

Next, speech power calculator 4 computes the speech power of this digital signal using a predetermined frame length and frame period, and outputs it to speech power output terminal 5.
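A minimal sketch of the frame-power computation performed by speech power calculator 4: mean squared amplitude over frames of a fixed length, taken at a fixed frame period. The frame length and period values here are illustrative defaults, not values stated in the patent:

```python
def frame_powers(samples, frame_len=256, frame_period=128):
    """Short-time power of a sampled signal, one value per frame."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_period):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)  # mean squared amplitude
    return powers
```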

The silent-interval detector 6 compares, frame by frame, the speech power supplied from speech power output terminal 5 with a threshold set by adding an appropriate margin to the power measured when no speech is being input. It outputs "1" to silent-interval output terminal 7 when the speech power is above the threshold and "0" when it is below.
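The frame-by-frame thresholding performed by silent-interval detector 6 can be sketched as follows; `background_power` and `margin` are hypothetical names for the no-input power and the margin added to it:

```python
def speech_flags(powers, background_power, margin):
    """Per-frame flag: 1 while speech power exceeds the threshold, else 0."""
    threshold = background_power + margin
    return [1 if p > threshold else 0 for p in powers]
```

For example, with a background power of 0.2 and a margin of 0.5, frame powers `[0.1, 2.0, 3.0, 0.2]` produce the flag sequence `[0, 1, 1, 0]`.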

The power-dip detector 8 detects valleys in the speech power supplied through speech power output terminal 5. Several dip-detection methods are conceivable; one example uses the second-order coefficient obtained by approximating the speech power time series with a quadratic curve. The computed second-order coefficient is compared frame by frame with an appropriate preset threshold: "1" is output to power-dip output terminal 9 when the coefficient is above the threshold and "0" when it is below, and the value of the coefficient itself is output to coefficient output terminal 10.
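One way to realize the quadratic-approximation method named above: least-squares fit y ≈ a·x² + b·x + c to the power time series over a sliding window centred on each frame, and flag a dip wherever the second-order coefficient a (proportional to the second derivative, and large at the bottom of a valley) exceeds a threshold. The window size and threshold here are illustrative choices, not values from the patent:

```python
def quadratic_coeffs(powers, half_window=2):
    """Second-order coefficient of a least-squares quadratic fit at each frame.

    Uses the closed-form solution for symmetric abscissae x = -k..k;
    the first/last k frames are left at 0.0.
    """
    k = half_window
    xs = list(range(-k, k + 1))
    n = len(xs)
    s2 = sum(x * x for x in xs)       # sum of x^2
    s4 = sum(x ** 4 for x in xs)      # sum of x^4
    denom = n * s4 - s2 * s2
    coeffs = [0.0] * len(powers)
    for i in range(k, len(powers) - k):
        win = powers[i - k:i + k + 1]
        sy = sum(win)
        sx2y = sum(x * x * y for x, y in zip(xs, win))
        coeffs[i] = (n * sx2y - s2 * sy) / denom
    return coeffs

def dip_flags(coeffs, threshold):
    """Per-frame flag for power-dip output terminal 9."""
    return [1 if c > threshold else 0 for c in coeffs]
```

As a sanity check, the sequence `[4, 1, 0, 1, 4]` is exactly y = x² around its centre frame, so the fitted coefficient there is 1.0 and that frame is flagged as a dip.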

The speech-interval decision section 11 determines the start and end of speech intervals from the information supplied through silent-interval output terminal 7, power-dip output terminal 9, and coefficient output terminal 10. One concrete procedure is as follows.

(1) When the value at silent-interval output terminal 7 changes from "0" to "1" and frames of "1" then continue for nB frames or more, the frame at which the value changed from "0" to "1" is taken as the start of speech.

(2) After the start of speech has been detected, when the value at silent-interval output terminal 7 changes from "1" to "0" and frames of "0" continue for nE frames or more, the frame at which the value changed from "1" to "0" is taken as the end of speech.

(3) After the start of speech has been detected, when the frames for which silent-interval output terminal 7 outputs "0" number at least one but fewer than nE, and a frame at which power-dip output terminal 9 outputs "1" exists within that "0" interval, the frame at which terminal 7 changed from "1" to "0" is taken as the end of speech.

(4) After the start of speech has been detected, when silent-interval output terminal 7 outputs "1" and power-dip output terminal 9 also outputs "1", the frame within that "1" interval of terminal 9 at which the coefficient value from coefficient output terminal 10 is largest is taken as both the end of the current syllable and the start of the next syllable.
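A simplified sketch of how decision section 11 might combine rules (1), (2), and (4), given the per-frame flag from terminal 7 (`speech`), the dip flag from terminal 9 (`dips`), and the coefficient values from terminal 10 (`coeffs`). Rule (3) is omitted for brevity, closing a still-open interval at end of input is an illustrative choice, and all names are hypothetical:

```python
def segment(speech, dips, coeffs, n_b, n_e):
    """Return a list of (start, end) frame pairs, one per detected syllable."""
    segments = []
    start = None
    run = 0
    i = 0
    while i < len(speech):
        if start is None:
            if speech[i] == 1:
                run += 1
                if run >= n_b:
                    start = i - run + 1          # rule (1): start of speech
                    run = 0
            else:
                run = 0
        else:
            if speech[i] == 0:
                run += 1
                if run >= n_e:                   # rule (2): end at a silent interval
                    segments.append((start, i - run + 1))
                    start, run = None, 0
            elif dips[i] == 1:                   # rule (4): dip inside speech
                j = i
                while j < len(speech) and speech[j] == 1 and dips[j] == 1:
                    j += 1
                peak = max(range(i, j), key=lambda m: coeffs[m])  # largest coefficient
                segments.append((start, peak))   # end of this syllable...
                start, run = peak, 0             # ...and start of the next
                i = j - 1
            else:
                run = 0
        i += 1
    if start is not None:                        # close any open interval at end of input
        segments.append((start, len(speech) - 1))
    return segments
```

For instance, ten consecutive speech frames with a dip flagged at frames 5–6 (coefficient largest at frame 6) split into two syllables, (0, 6) and (6, 9), with no silent interval between them.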

The essence of this invention is that not only silent intervals but also valleys in the speech power may mark the end of speech; the concrete logic is not restricted to rules (1) to (4) above, and similar logic may be used. Because the speech is input syllable by syllable, the second-order coefficient of the speech power between syllables becomes large even when adjacent syllables are close together.

Speech intervals detected by the above method are recognized by syllable recognizer 12, and the resulting kana character string is output from recognition-result output terminal 13. A syllable recognition method is shown, for example, in Nakatsu, Hamada et al., "A Study of a Japanese Monosyllabic Speech Recognition Method," Proc. National Convention of the Information Systems Division, IECE of Japan, 1-117 (October 1981).

Fig. 3 shows an example of speech-interval detection by the method described above. The speech power a computed by speech power calculator 4 is compared with threshold b to obtain the silent-interval output signal e. The second-order coefficient f, obtained in power-dip detector 8 by approximating the speech power with a quadratic curve, is output from coefficient output terminal 10 and is compared with threshold g to obtain the power-dip output signal h. From this information, the start c and end d of the speech interval are determined according to the logic described above.

As explained above, because this invention uses not only silent-interval detection but also information about valleys in the speech power to detect the end of speech, it has the advantage that no silent interval need be placed between syllables. The syllables within a word or phrase can therefore be uttered quasi-continuously, at whatever speed the user chooses, realizing a high-speed syllable input system.

[Brief Description of the Drawings]

Fig. 1 is a waveform diagram showing an example of speech-interval detection in the conventional syllable input system; Fig. 2 is a block diagram showing the configuration of one embodiment of this invention; Fig. 3 is a waveform diagram showing an example of speech-interval detection in the syllable input system of this invention.

1: microphone, 2: band-pass filter, 3: A/D converter, 4: speech power calculator, 5: speech power output terminal, 6: silent-interval detector, 7: silent-interval output terminal, 8: power-dip detector, 9: power-dip output terminal, 10: coefficient output terminal, 11: speech-interval decision section, 12: syllable recognizer, 13: recognition-result output terminal.

Patent applicant: Nippon Telegraph and Telephone Public Corporation

Claims (1)

[Claims]

(1) In a syllable input system that recognizes each syllable of speech uttered syllable by syllable, a syllable input system characterized by comprising: means for computing the power of the input speech for each frame of a predetermined length; means for detecting silent intervals by comparing the computed power with a predetermined threshold; means for taking a valley detected in the computed speech power as the end of a syllable; and means for detecting speech intervals from the detected silent intervals and the detected syllable ends.
JP57106570A 1982-06-21 1982-06-21 Syllable inputting system Granted JPS58223198A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP57106570A JPS58223198A (en) 1982-06-21 1982-06-21 Syllable inputting system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP57106570A JPS58223198A (en) 1982-06-21 1982-06-21 Syllable inputting system

Publications (2)

Publication Number Publication Date
JPS58223198A true JPS58223198A (en) 1983-12-24
JPH0259480B2 JPH0259480B2 (en) 1990-12-12

Family

ID=14436909

Family Applications (1)

Application Number Title Priority Date Filing Date
JP57106570A Granted JPS58223198A (en) 1982-06-21 1982-06-21 Syllable inputting system

Country Status (1)

Country Link
JP (1) JPS58223198A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01219623A (en) * 1988-02-29 1989-09-01 Nec Home Electron Ltd Automatic score taking method and apparatus

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS56135898A (en) * 1980-03-26 1981-10-23 Sanyo Electric Co Voice recognition device
JPS58168800U (en) * 1982-05-07 1983-11-10 株式会社日立製作所 audio cutting device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS56135898A (en) * 1980-03-26 1981-10-23 Sanyo Electric Co Voice recognition device
JPS58168800U (en) * 1982-05-07 1983-11-10 株式会社日立製作所 audio cutting device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01219623A (en) * 1988-02-29 1989-09-01 Nec Home Electron Ltd Automatic score taking method and apparatus

Also Published As

Publication number Publication date
JPH0259480B2 (en) 1990-12-12

Similar Documents

Publication Publication Date Title
JPH10254475A (en) Speech recognition method
JPS62115199A (en) Voice responder
JPS5982608A (en) System for controlling reproducing speed of sound
JPS58223198A (en) Syllable inputting system
JP3266124B2 (en) Apparatus for detecting similar waveform in analog signal and time-base expansion / compression device for the same signal
JP2000099099A (en) Data reproducing device
JPS60129796A (en) Sillable boundary detection system
JPS6043697A (en) Boundary detector between consonant and vowel
JPH0567040B2 (en)
JPS6217800A (en) Voice section decision system
JPS61260299A (en) Voice recognition equipment
JPS60198596A (en) Syllable boundary selection system
JPH02254500A (en) Vocalization speed estimating device
JPH07104675B2 (en) Speech recognition method
KR930011736B1 (en) Pitch control method of voice signal
JPH0474720B2 (en)
JPS6027000A (en) Pattern matching
JPH02192335A (en) Word head detecting system
JPS63217399A (en) Voice section detecting system
JPH09146575A (en) Uttering speed detecting method
JPS59123900A (en) Detection of long vowel for voice input unit
JPS6256998A (en) Consonant section detector
JPS6225796A (en) Voice recognition equipment
JPS5925240B2 (en) Word beginning detection method for speech sections
JPS5969798A (en) Extraction of pitch