JPS58223198A - Syllable inputting system - Google Patents

Syllable inputting system

Info

Publication number
JPS58223198A
JPS58223198A JP57106570A JP10657082A JPS58223198A JP S58223198 A JPS58223198 A JP S58223198A JP 57106570 A JP57106570 A JP 57106570A JP 10657082 A JP10657082 A JP 10657082A JP S58223198 A JPS58223198 A JP S58223198A
Authority
JP
Japan
Prior art keywords
power
syllable
voice
output terminal
section
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP57106570A
Other languages
Japanese (ja)
Other versions
JPH0259480B2 (en)
Inventor
浜田 洋
良平 中津
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp filed Critical Nippon Telegraph and Telephone Corp
Priority to JP57106570A priority Critical patent/JPS58223198A/en
Publication of JPS58223198A publication Critical patent/JPS58223198A/en
Publication of JPH0259480B2 publication Critical patent/JPH0259480B2/ja
Granted legal-status Critical Current

Links

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

This invention relates to a syllable input system that recognizes speech uttered one syllable at a time and outputs a kana character string, and more particularly to a syllable input system that detects speech intervals from the input speech.

<Prior Art>

Fig. 1 shows an example of speech-interval detection in a conventional syllable input system. For each frame of the input speech, the speech power a is computed and compared with a threshold b determined in advance from the power measured when no speech is being input.

If, as a result of this comparison, frames at or above the threshold continue for a predetermined number of frames (nB frames) or more, the frame at which the speech power first exceeded the threshold is taken as the start c of the speech interval. After the start has been detected, if frames whose speech power is at or below the threshold continue for a predetermined number of frames (nE frames) or more, a silent interval is judged to have been detected, and the frame at which the speech power first fell below the threshold is taken as the end d of the speech interval.
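The conventional threshold-and-run-length procedure can be sketched as follows. This is an illustrative reading of the rules above, not code from the patent; the frame powers, the threshold b, and the run lengths nB and nE are hypothetical inputs (the patent fixes no concrete values):

```python
def detect_interval(powers, b, n_b, n_e):
    """Return (start, end) frame indices of one speech interval, or None.

    start: first frame of the run of >= n_b frames above threshold b.
    end:   first frame of the run of >= n_e frames at or below b.
    """
    start = None
    run = 0
    for i, p in enumerate(powers):
        if start is None:
            if p >= b:
                run += 1
                if run >= n_b:
                    start = i - run + 1  # frame where power first exceeded b
                    run = 0
            else:
                run = 0
        else:
            if p <= b:
                run += 1
                if run >= n_e:
                    return (start, i - run + 1)  # frame where power first fell below b
            else:
                run = 0
    return None
```

With a toy power sequence `[0, 0, 5, 5, 5, 0, 0, 0, 0]`, threshold 1, nB = 2, and nE = 3, this yields the interval (2, 5): speech starts at frame 2 and ends at frame 5.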

The conventional method described above has the drawback that the end of a syllable is not detected unless the next syllable is uttered only after a pause of nE frames or more. Syllables therefore cannot be uttered in rapid succession; the user is forced to insert a pause between syllables, which is burdensome, and because these inter-syllable pauses are long, input becomes much slower than the normal speaking rate and utterance is awkward.

<Summary of the Invention>

To overcome these drawbacks, this invention detects the end of a speech interval using not only silent-interval information but also information about valleys (dips) in the speech power. Speech intervals can then be detected without long pauses between syllables, making high-speed syllable input possible: the speaker still utters each syllable separately, but need not insert pauses, so input can proceed at roughly the normal speaking rate.

<Embodiment>

Fig. 2 is a block diagram showing the configuration of one embodiment of this invention. The speech input signal from microphone 1 is band-limited by band-pass filter 2 and then converted into a digital signal by A/D converter 3.

Next, speech power calculator 4 computes the speech power of this digital signal using a predetermined frame length and frame period, and outputs it to speech power output terminal 5.
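A minimal sketch of the frame-power computation performed by speech power calculator 4: mean squared amplitude over frames of a fixed length, taken at a fixed frame period. The frame length and period values here are illustrative defaults, not values stated in the patent:

```python
def frame_powers(samples, frame_len=256, frame_period=128):
    """Short-time power of a sampled signal, one value per frame."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_period):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)  # mean squared amplitude
    return powers
```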

The silent-interval detector 6 compares, frame by frame, the speech power supplied from speech power output terminal 5 with a threshold set by adding an appropriate margin to the power measured when no speech is being input. It outputs "1" to silent-interval output terminal 7 when the speech power is above the threshold and "0" when it is below.
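The frame-by-frame thresholding performed by silent-interval detector 6 can be sketched as follows; `background_power` and `margin` are hypothetical names for the no-input power and the margin added to it:

```python
def speech_flags(powers, background_power, margin):
    """Per-frame flag: 1 while speech power exceeds the threshold, else 0."""
    threshold = background_power + margin
    return [1 if p > threshold else 0 for p in powers]
```

For example, with a background power of 0.2 and a margin of 0.5, frame powers `[0.1, 2.0, 3.0, 0.2]` produce the flag sequence `[0, 1, 1, 0]`.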

The power-dip detector 8 detects valleys in the speech power supplied through speech power output terminal 5. Several dip-detection methods are conceivable; one example uses the second-order coefficient obtained by approximating the speech power time series with a quadratic curve. The computed second-order coefficient is compared frame by frame with an appropriate preset threshold: "1" is output to power-dip output terminal 9 when the coefficient is above the threshold and "0" when it is below, and the value of the coefficient itself is output to coefficient output terminal 10.
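One way to realize the quadratic-approximation method named above: least-squares fit y ≈ a·x² + b·x + c to the power time series over a sliding window centred on each frame, and flag a dip wherever the second-order coefficient a (proportional to the second derivative, and large at the bottom of a valley) exceeds a threshold. The window size and threshold here are illustrative choices, not values from the patent:

```python
def quadratic_coeffs(powers, half_window=2):
    """Second-order coefficient of a least-squares quadratic fit at each frame.

    Uses the closed-form solution for symmetric abscissae x = -k..k;
    the first/last k frames are left at 0.0.
    """
    k = half_window
    xs = list(range(-k, k + 1))
    n = len(xs)
    s2 = sum(x * x for x in xs)       # sum of x^2
    s4 = sum(x ** 4 for x in xs)      # sum of x^4
    denom = n * s4 - s2 * s2
    coeffs = [0.0] * len(powers)
    for i in range(k, len(powers) - k):
        win = powers[i - k:i + k + 1]
        sy = sum(win)
        sx2y = sum(x * x * y for x, y in zip(xs, win))
        coeffs[i] = (n * sx2y - s2 * sy) / denom
    return coeffs

def dip_flags(coeffs, threshold):
    """Per-frame flag for power-dip output terminal 9."""
    return [1 if c > threshold else 0 for c in coeffs]
```

As a sanity check, the sequence `[4, 1, 0, 1, 4]` is exactly y = x² around its centre frame, so the fitted coefficient there is 1.0 and that frame is flagged as a dip.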

The speech-interval decision section 11 determines the start and end of speech intervals from the information supplied through silent-interval output terminal 7, power-dip output terminal 9, and coefficient output terminal 10. One concrete procedure is as follows.

(1) When the value at silent-interval output terminal 7 changes from "0" to "1" and frames of "1" then continue for nB frames or more, the frame at which the value changed from "0" to "1" is taken as the start of speech.

(2) After the start of speech has been detected, when the value at silent-interval output terminal 7 changes from "1" to "0" and frames of "0" continue for nE frames or more, the frame at which the value changed from "1" to "0" is taken as the end of speech.

(3) After the start of speech has been detected, when the frames for which silent-interval output terminal 7 outputs "0" number at least one but fewer than nE, and a frame at which power-dip output terminal 9 outputs "1" exists within that "0" interval, the frame at which terminal 7 changed from "1" to "0" is taken as the end of speech.

(4) After the start of speech has been detected, when silent-interval output terminal 7 outputs "1" and power-dip output terminal 9 also outputs "1", the frame within that "1" interval of terminal 9 at which the coefficient value from coefficient output terminal 10 is largest is taken as both the end of the current syllable and the start of the next syllable.
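A simplified sketch of how decision section 11 might combine rules (1), (2), and (4), given the per-frame flag from terminal 7 (`speech`), the dip flag from terminal 9 (`dips`), and the coefficient values from terminal 10 (`coeffs`). Rule (3) is omitted for brevity, closing a still-open interval at end of input is an illustrative choice, and all names are hypothetical:

```python
def segment(speech, dips, coeffs, n_b, n_e):
    """Return a list of (start, end) frame pairs, one per detected syllable."""
    segments = []
    start = None
    run = 0
    i = 0
    while i < len(speech):
        if start is None:
            if speech[i] == 1:
                run += 1
                if run >= n_b:
                    start = i - run + 1          # rule (1): start of speech
                    run = 0
            else:
                run = 0
        else:
            if speech[i] == 0:
                run += 1
                if run >= n_e:                   # rule (2): end at a silent interval
                    segments.append((start, i - run + 1))
                    start, run = None, 0
            elif dips[i] == 1:                   # rule (4): dip inside speech
                j = i
                while j < len(speech) and speech[j] == 1 and dips[j] == 1:
                    j += 1
                peak = max(range(i, j), key=lambda m: coeffs[m])  # largest coefficient
                segments.append((start, peak))   # end of this syllable...
                start, run = peak, 0             # ...and start of the next
                i = j - 1
            else:
                run = 0
        i += 1
    if start is not None:                        # close any open interval at end of input
        segments.append((start, len(speech) - 1))
    return segments
```

For instance, ten consecutive speech frames with a dip flagged at frames 5–6 (coefficient largest at frame 6) split into two syllables, (0, 6) and (6, 9), with no silent interval between them.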

The essence of this invention is that not only silent intervals but also valleys in the speech power may mark the end of speech; the concrete logic is not restricted to rules (1) to (4) above, and similar logic may be used. Because the speech is input syllable by syllable, the second-order coefficient of the speech power between syllables becomes large even when adjacent syllables are close together.

Speech intervals detected by the above method are recognized by syllable recognizer 12, and the resulting kana character string is output from recognition-result output terminal 13. A syllable recognition method is shown, for example, in Nakatsu, Hamada et al., "A Study of a Japanese Monosyllabic Speech Recognition Method," Proc. National Convention of the Information Systems Division, IECE of Japan, 1-117 (October 1981).

Fig. 3 shows an example of speech-interval detection by the method described above. The speech power a computed by speech power calculator 4 is compared with threshold b to obtain the silent-interval output signal e. The second-order coefficient f, obtained in power-dip detector 8 by approximating the speech power with a quadratic curve, is output from coefficient output terminal 10 and is compared with threshold g to obtain the power-dip output signal h. From this information, the start c and end d of the speech interval are determined according to the logic described above.

As explained above, because this invention uses not only silent-interval detection but also information about valleys in the speech power to detect the end of speech, it has the advantage that no silent interval need be placed between syllables. The syllables within a word or phrase can therefore be uttered quasi-continuously, at whatever speed the user chooses, realizing a high-speed syllable input system.

[Brief Description of the Drawings]

Fig. 1 is a waveform diagram showing an example of speech-interval detection in the conventional syllable input system; Fig. 2 is a block diagram showing the configuration of one embodiment of this invention; Fig. 3 is a waveform diagram showing an example of speech-interval detection in the syllable input system of this invention.

1: microphone, 2: band-pass filter, 3: A/D converter, 4: speech power calculator, 5: speech power output terminal, 6: silent-interval detector, 7: silent-interval output terminal, 8: power-dip detector, 9: power-dip output terminal, 10: coefficient output terminal, 11: speech-interval decision section, 12: syllable recognizer, 13: recognition-result output terminal.

Patent applicant: Nippon Telegraph and Telephone Public Corporation

Claims (1)

[Claims]

(1) In a syllable input system that recognizes each syllable of speech uttered syllable by syllable, a syllable input system characterized by comprising: means for computing the power of the input speech for each frame of a predetermined length; means for detecting silent intervals by comparing the computed power with a predetermined threshold; means for taking a valley detected in the computed speech power as the end of a syllable; and means for detecting speech intervals from the detected silent intervals and the detected syllable ends.
JP57106570A 1982-06-21 1982-06-21 Syllable inputting system Granted JPS58223198A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP57106570A JPS58223198A (en) 1982-06-21 1982-06-21 Syllable inputting system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP57106570A JPS58223198A (en) 1982-06-21 1982-06-21 Syllable inputting system

Publications (2)

Publication Number Publication Date
JPS58223198A true JPS58223198A (en) 1983-12-24
JPH0259480B2 JPH0259480B2 (en) 1990-12-12

Family

ID=14436909

Family Applications (1)

Application Number Title Priority Date Filing Date
JP57106570A Granted JPS58223198A (en) 1982-06-21 1982-06-21 Syllable inputting system

Country Status (1)

Country Link
JP (1) JPS58223198A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01219623A (en) * 1988-02-29 1989-09-01 Nec Home Electron Ltd Automatic score taking method and apparatus

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS56135898A (en) * 1980-03-26 1981-10-23 Sanyo Electric Co Voice recognition device
JPS58168800U (en) * 1982-05-07 1983-11-10 株式会社日立製作所 audio cutting device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS56135898A (en) * 1980-03-26 1981-10-23 Sanyo Electric Co Voice recognition device
JPS58168800U (en) * 1982-05-07 1983-11-10 株式会社日立製作所 audio cutting device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01219623A (en) * 1988-02-29 1989-09-01 Nec Home Electron Ltd Automatic score taking method and apparatus

Also Published As

Publication number Publication date
JPH0259480B2 (en) 1990-12-12

Similar Documents

Publication Publication Date Title
JPH10254475A (en) Speech recognition method
JPS62115199A (en) Voice responder
JPS5982608A (en) System for controlling reproducing speed of sound
JPS58223198A (en) Syllable inputting system
JP3266124B2 (en) Apparatus for detecting similar waveform in analog signal and time-base expansion / compression device for the same signal
JP2000099099A (en) Data reproducing device
JPS60129796A (en) Sillable boundary detection system
JPS6043697A (en) Boundary detector between consonant and vowel
JPH0567040B2 (en)
JPS6217800A (en) Voice section decision system
JPS61260299A (en) Voice recognition equipment
JPS60198596A (en) Syllable boundary selection system
JPH02254500A (en) Vocalization speed estimating device
JPH07104675B2 (en) Speech recognition method
KR930011736B1 (en) Pitch control method of voice signal
JPH0474720B2 (en)
JPS6027000A (en) Pattern matching
JPH02192335A (en) Word head detecting system
JPS63217399A (en) Voice section detecting system
JPH09146575A (en) Uttering speed detecting method
JPS59123900A (en) Detection of long vowel for voice input unit
JPS6256998A (en) Consonant section detector
JPS6225796A (en) Voice recognition equipment
JPS5925240B2 (en) Word beginning detection method for speech sections
JPS5969798A (en) Extraction of pitch