JPS6410080B2

Info

Publication number: JPS6410080B2
Application number: JP56213731A
Authority: JP
Inventors: Junichi Ichikawa; Takayuki Ooyama; Yasuo Sato; Osamu Terao; Hidekazu Shiratori
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1981-12-29
Filing date: 1981-12-29
Publication date: 1989-02-21
Also published as: JPS58116600A

Description

[Detailed Description of the Invention]

(1) Technical Field of the Invention

The present invention relates to a standard pattern registration method capable of registering standard patterns that allow an input speech pattern to be recognized regardless of whether its final syllable is devoiced or not.

(2) Background of the Technology

As one recognition method used in speech recognition systems, standard patterns are registered in advance, an input speech pattern is matched against them, and the input speech is recognized from the match. With this technique, the final syllable of the input speech pattern is devoiced for some words, which is an obstacle to speech recognition.
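As a minimal sketch of the matching technique described above, assume each pattern is a fixed-length list of numeric feature values and that the match is scored by Euclidean distance. The patent specifies neither, and the names `recognize` and `templates` are hypothetical.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature lists (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(input_pattern, templates):
    """Return the label of the registered standard pattern closest to the input."""
    return min(templates, key=lambda label: distance(input_pattern, templates[label]))

# Hypothetical registered standard patterns, one per word.
templates = {"word_a": [1.0, 2.0, 3.0], "word_b": [3.0, 2.0, 1.0]}
result = recognize([1.1, 2.0, 2.8], templates)
```

With a single standard pattern per word, an utterance whose final syllable is devoiced can end up far from its own template, which is the obstacle the text describes.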

(3) Prior Art and Problems

To overcome this difficulty, two techniques have been used for words whose final syllable is easily devoiced. In the first, the word is uttered several times and the averaged pattern is registered as the standard pattern, against which the input speech pattern to be recognized is matched. In the second, a plurality of the patterns obtained from those utterances are registered and matched against the input for speech recognition.

However, these techniques burden the user when registering standard patterns without delivering a corresponding improvement in the recognition rate, and the latter technique also unavoidably increases the storage capacity needed to hold the standard patterns.

(4) Object of the Invention

The present invention was devised in view of the above drawbacks of the conventional techniques. Its object is to provide a standard pattern registration method that improves the recognition rate without burdening the user and without increasing the storage capacity.

(5) Structure of the Invention

This object is achieved as follows. When registering standard patterns for a word whose final syllable is easily devoiced, the final syllable of the uttered speech pattern of that word is replaced with a devoiced final syllable and with a non-devoiced final syllable, each obtained from average speech patterns, to generate first and second synthesized speech patterns. These synthesized speech patterns are matched against the uttered speech pattern, and the synthesized speech pattern with the larger matching distance is registered as a standard pattern together with the uttered speech pattern.

(6) Embodiments of the Invention

An embodiment of the present invention is described below with reference to the accompanying drawings.

FIG. 1 shows a speech recognition system implementing the method of the present invention. Reference numeral 1 denotes a microphone, 2 a frequency analysis section, 3 a parameter extraction section, 4 a segmentation section, 5 a pattern buffer, 6 a switching means, 7 a pattern matching section, and 8 a standard pattern storage section; together these constitute a conventional speech recognition system.

A standard pattern creation section 9 is interposed between the pattern buffer 5 and the standard pattern storage section 8, so that the present invention is implemented within the conventional speech recognition system described above. The pattern buffer 5 also forms part of the structure of the present invention, whose specific structure is shown in FIG. 2.

In FIG. 2, reference numeral 10 denotes a pattern buffer (the same as reference numeral 5 in FIG. 1). In it, the portion of the input speech pattern other than the final syllable is marked A, and the final syllable is marked 0. Reference numerals 11 and 12 denote synthesis sections, each of which receives the input speech pattern from the pattern buffer 10. The synthesis section 11 is connected to a non-devoiced syllable pattern supply section 13.

This supply section stores the pattern of a non-devoiced final syllable, determined from the average speech pattern obtained from the utterances of a plurality of speakers, and supplies it to the synthesis section 11.

The synthesis section 12 is connected to a devoiced syllable pattern supply section 14. This supply section 14 stores the pattern of a devoiced final syllable, determined from the average speech pattern obtained from the utterances of a plurality of speakers, and supplies it to the synthesis section 12.
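The patent does not say how the average final-syllable patterns held in the supply sections 13 and 14 are computed. One possible sketch, assuming every speaker's final-syllable pattern has already been time-normalized to the same number of frames with the same number of features per frame, is a frame-by-frame mean. The function name `average_pattern` and the sample values are hypothetical.

```python
def average_pattern(patterns):
    """Average several patterns frame by frame.

    Assumes every pattern has the same number of frames and the same number
    of features per frame (time normalization is taken as already done).
    """
    if not patterns:
        raise ValueError("at least one pattern is required")
    n_frames = len(patterns[0])
    if any(len(p) != n_frames for p in patterns):
        raise ValueError("patterns must have the same number of frames")
    averaged = []
    for frames in zip(*patterns):  # frames sharing the same time index
        averaged.append([sum(values) / len(values) for values in zip(*frames)])
    return averaged

# Hypothetical final-syllable patterns from three speakers (2 frames x 2 features).
speakers = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[2.0, 3.0], [4.0, 5.0]],
    [[3.0, 4.0], [5.0, 6.0]],
]
mean_final_syllable = average_pattern(speakers)
```

The same routine would be run once over non-devoiced utterances and once over devoiced ones to obtain the two stored patterns.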

Reference numerals 15 and 16 each denote a synthesized speech pattern storage section.

Reference numerals 17 and 18 each denote a matching section. The matching section 17 is connected to the synthesized speech pattern storage section 15 and the pattern buffer 10; it matches the first synthesized speech pattern from the storage section 15 against the input speech pattern from the pattern buffer 10 and outputs the matching distance between them. Likewise, the matching section 18 is connected to the synthesized speech pattern storage section 16 and the pattern buffer 10; it matches the second synthesized speech pattern from the storage section 16 against the input speech pattern from the pattern buffer 10 and outputs the matching distance between them.
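The patent does not define how the matching sections 17 and 18 compute the matching distance. As an illustrative stand-in only, the sketch below uses dynamic time warping (DTW) with a city-block distance between frames, a common choice for comparing speech patterns of different lengths. The names `frame_distance` and `matching_distance` are hypothetical.

```python
def frame_distance(f1, f2):
    """City-block distance between two feature frames (assumed local measure)."""
    return sum(abs(x - y) for x, y in zip(f1, f2))

def matching_distance(p, q):
    """DTW distance between two sequences of frames.

    cost[i][j] holds the cheapest accumulated distance for aligning the first
    i frames of p with the first j frames of q.
    """
    inf = float("inf")
    n, m = len(p), len(q)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(p[i - 1], q[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # p advances alone
                                 cost[i][j - 1],      # q advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# Hypothetical one-feature patterns.
uttered = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [1.0], [1.0], [2.0]]   # same content, spoken more slowly
different = [[0.0], [1.0], [5.0]]          # final frame differs
```

Under this measure a pattern that only differs in tempo yields a distance of zero, while a different final syllable yields a positive distance.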

Reference numeral 19 denotes a pattern selection section, which is configured to select the synthesized speech pattern with the larger matching distance, together with the input speech pattern, and to supply them to the standard pattern storage section 20 (the same as reference numeral 8 in FIG. 1).

Next, the process by which the apparatus of FIG. 2 registers the standard patterns to be matched against input speech patterns is described.

A word A, which consists of the syllables a_1, a_2, ..., a_N and whose final syllable a_N is easily devoiced, is uttered into the microphone 1. The output signal is processed, as in the conventional system, by the frequency analysis section 2, the parameter extraction section 3, and the segmentation section 4, and the resulting input speech pattern A^P is placed in the pattern buffer 10.

Before this, the pattern a^T_N of the non-devoiced final syllable a_N described above is stored in the non-devoiced syllable pattern supply section 13, and the pattern a^U_N of the devoiced final syllable a_N described above is stored in the devoiced syllable pattern supply section 14. When the input speech pattern A^P is output from the pattern buffer 10, these patterns are supplied from their supply sections to the synthesis sections 11 and 12, respectively.

On receiving these patterns, the synthesis sections 11 and 12 perform the following processing. In the synthesis section 11, the pattern of the final syllable a_N of the input speech pattern A^P is replaced with the non-devoiced pattern a^T_N to generate the first synthesized speech pattern A^T (whose final syllable is marked "1"), which is supplied to and stored in the synthesized speech pattern storage section 15. In the synthesis section 12, the pattern of the final syllable a_N of the input speech pattern A^P is replaced with the devoiced pattern a^U_N to generate the second synthesized speech pattern A^U (whose final syllable is marked "2"), which is supplied to and stored in the synthesized speech pattern storage section 16.
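The replacement performed by the synthesis sections 11 and 12 can be sketched as follows, assuming the segmentation section delivers the pattern as a list of per-syllable segments. That data layout, the function name `replace_final_syllable`, and the placeholder strings standing in for feature frames are all assumptions made for illustration.

```python
def replace_final_syllable(input_pattern, final_syllable):
    """Return a copy of input_pattern whose last syllable segment is final_syllable.

    input_pattern is a list of syllable segments; the original list is left intact
    so the uttered pattern A^P can still be registered afterwards.
    """
    if not input_pattern:
        raise ValueError("input pattern must contain at least one syllable")
    return input_pattern[:-1] + [final_syllable]

# Placeholder segments: each string stands in for the frames of one syllable.
a_p = [["a1"], ["a2"], ["aN_as_uttered"]]           # input speech pattern A^P
a_t = replace_final_syllable(a_p, ["aN_T"])          # first synthesized pattern A^T
a_u = replace_final_syllable(a_p, ["aN_U"])          # second synthesized pattern A^U
```

Both synthesized patterns share every syllable except the last with the uttered pattern, so any difference in matching distance comes from the final syllable alone.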

The first synthesized speech pattern A^T from the synthesized speech pattern storage section 15 and the input speech pattern A^P from the pattern buffer 10 are supplied to the matching section 17, where the two patterns are matched and the matching distance between them is output. In parallel, the second synthesized speech pattern A^U from the synthesized speech pattern storage section 16 and the input speech pattern A^P from the pattern buffer 10 are supplied to the matching section 18, where the two patterns are matched and the matching distance between them is output.

Both matching distances are supplied to the pattern selection section 19, which selects and outputs whichever synthesized speech pattern, A^U or A^T, has the larger matching distance, and also outputs the input speech pattern A^P.
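The selection rule of the pattern selection section 19 can be sketched as below. The distance function is passed in as a parameter because the patent does not fix one; `simple_distance`, the flat numeric patterns, and the tie-breaking choice (A^U when the two distances are equal) are assumptions made for the example.

```python
def select_standard_patterns(a_p, a_t, a_u, matching_distance):
    """Return the two patterns to register: A^P plus the more distant synthetic one."""
    d_t = matching_distance(a_t, a_p)
    d_u = matching_distance(a_u, a_p)
    # Ties are not addressed in the source; A^U is chosen here arbitrarily.
    chosen = a_t if d_t > d_u else a_u
    return [a_p, chosen]

def simple_distance(p, q):
    """City-block distance between equal-length patterns (illustrative only)."""
    return sum(abs(x - y) for x, y in zip(p, q))

a_p = [1.0, 2.0, 0.1]   # uttered with an almost devoiced final syllable
a_t = [1.0, 2.0, 3.0]   # synthesized with the non-devoiced final syllable
a_u = [1.0, 2.0, 0.0]   # synthesized with the devoiced final syllable
registered = select_standard_patterns(a_p, a_t, a_u, simple_distance)
```

Choosing the more distant synthesized pattern plausibly means the registered pair covers the variant the speaker did not produce during registration: here the utterance was close to devoiced, so the non-devoiced A^T is kept alongside it.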

These two patterns are stored in the standard pattern storage section 20 as standard patterns.

Registration of this kind is completed in a single operation. Moreover, regardless of whether the word uttered for registration was devoiced, the standard patterns contain both the speech pattern actually uttered for that word and a speech pattern for that word whose final syllable is either non-devoiced or devoiced. Consequently, after registration, the word corresponding to the registered standard patterns can be recognized whether it is uttered with or without devoicing, so the recognition rate can be improved. In addition, compared with registering a large number of standard patterns for each word, less storage capacity is required and processing is simplified.
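To illustrate why two standard patterns per word suffice, the following sketch recognizes an input by scoring each word with its nearest registered pattern. The registry layout, the city-block distance, and all names and values are hypothetical; the patent does not describe the recognition step at this level of detail.

```python
def distance(p, q):
    """City-block distance between equal-length patterns (assumed measure)."""
    return sum(abs(x - y) for x, y in zip(p, q))

def recognize(input_pattern, registry):
    """Return the word whose nearest standard pattern is closest to the input.

    registry maps each word to the list of standard patterns registered for it.
    """
    def best_score(word):
        return min(distance(input_pattern, t) for t in registry[word])
    return min(registry, key=best_score)

# Two standard patterns per word: one devoiced variant, one non-devoiced variant.
registry = {
    "word_a": [[1.0, 2.0, 0.1], [1.0, 2.0, 3.0]],
    "word_b": [[4.0, 4.0, 0.0], [4.0, 4.0, 3.0]],
}
voiced_input = [1.0, 2.1, 2.8]     # final syllable not devoiced
devoiced_input = [1.0, 1.9, 0.0]   # final syllable devoiced
```

In this toy setup both inputs resolve to `word_a`, while the storage cost stays at two patterns per word rather than one pattern per registration utterance.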

(7) Effects of the Invention

As is clear from the above description, the present invention provides the following effects.

(1) Standard patterns can be registered in a single operation.

(2) Consequently, the registration process is simplified.

(3) Registering standard patterns in this way can improve the speech recognition rate.

[Brief explanation of drawings]

FIG. 1 is a diagram showing the configuration of a speech recognition system implementing the present invention, and FIG. 2 is a diagram showing an embodiment of the present invention. In the figures, 10 denotes a pattern buffer, 11 and 12 synthesis sections, 13 a non-devoiced syllable pattern supply section, 14 a devoiced syllable pattern supply section, 15 and 16 synthesized speech pattern storage sections, 17 and 18 matching sections, and 19 a pattern selection section.

Claims

[Claims]

1. A standard pattern registration method characterized in that, when registering standard patterns for a word whose final syllable is easily devoiced, the final syllable of the uttered speech pattern of the word is replaced with a devoiced final syllable and with a non-devoiced final syllable, each obtained from average speech patterns, to generate first and second synthesized speech patterns; these synthesized speech patterns are matched against the uttered speech pattern; and the synthesized speech pattern with the larger matching distance and the uttered speech pattern are registered as standard patterns.