JPH04128899A

JPH04128899A - Voice recognition device

Info

Publication number: JPH04128899A
Application number: JP2252493A
Authority: JP
Inventors: Tatsuro Matsumoto; 達郎松本
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-09-20
Filing date: 1990-09-20
Publication date: 1992-04-30

Abstract

PURPOSE: To recognize the correct word quickly and accurately by taking accent into account in the reading data obtained from speech. CONSTITUTION: An AD conversion part 31 converts the speech into digital values, and a feature extraction part 32 extracts the feature quantities of the speech waveform. A score calculation part 33 matches templates against the feature quantities of the input speech to calculate matching scores. A score order sorting part 34 classifies the candidates by score. A priority order sorting part 41 reclassifies them and displays them on a reading display part 42, and a reading determination part 43 determines the reading. At the same time, an accent information extraction part 51 extracts the accent information of the input speech, and an accent information matching part 52 collates it with the stored accent information. A display control part 53 performs control so that candidates whose accent information agrees closely are displayed preferentially on a candidate display part 6. A candidate selection part 7 outputs the word selected by the user.

Description

[Detailed Description of the Invention]
[Summary]

The invention relates to a speech recognition device that recognizes, from speech, the word the speech represents. Its object is to recognize the correct word quickly and accurately by taking accent into account in the reading data obtained from the speech. The device comprises the following parts:
- a speech dictionary that stores speech features;
- a speech identification unit that identifies input speech and, by referring to the speech dictionary, outputs candidates corresponding to that speech;
- a reading determination unit that determines the reading of the candidates output by the speech identification unit;
- an accent dictionary that stores accent features of speech;
- an accent identification unit that identifies the accent features of the input speech, receives the reading of the input speech determined by the reading determination unit, and, by referring to the accent dictionary, outputs the words corresponding to the accent of this reading;
- a candidate display unit that displays the words output by the accent identification unit;
- a candidate selection unit that selects, from among the displayed words, the word representing the input speech.

[Field of Industrial Application]

The present invention relates to a speech recognition device that recognizes, from speech, the word that the speech represents.

Creating documents by voice and searching databases and dictionaries by voice require a large-vocabulary speech recognition system.

In current large-vocabulary speech recognition systems, the candidate with the highest recognition score is often not the correct answer. A large vocabulary contains many words with similar pronunciations, for example words that differ in only one consonant. As a result, even a high-scoring candidate frequently differs from the word actually spoken. For this reason, these systems display multiple candidates and let the user select the correct one.

[Prior Art]

The principle of a conventional speech recognition system will be explained with reference to FIG. 2.

In the figure, when speech is input, a score calculation unit 101 matches the input speech against the template of each candidate word stored in a dictionary 104 and calculates a score for each candidate word. Based on these scores, a candidate display unit 102 displays the candidates above a certain threshold, in score order or in dictionary order. A candidate selection unit 103 then outputs the recognition result, based on the user's selection of the correct answer, as a word (character string) representing the input speech.
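The display step of FIG. 2 can be sketched as follows. The sketch assumes the score calculation unit 101 has already produced one numeric score per dictionary word. The `Candidate` type, the threshold, the example scores, and the use of plain string order as "dictionary order" are all illustrative assumptions, not details from the patent.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    word: str     # candidate word (character string)
    score: float  # matching score of the word's template against the input speech


def select_display_candidates(candidates, threshold, order="score"):
    """Keep candidates scoring at or above `threshold` and order them for display.

    order="score" puts the highest score first; order="dictionary" sorts by the
    word string (a simple stand-in for dictionary order).
    """
    kept = [c for c in candidates if c.score >= threshold]
    if order == "score":
        return sorted(kept, key=lambda c: c.score, reverse=True)
    if order == "dictionary":
        return sorted(kept, key=lambda c: c.word)
    raise ValueError(f"unknown order: {order!r}")


# Made-up scores for three similar-sounding readings.
cands = [Candidate("kaeru", 0.82), Candidate("kaeri", 0.79), Candidate("kageru", 0.41)]
print([c.word for c in select_display_candidates(cands, threshold=0.5)])
print([c.word for c in select_display_candidates(cands, 0.5, order="dictionary")])
```

Either ordering still leaves every surviving homophone on screen. That is the limitation the invention addresses.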

[Problem to be Solved by the Invention]

As described above, large-vocabulary speech recognition systems display multiple candidates and let the user select the correct one. Such systems may list the candidates in dictionary order, or highlight the candidates most likely to be correct with a different brightness or color. However, Japanese has many homonyms. So even when the user selects a reading candidate through such an easy-to-use interface, it is often difficult to quickly select the word written with the intended kanji. In this specification, the term "word" also covers a phrase (bunsetsu) that includes a particle.

The present invention was made in view of the above problem. Its object is to provide a speech recognition device that obtains the correct word quickly and accurately by taking accent into account in the reading data obtained from speech.

[Means for Solving the Problem]

FIG. 1 is a diagram showing the principle of the present invention. The present invention narrows the range of candidate words by adding accent data to conventional reading data. The speech recognition device of the present invention comprises the following parts:
- a speech dictionary 1 that stores speech features;
- a speech identification unit 3 that identifies the input speech and, by referring to the speech dictionary 1, outputs candidates corresponding to that speech;
- a reading determination unit 4 that determines the reading of the candidates output by the speech identification unit 3;
- an accent dictionary 2 that stores accent features of speech;
- an accent identification unit 5 that identifies the accent features of the input speech, receives the reading of the input speech determined by the reading determination unit 4, and, by referring to the accent dictionary 2, outputs the words corresponding to the accent of this reading;
- a candidate display unit 6 that displays the words output by the accent identification unit 5;
- a candidate selection unit 7 that selects, from among the words displayed on the candidate display unit 6, the word representing the input speech.

The accent identification unit 5 causes the candidate display unit 6 to display the words ordered by how closely they match the accent type of the input speech. The accent identification unit 5 also estimates the accent type by extracting the fundamental frequency of the input speech.

Dialects can be handled in either of two ways:
- Dialect-specific accent data is stored in the accent dictionary 2, and the accent identification unit 5 refers to the accent data for the dialect of the input speech.
- Standard accent data is stored in the accent dictionary 2, and the accent identification unit 5 stores conversion rules between the standard accent and each dialect accent. It converts the standard accent data according to the dialect of the input speech and refers to the converted data.

The accent identification unit 5 also determines a speaker's dialect from the candidate selection unit 7's selection for that speaker's first utterance.

The accent dictionary 2 stores, for each word, dialect-specific accent information together with the frequency with which the accent identification unit 5 has used each piece of that information. The accent identification unit 5 also takes this usage frequency into account when selecting the words corresponding to the input speech. If the word selected by the candidate selection unit 7 differs from the word that the accent identification unit 5 output on the basis of dialect-specific accent information, the usage frequency of the accent information used for that output is corrected.

Finally, the accent dictionary 2 may store an accent type for the word selected by the candidate selection unit 7 that differs from the accent type the accent identification unit 5 recognized for that word. In that case, the stored accent type is changed to the one recognized by the accent identification unit 5.

[Operation]

When speech is input, the speech identification unit 3 identifies it and refers to the speech dictionary 1 to produce the corresponding candidates. The reading determination unit 4 then determines the reading of these candidates. Meanwhile, the accent identification unit 5 identifies the accent features of the input speech. For the reading determined by the reading determination unit 4, it refers to the accent information stored in the accent dictionary 2 for the words having that reading, and outputs the words that correspond to the accent. The candidate display unit 6 displays these words, and the candidate selection unit 7 selects from them the correct word for the input speech. Because the candidates are narrowed down with the accent taken into account, the selection can be made quickly.
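Below is a minimal Python sketch of this two-stage narrowing. The dictionary contents reuse the "kaeru" homophones given later in this description. The function names are hypothetical. `determine_reading` simply takes the best-scoring reading, whereas the device lets the user confirm the reading.

```python
# Hypothetical accent dictionary (accent dictionary 2): reading -> {word: accent type}.
# Notation follows the document: one digit per mora, "1" = strong part, "0" = weak part.
ACCENT_DICT = {"かえる": {"変える": "111", "蛙": "111", "帰る": "100", "返る": "100"}}


def determine_reading(reading_scores):
    """Stand-in for the reading determination unit: take the best-scoring reading."""
    return max(reading_scores, key=reading_scores.get)


def accent_candidates(reading, input_accent, accent_dict=ACCENT_DICT):
    """Accent identification: words with this reading whose accent type matches the input."""
    return [w for w, a in accent_dict.get(reading, {}).items() if a == input_accent]


reading = determine_reading({"かえる": 0.9, "かえり": 0.7})
print(accent_candidates(reading, "100"))  # words passed to the candidate display
```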

Here, the candidate display unit 6 orders the words output by the accent identification unit 5 by how closely they match the accent type of the input speech. This narrows the range the user must search and allows quick selection. In addition, the accent identification unit 5 can grasp the accent features of the input speech accurately by extracting its fundamental frequency and performing frequency analysis or similar processing.
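The ordering by degree of match could be sketched as below. The patent does not define the degree of match; counting agreeing morae is an assumption made here for illustration. The example accent types are the Tokyo values of FIG. 13.

```python
def match_degree(a, b):
    """Fraction of morae whose marks agree (0.0 to 1.0); a hypothetical metric."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def order_by_accent(words, input_accent):
    """Sort (word, accent type) pairs so closer matches come first.

    sorted() stays stable with reverse=True, so ties keep their original order.
    """
    return sorted(words, key=lambda wa: match_degree(wa[1], input_accent), reverse=True)


print(order_by_accent([("帰る", "100"), ("変える", "011")], "111"))
```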

Input speech is often spoken in a dialect. Storing dialect-specific accent data in the accent dictionary and referring to the data for the speaker's dialect makes it possible to grasp the accent features accurately. Alternatively, the accent dictionary 2 can store standard accent data while the accent identification unit 5 stores conversion rules between the standard accent and each dialect accent. Converting the standard accent according to the dialect of the input speech then gives the correct accent features for that dialect.

Further, the dialect of the input speech can be determined from the candidate selection unit's selection for a speaker's first utterance. From then on, correct determinations matching that dialect can be made.

The accent dictionary 2 also stores, for each word, dialect-specific accent information and the frequency with which the accent identification unit 5 has used each piece of it. Using the most frequently used information first raises the accuracy of the words output by the accent identification unit 5. Suppose the word given by that accent information differs from the word selected by the candidate selection unit 7. The accent information is then judged to have been wrong and its usage frequency is reduced, which raises the probability that more correct information will be used.

Further, the accent type stored in the accent dictionary 2 for the word selected by the candidate selection unit 7 may differ from the accent type the accent identification unit 5 recognized for that word. In that case, the stored accent type is changed to the recognized one, so that the accent dictionary 2 stays correctly updated as the device is used.

[Embodiments]

Embodiments of the present invention will be described below with reference to the drawings.

FIG. 3 is a block diagram showing a first embodiment of the present invention. This embodiment comprises the following parts:
- a template storage unit 10 that stores feature quantities for predetermined speech;
- an accent information storage unit 20 that stores accent information for predetermined speech;
- an AD conversion unit 31 that converts speech into digital values;
- a feature extraction unit 32 that analyzes the waveform given by these digital values and extracts the speech feature quantities;
- a score calculation unit 33 that matches these feature quantities against the templates in the template storage unit 10 and calculates the matching score of the candidate for each matched template;
- a score order sorting unit 34 that sorts the selected candidates by matching score;
- a priority order sorting unit 41 that reclassifies the sorted candidates according to a predetermined criterion;
- a reading display unit 42 that displays the reclassified candidates down to a predetermined rank;
- a reading determination unit 43 that determines the correct reading from the readings displayed on the reading display unit 42;
- an accent information extraction unit 51 that extracts the accent information of the input speech from the digital values output by the AD conversion unit 31;
- an accent information matching unit 52 that, for the reading determined by the reading determination unit 43, collates the accent information in the accent information storage unit 20 with the accent information extracted from the input speech;
- a display control unit 53 that, based on this collation, controls the preferential display of candidates whose accent information agrees closely;
- a candidate display unit 6 that displays the candidates indicated by the display control unit 53, with priority display applied;
- a candidate selection unit 7 that selects the correct word from the candidates displayed on the candidate display unit 6.

Next, the operation of this embodiment will be explained.

When speech is input, the AD conversion unit 31 converts the analog speech into digital values. The feature extraction unit 32 analyzes the speech waveform represented by these values and extracts its feature quantities. The score calculation unit 33 matches the templates in the template storage unit 10 against these feature quantities and calculates a matching score for the candidate corresponding to each template. The score order sorting unit 34 then sorts, in score order, the candidates whose scores fall within a certain rank. The priority order sorting unit 41 reclassifies these candidates by a predetermined criterion such as dictionary order. The result is displayed on the reading display unit 42, and the reading determination unit 43 determines the correct reading from the displayed candidates.

In parallel with this, the accent information extraction unit 51 extracts the accent information of the input speech from the digital output of the AD conversion unit 31. For the reading determined by the reading determination unit 43, the accent information matching unit 52 collates the stored accent information in the accent information storage unit 20 with that of the input speech. The display control unit 53 receives the collation result from the accent information matching unit 52. It then performs control so that candidates whose accent information agrees closely are displayed preferentially.

The candidate display unit 6 displays the candidates with the priority marking specified by the display control unit 53. The candidate selection unit 7 outputs the candidate word (character string) that the user selects from the candidates displayed on the candidate display unit 6.

Next, a second embodiment will be explained with reference to FIGS. 4, 5, and 6.

FIG. 4 is a block diagram showing the configuration of the second embodiment.

This embodiment extracts the fundamental frequency of the input speech and determines the accent type of the input speech from it. Accordingly, the relevant blocks of FIG. 3 are changed as follows:
- The accent information storage unit 20 is replaced by an accent type storage unit 201. It stores accent type data for the words expressing the meanings the speech represents.
- The accent information extraction unit 51 is replaced by a fundamental frequency extraction unit 511, which extracts the fundamental frequency from the input speech, and an accent type determination unit 512, which determines the accent type of the speech from the extracted fundamental frequency.
- The accent information matching unit 52 is replaced by an accent type matching unit 521. For the determined reading, it matches the accent type stored in the accent type storage unit 201 against the accent type of the input speech.

Everything else is the same as in FIG. 3.
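The patent does not say how the fundamental frequency extraction unit 511 computes F0. One common technique is to pick the autocorrelation peak of a voiced frame. The sketch below assumes that technique, a mono frame of float samples, and a 60 to 400 Hz search range; all of these are hypothetical choices.

```python
import math


def estimate_f0(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Rough F0 estimate (Hz) for one voiced frame via the autocorrelation peak.

    Searches lags corresponding to f0_max..f0_min Hz and returns None when no
    lag has positive correlation (e.g. silence).
    """
    mean = sum(frame) / len(frame)
    x = [s - mean for s in frame]                      # remove DC offset
    min_lag = max(1, int(sample_rate / f0_max))
    max_lag = min(int(sample_rate / f0_min), len(x) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else None


# Usage: a synthetic 200 Hz tone sampled at 8 kHz (a 50 ms frame).
sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(400)]
print(estimate_f0(tone, sr))
```

Running this per frame and averaging over each mora would give the per-mora F0 contour that the accent type determination unit 512 works from. Real pitch trackers add voicing decisions and octave-error checks that are omitted here.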

Next, the accent type, which is the characteristic feature of this embodiment, will be explained concretely.

For example, when the uttered speech is "kaeru" (かえる), there are many homophones. So even after the reading has been determined to be "kaeru", selecting the required word still takes some time.

This embodiment therefore uses the accent type information of the word "kaeru". Among its many homophones, the device preferentially displays those whose accent type is close to that of the current utterance, so that the required word can be selected quickly. The homophones of "kaeru" classified by accent type are as follows.

Flat type (kaeru) → (111): 代える (substitute), 変える (change), 蛙 (frog), 換える (exchange), 替える (replace)
Head-high type (kaeru) → (100): 返る (return), 帰る (go home), 孵る (hatch), 飼える (can keep an animal)

Here "1" denotes a strongly accented part and "0" a weakly accented part. Because these words split into two categories by accent type, the required word can certainly be selected faster than when they are simply listed.
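Grouping the homophones by accent type, as in the list above, can be sketched as follows. The data mirrors that list, and the function name is hypothetical.

```python
from collections import defaultdict

# Homophones of "kaeru" with their accent types, as listed above.
KAERU = [("代える", "111"), ("変える", "111"), ("蛙", "111"), ("換える", "111"),
         ("替える", "111"), ("返る", "100"), ("帰る", "100"), ("孵る", "100"),
         ("飼える", "100")]


def group_by_accent(entries):
    """Map each accent type to the words that carry it, preserving list order."""
    groups = defaultdict(list)
    for word, accent in entries:
        groups[accent].append(word)
    return dict(groups)


groups = group_by_accent(KAERU)
print(groups["100"])  # the group offered first when the utterance is head-high
```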

The accent type of an utterance is estimated from the movement of its fundamental frequency. As an example, FIGS. 5 and 6 show the fundamental frequency patterns of "kaeru" uttered with the flat type and with the head-high type, respectively. As these figures show, in the flat type the fundamental frequency stays flat and falls slightly toward the end of the word. In the head-high type, the frequency is high from the start of the word through "a" and drops sharply just before "e". As this example shows, the accent type can be estimated by detecting that the fundamental frequency does not change, or by locating where it changes sharply.
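The rule described above can be sketched as a function over per-mora mean F0 values: no change means the flat type, and a sharp fall marks the end of the strong part. The mora segmentation, the 15% fall threshold, and the restriction to flat and falling patterns are assumptions for illustration.

```python
def estimate_accent_type(mora_f0, drop_threshold=0.15):
    """Estimate an accent-type string ("1" = strong mora, "0" = weak mora) from
    per-mora mean F0 values in Hz.

    If no fall between consecutive morae reaches `drop_threshold` (relative),
    the pattern is flat (all "1"). Otherwise the morae up to the largest fall
    are marked "1" and the rest "0".
    """
    if any(f <= 0 for f in mora_f0):
        raise ValueError("F0 values must be positive (voiced morae only)")
    if len(mora_f0) < 2:
        return "1" * len(mora_f0)
    # Relative fall from each mora to the next; positive means the pitch drops.
    falls = [(a - b) / a for a, b in zip(mora_f0, mora_f0[1:])]
    biggest = max(range(len(falls)), key=falls.__getitem__)
    if falls[biggest] < drop_threshold:
        return "1" * len(mora_f0)            # no sharp fall: flat type
    high = biggest + 1                        # morae before the fall stay high
    return "1" * high + "0" * (len(mora_f0) - high)


print(estimate_accent_type([200, 198, 190]))  # -> 111 (slight fall only)
print(estimate_accent_type([220, 150, 140]))  # -> 100 (sharp fall after "ka")
```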

Next, a third embodiment will be explained with reference to FIGS. 7 and 8.

The third embodiment adds dialect-specific accent type storage units to the second embodiment, so that the accent of the speaker's dialect is used. FIG. 7 is a block diagram of this embodiment. In place of the accent type storage unit 201 of FIG. 4, N accent type storage units are provided, one for each of N dialects. An accent type selection unit 54 retrieves the accent types for the relevant dialect. The rest is the same as in the second embodiment.

FIG. 8 shows an example of accent types that differ by dialect. In the figure, "1" and "0" indicate the high and low parts of the accent, respectively. The kanji corresponding to the same reading thus often differ by dialect. This embodiment selects the accent type appropriate to the dialect and can therefore obtain the correct word for the speech, that is, the word written with the correct kanji.
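Here is a sketch of the per-dialect stores and the accent type selection unit 54, assuming each store maps a reading to {word: accent type}. The かえる values are those listed in FIG. 13. The dialect keys and function names are hypothetical.

```python
# Per-dialect accent type stores (the N stores replacing unit 201), keyed by dialect.
ACCENT_STORES = {
    "tokyo":  {"かえる": {"帰る": "100", "変える": "011"}},
    "keihan": {"かえる": {"帰る": "111", "変える": "100"}},
}


def select_store(dialect, stores=ACCENT_STORES):
    """Accent type selection unit 54: pick the store for the speaker's dialect."""
    return stores[dialect]


def matching_words(reading, input_accent, dialect):
    """Words with `reading` whose accent type in the chosen dialect equals the input's."""
    store = select_store(dialect)
    return [w for w, a in store.get(reading, {}).items() if a == input_accent]


print(matching_words("かえる", "100", "tokyo"))   # head-high in Tokyo
print(matching_words("かえる", "100", "keihan"))  # same pattern, different word in Keihan
```

The same input pattern "100" resolves to different kanji depending on which store is selected. This is the effect the paragraph above describes.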

Next, a fourth embodiment will be explained with reference to FIGS. 9 and 10.

The third embodiment provides an accent type storage unit 201 for each dialect. In this embodiment, by contrast, the accent type storage unit 201 stores standard accent types. The standard accent type is then converted for the speaker's dialect using the conversion rules that exist between the standard accent types and those of each dialect.

This conversion is performed by the accent type conversion unit 55 shown in FIG. 9. The rest is the same as the second embodiment shown in FIG. 4.

FIG. 10 shows an example of such an accent type conversion rule. Take the Tokyo dialect as the standard. For speech in the Keihan (Kyoto-Osaka) dialect, the Tokyo accent type "011" of 売る (uru, "sell") need only be converted to "111".
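The conversion performed by the accent type conversion unit 55 can be sketched as a rule lookup. Only the Tokyo "011" to Keihan "111" rule comes from the text. The table layout, the function names, and the choice to leave patterns without a rule unchanged are assumptions.

```python
# Hypothetical rule table: (dialect, standard accent type) -> dialect accent type.
# Only the Tokyo "011" -> Keihan "111" rule is taken from the text.
CONVERSION_RULES = {
    ("keihan", "011"): "111",
}


def convert_accent(standard_accent, dialect, rules=CONVERSION_RULES):
    """Convert a standard (Tokyo) accent type into the given dialect's accent type."""
    if dialect == "tokyo":
        return standard_accent
    # Assumption: patterns with no rule are left unchanged.
    return rules.get((dialect, standard_accent), standard_accent)


def converted_dictionary(standard_dict, dialect):
    """Apply the conversion to a whole reading -> {word: accent type} dictionary."""
    return {reading: {w: convert_accent(a, dialect) for w, a in words.items()}
            for reading, words in standard_dict.items()}


print(converted_dictionary({"うる": {"売る": "011"}}, "keihan"))
```

Storing one standard dictionary plus a small rule table avoids keeping N full dictionaries, which is the trade-off against the third embodiment.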

Next, a fifth embodiment will be explained with reference to FIG. 11.

In this embodiment, the speaker's dialect is estimated from the correct word selected for that speaker's first utterance. From then on, the data in the accent type storage unit 201 for that dialect is used.

In FIG. 11, the dialect estimation unit 56 compares two accent types:
- the accent type that the accent type determination unit 512 determined for the speech actually uttered;
- the accent type of the candidate actually selected by the candidate selection unit 7 (the word in which kanji expressing the meaning are assigned to the reading).

If they do not match, the dialect estimation unit 56 treats the mismatch as a dialect difference in accent type. It refers to the accent type storage unit 201 to estimate the correct dialect and instructs the accent type selection unit 54 to switch to that dialect from the next utterance onward.

An example of the estimation performed by the dialect estimation unit 56 is described below.

Suppose a speaker of the Keihan dialect utters 帰る (kaeru, "go home"). From the fundamental frequency analysis, the accent type determination unit 512 determines the accent type to be "111". In this case, 変える ("change") has the closest accent in the standard-language accent dictionary. It is therefore displayed preferentially as the first candidate on the candidate display unit 6.

If the word actually selected by the candidate selection unit 7 is 帰る rather than 変える, the device searches the accent type dictionaries of the various dialects. It looks for the one in which that word has the matching accent type, and uses that dictionary from the next input utterance onward. Except as described above, FIG. 11 is the same as the third embodiment shown in FIG. 7.
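The example above can be sketched as follows, using the FIG. 13 accent types. The closeness measure (number of agreeing morae) and the fallback of keeping the current dialect are assumptions. The patent only says the closest accent is shown first and the matching dialect dictionary is adopted.

```python
ACCENT_STORES = {
    "tokyo":  {"かえる": {"帰る": "100", "変える": "011"}},
    "keihan": {"かえる": {"帰る": "111", "変える": "100"}},
}


def agreement(a, b):
    """Number of morae whose marks agree (hypothetical closeness measure)."""
    return sum(x == y for x, y in zip(a, b))


def first_candidate(reading, input_accent, dialect, stores=ACCENT_STORES):
    """Word whose stored accent is closest to the input accent in the current dialect."""
    words = stores[dialect][reading]
    return max(words, key=lambda w: agreement(words[w], input_accent))


def estimate_dialect(reading, selected_word, input_accent, current, stores=ACCENT_STORES):
    """Dialect estimation unit 56: explain an accent mismatch by switching dialect."""
    if stores[current].get(reading, {}).get(selected_word) == input_accent:
        return current
    for dialect, store in stores.items():
        if store.get(reading, {}).get(selected_word) == input_accent:
            return dialect
    return current  # assumption: keep the current dialect if no store matches


print(first_candidate("かえる", "111", "tokyo"))          # shown first: 変える
print(estimate_dialect("かえる", "帰る", "111", "tokyo"))  # user picked 帰る
```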

Next, a sixth embodiment will be explained with reference to FIGS. 12 and 13. In this embodiment, the accent type storage unit 201 stores the dialect-specific accent types together with their usage frequencies. The accent type matching unit 521 uses the most frequently used accent types. When the accent type finally selected by the candidate selection unit 7 differs from the accent type of the input speech, an accent type frequency correction unit 57 corrects the usage frequency of the dialect used for that input speech. Apart from the contents of the accent type storage unit 201 and the added accent type frequency correction unit 57, this embodiment is the same as the third embodiment shown in FIG. 7.

FIG. 13 shows an example of the dialect-specific accent type usage frequencies stored in the accent type storage unit 201.
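Below is a sketch of the frequency-weighted matching and of the correction performed by the accent type frequency correction unit 57, using the FIG. 13 values. The patent does not specify how frequencies are combined or how much they are reduced. Ranking by the highest matching frequency and a fixed decrement `step=5` are hypothetical choices.

```python
# FIG. 13 data: reading -> word -> dialect -> [accent type, usage frequency].
USAGE = {
    "かえる": {
        "帰る":  {"tokyo": ["100", 70], "keihan": ["111", 20]},
        "変える": {"tokyo": ["011", 40], "keihan": ["100", 60]},
    }
}


def rank_words(reading, input_accent, usage=USAGE):
    """Rank words by the highest usage frequency among their dialect entries whose
    accent type matches the input; returns (word, dialect) pairs, best first."""
    ranked = []
    for word, entries in usage.get(reading, {}).items():
        hits = [(freq, dialect) for dialect, (accent, freq) in entries.items()
                if accent == input_accent]
        if hits:
            freq, dialect = max(hits)
            ranked.append((freq, word, dialect))
    ranked.sort(reverse=True)
    return [(word, dialect) for _, word, dialect in ranked]


def correct_frequency(reading, output, selected_word, step=5, usage=USAGE):
    """If the user chose a different word, lower the usage frequency of the
    (word, dialect) entry that produced `output`, never going below zero."""
    word, dialect = output
    if word != selected_word:
        entry = usage[reading][word][dialect]
        entry[1] = max(0, entry[1] - step)
```

For an input accent of "100", `rank_words` puts 帰る (Tokyo, 70) ahead of 変える (Keihan, 60). Suppose the user then picks 変える. Calling `correct_frequency("かえる", ("帰る", "tokyo"), "変える")` lowers the Tokyo frequency of 帰る, so that entry carries less weight next time.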

Next, a seventh embodiment will be explained with reference to FIG. 14.

In this embodiment, the accent type of the word selected by the candidate selection unit 7 may differ from the accent type of the input speech determined by the accent type determination unit 512. In that case, an accent type registration unit 58 replaces the stored accent type of the selected word with the accent type just uttered, so that the data is updated to reflect actual usage.

Except as described above, FIG. 14 is the same as the second embodiment shown in FIG. 4.
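The update performed by the accent type registration unit 58 amounts to an overwrite whenever the stored and uttered accent types differ. Here is a minimal sketch, assuming the store layout used above (reading to {word: accent type}) and a hypothetical function name.

```python
def register_accent(accent_store, reading, selected_word, input_accent):
    """Overwrite the stored accent type of the selected word with the uttered one.

    Returns True if the store was changed, False if it already matched.
    """
    if accent_store[reading][selected_word] == input_accent:
        return False
    accent_store[reading][selected_word] = input_accent
    return True


store = {"かえる": {"帰る": "100", "変える": "011"}}
register_accent(store, "かえる", "帰る", "111")  # user said 帰る with a "111" accent
print(store["かえる"]["帰る"])
```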

[Effect of the Invention]

As is clear from the above description, the present invention determines the reading from the input speech, takes accent information into account for this reading, and displays the candidates in priority order for the user to choose from. The correct candidate can therefore be selected quickly.

[Brief Description of the Drawings]
FIG. 1 is a diagram showing the principle of the present invention; FIG. 2 is a diagram showing an example of the principle of a conventional method; FIG. 3 is a block diagram showing a first embodiment of the invention; FIG. 4 is a block diagram showing a second embodiment; FIGS. 5 and 6 are diagrams each showing an example of the fundamental frequency pattern of an input voice; FIG. 7 is a block diagram showing a third embodiment; FIG. 8 is a diagram showing an example of accent types according to dialect; FIG. 9 is a block diagram showing a fourth embodiment; FIG. 10 is a diagram showing an example of dialect-specific accent rules; FIG. 11 is a block diagram showing a fifth embodiment; FIG. 12 is a block diagram showing a sixth embodiment; FIG. 13 is a diagram showing an example of dialect-specific accent type usage frequencies; and FIG. 14 is a block diagram showing a seventh embodiment.

In the figures: 1 - speech dictionary, 2 - accent dictionary, 3 - speech identification unit, 4 - reading determination unit, 5 - accent identification unit, 6 - candidate display unit, 7 - candidate selection unit, 10 - template storage unit, 20 - accent information storage unit, 31 - AD conversion unit, 32 - feature extraction unit, 33 - score calculation unit, 34 - score-order sorting unit, 41 - priority-order sorting unit, 42 - reading display unit, 43 - reading determination unit, 51 - accent information extraction unit, 52 - accent information collation unit, 53 - display control unit, 54 - accent type selection unit, 55 - accent type conversion unit, 56 - dialect estimation unit, 57 - accent type frequency correction unit, 58 - accent type registration unit, 201 - accent type storage unit, 511 - fundamental frequency extraction unit, 512 - accent type determination unit, 521 - accent type collation unit.

FIG. 2: Example of the principle of the conventional method (figure not reproduced).
FIG. 3: First embodiment, block diagram from "voice" input to "character string" output (figure not reproduced).
FIG. 4: Second embodiment, block diagram with "voice" input (figure not reproduced).
FIG. 8: Accent types according to dialect, listing kaeru 帰る ("return") and kaeru 変える ("change") for the Tokyo and Keihan dialects (accent markings not recoverable).
FIG. 10: Accent rules by dialect, with columns Reading, Kanji, Tokyo dialect, Keihan dialect and the example au 会う ("meet") (rule entries not recoverable).
FIG. 13: Frequency of use of accent types by dialect

Reading  Kanji               Tokyo accent  Tokyo freq.  Keihan accent  Keihan freq.
kaeru    帰る ("return")     100           70           111            20
kaeru    変える ("change")   011           40           100            60

Claims

[Scope of Claims]
1) A speech recognition device comprising: a speech dictionary (1) that stores features of speech; a speech identification unit (3) that identifies input speech and, by referring to the speech dictionary (1), outputs candidates corresponding to the speech; a reading determination unit (4) that determines the reading of the candidates output by the speech identification unit (3); an accent dictionary (2) that stores features of speech accents; an accent identification unit (5) that identifies the accent features of the input speech, receives the reading of the input speech determined by the reading determination unit (4), and, by referring to the accent dictionary (2), outputs words corresponding to the accent of that reading; a candidate display unit (6) that displays the words output by the accent identification unit (5); and a candidate selection unit (7) for selecting, from among the words displayed on the candidate display unit (6), a word representing the input speech.
2) The speech recognition device according to claim 1, wherein the accent identification unit (5) causes the candidate display unit (6) to display the words in order of their degree of agreement with the accent type of the input speech.
3) The speech recognition device according to claim 1 or 2, wherein the accent identification unit (5) extracts the fundamental frequency of the input speech and estimates its accent type.
4) The speech recognition device according to any one of claims 1 to 3, wherein accent data corresponding to the dialect of the input speech are stored in the accent dictionary (2), and the accent identification unit (5) refers to the accent data corresponding to the dialect of the input speech.
5) The speech recognition device according to any one of claims 1 to 3, wherein standard accent data are stored in the accent dictionary (2), conversion rules between the standard accent and each dialect accent are stored in the accent identification unit (5), and the standard accent data are converted according to the dialect of the input speech before being referred to.
6) The speech recognition device according to claim 4 or 5, wherein the accent identification unit (5) determines the dialect of a speaker from the selection made by the candidate selection unit (7) for that speaker's first utterance.
7) The speech recognition device according to claim 1, wherein the accent dictionary (2) stores, for each word, dialect-specific accent information and the frequency of use of that information, and the accent identification unit (5) also takes the frequency of use into account when selecting a word corresponding to the input speech.
8) The speech recognition device according to claim 7, wherein, when the word selected by the candidate selection unit (7) differs from the word output by the accent identification unit (5) on the basis of the dialect-specific accent information, the frequency of use of the dialect-specific accent information used to determine the output word is corrected.
9) The speech recognition device according to any one of claims 1 to 3, wherein, when the accent type stored in the accent dictionary (2) for the word selected by the candidate selection unit (7) differs from the accent type recognized for that word by the accent identification unit (5), the stored accent type is changed to the accent type recognized by the accent identification unit (5).