JPH07230299A - Voice recognition device

Info

Publication number: JPH07230299A
Application number: JP6020456A
Authority: JP
Inventors: Toshiyuki Watanabe; 俊幸渡辺; Akira Ishida; 明石田
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1994-02-17
Filing date: 1994-02-17
Publication date: 1995-08-29

Abstract

PURPOSE: To roughly discriminate which of several previously prepared word groups an input voice belongs to, by employing a group discrimination section made up of a neural network. CONSTITUTION: A feature pattern of the voice signal input through a voice input section 1 is generated by a feature pattern generation section 2. Based on the feature pattern, a group discrimination section 3 constituted of a neural network discriminates which of input items 1, 2 or 3 the word belongs to. The word is then identified, from among the words stored for the discriminated input item, by the corresponding word recognition section 4, 5 or 6.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a voice recognition device that recognizes an input voice by means of a group identification unit, which identifies which group the input voice belongs to, and a word identification unit, which identifies which word within each group the input voice corresponds to.

[0002] FIG. 5 is a block diagram showing the configuration of a conventional voice recognition device, in which reference numeral 11 denotes a voice input unit. The voice input unit 11 comprises a microphone, a microphone amplifier, an AD converter and the like. Voice input through it is fed, as a voice signal, to a feature pattern creation unit 12, which extracts a feature pattern by frequency spectrum analysis or the like. The feature pattern creation unit 12 extracts a frequency spectrum pattern that characterizes the voice signal and inputs it to the word identification unit 13, 14 or 15.

[0003] Each of the word identification units 13, 14 and 15 is assigned a group of content describing, for example, a characteristic of a "person", that is, an input item: sex, birthplace or age. For sex, words such as man, woman, male and female; for birthplace, words such as Tokyo and Osaka; and for age, words such as teens, twenties and thirties are stored separately in the memories allocated to the respective word identification units 13, 14 and 15. The units are operated one after another in accordance with instructions given through a key input unit 16 and a recognition result control unit 17.

[0004] For example, suppose that the word identification unit 13 for input item 1, whose memory stores words relating to sex, is operated first. When the voice "man" is input from the voice input unit 11, the word identification unit 13 compares the feature pattern input from the feature pattern creation unit 12 with the previously stored words representing sex and identifies the word corresponding to the feature pattern. If a corresponding word exists, the unit outputs a signal indicating that the word has been recognized to the recognition result control unit 17.

[0005] Next, the word identification unit 14 for input item 2, whose memory stores words relating to birthplace, is put into the operating state by operating the key input unit 16, and the word identification unit 14 stands by. When a voice relating to birthplace is input and the word corresponding to it is recognized, the word identification unit 15 for input item 3, whose memory stores words relating to age, is put into the operating state via the key input unit 16. When a voice relating to age is input and the corresponding word is recognized, the word identification unit 13 is again put into the operating state via the key input unit 16 and stands by.

[0006] In such a conventional device, however, the word identification units 13, 14 and 15 must be operated one after another to identify the word corresponding to the input voice, which makes operation cumbersome. As a countermeasure, a technique has been proposed in which the feature pattern of an input word voice is analyzed, a coarse identification is made among groups prepared in advance, and the word is then identified among the words prepared within the identified group (Japanese Examined Patent Publication No. 2-52278: G10L 3/00).

[0007] In this prior art, words are divided into groups in advance using a clustering method, and the center coordinates of each group are obtained and used as its representative standard pattern. The analysis pattern of the input voice is collated with the representative standard patterns, and the group whose representative standard pattern lies at the smallest distance is identified as the matching group.
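The coarse identification in this prior art amounts to a nearest-representative search. The following Python sketch illustrates it under stated assumptions: feature patterns are taken to be fixed-length numeric vectors and the distance is taken to be Euclidean, neither of which is specified in the publication. The names `centroid` and `nearest_group` and the sample vectors are hypothetical.

```python
import math

def centroid(patterns):
    """Center coordinates of a group: the per-dimension mean of its patterns."""
    n = len(patterns)
    return [sum(p[d] for p in patterns) / n for d in range(len(patterns[0]))]

def nearest_group(pattern, representatives):
    """Return the group whose representative pattern is closest (Euclidean, assumed)."""
    return min(representatives,
               key=lambda g: math.dist(pattern, representatives[g]))

# Hypothetical two-dimensional feature patterns, already clustered into groups.
groups = {
    "G1": [[0.0, 0.0], [0.2, 0.1], [0.1, 0.2]],
    "G2": [[1.0, 1.0], [1.2, 0.9]],
}
representatives = {g: centroid(ps) for g, ps in groups.items()}

print(nearest_group([0.1, 0.1], representatives))  # prints "G1"
```

This only works when the members of a group lie close together in feature space, which is the limitation the publication goes on to discuss.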

[0008] FIG. 6 is an explanatory diagram showing how groups are classified by the conventional device described above. Suppose there are six kinds of words, denoted by symbols such as ○, ●, ◇, ◆, □ and △. According to the physical features of each word, the three words ○, ● and ☆ are grouped into group G₁, the two words ◇ and ◆ into group G₂, the two words □ and # into group G₃, and the two words △ and ▽ into group G₄. The analysis pattern of the input voice is collated with the representative standard pattern of each group, and the group whose representative standard pattern lies at the smallest distance is identified. The input voice is then collated with the words within that group and identified, in the same way as in the conventional device shown in FIG. 5.

[0009]

[Problems to be Solved by the Invention] In such a method, however, the groups are formed by a clustering method and, as is clear from FIG. 6, misrecognition increases when the similarity of physical features is low. Grouping therefore has to depend on the physical features of the target words, that is, on the voice itself, and there has been the problem that subdivision of the groups cannot be avoided.

[0010] The present invention has been made in view of these circumstances. Its object is to provide a voice recognition device in which group identification is performed by a neural network, which can acquire an identification function that depends on what it has been taught, so that grouping is possible even when the input voice has few physical features. Another object of the present invention is to provide a voice recognition device in which a control unit operates the word identification units for the respective groups one after another on the basis of the identification result of the group identification unit constituted by a neural network, so that the key input operation of the conventional device is unnecessary.

[0011]

[Means for Solving the Problems] The voice recognition device according to the first invention is a voice recognition device in which voices corresponding to respective predetermined groups of words are input in sequence and the word corresponding to each voice input is recognized within its group. The device comprises: a voice input unit; a feature pattern creation unit that creates a feature pattern of the input voice; a group identification unit constituted by a neural network trained to identify, on the basis of the feature pattern, which of the groups the input voice belongs to; and a word identification unit that identifies the corresponding word, on the basis of the feature pattern, from within the group identified by the group identification unit. The voice recognition device according to the second invention further comprises a control unit that switches the group for which voice is to be input next on the basis of the identification result of the group identification unit.

[0012]

[Operation] In the first invention, setting the learning patterns for the neural network according to the application makes it possible to identify the group of an input voice even when the similarity of the physical features of the input voices is low. In the second invention, the control unit can operate the word identification units selectively and in sequence according to the identification result of the group identification unit constituted by a neural network, so that input voices can be identified automatically one after another.

[0013]

[Embodiments] The present invention will now be described in detail with reference to the drawings showing an embodiment. FIG. 1 is a block diagram showing the configuration of a voice recognition device according to the present invention, in which reference numeral 1 denotes a voice input unit. The voice input unit 1 comprises a microphone, a microphone amplifier, an AD converter and the like, and voice input through it is output as a voice signal to a feature pattern creation unit 2. The feature pattern creation unit 2 obtains the frequency spectrum of the input voice signal, extracts its features, and outputs them to a group identification unit 3 constituted by a neural network.

[0014] The feature pattern creation unit 2 uses, for example, a short-time spectrum method that extracts the frequency characteristics of a section of about 1 to 20 ms, or a method that extracts a coefficient sequence representing those characteristics. It is not limited to these, however, and other conventionally known methods may be adopted.
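A short-time spectrum of the kind mentioned here can be sketched as follows: the signal is cut into short frames and a magnitude spectrum is computed for each. This is only an illustration. The 8 kHz sampling rate, the 10 ms frame length, the absence of a window function and the plain DFT are all assumptions, and `short_time_spectrum` is a hypothetical name; the publication does not describe the method at this level of detail.

```python
import cmath
import math

def short_time_spectrum(samples, frame_len, hop):
    """Magnitude spectrum (bins 0..frame_len/2) of each frame, via a plain DFT."""
    spectra = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        magnitudes = []
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            magnitudes.append(abs(acc))
        spectra.append(magnitudes)
    return spectra

SAMPLE_RATE = 8000                   # Hz (assumed)
FRAME_LEN = SAMPLE_RATE // 100       # 80 samples = 10 ms (assumed)

# Test signal: a 1 kHz sine lasting 20 ms, giving two non-overlapping frames.
signal = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(160)]
spectra = short_time_spectrum(signal, FRAME_LEN, FRAME_LEN)

# Each bin spans SAMPLE_RATE / FRAME_LEN = 100 Hz, so 1 kHz falls in bin 10.
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
print(peak_bin)  # prints 10
```

The sequence of per-frame spectra, or coefficients derived from it, would serve as the feature pattern passed to the group identification unit.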

[0015] On the basis of the input feature pattern, the group identification unit 3 constituted by a neural network collates and identifies which of the per-group word identification units, in which the pre-grouped words are stored, the voice signal belongs to: the word identification unit 4 for input item 1, the word identification unit 5 for input item 2, or the word identification unit 6 for input item 3. It outputs a signal to the corresponding word identification unit 4, 5 or 6 and also outputs a signal to a recognition result control unit 7.

[0016] FIG. 2 is an explanatory diagram showing the relationship between the output layer neurons of the neural network and the word identification units 4, 5 and 6. The neural network usually has a hierarchical structure of three layers: input layer neurons, intermediate layer neurons (neither shown) and output layer neurons. Each layer has one or more neurons. Each input layer neuron is connected to the intermediate layer neurons, and each intermediate layer neuron to the output layer neurons, with respectively different coupling coefficients. In the embodiment, three output layer neurons O₁, O₂ and O₃ are made to correspond to the word identification units 4, 5 and 6, respectively.

[0017] FIG. 3 is an explanatory diagram showing an example of grouping when the group identification unit 3 constituted by a neural network is used. Suppose, as shown in FIG. 3, that the words ○ and × are assigned to group K₁, the words □ and ☆ to group K₂, the words ●, △ and ▽ to group K₃, and the words ◇ and ◆ to group K₄. The group identification unit 3 constituted by a neural network is then trained repeatedly so that it can identify which of the groups K₁ to K₄ each of these words belongs to. Comparing FIG. 3 with FIG. 6 makes clear that, whereas the groups G₁ to G₄ shown in FIG. 6 are divided on the basis of closely similar physical features, the groups K₁ to K₄ shown in FIG. 3 are not limited to words whose physical features are similar.
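The point that a three-layer network can separate groups whose members are not physically similar can be illustrated with a small Python sketch. Everything specific in it is an assumption made for illustration: the two-dimensional patterns, the sigmoid activation, the two output neurons, and the coupling coefficients, which are hand-chosen here rather than learned. The publication states neither the activation function nor the training method. In this example the patterns (0, 0) and (1, 1) form one group and (0, 1) and (1, 0) the other, so both groups have the same center (0.5, 0.5) and a nearest-center rule cannot tell them apart.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum plus bias, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hand-chosen (not learned) coupling coefficients; hypothetical values.
W_HIDDEN = [[20.0, 20.0], [20.0, 20.0]]   # input layer -> intermediate layer
B_HIDDEN = [-10.0, -30.0]
W_OUT = [[-20.0, 20.0], [20.0, -20.0]]    # intermediate layer -> output layer
B_OUT = [10.0, -10.0]

def identify_group(pattern):
    """Return the output-layer code: "10" for the first group, "01" for the second."""
    hidden = layer(pattern, W_HIDDEN, B_HIDDEN)
    outputs = layer(hidden, W_OUT, B_OUT)
    return "".join("1" if o >= 0.5 else "0" for o in outputs)

for pattern in ([0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]):
    print(pattern, identify_group(pattern))
```

The first two patterns yield "10" and the last two yield "01", even though the members of each group sit at opposite corners of the feature space. In the device described here the coefficients would be obtained by repeated training rather than set by hand.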

[0018] The words belonging to group K₁, for example, are stored in the memory allocated to the word identification unit 4 for input item 1, the words belonging to group K₂ in the memory allocated to the word identification unit 5 for input item 2, and the words belonging to group K₃ in the memory allocated to the word identification unit 6 for input item 3. When the output of the feature pattern creation unit 2 is input to the input layer neurons (not shown) of the group identification unit 3 constituted by a neural network, their outputs are passed to the intermediate layer neurons, and the outputs of the intermediate layer neurons are passed in turn to the output layer neurons. Group identification takes place in this process, and a collation and identification signal such as "100", "010" or "001" is output from the output layer neurons O₁ to O₃ to the word identification units 4, 5 and 6.

[0019] For example, when the signal "100" is output from the output layer neurons O₁ to O₃, the word identification unit 4 for input item 1 is operated; when the signal "010" is output, the word identification unit 5 for input item 2 is operated; and when the signal "001" is output, the word identification unit 6 for input item 3 is operated. The operated unit performs word identification based on the feature pattern output from the feature pattern creation unit 2.

[0020] That is, each word identification unit 4, 5 or 6 collates the feature pattern of the input voice with each of the previously stored words and, when the corresponding word is recognized, outputs an identification signal to the recognition result control unit 7. The control operation of the recognition result control unit 7 is described next with reference to the flowchart shown in FIG. 4.

[0021] It is assumed that the following words are stored in advance in the memories allocated to the word identification units 4, 5 and 6 for input items 1, 2 and 3, respectively.

Input item    Target words
1             man, woman, male, female
2             Hokkaido, Tohoku, Kanto, Chubu, Kinki, Chugoku, Shikoku, Kyushu
3             teens, twenties, thirties, forties, fifties, sixties

[0022] In FIG. 4, initialization is performed first by setting FLAG[1] = false, FLAG[2] = false and FLAG[3] = false, and the variable I indicating the input item number is set to the value "1" corresponding to input item 1 (S1). It is then determined whether I = 3 (S2). Since I is not 3, voice input is performed, the feature pattern creation unit 2 creates a feature pattern from the input voice signal, and the group identification unit 3 constituted by a neural network performs identification (S3).

[0023] The group of the input voice signal, that is, the input item, is thereby identified. Suppose it is identified as input item i (a generalized input item number). It is determined whether I = i (S6). If I = i, word identification is performed, the value RCG[1] indicating which word in that input item has been recognized is set to j, and the flag FLAG[1] indicating that some input has already been made for that input item is set to true (S9); the process then returns to step S2. If the determination in step S6 is I ≠ i, that is, if contrary to the assumed case I = i the input item identified by the group identification unit 3 constituted by a neural network is, for example, "2", it is determined whether the flag FLAG[1] is true or false (S7). If FLAG[1] is true, I is set to I + 1 (S8) and the process proceeds to step S9; if FLAG[1] is false, the process returns to step S2.

[0024] For example, when I = 1 and the voice input is "male", the input item identified by the group identification unit 3 constituted by a neural network is "1", that is, i = 1. The determination of whether I = i gives I = i, so RCG[1] = male and FLAG[1] = true are assigned and the process returns to step S3. In this case the value of I remains 1, so the entry for input item 1 can subsequently be changed by a further input.

[0025] Next, suppose that a voice input of "Kinki" is made while I = 1, that is, while the input item is "1", the item relating to sex. The input item identified by the neural network is "2", so it is determined in step S7 whether the flag FLAG[1], which indicates whether some input has already been made for input item 1, is true or false. If FLAG[1] is true, that is, if an input relating to sex has already been made, the input item is changed to I = I + 1 = 2, and in step S9 FLAG[2] = true and RCG[2] = "Kinki" are set. If FLAG[1] is false, the input is treated as a voice input error, the process returns to step S3, and the device waits for the next voice input with the input item still at I = 1. Although the above embodiment has three word identification units 4, 5 and 6 for input items 1, 2 and 3, the number is of course not limited to this and may be increased or decreased as required.
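The control flow of steps S1 to S9 can be approximated by the Python sketch below. It is a reading of the text, not a reproduction of the flowchart in FIG. 4. The assumptions are as follows: the flags and results are indexed by the current item I; the group identifier and word identifier are replaced by table lookups on the word list of paragraph [0021]; an utterance outside every group is ignored; the loop ends when the utterances run out or when a switch past the last item is requested; and a result is stored only when the word is found in the current item's list. All function and variable names are hypothetical.

```python
VOCAB = {
    1: ["man", "woman", "male", "female"],
    2: ["Hokkaido", "Tohoku", "Kanto", "Chubu", "Kinki", "Chugoku", "Shikoku", "Kyushu"],
    3: ["teens", "twenties", "thirties", "forties", "fifties", "sixties"],
}

def identify_group(utterance):
    """Stand-in for group identification unit 3 (a table lookup, not a network)."""
    for item, words in VOCAB.items():
        if utterance in words:
            return item
    return None

def identify_word(item, utterance):
    """Stand-in for word identification units 4, 5 and 6."""
    return utterance if utterance in VOCAB[item] else None

def run_dialog(utterances):
    flag = {item: False for item in VOCAB}  # S1: FLAG[1..3] = false
    rcg = {}                                # recognized word per input item
    current = 1                             # S1: I = 1
    for utterance in utterances:            # S3: voice input and group identification
        i = identify_group(utterance)
        if i is None:
            continue                        # unknown word: ignored (assumption)
        if i != current:                    # S6: I != i
            if not flag[current]:           # S7: nothing entered yet for item I,
                continue                    #     so treat as an input error
            if current == len(VOCAB):
                break                       # no further item to switch to (assumption)
            current += 1                    # S8: I = I + 1
        word = identify_word(current, utterance)  # S9: word identification
        if word is not None:
            rcg[current] = word
            flag[current] = True
    return rcg

print(run_dialog(["Kinki", "male", "female", "Kinki", "thirties"]))
# prints {1: 'female', 2: 'Kinki', 3: 'thirties'}
```

In this run the first "Kinki" is rejected because nothing has yet been entered for input item 1, "female" overwrites "male" while I is still 1, and the later "Kinki" and "thirties" each advance I by one, with no key operation involved.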

[0026]

[Effects of the Invention] As described above, in the first invention the words to be identified are divided into a plurality of groups in advance, and the group identification unit constituted by a neural network is trained to distinguish them. This makes it possible to group patterns whose physical features are not necessarily similar, and hence to group words according to the application, which is an excellent effect of the present invention. In the second invention, the control unit can operate the different word identification units one after another on the basis of the identification result of the group identification unit constituted by a neural network, so that the key input operation required in the conventional device is unnecessary and operation becomes extremely easy.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a voice recognition device according to the present invention.

FIG. 2 is an explanatory diagram showing the relationship between the group identification unit constituted by a neural network and the groups.

FIG. 3 is an explanatory diagram showing the contents of each grouped input item.

FIG. 4 is a flowchart showing the processing performed by the voice recognition device according to the present invention.

FIG. 5 is a block diagram showing the configuration of a conventional device.

FIG. 6 is an explanatory diagram showing the contents of the grouped input items in the conventional device.

[Explanation of symbols]

1 Voice input unit
2 Feature pattern creation unit
3 Group identification unit constituted by a neural network
4 Word identification unit for input item 1
5 Word identification unit for input item 2
6 Word identification unit for input item 3
7 Recognition result control unit

Claims

[Claims]

1. A voice recognition device in which voices corresponding to respective predetermined groups of words are input in sequence and the word corresponding to each voice input is recognized within its group, the device comprising: a voice input unit; a feature pattern creation unit that creates a feature pattern of the input voice; a group identification unit constituted by a neural network trained to identify, on the basis of the feature pattern, which of the groups the input voice belongs to; and a word identification unit that identifies the corresponding word, on the basis of the feature pattern, from within the group identified by the group identification unit.

2. The voice recognition device according to claim 1, further comprising a control unit that switches the group for which voice is to be input next on the basis of the identification result of the group identification unit.