JPH0250197A

JPH0250197A - Dictionary pattern producing device

Info

Publication number: JPH0250197A
Application number: JP63212047A
Authority: JP
Inventors: Junichiro Fujimoto; 潤一郎藤本
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1988-05-06
Filing date: 1988-08-25
Publication date: 1990-02-20

Abstract

PURPOSE: To easily obtain a dictionary for unspecified speakers by having the part that designates voice types designate types held in the part that holds voice patterns, selecting the designated patterns, and transferring to another medium a number of patterns equal to or smaller than the number of all selected patterns. CONSTITUTION: Several patterns for dictionary generation, namely standard patterns, are held in a pattern storage part 1, and a list of their contents can be viewed on a CRT display device 3. The names of the required items are entered from a keyboard 4 to retrieve them from the storage part 1, and they are output to, for example, a floppy disk 5 to generate a dictionary for unspecified speakers. The dictionary output to the floppy disk 5 is loaded into a recognition device and used. Thus, a dictionary for unspecified speakers is easily generated.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical field]
The present invention relates to a dictionary pattern creation device for speech recognition.

[Prior art]
Research on speech recognition has recently been active, and the technology is beginning to spread into the market. Such devices come in two types: a specific-speaker type, in which the user registers his or her voice in advance, and an unspecified-speaker (speaker-independent) type, which can recognize anyone's voice without registration. Taking word speech recognition as an example, a word speech recognition device for unspecified speakers collects a large amount of speech data for the words to be used and holds the results of analyzing that data in the device in advance.

The device then determines which of these analysis results an unknown input most resembles and takes that as the recognition result.

As described above, an unspecified-speaker recognition device has the advantage of recognizing anyone's voice. On the other hand, for the reason given above, it has the drawback that its user cannot freely register words.

[Purpose]
The present invention has been made in view of the above circumstances. In particular, its purpose is to provide a device with which the user of an unspecified-speaker recognition device, or a store selling such devices, can create a speech recognition dictionary by a simple method.

[Constitution]
To achieve the above purpose, the present invention is characterized in one of two ways. In the first, the device has a part that holds voice patterns, a part that designates voice types, and a part that selects the designated voices and transfers them to another medium; the designating part designates all or some of the types held in the pattern-holding part, the designated patterns are selected, and a number of patterns no greater than the number of all selected patterns is transferred to another medium. In the second, the device has a part that holds voice patterns, a part that designates voice types, a part that retrieves voices of the designated types and performs computation on them, and means for transferring the patterns obtained by the computation to another medium. It holds, for each voice type, one or more patterns that can serve as standard patterns, and one or more patterns equivalent to the unknown input patterns encountered at recognition time (input-equivalent patterns). Voices of the designated types are selected from the voice holding part, a recognition computation is performed against the input-equivalent patterns, and the combination of candidate standard patterns for which the number of recognition errors is at or near the minimum is selected and transferred to another medium. The invention is described below on the basis of embodiments.

FIG. 1 is a configuration diagram for explaining one embodiment of the present invention. In the figure, 1 is a pattern storage section, 2 is a calculation section, 3 is a CRT, 4 is a keyboard, and 5 is a floppy disk. In this embodiment, several patterns for dictionary creation (standard patterns) are held in the pattern storage section 1, and a list of their contents can be viewed on the CRT display 3.

The names of the required items are then entered from the keyboard 4, so that the corresponding patterns are retrieved from the storage section 1 and output to another medium, here the floppy disk 5, to create a dictionary for unspecified speakers. The dictionary output to the floppy disk 5 is loaded into a recognition device and used. In this way, a dictionary for unspecified speakers can be created by a simple method. For word recognition, however, it is desirable for the dictionary patterns to differ depending on the word group in use. For example, when the word "Ricoh" (riko) is included in the word group "one piece", "two pieces" (niko), "three pieces", ..., "niko" and "riko" are similar and easily confused, so the difference between /n/ and /r/ must be emphasized to distinguish them. When "Ricoh" is included in the group "Japanese", "Science" (rika), "Social Studies", ..., "rika" and "riko" must be distinguished, that is, the difference between /a/ and /o/ must be emphasized. It is therefore clear that the recognition rate is higher when different patterns are used for the same word "Ricoh" in the former and the latter word groups.

As a countermeasure, it is desirable to hold several patterns of different types for each category (for example, each word; words are used in the explanation below), to supply the required word group, and, for that word group, to pick from the several patterns of each word the combination that yields the fewest errors, making it the best dictionary. The embodiment shown in FIG. 2 does this.

In the embodiment shown in FIG. 2, 6 is an evaluation pattern storage section, and 1 to 5 function in the same way as 1 to 5 in the embodiment shown in FIG. 1. This embodiment has a part that holds voice patterns, a part that designates voice types, a part that retrieves voices of the designated types and performs computation on them, and means for transferring the patterns obtained by the computation to another medium. It holds, for each voice type, one or more patterns that can serve as standard patterns, and one or more patterns equivalent to the unknown input patterns encountered at recognition time. Voices of the designated types are selected from the voice holding part, a recognition computation is performed against the input-equivalent patterns, and the combination of candidate standard patterns for which the number of recognition errors is at or near the minimum is selected and transferred to another medium. In the embodiment of FIG. 2 there are thus a section 1 that stores patterns for dictionary creation and an evaluation pattern storage section 6 that stores the patterns used to evaluate them, and each holds several patterns per word for many words. As an example, consider using the method published as a recognition method based on fuzzy theory (Symposium on Information Theory and Its Applications, November 19-21, 1987, JC3-3-1). In the method described in JC3-3-1, recognition first obtains, by pattern matching, the similarity between each registered standard pattern and the unknown input pattern, and classifies the input into the category showing the greatest similarity. The notation is as follows.

Let the set of registered words be $I = \{i_1, i_2, \ldots, i_n\}$, and let the set of their word patterns be $X = \{x_{i_1}, x_{i_2}, \ldots, x_{i_n}\}$. $I$ is a finite set and $X$ is an infinite set.

In addition, the set of membership functions expressing how closely a pattern resembles each word in $I$ is defined as $M = \{m_{i_1}, m_{i_2}, \ldots, m_{i_n}\}$. When an unknown voice pattern $y$ ($y \in X$) is input, the similarity $S_{jy}$ between each word $j$ and the unknown pattern is computed, and the word name $j$ satisfying

$$j = \max_{j \in I} (S_{jy}) \tag{1}$$

is output as the recognition result. The membership function of each word is registered inside the device, and the similarity to an unknown input is obtained from these membership functions.
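The decision rule of equation (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the similarity is passed in as a function because its formula is not given at this point, patterns are assumed to be flattened lists of equal length, and the names `recognize`, `overlap` and `dictionary` are hypothetical.

```python
from typing import Callable, Dict, Sequence

Pattern = Sequence[int]  # flattened pattern (assumed representation)


def recognize(y: Pattern,
              dictionary: Dict[str, Pattern],
              similarity: Callable[[Pattern, Pattern], float]) -> str:
    """Return the word name j whose standard pattern m_j maximizes S_jy."""
    return max(dictionary, key=lambda word: similarity(dictionary[word], y))


def overlap(m: Pattern, y: Pattern) -> float:
    """Toy similarity (an assumption): sum of m over the positions where y is 1."""
    return float(sum(mi * yi for mi, yi in zip(m, y)))


# Two made-up standard patterns and one binary input pattern.
dictionary = {"start": [3, 2, 0, 0], "stop": [0, 0, 2, 3]}
print(recognize([1, 1, 0, 0], dictionary, overlap))
```

When two words tie, `max` returns the first one in dictionary order; the source does not say how ties are resolved.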

The membership function m(x) must express resemblance to the word pattern x in the presence of the frequency variation and time variation that arise between speakers. In this example, the word pattern is the binarized result of a frequency analysis of the spoken word.

FIG. 3(a) shows the word pattern of the word "start", and FIG. 3(b) shows the membership function of the same word.

In an actual system, the similarity $S_{jy}$ between the membership function $m_j$, which serves as the standard pattern, and the unknown input pattern $y$ is given by a formula with auxiliary definitions [equations not reproduced in this text]. In those formulas, $\cdot$ is the ordinary product and $*$ is the logical AND between $y$ and $m_j$ taken at the $\alpha$-level cut, which is defined by a further formula [not reproduced in this text].

In many cases, $\alpha$ takes a value from 0 to 3.

As dictionary patterns, the word patterns are superimposed as shown in FIG. 3, whereas in the evaluation patterns each element is binary data, 1 or 0. The evaluation data need not be binary, of course: the sampled data itself may be stored and binarized at the time of the computation. Binarizing in advance, however, reduces the amount of data and avoids unnecessary computation. Dictionary patterns are labeled by word, and it is desirable to have several different patterns for each word. The evaluation patterns should likewise include several patterns per word and, if possible, be chosen so that the distributions of their pattern lengths and formant frequencies are close to those of the people who speak the target language. Take as an example the case where a dictionary is needed for the Kanto region. If many people utter many words and the number of speakers is plotted against the average length of each word, the distribution is expected to look like FIG. 4(a). Similarly, plotting the number of speakers against the formant frequency of a given vowel is expected to give FIG. 4(b). The evaluation patterns in the evaluation pattern storage section 6 of FIG. 2 are therefore assembled as a set having the same distributions as FIGS. 4(a) and 4(b). The number of speakers per word is not particularly restricted. There is no prescribed method for choosing the dictionary patterns in the dictionary pattern storage section 1; it is better to register a wide variety of patterns.

FIG. 5 shows an example of the contents of the dictionary pattern storage section 1. The section is divided into a directory section A and a pattern memory B. Directory A holds the registered word names, the number of patterns for each word, and the addresses at which each pattern is stored. In the pattern data memory, the pattern data is laid out according to the values given in the directory.
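One way to picture the directory and pattern memory layout is the sketch below. It assumes a fixed pattern size and a flat list standing in for the pattern memory; the class and field names (`PatternStore`, `DirectoryEntry`, `addresses`) are hypothetical, and the real layout in FIG. 5 may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DirectoryEntry:
    count: int            # number of patterns registered for the word
    addresses: List[int]  # start offset of each pattern in the pattern memory


class PatternStore:
    """Directory section (A) plus a flat pattern memory (B)."""

    def __init__(self, pattern_size: int) -> None:
        self.pattern_size = pattern_size  # elements per pattern (assumed fixed)
        self.directory: Dict[str, DirectoryEntry] = {}
        self.memory: List[int] = []

    def register(self, word: str, pattern: List[int]) -> None:
        if len(pattern) != self.pattern_size:
            raise ValueError("pattern has the wrong size")
        entry = self.directory.setdefault(word, DirectoryEntry(0, []))
        entry.addresses.append(len(self.memory))  # next free address
        entry.count += 1
        self.memory.extend(pattern)

    def patterns(self, word: str) -> List[List[int]]:
        """Look up every pattern of a word through the directory."""
        entry = self.directory[word]
        return [self.memory[a:a + self.pattern_size] for a in entry.addresses]


store = PatternStore(pattern_size=4)
store.register("0", [3, 1, 0, 0])
store.register("1", [0, 2, 2, 0])
store.register("0", [2, 2, 0, 0])
print(store.directory["0"])
print(store.patterns("0"))
```

Because lookups go through the directory, patterns of the same word do not have to be contiguous in memory.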

FIG. 6 shows the evaluation data, where C is the directory section and D is the evaluation data. The evaluation data is stored in the same way as the dictionary patterns, but in the recognition method described above, one element of a dictionary pattern is represented by more bits than one element of an evaluation pattern.

That is, the evaluation patterns are binarized, whereas a dictionary pattern is created by superimposing and adding many patterns equivalent to the evaluation patterns, so its data volume is larger.
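A small sketch shows why a dictionary element needs more bits than an evaluation element. It assumes that patterns are flattened lists of equal length and that superimposing means element-wise addition; time alignment of the utterances is ignored, and the helper names are hypothetical.

```python
from typing import List, Sequence


def superimpose(binary_patterns: Sequence[Sequence[int]]) -> List[int]:
    """Add equal-length binary patterns element by element."""
    length = len(binary_patterns[0])
    if any(len(p) != length for p in binary_patterns):
        raise ValueError("patterns must have equal length")
    return [sum(column) for column in zip(*binary_patterns)]


def bits_per_element(pattern: Sequence[int]) -> int:
    """Bits needed to store the largest element of the pattern."""
    return max(1, max(pattern).bit_length())


# Three made-up binary patterns of the same word.
utterances = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
]
dictionary_pattern = superimpose(utterances)
print(dictionary_pattern, bits_per_element(dictionary_pattern))
print(bits_per_element(utterances[0]))
```

With three superimposed utterances, each dictionary element ranges from 0 to 3 and needs two bits, while each binary evaluation element needs one.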

Next, a usage example is described. To create standard patterns for the ten digits, the digits 0 to 9 are first entered from the keyboard 4. Accordingly, the patterns for 0, 1, ..., 9 are extracted from the dictionary pattern storage section 1. If n patterns are registered for each word, 10 × n patterns are loaded into the RAM of the calculation section 2.

This amounts to having created n different ten-digit dictionaries. Next, the evaluation patterns for 0 to 9 are retrieved from the evaluation pattern storage section 6. If there are m patterns for each word, distributed as in FIGS. 4(a) and 4(b), they are loaded into the RAM of the calculation section 2, and recognition computation is performed using the n dictionaries loaded earlier and the m evaluation patterns. This computation may follow the flowchart shown in FIG. 7. First, recognition is performed using the first dictionary, and the resulting recognition rate is denoted Rs. This recognition rate may be the sum of the recognition rates of all evaluators. Next, the word index j is set to 1. The index i denotes the pattern number of each word in the dictionary, and it is initially set to 2. The pattern of word j is replaced with that of dictionary pattern i and recognition is performed again for all evaluators; the sum over all evaluators is taken as the recognition rate Rj. Rj is compared with Rs, and if Rj is larger, the pattern of word j is replaced with pattern i to form a new dictionary. If Rj > Rs does not hold, the pattern is not replaced, and recognition is tried again with the pattern for word j taken from the next dictionary. By repeating this for i from 1 to n and for j from 1 to the number of words, the best set of dictionary patterns for the ten words is selected from the large amount of dictionary data. The set of patterns selected in this way (a word may be given two or more patterns if necessary) is output to the floppy disk 5, completing the dictionary file (floppy disk 5). This floppy disk is inserted into the floppy drive of a speech recognition device such as the one described earlier, and loading the created dictionary into the device turns it into an unspecified-speaker recognition device for the ten digits. If frequently used words are stored in the dictionary pattern and evaluation pattern storage sections, for example the ten digits, "start", "stop", "end", "begin", "yes", "no", ..., "white", "black", "red", ..., and the zodiac signs (rat, ox, tiger, rabbit, ...), the required words can be selected to create any desired dictionary.
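The replacement loop described above can be sketched roughly as follows. Several points are assumptions rather than statements from the source: the similarity is a toy dot product standing in for the fuzzy similarity, the recognition rate is counted as the number of correctly recognized evaluation patterns, and the reference rate is updated after every accepted replacement. All function and variable names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Pattern = Sequence[int]
Similarity = Callable[[Pattern, Pattern], float]


def dot(m: Pattern, y: Pattern) -> float:
    """Toy similarity (an assumption, not the patent's formula)."""
    return float(sum(a * b for a, b in zip(m, y)))


def recognize(y: Pattern, dictionary: Dict[str, Pattern], sim: Similarity) -> str:
    return max(dictionary, key=lambda word: sim(dictionary[word], y))


def score(dictionary: Dict[str, Pattern],
          evaluation: List[Tuple[str, Pattern]], sim: Similarity) -> int:
    """Number of evaluation patterns recognized as their labeled word."""
    return sum(recognize(y, dictionary, sim) == word for word, y in evaluation)


def select_dictionary(candidates: Dict[str, List[Pattern]],
                      evaluation: List[Tuple[str, Pattern]],
                      sim: Similarity) -> Dict[str, Pattern]:
    # Start from the first candidate of every word (the "first dictionary").
    best = {word: pats[0] for word, pats in candidates.items()}
    best_score = score(best, evaluation, sim)              # Rs
    for word, pats in candidates.items():                  # j: each word
        for candidate in pats[1:]:                         # i: patterns 2..n
            trial = dict(best)
            trial[word] = candidate
            trial_score = score(trial, evaluation, sim)    # Rj
            if trial_score > best_score:                   # keep improvements only
                best, best_score = trial, trial_score
    return best


candidates = {"0": [[1, 0, 1], [2, 0, 0]], "1": [[0, 1, 1], [0, 2, 0]]}
evaluation = [("0", [1, 0, 1]), ("0", [1, 0, 0]),
              ("1", [0, 1, 1]), ("1", [0, 0, 1])]
best = select_dictionary(candidates, evaluation, dot)
print(best, score(best, evaluation, dot))
```

This is a greedy search: it examines each word's candidates once rather than every combination, so it finds a good dictionary without guaranteeing the global optimum.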

With this method, words are selected from the dictionary patterns stored in memory. If a word not stored there becomes necessary, a troublesome procedure such as asking the manufacturer to rewrite the ROM must be followed. The patterns are therefore read from an external storage medium, such as a floppy disk, and loaded into RAM.

FIG. 8 shows such an example.

In the embodiment shown in FIG. 8, a floppy disk 7 is provided in place of the dictionary pattern storage section 1 and the evaluation pattern storage section 6 of FIG. 2. The dictionary patterns and evaluation patterns are stored on it and retrieved by designating them from the keyboard 4, and, after the computation already described, the finished dictionary is output to the floppy disk 5. FIG. 9 shows this procedure. The way of selecting patterns by computing similarity may be the same as described above, but is not limited to it. For example, in FIG. 8, the dictionary patterns of the words entered from the keyboard 4 are retrieved from the floppy disk 7. If all n patterns are retrieved for each word, n sets of dictionaries result, and these may be output to the floppy disk 5 as they are. In that case, the recognition device can try the n dictionaries in turn and use whichever proves most suitable. With a device such as that of FIG. 8, an unlimited number of words can be handled by exchanging the floppy disk 7.

This scheme is aimed at creating dictionaries for unspecified speakers, but if an unspecified-speaker device could also be used as a specific-speaker device, the user could conveniently register any speech of his or her choosing. FIGS. 10 and 11 show a configuration in which the user's voice is added to the dictionary for unspecified speakers. The recognition method shown is the one using the binary TSP described earlier, but the invention is not limited to it.

First, the user utters the required words into the microphone 8 the required number of times. Each utterance is amplified by the microphone amplifier 9, binarized through the filter bank 10 and the binarization section 11, and superimposed in the calculation section 2 to create a standard pattern. This is stored in the dictionary pattern storage section 12 together with the other patterns. Because dictionary patterns can thus be created for a specific speaker as well as for unspecified speakers, an ordinary unspecified-speaker speech recognition device can be used as a specific-speaker recognition device, and arbitrary words can be registered with a higher correct-answer rate than with the unspecified-speaker method.

In the above description, data already in the form of dictionary patterns is stored in memory, but this is not essential; the raw data may instead be fed to the calculation section and processed statistically.

Such a device also need not be owned by individuals. It could be installed for commercial use in the form of a vending machine: the user inserts his or her own recording medium, such as a floppy disk, into the designated slot, enters words from the designated keyboard, inserts money, and has the required dictionary patterns transferred to the floppy disk.

[Effect]
As is clear from the above description, according to the present invention, a dictionary with a high recognition rate for unspecified speakers can be obtained easily.

[Brief explanation of the drawing]

FIGS. 1 and 2 are configuration diagrams each for explaining the present invention; FIG. 3 shows an example of the structure of a membership function; FIG. 4 shows the distribution of the average length of each word against the number of speakers; FIG. 5 shows an example of registered standard patterns; FIG. 6 shows an example of registered evaluation patterns; FIG. 7 shows a flowchart of the recognition computation; FIG. 8 is a diagram for explaining another embodiment of the present invention; FIG. 9 shows the operating procedure of the embodiment shown in FIG. 8; and FIGS. 10 and 11 show an example of a recognition method for unspecified speakers.

1: pattern storage section, 2: calculation section, 3: CRT, 4: keyboard, 5: floppy disk, 6: evaluation pattern storage section, 7: floppy disk, 8: microphone, 9: microphone amplifier, 10: filter bank, 11: binarization section, 12: dictionary pattern storage section.

Claims

[Scope of Claims]
1. A dictionary pattern creation device comprising a part that holds voice patterns, a part that designates voice types, and a part that selects the designated voices and transfers them to another medium, characterized in that the part that designates voice types designates all or some of the types held in the part that holds voice patterns, the designated patterns are selected, and a number of patterns no greater than the number of all selected patterns is transferred to another medium.
2. A dictionary pattern creation device comprising a part that holds voice patterns, a part that designates voice types, a part that retrieves voices of the designated types and performs computation on them, and means for transferring the patterns obtained by the computation to another medium, characterized in that the device holds, for each voice type, one or more patterns that can serve as standard patterns and one or more patterns equivalent to the unknown input patterns encountered at recognition time, selects voices of the designated types from the voice holding part, performs a recognition computation against the input-equivalent patterns, selects the combination of candidate standard patterns for which the number of recognition errors is at or near the minimum, and transfers it to another medium.