JPS60130798A

JPS60130798A - Voice identifier

Info

Publication number: JPS60130798A
Application number: JP58239186A
Authority: JP
Inventors: 英行高木; 中嶋　章子
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1983-12-19
Filing date: 1983-12-19
Publication date: 1985-07-12

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

Detailed Description of the Invention

Field of Industrial Application

The present invention relates to speech identification, and more particularly to a speech identification device that changes its output code according to differences in the meaning of the recognized speech.

Structure of the Conventional Example and Its Problems

In recent years, voice input has attracted attention as an efficient method of entering text into word processors and similar equipment.

As shown in FIG. 1, a conventional speech identification device consists of a speech recognition unit and a speech standard pattern dictionary, and outputs a code corresponding to the recognized speech, for example a hiragana character string.

Text contains many characters with the same reading, that is, homophones, and the speech recognition unit treats these as exactly the same. A conventional speech identification device therefore could not distinguish between their linguistic meanings.

Object of the Invention

The present invention solves the above problem of the prior art, and its object is to provide a speech identification device that changes its output code according to differences in linguistic meaning.

Structure of the Invention

The present invention is a speech identification device equipped with a linguistic meaning information dictionary for distinguishing recognized speech from the viewpoint of linguistic meaning, and a meaning flag that holds the linguistic meaning of the input speech. By referring to the meaning flag, which holds the linguistic meaning of the speech input immediately before, the burden of processing homophones can be reduced.

Description of the Embodiment

FIG. 2 is a block diagram of a speech identification device according to one embodiment of the present invention. In FIG. 2, reference numeral 11 denotes a speech feature extraction unit that extracts feature parameters from the input speech and outputs them to the speech standard pattern determination unit. Reference numeral 12 denotes the speech standard pattern determination unit, which takes the feature parameters output from speech feature extraction unit 11, determines the speech by searching the speech standard pattern dictionary, and outputs the result, a syllable code string, to the linguistic meaning information determination unit. Reference numeral 13 denotes the linguistic meaning information determination unit, which takes the syllable code string output from speech standard pattern determination unit 12, attaches meaning information to it by referring to the meaning flag (which holds the linguistic meaning of the speech input immediately before) and the linguistic meaning information dictionary, stores the newly determined linguistic meaning in the meaning flag, and outputs the recognition code corresponding to the input speech.

Reference numeral 1 denotes a speech recognition unit that consists of speech feature extraction unit 11, speech standard pattern determination unit 12, and linguistic meaning information determination unit 13, and that recognizes the input speech and outputs a recognition code. Reference numeral 2 denotes a speech standard pattern dictionary containing the standard speech patterns used when speech standard pattern determination unit 12 determines the speech. Reference numeral 3 denotes a linguistic meaning information dictionary used by linguistic meaning information determination unit 13 to attach meaning information to the input syllable code string. Reference numeral 4 denotes a meaning flag that holds the linguistic meaning of the syllable code string of the input speech, so that linguistic meaning information determination unit 13 can select the most suitable linguistic meaning from among the several meanings that a syllable code string may have.
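The block structure of FIG. 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the circuitry of the patent: the class name, the method names, the two-element feature vectors, and the nearest-template matching rule are all hypothetical, and syllable codes are written in romaji (for example "kyuu" for /キュー/). Feature extraction is reduced to a placeholder because the patent does not specify the feature parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpeechRecognitionUnit:
    """Hypothetical model of unit 1 in FIG. 2 (units 11, 12, 13 as methods)."""
    pattern_dict: dict                  # dictionary 2: syllable code -> template features
    meaning_dict: dict                  # dictionary 3: syllable code -> [(meaning, output code)]
    meaning_flag: Optional[str] = None  # flag 4: meaning of the previous input

    def extract_features(self, signal):
        # Unit 11 (placeholder): use the raw samples as the feature vector.
        return tuple(signal)

    def match_pattern(self, features):
        # Unit 12 (assumed rule): pick the syllable code whose template is
        # closest to the features in squared Euclidean distance.
        def dist(template):
            return sum((a - b) ** 2 for a, b in zip(features, template))
        return min(self.pattern_dict, key=lambda code: dist(self.pattern_dict[code]))

    def judge_meaning(self, syllable):
        # Unit 13: prefer the candidate whose meaning equals the flag;
        # falling back to the first candidate is an assumption.
        candidates = self.meaning_dict[syllable]
        meaning, code = next(
            (c for c in candidates if c[0] == self.meaning_flag), candidates[0]
        )
        self.meaning_flag = meaning  # store the newly determined meaning
        return code

    def recognize(self, signal):
        return self.judge_meaning(self.match_pattern(self.extract_features(signal)))


def make_unit():
    # Toy dictionaries with made-up templates, for illustration only.
    return SpeechRecognitionUnit(
        pattern_dict={"ichi": (1.0, 0.0), "kyuu": (0.0, 1.0)},
        meaning_dict={
            "ichi": [("number", "1")],
            "kyuu": [("alphabet", "Q"), ("number", "9")],
        },
    )
```

Calling `recognize` twice on a unit from `make_unit()`, first with a signal near the "ichi" template and then with one near the "kyuu" template, passes the meaning "number" from the first call to the second through `meaning_flag`.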

The operation of the speech identification device of this embodiment, configured as described above, is explained below.

Consider the case where the sentence 「REQUESTボタンを19回押す」 ("Press the REQUEST button 19 times") is entered by voice as /aaru/ii/kyuu/yuu/ii/esu/tii/bo/ta/n/o/ichi/kyuu/ka/i/o/su/. When /aaru/ is input to speech recognition unit 1, reference to linguistic meaning information dictionary 3 shows that it has the linguistic meaning "alphabet".

Speech recognition unit 1 therefore outputs the character "R" and stores the meaning information "alphabet" in meaning flag 4. Likewise, when /ii/ is input, it outputs the character "E" and stores the meaning information "alphabet" in meaning flag 4.

Next, when /kyuu/ is input, reference to linguistic meaning information dictionary 3 shows that it has two linguistic meanings, alphabet and number, so the output code cannot be determined from the dictionary alone. Meaning flag 4 is then consulted: because the speech input immediately before was a letter of the alphabet, the current /kyuu/ is also judged to be a letter, and the character "Q" is output.

Similarly, for the /kyuu/ in "... 19 times ...", meaning flag 4 shows that the immediately preceding input /ichi/ carried the linguistic meaning "number", so the character "9" is output.
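The walk-through above can be traced with a short Python sketch. The dictionary contents, the romaji syllable codes (for example "kyuu" for /キュー/), the "kana" meaning group, and the fallback to the first candidate when the flag matches nothing are assumptions made for illustration; the patent only describes the alphabet and number cases.

```python
# Hypothetical contents of linguistic meaning information dictionary 3:
# syllable code -> list of (linguistic meaning, output character).
MEANING_DICT = {
    "aaru": [("alphabet", "R")],
    "ii":   [("alphabet", "E")],
    "kyuu": [("alphabet", "Q"), ("number", "9")],  # homophone
    "yuu":  [("alphabet", "U")],
    "esu":  [("alphabet", "S")],
    "tii":  [("alphabet", "T")],
    "ichi": [("number", "1")],
    # Assumed "kana" group for the remaining syllables of the sentence.
    "bo": [("kana", "ぼ")], "ta": [("kana", "た")], "n": [("kana", "ん")],
    "o":  [("kana", "お")], "ka": [("kana", "か")], "i": [("kana", "い")],
    "su": [("kana", "す")],
}


def identify(syllables):
    """Convert syllable codes to characters, using a meaning flag for homophones."""
    flag = None  # meaning flag 4: meaning of the immediately preceding input
    output = []
    for syllable in syllables:
        candidates = MEANING_DICT[syllable]
        # Prefer the candidate whose meaning matches the flag; otherwise
        # take the first candidate (an assumed tie-break rule).
        meaning, char = next((c for c in candidates if c[0] == flag), candidates[0])
        flag = meaning  # store the newly determined meaning in the flag
        output.append(char)
    return "".join(output)


sentence = ("aaru ii kyuu yuu ii esu tii bo ta n o "
            "ichi kyuu ka i o su").split()
print(identify(sentence))  # REQUESTぼたんお19かいおす
```

The first /kyuu/ follows "E" and becomes "Q"; the second follows "1" and becomes "9", which matches the two cases described in the text.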

It is known from statistical findings on natural language that, in text, letters of the alphabet and digits each occur in consecutive runs more often than they occur in isolation at random. The principle of the present invention is based on this fact.

As described above, according to this embodiment, retaining the linguistic meaning information of the speech input immediately before makes it easy to distinguish the homophones "Q" and "9".

Although the above embodiment took the distinction between the alphabet and numbers as an example, it goes without saying that many other semantic groupings are possible from a linguistic standpoint. In particular, choosing a grouping suited to the content of the text to be entered by voice can further reduce the burden of processing homophones.

The above embodiment centered on single-syllable input and dealt with digits and letters entered as consecutive syllables, but because the essence of the invention lies in dividing the dictionary into linguistic meaning groups, the invention is also effective for continuous speech input. For example, if 「政府」 ("government") is assigned to a commercial group and 「正負」 ("positive/negative") to a technical group, then whether the input text is a commercial document or a technical document, the burden on the speaker of resolving the homophones of /seifu/ each time can be expected to be reduced.
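The grouping by document type can be sketched in the same way. In this hypothetical Python example the group names, the dictionary layout, and the idea of returning all candidates ranked with the flagged group first are assumptions; the patent states only that such a grouping is expected to reduce the speaker's effort in resolving homophones.

```python
# Hypothetical word-level meaning dictionary: reading -> [(group, word)].
WORD_GROUPS = {
    "seifu": [("commercial", "政府"), ("technical", "正負")],
}


def rank_candidates(reading, flag):
    """Return the homophones for a reading, with the flagged group first.

    sorted() is stable, so candidates outside the flagged group keep
    their dictionary order.
    """
    candidates = WORD_GROUPS[reading]
    ranked = sorted(candidates, key=lambda c: c[0] != flag)
    return [word for _, word in ranked]


print(rank_candidates("seifu", "technical"))   # ['正負', '政府']
print(rank_candidates("seifu", "commercial"))  # ['政府', '正負']
```

With the flag held at the group of the document being dictated, the first candidate is the one the speaker most likely intended, so fewer manual corrections should be needed.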

Also, in this embodiment speech standard pattern dictionary 2 and linguistic meaning information dictionary 3 were described as separate, but if a linguistic meaning information dictionary is built in which the speech standard patterns are divided from the viewpoint of linguistic meaning, there is naturally no need for the speech standard pattern dictionary to exist on its own.

Effects of the Invention

By referring to a flag that holds the linguistic meaning of the input speech, the present invention makes it easy to process homophones according to their context. The accuracy rate within that context improves, the number of processing steps decreases, and higher speed and easier operation can be achieved.

[Brief explanation of the drawings]

FIG. 1 is a block diagram of a conventional speech identification device, and FIG. 2 is a block diagram of an embodiment of the present invention. 1: speech recognition unit; 2: speech standard pattern dictionary; 3: linguistic meaning information dictionary; 4: meaning flag; 11: speech feature extraction unit; 12: speech standard pattern determination unit; 13: linguistic meaning information determination unit.

Claims

(1) A speech identification device characterized by comprising speech recognition means, a linguistic meaning information dictionary for distinguishing recognized speech from the viewpoint of linguistic meaning, and a meaning flag for retaining linguistic meaning.

(2) The speech identification device according to claim 1, wherein the speech recognition means includes a speech feature extraction unit, a speech standard pattern determination unit, and a linguistic meaning information determination unit.