JPH046600A

JPH046600A - Voice recognition device

Info

Publication number: JPH046600A
Application number: JP2108025A
Authority: JP
Inventors: Yasunaga Miyazawa; 宮沢　康永
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 1990-04-24
Filing date: 1990-04-24
Publication date: 1992-01-10

Abstract

PURPOSE: To increase the recognition rate by incorporating a voice input part, a feature extraction part, a recognition and decision part, and a recognition result transmission part in a wristwatch. CONSTITUTION: The voice input part consists of a microphone, a high-frequency emphasis filter, and an AD converter, and samples the voice at 8 kHz with 12-bit resolution. The feature extraction part converts the digitized voice signal into the frequency domain and extracts feature parameters from the resulting voice spectrum. The recognition and decision part compares these feature parameters with the frequency-domain feature parameters of the speaker's phonemes, which are learned in advance, to decide which phoneme was voiced, thereby coding the voice. The voice input part, feature extraction part, recognition and decision part, and recognition result transmission part are incorporated in the wristwatch, and a recognition result reception part and a recognition result processing part are incorporated in a system main body. Consequently, the recognition rate is increased.

Description

[Detailed Description of the Invention] [Field of Industrial Application] The present invention relates to a speech recognition device using a wristwatch.

[Prior Art] In the prior art, speaker-independent speech recognition devices composed of a speech input section, a feature extraction section, a recognition determination section, and a recognition result processing section were known, such as the conversational speech recognition system described in the materials of the Pattern Study Group of the Institute of Electronics and Communication Engineers, PRL79-61.

[Problems to Be Solved by the Invention and Object] However, in the prior art, a device that must recognize the speech of unspecified speakers needs very complex feature extraction and recognition determination sections in order to perform speaker-independent recognition. As a result, the hardware required for real-time processing becomes large and expensive, and the recognition rate is still low.

The recognition rate of existing speaker-independent speech recognition devices is about 95% at best, so misrecognition inevitably occurs.

This is because human speech waveforms differ greatly from person to person.

The present invention solves these problems. Its object is, in a device that must recognize the speech of unspecified speakers, to raise the recognition rate and reduce the size of the speech recognition section by having each person use his or her own wristwatch, which performs speaker-dependent speech recognition.

[Means for Solving the Problems] The invention is a speech recognition device that receives the speech of a specific speaker, operates a machine according to the input speech, and performs the processing the speaker intends. It comprises: a speech input section that receives speech and converts the speech signal into a digital signal; a feature extraction section that receives the signal from the speech input section and extracts its feature parameters; a speech recognition section that receives the signal from the feature extraction section and recognizes it as speech codes such as phonemes; a recognition result transmission section that receives the signal from the speech recognition section and transmits it; a recognition result receiving section that receives the signal from the recognition result transmission section; and a recognition result processing section that receives the signal from the recognition result receiving section, analyzes its meaning, and takes a predetermined action. The device is characterized in that the speech input section, the feature extraction section, the recognition determination section, and the recognition result transmission section are built into a wristwatch.

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a system configuration diagram of the speech recognition device of the present invention.

As shown in FIG. 1, the speech recognition device of the present invention consists of a speech input section, a feature extraction section, a recognition determination section, a recognition result transmission section, a recognition result receiving section, and a recognition result processing section. The speech input section, feature extraction section, recognition determination section, and recognition result transmission section are built into the wristwatch, while the recognition result receiving section and recognition result processing section are built into the system main body. The speech recognition performed inside the wristwatch is speaker-dependent recognition: it recognizes only the voice of the wristwatch's owner.

Each section is described below. The speech input section consists of a microphone, a high-frequency emphasis filter, and an AD converter, and samples the speech at 8 kHz with 12 bits. In the feature extraction section, the digitized speech signal is converted into the frequency domain, and frequency-domain feature parameters of the speech are extracted from the resulting speech spectrum. In the recognition determination section, the extracted feature parameters are compared with the frequency-domain feature parameters of the phonemes of the speaker (that is, the owner of the wristwatch), which have been learned in advance. This comparison determines which phoneme was uttered, and the speech is encoded accordingly.
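The front end described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the circuit of the patent: the description places the high-frequency emphasis filter ahead of the AD converter but gives no filter design, so the first-order filter and its coefficient `alpha` are assumptions, as are the function names and the use of a naive DFT for the frequency conversion. Only the 8 kHz rate and the 12-bit word length come from the text.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000  # sampling rate stated in the description
BITS = 12              # word length stated in the description


def pre_emphasis(samples, alpha=0.95):
    """High-frequency emphasis y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


def quantize_12bit(x):
    """Model the AD converter: clip to [-1, 1] and map to a signed 12-bit code."""
    full_scale = 2 ** (BITS - 1) - 1  # 2047
    x = max(-1.0, min(1.0, x))
    return round(x * full_scale)


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0 .. N/2 (frequency conversion step)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return spectrum


def peak_frequency_hz(spectrum, frame_len, sample_rate_hz=SAMPLE_RATE_HZ):
    """Frequency of the strongest bin, a stand-in for a real feature parameter."""
    k = max(range(len(spectrum)), key=lambda i: spectrum[i])
    return k * sample_rate_hz / frame_len
```

A 10 ms frame at 8 kHz holds 80 samples, which gives a bin spacing of 100 Hz. A practical device would more likely use an FFT or a filter bank and a smoothed spectral envelope; the naive DFT is used here only to keep the sketch dependency-free.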

FIG. 3(a) shows the original waveform 1 of the vowel /a/ uttered by a man. Here, 2 is the time axis, and one scale division is 10 ms. FIG. 3(b) shows the speech spectrum envelope 3 obtained by frequency conversion of that speech signal. Here, the horizontal axis 4 is the frequency axis, with one scale division equal to 1 kHz, and the vertical axis 5 is the amplitude (dB). The characteristics of the speech spectrum envelope 3 shown here are highly reproducible for each individual. Speaker-dependent recognition can therefore be performed by storing the frequency characteristics of each of the individual's phonemes in advance, as learned pattern data, in the memory of the speech recognition section. At recognition time, the uttered phoneme is recognized by selecting the learned pattern data that matches the frequency characteristics of the uttered speech signal. Because speech recognition in the present invention can thus be achieved with a very simple algorithm, the hardware for real-time processing also becomes very simple, and the device as a whole can be made small. Furthermore, with speaker-dependent recognition, a recognition rate close to 100% can be obtained even with a simple device such as that of the present invention.
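The selection of the learned pattern that matches the input can be illustrated as a nearest-template search. The description only says that the matching pattern is selected; the Euclidean distance, the function names, and the template layout (a dictionary from phoneme label to a list of envelope values in dB) are assumptions made for this sketch.

```python
import math


def spectral_distance(a, b):
    """Euclidean distance between two equal-length spectral envelopes (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("envelopes must have the same number of points")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize_phoneme(features, templates):
    """Return the label of the stored template closest to the input features.

    `templates` maps a phoneme label to the envelope learned in advance
    for the owner of the wristwatch.
    """
    if not templates:
        raise ValueError("no learned pattern data")
    return min(templates, key=lambda label: spectral_distance(features, templates[label]))
```

With hypothetical templates such as `{"a": [30, 42, 38, 20], "i": [40, 18, 22, 35]}`, an input of `[31, 40, 37, 21]` is closer to the stored /a/ envelope and is labelled `"a"`. The values are invented for illustration and are not taken from FIG. 3.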

The speech code string obtained by this real-time processing is transmitted by radio, as it is produced, from the recognition result transmission section inside the wristwatch and is received by the recognition result receiving section of the system main body. The received speech code string is passed to the recognition result processing section, which performs whatever processing each system requires.
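The patent does not specify how the speech code string is framed for transmission. As one hypothetical possibility, the sketch below packs each phoneme into a one-byte code and wraps the string in a length byte and a simple checksum so that the receiving side can reject a damaged frame. The code table, the frame layout, and the function names are all assumptions for illustration.

```python
# Hypothetical one-byte code table; the patent does not define the actual codes.
PHONEME_CODES = {"a": 1, "i": 2, "u": 3, "e": 4, "o": 5, "k": 6, "t": 7}
CODE_TO_PHONEME = {code: label for label, code in PHONEME_CODES.items()}


def encode_frame(phonemes):
    """Wristwatch side: [length][codes...][checksum], all single bytes (assumed layout)."""
    body = bytes(PHONEME_CODES[p] for p in phonemes)
    if len(body) > 255:
        raise ValueError("frame too long for a one-byte length field")
    checksum = sum(body) % 256
    return bytes([len(body)]) + body + bytes([checksum])


def decode_frame(frame):
    """System side: validate the frame and recover the phoneme string."""
    if len(frame) < 2:
        raise ValueError("frame too short")
    length = frame[0]
    body = frame[1:1 + length]
    if len(frame) != length + 2 or sum(body) % 256 != frame[-1]:
        raise ValueError("corrupted frame")
    return [CODE_TO_PHONEME[code] for code in body]
```

For example, `encode_frame(["t", "o", "k", "i", "o"])` yields a 7-byte frame, and `decode_frame` on the system side returns the same five labels. A real radio link would need stronger error detection than a modulo-256 sum.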

As an example, an application of the speech recognition device of the present invention to a ticket vending machine is described. The speaker speaks toward his or her own wristwatch and states the destination station, the train time, the number of tickets, whether a reserved seat is wanted, and so on. The system main body receives this information, reserves seats on the train, calculates the fare, and performs other such processing, so that tickets are sold quickly and accurately.

Because the speech recognition section is built into the wristwatch, the speaker can specify the desired ticket while checking the current time on the same wristwatch, which makes ticket purchases quicker and more accurate.

Furthermore, adding to the wristwatch a means of storing the times of the trains the user needs makes ticket purchases more efficient still.

This speech recognition device can also be applied to vending machines for juice, cigarettes, and the like. Furthermore, if the signals transmitted from wristwatches are standardized, every such machine can be operated from each person's own wristwatch. In other words, any product could be purchased anywhere by voice using a wristwatch.

Furthermore, by adding a structure that couples the signal from the recognition result transmission section of the wristwatch to a telephone line as sound waves or light, a very large amount of information can be exchanged by voice over the telephone line using this simple speech recognition device.

[Effects of the Invention] As described above, the speech recognition device of the present invention has the speech input section, feature extraction section, recognition determination section, and recognition result transmission section built into a wristwatch. A device that must recognize the speech of unspecified speakers can therefore do its processing with speaker-dependent recognition, which has the effect of making the recognition rate very high.

In particular, when the speech recognition device of the present invention is applied to processing that is carried out while checking the current time, it has the effect of allowing that processing to be done quickly and accurately.

Furthermore, because the signal sent from the wristwatch is a speech code, the signal is effectively compressed, which in turn simplifies the communication method and allows the device to be made smaller.
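The size of this compression effect can be estimated with simple arithmetic. The raw rate follows directly from the 8 kHz, 12-bit sampling stated in the description; the speech code rate depends on figures the patent does not give, so the 10 phonemes per second and 8 bits per code used below are assumptions chosen only to show the order of magnitude.

```python
def raw_bit_rate(sample_rate_hz=8000, bits_per_sample=12):
    """Bit rate of the digitized speech before recognition (values from the description)."""
    return sample_rate_hz * bits_per_sample


def code_bit_rate(phonemes_per_second=10, bits_per_code=8):
    """Bit rate of the speech code string (both defaults are assumed values)."""
    return phonemes_per_second * bits_per_code


def compression_ratio():
    """How many times smaller the coded stream is under the assumptions above."""
    return raw_bit_rate() / code_bit_rate()
```

Under these assumptions the raw signal needs 96,000 bit/s while the code string needs 80 bit/s, a reduction by a factor of 1,200. Different speaking rates or code sizes change the factor but not the conclusion that the transmitter has far less data to send.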

[Brief Description of the Drawings]

FIG. 1 is a system configuration diagram of the speech recognition device of the present invention. FIG. 2 is a system configuration diagram of a conventional speech recognition device. FIG. 3(a) is a diagram of the original waveform of the vowel /a/ uttered by a man, and FIG. 3(b) is a diagram of the speech spectrum envelope obtained by frequency conversion of the speech signal of the vowel /a/. Applicant: Seiko Epson Corporation. Agent: Patent Attorney 鈴木喜三部 (and one other).

Claims

[Scope of Claims] (a) A speech recognition device that receives the speech of a specific speaker, operates a machine according to the input speech, and performs the processing intended by the speaker, comprising: (b) a speech input section that receives speech and converts the speech signal into a digital signal; (c) a feature extraction section that receives the signal from the speech input section and extracts its feature parameters; (d) a speech recognition section that receives the signal from the feature extraction section and recognizes the signal as speech codes such as phonemes; (e) a recognition result transmission section that receives the signal from the speech recognition section and transmits the signal; (f) a recognition result receiving section that receives the signal from the recognition result transmission section; and (g) a recognition result processing section that receives the signal from the recognition result receiving section, analyzes its meaning, and takes a predetermined action; (h) wherein the speech input section, the feature extraction section, the recognition determination section, and the recognition result transmission section are built into a wristwatch.