JP2006039342A

JP2006039342A - Voice memo printer

Info

Publication number: JP2006039342A
Application number: JP2004221295A
Authority: JP
Inventors: Yoshihiko Ikeda; 喜彦池田; Naoki Sekine; 直樹関根; Nobuo Watanabe; 伸夫渡辺; Shunji Saito; 俊次齊藤; Masanori Takeuchi; 雅則竹内; Junko Watanabe; 順子渡辺; Ekigen Yana; 益源梁; Wataru Sakurai; 渉櫻井
Original assignee: Toshiba TEC Corp
Current assignee: Toshiba TEC Corp
Priority date: 2004-07-29
Filing date: 2004-07-29
Publication date: 2006-02-09
Anticipated expiration: 2024-07-29
Also published as: JP4544933B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice memo printer which is able to distinguish words of different meanings from each other and print them even if their pronunciations are the same.

SOLUTION: The voice memo printer judges uttered words based on a voice pitch extracted by a voice pitch extraction means 58c, a voice recognition result outputted by a voice recognition means 58b, and time-sequential pitch data of the uttered words stored in a pitch dictionary 61. Thus, the voice memo printer is able to distinguish words of different meanings and print them even if their pronunciations are the same, for example, like "rain" and "candy" which are both uttered "ame" in Japanese.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

本発明は、音声認識機能を搭載し、音声認識された発声文言を印字する音声メモプリンタに関する。 The present invention relates to a voice memo printer that is equipped with a voice recognition function and prints out voiced words that have been voice-recognized.

Voice recognition technology, which analyzes digital voice data generated from voice input through a microphone or the like and converts human speech into text, is used in personal computers and similar devices and has begun to spread as an alternative to manual keyboard input.

一方、マイクなどから入力された音声を認識して用紙へ直接印字するものが知られている（例えば、特許文献１，２参照）。 On the other hand, there is known one that recognizes voice input from a microphone or the like and prints directly on a sheet (for example, see Patent Documents 1 and 2).

Patent Document 1: Japanese Patent Laid-Open No. 5-35428
Patent Document 2: Japanese Patent Laid-Open No. 8-2015

When a human voice is printed directly, as in Patent Documents 1 and 2 described above, differences in the intonation of the uttered words are not distinguished, and words that sound alike are treated as the same word.

In Japanese, however, there are words that share the same pronunciation but differ in intonation, such as “rain” (雨) and “candy” (飴), or “bridge” (橋) and “chopsticks” (箸). In other words, a difference in intonation marks a difference in meaning.

また、発声文言についてのイントネーションや発音のチェックをしたいという要望もある。 There is also a desire to check the intonation and pronunciation of utterances.

An object of the present invention is to distinguish and print words that have different meanings even though they share the same pronunciation.

Another object of the present invention is to let a speaker recognize whether the intonation and pronunciation of the uttered words are acceptable.

The present invention comprises: a printing unit that prints predetermined items; a microphone that inputs voice as uttered words; A/D conversion means that converts analog voice data input from the microphone into digital voice data; frequency analysis means that frequency-converts and analyzes the digital voice data converted by the A/D conversion means; voice recognition means that outputs a voice recognition result based on the frequencies analyzed by the frequency analysis means; voice pitch extraction means that extracts a voice pitch from the digital voice data converted by the A/D conversion means; a pitch dictionary that stores, in association with each spoken word, time-series pitch data of that spoken word; determination means that determines the uttered words based on the voice pitch extracted by the voice pitch extraction means, the voice recognition result output by the voice recognition means, and the time-series pitch data of the spoken words stored in the pitch dictionary; and printing means that prints the uttered words determined by the determination means on the printing unit.

In another aspect, the present invention comprises: a printing unit that prints predetermined items; a microphone that inputs voice as uttered words; A/D conversion means that converts analog voice data input from the microphone into digital voice data; frequency analysis means that frequency-converts and analyzes the digital voice data converted by the A/D conversion means; voice recognition means that outputs a voice recognition result based on the frequencies analyzed by the frequency analysis means; voice pitch extraction means that extracts a voice pitch from the digital voice data converted by the A/D conversion means; a pitch dictionary that stores, in association with each spoken word, time-series pitch data of that spoken word; determination means that determines whether the uttered words were uttered with the correct intonation, based on the voice pitch extracted by the voice pitch extraction means, the voice recognition result output by the voice recognition means, and the time-series pitch data of the spoken words stored in the pitch dictionary; and printing means that prints the determination result of the determination means on the printing unit.

In a further aspect, the present invention comprises: a printing unit that prints predetermined items; a microphone that inputs voice as uttered words; A/D conversion means that converts analog voice data input from the microphone into digital voice data; frequency analysis means that frequency-converts and analyzes the digital voice data converted by the A/D conversion means; voice recognition means that outputs a voice recognition result based on the frequencies analyzed by the frequency analysis means; voice pitch extraction means that extracts a voice pitch from the digital voice data converted by the A/D conversion means; a pitch dictionary that stores, in association with each spoken word, time-series pitch data of that spoken word; determination means that determines whether the uttered words were uttered with the correct intonation, based on the voice pitch extracted by the voice pitch extraction means, the voice recognition result output by the voice recognition means, and the time-series pitch data of the spoken words stored in the pitch dictionary; and printing means that prints, on the printing unit, the voice recognition result output by the language feature extraction means only when the determination means determines that the uttered words were uttered with the correct intonation.

According to the present invention, differences in intonation are distinguished even between words with the same pronunciation, and the uttered word is determined and printed accordingly. Words that share a pronunciation but differ in meaning, such as “rain” and “candy”, both uttered as “ame”, can therefore be distinguished and printed.

According to the present invention, printing the intonation determination result for the uttered words (pass/fail, score, degree of deviation, and so on) lets the speaker recognize whether the intonation and pronunciation of the uttered words are acceptable.

According to the present invention, printing when the intonation determination result for the uttered words is a pass, and not printing when it is a fail, lets the speaker recognize whether the intonation and pronunciation of the uttered words are acceptable.

An embodiment of the present invention will be described with reference to FIGS. 1 to 8. In this embodiment, a portable printer is used as the voice memo printer.

Here, FIG. 1 is an external perspective view showing the voice memo printer 1 according to an embodiment of the present invention from the label discharge side, FIG. 2 is an external perspective view showing the voice memo printer 1 from the operator mounting side, and FIG. 3 is a horizontal sectional view showing the internal structure of the voice memo printer 1.

As shown in FIGS. 1 to 3, the printer main body 1a of the voice memo printer 1, which is a portable printer, consists of a case 2 that is open on one side and a cover 3 that opens and closes the open side of the case 2. The cover 3 is rotatably supported by a fulcrum shaft 4 provided on the case 2. The case 2 has a hopper 6 that, with the cover 3 closed, holds a long recording paper 5 wound into a roll so that the roll can turn freely. In this embodiment, the recording paper 5 consists of a mount 5a to which many labels 5b are affixed at equal intervals, but other recording paper may be used. The labels 5b are coated with a weak adhesive and, once printed and issued, can also be used as sticky notes P (see FIG. 8).

The case 2 is provided with a paper guide 7 extending from the bottom of the hopper 6 toward the cover 3. A rotatable platen 8 and a label peeling body 9 running along the longitudinal direction of the platen 8 are disposed at the portion of the paper guide 7 near the cover 3.

図３に示すように、カバー３の内面（ホッパ６側）には、サーマルヘッド１２を備えたヘッド支持体１１が支軸１１ａを中心に回動自在に設けられている。このヘッド支持体１１は板ばね１３により一方向に付勢されており、サーマルヘッド１２はカバー３を閉じた状態でプラテン８に当接することになる。すなわち、プラテン８とサーマルヘッド１２とにより印字部１４が形成されている。 As shown in FIG. 3, a head support 11 having a thermal head 12 is provided on the inner surface of the cover 3 (on the hopper 6 side) so as to be rotatable around a support shaft 11a. The head support 11 is biased in one direction by a leaf spring 13, and the thermal head 12 comes into contact with the platen 8 with the cover 3 closed. That is, the printing unit 14 is formed by the platen 8 and the thermal head 12.

On each side of the free end of the cover 3, a pinch roller 16 pressed against the platen 8 by the urging force of a spring 15 is rotatably provided. The cover 3 also has a label discharge port 17, located between the thermal head 12 and the pinch roller 16, through which the labels 5b are discharged, and a paper presser 18 that holds down the recording paper 5 in the hopper 6 so that it does not lift. A mount discharge port 19, through which the mount 5a is discharged, is formed between the case 2 and the free end of the cover 3.

The upper surface of the case 2 is provided with a power switch 20 that turns the power supply from the battery 10 (see FIG. 3) ON and OFF, a feed switch 21 that causes printing on the labels 5b, a lid 22, and a light receiving window 23 that receives infrared light. The lid 22 is opened and closed when the battery 10 is inserted into or removed from the battery storage unit 30 (see FIG. 3), which opens onto the upper surface of the case 2. Locking claws 24 are slidably provided on both sides of the cover 3 (see FIG. 1). The locking claws 24 are urged outward so that they lock onto the case 2; to open the cover 3, the locking claws 24 are slid inward, as indicated by the arrow marks, to release them from the case 2.

また、ケース２のラベル排出口１７と同一面には、内蔵マイク５２が設けられている。本実施の形態の音声メモプリンタ１には、音声認識機能が搭載されており、この内蔵マイク５２は、この音声認識機能を実行する際に用いられるものである。 A built-in microphone 52 is provided on the same surface of the case 2 as the label discharge port 17. The voice memo printer 1 according to the present embodiment is equipped with a voice recognition function, and the built-in microphone 52 is used when executing the voice recognition function.

加えて、ケース２の上面には、ＬＥＤ５６が配設されている。本実施の形態の音声メモプリンタ１は、このＬＥＤ５６を点灯させたり点滅させることにより、音声メモプリンタ１の動作状態をオペレータに対して報知することができるようになっている。 In addition, an LED 56 is disposed on the upper surface of the case 2. The voice memo printer 1 of the present embodiment can notify the operator of the operation state of the voice memo printer 1 by turning on or blinking the LED 56.

Further, as shown in FIG. 2, an arc surface 25 that fits closely against the operator's waist is formed on the side of the printer main body 1a opposite the cover 3. The arc surface 25 is provided with a non-slip pad 26 that reduces slipping against the operator's clothing and, facing the non-slip pad 26, a belt hook 27 that hooks onto the operator's belt.

このような構成により、バッテリ１０がバッテリ収納部３０へと正しく収納された場合には、電源スイッチ２０がＯＮしている状態でバッテリ収納部３０の端子とバッテリ１０の端子とが接触して電気的に接続された状態となり、バッテリ１０から電力供給を必要とするサーマルヘッド１２等の各部へと電力が供給されることになる。 With such a configuration, when the battery 10 is correctly stored in the battery storage unit 30, the terminal of the battery storage unit 30 and the terminal of the battery 10 come into contact with each other while the power switch 20 is ON. Thus, power is supplied from the battery 10 to each part such as the thermal head 12 that requires power supply.

To load the recording paper 5 in the voice memo printer 1, the cover 3 is opened, the rolled recording paper 5 is placed in the hopper 6 of the printer main body 1a, the leading end of the recording paper 5 is pulled out, with the cover 3 still open, to a position covering the platen 8 and the label peeling body 9, and the cover 3 is closed. As a result, as shown in FIG. 3, the leading end of the mount 5a of the recording paper 5 is pressed against the platen 8 by the thermal head 12 and the pinch roller 16, the path along which the mount 5a is drawn out is bent at an acute angle by the label peeling body 9, and the paper presser 18 keeps the recording paper 5 from lifting off the bottom of the hopper 6. The printer main body 1a loaded with the recording paper 5 can be used on a desk, but it is typically used while worn at the operator's waist.

Next, the connections of the control system of the voice memo printer 1 will be described with reference to FIG. 4. The voice memo printer 1 includes a CPU (Central Processing Unit) 41 that centrally controls each unit. Connected to the CPU 41 via a bus line 45 are a ROM (Read Only Memory) 42 in which fixed data such as the programs executed by the CPU 41 is written, a RAM (Random Access Memory) 43 in which variable data such as work data is rewritably written, and a flash memory 44 in which various information is registered. Also connected to the CPU 41 are a thermal head driver 46 that drives the thermal head 12, a motor driver 48 that drives a motor 47 coupled to the platen 8, a sensor circuit 50 to which various sensors 49 are connected, a switch circuit 54 to which the power switch 20, the feed switch 21, and a cover open switch 51 that turns on and off as the cover 3 opens and closes are connected, an infrared interface 55, and a lighting control circuit 57 to which the LED 56 is connected. The circuit shown in FIG. 4 is formed on a board (not shown) provided inside the printer main body 1a. The infrared interface 55 is located inside the light receiving window 23 described above. Interfaces are labeled I/F in the figure.

また、ＣＰＵ４１には、音声入力用ＣＯＤＥＣ５３が接続されている。この音声入力用ＣＯＤＥＣ５３には、内蔵マイク５２が接続されている。音声入力用ＣＯＤＥＣ５３は、Ａ／Ｄ変換手段として機能するものであり、内蔵マイク５２から入力された音声アナログデータを音声デジタルデータに変換してＣＰＵ４１に出力する。 The CPU 41 is connected with a voice input CODEC 53. A built-in microphone 52 is connected to the audio input CODEC 53. The audio input CODEC 53 functions as an A / D conversion unit, converts audio analog data input from the built-in microphone 52 into audio digital data, and outputs the audio digital data to the CPU 41.

A speech recognition engine 58 is also connected to the CPU 41. The speech recognition engine 58 analyzes the digital voice data input from the built-in microphone 52 and generated by the voice input CODEC 53, and converts human speech into text. The speech recognition engine 58 performs speech recognition using, for example, an acoustic dictionary 59, in which the acoustic features of the small units of human speech (phonemes) are registered, and a language pattern dictionary 60, in which the language features of the words to be recognized are registered.

The words registered for recognition in the language pattern dictionary 60 of this embodiment are limited to a specific application. Because the phrases spoken in a given application tend to be fixed, registering only the words for that application keeps the language pattern dictionary 60 inexpensive. Specifically, the conversations and phrases likely to be used in the application are compiled into a list, forming an application-specific vocabulary table (not shown). For each entry registered in this table, the frequencies of the entry are analyzed and separated into speech features (phoneme information) and language features (phoneme sequence information). The language features separated in this way are registered in the language pattern dictionary 60.

The acoustic dictionary 59 is not application-specific; it is used as a dictionary for speech recognition in general. Voice is considered to be produced by the following mechanism:
(1) the throat vibrates;
(2) the sound passes through the oral and nasal cavities.
The acoustic dictionary 59 therefore stores information for identifying the shapes in (1) and (2) from the frequencies of the voice.

As shown in FIG. 5, the speech recognition engine 58 configured in this way uses a frequency analysis unit 58a, which serves as the frequency analysis means, to frequency-convert and analyze the digital voice data input from the built-in microphone 52 and generated by the voice input CODEC 53. A comparison unit 58b, which serves as the voice recognition means, then calculates acoustic features based on the acoustic dictionary 59 (speech feature extraction means). At this stage only the shapes in (1) and (2) above have been identified; which of the 50 sounds of the Japanese syllabary was uttered cannot yet be determined. The engine therefore searches the words registered in the language pattern dictionary 60 for the word whose language features are closest to the acoustic features of the input speech and outputs it as the voice recognition result (language feature extraction means). Only through this comparison with the language pattern dictionary 60 can sounds such as “a, i, u, e, o” be identified. Identification is difficult when an unrestricted range of words may be uttered, but if the candidates are narrowed to the words uttered in a specific application, and the relationship between (1) and (2) and the characteristics of the phoneme sequence waveform are compared over the whole word, the chance of misrecognition is kept as low as possible, and speech recognition becomes possible with this simple mechanism.
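The narrowing-down described above can be sketched as a nearest-word search over a small, application-specific vocabulary. This is only an illustration: the source does not say how closeness is measured, so edit distance over romanized phoneme strings is an assumption, and `VOCABULARY`, `edit_distance`, `recognize`, and `max_distance` are hypothetical names.

```python
# Hypothetical application-specific vocabulary (romanized phoneme strings).
VOCABULARY = ["noriben", "karaage", "makunouchi"]


def edit_distance(a, b):
    """Levenshtein distance between two strings (assumed closeness measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def recognize(phonemes, vocabulary=VOCABULARY, max_distance=2):
    """Return the closest registered word, or None if nothing is close enough."""
    best = min(vocabulary, key=lambda word: edit_distance(phonemes, word))
    return best if edit_distance(phonemes, best) <= max_distance else None
```

Keeping the vocabulary small is what makes such a simple search workable: a slightly misheard "karage" still maps to "karaage", while an unrelated input is rejected instead of being forced onto a registered word.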

また、言語パターン辞書６０は、音声メモプリンタ１に図示しない外部機器（パーソナルコンピュータ等）を赤外線インタフェース５５を介して接続することで、当該外部機器から更新可能である。さらに、言語パターン辞書６０を格納する言語パターン格納チップ（辞書）の交換や言語パターン辞書６０の図示しない外部機器（パーソナルコンピュータ等）からのダウンロードによる登録内容の書き換えにより、言語パターン辞書６０の内容を特定用途毎に変えることも可能である。新たな言語パターン辞書６０が赤外線インタフェース５５を介してダウンロードされた場合には、旧言語パターン辞書６０は、抹消される。 The language pattern dictionary 60 can be updated from an external device by connecting an external device (such as a personal computer) (not shown) to the voice memo printer 1 via the infrared interface 55. Further, the contents of the language pattern dictionary 60 are changed by exchanging the language pattern storage chip (dictionary) for storing the language pattern dictionary 60 or rewriting the registered contents by downloading the language pattern dictionary 60 from an external device (such as a personal computer) (not shown). It is also possible to change for each specific application. When a new language pattern dictionary 60 is downloaded via the infrared interface 55, the old language pattern dictionary 60 is deleted.

In addition to recognizing speech as words as described above, the speech recognition engine 58 of this embodiment can check whether the words were uttered with the correct intonation. Specifically, as shown in FIG. 5, a voice pitch extraction unit 58c, which serves as the voice pitch extraction means, extracts the voice pitch from the digital voice data input from the built-in microphone 52 and generated by the voice input CODEC 53. A score determination unit 58d, which serves as the determination means, then determines whether the words were uttered with the correct intonation, based on the voice pitch extracted by the voice pitch extraction unit 58c, the voice recognition result from the comparison unit 58b, and the pitch dictionary 61.
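As a rough illustration of what extracting a voice pitch from digital voice data can involve, the sketch below estimates the pitch of one frame by picking the lag that maximizes the autocorrelation. This is a stand-in technique chosen for brevity, not the method of the voice pitch extraction unit 58c, which the source does not detail at this point; `estimate_pitch_hz`, `fmin`, `fmax`, and the synthetic test tone are all assumptions.

```python
import math


def estimate_pitch_hz(samples, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate pitch as sample_rate / lag for the lag with the largest autocorrelation."""
    min_lag = int(sample_rate / fmax)   # shortest period considered
    max_lag = int(sample_rate / fmin)   # longest period considered
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        value = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


# Synthetic 200 Hz tone sampled at 8 kHz (50 ms), used only to exercise the function.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200.0 * n / SAMPLE_RATE) for n in range(400)]
pitch = estimate_pitch_hz(tone, SAMPLE_RATE)
```

Running the estimator on successive frames of an utterance would yield one pitch value per frame, that is, a time series of pitch values of the kind the score determination unit 58d compares.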

The pitch dictionary 61 stores “time-series pitch data (pitch series)”, which is voice pitch and intonation data, paired with “spoken words”. Specifically, the voice pitch is extracted from digital voice data, its periodicity is analyzed by a method such as linear prediction or the maximum entropy method to generate the time-series pitch data (pitch series), and this is registered together with the spoken word. For example, when an English teacher uses the voice memo printer 1 for pronunciation practice, the teacher utters the phrases beforehand and registers the spoken words and their time-series pitch data in the pitch dictionary 61. In this way, spoken words that share a pronunciation but differ in meaning, such as “rain” and “candy”, both pronounced “ame”, are registered in the pitch dictionary 61 as distinct entries.
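One way to picture the pitch dictionary is as a table keyed by pronunciation, where each homophone carries its own pitch series. The sketch below is hypothetical throughout: the contour values are invented for illustration (falling for "rain", rising for "candy"), and the mean-removed, normalized correlation used to pick the best entry is an assumed similarity measure rather than the one in the source.

```python
from math import sqrt

# Hypothetical pitch dictionary: pronunciation -> [(spoken word, pitch series in Hz)].
PITCH_DICTIONARY = {
    "ame": [
        ("rain", [220.0, 210.0, 170.0, 160.0]),   # illustrative falling contour
        ("candy", [160.0, 170.0, 210.0, 220.0]),  # illustrative rising contour
    ],
}


def contour_similarity(a, b):
    """Normalized correlation of two equal-length pitch series after mean removal."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denominator = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denominator == 0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denominator


def judge_word(pronunciation, pitch_series):
    """Pick the registered homophone whose pitch series best matches the input."""
    candidates = PITCH_DICTIONARY[pronunciation]
    best = max(candidates, key=lambda entry: contour_similarity(pitch_series, entry[1]))
    return best[0]
```

With these made-up entries, an utterance recognized as "ame" with a falling contour resolves to "rain" and one with a rising contour to "candy", which is the distinction the dictionary is meant to support.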

The pitch dictionary 61 can be updated from an external device (such as a personal computer, not shown) by connecting that device to the voice memo printer 1 via the infrared interface 55. The contents of the pitch dictionary 61 can also be changed by replacing the pitch dictionary storage chip that holds the pitch dictionary 61, or by rewriting the registered contents with a pitch dictionary 61 downloaded from the external device. When a new pitch dictionary 61 is downloaded via the infrared interface 55, the old pitch dictionary 61 is deleted.

Such a pitch dictionary 61 therefore makes it possible to select, according to where the printer is used, a dictionary in which the same words have different intonation and wording. For example, a bento (boxed lunch) shop in the Kansai region would use an “Osaka dialect” pitch dictionary, and one in the Kanto region a “Tokyo dialect” pitch dictionary. Switching to a “standard Japanese” pitch dictionary is of course also possible.

The determination method used by the score determination unit 58d will now be described based on the example shown in FIG. 6. The cross-correlation method, which is commonly used to judge the similarity of two functions, is used to determine whether two pitch series are similar along the time axis. In the example shown in FIG. 6:

Σ (teacher utterance × teacher utterance) = 207584.6283
Σ (teacher utterance × user utterance 1) = 225362.7706
Σ (teacher utterance × user utterance 2) = 229427.6884

The degree of deviation of each user utterance from the reference is therefore calculated as:

User utterance 1 = Σ (teacher utterance × teacher utterance) / Σ (teacher utterance × user utterance 1) × 100 = 92.11%
User utterance 2 = Σ (teacher utterance × teacher utterance) / Σ (teacher utterance × user utterance 2) × 100 = 90.48%

For example, if an utterance scoring 91% or more is taken to have been uttered with the correct intonation, user utterance 1 (92.11%) is judged to have been uttered with the correct intonation, and user utterance 2 (90.48%), being below 91%, is not. The score determination unit 58d outputs the voice recognition result for speech judged in this way to have been uttered with the correct intonation.
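A minimal sketch of this scoring, using only the three sums quoted from FIG. 6. The quoted percentages (92.11% and 90.48%) are reproduced when the teacher-teacher sum is divided by each teacher-user sum, so the sketch assumes that orientation of the ratio. The function names and the `threshold` parameter are illustrative, and `sum_of_products` shows how the sums would be formed from two pitch series, since the underlying series are not given in the text.

```python
def sum_of_products(series_a, series_b):
    """Zero-lag cross-correlation term: the sum of element-wise products."""
    return sum(a * b for a, b in zip(series_a, series_b))


def intonation_score(teacher_teacher_sum, teacher_user_sum):
    """Score in percent, oriented so that the quoted FIG. 6 values are reproduced."""
    return 100.0 * teacher_teacher_sum / teacher_user_sum


def is_correct_intonation(score, threshold=91.0):
    """Pass when the score reaches the threshold (91% in the example)."""
    return score >= threshold


# Sums quoted in the example of FIG. 6.
TEACHER_TEACHER = 207584.6283
scores = {
    "user utterance 1": intonation_score(TEACHER_TEACHER, 225362.7706),
    "user utterance 2": intonation_score(TEACHER_TEACHER, 229427.6884),
}
results = {name: is_correct_intonation(score) for name, score in scores.items()}
```

With the 91% threshold, user utterance 1 passes and user utterance 2 does not, matching the judgment described in the text.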

次に、音声メモプリンタ１に内蔵されたＲＯＭ４２に格納された制御プログラムがＣＰＵ４１に実行させる機能のうち、本実施の形態の音声メモプリンタ１が備える特長的な機能について説明する。 Next, of the functions that the control program stored in the ROM 42 built in the voice memo printer 1 causes the CPU 41 to execute, the characteristic functions provided in the voice memo printer 1 of the present embodiment will be described.

ここで、音声メモプリンタ１のＣＰＵ４１が実行する音声印字処理について説明する。図７は、音声印字処理の流れを示すフローチャートである。図７に示すように、デジタル化された音声が入力されると（ステップＳ１のＹ）、ステップＳ２に進み、認識パターンの登録処理か、発声の音声認識処理かが判断される。 Here, the voice printing process executed by the CPU 41 of the voice memo printer 1 will be described. FIG. 7 is a flowchart showing the flow of the voice printing process. As shown in FIG. 7, when a digitized voice is input (Y in step S1), the process proceeds to step S2, and it is determined whether the process is a recognition pattern registration process or an utterance voice recognition process.

If the input is judged to be for voice recognition of an utterance, the voice recognition processing by the speech recognition engine 58 is executed (step S3).

If the voice recognition processing determines that the utterance is a word registered in the language pattern dictionary 60 (Y in step S4), the word whose language features are closest to the acoustic features of the input speech is found, output to the printing unit 14 as the voice recognition result, and printed (step S5: printing means).

音声認識処理において言語パターン辞書６０に登録されている単語でないと判断された場合（ステップＳ４のＮ）、音声認識せずにステップＳ１に戻る。 If it is determined in the voice recognition process that the word is not registered in the language pattern dictionary 60 (N in step S4), the process returns to step S1 without performing voice recognition.

If, on the other hand, the input is judged to be for recognition pattern registration, the process proceeds to step S6 and the recognition pattern registration processing is executed. In this processing, the conversations and phrases likely to be used in the application are compiled into a list, forming an application-specific vocabulary table (not shown); for each entry registered in this table, the frequencies of the entry are analyzed and separated into speech features (phoneme information) and language features (phoneme sequence information). The language features separated in this way are then registered in the language pattern dictionary 60.
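The flow of FIG. 7 described in steps S1 to S6 can be sketched as a single function. This is a simplified illustration: recognition is stubbed as an exact lookup in a set, registration simply adds the entry, and the names `voice_print_pass`, `mode`, `vocabulary`, and `printed_labels` are hypothetical.

```python
def voice_print_pass(mode, utterance, vocabulary, printed_labels):
    """Run one pass of the flow and return the step at which the pass ends.

    "S1" means the flow is back at waiting for voice input.
    """
    if utterance is None:                 # S1: no digitized voice has arrived
        return "S1"
    if mode == "register":                # S2: registration branch
        vocabulary.add(utterance)         # S6: register the recognition pattern
        return "S6"
    # S3: voice recognition, stubbed here as an exact vocabulary lookup
    recognized = utterance if utterance in vocabulary else None
    if recognized is not None:            # S4: Y, the word is registered
        printed_labels.append(recognized) # S5: print the recognition result
        return "S5"
    return "S1"                           # S4: N, nothing printed, wait again


vocabulary = set()
printed_labels = []
trace = [
    voice_print_pass("recognize", "noriben", vocabulary, printed_labels),
    voice_print_pass("register", "noriben", vocabulary, printed_labels),
    voice_print_pass("recognize", "noriben", vocabulary, printed_labels),
]
```

The trace shows an unregistered word being ignored, the same word being registered, and the word then being printed on the next utterance.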

Such a voice memo printer 1 can be used in a wide variety of settings. For example, in an English pronunciation test, the printer may print when the determination result from the score determination unit 58d is a pass and refrain from printing when it is a fail, thereby letting the speaker know that the utterance failed. FIG. 8 is a plan view showing an example of an issued sticky note P. As shown in FIG. 8, the text "Apple" is printed on the sticky note P when "Apple" has been uttered correctly. Alternatively, only the determination result (pass/fail, score, degree of deviation, and so on) may be printed.

Thus, according to the present embodiment, printing when the intonation or pronunciation of the uttered words is judged a pass, and not printing when it is judged a fail, lets the speaker recognize whether the intonation and pronunciation of the uttered words were acceptable. Printing the determination result for the intonation or pronunciation (pass/fail, score, degree of deviation, and so on) likewise lets the speaker recognize whether the intonation and pronunciation were acceptable.
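The two printing policies described above (print the text only on a pass, or print only the determination result) can be illustrated with a small sketch. The function name `print_decision`, the numeric score scale, and the `pass_mark` of 70 are hypothetical; the publication does not specify how the score is computed or what threshold is used.

```python
def print_decision(text, score, pass_mark=70, print_result_only=False):
    """Return the string to print on the sticky note, or None to print nothing.

    `score` and `pass_mark` are assumed values on an arbitrary 0-100 scale.
    """
    passed = score >= pass_mark
    if print_result_only:
        # Variant: print only the determination result, pass or fail.
        return f"{'PASS' if passed else 'FAIL'} (score {score})"
    # Default: print the recognized text only when the utterance passed.
    return text if passed else None
```

For example, `print_decision("Apple", 85)` yields the text to print, while `print_decision("Apple", 40)` yields `None`, so no sticky note would be issued.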

In addition, the pitch dictionary 61 may be an "Osaka dialect" pitch dictionary for a bento (boxed lunch) shop in the Kansai region, or a "Tokyo dialect" pitch dictionary for a bento shop in the Kanto region. When taking an order at such a shop, simply speaking the name of the ordered product causes a sticky note P with that product printed on it to be issued, so the issued sticky note P can be used as an order slip. Affixing this order slip to the product at handover also helps prevent products from being mixed up.

Furthermore, according to the present embodiment, the uttered words are determined based on the voice pitch extracted by the voice pitch extraction unit 58c, the speech recognition result output by the comparison unit 58b, and the time-series pitch data of uttered words stored in the pitch dictionary 61. This makes it possible to distinguish and print words that share the same pronunciation but differ in meaning, such as 雨 ("rain") and 飴 ("candy"), both of which are pronounced "ame".
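One way to picture this determination is to compare the extracted pitch contour with the stored contour of each candidate that shares the recognized reading, and to pick the closest one. The sketch below is a simplified illustration under several assumptions: the dictionary layout, the one-value-per-mora contours (high-low for 雨 and low-high for 飴, as in Tokyo-style accent), the min-max normalization, and the squared-error distance are all chosen for the example and are not taken from the publication.

```python
# Hypothetical pitch dictionary 61: reading -> {written word: relative pitch per mora}.
PITCH_DICTIONARY = {
    "ame": {
        "雨 (rain)": [1.0, 0.0],   # assumed high-low contour
        "飴 (candy)": [0.0, 1.0],  # assumed low-high contour
    },
}


def normalize(contour):
    """Scale a pitch contour to the range 0..1 so speakers with different voices compare."""
    lo, hi = min(contour), max(contour)
    span = hi - lo
    return [(p - lo) / span if span else 0.0 for p in contour]


def determine_word(reading, pitch_hz, dictionary=PITCH_DICTIONARY):
    """Pick the written word whose stored contour is closest to the observed pitch."""
    candidates = dictionary[reading]
    observed = normalize(pitch_hz)

    def distance(template):
        return sum((a - b) ** 2 for a, b in zip(observed, template))

    return min(candidates, key=lambda word: distance(candidates[word]))
```

Here `reading` plays the role of the speech recognition result and `pitch_hz` the role of the extracted voice pitch, given as one value in hertz per mora. Swapping in a different dictionary argument would correspond to using a different regional pitch dictionary.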

In the present embodiment, a single language pattern dictionary 60 is provided, in which the language features of words for the specific application to be recognized are registered. However, the invention is not limited to this: a plurality of language pattern dictionaries 60 may be provided, each registering the language features of words for a different specific application. In that case, the language pattern dictionary 60 in use is switched according to the specific application. The language pattern dictionary 60 may be switched according to the content of the input voice, or by means of a switch.
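The two switching methods mentioned above can be sketched as follows. The class `DictionarySelector`, its method names, and the idea that speaking the name of an application selects its dictionary are assumptions made for illustration; the publication only states that switching may be driven by the input voice content or by a switch.

```python
class DictionarySelector:
    """Holds several hypothetical language pattern dictionaries, one per application."""

    def __init__(self, dictionaries, active):
        if active not in dictionaries:
            raise KeyError(active)
        self.dictionaries = dictionaries
        self.active = active

    def switch(self, application):
        """Switch via an explicit selector, such as a physical switch."""
        if application not in self.dictionaries:
            raise KeyError(application)
        self.active = application

    def switch_by_voice(self, recognized_text):
        """Switch when the recognized text names an application (assumed trigger)."""
        if recognized_text in self.dictionaries:
            self.active = recognized_text
            return True
        return False

    def vocabulary(self):
        """Return the word list of the dictionary currently in use."""
        return self.dictionaries[self.active]
```

A selector built with, say, a "bento shop" word list and a "pronunciation test" word list would then expose only the active application's vocabulary to the recognizer.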

FIG. 1 is an external perspective view showing a voice memo printer according to an embodiment of the present invention from the label discharge side.
FIG. 2 is an external perspective view showing the voice memo printer from the side worn by the operator.
FIG. 3 is a horizontal sectional view showing the internal structure of the voice memo printer.
FIG. 4 is a block diagram showing the connections of the control system of each part of the voice memo printer.
FIG. 5 is a block diagram showing the configuration of the speech recognition engine.
FIG. 6 is an explanatory diagram showing the pitch series for a certain uttered word.
FIG. 7 is a flowchart showing the flow of the voice printing process.
FIG. 8 is a plan view showing an example of an issued sticky note.

Explanation of symbols

1 … voice memo printer, 14 … printing unit, 52 … microphone, 53 … A/D conversion means, 58a … frequency analysis means, 58b … speech recognition means, 58c … voice pitch extraction means, 58d … determination means, 59 … acoustic dictionary, 60 … language pattern dictionary, 61 … pitch dictionary

Claims

A voice memo printer comprising:
a printing unit that prints predetermined items;
a microphone for inputting voice consisting of uttered words;
A/D conversion means for converting analog voice data input from the microphone into digital voice data;
frequency analysis means for frequency-transforming and analyzing the digital voice data converted by the A/D conversion means;
speech recognition means for outputting a speech recognition result based on the frequencies analyzed by the frequency analysis means;
voice pitch extraction means for extracting a voice pitch from the digital voice data converted by the A/D conversion means;
a pitch dictionary that stores time-series pitch data of uttered words in association with those uttered words;
determination means for determining the uttered words based on the voice pitch extracted by the voice pitch extraction means, the speech recognition result output by the speech recognition means, and the time-series pitch data of the uttered words stored in the pitch dictionary; and
printing means for causing the printing unit to print the uttered words determined by the determination means.

A voice memo printer comprising:
a printing unit that prints predetermined items;
a microphone for inputting voice consisting of uttered words;
A/D conversion means for converting analog voice data input from the microphone into digital voice data;
frequency analysis means for frequency-transforming and analyzing the digital voice data converted by the A/D conversion means;
speech recognition means for outputting a speech recognition result based on the frequencies analyzed by the frequency analysis means;
voice pitch extraction means for extracting a voice pitch from the digital voice data converted by the A/D conversion means;
a pitch dictionary that stores time-series pitch data of uttered words in association with those uttered words;
determination means for determining whether the uttered words were uttered with correct intonation, based on the voice pitch extracted by the voice pitch extraction means, the speech recognition result output by the speech recognition means, and the time-series pitch data of the uttered words stored in the pitch dictionary; and
printing means for causing the printing unit to print the determination result of the determination means.

A voice memo printer comprising:
a printing unit that prints predetermined items;
a microphone for inputting voice consisting of uttered words;
A/D conversion means for converting analog voice data input from the microphone into digital voice data;
frequency analysis means for frequency-transforming and analyzing the digital voice data converted by the A/D conversion means;
speech recognition means for outputting a speech recognition result based on the frequencies analyzed by the frequency analysis means;
voice pitch extraction means for extracting a voice pitch from the digital voice data converted by the A/D conversion means;
a pitch dictionary that stores time-series pitch data of uttered words in association with those uttered words;
determination means for determining whether the uttered words were uttered with correct intonation, based on the voice pitch extracted by the voice pitch extraction means, the speech recognition result output by the speech recognition means, and the time-series pitch data of the uttered words stored in the pitch dictionary; and
printing means for causing the printing unit to print the speech recognition result output by the language feature extraction means only when the determination means determines that the uttered words were uttered with correct intonation.

The voice memo printer according to any one of claims 1 to 3, wherein the speech recognition means comprises:
a language pattern dictionary in which language features of words for a specific application are registered;
an acoustic dictionary in which acoustic features of the phonemes of human utterances are registered;
voice feature extraction means for extracting, from the acoustic dictionary, the acoustic features identified from the frequencies analyzed by the frequency analysis means; and
language feature extraction means for extracting, from the language pattern dictionary, the word whose language features are closest to the acoustic features extracted by the voice feature extraction means, and outputting that word as the speech recognition result.

The voice memo printer according to claim 4, wherein the language pattern dictionary is prepared for each specific application and is exchangeable according to the specific application.

The voice memo printer according to claim 4, wherein a plurality of the language pattern dictionaries are provided, one for each specific application, and the language pattern dictionary in use is switched according to the specific application.

The voice memo printer according to any one of claims 4 to 6, wherein the registered contents of the language pattern dictionary are rewritable from an external device.