JP2012018544A

JP2012018544A - Audio output device, audio output method and program

Info

Publication number: JP2012018544A
Application number: JP2010155252A
Authority: JP
Inventors: Michio Aizawa; 道雄相澤; Keita Yoshida; 圭太吉田; Ritsu Wakui; 立和久井; Nobuo Oshimoto; 信夫押本
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2010-07-07
Filing date: 2010-07-07
Publication date: 2012-01-26

Abstract

PROBLEM TO BE SOLVED: To provide an audio output device, an audio output method, and a program that solve the problem that it is difficult for an audio output device to output recognition results as expressive audio. SOLUTION: An audio output device includes: an input unit for inputting a series of handwriting from the beginning to the end of drawing; a recognition unit for recognizing the shape and the size of the handwriting inputted by the input unit; a classification unit for classifying the handwriting recognized by the recognition unit into a category based on the number of line segments that constitute the handwriting and the size of the handwriting; a selection unit for selecting an onomatopoeic word or a mimetic word associated with the category determined by the classification unit from onomatopoeic words and mimetic words prestored in a storage unit; and an output unit for outputting the onomatopoeic word or the mimetic word selected by the selection unit as audio.

Description

The present invention relates to an audio output device, an audio output method, and a program that output audio in accordance with user input.

Devices that output audio in accordance with a user's handwriting input are known. Patent Document 1 discloses a device that outputs a pseudo sound: it calculates the moving speed of the pen input and outputs a pseudo sound corresponding to that speed. This gives the user the feeling of writing on paper. Patent Document 2 discloses a device that outputs as audio not only the character recognition result but also the word recognition result. This gives the user the sense of inputting coherent words and sentences.

Devices that output onomatopoeic words rather than characters or words are also known. Patent Document 3 discloses a device that composites an onomatopoeic word corresponding to the input sound volume, as text, onto a captured image. This makes it possible to present sound to the user visually.

Patent Document 1: JP-A-8-190450
Patent Document 2: JP 2002-288588 A
Patent Document 3: JP 2006-109322 A

In a device that has both a function of recognizing characters and gestures and an audio output function, it is useful to output the recognition result as audio. Furthermore, to attract the user's interest, it is desirable that the output audio be expressive, for example by using onomatopoeia (onomatopoeic words and mimetic words).

However, it is difficult to confirm the device's recognition result from the audio output by the device disclosed in Patent Document 1. The audio output by the device disclosed in Patent Document 2 is monotonous. The device disclosed in Patent Document 3 does not consider how to associate handwriting input with onomatopoeic words. In other words, conventional audio output devices that output handwritten characters and the like as audio have the problem that it is difficult to output the recognition result expressively.

In view of the above problem, an object of the present invention is to output the recognition result of handwritten characters and the like as expressive audio.

An audio output device according to the present invention that achieves the above object comprises:
input means for inputting a series of handwriting from the start of drawing to the end of drawing;
recognition means for recognizing the shape and size of the handwriting input by the input means;
classification means for classifying the handwriting recognized by the recognition means into a category according to the number of line segments constituting the handwriting and the size of the handwriting;
selection means for selecting an onomatopoeic word or mimetic word corresponding to the category determined by the classification means from onomatopoeic words or mimetic words stored in advance in storage means; and
output means for outputting the onomatopoeic word or mimetic word selected by the selection means as audio.

According to the present invention, the recognition result of handwritten characters and the like can be output as expressive audio.

FIG. 1: (a) Block diagram showing the hardware configuration of the audio output device; (b) block diagram showing the functional configuration of the audio output device.
FIG. 2: Diagram showing an example of the stroke dictionary 152.
FIG. 3: (a) Diagram showing an example of the onomatopoeia dictionary 153; (b) diagram showing another example of the onomatopoeia dictionary 153.
FIG. 4: Diagram showing an example of the gesture dictionary 161.
FIG. 5: Flowchart showing the processing procedure of the audio output device.
FIG. 6: Diagram showing an operation example of the audio output device.
FIG. 7: Flowchart showing the processing procedure for recognizing the shape of a stroke.
FIG. 8: (a) Diagram explaining the vertices of a stroke; (b) and (c) diagrams showing examples of input gestures.
FIG. 9: Flowchart showing the processing procedure for changing a stroke wording to another, simplified stroke wording.
FIG. 10: Flowchart showing the processing procedure for outputting the correct way of writing a gesture as audio.

(First embodiment)
The hardware configuration of the audio output device according to the present invention is described with reference to FIG. 1A. A CPU (central processing unit) 101 controls the operation of the entire device as a system control unit. A ROM 102 stores a control program, specifically the program for performing the processing described later. A RAM 103 provides a work area for the CPU 101 and is used to store various data. A storage device 104, such as an SD card or a hard disk drive (HDD), is used to store image data and the like. A touch panel 105 is used to process input made with a finger, a pen, or the like. A speaker 106 is used to output audio.

The functional configuration of the audio output device is described with reference to the block diagram of FIG. 1B. The audio output device includes an input unit 151, a stroke dictionary 152, an onomatopoeia dictionary 153, a stroke processing unit 154, a stroke recognition unit 155, a stroke classification unit 156, an onomatopoeia selection unit 157, and an audio output unit 158. It further includes a stroke holding unit 159, a stroke wording acquisition unit 160, a gesture dictionary 161, a gesture processing unit 162, a gesture recognition unit 163, a gesture wording acquisition unit 164, a gesture execution unit 165, and a help unit 166.

The input unit 151 receives input from the user. The stroke dictionary 152 holds information on the strokes (handwriting), that is, the user input, that the audio output device can recognize. A stroke (handwriting) is the series of actions from when the user touches the touch panel 105 until the user releases it. The onomatopoeia dictionary 153 records stroke shapes classified into categories according to the number of line segments constituting the stroke. Onomatopoeia is used here as a collective term for onomatopoeic words and mimetic words. The stroke processing unit 154 supervises processing related to strokes. The stroke recognition unit 155 recognizes, from the sequence of input coordinates sent from the stroke processing unit 154, the shape of the stroke and its size within the input area such as the touch panel. The stroke classification unit 156 classifies the input stroke into a category using the shape and size recognized by the stroke recognition unit 155. The onomatopoeia selection unit 157 selects the onomatopoeia corresponding to the classified category from the onomatopoeia dictionary 153. The audio output unit 158 outputs as audio the onomatopoeia, the stroke wording (first wording) described later, the gesture wording (second wording) described later, or any other wording sent to it.

The stroke holding unit 159 holds the sequence of input coordinates. The stroke wording acquisition unit 160, which functions as first acquisition means, acquires the stroke wording (first wording) corresponding to the recognized shape from the stroke dictionary 152. The gesture dictionary 161 holds information on gestures (figures), each composed of one or more strokes. The gesture processing unit 162 supervises processing related to gestures (processing corresponding to figures). The gesture recognition unit 163 recognizes a gesture (figure) using the sequence of input coordinates sent from the gesture processing unit 162. The gesture wording acquisition unit 164, which functions as second acquisition means, acquires the gesture wording (second wording) corresponding to the gesture recognized by the gesture recognition unit 163 from the gesture dictionary 161. The gesture execution unit 165 executes the command corresponding to the recognized gesture. The help unit 166 acquires the correct way of writing the gesture when the likelihood of the recognized gesture is low.

The processing in each of the above units is described concretely below. The input unit 151 includes the touch panel 105. When the user touches the touch panel 105 with a finger, a pen, or the like, the input unit 151 detects the input coordinates of the touched position and generates a pen event. Three types of pen events are generated: penDown, penMove, and penUp. First, when the user's finger touches the touch panel 105, the input unit 151 generates penDown. When the finger moves while still touching the touch panel 105, it generates penMove. Finally, when the finger leaves the touch panel 105, it generates penUp. The input coordinates detected in the pen events are held by the stroke holding unit 159 described later. A mouse may also be used for input instead of a finger or pen. In that case, the input unit 151 need not include the touch panel 105.

The stroke processing unit 154 receives the pen events generated by the input unit 151 and detects stroke breaks. One stroke is defined as the span from when the user touches the touch panel 105 until the user releases it. That is, one stroke is an event sequence that starts with one penDown, continues with one or more penMove events, and ends with one penUp. On detecting a stroke break, the stroke processing unit 154 extracts the input coordinates from each pen event and sends the sequence of input coordinates to the stroke recognition unit 155.
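The stroke segmentation described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the `PenEvent` class and the function name `split_into_strokes` are hypothetical, and only the three event names penDown, penMove, and penUp come from the description.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class PenEvent:
    """Hypothetical pen event: kind is 'penDown', 'penMove', or 'penUp'."""
    kind: str
    x: float
    y: float


def split_into_strokes(events: Iterable[PenEvent]) -> List[List[Point]]:
    """Group a pen-event stream into strokes, one coordinate list per stroke."""
    strokes: List[List[Point]] = []
    current: Optional[List[Point]] = None  # stroke in progress, if any
    for ev in events:
        if ev.kind == "penDown":
            current = [(ev.x, ev.y)]          # a stroke starts at penDown
        elif ev.kind == "penMove" and current is not None:
            current.append((ev.x, ev.y))
        elif ev.kind == "penUp" and current is not None:
            current.append((ev.x, ev.y))
            strokes.append(current)           # penUp is the stroke break
            current = None
    return strokes
```

Events that arrive outside a penDown/penUp pair are ignored in this sketch; how the actual device treats them is not stated in the description.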

An example of the stroke dictionary 152 is described with reference to FIG. 2. The stroke dictionary 152 holds information on the strokes that the device can recognize. In the stroke dictionary 152 shown in FIG. 2, seven types of strokes are registered, each with a shape name, a reference stroke, and the corresponding stroke wording. Specifically, the seven types are horizontal line 201, vertical line 202, less-than sign 203, right-pointing triangle 204, left-pointing triangle 205, square 206, and upper semicircle 207.

The stroke recognition unit 155 recognizes the shape and size of the stroke from the sequence of input coordinates sent from the stroke processing unit 154 and the information registered in the stroke dictionary 152. The method of recognizing the stroke size is described first. The method of recognizing the stroke shape is described later.

The stroke size is recognized as follows. First, the rectangle circumscribing the sequence of input coordinates is obtained. If the width of this rectangle is at least half the width of the touch panel, or the height of the rectangle is at least half the height of the touch panel, the stroke size is recognized as "large". Conversely, if the width and height of the rectangle are both less than half the width and height of the touch panel, respectively, the stroke size is recognized as "small".
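The size rule above amounts to a bounding-box test. The following is a minimal sketch under that reading; the function name `stroke_size` and its parameters are illustrative and do not appear in the description.

```python
from typing import List, Tuple


def stroke_size(points: List[Tuple[float, float]],
                panel_width: float, panel_height: float) -> str:
    """Classify a stroke as 'large' or 'small' from its circumscribing rectangle."""
    if not points:
        raise ValueError("a stroke needs at least one input coordinate")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    box_width = max(xs) - min(xs)
    box_height = max(ys) - min(ys)
    # "large" when either dimension reaches half of the panel's dimension
    if box_width >= panel_width / 2 or box_height >= panel_height / 2:
        return "large"
    return "small"
```

For example, on an 800 x 600 panel a horizontal line spanning 400 units is "large", because its width alone reaches half the panel width even though its height is near zero.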

The stroke classification unit 156 classifies the input stroke into a category using the shape and size recognized by the stroke recognition unit 155. The classification follows the categories registered in the onomatopoeia dictionary 153. The onomatopoeia selection unit 157 selects the onomatopoeia corresponding to the classified category from the onomatopoeia dictionary 153. The selected onomatopoeia is sent to the audio output unit 158.

An example of the onomatopoeia dictionary 153 is described with reference to FIG. 3A. The onomatopoeia dictionary 153 in FIG. 3A corresponds to the stroke dictionary 152 in FIG. 2; that is, it contains all the shapes included in the stroke dictionary 152 of FIG. 2. The shapes "unknown 1", "unknown 2", "unknown 3", and "unknown 4" included among the stroke shapes in FIG. 3A are described later. So that the user can confirm, from the onomatopoeia, how the audio output device recognized the input stroke, the onomatopoeia dictionary 153 has the following features.

In the onomatopoeia dictionary 153, shapes are classified into categories according to the number of line segments constituting the stroke. In the example of FIG. 3A, the shapes are classified into four categories, containing shapes whose strokes consist of one, two, three, and four line segments, respectively. The shapes "horizontal line 201", "vertical line 202", "upper semicircle 207", and "unknown 1" belong to the category with one line segment. Similarly, "less-than sign 203" and "unknown 2" belong to the category with two line segments. "Right-pointing triangle 204", "left-pointing triangle 205", and "unknown 3" belong to the category with three line segments. "Square 206" and "unknown 4" belong to the category with four line segments. A different onomatopoeia is assigned to each category. Assigning onomatopoeia in this way has the following effect. The number of line segments constituting a stroke is closely related to the shape of the stroke, so the recognized stroke shape can be imagined from the onomatopoeia. The user can therefore confirm the recognized stroke shape by listening to the spoken onomatopoeia.
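The shape categories listed above can be written as a simple lookup table. The sketch below only transcribes the grouping given in the text for FIG. 3A; the English shape names and the names `SEGMENT_COUNT`, `shape_category`, and `shapes_in_category` are illustrative assumptions.

```python
# Number of line segments per shape, transcribed from the FIG. 3A description.
SEGMENT_COUNT = {
    "horizontal line": 1,
    "vertical line": 1,
    "upper semicircle": 1,
    "unknown 1": 1,
    "less-than sign": 2,
    "unknown 2": 2,
    "right-pointing triangle": 3,
    "left-pointing triangle": 3,
    "unknown 3": 3,
    "square": 4,
    "unknown 4": 4,
}


def shape_category(shape_name: str) -> int:
    """Return the category (number of line segments) of a recognized shape."""
    return SEGMENT_COUNT[shape_name]


def shapes_in_category(segments: int) -> list:
    """Return the shape names in a category, sorted alphabetically."""
    return sorted(name for name, n in SEGMENT_COUNT.items() if n == segments)
```

Because every shape in a category shares one onomatopoeia, a listener can tell the segment count from the audio but not, for instance, a horizontal line from a vertical line; the stroke wording described later covers that distinction.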

The onomatopoeia dictionary 153 also classifies stroke sizes into categories. In the example of FIG. 3A, sizes are classified into two categories, "large" and "small", and a different onomatopoeia is assigned to each. Assigning onomatopoeia in this way makes the recognized stroke size imaginable from the onomatopoeia. The user can therefore confirm the recognized stroke size by listening to the spoken onomatopoeia.

In the above description, both the shape and the size of the stroke are classified into categories, but it is also possible to classify only the shape. For example, when gestures that do not distinguish size are used, there is no need to recognize the stroke size. Gestures are described later.

The onomatopoeia assignment method is described in more detail using the example shown in FIG. 3A. First, the method of assigning onomatopoeia to each shape category is described. The number of beats in the onomatopoeia is increased or decreased according to the number of line segments constituting the stroke. Here, the number of beats is the number of times the basic beat of the onomatopoeia (such as "さ" (sa) or "す" (su), described below) is repeated. For example, when the stroke size is "small", the basic beat is "さ" (sa). The basic beat is repeated as many times as there are line segments in the stroke, and the beat "っ" (the small kana marking an abrupt stop) is then appended to form the onomatopoeia. When the stroke consists of two line segments, "さ" is repeated twice and "っ" is appended, giving the onomatopoeia "ささっ". However, the number of basic beats need not always equal the number of line segments. For example, the same onomatopoeia may be assigned to all strokes consisting of five or more line segments. When the number of basic beats equals the number of line segments, the shape of the stroke can easily be imagined from the spoken onomatopoeia.

Next, the method of assigning onomatopoeia to each size category is described. A different basic beat is used depending on the stroke size. For example, the basic beat for a "small" stroke is "さ" (sa), and the basic beat for a "large" stroke is "す" (su). This makes the recognized stroke size easy to confirm. It is also advisable to vary the beats appended after the repeated basic beat. For example, the one-beat "っ" is appended for a "small" stroke, and the three-beat "ぅ〜っ" is appended for a "large" stroke. When the stroke consists of two line segments and its size is "large", the onomatopoeia is "すすぅ〜っ". The onomatopoeia for a "large" stroke thus has more beats than the onomatopoeia for a "small" stroke, so the user can easily imagine the stroke size from the spoken onomatopoeia.
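The assignment rule for FIG. 3A can be sketched as a small string-building function. This is an illustration of the rule as described, not the dictionary lookup the device performs; the names `BASE_BEAT`, `CLOSING_BEATS`, and `build_onomatopoeia` are hypothetical.

```python
# Basic beat and closing beats per size category, as given in the description.
BASE_BEAT = {"small": "さ", "large": "す"}
CLOSING_BEATS = {"small": "っ", "large": "ぅ〜っ"}


def build_onomatopoeia(segments: int, size: str) -> str:
    """Repeat the basic beat once per line segment, then append the closing beats."""
    if segments < 1:
        raise ValueError("a stroke has at least one line segment")
    return BASE_BEAT[size] * segments + CLOSING_BEATS[size]
```

With this rule, a small two-segment stroke yields "ささっ" and a large one-segment stroke yields "す" followed by the three-beat closing, which matches the examples in the text. The description also permits capping the repetition (for example, one shared onomatopoeia for five or more segments); the sketch omits that option.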

The example above uses onomatopoeic (sound-imitating) words as the onomatopoeia. Mimetic words may be used instead. For example, strokes may be categorized according to whether or not their shape includes an arc. For strokes that include an arc, the onomatopoeia is the mimetic word "ふわっ" or "ふわふわっ" (both suggesting softness or fluffiness), where "ふわっ" is used when the size is "small" and "ふわふわっ" when the size is "large". Strokes that do not include an arc are assigned the same onomatopoeic words as in FIG. 3A.

As described above, onomatopoeia such as onomatopoeic words and mimetic words can be assigned to the shape and size of a stroke. The shape and size of the stroke can easily be imagined from the assigned onomatopoeia. The user can therefore confirm the recognized stroke shape and size by listening to the spoken onomatopoeia. Using onomatopoeia also makes expressive audio output possible. FIG. 3B shows an example of the onomatopoeia dictionary 153 according to the second embodiment and is described in the second embodiment below.

The stroke holding unit 159 holds the sequence of input coordinates extracted from the pen events. The stroke wording acquisition unit 160 acquires the stroke wording corresponding to the recognized shape from the stroke dictionary 152. The stroke wording is wording that describes the recognized stroke shape. The acquired stroke wording may be modified in some way at this point. The acquired stroke wording is sent to the audio output unit 158.

Meanwhile, the gesture processing unit 162 detects gesture breaks among the multiple input strokes. The detection method is described later. One gesture is composed of one or more strokes.

An example of the gesture dictionary 161 is described with reference to FIG. 4. For example, the "four-image display" gesture is composed of two strokes, "horizontal line 201" and "vertical line 202". The "single-image display" gesture is composed of one stroke, "square 206". The sequences of input coordinates corresponding to the strokes between detected gesture breaks are retrieved from the stroke holding unit 159. The retrieved sequences of input coordinates are sent to the gesture recognition unit 163.

A supplementary note on the example of the gesture dictionary 161 shown in FIG. 4: the dotted rectangles included in the "four-image display" and "single-image display" gestures are guides representing the position and size of the touch panel. These gestures must be input at the position and size indicated by the guide. A gesture without a guide is input small, at an arbitrary position on the touch panel.

The gesture recognition unit 163 recognizes a gesture using the sequences of input coordinates sent from the gesture processing unit 162. The gesture recognition method is described later. The gesture wording acquisition unit 164 acquires the gesture wording corresponding to the recognized gesture from the gesture dictionary 161. The retrieved gesture wording may be modified in some way at this point. The acquired gesture wording is sent to the audio output unit 158. The gesture execution unit 165 executes the command corresponding to the recognized gesture. The help unit 166 acquires the correct way of writing the gesture when the likelihood of the recognized gesture is below a threshold. The likelihood is described in detail later.

The audio output unit 158 outputs as audio the onomatopoeia, stroke wording, gesture wording, or other wording sent to it. Various methods can be used for audio output. For example, speech synthesis can be used. Alternatively, audio corresponding to each onomatopoeia or wording may be recorded in advance and played back. For an onomatopoeia, a sound effect that the onomatopoeia evokes may be recorded in advance. It is also possible to hold MIDI data corresponding to each onomatopoeia or wording and play that MIDI data using a synthesizer or the like.

This concludes the description of the operation of each unit.

Next, the processing procedure of the audio output device according to the present embodiment is described with reference to the flowchart of FIG. 5. FIG. 6 shows an operation example of the handwriting input device of the present embodiment and is used to supplement the description of the flowchart of FIG. 5.

In step S501, the stroke processing unit 154 determines whether a penDown pen event has been received. If it has been received (step S501; YES), the process proceeds to step S502. If it has not been received (step S501; NO), the process proceeds to step S513.

In step S502, the stroke processing unit 154 extracts the input coordinates from the received penDown pen event. The stroke wording acquisition unit 160 determines the approximate position of the extracted input coordinates on the touch panel and acquires the wording corresponding to that position. For example, when the position is near the left edge of the touch panel, the wording "ひだりのはしから〜" ("from the left edge...") is acquired, as shown at ID2 in FIG. 6. A corresponding wording is assumed to be held in advance for each approximate position on the touch panel. The acquired wording is sent to the audio output unit 158.

In step S503, the audio output unit 158 outputs the received wording as audio. The penDown pen event is generated at the start of a stroke, so the wording output here represents the stroke start position as recognized by the audio output device. By listening to the output audio, the user can confirm the start position recognized by the device. The same applies to ID6 in FIG. 6.

In step S504, the stroke processing unit 154 extracts the input coordinates from the pen event. The extracted input coordinates are held by the stroke holding unit 159. In step S505, the stroke processing unit 154 determines whether a penMove pen event has been received. If it has been received (step S505; YES), the process returns to step S504. If it has not been received (step S505; NO), the process proceeds to step S506.

In step S506, the stroke processing unit 154 determines whether a penUp pen event has been received. If one has been received (step S506; YES), the process proceeds to step S507. If not (step S506; NO), the process returns to step S505. A penUp pen event marks the end of a stroke; that is, the sequence of pen events received in steps S501 to S506 constitutes one stroke.
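The stroke segmentation of steps S501 to S506 can be sketched as below. The `PenEvent` type and the function name are hypothetical, and the description does not say whether the penUp coordinates are stored, so this sketch leaves them out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

Point = Tuple[float, float]

@dataclass
class PenEvent:
    kind: str  # "penDown", "penMove", or "penUp"
    x: float
    y: float

def split_into_strokes(events: Iterable[PenEvent]) -> List[List[Point]]:
    """Group a pen-event stream into strokes delimited by penDown ... penUp."""
    strokes: List[List[Point]] = []
    current: Optional[List[Point]] = None
    for ev in events:
        if ev.kind == "penDown":
            current = [(ev.x, ev.y)]            # a new stroke begins
        elif ev.kind == "penMove" and current is not None:
            current.append((ev.x, ev.y))        # S504: hold the coordinates
        elif ev.kind == "penUp" and current is not None:
            strokes.append(current)             # penUp closes the stroke
            current = None
    return strokes
```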

In step S507, the stroke recognition unit 155 recognizes the shape and size of the input stroke. In step S508, the stroke classification unit 156 classifies the input stroke into a category using the recognized shape and size. In step S509, the onomatopoeia selection unit 157 selects the onomatopoeia corresponding to that category from the onomatopoeia dictionary 153. The selected onomatopoeia is sent to the audio output unit 158. When strokes are input in succession, a simplified onomatopoeia may be selected instead, for example one with fewer basic beats.

In step S510, the stroke word acquisition unit 160 acquires, from the stroke dictionary 152, the stroke word corresponding to the recognized stroke shape. The acquired stroke word is sent to the audio output unit 158.

In step S511, the audio output unit 158 outputs the received onomatopoeia as audio. This corresponds to the scene of ID4 in FIG. 6. Next, the audio output unit 158 outputs the received stroke word as audio. This corresponds to the scene of ID5 in FIG. 6. The scene of ID5 is an example in which the stroke word acquisition unit 160 has modified the retrieved wording: the particle 「と」 ("to") is inserted at the beginning so that the onomatopoeia and the stroke word connect naturally. What is output here is the device's recognition result for the input stroke. The onomatopoeia and the stroke word are sent to the audio output unit 158 separately, as 「すぅ〜っ」 and 「と、よこぼ〜」, and never together as 「すぅ〜っと、よこぼ〜」. This makes it easier to avoid redundant audio output, as described later. In the scene of ID8 in FIG. 6, the onomatopoeia and the stroke word appear together, but this merely combines two separately output pieces of audio into one scene.

Next, in step S512, a timer is started. This timer is used to detect the boundary between gestures.

In step S513, the gesture processing unit 162 determines whether a fixed time has elapsed since the timer was started. If it has (step S513; YES), the process proceeds to step S514. If it has not (step S513; NO), the process proceeds to step S501. When the fixed time elapses after the last stroke is input, this is treated as a gesture boundary, and the one or more strokes input up to that point are together regarded as one gesture.
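The timer-based gesture boundary can be illustrated offline, on timestamps, rather than with a live timer. This is a sketch under assumptions: each stroke is represented by hypothetical `(start, end)` times in seconds, and the timeout is measured from the end of one stroke to the start of the next, which simplifies the flow of steps S512 and S513.

```python
from typing import List, Optional, Sequence, Tuple

def group_into_gestures(stroke_times: Sequence[Tuple[float, float]],
                        timeout: float) -> List[List[int]]:
    """Return lists of stroke indices, one list per gesture."""
    gestures: List[List[int]] = []
    current: List[int] = []
    prev_end: Optional[float] = None
    for index, (start, end) in enumerate(stroke_times):
        # A pause of at least `timeout` after the previous stroke ends the gesture.
        if prev_end is not None and start - prev_end >= timeout:
            gestures.append(current)
            current = []
        current.append(index)
        prev_end = end
    if current:
        gestures.append(current)
    return gestures
```

With a timeout of 1.0 second, strokes at (0.0, 0.5), (0.8, 1.2), and (3.0, 3.4) fall into two gestures: the first two strokes, then the third.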

In step S514, the timer is stopped. In step S515, the gesture recognition unit 163 recognizes the input gesture. In step S516, the gesture word acquisition unit 164 acquires, from the gesture dictionary 161, the gesture word that describes the command corresponding to the recognized gesture. The acquired gesture word is sent to the audio output unit 158.

In step S517, the audio output unit 158 outputs the received gesture word as audio. This step corresponds to the scene of ID9 in FIG. 6. What is output here is the device's recognition result for the input gesture. The gesture word is thus output after the onomatopoeia and stroke words for the individual input strokes have been output.

In step S518, the gesture execution unit 165 executes the command corresponding to the recognized gesture. In the example of FIG. 6, the 「４枚表示」 ("four-image display") gesture is recognized, so in the scene of ID10 the command that switches the screen to the four-image display is executed.

Through the above processing, the following audio is output in step with a user's gesture input such as that shown in FIG. 6: 「ひだりのはしから〜。すぅ〜っとよこぼう〜。こんどはうえから〜。すぅ〜っとたてぼ〜。４まいひょうじ〜」 (roughly, "From the left edge ~. Swoosh, a horizontal bar ~. Now from the top ~. Swoosh, a vertical bar ~. Four-image display ~."). This audio explains how to write the "four-image display" gesture, and it is output as the gesture is being input. The user therefore hears how to write the gesture many times, which helps the writing method stick in memory. If the wording is also set to a melody, the user can enjoy learning how to write gestures in the manner of a drawing song.

Next, the procedure for recognizing the stroke shape in step S507 is described in detail with reference to the flowchart of FIG. 7. In step S701, the stroke recognition unit 155 matches the sequence of input coordinates of the input stroke against the reference stroke of each recognition candidate and obtains a likelihood for each candidate. The recognition candidates are the shapes contained in the stroke dictionary 152. A reference stroke is data describing how a model stroke is written, and it is held as vector data or coordinate data.

In step S702, it is determined whether the maximum likelihood obtained in step S701 is equal to or greater than a predetermined threshold. If it is (step S702; YES), the process proceeds to step S706. If it is smaller than the threshold (step S702; NO), the process proceeds to step S703. A maximum likelihood below the threshold means that the input stroke did not match any stroke registered in the stroke dictionary 152, that is, that recognition of the stroke shape has failed.

In step S703, the vertices of the input stroke are found using a known technique. For example, the stroke shown in FIG. 8(a) has two vertices. In step S704, the input stroke is divided into line segments at the vertices, and the number n of line segments constituting the stroke is obtained. In step S705, the shape "unknown n" is taken as the recognition result. In step S706, the recognition candidate with the maximum likelihood is taken as the recognition result.

This processing has the following effect. Even when shape recognition using the stroke dictionary 152 fails, a shape determined by the number of line segments constituting the stroke can still be returned as the result. As shown in FIG. 3(a), the onomatopoeia dictionary 153 classifies shapes into categories by the number of line segments constituting the stroke. Therefore, even when recognition of the input stroke's shape fails, an onomatopoeia can be selected and output as audio on the same basis as when recognition succeeds.
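The decision logic of steps S702 to S706 can be sketched as below. The matching step (S701) and the vertex detection (S703, S704) are outside the sketch: the likelihoods are passed in as a plain dictionary, and `count_segments` is a hypothetical callable standing in for the "known technique". The label format `"unknown n"` follows the description; the shape names used with it are illustrative.

```python
from typing import Callable, Dict

def recognize_shape(likelihoods: Dict[str, float],
                    threshold: float,
                    count_segments: Callable[[], int]) -> str:
    """Return the best-matching shape, or 'unknown n' when matching fails."""
    if likelihoods:
        best = max(likelihoods, key=likelihoods.get)
        if likelihoods[best] >= threshold:      # S702 YES -> S706
            return best
    # S702 NO -> S703..S705: fall back to the number of line segments.
    return f"unknown {count_segments()}"
```

Passing the segment count as a callable means the vertex detection runs only when dictionary matching has failed, mirroring the order of the flowchart.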

The procedure for outputting the onomatopoeia and the stroke word as audio in step S511 of FIG. 5 is described in detail with reference to the flowchart of FIG. 9. In step S901, the stroke recognition unit 155 determines whether the stroke input this time is equal to the stroke input immediately before it. Two strokes are judged equal when their shapes and sizes are equal. If they are equal (step S901; YES), the process proceeds to step S902. If not (step S901; NO), the process proceeds to step S909. In step S902, the audio output unit 158 determines whether the onomatopoeia for the previous stroke is currently being output. If it is (step S902; YES), the process proceeds to step S903. If it is not, that is, if its output has finished (step S902; NO), the process proceeds to step S906.

In step S903, the audio output unit 158 cancels the audio output of the stroke word for the previous stroke. The stroke word acquisition unit 160 then acquires the simplified stroke word corresponding to the shape of the previous stroke and sends it to the audio output unit 158. In other words, for the previous stroke, the stroke word acquired in step S510 is replaced by a simplified stroke word, which is output instead. Simplified stroke words are held in advance. For example, the simplified stroke word for the shape "vertical line 202" is 「たて」 ("tate", "vertical").

In step S904, the audio output unit 158 cancels the audio output of the onomatopoeia for the current stroke. In step S905, the audio output unit 158 cancels the audio output of the stroke word for the current stroke. The stroke word acquisition unit 160 then acquires the simplified stroke word corresponding to the shape of the current stroke and sends it to the audio output unit 158. In other words, for the current stroke, the stroke word acquired in step S510 is replaced by a simplified stroke word, which is output instead.

In step S906, the audio output unit 158 determines whether the stroke word for the previous stroke is currently being output. If it is (step S906; YES), the process proceeds to step S907. If it is not, that is, if its output has finished (step S906; NO), the process proceeds to step S909. In step S907, the audio output unit 158 cancels the audio output of the onomatopoeia for the current stroke. In step S908, the audio output unit 158 cancels the audio output of the stroke word for the current stroke. The stroke word acquisition unit 160 then acquires a wording that expresses repetition and sends it to the audio output unit 158. In other words, for the current stroke, the stroke word acquired in step S510 is replaced by a repetition wording, which is output instead. Repetition wordings are held in advance. For example, 「２つ〜」 ("two of them ~") is a repetition wording.

In step S909, the onomatopoeia and the stroke word for the current stroke are output as audio. When a stroke whose shape is "vertical line 202" and whose size is "small" is input twice in succession, the normal output is 「さっ、とたてぼ〜。さっ、とたてぼ〜」. This corresponds to the path through step S909. If the same stroke is input twice at a somewhat brisker tempo, the output is 「さっ、とたてぼ〜。２つ〜」. This corresponds to the path through step S908. If the same stroke is input twice at an even brisker tempo, the output is 「さっ、とたて、たて」. This corresponds to the path through step S905. The processing shown in FIG. 9 thus avoids redundant audio output and matches the audio to the tempo of the input. Because the onomatopoeia and the wording for the stroke shape are sent to the audio output unit 158 separately, this kind of wording replacement is easy to implement.
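The branching of FIG. 9 can be condensed into a small decision function. This is a sketch, not the device's implementation: the `Plan` type and all parameter names are hypothetical, playback state is passed in as booleans, and the insertion of the particle 「と」 before a stroke word is left out.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Plan:
    # Simplified word that replaces the previous stroke's pending stroke word (S903).
    replace_previous_word_with: Optional[str]
    # What is output for the current stroke, in order.
    current_utterances: List[str]

def plan_stroke_audio(same_as_previous: bool,
                      prev_onomatopoeia_playing: bool,
                      prev_word_playing: bool,
                      onomatopoeia: str,
                      word: str,
                      simplified_word: str,
                      repetition_word: str) -> Plan:
    if same_as_previous:                                   # S901
        if prev_onomatopoeia_playing:                      # S902 YES
            # S903..S905: both strokes get the simplified word; the current
            # onomatopoeia is dropped. Equal strokes share the same shape,
            # so the same simplified word applies to both.
            return Plan(simplified_word, [simplified_word])
        if prev_word_playing:                              # S906 YES
            return Plan(None, [repetition_word])           # S907, S908
    return Plan(None, [onomatopoeia, word])                # S909
```

For the small vertical line in the description, the three outcomes correspond to 「さっ」 followed by 「とたてぼ〜」 (S909), 「２つ〜」 (S908), and 「たて」 (S905).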

The description above covers the case where two equal strokes, the previous stroke and the current stroke, are input in succession. The same applies when two or more equal strokes are input in succession. That is, when a plurality of equal strokes are input in succession, the stroke word acquisition unit 160 replaces the stroke word with a simplified stroke word, and the audio output unit 158 cancels the audio output of the onomatopoeia for the second and subsequent strokes.

Whether two strokes have been input in succession is determined as follows. If the next stroke is input while the onomatopoeia or the stroke word for the previous stroke is still being output, the strokes are judged to have been input in succession. Specifically, this is determined by the processing of steps S902 and S906.

So far, the case where a plurality of equal strokes are input in succession has been described. The processing can also be applied when unequal strokes are input in succession. In that case, step S901 is skipped, and the processing of step S908 is replaced by the processing of step S905. With this change, redundant audio output is avoided and the audio matches the tempo of the input even when unequal strokes are input in succession.

Next, the processing of steps S515 to S518 is described in more detail with reference to the flowchart of FIG. 10. This flow adds a function that tells the user, by audio, the correct way to write a gesture when the likelihood of the recognized gesture is below a threshold. A low likelihood indicates that something is wrong with how the user wrote the gesture, and outputting the correct writing method as audio points this out to the user.

Steps S516 to S518 in FIG. 10 are the same as the identically numbered steps described for FIG. 5. The process proceeds from step S514 of FIG. 5 to step S1001. The following description assumes the gesture dictionary 161 of FIG. 4.

In step S1001, the gesture recognition unit 163 matches the sequence of input coordinates of the input gesture against the reference gesture of each recognition candidate and calculates a likelihood for each candidate. The recognition candidates (graphic candidates) are the gestures contained in the gesture dictionary 161. A reference gesture is data describing how a model gesture is written, and it is held as vector data or coordinate data. The gesture recognition unit 163 identifies the recognition candidate with the maximum likelihood as the gesture.

In step S1002, the gesture recognition unit 163 determines whether the likelihood of the recognized gesture is equal to or greater than a predetermined threshold. If it is (step S1002; YES), the process proceeds to step S516. If it is smaller than the threshold (step S1002; NO), the process proceeds to step S1003. In step S1003, the audio output unit 158 announces that there is a problem with how the gesture was written, that is, that the input does not correspond to a gesture figure stored in the storage unit. For example, the audio output unit 158 outputs a wording such as 「ジェスチャの書き方に問題があります」 ("There is a problem with how the gesture was written"). In step S1004, the audio output unit 158 determines whether the input stroke sequence is equal to the correct stroke sequence constituting the recognized gesture. Here, a stroke sequence means one or more strokes. The strokes in the two sequences are compared one by one, and the sequences are judged equal if all shapes and sizes are equal. If the stroke sequences are equal (step S1004; YES), the process proceeds to step S1006. If not (step S1004; NO), the process proceeds to step S1005. The correct stroke sequence for each gesture is defined in advance in the gesture dictionary 161. For example, as shown in FIG. 4, the stroke sequence constituting the "print setting (transition to print mode)" gesture consists of two strokes: "shape: square, size: small" and "shape: square, size: small".

The input stroke sequence is the result of the stroke recognition unit 155 recognizing the user's input. Examples of user input are shown in FIGS. 8(b) and 8(c). The stroke sequence for the input in FIG. 8(b) is "shape: square, size: small" and "shape: square, size: small". This is equal to the correct stroke sequence constituting the "print setting (transition to print mode)" gesture. The stroke sequence for the input in FIG. 8(c), on the other hand, is "shape: circle, size: small" and "shape: square, size: small". This is not equal to the correct stroke sequence constituting the "print setting (transition to print mode)" gesture.

In step S1005, the audio output unit 158 acquires the normal correct writing method for the recognized gesture and outputs it as audio. The normal correct writing method consists of the onomatopoeia and stroke word for each stroke constituting the gesture, followed by the gesture word for the gesture. The audio output in this case is the same as the audio that the procedure of FIG. 5 outputs when the gesture is input correctly. For example, the normal correct writing method for the "print setting (transition to print mode)" gesture is 「さっ、としかく〜。さっ、としかく〜。いんさつせってぇ〜」, that is, the onomatopoeia and stroke word for each of the two squares followed by the gesture word for print setting.

In step S1006, the audio output unit 158 acquires the detailed correct writing method for the recognized gesture and outputs it as audio. The detailed correct writing method is the normal correct writing method with added wording that indicates the relative position and size between strokes. For example, the detailed correct writing method for the "print setting (transition to print mode)" gesture is 「さっ、としかく〜。そのしたに〜、よこながに〜。さっ、としかく〜。いんさつせってぇ〜」. Here the part 「そのしたに〜、よこながに〜」 (roughly, "below that ~, wider than tall ~") is the wording that indicates the relative position and size between the strokes.

Specifically, for the inputs of FIGS. 8(b) and 8(c), the gesture recognition unit 163 recognizes the "print setting (transition to print mode)" gesture. However, both inputs have a problem in how they are written, so the likelihood falls below the predetermined threshold. The process therefore proceeds from step S1002 to step S1003.

The stroke sequence of FIG. 8(b) is equal to the correct stroke sequence constituting the "print setting (transition to print mode)" gesture. Therefore, in step S1006, the audio output unit 158 outputs the detailed correct writing method as audio. In the input of FIG. 8(b), each individual stroke matches, but the size of the second stroke is a problem: the second stroke must be wider than the first (see FIG. 4). The detailed correct writing method, which adds the relative position and size between strokes, addresses this problem. Because the individual strokes are correct, the normal correct writing method cannot point it out.

The stroke sequence of FIG. 8(c) is not equal to the correct stroke sequence constituting the "print setting (transition to print mode)" gesture. Therefore, in step S1005, the audio output unit 158 outputs the normal correct writing method as audio. The input of FIG. 8(c) has a problem with the shape of the first stroke, and the normal correct writing method is enough to point this out.

It would also be possible to execute the command corresponding to the "print setting (transition to print mode)" gesture for the inputs of FIGS. 8(b) and 8(c). In that case, however, the user might learn an incorrect way of writing the gesture. When the likelihood of the recognized gesture is low, the audio output unit 158 therefore judges that there is a problem with how the user wrote it and outputs the correct writing method as audio without the command being executed. This prevents the user from learning an incorrect way of writing.
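The feedback flow of FIG. 10 (steps S1002 to S1006) can be summarized in a short sketch. The function name, the action labels it returns, and the representation of a stroke as a `(shape, size)` pair are assumptions made for illustration; the actual wordings come from the dictionaries described above.

```python
from typing import List, Sequence, Tuple

Stroke = Tuple[str, str]  # (shape, size), e.g. ("square", "small")

def gesture_feedback(likelihood: float,
                     threshold: float,
                     input_strokes: Sequence[Stroke],
                     correct_strokes: Sequence[Stroke]) -> List[str]:
    """Return the ordered actions taken for a recognized gesture."""
    if likelihood >= threshold:                          # S1002 YES
        return ["output gesture word", "execute command"]  # S516..S518
    actions = ["announce writing problem"]               # S1003
    if list(input_strokes) == list(correct_strokes):     # S1004
        # Individual strokes match, so the relative position or size is at fault.
        actions.append("output detailed correct writing")  # S1006
    else:
        actions.append("output normal correct writing")    # S1005
    return actions
```

With the correct sequence of two small squares, an input of two small squares at low likelihood (as in FIG. 8(b)) yields the detailed instructions, while a small circle followed by a small square (as in FIG. 8(c)) yields the normal instructions. In both low-likelihood cases the command is not executed.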

(Second Embodiment)
The audio output device according to the present embodiment tells the user, by audio, how it recognized the gesture that the user input. If there is a problem with how the user wrote the gesture, it can also announce the correct way to write it. The device can therefore be operated without a screen. For example, it can be applied to a music player with a touch panel. The user operates the player by writing gestures on the touch panel with a finger, and the device outputs audio to the user through earphones or the like. Because there is no need to look at the screen, the music player can be operated while it stays in a pocket or bag. Because a screen is not essential, the device can also serve, for example, as a user interface for visually impaired users. An application generally has several screens. When operating without looking at the screen, it is desirable to be able to confirm by audio which screen is currently active.

Another example of the onomatopoeia dictionary 153 is described with reference to FIG. 3(b). This is an example for an image viewer with three screens: "preview", "slide show", and "print". The basic beat of the onomatopoeia differs from screen to screen. The user can therefore tell which screen is currently active from the onomatopoeia that the device outputs.

The audio output device according to the present invention is not limited to two-dimensional gestures and can also use spatial (three-dimensional) gestures. In this case, a sensor that detects the position of the hand or the like is used instead of the touch panel. For example, the device may emit light from an LED, measure the time until the light reflected by the hand reaches the sensor, and detect the position from that time.
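The description does not say how the measured time is converted into a position. One common approach, given here only as an assumption, is the time-of-flight relation: with the LED and the sensor at the same location, the distance to the hand is half the round-trip time multiplied by the speed of light.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0

def distance_from_round_trip(seconds: float) -> float:
    """Distance in meters to a reflector, assuming emitter and sensor are co-located."""
    if seconds < 0:
        raise ValueError("round-trip time must be non-negative")
    return SPEED_OF_LIGHT_M_PER_S * seconds / 2.0
```

A round trip of 2 nanoseconds corresponds to roughly 0.3 m. Locating the hand in three dimensions would require several such measurements or a sensor array, which the description leaves open.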

(Other Embodiments)
The present invention can also be realized by the following processing: software (a program) that implements the functions of the embodiments described above is supplied to a system or apparatus via a network or any of various storage media, and a computer (or a CPU, MPU, or the like) of the system or apparatus reads out and executes the program.

Claims

An input means for inputting a series of handwriting from the beginning of drawing to the end of drawing,
Recognizing means for recognizing the shape and size of the handwriting input by the input means;
Classification means for classifying the handwriting recognized by the recognition means into categories for each number of line segments constituting the handwriting and the size of the handwriting,
A selection means for selecting an onomatopoeia or mimicry word corresponding to the category classified by the classification means from an onomatopoeia or mimicry word stored in advance in the storage means;
Output means for outputting the onomatopoeia or mimicry word selected by the selection means as speech;
An audio output device comprising:

When the recognition unit recognizes that the same handwriting as the previous input is input by the input unit during the output of the voice by the output unit,
2. The voice output device according to claim 1, wherein the selection unit selects a simplified onomatopoeia or mimetic word from the storage unit.

A first acquisition unit that acquires, from the storage unit, a first word representing the shape of the handwriting recognized by the recognition unit;
3. The voice output device according to claim 1, wherein the output unit outputs the first word acquired by the first acquisition unit as a voice after outputting the onomatopoeia or the mimetic word as a voice.

When the recognition means recognizes that the same handwriting as the previous input is input by the input means,
The voice output device according to claim 3, wherein the first acquisition unit acquires a simplified word representing a shape of a handwriting.

When the first acquisition unit acquires a simplified word representing the shape of the handwriting,
The voice output device according to claim 4, wherein the output unit does not output the voice of the onomatopoeia or mimicry word corresponding to the same handwriting as the previous input.

The audio output device according to any one of claims 3 to 5, further comprising:
Identification means for identifying one or more handwritings input by the input means, taken as a whole, as a single graphic;
Second acquisition means for acquiring, from the storage means, a second word for executing the operation indicated by the identified graphic;
wherein the output means outputs, as speech, the onomatopoeic word or mimetic word corresponding to the one or more handwritings input by the input means and the first word, and then outputs, as speech, the second word acquired by the second acquisition means.

The audio output device according to claim 6, further comprising calculation means for calculating a likelihood for each of a plurality of graphic candidates corresponding to the whole of the one or more handwritings input by the input means,
wherein the identification means identifies the graphic as the graphic candidate having the maximum likelihood.
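
Identification by maximum likelihood, as recited above, can be sketched as follows. This is only an illustration under assumptions: the function name `identify_graphic` and the representation of the candidates as a dictionary from graphic name to likelihood are hypothetical, and how the likelihoods themselves are calculated is left open.

```python
def identify_graphic(likelihoods: dict) -> tuple:
    """Return (graphic, likelihood) for the candidate with the maximum likelihood.

    `likelihoods` maps each graphic candidate name to its likelihood.
    """
    if not likelihoods:
        raise ValueError("no graphic candidates were supplied")
    # max() with key=likelihoods.get compares candidates by their likelihood.
    best = max(likelihoods, key=likelihoods.get)
    return best, likelihoods[best]
```

Calling `identify_graphic({"circle": 0.2, "house": 0.7, "star": 0.1})` selects the `"house"` candidate together with its likelihood, which a later determination step could compare against a threshold.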

The audio output device according to claim 7, further comprising determination means for determining whether the likelihood of the graphic identified by the identification means is smaller than a threshold,
wherein, when the determination means determines that the likelihood of the graphic identified by the identification means is smaller than the threshold, the output means outputs, as speech, a message indicating that no graphic corresponding to the input graphic is stored in the storage means.

The audio output device according to claim 7, further comprising:
Determination means for determining whether the likelihood of the graphic identified by the identification means is smaller than a threshold;
Execution means for executing processing corresponding to the graphic identified by the identification means;
wherein, when the determination means determines that the likelihood of the graphic identified by the identification means is greater than or equal to the threshold, the execution means executes the processing corresponding to the graphic identified by the identification means.

The audio output device according to claim 7, further comprising determination means for determining whether the likelihood of the graphic identified by the identification means is smaller than a threshold,
wherein, when the determination means determines that the likelihood of the graphic identified by the identification means is smaller than the threshold,
the output means outputs, as speech, how to draw the graphic identified by the identification means.
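
The threshold-dependent behaviour recited in the three preceding claims can be sketched as a single decision function. The claims present these behaviours as separate alternatives; combining them in one function, the threshold value, the table `DRAWING_HINTS`, and the message texts are assumptions made only for illustration.

```python
# Hypothetical threshold on the likelihood of the identified graphic.
LIKELIHOOD_THRESHOLD = 0.5

# Hypothetical stand-in for prestored drawing instructions.
DRAWING_HINTS = {
    "house": "Draw a square, then a triangle on top of it.",
}

NOT_STORED_MESSAGE = "No graphic matching the input is stored."


def respond(graphic: str, likelihood: float,
            threshold: float = LIKELIHOOD_THRESHOLD) -> tuple:
    """Decide what to do with an identified graphic.

    Returns ("execute", graphic) when the likelihood is at or above the
    threshold, otherwise ("speak", text) with the text to output as speech.
    """
    if likelihood >= threshold:
        # Likelihood is high enough: run the processing tied to the graphic.
        return ("execute", graphic)
    hint = DRAWING_HINTS.get(graphic)
    if hint is not None:
        # Likelihood is too low: explain how to draw the identified graphic.
        return ("speak", hint)
    # No hint available: report that no matching graphic is stored.
    return ("speak", NOT_STORED_MESSAGE)
```

With these assumed values, `respond("house", 0.3)` yields the drawing hint, while `respond("house", 0.8)` requests execution of the processing associated with the graphic.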

An input step in which the input means inputs a series of handwriting from the start to the end of the drawing;
A recognition step for recognizing the shape and size of the handwriting input in the input step;
A classification step for classifying the handwriting recognized in the recognition step into a category according to the number of line segments constituting the handwriting and the size of the handwriting;
A selection step in which the selection means selects, from among onomatopoeic words and mimetic words stored in advance in storage means, an onomatopoeic word or mimetic word corresponding to the category determined in the classification step;
An output step in which the output means outputs the onomatopoeic word or mimetic word selected in the selection step as speech;
An audio output method comprising:

A program for causing a computer to execute the audio output method according to claim 11.