JP2015072415A

JP2015072415A - Display device, head-mounted display device, display device control method, and head-mounted display device control method

Info

Publication number: JP2015072415A
Application number: JP2013208872A
Authority: JP
Inventors: 薫千代; Kaoru Sendai
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2013-10-04
Filing date: 2013-10-04
Publication date: 2015-04-16
Anticipated expiration: 2033-10-04
Also published as: JP6364735B2

Abstract

PROBLEM TO BE SOLVED: To provide a display device which allows a user to visually confirm a sound source and a character image representing a voice acquired from the sound source in association. SOLUTION: A transmissive display device comprises: an image display unit which generates image light representing an image, causes the user to visually confirm the image light, and transmits an external scene therethrough; a voice acquisition unit for acquiring the voice; a conversion unit for converting the voice to a character image represented as an image with characters; a specific direction setting unit for setting a specific direction; and a display position setting unit which, on the basis of the specific direction, sets an image display position being a position to visually confirm character image light representing the character image in the visual field of the user.

Description

The present invention relates to a display device.

Head-mounted display devices (head mounted displays, HMDs), which are display devices worn on the head, are known. A head-mounted display device, for example, generates image light representing an image using a liquid crystal display and a light source, and guides the generated image light to the user's eyes using a projection optical system or a light guide plate, thereby causing the user to visually recognize a virtual image. There are two types of head-mounted display device: a transmissive type, in which the user can see the outside scene in addition to the virtual image, and a non-transmissive type, in which the user cannot see the outside scene. Transmissive head-mounted display devices include optical transmissive and video transmissive types.

For transmissive head-mounted display devices, a technique is known that converts sound into a character image representing the sound and causes the user to visually recognize it. For example, Patent Document 1 discloses a technique in which, in a head-mounted display device for hearing-impaired persons, the various sounds acquired by a built-in microphone are shown to the user as character images.

Patent Document 1: JP 2007-334149 A

However, with the technique described in Patent Document 1, although the user of the head-mounted display device can visually recognize a character image representing sound, the position at which the virtual image is formed to show the character image is fixed in the user's field of view, so the character image may obstruct the user's view. In addition, the relationship between a sound source and the character image representing the sound acquired from that sound source is not considered at all. There is also a need to distinguish among multiple types of sound acquired from multiple sound sources. These problems are not limited to head-mounted display devices and are common to display devices in general.

The present invention has been made to solve at least part of the problems described above, and can be realized in the following aspects.

(1) According to one aspect of the present invention, a transmissive display device is provided. The display device includes: an image display unit that generates image light representing an image, causes the user to visually recognize the image light, and transmits the outside scene; a sound acquisition unit that acquires sound; a conversion unit that converts the sound into a character image in which the sound is represented as an image using characters; a specific direction setting unit that sets a specific direction; and a display position setting unit that sets, based on the specific direction, an image display position, which is the position in the user's field of view at which character image light representing the character image is visually recognized. With this display device, the acquired sound can be shown to the user as a character image representing the sound, based on the specific direction set by the user, which improves the user's comprehension of the sound. Moreover, because the image display position of the character image in the user's field of view is set based on the specific direction set by the user, the user can view the set specific direction and the character image in association with each other and can easily recognize the relationship between them, which improves convenience for the user.

(2) In the display device of the above aspect, the display position setting unit may set the image display position so that it does not overlap the position in the user's field of view corresponding to the specific direction. With this display device, the user can easily recognize the sound source and the character image representing the sound acquired from it in association with each other.

(3) In the display device of the above aspect, the display position setting unit may set the image display position at a position other than the center of the user's field of view. With this display device, the user can view the character image representing the sound while viewing the outside scene at the center of the field of view.

(4) In the display device of the above aspect, the sensitivity with which the sound acquisition unit acquires sound may differ according to the direction from the sound source to the sound acquisition unit, and the specific direction may be set based on the sensitivity of the acquired sound. With this display device, the character image representing the loudest sound is shown to the user, so that even when the user has difficulty hearing external sound, the external sound that most requires attention can be conveyed to the user as visual information.
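One possible reading of aspect (4) can be sketched as follows: given a sound level measured for each candidate direction, the direction with the highest level is taken as the specific direction. The function name, the representation of directions as azimuth angles in degrees, and the dictionary input are assumptions made for illustration; the text does not say how per-direction levels are obtained or compared.

```python
from typing import Mapping


def pick_specific_direction(levels_by_azimuth: Mapping[float, float]) -> float:
    """Return the azimuth (degrees) whose measured sound level is highest.

    Hypothetical helper: the source only states that the specific direction
    is set based on the sensitivity of the acquired sound.
    """
    if not levels_by_azimuth:
        raise ValueError("no direction measurements available")
    # Choose the key (direction) with the largest associated level.
    return max(levels_by_azimuth, key=lambda az: levels_by_azimuth[az])
```

For example, with levels measured at -30, 0, and 45 degrees, the function returns whichever of those azimuths has the largest level.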

(5) In the display device of the above aspect, the specific direction may be the direction from the sound acquisition unit to the sound source. With this display device, the sound acquired from the sound source can be shown to the user in association with that sound source, which improves the user's comprehension of the acquired sound.

(6) The display device of the above aspect may include an image acquisition unit that acquires images of the outside scene at a plurality of points in time, and the display position setting unit may set the image display position based on the change in the images of the outside scene across the plurality of points in time and on the specific direction. With this display device, the user recognizes the sound source direction in more detail, and the position at which the character image is viewed is set near the sound source direction, so the user can more readily associate the sound source direction with the character image representing the sound emitted by the target sound source.

(7) The display device of the above aspect may further include a distance specifying unit that specifies the distance from the display device to the sound source, and the display position setting unit may, based on the specified distance, perform at least one of changing the size of the character image light and setting the image display position. With this display device, at least one of a change in the size of the character image and the setting of the image display position is performed according to the distance from the user to the target sound source, so the user can recognize the distance to the target sound source as visual information.
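As a minimal sketch of how a specified distance could drive the size of the character image in aspect (7), the function below maps distance linearly to a scale factor. The linear mapping, the choice that nearer sources give larger characters, and every parameter name and default value are assumptions; the text only says that the size is changed based on the distance.

```python
def character_scale(distance_m: float,
                    near_m: float = 1.0,
                    far_m: float = 10.0,
                    max_scale: float = 2.0,
                    min_scale: float = 0.5) -> float:
    """Map a source distance to a character-image scale factor.

    Assumed policy (not stated in the source): the scale falls linearly
    from max_scale at near_m to min_scale at far_m and is clamped outside
    that range.
    """
    if near_m >= far_m:
        raise ValueError("near_m must be smaller than far_m")
    # Clamp the distance into [near_m, far_m] before interpolating.
    d = min(max(distance_m, near_m), far_m)
    t = (d - near_m) / (far_m - near_m)
    return max_scale + t * (min_scale - max_scale)
```

With the default values, a source at the midpoint of the range (5.5 m) gets a scale halfway between the two limits.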

(8) The display device of the above aspect may further include a distance specifying unit that specifies the distance from the display device to the sound source, and the conversion unit may change the type of the character image based on the specified distance. With this display device, the type of character image shown to the user changes according to the distance from the user to the target sound source, so the user can recognize the distance to the target sound source as visual information.

(9) In the display device of the above aspect, the sensitivity with which the sound acquisition unit acquires sound volume may differ according to the direction from the sound source to the sound acquisition unit, and the display position setting unit may set the image display position based on the volume of sound from the same sound source, which is acquired differently for each direction. With this display device, even when no image of the outside scene is acquired, the sound source direction can be set and the acquired character image can be shown to the user near the target sound source, which improves convenience for the user.

(10) In the display device of the above aspect, the sensitivity with which the sound acquisition unit acquires sound may differ according to the direction from the sound source to the sound acquisition unit, and the sound acquisition unit may be oriented so that its sensitivity is highest for sound from the specific direction. With this display device, the sound acquisition unit acquires sound from the specific direction with high sensitivity and acquires sound less readily the further the direction deviates from the specific direction, which improves the accuracy of the sound acquired from the specific direction.

(11) The display device of the above aspect may further include: a sound identification unit that distinguishes different types of sound acquired from a plurality of sound sources by type; and an operation unit that accepts operations by the user. The specific direction setting unit may specify, based on the operation, a specific sound source direction, which is the direction from the sound acquisition unit to the sound source from which one of the plurality of sounds was acquired, and the display position setting unit may set the position in the user's field of view at which the image light representing that one sound is recognized to a position corresponding to the specific sound source direction. With this display device, even in a conversation among several people, the character image representing the sound acquired from the specific sound source direction is viewed near the specific sound source direction in the user's field of view VR. The user can therefore associate, by sight as well as by hearing, the specific sound source direction with the character image representing the sound acquired from it, and can follow the conversation more easily.

(12) In the display device of the above aspect, the display position setting unit may set the position in the user's field of view at which the image light representing the one sound is recognized to a position that does not overlap any of the positions corresponding to the plurality of specific sound source directions. With this display device, in the user's field of view VR, none of the character images representing sounds acquired from the plurality of specific sound source directions overlaps any of the sound source directions, so the user can view each specific sound source direction and the character image representing the sound acquired from it in closer association.

(13) In the display device of the above aspect, the image display unit may generate, from the plurality of sounds, image light that differs for each type of sound and cause the user to recognize the image light for each type of sound as a virtual image; and the operation may be an operation that specifies, from among the image light for each type of sound recognized in the user's field of view, the image light corresponding to the sound from one specific sound source direction. With this display device, the user can easily set the specific sound source direction and the character image representing the sound acquired from it with a simple operation.

(14) In the display device of the above aspect, the image display unit may cause the user to recognize the image light as a virtual image after a predetermined delay from the time at which the sound acquisition unit acquired the sound. With this display device, if the user momentarily misses the sound and also overlooks the character image representing it, the character image that appears in the user's field of view later than the sound improves the user's understanding of what came before and after the acquired sound.
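The delayed presentation in aspect (14) can be illustrated with a small timestamped queue: each recognized text is stored with its acquisition time and released only once the predetermined delay has elapsed. The class name, method names, and the use of seconds as the time unit are hypothetical; the text does not describe a data structure.

```python
from collections import deque
from typing import Deque, List, Tuple


class DelayedCaptionQueue:
    """Release caption text a fixed delay after the sound was acquired.

    Illustrative only: the source states the delay, not how it is realized.
    """

    def __init__(self, delay_s: float) -> None:
        if delay_s < 0:
            raise ValueError("delay_s must be non-negative")
        self.delay_s = delay_s
        self._pending: Deque[Tuple[float, str]] = deque()

    def push(self, acquired_at_s: float, text: str) -> None:
        # Entries are assumed to arrive in order of acquisition time.
        self._pending.append((acquired_at_s, text))

    def pop_due(self, now_s: float) -> List[str]:
        """Return, oldest first, every text whose delay has elapsed."""
        due: List[str] = []
        while self._pending and now_s - self._pending[0][0] >= self.delay_s:
            due.append(self._pending.popleft()[1])
        return due
```

A display loop would call `pop_due` on each frame with the current time and render whatever text is returned.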

(15) The display device of the above aspect may further include a line-of-sight direction estimation unit that estimates the user's line-of-sight direction; the image display unit may cause the user to visually recognize the image light while worn on the user's head; and the display position setting unit may set the image display position based on the relationship between the specific direction and the line-of-sight direction. With this display device, whether or not the specific direction is within the user's field of view is determined according to the deviation between the specific direction and the line-of-sight direction, so the user can readily associate the specific direction with the character image representing the acquired sound.

(16) In the display device of the above aspect, the display position setting unit may set the image display position near the position in the user's field of view corresponding to the specific direction when a specific angle, which is the angle formed by the line-of-sight direction and the specific direction, is less than a first threshold, and may set the image display position independently of the specific direction when the specific angle is equal to or greater than the first threshold. With this display device, when the specific direction is visible as part of the outside scene in the user's field of view, the image display position is set near the specific direction, which improves the user's comprehension of the acquired sound.
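The threshold rule of aspect (16) can be sketched as below, assuming the line-of-sight direction and the specific direction are given as 3D vectors and display positions as normalized 2D coordinates. The function names, the 30-degree default for the first threshold, and the fixed offset used to place the text near the source are all assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def angle_between_deg(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in degrees between two non-zero direction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_angle))


def choose_display_position(gaze: Sequence[float],
                            specific: Sequence[float],
                            source_pos: Point,
                            default_pos: Point,
                            first_threshold_deg: float = 30.0,
                            offset: Point = (0.0, -0.125)) -> Point:
    """Place the text near the source if the specific angle is below the
    first threshold; otherwise use a position unrelated to the direction."""
    if angle_between_deg(gaze, specific) < first_threshold_deg:
        return (source_pos[0] + offset[0], source_pos[1] + offset[1])
    return default_pos
```

Here `source_pos` stands for the position in the field of view that corresponds to the specific direction, and `default_pos` for a position chosen without reference to it.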

(17) The display device of the above aspect may further include an image acquisition unit that acquires an image of the outside scene. When a specific angle, which is the angle formed by the line-of-sight direction and the specific direction, is equal to or greater than a second threshold, the image display unit may generate specific direction image light, which is image light representing the image of the specific direction acquired by the image acquisition unit, and cause the user to recognize it as a virtual image; when the specific angle is less than the second threshold, the image display unit may not generate the specific direction image light. When the specific angle is equal to or greater than the second threshold, the display position setting unit may set the position at which the specific direction image light is recognized so that it does not overlap the image display position and is near the image display position; when the specific angle is less than the second threshold, the display position setting unit may set the image display position near the position in the user's field of view corresponding to the specific direction. With this display device, even when the specific direction is not visible in the outside scene within the user's field of view, the captured image of the specific direction and the image display position are viewed close together, which improves the user's comprehension of the acquired sound.

(18) The display device of the above aspect may further include a sound identification unit that distinguishes the acquired sound from a specific sound that differs from the acquired sound, and the conversion unit may convert the acquired sound and the specific sound into different types of character image. With this display device, the character image representing the acquired sound and the character image representing a sound different from the acquired sound are displayed as different types of character image, so the user can visually recognize the difference between the sound sources emitting the sounds.

(19) The display device of the above aspect may further include a communication unit that acquires a sound signal by communication, and the specific sound may be sound output based on the sound signal acquired by the communication unit. With this display device, the user can listen not only to the acquired external sound but also to sound representing various sound signals acquired by communication, and can recognize the sound acquired by communication as visual information.

(20) According to another aspect of the present invention, a transmissive head-mounted display device is provided. The head-mounted display device includes: an image display unit that generates image light representing an image, causes the user to visually recognize the image light while worn on the user's head, and transmits the outside scene; a sound acquisition unit that acquires sound; a conversion unit that converts the sound into a character image in which the sound is represented as an image using characters; a line-of-sight direction estimation unit that estimates the user's line-of-sight direction; and a display position setting unit that sets, based on a change in the line-of-sight direction, an image display position, which is the position in the user's field of view at which character image light representing the character image is visually recognized. With this head-mounted display device, the image display position is set, in accordance with the user's viewing direction, at a position that does not obstruct the user's view, which improves usability.

(21) In the head-mounted display device of the above aspect, the sensitivity with which the sound acquisition unit acquires sound volume may differ according to the direction from the sound source to the sound acquisition unit, and the display position setting unit may set the image display position based on the volume of sound from the same sound source, which is acquired differently for each direction. With this head-mounted display device, even when no image of the outside scene is acquired, the sound source direction can be set and the acquired character image can be shown to the user near the target sound source, which improves convenience for the user.

(22) In the head-mounted display device of the above aspect, the line-of-sight direction estimation unit may estimate a specific value that is at least one of the angular velocity and the amount of angular change of the line-of-sight direction, taking as reference the display state in which the character image light is recognized by the user; and the display position setting unit may set the image display position outside the central part of the user's field of view when the specific value exceeds a certain value. With this head-mounted display device, the image display position is set, in accordance with changes in the user's viewing direction, at a position that does not obstruct the user's view, which improves usability.

(23) In the head-mounted display device of the above aspect, the line-of-sight direction estimation unit may estimate the direction of gravity and the horizontal direction perpendicular to the direction of gravity; and the display position setting unit may set the image display position in the user's field of view based on the specific value in the display state relative to the direction of gravity and the horizontal direction. With this head-mounted display device, the image display position is set, in accordance with changes in the user's viewing direction relative to the direction of gravity or the horizontal direction, at a position that does not obstruct the user's view, which improves usability.

(24) In the head-mounted display device of the above aspect, the display position setting unit may set the image display position outside the central part of the user's field of view when the amount of angular change is equal to or greater than a third threshold, and may set the image display position at a preset position in the user's field of view when the amount of angular change is less than the third threshold. With this head-mounted display device, the image display position is set, in accordance with the angular change in the user's viewing direction, at a position that does not obstruct the user's view, which improves usability.

(25) In the head-mounted display device of the above aspect, the display position setting unit may set the image display position in the central part of the user's field of view when a predetermined time has elapsed with the amount of angular change remaining less than a fourth threshold, and may set the image display position outside the central part of the user's field of view when the amount of angular change is equal to or greater than the fourth threshold. With this head-mounted display device, when it is determined that the user is paying attention to the character image being viewed, the image display position is automatically changed to a position that is easy for the user to see, which improves usability.
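The dwell-time rule of aspect (25) can be sketched as a small state machine. It assumes the display is updated periodically with the current time and the latest amount of angular change; the class name, the string labels for the two placements, the initial placement, and the choice to keep the current placement while the dwell time is still running are assumptions not stated in the text.

```python
from typing import Optional


class CaptionPlacer:
    """Move the caption to the center once the head has been still long enough.

    Illustrative sketch only; threshold and dwell values are supplied by
    the caller.
    """

    def __init__(self, fourth_threshold_deg: float, dwell_s: float) -> None:
        self.fourth_threshold_deg = fourth_threshold_deg
        self.dwell_s = dwell_s
        self.placement = "periphery"  # assumed initial placement
        self._still_since: Optional[float] = None

    def update(self, now_s: float, angle_change_deg: float) -> str:
        if angle_change_deg >= self.fourth_threshold_deg:
            # Large change in viewing direction: keep the center clear.
            self._still_since = None
            self.placement = "periphery"
            return self.placement
        if self._still_since is None:
            # First sample below the threshold starts the dwell timer.
            self._still_since = now_s
        if now_s - self._still_since >= self.dwell_s:
            self.placement = "center"
        return self.placement
```

The same structure would apply to the angular-velocity variant by feeding in angular velocity in place of the amount of angular change.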

(26) In the head-mounted display device of the above aspect, the display position setting unit may set the image display position outside the central part of the user's field of view when the angular velocity is equal to or greater than a fifth threshold, and may set the image display position at a preset position in the user's field of view when the angular velocity is less than the fifth threshold. With this head-mounted display device, the image display position is set, in accordance with the angular velocity of the user's viewing direction, at a position that does not obstruct the user's view, which improves usability.

(27) In the head-mounted display device of the above aspect, the display position setting unit may set the image display position in the central part of the user's field of view when a predetermined time has elapsed with the angular velocity remaining less than a sixth threshold, and may set the image display position outside the central part of the user's field of view when the angular velocity is equal to or greater than the sixth threshold. With this head-mounted display device, when it is determined that the user is paying attention to the character image being viewed, the image display position is automatically changed to a position that is easy for the user to see, which improves usability.

Not all of the plurality of constituent elements of each aspect of the invention described above are essential. To solve some or all of the problems described above, or to achieve some or all of the effects described in this specification, some of the constituent elements may, as appropriate, be changed, deleted, replaced with other new constituent elements, or have part of their limiting content deleted. Likewise, to solve some or all of the problems described above, or to achieve some or all of the effects described in this specification, some or all of the technical features included in one aspect of the invention described above may be combined with some or all of the technical features included in another aspect of the invention described above to form an independent aspect of the invention.

For example, one aspect of the invention can be realized as a device that includes one or more, or all, of five elements: a sound acquisition unit, a conversion unit, an image display unit, a specific direction setting unit, and a display position setting unit. That is, the device may or may not have the sound acquisition unit. The device may or may not have the conversion unit. The device may or may not have the image display unit. The device may or may not have the specific direction setting unit. The device may or may not have the display position setting unit. The image display unit may, for example, generate image light representing the character image, allow the user to view the image light, and transmit the outside scene. The sound acquisition unit may, for example, acquire sound. The conversion unit may, for example, convert the sound into a character image that represents the sound as an image of characters. The specific direction setting unit may, for example, set a specific direction. The display position setting unit may, for example, set, based on the specific direction, an image display position, which is the position in the user's visual field at which the character image light representing the character image is viewed. Such a device can be realized as a display device, for example, but can also be realized as a device other than a display device. According to such an aspect, at least one of various problems can be solved, such as improved operability of the device, easier attachment and detachment, integration of the device, and easier manufacture. Some or all of the technical features of each aspect of the display device described above can be applied to this device.

The invention can also be realized in various forms other than a display device. For example, it can be realized in the form of a head-mounted display device, a control method for a display device and a head-mounted display device, a display system and a head-mounted display system, a computer program for realizing the functions of the display system and the head-mounted display system, a recording medium on which the computer program is recorded, or a data signal that includes the computer program and is embodied in a carrier wave.

An explanatory diagram showing the external configuration of the head-mounted display device 100.
A block diagram functionally showing the configuration of the head-mounted display device 100.
An explanatory diagram showing the flow of the image display processing of acquired sound.
An explanatory diagram showing an example of the user's visual field VR.
An explanatory diagram showing an example of an outside scene image captured by the camera 61.
An explanatory diagram showing an example of the user's visual field VR.
An explanatory diagram showing an example of the user's visual field VR.
An explanatory diagram showing the flow of the image display processing of acquired sound in the second embodiment.
An explanatory diagram showing an example of the user's visual field VR.
An explanatory diagram showing an example of the outside scene image BIM captured by the camera 61.
An explanatory diagram showing an example of the user's visual field VR.
An explanatory diagram showing an example of the user's visual field VR.
An explanatory diagram functionally showing the configuration of the head-mounted display device 100b in the third embodiment.
An explanatory diagram showing the flow of the image display processing of acquired sound in the third embodiment.
An explanatory diagram showing an example of the user's visual field VR.
An explanatory diagram showing an example of the user's visual field VR.
An explanatory diagram showing the external configuration of a head-mounted display device in a modification.

Next, embodiments of the invention will be described in the following order.
A. First embodiment:
A-1. Configuration of head-mounted display device:
A-2. Image display processing of acquired sound:
B1. Second embodiment:
B2. Third embodiment:
C. Modifications:

A. First embodiment:
A-1. Configuration of head-mounted display device:

FIG. 1 is an explanatory diagram showing the external configuration of the head-mounted display device 100. The head-mounted display device 100 is a display device worn on the head and is also called a head mounted display (HMD). The head-mounted display device 100 of this embodiment is an optically transmissive head-mounted display device that allows the user to view a virtual image and, at the same time, directly view the outside scene. In this specification, the virtual image that the user views through the head-mounted display device 100 is also called a "display image" for convenience. Emitting image light generated based on image data is also referred to as "displaying an image".

The head-mounted display device 100 includes an image display unit 20 that allows the user to view a virtual image while mounted on the user's head, and a control unit 10 (controller 10) that controls the image display unit 20.

The image display unit 20 is a wearable body that is mounted on the user's head and, in this embodiment, has the shape of eyeglasses. The image display unit 20 includes a right holding unit 21, a right display drive unit 22, a left holding unit 23, a left display drive unit 24, a right optical image display unit 26, a left optical image display unit 28, a camera 61, and a microphone 63. The right optical image display unit 26 and the left optical image display unit 28 are arranged so as to be positioned in front of the user's right and left eyes, respectively, when the user wears the image display unit 20. One end of the right optical image display unit 26 and one end of the left optical image display unit 28 are connected to each other at a position corresponding to the area between the user's eyebrows when the user wears the image display unit 20.

The right holding unit 21 is a member that extends from the end ER, which is the other end of the right optical image display unit 26, to a position corresponding to the side of the user's head when the user wears the image display unit 20. Similarly, the left holding unit 23 is a member that extends from the end EL, which is the other end of the left optical image display unit 28, to a position corresponding to the side of the user's head when the user wears the image display unit 20. The right holding unit 21 and the left holding unit 23 hold the image display unit 20 on the user's head in the manner of the temples of eyeglasses.

The right display drive unit 22 and the left display drive unit 24 are arranged on the side that faces the user's head when the user wears the image display unit 20. Hereinafter, the right holding unit 21 and the left holding unit 23 are also collectively called simply the "holding units", the right display drive unit 22 and the left display drive unit 24 are also collectively called simply the "display drive units", and the right optical image display unit 26 and the left optical image display unit 28 are also collectively called simply the "optical image display units".

The display drive units 22 and 24 include liquid crystal displays 241 and 242 (hereinafter also called "LCDs 241 and 242"), projection optical systems 251 and 252, and the like (see FIG. 2). The configuration of the display drive units 22 and 24 is described in detail later. The optical image display units 26 and 28, which serve as optical members, include light guide plates 261 and 262 (see FIG. 2) and a light control plate. The light guide plates 261 and 262 are formed of a light-transmissive resin material or the like and guide the image light output from the display drive units 22 and 24 to the user's eyes. The light control plate is a thin, plate-shaped optical element arranged so as to cover the front side of the image display unit 20, which is the side opposite the user's eyes. The light control plate protects the light guide plates 261 and 262 and suppresses damage to them and the adhesion of dirt. In addition, adjusting the light transmittance of the light control plate adjusts the amount of external light entering the user's eyes and thereby how easily the virtual image can be seen. The light control plate can be omitted.

The camera 61 is arranged at a position corresponding to the area between the user's eyebrows when the user wears the image display unit 20. The camera 61 captures the outside scene, which is the external scenery in the direction opposite the user's eyes, and acquires an outside scene image. The camera 61 in this embodiment is a monocular camera, but may be a stereo camera. The camera 61 corresponds to the image acquisition unit in the claims.

The microphone 63 is arranged on the right holding unit 21 on the side opposite the right display drive unit 22. The microphone 63 is a directional microphone whose sensitivity to sound differs depending on the direction. A mechanical structure is formed inside the right holding unit 21, to which the microphone 63 is connected, so that the microphone 63 can move relative to the right holding unit 21.

The image display unit 20 further has a connection unit 40 for connecting the image display unit 20 to the control unit 10. The connection unit 40 includes a main body cord 48 connected to the control unit 10, a right cord 42, a left cord 44, and a connecting member 46. The right cord 42 and the left cord 44 are the two cords into which the main body cord 48 branches. The right cord 42 is inserted into the housing of the right holding unit 21 from the tip AP in the extending direction of the right holding unit 21 and is connected to the right display drive unit 22. Similarly, the left cord 44 is inserted into the housing of the left holding unit 23 from the tip AP in the extending direction of the left holding unit 23 and is connected to the left display drive unit 24. The connecting member 46 is provided at the branch point between the main body cord 48 and the right cord 42 and left cord 44, and has a jack for connecting an earphone plug 30. A right earphone 32 and a left earphone 34 extend from the earphone plug 30.

The image display unit 20 and the control unit 10 transmit various signals via the connection unit 40. Connectors (not shown) that mate with each other are provided on the end of the main body cord 48 opposite the connecting member 46 and on the control unit 10. The control unit 10 and the image display unit 20 are connected or disconnected by mating or unmating the connector of the main body cord 48 and the connector of the control unit 10. For example, metal cables or optical fibers can be used for the right cord 42, the left cord 44, and the main body cord 48.

The control unit 10 is a device for controlling the head-mounted display device 100. The control unit 10 includes a determination key 11, a lighting unit 12, a display switching key 13, a track pad 14, a luminance switching key 15, direction keys 16, a menu key 17, and a power switch 18. The determination key 11 detects a pressing operation and outputs a signal that confirms the content operated on the control unit 10. The lighting unit 12 indicates the operating state of the head-mounted display device 100 by its light emission state. Examples of the operating state of the head-mounted display device 100 include power ON/OFF. For example, an LED (Light Emitting Diode) is used as the lighting unit 12. The display switching key 13 detects a pressing operation and outputs, for example, a signal that switches the display mode of content video between 3D and 2D. The track pad 14 detects the operation of the user's finger on its operation surface and outputs a signal corresponding to the detected content. Various types of track pad, such as electrostatic, pressure-detecting, and optical types, can be used as the track pad 14. The luminance switching key 15 detects a pressing operation and outputs a signal that increases or decreases the luminance of the image display unit 20. The direction keys 16 detect a pressing operation on the keys corresponding to the up, down, left, and right directions and output a signal corresponding to the detected content. The power switch 18 switches the power state of the head-mounted display device 100 by detecting a slide operation of the switch.

FIG. 2 is a block diagram functionally showing the configuration of the head-mounted display device 100. As shown in FIG. 2, the control unit 10 has a CPU 140, an operation unit 135, an input information acquisition unit 110, a storage unit 120, a power supply 130, an interface 180, a transmission unit 51 (Tx 51), and a transmission unit 52 (Tx 52). The operation unit 135 accepts operations by the user and consists of the determination key 11, the display switching key 13, the track pad 14, the luminance switching key 15, the direction keys 16, the menu key 17, and the power switch 18.

The input information acquisition unit 110 acquires signals corresponding to operation inputs by the user. Examples of such signals include those for operation inputs to the track pad 14, the direction keys 16, and the power switch 18. The power supply 130 supplies power to each part of the head-mounted display device 100. For example, a secondary battery can be used as the power supply 130. The storage unit 120 stores various computer programs and is composed of ROM, RAM, and the like. By reading and executing the computer programs stored in the storage unit 120, the CPU 140 functions as an operating system 150 (OS 150), an image processing unit 160, a display control unit 190, a microphone drive unit 163, a conversion unit 185, a sound processing unit 170, and a direction determination unit 161.

The image processing unit 160 acquires the image signal included in content. The image processing unit 160 separates synchronization signals, such as the vertical synchronization signal VSync and the horizontal synchronization signal HSync, from the acquired image signal. The image processing unit 160 also generates a clock signal PCLK using a PLL (Phase Locked Loop) circuit or the like (not shown) according to the periods of the separated vertical synchronization signal VSync and horizontal synchronization signal HSync. The image processing unit 160 converts the analog image signal from which the synchronization signals have been separated into a digital image signal using an A/D conversion circuit or the like (not shown). The image processing unit 160 then stores the converted digital image signal, frame by frame, in the DRAM in the storage unit 120 as image data Data (RGB data) of the target image. As necessary, the image processing unit 160 may perform image processing on the image data, such as resolution conversion, various color tone corrections such as luminance and saturation adjustment, and keystone correction.

The image processing unit 160 transmits the generated clock signal PCLK, the vertical synchronization signal VSync, the horizontal synchronization signal HSync, and the image data Data stored in the DRAM in the storage unit 120 via the transmission units 51 and 52. The image data Data transmitted via the transmission unit 51 is also called "right-eye image data", and the image data Data transmitted via the transmission unit 52 is also called "left-eye image data". The transmission units 51 and 52 function as transceivers for serial transmission between the control unit 10 and the image display unit 20.

The display control unit 190 generates control signals that control the right display drive unit 22 and the left display drive unit 24. Specifically, the display control unit 190 uses the control signals to individually control the drive ON/OFF of the right LCD 241 by the right LCD control unit 211, the drive ON/OFF of the right backlight 221 by the right backlight control unit 201, the drive ON/OFF of the left LCD 242 by the left LCD control unit 212, the drive ON/OFF of the left backlight 222 by the left backlight control unit 202, and so on. In this way, the display control unit 190 controls the generation and emission of image light by each of the right display drive unit 22 and the left display drive unit 24. For example, the display control unit 190 causes both the right display drive unit 22 and the left display drive unit 24 to generate image light, causes only one of them to generate image light, or causes neither of them to generate image light.

The display control unit 190 transmits the control signals for the right LCD control unit 211 and the left LCD control unit 212 via the transmission units 51 and 52, respectively. The display control unit 190 also transmits the control signals for the right backlight control unit 201 and the left backlight control unit 202.

The microphone drive unit 163 sets the orientation of the microphone 63. When the sound source whose sound the user wants to acquire (hereinafter also called the "target sound source") is specified by an operation accepted by the operation unit 135, the microphone drive unit 163 changes the orientation of the microphone 63 so that the sensitivity to sound from the target sound source is maximized. The microphone drive unit 163 acquires the position and orientation of the image display unit 20 detected by a 9-axis sensor 66, described later. As a result, regardless of the positional relationship between the image display unit 20 and the target sound source, the microphone drive unit 163 can change the orientation of the microphone 63 so that its sensitivity to sound arriving from the direction from the microphone 63 to the target sound source (hereinafter also called the "sound source direction") is always at its maximum. In the head-mounted display device 100 of this embodiment, the microphone 63 therefore acquires sound from the sound source direction with high sensitivity and acquires sound less readily the further its direction deviates from the sound source direction, which improves the accuracy of the character image representing the acquired sound. The operation unit 135 corresponds to the specific direction setting unit in the claims, and the microphone 63 and the microphone drive unit 163 correspond to the sound acquisition unit in the claims.
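The steering described above amounts to compensating for head rotation so that the microphone keeps pointing at the target sound source. A minimal two-dimensional sketch follows. It assumes, purely for illustration, that the head position, the head yaw, and the target position are already known in a common horizontal coordinate frame. The function name and parameters are hypothetical and do not come from the specification.

```python
import math


def mic_pan_angle(head_xy, head_yaw_deg, source_xy):
    """Return the pan angle (degrees, in [-180, 180)) of the microphone
    relative to the head so that it points at the target sound source.

    head_xy, source_xy: (x, y) positions in an assumed shared world frame.
    head_yaw_deg: assumed head yaw in that frame, counterclockwise from +x.
    """
    dx = source_xy[0] - head_xy[0]
    dy = source_xy[1] - head_xy[1]
    # Bearing of the source as seen from the head, in the world frame.
    bearing_deg = math.degrees(math.atan2(dy, dx))
    # Subtract the head yaw and wrap the result into [-180, 180).
    return (bearing_deg - head_yaw_deg + 180.0) % 360.0 - 180.0
```

For example, if the source is straight ahead in the world frame and the user turns the head 90 degrees to the left, the function returns -90, meaning the microphone would be panned 90 degrees to the right relative to the head.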

The conversion unit 185 converts the sound acquired by the microphone 63 into a character image that represents the sound in characters. The display control unit 190 transmits the character image to the image display unit 20 as a control signal representing the character image. The image display unit 20 generates image light representing the character image based on the transmitted control signal and emits it toward the user's eyes, so that the user can view the sound as a character image.

The direction determination unit 161 determines whether the angle formed by the sound source direction and the user's line-of-sight direction, which is estimated from the orientation of the image display unit 20 detected by the 9-axis sensor 66 described later, is equal to or greater than a predetermined threshold. The direction determination unit 161 also determines, based on the angle formed by the sound source direction and the line-of-sight direction, whether the user is viewing the target sound source in the outside scene. The angle formed by the specific direction that has been set and the user's line-of-sight direction corresponds to the specific angle in the claims, and the angle formed by the sound source direction, set as one example of the specific direction, and the user's line-of-sight direction also corresponds to the specific angle.
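The threshold comparison performed by the direction determination unit 161 can be illustrated with ordinary vector arithmetic. The sketch below assumes that both directions are available as non-zero 3D vectors in the same coordinate frame. The function names and the threshold value used in the tests are hypothetical.

```python
import math


def angle_between_deg(u, v):
    """Angle in degrees between two non-zero 3D direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


def angle_at_or_above_threshold(sound_dir, gaze_dir, threshold_deg):
    """True when the angle between the two directions is >= the threshold."""
    return angle_between_deg(sound_dir, gaze_dir) >= threshold_deg
```

A small angle suggests that the user is looking toward the target sound source, while an angle at or above the threshold suggests that the source lies away from the line of sight.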

The sound processing unit 170 acquires the audio signal included in content, amplifies the acquired audio signal, and supplies it to a speaker (not shown) in the right earphone 32 and a speaker (not shown) in the left earphone 34, both connected to the connecting member 46. When, for example, the Dolby (registered trademark) system is adopted, the audio signal is processed so that the right earphone 32 and the left earphone 34 each output a different sound, for example with altered frequency. The sound processing unit 170 also performs speaker recognition: by extracting and modeling features from the sound acquired by the microphone 63, it recognizes the voices of multiple people separately and identifies the person speaking for each voice. The sound processing unit 170 corresponds to the voice identification unit in the claims.

The interface 180 is an interface for connecting various external devices OA, which serve as content sources, to the control unit 10. Examples of the external devices OA include a personal computer (PC), a mobile phone terminal, and a game terminal. For example, a USB interface, a micro USB interface, or a memory card interface can be used as the interface 180.

The image display unit 20 includes the right display drive unit 22, the left display drive unit 24, the right light guide plate 261 serving as the right optical image display unit 26, the left light guide plate 262 serving as the left optical image display unit 28, the camera 61, the 9-axis sensor 66, and the microphone 63.

The 9-axis sensor 66 is a motion sensor that detects acceleration (3 axes), angular velocity (3 axes), and geomagnetism (3 axes). Because the 9-axis sensor 66 is provided in the image display unit 20, it detects the movement of the user's head when the image display unit 20 is mounted on the user's head. Since the orientation of the image display unit 20 is known from the detected head movement, the direction determination unit 161 can estimate the user's line-of-sight direction. The direction determination unit 161 and the 9-axis sensor 66 correspond to the line-of-sight direction estimation unit in the claims. The microphone 63 transmits the audio signal of the acquired sound to the conversion unit 185 and the sound processing unit 170.
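As a rough illustration of how a line-of-sight direction might be derived from the head orientation, the sketch below converts a yaw and pitch angle into a unit direction vector. It assumes that the yaw and pitch of the image display unit have already been computed from the 9-axis sensor readings, and that the line of sight coincides with the direction the head is facing. The sensor fusion itself is outside this sketch, and the axis convention and function name are hypothetical.

```python
import math


def gaze_vector(yaw_deg, pitch_deg):
    """Unit vector for the assumed line-of-sight direction.

    Assumed convention: x is forward at yaw 0, y is to the left,
    z is up; yaw rotates about z and pitch raises the direction toward +z.
    """
    yaw = math.radians(yaw_deg)
    pitch = math.radians(pitch_deg)
    return (
        math.cos(pitch) * math.cos(yaw),
        math.cos(pitch) * math.sin(yaw),
        math.sin(pitch),
    )
```

The resulting vector can be compared with a sound source direction expressed in the same frame.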

The right display drive unit 22 includes a reception unit 53 (Rx 53), a right backlight control unit 201 (right BL control unit 201) and a right backlight 221 (right BL 221) that function as a light source, a right LCD control unit 211 and a right LCD 241 that function as a display element, and a right projection optical system 251. The right backlight control unit 201 and the right backlight 221 function as a light source. The right LCD control unit 211 and the right LCD 241 function as a display element. The right backlight control unit 201, the right LCD control unit 211, the right backlight 221, and the right LCD 241 are also collectively called the "image light generation unit".

The reception unit 53 functions as a receiver for serial transmission between the control unit 10 and the image display unit 20. The right backlight control unit 201 drives the right backlight 221 based on the input control signal. The right backlight 221 is a light emitter such as an LED or an electroluminescence (EL) element. The right LCD control unit 211 drives the right LCD 241 based on the clock signal PCLK, the vertical synchronization signal VSync, the horizontal synchronization signal HSync, and the right-eye image data Data1 input via the reception unit 53. The right LCD 241 is a transmissive liquid crystal panel in which a plurality of pixels are arranged in a matrix.

The right projection optical system 251 consists of a collimating lens that converts the image light emitted from the right LCD 241 into a parallel light beam. The right light guide plate 261, serving as the right optical image display unit 26, guides the image light output from the right projection optical system 251 to the user's right eye RE while reflecting it along a predetermined optical path. The right projection optical system 251 and the right light guide plate 261 are also collectively referred to as the "light guide unit".

The left display drive unit 24 has the same configuration as the right display drive unit 22. It includes a receiving unit 54 (Rx 54), a left backlight control unit 202 (left BL control unit 202) and a left backlight 222 (left BL 222) that function as a light source, a left LCD control unit 212 and a left LCD 242 that function as a display element, and a left projection optical system 252. The left backlight control unit 202, the left LCD control unit 212, the left backlight 222, and the left LCD 242 are also collectively referred to as the "image light generation unit". The left projection optical system 252 consists of a collimating lens that converts the image light emitted from the left LCD 242 into a parallel light beam. The left light guide plate 262, serving as the left optical image display unit 28, guides the image light output from the left projection optical system 252 to the user's left eye LE while reflecting it along a predetermined optical path. The left projection optical system 252 and the left light guide plate 262 are also collectively referred to as the "light guide unit".

A-2. Acquired-sound image display process:
FIG. 3 is an explanatory diagram showing the flow of the acquired-sound image display process. FIG. 3 shows the flow of the process in which the sound acquired by the microphone 63 is displayed on the image display unit 20 as a character image.

First, the camera 61 captures an image of the outside scene (step S305). FIG. 4 is an explanatory diagram showing an example of the user's visual field VR. FIG. 4 shows the visual field VR visible to the user and the maximum image display area PN, which is the area in which the image display unit 20 can display an image. As shown in FIG. 4, the user sees, as the outside scene, the teacher TE, who is the target sound source, and a plurality of students ST listening to the teacher TE. The user can also see the characters that the teacher TE has written on the whiteboard WB. By operating the operation unit 135, the user can capture with the camera 61 the outside scene being viewed. Note that the outside scene seen by the user and the outside scene captured by the camera 61 may differ depending on the user's line-of-sight direction, the orientation of the camera 61, and the like. In another embodiment, therefore, before the camera 61 captures the outside scene, the image to be captured may be displayed in the maximum image display area PN for the user to see, and the user may operate the operation unit 135 to correct the captured image so that it approaches the outside scene seen in the user's visual field VR.

Next, the user sets the sound source direction (step S310). FIG. 5 is an explanatory diagram showing an example of an image of the outside scene captured by the camera 61. FIG. 5 shows the state in which, when the user specifies the target sound source, the image captured by the camera 61 is displayed over the entire maximum image display area PN, while the user sees the outside scene directly in the part of the visual field VR outside the maximum image display area PN. When the user performs a predetermined operation to specify the target sound source, the image processing unit 160 performs face recognition on the image displayed in the maximum image display area PN and extracts the teacher TE as a candidate for the target sound source. As shown in FIG. 5, once the teacher TE has been extracted, the image display unit 20 displays in the maximum image display area PN a blinking rectangular frame MA surrounding the face of the teacher TE. When the user presses the enter key 11 in this state, the teacher TE is specified as the target sound source and the sound source direction is set. Once the target sound source is specified, the storage unit 120 stores the part of the displayed image inside the frame MA as the image of the target sound source. In the head-mounted display device 100 of this embodiment, the specific direction set via the operation unit 135 is the sound source direction, so the sound acquired from the sound source can be displayed in the maximum image display area PN in association with the sound source, which improves the user's comprehension of the sound.

Once the image of the target sound source has been stored, the image display unit 20 stops displaying the captured image of the outside scene in the maximum image display area PN, and the user can again see the teacher TE and the students ST as the outside scene. The sound source direction that has been set is an absolute direction from the microphone 63 to the target sound source and does not depend on the orientation of the image display unit 20.
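Because the stored sound source direction is absolute, the direction in which the source appears relative to the display changes as the head turns. The following is a minimal sketch of that conversion, assuming both the sound source direction and the head orientation are reduced to horizontal azimuths in degrees. This reduction, and the names `relative_azimuth`, `source_azimuth_deg`, and `head_yaw_deg`, are illustrative assumptions and do not appear in the specification.

```python
def relative_azimuth(source_azimuth_deg: float, head_yaw_deg: float) -> float:
    """Return the signed angle, in degrees within (-180, 180], from the
    direction the head is facing to the absolute sound source direction.

    Both inputs are assumed to be azimuths measured in the same fixed
    (world) frame, for example derived from a geomagnetic reading.
    """
    diff = (source_azimuth_deg - head_yaw_deg) % 360.0  # now in [0, 360)
    return diff - 360.0 if diff > 180.0 else diff
```

With this convention, a positive result means the source lies on the increasing-azimuth side of the facing direction, and the stored absolute direction itself never needs to be updated when the head moves.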

Once the sound source direction is set, the microphone drive unit 163 sets the orientation of the microphone 63 (step S320 in FIG. 3). The microphone drive unit 163 orients the microphone 63 so that its sensitivity to sound arriving from the sound source direction is maximized. After its orientation is set, the microphone 63 acquires sound (step S330). Next, the conversion unit 185 converts the sound acquired by the microphone 63 into a character image representing that sound (step S340). The image processing unit 160 and the image display unit 20 then cause the user to visually recognize the character image (step S350).

FIG. 6 is an explanatory diagram showing an example of the user's visual field VR. FIG. 6 shows the state in which the user sees, in addition to the outside scene, a text image TX1, a text image TX2, and a text image TX3 (hereinafter also collectively called the "text image group") that represent the voice of the teacher TE as character images. The text image TX1 is a character image that is updated as the voice of the teacher TE is converted in real time. The text images TX2 and TX3 are character images representing the voice of the teacher TE from a predetermined time before the text image TX1. In the visual field VR, the text image group is displayed at a position that does not overlap the teacher TE, who is located in the sound source direction, and that is close to the teacher TE. Because the text image group representing the sound acquired from the target sound source is displayed at a position that does not overlap the sound source direction, the user of the head-mounted display device 100 of the first embodiment can readily associate the speaker with the speaker's voice.

In this embodiment, the text image TX1 is displayed near the position at which the sound source direction is seen in the user's visual field VR. It is generally known that the user's visual field VR spans at most about 200 degrees horizontally and about 125 degrees in the direction of gravity. In this specification, "near the sound source direction in the user's visual field VR" means within a viewing angle of 60 degrees of the user centered on the sound source direction. It is more preferable that the text image TX1 be displayed within a viewing angle of 45 degrees centered on the sound source direction.

The text image TX1 and the like are displayed outside the central part of the user's visual field VR. In this specification, "outside the central part of the user's visual field VR" means the range that excludes the region within 30 degrees of the center of the visual field VR in the left, right, up, and down directions. It is more preferable that the text image TX1 and the like be displayed in the range that excludes the region within 45 degrees of the center in those directions. The image processing unit 160 and the image display unit 20 correspond to the display position setting unit in the claims.
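The two placement conditions defined above (near the sound source direction, and outside the central part of the visual field) can be expressed as a simple check. The sketch below is illustrative only. It assumes a candidate position and the sound source are given as (horizontal, vertical) angular offsets in degrees from the center of the visual field VR, and it reads "within 60 degrees" as an angular offset of at most 60 degrees from the source, approximated by a Euclidean distance in angle space. Both the coordinate model and that reading are assumptions, and all function names are hypothetical.

```python
import math

NEAR_LIMIT_DEG = 60.0        # "near the sound source direction" (45 is more preferable)
CENTER_EXCLUSION_DEG = 30.0  # central part excluded from display (45 is more preferable)

def is_near_source(pos, source, limit=NEAR_LIMIT_DEG):
    """True if pos lies within `limit` degrees of the source offset."""
    return math.hypot(pos[0] - source[0], pos[1] - source[1]) <= limit

def is_outside_center(pos, half_width=CENTER_EXCLUSION_DEG):
    """True if pos is more than `half_width` degrees from the center
    in the horizontal or the vertical direction."""
    return abs(pos[0]) > half_width or abs(pos[1]) > half_width

def is_valid_text_position(pos, source):
    """Combine both conditions for a candidate text image position."""
    return is_near_source(pos, source) and is_outside_center(pos)
```

For a source at the center of the visual field, a candidate 40 degrees to the side satisfies both conditions, whereas a candidate 10 degrees from the center is rejected because it falls in the central part.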

After a predetermined time elapses, the text image TX1 is displayed, like the text images TX2 and TX3, as a single block of character image without a speech balloon extending from the teacher TE. Each block contains the character image representing one minute of the teacher TE's voice, so a new block of text image is created every minute. In the display mode of FIG. 6, up to three blocks are displayed; when a new block is created, the oldest block disappears from the maximum image display area PN. Regardless of whether a block is currently displayed in the maximum image display area PN, the storage unit 120 automatically stores the text image group that has been displayed, block by block. In other embodiments, a block need not correspond to the sound acquired in one minute; it may correspond to two minutes, for example, or the whole text image group may be merged and displayed as a single block.
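The block handling described above (a new block every minute, at most three blocks on screen, every block kept in storage) can be sketched as follows. This is an illustrative model rather than the device's implementation: the class name `TextBlockLog`, its attributes, and the use of timestamps in seconds are assumptions, and a block is modeled simply as a list of recognized text fragments.

```python
from collections import deque

class TextBlockLog:
    """Groups recognized text into fixed-length blocks.

    `visible` holds only the newest `max_visible` blocks (what would be
    shown in the display area); `stored` keeps every block ever created.
    """

    def __init__(self, block_seconds=60.0, max_visible=3):
        self.block_seconds = block_seconds
        self.visible = deque(maxlen=max_visible)  # oldest block drops out automatically
        self.stored = []
        self._block_start = None

    def add(self, timestamp, text):
        """Append text acquired at `timestamp` (seconds) to the current
        block, starting a new block once `block_seconds` have elapsed."""
        if (self._block_start is None
                or timestamp - self._block_start >= self.block_seconds):
            block = [text]
            self._block_start = timestamp
            self.visible.append(block)
            self.stored.append(block)
        else:
            # The same list object is referenced by `visible`, so the
            # displayed block is updated as well.
            self.stored[-1].append(text)
```

Setting `block_seconds=120.0` would correspond to the two-minute variant mentioned for other embodiments.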

Next, it is determined whether the user's line-of-sight direction has changed (step S360 in FIG. 3). At this point in the present embodiment, as shown in FIGS. 4 to 6, the user continues to look at the teacher TE in the sound source direction, and the sound source direction and the user's line-of-sight direction are almost the same.

The direction determination unit 161 determines whether the angle formed by the sound source direction and the line-of-sight direction has changed by 30 degrees or more. When the line-of-sight direction has hardly moved and the angle between the user's line-of-sight direction and the sound source direction is determined to be less than 30 degrees (step S360: NO), the character image continues to be displayed in the visual field VR at a position that does not overlap the teacher TE, that is away from the center of the maximum image display area PN, and that is close to the teacher TE. When the line-of-sight direction moves within the range of less than 30 degrees, the position at which the text image group is displayed in the maximum image display area PN is changed according to the angle between the sound source direction and the line-of-sight direction. The angle of 30 degrees between the line-of-sight direction and the sound source direction corresponds to the first threshold and the second threshold in the claims. Thus, in the head-mounted display device 100 of this embodiment, when the sound source direction is visible as part of the outside scene in the user's visual field VR, the text image TX1 and the like are displayed near the sound source direction in the maximum image display area PN, which improves the user's comprehension of the acquired sound.

When the angle between the line-of-sight direction and the sound source direction is determined to have changed by 30 degrees or more (step S360: YES), the direction determination unit 161 determines that the target sound source is not visible in the user's visual field VR. In this case, the image processing unit 160 displays the image of the target sound source captured by the camera 61 in the maximum image display area PN and changes the position at which the character image is displayed (step S370).
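The decision of step S360 amounts to comparing the angle between two directions against the 30-degree threshold. The sketch below illustrates that comparison, assuming both directions are available as non-zero 3D vectors in a common frame. The vector representation, the function names, and the mode labels `"near_source"` and `"fixed_with_source_image"` are hypothetical and are introduced only for illustration.

```python
import math

THRESHOLD_DEG = 30.0  # threshold angle given in the description

def angle_between_deg(a, b):
    """Angle in degrees between two non-zero 3D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cos_angle = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_angle))

def select_display_mode(gaze, source, threshold=THRESHOLD_DEG):
    """Return the display mode for the character image.

    Below the threshold (S360: NO) the text stays near the sound source.
    At or above it (S360: YES) the source is treated as out of view, so
    the text moves to a preset position next to the stored source image.
    """
    if angle_between_deg(gaze, source) >= threshold:
        return "fixed_with_source_image"
    return "near_source"
```

Looking straight at the source yields `"near_source"`, while looking down at a notebook, roughly perpendicular to the source direction, yields `"fixed_with_source_image"`.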

FIG. 7 is an explanatory diagram showing an example of the user's visual field VR. FIG. 7 shows the state in which the user's line-of-sight direction has turned downward from the sound source direction and the user is taking notes in a notebook NT with a pen PEN held in the hand HD. As shown in FIG. 7, the character image is displayed in the upper part of the maximum image display area PN as a text image TX4 that is updated in real time. To the right of the text image TX4 in the maximum image display area PN, the image of the teacher TE, the target sound source, is displayed as an image IMG. Unlike the text image group of FIG. 6, the text image TX4 is displayed as a single block of character image regardless of the time elapsed since the sound was acquired. The text image TX4 is displayed near the image IMG without overlapping it. In this embodiment, the display positions of the text image TX4 and the image IMG are set in advance. In other words, when the angle between the line-of-sight direction and the sound source direction is 30 degrees or more, the text image TX4 is displayed at a position in the maximum image display area PN that is unrelated to the sound source direction. Thus, in the head-mounted display device 100 of this embodiment, even when the sound source direction is not visible in the outside scene, the image IMG of the sound source direction and the character image are displayed close together in the maximum image display area PN, which improves the user's comprehension of the acquired sound.

Next, it is determined whether to end the acquired-sound image display process (step S380 in FIG. 3). When it is determined that the process is to continue (step S380: NO), the processing from step S330 to step S370 is repeated. When it is determined that the process is to end (step S380: YES), the acquired-sound image display process ends upon a predetermined operation by the user.

As described above, in the head-mounted display device 100 of this embodiment, the conversion unit 185 converts the sound acquired by the microphone 63 into a character image. The sound source direction is specified by operating the operation unit 135. The image processing unit 160 and the image display unit 20 set the position of the character image displayed in the maximum image display area PN based on the sound source direction. The head-mounted display device 100 can therefore present the acquired sound to the user as the text image TX1 and the like, positioned according to the direction the user has set, which improves the user's comprehension of the sound. Moreover, because the position at which the text image TX1 and the like are displayed in the maximum image display area PN is set based on the direction the user has set, the user can easily recognize the relationship between that direction and the text image TX1 and the like, which improves convenience for the user.

In the head-mounted display device 100 of this embodiment, the 9-axis sensor 66 and the direction determination unit 161 estimate the user's line-of-sight direction from the orientation of the image display unit 20. The image processing unit 160 and the image display unit 20 set the position in the user's visual field VR at which the character image representing the acquired sound is displayed, based on the deviation between the sound source direction and the line-of-sight direction. Whether the sound source is visible in the user's visual field VR is thus judged from that deviation and the display position of the character image is set accordingly, so the user can easily associate the sound source direction with the character image.

B1. Second embodiment:
FIG. 8 is an explanatory diagram showing the flow of the acquired-sound image display process in the second embodiment. FIG. 8 shows the flow of a process that displays character images representing several different types of sound so that they can be told apart. In the head-mounted display device 100a of the second embodiment, a non-directional microphone 63a, used in place of the microphone 63 of the first embodiment, acquires different types of sound from a plurality of sound sources, and the sound processing unit 170 identifies the acquired sounds by type.

First, the microphone 63a acquires different types of sound from a plurality of sound sources (step S410). The sound processing unit 170 extracts and models the features of each of the acquired types of sound and identifies and recognizes the sounds by type (hereinafter also called "speaker recognition") (step S420). At this point, the sound processing unit 170 only distinguishes the types of sound; the correspondence between each sound source and the sound acquired from it has not yet been determined. Next, the conversion unit 185 converts the several types of sound into character images representing them (step S430). The image processing unit 160 transmits the character images to the image display unit 20 as a control signal, and the image display unit 20 causes the user to visually recognize the character images, distinguished by type of sound (step S440).

FIG. 9 is an explanatory diagram showing an example of the user's visual field VR. FIG. 9 shows the state in which the user sees, in addition to the outside scene, a character image representing the voice of the teacher TE and a character image representing the voice of a student ST1 as separate character images. As shown in FIG. 9, in the upper right of the maximum image display area PN, outside its central part, a text image TX11 represents the voice of the teacher TE as a character image, and below it a text image TX12 represents the voice of the student ST1 as a character image. The text images TX11 and TX12 differ in the color of the characters and in the color of the character background, and are thereby displayed in the maximum image display area PN as character images representing different types of sound.

Next, the camera 61 captures an image of the outside scene (step S445 in FIG. 8). This process is the same as that of step S305 of FIG. 3 in the first embodiment, so its description is omitted.

Next, the user operates the operation unit 135 to select one sound source from the captured image displayed in the maximum image display area PN, thereby specifying one sound source direction from among the sound source directions from the microphone 63a to the respective sound sources (step S450). FIG. 10 is an explanatory diagram showing an example of an outside scene image BIM captured by the camera 61. In FIG. 10, the outside scene image BIM captured by the camera 61 is displayed in the maximum image display area PN. The text images TX11 and TX12 shown in FIG. 9, a cursor CR, and an instruction image CM are displayed on the outside scene image BIM. The instruction image CM shows the operation the user should perform next. The cursor CR is an image that moves over the maximum image display area PN as the user operates the direction key 16. Following the instruction "Specify the sound source" displayed in the instruction image CM, the user moves the cursor CR so that it overlaps the teacher TE, the target sound source displayed in the maximum image display area PN, and presses the enter key 11; one target sound source is thereby selected and its sound source direction is specified.

Next, the character image representing the sound acquired from the selected target sound source is selected (step S460 in FIG. 8). When one target sound source has been selected, the text in the instruction image CM changes to "Select the character image acquired from the selected sound source". When the user selects the text image TX11 as the character image representing the voice of the teacher TE, the sound processing unit 170 recognizes that the type of sound represented by the text image TX11 is the voice of the teacher TE.

Next, the control unit 10 determines whether as many sound source directions as there are types of sound have been specified (step S470). At this point no sound source direction has been set for the text image TX12 (step S470: NO), so steps S450 and S460 are performed again. When the user selects the student ST1 as the target sound source (step S450) and selects the text image TX12 as the character image representing the voice of the student ST1 (step S460), the sound processing unit 170 recognizes that the type of sound represented by the text image TX12 is the voice of the student ST1. Thus, in the head-mounted display device 100a of the second embodiment, a specific sound source direction and the character image representing the sound acquired from that direction are set by a simple operation, and the user can easily configure speaker recognition.

When the user has specified as many sound source directions as there are types of sound (step S470: YES), it is determined whether the correspondence between the specified sound source directions and the character images is correct (step S475). When the correspondence is determined to be incorrect (step S475: NO), the processing from step S450 to step S470 is performed again for each combination of sound source direction and character image that was determined to be incorrect.
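The bookkeeping behind steps S450 to S475 can be pictured as a mapping from each identified type of sound to the sound source the user has chosen for it. The following sketch is a loose illustration under that assumption. The function names and the use of string labels such as `"TX11"` and `"teacher TE"` as keys and values are hypothetical, and how the device actually judges correctness in step S475 is not specified here.

```python
def assign(assignments, sound_type, source):
    """Return a new mapping in which `sound_type` is linked to `source`
    (steps S450 and S460). Re-assigning a type replaces its earlier
    source, which covers a redo after step S475: NO."""
    updated = dict(assignments)
    updated[sound_type] = source
    return updated

def pending_sound_types(sound_types, assignments):
    """Return the types of sound that still lack a source. An empty
    list corresponds to step S470: YES."""
    return [t for t in sound_types if t not in assignments]
```

A usage example: starting from an empty mapping with the types `["TX11", "TX12"]`, assigning `"TX11"` to `"teacher TE"` leaves `["TX12"]` pending, and assigning `"TX12"` to `"student ST1"` leaves nothing pending.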

When the correspondence is determined to be correct (step S475: YES), the image processing unit 160 and the image display unit 20 hide the instruction image CM and the cursor CR in the maximum image display area PN and change the manner and position in which the text images TX11 and TX12 are displayed (step S480).

FIG. 11 is an explanatory diagram illustrating an example of the visual field VR of the user. In FIG. 11, each specified sound source direction is displayed in association with the character image representing the sound acquired from that direction. As illustrated in FIG. 11, the text image TX11 is a character image representing the voice of the teacher TE, and is therefore an image of characters enclosed in a speech balloon originating at the teacher TE. Likewise, the text image TX12 is a character image representing the voice of the student ST1, and is therefore an image of characters enclosed in a speech balloon originating at the student ST1. In the text image TX11 and the text image TX12, the character images representing the voices acquired from the teacher TE and the student ST1, respectively, are updated and displayed in real time. The number of characters displayed in the text image TX11 and the text image TX12 is determined in advance, and once that number is exceeded, the excess characters are no longer displayed. In other embodiments, characters may be shown and hidden according to elapsed time or the like rather than the number of characters. In addition, when a predetermined time elapses after the last sound was acquired from a sound source direction without any further sound of the same type being acquired, the character image is no longer displayed in the maximum image display area PN. In the present embodiment, the character image is no longer displayed once 5 minutes, the predetermined time, have elapsed. In other embodiments, the predetermined time may be other than 5 minutes.
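
The character limit and the 5-minute timeout described above can be sketched as follows. This is only an illustrative sketch, not the embodiment's implementation: the class name Caption, the limit of 40 characters, and the choice to drop the oldest characters first are assumptions, since the text says only that the number of characters is predetermined.

```python
from dataclasses import dataclass
from typing import Optional

MAX_CHARS = 40       # hypothetical limit; the text only says "determined in advance"
TIMEOUT_S = 5 * 60   # 5 minutes, as in the present embodiment


@dataclass
class Caption:
    """Text shown for one sound source (hypothetical helper)."""
    text: str = ""
    last_heard: Optional[float] = None  # time of the most recent sound, in seconds

    def add(self, words: str, now: float) -> None:
        """Append recognised text, keeping only the newest MAX_CHARS characters."""
        # Assumption: the oldest characters are the ones dropped.
        self.text = (self.text + words)[-MAX_CHARS:]
        self.last_heard = now

    def visible_text(self, now: float) -> str:
        """Return the text to draw, or "" once the timeout has elapsed."""
        if self.last_heard is None or now - self.last_heard >= TIMEOUT_S:
            return ""
        return self.text
```

For example, a caption fed 50 characters at time 0 shows only its last 40 characters and disappears once 300 seconds have passed without new sound from the same source.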

Next, it is determined whether to end the acquired-sound image display process (step S490 in FIG. 8). The determination in step S490 is the same as the determination in step S380 of FIG. 3 in the first embodiment, so its description is omitted.

As described above, in the head-mounted display device 100a according to the second embodiment, the sound processing unit 170 identifies the acquired sounds by type. The operation unit 135 accepts operations by the user and thereby specifies the direction from the microphone 63a to the sound source from which a specific sound, among the multiple types of sound acquired by the microphone 63a, was acquired. The image processing unit 160 and the image display unit 20 set the positions at which the text image TX11 and the text image TX12 are displayed in the maximum image display area PN, within the user's visual field VR, near the sound source directions from which the sounds represented by the text image TX11 and the text image TX12, respectively, are acquired. The image processing unit 160 and the image display unit 20 also set each of these display positions so that it does not overlap any of the sound source directions. Therefore, in the head-mounted display device 100a according to the second embodiment, even in a conversation among several people, the character image representing a speaker's voice is displayed near that speaker in the maximum image display area PN. The user can thus associate each speaker with the character image representing that speaker's voice by sight as well as by hearing, which makes the conversation easier to follow. Moreover, because none of the character images representing the acquired sounds overlaps any of the sound source directions, the user can more readily associate each speaker with the corresponding character image.
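
One way to picture the placement rule summarized above (near the associated sound source, overlapping none of the sound source directions) is the sketch below. It assumes, purely for illustration, that each sound source direction has already been projected to a point in display coordinates. The function names, the four candidate positions, and the 10-pixel gap are hypothetical and do not come from the embodiment.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # left, top, width, height


def contains(rect: Rect, p: Point) -> bool:
    """True if point p lies inside rect (edges included)."""
    left, top, w, h = rect
    return left <= p[0] <= left + w and top <= p[1] <= top + h


def place_caption(source: Point, all_sources: List[Point],
                  size: Tuple[float, float], display: Tuple[float, float],
                  gap: float = 10.0) -> Optional[Rect]:
    """Pick a rectangle next to `source` that covers no sound source point."""
    w, h = size
    dw, dh = display
    sx, sy = source
    candidates = [
        (sx - w / 2, sy - gap - h),  # above the source
        (sx + gap, sy - h / 2),      # to the right
        (sx - w / 2, sy + gap),      # below
        (sx - gap - w, sy - h / 2),  # to the left
    ]
    for left, top in candidates:
        rect = (left, top, w, h)
        inside = left >= 0 and top >= 0 and left + w <= dw and top + h <= dh
        if inside and not any(contains(rect, s) for s in all_sources):
            return rect
    return None  # no candidate fits; the caller must fall back to another rule
```

With a 960 x 540 display, a teacher at (200, 100) and a student at (700, 300), a 200 x 60 caption for the teacher lands directly above the teacher at (100, 30).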

B2. Third embodiment:
FIG. 13 is an explanatory diagram functionally showing the configuration of the head-mounted display device 100b according to the third embodiment. The head-mounted display device 100b according to the third embodiment differs from the head-mounted display device 100 of the above embodiment in that the control unit 10b has a wireless communication unit 132 and in that the CPU 140b of the control unit 10b functions as an image determination unit 142. In addition, in the third embodiment, the sound processing unit 170b obtains a sound signal acquired by the wireless communication unit 132, which is different from the sound acquired by the microphone 63, and supplies that sound signal as sound to the speakers of the earphones 32 and 34.

The wireless communication unit 132 performs wireless communication with other devices in accordance with a predetermined wireless communication standard such as wireless LAN or Bluetooth (registered trademark). Through wireless communication, the wireless communication unit 132 acquires a sound signal representing a sound different from the sound acquired by the microphone 63. The sound signals acquired by the wireless communication unit 132 include, for example, sound signals broadcast by radio or the like and sound signals acquired and digitized by a microphone other than the microphone 63 of the head-mounted display device 100b. The image processing unit 160 uses face recognition to extract objects contained in the outside scene image (moving image) continuously captured by the camera 61 as candidates for the target sound source. For each extracted candidate, the image determination unit 142 uses pattern matching to determine changes in the movement of a specific object that matches the image data of that specific object stored in advance in the storage unit 120. For example, when the teacher TE is extracted as a candidate for the target sound source, the image determination unit 142 performs pattern matching on the mouth, which is a part of the teacher TE's body, and determines the open/closed state of the teacher TE's mouth. The wireless communication unit 132 corresponds to the communication unit in the claims. The outside scene image (moving image) continuously captured by the camera 61 corresponds to the images of the outside scene at a plurality of points in time acquired by the image acquisition unit in the claims. Separately from the outside scene image captured by the camera 61, the user can view the outside scene transmitted through the light guide plates 261 and 262 of the image display unit 20 and the display image displayed in the maximum image display area PN.

FIG. 14 is an explanatory diagram illustrating the flow of the acquired-sound image display process in the third embodiment. This process differs from the acquired-sound image display process of the first embodiment in that the sound source direction is set by detecting the open/closed state of the teacher TE's mouth, and in that a sound different from the sound acquired by the microphone 63 is converted into a different type of character image.

In the acquired-sound image display process of the third embodiment, the camera 61 first captures the outside scene (step S505), and the image processing unit 160 extracts the teacher TE as a candidate for the target sound source (step S506). FIG. 15 is an explanatory diagram illustrating an example of the visual field VR of the user. As illustrated in FIG. 15, once the teacher TE has been extracted as a candidate for the target sound source, the image determination unit 142 further uses pattern matching to extract the area around the teacher TE's mouth, MO (hereinafter also simply called the "mouth MO"), and determines whether the open/closed state of the mouth MO has changed (step S507 in FIG. 14). If it is determined that the open/closed state of the teacher TE's mouth MO has not changed (step S507: NO), the image determination unit 142 keeps the mouth MO extracted and continues to monitor it for a change in its open/closed state (step S507). If it is determined that the open/closed state of the teacher TE's mouth MO has changed (step S507: YES), the storage unit 120 stores the teacher TE's mouth MO as the target sound source and the sound source direction is set (step S510), even if the operation unit 135 is not operated.
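
The monitoring loop of step S507 can be sketched as a simple state-change check. The sketch assumes that some upstream detector already yields a per-frame "mouth openness" value between 0 and 1. That detector, the function name first_mouth_change, and the 0.3 threshold are hypothetical, and the pattern matching itself is not modeled here.

```python
from typing import Iterable, Optional


def first_mouth_change(openness: Iterable[float],
                       threshold: float = 0.3) -> Optional[int]:
    """Return the index of the first frame whose open/closed state differs
    from the previous frame, or None if the state never changes."""
    prev_state = None
    for i, value in enumerate(openness):
        state = value >= threshold        # True = open, False = closed
        if prev_state is not None and state != prev_state:
            return i                      # corresponds to "step S507: YES"
        prev_state = state
    return None                           # keep monitoring ("step S507: NO")
```

In this sketch, a frame sequence of 0.1, 0.1, 0.5, 0.6 reports a change at frame index 2, at which point the mouth MO would be stored as the target sound source.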

When the sound source direction has been set, the microphone driving unit 163 sets the orientation of the microphone 63 (step S520), and the microphone 63 and the wireless communication unit 132 acquire a sound and a sound signal representing a sound, respectively (step S530). In the third embodiment, the wireless communication unit 132 acquires an Internet radio sound signal through communication. Next, the conversion unit 185 converts the sound acquired by the microphone 63 and the sound signal acquired by the wireless communication unit 132 into character images in different fonts (step S540).

When the sound and the sound signal have been converted into character images, the image processing unit 160 and the image display unit 20 cause the user to view the converted character images of different types (step S550). FIG. 16 is an explanatory diagram illustrating an example of the visual field VR of the user. As shown in FIG. 16, in the maximum image display area PN, the sound acquired by the microphone 63 is displayed as the text image TX41, and the sound signal acquired by the wireless communication unit 132 is displayed as the text image TX42. The font of the text image TX41 is MS Gothic, and the font of the text image TX42 is MS Mincho. In other embodiments, the text image TX41 and the like may be displayed in the maximum image display area PN in other fonts, or attributes of the character image other than the font, such as bold or light weight or character size, may be varied.

Once the user has been made to view the character images (step S550 in FIG. 14), the image determination unit 142 monitors the outside scene image for a change in the position of the target sound source (step S560). Because the camera 61 captures the outside scene image continuously, the image determination unit 142 can identify the position of the mouth of the teacher TE, the target sound source, in the outside scene image at a plurality of points in time. If it is determined that the position of the teacher TE's mouth in the outside scene image has changed (step S560: YES), for example, if the position of the teacher TE's mouth MO has changed from its position in the visual field VR shown in FIG. 16 to its position in the visual field VR shown in FIG. 6, the image processing unit 160 and the image display unit 20 change the position at which the character image is displayed in the maximum image display area PN (step S570 in FIG. 14). The image processing unit 160 and the image display unit 20 display the character image light near the teacher TE's mouth MO at its new position in the maximum image display area PN. After the display position of the character image light has been changed (step S570), or if it is determined in step S560 that the position of the target sound source in the outside scene image has not changed (step S560: NO), it is determined whether to end the acquired-sound image display process (step S580).

As described above, in the head-mounted display device 100b according to the third embodiment, the camera 61 continuously captures the outside scene image, and when it is determined that the position of the teacher TE's mouth MO in the outside scene image has changed, the image processing unit 160 and the image display unit 20 change the position at which the character image is displayed in the maximum image display area PN. As a result, in the head-mounted display device 100b according to the third embodiment, the user recognizes the sound source direction in finer detail and the character image is displayed near the sound source direction, which makes it easier for the user to associate the sound source direction with the character image representing the sound emitted by the target sound source.

In addition, in the head-mounted display device 100b according to the third embodiment, the sound processing unit 170b obtains a sound signal acquired by the wireless communication unit 132, which is different from the sound acquired by the microphone 63. The conversion unit 185 converts the sound acquired by the microphone 63 and the sound signal acquired by the wireless communication unit 132 into character images in different fonts. Because the text image TX41, which represents the sound acquired by the microphone 63, and the text image TX42, which represents the sound acquired by means other than the microphone 63, are displayed as different types of character image, the user can recognize by sight which sound source each sound comes from. In this head-mounted display device 100b, the sound that differs from the sound acquired by the microphone 63 is the sound represented by a sound signal acquired through communication by the wireless communication unit 132. The user can therefore hear not only the external sound acquired by the microphone 63 but also the sounds represented by various sound signals acquired through communication, and can also recognize the sounds acquired through communication as visual information.

C. Modifications:
The present invention is not limited to the above embodiments and can be implemented in various forms without departing from its gist. For example, the following modifications are possible.

C1. Modification 1:
In the above embodiment, the conversion unit 185 of the head-mounted display device 100 converts the acquired sound into a character image in real time and the user is made to view that character image. However, the method of presenting the character image to the user is not limited to this and can be modified in various ways.

FIG. 12 is an explanatory diagram illustrating an example of the visual field VR of the user. In FIG. 12, the sounds stored in the storage unit 120 are distinguished by type and displayed in the maximum image display area PN as character images representing the sounds, each associated with the time at which the sound was recorded. As shown in FIG. 12, the elapsed time TM since the start of sound recording is displayed on the left side of the maximum image display area PN. To the right of the elapsed time TM in the maximum image display area PN, a plurality of character images, including the text image TX35, are displayed. The text image TX35 and the other text images are displayed in association with the recording times indicated by the elapsed time TM. For example, the text image TX35 is a character image representing a sound whose recording began at the start of recording, when conversion of sound into character images began, and the characters in the text image TX35 show that the sound source is the teacher TE. Similarly, the text image TX32 is a character image representing a sound whose recording began approximately 32 minutes after the start of recording, and the characters in the text image TX32 show that the sound source is the student ST1.

When the cursor CR shown in FIG. 12 is placed over the text image TX35 or another text image and that image is selected, the character image displayed as the text image TX35 is enlarged to fill the entire maximum image display area PN. In response to an operation by the user, an image captured in the sound source direction is also displayed together with the character image. Therefore, in this head-mounted display device 100, even if the user fails to hear a sound or misses a character image, the user can check the character image representing the sound stored in the storage unit 120 afterwards. The storage unit 120 also stores the correspondence between the images captured in the sound source direction and the recording times, which is convenient when, for example, the user searches for a character image representing a sound recorded in the past.

In addition, while the user is listening to a sound, a predetermined operation on the operation unit 135 may cause the sound acquired while that operation is being accepted to be displayed as a special character image or stored in the storage unit 120 as a special sound. In this head-mounted display device 100, a user operation thus causes a character image to be marked and displayed in the maximum image display area PN, or a sound to be marked and stored. Because the user can add new information to individual sounds and character images, user convenience is improved.

The image display unit 20 may also present the character image to the user a predetermined time after the sound was acquired. For example, the image display unit 20 may display the character image for an acquired sound in the maximum image display area PN with a delay of 30 seconds. In this head-mounted display device 100, if the user momentarily fails to hear a sound and also misses the real-time character image representing it, the delayed character image lets the user see, alongside the sound currently being heard, the character image representing the sound missed 30 seconds earlier. The user can therefore more easily follow how the acquired sounds connect to what came before and after.
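
The delayed presentation described in this modification could be modeled with a simple time-stamped queue, as in the sketch below. The class name DelayedCaptions and its methods are hypothetical. Only the 30-second delay comes from the text.

```python
from collections import deque
from typing import Deque, List, Tuple


class DelayedCaptions:
    """Hold recognised text and release it only after a fixed delay."""

    def __init__(self, delay_s: float = 30.0) -> None:
        self.delay_s = delay_s
        self._pending: Deque[Tuple[float, str]] = deque()  # (time acquired, text)

    def push(self, text: str, now: float) -> None:
        """Record text for a sound acquired at time `now` (seconds)."""
        self._pending.append((now, text))

    def pop_due(self, now: float) -> List[str]:
        """Return, oldest first, every text whose delay has elapsed."""
        due = []
        while self._pending and now - self._pending[0][0] >= self.delay_s:
            due.append(self._pending.popleft()[1])
        return due
```

Text pushed at time 0 is released by the first call to pop_due at or after time 30, so the display loop can draw it alongside whatever is being heard at that moment.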

C2. Modification 2:
In the above embodiment, the position of the character image displayed in the maximum image display area PN is set according to the deviation between the estimated line-of-sight direction of the user and the sound source direction. However, the method of setting the position of the displayed character image is not limited to this and can be modified in various ways. For example, the position and display method of the character image displayed in the maximum image display area PN may be set based only on the estimated line-of-sight direction of the user.

By detecting the movement of the image display unit 20, the direction determination unit 161 and the 9-axis sensor 66 estimate at least one of the amount of change in the angle of the line-of-sight direction and its angular velocity, taking as the reference the display state in which the character image representing the acquired sound is displayed in the maximum image display area PN. The amount of change in the angle of the line-of-sight direction and the angular velocity correspond to the specific values in the claims. Because the 9-axis sensor 66 can detect geomagnetism, the 9-axis sensor 66 identifies the direction of gravity and the horizontal direction perpendicular to it, and the direction determination unit 161 and the 9-axis sensor 66 can estimate the amount of change in angle and the angular velocity relative to the direction of gravity and the horizontal direction. In the head-mounted display device 100 of this modification, when the angle of the line-of-sight direction changes by 30 degrees or more, or when the angular velocity is 1 radian per second or more, the direction determination unit 161 determines that the user wants to view the outside scene rather than the character image displayed in the maximum image display area PN. In this case, the image processing unit 160 and the image display unit 20 move the character image to a position in the maximum image display area PN that is outside the central portion and on the side opposite to the direction in which the user's line of sight moved. For example, when the user's line of sight turns upward relative to the direction of gravity, it can be inferred that the user wants to view the outside scene above, so the character image is displayed in the lower part of the maximum image display area PN. In the head-mounted display device 100 of this modification, the display position of the character image is thus changed automatically, according to the direction the user wants to view, to a position that does not obstruct the user's view, which improves usability. Here, 30 degrees corresponds to the third threshold in the claims, and 1 radian per second corresponds to the fifth threshold in the claims. In other embodiments, the threshold for the amount of change in angle may be a value other than 30 degrees, and the threshold for the angular velocity may be a value other than 1 radian per second.

Also in the head-mounted display device 100 of this modification, when, relative to the reference display state, the amount of change in the angle of the line-of-sight direction is less than 30 degrees and the angular velocity of the line-of-sight direction is less than 1 radian per second, and 30 seconds elapse in that state, the direction determination unit 161 determines that the user is paying attention to the character image displayed in the maximum image display area PN. In this case, the image processing unit 160 and the image display unit 20 display the character image in the central portion of the maximum image display area PN. In the head-mounted display device 100 of this modification, when it is determined that the user is paying attention to the character image, the character image is thus displayed automatically at a position that is easy for the user to see, which improves usability. Here, 30 degrees corresponds to the fourth threshold in the claims, and 1 radian per second corresponds to the sixth threshold in the claims.
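
The two threshold rules of this modification can be summarized in one small decision function. The sketch below is illustrative only: the function name, the string labels for regions, and the "unchanged" result for the case the text does not cover are assumptions. The values 30 degrees, 1 radian per second, and 30 seconds are those given for this modification.

```python
ANGLE_THRESHOLD_DEG = 30.0    # change in line-of-sight angle
RATE_THRESHOLD_RAD_S = 1.0    # angular velocity of the line of sight
DWELL_S = 30.0                # time the line of sight must stay steady

# Region opposite to the direction in which the line of sight moved.
OPPOSITE = {"up": "bottom", "down": "top", "left": "right", "right": "left"}


def caption_region(angle_change_deg: float, angular_velocity_rad_s: float,
                   steady_for_s: float, gaze_moved: str) -> str:
    """Choose where to draw the character image.

    gaze_moved is one of "up", "down", "left", "right" (hypothetical labels).
    """
    looking_away = (abs(angle_change_deg) >= ANGLE_THRESHOLD_DEG
                    or abs(angular_velocity_rad_s) >= RATE_THRESHOLD_RAD_S)
    if looking_away:
        # The user is taken to want the outside scene: move the text aside.
        return OPPOSITE[gaze_moved]
    if steady_for_s >= DWELL_S:
        # The user is taken to be reading the text: bring it to the center.
        return "center"
    return "unchanged"  # assumption: otherwise keep the current position
```

For instance, a 35-degree upward turn of the head yields "bottom", while a steady gaze held for 30 seconds yields "center".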

C3. Modification 3:
In the above embodiment, the sound source direction in which the target sound source is located is set as the specific direction. However, the specific direction need not be the sound source direction, and the specific direction that is set can be modified in various ways. For example, when a bus guide, who is the sound source, is describing a landmark such as the Skytree, the direction of the Skytree as seen in the user's visual field VR, rather than the direction of the sound source, may be set as the specific direction. In this case, the user can view the character image while keeping the Skytree, which lies in a specific direction different from that of the sound source, in view at all times. In the head-mounted display device 100 of this modification, the user can freely set the specific direction that the user wants to view, which improves usability.

The specific direction may also be the line-of-sight direction in which the user is looking, the direction behind the user, or the traveling direction of a vehicle. For example, when the specific direction is the direction behind the user, the appearance of the displayed character image may be changed so that the user recognizes that it represents a sound acquired from behind.

In the above embodiment, a voice acquired from a person is used as an example of a sound. However, the sound that is acquired and converted is not limited to this and can be modified in various ways. It may be the cry of an animal or another non-human creature, or a warning sound such as a siren, or a sound effect. For example, when the user is surrounded by many people and hears many voices, the sound processing unit 170 may identify only the warning sound and cause the user to view a character image representing it. The user can then be alerted promptly, which improves user convenience.

In the above embodiment, the user's line-of-sight direction is estimated by the 9-axis sensor 66 provided in the image display unit 20. However, the method of estimating the user's line-of-sight direction is not limited to this and can be modified in various ways. For example, the user's line-of-sight direction may be estimated by capturing an image of the user's eyes with a CCD camera and analyzing the captured image.

In the above embodiment, the character image is displayed in the maximum image display area PN at a position near the sound source direction and outside the central portion. However, the position and method of displaying the character image are not limited to this and can be modified in various ways. For example, if the character image is more important than the outside scene viewed by the user, the character image may be displayed in the central portion of the maximum image display area PN. If the target sound source appears toward a corner of the user's visual field VR and as many characters as possible are to be displayed in the maximum image display area PN, the character image may be displayed over the entire part of the maximum image display area PN that does not overlap the sound source direction. In addition, to improve the visibility of the outside scene in the user's visual field VR, the character image may be displayed so as to overlap the sound source direction.

C4. Modification 4:
In the above embodiment, the microphone 63 moves relative to the image display unit 20 to change its orientation, thereby changing the sensitivity with which sound is acquired according to the direction from the microphone 63 to the sound source. However, the microphone does not necessarily have to move relative to the image display unit 20, and the structure and configuration of the microphone 63 can be variously modified. For example, the microphone 63b in a modification may be composed of a plurality of directional microphones that face different directions. In this case, the sensitivity with which the microphone 63b acquires sound in each direction can be changed by not acquiring sound from some of the plurality of microphones in response to a user operation. Because this head-mounted display device 100 does not need a structure for moving the microphone 63b relative to the image display unit 20, the directivity of the microphone 63b is set with a simple configuration.
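The idea of shaping directivity by ignoring some elements of a fixed microphone array can be sketched as follows. This is only an illustration: the `DirectionalMic` class, the four azimuths, and the simple averaging in `mix_enabled` are assumptions and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class DirectionalMic:
    """One element of a hypothetical array standing in for microphone 63b."""
    azimuth_deg: float    # direction the element faces, relative to the image display unit
    enabled: bool = True  # False means sound from this element is not acquired


def mix_enabled(mics, samples):
    """Average the current sample of every enabled element; disabled elements are ignored."""
    active = [s for m, s in zip(mics, samples) if m.enabled]
    return sum(active) / len(active) if active else 0.0


# Four elements facing front, right, back, and left (assumed layout).
mics = [DirectionalMic(a) for a in (0.0, 90.0, 180.0, 270.0)]
mics[2].enabled = False  # user operation: stop acquiring sound from behind
```

With the rear element disabled, a loud sample arriving only at that element no longer contributes to the mixed signal, which is the sense in which sensitivity per direction changes without any moving parts.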

C5. Modification 5:
In the above embodiment, when the angle formed by the line-of-sight direction and the sound source direction is 30 degrees or more, the image of the sound source direction captured by the camera 61 is displayed in the maximum image display area PN, and the character image is displayed near that image. However, the way in which the display method and position of the character image change with the relationship between the line-of-sight direction and the sound source direction can be variously modified. For example, the character image may always be displayed at a predetermined position in the maximum image display area PN regardless of the angle formed by the line-of-sight direction and the sound source direction. Further, instead of using 30 degrees as the threshold for setting the position at which the character image is displayed in the maximum image display area PN, an angle larger or smaller than 30 degrees may be set as the threshold.
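The threshold comparison described above can be sketched as follows, treating the line-of-sight direction and the sound source direction as 3D vectors. The vector representation, the function names, and the mode labels are assumptions made for illustration; in particular, the label returned below the threshold is a placeholder for whatever the embodiment does in that case.

```python
import math


def angle_between_deg(u, v):
    """Angle in degrees between two non-zero direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_t = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_t))


def choose_display_mode(gaze, source, threshold_deg=30.0):
    """Pick a display mode from the gaze/source angle; the threshold is configurable."""
    if angle_between_deg(gaze, source) >= threshold_deg:
        return "camera_image_with_text"  # show the captured source image with text nearby
    return "text_near_source"            # assumed label for the below-threshold case
```

Passing a different `threshold_deg` corresponds to the variation in which an angle larger or smaller than 30 degrees is used as the threshold.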

C6. Modification 6:
In the above embodiment, the image processing unit 160 extracts the teacher TE as a sound source by performing face recognition on the outside scene image captured by the camera 61, but the method of extracting a sound source can be variously modified. For example, an object located at a certain distance from the user, as measured by a distance measuring sensor, may be extracted as a sound source candidate.

In the above embodiment, the sound processing unit 170 distinguishes different types of sound by speaker recognition, but the method of distinguishing sounds is not limited to this and can be variously modified. For example, when the user listens every week to the teacher TE, who is the same sound source, as in a university lecture, the features of the teacher TE's voice may be registered in the head-mounted display device 100 in advance so that sounds other than the voice of the teacher TE are not converted into character images. With this head-mounted display device 100, the sound source direction and the settings for each type of sound do not have to be made every time, and the type of sound that the user wants displayed as a character image in the maximum image display area PN can be identified with high accuracy.

In the above embodiment, the acquired sound is converted into a character image representing the sound, but the method of converting sound into a character image can be variously modified. For example, from the viewpoint of privacy protection, the device may be set so that, at a public performance, the acquired sound is converted into a character image but is not recorded. Further, specific words such as personal names may be registered in the head-mounted display device 100 in advance so that the registered words are not converted into character images even when their sound is acquired.
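One way to picture the registered-word setting is as a filter applied to recognized words before they are rendered as a character image. The sketch below assumes that recognition has already produced a list of words; the function name and the case-insensitive matching are illustrative choices, not details from the specification.

```python
def filter_registered_words(words, registered):
    """Drop every recognized word that matches a registered word (case-insensitive)."""
    blocked = {w.casefold() for w in registered}
    return [w for w in words if w.casefold() not in blocked]
```

For example, with "alice" registered, the recognized sequence "Hello", "Alice", "today" would be rendered as "Hello", "today".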

The sound registered in advance in the head-mounted display device 100 may also be a sound such as the operating sound of a machine. For example, the operating sound of an industrial machine during normal operation may be registered in advance, and a character image representing the operating sound may be shown to the user. In this case, even if the factory in which the industrial machine is installed is noisy because of other machines, the user can see a character image representing the operating sound of the industrial machine during abnormal operation, and can therefore recognize as visual information whether the industrial machine is operating normally.

C7. Modification 7:
The configuration of the head-mounted display device 100 in the above embodiment is merely an example and can be variously modified. For example, one of the direction key 16 and the track pad 14 provided in the control unit 10 may be omitted, or another operation interface such as an operation stick may be provided in addition to or in place of the direction key 16 and the track pad 14. The control unit 10 may also be configured so that input devices such as a keyboard and a mouse can be connected, and may accept input from the keyboard or the mouse.

As the image display unit, another type of image display unit, such as one worn like a hat, may be adopted instead of the image display unit 20 worn like glasses. The earphones 32 and 34 and the camera 61 can be omitted as appropriate. In the above embodiment, an LCD and a light source are used to generate image light, but another display element such as an organic EL display may be adopted instead. In the above embodiment, the nine-axis sensor 66 is used as the sensor that detects the movement of the user's head, but a sensor composed of one or two of an acceleration sensor, an angular velocity sensor, and a geomagnetic sensor may be used instead. In the above embodiment, the head-mounted display device 100 is a binocular optical transmission type, but the present invention is equally applicable to other types of head-mounted display device, such as a video transmission type or a monocular type.

In the above embodiment, the head-mounted display device 100 may guide image light representing the same image to the user's left and right eyes so that the user visually recognizes a two-dimensional image, or may guide image light representing different images to the left and right eyes so that the user visually recognizes a three-dimensional image.

In the above embodiment, part of the configuration realized by hardware may be replaced with software, and conversely, part of the configuration realized by software may be replaced with hardware. For example, in the above embodiment, the image processing unit 160 and the sound processing unit 170 are realized by the CPU 140 reading and executing a computer program, but these functional units may be realized by hardware circuits.

When some or all of the functions of the present invention are realized by software, the software (computer program) can be provided in a form stored on a computer-readable recording medium. In the present invention, the "computer-readable recording medium" is not limited to portable recording media such as flexible disks and CD-ROMs; it also includes internal storage devices in a computer, such as various kinds of RAM and ROM, and external storage devices fixed to a computer, such as hard disks.

In the above embodiment, as shown in FIGS. 1 and 2, the control unit 10 and the image display unit 20 are formed as separate components, but the configuration of the control unit 10 and the image display unit 20 is not limited to this and can be variously modified. For example, all or some of the components formed in the control unit 10 may be formed inside the image display unit 20. Of the components formed in the control unit 10, only the operation unit 135 may be formed as a stand-alone user interface (UI), or the power supply 130 of the above embodiment may be formed on its own and be replaceable. The components formed in the control unit 10 may also be duplicated in the image display unit 20. For example, the CPU 140 shown in FIG. 2 may be formed in both the control unit 10 and the image display unit 20, or the functions performed by the CPU 140 formed in the control unit 10 and by a CPU formed in the image display unit 20 may be divided between them.

C8. Modification 8:
For example, the image light generation unit may include an organic EL (organic electroluminescence) display and an organic EL control unit. Also, for example, the image generation unit may use LCOS (liquid crystal on silicon; LCoS is a registered trademark), a digital micromirror device, or the like instead of the LCD. The present invention can also be applied to, for example, a laser retinal projection type head-mounted display. In the case of the laser retinal projection type, the "area in which image light can be emitted in the image light generation unit" can be defined as the image area recognized by the user's eyes.

Also, for example, the head-mounted display may be one in which the optical image display unit covers only part of the user's eye, in other words, one in which the optical image display unit does not completely cover the user's eye. The head-mounted display may also be a so-called monocular type head-mounted display.

The earphones may be of an ear-hook type or a headband type, or may be omitted. Also, for example, the device may be configured as a head-mounted display mounted on a vehicle such as an automobile or an airplane, or as a head-mounted display built into body protective gear such as a helmet.

C9. Modification 9:
In the above embodiment, the head-mounted display device 100 worn on the user's head is used as the display device that lets the user visually recognize the character image, but the display device is not limited to this and can be variously modified. For example, it may be a head-up display (HUD) used on the windshield of an automobile. In this case, the driver of the automobile, as the user, can visually recognize sounds and the like from outside the automobile as character images while keeping a clear view in the direction of travel. With the display device of this modification, when the driver is hearing-impaired or the inside of the automobile is noisy, sounds outside the automobile, and therefore external hazards, can be recognized as visual information, which improves safety while driving.

C10. Modification 10:
In the third embodiment, when it is determined in step S507 of FIG. 14 that the open/closed state of the mouth MO of the teacher TE has not changed (step S507: NO), the image determination unit 142 continues to monitor for a change in the open/closed state of the mouth MO. However, the image determination unit 142 does not necessarily have to keep monitoring for the change. For example, when no change in the open/closed state of the mouth MO of the teacher TE is detected, the sound source direction may be set with the teacher TE extracted by the image processing unit 160 as the target sound source. In this case, once a change in the open/closed state of the mouth MO of the teacher TE has been detected for the first time, the sound source direction may be set with the mouth MO of the teacher TE as the target sound source.
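The fallback described in this modification amounts to a small piece of state: the target is the teacher TE as a whole until a mouth change is first observed, and the mouth MO from then on. A minimal sketch, with the class name and the string labels chosen purely for illustration:

```python
class TargetSelector:
    """Tracks whether a mouth open/close change has ever been detected."""

    def __init__(self):
        self.mouth_seen_moving = False

    def update(self, mouth_changed):
        """Feed one detection result and return the current target sound source."""
        if mouth_changed:
            self.mouth_seen_moving = True
        # Before the first detected change the whole teacher is the target;
        # afterwards the mouth is used, even on frames with no new change.
        return "mouth MO" if self.mouth_seen_moving else "teacher TE"
```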

In the third embodiment, sound represented by an audio signal that the wireless communication unit 132 acquires by communication is given as an example of sound different from the sound acquired by the microphone 63, but the sound different from the sound acquired by the microphone 63 is not limited to this and can be variously modified. For example, it may be sound acquired by a pin microphone that is separate from the microphone 63 and worn on the chest of the teacher TE. Further, the sound processing unit 170 may distinguish, among the sounds acquired by the microphone 63, between sounds uttered by a person and sounds emitted by a machine or the like, and the fonts of the character images converted from these sounds may be made different. Only one of the plurality of different sounds may be converted into a character image, and there may be three or more types of sound.

C11. Modification 11:
In the above embodiment, the sound source direction from the microphone 63 is set based on the outside scene image, the user's line-of-sight direction, and the like, but the method of setting the sound source direction is not limited to this and can be variously modified. For example, with a plurality of directional microphones, the sound levels (for example, in decibels (dB)) acquired from the respective directions may be compared, and the direction with the highest level may be set as the sound source direction. Because the head-mounted display device 100 of this modification shows the user a character image representing the loudest sound, the user can recognize the external sound that most needs attention as visual information even when, for example, headphones make external sounds hard to hear.
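Selecting the loudest direction is a plain maximum over per-direction levels. The sketch below assumes the levels are already available as a mapping from a direction label to a level in dB; the labels and the function name are hypothetical.

```python
def loudest_direction(levels_db):
    """Return the direction label whose measured level (in dB) is highest."""
    if not levels_db:
        raise ValueError("at least one direction is required")
    return max(levels_db, key=levels_db.get)
```

For instance, with measured levels of -20.0 dB (front), -12.5 dB (left), and -30.0 dB (right), the left direction would be set as the sound source direction.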

In the above embodiment, the position at which the character image is displayed is set according to the deviation between the line-of-sight direction and the set sound source direction, but the method of setting this position is not limited to this and can be variously modified. For example, in this modification, the microphone 63c is most sensitive to sound arriving from the A direction and from the B direction. If, at a certain point in time, the sound level acquired by the microphone 63c is the same in the A direction and the B direction, and the level acquired from the A direction subsequently becomes higher than that from the B direction, the direction determination unit 161 determines that the target sound source has moved from the B direction toward the A direction. The image processing unit 160 and the image display unit 20 then display the character image converted from the sound acquired by the microphone 63c near the A direction in the maximum image display area PN. In this modification, the sensitivity with which the microphone 63c acquires sound differs by direction, and the image processing unit 160 and the image display unit 20 change the position at which the character image is displayed based on the level of the acquired sound. Therefore, even without the camera 61 or the like, the sound source direction can be set and the character image can be displayed near the target sound source, which improves convenience for the user.
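The A/B comparison can be sketched as a function of two consecutive level measurements. The text only describes the case in which the A direction becomes louder; the symmetric B case and the tolerance used to decide that two levels are "the same" are assumptions added here.

```python
def source_moved_toward(prev, curr, tolerance_db=1.0):
    """Infer which direction the source moved toward from two (level_a, level_b) pairs.

    Returns "A", "B", or None when no movement can be inferred.
    """
    prev_a, prev_b = prev
    curr_a, curr_b = curr
    # Movement is only inferred if the two directions were balanced beforehand.
    if abs(prev_a - prev_b) > tolerance_db:
        return None
    diff = curr_a - curr_b
    if diff > tolerance_db:
        return "A"
    if diff < -tolerance_db:
        return "B"  # symmetric case, assumed rather than stated in the text
    return None
```

A result of "A" would correspond to displaying the character image near the A direction in the maximum image display area PN.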

The sound source direction may also be set by GPS (Global Positioning System). For example, when the target sound source that emits the sound to be acquired is known in advance, as with the teacher TE, a GPS module may be provided at the target sound source and another in the image display unit 20, and the wireless communication unit 132 may receive the position information of the GPS module to determine the positional relationship between the image display unit 20 and the teacher TE carrying the GPS module. In this modification, the sound source direction is specified in more detail than when it is specified using the outside scene image from the camera 61, the directional microphone 63, or the like. The method of specifying the sound source direction relative to the image display unit 20 is not limited to using GPS modules and can be variously modified. For example, the teacher TE may carry a communication device that is paired one-to-one with the head-mounted display device 100.
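Given two GPS fixes, one common way to obtain a direction is the initial great-circle bearing from the wearer to the source, which can then be compared with the wearer's heading. The sketch below uses the standard bearing formula; the function names are illustrative, and taking the heading from a geomagnetic sensor is an assumption rather than something the specification states.

```python
import math


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees clockwise from north."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    y = math.sin(dlon) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(y, x)) % 360.0


def relative_direction_deg(bearing_deg, heading_deg):
    """Signed angle in [-180, 180) from the wearer's heading to the source bearing."""
    return (bearing_deg - heading_deg + 180.0) % 360.0 - 180.0
```

A positive result means the source lies to the right of the wearer's heading and a negative result to the left. Over classroom-scale distances the accuracy of such a bearing is limited by the accuracy of the position fixes themselves.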

Further, the distance from the image display unit 20 to the target sound source may be specified by satellite-based triangulation using the GPS module or the like, and the size of the character image displayed in the maximum image display area PN may be set based on the specified distance. For example, the shorter the distance from the image display unit 20 to the target sound source, the larger the character image representing the sound acquired by the camera 61 may be displayed in the maximum image display area PN. In this modification, the size of the character image changes according to the distance from the user to the target sound source, so the user can recognize the distance to the target sound source as visual information. The distance from the image display unit 20 to the target sound source may be classified into several stages, and the size of the character image displayed in the maximum image display area PN may be changed stepwise according to the stage. In this case, when the distance exceeds a certain value, the character image may be displayed at a position shifted from the center of the maximum image display area PN to improve visibility for the user. The type of the character image may be changed instead of its size, or both the size and the type may be changed. The sound source direction may be specified by the camera 61 or the like, with only the distance from the image display unit 20 to the target sound source specified by the GPS module or the like. The GPS module and the wireless communication unit 132 correspond to the distance specifying unit in the claims.
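The stepwise size change can be sketched as a lookup over distance bands. The band boundaries and point sizes below are invented for illustration; the specification does not give concrete values.

```python
import bisect

# Hypothetical stage boundaries in metres and the font size used in each stage.
DISTANCE_BOUNDS_M = [2.0, 5.0, 10.0]
FONT_SIZES_PT = [32, 24, 18, 12]  # one more entry than there are boundaries


def font_size_for_distance(distance_m):
    """Return the character size for a distance; nearer sources get larger text."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    stage = bisect.bisect_right(DISTANCE_BOUNDS_M, distance_m)
    return FONT_SIZES_PT[stage]
```

Because the size only changes when a boundary is crossed, small fluctuations in the measured distance do not cause the text to resize continuously.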

The method of specifying the distance from the image display unit 20 to the target sound source can be variously modified. Besides the GPS module, an optical double-image coincidence rangefinder or an ultrasonic rangefinder may be used. A plurality of cameras 61 may be provided at different locations on the image display unit 20, and the distance from the image display unit 20 to the target sound source may be specified based on the images captured by the cameras 61 at the different locations. The distance may also be specified by irradiating the target sound source with light and receiving and evaluating the reflected light with a light receiving element, or from the strength of wireless LAN radio waves.
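Two of the alternatives listed above have well-known textbook models, sketched here under stated assumptions. For two cameras, the pinhole stereo relation gives distance as focal length times baseline divided by disparity. For radio signal strength, the log-distance path loss model is a common approximation. The specification does not say which models are used, and the default reference level and path loss exponent below are placeholder values.

```python
def stereo_distance_m(focal_px, baseline_m, disparity_px):
    """Distance from two parallel cameras: Z = f * B / d (pinhole stereo model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def rssi_distance_m(rssi_dbm, rssi_at_1m_dbm=-40.0, path_loss_exponent=2.0):
    """Distance from received signal strength using the log-distance path loss model."""
    return 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10.0 * path_loss_exponent))
```

Signal-strength estimates in particular vary strongly with the environment, so in practice they would likely suit the coarse, staged distance classification better than a precise measurement.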

FIG. 17 is an explanatory diagram showing the external configuration of head-mounted display devices according to modifications. The example of FIG. 17(A) differs from the head-mounted display device 100 shown in FIG. 1 in that the image display unit 20c includes a right optical image display unit 26c in place of the right optical image display unit 26 and a left optical image display unit 28c in place of the left optical image display unit 28. The right optical image display unit 26c is formed smaller than the optical member of the above embodiment and is disposed obliquely above the right eye of the user US when the head-mounted display device 100c is worn. Similarly, the left optical image display unit 28c is formed smaller than the optical member of the above embodiment and is disposed obliquely above the left eye of the user US when the head-mounted display device 100c is worn. The example of FIG. 17(B) differs from the head-mounted display device 100 shown in FIG. 1 in that the image display unit 20d includes a right optical image display unit 26d in place of the right optical image display unit 26 and a left optical image display unit 28d in place of the left optical image display unit 28. The right optical image display unit 26d is formed smaller than the optical member of the above embodiment and is disposed obliquely below the right eye of the user US when the head-mounted display is worn. The left optical image display unit 28d is formed smaller than the optical member of the above embodiment and is disposed obliquely below the left eye of the user US when the head-mounted display is worn. Thus, it is sufficient for the optical image display units to be disposed near the eyes of the user US. The size of the optical members forming the optical image display units is also arbitrary, and the head-mounted display device 100 can be realized in a form in which the optical image display units cover only part of the eyes of the user US, in other words, a form in which they do not completely cover the eyes of the user US.

The present invention is not limited to the above embodiments and modifications and can be realized in various configurations without departing from its spirit. For example, the technical features in the embodiments and modifications that correspond to the technical features in the aspects described in the Summary of the Invention may be replaced or combined as appropriate in order to solve some or all of the problems described above or to achieve some or all of the effects described above. Technical features that are not described as essential in this specification may be deleted as appropriate.

DESCRIPTION OF SYMBOLS
10 ... control unit
11 ... determination key
12 ... lighting unit
13 ... display switching key
14 ... track pad
15 ... luminance switching key
16 ... direction key
17 ... menu key
18 ... power switch
20 ... image display unit (display position setting unit)
21 ... right holding unit
22 ... right display drive unit
23 ... left holding unit
24 ... left display drive unit
26 ... right optical image display unit
28 ... left optical image display unit
30 ... earphone plug
32 ... right earphone
34 ... left earphone
40 ... connection unit
42 ... right cord
44 ... left cord
46 ... connecting member
48 ... main body cord
51, 52 ... transmission unit
53, 54 ... reception unit
61 ... camera (image acquisition unit)
63 ... microphone (sound acquisition unit)
66 ... nine-axis sensor (line-of-sight direction estimation unit)
100 ... head-mounted display device
110 ... input information acquisition unit
120 ... storage unit
130 ... power supply
132 ... wireless communication unit (communication unit, distance specifying unit)
135 ... operation unit (specific direction setting unit)
140 ... CPU
142 ... image determination unit
150 ... operating system
160 ... image processing unit (display position setting unit)
161 ... direction determination unit (line-of-sight direction estimation unit)
163 ... microphone drive unit (sound acquisition unit)
170 ... sound processing unit (sound identification unit)
180 ... interface
185 ... conversion unit
190 ... display control unit
201 ... right backlight control unit
202 ... left backlight control unit
211 ... right LCD control unit
212 ... left LCD control unit
221 ... right backlight
222 ... left backlight
241 ... right LCD
242 ... left LCD
251 ... right projection optical system
252 ... left projection optical system
261 ... right light guide plate
262 ... left light guide plate
VSync ... vertical synchronization signal
HSync ... horizontal synchronization signal
PCLK ... clock signal
OA ... external device
WB ... whiteboard
TE ... teacher
ST, ST1, ST2, ST3 ... student
CM ... instruction image
PN ... maximum image display area
CR ... cursor
VR ... user's visual field
NT ... notebook
PEN ... pen
IMG ... image
BIM ... outside scene image
TX1, TX2, TX3, TX11, TX12, TX31, TX32, TX33, TX34, TX35, TX41, TX42 ... text image
MO ... mouth of teacher TE

Claims

A transmissive display device comprising:
an image display unit that generates image light representing an image, allows a user to visually recognize the image light, and transmits an outside scene;
a voice acquisition unit that acquires a voice;
a conversion unit that converts the voice into a character image, which is an image representing the voice as characters;
a specific direction setting unit that sets a specific direction; and
a display position setting unit that sets, based on the specific direction, an image display position, which is the position in the user's visual field at which character image light representing the character image is visually recognized.

The display device according to claim 1, wherein
the display position setting unit sets the image display position so that it does not overlap the position corresponding to the specific direction in the user's visual field.

The display device according to claim 1 or 2, wherein
the display position setting unit sets the image display position to a position corresponding to a part of the user's visual field other than its center.
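
The placement constraints of the two preceding claims (no overlap with the position corresponding to the specific direction, and a position other than the center) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the specification: the normalized 0..1 field-of-view coordinates, the `Rect` type, the four corner slots, and the rule of preferring the slot nearest the sound source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in normalized field-of-view coordinates (0..1)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    def overlaps(self, other: "Rect") -> bool:
        # True only when the two rectangles share interior area.
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.h and other.y < self.y + self.h)

    def center(self) -> tuple:
        return (self.x + self.w / 2, self.y + self.h / 2)


def choose_text_rect(source: Rect, text_w: float = 0.3,
                     text_h: float = 0.1) -> Optional[Rect]:
    """Pick the corner slot closest to the source that does not cover it."""
    # Hypothetical candidate slots: the four corners of the display area.
    # With text_w < 0.5 and text_h < 0.5 none of them covers the center.
    candidates = [Rect(x, y, text_w, text_h)
                  for x in (0.0, 1.0 - text_w)
                  for y in (0.0, 1.0 - text_h)]
    free = [c for c in candidates if not c.overlaps(source)]
    if not free:
        return None  # no non-overlapping slot; the caller must decide what to do
    sx, sy = source.center()

    def squared_distance(c: Rect) -> float:
        cx, cy = c.center()
        return (cx - sx) ** 2 + (cy - sy) ** 2

    return min(free, key=squared_distance)
```

For a source region in the upper right of the field of view, for example `Rect(0.6, 0.0, 0.2, 0.3)`, the upper-right slot is rejected because it overlaps the source, and the upper-left slot is returned.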

The display device according to any one of claims 1 to 3, wherein
the voice acquisition unit has a sensitivity for acquiring the voice that differs according to the direction from a sound source to the voice acquisition unit, and
the specific direction is set based on the sensitivity with which the voice is acquired.
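
One way to read "set based on the sensitivity" is to take as the specific direction the microphone orientation at which the acquired level peaks. The sketch below assumes an idealized cardioid sensitivity pattern and a list of (azimuth, level) samples; the pattern, the sampling scheme, and both function names are hypothetical and are not taken from the specification.

```python
import math


def cardioid_gain(offset_deg: float) -> float:
    """Relative sensitivity of an idealized cardioid microphone.

    offset_deg is the angle between the microphone axis and the sound source.
    """
    return 0.5 * (1.0 + math.cos(math.radians(offset_deg)))


def direction_of_max_level(samples):
    """Return the microphone azimuth (degrees) at which the level peaked.

    samples is an iterable of (mic_azimuth_deg, measured_level) pairs.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("at least one sample is required")
    azimuth, _level = max(samples, key=lambda s: s[1])
    return azimuth
```

With a simulated source at an azimuth of 30 degrees and samples taken every 30 degrees from -90 to 90, the measured level is highest when the microphone axis points at 30 degrees, so that azimuth is returned.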

The display device according to any one of claims 1 to 3, wherein
the specific direction is the direction from the voice acquisition unit to a sound source.

The display device according to any one of claims 1 to 5, further comprising
an image acquisition unit that acquires images of the outside scene at a plurality of points in time, wherein
the display position setting unit sets the image display position based on the specific direction and on a change among the images of the outside scene acquired at the plurality of points in time.

The display device according to any one of claims 1 to 6, further comprising
a distance specifying unit that specifies the distance from the display device to the sound source, wherein
the display position setting unit performs, based on the specified distance, at least one of changing the size of the character image light and setting the image display position.
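
A distance-dependent change in the size of the character image could look like the following sketch. The claim states only that the size changes based on the specified distance; the direction of the change (larger text for nearer sources), the linear interpolation, and all parameter names and default values are assumptions chosen for illustration.

```python
def text_scale_for_distance(distance_m: float, near_m: float = 1.0,
                            far_m: float = 10.0, max_scale: float = 2.0,
                            min_scale: float = 1.0) -> float:
    """Map the distance to the sound source to a text scale factor.

    Sources at or closer than near_m get max_scale, sources at or beyond
    far_m get min_scale, and distances in between are interpolated linearly.
    """
    if far_m <= near_m:
        raise ValueError("far_m must be greater than near_m")
    if distance_m <= near_m:
        return max_scale
    if distance_m >= far_m:
        return min_scale
    t = (distance_m - near_m) / (far_m - near_m)  # 0 at near_m, 1 at far_m
    return max_scale + t * (min_scale - max_scale)
```

With the default values, a source 5.5 m away lies halfway between the near and far limits and yields a scale factor of 1.5.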

The display device according to any one of claims 1 to 7, further comprising
a distance specifying unit that specifies the distance from the display device to the sound source, wherein
the conversion unit changes the type of the character image based on the specified distance.

The display device according to any one of claims 1 to 8, wherein
the voice acquisition unit has a sensitivity for acquiring the volume of a voice that differs according to the direction from a sound source to the voice acquisition unit, and
the display position setting unit sets the image display position based on the volume of a voice from the same sound source, which is acquired at a different volume for each direction.

The display device according to any one of claims 1 to 9, wherein
the voice acquisition unit has a sensitivity for acquiring a voice that differs according to the direction from a sound source to the voice acquisition unit, and is set so that the sensitivity for acquiring a voice from the specific direction is maximized.

The display device according to any one of claims 1 to 10, further comprising:
a voice identification unit that identifies, by type, different types of voices acquired from a plurality of sound sources; and
an operation unit that receives an operation by the user, wherein
the specific direction setting unit specifies, based on the operation, a specific sound source direction, which is the direction from the voice acquisition unit to the sound source from which one of the plurality of voices is acquired, and
the display position setting unit sets the position in the user's visual field at which the character image light representing the one voice is visually recognized to the vicinity of the position corresponding to the specific sound source direction.

The display device according to claim 11, wherein
the display position setting unit sets the position in the user's visual field at which the character image light representing the one voice is visually recognized to a position that does not overlap any of a plurality of positions corresponding to the specific sound source directions.

The display device according to claim 11 or 12, wherein
the image display unit renders the plurality of voices as character image light that differs for each type of voice, and allows the user to visually recognize the character image light for each of the plurality of types of voice, and
the operation is an operation of specifying, from among the character image light for each of the plurality of types of voice visually recognized in the user's visual field, the character image light corresponding to the voice from one specific sound source direction.

The display device according to any one of claims 1 to 13, wherein
the image display unit causes the user to visually recognize the character image light as a virtual image after a predetermined delay from the time at which the voice acquisition unit acquires the voice.

The display device according to any one of claims 1 to 14, further comprising
a gaze direction estimation unit that estimates the gaze direction of the user, wherein
the image display unit allows the user to visually recognize the image light while worn on the user's head, and
the display position setting unit sets the image display position based on the relationship between the specific direction and the gaze direction.

The display device according to claim 15, wherein
the display position setting unit sets the image display position to the vicinity of the position corresponding to the specific direction in the user's visual field when a specific angle, which is the angle formed by the gaze direction and the specific direction, is less than a first threshold, and sets the image display position independently of the specific direction when the specific angle is equal to or greater than the first threshold.
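
The first-threshold test in the preceding claim can be sketched as follows. The sketch assumes that both directions are available as 3-D vectors in a common coordinate frame; the function names, the returned mode strings, and the 30-degree default are hypothetical, since the claim gives no value for the first threshold.

```python
import math


def angle_between_deg(a, b) -> float:
    """Angle in degrees between two non-zero direction vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cosine))


def placement_mode(gaze, specific, first_threshold_deg: float = 30.0) -> str:
    """Decide whether the text follows the specific direction or not."""
    if angle_between_deg(gaze, specific) < first_threshold_deg:
        return "near_specific_direction"
    return "independent_of_specific_direction"
```

When the user looks straight at the sound source the angle is 0 degrees and the text is placed near the source; when the source is 90 degrees to the side, the text is placed without regard to the specific direction.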

The display device according to claim 15 or 16, further comprising
an image acquisition unit that acquires an image of the outside scene, wherein
the image display unit, when a specific angle, which is the angle formed by the gaze direction and the specific direction, is equal to or greater than a second threshold, generates specific direction image light, which is image light representing the image in the specific direction acquired by the image acquisition unit, and allows the user to visually recognize it, and does not generate the specific direction image light when the specific angle is less than the second threshold, and
the display position setting unit, when the specific angle is equal to or greater than the second threshold, sets the position at which the specific direction image light is visually recognized to a position that is near the image display position and does not overlap it, and, when the specific angle is less than the second threshold, sets the image display position to the vicinity of the position corresponding to the specific direction in the user's visual field.
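
The second-threshold behavior in the preceding claim amounts to a two-way layout decision, sketched below. The `Layout` type, the anchor names, and the 60-degree default are hypothetical. In particular, the claim does not say where the text itself goes when the angle is at or above the threshold, so the `"preset"` anchor used in that branch is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Layout:
    text_anchor: str                    # where the character image is placed
    show_source_image: bool             # whether the specific direction image is shown
    source_image_anchor: Optional[str]  # where that image is placed, if shown


def layout_for_angle(specific_angle_deg: float,
                     second_threshold_deg: float = 60.0) -> Layout:
    """Choose the layout from the angle between gaze and specific direction."""
    if specific_angle_deg >= second_threshold_deg:
        # The source is far from the gaze direction: show a camera image of it
        # next to the text, without overlapping the text.
        return Layout("preset", True, "adjacent_to_text")
    # The source is roughly in view: no camera image, text placed near the source.
    return Layout("near_specific_direction", False, None)
```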

The display device according to any one of claims 1 to 17, further comprising
a voice identification unit that distinguishes between the acquired voice and a specific voice different from the acquired voice, wherein
the conversion unit converts the acquired voice and the specific voice into character images of different types.

The display device according to claim 18, further comprising
a communication unit that acquires an audio signal by communication, wherein
the specific voice is a voice output based on the audio signal acquired by the communication unit.

A transmissive head-mounted display device comprising:
an image display unit that generates image light representing an image and allows a user to visually recognize the image light while mounted on the user's head;
a voice acquisition unit that acquires a voice;
a conversion unit that converts the voice into a character image, which is an image representing the voice as characters;
a gaze direction estimation unit that estimates the gaze direction of the user; and
a display position setting unit that sets, based on a change in the gaze direction, an image display position, which is the position in the user's visual field at which character image light representing the character image is visually recognized.

The head-mounted display device according to claim 20, wherein
the voice acquisition unit has a sensitivity for acquiring the volume of a voice that differs according to the direction from a sound source to the voice acquisition unit, and
the display position setting unit sets the image display position based on the volume of a voice from the same sound source, which is acquired at a different volume for each direction.

The head-mounted display device according to claim 20 or 21, wherein
the gaze direction estimation unit estimates a specific value that is at least one of an angular velocity and an angle change amount of the gaze direction, taking as a reference a display state in which the character image light is visually recognized by the user, and
the display position setting unit sets the image display position to a part of the user's visual field other than its central part when the specific value exceeds a certain value.

The head-mounted display device according to claim 22, wherein
the gaze direction estimation unit estimates the direction of gravity and a horizontal direction perpendicular to the direction of gravity, and
the display position setting unit sets the image display position in the user's visual field based on the specific value in the display state with respect to the direction of gravity and the horizontal direction.

The head-mounted display device according to claim 22 or 23, wherein
the display position setting unit sets the image display position to a part of the user's visual field other than its central part when the angle change amount is equal to or greater than a third threshold, and sets the image display position to a preset position in the user's visual field when the angle change amount is less than the third threshold.

The head-mounted display device according to any one of claims 22 to 24, wherein
the display position setting unit sets the image display position to the central part of the user's visual field when a predetermined time has elapsed with the angle change amount remaining less than a fourth threshold, and sets the image display position to a part of the user's visual field other than its central part when the angle change amount is equal to or greater than the fourth threshold.
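
The dwell-time rule in the preceding claim can be sketched as a small state machine: the text moves off center as soon as the angle change amount reaches the fourth threshold, and returns to the center only after the change has stayed below the threshold for the predetermined time. The class name, the default values, the initial state, and the choice to hold the previous position while the timer is still running are all assumptions; the claim does not address the interval before the predetermined time has elapsed.

```python
class DwellPlacer:
    """Tracks whether the text should be centered or moved off center."""

    def __init__(self, fourth_threshold_deg: float = 5.0, dwell_s: float = 2.0):
        self.fourth_threshold_deg = fourth_threshold_deg
        self.dwell_s = dwell_s
        self.position = "off_center"  # assumed initial state
        self._still_since = None      # time at which the head became still

    def update(self, t_s: float, angle_change_deg: float) -> str:
        """Feed one sample (timestamp in seconds, angle change in degrees)."""
        if angle_change_deg >= self.fourth_threshold_deg:
            # The head is moving: restart the dwell timer and clear the center.
            self._still_since = None
            self.position = "off_center"
        else:
            if self._still_since is None:
                self._still_since = t_s
            if t_s - self._still_since >= self.dwell_s:
                self.position = "center"
            # Otherwise keep the previous position until the dwell time elapses.
        return self.position
```

For example, after a large head movement at t = 0 s followed by still samples starting at t = 0.5 s, the text remains off center at t = 2.0 s and returns to the center at t = 2.5 s, once the head has been still for 2 s.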

The head-mounted display device according to any one of claims 22 to 25, wherein
the display position setting unit sets the image display position to a part of the user's visual field other than its central part when the angular velocity is equal to or greater than a fifth threshold, and sets the image display position to a preset position in the user's visual field when the angular velocity is less than the fifth threshold.
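
The angular-velocity test in the preceding claim can be sketched as below, assuming the angular velocity is available as a three-axis rate reading in degrees per second, such as a gyroscope might report. The input format, the function names, the returned position strings, and the 30 deg/s default are hypothetical and are not taken from the specification.

```python
import math


def angular_speed_dps(gyro_xyz) -> float:
    """Magnitude of a three-axis angular rate, in degrees per second."""
    return math.sqrt(sum(component * component for component in gyro_xyz))


def position_for_angular_velocity(gyro_xyz,
                                  fifth_threshold_dps: float = 30.0) -> str:
    """Move the text off center while the head is turning quickly."""
    if angular_speed_dps(gyro_xyz) >= fifth_threshold_dps:
        return "off_center"
    return "preset"
```

A reading of (3, 4, 0) deg/s has a magnitude of 5 deg/s and leaves the text at its preset position, whereas (30, 40, 0) deg/s has a magnitude of 50 deg/s and moves the text off center.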

The head-mounted display device according to any one of claims 22 to 26, wherein
the display position setting unit sets the image display position to the central part of the user's visual field when a predetermined time has elapsed with the angular velocity remaining less than a sixth threshold, and sets the image display position to a part of the user's visual field other than its central part when the angular velocity is equal to or greater than the sixth threshold.

A control method for a transmissive display device including an image display unit that generates image light representing an image, allows a user to visually recognize the image light, and transmits an outside scene, the method comprising:
acquiring a voice;
converting the voice into a character image, which is an image representing the voice as characters;
generating character image light, which is image light representing the character image, allowing the user to visually recognize the character image light, and transmitting the outside scene;
setting a specific direction; and
setting, based on the specific direction, the position in the user's visual field at which the character image light is visually recognized.

A control method for a transmissive head-mounted display device including an image display unit that generates image light representing an image, allows a user to visually recognize the image light while mounted on the user's head, and transmits an outside scene, the method comprising:
acquiring a voice;
converting the voice into a character image, which is an image representing the voice as characters;
generating character image light, which is image light representing the character image, allowing the user to visually recognize the character image light while the image display unit is mounted on the user's head, and transmitting the outside scene;
estimating the gaze direction of the user; and
setting, based on the gaze direction, the position in the user's visual field at which the character image light is visually recognized.