JP6201332B2

JP6201332B2 - Sound processor

Info

Publication number: JP6201332B2
Application number: JP2013027665A
Authority: JP
Inventors: 薫千代
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2013-02-15
Filing date: 2013-02-15
Publication date: 2017-09-27
Anticipated expiration: 2033-02-15
Also published as: JP2014158151A

Description

The present invention relates to a sound processing device.

Head-mounted display devices (head mounted displays, HMDs), which are display devices worn on the head, are known. A head-mounted display device, for example, generates image light representing an image using a liquid crystal display and a light source, and guides the generated image light to the user's eyes using a projection optical system or a light guide plate, thereby allowing the user to visually recognize a virtual image. There are two types of head-mounted display device: a transmissive type, in which the user can see the outside scene in addition to the virtual image, and a non-transmissive type, in which the user cannot see the outside scene. Transmissive head-mounted display devices include optical transmissive and video transmissive types.

For transmissive head-mounted display devices, techniques are known that convert sound into a character image representing the sound and present it to the user. For example, Patent Document 1 discloses a technique in which, when a user of a head-mounted display device views content recorded on a recording medium, the sound included in the content is separated into human voices and environmental sounds, a text image representing the environmental sounds is generated, and the generated text image is presented to the user. Patent Document 2 discloses a technique for choir practice involving a choir and a conductor wearing a head-mounted display device, in which choir members whose tempo or pitch is off are identified and an image indicating the identified members is presented to the conductor.

Patent Document 1: JP 2011-250100 A
Patent Document 2: JP 2010-145878 A

However, with the technique described in Patent Document 1, the user can adjust how well external sound (sound other than that of the content) can be heard by manually adjusting the volume of the content being viewed; if the user does not adjust the volume and the content volume is high, external sound can be hard to hear. There was also a need to make the user aware of sudden, unpredictable external sounds, such as those in an emergency, even while the content volume is high. The technique of Patent Document 2 can make the user aware of a sound source whose tempo or pitch is off, but it cannot adjust the volume of the sound the user is listening to. These problems are not limited to head-mounted display devices and are common to sound output devices that output sound.

The present invention has been made to solve at least part of the problems described above, and can be realized in the following aspects.
A first aspect of the present invention is a sound processing device comprising:
a sound output unit that outputs sound;
a sound acquisition unit that acquires external sound, which is sound from outside that differs from the output sound output by the sound output unit;
a sound control unit that controls the output sound based on the external sound;
an image display unit that generates image light representing an image, allows a user to visually recognize the image light, and transmits the outside scene;
a conversion unit that converts the external sound into a character image representing the external sound as characters; and
a display control unit that causes the image display unit to generate character image light, which is the image light representing the character image,
wherein the display control unit causes the image display unit to generate the character image light in a peripheral part, excluding the central part, of an image-generable region, which is the region in which the image display unit can generate the image light, regardless of the position of the sound source of the external sound,
the sound processing device includes means for detecting a specific word included in the external sound, and
when the external sound includes the specific word, the sound control unit raises the output volume so that the user cannot hear the specific word.
A second aspect of the present invention is a sound processing device comprising:
a sound output unit that outputs sound;
a sound acquisition unit that acquires external sound, which is sound from outside that differs from the output sound output by the sound output unit;
a sound control unit that controls the output sound based on the external sound;
an image display unit that generates image light representing an image, allows a user to visually recognize the image light, and transmits the outside scene;
a conversion unit that converts the external sound into a character image representing the external sound as characters; and
a display control unit that causes the image display unit to generate character image light, which is the image light representing the character image,
wherein the display control unit causes the image display unit to generate the character image light in a peripheral part, excluding the central part, of an image-generable region, which is the region in which the image display unit can generate the image light, regardless of the position of the sound source of the external sound,
the sound processing device includes means for detecting a specific word included in the external sound, and
when the external sound includes the specific word, the conversion unit does not convert the specific word into a character image.
A third aspect of the present invention is a sound processing device comprising:
a sound output unit that outputs sound;
a sound acquisition unit that acquires external sound, which is sound from outside that differs from the output sound output by the sound output unit;
a sound control unit that controls the output sound based on the external sound;
an image display unit that generates image light representing an image, allows a user to visually recognize the image light, and transmits the outside scene;
a conversion unit that converts the external sound into a character image representing the external sound as characters; and
a display control unit that causes the image display unit to generate character image light, which is the image light representing the character image,
wherein the display control unit causes the image display unit to generate the character image light in a peripheral part, excluding the central part, of an image-generable region, which is the region in which the image display unit can generate the image light, regardless of the position of the sound source of the external sound,
the sound processing device includes means for detecting a specific word included in the external sound, and
when the external sound includes the specific word, the display control unit does not cause the image display unit to display the specific word.
The present invention can also be realized in the following aspects.
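The three aspects above differ only in how a detected specific word is handled. The following Python sketch illustrates that difference under stated assumptions: the external sound is taken to be already recognized as a list of words, and the names `Decision`, `handle_external_speech`, `policy`, and `boost` are hypothetical and do not appear in the specification. It is not the patented implementation.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    output_volume: float  # volume to use for the output sound (0.0 to 1.0)
    caption: str          # text handed on for display; may be empty


def handle_external_speech(words, blocked, volume, policy, boost=0.3):
    """Apply one of three illustrative policies to recognized external words."""
    blocked_set = {w.lower() for w in blocked}
    hit = any(w.lower() in blocked_set for w in words)

    if policy == "raise_volume":
        # First aspect: raise the output volume so the word is masked.
        # The caption is left unchanged here (an assumption of this sketch).
        new_volume = min(1.0, volume + boost) if hit else volume
        return Decision(new_volume, " ".join(words))

    if policy in ("skip_conversion", "hide_on_display"):
        # Second aspect: the word is never converted to a character image.
        # Third aspect: the word is converted but not displayed.
        # Both yield the same visible caption in this simplified model.
        kept = [w for w in words if w.lower() not in blocked_set]
        return Decision(volume, " ".join(kept))

    raise ValueError(f"unknown policy: {policy}")
```

In a real device the second and third policies would sit in different components (the conversion unit versus the display control unit); the sketch collapses them because only the visible result is modeled.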

(1) According to one aspect of the present invention, a sound processing device is provided. The sound processing device includes: a sound output unit that outputs sound; a sound acquisition unit that acquires external sound, which is sound from outside that differs from the output sound output by the sound output unit; and a sound control unit that controls the output sound based on the external sound. Because the sound processing device of this aspect controls the output sound based on the external sound, the user can hear the external sound while listening to output sound such as supplied content, and can quickly notice changes in the surroundings.

(2) The sound processing device of the above aspect may further include: an image display unit that generates image light representing an image, allows a user to visually recognize the image light, and transmits the outside scene; a conversion unit that converts the external sound into a character image representing the external sound as characters; and a display control unit that causes the image display unit to generate character image light, which is the image light representing the character image. With this aspect, the external sound acquired by the sound acquisition unit is presented to the user as a character image representing it, so the user can recognize the external sound in more detail.

(3) In the sound processing device of the above aspect, the display control unit may set, based on the external sound, the size of the character image light that the image display unit presents to the user. With this aspect, the character size of the character image is set based on the external sound, so the user hears the external sound and also sees differences in its volume or frequency reflected in the character size, and can thus recognize the external sound in more detail.
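One way to picture aspect (3) is a mapping from the measured external-sound level to a character size. The Python sketch below uses clamped linear interpolation; the function name `caption_font_size`, the decibel range, and the point sizes are all illustrative assumptions, since the specification does not state a particular mapping.

```python
def caption_font_size(volume_db, min_db=40.0, max_db=90.0, min_pt=12, max_pt=36):
    """Map an external-sound level to a font size by linear interpolation."""
    # Clamp the level so very quiet or very loud sounds stay within the size range.
    clamped = max(min_db, min(max_db, volume_db))
    ratio = (clamped - min_db) / (max_db - min_db)
    return round(min_pt + ratio * (max_pt - min_pt))
```

A frequency-based mapping would have the same shape, with the level replaced by, for example, the dominant frequency of the sound.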

(4) In the sound processing device of the above aspect, the display control unit may cause the image display unit to generate the character image light in a peripheral part, excluding the central part, of the image-generable region, which is the region in which the image display unit can generate the image light. With this aspect, no character image representing the external sound is displayed in the central part of the image-generable region, so the user's field of view is not obstructed more than necessary; the user can see both the outside scene transmitted through the image display unit and the character image, which improves convenience.
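The placement rule in aspect (4) can be sketched as a small layout helper. The Python example below assumes, purely for illustration, that the central part is a centered rectangle covering a fraction `center_frac` of each dimension and that captions go in the band below it; `peripheral_position` and its parameters are hypothetical names, and the y axis is taken to grow downward.

```python
def peripheral_position(region_w, region_h, text_w, text_h, center_frac=0.5):
    """Return the top-left (x, y) of a caption placed below the central part."""
    # Lower edge of the centered rectangle that must stay clear.
    central_bottom = region_h * (1 + center_frac) / 2
    if text_h > region_h - central_bottom:
        raise ValueError("caption too tall for the peripheral band")
    x = (region_w - text_w) / 2   # horizontally centered
    y = region_h - text_h         # flush with the bottom edge
    return x, y
```

For a 960 x 540 region and a 300 x 60 caption, the central rectangle ends at y = 405, so the caption at y = 480 stays entirely inside the peripheral band.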

(5) The sound processing device of the above aspect may have a plurality of sound acquisition units arranged at different positions, and may further include a sound source direction estimation unit that estimates a sound source direction, which is the direction from the image display unit to the sound source of the external sound, based on the volume of the external sound acquired by one sound acquisition unit and the volume of the external sound acquired by another sound acquisition unit. The display control unit may cause the image display unit to generate the character image light in the image-generable region, which is the region in which the image display unit can generate the image light, in association with the sound source direction. With this aspect, the user sees the character image representing the external sound at a position near the sound source and can thereby recognize where the sound source is.
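As a rough illustration of aspect (5), the level difference between two microphones can be turned into a left/right balance and then into a horizontal caption position. The Python sketch below is a crude heuristic chosen for brevity, not the estimation method of the specification; `estimate_balance` and `caption_x` are hypothetical names, and the levels are assumed to be non-negative linear amplitudes such as RMS values.

```python
def estimate_balance(right_level, left_level):
    """Return a balance in [-1, 1]: -1 is fully left, +1 is fully right."""
    total = right_level + left_level
    if total <= 0:
        return 0.0  # silence: treat the source as straight ahead
    return (right_level - left_level) / total


def caption_x(balance, region_w, text_w):
    """Map the balance to the left edge of the caption inside the region."""
    usable = region_w - text_w  # range of valid left-edge positions
    return (balance + 1) / 2 * usable
```

With the right microphone three times as loud as the left, the balance is 0.5 and a 160-pixel caption in a 960-pixel region is placed with its left edge at x = 600, toward the right side.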

(6) In the sound processing device of the above aspect, the sound control unit may measure the volume of the external sound and lower the volume of the output sound when the measured volume changes from below a first threshold to at or above the first threshold. With this aspect, even when the volume of output sound such as content is high, the output volume is lowered when the external sound is loud, so the user can hear a loud external sound such as a warning sound more clearly.
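The condition in aspect (6) is an upward threshold crossing, not simply "external sound is loud". The Python sketch below makes that distinction explicit; the class name `VolumeDucker` and the `duck_factor` parameter are assumptions for illustration, and how much the volume is lowered is not specified in the text above.

```python
class VolumeDucker:
    """Lower the output volume when the external level crosses a threshold upward."""

    def __init__(self, threshold, duck_factor=0.5):
        self.threshold = threshold
        self.duck_factor = duck_factor
        self.prev_external = None  # no measurement seen yet

    def update(self, external_level, output_volume):
        """Return the output volume to use after this measurement."""
        crossed = (
            self.prev_external is not None
            and self.prev_external < self.threshold <= external_level
        )
        self.prev_external = external_level
        return output_volume * self.duck_factor if crossed else output_volume
```

The caller is expected to keep the returned volume and pass it back in on the next measurement. Because only the crossing triggers a change, the volume is lowered once and is not lowered again while the external sound stays above the threshold.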

(7) The sound processing device of the above aspect may further include a specific sound detection unit that detects a specific sound included in the external sound, and the sound control unit may control the output sound based on the specific sound detected by the specific sound detection unit. With this aspect, the output volume is lowered when a specific sound that the user wants to detect is detected, so the user can hear just the sounds in the external sound that matter, which improves convenience.

(8) The sound processing device of the above aspect may further include a motion detection unit that detects a change in the movement of the user's head, and the sound control unit may control the output sound based on the change in head movement detected by the motion detection unit. With this aspect, the volume, frequency, and so on of the output sound are adjusted based on two indicators, the change in the external sound and the change in the user's line-of-sight direction, so unnecessary adjustments are suppressed compared with adjusting the output volume on changes in the external sound alone, which improves convenience.

(9) In the sound processing device of the above aspect, the sound control unit may cause the sound output unit to output, as the output sound, the external sound with its volume set based on a change in the external sound. With this aspect, when a specific sound in the external sound, such as a warning sound, is detected at a volume lower than the output sound, it can be played to the user as output sound at a volume higher than that of the external sound itself, which improves convenience.

(10) In the sound processing device of the above aspect, when the sound acquisition unit acquires the voice of a specific person, the sound control unit may change the frequency of the acquired voice, cause the sound output unit to output the voice with the changed frequency, and lower the volume of sounds other than that voice. With this aspect, when a voice at a frequency that is hard for people to hear is acquired, the voice is made easier to hear, so the user can recognize the external sound more easily.

(11) In the sound processing device of the above aspect, the sound control unit may measure the volume of the external sound and control the output sound based on the measured volume of the external sound and the volume of the output sound. With this aspect, the output volume is adjusted based on both the external volume and the output volume, so the output sound can be played at a volume that reflects the user's intent, which improves convenience.

(12) In the sound processing device of the above aspect, the sound output unit may let the user hear the output sound while worn on the user's head. With this aspect, because the sound output unit is worn on the user's head, it is less likely to come out of the user's ears, and the output sound can be delivered to the user stably.

(13) In the sound processing device of the above aspect, the change in head movement may be at least one of the angular velocity and the amount of change in angle of the user's head, and the sound control unit may lower the volume of the output sound when the change in head movement is at or above a second threshold. With this aspect, the output volume is adjusted based on two indicators, the change in the external sound and the change in the user's line-of-sight direction, so unnecessary adjustments are suppressed compared with adjusting the output volume on changes in the external sound alone, which improves convenience.
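The two-indicator idea in aspects (8) and (13) can be sketched as a simple conjunction: the output volume is lowered only when the external sound has changed and the head has also moved enough. In the Python sketch below, `should_lower_output`, the units (degrees and degrees per second), and both threshold values are illustrative assumptions; the specification speaks only of a "second threshold".

```python
def should_lower_output(external_crossed, angular_velocity_dps, angle_change_deg,
                        velocity_threshold=50.0, angle_threshold=30.0):
    """Decide whether to lower the output volume from two indicators."""
    # Head movement counts if either measure reaches its threshold.
    head_moved = (
        abs(angular_velocity_dps) >= velocity_threshold
        or abs(angle_change_deg) >= angle_threshold
    )
    # Require both a change in external sound and a head movement.
    return external_crossed and head_moved
```

Requiring both conditions is what suppresses unnecessary adjustments: a loud sound that the user does not react to, or a head turn with no change in external sound, leaves the volume alone.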

(14) The sound processing device of the above aspect may further include an analysis unit that analyzes the intensity at each frequency in the frequency spectrum of the external sound, and the sound control unit may control the volume of the output sound based on a specific frequency in the frequency spectrum analyzed by the analysis unit. With this aspect, when an abnormal sound or the like at a different frequency is detected in the acquired external sound, the user can be made aware of it, which improves convenience.
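A minimal way to illustrate aspect (14) is to measure the strength of the external sound at one frequency of interest and lower the output volume when that strength is high. The Python sketch below evaluates a single discrete Fourier transform bin directly instead of computing a full spectrum; `band_strength`, `output_volume_for`, and the threshold values are assumptions made for illustration only.

```python
import cmath
import math


def band_strength(samples, sample_rate, freq_hz):
    """Return the normalised magnitude of the DFT bin nearest to freq_hz."""
    n = len(samples)
    if n == 0:
        return 0.0
    k = round(freq_hz * n / sample_rate)  # index of the nearest bin
    acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
              for i, s in enumerate(samples))
    # Scale so that a pure sine of amplitude A at the bin frequency gives A.
    return 2 * abs(acc) / n


def output_volume_for(samples, sample_rate, freq_hz, volume,
                      strength_threshold=0.2, duck_factor=0.5):
    """Lower the output volume when the watched frequency is strong enough."""
    if band_strength(samples, sample_rate, freq_hz) >= strength_threshold:
        return volume * duck_factor
    return volume
```

For example, with 800 samples of a 1 kHz sine at amplitude 0.5 sampled at 8 kHz, watching 1 kHz halves an output volume of 0.8, while watching 2 kHz leaves it unchanged.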

Not all of the constituent elements of each aspect of the invention described above are essential. To solve some or all of the problems described above, or to achieve some or all of the effects described in this specification, some of those constituent elements may be changed, deleted, or replaced with new constituent elements, or part of their limitations may be removed, as appropriate. Likewise, to solve some or all of the problems described above, or to achieve some or all of the effects described in this specification, some or all of the technical features included in one aspect of the invention described above may be combined with some or all of the technical features included in another aspect to form an independent aspect of the invention.

For example, one aspect of the invention can be realized as a device including one or more, or all, of three elements: a sound output unit, a sound acquisition unit, and a sound control unit. That is, the device may or may not have a sound output unit, may or may not have a sound acquisition unit, and may or may not have a sound control unit. The sound output unit may, for example, output sound. The sound acquisition unit may, for example, acquire external sound, which is sound from outside that differs from the output sound output by the sound output unit. The sound control unit may, for example, control the output sound based on the external sound. Such a device can be realized as, for example, a sound processing device, but can also be realized as a device other than a sound processing device. Such an aspect can solve at least one of various problems, such as improving convenience for the user of the device, integrating the device, and improving usability. Some or all of the technical features of each aspect of the sound processing device described above can be applied to this device.

The invention can also be realized in various forms other than a sound processing device, for example: an image display device, a head-mounted display device, a control method for a sound processing device, a control method for an image display device, a control method for a head-mounted display device, a sound processing system, an image display system, or a head-mounted image display system. It can also be realized as a computer program for implementing the functions of a sound processing device system, an image display device system, or a head-mounted display device system, as a recording medium on which that computer program is recorded, or as a data signal that includes that computer program and is embodied in a carrier wave.

An explanatory diagram showing the external configuration of the head-mounted display device 100.
A block diagram functionally showing the configuration of the head-mounted display device 100.
An explanatory diagram showing how image light is emitted by the image light generation unit.
An explanatory diagram showing the flow of the output volume adjustment process.
An explanatory diagram showing an example of the visual field VR seen by the user.
An explanatory diagram showing an example of changes in the volume of the acquired external sound.
An explanatory diagram showing an example of the frequency of the acquired external sound.
An explanatory diagram showing an example of the visual field VR' seen by the driver of an automobile.

Next, embodiments of the invention are described in the following order.
A. First embodiment:
A-1. Configuration of the head-mounted display device:
A-2. Output volume adjustment process:
B. Modifications:

A. First embodiment:
A-1. Configuration of the head-mounted display device:
FIG. 1 is an explanatory diagram showing the external configuration of the head-mounted display device 100. The head-mounted display device 100 is a display device worn on the head and is also called a head mounted display (HMD). The head-mounted display device 100 of this embodiment is an optical transmissive head-mounted display device that lets the user see a virtual image and, at the same time, see the outside scene directly. In this specification, the virtual image that the user sees through the head-mounted display device 100 is also called the "display image" for convenience, and emitting image light generated based on image data is also called "displaying an image".

The head-mounted display device 100 includes an image display unit 20 that lets the user see a virtual image while worn on the user's head, and a control unit 10 (controller 10) that controls the image display unit 20.

The image display unit 20 is a wearable body worn on the user's head and, in this embodiment, has the shape of a pair of glasses. The image display unit 20 includes a right holding unit 21, a right display drive unit 22, a left holding unit 23, a left display drive unit 24, a right optical image display unit 26, a left optical image display unit 28, a camera 61, and microphones 62 and 64. The right optical image display unit 26 and the left optical image display unit 28 are arranged so as to be positioned in front of the user's right and left eyes, respectively, when the user wears the image display unit 20. One end of the right optical image display unit 26 and one end of the left optical image display unit 28 are connected to each other at a position corresponding to the area between the user's eyebrows when the user wears the image display unit 20.

The right holding unit 21 is a member that extends from the end ER, which is the other end of the right optical image display unit 26, to a position corresponding to the side of the user's head when the user wears the image display unit 20. Similarly, the left holding unit 23 is a member that extends from the end EL, which is the other end of the left optical image display unit 28, to a position corresponding to the side of the user's head when the user wears the image display unit 20. The right holding unit 21 and the left holding unit 23 hold the image display unit 20 on the user's head in the manner of the temples of a pair of glasses.

The right display drive unit 22 and the left display drive unit 24 are arranged on the side that faces the user's head when the user wears the image display unit 20. In the following, the right holding unit 21 and the left holding unit 23 are also referred to collectively as the "holding units", the right display drive unit 22 and the left display drive unit 24 as the "display drive units", and the right optical image display unit 26 and the left optical image display unit 28 as the "optical image display units".

The display drive units 22 and 24 include liquid crystal displays 241 and 242 (hereinafter also called "LCDs 241 and 242"), projection optical systems 251 and 252, and the like (see FIG. 2). The configuration of the display drive units 22 and 24 is described in detail later. The optical image display units 26 and 28, which serve as optical members, include light guide plates 261 and 262 (see FIG. 2) and light control plates. The light guide plates 261 and 262 are formed of a light-transmissive resin material or the like and guide the image light output from the display drive units 22 and 24 to the user's eyes. Each light control plate is a thin plate-shaped optical element arranged so as to cover the front side of the image display unit 20, which is the side opposite the user's eyes. The light control plates protect the light guide plates 261 and 262 and suppress damage to them and the adhesion of dirt. Adjusting the light transmittance of the light control plates also adjusts the amount of external light entering the user's eyes and thus how easily the virtual image can be seen. The light control plates can be omitted.

カメラ６１は、使用者が画像表示部２０を装着した際の使用者の眉間に対応する位置に配置されている。カメラ６１は、使用者の眼の側とは反対側方向の外部の景色である外景を撮像し、外景画像を取得する。本実施形態におけるカメラ６１は、単眼カメラであるが、ステレオカメラであってもよい。 The camera 61 is disposed at a position corresponding to the area between the user's eyebrows when the user wears the image display unit 20. The camera 61 captures the outside scene, that is, the external scenery in the direction opposite to the user's eyes, and acquires an outside scene image. The camera 61 in the present embodiment is a monocular camera, but it may be a stereo camera.

右マイク６２は、右保持部２１における右表示駆動部２２の反対側に配置されている。左マイク６４は、左保持部２３における左表示駆動部２４の反対側に配置されている。右マイク６２と左マイク６４とは、画像表示部２０に対して配置されている位置が異なるため、右マイク６２が取得する外部音声の音量である外部音量と左マイク６４が取得する外部音量とは異なる。なお、本明細書における「音声」とは、人の声のみでなく、機械音等を含む広義の「音」の意味を表す。右マイク６２および左マイク６４は、請求項における音取得部に相当する。 The right microphone 62 is arranged on the opposite side of the right holding unit 21 from the right display driving unit 22. The left microphone 64 is arranged on the opposite side of the left holding unit 23 from the left display driving unit 24. Because the right microphone 62 and the left microphone 64 are arranged at different positions with respect to the image display unit 20, the external volume (the volume of the external sound) acquired by the right microphone 62 differs from the external volume acquired by the left microphone 64. Note that the term 「音声」 (rendered here as “sound”, “voice”, or “audio”) in this specification means “sound” in a broad sense, including not only human voices but also mechanical sounds and the like. The right microphone 62 and the left microphone 64 correspond to the sound acquisition unit in the claims.

画像表示部２０は、さらに、画像表示部２０を制御部１０に接続するための接続部４０を有している。接続部４０は、制御部１０に接続される本体コード４８と、右コード４２と、左コード４４と、連結部材４６と、を含んでいる。右コード４２と左コード４４とは、本体コード４８が２本に分岐したコードである。右コード４２は、右保持部２１の延伸方向の先端部ＡＰから右保持部２１の筐体内に挿入され、右表示駆動部２２に接続されている。同様に、左コード４４は、左保持部２３の延伸方向の先端部ＡＰから左保持部２３の筐体内に挿入され、左表示駆動部２４に接続されている。連結部材４６は、本体コード４８と、右コード４２および左コード４４と、の分岐点に設けられ、イヤホンプラグ３０を接続するためのジャックを有している。イヤホンプラグ３０からは、右イヤホン３２および左イヤホン３４が延伸している。なお、右イヤホン３２および左イヤホン３４は、請求項における音出力部に相当する。 The image display unit 20 further includes a connection unit 40 for connecting the image display unit 20 to the control unit 10. The connection unit 40 includes a main body cord 48 connected to the control unit 10, a right cord 42, a left cord 44, and a connecting member 46. The right cord 42 and the left cord 44 are the two cords into which the main body cord 48 branches. The right cord 42 is inserted into the housing of the right holding unit 21 from the distal end AP in the extending direction of the right holding unit 21 and is connected to the right display driving unit 22. Similarly, the left cord 44 is inserted into the housing of the left holding unit 23 from the distal end AP in the extending direction of the left holding unit 23 and is connected to the left display driving unit 24. The connecting member 46 is provided at the branch point between the main body cord 48 and the right cord 42 and left cord 44, and has a jack for connecting the earphone plug 30. A right earphone 32 and a left earphone 34 extend from the earphone plug 30. The right earphone 32 and the left earphone 34 correspond to the sound output unit in the claims.

画像表示部２０と制御部１０とは、接続部４０を介して各種信号の伝送を行なう。本体コード４８における連結部材４６とは反対側の端部と、制御部１０と、のそれぞれには、互いに嵌合するコネクター（図示しない）が設けられている。本体コード４８のコネクターと制御部１０のコネクターとの嵌合／嵌合解除により、制御部１０と画像表示部２０とが接続されたり切り離されたりする。右コード４２と、左コード４４と、本体コード４８とには、例えば、金属ケーブルや光ファイバーを採用できる。 The image display unit 20 and the control unit 10 transmit various signals via the connection unit 40. Connectors (not shown) that fit each other are provided on the end of the main body cord 48 opposite to the connecting member 46 and on the control unit 10, respectively. Fitting or releasing the connector of the main body cord 48 and the connector of the control unit 10 connects or disconnects the control unit 10 and the image display unit 20. For the right cord 42, the left cord 44, and the main body cord 48, for example, a metal cable or an optical fiber can be adopted.

制御部１０は、頭部装着型表示装置１００を制御するための装置である。制御部１０は、決定キー１１と、点灯部１２と、表示切替キー１３と、トラックパッド１４と、輝度切替キー１５と、方向キー１６と、メニューキー１７と、電源スイッチ１８と、を含んでいる。決定キー１１は、押下操作を検出して、制御部１０で操作された内容を決定する信号を出力する。点灯部１２は、頭部装着型表示装置１００の動作状態を、その発光状態によって通知する。頭部装着型表示装置１００の動作状態としては、例えば、電源のＯＮ／ＯＦＦ等がある。点灯部１２としては、例えば、ＬＥＤ（Light Emitting Diode）が用いられる。表示切替キー１３は、押下操作を検出して、例えば、コンテンツ動画の表示モードを３Ｄと２Ｄとに切り替える信号を出力する。トラックパッド１４は、トラックパッド１４の操作面上での使用者の指の操作を検出して、検出内容に応じた信号を出力する。トラックパッド１４としては、静電式や圧力検出式、光学式といった種々のトラックパッドを採用できる。輝度切替キー１５は、押下操作を検出して、画像表示部２０の輝度を増減する信号を出力する。方向キー１６は、上下左右方向に対応するキーへの押下操作を検出して、検出内容に応じた信号を出力する。電源スイッチ１８は、スイッチのスライド操作を検出することで、頭部装着型表示装置１００の電源投入状態を切り替える。 The control unit 10 is a device for controlling the head-mounted display device 100. The control unit 10 includes a determination key 11, a lighting unit 12, a display switching key 13, a track pad 14, a luminance switching key 15, a direction key 16, a menu key 17, and a power switch 18. The determination key 11 detects a pressing operation and outputs a signal that confirms the operation performed on the control unit 10. The lighting unit 12 indicates the operating state of the head-mounted display device 100 by its light emission state. Examples of the operating state of the head-mounted display device 100 include power ON/OFF. For example, an LED (Light Emitting Diode) is used as the lighting unit 12. The display switching key 13 detects a pressing operation and outputs, for example, a signal for switching the display mode of content video between 3D and 2D. The track pad 14 detects the operation of the user's finger on its operation surface and outputs a signal corresponding to the detected content. Various types of track pad, such as electrostatic, pressure-detecting, and optical types, can be adopted as the track pad 14. The luminance switching key 15 detects a pressing operation and outputs a signal for increasing or decreasing the luminance of the image display unit 20. The direction key 16 detects a pressing operation on the keys corresponding to the up, down, left, and right directions, and outputs a signal corresponding to the detected content. The power switch 18 switches the power-on state of the head-mounted display device 100 by detecting a slide operation of the switch.

図２は、頭部装着型表示装置１００の構成を機能的に示すブロック図である。図２に示すように、制御部１０は、ＣＰＵ１４０と、操作部１３５と、入力情報取得部１１０と、記憶部１２０と、電源１３０と、インターフェイス１８０と、送信部５１（Ｔｘ５１）および送信部５２（Ｔｘ５２）と、を有している。操作部１３５は、使用者による操作を受け付け、決定キー１１、表示切替キー１３、トラックパッド１４、輝度切替キー１５、方向キー１６、メニューキー１７、電源スイッチ１８、から構成されている。 FIG. 2 is a block diagram functionally showing the configuration of the head-mounted display device 100. As shown in FIG. 2, the control unit 10 includes a CPU 140, an operation unit 135, an input information acquisition unit 110, a storage unit 120, a power supply 130, an interface 180, a transmission unit 51 (Tx51), and a transmission unit 52 (Tx52). The operation unit 135 receives operations by the user and consists of the determination key 11, the display switching key 13, the track pad 14, the luminance switching key 15, the direction key 16, the menu key 17, and the power switch 18.

入力情報取得部１１０は、使用者による操作入力に応じた信号を取得する。操作入力に応じた信号としては、例えば、トラックパッド１４、方向キー１６、電源スイッチ１８、に対する操作入力がある。電源１３０は、頭部装着型表示装置１００の各部に電力を供給する。電源１３０としては、例えば二次電池を用いることができる。記憶部１２０は、種々のコンピュータープログラムを格納している。記憶部１２０は、ＲＯＭやＲＡＭ等によって構成されている。ＣＰＵ１４０は、記憶部１２０に格納されているコンピュータープログラムを読み出して実行することにより、オペレーティングシステム１５０（ОＳ１５０）、画像処理部１６０、音声処理部１７０、変換部１８５、方向判定部１６１、表示制御部１９０、として機能する。 The input information acquisition unit 110 acquires a signal corresponding to an operation input by the user. As a signal corresponding to the operation input, for example, there is an operation input to the track pad 14, the direction key 16, and the power switch 18. The power supply 130 supplies power to each part of the head-mounted display device 100. As the power supply 130, for example, a secondary battery can be used. The storage unit 120 stores various computer programs. The storage unit 120 is configured by a ROM, a RAM, or the like. The CPU 140 reads out and executes the computer program stored in the storage unit 120, thereby operating the operating system 150 (OS 150), the image processing unit 160, the sound processing unit 170, the conversion unit 185, the direction determination unit 161, and the display control unit. 190.

画像処理部１６０は、コンテンツに含まれる画像信号を取得する。画像処理部１６０は、取得した画像信号から、垂直同期信号ＶＳｙｎｃや水平同期信号ＨＳｙｎｃ等の同期信号を分離する。また、画像処理部１６０は、分離した垂直同期信号ＶＳｙｎｃや水平同期信号ＨＳｙｎｃの周期に応じて、ＰＬＬ（Phase Locked Loop）回路等（図示しない）を利用してクロック信号ＰＣＬＫを生成する。画像処理部１６０は、同期信号が分離されたアナログ画像信号を、Ａ／Ｄ変換回路等（図示しない）を用いてディジタル画像信号に変換する。その後、画像処理部１６０は、変換後のディジタル画像信号を、対象画像の画像データＤａｔａ（ＲＧＢデータ）として、１フレームごとに記憶部１２０内のＤＲＡＭに格納する。なお、画像処理部１６０は、必要に応じて、画像データに対して、解像度変換処理、輝度、彩度の調整といった種々の色調補正処理、キーストーン補正処理等の画像処理を実行してもよい。 The image processing unit 160 acquires an image signal included in the content. The image processing unit 160 separates synchronization signals such as the vertical synchronization signal VSync and the horizontal synchronization signal HSync from the acquired image signal. The image processing unit 160 also generates a clock signal PCLK using a PLL (Phase Locked Loop) circuit or the like (not shown) according to the period of the separated vertical synchronization signal VSync and horizontal synchronization signal HSync. The image processing unit 160 converts the analog image signal from which the synchronization signals have been separated into a digital image signal using an A/D conversion circuit or the like (not shown). Thereafter, the image processing unit 160 stores the converted digital image signal, frame by frame, in the DRAM in the storage unit 120 as image data Data (RGB data) of the target image. Note that the image processing unit 160 may, as necessary, execute image processing on the image data such as resolution conversion processing, various color tone correction processing such as luminance and saturation adjustment, and keystone correction processing.

画像処理部１６０は、生成されたクロック信号ＰＣＬＫ、垂直同期信号ＶＳｙｎｃ、水平同期信号ＨＳｙｎｃ、記憶部１２０内のＤＲＡＭに格納された画像データＤａｔａ、のそれぞれを、送信部５１、５２を介して送信する。なお、送信部５１を介して送信される画像データＤａｔａを「右眼用画像データ」とも呼び、送信部５２を介して送信される画像データＤａｔａを「左眼用画像データ」とも呼ぶ。送信部５１、５２は、制御部１０と画像表示部２０との間におけるシリアル伝送のためのトランシーバーとして機能する。 The image processing unit 160 transmits the generated clock signal PCLK, vertical synchronization signal VSync, horizontal synchronization signal HSync, and image data Data stored in the DRAM in the storage unit 120 via the transmission units 51 and 52, respectively. To do. The image data Data transmitted via the transmission unit 51 is also referred to as “right eye image data”, and the image data Data transmitted via the transmission unit 52 is also referred to as “left eye image data”. The transmission units 51 and 52 function as a transceiver for serial transmission between the control unit 10 and the image display unit 20.

音声処理部１７０は、取得する外部音声および出力する音声である出力音声について様々な処理を行なう。音声処理部１７０は、コンテンツに含まれる音声信号を取得し、取得した音声信号を増幅して、連結部材４６に接続された右イヤホン３２内のスピーカー（図示しない）および左イヤホン３４内のスピーカー（図示しない）に対して供給する。なお、例えば、Ｄｏｌｂｙ（登録商標）システムを採用した場合、音声信号に対する処理がなされ、右イヤホン３２および左イヤホン３４のそれぞれからは、例えば周波数等が変えられた異なる音が出力される。 The sound processing unit 170 performs various kinds of processing on the external sound that is acquired and on the output sound, that is, the sound to be output. The sound processing unit 170 acquires an audio signal included in the content, amplifies the acquired audio signal, and supplies it to a speaker (not shown) in the right earphone 32 and a speaker (not shown) in the left earphone 34, both of which are connected to the connecting member 46. For example, when a Dolby (registered trademark) system is adopted, the audio signal is processed, and the right earphone 32 and the left earphone 34 each output a different sound, for example with a changed frequency.

また、音声処理部１７０は、マイク６２，６４が取得した外部音量および出力音声の音量である出力音量をデシベル（ｄＢ）の数値として特定したり、外部音声に含まれると共に、記憶部１２０に予め記憶された警告音といった特定の音声を検出する。外部音声に含まれている特定の音声の検出は、記憶部１２０に予め記憶された警告音等の周波数スペクトルの波形と、所定の期間に取得された外部音声の周波数スペクトルの波形と、を照合することによって行なわれる。音声処理部１７０は、周波数スペクトルの波形を照合することで、人の声とそれ以外の音声とを識別することもできる。また、音声処理部１７０は、取得した外部音声の周波数スペクトルを解析して、周波数ごとの強度を算出する。本実施形態では、音声処理部１７０は、外部音量を１秒ごとに取得し、周波数スペクトルの解析では、１秒間に取得された外部音声の解析を行なう。 In addition, the sound processing unit 170 specifies the external volume acquired by the microphones 62 and 64 and the output volume, which is the volume of the output sound, as numerical values in decibels (dB), and detects a specific sound, such as a warning sound stored in advance in the storage unit 120, that is contained in the external sound. A specific sound contained in the external sound is detected by collating the frequency spectrum waveform of the warning sound or the like stored in advance in the storage unit 120 with the frequency spectrum waveform of the external sound acquired during a predetermined period. By collating frequency spectrum waveforms, the sound processing unit 170 can also distinguish human voices from other sounds. The sound processing unit 170 also analyzes the frequency spectrum of the acquired external sound and calculates the intensity for each frequency. In the present embodiment, the sound processing unit 170 acquires the external volume every second and, in the frequency spectrum analysis, analyzes the external sound acquired over one second.
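The level measurement and spectrum collation described above can be illustrated with a minimal sketch. The specification does not state how the decibel value is computed or how two spectra are compared, so everything here is an assumption made for illustration: the RMS-based level, the naive DFT, the cosine-similarity measure, the 0.9 threshold, and all function names are hypothetical.

```python
import cmath
import math


def volume_db(samples, reference=1.0):
    """RMS level of one window of samples in dB relative to `reference` (assumed measure)."""
    if not samples:
        raise ValueError("empty window")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return float("-inf")  # silence has no finite dB level
    return 20.0 * math.log10(rms / reference)


def magnitude_spectrum(samples):
    """Magnitude per frequency bin, computed with a naive O(n^2) DFT."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def spectrum_similarity(a, b):
    """Cosine similarity of two magnitude spectra (hypothetical collation measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def contains_specific_sound(window, template_spectrum, threshold=0.9):
    """True when the window's spectrum is close enough to a stored template."""
    return spectrum_similarity(magnitude_spectrum(window), template_spectrum) >= threshold
```

In this sketch, `template_spectrum` plays the role of the warning-sound spectrum stored in advance in the storage unit 120, and `window` plays the role of the external sound acquired over one period.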

また、音声処理部１７０は、ある時点で取得された外部音量が予め定められた音量以上であるか否か、または、ある時点で取得された外部音量とその１秒前に取得された外部音量との変化、に基づいて出力音量を調整する。音声処理部１７０は、ある時点から１秒間に取得された外部音声の周波数スペクトルがその直前の１秒間に取得された外部音声の周波数スペクトルに対しての変化に基づいて出力音量を調整する。また、音声処理部１７０は、検出された外部音声に含まれる特定の音声に基づいて出力音量を調整する。 Also, the sound processing unit 170 determines whether or not the external volume acquired at a certain time is equal to or higher than a predetermined volume, or the external volume acquired at a certain time and the external volume acquired one second before that. The output volume is adjusted based on the change. The sound processing unit 170 adjusts the output volume based on a change in the frequency spectrum of the external sound acquired in one second from a certain time point with respect to the frequency spectrum of the external sound acquired in the immediately preceding one second. In addition, the sound processing unit 170 adjusts the output volume based on specific sound included in the detected external sound.

音声処理部１７０は、右マイク６２が取得する外部音量と左マイク６４が取得する外部音量とに基づいて画像表示部２０から外部音声の音源への方向を推定する。具体的には、本実施形態において、右マイク６２の取得する音量が左マイク６４の取得する音量よりも大きい場合、音声処理部１７０は、使用者を中心として右方向に音源があると推定する。なお、音声処理部１７０は、請求項における音制御部、解析部、特定音検出部、音源方向推定部、に相当する。 The sound processing unit 170 estimates the direction from the image display unit 20 to the source of the external sound based on the external volume acquired by the right microphone 62 and the external volume acquired by the left microphone 64. Specifically, in this embodiment, when the volume acquired by the right microphone 62 is larger than the volume acquired by the left microphone 64, the sound processing unit 170 estimates that the sound source is to the right of the user. The sound processing unit 170 corresponds to the sound control unit, the analysis unit, the specific sound detection unit, and the sound source direction estimation unit in the claims.
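The left/right comparison above can be sketched as follows. The function name is hypothetical, and the handling of equal volumes is an assumption: the specification only describes the case in which one microphone's volume is larger than the other's.

```python
def estimate_source_side(right_db, left_db):
    """Guess which side of the user the sound source is on from two microphone levels."""
    if right_db > left_db:
        return "right"
    if left_db > right_db:
        return "left"
    # Equal levels: not covered by the specification, so no side is reported (assumption).
    return "undetermined"
```

For example, `estimate_source_side(58.0, 64.5)` yields `"left"`, because the left level is the larger of the two.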

変換部１８５は、マイク６２，６４が取得した音声を、音声を文字によって表した文字画像へと変換する。方向判定部１６１は、後述する９軸センサー６６が検出した画像表示部２０の動きの変化に基づいて、使用者の頭部における角速度や角度の変化量を特定する。 The conversion unit 185 converts the sound acquired by the microphones 62 and 64 into a character image in which the sound is represented by characters. The direction determination unit 161 specifies the angular velocity and the amount of change in the angle of the user's head based on the change in the movement of the image display unit 20 detected by the 9-axis sensor 66 described later.

表示制御部１９０は、右表示駆動部２２および左表示駆動部２４を制御する制御信号を生成する。具体的には、表示制御部１９０は、制御信号により、右ＬＣＤ制御部２１１による右ＬＣＤ２４１の駆動ＯＮ／ＯＦＦ、右バックライト制御部２０１による右バックライト２２１の駆動ＯＮ／ＯＦＦ、左ＬＣＤ制御部２１２による左ＬＣＤ２４２の駆動ＯＮ／ＯＦＦ、左バックライト制御部２０２による左バックライト２２２の駆動ＯＮ／ＯＦＦなど、を個別に制御する。これにより、表示制御部１９０は、右表示駆動部２２および左表示駆動部２４のそれぞれによる画像光の生成および射出を制御する。例えば、表示制御部１９０は、右表示駆動部２２および左表示駆動部２４の両方に画像光を生成させたり、一方のみに画像光を生成させたり、両方共に画像光を生成させなかったりする。表示制御部１９０は、右ＬＣＤ制御部２１１と左ＬＣＤ制御部２１２とに対する制御信号のそれぞれを、送信部５１および５２を介して送信する。また、表示制御部１９０は、右バックライト制御部２０１と左バックライト制御部２０２とに対する制御信号のそれぞれを送信する。 The display control unit 190 generates control signals for controlling the right display driving unit 22 and the left display driving unit 24. Specifically, the display control unit 190 uses the control signals to individually control driving ON/OFF of the right LCD 241 by the right LCD control unit 211, driving ON/OFF of the right backlight 221 by the right backlight control unit 201, driving ON/OFF of the left LCD 242 by the left LCD control unit 212, driving ON/OFF of the left backlight 222 by the left backlight control unit 202, and so on. In this way, the display control unit 190 controls the generation and emission of image light by each of the right display driving unit 22 and the left display driving unit 24. For example, the display control unit 190 may cause both the right display driving unit 22 and the left display driving unit 24 to generate image light, cause only one of them to generate image light, or cause neither to generate image light. The display control unit 190 transmits the control signals for the right LCD control unit 211 and the left LCD control unit 212 via the transmission units 51 and 52, respectively. The display control unit 190 also transmits the respective control signals to the right backlight control unit 201 and the left backlight control unit 202.

また、表示制御部１９０は、マイク６２，６４が取得した外部音量に基づいて、方向判定部１６１が変換した文字画像の文字の大きさを設定し、文字の大きさを設定した文字画像を表す制御信号として画像表示部２０に送信する。画像表示部２０は、送信された制御信号に基づいて文字画像を表す画像光を生成して、使用者の眼に射出することで、使用者は、音声を文字画像として視認できる。 Further, the display control unit 190 sets the character size of the character image converted by the direction determination unit 161 based on the external volume acquired by the microphones 62 and 64, and represents the character image in which the character size is set. It transmits to the image display part 20 as a control signal. The image display unit 20 generates image light representing a character image based on the transmitted control signal and emits it to the user's eyes, so that the user can visually recognize the sound as a character image.
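As an illustration of setting the character size from the external volume, the following sketch maps a level in dB to a font size in pixels. The specification does not define this mapping; the linear interpolation, the clamping, the 40 dB and 90 dB endpoints, the 16 px and 48 px sizes, and the function name are all assumptions chosen only to make the idea concrete.

```python
def font_size_for_volume(volume_db, quiet_db=40.0, loud_db=90.0, min_px=16, max_px=48):
    """Map an external volume to a character size; louder sound gives larger characters."""
    ratio = (volume_db - quiet_db) / (loud_db - quiet_db)
    ratio = max(0.0, min(1.0, ratio))  # clamp levels outside the assumed range
    return round(min_px + ratio * (max_px - min_px))
```

With these assumed parameters, a louder phrase is always rendered at least as large as a quieter one, which matches the behavior described for the text image.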

インターフェイス１８０は、制御部１０に対して、コンテンツの供給元となる種々の外部機器ＯＡを接続するためのインターフェイスである。外部機器ＯＡとしては、例えば、パーソナルコンピューター（ＰＣ）や携帯電話端末、ゲーム端末等、がある。インターフェイス１８０としては、例えば、ＵＳＢインターフェイス、マイクロＵＳＢインターフェイス、メモリーカード用インターフェイス等、を用いることができる。 The interface 180 is an interface for connecting various external devices OA that are content supply sources to the control unit 10. Examples of the external device OA include a personal computer (PC), a mobile phone terminal, and a game terminal. As the interface 180, for example, a USB interface, a micro USB interface, a memory card interface, or the like can be used.

画像表示部２０は、右表示駆動部２２と、左表示駆動部２４と、右光学像表示部２６としての右導光板２６１と、左光学像表示部２８としての左導光板２６２と、カメラ６１と、９軸センサー６６と、右マイク６２と、左マイク６４と、を備えている。 The image display unit 20 includes a right display drive unit 22, a left display drive unit 24, a right light guide plate 261 as a right optical image display unit 26, a left light guide plate 262 as a left optical image display unit 28, and a camera 61. A 9-axis sensor 66, a right microphone 62, and a left microphone 64.

９軸センサー６６は、加速度（３軸）、角速度（３軸）、地磁気（３軸）、を検出するモーションセンサーである。９軸センサー６６は、画像表示部２０に設けられているため、画像表示部２０が使用者の頭部に装着されているときには、使用者の頭部の動きを検出する。検出された使用者の頭部の動きから画像表示部２０の向きがわかるため、方向判定部１６１は、使用者の視線方向を推定できる。方向判定部１６１と９軸センサー６６とは、請求項における動き検出部に相当する。マイク６２，６４は、取得した音声の音声信号を変換部１８５および音声処理部１７０に送信する。 The 9-axis sensor 66 is a motion sensor that detects acceleration (3 axes), angular velocity (3 axes), and geomagnetism (3 axes). Since the 9-axis sensor 66 is provided in the image display unit 20, when the image display unit 20 is mounted on the user's head, the movement of the user's head is detected. Since the orientation of the image display unit 20 is known from the detected movement of the user's head, the direction determination unit 161 can estimate the user's line-of-sight direction. The direction determination unit 161 and the 9-axis sensor 66 correspond to a motion detection unit in claims. The microphones 62 and 64 transmit the acquired audio signal to the conversion unit 185 and the audio processing unit 170.

右表示駆動部２２は、受信部５３（Ｒｘ５３）と、光源として機能する右バックライト制御部２０１（右ＢＬ制御部２０１）および右バックライト２２１（右ＢＬ２２１）と、表示素子として機能する右ＬＣＤ制御部２１１および右ＬＣＤ２４１と、右投写光学系２５１と、を含んでいる。右バックライト制御部２０１と右バックライト２２１とは、光源として機能する。右ＬＣＤ制御部２１１と右ＬＣＤ２４１とは、表示素子として機能する。なお、右バックライト制御部２０１と、右ＬＣＤ制御部２１１と、右バックライト２２１と、右ＬＣＤ２４１と、を総称して「画像光生成部」とも呼ぶ。 The right display driving unit 22 includes a receiving unit 53 (Rx53), a right backlight control unit 201 (right BL control unit 201) and a right backlight 221 (right BL221) that function as a light source, and a right LCD that functions as a display element. A control unit 211, a right LCD 241 and a right projection optical system 251 are included. The right backlight control unit 201 and the right backlight 221 function as a light source. The right LCD control unit 211 and the right LCD 241 function as display elements. The right backlight control unit 201, the right LCD control unit 211, the right backlight 221 and the right LCD 241 are collectively referred to as “image light generation unit”.

受信部５３は、制御部１０と画像表示部２０との間におけるシリアル伝送のためのレシーバーとして機能する。右バックライト制御部２０１は、入力された制御信号に基づいて、右バックライト２２１を駆動する。右バックライト２２１は、例えば、ＬＥＤやエレクトロルミネセンス（ＥＬ）等の発光体である。右ＬＣＤ制御部２１１は、受信部５３を介して入力されたクロック信号ＰＣＬＫと、垂直同期信号ＶＳｙｎｃと、水平同期信号ＨＳｙｎｃと、右眼用画像データと、に基づいて、右ＬＣＤ２４１を駆動する。右ＬＣＤ２４１は、複数の画素をマトリクス状に配置した透過型液晶パネルである。 The receiving unit 53 functions as a receiver for serial transmission between the control unit 10 and the image display unit 20. The right backlight control unit 201 drives the right backlight 221 based on the input control signal. The right backlight 221 is a light emitter such as an LED or electroluminescence (EL). The right LCD control unit 211 drives the right LCD 241 based on the clock signal PCLK, the vertical synchronization signal VSync, the horizontal synchronization signal HSync, and the right-eye image data input via the reception unit 53. The right LCD 241 is a transmissive liquid crystal panel in which a plurality of pixels are arranged in a matrix.

右投写光学系２５１は、右ＬＣＤ２４１から射出された画像光を並行状態の光束にするコリメートレンズによって構成される。右光学像表示部２６としての右導光板２６１は、右投写光学系２５１から出力された画像光を、所定の光路に沿って反射させつつ使用者の右眼ＲＥに導く。なお、右投写光学系２５１と右導光板２６１とを総称して「導光部」とも呼ぶ。 The right projection optical system 251 is configured by a collimating lens that converts the image light emitted from the right LCD 241 into a light beam in a parallel state. The right light guide plate 261 as the right optical image display unit 26 guides the image light output from the right projection optical system 251 to the right eye RE of the user while reflecting the image light along a predetermined optical path. The right projection optical system 251 and the right light guide plate 261 are collectively referred to as “light guide unit”.

左表示駆動部２４は、右表示駆動部２２と同様の構成を有している。左表示駆動部２４は、受信部５４（Ｒｘ５４）と、光源として機能する左バックライト制御部２０２（左ＢＬ制御部２０２）および左バックライト２２２（左ＢＬ２２２）と、表示素子として機能する左ＬＣＤ制御部２１２および左ＬＣＤ２４２と、左投写光学系２５２と、を含んでいる。左バックライト制御部２０２と左バックライト２２２とは、光源として機能する。左ＬＣＤ制御部２１２と左ＬＣＤ２４２とは、表示素子として機能する。なお、左バックライト制御部２０２と、左ＬＣＤ制御部２１２と、左バックライト２２２と、左ＬＣＤ２４２と、を総称して「画像光生成部」とも呼ぶ。また、左投写光学系２５２は、左ＬＣＤ２４２から射出された画像光を並行状態の光束にするコリメートレンズによって構成される。左光学像表示部２８としての左導光板２６２は、左投写光学系２５２から出力された画像光を、所定の光路に沿って反射させつつ使用者の左眼ＬＥに導く。なお、左投写光学系２５２と左導光板２６２とを総称して「導光部」とも呼ぶ。 The left display driving unit 24 has the same configuration as the right display driving unit 22. The left display driving unit 24 includes a receiving unit 54 (Rx54), a left backlight control unit 202 (left BL control unit 202) and a left backlight 222 (left BL222) that function as a light source, a left LCD control unit 212 and a left LCD 242 that function as a display element, and a left projection optical system 252. The left backlight control unit 202 and the left backlight 222 function as a light source. The left LCD control unit 212 and the left LCD 242 function as a display element. The left backlight control unit 202, the left LCD control unit 212, the left backlight 222, and the left LCD 242 are also collectively referred to as the “image light generation unit”. The left projection optical system 252 is configured by a collimating lens that converts the image light emitted from the left LCD 242 into a parallel light beam. The left light guide plate 262, serving as the left optical image display unit 28, guides the image light output from the left projection optical system 252 to the left eye LE of the user while reflecting it along a predetermined optical path. The left projection optical system 252 and the left light guide plate 262 are also collectively referred to as the “light guide unit”.

図３は、画像光生成部によって画像光が射出される様子を示す説明図である。右ＬＣＤ２４１は、マトリクス状に配置された各画素位置の液晶を駆動することによって、右ＬＣＤ２４１を透過する光の透過率を変化させることにより、右バックライト２２１から照射される照明光ＩＬを、画像を表わす有効な画像光ＰＬへと変調する。左側についても同様である。なお、図３のように、本実施形態ではバックライト方式を採用したが、フロントライト方式や、反射方式を用いて画像光を射出する構成としてもよい。 FIG. 3 is an explanatory diagram illustrating how image light is emitted by the image light generation unit. By driving the liquid crystal at each pixel position arranged in a matrix, the right LCD 241 changes the transmittance of the light passing through it, and thereby modulates the illumination light IL emitted from the right backlight 221 into effective image light PL representing an image. The same applies to the left side. As shown in FIG. 3, the backlight system is adopted in the present embodiment, but the image light may instead be emitted using a front light system or a reflection system.

Ａ−２．出力音量調整処理： A-2. Output volume adjustment processing:
図４は、出力音量調整処理の流れを示す説明図である。出力音量調整処理では、マイク６２，６４が取得した外部音声の変化に基づいて、制御部１０の各部がイヤホン３２，３４を介して出力する出力音量を調整する処理である。初めに、マイク６２，６４は、外部音声を取得する（ステップＳ３１０）。音声処理部１７０は、マイク６２，６４が取得した外部音声に対して、１秒ごとの音量の取得、周波数スペクトルの解析、特定の音声の検出、を行なう。 FIG. 4 is an explanatory diagram showing the flow of the output volume adjustment process. The output volume adjustment process is a process in which the units of the control unit 10 adjust the output volume that is output via the earphones 32 and 34, based on changes in the external sound acquired by the microphones 62 and 64. First, the microphones 62 and 64 acquire external sound (step S310). For the external sound acquired by the microphones 62 and 64, the sound processing unit 170 acquires the volume every second, analyzes the frequency spectrum, and detects specific sounds.

次に、音声処理部１７０は、外部音声に警告音等の特定の音声が検出されるかを監視する（ステップＳ３２０）。特定の音声が検出された場合には（ステップＳ３２０：ＹＥＳ）、音声処理部１７０は、イヤホン３２，３４から出力する出力音量を下げる（ステップＳ３６０）。本実施形態では、音声処理部１７０は、予め設定されている音量まで出力音量を下げるが、他の実施形態では、使用者によって下げる音量が自由に設定されてもよいし、外部音量に基づいて音量が設定されてもよい。また、出力音量と外部音量とに基づいて設定されてもよい。 Next, the sound processing unit 170 monitors whether a specific sound such as a warning sound is detected in the external sound (step S320). When a specific sound is detected (step S320: YES), the sound processing unit 170 decreases the output volume output from the earphones 32 and 34 (step S360). In this embodiment, the audio processing unit 170 lowers the output volume to a preset volume, but in other embodiments, the volume to be lowered may be set freely by the user, or based on the external volume The volume may be set. Further, it may be set based on the output volume and the external volume.

出力音量が下げられると、次に、表示制御部１９０は、取得した外部音声を表すテキスト画像を画像表示部２０に表示する（ステップＳ３７０）。図５は、使用者が視認する視野ＶＲの一例を示す説明図である。図５には、使用者が視認する視野ＶＲと、画像表示部２０が画像を表示できる領域である最大画像表示領域ＰＮと、が示されている。図５に示すように、使用者は、教師ＴＥと、教師ＴＥの講義を受けている複数の生徒ＳＴと、取得された外部音声を文字画像として表したテキスト画像ＴＸ１と、を外景として視認できる。使用者は、教師ＴＥに装着された講義用マイクが取得した音声を、インターフェイス１８０を介してイヤホン３２，３４から出力音声として視聴すると共に、教師ＴＥがホワイトボードＷＢに書いた文字を視認する。本実施形態では、ステップＳ３２０の処理において、アナウンスの前に流れる「ピンポンパンポン」というアナウンス音が特定の音声として検出されて、変換部１８５は、マイク６２，６４が取得する外部音声を、外部音声を文字として表す文字画像へと変換する。表示制御部１９０は、変換された文字画像を最大画像表示領域ＰＮにテキスト画像ＴＸ１として表示する。アナウンス音に含まれる「ピンポンパンポン」という外部音声の音量よりも、「ただいま、地震が発生しました。」という外部音声の音量の方が大きい。そのため、図５に示すように、表示制御部１９０は、「ピンポンパンポン」の音声を表す文字画像よりも「ただいま、地震が発生しました。」を表す文字画像における各文字の大きさを大きくする。 When the output volume has been lowered, the display control unit 190 next displays a text image representing the acquired external sound on the image display unit 20 (step S370). FIG. 5 is an explanatory diagram showing an example of the visual field VR visually recognized by the user. FIG. 5 shows the visual field VR visually recognized by the user and the maximum image display area PN, which is the area in which the image display unit 20 can display an image. As shown in FIG. 5, the user can visually recognize, together with the outside scene, a teacher TE, a plurality of students ST attending the teacher TE's lecture, and a text image TX1 representing the acquired external sound as a character image. The user listens to the sound picked up by the lecture microphone worn by the teacher TE as output sound from the earphones 32 and 34 via the interface 180, and also sees the characters that the teacher TE has written on the whiteboard WB. In the present embodiment, in the process of step S320, the chime that plays before an announcement (“ping-pong-pan-pong”) is detected as a specific sound, and the conversion unit 185 converts the external sound acquired by the microphones 62 and 64 into a character image that represents the external sound as characters. The display control unit 190 displays the converted character image in the maximum image display area PN as the text image TX1. The external sound “An earthquake has just occurred.” is louder than the “ping-pong-pan-pong” chime contained in the announcement. Therefore, as shown in FIG. 5, the display control unit 190 makes each character in the character image representing “An earthquake has just occurred.” larger than the characters in the character image representing the “ping-pong-pan-pong” chime.

アナウンス音は図５に示すスピーカーＳＰから出力されており、左マイク６４は、右マイク６２よりもスピーカーＳＰの位置に近いため、右マイク６２よりも取得する外部音量が大きい。そのため、音声処理部１７０は、外部音声の音源であるスピーカーＳＰが使用者に対して左側にあると判定する。表示制御部１９０は、最大画像表示領域ＰＮにおいて、中心部分を除き、かつ、中心よりも左側の部分に占める部分が多くなるような周辺部にテキスト画像ＴＸ１を表示する。本実施形態における最大画像表示領域ＰＮは、横の画素数が９６０、縦の画素数が５４０である。最大画像表示領域ＰＮは、請求項における画像生成可能領域に相当する。最大画像表示領域ＰＮにおける中心部分を除く周辺部とは、中央に位置する横の画素数３２０と縦の画素数１８０の領域を除く領域のことをいい、中央に位置する横の画素数４８０と縦の画素数２７０の領域を除く領域であるとさらに好ましい。 The announcement is output from the speaker SP shown in FIG. 5, and because the left microphone 64 is closer to the speaker SP than the right microphone 62 is, it acquires a larger external volume than the right microphone 62. The sound processing unit 170 therefore determines that the speaker SP, which is the source of the external sound, is on the left side of the user. The display control unit 190 displays the text image TX1 in a peripheral portion of the maximum image display area PN that excludes the central portion and lies mostly to the left of the center. In the present embodiment, the maximum image display area PN is 960 pixels wide and 540 pixels high. The maximum image display area PN corresponds to the image generation possible area in the claims. The peripheral portion excluding the central portion of the maximum image display area PN means the area outside a centrally located region 320 pixels wide and 180 pixels high; more preferably, it is the area outside a centrally located region 480 pixels wide and 270 pixels high.
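The peripheral portion defined by these pixel counts can be checked with a small sketch. The 960 x 540 area and the 320 x 180 and 480 x 270 central regions come from the text; the function names, the top-left pixel origin, and the half-open coordinate convention are assumptions made for illustration.

```python
def central_rect(width, height, central_w, central_h):
    """Return (left, top, right, bottom) of a centered region, right/bottom exclusive."""
    left = (width - central_w) // 2
    top = (height - central_h) // 2
    return left, top, left + central_w, top + central_h


def in_peripheral(x, y, width=960, height=540, central_w=320, central_h=180):
    """True when pixel (x, y) lies in the display area but outside the central region."""
    if not (0 <= x < width and 0 <= y < height):
        return False
    left, top, right, bottom = central_rect(width, height, central_w, central_h)
    return not (left <= x < right and top <= y < bottom)
```

Under these conventions the 320 x 180 central region spans x from 320 to 639 and y from 180 to 359, so a text image placed near x = 100 falls in the left part of the peripheral portion.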

次に、音声処理部１７０は、検出した特定の音声が検出されない元の状態に戻ったかを監視する（ステップＳ３８０）。特定の音声が検出されなくなった場合には（ステップＳ３８０：ＹＥＳ）、音声処理部１７０は、ステップＳ３６０の処理において下げた出力音量を、下げる前の出力音量に戻す（ステップＳ３９０）。その後、制御部１０は、出力音量調整処理を終了する操作を監視する（ステップＳ４００）。出力音量調整処理を終了する操作が検出されない場合には（ステップＳ４００：ＮＯ）、音声処理部１７０は、再び、外部音声に警告音等の特定の音声が検出されるかを監視する（ステップＳ３２０）。出力音量調整処理を終了する操作が検出された場合には（ステップＳ４００：ＹＥＳ）、表示制御部１９０が最大画像表示領域ＰＮに表示していたテキスト画像ＴＸ１を非表示にして、制御部１０は、出力音量調整処理を終了する。 Next, the sound processing unit 170 monitors whether the state has returned to the original one in which the specific sound is not detected (step S380). When the specific sound is no longer detected (step S380: YES), the sound processing unit 170 returns the output volume lowered in step S360 to its value before it was lowered (step S390). Thereafter, the control unit 10 monitors for an operation that ends the output volume adjustment process (step S400). When no operation ending the output volume adjustment process is detected (step S400: NO), the sound processing unit 170 again monitors whether a specific sound such as a warning sound is detected in the external sound (step S320). When an operation ending the output volume adjustment process is detected (step S400: YES), the display control unit 190 hides the text image TX1 that it was displaying in the maximum image display area PN, and the control unit 10 ends the output volume adjustment process.

If the specific sound continues to be detected in step S380 (step S380: NO), the control unit 10 monitors for an operation that ends the output volume adjustment process (step S410). If no such operation is detected (step S410: NO), the sound processing unit 170 continues to monitor whether the state has returned to the one in which the specific sound is not detected (step S380). If the operation is detected (step S410: YES), the control unit 10 ends the output volume adjustment process.
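
The lower-then-restore behavior of steps S360, S380, and S390 can be modeled as a small state holder. The sketch below is an assumption-laden illustration: the class name `OutputDucker`, the linear volume scale, and the fixed `ducked_volume` level are hypothetical and are not specified in the description.

```python
from typing import Optional


class OutputDucker:
    """Lowers the output volume while a trigger is active and restores it afterwards."""

    def __init__(self, volume: float, ducked_volume: float) -> None:
        self.volume = volume                    # current output volume
        self.ducked_volume = ducked_volume      # hypothetical reduced level
        self._saved: Optional[float] = None     # level before lowering, if lowered

    def update(self, specific_sound_detected: bool) -> float:
        """Feed one detection result and return the resulting output volume."""
        if specific_sound_detected and self._saved is None:
            # Like S360: remember the current level, then lower the output.
            self._saved = self.volume
            self.volume = min(self.volume, self.ducked_volume)
        elif not specific_sound_detected and self._saved is not None:
            # Like S380: YES followed by S390: restore the earlier level.
            self.volume = self._saved
            self._saved = None
        return self.volume
```

Calling `update(True)` repeatedly keeps the lowered level (the S380: NO loop), and the first `update(False)` after that restores the saved level.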

If no specific sound is detected in step S320 (step S320: NO), the sound processing unit 170 monitors whether the acquired external volume has changed to a level at or above a predetermined threshold (step S330). FIG. 6 is an explanatory diagram showing an example of a change in the volume of the acquired external sound. FIG. 6 shows the transition Nout (dB) of the volume of the external sound acquired by the microphones 62 and 64, together with a predetermined volume threshold Nth (dB). As shown in FIG. 6, the acquired external volume is below Nth until time t1, and is at or above Nth at time t2, one second after t1. In this case, the volume changes from N1 at time t1 to N2 at time t2, and N2 is at or above Nth (step S330 in FIG. 4: YES), so each part of the control unit 10 then performs the processing from step S360 onward.
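
The S330 condition amounts to checking that the level was below Nth at one sample and at or above Nth at the next. A minimal sketch follows, assuming the levels are already available as decibel values sampled at a fixed interval; the function names are hypothetical.

```python
from typing import List, Optional


def crossed_threshold(n1_db: float, n2_db: float, nth_db: float) -> bool:
    """True when the level rises from below Nth (at t1) to Nth or above (at t2)."""
    return n1_db < nth_db <= n2_db


def first_crossing(levels_db: List[float], nth_db: float) -> Optional[int]:
    """Index of the first sample that reaches Nth after a sample below Nth, else None."""
    for i in range(1, len(levels_db)):
        if crossed_threshold(levels_db[i - 1], levels_db[i], nth_db):
            return i
    return None
```

A level that is already at or above Nth and stays there does not count as a crossing, which matches the wording that the volume must change to reach the threshold.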

If, in step S330, the external volume does not change, or changes but stays below the volume threshold Nth (step S330: NO), the sound processing unit 170 monitors whether a specific frequency is detected in the external sound (step S340). FIG. 7 is an explanatory diagram showing an example of the frequency of the acquired external sound. FIG. 7 shows the frequency spectra obtained by analyzing the external sound acquired in a period T1 and in a period T2 that follows T1. As shown in FIG. 7, when a frequency in a specific band fth is detected in the change from the spectrum fT1 of period T1 to the spectrum fT2 of period T2 (step S340: YES), each part of the control unit 10 performs the processing from step S360 onward. It is generally known that sound at a frequency of 3,000 hertz (Hz) is easy for people to hear even at low volume. In another embodiment, therefore, for external sound whose frequency is close to 20 Hz or 20,000 Hz, the two ends of the audible range of 20 Hz to 20,000 Hz, the output sound may be lowered even when the external sound has the same volume.
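
One way to picture the S340 check is to compare the energy inside the band fth between the two periods. The sketch below uses a plain discrete Fourier transform from the standard library; the decision rule (energy in T2 exceeding `ratio` times the energy in T1), the `floor` value, and all names are assumptions made for illustration, since the description does not state how the band is detected.

```python
import cmath
import math
from typing import List


def band_energy(samples: List[float], sample_rate: float,
                f_lo: float, f_hi: float) -> float:
    """Sum of squared DFT magnitudes over bins whose frequency is in [f_lo, f_hi]."""
    n = len(samples)
    energy = 0.0
    for k in range(n // 2 + 1):
        if f_lo <= k * sample_rate / n <= f_hi:
            coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                        for i, s in enumerate(samples))
            energy += abs(coeff) ** 2
    return energy


def band_appeared(period_t1: List[float], period_t2: List[float],
                  sample_rate: float, f_lo: float, f_hi: float,
                  ratio: float = 10.0, floor: float = 1e-9) -> bool:
    """Assumed rule: the band counts as newly detected if its energy in T2
    exceeds `ratio` times its energy in T1 (with `floor` guarding against zero)."""
    e1 = band_energy(period_t1, sample_rate, f_lo, f_hi)
    e2 = band_energy(period_t2, sample_rate, f_lo, f_hi)
    return e2 > ratio * max(e1, floor)


def tone(freq_hz: float, sample_rate: float, n: int) -> List[float]:
    """Test-signal helper: n samples of a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
```

For example, with a 500 Hz tone in T1 and the same tone plus a 3,000 Hz tone in T2, `band_appeared` with a band around 3,000 Hz reports the new component. A real device would more likely use an FFT; the direct transform is kept here only to stay dependency-free.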

If no specific frequency is detected in the external sound in step S340 (step S340: NO), the direction determination unit 161 monitors for movement of the user's head while the external sound has changed and a human voice is being detected in it (step S350). If, while the sound processing unit 170 is detecting a human voice in the external sound, the direction determination unit 161 detects a change at or above a predetermined threshold in at least one of the angular velocity of the head and the amount of change in its angle (step S350: YES), each part of the control unit 10 performs the processing from step S360 onward. If, in step S350, no human voice is detected or no head movement at or above the threshold is detected (step S350: NO), the sound processing unit 170 again monitors whether a specific sound such as a warning sound is detected in the external sound (step S320).
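
The order of the checks in steps S320 to S350 can be summarized as a single decision function. This is a sketch under assumptions: the `Observation` fields stand in for detector outputs that the description does not define in code, and the threshold values and units (degrees per second, degrees) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    specific_sound: bool      # S320: warning sound or similar detected
    volume_crossed: bool      # S330: level rose to the threshold Nth or above
    band_detected: bool       # S340: specific frequency band appeared
    voice_detected: bool      # S350: human voice present in the external sound
    angular_velocity: float   # head angular velocity (assumed unit: deg/s)
    angle_change: float       # change in head angle (assumed unit: deg)


def should_lower_output(obs: Observation,
                        velocity_threshold: float = 30.0,
                        angle_threshold: float = 20.0) -> bool:
    """Return True if any branch of S320 to S350 leads to the S360 processing."""
    if obs.specific_sound or obs.volume_crossed or obs.band_detected:
        return True
    # S350 needs both a human voice and a head movement at or above a threshold
    # in at least one of the two measures.
    head_moved = (abs(obs.angular_velocity) >= velocity_threshold
                  or abs(obs.angle_change) >= angle_threshold)
    return obs.voice_detected and head_moved
```

Requiring both the voice and the head movement in the last branch reflects the two-indicator condition of step S350; a head turn without a detected voice returns False.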

As described above, in the head-mounted display device 100 of the present embodiment, the sound processing unit 170 adjusts the volume of the output sound from the earphones 32 and 34 based on changes in the external sound acquired by the microphones 62 and 64. Because the device lowers the output volume when, for example, the volume of the external sound changes, the user can hear the external sound while listening to the audio of the content supplied via the interface 180, and can quickly notice changes in the surroundings.

Also, in the head-mounted display device 100 of the present embodiment, the conversion unit 185 converts the sound acquired by the microphones 62 and 64 into a character image that represents the sound as text. The display control unit 190 transmits a control signal representing the character image to the image display unit 20, which generates image light representing the character image. The device thus lets the user see the external sound acquired by the microphones 62 and 64 as the text image TX1, so that the user can recognize the external sound in more detail.

Also, in the head-mounted display device 100 of the present embodiment, the display control unit 190 sets the character size of the character image based on the external volume acquired by the microphones 62 and 64, and transmits a control signal representing the sized character image to the image display unit 20. Because the character size of the text image TX1 is set from the volume of the external sound, the user not only hears the external sound but also sees differences in external volume as differences in the character size of TX1, and can recognize the external sound in more detail.
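
Setting the character size from the external volume could be as simple as a clamped linear mapping. The description does not give the mapping, so the decibel range, the pixel sizes, and the function name below are all hypothetical values chosen for illustration.

```python
def text_size_for_volume(level_db: float,
                         min_db: float = 40.0, max_db: float = 90.0,
                         min_px: int = 16, max_px: int = 48) -> int:
    """Map an external sound level to a character height in pixels.

    Levels at or below min_db give min_px, levels at or above max_db give
    max_px, and levels in between are interpolated linearly.
    """
    clamped = max(min_db, min(max_db, level_db))
    fraction = (clamped - min_db) / (max_db - min_db)
    return round(min_px + fraction * (max_px - min_px))
```

With these assumed defaults, a louder announcement is drawn in larger characters, and the clamping keeps the text within a readable range for very quiet or very loud input.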

Also, in the head-mounted display device 100 of the present embodiment, the display control unit 190 displays the text image TX1 in the peripheral part, excluding the central part, of the maximum image display area PN, which is the area in which the image display unit 20 can display an image. Because no character image representing the external sound is displayed in the central part of PN, the user's visual field VR is not obstructed more than necessary; the user can see both the outside scene through the image display unit 20 and the character image, which improves convenience.

Also, in the head-mounted display device 100 of the present embodiment, the right microphone 62 and the left microphone 64 are arranged at different positions on the image display unit 20. The sound processing unit 170 estimates the direction from the image display unit 20 to the source of the external sound based on the external volume acquired by the right microphone 62 and that acquired by the left microphone 64. When the sound processing unit 170 estimates that the speaker SP is on the user's left, the text image TX1 is displayed in a part of the maximum image display area PN that lies mostly to the left of the center. The device can thus show the character image representing the external sound near the sound source in the user's visual field VR and make the user aware of the position of the source.

Also, in the head-mounted display device 100 of the present embodiment, the sound processing unit 170 lowers the output volume when the acquired volume changes from a volume N1 below the volume threshold Nth to a volume N2 at or above Nth. Because the output volume is lowered when the external volume is high, even if the content audio is loud, the user can hear loud external sound such as a warning sound more clearly.

Also, in the head-mounted display device 100 of the present embodiment, the sound processing unit 170 detects a specific sound, such as a warning sound, contained in the acquired external sound and adjusts the output sound based on the detected specific sound. Because the output volume is lowered when a specific sound that the user wants to catch is detected, the user can hear only the necessary sounds contained in the external sound, which improves convenience.

Also, in the head-mounted display device 100 of the present embodiment, the 9-axis sensor 66 detects the movement of the user's head, and the direction determination unit 161 determines the angular velocity and the amount of angle change of the user's head from the detected change in movement of the image display unit 20. The sound processing unit 170 lowers the output volume based on a human voice contained in the external sound and a change in the user's head movement at or above a predetermined threshold. Because the output volume is adjusted on two indicators, a change in the external sound and a change in the user's line-of-sight direction, unnecessary adjustments are suppressed compared with adjusting on the external sound alone, which improves convenience.

Also, in the head-mounted display device 100 of the present embodiment, the sound processing unit 170 analyzes the frequency spectrum of the acquired external sound, calculates the intensity at each frequency, and adjusts the output sound based on changes in that spectrum. When an abnormal sound or the like at a different frequency is detected in the acquired external sound, the device can make the user aware of it, which improves convenience.

B. Modifications:
The invention is not limited to the above embodiment and can be carried out in various forms without departing from its gist. For example, the following modifications are possible.

B1. Modification 1:
The above embodiment was described using the head-mounted display device 100, which includes the sound processing unit 170 and the image display unit 20, as an example, but a device including a sound processing unit and an image display unit can be modified in various ways. FIG. 8 is an explanatory diagram showing an example of the visual field VR' seen by the driver of an automobile. In this modification, as shown in FIG. 8, a head-up display (HUD) formed on the windshield of the automobile serves as the image display device that shows character images to the driver, and the sound processing device consists of the car audio system mounted in the automobile and microphones attached to the front, rear, left, and right of the exterior of the automobile to acquire external sound.

FIG. 8 shows the maximum image display area PN and a text image TX2, in which the acquired external sound is displayed as a character image in PN. The driver is listening to music on the car audio system. The source of the external sound acquired by the microphones is an ambulance (not shown), estimated from the volumes acquired by the multiple microphones to be ahead and to the left of the automobile. In this modification, when the external volume rises from below the volume threshold Nth to Nth or above because of the ambulance siren ("ピーポーパーポー" in FIG. 8), the sound processing unit 170', which adjusts the car audio volume, lowers the volume of the music. The display control unit 190 also displays, in the maximum image display area PN, the text image TX2, a character image representing the siren. TX2 is displayed in the upper-left peripheral part of PN. The text image TX2 may instead be omitted, with only the car audio volume lowered. In this modification, when something abnormal happens outside the automobile, the driver can readily notice it, which improves convenience.
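
With microphones on the front, rear, left, and right of the vehicle, a coarse placement for the text image can be derived by comparing opposing pairs. The sketch below is purely illustrative: the dictionary keys, the function name, and the mapping of "front" to the upper part of the display are assumptions, chosen so that a source ahead and to the left yields the upper-left placement described for FIG. 8.

```python
from typing import Dict


def text_anchor(levels_db: Dict[str, float]) -> str:
    """Pick a corner of the display area from four microphone levels.

    levels_db must contain the keys "front", "rear", "left", and "right".
    The louder microphone of each opposing pair decides the corner; ties
    fall to "upper" and "left" in this sketch.
    """
    vertical = "upper" if levels_db["front"] >= levels_db["rear"] else "lower"
    horizontal = "left" if levels_db["left"] >= levels_db["right"] else "right"
    return f"{vertical}-{horizontal}"
```

For an ambulance ahead and to the left, the front and left microphones would be expected to read higher than the rear and right ones, giving "upper-left".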

The above embodiment includes an image display unit that can display the acquired external sound as a character image, but the image display unit is not an essential component, and the form of the sound processing device can be modified in various ways. For example, the sound processing device may be a portable music player that plays music content. When the user of the portable music player is listening to the output sound of music content through sealed earphones, a microphone formed on the player may acquire external sound, and the volume of the output sound may be set based on changes in the external sound.

In the above embodiment, the sound processing unit 170 lowers the output sound based on changes in the acquired external sound, but it may also raise it. For example, when the user can hear both the output sound and the external sound, the sound processing unit 170 may raise or lower the output volume based on the difference between the external volume and the output volume. Because the output volume is adjusted from that difference, the user hears the output sound at a volume that reflects the user's intent, which improves convenience.

B2. Modification 2:
In the above embodiment, the volume of the output sound is adjusted based on changes in the external sound, but the adjustment can be modified in various ways. For example, the sound processing unit 170 may set a volume for the external sound acquired by the microphones 62 and 64 and output it from the earphones 32 and 34 as output sound. In this modification, when a specific sound such as a warning sound in the external sound is detected at a volume lower than that of the output sound, it can be played to the user as output sound at a volume higher than the external volume, which improves convenience.

B3. Modification 3:
In the above embodiment, the display control unit 190 sets the position at which the text image TX1 is displayed in the maximum image display area PN according to the direction from the image display unit 20 to the source of the external sound, but that position can be modified in various ways. For example, the head-mounted display device 100 may have only one microphone, so that the direction of the source is not identified. Even when the direction can be identified, the display control unit 190 may set the display position of TX1 in PN regardless of it.

The size of each character in the character image displayed in the maximum image display area PN, and the method of setting it, can also be modified in various ways. The display control unit 190 may display each character at a predetermined size regardless of the external volume. The size of each character may also be set based on the frequency, the type of sound, or the amount of change in the user's head movement. Not only the size but also the font of the characters may be changed.

B4. Modification 4:
In the above embodiment, the character image representing the external sound is displayed in the maximum image display area PN in a predetermined form, but its settings can be modified in various ways. While the user is listening to the output sound, a predetermined operation on the operation unit 135 may cause the character image representing the external sound to be displayed as a character image different from the one shown before the operation.

In the above embodiment, the microphones 62 and 64 are placed near both ears when the user wears the image display unit 20, but their placement and structure are not limited to this and can be modified in various ways. For example, the microphones 62 and 64 may be directional microphones with a mechanical structure that lets their orientation relative to the image display unit 20 be changed. When the sound processing unit 170 has estimated the direction from the image display unit 20 to the source of the external sound, changing the orientation of the microphones 62 and 64 lets it acquire the external sound in more detail and detect changes in it more easily.

In the above embodiment, the sound processing unit 170 detects changes in the external sound from its frequency, but the detection of a specific sound contained in the external sound can be modified in various ways. For example, even when the external sound contains the voices of several people, the sound processing unit 170 may recognize and distinguish each person's voice. In that case, the sound processing unit 170 may adjust the external volume when it detects the voice of one predetermined person.

In the above embodiment, the output volume is adjusted upon detecting a specific sound contained in the external sound, but it may instead be adjusted upon detecting a specific word. For example, from the viewpoint of privacy protection, the output volume may be raised for a specific word so that the user cannot hear it. Specific words such as personal names may also be registered in advance so that a registered word is not converted into a character image even when its sound is acquired. The same applies to the conversion of the acquired external sound into a character image: a specific word may simply not be displayed on the image display unit 20 as a character image.
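
Suppressing registered words in the character image can be pictured as a filter applied to the recognized text before it is rendered. The sketch below assumes that speech recognition has already produced a plain string; the function name and the exact-match, case-sensitive removal are hypothetical simplifications.

```python
from typing import Iterable


def redact_registered_words(text: str, registered: Iterable[str]) -> str:
    """Remove every occurrence of each registered word from the recognized text.

    Matching is exact and case-sensitive in this sketch; runs of whitespace
    left behind by a removal are collapsed to a single space.
    """
    for word in registered:
        if word:  # skip empty entries, which would otherwise match everywhere
            text = text.replace(word, "")
    return " ".join(text.split())
```

The filtered string would then be handed to whatever component builds the text image, so a registered personal name never reaches the display.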

B5. Modification 5:
In the above embodiment, the output sound is controlled based on changes in the external sound, but it may also be controlled based on the external sound itself. For example, when almost no external sound is ever acquired, control such as lowering the volume of the output sound may be performed even though the volume and frequency of the external sound do not change. Because the output sound is controlled even when the acquired external sound does not change, the user can listen to output sound of appropriate volume and frequency.

When the camera 61 acquires the voice of a predetermined person, the sound processing unit 170 may change the frequency of the acquired voice, output the frequency-shifted voice as output sound from the earphones 32 and 34, and lower the output volume of everything other than that voice. In this modification, a voice acquired at a frequency that is hard to hear is made easier to listen to, so the user can recognize the external sound more easily. When external sounds of about the same frequency are acquired from several sources, the frequencies of the sounds from some of the sources may be changed and the frequency-shifted sounds output from the earphones 32 and 34. In general, when a person hears several sounds of similar frequency, frequency masking makes the lower-frequency sound hard to hear. This modification therefore reduces the effect of frequency masking and makes it easier for the user to recognize sounds acquired from several sources.

B6. Modification 6:
Instead of the image display unit 20 worn like eyeglasses, another type of image display unit may be adopted, for example one worn like a hat. The above embodiment uses an LCD and a light source to generate image light, but another display element such as an organic EL display may be adopted instead. The above embodiment uses the 9-axis sensor 66 to detect the movement of the user's head, but a sensor composed of one or two of an acceleration sensor, an angular velocity sensor, and a geomagnetic sensor may be used instead. The above embodiment assumes that the head-mounted display device 100 is a binocular optical transmissive type, but the invention is equally applicable to other types of head-mounted display device, such as a video transmissive type or a monocular type.

In the above embodiment, the head-mounted display device 100 may guide image light representing the same image to the user's left and right eyes so that the user sees a two-dimensional image, or may guide image light representing different images to the left and right eyes so that the user sees a three-dimensional image.

In the above embodiment, part of the configuration implemented in hardware may be replaced with software, and conversely, part of the configuration implemented in software may be replaced with hardware. For example, the image processing unit 160 and the sound processing unit 170 are implemented in the above embodiment by the CPU 140 reading and executing a computer program, but these functional units may instead be implemented as hardware circuits.

When some or all of the functions of the invention are implemented in software, the software (computer program) can be provided stored on a computer-readable recording medium. In this invention, a "computer-readable recording medium" is not limited to portable recording media such as flexible disks and CD-ROMs; it also includes internal storage devices in a computer, such as the various kinds of RAM and ROM, and external storage devices fixed to a computer, such as hard disks.

In the above embodiment, the control unit 10 and the image display unit 20 are formed as separate components, as shown in FIGS. 1 and 2, but their configuration is not limited to this and can be modified in various ways. For example, all or part of the components of the control unit 10 may be formed inside the image display unit 20. Of the components of the control unit 10, the operation unit 135 alone may be formed as a standalone user interface (UI), or the power source 130 may be formed as a standalone, replaceable unit. Components of the control unit 10 may also be duplicated in the image display unit 20. For example, the CPU 140 shown in FIG. 2 may be formed in both the control unit 10 and the image display unit 20, or the functions performed by the CPU 140 in the control unit 10 and by a CPU in the image display unit 20 may be divided between them.

In the above embodiment, the sound processing unit 170 and the conversion unit 185 are formed in the control unit 10, but the present invention can also be implemented by a sound processing system configured independently as a separate device. For example, control of the output sound and sound processing may be performed by cloud computing, a server, a host computer, or a smartphone via a wireless communication line.

In the above embodiment, the microphones 62 and 64 are arranged on the image display unit 20 and the earphones 32 and 34 are connected to the control unit 10, but the arrangement of the microphones 62 and 64 and the earphones 32 and 34 can be modified in various ways. For example, the earphones 32 and 34 may be connected to the head-mounted display device 100 in the manner of a head-mounted sound processing device worn on the head. A speaker may be used as the sound output device instead of the earphones 32 and 34. The microphones 62 and 64 also need not be arranged on the image display unit 20; they may be, for example, sound acquisition devices installed at specific locations in a facility. The sound output device may also be a device worn on the head, such as a hearing aid. In this modification, because the sound output device is worn on the user's head, it is unlikely to come off the user's ear, and the user can hear the output sound stably.

B7. Modification 7:
For example, the image light generation unit may include an organic EL (organic electroluminescence) display and an organic EL control unit. The image generation unit may also use LCOS (liquid crystal on silicon; LCoS is a registered trademark), a digital micromirror device, or the like instead of an LCD. The present invention can also be applied to a laser retinal projection head-mounted display. In the laser retinal projection case, the "image generation possible region" can be defined as the image region perceived by the user's eye.

The head-mounted display may also be one in which the optical image display unit covers only part of the user's eye, in other words, one in which the optical image display unit does not completely cover the user's eye. The head-mounted display may also be a so-called monocular head-mounted display.

The earphones may be of an ear-hook or headband type, or may be omitted. The head-mounted display may also be configured, for example, as one installed in a vehicle such as an automobile or an airplane, or as one built into body protective gear such as a helmet.

The present invention is not limited to the above embodiments and modifications and can be realized in various configurations without departing from its spirit. For example, the technical features in the embodiments and modifications that correspond to the technical features in each aspect described in the Summary of the Invention can be replaced or combined as appropriate in order to solve some or all of the problems described above, or to achieve some or all of the effects described above. Any technical feature that is not described as essential in this specification can be deleted as appropriate.

Description of Symbols
10 ... Control unit
11 ... Decision key
12 ... Lighting unit
13 ... Display switching key
14 ... Trackpad
15 ... Luminance switching key
16 ... Direction key
17 ... Menu key
18 ... Power switch
20 ... Image display unit
21 ... Right holding unit
22 ... Right display drive unit
23 ... Left holding unit
24 ... Left display drive unit
26 ... Right optical image display unit
28 ... Left optical image display unit
30 ... Earphone plug
32 ... Right earphone (sound output unit)
34 ... Left earphone (sound output unit)
40 ... Connection unit
42 ... Right cord
44 ... Left cord
46 ... Connecting member
48 ... Main body cord
51 ... Transmission unit
52 ... Transmission unit
53 ... Reception unit
54 ... Reception unit
61 ... Camera
62 ... Right microphone (sound acquisition unit)
64 ... Left microphone (sound acquisition unit)
66 ... 9-axis sensor
100 ... Head-mounted display device
110 ... Input information acquisition unit
120 ... Storage unit
130 ... Power source
135 ... Operation unit
140 ... CPU
150 ... Operating system
160 ... Image processing unit
161 ... Direction determination unit
170 ... Sound processing unit (sound control unit, analysis unit, specific sound detection unit, sound source direction estimation unit)
180 ... Interface
185 ... Conversion unit
190 ... Display control unit
201 ... Right backlight control unit
202 ... Left backlight control unit
211 ... Right LCD control unit
212 ... Left LCD control unit
221 ... Right backlight
222 ... Left backlight
241 ... Right LCD
242 ... Left LCD
251 ... Right projection optical system
252 ... Left projection optical system
261 ... Right light guide plate
262 ... Left light guide plate
Nout ... Transition of volume
t1, t2 ... Time
T1, T2 ... Period
N1, N2 ... Volume (volume of external sound)
Nth ... Volume threshold (first threshold)
fT1, fT2 ... Frequency spectrum
fth ... Frequency band
TX1, TX2 ... Text image (character image)
OA ... External device
WB ... Whiteboard
RE ... Right eye
LE ... Left eye
IL ... Illumination light
PL ... Image light
PN ... Maximum image display area (image generation possible region)
AP ... Tip
SP ... Speaker
VR ... Field of view

Claims

A sound processing device comprising:
a sound output unit that outputs an output sound;
a sound acquisition unit that acquires an external sound, which is a sound from outside that differs from the output sound output by the sound output unit;
a sound control unit that controls the output sound based on the external sound;
an image display unit that generates image light representing an image, lets a user view the image light, and transmits an outside scene;
a conversion unit that converts the external sound into a character image in which the external sound is expressed as characters; and
a display control unit that causes the image display unit to generate character image light, which is the image light representing the character image,
wherein the display control unit causes the image display unit to generate the character image light, regardless of the position of the sound source of the external sound, in a portion of an image generation possible region excluding its central portion, the image generation possible region being the region in which the image display unit can generate the image light,
the sound processing device further comprising means for detecting a specific word contained in the external sound,
wherein, when the specific word is contained in the external sound, the sound control unit raises an output volume so that the user does not hear the specific word.

A sound processing device comprising:
a sound output unit that outputs an output sound;
a sound acquisition unit that acquires an external sound, which is a sound from outside that differs from the output sound output by the sound output unit;
a sound control unit that controls the output sound based on the external sound;
an image display unit that generates image light representing an image, lets a user view the image light, and transmits an outside scene;
a conversion unit that converts the external sound into a character image in which the external sound is expressed as characters; and
a display control unit that causes the image display unit to generate character image light, which is the image light representing the character image,
wherein the display control unit causes the image display unit to generate the character image light, regardless of the position of the sound source of the external sound, in a portion of an image generation possible region excluding its central portion, the image generation possible region being the region in which the image display unit can generate the image light,
the sound processing device further comprising means for detecting a specific word contained in the external sound,
wherein, when the specific word is contained in the external sound, the conversion unit does not convert the specific word into the character image.
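
As an illustration only, the suppression of a specific word during conversion could be sketched as follows. This is a minimal sketch that assumes speech recognition has already turned the external sound into a list of words; the function name `words_for_character_image` and the case-insensitive matching are hypothetical and are not taken from the specification.

```python
def words_for_character_image(recognized_words, specific_words):
    """Return the recognized words that may be rendered as a character image.

    Hypothetical helper: any word found in `specific_words` is skipped,
    so it never becomes part of the character image.
    """
    # Assumption: matching is case-insensitive; the specification does not say.
    blocked = {word.lower() for word in specific_words}
    return [word for word in recognized_words if word.lower() not in blocked]


if __name__ == "__main__":
    shown = words_for_character_image(["the", "Secret", "plan"], ["secret"])
    print(" ".join(shown))
```

The same filter could equally be applied at the display stage instead of the conversion stage; the sketch does not distinguish between the two.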

A sound processing device comprising:
a sound output unit that outputs an output sound;
a sound acquisition unit that acquires an external sound, which is a sound from outside that differs from the output sound output by the sound output unit;
a sound control unit that controls the output sound based on the external sound;
an image display unit that generates image light representing an image, lets a user view the image light, and transmits an outside scene;
a conversion unit that converts the external sound into a character image in which the external sound is expressed as characters; and
a display control unit that causes the image display unit to generate character image light, which is the image light representing the character image,
wherein the display control unit causes the image display unit to generate the character image light, regardless of the position of the sound source of the external sound, in a portion of an image generation possible region excluding its central portion, the image generation possible region being the region in which the image display unit can generate the image light,
the sound processing device further comprising means for detecting a specific word contained in the external sound,
wherein, when the specific word is contained in the external sound, the display processing unit does not display the specific word on the image display unit.

The sound processing device according to any one of claims 1 to 3,
wherein the display control unit sets, based on the external sound, the size of the character image light that the image display unit lets the user view.

The sound processing device according to any one of claims 1 to 4,
comprising a plurality of the sound acquisition units arranged at different positions,
the sound processing device further comprising:
a sound source direction estimation unit that estimates a sound source direction, which is the direction from the image display unit to the sound source of the external sound, based on the volume of the external sound acquired by one of the sound acquisition units and the volume of the external sound acquired by another of the sound acquisition units,
wherein the display control unit causes the image display unit to generate the character image light in the peripheral portion in association with the sound source direction.
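
One way to picture a volume-based direction estimate with two sound acquisition units is the sketch below. It is not the method of the specification: the use of RMS level, the 3 dB margin, and the coarse "right" / "left" / "center" result are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of audio samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_source_side(right_samples, left_samples, margin_db=3.0):
    """Guess which side the sound source is on from two microphone levels.

    Assumption: a level difference larger than `margin_db` (a hypothetical
    value) means the source is on the louder side; otherwise "center".
    """
    right = rms(right_samples)
    left = rms(left_samples)
    if right == 0.0 and left == 0.0:
        return "unknown"  # silence on both microphones
    eps = 1e-12  # avoids log10(0) when one microphone is silent
    diff_db = 20.0 * math.log10((right + eps) / (left + eps))
    if diff_db > margin_db:
        return "right"
    if diff_db < -margin_db:
        return "left"
    return "center"


if __name__ == "__main__":
    print(estimate_source_side([0.5, -0.5, 0.5, -0.5], [0.1, -0.1, 0.1, -0.1]))
```

A display routine could then place the character image on the side returned by `estimate_source_side`, within the portion of the image generation possible region that excludes the center.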

The sound processing device according to any one of claims 1 to 5,
wherein the sound control unit measures the volume of the external sound and reduces the volume of the output sound when the measured volume of the external sound changes from less than a first threshold to more than the first threshold.
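
The threshold behavior can be sketched as a small function. This is an illustrative sketch only: the function name, the 0.5 reduction factor, and the treatment of a measurement exactly equal to the threshold are assumptions, not details from the specification. The `threshold` parameter plays the role of the volume threshold Nth.

```python
def next_output_volume(previous_external, current_external, output_volume,
                       threshold, reduction=0.5):
    """Return the output volume after one external-volume measurement.

    The output volume is lowered only on an upward crossing of `threshold`,
    i.e. when the previous measurement was below it and the current one is not.
    Assumptions: `reduction` is a hypothetical scale factor, and a measurement
    exactly equal to the threshold counts as having crossed it.
    """
    crossed_upward = previous_external < threshold <= current_external
    if crossed_upward:
        return output_volume * reduction
    return output_volume


if __name__ == "__main__":
    # External sound rises from 0.2 to 0.8 across a threshold of 0.5.
    print(next_output_volume(0.2, 0.8, output_volume=1.0, threshold=0.5))
```

Because only the crossing triggers the reduction, an external sound that stays loud does not lower the output volume again on every measurement.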

The sound processing device according to any one of claims 1 to 6, further comprising:
a specific sound detection unit that detects a specific sound contained in the external sound,
wherein the sound processing unit controls the output sound based on the specific sound detected by the specific sound detection unit.

The sound processing device according to any one of claims 1 to 7, further comprising:
a motion detection unit that detects a change in the movement of the user's head,
wherein the sound control unit controls the output sound based on the change in the movement of the user's head detected by the motion detection unit.

The sound processing device according to any one of claims 1 to 8,
wherein the sound control unit causes the sound output unit to output, as the output sound, a sound in which the volume of the external sound has been set based on a change in the external sound.

The sound processing device according to any one of claims 1 to 9,
wherein, when the sound acquisition unit acquires the voice of a specific person, the sound control unit changes the frequency of the acquired voice of the specific person, causes the sound output unit to output the voice of the specific person with the changed frequency, and reduces the volume of sounds other than the voice of the specific person.

The sound processing device according to any one of claims 1 to 10,
wherein the sound output unit lets the user hear the output sound while mounted on the user's head.