JP2014027459A

JP2014027459A - Communication apparatus, communication method, and communication program

Info

Publication number: JP2014027459A
Application number: JP2012165902A
Authority: JP
Inventors: Toshiyuki Omori; 俊行大森; Kengo Noda; 憲吾野田
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2012-07-26
Filing date: 2012-07-26
Publication date: 2014-02-06

Abstract

PROBLEM TO BE SOLVED: To provide a communication apparatus, a communication method, and a communication program that include display means configured to be worn on the head of a user and that enable voice communication while taking into account the loudness of sound in the call environment. SOLUTION: The communication apparatus includes display means that is configured to be worn on the head of a user and outputs an image; communication means that exchanges data including sound information with another apparatus; and sound output means that converts the sound information received by the communication means into sound for output. The communication apparatus determines whether or not the call environment is a noise environment, that is, an environment in which noise may occur (S10). When the call environment is determined to be a noise environment (S65: YES), the communication apparatus converts the voice represented by the sound information received by the communication means into text (S75) and displays an image representing the text on the display means (S80).

Description

The present invention relates to a communication device that is configured to be wearable on a user's head, includes display means for outputting an image, and can transmit and receive data including sound information to and from other devices, and to a corresponding communication method and communication program.

Display devices that are worn on a user's head and allow the user to view various images are known. A head-mounted display is one example. Some display devices of this type have a function that lets the user perceive sound while taking ambient sound into account. For example, the head-mounted display device described in Patent Document 1 includes an air-conduction speaker and a bone-conduction speaker as means for reliably prompting the user's perception, and uses one or the other as needed. Meanwhile, various techniques for making calls comfortable in view of the volume of ambient sound have been studied. For example, the mobile phone described in Patent Document 2 includes a bone-conduction speaker. When ambient noise is high, the bone-conduction speaker is used as a receiver for bone-conducted sound; when ambient noise is low, it is used as a receiver for air-conducted sound.

Patent Document 1: JP 2006-74798 A
Patent Document 2: JP 2007-19898 A

Development of a display device with a call function that takes ambient sound into account is desired. To meet this demand, one could incorporate the call technology of the mobile phone described in Patent Document 2 into the head-mounted display device described in Patent Document 1. However, with the call technology of the mobile phone described in Patent Document 2, the other party's voice may still be difficult to hear, depending on the volume of sound around the listener or the speaker.

An object of the present invention is to provide a communication device, a communication method, and a communication program that include display means configured to be wearable on a user's head and that enable voice communication while taking into account the volume of sound in the call environment.

The communication device according to the first aspect includes: display means that is configured to be wearable on a user's head and outputs an image; communication means that transmits and receives data including sound information to and from another device; sound output means that converts the sound information received by the communication means into sound and outputs it; determination means that determines whether or not the call environment is a noise environment, that is, an environment in which noise may occur; conversion means that, when the determination means determines that the call environment is the noise environment, converts the voice represented by the sound information received by the communication means into text; and display control means that causes the display means to display an image representing the text generated by the conversion means.

With the communication device according to the first aspect, even when the call environment is determined to be a noise environment, the user can recognize what the sound information transmitted from the other device represents by looking at the text displayed on the display means. Therefore, by using the communication device according to the first aspect, the user can communicate by voice with the call environment taken into account.

The communication method according to the second aspect is executed by a communication device that includes display means that is configured to be wearable on a user's head and outputs an image, communication means that transmits and receives data including sound information to and from another device, and sound output means that converts the sound information received by the communication means into vibration and outputs it. The method includes: a determination step of determining whether or not the call environment is a noise environment, that is, an environment in which noise may occur; a conversion step of converting, when the call environment is determined to be the noise environment in the determination step, the voice represented by the sound information received by the communication means into text; and a display control step of causing the display means to display an image representing the text generated in the conversion step. When the communication device executes these steps, the communication method according to the second aspect provides the same operation and effects as the communication device of the first aspect.

The communication program according to the third aspect is a display program executed by a computer of a communication device that includes display means that is configured to be wearable on a user's head and outputs an image, communication means that transmits and receives data including sound information to and from another device, and sound output means that converts the sound information received by the communication means into vibration and outputs it. The program causes the computer to execute: a determination step of determining whether or not the call environment is a noise environment, that is, an environment in which noise may occur; a conversion step of converting, when the call environment is determined to be the noise environment in the determination step, the voice represented by the sound information received by the communication means into text; and a display control step of causing the display means to display an image representing the text generated in the conversion step. By causing the computer to execute these steps, the communication program according to the third aspect provides the same operation and effects as the communication device of the first aspect.

FIG. 1 is a perspective view showing the external appearance of the HMD 1.
FIG. 2 is a block diagram showing the electrical configuration of the mounting device 2 and the control device 3.
FIG. 3 is a flowchart of the call process.
FIG. 4 is a flowchart of the call environment determination process executed in the call process shown in FIG. 3.
FIG. 5 is an explanatory diagram of the captured image 70, the display image 80, and the field of view 85.

An embodiment of the present invention will be described below with reference to the drawings. The head-mounted display 1 shown in FIG. 1 (hereinafter referred to as the "HMD 1") has a function of transmitting and receiving data including sound information to and from other devices. The HMD 1 includes a mounting device 2 and a control device 3. The mounting device 2 is worn on the user's head. The mounting device 2 is detachably connected to the control device 3 via a harness 4. The control device 3 is worn, for example, on the user's waist belt. The control device 3 controls the mounting device 2. The mounting device 2 and the control device 3 are each described in detail below. The up, down, lower-right, upper-left, upper-right, and lower-left directions in FIG. 1 correspond to the up, down, front, rear, right, and left directions of the HMD 1, respectively.

The configuration of the mounting device 2 will be described with reference to FIG. 1. The mounting device 2 includes glasses 11, a display device 12, a sound information input/output device 13, and an air-conduction microphone 14. The glasses 11 have the same shape as ordinary glasses. While the user is wearing the mounting device 2, the glasses 11 hold the display device 12, the sound information input/output device 13, and the air-conduction microphone 14 on the user's head. Wiring that is electrically connected to the sound information input/output device 13 and the air-conduction microphone 14 is provided inside the frame of the glasses 11. This wiring supplies power and carries various signals.

The display device 12 is a retinal scanning display that scans laser light modulated in accordance with an image signal, emits it as image light, and projects an image directly onto the retina of at least one of the user's eyes. The display device 12 includes a housing 15 and a support portion 19. The housing 15 accommodates the components for outputting image light. These components include an image display unit (not shown), an eyepiece optical system (not shown), and a half mirror 16 (hereinafter referred to as the "HM 16"). The image display unit includes a CPU 21 (see FIG. 2), a display control unit 29 (see FIG. 2), and a display 28 (see FIG. 2). The image display unit drives the display control unit 29 based on the video signal transmitted from the control device 3 via the harness 4 and displays an image on the display 28. The display 28 consists of, for example, a liquid crystal display element and a backlight. The eyepiece optical system condenses the image light representing the image displayed on the display 28 so as to guide it to the user's eye. The HM 16 is provided on the left side of the display device 12 and reflects at least part (for example, half) of the image light emitted from the eyepiece optical system so that it enters the user's left eyeball (not shown). Because the HM 16 transmits at least part of the light from the real image of the outside world, the user can view the image superimposed on the real image (the outside scenery) in his or her field of view. The housing 15 also contains a camera 17. The lens of the camera 17 is exposed through a small hole 18 provided in the front surface of the housing 15, to the right of the HM 16. The camera 17 generates image data representing the outside scenery in front of the user's field of view. The support portion 19 is a member provided at the upper right of the display device 12 for attaching the display device 12 to the upper right part of the glasses 11.

The sound information input/output device 13 is a headphone-type device that covers the user's ear (not shown) and includes the bone-conduction microphone 31, the air-conduction earphone 32, and the bone-conduction earphone 33 shown in FIG. 2. The air-conduction microphone 14 extends downward from the left front end of the glasses 11. When the user wears the mounting device 2, the air-conduction microphone 14 is positioned near the user's mouth (not shown).

The configuration of the control device 3 will be described. The control device 3 is a system box with a substantially rectangular parallelepiped shape. The control device 3 includes an operation unit 5. Through the operation unit 5, the user can perform various operations, such as starting a call with the user of another device. The control device 3 also includes a connector 6, to which a communication cable can be connected.

The electrical configuration of the mounting device 2 will be described with reference to FIG. 2. The mounting device 2 includes a CPU 21, a ROM 23, a RAM 25, a display control unit 29, a camera 17, an air-conduction microphone 14, a bone-conduction microphone 31, an air-conduction earphone 32, a bone-conduction earphone 33, and a first wired communication unit 34. The CPU 21 controls the mounting device 2 as a whole and is electrically connected to the other electrical components of the mounting device 2. The ROM 23 stores various programs and other data executed by the CPU 21. The RAM 25 temporarily stores various data.

The display control unit 29 controls the display 28 so that it displays an image. Hereinafter, an image displayed on the display 28 is referred to as a display image. The camera 17 generates image data representing a still image or a moving image (hereinafter referred to as a "captured image"). The air-conduction microphone 14 converts the vibration of sound traveling through the air into sound information. A known configuration, such as a dynamic microphone or a condenser microphone, can be used as the air-conduction microphone 14. Sound information generated from the vibration of sound traveling through the air is referred to as first sound information. The bone-conduction microphone 31 picks up the user's bone-conducted sound and converts it into sound information. Sound information generated from the user's bone-conducted sound is referred to as second sound information. The air-conduction earphone 32 converts sound information into sound that travels through the air and outputs it. The bone-conduction earphone 33 converts sound information into bone-conducted sound and outputs it. The first wired communication unit 34 performs wired communication with the control device 3 via the harness 4 (see FIG. 1). In the call process described later, the CPU 21 transmits the first sound information generated by the air-conduction microphone 14 and the second sound information generated by the bone-conduction microphone 31 to the control device 3 as needed, via the first wired communication unit 34 and the harness 4.

The electrical configuration of the control device 3 will be described. The control device 3 includes a CPU 41, a ROM 43, a RAM 45, a flash memory 47, a second wired communication unit 46, the operation unit 5, a wireless communication unit 53, and an external communication unit 55. The CPU 41 controls the control device 3 as a whole and is electrically connected to the other electronic components of the control device 3. The ROM 43 stores various programs and other data executed by the CPU 41. The RAM 45 temporarily stores various data. The flash memory 47 includes a registered image storage area 48, a setting storage area 49, and an other storage area 50. The registered image storage area 48 stores image data representing images (hereinafter also referred to as "registered images") used to determine whether or not the call environment is a noise environment. A registered image is a sign indicating an attribute of the surrounding environment (for example, a workplace, a playroom, a factory, or a laboratory). The registered image 51 represents a sign posted in a "WORK AREA" such as the inside of a factory. The setting storage area 49 stores various setting values, including the thresholds used in the call process described later. The other storage area 50 stores other information, including an acoustic model, a language model, and a word dictionary for speech recognition. The flash memory 47 may also store programs executed by the CPUs 21 and 41. The second wired communication unit 46 performs wired communication with the mounting device 2 via the harness 4. The wireless communication unit 53 performs wireless communication with other devices. The external communication unit 55 performs wired communication with other devices via a communication cable. Other devices that communicate with the HMD 1 wirelessly or by wire include, for example, another HMD 1, a PC, a mobile phone, and a smartphone.

An overview of the call process executed by the HMD 1 will be described. The HMD 1 can transmit and receive data including sound information to and from other devices via the wireless communication unit 53. The call process uses this function and is executed when the user of the HMD 1 talks with the user of another device. In the call process, the HMD 1 switches the input/output method for sound information depending on whether or not the call environment is a noise environment. The call environment is at least one of the environment around the HMD 1 that executes the process (hereinafter also referred to as the "own device 1") and the environment around the other device that is the communication partner (hereinafter also referred to as the "other device"). In the present embodiment, the call environment refers to both the environment around the own device and the environment around the other device. A noise environment is an environment in which noise may occur. In the present embodiment, the noise environment includes not only an environment in which noise is occurring when the process is executed, but also an environment in which noise is predicted to occur while the process is running. When the call environment is determined not to be a noise environment, the HMD 1 transmits the first sound information generated by the air-conduction microphone 14 to the other device and uses the air-conduction earphone 32 to convert received sound information into air-conducted sound and output it. When the call environment is determined to be a noise environment, the HMD 1 transmits the second sound information generated by the bone-conduction microphone 31 to the other device and uses the bone-conduction earphone 33 to convert received sound information into bone-conducted sound and output it. In addition, when the call environment is determined to be a noise environment, the HMD 1 converts the voice represented by the sound information transmitted from the other device into text and displays an image representing the generated text on the display 28.
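The switching rule in the overview above can be summarized in a few lines of Python. This is only an illustrative sketch, not the firmware of the HMD 1: the class `CallIoMode`, the function `select_io_mode`, and the string labels are hypothetical names introduced here, and only the mapping itself (air conduction without text in a quiet environment, bone conduction plus text in a noise environment) comes from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallIoMode:
    """Hypothetical container for the input/output choices made during a call."""
    microphone: str   # microphone whose sound information is sent to the other device
    earphone: str     # earphone that reproduces sound information received from it
    show_text: bool   # whether received speech is also displayed as text


def select_io_mode(noise_environment: bool) -> CallIoMode:
    """Map the noise-environment decision to the I/O method described in the overview."""
    if noise_environment:
        # Noise environment: second sound information, bone-conducted output, text on display 28.
        return CallIoMode("bone_conduction_microphone_31",
                          "bone_conduction_earphone_33",
                          True)
    # Not a noise environment: first sound information, air-conducted output, no text.
    return CallIoMode("air_conduction_microphone_14",
                      "air_conduction_earphone_32",
                      False)
```

Keeping the decision in one pure function makes it easy to check the two cases separately from any audio or display hardware.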

The call process will be described with reference to FIGS. 3 to 5. The call process starts when the user of the HMD 1 operates the operation unit 5 to enter an instruction to start communication with another device, or when a signal containing an instruction to start communication is received from another device. The call process in FIG. 3 is executed by the CPU 21 and the CPU 41 working together, according to the program stored in the ROM 43 in FIG. 2. As an example, the case of communicating with another HMD 1 will be described.

As shown in FIG. 3, the CPU 41 first establishes communication between the own device 1 and the other device (S5). Next, the CPU 41 executes the call environment determination process (S10). The call environment determination process will be described with reference to FIG. 4. As shown in FIG. 4, the CPU 41 first acquires, as one form of environment information, the first sound information generated by the air-conduction microphone 14 (S15). Environment information is information representing the environment around the user of the HMD 1. The CPU 41 outputs an instruction to the CPU 21 to start transmitting the first sound information and the second sound information to the control device 3. Based on the received instruction, the CPU 21 starts transmitting the first sound information generated by the air-conduction microphone 14 and the second sound information generated by the bone-conduction microphone 31 to the control device 3 as needed. The first sound information acquired in S15 represents the sound of the environment around the own device 1. Based on the first sound information acquired in S15, the CPU 41 identifies the sound level around the own device 1 and transmits data containing the identified sound level to the other device (S20). The sound level is expressed by a known index of how loud the ambient sound is. The HMD 1 of the present embodiment expresses the sound level as the sound pressure level (dB) estimated from the first sound information acquired in S15. If the user was speaking, the first sound information acquired in S15 contains not only the ambient sound but also the user's voice. The HMD 1 of the present embodiment therefore estimates the sound pressure level after excluding, from the sound represented by the first sound information, the sound in the frequency range corresponding to voice. This removes the influence of the user's voice contained in the first sound information and allows the sound pressure level of the ambient sound to be estimated more accurately. Next, the CPU 41 receives the data transmitted from the other device and acquires the sound level of the other device from the received data (S25). As a specific example, assume that the sound level of the own device 1 is 50 dB and the sound level transmitted from the other device is 40 dB.
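One way to picture the S20 estimate is the following Python sketch, which computes a level in decibels from the signal power that lies outside a voice band. The source does not give the band limits or the calculation, so everything specific here is an assumption: the function name `ambient_level_db`, the 300 to 3400 Hz band (a conventional telephony voice band), the naive discrete Fourier transform, and the `reference` amplitude. The result is a level relative to `reference`; a true sound pressure level in dB would additionally require microphone calibration.

```python
import cmath
import math


def ambient_level_db(samples, sample_rate_hz,
                     voice_band_hz=(300.0, 3400.0), reference=1.0):
    """Level (dB re `reference`) of the signal power outside the assumed voice band."""
    n = len(samples)
    if n == 0:
        raise ValueError("samples must not be empty")
    low, high = voice_band_hz
    power = 0.0
    for k in range(n):
        # Frequency of DFT bin k; bins above n/2 mirror the lower half for real input.
        freq = min(k, n - k) * sample_rate_hz / n
        if low <= freq <= high:
            continue  # skip the band assumed to contain the user's voice
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        power += abs(coeff) ** 2
    # Parseval's relation: mean square of the signal = sum(|X[k]|^2) / n^2.
    mean_square = power / (n * n)
    if mean_square <= 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square / reference ** 2)
```

The naive transform costs O(n²) and is used only to keep the sketch dependency-free; a practical implementation would use an FFT or a band-stop filter instead.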

The CPU 41 determines whether or not the sound level of the own device 1 is at or below a threshold (S30). The threshold is a value set in advance for determining, based on the sound level of the own device 1 identified in S20, whether or not the environment around the own device 1 is a noise environment. The threshold for S30 may be chosen as appropriate for the index used to express the sound level; it may be a predetermined fixed value or a value set by the user. In the present embodiment, the threshold is preset to a sound pressure level of 60 dB. In the specific example, the CPU 41 determines that the sound level of the own device 1 (50 dB) is at or below the threshold (S30: YES) and then determines whether or not the sound level of the other device is at or below a threshold (S35). The threshold for S35 may likewise be chosen as appropriate for the index used to express the sound level; it may be predetermined or set by the user. The threshold for S35 may be the same as or different from the threshold for S30. In the present embodiment, this threshold is also preset to a sound pressure level of 60 dB. In the specific example, the CPU 41 determines that the sound level of the other device (40 dB) is at or below the threshold (S35: YES) and acquires image data of the own device 1 as another form of environment information (S40). In S40, the CPU 41 acquires the image data generated by the camera 17 as environment information separate from the first sound information acquired in S15. In the specific example, assume that image data representing the captured image 70 shown in FIG. 5 is acquired.

Based on the image data acquired in S40, the CPU 41 determines whether or not the call environment is a noise environment (S45). In the present embodiment, the CPU 41 determines that the call environment is a noise environment when the captured image represented by the image data acquired in S40 contains a registered image. Any suitable known technique may be used to determine whether the captured image contains a registered image. The registered images are represented by the image data stored in the flash memory 47. For example, the determination in S45 proceeds as follows. First, the CPU 41 performs edge detection on the captured image and the registered image. Known edge detection methods can be used, such as taking the first derivative of the image and detecting positions where the gradient is at a local maximum, or taking the second derivative and detecting zero-crossing positions. Next, the CPU 41 extracts feature points (for example, vertex positions) from the detected edges. The CPU 41 then compares the feature points of the captured image with those of the registered image by pattern matching. If the feature points of the captured image contain a pattern that matches the feature points of the registered image, the CPU 41 determines that the captured image contains the registered image.
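The final comparison in the S45 procedure can be illustrated with a deliberately simplified Python sketch. It assumes that feature points have already been extracted as integer (x, y) coordinates and that the registered sign may appear in the captured image shifted but not scaled, rotated, or distorted. The function name `contains_pattern` and the exact-match criterion are assumptions made for illustration; the source only says that known pattern matching is used, and a practical matcher would tolerate noise and geometric changes.

```python
def contains_pattern(captured_points, registered_points):
    """Return True if every registered feature point appears among the captured
    feature points under a single common translation (no scaling or rotation)."""
    captured = set(captured_points)
    registered = list(registered_points)
    if not registered:
        return False  # nothing to match against
    anchor_x, anchor_y = registered[0]
    # Offsets of each registered point relative to the first registered point.
    offsets = [(x - anchor_x, y - anchor_y) for x, y in registered]
    # Try aligning the first registered point with each captured point in turn.
    for cx, cy in captured:
        if all((cx + dx, cy + dy) in captured for dx, dy in offsets):
            return True
    return False
```

For example, the four corners of a rectangular sign registered at the origin would be found in a captured point set that contains the same rectangle shifted elsewhere in the frame, even if other unrelated feature points are present.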

As shown in FIG. 5, the captured image 70 of the specific example includes a sign 71. As shown in FIG. 2, the sign 71 is stored in the flash memory 47 as the registered image 51. The CPU 41 therefore determines that the call environment is a noise environment (S45: YES) and sets the flag to ON (S55). The flag represents the result of determining whether the call environment is a noise environment. A flag that is ON indicates that the call environment has been determined to be a noise environment. The CPU 41 also sets the flag to ON (S55) when the sound level of the own device 1 is greater than the threshold (S30: NO) and when the sound level of the other device is greater than the threshold (S35: NO). On the other hand, when the CPU 41 determines, based on the image data acquired in S40, that the call environment is not a noise environment (S45: NO), it sets the flag to OFF (S50). A flag that is OFF indicates that the call environment has been determined not to be a noise environment. After S50 or S55, the processing returns to the call process of FIG. 3.
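
The branching of S30, S35, and S45 into S50 or S55 can be summarized in a short Python sketch. The function name, the parameter names, and the 60 dB threshold used in the usage example are assumptions made for illustration; the text above does not state the threshold value.

```python
def call_environment_flag(own_level_db, other_level_db, threshold_db,
                          registered_image_found):
    """Return True (flag ON) when the call environment is judged to be noisy."""
    if own_level_db > threshold_db:       # S30: NO, own surroundings are loud
        return True                       # S55
    if other_level_db > threshold_db:     # S35: NO, other device's side is loud
        return True                       # S55
    # S45: the image data decides between S55 (ON) and S50 (OFF)
    return bool(registered_image_found)


# Usage with a hypothetical threshold of 60 dB: both levels are at or below
# the threshold, but the registered sign is found, so the flag is ON.
flag = call_environment_flag(50, 40, 60, registered_image_found=True)
```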

Following S10, the CPU 41 determines whether voice of the other device has been acquired (S60). The CPU 41 acquires sound information from the data received from the other device. When the sound information contains information in the frequency band corresponding to voice, the CPU 41 determines that voice of the other device has been acquired (S60: YES). In this case, when the flag set in the call environment determination process is ON, as in the specific example (S65: YES), the CPU 41 outputs to the mounting device 2 the sound information received from the other device and an instruction to convert the sound information into sound using the bone conduction earphone 33. Based on the instruction received from the control device 3, the CPU 21 converts the sound information received from the other device into sound using the bone conduction earphone 33 and outputs it (S70). Next, the CPU 41 recognizes the voice in the sound information acquired in S60 and converts it into text (S75). Any known speech recognition method may be used as appropriate. For example, the CPU 41 analyzes the sound information and extracts feature quantities. The CPU 41 matches the extracted feature quantities against the acoustic model and the language model stored in the flash memory 47, referring as necessary to the word dictionary stored in the flash memory 47. As a result, a likelihood is obtained for each sentence that the language model can accept, and the sentence with the highest likelihood is obtained as the recognized text. In the specific example, assume that the voice represented by the sound information is converted into the text “Hello, this is ABC of X company.”.
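
The final step of S75, choosing the sentence with the highest likelihood, might look like the following Python sketch. It assumes that each candidate sentence already carries an acoustic-model score and a language-model score as log values and that the two are simply added; the scoring scheme, the function name, and the numbers are illustrative assumptions, not details given in the text.

```python
def best_sentence(candidates):
    """Pick the candidate whose combined log-likelihood is highest.

    candidates maps each sentence to a pair
    (acoustic_log_likelihood, language_log_probability).
    """
    return max(candidates, key=lambda s: candidates[s][0] + candidates[s][1])


# Hypothetical scores for two competing hypotheses.
hypotheses = {
    "Hello, this is ABC of X company.": (-12.0, -3.0),
    "Hello, this is ABC of ex company.": (-11.5, -6.0),
}
recognized = best_sentence(hypotheses)
```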

The CPU 41 generates image data representing the text and transmits to the mounting device 2 the generated image data and an instruction to display a display image based on the image data. Based on the instruction received from the control device 3, the CPU 21 controls the display control unit 29 to display the display image represented by the image data on the display 28 (S80). In the specific example, the process of S80 displays the display image 80 of FIG. 5 on the display 28. When the real image in the user's field of view is the same as the captured image 70 and the display image 80 is displayed on the display 28, the user sees, as shown in FIG. 5, the display image 80 representing the text superimposed on the real image in the user's field of view 85. On the other hand, when the flag set in the call environment determination process is OFF (S65: NO), the CPU 41 outputs to the mounting device 2 the sound information received from the other device and an instruction to output the sound information through the air conduction earphone 32. Based on the instruction received from the control device 3, the CPU 21 converts the sound information received from the other device into sound using the air conduction earphone 32 and outputs it (S85).

When voice has not been acquired from the other device (S60: NO), or after S80 or S85, the CPU 41 determines whether voice of the own device 1 has been acquired (S90). The CPU 41 acquires sound information from the data acquired from the mounting device 2. When the sound information transmitted from the mounting device 2 contains information in the frequency band corresponding to voice, the CPU 41 determines that voice of the own device 1 has been acquired (S90: YES). In this case, when the flag is ON, as in the specific example (S95: YES), the CPU 41 acquires the second sound information generated by the bone conduction microphone 31 (S100) and outputs data including the acquired second sound information to the other device (S105). On the other hand, when the flag is OFF (S95: NO), the CPU 41 acquires from the mounting device 2 the first sound information generated by the air conduction microphone 14 (S110) and outputs data including the acquired first sound information to the other device (S115). When voice of the own device 1 has not been acquired (S90: NO), or after S105 or S115, the processing returns to S60 if there is no instruction to end the call (S120: NO). If there is an instruction to end the call (S120: YES), the call process ends.
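
One pass through S60 to S115 can be sketched in Python as a function that returns the actions to take. The action tuples, the device labels, and the `recognize` callback are hypothetical stand-ins for the instructions exchanged between the control device 3 and the mounting device 2; the sketch only mirrors the branching described above.

```python
def handle_call_iteration(flag_on, remote_voice, local_voice_detected, recognize):
    """Return the list of actions for one pass of the call loop (S60 to S115)."""
    actions = []
    if remote_voice is not None:                                      # S60: YES
        if flag_on:                                                   # S65: YES
            actions.append(("play", "bone_conduction_earphone_33"))   # S70
            actions.append(("display_text", recognize(remote_voice))) # S75, S80
        else:                                                         # S65: NO
            actions.append(("play", "air_conduction_earphone_32"))    # S85
    if local_voice_detected:                                          # S90: YES
        if flag_on:                                                   # S95: YES
            actions.append(("send", "bone_conduction_microphone_31")) # S100, S105
        else:                                                         # S95: NO
            actions.append(("send", "air_conduction_microphone_14"))  # S110, S115
    return actions


# Usage: noisy environment, voice arriving from the other device only.
noisy_actions = handle_call_iteration(True, b"...", False, lambda audio: "Hello")
```

The caller would repeat this until an instruction to end the call arrives (S120).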

In the HMD 1 of the present embodiment, the display device 12 (display 28) corresponds to the display means of the present invention. The wireless communication unit 53 corresponds to the communication means of the present invention. Each of the air conduction earphone 32 and the bone conduction earphone 33 corresponds to the sound output means of the present invention. Each of S30, S35, and S45 in FIG. 4 corresponds to the determination step of the present invention. The CPU 41 that executes each of the processes of S30, S35, and S45 functions as the determination means of the present invention. S75 corresponds to the conversion step of the present invention. The CPU 41 that executes the process of S75 functions as the conversion means of the present invention. S80 corresponds to the display control step of the present invention. The CPU 41 that executes the process of S80 functions as the display control means of the present invention. The CPU 41 that executes each of the processes of S15 and S40 functions as the acquisition means of the present invention. The air conduction microphone 14 corresponds to the first sound information generating means of the present invention. The CPU 41 that executes the process of S115 functions as the first communication control means. The bone conduction microphone 31 corresponds to the second sound information generating means of the present invention. The CPU 41 that executes the process of S105 functions as the second communication control means of the present invention.

According to the HMD 1, even when the call environment is determined to be a noise environment, the user can recognize what the sound information transmitted from the other device conveys by reading the text displayed on the display 28. The HMD 1 determines whether the call environment is a noise environment from three viewpoints. The first viewpoint is whether the environment around the own device 1 is generating noise at the time the processing is executed; the HMD 1 determines whether the environment is a noise environment based on the first sound information, which represents the sound around the own device 1 and is one form of environment information (S30). Because the first sound information generated by the air conduction microphone 14 indicates whether noise is occurring around the own device 1, the determination result is more reliable than one based on information other than the first sound information generated by the air conduction microphone 14. When the sound level around the own device 1 is high, the voice of the user of the other device is expected to be hard to hear during the call process because of the sound around the own device 1. Even when the voice represented by the sound information transmitted from the other device is hard to hear because of the sound around the own device 1, the user of the HMD 1 can recognize what the sound information conveys by reading the text displayed on the display 28.

The second viewpoint is whether the environment around the own device 1 may generate noise; the HMD 1 automatically determines whether the call environment is a noise environment based on image data, which is one form of environment information (S45). Even if no noise is occurring around the own device 1 when the process of determining whether the call environment is a noise environment (the determination process) is executed, noise may occur around the own device 1 during the call process that follows. To allow for such cases, the HMD 1 registers in advance, as a registered image, information indicating an environment in which noise may occur. As a result, the noise environment can include not only environments in which noise is occurring when the determination process is executed but also environments in which noise is predicted to occur during the processing period.

The third viewpoint is whether the environment around the user of the other device may generate noise; the HMD 1 determines whether the call environment is a noise environment based on data transmitted from the other device (S35). When the environment around the user of the other device may generate noise, the sound information transmitted from the other device is likely to contain noise, making the voice of the user of the other device hard to hear. By determining whether the call environment is a noise environment based on data transmitted from the other device, the HMD 1 can determine whether the environment around the user of the other device may generate noise. When the call environment is determined to be a noise environment, what the sound information transmitted from the other device conveys is automatically displayed as text. Therefore, even when the voice of the user of the other device is hard to hear because of noise around that user, the user of the own device 1 can recognize the content of the call by reading the text. Because the HMD 1 automatically acquires the information used to determine, from the above three viewpoints, whether the call environment is a noise environment, the user does not need to input any information.

Furthermore, when the call environment is determined not to be a noise environment, the HMD 1 transmits the first sound information generated by the air conduction microphone 14 to the other device and converts sound information into air-conducted sound using the air conduction earphone 32 for output. When the call environment is determined to be a noise environment, the HMD 1 transmits the second sound information generated by the bone conduction microphone 31 to the other device and converts sound information into bone-conducted sound using the bone conduction earphone 33 for output. When the environment around the own device 1 is relatively noisy, the first sound information generally contains a large amount of information representing ambient noise, so the second sound information represents the voice uttered by the user of the own device 1 more accurately than the first sound information does. When the environment around the user is determined to be a noise environment, the HMD 1 automatically transmits the second sound information to the other device. Therefore, even when the environment around the user of the own device 1 is a noise environment, the user of the own device 1 can increase the likelihood that the user of the other device accurately recognizes his or her voice.

When the surrounding environment is relatively noisy, the user generally finds it easier to recognize the content of speech from bone-conducted sound, which propagates through the cartilage near the inner wall of the ear canal to the auditory nerve, than from air-conducted sound, in which vibration reaches the eardrum through the air. In addition, when sound information is converted into bone-conducted sound, converting the second sound information generally reproduces the user's voice more accurately than converting the first sound information. Accordingly, when the environment around the other device is a noise environment, the user of the other device is presumed to hear bone-conducted sound more easily. When the HMD 1 determines that the environment around the other device is a noise environment, it automatically transmits to the other device the second sound information, which is suited to conversion into bone-conducted sound. Therefore, even when the environment around the user of the other device is determined to be a noise environment, the user of the own device 1 can increase the likelihood of accurately conveying his or her intention by voice to the user of the other device. When the own device 1 executes the call process with another HMD 1, the own device 1 and the other HMD 1 each execute the same call process and, when the call environment is determined to be a noise environment, transmit and receive the second sound information and convert the received second sound information into bone-conducted sound. The users of the own device 1 and the other HMD 1 can therefore each talk with the sound automatically switched to bone-conducted sound, which is less susceptible to noise, even when the call environment is determined to be a noise environment.

The communication device of the present invention is not limited to the HMD 1 of the embodiment described above, and various modifications may be made without departing from the gist of the present invention. For example, the following modifications (A) to (F) may be made as appropriate.

(A) The configuration, shape, design, and the like of the HMD 1 may be changed as appropriate. For example, the display device 12 of the HMD 1 may be attached to another wearable item, such as glasses, a helmet, or headphones, that the user uses daily. The mounting device 2 and the control device 3 may communicate wirelessly instead of by wire. The mounting device 2 may have the functions of the control device 3. The image display unit (not shown) of the display device 12 may be, for example, a spatial modulation element such as a liquid crystal element combined with a light source, an organic EL (electroluminescent) element, or a retinal scanning display unit composed of a two-dimensional optical scanner and a light source. The bone conduction microphone 31, the air conduction earphone 32, and the bone conduction earphone 33 of the HMD 1 are built into the sound information input/output device 13, but each may be provided separately. The HMD 1 includes the air conduction microphone 14 and the bone conduction microphone 31 as sound information generation devices, but may include only one of them. The HMD 1 includes the air conduction earphone 32 and the bone conduction earphone 33 as sound output devices, but may include only one of them. Furthermore, the HMD 1 may execute the call process with the other device using wired communication.

(B) The method of determining whether the call environment is a noise environment may be changed as appropriate. For example, the modifications shown in (B-1) to (B-4) may be made as appropriate.

(B-1) The call environment need only be at least one of the environment around the own device 1 and the environment around the other device. In the embodiment above, whether the call environment is a noise environment is determined from the three viewpoints described, but the determination is not limited to this. For example, whether the call environment is a noise environment may be determined based on one or more viewpoints selected from the three. Alternatively, the HMD 1 may determine that the call environment is a noise environment when the call environment is determined, from at least a predetermined number of viewpoints among a plurality of viewpoints (for example, two or more of the three viewpoints), to be an environment that may generate noise.
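
The "predetermined number of viewpoints" variant of (B-1) amounts to a simple vote. A minimal Python sketch follows, in which the function name and the default of two required viewpoints are assumptions taken from the example in the text.

```python
def noisy_by_vote(viewpoint_results, required=2):
    """True when at least `required` viewpoints judged the environment noisy."""
    return sum(bool(result) for result in viewpoint_results) >= required


# Usage: own sound level says noisy, other device says quiet, image says noisy.
decision = noisy_by_vote([True, False, True])
```

Setting `required=1` reproduces the behavior of the embodiment, where any single viewpoint is enough to set the flag to ON.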

(B-2) The type of environment information and the method of acquiring it may be changed as appropriate. For example, when image data generated by the camera 17 is acquired as environment information, the call environment may be determined to be a noise environment when a pre-registered object or person is detected by analyzing the captured image. More specifically, the call environment may be determined to be a noise environment when, for example, the machine 72 is detected in the captured image 70 shown in FIG. 5. When the environment information is information representing an attribute of the surrounding environment, it may be acquired by RFID (Radio Frequency Identification). Information that the user inputs by operating the operation unit 5 may also be acquired as environment information. When the environment around the own device 1 (or the other device) is known to become a noise environment only during a specific time period, the CPU 41 may acquire the time as environment information and determine whether the call environment is a noise environment based on the acquired time. In this case, whether the call environment is a noise environment can be determined appropriately according to the time of acquisition. When positions are stored in association with information indicating whether each position is a noise environment, the position of the own device 1 may be acquired as environment information from GPS or the like. In this case, the call environment may be determined to be a noise environment when the acquired position of the own device 1 is a position registered in advance in association with a noise environment.
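
The time-based and position-based determinations of (B-2) can be sketched in Python as follows. Several details are assumptions made for illustration: noisy periods are given as half-open time ranges within one day, positions are planar coordinates in metres rather than GPS latitude and longitude, and a position counts as registered when it lies within a given radius of a registered site.

```python
from datetime import time
from math import hypot


def noisy_by_time(now, noisy_periods):
    """True when `now` falls inside any registered (start, end) period."""
    return any(start <= now < end for start, end in noisy_periods)


def noisy_by_position(position, noisy_sites, radius_m):
    """True when `position` is within radius_m of any registered noisy site."""
    x, y = position
    return any(hypot(x - sx, y - sy) <= radius_m for sx, sy in noisy_sites)


# Usage with hypothetical registrations.
periods = [(time(9, 0), time(12, 0))]
sites = [(0.0, 0.0)]
noisy_now = noisy_by_time(time(10, 30), periods)
noisy_here = noisy_by_position((3.0, 4.0), sites, radius_m=5.0)
```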

As another example, the HMD 1 may determine whether the call environment is a noise environment based on the volume of the sound output device (the air conduction earphone 32 or the bone conduction earphone 33) of the HMD 1. More specifically, when the volume of the sound output device is set relatively high, the voice of the user of the other device is presumably hard to hear because the call environment is a noise environment. The HMD 1 may therefore acquire the volume setting from the own device 1 or the other device and determine that the call environment is a noise environment when the volume of the sound output device is relatively high (for example, level 9 or higher when the volume is set in ten levels from 1 to 10 in ascending order of loudness). Information about content that is being used on the own device 1, or that was used within a predetermined time (for example, 10 minutes), may also be acquired as environment information. More specifically, some content usable on the HMD 1 (for example, images and audio) is used in a specific environment. For example, among the content displayed on the display 28 of the HMD 1, a manual explaining how to operate a specific machine is likely to be used in a work environment, such as a factory, where that machine is installed. When the HMD 1 stores the usage environment of content (for example, a factory or a research facility) as supplementary information of the content, the HMD 1 may use the supplementary information as environment information. In this case, for example, the call process may be executed while the process of playing back the content is executed. A user of the own device who wants to know how to operate a specific machine can then play back and check the manual explaining the operation on the HMD 1 while asking the user of the other device about unclear points in the manual. If information about the content in use and image data representing the surroundings of the own device are also transmitted to the other device, communication between the user of the own device and the user of the other device can proceed smoothly. Besides information representing the usage environment of the content, the information about the content may include, for example, information representing the substance of the content (for example, a manual for operating a machine) and information about the intended users of the content (for example, factory workers and laboratory researchers).
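
The volume-based and content-based determinations can be sketched in Python as follows. The level 9 cutoff on a 1 to 10 scale comes from the example in the text; the function names and the set of usage environments treated as noisy are hypothetical.

```python
def noisy_by_volume(volume_level, min_noisy_level=9):
    """True when the output volume (1 to 10) is at or above the cutoff."""
    if not 1 <= volume_level <= 10:
        raise ValueError("volume_level must be between 1 and 10")
    return volume_level >= min_noisy_level


def noisy_by_content(usage_environment, noisy_environments):
    """True when the content's usage environment is registered as noisy."""
    return usage_environment in noisy_environments


# Usage: a machine manual tagged with the usage environment "factory".
loud = noisy_by_volume(9)
in_factory = noisy_by_content("factory", {"factory"})
```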

(B-3) The index representing the sound level used to determine whether the call environment is a noise environment, and the method of acquiring that index, may be changed as appropriate. The threshold used for the determination may also be changed as appropriate according to the index representing the sound level. For example, in the process of S20, the CPU 41 may estimate the sound pressure level based on the first sound information acquired in S15 without taking into account the influence of the voice uttered by the user. In this case, the sound information is preferably acquired at a time when the user is presumed not to be speaking. When the HMD 1 includes a sound level meter, the sound level may be represented by the sound pressure level (dB) acquired from the sound level meter. The sound level may also be represented by the S/N ratio (signal-to-noise ratio), which expresses the ratio of S (signal) to N (noise) contained in the first sound information as a logarithm. In this case, for example, the HMD 1 may treat vibration at frequencies representing voice as S (signal). The threshold may be a predetermined value or a value set by the user. When the threshold is a predetermined value, a specific threshold may be selected from a plurality of thresholds according to conditions such as the user, the time period of use, and the attribute of the surrounding environment (factory, laboratory, and so on).
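
As an illustration of the S/N ratio index and of selecting a threshold by condition, the following Python sketch uses the common power-ratio definition of the S/N ratio in decibels, 10 log10(S/N). Treating a low S/N ratio as noisy, the per-environment threshold values, and all names are assumptions made for this sketch; the text does not specify them.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from signal and noise power."""
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical thresholds selected by the attribute of the surrounding environment.
SNR_THRESHOLDS_DB = {"factory": 15.0, "laboratory": 10.0}


def noisy_by_snr(signal_power, noise_power, environment):
    """True when the S/N ratio falls below the threshold for the environment."""
    return snr_db(signal_power, noise_power) < SNR_THRESHOLDS_DB[environment]
```

Note that the comparison direction is the opposite of the sound pressure level case: a larger sound pressure level suggests more noise, whereas a smaller S/N ratio does.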

(B-4) The method of determining whether the call environment is a noise environment based on data received from the other device may be changed as appropriate. For example, the own device 1 may identify the sound level represented by the sound information included in the data. Alternatively, the other device may determine whether the environment around it is a noise environment and transmit the determination result to the own device 1. In this case, the own device 1 may determine whether the call environment is a noise environment based on the determination result received from the other device. When the data transmitted from the other device includes information representing the environment around the other device, whether the call environment is a noise environment may be determined based on that information. The information from the other device may be of the same type as any of the environment information of the own device described above.
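
The alternatives in (B-4) can be combined in a small Python sketch that inspects the data received from the other device. The dictionary layout and the field names `is_noise_environment` and `sound_level_db` are hypothetical; the text does not define a data format.

```python
def noisy_from_remote(data, threshold_db):
    """Judge the other device's environment from received data (hypothetical fields)."""
    if "is_noise_environment" in data:
        # The other device already made the determination and sent the result.
        return bool(data["is_noise_environment"])
    if "sound_level_db" in data:
        # The own device compares the reported sound level with its threshold.
        return data["sound_level_db"] > threshold_db
    # No usable information: treat the environment as not noisy.
    return False
```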

(C) In the embodiment, the HMD 1 switches between the air-conduction microphone 14 and the bone-conduction microphone 31 depending on whether the call environment is a noise environment, but the invention is not limited to this. For example, when the own device 1 includes only one of the air-conduction microphone 14 and the bone-conduction microphone 31, the HMD 1 need not switch the sound information input device according to whether the call environment is a noise environment. Similarly, the HMD 1 switches between the air-conduction earphone 32 and the bone-conduction earphone 33 depending on whether the call environment is a noise environment, but the invention is not limited to this. For example, when the own device 1 includes only one of the air-conduction earphone 32 and the bone-conduction earphone 33, the HMD 1 need not switch the sound information output device according to whether the call environment is a noise environment. Even with any of these modifications, when the call environment is determined to be a noise environment, the user can recognize the content represented by the sound information transmitted from the other device by looking at the text displayed on the display 28.
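The switching behavior and the single-device modification in (C) can be sketched as follows. The function name and the string labels are hypothetical. Choosing the bone-conduction device in a noise environment is an assumption here, consistent with claim 4 for the microphone but not stated in this section for the earphones.

```python
def select_device(noisy, air_conduction=None, bone_conduction=None):
    """Pick the sound input or output device to use for the call.

    When only one device is present, it is used regardless of `noisy`,
    which corresponds to the modification without switching.
    """
    available = [d for d in (air_conduction, bone_conduction) if d is not None]
    if not available:
        raise ValueError("at least one device is required")
    if len(available) == 1:
        return available[0]
    # Assumption: bone conduction is preferred in a noise environment.
    return bone_conduction if noisy else air_conduction
```

For example, `select_device(True, "microphone 14", "microphone 31")` returns `"microphone 31"`, while `select_device(True, air_conduction="earphone 32")` returns `"earphone 32"` because no alternative exists.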

(D) The arrangement, font, size, and the like of the text in the image representing the text may be changed as appropriate. The HM 16 may be configured not to transmit light from the real image of the outside world, so that the user sees only the image displayed on the display 28 in his or her field of view.

(E) The program including instructions for executing the call process need only be stored in a storage device of the HMD 1 by the time the HMD 1 executes it. The method of acquiring the program, the acquisition route, and the device that stores the program may each be changed as appropriate. Accordingly, the program executed by the CPU 41 may be received from another device via a communication cable or wireless communication and stored in a storage device such as the flash memory 47. The other device includes, for example, a PC and a server connected via a network. Similarly, the program executed by the CPU 21 may be received from another device via a communication cable or wireless communication and stored in a storage device such as the RAM 25 or a nonvolatile medium. The other device in this case includes, for example, the control device 3, a PC, and a server connected via a network.

(F) The steps of the communication process shown in FIGS. 3 and 4 are not limited to being executed by the CPU 41; some or all of them may be executed by another electronic device (for example, an ASIC). The steps may also be processed in a distributed manner by a plurality of electronic devices (for example, a plurality of CPUs). The order of the steps shown in FIGS. 3 and 4 may be changed, and steps may be omitted or added, as necessary. For example, in the processing of S100 and S110 of the call process in FIG. 3, the HMD 1 may convert the voice of the user of the own device 1 input in S100 into text and display an image representing the generated text on the display 28. This allows the user of the own device 1 to check, by viewing the image on the display 28, how his or her voice is converted into text at the other device. As another example, when the call environment is determined to be a noise environment (S95: YES), both the first sound information acquired from the air-conduction microphone 14 and the second sound information acquired from the bone-conduction microphone 31 may be transmitted to the other device. In that case, the other device can select, from the first and second sound information, whichever makes the voice easier to hear and convert it into sound. Furthermore, a case in which an operating system (OS) or the like running on the HMD 1 performs part or all of the actual processing based on instructions from the CPU of the HMD 1, and the functions of the above embodiment are realized by that processing, is also included within the scope of the present invention.
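The selection between the first and second sound information at the other device could look like the Python sketch below. The specification does not say how "easier to hear" is judged, so the caller-supplied `clarity` function and the `snr_db` field in the usage example are purely hypothetical.

```python
def choose_clearer(first, second, clarity):
    """Return whichever sound information scores higher under `clarity`.

    `clarity` is a caller-supplied scoring function (an assumption of this
    sketch). On a tie, the first sound information is kept.
    """
    return second if clarity(second) > clarity(first) else first


# Usage with a hypothetical per-stream S/N estimate in dB.
first_info = {"source": "air-conduction microphone 14", "snr_db": 3.0}
second_info = {"source": "bone-conduction microphone 31", "snr_db": 12.0}
selected = choose_clearer(first_info, second_info, lambda info: info["snr_db"])
```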

DESCRIPTION OF SYMBOLS
1 Head mounted display
2 Mounting device
3 Control device
12 Display device
13 Sound information input/output device
14 Air-conduction microphone
21, 41 CPU
23, 43 ROM
25, 45 RAM
17 Camera
31 Bone-conduction microphone
32 Air-conduction earphone
33 Bone-conduction earphone
34 First wired communication unit
46 Second wired communication unit
47 Flash memory
53 Wireless communication unit

Claims

1. A communication apparatus comprising:
display means configured to be worn on a user's head and to output an image;
communication means for transmitting and receiving data including sound information to and from another device;
sound output means for converting the sound information received by the communication means into sound and outputting the sound;
determination means for determining whether a call environment is a noise environment, which is an environment in which noise may occur;
conversion means for converting the voice represented by the sound information received by the communication means into text when the determination means determines that the call environment is the noise environment; and
display control means for causing the display means to display an image representing the text generated by the conversion means.

2. The communication apparatus according to claim 1, further comprising acquisition means for acquiring environment information representing an environment around the user,
wherein the determination means determines whether the call environment is the noise environment based at least on the environment information acquired by the acquisition means.

3. The communication apparatus according to claim 2, further comprising:
first sound information generating means for converting vibration of sound transmitted through the air into first sound information; and
first communication control means for transmitting the first sound information generated by the first sound information generating means to the other device via the communication means,
wherein the acquisition means acquires the first sound information generated by the first sound information generating means as at least a part of the environment information, and
the determination means determines whether the call environment is the noise environment based at least on a result of comparing the first sound information generated by the first sound information generating means with a threshold value.

4. The communication apparatus according to claim 1, further comprising:
second sound information generating means for collecting bone-conducted sound of the user and converting it into second sound information; and
second communication control means for transmitting, when the determination means determines that the call environment is the noise environment, the second sound information generated by the second sound information generating means to the other device via the communication means.

5. The communication apparatus according to claim 1, wherein the determination means determines whether the call environment is the noise environment based at least on the data received by the communication means.

6. A communication method executed by a communication apparatus including display means configured to be worn on a user's head and to output an image, communication means for transmitting and receiving data including sound information to and from another device, and sound output means for converting the sound information received by the communication means into vibration and outputting the vibration, the method comprising:
a determination step of determining whether a call environment is a noise environment, which is an environment in which noise may occur;
a conversion step of converting the voice represented by the sound information received by the communication means into text when the call environment is determined to be the noise environment in the determination step; and
a display control step of causing the display means to display an image representing the text generated in the conversion step.

7. A communication program executed by a computer of a communication apparatus including display means configured to be worn on a user's head and to output an image, communication means for transmitting and receiving data including sound information to and from another device, and sound output means for converting the sound information received by the communication means into vibration and outputting the vibration, the program causing the computer to execute:
a determination step of determining whether a call environment is a noise environment, which is an environment in which noise may occur;
a conversion step of converting the voice represented by the sound information received by the communication means into text when the call environment is determined to be the noise environment in the determination step; and
a display control step of causing the display means to display an image representing the text generated in the conversion step.