JP5366043B2

JP5366043B2 - Audio recording / playback device

Info

Publication number: JP5366043B2
Application number: JP2008294524A
Authority: JP
Inventors: 朋子米澤; 大丈山添
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2008-11-18
Filing date: 2008-11-18
Publication date: 2013-12-11
Anticipated expiration: 2028-11-18
Also published as: JP2010122369A

Abstract

PROBLEM TO BE SOLVED: To provide an audio playback device that links the bodily sensation of the playback operation to the content of the audio information being played back, so that the content is easier to remember and easier to incorporate into the user's thinking.

SOLUTION: An audio playback device plays back audio information recorded in a recording means (103) in association with direction information indicating a direction. The device includes a first direction specifying means (14, 100, S5) for specifying the direction that the user indicates with the hands or body when audio information is to be played back, and a playback means (13, 100, S15) for playing back the audio information recorded in the recording means in association with direction information indicating the direction that corresponds to the direction specified by the first direction specifying means. Because the bodily sensation of the playback operation is linked to the content of the audio information, the content is easier to remember and easier to incorporate into the user's thinking.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to an audio recording and playback device, and more particularly to an audio recording and playback device that records audio information in association with information indicating a direction and plays back the recorded audio information.

Devices known as IC recorders (voice recorders) play back audio information recorded in a recording means. Various refinements of such IC recorders have been proposed, for example in Patent Document 1, Patent Document 2, and Patent Document 3.
Patent Document 1: JP 2007-325392 A [H02J 7/00, G10L 19/00, H01M 10/48]
Patent Document 2: JP 2005-292693 A [G10L 15/00, G10L 15/22, G10L 15/24, G10L 15/28]
Patent Document 3: JP 2005-26133 A [H01H 15/16, G10L 19/00, H01H 3/52]

However, conventional audio playback devices such as IC recorders have not been refined with respect to the manner of playback. A conventional IC recorder shows the serial number and the recording date and time of each item of audio information on a small liquid crystal screen, and the user relies on these to locate and play back the desired item. Whichever item is played back, the bodily sensation of the playback operation is the same. It has therefore not been possible to link the content of the audio information to the bodily sensation of the playback operation so as to make the content easier to remember or easier to incorporate into the user's thinking.

Therefore, a main object of the present invention is to provide a novel audio recording and playback device.

Another object of the present invention is to provide an audio recording and playback device that links the bodily sensation of the playback operation to the content of the audio information being played back, making the content easier to remember and easier to incorporate into the user's thinking.

To solve the above problems, the present invention adopts the following configurations. The reference numerals and supplementary explanations in parentheses indicate correspondence with the embodiments described later in order to aid understanding of the present invention, and do not limit the present invention in any way.

A first invention is an audio recording and playback device that records audio information in a recording device and plays back audio information from the recording device, comprising: audio information generating means for generating audio information based on a voice uttered by a user; recording-time direction specifying means for specifying a recording-time direction that the user indicates with the body at the time of recording; recording means for recording the audio information generated by the audio information generating means in the recording device in association with direction information indicating the recording-time direction; playback-time direction specifying means for specifying a playback-time direction that the user indicates with the body at the time of playback; and first audio playback means for playing back, from the recording device, the audio information recorded in the recording device in association with direction information indicating the recording-time direction that corresponds to the playback-time direction.

In the first invention, the audio information generating means (12, S3, S35, S83, S133) generates audio information based on the voice uttered by the user, the recording-time direction specifying means (14, 100, S5, S37, S85, S135) specifies the direction that the user indicates with the body when the audio information is recorded, and the recording means records the audio information generated by the audio information generating means in the recording device (103) in association with direction information indicating the recording-time direction specified by the recording-time direction specifying means. The playback-time direction specifying means (14, 100, S11, S63) then specifies the direction that the user indicates with the body when audio information is to be played back, and the first audio playback means (100, S15, S67) plays back, from the recording device, the audio information recorded in association with direction information indicating the direction that corresponds to the playback-time direction specified by the playback-time direction specifying means.

According to the first invention, audio information is recorded in association with the direction that the user indicates with the body. To play back that audio information, the user indicates with the body the same direction as at the time of recording. The user thus re-experiences at playback the bodily sensation felt at recording, which makes it easier to recall the thinking at the time of recording, to remember the content of the audio information, and to incorporate that content into the user's thinking. Furthermore, because the device plays back the audio information associated with direction information that corresponds to the direction the user indicates with the body, the bodily sensation of the playback operation is linked to the content of the audio information, which likewise makes the content easier to remember and easier to incorporate into the user's thinking.
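The record-by-direction and play-back-by-direction behavior described for the first invention can be sketched as follows. This is a minimal illustration, not the patented implementation: the class and function names (`VoiceNote`, `DirectionalRecorder`, `angular_distance_deg`), the yaw and pitch representation of a direction, and the 15 degree matching tolerance are all assumptions introduced for this example. The text only requires that the playback-time direction "correspond" to the recording-time direction and does not define how.

```python
import math
from dataclasses import dataclass, field


@dataclass
class VoiceNote:
    audio: bytes      # recorded voice data (placeholder payload)
    yaw_deg: float    # horizontal direction indicated at recording time
    pitch_deg: float  # vertical direction indicated at recording time


def angular_distance_deg(yaw1, pitch1, yaw2, pitch2):
    """Great-circle angle between two directions, in degrees."""
    y1, p1, y2, p2 = map(math.radians, (yaw1, pitch1, yaw2, pitch2))
    cos_angle = (math.sin(p1) * math.sin(p2)
                 + math.cos(p1) * math.cos(p2) * math.cos(y1 - y2))
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


@dataclass
class DirectionalRecorder:
    tolerance_deg: float = 15.0  # assumed tolerance for "corresponding" directions
    notes: list = field(default_factory=list)

    def record(self, audio, yaw_deg, pitch_deg):
        """Store audio together with the direction indicated at recording time."""
        self.notes.append(VoiceNote(audio, yaw_deg, pitch_deg))

    def find_for_playback(self, yaw_deg, pitch_deg):
        """Return notes whose recording direction lies within the tolerance."""
        return [
            n for n in self.notes
            if angular_distance_deg(n.yaw_deg, n.pitch_deg,
                                    yaw_deg, pitch_deg) <= self.tolerance_deg
        ]
```

A fixed angular tolerance is only one possible reading of "corresponds"; a nearest-neighbor lookup or a division of the space into sectors would serve equally well.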

A second invention is dependent on the first invention, wherein the first audio playback means plays back the audio information such that its sound source can be localized, based on the direction information and on distance information indicating the distance from the user to the audio information.

In the second invention, the first audio playback means plays back the audio information such that its sound source can be localized, based on the direction information and on distance information indicating the distance from the user to the audio information.

According to the second invention, the audio information is played back such that its sound source can be localized, so the user can identify the direction in which the audio information is located.

A third invention is dependent on the first or second invention, and further comprises: audio icon recording means for recording, in the recording device, an audio icon that symbolically indicates the presence of audio information, in association with that audio information and its direction information; and audio icon playback means for playing back the audio icon recorded in the recording device in association with audio information whose direction information indicates a recording-time direction that does not correspond to the playback-time direction.

According to the third invention, the audio icon is played back, so the user can easily tell that audio information is also recorded in recording-time directions that do not correspond to the user's current playback-time direction.

A fourth invention is dependent on the third invention, wherein the audio icon playback means generates the audio icon such that its sound source can be localized, based on the direction information and on distance information indicating the distance from the user to the audio information.

In the fourth invention, for audio information recorded in the recording device in association with direction information indicating a recording-time direction that does not correspond to the playback-time direction specified by the playback-time direction specifying means, the audio icon playback means generates the audio icon indicating the presence of that audio information such that its sound source can be localized, based on the direction information and the distance information.

According to the fourth invention, for audio information located in recording-time directions other than the playback-time direction indicated by the user, the audio icon indicating its presence is played back such that its sound source can be localized, so the user can grasp at once the presence of all the audio information recorded in the recording device.
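As one illustration of playback that allows sound source localization from direction and distance information, the sketch below computes left and right channel gains using constant-power panning with inverse-distance attenuation. The text does not specify a localization method, so this technique is a stand-in chosen for brevity. The function name `stereo_gains`, the clamping to the frontal half plane, and the reference distance are assumptions made for this example only.

```python
import math


def stereo_gains(note_yaw_deg, user_yaw_deg, distance_m, ref_distance_m=1.0):
    """Return (left_gain, right_gain) for a note relative to where the user faces."""
    # Relative azimuth wrapped to [-180, 180); positive means to the user's right.
    rel = (note_yaw_deg - user_yaw_deg + 180.0) % 360.0 - 180.0
    # Assumption: sources behind the user are folded onto the nearest side.
    rel = max(-90.0, min(90.0, rel))
    pan = (rel + 90.0) / 180.0  # 0.0 = full left, 1.0 = full right
    # Inverse-distance attenuation, never amplifying above the reference level.
    attenuation = ref_distance_m / max(distance_m, ref_distance_m)
    left = math.cos(pan * math.pi / 2.0) * attenuation
    right = math.sin(pan * math.pi / 2.0) * attenuation
    return left, right
```

The same gains could be applied to an audio icon for every note outside the current playback-time direction, so that the user hears roughly where the other notes lie. A head-related transfer function would give more convincing localization than simple panning.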

A fifth invention is dependent on any one of the first to fourth inventions, wherein the recording means records the audio information in two or more layers of different depths, further in association with layer identification information indicating the layer; the device further comprises layer designation receiving means for receiving designation of a layer; and the first audio playback means plays back, from among the audio information recorded in the recording device in association with direction information indicating the recording-time direction that corresponds to the playback-time direction, the audio information associated with the layer identification information indicating the layer received by the layer designation receiving means.

In the fifth invention, the recording means records the audio information further in association with layer identification information identifying a layer. The layer designation receiving means (100, S23, S55) receives designation of a layer, and the first audio playback means plays back, from among the audio information recorded in the recording device in association with direction information indicating the recording-time direction that corresponds to the playback-time direction specified by the playback-time direction specifying means, the audio information associated with the layer identification information indicating the layer received by the layer designation receiving means. In other words, the first audio playback means plays back audio information from the designated layer.

According to the fifth invention, audio information can be assigned to layers and distinguished layer by layer.

A sixth invention is dependent on the fifth invention, wherein the distance from the user to the audio information is determined based on the depth of the layer indicated by the layer identification information.

In the sixth invention, the distance to the audio information is determined based on the depth of the layer indicated by the layer identification information.

According to the sixth invention, the distance to the audio information can be determined by the depth of the layer.
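The layer handling of the fifth and sixth inventions can be illustrated with a short sketch. The text states only that distance is determined from layer depth; the linear mapping below, the integer layer identifiers, and the names `LayeredNote`, `layer_distance_m`, and `notes_in_layer` are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LayeredNote:
    name: str
    yaw_deg: float
    layer: int  # layer identification information; larger value = deeper layer


def layer_distance_m(layer, nearest_m=1.0, step_m=0.5):
    """Assumed linear mapping from layer depth to virtual distance in meters."""
    return nearest_m + layer * step_m


def notes_in_layer(notes, layer):
    """Keep only the notes recorded in the designated layer."""
    return [n for n in notes if n.layer == layer]
```

The distance returned by `layer_distance_m` could then feed whatever localization method is used for playback, so that notes in deeper layers sound farther away.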

A seventh invention is dependent on any one of the first to sixth inventions, wherein the playback-time direction specifying means specifies any one of the direction indicated by the orientation of the user's face, the direction indicated by the user's line of sight, and the direction in which the user points a finger.

In the seventh invention, the playback-time direction specifying means specifies, as the playback-time direction, any one of the direction that the user indicates by face orientation, the direction indicated by line of sight, and the direction indicated by pointing.

According to the seventh invention, the user can designate the audio information to be played back by face orientation, line of sight, or pointing.

An eighth invention is dependent on any one of the first to seventh inventions, wherein the recording-time direction specifying means specifies any one of the direction indicated by the orientation of the user's face, the direction indicated by the user's line of sight, and the direction in which the user points a finger.

In the eighth invention, the recording-time direction specifying means specifies, as the recording-time direction, any one of the direction that the user indicates by face orientation, the direction indicated by line of sight, and the direction indicated by pointing.

According to the eighth invention, the user can designate the direction in the virtual space to which the audio information being recorded is attached by face orientation, line of sight, or pointing.

A ninth invention is an audio recording and playback program executed by a computer that records audio information in a recording device and plays back audio information from the recording device, the program causing the computer to function as: audio information generating means for generating audio information based on a voice uttered by a user; recording-time direction specifying means for specifying a recording-time direction that the user indicates with the body at the time of recording; recording means for recording the audio information generated by the audio information generating means in the recording device in association with direction information indicating the recording-time direction; playback-time direction specifying means for specifying a playback-time direction that the user indicates with the body at the time of playback; and audio playback means for playing back, from the recording device, the audio information recorded in the recording device in association with direction information indicating the recording-time direction that corresponds to the playback-time direction.

A tenth invention is an audio recording and playback method executed by a computer that records audio information in a recording device and plays back audio information from the recording device, the method including: an audio information generating step of generating audio information based on a voice uttered by a user; a recording-time direction specifying step of specifying a recording-time direction that the user indicates with the body at the time of recording; a recording step of recording the audio information generated in the audio information generating step in the recording device in association with direction information indicating the recording-time direction; a playback-time direction specifying step of specifying a playback-time direction that the user indicates with the body at the time of playback; and an audio playback step of playing back, from the recording device, the audio information recorded in the recording device in association with direction information indicating the recording-time direction that corresponds to the playback-time direction.

An eleventh invention is dependent on any one of the first to eighth inventions, and further comprises imaging means for capturing a surrounding image of the user at the time of recording, and conversion means for converting the recording-time direction into a position on the surrounding image captured by the imaging means, wherein the recording means records the audio information in association with the surrounding image and the position information.

In the eleventh invention, the imaging means (17) captures a surrounding image of the user when the audio information is recorded, and the conversion means (100, S99, S155) converts the direction specified by the recording-time direction specifying means into a position on the surrounding image captured by the imaging means. The recording means then records the audio information generated by the audio information generating means in association with the surrounding image captured by the imaging means and position information indicating the position obtained by the conversion means.

According to the eleventh invention, audio information can be recorded in association with a position on the surrounding image.
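The conversion from a recording-time direction to a position on the surrounding (peripheral) image might look like the sketch below. It assumes a camera whose optical axis defines yaw 0 and pitch 0, and a simple equiangular mapping in which angle varies linearly across the image. Neither assumption comes from the text, and the function name `direction_to_pixel` and the default field-of-view values are likewise hypothetical.

```python
def direction_to_pixel(yaw_deg, pitch_deg, width, height,
                       hfov_deg=90.0, vfov_deg=60.0):
    """Map a direction relative to the camera axis to a pixel (x, y).

    Returns None when the direction falls outside the assumed field of view.
    """
    if abs(yaw_deg) > hfov_deg / 2.0 or abs(pitch_deg) > vfov_deg / 2.0:
        return None
    # Yaw grows to the right; pitch grows upward while image y grows downward.
    x = (yaw_deg / hfov_deg + 0.5) * (width - 1)
    y = (0.5 - pitch_deg / vfov_deg) * (height - 1)
    return round(x), round(y)
```

A real camera would more likely need a pinhole or fisheye projection model with calibrated intrinsics; the linear mapping is used here only to keep the idea visible.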

A twelfth invention is dependent on the eleventh invention, and further comprises first environment change determining means for determining whether the user's environment has changed by comparing the surrounding image captured by the imaging means with the surrounding image associated with the audio information most recently recorded in the recording device, wherein the recording means records the audio information in association with the surrounding image captured by the imaging means when the first environment change determining means determines that the environment has changed, and records the audio information in association with the surrounding image associated with the most recently recorded audio information when the first environment change determining means determines that the environment has not changed.

In the twelfth invention, the first environment change determining means (100, S91) compares the surrounding image captured by the imaging means with the surrounding image associated with the audio information most recently recorded in the recording device, and determines whether the environment around the user has changed. When the first environment change determining means determines that the environment has changed, the recording means records the audio information generated by the audio information generating means in association with the surrounding image captured by the imaging means. When it determines that the environment has not changed, the recording means records the audio information in association with the surrounding image associated with the most recently recorded audio information.

According to the twelfth invention, when the environment around the user has changed, audio information can be attached to a surrounding image of the changed environment.

A thirteenth invention is dependent on the eleventh invention, and further comprises elapsed time determining means for determining whether a predetermined time has elapsed since audio information was last recorded in the recording device, and second environment change determining means for determining whether the user's environment has changed by comparing the surrounding image captured by the imaging means with the surrounding image associated with the audio information most recently recorded in the recording device, wherein, based on the determination result of at least one of the elapsed time determining means and the second environment change determining means, the recording means records the audio information either in association with the surrounding image captured by the imaging means or in association with the surrounding image associated with the most recently recorded audio information.

In the thirteenth invention, the elapsed time determining means (100, S14, S145) determines whether a predetermined time has elapsed since audio information was last recorded in the recording device, and the second environment change determining means (100, S119) compares the surrounding image captured by the imaging means with the surrounding image associated with the most recently recorded audio information and determines whether the environment around the user has changed. Based on the determination result of at least one of the elapsed time determining means and the second environment change determining means, the recording means then records the audio information generated by the audio information generating means either in association with the surrounding image captured by the imaging means or in association with the surrounding image associated with the audio information most recently recorded in the recording device.

According to the thirteenth invention, whether to update the surrounding image to which audio information is attached can be decided based on at least one of the time elapsed since audio information was last recorded and a change in the environment around the user.
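The decision described for the twelfth and thirteenth inventions, whether to attach a new note to a freshly captured image or to the image used for the previous note, can be sketched as follows. The text does not say how images are compared or how the two determinations are combined. This example assumes a mean absolute pixel difference on small grayscale images and an "either condition triggers an update" policy; the names `mean_abs_diff` and `should_use_new_image` and both threshold values are hypothetical.

```python
def mean_abs_diff(img_a, img_b):
    """Mean absolute difference of two equally sized grayscale images (lists of rows)."""
    total = 0
    count = 0
    for row_a, row_b in zip(img_a, img_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def should_use_new_image(new_img, last_img, seconds_since_last,
                         max_gap_s=600.0, diff_threshold=20.0):
    """Return True if the note should be attached to the newly captured image."""
    if last_img is None:
        return True  # nothing recorded yet, so the new image must be used
    if seconds_since_last >= max_gap_s:
        return True  # assumed policy: a long gap alone forces an update
    # Otherwise update only when the surroundings look sufficiently different.
    return mean_abs_diff(new_img, last_img) > diff_threshold
```

When the function returns False, the note would be stored against the image already associated with the most recently recorded note, which keeps notes made in the same setting grouped on one image.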

A fourteenth invention is dependent on any one of the tenth to thirteenth inventions, and further comprises: display means for displaying a surrounding image recorded in the recording device and for displaying an icon representing audio information recorded in association with that surrounding image at a position on the surrounding image based on the position information of the audio information; icon designation means for receiving designation of an icon displayed on the display means; and second audio playback means for playing back the audio information recorded in the recording device that is represented by the icon whose designation was received by the icon designation means.

In the fourteenth invention, the display means (106, S101, S103, S107, S109) displays a surrounding image recorded in the recording device and displays an icon representing audio information recorded in association with that surrounding image at a position on the surrounding image based on the position information of the audio information. The icon designation means (105, S111) receives designation of an icon displayed on the display means, and the second playback means (100, S113) plays back the audio information recorded in the recording device that is represented by the designated icon.

According to the fourteenth invention, designating an icon displayed on the surrounding image plays back the audio information corresponding to that icon.

A fifteenth invention is dependent on the fourteenth invention, wherein the display means displays a plurality of surrounding images recorded in the recording device; the device further comprises image designation means for receiving designation of a surrounding image displayed on the display means; and the icon designation means receives designation of an icon displayed on the surrounding image whose designation was received by the image designation means.

In the fifteenth invention, the display means displays a plurality of surrounding images recorded in the recording device, the image designation means (105, S105) receives designation of a surrounding image displayed on the display means, and the icon designation means receives designation of an icon displayed on the surrounding image whose designation was received by the image designation means.

According to the fifteenth invention, the user can designate a surrounding image, designate an icon displayed on that image, and play back the audio information corresponding to the icon.

The sixteenth invention is dependent on the fourteenth or fifteenth invention and is an audio recording/reproducing apparatus further comprising position designation means for accepting designation of a new position for an icon on the peripheral image displayed on the display means, and changing means for changing, based on the new position accepted by the position designation means, the position information recorded in the recording device in association with the audio information of that icon.

In the sixteenth invention, the position designation means (105, S115) accepts designation of a new position for an icon on the peripheral image displayed on the display means, and the changing means (100, S117) changes, based on the new position accepted by the position designation means, the position information recorded in the recording device in association with the audio information of that icon.

According to the sixteenth invention, the position at which audio information is attached to the peripheral image, that is, the position at which the icon is displayed on the peripheral image, can be changed.

According to this invention, audio information is recorded in association with a direction that the user indicates using the body. To reproduce the audio information, the user therefore indicates with the body the same direction as at the time of recording, and so re-experiences the bodily sensation of recording. This makes it easier to recall the thinking that accompanied the recording, to retain the content of the audio information in memory, and to incorporate that content into one's thinking.

The above object, other objects, features, and advantages of this invention will become more apparent from the following detailed description of embodiments, given with reference to the drawings.

<First Embodiment>
Referring to FIG. 1, the audio recording/reproducing apparatus 10 of this invention comprises a personal digital assistant (PDA) 11 for personal use, a bone conduction microphone 12, stereo headphones 13 (for example, of the ear-hook type), and a three-axis geomagnetic sensor 14 built into the stereo headphones 13. The bone conduction microphone 12, the stereo headphones 13, and the three-axis geomagnetic sensor 14 are each connected to the personal digital assistant (hereinafter "PDA 11").

The bone conduction microphone 12 detects the voice of the user wearing it directly, as a signal transmitted to the wall of the inner ear canal, so it can pick up the user's voice even in noisy surroundings. If ambient noise is not a concern, an ordinary microphone may be used in place of the bone conduction microphone 12.

As the three-axis geomagnetic sensor 14, the 3D sensor module TDS01V manufactured by Vitec can be used. The TDS01V also includes a three-axis acceleration sensor in addition to the three-axis geomagnetic sensor 14.

The PDA 11 is carried, for example, in a pocket 15 of the user's clothing. As shown in FIG. 2, the PDA 11 comprises a CPU 100, a ROM 101, a RAM 102, a flash memory 103, a clock circuit 104, a touch panel 105, a display 106, an audio input I/F 107, an audio output I/F 108, and a geomagnetic sensor signal input I/F 109.

The PDA 11 is operated by performing input operations on the touch panel 105, which is arranged on the upper surface of the display 106, according to what is shown on the display 106. The audio input I/F 107 is an interface for inputting the audio signal from the bone conduction microphone 12 to the PDA 11. The audio output I/F 108 is an interface for outputting the audio signal reproduced by the PDA 11 to the stereo headphones 13. The geomagnetic sensor signal input I/F 109 is an interface for inputting the signal from the three-axis geomagnetic sensor 14 to the PDA 11.

With this audio recording/reproducing apparatus 10, information on the voice uttered by the user and picked up by the bone conduction microphone 12 (audio information) can be recorded in the flash memory 103 of the PDA 11, and the audio information recorded in the flash memory 103 can be reproduced and output to the stereo headphones 13.

Furthermore, with this audio recording/reproducing apparatus 10, information on the voice uttered by the user can be attached to and recorded at an arbitrary position in a virtual space, and any audio information attached to the virtual space can be reproduced.

More specifically, when the user's voice is recorded, the three-axis geomagnetic sensor 14 detects the orientation of the user's face in the virtual space at that moment, and direction information indicating this face orientation is stored in the flash memory 103 in association with the audio information. When audio information stored in the flash memory 103 is reproduced, the three-axis geomagnetic sensor 14 detects the orientation of the user's face in the virtual space at that moment, and the apparatus reproduces the audio information recorded in the flash memory 103 in association with direction information indicating a direction within a predetermined range of the detected face orientation. The audio information is reproduced in such a way that the user can localize the sound source in the virtual space.

During reproduction, for audio information recorded in the flash memory 103 in association with direction information indicating a direction outside the predetermined range of the detected face orientation, a symbolic sound indicating the existence of the audio information (hereinafter "icon sound") is reproduced instead of the audio information itself. The icon sound is also reproduced in such a way that the sound source can be localized.

Thus, with the audio recording/reproducing apparatus 10 of this invention, audio information is attached to the position (direction) in the virtual space that the user designates by face orientation at the time of recording, and the audio information attached to the position (direction) designated by face orientation at the time of reproduction is reproduced. The bodily sensation of face orientation during the recording and reproducing operations is therefore tied to the content of the audio information, which makes that content easier to retain in memory and easier to incorporate into one's thinking.

In addition, because the audio information and the icon sounds are reproduced so that the user can localize their sources in the virtual space, the user can grasp at once the existence of every item of audio information stored in the flash memory 103 of the PDA 11.

The processing that the CPU 100 of the PDA 11 executes when the audio recording/reproducing apparatus 10 records and reproduces audio is described below with reference to the flowcharts of FIGS. 3 and 4, among others. The recording processing and the reproduction processing are shown separately in the flowcharts of FIG. 3 and FIG. 4, but the CPU 100 executes the two in parallel. That is, audio can be recorded and reproduced at the same time. Although not shown in the flowcharts, while audio is being recorded the volume of the reproduced audio is kept low so that it does not disturb the user's thinking. The processing shown in the flowcharts of FIGS. 3 and 4 is an example; the order of the processing steps may be changed wherever the invention can still be carried out with the changed order.

First, in step S1 of FIG. 3, the CPU 100 of the PDA 11 determines whether there is voice input. This determination is based on the audio signal input from the bone conduction microphone 12 via the audio input I/F 107. The CPU 100 attempts to cut out, from the audio signal input to the audio input I/F 107, a section in which the signal is voiced (a speech section), and determines that there is voice input if a speech section can be cut out.

To cut out a speech section, the position at which the level of the audio signal has stayed above a threshold for more than 0.3 seconds is taken as the start of the utterance (start of the speech section). Thereafter, when the level has stayed below the threshold for more than 1.0 second, the position at which it first fell below the threshold is taken as the end of the utterance (end of the speech section). However, if, after the utterance is judged to have started, the level falls below the threshold within 0.5 seconds and then remains below it for 2.0 seconds or more, the judgment that an utterance started is reversed and the span is not treated as a speech section.
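The segmentation rule above can be sketched as a small state machine. This is only an illustrative sketch, not the patent's implementation: the function name `find_speech_section`, the frame-based input, and the frame period `frame_ms` are assumptions, as is the reading that an early drop (within 0.5 seconds of the start) is resolved only by either resumed speech or 2.0 seconds of silence.

```python
from typing import Optional, Sequence, Tuple

START_MS = 300            # level must stay above the threshold longer than this
END_SILENCE_MS = 1000     # silence longer than this ends the section
EARLY_DROP_MS = 500       # a drop this soon after the start is suspicious
REJECT_SILENCE_MS = 2000  # early drop plus this much silence cancels the start


def find_speech_section(levels: Sequence[float], threshold: float,
                        frame_ms: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the first speech section, or None.

    `end` is the index of the first frame of the closing silence.
    `levels` holds one signal level per frame (hypothetical input format).
    """
    loud_run = 0       # consecutive frames above the threshold
    start = None       # frame at which the utterance was judged to start
    quiet_from = None  # first frame of the current below-threshold run

    for i, level in enumerate(levels):
        loud = level > threshold
        if start is None:
            loud_run = loud_run + 1 if loud else 0
            if loud_run * frame_ms > START_MS:
                start = i
            continue
        if loud:
            quiet_from = None
            continue
        if quiet_from is None:
            quiet_from = i
        silence_ms = (i - quiet_from + 1) * frame_ms
        dropped_early = (quiet_from - start) * frame_ms <= EARLY_DROP_MS
        if dropped_early:
            if silence_ms >= REJECT_SILENCE_MS:
                # Reverse the start decision and look for a new utterance.
                start, loud_run, quiet_from = None, 0, None
        elif silence_ms > END_SILENCE_MS:
            return start, quiet_from
    return None
```

Durations are kept in integer milliseconds so that comparisons such as "more than 0.3 seconds" are not affected by floating-point rounding.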

When a speech section has been cut out of the audio signal in this way and it is determined that there is voice input (step S1: YES), the CPU 100 next, in step S3, takes the audio signal of the speech section and generates audio information from it. The generated audio information is then saved (recorded) in the flash memory 103. The audio information is, for example, an MP3 (MPEG Audio Layer-3) file.

In step S5, information indicating the orientation of the user's face in the virtual space at the time the user's voice was input to the bone conduction microphone 12, that is, at the time the audio information was recorded (direction information), is generated from the output of the three-axis geomagnetic sensor 14, and this direction information is saved in the flash memory 103 in association with the audio information saved earlier. Here, as shown in FIG. 5, the virtual space is a horizontal arc C centered on the user's head, extending 90° to the left and 90° to the right of the front, which is the reference of the virtual space. The reference front of the virtual space is the direction in which the user's face is pointing, as detected by the three-axis geomagnetic sensor 14, when the user performs a predetermined operation on the touch panel 105 of the PDA 11 after the audio recording/reproducing apparatus 10 is powered on.

The information indicating the orientation of the user's face in the virtual space (direction information) is the angle α formed between the reference (front) line of the virtual space and the straight line indicated by the orientation of the user's face. Letting [I_x, I_y, I_z]' be the output of the three-axis geomagnetic sensor 14 when the user's face is pointing in the reference (front) direction, and [M_x, M_y, M_z]' be the output of the three-axis geomagnetic sensor 14 when the face orientation is measured, the angle α is obtained from Equations 1 and 2.

Experiments confirmed that even when a user intends to keep the face pointed in one direction while speaking, the angle α wobbles by 5° to 7° between the start and the end of the voice input. The direction information indicating the orientation of the user's face at the time of voice input is therefore taken to be the average of the face orientation (angle α) from the start to the end of the voice input (the speech section). The three-axis geomagnetic sensor 14 outputs a measurement every 100 milliseconds.
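The averaging described above can be sketched as follows. The function name `mean_face_angle` and the `(timestamp_ms, angle_deg)` sample format are assumptions made for illustration. Because the virtual space spans only ±90° around the front, a plain arithmetic mean is used here, with no handling of angle wrap-around.

```python
from typing import Sequence, Tuple


def mean_face_angle(samples: Sequence[Tuple[int, float]],
                    start_ms: int, end_ms: int) -> float:
    """Average the face angle alpha (degrees) over one speech section.

    `samples` holds (timestamp_ms, angle_deg) pairs, e.g. one every 100 ms.
    Only samples with start_ms <= timestamp <= end_ms are averaged.
    """
    in_section = [angle for t_ms, angle in samples
                  if start_ms <= t_ms <= end_ms]
    if not in_section:
        raise ValueError("no sensor samples inside the speech section")
    return sum(in_section) / len(in_section)


# Sensor readings every 100 ms; the speech section runs from 100 ms to 300 ms.
readings = [(0, 50.0), (100, 28.0), (200, 30.0), (300, 32.0), (400, -10.0)]
alpha = mean_face_angle(readings, 100, 300)  # 30.0
```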

In step S7, icon sound information is generated and saved in the flash memory 103 in association with the audio information and direction information saved earlier. The icon sound information is, for example, an MP3 (MPEG Audio Layer-3) file. It is a copy of icon sound information recorded in the flash memory 103 in advance. The icon sound is a symbolic sound, for example a simple musical bell tone. The bell tone preferably contains components at 1000 Hz or above, so that the volume difference between the user's left and right ears makes the sound source easy to localize.

Next, in step S9, the modulation parameter of the icon sound to be reproduced from the icon sound information is determined and saved in the flash memory 103 in association with the audio information and other data saved earlier. The modulation parameter makes the frequency of each icon sound differ according to its position in the virtual space, so that the user can easily tell the reproduced icon sounds apart. It is determined from the absolute value of the angle α (average value) of the user's face orientation at the time of recording, by converting the frequency of the icon sound by a factor between 1.0 and 2.0 according to Equation 3. This frequency conversion produces icon sounds that differ by up to one octave.

When the voice uttered by the user is recorded in the PDA 11 in this way, the flash memory 103 holds, for each item of audio information, the audio information, direction information, icon sound information, and modulation parameter, as shown in FIG. 6(a).

When audio information is reproduced, on the other hand, the CPU 100 of the PDA 11, in step S11 of FIG. 4, identifies the direction in which the user's face is pointing in the virtual space from the output of the three-axis geomagnetic sensor 14. That is, it identifies the angle (α) between the reference of the virtual space and the direction of the user's face.

Next, in step S13, for every item of audio information stored in the flash memory 103, the CPU 100 calculates the relative direction (angle δ; see FIG. 7) of the direction in which the audio information is located in the virtual space with respect to the direction of the user's face identified in step S11. This relative direction (angle δ) is calculated from the direction information recorded in the flash memory 103 in association with the audio information and from the angle of the user's face orientation identified in step S11.
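The relative direction of step S13 can be sketched as below. The description of FIG. 7 defines γ as the angle from the front of the virtual space to the audio information and δ as the angle from the face direction to the audio information, so δ = γ − α follows directly. The function names and the sign convention (angles increasing toward one side) are assumptions.

```python
from typing import List, Sequence


def relative_angle(gamma_deg: float, alpha_deg: float) -> float:
    """Angle delta from the face direction (alpha) to a recorded direction (gamma)."""
    return gamma_deg - alpha_deg


def relative_angles(directions_deg: Sequence[float],
                    alpha_deg: float) -> List[float]:
    """Compute delta for every recorded item, as step S13 does for all entries."""
    return [relative_angle(gamma, alpha_deg) for gamma in directions_deg]


# Three recorded directions, with the face currently turned 20 degrees from the front.
deltas = relative_angles([-40.0, 25.0, 60.0], 20.0)  # [-60.0, 5.0, 40.0]
```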

Then, in step S15, of the audio information recorded in the flash memory 103, the audio information whose relative direction calculated in step S13 is within a range of, for example, ±10° is reproduced and output to the stereo headphones 13 via the audio output I/F 108. The audio information is reproduced so that the user can identify its position in the virtual space, that is, so that the sound source can be localized.

Specifically, the audio information is reproduced so that the sound source can be localized from the difference in arrival time and the difference in volume of the sound at the user's left and right ears.

As shown in FIG. 7, letting h [cm] be the distance between the user's left and right ears and c [cm] be the distance from the left and right ears to the arc C that forms the virtual space, the distance dL [cm] from the sound source (audio information) to the left ear and the distance dR [cm] to the right ear are determined by Equations 4 and 5, respectively. The times taken for the sound to reach the left and right ears are then derived as dL/34 and dR/34, respectively, using a speed of sound of 34 [cm/ms], and the time difference is calculated from these.
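The arrival-time calculation can be sketched as follows. The distances dL and dR are taken as inputs because Equations 4 and 5 are not reproduced in this text; the function names are assumptions. The constant 34 is read as centimeters per millisecond, which matches a speed of sound of roughly 340 m/s.

```python
SOUND_SPEED_CM_PER_MS = 34.0  # roughly 340 m/s


def arrival_time_ms(distance_cm: float) -> float:
    """Time for sound to travel `distance_cm`, in milliseconds."""
    return distance_cm / SOUND_SPEED_CM_PER_MS


def interaural_time_difference_ms(d_left_cm: float, d_right_cm: float) -> float:
    """Left-ear arrival time minus right-ear arrival time, in milliseconds.

    A positive value means the sound reaches the right ear first.
    """
    return arrival_time_ms(d_left_cm) - arrival_time_ms(d_right_cm)


# Hypothetical distances: source 34 cm from the left ear, 17 cm from the right.
itd = interaural_time_difference_ms(34.0, 17.0)  # 0.5 ms
```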

FIG. 7(a) shows the case in which the user's face is pointing toward the front of the virtual space, and FIG. 7(b) shows the case in which the user's face is pointing in a direction at angle α from the front of the virtual space. In these figures, angle γ is the angle from the front direction of the virtual space to the direction in which the audio information is located, and angle δ is the angle from the direction in which the user's face is pointing to the direction in which the audio information is located.

The distance h [cm] between the user's left and right ears and the distance c [cm] from the user to the arc C are recorded in the ROM 101 of the PDA 11. The same applies to the other embodiments.

As for the volume difference, with a real sound source the total volume [dB] reaching the left and right ears is not necessarily constant when the localization changes, even if the volume of the source itself is constant. In this embodiment, however, effects such as sound wrapping around the head are not computed, and the volume is calculated simply from the distance-based attenuation rate "atn". With the initial value of the attenuation rate (ATN_init) set to "0.8c^2", the attenuation rate "atn" based on the distance "d" between the sound source and an ear is calculated by Equation 6, where d_min = c. For example, if c = 15 and h = 15, then ATN_init = 180.

Next, in step S17, for the audio information recorded in the flash memory 103 whose relative direction calculated in step S13 is outside the range of, for example, ±10°, the icon sound information is modulated according to the modulation parameter, reproduced, and output to the stereo headphones 13 via the audio output I/F 108. As described above, the icon sound information is also reproduced so that the sound source can be localized from the difference in arrival time and the difference in volume of the sound at the user's left and right ears. The reproduction of audio information in step S15 and the reproduction of icon sound information in step S17 are performed simultaneously, in parallel. The audio information and the icon sound information are reproduced repeatedly, separated by silent intervals of 500 milliseconds.
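The choice between steps S15 and S17 amounts to splitting the recorded items by their relative direction. The sketch below illustrates that split; the `Record` type, the function name `split_for_playback`, and the treatment of exactly ±10° as inside the window are assumptions, and actual audio output is left out.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Record:
    label: str            # hypothetical identifier for a recorded item
    direction_deg: float  # recorded direction (gamma) in the virtual space


def split_for_playback(records: Sequence[Record], face_deg: float,
                       window_deg: float = 10.0
                       ) -> Tuple[List[Record], List[Record]]:
    """Split records into (play as voice, play as icon sound).

    Items whose relative direction is within +/- window_deg of the face
    direction are played as recorded (step S15); all others are
    represented by their icon sound (step S17).
    """
    as_voice: List[Record] = []
    as_icon: List[Record] = []
    for record in records:
        delta = record.direction_deg - face_deg
        if abs(delta) <= window_deg:
            as_voice.append(record)
        else:
            as_icon.append(record)
    return as_voice, as_icon


memos = [Record("a", -40.0), Record("b", 25.0), Record("c", 60.0)]
voice, icons = split_for_playback(memos, face_deg=20.0)
# voice holds "b" (delta = 5); icons holds "a" and "c"
```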

In this way, in the audio recording/reproducing apparatus 10 of the first embodiment, as shown in FIG. 8, audio information located on the arc C of the virtual space within ±10° of the direction in which the user's face is pointing (the audio information indicated by the solid black audio icons 201 in the figure) is reproduced as is, with sound source localization possible. For audio information located outside ±10° of the direction in which the user's face is pointing (the audio information indicated by the outlined audio icons 202 in the figure), the icon sound information is reproduced in place of the audio information, again with sound source localization possible.

<Second Embodiment>
In the audio recording/reproducing apparatus 10 of the second embodiment, the virtual space to which audio information is attached is a three-dimensional space. That is, the information on the orientation of the user's face in the virtual space (direction information) includes a vertical angle in addition to the same horizontal angle as in the first embodiment. The concept of layers is also introduced, to take into account the distance of the audio information from the user (the center of the user's head) in the virtual space. In this embodiment, a layer is a hemispherical shell in the virtual space, concentric with the center of the user's head and located at a fixed distance from it.

Specifically, when new audio information is recorded (attached to the virtual space), it is assigned to the shallowest (nearest) layer. For audio information that has already been recorded at that point, if a predetermined time has elapsed since the last audio information was recorded, the layer to which that audio information belongs is updated to a deeper (farther) layer. That is, the layer of already-recorded audio information becomes a layer farther from the user.
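The layer update can be sketched as follows. This is a minimal illustration under several assumptions: layers are numbered from 1 (shallowest), existing items move exactly one layer deeper per update, the predetermined time is one hour, and the names `Entry` and `add_entry` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Entry:
    label: str             # hypothetical identifier for a recorded item
    layer: int             # 1 = shallowest (nearest) layer, by assumption
    recorded_at: datetime


def add_entry(entries: List[Entry], label: str, now: datetime,
              interval: timedelta = timedelta(hours=1)) -> None:
    """Record a new item in the shallowest layer.

    If `interval` has elapsed since the most recent recording, every
    existing item is first moved one layer deeper.
    """
    if entries and now - entries[-1].recorded_at >= interval:
        for entry in entries:
            entry.layer += 1
    entries.append(Entry(label, 1, now))


log: List[Entry] = []
t0 = datetime(2008, 11, 18, 9, 0)
add_entry(log, "first", t0)
add_entry(log, "second", t0 + timedelta(minutes=30))  # too soon: no shift
add_entry(log, "third", t0 + timedelta(hours=3))      # earlier items move deeper
```

After these three calls, "first" and "second" sit in layer 2 and "third" sits in layer 1.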

Accordingly, in the audio recording/reproducing apparatus 10 of the second embodiment, as shown in FIG. 9, audio information that belongs to the layer designated by the user in the virtual space (layer 2 in the example of FIG. 9) and that lies within ±10° horizontally and within ±10° vertically of the direction in which the user's face is pointing (the audio information indicated by the solid black audio icons 201 in the figure) is reproduced as is, with sound source localization possible. For audio information that belongs to a layer other than the designated layer, and for audio information that belongs to the designated layer but lies outside ±10° horizontally and outside ±10° vertically of the direction in which the user's face is pointing (the audio information indicated by the outlined audio icons 202 in the figure), the icon sound information is reproduced in place of the audio information, again with sound source localization possible.

Letting h [cm] be the distance between the user's left and right ears, the distance between adjacent layers is h/2 [cm]. The distance from the user's ears to the nearest layer is c [cm].
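From these two distances, the distance to any layer follows by simple arithmetic. The sketch below assumes that layers are numbered from 1, with layer 1 being the nearest; the function name is hypothetical. The values c = 15 and h = 15 are the example values used for the attenuation rate in the first embodiment.

```python
def layer_distance_cm(layer: int, c_cm: float, h_cm: float) -> float:
    """Distance from the user's ears to layer `layer`.

    Layer 1 (assumed nearest) lies at c_cm; each further layer adds h_cm / 2.
    """
    if layer < 1:
        raise ValueError("layer numbers start at 1")
    return c_cm + (layer - 1) * h_cm / 2


# With c = 15 cm and h = 15 cm:
distances = [layer_distance_cm(k, 15.0, 15.0) for k in (1, 2, 3)]
# [15.0, 22.5, 30.0]
```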

In FIG. 9, the audio icons 202 are drawn smaller for audio information located below the direction of the user's face and larger for audio information located above it.

The configurations of the PDA 11, the bone conduction microphone 12, the stereo headphones 13, and the three-axis geomagnetic sensor 14 that make up the audio recording/reproducing apparatus 10 are the same as in the first embodiment.

The processing that the CPU 100 of the PDA 11 executes when the audio recording/reproducing apparatus 10 records and reproduces audio is described below with reference to the flowcharts of FIGS. 10 and 11, among others. The recording processing and the reproduction processing are shown separately in the flowcharts of FIG. 10 and FIG. 11, but the CPU 100 executes the two in parallel. That is, audio can be recorded and reproduced at the same time. Although not shown in the flowcharts, while audio is being recorded the volume of the reproduced audio is kept low so that it does not disturb the user's thinking. The processing shown in the flowcharts of FIGS. 10 and 11 is an example; the order of the processing steps may be changed wherever the invention can still be carried out with the changed order.

When the processing of the flowchart of FIG. 10 is executed and the voice uttered by the user is recorded in the PDA 11, the flash memory 103 holds, for each item of audio information, layer identification information, the audio information, direction information, icon sound information, a modulation parameter, and time information, as shown in FIG. 6(b).

まず、ＰＤＡ１１のＣＰＵ１００は、図１０のステップＳ２１で、音声の入力があるか否かを判断する。この音声の入力があるか否かの判断は、オーディオ入力Ｉ／Ｆ１０７を介して骨伝導マイク１２から入力される音声信号に基づいて判断される。この判断では、オーディオ入力Ｉ／Ｆ１０７に入力される音声信号から音声の信号（有音）である区間（音声区間）の切り出しを試み、音声区間が切り出せた場合に音声の入力があると判断する。なお、音声区間の切り出しの方法は、第１の実施形態と同じである。 First, the CPU 100 of the PDA 11 determines whether or not there is an audio input in step S21 of FIG. The determination as to whether or not there is an audio input is made based on an audio signal input from the bone conduction microphone 12 via the audio input I / F 107. In this determination, an attempt is made to cut out a section (voice section) that is a voice signal (sound) from the voice signal input to the audio input I / F 107, and when the voice section is cut out, it is determined that there is a voice input. . Note that the method of segmenting the speech section is the same as that in the first embodiment.

When a voice segment is cut out from the audio signal and the CPU 100 determines that there is a voice input (step S21: YES), it performs speech recognition on the audio signal of the voice segment in step S23 and determines whether the voice input by the user designates a layer. To designate a layer, the user is asked to utter a predetermined phrase such as "designate layer 2". The CPU 100 uses speech recognition to determine whether the audio in the voice segment matches this phrase, and thereby whether the input designates a layer. If it does (step S23: YES), processing is handed over to audio reproduction in step S25. The reproduction processing of step S25 is the processing from step S59 of FIG. 11 onward.

If, on the other hand, the voice does not designate a layer (step S23: NO), then in step S27 the CPU 100 acquires the current time (the time at which the audio information is recorded, or more precisely the start of the voice segment) from the clock circuit 104 and records it in the flash memory 103.

In step S29, the CPU 100 refers to the time information recorded in association with the audio information most recently recorded in the flash memory 103, and determines whether a predetermined period, for example one hour, has elapsed between that time and the current time acquired from the clock circuit 104 in step S27. If the flash memory 103 holds no time information associated with audio information, that is, if no audio information has been recorded yet, the CPU 100 determines that one hour has not elapsed.

If one hour has elapsed (step S29: YES), the CPU 100 updates, in step S31, the layer identification information of the audio information already recorded in the flash memory 103. That is, it increments by one the layer number (layer 1, layer 2, layer 3, and so on) indicated by each piece of layer identification information. The larger the layer number, the farther the layer is from the user. If one hour has not elapsed (step S29: NO), step S31 is skipped.

In step S33, the layer identification information of the audio information about to be recorded, namely information indicating layer 1 (the initial value), is stored in the flash memory 103 in association with the time information stored in step S27.
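
The layer handling of steps S29 to S33 can be sketched as follows. This is a minimal illustration, assuming records are held as dictionaries in a list; the function names `age_layers` and `add_record` and the dictionary keys are hypothetical, and the one-hour interval is the example value given in the text.

```python
from datetime import datetime, timedelta

LAYER_INTERVAL = timedelta(hours=1)  # the "one hour" example of step S29


def age_layers(records, now, interval=LAYER_INTERVAL):
    """Steps S29/S31: move every stored record one layer back if the most
    recent recording is at least `interval` old. Returns True if updated."""
    if not records:
        return False  # nothing recorded yet: treated as "not elapsed"
    last = max(r["time"] for r in records)
    if now - last < interval:
        return False
    for r in records:
        r["layer"] += 1
    return True


def add_record(records, now, audio):
    """Steps S27 to S35 in outline: age the old layers, then store the
    new record in layer 1 (the initial value)."""
    age_layers(records, now)
    records.append({"time": now, "layer": 1, "audio": audio})


records = []
t0 = datetime(2008, 11, 18, 9, 0)
add_record(records, t0, "a.mp3")
add_record(records, t0 + timedelta(minutes=10), "b.mp3")  # stays with "a"
add_record(records, t0 + timedelta(hours=2), "c.mp3")     # older ones move back
print([r["layer"] for r in records])  # [2, 2, 1]
```

Recordings made within the interval share a layer, so each layer groups memos from one continuous session, and older sessions drift away from the user.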

Next, in step S35, the CPU 100 acquires the audio signal of the voice segment from the audio signal and generates audio information, which it stores (records) in the flash memory 103 in association with the previously stored time information and layer identification information. The audio information is, for example, an MP3 (MPEG Audio Layer-3) file.

In step S37, information indicating the orientation of the user's face in the virtual space at the time the user's voice was input to the bone conduction microphone 12 (that is, when the audio information was recorded) is generated as direction information from the output of the three-axis geomagnetic sensor 14 and stored in association with the audio information and other data previously stored in the flash memory 103. Here, the virtual space is a series of hemispherical layers centered on the user's head, as shown in FIG. 9. The front that serves as the reference of the virtual space is the direction in which the user's face was pointing, as detected by the three-axis geomagnetic sensor 14, when the user performed a predetermined operation on the touch panel 105 of the PDA 11 after the audio recording/reproducing apparatus 10 was powered on.

The direction information consists of the horizontal angle α (see FIG. 5) and the vertical angle β (see FIG. 12) formed between the reference (front) line of the virtual space and the line along which the user's face is pointing. As described earlier, the orientation of the user's face fluctuates between the start and the end of a voice input. The direction information for a voice input is therefore taken to be the average horizontal face direction (angle α) and the average vertical face direction (angle β) over the voice segment, from the start to the end of the input.
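
The averaging described above amounts to a per-axis arithmetic mean over the sensor samples taken during the voice segment. The sketch below assumes the samples are available as (α, β) pairs in degrees measured from the reference front, and that they do not cross the ±180° boundary; the function name `mean_direction` and the sample values are illustrative only.

```python
def mean_direction(samples):
    """Average (alpha, beta) samples, in degrees, taken during one voice segment."""
    if not samples:
        raise ValueError("no direction samples in the voice segment")
    n = len(samples)
    alpha = sum(a for a, _ in samples) / n
    beta = sum(b for _, b in samples) / n
    return alpha, beta


# Three readings taken while the user was speaking.
samples = [(28.0, -4.0), (30.0, -6.0), (32.0, -5.0)]
alpha, beta = mean_direction(samples)
print(alpha, beta)  # 30.0 -5.0
```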

In step S39, icon sound information is generated and stored in the flash memory 103 in association with the previously stored audio information, direction information, and so on. As described earlier, the icon sound information is a copy of the audio information of a bell tone recorded in the flash memory 103 in advance.

Next, in step S41, a modulation parameter for the icon sound reproduced from the icon sound information is determined and stored in the flash memory 103 in association with the previously stored audio information and so on. As described earlier, the modulation parameter changes the frequency of the icon sound according to its position in the virtual space, so that the user can easily distinguish the icon sounds being reproduced.

When reproducing audio information, the CPU 100 of the PDA 11 determines in step S51 of FIG. 11 whether there is a voice input, in the same way as in step S21 of FIG. 10. If there is no voice input (step S51: NO), the layer number of the audio information to be reproduced is set to its initial value in step S53; that is, layer 1 is regarded as designated.

If there is a voice input (step S51: YES), the CPU 100 determines by speech recognition in step S55 whether the input voice designates a layer, in the same way as in step S23 of FIG. 10. If the voice does not designate a layer, that is, if it is a voice to be recorded (step S55: NO), processing is handed over to audio recording in step S57. The recording processing is the processing from step S27 of the flowchart of FIG. 10 onward.

If the voice designates a layer (step S55: YES), the layer number is identified from the speech recognition result in step S59, and in step S61 the identified layer number is set as the layer to which the audio information to be reproduced belongs. That is, the recognized layer is regarded as designated.

Next, in step S63, the direction in which the user's face is pointing in the virtual space is identified from the output of the three-axis geomagnetic sensor 14. That is, the CPU 100 determines the horizontal angle (α) and the vertical angle (β) of the user's face relative to the reference of the virtual space.

In step S65, for every piece of audio information stored in the flash memory 103, the CPU 100 calculates the direction in which the audio information is located in the virtual space relative to the face direction identified in step S63. This relative direction is calculated from the direction information (horizontal and vertical angles) recorded in the flash memory 103 in association with the audio information and from the face angles identified in step S63.

In step S67, the audio information recorded in the flash memory 103 that belongs to the layer set (designated) in step S53 or step S61, and whose relative direction calculated in step S65 is within ±10° both horizontally and vertically, is reproduced and output to the stereo headphones 13 via the audio output I/F 108. The audio information is reproduced so that the user can identify its position in the virtual space, that is, so that the sound source can be localized. Specifically, it is reproduced so that it can be localized from the difference in arrival time and the difference in volume of the sound between the user's left and right ears.

The arrival-time difference and the volume difference can be calculated by the same method as described in the first embodiment. The processing that reproduces audio information so that the sound source can be localized is realized through differences and changes in the volume, propagation time, and frequency characteristics of the sound waves, based on the techniques proposed in 大内誠, 岩谷幸雄, 鈴木陽一, 棟方哲弥, "三次元音響ＶＲエデュテイメントシステムによる視覚障害者の空間認識能力訓練効果" (Effect of spatial perception training for visually impaired people using a three-dimensional acoustic VR edutainment system), 3rd Science and Technology Forum FIT2004, Information Technology Letters, pp. 283-284, 2004, and in 岩永信之, 坂本憲成, 小林亙, 尾上孝雄, 白川功, "組込みシステム向けヘッドフォンステレオ頭外音場拡大手法とその実装" (A headphone stereo out-of-head sound field expansion method for embedded systems and its implementation), 17th Digital Signal Processing Symposium, B2-4, 2002.
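
To give a feel for the two cues involved, the sketch below computes an interaural time difference and a pair of left/right gains for a given azimuth. It is not the method of the first embodiment or of the cited papers, which are not reproduced in this section. It substitutes two textbook approximations, Woodworth's spherical-head formula for the time difference and constant-power panning for the level difference. The head radius, the function names, and the sign convention (positive azimuth to the right) are assumptions.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius
SPEED_OF_SOUND = 343.0   # m/s in air at about 20 degrees C


def itd_seconds(azimuth_deg):
    """Woodworth approximation of the interaural time difference.
    Positive azimuth = source to the right; intended for -90..90 degrees."""
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + math.sin(theta))


def pan_gains(azimuth_deg):
    """Constant-power (left, right) gains as a crude stand-in for the
    interaural level difference."""
    p = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2)  # 0 = full left, pi/2 = full right
    return math.cos(p), math.sin(p)


delay = itd_seconds(30.0)          # delay applied to the far (left) ear
left, right = pan_gains(30.0)      # right channel is louder for a source on the right
```

A player would delay and attenuate the channel of the far ear by these amounts before sending the signal to the stereo headphones.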

Next, in step S69, for the audio information recorded in the flash memory 103 that belongs to a layer other than the designated one, and for the audio information that belongs to the designated layer but whose relative direction calculated in step S65 is outside the ±10° range horizontally and vertically, the icon sound information is modulated according to the modulation parameter, reproduced, and output to the stereo headphones 13 via the audio output I/F 108. Like the audio information, the icon sound information is reproduced by the techniques described above so that the sound source can be localized. The reproduction of audio information in step S67 and the reproduction of icon sound information in step S69 are carried out simultaneously, in parallel. The audio information and the icon sound information are reproduced repeatedly, separated by silent intervals of 500 milliseconds.
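
Steps S65 to S69 partition the stored records into those played back as speech and those represented only by their icon sound. The sketch below illustrates that partition under some assumptions: records are dictionaries with hypothetical keys, every record not selected for speech playback is treated as an icon sound, and horizontal differences are wrapped at ±180°, which the specification does not discuss.

```python
def angle_diff(a_deg, b_deg):
    """Signed difference a - b wrapped into [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def split_for_playback(records, face_alpha, face_beta, layer, tol=10.0):
    """Steps S65 to S69: records in the designated layer and within +/-tol
    degrees of the face direction on both axes are played as speech;
    all other records are played as icon sounds."""
    speech, icons = [], []
    for r in records:
        d_alpha = angle_diff(r["alpha"], face_alpha)  # relative direction (S65)
        d_beta = r["beta"] - face_beta
        in_range = abs(d_alpha) <= tol and abs(d_beta) <= tol
        (speech if r["layer"] == layer and in_range else icons).append(r["id"])
    return speech, icons


records = [
    {"id": "a", "layer": 1, "alpha": 5.0, "beta": 0.0},
    {"id": "b", "layer": 1, "alpha": 40.0, "beta": 0.0},
    {"id": "c", "layer": 2, "alpha": 5.0, "beta": 0.0},
]
speech, icons = split_for_playback(records, face_alpha=0.0, face_beta=0.0, layer=1)
print(speech, icons)  # ['a'] ['b', 'c']
```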
<Third Embodiment>
As shown in FIG. 13, the audio recording/reproducing apparatus 10a of the third embodiment includes a camera 16 with a fisheye lens in addition to the configuration of the audio recording/reproducing apparatus 10 of the first and second embodiments. The camera 16 captures an image of the surroundings in front of the user when audio is recorded, and hangs from the user's neck on a strap (not shown) so that it sits in front of the user's torso. As shown in FIG. 14, the PDA 11a includes an image signal input I/F 110 for receiving the image signal from the camera 16 in addition to the configuration of the PDA 11 of the first and second embodiments. Components of the audio recording/reproducing apparatus 10a and the PDA 11a that are the same as those of the audio recording/reproducing apparatus 10 and the PDA 11 are given the same reference numerals.

The audio recording/reproducing apparatus 10a of the third embodiment records audio information by attaching it to an image of the user's surroundings (a peripheral image) captured by the camera 16. More specifically, it detects the orientation of the user's face when the audio information is recorded, converts this orientation into a position in the peripheral image, and records the audio information together with that position (position information) and the peripheral image (peripheral image information). When new audio information is recorded, the peripheral image to which the most recently recorded audio information was attached is compared with the peripheral image captured by the camera 16 at the time of the new recording. If the user's environment is judged to have changed, the new audio information is attached to the newly captured peripheral image; otherwise it is attached to the peripheral image to which the most recent audio information was attached. In the third embodiment, in other words, the peripheral images captured by the camera 16 play the role of layers.

As shown in FIG. 6(c), the peripheral image information is stored in the flash memory 103 in association with image identification information that identifies it, and audio information is likewise stored in association with image identification information. This identifies the peripheral image to which each piece of audio information is attached.

The processing executed by the CPU 100 of the PDA 11a when the audio recording/reproducing apparatus 10a records audio is described below with reference to the flowchart of FIG. 15. When the process of FIG. 15 is executed and the voice uttered by the user is recorded in the PDA 11a, the flash memory 103 stores, for each piece of audio information, image identification information, audio information, position information, and time information, as shown in FIG. 6(c). Separately from these, image identification information and peripheral image information are also recorded in the flash memory 103. The processing shown in FIG. 15 is only an example, and the order of the steps may be changed wherever the invention can still be carried out.

First, in step S81 of FIG. 15, the CPU 100 of the PDA 11a determines whether there is a voice input. This determination is based on the audio signal input from the bone conduction microphone 12 via the audio input I/F 107. The CPU 100 attempts to cut out a segment containing speech (a voice segment) from the audio signal input to the audio input I/F 107, and determines that there is a voice input if a voice segment can be cut out. The method of cutting out the voice segment is the same as in the first embodiment and the other embodiments described above.

When a voice segment is cut out from the audio signal and the CPU 100 determines that there is a voice input (step S81: YES), it acquires the audio signal of the voice segment from the audio signal in step S83, generates audio information, and stores (records) the generated audio information in the flash memory 103. The audio information is, for example, an MP3 (MPEG Audio Layer-3) file.

Next, in step S85, the direction in which the user's face is pointing in the virtual space is identified from the output of the three-axis geomagnetic sensor 14, and direction information is generated. That is, the CPU 100 determines the horizontal angle (α) and the vertical angle (β) of the user's face relative to the reference of the virtual space. Here, the virtual space is the space represented by the peripheral image with a 180° angle of view captured by the camera 16 with its fisheye lens. The direction information (horizontal angle α and vertical angle β) is recorded in the RAM 102.

In step S87, an image signal is acquired from the camera 16 via the image signal input I/F 110, and peripheral image information for the direction in which the user's body is facing is generated and temporarily recorded in the RAM 102. The peripheral image information is, for example, a JPEG file.

Next, in step S89, the current time (the time at which the audio information is recorded, or more precisely the start of the voice segment) is acquired from the clock circuit 104 and recorded in the flash memory 103 in association with the audio information stored in step S83.

In step S91, the peripheral image information generated in step S87 and temporarily recorded in the RAM 102 is compared with the peripheral image information most recently recorded in the flash memory 103 to determine whether the user's surroundings have changed. The most recent peripheral image information can be found by identifying, among the audio information that has image identification information associated with it, the audio information with the latest time information, and then using the image identification information associated with that audio information. (The audio information stored in step S83 is excluded because no image identification information has been associated with it yet.)

Alternatively, the current time acquired from the clock circuit 104 may be recorded together with the peripheral image information when the latter is recorded in the flash memory 103, and the most recent peripheral image information may be identified from this time information.
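
The lookup of the most recent peripheral image through the audio records can be sketched as follows. The dictionary keys and the function name `last_image_id` are hypothetical, and times are shown as plain numbers for brevity.

```python
def last_image_id(audio_records):
    """Return the image ID attached to the most recent audio record that has
    one, or None if no audio record has been linked to an image yet."""
    linked = [r for r in audio_records if r.get("image_id") is not None]
    if not linked:
        return None
    return max(linked, key=lambda r: r["time"])["image_id"]


audio_records = [
    {"time": 100, "image_id": "img1"},
    {"time": 200, "image_id": "img2"},
    {"time": 300, "image_id": None},  # just stored in step S83, not linked yet
]
print(last_image_id(audio_records))  # img2
```

The record stored in step S83 drops out of the search automatically because it carries no image ID yet, which matches the exclusion noted in the text.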

Whether the environment has changed can be determined from the peripheral image information by, for example, totaling the changes per unit time in luminance and color information between the two pieces of peripheral image information and judging that the environment has changed when the amount of change is at or above a predetermined threshold. The scene change detection method for moving images disclosed in Japanese Unexamined Patent Application Publication No. H6-251147 (video feature processing method) can also be used.
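
A threshold test of this kind might look like the sketch below. It is a simplification: it compares luminance only, uses the mean absolute pixel difference as the amount of change, and the threshold of 30 is an arbitrary illustrative value. Images are assumed to be equally sized grayscale arrays given as lists of rows.

```python
def environment_changed(prev, curr, threshold=30.0):
    """Compare two equally sized grayscale images (lists of rows, values 0..255)
    and report a change when the mean absolute luminance difference is at or
    above the threshold."""
    if len(prev) != len(curr) or any(len(p) != len(c) for p, c in zip(prev, curr)):
        raise ValueError("images must have the same dimensions")
    count = sum(len(row) for row in prev)
    if count == 0:
        raise ValueError("images must not be empty")
    total = sum(abs(p - c) for pr, cr in zip(prev, curr) for p, c in zip(pr, cr))
    return total / count >= threshold


office = [[100, 110], [120, 130]]
same_office = [[102, 108], [121, 131]]
street = [[200, 30], [10, 240]]
print(environment_changed(office, same_office))  # False
print(environment_changed(office, street))       # True
```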

If the user's surroundings are judged to have changed (step S91: YES), the (new) peripheral image information temporarily held in the RAM 102 is recorded in the flash memory 103 in step S93. At this time, image identification information for identifying the peripheral image information is assigned to it, and the peripheral image information is recorded in the flash memory 103 in association with this image identification information.

Next, in step S95, the image identification information of the peripheral image information recorded in the flash memory 103 in step S93 is recorded in the flash memory 103 in association with the audio information stored in step S83. This associates the newly recorded peripheral image information with the audio information.

If, on the other hand, the user's surroundings are judged not to have changed (step S91: NO), the image identification information of the (existing) peripheral image information most recently recorded in the flash memory 103, which is one of the two images compared in step S91, is recorded in the flash memory 103 in association with the audio information stored in step S83. This associates the most recently recorded peripheral image information with the audio information.

In step S99, the direction information generated in step S85 and recorded in the RAM 102, namely the horizontal angle α and the vertical angle β, is converted into coordinates (x, y) indicating a position in the peripheral image with which the audio information was associated in step S95 or step S97, and this position information (coordinates (x, y)) is recorded in the flash memory 103 in association with the previously stored audio information. The conversion from the face angles α and β to image coordinates (x, y) can be realized with the technique disclosed in Hirotake Yamazoe, Akira Utsumi, Kenichi Hosaka, Masahiko Yachida, "A body-mounted camera system for head-pose estimation and user-view image synthesis," Image and Vision Computing 25 (2007) 1848-1855 (available online at www.sciencedirect.com).
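
As a rough illustration of an angle-to-pixel conversion, the sketch below uses an ideal equidistant fisheye model with a 180° field of view. It is not the technique of the cited paper. It assumes that the optical axis of the camera coincides with the reference front of the virtual space, that α is positive to the right and β positive upward, and that the image circle has a known center and radius; the function name `angles_to_pixel` and all parameter names are hypothetical.

```python
import math


def angles_to_pixel(alpha_deg, beta_deg, cx, cy, radius):
    """Map a direction (alpha, beta) relative to the optical axis to pixel
    coordinates under an equidistant fisheye model with a 180-degree field
    of view whose image circle has the given center and radius."""
    a, b = math.radians(alpha_deg), math.radians(beta_deg)
    # Unit direction vector: x to the right, y up, z along the optical axis.
    dx, dy, dz = math.cos(b) * math.sin(a), math.sin(b), math.cos(b) * math.cos(a)
    theta = math.acos(max(-1.0, min(1.0, dz)))  # angle from the optical axis
    if theta > math.pi / 2:
        raise ValueError("direction is outside the 180-degree field of view")
    r = radius * theta / (math.pi / 2)          # equidistant: r proportional to theta
    phi = math.atan2(dy, dx)
    return cx + r * math.cos(phi), cy - r * math.sin(phi)  # image y grows downward


x, y = angles_to_pixel(0.0, 0.0, cx=320.0, cy=240.0, radius=240.0)
print(x, y)  # 320.0 240.0
```

Looking straight ahead maps to the image center, and directions 90° off the axis map to the edge of the image circle.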

Next, the processing executed by the CPU 100 of the PDA 11a when the audio recording/reproducing apparatus 10a is used to browse audio information visually and reproduce audio is described with reference to the flowchart of FIG. 16. The processing shown in FIG. 16 is only an example, and the order of the steps may be changed wherever the invention can still be carried out.

When the user performs an operation on the touch panel 105 of the PDA 11a to browse audio information, the CPU 100 displays, in step S101, all the peripheral images (layers) recorded in the flash memory 103 on the display 106 in three dimensions. As shown in FIG. 17, images recorded earlier in the flash memory 103 are displayed lower (farther back), and images recorded later are stacked on top of them, nearer the front. A sticky-note tab for selecting the image is displayed at the upper right of each peripheral image.

The order in which the peripheral image information was recorded in the flash memory 103 can be determined from the time information of the audio information associated with each piece of peripheral image information. As described earlier, however, information indicating the recording time may instead be recorded in association with the peripheral image information itself.

In step S103, icons 200 representing the audio information associated with the image identification information of each peripheral image are displayed on top of that peripheral image on the display 106. Each icon 200 is displayed at the position on the peripheral image specified by the position information (x and y coordinates) recorded in the flash memory 103 in association with the corresponding audio information.

Next, when the user performs an operation on the touch panel 105 to select a peripheral image (layer), that is, when the user selects (clicks or taps) one of the sticky-note tabs shown on the display 106, the CPU 100 determines in step S105 that a layer has been selected.

In step S107, only the selected peripheral image (layer) is displayed on the display 106 in three dimensions. In step S109, the icons 200 of the audio information attached to the selected peripheral image, that is, the audio information recorded in association with the image identification information of the selected peripheral image, are displayed on top of the peripheral image shown on the display 106. Here too, each icon 200 is displayed at a position based on the position information recorded in the flash memory 103 in association with the corresponding audio information.

With the peripheral image and the icons 200 on it displayed on the display 106 in this way, when the user clicks (taps) an icon 200 on the touch panel 105, the CPU 100 determines in step S111 that the icon 200 has been clicked (step S111: YES) and, in step S113, reproduces the audio information corresponding to the clicked icon 200.

When the user drags an icon 200 on the touch panel 105 to change its position, the CPU 100 determines in step S115 that a drag operation on the icon 200 has been performed (step S115: YES) and, in step S117, changes the position of the icon 200 on the peripheral image according to the user's drag operation. At this time, the position information (x coordinate, y coordinate) recorded in the flash memory 103 is also changed according to the new position of the icon 200.
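The tap and drag handling of steps S111 to S117 can be illustrated with a minimal Python sketch. The names `Icon`, `IconStore`, `hit_test`, and `drag`, the in-memory dictionary standing in for the flash memory 103, and the 16-pixel tap tolerance are all assumptions made for illustration; the specification does not prescribe any of them.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Icon:
    audio_id: str  # hypothetical key of the associated audio information
    x: float       # position on the peripheral image (pixels, assumed)
    y: float


class IconStore:
    """In-memory stand-in for the position information kept in flash memory."""

    def __init__(self, hit_radius: float = 16.0) -> None:
        self.icons: Dict[str, Icon] = {}
        self.hit_radius = hit_radius  # assumed tap tolerance in pixels

    def add(self, icon: Icon) -> None:
        self.icons[icon.audio_id] = icon

    def hit_test(self, x: float, y: float) -> Optional[Icon]:
        """Return the icon nearest to (x, y) within the hit radius, if any."""
        best, best_d2 = None, self.hit_radius ** 2
        for icon in self.icons.values():
            d2 = (icon.x - x) ** 2 + (icon.y - y) ** 2
            if d2 <= best_d2:
                best, best_d2 = icon, d2
        return best

    def drag(self, audio_id: str, new_x: float, new_y: float) -> None:
        """Move an icon and overwrite its stored (x, y), as in step S117."""
        icon = self.icons[audio_id]
        icon.x, icon.y = new_x, new_y
```

A tap would call `hit_test` and, if an icon is returned, start playback of the audio it refers to; a drag would call `drag` so that the stored coordinates follow the icon.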

Further, when the user drags the peripheral image on the touch panel 105 to change its orientation, the CPU 100 determines in step S119 that a drag operation on the peripheral image has been performed (step S119: YES) and changes the orientation of the peripheral image displayed on the display 106 according to the user's drag operation.

When the user performs an operation on the touch panel 105 to end browsing of the audio information, the CPU 100 determines in step S123 that the browsing end operation has been performed and ends the process.

<Fourth Embodiment>
In the audio recording / reproducing apparatus 10a of the third embodiment, the peripheral image to which audio information is added is updated when the environment around the user changes. In the audio recording / reproducing apparatus 10a of the fourth embodiment, the peripheral image to which audio information is added is also updated when a predetermined time has elapsed. The configuration of the audio recording / reproducing apparatus 10a is the same as in the third embodiment.

The processing executed by the CPU 100 of the PDA 11a when the audio recording / reproducing apparatus 10a records audio is described below with reference to the flowchart of FIG. 18. When the processing of FIG. 18 is executed and a voice uttered by the user is recorded in the PDA 11a, the flash memory 103 holds, for each piece of audio information, image identification information, audio information, position information, and time information, as shown in FIG. 6(d). Separately from these, image identification information and peripheral image information are also recorded in the flash memory 103. The processing of FIG. 18 is an example; the order of the processing steps may be changed wherever the invention can still be carried out.
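As a rough illustration of the two kinds of records described above, the following Python sketch models them as dataclasses. The class names, field names, and the use of file names for the audio and image data are assumptions; the specification only lists which items are stored together.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class PeripheralImageRecord:
    image_id: int    # image identification information
    image_file: str  # peripheral image information, e.g. a JPEG file name


@dataclass(frozen=True)
class AudioRecord:
    image_id: int                  # links to a PeripheralImageRecord
    audio_file: str                # audio information, e.g. an MP3 file name
    position: Tuple[float, float]  # (x, y) on the peripheral image
    recorded_at: datetime          # start time of the voice section


def audio_for_image(records: List[AudioRecord], image_id: int) -> List[AudioRecord]:
    """Return the audio records attached to one peripheral image, oldest first."""
    matches = [r for r in records if r.image_id == image_id]
    return sorted(matches, key=lambda r: r.recorded_at)
```

Keeping the image records separate and linking them by `image_id` lets several pieces of audio information share one peripheral image, which is what the branches of FIG. 18 rely on.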

First, in step S131 of FIG. 18, the CPU 100 of the PDA 11a determines whether there is audio input. The determination is based on the audio signal input from the bone conduction microphone 12 via the audio input I/F 107. The CPU 100 attempts to cut out a section containing speech (a voice section) from the audio signal input to the audio input I/F 107 and determines that there is audio input when a voice section can be cut out. The method of cutting out the voice section is the same as in the first embodiment and the others.

When a voice section is cut out from the audio signal and it is determined that there is audio input (step S131: YES), the CPU 100, in step S133, obtains the audio signal of the voice section and generates audio information. The generated audio information is saved (recorded) in the flash memory 103. The audio information is, for example, an MP3 (MPEG Audio Layer-3) file.

Next, in step S135, the direction in which the user's face is pointing in the virtual space is specified from the output of the triaxial geomagnetic sensor 14, and direction information is generated. That is, the CPU 100 specifies the horizontal angle (α) and the vertical angle (β) of the user's face relative to the reference of the virtual space. Here, the virtual space is the space represented by the peripheral image with a 180° angle of view captured by the camera 16, which has a fisheye lens. This direction information (horizontal angle α and vertical angle β) is recorded in the RAM 102.

In step S137, an image signal is acquired from the camera 16 via the image signal input I/F 110, peripheral image information for the direction in which the user's body is facing is generated, and it is temporarily recorded in the RAM 102. The peripheral image information is, for example, a JPEG file.

Next, in step S139, the current time (the time at which the audio information is recorded; more precisely, the start of the voice section) is acquired from the clock circuit 104 and recorded in the flash memory 103 in association with the audio information saved in step S133.

In step S141, the time information recorded in the flash memory 103 in step S139 is compared with the time information most recently recorded in the flash memory 103 for audio information associated with image identification information (that is, the time information indicating when audio information was last recorded). The CPU 100 determines whether, for example, 30 minutes or more have elapsed between the last recording of audio information in the flash memory 103 and the current time (the time indicated by the time information recorded in step S139; more precisely, the start time of the voice section).

If it is determined that 30 minutes or more have not elapsed (step S141: NO), the image identification information of the peripheral image information last (previously) recorded in the flash memory 103 is recorded in the flash memory 103 in association with the audio information saved in step S133. This associates the last recorded (existing) peripheral image information with the audio information.

If, on the other hand, it is determined that 30 minutes or more have elapsed (step S141: YES), it is further determined in step S145 whether, for example, 60 minutes or more have elapsed since audio information was last recorded in the flash memory 103.

If it is determined that 60 minutes or more have elapsed (step S145: YES), the peripheral image information temporarily held in the RAM 102 is recorded in the flash memory 103 in step S147. At this time, image identification information for identifying the peripheral image information is assigned to it, and the peripheral image information is recorded in the flash memory 103 in association with this image identification information.

Next, in step S149, the image identification information of the peripheral image information recorded in the flash memory 103 in step S147 is recorded in the flash memory 103 in association with the audio information saved in step S133, thereby associating the newly recorded peripheral image information with the audio information.

If, on the other hand, it is determined that 60 minutes or more have not elapsed (step S145: NO), in step S151 the peripheral image information generated in step S137 and temporarily held in the RAM 102 is compared with the peripheral image information last (previously) recorded in the flash memory 103 to determine whether the environment around the user has changed.

If it is determined that the environment around the user has changed (step S151: YES), the processing described above is executed in steps S147 and S149.

If, on the other hand, it is determined that the environment around the user has not changed (step S151: NO), the image identification information of the peripheral image information last (previously) recorded in the flash memory 103 (one of the two pieces of peripheral image information used in the determination of step S151) is recorded in the flash memory 103 in association with the audio information saved in step S133, thereby associating the last recorded (existing) peripheral image information with the audio information.
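The branching of steps S141, S145, and S151 can be summarized in a short Python sketch. The function name `needs_new_image`, the callable `environment_changed` standing in for the image comparison of step S151, and the handling of the very first recording are assumptions; the 30 and 60 minute values are the examples given in the text.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional

SHORT_GAP = timedelta(minutes=30)  # example threshold used in step S141
LONG_GAP = timedelta(minutes=60)   # example threshold used in step S145


def needs_new_image(
    now: datetime,
    last_recorded: Optional[datetime],
    environment_changed: Callable[[], bool],
) -> bool:
    """Decide whether the freshly captured peripheral image should be stored.

    True  -> store the new image and link the audio to it (S147, S149).
    False -> link the audio to the last stored image.
    """
    if last_recorded is None:
        # Assumption: with no earlier recording there is no image to reuse.
        return True
    elapsed = now - last_recorded
    if elapsed < SHORT_GAP:   # S141: NO
        return False
    if elapsed >= LONG_GAP:   # S145: YES
        return True
    # Between the two thresholds the image comparison decides (S151).
    return environment_changed()
```

Passing the comparison as a callable means the possibly expensive image comparison only runs in the 30 to 60 minute window, which matches the order of the checks in the flowchart.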

In step S155, the horizontal angle α and the vertical angle β, which are the direction information generated in step S135 and recorded in the RAM 102, are converted into coordinates (x, y) on the peripheral image (information) with which the audio information was associated in step S143, step S149, or step S153. This position information (coordinates (x, y)) is recorded in the flash memory 103 in association with the previously saved audio information.
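The text does not state how the angles (α, β) are mapped to image coordinates. The Python sketch below shows one possible mapping under explicit assumptions: an equidistant fisheye projection (image radius proportional to the angle from the optical axis), an image circle that fills the shorter image side and covers 90° off-axis, α positive to the right, β positive upward, and pixel coordinates with the origin at the top left. The function name `direction_to_pixel` is hypothetical.

```python
import math
from typing import Tuple


def direction_to_pixel(
    alpha_deg: float, beta_deg: float, width: int, height: int
) -> Tuple[float, float]:
    """Map horizontal angle alpha and vertical angle beta to (x, y) pixels.

    Assumes an equidistant fisheye image with a 180 degree angle of view.
    """
    alpha = math.radians(alpha_deg)
    beta = math.radians(beta_deg)
    # Unit direction vector: x to the right, y up, z along the optical axis.
    dx = math.cos(beta) * math.sin(alpha)
    dy = math.sin(beta)
    dz = math.cos(beta) * math.cos(alpha)
    theta = math.acos(max(-1.0, min(1.0, dz)))  # angle from the optical axis
    phi = math.atan2(dy, dx)                    # angle around the optical axis
    image_radius = min(width, height) / 2.0     # assumed image circle radius
    r = image_radius * theta / (math.pi / 2.0)  # equidistant projection
    cx, cy = width / 2.0, height / 2.0
    # Image y grows downward, so the upward component is subtracted.
    return cx + r * math.cos(phi), cy - r * math.sin(phi)
```

Under these assumptions a face pointing straight ahead maps to the image center, and directions more than 90° off the optical axis fall outside the image circle, so a real implementation would need to clamp or reject them.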

The processing executed by the CPU 100 of the PDA 11a when audio information is browsed visually and audio is reproduced on the audio recording / reproducing apparatus 10a is the same as the processing described for the third embodiment with reference to the flowchart of FIG. 16, so the description is not repeated.

<Other embodiments>
As another embodiment, layers defined by the layer identification information (layer number) of the second embodiment, whose distance from the user in the virtual space increases with time (called "time layers"), may be mixed with layers defined by the peripheral images of the third and fourth embodiments (called "image layers"). In this case, the peripheral image information is preferably stored in association with information on the time of capture (time information) and layer identification information (layer number). In this way, the set of layers consisting of time layers and image layers can be assigned layer numbers that run serially according to the times indicated by the time information, and can be handled as a single series of layers.
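The serial numbering of mixed layers can be sketched in Python as follows. The `Layer` class, the function `number_layers`, and the choice to give the most recent layer the number 1 are assumptions for illustration; the text only says that the layer numbers run serially according to the time information.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class Layer:
    kind: str          # "time" or "image"
    created: datetime  # time information associated with the layer


def number_layers(layers: List[Layer]) -> List[Tuple[int, Layer]]:
    """Sort time layers and image layers together and number them serially.

    Assumption: the most recent layer gets number 1 and older layers get
    larger numbers, mirroring layers that move away from the user over time.
    """
    ordered = sorted(layers, key=lambda layer: layer.created, reverse=True)
    return list(enumerate(ordered, start=1))
```

Because both kinds of layer carry a timestamp, a single sort is enough to interleave them into one sequence.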

Also in this case, the audio information associated with the peripheral image information is preferably recorded in association with information on the orientation of the user's face at the time of recording (direction information) and reproduced with sound source localization based on this direction information. Furthermore, when the audio information is browsed visually as in the third and fourth embodiments, a pseudo peripheral image prepared in advance may be displayed on the display 106 for each time layer, with the icons 200 corresponding to the audio information displayed on it.

The embodiments described above may be modified in various ways within the scope of the technical idea of the present invention. For example, in the embodiments above the orientation of the user's face is detected by the triaxial geomagnetic sensor alone, but it may instead be detected by a triaxial geomagnetic sensor together with a triaxial acceleration sensor. The orientation of the face may also be detected by processing an image of the user's face.

Further, although the direction in the virtual space is designated by the orientation of the user's face, it may instead be designated by the direction of the user's gaze or of the user's pointing. The direction of the user's gaze can be determined using the method proposed in H. Yamazoe, A. Utsumi, T. Yonezawa, S. Abe, "Remote Gaze Direction Estimation with a Single Camera Based on Facial-Feature Tracking," ETRA2008, pp. 245-250, 2008.

Although the bone conduction microphone 12 is used for the user's audio input, a NAM (non-audible murmur) microphone may be used instead. A NAM microphone can pick up unvoiced breath sounds, which are produced without vocal cord vibration and articulated by the acoustic filter characteristics of the moving speech organs, as they are conducted mainly through the soft tissue of the head. With a NAM microphone, audio can be input to the audio recording / reproducing apparatus 10, 10a without hesitation even in places where the user is conscious of others, such as on a train.

To input audio without hesitation even in places where the user is conscious of others, a throat microphone, which is worn at the throat and converts vocal cord vibration into audio, may also be used.

In the embodiments above, each layer is an arc or a hemisphere, and audio information comes to belong to layers farther from the user as time passes. However, a layer need not be an arc or hemisphere centered on the user, and the layers need not be arranged concentrically at increasing distances from the user. That is, the layers may be planes positioned at equal distances around the user, or a layer may have no spatial extent at all and be merely a conceptual unit for grouping the audio information that belongs to it.

An illustrative view showing an example of how the audio recording / reproducing apparatus of the present invention is used.
A block diagram showing an example of the configuration of the portable information terminal constituting the audio recording / reproducing apparatus of the present invention.
A flowchart showing an example of the processing executed by the CPU of the portable information terminal when recording audio.
A flowchart showing an example of the processing executed by the CPU of the portable information terminal when reproducing audio.
An illustrative view explaining the direction in which the user's face is pointing in the virtual space.
An illustrative view showing an example of the structure of the data recorded in the flash memory of the portable information terminal.
An illustrative view explaining sound source localization of audio information.
An illustrative view explaining reproduction of audio information.
An illustrative view explaining reproduction of audio information.
A flowchart showing an example of the processing executed by the CPU of the portable information terminal when recording audio.
A flowchart showing an example of the processing executed by the CPU of the portable information terminal when reproducing audio.
An illustrative view explaining the direction in which the user's face is pointing in the virtual space.
An illustrative view showing an example of how the audio recording / reproducing apparatus of the present invention is used.
A block diagram showing an example of the configuration of the portable information terminal constituting the audio recording / reproducing apparatus of the present invention.
A flowchart showing an example of the processing executed by the CPU of the portable information terminal when recording audio.
A flowchart showing an example of the processing executed by the CPU of the portable information terminal when browsing and reproducing audio information.
An illustrative view showing an example of the display on the display of the portable information terminal.
A flowchart showing an example of the processing executed by the CPU of the portable information terminal when recording audio.

Explanation of symbols

10 ... Audio recording / reproducing apparatus
11 ... PDA
12 ... Bone conduction microphone
13 ... Stereo headphones
14 ... Triaxial geomagnetic sensor
16 ... Camera
100 ... CPU
103 ... Flash memory
104 ... Clock circuit

Claims

An audio recording / reproducing device for recording audio information in a recording device and reproducing the audio information from the recording device,
Voice information generating means for generating voice information based on the voice emitted by the user;
Recording direction specifying means for specifying the recording direction indicated by the user using the body during recording,
Recording means for recording the sound information generated by the sound information generating means in the recording device in association with the direction information indicating the recording direction;
Playback direction specifying means for specifying the playback direction indicated by the user using the body during playback, and the voice recorded in the recording device in association with the direction information indicating the recording direction corresponding to the playback direction An audio recording / reproducing apparatus comprising first audio reproducing means for reproducing information from the recording apparatus.

The audio recording / reproducing apparatus according to claim 1, wherein the first audio reproducing unit reproduces the audio information so that sound source localization is possible based on the direction information and distance information indicating a distance from the user to the audio information.

The audio recording / reproducing apparatus according to claim 1, further comprising: audio icon recording means for recording, in the recording device, a symbolized audio icon indicating the presence of audio information, in association with the audio information and the direction information; and audio icon reproducing means for reproducing the audio icon recorded in the recording device in association with audio information that is recorded in the recording device in association with direction information indicating a recording direction not corresponding to the playback direction.

The audio recording / reproducing apparatus according to claim 3, wherein the audio icon reproducing means generates the audio icon so that sound source localization is possible based on the direction information and distance information indicating a distance from the user to the audio information.

The audio recording / reproducing apparatus according to claim 1, wherein the recording means further records the audio information in association with layer identification information indicating one of two or more layers of different depth, the apparatus further comprises layer designation accepting means for accepting designation of a layer, and
the first audio reproducing means reproduces, from among the audio information recorded in the recording device in association with the direction information indicating the recording direction corresponding to the playback direction, the audio information associated with the layer identification information indicating the layer accepted by the layer designation accepting means.

The audio recording / reproducing apparatus according to claim 5, wherein the distance from the user to the audio information is determined based on a layer depth indicated by the layer identification information.

The audio recording / reproducing apparatus according to any one of the above, wherein the playback direction specifying means specifies any one of the direction indicated by the orientation of the user's face, the direction indicated by the user's gaze, and the direction indicated by the user's pointing.

The audio recording / reproducing apparatus according to any one of the above, wherein the recording direction specifying means specifies any one of the direction indicated by the orientation of the user's face, the direction indicated by the user's gaze, and the direction indicated by the user's pointing.

A program executed by a voice recording / reproducing computer for recording voice information in a recording device and reproducing the voice information from the recording device, wherein the program
Voice information generating means for generating voice information based on the voice emitted by the user;
Recording direction specifying means for specifying the recording direction indicated by the user using the body during recording,
Recording means for recording the sound information generated by the sound information generating means in the recording device in association with the direction information indicating the recording direction;
Playback direction specifying means for specifying the playback direction indicated by the user using the body during playback,
An audio recording / reproducing program for causing audio information recorded in the recording device to function as audio reproducing means for reproducing from the recording device in association with direction information indicating a recording time direction corresponding to the reproducing time direction.

An audio recording / playback computer-executed method for recording audio information on a recording device and reproducing the audio information from the recording device,
A voice information generation step for generating voice information based on the voice emitted by the user;
Recording direction specifying step for specifying the recording direction indicated by the user using the body during recording,
A recording step of recording the audio information generated in the audio information generation step in the recording device in association with the direction information indicating the recording direction;
A playback direction specifying step for specifying the playback direction indicated by the user using the body during playback, and the voice recorded in the recording device in association with the direction information indicating the recording direction corresponding to the playback direction. An audio recording / reproducing method including an audio reproducing step of reproducing information from the recording device.

The audio recording / reproducing apparatus according to any one of claims 1 to 8, further comprising: an imaging unit that captures a peripheral image of the user at the time of recording; and a conversion unit that converts the recording direction into position information on the peripheral image captured by the imaging unit,
wherein the recording means records the audio information in association with the peripheral image and the position information.

The audio recording / reproducing apparatus according to claim 11, further comprising first environment change determination means for determining whether the user's environment has changed by comparing the peripheral image captured by the imaging unit with the peripheral image associated with the audio information last recorded in the recording device,
wherein the recording means records the audio information in association with the peripheral image captured by the imaging unit when the first environment change determination means determines that the environment has changed, and records the audio information in association with the peripheral image associated with the audio information last recorded in the recording device when the first environment change determination means determines that the environment has not changed.

Time elapse judging means for judging whether or not a predetermined time has elapsed since the last recording of the audio information in the recording device;
A second environment change determination for determining whether or not the environment of the user has changed by comparing the peripheral image captured by the image capturing unit with the peripheral image associated with the audio information recorded last in the recording device. Further comprising means,
The recording unit records audio information in association with a peripheral image captured by the imaging unit based on a determination result of at least one of the time passage determination unit and the second environment change determination unit, or audio The audio recording / reproducing apparatus according to claim 11, wherein the information is recorded in association with a peripheral image to which audio information recorded last in the recording apparatus is associated.

Display means for displaying a peripheral image recorded in the recording device and for displaying, at a position on the peripheral image based on the position information of the audio information, an icon indicating audio information recorded in association with the peripheral image;
Icon designation means for accepting designation of an icon displayed on the display means;
The audio recording / reproducing apparatus according to claim 11, further comprising second audio reproducing means for reproducing audio information recorded in the recording apparatus indicated by the icon that has been designated by the icon designating means.

The display means displays a plurality of peripheral images recorded in the recording means,
Further comprising image designation means for accepting designation of a peripheral image displayed on the display means,
The audio recording / reproducing apparatus according to claim 14, wherein the icon designating unit accepts designation of an icon displayed on a peripheral image for which designation is accepted by the image designating unit.

The audio recording / reproducing apparatus according to claim 14 or 15, further comprising: position designation means for accepting designation of a changed position of an icon on a peripheral image displayed on the display means; and
changing means for changing the position information recorded in the recording device in association with the audio information of the icon, based on the changed position accepted by the position designation means.