JP2005234074A - Apparatus and method for information processing, recording medium, and program

Info

Publication number: JP2005234074A
Application number: JP2004040908A
Authority: JP
Inventors: Mari Kuragami (倉上 麻里); Jun Kishigami (岸上 純)
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2004-02-18
Filing date: 2004-02-18
Publication date: 2005-09-02

Abstract

PROBLEM TO BE SOLVED: To provide an apparatus and method for information processing, a recording medium, and a program that make effective use of environmental sound, which is generally regarded as noise, and thereby provide more suitable information to a user. SOLUTION: A sound source specification part 15 specifies the sound source of a collected sound, and an information acquisition part 16 acquires information relating to the collected sound from an information database 17 based upon information representing the sound source specified by the sound source specification part 15. An image editing part 18 edits a photographed image by using the information acquired by the information acquisition part 16, and a recording control part 19 records the image edited by the image editing part 18 in a recording part 20 together with the collected sound. Information relating to an inputted sound can thus be acquired based upon that sound, so that more suitable information can be provided to the user. This invention is applicable, for example, to a video camera which records a photographed image and a collected sound. COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to an information processing apparatus, an information processing method, a recording medium, and a program, and in particular to an information processing apparatus, an information processing method, a recording medium, and a program that make effective use of environmental sound, which is generally regarded as noise.

In language speech recognition, for example, noise is removed from the input speech, acoustic analysis such as spectrum analysis is performed, and the language speech is recognized using an acoustic model and a language model. The acoustic model is usually a statistical model of the distribution of speech feature patterns in units of phonemes, and the hidden Markov model (HMM) is the mainstream choice. The language model defines the connection relationships between words. A commonly used language model is a statistical model that extracts word chain statistics from a text database and gives the connection relationship between words as a probability value (see, for example, Non-Patent Document 1).
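As a rough illustration of the word chain statistics mentioned above, the following sketch builds a bigram model from a tiny tokenized corpus and reads off the probability that one word follows another. The corpus, the function names, and the choice of unsmoothed relative frequencies are all assumptions made for this example; the cited literature may use other formulations.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count word-pair (bigram) statistics from tokenized sentences."""
    pair_counts = defaultdict(Counter)
    for words in sentences:
        for prev, nxt in zip(words, words[1:]):
            pair_counts[prev][nxt] += 1
    return pair_counts


def connection_probability(pair_counts, prev, nxt):
    """Estimate P(nxt | prev) as a relative frequency (no smoothing)."""
    total = sum(pair_counts[prev].values())
    return pair_counts[prev][nxt] / total if total else 0.0


# Hypothetical toy "text database", invented for this sketch.
corpus = [
    ["the", "roller", "coaster", "is", "fast"],
    ["the", "roller", "coaster", "is", "loud"],
    ["the", "park", "is", "loud"],
]
model = train_bigram(corpus)
```

Here `connection_probability(model, "the", "roller")` is the fraction of occurrences of "the" that are followed by "roller" in the toy corpus.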

One method for removing noise from speech, that is, for distinguishing noise from language speech, uses LPC (linear prediction) analysis.

Meanwhile, images can now be edited using, for example, an image editing function installed on a PC (personal computer). Images recorded in a flash memory or the like can also be edited using an image editing function built into the video camera itself.

However, this presupposes that the user learns the function and the editing operations of the device. In addition, editing takes time and can therefore be troublesome for the user.

Accordingly, a camera device has been proposed that, as a form of image editing, combines text data obtained by speech recognition with a captured image (see, for example, Patent Document 1).

Non-Patent Document 1: IPSJ Magazine, Vol. 41, No. 4, Apr. 2000, pp. 436-439

Patent Document 1: JP 2001-275031 A

Conventional speech recognition has focused on recognizing language speech; sounds other than language speech have been treated as noise and targeted for removal, so such sounds (environmental sounds) have not been put to effective use.

The present invention has been made in view of this situation, and makes it possible to provide more suitable information to the user by making effective use of environmental sound.

An information processing apparatus of the present invention comprises sound source identification means for identifying the sound source of an input sound, and information acquisition means for acquiring information related to the sound based on sound source information indicating the sound source identified by the sound source identification means.

The information processing apparatus of the present invention may further comprise recording means for recording the information acquired by the information acquisition means together with the sound.

The information processing apparatus of the present invention may further comprise language speech recognition means for recognizing language speech contained in the sound and outputting text data corresponding to that language speech, and the recording means may also record the text data together with the sound.

The information processing apparatus of the present invention may further comprise imaging means for capturing an image, image editing means for editing the image acquired by the imaging means using the information acquired by the information acquisition means, and recording means for recording the edited image obtained by the image editing means together with the sound.

The information processing apparatus of the present invention may further comprise language speech recognition means for recognizing language speech contained in the sound and outputting text data corresponding to that language speech, and the image editing means may edit the image using the text data as well.

The information processing apparatus of the present invention may further comprise mail creation means for creating e-mail data using the information acquired by the information acquisition means.

The information processing apparatus of the present invention may further comprise language speech recognition means for recognizing language speech contained in the sound and outputting text data corresponding to that language speech, and the mail creation means may create the e-mail data using the text data as well.

The information processing apparatus of the present invention may further comprise display means for displaying the information acquired by the information acquisition means.

The sound source identification means may identify the sound source by having an external device connected via a network process the sound.

The information acquisition means may acquire the information by communicating via a network.

An information processing method of the present invention includes a sound source identification step of identifying the sound source of an input sound, and an information acquisition step of acquiring information related to the sound based on sound source information indicating the sound source identified in the sound source identification step.

A program recorded on a recording medium of the present invention includes a sound source identification step of identifying the sound source of an input sound, and an information acquisition step of acquiring information related to the sound based on sound source information indicating the sound source identified in the sound source identification step.

A program of the present invention causes a computer to execute a sound source identification step of identifying the sound source of an input sound, and an information acquisition step of acquiring information related to the sound based on sound source information indicating the sound source identified in the sound source identification step.

In the present invention, the sound source of an input sound is identified, and information related to the input sound is acquired based on sound source information indicating that sound source.

Here, a network is a mechanism in which at least two devices are connected so that information can be transmitted from one device to another. The devices communicating via the network may be independent devices, or may be internal blocks that make up a single device.

Communication includes wireless communication and wired communication, and also communication in which the two are mixed, that is, wireless communication in some sections and wired communication in others. Communication from one device to another may also be wired while communication in the reverse direction is wireless.

According to the present invention, information related to an input sound can be acquired based on that sound, and more suitable information can be provided to the user.

Embodiments of the present invention are described below. The correspondence between the constituent features recited in the claims and the specific examples in the embodiments is illustrated as follows. This description is intended to confirm that specific examples supporting the claimed invention are described in the embodiments. Accordingly, even if a specific example is described in the embodiments but is not listed here as corresponding to a constituent feature, that does not mean the specific example does not correspond to that constituent feature. Conversely, even if a specific example is listed here as corresponding to a constituent feature, that does not mean the specific example does not correspond to other constituent features.

Furthermore, this description does not mean that every invention corresponding to the specific examples described in the embodiments is recited in the claims. In other words, this description does not deny the existence of inventions that correspond to specific examples described in the embodiments but are not recited in the claims of this application, that is, inventions that may in the future be filed as divisional applications or added by amendment.

The information processing apparatus according to claim 1 comprises sound source identification means (for example, the sound source identification unit 15 in FIG. 1) for identifying the sound source of an input sound, and information acquisition means (for example, the information acquisition unit 16 in FIG. 1) for acquiring information related to the sound based on sound source information indicating the sound source identified by the sound source identification means.

The information processing apparatus according to claim 2 further comprises recording means (for example, the recording control unit 19 in FIG. 1) for recording the information acquired by the information acquisition means (for example, the information acquisition unit 16 in FIG. 1) together with the sound.

The information processing apparatus according to claim 3 further comprises language speech recognition means (for example, the language speech recognition unit 14 in FIG. 1) for recognizing language speech contained in the sound and outputting text data corresponding to that language speech, and the recording means (for example, the recording control unit 19 in FIG. 1) also records the text data together with the sound.

The information processing apparatus according to claim 4 further comprises imaging means (for example, the image input unit 12 in FIG. 1) for capturing an image, image editing means (for example, the image editing unit 18 in FIG. 1) for editing the image acquired by the imaging means using the information acquired by the information acquisition means (for example, the information acquisition unit 16 in FIG. 1), and recording means (for example, the recording control unit 19 in FIG. 1) for recording the edited image obtained by the image editing means together with the sound.

The information processing apparatus according to claim 5 further comprises language speech recognition means (for example, the language speech recognition unit 14 in FIG. 1) for recognizing language speech contained in the sound and outputting text data corresponding to that language speech, and the image editing means (for example, the image editing unit 18 in FIG. 1) edits the image using the text data as well.

The information processing apparatus according to claim 6 further comprises mail creation means (for example, the mail creation unit 51 in FIG. 11) for creating e-mail data using the information acquired by the information acquisition means (for example, the information acquisition unit 16 in FIG. 11).

The information processing apparatus according to claim 7 further comprises language speech recognition means (for example, the language speech recognition unit 14 in FIG. 11) for recognizing language speech contained in the sound and outputting text data corresponding to that language speech, and the mail creation means (for example, the mail creation unit 51 in FIG. 11) creates the e-mail data using the text data as well.

The information processing apparatus according to claim 8 further comprises display means (for example, the display control unit 81 in FIG. 17) for displaying the information acquired by the information acquisition means (for example, the information acquisition unit 16 in FIG. 17).

In the information processing apparatus according to claim 9, the sound source identification means (for example, the sound source identification unit 15 in FIG. 1) identifies the sound source by having an external device connected via a network process the sound.

In the information processing apparatus according to claim 10, the information acquisition means (for example, the information acquisition unit 16 in FIG. 1) acquires the information by communicating via a network.

The information processing method according to claim 11 includes a sound source identification step (for example, the processing of step S3 in FIG. 8) of identifying the sound source of an input sound, and an information acquisition step (for example, the processing of step S4 in FIG. 8) of acquiring information related to the sound based on sound source information indicating the sound source identified in the sound source identification step.

The program recorded on the recording medium according to claim 12 and the program according to claim 13 include a sound source identification step (for example, the processing of step S3 in FIG. 8) of identifying the sound source of an input sound, and an information acquisition step (for example, the processing of step S4 in FIG. 8) of acquiring information related to the sound based on sound source information indicating the sound source identified in the sound source identification step.

This program can be recorded on a recording medium (for example, the magnetic disk 111 in FIG. 20).

FIG. 1 is a block diagram showing a configuration example of an embodiment of a video camera to which the present invention is applied.

The video camera 10 comprises a sound input unit 11, an image input unit 12, a sound recognition unit 13, an information acquisition unit 16, an information database 17, an image editing unit 18, a recording control unit 19, and a recording medium 20.

The sound input unit 11 acquires sound data by collecting sound, and supplies the sound data to the language speech recognition unit 14, the sound source identification unit 15, and the recording control unit 19. The sound input unit 11 is, for example, a microphone.

The image input unit 12 acquires captured image data by imaging a subject, and supplies it to the image editing unit 18. The image input unit 12 is, for example, a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) imager.

The sound recognition unit 13 comprises the language speech recognition unit 14 and the sound source identification unit 15.

The language speech recognition unit 14 performs speech recognition on the language speech contained in the sound data supplied from the sound input unit 11, and supplies text data corresponding to that language speech to the image editing unit 18.

The sound source identification unit 15 identifies the sound source of the sound contained in the sound data supplied from the sound input unit 11, and supplies sound source information representing that sound source to the information acquisition unit 16.

Based on the sound source information supplied from the sound source identification unit 15, the information acquisition unit 16 acquires, from the information database 17, information related to the sound input to the sound input unit 11, and supplies it to the image editing unit 18.

The information database 17 stores information related to various sounds in association with sound source information (for example, a keyword such as "roller coaster"). The stored information includes, for example, images and text directly related to a particular sound (sound source), as well as images and text indirectly related to that sound, such as an image that effectively conveys the scene in which the sound occurs (for example, an effect image used to apply a visual effect).
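The association between sound source information and related information can be pictured as a keyed lookup. The sketch below is only one possible layout: the `RelatedInfo` fields, the file names, and the `acquire_info` helper are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class RelatedInfo:
    """Hypothetical record of what the database holds for one sound source."""
    texts: list = field(default_factory=list)          # directly related text
    direct_images: list = field(default_factory=list)  # directly related images
    effect_images: list = field(default_factory=list)  # indirectly related effect images


# Hypothetical contents, keyed by a sound source keyword.
INFO_DB = {
    "roller coaster": RelatedInfo(
        texts=["Roller coaster"],
        direct_images=["coaster_icon.png"],
        effect_images=["speed_lines.png"],
    ),
}


def acquire_info(source_keywords, db=INFO_DB):
    """Return the stored entry for every identified source that has one."""
    return {key: db[key] for key in source_keywords if key in db}
```

Sources without an entry are simply skipped in this sketch; a real device might instead fall back to a default or query a networked database.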

The image editing unit 18 edits the captured image data supplied from the image input unit 12. Specifically, the image editing unit 18 composites the text data supplied from the language speech recognition unit 14, or the information supplied from the information acquisition unit 16, onto the captured image data from the image input unit 12, and supplies the resulting composite image data to the recording control unit 19.

The recording control unit 19 controls the recording, on the recording medium 20, of the sound data supplied from the sound input unit 11 and the composite image data supplied from the image editing unit 18.

The recording medium 20 is, for example, a magnetic tape, a magnetic disk, an optical disk, a magneto-optical disk, or a flash memory.

FIG. 2 shows a configuration example of the sound source identification unit 15 of FIG. 1. The sound source identification unit 15 comprises an FFT processing unit 31, a data comparison unit 32, and a sound source identification database 33.

The FFT processing unit 31 applies FFT (Fast Fourier Transform) processing to the sound data supplied from the sound input unit 11 to obtain a frequency spectrum, and supplies the spectrum to the data comparison unit 32.

The data comparison unit 32 compares the frequency spectrum supplied from the FFT processing unit 31 with the frequency spectra stored in the sound source identification database 33, and outputs sound source information representing the sound source of the sound data supplied from the sound input unit 11.

The sound source identification database 33 stores frequency spectrum models corresponding to various sound sources. The sound source identification database 33 may additionally store data on reflected sound, so that the location (environment), such as indoors, outdoors, or inside a vehicle, can be identified.

An overview of the processing of the sound source identification unit 15 is given with reference to FIG. 3. For example, when a roller coaster is filmed at an amusement park with the video camera 10 of FIG. 1, the sound input unit 11 collects the wheel noise of the roller coaster and other sounds. The sound source identification unit 15 determines that the sound collected by the sound input unit 11 (hereinafter referred to as the collected sound where appropriate) contains, as sound sources, for example, the roller coaster wheel noise, passengers' screams, the amusement park's background music, other amusement park sounds, and the camera operator's narration, as shown in FIG. 3, and supplies sound source information representing each of these sound sources to the information acquisition unit 16.

The sound source identification unit 15 identifies a sound source by, for example, comparing frequency spectra as follows.

That is, the sound source identification unit 15 identifies the sound source by comparing the frequency spectrum obtained by applying FFT (Fast Fourier Transform) processing to the sound data from the sound input unit 11 with the frequency spectra (models) stored in the sound source identification database 33.

FIG. 4 shows a frequency spectrum obtained by applying FFT processing to the sound of a certain sound source. In FIG. 4, the horizontal axis represents frequency and the vertical axis represents intensity. The sound source identification unit 15 identifies the sound source by, for example, comparing the prominent frequency components in the frequency spectrum of FIG. 4 with the frequency spectra corresponding to various sound sources stored in the sound source identification database 33.
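The comparison of prominent frequency components can be sketched as follows. This is an illustrative reading, not the method of the specification: a naive DFT stands in for the FFT (it yields the same magnitudes, only more slowly), each model is reduced to a hypothetical list of peak bins, and the match score is simply the number of shared peaks. All names and model values are assumptions.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (an FFT gives the same values faster)."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def prominent_bins(spectrum, count=2):
    """Indices of the `count` strongest bins, in ascending order."""
    ranked = sorted(range(len(spectrum)), key=lambda k: spectrum[k], reverse=True)
    return sorted(ranked[:count])


def identify_source(samples, models, count=2):
    """Pick the model whose stored peak bins overlap most with the input's peaks."""
    peaks = set(prominent_bins(magnitude_spectrum(samples), count))
    return max(models, key=lambda name: len(peaks & set(models[name])))


def tone(bins, n=128):
    """Synthetic test signal: a sum of sines at the given DFT bins."""
    return [sum(math.sin(2 * math.pi * b * t / n) for b in bins) for t in range(n)]


# Hypothetical models: peak bins assumed for two sound sources.
MODELS = {"wheel noise": [5, 12], "scream": [30, 45]}
```

For instance, `identify_source(tone([5, 12]), MODELS)` selects the "wheel noise" entry because both of its peak bins coincide with the strongest bins of the synthetic signal.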

The comparison of frequency components (frequency spectra) may also be done in a simplified way, for example by dividing the audible frequency band into 16 or 32 bands and comparing band by band.
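One way to picture the simplified band-by-band comparison is to sum the spectrum into a fixed number of equal-width bands and compare the resulting short vectors. The sketch assumes the input spectrum already covers only the audible range, and the equal-width split, normalisation, and Euclidean distance are choices made for illustration rather than details given in the text.

```python
def band_energies(spectrum, bands=16):
    """Sum spectrum magnitudes within `bands` equal-width frequency bands."""
    width = len(spectrum) / bands
    energies = [0.0] * bands
    for k, value in enumerate(spectrum):
        energies[min(int(k / width), bands - 1)] += value
    return energies


def band_distance(a, b):
    """Euclidean distance between two normalised band-energy vectors."""
    def normalise(v):
        total = sum(v) or 1.0
        return [x / total for x in v]
    return sum((x - y) ** 2 for x, y in zip(normalise(a), normalise(b))) ** 0.5


def closest_model(spectrum, model_bands, bands=16):
    """Name of the model whose band energies are nearest to the observed ones."""
    observed = band_energies(spectrum, bands)
    return min(model_bands, key=lambda name: band_distance(observed, model_bands[name]))
```

Comparing 16 or 32 numbers per frame instead of every spectral bin trades frequency resolution for a much cheaper comparison.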

The sound source identification unit 15 can also identify a sound source from the change of the frequency spectrum over time. For example, it extracts from the sound supplied by the sound input unit 11 the feature that the frequency at which the spectrum reaches a particular level rises or falls as time passes, and identifies the sound source by comparing that feature with the frequency spectra stored in the sound source identification database 33. FIG. 5 shows a frequency spectrum in which the frequency at a particular level rises.
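A rising or falling frequency feature could be extracted along the following lines. As a simplifying assumption, the sketch tracks the strongest bin of each frame rather than the frequency at a particular level, and it labels the track only when every step moves in the same direction; the function names and the `tolerance` parameter are hypothetical.

```python
def dominant_bin(spectrum):
    """Index of the strongest frequency bin in one frame's spectrum."""
    return max(range(len(spectrum)), key=lambda k: spectrum[k])


def frequency_trend(frames, tolerance=0):
    """Classify the dominant-frequency track as 'rising', 'falling', or 'other'."""
    track = [dominant_bin(frame) for frame in frames]
    steps = [later - earlier for earlier, later in zip(track, track[1:])]
    if steps and all(step > tolerance for step in steps):
        return "rising"
    if steps and all(step < -tolerance for step in steps):
        return "falling"
    return "other"
```

The resulting label could then be matched against a trend stored with each model, for example a siren-like source that is expected to rise.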

The sound source identification unit 15 can also identify a sound source by comparing the change over time of the frequency spectrum of the sound from the sound input unit 11, together with the distribution and intensity of its harmonic components, as shown in FIG. 6, with the frequency spectra stored in the sound source identification database 33.

The sound source identification unit 15 can also identify a sound source by comparing the change over time of the frequency spectrum of the sound from the sound input unit 11, together with its attenuation characteristic over time, as shown in FIG. 7, with the frequency spectra stored in the sound source identification database 33.
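An attenuation characteristic can be summarised, for illustration, as a single decay rate. The sketch below fits a straight line to the logarithm of per-frame energy and compares the slope with a model value; treating the decay as exponential, and the `tolerance` threshold, are assumptions of this example rather than details from the specification.

```python
import math


def decay_rate(frame_energies, frame_seconds):
    """Least-squares slope of log-energy over time, in 1/s (negative means decaying).

    Assumes at least two frames and strictly positive energies.
    """
    times = [i * frame_seconds for i in range(len(frame_energies))]
    logs = [math.log(energy) for energy in frame_energies]
    mean_t = sum(times) / len(times)
    mean_l = sum(logs) / len(logs)
    num = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    den = sum((t - mean_t) ** 2 for t in times)
    return num / den


def matches_decay(frame_energies, frame_seconds, model_rate, tolerance=0.2):
    """True if the measured decay rate is within a relative tolerance of the model's."""
    measured = decay_rate(frame_energies, frame_seconds)
    return abs(measured - model_rate) <= tolerance * abs(model_rate)
```

A struck or plucked source would show a clearly negative rate, while a sustained source such as background music would stay near zero.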

FIG. 8 is a flowchart explaining the shooting and recording process of the video camera 10 of FIG. 1.

In step S1, the sound input unit 11 acquires sound data by collecting sound, and supplies it to the language speech recognition unit 14, the sound source identification unit 15, and the recording control unit 19. The image input unit 12 acquires captured image data by imaging a subject, and supplies it to the image editing unit 18.

The process proceeds from step S1 to step S2, where the language speech recognition unit 14 converts the sound data supplied from the sound input unit 11 into corresponding text data using a general language speech recognition technique, and supplies the text data to the image editing unit 18. The process then proceeds to step S3.

In step S3, the FFT processing unit 31 of the sound source identification unit 15 (FIG. 2) applies FFT processing to the sound data supplied from the sound input unit 11, converts the sound data into a frequency spectrum, and supplies the spectrum to the data comparison unit 32.

Also in step S3, the data comparison unit 32 compares the frequency spectrum supplied from the FFT processing unit 31 with the frequency spectra (models) corresponding to various sound sources stored in the sound source identification database 33, and identifies the sound source of the sound input to the sound input unit 11 based on the comparison result. The data comparison unit 32 supplies sound source information indicating the identified sound source to the information acquisition unit 16, and the process proceeds to step S4.

ステップS４において、情報取得部１６は、音源特定部１５（のデータ比較部３２）から供給された音源情報に基づいて、情報データベース１７から、集音音声（音声入力部１１に入力された音声）に関連する情報を取得し、画像編集部１８に供給する。 In step S4, the information acquisition unit 16 acquires, from the information database 17, information related to the collected sound (the sound input to the sound input unit 11) based on the sound source information supplied from (the data comparison unit 32 of) the sound source identification unit 15, and supplies the information to the image editing unit 18.

ステップS４からステップS５へ進み、画像編集部１８は、画像入力部１２から供給された撮影画像データに対して、言語音認識部１４から供給されたテキストデータ、および情報取得部１６から供給された情報を合成し、その合成画像データを記録制御部１９に供給して、ステップS６へ進む。 Proceeding from step S4 to step S5, the image editing unit 18 combines the text data supplied from the language speech recognition unit 14 and the information supplied from the information acquisition unit 16 with the captured image data supplied from the image input unit 12, supplies the resulting composite image data to the recording control unit 19, and proceeds to step S6.

ステップS６において、記録制御部１９は、音声入力部１１から供給された音声データと画像編集部１８から供給された合成画像データを、ともに記録媒体２０へ記録し、ステップS１に戻り撮影記録処理を繰り返す。 In step S6, the recording control unit 19 records both the audio data supplied from the audio input unit 11 and the composite image data supplied from the image editing unit 18 on the recording medium 20, and the process returns to step S1 to repeat the shooting and recording processing.
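One pass through steps S2 to S6 can be sketched as a single function that wires the units together. This is only an illustration: the recognizer, the sound source identifier, and the database are passed in as hypothetical callables and a plain dictionary, and the "composite image" is represented as a dictionary rather than real pixel data.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingMedium:
    """Hypothetical stand-in for the recording medium 20."""
    records: list = field(default_factory=list)

    def record(self, audio, composite):
        self.records.append((audio, composite))


def capture_step(audio, image, recognize_speech, identify_source, info_db, medium):
    """Process one captured (audio, image) pair, mirroring steps S2 to S6."""
    text = recognize_speech(audio)        # S2: speech -> text
    source = identify_source(audio)       # S3: identify the sound source
    info = info_db.get(source)            # S4: look up related information
    composite = {                         # S5: combine image, text, and info
        "image": image,
        "telop": text,
        "overlay": info,
    }
    medium.record(audio, composite)       # S6: record audio with the composite
    return composite
```

Calling `capture_step` repeatedly for each new audio and image pair corresponds to returning to step S1 after step S6.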

なお、上述の場合には、撮影画像データに対して編集としての合成を行った合成画像データを記録するようにしたが、撮影画像データに合成する情報やテキスト（テロップ）データは、撮影画像データおよび音声データに関連付けて記録するようにし、再生時に、撮影画像データに合成するようにしても良い。この場合、ユーザ操作により、撮影画像データ、または合成画像データの再生を選択することができる。 In the case described above, composite image data obtained by combining information with the captured image data as an edit is recorded. Alternatively, the information and text (telop) data to be combined with the captured image data may be recorded in association with the captured image data and the audio data, and combined with the captured image data at playback. In this case, the user can select, by an operation, whether to play back the captured image data or the composite image data.
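The alternative of recording the overlay separately and combining it only at playback can be sketched as follows. The `Clip` structure and the `render` function are hypothetical names used for illustration; the text only states that the information is recorded in association with the image and audio data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Clip:
    """Captured data with the overlay stored alongside it, not baked in."""
    image: str
    audio: str
    telop: Optional[str] = None     # recognized text
    overlay: Optional[str] = None   # information from the database


def render(clip, composite):
    """Return what is shown at playback.

    composite=False plays back the captured image as shot;
    composite=True combines the associated telop and overlay on the fly.
    """
    if not composite:
        return {"image": clip.image}
    return {"image": clip.image, "telop": clip.telop, "overlay": clip.overlay}
```

Because the captured image is stored untouched, the same recording can serve both playback modes selected by the user.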

また、集音音声に関連する情報の取得を行う処理、および取得した情報を撮影画像データに合成する処理は、撮影（記録）後に行うようにすることも可能である。 Further, the process of acquiring information related to the collected sound and the process of synthesizing the acquired information with captured image data can be performed after shooting (recording).

図９は、図１のビデオカメラ１０の第１の処理の概要を示す図である。 FIG. 9 is a diagram showing an outline of the first processing of the video camera 10 of FIG. 1.

撮影画像４１は、ビデオカメラ１０によるを撮影によって得られた画像を示している。即ち、撮影画像４１は、例えば、遊園地でジェットコースタを撮影することにより得られた画像である。 A photographed image 41 shows an image obtained by photographing with the video camera 10. That is, the photographed image 41 is an image obtained by photographing a roller coaster at an amusement park, for example.

効果画像４２は、ビデオカメラ１０で撮影画像４１を撮影した時の集音音声に含まれる音の音源を示す音源情報に基づいて、情報データベース１７から取得された情報としての画像を示している。効果画像４２は、例えば、撮影画像４１に表示されているジェットコースタに乗っているときの恐怖をイメージさせる画像となっている。 The effect image 42 indicates an image as information acquired from the information database 17 based on sound source information indicating a sound source of sound included in the collected sound when the captured image 41 is captured by the video camera 10. The effect image 42 is an image that gives an image of fear when riding the roller coaster displayed in the captured image 41, for example.

図１のビデオカメラ１０では、撮影画像４１に対して、効果画像４２が合成され、その結果、合成画像４３が得られる。 In the video camera 10 of FIG. 1, the effect image 42 is synthesized with the captured image 41, and as a result, a synthesized image 43 is obtained.

図１０は、図１のビデオカメラ１０の第２の処理の概要を示す図である。 FIG. 10 is a diagram showing an outline of the second processing of the video camera 10 of FIG. 1.

撮影画像４６は、ビデオカメラ１０による撮影によって得られた画像を示している。即ち、撮影画像４６は、例えば、F１（Formula １）レースにおけるレーシングカーを撮影することにより得られた画像である。 The photographed image 46 shows an image obtained by photographing with the video camera 10. That is, the photographed image 46 is an image obtained by photographing a racing car in an F1 (Formula 1) race, for example.

情報画像４７は、ビデオカメラ１０で集音音声に含まれる音の音源を示す音源情報に基づいて、情報データベース１７から取得された情報としての画像を示している。情報画像４７は、例えば、撮影画像４６に表示されているレーシングカーのエンジン音から特定されるマシンの画像や、詳細情報を記述したテキスト情報などを含んでいる。 The information image 47 shows an image as information acquired from the information database 17 based on the sound source information indicating the sound source of the sound included in the collected sound by the video camera 10. The information image 47 includes, for example, a machine image specified from the engine sound of the racing car displayed in the captured image 46, text information describing detailed information, and the like.

図１のビデオカメラ１０では、撮影画像４６に対して、情報画像４７が合成され、合成画像４８が得られる。 In the video camera 10 of FIG. 1, the information image 47 is synthesized with the captured image 46 to obtain a synthesized image 48.

このようにビデオカメラ１０では、撮影時に集音された音声に合わせて撮影画像に効果画像や情報画像を合成するなどの画像編集が行われるので、ユーザが編集を行う手間を省くことができる。 As described above, the video camera 10 performs image editing such as synthesizing the effect image and the information image with the captured image in accordance with the sound collected at the time of shooting, so that the user can save time and effort for editing.

また、鑑賞時において、撮影画像に対し、ユーザの予測しない編集が加えられていることで、より一層の楽しみをユーザへ与えることができる。 In addition, at the time of viewing, editing that is not predicted by the user is added to the captured image, so that even more enjoyment can be given to the user.

さらに、集音音声に含まれる特定の音に関連する画像やテキスト情報が合成されるので、ユーザに対してより有益な情報を提供することができる。 Furthermore, since the image and text information related to the specific sound included in the collected voice are synthesized, more useful information can be provided to the user.

なお、上述した処理は、ビデオカメラと同様に、画像と音声を取得することができる、例えば、デジタルスチルカメラやテレビジョン受像機などで行うことも可能であり、上述の効果と同様の効果を得ることができる。 Note that the processing described above can also be performed by other devices that, like a video camera, can acquire images and sound, such as a digital still camera or a television receiver, and the same effects as those described above can be obtained.

また、上述の場合には、音声認識部１３、情報取得部１６、および画像編集部１８がビデオカメラ１０内部に実装されているが、音声認識部１３、情報取得部１６、または画像編集部１８のうちの１以上は、外部に設置された装置（外部装置）に実装し、ビデオカメラ１０がネットワークを介して外部装置と通信することにより、音声認識部１３、情報取得部１６、または画像編集部１８が行う処理を外部装置に実行させるようにしても良い。外部装置とは、例えば、PCやサーバなどである。 In the case described above, the voice recognition unit 13, the information acquisition unit 16, and the image editing unit 18 are implemented inside the video camera 10. However, one or more of the voice recognition unit 13, the information acquisition unit 16, and the image editing unit 18 may be implemented in an externally installed device (external device), and the video camera 10 may communicate with the external device via a network so that the external device executes the processing performed by those units. The external device is, for example, a PC or a server.

また、上述の場合には、情報取得部１６が、ビデオカメラ１０本体に内蔵された情報データベース１７から情報を取得するようにしたが、情報取得部１６は、ネットワークを介して外部装置から情報を取得するようにしても良いし、ビデオカメラ１０に着脱可能なリムーバルメディア、例えば、フラッシュメモリなどから情報を取得するようにしても良い。 In the case described above, the information acquisition unit 16 acquires information from the information database 17 built into the body of the video camera 10. However, the information acquisition unit 16 may instead acquire information from an external device via a network, or from a removable medium that can be attached to and detached from the video camera 10, such as a flash memory.

図１１は、本発明を適用した携帯電話機の一実施の形態の構成例を示すブロック図である。図１１において、図１のビデオカメラ１０における場合と対応する部分には同一の符号を付してあり、その説明は適宜省略する。 FIG. 11 is a block diagram showing a configuration example of an embodiment of a mobile phone to which the present invention is applied. In FIG. 11, parts corresponding to those in the video camera 10 of FIG. 1 are denoted by the same reference numerals, and their description is omitted as appropriate.

携帯電話機５０は、音声入力部１１、音声認識部１３、情報取得部１６、情報データベース１７、メール作成部５１、通信部５２から構成されている。 The cellular phone 50 includes a voice input unit 11, a voice recognition unit 13, an information acquisition unit 16, an information database 17, a mail creation unit 51, and a communication unit 52.

言語音声認識部１４は、音声入力部１１から供給された音声データを音声認識することにより得られるテキストデータを、メール作成部５１に供給する。 The language voice recognition unit 14 supplies text data obtained by voice recognition of the voice data supplied from the voice input unit 11 to the mail creation unit 51.

情報取得部１６は、音源特定部１５により供給された音源情報に基づいて、情報データベース１７から得られた情報を、メール作成部５１に供給する。 The information acquisition unit 16 supplies information obtained from the information database 17 to the mail creation unit 51 based on the sound source information supplied by the sound source identification unit 15.

メール作成部５１は、言語音声認識部１４から供給されるテキストデータと、情報取得部１６から供給される情報をもとにメール（電子メール）を作成し、通信部５２へ、そのメールのメールデータを供給する。 The mail creation unit 51 creates a mail (e-mail) based on the text data supplied from the language speech recognition unit 14 and the information supplied from the information acquisition unit 16, and supplies the mail data of that mail to the communication unit 52.

通信部５２は、メール作成部５１から供給されたメールデータをインターネット等のネットワークを介して外部（通話相手）へ送信する。 The communication unit 52 transmits the mail data supplied from the mail creation unit 51 to the outside (call partner) via a network such as the Internet.

ここで、図１１の本実施の形態では、音声認識部１３、情報取得部１６、およびメール作成部５１が携帯電話機５０内部に実装されているが、音声認識部１３、情報取得部１６、またはメール作成部５１は、外部の装置、例えば、携帯電話会社のデータセンタ（基地局や制御センタ）に設置されているサーバなどに実装することができる。この場合、携帯電話機５０からは、必要なデータを、データセンタに送信させ、音声認識部１３、情報取得部１６、またはメール作成部５１が行う処理を、データセンタで行うようにしても良い。 Here, in the present embodiment of FIG. 11, the voice recognition unit 13, the information acquisition unit 16, and the mail creation unit 51 are implemented inside the mobile phone 50. However, the voice recognition unit 13, the information acquisition unit 16, or the mail creation unit 51 can be implemented in an external device, for example, a server installed in a data center (base station or control center) of a mobile phone company. In this case, the mobile phone 50 may transmit the necessary data to the data center, and the processing performed by the voice recognition unit 13, the information acquisition unit 16, or the mail creation unit 51 may be performed in the data center.

また、図１１の本実施の形態では、情報取得部１６が携帯電話機５０内部に実装された情報データベース１７から情報を取得するようにしたが、例えば、携帯電話会社のデータセンタ（基地局や制御センタ）に設置されているサーバなどから情報を取得するようにしても良いし、携帯電話機５０に着脱可能なリムーバルメディア、例えば、フラッシュメモリなどから情報を取得するようにしても良い。 In the present embodiment of FIG. 11, the information acquisition unit 16 acquires information from the information database 17 implemented inside the mobile phone 50. However, the information may instead be acquired from, for example, a server installed in a data center (base station or control center) of a mobile phone company, or from a removable medium that can be attached to and detached from the mobile phone 50, such as a flash memory.

図１２は、図１１の携帯電話機５０における処理を説明するフローチャートである。 FIG. 12 is a flowchart for explaining processing in the mobile phone 50 of FIG. 11.

ステップS２１において、音声入力部１１は、音声データを取得し、音声認識部１３に供給して、ステップS２２へ進む。なお、この音声データは、音声認識部１３に供給される他、電話の音声のデータとして基地局に送信される。 In step S21, the voice input unit 11 acquires voice data, supplies the voice data to the voice recognition unit 13, and proceeds to step S22. The voice data is supplied to the voice recognition unit 13 and transmitted to the base station as telephone voice data.

図１２のフローチャートに従った処理は、例えば、携帯電話機５０において、他の携帯電話機との通信が開始されると開始される。即ち、ステップS２２において、言語音声認識部１４は、音声入力部１１から供給された音声データを音声認識することにより、テキストデータに変換し、メール作成部５１に供給する。 The process according to the flowchart of FIG. 12 is started when, for example, the mobile phone 50 starts communication with another mobile phone. That is, in step S 22, the language speech recognition unit 14 recognizes the speech data supplied from the speech input unit 11, converts it into text data, and supplies it to the mail creation unit 51.

その後、ステップS２３では、図８のステップS３における場合と同様に、音源特定部１５が、音声入力部１１に入力された音声の音源を特定し、その音源を表す音源情報を、情報取得部１６に供給してステップS２４に進む。ステップS２４では、情報取得部１６は、音源特定部１５から供給された音源情報に基づいて、情報データベース１７から、集音音声（音声入力部１１に入力された音声）に関連する画像を取得し、メール作成部５１に供給する。 Thereafter, in step S23, as in step S3 of FIG. 8, the sound source specifying unit 15 identifies the sound source of the sound input to the sound input unit 11, supplies sound source information representing that sound source to the information acquisition unit 16, and the process proceeds to step S24. In step S24, the information acquisition unit 16 acquires, from the information database 17, an image related to the collected sound (the sound input to the sound input unit 11) based on the sound source information supplied from the sound source specifying unit 15, and supplies the image to the mail creation unit 51.

ステップS２４からステップS２５へ進み、メール作成部５１は、通話が終了したか否かの判定を行う。ステップS２５において、通話中である（通話が終了していない）と判定された場合は、ステップS２１へ戻り、ステップS２１乃至ステップS２５の処理を繰り返す。 Proceeding from step S24 to step S25, the mail creating unit 51 determines whether or not the call has ended. If it is determined in step S25 that the call is in progress (the call has not ended), the process returns to step S21, and the processes in steps S21 to S25 are repeated.

また、ステップS２５において、通話が終了したと判定された場合、ステップS２６へ進み、メール作成部５１は、言語音声認識部１４から供給されたテキストデータと、情報取得部１６から供給された画像をもとにメールを作成し、そのメールのメールデータを通信部５２に供給する。 If it is determined in step S25 that the call has ended, the process proceeds to step S26, where the mail creation unit 51 creates a mail based on the text data supplied from the language speech recognition unit 14 and the image supplied from the information acquisition unit 16, and supplies the mail data of that mail to the communication unit 52.

ここで、メール作成部５１は、言語音声認識部１４からのテキストデータを、例えば、メールのメール本文とする。また、メール作成部５１は、情報取得部１６から供給された画像を、メールの添付ファイルとする。添付ファイルとされた画像は、例えば、メールを開いたときのメール本文の背景画像として用いることができる。 Here, the mail creation unit 51 uses the text data from the language speech recognition unit 14 as, for example, the mail body of the mail. The mail creation unit 51 also uses the image supplied from the information acquisition unit 16 as an email attachment file. For example, an image that is an attached file can be used as a background image of a mail text when a mail is opened.

ステップS２６からステップS２７へ進み、通信部５２は、メール作成部５１から供給されたメールデータを、ネットワークを介して通話相手へ送信し、処理を終了する。 Proceeding from step S26 to step S27, the communication unit 52 transmits the mail data supplied from the mail creation unit 51 to the other party via the network, and ends the process.
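The flow of steps S21 to S26 can be sketched with Python's standard `email` package: recognized text accumulates during the call and becomes the mail body, and the image related to the collected sound is attached. The subject line, the attachment file name, the PNG type, and the choice to keep only the most recently acquired image are assumptions for illustration; the recognizer and sound source identifier are hypothetical callables.

```python
from email.message import EmailMessage


def build_mail(text_chunks, image_bytes, recipient):
    """S26: recognized text becomes the body, the image an attachment."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = "Call summary"          # hypothetical subject
    msg.set_content("\n".join(text_chunks))
    if image_bytes is not None:
        msg.add_attachment(
            image_bytes,
            maintype="image",
            subtype="png",                   # assumed image format
            filename="background.png",       # hypothetical file name
        )
    return msg


def run_call(audio_chunks, recognize, identify, info_db, recipient):
    """Loop over steps S21 to S25 until the call ends, then build the mail."""
    texts, image = [], None
    for audio in audio_chunks:               # iteration ends when the call ends
        texts.append(recognize(audio))       # S22: speech -> text
        found = info_db.get(identify(audio)) # S23, S24: source -> image
        if found is not None:
            image = found                    # keep the latest image (assumption)
    return build_mail(texts, image, recipient)
```

Sending the resulting message (step S27) is left out; in a real system it would be handed to whatever transport the communication unit 52 provides.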

なお、図１２の実施の形態では、通話時に集音される音声を利用してメールを作成するようにしたが、ユーザがメールを作成する操作を行っているときに音声入力部１１に入力された音声をもとに、その音声に関連する画像を取得し、その画像を、ユーザが作成したメールに添付するようにしても良い。 In the embodiment shown in FIG. 12, the mail is created using the voice collected during the call. However, when the user performs the operation of creating the mail, the voice is input to the voice input unit 11. An image related to the voice may be acquired based on the voice, and the image may be attached to an email created by the user.

図１３は、図１１の携帯電話機５０が送信したメール（のメールデータ）を受信した携帯電話機（以下、適宜、受信機という）における、そのメールの表示例を示している。 FIG. 13 shows a display example of the mail on a mobile phone (hereinafter referred to as a receiver, as appropriate) that has received the mail (mail data) transmitted by the mobile phone 50 of FIG. 11.

図１３の一番左は、図１１の携帯電話５０のユーザが、遊園地にて通話、またはメールを作成した場合において、そのメールを受信した受信機の画面の表示例である。 The leftmost part of FIG. 13 is a display example of the screen of the receiver that has received the mail when the user of the mobile phone 50 of FIG. 11 makes a call or creates a mail at the amusement park.

図１３の一番左では、「こんにちは！今日は遊園地に遊びにきています！」のテキストデータとともに、その背景として、遊園地の画像が表示されている。従って受信機のユーザは、携帯電話機５０のユーザが遊園地にいることを認識することができる。 In the leftmost part of FIG. 13, the text data "Hello! I'm visiting an amusement park today!" is displayed with an image of an amusement park as its background. The user of the receiver can therefore recognize that the user of the mobile phone 50 is at an amusement park.

図１３の左から２番目（中央）は、図１１の携帯電話５０のユーザが、宴会会場にて通話、またはメールを作成した場合において、そのメールを受信した受信機の画面の表示例である。 The second part from the left (center) of FIG. 13 is a display example of the screen of the receiver that has received the mail when the user of the mobile phone 50 of FIG. 11 makes a call or creates a mail at a banquet hall.

図１３の左から2番目では、「今、パーティやっているからおいでよ！！」のテキストデータとともに、その背景として、宴会（パーティ）会場の画像が表示されている。従って受信機のユーザは、携帯電話機５０のユーザが受信機のユーザにパーティに来てもらいたいことを希望していることを認識することができる。 In the second part from the left of FIG. 13, the text data "We're having a party right now, so come on over!!" is displayed with an image of a banquet (party) hall as its background. The user of the receiver can therefore recognize that the user of the mobile phone 50 wants the user of the receiver to come to the party.

図１３の左から３番目（一番右）は、図１１の携帯電話５０のユーザが、走行中の自動車内にて通話、またはメールを作成した場合において、そのメールを受信した受信機の画面の表示例である。 The third part from the left (rightmost) of FIG. 13 is a display example of the screen of the receiver that has received the mail when the user of the mobile phone 50 of FIG. 11 makes a call or creates a mail inside a moving car.

図１３の左から３番目では、「あと１５分でそちらに着きます」のテキストデータとともに、その背景として、走行中の車内からみた道路の画像が表示されている。従って受信機のユーザは、携帯電話機５０のユーザが走行中の自動車内にいることを認識することができる。 In the third part from the left in FIG. 13, along with text data “I will get there in 15 minutes”, an image of the road viewed from the inside of the running car is displayed as the background. Therefore, the user of the receiver can recognize that the user of the mobile phone 50 is in the traveling car.

このように携帯電話機５０は、より効果的に通信相手へ情報を伝達することができる。 Thus, the mobile phone 50 can transmit information to the communication partner more effectively.

図１４は、本発明を適用したICレコーダシステムの一実施の形態の構成例を示す外観図である。 FIG. 14 is an external view showing a configuration example of an embodiment of an IC recorder system to which the present invention is applied.

ICレコーダ６０は、音声を集音し、その集音音声の音源を特定する。さらに、ICレコーダ６０は、特定した音源を示す音源情報に基づいて、集音した音声に関連する画像６２を取得する。また、ICレコーダ６０は、集音音声に含まれる言語音声を音声認識することで、その言語音声をテキストデータに変換し、画像６２および集音音声とともに記録する。 The IC recorder 60 collects sound and specifies the sound source of the collected sound. Further, the IC recorder 60 acquires an image 62 related to the collected sound based on the sound source information indicating the specified sound source. Further, the IC recorder 60 recognizes a language voice included in the collected voice, converts the voice to text data, and records it together with the image 62 and the collected voice.

ここで、画像６２としては、例えば、ICレコーダで集音を行った環境を表す画像（例えば、会議室の画像）が取得される。 Here, as the image 62, for example, an image (for example, an image of a conference room) representing an environment in which sound is collected by an IC recorder is acquired.

ICレコーダ６０の録音処理により記録された情報を、例えばPC６１に出力すると、PC６１のディスプレイには、ICレコーダ６０で集音された音声に関連する画像６２と、集音音声を音声認識することにより得られたテキストデータが表示されたテキスト画面６３とが表示される。 When the information recorded by the recording process of the IC recorder 60 is output to the PC 61, for example, the display of the PC 61 recognizes the image 62 related to the sound collected by the IC recorder 60 and the collected sound by voice recognition. A text screen 63 on which the obtained text data is displayed is displayed.

図１５は、図１４のICレコーダ６０の内部構成例を示すブロック図である。図１５において、図１のビデオカメラ１０における場合と対応する部分には同一の符号を付してあり、その説明は適宜省略する。 FIG. 15 is a block diagram illustrating an internal configuration example of the IC recorder 60 of FIG. 14. In FIG. 15, parts corresponding to those in the video camera 10 of FIG. 1 are denoted by the same reference numerals, and their description is omitted as appropriate.

ICレコーダ６０は、音声入力部１１、音声認識部１３、情報取得部１６、情報データベース１７、記録制御部６６、記録媒体６７から構成されている。 The IC recorder 60 includes a voice input unit 11, a voice recognition unit 13, an information acquisition unit 16, an information database 17, a recording control unit 66, and a recording medium 67.

音声入力部１１は、音声を集音することにより、音声データを取得し、言語音認識部１４、音源特定部１５および記録制御部６６に供給する。 The voice input unit 11 acquires voice data by collecting voice and supplies the voice data to the language sound recognition unit 14, the sound source identification unit 15, and the recording control unit 66.

言語音声認識部１４は、音声入力部１１から供給された音声データを音声認識することにより得られるテキストデータを、記録制御部６６に供給する。 The language voice recognition unit 14 supplies text data obtained by voice recognition of the voice data supplied from the voice input unit 11 to the recording control unit 66.

情報取得部１６は、音源特定部１５により供給された音源情報に基づいて、情報データベース１７から取得した情報を、記録制御部６６に供給する。 The information acquisition unit 16 supplies the information acquired from the information database 17 to the recording control unit 66 based on the sound source information supplied by the sound source identification unit 15.

記録制御部６６は、音声入力部１１から供給される音声データを、言語音声認識部１４から供給されるテキストデータ、および情報取得部１６から供給された情報とともに記録媒体６７に記録する制御を行う。記録媒体６７としては、例えば、フラッシュメモリ等を用いることができる。 The recording control unit 66 performs control to record the voice data supplied from the voice input unit 11 on the recording medium 67 together with the text data supplied from the language voice recognition unit 14 and the information supplied from the information acquisition unit 16. . As the recording medium 67, for example, a flash memory or the like can be used.

なお、図１５の実施の形態では、音声認識部１３および情報取得部１６がICレコーダ６０内部に実装されているが、音声認識部１３または情報取得部１６は、外部に設置された装置（外部装置）に実装し、ICレコーダ６０がネットワークを介して外部装置と通信することにより、音声認識部１３または情報取得部１６が行う処理を外部装置に実行させるようにしても良い。外部装置とは、例えば、PCやサーバなどである。 In the embodiment of FIG. 15, the voice recognition unit 13 and the information acquisition unit 16 are implemented inside the IC recorder 60. However, the voice recognition unit 13 or the information acquisition unit 16 may be implemented in an externally installed device (external device), and the IC recorder 60 may communicate with the external device via a network so that the external device executes the processing performed by the voice recognition unit 13 or the information acquisition unit 16. The external device is, for example, a PC or a server.

また、図１５の実施の形態では、情報取得部１６が、ICレコーダ６０本体に内蔵された情報データベース１７から情報を取得するようにしたが、情報取得部１６は、ネットワークを介して外部装置から情報を取得するようにしても良いし、ICレコーダ６０に着脱可能なリムーバルメディア、例えば、フラッシュメモリなどから情報を取得するようにしても良い。 In the embodiment of FIG. 15, the information acquisition unit 16 acquires information from the information database 17 built into the body of the IC recorder 60. However, the information acquisition unit 16 may instead acquire information from an external device via a network, or from a removable medium that can be attached to and detached from the IC recorder 60, such as a flash memory.

図１６は、ICレコーダ６０の録音処理を説明するフローチャートである。 FIG. 16 is a flowchart for explaining the recording process of the IC recorder 60.

ステップS４１において、音声入力部１１は、音声データを取得し、音声認識部１３および記録制御部６６に供給して、ステップS４２へ進む。 In step S41, the voice input unit 11 acquires voice data, supplies the voice data to the voice recognition unit 13 and the recording control unit 66, and proceeds to step S42.

ステップS４２において、言語音声認識部１４は、音声入力部１１から供給された音声データを音声認識することにより、テキストデータに変換し、記録制御部６６に供給する。 In step S 42, the language voice recognition unit 14 converts the voice data supplied from the voice input unit 11 into text data by voice recognition, and supplies the text data to the recording control unit 66.

その後、ステップS４３では、図８のステップS３における場合と同様に、音源特定部１５が、音声入力部１１に入力された音声の音源を特定し、その音源を表す音源情報を情報取得部１６に供給してステップS４４に進む。ステップS４４では、情報取得部１６は、音源特定部１５から供給された音源情報に基づいて、情報データベース１７から、集音音声に関連する画像（例えば、駅のホームの画像など、集音を行った環境をイメージさせる画像）を取得し、情報記録部６６に供給する。 Thereafter, in step S43, as in step S3 of FIG. 8, the sound source specifying unit 15 identifies the sound source of the sound input to the sound input unit 11, supplies sound source information representing that sound source to the information acquisition unit 16, and the process proceeds to step S44. In step S44, the information acquisition unit 16 acquires, from the information database 17, an image related to the collected sound (for example, an image that evokes the environment in which the sound was collected, such as an image of a station platform) based on the sound source information supplied from the sound source specifying unit 15, and supplies the image to the recording control unit 66.

ステップS４４からステップS４５へ進み、記録制御部６６は、言語音声認識部１４から供給されるテキストデータと情報取得部１６から供給される画像データを音声データに対応付けて記録媒体６７に記憶させ、ステップS４１に戻り、録音処理を繰り返す。 Proceeding from step S44 to step S45, the recording control unit 66 stores the text data supplied from the language speech recognition unit 14 and the image data supplied from the information acquisition unit 16 in the recording medium 67 in association with the voice data, Returning to step S41, the recording process is repeated.
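Step S45 stores the text and image in association with the audio data. A minimal sketch of such an association is shown below; the `Entry` layout and the `RecordingMedium` class are hypothetical, since the text does not describe the storage format of the recording medium 67.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One recorded segment: audio with its associated text and image."""
    audio: bytes
    text: str
    image: str


class RecordingMedium:
    """Hypothetical in-memory stand-in for the recording medium 67."""

    def __init__(self):
        self._entries = []

    def store(self, audio, text, image):
        """S45: keep text and image tied to the audio they came from."""
        self._entries.append(Entry(audio, text, image))

    def transcript(self):
        """Recognized text in recording order, e.g. for display on a PC."""
        return "\n".join(entry.text for entry in self._entries)

    def images(self):
        """Distinct environment images in order of first appearance."""
        seen = []
        for entry in self._entries:
            if entry.image not in seen:
                seen.append(entry.image)
        return seen
```

Keeping the three items in one entry makes it straightforward to show the environment image next to the transcript when the recording is reviewed later.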

このようにICレコーダ６０は、集音音声を、集音環境をイメージさせる画像とともに記録することで、ユーザが、後に録音内容を確認した際に、内容や録音状況を容易に把握することができる。 In this way, the IC recorder 60 records the collected sound together with an image that evokes the environment in which the sound was collected, so that the user can easily grasp the content and the recording circumstances when reviewing the recording later.

また、ICレコーダ６０においては、例えば、会議中の様子を録音した場合、録音内容を議事録としてPCに出力することができる。また、ICレコーダ６０では、予め登録された音声をもとに、発言者を特定したり、発言時の声の調子から、発言者の感情や、顔の表情を表す画像を付加することも可能である。 In the IC recorder 60, for example, when a meeting is recorded, the recorded content can be output to a PC as minutes. The IC recorder 60 can also identify a speaker based on pre-registered voices, and add an image representing the speaker's emotion or facial expression inferred from the tone of voice at the time of speaking.

図１７は、本発明を適用した携帯電話機の他の実施の形態の構成例を示すブロック図である。図１７において、図１のビデオカメラ１０における場合と対応するの部分には同一の符号を付してあり、その説明は適宜省略する。 FIG. 17 is a block diagram showing a configuration example of another embodiment of a mobile phone to which the present invention is applied. In FIG. 17, parts corresponding to those in the video camera 10 of FIG. 1 are denoted by the same reference numerals, and their description is omitted as appropriate.

携帯電話機８０は、音声入力部１１、音源特定部１５、情報取得部１６、情報データベース１７、表示制御部８１、表示部８２から構成されている。 The cellular phone 80 includes a voice input unit 11, a sound source identification unit 15, an information acquisition unit 16, an information database 17, a display control unit 81, and a display unit 82.

情報取得部１６は、音源特定部１５から供給される音源情報に基づいて、情報データベース１７から取得した情報を、表示制御部８１へ供給する。 The information acquisition unit 16 supplies the information acquired from the information database 17 to the display control unit 81 based on the sound source information supplied from the sound source specifying unit 15.

表示制御部８１は、情報取得部１６から供給された情報を表示部８２に供給して表示させる。 The display control unit 81 supplies the information supplied from the information acquisition unit 16 to the display unit 82 for display.

表示部８２は、例えばLCD（Liquid Crystal Display）などで構成され、表示制御部８１からの画像を、いわゆる待ち受け画面の画像として表示する。 The display unit 82 is configured by, for example, an LCD (Liquid Crystal Display), and displays an image from the display control unit 81 as a so-called standby screen image.

図１７の実施の形態では、音源特定部１５および情報取得部１６が携帯電話機８０内部に実装されているが、音源特定部１５または情報取得部１６は、外部に設置された装置（外部装置）に実装し、携帯電話機８０がネットワークを介して外部装置と通信することにより、音源特定部１５または情報取得部１６が行う処理を外部装置に実行させるようにしても良い。外部装置とは、例えば、PCやサーバなどである。 In the embodiment of FIG. 17, the sound source identification unit 15 and the information acquisition unit 16 are implemented inside the mobile phone 80. However, the sound source identification unit 15 or the information acquisition unit 16 may be implemented in an externally installed device (external device), and the mobile phone 80 may communicate with the external device via a network so that the external device executes the processing performed by the sound source identification unit 15 or the information acquisition unit 16. The external device is, for example, a PC or a server.

また、図１７の実施の形態では、情報取得部１６が、携帯電話機８０本体に内蔵された情報データベース１７から情報を取得するようにしたが、情報取得部１６は、ネットワークを介して外部装置から情報を取得するようにしても良いし、携帯電話機８０に着脱可能なリムーバルメディア、例えば、フラッシュメモリなどから情報を取得するようにしても良い。 In the embodiment of FIG. 17, the information acquisition unit 16 acquires information from the information database 17 built into the body of the mobile phone 80. However, the information acquisition unit 16 may instead acquire information from an external device via a network, or from a removable medium that can be attached to and detached from the mobile phone 80, such as a flash memory.

図１８は、図１７の携帯電話機８０が行う、待ち受け画面を表示する待ち受け画面表示処理を示すフローチャートである。 FIG. 18 is a flowchart showing the standby screen display process, performed by the mobile phone 80 of FIG. 17, for displaying a standby screen.

ステップS６１において、音声入力部１１は、音声データを取得し、音源特定部１５へ供給してステップS６２へ進む。 In step S61, the voice input unit 11 acquires voice data, supplies the voice data to the sound source specifying unit 15, and proceeds to step S62.

その後、ステップS６２では、図８のステップS３における場合と同様に、音源特定部１５が、音声入力部１１に入力された音声の音源を特定し、その音源を表す音源情報を情報取得部１６に供給してステップS６３へ進む。ステップS６３では、情報取得部１６は、音源特定部１５により供給された音源情報に基づいて、情報データベース１７から、集音音声に関連する画像を取得し、表示制御部８１に供給してステップS６４に進む。 Thereafter, in step S62, as in step S3 of FIG. 8, the sound source specifying unit 15 identifies the sound source of the sound input to the sound input unit 11, supplies sound source information representing that sound source to the information acquisition unit 16, and the process proceeds to step S63. In step S63, the information acquisition unit 16 acquires an image related to the collected sound from the information database 17 based on the sound source information supplied by the sound source specifying unit 15, supplies the image to the display control unit 81, and the process proceeds to step S64.

ステップS６４において、表示制御部８１は、情報取得部１６から供給された画像を表示部８２へ供給して表示させ、ステップS６５へ進む。 In step S64, the display control unit 81 supplies the image supplied from the information acquisition unit 16 to the display unit 82 for display, and proceeds to step S65.

ステップS６５において、音声入力部１１は、現時刻が設定時刻であるか否かの判定を行う。ステップS６５において、現時刻が設定時刻でないと判定された場合は継続して時刻の監視を行い、現時刻が設定時刻であると判定された場合には、ステップS６１へ戻り、ステップS６１乃至ステップS６５の処理を繰り返す。 In step S65, the voice input unit 11 determines whether or not the current time is the set time. If it is determined in step S65 that the current time is not the set time, the time continues to be monitored. If it is determined that the current time is the set time, the process returns to step S61, and the processing of steps S61 to S65 is repeated.

この場合、携帯電話機８０では、設定時刻を契機にして（トリガとして）、表示部８２に表示される待ち受け画面が切り替えられることになる。 In this case, in the mobile phone 80, the standby screen displayed on the display unit 82 is switched with the set time as a trigger.

なお、設定時刻については、例えば、1時間周期、もしくは、朝（6:00）、昼（12:00）、夜（18:00）、深夜（0:00）などの時刻設定を行うことができる。また、本実施の形態では、時刻に対応して待ち受け画面の表示を切り替えるとしたが、音声入力部１１を常にオン状態に保ち、集音音声の変化に合わせて待ち受け画面の表示を切り替えるようにしても良い。 As for the set time, for example, a one-hour cycle can be set, or times such as morning (6:00), noon (12:00), evening (18:00), and midnight (0:00) can be set. In the present embodiment, the standby screen display is switched according to the time. However, the voice input unit 11 may instead be kept on at all times, and the standby screen display may be switched in accordance with changes in the collected sound.

このように携帯電話機８０では、一定周期もしくは、設定された時刻、または集音音声の変化などを契機にして待ち受け画面の表示の切り替えが可能である。 As described above, the mobile phone 80 can switch the display of the standby screen in response to a certain period, a set time, or a change in collected sound.
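The triggers listed above (a set time, or a change in the collected sound) can be sketched as a single predicate. The function name, the minute-level comparison, and the way the two triggers are combined with a logical OR are assumptions for illustration; the text presents them as alternative configurations.

```python
from datetime import time

# Example set times from the text: morning, noon, evening, midnight.
SET_TIMES = (time(6, 0), time(12, 0), time(18, 0), time(0, 0))


def should_refresh(now, last_source, current_source, set_times=SET_TIMES):
    """Decide whether the standby screen should be switched.

    now            -- current time of day (datetime.time)
    last_source    -- sound source shown on the current standby screen
    current_source -- most recently identified sound source, or None
    """
    # Time trigger: compare at minute granularity (an assumption).
    at_set_time = any(
        now.hour == t.hour and now.minute == t.minute for t in set_times
    )
    # Sound trigger: the identified source differs from the one displayed.
    source_changed = current_source is not None and current_source != last_source
    return at_set_time or source_changed
```

For example, `should_refresh(time(9, 30), "bush_warbler", "crow")` reports a refresh because the sound source changed, even though 9:30 is not a set time.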

図１９は、図１７の携帯電話機８０の表示部８２に表示される待ち受け画面の表示例を示す図である。 FIG. 19 is a diagram illustrating a display example of a standby screen displayed on the display unit 82 of the mobile phone 80 of FIG. 17.

例えば、ある時点においてウグイスの鳴き声を集音した場合、携帯電話機８０は、集音した音声がウグイスの鳴き声（集音した音声の音源がウグイス）であることを特定し、ウグイスに関連する情報である情報画像９１を取得する。そして、携帯電話機８０は、表示部８２に情報画像９１を表示させる。さらに、携帯電話機８０は、その後、別の鳥の鳴き声を集音した場合は、その別の鳥に関連する情報である情報画像９２を取得し、表示部８２の表示を情報画像９２に切り替える。 For example, when the call of a bush warbler is collected at a certain point in time, the mobile phone 80 identifies the collected sound as the call of a bush warbler (that is, the sound source of the collected sound is a bush warbler), and acquires an information image 91, which is information related to the bush warbler. The mobile phone 80 then displays the information image 91 on the display unit 82. If the mobile phone 80 subsequently collects the call of another bird, it acquires an information image 92, which is information related to that other bird, and switches the display of the display unit 82 to the information image 92.

なお、図１９では、情報画像９１および情報画像９２には、鳥の画像の他、その鳥の名前（名称）等の鳥に関連する情報も含まれている。 In FIG. 19, the information image 91 and the information image 92 include information related to a bird such as the name (name) of the bird in addition to the bird image.

この場合、携帯電話機８０は、図鑑的な役割も果たすことになる。 In this case, the mobile phone 80 also serves as a kind of illustrated field guide.

The above description used bird calls as an example, but the same processing can of course be applied to the calls of animals other than birds and to sounds such as the engine noise of automobiles or airplanes. In addition, when the user is on a station platform at a certain time, for example, the mobile phone 80 can display an image of a station platform on the standby screen.

Furthermore, by storing a history of the images displayed over a certain time range, for example, the history can help the user recall where he or she was at a particular time.
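One way to picture such a history is a time-ordered log that can be queried for what was on screen at a given moment. The sketch below is an assumption-laden illustration: the class `DisplayHistory`, its methods, and the labels are hypothetical, and entries are assumed to be recorded in chronological order.

```python
import bisect
from datetime import datetime


class DisplayHistory:
    """Time-ordered log of which image was displayed and when."""

    def __init__(self):
        self._times = []   # display start times, ascending
        self._labels = []  # label of the image shown from that time

    def record(self, when, label):
        # Entries are assumed to arrive in chronological order.
        self._times.append(when)
        self._labels.append(label)

    def shown_at(self, when):
        """Return the label on screen at `when`, or None if before the first entry."""
        i = bisect.bisect_right(self._times, when)
        return self._labels[i - 1] if i else None
```

For instance, if a station platform image was recorded at 8:15, a later query for 8:30 returns that entry, which hints that the user was at a station around that time.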

The images displayed on the display unit 82 can also be limited, based on a user operation, to images belonging to a desired category. Examples of categories include landscapes, animals, and automobiles.
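The category restriction can be sketched as a simple filter over candidate images. The mapping `CATEGORY_OF`, the labels in it, and the function `filter_by_category` are hypothetical names chosen for illustration; how images are actually assigned to categories is not specified in the text.

```python
# Hypothetical assignment of image labels to the categories named in the text.
CATEGORY_OF = {
    "bush_warbler": "animals",
    "car_engine": "automobiles",
    "station_platform": "landscapes",
}


def filter_by_category(candidates, allowed):
    """Keep only the candidates whose category the user selected.

    An empty selection is treated here as "no restriction" (an assumption).
    """
    if not allowed:
        return list(candidates)
    return [c for c in candidates if CATEGORY_OF.get(c) in allowed]
```

With `allowed={"animals"}`, only the bush warbler image would remain eligible for display.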

The present invention can be applied not only to video cameras, IC recorders, and mobile phones, but also to portable devices having at least a sound collection function, such as PDAs (Personal Digital Assistants), to portable devices that additionally have a display function, and to other devices.

The series of processes described above can be executed by hardware or by software. When the series of processes is executed by software, the program constituting the software is installed from a recording medium onto a computer built into dedicated hardware, or onto a computer that can execute various functions when various programs are installed, such as the general-purpose personal computer 100.

As shown in FIG. 20, this recording medium is constituted not only by package media on which the program is recorded and which are distributed separately from the personal computer 100 in order to provide the program to users, such as the magnetic disk 111 (including flexible disks), the optical disk 112 (including CD-ROM (Compact Disc-Read Only Memory) and DVD (Digital Versatile Disc)), the magneto-optical disk 113 (including MD (Mini-Disc) (trademark)), or the semiconductor memory 114, but also by the ROM 102 on which the program is recorded or a hard disk included in the storage unit 108, which are provided to the user preinstalled in the computer.

The CPU 101 of the personal computer 100 controls the overall operation of the personal computer. When the user inputs a command from the input unit 106, which comprises a keyboard, a mouse, and the like, via the bus 104 and the input/output interface 105, the CPU 101 executes the corresponding program stored in the ROM (Read Only Memory) 102. Alternatively, the CPU 101 loads into the RAM (Random Access Memory) 103, and executes, a program that has been read from the magnetic disk 111, the optical disk 112, the magneto-optical disk 113, or the semiconductor memory 114 connected to the drive 110 and installed in the storage unit 108. The CPU 101 also controls the communication unit 109 to communicate with external devices and exchange data.

The program for executing the series of processes described above may also be installed on the computer via a wired or wireless communication medium, such as a local area network, the Internet, or digital satellite broadcasting, through an interface such as a router or modem as necessary.

In this specification, the steps describing the program stored on the recording medium include not only processes performed in time series in the described order, but also processes that are executed in parallel or individually without necessarily being processed in time series.

FIG. 1 is a block diagram showing a configuration example of an embodiment of a video camera to which the present invention is applied.
FIG. 2 is a block diagram showing a configuration example of the sound source specifying unit 15 of FIG. 1.
FIG. 3 is a diagram showing an outline of the processing in the sound source specifying unit 15 of FIG. 1.
FIG. 4 is a diagram showing a frequency spectrum.
FIG. 5 is a diagram showing the change of a frequency spectrum over time.
FIG. 6 is a diagram showing the change of a frequency spectrum over time.
FIG. 7 is a diagram showing the change of a frequency spectrum over time.
FIG. 8 is a flowchart explaining the shooting and recording process of the video camera 10 of FIG. 1.
FIG. 9 is a diagram showing an outline of the processing in the video camera 10 of FIG. 1.
FIG. 10 is a diagram showing an outline of the processing in the video camera 10 of FIG. 1.
FIG. 11 is a block diagram showing a configuration example of an embodiment of a mobile phone to which the present invention is applied.
FIG. 12 is a flowchart explaining the processing in the mobile phone 50 of FIG. 11.
FIG. 13 is a diagram showing a display example of mail on a mobile phone that has received mail transmitted by the mobile phone 50 of FIG. 11.
FIG. 14 is an external view showing a configuration example of an embodiment of an IC recorder system to which the present invention is applied.
FIG. 15 is a block diagram showing an internal configuration example of the IC recorder 60 of FIG. 14.
FIG. 16 is a flowchart explaining the recording process in the IC recorder 60 of FIG. 14.
FIG. 17 is a block diagram showing a configuration example of another embodiment of a mobile phone to which the present invention is applied.
FIG. 18 is a flowchart explaining the standby screen display process of the mobile phone 80 of FIG. 17.
FIG. 19 is a diagram showing an outline of the processing of the mobile phone 80 of FIG. 17.
FIG. 20 is a diagram showing a configuration example of a personal computer.

Explanation of symbols

10 video camera, 11 voice input unit, 12 image input unit, 13 speech recognition unit, 14 language speech recognition unit, 15 sound source specifying unit, 16 information acquisition unit, 17 information database, 18 image editing unit, 19 recording control unit, 20 recording medium, 31 FFT processing unit, 32 data comparison unit, 33 sound source specifying database, 41 captured image, 42 effect image, 43 composite image, 46 captured image, 47 information image, 48 composite image, 50 mobile phone, 51 mail creation unit, 52 communication unit, 60 IC recorder, 61 PC, 62 image, 63 text screen, 66 recording control unit, 67 recording medium, 80 mobile phone, 81 display control unit, 82 display unit, 91 information image, 92 information image, 100 personal computer, 101 CPU, 102 ROM, 103 RAM, 104 internal bus, 105 input/output interface, 106 input unit, 107 output unit, 108 storage unit, 109 communication unit, 110 drive, 111 magnetic disk, 112 optical disk, 113 magneto-optical disk, 114 semiconductor memory

Claims

An information processing apparatus comprising:
sound source specifying means for specifying the sound source of an input sound; and
information acquisition means for acquiring information related to the sound based on sound source information indicating the sound source specified by the sound source specifying means.

The information processing apparatus according to claim 1, further comprising recording means for recording the information acquired by the information acquisition means together with the sound.

The information processing apparatus according to claim 2, further comprising language speech recognition means for recognizing language speech included in the sound and outputting text data corresponding to the language speech,
wherein the recording means records the text data together with the sound.

The information processing apparatus according to claim 1, further comprising:
photographing means for photographing an image;
image editing means for editing the image acquired by the photographing means using the information acquired by the information acquisition means; and
recording means for recording the edited image obtained by the image editing means together with the sound.

The information processing apparatus according to claim 4, further comprising language speech recognition means for recognizing language speech included in the sound and outputting text data corresponding to the language speech,
wherein the image editing means edits the image using the text data.

The information processing apparatus according to claim 1, further comprising e-mail creation means for creating e-mail data using the information acquired by the information acquisition means.

The information processing apparatus according to claim 6, further comprising language speech recognition means for recognizing language speech included in the sound and outputting text data corresponding to the language speech,
wherein the e-mail creation means creates the e-mail data using the text data.

The information processing apparatus according to claim 1, further comprising display means for displaying the information acquired by the information acquisition means.

The information processing apparatus according to claim 1, wherein the sound source specifying means specifies the sound source by causing an external apparatus connected via a network to process the sound.

The information processing apparatus according to claim 1, wherein the information acquisition means acquires the information by performing communication via a network.

An information processing method comprising:
a sound source specifying step of specifying the sound source of an input sound; and
an information acquisition step of acquiring information related to the sound based on sound source information indicating the sound source specified in the sound source specifying step.

A recording medium on which is recorded a program comprising:
a sound source specifying step of specifying the sound source of an input sound; and
an information acquisition step of acquiring information related to the sound based on sound source information indicating the sound source specified by the processing of the sound source specifying step.

A program comprising:
a sound source specifying step of specifying the sound source of an input sound; and
an information acquisition step of acquiring information related to the sound based on sound source information indicating the sound source specified by the processing of the sound source specifying step.