JP2012039391A - Reproducing device, method, and program

Info

Publication number: JP2012039391A
Application number: JP2010177839A
Authority: JP
Inventors: Tamotsu Irie; 保入江
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2010-08-06
Filing date: 2010-08-06
Publication date: 2012-02-23

Abstract

PROBLEM TO BE SOLVED: To easily achieve processing that, while original image data is being reproduced, also displays images reflecting the sound contents included in sound data, with the composition and configuration of the original images maintained.

SOLUTION: A sound output unit 19 reproduces sound data and thereby outputs the sound expressed by the sound data. A sound content recognition unit 52 analyzes the sound data to be reproduced by the sound output unit 19 and thereby recognizes the sound contents included in the sound data. A display unit 18 reproduces image data and thereby displays the images expressed by the image data as original images. A sound content reflection unit 54 executes, as sound content reflection processing, processing that displays images reflecting the sound contents recognized by the sound content recognition unit 52 while the display unit 18 is reproducing the image data, with the configuration and composition of the original images maintained.

Description

The present invention relates to a playback device, a playback method, and a program, and in particular to a technique that makes it easy to display, while original image data is being reproduced, an image that also reflects the audio content included in audio data, with the composition and configuration of the original image maintained.

In recent years, various kinds of image processing have come to be applied to image data in order to enhance the presentation effect when images are viewed.
For example, playback devices typified by digital photo frames can reproduce audio data together with image data.
In this connection, Patent Document 1 discloses a technique for converting audio data synchronized with image data into text data and associating the text data with the image data.
Patent Document 2 discloses a technique for making a character displayed on a screen move in time with the tempo of music output from a music player, that is, a technique for creating and reproducing new image data whose subject includes a character that changes with the tempo of the music.

Patent Document 1: JP 2007-101945 A
Patent Document 2: JP 2007-160065 A

However, the technique described in Patent Document 1 requires complicated and time-consuming processing: the audio data must be analyzed to create text data, and the created text data must then be associated with the image data.
It is therefore very difficult to run this processing in parallel with the reproduction of the image data and to reflect its results successively in what is being reproduced, that is, to reproduce image data in a way that reflects the audio content included in the audio data. In other words, unless the processing is executed before the image data is reproduced and advance preparation is made, such as including the text data in the metadata of the image data, it is very difficult to reproduce the image data in a way that reflects the audio content.
The technique described in Patent Document 2, as noted above, creates new image data whose subject includes a character that changes with the tempo of the music, so it is not suited to a digital photo frame or similar device whose purpose is viewing original images.
There is accordingly a demand for an easy way to display, while original image data is being reproduced, an image that also reflects the audio content included in the audio data, with the composition and configuration of the original image maintained.

The present invention has been made in view of this situation. Its object is to make it easy to display, while original image data is being reproduced, an image that also reflects the audio content included in audio data, with the composition and configuration of the original image maintained.

According to one aspect of the present invention, there is provided a playback device comprising:
audio reproduction means for reproducing audio data and thereby outputting the audio represented by the audio data;
audio content recognition means for analyzing the audio data to be reproduced by the audio reproduction means and thereby recognizing audio content included in the audio data;
image reproduction means for reproducing image data and thereby displaying the image represented by the image data as an original image; and
audio content reflection means for executing, as audio content reflection processing, processing that displays an image reflecting the audio content recognized by the audio content recognition means while the image data is being reproduced by the image reproduction means, with the configuration and composition of the original image maintained.

According to other aspects of the present invention, there are provided a method and a program, each corresponding to the playback device according to the one aspect of the present invention described above.

According to the present invention, an image that also reflects the audio content included in audio data can easily be displayed while original image data is being reproduced, with the composition and configuration of the original image maintained.

FIG. 1 is a block diagram showing the hardware configuration of a playback device according to an embodiment of the present invention.
FIG. 2 is a functional block diagram showing the functional configuration of the playback device for executing audio reproduction processing and image reproduction processing.
FIG. 3 is a flowchart explaining the flow of the audio reproduction processing executed by the CPU of FIG. 2.
FIG. 4 is a timing chart showing an example of the audio data to be reproduced, from which the reproduced audio acquisition unit of FIG. 2 acquires the audio data of a processing target period.
FIG. 5 shows an example of the result of applying FFT processing to the audio data to be reproduced.
FIG. 6 is a flowchart explaining the flow of the image reproduction processing executed by the CPU of FIG. 2.
FIG. 7 shows an example of a decoration-added original image.
FIG. 8 is a flowchart explaining the flow of the image reproduction processing executed by the CPU according to a second embodiment of the present invention.
FIG. 9 shows an example of an original image corresponding to audio content.

An embodiment of the present invention is described below with reference to the drawings.

[First Embodiment]
FIG. 1 is a block diagram showing the hardware configuration of a playback device 1 according to an embodiment of the present invention. The playback device 1 can be implemented as, for example, a digital photo frame.

The playback device 1 includes a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, an RTC (Real Time Clock) 14, a bus 15, an input/output interface 16, an operation unit 17, a display unit 18, an audio output unit 19, a storage unit 20, a communication unit 21, and a drive 22.

The CPU 11 executes various kinds of processing according to programs recorded in the ROM 12, or according to programs loaded from the storage unit 20 into the RAM 13.
The RAM 13 also stores, as appropriate, data that the CPU 11 needs in order to execute the various kinds of processing.

For example, in the present embodiment, programs that implement the functions of the reproduced audio acquisition unit 51 through the audio content reflection unit 54 of FIG. 2, described later, are stored in the ROM 12 or the storage unit 20. The CPU 11 implements these functions by executing processing according to the programs.

The RTC 14 keeps time and outputs, for example, the current time to the CPU 11. The CPU 11 acquires the current time from the RTC 14 at predetermined intervals and, on the basis of that time, controls the timing at which audio data and image data are acquired.

The CPU 11, the ROM 12, the RAM 13, and the RTC 14 are connected to one another via the bus 15. The input/output interface 16 is also connected to the bus 15. The operation unit 17, the display unit 18, the audio output unit 19, the storage unit 20, and the communication unit 21 are connected to the input/output interface 16.

The operation unit 17 consists of various buttons and the like and accepts instruction operations from the user.
The display unit 18 includes the display 62 of FIG. 2, described later, and under the control of the CPU 11 displays on the display 62 the image represented by given image data.
The audio output unit 19 includes the speaker 72 of FIG. 2, described later, and under the control of the CPU 11 outputs from the speaker 72 the audio represented by given audio data.

Hereinafter, displaying on the display 62 the image represented by given image data, on the basis of that image data, is referred to as "reproducing the image data". Likewise, outputting from the speaker 72 the audio represented by given audio data, on the basis of that audio data, is referred to as "reproducing the audio data".
That is, under the control of the CPU 11, the display unit 18 reproduces image data and the audio output unit 19 reproduces audio data.

The storage unit 20 is composed of a DRAM (Dynamic Random Access Memory) or the like. It stores various data needed for audio processing and image processing, such as flag values and threshold values. The storage unit 20 also includes an audio storage unit 31 and an image storage unit 32 as areas within it.

The audio storage unit 31 stores a plurality of reproducible audio data, for example audio data of pieces of music.
The format of the audio data stored in the audio storage unit 31 is not particularly limited. In the present embodiment, for example, the audio storage unit 31 stores audio data that has been encoded according to a predetermined format and compressed as necessary. The predetermined format may be, for example, the WAVE format, the MP3 (Moving Picture Experts Group Audio Layer-3) format, or the AAC (Advanced Audio Coding) format.

The image storage unit 32 stores a plurality of reproducible image data of various kinds, for example photographs taken with a digital camera, images read by a scanner, and images processed or created on a personal computer.
The format of the image data stored in the image storage unit 32 is not particularly limited. In the present embodiment, for example, the image storage unit 32 stores still-image data that has been compression-encoded according to a predetermined format. The predetermined format may be, for example, JPEG (Joint Photographic Experts Group); alternatively, GIF (Graphics Interchange Format), PNG (Portable Network Graphics), TIFF (Tagged Image File Format), or the like may be used. To distinguish it from image data processed by the CPU 11 (more specifically, by the audio content reflection unit 54), described later, the image data stored in the image storage unit 32 is hereinafter called "original image data". The image displayed on the display 62 when original image data is reproduced is hereinafter called the "original image".

The communication unit 21 controls communication with other devices (not shown) over networks including the Internet.

The drive 22 is also connected to the input/output interface 16 as necessary, and a removable medium 41 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is loaded into it as appropriate. Programs read from the removable medium 41 by the drive 22 are installed in the storage unit 20 as necessary. Like the storage unit 20, the removable medium 41 can also store the various data held in the storage unit 20, such as image data and audio data.

The playback device 1 configured in this way can execute the following series of processes.

First, the playback device 1 acquires the audio data to be reproduced from among the plurality of audio data stored in the audio storage unit 31.
The playback device 1 analyzes the audio data to be reproduced and thereby recognizes the content of the audio included in it (hereinafter simply "audio content").
The playback device 1 then reproduces the audio data to be reproduced.
This series of processes is hereinafter called "audio reproduction processing".

In addition, when the playback device 1 reproduces image data to be reproduced from among the plurality of image data stored in the image storage unit 32, it treats that image data as original image data and displays the following image on the display 62: an image that reflects the audio content recognized by the audio reproduction processing described above, with the configuration and composition of the original image maintained.
This series of processes is hereinafter called "image reproduction processing".

In the present embodiment, as part of the image reproduction processing, the playback device 1 applies to the original image data image processing that adds to the original image a decoration image corresponding to the audio content recognized by the audio reproduction processing described above. As a result, an image reflecting the recognized audio content (a decoration image, in the present embodiment) is displayed on the display 62 with the configuration and composition of the original image maintained.

FIG. 2 is a functional block diagram showing the functional configuration of the playback device 1 for executing the audio reproduction processing and the image reproduction processing.
Of the components of the playback device 1 of FIG. 1, FIG. 2 shows only the CPU 11, the RTC 14, the operation unit 17, the display unit 18, the audio output unit 19, and the storage unit 20.
The CPU 11 includes a reproduced audio acquisition unit 51, an audio content recognition unit 52, a reproduced image acquisition unit 53, and an audio content reflection unit 54.
The display unit 18 includes a display control unit 61 and a display 62.
The audio output unit 19 includes an audio output control unit 71 and a speaker 72.

When the reproduced audio acquisition unit 51 starts operating in response to a user instruction operation on the operation unit 17, it acquires the audio data to be reproduced from the audio storage unit 31. It is assumed here that the audio data to be reproduced has been determined in advance by a user instruction operation on the operation unit 17.
Details are given later with reference to the flowchart of FIG. 3, but the audio data to be reproduced is the audio data of one piece of music from beginning to end, which is too large to handle in a single pass. The audio data to be reproduced is therefore divided into audio data of predetermined periods and read out at predetermined time intervals. That is, the CPU 11, including the reproduced audio acquisition unit 51, executes the audio reproduction processing in units of the audio data of one such predetermined period. One unit of audio data subjected to the audio reproduction processing in this way is hereinafter called "the audio data of the processing target period".
In the present embodiment, the time interval at which the audio data of the processing target period is read out is set on the basis of the time interval at which the RTC 14 sends the current time to the CPU 11; specifically, it is 23 ms.
That is, in the present embodiment, the reproduced audio acquisition unit 51 sequentially acquires the audio data of the processing target period from the audio storage unit 31 every 23 ms, in synchronization with the timing at which the current time is sent from the RTC 14, and sequentially supplies it to the audio content recognition unit 52.
If the audio data of the processing target period has been encoded, and compressed as necessary, the reproduced audio acquisition unit 51 decompresses it as necessary and decodes it.

The audio content recognition unit 52 recognizes the audio content included in the audio data of the processing target period supplied from the reproduced audio acquisition unit 51.
Neither the audio content recognized by the audio content recognition unit 52 nor the method used to recognize it is particularly limited.
In the present embodiment, however, the audio content recognized by the audio content recognition unit 52 is one of three kinds: a male voice, a female voice, or sound that contains no human (male or female) voice.
Details are given later, but each of these three kinds of audio content is characterized by a specific frequency band. The audio data of the processing target period supplied from the reproduced audio acquisition unit 51, however, is time-domain audio data. Time-domain audio data is time-series sound data whose horizontal axis is time, as shown in FIG. 4, described later. It is well suited to analyzing how sound pressure changes over time but poorly suited to analyzing frequency distribution. It is therefore difficult for the audio content recognition unit 52 to recognize the features of a specific frequency band while the audio data remains in time-domain form.
In the present embodiment, the audio content recognition unit 52 therefore converts the audio data of the processing target period from time-domain form into frequency-domain form. Frequency-domain audio data is sound data that shows a frequency distribution, with frequency on the horizontal axis, as shown in FIG. 5, described later.
Note that "time-domain" and "frequency-domain" are modifiers applied to "data"; the wording in which they appear varies below as appropriate.
So-called Fourier transform processing can be used to convert audio data from time-domain form into frequency-domain form. More specifically, the present embodiment uses FFT (Fast Fourier Transform) processing.
That is, the audio content recognition unit 52 applies FFT processing to the audio data of the processing target period and, using the resulting frequency-domain audio data, recognizes a male voice, a female voice, or sound containing no human voice as the audio content included in that audio data.
The recognition result of the audio content recognition unit 52 is supplied to the audio content reflection unit 54. The audio data of the processing target period, meanwhile, is supplied to the audio output control unit 71 in time-domain form.
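The conversion from time-domain to frequency-domain form described above can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment. The function names `fft`, `magnitude_spectrum`, and `band_energy` are hypothetical; the 1024-sample frame length and 44.1 kHz sampling rate are the figures this description gives for step S1; and the 300 Hz to 600 Hz band in the example is an arbitrary choice for demonstration, not a band named in this description.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrum(frame, sample_rate):
    """Return (frequency_hz, magnitude) pairs for bins below the Nyquist frequency."""
    n = len(frame)
    spectrum = fft(frame)
    return [(k * sample_rate / n, abs(spectrum[k])) for k in range(n // 2)]


def band_energy(spectrum, low_hz, high_hz):
    """Sum of squared magnitudes for bins with low_hz <= frequency < high_hz."""
    return sum(mag * mag for freq, mag in spectrum if low_hz <= freq < high_hz)


# Example frame: a pure tone centred exactly on FFT bin 10 (about 430.7 Hz).
SAMPLE_RATE = 44100
FRAME_LEN = 1024
tone_hz = 10 * SAMPLE_RATE / FRAME_LEN
frame = [math.sin(2 * math.pi * tone_hz * i / SAMPLE_RATE) for i in range(FRAME_LEN)]

spectrum = magnitude_spectrum(frame, SAMPLE_RATE)
peak_hz, peak_mag = max(spectrum, key=lambda pair: pair[1])
in_band = band_energy(spectrum, 300.0, 600.0)   # band chosen only for the demo
total = band_energy(spectrum, 0.0, SAMPLE_RATE / 2)
```

A recognizer along the lines of the embodiment would compare energies such as `in_band` across the frequency bands that characterize each kind of audio content; which bands and thresholds to use is left open in this sketch.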

When the reproduced image acquisition unit 53 starts operating in response to a user instruction operation on the operation unit 17, it acquires the original image data to be reproduced from the image storage unit 32. It is assumed here that a plurality of image data has been determined in advance, by a user instruction operation on the operation unit 17, as the original image data to be reproduced, and that the reproduction order of that image data has likewise been determined in advance by a user instruction operation on the operation unit 17.
The reproduced image acquisition unit 53 therefore acquires from the image storage unit 32, at predetermined time intervals, the image data that is next in the reproduction order. This image data is hereinafter called "the original image data to be reproduced next".
In the present embodiment, the time interval at which the original image data to be reproduced next is read out is set on the basis of the time interval at which the RTC 14 sends the current time to the CPU 11; specifically, it is 3 s.
That is, in the present embodiment, the reproduced image acquisition unit 53 acquires the original image data to be reproduced next from the image storage unit 32 every 3 s, in synchronization with the timing at which the current time is sent from the RTC 14, and supplies it to the audio content reflection unit 54.
If the original image data to be reproduced next has been compression-encoded, the reproduced image acquisition unit 53 decompresses and decodes it.

When the original image data to be reproduced next is reproduced, the audio content reflection unit 54 executes processing that displays on the display 62 an image reflecting the audio content recognized by the audio content recognition unit 52, with the configuration and composition of the original image maintained. This processing by the audio content reflection unit 54 is hereinafter called "audio content reflection processing".
In the present embodiment, as the audio content reflection processing, the audio content reflection unit 54 applies to the original image data to be reproduced next, supplied from the reproduced image acquisition unit 53, image processing that adds to the original image a decoration image corresponding to the audio content recognized by the audio content recognition unit 52.
This yields the image data of an image in which the decoration image has been added to the original image (hereinafter, a "decoration-added original image"), which the audio content reflection unit 54 supplies to the display control unit 61.
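The decoration-adding step can be pictured as drawing onto a copy of the original image, so that the original pixels and their arrangement are otherwise unchanged. The sketch below is an illustration under stated assumptions: images are modelled as nested lists of pixel values, `None` marks a transparent decoration pixel, and the `DECORATIONS` table mapping each recognized audio content to a decoration is hypothetical, since this description does not specify what the decorations look like.

```python
import copy

# Hypothetical decorations, one per recognizable audio content.
# None means "transparent": the original pixel shows through.
DECORATIONS = {
    "male_voice":   [["M", None], [None, "M"]],
    "female_voice": [["F", "F"], [None, None]],
    "no_voice":     [[None, None], [None, None]],
}


def add_decoration(original, audio_content, top=0, left=0):
    """Return a new image: the original with the decoration drawn at (top, left)."""
    decorated = copy.deepcopy(original)   # never modify the original image data
    decoration = DECORATIONS[audio_content]
    for dy, row in enumerate(decoration):
        for dx, pixel in enumerate(row):
            y, x = top + dy, left + dx
            inside = 0 <= y < len(decorated) and 0 <= x < len(decorated[y])
            if pixel is not None and inside:
                decorated[y][x] = pixel
    return decorated


original = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
decorated = add_decoration(original, "male_voice", top=1, left=1)
```

Working on a copy keeps the stored original image data intact, so each new recognition result can be applied to the same unmodified original.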

Under the control of the CPU 11, the display control unit 61 reproduces the image data supplied from the audio content reflection unit 54. In the present embodiment, the decoration-added original image is thereby displayed on the display 62.
As described above, the recognition result of the audio content recognition unit 52, that is, the audio content, is supplied to the audio content reflection unit 54 every 23 ms, whereas the original image data to be reproduced next is supplied to the audio content reflection unit 54 every 3 s.
In the present embodiment, the audio content reflection unit 54 therefore executes the audio content reflection processing every 23 ms. That is, the data of the decoration-added original image is updated every 23 ms and supplied to the display control unit 61.
As a result, the same original image remains on the display 62 for 3 s (and its composition and configuration are therefore maintained for 3 s), while the display of the decoration image corresponding to the audio content is updated every 23 ms.
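The relationship between the two update rates can be checked with a little arithmetic. The sketch below uses the nominal intervals given above (23 ms for the decoration update, 3 s for switching the original image); the helper `state_at` is hypothetical and only reports which original image and which decoration update are current at a given elapsed time.

```python
AUDIO_INTERVAL_MS = 23      # decoration update interval from the description
IMAGE_INTERVAL_MS = 3000    # original image switching interval from the description


def state_at(elapsed_ms):
    """Return (image_index, decoration_update_index) at the given elapsed time."""
    image_index = elapsed_ms // IMAGE_INTERVAL_MS
    update_index = elapsed_ms // AUDIO_INTERVAL_MS
    return image_index, update_index


# Complete decoration update intervals that fit in one image display period.
updates_per_image = IMAGE_INTERVAL_MS // AUDIO_INTERVAL_MS
```

With these figures, roughly 130 decoration updates occur while a single original image stays on screen.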

Under the control of the CPU 11, the audio output control unit 71 reproduces the audio data of the processing target period (time-domain audio data) supplied from the audio content recognition unit 52. In the present embodiment, as described above, the audio data of the processing target period is supplied sequentially from the audio content recognition unit 52 to the audio output control unit 71 every 23 ms. The audio output control unit 71 accordingly outputs from the speaker 72 the audio corresponding to the audio data of the processing target period, that is, 23 ms of audio whose audio content corresponds to the decoration image currently shown on the display 62.

次に、図３を参照して、このような図２の機能的構成を有する再生装置１の処理のうち、音声再生処理について説明する。
図３は、音声再生処理の流れを説明するフローチャートである。 Next, with reference to FIG. 3, an audio reproduction process among the processes of the reproduction apparatus 1 having the functional configuration of FIG. 2 will be described.
FIG. 3 is a flowchart for explaining the flow of the audio reproduction process.

例えば、音声再生処理は、本実施形態では、ユーザが操作部１７を指示操作することによって、再生対象の音声データ、再生対象の複数のオリジナルの画像データ、及び、それらの画像データの再生順番を決定したことを契機として、開始する。なお、音声再生処理の開始と同期して、後述する図６の画像再生処理も開始する。
なお、ここでは、モノラルの音声データであって、サンプリング周波数４４．１ｋＨｚで１６ビット符号化された音声データが無圧縮でＷＡＶＥ形式のファイルに含められて、再生音声取得部５１に記憶されているものとする。 For example, in the present embodiment, the audio reproduction process starts when the user operates the operation unit 17 to decide the audio data to be reproduced, the plurality of original image data to be reproduced, and the reproduction order of those image data. In synchronization with the start of the audio reproduction process, the image reproduction process of FIG. 6 described later also starts.
Here, it is assumed that monaural audio data encoded at 16 bits with a sampling frequency of 44.1 kHz is contained uncompressed in a WAVE-format file and stored in the reproduction audio acquisition unit 51.

ステップＳ１において、再生音声取得部５１は、音声記憶部３１に記憶された再生対象の音声データのうち、読み出しアドレスから所定期間分の音声データを、処理対象期間の音声データとして取得する。
ここで、読み出しアドレスとは、再生対象の音声データが記憶されている音声記憶部３１のアドレスのうち、原則として、前回の処理対象期間の音声データの最後尾のアドレスの次のアドレス（以下、「処理対象期間の次のアドレス」と表現する）をいう。ただし、音声再生処理が開始された直後の初回のステップＳ１の処理、及び、後述するステップＳ４の処理が実行された後のステップＳ１の処理では、再生対象の音声データの最初の部分が記憶されているアドレス（以下、単に「最初」と表現する）が、読み出しアドレスになる。
より具体的には、本実施形態では、所定期間として２３ｍｓが採用されている。そして、４４．１ｋＨｚでサンプリングされた音声データが採用されている。このため、処理対象期間の音声データとは、２３ｍｓに相当する個数、即ち１０２４個のサンプルデータとなる。従って、読み出しアドレスから順に１０２４個のサンプルデータが、音声記憶部３１から読み出され、処理対象期間の音声データとして再生音声取得部５１に取得される。即ち、処理対象期間の音声データのサイズは、１０２４個分のサンプルデータに相当する２０４８バイトである。従って、読み出しアドレスから２０４８バイト分のデータが、処理対象期間の音声データとして音声記憶部３１から読み出される。 In step S 1, the reproduction audio acquisition unit 51 acquires audio data for a predetermined period from the read address among the audio data to be reproduced stored in the audio storage unit 31 as audio data for the processing target period.
Here, the read address is, in principle, the address of the audio storage unit 31 (in which the audio data to be reproduced is stored) that follows the last address of the audio data in the previous processing target period (hereinafter expressed as "the next address of the processing target period"). However, in the first execution of step S1 immediately after the audio reproduction process starts, and in an execution of step S1 after the process of step S4 described later, the address at which the first part of the audio data to be reproduced is stored (hereinafter simply expressed as "first") becomes the read address.
More specifically, in this embodiment, 23 ms is adopted as the predetermined period. Audio data sampled at 44.1 kHz is employed. For this reason, the audio data in the processing target period is the number corresponding to 23 ms, that is, 1024 sample data. Accordingly, 1024 pieces of sample data are read out from the audio storage unit 31 in order from the read address, and are acquired by the reproduction audio acquisition unit 51 as audio data in the processing target period. That is, the size of the audio data in the processing target period is 2048 bytes corresponding to 1024 sample data. Accordingly, 2048 bytes of data from the read address is read from the voice storage unit 31 as voice data for the processing target period.
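The relationship between the 23 ms period, the 1024 samples, and the 2048 bytes can be checked with a short calculation. The following is a minimal sketch; the variable names are illustrative and do not appear in the original text.

```python
# Figures taken from the embodiment: 44.1 kHz, 16-bit, monaural, 1024 samples per period.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
SAMPLES_PER_CHUNK = 1024  # one "processing target period"

bytes_per_sample = BITS_PER_SAMPLE // 8
chunk_bytes = SAMPLES_PER_CHUNK * bytes_per_sample
chunk_ms = 1000 * SAMPLES_PER_CHUNK / SAMPLE_RATE_HZ

print(chunk_bytes)         # 2048
print(round(chunk_ms, 2))  # 23.22
```

The exact duration of 1024 samples is therefore slightly over 23 ms, which the text rounds to 23 ms.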

図４は、ステップＳ１の処理で再生音声取得部５１により処理対象期間の音声データが取得される再生対象の音声データの一例を示すタイミングチャートである。
図４のタイミングチャートにおいて、横軸は、時間を示している。また、縦軸は、音圧を示している。
なお、図４において、横軸の目盛値は、音声データのサンプル数の区切りに合わせて付与しているため、秒単位とはなっていない。処理対象期間の音声データが２４個分集合した場合に相当する時間（５５０ｍｓ）は、この目盛間隔の１／８程度になる。即ち、この目盛間隔の（１／８）×（１／２４）程度の期間分のデータが、１回のステップＳ１の処理により、処理対象期間の音声データとして取得されることになる。 FIG. 4 is a timing chart showing an example of reproduction target audio data from which the reproduction audio acquisition unit 51 acquires audio data for the processing target period in the process of step S1.
In the timing chart of FIG. 4, the horizontal axis indicates time. The vertical axis indicates the sound pressure.
In FIG. 4, the scale value on the horizontal axis is given in accordance with the separation of the number of samples of audio data, and is not in units of seconds. The time (550 ms) corresponding to the collection of 24 audio data in the processing target period is about 1/8 of the scale interval. That is, data for a period of about (1/8) × (1/24) of the scale interval is acquired as audio data for the processing target period by one process of step S1.

図３のステップＳ２において、音声内容認識部５２は、ステップＳ１の処理で取得した音声データの処理対象期間は、再生対象の最後の期間であるか否かを判定する。 In step S2 of FIG. 3, the audio content recognition unit 52 determines whether or not the processing target period of the audio data acquired in the process of step S1 is the last period to be reproduced.

再生対象の最後の期間ではない場合、ステップＳ２において、ＮＯであると判定されて、ステップＳ３に進む。 If it is not the last period to be reproduced, it is determined as NO in step S2, and the process proceeds to step S3.

ステップＳ３において、音声内容認識部５２は、読み出しアドレスを処理対象期間の次のアドレスに更新する。 In step S3, the voice content recognition unit 52 updates the read address to an address next to the processing target period.

これに対して、再生対象の最後の期間である場合、ステップＳ２において、ＹＥＳであると判定されて、ステップＳ４に進む。
ステップＳ４において、音声内容認識部５２は、読み出しアドレスを最初に更新する。
これにより、次回のステップＳ１の処理では、再生対象の音声データは最初から読み出されることになる。即ち、再生対象の音声データは、最初から再生されることになる。このようにして、本実施形態では、再生対象の音声データの繰り返し再生を実現している。 On the other hand, if it is the last period to be reproduced, it is determined as YES in step S2, and the process proceeds to step S4.
In step S4, the audio content recognition unit 52 updates the read address to "first".
Thereby, in the next process of step S1, the audio data to be reproduced is read from the beginning. That is, the audio data to be reproduced is reproduced from the beginning. In this way, in this embodiment, repeated reproduction of audio data to be reproduced is realized.

ステップＳ３又はステップＳ４の処理により、読み出しアドレスが更新されると、処理はステップＳ５に進む。 When the read address is updated by the process of step S3 or step S4, the process proceeds to step S5.
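The read-address handling of steps S1 to S4 can be sketched as follows. This is only an illustration under stated assumptions: addresses are modeled as byte offsets starting at 0, and the "last period" test (the period reaches or passes the end of the data) is a hypothetical choice, since the text does not say how the last period is detected. The function name is also illustrative.

```python
CHUNK_BYTES = 2048  # size of one processing target period (1024 samples x 2 bytes)

def next_read_address(read_address: int, total_bytes: int) -> int:
    """Return the read address to use in the next execution of step S1."""
    end_of_chunk = read_address + CHUNK_BYTES
    is_last_period = end_of_chunk >= total_bytes  # step S2 (assumed criterion)
    if is_last_period:
        return 0             # step S4: back to the "first" address
    return end_of_chunk      # step S3: next address of the processing target period

# Walk through audio data that is exactly three periods long.
addresses = []
addr = 0
for _ in range(5):
    addresses.append(addr)
    addr = next_read_address(addr, total_bytes=3 * CHUNK_BYTES)

print(addresses)  # [0, 2048, 4096, 0, 2048]
```

The wrap back to offset 0 after the third period corresponds to the repeated reproduction described above.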

ステップＳ５において、音声内容認識部５２は、処理対象期間の音声データに対して、ＦＦＴ処理を施す。
ここで、ステップＳ５のＦＦＴ処理結果、即ち、処理対象期間の周波数領域の音声データは、過去から連続して数１０回分が、所定のメモリ、例えば記憶部２０の一領域（図２には図示せず）に記憶されるものとする。 In step S5, the audio content recognition unit 52 performs FFT processing on the audio data in the processing target period.
Here, it is assumed that the FFT processing results of step S5, that is, the frequency domain audio data of the processing target periods, are stored for the most recent several tens of consecutive periods in a predetermined memory, for example, one area of the storage unit 20 (not shown in FIG. 2).
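As a rough illustration of step S5, the sketch below transforms one 1024-sample period into the frequency domain and keeps the most recent results in a bounded buffer. It is not the implementation of the embodiment: the recursive radix-2 FFT, the history length of 24, and all names are assumptions chosen for a self-contained example.

```python
import cmath
import math
from collections import deque

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def magnitude_spectrum(chunk):
    """Magnitudes of the non-negative frequency bins of one period."""
    return [abs(c) for c in fft(chunk)[: len(chunk) // 2]]

HISTORY_LEN = 24                      # assumed value for "several tens" of periods
history = deque(maxlen=HISTORY_LEN)   # oldest result is dropped automatically

N = 1024
tone_bin = 6  # 6 * 44100 / 1024 is roughly 258 Hz
chunk = [math.sin(2 * math.pi * tone_bin * i / N) for i in range(N)]
history.append(magnitude_spectrum(chunk))

peak_bin = max(range(N // 2), key=lambda k: history[-1][k])
print(peak_bin)  # 6
```

With 1024 samples at 44.1 kHz, each bin is about 43 Hz wide, so the bands near 250 Hz and 560 Hz discussed later fall around bins 6 and 13.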

ステップＳ６において、音声内容認識部５２は、処理対象期間と、過去の複数期間との各々のＦＦＴ処理結果（周波数領域の音声データ）を比較することにより、処理対象期間の音声内容を認識する。
具体的には、処理対象期間の音声データの音声内容が、男性の声、女性の声、及び人間の声を含まない音のうちの何れの種類であるのかが、周波数領域での比較に基づいて判定される。 In step S 6, the audio content recognition unit 52 recognizes the audio content of the processing target period by comparing the FFT processing results (frequency domain audio data) of the processing target period and the past plural periods.
Specifically, based on the comparison in the frequency domain, it is determined which of the following types the audio content of the audio data in the processing target period is: a male voice, a female voice, or a sound that does not include a human voice.

さらに、以下、図５を参照して、音声内容認識部５２によるステップＳ５及びＳ６の処理の詳細について説明する。 Further, the details of the processing of steps S5 and S6 by the audio content recognition unit 52 will be described below with reference to FIG.

図５（Ａ）は、再生対象の音声データのうち、人間の声を含まない音に対応する音声データに対して、ＦＦＴ処理を施した結果の一例を示している。即ち、人間の声を含まない音に対応する周波数領域の音声データの一例が、図５（Ａ）に示されている。
図５（Ｂ）は、再生対象の音声データのうち、男性の声を含む音に対応する音声データに対して、ＦＦＴ処理を施した結果の一例を示している。即ち、男性の声を含む音に対応する周波数領域の音声データの一例が、図５（Ｂ）に示されている。 FIG. 5A shows an example of the result of performing FFT processing on audio data corresponding to sound that does not include human voices among audio data to be reproduced. That is, FIG. 5A shows an example of frequency domain audio data corresponding to a sound that does not include a human voice.
FIG. 5B shows an example of the result of performing FFT processing on audio data corresponding to a sound including a male voice among the audio data to be reproduced. That is, FIG. 5B shows an example of audio data in the frequency domain corresponding to a sound including a male voice.

図５（Ａ）と図５（Ｂ）とを比較すると、図５（Ｂ）においては、２５０Ｈｚ付近の周波数成分の強度が強いのに対して、図５（Ａ）においては、２５０Ｈｚ付近の周波数成分の強度が弱いことがわかる。このことは、男性のピッチ（基本周波数）は２５０Ｈｚ付近にあるといわれているという内容と一致している。
また、図示はしないが、このピッチは、男性と女性で差異があることも知られており、男性のピッチが上述した２５０Ｈｚ付近であるのに対して、女性のピッチは５６０Ｈｚ付近であるといわれている。
従って、音声内容認識部５２は、２５０Ｈｚや５６０Ｈｚ付近の特定周波数帯に着目して、過去数１０回分の周波数領域のデータ（ＦＦＴ処理結果）を比較し、着目した特定周波数帯の周波数成分の強度の変化度合に基づいて、音声内容を認識することができる。
即ち、音声内容認識部５２は、２５０Ｈｚ付近の周波数成分の強度変化が大きい場合には、処理対象領域の音声データの音声内容は、男性の声であると認識することができる。
音声内容認識部５２は、５６０Ｈｚ付近の周波数成分の強度変化が大きい場合には、処理対象領域の音声データの音声内容は、女性の声であると認識することができる。
そして、音声内容認識部５２は、２５０Ｈｚ付近及び５６０Ｈｚ付近の何れの周波数成分の強度変化が小さい場合には、処理対象領域の音声データの音声内容は、人間の声を含まない音であると認識することができる。 Comparing FIG. 5(A) with FIG. 5(B) shows that the intensity of the frequency components near 250 Hz is strong in FIG. 5(B) but weak in FIG. 5(A). This is consistent with the commonly cited statement that the male pitch (fundamental frequency) is around 250 Hz.
Although not shown, this pitch is also known to differ between men and women: the male pitch is said to be around 250 Hz as mentioned above, whereas the female pitch is said to be around 560 Hz.
Therefore, the audio content recognition unit 52 can focus on specific frequency bands near 250 Hz and 560 Hz, compare the frequency domain data (FFT processing results) of the past several tens of periods, and recognize the audio content based on the degree of change in the intensity of the frequency components in the band of interest.
That is, when the intensity change of the frequency components near 250 Hz is large, the audio content recognition unit 52 can recognize that the audio content of the audio data in the processing target area is a male voice.
When the intensity change of the frequency components near 560 Hz is large, the audio content recognition unit 52 can recognize that the audio content of the audio data in the processing target area is a female voice.
When the intensity changes of the frequency components near both 250 Hz and 560 Hz are small, the audio content recognition unit 52 can recognize that the audio content of the audio data in the processing target area is a sound that does not include a human voice.
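The decision rule described above can be sketched as follows. Several details are assumptions not given in the text: the change in intensity is measured as the range (maximum minus minimum) over the stored periods, the threshold value is arbitrary, and when both bands change strongly the band with the larger change wins. All function names are illustrative.

```python
SAMPLE_RATE_HZ = 44_100
FFT_SIZE = 1024

def band_bin(freq_hz):
    """FFT bin index closest to the given frequency."""
    return round(freq_hz * FFT_SIZE / SAMPLE_RATE_HZ)

def intensity_change(values):
    """Assumed measure of change: spread of the intensity over the history."""
    return max(values) - min(values)

def classify(band_250, band_560, threshold=10.0):
    """Classify from the intensity histories near 250 Hz and 560 Hz."""
    change_250 = intensity_change(band_250)
    change_560 = intensity_change(band_560)
    if change_250 < threshold and change_560 < threshold:
        return "no_human_voice"
    # Tie-break when both bands change strongly (assumption).
    return "male" if change_250 >= change_560 else "female"

print(band_bin(250), band_bin(560))           # 6 13
print(classify([1, 50, 3, 40], [2, 3, 2, 3]))  # male
print(classify([2, 3, 2], [1, 60, 5]))         # female
print(classify([2, 3], [4, 5]))                # no_human_voice
```

In practice the two input lists would be filled from the stored FFT results, taking the magnitude at (or around) bins 6 and 13 for each past period.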

なお、音声内容認識部５２による音声内容の認識手法は、特に前段落の例に限定されず、任意でもよい。
例えば、音声内容認識部５２は、最初に、人間の声を含むか否かを切り分け、人間の声を含む場合にのみ、当該人間の声が男性の声であるのか女性の声であるのかを切り分けるようにしてもよい。この場合、最初の人間の声を含むか否かの判断では、広範囲な周波数帯、例えば、数１０Ｈｚ乃至２０００Ｈｚの周波数成分の強度変化の度合を用いることができる。即ち、これらの強度変化の度合が、大きい場合には、人間の声を含むと判定される一方、小さい場合には、人間の声を含まないと判定される。
また、音声内容認識部５２による音声内容の認識に用いる要素も、周波数成分の単純な強弱の変化のみならず、例えば、増状態の保持時間、次の増状態の時間との間（滅状態の期間）等様々な要素を採用することができる。
さらにまた、１つの処理対象期間の音声データの音声内容を認識する場合において、比較対象として用いられる音声データは、特に限定されない。ただし、比較として用いられる音声データの数は、画像表示の間隔（本実施形態では３ｓ）や、画像データが読み込まれてから表示されるまでの一連の処理に要する時間等を加味すると、数１００ｍｓ程度分の個数が好適である。そこで、例えば、５００ｍｓ程度分の音声データを採用するとした場合、処理対象期間の音声データに換算すると２４個分となり、数１０個とした上述の例と合致する。 Note that the speech content recognition method by the speech content recognition unit 52 is not particularly limited to the example in the previous paragraph, and may be arbitrary.
For example, the audio content recognition unit 52 may first determine whether or not a human voice is included and, only when a human voice is included, then determine whether that voice is a male voice or a female voice. In this case, the initial determination of whether a human voice is included can use the degree of intensity change of frequency components over a wide frequency band, for example, from several tens of Hz to 2000 Hz. That is, when the degree of intensity change is large, it is determined that a human voice is included, and when it is small, it is determined that a human voice is not included.
In addition, the elements used by the audio content recognition unit 52 to recognize the audio content are not limited to simple changes in the strength of frequency components; various elements can be adopted, such as the holding time of an increased state and the interval until the next increased state (the duration of the decreased state).
Furthermore, when the audio content of the audio data for one processing target period is recognized, the audio data used as comparison targets is not particularly limited. However, taking into account the image display interval (3 s in the present embodiment), the time required for the series of processes from reading image data to displaying it, and the like, a number of audio data corresponding to about several hundred milliseconds is suitable. For example, if audio data covering about 500 ms is adopted, this corresponds to 24 units of audio data of the processing target period, which is consistent with the above example of several tens.
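The relation between the comparison window and the number of periods can be checked numerically. This is a small sketch using the exact duration of 1024 samples at 44.1 kHz; the variable names are illustrative.

```python
import math

SAMPLE_RATE_HZ = 44_100
SAMPLES_PER_CHUNK = 1024

chunk_ms = 1000 * SAMPLES_PER_CHUNK / SAMPLE_RATE_HZ  # about 23.22 ms
span_of_24_ms = 24 * chunk_ms                         # time covered by 24 periods
chunks_for_500_ms = math.ceil(500 / chunk_ms)         # periods needed to cover 500 ms

print(round(span_of_24_ms))  # 557
print(chunks_for_500_ms)     # 22
```

So 24 periods cover roughly 557 ms, which is in the "about 500 ms" range and close to the 550 ms figure given for FIG. 4; covering exactly 500 ms would need 22 periods.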

以上、音声内容認識部５２によるステップＳ５及びＳ６の処理の詳細について説明した。このようなステップＳ５及びＳ６の処理が終了し、音声内容認識部５２の認識結果、即ち、音声内容が、音声内容反映部５４に供給されると、処理はステップＳ７に進む。 The details of the processing in steps S5 and S6 by the voice content recognition unit 52 have been described above. When the processing of steps S5 and S6 is completed and the recognition result of the speech content recognition unit 52, that is, the speech content is supplied to the speech content reflection unit 54, the process proceeds to step S7.

ステップＳ７において、音声出力部１９は、処理対象期間の音声データを再生する。
即ち、音声出力部１９は、処理対象期間の音声データに対応する音声、即ち、ステップＳ６の処理で認識された音声内容の音声を、スピーカ７２から出力する。 In step S7, the audio output unit 19 reproduces the audio data in the processing target period.
That is, the audio output unit 19 outputs the audio corresponding to the audio data in the processing target period, that is, the audio of the audio content recognized in the process of step S6 from the speaker 72.

ステップＳ８において、再生音声取得部５１は、処理の終了指示があったか否かを判定する。
終了の指示は、特に限定されないが、本実施形態では、ユーザが操作部１７を操作して行う、画像及び音声の再生終了の指示が採用されているものとする。 In step S8, the reproduction sound acquisition unit 51 determines whether or not there is an instruction to end the process.
The end instruction is not particularly limited, but in the present embodiment, it is assumed that an instruction to end the reproduction of images and sounds, which is performed by the user operating the operation unit 17, is adopted.

この場合、再生終了の指示がなされていない場合、ステップＳ８においてＮＯであると判定されて、処理はステップＳ１に戻され、それ以降の処理が繰り返される。即ち、再生終了の指示がなさるまでの間、ステップＳ１乃至Ｓ８のループ処理が繰り返し実行される。処理対象期間の音声データを単位として、その音声内容が認識されると共に、その音声内容の音声がスピーカ７２から出力される、といった処理が繰り返し実行される。 In this case, when the instruction to end the reproduction is not given, it is determined as NO in Step S8, the process is returned to Step S1, and the subsequent processes are repeated. That is, the loop processing of steps S1 to S8 is repeatedly executed until an instruction to end reproduction is given. The process of recognizing the audio content and outputting the audio content of the audio content from the speaker 72 is repeatedly executed in units of audio data in the processing target period.

その後、再生終了の指示がなされると、ステップＳ８においてＹＥＳであると判定されて、音楽再生処理は終了となる。 Thereafter, when an instruction to end playback is given, it is determined YES in step S8, and the music playback process ends.

以上、図３を参照して、図２の再生装置１の処理のうち、音声再生処理について説明した。
次に、図６を参照して、図２の再生装置１の処理のうち、画像再生処理について説明する。
図６は、画像再生処理の流れを説明するフローチャートである。 Heretofore, the audio reproduction process among the processes of the reproduction apparatus 1 of FIG. 2 has been described with reference to FIG.
Next, an image reproduction process among the processes of the reproduction apparatus 1 in FIG. 2 will be described with reference to FIG.
FIG. 6 is a flowchart for explaining the flow of the image reproduction process.

例えば、画像再生処理は、本実施形態では、ユーザが操作部１７を指示操作することによって、再生対象の音声データ、再生対象の複数のオリジナルの画像データ、及び、それらの画像データの再生順番を決定したことを契機として、開始する。即ち、上述したように、図３の音声再生処理の開始と同期して図６の画像再生処理も開始する。
なお、ここでは、デジタルカメラ等で撮影された写真の画像データが、いわゆるＪＰＥＧ符号化されてＪＰＥＧ形式のファイルに含まれたデータ（以下、「ＪＰＥＧデータ」と呼ぶ）が、再生対象の複数のオリジナルの画像データとして決定されているものとする。そして、これらの複数のオリジナルの画像データが、３秒間隔で、いわゆるスライド再生されるものとする。 For example, in the present embodiment, the image reproduction process starts when the user operates the operation unit 17 to decide the audio data to be reproduced, the plurality of original image data to be reproduced, and the reproduction order of those image data. That is, as described above, the image reproduction process of FIG. 6 starts in synchronization with the start of the audio reproduction process of FIG. 3.
Here, it is assumed that image data of photographs taken with a digital camera or the like, JPEG-encoded and contained in JPEG-format files (hereinafter referred to as "JPEG data"), has been decided as the plurality of original image data to be reproduced. It is also assumed that the plurality of original image data are reproduced as a so-called slide show at intervals of 3 seconds.

ステップＳ２１において、再生画像取得部５３は、次回再生対象の初期設定を行う。即ち、ここでは、上述の再生順番として１番が決定されたオリジナルの画像データが、次回再生対象の画像データとして設定されるものとする。 In step S21, the reproduction image acquisition unit 53 performs initial setting of the next reproduction target. That is, here, it is assumed that the original image data for which No. 1 is determined as the reproduction order is set as the next reproduction target image data.

ステップＳ２２において、再生画像取得部５３は、画像記憶部３２に記憶されている再生対象の複数のオリジナルの画像データのうち、次回再生対象のオリジナルの画像データを取得する。
なお、ここでは、次回再生対象のオリジナルの画像データはＪＰＥＧデータである。そこで、再生画像取得部５３は、当該ＪＰＥＧデータに対して伸長復号処理を施す。ここで、伸長復号処理の結果得られる画像データの形態は特に限定されず、例えば、出力サイズ（例えば、１０２４×７６８、１０２４×６００、８００×６００、６４０×４８０）に合わせた非圧縮のＲＧＢの１画素当たり２４ビットの画像データ、ＹＵＶ４２２の１画素当たり１６ビットの画像データ、或いは、ＹＵＶ４２０の１画素当たり１２ビットの画像データ等を採用することができる。 In step S 22, the reproduction image acquisition unit 53 acquires original image data to be reproduced next time among the plurality of original image data to be reproduced stored in the image storage unit 32.
Here, the original image data to be reproduced next time is JPEG data. Therefore, the reproduction image acquisition unit 53 performs decompression decoding processing on the JPEG data. The form of the image data obtained by the decompression decoding processing is not particularly limited. For example, uncompressed image data matched to the output size (for example, 1024 × 768, 1024 × 600, 800 × 600, or 640 × 480) can be employed, such as RGB image data of 24 bits per pixel, YUV422 image data of 16 bits per pixel, or YUV420 image data of 12 bits per pixel.
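To give a sense of what the choice of decoded format means in memory terms, the following sketch computes the size of one decoded frame for the bit depths listed above. The dictionary keys and the function name are illustrative labels, not identifiers from the original text.

```python
# Bits per pixel for the decoded formats mentioned in the text.
BITS_PER_PIXEL = {"RGB24": 24, "YUV422": 16, "YUV420": 12}

def frame_bytes(width, height, pixel_format):
    """Size in bytes of one uncompressed frame."""
    return width * height * BITS_PER_PIXEL[pixel_format] // 8

for fmt in BITS_PER_PIXEL:
    print(fmt, frame_bytes(1024, 768, fmt))
# RGB24 2359296
# YUV422 1572864
# YUV420 1179648
```

At 1024 × 768, YUV420 needs half the memory of 24-bit RGB, which may matter on a device such as a digital photo frame.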

ステップＳ２３において、音声内容反映部５４は、音声内容を取得する。ここで、音声内容反映部５４が取得する音声内容とは、図３の音声再生処理のステップＳ６の処理で音声内容認識部５２により認識された、処理対象期間の音声データについての音声内容である。
即ち、音声内容反映部５４は、処理対象期間の音声データについての音声内容として、男性の声、女性の声、及び人間の声を含まない音のうちの何れの種類を取得する。
より具体的には例えば、本実施形態では、音声内容認識部５２は、処理対象期間の音声データについての音声内容の認識結果を、識別コードとして音声内容反映部５４に供給するものとする。即ち、音声内容認識部５２は、男性の声を認識した場合には識別コードＣＢを発行し、女性の声（子供の声の可能性あり）を認識した場合には識別コードＣＣを発行し、人間の声を含まない音を認識した場合には識別コードＣＤを発行する。
音声内容反映部５４は、ステップＳ２３の処理で、これらの識別コードＣＢ，ＣＣ，ＣＤのうちの何れかを音声内容として取得する。 In step S23, the audio content reflecting unit 54 acquires the audio content. Here, the audio content acquired by the audio content reflection unit 54 is the audio content of the audio data in the processing target period recognized by the audio content recognition unit 52 in step S6 of the audio reproduction process of FIG. .
That is, the audio content reflection unit 54 acquires, as the audio content of the audio data in the processing target period, one of the following types: a male voice, a female voice, or a sound that does not include a human voice.
More specifically, for example, in this embodiment, the audio content recognition unit 52 supplies the audio content reflection unit 54 with the recognition result of the audio content for the audio data in the processing target period as an identification code. That is, the voice content recognition unit 52 issues an identification code CB when recognizing a male voice, and issues an identification code CC when recognizing a female voice (possibly a child's voice). When a sound that does not contain human voice is recognized, an identification code CD is issued.
The audio content reflection unit 54 acquires any of these identification codes CB, CC, and CD as the audio content in the process of step S23.

ステップＳ２４において、音声内容反映部５４は、ステップＳ２３の処理で取得した音声内容が前回から変化したか否かを判定する。 In step S24, the audio content reflection unit 54 determines whether or not the audio content acquired in the process of step S23 has changed from the previous time.

直前の回のステップＳ２３の処理で取得された識別コードが、その前の回のステップＳ２３の処理で取得された識別コードと同一である場合、音声内容が前回から変化していないため、ステップＳ２４において、ＮＯであると判定されて、処理はステップＳ２５に進む。
ステップＳ２５において、音声内容反映部５４は、次回再生対象の画像に対して前回と同一の装飾画像を付加する処理を、音声内容反映処理として、次回再生対象の画像データに対して施す。 If the identification code acquired in the most recent execution of step S23 is the same as the identification code acquired in the execution of step S23 before it, the audio content has not changed from the previous time. It is therefore determined as NO in step S24, and the process proceeds to step S25.
In step S 25, the audio content reflecting unit 54 performs processing for adding the same decorative image as the previous image to the next reproduction target image as the audio content reflection processing on the next reproduction target image data.

これに対して、直前の回のステップＳ２３の処理で取得された識別コードが、その前の回のステップＳ２３の処理で取得された識別コード異なる場合、音声内容が前回から変化しているため、ステップＳ２４において、ＹＥＳであると判定されて、処理はステップＳ２６に進む。
ステップＳ２６において、音声内容反映部５４は、次回再生対象の画像に対して音声内容に対応した装飾画像を付加する処理を、音声内容反映処理として、次回再生対象の画像データに対して施す。 On the other hand, when the identification code acquired in the previous step S23 is different from the identification code acquired in the previous step S23, the audio content has changed from the previous time. In step S24, it is determined as YES, and the process proceeds to step S26.
In step S 26, the audio content reflection unit 54 performs a process for adding a decoration image corresponding to the audio content on the next reproduction target image as the audio content reflection process on the next reproduction target image data.

ステップＳ２７において、表示部１８は、ＣＰＵ１１の制御の下、装飾画像が付加された次回再生対象のオリジナルの画像データを再生する。これにより、本実施形態では、図７に示すような装飾付加オリジナル画像がディスプレイ６２に表示される。 In step S27, the display unit 18 reproduces, under the control of the CPU 11, the original image data to be reproduced next time to which the decoration image has been added. Thereby, in the present embodiment, a decoration-added original image as shown in FIG. 7 is displayed on the display 62.

図７は、装飾付加オリジナル画像の一例を示している。
図７の例では、猫を被写体に含むオリジナルの画像８１が採用されている。
また、ステップＳ２５又はステップＳ２６の処理で付加される装飾画像としては、男性の声に対応する装飾画像９１と、女性の声に対応する装飾画像９２と、人間の声を含まない音に対応する装飾画像９３（音符で模している装飾画像９３）とが採用されている。
例えば、ステップＳ２３の処理で音声内容として識別コードＣＢが取得された場合、即ち男性の声が認識された場合、次のステップＳ２５又はステップＳ２６の処理で、オリジナルの画像８１に対して装飾画像９１が付加される音声内容反映処理が実行される。その結果、次のステップＳ２７の処理では、図７の右方の一番上に示す装飾付加オリジナル画像１０１がディスプレイ６２に表示される。
また例えば、ステップＳ２３の処理で音声内容として識別コードＣＣが取得された場合、即ち女性の声が認識された場合、次のステップＳ２５又はステップＳ２６の処理で、オリジナルの画像８１に対して装飾画像９２が付加される音声内容反映処理が実行される。その結果、次のステップＳ２７の処理では、図７の右方の中央に示す装飾付加オリジナル画像１０２がディスプレイ６２に表示される。
また例えば、ステップＳ２３の処理で音声内容として識別コードＣＤが取得された場合、即ち人間の声を含まない音が認識された場合、次のステップＳ２５又はステップＳ２６の処理で、オリジナルの画像８１に対して装飾画像９３が付加される音声内容反映処理が実行される。その結果、次のステップＳ２７の処理では、図７の右方の一番下に示す装飾付加オリジナル画像１０３がディスプレイ６２に表示される。
なお、装飾付加オリジナル画像１０１乃至１０３は、例示に過ぎない。即ち、装飾画像は、図７の例の装飾画像９１乃至９３に限定されず、任意でもよい。また、本実施形態では、３種類の音声内容をユーザに提示できれば足りるので、装飾画像の種類は３種類である必要はなく、２種類でもよい。具体的には例えば、人間の声を含まない音の場合、装飾画像９３を付加しないオリジナルの画像８１がそのままディスプレイ６２に表示されたとしても、人間の声を含まない音であることをユーザに提示することができる。 FIG. 7 shows an example of a decoration-added original image.
In the example of FIG. 7, an original image 81 including a cat as a subject is employed.
Further, the decoration image added in the process of step S25 or step S26 corresponds to a decoration image 91 corresponding to a male voice, a decoration image 92 corresponding to a female voice, and a sound not including a human voice. A decorative image 93 (decorative image 93 imitating a musical note) is employed.
For example, when the identification code CB is acquired as the audio content in the process of step S23, that is, when a male voice is recognized, the audio content reflection process that adds the decoration image 91 to the original image 81 is executed in the next step S25 or step S26. As a result, in the next step S27, the decoration-added original image 101 shown at the top right of FIG. 7 is displayed on the display 62.
Further, for example, when the identification code CC is acquired as the audio content in the process of step S23, that is, when a female voice is recognized, the audio content reflection process that adds the decoration image 92 to the original image 81 is executed in the next step S25 or step S26. As a result, in the next step S27, the decoration-added original image 102 shown at the middle right of FIG. 7 is displayed on the display 62.
Further, for example, when the identification code CD is acquired as the audio content in the process of step S23, that is, when a sound that does not include a human voice is recognized, the audio content reflection process that adds the decoration image 93 to the original image 81 is executed in the next step S25 or step S26. As a result, in the next step S27, the decoration-added original image 103 shown at the bottom right of FIG. 7 is displayed on the display 62.
Note that the decoration-added original images 101 to 103 are merely examples. That is, the decoration image is not limited to the decoration images 91 to 93 in the example of FIG. 7 and may be arbitrary. In the present embodiment, it is sufficient that the three types of audio content can be presented to the user, so the number of types of decoration image need not be three and may be two. Specifically, for example, in the case of a sound that does not include a human voice, displaying the original image 81 on the display 62 as it is, without the decoration image 93, still tells the user that the sound does not include a human voice.
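The flow of steps S23 to S26 for the FIG. 7 example can be sketched as a small lookup plus change detection. The class, its attributes, and the use of the reference numerals 91 to 93 as stand-ins for the decoration images are assumptions made for illustration only.

```python
# Identification codes from the text mapped to the decoration images of FIG. 7.
DECORATION_FOR_CODE = {"CB": 91, "CC": 92, "CD": 93}

class AudioContentReflector:
    """Hypothetical model of the decoration choice in steps S24 to S26."""

    def __init__(self):
        self.previous_code = None
        self.current_decoration = None

    def reflect(self, code):
        changed = code != self.previous_code  # step S24
        if changed:
            # Step S26: pick the decoration matching the new audio content.
            self.current_decoration = DECORATION_FOR_CODE[code]
        # Otherwise step S25: keep the same decoration as last time.
        self.previous_code = code
        return self.current_decoration, changed

reflector = AudioContentReflector()
results = [reflector.reflect(c) for c in ["CB", "CB", "CC", "CD", "CD"]]
print(results)
# [(91, True), (91, False), (92, True), (93, True), (93, False)]
```

A two-decoration variant, as mentioned above, could map "CD" to no decoration at all instead of 93.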

図６のステップＳ２８において、再生画像取得部５３は、表示切替条件を満たしたか否かを判定する。
ここで、表示切替条件とは、次回再生対象の画像データを切り替える条件をいい、本実施形態では、当該表示切替条件を前回に満たした時から３秒経過したこと、という条件が採用されている。このような条件を採用することにより、オリジナルの画像の更新を３秒毎に実行することが可能になる。 In step S28 of FIG. 6, the reproduction image acquisition unit 53 determines whether or not the display switching condition is satisfied.
Here, the display switching condition is the condition for switching the image data to be reproduced next time. In the present embodiment, the condition adopted is that three seconds have elapsed since the display switching condition was last satisfied. Adopting this condition makes it possible to update the original image every 3 seconds.

従って、表示切替条件を前回に満たした時から未だ３秒経過していない場合、即ち、現在ディスプレイ６２に表示されているオリジナルの画像（装飾画像を除いた部分）が、継続して３秒間表示されていない場合、表示切替条件は満たされていない。このような場合、ステップＳ２８においてＮＯであると判定されて、処理はステップＳ２９に進む。
ステップＳ２９において、再生画像取得部５３は、次回の再生対象を現状のまま維持する。 Therefore, when 3 seconds have not yet passed since the last time the display switching condition was satisfied, that is, the original image (excluding the decorative image) currently displayed on the display 62 is continuously displayed for 3 seconds. If not, the display switching condition is not satisfied. In such a case, it is determined as NO in Step S28, and the process proceeds to Step S29.
In step S29, the reproduction image acquisition unit 53 maintains the next reproduction target as it is.

これに対して、表示切替条件を前回に満たした時から３秒経過した場合、即ち、現在ディスプレイ６２に表示されているオリジナルの画像（装飾画像を除いた部分）が、継続して３秒間表示され続けた場合、表示切替条件は満たされる。このような場合、ステップＳ２８においてＹＥＳであると判定されて、処理はステップＳ３０に進む。
ステップＳ３０において、再生画像取得部５３は、次回再生対象を、次の再生順番の画像データに更新する。 On the other hand, when 3 seconds have passed since the last time the display switching condition was satisfied, that is, the original image currently displayed on the display 62 (the portion excluding the decorative image) is continuously displayed for 3 seconds. If the operation is continued, the display switching condition is satisfied. In such a case, it is determined as YES in Step S28, and the process proceeds to Step S30.
In step S30, the reproduction image acquisition unit 53 updates the next reproduction target to image data of the next reproduction order.

このようにして、ステップＳ２９又はステップＳ３０の処理で、次回再生対象が決定されると、処理はステップＳ３１に進む。 Thus, when the next reproduction target is determined in the process of step S29 or step S30, the process proceeds to step S31.
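The display switching logic of steps S28 to S30 can be sketched as a simple timer check. This is an illustrative model only: time is passed in as a number of seconds, and wrapping back to the first image after the last one is an assumption, since the text does not state what happens at the end of the reproduction order.

```python
SWITCH_INTERVAL_S = 3.0  # display switching condition of the embodiment

class SlideSelector:
    """Hypothetical model of the next-reproduction-target selection."""

    def __init__(self, image_count, start_time=0.0):
        self.image_count = image_count
        self.index = 0                 # step S21: first image in the reproduction order
        self.last_switch = start_time

    def update(self, now):
        if now - self.last_switch >= SWITCH_INTERVAL_S:        # step S28: YES
            self.index = (self.index + 1) % self.image_count   # step S30 (wrap is assumed)
            self.last_switch = now
        # Step S28: NO leads to step S29, which keeps the current target.
        return self.index

selector = SlideSelector(image_count=3)
print([selector.update(t) for t in (1.0, 3.0, 5.9, 6.0, 9.0)])
# [0, 1, 1, 2, 0]
```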

ステップＳ３１において、再生画像取得部５３は、処理の終了指示があったか否かを判定する。
終了の指示は、特に限定されないが、本実施形態では、図３の音声再生処理と同一の指示、即ち、ユーザが操作部１７を操作して行う、画像及び音声の再生終了の指示が採用されているものとする。 In step S31, the reproduction image acquisition unit 53 determines whether or not there is an instruction to end the process.
The end instruction is not particularly limited, but in this embodiment, the same instruction as the sound reproduction process of FIG. 3, that is, the instruction to end the reproduction of the image and sound, which is performed by the user operating the operation unit 17, is adopted. It shall be.

この場合、再生終了の指示がなされていない場合、ステップＳ３１においてＮＯであると判定されて、処理はステップＳ２２に戻され、それ以降の処理が繰り返される。即ち、再生終了の指示がなさるまでの間、ステップＳ２２乃至Ｓ３１のループ処理が繰り返し実行される。
ここで、ステップＳ２２乃至Ｓ３１のループ処理は、本実施形態では、図３の音声再生処理とあわせて２３ｍｓ毎に実行されるものとする。即ち、音声再生処理により認識される音声内容が更新される毎に、ステップＳ２３の処理で、更新後の音声内容が取得されるものとする。これにより、ステップＳ２５又はＳ２６の処理で付加される装飾画像は、２３ｍｓ毎に更新されることになる。
一方、次回再生対象のオリジナルの画像データは、ステップＳ２８乃至Ｓ３０の処理より、表示切替条件を満たす毎に、即ち、本実施形態では３ｓ毎に更新されることになる。即ち、オリジナルの画像は、３ｓ毎に、いわゆるスライド再生されることになる。 In this case, when the instruction to end the reproduction is not given, it is determined as NO in Step S31, the process is returned to Step S22, and the subsequent processes are repeated. That is, the loop process of steps S22 to S31 is repeatedly executed until an instruction to end reproduction is given.
Here, in this embodiment, the loop processing of steps S22 to S31 is executed every 23 ms together with the audio reproduction processing of FIG. That is, every time the audio content recognized by the audio reproduction process is updated, the updated audio content is acquired in the process of step S23. Thereby, the decoration image added by the process of step S25 or S26 is updated every 23 ms.
On the other hand, the original image data to be reproduced next time is updated by the processes of steps S28 to S30 each time the display switching condition is satisfied, that is, every 3 s in the present embodiment. In other words, the original images are reproduced as a so-called slide show, switching every 3 s.
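The interplay of the two update rates can be illustrated by counting how many loop iterations fall within each 3-second image slot. The sketch below assumes that each iteration advances time by exactly one 1024-sample period and uses integer arithmetic to avoid rounding effects; the function name is illustrative.

```python
from collections import Counter

SAMPLE_RATE_HZ = 44_100
SAMPLES_PER_CHUNK = 1024   # one loop iteration, about 23 ms
SWITCH_INTERVAL_S = 3      # original image switching interval

def image_index_at(tick):
    """Index of the original image shown during the given loop iteration."""
    elapsed_samples = tick * SAMPLES_PER_CHUNK
    return elapsed_samples // (SWITCH_INTERVAL_S * SAMPLE_RATE_HZ)

ticks_per_image = Counter(image_index_at(t) for t in range(300))
print(ticks_per_image[0], ticks_per_image[1])  # 130 129
```

Under these assumptions, each original image stays on screen for roughly 130 decoration updates before the next image is shown.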

その後、再生終了の指示がなされると、ステップＳ３１においてＹＥＳであると判定されて、画像再生処理は終了となる。なお、このとき、図３の音声再生処理もほぼ同時に終了することになる。 Thereafter, when an instruction to end reproduction is given, it is determined as YES in step S31, and the image reproduction process ends. At this time, the audio reproduction process of FIG. 3 also ends at almost the same time.

以上説明したように、本実施形態の再生装置１は、表示部１８と、音声出力部１９と、音声内容認識部５２と、音声内容反映部５４と、を備えている。
音声出力部１９は、音声データを再生することによって、当該音声データにより表される音声を出力する。
音声内容認識部５２は、音声出力部１９の再生対象の音声データを解析することによって、当該音声データに含まれる音声内容を認識する。
表示部１８は、画像データを再生することによって、当該画像データにより表される画像を、オリジナルの画像として表示する。
音声内容反映部５４は、表示部１８により画像データが再生されている最中に、オリジナルの画像の構成及び構図を維持したまま、音声内容認識部５２により認識された音声内容を反映させた画像を表示する処理を、音声内容反映処理として実行する。
これにより、オリジナルの画像データを再生している最中に、オリジナルの画像の構図及び構成を維持したまま、音声データに含まれる音声内容を反映した画像も表示させる処理を容易かつ手軽に実現することが可能になる。
また、音声内容反映部５４は、オリジナルの画像に対して、音声内容認識部５２により認識された音声内容に対応した装飾画像を付加する画像処理を、音声内容反映処理として、再生対象の画像データに対して施す。
従って、音声内容に対応した装飾画像を付加することにより、オリジナルの画像の構図及び構成を維持したまま、音声データに含まれる音声内容を反映した画像も表示させる処理を容易かつ手軽に実現することが可能となる。
これにより、オリジナルの画像を改変せずに、音声データに含まれる音声内容を再生対象の画像データに対し反映することができ、ユーザによる表現方法の多様化を図ることができる。 As described above, the playback device 1 of this embodiment includes the display unit 18, the audio output unit 19, the audio content recognition unit 52, and the audio content reflection unit 54.
The audio output unit 19 outputs audio represented by the audio data by reproducing the audio data.
The audio content recognition unit 52 recognizes the audio content included in the audio data by analyzing the audio data to be reproduced by the audio output unit 19.
The display unit 18 displays the image represented by the image data as an original image by reproducing the image data.
While the display unit 18 is reproducing the image data, the audio content reflection unit 54 executes, as the audio content reflection process, a process of displaying an image that reflects the audio content recognized by the audio content recognition unit 52 while maintaining the configuration and composition of the original image.
As a result, while the original image data is being reproduced, a process of also displaying an image that reflects the audio content included in the audio data, with the composition and configuration of the original image maintained, can be realized easily and simply.
In addition, the audio content reflection unit 54 applies to the image data to be reproduced, as the audio content reflection process, image processing that adds to the original image a decoration image corresponding to the audio content recognized by the audio content recognition unit 52.
Therefore, by adding a decoration image corresponding to the audio content, a process of also displaying an image that reflects the audio content included in the audio data, with the composition and configuration of the original image maintained, can be realized easily and simply.
As a result, the audio content included in the audio data can be reflected on the image data to be reproduced without modifying the original image, and the user's method of expression can be diversified.
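The idea of adding a decoration image without modifying the original can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function name add_decoration, the representation of an image as a list of pixel rows, the use of None for transparent decoration pixels, and the paste position are hypothetical and are not taken from the embodiment.

```python
from copy import deepcopy


def add_decoration(original, decoration, top, left):
    """Return a copy of `original` with `decoration` pasted at (top, left).

    Images are lists of rows of pixel values. A None pixel in the
    decoration is treated as transparent (an assumption of this sketch).
    """
    composed = deepcopy(original)  # the original image data is left untouched
    for dy, row in enumerate(decoration):
        for dx, pixel in enumerate(row):
            y, x = top + dy, left + dx
            inside = 0 <= y < len(composed) and 0 <= x < len(composed[y])
            if pixel is not None and inside:
                composed[y][x] = pixel
    return composed


original = [[0] * 4 for _ in range(3)]   # stand-in for an original image
decoration = [[9, None], [9, 9]]         # stand-in for a decoration image
shown = add_decoration(original, decoration, top=1, left=2)
```

Because the decoration is drawn onto a copy, the stored original keeps its composition and can be displayed again with a different decoration when the recognized audio content changes.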

以上、本発明の第１実施形態に係る再生装置について説明した。
以下、本発明の第２実施形態に係る再生装置について説明する。 The playback device according to the first embodiment of the present invention has been described above.
Hereinafter, a reproducing apparatus according to the second embodiment of the present invention will be described.

［第２実施形態］
以上説明したように、本発明に係る再生装置は、画像再生処理の少なくとも一部として、再生対象の画像データを再生する場合、次のような音声内容反映処理を実行する。即ち、再生装置は、再生対象の画像データにより表わされる画像を、オリジナルの画像として、当該オリジナルの画像の構成及び構図を維持したまま、音声再生処理により認識された音声内容を反映させた画像を表示する処理を、音声内容反映処理として実行する。 [Second Embodiment]
As described above, when reproducing image data to be reproduced, the playback apparatus according to the present invention executes the following audio content reflection processing as at least a part of the image reproduction process. That is, taking the image represented by the image data to be reproduced as the original image, the playback apparatus executes, as the audio content reflection processing, processing for displaying an image that reflects the audio content recognized by the audio reproduction process while maintaining the configuration and composition of the original image.

第１実施形態では、音声内容反映処理として、オリジナルの画像に対して、音声再生処理により認識された音声内容に対応した装飾画像を付加する画像処理が採用された。これにより、オリジナルの画像の構成及び構図を維持したまま、上述の音声再生処理により認識された音声内容を反映させた画像（本実施形態では装飾画像）の表示が実現可能になる。 In the first embodiment, image processing for adding a decoration image corresponding to the audio content recognized by the audio reproduction processing to the original image is adopted as the audio content reflection processing. This makes it possible to display an image (decorative image in this embodiment) that reflects the audio content recognized by the above-described audio reproduction process while maintaining the configuration and composition of the original image.

これに対して、第２実施形態では、複数の再生対象候補の画像データの中から再生対象が選択されるものとして、次のような音声内容反映処理が採用される。即ち、複数の再生対象候補の画像データのうち、音声再生処理により認識された音声内容に対応する画像データを、再生対象の画像データとして選択する処理が、音声内容反映処理として採用される。 On the other hand, in the second embodiment, the following audio content reflection process is adopted on the assumption that a reproduction target is selected from a plurality of reproduction target candidate image data. That is, a process of selecting image data corresponding to the audio content recognized by the audio reproduction process among the plurality of reproduction target candidate image data as the reproduction target image data is adopted as the audio content reflection process.

このような第２実施形態の音声内容反映処理は、第１実施形態の再生装置１と同様のハードウェア構成及び機能的構成により実現できる。
そこで、第２実施形態に係る再生装置も、図１のハードウェア構成を有し、かつ、図２の機能的構成を有するものとする。従って、第２実施形態に係る再生装置についても、第１実施形態と同一の符号を用いて説明し、第１実施形態で説明した構成については、その説明を省略する。
また、第２実施形態に係る再生装置１の処理のうち、音楽再生処理は、第１実施形態と同様に図３のフローチャートに従って実行することが可能である。そこで、第２実施形態に係る音楽再生処理の説明は省略する。 Such audio content reflection processing of the second embodiment can be realized by the same hardware configuration and functional configuration as those of the playback device 1 of the first embodiment.
Therefore, the playback apparatus according to the second embodiment also has the hardware configuration of FIG. 1 and the functional configuration of FIG. 2. Accordingly, the playback apparatus according to the second embodiment will be described using the same reference numerals as in the first embodiment, and description of the configuration already described in the first embodiment will be omitted.
Of the processes of the playback apparatus 1 according to the second embodiment, the music playback process can be executed according to the flowchart of FIG. 3 as in the first embodiment. Therefore, the description of the music playback process according to the second embodiment is omitted.

そこで、以下、図８を参照して、第２実施形態に係る再生装置１の処理のうち、画像再生処理についてのみ説明する。
図８は、第２実施形態に係る画像再生処理の流れを説明するフローチャートである。 Therefore, with reference to FIG. 8, only the image reproduction process among the processes of the reproduction apparatus 1 according to the second embodiment will be described below.
FIG. 8 is a flowchart for explaining the flow of image reproduction processing according to the second embodiment.

例えば、第２実施形態に係る画像再生処理は、ユーザが操作部１７を指示操作することによって、再生対象の音声データ、再生対象候補の複数のオリジナルの画像データを決定したことを契機として、開始する。即ち、図３の音声再生処理の開始と同期して図８の画像再生処理も開始する。 For example, the image reproduction process according to the second embodiment starts when the user, by operating the operation unit 17, determines the audio data to be reproduced and the plurality of original image data serving as reproduction target candidates. That is, the image reproduction process of FIG. 8 starts in synchronization with the start of the audio reproduction process of FIG. 3.

ただし、第２実施形態では、再生対象候補の複数のオリジナルの画像データについては、再生順番は特に決定されておらず、場合によっては（再生される音声データに含まれる音声内容によっては）、再生されない可能性もあり得る。ただし、再生対象候補の複数のオリジナルの画像データの各々は、音声再生処理により認識される得る音声内容のうちの少なくとも１つが対応付けられているものとする。
具体的には例えば、第２実施形態でも、第１実施形態と同様に、音声再生処理により認識される得る音声内容として、男性の声、女性の声、及び人間の声を含まない音の３種類が採用されているものとする。従って、ここでは、再生対象候補の複数のオリジナルの画像データの各々は、男性の声、女性の声、及び人間の声を含まない音の３種類のうちの何れかの種類が対応付けられているものとする。
より具体的には例えば、ここでは、男性を被写体に含む画像データに対しては、男性の声が対応付けられているものとする。また、女性を被写体に含む画像データに対しては、女性の声が対応付けられているものとする。そして、人間を含まない風景画等の画像データに対しては、人間の声を含まない音が対応付けられているものとする。 However, in the second embodiment, the reproduction order of the plurality of original image data serving as reproduction target candidates is not determined in advance, and in some cases (depending on the audio content included in the reproduced audio data) some of them may never be reproduced. It is assumed, however, that each of the plurality of original image data serving as reproduction target candidates is associated with at least one of the audio contents that can be recognized by the audio reproduction process.
Specifically, for example, it is assumed that in the second embodiment, as in the first embodiment, three types of audio content can be recognized by the audio reproduction process: a male voice, a female voice, and a sound that does not include a human voice. Accordingly, here, each of the plurality of original image data serving as reproduction target candidates is associated with one of these three types.
More specifically, for example, it is assumed here that a male voice is associated with image data including a male as a subject, and that a female voice is associated with image data including a female as a subject. It is further assumed that a sound that does not include a human voice is associated with image data, such as a landscape image, that does not include a human being.

ステップＳ４１において、音声内容反映部５４は、音声内容を取得する。ここで、音声内容反映部５４が取得する音声内容とは、図３の音声再生処理のステップＳ６の処理で音声内容認識部５２により認識された、処理対象期間の音声データについての音声内容である。具体的には例えば、音声内容反映部５４は、ステップＳ２３の処理で、第１実施形態と同様の識別コードＣＢ，ＣＣ，ＣＤのうちの何れかを音声内容として取得するものとする。 In step S41, the audio content reflecting unit 54 acquires the audio content. Here, the audio content acquired by the audio content reflection unit 54 is the audio content of the audio data in the processing target period recognized by the audio content recognition unit 52 in step S6 of the audio reproduction process of FIG. . Specifically, for example, the audio content reflecting unit 54 acquires any of the identification codes CB, CC, and CD as in the first embodiment as the audio content in the process of step S23.

ステップＳ４２において、音声内容反映部５４は、ステップＳ４１の処理で取得した音声内容が前回から変化したか否かを判定する。 In step S42, the audio content reflecting unit 54 determines whether or not the audio content acquired in the process of step S41 has changed from the previous time.

直前の回のステップＳ４１の処理で取得された識別コードが、その前の回のステップＳ４１の処理で取得された識別コードと同一である場合、音声内容が前回から変化していないため、ステップＳ４２において、ＮＯであると判定されて、処理はステップＳ４３に進む。
ステップＳ４３において、音声内容反映部５４は、前回と同一の画像データを、次回再生対象として、再生画像取得部５３を介して取得する、といった音声内容反映処理を実行する。
即ち、音声内容反映部５４は、前回と同一の画像データを、次回再生対象として取得するように、再生画像取得部５３に指示する。再生画像取得部５３は、指示された画像データを画像記憶部３２から取得して、音声内容反映部５４に供給する。
なお、音声内容反映部５４が、画像データをバッファリングする機能を有している場合、特に、再生画像取得部５３に同一の画像データを取得させる必要はない。 If the identification code acquired in the immediately preceding execution of step S41 is the same as the identification code acquired in the execution of step S41 before that, the audio content has not changed from the previous time, so NO is determined in step S42 and the process proceeds to step S43.
In step S43, the audio content reflection unit 54 executes an audio content reflection process such that the same image data as the previous time is acquired through the reproduction image acquisition unit 53 as the next reproduction target.
In other words, the audio content reflection unit 54 instructs the reproduction image acquisition unit 53 to acquire the same image data as the previous time as the next reproduction target. The reproduction image acquisition unit 53 acquires the instructed image data from the image storage unit 32 and supplies it to the audio content reflection unit 54.
When the audio content reflecting unit 54 has a function of buffering image data, it is not particularly necessary for the reproduced image acquiring unit 53 to acquire the same image data.

これに対して、直前の回のステップＳ４１の処理で取得された識別コードが、その前の回のステップＳ４１の処理で取得された識別コードと異なる場合、音声内容が前回から変化しているため、ステップＳ４２において、ＹＥＳであると判定されて、処理はステップＳ４４に進む。
ステップＳ４４において、音声内容反映部５４は、音声内容に対応した画像データを、次回再生対象として、再生画像取得部５３を介して取得する、といった音声内容反映処理を実行する。
なお、ステップ４４の処理は、取得対象の画像データが異なることを除いては、ステップＳ４３の処理と基本的に同様であるため、画像データの具体的な取得手法等については、その説明を省略する。 On the other hand, if the identification code acquired in the previous step S41 is different from the identification code acquired in the previous step S41, the audio content has changed from the previous time. In step S42, it is determined as YES, and the process proceeds to step S44.
In step S44, the audio content reflection unit 54 executes an audio content reflection process such that image data corresponding to the audio content is acquired through the reproduction image acquisition unit 53 as a next reproduction target.
Note that the processing of step S44 is basically the same as that of step S43 except that the image data to be acquired differs, and therefore a description of the specific method for acquiring the image data is omitted.
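As a rough illustration of steps S42 to S44, including the remark that a buffering function makes re-acquiring the same image data unnecessary, the following Python sketch may help. The class name AudioContentReflector, the fetch callback, and the string stand-ins for image data are hypothetical; the embodiment does not specify such an interface.

```python
class AudioContentReflector:
    """Hypothetical stand-in for the audio content reflection unit 54."""

    def __init__(self, fetch_image_for):
        # Callback playing the role of the reproduction image acquisition
        # unit 53: it maps an identification code to image data.
        self._fetch_image_for = fetch_image_for
        self._previous_code = None
        self._buffered_image = None

    def next_image(self, code):
        # Step S42: has the audio content changed since the previous pass?
        if code != self._previous_code:
            # Step S44: acquire the image associated with the new content.
            self._buffered_image = self._fetch_image_for(code)
            self._previous_code = code
        # Step S43 (no change): reuse the buffered image, no new fetch.
        return self._buffered_image


fetch_log = []


def fetch_from_storage(code):
    """Pretend to read image data from storage and record the access."""
    fetch_log.append(code)
    return f"image-for-{code}"


reflector = AudioContentReflector(fetch_from_storage)
shown = [reflector.next_image(code) for code in ["CB", "CB", "CC", "CC", "CB"]]
```

In this sketch the storage callback is invoked only when the identification code changes, so five passes through the loop cause three fetches.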

ステップＳ４５において、表示部１８は、ＣＰＵ１１の制御の下、ステップＳ４３又はＳ４４の処理で次回再生対象として取得された画像データ、即ち、オリジナルの画像データを再生する。これにより、本実施形態では、図９に示すようなオリジナルの画像であって、音声内容に対応した画像がディスプレイ６２に表示される。 In step S45, under the control of the CPU 11, the display unit 18 reproduces the image data acquired as the next reproduction target in the process of step S43 or S44, that is, the original image data. Thereby, in the present embodiment, an original image as shown in FIG. 9 and an image corresponding to the audio content is displayed on the display 62.

図９は、ステップＳ４５の処理で表示されるオリジナルの画像であって、音声内容に対応した画像の一例を示している。
図９の例では、３つのオリジナルの画像１１１乃至１１３が、再生対象候補の画像として採用されている。オリジナルの画像１１１とは、女性１２１を被写体に含む画像であって、女性の声に対応付けられている。オリジナルの画像１１２とは、男性１２２を被写体に含む画像であって、男性の声に対応付けられている。オリジナルの画像１１３とは、人間を含まず風景１２３を被写体に含む画像であって、人間の声を含まない音が対応付けられている。
例えば、ステップＳ４１の処理で音声内容として識別コードＣＢが取得された場合、即ち男性の声が認識された場合、次のステップＳ４３又はステップＳ４４の処理で、オリジナルの画像１１２の画像データを次回再生対象として取得する、といった音声内容反映処理が実行される。その結果、次のステップＳ４５の処理では、図９の右方に示すような、男性１２２を被写体に含むオリジナルの画像１１２がディスプレイ６２に表示される。
また例えば、ステップＳ４１の処理で音声内容として識別コードＣＣが取得された場合、即ち女性の声が認識された場合、次のステップＳ４３又はステップＳ４４の処理で、オリジナルの画像１１１の画像データを次回再生対象として取得する、といった音声内容反映処理が実行される。その結果、次のステップＳ４５の処理では、図９の左上方に示すような、女性１２１を被写体に含むオリジナルの画像１１１がディスプレイ６２に表示される。
また例えば、ステップＳ４１の処理で音声内容として識別コードＣＤが取得された場合、即ち人間の声を含まない音が認識された場合、次のステップＳ４３又はステップＳ４４の処理で、オリジナルの画像１１３の画像データを次回再生対象として取得する、といった音声内容反映処理が実行される。その結果、次のステップＳ４５の処理では、図９の左下方に示すような、人間を被写体に含まずに風景１２３のみを被写体に含むオリジナルの画像１１３がディスプレイ６２に表示される。 FIG. 9 shows an example of an original image displayed in the process of step S45 and corresponding to the audio content.
In the example of FIG. 9, three original images 111 to 113 are adopted as reproduction target candidate images. The original image 111 is an image including a female 121 as a subject, and is associated with a female voice. The original image 112 is an image including a male 122 as a subject, and is associated with a male voice. The original image 113 is an image that does not include a human and includes a landscape 123 as a subject, and is associated with a sound that does not include a human voice.
For example, when the identification code CB is acquired as the audio content in the process of step S41, that is, when a male voice is recognized, audio content reflection processing that acquires the image data of the original image 112 as the next reproduction target is executed in the process of the following step S43 or step S44. As a result, in the process of the following step S45, the original image 112 including the male 122 as a subject is displayed on the display 62, as shown on the right side of FIG. 9.
Further, for example, when the identification code CC is acquired as the audio content in the process of step S41, that is, when a female voice is recognized, audio content reflection processing that acquires the image data of the original image 111 as the next reproduction target is executed in the process of the following step S43 or step S44. As a result, in the process of the following step S45, the original image 111 including the female 121 as a subject is displayed on the display 62, as shown in the upper left of FIG. 9.
Further, for example, when the identification code CD is acquired as the audio content in the process of step S41, that is, when a sound that does not include a human voice is recognized, audio content reflection processing that acquires the image data of the original image 113 as the next reproduction target is executed in the process of the following step S43 or step S44. As a result, in the process of the following step S45, the original image 113, which includes only the landscape 123 as a subject and no human, is displayed on the display 62, as shown in the lower left of FIG. 9.
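The association in the FIG. 9 example can be written down as a small lookup table. The Python sketch below uses the identification codes CB, CC, and CD and the reference numerals 111 to 113 from the text; the dictionary and function names are hypothetical, and representing an image by its reference numeral is a simplification.

```python
# Identification code -> reference numeral of the associated original image,
# following the FIG. 9 example.
IMAGE_FOR_CODE = {
    "CB": 112,  # male voice -> image 112 (male 122 as subject)
    "CC": 111,  # female voice -> image 111 (female 121 as subject)
    "CD": 113,  # sound without a human voice -> image 113 (landscape 123)
}


def images_to_display(recognized_codes):
    """Map each recognized code to the image shown in step S45."""
    return [IMAGE_FOR_CODE[code] for code in recognized_codes]


sequence = images_to_display(["CD", "CB", "CB", "CC"])
```

For an instrumental passage followed by a male and then a female voice, this yields the landscape image first, then image 112 twice, then image 111.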

このようにして、ステップＳ４５の処理により、オリジナルの画像であって、音声内容に対応した画像がディスプレイ６２に表示されると、処理はステップＳ４６に進む。 In this way, when the image corresponding to the audio content is displayed on the display 62 by the process of step S45, the process proceeds to step S46.

ステップＳ４６において、再生画像取得部５３は、処理の終了指示があったか否かを判定する。
終了の指示は、特に限定されないが、第２実施形態でも、第１実施形態と同様に、図３の音声再生処理と同一の指示、即ち、ユーザが操作部１７を操作して行う、画像及び音声の再生終了の指示が採用されているものとする。 In step S46, the reproduced image acquisition unit 53 determines whether or not there is an instruction to end the process.
The end instruction is not particularly limited, but it is assumed that in the second embodiment, as in the first embodiment, the same instruction as for the audio reproduction process of FIG. 3 is adopted, that is, an instruction to end reproduction of the image and the audio that the user gives by operating the operation unit 17.

この場合、再生終了の指示がなされていない場合、ステップＳ４６においてＮＯであると判定されて、処理はステップＳ４１に戻され、それ以降の処理が繰り返される。即ち、再生終了の指示がなさるまでの間、ステップＳ４１乃至Ｓ４６のループ処理が繰り返し実行される。
ここで、ステップＳ４１乃至Ｓ４６のループ処理は、図３の音声再生処理とあわせて２３ｍｓ毎に実行されるものとすると、２３ｍｓでオリジナル画像の表示が更新されてしまい、ユーザの目には更新タイミングが早過ぎるように映る。従って、ここでは、ステップＳ４１乃至Ｓ４６のループ処理は、図３の音声再生処理とあわせて３ｓ毎に実行されるものとする。これにより、３ｓ毎に、音声内容に対応するオリジナルの画像が、いわゆるスライド再生されることになる。 In this case, if an instruction to end reproduction has not been given, it is determined as NO in step S46, the process returns to step S41, and the subsequent processes are repeated. That is, until the end of reproduction is instructed, the loop processing of steps S41 to S46 is repeatedly executed.
Here, if the loop processing of steps S41 to S46 were executed every 23 ms together with the audio reproduction process of FIG. 3, the display of the original image would be updated every 23 ms, which would appear to the user to be too fast. Therefore, it is assumed here that the loop processing of steps S41 to S46 is executed every 3 s in conjunction with the audio reproduction process of FIG. 3. As a result, the original image corresponding to the audio content is played back as a slide show, updated every 3 s.
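To make the difference between the two periods concrete, the following Python sketch simply counts loop passes. The constant and function names are hypothetical, and the assumption that the first pass occurs at time 0 is made only for this illustration.

```python
AUDIO_PERIOD_MS = 23     # period mentioned in the text as too fast for display
IMAGE_PERIOD_MS = 3000   # period adopted for the loop of steps S41 to S46


def image_update_times(total_ms, period_ms=IMAGE_PERIOD_MS):
    """Times (ms from start) at which the S41-S46 loop would run."""
    return list(range(0, total_ms, period_ms))


# Whole 23 ms periods that fit into one 3 s image period.
audio_periods_per_image = IMAGE_PERIOD_MS // AUDIO_PERIOD_MS

# Number of display updates in 10 s of playback under each period.
updates_at_3s = len(image_update_times(10_000))
updates_at_23ms = len(image_update_times(10_000, AUDIO_PERIOD_MS))
```

Under these assumptions, 10 s of playback gives 4 display updates at the 3 s period versus 435 at the 23 ms period, and about 130 of the shorter periods fit into one image period.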

その後、再生終了の指示がなされると、ステップＳ４６においてＹＥＳであると判定されて、画像再生処理は終了となる。なお、このとき、図３の音声再生処理もほぼ同時に終了することになる。 Thereafter, when an instruction to end reproduction is given, YES is determined in step S46 and the image reproduction process ends. At this time, the audio reproduction process of FIG. 3 also ends at almost the same time.

以上説明したように、本実施形態の再生装置１は、表示部１８と、音声出力部１９と、音声内容認識部５２と、音声内容反映部５４と、を備えている。
音声出力部１９は、音声データを再生することによって、当該音声データにより表される音声を出力する。
音声内容認識部５２は、音声出力部１９の再生対象の音声データを解析することによって、当該音声データに含まれる音声内容を認識する。
表示部１８は、複数の再生対象候補の画像データの中から選択された、再生対象の画像データを再生することによって、当該画像データにより表される画像を、オリジナルの画像として表示する。
音声内容反映部５４は、表示部１８により画像データが再生されている最中に、オリジナルの画像の構成及び構図を維持したまま、複数の再生対象候補の画像データのうち、音声内容認識部５２により認識された音声内容に対応する画像データを、再生対象の画像データとして選択する処理を、音声内容反映処理として実行する。
従って、音声内容に対応する画像データを再生対象の画像データとして選択することにより、オリジナルの画像の構図及び構成を維持したまま、音声データに含まれる音声内容を反映した画像を表示することができる。
これにより、オリジナルの画像を改変せずに、複数のオリジナル画像を用いて、ユーザによる表現方法の多様化を図ることができる。 As described above, the playback device 1 of this embodiment includes the display unit 18, the audio output unit 19, the audio content recognition unit 52, and the audio content reflection unit 54.
The audio output unit 19 outputs audio represented by the audio data by reproducing the audio data.
The audio content recognition unit 52 recognizes the audio content included in the audio data by analyzing the audio data to be reproduced by the audio output unit 19.
The display unit 18 displays the image represented by the image data as an original image by reproducing the reproduction target image data selected from the plurality of reproduction target candidate image data.
While the display unit 18 is reproducing the image data, the audio content reflection unit 54 executes, as audio content reflection processing, processing for selecting, from among the plurality of reproduction target candidate image data, the image data corresponding to the audio content recognized by the audio content recognition unit 52 as the image data to be reproduced, while maintaining the configuration and composition of the original image.
Accordingly, by selecting the image data corresponding to the audio content as the image data to be reproduced, an image reflecting the audio content included in the audio data can be displayed while maintaining the composition and configuration of the original image.
Thereby, it is possible to diversify the expression method by the user using a plurality of original images without modifying the original image.

なお、本発明は、上述の実施形態に限定されるものではなく、本発明の目的を達成できる範囲での変形、改良等は本発明に含まれるものである。 Note that the present invention is not limited to the above-described embodiments, and modifications, improvements, and the like within a scope in which the object of the present invention can be achieved are included in the present invention.

例えば、上述の実施形態では、オリジナルの画像に対して、音声内容に対応した装飾画像を付加しているが、この装飾画像は、動きのない静止画に限定されない。例えば、装飾画像は、動きのあるアニメーションＧＩＦや、ＡｄｏｂｅＦｌａｓｈ（登録商標）形式、ＭＰ４形式の動画像でもよい。
また、装飾画像として、男性の声に対応する装飾画像９１と、女性の声に対応する装飾画像９２と、人間の声を含まない音に対応する装飾画像９３（音符で模している装飾画像９３）を採用しているがこれらに限定されない。例えば、音声内容のテンポに応じて装飾画像上のキャラクタが踊ったり、音声内容の発声に対応して装飾画像上の男性、女性又は動物等のキャラクタが口パクしたりしてもよい。 For example, in the above-described embodiment, a decoration image corresponding to the audio content is added to the original image, but the decoration image is not limited to a still image without movement. For example, the decoration image may be a moving animation GIF, a moving image in the Adobe Flash (registered trademark) format, or the MP4 format.
In addition, although a decoration image 91 corresponding to a male voice, a decoration image 92 corresponding to a female voice, and a decoration image 93 corresponding to a sound that does not include a human voice (a decoration image 93 in the shape of a musical note) are adopted as the decoration images, the decoration images are not limited to these. For example, a character in the decoration image may dance in time with the tempo of the audio content, or a character such as a man, a woman, or an animal in the decoration image may move its mouth in sync with the utterances in the audio content.

また、上述の実施形態では、音声データは、楽曲であるがこれに限られない。例えば、音声データとしては、声のみのナレーションや、台詞等により構成されている音声データであってもよい。 In the above-described embodiment, the audio data is music, but is not limited thereto. For example, the voice data may be voice data composed of voice-only narration or dialogue.

なお、音声記憶部３１が記憶する音声データは、ドライブ２２を介してリムーバブルメディア４１から取得した音声データに限定されず、通信部２１を介して外部から取得した音声データであってもよい。 The audio data stored in the audio storage unit 31 is not limited to the audio data acquired from the removable medium 41 via the drive 22, and may be audio data acquired from the outside via the communication unit 21.

また例えば、上述した実施形態では、本発明が適用される再生装置１は、デジタルフォトフレームとして構成される例として説明した。
しかしながら、本発明は、特にこれに限定されず、表示機能を有する電子機器一般に適用することができ、例えば、本発明は、パーソナルコンピュータ、携帯型ナビゲーション装置、ポータブルゲーム機等に幅広く適用可能である。 For example, in the above-described embodiment, the playback apparatus 1 to which the present invention is applied has been described as an example configured as a digital photo frame.
However, the present invention is not particularly limited to this and can be applied to electronic devices in general that have a display function. For example, the present invention can be widely applied to personal computers, portable navigation devices, portable game machines, and the like.

上述した一連の処理は、ハードウェアにより実行させることもできるし、ソフトウェアにより実行させることもできる。 The series of processes described above can be executed by hardware or can be executed by software.

一連の処理をソフトウェアにより実行させる場合には、そのソフトウェアを構成するプログラムが、コンピュータ等にネットワークや記録媒体からインストールされる。コンピュータは、専用のハードウェアに組み込まれているコンピュータであってもよい。また、コンピュータは、各種のプログラムをインストールすることで、各種の機能を実行することが可能なコンピュータ、例えば汎用のパーソナルコンピュータであってもよい。 When a series of processing is executed by software, a program constituting the software is installed on a computer or the like from a network or a recording medium. The computer may be a computer incorporated in dedicated hardware. The computer may be a computer capable of executing various functions by installing various programs, for example, a general-purpose personal computer.

このようなプログラムを含む記録媒体は、ユーザにプログラムを提供するために装置本体とは別に配布されるリムーバブルメディア４１により構成されるだけでなく、装置本体に予め組み込まれた状態でユーザに提供される記録媒体等で構成される。リムーバブルメディア４１は、例えば、磁気ディスク（フロッピディスクを含む）、光ディスク、又は光磁気ディスク等により構成される。光ディスクは、例えば、ＣＤ−ＲＯＭ（ＣｏｍｐａｃｔＤｉｓｋ−ＲｅａｄＯｎｌｙＭｅｍｏｒｙ），ＤＶＤ（ＤｉｇｉｔａｌＶｅｒｓａｔｉｌｅＤｉｓｋ）等により構成される。光磁気ディスクは、ＭＤ（Ｍｉｎｉ−Ｄｉｓｋ）等により構成される。また、装置本体に予め組み込まれた状態でユーザに提供される記録媒体は、例えば、プログラムが記録されているＲＯＭ１２や記憶部２０に含まれるハードディスク等で構成される。 The recording medium including such a program is provided not only by the removable medium 41 distributed separately from the apparatus main body in order to provide the program to the user, but also provided to the user in a state of being preinstalled in the apparatus main body. Recording medium. The removable medium 41 is composed of, for example, a magnetic disk (including a floppy disk), an optical disk, a magneto-optical disk, or the like. The optical disk is composed of, for example, a CD-ROM (Compact Disk-Read Only Memory), a DVD (Digital Versatile Disk), or the like. The magneto-optical disk is configured by an MD (Mini-Disk) or the like. In addition, the recording medium provided to the user in a state of being incorporated in advance in the apparatus main body includes, for example, a ROM 12 in which a program is recorded, a hard disk included in the storage unit 20, and the like.

なお、本明細書において、記録媒体に記録されるプログラムを記述するステップは、その順序に沿って時系列的に行われる処理はもちろん、必ずしも時系列的に処理されなくとも、並列的或いは個別に実行される処理をも含むものである。 In this specification, the steps describing the program recorded on the recording medium include not only processing performed in time series in the stated order but also processing that is not necessarily performed in time series and is instead executed in parallel or individually.

１・・・再生装置、１１・・・ＣＰＵ、１２・・・ＲＯＭ、１３・・・ＲＡＭ、１４・・・ＲＴＣ、１５・・・バス、１６・・・入出力インターフェース、１７・・・操作部、１８・・・表示部、１９・・・音声出力部、２０・・・記憶部、２１・・・通信部、２２・・・ドライブ、３１・・・音声記憶部、３２・・・画像記憶部、４１・・・リムーバブルメディア、５１・・・再生音声取得部、５２・・・音声内容認識部、５３・・・再生画像取得部、５４・・・音声内容反映部、６１・・・表示制御部、６２・・・ディスプレイ、７１・・・音声出力制御部、７２・・・スピーカ DESCRIPTION OF SYMBOLS 1 ... Playback device, 11 ... CPU, 12 ... ROM, 13 ... RAM, 14 ... RTC, 15 ... Bus, 16 ... Input/output interface, 17 ... Operation unit, 18 ... Display unit, 19 ... Audio output unit, 20 ... Storage unit, 21 ... Communication unit, 22 ... Drive, 31 ... Audio storage unit, 32 ... Image storage unit, 41 ... Removable medium, 51 ... Reproduction audio acquisition unit, 52 ... Audio content recognition unit, 53 ... Reproduction image acquisition unit, 54 ... Audio content reflection unit, 61 ... Display control unit, 62 ... Display, 71 ... Audio output control unit, 72 ... Speaker

Claims

A playback device comprising:
audio reproduction means for outputting audio represented by audio data by reproducing the audio data;
audio content recognition means for recognizing audio content included in the audio data by analyzing the audio data to be reproduced by the audio reproduction means;
image reproduction means for displaying an image represented by image data as an original image by reproducing the image data; and
audio content reflecting means for executing, as audio content reflection processing, processing for displaying, while the image data is being reproduced by the image reproduction means, an image that reflects the audio content recognized by the audio content recognition means while maintaining the configuration and composition of the original image.

The playback device according to claim 1, wherein
the audio content reflecting means applies to the image data to be reproduced, as the audio content reflection processing, image processing that adds to the original image a decoration image corresponding to the audio content recognized by the audio content recognition means.

The playback device according to claim 1, wherein
the image reproduction means reproduces image data to be reproduced that is selected from among a plurality of reproduction target candidate image data, and
the audio content reflecting means executes, as the audio content reflection processing, processing for selecting, from among the plurality of reproduction target candidate image data, the image data corresponding to the audio content recognized by the audio content recognition means as the image data to be reproduced.

A playback method for a playback device comprising:
audio reproduction means for outputting audio represented by audio data by reproducing the audio data; and
image reproduction means for displaying an image represented by image data as an original image by reproducing the image data,
the playback method including:
an audio content recognition step of recognizing audio content included in the audio data by analyzing the audio data to be reproduced by the audio reproduction means; and
an audio content reflection step of executing, as audio content reflection processing, processing for displaying, while the image data is being reproduced by the image reproduction means, an image that reflects the audio content recognized in the audio content recognition step while maintaining the configuration and composition of the original image.

A program that causes a computer controlling a playback device comprising:
audio reproduction means for outputting audio represented by audio data by reproducing the audio data; and
image reproduction means for displaying an image represented by image data as an original image by reproducing the image data,
to realize:
an audio content recognition function of recognizing audio content included in the audio data by analyzing the audio data to be reproduced by the audio reproduction means; and
an audio content reflection function of executing, as audio content reflection processing, processing for displaying, while the image data is being reproduced by the image reproduction means, an image that reflects the audio content recognized by the audio content recognition function while maintaining the configuration and composition of the original image.