JP2023162709A - Imaging device

Info

Publication number: JP2023162709A
Application number: JP2022073271A
Authority: JP
Inventors: 亜也加木下; Ayaka Kinoshita; 潔関口; Kiyoshi Sekiguchi
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2022-04-27
Filing date: 2022-04-27
Publication date: 2023-11-09

Abstract

To provide an imaging device that adds additional information matching the audio of a recorded video to that video, using a method according to the additional information, and records the result.

SOLUTION: The imaging device includes: imaging means; a microphone; recording means for recording video data from the imaging means and audio data from the microphone on a recording medium; main scene determination means for determining main scenes in the video data recorded on the recording medium; audio analysis means for analyzing the audio data corresponding to the main scenes and extracting a keyword; obtaining means for obtaining a still image or video related to the keyword as a related image; and editing means for editing the recorded video data at the scene, among the main scenes, from which the keyword was extracted. The editing means adds the related image obtained by the obtaining means to the scene in the video data from which the keyword was extracted, using a method according to the related image.

SELECTED DRAWING: Figure 1

Description

The present invention relates to an imaging device.

In recent years, cameras that shoot periodically and continuously without the user giving a shooting instruction have been developed and are coming into practical use. For example, such a camera can be placed anywhere in a room and shoot automatically, recording casual scenes of everyday life.

When video and audio recorded by such a camera are played back, simply playing the video as recorded may produce a video with little motion that viewers find uninteresting. It is therefore desirable to edit the recorded video into one better suited to viewers before playback.

In this regard, Patent Document 1 discloses a technique that extracts a keyword from input speech, searches a plurality of predetermined images for an image corresponding to the keyword, and displays that image.

Patent Document 1: Japanese Patent Application Publication No. 2017-16296

The conventional technique disclosed in Patent Document 1 can display a related image according to speech. However, when a similar technique is applied to the audio accompanying a recorded video, the problem is how to add the related image to the video for playback.

Accordingly, an object of the present invention is to add additional information matching the audio recorded together with a video to that video, using a method according to the additional information, and to record the result.

The present invention includes: imaging means; a microphone; recording means for recording video data from the imaging means and audio data from the microphone on a recording medium; main scene determination means for determining main scenes in the video data recorded on the recording medium; audio analysis means for analyzing the audio data corresponding to the main scenes and extracting a keyword; acquisition means for acquiring a still image or video related to the keyword as a related image; and editing means for editing the recorded video data so as to edit, among the main scenes, the scene from which the keyword was extracted. The editing means adds the related image acquired by the acquisition means to the scene in the video data from which the keyword was extracted, using a method according to the related image.

According to the present invention, additional information matching the audio recorded together with a video is added to the video, using a method according to the additional information, and recorded.

FIG. 1 is a block diagram showing the imaging device. FIG. 2 is a flowchart of the video shooting process. FIG. 3 is a flowchart of the video editing process. FIG. 4 is a diagram showing an example of a captured video and the video after editing.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

The imaging device 100 according to an embodiment of the present invention is described below with reference to FIG. 1. The lens 1001 is a lens group including a zoom lens and a focus lens. The lens control unit 1002 performs lens control processing, controlling the focal length and aperture state of the lens 1001 based on calculation results from the evaluation unit 1010 (described later) and subject information extracted by the recognition unit 1011. The CPU 1003 controls the entire imaging device 100. The CPU bus 1004 carries communication between the CPU 1003 and each functional block.

The RAM control unit 1007 controls access to the RAM 1006 based on RAM access requests from each functional block. The RAM bus 1005 carries communication between the RAM control unit 1007 and each functional block, and also arbitrates access to the RAM 1006 from the functional blocks.

The imaging unit 1008 is imaging means that converts the optical signal captured through the lens 1001 into an electrical signal using an image sensor (not shown). It also corrects lens aberrations in the obtained image data and interpolates defective pixels of the image sensor. The developing unit 1009 applies debayer processing to the image data generated by the imaging unit 1008, converts it into a signal consisting of a luminance signal and color difference signals, and performs development processing such as removing noise from each signal, correcting optical distortion, and optimizing the image. The evaluation unit 1010 performs evaluation value calculation, computing evaluation values such as the focus state and exposure state from the image data generated by the imaging unit 1008.

The recognition unit 1011 performs recognition processing: it detects and recognizes subjects in the image data developed by the developing unit 1009 and generates subject information. For example, it detects a face in the image data, outputs information indicating the position of the detected face, and further authenticates a specific person based on feature information such as the face.

The display control unit 1012 performs display control: it applies predetermined display processing to the image data developed by the developing unit 1009 and outputs the result to the display unit 1015. The display unit 1015 is reproduction means for reproducing image data and is, for example, a liquid crystal display.

The video encoding unit 1013 performs video encoding: it compression-encodes the image data developed by the developing unit 1009 using a predetermined video compression scheme such as MPEG, converting it into video with a reduced amount of information. The recording control unit 1014 performs recording control, recording the image data developed by the developing unit 1009 on the recording medium 1016. The recording medium 1016 is, for example, a nonvolatile memory card or a hard disk.

The microphone 1017 converts sound into an audio signal. The microphone control unit 1018 is connected to the microphone 1017 and controls it, starts and stops sound collection, and acquires audio from it. Control of the microphone 1017 includes, for example, gain adjustment and status acquisition. The audio encoding/decoding unit 1019 encodes or decodes the audio signal from the microphone 1017 using a predetermined coding scheme such as AAC. The speaker 1020 outputs the audio signal decoded by the audio encoding/decoding unit 1019.

The audio analysis unit 1021 is audio analysis means that analyzes the audio acquired by the microphone 1017, performing analysis such as detecting a specific keyword in the audio data and detecting volume changes in the audio data.

The communication unit 1022 is a communication interface that connects the imaging device 100 to other devices by wire or wirelessly and transmits and receives image data, audio data, and the like. It can also connect to networks such as a wireless LAN or the Internet. The communication unit 1022 can transmit image data acquired by the imaging device 100 and image data recorded on the recording medium 1016 to external devices, and can receive image data and various information from external devices.

The operation unit 1023 accepts various user operations for configuring the imaging device 100. The related image acquisition unit 1024 acquires related images associated with the keyword detected by the audio analysis unit 1021, either from the recording medium 1016 or from the Internet via the communication unit 1022.

The main scene determination unit 1025 is main scene determination means that determines whether the scene being shot is a main scene, based on the image recognition result from the recognition unit 1011 or the audio analysis result from the audio analysis unit 1021. The video editing unit 1026 is video editing means that edits the video recorded on the recording medium 1016 and adds the related images acquired by the related image acquisition unit 1024.

Next, the video shooting process in the imaging device 100 is described with reference to FIG. 2. The flowchart in FIG. 2 is executed while the power switch (not shown) of the imaging device 100 is ON and the device is set to video shooting mode.

In step S201, video shooting starts in response to the user's shooting start operation on the operation unit 1023. In step S202, the CPU 1003 starts the video recording process and the audio acquisition process. In the video recording process, the imaging unit 1008 shoots continuously at the set frame rate, the video encoding unit 1013 encodes the acquired image signal, and the result is recorded in the RAM 1006 as video data via the RAM bus 1005 and the RAM control unit 1007.

In parallel with video recording, the microphone control unit 1018 acquires audio from the microphone 1017, the audio encoding/decoding unit 1019 encodes the acquired audio data, and the result is recorded in the RAM 1006 as audio data accompanying the video data. The video data and audio data recorded in the RAM 1006 are recorded on the recording medium 1016 as a single video file via the recording control unit 1014. This series of video recording and audio acquisition processes is repeated until the user performs a shooting end operation.

In step S203, it is determined whether the user has performed a shooting end operation on the operation unit 1023. If so, the process advances to step S208. If not, the process advances to step S204.

In step S204, the main scene determination unit 1025 determines whether the scene being shot is a main scene. A main scene is, for example, a moment when the volume of the scene being shot rises. If the scene being shot is determined to be a main scene, the process advances to step S205. Otherwise, the process returns to step S203.
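The volume-based determination in step S204 could be sketched as follows. This is only an illustration: the function names, the block-wise RMS measure, and the `rise_ratio` threshold are assumptions and are not taken from the specification.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of one block of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def is_main_scene(previous: Sequence[float], current: Sequence[float],
                  rise_ratio: float = 2.0) -> bool:
    """Treat the scene as a main scene when the current block is louder
    than the previous block by at least rise_ratio (assumed threshold)."""
    prev_level = rms(previous)
    cur_level = rms(current)
    if prev_level == 0.0:
        # After silence, any sound counts as a rise in volume.
        return cur_level > 0.0
    return cur_level / prev_level >= rise_ratio
```

For example, `is_main_scene([0.1, -0.1], [0.5, -0.5])` reports a main scene, while two blocks of equal level do not.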

In step S205, the audio analysis unit 1021 analyzes the audio data from the microphone 1017. In step S206, the audio analysis unit 1021 determines whether the audio data contains a keyword. Keywords are assumed to have been set by the user in advance and recorded on the recording medium 1016. A keyword is, for example, a place name indicating a specific location or a personal name indicating a specific person. If the audio data contains a keyword, the process advances to step S207. If not, the process returns to step S203.

In step S207, for the scene of the acquired video in which a keyword was found, the video encoding unit 1013 attaches an internal marker as part of encoding the video data. The information attached as the internal marker includes information indicating the keyword acquired in that scene. The attached internal marker serves as the mark of an edit point in the video editing process described later.
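Steps S205 to S207 might be modeled as below. The sketch assumes the speech has already been turned into text by some recognizer, which the specification does not describe; `InternalMarker`, `find_keyword`, and `mark_scene` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InternalMarker:
    time_sec: float          # position of the scene within the recording
    keyword: str             # keyword detected in that scene's audio
    processed: bool = False  # set once the editing process has handled it


def find_keyword(transcript: str, keywords: List[str]) -> Optional[str]:
    """Return the first registered keyword found in the transcript (S206)."""
    for keyword in keywords:
        if keyword in transcript:
            return keyword
    return None


def mark_scene(markers: List[InternalMarker], time_sec: float,
               transcript: str, keywords: List[str]) -> Optional[InternalMarker]:
    """Attach an internal marker when the scene's audio has a keyword (S207)."""
    keyword = find_keyword(transcript, keywords)
    if keyword is None:
        return None
    marker = InternalMarker(time_sec=time_sec, keyword=keyword)
    markers.append(marker)
    return marker
```

Keeping the keyword inside the marker lets the later editing pass work from the markers alone, without analyzing the audio again.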

In step S208, the video and audio recorded continuously by the video recording process from the start to the end of shooting are recorded on the recording medium 1016 as a single video file, and the video file is closed. The series of processes then ends.

In this way, during the video shooting process, when the recorded audio contains a keyword, an internal marker that serves as the mark of an edit point is attached to the corresponding scene in the video data.

Next, the editing process for a recorded video file is described with reference to FIG. 3. The flowchart in FIG. 3 is executed automatically by the CPU 1003 immediately after the video shooting process in FIG. 2 ends.

In step S301, the CPU 1003 determines whether the video file recorded on the recording medium 1016 contains an internal marker for which the editing process has not yet been performed. If there is an unprocessed internal marker, the process advances to step S302. If there is none, the process ends.

In step S302, the keyword is obtained from the unprocessed internal marker. In step S303, the related image acquisition unit 1024 acquires a related image associated with the keyword obtained in step S302. For example, when the keyword is a place name indicating a specific location, the related image is an image shot at the location the keyword indicates. When the keyword indicates a specific person, the related image is an image that includes that person as a subject. The related image acquisition unit 1024 acquires related images from other video files recorded on the recording medium 1016. It also acquires related images from a server or the like on a network (not shown) connected via the communication unit 1022.
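The lookup in step S303 can be illustrated with a small sketch. The `MediaItem` metadata fields, the `kind` argument, and the policy of consulting local items before remote ones are all assumptions made here; the specification only states that both sources can be used.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MediaItem:
    path: str
    is_video: bool
    place: Optional[str] = None   # where the item was shot, if known
    people: Tuple[str, ...] = ()  # names of recognized subjects


def matches(item: MediaItem, keyword: str, kind: str) -> bool:
    """Place keywords match the shooting location; person keywords match subjects."""
    if kind == "place":
        return item.place == keyword
    if kind == "person":
        return keyword in item.people
    return False


def find_related(keyword: str, kind: str, local: List[MediaItem],
                 remote: List[MediaItem]) -> List[MediaItem]:
    """Search the recording medium first, then fall back to the network
    (the ordering is an assumption of this sketch)."""
    hits = [item for item in local if matches(item, keyword, kind)]
    if hits:
        return hits
    return [item for item in remote if matches(item, keyword, kind)]
```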

In step S304, it is determined whether the related image acquired in step S303 is a video. If it is a video, the process advances to step S305. If it is a still image rather than a video, the process advances to step S307.

In step S305, it is determined whether the video acquired in step S303 is at least a predetermined time long. If so, the process advances to step S306. If the video is shorter than the predetermined time, it is used as the related video as-is and the process advances to step S307.

In step S306, a segment of the predetermined length is cut from the video acquired in step S303 to create a related video for editing. The segment may be taken, for example, from the beginning of the video, or from a point a certain time after the beginning.
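The branch in steps S305 and S306 amounts to choosing a time range within the related video. A minimal sketch, in which `select_clip` and its parameters are hypothetical and the clamping of the offset near the end of the video is an added assumption:

```python
from typing import Tuple


def select_clip(duration_sec: float, max_sec: float,
                offset_sec: float = 0.0) -> Tuple[float, float]:
    """Return the (start, end) range of the related video to use.

    Videos shorter than max_sec are used whole (S305, short branch);
    longer ones are cut to max_sec starting at offset_sec (S306).
    """
    if duration_sec < max_sec:
        return (0.0, duration_sec)
    # Keep the segment inside the video even if the offset is near the end.
    start = max(0.0, min(offset_sec, duration_sec - max_sec))
    return (start, start + max_sec)
```

With `offset_sec` left at zero the segment is taken from the beginning; a positive value corresponds to cutting after a certain time has elapsed.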

In step S307, the video editing unit 1026 edits the video file recorded on the recording medium 1016 so that the related image acquired in step S303, or the related video created in step S306, is added to the scene to which the internal marker is attached.

When the related image acquired in step S303 is a still image, the video file is edited so that the still image is displayed for a predetermined time at the timing of the scene to which the internal marker is attached. When the related image acquired in step S303 is a video, the video file is edited so that this video, or the related video cut out in step S306, is played at the timing of the scene to which the internal marker is attached.

The editing method used here is determined according to the content of the still image or video that is added to the video file as the related image. For example, when the related image contains a main subject, it is likely to be of high importance to viewers, so the video file is edited by inserting the related image into the scene and displaying it for a predetermined time. When the related image does not contain a main subject, it is less likely to be of high importance to viewers, so the video file is edited by superimposing the related image at a small size on the scene, for example at the lower right.
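The choice between insertion and superimposition could look like the sketch below. The `scale` and `margin` values and the function names are illustrative assumptions; the specification only says the image is superimposed at a small size, for example at the lower right.

```python
from enum import Enum
from typing import Tuple


class EditMethod(Enum):
    INSERT = "insert"    # show the related image full-frame for a set time
    OVERLAY = "overlay"  # superimpose it at a small size on the scene


def choose_edit_method(has_main_subject: bool) -> EditMethod:
    """Related images containing a main subject are inserted; others are overlaid."""
    return EditMethod.INSERT if has_main_subject else EditMethod.OVERLAY


def overlay_rect(frame_w: int, frame_h: int, scale: float = 0.25,
                 margin: int = 16) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of a lower-right overlay region."""
    width = int(frame_w * scale)
    height = int(frame_h * scale)
    return (frame_w - width - margin, frame_h - height - margin, width, height)
```

How the presence of a main subject is decided is left to the caller; in the device it would presumably come from the recognition unit 1011.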

After step S307, the process returns to step S301, and the series of steps is repeated until the video editing process has been completed for every internal marker attached to the video file.
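The overall loop of FIG. 3 can be summarized in a short sketch. Markers are represented as plain dictionaries, and `acquire` and `apply_edit` are hypothetical callbacks standing in for steps S303 and S304 to S307. Skipping a marker when no related image is found is an assumption; the specification does not cover that case.

```python
from typing import Callable, Dict, List, Optional


def run_editing(markers: List[Dict],
                acquire: Callable[[str], Optional[str]],
                apply_edit: Callable[[Dict, str], None]) -> int:
    """Process every unprocessed internal marker; return the number of edits made."""
    edits = 0
    while True:
        # S301: look for a marker that has not been edited yet.
        pending = next((m for m in markers if not m["processed"]), None)
        if pending is None:
            break
        keyword = pending["keyword"]      # S302
        related = acquire(keyword)        # S303
        if related is not None:
            apply_edit(pending, related)  # S304 to S307
            edits += 1
        # Mark as handled either way so the loop terminates.
        pending["processed"] = True
    return edits
```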

In this way, by acquiring an image related to the keyword detected by the audio analysis unit 1021 and creating a video with that related image added, a video better suited to viewers can be obtained.

FIG. 4 shows an example of a video shot by the imaging device 100 and the same video after a related image has been added. FIG. 4(a) is the video shot by the imaging device 100; it contains scenes 401 to 403 in chronological order. When the audio analysis unit 1021 detects a keyword in the audio data accompanying scene 401, a related image is added after scene 401, creating a video such as the one shown in FIG. 4(b). In FIG. 4(b), the related image 404 is inserted between scene 401 and scene 402.
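The FIG. 4 example corresponds to inserting one element into an ordered timeline. A minimal sketch, using the reference numerals as labels and a hypothetical `insert_after` helper:

```python
from typing import List


def insert_after(timeline: List[str], scene: str, related: str) -> List[str]:
    """Return a new timeline with the related image placed right after the scene."""
    index = timeline.index(scene)
    return timeline[:index + 1] + [related] + timeline[index + 1:]


# FIG. 4(a): scenes 401 to 403; FIG. 4(b): related image 404 after scene 401.
edited = insert_after(["401", "402", "403"], "401", "404")
```

Here `edited` is `["401", "404", "402", "403"]`, matching the arrangement in FIG. 4(b).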

As described above, this embodiment provides an imaging device that adds additional information matching the audio of a recorded video to that video, using a method according to the additional information, and records the result.

The embodiment above describes the case where the related image acquisition unit 1024 acquires a single related image. When multiple images are acquired, they may be added with priority given to the images with the most recent shooting date and time.
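Prioritizing by shooting date is a simple sort. In this sketch each related image is an assumed `(name, shot_at)` pair; the specification does not define how the shooting date is stored.

```python
from datetime import datetime
from typing import List, Tuple


def newest_first(images: List[Tuple[str, datetime]]) -> List[str]:
    """Order related images so the most recently shot one comes first."""
    ordered = sorted(images, key=lambda item: item[1], reverse=True)
    return [name for name, _ in ordered]
```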

The main scene in step S204 of FIG. 2 was described as a moment when the volume of the audio data of the scene being shot rises, but it is not limited to this. For example, it may be a time when multiple subjects in the scene are talking, or a time when the emotion indicated by the audio data changes greatly.

To make the edited video file more enjoyable to watch, the number of edits applied to a recorded video file may be limited, for example to at most one edit per predetermined period of time.
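One way to read the limit is as a minimum spacing between edit points, sketched below. This interpretation, along with the name `limit_edits`, is an assumption; fixed time windows would be an equally valid reading.

```python
from typing import List


def limit_edits(marker_times: List[float], min_interval: float) -> List[float]:
    """Keep only edit points at least min_interval seconds after the last kept one."""
    kept: List[float] = []
    for time_sec in sorted(marker_times):
        if not kept or time_sec - kept[-1] >= min_interval:
            kept.append(time_sec)
    return kept
```

With markers at 5, 12, 40, 65, and 70 seconds and a 30-second interval, the edits at 5, 40, and 70 seconds are kept.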

To further improve the viewing experience, video files with more edits may be suggested to the user as recommended videos, so that they are played in preference to video files with fewer edits.
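The recommendation order could be produced as follows, assuming a hypothetical mapping from file name to the number of edits applied:

```python
from typing import Dict, List


def recommend(edit_counts: Dict[str, int]) -> List[str]:
    """List video files with the most edits first."""
    return sorted(edit_counts, key=lambda name: edit_counts[name], reverse=True)
```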

Although the video editing process was described as starting immediately after video shooting ends, it may instead start in response to a start instruction from the user. The start instruction may be, for example, an operation on the operation unit 1023 or a voice instruction via the microphone 1017. The display unit 1015 was described as a liquid crystal display, for example, but is not limited to this. For example, it may have a projector function and project images outside the imaging device 100.

Preferred embodiments of the present invention have been described above, but the present invention is not limited to these embodiments, and various modifications and changes are possible within the scope of its gist.

100 Imaging device

Claims

1. An imaging device comprising:
imaging means;
a microphone;
recording means for recording video data from the imaging means and audio data from the microphone on a recording medium;
main scene determination means for determining a main scene in the video data recorded on the recording medium;
audio analysis means for analyzing audio data corresponding to the main scene and extracting a keyword;
acquisition means for acquiring a still image or video related to the keyword as a related image; and
editing means for editing the recorded video data so as to edit, among the main scenes, the scene from which the keyword was extracted,
wherein the editing means adds the related image acquired by the acquisition means to the scene in the video data from which the keyword was extracted, using a method according to the related image.

2. The imaging device according to claim 1, wherein the acquisition means acquires the related image from another image recorded on the recording medium or from a network.

3. The imaging device according to claim 1 or 2, wherein, when the related image acquired by the acquisition means is a video, the editing means creates a cutout video by cutting a predetermined length of time from the related image and adds the cutout video to the video data.

4. The imaging device according to any one of claims 1 to 3, wherein, when the related image includes a main subject, the editing means inserts the related image into the scene in the video data from which the keyword was extracted.

5. The imaging device according to any one of claims 1 to 4, wherein, when the related image does not include a main subject, the editing means superimposes the related image at a small size on the scene in the video data from which the keyword was extracted.

6. The imaging device according to any one of claims 1 to 5, wherein, when the acquisition means acquires a plurality of related images, the video editing means adds them to the video data with priority given to the image with the most recent shooting date and time among the plurality of related images.

7. The imaging device according to any one of claims 1 to 6, wherein the main scene is a scene in which the volume of audio corresponding to the video data is higher than a predetermined value.

8. The imaging device according to any one of claims 1 to 6, wherein the main scene is a scene in which a plurality of subjects in the video data are talking.

9. The imaging device according to claim 1, wherein the main scene is a scene in which the emotion of the audio in the video data changes significantly.

10. The imaging device according to claim 1, wherein, when the keyword represents a place name, the acquisition means acquires still images and videos shot at the location indicated by the keyword.

11. The imaging device according to any one of claims 1 to 9, wherein, when the keyword represents a person's name, the acquisition means acquires still images and videos that include the person indicated by the keyword as a subject.

12. The imaging device according to claim 1, wherein the editing means edits the video data no more than once per predetermined period of the video data.

13. The imaging device according to any one of claims 1 to 12, wherein the editing means records the video data edited by the video editing means on the recording means, the imaging device further comprising reproducing means for reproducing, as a recommended video, video data having more scenes edited by the editing means from among the video data recorded on the recording medium.

14. The imaging device according to claim 1, wherein the editing means edits the video data according to instructions from a user.