JP2014112787A

JP2014112787A - Video processing device and video processing method

Info

Publication number: JP2014112787A
Application number: JP2012266397A
Authority: JP
Inventors: Toshiyuki Tanaka; 俊幸田中
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 2012-12-05
Filing date: 2012-12-05
Publication date: 2014-06-19
Anticipated expiration: 2032-12-05
Also published as: JP6081788B2; KR20140072785A

Abstract

PROBLEM TO BE SOLVED: To create a video digest that accurately reflects the operator's intention. SOLUTION: A video processing device includes: a CMOS element 160 that captures images of an operator; a face analysis unit 220 that analyzes features of the operator's face from a first video captured by the CMOS element 160; an expression evaluation value calculation unit 222 that calculates an evaluation value by quantifying the operator's facial expression, based on the analysis results of the analysis unit 220, from the first video captured while a second video different from the first video is being captured or played back; a record file creation unit 228 that records the quantified evaluation value on the same timeline as the second video; and a video editing unit 226 that creates a digest by sequentially extracting partial videos of the second video based on the evaluation value.

Description

The present invention relates to a moving image processing apparatus and a moving image processing method.

Techniques for creating a digest version of a moving image are conventionally known. For example, Patent Document 1 below describes an information processing apparatus that stores human relationship information and creates a digest image by extracting images showing not only a person specified by the user but also persons closely related to that person.

Patent Document 2 below describes a technique that detects changes in specific images and sounds as events and generates a digest video with a variable playback speed. Patent Document 3 below describes a system that, when still images are played back on a digital camera or the like, sets a favorite level for the image being viewed based on whether the viewer is smiling.

Patent Document 1: JP 2011-82915 A
Patent Document 2: JP 2010-39877 A
Patent Document 3: JP 2012-156750 A

In recent years, electronic devices such as digital still cameras and smartphones have also come to be equipped with a video recording function. Video playback is well suited to conveying a sense of presence, but compared with still images it has the drawback that checking the content by playback takes time. Moreover, much of a video outside its main scenes is redundant. For these reasons, even when videos are recorded, they tend to be played back infrequently.

There is therefore demand for a function that automatically and efficiently creates (edits) a digest (summary) video. However, the techniques described in Patent Documents 1 and 2 create a digest by detecting features of a person or the sounds emitted, so the situations in which digest creation can be applied are coarsely limited by the person's features and voice, and it has been difficult to create the desired digest. In addition, with the techniques of Patent Documents 1 and 2 and the like, there is no guarantee that the specified person appears in the scenes one wants to keep in the digest, nor that those scenes contain sound, so creating the desired digest is difficult. Furthermore, when the specified person appears throughout the video, important scenes cannot be identified, which makes it difficult to generate a digest.

The method described in Patent Document 3 recognizes whether the viewer is smiling, but it is a technique for setting the favorite level of still images and does not contemplate creating a digest of a video.

The present invention has been made in view of the above problems, and an object of the present invention is to provide a new and improved moving image processing apparatus and moving image processing method capable of creating a video digest that accurately reflects the operator's intention.

To solve the above problems, according to an aspect of the present invention, there is provided a moving image processing apparatus including: a first moving image capturing unit that captures images of an operator; an analysis unit that analyzes features of the operator's face from a first video captured by the first moving image capturing unit; an evaluation value calculation unit that calculates an evaluation value by quantifying the operator's facial expression, based on the analysis result of the analysis unit, from the first video captured while a second video different from the first video is being captured or played back; a recording unit that records the quantified evaluation value on the same timeline as the second video; and a video editing unit that generates a digest by sequentially extracting partial videos of the second video based on the evaluation value.

With the above configuration, features of the operator's face are analyzed from the first video, and from the first video captured while a second, different video is being captured or played back, the operator's facial expression is quantified based on the facial feature analysis to calculate an evaluation value. The quantified evaluation value is recorded on the same timeline as the second video, and partial videos of the second video are sequentially extracted based on the evaluation value to create a digest. Because the expression evaluation value used to create the digest is calculated while the second video is being shot or played back, the operator's emotions can be accurately reflected in the evaluation value, making it possible to create the desired digest.
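As a rough illustration of recording the evaluation value on the same timeline as the second video, the following Python sketch stores each evaluation sample keyed by its time in seconds from the start of the second video, and reads back a value at any playback time by linear interpolation between neighboring samples (the drawings include a plot of evaluation values against sampling time with such interpolation). The class name `EvaluationTimeline` and its methods are hypothetical; the patent does not specify a data structure.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class EvaluationTimeline:
    """Evaluation values keyed to the second video's timeline (seconds from its start)."""
    times: list[float] = field(default_factory=list)
    values: list[float] = field(default_factory=list)

    def record(self, video_time: float, value: float) -> None:
        # Samples arrive in capture order, so the time list stays sorted.
        if self.times and video_time <= self.times[-1]:
            raise ValueError("samples must be recorded in increasing time order")
        self.times.append(video_time)
        self.values.append(value)

    def value_at(self, t: float) -> float:
        """Linearly interpolate between the two samples surrounding t."""
        if not self.times:
            raise ValueError("no samples recorded")
        if t <= self.times[0]:
            return self.values[0]
        if t >= self.times[-1]:
            return self.values[-1]
        i = bisect_right(self.times, t)  # times[i-1] <= t < times[i]
        t0, t1 = self.times[i - 1], self.times[i]
        v0, v1 = self.values[i - 1], self.values[i]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
```

Interpolation lets the editing step ask for a value at any frame time even though the in-camera is sampled less often than the second video's frame rate; holding the last sample would be an equally simple alternative.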

The apparatus may further include a second moving image capturing unit that captures the second video, in which case the evaluation value calculation unit performs the quantification while the second moving image capturing unit is capturing the second video. With this configuration, the expression evaluation value can be quantified with respect to the second video captured by the second moving image capturing unit.

The video editing unit sequentially extracts the partial videos in sections where the evaluation value exceeds a predetermined threshold. With this configuration, because partial videos are extracted from sections where the evaluation value exceeds the threshold, they are extracted when the operator's facial expression changes greatly, making it possible to create the desired digest.
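The threshold rule can be sketched as follows, assuming the evaluation values are available as parallel lists of sample times and values. `sections_above` is a hypothetical helper: it returns one (start, end) pair per run of consecutive samples whose value is strictly greater than the threshold, using the times of the first and last samples in the run. A run consisting of a single sample yields a zero-length section.

```python
def sections_above(times, values, threshold):
    """Return (start, end) pairs for maximal runs of samples whose value exceeds threshold."""
    sections = []
    start = None   # time of the first sample in the current run, or None if not in a run
    prev_t = None  # time of the previous sample
    for t, v in zip(times, values):
        if v > threshold:
            if start is None:
                start = t  # a new run begins at this sample
        elif start is not None:
            sections.append((start, prev_t))  # the run ended at the previous sample
            start = None
        prev_t = t
    if start is not None:
        sections.append((start, prev_t))  # close a run that lasts to the final sample
    return sections
```

For example, `sections_above([0, 1, 2, 3, 4], [0, 5, 6, 1, 7], 3)` gives `[(1, 2), (4, 4)]`.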

The video editing unit extracts the partial videos while varying the threshold so that the total duration of the partial videos fits within a predetermined time. With this configuration, because the threshold is adjusted so that the total duration fits within the predetermined time, the partial videos can be extracted to a desired length.
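One way to vary the threshold so that the total duration fits a time budget is to try candidate thresholds from loosest to strictest and keep the first one that fits. This sketch assumes uniform sampling, where each sample stands for `dt` seconds of the second video; `fit_threshold`, `dt` and `max_total` are hypothetical names, and the patent does not say how the threshold is searched.

```python
import math
from bisect import bisect_right


def fit_threshold(values, dt, max_total):
    """Return the lowest threshold whose selected duration is at most max_total seconds.

    A sample is selected when its value is strictly greater than the threshold,
    and each selected sample contributes dt seconds to the digest.
    """
    if max_total < 0:
        raise ValueError("max_total must be non-negative")
    ordered = sorted(values)
    n = len(ordered)
    # Loosest candidate first (select everything), then each distinct value in turn.
    candidates = [-math.inf] + sorted(set(values))
    for th in candidates:
        selected = n - bisect_right(ordered, th)  # number of samples strictly above th
        if selected * dt <= max_total:
            return th
    raise AssertionError("unreachable: the largest value selects no samples")
```

Because raising the threshold can only remove samples, the selected duration never increases as the loop advances, so the first threshold that fits is the one that keeps the most footage.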

The video editing unit may also sequentially extract partial videos in sections obtained by adding a predetermined margin time before or after each section in which the evaluation value exceeds the predetermined threshold. With this configuration, the desired digest can be created even when the sampling interval of the evaluation values is relatively long.
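Adding a margin before and after each section can be sketched as below, with sections given as (start, end) pairs in seconds. `add_margin` is hypothetical; either margin may be zero. Merging sections that overlap after widening, and clamping them to the video's length, are assumptions added so that the same footage is not extracted twice.

```python
def add_margin(sections, margin_before, margin_after, video_length):
    """Widen each (start, end) section by the margins, clamp to the video, merge overlaps."""
    widened = sorted(
        (max(0.0, s - margin_before), min(video_length, e + margin_after))
        for s, e in sections
    )
    merged = []
    for s, e in widened:
        if merged and s <= merged[-1][1]:
            # Overlaps or touches the previous section: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

With half-second margins, `[(1.0, 2.0), (3.0, 3.0)]` becomes a single section `(0.5, 3.5)`, which also gives the zero-length single-sample section a usable duration.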

The evaluation value calculation unit performs the quantification based on a predetermined rule. With this configuration, the evaluation value can be calculated according to the predetermined rule.

The rule is defined so that the evaluation value is weighted more heavily when the facial expression changes sharply. With this configuration, changes in expression can be accurately reflected in the evaluation value.
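The following sketch shows one hypothetical rule of this kind. Each facial element has a score in [0, 1] and a weight, and an element's contribution is boosted in proportion to how much its score changed since the previous sample. The element names, weights and `change_gain` are illustrative assumptions; the actual rule is the one defined in the patent's drawings, which are not reproduced here.

```python
def expression_score(current, previous, weights, change_gain=2.0):
    """Combine per-element expression scores into one evaluation value.

    current, previous: dicts mapping a (hypothetical) element name to a score in [0, 1]
    for this sample and the previous one. weights: dict of per-element weights.
    change_gain: hypothetical factor that boosts an element when it changes sharply.
    """
    total = 0.0
    for name, w in weights.items():
        level = current[name]
        # No previous sample for this element means no change.
        change = abs(level - previous.get(name, level))
        # Sharper changes in expression raise this element's contribution.
        total += w * level * (1.0 + change_gain * change)
    return total
```

With weights `{"smile": 1.0, "eyes": 0.5}` and an unchanged face scoring `{"smile": 0.5, "eyes": 1.0}`, the value is 1.0; if the smile score has just risen from 0.0 to 0.5, it becomes 1.5.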

The recording unit records the evaluation value as metadata within the data of the second video. With this configuration, the evaluation value can be recorded in association with the data of the second video.

Alternatively, the recording unit records the evaluation value as separate data in one-to-one correspondence with the data of the second video. With this configuration as well, the evaluation value can be recorded in association with the data of the second video.

To solve the above problems, according to another aspect of the present invention, there is provided a moving image processing method including: a step of capturing images of an operator; a step of analyzing features of the operator's face from a first video captured by a first moving image capturing unit; a step of calculating an evaluation value by quantifying the operator's facial expression, based on the facial feature analysis result, from the first video captured while a second video different from the first video is being captured or played back; a step of recording the quantified evaluation value on the same timeline as the second video; and a step of generating a digest by sequentially extracting partial videos of the second video based on the evaluation value.

The method may further include a step of capturing the second video, in which case the quantification in the step of calculating the evaluation value is performed while the second video is being captured. With this configuration, the expression evaluation value can be quantified with respect to the second video captured in the capturing step.

In the step of generating the digest, the partial videos in sections where the evaluation value exceeds a predetermined threshold are sequentially extracted. With this configuration, because partial videos are extracted from sections where the evaluation value exceeds the threshold, they are extracted when the operator's facial expression changes greatly, making it possible to create the desired digest.

In the step of generating the digest, the partial videos are extracted while varying the threshold so that their total duration fits within a predetermined time. With this configuration, because the threshold is adjusted so that the total duration fits within the predetermined time, the partial videos can be extracted to a desired length.

In the step of generating the digest, partial videos may also be sequentially extracted in sections obtained by adding a predetermined margin time before or after each section in which the evaluation value exceeds the predetermined threshold. With this configuration, the desired digest can be created even when the sampling interval of the evaluation values is relatively long.

In the step of calculating the evaluation value, the quantification is performed based on a predetermined rule. With this configuration, the evaluation value can be calculated according to the predetermined rule.

In the recording step, the evaluation value is recorded as metadata within the data of the second video. With this configuration, the evaluation value can be recorded in association with the data of the second video.

Alternatively, in the recording step, the evaluation value is recorded as separate data in one-to-one correspondence with the data of the second video. With this configuration as well, the evaluation value can be recorded in association with the data of the second video.

According to the present invention, it is possible to create a video digest that accurately reflects the operator's intention.

FIG. 1 is a schematic diagram showing a subject being photographed with an electronic device such as a smartphone or a digital camera.
FIG. 2 is a schematic diagram showing images captured by the electronic device.
FIG. 3 is a schematic diagram showing the configuration of an imaging apparatus 100 according to an embodiment of the present invention.
FIG. 4 is a schematic diagram for explaining the rule that defines the evaluation value.
FIG. 5 is a schematic diagram for explaining the rule that defines the evaluation value.
FIG. 6 is a schematic diagram showing the automatic editing function of the video editing unit.
FIG. 7 is a schematic diagram showing the sections from which the digest video is extracted when the evaluation values of elements (a) to (e) shown in FIG. 5 and their total (sum) change over time.
FIG. 8 is a schematic diagram showing an example in which evaluation values are plotted against their sampling times and linearly interpolated between sampling times.
FIG. 9 is a schematic diagram showing how evaluation values are stored.
FIG. 10 is a flowchart showing the processing of the moving image processing method in the imaging apparatus.

Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are denoted by the same reference numerals, and redundant description is omitted.

Among recent portable electronic devices such as smartphones, models have appeared that carry, in addition to the camera for recording subjects, a self-portrait camera (in-camera) on the operator side. Configurations with a self-portrait camera are expected to spread to a variety of devices, including portable devices such as digital still cameras (DSCs).

As imaging devices such as digital still cameras have become smaller, most of them, like smartphones, no longer have a viewfinder, and the photographer views the subject on a liquid crystal display (LCD) screen. In this case the photographer holds the imaging device a certain distance from the face while shooting, so a device with an in-camera can capture the photographer's (operator's) entire face.

FIG. 1 is a schematic diagram showing a subject being photographed with an electronic device 300 such as a smartphone or a digital camera. The electronic device 300 photographs the subject with the front camera 302 in response to the photographer's operation. The electronic device 300 also includes an in-camera 304 on the side opposite the camera 302 (the photographer's side), and can photograph the user's face with the in-camera 304.

FIG. 2 is a schematic diagram showing images captured by the electronic device 300. FIG. 2(B) shows an image captured by the front camera 302 of the electronic device 300 being displayed on the LCD 306 of the electronic device 300. FIG. 2(A) shows an image captured by the in-camera 304 of the electronic device 300; as shown there, the in-camera 304 captures the photographer. Thus, the electronic device 300, which has the in-camera 304 in addition to the ordinary camera 302, can photograph the photographer's face while photographing the subject.

The images captured by the camera 302 contain various subjects such as people, landscapes, vehicles and buildings, so the camera 302 does not always detect a human face. In contrast, the image captured by the in-camera 304 is, in most cases, the face of the photographer looking straight at the LCD 306 arranged near the in-camera 304. The in-camera 304 can therefore capture a nearly frontal view of the photographer's face. Even in low light, the light from the LCD 306 serves as illumination, so the in-camera 304 can capture the photographer's face in dark conditions. Accordingly, the electronic device 300 with the in-camera 304 can capture the photographer's face at all times and perform face detection and facial expression detection.

In this embodiment, the in-camera 304, which can capture the photographer's (operator's) face at all times, is used to edit videos automatically based on the photographer's face information (in particular, facial expression information). Details are described below.

FIG. 3 is a schematic diagram showing the configuration of an imaging apparatus 100 as one embodiment of the moving image processing apparatus according to the present invention. FIG. 3 is a block diagram of the digital still camera according to this embodiment, showing mainly the blocks related to the image processing pipeline. As shown in FIG. 3, the imaging apparatus 100 according to an embodiment of the present invention includes a zoom lens (group) 102, a diaphragm 104, a focus lens (group) 108, a lens CPU 110, drivers 112 and 114, motors 116 and 118, an A/D conversion circuit 120, a ROM 122, and a RAM 124.

The imaging apparatus 100 also includes a shutter 126, a driver 128, a motor 130, a CMOS (Complementary Metal Oxide Semiconductor) element 132 as an image sensor, a CDS (Correlated Double Sampling) circuit 134 with an integrated amplifier, an A/D converter 136, an image input controller 138, and a CPU 200.

The imaging apparatus 100 further includes a nonvolatile memory 140, a compression processing circuit 142, an LCD driver 144, an LCD (Liquid Crystal Display) 146, a media controller 150, a recording medium 152, a VRAM (Video Random Access Memory) 154, a memory (SDRAM) 156, an operation member 158, and a battery 148.

The diaphragm 104 and the focus lens 108 are driven via the motors 116 and 118, which are controlled by the drivers 112 and 114, respectively. The zoom lens 102 moves back and forth along the optical axis to change the focal length continuously. The diaphragm 104 adjusts the amount of light incident on the CMOS element 132 when an image is captured. The shutter 126 controls the exposure time of the CMOS element 132 when an image is captured. The focus lens 108 moves back and forth along the optical axis to adjust the focus of the subject image formed on the CMOS element 132.

The CMOS element 132 converts light entering through the zoom lens 102, the diaphragm 104 and the focus lens 108 into an electrical signal.

The CDS circuit 134 combines, in one circuit, a CDS circuit (a type of sampling circuit that removes noise from the electrical signal output by the CMOS element 132) and an amplifier that amplifies the signal after the noise has been removed. Although this embodiment uses a circuit in which the CDS circuit and the amplifier are integrated, they may be configured as separate circuits.
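Correlated double sampling itself is a standard technique: each pixel is read once at its reset level and once after exposure, and the difference is taken, so that the offset and reset noise common to both readings cancel. The sketch below models that subtraction followed by the amplifier's gain. It assumes, as is usual for CMOS pixels, that the pixel output falls as charge accumulates, so reset minus signal is the light-dependent part; the function name and gain value are illustrative.

```python
def correlated_double_sample(reset_levels, signal_levels, gain=1.0):
    """Per-pixel CDS followed by amplification (arbitrary voltage units).

    reset_levels[i] is pixel i sampled just after reset; signal_levels[i] is the
    same pixel sampled after exposure. The subtraction removes any offset shared
    by both samples; the result is then scaled by the amplifier gain.
    """
    return [gain * (r - s) for r, s in zip(reset_levels, signal_levels)]
```

Adding the same offset to both samples of a pixel leaves its output unchanged, which is the property that makes CDS useful for suppressing fixed offsets and reset noise.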

The A/D converter 136 converts the electrical signal generated by the CMOS element 132 into a digital signal to generate raw image data. The image input controller 138 controls the input of the raw image data generated by the A/D converter 136 to the recording medium 152.

The imaging apparatus 100 according to this embodiment also includes a second CMOS element 160 that functions as an in-camera, a CDS circuit 162 with an integrated amplifier, an A/D converter 164, and an image input controller 166. The CDS circuit 162, the A/D converter 164 and the image input controller 166 are provided for the CMOS element 160. The imaging apparatus 100 further includes an optical lens 168 provided for the CMOS element 160. The optical axis of the optical lens 168 faces the operator of the imaging apparatus 100; taking the operator's face as the subject, the optical lens 168 forms the subject image (the image of the operator's face) on the imaging surface of the second CMOS element 160. The CMOS element 160 photoelectrically converts the subject image and inputs it to the image input controller 166. These components together constitute the in-camera.

The CDS circuit 162 combines, in one circuit, a CDS circuit (a type of sampling circuit that removes noise from the electrical signal output by the CMOS element 160) and an amplifier that amplifies the signal after the noise has been removed.

The A/D converter 164 converts the electrical signal generated by the CMOS element 160 into a digital signal to generate raw image data. The image input controller 166 controls the input of the raw image data generated by the A/D converter 164 to the recording medium 152.

The nonvolatile memory 140 stores data that the imaging apparatus 100 retains persistently, and can store a program that causes the CPU 200 to function.

The compression processing circuit 142 compresses the image data output from the CMOS elements 132 and 160 into image data of an appropriate format. The compression format may be lossless or lossy; examples of suitable formats include JPEG (Joint Photographic Experts Group) and JPEG 2000.

The LCD 146 displays the live view before shooting, the various setting screens of the imaging apparatus 100, captured images, video playback, and so on. Image data and various information of the imaging apparatus 100 are displayed on the LCD 146 via the LCD driver 144.

The memory (SDRAM) 156 temporarily stores images captured by the CMOS element 132 and the CMOS element 160. The recording medium 152 has enough storage capacity to hold multiple images. Reading and writing of images to and from the memory (SDRAM) 156 are controlled by the image input controllers 138 and 166.

The VRAM 154 holds the content displayed on the LCD 146; the resolution and maximum number of colors of the LCD 146 depend on the capacity of the VRAM 154.
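The dependence on VRAM capacity is simple arithmetic: one full frame needs width × height × bits-per-pixel / 8 bytes, and a depth of b bits per pixel allows 2^b colors. The sketch below assumes a single frame buffer with no row padding; the function names and the panel sizes in the example are hypothetical.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Bytes needed to hold one full frame (single buffer, no row padding)."""
    return width * height * bits_per_pixel // 8


def fits_in_vram(vram_bytes, width, height, bits_per_pixel):
    """True if one frame at this resolution and color depth fits in the VRAM.

    A depth of bits_per_pixel allows 2 ** bits_per_pixel distinct colors.
    """
    return framebuffer_bytes(width, height, bits_per_pixel) <= vram_bytes
```

For example, a 640 × 480 panel at 24 bits per pixel needs 921,600 bytes and fits in 1 MiB of VRAM, while 800 × 600 at the same depth needs 1,440,000 bytes and does not; the display would then need a lower resolution or fewer colors.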

The recording medium 152 records images captured by the CMOS element 132 and the CMOS element 160. Input to and output from the recording medium 152 are controlled by the media controller 150. A memory card, which is a card-type storage device that records data in flash memory, can be used as the recording medium 152.

The CPU 200 issues signal-system commands to the CMOS elements 132 and 160, the CDS circuits 134 and 162, and other components. It also issues operation-system commands in response to operation of the operation member 158. This embodiment includes only one CPU, but signal-system commands and operation-system commands may be handled by separate CPUs.

The CPU 200 includes:
- an AE/AWB/AF evaluation value calculation unit 202
- an appropriate AWB calculation unit 204
- an image processing unit 206
- an AF calculation/control unit 208
- an AE calculation/control unit 210
- a GUI management unit 212
- a timing generator (TG1) 214
- an I/O 216
- an SIO 218
- a timing generator (TG2) 219

The CPU 200 further includes a face recognition unit 220, a facial expression evaluation value calculation unit 222, a reproduction processing unit 224, a moving image editing unit 226, and a recording file generation unit 228.

The AE/AWB/AF evaluation value calculation unit 202 works on the image data output from the CMOS elements 132 and 160. From that data it calculates three values: an AE evaluation value as exposure information, an AWB evaluation value as white balance information, and an AF evaluation value as AF information. The appropriate AWB calculation unit 204 calculates an appropriate white balance value. The image processing unit 206 processes the image data output from the CMOS element 132, applying light-amount gain correction, edge processing (contour enhancement), white balance adjustment, and similar steps.

The AF calculation/control unit 208 uses the AF evaluation value to determine how far to drive the focus lens 108 when shooting the subject. The lens CPU 110 controls the driver 114 according to that drive amount, and the driver runs the motor 118. This moves the focus lens 108 to the in-focus position.

The AE calculation/control unit 210 uses the AE evaluation value to determine the aperture value and shutter speed for shooting the subject. The CPU 200 controls the driver 128 according to the shutter speed, and the driver runs the motor 130 to drive the shutter 126. The lens CPU 110 controls the driver 112 according to the aperture value, and the driver runs the motor 116 to drive the diaphragm 104 in the lens.

When the user operates the operation member 158, the GUI management unit 212 receives operation input information from it. The CPU 200 performs various processes based on that input. For example, the GUI management unit 212 may receive an instruction to generate a digest video. The moving image editing unit 226 then performs the processing that generates it.

The timing generator (TG1) 214 supplies a timing signal to the CMOS element 132, and this signal controls how the CMOS element 132 is driven. By admitting image light from the subject only while the CMOS element 132 is driven, the timing generator (TG1) 214 can also give the CMOS element 132 an electronic shutter function.

Similarly, the timing generator (TG2) 219 supplies a timing signal to the CMOS element 160, and this signal controls how the CMOS element 160 is driven. By admitting image light from the subject only while the CMOS element 160 is driven, the timing generator (TG2) 219 can also give the CMOS element 160 an electronic shutter function.

The RGB image signal from the CMOS element 132 shown in FIG. 3 is processed mainly in the image processing unit 206. It first undergoes image front processing, such as defective pixel correction and black level correction. It then undergoes white balance correction, Bayer color interpolation (demosaicing), color correction, and gamma correction, after which the image is recorded.

Each functional block in FIG. 3 can be implemented either as a circuit (hardware) or as a central processing unit (CPU) running a program (software). The program can be stored in the non-volatile memory 140 of the imaging apparatus 100 or in an externally connected recording medium, such as a memory.

As described above, the CMOS element 132 converts whatever the photographer shoots as the subject (a person, a landscape, and so on) into image data. The CMOS element 160 converts the photographer's own face into image data.

This embodiment uses the CMOS elements 132 and 160 as image sensors. The present invention is not limited to this example, however, and other image sensors such as CCD elements may be used instead. CMOS elements convert the subject's image light into electrical signals faster than a CCD element does. Using them therefore shortens the time between capturing the subject and performing image synthesis processing.

As shown in FIG. 1, the imaging apparatus 100 consists of a camera body 250 and an interchangeable lens 260 that can be detached from the body 250. Alternatively, the body 250 and the lens 260 may be formed as one unit.

In the imaging apparatus 100, the in-camera monitors the photographer's facial expression. This makes it possible to infer how the photographer feels about the image being captured or played back.

Consider parents filming their child's recital or sports day. They often start recording well before the child appears so as not to miss the shot. When their child appears or the performance begins, their expression naturally changes, and they often call out. Recognizing these changes and states of expression in the in-camera image lets the photographer's subjective feelings be judged objectively for each part of the moving image.

In this embodiment, an evaluation value objectively quantifies the photographer's (operator's) emotions as judged from the facial expression captured by the in-camera. The value reflects two things: the intensity of joy, anger, sorrow, or pleasure, and the degree of change between frames. A higher evaluation value means the corresponding part of the moving image is more likely to contain an important scene. Sections with high evaluation values are then extracted in order until they fill the desired duration. The result is a digest video made of the important portions, generated automatically.

The evaluation value can be calculated while the CMOS element 132 captures a moving image. It can also be calculated while the reproduction processing unit 224 plays back a moving image file recorded on the recording medium 152. Once the evaluation values exist, a digest video can be created from them at any time.

To this end, the face recognition unit (analysis unit) 220 of the CPU 200 works on the image data captured by the CMOS element 160. It recognizes the facial expression of the photographer (or of the operator watching the played-back moving image) and analyzes its features. In other words, the face recognition unit 220 serves as the analysis unit for the features of the operator's face.

The facial expression evaluation value calculation unit 222 quantifies the expression recognized by the face recognition unit 220 to produce the evaluation value. When shooting, this data is generated from the moment the CMOS element 132 starts recording. It is recorded on the same timeline as the captured moving image. When the value is calculated during playback, the data is generated from the moment the reproduction processing unit 224 starts playing the moving image. It is then recorded on the same timeline as the played-back moving image.

The facial expression evaluation value is generated according to a preset rule. Evaluation values differ between individuals, so the judgment is made relatively. All facial expression image data for one moving image are ranked, and the portions where emotions are most strongly expressed are extracted from the top down. This reduces individual differences in the evaluation value. It also reduces differences in the value's absolute magnitude caused by different situations.
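A minimal sketch of the relative selection described above follows. It assumes the evaluation values for one moving image are already available as a list of samples. The function name and the `fraction` parameter (the share of samples to keep) are hypothetical. The description does not specify how many of the top-ranked portions are taken.

```python
from __future__ import annotations


def top_ranked_samples(scores: list[float], fraction: float) -> list[int]:
    """Return indices of the highest-scoring samples, in time order.

    Only the ranking within one video matters, so the selection does not
    depend on how expressive a particular person is in absolute terms.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if not scores:
        return []
    k = max(1, round(len(scores) * fraction))
    # Rank sample indices from highest to lowest score; sorted() is stable,
    # so equal scores keep their time order.
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    # Return the chosen samples in playback order.
    return sorted(ranked[:k])
```

Only the ordering of the scores matters. Multiplying every score by a constant, as with a more expressive viewer, therefore selects the same samples.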

The calculation of the evaluation value is described in detail below. FIGS. 4 and 5 are schematic diagrams of the rule that defines the evaluation value. FIG. 4 shows the elements of a person's facial expression that determine the value:
- (a) the eyes narrowing
- (b) the eyes widening
- (c) the outer corners of the eyes drooping
- (d) the mouth opening
- (e) the corners of the mouth rising

For example, observing the corners of the mouth in (e) lets the photographer's emotional state be inferred. The evaluation value can therefore also be used to detect when the photographer spoke, even in a noisy environment.

FIG. 5 is a schematic diagram of the rule for determining the evaluation value from the elements (a) to (e) shown in FIG. 4. As shown in FIG. 4, an element at its normal level has an evaluation value of 0. Each element is assigned a maximum evaluation value for its largest change, and its value rises with the level of change. For "(a) the eyes narrowing," the value is 8 when the eyes are narrowed the most. Below that, it rises with how far the eyes narrow. Likewise, for "(b) the eyes widening," the value is 12 when the eyes are widened the most, and it rises with how far the eyes widen.

The rule in FIG. 5 gives higher weights to elements in which changes of expression show strongly. A mouth that is wide open (for example, when the photographer shouts involuntarily to cheer on a child) shows up strongly in "(d) the mouth opening." Corners of the mouth that rise sharply (for example, when laughing) show up strongly in "(e) the corners of the mouth rising." It is therefore desirable to weight each element in FIG. 5 according to how strongly it reflects changes of expression.

For example, compare "(c) the outer corners of the eyes drooping" with "(d) the mouth opening" in FIG. 5. At maximum change, (c) is worth 4 while (d) is worth 20. This is because, when each changes to its maximum, an open mouth represents the larger emotional change. Weighting the elements (a) to (e) differently in this way lets the evaluation value reflect the user's emotion more accurately.

For each element (a) to (e), the system must judge how far between normal and maximum in FIG. 5 the element has changed. This is done by setting a normal level and a maximum level in advance from the user's own face.

Take "(b) the eyes widening" as an example. When the user's face is first captured (for example, on standby before shooting starts), basic data are acquired: the distance between the eyes, the size (width) of the eyes, the facial contour, and so on. From these data, two sizes are set in advance: the normal eye size (solid line N in FIG. 4) and the eye size when fully widened (dash-dot line M in FIG. 4). The system then measures how far the current eye size has moved from line N toward line M. This gives the level between normal and maximum in FIG. 5 to which the eyes have widened.
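The calibration step can be sketched as a normalization between the two reference sizes. This sketch assumes the level rises linearly between line N and line M. The description only says the system measures "how close" the measurement is to M. The function name and the use of a single scalar measurement (such as eye height in pixels) are illustrative assumptions.

```python
def change_level(measured: float, normal: float, maximum: float) -> float:
    """Map a measured feature to a change level in [0, 1].

    `normal` is the calibrated relaxed value (line N) and `maximum` the
    calibrated fully changed value (line M). The sign of (maximum - normal)
    handles both growing features (eyes widening) and shrinking ones
    (eyes narrowing).
    """
    span = maximum - normal
    if span == 0:
        return 0.0  # degenerate calibration: treat as no change
    level = (measured - normal) / span
    # Readings on the far side of N, or beyond M, saturate at 0 or 1.
    return min(1.0, max(0.0, level))
```

For a narrowing feature, M is smaller than N. For example, `change_level(7.0, 10.0, 4.0)` gives 0.5, because the measurement is halfway from N toward M.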

The face recognition unit 220 recognizes and analyzes features such as the eyes, mouth, and nose in the recognized face image. The facial expression evaluation value calculation unit 222 uses the features detected by the face recognition unit 220 to calculate the evaluation value of each element (a) to (e). It then sums them, Σ{(a)+(b)+(c)+(d)+(e)}, to obtain the final evaluation value. The rule shown in FIG. 5 is stored in the non-volatile memory 140 of the imaging apparatus 100. The user can customize it by operating the operation member 158.
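The summation can be sketched as a weighted total over the five elements. The maximum values for (a) to (d) are the ones quoted in the text for FIG. 5. The description does not state a value for (e), so the one below is a labeled placeholder. Scaling each element's maximum by a 0 to 1 change level is also an assumption, and the element names are hypothetical.

```python
from __future__ import annotations

# Score at maximum change for each element. (a) to (d) follow the values
# quoted for FIG. 5; (e) is a hypothetical placeholder.
MAX_SCORE = {
    "a_eyes_narrow": 8.0,
    "b_eyes_widen": 12.0,
    "c_eye_corners_droop": 4.0,
    "d_mouth_open": 20.0,
    "e_mouth_corners_up": 10.0,  # placeholder, not from the source
}


def expression_score(levels: dict[str, float]) -> float:
    """Return sigma{(a)+(b)+(c)+(d)+(e)} for one in-camera frame.

    `levels` maps element names to a change level in [0, 1]: 0 is the
    normal level (score 0) and 1 the maximum change. Elements missing
    from `levels` are treated as being at the normal level.
    """
    total = 0.0
    for element, max_score in MAX_SCORE.items():
        # Assumed linear scaling between the normal and maximum levels.
        total += max_score * levels.get(element, 0.0)
    return total
```

For example, a fully open mouth with the outer eye corners half drooped gives 20 + 4 × 0.5 = 22.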

To generate a digest video, the sections of the moving image whose evaluation value exceeds a predetermined threshold are extracted.
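Threshold-based section extraction can be sketched over timestamped samples. This sketch assumes the evaluation values are stored as parallel lists of sample times and values. It uses "at or above" the threshold, matching the "15 or more" wording used for FIG. 7. The function name is hypothetical.

```python
from __future__ import annotations


def sections_above(
    times: list[float], scores: list[float], threshold: float
) -> list[tuple[float, float]]:
    """Return (first, last) sample times of each run of consecutive samples
    whose evaluation value is at or above `threshold`.

    `times` must be sorted and aligned with `scores`.
    """
    sections: list[tuple[float, float]] = []
    start = None
    last = 0.0
    for t, s in zip(times, scores):
        if s >= threshold:
            if start is None:
                start = t  # a new section begins
            last = t
        elif start is not None:
            sections.append((start, last))  # the section just ended
            start = None
    if start is not None:
        sections.append((start, last))  # section runs to the last sample
    return sections
```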

With this configuration, the facial expression evaluation value rises whenever the user, while watching a video being captured or played back, shows interest, is impressed, or is moved. A digest generated from the evaluation value therefore closely reflects what the user cared about.

The photographer (or the operator watching the played-back video) may also know a rule like the one in FIG. 5 in advance. They can then change their expression on purpose to mark editing points and choose which partial moving images are extracted. For example, to make sure the footage currently being shot (or played back) ends up in the digest, they can deliberately smile. This produces a high evaluation value, so that footage is included in the digest video.

This works even when silence is required, or when shooting quietly so as not to interfere with the subject's voice or other sounds. Simply changing expression embeds instructions for automatic digest editing in the moving image's timeline during recording or playback. No special tool, device, or operation is needed. In short, knowing the extraction rule in advance lets the photographer mark, while shooting, the portions that automatic editing should keep. The desired partial moving images can then be extracted, and a digest video created efficiently.

In addition, the evaluation value can be calculated while the moving image is being shot. There is then no need to detect features after shooting, or to rescan the moving image data to calculate the value. The evaluation value can therefore be calculated efficiently and quickly.

Next, the specific processing performed by the imaging apparatus 100 according to this embodiment is described. When the photographer shoots the subject, the CMOS element 132 captures a moving image of the subject. At the same time, the CMOS element 160 captures the face of the photographer (or of the operator watching the played-back moving image).

The face recognition unit 220 of the CPU 200 detects the photographer's face in the moving image captured by the CMOS element 160 and analyzes its features. The facial expression evaluation value calculation unit 222 of the CPU 200 then extracts the elements (a) to (e) shown in FIG. 5 from the recognized face. It calculates the facial expression evaluation value according to the rule of FIG. 5. As described above, the final evaluation value is the sum of the values of elements (a) to (e). The recording file generation unit 228 records the calculated evaluation value on the same timeline as the captured moving image.

When the evaluation value is calculated during playback, the reproduction processing unit 224 of the CPU 200 reads a moving image file from the recording medium 152 and plays it back. The played-back moving image is displayed on the LCD 146, where the operator watches it. Meanwhile, the CMOS element 160 captures the operator's face.

The face recognition unit 220 detects the operator's face in the moving image captured by the CMOS element 160 and analyzes its features. The facial expression evaluation value calculation unit 222 extracts the elements (a) to (e) shown in FIG. 5 from the recognized face. It calculates the facial expression evaluation value according to the rule of FIG. 5. The recording file generation unit 228 records the result on the same timeline as the played-back moving image. Evaluation value data can thus be recorded on the moving image's timeline during playback, just as during shooting.

The moving image editing unit 226 of the CPU 200 edits the moving image based on the facial expression evaluation values. The imaging apparatus 100 runs this automatic editing function when the user wants to check a moving image's content quickly, or to extract only its main parts. The editing starts when the operator enters an editing instruction through the operation member 158.

The moving image editing unit 226 can also edit automatically, without an instruction. This can happen right after shooting ends, right after playback ends, or when a list of image files is displayed as thumbnails on the LCD 146. The digest video produced by editing is recorded on the recording medium 152.

FIG. 6 is a schematic diagram of the automatic editing function of the moving image editing unit 226. In automatic editing, the unit refers to the facial expression evaluation values and extracts, in order, the sections with high evaluation values. A section counts as high when its value exceeds a predetermined threshold. FIG. 6 sets three thresholds for extraction: T1 for a short edited length, T2 for a medium edited length, and T3 for a long edited length.

The threshold used depends on the edited length:
- Short: sections R11 and R12, where the evaluation value exceeds T1, are extracted to generate the digest video.
- Medium: sections R21, R22, R23, R24, and R25, where the evaluation value exceeds T2, are extracted.
- Long: sections R31, R32, R33, and R34, where the evaluation value exceeds T3, are extracted to generate the digest video (partial moving images).

When the automatic editing function is active, the moving image editing unit 226 varies the evaluation value threshold so that the digest has the length the photographer (operator) wants. In other words, it searches for the optimal threshold so that the total length of the extracted sections fits within a predetermined time. For each candidate threshold, it extracts the sections whose evaluation value exceeds that threshold. It keeps the threshold for which the total section length is closest to the desired length, and joins the extracted sections into the digest video. This gives the digest video the length the user wants.

FIG. 6 shows three thresholds, T1, T2, and T3, but the threshold can be set to any other value. The user can freely set the length of the digest video with the operation member 158. The moving image editing unit 226 then adjusts the threshold to match that length, so the digest video runs for the user's desired duration.
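The threshold search can be sketched as a scan over candidate thresholds. This sketch assumes evenly spaced samples, each standing for `period` seconds of video. It picks the threshold whose kept length is closest to the target, one of the two criteria the text mentions. A variant could instead keep only lengths that fit within the target. The function names are hypothetical.

```python
from __future__ import annotations


def extracted_length(scores: list[float], threshold: float, period: float) -> float:
    """Total seconds kept when every sample at or above `threshold` is kept.

    Each sample is assumed to stand for `period` seconds of video.
    """
    return period * sum(1 for s in scores if s >= threshold)


def choose_threshold(scores: list[float], target: float, period: float = 1.0) -> float:
    """Return the threshold whose extracted length is closest to `target` seconds.

    The kept length only changes at the observed score values, so those are
    the only candidates worth testing. On a tie, the lowest threshold (the
    longer digest) wins, because min() returns the first minimum it finds.
    """
    if not scores:
        raise ValueError("no evaluation values")
    candidates = sorted(set(scores))
    return min(candidates, key=lambda th: abs(extracted_length(scores, th, period) - target))
```

Testing only the distinct observed values avoids a fixed step size. Those are the only points where the kept length can change.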

FIG. 7 is a schematic diagram of the section extracted for the digest video. It shows the evaluation values of the elements (a) to (e) from FIG. 5, and their total (sum), as they change over time. With a threshold of 15, the total (sum) is 15 or more from 12 to 16 seconds. The moving image from 12 to 16 seconds, shown in the thick frame in FIG. 7, is therefore extracted as the digest video.

FIG. 8 shows an example in which the evaluation values are plotted against their sampling times and linearly interpolated between sampling points. The sampling interval is 4 seconds and the threshold is 15. In FIG. 8, the evaluation value exceeds 15 in section R4, from 8.5 to 18.0 seconds. The moving image in this span is extracted to generate the digest video.
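The interpolation step can be sketched by finding where the straight line between two samples crosses the threshold. This gives section boundaries between sampling points rather than at them. The sample values in the usage line are hypothetical and are not the data behind FIG. 8.

```python
from __future__ import annotations


def crossing_sections(
    times: list[float], values: list[float], threshold: float
) -> list[tuple[float, float]]:
    """Return (start, end) intervals where the straight-line interpolation
    between samples is at or above `threshold`."""
    if not times:
        return []
    sections: list[tuple[float, float]] = []
    start = times[0] if values[0] >= threshold else None
    for t0, v0, t1, v1 in zip(times, values, times[1:], values[1:]):
        above0, above1 = v0 >= threshold, v1 >= threshold
        if above0 == above1:
            continue  # no crossing inside [t0, t1]
        # v0 != v1 whenever exactly one side is above, so the division is safe.
        tc = t0 + (threshold - v0) * (t1 - t0) / (v1 - v0)
        if above1:
            start = tc  # curve rises through the threshold
        else:
            sections.append((start, tc))  # curve falls through the threshold
            start = None
    if start is not None:
        sections.append((start, times[-1]))
    return sections
```

With samples every 4 seconds, `crossing_sections([0, 4, 8, 12, 16, 20], [0, 5, 11, 19, 17, 13], 15)` returns `[(10.0, 18.0)]`. The rising crossing falls halfway between 8 and 12 seconds, and the falling one halfway between 16 and 20 seconds.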

Because of the sampling interval, extracting only the section where the evaluation value exceeds 15 may drop desired footage at its beginning or end. Extraction may therefore start a few seconds before the evaluation value exceeds 15. Section R5 in FIG. 8 is an example in which extraction starts about 2 seconds earlier than that point. Likewise, extraction should preferably end somewhat after the evaluation value falls to 15 or below. This keeps the portions the user wants in the digest and makes the digest video easier to watch. The beginning and end of the digest video may fade in and fade out.
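The margins can be sketched as padding each extracted section, clamping it to the video, and merging sections that then overlap. The 2-second lead follows the R5 example. The description gives no figure for the trailing margin, so the `tail` default is an assumption. The function name is hypothetical.

```python
from __future__ import annotations


def pad_sections(
    sections: list[tuple[float, float]],
    duration: float,
    lead: float = 2.0,  # about 2 s earlier, as in section R5
    tail: float = 2.0,  # assumed; the text only says "later"
) -> list[tuple[float, float]]:
    """Widen each (start, end) section, clamp to [0, duration], merge overlaps."""
    padded = sorted((max(0.0, s - lead), min(duration, e + tail)) for s, e in sections)
    merged: list[tuple[float, float]] = []
    for s, e in padded:
        if merged and s <= merged[-1][1]:
            # Overlaps (or touches) the previous section: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged
```

Merging after padding avoids cutting to the same footage twice when two high-value sections lie close together.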

FIG. 9 is a schematic diagram of two ways to store the evaluation values. In FIG. 9(A), the evaluation values are stored as metadata inside the moving image file 400. The file 400 then contains a header 402, facial expression evaluation values 404, and moving image data 406.

In FIG. 9(B), the evaluation values are stored as a separate file 500 associated with the moving image file 400. The file 400 and the evaluation value file 500 are kept in one-to-one correspondence, for example by giving them different extensions.
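The one-to-one pairing can be sketched by keeping the directory and base name and swapping only the extension. The `.exp` extension and the `.MP4` video extension are hypothetical. The description does not name either one.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical extension for the evaluation-value file.
SCORE_SUFFIX = ".exp"


def score_file_for(video_path: str | Path, suffix: str = SCORE_SUFFIX) -> Path:
    """Return the evaluation-value file paired with a moving image file:
    same directory and base name, different extension."""
    return Path(video_path).with_suffix(suffix)


def video_file_for(score_path: str | Path, video_suffix: str = ".MP4") -> Path:
    """Inverse mapping, from the evaluation-value file back to the video."""
    return Path(score_path).with_suffix(video_suffix)
```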

In either case, FIG. 9A or FIG. 9B, the moving image file 400 and, where used, the evaluation value file 500 are recorded on the recording medium 152. Because the evaluation values are stored in association with the moving image file 400, the moving image editing unit 226 can generate a digest video at any time. Further, as shown in FIG. 7, the evaluation values can be expressed as text data, so they can be saved reliably simply by saving that text data.

When the evaluation values are stored, the amount of data can be reduced by, for example, thinning out the samples appropriately or converting the values to text data, which allows the evaluation values to be saved in a short time. It also allows them to be read out quickly when the moving image is edited.
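The two data-reduction measures above, thinning the samples and storing them as text, can be sketched as follows. The `time,value` line format, the 0.1-second time precision, the 0.5-second sample spacing in the example, and the function names are assumptions; the actual text layout of FIG. 7 is not reproduced here. Thinning trades time resolution for size, which is one reason extraction margins such as the one shown for section R5 in FIG. 8 are useful.

```python
from typing import List, Tuple

Sample = Tuple[float, int]   # (seconds on the moving image timeline, evaluation value)


def thin(samples: List[Sample], step: int) -> List[Sample]:
    """Keep every step-th sample (step=1 keeps all)."""
    if step < 1:
        raise ValueError("step must be >= 1")
    return samples[::step]


def to_text(samples: List[Sample]) -> str:
    """Serialize samples as one 'time,value' line each."""
    return "".join(f"{t:.1f},{v}\n" for t, v in samples)


def from_text(text: str) -> List[Sample]:
    """Parse the text produced by to_text, skipping blank lines."""
    samples: List[Sample] = []
    for line in text.splitlines():
        if line.strip():
            t, v = line.split(",")
            samples.append((float(t), int(v)))
    return samples


samples = [(i * 0.5, v) for i, v in enumerate([0, 3, 16, 20, 18, 4])]
text = to_text(thin(samples, step=2))
print(text, end="")     # three lines: 0.0,0 then 1.0,16 then 2.0,18
print(from_text(text))  # [(0.0, 0), (1.0, 16), (2.0, 18)]
```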

The digest video generated by the moving image editing unit 226 is also recorded on the recording medium 152. Once a digest video has been generated, it can be played back by selecting it.

For example, thumbnails of digest videos may be displayed on the LCD 146 alongside thumbnails of still images, and clicking a digest video thumbnail may enlarge the digest video to the display size of the LCD 146 and play it back. When a digest video is shown as a thumbnail on the LCD 146, possible methods include playing a digest of predetermined length repeatedly in an endless loop, or displaying the opening portion of the video as a still image. Displaying digest videos as thumbnails on the LCD 146 in this way lets the user check the content of a moving image at a glance, just as with still-image thumbnails.

FIG. 10 is a flowchart showing the moving image processing method in the imaging apparatus 100. First, in step S10, the in-camera starts capturing the operator. In step S12, face recognition is performed on the moving image captured by the in-camera, and the facial features are analyzed. In step S14, while a moving image is being captured or played back, the facial expression is quantified to calculate an evaluation value. In step S16, the quantified evaluation value is recorded on the same timeline as the moving image being captured or played back. In step S18, partial moving images are sequentially extracted on the basis of the evaluation value to generate a digest.
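Steps S10 to S18 can be strung together as in the Python sketch below. It is only an outline under stated assumptions: the operator frames are stand-in dictionaries, `analyze_face` and `score_expression` are hypothetical placeholders for the face recognition unit 220 and the expression evaluation value calculation unit 222 (whose actual rules are not reproduced here), the threshold of 15 is borrowed from the FIG. 8 discussion, and each frame's timestamp is assumed to be expressed already on the second moving image's timeline, which is what lets S16 record the values on the same timeline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Timeline = List[Tuple[float, int]]


@dataclass
class FaceFeatures:
    smile: float   # hypothetical feature in [0, 1]


def analyze_face(frame: dict) -> FaceFeatures:
    """S12 placeholder: a real unit would run face recognition on the in-camera frame."""
    return FaceFeatures(smile=frame["smile"])


def score_expression(f: FaceFeatures) -> int:
    """S14 placeholder: an illustrative rule, not the patent's rule."""
    return round(f.smile * 30)


def extract_sections(timeline: Timeline, threshold: int) -> List[Tuple[float, float]]:
    """S18: spans in which the evaluation value exceeds the threshold."""
    sections: List[Tuple[float, float]] = []
    start = None
    for t, v in timeline:
        if v > threshold and start is None:
            start = t
        elif v <= threshold and start is not None:
            sections.append((start, t))
            start = None
    if start is not None:
        sections.append((start, timeline[-1][0]))
    return sections


def run_pipeline(frames: Iterable[Tuple[float, dict]],
                 analyze: Callable[[dict], FaceFeatures],
                 score: Callable[[FaceFeatures], int],
                 threshold: int):
    timeline: Timeline = []
    for t, frame in frames:              # S10: in-camera frames, stamped on the second video's clock
        value = score(analyze(frame))    # S12 + S14
        timeline.append((t, value))      # S16: record on the same timeline
    return timeline, extract_sections(timeline, threshold)


frames = [(0.0, {"smile": 0.1}), (1.0, {"smile": 0.7}),
          (2.0, {"smile": 0.9}), (3.0, {"smile": 0.2})]
timeline, sections = run_pipeline(frames, analyze_face, score_expression, threshold=15)
print(timeline)   # [(0.0, 3), (1.0, 21), (2.0, 27), (3.0, 6)]
print(sections)   # [(1.0, 3.0)]
```

Computing each value as its frame arrives means the second moving image never has to be decoded again just to obtain the scores.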

As described above, according to the present embodiment, a desired digest video can be created on the basis of the operator's facial expression captured by the in-camera. This makes it possible to accurately extract the portions of the video that the operator found interesting or moving while watching it, and thus to create the digest video the operator wants. As a result, moving images can be edited automatically and efficiently in a way that reflects the operator's subjective judgment. Moreover, automatic editing does not depend on the state of the subjects in the video (presence or absence of people, their orientation, their number, and so on) or on the audio contained in the video.

Furthermore, if the operator knows in advance the rules by which the facial expression evaluation value is calculated, a digest video can be generated in accordance with the operator's intention. In addition, because the evaluation value can be calculated at the same time as the moving image is shot, the moving image data does not need to be decoded again to calculate feature values, and the evaluation value calculation can be performed efficiently.

Preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to these examples. It is evident that a person having ordinary skill in the technical field to which the present invention pertains can conceive of various changes or modifications within the scope of the technical idea described in the claims, and it is understood that these naturally also fall within the technical scope of the present invention.

DESCRIPTION OF SYMBOLS
100 Imaging apparatus
132, 160 CMOS element
220 Face recognition unit
222 Facial expression evaluation value calculation unit
226 Moving image editing unit
228 Recording file generation unit

Claims

1. A first moving image capturing unit that captures images of the operator;
An analysis unit that analyzes features of the operator's face from the first moving image captured by the first moving image capturing unit;
An evaluation value calculation unit that calculates an evaluation value by quantifying the operator's facial expression, on the basis of the analysis result of the analysis unit, from the first moving image captured while a second moving image different from the first moving image is being captured or played back;
A recording unit that records the quantified evaluation value on the same timeline as the second moving image;
A moving image editing unit that sequentially extracts partial moving images of the second moving image on the basis of the evaluation value and generates a digest;
A moving image processing apparatus comprising:

2. The moving image processing apparatus according to claim 1, further comprising a second moving image capturing unit that captures the second moving image, wherein the evaluation value calculation unit performs the quantification while the second moving image capturing unit is capturing the second moving image.

3. The moving image processing apparatus according to claim 1, wherein the moving image editing unit sequentially extracts the partial moving images in sections in which the evaluation value is greater than a predetermined threshold.

4. The moving image processing apparatus according to claim 3, wherein the moving image editing unit extracts the partial moving images while changing the threshold so that the total duration of the partial moving images falls within a predetermined time.

5. The moving image processing apparatus according to claim 3, wherein the moving image editing unit sequentially extracts the partial moving images in sections obtained by adding a predetermined margin time before or after each section in which the evaluation value is greater than the predetermined threshold.

6. The moving image processing apparatus according to claim 1, wherein the evaluation value calculation unit performs the quantification on the basis of a predetermined rule.

7. The moving image processing apparatus according to claim 6, wherein the rule is defined such that the weighting of the evaluation value is increased when the facial expression changes sharply.

8. The moving image processing apparatus according to claim 1, wherein the recording unit records the evaluation value as metadata within the data of the second moving image.

9. The moving image processing apparatus according to claim 1, wherein the recording unit records the evaluation value as separate data in one-to-one correspondence with the data of the second moving image.

10. Capturing images of the operator;
Analyzing features of the operator's face from the first moving image captured by the first moving image capturing unit;
Calculating an evaluation value by quantifying the operator's facial expression, on the basis of the result of analyzing the facial features, from the first moving image captured while a second moving image different from the first moving image is being captured or played back;
Recording the quantified evaluation value on the same timeline as the second moving image;
Sequentially extracting partial moving images of the second moving image on the basis of the evaluation value to generate a digest;
A moving image processing method comprising:

11. The moving image processing method according to claim 10, further comprising capturing the second moving image, wherein in the step of calculating the evaluation value, the quantification is performed while the second moving image is being captured.

12. The moving image processing method according to claim 10 or 11, wherein in the step of generating the digest, the partial moving images in sections in which the evaluation value is greater than a predetermined threshold are sequentially extracted.

13. The moving image processing method according to claim 12, wherein in the step of generating the digest, the partial moving images are extracted while changing the threshold so that the total duration of the partial moving images falls within a predetermined time.

14. The moving image processing method according to claim 12, wherein in the step of generating the digest, the partial moving images are sequentially extracted in sections obtained by adding a predetermined margin time before or after each section in which the evaluation value is greater than the predetermined threshold.

15. The moving image processing method according to claim 10, wherein in the step of calculating the evaluation value, the quantification is performed on the basis of a predetermined rule.

16. The moving image processing method according to claim 15, wherein the rule is defined such that the weighting of the evaluation value is increased when the facial expression changes sharply.

17. The moving image processing method according to claim 10, wherein in the recording step, the evaluation value is recorded as metadata within the data of the second moving image.

18. The moving image processing method according to any one of claims 10 to 16, wherein in the recording step, the evaluation value is recorded as separate data in one-to-one correspondence with the data of the second moving image.