JP2019102850A

JP2019102850A - Information processing apparatus, control method of the same, and program

Info

Publication number: JP2019102850A
Application number: JP2017228524A
Authority: JP
Inventors: 達也菅野; Tatsuya Sugano; 稔野村; Minoru Nomura
Original assignee: Canon Marketing Japan Inc
Current assignee: Canon Marketing Japan Inc
Priority date: 2017-11-29
Filing date: 2017-11-29
Publication date: 2019-06-24

Abstract

To provide a mechanism for efficiently setting an in point or an out point for editing video data on the basis of sound included in video data in which a person is photographed.

SOLUTION: The information processing apparatus includes: detection means for detecting sound included in video data in which a person is photographed; and setting means for setting an in point or an out point for editing the video data to a position specified on the basis of the sound detected by the detection means.

SELECTED DRAWING: Figure 3

Description

The present invention relates to an information processing apparatus, a control method thereof, and a program, and more particularly to a technique for efficiently setting an in point or an out point for editing moving image data.

Conventionally, to cut out part of a moving image, the user manually entered the cut positions while checking the video and then performed the cut-out processing, which was cumbersome.

Patent Document 1 describes a user manually entering a cut start time and a cut end time with operation keys while checking a series of moving images, and then cutting the specified portion out of the moving image data.

Patent Document 1: JP 2010-178010 A

Consider, for example, a service that generates moving image data for each student as a graduation album. To shoot many students efficiently, one camera starts shooting; student A enters the camera's shooting range, makes a brief comment, and leaves the shooting range; then, without shooting being stopped, the next student B enters the shooting range and makes a brief comment. The result is one moving image file containing the moving images of all the students.

In other words, a single camera keeps shooting while multiple students take turns entering the shooting range and making a brief comment, producing one moving image file that contains the moving image of each student.

To cut out, for each student, the portion in which that student appears from the single moving image file obtained in this way, the user conventionally had to enter the cut start time and cut end time manually while checking the video of the file, which was cumbersome. The user also had to check the footage recorded while students were changing places (footage in which no student appears), which made it difficult to perform the cutting work efficiently.

In addition, a student may forget the comment partway through shooting and want to pause. The user had to check that footage as well and manually enter the cut start time and cut end time, which was again cumbersome.

Thus it has conventionally been difficult to efficiently generate, for each person, a moving image file containing that person from a single moving image file containing moving images of multiple persons photographed in turn.

Accordingly, an object of the present invention is to provide a mechanism for efficiently setting an in point or an out point for editing moving image data, based on sound included in moving image data in which a person is photographed.

The present invention is characterized by comprising: detection means for detecting sound included in moving image data in which a person is photographed; and setting means for setting an in point or an out point for editing the moving image data at a position specified based on the sound detected by the detection means.

The present invention is also a control method of an information processing apparatus, characterized by comprising: a detection step of detecting sound included in moving image data in which a person is photographed; and a setting step of setting an in point or an out point for editing the moving image data at a position specified based on the sound detected in the detection step.

The present invention is also characterized by being a program for executing the control method.

According to the present invention, an in point or an out point for editing moving image data can be set efficiently based on sound included in moving image data in which a person is photographed.

FIG. 1 shows an example of the system configuration of the information processing system of the present invention.

FIG. 2 is a block diagram showing an example of the hardware configuration of an information processing apparatus applicable to the PC 101 in the embodiment of the present invention.

FIG. 3 shows an example of the relationship between the positions (times) of the frames of a moving image, the file start point 301, the IN points, the OUT points, and the file end point 306.

FIG. 4 shows an example of a moving image editing screen.

FIG. 5 shows an example of a display screen displayed in the display area 409.

FIG. 6 is an example of a flowchart showing the processing related to moving image editing in the present embodiment.

FIG. 7 shows an example of a flowchart of the processing for preview playback of a moving image.

FIG. 8 shows an example of a person list of subjects.

FIG. 9 shows an example of a conceptual diagram explaining the frames of moving image data, the audio detected in each frame (or group of frames), and the positions (times) of the IN and OUT points.

FIG. 10 shows an example of a conceptual diagram of one moving image file 1001 that contains moving images of multiple students and was obtained by photographing multiple users (for example, students) in turn.

Embodiments of the present invention will be described below in detail with reference to the drawings.

FIG. 1 shows an example of the system configuration of the information processing system of the present invention.

The PC 101 is an application example of the information processing apparatus of the present invention and is a PC such as a desktop PC, a notebook PC, or a tablet PC.

The camera 104 is a digital camera or digital video camera capable of shooting moving images and still images, including photographs. The camera 104 can exchange data with the PC 101 over wired or wireless communication. It can therefore transmit a moving image file of a moving image it has shot to the PC 101 over that connection, and the file can be recorded in storage means of the PC 101 such as memory.

The camera 104 also accepts portable memory (a portable recording medium) such as an SD card and can record the moving image file of a moving image it has shot to that portable memory.

The imaging range (shooting range) of the camera 104 is set so that a subject (person 103) sitting on the chair 105 can be photographed.

On receiving a shooting instruction from the user, the camera 104 starts shooting a moving image (video). The camera 104 either stores the captured moving images and still images in the storage means of the PC 101 via communication or records them to the portable memory. In the latter case, the portable memory is removed from the camera 104 and inserted into the PC 101, and the moving images and still images stored on it are copied (expanded) into the memory of the PC 101.

In this way, the PC 101 can acquire the moving images and still images shot by the camera 104.

After the camera 104 starts shooting a moving image, a person (user) enters the imaging range of the camera 104 and sits on the chair 105. When ready to be filmed, the person says the word "start" and then makes a brief comment. If, partway through the comment, the person wants to pause (for example, after forgetting what to say), the person says the word "interrupt".

When ready to resume, the person says the words "start again". When the person wants to finish being filmed, the person says the word "end".

The person (user) sitting on the chair 105 then stands up and moves out of the imaging range of the camera 104, leaving the frame.

Without shooting being stopped, the next person enters the imaging range of the camera 104, sits on the chair 105, and goes through the same steps. With multiple students taking turns entering the shooting range and making a brief comment in this way, the camera 104 generates one moving image file containing the moving images of the students, and the storage means of the PC 101 stores the generated file.

FIG. 10 shows an example of a conceptual diagram of one moving image file 1001 (moving image data) that contains moving images of multiple students and was obtained by photographing multiple users (for example, students) in turn.

As FIG. 10 shows, the single moving image file 1001 contains, in order, a moving image 1002 of Taro Kiyano with his brief comment, a moving image 1003 of Jiro Kiyano with his brief comment, a moving image 1004 of Saburo Kiyano with his brief comment, and a moving image 1005 of Shiro Kiyano with his brief comment.

A person list of subjects (FIG. 8) is stored in advance in the storage means (memory) of the PC 101, and shooting proceeds in the order given in this list.

FIG. 8 shows an example of the person list of subjects.

As FIG. 8 shows, Taro Kiyano is listed first, Jiro Kiyano second, Saburo Kiyano third, and Shiro Kiyano fourth.

FIG. 8 is an application example of the person list of the present invention. The storage means (memory) of the PC 101 stores a person list in which identification information for individually identifying each person included in the moving image data is defined in correspondence with the order in which the generating means generates moving image data.
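
As an illustrative sketch only, the person list of FIG. 8 could be held as an ordered collection from which the earliest entry is taken as the first processing target. The `PersonEntry` type and the `first_target` function are hypothetical names that do not appear in the embodiment, and the romanized names simply follow the example entries.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PersonEntry:
    """One row of the person list (hypothetical structure)."""
    order: int  # position in the shooting order (1 = filmed first)
    name: str   # identification information for the subject


# Entries mirroring the example of FIG. 8.
person_list = [
    PersonEntry(1, "Taro Kiyano"),
    PersonEntry(2, "Jiro Kiyano"),
    PersonEntry(3, "Saburo Kiyano"),
    PersonEntry(4, "Shiro Kiyano"),
]


def first_target(entries: Sequence[PersonEntry]) -> PersonEntry:
    """Return the entry with the lowest order number."""
    return min(entries, key=lambda entry: entry.order)
```

Because shooting follows the list order, taking entries in ascending `order` would pair each cut-out segment with the corresponding name.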

The storage means of the PC 101 stores a program, various lists, and tables, described later; executing the program carries out the operations and processing of the functions according to the present invention.

FIG. 2 is a block diagram showing an example of the hardware configuration of an information processing apparatus applicable to the PC 101 in the embodiment of the present invention. Because each apparatus has the same configuration, the same reference numerals are used in the description.

As shown in FIG. 2, in the information processing apparatus, a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, an input controller 205, a video controller 206, a memory controller 207, and a communication I/F controller 208 are connected via a system bus 204.

The CPU 201 centrally controls the devices and controllers connected to the system bus 204.

A storage device such as the ROM 202 or the external memory 211 holds the BIOS (Basic Input/Output System) and OS (Operating System), which are control programs executed by the CPU 201, as well as the computer-readable, executable program for realizing the present information processing method and the various data it requires (including data tables).

The RAM 203 functions as the main memory, work area, and so on of the CPU 201. When executing processing, the CPU 201 loads the necessary programs and data from the ROM 202 or the external memory 211 into the RAM 203 and executes the loaded programs to realize various operations.

The input controller 205 controls input from the input device 209. Examples of the input device 209 include a keyboard, a touch panel, and a pointing device such as a mouse.

When the input device 209 is a touch panel, the user can give various instructions by pressing (touching with a finger or the like) an icon, cursor, or button displayed on the touch panel.

The touch panel may be one that can detect positions touched by multiple fingers, such as a multi-touch screen.

The video controller 206 controls display on an external output device such as the display 210. The display includes a notebook PC display that is integrated with the main body. The external output device is not limited to a display and may be, for example, a projector. A device that can accept the touch operations described above also provides the input device 209.

The video controller 206 can control video memory (VRAM) used for display control. Part of the RAM 203 can be used as the video memory area, or dedicated video memory can be provided separately.

The memory controller 207 controls access to the external memory 211. Usable external memory includes an external storage device (hard disk) that stores a boot program, various applications, font data, user files, editing files, and various data; a flexible disk (FD); and CompactFlash (registered trademark) memory connected to a PCMCIA card slot via an adapter.

The communication I/F controller 208 connects to and communicates with an external device (the camera 104) via a network and executes communication control processing on the network. For example, communication using TCP/IP, Wi-Fi, and a 3G line is possible.

A storage device such as the external memory 211 is a medium for persistently storing information, and its form is not limited to a storage device such as a hard disk. For example, it may be a medium such as an SSD (Solid State Drive).

It can also be used as a temporary memory area during the various processes performed by the communication terminal in the present embodiment.

The CPU 201 enables display on the display 210 by, for example, expanding (rasterizing) outline fonts into a display information area in the RAM 203. The CPU 201 also enables user instructions with a mouse cursor or the like (not shown) on the display 210.

Next, the processing executed by the PC 101 in the present embodiment will be described with reference to the flowchart of FIG. 6.

FIG. 6 is an example of a flowchart showing the processing related to moving image editing in the present embodiment.

The processing of FIG. 6 is performed by the CPU 201 of the PC 101 reading and executing a predetermined control program.

First, when the predetermined control program is started, the PC 101 displays a moving image editing screen on a display unit such as the display 210.

The PC 101 then accepts the user's selection of a person list of subjects to be processed, as shown in FIG. 8, and reads it (S601). From the person list read in S601, the PC 101 identifies the name (information identifying a subject) that comes earliest in the order (has the lowest number) as the processing target.

The PC 101 then accepts from the user an instruction to read the one moving image file to be processed that corresponds to the person list selected in S601, and starts reading that file (S602).

For example, a moving image file is generated for each class, and a person list of subjects such as the one shown in FIG. 8 is stored for each class of students. The user selects which class's moving image to process, and the moving image file for that class is read as the processing target.

Starting the reading of the moving image file in S602 means reading the moving image frame by frame, beginning with the first frame.

First, the PC 101 reads a frame of the moving image and analyzes it to determine whether the frame contains a person, that is, whether a person could be detected in the frame (S603).

If the PC 101 determines that the frame contains no person, that is, that no person could be detected in the frame (S603: NO), it reads the next frame and performs S603 on that frame.

If the PC 101 determines that the frame contains a person, that is, that a person could be detected in the frame (S603: YES), it sets the file start point 301 at the position (time) between the frame in which no person could be detected and the frame in which a person was detected (S604).

S603 is an application example of the specifying means of the present invention: from moving image data in which persons were photographed in turn, it specifies, for each person, the time or frame at which that person entered the frame.

FIG. 3 shows an example of the relationship between the positions (times) of the frames of the moving image, the file start point 301, the IN points, the OUT points, and the file end point 306.

FIG. 3(A) shows an example of this relationship when the "end" voice is detected before the person ceases to be detected.

FIG. 3(B) shows an example of this relationship when the person ceases to be detected without the "end" voice having been detected.

Taking FIG. 3(A) as an example, in S604 the file start point 301 is set at the position (time) between the frame 307, in which no person could be detected, and the frame 308, in which a person was detected.
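
The scan of S603 and S604 can be sketched as follows, assuming that a per-frame person-detection result is already available as a list of booleans and that the frame rate is constant. The function name `find_file_start` and its parameters are hypothetical, and the person detector itself is outside the scope of this sketch.

```python
from typing import Optional, Sequence


def find_file_start(person_detected: Sequence[bool], fps: float) -> Optional[float]:
    """Return the time (in seconds) of the file start point.

    The start point is placed on the boundary between the last frame
    without a person and the first frame with a person. Frame i begins
    at i / fps, so that boundary lies at i / fps.
    """
    for i, detected in enumerate(person_detected):
        if detected:  # corresponds to S603: YES
            # Assumption: if the very first frame already shows a person,
            # the start point falls at time 0.
            return i / fps
        # S603: NO -> move on to the next frame
    return None  # no person found anywhere in the data
```

For example, with detection results `[False, False, True, True]` at 10 frames per second, the start point falls on the boundary before the third frame.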

Once the file start point 301 is set, the PC 101 determines whether the predetermined voice "start" has been detected (S605). If it determines that "start" has not been detected (S605: NO), it adds the next frame to the processing target and determines again, including the audio of the frame (or frames) processed immediately before, whether "start" is detected (S605).

S605, S608, S611, and S615 are application examples of the detection means of the present invention and detect sound included in moving image data in which a person is photographed.

The detection means of S605 detects a first voice ("start") for setting the in point (302), and the setting means of S607 sets the in point at a position in the moving image data specified based on the time or frame at which the detection means of S605 detected the first voice. That is, as shown for example in FIG. 9, the setting means of S607 sets the in point at the position between the frame (308) in which the detection means detected the first voice and the subsequent frame (309) in which the first voice was not detected.

If the PC 101 determines that "start" has been detected (S605: YES), it determines whether speech continues before and after the detected "start" (S606). Specifically, the PC 101 reads the frames adjacent to the frame (or group of frames) in which "start" was detected, covering a predetermined time width before and after it, and determines whether speech continues before and after the detected "start" voice (S606).

If the PC 101 determines that speech continues before and after the detected "start" voice (S606: YES), it takes the next frame as the processing target and returns to S605.

If the PC 101 determines that speech does not continue before and after the detected "start" voice (S606: NO), it sets the IN point 302 at the position (time) between the frame in which "start" was detected and the frame immediately after it (a frame in which "start" is not detected) (S607).

S607, S610, S613, S617, and S627 are application examples of the setting means of the present invention and set the in point or out point of the moving image data at a position specified based on the sound detected by the detection means.

Taking FIG. 3(A) as an example, in S607 the IN point 302 is set at the position (time) between the frame 308, in which "start" was detected, and the immediately following frame 309 (a frame in which "start" is not detected).
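
A minimal sketch of the S605 to S607 loop is given below. It assumes that speech recognition has already produced two per-frame flags: whether "start" was heard in the frame, and whether the keyword was embedded in continuing speech. The names `find_in_point`, `start_heard`, and `speech_continues` are hypothetical, and the recognizer and the continuity test are not modeled.

```python
from typing import Optional, Sequence


def find_in_point(
    start_heard: Sequence[bool],
    speech_continues: Sequence[bool],
    first_frame: int,
    fps: float,
) -> Optional[float]:
    """Return the time (in seconds) of the IN point, or None if not found."""
    n = len(start_heard)
    for i in range(first_frame, n):
        if not start_heard[i]:
            continue  # S605: NO -> examine the next frame
        if speech_continues[i]:
            continue  # S606: YES -> keyword was part of ongoing speech
        if i + 1 < n and start_heard[i + 1]:
            continue  # keyword still being detected in the following frame
        # S607: boundary between frame i (keyword heard) and frame i + 1
        return (i + 1) / fps
    return None
```

The continuity check is what keeps an incidental "start" spoken mid-sentence from being treated as a command.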

To simplify the description, FIGS. 3 and 5 show only one frame 308 between the file start point 301 and the IN point 302, but multiple frames may lie between them. Likewise, multiple frames, not just the frame 309, may lie between the IN point 302 and the OUT point 303; multiple frames, not just the frame 310, may lie between the OUT point 303 and the IN point 304; three or more frames, not just the frames 311 and 312, may lie between the IN point 304 and the OUT point 305; and multiple frames, not just the frame 313, may lie between the OUT point 305 and the file end point 306.

FIG. 9 shows an example of a conceptual diagram explaining the frames of the moving image data, the audio detected in each frame (or group of frames), and the positions (times) of the IN and OUT points.

Next, once the IN point 302 is set, the PC 101 takes the next frame as the processing target and determines whether the predetermined voice "interrupt" has been detected (S608). If it determines that "interrupt" has not been detected (S608: NO), it proceeds to S614. In S614, the PC 101 determines whether the frame currently being processed contains no person, that is, whether no person could be detected in the frame (S614).

If the PC 101 determines that the frame contains a person, that is, that a person could be detected in the frame (S614: NO), it proceeds to S615 and determines whether the predetermined voice "end" has been detected (S615). If it determines that "end" has not been detected (S615: NO), it adds the next frame to the processing target and determines, including the audio of the frame (or frames) processed immediately before, whether "interrupt" is detected (S608).

S611 is an application example of the detection means of the present invention: within the moving image data between the in point (302) and the out point (305), it detects the third voice ("restart") for setting a first in point (304) that is separate from the in point (302).

The detection means in S615 detects the second voice for setting the out point (305), and the setting means in S617 sets the out point (305) at a position in the moving image data specified from the time or frame at which the detection means detected the second voice. Specifically, the setting means in S617 sets the out point (305) between the frame in which the second voice was detected and an earlier frame in which the second voice was not detected.

If the PC 101 detects the predetermined voice "suspend" (S608: YES), it determines whether speech continues before and after the detected voice (S609). Specifically, the PC 101 reads the adjacent frames (frame groups of a predetermined time width) before and after the frame (frame group) in which "suspend" was detected, and determines whether speech continues across the detected "suspend" voice.

If the PC 101 determines that speech continues before and after the detected "suspend" voice (S609: YES), it takes the next frame as the processing target and returns the processing to S608.

If the PC 101 determines that speech does not continue before and after the detected "suspend" voice (S609: NO), it sets the OUT point 303 at the position (time) between the frame in which "suspend" was detected and the frame immediately before it (a frame in which "suspend" was not detected), for example as shown in FIG. 9 (S610).

Taking FIG. 3(A) as an example, in S610 the OUT point 303 is set at the position (time) between the frame 310, in which "suspend" was detected, and the immediately preceding frame 309, in which "suspend" was not detected.
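The S609 and S610 decisions can be sketched in Python as follows. This is an illustration only, not the implementation described in the specification: the per-frame flags `keyword_hits` and `speech_active`, the `window` parameter, and the rule that speech "continues" when it is present on both sides of the keyword are all assumptions made for the example. A cut position is represented as a boundary index `b`, meaning the point between frame `b - 1` and frame `b`.

```python
from typing import Optional, Sequence


def out_point_before_keyword(
    keyword_hits: Sequence[bool],
    speech_active: Sequence[bool],
    start: int = 0,
    window: int = 1,
) -> Optional[int]:
    """Return the boundary index for an OUT point, or None if none is found.

    keyword_hits[i]  -- hypothetical flag: the keyword was recognized in frame i
    speech_active[i] -- hypothetical flag: some speech is present in frame i
    """
    for i in range(start, len(keyword_hits)):
        if not keyword_hits[i]:
            continue
        # S609 (assumed rule): speech on both sides means the keyword was
        # spoken as part of an ongoing sentence, so it is not a command.
        before = speech_active[max(0, i - window):i]
        after = speech_active[i + 1:i + 1 + window]
        if any(before) and any(after):
            continue  # S609: YES, keep scanning
        # S610: cut between the keyword frame and the frame just before it.
        return i
    return None


hits = [False, False, True, False, False, False, True, False, False]
speech = [True, True, True, True, True, False, True, False, False]
boundary = out_point_before_keyword(hits, speech)
```

Here the keyword at frame 2 is surrounded by speech and is skipped, while the keyword at frame 6 is isolated, so the boundary falls between frames 5 and 6.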

S610 is an application example of the setting means of the present invention: within the moving image data between the in point (302) and the out point (305), it sets the first out point (303) at a time specified from the time or frame at which the detection means detected the fourth voice ("suspend").

After setting the OUT point 303, the PC 101 takes the next frame as the processing target and determines whether the predetermined voice "restart" has been detected (S611). If it determines that "restart" has not been detected (S611: NO), the PC 101 adds the next frame to the processing target and, taking into account the audio of the frame (group) processed immediately before, again determines whether "restart" is detected (S611).

S608 is an application example of the detection means of the present invention: within the moving image data between the in point (302) and the out point (305), it detects the fourth voice ("suspend") for setting a first out point (303) that is separate from the out point (305).

If the PC 101 detects the predetermined voice "restart" (S611: YES), it determines whether speech continues before and after the detected voice (S612). Specifically, the PC 101 reads the adjacent frames (frame groups of a predetermined time width) before and after the frame (frame group) in which "restart" was detected, and determines whether speech continues across the detected "restart" voice.

If the PC 101 determines that speech continues before and after the detected "restart" voice (S612: YES), it takes the next frame as the processing target and returns the processing to S611.

If the PC 101 determines that speech does not continue before and after the detected "restart" voice (S612: NO), it sets the IN point 304 at the position (time) between the frame in which "restart" was detected and the frame immediately after it (a frame in which "restart" was not detected), for example as shown in FIG. 9 (S613).

Taking FIG. 3(A) as an example, in S613 the IN point 304 is set at the position (time) between the frame 310, in which "restart" was detected, and the immediately following frame 311, in which "restart" was not detected.

S613 is an application example of the setting means of the present invention: within the moving image data between the in point (302) and the out point (305), it sets the first in point (304) at a time specified from the time or frame at which the detection means detected the third voice ("restart").

By executing S608 to S613 in this way, when, for example, a student forgets the comment while being filmed and wants to pause, the user has less work to do checking the interrupted footage and manually entering a cut start time and a cut end time.
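The pause handling of S608 to S613 amounts to a small two-state scan, which might look like the Python sketch below. The per-frame `labels` list (the keyword recognized in each frame, or `None`) and the function name are hypothetical, and the speech-continuity checks of S609 and S612 are omitted to keep the example short. Boundary index `b` again means the point between frame `b - 1` and frame `b`.

```python
from typing import List, Optional, Tuple


def find_pause(
    labels: List[Optional[str]], first_frame: int
) -> Optional[Tuple[int, int]]:
    """Return (out_boundary, in_boundary) for one suspend/restart pair.

    The OUT point goes just before the "suspend" frame (S610) and the
    IN point just after the "restart" frame (S613), so neither keyword
    ends up inside the kept footage.
    """
    out_boundary: Optional[int] = None
    for i in range(first_frame, len(labels)):
        if out_boundary is None:
            if labels[i] == "suspend":      # S608: YES
                out_boundary = i            # S610
        elif labels[i] == "restart":        # S611: YES
            return out_boundary, i + 1      # S613
    return None  # no complete suspend/restart pair was found


labels = [None, "start", None, None, "suspend", None, "restart", None, "end"]
pause = find_pause(labels, first_frame=2)
```

With this input the kept footage would be frames 2 to 3 and frame 7, with frames 4 to 6 treated as the interruption.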

After setting the IN point 304, the PC 101 takes the next frame as the processing target and advances the processing to S614.

In S614, the PC 101 determines whether the frame currently being processed contains no person, that is, whether it failed to detect a person in that frame.

If the PC 101 detects the predetermined voice "end" (S615: YES), it determines whether speech continues before and after the detected voice (S616). Specifically, the PC 101 reads the adjacent frames (frame groups of a predetermined time width) before and after the frame (frame group) in which "end" was detected, and determines whether speech continues across the detected "end" voice.

If the PC 101 determines that speech continues before and after the detected "end" voice (S616: YES), it takes the next frame as the processing target and returns the processing to S615.

If the PC 101 determines that speech does not continue before and after the detected "end" voice (S616: NO), it sets the OUT point 305 at the position (time) between the frame in which "end" was detected and the frame immediately before it (a frame in which "end" was not detected), for example as shown in FIG. 9 (S617).

In this way, in S607 the in point (302) is set between the frame in which the first voice (for example, "start") was detected and a subsequent frame in which the first voice was not detected, and in S617 the out point (305) is set between the frame in which the second voice (for example, "end") was detected and an earlier frame in which the second voice was not detected. Neither the first voice nor the second voice therefore falls between the in point (302) and the out point (305), so a suitable in point (302) and out point (305) can be set efficiently.

Taking FIG. 3(A) as an example, in S617 the OUT point 305 is set at the position (time) between the frame 313, in which "end" was detected, and the immediately preceding frame 312, in which "end" was not detected.

After setting the OUT point 305, the PC 101 takes the next frame as the processing target and advances the processing to S618.

Next, in S618, the PC 101 determines whether the frame currently being processed contains no person, that is, whether it failed to detect a person in that frame.

If the PC 101 determines that the frame contains a person, that is, that a person was detected in the frame (S618: NO), it takes the next frame as the processing target and performs S618 again.

If the PC 101 determines that the frame contains no person, that is, that no person can be detected in the frame (S618: YES), it sets the file end point 306 at the position (time) between the frame determined to contain no person and the frame immediately before it (a frame in which a person was detected) (S619).

If the PC 101 determines in S614 that the frame contains no person, that is, that no person could be detected in the frame (S614: YES), it sets the OUT point 305 at the position (time) between the frame determined to contain no person and the frame immediately before it (a frame in which a person was detected) (S627).

Taking FIG. 3(B) as an example, in S627 the OUT point 305 is set at the position (time) between the frame 314, which was determined to contain no person, and the immediately preceding frame 313, in which a person was detected.

The PC 101 adds, to the displayed frame immediately before the position (time) at which the OUT point 305 was set in S627 (a frame in which a person was detected), identification information for displaying that frame distinguishably (highlighting it) (S628). Then, as shown in FIG. 3(B), the PC 101 sets the file end point 306 at the same position (time) as the OUT point 305 set in S627 (S619).
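The two ways a clip can be closed, the normal path (S615, S617 to S619) and the fallback when the person leaves without saying "end" (S614, S627, S628, S619), can be compared in a short Python sketch. The `Marks` class, the per-frame flags `person_present` and `end_keyword`, and the function name are hypothetical, and the speech-continuity check of S616 is left out. Boundary index `b` means the point between frame `b - 1` and frame `b`.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence, Set


@dataclass
class Marks:
    out_point: Optional[int] = None      # boundary index of OUT point 305
    file_end: Optional[int] = None       # boundary index of file end point 306
    highlighted: Set[int] = field(default_factory=set)  # frames to flag (S628)


def close_clip(
    person_present: Sequence[bool], end_keyword: Sequence[bool], start: int
) -> Marks:
    marks = Marks()
    for i in range(start, len(person_present)):
        if marks.out_point is None:
            if not person_present[i]:        # S614: YES, left without "end"
                marks.out_point = i          # S627
                if i > 0:
                    marks.highlighted.add(i - 1)  # S628: flag last frame shown
                marks.file_end = i           # S619: same position as OUT point
                return marks
            if end_keyword[i]:               # S615: YES
                marks.out_point = i          # S617: cut just before "end"
        elif not person_present[i]:          # S618: YES
            marks.file_end = i               # S619
            return marks
    return marks


normal = close_clip([True, True, True, True, False],
                    [False, False, True, False, False], start=0)
fallback = close_clip([True, True, True, False],
                      [False, False, False, False], start=0)
```

In the fallback case the OUT point and the file end point coincide and one frame is flagged, which corresponds to the situation of FIG. 3(B) in which the user is expected to review the OUT point.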

S618 and S614 are application examples of the specifying means of the present invention: from moving image data in which people were filmed one after another, they identify, for each person, the time or frame at which that person left the frame.

S627 is an application example of the setting means of the present invention: when the person leaves the frame of the moving image data without the detection means having detected the second voice ("end") for setting the out point (305), it sets the out point at the division position specified from the time or frame at which the person left the frame.

The PC 101 cuts out the moving image file read in S602 between the position (time) of the file start point 301 set in S604 and the position (time) of the file end point 306 set in S619, and registers (sets) the cut-out moving image file in association with the name currently being processed (information identifying the subject) (S620).

S620 is an application example of the generation means of the present invention: it divides the moving image data at the division positions specified from the times or frames identified by the specifying means in S603, S618, and S614, and generates separate moving image data for each person.

S620 is also an application example of the registration means of the present invention: it registers the moving image data, divided and generated sequentially from the beginning of the original moving image data, in association with the identification information defined in the person list (FIG. 8), matching the order in which the moving image data was generated.
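The order-based association performed in S620 can be illustrated with a few lines of Python. The data shapes are assumptions for the example: `person_list` is taken to be a list of `(order, name)` pairs as in FIG. 8, `clips` a list of `(start, end)` boundary pairs in the order they were cut, and the names used are placeholders.

```python
from typing import Dict, List, Tuple, Union


def register_clips(
    clips: List[Tuple[int, int]], person_list: List[Tuple[int, str]]
) -> List[Dict[str, Union[int, str]]]:
    """Pair the n-th clip cut from the file with the n-th person in the list."""
    ordered = sorted(person_list, key=lambda entry: entry[0])
    return [
        {"order": order, "name": name, "start": start, "end": end}
        for (order, name), (start, end) in zip(ordered, clips)
    ]


registered = register_clips(
    clips=[(0, 5), (5, 9)],
    person_list=[(2, "Student B"), (1, "Student A")],
)
```

Because the pairing relies only on order, it assumes that people were filmed in exactly the order given in the person list; `zip` simply stops at the shorter of the two inputs.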

The PC 101 then determines whether to end the processing of cutting out the moving image file read in S602 and setting the IN and OUT points. It does so by determining whether the whole of the moving image file read in S602 has been processed in S603 to S619, or whether S603 to S619 have been executed for every name in the person list read in S601 (S621).

If the PC 101 determines in S621 that the whole of the moving image file read in S602 has been processed in S603 to S619, or that S603 to S619 have been executed for every name in the person list read in S601, it determines that the cutting-out and IN/OUT point setting for the moving image file read in S602 is to end (S621: YES) and advances the processing to S622.

If the PC 101 determines in S621 that the whole of the moving image file read in S602 has not yet been processed in S603 to S619, or that S603 to S619 have not yet been executed for every name in the person list read in S601, it determines that the cutting-out and IN/OUT point setting for the moving image file read in S602 is not to end (S621: NO). It then takes as the processing target the earliest (lowest-numbered) unprocessed name in the person list read in S601, continues with the next frame of the moving image file read in S602 as the processing target, and advances the processing to S603.

S621 is an application example of the control means of the present invention: it performs control so that the detection processing by the detection means and the setting processing by the setting means are applied to the moving image data between the division position specified from the frame-in time or frame identified by the specifying means and the division position specified from the frame-out time or frame identified by the specifying means.
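The outer loop controlled by S621 might be sketched as follows. The sketch assumes a hypothetical per-frame flag `person_present` and reduces S603 to S619 to finding where a person enters and leaves the frame; the keyword detection inside each segment is omitted. The loop stops when either the frames or the names run out, mirroring the two termination conditions of S621.

```python
from typing import List, Sequence, Tuple


def split_by_person(
    person_present: Sequence[bool], names: Sequence[str]
) -> List[Tuple[str, int, int]]:
    """Return (name, start_boundary, end_boundary) for each person found."""
    clips: List[Tuple[str, int, int]] = []
    i, n = 0, len(person_present)
    for name in names:                       # earliest unprocessed name first
        while i < n and not person_present[i]:
            i += 1                           # wait for frame-in (S603)
        if i >= n:
            break                            # whole file consumed (S621: YES)
        start = i                            # file start point (S604)
        while i < n and person_present[i]:
            i += 1                           # wait for frame-out (S618)
        clips.append((name, start, i))       # file end point (S619)
    return clips


segments = split_by_person(
    [False, True, True, False, False, True, False], ["A", "B", "C"]
)
```

In this input only two people appear, so the third name is left without a clip.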

The PC 101 displays, on the moving image editing screen (FIG. 4), a list of the moving image files cut out in S620 and the names registered in association with them (S622).

S622 is an application example of the display means of the present invention: as shown on the moving image editing screen (FIG. 4), it displays images of a plurality of frames of the moving image data in time series and, at positions between those images, displays the controls for the in points set by the setting means (501, 506) and the controls for the out points (505, 502).

FIG. 4 is a diagram showing an example of the moving image editing screen.

412 is the order value in the person list shown in FIG. 8, and 401 indicates the name. The cut-out moving image file registered in S620 in association with this subject-identifying information is shown as moving image timelines at 403 and 404. 403 shows the timeline of the moving image between the IN point 302 and the OUT point 305, and 404 shows the timeline of the moving image between the OUT point 305 and the file end point 306. 405 shows the moving image after the file end point 306.

402 is a reduced image (for example, a thumbnail image) of the first frame of the moving image between the IN point 302 and the OUT point 305, or of the moving image between the IN point 302 and the OUT point 303.

408 shows the timeline of the moving image between the file start point 301 and the IN point 302.

407 is an area in which the waveform of the sound contained in the cut-out moving image file is displayed.

406 is a button (instruction accepting unit) for selecting which moving image file is to be processed, that is, whose moving image is to be processed. The file concerned was cut out and registered (set) in S620 in association with the name being processed at that time (information identifying the subject); its IN and OUT points may have been edited (changed) in S626 at the user's instruction, and it is cut out and registered based on those IN and OUT points.

As shown in FIG. 4, the items 401, 402, 403, 404, 405, 406, 407, 408, and 412 described above are listed for each name (that is, for each moving image file cut out in S620).

411 is a file generation button.

Although the items 401, 402, 403, 404, 405, 406, 407, 408, and 412 described above are listed for each name (that is, for each moving image file cut out in S620) as shown in FIG. 4, the moving image editing screen (FIG. 4) also has a display area 410 that shows a preview of a moving image when one of the moving images or names (information identifying the subject) is selected. 413 is a play button: when one of the moving images or names is selected and the user presses the play button 413, a preview of that moving image is displayed in the display area 410.

409 is a display area in which, when one of the listed moving images or names is selected, the user's instructions to edit the IN and OUT points of the selected moving image are accepted. Specifically, the display area 409 shows, for example, the screen of FIG. 5(A) or FIG. 5(B).

In S622, the PC 101 displays, in the display area 409, the moving image selected from the moving images and names listed on the moving image editing screen (FIG. 4). At this time, as shown in FIG. 5, the frame 308 between the file start point 301 and the IN point 302, the frame 310 between the OUT point 303 and the IN point 304 (the interruption area: the frames of the interrupted-video time period 408), and the frame 313 between the OUT point 305 and the file end point 306 are displayed distinguishably, for example by blacking them out or by giving their borders a different color from those of the other frames.
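Deciding which frames to black out or outline reduces to checking whether each frame lies inside a kept range. The Python sketch below assumes that the kept ranges are given as half-open `(in_boundary, out_boundary)` pairs of frame indices; the function name and this representation are illustrative only.

```python
from typing import List, Sequence, Tuple


def dimmed_frames(
    n_frames: int, keep_ranges: Sequence[Tuple[int, int]]
) -> List[int]:
    """Return the indices of frames that lie outside every IN/OUT range."""
    kept = set()
    for in_boundary, out_boundary in keep_ranges:
        kept.update(range(in_boundary, out_boundary))
    return [i for i in range(n_frames) if i not in kept]


# Indices 0..5 stand for frames 308..313 of FIG. 5(A):
# IN 302 = boundary 1, OUT 303 = 2, IN 304 = 3, OUT 305 = 5.
to_dim = dimmed_frames(6, [(1, 2), (3, 5)])
```

The result marks the positions of frames 308, 310, and 313, matching the three regions that the description says are displayed distinguishably.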

Because identification information for displaying distinguishably (highlighting) the displayed frame immediately before the position (time) at which the OUT point 305 was set in S627 (a frame in which a person was detected) was added to that frame in S628, the PC 101 displays the frame carrying that identification information distinguishably.

This distinguishing display is an application example of the notification means of the present invention: when the person leaves the frame of the moving image data without the detection means having detected the second voice for setting the out point, the PC 101 notifies the user that the person left the frame without the second voice being detected.

That is, when the person leaves the frame of the moving image data without the detection means having detected the second voice for setting the out point, the notification means displays the frame shown immediately before the frame-out in a form that differs from the display form of the other frames, so that it can be distinguished.

As a result, the user can see that the person left the frame without the second voice for setting the out point being detected, understands that the position of the control 502 for the out point (305) needs adjusting, and is less likely to forget to adjust it.

FIG. 5 is an example of the display screen shown in the display area 409.

FIG. 5 is an example of a diagram showing the positional (temporal) relationship between the frames of the moving image cut out in the range from the file start point 301 to the file end point 306 shown in FIG. 3 and each IN point and OUT point.

FIG. 5(A) is an example of a diagram showing this relationship when the "end" voice is detected before the person is no longer detected.

FIG. 5(B) is an example of a diagram showing this relationship when the person can no longer be detected without the "end" voice having been detected.

As shown in FIG. 5, the display area 409 displays the frames of the moving image selected by the user together with the controls 501, 502, 505, and 506, through which the user can edit (change) each IN point and OUT point.

The frame between the file start point 301 set in S604 and the IN point 302 set in S607 is 308.

The frame between the IN point 302 set in S607 and the OUT point 303 set in S610 is 309. The frame between the OUT point 303 set in S610 and the IN point 304 set in S613 is 310, and this interrupted-video time period 408 is displayed distinguishably.

The frames between the IN point 304 set in S613 and the OUT point 305 set in S617 are 311 and 312.

The frame between the OUT point 305 set in S617 and the file end point 306 set in S619 is 313 (FIG. 5(A)).

As shown in FIG. 5, on the initial screen before any editing instruction from the user is accepted, the control 501, through which the user can edit (change) the IN point 302, is displayed at the position of the IN point 302 set in S607.

At the position of the OUT point 303 set in S610, the control 505 is displayed, through which the user can edit (change) the OUT point 303.

At the position of the IN point 304 set in S613, the control 506 is displayed, through which the user can edit (change) the IN point 304.

At the position of the OUT point 305 set in S617, the control 502 is displayed, through which the user can edit (change) the OUT point 305.

The PC 101 accepts operation instructions from the user via the moving image editing screen (FIG. 4) (S623).

For example, the PC 101 moves (changes) the controls 501, 505, 506, and 502 to positions between arbitrary frames in accordance with the user's operation. Changing the position of each IN point and OUT point to an arbitrary position in this way makes it possible to edit the moving image.

In S622, the PC 101 displays, in the display area 409, whichever moving image is selected from the moving images and names listed on the moving image editing screen (FIG. 4), so the same operations can be performed on each moving image.

The PC 101 then accepts the user's selection of the buttons 406 (instruction accepting units) for the moving images and names listed on the moving image editing screen (FIG. 4), and accepts a press of the file generation button 411 (S623).

If the PC 101 determines that the operation accepted from the user in S623 is an editing instruction, that is, an instruction to edit the moving image by changing the position of an IN point or OUT point to an arbitrary position (S624: editing instruction), it executes editing processing that registers the position to which the user moved the IN point or OUT point (S626). It then returns the processing to S623.

S623 is an application example of the accepting means of the present invention: it accepts from the user an instruction to change the position at which a control is displayed relative to the images (frames) by the display means.

S626 is an application example of the changing means of the present invention: in accordance with the change instruction accepted by the accepting means in S623, it changes the display position of the control relative to the images (frames), changes the position of the in point or out point set by the setting means, and displays the result.
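The edit step of S626 can be pictured as updating one entry in a table of boundary positions. In the Python sketch below, the dictionary keys, the function name, and the clamping of the new position to the valid range of boundaries are all assumptions for illustration; the specification only states that a control can be moved to a position between arbitrary frames.

```python
from typing import Dict


def move_control(
    points: Dict[str, int], name: str, new_boundary: int, n_frames: int
) -> Dict[str, int]:
    """Return a copy of `points` with one control moved to a new boundary."""
    if name not in points:
        raise KeyError(name)
    updated = dict(points)
    # Assumed behavior: keep the boundary within 0..n_frames so that it
    # always falls between two frames or at either end of the clip.
    updated[name] = max(0, min(n_frames, new_boundary))
    return updated


points = {"in_302": 1, "out_303": 2, "in_304": 3, "out_305": 5}
edited = move_control(points, "out_305", 4, n_frames=6)
clamped = move_control(points, "out_305", 99, n_frames=6)
```

Returning a copy rather than mutating the input leaves the automatically detected positions available, for example if the user wants to undo an edit.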

If the PC 101 determines that the operation accepted from the user in S623 is a generation instruction, that is, the selection of the buttons 406 (instruction accepting units) of the listed moving images and names followed by a press of the file generation button 411 (S624: generation instruction), it generates the files of the selected moving images (S625). Specifically, it cuts the moving image at the currently registered IN points and OUT points and generates a moving image file for each name (user). That is, if the editing processing of S626 has been performed, the moving image is cut at the edited IN and/or OUT points to generate the moving image file for each name (user).

For example, as shown in FIG. 5(A), suppose that the control 501 for the IN point 302 is set between frame 308 and frame 309, the control 505 for the OUT point 303 is set between frame 309 and frame 310, the control 506 for the IN point 304 is set between frame 310 and frame 311, and the control 502 for the OUT point 305 is set between frame 312 and frame 313. In this case, in S625 the PC 101 cuts out the frame(s) between the control 501 and the control 505 and the frame(s) between the control 506 and the control 502, and concatenates them to generate a single moving image file.

If the controls 505 and 506 are absent, the frame(s) between the control 501 and the control 502 are cut out and generated as a single moving image file.
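The cut-and-concatenate behavior of S625 can be illustrated with a minimal Python sketch. It is not the patented implementation: frames are modeled as plain integers, and the function name `cut_and_concat` and the "boundary index" convention (boundary k lies between `frames[k-1]` and `frames[k]`) are assumptions introduced only for illustration.

```python
from typing import List, Sequence, Tuple


def cut_and_concat(frames: Sequence[int],
                   segments: List[Tuple[int, int]]) -> List[int]:
    """Keep the frames inside each (in_boundary, out_boundary) pair and join them.

    Hypothetical convention: boundary k is the gap between frames[k-1]
    and frames[k], so a pair (i, j) keeps frames[i:j].
    """
    result: List[int] = []
    for in_b, out_b in segments:
        if not 0 <= in_b <= out_b <= len(frames):
            raise ValueError(f"invalid segment ({in_b}, {out_b})")
        result.extend(frames[in_b:out_b])
    return result


# Frames 308..313 from the FIG. 5(A) example.
frames = [308, 309, 310, 311, 312, 313]

# Controls 501/505 bracket frame 309; controls 506/502 bracket frames 311-312.
with_inner_cut = cut_and_concat(frames, [(1, 2), (3, 5)])

# Without controls 505 and 506, only the outer pair 501/502 applies.
without_inner_cut = cut_and_concat(frames, [(1, 5)])
```

Under these assumptions, `with_inner_cut` holds frames 309, 311, and 312, and `without_inner_cut` holds frames 309 through 312. A real implementation would operate on encoded video rather than a list of frame numbers.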

This processing is executed for each moving image (each name (user)) selected by the user in S623, generating a moving image file for each name (user) (S625). The process then ends.

Next, processing for preview playback of a moving image will be described with reference to FIG. 7.

FIG. 7 is a diagram showing an example of a flowchart of processing for preview playback of a moving image.

The flowchart shown in FIG. 7 represents processing that the CPU 201 of the PC 101 performs by reading and executing a control program.

The processing shown in FIG. 7 can be executed in S623 of FIG. 6.

The PC 101 determines whether a playback instruction for a moving image has been accepted, by determining whether the user has selected one of the moving images listed for the respective names on the moving image editing screen of FIG. 4 and pressed the play button 413 (S701).

Here, the determination concerns a playback instruction for either a moving image before the editing processing of S626 or a moving image after it (a moving image whose IN point or OUT point was moved to an arbitrary position in S626).

If the PC 101 determines that a playback instruction has been accepted, it reads the moving image file of that moving image (S702) and plays it back (S703). Here, the frame(s) between the control 505 and the control 506 are skipped and not played back. Likewise, the frame(s) before the control 501 and the frame(s) after the control 502 are not played back.

The moving image played back here is one in which the frame(s) between the control 501 and the control 505 and the frame(s) between the control 506 and the control 502 are concatenated.
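The skipping behavior during preview playback (S703) might be sketched as follows. This is an illustrative sketch only: the generator `preview_frames`, its parameters, and the boundary-index convention (boundary k lies between `frames[k-1]` and `frames[k]`) are hypothetical and do not appear in the source.

```python
from typing import Iterator, List, Sequence, Tuple


def preview_frames(frames: Sequence[int],
                   in_boundary: int,
                   out_boundary: int,
                   skips: List[Tuple[int, int]]) -> Iterator[int]:
    """Yield only the frames that the preview would play.

    Frames before in_boundary and from out_boundary onward are never
    visited; frames inside any (skip_start, skip_end) range are skipped.
    """
    for i in range(in_boundary, out_boundary):
        if any(start <= i < end for start, end in skips):
            continue  # lies between an inner OUT control and inner IN control
        yield frames[i]


frames = [308, 309, 310, 311, 312, 313]

# Outer controls 501/502 give boundaries 1 and 5; the span between
# controls 505 and 506 (frame 310) is skipped.
played = list(preview_frames(frames, 1, 5, [(2, 3)]))
```

With the FIG. 5(A) positions, `played` contains frames 309, 311, and 312, which matches the concatenated moving image described above. The source file itself is left untouched; only the playback order is filtered.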

When playback of the moving image has finished (S704: YES), the PC 101 returns the process to S701. If playback has not finished, the PC 101 continues playing back the moving image (S703).

As described for S606, S609, S612, and S616, if the detection means detects further voice within a predetermined time immediately before or after the voice for setting an in point or out point (S606: NO, S609: NO, S612: NO, S616: NO), the PC 101 does not set an in point or out point of the moving image data based on that voice.

As described for S606, S609, S612, and S616, if no further voice is detected within the predetermined time immediately before or after the voice for setting an in point or out point (S606: YES, S609: YES, S612: YES, S616: YES), the PC 101 sets the in point or out point of the moving image data based on that voice (S607, S610, S613, S617).
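The two cases above amount to an "isolation" test on the cue voice. The following Python sketch shows one way such a test could look. Several things here are assumptions rather than details from the source: utterances are modeled as start/end times in seconds, both the preceding and the following side are checked, and the names `Utterance`, `is_isolated`, `decide_point`, and `quiet_time` are hypothetical. Placing an in point at the end of the cue and an out point at its start follows the frame positions recited in claims 3 and 5.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Utterance:
    start: float  # seconds
    end: float    # seconds


def is_isolated(cue: Utterance, others: Iterable[Utterance],
                quiet_time: float) -> bool:
    """True if no other utterance lies within quiet_time of the cue."""
    for u in others:
        gap_before = cue.start - u.end   # positive when u ends before the cue
        gap_after = u.start - cue.end    # positive when u starts after the cue
        # u is "nearby" unless it is at least quiet_time away on one side.
        if gap_before < quiet_time and gap_after < quiet_time:
            return False
    return True


def decide_point(kind: str, cue: Utterance, others: Iterable[Utterance],
                 quiet_time: float) -> Optional[float]:
    """Return the time for an "in" or "out" point, or None if none is set."""
    if kind not in ("in", "out"):
        raise ValueError("kind must be 'in' or 'out'")
    if not is_isolated(cue, others, quiet_time):
        return None  # cue is embedded in continuous speech: set nothing
    # In point just after the cue ends; out point just before it starts.
    return cue.end if kind == "in" else cue.start


cue = Utterance(10.0, 10.5)
far_speech = [Utterance(5.0, 8.0), Utterance(12.0, 15.0)]
near_speech = [Utterance(10.8, 12.0)]
```

With a `quiet_time` of 1.0 second, `decide_point("in", cue, far_speech, 1.0)` yields a point at the end of the cue, while the same call with `near_speech` yields `None`, because speech resumes 0.3 seconds after the cue.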

As described above, according to the present invention, an in point or an out point for editing moving image data can be set efficiently based on the voice included in moving image data in which a person is photographed.

Further, according to the present invention, moving image files each containing one person can be generated efficiently, person by person, from a single moving image file that contains moving images of a plurality of persons photographed one after another.

The present invention can be embodied as, for example, a system, an apparatus, a method, a program, or a recording medium. Specifically, it may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device.

The program of the present invention is a program that enables a computer to execute the processing methods of the illustrated flowcharts, and the storage medium of the present invention stores a program that enables a computer to execute those processing methods. The program of the present invention may be a separate program for each processing method of each apparatus.

As described above, it goes without saying that the object of the present invention is also achieved by supplying a system or apparatus with a recording medium on which a program realizing the functions of the above-described embodiment is recorded, and having the computer (or CPU or MPU) of that system or apparatus read and execute the program stored on the recording medium.

In this case, the program itself read from the recording medium realizes the novel functions of the present invention, and the recording medium on which the program is recorded constitutes the present invention.

Recording media that can be used to supply the program include, for example, a flexible disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, DVD-ROM, magnetic tape, non-volatile memory card, ROM, EEPROM, and silicon disk.

It also goes without saying that the invention covers not only the case where the functions of the above-described embodiment are realized by the computer executing the program it has read, but also the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on the instructions of the program, and the functions of the above-described embodiment are realized by that processing.

Furthermore, it goes without saying that the invention also covers the case where the program read from the recording medium is written to a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on the function expansion board or function expansion unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiment are realized by that processing.

The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. It goes without saying that the present invention is also applicable where it is achieved by supplying a program to a system or apparatus. In that case, by reading into the system or apparatus a recording medium storing a program for achieving the present invention, the system or apparatus can obtain the effects of the present invention.

Furthermore, by downloading and reading a program for achieving the present invention from a server, database, or the like on a network by means of a communication program, the system or apparatus can obtain the effects of the present invention. All configurations that combine the above-described embodiments and their modifications are also included in the present invention.

101 PC
102 Photographer
103 Person
104 Camera
105 Chair

Claims

1. An information processing apparatus comprising:
detection means for detecting a voice included in moving image data in which a person is photographed; and
setting means for setting an in point or an out point for editing the moving image data at a position specified based on the voice detected by the detection means.

2. The information processing apparatus according to claim 1, wherein the detection means detects a first voice for setting the in point, and
the setting means sets the in point at a position specified based on the time or frame at which the first voice was detected by the detection means in the moving image data.

3. The information processing apparatus according to claim 2, wherein the setting means sets the in point at a position between a frame of the moving image data in which the first voice was detected by the detection means and a subsequent frame in which the first voice is not detected.

4. The information processing apparatus according to any one of the preceding claims, wherein the detection means detects a second voice for setting the out point, and
the setting means sets the out point at a position specified based on the time or frame at which the second voice was detected by the detection means in the moving image data.

5. The information processing apparatus according to claim 4, wherein the setting means sets the out point at a position between a frame of the moving image data in which the second voice was detected by the detection means and a preceding frame in which the second voice is not detected.

6. The information processing apparatus according to any one of claims 1 to 5, further comprising:
display means for displaying images of a plurality of frames of the moving image data in time series, and for displaying a control for the in point set by the setting means and a control for the out point at positions between the images;
accepting means for accepting, from a user, an instruction to change the position at which a control is displayed by the display means relative to the images; and
changing means for changing, in accordance with the change instruction accepted by the accepting means, the display position of the control relative to the images and the position of the in point or out point set by the setting means.

7. The information processing apparatus according to any one of claims 1 to 6, further comprising:
specifying means for specifying, from moving image data in which persons are photographed one after another, the time or frame of the frame-in and of the frame-out of each person; and
generation means for dividing the moving image data at division positions specified based on the times or frames specified by the specifying means, thereby generating separate moving image data for each person.

8. The information processing apparatus according to claim 7, further comprising control means for performing control so that the detection processing by the detection means and the setting processing by the setting means are executed on the moving image data between the division position specified based on the frame-in time or frame specified by the specifying means and the division position specified based on the frame-out time or frame specified by the specifying means.

9. The information processing apparatus according to claim 7, further comprising:
storage means for storing a person list in which identification information individually identifying each person included in the moving image data is defined in correspondence with the order in which the moving image data is generated by the generation means; and
registration means for registering each piece of moving image data, divided and generated sequentially from the head of the moving image data, in association with the identification information defined in the person list for the corresponding position in the generation order.

10. The information processing apparatus according to any one of claims 1 to 9, wherein, when the detection means has not detected the second voice for setting the out point and the person has framed out of the moving image data, the setting means sets the out point at a division position specified based on the time or frame of the frame-out.

11. The information processing apparatus according to claim 10, further comprising notification means for notifying a user, when the detection means has not detected the second voice for setting the out point and the person has framed out of the moving image data, that the second voice was not detected and that the person framed out.

12. The information processing apparatus according to claim 11, wherein, when the detection means has not detected the second voice for setting the out point and the person has framed out of the moving image data, the notification means displays the frame immediately preceding the frame-out in a display form that is distinguishable from the display form of the other frames.

13. The information processing apparatus according to any one of the preceding claims, wherein the detection means detects, in the moving image data between the in point and the out point, a third voice for setting a first in point different from the in point and a fourth voice for setting a first out point different from the out point, and
the setting means sets, in the moving image data between the in point and the out point, the first out point at a time specified based on the time or frame at which the fourth voice was detected by the detection means, and sets the first in point at a time specified based on the time or frame at which the third voice was detected by the detection means.

14. The information processing apparatus according to any one of claims 1 to 13, wherein the setting means does not set the in point or out point of the moving image data based on the voice for setting the in point or out point when the detection means detects further voice within a predetermined time immediately before or after that voice, and sets the in point or out point of the moving image data based on that voice when no further voice is detected within the predetermined time immediately before or after it.

15. A control method of an information processing apparatus, comprising:
a detection step of detecting a voice included in moving image data in which a person is photographed; and
a setting step of setting an in point or an out point for editing the moving image data at a position specified based on the voice detected in the detection step.

16. A program for executing the control method according to claim 15.