JP2009218851A

JP2009218851A - Video processing apparatus

Info

Publication number: JP2009218851A
Application number: JP2008060407A
Authority: JP
Inventors: Hisahiro Hayashi; 久紘林; Hiroshi Chiba; 浩千葉
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2008-03-11
Filing date: 2008-03-11
Publication date: 2009-09-24
Anticipated expiration: 2028-03-11
Also published as: JP4857297B2

Abstract

PROBLEM TO BE SOLVED: In a video processing apparatus capable of segmenting an arbitrary range from a captured video, to enhance the compression ratio of the segmented video, to segment a video into which an image has been fitted, to apply a geometrical transformation to the segmented video, and to weight sound sources according to the position of segmentation. SOLUTION: The video processing apparatus includes a video segmenting section for segmenting a video and a video output section for outputting the segmented video. It further includes a reference frame selection section and an encoding section that encodes using the selected frame, in order to improve compressibility; an image fitting section, in order to segment a video into which an image is fitted; an image transforming section, in order to perform a geometrical transformation on the segmented video; and a microphone sound source adjusting section, in order to provide sound corresponding to the position of segmentation. COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a video processing apparatus having an imaging function, such as a video camera, a digital camera, a surveillance camera, or a fixed-point camera.

Inventions relating to video processing apparatuses that cut out a part of a captured video have been studied. For example, Patent Document 1 aims "to obtain, with a single camera device, video in which direction and zoom are switched instantaneously," and discloses, as a solution, "a video processing apparatus comprising memory means for storing input video in real time, cut-out range designation means for designating at least one cut-out range of the image accumulated in the memory means, conversion means for reading out the images of the one or more cut-out ranges and converting them to a predetermined display size, and output means for outputting the output of the conversion means, wherein a part of the input video is cut out and output."

Patent Document 1: JP-A-8-237590

In compression encoding techniques typified by MPEG, a method of referring to other frames when encoding a frame has been adopted. The more similar the frame to be encoded is to the reference frame, the higher the compression ratio of the encoded frame.

In compression encoding techniques such as H.264, a plurality of reference frames are prepared when encoding a frame. However, the number of reference frame candidates is in many cases limited to a finite number, for example five.

Consider now cutting out part of a wide-angle video. When a part of a wide-angle video is cut out and the cut-out range is encoded, it can happen that the same range, or a nearby range, was encoded in the past. Using that previously encoded frame as the reference frame could raise the compression ratio. In many cases, however, even when the same or a nearby range had been encoded, the encoded video of that range was not used as a reference frame.

The video processing apparatus disclosed in Patent Document 1 can cut out a video from a video, but the problem of raising the compression ratio when compressing the cut-out image is not considered.

Patent Document 1 also does not consider improving the efficiency of compression processing after a fitted image, such as a privacy mask, has been composited onto the wide-angle video.

Nor does Patent Document 1 consider changing the weighting of sound sources according to the cut-out position.

An object of the present invention is, for example, to raise the compression ratio of a cut-out image.

The above object is achieved, for example, by the invention described in the claims. A representative example is as follows. The video processing apparatus of the present invention acquires position information indicating the range cut out of a wide-angle video and selects a reference frame using that position information.

According to the present invention, it becomes possible, for example, to raise the compression ratio of a cut-out image.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 shows an example of the configuration of the video processing apparatus according to the present embodiment.

The video processing apparatus of this embodiment is broadly applicable to apparatuses that process video, such as video cameras, digital cameras, surveillance cameras, and fixed-point cameras. In a system such as a surveillance camera system, it may also be an apparatus that receives and processes video captured by an external device.

In FIG. 1, 0(a) is a video processing apparatus. Reference numeral 1 denotes a video input unit that includes a light-receiving element such as a CCD and acquires wide-angle video. Reference numeral 2 denotes a video recording unit that records the wide-angle video input by the video input unit 1 and is configured by a recording device such as a memory. Reference numeral 3 denotes a video cut-out unit that cuts out video from the wide-angle video recorded in the video recording unit 2. Reference numeral 8 denotes a reference frame selection unit that selects the optimal reference frame for encoding the video cut out by the video cut-out unit 3. Reference numeral 9 denotes an encoding unit that encodes the cut-out video using the reference frame selected by the reference frame selection unit 8. Reference numeral 4(a) denotes a cut-out video recording unit that records the video encoded by the encoding unit 9. The video cut-out unit 3, the reference frame selection unit 8, and the encoding unit 9 may be configured by a signal processing circuit such as an ASIC or FPGA, or by an information processing device such as a CPU and memory. Although they are illustrated separately, any combination of them may of course be implemented as a single device. The cut-out video recording unit 4(a) is configured by a recording device such as an HDD, an optical disc, or a holographic memory.

Reference numeral 5 denotes a video output unit that outputs the video recorded in the cut-out video recording unit 4(a). The video output unit 5 is configured by, for example, a processing circuit that performs video processing and an interface used for video output. Reference numeral 13 denotes an operation unit with which the user issues instructions to the video processing apparatus 0(a). The operation unit 13 is configured by, for example, buttons, soft keys, or an interface for inputting operation information. Reference numeral 6 denotes a CPU that controls the whole apparatus, and reference numeral 7 denotes the memory of the CPU 6. The video output unit 5, the operation unit 13, the CPU 6, and the memory 7 are connected to a bus 10.

The operation of the video processing apparatus 0(a) will be described with reference to FIGS. 1 and 2. In FIG. 2, reference numeral 11 denotes the wide-angle video acquired by the video input unit 1 of the video processing apparatus 0(a), and 12(a) to 12(g) denote the ranges cut out from the wide-angle video. FIG. 2 assumes that a person running one lap of a track is shot at a wide angle and that the video is cut out so that the person is at the center.

First, shooting is performed with the video processing apparatus 0(a) fixed in place. The video input unit 1 acquires the wide-angle video 11, and the video recording unit 2 records it. Next, the video cut-out unit 3 cuts out video from the wide-angle video 11. The cut-out range may be specified by the user entering the range and magnification through the operation unit 13, or by outputting the recorded video to a display equipped with a touch screen and cutting out the touched portion. The cut-out range can vary from frame to frame. Position information indicating the cut-out range is accumulated, for example for each frame, in the video cut-out unit 3 or in the memory 7.

The reference frame selection unit 8 selects, for the video cut out by the video cut-out unit 3, the reference frame that is optimal for encoding. The specific method is described below.

Next, the encoding unit 9 performs encoding using the selected frame as the reference frame.

The video processing apparatus 0(a) then records the cut-out video in the cut-out video recording unit 4(a). Next, the video output unit 5 applies processing such as decoding to the recorded video and outputs it externally.

Thus, in this embodiment, when video is cut out from a video and encoded, selecting the reference frame using the cut-out position information makes it possible to raise the compression ratio of the video.

Next, an example of the format of the information stored when video is cut out by the video cut-out unit 3 will be described using FIG. 12. Reference numeral 1200 denotes a position information table, which holds an ID 1201 identifying each captured frame and position information 1202 indicating the cut-out range of each captured frame. Time information 1203 indicating, for example, the time at which each frame was captured may also be stored.

The ID 1201 may use, for example, the TR (Temporal Reference) information of each frame in MPEG2. The position information 1202 consists of, for example, the coordinates of the cut-out range within the wide-angle video. Thus, if the subject moves, the cut-out range can be changed to follow the subject, and video centered on a specific subject can be cut out. The coordinates may be expressed, for example, in units of pixels of the wide-angle video. For example, if the wide-angle video in FIG. 2 has 4000 × 2000 pixels, the position information may be expressed with the X coordinate running from 1 to 4000 and the Y coordinate from 1 to 2000. The time information 1203 may use, for example, the GMT time at which each frame was captured. If the encoding method is MPEG, the TC (Time Code) may be used. The time information 1203 need not indicate time itself; information indicating the relative position of a frame or picture within the stream may be used instead. An upper limit may also be set on the amount of information recorded in the position information table 1200, with the oldest information deleted first. This upper limit may be, for example, the amount of information for 1000 frames or for 2000 frames, or some other amount.
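As a minimal sketch of how such a bounded table could be held in memory, the following Python uses a fixed-length queue so that the oldest entries drop out once the upper limit is reached. The names `CropRecord` and `PositionTable` and their fields are assumptions made for illustration; the description does not prescribe a data structure.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CropRecord:
    """One row of the position information table (hypothetical layout)."""
    frame_id: int                        # plays the role of ID 1201
    position: Tuple[int, int]            # plays the role of position information 1202 (x, y in pixels)
    captured_at: Optional[float] = None  # plays the role of the optional time information 1203


class PositionTable:
    """Keeps at most max_frames records; the oldest record is discarded first."""

    def __init__(self, max_frames: int = 1000) -> None:
        self._records = deque(maxlen=max_frames)

    def add(self, record: CropRecord) -> None:
        # A deque created with maxlen drops its oldest item when it is full.
        self._records.append(record)

    def newest_first(self) -> List[CropRecord]:
        return list(reversed(self._records))

    def __len__(self) -> int:
        return len(self._records)
```

A fixed-length queue is only one way to realize the "delete the oldest information first" policy; a ring buffer in hardware would behave the same way.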

Next, the operations of the reference frame selection unit 8 and the encoding unit 9 will be described in detail.

First, the video signal cut out by the video cut-out unit 3 is input to the reference frame selection unit 8 together with its position information. Next, the reference frame selection unit 8 acquires, for example from the memory 7, the position information of frames encoded earlier, that is, of the reference candidate frames that are candidates for the reference frame. It then selects, from among the plurality of cut-out images, the one whose position information differs least from that of the frame to be encoded.

This operation will be described for the case where cut-out image (g) is encoded in the example of FIGS. 2 and 12. First, the reference frame selection unit 8 acquires the position information (1800, 1800) of cut-out image (g). It also refers to the position information table 1200 and acquires the position information 1202 of the previously encoded images, that is, cut-out images (a) to (f). From the position information stored in the position information table 1200, it selects the ID of the cut-out image whose position information differs least from that of cut-out image (g). Here, cut-out image (a), which has the position information (1900, 1700), is selected as the reference frame candidate. Although cut-out image 12(a) and cut-out image 12(g) are far apart in time, their cut-out locations are close, so they can be expected to be highly correlated. Cut-out image 12(a) therefore becomes a reference frame candidate for cut-out image 12(g). The reference frame selection unit then outputs the ID (1) of the reference frame corresponding to cut-out image (a) to the encoding unit.
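The selection step above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the description does not define how the "difference" between two positions is measured, so the sum of absolute coordinate differences is assumed here, and the positions given for IDs 2 to 6 are invented placeholders (only (1900, 1700) for ID 1 and (1800, 1800) for the target come from the text).

```python
from typing import Dict, Optional, Tuple

Position = Tuple[int, int]


def position_difference(a: Position, b: Position) -> int:
    # Assumed metric: sum of absolute differences in X and Y.
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def select_reference(target: Position,
                     candidates: Dict[int, Position]) -> Optional[int]:
    """Return the ID of the candidate whose cut-out position is closest to target."""
    if not candidates:
        return None
    return min(candidates,
               key=lambda fid: position_difference(candidates[fid], target))


# ID 1 is cut-out image (a); the remaining positions are made up for the example.
candidates = {
    1: (1900, 1700),
    2: (1000, 1500),
    3: (200, 1000),
    4: (1000, 300),
    5: (2500, 400),
    6: (3500, 1000),
}
best = select_reference((1800, 1800), candidates)  # best == 1
```

Only coordinates are compared, so no pixel data of the candidate frames has to be touched during selection.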

The encoding unit 9 encodes the video based on the input video signal and the ID (1) of the reference frame. Specifically, the encoding unit 9 acquires the cut-out image corresponding to the acquired ID (1) and encodes the video signal using that cut-out image as the reference frame.

Although the position information table 1200 was used above as an example of recording the position information, the information is by no means limited to a table format. The position information may be held in any form that associates it with information identifying the individual frame or picture.

The example above described the reference frame selection unit 8 selecting, from the position information table 1200, the image whose position information differs least from that of the cut-out image (g) to be encoded. However, the selection method of the reference frame selection unit 8 is not limited to this example. For example, the reference frame selection unit 8 may refer to the position information 1202 of previously encoded cut-out images in order from the newest and, when it determines that the difference from the position information of the cut-out image about to be encoded is no more than a predetermined amount, select the cut-out image corresponding to that position information 1202 as the reference frame. For example, when encoding cut-out image (g), the reference frame selection unit 8 acquires the position information 1202 in the order cut-out image (f), cut-out image (e), cut-out image (d). When the difference between the acquired position information 1202 and the position information (1900, 1700) of cut-out image (g) is 50 or less in the X coordinate and 50 or less in the Y coordinate, the corresponding cut-out image may be selected as the reference frame. That is, if there is a cut-out image whose position information 1202 falls within (1850 to 1950, 1650 to 1750), that cut-out image may be selected as the reference frame. With this configuration, a reference frame can in some cases be selected without comparing every item of position information 1202 in the position information table 1200 against the position information of the cut-out image about to be encoded.
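A minimal sketch of this newest-first search, using the tolerance of 50 in X and 50 in Y and the target position (1900, 1700) from the paragraph above. The function name and the history entries are hypothetical.

```python
from typing import Iterable, Optional, Tuple

Position = Tuple[int, int]


def first_within_tolerance(target: Position,
                           history_newest_first: Iterable[Tuple[int, Position]],
                           tol_x: int = 50,
                           tol_y: int = 50) -> Optional[int]:
    """Scan previously encoded frames from newest to oldest and stop at the
    first one whose cut-out position is within the tolerance of target."""
    for frame_id, (x, y) in history_newest_first:
        if abs(x - target[0]) <= tol_x and abs(y - target[1]) <= tol_y:
            return frame_id
    return None  # the caller may fall back to a conventional reference choice


# Invented history: frame 6 is the newest, frame 4 the oldest.
history = [(6, (1000, 1000)), (5, (1860, 1740)), (4, (1900, 1700))]
chosen = first_within_tolerance((1900, 1700), history)  # chosen == 5
```

Frame 4 matches the target exactly, yet the scan stops at frame 5 because it is the first entry inside the tolerance. That trade of the best match for an earlier exit is what lets the search skip part of the table.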

The reference frame selection unit 8 may also use the difference between the times at which the individual frames were captured when selecting a reference frame. For example, even when captured frame 12(g) and frame 12(a) are close in position, frame 12(a) need not be used as the reference frame if the time difference is a predetermined amount or more. The time difference can be set arbitrarily and may be, for example, 10 minutes, 1 hour, or some other amount. The time difference can be calculated from, for example, the capture frame rate and the difference in frame numbers, but it may be calculated by other methods. Using the difference in time information in this way suppresses the drop in correlation between the reference frame and the frame being encoded that is caused mainly by changes over time. For example, when a sports ground is being shot, if cut-out image 12(g) is captured at night and cut-out image 12(a) was captured in the morning, the brightness of the entire frame differs, so using cut-out image 12(a) as the reference frame does not raise the compression efficiency. In contrast, when cut-out image 12(a) and cut-out image 12(g) are both captured during bright hours, using frame 12(a) as the reference frame means the overall brightness is shared, and encoding can be performed with an improved compression ratio.
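The time check can be sketched as below, computing elapsed time from the frame rate and the difference in frame numbers as the paragraph suggests. The function names, the 30 frames-per-second rate, and the 10-minute default are illustrative assumptions.

```python
def elapsed_seconds(frame_index_a: int, frame_index_b: int, fps: float) -> float:
    """Elapsed time between two frames, derived from their frame numbers."""
    return abs(frame_index_b - frame_index_a) / fps


def is_recent_enough(candidate_index: int,
                     current_index: int,
                     fps: float,
                     max_gap_seconds: float = 600.0) -> bool:
    """A candidate is rejected when the time difference is the
    predetermined amount (max_gap_seconds) or more."""
    return elapsed_seconds(candidate_index, current_index, fps) < max_gap_seconds


# At an assumed 30 fps, 9000 frames correspond to 5 minutes and 18000 frames to 10 minutes.
ok = is_recent_enough(0, 9000, fps=30.0)
too_old = is_recent_enough(0, 18000, fps=30.0)
```

Such a check would typically run before or alongside the position comparison, so that a spatially close but stale frame is not chosen.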

If, on the other hand, a reference frame for cut-out image 12(g) were to be determined without using position information, it would be necessary, for example, to compute the correlation with each candidate frame among cut-out images 12(b) to 12(g) one by one and determine which frame has the highest correlation. Consequently, the further apart cut-out image 12(g) and cut-out image 12(a) are in time or in imaging location, the greater the load placed on the CPU 6 and the like to compute the correlation during encoding. In contrast, the video processing apparatus 0 can reduce that load by temporarily holding the position information of the cut-out positions. In addition, the position information table 1200 in many cases holds less information than the cut-out images of a plurality of frames themselves. Therefore, by storing the position information table in a buffer provided in the reference frame selection unit 8, in the memory 7, or the like, more reference frame candidates can be secured when selecting a reference frame than when the method of this embodiment is not used.

This method is particularly effective when the cut-out region changes frequently, because the correlation between consecutive frames is then low. One example of frequently changing cut-out regions is a baseball broadcast, where various cut-outs are conceivable depending on the situation, such as pitching scenes, fielding scenes, base-running scenes, and cheering scenes.

When the video processing apparatus 0 is applied to a surveillance camera that performs motion detection, scenes must also be switched frequently if motion is detected at several points in the space being imaged, so the method is highly effective in that case as well.

Needless to say, the encoding process of this embodiment can be used together with encoding methods that do not use the position information table 1200. For example, if the reference frame selection unit 8 cannot select a reference frame candidate, the immediately preceding I frame may be selected as the reference frame.

Next, a system that encodes as in Embodiment 1, in which an arbitrary image is inset into a designated region of the video acquired from the video input unit 1 and video is then cut out from the video containing the inset, will be described with reference to FIGS. 3 and 4.

In FIG. 3, 0(b) is a video processing apparatus, 4(b) is a cut-out video recording unit that records the cut-out video, and 14 is an image fitting unit. The other units are the same as in Embodiment 1.

In FIG. 4, 11(a) to 11(e) represent wide-angle video acquired by the video input unit 1 and are frames in time series from 11(a) to 11(e). 15(a) to 15(e) denote the ranges cut out from the wide-angle frames 11(a) to 11(e), and the cut-out pictures are shown as 16(a) to 16(e). Reference numeral 17 denotes the region of the video acquired by the video input unit 1 into which an image has been inset. FIG. 4 depicts a car passing beside a tree, captured by a fixed wide-angle camera; the gray region slightly above the car is the region where the image was inset. It is assumed that the user cuts out video centered on the car.

First, shooting is performed with the video processing apparatus 0(b) fixed in place. The video input unit 1 acquires wide-angle video, and the video recording unit 2 records it. The image fitting unit 14 then insets an image into the acquired wide-angle video. The image to be inset and the range into which it is inset are specified through the operation unit 13. Both the inset image and its range may be changed from frame to frame.

Next, the video cut-out unit 3 cuts out video from the video into which the image has been inset. Because the inset image has already been applied to the video, video containing the inset can be cut out without calculating the inset region each time the cut-out range changes. A reference frame is then selected and the video is encoded as in Embodiment 1.
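The ordering described above (inset first, cut out afterwards) can be illustrated with a toy frame stored as a list of pixel rows. This is a sketch under simplifying assumptions: regions are given as (x, y, width, height) with zero-based coordinates, the inset is a flat fill value standing in for a mask image, and the function names are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]
Region = Tuple[int, int, int, int]  # (x, y, width, height), zero-based


def inset(frame: Frame, region: Region, value: int) -> Frame:
    """Return a copy of frame with region overwritten by value (e.g. a mask)."""
    x0, y0, w, h = region
    out = [row[:] for row in frame]
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            out[y][x] = value
    return out


def crop(frame: Frame, region: Region) -> Frame:
    """Cut region out of frame."""
    x0, y0, w, h = region
    return [row[x0:x0 + w] for row in frame[y0:y0 + h]]


wide = [[0] * 6 for _ in range(4)]         # 6 x 4 wide-angle frame
masked = inset(wide, (2, 1, 2, 2), 9)      # inset once, in wide-angle coordinates
view_1 = crop(masked, (1, 0, 3, 3))        # [[0, 0, 0], [0, 9, 9], [0, 9, 9]]
view_2 = crop(masked, (3, 1, 3, 2))        # [[9, 0, 0], [9, 0, 0]]
```

Both views show the correct part of the mask even though the mask position was computed only once, which is the saving the paragraph describes.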

Finally, the cut-out video recording unit 4(b) records the cut-out video, and the video output unit 5 outputs it. Situations in which an image is inset into video include security uses such as privacy masks and cases where an advertisement is displayed in a specific region during a sports broadcast.

By comparison, when an image is inset into a specific region while a single camera changes its imaging range by panning and tilting, the region into which the image is inset must be recalculated every time the imaging range changes. In this embodiment, video with the image inset in the specific region is obtained without calculating the inset region each time the cut-out range changes.

The encoding method of Embodiment 1 can also be applied in this embodiment. In that case, the reference frame selection unit 8 is placed between the video cut-out unit 3 and the cut-out video recording unit 4(b).

Next, a case where the cut-out video is corrected based on the positional relationship between the cut-out position and the camera will be described with reference to FIGS. 5, 6, and 7. In FIG. 5, 0(c) is a video processing apparatus, 18 is an image transforming unit, and 4(c) is a cut-out video recording unit that records the video transformed by the image transforming unit 18. The other units are the same as in Embodiment 1.

First, shooting is performed with the video processing apparatus 0(c) fixed in place. The video input unit 1 acquires wide-angle video, and the video recording unit 2 records it. The video cut-out unit 3 then cuts out video from the acquired wide-angle image. Next, based on the relationship between the camera position and the cut-out position, the image transforming unit 18 performs a geometric transformation so that the picture looks as if the camera had been pointed toward the cut-out region when capturing it.

For example, as in FIG. 6, an image transformation such as a perspective transformation may be performed based on the positional relationship between the region to be cut out and the camera. In FIG. 6, 19 is the camera, 11 is the wide-angle image acquired by the camera 19, 20 is the range to be cut out and transformed, 21 is the image obtained by geometrically transforming the image of the cut-out range 20, and 22 is the image obtained by applying an affine transformation or the like to the transformed region 21 to bring it to a size suited to the output destination.

In this case, the finally cut-out image 22 may be smaller than the cut-out range 20, so when the transformation is actually performed, the cut-out range 20 must be determined with this in mind. Then, as in Embodiment 1, the cut-out video recording unit 4(c) records the video and the video output unit 5 outputs it. FIG. 7 shows an example of a transformed image. In FIG. 7, 23 is an image cut out of the wide-angle image as is, 24 is the image obtained by applying the geometric transformation to the cut-out image 23, and 25 is the image obtained by resizing or cropping the transformed image 24 to the size of the output destination.
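To illustrate why the usable picture can end up smaller than the cut-out range, the sketch below warps the four corners of a cut-out rectangle with a 3 × 3 perspective (homography) matrix and derives a conservative axis-aligned rectangle inside the warped quadrilateral. The matrix values are invented for the example, and the patent does not specify how the transformation is parameterized; the inscribed-rectangle rule assumes the warped shape stays convex with its corners in the same relative order.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]
Matrix = Sequence[Sequence[float]]


def apply_homography(h: Matrix, p: Point) -> Point:
    """Map a point through a 3 x 3 perspective transformation."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def usable_rectangle(h: Matrix, width: float, height: float
                     ) -> Tuple[float, float, float, float]:
    """Warp the corners of a width x height cut-out and return (x0, y0, x1, y1),
    a conservative axis-aligned rectangle inside the warped quadrilateral."""
    tl = apply_homography(h, (0.0, 0.0))
    tr = apply_homography(h, (width, 0.0))
    br = apply_homography(h, (width, height))
    bl = apply_homography(h, (0.0, height))
    x0, x1 = max(tl[0], bl[0]), min(tr[0], br[0])
    y0, y1 = max(tl[1], tr[1]), min(bl[1], br[1])
    return x0, y0, x1, y1


# Hypothetical matrix that foreshortens the right-hand side of the cut-out.
H = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.001, 0.0, 1.0]]
x0, y0, x1, y1 = usable_rectangle(H, 400.0, 300.0)
```

With this matrix the 400 × 300 cut-out yields a usable rectangle that is narrower and shorter than the input, so a larger range 20 would have to be chosen to fill a 400 × 300 output.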

In this way, when video is cut out from video, a picture can be obtained that looks as if the camera had been pointed toward the cut-out region when it was captured. A picture with a greater sense of depth can also be provided.
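The geometric transformation described for the image conversion unit 18 can be illustrated with a minimal sketch. The description does not specify the transformation beyond "such as a perspective transformation", so the following assumes an ideal pinhole camera with focal length `f` (in pixels) and principal point `(cx, cy)`, and models "pointing the camera at the cut-out region" as a pure rotation of the camera. The function names `look_at_rotation` and `warp_point` and all parameter values are hypothetical.

```python
import math

def look_at_rotation(u, v, f, cx, cy):
    """Return a rotation (rows are the new camera axes) that turns the
    optical axis toward pixel (u, v). Assumes an ideal pinhole camera."""
    # Viewing ray through (u, v) in the original camera frame.
    d = ((u - cx) / f, (v - cy) / f, 1.0)
    n = math.sqrt(sum(c * c for c in d))
    z = tuple(c / n for c in d)                      # new optical axis
    # New x axis = up x z with up = (0, 1, 0), then normalized.
    xn = math.hypot(z[2], z[0])
    x = (z[2] / xn, 0.0, -z[0] / xn)
    # New y axis = z x x completes the right-handed frame.
    y = (z[1] * x[2] - z[2] * x[1],
         z[2] * x[0] - z[0] * x[2],
         z[0] * x[1] - z[1] * x[0])
    return (x, y, z)

def warp_point(u, v, R, f, cx, cy):
    """Map a pixel of the wide-angle image into the rotated virtual view."""
    d = ((u - cx) / f, (v - cy) / f, 1.0)
    r = [sum(R[i][j] * d[j] for j in range(3)) for i in range(3)]
    return (f * r[0] / r[2] + cx, f * r[1] / r[2] + cy)

# Hypothetical example: 1920x1080 wide-angle image, cut-out centered at (1500, 300).
F, CX, CY = 1000.0, 960.0, 540.0
R = look_at_rotation(1500, 300, F, CX, CY)
corners = [(1300, 200), (1700, 200), (1700, 400), (1300, 400)]
warped = [warp_point(u, v, R, F, CX, CY) for u, v in corners]
```

Under these assumptions the center of the cut-out range maps to the center of the virtual view, while the warped corners no longer form an axis-aligned rectangle. This is why the final rectangular image can be smaller than the original cut-out range, as noted above.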

Next, a case where suitable sound is provided according to the cut-out position when video is cut out will be described with reference to FIGS. 8 and 9. In FIG. 8, 0(d) is the video processing apparatus, 26 is the microphone sound source adjustment unit, and 4(d) is the cut-out video recording unit that records the output data from the video cutout unit 3 and the microphone sound source adjustment unit 26. In FIG. 9, 11 is the wide-angle image acquired by the video input unit 1, 27(a) to 27(c) are cut-out ranges, and 28(a) to 28(i) are microphones. The other units are the same as in the first embodiment.

First, shooting is performed with the video processing apparatus 0(d) fixed in place. The video input unit 1 acquires the wide-angle video 11, and the video recording unit 2 records it. The video cutout unit 3 then cuts out video from the acquired wide-angle video 11. Next, the microphone sound source adjustment unit 26 adjusts the microphone sound sources according to the cut-out range. As examples of the adjustment, when 27(a) in FIG. 9 is cut out, the microphone sound sources 28(a), 28(b), 28(d), and 28(e) are weighted; when 27(b) is cut out, the microphone sound sources 28(e) and 28(f) are weighted; and when 27(c) is cut out, the microphone sound source 28(e) is weighted. The cut-out video and the adjusted sound are then recorded by the cut-out video recording unit 4(d) and output by the video output unit 5.

By having the video processing apparatus 0(d) perform this weighting, sound that matches the cut-out position can be added to the video.

The weighting described above means, for example, selecting one or more microphone sound sources located close to the cut-out ranges 27(a) to 27(c). Alternatively, the weighting may be configured so that, after the microphone sound sources close to the cut-out ranges 27(a) to 27(c) are selected, the volume recorded on the recording medium is made larger for sound sources that are closer. The position information used here is the position information of the first embodiment.
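The two weighting variants described above (selecting nearby microphones, and giving closer microphones a larger volume) can be sketched as follows. This is only an illustration under stated assumptions: the 3x3 microphone layout `MICS`, the image size, the inverse-distance weight formula, and the function names `mic_weights` and `mix` are hypothetical and are not taken from FIG. 9.

```python
import math

# Hypothetical layout: nine microphones 28a..28i on a 3x3 grid over a 1920x1080 image.
NAMES = "abcdefghi"
MICS = {
    f"28{NAMES[r * 3 + c]}": (320 + 640 * c, 180 + 360 * r)
    for r in range(3) for c in range(3)
}

def mic_weights(crop, mics, k=2):
    """Select the k microphones nearest to the center of the cut-out range
    (x, y, width, height) and weight closer microphones more heavily."""
    cx = crop[0] + crop[2] / 2
    cy = crop[1] + crop[3] / 2
    dist = {name: math.hypot(x - cx, y - cy) for name, (x, y) in mics.items()}
    nearest = sorted(dist, key=dist.get)[:k]
    # Assumed weight formula: inverse distance, normalized so the weights sum to 1.
    raw = {name: 1.0 / (1.0 + dist[name]) for name in nearest}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}

def mix(samples, weights):
    """Weighted sum of one audio sample per selected microphone."""
    return sum(weights[name] * samples[name] for name in weights)

# Cut-out range covering the upper-left quarter of the image.
weights = mic_weights((0, 0, 960, 540), MICS, k=4)
```

With this assumed layout, a cut-out range in the upper-left quarter selects 28a, 28b, 28d, and 28e, which mirrors the example given for range 27(a). Setting `k=1` reduces the scheme to plain selection of the single nearest microphone.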

Next, means for outputting to a display unit, such as a display, which part of the wide-angle video was cut out will be described with reference to FIGS. 10 and 11. In FIG. 10, 29 is the cut-out position image generation unit and 30 is the image superimposing unit; the other units are the same as in the second embodiment. To simplify the description, however, the means for fitting in an image is omitted in this embodiment.

First, the video cutout unit 3 cuts out video from the wide-angle video acquired from the video input unit 1. The cut-out position image generation unit 29 then outputs, based on the cut-out range, an image in which a frame indicates which part of the wide-angle video was cut out. The image superimposing unit 30 superimposes the image created by the cut-out position image generation unit 29 on the cut-out video, and the video output unit 5 outputs the result. FIG. 11 shows an example of video created in this way. In FIG. 11, 31 is the cut-out video, 32 is the original wide-angle video after resizing, and 33 is a frame drawn on the resized video 32 to indicate which part was cut out.

In this way, the user can be shown which range of the wide-angle video was cut out.
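The coordinate calculation behind the frame 33 can be sketched as follows: the cut-out range is scaled by the same factor used to resize the wide-angle video 32. The function name `frame_in_thumbnail`, the (x, y, width, height) rectangle convention, and the example sizes are assumptions made for illustration; the description does not specify how the frame is computed or drawn.

```python
def frame_in_thumbnail(wide_size, crop, thumb_width):
    """Scale a cut-out range given in wide-angle pixel coordinates into the
    coordinates of a resized (thumbnail) copy of the wide-angle video.

    wide_size   -- (width, height) of the wide-angle video
    crop        -- (x, y, width, height) of the cut-out range
    thumb_width -- width of the resized wide-angle video
    Returns ((thumb_width, thumb_height), (x, y, width, height) of the frame).
    """
    wide_w, wide_h = wide_size
    # Keep the aspect ratio of the wide-angle video.
    thumb_h = round(wide_h * thumb_width / wide_w)
    x, y, w, h = crop
    frame = (
        round(x * thumb_width / wide_w),
        round(y * thumb_width / wide_w),
        round(w * thumb_width / wide_w),
        round(h * thumb_width / wide_w),
    )
    return (thumb_width, thumb_h), frame

# Hypothetical example: 1920x1080 wide-angle video, central cut-out, 320-pixel-wide thumbnail.
thumb_size, frame = frame_in_thumbnail((1920, 1080), (480, 270, 960, 540), 320)
```

The resulting rectangle would then be drawn on the resized video before that video is superimposed on the cut-out video.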

The present invention is not limited to the embodiments described above and includes various modifications. For example, in the embodiments described above, the wide-angle video from the video input unit 1 is recorded in the video recording unit 2, and the cut-out video is recorded in the cut-out video recording units 4(a) to 4(d). However, the video recording unit 2 and the cut-out video recording units 4(a) to 4(d) are not necessarily required; the video from the video input unit 1 may be processed in real time and output externally. These units have been described in detail to make the embodiments easy to understand, and the invention is not limited to configurations that include every element described. Part of the configuration of one embodiment may be replaced with the configuration of another embodiment, and the configuration of another embodiment may be added to the configuration of one embodiment.

According to the video processing apparatus of each embodiment described above, a higher compression ratio than with conventional methods can be achieved when encoding the cut-out video.
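The compression gain comes from choosing the reference frame by cut-out position, as recited in the claims: a previously encoded cut-out video whose position differs least from the video to be encoded is used as the reference, optionally subject to limits on the position difference and on the difference in capture time. A minimal sketch follows. The `EncodedFrame` record, the use of a sum of absolute coordinate differences as the "difference" in position, and the fallback of returning `None` when no candidate qualifies are assumptions, not details taken from this document.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class EncodedFrame:
    frame_id: int
    x: int        # cut-out position within the wide-angle video
    y: int
    time: float   # capture time in seconds

def select_reference(past: Sequence[EncodedFrame], x: int, y: int, time: float,
                     max_pos_diff: float, max_time_diff: float) -> Optional[EncodedFrame]:
    """Pick the past frame with the smallest position difference, considering
    only frames within both the position and the time threshold."""
    best = None
    best_diff = None
    for frame in past:
        # Assumed metric: sum of absolute coordinate differences.
        pos_diff = abs(frame.x - x) + abs(frame.y - y)
        if pos_diff > max_pos_diff or abs(time - frame.time) > max_time_diff:
            continue
        if best is None or pos_diff < best_diff:
            best, best_diff = frame, pos_diff
    return best  # None means no suitable reference frame was found

past = [
    EncodedFrame(0, 100, 100, 0.0),
    EncodedFrame(1, 400, 100, 1.0),
    EncodedFrame(2, 110, 100, 2.0),
]
ref = select_reference(past, 108, 100, 2.5, max_pos_diff=50, max_time_diff=5.0)
```

In this example the frame cut out at (110, 100) is chosen because it is closest to the requested position (108, 100). How an encoder should proceed when `None` is returned, for example by encoding the frame without inter-frame prediction, is left open here.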

When an image is fitted into captured video, operating a general video camera requires detecting the fitting position in accordance with the movement of the camera. With the video processing apparatus of the embodiments, by contrast, processing can be performed without detecting the position of the fitted image again even when the cut-out range is changed.

Furthermore, when an image is cut out, correcting the image in consideration of the cut-out position and the camera position makes it possible to create video with a greater sense of depth than simple cropping. Finally, by connecting a plurality of microphones and changing the weights of the microphone sound sources according to the cut-out position, sound suited to the video can be provided.

The configuration of the present invention is not limited to the embodiments described above and may be freely modified within the scope of the invention. The contents of the embodiments may also be combined.

FIG. 1 is a block diagram of the video processing apparatus according to the first embodiment.
FIG. 2 is a diagram for explaining a specific example of the first embodiment.
FIG. 3 is a block diagram of the video processing apparatus according to the second embodiment.
FIG. 4 is a diagram for explaining a specific example of the second embodiment.
FIG. 5 is a block diagram of the video processing apparatus according to the third embodiment.
FIG. 6 is a diagram for explaining the geometric transformation of an image in the third embodiment.
FIG. 7 is a diagram showing an example of image transformation in the third embodiment.
FIG. 8 is a block diagram of the video processing apparatus according to the fourth embodiment.
FIG. 9 is a diagram for explaining the priority with which microphone sound sources are selected in the fourth embodiment.
FIG. 10 is a block diagram of the video processing apparatus according to the fifth embodiment.
FIG. 11 is a diagram for specifically explaining the fifth embodiment.
FIG. 12 is a diagram showing an example of the position information table.

Explanation of symbols

0(a) to 0(e) ... video processing apparatus
1 ... video input unit
2 ... video recording unit
3 ... video cutout unit
4(a) to 4(d) ... cut-out video recording unit
5 ... video output unit
6 ... CPU
7 ... memory
8 ... reference frame selection unit
9 ... encoding unit
10 ... bus
11, 11(a) to 11(e) ... wide-angle video
12(a) to 12(g) ... cut-out range
13 ... operation unit
14 ... image fitting unit
15(a) to 15(e) ... cut-out range
16(a) to 16(e) ... cut-out image
17 ... image fitting range
18 ... image conversion unit
19 ... camera
20 ... range to be cut out and transformed
21 ... transformed image
22 ... image cut out from the transformed image
23 ... cut-out image
24 ... geometrically transformed image
25 ... resized image
26 ... microphone sound source adjustment unit
27(a) to 27(c) ... cut-out range
28(a) to 28(i) ... microphone sound source
29 ... cut-out position image generation unit
30 ... image superimposing unit
31 ... cut-out video
32 ... resized wide-angle video
33 ... frame indicating the cut-out position

Claims

A video processing apparatus that processes video of a subject, comprising:
imaging means for capturing wide-angle video;
cutting means for cutting out video of a partial area from the wide-angle video;
position information acquisition means for associating the video cut out by the cutting means with position information indicating the position of the cut-out video within the wide-angle video; and
encoding means for encoding the video cut out by the cutting means,
wherein the encoding means includes means for selecting a reference frame based on the position information and performing encoding.

The video processing apparatus according to claim 1, further comprising
storage means for storing position information indicating the cut-out positions of a plurality of videos encoded in the past,
wherein the encoding means selects, as a reference frame, the cut-out video whose position information, among the position information stored in the storage means, has the smallest difference from the position information corresponding to the video to be encoded, and performs encoding.

The video processing apparatus according to claim 1,
wherein, when the difference between the position information corresponding to a video encoded in the past and the position information corresponding to the video to be encoded is equal to or less than a predetermined amount, the encoding means selects the video encoded in the past as a reference frame and performs encoding.

The video processing apparatus according to claim 1,
wherein, when the difference between the position information corresponding to a video encoded in the past and the position information corresponding to the video to be encoded is equal to or less than a predetermined amount, and the difference between the times at which the video encoded in the past and the video to be encoded were captured is equal to or less than a predetermined amount, the encoding means selects the video encoded in the past as a reference frame and performs encoding.

The video processing apparatus according to any one of claims 1 to 4, further comprising
synthesis means for insetting another video into a partial area of the wide-angle video,
wherein the cutting means cuts out a partial area of the wide-angle video into which the other video has been inset by the synthesis means.

The video processing apparatus according to any one of claims 1 to 5, further comprising:
audio input means for inputting audio from a plurality of sound collectors; and
audio output means for changing the individual volumes of the audio input from the plurality of sound collectors according to the position information and outputting the audio.

The video processing apparatus according to any one of claims 1 to 5, further comprising:
audio input means for inputting audio from a plurality of sound collectors; and
output means for outputting the video cut out by the cutting means,
wherein the output means changes the individual weights of the audio input from the plurality of sound collectors according to the position information and outputs the audio.

The video processing apparatus according to any one of claims 1 to 7, further comprising
cut-out display output means for, when the cut-out video is output to an external display device, using the position information of the video cut out by the cutting means to output a display indicating which range of the wide-angle video was cut out.

The video processing apparatus according to any one of claims 1 to 8, further comprising
conversion means for converting the cut-out video to a predetermined display size.