WO2016111239A1 - Image processing device, image processing method and program recording medium


Info

Publication number
WO2016111239A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame image
image
frame
movement amount
interest
Prior art date
Application number
PCT/JP2016/000013
Other languages
French (fr)
Japanese (ja)
Inventor
真澄 石川
仁 河村
Original Assignee
NEC Corporation
Priority date
Filing date
Publication date
Application filed by NEC Corporation
Priority to JP2016568360A (JP6708131B2)
Publication of WO2016111239A1


Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09G ARRANGEMENTS OR CIRCUITS FOR CONTROL OF INDICATING DEVICES USING STATIC MEANS TO PRESENT VARIABLE INFORMATION
    • G09G5/00 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators
    • G09G5/36 Control arrangements or circuits for visual indicators common to cathode-ray tube indicators and other visual indicators characterised by the display of a graphic pattern, e.g. using an all-points-addressable [APA] memory

Definitions

  • The present invention relates to a video processing device, a video processing method, and a program recording medium.
  • A photosensitive seizure is an abnormal response to light stimulation that produces symptoms similar to epilepsy, such as convulsions and disturbance of consciousness.
  • In order to suppress such effects, attempts are being made to restrict the distribution of video content that has a negative effect on the human body (Non-Patent Document 1).
  • ITU (International Telecommunication Union)
  • In Japan, the Japan Broadcasting Corporation (NHK) and the Japan Commercial Broadcasters Association have established guidelines, particularly for animation production, and demand compliance from those involved in broadcasting (Non-Patent Document 2).
  • One type of video that contains much of the flicker that can trigger photosensitive seizures is footage of a press conference containing many flashes from news photographers' cameras. In such video, each flash produces a bright region for a short time, and the repetition of these flashes produces frequent blinking.
  • Patent Documents 1 to 3 disclose related techniques for detecting and correcting video content that has an adverse effect on the human body.
  • Patent Document 1 discloses a technique for detecting a scene (image) that induces a photosensitive seizure on a liquid crystal display and reducing the luminance of the backlight unit for the detected scene. This technique reduces the effect of photosensitive seizures on viewers.
  • Patent Document 2 discloses a technique for correcting the dynamic range of the (n+1)-th frame image by gamma correction or tone-curve correction based on a comparison of the histograms of the n-th frame image and the (n+1)-th frame image. This technique relieves strong blinking and reduces eye strain or poor physical condition.
  • Patent Document 3 discloses a technique for correcting a motion vector.
  • Non-Patent Document 3 and Non-Patent Document 4 disclose optical flow calculation methods described later.
  • However, the related technologies have the following problem. Large changes in luminance or saturation that can trigger photosensitive seizures may occur only in some areas of an image rather than across the entire image. Because the techniques disclosed in the related art uniformly correct the entire image without making this distinction, they also reduce the contrast and brightness of areas that do not need correction and do not cause blinking, and may therefore degrade the image quality of those areas.
  • An object of the present invention is to provide a technique capable of generating a natural video in which fluctuations in luminance or saturation are suppressed.
  • An image processing device according to one aspect of the present invention includes: determination means for determining whether any of a plurality of temporally continuous frame images is a frame image of interest including a blinking region whose luminance or saturation differs by a predetermined level or more from the preceding and following frame images; motion estimation means for estimating, based on a pair of frame images selected from the frame images before and after the frame image of interest on the basis of their difference in luminance or saturation from the frame image of interest, a first movement amount caused by the movement of the camera and/or a second movement amount caused by the movement of the subject; image generation means for generating a corrected frame image corresponding to a frame image at the shooting time of the frame image of interest, based on the selected pair and the estimated first movement amount and/or second movement amount; and image synthesis means for synthesizing the frame image of interest and the corrected frame image.
  • An image processing method according to one aspect of the present invention includes: determining whether any of a plurality of temporally continuous frame images is a frame image of interest including a blinking region whose luminance or saturation differs by a predetermined level or more from the preceding and following frame images; estimating, based on a pair of frame images selected from the frame images before and after the frame image of interest on the basis of their difference in luminance or saturation from the frame image of interest, a first movement amount caused by the movement of the camera and/or a second movement amount caused by the movement of the subject; generating, based on the selected pair and the estimated first movement amount and/or second movement amount, a corrected frame image corresponding to a frame image at the shooting time of the frame image of interest; and synthesizing the frame image of interest and the corrected frame image.
  • A program recording medium according to one aspect of the present invention stores a program that causes a computer to execute: a process of determining whether any of a plurality of temporally continuous frame images is a frame image of interest including a blinking region whose luminance or saturation differs by a predetermined level or more from the preceding and following frame images; a process of estimating, based on a pair of frame images selected from the frame images before and after the frame image of interest on the basis of their difference in luminance or saturation from the frame image of interest, a first movement amount caused by the movement of the camera and/or a second movement amount caused by the movement of the subject; a process of generating, based on the selected pair and the estimated first movement amount and/or second movement amount, a corrected frame image corresponding to a frame image at the shooting time of the frame image of interest; and a process of synthesizing the frame image of interest and the corrected frame image.
  • An image processing device according to another aspect of the present invention includes: selection means for selecting a first frame image and a second frame image from a plurality of temporally continuous frame images; first estimation means for calculating a geometric transformation parameter based on the positional relationship between corresponding points or corresponding regions detected between the first frame image and the second frame image, and for estimating a first movement amount caused by the movement of the camera; and second estimation means for detecting a subject region from the first frame image and the second frame image by subtracting the first movement amount based on the geometric transformation parameter, and for estimating, based on the detected subject region, a second movement amount caused by the movement of the subject.
  • FIG. 1 is a block diagram of a video processing apparatus according to the first embodiment.
  • FIG. 2 is a schematic diagram showing a rectangular area luminance calculation method.
  • FIG. 3 is a block diagram of the motion estimator in the first embodiment.
  • FIG. 4 is a schematic diagram illustrating a method of selecting a frame image that does not include a bright region.
  • FIG. 5 is a diagram illustrating a method for selecting a motion estimation frame.
  • FIG. 6 is a diagram illustrating a method for selecting a motion estimation frame.
  • FIG. 7 is a diagram illustrating an example of a method for selecting a motion estimation frame pair.
  • FIG. 8 is a block diagram of a correction frame generation unit in the first embodiment.
  • FIG. 9 is a graph showing an example of a method for setting the value of the rate of change in local area luminance in the output frame image.
  • FIG. 10 is a flowchart showing the operation of the video processing apparatus according to the first embodiment.
  • FIG. 11 is a block diagram illustrating a hardware configuration of the computer apparatus.
  • FIG. 1 is a block diagram showing a configuration of a video processing apparatus 100 according to the first embodiment of the present invention. Note that the arrows described in FIG. 1 (and the subsequent block diagrams) merely show an example of the data flow, and are not intended to limit the data flow.
  • the video processing apparatus 100 includes a determination unit 11, a motion estimation unit 12, an image generation unit 13, and an image synthesis unit 14.
  • The determination unit 11 determines whether or not a frame image includes a region that may induce a photosensitive seizure. Specifically, using a preset number of frame images, the determination unit 11 determines whether a specific frame image (hereinafter referred to as the "frame image of interest") includes a region that blinks (whose luminance changes greatly) due to a flash or the like. In the following, a region determined in this way (a region whose luminance changes greatly) is referred to as a "blinking region". For example, when the determination unit 11 receives time-sequential frame images for (2m+1) frames taken from time (t-m) to time (t+m) as input, the determination unit 11 takes the frame image at time t as the frame image of interest and determines whether that frame image includes a blinking region.
  • the motion estimation unit 12, the image generation unit 13, and the image synthesis unit 14 synthesize a frame image in which the movement of the image due to the displacement of the camera or the subject is corrected.
  • the motion estimation unit 12, the image generation unit 13, and the image synthesis unit 14 can output a frame image in which the influence of blinking is reduced by appropriately suppressing the luminance change in the blinking region in this way.
  • the blinking region includes a bright region in which the luminance of the frame image of interest is greatly improved (becomes brighter) and a dark region in which the luminance of the frame image of interest is greatly lowered (becomes dark).
  • For simplicity, only the bright region is described below.
  • the determination unit 11 determines whether the target frame image is a frame image including a blinking region.
  • One method for determining whether a target frame image is a frame including a blinking region is a method using a change rate of local region luminance between the target frame image and another input frame image.
  • The local region luminance represents, for each pixel of the input frame images, the luminance value of a region consisting of that pixel and a predetermined number of surrounding pixels.
  • the determination unit 11 first converts color information described in an RGB color system or the like into luminance information (luminance value) representing brightness for each pixel of the input plurality of frame images. Thereafter, the determination unit 11 performs a smoothing process using pixels around the target pixel on the converted luminance information, thereby calculating a luminance value in the pixel peripheral region.
  • the method for converting color information into luminance information is, for example, a method for calculating a Y value representing the luminance of the YUV (YCbCr, YPbPr) color system used for broadcasting, or a Y value representing the luminance of the XYZ color system.
  • the color systems describing luminance information are not limited to these color systems.
  • the determination unit 11 may convert the color information into another index representing luminance such as the V value of the HSV color system.
  • The determination unit 11 may also convert the color information back into the pre-correction color information by inverse gamma correction before converting it into luminance information.
  • An example of the smoothing method is to take the average of the luminance information of the (2p+1) × (2q+1) pixels consisting of the target pixel and the q pixels above and below it and the p pixels to its left and right.
  • The local region luminance l_t(x, y) of the pixel at position (x, y) in the frame image at time t is expressed by Equation (1) using the luminance information Y_t of the frame image.
  • The determination unit 11 may also calculate the local region luminance l_t(x, y) using a weighted average with preset weights w, as in Equation (2).
  • For example, the determination unit 11 calculates Gaussian weights w(i, j) using Equation (3) with a preset parameter σ.
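• As an illustration, the following is a minimal NumPy sketch of the local region luminance described above. Equations (1) to (3) are not reproduced on this page, so the box average and the Gaussian weighting are written in their standard forms; the function name and the parameters p, q, and sigma are illustrative only.

```python
import numpy as np

def local_region_luminance(Y, p=2, q=2, sigma=None):
    """Local region luminance l_t(x, y): average of the luminance Y over a
    (2p+1) x (2q+1) window around each pixel (box average), or a
    Gaussian-weighted average when sigma is given."""
    # Build the weights w(i, j): uniform (box average) or Gaussian.
    i = np.arange(-q, q + 1)[:, None]   # vertical offsets
    j = np.arange(-p, p + 1)[None, :]   # horizontal offsets
    if sigma is None:
        w = np.ones((2 * q + 1, 2 * p + 1))
    else:
        w = np.exp(-(i ** 2 + j ** 2) / (2.0 * sigma ** 2))
    w /= w.sum()

    # Weighted sum over the window, with edge padding so the output
    # has the same shape as Y.
    Ypad = np.pad(Y.astype(np.float64), ((q, q), (p, p)), mode="edge")
    l = np.zeros_like(Y, dtype=np.float64)
    for di in range(2 * q + 1):
        for dj in range(2 * p + 1):
            l += w[di, dj] * Ypad[di:di + Y.shape[0], dj:dj + Y.shape[1]]
    return l
```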
  • the local area luminance change rate represents the ratio of the local area luminance change between the pixel of the target frame image and the pixel of another input frame image at the same position.
  • The determination unit 11 calculates, using Equation (4), the local region luminance change rate r_{t→t+k}(x, y) of the pixel at each position (x, y) between the frame image of interest at time t and the frame image at time (t+k).
  • Based on the calculated change rate, the determination unit 11 determines whether the frame image of interest includes a region that is brighter than the other frame image by a predetermined level or more. When the frame image of interest includes such a region with respect to frame images both before and after it in time, the determination unit 11 determines that the frame image of interest is a frame image including a bright region caused by blinking.
  • Specifically, the determination unit 11 makes this determination using a preset change-rate threshold α and a preset area-rate threshold β, according to whether the area rate of the region in which the change rate r_{t→t+k} exceeds the threshold α itself exceeds the threshold β.
  • The determination unit 11 sets the determination flag flag_{t→t+k} to "1" when it determines that the frame image of interest at time t includes a region that is brighter than the frame image at time (t+k) by the predetermined level or more, and sets it to "0" otherwise. The determination unit 11 similarly calculates a determination flag for the combination of the frame image of interest with every other input frame image, and determines whether a frame image whose determination flag is "1" exists both before and after the frame image of interest. When such frame images exist, the determination unit 11 determines that the frame image of interest is a frame image including a bright region.
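• A sketch of this determination follows. Since Equation (4) is not reproduced here, the change rate r_{t→t+k} is assumed to be the ratio of the local region luminances of the frame of interest and the other frame; the thresholds alpha and beta and all names are placeholders.

```python
import numpy as np

def is_bright_blinking_frame(l_t, l_others, k_offsets, alpha=2.0, beta=0.01):
    """Decide whether the frame of interest (local region luminance map l_t)
    contains a bright region caused by blinking.
    l_others: local-region-luminance maps of the other frames,
    k_offsets: their time offsets k relative to the frame of interest.
    Assumption: r_{t->t+k} is taken as l_t / l_{t+k}; alpha and beta are the
    change-rate and area-rate thresholds of the text (placeholder values)."""
    flags = {}
    for l_k, k in zip(l_others, k_offsets):
        r = l_t / np.maximum(l_k, 1e-6)           # per-pixel change rate
        area_rate = np.mean(r > alpha)            # fraction of pixels exceeding alpha
        flags[k] = 1 if area_rate > beta else 0   # determination flag flag_{t->t+k}
    # A bright region due to blinking requires a flag-1 frame both before and after.
    before = any(f == 1 for k, f in flags.items() if k < 0)
    after = any(f == 1 for k, f in flags.items() if k > 0)
    return (before and after), flags
```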
  • the determination unit 11 may use a method of using the change rate of the rectangular area luminance as another method of determining whether the frame image of interest is a frame image including a blinking area.
  • the rectangular area luminance represents an average value of luminance for each rectangular area set in advance in each frame image.
  • the rectangular area luminance when a 10 ⁇ 10 block rectangular area is set in the frame image is an average value of the luminance values of the pixels included in each rectangular area.
  • As the luminance value, the Y value of the YUV color system, the Y value of the XYZ color system, the V value of the HSV color system, or the like can be used, as in the calculation of the local region luminance.
  • the change rate of the rectangular area luminance represents the ratio of the difference between the rectangular area luminance of the block of interest in the target frame image and the rectangular area luminance of the block at the same position in the other input frame image.
  • The determination unit 11 calculates, using Equation (5), the change rate R_{t→t+k}(i, j) between the rectangular region luminance L_t(i, j) of the block at position (i, j) of the frame image of interest at time t and the rectangular region luminance L_{t+k}(i, j) of the frame image at time (t+k).
  • the determination using the change rate of the rectangular area luminance is performed in the same manner as the determination using the change ratio of the local area luminance.
  • For the combination of the frame image of interest at time t with each of the other input frame images, the determination unit 11 determines whether the frame image of interest includes a region that is brighter than the other frame image, and sets the value of the determination flag accordingly.
  • the determination unit 11 determines that the frame image of interest is a frame image including a blinking area when there are frame images having the determination flag “1” at each of the times before and after the frame of interest image.
  • As in the case of using the local region luminance change rate, one method of setting the determination flag is to set it to "1" or "0" according to whether the area rate of the regions in which the change rate exceeds a preset change-rate threshold α itself exceeds a preset area-rate threshold β.
  • The determination unit 11 outputs the determination flags between the frame image of interest and the other input frame images as analysis information, together with the determination result. In addition, by performing the same processing, the determination unit 11 may output the determination flags calculated between frame images other than the frame image of interest as auxiliary information.
  • The determination unit 11 may also output, as analysis information, the change rate of the rectangular region luminance calculated between each rectangular region of the frame image of interest and the rectangular region at the same position in the other frame images.
  • FIG. 3 is a block diagram illustrating a configuration of the motion estimation unit 12.
  • the motion estimation unit 12 includes a selection unit 12A, a first estimation unit 12B, and a second estimation unit 12C.
  • The motion estimation unit 12 receives the frame images and the determination result and analysis information output from the determination unit 11 as inputs. When the frame image of interest is determined to be a frame image including a bright region, the motion estimation unit 12 selects a plurality of frame images to be used for motion estimation from the input frame images and estimates, between the selected frame images, the amount of image movement caused by the movement of the camera and of the subject.
  • the selection unit 12A selects a frame image used for estimation of the movement amount from frame images other than the target frame image, and acquires a pair of frame images including the selected frame image.
  • the selection unit 12A selects these frame images (hereinafter referred to as “motion estimation frame images”), for example, by the following method.
  • the selection unit 12A may select one frame image as a motion estimation frame image from before and after the target frame image based on the luminance difference between the target frame image and the input other frame image. In this case, the selection unit 12A acquires one frame image before and after each frame image of interest and uses it as a pair of motion estimation frame images. Specifically, the selection unit 12A may select the motion estimation frame image using the determination flag calculated by the determination unit 11.
  • In this case, among the frame images that do not include a bright region, the frame images closest to the frame image of interest before and after it are selected as the motion estimation frame images.
  • FIG. 4 is a schematic diagram showing a method for selecting a frame image that does not include a bright region.
  • FIG. 4 illustrates, for four cases (case 1 to case 4), the determination flags (flag) obtained by comparing the frame image of interest at time t with the other frame images from time (t-2) to time (t+2). Note that in FIG. 4 (and similar figures thereafter), frame images that do not include a bright region are shown hatched, and unhatched frame images represent frame images that include a bright region.
  • In case 1, the selection unit 12A selects the frame images at times (t-1) and (t+1). Similarly, it selects the frame images at times (t-2) and (t+1) in case 2, the frame images at times (t-1) and (t+2) in case 3, and the frame images at times (t-2) and (t+2) in case 4.
  • The selection unit 12A may correct the selection result of the motion estimation frames using the determination flags between frame images other than the frame image of interest, which are input as auxiliary information. When the frame image at time (t+k) has been selected as a motion estimation frame in the selection using the determination flags between the frame image of interest and the other frame images, the selection unit 12A may correct the selection result as follows.
  • When both the determination flag flag_{t→t+k+1} between the frame image of interest and the frame image at time (t+k+1) and the determination flag flag_{t+k+1→t+k} between the frame image at time (t+k+1) and the frame image at time (t+k) are "1", it is considered that there is also a large luminance change between the frame image at time (t+k+1) and the frame image at time (t+k). In this case, the selection unit 12A may therefore change (correct) the motion estimation frame image to the frame image at time (t+k+1)."
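• A small sketch of the nearest-frame selection using the determination flags follows; the dictionary-of-offsets representation and the function name are illustrative only, and the auxiliary-flag correction described above is omitted.

```python
def select_motion_estimation_pair(flags):
    """Select one motion estimation frame before and one after the frame of
    interest: among the frames whose determination flag flag_{t->t+k} is 1
    (frames relative to which the frame of interest is brighter), take the
    ones closest in time.  flags maps offset k (k != 0) to the flag value."""
    before = [k for k, f in flags.items() if k < 0 and f == 1]
    after = [k for k, f in flags.items() if k > 0 and f == 1]
    if not before or not after:
        return None  # no valid pair; the frame of interest is left unchanged
    return max(before), min(after)   # e.g. (-1, +1) in case 1 of FIG. 4

# Example corresponding to case 2 of FIG. 4 (flag is 0 at t-1):
print(select_motion_estimation_pair({-2: 1, -1: 0, 1: 1, 2: 1}))  # -> (-2, 1)
```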
  • The selection unit 12A may select a plurality of frame images as the motion estimation frame images from before and after the frame image of interest based on the luminance change between the frame image of interest and the other input frame images. In this case, the selection unit 12A acquires a plurality of pairs of frame images. Specifically, the selection unit 12A may select, from among the frame images near the frame image of interest, a predetermined number of frame images whose determination flag calculated by the determination unit 11 is "1".
  • FIG. 5 is a schematic diagram showing an example of selecting a plurality (two pairs in this case) of motion estimation frame images.
  • For example, when the frame images at times (t-2), (t-1), (t+1), and (t+2) do not include a bright region, the selection unit 12A selects all of these frame images as motion estimation frames.
  • the determination flag in the example of FIG. 5 is equal to the determination flag in the case 1 of FIG.
  • In this case, the selection unit 12A selects not only the frame images at times (t-1) and (t+1) but also the frame images at times (t-2) and (t+2) as motion estimation frames.
  • This selection method selectively uses regions that are less affected by the flickering of light from multiple frame images when frequent blinking occurs within a short time or when a flash band occurs, so the accuracy of motion estimation can be increased (see, for example, FIG. 7).
  • A flash band is a large change (step) in signal intensity that occurs in a rolling-shutter imaging device, such as a CMOS (Complementary Metal-Oxide-Semiconductor) sensor, because each line has a different exposure period when short-duration light emission such as flash light occurs.
  • Based on the luminance difference between the frame image of interest and the other input frame images, the selection unit 12A may select, as the motion estimation frame images, the frame image of interest and one frame image from either before or after it. Specifically, the selection unit 12A may select the frame image closest to the frame image of interest from among the frame images whose determination flag calculated by the determination unit 11 is "1". When such frame images exist both before and after the frame image of interest, the selection unit 12A selects only one of them, on a preset side.
  • FIG. 6 shows an example of a case where a frame image at a time earlier than the target frame image is selected. In this case, the selection unit 12A uses the frame image thus selected and the frame image of interest as a pair of motion estimation frame images.
  • the number of images to be processed by the motion estimation unit 12 and the image generation unit 13 is reduced, so that high-speed processing can be realized.
  • this selection method is based on the assumption that corresponding points can be detected in the frame image of interest.
  • the first estimation unit 12B estimates pixel motion caused by camera or subject motion between a pair of motion estimation frame images. Motion estimation is performed on a combination (pair) of any two frame images of the motion estimation frame images. The first estimation unit 12B performs motion estimation on at least one set of one or a plurality of pairs.
  • the first estimation unit 12B performs motion estimation on a pair of two frame images selected one by one from before and after the target frame image.
  • the first estimation unit 12B may perform motion estimation on a pair composed of the target frame image and one of the frame images selected from before and after.
  • When a plurality of frame images are selected from before and after the frame image of interest, the first estimation unit 12B compares the rectangular region luminance of the frame image of interest with that of the rectangular region at the same position in each selected frame image, and detects the regions in which the change rate of the rectangular region luminance exceeds a threshold γ. The first estimation unit 12B then forms pairs of frame images that share a common region in which the change rate exceeds the threshold γ, and performs motion estimation on the common region of each pair (the region surrounded by the dotted line in FIG. 7).
  • the threshold value ⁇ may be a preset value, but an appropriate value may be dynamically set so that motion estimation can be performed in a certain area.
  • Alternatively, using the determination flags between frame images other than the frame image of interest, which are input from the determination unit 11, the first estimation unit 12B may perform motion estimation on a pair of frame images whose mutual determination flag is "0".
  • The first estimation unit 12B may also perform motion estimation on a pair consisting of the frame image of interest and a frame image selected from either before or after it.
  • Because camera motion causes global motion of the entire screen, the image motion caused by the camera between a pair of motion estimation frame images can be expressed by an affine transformation.
  • Affine transformation is a geometric transformation that combines translation between two images and linear transformation (enlargement / reduction, rotation, skew).
  • This affine transformation is written as Equation (6). By obtaining the QR decomposition of the linear transformation matrix of Equation (6), Equation (6) can be rewritten as Equation (7).
  • The affine transformation parameters can be calculated by detecting, for three or more pixels on the image I, the corresponding points on the image I′, and substituting each pair of coordinates into Equation (7).
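• Since Equations (6) and (7) are not reproduced on this page, the following sketch estimates a standard six-parameter affine transformation from three or more corresponding points by least squares; the QR factorization into a rotation and an upper-triangular matrix mentioned above can then be applied to the resulting linear part. Function and variable names are illustrative.

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares fit of a 2-D affine transform mapping points on image I
    (src_pts, shape (N, 2), N >= 3) to corresponding points on image I'
    (dst_pts).  Returns the 2x2 linear matrix A and translation t such that
    dst ~= src @ A.T + t."""
    src = np.asarray(src_pts, dtype=np.float64)
    dst = np.asarray(dst_pts, dtype=np.float64)
    ones = np.ones((src.shape[0], 1))
    M = np.hstack([src, ones])                        # (N, 3) design matrix
    params, *_ = np.linalg.lstsq(M, dst, rcond=None)  # (3, 2) solution
    A = params[:2, :].T                               # 2x2 linear part
    t = params[2, :]                                  # translation
    return A, t

# The factorization used in Equation (7) corresponds to A = Q @ R with Q a
# rotation (QR decomposition), yielding the parameters (theta, a, b, d).
```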
  • the first estimation unit 12B can detect corresponding points by the following method, for example.
  • the first estimation unit 12B calculates an optical flow for the pixel P on the image I, and sets the pixel P ′ to which the pixel P is moved as a corresponding point.
  • a method based on the Lucas-Kanade method or the Horn-Schunck method can be cited.
  • the Lucas-Kanade method is a method for calculating the amount of movement of an image based on a constraint condition in which pixel values are approximately the same before and after movement (Non-Patent Document 3).
  • the Horn-Schunck method is a method for calculating the amount of movement of an image by minimizing the error function of the entire image while taking into account the smoothness between adjacent optical flows (Non-Patent Document 4).
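• A sketch of corresponding-point detection by the Lucas-Kanade method follows, assuming OpenCV's pyramidal implementation is available; the patent itself does not prescribe a particular library, so this is only one possible realization.

```python
import cv2
import numpy as np

def corresponding_points_lk(img_I, img_I_dash, pts):
    """Detect corresponding points P' on image I' for pixels P on image I
    using pyramidal Lucas-Kanade optical flow.  img_I and img_I_dash are
    8-bit grayscale frames; pts is an (N, 2) float array of P coordinates."""
    p0 = pts.reshape(-1, 1, 2).astype(np.float32)
    p1, status, _err = cv2.calcOpticalFlowPyrLK(img_I, img_I_dash, p0, None)
    ok = status.ravel() == 1                      # keep points that were tracked
    return p0[ok].reshape(-1, 2), p1[ok].reshape(-1, 2)
```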
  • Alternatively, the first estimation unit 12B may identify the region R′ on the image I′ that corresponds to the region R on the image I, and take the pixel P′ corresponding to the center coordinates of the region R′ as the corresponding point of the pixel P at the center coordinates of the region R.
  • The regions R and R′ may be rectangular regions obtained by dividing the images I and I′ into a grid of a predetermined size, or may be clusters generated by clustering pixels based on image features such as color and texture.
  • the first estimation unit 12B can detect the region R ′ by template matching using the region R as a template.
  • As the similarity index used for template matching, the first estimation unit 12B may use, for example, SSD (Sum of Squared Differences), SAD (Sum of Absolute Differences), or normalized cross-correlation (ZNCC: Zero-mean Normalized Cross-Correlation).
  • The normalized cross-correlation (R_ZNCC) is calculated from the template and image luminance values (T(i, j) and I(i, j)) and their average values (T_ave and I_ave), as shown in Equation (8).
  • Because the averages are subtracted, the similarity can be evaluated stably even when the brightness varies. Therefore, by using normalized cross-correlation, the first estimation unit 12B can detect the region R′ more stably than with the other indices, even when there is a luminance difference between the pair of motion estimation frame images due to flash light.
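• A NumPy sketch of the zero-mean normalized cross-correlation of Equation (8), and of a brute-force template search based on it, follows; the exhaustive scan and the function names are illustrative only.

```python
import numpy as np

def zncc(template, patch):
    """Zero-mean normalized cross-correlation between a template T and an
    equally sized image patch I: luminance values are centred on their
    averages (T_ave, I_ave) before correlation, which makes the score
    insensitive to brightness differences such as those caused by flash
    light.  Returns a value in [-1, 1]."""
    T = template.astype(np.float64) - template.mean()
    I = patch.astype(np.float64) - patch.mean()
    denom = np.sqrt((T ** 2).sum() * (I ** 2).sum())
    return float((T * I).sum() / denom) if denom > 0 else 0.0

def match_template_zncc(image, template):
    """Find the top-left corner of the region R' in `image` that maximizes
    the ZNCC score with `template` (the region R), by exhaustive scan."""
    h, w = template.shape
    H, W = image.shape
    scores = np.full((H - h + 1, W - w + 1), -1.0)
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            scores[y, x] = zncc(template, image[y:y + h, x:x + w])
    return np.unravel_index(np.argmax(scores), scores.shape)
```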
  • The first estimation unit 12B may also detect the pixel P′ corresponding to the pixel P at the center coordinates of the region R using the optical flow. For example, the first estimation unit 12B may use the representative value (weighted average or median) of the optical flow estimated for each pixel in the region R as the movement amount of the region R, and take the pixel P′ to which the pixel P is moved by this movement amount as the corresponding point.
  • Alternatively, the first estimation unit 12B may extract pixels P corresponding to feature points from the image I and take, as corresponding points, the pixels P′ on the image I′ to which those pixels have moved.
  • The first estimation unit 12B may use, for example, corner points detected by the Harris corner detection algorithm as feature points. The Harris corner detection algorithm is based on the observation that at a point on an edge the first derivative (difference) is large in only one direction, whereas at a corner point it is large in multiple directions; it extracts points at which the Harris operator dst(x, y), which expresses this property, takes a large positive local maximum.
  • fx and fy mean primary differential values (differences) in the x and y directions, respectively.
  • G ⁇ means smoothing by a Gaussian distribution with a standard deviation ⁇ .
  • k is a constant, and a value from 0.04 to 0.15 is empirically used.
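• A sketch of the Harris operator described above follows, assuming SciPy's Gaussian filter for the smoothing G_sigma; the values of sigma and k are illustrative (k is typically chosen between 0.04 and 0.15).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(gray, sigma=1.0, k=0.06):
    """Harris operator dst(x, y): first derivatives fx and fy, their products
    smoothed by a Gaussian G_sigma, and the response det(M) - k * trace(M)^2.
    Corner candidates are points where the response has a large positive
    local maximum."""
    img = gray.astype(np.float64)
    fy, fx = np.gradient(img)                 # first differential values
    Sxx = gaussian_filter(fx * fx, sigma)     # smoothed structure-tensor entries
    Syy = gaussian_filter(fy * fy, sigma)
    Sxy = gaussian_filter(fx * fy, sigma)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2
```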
  • the first estimation unit 12B may identify the corresponding point based on the optical flow detected at the feature point.
  • Alternatively, when the image feature value (for example, a SIFT (Scale-Invariant Feature Transform) feature value) extracted from an image patch containing a feature point of the image I is similar to the image feature value extracted from an image patch of the image I′, the first estimation unit 12B may take the center of that patch of the image I′ as the corresponding point P′.
  • The first estimation unit 12B may calculate the affine transformation parameters from three reliable combinations of corresponding points among the corresponding points detected by the above methods, or may calculate them by the least-squares method based on a larger number of combinations of corresponding points. Alternatively, the first estimation unit 12B may calculate the affine transformation parameters using a robust estimation method such as RANSAC (RANdom SAmple Consensus). RANSAC is a method that repeatedly computes tentative affine transformation parameters from three randomly selected combinations of corresponding points and adopts, as the true affine transformation parameters, tentative parameters that are consistent with many of the remaining combinations of corresponding points.
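• A sketch of the RANSAC procedure just described follows: tentative affine parameters are computed from three randomly chosen correspondences and the parameters supported by the most correspondences are kept. The iteration count, inlier threshold, and names are illustrative.

```python
import numpy as np

def ransac_affine(src_pts, dst_pts, n_iter=500, inlier_thresh=3.0, rng=None):
    """RANSAC estimation of affine parameters from corresponding points.
    src_pts/dst_pts: (N, 2) arrays of corresponding points (N >= 3).
    Returns the (3, 2) parameter matrix mapping (x, y, 1) -> (x', y') and
    the number of supporting correspondences."""
    rng = np.random.default_rng(rng)
    src = np.asarray(src_pts, float)
    dst = np.asarray(dst_pts, float)
    M = np.hstack([src, np.ones((len(src), 1))])         # (N, 3)
    best_params, best_inliers = None, -1
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=3, replace=False)
        try:
            params = np.linalg.solve(M[idx], dst[idx])   # exact 3-point fit
        except np.linalg.LinAlgError:
            continue                                     # degenerate (collinear) sample
        residuals = np.linalg.norm(M @ params - dst, axis=1)
        n_inliers = int((residuals < inlier_thresh).sum())
        if n_inliers > best_inliers:
            best_inliers, best_params = n_inliers, params
    return best_params, best_inliers
```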
  • the first estimation unit 12B may exclude a specific image region from the calculation target of the affine transformation parameter.
  • Examples of such image regions are regions where the corresponding-point detection accuracy is known to be low, such as the edge portions of an image, which are likely to fall outside the shooting range when the camera moves, or flat portions with little luminance difference from adjacent pixels.
  • Other examples are image regions whose pixel values change due to factors other than camera movement, such as a region in the center of the screen where a moving subject is likely to appear, or a portion illuminated by fixed lighting whose color changes.
  • The combination of the methods (12B-1), (12B-2), and (12B-3) with the methods (12A-1), (12A-2), and (12A-3) described above is not particularly limited. That is, the first estimation unit 12B may execute any of (12B-1), (12B-2), and (12B-3) on the motion estimation frame images selected by any of the methods (12A-1), (12A-2), and (12A-3).
  • the first estimation unit 12B may use camera motion information acquired by a measuring instrument (gyroscope, depth sensor, etc.) mounted on the camera in addition to the motion estimation by the image processing described above.
  • Second estimation unit 12C: Regarding the image motion caused by the motion of the subject, the second estimation unit 12C obtains it by detecting the subject region from one of the pair of motion estimation frame images and estimating the corresponding region (the region corresponding to the subject region) in the other. Alternatively, the second estimation unit 12C may generate converted images by applying the affine transformation to one or both of the pair of motion estimation frame images, and may detect the subject region from one of the pair of motion estimation frame images or from its converted image. In that case, the second estimation unit 12C may obtain the image motion caused by the motion of the subject by estimating the corresponding region in the other frame image of the pair or in its converted image.
  • In other words, the second estimation unit 12C detects a pair consisting of the subject region and its corresponding region by subtracting, based on the affine transformation parameters, the image movement caused by the camera motion from the pair of motion estimation frame images. Based on this pair of regions, the second estimation unit 12C estimates the amount of image movement caused by the motion of the subject.
  • Examples of the subject area detection method include the following methods.
  • the second estimation unit 12C detects an image (a set of pixels) that moves differently from the movement amount estimated by the affine transformation parameter from one of the pair of motion estimation frame images as a subject area.
  • Specifically, for each pixel P of the image I, the second estimation unit 12C uses Equation (7) and the affine transformation parameters calculated between the image I and the image I′ to calculate a prediction vector (u, v) from the image I to the image I′.
  • The second estimation unit 12C then selects the pixel P as a candidate point when the difference between the vector (x′ - x, y′ - y) from the pixel P to its corresponding pixel P′ and the prediction vector (u, v) is equal to or greater than a certain value.
  • calculating the vector difference means subtracting the amount of movement of the image due to the movement of the camera.
  • the second estimation unit 12C detects the set of candidate points as the subject area of the image I.
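• A sketch of this subject-region detection, which subtracts the camera-motion prediction from the observed per-pixel motion, follows; it assumes a dense flow field is available, and the residual threshold and names are illustrative.

```python
import numpy as np

def detect_subject_region(flow, A, t, min_residual=2.0):
    """Detect the subject region as the set of pixels whose motion differs
    from the motion predicted by the affine (camera-motion) parameters.
    flow: (H, W, 2) per-pixel motion vectors from image I to image I'
    (e.g. dense optical flow); A (2x2) and t (2,) are the affine parameters.
    The prediction vector (u, v) = A @ (x, y) + t - (x, y) is subtracted
    from the observed vector; pixels whose residual is at least
    `min_residual` pixels become candidate points of the subject region."""
    H, W = flow.shape[:2]
    ys, xs = np.mgrid[0:H, 0:W]
    coords = np.stack([xs, ys], axis=-1).astype(np.float64)   # (x, y) per pixel
    predicted = coords @ A.T + t - coords                     # prediction vectors (u, v)
    residual = np.linalg.norm(flow - predicted, axis=-1)
    return residual >= min_residual                           # boolean subject-region mask
```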
  • Alternatively, the second estimation unit 12C may calculate the difference between a converted image generated by affine transformation of one of the pair of motion estimation frame images and a converted image generated by affine transformation (inverse transformation) of the other, and detect regions with a large difference as subject regions in both converted images.
  • Specifically, using Equation (7) and the affine transformation parameters calculated between the image I and the image I′, the second estimation unit 12C generates a predicted image I_p at an arbitrary time t from the image I.
  • Similarly, the second estimation unit 12C generates a predicted image I_p′ at time t from the image I′ based on the affine transformation parameters calculated between the image I and the image I′.
  • The second estimation unit 12C calculates the difference between the predicted images I_p and I_p′, and detects the set of pixels whose absolute difference is equal to or larger than a certain value as the subject region in each of the predicted images I_p and I_p′.
  • The second estimation unit 12C can generate the pixel (x_p, y_p) on the predicted image I_p by substituting the pixel (x, y) of the image I into Equation (9).
  • Here, the affine transformation parameters between the image I and the image I_p are (θ_p, a_p, b_p, d_p, t_px, t_py).
  • (θ_p, a_p, b_p, d_p, t_px, t_py) can be calculated by the relational expressions that follow, where the affine transformation parameters from the image I to the image I′ are (θ, a, b, d, t_x, t_y), the time difference between the image I and the image I′ is T, and the time difference between the image I and the image I_p is T_p.
  • The second estimation unit 12C may also calculate (θ_p, a_p, b_p, d_p, t_px, t_py) by weighting the rate of change.
  • Likewise, the second estimation unit 12C can generate the pixel (x_p′, y_p′) of the predicted image I_p′ by substituting the pixel (x′, y′) of the image I′ into Equation (10).
  • Here, the affine transformation parameters between the image I′ and the image I_p′ are (θ_p′, a_p′, b_p′, d_p′, t_px′, t_py′), and they are obtained by similar relational expressions, where the affine transformation parameters from the image I′ to the image I are (θ′, a′, b′, d′, t_x′, t_y′), the time difference between the image I and the image I′ is T, and the time difference between the image I′ and the image I_p′ is T_p′.
  • The second estimation unit 12C may also detect a region having a large difference between the converted image generated by affine transformation of one of the pair of motion estimation frame images and the other frame image, as the subject region in each of the converted image and the frame image.
  • This detection method is a derivative of (12C-1-2).
  • Specifically, using Equation (7) and the affine transformation parameters calculated between the image I and the image I′, the second estimation unit 12C generates a predicted image at time t+k from the image I and calculates its difference from the image I′.
  • When the second estimation unit 12C has detected the subject region, it estimates the corresponding region for the detected subject region. Examples of methods for estimating the corresponding region of the subject region include the following; the second estimation unit 12C may use each method alone or in combination.
  • The second estimation unit 12C calculates, for all pixels in the subject region detected in one of the pair of motion estimation frame images, the optical flow with respect to the other frame image, and detects, as the corresponding region, the destination obtained by moving the subject region by the weighted average of the optical flow.
  • Alternatively, the second estimation unit 12C may calculate, for all pixels in the subject region detected in the converted image generated by affine transformation of one of the pair, the optical flow with respect to the other frame image or its converted image.
  • the second estimation unit 12C may give a high weight to the optical flow of the pixels close to the center of gravity of the subject region as the weight used in the calculation of the weighted average of the optical flow.
  • The second estimation unit 12C may also give a high weight to the optical flow of pixels in the subject region that have a large luminance gradient with respect to their surroundings, or to the optical flow of pixels for which the variance of the direction or magnitude relative to the optical flows calculated for the surrounding pixels is small.
  • The second estimation unit 12C may also exclude, as outliers, flows whose magnitude is greater than or equal to a certain value or less than or equal to another value among the optical flows of the subject region, and weight the remaining optical flows equally.
  • By setting the weights based on the luminance gradient and on the variance of the direction or magnitude of the optical flow, the second estimation unit 12C can estimate the position of the corresponding region based on highly reliable optical flows.
  • The second estimation unit 12C uses, as a template, the subject region detected in one of the pair of motion estimation frame images or in its affine-transformed converted image, and detects the corresponding region by template matching while scanning the other frame image or its affine-transformed converted image.
  • the second estimation unit 12C may use any of the indices described in (12B-2) as a similarity index used for template matching, or may use another method.
  • Alternatively, the second estimation unit 12C may detect the corresponding region based on the distance (for example, the Euclidean distance) between image feature values expressing color and texture. For example, the second estimation unit 12C may extract an image feature value from the subject region detected in one of the pair of motion estimation frame images and detect, as the corresponding region, a region of the other frame image whose image feature value is at a small distance from the extracted one.
  • The second estimation unit 12C may also roughly estimate the position of the corresponding region by template matching using the entire subject region as a template, and then determine the corresponding region by searching again around each partial region generated by dividing the subject region.
  • The second estimation unit 12C detects feature points in the subject region detected in one of the pair of motion estimation frame images or in its affine-transformed converted image, and detects the optical flow by detecting the points corresponding to those feature points in the other frame image or its converted image.
  • the second estimator 12C detects, as a corresponding region, a destination that has moved the subject region by the weighted average of the detected optical flow.
  • the second estimation unit 12C may use, for example, a Harris corner point as a feature point, or may use a feature point detected by another method.
  • The second estimation unit 12C may execute any of (12C-2-1), (12C-2-2), and (12C-2-3) on the subject region detected by any of the methods (12C-1-1), (12C-1-2), and (12C-1-3).
  • After detecting the subject region and estimating its corresponding region, the second estimation unit 12C estimates the motion of the subject. Examples of methods for estimating the motion of the subject include the following.
  • (12C-3-1) Motion estimation method 1: When the second estimation unit 12C has detected, as the subject region, a set of pixels in one of the pair of motion estimation frame images that move differently from the movement amount estimated by the affine transformation parameters (12C-1-1), it estimates the motion of the subject by the following method.
  • the second estimation unit 12C calculates the difference between the position information (coordinates) representing the position of the subject area and the position information of the corresponding area, and uses this as a temporary movement vector of the subject area.
  • the second estimation unit 12C calculates a difference between the temporary movement vector and the movement vector of the image due to the camera movement in the pair of motion estimation frame images, and sets the difference as the true movement vector of the subject area between the pair. .
  • (12C-3-2) Motion estimation method 2: When the second estimation unit 12C has detected, as subject regions in both converted images, regions having a large difference between the converted images generated by affine transformation of each of the pair of motion estimation frame images (12C-1-2), it estimates the motion of the subject by the following method.
  • The second estimation unit 12C calculates the difference between the position information of the subject region in one converted image and the position information of the corresponding region detected in the other converted image, and takes this difference as the true movement vector of the subject between the pair of motion estimation frame images.
  • (12C-3-3) Motion estimation method 3: When the second estimation unit 12C has detected, as the subject region, a region having a large difference between the converted image generated by affine transformation of one of the pair of motion estimation frame images and the other frame image (12C-1-3), it estimates the motion of the subject by the following method.
  • The second estimation unit 12C calculates the difference between the position information of the subject region in the converted image and the position information of the corresponding region detected in the other frame image, and takes this difference as the true movement vector of the subject between the pair of motion estimation frame images. This estimation method is a derivative form of (12C-3-2) described above.
  • the motion estimation unit 12 outputs the estimated motion information to the image generation unit 13.
  • the motion information includes at least one of motion information caused by camera motion and motion information caused by subject motion.
  • the motion estimation unit 12 outputs the time of each frame of the pair of motion estimation frame images used for motion estimation and the affine transformation parameters calculated between the pair as motion information resulting from the motion of the camera.
  • the motion estimation unit 12 outputs motion information resulting from the motion of the camera by the number of pairs of motion estimation frame images that have undergone motion estimation.
  • The motion estimation unit 12 outputs, as motion information resulting from the motion of the subject, each frame image of the motion estimation frame image pair used for estimating the motion of the subject and its time, the position information of the subject region, the position information of the corresponding region of the subject region, and the true movement vector of the subject.
  • the position information of the subject region represents one coordinate of the pair of motion estimation frame images.
  • the position information of the corresponding region represents the other coordinate of the pair of motion estimation frame images.
  • When the motion estimation unit 12 detects the subject region and estimates its corresponding region in the converted images generated by affine transformation of the pair of motion estimation frame images, the motion estimation unit 12 outputs the motion information resulting from the motion of the subject as follows.
  • The motion estimation unit 12 outputs the time of each frame of the pair of motion estimation frame images used for the motion estimation of the subject, the position information of the subject region, the position information of the corresponding region of the subject region, and the true movement vector of the subject.
  • the position information of the subject region represents coordinates in a converted image generated by affine transformation of one of the pair of motion estimation frame images.
  • the position information of the corresponding region represents coordinates in a converted image generated by affine transformation of the other of the pair of motion estimation frame images.
  • the motion estimation unit 12 outputs motion information resulting from the motion of the subject for the number of pairs of motion estimation frame images for which motion estimation has been performed.
  • FIG. 8 is a block diagram illustrating a configuration of the image generation unit 13.
  • the image generation unit 13 includes a first correction unit 13A, a second correction unit 13B, and a synthesis unit 13C.
  • The image generation unit 13 receives a plurality of frame images, the analysis information from the determination unit 11, and the motion information from the motion estimation unit 12 as inputs. When the frame image of interest is determined to be a frame image including a bright region caused by the blinking of light, the image generation unit 13 corrects the motion estimation frame images to images at the time of the frame image of interest and outputs them as corrected frame images.
  • the first correction unit 13A first generates a first corrected image by correcting the motion of the camera for each motion estimation frame image.
  • the second correction unit 13B generates a second corrected image by correcting the motion of the subject for each motion estimation frame image.
  • The synthesizing unit 13C combines the second corrected images generated for the respective motion estimation frame images to generate a corrected frame image.
  • the first correction unit 13A corrects the camera motion by, for example, the following method based on the image data of the pair of motion estimation frame images and the affine transformation parameters calculated between the pair.
  • the first correction unit 13A determines that there is no camera movement when each value of the affine transformation parameter is smaller than a preset threshold value, and does not need to correct the camera movement. In this case, the first correction unit 13A regards the uncorrected motion estimation frame image as the first corrected image.
  • When the frame images closest to the frame image of interest that do not include a bright region are selected one each from before and after it as the motion estimation frame images (12A-1), the first correction unit 13A generates the first corrected images by the following method. The first correction unit 13A uses the affine transformation parameters calculated between the two selected frame images to generate a corrected image from each of these frame images.
  • Specifically, taking one of the motion estimation frame images as the image I and the other as the image I′, the first correction unit 13A generates the predicted images I_p and I_p′ at the time t of the frame image of interest as the first corrected images.
  • When a plurality of frame images are selected as the motion estimation frame images from before and after the frame image of interest (12A-2), the first correction unit 13A generates the first corrected images by the following method.
  • the first correction unit 13A generates a first correction image from each pair based on each affine transformation parameter calculated from a plurality of pairs of motion estimation frame images.
  • For each pair, taking one of the motion estimation frame images as the image I and the other as the image I′, the first correction unit 13A generates the predicted images I_p and I_p′ at time t as first corrected images. For example, as illustrated in FIG. 7, when two frame images are selected from before and after the frame image of interest and motion estimation is performed for two pairs, the four predicted images generated for the selected frames at the time of the frame image of interest are used as the first corrected images.
  • When the frame image of interest and one frame image from before or after it are selected as the motion estimation frame images (12A-3), the first correction unit 13A generates the first corrected image by the following method: it generates the first corrected image from the selected frame image based on the affine transformation parameters calculated between the frame image of interest and the selected frame image.
  • Specifically, taking the frame image selected as the motion estimation frame image as the image I, the first correction unit 13A generates the predicted image I_p at the time t of the frame image of interest as the first corrected image.
  • The second correction unit 13B corrects the movement of the subject by updating the pixel information at the position of the subject at the time of the frame image of interest.
  • the second correction unit 13B can correct the movement of the subject by the following method.
  • The second correction unit 13B may determine that there is no movement of the subject when each value of the true movement vector of the subject is smaller than a preset threshold, and in that case it does not correct the movement of the subject and regards the first corrected image as the second corrected image.
  • Based on the true movement vector of the subject between the pair of motion estimation frame images and on the time information of the pair and of the frame image of interest, the second correction unit 13B determines the true movement vector of the subject between each frame image of the pair and the frame image of interest.
  • Using the pixel values of the subject region identified in the first corrected image, the second correction unit 13B updates the pixel values at the coordinates reached by moving from the coordinates of that subject region by the true movement vector, and the pixel values at the coordinates of the subject region identified in the first corrected image. The second correction unit 13B thereby generates a second corrected image.
  • For example, the second correction unit 13B may update the pixel value at the destination by replacing it with the pixel value of the subject region, by replacing it with a weighted average of the destination pixel value and the pixel value of the subject region, or by replacing it with a weighted average based on the pixel values around the destination and the pixel values of the subject region.
• the second correction unit 13B may replace the pixel value at the coordinates of the subject area with the pixel value at the position shifted by the inverse vector of the true movement vector.
• the second correction unit 13B may also replace the pixel value at the coordinates of the subject area with a weighted average with the pixel value at the position shifted by the inverse vector of the true movement vector, or with a weighted average of that pixel value and its surrounding pixels.
  • the true movement vector of the subject between each frame image of the pair of frame images for motion estimation and the target frame image is obtained by the following equation.
  • the true movement vector of the subject area between the frame images I1 and I2 constituting the pair of motion estimation frame images is V
  • the times of the frame images I1 and I2 are T1 and T2, respectively
• the time of the frame image of interest is T3 (T1 < T3 < T2).
• the second correction unit 13B regards a pixel of the first corrected image that corresponds to a pixel determined to be in the subject area in the motion estimation frame image as a pixel of the subject area, and can thereby specify the subject area in the first corrected image.
  • the combining unit 13C can generate a corrected frame image by combining a plurality of second corrected images.
• the combining unit 13C can generate the corrected frame image Ic using equation (13).
  • the number of second correction images is N
  • the weight is wi.
• the weight wi is set larger as the absolute value of Di (the time difference between the frame image of interest and the frame image from which the i-th second corrected image was generated) becomes smaller.
• the combining unit 13C may calculate wi based on a function that decreases linearly as |Di| increases.
  • the image synthesizing unit 14 synthesizes the frame image of interest and the correction frame image to generate and output a frame image (hereinafter referred to as “output frame image”) in which blinking due to flash or the like is suppressed.
• when the frame image of interest is determined to include a blinking region, the image composition unit 14 calculates a composition ratio for each pixel and generates an output image by composition processing. In other cases, the image composition unit 14 uses the input frame image of interest as the output frame image as it is. Given the composition ratio u(x, y) at the target pixel It(x, y) at the position (x, y), the image composition unit 14 calculates the value Iout(x, y) of the output frame image at the same position as shown in equation (14) (a sketch of this composition and of the weighted combination above appears after this list).
  • the image composition unit 14 can calculate the composition ratio using the change rate of the local area luminance between the target frame image and the corrected frame image.
• the image composition unit 14 can calculate the local region luminance change rate rt-es between the frame image of interest and the corrected frame image using a method similar to that by which the determination unit 11 calculates the local region luminance change rate.
• the image composition unit 14 determines the composition ratio u(x, y) at the target pixel at the position (x, y) according to the change rate rt-es(x, y) of the local region luminance at the same position (x, y).
• the image composition unit 14 calculates the composition ratio u(x, y) so that the change rate of the local area luminance in the output frame image becomes rtar(x, y).
• the image composition unit 14 may calculate the composition ratio using the change rate of the rectangular area luminance. Specifically, the image composition unit 14 first calculates the composition ratio U for each rectangular area from the rectangular area luminance change rate Rt-es, computed with the same method as the determination unit 11, and the change rate of the rectangular area luminance of the output frame image that is preset in correspondence with Rt-es. Next, the image composition unit 14 obtains the composition ratio u for each pixel from the composition ratio U for each rectangular area using linear interpolation or bicubic interpolation.
  • the determination unit 11 determines whether or not the frame image of interest at time t is a frame image including a bright region caused by blinking of light due to flash or the like that may induce a photosensitivity seizure (S11).
• the motion estimation unit 12 selects a motion estimation frame image from a plurality of frame images including the frame image of interest, and estimates the amount of image movement due to the motion of the camera and the subject between the motion estimation frame images (S12).
• the image generation unit 13 estimates the image movement caused by the camera and the subject between each motion estimation frame image and the target frame image, based on the pixel movement amounts due to the camera and subject motion estimated between the motion estimation frame images. In addition, the image generation unit 13 converts each motion estimation frame image into an image at the time of the target frame image, and generates a corrected frame image by synthesizing the converted images (S13).
  • the image synthesizing unit 14 synthesizes the attention frame image and the correction frame image, and generates and outputs an output frame image in which blinking due to flash or the like is suppressed (S14).
• the video processing apparatus 100 can generate a natural video in which variation in luminance is suppressed, even for a video including large luminance changes that may induce a photosensitivity seizure.
• the video processing apparatus 100 synthesizes, onto a target frame image including a region with a large luminance change, a frame image without that luminance change estimated from other frame images, while changing the weight for each pixel. As a result, the video processing apparatus 100 can correct only an area where there is a large luminance change and restore information lost due to blinking or the like.
• blinking by flash or the like occurs, for example, at a press conference.
• a subject (a participant in the conference) sits down at the conference seat and leaves after the conference.
  • the camera follows the subject as the subject moves. In this case, the shooting range of the camera moves following the subject.
• because the video processing apparatus 100 corrects the image by estimating the movement of the camera and the subject, it can suppress blurring and smearing of contours and generate a smooth video.
  • the video processing apparatus 100 can be similarly applied to a case where the blinking region is a dark region that is darker than the other frame images by a predetermined level or more (the luminance is reduced) in the frame image of interest.
• the determination unit 11 determines whether the frame image of interest at time t contains an area that is darker, by a predetermined level or more, than the frame image at time (t + k) among the plurality of input frame images. For example, the determination unit 11 uses the preset threshold value α′ of the luminance fluctuation rate and the threshold value β′ of the area rate, and makes the judgment based on whether the area rate of the region where the local region luminance change rate rt−t+k is lower than the threshold value α′ exceeds the threshold β′.
• when it is determined that the frame image of interest at time t contains a region that is darker, by a predetermined level or more, than the input frame image at time (t + k), the determination unit 11 sets the determination flag flagt-t+k to “1”. Otherwise, the determination unit 11 sets the determination flag flagt-t+k to “0”. The determination unit 11 calculates a determination flag for the combination of the target frame image and every other input frame image, and determines that the target frame image is a frame image including a dark region due to blinking of light when there is a frame image whose determination flag is “1” at each of the times before and after the target frame image.
• the determination unit 11 may instead use a method based on the change rate of the rectangular area luminance. For example, using the preset threshold value α′ for the luminance fluctuation rate and the threshold value β′ for the area ratio, the determination unit 11 sets the determination flag flagt-t+k to “1” or “0” depending on whether or not the area ratio of the region where the change rate of the rectangular area luminance is lower than the threshold value α′ exceeds the threshold β′.
  • the video processing apparatus 100 can be similarly applied to a change in saturation such as red flash. Therefore, the above-described embodiment may include a mode in which “luminance” is replaced with “saturation” or “luminance or saturation”.
  • the embodiment according to the present invention can be applied to a video editing system for editing video recorded on a hard disk or the like.
  • the embodiment according to the present invention can be applied to a video camera, a display terminal, and the like by using a frame image held in a memory.
  • the embodiment according to the present invention can be configured by hardware, but can also be realized by a computer program.
  • the video processing apparatus 100 realizes the same functions and operations as those in the above-described embodiment by a processor that operates according to a program stored in the program memory.
  • only a part of the functions can be realized by a computer program.
  • FIG. 11 is a block diagram illustrating a hardware configuration of the computer apparatus 200 that implements the video processing apparatus 100.
• the computer apparatus 200 includes a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, a storage device 204, a drive device 205, a communication interface 206, and an input/output interface 207.
  • the video processing apparatus 100 can be realized by the configuration (or part thereof) shown in FIG.
  • the CPU 201 executes the program 208 using the RAM 203.
  • the program 208 may be stored in the ROM 202.
  • the program 208 may be recorded on a recording medium 209 such as a flash memory and read by the drive device 205 or transmitted from an external device via the network 210.
  • the communication interface 206 exchanges data with an external device via the network 210.
  • the input / output interface 207 exchanges data with peripheral devices (such as an input device and a display device).
  • the communication interface 206 and the input / output interface 207 can function as means for acquiring or outputting data.
  • the video processing apparatus 100 may be configured by a single circuit (such as a processor) or a combination of a plurality of circuits.
  • the circuit here may be either dedicated or general purpose.
• (Appendix 1) A video processing apparatus comprising: determination means for determining whether any of a plurality of temporally continuous frame images is a frame image of interest including a blinking region whose luminance or saturation differs by a predetermined level or more from the preceding and following frame images; motion estimation means for estimating a first movement amount caused by movement of the camera and/or a second movement amount caused by movement of the subject, based on a pair of frame images selected, on the basis of a difference in luminance or saturation, from the frame image of interest and the frame images before and after it; image generation means for generating a correction frame image corresponding to a frame image at the shooting time of the frame image of interest, based on the selected pair and the estimated first movement amount and/or second movement amount; and image synthesizing means for synthesizing the frame image of interest and the correction frame image.
• (Appendix 2) The video processing apparatus according to appendix 1, wherein the motion estimation means includes selection means for selecting at least one of the pair from frame images other than the frame image of interest.
• (Appendix 3) The video processing apparatus according to appendix 2, wherein the motion estimation means includes first estimation means for calculating a geometric transformation parameter based on a positional relationship between corresponding points or corresponding regions detected between the pair of frame images, and estimating the first movement amount.
• the motion estimation means may include second estimation means for detecting a subject area from one frame image of the pair based on the first movement amount, detecting a corresponding area corresponding to the subject area from the other frame image of the pair, and estimating the second movement amount based on the subject area and the corresponding area.
• the motion estimation means may include second estimation means for detecting a subject area from each frame image of the pair by subtracting the first movement amount based on the geometric transformation parameter, and estimating the second movement amount based on the detected subject area.
• the image generation means includes: first correction means for generating a first corrected image from each frame image of the pair based on the first movement amount; second correction means for generating a second corrected image from each of the first corrected images based on the second movement amount; and synthesis means for combining the second corrected images (the video processing apparatus according to any one of appendices 1 to 5).
• the determination means determines, as the frame image of interest, a frame image in which a region whose change rate of luminance or saturation with respect to another frame image is equal to or greater than a specified value occupies at least a specified area.
• the image composition means calculates the composition ratio for combining the frame image of interest and the correction frame image based on a predetermined function (the video processing apparatus according to any one of appendices 1 to 7).
• the image composition means sets, as the composition ratio for compositing the frame image of interest and the correction frame image, a larger ratio of the correction frame image for an area where the rate of change between the frame image of interest and the correction frame image is large (the video processing device according to any one of appendices 1 to 8).
• a subject area is detected from one frame image of the pair based on the first movement amount, a corresponding area corresponding to the subject area is detected from the other frame image of the pair, and the second movement amount is estimated based on the subject area and the corresponding area.
• a subject area is detected from each frame image of the pair by subtracting the first movement amount based on the geometric transformation parameter, and the second movement amount is estimated based on the detected subject area.
• (Appendix 17) The video processing method according to any one of appendices 10 to 16, wherein a synthesis ratio for synthesizing the frame image of interest and the correction frame image is calculated based on a predetermined function.
• the composition ratio of the correction frame image is increased for an area where the rate of change between the frame image of interest and the correction frame image is large.
• (Appendix 22) In the estimation process, a subject area is detected from each frame image of the pair by subtracting the first movement amount based on the geometric transformation parameter, and the second movement amount is estimated based on the detected subject area.
• a frame image that includes a region whose change rate of luminance or saturation with respect to another frame image is equal to or greater than a specified value over at least a specified area is determined to be the frame image of interest.
• the composition ratio of the correction frame image is set large for an area where the rate of change between the frame image of interest and the correction frame image is large.
• A video processing apparatus comprising: selection means for selecting a first frame image and a second frame image from a plurality of temporally continuous frame images; first estimation means for calculating a geometric transformation parameter based on a positional relationship between corresponding points or corresponding regions detected between the first frame image and the second frame image, and estimating a first movement amount caused by camera movement; and second estimation means for detecting a subject area from the first frame image and the second frame image by subtracting the first movement amount based on the geometric transformation parameter, and estimating a second movement amount caused by subject movement based on the detected subject area.
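To make the combination and composition steps listed above concrete, the sketch below shows one way they could be implemented. The exact forms of equations (13) and (14) are only available as images in the original, so the inverse-distance weighting over the time offsets Di and the linear blending by the composition ratio u(x, y) are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

def combine_corrected_images(corrected, time_offsets, eps=1e-6):
    """Blend N second corrected images into one corrected frame image.

    corrected    : list of N float arrays (H, W[, C]) warped to the time of the frame of interest
    time_offsets : list of N time differences Di between each source frame and the
                   frame of interest (assumed definition of Di)
    Weights favour images generated from temporally closer frames (assumption;
    a stand-in for equation (13)).
    """
    w = np.array([1.0 / (abs(d) + eps) for d in time_offsets])
    w /= w.sum()
    stack = np.stack(corrected, axis=0).astype(np.float64)
    # weighted sum over the N second corrected images
    return np.tensordot(w, stack, axes=1)

def compose_output_frame(frame_t, corrected_frame, u):
    """Per-pixel composition of the frame of interest with the corrected frame.

    u : composition ratio in [0, 1] per pixel, larger where the luminance change
        between the two frames is larger. Stand-in for equation (14), assumed to
        be linear blending.
    """
    if frame_t.ndim == 3 and u.ndim == 2:
        u = u[..., None]  # broadcast the ratio over colour channels
    return (1.0 - u) * frame_t + u * corrected_frame
```

With this kind of blending, a ratio u close to 1 in flash-brightened areas replaces them with the corrected frame, while u close to 0 leaves unaffected areas unchanged, which matches the per-pixel weighting described in the items above.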

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)
  • Controls And Circuits For Display Device (AREA)

Abstract

The present invention provides an image processing device capable of generating a natural image in which flickering is reduced. An image processing device 100 is provided with a determining unit 11, a motion estimation unit 12, an image generating unit 13, and an image synthesizing unit 14. The determining unit 11 determines which one of a plurality of temporally consecutive frame images is a target frame image including a flickering region in which the brightness or chroma is different from that of the preceding or subsequent frame image by a predetermined level or higher. The motion estimation unit 12 estimates a first movement amount caused by a movement of a camera and/or a second movement amount caused by a movement of a subject on the basis of a pair of frame images selected on the basis of a difference in brightness or chroma between the target frame image and the preceding or subsequent frame image. On the basis of the selected pair and the estimated first movement amount and/or second movement amount, the image generating unit 13 generates a correction frame image corresponding to a frame image at the time when the target frame image is captured. The image synthesizing unit 14 synthesizes the target frame image and the correction frame image.

Description

映像処理装置、映像処理方法及びプログラム記録媒体Video processing apparatus, video processing method, and program recording medium
 本発明は、映像処理装置、映像処理方法及びプログラム記録媒体に関する。 The present invention relates to a video processing device, a video processing method, and a program recording medium.
 一部の映像コンテンツは、視聴者に対して、生理的に悪影響を与える可能性がある。このような影響の一つは、光過敏性発作の発症である。光過敏性発作は、光刺激に対する異常反応の症状の一つであり、痙攣や意識障害などの癲癇(てんかん)に似た症状を示す発作である。 一部 Some video content may have a physiological adverse effect on viewers. One such effect is the development of a photosensitivity attack. Photosensitivity seizure is one of the symptoms of an abnormal response to light stimulation, and is a seizure showing symptoms similar to epilepsy such as convulsions and disturbance of consciousness.
 このような影響の発生を抑制するために、人体に悪影響がある映像コンテンツの配信を抑制する試みが実施されている。例えば、国際電気通信連合(ITU)は、映像コンテンツは光過敏性発作を発生させる危険性があることを映像配信組織が映像コンテンツ製作者に対して周知するよう勧告している(非特許文献1)。また、日本においては、日本放送協会と日本民間放送連盟が、特にアニメーションの製作に関しガイドラインを制定し、放送に携わる者に遵守するよう求めている(非特許文献2)。 In order to suppress the occurrence of such effects, attempts are being made to suppress the distribution of video content that has a negative effect on the human body. For example, the International Telecommunication Union (ITU) recommends that video distribution organizations inform video content producers that video content is at risk of causing photosensitivity attacks (Non-Patent Document 1). ). In Japan, the Japan Broadcasting Corporation and the Japan Broadcasting Corporation have established guidelines for animation production in particular, and are demanding compliance with those who are involved in broadcasting (Non-Patent Document 2).
 しかし、報道映像のような速報性が求められる映像コンテンツを生放送する際に、映像が光過敏性発作を誘発する可能性のある多くの明滅を含んでいる場合には、人体に悪影響がある映像コンテンツの配信を抑制することが困難である。このような場合、現状では、テロップ等で視聴者に事前に注意を喚起する対策がとられている。光過敏性発作を誘発する可能性のある多くの明滅を含んでいる映像の一つに、記者会見等で報道カメラマンから撮影時に発せられるフラッシュが多く含まれる映像が挙げられる。このような映像では、カメラから発せられるフラッシュによる短時間の明領域が発生し、これが繰り返されることで多くの明滅が発生することになる。 However, when broadcasting live video content that requires promptness, such as news footage, if the video contains many blinks that can trigger a photosensitivity attack, the video will have a negative effect on the human body. It is difficult to suppress content distribution. In such a case, at present, measures are taken to alert the viewer in advance with a telop or the like. One video that contains many flickers that can trigger photosensitivity seizures is a video that contains a lot of flash emitted from a news photographer during a press conference. In such an image, a short-time bright region is generated by a flash emitted from the camera, and many blinks are generated by repeating this.
 人体に悪影響がある映像コンテンツを検出して補正する関連技術が特許文献1~3に開示されている。 Patent Documents 1 to 3 disclose related techniques for detecting and correcting video content that has an adverse effect on the human body.
 特許文献1は、液晶ディスプレイにおいて、光過敏性発作を誘発するシーン(画像)を検出し、検出されたシーンに対してバックライトユニットの輝度を低下させる技術を開示している。この技術は、視聴者への光過敏性発作の影響を未然に防止する。 Patent Document 1 discloses a technique for detecting a scene (image) that induces a light-sensitive seizure in a liquid crystal display and reducing the luminance of a backlight unit with respect to the detected scene. This technology obviates the effects of photosensitivity attacks on viewers.
 特許文献2は、第nフレーム画像と第(n+1)フレーム画像のヒストグラムの比較結果に基づいて、ガンマ補正又はトーンカーブ補正によって第(n+1)フレーム画像のダイナミックレンジを狭める補正をする技術を開示している。この技術は、強い明滅を緩和し、眼精疲労又は体調不良を低減させる。また、特許文献3は、動きベクトルを補正する技術を開示している。 Patent Document 2 corrects the dynamic range of the (n + 1) th frame image by gamma correction or tone curve correction based on the comparison result of the histograms of the nth frame image and the (n + 1) th frame image. The technology is disclosed. This technique relieves strong blinking and reduces eye strain or poor health. Patent Document 3 discloses a technique for correcting a motion vector.
 なお、非特許文献3及び非特許文献4は、後述するオプティカルフローの算出方法を開示する。 Note that Non-Patent Document 3 and Non-Patent Document 4 disclose optical flow calculation methods described later.
特開2008-301150号公報JP 2008-301150 A 特開2010-035148号公報JP 2010-035148 A 特開2008-124956号公報JP 2008-124956 A
 しかしながら、関連技術には下記の課題がある。光過敏性発作を誘発する可能性のある輝度又は彩度の大きな変化は、画像全体ではなく、画像の一部の領域に発生する場合がある。関連技術に開示された手法は、これらの判別を行わず画像全体を一律に補正するため、本来補正する必要がない明滅が発生していない領域のコントラストや明度を低下させ、その領域の画質を劣化させる場合がある。 However, the related technology has the following problems. Large changes in brightness or saturation that can trigger photosensitivity seizures may occur in some areas of the image, not in the entire image. Since the technique disclosed in the related art uniformly corrects the entire image without making these determinations, it reduces the contrast and brightness of areas that do not need to be corrected and does not cause blinking, and reduces the image quality of those areas. May deteriorate.
 また、フラッシュ等による明滅の場合には、フラッシュによって明るくなった領域の画素の一部の色情報がカメラのダイナミックレンジを超えている(すなわち飽和している)場合がある。色情報が飽和した画素は、本来の情報が失われている。そのため、色情報が飽和した画素を含むフレーム画像のみを用いた補正処理のみでは、彩度が過大又は過小な画素を発生させる場合があり、色味の変動を抑制することができない。それゆえ、このような補正処理では、明滅を自然に緩和することは困難である。 In addition, in the case of blinking due to flash or the like, color information of a part of pixels in an area brightened by the flash may exceed the dynamic range of the camera (ie, be saturated). Pixels with saturated color information lose their original information. For this reason, only correction processing using only a frame image including pixels with saturated color information may generate pixels with excessive or undersaturation, and fluctuations in color cannot be suppressed. Therefore, it is difficult to naturally mitigate flicker by such correction processing.
 本発明の目的は、輝度又は彩度の変動が抑制された自然な映像を生成することができる技術を提供することにある。 An object of the present invention is to provide a technique capable of generating a natural video in which fluctuations in luminance or saturation are suppressed.
 本発明の一態様に係る映像処理装置は、
 時間的に連続する複数のフレーム画像のいずれかが、輝度又は彩度が前後のフレーム画像に対して所定のレベル以上異なる明滅領域を含む注目フレーム画像であるか判定する判定手段と、
 前記注目フレーム画像及びその前後のフレーム画像から輝度又は彩度の差に基づいて選択されたフレーム画像のペアに基づいて、カメラの動きに起因する第1の移動量及び/又は被写体の動きに起因する第2の移動量を推定する動き推定手段と、
 前記選択されたペアと、前記推定された第1の移動量及び/又は第2の移動量とに基づいて、前記注目フレーム画像の撮影時刻におけるフレーム画像に相当する補正フレーム画像を生成する画像生成手段と、
 前記注目フレーム画像と前記補正フレーム画像とを合成する画像合成手段と
 を備える。
An image processing device according to an aspect of the present invention includes:
Determination means for determining whether any of a plurality of temporally continuous frame images is a noticed frame image including a blinking region whose luminance or saturation differs by a predetermined level or more with respect to the preceding and following frame images;
Based on a pair of frame images selected based on the difference in brightness or saturation from the frame image of interest and the frame images before and after the frame image of interest and / or the movement of the subject Motion estimation means for estimating a second movement amount to be
Image generation for generating a correction frame image corresponding to a frame image at the shooting time of the frame image of interest based on the selected pair and the estimated first movement amount and / or second movement amount Means,
Image synthesizing means for synthesizing the frame image of interest and the correction frame image.
 本発明の一態様に係る映像処理方法は、
 時間的に連続する複数のフレーム画像のいずれかが、輝度又は彩度が前後のフレーム画像に対して所定のレベル以上異なる明滅領域を含む注目フレーム画像であるか判定し、
 前記注目フレーム画像及びその前後のフレーム画像から輝度又は彩度の差に基づいて選択されたフレーム画像のペアに基づいて、カメラの動きに起因する第1の移動量及び/又は被写体の動きに起因する第2の移動量を推定し、
 前記選択されたペアと、前記推定された第1の移動量及び/又は第2の移動量とに基づいて、前記注目フレーム画像の撮影時刻におけるフレーム画像に相当する補正フレーム画像を生成し、
 前記注目フレーム画像と前記補正フレーム画像とを合成する。
An image processing method according to an aspect of the present invention includes:
It is determined whether any of a plurality of temporally continuous frame images is an attention frame image including a blinking region whose luminance or saturation is different from a preceding frame image by a predetermined level or more.
Based on a pair of frame images selected based on the difference in brightness or saturation from the frame image of interest and the frame images before and after the frame image of interest and / or the movement of the subject Estimating a second movement amount to be
Based on the selected pair and the estimated first movement amount and / or second movement amount, a corrected frame image corresponding to a frame image at the shooting time of the frame image of interest is generated,
The attention frame image and the correction frame image are synthesized.
 本発明の一態様に係るプログラム記録媒体は、
 コンピュータに、
 時間的に連続する複数のフレーム画像のいずれかが、輝度又は彩度が前後のフレーム画像に対して所定のレベル以上異なる明滅領域を含む注目フレーム画像であるか判定する処理と、
 前記注目フレーム画像及びその前後のフレーム画像から輝度又は彩度の差に基づいて選択されたフレーム画像のペアに基づいて、カメラの動きに起因する第1の移動量及び/又は被写体の動きに起因する第2の移動量を推定する処理と、
 前記選択されたペアと、前記推定された第1の移動量及び/又は第2の移動量とに基づいて、前記注目フレーム画像の撮影時刻におけるフレーム画像に相当する補正フレーム画像を生成する処理と、
 前記注目フレーム画像と前記補正フレーム画像とを合成する処理と
 を実行させる。
A program recording medium according to an aspect of the present invention includes:
On the computer,
A process of determining whether any of a plurality of temporally continuous frame images is a noticed frame image including a blinking region that differs in luminance or saturation by a predetermined level or more with respect to the preceding and following frame images;
Based on a pair of frame images selected based on the difference in brightness or saturation from the frame image of interest and the frame images before and after the frame image of interest and / or the movement of the subject A process of estimating a second movement amount to be performed;
Processing for generating a corrected frame image corresponding to a frame image at the photographing time of the frame image of interest based on the selected pair and the estimated first movement amount and / or second movement amount; ,
And a process of combining the frame image of interest and the correction frame image.
 本発明の一態様に係る映像処理装置は、
 時間的に連続する複数のフレーム画像から第1のフレーム画像と第2のフレーム画像とを選択する選択手段と、
 前記第1のフレーム画像と前記第2のフレーム画像の間において検出された対応点又は対応領域の位置関係に基づいて幾何変換パラメーターを算出し、カメラの動きに起因する第1の移動量を推定する第1の推定手段と、
 前記幾何変換パラメーターに基づいて、前記第1の移動量を減算することで、前記第1のフレーム画像及び前記第2のフレーム画像から被写体領域を検出し、前記検出された被写体領域に基づいて被写体の動きに起因する第2の移動量を推定する第2の推定手段と
 を備える。
An image processing device according to an aspect of the present invention includes:
Selection means for selecting a first frame image and a second frame image from a plurality of temporally continuous frame images;
A geometric transformation parameter is calculated based on a positional relationship between corresponding points or corresponding regions detected between the first frame image and the second frame image, and a first movement amount due to camera movement is estimated. First estimating means for:
A subject area is detected from the first frame image and the second frame image by subtracting the first movement amount based on the geometric transformation parameter, and a subject is detected based on the detected subject area. And second estimation means for estimating a second movement amount resulting from the movement of.
 本発明によれば、輝度又は彩度の変動が抑制された自然な映像を生成することができる。 According to the present invention, it is possible to generate a natural video in which fluctuations in luminance or saturation are suppressed.
図1は第1の実施の形態における映像処理装置のブロック図である。FIG. 1 is a block diagram of a video processing apparatus according to the first embodiment.
図2は矩形領域輝度の算出方式を表す模式図である。FIG. 2 is a schematic diagram showing a rectangular area luminance calculation method.
図3は第1の実施の形態における動き推定部のブロック図である。FIG. 3 is a block diagram of the motion estimator in the first embodiment.
図4は明領域を含まないフレーム画像の選択方法を表す模式図である。FIG. 4 is a schematic diagram illustrating a method of selecting a frame image that does not include a bright region.
図5は動き推定用フレームの選択方法を表す図である。FIG. 5 is a diagram illustrating a method for selecting a motion estimation frame.
図6は動き推定用フレームの選択方法を表す図である。FIG. 6 is a diagram illustrating a method for selecting a motion estimation frame.
図7は動き推定用フレーム対の選択方法の一例を示す図である。FIG. 7 is a diagram illustrating an example of a method for selecting a motion estimation frame pair.
図8は第1の実施の形態における補正フレーム生成部のブロック図である。FIG. 8 is a block diagram of a correction frame generation unit in the first embodiment.
図9は出力フレーム画像における局所領域輝度の変化率の値の設定方法の一例を示すグラフである。FIG. 9 is a graph showing an example of a method for setting the value of the rate of change in local area luminance in the output frame image.
図10は第1の実施の形態における映像処理装置の動作を示すフローチャートである。FIG. 10 is a flowchart showing the operation of the video processing apparatus according to the first embodiment.
図11はコンピュータ装置のハードウェア構成を例示するブロック図である。FIG. 11 is a block diagram illustrating a hardware configuration of the computer apparatus.
 [構成]
 図1は、本発明による第1の実施の形態に係る映像処理装置100の構成を示すブロック図である。なお、図1(及び以降のブロック図)に記載された矢印は、データの流れの一例を示すにすぎず、データの流れを限定することを意図したものではない。
[Configuration]
FIG. 1 is a block diagram showing a configuration of a video processing apparatus 100 according to the first embodiment of the present invention. Note that the arrows described in FIG. 1 (and the subsequent block diagrams) merely show an example of the data flow, and are not intended to limit the data flow.
 この映像処理装置100は、判定部11と、動き推定部12と、画像生成部13と、画像合成部14とを備える。 The video processing apparatus 100 includes a determination unit 11, a motion estimation unit 12, an image generation unit 13, and an image synthesis unit 14.
 判定部11は、フレーム画像に光過敏性発作を誘発する可能性がある領域が含まれるか否かを判定する。具体的には、判定部11は、予め設定されたフレーム数のフレーム画像を用いて、特定のフレーム画像(以下「注目フレーム画像」という。)がフラッシュ等により明滅する(輝度が大きく変化する)領域を含むフレーム画像であるかを判定する。以下においては、このようにして判定された領域(輝度が大きく変化する領域)のことを、「明滅領域」という。例えば、判定部11は、時刻(t-m)から時刻(t+m)までに撮影された(2m+1)フレーム分の時間的に連続するフレーム画像の入力を受け付けると、時刻tのフレーム画像を注目フレーム画像とし、当該フレーム画像が明滅領域を含むかを判定する。 The determination unit 11 determines whether or not the frame image includes a region that may induce a photosensitivity seizure. Specifically, the determination unit 11 uses a frame image having a preset number of frames to blink a specific frame image (hereinafter referred to as “target frame image”) by flash or the like (the luminance changes greatly). It is determined whether the frame image includes a region. In the following, a region determined in this way (a region where the luminance changes greatly) is referred to as a “blinking region”. For example, when the determination unit 11 receives an input of time-sequential frame images for (2m + 1) frames taken from time (tm) to time (t + m), the determination unit 11 selects the frame image at time t. A frame image of interest is determined, and it is determined whether the frame image includes a blink region.
 注目フレーム画像に明滅領域が含まれる場合、動き推定部12、画像生成部13及び画像合成部14は、カメラや被写体の変位に起因した画像の移動を補正したフレーム画像を合成する。動き推定部12、画像生成部13及び画像合成部14は、このようにして明滅領域の輝度変化を適切に抑制することで、明滅の影響を低減させたフレーム画像を出力することができる。 When the blinking region is included in the attention frame image, the motion estimation unit 12, the image generation unit 13, and the image synthesis unit 14 synthesize a frame image in which the movement of the image due to the displacement of the camera or the subject is corrected. The motion estimation unit 12, the image generation unit 13, and the image synthesis unit 14 can output a frame image in which the influence of blinking is reduced by appropriately suppressing the luminance change in the blinking region in this way.
 尚、明滅領域には、注目フレーム画像の輝度が前後のフレーム画像の輝度と比較して大きく向上する(明るくなる)明領域と、大きく低下する(暗くなる)暗領域とがある。しかし、説明の簡略化のため、以下においては明領域についてのみ説明する。 Note that the blinking region includes a bright region in which the luminance of the frame image of interest is greatly improved (becomes brighter) and a dark region in which the luminance of the frame image of interest is greatly lowered (becomes dark). However, for the sake of simplicity, only the bright area will be described below.
 <判定部11>
 判定部11は、複数のフレーム画像の入力を受け付けると、注目フレーム画像が明滅領域を含むフレーム画像であるか判定する。
<Determining unit 11>
When receiving the input of a plurality of frame images, the determination unit 11 determines whether the target frame image is a frame image including a blinking region.
 注目フレーム画像が明滅領域を含むフレームであるかを判別する方法の一つは、注目フレーム画像と他の入力されたフレーム画像との間の局所領域輝度の変化率を用いる方法である。 One method for determining whether a target frame image is a frame including a blinking region is a method using a change rate of local region luminance between the target frame image and another input frame image.
 ここにおいて、局所領域輝度は、入力された複数のフレーム画像の各画素における、当該画素とその周辺の所定数の画素を含む領域の輝度値を表す。判定部11は、まず、入力された複数のフレーム画像の各画素について、RGB表色系などで記述された色情報を明るさを表す輝度情報(輝度値)に変換する。その後、判定部11は、変換された輝度情報に対して注目画素周辺の画素を用いた平滑化処理を施すことで、画素周辺領域の輝度値を算出する。 Here, the local area luminance represents a luminance value of an area including the pixel and a predetermined number of pixels around the pixel in each pixel of the input plurality of frame images. The determination unit 11 first converts color information described in an RGB color system or the like into luminance information (luminance value) representing brightness for each pixel of the input plurality of frame images. Thereafter, the determination unit 11 performs a smoothing process using pixels around the target pixel on the converted luminance information, thereby calculating a luminance value in the pixel peripheral region.
色情報を輝度情報に変換する方法は、例えば、放送用に用いられるYUV(YCbCr,YPbPr)表色系の輝度を表すY値を算出する方法や、XYZ表色系の輝度を表すY値を算出する方法がある。ただし、輝度情報を記述する表色系は、これらの表色系に限定されない。例えば、判定部11は、HSV表色系のV値等、輝度を表す他の指標に色情報を変換してもよい。また、判定部11は、入力されたフレーム画像に予めガンマ補正が施されている場合には、輝度情報への変換の前に、色情報を逆ガンマ補正により補正前の色情報に変換してもよい。 The method for converting color information into luminance information is, for example, a method of calculating the Y value representing luminance in the YUV (YCbCr, YPbPr) color system used for broadcasting, or the Y value representing luminance in the XYZ color system. However, the color systems describing luminance information are not limited to these color systems. For example, the determination unit 11 may convert the color information into another index representing luminance, such as the V value of the HSV color system. In addition, when the input frame image has been subjected to gamma correction in advance, the determination unit 11 may convert the color information into the pre-correction color information by inverse gamma correction before the conversion to luminance information.
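For illustration, the sketch below shows one common luminance conversion mentioned above (the Y value of the YCbCr color system, using the BT.601 weights); the function name and the assumption of already-linear RGB input are illustrative assumptions rather than part of the embodiment.

```python
import numpy as np

def rgb_to_luma_bt601(rgb):
    """Convert an (H, W, 3) RGB array to a luminance (Y) map.

    Uses the BT.601 weights (0.299, 0.587, 0.114), one of the conversions
    mentioned above; other choices such as the XYZ Y value or the HSV V value
    would also fit the description.
    """
    rgb = rgb.astype(np.float64)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
```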
 平滑化処理の方法は、例えば、注目画素の周辺にある画素のうちの上下それぞれのq画素と左右それぞれのp画素、すなわち(2p+1)×(2q+1)画素の輝度情報の平均値を算出する方法がある。この場合、時刻tのフレーム画像のうちの位置(x,y)にある画素の局所領域輝度lt(x,y)は、フレーム画像の輝度情報Ytを用いて、式(1)のように表すことができる。
[Equation (1)]
The smoothing method is, for example, a method of calculating the average value of the luminance information of the pixels around the target pixel, namely the upper and lower q pixels and the left and right p pixels, that is, (2p + 1) × (2q + 1) pixels. In this case, the local region luminance l t (x, y) of the pixel at the position (x, y) in the frame image at time t can be expressed as in Equation (1), using the luminance information Y t of the frame image.
[Equation (1)]
 また、判定部11は、式(2)のように、予め設定された重みwを用いた重み付き平均を用いて局所領域輝度lt(x,y)を算出してもよい。
[Equation (2)]
Further, the determination unit 11 may calculate the local region luminance l t (x, y) using a weighted average using a preset weight w as in Expression (2).
[Equation (2)]
 重みの設定方法としては、例えば、ガウシアン重みを用いる方法がある。判定部11は、あらかじめ設定されたパラメーターσを用いて、式(3)によりガウシアン重みw(i,j)を算出する。
[Equation (3)]
As a weight setting method, for example, there is a method using Gaussian weight. The determination unit 11 calculates a Gaussian weight w (i, j) using Equation (3) using a preset parameter σ.
[Equation (3)]
 局所領域輝度の変化率は、注目フレーム画像の画素と、同位置の他の入力フレーム画像の画素との間の局所領域輝度の変化の比率を表す。判定部11は、時刻tにおける注目フレーム画像と時刻(t+k)におけるフレーム画像のそれぞれの位置(x,y)にある画素の局所領域輝度の変化率rt-t+k(x,y)を、式(4)を用いて算出する。
[Equation (4)]
The local area luminance change rate represents the ratio of the change in local area luminance between a pixel of the target frame image and the pixel of another input frame image at the same position. The determination unit 11 calculates the local area luminance change rate r t-t+k (x, y) of the pixel at each position (x, y) between the frame image of interest at time t and the frame image at time (t + k) using equation (4).
[Equation (4)]
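A minimal sketch of the quantities just described, assuming the simple box average for equation (1) and a relative-difference form for the change rate of equation (4); the actual equation images are not reproduced above, so these forms and the threshold values α = 0.1 and β = 0.25 (taken from the guideline example given below) are assumptions made for illustration.

```python
import numpy as np

def local_region_luminance(Y, p=2, q=2):
    """Equation (1) style smoothing: mean of Y over a (2p+1) x (2q+1) window.

    Y : (H, W) luminance map. A zero-cost edge-padded box average is used here;
    the description also allows weighted (e.g. Gaussian) averages.
    """
    H, W = Y.shape
    padded = np.pad(Y, ((q, q), (p, p)), mode="edge")
    out = np.zeros((H, W), dtype=np.float64)
    for dy in range(-q, q + 1):
        for dx in range(-p, p + 1):
            out += padded[q + dy:q + dy + H, p + dx:p + dx + W]
    return out / ((2 * p + 1) * (2 * q + 1))

def local_change_rate(l_t, l_tk, eps=1e-6):
    """Assumed form of equation (4): relative change of the local region luminance
    between the frame of interest (l_t) and another frame (l_tk)."""
    return (l_t - l_tk) / (l_tk + eps)

def brighter_region_flag(rate, alpha=0.1, beta=0.25):
    """Determination flag: 1 when the area ratio of pixels whose change rate
    exceeds alpha is larger than beta (the guideline example uses 0.1 and 0.25)."""
    return int((rate > alpha).mean() > beta)
```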
判定部11は、算出された変化率に基づき、注目フレーム画像に他のフレーム画像より所定のレベル以上明るくなる領域が含まれているか否か判定する。その結果、時間的に前後にある他のフレーム画像に対して所定のレベル以上明るくなる領域が注目フレーム画像に含まれている場合には、判定部11は、注目フレーム画像が明滅による明領域を含むフレーム画像であると判定する。 The determination unit 11 determines, based on the calculated change rate, whether or not the frame image of interest includes an area that is brighter than other frame images by a predetermined level or more. As a result, when the frame image of interest includes an area that is brighter, by a predetermined level or more, than other frame images before and after it in time, the determination unit 11 determines that the frame image of interest is a frame image including a bright region due to blinking.
 判定部11は、予め設定された変化率の閾値αと面積率の閾値βを用いて、変化率rt-t+kが閾値αを超える領域の面積率が閾値βを超えるか否かによって判定する方法を用いることもできる。例えば、避けるべき点滅映像の判断基準の一つとして、日本放送協会と日本民間放送連盟によるガイドラインには、「点滅が同時に起こる面積が画面の1/4を超え、かつ、輝度変化が10%以上の場合」が規定されている。上記の判定方法において、この判断基準を満たすためには、判定部11は、α=0.1、β=0.25を設定する。 The determination unit 11 uses the threshold value α of the change rate and the threshold value β of the area rate, which are set in advance, depending on whether the area rate of the region where the change rate r t-t + k exceeds the threshold value α exceeds the threshold value β. A determination method can also be used. For example, as one of the criteria for determining blinking video to be avoided, the guidelines by the Japan Broadcasting Corporation and the Japan Broadcasting Corporation include: `` The area where blinking occurs simultaneously exceeds 1/4 of the screen and the luminance change is 10% or more. "In the case of". In the above determination method, in order to satisfy this determination criterion, the determination unit 11 sets α = 0.1 and β = 0.25.
 判定部11は、時刻tにおける注目フレーム画像に時刻(t+k)のフレーム画像より所定のレベル以上明るくなる領域があると判定した場合、判定フラグflagt-t+kを「1」とする。また、判定部11は、このような領域がないと判定した場合には、判定フラグflagt-t+kを「0」とする。判定部11は、注目フレーム画像と、入力された他のフレーム画像の全てとの組み合わせに関して同様に判定フラグを算出し、注目フレーム画像の前後の時刻それぞれについて判定フラグが「1」となるフレーム画像が存在するか否かを判断する。このようなフレーム画像が存在する場合、判定部11は、注目フレーム画像が明領域を含むフレーム画像であると判定する。 The determination unit 11 sets the determination flag flag t-t + k to “1” when it is determined that the frame image of interest at time t includes a region that is brighter than a predetermined level by the frame image at time (t + k). . If the determination unit 11 determines that there is no such area, the determination flag flag t-t + k is set to “0”. The determination unit 11 similarly calculates a determination flag for the combination of the target frame image and all the other input frame images, and the frame image for which the determination flag is “1” for each of the times before and after the target frame image. It is determined whether or not exists. When such a frame image exists, the determination unit 11 determines that the frame image of interest is a frame image including a bright region.
 判定部11は、注目フレーム画像が明滅領域を含むフレーム画像であるかを判別する別の方法として、矩形領域輝度の変化率を用いる方法を利用してもよい。ここにおいて、矩形領域輝度は、各フレーム画像における予め設定された矩形領域毎の輝度の平均値を表す。例えば、図2に示されているように、フレーム画像に10×10ブロックの矩形領域を設定した場合の矩形領域輝度は、矩形領域のそれぞれに含まれる画素の輝度値の平均値である。輝度値としては、局所領域輝度を算出する場合と同様に、YUV表色系のY値、XYZ表色系のY値、HSV表色系のV値等を用いることができる。 The determination unit 11 may use a method of using the change rate of the rectangular area luminance as another method of determining whether the frame image of interest is a frame image including a blinking area. Here, the rectangular area luminance represents an average value of luminance for each rectangular area set in advance in each frame image. For example, as shown in FIG. 2, the rectangular area luminance when a 10 × 10 block rectangular area is set in the frame image is an average value of the luminance values of the pixels included in each rectangular area. As the luminance value, the Y value of the YUV color system, the Y value of the XYZ color system, the V value of the HSV color system, etc. can be used as in the case of calculating the local area luminance.
 矩形領域輝度の変化率は、注目フレーム画像の注目しているブロックの矩形領域輝度と、入力された他のフレーム画像における同じ位置のブロックの矩形領域輝度の差の比率を表す。判定部11は、注目フレーム画像の位置(i,j)にあるブロックの時刻tにおける矩形領域輝度Lt(i,j)と時刻(t+k)のフレーム画像の矩形領域輝度Lt+k(i,j)の変化率Rt-t+k (i,j)を、式(5)を用いて算出する。
[Equation (5)]
The change rate of the rectangular area luminance represents the ratio of the difference between the rectangular area luminance of the block of interest in the target frame image and the rectangular area luminance of the block at the same position in another input frame image. The determination unit 11 calculates, using equation (5), the change rate R t-t+k (i, j) between the rectangular area luminance L t (i, j) of the block at the position (i, j) of the target frame image at time t and the rectangular area luminance L t+k (i, j) of the frame image at time (t + k).
[Equation (5)]
 矩形領域輝度の変化率を用いた判定は、局所領域輝度の変化率を用いた判定と同様に行われる。判定部11は、時刻tにおける注目フレーム画像と入力された他の全てのフレーム画像との組み合わせにおいて、注目フレーム画像に他のフレーム画像より大きく明るくなる領域が含まれているかどうかを判定することで判定フラグの値を設定する。判定部11は、注目フレーム画像の前後の時刻それぞれに判定フラグが「1」となるフレーム画像が存在する場合、注目フレーム画像が明滅領域を含むフレーム画像であると判定する。 The determination using the change rate of the rectangular area luminance is performed in the same manner as the determination using the change ratio of the local area luminance. The determination unit 11 determines whether or not the attention frame image includes a region that is brighter than the other frame images in the combination of the attention frame image at time t and all the other input frame images. Set the value of the judgment flag. The determination unit 11 determines that the frame image of interest is a frame image including a blinking area when there are frame images having the determination flag “1” at each of the times before and after the frame of interest image.
判定フラグの値の設定方法には、局所領域輝度の変化率を用いる場合と同様に、予め設定された変化率の閾値αと面積率の閾値βを用いて、変化率が閾値αを超える画素の面積率が閾値βを超えるか否かによって、「1」又は「0」を設定する方法がある。 As a method for setting the value of the determination flag, as in the case of using the local area luminance change rate, there is a method of setting “1” or “0” depending on whether or not the area ratio of the pixels whose change rate exceeds the preset change rate threshold α exceeds the preset area rate threshold β.
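The block-based variant can be sketched in the same spirit. The 10×10 grid follows the FIG. 2 example, and the relative-difference form assumed for equation (5) mirrors the assumption made for equation (4) above; the function names are illustrative.

```python
import numpy as np

def rectangular_area_luminance(Y, blocks=(10, 10)):
    """Mean luminance of each rectangular area when the frame is divided into
    blocks (10 x 10 blocks in the FIG. 2 example). Returns a (rows, cols) array."""
    H, W = Y.shape
    rows, cols = blocks
    L = np.zeros((rows, cols), dtype=np.float64)
    for i in range(rows):
        for j in range(cols):
            y0, y1 = i * H // rows, (i + 1) * H // rows
            x0, x1 = j * W // cols, (j + 1) * W // cols
            L[i, j] = Y[y0:y1, x0:x1].mean()
    return L

def block_determination_flag(L_t, L_tk, alpha=0.1, beta=0.25, eps=1e-6):
    """Determination flag from the rectangular-area change rate (equation (5) analogue):
    1 when the fraction of blocks whose change rate exceeds alpha is larger than beta."""
    R = (L_t - L_tk) / (L_tk + eps)
    return int((R > alpha).mean() > beta)
```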
 判定部11は、判定結果と共に、注目フレーム画像と入力された他のフレーム画像との判定フラグを解析情報として出力する。また、判定部11は、同様の処理を実行することにより注目フレーム画像以外のフレーム画像間で算出された判定フラグを補助情報として出力してもよい。 The determination unit 11 outputs a determination flag between the frame image of interest and another input frame image as analysis information together with the determination result. Moreover, the determination part 11 may output the determination flag calculated between frame images other than an attention frame image as auxiliary information by performing the same process.
 また、判定部11は、入力されたフレーム画像間での判定フラグに加えて、注目フレーム画像の各矩形領域について他のフレーム画像の同一の位置の矩形領域との間で算出された矩形領域輝度の変化率を解析情報として出力してもよい。 In addition to the determination flag between the input frame images, the determination unit 11 calculates the rectangular area luminance calculated between each rectangular area of the target frame image and the rectangular area at the same position of the other frame image. May be output as analysis information.
 <動き推定部12>
 図3は、動き推定部12の構成を示すブロック図である。
<Motion estimation unit 12>
FIG. 3 is a block diagram illustrating a configuration of the motion estimation unit 12.
 動き推定部12は、選択部12Aと、第1推定部12Bと、第2推定部12Cとを有する。 The motion estimation unit 12 includes a selection unit 12A, a first estimation unit 12B, and a second estimation unit 12C.
 動き推定部12は、フレーム画像と、判定部11から出力された判定結果及び解析情報とを入力として受け付ける。動き推定部12は、注目フレーム画像が明領域を含むフレーム画像であると判定された場合に、入力されたフレーム画像から動き推定に用いるフレーム画像を複数選択し、選択されたフレーム画像の間でのカメラ及び被写体の動きに起因する画像の移動量を推定する。 The motion estimation unit 12 receives the frame image and the determination result and analysis information output from the determination unit 11 as inputs. When it is determined that the target frame image is a frame image including a bright region, the motion estimation unit 12 selects a plurality of frame images to be used for motion estimation from the input frame images, and selects between the selected frame images. The movement amount of the image due to the movement of the camera and the subject is estimated.
 選択部12A
 選択部12Aは、注目フレーム画像以外のフレーム画像から、移動量の推定に用いるフレーム画像を選択し、選択されたフレーム画像を含む1対のフレーム画像を取得する。選択部12Aは、これらのフレーム画像(以下「動き推定用フレーム画像」という。)を、例えば以下の方法によって選択する。
Selection unit 12A
The selection unit 12A selects a frame image used for estimation of the movement amount from frame images other than the target frame image, and acquires a pair of frame images including the selected frame image. The selection unit 12A selects these frame images (hereinafter referred to as “motion estimation frame images”), for example, by the following method.
 ・(12A-1)選択方法1
 選択部12Aは、注目フレーム画像と入力された他のフレーム画像との輝度差に基づいて、注目フレーム画像の前後からそれぞれ1つのフレーム画像を動き推定用フレーム画像として選択してもよい。この場合、選択部12Aは、注目フレーム画像の前後それぞれのフレーム画像を1つずつ取得して1対の動き推定用フレーム画像として用いる。具体的には、選択部12Aは、判定部11で算出された判定フラグを用いて動き推定用フレーム画像を選択してもよい。
(12A-1) Selection method 1
The selection unit 12A may select one frame image as a motion estimation frame image from before and after the target frame image based on the luminance difference between the target frame image and the input other frame image. In this case, the selection unit 12A acquires one frame image before and after each frame image of interest and uses it as a pair of motion estimation frame images. Specifically, the selection unit 12A may select the motion estimation frame image using the determination flag calculated by the determination unit 11.
 この方法では、判定フラグが「1」となるフレーム画像のうち、注目フレーム画像に最も近い前後それぞれのフレーム画像が動き推定用フレーム画像として選択される。 In this method, out of the frame images whose determination flag is “1”, the frame images before and after the closest to the target frame image are selected as the motion estimation frame images.
 図4は、明領域を含まないフレーム画像の選択方法を表す模式図である。図4は、case1~case4の4種類のケースについて、時刻(t-2)から時刻(t+2)までのフレーム画像と、時刻tのフレーム画像に対して他のフレーム画像を比較した場合の判定フラグ(flag)とを例示している。なお、図4(及び以降の同様の図)において、明領域を含まないフレーム画像は、ハッチングを付して示されている。ハッチングされていないフレーム画像は、明領域を含むフレーム画像を表す。 FIG. 4 is a schematic diagram showing a method for selecting a frame image that does not include a bright region. Figure 4 shows the case of comparing the frame image from time (t-2) to time (t + 2) with other frame images for the frame image at time t for the four types of cases 1 to 4 The determination flag (flag) is illustrated. Note that in FIG. 4 (and similar figures thereafter), frame images that do not include a bright region are shown with hatching. An unhatched frame image represents a frame image including a bright region.
 例えば、明領域を含む時刻tのフレーム画像に対して、選択部12Aは、case1の場合には、時刻(t-1)と時刻(t+1)のフレーム画像を選択する。同様に、選択部12Aは、case2の場合には時刻(t-2)と時刻(t+1)のフレーム画像、case3の場合には時刻(t-1)と時刻(t+2)のフレーム画像、case4の場合には時刻(t-2)と時刻(t+2)のフレーム画像をそれぞれ選択する。 For example, for a frame image at time t including a bright region, the selection unit 12A selects a frame image at time (t−1) and time (t + 1) in case 1. Similarly, the selection unit 12A displays frame images at time (t-2) and time (t + 1) in case 2, and frames at time (t-1) and time (t + 2) in case 3. In the case of an image, case4, frame images at time (t-2) and time (t + 2) are selected.
 また、選択部12Aは、補助情報として入力された注目フレーム画像以外のフレーム画像間の判定フラグを用いて、動き推定用フレームの選択結果を修正してもよい。注目フレーム画像と他のフレーム画像との判定フラグを用いた選択において、動き推定用フレームとして時刻(t+k)のフレームが選択された場合、選択部12Aは、次のように選択結果を修正してもよい。例えば、注目フレームと時刻(t+k+1)のフレーム画像の判定フラグflagt-t+k及び時刻(t+k+1)のフレーム画像と時刻(t+k)のフレーム画像の判定フラグflagt+k+1-t+kの値が共に「1」の場合、時刻(t+k+1)のフレーム画像と時刻(t+k)のフレーム画像との間にも大きな輝度変化があると考えられる。そのため、選択部12Aは、この場合、動き推定用フレーム画像を時刻(t+k+1)のフレーム画像に変更(修正)してもよい。 Further, the selection unit 12A may correct the selection result of the motion estimation frame using the determination flag between the frame images other than the target frame image input as the auxiliary information. In the selection using the determination flag between the frame image of interest and another frame image, when the frame at time (t + k) is selected as the motion estimation frame, the selection unit 12A corrects the selection result as follows. May be. For example, the determination flag flag t-t + k of the frame image at the time (t + k + 1) and the determination flag of the frame image at the time (t + k + 1) and the frame image at the time (t + k) When both flag t + k + 1-t + k values are `` 1 '', there is also a large luminance change between the frame image at time (t + k + 1) and the frame image at time (t + k). It is believed that there is. Therefore, in this case, the selection unit 12A may change (correct) the motion estimation frame image to a frame image at time (t + k + 1).
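As an illustration of selection method 1, the sketch below picks, from the frames whose determination flag against the frame of interest is “1”, the nearest one before and the nearest one after the frame of interest. The dictionary layout of the flags and the function name are assumptions made for the example.

```python
def select_motion_estimation_pair(flags):
    """flags: dict mapping a time offset k (k != 0) to the determination flag
    computed between the frame of interest at time t and the frame at time t+k.
    Returns the offsets of the nearest preceding and following frames whose
    flag is 1, i.e. the pair used for motion estimation in selection method 1."""
    before = [k for k, f in flags.items() if k < 0 and f == 1]
    after = [k for k, f in flags.items() if k > 0 and f == 1]
    if not before or not after:
        return None  # no usable pair on one side
    return max(before), min(after)

# Example with flags resembling case 2 of FIG. 4 (offsets -2..+2)
print(select_motion_estimation_pair({-2: 1, -1: 0, 1: 1, 2: 1}))  # -> (-2, 1)
```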
 ・(12A-2)選択方法2
 選択部12Aは、注目フレーム画像と入力された他のフレーム画像の間の輝度変化に基づいて、注目フレーム画像の前後それぞれから複数のフレーム画像を動き推定用フレーム画像として選択してもよい。この場合、選択部12Aは、フレーム画像の対(ペア)を複数取得する。具体的には、選択部12Aは、注目フレーム画像の近隣のフレーム画像のうち、判定部11で算出された判定フラグが1のフレーム画像を予め定められた数選択してもよい。
(12A-2) Selection method 2
The selection unit 12A may select a plurality of frame images as the motion estimation frame images from before and after the target frame image, based on the luminance change between the target frame image and the other input frame images. In this case, the selection unit 12A acquires a plurality of pairs of frame images. Specifically, the selection unit 12A may select, from the frame images neighboring the target frame image, a predetermined number of frame images whose determination flag calculated by the determination unit 11 is “1”.
 図5は、複数(ここでは2対)の動き推定用フレーム画像を選択する場合の例を示す模式図である。図5に例示されているように、時刻(t-2)、(t-1)、(t+1)及び(t+2)におけるフレーム画像が明領域を含まない場合、選択部12Aは、これらのフレーム画像の全てを動き推定用フレームとして選択する。図5の例における判定フラグは、図4のcase1の例における判定フラグと等しい。しかし、この選択方法においては、選択部12Aは、時刻(t-1)と時刻(t+1)におけるフレーム画像だけでなく、時刻(t-2)と時刻(t+2)におけるフレーム画像も動き推定用フレームとして選択する。 FIG. 5 is a schematic diagram showing an example of selecting a plurality (two pairs in this case) of motion estimation frame images. As illustrated in FIG. 5, when the frame images at times (t−2), (t−1), (t + 1), and (t + 2) do not include a bright region, the selection unit 12A All of these frame images are selected as motion estimation frames. The determination flag in the example of FIG. 5 is equal to the determination flag in the case 1 of FIG. However, in this selection method, the selection unit 12A not only displays frame images at time (t-1) and time (t + 1) but also frame images at time (t-2) and time (t + 2). Select as a frame for motion estimation.
 この選択方法は、短時間に頻繁に明滅が発生する場合やフラッシュバンドが発生した場合に、複数のフレーム画像から光の明滅の影響が少ない領域を選択的に利用し、フレーム間の動き推定の精度を高めることが可能である(例えば図7参照)。ここにおいて、フラッシュバンドとは、CMOS(Complementary metal-oxide-semiconductor)センサなどのローリングシャッタ方式の撮像素子において、フラッシュ光のような短時間の発光が生じた際にライン毎の露光期間の違いによって生じる信号強度の大きな変化(ずれ)のことである。フラッシュバンドが発生したフレーム画像は、例えば、その上半分又は下半分のみが発光時の画像(明領域)となり、残りの部分が発光直前又は直後の相対的に暗い画像となる。 This selection method selectively uses an area that is less affected by light flickering from multiple frame images when frequent flickering occurs in a short time or when a flash band occurs. The accuracy can be increased (see, for example, FIG. 7). Here, the flash band refers to the difference in exposure period for each line when light emission in a short time such as flash light occurs in a rolling shutter type imaging device such as a CMOS (Complementary Metal-Oxide-Semiconductor) sensor. This is a large change (shift) in the signal intensity that occurs. In the frame image in which the flash band is generated, for example, only the upper half or the lower half is an image at the time of light emission (bright region), and the remaining part is a relatively dark image immediately before or after the light emission.
 ・(12A-3)選択方法3
 選択部12Aは、注目フレーム画像と入力された他のフレーム画像の間の輝度差に基づいて、注目フレーム画像の前後どちらか一方のフレーム画像と注目フレーム画像とを動き推定用フレーム画像として選択してもよい。具体的には、選択部12Aは、判定部11で算出された判定フラグが「1」のフレームのうち、注目フレーム画像に最も近接するフレーム画像を選択してもよい。注目フレーム画像の前後いずれも判定フラグが「1」である場合には、選択部12Aは、予め設定された一方のフレームのみを選択する。図6は、注目フレーム画像よりも前の時刻のフレーム画像を選択した場合の一例を示す。この場合、選択部12Aは、このように選択されたフレーム画像と注目フレーム画像とを1対の動き推定用フレーム画像として用いる。
(12A-3) Selection method 3
Based on the luminance difference between the target frame image and the other input frame images, the selection unit 12A may select, as the motion estimation frame images, the target frame image itself and one of the frame images before or after it. Specifically, the selection unit 12A may select the frame image closest to the target frame image from among the frames whose determination flag calculated by the determination unit 11 is “1”. When the determination flag is “1” both before and after the frame image of interest, the selection unit 12A selects only the one frame on a preset side. FIG. 6 shows an example of a case where a frame image at a time earlier than the target frame image is selected. In this case, the selection unit 12A uses the frame image thus selected and the frame image of interest as a pair of motion estimation frame images.
 この選択方法によれば、選択方法1及び2と比較して、動き推定部12及び画像生成部13が処理対象とする画像の数が少なくなるため、高速な処理が実現できる。 According to this selection method, compared with the selection methods 1 and 2, the number of images to be processed by the motion estimation unit 12 and the image generation unit 13 is reduced, so that high-speed processing can be realized.
 なお、この選択方法は、注目フレーム画像において対応点の検出が可能であることを前提とする。 Note that this selection method is based on the assumption that corresponding points can be detected in the frame image of interest.
 第1推定部12B
 第1推定部12Bは、動き推定用フレーム画像のペア間におけるカメラ又は被写体の動きに起因した画素の動きを推定する。動き推定は、動き推定用フレーム画像のうちの任意の2つのフレーム画像の組み合わせ(ペア)に対して行う。第1推定部12Bは、1又は複数のペアのうち少なくとも1組に対して動き推定を行う。
First estimation unit 12B
The first estimation unit 12B estimates pixel motion caused by camera or subject motion between a pair of motion estimation frame images. Motion estimation is performed on a combination (pair) of any two frame images of the motion estimation frame images. The first estimation unit 12B performs motion estimation on at least one set of one or a plurality of pairs.
 例えば、上述した選択方法1(12A-1)の場合、第1推定部12Bは、注目フレーム画像の前後から1つずつ選択された2つのフレーム画像から成るペアに対して動き推定を行う。これに加えて、第1推定部12Bは、注目フレーム画像とその前後から選択されたフレーム画像のうち一方とから成るペアに対して動き推定を行ってもよい。 For example, in the case of the selection method 1 (12A-1) described above, the first estimation unit 12B performs motion estimation on a pair of two frame images selected one by one from before and after the target frame image. In addition to this, the first estimation unit 12B may perform motion estimation on a pair composed of the target frame image and one of the frame images selected from before and after.
In the case of selection method 2 (12A-2), as shown in FIG. 7, the first estimation unit 12B compares, for each of the frame images selected from before and after the frame image of interest, the rectangular-region luminance between each rectangular region of the frame image of interest and the rectangular region at the same position. The first estimation unit 12B then detects regions in which the change rate of the rectangular-region luminance exceeds a threshold γ. The first estimation unit 12B pairs frame images that share regions in which the change rate exceeds the threshold γ, and performs motion estimation for each pair on that common region (the region enclosed by the dotted line in FIG. 7). The threshold γ may be a preset value, or an appropriate value may be set dynamically so that motion estimation can be performed over a constant area. Alternatively, based on the determination flags between frame images other than the frame image of interest input from the determination unit 11, the first estimation unit 12B may perform motion estimation on pairs of frame images for which the determination flag between them is "0".
In the case of selection method 3 (12A-3), the first estimation unit 12B performs motion estimation on the pair consisting of the frame image of interest and the frame selected from either before or after it.
Since the image motion caused by camera motion is a global motion of the screen, it can be expressed by an affine transformation between the pair of motion estimation frame images. An affine transformation is a geometric transformation that combines a translation and a linear transformation (scaling, rotation, skew) between two images. Let the pair of motion estimation frame images be an image I and an image I', and let a pixel P(x, y) on the image I correspond to a pixel P'(x', y') on the image I'. The affine transformation from the image I to the image I' is then expressed by Equation (6):

    (x', y')ᵀ = A·(x, y)ᵀ + (tx, ty)ᵀ, where A is a 2×2 linear transformation matrix   (6)
By QR decomposition, the linear transformation matrix of Equation (6) can be decomposed into a rotation component and an upper-triangular component. Using these, Equation (6) can be expressed as Equation (7):

    (x', y')ᵀ = R(θ)·[[a', b'], [0, d']]·(x, y)ᵀ + (tx, ty)ᵀ,
    where R(θ) is the rotation matrix [[cosθ, −sinθ], [sinθ, cosθ]]   (7)
The affine transformation parameters (θ, a', b', d', tx, ty) can be calculated by detecting, for three or more pixels on the image I, the corresponding points on the image I' and substituting the coordinates into Equation (7). The first estimation unit 12B can detect the corresponding points by, for example, the following methods.
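For illustration only (not part of the original specification), the parameters of the affine map can be fitted to three or more correspondences by linear least squares; the sketch below uses Python with NumPy, and the function name and input arrays are hypothetical.

    import numpy as np

    def fit_affine(src_pts, dst_pts):
        # Least-squares fit of one common parameterization of Equation (6),
        # x' = a*x + b*y + tx, y' = c*x + d*y + ty,
        # from N >= 3 correspondences between image I (src) and image I' (dst).
        src = np.asarray(src_pts, dtype=np.float64)   # shape (N, 2)
        dst = np.asarray(dst_pts, dtype=np.float64)   # shape (N, 2)
        n = src.shape[0]
        A = np.zeros((2 * n, 6))
        rhs = dst.reshape(-1)                          # [x0', y0', x1', y1', ...]
        A[0::2, 0] = src[:, 0]; A[0::2, 1] = src[:, 1]; A[0::2, 4] = 1.0   # rows for x'
        A[1::2, 2] = src[:, 0]; A[1::2, 3] = src[:, 1]; A[1::2, 5] = 1.0   # rows for y'
        params, *_ = np.linalg.lstsq(A, rhs, rcond=None)
        a, b, c, d, tx, ty = params
        return np.array([[a, b, tx], [c, d, ty]])      # 2x3 affine matrix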
(12B-1) Detection method 1
The first estimation unit 12B calculates the optical flow for a pixel P on the image I and takes the pixel P' to which the pixel P moves as the corresponding point. Typical methods for calculating the optical flow include methods based on the Lucas-Kanade method and the Horn-Schunck method. The Lucas-Kanade method calculates the amount of image movement based on the constraint that pixel values are almost unchanged before and after the movement (Non-Patent Document 3). The Horn-Schunck method calculates the amount of image movement by minimizing an error function over the entire image while taking into account the smoothness between neighboring optical flows (Non-Patent Document 4).
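As an illustrative sketch only (not part of the original specification), the pyramidal Lucas-Kanade implementation in OpenCV can be used to obtain the destinations P' of pixels P; the file names and parameter values below are placeholders.

    import cv2

    # Pair of motion estimation frame images (grayscale); file names are placeholders.
    img_I  = cv2.imread("frame_I.png", cv2.IMREAD_GRAYSCALE)
    img_Ip = cv2.imread("frame_I_prime.png", cv2.IMREAD_GRAYSCALE)

    # Pixels P on image I for which the flow is computed.
    pts = cv2.goodFeaturesToTrack(img_I, maxCorners=500, qualityLevel=0.01, minDistance=7)

    # Pyramidal Lucas-Kanade flow: assumes pixel values are nearly unchanged between frames.
    next_pts, status, err = cv2.calcOpticalFlowPyrLK(img_I, img_Ip, pts, None,
                                                     winSize=(21, 21), maxLevel=3)
    P = pts[status.ravel() == 1].reshape(-1, 2)              # pixels P on image I
    P_prime = next_pts[status.ravel() == 1].reshape(-1, 2)   # corresponding points P' on image I'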
(12B-2) Detection method 2
The first estimation unit 12B specifies a region R' on the image I' corresponding to a region R on the image I, and takes, as the corresponding point of the pixel P at the center coordinates of the region R, the pixel P' at the center coordinates of the region R'. The regions R and R' may be rectangular regions obtained by dividing the images I and I' into a grid of a prescribed size, or may be clusters generated by clustering pixels based on image features such as color and texture.
The first estimation unit 12B can detect the region R', for example, by template matching using the region R as a template. As the similarity measure for template matching, the first estimation unit 12B may use the sum of squared differences (SSD), the sum of absolute differences (SAD), zero-mean normalized cross-correlation (ZNCC), or the like, which are based on differences of pixel values. In particular, the normalized cross-correlation RZNCC is computed, as shown in Equation (8), by subtracting the respective averages (Tave and Iave) from the luminance values of the template and the image (T(i, j) and I(i, j)), and is therefore a measure that evaluates the similarity stably even when the brightness fluctuates. By using the normalized cross-correlation, the first estimation unit 12B can therefore detect the region R' more stably than with other measures, even when there is a luminance difference between the pair of motion estimation frame images due to the influence of flash light.

    RZNCC = Σ(T(i,j) − Tave)(I(i,j) − Iave) / sqrt( Σ(T(i,j) − Tave)² · Σ(I(i,j) − Iave)² )   (8)
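For reference, OpenCV's TM_CCOEFF_NORMED mode of matchTemplate computes a zero-mean normalized correlation of this kind. The sketch below is illustrative only; the file names and region coordinates are placeholders.

    import cv2

    img_I  = cv2.imread("frame_I.png", cv2.IMREAD_GRAYSCALE)
    img_Ip = cv2.imread("frame_I_prime.png", cv2.IMREAD_GRAYSCALE)
    x, y, w, h = 120, 80, 32, 32                 # rectangular region R on image I (placeholder)
    template = img_I[y:y + h, x:x + w]

    # TM_CCOEFF_NORMED subtracts the template and window means, so the score is tolerant
    # to the brightness offset caused by flash light, as described above.
    score = cv2.matchTemplate(img_Ip, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(score)
    region_R_prime = (max_loc[0], max_loc[1], w, h)  # best-matching region R' on image I'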
Alternatively, the first estimation unit 12B may detect the pixel P' corresponding to the pixel P at the center coordinates of the region R by using the optical flow. For example, the first estimation unit 12B takes a representative value (a weighted mean or a median) of the optical flows estimated at the pixels in the region R as the movement amount of the region R, and takes the pixel P' reached by moving the pixel P by the movement amount of the region R as the corresponding point.
(12B-3) Detection method 3
The first estimation unit 12B extracts a pixel P corresponding to a feature point from the image I, and takes the pixel P' on the image I' corresponding to the destination of the pixel P as the corresponding point. The first estimation unit 12B may use, for example, corner points detected by the Harris corner detection algorithm as the feature points. The Harris corner detection algorithm extracts points at which the positive local maximum of the Harris operator dst(x, y), shown below, is large, based on the knowledge that "at a point on an edge the first derivative (difference) is large in only one direction, whereas at a point on a corner the first derivative is large in several directions":

    dst(x, y) = Gσ(fx²)·Gσ(fy²) − Gσ(fx·fy)² − k·(Gσ(fx²) + Gσ(fy²))²
Here, fx and fy denote the first derivatives (differences) in the x and y directions, respectively, and Gσ denotes smoothing with a Gaussian distribution of standard deviation σ. The constant k is set empirically to a value between 0.04 and 0.15.
The first estimation unit 12B may identify the corresponding point based on the optical flow detected at the feature point. Alternatively, when an image feature (for example, a SIFT (Scale-Invariant Feature Transform) feature) extracted from an image patch containing a feature point of the image I is similar to an image feature extracted from some image patch of the image I', the first estimation unit 12B may take the center of that image patch as the corresponding point P'.
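As an illustrative sketch (not part of the original specification), Harris-type feature points can be extracted and tracked with OpenCV; file names and parameter values are placeholders.

    import cv2

    img_I  = cv2.imread("frame_I.png", cv2.IMREAD_GRAYSCALE)
    img_Ip = cv2.imread("frame_I_prime.png", cv2.IMREAD_GRAYSCALE)

    # Feature points P on image I detected with the Harris criterion (k = 0.04, an empirical value).
    corners = cv2.goodFeaturesToTrack(img_I, maxCorners=300, qualityLevel=0.01,
                                      minDistance=5, useHarrisDetector=True, k=0.04)

    # Corresponding points P' on image I' from the optical flow at the feature points.
    next_pts, status, _ = cv2.calcOpticalFlowPyrLK(img_I, img_Ip, corners, None)
    P = corners[status.ravel() == 1].reshape(-1, 2)
    P_prime = next_pts[status.ravel() == 1].reshape(-1, 2)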
The first estimation unit 12B may calculate the affine transformation parameters from the three most reliable combinations of corresponding points among those detected by the above methods, or may calculate them by the least squares method from three or more combinations of corresponding points. Alternatively, the first estimation unit 12B may calculate the affine transformation parameters using a robust estimation method such as RANSAC (RANdom SAmple Consensus). In RANSAC, three combinations are selected at random from the many combinations of corresponding points to compute provisional affine transformation parameters, and when a large number of the remaining combinations are consistent with the provisional parameters, those parameters are adopted as the true affine transformation parameters. The first estimation unit 12B may also exclude specific image regions from the calculation of the affine transformation parameters. Such image regions are, for example, regions in which the detection accuracy of corresponding points is known to be low, such as the edges of the image, which are likely to fall outside the shooting range when the camera moves, or flat portions whose luminance differs little from neighboring pixels. Alternatively, such image regions are regions in which pixel values change due to factors other than camera motion, such as the central area of the screen, where a moving subject is likely to appear, or portions illuminated by fixed lighting whose color changes.
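As an illustrative sketch of robust affine estimation (not part of the original specification), OpenCV's estimateAffine2D can fit an affine model with RANSAC while rejecting outlier correspondences; the synthetic data below only serves to make the snippet self-contained.

    import cv2
    import numpy as np

    # Synthetic correspondences: points P on image I, points P' generated by a known affine
    # motion, plus a few outliers standing in for a moving subject or mismatches.
    rng = np.random.default_rng(0)
    P = rng.uniform(0, 640, size=(100, 2)).astype(np.float32)
    true_M = np.array([[1.0, 0.02, 5.0], [-0.02, 1.0, -3.0]], dtype=np.float32)
    Pp = P @ true_M[:, :2].T + true_M[:, 2]
    Pp[:10] += rng.uniform(-40, 40, size=(10, 2)).astype(np.float32)   # outliers

    # RANSAC: random minimal subsets are tried and the affine model with the largest inlier
    # support is adopted, as in the provisional-parameter procedure described above.
    M, inliers = cv2.estimateAffine2D(P, Pp, method=cv2.RANSAC, ransacReprojThreshold=3.0)
    # M is the 2x3 matrix of the camera-motion affine transform; inliers marks consistent pairs.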
The combinations of (12B-1), (12B-2), (12B-3) with (12A-1), (12A-2), (12A-3) described above are not particularly limited. That is, the first estimation unit 12B may apply (12B-1), (12B-2) or (12B-3) to the motion estimation frame images selected by any of the methods (12A-1), (12A-2) and (12A-3). In addition to the motion estimation based on the image processing described above, the first estimation unit 12B may use camera motion information acquired by a measuring device mounted on the camera (a gyroscope, a depth sensor, or the like).
Second estimation unit 12C
The second estimation unit 12C obtains the image motion caused by the motion of the subject by detecting a subject region from one of the pair of motion estimation frame images and estimating the corresponding region (the region corresponding to the subject region) from the other. Alternatively, the second estimation unit 12C may generate a converted image by applying an affine transformation to one or both of the pair of motion estimation frame images, and detect the subject region from one frame image of the pair or from its converted image. In this case, the second estimation unit 12C may obtain the image motion caused by the motion of the subject by estimating the corresponding region in the other frame image of the pair or in its converted image.
That is, the second estimation unit 12C detects the pair consisting of the subject region and the corresponding region by subtracting the image movement amount caused by camera motion, based on the affine transformation parameters and the pair of motion estimation frame images. Based on this pair, the second estimation unit 12C estimates the image movement amount caused by the motion of the subject.
Examples of methods for detecting the subject region include the following.
(12C-1-1) Detection method 1
The second estimation unit 12C detects, from one of the pair of motion estimation frame images, an image (a set of pixels) that moves differently from the movement amount estimated by the affine transformation parameters, as the subject region.
Specifically, using Equation (7), the second estimation unit 12C calculates, for a pixel P of the image I, the prediction vector (u, v) from the image I to the image I' based on the affine transformation parameters calculated between the image I and the image I'. When the difference between the vector (x'−x, y'−y) from the pixel P to the pixel P' and the vector (u, v) is equal to or larger than a certain value, the second estimation unit 12C takes the pixel P as a candidate point. Here, calculating the difference between the vectors corresponds to subtracting the image movement amount caused by camera motion. The second estimation unit 12C detects the set of candidate points as the subject region of the image I.
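A minimal sketch of this candidate-point test (not part of the original specification) is shown below; it compares a dense optical flow with the displacement predicted by the affine parameters, and the file names, the example matrix M and the threshold value are placeholders.

    import cv2
    import numpy as np

    img_I  = cv2.imread("frame_I.png", cv2.IMREAD_GRAYSCALE)
    img_Ip = cv2.imread("frame_I_prime.png", cv2.IMREAD_GRAYSCALE)
    M = np.array([[1.0, 0.0, 4.0], [0.0, 1.0, 0.0]])     # camera-motion affine parameters (example)

    h, w = img_I.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)

    # Displacement (u, v) predicted from the affine parameters (camera motion only).
    pred_u = M[0, 0] * xs + M[0, 1] * ys + M[0, 2] - xs
    pred_v = M[1, 0] * xs + M[1, 1] * ys + M[1, 2] - ys

    # Observed dense optical flow between the pair.
    flow = cv2.calcOpticalFlowFarneback(img_I, img_Ip, None, 0.5, 3, 15, 3, 5, 1.2, 0)

    # Pixels whose observed motion deviates from the camera-motion prediction by more than a
    # threshold become candidate points; their set is the subject region.
    residual = np.hypot(flow[..., 0] - pred_u, flow[..., 1] - pred_v)
    subject_mask = residual > 2.0                        # threshold in pixels (illustrative)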
(12C-1-2) Detection method 2
For the pair of motion estimation frame images, the second estimation unit 12C detects, as the subject region in both converted images, the regions in which the difference is large between the converted image generated by applying an affine transformation to one of the pair and the converted image generated by applying an affine transformation (inverse transformation) to the other frame image.
Specifically, using Equation (7), the second estimation unit 12C generates a predicted image Ip at an arbitrary time t from the image I based on the affine transformation parameters calculated between the image I and the image I'. Similarly, the second estimation unit 12C generates a predicted image Ip' at the time t from the image I' based on the affine transformation parameters calculated between the image I and the image I'. The second estimation unit 12C calculates the difference between the predicted images Ip and Ip', and detects the set of pixels for which the absolute value of the difference is equal to or larger than a certain value as the subject region in each of the predicted images Ip and Ip'.
Note that the second estimation unit 12C can generate a pixel (xp, yp) of the predicted image Ip by substituting the pixel (x, y) of the image I into Equation (9), where the affine transformation parameters between the image I and the image Ip are (θp, ap, bp, dp, tpx, tpy):

    (xp, yp)ᵀ = R(θp)·[[ap, bp], [0, dp]]·(x, y)ᵀ + (tpx, tpy)ᵀ   (9)
Here, (θp, ap, bp, dp, tpx, tpy) can be calculated by relational expressions from the affine transformation parameters (θ, a, b, d, tx, ty) from the image I to the image I', using the time difference T between the image I and the image I' and the time difference Tp between the image I and the image Ip.
Note, however, that the above relational expressions assume that the camera motion is at constant velocity. When the rate of change of the camera motion is known, the second estimation unit 12C may calculate (θp, ap, bp, dp, tpx, tpy) by weighting with that rate of change.
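Because the relational expressions themselves are not reproduced here, the sketch below simply interpolates the 2x3 affine matrix toward the identity by the time ratio Tp/T; this linear interpolation is an assumption made for illustration under the constant-velocity premise, not the exact formula of the specification.

    import cv2
    import numpy as np

    def predict_at_time(img_I, M_I_to_Iprime, T, Tp):
        # Warp image I to the time of the frame of interest, assuming constant camera motion
        # and approximating the intermediate transform by linear interpolation of the matrix.
        s = float(Tp) / float(T)
        identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
        Mp = (1.0 - s) * identity + s * np.asarray(M_I_to_Iprime, dtype=np.float64)
        h, w = img_I.shape[:2]
        return cv2.warpAffine(img_I, Mp, (w, h))   # predicted image Ip at the intermediate time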
The second estimation unit 12C can likewise generate a pixel (xp', yp') of the predicted image Ip' by substituting the pixel (x', y') of the image I' into Equation (10), where the affine transformation parameters between the image I' and the image Ip' are (θp', ap', bp', dp', tpx', tpy'):

    (xp', yp')ᵀ = R(θp')·[[ap', bp'], [0, dp']]·(x', y')ᵀ + (tpx', tpy')ᵀ   (10)
Here, (θp', ap', bp', dp', tpx', tpy') are obtained by relational expressions from the affine transformation parameters (θ', a', b', d', tx', ty') from the image I' to the image I, using the time difference T between the image I and the image I' and the time difference Tp' between the image I and the image Ip'.
(12C-1-3) Detection method 3
The second estimation unit 12C may detect, as the subject region in each of the converted image and the frame image, the regions in which the difference is large between the converted image generated by applying an affine transformation to one of the pair of motion estimation frame images and the other frame image. This detection method is a variant of (12C-1-2).
Specifically, using Equation (7), the second estimation unit 12C generates a predicted image at time t+k from the image I based on the affine transformation parameters calculated between the image I and the image I', and calculates the difference from the image I'.
After detecting the subject region, the second estimation unit 12C estimates the corresponding region corresponding to the detected subject region. Examples of methods for estimating the corresponding region of the subject region include the following. The second estimation unit 12C may use each method alone or in combination.
(12C-2-1) Estimation method 1
The second estimation unit 12C calculates, for all pixels of the subject region detected from one of the pair of motion estimation frame images, the optical flow with respect to the other frame image, and detects the region reached by moving the subject region by the weighted average of the optical flow as the corresponding region. Alternatively, the second estimation unit 12C may calculate the optical flow, for all pixels of the subject region detected from the converted image generated by applying an affine transformation to one of the pair, with respect to the other frame image or its converted image.
As the weights used in calculating the weighted average of the optical flow, the second estimation unit 12C may give a higher weight to the optical flow of pixels close to the center of gravity of the subject region. The second estimation unit 12C may give a higher weight to the optical flow of pixels within the subject region whose luminance gradient with respect to their surroundings is large, or to the optical flow of pixels for which the variance of the direction or magnitude with respect to the optical flows calculated at surrounding pixels is small. Alternatively, the second estimation unit 12C may exclude, as outliers, a certain number of the optical flows of the subject region whose magnitude is above or below a certain value, and give equal weights to the remaining optical flows. By setting the weights based on the luminance gradient or on the variance of the direction or magnitude of the optical flow, the second estimation unit 12C can estimate the position of the corresponding region based on highly reliable optical flows.
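A minimal sketch of the weighted-average computation (not part of the original specification), using the luminance-gradient magnitude as the weight; the array names are hypothetical.

    import numpy as np

    def region_displacement(flow, subject_mask, grad_mag):
        # flow: HxWx2 per-pixel optical flow, subject_mask: boolean HxW subject region,
        # grad_mag: HxW luminance-gradient magnitude used as the weight (pixels with a
        # stronger gradient receive a higher weight, as described above).
        w = grad_mag[subject_mask].astype(np.float64)
        w = w / (w.sum() + 1e-9)
        du = float((flow[..., 0][subject_mask] * w).sum())
        dv = float((flow[..., 1][subject_mask] * w).sum())
        return du, dv   # the corresponding region is the subject region shifted by (du, dv)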
(12C-2-2) Estimation method 2
The second estimation unit 12C detects the corresponding region by template matching in which the subject region detected in one of the pair of motion estimation frame images, or in its affine-transformed converted image, is used as a template to scan the other frame image or its affine-transformed converted image. As the similarity measure used for template matching, the second estimation unit 12C may use any of the measures described in (12B-2), or may use another method.
Alternatively, the second estimation unit 12C may detect the corresponding region based on the distance (Euclidean distance) between image features representing color or texture. For example, the second estimation unit 12C may extract an image feature from the subject region detected in one of the pair of motion estimation frame images, and detect, as the corresponding region, a region of the other frame image whose image feature has a short distance to it.
Alternatively, the second estimation unit 12C may roughly estimate the position of the corresponding region by template matching using the entire subject region as a template, and then search the surroundings again for each partial region generated by dividing the subject region to determine the corresponding region.
(12C-2-3) Estimation method 3
The second estimation unit 12C detects feature points from the subject region detected in one of the pair of motion estimation frame images or in its affine-transformed converted image, and detects the optical flow by detecting the points corresponding to those feature points in the other frame image or its converted image. The second estimation unit 12C detects, as the corresponding region, the region reached by moving the subject region by the weighted average of the detected optical flows. The second estimation unit 12C may use, for example, Harris corner points as the feature points, or may use feature points detected by another method.
The combinations of (12C-2-1), (12C-2-2), (12C-2-3) with (12C-1-1), (12C-1-2), (12C-1-3) described above are not particularly limited. That is, the second estimation unit 12C may apply (12C-2-1), (12C-2-2) or (12C-2-3) to the subject region detected by any of the methods (12C-1-1), (12C-1-2) and (12C-1-3).
After detecting the subject region and estimating the corresponding region, the second estimation unit 12C estimates the motion of the subject. Examples of methods for estimating the motion of the subject include the following.
(12C-3-1) Motion estimation method 1
When the second estimation unit 12C has detected, from one of the pair of motion estimation frame images, a set of pixels that move differently from the movement amount estimated by the affine transformation parameters as the subject region (12C-1-1), it estimates the motion of the subject by the following method. The second estimation unit 12C calculates the difference between the position information (coordinates) representing the position of the subject region and the position information of the corresponding region, and takes this as a provisional movement vector of the subject region. The second estimation unit 12C then calculates the difference between the provisional movement vector and the image movement vector caused by camera motion in the pair of motion estimation frame images, and takes this as the true movement vector of the subject region between the pair.
(12C-3-2) Motion estimation method 2
When the second estimation unit 12C has detected, as the subject region in both converted images, the regions in which the difference is large between the converted images generated by applying an affine transformation to each of the pair of motion estimation frame images (12C-1-2), it estimates the motion of the subject by the following method. The second estimation unit 12C calculates the difference between the position information of the subject region in one converted image and the position information of the corresponding region detected from the other converted image, and takes this as the true movement vector of the subject between the pair of motion estimation frame images.
(12C-3-3) Motion estimation method 3
When the second estimation unit 12C has detected, as the subject region in both, the regions in which the difference is large between the converted image generated by applying an affine transformation to one of the pair of motion estimation frame images and the other (12C-1-3), it estimates the motion of the subject by the following method. The second estimation unit 12C calculates the difference between the position information of the subject region in the converted image and the position information of the corresponding region detected from the other frame image, and takes this as the true movement vector of the subject between the pair of motion estimation frame images. This estimation method is a variant of (12C-3-2) described above.
The motion estimation unit 12 outputs the estimated motion information to the image generation unit 13. The motion information includes at least one of the motion information caused by camera motion and the motion information caused by subject motion.
When the camera is fixed, the motion information caused by camera motion is unnecessary. When the subject is fixed, the motion information caused by subject motion is unnecessary.
The motion estimation unit 12 outputs, as the motion information caused by camera motion, the time of each frame of the pair of motion estimation frame images used for motion estimation and the affine transformation parameters calculated between the pair. The motion estimation unit 12 outputs the motion information caused by camera motion for each of the pairs of motion estimation frame images on which motion estimation was performed.
The motion estimation unit 12 outputs, as the motion information caused by subject motion, each frame image of the pair of motion estimation frame images used for estimating the motion of the subject and its time, the position information of the subject region, the position information of the region corresponding to the subject region, and the true movement vector of the subject. The position information of the subject region represents coordinates in one of the pair of motion estimation frame images, and the position information of the corresponding region represents coordinates in the other of the pair.
When the detection of the subject region and the estimation of the corresponding region were performed on converted images generated by applying affine transformations to the pair of motion estimation frame images, the motion estimation unit 12 outputs the motion information caused by subject motion as follows. The motion estimation unit 12 outputs the time of each frame of the pair of motion estimation frame images used for estimating the motion of the subject, the position information of the subject region, the position information of the region corresponding to the subject region, and the true movement vector of the subject. The position information of the subject region represents coordinates in the converted image generated by applying an affine transformation to one of the pair of motion estimation frame images, and the position information of the corresponding region represents coordinates in the converted image generated by applying an affine transformation to the other of the pair.
The motion estimation unit 12 outputs the motion information caused by subject motion for each of the pairs of motion estimation frame images on which motion estimation was performed.
<Image generation unit 13>
FIG. 8 is a block diagram showing the configuration of the image generation unit 13.
The image generation unit 13 includes a first correction unit 13A, a second correction unit 13B, and a synthesis unit 13C.
The image generation unit 13 receives, as inputs, the plurality of frame images, the analysis information from the determination unit 11, and the motion information from the motion estimation unit 12. When the frame of interest is determined to be a frame image including a bright region caused by the blinking of light, the image generation unit 13 corrects each motion estimation frame image to an image at the time of the frame image of interest, combines them, and outputs the result as a corrected frame image.
The first correction unit 13A first generates a first corrected image for each motion estimation frame image by correcting the camera motion. The second correction unit 13B then generates a second corrected image for each motion estimation frame image by correcting the subject motion. The synthesis unit 13C combines the second corrected images generated for the motion estimation frame images to generate the corrected frame image.
The first correction unit 13A corrects the camera motion by, for example, the following methods, based on the image data of the pair of motion estimation frame images and the affine transformation parameters calculated between the pair.
Note that when each value of the affine transformation parameters is smaller than a preset threshold, the first correction unit 13A determines that there was no camera motion and need not correct the camera motion. In this case, the first correction unit 13A regards the uncorrected motion estimation frame image as the first corrected image.
(13A-1) Camera motion correction method 1
When the frame images closest to the frame image of interest and containing no bright region, one before and one after, have been selected as the motion estimation frame images (12A-1), the first correction unit 13A generates the first corrected images by the following method. Using the affine transformation parameters calculated between the two selected frame images, the first correction unit 13A generates a corrected frame image from each of these frame images.
Specifically, as described in (12C-1-2), with one of the motion estimation frame images as the image I and the other as the image I', the first correction unit 13A generates the predicted images Ip and Ip' at the time t of the frame image of interest as the first corrected images.
(13A-2) Camera motion correction method 2
When a plurality of frame images have been selected as the motion estimation frame images from each of before and after the frame image of interest (12A-2), the first correction unit 13A generates the first corrected images by the following method. Based on the affine transformation parameters calculated for each pair of motion estimation frame images, the first correction unit 13A generates a first corrected image from each pair.
Specifically, as described in (12C-1-2), with one frame of each motion estimation pair as the image I and the other as the image I', the first correction unit 13A generates the predicted images Ip and Ip' at the time t of the frame image of interest as the first corrected images. For example, when two frames are selected from each of before and after the frame image of interest and motion estimation is performed on two pairs, as shown in FIG. 7, the first correction unit 13A takes the four predicted images at the time of the frame image of interest generated for the selected frames as the first corrected images.
(13A-3) Camera motion correction method 3
When the frame image of interest and one frame image either before or after it have been selected as the motion estimation frame images (12A-3), the first correction unit 13A generates the first corrected image by the following method. Based on the affine transformation parameters calculated between the frame image of interest and the selected frame image, the first correction unit 13A generates the first corrected image from the selected frame image.
Specifically, as described in (12C-1-2), with the frame image selected as the motion estimation frame image as the image I, the first correction unit 13A generates the predicted image Ip at the time t of the frame image of interest as the first corrected image.
The second correction unit 13B corrects the subject motion by updating the pixel information at the position of the subject in the frame image of interest, based on the first corrected image and the true movement vector input from the motion estimation unit 12. Specifically, the second correction unit 13B can correct the subject motion by the following methods.
Note that when each value of the true movement vector of the subject is smaller than a preset threshold, the second correction unit 13B determines that there was no subject motion and need not correct the subject motion. In this case, the second correction unit 13B regards the first corrected image as the second corrected image.
The second correction unit 13B obtains the true movement vector of the subject between each frame image of the pair of motion estimation frame images and the frame image of interest, based on the true movement vector of the subject between the pair and the time information of the pair and of the frame image of interest.
Using the pixel values of the subject region specified in the first corrected image, the second correction unit 13B updates the pixel values at the position reached by moving from the coordinates of the subject region specified in the first corrected image by the true movement vector, and the pixel values at the coordinates of the subject region specified in the first corrected image. The second correction unit 13B thereby generates the second corrected image.
The second correction unit 13B may update the pixel values by replacing the pixel values at the destination with the pixel values of the subject region. The second correction unit 13B may also replace the pixel values at the destination with a weighted average of those pixel values and the pixel values of the subject region, or with a weighted average of the pixel values around the destination and the pixel values of the subject region.
The second correction unit 13B may also replace the pixel values at the coordinates of the subject region with the pixel values at the position reached by moving by the inverse of the true movement vector. The second correction unit 13B may replace the pixel values at the coordinates of the subject region with a weighted average of those values and the pixel values at the position reached by moving by the inverse of the true movement vector, or with a weighted average of the pixel values at that position and its surrounding pixels.
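A minimal sketch of the pixel update (not part of the original specification): the subject-region pixel values are copied to the position shifted by the true movement vector, using simple replacement rather than the weighted-averaging variants also described above; the names are hypothetical.

    import numpy as np

    def update_subject_pixels(corrected, subject_mask, move_vec):
        # corrected: first corrected image (2D array), subject_mask: boolean subject region,
        # move_vec: (dx, dy) true movement vector toward the time of the frame of interest.
        out = corrected.copy()
        dx, dy = int(round(move_vec[0])), int(round(move_vec[1]))
        ys, xs = np.nonzero(subject_mask)
        ty, tx = ys + dy, xs + dx
        ok = (ty >= 0) & (ty < out.shape[0]) & (tx >= 0) & (tx < out.shape[1])
        out[ty[ok], tx[ok]] = corrected[ys[ok], xs[ok]]   # write subject pixels at the destination
        return out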
Note that the true movement vector of the subject between each frame image of the pair of motion estimation frame images and the frame image of interest is obtained by the following expressions, where V is the true movement vector of the subject region between the frame images I1 and I2 constituting the pair of motion estimation frame images, T1 and T2 are the times of the frame images I1 and I2, respectively, and T3 is the time of the frame of interest (T1 < T3 < T2).
True movement vector of the subject from the frame image I1 to the frame image of interest:
    V・(T3−T1)/(T2−T1)   (Equation 11)
True movement vector of the subject from the frame image I2 to the frame image of interest:
    −V・(T2−T3)/(T2−T1)   (Equation 12)
The second correction unit 13B can also specify the subject image in the first corrected image by determining that the pixels of the first corrected image corresponding to the pixels determined to belong to the subject region in the motion estimation frame image are pixels of the subject region.
The synthesis unit 13C can generate the corrected frame image by combining a plurality of second corrected images. For example, the synthesis unit 13C can generate the corrected frame image Ic by Equation (13), where N is the number of second corrected images, Ii (i = 1, ..., N) are the second corrected images, and wi are the weights. The weight wi is larger as the absolute value |Di| of the time difference Di between the motion estimation frame image corresponding to the second corrected image and the frame image of interest is smaller.

    Ic = Σ(i=1..N) wi·Ii / Σ(i=1..N) wi   (13)
Note that the synthesis unit 13C may calculate wi using a function that increases linearly as |Di| decreases.
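A minimal sketch of the combination step (not part of the original specification): since the exact weight function is not reproduced here, a simple 1/(1+|Di|) weighting is used as a stand-in that likewise grows as |Di| shrinks.

    import numpy as np

    def blend_corrected_images(corrected_images, time_diffs):
        # corrected_images: list of second corrected images Ii, time_diffs: list of Di values
        # (time difference between each motion estimation frame image and the frame of interest).
        imgs = [np.asarray(im, dtype=np.float64) for im in corrected_images]
        w = np.array([1.0 / (1.0 + abs(d)) for d in time_diffs], dtype=np.float64)
        w = w / w.sum()                      # normalized weights, larger for smaller |Di|
        out = np.zeros_like(imgs[0])
        for wi, im in zip(w, imgs):
            out += wi * im
        return out                           # corrected frame image Ic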
<Image composition unit 14>
The image composition unit 14 combines the frame image of interest and the corrected frame image, and generates and outputs a frame image in which blinking due to flash or the like is suppressed (hereinafter referred to as the "output frame image").
When the frame image of interest has been determined to be an image including a bright region and a corrected frame image has been generated, the image composition unit 14 calculates a composition ratio for each pixel and generates the output image by composition processing. Otherwise, the image composition unit 14 outputs the input frame image of interest as the output frame image as it is. Given the composition ratio u(x, y) at the pixel of interest It(x, y) at the position (x, y), the image composition unit 14 calculates the value Iout(x, y) of the output frame image at the same position by Equation (14), as a combination of It(x, y) and the corresponding pixel of the corrected frame image weighted according to u(x, y).
The image composition unit 14 can calculate the composition ratio using the change rate of the local-region luminance between the frame image of interest and the corrected frame image. The image composition unit 14 can calculate this change rate rt-es by a method similar to the one with which the determination unit 11 calculates the change rate of the local-region luminance. The image composition unit 14 can then calculate the composition ratio u(x, y) at the pixel of interest at the position (x, y) by Equation (15), using the change rate rt-es(x, y) of the local-region luminance at the same position (x, y) and a preset value rtar(x, y) of the change rate of the local-region luminance in the output frame image corresponding to the value of rt-es(x, y). The image composition unit 14 calculates the composition ratio u(x, y) so that the change rate of the local-region luminance in the output frame image becomes rtar(x, y).
One example of a method of setting the value rtar of the change rate of the local-region luminance in the output frame image is, as shown in the graph of FIG. 9, to set rtar = rt-es for relatively small values of rt-es and, for large values of rt-es, to hold rtar at a predetermined maximum value so that it does not exceed that value.
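A minimal sketch of the composition step (not part of the original specification): since Equations (14) and (15) are not reproduced here, this sketch assumes that the output is the convex combination u·Ic + (1 − u)·It and chooses u so that the blended local luminance reaches the capped target rate rtar of FIG. 9; these assumptions and the parameter values are illustrative only.

    import numpy as np

    def compose_output(I_t, I_c, r_t_es, r_max=1.2):
        # I_t, I_c: luminance (single-channel) arrays of the frame of interest and the corrected
        # frame image; r_t_es: per-pixel local-region luminance change rate of I_t relative to I_c.
        I_t = np.asarray(I_t, dtype=np.float64)
        I_c = np.asarray(I_c, dtype=np.float64)
        r_tar = np.minimum(r_t_es, r_max)            # cap the target change rate as in FIG. 9
        u = np.zeros_like(r_t_es, dtype=np.float64)
        exceed = r_t_es > r_max
        # With I_out = u*I_c + (1-u)*I_t, the blended change rate is u + (1-u)*r; choosing
        # u = (r - r_tar) / (r - 1) where the cap is exceeded brings it down to r_tar.
        u[exceed] = (r_t_es[exceed] - r_tar[exceed]) / (r_t_es[exceed] - 1.0)
        return u * I_c + (1.0 - u) * I_t             # output frame image I_out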
The image composition unit 14 may instead calculate the composition ratio using the change rate of the rectangular-region luminance. Specifically, the image composition unit 14 first calculates the composition ratio U for each rectangular region from the change rate Rt-es of the rectangular-region luminance, calculated by a method similar to that of the determination unit 11, and the preset change rate of the rectangular-region luminance of the output frame image corresponding to the value of Rt-es. The image composition unit 14 then obtains the composition ratio u for each pixel from the composition ratio U for each rectangular region using linear interpolation or bicubic interpolation.
[Operation]
Next, the operation of this exemplary embodiment will be described with reference to FIGS. 1 and 10.
The determination unit 11 determines whether the frame image of interest at time t is a frame image including a bright region caused by the blinking of light from a flash or the like that may induce a photosensitivity seizure (S11).
The motion estimation unit 12 selects motion estimation frame images from a plurality of frame images including the frame image of interest, and estimates the amount of image movement caused by the motion of the camera and the subject between the motion estimation frame images (S12).
Based on the pixel movement amounts caused by the motion of the camera and the subject estimated between the motion estimation frame images, the image generation unit 13 estimates the amount of image movement caused by the camera and the subject between each motion estimation frame image and the frame image of interest. The image generation unit 13 then converts each motion estimation frame image into an image at the time of the frame image of interest and combines the converted images to generate a corrected frame image (S13).
The image composition unit 14 combines the frame image of interest and the corrected frame image, and generates and outputs an output frame image in which blinking due to flash or the like is suppressed (S14).
[Effects]
The video processing device 100 according to this exemplary embodiment can generate, for a video containing large luminance changes that may induce a photosensitivity seizure, a natural video in which the luminance fluctuation is suppressed.
The reason is that, for a frame image of interest including a region with a large luminance change, the video processing device 100 combines a frame image without the luminance change, estimated from other frame images, while changing the weight for each pixel. The video processing device 100 can thereby correct only the regions with a large luminance change and restore information lost due to blinking or the like.
Blinking caused by flashes occurs, for example, at press conferences. At a press conference, the subject (the person giving the conference) walks to the seat, sits down, and leaves after the conference. The camera follows the subject through this series of actions, so the shooting range of the camera moves to track the subject.
If images are combined without taking the motion of the camera or the subject into account, blurring or smearing of contours occurs. When such a video is played back, only the frames in which the luminance has been suppressed look as if the contour of the subject has thickened and expanded due to the blur, and the smoothness of the motion is impaired.
Because the video processing device 100 corrects the images by estimating the motion of the camera and the subject, it can suppress blurring and smearing of contours and generate a smooth video.
[Another embodiment]
In the exemplary embodiment described above, an example was described in which the blinking region is a bright region that becomes brighter (its luminance increases) in the frame image of interest than in the other frame images by a predetermined level or more. However, the video processing device 100 can be applied in the same way when the blinking region is a dark region that becomes darker (its luminance decreases) in the frame image of interest than in the other frame images by a predetermined level or more.
When flashes are fired sporadically, bright regions such as those described above occur. On the other hand, as the number of flashes increases, the overall luminance increases, and when many flashes are fired intermittently, dark regions occur momentarily.
The determination unit 11 determines whether there is a region in which the frame image of interest at time t becomes darker by a predetermined level or more than the frame image at time (t+k) among the plurality of input frame images. For example, using a preset luminance-variation-rate threshold α' and area-rate threshold β', the determination unit 11 makes the determination according to whether the area rate of the regions in which the change rate rt-t+k of the local-region luminance falls below the threshold α' exceeds the threshold β'.
When it is determined that there is a region in which the frame image of interest at time t becomes significantly darker than the frame image at time (t+k), the determination unit 11 may set the determination flag flagt-t+k to "1"; otherwise, the determination unit 11 may set the determination flag flagt-t+k to "0". The determination unit 11 calculates the determination flag for the combinations of the frame image of interest and all the other input frame images. When frame images with the determination flag "1" exist at times both before and after the frame image of interest, the determination unit 11 determines that the frame image of interest is a frame image including a dark region caused by the blinking of light.
 As another method, the determination unit 11 may use the change rate of the rectangular-region luminance. For example, using the preset luminance variation rate threshold α' and area rate threshold β', the determination unit 11 sets the determination flag flag_t-t+k to "1" or "0" according to whether the area rate of the regions whose rectangular-region luminance change rate falls below the threshold α' exceeds the threshold β'.
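 Purely as an illustrative sketch of this judgment, the following Python fragment computes such a flag with NumPy. The function name dark_region_flag, the block-based averaging, the ratio-of-means definition of the change rate, and the default values of α' and β' are assumptions made for the example, not values fixed by the embodiment.

```python
import numpy as np

def dark_region_flag(luma_t, luma_tk, alpha_dash=0.5, beta_dash=0.1, block=16):
    """Return 1 if the frame at time t is darker than the frame at time t+k
    over a sufficiently large area, 0 otherwise (illustrative sketch)."""
    h, w = luma_t.shape
    dark_blocks = 0
    total_blocks = 0
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            m_t = luma_t[y:y + block, x:x + block].mean()
            m_tk = luma_tk[y:y + block, x:x + block].mean()
            # local-region luminance change rate r_t-t+k (assumed here: ratio of mean luminances)
            r = m_t / (m_tk + 1e-6)
            if r < alpha_dash:          # darker than the threshold alpha'
                dark_blocks += 1
            total_blocks += 1
    area_rate = dark_blocks / max(total_blocks, 1)
    return 1 if area_rate > beta_dash else 0   # compare with the area-rate threshold beta'
```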
 Furthermore, the embodiment described above dealt with variations in luminance, that is, with general (white) flashes. However, the video processing apparatus 100 can be applied in the same way to variations in saturation, such as red flashes. Accordingly, the embodiment described above may include modes in which "luminance" is replaced with "saturation" or with "luminance or saturation".
 [Others]
 The embodiments according to the present invention can be applied to a video editing system that edits video recorded on a hard disk or the like. By operating on frame images held in memory, the embodiments can also be applied to video cameras, display terminals, and similar devices.
 As is also clear from the description above, each unit of the embodiments according to the present invention can be implemented in hardware, but can also be realized by a computer program. In that case, the video processing apparatus 100 realizes the same functions and operations as in the embodiments described above by means of a processor that operates according to a program stored in a program memory. It is also possible to realize only some of the functions of the embodiments described above by a computer program.
 FIG. 11 is a block diagram illustrating a hardware configuration of a computer apparatus 200 that implements the video processing apparatus 100. The computer apparatus 200 includes a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, a storage device 204, a drive device 205, a communication interface 206, and an input/output interface 207. The video processing apparatus 100 can be realized by the configuration shown in FIG. 11 (or a part of it).
 The CPU 201 executes a program 208 using the RAM 203. The program 208 may be stored in the ROM 202. The program 208 may also be recorded on a recording medium 209 such as a flash memory and read by the drive device 205, or transmitted from an external device via a network 210. The communication interface 206 exchanges data with external devices via the network 210. The input/output interface 207 exchanges data with peripheral devices (an input device, a display device, and the like). The communication interface 206 and the input/output interface 207 can function as means for acquiring or outputting data.
 Note that the video processing apparatus 100 may be configured by a single circuit (such as a processor) or by a combination of a plurality of circuits. The circuitry here may be either dedicated or general-purpose.
 Part or all of the embodiments described above can also be described as in the following supplementary notes, but are not limited to the following.
 (Appendix 1)
 A video processing apparatus comprising:
 determination means for determining whether any of a plurality of temporally consecutive frame images is a frame image of interest that includes a blinking region whose luminance or saturation differs from the preceding and succeeding frame images by a predetermined level or more;
 motion estimation means for estimating a first movement amount caused by camera motion and/or a second movement amount caused by subject motion, based on a pair of frame images selected, on the basis of a difference in luminance or saturation, from the frame image of interest and the frame images before and after it;
 image generation means for generating a corrected frame image corresponding to a frame image at the shooting time of the frame image of interest, based on the selected pair and the estimated first movement amount and/or second movement amount; and
 image composition means for compositing the frame image of interest and the corrected frame image.
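 Read as a processing pipeline, the means of Appendix 1 could be wired together roughly as sketched below. The stage functions are passed in as parameters because the appendix does not prescribe their internals; every name here is a placeholder introduced for the example, not part of the disclosed apparatus.

```python
from typing import Callable, List, Sequence, Tuple
import numpy as np

def suppress_blinking(
    frames: Sequence[np.ndarray],
    is_frame_of_interest: Callable[[Sequence[np.ndarray], int], bool],
    select_pair: Callable[[Sequence[np.ndarray], int], Tuple[np.ndarray, np.ndarray]],
    estimate_motion: Callable[[Tuple[np.ndarray, np.ndarray]], tuple],
    generate_corrected: Callable[[Tuple[np.ndarray, np.ndarray], tuple, int], np.ndarray],
    composite: Callable[[np.ndarray, np.ndarray], np.ndarray],
) -> List[np.ndarray]:
    """Wire the determination, estimation, generation, and composition stages together."""
    output = []
    for t, frame in enumerate(frames):
        if not is_frame_of_interest(frames, t):
            output.append(frame)              # unaffected frames pass through unchanged
            continue
        pair = select_pair(frames, t)         # pair chosen by luminance/saturation difference
        motion = estimate_motion(pair)        # (first movement amount, second movement amount)
        corrected = generate_corrected(pair, motion, t)
        output.append(composite(frame, corrected))
    return output
```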
 (Appendix 2)
 The video processing apparatus according to Appendix 1, wherein the motion estimation means includes selection means for selecting at least one frame image of the pair from among frame images other than the frame image of interest.
 (Appendix 3)
 The video processing apparatus according to Appendix 2, wherein the motion estimation means includes first estimation means for calculating a geometric transformation parameter based on the positional relationship of corresponding points or corresponding regions detected between the frame images of the pair, and estimating the first movement amount.
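 One conceivable concrete form of such a first estimation means, offered only as an assumption and not prescribed by the appendix, is a RANSAC-fitted homography over tracked feature correspondences, for example with OpenCV:

```python
import cv2
import numpy as np

def estimate_camera_motion(gray_a, gray_b):
    """Estimate a geometric transformation (here: a 3x3 homography) between two
    grayscale frames from tracked corresponding points (illustrative sketch)."""
    pts_a = cv2.goodFeaturesToTrack(gray_a, maxCorners=500, qualityLevel=0.01, minDistance=8)
    pts_b, status, _ = cv2.calcOpticalFlowPyrLK(gray_a, gray_b, pts_a, None)
    good_a = pts_a[status.flatten() == 1]
    good_b = pts_b[status.flatten() == 1]
    # Robust fit, so that correspondences on the moving subject are treated as outliers
    H, _ = cv2.findHomography(good_a, good_b, cv2.RANSAC, ransacReprojThreshold=3.0)
    return H
```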
 (Appendix 4)
 The video processing apparatus according to Appendix 3, wherein the motion estimation means includes second estimation means for detecting a subject region from one frame image of the pair based on the first movement amount, detecting a corresponding region corresponding to that subject region from the other frame image of the pair, and estimating the second movement amount based on the subject region and the corresponding region.
 (Appendix 5)
 The video processing apparatus according to Appendix 3, wherein the motion estimation means includes second estimation means for detecting a subject region from each frame image of the pair by subtracting the first movement amount based on the geometric transformation parameter, and estimating the second pixel movement amount based on the detected subject regions.
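 A minimal sketch of this idea, under the assumptions that the geometric transformation is a homography H, that frames are grayscale, and that the subject displacement can be summarized as the median optical flow inside the residual region, might look like the following; the thresholds and kernel size are illustrative only.

```python
import cv2
import numpy as np

def estimate_subject_motion(gray_a, gray_b, H, diff_thresh=25):
    """Cancel camera motion with the geometric transformation H, take the residual
    difference as the subject region, and estimate the subject displacement as the
    median optical flow inside that region (illustrative sketch)."""
    h, w = gray_b.shape
    aligned_a = cv2.warpPerspective(gray_a, H, (w, h))      # subtract the first movement amount
    diff = cv2.absdiff(aligned_a, gray_b)
    mask = (diff > diff_thresh).astype(np.uint8)            # remaining change = subject region
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    flow = cv2.calcOpticalFlowFarneback(aligned_a, gray_b, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    if mask.any():
        dx = float(np.median(flow[..., 0][mask == 1]))
        dy = float(np.median(flow[..., 1][mask == 1]))
    else:
        dx = dy = 0.0                                       # no detectable subject motion
    return mask, (dx, dy)
```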
 (Appendix 6)
 The video processing apparatus according to any one of Appendixes 1 to 5, wherein the image generation means includes:
 first correction means for generating a first corrected image from each frame image of the pair based on the first movement amount;
 second correction means for generating a second corrected image from each of the first corrected images based on the second movement amount; and
 composition means for compositing each of the second corrected images.
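 The following sketch illustrates such two-stage correction under strong simplifying assumptions: the first movement amount is a homography per pair frame, the second movement amount is a single translation applied inside a subject mask, frames are grayscale, and the pair is composited by averaging. None of these choices is mandated by the appendix.

```python
import cv2
import numpy as np

def generate_corrected_frame(pair_frames, homographies, subj_shifts, masks):
    """Two-stage correction: warp each pair frame by its camera-motion homography
    (first corrected image), then shift its subject region by the subject motion
    (second corrected image), and average the results (illustrative sketch)."""
    corrected = []
    h, w = pair_frames[0].shape[:2]
    for frame, H, (dx, dy), mask in zip(pair_frames, homographies, subj_shifts, masks):
        first = cv2.warpPerspective(frame, H, (w, h))       # first corrected image (camera motion)
        M = np.float32([[1, 0, dx], [0, 1, dy]])
        shifted = cv2.warpAffine(first, M, (w, h))          # apply the second movement amount
        second = np.where(mask > 0, shifted, first)         # second corrected image (subject motion)
        corrected.append(second.astype(np.float32))
    return np.mean(corrected, axis=0).astype(pair_frames[0].dtype)  # composite the pair
```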
 (Appendix 7)
 The video processing apparatus according to any one of Appendixes 1 to 6, wherein the determination means determines, as the frame image of interest, a frame image in which regions whose rate of change in luminance or saturation relative to another frame image is equal to or greater than, or less than, a specified value occupy a specified area or more.
 (Appendix 8)
 The video processing apparatus according to any one of Appendixes 1 to 7, wherein the image composition means calculates the composition ratio for compositing the frame image of interest and the corrected frame image based on a predetermined function.
 (Appendix 9)
 The video processing apparatus according to any one of Appendixes 1 to 8, wherein, as the composition ratio for compositing the frame image of interest and the corrected frame image, the image composition means sets a larger composition ratio for the corrected frame image in regions where the rate of change between the frame image of interest and the corrected frame image is large.
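 For illustration only, such a change-rate-dependent composition ratio could be realized per pixel with a smooth weighting function; the logistic shape and the constants k and r0 below are assumptions made for the example, not values given by the appendix.

```python
import numpy as np

def composite_with_adaptive_ratio(frame_of_interest, corrected, k=10.0, r0=0.3):
    """Blend the frame of interest with the corrected frame, giving the corrected
    frame a larger weight where the per-pixel rate of change is large
    (illustrative sketch; grayscale arrays with values in [0, 255] assumed)."""
    f = frame_of_interest.astype(np.float32)
    c = corrected.astype(np.float32)
    change_rate = np.abs(f - c) / (c + 1e-6)                # per-pixel rate of change
    w = 1.0 / (1.0 + np.exp(-k * (change_rate - r0)))       # weight of the corrected frame
    blended = w * c + (1.0 - w) * f
    return blended.astype(frame_of_interest.dtype)
```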
 (Appendix 10)
 A video processing method comprising:
 estimating a first movement amount caused by camera motion and/or a second movement amount caused by subject motion, based on a pair of frame images selected, on the basis of a difference in luminance or saturation, from a frame image of interest and the frame images before and after it;
 generating a corrected frame image corresponding to a frame image at the shooting time of the frame image of interest, based on the selected pair and the estimated first movement amount and/or second movement amount; and
 compositing the frame image of interest and the corrected frame image.
 (Appendix 11)
 The video processing method according to Appendix 10, wherein at least one frame image of the pair is selected from among frame images other than the frame image of interest.
 (Appendix 12)
 The video processing method according to Appendix 11, wherein a geometric transformation parameter is calculated based on the positional relationship of corresponding points or corresponding regions detected between the frame images of the pair, and the first movement amount is estimated.
 (Appendix 13)
 The video processing method according to Appendix 12, wherein a subject region is detected from one frame image of the pair based on the first movement amount, a corresponding region corresponding to that subject region is detected from the other frame image of the pair, and the second movement amount is estimated based on the subject region and the corresponding region.
 (Appendix 14)
 The video processing method according to Appendix 12, wherein a subject region is detected from each frame image of the pair by subtracting the first movement amount based on the geometric transformation parameter, and the second pixel movement amount is estimated based on the detected subject regions.
 (Appendix 15)
 The video processing method according to any one of Appendixes 10 to 14, comprising:
 generating a first corrected image from each frame image of the pair based on the first movement amount;
 generating a second corrected image from each of the first corrected images based on the second movement amount; and
 compositing each of the second corrected images.
 (Appendix 16)
 The video processing method according to any one of Appendixes 10 to 15, wherein a frame image in which regions whose rate of change in luminance or saturation relative to another frame image is equal to or greater than, or less than, a specified value occupy a specified area or more is determined to be the frame image of interest.
 (Appendix 17)
 The video processing method according to any one of Appendixes 10 to 16, wherein the composition ratio for compositing the frame image of interest and the corrected frame image is calculated based on a predetermined function.
 (Appendix 18)
 The video processing method according to any one of Appendixes 10 to 17, wherein, as the composition ratio for compositing the frame image of interest and the corrected frame image, a larger composition ratio is set for the corrected frame image in regions where the rate of change between the frame image of interest and the corrected frame image is large.
 (Appendix 19)
 A video processing program for causing a computer to execute:
 a process of determining whether any of a plurality of temporally consecutive frame images is a frame image of interest that includes a blinking region whose luminance or saturation differs from the preceding and succeeding frame images by a predetermined level or more;
 a process of estimating a first movement amount caused by camera motion and/or a second movement amount caused by subject motion, based on a pair of frame images selected, on the basis of a difference in luminance or saturation, from the frame image of interest and the frame images before and after it;
 a process of generating a corrected frame image corresponding to a frame image at the shooting time of the frame image of interest, based on the selected pair and the estimated first movement amount and/or second movement amount; and
 a process of compositing the frame image of interest and the corrected frame image.
 (Appendix 20)
 The video processing program according to Appendix 19, wherein, in the estimating process, at least one frame image of the pair is selected from among frame images other than the frame image of interest.
 (Appendix 21)
 The video processing program according to Appendix 20, wherein, in the estimating process, a geometric transformation parameter is calculated based on the positional relationship of corresponding points or corresponding regions detected between the frame images of the pair, and the first movement amount is estimated.
 (Appendix 22)
 The video processing program according to Appendix 21, wherein, in the estimating process, a subject region is detected from each frame image of the pair by subtracting the first movement amount based on the geometric transformation parameter, and the second pixel movement amount is estimated based on the detected subject regions.
 (Appendix 23)
 The video processing program according to Appendix 21, wherein, in the estimating process, a subject region is detected from one frame image of the pair based on the first movement amount, a corresponding region corresponding to that subject region is detected from the other frame image of the pair, and the second movement amount is estimated based on the subject region and the corresponding region.
 (Appendix 24)
 The video processing program according to any one of Appendixes 19 to 23, wherein, in the process of generating the corrected frame image:
 a first corrected image is generated from each frame image of the pair based on the first movement amount;
 a second corrected image is generated from each of the first corrected images based on the second movement amount; and
 each of the second corrected images is composited.
 (Appendix 25)
 The video processing program according to any one of Appendixes 19 to 24, wherein, in the determining process, a frame image in which regions whose rate of change in luminance or saturation relative to another frame image is equal to or greater than, or less than, a specified value occupy a specified area or more is determined to be the frame image of interest.
 (Appendix 26)
 The video processing program according to any one of Appendixes 19 to 25, wherein, in the compositing process, the composition ratio for compositing the frame image of interest and the corrected frame image is calculated based on a predetermined function.
 (Appendix 27)
 The video processing program according to any one of Appendixes 19 to 26, wherein, in the compositing process, as the composition ratio for compositing the frame image of interest and the corrected frame image, a larger composition ratio is set for the corrected frame image in regions where the rate of change between the frame image of interest and the corrected frame image is large.
 (Appendix 28)
 A video processing apparatus comprising:
 selection means for selecting a first frame image and a second frame image from a plurality of temporally consecutive frame images;
 first estimation means for calculating a geometric transformation parameter based on the positional relationship of corresponding points or corresponding regions detected between the first frame image and the second frame image, and estimating a first movement amount caused by camera motion; and
 second estimation means for detecting a subject region from the first frame image and the second frame image by subtracting the first movement amount based on the geometric transformation parameter, and estimating a second movement amount caused by subject motion based on the detected subject region.
 (Appendix 29)
 A video processing method comprising:
 selecting a first frame image and a second frame image from a plurality of temporally consecutive frame images;
 calculating a geometric transformation parameter based on the positional relationship of corresponding points or corresponding regions detected between the first frame image and the second frame image, and estimating a first movement amount caused by camera motion; and
 detecting a subject region from the first frame image and the second frame image by subtracting the first movement amount based on the geometric transformation parameter, and estimating a second movement amount caused by subject motion based on the detected subject region.
 (Appendix 30)
 A program for causing a computer to execute:
 a process of selecting a first frame image and a second frame image from a plurality of temporally consecutive frame images;
 a process of calculating a geometric transformation parameter based on the positional relationship of corresponding points or corresponding regions detected between the first frame image and the second frame image, and estimating a first movement amount caused by camera motion; and
 a process of detecting a subject region from the first frame image and the second frame image by subtracting the first movement amount based on the geometric transformation parameter, and estimating a second movement amount caused by subject motion based on the detected subject region.
 Although the present invention has been described above with reference to preferred embodiments, the present invention is not necessarily limited to the above embodiments, and can be modified and implemented in various ways within the scope of its technical idea.
 This application claims priority based on Japanese Patent Application No. 2015-000630 filed on January 6, 2015, the entire disclosure of which is incorporated herein.
 [Description of Symbols]
 11  Determination unit
 12  Motion estimation unit
 12A  Selection unit
 12B  First estimation unit
 12C  Second estimation unit
 13  Image generation unit
 13A  First correction unit
 13B  Second correction unit
 13C  Composition unit
 14  Image composition unit

Claims (10)

  1.  A video processing apparatus comprising:
      determination means for determining whether any of a plurality of temporally consecutive frame images is a frame image of interest that includes a blinking region whose luminance or saturation differs from the preceding and succeeding frame images by a predetermined level or more;
      motion estimation means for estimating a first movement amount caused by camera motion and/or a second movement amount caused by subject motion, based on a pair of frame images selected, on the basis of a difference in luminance or saturation, from the frame image of interest and the frame images before and after it;
      image generation means for generating a corrected frame image corresponding to a frame image at the shooting time of the frame image of interest, based on the selected pair and the estimated first movement amount and/or second movement amount; and
      image composition means for compositing the frame image of interest and the corrected frame image.
  2.  The video processing apparatus according to claim 1, wherein the motion estimation means includes selection means for selecting at least one frame image of the pair from among frame images other than the frame image of interest.
  3.  The video processing apparatus according to claim 2, wherein the motion estimation means includes first estimation means for calculating a geometric transformation parameter based on the positional relationship of corresponding points or corresponding regions detected between the frame images of the pair, and estimating the first movement amount.
  4.  The video processing apparatus according to claim 3, wherein the motion estimation means includes second estimation means for detecting a subject region from one frame image of the pair based on the first movement amount, detecting a corresponding region corresponding to that subject region from the other frame image of the pair, and estimating the second movement amount based on the subject region and the corresponding region.
  5.  The video processing apparatus according to claim 3, wherein the motion estimation means includes second estimation means for detecting a subject region from each frame image of the pair by subtracting the first movement amount based on the geometric transformation parameter, and estimating the second pixel movement amount based on the detected subject regions.
  6.  The video processing apparatus according to any one of claims 1 to 5, wherein the image generation means includes:
      first correction means for generating a first corrected image from each frame image of the pair based on the first movement amount;
      second correction means for generating a second corrected image from each of the first corrected images based on the second movement amount; and
      composition means for compositing each of the second corrected images.
  7.  The video processing apparatus according to any one of claims 1 to 6, wherein the image composition means composites each pixel of the frame image of interest and the corrected frame image at a ratio corresponding to the difference in luminance between those pixels.
  8.  A video processing method comprising:
      determining whether any of a plurality of temporally consecutive frame images is a frame image of interest that includes a blinking region whose luminance or saturation differs from the preceding and succeeding frame images by a predetermined level or more;
      estimating a first movement amount caused by camera motion and/or a second movement amount caused by subject motion, based on a pair of frame images selected, on the basis of a difference in luminance or saturation, from the frame image of interest and the frame images before and after it;
      generating a corrected frame image corresponding to a frame image at the shooting time of the frame image of interest, based on the selected pair and the estimated first movement amount and/or second movement amount; and
      compositing the frame image of interest and the corrected frame image.
  9.  A recording medium recording a program for causing a computer to execute:
      a process of determining whether any of a plurality of temporally consecutive frame images is a frame image of interest that includes a blinking region whose luminance or saturation differs from the preceding and succeeding frame images by a predetermined level or more;
      a process of estimating a first movement amount caused by camera motion and/or a second movement amount caused by subject motion, based on a pair of frame images selected, on the basis of a difference in luminance or saturation, from the frame image of interest and the frame images before and after it;
      a process of generating a corrected frame image corresponding to a frame image at the shooting time of the frame image of interest, based on the selected pair and the estimated first movement amount and/or second movement amount; and
      a process of compositing the frame image of interest and the corrected frame image.
  10.  A video processing apparatus comprising:
      selection means for selecting a first frame image and a second frame image from a plurality of temporally consecutive frame images;
      first estimation means for calculating a geometric transformation parameter based on the positional relationship of corresponding points or corresponding regions detected between the first frame image and the second frame image, and estimating a first movement amount caused by camera motion; and
      second estimation means for detecting a subject region from the first frame image and the second frame image by subtracting the first movement amount based on the geometric transformation parameter, and estimating a second movement amount caused by subject motion based on the detected subject region.