WO2007069350A1

WO2007069350A1 - Image encoding and decoding method and device

Info

Publication number: WO2007069350A1
Application number: PCT/JP2006/309233
Authority: WO
Inventors: Mikhail Tsoupko-Sitnikov; Igor Borovikov; Shinichi Yamashita; Masuharu Endo
Original assignee: Monolith Co., Ltd.
Priority date: 2005-12-12
Filing date: 2006-05-08
Publication date: 2007-06-21
Also published as: JPWO2007069350A1

Abstract

An encoding technique includes: a step of performing a matching calculation between the two end image frames of an image group containing three or more image frames; a step of virtually generating, by interpolation, each intermediate image frame sandwiched between the two end image frames, according to the corresponding point information between the two end image frames obtained from the matching calculation; a step of judging, according to a predetermined criterion, which of the virtually generated intermediate image frames differs from the actual intermediate image frame by an allowance value or more; a step, performed when such an intermediate image frame exists, of identifying a region having a large difference on that intermediate image frame; and a step of generating encoded data including the difference information concerning the identified region, the two end image frames, and the corresponding point information.

Description

Specification

Method and apparatus for image coding and decoding

Technical field

[0001] The present invention relates to coding technology and decoding technology for images, in particular moving images.

Background art

Motion Picture Experts Group (MPEG) is a standard technology for moving picture compression.

In MPEG, block matching is used. This matching searches for blocks so as to minimize the difference between blocks. Therefore, although the difference becomes small, areas that actually correspond to each other between frames are not necessarily associated with each other.

Patent Document 1: Patent No. 2927350

Disclosure of the invention

Problem that invention tries to solve

In MPEG, when trying to increase the compression rate, so-called block noise becomes a problem. In order to suppress this noise and to further increase the compression rate by exploiting interframe coherency, the existing block-matching-based technology must be revised. The technology sought should encode images so that areas or pixels that truly correspond to each other are correctly associated, and should avoid simple block matching.

Means to solve the problem

The objects of the present invention are as follows. The first is to provide a moving picture compression technology, that is, a moving picture coding technology, that does not generate the block noise that is a problem in MPEG. The present invention also provides a moving picture decoding technique corresponding to that coding technique. Another object is to provide a new moving picture coding and decoding technology that uses an image matching technology different from that of MPEG. Yet another object is to provide image encoding and decoding techniques that differ as a whole while using MPEG-like image matching techniques.

One aspect of the image coding method of the present invention generates an intermediate image frame by interpolation based on corresponding point information between the first and second image frames. If the difference between this intermediate image frame and the actual intermediate image frame is large, a region with a large difference is identified in the image. Next, coded data is generated in a form that includes difference information on the identified region, data of at least the first or second image frame, and the corresponding point information. The decoding technique follows the reverse process.
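The region-identification step described above can be sketched as follows. This is only an illustrative sketch: the block partition, the mean-absolute-difference criterion, and the names `large_difference_blocks`, `block`, and `tolerance` are assumptions made for the example, since this passage does not specify how the difference is measured.

```python
from typing import List, Tuple

Frame = List[List[float]]  # grayscale frame stored as rows of pixel values


def large_difference_blocks(virtual: Frame, actual: Frame,
                            block: int, tolerance: float) -> List[Tuple[int, int]]:
    """Return the (row, col) origins of blocks whose mean absolute difference
    between the interpolated frame and the actual frame is >= tolerance.

    Assumption: both the block partition and the mean-absolute-difference
    criterion are illustrative choices, not taken from the specification.
    """
    height, width = len(actual), len(actual[0])
    flagged = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            total, count = 0.0, 0
            for y in range(top, min(top + block, height)):
                for x in range(left, min(left + block, width)):
                    total += abs(virtual[y][x] - actual[y][x])
                    count += 1
            if total / count >= tolerance:
                flagged.append((top, left))
    return flagged
```

Under these assumptions, only the flagged blocks would need difference information in the coded data; frames with no flagged block can be reproduced from the end frames and the corresponding point information alone.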

[0006] Embodiments in which the above steps are rearranged, in which part or all of the expression is exchanged between a method and a device, or in which the expression is changed into a computer program, a recording medium, or the like are also effective as the present invention.

Effect of the invention

According to the present invention, an effect corresponding to the above object is obtained.

Brief description of the drawings

[FIG. 1] FIGS. 1(a) and 1(b) are images obtained by applying an averaging filter to the faces of two people; FIGS. 1(c) and 1(d) are images of p(5, 0) obtained by the base technology for the faces of the two people; FIGS. 1(e) and 1(f) are the corresponding images of p(5, 1); FIGS. 1(g) and 1(h) are the corresponding images of p(5, 2); and FIGS. 1(i) and 1(j) are the corresponding images of p(5, 3). Each is a photograph of a halftone image shown on a display.

[FIG. 2] FIG. 2(R) shows the original quadrilateral, and FIGS. 2(A), 2(B), 2(C), 2(D), and 2(E) each show an inherited quadrilateral.

FIG. 3 is a diagram showing the relationship between the start point image and the end point image, and the relationship between the mth level and the m−1 level using an inheritance quadrilateral.

[FIG. 4] A diagram showing the relationship between the parameter η and the energy C_f.

[Fig. 5] Figs. 5 (a) and 5 (b) are diagrams showing how to determine whether the mapping at a certain point satisfies the bijective condition from the cross product calculation.

[FIG. 6] A flowchart showing the entire procedure of the prerequisite technology.

[FIG. 7] A flowchart showing the details of S1 in FIG.

[FIG. 8] A flowchart showing the details of S10 in FIG.

[FIG. 9] A diagram showing the correspondence between a part of the image at the m-th level and a part of the image at the (m-1)-th level.

FIG. 10 is a diagram showing a start point hierarchical image generated by the base technology.

[FIG. 11] A diagram showing a procedure of preparation for matching evaluation before proceeding to S2 in FIG.

[FIG. 12] A flowchart showing the details of S2 in FIG.

FIG. 13 is a diagram showing how to determine a submapping at the 0th level.

FIG. 14 is a diagram showing how a submapping is determined at the first level.

FIG. 15 is a flowchart showing the details of S21 in FIG.

[FIG. 16] A diagram showing the behavior of the energy $C^{(m,s)}_f$ corresponding to $f^{(m,s)}$ ($\lambda = i\Delta\lambda$), obtained while changing λ for a certain $f^{(m,s)}$.

[FIG. 17] A diagram showing the behavior of the energy $C^{(n)}_f$ corresponding to $f^{(n)}$ ($\eta = i\Delta\eta$) (i = 0, 1, ...), obtained while changing η.

[Fig. 18] This is a flowchart for obtaining the submapping at the m-th level in the improved base technology.

FIG. 19 is a diagram showing a flow of an image coding technology and a configuration of an image coding apparatus according to the first embodiment.

[FIG. 20] FIGS. 20 (a) to 20 (c) are diagrams showing examples of target image frames.

FIG. 21 is a diagram showing a data format of the image coding technology according to the first embodiment.

FIG. 22 is a diagram showing a flow of image decoding technology and a configuration of the image decoding apparatus according to the first embodiment.

FIG. 23 is a diagram showing a flow of image coding technology and a configuration of the image coding apparatus according to the second embodiment.

FIG. 24 is a diagram showing a flow of image decoding technology and a configuration of the image decoding device according to a second embodiment.

FIG. 25 is a diagram showing the configuration of DE + NR of FIG. 23 according to the embodiment.

BEST MODE FOR CARRYING OUT THE INVENTION

In the following embodiments, an image matching technique is used. The technique proposed by the present applicant in Patent No. 2927350 (hereinafter referred to as the "prerequisite technology") can be used for this purpose, although other matching techniques may be used instead. In any of the following aspects, the modifications and considerations described in one section may likewise be applied to the techniques of the other sections.

[0010] First, the multiresolution singular point filter technology used in the embodiment and the image matching processing using it will be described in detail as a "prerequisite technology".

[Embodiment of Prerequisite Technology]

First, the elemental techniques of the base technology are described in detail in [1], and the processing procedure is described concretely in [2]. The improvements made on the basis of the base technology are then described in [3].

[1] Details of elemental technology

[1. 1] Introduction

A new multiresolution filter called the singular point filter is introduced, and the matching between images is calculated accurately. No prior knowledge of the objects is required. The matching between images is calculated at each resolution while traversing the resolution hierarchy, proceeding in order from the coarse level to the fine level. The parameters required for the calculation are set completely automatically by a dynamic calculation similar to the human visual system. It is not necessary to identify corresponding points between images manually.

The present technology can be applied to, for example, completely automatic morphing, object recognition, stereoscopic photogrammetry, volume rendering, and generation of smooth moving images from a small number of frames. When used for morphing, a given image can be deformed automatically. When used for volume rendering, an intermediate image between cross sections can be reconstructed accurately. The same holds even when the distance between cross sections is large and the shape of the cross section changes greatly.

[0013] [1.2] Hierarchy of singular point filters

The multiresolution singular point filter according to the base technology can preserve the brightness and position of each singular point contained in the image while reducing the resolution of the image. Here, let N be the width of the image and M be the height. For simplicity, it is assumed that $N = M = 2^n$ (n is a natural number). The interval $[0, N] \subset \mathbb{R}$ is denoted by I, and the pixel of the image at (i, j) is denoted by $p_{(i,j)}$ ($i, j \in I$).

Here, a multiresolution hierarchy is introduced. The hierarchical images are generated by the multiresolution filter. The multiresolution filter performs a two-dimensional search on the original image to detect singular points, and extracts the detected singular points to generate another image with a lower resolution than the original image. The size of each image at the m-th level is $2^m \times 2^m$ ($0 \le m \le n$). The singular point filter recursively constructs the following four new hierarchical images in the direction descending from n.

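[Number 1]

$$
\begin{aligned}
p^{(m,0)}_{(i,j)} &= \min\bigl(\min(p^{(m+1,0)}_{(2i,2j)},\, p^{(m+1,0)}_{(2i,2j+1)}),\ \min(p^{(m+1,0)}_{(2i+1,2j)},\, p^{(m+1,0)}_{(2i+1,2j+1)})\bigr) \\
p^{(m,1)}_{(i,j)} &= \max\bigl(\min(p^{(m+1,1)}_{(2i,2j)},\, p^{(m+1,1)}_{(2i,2j+1)}),\ \min(p^{(m+1,1)}_{(2i+1,2j)},\, p^{(m+1,1)}_{(2i+1,2j+1)})\bigr) \\
p^{(m,2)}_{(i,j)} &= \min\bigl(\max(p^{(m+1,2)}_{(2i,2j)},\, p^{(m+1,2)}_{(2i,2j+1)}),\ \max(p^{(m+1,2)}_{(2i+1,2j)},\, p^{(m+1,2)}_{(2i+1,2j+1)})\bigr) \\
p^{(m,3)}_{(i,j)} &= \max\bigl(\max(p^{(m+1,3)}_{(2i,2j)},\, p^{(m+1,3)}_{(2i,2j+1)}),\ \max(p^{(m+1,3)}_{(2i+1,2j)},\, p^{(m+1,3)}_{(2i+1,2j+1)})\bigr)
\end{aligned}
$$

(Expression 1)

Here,

[Number 2]

$$
p^{(n,0)}_{(i,j)} = p^{(n,1)}_{(i,j)} = p^{(n,2)}_{(i,j)} = p^{(n,3)}_{(i,j)} = p_{(i,j)}
$$

(Expression 2)

is assumed. Hereinafter, these four images are called sub-images. Denoting $\min_{x \le t \le x+1}$ and $\max_{x \le t \le x+1}$ by α and β respectively, the sub-images can be described as follows.

$$P^{(m,0)} = \alpha(x)\,\alpha(y)\,p^{(m+1,0)}$$

$$P^{(m,1)} = \alpha(x)\,\beta(y)\,p^{(m+1,1)}$$

$$P^{(m,2)} = \beta(x)\,\alpha(y)\,p^{(m+1,2)}$$

$$P^{(m,3)} = \beta(x)\,\beta(y)\,p^{(m+1,3)}$$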

That is, they can be regarded as something like tensor products of α and β. Each sub-image corresponds to one type of singular point. As is apparent from these equations, the singular point filter detects a singular point for each block of 2 × 2 pixels in the original image. In doing so, it searches for a point having the maximum or minimum pixel value in the two directions of each block, that is, the vertical and horizontal directions. Luminance is adopted as the pixel value in the base technology, but various other numerical values related to the image can be adopted. A pixel that has the maximum pixel value in both directions is detected as a maximum point, a pixel that has the minimum pixel value in both directions as a minimum point, and a pixel that has the maximum pixel value in one direction and the minimum pixel value in the other as a saddle point.

The singular point filter reduces the resolution of the image by letting the singular point detected inside each block (here, 1 pixel) represent the image of that block (here, 4 pixels). From the theoretical point of view of singularities, α(x)α(y) preserves the local minima, β(x)β(y) preserves the local maxima, and α(x)β(y) and β(x)α(y) preserve the saddle points.
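A minimal sketch of the recursive construction of the four sub-image hierarchies is given below. It assumes the min/max pattern of Expression 1 reads, for s = 0, 1, 2, 3, as min(min, min), max(min, min), min(max, max), and max(max, max) over each 2 × 2 block; the function names `reduce_level` and `build_hierarchy` and the list-of-lists image layout are hypothetical choices for illustration.

```python
from typing import List, Sequence

Image = List[List[int]]  # square image, side length a power of two


def reduce_level(sub_images: Sequence[Image]) -> List[Image]:
    """Given p(m+1, 0..3), return p(m, 0..3) by filtering 2x2 blocks.

    Assumption: the (outer, inner) operator pairs follow the pattern
    assumed for Expression 1 in the lead-in above.
    """
    ops = [(min, min), (max, min), (min, max), (max, max)]
    result = []
    for src, (outer, inner) in zip(sub_images, ops):
        half = len(src) // 2
        out = [[outer(inner(src[2 * i][2 * j], src[2 * i][2 * j + 1]),
                      inner(src[2 * i + 1][2 * j], src[2 * i + 1][2 * j + 1]))
                for j in range(half)]
               for i in range(half)]
        result.append(out)
    return result


def build_hierarchy(image: Image) -> List[List[Image]]:
    """Return levels, where levels[m] holds the four sub-images at level m."""
    size = len(image)
    n = size.bit_length() - 1
    if size != 1 << n or any(len(row) != size for row in image):
        raise ValueError("image must be square with a power-of-two side")
    levels: List[List[Image]] = [[] for _ in range(n + 1)]
    levels[n] = [image, image, image, image]  # all four start from the original
    for m in range(n - 1, -1, -1):
        levels[m] = reduce_level(levels[m + 1])
    return levels
```

For the 2 × 2 image `[[1, 2], [3, 4]]`, level 0 of this sketch holds the single pixels 1, 3, 2, and 4 for s = 0 to 3, that is, the minimum, the two saddle-type values, and the maximum.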

First, singular point filtering is applied separately to the start point (source) image and the end point (destination) image to be matched, generating a series of images for each, namely the start point hierarchical images and the end point hierarchical images. Four types of start point hierarchical images and four types of end point hierarchical images are generated, corresponding to the types of singular points.

After that, matching of the start point hierarchical image and the end point hierarchical image is performed in the series of resolution levels. First, p (m, 0) is used to match minimum points. Next, based on the result, the saddle point matching is performed using P (m, 1), and the other saddle point matching is performed using p (m, 2). Finally, the maximum points are matched using p (m, 3).

FIGS. 1(c) and 1(d) show the sub-image p(5, 0) of FIGS. 1(a) and 1(b), respectively. Similarly, FIGS. 1(e) and 1(f) show p(5, 1), FIGS. 1(g) and 1(h) show p(5, 2), and FIGS. 1(i) and 1(j) show p(5, 3). As can be seen from these figures, the sub-images make it easy to match the characteristic parts of the image. First, the eyes become clear in p(5, 0), because the eyes are minimum points of luminance in the face. In p(5, 1) the mouth becomes clear, because the mouth has low luminance in the horizontal direction. In p(5, 2), the vertical lines on both sides of the neck become clear. Finally, p(5, 3) makes the brightest points of the ears and cheeks clear, because these are maximum points of luminance.

The singular point filter thus extracts the features of an image. For example, by comparing the features of an image taken by a camera with the features of several objects recorded in advance, the subject captured by the camera can be identified.

[0021] [1.3] Calculation of mapping between images

The pixel at position (i, j) of the start image is written as $p^{(n)}_{(i,j)}$, and the pixel at position (k, l) of the end image is similarly written as $q^{(n)}_{(k,l)}$, where $i, j, k, l \in I$. The energy of the mapping between the images is defined (described later). This energy is determined by the difference between the luminance of a pixel of the start image and the luminance of the corresponding pixel of the end image, and by the smoothness of the mapping. First, the mapping $f^{(m,0)}: p^{(m,0)} \to q^{(m,0)}$ between $p^{(m,0)}$ and $q^{(m,0)}$ with minimum energy is calculated. Based on $f^{(m,0)}$, the mapping $f^{(m,1)}$ between $p^{(m,1)}$ and $q^{(m,1)}$ with minimum energy is calculated. This procedure continues until the calculation of the mapping $f^{(m,3)}$ between $p^{(m,3)}$ and $q^{(m,3)}$ is complete. Each mapping $f^{(m,i)}$ (i = 0, 1, 2, ...) is called a submapping. For convenience in calculating $f^{(m,i)}$, the order of i can be rearranged as follows. The reason why this rearrangement is necessary is described later.

[0022] [Equation 3]

$$
f^{(m,i)} : p^{(m,\sigma(i))} \to q^{(m,\sigma(i))}
$$

(Equation 3)

Here, $\sigma(i) \in \{0, 1, 2, 3\}$.

[1. 3. 1] bijection

When the matching between the start image and the end image is expressed as a mapping, that mapping should satisfy the bijectivity condition between the two images. This is because there is no conceptual superiority or inferiority between the two images, so their pixels should be connected both surjectively and injectively. However, unlike the usual case, the mapping to be constructed here is a digital version of a bijection. In the base technology, pixels are identified with grid points.

The mapping from the start point sub-image (the sub-image derived from the start image) to the end point sub-image (the sub-image derived from the end image) is represented by $f^{(m,s)}: I/2^{n-m} \times I/2^{n-m} \to I/2^{n-m} \times I/2^{n-m}$ (s = 0, 1, ...). Here, $f^{(m,s)}(i,j) = (k,l)$ means that $p^{(m,s)}_{(i,j)}$ of the start image is mapped to $q^{(m,s)}_{(k,l)}$ of the end image. For simplicity, when $f(i,j) = (k,l)$ holds, the pixel $q_{(k,l)}$ is written as $q_{f(i,j)}$.

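When the data is discrete, as with the pixels (grid points) dealt with in the base technology, the definition of bijectivity is important. Here, it is defined as follows (i, i', j, j', k, l are all integers). First, consider each square region R in the plane of the start image,

[Number 4]

$$
p^{(m,s)}_{(i,j)}\; p^{(m,s)}_{(i+1,j)}\; p^{(m,s)}_{(i+1,j+1)}\; p^{(m,s)}_{(i,j+1)}
$$

(Expression 4)

($i = 0, \ldots, 2^m - 1$, $j = 0, \ldots, 2^m - 1$). Here, the direction of each side (edge) of R is determined as follows.

[Number 5]

$$
\overrightarrow{p^{(m,s)}_{(i,j)}\, p^{(m,s)}_{(i+1,j)}},\quad
\overrightarrow{p^{(m,s)}_{(i+1,j)}\, p^{(m,s)}_{(i+1,j+1)}},\quad
\overrightarrow{p^{(m,s)}_{(i+1,j+1)}\, p^{(m,s)}_{(i,j+1)}},\quad
\overrightarrow{p^{(m,s)}_{(i,j+1)}\, p^{(m,s)}_{(i,j)}}
$$

(Expression 5)

This square must be mapped by f to a quadrilateral in the end image plane. The quadrilateral denoted by $f^{(m,s)}(R)$,

[Number 6]

$$
q^{(m,s)}_{f(i,j)}\; q^{(m,s)}_{f(i+1,j)}\; q^{(m,s)}_{f(i+1,j+1)}\; q^{(m,s)}_{f(i,j+1)}
$$

must satisfy the following bijectivity conditions.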

1. The edges of the quadrilateral f (m, s) (R) do not intersect one another.

2. The directions of the edges of f (m, s) (R) are equal to those of R (clockwise in Figure 2).

3. Allow contraction maps (retractions) as a relaxation condition.

[0027] The third condition is needed because, unless some relaxation condition is provided, the only mapping that completely satisfies the bijectivity conditions is the identity mapping. Here, the length of one edge of $f^{(m,s)}(R)$ may be 0, that is, $f^{(m,s)}(R)$ may be a triangle. However, it must not be a figure with an area of 0, that is, a single point or a single line segment. If FIG. 2(R) is the original quadrilateral, FIGS. 2(A) and 2(D) satisfy the bijectivity conditions, but FIGS. 2(B), 2(C), and 2(E) do not.
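The three conditions can be checked with cross products, in the spirit of FIG. 5. The sketch below is a simplified reading rather than the procedure of the base technology: it accepts a mapped quadrilateral only if its signed area keeps the sign of the original square and every corner turns in the same direction (a zero turn is allowed so that one edge may have length 0). Treating only such convex results as valid is an assumption, as are the names `cross` and `satisfies_bijectivity`.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def cross(a: Point, b: Point) -> float:
    """z-component of the cross product of two 2-D vectors."""
    return a[0] * b[1] - a[1] * b[0]


def satisfies_bijectivity(quad: Sequence[Point]) -> bool:
    """quad lists the images of (i,j), (i+1,j), (i+1,j+1), (i,j+1) in order.

    Simplified check (assumption): orientation preserved, non-zero area,
    and no corner turning against the original direction.
    """
    edges = [(quad[(k + 1) % 4][0] - quad[k][0],
              quad[(k + 1) % 4][1] - quad[k][1]) for k in range(4)]
    # Shoelace formula: the unit square in this corner order gives +2.
    twice_area = sum(cross(quad[k], quad[(k + 1) % 4]) for k in range(4))
    if twice_area <= 0:
        return False  # collapsed to a point or segment, or orientation reversed
    # A zero cross product is tolerated so that a triangle is accepted.
    return all(cross(edges[k], edges[(k + 1) % 4]) >= 0 for k in range(4))
```

With this check, the unit square and a triangle obtained by merging two adjacent corners pass, while a reflected square, a bow-tie shape, and a quadrilateral flattened onto a line are rejected.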

[0028] In an actual implementation, the following condition may additionally be imposed to easily guarantee that the mapping is surjective: each pixel on the boundary of the start image is mapped to the pixel occupying the same position in the end image. That is, $f(i,j) = (i,j)$ on the four lines $i = 0$, $i = 2^m - 1$, $j = 0$, and $j = 2^m - 1$. This condition is hereinafter also referred to as the "additional condition".

[0029] [1.3.2] Energy of mapping

[1.3.2.1] Cost related to pixel brightness

The energy of the mapping f is defined. The goal is to find the mapping with the minimum energy. The energy is determined mainly by the difference between the luminance of a pixel of the start image and the luminance of the corresponding pixel of the end image. That is, the energy $C^{(m,s)}_{(i,j)}$ at the point (i, j) of the mapping $f^{(m,s)}$ is determined by the following equation.

[Number 7]

$$
C^{(m,s)}_{(i,j)} = \left| V\bigl(p^{(m,s)}_{(i,j)}\bigr) - V\bigl(q^{(m,s)}_{f(i,j)}\bigr) \right|^2
$$

(Equation 7)

Here, $V(p^{(m,s)}_{(i,j)})$ and $V(q^{(m,s)}_{f(i,j)})$ are the luminances of the pixels $p^{(m,s)}_{(i,j)}$ and $q^{(m,s)}_{f(i,j)}$, respectively. The total energy $C^{(m,s)}_f$ of f is one evaluation formula for evaluating the matching, and can be defined as the sum of $C^{(m,s)}_{(i,j)}$ shown below.

[Number 8]

$$
C^{(m,s)}_f = \sum_{i=0}^{2^m-1} \sum_{j=0}^{2^m-1} C^{(m,s)}_{(i,j)}
$$

(Equation 8)

[1.3.2.2] Cost of pixel location for smooth mapping

In order to obtain a smooth mapping, another energy $D_f$ of the mapping is introduced. This energy is determined by the positions of $p^{(m,s)}_{(i,j)}$ and $q^{(m,s)}_{f(i,j)}$, regardless of the luminance of the pixels ($i = 0, \ldots, 2^m - 1$, $j = 0, \ldots, 2^m - 1$). The energy $D^{(m,s)}_{(i,j)}$ of the mapping $f^{(m,s)}$ at the point (i, j) is defined by the following equation.

[Number 9]

$$
D^{(m,s)}_{(i,j)} = \eta\, E^{(m,s)}_{0\,(i,j)} + E^{(m,s)}_{1\,(i,j)}
$$

(Equation 9)

Here, the coefficient parameter η is a real number of 0 or more, and

[Number 10]

$$
E^{(m,s)}_{0\,(i,j)} = \bigl\| (i,j) - f^{(m,s)}(i,j) \bigr\|^2
$$

(Expression 10)

[Number 11]

$$
E^{(m,s)}_{1\,(i,j)} = \sum_{i'=i-1}^{i} \sum_{j'=j-1}^{j} \bigl\| \bigl(f^{(m,s)}(i,j) - (i,j)\bigr) - \bigl(f^{(m,s)}(i',j') - (i',j')\bigr) \bigr\|^2 / 4
$$

(Expression 11)

are assumed. Here,

[Number 12]

$$
\| (x, y) \| = \sqrt{x^2 + y^2}
$$

(Equation 12)

and $f(i',j')$ is defined to be 0 for $i' < 0$ and $j' < 0$. $E_0$ is determined by the distance between (i, j) and f(i, j). $E_0$ prevents a pixel from being mapped to a pixel that is too far away. However, $E_0$ is replaced by another energy function later. $E_1$ guarantees the smoothness of the mapping. $E_1$ represents the distance between the displacement of $p_{(i,j)}$ and the displacements of its neighboring points. Based on the above, the energy $D_f$, which is another evaluation formula for evaluating the matching, is determined by the following equation.

[Number 13]

$$
D^{(m,s)}_f = \sum_{i=0}^{2^m-1} \sum_{j=0}^{2^m-1} D^{(m,s)}_{(i,j)}
$$

(Expression 13)

[1.3.2.3] Total energy of mapping

The total energy of the mapping, that is, the comprehensive evaluation formula that integrates the plurality of evaluation formulas, is defined as $\lambda C^{(m,s)}_f + D^{(m,s)}_f$. Here, the coefficient parameter λ is a real number of 0 or more. The purpose is to detect the state in which the comprehensive evaluation formula takes an extremum, that is, to find the mapping that gives the minimum energy expressed by

[Equation 14]

$$
\min_{f} \left\{ \lambda C^{(m,s)}_f + D^{(m,s)}_f \right\}
$$

(Equation 14)
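The evaluation formulas can be collected into a small sketch that scores one candidate mapping. It assumes that C is the sum of squared luminance differences, that D is the sum of η·E0 + E1 with E0 the squared displacement and E1 one quarter of the squared differences between a pixel's displacement and those of its upper and left neighbours, and that the total is λC + D. The dictionary representation of f and the decision to skip neighbours that fall outside the image are assumptions of this sketch, not statements of the base technology.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]
Mapping = Dict[Pixel, Pixel]  # f[(i, j)] = (k, l)
Image = List[List[float]]


def brightness_cost(p: Image, q: Image, f: Mapping) -> float:
    """C_f: sum over all pixels of the squared luminance difference."""
    return sum((p[i][j] - q[k][l]) ** 2 for (i, j), (k, l) in f.items())


def smoothness_cost(f: Mapping, eta: float) -> float:
    """D_f: sum over all pixels of eta * E0 + E1."""
    total = 0.0
    for (i, j), (k, l) in f.items():
        di, dj = k - i, l - j  # displacement of pixel (i, j)
        e0 = di * di + dj * dj
        e1 = 0.0
        for ii in (i - 1, i):
            for jj in (j - 1, j):
                if (ii, jj) in f:  # assumption: out-of-image neighbours are skipped
                    nk, nl = f[(ii, jj)]
                    e1 += ((di - (nk - ii)) ** 2 + (dj - (nl - jj)) ** 2) / 4
        total += eta * e0 + e1
    return total


def total_energy(p: Image, q: Image, f: Mapping, lam: float, eta: float) -> float:
    """Comprehensive evaluation value lam * C_f + D_f for the mapping f."""
    return lam * brightness_cost(p, q, f) + smoothness_cost(f, eta)
```

For identical start and end images, the identity mapping scores 0 under this sketch, and any mapping that swaps two pixels of different luminance scores strictly higher.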

It should be noted that in the case of λ = 0 and η = 0, the mapping is the identity mapping (that is, $f^{(m,s)}(i,j) = (i,j)$ for all $i = 0, \ldots, 2^m - 1$ and $j = 0, \ldots, 2^m - 1$). As described later, because the base technology first evaluates the case of λ = 0 and η = 0, the mapping can be deformed gradually starting from the identity mapping. If the position of λ in the comprehensive evaluation formula were changed so that the formula was defined as $C^{(m,s)}_f + \lambda D^{(m,s)}_f$, then for λ = 0 and η = 0 the formula would reduce to $C^{(m,s)}_f$ alone; pixels that are entirely unrelated would be associated merely because their luminances are close, and the mapping would become meaningless. Deforming a mapping on the basis of such a meaningless mapping makes no sense. For this reason, the coefficient parameters are assigned so that the identity mapping is selected as the best mapping at the start of the evaluation.

[0030] Optical flow also takes into account the difference in pixel luminance and the smoothness, as the base technology does. However, optical flow cannot be used for image transformation, because it considers only the local movement of objects. Global correspondence can be detected by using the singular point filter according to the base technology.

[0031] [1.3.3] Determination of mapping by introduction of multiple resolutions

The mapping $f_{\min}$ that gives the minimum energy and satisfies the bijectivity conditions is found using the multiresolution hierarchy. At each resolution level, the mapping between the start point and end point sub-images is calculated. Starting from the top of the resolution hierarchy (the coarsest level), the mapping at each resolution level is determined while taking the mappings at other levels into account. The number of mapping candidates at each level is limited by using the mappings at higher, that is, coarser, levels. More specifically, when determining the mapping at a certain level, the mapping obtained at the level one step coarser is imposed as a kind of constraint.

First, when

[Number 15]

$$
(i', j') = \left( \left\lfloor \tfrac{i}{2} \right\rfloor, \left\lfloor \tfrac{j}{2} \right\rfloor \right)
$$

(Equation 15)

holds, $p^{(m-1,s)}_{(i',j')}$ and $q^{(m-1,s)}_{(i',j')}$ are called the parents of $p^{(m,s)}_{(i,j)}$ and $q^{(m,s)}_{(i,j)}$, respectively. Here $\lfloor x \rfloor$ is the largest integer not exceeding x. Conversely, $p^{(m,s)}_{(i,j)}$ and $q^{(m,s)}_{(i,j)}$ are called the children of $p^{(m-1,s)}_{(i',j')}$ and $q^{(m-1,s)}_{(i',j')}$, respectively. The function parent(i, j) is defined by

[Equation 16]

$$
\mathrm{parent}(i, j) = \left( \left\lfloor \tfrac{i}{2} \right\rfloor, \left\lfloor \tfrac{j}{2} \right\rfloor \right)
$$

(Equation 16)

The mapping $f^{(m,s)}$ between $p^{(m,s)}_{(i,j)}$ and $q^{(m,s)}_{(k,l)}$ is determined by performing an energy calculation and finding the minimum. The value of $f^{(m,s)}(i,j) = (k,l)$ is determined as follows, using $f^{(m-1,s)}$ (m = 1, 2, ..., n). First, the condition is imposed that $q^{(m,s)}_{(k,l)}$ must lie inside the following quadrilateral, which narrows the candidates down to those mappings that satisfy the bijectivity conditions and are sufficiently realistic.

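[Number 17]

$$
q^{(m,s)}_{g^{(m,s)}(i-1,j-1)}\; q^{(m,s)}_{g^{(m,s)}(i-1,j+1)}\; q^{(m,s)}_{g^{(m,s)}(i+1,j+1)}\; q^{(m,s)}_{g^{(m,s)}(i+1,j-1)}
$$

(Expression 17)

Here,

[Number 18]

$$
g^{(m,s)}(i,j) = f^{(m-1,s)}\bigl(\mathrm{parent}(i,j)\bigr) + f^{(m-1,s)}\bigl(\mathrm{parent}(i,j) + (1,1)\bigr)
$$

(Equation 18)

The quadrilateral defined in this way is hereinafter called the inherited quadrilateral of $p^{(m,s)}_{(i,j)}$. The pixel that minimizes the energy is sought inside the inherited quadrilateral.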
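The way a coarser mapping constrains the next level can be sketched as follows. The sketch assumes that parent(i, j) halves and floors both coordinates, that g(i, j) is the sum of the coarse mapping at parent(i, j) and at parent(i, j) + (1, 1), and that the inherited quadrilateral has the corners g(i-1, j-1), g(i-1, j+1), g(i+1, j+1), g(i+1, j-1). Clamping indices that fall outside the coarse image is a further assumption introduced only to keep the example runnable; the names `g` and `inherited_quadrilateral` are hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]


def parent(i: int, j: int) -> Pixel:
    """Coordinates of the parent pixel one level coarser (floor of half)."""
    return (i // 2, j // 2)


def g(f_coarse: Dict[Pixel, Pixel], i: int, j: int, coarse_size: int) -> Pixel:
    """Position at the finer level inherited from the coarse mapping."""
    def clamp(v: int) -> int:
        # Assumption: out-of-range indices are clamped to the image border.
        return min(max(v, 0), coarse_size - 1)

    pi, pj = parent(i, j)
    a = f_coarse[(clamp(pi), clamp(pj))]
    b = f_coarse[(clamp(pi + 1), clamp(pj + 1))]
    return (a[0] + b[0], a[1] + b[1])


def inherited_quadrilateral(f_coarse: Dict[Pixel, Pixel], i: int, j: int,
                            coarse_size: int) -> List[Pixel]:
    """Corners of the quadrilateral inside which f(i, j) is searched."""
    return [g(f_coarse, i - 1, j - 1, coarse_size),
            g(f_coarse, i - 1, j + 1, coarse_size),
            g(f_coarse, i + 1, j + 1, coarse_size),
            g(f_coarse, i + 1, j - 1, coarse_size)]
```

With an identity mapping on a 2 × 2 coarse level, the quadrilateral for the finer pixel (1, 1) comes out as (1, 1), (1, 2), (2, 2), (2, 1) in this sketch, so the search stays close to the position suggested by the coarser level.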

[0033] FIG. 3 illustrates the above procedure. In the figure, the pixels A, B, C, and D of the start image are mapped to A', B', C', and D', respectively, of the end image at the (m-1)-th level. The pixel $p^{(m,s)}_{(i,j)}$ must be mapped to a pixel $q^{(m,s)}_{f^{(m)}(i,j)}$ that lies inside the inherited quadrilateral A'B'C'D'. In this way, the mapping at the (m-1)-th level is bridged to the mapping at the m-th level.

The energy $E_0$ defined above is replaced by the following equation in order to calculate the submapping $f^{(m,0)}$ at the m-th level.

[Number 19]

$$
E_{0\,(i,j)} = \bigl\| f^{(m,0)}(i,j) - g^{(m)}(i,j) \bigr\|^2
$$

(Expression 19)

Also, to calculate the submapping $f^{(m,s)}$, the following equation is used.

[Number 20]

$$
E_{0\,(i,j)} = \bigl\| f^{(m,s)}(i,j) - f^{(m,s-1)}(i,j) \bigr\|^2 \quad (1 \le s)
$$

(Equation 20)

Thus, a mapping that keeps the energy of all submappings low is obtained. Equation 20 relates the submappings corresponding to different singular points within the same level, so that the similarity between submappings is high. Equation 19 represents the distance between $f^{(m,s)}(i,j)$ and the position to which (i, j) would be projected when regarded as part of a pixel at the (m-1)-th level.

If there is no pixel satisfying the bijectivity conditions inside the inherited quadrilateral A'B'C'D', the following measures are taken. First, the pixels whose distance from the boundary of A'B'C'D' is L (initially L = 1) are examined. Among them, if the one with the minimum energy satisfies the bijectivity conditions, it is selected as the value of $f^{(m,s)}(i,j)$. L is increased until such a point is found or until L reaches its upper limit $L^{(m)}_{\max}$. $L^{(m)}_{\max}$ is fixed for each level m. If no such point is found at all, the third condition of bijectivity is temporarily ignored, mappings for which the area of the transformed quadrilateral becomes zero are allowed, and $f^{(m,s)}(i,j)$ is determined. If a point satisfying the conditions still cannot be found, the first and second conditions of bijectivity are removed next.

[0036] An approximation method using multiple resolutions is essential for determining the global correspondence between images while preventing the mapping from being affected by image details. Without multiresolution approximation, it is impossible to find correspondences between distant pixels. In that case, the image size must be limited to a very small one, and only images with small changes can be handled. Furthermore, because smoothness is usually required of the mapping, it is difficult to find correspondences between such pixels: the energy of a mapping from a pixel to a distant pixel is high. With multiresolution approximation, appropriate correspondences between such pixels can be found, because their distance is small at the upper (coarse) levels of the resolution hierarchy.

[1.4] Automatic determination of optimal parameter values

One of the main drawbacks of existing matching techniques is the difficulty of adjusting the parameters. In most cases, adjustment of parameters is done manually, and it is extremely difficult to select the optimum value. According to the method according to the base technology, optimal parameter values can be completely determined automatically.

[0038] The system according to the base technology includes two parameters, λ and η. Briefly, λ is the weight of the difference in pixel luminance, and η indicates the stiffness of the mapping. Both parameters have an initial value of 0. First, η is fixed at 0 and λ is gradually increased from 0. When the value of λ is increased while the value of the comprehensive evaluation formula (Equation 14) is minimized, the value of $C^{(m,s)}_f$ for each submapping generally decreases. This basically means that the two images must be matched better. However, when λ exceeds its optimum value, the following phenomena occur.

[0039] 1. Pixels that should not correspond to each other are wrongly associated merely because their luminances are close.

2. As a result, the correspondence between pixels becomes distorted, and the mapping starts to break down.

[0040] 3. As a result, $D^{(m,s)}_f$ in Equation 14 tends to increase rapidly.

4. As a result, because the value of Equation 14 tends to increase rapidly, $f^{(m,s)}$ changes so as to suppress the rapid increase of $D^{(m,s)}_f$, and consequently $C^{(m,s)}_f$ increases.

Therefore, the threshold at which $C^{(m,s)}_f$ turns from decreasing to increasing is detected while λ is increased and the condition that Equation 14 takes its minimum value is maintained, and that λ is taken as the optimum value for η = 0. Next, η is increased little by little, the behavior of $C^{(m,s)}_f$ is checked, and η is determined automatically by the method described later. λ is also determined in correspondence with that η.
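The search for λ described above can be sketched as a simple stepping loop. The callback `cost_at` is hypothetical: it stands for "minimize the comprehensive evaluation formula for this λ and return the resulting luminance energy C", which is not implemented here. The default upper bound of 0.1 follows the experimental cut-off mentioned in this description; the step width and the function name are assumptions.

```python
from typing import Callable


def find_optimal_lambda(cost_at: Callable[[float], float],
                        step: float, lam_max: float = 0.1) -> float:
    """Increase lambda from 0 in fixed steps and return the last value
    reached before the luminance energy C turns from decreasing to increasing.

    cost_at(lam) is a hypothetical callback returning C for the mapping
    that minimizes lam * C + D at that lambda.
    """
    lam = 0.0
    best_lam, prev_cost = lam, cost_at(lam)
    while lam + step <= lam_max + 1e-12:
        lam += step
        cost = cost_at(lam)
        if cost > prev_cost:
            break  # C started to increase: the optimum has been passed
        best_lam, prev_cost = lam, cost
    return best_lam
```

In a real matcher each call to `cost_at` would recompute a submapping, so the step width trades accuracy of the detected turning point against computation time.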

[0042] This method is similar to the operation of the focusing mechanism of the human visual system. In the human vision system, the left and right eye images are matched while moving one eye. When the observer clearly perceives, his eyes are fixed.

[0043] [1.4.1] Dynamic determination of λ

λ is increased from 0 by a predetermined step width, and the submapping is evaluated each time the value of λ changes. The total energy is defined by $\lambda C^{(m,s)}_f + D^{(m,s)}_f$ as shown in Equation 14. $D^{(m,s)}_f$ in Equation 9 represents smoothness; it is theoretically minimal for the identity mapping, and $E_0$ and $E_1$ increase as the mapping is distorted. Since $E_1$ is an integer, the minimum step of $D^{(m,s)}_f$ is 1. For this reason, changing the mapping cannot reduce the total energy unless the current change (decrease) in $\lambda C^{(m,s)}_{(i,j)}$ is 1 or more. This is because $D^{(m,s)}_f$ increases by 1 or more when the mapping changes, so the total energy does not decrease unless $\lambda C^{(m,s)}_{(i,j)}$ decreases by 1 or more.

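Under this condition, it is shown that $C^{(m,s)}_{(i,j)}$ decreases in the normal case as λ increases. The histogram of $C^{(m,s)}_{(i,j)}$ is denoted by h(l); h(l) is the number of pixels whose energy $C^{(m,s)}_{(i,j)}$ is $l^2$. In order that $\lambda l^2 \ge 1$ holds, consider, for example, the case $l^2 = 1/\lambda$. When λ changes by a small amount from $\lambda_1$ to $\lambda_2$, the

[Number 21]

$$
A = \sum_{l=\lceil 1/\lambda_2 \rceil}^{\lfloor 1/\lambda_1 \rfloor} h(l)
\cong \int_{l=1/\lambda_2}^{1/\lambda_1} h(l)\, dl
= -\int_{\lambda_2}^{\lambda_1} h(l)\, \frac{1}{\lambda^{3/2}}\, d\lambda
= \int_{\lambda_1}^{\lambda_2} \frac{h(l)}{\lambda^{3/2}}\, d\lambda
$$

(Equation 21)

pixels denoted by A change to a more stable state with the energy

[Number 22]

$$
C^{(m,s)}_f - l^2 = C^{(m,s)}_f - \frac{1}{\lambda}
$$

(Equation 22)

Here, the energy of these pixels is approximated as all becoming zero. This expression shows that the value of $C^{(m,s)}_f$ changes by only

[Number 23]

$$
\partial C^{(m,s)}_f = -\frac{A}{\lambda}
$$

(Expression 23)

and as a result,

[Equation 24]

$$
\frac{\partial C^{(m,s)}_f}{\partial \lambda} = -\frac{h(l)}{\lambda^{5/2}}
$$

(Equation 24)

holds. Since h(l) > 0, $C^{(m,s)}_f$ usually decreases. However, when λ is about to exceed the optimum value, the phenomenon described above, namely an increase in $C^{(m,s)}_f$, occurs. By detecting this phenomenon, the optimum value of λ is determined.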

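When H (H > 0) and k are constants, assuming

[Number 25]

$$
h(l) = H l^k = \frac{H}{\lambda^{k/2}}
$$

(Equation 25)

the following holds.

[Number 26]

$$
\frac{\partial C^{(m,s)}_f}{\partial \lambda} = -\frac{H}{\lambda^{5/2 + k/2}}
$$

(Equation 26)

If $k \ne -3$, then

[Equation 27]

$$
C^{(m,s)}_f = C + \frac{H}{(3/2 + k/2)\, \lambda^{3/2 + k/2}}
$$

(Equation 27)

This is the general formula of $C^{(m,s)}_f$ (C is a constant).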
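As a quick numerical sanity check of the step from Equation 26 to Equation 27, the sketch below assumes the forms dC/dλ = -H / λ^(5/2 + k/2) and C(λ) = C0 + H / ((3/2 + k/2) λ^(3/2 + k/2)), and compares a finite-difference slope of the latter with the former. The function names and the sample values of H, k, and λ are arbitrary choices for illustration.

```python
def energy_model(lam: float, H: float, k: float, c0: float = 0.0) -> float:
    """Assumed closed form of C as a function of lambda; requires k != -3."""
    a = 1.5 + k / 2.0
    return c0 + H / (a * lam ** a)


def slope_model(lam: float, H: float, k: float) -> float:
    """Assumed derivative dC/d(lambda)."""
    return -H / lam ** (2.5 + k / 2.0)


def numeric_slope(lam: float, H: float, k: float, h: float = 1e-6) -> float:
    """Central finite difference of energy_model with respect to lambda."""
    return (energy_model(lam + h, H, k) - energy_model(lam - h, H, k)) / (2.0 * h)
```

For H > 0 and -1 ≤ k ≤ 1 the slope is negative for every positive λ, which is the decreasing behavior that the detection of the optimum λ relies on.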

When detecting the optimum value of λ, the number of pixels that break the bijectivity conditions may also be checked for additional safety. Here, when the mapping of each pixel is determined, the probability of breaking the bijectivity conditions is assumed to be $p_0$. In this case,

[Number 28]

$$
\frac{\partial A}{\partial \lambda} = \frac{h(l)}{\lambda^{3/2}}
$$

(Expression 28)

holds, so the number of pixels that break the bijectivity conditions increases at the rate given by the following equation.

[Number 29]

$$
B_0 = \frac{h(l)\, p_0}{\lambda^{3/2}}
$$

(Expression 29)

Therefore,

[Equation 30]

$$
\frac{B_0\, \lambda^{3/2}}{p_0\, h(l)} = 1
$$

(Equation 30)

is a constant. If $h(l) = H l^k$ is assumed, then, for example,

[Number 31]

$$
B_0\, \lambda^{3/2 + k/2} = p_0 H
$$

(Equation 31)

is a constant. However, when λ exceeds the optimum value, the above value increases rapidly. This phenomenon can be detected by checking whether the value of $B_0 \lambda^{3/2 + k/2} / 2^m$ exceeds an abnormal value $B_{0\mathrm{thres}}$, and the optimum value of λ can thereby be determined. Similarly, the rate of increase $B_1$ of the pixels that break the third condition of bijectivity is checked by examining whether the value of $B_1 \lambda^{3/2 + k/2} / 2^m$ exceeds an abnormal value $B_{1\mathrm{thres}}$. The reason for introducing the factor $2^m$ is described later. The system is not sensitive to these two thresholds. The thresholds can be used to detect excessive distortion of the mapping that is missed by observation of the energy $C^{(m,s)}_f$.

In the experiment, when calculating the submapping $f^{(m,s)}$, if λ exceeded 0.1, the calculation of $f^{(m,s)}$ was stopped and the calculation moved on to $f^{(m,s+1)}$. This is because, when λ > 0.1, a difference of only "3" in the 255 levels of pixel luminance affects the calculation of the submapping, and it was difficult to obtain correct results when λ > 0.1.

[1. 4. 2] Histogram h (l)

The examination of $C^{(m,s)}_f$ does not depend on the histogram h(l). The examination of bijectivity and of its third condition, however, can be affected by h(l). In practice, when (λ, $C^{(m,s)}_f$) is plotted, k is usually around 1. In the experiment, k = 1 was used, that is, $B_0 \lambda^2$ and $B_1 \lambda^2$ were examined. If the true value of k is less than 1, $B_0 \lambda^2$ and $B_1 \lambda^2$ do not become constants but increase gradually according to the factor $\lambda^{(1-k)/2}$. If h(l) is a constant, for example, the factor is $\lambda^{1/2}$. However, these differences can be absorbed by setting the threshold $B_{0\mathrm{thres}}$ appropriately.

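Here, assume that the start point image is a circular object with center $(x_0, y_0)$ and radius r, as expressed by the following equation.

[Number 32]

$$p_{(i,j)} = \begin{cases} \dfrac{255}{r}\, c\!\left(\sqrt{(i - x_0)^2 + (j - y_0)^2}\right) & \left(\sqrt{(i - x_0)^2 + (j - y_0)^2} \le r\right) \\ 0 & (\text{otherwise}) \end{cases}$$

(Equation 32)

On the other hand, the end point image is assumed to be an object with center $(x_1, y_1)$ and radius r, as expressed by the following equation.

[Number 33]

$$q_{(i,j)} = \begin{cases} \dfrac{255}{r}\, c\!\left(\sqrt{(i - x_1)^2 + (j - y_1)^2}\right) & \left(\sqrt{(i - x_1)^2 + (j - y_1)^2} \le r\right) \\ 0 & (\text{otherwise}) \end{cases}$$

(Equation 33)

Here c(x) has the form $c(x) = x^k$. If the centers $(x_0, y_0)$ and $(x_1, y_1)$ are sufficiently far apart, the histogram h(l) has the form

[Number 34]

$$h(l) \propto r\,l^{k} \quad (k \neq 0)$$

(Equation 34)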

When k = 1, the image shows an object with a sharp boundary embedded in the background. This object is darkest at its center and becomes brighter toward its periphery. When k = −1, the image represents an object with a vague boundary. This object is brightest at its center and becomes darker toward its periphery. Without loss of generality, general objects can be regarded as lying between these two types. Thus, −1 ≤ k ≤ 1 covers most cases, and Equation 27 is guaranteed to be a decreasing function in general.

Note that, as Equation 34 shows, r is affected by the resolution of the image; that is, r is proportional to $2^m$. This is why the factor $2^m$ was introduced in [1.4.1].

[0051] [1. 4. 3] Dynamic determination of η

The parameter η can be determined automatically in the same way. First, η is set to 0, and the final mapping $f^{(n)}$ and the energy $C^{(n)}_f$ at the finest resolution are calculated. Next, η is increased by a certain value Δη, and the final mapping $f^{(n)}$ and the energy $C^{(n)}_f$ at the finest resolution are recalculated. This process is continued until the optimum value is obtained. η indicates the stiffness of the mapping, because it is the weight of the following equation.

[Number 35]

$$E^{(m,s)}_{0\,(i,j)} = \left\| f^{(m,s)}(i,j) - f^{(m,s-1)}(i,j) \right\|^2$$

(Equation 35)

When η is 0, $D^{(n)}_f$ is determined independently of the immediately preceding submapping, and the current submapping is deformed elastically and distorted excessively. On the other hand, when η is very large, $D^{(n)}_f$ is almost completely determined by the immediately preceding submapping. In this case the submapping is very rigid, and pixels are projected to the same place. As a result, the mapping becomes the identity mapping. As η increases gradually from 0, $C^{(n)}_f$ gradually decreases, as described later. However, when η exceeds the optimum value, the energy starts to increase, as shown in FIG. 4. The X axis in the figure is η, and the Y axis is $C_f$.

[0052] In this way, the optimum value of η that minimizes $C^{(n)}_f$ can be obtained. However, compared with the case of λ, various factors affect the calculation, so $C^{(n)}_f$ changes with small fluctuations. This is because, in the case of λ, only one submapping is recalculated each time the input changes slightly, whereas in the case of η all submappings are recalculated. For this reason, it cannot be judged immediately whether the obtained value of $C^{(n)}_f$ is the minimum. When a candidate for the minimum is found, the true minimum must be searched for by setting an even finer interval.
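The idea of searching again with a finer interval around a candidate minimum can be sketched as follows. This is only an illustrative grid search under stated assumptions: the function name, the number of rounds, and the factor of 10 by which the interval is refined are hypothetical, and `energy` stands for a callable that returns $C^{(n)}_f$ for a given η.

```python
def refine_minimum(energy, lo, hi, step, rounds=3):
    """Search [lo, hi] on a grid, then repeat around the best point with a finer step.

    energy : callable mapping a parameter value to an energy (assumed given)
    step   : initial grid interval; divided by 10 after each round (an assumption)
    """
    best = lo
    for _ in range(rounds):
        count = int(round((hi - lo) / step))
        candidates = [lo + idx * step for idx in range(count + 1)]
        best = min(candidates, key=energy)
        # Narrow the interval to one grid step on each side of the candidate.
        lo, hi = max(lo, best - step), min(hi, best + step)
        step /= 10.0
    return best
```

Because the energy fluctuates in practice, a real search would probably need to examine several candidate minima rather than a single one; the sketch keeps only the best grid point per round.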

[1.5] Supersampling

To increase the degree of freedom in determining the correspondence between pixels, the range of $f^{(m,s)}$ can be extended to R × R (R is the set of real numbers). In this case, the luminance of the pixels of the end point image is interpolated, and $f^{(m,s)}$ with luminance at non-integer points,

[Number 36]

$$V\!\left(q^{(m,s)}_{f^{(m,s)}(i,j)}\right)$$

(Equation 36)

is provided. That is, supersampling is performed. In the experiment, $f^{(m,s)}$ is allowed to take integer and half-integer values, and

[Number 37]

$$V\!\left(q^{(m,s)}_{(i,j)+(0.5,\,0.5)}\right)$$

(Equation 37)

is given by

[Number 38]

$$\frac{V\!\left(q^{(m,s)}_{(i,j)}\right) + V\!\left(q^{(m,s)}_{(i,j)+(1,1)}\right)}{2}$$

(Equation 38)

[1. 6] Normalization of the pixel luminance of each image

When the start point image and the end point image contain extremely different objects, it is difficult to use the original pixel luminance as it is to calculate the mapping. Because the difference in luminance is large, the luminance-related energy $C^{(m,s)}_f$ becomes too large, and correct evaluation becomes difficult.

For example, consider matching a human face and a cat's face. The cat's face is covered with fur and is a mixture of very bright and very dark pixels. In this case, the sub-images are first normalized in order to calculate the submapping between the two faces. That is, the luminance of the darkest pixel is set to 0, that of the brightest pixel to 255, and the luminances of the other pixels are obtained by linear interpolation.
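The normalization described above can be sketched as follows. This is a minimal illustration; the function name is hypothetical, and the handling of an image in which all pixels have the same luminance is an assumption, since the text does not cover that case.

```python
def normalize_luminance(pixels):
    """Linearly rescale luminances so the darkest becomes 0 and the brightest 255.

    pixels : flat list of luminance values of one sub-image
    """
    darkest, brightest = min(pixels), max(pixels)
    if brightest == darkest:
        # Assumption: a flat image has no contrast to stretch, so map it to 0.
        return [0.0 for _ in pixels]
    scale = 255.0 / (brightest - darkest)
    return [(value - darkest) * scale for value in pixels]
```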

[0056] [1.7] Implementation

An inductive method is used in which the calculation proceeds linearly as the start point image is scanned. First, the value of $f^{(m,s)}$ is determined for the top-left pixel (i, j) = (0, 0). Next, the value of each $f^{(m,s)}(i, j)$ is determined while i is incremented by one. When i reaches the width of the image, j is increased by 1 and i is reset to 0. Thereafter, $f^{(m,s)}(i, j)$ is determined as the start point image is scanned. Once the pixel correspondences for all points are determined, one mapping $f^{(m,s)}$ is determined.

Once the corresponding point $q_{f(i,j)}$ is determined for a certain $p_{(i,j)}$, the corresponding point $q_{f(i,j+1)}$ of $p_{(i,j+1)}$ is determined next. At this time, the position of $q_{f(i,j+1)}$ is restricted by the position of $q_{f(i,j)}$ in order to satisfy the bijective condition. Therefore, in this system, a point whose correspondence is determined earlier has higher priority. If (0, 0) always had the highest priority, an extra bias would be added to the final mapping. In this base technology, $f^{(m,s)}$ is determined by the following method to avoid this situation.

First, when (s mod 4) is 0, the mapping is determined starting from (0, 0) while i and j are gradually increased. When (s mod 4) is 1, it is determined starting from the right end of the top row, decreasing i and increasing j. When (s mod 4) is 2, it is determined starting from the right end of the bottom row, decreasing both i and j. When (s mod 4) is 3, it is determined starting from the left end of the bottom row, increasing i and decreasing j. Since the concept of submapping, that is, the parameter s, does not exist at the n-th level with the finest resolution, the two directions are calculated in succession assuming s = 0 and s = 2.
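The four scan directions can be sketched as a generator of pixel coordinates. This is an illustration only; the function name is hypothetical, and the row-by-row nesting (i varying fastest) follows the scan described at the start of [1.7].

```python
def scan_order(width, height, s):
    """Yield (i, j) in the scan direction selected by s mod 4.

    0: from top-left,     i increasing, j increasing
    1: from top-right,    i decreasing, j increasing
    2: from bottom-right, i decreasing, j decreasing
    3: from bottom-left,  i increasing, j decreasing
    """
    corner = s % 4
    i_values = range(width) if corner in (0, 3) else range(width - 1, -1, -1)
    j_values = range(height) if corner in (0, 1) else range(height - 1, -1, -1)
    for j in j_values:
        for i in i_values:
            yield (i, j)
```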

[0058] In the actual implementation, the values of $f^{(m,s)}(i, j)$ (m = 0, ..., n) that satisfy the bijective condition as far as possible were selected from among the candidates (k, l) by penalizing candidates that violate the bijective condition. The energy $D_{(k,l)}$ of a candidate that violates the third condition is multiplied by φ, while that of a candidate that violates the first or second condition is multiplied by ψ. Here, φ = 2 and ψ = 100000 were used.

To check the bijective condition described above, the following test was performed as the actual procedure when determining (k, l) = $f^{(m,s)}(i, j)$. That is, for each grid point (k, l) included in the inherited quadrilateral of $f^{(m,s)}(i, j)$, it is checked whether the z component of the outer product in the following equation is 0 or more.

[Number 39]

$$W = \vec{A} \times \vec{B}$$

(Equation 39)

where

[Number 40]

$$\vec{A} = \overrightarrow{q^{(m,s)}_{f^{(m,s)}(i,j-1)}\; q^{(m,s)}_{f^{(m,s)}(i+1,j-1)}}$$

(Equation 40)

[Number 41]

$$\vec{B} = \overrightarrow{q^{(m,s)}_{f^{(m,s)}(i,j-1)}\; q^{(m,s)}_{(k,l)}}$$

(Equation 41)

(Here, the vectors are three-dimensional, and the z axis is defined in an orthogonal right-handed coordinate system.) If W is negative, the candidate is penalized by multiplying $D^{(m,s)}_{(k,l)}$ by ψ, so that it is selected as rarely as possible.

[0060] FIG. 5 (a) and FIG. 5 (b) show the reason for checking this condition. FIG. 5 (a) shows a candidate without penalty, and FIG. 5 (b) shows a candidate with penalty. When the mapping $f^{(m,s)}(i, j+1)$ for the adjacent pixel (i, j+1) is determined, if the z component of W is negative, there is no pixel on the start point image plane that satisfies the bijective condition. This is because $q^{(m,s)}_{(k,l)}$ crosses the boundary of the adjacent quadrilateral.
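The sign test on the z component of the outer product, and the resulting penalty, can be sketched as follows. This is a minimal illustration under assumptions: points are (x, y) tuples, the function and argument names are hypothetical, and the default penalty factor is the value 100000 mentioned for violations of the first or second condition.

```python
def cross_z(a, b):
    """z component of the outer product of two vectors lying in the image plane."""
    return a[0] * b[1] - a[1] * b[0]


def penalized_energy(d, q_base, q_next, candidate, penalty=100000.0):
    """Multiply energy d by `penalty` when the candidate fails the sign test.

    q_base    : point already assigned to the pixel at (i, j-1)
    q_next    : point already assigned to the pixel at (i+1, j-1)
    candidate : candidate point q(k, l) under examination
    """
    a = (q_next[0] - q_base[0], q_next[1] - q_base[1])        # vector A
    b = (candidate[0] - q_base[0], candidate[1] - q_base[1])  # vector B
    return d * penalty if cross_z(a, b) < 0 else d
```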

[0061] [1. 7. 1] Order of submappings

In the implementation, σ(0) = 0, σ(1) = 1, σ(2) = 2, σ(3) = 3, σ(4) = 0 were used when the resolution level was even, and σ(0) = 3, σ(1) = 2, σ(2) = 1, σ(3) = 0, σ(4) = 3 were used when it was odd. This shuffled the submappings moderately. Note that there are originally four types of submappings, and s is one of 0 to 3. In practice, however, processing corresponding to s = 4 was also performed. The reason is described later.
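The ordering above amounts to a small lookup, sketched here for illustration; the function name is hypothetical.

```python
def sigma(level, s):
    """Submapping index used at step s (0..4) for the given resolution level."""
    order = (0, 1, 2, 3, 0) if level % 2 == 0 else (3, 2, 1, 0, 3)
    return order[s]
```

In both cases the fifth entry (s = 4) repeats the first one, which corresponds to the extra pass described in the text.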

[1. 8] Interpolation calculation

After the mapping between the start point image and the end point image is determined, the luminances of corresponding pixels are interpolated. In the experiment, trilinear interpolation was used. Assume that the square $p_{(i,j)}\,p_{(i+1,j)}\,p_{(i,j+1)}\,p_{(i+1,j+1)}$ on the start point image plane is projected onto the quadrilateral $q_{f(i,j)}\,q_{f(i+1,j)}\,q_{f(i,j+1)}\,q_{f(i+1,j+1)}$ on the end point image plane. For simplicity, let the distance between the images be 1. The pixel r(x, y, t) (0 ≤ x ≤ N − 1, 0 ≤ y ≤ M − 1) of the intermediate image whose distance from the start point image plane is t (0 ≤ t ≤ 1) is determined as follows. First, the position of the pixel r(x, y, t) (where x, y, t ∈ R) is determined by the following equation.

[Number 42]

$$\begin{aligned} (x, y) = {} & (1 - dx)(1 - dy)(1 - t)\,(i, j) + (1 - dx)(1 - dy)\,t\,f(i, j) \\ & + dx\,(1 - dy)(1 - t)\,(i + 1, j) + dx\,(1 - dy)\,t\,f(i + 1, j) \\ & + (1 - dx)\,dy\,(1 - t)\,(i, j + 1) + (1 - dx)\,dy\,t\,f(i, j + 1) \\ & + dx\,dy\,(1 - t)\,(i + 1, j + 1) + dx\,dy\,t\,f(i + 1, j + 1) \end{aligned}$$

(Equation 42)

Subsequently, the luminance of the pixel at r(x, y, t) is determined using the following equation.

[Number 43]

$$\begin{aligned} V(r(x, y, t)) = {} & (1 - dx)(1 - dy)(1 - t)\,V(p_{(i,j)}) + (1 - dx)(1 - dy)\,t\,V(q_{f(i,j)}) \\ & + dx\,(1 - dy)(1 - t)\,V(p_{(i+1,j)}) + dx\,(1 - dy)\,t\,V(q_{f(i+1,j)}) \\ & + (1 - dx)\,dy\,(1 - t)\,V(p_{(i,j+1)}) + (1 - dx)\,dy\,t\,V(q_{f(i,j+1)}) \\ & + dx\,dy\,(1 - t)\,V(p_{(i+1,j+1)}) + dx\,dy\,t\,V(q_{f(i+1,j+1)}) \end{aligned}$$

(Equation 43)

Here, dx and dy are parameters that vary from 0 to 1.
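Equations 42 and 43 share the same eight weights, so the position and the luminance of an intermediate pixel can be computed in one loop over the four corners of the square. The following is a sketch under assumptions: the function name is hypothetical, and the mapping `f` and the luminance lookups `v_src` and `v_dst` are assumed to be supplied as callables.

```python
def interpolate(i, j, dx, dy, t, f, v_src, v_dst):
    """Return (x, y, luminance) of the intermediate pixel at distance t.

    f     : callable (i, j) -> (k, l), the mapping to the end point image
    v_src : callable (i, j) -> luminance in the start point image
    v_dst : callable (k, l) -> luminance in the end point image
    """
    x = y = v = 0.0
    for di, wx in ((0, 1.0 - dx), (1, dx)):
        for dj, wy in ((0, 1.0 - dy), (1, dy)):
            w = wx * wy
            pi, pj = i + di, j + dj
            qi, qj = f(pi, pj)
            x += w * ((1.0 - t) * pi + t * qi)
            y += w * ((1.0 - t) * pj + t * qj)
            v += w * ((1.0 - t) * v_src(pi, pj) + t * v_dst(qi, qj))
    return x, y, v
```

At t = 0 the result coincides with the start point image and at t = 1 with the mapped end point image, which is the behaviour the two equations describe.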

[0063] [1. 9] Mapping under imposed constraint conditions

So far, the determination of the mapping when there are no constraints has been described. However, when a correspondence is defined in advance between specific pixels of the start point image and the end point image, the mapping can be determined with this correspondence as a constraint condition.

[0064] The basic idea is first to deform the start point image roughly by a rough mapping that shifts specific pixels of the start point image to specific pixels of the end point image, and then to calculate the mapping f accurately.

First, a rough mapping is determined that projects the specific pixels of the start point image onto the specific pixels of the end point image and projects the other pixels of the start point image to appropriate positions. That is, it is a mapping in which a pixel close to a specific pixel is projected near the place to which that specific pixel is projected. Here, the rough mapping at the m-th level is denoted $F^{(m)}$. The rough mapping F is determined as follows. First, the mapping is specified for several pixels. When $n_s$ pixels of the start point image,

[Number 44]

$$p_{(i_0, j_0)},\; p_{(i_1, j_1)},\; \ldots,\; p_{(i_{n_s - 1},\, j_{n_s - 1})}$$

(Equation 44)

are specified, the following values are determined.

[Number 45]

$$\begin{aligned} F^{(n)}(i_0, j_0) &= (k_0, l_0), \\ F^{(n)}(i_1, j_1) &= (k_1, l_1), \\ &\;\;\vdots \\ F^{(n)}(i_{n_s - 1}, j_{n_s - 1}) &= (k_{n_s - 1}, l_{n_s - 1}) \end{aligned}$$

(Equation 45)

The displacement of each of the other pixels of the start point image is the weighted average of the displacements of $p_{(i_h, j_h)}$ (h = 0, ..., $n_s - 1$). That is, the pixel $p_{(i,j)}$ is projected onto the following pixel of the end point image.

[Number 46]

$$F^{(m)}(i, j) = \frac{(i, j) + \sum_{h=0}^{n_s - 1} (k_h - i_h,\; l_h - j_h)\, \mathrm{weight}_h(i, j)}{2^{\,n-m}}$$

(Equation 46)

where

[Number 47]

$$\mathrm{weight}_h(i, j) = \frac{1 / \left\| (i_h - i,\; j_h - j) \right\|^2}{\mathrm{total\_weight}(i, j)}$$

(Equation 47)

[Number 48]

$$\mathrm{total\_weight}(i, j) = \sum_{h=0}^{n_s - 1} \frac{1}{\left\| (i_h - i,\; j_h - j) \right\|^2}$$

(Equation 48)

[0067] Next, the energy $D^{(m,s)}_{(i,j)}$ of the mapping f is changed so that a candidate mapping f close to $F^{(m)}$ has lower energy. To be precise, $D^{(m,s)}_{(i,j)}$ is given by

[Number 49]

(Equation 49)

where

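[Number 50]

$$E^{(m,s)}_{2\,(i,j)} = \begin{cases} 0 & \left(\left\| F^{(m)}(i, j) - f^{(m,s)}(i, j) \right\|^2 \le \left\lfloor \dfrac{\rho^2}{2^{2(n-m)}} \right\rfloor\right) \\ \left\| F^{(m)}(i, j) - f^{(m,s)}(i, j) \right\|^2 & (\text{otherwise}) \end{cases}$$

(Equation 50)

and κ, ρ ≥ 0. Finally, f is completely determined by the automatic mapping calculation process described above.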

Note that $E^{(m,s)}_{2\,(i,j)}$ becomes 0 when $f^{(m,s)}(i, j)$ is sufficiently close to $F^{(m)}(i, j)$, that is, when their distance is within

[Number 51]

$$\left\lfloor \frac{\rho^2}{2^{2(n-m)}} \right\rfloor$$

(Equation 51)

The reason for this definition is that, as long as each $f^{(m,s)}(i, j)$ is sufficiently close to $F^{(m)}(i, j)$, its value should be determined automatically so that it settles at an appropriate position in the end point image. For this reason, there is no need to specify the exact correspondence in detail, and the start point image is automatically mapped so as to match the end point image.
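The weighted-average construction of the rough mapping (Equations 46 to 48) can be sketched as follows. This is an illustration under assumptions: the function name and the `anchors` data layout are hypothetical, and the treatment of a pixel that coincides with a specified pixel (where the inverse-distance weight is undefined) is an assumption that simply returns the specified correspondence.

```python
def rough_mapping(i, j, anchors, n, m):
    """Rough mapping F^(m)(i, j) from specified pixel correspondences.

    anchors : list of ((i_h, j_h), (k_h, l_h)); pixel (i_h, j_h) of the start
              point image is specified to correspond to (k_h, l_h)
    n, m    : finest level and current level; the result is divided by 2**(n - m)
    """
    scale = 2 ** (n - m)
    inverse_dist2 = []
    for (ih, jh), (kh, lh) in anchors:
        dist2 = (ih - i) ** 2 + (jh - j) ** 2
        if dist2 == 0:
            # Assumption: at a specified pixel, use its own correspondence.
            return (kh / scale, lh / scale)
        inverse_dist2.append(1.0 / dist2)
    total_weight = sum(inverse_dist2)
    di = dj = 0.0
    for w, ((ih, jh), (kh, lh)) in zip(inverse_dist2, anchors):
        di += (kh - ih) * w / total_weight
        dj += (lh - jh) * w / total_weight
    return ((i + di) / scale, (j + dj) / scale)
```

For example, a pixel midway between two specified pixels receives the average of their two displacements, since both inverse-distance weights are equal.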

[2] Specific processing procedure

The flow of processing by each element technology of [1] will be described.

FIG. 6 is a flowchart showing the overall procedure of the base technology. First, processing using the multiresolution singular point filter is performed (S1), and then the start point image and the end point image are matched (S2). However, S2 is not essential, and processing such as image recognition may be performed based on the image features obtained in S1.

FIG. 7 is a flowchart showing the details of S1 of FIG. 6. Here it is assumed that the start point image and the end point image are matched in S2. Therefore, the start point image is first hierarchized by the singular point filter (S10) to obtain a series of start point hierarchical images. Subsequently, the end point image is hierarchized in the same manner (S11) to obtain a series of end point hierarchical images. The order of S10 and S11 is arbitrary, and the start point hierarchical images and the end point hierarchical images may be generated in parallel.

FIG. 8 is a flowchart showing the details of S10 in FIG. 7. The size of the original start point image is $2^n \times 2^n$. Since the start point hierarchical images are created in order from the finest resolution, the parameter m indicating the resolution level to be processed is set to n (S100). Subsequently, singular points are detected from the m-th level images $p^{(m,0)}$, $p^{(m,1)}$, $p^{(m,2)}$ and $p^{(m,3)}$ using the singular point filter (S101), and the (m−1)-th level images $p^{(m-1,0)}$, $p^{(m-1,1)}$, $p^{(m-1,2)}$ and $p^{(m-1,3)}$ are generated (S102). Here, since m = n, $p^{(m,0)} = p^{(m,1)} = p^{(m,2)} = p^{(m,3)} = p^{(n)}$, and four types of sub-images are generated from one start point image.

FIG. 9 shows the correspondence between a part of the m-th level image and a part of the (m−1)-th level image. The numerical values in the figure indicate the luminance of each pixel. In the figure, $p^{(m,s)}$ symbolizes the four images $p^{(m,0)}$ to $p^{(m,3)}$; when $p^{(m-1,0)}$ is generated, $p^{(m,s)}$ is taken to be $p^{(m,0)}$. According to the rules described in [1.2], for the block whose luminances are written in the figure, for example, $p^{(m-1,0)}$ acquires "3" from among the four pixels contained in it, $p^{(m-1,1)}$ acquires "8", $p^{(m-1,2)}$ acquires "6", and $p^{(m-1,3)}$ acquires "10", and the block is replaced by the single pixel acquired in each case. Therefore, the size of the sub-image at the (m−1)-th level is $2^{m-1} \times 2^{m-1}$.

Subsequently, m is decremented (S103 in FIG. 8), and it is confirmed that m is not negative (S104), and the process returns to S101 to generate a coarser-resolution sub-image. As a result of this iterative process, S10 ends when m = 0, that is, when the zero-level sub-image is generated. The size of the 0th level subimage is 1 × 1.

[0073] FIG. 10 illustrates the start point hierarchical images generated by S10 for the case of n = 3. Only the first start point image is common to the four series, and thereafter sub-images are generated independently according to the type of singular point. The process in FIG. 8 is common to S11 in FIG. 7, and the end point hierarchical images are also generated through the same procedure. This completes the process of S1 in FIG. 6.
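The loop of FIG. 8 (S100 to S104) can be sketched as follows. This is an illustration, not the filter of the base technology: the rules of [1.2] are not reproduced here, so each series takes a hypothetical reducer that maps the four pixels of a 2 × 2 block to one pixel. The helpers `min4` and `max4` are stand-ins consistent with the example values "3" and "10" for the first and last series; the two intermediate series are omitted.

```python
def min4(a, b, c, d):
    """Stand-in reducer: smallest of the four pixels of a block."""
    return min(a, b, c, d)


def max4(a, b, c, d):
    """Stand-in reducer: largest of the four pixels of a block."""
    return max(a, b, c, d)


def build_hierarchy(image, reducers):
    """Return levels[m][s]: the sub-image of series s at resolution level m.

    image    : square list of rows whose side length is 2**n
    reducers : one four-argument function per sub-image series (assumed)
    """
    n = len(image).bit_length() - 1
    m = n                                        # S100: start at the finest level
    levels = {n: [image for _ in reducers]}
    while m > 0:                                 # S104: stop once level 0 exists
        coarser = []
        for s, reduce4 in enumerate(reducers):   # S101 and S102
            src = levels[m][s]
            half = len(src) // 2
            coarser.append([
                [reduce4(src[2 * j][2 * i], src[2 * j][2 * i + 1],
                         src[2 * j + 1][2 * i], src[2 * j + 1][2 * i + 1])
                 for i in range(half)]
                for j in range(half)
            ])
        m -= 1                                   # S103
        levels[m] = coarser
    return levels
```

Each level halves the side length, so an image of size 2^n × 2^n yields n + 1 levels ending in a 1 × 1 sub-image, as stated in the text.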

In the base technology, preparation for matching evaluation is performed in order to proceed to S2 in FIG. 6. FIG. 11 shows the procedure. First, a plurality of evaluation formulas are set (S30). These are the pixel-related energy $C^{(m,s)}_f$ introduced in [1. 3. 2. 1] and the energy $D^{(m,s)}_f$ related to the smoothness of the mapping introduced in [1. 3. 2. 2]. Next, a comprehensive evaluation formula is established by integrating these evaluation formulas (S31). This is the total energy $\lambda C^{(m,s)}_f + D^{(m,s)}_f$ introduced in [1. 3. 2. 3]; using η introduced in [1. 3. 2. 2], it becomes

[Number 52]

$$\sum \sum \left( \lambda\, C^{(m,s)}_{(i,j)} + \eta\, E^{(m,s)}_{0\,(i,j)} + E^{(m,s)}_{1\,(i,j)} \right)$$

(Equation 52)

where the sums are taken over i and j, each running through 0, 1, ..., $2^m - 1$. Preparation for matching evaluation is now complete.

FIG. 12 is a flowchart showing the details of S2 of FIG. 6. As described in [1], the start point hierarchical images and the end point hierarchical images are matched between images of the same resolution level. To obtain a good global match between the images, the matching is calculated in order from the coarsest resolution level. Since the start point hierarchical images and the end point hierarchical images are generated using the singular point filter, the positions and luminances of the singular points are clearly preserved even at coarse resolution levels, and the result of global matching is far superior to that of conventional methods.

First, the coefficient parameter η is set to 0, and the level parameter m is set to 0 (S20). Subsequently, matching is calculated between each of the four sub-images at the m-th level in the start point hierarchical images and the corresponding one of the four sub-images at the m-th level in the end point hierarchical images, and four types of submappings $f^{(m,s)}$ (s = 0, 1, 2, 3) that satisfy the bijective condition and minimize the energy are obtained (S21). The bijective condition is checked using the inherited quadrilateral described in [1.3.3]. At this time, as Equations 17 and 18 show, the submappings at the m-th level are constrained by those at the (m−1)-th level, so the matching at coarser resolution levels is used successively. This is a vertical reference between different levels. Note that m = 0 now and there is no coarser level; this exceptional process is described later with reference to FIG. 13.

On the other hand, horizontal reference within the same level is also performed. As in Equation 20 of [1. 3. 3], $f^{(m,3)}$ is determined so as to be similar to $f^{(m,2)}$, $f^{(m,2)}$ to $f^{(m,1)}$, and $f^{(m,1)}$ to $f^{(m,0)}$. The reason is that, even if the types of singular points differ, it is unnatural for the submappings to be completely different as long as they are originally contained in the same start point and end point images. As Equation 20 shows, the closer the submappings are to each other, the smaller the energy becomes, and the matching is regarded as better.

As for $f^{(m,0)}$, which is determined first, there is no submapping at the same level that can be referred to, so the level one step coarser is referred to, as shown in Equation 19. In the experiment, however, after $f^{(m,3)}$ was obtained, $f^{(m,0)}$ was updated once using it as a constraint. This is equivalent to substituting s = 4 into Equation 20 and taking $f^{(m,4)}$ as the new $f^{(m,0)}$. This avoids the tendency for the degree of association between $f^{(m,0)}$ and $f^{(m,3)}$ to become too low, and this measure improved the experimental results. In addition to this measure, the submappings were also shuffled in the experiment as described in [1.7.1]. This is likewise intended to keep the submappings, which are originally determined for each type of singular point, closely related to one another. Also, as described in [1. 7], the position of the start point is changed according to the value of s to avoid bias that depends on the start point of the process.

FIG. 13 shows how the submapping is determined at the 0th level. At the 0th level, each sub-image consists of only one pixel, so all four submappings $f^{(0,s)}$ are automatically determined as identity mappings. FIG. 14 shows how the submapping is determined at the first level. At the first level, each sub-image consists of 4 pixels. In the figure, these four pixels are shown by solid lines. Now, to find the corresponding point in $q^{(1,s)}$ of the point x in $p^{(1,s)}$, the following procedure is followed.

1. Find the upper left point a, upper right point b, lower left point c, and lower right point d of point x at the first level resolution.

2. Find the pixels to which points a to d belong at the level one step coarser, that is, at the 0th level. In the case of FIG. 14, points a to d belong to pixels A to D, respectively. However, pixels A to C are virtual pixels that do not actually exist.

3. Plot the corresponding points A′ to D′ of pixels A to D, which have already been found at the 0th level, in $q^{(1,s)}$. Pixels A′ to C′ are virtual pixels and are assumed to be located at the same positions as pixels A to C, respectively.

4. Assuming that the corresponding point a′ of point a in pixel A lies in pixel A′, plot point a′. At this time, it is assumed that the position that point a occupies in pixel A (in this case, the lower right) is the same as the position that point a′ occupies in pixel A′.

5. Plot the corresponding points b′ to d′ in the same way as in 4, and form an inherited quadrilateral with points a′ to d′.

6. Find the corresponding point x′ of point x so as to minimize the energy within the inherited quadrilateral. The candidates for the corresponding point x′ may be limited, for example, to those pixels whose centers are included in the inherited quadrilateral. In the case of FIG. 14, all four pixels are candidates.

This is the procedure for determining the corresponding point of a given point x. The same process is performed for all other points to determine the submapping. At the second and higher levels, the shape of the inherited quadrilateral is expected to become gradually distorted, so that, as shown in FIG. 3, a situation arises in which the spacing between pixels A′ to D′ opens up.

Once the four submappings at the m-th level have been determined in this way, m is incremented (S22 in FIG. 12), it is confirmed that m does not exceed n (S23), and the process returns to S21. Thereafter, each time the process returns to S21, submappings at progressively finer resolution levels are determined, and when the process returns to S21 for the last time, the mapping $f^{(n)}$ at the n-th level is determined. Since this mapping is determined for η = 0, it is written $f^{(n)}(\eta = 0)$.

Next, in order to obtain mappings for different η as well, η is shifted by Δη and m is cleared to zero (S24). It is confirmed that the new η does not exceed the predetermined search cutoff value $\eta_{max}$ (S25), and the process returns to S21 to obtain the mapping $f^{(n)}(\eta = \Delta\eta)$ for the current η. This process is repeated to obtain $f^{(n)}(\eta = i\Delta\eta)$ (i = 0, 1, ...) in S21. When η exceeds $\eta_{max}$, the process proceeds to S26, the optimum η = $\eta_{opt}$ is determined by the method described later, and $f^{(n)}(\eta = \eta_{opt})$ is finally taken as the mapping $f^{(n)}$.

FIG. 15 is a flowchart showing the details of S21 of FIG. 12. This flowchart determines the submappings at the m-th level for a given η. In the base technology, when the submappings are determined, the optimum λ is determined independently for each submapping.

In the figure, s and λ are first cleared to zero (S210). Next, the submapping $f^{(m,s)}$ that minimizes the energy for the current λ (and, implicitly, η) is obtained (S211), and this is written $f^{(m,s)}(\lambda = 0)$. To obtain mappings for different λ as well, λ is shifted by Δλ, it is confirmed that the new λ does not exceed the predetermined search cutoff value $\lambda_{max}$ (S213), and the process returns to S211, so that $f^{(m,s)}(\lambda = i\Delta\lambda)$ (i = 0, 1, ...) is obtained in the subsequent iterations. When λ exceeds $\lambda_{max}$, the process proceeds to S214, where the optimum λ = $\lambda_{opt}$ is determined and $f^{(m,s)}(\lambda = \lambda_{opt})$ is finally taken as the mapping $f^{(m,s)}$ (S214).

Next, to obtain the other submappings at the same level, λ is cleared to zero and s is incremented (S215). It is confirmed that s does not exceed 4 (S216), and the process returns to S211. When s = 4, $f^{(m,0)}$ is updated using $f^{(m,3)}$ as described above, and the determination of the submappings at that level is completed.

FIG. 16 shows the behavior of the energy $C^{(m,s)}_f$ corresponding to $f^{(m,s)}(\lambda = i\Delta\lambda)$ (i = 0, 1, ...) obtained while λ is changed for a certain m and s. As mentioned in [1.4], $C^{(m,s)}_f$ usually decreases as λ increases. However, $C^{(m,s)}_f$ turns to increase when λ exceeds the optimum value. Therefore, in this base technology, the value of λ at which $C^{(m,s)}_f$ takes a local minimum is taken as $\lambda_{opt}$. As shown in the figure, even if $C^{(m,s)}_f$ decreases again in the range λ > $\lambda_{opt}$, the mapping has already collapsed at that point and is meaningless, so it is sufficient to focus on the first local minimum. $\lambda_{opt}$ is determined independently for each submapping, and finally one is also determined for $f^{(n)}$.

On the other hand, FIG. 17 shows the behavior of the energy $C^{(n)}_f$ corresponding to $f^{(n)}(\eta = i\Delta\eta)$ (i = 0, 1, ...) obtained while η is changed. Here too, $C^{(n)}_f$ usually decreases as η increases, but it turns to increase when η exceeds the optimum value. Therefore, the value of η at which $C^{(n)}_f$ takes a minimum is taken as $\eta_{opt}$. FIG. 17 can be regarded as an enlargement of the vicinity of zero on the horizontal axis of FIG. 4. Once $\eta_{opt}$ is determined, $f^{(n)}$ can be finally determined.
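Selecting the first local minimum from a sequence of energies sampled at λ = iΔλ can be sketched as follows. This is a minimal illustration; the function name is hypothetical, and the sampled energies are assumed to be available as a list.

```python
def first_local_minimum(energies):
    """Index i of the first sample after which the energy starts to increase.

    energies : energy values for parameter values i * delta, i = 0, 1, ...
    Returns the last index when the energy never increases.
    """
    for idx in range(len(energies) - 1):
        if energies[idx + 1] > energies[idx]:
            return idx
    return len(energies) - 1
```

The optimum parameter is then the returned index multiplied by the step (for example, $\lambda_{opt} = i\Delta\lambda$). Later dips in the sequence are ignored, in line with the remark that the mapping has already collapsed beyond the first minimum.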

As described above, the base technology offers various advantages. First, since there is no need to detect edges, the problems of edge-detection-based prior art can be solved. In addition, a priori knowledge of the objects contained in the image is unnecessary, and automatic detection of corresponding points is realized. With the singular point filter, the luminance and position of singular points can be maintained even at coarse resolution levels, which is extremely advantageous for object recognition, feature extraction, and image matching. As a result, an image processing system that greatly reduces manual work can be constructed.

In addition, the following modified techniques can be considered with respect to this base technology.

(1) In the base technology, parameters are determined automatically when matching is performed between the start point hierarchical images and the end point hierarchical images. This method, however, is applicable not only to hierarchical images but to matching between two ordinary images in general.

For example, an energy $E_0$ related to the difference in luminance between pixels and an energy $E_1$ related to the positional displacement of pixels between the two images are used as evaluation formulas, and their linear sum $E_{tot} = \alpha E_0 + E_1$ is used as the comprehensive evaluation formula. Focusing on the vicinity of the extremum of this comprehensive evaluation formula, α is determined automatically. That is, mappings that minimize $E_{tot}$ are obtained for various α. Among these mappings, the α at which $E_1$ takes a minimum with respect to α is determined as the optimum parameter. The mapping corresponding to that parameter is finally regarded as the best match between the two images.

Besides the above, there are various methods of setting the evaluation formulas; for example, formulas that take larger values as the evaluation result becomes better, such as $1/E_1$ and $1/E_2$, may be adopted. The comprehensive evaluation formula also need not be a linear sum; an n-th power sum (n = 2, 1/2, −1, etc.), a polynomial, an arbitrary function, or the like may be selected as appropriate.

[0092] As for the parameters, there may be only α, two parameters η and λ as in the base technology, or more than two. If there are three or more parameters, they are determined by changing them one at a time.

(2) In the base technology, after the mapping is determined so that the value of the comprehensive evaluation formula is minimized, the parameter is determined by detecting the point at which $C^{(m,s)}_f$, one of the evaluation formulas constituting the comprehensive evaluation formula, becomes minimal. However, instead of such two-step processing, it is effective in some situations simply to determine the parameter so that the minimum value of the comprehensive evaluation formula becomes smallest. In that case, for example, $\alpha E_0 + \beta E_1$ may be used as the comprehensive evaluation formula, with a constraint such as α + β = 1 so that the evaluation formulas are treated equally. This is because the essence of automatic parameter determination lies in determining the parameters so that the energy is minimized.

(3) In the base technology, four types of sub-images related to four types of singular points were generated at each resolution level. Of course, however, one, two, or three of the four types may be used selectively. For example, if there is only one bright point in the image, generating hierarchical images with only $f^{(m,3)}$, which relates to the maximum point, should still have a corresponding effect. In that case, different submappings at the same level are unnecessary, so the amount of calculation concerning s is reduced.

(4) In the base technology, the number of pixels becomes 1/4 each time the level advances by one through the singular point filter. For example, a configuration is also possible in which a 3 × 3 block is used and singular points are searched for within it; in this case, the number of pixels becomes 1/9 each time the level advances by one.

(5) When the start point image and the end point image are in color, they are first converted to monochrome images and the mapping is calculated. The color start point image is then transformed using the mapping obtained as a result. Alternatively, the submappings may be calculated for each of the RGB components.

[3] Improvements to the base technology

Based on the base technology described above, several improvements have been made to increase matching accuracy. These improvements are described here.

[3.1] Singular point filter and sub-images taking color information into account

To use the color information of the image effectively, the singular point filter was changed as follows. First, HIS, which is said to agree best with human intuition, was adopted as the color space, and the formula said to be closest to the sensitivity of the human eye was chosen for converting color to luminance.

[Number 53]

$$Y = 0.299\,R + 0.587\,G + 0.114\,B$$

(Equation 53)

Here, Y (luminance) at pixel a is written Y(a), and the following symbols are defined.

[Number 54]

(Equation 54)

The following five filters are prepared using the above definitions.

[Number 55]

(Equation 55)

Of these, the first four filters are almost the same as the filters in the base technology before improvement, and preserve the luminance singular points while retaining the color information. The last filter preserves the color saturation singular points, again while retaining the color information.

These filters generate five types of sub-images for each level. The sub-images at the highest level coincide with the original image.

[Number 56]

$$p^{(n,0)}_{(i,j)} = p^{(n,1)}_{(i,j)} = p^{(n,2)}_{(i,j)} = p^{(n,3)}_{(i,j)} = p^{(n,4)}_{(i,j)} = p_{(i,j)}$$

(Equation 56)
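The luminance conversion of Equation 53 can be sketched as a one-line function. The function name is hypothetical; the coefficients are those given in the equation.

```python
def luminance(r, g, b):
    """Luminance Y of an RGB pixel according to Equation 53."""
    return 0.299 * r + 0.587 * g + 0.114 * b
```

The three coefficients sum to 1, so a gray pixel with R = G = B keeps its value, and green contributes most to the perceived luminance.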

[3. 2] Edge images and their sub-images

To use information on the derivative of luminance (edges) for matching, a first-order differential edge detection filter is used. This filter can be realized by convolution with a certain operator H.

[Number 57]

$$p^{(n,h)}_{(i,j)} = Y(p_{(i,j)}) \otimes H_h, \qquad p^{(n,v)}_{(i,j)} = Y(p_{(i,j)}) \otimes H_v$$

(Equation 57)

Here, in consideration of operation speed and the like, the following operators were used as H.

[Number 58]

(Equation 58)

Next, this image is converted to multiple resolutions. Because the filter produces an image whose luminance is centered on 0, the following average-value image is the most suitable sub-image.

[Number 59]

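$p_{(i,j)}^{(m,h)} = \frac{1}{4}\left( p_{(2i,2j)}^{(m+1,h)} + p_{(2i,2j+1)}^{(m+1,h)} + p_{(2i+1,2j)}^{(m+1,h)} + p_{(2i+1,2j+1)}^{(m+1,h)} \right)$

$p_{(i,j)}^{(m,v)} = \frac{1}{4}\left( p_{(2i,2j)}^{(m+1,v)} + p_{(2i,2j+1)}^{(m+1,v)} + p_{(2i+1,2j)}^{(m+1,v)} + p_{(2i+1,2j+1)}^{(m+1,v)} \right)$ (Equation 59)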

The images of Equation 59 are used for the energy function in the calculation of the Forward Stage, that is, the first submapping derivation stage described later.

The magnitude of the edge, that is, its absolute value, is also needed for the calculation.

[Number 60]

(Equation 60)

Since this value is always positive, a maximum-value filter is used for the multi-resolution conversion.

[Number 61]

$p_{(i,j)}^{(m,e)} = \max\left( p_{(2i,2j)}^{(m+1,e)},\ p_{(2i,2j+1)}^{(m+1,e)},\ p_{(2i+1,2j)}^{(m+1,e)},\ p_{(2i+1,2j+1)}^{(m+1,e)} \right)$ (Equation 61)

The image of Equation 61 is used to determine the order of calculation in the Forward Stage described later.
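The two multi-resolution rules above (averaging for the signed edge images, maximum for the edge magnitude) can be illustrated with a minimal sketch. The function names and the representation of an image as a list of rows are assumptions made for illustration only; they do not appear in the specification.

```python
def downsample(img, combine):
    """Halve the resolution: each coarse pixel combines one 2x2 block of the finer level."""
    rows, cols = len(img) // 2, len(img[0]) // 2
    return [
        [
            combine(
                img[2 * i][2 * j], img[2 * i][2 * j + 1],
                img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1],
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def average4(a, b, c, d):
    """Mean of four values, used for edge images whose values are centered on 0."""
    return (a + b + c + d) / 4


def edge_average_level(img):
    """Coarser level of a signed (horizontal or vertical) edge image."""
    return downsample(img, average4)


def edge_magnitude_level(img):
    """Coarser level of the edge-magnitude image, using a maximum filter."""
    return downsample(img, max)
```

Applying either function repeatedly yields successively coarser levels; an image with odd width or height simply loses its last row or column in this sketch.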

[3.3] Calculation procedure

The calculation proceeds in order starting from the sub-images of the coarsest resolution. Since there are five sub-images, the calculation is performed multiple times at each resolution level. Each such calculation is called a turn, and the maximum number of turns is denoted by t. Each turn consists of two energy minimization calculations: the Forward Stage and the Refinement Stage, which is a submapping recalculation stage. FIG. 18 is a flowchart of the improved calculation for determining the submapping at the m-th level.

As shown in the figure, s is first cleared to zero (S40). Next, in the Forward Stage (S41), a mapping f^(m,s) from the start point image p to the end point image q is obtained by energy minimization. Here, the energy to be minimized is a linear sum of the energy C based on corresponding pixel values and the energy D based on the smoothness of the mapping.

[Number 62]

(Equation 62)

[0098] The energy C is composed of an energy C_I due to the difference in luminance (equivalent to the energy C in the base technology before the improvement), an energy C_C due to hue and saturation, and an energy C_E due to the difference in the luminance derivative (edge), which are expressed as follows.

[Number 63]

$C_I^f(i,j) = \left| Y(p_{(i,j)}) - Y(q_{f(i,j)}) \right|^2$

$C_C^f(i,j) = \left| S(p_{(i,j)})\cos(2\pi H(p_{(i,j)})) - S(q_{f(i,j)})\cos(2\pi H(q_{f(i,j)})) \right|^2 + \left| S(p_{(i,j)})\sin(2\pi H(p_{(i,j)})) - S(q_{f(i,j)})\sin(2\pi H(q_{f(i,j)})) \right|^2$

$C_E^f(i,j) = \left| p_{(i,j)}^{(m,h)} - q_{f(i,j)}^{(m,h)} \right|^2 + \left| p_{(i,j)}^{(m,v)} - q_{f(i,j)}^{(m,v)} \right|^2$

$C^f(i,j) = \lambda C_I^f(i,j) + \psi C_C^f(i,j) + \theta C_E^f(i,j)$ (Equation 63)

The energy D is the same as in the base technology before the improvement. However, whereas the base technology before the improvement considered only adjacent pixels when deriving the energy E1 that guarantees the smoothness of the mapping, the improvement allows the number of surrounding pixels to be considered to be specified by a parameter d.

[Number 64]

$E_0^f(i,j) = \left\| f(i,j) - (i,j) \right\|^2$

$E_1^f(i,j) = \sum_{i'=i-d}^{i+d} \sum_{j'=j-d}^{j+d} \left\| \left( f(i,j) - (i,j) \right) - \left( f(i',j') - (i',j') \right) \right\|^2$ (Equation 64)

In preparation for the next Refinement Stage, this stage also calculates, in the same way, the mapping g^(m,s) from the end point image q to the start point image p.

In the Refinement Stage (S42), a more appropriate mapping f'^(m,s) is obtained based on the bidirectional mappings f^(m,s) and g^(m,s) obtained in the Forward Stage. Here, the energy minimization calculation is performed for a newly defined energy M. The energy M is composed of the degree of matching M0 with the mapping g from the end point image to the start point image and the difference M1 from the original mapping.

[Number 65]

$M_1^{f'}(i,j) = \left\| f'(i,j) - f(i,j) \right\|^2$ (Equation 65)

The mapping g'^(m,s) from the end point image q to the start point image p is also obtained in the same way so as not to lose symmetry.

After that, s is incremented (S43), it is confirmed that s does not exceed t (S44), and the process proceeds to the Forward Stage (S41) of the next turn. At that time, the energy minimization calculation is performed with E0 replaced as follows.

[3.4] Map calculation order

Because the mappings of surrounding points are used when calculating the energy E1, which represents the smoothness of the mapping, the result is affected by whether those points have already been calculated. In other words, the accuracy of the mapping as a whole varies greatly depending on the point from which the calculation starts. The absolute-value image of the edges is therefore used. Since edge portions carry a large amount of information, the mapping is calculated first for the points where the absolute value of the edge is large. This makes it possible to obtain very accurate mappings, especially for images such as binary images.
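As a minimal sketch of this ordering rule, the points of one level can be sorted by descending edge magnitude before the mapping is calculated. The function name and the list-of-rows image representation are assumptions for illustration; ties are left in raster order here, which the specification does not prescribe.

```python
def calculation_order(edge_magnitude):
    """Return (row, column) positions sorted so that strong edges come first."""
    points = [
        (i, j)
        for i, row in enumerate(edge_magnitude)
        for j, _ in enumerate(row)
    ]
    # sorted() is stable, so points with equal magnitude keep their raster order.
    return sorted(points, key=lambda p: edge_magnitude[p[0]][p[1]], reverse=True)
```

The mapping calculation would then visit the points in the returned order, so that points on strong edges are fixed before their smoother surroundings.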

In the base technology, position information corresponding to the matching between two key frames (hereinafter also referred to as “corresponding point information”) is generated, and intermediate frames are generated based on this position information. A key frame is an image subject to matching, and is referred to as a start point image or an end point image in the base technology. This technology can be used to compress moving pictures, and in fact experiments are beginning to confirm that it achieves both image quality and a compression ratio exceeding those of MPEG.

First Embodiment

The coding technique and the decoding technique according to the first embodiment will be described below in order.

[1] Image coding technology

First, an image coding technique according to the first embodiment will be described with reference to FIG. 19. FIG. 19 shows the flow of the present image coding technique and at the same time shows the configuration of an image coding apparatus described later. Each element shown in the figure as a functional block that performs various processing can be implemented, in terms of hardware, by a CPU, a memory, and other LSIs and, in terms of software, by a program loaded into memory. It is therefore understood by those skilled in the art that these functional blocks can be realized in various forms by hardware only, software only, or a combination thereof, and they are not limited to any of these. Further, when the same block appears at a plurality of places in the figure, this may mean that a plurality of identical blocks exist, or it may mean that a single block is used a plurality of times.

(1) The image frames to be processed (hereinafter simply "target image frames") are written in ascending or descending order as F0, F1, ..., Fn-1, Fn (n is an integer of 2 or more). Corresponding point information indicating the positional relationship of corresponding points between image frames Fi and Fj (i, j = 0, 1, ..., n) is written Mi-j. A point on image frame Fi is generally written pi. The target image frames may or may not be equally spaced in time. The coding technique then performs the following steps.

Step a

A matching is calculated between image frames F0 and Fn to generate corresponding point information M0-n. Matching is a process for identifying regions or points that correspond to each other between image frames. With the base technology, matching is performed pixel by pixel; with MPEG-style matching, block matching is used. When the point p0 on image frame F0 corresponds to the point pn on image frame Fn, the simplest example of the corresponding point information M0-n is “p0 → pn”. In practice this is described by coordinates within the image (hereinafter simply "coordinates"). Specifically, when the points pi consist of pi0, pi1, ... and these correspond to one another, an example description of the corresponding point information M0-n is as follows.

"p00 → pn0 / p01 → pn1 / p02 → pn2 / ..."

Step b

The path along which the point p0 on image frame F0 moves to the corresponding point pn on image frame Fn according to the corresponding point information M0-n is divided into n parts, and the point p1 on image frame F1 corresponding to p0, the point p2 on image frame F2 corresponding to p0, ..., and the point pn on image frame Fn corresponding to p0 are calculated. If the target image frames are equally spaced in time, “divided into n parts” means n equal parts; otherwise, the path is divided at ratios that follow the time ratios between the image frames. For example, if the coordinates of p0 are (x0, y0), those of pn are (xn, yn), and the target image frames are equally spaced, the coordinates of pi are given in general by the following.

((xn - x0)·i/n + x0, (yn - y0)·i/n + y0)

This is a so-called coordinate calculation by interpolation. Note that this description is an example in which the corresponding points of F0 and Fn are linearly interpolated; interpolation by a curve, described later, is also possible.
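The coordinate calculation of step b can be sketched as follows. The function names are hypothetical, and the `ratio` parameter is an assumed way of expressing the time-ratio division mentioned for frames that are not equally spaced.

```python
def interpolate_by_ratio(p0, pn, ratio):
    """Point on the straight path from p0 to pn at the given ratio (0 = p0, 1 = pn)."""
    x0, y0 = p0
    xn, yn = pn
    return ((xn - x0) * ratio + x0, (yn - y0) * ratio + y0)


def interpolate_point(p0, pn, i, n):
    """Coordinates of pi on frame Fi when the n + 1 frames are equally spaced in time."""
    return interpolate_by_ratio(p0, pn, i / n)
```

For frames that are not equally spaced, `ratio` could be taken as the elapsed time from F0 to Fi divided by the elapsed time from F0 to Fn.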

Step c

By executing step b for a predetermined number of points on image frame F0, a virtual image frame F1' is generated using the set of points p1 corresponding to those points, a virtual image frame F2' using the set of points p2, ..., and a virtual image frame Fn' using the set of points pn. One example of the "predetermined number of points" is all the pixels that make up the image frame. Since this increases the amount of calculation, however, one pixel may instead be extracted from every several pixels in the x and y directions of the image frame. This is equivalent to dividing the image frame into a mesh and extracting only the pixels that fall on the mesh grid points. For example, if one pixel out of every five is extracted in both the x and y directions, the "predetermined number of points" becomes 1/25 of the total number of pixels.

If the “predetermined number” is not the total number of pixels, the corresponding points of the points for which no corresponding point has been calculated (tentatively called non-grid points) are calculated by interpolation based on the points for which corresponding points have been calculated (tentatively called grid points). For example, suppose that a non-grid point on image frame F0 lies between two grid points on the same F0, that the position vector of the former is p and those of the latter two are q and r, and that p = (1-a)q + ar. Then the point p' on image frame Fn corresponding to p can be written p' = (1-a)q' + ar', using the points q' and r' on Fn that correspond to q and r. A method that generalizes this and describes a non-grid point by three grid points is known as bilinear interpolation, and it may be used.
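The mesh sampling and the two-grid-point case above can be sketched as follows. The function names are hypothetical, points are assumed to be (x, y) tuples, and q and r are assumed to be distinct; the three-grid-point generalization is not shown.

```python
def grid_points(width, height, step):
    """Pixels lying on the mesh grid: one pixel every `step` pixels in x and y."""
    return [(x, y) for y in range(0, height, step) for x in range(0, width, step)]


def map_non_grid_point(p, q, r, q_mapped, r_mapped):
    """Map a non-grid point p = (1-a)q + a*r to p' = (1-a)q' + a*r'.

    q and r are grid points on F0 (assumed distinct), and q_mapped and r_mapped
    are their corresponding points q' and r' on Fn.
    """
    dx, dy = r[0] - q[0], r[1] - q[1]
    # Recover the weight a from whichever axis has the larger extent.
    if abs(dx) >= abs(dy):
        a = (p[0] - q[0]) / dx
    else:
        a = (p[1] - q[1]) / dy
    return tuple((1 - a) * qm + a * rm for qm, rm in zip(q_mapped, r_mapped))
```

With `step=5`, `grid_points` keeps 1/25 of the pixels of a frame whose width and height are multiples of five, matching the example in the text.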

Step d

For each of the sets S1 of virtual image frame F1' and real image frame F1, S2 of virtual image frame F2' and real image frame F2, ..., and Sn of virtual image frame Fn' and real image frame Fn, it is determined based on a predetermined criterion whether there is a set Sk (k = 1, ..., n) with a large difference between the image frames it contains.

The "predetermined criterion" may simply be to compare the difference with a predetermined threshold value and judge it "large" if it exceeds the threshold. That is, attention may be paid to the difference itself. The threshold value may be determined by experiment; the same applies to the other parameters below.

[0108] As another "criterion", the energy value calculated by the base technology, which is in effect a physical quantity indicating the magnitude of the difference, may be used. The energy increases as the positions of the corresponding points are farther apart and as their pixel values are farther apart. In general, therefore, the larger the energy, the more likely it is that the correspondence is inaccurate. If the correspondence is inaccurate, the difference tends to increase. Consequently, if the energy value between image frames is larger than a predetermined threshold, the difference may be judged to be large.

Step e, f

When there is a set Sk with a large difference, a matching is calculated at least between an image frame Fh (h = 0, 1, ..., k-1) and the image frame Fk to generate corresponding point information Mh-k. The corresponding point information M0-n is then corrected using Mh-k (hereinafter, M0-n before correction is called the "original M0-n" or simply "M0-n", and M0-n after correction is called the “modified M0-n” or "M0-n'"). As an example, if h = 0, M0-k is obtained, and the point pk corresponding to the point p0 is determined. The modified M0-n is then expressed in polyline form, for example as "p0 → pk → pn". As a result, the point p0 reaches pn via pk, so the difference in Sk becomes smaller than with the straight-line form of the original M0-n. The modified M0-n may be a curve rather than a polyline. In that case, the trajectory may be described by, for example, a spline curve so that "p0 → pk → pn" is approximately realized.
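A minimal sketch of the polyline form of the modified M0-n, assuming equally spaced frames and 0 < k <= n; the function name is hypothetical, and the spline variant is not shown.

```python
def polyline_point(p0, pk, pn, k, n, i):
    """Position on frame Fi of a point that travels p0 -> pk -> pn.

    The point is at p0 on F0, reaches pk on Fk, and ends at pn on Fn.
    Assumes equally spaced frames and 0 < k <= n.
    """
    if i <= k:
        start, end, t = p0, pk, i / k
    else:
        start, end, t = pk, pn, (i - k) / (n - k)
    return tuple((e - s) * t + s for s, e in zip(start, end))
```

For a ball that starts at (0, 10), touches the ground at (5, 0) on frame 5, and ends at (10, 10) on frame 10, the polyline places the ball at (5, 0) on frame 5, whereas straight interpolation between F0 and Fn would place it at (5, 10).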

Step g

Encoded data is output in a format including at least the image frame F0 and the modified M0-n. With the image frame F0 and the modified M0-n, the point p0 on image frame F0 can be moved through pk to pn, so this is sufficient as encoded data. The encoded data may also include the data of image frame Fn. In that case, rather than merely moving p0, its pixel value can also be changed by interpolation. If the pixel value of the point p0 is Vp0 and that of the point pn is Vpn, the pixel value Vpi of the point pi can be interpolated as follows. For non-grid points, bilinear interpolation may be used as in the case of coordinates.

Vpi = (Vpn - Vp0)·i/n + Vp0

However, since the points p0 and pn were presumably detected as corresponding points in the first place because their pixel values are reasonably close, describing the change in pixel value is not essential. Encoded data holding only F0 of the image frames F0 and Fn is therefore also valid.

According to the above method, relatively high image quality can be achieved with a relatively small amount of data. The reason is that, apart from the modified M0-n, at least one image frame suffices as data, and the amount of data is greatly reduced by choosing a suitably small "predetermined number of points". Moreover, although the total amount of data is small, the original M0-n is corrected for any Sk with a large difference, so the image quality is greatly improved. This has also been confirmed by experiments.

(2) Step d may determine the magnitude of the difference between image frames in units of predetermined regions. The regions are obtained, for example, simply by dividing the image frame into a mesh. The size of the regions may be chosen by experiment, balancing image quality against the amount of data.

When step d makes the determination in units of regions, Sk is determined to be a "set with a large difference" when the difference between corresponding regions, that is, spatially identical regions, of the virtual image frame Fk' and the real image frame Fk is larger than a predetermined threshold value. In this case, the total difference between the virtual image frame Fk' and the real image frame Fk over the entire image frame need not be large; it suffices to find a region where the difference is large. Instead of, or in addition to, focusing on the difference itself, the energy may be totaled for each region and compared with a threshold value. In the embodiment below, the comparison is assumed to be made region by region over the entire image frame.
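The region-by-region determination can be sketched as follows. The choice of the summed absolute pixel difference as the measure, the square mesh, and the function name are assumptions for illustration; the specification leaves the criterion open.

```python
def large_difference_regions(virtual, real, block, threshold):
    """Return the (top, left) corners of mesh regions with a large difference.

    `virtual` and `real` are same-sized images given as lists of rows. The
    difference of a region is taken here as the sum of absolute pixel
    differences over a block x block area (an assumed measure).
    """
    height, width = len(real), len(real[0])
    regions = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            total = sum(
                abs(virtual[y][x] - real[y][x])
                for y in range(top, min(top + block, height))
                for x in range(left, min(left + block, width))
            )
            if total > threshold:
                regions.append((top, left))
    return regions
```

A set Sk would then count as a "set with a large difference" whenever this list is non-empty for the virtual Fk' and the real Fk, even if the difference over the whole frame is small.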

FIGS. 20(a) to 20(d) show examples of target image frames. Comparison by region is particularly effective in a case such as that of FIGS. 20(a) to 20(c), where the target image frames show a bouncing ball: image frame F0 in FIG. 20(a) is before the bounce, image frame Fn in FIG. 20(c) is after the bounce, and image frame Fk in FIG. 20(b) is the moment of the bounce. If F1', F2', ... are generated by interpolation using the original M0-n, the ball merely moves in a straight line from its position before the bounce to its position after the bounce, which is unnatural. In FIGS. 20(a) to 20(c), the actual trajectory of the ball is indicated by a solid line, and the trajectory of the ball generated by interpolation using the original M0-n is indicated by a broken line. If the difference is calculated region by region, Sk and the sets before and after it can be detected by adjusting the threshold. In the example of FIGS. 20(a) to 20(c), in image frame Fk, the difference in the region containing the actual ball Bk and the position Bk' of the ball generated by interpolation exceeds the threshold, and Sk is detected. Once detected, these sets can be reflected in the modified M0-n, so the bounce of the ball can be expressed more accurately. FIG. 20(d) shows the trajectory of the ball reproduced based on the modified M0-n.

A plurality of regions with large differences may be detected in the same Sk. For example, suppose that a point p00 on image frame F0 is included in a region A with a large difference and a point p01 is included in another region B with a large difference. In this case too, the modified M0-n takes a form such as “p00 → pk0 → pn0 / p01 → pk1 → pn1 / ...”. For similar reasons, regions may also be detected in a plurality of different sets.

(3) Step e may include the following substeps.

e1) When there is a set Sk with a large difference, the corresponding point information M0-1 between image frames F0 and F1, M1-2 between F1 and F2, ..., and M(k-1)-k between Fk-1 and Fk are each obtained.

e2) M0-1, M1-2, ..., M(k-1)-k are combined to generate M0-k. This has already been mentioned.

(4) Step e2 may obtain p1 corresponding to the point p0 by the corresponding point information M0-1, obtain p2 corresponding to the point p1 by M1-2, ..., and obtain pk corresponding to the point pk-1 by M(k-1)-k, thereby identifying the points corresponding to p0 in the order p1, p2, ..., pk, finally identifying pk corresponding to p0 and generating M0-k. This has already been mentioned.
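The chaining in step e2 can be sketched as follows, under the assumption (made only for illustration) that each piece of corresponding point information is held as a dictionary from a point on one frame to its corresponding point on the next, and that every intermediate point appears as a key of the next dictionary.

```python
def compose(*mappings):
    """Chain M0-1, M1-2, ..., M(k-1)-k into M0-k.

    Each mapping is a dict from a point on one frame to the corresponding point
    on the next frame. Raises KeyError if an intermediate point has no entry in
    the following mapping.
    """
    first, rest = mappings[0], mappings[1:]
    combined = {}
    for p0, p in first.items():
        for mapping in rest:
            p = mapping[p]  # follow p1 -> p2 -> ... -> pk
        combined[p0] = p
    return combined
```

In practice the intermediate points may fall between pixels, in which case a lookup would need interpolation rather than an exact dictionary key; that refinement is omitted here.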

(5) Step f may use the corresponding point information M0-k to generate the modified M0-n in a form that indicates the trajectory of the point p0 passing through pk to reach pn. This has already been mentioned.

(6) If step d finds that there is a set Sk with a large difference, step g may output encoded data in a format that includes the image frame F0 and the modified M0-n and also includes information on the difference in the set Sk. For example, if the difference in region A of the set Sk is large, merely changing the original M0-n to the modified M0-n may not yield fully satisfactory image quality. In that case, the difference may first be reduced considerably by correcting the corresponding point information M0-n, and the remaining difference may then be described in the encoded data. The format of the encoded data is then, for example, as follows.

FIG. 21 shows the format of the encoded data in the image coding technique according to the first embodiment. The encoded data D1 includes image frames (i), modified corresponding point information (ii), a presence bit (iii), difference information (iv), the value of k (v), and region position/shape information (vi). The content of each item is as follows.

i) F0 data, or F0 + Fn data

ii) Modified M0-n (in polyline form, curve form, etc.)

iii) "Presence bit" indicating the presence or absence of difference information

iv) Difference information (in image data format)

v) Value of k (if multiple: k1, k2, ...)

vi) Region position/shape information (if multiple: A1(k1), A2(k2), ...)

Here, the difference information is data on the difference between the virtual Fk' and the actual Fk in region A, and takes the form of image data for region A. If the presence bit affirms the presence of difference information, the difference information is valid. If it denies it, the items from the difference information onward are ignored in the image decoding described later.

Because it takes the form of image data, the difference information is desirably compressed by a known compression method before being stored in the encoded data. Although the difference information has no meaning as an image, its values tend to show a clear statistical bias around zero, which is advantageous in that a relatively high compression ratio can be achieved.

The value of k and the region position/shape information indicate which region of which set Sk the difference information relates to. The decoding apparatus adds the difference appropriately using the value of k and the region position/shape information.
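The six items of the encoded data could be held in a container such as the following. This is only an illustrative sketch: the class names, field names, and the choice of a rectangle for the region position/shape information are assumptions, not the format defined in FIG. 21.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RegionDifference:
    k: int            # (v) index k of the set Sk the difference belongs to
    region: tuple     # (vi) position/shape, assumed here as (top, left, height, width)
    residual: bytes   # (iv) difference image for the region, after compression


@dataclass
class EncodedData:
    f0: list                      # (i) image frame F0
    modified_m0n: dict            # (ii) modified M0-n
    fn: Optional[list] = None     # (i) optional image frame Fn
    differences: list = field(default_factory=list)  # items (iv) to (vi), one per region

    @property
    def has_difference(self) -> bool:
        """(iii) presence bit: whether any difference information is attached."""
        return bool(self.differences)
```

A decoder following this sketch would check `has_difference` first and skip items (iv) to (vi) when it is false.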

(7) The difference information in the set Sk is included in the encoded data only for the region where the difference is large in the image frame. This has already been mentioned.

(8) Along with the difference, at least the value of k and the position information of the region A are included in the encoded data. This has already been mentioned.

(9) The difference is included in the encoded data after being compressed. This has already been mentioned.

(10) One aspect of the image coding apparatus includes the following configuration. The processing performed by each component has already been described.

• Matching processing unit 20

Calculates a matching between F0 and Fn to generate M0-n.

• Intermediate frame generator 22

Divides into n parts the path along which the point p0 on F0 moves to the corresponding point pn on Fn according to M0-n, and calculates the point p1 on F1 corresponding to p0, the point p2 on F2 corresponding to p0, ..., and the point pn on Fn corresponding to p0. By performing this process for a predetermined number of points on F0, it generates a virtual F1' using the set of points p1 corresponding to those points, a virtual F2' using the set of points p2, ..., and a virtual Fn' using the set of points pn.

• Determination unit 24

For each of the sets S1 of the virtual F1' and the real F1, S2 of the virtual F2' and the real F2, ..., and Sn of the virtual Fn' and the real Fn, determines based on a predetermined criterion whether there is a set Sk (k = 1, ..., n) with a large difference between the image frames it contains.

Based on the above configuration, when there is a set Sk with a large difference, the matching processing unit calculates a matching between Fh (h = 0, 1, ..., k-1) and Fk to generate Mh-k, and corrects M0-n using this Mh-k. The apparatus further includes an output unit 40 that outputs encoded data in a format including at least F0 and the modified M0-n.

(11) Another aspect of the image coding method performs the following processing.

Process 1: Perform a matching calculation between the both-end image frames of an image group including three or more image frames. An example of the both-end image frames is F0 and Fn described above.

Process 2: Based on the corresponding point information between the both-end image frames obtained as a result of the matching calculation, virtually generate by interpolation the intermediate image frames sandwiched between the both-end image frames. An example is the generation of F1', F2', ... by interpolation described above.

Process 3: For each region on the image, determine under a predetermined criterion whether any of the virtually generated intermediate image frames has a difference from the actual intermediate image frame that is greater than or equal to an allowable value. That is, the two image frames are compared region by region. An example has already been described as the set Sk. The expression “difference greater than or equal to the allowable value” is used here; if the difference is less than the allowable value, Process 4 can be skipped.

Process 4: Generate encoded data including at least one of the both-end image frames and the corresponding point information. If it is determined that a region with a difference greater than or equal to the allowable value exists, the difference information on that region is generated together. This has already been mentioned.

(12) Another aspect of the image coding apparatus includes the following configuration. The processing performed by each component has already been described.

• Matching processing unit 20

Performs a matching calculation between the both-end image frames of an image group including three or more image frames.

• Intermediate frame generator 22

Based on the corresponding point information between the both-end image frames obtained as a result of the matching calculation, virtually generates by interpolation the intermediate image frames sandwiched between the both-end image frames.

• Determination unit 24

For each region on the image, determines under a predetermined criterion whether any of the virtually generated intermediate image frames has a difference from the actual intermediate image frame that is greater than or equal to the allowable value.

• Output unit 26

Outputs encoded data including at least one of the both-end image frames and the corresponding point information. If it is determined that there is a region with a difference greater than or equal to the allowable value, the difference information on that region is output together.

[2] Image Decoding Technology

The image decoding technique according to the first embodiment decodes the encoded data generated by the image coding technique of [1]. An image coding and decoding system that combines the technique of [1] with the technique below is therefore a modification of the embodiment. The items below continue the numbering used above.

FIG. 22 shows the flow of the present image decoding technology, and at the same time shows the configuration of an image decoding apparatus to be described later.

(13) An aspect of the image decoding method according to the embodiment executes the following steps.

p) Input encoded data in a format including at least F0, M0-n, and predetermined difference information. The predetermined difference information is, for example, the difference information of item iv) in (6).

q) Divide into n parts the path along which the point p0 on F0 moves to the corresponding point pn on Fn according to M0-n, and calculate the point p1 on F1 corresponding to p0, the point p2 on F2 corresponding to p0, ..., and the point pn-1 on Fn-1 corresponding to p0. This is the same as step b.

r) By performing step q for a predetermined number of points on F0, generate a virtual F1' using the set of points p1 corresponding to those points, a virtual F2' using the set of points p2, ..., and a virtual Fn' using the set of points pn. This is the same as step c.

s) Among the sets S1 of the virtual F1' and the real F1, S2 of the virtual F2' and the real F2, ..., and Sn of the virtual Fn' and the real Fn, identify the set Sk (k = 1, ..., n) for which difference information is given. This identification uses, for example, the presence bit of item iii) and/or the value of k of item v) in (6).

t) Generate a corrected virtual Fk'' by adding the difference determined by the difference information to the virtual Fk'. When adding the difference, refer to the region position/shape information of item vi) in (6).

u) As the decoding result, output F0, the virtual F1', the virtual F2', ..., the corrected virtual Fk'', the virtual Fk+1', ..., and the virtual Fn-1' in this order, that is, in display order. For Fn, the virtual frame may be output, or the real Fn may be output if it is available. The output destination is, for example, a display control unit, whose output is sent to a display device. The image decoding apparatus according to one aspect of the embodiment includes such a display control unit and display device.
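Step t can be sketched as follows, assuming for illustration that the difference information has already been decompressed into a small array of signed values (real minus virtual) and that the region is identified by its top-left corner. The function name and these representations are hypothetical.

```python
def apply_difference(virtual_frame, region, residual):
    """Return the corrected virtual Fk'' = virtual Fk' + difference in one region.

    `region` is the (top, left) corner of the region and `residual` holds the
    signed differences (real minus virtual) for that region as a list of rows.
    The input frame is left unmodified.
    """
    top, left = region
    corrected = [row[:] for row in virtual_frame]
    for dy, residual_row in enumerate(residual):
        for dx, value in enumerate(residual_row):
            corrected[top + dy][left + dx] += value
    return corrected
```

Frames without difference information are output as generated by interpolation; only the frame of the identified set Sk passes through this correction.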

(14) The difference information may describe the difference only for regions where the difference between image frames is large, and step t may identify the position information of the region when adding the difference. This has already been mentioned.

(15) The difference information may be compressed, in which case step t decompresses it before adding it.

(16) M0-n may be generated in a form that indicates the trajectory from p0 through pk to pn. That is, the M0-n referred to here is not the original M0-n but the modified M0-n generated on the encoding side. According to this aspect, the image quality is improved.

(17) The image decoding apparatus according to the embodiment includes the following configuration.

• Input unit 30

Receives encoded data in a format including at least F0, M0-n, and predetermined difference information. The input unit may be any interface; for example, it may be a read control unit that reads the encoded data from a memory in which it is stored.

• Intermediate frame generator 32

Divides into n parts the path along which the point p0 on F0 moves to the corresponding point pn on Fn according to M0-n, and calculates the point p1 on F1 corresponding to p0, the point p2 on F2 corresponding to p0, ..., and the point pn-1 on Fn-1 corresponding to p0. By performing this process for a predetermined number of points on F0, it generates a virtual F1' using the set of points p1 corresponding to those points, a virtual F2' using the set of points p2, ..., and a virtual Fn' using the set of points pn.

[0137] • Identification unit 34

Among the sets S1 of the virtual F1' and the real F1, S2 of the virtual F2' and the real F2, ..., and Sn of the virtual Fn' and the real Fn, identifies the set Sk (k = 1, ..., n) for which difference information is given. Examples of the identification method have already been mentioned.

• Intermediate frame correction unit 36

Generates a corrected virtual Fk'' by adding the difference determined by the difference information to the virtual Fk'.

• Output unit 38

As the decoding result, outputs F0, the virtual F1', the virtual F2', ..., the corrected virtual Fk'', the virtual Fk+1', ..., and the virtual Fn-1'. The output destination may be a display control unit that generates data or signals for a display device.

The apparatus may further include the display control unit, and may further include the display device itself.

(18) Another aspect of the image decoding method according to the embodiment carries out the following processing.

Process 1: Input encoded data including one of both end image frames of an image group including three or more image frames, corresponding point information between the both end image frames, and predetermined difference information.

Process 2: Based on the corresponding point information, an intermediate image frame sandwiched between both end image frames is virtually generated by interpolation.

Process 3: Among the sets each consisting of a virtually generated intermediate image frame and the corresponding actual intermediate image frame, for the set described in the encoded data as having a large difference, identify the region on the image where the difference is large.

Process 4: Generate a corrected virtual image frame by adding the difference in that region to the virtual image frame included in the set with the large difference.

Process 5: As the decoding result, output as decoded data one of the both-end image frames, the corrected virtual intermediate image frame for the set with the large difference, and the virtual intermediate image frames for the other sets.

(19) Another aspect of the image decoding apparatus according to the embodiment includes the following configuration.

• Input section 30

Coded data including one of both end image frames of an image group including three or more image frames, corresponding point information between the both end image frames, and predetermined difference information is input. • Intermediate frame generator 32

Based on the corresponding point information, an intermediate image frame sandwiched between both end image frames is virtually generated by interpolation.

• Area identification unit 34

Among the sets each consisting of a virtually generated intermediate image frame and the corresponding actual intermediate image frame, for each set described in the encoded data as a set having a large difference, the area of the image where the difference is large is identified. An example of the identification method has already been described.

• Intermediate frame correction unit 36

A modified virtual image frame is generated by adding the difference in the area to the virtual image frame included in a set having a large difference.

• Output unit 38

As the decoding result, one of the both-end image frames, the corrected virtual intermediate image frame for each set having a large difference, and the virtual intermediate image frame for each other set are output as decoded data. The same variations as in (17) apply here as well.

(20) A computer program may cause a computer to execute each step of the image coding method described in (11) or of the image coding methods described elsewhere.

(21) A computer program may cause a computer to execute each step of the image decoding method described in (18) or of the image decoding methods described elsewhere.

(22) The image processing system according to the embodiment has an image coding unit (100 in FIG. 19) and an image decoding unit (200 in FIG. 22). This system can be used, for example, as a moving picture recording/reproducing apparatus using a hard disk. First, the image coding unit has the following configuration.

• Matching processing unit 20

• Encoding side intermediate frame generator 22

Based on the corresponding point information between the both-end image frames obtained as a result of the matching calculation, an intermediate image frame sandwiched between the both-end image frames is virtually generated by interpolation.

• Judgment part 24

• Write control unit 26

Code data including at least one of both-end image frames and corresponding point information is written to a memory (not shown). If it is determined that an area having a difference greater than or equal to the allowable value is present, the difference information on the area is also written.

On the other hand, the image decoding unit has the following configuration.

• Read control unit (30)

Read the code data from the above memory.

• Decoding side intermediate frame generator 32

Based on the image frame data and corresponding point information included in the encoded data, an intermediate image frame sandwiched between the both-end image frames is virtually generated by interpolation. This unit may share its configuration with the encoding side intermediate frame generator. In that case, a selector may be provided that selects either the output of the matching processing unit on the encoding side or the output of the read control unit on the decoding side and inputs it to the intermediate frame generator. The selector selects the output of the matching processing unit at the time of encoding and the output of the read control unit at the time of decoding.

• Area identification unit 34

When the encoded data includes the difference information, the area is identified. An example of the method has already been described.

• Intermediate frame correction unit 36

A corrected virtual image frame is generated by adding the difference in the area to each virtual image frame that includes the identified area.

[0146] • Output unit 38

As the decoding result, one or both of the both-end image frames are output as decoded data, together with the corrected virtual intermediate image frame for each virtual image frame that includes the area, and the virtual intermediate image frame itself for each virtual image frame that does not include the area.

Second Embodiment

The coding technique and the decoding technique according to the second embodiment will be sequentially described below. FIG. 23 shows the configuration of the image coding apparatus according to the second embodiment, and at the same time shows the flow of the image coding technology.

[1] Configuration of Encoding Device

CPF: Critical Point Filter. An image matching processor using the critical point filter, also called a singular point filter, of the base technology. It calculates matching between key frames pixel by pixel and outputs the corresponding point information as a file. This file describes which pixel of the destination key frame each pixel of the source key frame corresponds to. Therefore, by interpolating the position and pixel value of corresponding pixels between the key frames based on this file, a morphing image between the two key frames is obtained. If the file is applied to the source key frame alone, a morphing image is obtained in which each pixel of the source key frame gradually moves to the position of the corresponding pixel described in the file. In this case, only the position is interpolated between corresponding pixels.
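The interpolation of position and pixel value can be sketched as follows. This is a minimal illustration, assuming grey-level frames stored as nested lists and corresponding point information stored as a dictionary from source (row, column) to destination (row, column); linear interpolation and the name `morph` are assumptions made for the example, not the file format of the base technology.

```python
def morph(src, dst, corr, t):
    """Intermediate image at ratio t in [0, 1] between two key frames.

    src, dst: 2-D lists of grey levels of equal size.
    corr:     dict mapping (y, x) in src to the corresponding (y, x) in dst.
    """
    h, w = len(src), len(src[0])
    out = [[0.0] * w for _ in range(h)]
    for (sy, sx), (dy, dx) in corr.items():
        # Interpolate the position of the corresponding pixels ...
        y = round((1 - t) * sy + t * dy)
        x = round((1 - t) * sx + t * dx)
        # ... and their pixel values.
        out[y][x] = (1 - t) * src[sy][sx] + t * dst[dy][dx]
    return out
```

Positions not reached by any corresponding point keep the fill value 0.0 in this sketch; a practical implementation would need some hole-filling rule.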

Any image matching processor may be used in place of the CPF, but highly accurate pixel matching is ideal for the purposes of the present embodiment, and the base technology satisfies that condition.

[0150] DE: Differential Encoder. A differential (error) encoder. The difference between two image frames is variable-length coded using Huffman coding or another statistical method.
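As a rough illustration of variable-length coding of differences, the following sketch builds a Huffman code over a list of difference values. It is a generic textbook construction, not the encoder of the specification; the function names and the bit-string representation are assumptions for the example.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, partial code table).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:  # prefix-free, so the first hit is a symbol
            out.append(inverse[current])
            current = ""
    return out
```

When the differences are small and concentrated around zero, the most frequent value receives the shortest code word, which is why a strongly biased difference compresses well.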

[0151] NR: maskable Noise Reducer. Human vision often cannot perceive subtle changes. For example, in a portion where luminance changes sharply, that is, in a region where the spatial frequency component of luminance is high, an error in the luminance change is not noticed visually. Noise is superimposed on moving image information in various forms, and such data is perceived simply as noise and carries no meaning as an image. Ignoring such visually meaningless random information, that is, "visual mask information", is important for achieving a higher compression ratio.

Quantization in current block matching uses visual mask information on luminance values, but there are several kinds of visual mask information besides luminance values. NR uses visual masks for spatial position information and for temporal position information. The visual mask for spatial position information exploits the fact that, in an image whose luminance changes in a complex manner with position, the phase component of the spatial frequency is hard to perceive. The visual mask for temporal position information exploits the fact that, in a portion that changes rapidly in the time direction, a shift of the data change in the time direction is hard to perceive. In both cases, the deviation is detected by comparison with a predetermined threshold value.

[0153] At least in the current MPEG scheme of block matching and differential coding, it is difficult to make active use of these masks. In contrast, the decoding process in the base technology generates the changes in the moving image by trilinear or other interpolation so as to avoid discontinuities that cause visual artifacts. It thereby scatters errors in the spatial and temporal directions and makes them visually inconspicuous. NR is useful in combination with the base technology.

DD: Differential Decoder. A differential (error) decoder. It decodes the difference encoded by DE and adds it to the image frame in which the difference occurred, thereby improving the accuracy of that image frame.

[0155] DC: Differential Comparator. For the sets S1 of virtual F1' and real F1, S2 of virtual F2' and real F2, ..., Sn of virtual Fn' and real Fn, the presence or absence of a set Sk (k = 1, ..., n) having a large difference between the image frames it contains is judged based on a predetermined judgment criterion.
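The judgment made by DC can be sketched as below. The specification leaves the criterion open ("a predetermined judgment criterion"); the mean absolute pixel difference compared with a tolerance is an assumption chosen only for illustration, as are the function names.

```python
def mean_abs_diff(a, b):
    """Mean absolute pixel difference between two equally sized frames."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count


def sets_with_large_difference(virtual_frames, real_frames, tolerance):
    """Return the indices k (1-based) of the sets Sk judged to differ."""
    return [
        k
        for k, (virtual, real) in enumerate(zip(virtual_frames, real_frames), start=1)
        if mean_abs_diff(virtual, real) >= tolerance
    ]
```

The returned indices are the values of k for which difference information would be generated.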

In addition to the above, there is a function of causing corresponding point information to act on a single key frame and virtually generating another key frame by moving the pixel of the key frame. Hereinafter, a functional block that realizes this function is called a pixel shifter.
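A pixel shifter of this kind can be sketched as follows, assuming the corresponding point information is a dictionary from source (row, column) to destination (row, column). The name `pixel_shift` and the fill value for positions that no pixel reaches are hypothetical choices for the example.

```python
def pixel_shift(key_frame, corr, fill=0):
    """Generate a virtual key frame by moving pixels of `key_frame`.

    Only positions are changed; each pixel keeps its original value.
    corr maps (y, x) in the key frame to (y, x) in the virtual frame.
    """
    h, w = len(key_frame), len(key_frame[0])
    virtual = [[fill] * w for _ in range(h)]
    for (sy, sx), (dy, dx) in corr.items():
        virtual[dy][dx] = key_frame[sy][sx]
    return virtual
```

Because pixel values are carried over unchanged, any change in brightness between the key frames remains in the difference between the real and virtual key frames.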

The CPF and the DC in the second embodiment can correspond to the matching processing unit 20 and the determination unit 24 in the first embodiment, respectively.

[2] Encoding Process

In FIG. 23, "F0" and the like denote the frames of the moving image to be processed, and "M0-n" denotes the corresponding point information between F0 and Fn generated by the CPF. Encoding proceeds according to the following procedure. The case of n = 4 is described below, whereas n = 8 in FIG. 23.

a) A step of calculating matching by the CPF between the first and second key frames (F0, F4), which sandwich one or more image frames (F1 to F3), and generating corresponding point information (M0-4) between the first and second key frames.

b-1) A step of generating corrected corresponding point information (M0-4') based on the corresponding point information (M0-4) between the first and second key frames. The technology described in the first embodiment is used to generate the corrected M0-4'.

b-2) A step of generating a virtual second key frame (F4') by moving the pixels included in the first key frame (F0) with the pixel shifter, based on the corrected corresponding point information (M0-4') between the first and second key frames.

c) A step of compressing and encoding the difference between the real second key frame (F4) and the virtual second key frame (F4') with the DE having the NR function (denoted DE+NR).

d) A step of outputting the first key frame (F0), the corrected corresponding point information (M0-4') between the first and second key frames, and the compression-encoded difference (Δ4) between the real second key frame and the virtual second key frame, as encoded data between these key frames (F0, F4). The output destination may be a recording medium or a transmission medium. In practice, this is combined with the information output in j), described later, and output to a recording medium or the like as moving picture encoded data.

Step b-1) will be described in detail. As described with reference to FIG. 19, for the sets Si each consisting of a real frame Fi and the corresponding virtual frame Fi', the presence or absence of a set Sk (k = 1, ..., n) having a large difference between the image frames it contains is determined based on a predetermined determination criterion. The determination result is output to DE+NR, the difference of the set Sk is compressed, and the difference information is output as Δk.

Also, the value of k indicating the set for which difference information exists is output to the CPF. The CPF calculates the corresponding point information between adjacent frames among the frames F0 to Fk. That is, it calculates the corresponding point information (M0-1, M1-2, M2-3, ...) between F0 and F1, F1 and F2, F2 and F3, ..., Fk-1 and Fk. A combiner CONCAT combines these and outputs the corresponding point information M0-k. The corresponding point information M0-k is then integrated with the corresponding point information M0-n generated in step a) to generate the corrected corresponding point information M0-n'.

[0162] Subsequently, the following processing is performed for the second key frame (F4) onward.

e) A step of decoding with the DD the compression-encoded difference (Δ4) between the real second key frame (F4) and the virtual second key frame (F4').

f) A step of generating, in the DD, an improved virtual second key frame (F4'') from the decoded difference and the virtual second key frame (F4').

g) A step of calculating matching by the CPF between the second and third key frames (F4, F8), which sandwich one or more image frames (F5 to F7), and generating corresponding point information (M4-8) between the second and third key frames.

h-1) A step of generating corrected corresponding point information (M4-8') based on the corresponding point information (M4-8) between the second and third key frames.

h-2) A step of generating a virtual third key frame (F8') by moving the pixels included in the improved virtual second key frame (F4'') with the pixel shifter, based on the corrected corresponding point information (M4-8') between the second and third key frames.

i) A step of compressing and encoding the difference between the real third key frame (F8) and the virtual third key frame (F8') with DE+NR.

j) A step of outputting the corrected corresponding point information (M4-8') between the second and third key frames and the compression-encoded difference (Δ8) between the real third key frame and the virtual third key frame, as encoded data between these key frames (F4, F8). The output destination is generally the same as that of d).

Thereafter, steps e) to j) are repeated in sequence for the subsequent key frames, and the iteration ends when a predetermined group end key frame is reached. The group end key frame corresponds to the last frame of one GOP in MPEG. The frame following it is therefore regarded as the first key frame of a new group, and the processing from a) onward is repeated. With the above processing, for each group corresponding to a GOP in MPEG (hereinafter simply "group"), only one image corresponding to a key frame (an I picture in MPEG) needs to be encoded and transmitted.

[3] Configuration of Decoding Device

FIG. 24 is a diagram showing a flow of image decoding technology and a configuration of the image decoding apparatus according to the second embodiment.

The configuration is simpler than the coding side.

DD: Same as the DD of the encoding device.

INT: INTerpolator. An interpolation processor. It generates an intermediate frame by interpolation from two image frames and corresponding point information.

[0165] Besides these, there is a pixel shifter similar to the one on the encoding side.

[4] Decoding Process

Decoding proceeds in the following order. As on the encoding side, the case of n = 4 is described here, whereas n = 8 in the figure.

k) A step of acquiring the corrected corresponding point information (M0-4') between the first and second key frames (F0, F4), which sandwich one or more image frames (F1 to F3), and the first key frame (F0). These may be acquired from either a transmission medium or a recording medium.

l) A step of generating a virtual second key frame (F4') by moving the pixels included in the first key frame (F0) with the pixel shifter, based on the corrected corresponding point information (M0-4') between the first and second key frames.

m) A step of acquiring the compression-encoded data (Δ4) of the difference between the virtual second key frame (F4') and the real second key frame (F4), the encoding side having generated the virtual second key frame (F4') in advance by the same processing as in l) and having generated this data from it.

o) A step of decoding the acquired compression-encoded data (Δ4) of the difference with the DD and adding it to the virtual second key frame (F4') to generate an improved virtual second key frame (F4'').

p) A step of generating, with the INT, the intermediate frames (F1'' to F3'') that should exist between these key frames (F0, F4'') by performing interpolation between the first key frame (F0) and the improved virtual second key frame (F4''), based on the corrected corresponding point information (M0-4') between the first and second key frames.

q) A step of outputting the first key frame (F0), the generated intermediate frames (F1'' to F3''), and the improved virtual second key frame (F4'') to a display device or the like as decoded data between these key frames.

[0167] Subsequently, the following processing is performed for the second key frame (F4) onward.

r) A step of acquiring the corrected corresponding point information (M4-8') between the second and third key frames (F4, F8), which sandwich one or more image frames (F5 to F7).

s) A step of generating a virtual third key frame (F8') by moving the pixels included in the improved virtual second key frame (F4'') with the pixel shifter, based on the corrected corresponding point information (M4-8') between the second and third key frames.

t) A step of acquiring the compression-encoded data (Δ8) of the difference between the virtual third key frame (F8') and the real third key frame (F8), the encoding side having generated the virtual third key frame (F8') in advance by the same processing as in s) and having generated this data from it.

u) A step of generating, with the DD, an improved virtual third key frame (F8'') from the acquired compression-encoded data (Δ8) of the difference and the virtual third key frame (F8').

v) A step of generating, with the INT, the intermediate frames (F5'' to F7'') that should exist between these key frames by performing interpolation between the improved virtual second key frame (F4'') and the improved virtual third key frame (F8''), based on the corrected corresponding point information (M4-8') between the second and third key frames.

w) A step of outputting the improved virtual second key frame (F4''), the generated intermediate frames (F5'' to F7''), and the improved virtual third key frame (F8'') to a display device or the like as decoded data between these key frames (F4'', F8'').

[0168] Hereinafter, with respect to the subsequent key frames, the above steps r) to w) are sequentially repeated, and when the group end key frame is reached, the iterative process is ended. The frame following this frame is newly regarded as the first key frame as the first frame of the new group, and the process from k) onward is repeated.

[5] Advantages of the present embodiment

According to the encoding and decoding techniques of the second embodiment, the following merits can be enjoyed in addition to those of the first embodiment. When the CPF of the base technology is used for image matching, the matching accuracy is high, and so the compression ratio realized in the present embodiment is high. This is because the difference to be compressed by DE+NR is small to begin with and its statistical bias is large.

[0171] Similarly, when CPF is used, this coding method does not use block matching, so even if the compression ratio is increased, there is no block noise that causes problems in MPEG. Of course, block noise is not found in image matching other than CPF.

[0172] Whereas MPEG considers only the minimization of differences, CPF detects the portions that truly correspond, so a compression ratio higher than that of MPEG can ultimately be realized.

The encoding device can be built from an image matching processor, a differential encoder with a noise reduction function, a differential decoder, and a pixel shifter, and is therefore simple. The noise reduction function is optional and may be omitted. Similarly, the decoding device can be built from an interpolation processor, a differential decoder, and a pixel shifter, and is likewise simple. In particular, the decoding device requires no image matching, so its processing load is light.

[0174] Every time a virtual key frame is generated, its difference from the real key frame is incorporated into the encoded data as Δ4, Δ8, and so on. Therefore, even though only one complete key frame is encoded per group, errors do not accumulate even when a long moving image is reproduced.
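Why errors do not accumulate can be seen in a small closed-loop sketch. It makes strong simplifying assumptions that are not part of the specification: frames are flat lists of numbers, the pixel shifter is replaced by the identity (no motion), and the lossy DE+NR stage is replaced by uniform quantization with a hypothetical step size.

```python
def quantize(values, step):
    """Stand-in for lossy difference coding: round to multiples of step."""
    return [step * round(v / step) for v in values]


def encode_decode_chain(key_frames, step):
    """Reconstruct every key frame from the first one plus coded differences."""
    recon = list(key_frames[0])  # only the first key frame is coded in full
    decoded = [recon]
    for real in key_frames[1:]:
        virtual = recon  # prediction from the previously decoded key frame
        delta = quantize([r - v for r, v in zip(real, virtual)], step)
        recon = [v + d for v, d in zip(virtual, delta)]
        decoded.append(recon)
    return decoded
```

Because each difference is taken against the key frame the decoder will actually have, the error of every reconstructed key frame stays within half a quantization step, however long the chain is.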

[0175] [6] Modification Technology

When the corresponding point information file is generated by the matching calculation between the first and second key frames (F0, F4), the intermediate frames (F1 to F3) existing between the key frames may also be taken into account. In that case, the CPF calculates matching for each of the pairs F0 and F1, F1 and F2, F2 and F3, and F3 and F4, and generates four files (provisionally called partial files M0 to M3). These four files are then integrated and output as one corresponding point information file.

[0177] For the integration, first, where each pixel of F0 moves to on F1 is determined by M0. Next, where the pixel so determined on F1 moves to on F2 is determined by M1. Continuing in this way up to F4 gives, from the four partial files, a more accurate correspondence between F0 and F4. This is because F0 and F4 are some distance apart, and the matching accuracy between adjacent image frames is generally higher than that between them.
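The integration amounts to following each pixel through the partial files in turn. A minimal sketch, assuming each partial file is a dictionary from a position on one frame to the corresponding position on the next frame and that every position reached is present in the next partial file; the name `integrate` is hypothetical.

```python
def integrate(partials):
    """Compose partial correspondences M0, M1, ... into one mapping.

    partials[i] maps a position on frame i to a position on frame i + 1.
    The result maps each position on the first frame to the last frame.
    """
    combined = {}
    for start in partials[0]:
        position = start
        for partial in partials:
            position = partial[position]  # follow the pixel one frame on
        combined[start] = position
    return combined
```

With four partial files M0 to M3, the result plays the role of the single corresponding point information file between F0 and F4.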

Note that this method ultimately improves the matching accuracy between F0 and F4, but the corresponding point information file may instead be expressed as a function of time. In this case, the partial files are not merged; the four files as they are are regarded as the corresponding point information file and provided to the decoding side. The decoding side generates F1 from F0, F4, and M0, generates F2 from F0, F4, M0, and M1, and so on, and by repeating this processing can decode a more accurate moving image.

Third Embodiment

Another embodiment of the present invention relates to the apparatus shown in FIG. Here, the matching energy of the image is introduced as a measure of the accuracy of image matching, and is used for noise reduction in DE+NR and the like. The following description refers to FIG. 23 as appropriate; unless otherwise noted, the configuration and functions are the same as those of the second embodiment.

Here, the matching energy is determined by the distance between corresponding points and the difference in their pixel values, and is shown, for example, in Expression 49 of the base technology. In this embodiment, the matching energy obtained during image matching in the CPF is used as a by-product. In the image matching of the base technology, for each pixel between key frames, the candidate with the lowest matching energy is detected as the corresponding point. Given this characteristic of the base technology, good matching can be considered to have been achieved for pixels with low matching energy. For locations with high matching energy, on the other hand, the pixels may naturally have undergone a large change in position and pixel value between key frames, but in some cases a matching error may have occurred. As described in detail below, in this embodiment the compression ratio of the difference is increased for portions with high matching accuracy. In another example, the difference information may be highly compressed for pixels estimated to have a matching error.

[1] Encoding process

In the encoding apparatus of the present embodiment, when the CPF calculates the matching between the first and second key frames, it also obtains the matching energy of each pair of corresponding pixels between the two frames, and generates, on the first key frame (F0), an energy map describing the matching energy of each pixel. Energy maps are generated in the same way between other adjacent key frames. That is, an energy map is data in which the matching energy of each pair of corresponding points between key frames is described, basically, for each pixel of the earlier key frame. The energy map may instead be represented on the later of the two key frames. The energy map is sent from the CPF to DE+NR by a route not shown. DE+NR uses this energy map to evaluate the quality of matching between key frames and, on that basis, adaptively compresses and encodes the difference between a virtual key frame and a real key frame. In addition to the energy map, the corresponding point information file is also sent to DE+NR by a route not shown.

FIG. 25 is a diagram showing the configuration of DE+NR of FIG. 23 according to the present embodiment. The DE+NR in FIG. 25 includes a difference calculator 10, a difference compression unit 12, an energy acquisition unit 14, and a determination unit 16. Of these, the former two correspond to DE and the latter two to NR. The following explains the operation of DE+NR when encoding the first key frame (F0), the second key frame (F4), and the image frames (F1 to F3) between them; the operation of DE+NR is the same when encoding each subsequent key frame and image frame.

[0183] The difference calculator 10 obtains the real second key frame (F4) and the virtual second key frame (F4') and takes the difference between the pixel values of positionally corresponding pixels. This forms a kind of image in which each pixel holds the difference in pixel value between the two key frames, which is called a difference image. The difference image is sent to the energy acquisition unit 14. The energy map and the corresponding point information (M0-4) between the real first key frame (F0) and the real second key frame (F4) are also input to the energy acquisition unit 14. The energy acquisition unit 14 uses these to acquire the matching energy of the difference image.

First, the acquisition unit 14 acquires from the CPF the corresponding point information (M0-4) between the first and second key frames. Using this, it traces back from the difference image through the virtual second key frame (F4') to the first key frame (F0), thereby determining which pixel of the first key frame (F0) was shifted to produce each pixel of the difference image. Then, referring to the energy of each pixel on the energy map represented on the first key frame, it takes the matching energy of the pixel on the first key frame (F0) corresponding to each pixel of the difference image as the matching energy of that pixel of the difference image. The matching energy of the difference image is thus determined.

The energy acquisition unit 14 sends the matching energy of the difference image to the determination unit 16. The determination unit 16 uses the matching energy of each pixel of the difference image to determine the high compression target regions in the difference image, and notifies the compression unit 12 of which regions should be highly compressed. The determination is performed as follows, for example. The determination unit 16 divides the difference image into blocks of 16 × 16 pixels and compares the matching energy of every pixel in each block with a predetermined threshold value. If the matching energy of all the pixels in a block is less than or equal to the threshold value, that block is determined to be a high compression target block.
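The block-wise determination can be sketched as follows. The matching energy of the difference image is assumed to be available as a nested list with one value per pixel; the function name and the representation of a block by its top-left corner are hypothetical.

```python
def high_compression_blocks(energy, threshold, block=16):
    """Return top-left corners of blocks whose pixels all have low energy.

    energy: 2-D list holding the matching energy of each difference pixel.
    A block qualifies only if every pixel in it is <= threshold.
    """
    h, w = len(energy), len(energy[0])
    targets = []
    for by in range(0, h, block):
        for bx in range(0, w, block):
            if all(
                energy[y][x] <= threshold
                for y in range(by, min(by + block, h))
                for x in range(bx, min(bx + block, w))
            ):
                targets.append((by, bx))
    return targets
```

The block size is a parameter here only so that the sketch can be tried on small maps; the text uses 16 × 16 pixels.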

The compression unit 12 compresses the difference image in JPEG format. In doing so, it adaptively changes the compression ratio between normal regions and high compression target regions, using the information on high compression target regions notified from the determination unit 16. Specifically, for a high compression target block, the quantization width of the DCT coefficients may be made larger than for a normal block. In another example, the pixel values of the high compression target blocks in the difference image may be set to 0 before JPEG compression is performed. In either case, regions with low matching energy are highly compressed for the following reason.

That is, as described above, pixels with low matching energy can be regarded as having a good matching result between key frames. Therefore, if a difference between the real second key frame (F4) and the virtual second key frame (F4') arises in a part of the difference image where the matching energy is low, that difference can be regarded as noise. Regions of the difference image with low matching energy can thus be compressed far more strongly than other regions, without concern for the loss of information caused by high compression. In regions where the matching energy is high, on the other hand, there may be a matching error, and the difference between the virtual second key frame (F4') and the real second key frame (F4) is important information for decoding, so the compression ratio is kept low and priority is given to decoding accuracy.
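The second option, setting the high compression target blocks to 0 before compression, can be sketched as below. The JPEG stage itself is omitted; the function name and the list of block corners passed in are assumptions for the example.

```python
def zero_target_blocks(diff_image, targets, block=16):
    """Return a copy of the difference image with target blocks set to 0.

    targets: iterable of (top, left) corners of high compression blocks.
    """
    h, w = len(diff_image), len(diff_image[0])
    out = [row[:] for row in diff_image]
    for top, left in targets:
        for y in range(top, min(top + block, h)):
            for x in range(left, min(left + block, w)):
                out[y][x] = 0
    return out
```

A block of zeros has no nonzero DCT coefficients, so it costs very little after JPEG compression.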

[2] Advantages of the Third Embodiment

After the above processing, the compression unit 12 outputs the compression-encoded difference (Δ4) between the real second key frame (F4) and the virtual second key frame (F4'). The encoding device according to the present embodiment can adaptively compress the difference information between the real key frame and the virtual key frame according to its importance for decoding the encoded image accurately and faithfully to the original image. High coding efficiency can therefore be realized while maintaining decoding accuracy. Of course, the advantages of the first embodiment can be enjoyed in this embodiment as well.

[3] Modification of Third Embodiment

As a modification of this embodiment, pixels whose matching energy differs significantly from that of surrounding pixels can be evaluated as matching errors, and this can be applied to noise reduction. It is empirically known that a pixel with large matching energy, in particular a pixel whose correspondence tendency differs significantly from that of neighboring pixels, has in many cases suffered a matching error. In this case, DE+NR compares the matching energy of each pixel of the second key frame (F4) with, for example, the average matching energy of the other pixels in the 9 × 9 pixel block centered on that pixel. If the difference between the two exceeds a predetermined value, the pixel may be determined to have caused a matching error.
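This neighborhood comparison can be sketched as follows. The handling of pixels near the image border (the window is simply clipped to the image) is an assumption, since the text does not specify it, and the function name is hypothetical.

```python
def matching_error_pixels(energy, limit, size=9):
    """Return (y, x) of pixels whose energy deviates from their neighbors.

    For each pixel, the mean energy of the other pixels in the size x size
    window centered on it is computed; windows are clipped at the border.
    """
    h, w = len(energy), len(energy[0])
    r = size // 2
    errors = []
    for y in range(h):
        for x in range(w):
            others = [
                energy[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
                if (j, i) != (y, x)
            ]
            if others and abs(energy[y][x] - sum(others) / len(others)) > limit:
                errors.append((y, x))
    return errors
```

The pixels returned are those whose difference data would be treated as noise and compressed at a higher rate.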

Corresponding point information that is in error can be regarded as meaningless data for the decoding side, and within the difference information between the real second key frame (F4) and the virtual second key frame (F4'), the data on pixels that caused a matching error can be said to be noise. There is thus no need to be concerned about information loss due to high compression, and DE+NR compresses, in the difference image between the real key frame and the virtual key frame, the pixels corresponding to matching errors between the real key frames at a higher rate than other pixels. The matching error determination may also be made, for example, by comparing the tendency of the motion vectors of surrounding pixels with the motion vector of the pixel of interest and checking whether the latter differs significantly from the surrounding tendency.

Also in the third embodiment, as in the second embodiment, a modification is conceivable that takes into account the intermediate frames (F1 to F3) between the first and second key frames (F0, F4): matching is calculated for every pair of adjacent image frames to generate corresponding point information files (M0 to M3), and these are integrated to obtain one corresponding point information file between the first and second key frames (F0, F4). As with that modification technique, matching accuracy is improved, and accurate moving image decoding can be realized.

Furthermore, with this modification, the matching energy between each pair of image frames can be calculated and applied to scene change detection and the like. The configuration for scene change detection is as follows. First, the CPF performs the matching calculation for each pair F0 and F1, F1 and F2, F2 and F3, F3 and F4, ..., and obtains the energy maps E0, E1, E2, E3, ... as a by-product. The average of the matching energy over the pixels of an entire image frame is then compared with a predetermined threshold for scene change detection, and when it exceeds the threshold, the images immediately following are made a new group. For example, suppose that, based on the energy map E5 between F5 and F6, the average of the matching energy of the pixels of F5 for the matching between F5 and F6 exceeds the key frame addition threshold. In this case, the image frames from F6 onward may be made a new group, with F6 as the first key frame of the next group. This is because a large matching energy indicates that a large change has occurred between the images. This enables automatic scene change detection and group selection in response to scene changes.
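The scene change rule can be sketched as follows, assuming the energy maps are given as a list in which entry i is the map Ei between Fi and Fi+1. The function names are hypothetical.

```python
def mean_energy(energy_map):
    """Average matching energy over all pixels of one energy map."""
    values = [e for row in energy_map for e in row]
    return sum(values) / len(values)


def group_start_frames(energy_maps, threshold):
    """Return the indices of the frames that start a group.

    energy_maps[i] is the map Ei between Fi and Fi+1. When its average
    exceeds the threshold, Fi+1 becomes the first key frame of a new group.
    """
    starts = [0]  # F0 always starts the first group
    for i, energy_map in enumerate(energy_maps):
        if mean_energy(energy_map) > threshold:
            starts.append(i + 1)
    return starts
```

In the example of the text, an average of E5 above the threshold makes F6 the first key frame of the next group.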

Based on each energy map, the average matching energy of the pixels in each image frame may also be calculated and accumulated, and when the accumulated value exceeds a predetermined threshold, that image frame may be newly registered as a key frame. This is because adding a key frame when the cumulative amount of change between image frames exceeds a certain value can further improve the picture quality at the time of decoding.
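The cumulative variant can be sketched in the same style. Two points are assumptions made only for this illustration: the frame registered as a key frame is taken to be Fi+1 (the later frame of the pair whose energy pushed the sum over the threshold), and the accumulation restarts from zero after each registration.

```python
def select_key_frames(mean_energies, threshold):
    """Return the indices of frames registered as key frames.

    mean_energies[i] is the average matching energy of Ei (Fi matched to Fi+1).
    """
    key_frames = [0]  # F0 is the first key frame
    accumulated = 0.0
    for i, energy in enumerate(mean_energies):
        accumulated += energy
        if accumulated > threshold:
            key_frames.append(i + 1)
            accumulated = 0.0  # assumption: restart after adding a key frame
    return key_frames
```

For example, six frame pairs that each contribute an average energy of 1 with a threshold of 2.5 yield key frames at indices 0, 3 and 6.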

Industrial applicability

The present invention can be used in the field of image compression processing technology.

Claims

The scope of the claims

[1] When a sequence of image frames is denoted, in descending or ascending order, as F0, F1, ..., Fn-1, Fn (n is an integer greater than 1), and the corresponding point information indicating the positional relationship of corresponding points between image frames Fi and Fj (i, j = 0, 1, ..., n) is denoted as Mi-j,

a) a step of calculating a matching between F0 and Fn to generate M0-n;

b) a step of dividing into n the path along which a point p0 on F0 moves to the corresponding point pn on Fn according to M0-n, and calculating a point p1 on F1 corresponding to p0, a point p2 on F2 corresponding to p0, ..., and a point pn on Fn corresponding to p0;

c) a step of performing step b for a predetermined number of points on F0, and generating a virtual F1 using the set of points p1 corresponding to the predetermined number of points, a virtual F2 using the set of points p2, ..., and a virtual Fn using the set of points pn;

d) a step of determining, based on a predetermined criterion, whether among the pair S1 of the virtual F1 and the real F1, the pair S2 of the virtual F2 and the real F2, ..., and the pair Sn of the virtual Fn and the real Fn there is a pair Sk (k = 1, ..., n) in which the difference between the image frames included in the pair is large;

e) a step of, if there is a pair Sk having a large difference, calculating a matching at least between Fh (h = 0, 1, ..., k-1) and Fk to generate Mh-k;

f) a step of correcting M0-n using the information of Mh-k; and

g) a step of outputting encoded data in a format including at least F0 and the corrected M0-n;

An image coding method comprising the steps above.
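Steps b and c of claim 1 can be illustrated with a short sketch. It assumes that the path from p0 to pn is a straight line divided into n equal parts and that M0-n can be represented as a dictionary from points on F0 to points on Fn; neither assumption is stated in the claim, and the function names are hypothetical. Interpolation of pixel values is left out.

```python
def points_along_path(p0, pn, n):
    """Return [p1, ..., pn]: the positions on F1..Fn of a point that
    starts at p0 on F0 and ends at pn on Fn (assumed linear motion)."""
    return [
        (p0[0] + (pn[0] - p0[0]) * i / n, p0[1] + (pn[1] - p0[1]) * i / n)
        for i in range(1, n + 1)
    ]


def virtual_point_sets(m0n, n):
    """Apply step b to every point of M0-n (step c).

    m0n: dict mapping a point p0 on F0 to its corresponding point pn on Fn.
    Returns a list whose element i-1 maps each p0 to its position pi on Fi.
    """
    sets = [dict() for _ in range(n)]
    for p0, pn in m0n.items():
        for index, pi in enumerate(points_along_path(p0, pn, n)):
            sets[index][p0] = pi
    return sets
```

Each returned set of points is what a renderer would use to build the corresponding virtual frame.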

[2] The image coding method according to claim 1, wherein step d determines the magnitude of the difference between the image frames in units of a predetermined region.

[3] The image coding method according to claim 1, wherein step e comprises:

e1) a step of, if there is a pair Sk having a large difference, obtaining M0-1 between F0 and F1, M1-2 between F1 and F2, ..., and M(k-1)-k between Fk-1 and Fk; and

e2) a step of combining M0-1, M1-2, ..., M(k-1)-k to generate M0-k.

[4] The image coding method according to claim 3, wherein step e2 generates M0-k by obtaining p1 corresponding to p0 by M0-1, p2 corresponding to p1 by M1-2, ..., and pk corresponding to pk-1 by M(k-1)-k, thereby tracing the point corresponding to p0 in the order p1, p2, ..., pk and finally identifying the pk corresponding to p0.
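The chaining of corresponding point information in claims 3 and 4 amounts to composing mappings. The sketch below assumes, purely for illustration, that each Mi-(i+1) is a dictionary from integer pixel coordinates on Fi to integer pixel coordinates on Fi+1; the function name is hypothetical, and points whose chain is broken are simply dropped.

```python
def compose_maps(maps):
    """Combine [M0-1, M1-2, ..., M(k-1)-k] into M0-k.

    Each map is a dict from a point on one frame to the corresponding
    point on the next frame. A point p0 is followed through p1, p2, ...
    until pk is reached.
    """
    composed = {}
    for p0 in maps[0]:
        p = p0
        for mapping in maps:
            p = mapping.get(p)
            if p is None:
                break  # no corresponding point: the chain stops here
        if p is not None:
            composed[p0] = p
    return composed
```

For example, with M0-1 = {(0, 0): (1, 0)} and M1-2 = {(1, 0): (2, 1)}, the composed M0-2 maps (0, 0) to (2, 1).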

[5] The image coding method according to claim 4, wherein step f uses the information of M0-k to generate the corrected M0-n in a form indicating the locus along which p0 reaches pn via pk.
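One way to read the locus of claim 5 is as a path made of two segments, from p0 to pk and from pk to pn. The sketch below evaluates such a path at frame Fi. It assumes straight segments traversed at uniform speed, which the claim does not specify, and the function name is hypothetical.

```python
def point_on_locus(p0, pk, pn, k, n, i):
    """Position on frame Fi of a point that moves p0 -> pk -> pn.

    p0, pk, pn: (x, y) positions on F0, Fk and Fn.
    Assumption: each segment is linear and evenly divided between frames.
    """
    if not 0 < k <= n or not 0 <= i <= n:
        raise ValueError("require 0 < k <= n and 0 <= i <= n")
    if i <= k:
        start, end, t = p0, pk, i / k
    else:
        start, end, t = pk, pn, (i - k) / (n - k)
    return (
        start[0] + (end[0] - start[0]) * t,
        start[1] + (end[1] - start[1]) * t,
    )
```

Compared with a single straight line from p0 to pn, the intermediate position pk taken from M0-k bends the path so that the virtual Fk agrees with the real Fk at that point.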

[6] The image coding method according to claim 1, wherein, when step d finds that there is a pair Sk having a large difference, step g outputs encoded data in a format that includes, in addition to F0 and the corrected M0-n, information on the difference in the pair Sk.

[7] The image coding method according to claim 6, wherein the information on the difference in the pair Sk is included in the encoded data only for regions of the image frame where the difference is large.

[8] The image coding method according to claim 7, wherein at least the value of k and position information of the region are included in the encoded data together with the difference.

[9] The image coding method according to claim 8, wherein the difference is included in the encoded data after being subjected to compression processing.
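Claims 6 to 9 describe encoded data that carries the value of k, the position of the region, and a compressed difference. The sketch below shows one possible byte layout. The layout, the field widths, the function names, and the use of lossless zlib compression are all assumptions made for illustration; the specification does not define a format, and it also contemplates high-rate compression of noise-like pixels, which is not modelled here.

```python
import struct
import zlib

HEADER = "<5HI"  # k, x, y, width, height (unsigned 16-bit), payload size (32-bit)


def pack_difference(k, x, y, width, height, diff):
    """Serialize one difference region of the pair Sk.

    diff: signed per-pixel differences in row-major order,
    with exactly width * height entries.
    """
    if len(diff) != width * height:
        raise ValueError("diff must contain width * height values")
    raw = struct.pack(f"<{len(diff)}h", *diff)
    payload = zlib.compress(raw)  # assumption: lossless compression
    return struct.pack(HEADER, k, x, y, width, height, len(payload)) + payload


def unpack_difference(blob):
    """Inverse of pack_difference; returns (k, x, y, width, height, diff)."""
    k, x, y, width, height, size = struct.unpack_from(HEADER, blob, 0)
    offset = struct.calcsize(HEADER)
    raw = zlib.decompress(blob[offset:offset + size])
    diff = list(struct.unpack(f"<{width * height}h", raw))
    return k, x, y, width, height, diff
```

A decoder that receives such a record knows which virtual frame (k) to correct, where the region lies, and what to add after decompression.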

[10] When a sequence of image frames is denoted, in descending or ascending order, as F0, F1, ..., Fn-1, Fn (n is an integer greater than 1), and the corresponding point information indicating the positional relationship of corresponding points between image frames Fi and Fj (i, j = 0, 1, ..., n) is denoted as Mi-j,

a matching processing unit that calculates a matching between F0 and Fn to generate M0-n; an intermediate frame generation unit that executes, for a predetermined number of points on F0, a process of dividing into n the path along which a point p0 on F0 moves to the corresponding point pn on Fn according to M0-n and calculating a point p1 on F1 corresponding to p0, a point p2 on F2 corresponding to p0, ..., and a point pn on Fn corresponding to p0, and that generates a virtual F1 using the set of points p1 corresponding to the predetermined number of points, a virtual F2 using the set of points p2, ..., and a virtual Fn using the set of points pn; and

a determination unit that determines, based on a predetermined criterion, whether among the pair S1 of the virtual F1 and the real F1, the pair S2 of the virtual F2 and the real F2, ..., and the pair Sn of the virtual Fn and the real Fn there is a pair Sk (k = 1, ..., n) in which the difference between the image frames included in the pair is large, an image coding apparatus comprising the above units and characterized in that, if there is a pair Sk having a large difference, the matching processing unit calculates a matching at least between Fh (h = 0, 1, ..., k-1) and Fk to generate Mh-k and corrects M0-n using the information of Mh-k, and in that the apparatus further includes an output unit that outputs encoded data in a format including at least F0 and the corrected M0-n.

[11] A step of performing a matching calculation between both end image frames of an image group including three or more image frames;

a step of virtually generating the intermediate image frames sandwiched between the both end image frames by interpolation based on the corresponding point information between the both end image frames obtained as a result of the matching calculation;

a determination step of determining, based on a predetermined criterion, whether any of the virtually generated intermediate image frames has, in any region on the image, a difference from the actual intermediate image frame that is greater than or equal to an allowable value; and

a step of generating encoded data including at least one of the both end image frames and the corresponding point information, and, if the determination step determines that there is a region having a difference greater than or equal to the allowable value, including difference information on that region in the encoded data;

An image coding method comprising the steps above.
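The determination step of claim 11 can be illustrated by comparing a virtual intermediate frame with the actual one region by region. The sketch assumes single-channel frames stored as 2-D lists, square blocks as the regions, and the mean absolute difference as the "predetermined criterion"; these choices and the function name are hypothetical.

```python
def regions_over_allowance(virtual, actual, block, allowance):
    """Return the top-left (x, y) of every block whose mean absolute
    difference between the virtual and actual frame is >= allowance."""
    height, width = len(actual), len(actual[0])
    flagged = []
    for y0 in range(0, height, block):
        for x0 in range(0, width, block):
            # Absolute differences of all pixels inside this block
            # (blocks on the right and bottom edges may be smaller).
            diffs = [
                abs(virtual[y][x] - actual[y][x])
                for y in range(y0, min(y0 + block, height))
                for x in range(x0, min(x0 + block, width))
            ]
            if sum(diffs) / len(diffs) >= allowance:
                flagged.append((x0, y0))
    return flagged
```

Only the flagged regions would then need difference information in the encoded data; frames with no flagged region are represented by the corresponding point information alone.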

[12] A matching processing unit that executes a matching calculation between both end image frames of an image group including three or more image frames;

an intermediate frame generation unit that virtually generates the intermediate image frames sandwiched between the both end image frames by interpolation based on the corresponding point information between the both end image frames obtained as a result of the matching calculation;

a determination unit that determines, based on a predetermined criterion, whether any of the virtually generated intermediate image frames has, in any region on the image, a difference from the actual intermediate image frame that is greater than or equal to an allowable value; and

an output unit that outputs encoded data including at least one of the both end image frames and the corresponding point information, and that, if the determination unit determines that there is a region having a difference greater than or equal to the allowable value, further outputs difference information on that region;

An image coding apparatus comprising the units above.

[13] When a sequence of image frames is denoted, in descending or ascending order, as F0, F1, ..., Fn-1, Fn (n is an integer greater than 1), and the corresponding point information indicating the positional relationship of corresponding points between image frames Fi and Fj (i, j = 0, 1, ..., n) is denoted as Mi-j,

p) a step of inputting encoded data in a format including at least F0, M0-n and predetermined difference information;

q) a step of dividing into n the path along which a point p0 on F0 moves to the corresponding point pn on Fn according to M0-n, and calculating a point p1 on F1 corresponding to p0, a point p2 on F2 corresponding to p0, ..., and a point pn-1 on Fn-1 corresponding to p0;

r) a step of performing step q for a predetermined number of points on F0, and generating a virtual F1 using the set of points p1 corresponding to the predetermined number of points, a virtual F2 using the set of points p2, ..., and a virtual Fn using the set of points pn;

s) a step of identifying, among the pair S1 of the virtual F1 and the real F1, the pair S2 of the virtual F2 and the real F2, ..., and the pair Sn of the virtual Fn and the real Fn, the pair Sk (k = 1, ..., n) for which the difference information is given;

t) a step of generating a corrected virtual Fk by adding the difference determined by the difference information to the virtual Fk; and

u) a step of outputting F0, virtual F1, virtual F2, ..., the corrected virtual Fk, virtual Fk+1, ..., virtual Fn-1 as the decoding result;

An image decoding method comprising the steps above.

[14] The image decoding method according to claim 13, wherein the difference information describes the difference only for regions where the difference between the image frames is large, and step t identifies the position information of the region when adding the difference.

[15] The image decoding method according to claim 13, wherein the difference information is compressed, and step t decompresses and then adds the difference information.
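Step t, together with the refinements of claims 14 and 15, can be sketched as adding a region of differences to the virtual Fk, optionally after decompression. The sketch assumes 8-bit single-channel frames stored as 2-D lists, a rectangular region given by its top-left corner and size, and a zlib-compressed payload of little-endian signed 16-bit differences; all of these, and the function names, are illustrative assumptions rather than the format used by the invention.

```python
import struct
import zlib


def apply_difference(virtual_frame, x, y, width, height, diff):
    """Return a corrected copy of the virtual Fk.

    diff holds width * height signed differences in row-major order and is
    added to the region whose top-left corner is (x, y).
    """
    corrected = [row[:] for row in virtual_frame]  # leave the input untouched
    for dy in range(height):
        for dx in range(width):
            value = corrected[y + dy][x + dx] + diff[dy * width + dx]
            corrected[y + dy][x + dx] = max(0, min(255, value))  # clamp to 8 bits
    return corrected


def apply_compressed_difference(virtual_frame, x, y, width, height, payload):
    """Decompress the difference first, then add it (as in claim 15)."""
    raw = zlib.decompress(payload)
    diff = struct.unpack(f"<{width * height}h", raw)
    return apply_difference(virtual_frame, x, y, width, height, diff)
```

Frames for which no difference information is given are output as generated, so the correction cost is paid only for the pair Sk.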

[16] The image decoding method according to claim 13, wherein M0-n is generated in a format indicating the locus along which p0 reaches pn via pk.

[17] When a sequence of image frames is denoted, in descending or ascending order, as F0, F1, ..., Fn-1, Fn (n is an integer greater than 1), and the corresponding point information indicating the positional relationship of corresponding points between image frames Fi and Fj (i, j = 0, 1, ..., n) is denoted as Mi-j,

an input unit that inputs encoded data in a format including at least F0, M0-n and predetermined difference information;

an intermediate frame generation unit that executes, for a predetermined number of points on F0, a process of dividing into n the path along which a point p0 on F0 moves to the corresponding point pn on Fn according to M0-n and calculating a point p1 on F1 corresponding to p0, a point p2 on F2 corresponding to p0, ..., and a point pn-1 on Fn-1 corresponding to p0, and that generates a virtual F1 using the set of points p1 corresponding to the predetermined number of points, a virtual F2 using the set of points p2, ..., and a virtual Fn using the set of points pn;

a specifying unit that specifies, among the pair S1 of the virtual F1 and the real F1, the pair S2 of the virtual F2 and the real F2, ..., and the pair Sn of the virtual Fn and the real Fn, the pair Sk (k = 1, ..., n) for which the difference information is given;

an intermediate frame correction unit that generates a corrected virtual Fk by adding the difference determined by the difference information to the virtual Fk; and

an output unit that outputs, as the decoding result, F0, virtual F1, virtual F2, ..., the corrected virtual Fk, virtual Fk+1, ..., virtual Fn-1;

An image decoding apparatus comprising the units above.

[18] A step of inputting encoded data including one of the both end image frames of an image group including three or more image frames, corresponding point information between the both end image frames, and predetermined difference information;

a step of virtually generating the intermediate image frames sandwiched between the both end image frames by interpolation based on the corresponding point information;

a step of identifying, among the pairs each consisting of a virtually generated intermediate image frame and the corresponding actual intermediate image frame, the region on the image where the difference is large in the pair of intermediate image frames described in the encoded data as a pair having a large difference;

a step of generating a corrected virtual image frame by adding the difference in the region to the virtual image frame included in the pair having a large difference; and

a step of outputting, as the decoding result, one of the both end image frames, the corrected virtual intermediate image frame for the pair having a large difference, and the virtual intermediate image frames for the other pairs, as decoded data;

An image decoding method comprising the steps above.

[19] An input unit that inputs encoded data including one of the both end image frames of an image group including three or more image frames, corresponding point information between the both end image frames, and predetermined difference information;

an intermediate frame generation unit that virtually generates the intermediate image frames sandwiched between the both end image frames by interpolation based on the corresponding point information;

a region specifying unit that specifies, among the pairs each consisting of a virtually generated intermediate image frame and the corresponding actual intermediate image frame, the region on the image where the difference is large in the pair of intermediate image frames described in the encoded data as a pair having a large difference;

an intermediate frame correction unit that generates a corrected virtual image frame by adding the difference in the region to the virtual image frame included in the pair having a large difference; and

an output unit that outputs, as the decoding result, one of the both end image frames, the corrected virtual intermediate image frame for the pair having a large difference, and the virtual intermediate image frames for the other pairs, as decoded data;

An image decoding apparatus comprising the units above.

[20] A computer program characterized by causing a computer to execute each step of the image coding method according to claim 11.

[21] A computer program characterized by causing a computer to execute each step of the image decoding method according to claim 18.

[22] In an image processing system having an image coding unit and an image decoding unit, the image coding unit comprises:

a matching processing unit that executes a matching calculation between both end image frames of an image group including three or more image frames;

a coding side intermediate frame generation unit that virtually generates the intermediate image frames sandwiched between the both end image frames by interpolation based on the corresponding point information between the both end image frames obtained as a result of the matching calculation;

a determination unit that determines, based on a predetermined criterion, whether any of the virtually generated intermediate image frames has, in any region on the image, a difference from the actual intermediate image frame that is greater than or equal to an allowable value; and

a write control unit that writes encoded data including at least one of the both end image frames and the corresponding point information into a memory, and that, if it is determined that there is a region having a difference greater than or equal to the allowable value, also writes the difference information on that region,

and the image decoding unit comprises:

a read control unit that reads the encoded data from the memory;

a decoding side intermediate frame generation unit that virtually generates the intermediate image frames sandwiched between the both end image frames by interpolation based on the image frame data and the corresponding point information included in the encoded data;

a region specifying unit that specifies a region for which the encoded data includes difference information;

an intermediate frame correction unit that generates a corrected virtual image frame by adding the difference in the region to the virtual image frame that includes the specified region; and

an output unit that outputs, as the decoding result, one or both of the both end image frames, the corrected virtual intermediate image frame for each virtual image frame that includes the region, and the virtual intermediate image frame itself for each virtual image frame that does not include the region, as decoded data;

An image processing system comprising the units above.