WO2013077562A1 - Apparatus and method for setting feature points, and apparatus and method for object tracking using same - Google Patents


Info

Publication number
WO2013077562A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
block
feature point
SAD
sum
Prior art date
Application number
PCT/KR2012/008896
Other languages
French (fr)
Korean (ko)
Inventor
Woo Dae-sik
Park Jae-beom
Jeon Byung-ki
Kim Jong-dae
Jung Won-seok
Original Assignee
SK Planet Co., Ltd.
Simos Mediatek Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to KR1020110123372A priority Critical patent/KR101700276B1/en
Priority to KR10-2011-0123376 priority
Priority to KR10-2011-0123372 priority
Priority to KR1020110123376A priority patent/KR101624182B1/en
Application filed by SK Planet Co., Ltd. and Simos Mediatek Co., Ltd.
Publication of WO2013077562A1 publication Critical patent/WO2013077562A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments

Abstract

The present invention relates to an apparatus and method for setting feature points, and to an apparatus and method for object tracking using the same. According to the present invention, when a user designates a feature point, a representative value of the degree of change is obtained for each block in consideration of the pixel matrix surrounding the user-designated feature point, and the user-designated feature point is reset on the basis of that representative value. Accordingly, for stereoscopic conversion, the accuracy of feature point tracking along the outer contour of the object can be increased. In addition, the key frame whose total sum of errors is minimal can be found and selected through simple calculation alone, without tracking each frame.

Description

Feature point setting device and method, and object tracking device and method using the same

According to an embodiment of the present invention, when a user designates a feature point corresponding to the initial position of an object, a block of predetermined size is formed around the designated feature point in the input image, blocks corresponding to the size of the formed block are defined within a preset movement range, each block defined in the movement range is frequency-transformed to obtain a gradient representative value, and the center point of the block having the largest gradient representative value in the movement range is reassigned as the final feature point.

In addition, the present invention relates to dividing the frame of the input image into a predetermined number of blocks, calculating the sum of absolute differences (SAD) between the current frame and its adjacent frames for each block, summing all the SADs between adjacent frames to obtain a total SAD for each frame, obtaining a tracking error value for each frame using the total SAD, and selecting the frame having the smallest tracking error value as a key frame.

Techniques for detecting and tracking specific objects in video have been developed mainly for security and face recognition. In a security system, an object entering the surveillance area must be detected and its movement tracked in order to automatically monitor a person entering the area.

In general, object tracking is performed by extracting feature points and tracking those feature points in subsequent images. Feature points are typically extracted at strong edge or corner components of the image, because such positions can be maintained even under motion, and each candidate feature point is assigned a score expressing its value as a feature point.

When feature points are found automatically, those with high scores can be selected based on this scoring. However, when feature points are selected manually, the user may choose them regardless of the scoring, which increases the likelihood of errors when tracking those feature points.

Meanwhile, scene change detection must be performed to support functions such as key frame extraction, highlighting, and video indexing in digital video search, recognition, tracking, monitoring, and blocking systems, as well as in the various camcorders, recorders, and video players that consumers own.

When a specific object is given or detected in continuous video, object tracking technology attempts to keep track of its shape or direction by searching for the shape of the same object across the consecutive frames. In general, an object in an image may move while maintaining or deforming its shape.

Application fields such as tracking the motion of a person on camera are generally concerned with detecting motion or the direction of motion, so the object's shape is not important. Tracking an object for stereoscopic conversion, however, must preserve the shape of the object as faithfully as possible, so that an exact outline or shape representing the object can be obtained.

Therefore, unlike general object tracking, tracking the shape of a specific object for stereoscopic conversion must minimize the error in the shape of the outline representing the object. In general, as the motion of an object increases across successive images, the error in the outline shape grows larger in the per-scene tracking process.

Hereinafter, a tracking error accumulation relationship for stereoscopic transformation will be described with reference to FIGS. 1 and 2.

Considering the tracking of unidirectional object movement as shown in FIG. 1, errors accumulate scene by scene, so the sum of the overall errors gradually increases; once more than a predetermined error has accumulated, tracking for stereoscopic conversion is no longer possible.

However, fields that track only the general direction of an object are unaffected by this error, because they need to know only where the object is headed.

In addition, to reduce the error further, consider bidirectional object tracking: comparing the cumulative error generated while tracking a specific object starting from frame 7 with that generated starting from frame 9, as shown in FIG. 2, the sum of the total errors in both directions, that is, the area under the curve in the graph, is smaller when starting from frame 9.

That is, for general camera-based object tracking, where only the object's position and direction matter, designating and tracking a specific object from frame 7 or from frame 9 gives the same result. For object tracking for stereoscopic conversion, however, tracking with frame 9 as the starting point reduces the errors in generating the depth map required for the conversion, so it is most reasonable to select frame 9 as the "key frame" corresponding to the starting point of tracking.

However, selecting a key frame in this way, by tracking left and right from every candidate frame across the whole sequence and choosing the frame with the smallest error sum, requires trying every possible combination, which is impractical in reality.

The present invention has been made to solve the above-mentioned disadvantages. An object of the present invention is to provide a feature point setting device and method, and an object tracking device and method using the same, that obtain a gradient representative value for each block in consideration of the pixel matrix surrounding a manually designated feature point, reset the user-designated feature point based on that representative value, and thereby increase the accuracy of feature point tracking of an object's outline for stereoscopic conversion.

Another object of the present invention is to provide a feature point setting apparatus and method that can find the frame with the smallest sum of errors by simple calculation, without performing tracking for each frame, and thereby easily select a key frame; by using the SAD generated by the movement of an object as a factor for estimating error, the selected key frame minimizes the cumulative error.

According to an aspect of the present invention for achieving the above object, there is provided a feature point setting apparatus including: a feature point defining unit for receiving and storing, from the user, a feature point corresponding to the initial position of an object in the input image; a block forming unit for forming a block of predetermined size around the designated feature point and defining blocks corresponding to the size of the formed block within a preset movement range; a gradient representative value calculator for frequency-transforming each block defined within the movement range to obtain a gradient representative value for each; and a feature point determiner for reassigning, as the final feature point, the center point of the block having the largest gradient representative value within the movement range.

Here, the apparatus may further include: a SAD calculator for dividing the frame of the input image into a predetermined number of blocks and obtaining, for each block, the sum of absolute differences (SAD) between the current frame and its adjacent frames; a SAD total calculator for summing all SADs between adjacent frames to obtain a total SAD for each frame; a tracking error calculator for obtaining a tracking error value for each frame using the total SAD for each frame; and a key frame selector for selecting the frame having the smallest tracking error value as the key frame.

On the other hand, according to another aspect of the present invention for achieving the above object, there is provided an object tracking apparatus including: a feature point setting unit for setting the feature point of an object by checking the validity of the feature point corresponding to the initial position of the object designated by the user in the input image; an image analyzer for extracting motion vector information and residual signal information of each frame from the input image; and an object tracking unit for generating forward motion vector information for each unit block from the extracted motion vector information, restoring pixel information for a predetermined block from the extracted residual signal, and then generating optimal position information of the object in each frame from the position information of the set feature point, the forward motion vector information, and the reconstructed pixel information.

On the other hand, according to another aspect of the present invention for achieving the above object, there is provided a feature point setting method including: (a) receiving a feature point corresponding to the initial position of an object, designated by the user, in the input image; (b) forming a block having a predetermined size based on the designated feature point in the input image, and defining blocks corresponding to the size of the formed block within a preset movement range; (c) obtaining gradient representative values by frequency-transforming each of the blocks defined within the movement range; and (d) reassigning, as the final feature point, the center point of the block having the largest gradient representative value within the movement range.

In addition, there is provided a recording medium on which a program implementing the feature point setting method is recorded and which can be read by an electronic device.

On the other hand, according to another aspect of the present invention for achieving the above object, there is provided a key frame selection method including: (a) dividing the frame of the input image into a predetermined number of blocks; (b) obtaining SADs between a current frame and an adjacent frame for each of the predetermined number of blocks; (c) summing all SADs between adjacent frames to obtain a total SAD for each frame; (d) obtaining a tracking error value for each frame using the total SAD for each frame; and (e) selecting the frame having the smallest tracking error value as a key frame.

In addition, there is provided a recording medium on which a program implementing the key frame selection method is recorded and which can be read by an electronic device.

Meanwhile, according to another aspect of the present invention for achieving the above object, there is provided an object tracking method using feature points of an object tracking apparatus, the method including: (a) setting the feature point of an object by checking the validity of the feature point corresponding to the initial position of the object designated by the user in the input image; (b) extracting motion vector information and residual signal information of each frame from the input image; and (c) generating forward motion vector information for each unit block from the extracted motion vector information, restoring pixel information for a predetermined block from the extracted residual signal, and then generating optimal position information of the object in each frame from the position information of the set feature point, the forward motion vector information, and the reconstructed pixel information.

There is also provided a recording medium on which a program implementing the object tracking method using the feature point is recorded and which can be read by an electronic device.

According to the present invention, when a user manually designates a feature point, a representative value of the degree of change is obtained for each block in consideration of the pixel matrix surrounding the designated feature point, and the user-designated feature point can be reset based on that representative value.

In addition, it is possible to increase the accuracy of feature point tracking on the outline of the object in terms of stereoscopic transformation.

In addition, a key frame can be easily selected by finding a frame having the smallest sum of errors by a simple operation without performing tracking for each frame.

Furthermore, by using the SAD generated by the movement of the object as a factor for estimating the error, a key frame that minimizes the cumulative error can be selected.

FIGS. 1 and 2 are views for explaining the cumulative tracking error relationship in conventional stereoscopic conversion;

FIG. 3 is a block diagram schematically showing the configuration of a feature point setting apparatus according to the present invention;

FIG. 4 is a view for explaining a method of forming a block according to the present invention;

FIG. 5 is a view showing a frequency-transformed block according to the present invention;

FIG. 6 is an exemplary view for explaining a method of obtaining an inter-frame SAD according to the present invention;

FIG. 7 is a graph showing the total SAD of each frame in a scene to be tracked according to the present invention;

FIG. 8 is a graph showing the tracking error value for each frame according to the present invention;

FIG. 9 is a block diagram schematically showing the configuration of an object tracking apparatus using feature points according to an embodiment of the present invention;

FIG. 10 is a flowchart illustrating a method in which the feature point setting apparatus resets a feature point designated by the user;

FIG. 11 is an exemplary diagram for describing a case in which a feature point is reassigned;

FIG. 12 is a flowchart illustrating a method of tracking an object using a feature point in the object tracking apparatus; and

FIG. 13 is a flowchart illustrating a method of selecting a key frame in the feature point setting apparatus.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the description with reference to the accompanying drawings, the same or corresponding components will be given the same reference numerals and redundant description thereof will be omitted.

FIG. 3 is a block diagram schematically showing the configuration of a feature point setting apparatus according to the present invention, FIG. 4 is a view for explaining a method of forming a block according to the present invention, and FIG. 5 is a view showing a frequency-transformed block according to the present invention.

Referring to FIG. 3, the feature point setting apparatus 300 includes a feature point defining unit 310, a block forming unit 320, a gradient representative value calculator 330, a feature point determiner 340, a sum of absolute differences (SAD) calculator 350, a SAD total calculator 360, a tracking error calculator 370, and a key frame selector 380.

The feature point definer 310 receives and stores a feature point corresponding to the initial position of the object from the user with respect to the input image. That is, the user designates a feature point corresponding to the initial position of the object in the input image, and the feature point defining unit 310 stores the feature point specified by the user. Here, the feature points refer to edges or corners.

The block forming unit 320 forms a block having a predetermined size based on the feature point specified by the feature point defining unit 310 and defines blocks corresponding to the size of the formed block within a preset movement range.

That is, the block forming unit 320 forms an arbitrary block including peripheral pixels around the designated feature point in the input image, and sets a moving range in which the formed block can move to the front, back, left, and right. Here, the moving range of the block may be a range previously defined by the user. By the setting of the movement range, blocks that can be formed within the movement range are defined.

Referring to FIG. 4 for how the block forming unit 320 forms a block: when a feature point (i, j) 400 is given in the image, the block forming unit 320 forms an arbitrary block, block(n) 410, around the feature point. Block(n) 410 is an arbitrary block including the peripheral pixels around the feature point.

The feature point coordinates (i, j) 400 of block(n) 410 are user-defined coordinates, which can move forward, backward, left, and right by the predefined movable sizes d(x) and d(y). The upper-left block 420 and the lower-right block 430 therefore mark the movement range within which block(n) 410 can move. As a result, block(n) has spatial coordinates that can move by (2*dx + 2*dy), that is, the movement range of the block, and a total of (2*dx + 2*dy) blocks can be defined within the movement range.
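By way of illustration, this enumeration of candidate blocks might look as follows in Python with NumPy (a sketch; all function and parameter names are hypothetical). The sketch visits every offset in the movement range, which yields (2*dx+1)*(2*dy+1) candidates rather than the (2*dx + 2*dy) counted above, so the exact search pattern should be treated as an assumption.

    import numpy as np

    def candidate_blocks(image, i, j, half, dx, dy):
        # Yield (center, block) for every candidate block whose center is
        # offset from the designated feature point (i, j) by at most
        # (dy, dx); candidates whose block would leave the image are skipped.
        rows, cols = image.shape[:2]
        for oy in range(-dy, dy + 1):
            for ox in range(-dx, dx + 1):
                cy, cx = i + oy, j + ox
                if (cy - half < 0 or cy + half >= rows or
                        cx - half < 0 or cx + half >= cols):
                    continue
                yield (cy, cx), image[cy - half:cy + half + 1,
                                      cx - half:cx + half + 1]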

The gradient representative value calculator 330 obtains the gradient representative value of each block by frequency-transforming, one by one, all the blocks defined in the movement range set by the block forming unit 320. If (2*dx + 2*dy) blocks are defined in the movement range, the gradient representative value calculator 330 frequency-transforms each of the (2*dx + 2*dy) blocks and obtains a gradient representative value for each.

That is, the gradient representative value calculator 330 performs frequency transformation on each block and obtains the gradient representative value of each block by summing the pixel values corresponding to the high-frequency region of the frequency-transformed block.

The score that values a feature point within the space it can move around the user-specified (i, j) coordinates, that is, the reliability of the feature point, uses an FFT or DFT representing the frequency characteristics of the corresponding block(n).

That is, when a two-dimensional fast Fourier transform (FFT) or discrete Fourier transform (DFT) is applied to block(n), a flat, simple image produces large values in the low-frequency region at the upper left, as shown in FIG. 5, while an image with a large degree of change (that is, large change between pixels) produces large values in the high-frequency region at the lower right.

Using the values of the high-frequency range, which express the intensity of the gradient in the FFT or DFT table converted to the frequency domain, a representative value proportional to the change in the image can be obtained. Therefore, the sum of the values corresponding to the high-frequency range is taken as the representative value of the change in block(n).
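A minimal sketch of this scoring is shown below, assuming grayscale blocks; the radial cutoff for what counts as "high frequency" is an assumed parameter that the text does not fix.

    import numpy as np

    def gradient_representative_value(block, cutoff=0.25):
        # 2-D FFT of the block; sum the spectral magnitudes whose radial
        # frequency (cycles/pixel) is at least the cutoff. Flat blocks give
        # small sums; blocks with strong pixel-to-pixel change give large ones.
        spectrum = np.abs(np.fft.fft2(block.astype(np.float64)))
        fy = np.fft.fftfreq(block.shape[0])[:, None]
        fx = np.fft.fftfreq(block.shape[1])[None, :]
        return spectrum[np.hypot(fy, fx) >= cutoff].sum()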

That is, the gradient representative value calculator 330 obtains a gradient representative value for each block defined in the (2*dx + 2*dy) space in which an arbitrary block(n) can move, that is, the movement range.

The feature point determiner 340 reassigns the center point of the block having the largest change representative value within the movement range as the final feature point.

An important factor in tracking feature points between images is how different the periphery of the feature point is from other parts. If the periphery selected as a feature point is simply a flat image, or the change between pixels is not large, it is usually not easy to distinguish it in the following image; if it has a differentiating characteristic, that is, large change between pixels, it is easy to compare that change in the next image. In other words, a part with a large degree of change is easy to track, and its tracking error becomes small. If a specific part has been selected as a feature point according to the user's intention, the feature point setting apparatus 300 can therefore reselect, within the preset movement range, a feature point that is more easily tracked.
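Combining the two sketches above, the reselection itself is an argmax over candidate centers (names hypothetical; the default sizes are assumptions, in line with the 3x3 to 5x5 neighborhoods the text later calls practical):

    def reset_feature_point(image, i, j, half=2, dx=2, dy=2):
        # Re-designate the feature point as the center of the candidate
        # block with the largest gradient representative value.
        best_center, best_score = (i, j), float("-inf")
        for center, block in candidate_blocks(image, i, j, half, dx, dy):
            score = gradient_representative_value(block)
            if score > best_score:
                best_center, best_score = center, score
        return best_center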

The SAD calculator 350 divides the frame of the input image into a predetermined number of blocks, and then obtains SAD between the current frame and the previous frame, and the current frame and the subsequent frame, for each block.

Here each block is of N x N size, for example 16x16, 8x8, or 4x4. Such a block is called a macroblock, and the whole image is divided into blocks of a size appropriate to the image size, generally 16x16 to 32x32.

The sum of absolute differences (SAD) is the sum of the absolute differences of all the pixels in a block, and thus represents the difference between blocks at the same position in different frames. A large SAD value therefore means that the change in the image is large.

The SAD calculator 350 calculates SADs between the current frame and the adjacent frame for each block by using Equation 1, respectively. Here, the adjacent frame refers to a previous frame of the current frame and a subsequent frame of the current frame.

Equation 1

SAD(fn, bm) = Σ abs(Frame(fn, bm)pixel(i) - Frame(fn+1, bm)pixel(i)), summed over all pixels i

Here, fn is the frame number, bm is the m-th block of the frame, i is the index of a pixel within the block, and abs is the absolute value. For the previous frame, Frame(fn-1, bm) takes the place of Frame(fn+1, bm).

Accordingly, the SAD calculator 350 obtains SAD between the current frame and the previous frame, and SAD between the current frame and the subsequent frame, for each block.

Referring to FIG. 6 for how the SAD calculator 350 obtains the SAD: when the current frame is Frame(fn), the previous frame is Frame(fn-1), and the subsequent frame is Frame(fn+1), the SAD calculator 350 obtains the SADs between blocks bm at the same position in each frame, that is, SAD(fn-1, bm) between the current-frame block bm and the previous-frame block bm, and SAD(fn, bm) between the current-frame block bm and the subsequent-frame block bm.
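A sketch of Equation 1 for one block pair, assuming frames are 2-D grayscale arrays (the block size and all names are assumptions):

    import numpy as np

    def block_sad(frame_a, frame_b, y, x, n=16):
        # Sum of absolute differences over the co-located n x n block at
        # (y, x) in two frames (Equation 1).
        a = frame_a[y:y + n, x:x + n].astype(np.int64)
        b = frame_b[y:y + n, x:x + n].astype(np.int64)
        return int(np.abs(a - b).sum())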

The SAD total calculator 360 obtains the total SAD by adding up the SADs of every block for each frame.

That is, the SAD sum calculator 360 obtains a first SAD sum by adding the SADs of all blocks between the current frame and the previous frame, obtains a second SAD sum by adding the SADs of all blocks between the current frame and the subsequent frame, and then adds the first and second SAD sums to obtain the total SAD for each frame.

In other words, the SAD sum calculating unit 360 obtains a second SAD sum SAD (fn) by adding the SADs of all blocks between the current frame and subsequent frames as shown in Equation 2.

Equation 2

SAD(fn) = Σ SAD(fn, bj), summed over all blocks j

Here, j is the index of the block number.

The SAD(fn) obtained in Equation 2 is the sum of SADs between the current frame and the subsequent frame; to obtain the total SAD for the current frame, the SAD total calculator 360 must consider the previous frame as well as the subsequent frame.

Therefore, the SAD sum calculator 360 calculates the total SAD for each frame, tSAD(fn), using Equation 3 below.

Equation 3

tSAD(fn) = SAD(fn-1) + SAD(fn)

Here, SAD(fn-1) is the first SAD sum, obtained by adding the SADs of all blocks between the current frame and the previous frame, and SAD(fn) is the second SAD sum, obtained by adding the SADs of all blocks between the current frame and the subsequent frame.
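Equations 2 and 3 then reduce to summing block_sad over every block and over both neighbors. A sketch under the same assumptions as above (the frame index must leave one neighbor frame on each side):

    def total_sad(frames, fn, n=16):
        # tSAD(fn) = SAD(fn-1) + SAD(fn): block SADs summed against the
        # previous and the subsequent frame (Equations 2 and 3).
        rows, cols = frames[fn].shape  # grayscale frames assumed

        def frame_pair_sad(a, b):
            return sum(block_sad(a, b, y, x, n)
                       for y in range(0, rows - n + 1, n)
                       for x in range(0, cols - n + 1, n))

        return (frame_pair_sad(frames[fn - 1], frames[fn]) +
                frame_pair_sad(frames[fn], frames[fn + 1]))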

The SAD sum calculated by the SAD sum calculation unit 360 is a representative value representing the magnitude of the inter-frame variability, that is, the difficulty of tracking.

The total SAD obtained by the SAD total calculator 360 for each frame can be represented as the graph of FIG. 7, which shows the total SAD, tSAD(fn), of each frame in a scene to be tracked.

The tracking error calculator 370 calculates a tracking error value for each frame by using the SAD total for each frame obtained by the SAD total calculator 360.

That is, for each frame of the scene to be tracked, the tracking error calculator 370 obtains the tracking error value by adding the frame's own total SAD to the accumulated total SAD obtained up to the previous frame.

For example, assuming fn is the key frame, the tracking error value between the current frame Frame(fn) and the subsequent frame Frame(fn+1) can be obtained using Equation 3, but the sum of consecutive errors between Frame(fn+1) and the next frame Frame(fn+2) must take into account the error already reflected at Frame(fn+1). Consequently, the tracking error value from Frame(fn+1) to Frame(fn+2) is added onto the previous tracking error value.

That is, the tracking error value in frame fn is tSAD (fn).

Then, the tracking error values at frame fn+1, frame fn+2, and so on up to frame fn+N are as shown in Equation 4.

Equation 4

Tracking error per frame = Σ tSAD(fn+k), k = 0 ~ N

The tracking error value for each frame obtained by the tracking error calculator 370 is represented by a graph as shown in FIG. 8.

Referring to FIG. 8, when frame fn is the key frame, the tracking error value of frame fn+1 is the result of accumulating its own total SAD onto the tracking error value of frame fn. Likewise, the tracking error value of frame fn+2 is the result of accumulating its own total SAD onto the tracking error value of frame fn+1.
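This accumulation is a simple running sum; a minimal sketch using the total_sad helper above (names hypothetical):

    def tracking_errors(frames, key, last, n=16):
        # Equation 4: the tracking error at frame key+k is the running sum
        # tSAD(key) + ... + tSAD(key+k).
        errors, running = [], 0
        for fn in range(key, last + 1):
            running += total_sad(frames, fn, n)
            errors.append(running)
        return errors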

The key frame selector 380 selects, as the key frame, the frame having the smallest tracking error value among the tracking error values obtained by the tracking error calculator 370.

For example, referring to FIG. 8, it can be seen that the tracking error value at a specific frame fn is the smallest. Therefore, the key frame selector 380 selects fn as the key frame of the scene.

As a result, after deriving a graph as shown in FIG. 8 for each frame, the frame whose base area, that is, the sum under its error curve, is minimal is selected as the key frame.
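A one-directional sketch of this selection (the patent also discusses bidirectional accumulation, which this sketch omits): for each candidate key frame, compute the area under its error curve and take the argmin. Candidate indices must leave one frame of margin at each end for Equation 3's neighbors.

    def select_key_frame(frames, first, last, n=16):
        # Pick the candidate whose accumulated error curve has the
        # smallest total area (the "base area" of FIG. 8).
        def curve_area(key):
            return sum(tracking_errors(frames, key, last, n))
        return min(range(first, last + 1), key=curve_area)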

The precondition under which the feature point setting apparatus 300 selects a key frame using the SAD is that, when tracking a specific object from one image to another that contains motion, more motion in the image leads to a larger tracking error. Of course, a lot of movement does not always increase tracking error, but on average it means high volatility of the specific object being tracked, which makes tracking difficult.

Therefore, the feature point setting device 300 obtains SAD between frames and selects a key frame using the SAD.

FIG. 9 is a block diagram schematically illustrating the configuration of an object tracking apparatus using feature points according to an embodiment of the present invention.

Referring to FIG. 9, the object tracking apparatus 900 using the feature points includes a feature point setter 910, an image analyzer 920, and an object tracker 930.

The feature point setting unit 910 sets the feature point of the object by checking whether the feature point corresponding to the initial position of the object designated by the user is valid.

That is, the feature point setting unit 910 forms an arbitrary block around the feature point designated by the user in the input image, sets a movement range of the formed block, and then frequency-transforms each block within the movement range to obtain a gradient representative value for each block. The feature point setting unit 910 then sets, as the final feature point, the center point of the block having the largest gradient representative value within the movement range. The method by which the feature point setting unit 910 sets the feature point is the same as the operation of the feature point setting apparatus of FIG. 3, so a detailed description is omitted.

The image analyzer 920 extracts motion vector information and residual signal information of each frame from the input image.

The object tracker 930 generates forward motion vector information for each unit block from the extracted motion vector information, restores pixel information of a predetermined block from the extracted residual signal, and then generates the optimal position information of the object in each frame from the position information of the feature point set by the feature point setting unit 910, the forward motion vector information, and the reconstructed pixel information.

According to another embodiment of the present invention, the object tracking unit 930 determines an object feature point candidate of the current frame by using the pixel value difference between the current frame and the previous frame, and determines the object feature point of the current frame by searching, within a predetermined area surrounding the candidate, for a template similar to a predetermined template that includes the feature point set by the feature point setting unit 910.

That is, the object tracking unit 930 calculates an optical flow for the object feature point set by the feature point setting unit 910 by using the pixel value difference between the previous frame and the current frame, and determines an object feature point candidate in the current frame using the calculated optical flow. The object tracker 930 then uses template matching to search, in a predetermined area around the determined candidate of the current frame, for a template similar to a template that includes the object feature point set by the feature point setter 910.
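A sketch of this two-stage step with OpenCV, combining pyramidal Lucas-Kanade optical flow with template matching; the window sizes and parameters are assumptions, and boundary handling is omitted for brevity:

    import cv2
    import numpy as np

    def track_feature_point(prev_gray, cur_gray, point, t_half=10, s_half=20):
        # Stage 1: optical flow proposes a candidate position in the
        # current frame for the feature point from the previous frame.
        p0 = np.array([[point]], dtype=np.float32)
        p1, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, p0, None)
        cand = p1[0, 0] if status[0, 0] else p0[0, 0]
        cx, cy = int(cand[0]), int(cand[1])

        # Stage 2: template matching refines the candidate. The template
        # surrounds the feature point in the previous frame; the search
        # area surrounds the optical-flow candidate in the current frame.
        x, y = int(point[0]), int(point[1])
        tpl = prev_gray[y - t_half:y + t_half + 1, x - t_half:x + t_half + 1]
        search = cur_gray[cy - s_half:cy + s_half + 1, cx - s_half:cx + s_half + 1]

        result = cv2.matchTemplate(search, tpl, cv2.TM_SQDIFF_NORMED)
        _min_v, _max_v, min_loc, _max_loc = cv2.minMaxLoc(result)
        # TM_SQDIFF_NORMED: the best match is at the minimum.
        return (cx - s_half + min_loc[0] + t_half,
                cy - s_half + min_loc[1] + t_half)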

The above description presents two approaches by which the object tracking unit 930 tracks an object using the feature point, but various conventional methods may also be used.

FIG. 10 is a flowchart illustrating a method in which the feature point setting apparatus resets a feature point designated by the user, according to the present invention, and FIG. 11 is an exemplary diagram for describing a case in which a feature point is reassigned according to the present invention.

Referring to FIG. 10, the feature point setting apparatus 300 receives and stores a feature point corresponding to an initial position of an object from a user with respect to an input image in operation S1002.

After performing S1002, the feature point setting apparatus 300 forms an arbitrary block around the designated feature point in the input image (S1004), and sets a moving range of the formed block (S1006). Setting the moving range by the feature point setting apparatus 300 means that the formed block is moved within the moving range so that a plurality of blocks are defined. For example, when a moving range is defined as (2 * dx + 2 * dy), (2 * dx + 2 * dy) blocks may be defined in the moving range.

After the operation of S1006, the feature point setting apparatus 300 obtains a representative value of the degree of change of each block by frequency converting each block defined in the movement range (S1008). That is, the feature point setting apparatus 300 frequency-transforms each block, and obtains a representative value of change degree for each block by adding values of a high frequency region in the frequency-converted image.

After the operation of S1008, the feature point setting device 300 reassigns the center point of the block having the largest change representative value among the blocks defined in the movement range as the final feature point (S1010).
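Put end to end, steps S1002 to S1010 amount to one call of the reset routine sketched earlier (the file name and coordinates are hypothetical):

    import cv2

    # The user designates (i, j) = (120, 84); the apparatus searches the
    # movement range and returns the re-designated feature point (S1010).
    gray = cv2.cvtColor(cv2.imread("frame0.png"), cv2.COLOR_BGR2GRAY)
    new_i, new_j = reset_feature_point(gray, 120, 84, half=2, dx=2, dy=2)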

A method of resetting the feature point set by the user by the feature point setting apparatus 300 will be described with reference to FIG. 11.

Referring to FIG. 11, when the user manually selects an outline and a feature point as shown in (a) of FIG. 11, the feature point setting apparatus searches the surroundings as shown in (b) and reassigns a feature point with a better score in order to increase tracking accuracy.

If objects are tracked using these redefined feature points, errors in continuous object tracking can be minimized.

Of course, if the reassigned feature point is judged to differ from the user's intention, the originally designated location may be used instead. However, for Full HD or HD video, with resolutions of 1920x1080 or 1280x720, reassigning feature points within a 3x3 or 5x5 pixel neighborhood is practically feasible.

FIG. 12 is a flowchart illustrating a method of tracking an object using a feature point in the object tracking apparatus according to the present invention.

Referring to FIG. 12, the object tracking apparatus 900 sets a feature point of an object by checking whether the feature point corresponding to the initial position of the object designated by the user is valid (S1202). That is, the object tracking apparatus 900 forms an arbitrary block around the user-designated feature point in the input image, sets a movement range of the formed block, frequency-transforms each block defined within the movement range to obtain a gradient representative value for each block, and then sets, as the final feature point, the center point of the block having the largest representative value within the movement range.

After performing S1202, the object tracking device 900 extracts motion vector information and residual signal information of each frame from the input image (S1204).

After the operation of S1204, the object tracking device 900 generates the optimal position information of the object in each frame by using the position information of the set feature point, the motion vector information, and the residual signal information (S1206). That is, the object tracking device 900 generates forward motion vector information for each unit block from the extracted motion vector information, restores pixel information for a predetermined block from the extracted residual signal, and then stores the set feature points. The optimal position information of the object in each frame is generated from the position information, the forward motion vector information, and the reconstructed pixel information.

In other words, the object tracking apparatus 900 predicts the position coordinates of each feature point in the next frame from the feature point initial position information and the forward motion vector information of the previous frame. In this case, the object tracking device 900 extracts at least one candidate position coordinate from the position coordinate of the predicted feature point in order to find the position coordinate of the feature point more accurately. That is, the object tracking device 900 selects and extracts candidate position coordinates within a predetermined range based on the position coordinates of the predicted feature points. Then, the object tracking device 900 measures texture similarity energy, shape similarity energy, and motion similarity energy of each feature point candidate position coordinate, respectively.

Meanwhile, the present invention can be embodied as computer readable codes on a computer readable recording medium. The computer-readable recording medium includes all kinds of recording devices in which data that can be read by a computer system is stored.

FIG. 13 is a flowchart illustrating a method of selecting a key frame by the feature point setting apparatus according to the present invention.

Referring to FIG. 13, the feature point setting apparatus 300 divides the frame of the input image into a predetermined number of blocks (S1302).

Thereafter, the feature point setting apparatus 300 obtains SAD between the current frame and the adjacent frame for each block (S1304).

That is, the feature point setting apparatus 300 obtains SAD between the current frame and the previous frame, and SAD between the current frame and the subsequent frame, for each block. The SAD is a sum of absolute differences of all pixels in a block, and as a result, represents a difference of blocks at the same position between frames.

After the operation of S1304, the feature point setting apparatus 300 obtains the total SAD by adding the SADs of each block for each frame (S1306).

That is, the feature point setting apparatus 300 obtains a first SAD sum by adding the SADs of all blocks between the current frame and the previous frame, obtains a second SAD sum by adding the SADs of all blocks between the current frame and the subsequent frame, and then adds the first and second SAD sums to obtain the total SAD for each frame.

After the operation of S1306, the feature point setting apparatus 300 calculates a tracking error value for each frame using the total SAD for each frame (S1308).

That is, for each frame of the scene to be tracked, the feature point setting apparatus 300 obtains the tracking error value by adding the frame's own total SAD to the accumulated total SAD obtained up to the previous frame.

In other words, the feature point setting apparatus 300 calculates a tracking error value for each frame of a scene to be tracked using Equation 4 described with reference to FIG. 3.

After performing S1308, the feature point setting apparatus 300 selects a frame having the smallest tracking error value as a key frame (S1310).
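Using the helpers sketched earlier, steps S1302 to S1310 collapse to a single call (names hypothetical; the margins leave Equation 3 its two neighbor frames):

    # frames: list of grayscale frames for the scene to be tracked.
    key = select_key_frame(frames, first=1, last=len(frames) - 2, n=16)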

The feature point setting apparatus 300 selects a key frame through the above process. Although the process looks complicated, it consists of simple summation and amounts to an extremely small amount of computation compared with actually tracking an object, so the optimal key frame can be predicted before tracking begins.

In addition, through the above process the feature point setting apparatus 300 can easily select a key frame by simple inspection and calculation, without performing tracking for each frame.

The key frame selection method of the feature point setting apparatus 300 described above can be written in a program, and codes and code segments constituting the program can be easily inferred by a programmer in the art.

In addition, the program implementing the key frame selection method of the feature point setting apparatus 300 is stored on media readable by an electronic device; by being read and executed by the electronic device, a key frame can be selected from the frames of the scene to be tracked.

As such, those skilled in the art will appreciate that the present invention can be implemented in other specific forms without changing its technical spirit or essential features. Therefore, the above-described embodiments are to be understood as illustrative in all respects and not restrictive. The scope of the present invention is defined by the following claims rather than by the above description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

According to the present invention, when a user manually designates a feature point, a representative value of the degree of change is obtained for each block in consideration of the pixel matrix surrounding the designated feature point, and the feature point is reset based on that representative value; the invention is therefore applicable to a feature point setting apparatus and method and to an object tracking apparatus and method using the same.

In addition, the present invention can easily select a key frame by finding the frame with the smallest sum of errors through simple calculation, without performing tracking for each frame, and is applicable to a feature point setting apparatus and method that selects a key frame minimizing the cumulative error by using the SAD generated by the movement of the object to estimate the error.

Claims (21)

  1. A feature point setting apparatus comprising:
    a feature point definition unit configured to receive and store, from the user, a feature point corresponding to the initial position of an object in the input image;
    a block forming unit forming a block having a predetermined size around the designated feature point, and defining blocks corresponding to the size of the formed block within a predetermined movement range;
    a gradient representative value calculator for frequency-transforming each block defined within the movement range to obtain a gradient representative value for each; and
    a feature point determiner that reassigns, as the final feature point, the center point of the block having the largest gradient representative value within the movement range.
  2. The apparatus of claim 1,
    wherein the block forming unit forms an arbitrary block including peripheral pixels based on the designated feature point, and defines the blocks within the movement range by setting a movement range in which the formed block can be moved forward, backward, left, and right.
  3. The apparatus of claim 1,
    wherein the gradient representative value calculator performs frequency transformation on each block, and obtains the gradient representative value of each block by adding the pixel values corresponding to the high-frequency region of the frequency-transformed block.
  4. The apparatus of claim 1, further comprising:
    a SAD calculator for dividing a frame of the input image into a predetermined number of blocks, and calculating a sum of absolute differences (SAD) between a current frame and an adjacent frame for each block;
    a SAD total calculator that calculates a total SAD for each frame by adding up all SADs between adjacent frames;
    a tracking error calculator which calculates a tracking error value for each frame by using the total SAD for each frame; and
    a key frame selector selecting the frame having the smallest tracking error value as a key frame.
  5. The apparatus of claim 4,
    wherein the SAD calculator calculates the SAD between a current frame and a previous frame, or between a current frame and a subsequent frame, using the following equation:
    [Equation]
    SAD(fn, bm) = Σ abs(Frame(fn, bm)pixel(i) - Frame(fn+1, bm)pixel(i))
    where fn is the frame number, bm is the m-th block of the frame, i is the index of a pixel within the block, and abs is the absolute value.
  6. The apparatus of claim 4,
    wherein the SAD total calculator obtains a first SAD sum by adding the SADs of all blocks between the current frame and the previous frame, obtains a second SAD sum by adding the SADs of all blocks between the current frame and the subsequent frame, and adds the first SAD sum and the second SAD sum to obtain the total SAD for each frame.
  7. The apparatus of claim 4,
    wherein the tracking error calculator calculates the tracking error value by accumulating, for each frame of the scene to be tracked, the accumulated total SAD and the frame's own total SAD according to the following equation:
    [Equation]
    Tracking error per frame = Σ tSAD(fn+k), k = 0 ~ N
    where tSAD(fn) = SAD(fn-1) + SAD(fn), SAD(fn-1) is the first SAD sum, SAD(fn) is the second SAD sum, and tSAD(fn) means the total SAD at frame fn.
  8. An object tracking apparatus using feature points, comprising:
    a feature point setting unit for setting the feature point of an object by checking whether the feature point corresponding to the initial position of the object designated by the user in the input image is valid;
    an image analyzer extracting motion vector information and residual signal information of each frame from the input image; and
    an object tracking unit that generates forward motion vector information for each unit block from the extracted motion vector information, restores pixel information for a predetermined block from the extracted residual signal, and then generates optimal position information of the object in each frame from the position information of the set feature point, the forward motion vector information, and the reconstructed pixel information.
  9. The apparatus of claim 8,
    wherein the feature point setting unit forms a block having a predetermined size based on the user-designated feature point in the input image, defines blocks corresponding to the size of the formed block within a preset movement range, frequency-transforms each block defined within the movement range to obtain a gradient representative value for each block, and sets, as the final feature point, the center point of the block having the largest gradient representative value.
  10. A feature point setting method of a feature point setting apparatus, the method comprising:
    (a) receiving and storing, from a user, a feature point corresponding to an initial position of an object in the input image;
    (b) forming a block having a predetermined size based on the designated feature point in the input image, and defining blocks corresponding to the size of the formed block within a preset movement range;
    (c) obtaining gradient representative values by frequency-transforming each of the blocks defined within the movement range; and
    (d) reassigning, as the final feature point, the center point of the block having the largest gradient representative value within the movement range.
  11. The method of claim 10,
    wherein in step (b), an arbitrary block including neighboring pixels is formed around the designated feature point in the input image, and the blocks that can be formed within the movement range are defined by setting a movement range in which the formed block can be moved forward, backward, left, and right.
  12. The method of claim 10,
    wherein in step (c), frequency transformation is performed on each of the blocks, and the representative value of the degree of change of each block is obtained by summing the pixel values corresponding to the high-frequency region of the frequency-transformed block.
  13. A recording medium readable by an electronic device, on which a program is recorded for executing a feature point setting method comprising:
    receiving and storing, from a user, a feature point corresponding to an initial position of an object in the input image;
    forming a block having a predetermined size based on the designated feature point in the input image, and defining blocks corresponding to the size of the formed block within a predetermined movement range;
    frequency-transforming each block defined within the movement range to obtain a gradient representative value; and
    re-designating, as the final feature point, the center point of the block having the largest gradient representative value within the movement range.
  14. A key frame selection method of a feature point setting apparatus, the method comprising:
    (a) dividing a frame of the input image into a predetermined number of blocks;
    (b) obtaining SADs between a current frame and an adjacent frame for each of the predetermined number of blocks;
    (c) summing all SADs between adjacent frames to obtain a total SAD for each frame;
    (d) obtaining a tracking error value for each frame using the total SAD for each frame; and
    (e) selecting the frame having the smallest tracking error value as a key frame.
  15. The method of claim 14,
    In the step (b), the SAD between the current frame and the previous frame, and the SAD between the current frame and the subsequent frame are obtained for each block, respectively.
  16. The method of claim 14,
    wherein in step (c), a first SAD sum is obtained by adding the SADs of all blocks between the current frame and the previous frame, a second SAD sum is obtained by adding the SADs of all blocks between the current frame and the subsequent frame, and the first SAD sum and the second SAD sum are added to obtain the total SAD for each frame.
  17. The method of claim 14,
    wherein in step (d), the tracking error value is obtained for each frame of the scene to be tracked by adding the frame's own total SAD to the accumulated total SAD obtained up to the previous frame.
  18. A recording medium readable by an electronic device, on which a program is recorded for executing a key frame selection method comprising:
    dividing a frame of the input image into a predetermined number of blocks;
    obtaining SADs between a current frame and an adjacent frame for each of the predetermined number of blocks;
    obtaining a total SAD for each frame by adding up all SADs between adjacent frames;
    obtaining a tracking error value for each frame using the total SAD for each frame; and
    selecting the frame having the smallest tracking error value as a key frame.
  19. An object tracking method using feature points of an object tracking apparatus, the method comprising:
    (a) setting the feature point of an object by checking the validity of the feature point corresponding to the initial position of the object designated by the user in the input image;
    (b) extracting motion vector information and residual signal information of each frame from the input image; and
    (c) generating forward motion vector information for each unit block from the extracted motion vector information, restoring pixel information for a predetermined block from the extracted residual signal, and then generating optimal position information of the object in each frame from the position information of the set feature point, the forward motion vector information, and the reconstructed pixel information.
  20. The method of claim 19,
    wherein step (a) comprises:
    forming a block having a predetermined size based on the designated feature point in the input image, and defining blocks corresponding to the size of the formed block within a preset movement range;
    frequency-transforming each block defined within the movement range to obtain a gradient representative value; and
    reassigning, as the final feature point, the center point of the block having the largest gradient representative value within the movement range.
  21. A recording medium readable by an electronic device, on which a program is recorded for executing an object tracking method using feature points, the method comprising:
    setting the feature point of an object by checking the validity of the feature point corresponding to the initial position of the object designated by the user in the input image;
    extracting motion vector information and residual signal information of each frame from the input image; and
    generating forward motion vector information for each unit block from the extracted motion vector information, restoring pixel information for a predetermined block from the extracted residual signal, and then generating optimal position information of the object in each frame from the position information of the set feature point, the forward motion vector information, and the reconstructed pixel information.
PCT/KR2012/008896 2011-11-24 2012-10-26 Apparatus and method for setting feature points, and apparatus and method for object tracking using same WO2013077562A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020110123372A KR101700276B1 (en) 2011-11-24 2011-11-24 Apparatus and Method for setting feature point, tracking object apparatus and method usig that
KR10-2011-0123376 2011-11-24
KR10-2011-0123372 2011-11-24
KR1020110123376A KR101624182B1 (en) 2011-11-24 2011-11-24 Apparatus and Method for selecting key frame

Publications (1)

Publication Number Publication Date
WO2013077562A1 true WO2013077562A1 (en) 2013-05-30

Family

ID=48469967

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2012/008896 WO2013077562A1 (en) 2011-11-24 2012-10-26 Apparatus and method for setting feature points, and apparatus and method for object tracking using same

Country Status (1)

Country Link
WO (1) WO2013077562A1 (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100466587B1 (en) * 2002-11-14 2005-01-24 한국전자통신연구원 Method of Extrating Camera Information for Authoring Tools of Synthetic Contents
KR20050085842A (en) * 2002-12-20 2005-08-29 더 파운데이션 포 더 프로모션 오브 인더스트리얼 사이언스 Method and device for tracing moving object in image
KR20080017521A (en) * 2006-08-21 2008-02-27 (주)휴먼세미컴 Method for multiple movement body tracing movement using of difference image
KR100958379B1 (en) * 2008-07-09 2010-05-17 (주)지아트 Methods and Devices for tracking multiple 3D object, Storage medium storing the same
KR100996209B1 (en) * 2008-12-23 2010-11-24 중앙대학교 산학협력단 Object Modeling Method using Gradient Template, and The System thereof


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 12851233; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase in: Ref country code: DE
122 Ep: pct application non-entry in european phase (Ref document number: 12851233; Country of ref document: EP; Kind code of ref document: A1)