KR20120100448A - Apparatus and method for video encoder using machine learning - Google Patents

Apparatus and method for video encoder using machine learning Download PDF

Info

Publication number
KR20120100448A
KR20120100448A (application number KR1020110019350A)
Authority
KR
South Korea
Prior art keywords
video
image
unit
mode
optimal
Prior art date
Application number
KR1020110019350A
Other languages
Korean (ko)
Inventor
장호
한동일
Original Assignee
(주)스프링웨이브
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by (주)스프링웨이브 filed Critical (주)스프링웨이브
Priority to KR1020110019350A priority Critical patent/KR20120100448A/en
Publication of KR20120100448A publication Critical patent/KR20120100448A/en

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/90 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N 19/96 Tree coding, e.g. quad-tree coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

PURPOSE: A video compression method and apparatus are provided that reduce the resources needed to compress a video and shorten the overall compression time. CONSTITUTION: A macro block feature extracting unit (300) extracts feature information for each macro block from a video supplied by a video input unit (100). A decision tree calculating unit (400) supplies mode decision information by applying an optimal decision tree to the feature information. An optimum mode deciding unit (500) supplies optimal mode information to a video compressing unit (600), which generates the compressed video. [Reference numerals] (100) Video input unit; (300) Macro block feature extracting unit; (400) Decision tree calculating unit; (500) Optimum mode deciding unit; (600) Video compressing unit; (800) Compressed video; (AA, CC) YCbCr image signal; (BB) Bit stream; (DD) Optimal mode information; (EE) Mode decision information; (FF) Feature information

Description

Apparatus and method for video encoder using machine learning

The present invention relates to a method and apparatus for compressing moving images. By applying machine learning to motion prediction and macro block mode determination, the stages that consume the most resources during video compression, the apparatus and method reduce the time spent on motion prediction by 40% or more and the overall image compression time by 20% or more, while maintaining compression performance at the same level as the conventional method.

H.264/AVC (Advanced Video Coding) is the latest video compression technology, offering superior compression efficiency, image quality, and bit rate compared to earlier compression technologies. It is widely commercialized through digital TV and used in applications such as video telephony, video conferencing, DVD, games, and 3D TV, and its market is enormous; research institutes are accordingly accelerating development of related technologies. These technologies are standardized in stages, developed jointly by the ITU's Video Coding Experts Group (VCEG) and ISO/IEC's Moving Picture Experts Group (MPEG), and reference software is provided to verify and compare the developed algorithms. H.264/AVC provides superior compression efficiency, image quality, and bit rate compared to previous standards, but its motion prediction is far more complicated: the seven macro block partition modes 16x16, 16x8, 8x16, 8x8, 4x8, 8x4, and 4x4 are all evaluated. As a result, more than 70% of the resources required for image compression are consumed by motion prediction and macro block mode determination.

As described above, video compression technology is applied to many kinds of products and forms a huge market worldwide. In H.264/AVC (Advanced Video Coding), the latest video compression technology, motion prediction and macro block mode determination consume more than 70% of the resources required for full video compression, so increasing the efficiency of this part is essential. The purpose of the present invention is to provide a solution that reduces the resources consumed by motion prediction and macro block mode determination by 40% or more while maintaining the same compression efficiency, image quality, and bit rate.

The apparatus consists of a macro block feature extractor which extracts feature values, such as the average and variance, of macro blocks of each size from an input video; a decision tree calculation unit which searches a position in a mode decision tree, determined offline, using the feature values obtained for each macro block; an optimal mode determination unit which extracts the best mode information for the current macro block from the decision tree node to which its feature values belong; and an image compression unit which follows the existing video compression method but selects the macro block mode using the mode decision tree.
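The division of labor above can be sketched in a few lines of Python. Everything concrete here is an illustrative assumption: the feature set is reduced to mean and variance (the patent also names edge features), and the two-level tree with its thresholds stands in for the tree actually learned offline.

```python
import numpy as np

def extract_features(mb):
    """Macro block feature extractor: per-block mean and variance
    (a subset of the feature values named in the text)."""
    return {"mean": float(mb.mean()), "var": float(mb.var())}

def decide_mode(features, tree):
    """Decision tree calculation + optimal mode determination: walk an
    offline-built tree whose internal nodes test one feature against a
    threshold and whose leaves hold a macro block mode."""
    node = tree
    while "mode" not in node:
        side = "left" if features[node["feature"]] <= node["threshold"] else "right"
        node = node[side]
    return node["mode"]

# Hypothetical tree: smooth blocks -> large partition, busy blocks -> small.
TREE = {
    "feature": "var", "threshold": 50.0,
    "left": {"mode": "16x16"},
    "right": {"feature": "mean", "threshold": 128.0,
              "left": {"mode": "8x8"},
              "right": {"mode": "4x4"}},
}

flat = np.full((16, 16), 10, dtype=np.uint8)             # near-constant block
noisy = np.arange(256, dtype=np.uint8).reshape(16, 16)   # high-variance block

print(decide_mode(extract_features(flat), TREE))   # low variance -> 16x16
print(decide_mode(extract_features(noisy), TREE))  # high variance -> smaller mode
```

The image compression unit would then encode each macro block in the returned mode instead of exhaustively trying all seven.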

As described above, the video compression technique proposed in the present invention reduces the resources required for video compression by 40% or more compared to H.264, the latest video compression technique.

Therefore, in PC-based environments such as video conferencing, telemedicine, and video telephony, the time required for video compression can be minimized, and the PC's computing resources can be used more efficiently.

Even when implemented as a dedicated ASIC, as in a digital TV or a video phone, simplifying the computation-heavy video compression portion makes the ASIC easier to implement and shortens development time. Reducing the gate count also allows a small, low-cost, low-power chip, which is effective in portable video telephony environments that require low power.

1 is a block diagram of a video compression device using machine learning according to an embodiment of the present invention.
2 is an internal view of offline mode decision tree generation according to an embodiment of the present invention.
3 is an example of determining macro block information in an image according to an embodiment of the present invention.
4 is an internal configuration example of a video selection input unit according to an embodiment of the present invention.
5 is an internal configuration example of a video selection input unit according to an embodiment of the present invention.
6 is an internal configuration example of a video selection input unit according to an embodiment of the present invention.
7 is a motion prediction time comparison table for 14 types of test images according to an embodiment of the present invention.
8 is a motion prediction time comparison chart for 14 types of test images according to an embodiment of the present invention.
9 is a file size comparison chart for 14 types of test images according to an embodiment of the present invention.
10 is a PSNR comparison chart for 14 types of test images according to an embodiment of the present invention.
11 is an MSE comparison chart for 14 types of test images according to an embodiment of the present invention.
12 is an SSIM comparison chart for 14 types of test images according to an embodiment of the present invention.

In accordance with an aspect of the present invention, there is provided a video compression apparatus using machine learning, including: a macro block feature value extraction unit which extracts feature values, such as the average and variance, of macro blocks of each size from an input video; a decision tree calculation unit which searches a position in a mode decision tree, determined offline, using the feature values obtained for each macro block; an optimal mode determination unit which extracts optimal mode information for the current macro block from the decision tree node to which its feature values belong; and an image compression unit which follows the existing video compression method but selects the macro block mode using the mode decision tree.

BRIEF DESCRIPTION OF THE DRAWINGS The advantages and features of the present invention, and the manner of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. The present invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the invention to those skilled in the art, and the invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout.

In the following description of the present invention, detailed descriptions of known functions and configurations are omitted where they would obscure the subject matter of the invention. The following terms are defined in consideration of their functions in the embodiments of the present invention and may vary depending on the intention or custom of the user or operator; therefore, their definitions should be based on the contents of this entire specification.

Combinations of the blocks in the accompanying block diagrams may be implemented by computer program instructions. These instructions may be loaded onto a processor of a general purpose computer, special purpose computer, or other programmable data processing equipment, so that the instructions executed by the processor create means for performing the functions described in each block of the block diagram. These instructions may also be stored in a computer usable or computer readable memory that can direct a computer or other programmable data processing equipment to function in a particular manner, so that the stored instructions produce an article of manufacture containing instruction means that perform the functions described in each block. The instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed, producing a computer-implemented process whose instructions provide steps for executing the functions described in each block of the block diagram.

Each block may also represent a module, segment, or portion of code that includes one or more executable instructions for executing the specified logical function(s). It should also be noted that in some alternative embodiments the functions noted in the blocks may occur out of order; for example, two blocks shown in succession may in fact be executed substantially concurrently, or in the reverse order, depending on the functions involved.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings.

1 is a block diagram of a video compression apparatus using machine learning according to an embodiment of the present invention, comprising a video input unit 100, a macro block feature value extractor 300, a decision tree calculator 400, an optimal mode determiner 500, an image compressor 600, and a compressed video 800.

Referring to FIG. 1, the video input unit 100 may supply images in the various resolutions and formats used for general image compression, as shown in Table 1 below.

User scenario                  | Resolution & frame rate | Example data rates
Mobile content                 | 176x144, 10-15 fps      | 50-60 Kbps
Internet / Standard Definition | 640x480, 24 fps         | 1-2 Mbps
High Definition                | 1280x720, 24p           | 5-6 Mbps
Full High Definition           | 1920x1080, 24p          | 7-8 Mbps

[Table 1] ranges from low-resolution 176x144 video, widely used as a transmission format for mobile devices such as mobile phones, up to 1280x720 high-definition video and 1920x1080 Full HD video used in large-screen display devices. These image signals are output to the image compression unit 600 for video compression and to the macro block feature value extraction unit 300 for optimal mode determination.
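For scale, the compression ratios implied by Table 1 can be checked with a line of arithmetic. The YCbCr 4:2:0 sampling assumption (1.5 bytes per pixel) is mine, not stated in the table.

```python
def raw_mbps(width, height, fps, bytes_per_px=1.5):
    """Uncompressed data rate in Mbps for YCbCr 4:2:0 video
    (1.5 bytes per pixel is an assumption, not from Table 1)."""
    return width * height * bytes_per_px * 8 * fps / 1e6

sd_raw = raw_mbps(640, 480, 24)   # standard-definition row of Table 1
print(sd_raw)                     # ~88.5 Mbps uncompressed
print(sd_raw / 2)                 # vs. the ~2 Mbps target: roughly 44:1
```

Even the easiest row of Table 1 therefore demands compression by well over an order of magnitude, which is why the cost of mode decision matters.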

The macro block feature value extractor 300 calculates and outputs feature values in units of macro blocks from the YCbCr images of various resolutions provided by the video input unit 100. Feature values per macro block include the average, variance, edge direction, edge size, and edge position of the image data in each block, for macro blocks of size 16x16, 16x8, 8x16, 8x8, 4x8, 8x4, and 4x4. The average and variance for the 16x16 and 4x4 macro blocks are given by equations (1) to (4) below.

For the 16x16 macro block,

m_{16} = \frac{1}{256} \sum_{i=0}^{15} \sum_{j=0}^{15} p(i,j)    (1)

\sigma^2_{16} = \frac{1}{256} \sum_{i=0}^{15} \sum_{j=0}^{15} \left( p(i,j) - m_{16} \right)^2    (2)

and for the 4x4 macro block,

m_{4} = \frac{1}{16} \sum_{i=0}^{3} \sum_{j=0}^{3} p(i,j)    (3)

\sigma^2_{4} = \frac{1}{16} \sum_{i=0}^{3} \sum_{j=0}^{3} \left( p(i,j) - m_{4} \right)^2    (4)

where p(i,j) is the luma sample at position (i,j) in the macro block.
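Equations (1) to (4) translate directly into code, generalized here to any block size; the 16x16 ramp test image is illustrative.

```python
import numpy as np

def block_features(img, bh, bw):
    """Per-block mean and variance for bh x bw blocks of a luma image,
    mirroring equations (1)-(4); other macro block sizes are analogous."""
    h, w = img.shape
    means, variances = [], []
    for y in range(0, h - bh + 1, bh):
        for x in range(0, w - bw + 1, bw):
            blk = img[y:y+bh, x:x+bw].astype(np.float64)
            m = blk.mean()                              # eq. (1) / (3)
            means.append(m)
            variances.append(((blk - m) ** 2).mean())   # eq. (2) / (4)
    return np.array(means), np.array(variances)

img = np.tile(np.arange(16, dtype=np.uint8), (16, 1))  # 16x16 horizontal ramp
m16, v16 = block_features(img, 16, 16)   # one 16x16 block
m4, v4 = block_features(img, 4, 4)       # sixteen 4x4 blocks
print(m16[0], v16[0])                    # mean 7.5, variance 21.25
print(len(m4))
```

The same loop, run for all seven block sizes, yields the per-size feature vectors that the decision tree calculator consumes.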

The macro block feature extractor 300, which extracts these various feature values inside each macro block, is also used in the offline stage of FIG. 2. The macro block feature value extractor 300 of FIG. 2 performs the same operation as that of FIG. 1 and extracts the same kinds of feature values from the macro blocks of the input image.

Referring to FIG. 2, the image provided by the video selection input unit 200 is supplied simultaneously to the optimal image compressor 700 and the macro block feature value extractor 300. The optimal image compression unit 700 may be a conventional standard H.264/AVC compressor; it calculates the optimal macro block mode for each input image and provides it to the mode learning unit 900. The macro block feature values extracted for the same image are provided by the macro block feature extractor 300.

The mode learner 900 uses the optimal macro block mode information provided by the optimal image compression unit 700 and the per-size macro block feature values provided by the macro block feature value extractor 300 to learn the macro block mode as a function of the feature values for each macro block size.

The training data accumulated by the mode learner are used to generate the optimal decision tree 1000, and the resulting tree is used by the decision tree calculator 400 during online mode determination. The final video compression performance in the online mode depends on the quality of the decision tree computed in the offline phase, so great care must be taken when determining it.
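A minimal sketch of this offline stage follows. The patent specifies neither the tree-induction algorithm nor the split criterion, so the tiny misclassification-count learner below, and the synthetic (mean, variance) to mode labeling that stands in for a reference encoder's decisions, are illustrative assumptions only.

```python
import numpy as np

def misclass(y):
    """Number of samples not in the majority class."""
    _, counts = np.unique(y, return_counts=True)
    return len(y) - counts.max()

def build_tree(X, y, depth=0, max_depth=3):
    """Tiny CART-style mode learner: greedily pick the single-feature
    threshold that minimizes misclassification after the split."""
    labels, counts = np.unique(y, return_counts=True)
    if depth == max_depth or len(labels) == 1:
        return {"mode": labels[np.argmax(counts)]}
    best, best_err = None, misclass(y)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f])[:-1]:   # drop max so both sides non-empty
            left = X[:, f] <= t
            err = misclass(y[left]) + misclass(y[~left])
            if err < best_err:
                best, best_err = (f, t), err
    if best is None:
        return {"mode": labels[np.argmax(counts)]}
    f, t = best
    left = X[:, f] <= t
    return {"feature": f, "threshold": t,
            "left": build_tree(X[left], y[left], depth + 1, max_depth),
            "right": build_tree(X[~left], y[~left], depth + 1, max_depth)}

def predict(tree, x):
    while "mode" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["mode"]

# Synthetic training set: features are [mean, variance] per macro block;
# labels imitate the mode a full-search reference encoder might assign.
rng = np.random.default_rng(0)
X = rng.uniform(0, 255, size=(400, 2))
y = np.where(X[:, 1] < 64, "16x16", np.where(X[:, 1] < 160, "8x8", "4x4"))

tree = build_tree(X, y)
print(predict(tree, [100.0, 10.0]))   # low-variance block -> large partition
```

The learned tree plays the role of the optimal decision tree 1000; its quality is bounded by how representative the training features and labels are, which is exactly the concern the next paragraphs address.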

The mode information provided by the optimal image compression unit and the feature values provided by the macro block feature value extractor are important for mode learning in the mode learner 900 at the offline stage, but the input video is also a very important factor. For example, if the input images do not produce a variety of mode decisions, the mode learner cannot assemble data covering all mode information. Even if data for every mode are present, there may be too few samples for accurate decisions, which would reduce compression performance during online compression.

That is, as in the actual example of FIG. 3, an arbitrary image may or may not contain all seven macro block modes. Even when all seven are present, the information needed for learning may be insufficient. Therefore, various kinds of video input are required to secure sufficient training data.

The video selection input unit 200 provides the various types of image input shown in FIGS. 4, 5, and 6 so that learning information for all macro block modes is available. In the case of FIG. 4, a video with almost no motion, a video with very large motion, a video with high in-picture complexity, and a video with very low in-picture complexity are each passed to the optimal image compression unit 700 and the macro block feature extractor 300, allowing the mode learning unit 900 to generate the optimal decision tree 1000. That is, the mode is first trained on an image with little motion, then on an image with very large motion, then on an image with very high complexity, and finally on an image with very low complexity. Through this repeated learning, a large number of samples for all seven macro block modes are accumulated, from which an optimal decision tree can be generated.
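The repeated-learning loop over the four source types might be organized as below. Every provider, encoder rule, and feature name here is a toy stand-in, not something the patent specifies.

```python
def collect_training_data(providers, encode_optimal, extract_feature):
    """Cycle through every source type so all macro block modes appear
    in the training set, as the repeated-learning scheme above describes."""
    samples = []
    for provider in providers:                    # low/high motion, complex/simple
        for frame in provider():
            for mb, mode in encode_optimal(frame):
                samples.append((extract_feature(mb), mode))
    return samples

# Toy stand-ins for the four providing units and the reference encoder.
def low_motion():  yield [10, 10, 10, 10]
def high_motion(): yield [0, 255, 0, 255]
def complex_img(): yield [1, 9, 2, 8]
def simple_img():  yield [5, 5, 5, 5]

def encode_optimal(frame):
    # Pretend full-search encoder: flat blocks get 16x16, varied blocks 4x4.
    mode = "16x16" if max(frame) == min(frame) else "4x4"
    return [(frame, mode)]

def mean_feature(mb):
    return sum(mb) / len(mb)

data = collect_training_data(
    [low_motion, high_motion, complex_img, simple_img],
    encode_optimal, mean_feature)
print(len(data))   # one (feature, mode) sample per source type
```

In practice each provider would yield many frames, and the accumulated (feature, mode) pairs feed the mode learner 900.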

The video selection input unit need not supply all four input types of FIG. 4. As shown in FIG. 5, an image with very high in-picture complexity and one with very low complexity may be provided to the optimal image compression unit 700 and the macro block feature extractor 300 so that the mode learner 900 generates the optimal decision tree 1000. Likewise, as shown in FIG. 6, a video with almost no motion and a video with very large motion may be provided for the same purpose.

Referring back to FIG. 1, the decision tree calculator 400 determines the position in the mode decision tree corresponding to the optimal mode by using the feature values provided by the macro block feature value extractor 300.

Using the mode information provided by the decision tree calculator 400, the final mode determiner 500 supplies the image compressor 600 with optimal mode information for the current macro block. The conventional method consumes about 70% of total compressor resources providing mode information for each macro block; the proposed method requires an offline machine learning step but reduces the macro block mode decision time by nearly 50%.

The compressed video 800 finally obtained in this manner shows no noticeable difference in image quality from the result of the conventional video compression technique, and differs by less than 0.5% in compression ratio.

7 to 12 illustrate the video compression performance obtained using the video selection input unit 200 of FIG. 6. FIG. 7 compares the motion prediction time of the present invention and the conventional method for 14 test videos commonly used in video performance tests, showing almost a 50% improvement in most of them. While compression time is shortened by 50%, the size of the compressed file increases by 0.5% or less, as shown in FIG. 9, an increase that is almost insignificant compared to the time saved.

A comparison of image quality between the conventional method and the method proposed in this patent is shown in FIGS. 10 to 12; the difference cannot be recognized by the human eye.

While a preferred embodiment of the present invention has been illustrated and described above, the invention is not limited to that specific embodiment. Various modifications may be made by those skilled in the art without departing from the gist of the invention as claimed, and such modifications should not be understood as departing from the technical spirit or scope of the present invention.

Description of Reference Numerals
100: video input unit 200: video selection input unit
210: low speed motion image providing unit 220: high speed motion image providing unit
230: complex form image providing unit 240: simple form image providing unit
250: Image selector 300: Macro block feature value extractor
400: decision tree calculator 500: optimal mode determiner
600: image compression unit 700: optimal image compression unit
800: compressed video 900: mode learning unit
1000: Optimal Decision Tree

Claims (6)

A video input unit which simultaneously provides an image to an image compression unit and a macro block feature value extraction unit;
A macro block feature value extraction unit for extracting feature value information of a macro block from the video provided by the video input unit;
A decision tree calculation unit configured to provide mode decision information by using the feature value information provided by the macro block feature value extraction unit and an optimal decision tree generated offline;
An optimal mode determination unit for providing optimal mode information to the image compression unit from the mode decision information provided by the decision tree calculation unit; and
An image compression unit for generating a compressed video using the video provided by the video input unit and the optimal mode information provided by the optimal mode determination unit:
Video compression device using machine learning.
The method of claim 1,
Wherein the decision tree calculation unit uses an optimal decision tree generated offline by:
A video selection input unit which simultaneously provides an image to an optimal image compression unit and a macro block feature value extraction unit;
A macro block feature value extraction unit for extracting feature value information of a macro block from the video provided by the video selection input unit;
An optimal image compression unit which performs optimal image compression on the video provided by the video selection input unit;
A mode learning unit configured to determine the mode of a macro block by using the feature value information provided by the macro block feature value extraction unit and the optimal compression mode information provided by the optimal image compression unit; and
An optimal decision tree generated using the mode information provided by the mode learning unit:
Video compression device using machine learning.
The method of claim 2,
The video selection input unit,
A low-speed motion image providing unit which provides images having almost no motion between them;
A high-speed motion image providing unit which provides images having very large motion between them;
A complex form image providing unit which provides images having complicated in-picture content;
A simple form image providing unit which provides images having simple in-picture content; and
An image selection unit for selectively outputting the images provided by the various image providing units:
Video compression device using machine learning.
A video input step of simultaneously providing an image to an image compression step and a macro block feature value extraction step;
A macroblock feature value extraction step of extracting feature value information of a macroblock from a video provided in the video input step;
A decision tree calculation step of providing mode decision information using the feature value information provided in the macro block feature value extraction step and an optimal decision tree generated offline;
An optimal mode determination step of providing optimal mode information to the image compression step from the mode determination information provided in the decision tree calculating step;
And a video compression step of generating a compressed video using the video provided in the video input step and the optimal mode information provided in the optimal mode determination step.
Video compression method using machine learning.
The method of claim 4,
Wherein the decision tree calculation step uses an optimal decision tree generated offline by:
A video selection input step of simultaneously providing an image to an optimal image compression step and a macro block feature value extraction step;
A macro block feature value extraction step of extracting feature value information of a macro block from the video provided in the video selection input step;
An optimal image compression step of performing optimal image compression on the video provided in the video selection input step;
A mode learning step of determining the mode of a macro block by using the feature value information provided in the macro block feature value extraction step and the optimal compression mode information provided in the optimal image compression step; and
An optimal decision tree generating step of generating an optimal decision tree using the mode information provided in the mode learning step:
Video compression method using machine learning.
6. The method of claim 5,
The video selection input step,
Providing a low-speed motion image having almost no motion between frames;
Providing a high-speed motion image having very large motion between frames;
Providing a complex form image having complicated in-picture content;
Providing a simple form image having simple in-picture content; and
An image selection step of selectively outputting the images provided in the various image providing steps:
Video compression method using machine learning.
KR1020110019350A 2011-03-04 2011-03-04 Apparatus and method for video encoder using machine learning KR20120100448A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020110019350A KR20120100448A (en) 2011-03-04 2011-03-04 Apparatus and method for video encoder using machine learning

Publications (1)

Publication Number Publication Date
KR20120100448A true KR20120100448A (en) 2012-09-12

Family

ID=47110181

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020110019350A KR20120100448A (en) 2011-03-04 2011-03-04 Apparatus and method for video encoder using machine learning

Country Status (1)

Country Link
KR (1) KR20120100448A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11115673B2 (en) 2017-10-19 2021-09-07 Samsung Electronics Co., Ltd. Image encoder using machine learning and data processing method of the image encoder
US11694125B2 (en) 2017-10-19 2023-07-04 Samsung Electronics Co., Ltd. Image encoder using machine learning and data processing method of the image encoder
KR101998839B1 (en) 2018-01-03 2019-07-10 중앙대학교 산학협력단 Data compression system and method for image-based embedded wireless sensor network
KR20200073078A (en) * 2018-12-13 2020-06-23 주식회사 픽스트리 Image processing device of learning parameter based on machine Learning and method of the same
KR20200073079A (en) * 2018-12-13 2020-06-23 주식회사 픽스트리 Image processing device of learning parameter based on machine Learning and method of the same
CN113489997A (en) * 2021-05-27 2021-10-08 杭州博雅鸿图视频技术有限公司 Motion vector prediction method, motion vector prediction device, storage medium and terminal

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E601 Decision to refuse application