CN110473264B

CN110473264B - Depth map compression method and decompression method based on Huffman coding and encoder

Info

Publication number: CN110473264B
Application number: CN201910681225.0A
Authority: CN
Inventors: 符样清; 李骊
Original assignee: Beijing HJIMI Technology Co Ltd
Current assignee: Beijing HJIMI Technology Co Ltd
Priority date: 2019-07-26
Filing date: 2019-07-26
Publication date: 2023-04-07
Anticipated expiration: 2039-07-26
Also published as: CN110473264A

Abstract

The invention discloses a depth map compression method, a depth map decompression method, and a depth map encoder based on Huffman coding. It relates to image compression technology and belongs to the technical field of electric communication. The method creates a Huffman tree from an input data stream extracted from a depth image; trains, according to the created Huffman tree and through a deep learning algorithm, a linear regression model describing the correspondence between input pixel points of the depth image and actual Huffman coding values; and constructs a coding table corresponding to the input pixel points from the optimized Huffman codes output by the trained linear regression model. The depth image is compressed quickly by rapidly calculating the Huffman code for each input pixel point, so that compression time is shortened and coding efficiency is improved.

Description

Depth map compression method and decompression method based on Huffman coding and encoder

Technical Field

The invention discloses a depth map compression method, a depth map decompression method and a depth map encoder based on Huffman coding, relates to an image compression technology, and belongs to the technical field of electric communication.

Background

A depth map is acquired by a stereo camera or a time-of-flight (TOF) camera. A depth image, also known as a range image, is an image whose pixel values are the distances from the image capture device to points in the scene, so it directly reflects the geometry of the visible surfaces of the scene. A depth image can be converted into point cloud data through a coordinate transformation, and point cloud data that is regular and carries the necessary information can be converted back into depth image data. When an interactive space is built with depth cameras, multiple cameras can be used to cover a larger space or to overcome the line-of-sight limitations of a single camera. The cameras must then be calibrated and placed in the same coordinate system. Because each camera can be connected to only one PC, several networked computers are needed: the camera data is transmitted over the local area network and the image processing task is given to a host. This is where depth map compression becomes useful.

Transmitting data from multiple networked cameras between PCs is limited by network bandwidth. For example, transmitting high-definition color images from a single Microsoft Kinect sensor at video rate requires more than 1.4 Gbps of network bandwidth, but high-quality compression can reduce the required bandwidth to 30.7 Mbps or 15.4 Mbps, allowing multiple cameras to be used on a typical 1 Gbps local area network. A Kinect depth image is 512 × 424 pixels, so each depth image is about 424 KB and the depth stream requires about 104 Mbps. A typical 1 Gbps network can support about seven Kinect devices; compressing the depth images allows more than seven devices to be supported, reduces latency, and reserves network bandwidth for other data.
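The figures above can be checked with a short calculation. This is only a sketch: the 16-bit sample size and the 30 frames per second rate are assumptions that the text does not state, and pairing each depth stream with the 30.7 Mbps compressed color stream to arrive at roughly seven devices is likewise an assumed reading.

```python
# Back-of-the-envelope check of the Kinect bandwidth figures.
WIDTH, HEIGHT = 512, 424
BYTES_PER_PIXEL = 2       # assumption: 16-bit depth samples
FRAMES_PER_SECOND = 30    # assumption: 30 frames per second
COLOR_MBPS = 30.7         # compressed color stream figure quoted in the text
LAN_MBPS = 1000.0         # typical 1 Gbps local area network

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
frame_kib = frame_bytes / 1024
depth_mbps = frame_bytes * 8 * FRAMES_PER_SECOND / 1e6
devices = int(LAN_MBPS // (depth_mbps + COLOR_MBPS))

print(f"{frame_bytes} bytes per frame ({frame_kib:.0f} KiB)")
print(f"{depth_mbps:.1f} Mbps per uncompressed depth stream")
print(f"about {devices} devices on a 1 Gbps network")
```

Under these assumptions the per-frame size and stream rate agree with the 424 KB and 104 Mbps quoted above.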

Classic Huffman coding constructs a prefix-free code with the shortest average codeword length according to the occurrence probabilities of the characters. When the created Huffman tree contains many weighted paths, searching for the Huffman code corresponding to an input pixel point is relatively complex and time-consuming, which reduces coding efficiency. The present application aims to provide a depth map compression method that can quickly find the Huffman code for an input pixel point.

Disclosure of Invention

In view of the shortcomings of the background art, the invention aims to provide a depth map compression method, a decompression method, and an encoder based on Huffman coding. The depth map is compressed quickly by rapidly calculating the Huffman code for each input pixel point, which solves the technical problem of the low efficiency of traditional Huffman coding.

To achieve this aim, the invention adopts the following technical solution:

The depth map compression method based on Huffman coding creates a Huffman tree from an input data stream extracted from a depth image; trains, according to the created Huffman tree and through a deep learning algorithm, a linear regression model describing the correspondence between depth map input pixel points and actual Huffman coding values; and constructs a coding table corresponding to the input pixel points from the optimized Huffman codes output by the trained linear regression model.

Further, in the depth map compression method based on Huffman coding, while the linear regression model describing the correspondence between depth map input pixel points and actual Huffman coding values is trained through the deep learning algorithm, a gradient descent method is used to fit the linear regression model parameters that minimize the cost function.

Further, in the depth map compression method based on Huffman coding, the Huffman tree is created from the input data stream extracted from the depth image as follows: the depth map is converted into a binary-coded input data stream, the character values of the input data stream are converted into ASCII codes, and a Huffman tree is then constructed according to the weight of each character in the ASCII codes.

Further, in the depth map compression method based on Huffman coding, the coding table is stored in an output file.

Still further, in the depth map compression method based on Huffman coding, the cost function is:

where x is an input pixel point, a and b are the parameters of the linear regression model, m is the number of training samples, y(x) is the value calculated by the linear regression model, and a(x) is the actual Huffman value.

Still further, in the depth map compression method based on Huffman coding, the gradient descent method fits the linear regression model parameters that minimize the cost function as follows: according to the expression

the parameter a of the linear regression model is decreased until the model converges, where α is the step size.
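The cost function and the update expression referred to above are not reproduced in this text. Judging only from the symbol definitions (m training samples, calculated value y(x), actual Huffman value a(x), step size α), a conventional mean-squared-error cost and gradient descent update would read as follows. The 1/(2m) normalization, the sample index i, and the assignment notation are assumptions and may differ from the formulas in the published patent.

```latex
% Hedged reconstruction; normalization and indexing are assumed.
f(a, b) = \frac{1}{2m} \sum_{i=1}^{m} \bigl( y(x_i) - a(x_i) \bigr)^{2},
\qquad y(x) = a x + b

a := a - \alpha \, \frac{\partial f(a, b)}{\partial a}
```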

Further, the depth map compression method based on Huffman coding uses the standard library function open or fopen provided by the C language to convert the depth map into a binary-coded input data stream.

An encoder for implementing the method comprises:

a creation module that creates a Huffman tree from an input data stream extracted from the depth image;

a coding optimization module that reads the Huffman tree created by the creation module and trains, through a deep learning algorithm, a linear regression model describing the correspondence between depth map input pixel points and actual Huffman coding values; and

a coding table generation module that reads the optimized Huffman codes output by the coding optimization module and constructs a coding table corresponding to the input pixel points.

Further, the encoder includes a memory for storing an output file holding an encoding table.

A depth map decompression method reads the coding table output by the compression method and completes the decompression of the depth map after decoding.

By adopting the above technical solution, the invention has the following beneficial effects. To address the time cost of searching for Huffman codes in a Huffman tree, the invention trains, through deep learning, a linear regression model for looking up the codes. The model parameters that minimize the cost function are determined by a gradient descent algorithm, and the trained model quickly produces the optimized Huffman codes of the input pixel points. This effectively saves the bandwidth required for depth map transmission, shortens compression time, and improves coding efficiency.

Drawings

FIG. 1 shows a Huffman tree constructed according to the character weights of the string "ffwqaffaawe".

FIG. 2 is a flow chart of the present invention.

Detailed Description

The technical scheme of the invention is explained in detail in the following with reference to the attached drawings.

First, a depth picture is obtained with a depth camera, such as Microsoft's Kinect camera. The obtained depth map is then converted into an input stream through the standard library function open or fopen provided by the C language, the occurrence frequency of each character in the input stream is counted to obtain the weight of each character, and a Huffman tree is constructed from the obtained weights. For example, if the characters represented by the ASCII codes of the binary input stream are "ffwqaffaawe", the weight of f is 4, the weight of w is 2, the weight of q is 1, the weight of a is 3, and the weight of e is 1, and the Huffman tree constructed from these weights is as shown in FIG. 1. The left and right branches are represented by 0 and 1 respectively, so the code of e is 0000, the code of q is 0001, the code of w is 001, the code of a is 01, and the code of f is 1, and the Huffman code of "ffwqaffaawe" is 110010001011101010010000.

In the implementation, a structure is created to store each weight, the nodes are linked through left-node and right-node pointers, the Huffman tree is created from the obtained weights, and the Huffman code of each character is obtained by recursing through the Huffman tree. A coding table is built from the obtained Huffman codes to facilitate verification and decoding, and the obtained Huffman codes are stored in an output file. To decode a given string of Huffman codes, the decoder starts at the root node and follows the binary sequence until it reaches a leaf node, which yields a character. Because the code is a prefix code, once a character has been decoded the next decoding restarts from the root node, with no wrong paths and no redundant operations.
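The tree construction, the coding table, and the prefix-code decoding described above can be sketched as follows. This is an illustrative Python sketch, not the C implementation the description refers to. The sample string is taken to be the 11-character "ffwqaffaawe", which is the string that the stated weights (4, 2, 1, 3, 1) and the 24-bit code correspond to. The names build_codes, encode, decode, and FIG1_CODES are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count

def build_codes(text):
    """Build a Huffman coding table {symbol: bit string} from symbol frequencies."""
    order = count()  # unique tie-breaker so heap entries never compare nodes
    heap = [(weight, next(order), symbol) for symbol, weight in Counter(text).items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    # Repeatedly merge the two lightest nodes into one internal node.
    while len(heap) > 1:
        w1, _, n1 = heapq.heappop(heap)
        w2, _, n2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(order), (n1, n2)))
    codes = {}
    def walk(node, prefix):
        # Internal nodes are (left, right) tuples; leaves are symbols.
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

def encode(text, codes):
    """Concatenate the codeword of every symbol."""
    return "".join(codes[ch] for ch in text)

def decode(bits, codes):
    """Decode a prefix code: emit a symbol as soon as the buffered bits match a codeword."""
    inverse = {code: symbol for symbol, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

# Coding table as read from the description of FIG. 1.
FIG1_CODES = {"e": "0000", "q": "0001", "w": "001", "a": "01", "f": "1"}
TEXT = "ffwqaffaawe"
print(encode(TEXT, FIG1_CODES))
```

Ties between equal weights may be broken differently from FIG. 1, so build_codes can return different individual codewords, but any Huffman tree for these weights encodes the string in the same total of 24 bits.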

The Huffman coding is then optimized with a deep learning algorithm, whose main purpose is to quickly find the Huffman coding value corresponding to each depth image pixel point. The input is therefore the pixel points of the depth image, the output is the corresponding values of the pixel points after Huffman coding, and the hidden layer is a training algorithm based on Huffman coding. First, a linear regression model is created: y = ax + b, where x is a pixel point of the depth map and y is the value obtained by Huffman coding. To find a linear regression model that accurately describes the relationship between the data, a cost function is needed. The cost function describes the difference between the linear regression model and the actual data; if there is no difference, the linear regression model completely represents the relationship between the data. To find the best-fitting linear regression model, the cost function must be made small enough. The cost function is:

where m is the number of training samples, y(x) is the value calculated by the model, and a(x) is the actual Huffman value. To make the cost function small enough, a gradient descent algorithm can be used, that is, the following operation is repeated until f(a, b) converges:

where α is the step size. This can be simplified to:

With a on the abscissa and f(a) on the ordinate, repeated iteration moves the solution to the lowest point of the curve and yields the minimum of the cost function. The corresponding Huffman value can then be calculated directly from the pixel points, which greatly reduces the compression time.
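The fitting procedure can be illustrated with a small batch gradient descent loop. This is a sketch under stated assumptions, not the patented training procedure: it assumes a mean-squared-error cost with a 1/(2m) factor, updates both a and b at every step, and uses synthetic, exactly linear training pairs. The names fit_linear_model and cost and the sample data are hypothetical; real Huffman values need not be linear in the pixel value.

```python
def fit_linear_model(xs, targets, alpha=0.1, iterations=2000):
    """Fit y = a*x + b by batch gradient descent on an assumed
    cost f(a, b) = 1/(2m) * sum((a*x + b - target)**2)."""
    m = len(xs)
    a, b = 0.0, 0.0
    for _ in range(iterations):
        errors = [a * x + b - t for x, t in zip(xs, targets)]
        grad_a = sum(e * x for e, x in zip(errors, xs)) / m  # df/da
        grad_b = sum(errors) / m                             # df/db
        a -= alpha * grad_a   # alpha is the step size
        b -= alpha * grad_b
    return a, b

def cost(a, b, xs, targets):
    """Assumed mean-squared-error cost with a 1/(2m) factor."""
    m = len(xs)
    return sum((a * x + b - t) ** 2 for x, t in zip(xs, targets)) / (2 * m)

# Synthetic training pairs (hypothetical): pixel value -> target value.
pixels = [0.0, 1.0, 2.0, 3.0, 4.0]
huffman_values = [2.0 * x + 1.0 for x in pixels]

a, b = fit_linear_model(pixels, huffman_values)
print(a, b, cost(a, b, pixels, huffman_values))
```

With these synthetic pairs the loop recovers the slope and intercept used to generate the data. On real data the fit would only approximate the actual Huffman values.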

As shown in FIG. 2, for a depth image to be compressed, a Huffman tree is created first. The created Huffman tree is then input into the linear regression model, and the model is trained iteratively through the deep learning algorithm; during this iterative training, the parameters at which the model converges are obtained by the gradient descent method. From an input pixel point, the trained linear model can quickly calculate an optimized Huffman code close to the actual Huffman value, so the compression time is greatly shortened compared with traditional Huffman coding. A coding table corresponding to the input pixel points is then constructed from the optimized Huffman codes output by the linear regression model and stored in an output file for use in verification and decoding.

In tests, the Huffman coding disclosed in this application takes 11 ms to compress a 640 × 480 depth map: a depth map of 614400 bytes is compressed to 148280 bytes, so the bandwidth originally required is 600 KB and the bandwidth required after Huffman compression is 140 KB, which achieves the purpose of saving bandwidth. After the deep learning algorithm has been trained continuously on 500 depth maps, the optimized compression time is 2 ms. Although the compressed size is unchanged, the compression time is greatly shortened. The depth map compression method disclosed in this application is suitable for the HJIMI A100/A200 and for devices such as the Microsoft Kinect; any device that produces a depth map can be used.
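The reported figures imply the following compression ratio and speed-up. The calculation is a sketch; the 2 bytes per pixel is an assumption inferred from 640 × 480 × 2 = 614400 and is not stated in the text.

```python
# Ratio and speed-up implied by the reported test figures.
WIDTH, HEIGHT = 640, 480
BYTES_PER_PIXEL = 2          # assumption: 16-bit depth samples
raw_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
compressed_bytes = 148280    # reported compressed size
baseline_ms, optimized_ms = 11, 2   # reported compression times

ratio = raw_bytes / compressed_bytes
saved_percent = (1 - compressed_bytes / raw_bytes) * 100
speedup = baseline_ms / optimized_ms

print(f"raw size: {raw_bytes} bytes ({raw_bytes / 1024:.0f} KiB)")
print(f"compression ratio: {ratio:.2f}:1, saving {saved_percent:.1f}%")
print(f"speed-up: {speedup:.1f}x")
```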

Claims

1. A depth map compression method based on Huffman coding, characterized in that a Huffman tree is created according to an input data stream extracted from a depth image; a linear regression model describing the correspondence between depth map input pixel points and actual Huffman coding values is trained through a deep learning algorithm according to the created Huffman tree; and a coding table corresponding to the input pixel points is constructed from the optimized Huffman codes output by the trained linear regression model; in the process of training the linear regression model describing the correspondence between depth map input pixel points and actual Huffman coding values through the deep learning algorithm, a gradient descent method is used to fit the linear regression model parameters that minimize the cost function, wherein the cost function is:

where x is an input pixel point, a and b are parameters of the linear regression model, m is the number of training samples, y(x) is the value calculated by the linear regression model, and a(x) is the actual Huffman value; and the gradient descent method fits the linear regression model parameters that minimize the cost function as follows: according to the expression

the parameter a of the linear regression model is decreased until the model converges, where α is the step size and f(a) is the ordinate.

2. The depth map compression method based on Huffman coding of claim 1, wherein the Huffman tree is created from the input data stream extracted from the depth image as follows: the depth map is converted into a binary-coded input data stream, the character values of the input data stream are converted into ASCII codes, and a Huffman tree is then constructed according to the weight of each character in the ASCII codes.

3. The depth map compression method based on Huffman coding of claim 1, wherein the coding table is stored in an output file.

4. The depth map compression method based on Huffman coding of claim 2, wherein the depth map is converted into a binary-coded input data stream using the standard library function open or fopen provided by the C language.

5. An encoder for implementing the method of claim 1, comprising:

6. The encoder of claim 5, further comprising a memory for storing an output file holding the encoding table.

7. A depth map decompression method, characterized in that the coding table output by the compression method of claim 1 is read, and decompression of the depth map is completed after decoding.