CN113132755A - Extensible man-machine cooperative image coding method and coding system - Google Patents


Info

Publication number
CN113132755A
Authority
CN
China
Prior art keywords
image
code stream
edge
auxiliary information
decoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911415561.7A
Other languages
Chinese (zh)
Other versions
CN113132755B (en)
Inventor
刘家瑛
胡越予
杨帅
王德昭
郭宗明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN201911415561.7A
Publication of CN113132755A
Application granted
Publication of CN113132755B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N 21/2343 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display

Abstract

The invention discloses an extensible man-machine cooperative image coding method and system. The method comprises the following steps: extracting an edge map of each sample picture and vectorizing it as a compact representation that drives machine vision tasks; extracting key points from the vectorized edge map as auxiliary information; performing entropy-coded lossless compression on the compact representation and the auxiliary information respectively to obtain two code streams; preliminarily decoding the two code streams to obtain an edge map and auxiliary information; feeding the decoded edge map and auxiliary information into a neural network for forward computation; computing a loss function between the result and the corresponding original picture, and back-propagating the loss to update the network weights until the neural network converges, which yields a double-path code stream decoder; acquiring the edge map and auxiliary information of an image to be processed and coding and compressing them into two code streams; and decoding the received code streams with the double-path code stream decoder to reconstruct the image.

Description

Extensible man-machine cooperative image coding method and coding system
Technical Field
The invention belongs to the field of image coding, and relates to an extensible man-machine cooperative image coding method and coding system.
Background
Lossy image compression is an indispensable key technology in the use and distribution of digital images. Traditional lossy image compression schemes compress an image by transforming it into a compact representation and then applying quantization and entropy coding, which greatly reduces the storage and transmission overhead of digital images and has made them ubiquitous in daily life.
With the development of computer vision technology, more and more application scenarios must also consider image quality under machine vision, that is, lossy-compressed images should still achieve performance comparable to lossless images on machine vision tasks. However, traditional lossy image compression schemes are optimized only for human vision and cannot guarantee quality under machine vision. Conversely, if only the features required by machine vision tasks are compressed and faithful image reconstruction is not guaranteed, the result cannot be viewed by the human eye.
To simultaneously ensure performance under both human vision and machine vision, the invention provides an extensible man-machine cooperative image coding system. Depending on the requirements, code streams of different levels can be transmitted and decoded to obtain either a reconstructed image intended only for machine vision or a reconstructed image intended for human vision.
Disclosure of Invention
Against this technical background, the invention designs an extensible man-machine cooperative image coding method and coding system. Unlike the traditional single code stream for human vision, the invention provides a scalable coding framework that simultaneously generates two code streams: a vision-driven compact representation code stream and an auxiliary information code stream, so that decoding and reconstruction can be carried out according to different task requirements. The decoder of the invention adopts a generative model and can decode code streams of different levels. From the vision-driven compact representation code stream alone, it generates a reconstructed image for machine vision; from the compact representation code stream together with the auxiliary information code stream, it generates a reconstructed image for human vision. The overall framework is shown in Fig. 1.
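For a given task, only the necessary layers of the scalable code stream need to be transmitted. A minimal sketch of this layer selection follows; the Task enum and function names are illustrative, not taken from the patent:

```python
from enum import Enum

class Task(Enum):
    MACHINE_VISION = "machine"   # needs only the compact representation stream
    HUMAN_VISION = "human"       # needs the compact representation + auxiliary streams

def streams_to_transmit(task: Task):
    """Return the names of the code streams to transmit for a task,
    following the scalable layering described above."""
    if task is Task.MACHINE_VISION:
        return ["B_E"]            # vision-driven compact representation stream
    return ["B_E", "B_C"]         # plus the auxiliary information stream

print(streams_to_transmit(Task.HUMAN_VISION))  # ['B_E', 'B_C']
```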
The technical scheme of the invention is as follows:
a method for coding extensible man-machine cooperation images comprises the following steps:
1) extracting an edge map of each sample picture;
2) vectorizing the edge graph by using a Bezier curve to be used as a compact representation for driving a machine vision task; then, extracting key points from the vectorized edge image, and taking the extracted key points as auxiliary information;
3) respectively performing entropy coding lossless compression on the compact representation and the auxiliary information to obtain two paths of code streams;
4) preliminarily decoding the two paths of code streams to obtain an edge graph and auxiliary information;
5) for a task of generating a reconstructed image aiming at human vision, inputting an edge image obtained by decoding and auxiliary information into a neural network to perform forward calculation of the network; for a reconstructed image task aiming at machine vision, inputting an edge graph obtained by decoding into a generated neural network, and carrying out forward calculation on the network;
6) performing loss function calculation on the calculation result obtained in the step 5) and the corresponding original picture, and reversely transmitting the calculated loss to a neural network for updating the network weight;
7) repeating the steps 2) -6) until the loss of the neural network is converged, and obtaining a double-path code stream decoder aiming at the human eye vision image reconstruction task or a compact representation code stream decoder aiming at the machine vision image reconstruction task;
8) for an image I to be processed, acquiring its edge image and auxiliary information, and coding and compressing them to obtain two paths of code streams, denoted B_E and B_C respectively;
9) selecting a double-path code stream decoder or a compact representation code stream decoder according to the task requirement to decode the received code stream, and reconstructing an image.
Further, the method for extracting the key points is: if a vectorized line of the edge map is a straight line segment, extract key points in the straight-line mode; otherwise, extract key points in the Bezier curve mode.
Further, the method for extracting key points in the straight-line mode is: if the included angle between the straight line segment and the horizontal is larger than a set angle, sample two color values on the horizontal line through the midpoint of the segment, at equal distances to its left and right; if the angle is smaller than or equal to the set angle, sample two color values on the vertical line through the midpoint, at equal distances above and below it. The method for extracting key points in the Bezier curve mode is: record the tangent point where the line parallel to the chord joining the start point and the end point of the Bezier curve touches the curve; if the included angle between this tangent line and the horizontal is larger than the set angle, sample one color value inside the curve on the horizontal line through the tangent point; if it is smaller than or equal to the set angle, sample one color value inside the curve on the vertical line through the tangent point.
Further, the set angle is 45 °.
Further, for a machine vision task, the code stream B_E corresponding to the edge map is sent to the compact representation code stream decoder in step 8); in step 9), the compact representation code stream decoder decodes B_E to obtain the vectorized edge map E and passes it forward through the network to obtain the decoded image. For a human vision task, the code streams B_E and B_C are sent to the double-path code stream decoder in step 8); in step 9), the double-path code stream decoder decodes B_E and B_C to obtain E and C and passes them forward through the network to obtain the decoded image.
A method for generating training of a two-way code stream decoder comprises the following steps:
1) extracting an edge map of each sample picture;
2) vectorizing the edge graph by using a Bezier curve to be used as a compact representation for driving a machine vision task; then, extracting key points from the vectorized edge image, and taking the extracted key points as auxiliary information;
3) respectively performing entropy coding lossless compression on the compact representation and the auxiliary information to obtain two paths of code streams;
4) preliminarily decoding the two paths of code streams to obtain an edge graph and auxiliary information;
5) inputting the edge graph obtained by decoding and auxiliary information into a neural network to perform forward calculation of the network;
6) performing loss function calculation on the calculation result obtained in the step 5) and the corresponding original picture, and reversely transmitting the calculated loss to a neural network for updating the network weight;
7) and repeating the steps 2) -6) until the loss of the neural network is converged, and obtaining a double-path code stream decoder aiming at the task of reconstructing the image by human vision.
A method for generating training of a compact representation code stream decoder comprises the following steps:
1) extracting an edge graph of each sample picture, and carrying out vectorization on the edge graph to be used as compact representation of a driving machine vision task;
2) performing entropy coding lossless compression on the compact representation to obtain a path of code stream;
3) carrying out preliminary decoding on the code stream to obtain an edge graph;
4) inputting the edge graph obtained by decoding into a neural network, and carrying out forward calculation on the network;
5) performing loss function calculation according to the calculation result obtained in the step 4) and the corresponding original picture, and reversely transmitting the calculated loss to a neural network for updating the network weight;
6) and repeating the steps 2) to 5) until the loss of the neural network is converged, and obtaining a compact representation code stream decoder aiming at the task of machine vision reconstruction image.
An extensible man-machine cooperative image coding system is characterized by comprising an encoder, a two-way code stream decoder and a compact representation code stream decoder; wherein:
an encoder for extracting an edge map of a picture; vectorizing the edge graph by utilizing a Bezier curve to serve as a compact representation for driving a machine vision task; then, extracting key points from the vectorized edge image, and taking the extracted key points as auxiliary information; then, respectively carrying out entropy coding lossless compression on the compact representation and the auxiliary information to obtain two paths of code streams;
the double-path code stream decoder is used for decoding the two paths of code streams to obtain an edge image and auxiliary information, and then transmitting the edge image and the auxiliary information obtained by decoding in the forward direction to obtain a decoded image for a human eye vision image reconstruction task;
and the compact representation code stream decoder is used for decoding the code stream corresponding to the edge map to obtain the edge map, and then transmitting the edge map obtained by decoding in the forward direction to obtain a decoded image for the task of reconstructing the image by machine vision.
The method uses Bezier curves to extract and vectorize the image edge map as a compact representation driving machine vision tasks, computes key point coordinates from information such as the positions and parameters of the straight lines and curves in the vectorized edge map, extracts those key points from the original image, and encodes them to generate the two corresponding code streams, as shown in Fig. 2.
The main steps of the method of the invention are described next.
Step 1: Collect a batch of pictures and extract their edge maps; the collected pictures are kept as the targets of the network output.
Step 2: and vectorizing the edge graph by using a Bessel curve. And sampling key points in the vectorized edge image as auxiliary information (the edge image is represented as a straight line and a curve after vectorization; and calculating the coordinates of the key points according to the positions, parameters and other information of the straight line and the curve, wherein the coordinates are used for extracting the key points in the originally acquired image). The extraction of key points is divided into two modes: a straight line mode and a bezier curve mode. If the vectorized line is a straight line segment, a straight line mode is used, and otherwise, a Bezier curve mode is used. In the straight line mode, if the included angle between the straight line segment and the horizontal line is more than 45 degrees, the midpoint of the line passing segment is sampled at equal intervals left and right on the horizontal line and two color values are recorded; if the color value is less than or equal to 45 degrees, two color values are sampled from the midpoint of the line passing section on the vertical line at equal intervals up and down and recorded. In the Bezier curve mode, a line parallel to a connecting line of a starting point and an end point of the Bezier curve is recorded with a tangent point of the Bezier curve, in the Bezier curve mode, a section of edge is described by using the Bezier curve, as shown in the specification and attached figure 2(c), the starting point of the Bezier curve is Ps, the end point of the Bezier curve is Pt, the Ps and the Pt are connected to obtain a straight line PsPt, the tangent line of the straight line PsPt and the curve is made, and the tangent point is taken. If the included angle between the tangent line and the horizontal line is more than 45 degrees, recording a color value of the over-tangent point in the sampling curve on the horizontal line; if the value is less than or equal to 45 degrees, one color value of the over-cut point in the sampling curve on the vertical line is recorded.
Step 3: Perform entropy-coded lossless compression on the compactly represented vectorized edge map and on the key point auxiliary information to obtain two code streams.
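Step 3 only requires that both representations be losslessly entropy coded. As a stand-in sketch (the patent does not prescribe a particular entropy coder), one can serialize the vectorized segments and key point records and compress them with a general-purpose lossless codec such as DEFLATE; the data layout shown is an illustrative assumption:

```python
import json
import zlib

def entropy_encode(obj) -> bytes:
    """Lossless stand-in for the entropy coding step: serialize the structure
    (lists of segment parameters or keypoint records) and compress with DEFLATE.
    Any lossless entropy coder could be substituted here."""
    return zlib.compress(json.dumps(obj).encode("utf-8"), level=9)

def entropy_decode(stream: bytes):
    """Inverse of entropy_encode: exact reconstruction of the input."""
    return json.loads(zlib.decompress(stream).decode("utf-8"))

# Example: two streams, B_E for the vectorized edge map and B_C for the
# keypoint auxiliary information (contents here are toy placeholders).
segments = [{"type": "line", "p0": [10, 12], "p1": [40, 12]},
            {"type": "bezier", "ps": [40, 12], "pc": [55, 30], "pt": [40, 48]}]
keypoints = [{"segment": 0, "colors": [[200, 180, 170], [60, 55, 50]]}]
B_E = entropy_encode(segments)
B_C = entropy_encode(keypoints)
assert entropy_decode(B_E) == segments  # lossless round trip
```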
And 4, step 4: and carrying out preliminary decoding on the two paths of code streams to obtain an edge graph and key point auxiliary information.
And 5: for a decoder of a double-path code stream, an edge graph and corresponding key point auxiliary information are used as input and sent into a corresponding generation neural network (which can be a Pixel2Pixel network) to perform forward calculation of the network; for a decoder aiming at a visual drive compact representation code stream, an edge graph is taken as input and sent into a corresponding generation neural network, and forward calculation of the network is carried out.
Step 6: Compute a loss function between the result obtained in step 5 and the original image.
Step 7: Back-propagate the computed loss to every layer of the two generation networks to update their weights, so that the result moves closer to the target in the next iteration.
Step 8: Repeat steps 2-7 until the losses of both neural networks converge. This yields a decoder network for the two-way code stream and a decoder network for the vision-driven compact representation code stream.
Compared with the prior art, the invention has the following positive effects:
the invention is an expandable image lossy compression scheme, which not only ensures the visual quality of human eyes, but also ensures the performance of machine vision tasks. Unlike the traditional image lossy compression method, which only outputs a single code stream, the compression scheme in the invention generates two parts of code streams: a visually driven compact representation code stream and an auxiliary information code stream. Specifically, the method uses Bezier curve to represent the edge information of the image as a basic code stream, extracts key points in the image as a supplementary code stream on the basis, and uses the two code streams to represent the image, thereby efficiently compressing the image. In addition, the invention adopts the generated neural network model to construct a decoder, and respectively generates an image aiming at machine vision and an image aiming at human eye vision by inputting a basic code stream or jointly inputting a base and supplementing individual code streams, and the reconstruction quality of the two images achieves excellent effect.
The following data demonstrates the performance improvement of the method over the existing JPEG image compression method. The test measures the error rate of the different methods on a face key point detection task at extremely low bit rates, together with subjective human visual quality scores:
(The comparison table is provided as an image in the original publication.)
The invention therefore achieves better performance at a lower bit rate.
Drawings
Fig. 1 is a framework of an expandable man-machine cooperative image encoder.
Fig. 2 illustrates the key point auxiliary information extraction method on the vectorized image edge map:
(a) vectorized edge map; (b) straight line (>45°); (c) straight line (≤45°); (d) Bezier curve.
Detailed Description
For further explanation of the technical method of the present invention, the extendable man-machine cooperative image encoder of the present invention is further described in detail below with reference to the drawings and specific examples of the specification.
This example focuses on the encoder's encoding flow and on the training process of the decoder generation networks. Suppose the required decoder generation networks have been constructed and N training images {I_1, I_2, …, I_N} are available as training data.
I. Training process:
step 1: will { I1,I2,…,INThe vectorized graph of each image edge map in the graph is denoted as { E }1,E2,…,ENRecording auxiliary information of corresponding key points as { C }1,C2,…,CN}。
Step 2: according to FIG. 1, { E }1,E2,…,ENAnd { C }1,C2,…,CNAnd sending the data to a generating network for forward transmission. For a decoder-generated network for machine vision tasks, the input is only { E }1,E2,…,EN}。
And step 3: forward transfer to obtain output
Figure BDA0002351110570000052
Computing the output and { I }1,I2,…,INLoss error of.
And 4, step 4: and after the error value is obtained, performing back propagation of the error value on the network to train the network to update the model weight.
And 5: and repeating the steps 1-4 until the neural network converges.
II. Encoding and decoding process:
step 1: and extracting an edge map of the image I, and recording the edge map as E in a map memory after vectorization of the edge map by a Bezier curve.
Step 2: and extracting the auxiliary information of the key points according to the vectorized edge image. By traversing all of its segments, the keypoints are sampled according to its segment pattern. And recording the extracted key point auxiliary information as C.
And step 3: coding E according to Scalable Vector Graphics (SVG) format, and entropy coding with C to obtain two code streams respectively marked as BEAnd BC
And 4, step 4: and selecting a decoder to decode the code streams of different grades according to requirements. For machine vision tasks, only the decoder is required to decode BEAnd obtaining the vectorized edge image E. Inputting the image into corresponding network for forward transmission to obtain decoded image
Figure BDA0002351110570000061
For human eye vision tasks, decoding B is requiredEAnd BCObtaining E and C, sending the E and C into a corresponding generation network for forward transmission to obtain a decoded image
Figure BDA0002351110570000062
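Putting the decoding side together, a sketch of the level selection in step 4: entropy_decode follows the DEFLATE sketch above, rasterize is a hypothetical renderer that turns the vectorized segments or key point records into 1×1×H×W tensors, and the decoder modules follow the illustrative generator sketched earlier.

```python
import torch

def decode_for_task(task, B_E, B_C, machine_decoder, human_decoder,
                    entropy_decode, rasterize):
    """Scalable decoding: decode only B_E for machine-vision tasks, or both
    B_E and B_C for human-vision tasks.  The entropy_decode and rasterize
    helpers and the decoder modules are injected dependencies (illustrative)."""
    segments = entropy_decode(B_E)
    E = rasterize(segments)                         # 1 x 1 x H x W edge tensor
    if task == "machine":
        return machine_decoder(E)                   # reconstruction for machine vision
    keypoints = entropy_decode(B_C)
    C = rasterize(keypoints)                        # 1 x 1 x H x W auxiliary tensor
    return human_decoder(torch.cat([E, C], dim=1))  # reconstruction for human vision
```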
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A method for coding extensible man-machine cooperation images comprises the following steps:
1) extracting an edge map of each sample picture;
2) vectorizing the edge graph by using a Bezier curve to be used as a compact representation for driving a machine vision task; then, extracting key points from the vectorized edge image, and taking the extracted key points as auxiliary information;
3) respectively performing entropy coding lossless compression on the compact representation and the auxiliary information to obtain two paths of code streams;
4) preliminarily decoding the two paths of code streams to obtain an edge graph and auxiliary information;
5) for a task of generating a reconstructed image aiming at human vision, inputting an edge image obtained by decoding and auxiliary information into a neural network to perform forward calculation of the network; for a reconstructed image task aiming at machine vision, inputting an edge graph obtained by decoding into a generated neural network, and carrying out forward calculation on the network;
6) performing loss function calculation according to the calculation result obtained in the step 5) and the corresponding original picture, and reversely transmitting the calculated loss to a neural network for updating the network weight;
7) repeating the steps 2) -6) until the loss of the neural network is converged, and obtaining a double-path code stream decoder aiming at the human eye vision image reconstruction task or a compact representation code stream decoder aiming at the machine vision image reconstruction task;
8) for an image I to be processed, acquiring its edge image and auxiliary information, and coding and compressing them to obtain two paths of code streams, denoted B_E and B_C respectively;
9) selecting a double-path code stream decoder or a compact representation code stream decoder according to the task requirement to decode the received code stream, and reconstructing an image.
2. The method of claim 1, wherein the method of extracting the keypoints is: and if the vectorized line of the edge map is a straight line segment, extracting key points by using a straight line mode, otherwise, extracting key points by using a Bezier curve mode.
3. The method of claim 2, wherein the method of extracting key points in the straight-line mode is: if the included angle between the straight line segment and the horizontal is larger than a set angle, sampling two color values on the horizontal line through the midpoint of the segment, at equal distances to its left and right; if the angle is smaller than or equal to the set angle, sampling two color values on the vertical line through the midpoint, at equal distances above and below it; and the method of extracting key points in the Bezier curve mode is: recording the tangent point where the line parallel to the chord joining the start point and the end point of the Bezier curve touches the curve; if the included angle between this tangent line and the horizontal is larger than the set angle, sampling one color value inside the curve on the horizontal line through the tangent point; if it is smaller than or equal to the set angle, sampling one color value inside the curve on the vertical line through the tangent point.
4. The method of claim 3, wherein the set angle is 45 °.
5. The method of claim 1, wherein for a machine vision task, the code stream B_E corresponding to the edge map in step 8) is sent to the compact representation code stream decoder, and in step 9) the compact representation code stream decoder decodes B_E to obtain the vectorized edge map E and passes it forward to obtain the decoded image; and for a human vision task, the code streams B_E and B_C in step 8) are sent to the double-path code stream decoder, and in step 9) the double-path code stream decoder decodes B_E and B_C to obtain E and C and passes them forward to obtain the decoded image.
6. A method for generating training of a two-way code stream decoder comprises the following steps:
1) extracting an edge map of each sample picture;
2) vectorizing the edge graph by using a Bezier curve to be used as a compact representation for driving a machine vision task; then, extracting key points from the vectorized edge image, and taking the extracted key points as auxiliary information;
3) respectively performing entropy coding lossless compression on the compact representation and the auxiliary information to obtain two paths of code streams;
4) preliminarily decoding the two paths of code streams to obtain an edge graph and auxiliary information;
5) inputting the edge graph obtained by decoding and auxiliary information into a neural network to perform forward calculation of the network;
6) performing loss function calculation according to the calculation result obtained in the step 5) and the corresponding original picture, and reversely transmitting the calculated loss to a neural network for updating the network weight;
7) and repeating the steps 2) -6) until the loss of the neural network is converged, and obtaining a double-path code stream decoder aiming at the task of reconstructing the image by human vision.
7. A method for generating training of a compact representation code stream decoder comprises the following steps:
1) extracting an edge graph of each sample picture, and carrying out vectorization on the edge graph to be used as compact representation of a driving machine vision task;
2) performing entropy coding lossless compression on the compact representation to obtain a path of code stream;
3) carrying out preliminary decoding on the code stream to obtain an edge graph;
4) inputting the edge graph obtained by decoding into a neural network, and carrying out forward calculation on the network;
5) performing loss function calculation according to the calculation result obtained in the step 4) and the corresponding original picture, and reversely transmitting the calculated loss to a neural network for updating the network weight;
6) and repeating the steps 2) to 5) until the loss of the neural network is converged, and obtaining a compact representation code stream decoder aiming at the task of machine vision reconstruction image.
8. An extensible man-machine cooperative image coding system, characterized by comprising an encoder, a two-way code stream decoder and a compact representation code stream decoder; wherein:
an encoder for extracting an edge map of a picture; vectorizing the edge graph by utilizing a Bezier curve to serve as a compact representation for driving a machine vision task; then, extracting key points from the vectorized edge image, and taking the extracted key points as auxiliary information; then, respectively carrying out entropy coding lossless compression on the compact representation and the auxiliary information to obtain two paths of code streams;
the double-path code stream decoder is used for decoding the two paths of code streams to obtain an edge image and auxiliary information, and then transmitting the edge image and the auxiliary information obtained by decoding in the forward direction to obtain a decoded image for a human eye vision image reconstruction task;
and the compact representation code stream decoder is used for decoding the code stream corresponding to the edge map to obtain the edge map, and then transmitting the edge map obtained by decoding in the forward direction to obtain a decoded image for the task of reconstructing the image by machine vision.
9. The system of claim 8, wherein the method for training the two-way bitstream decoder comprises:
1) extracting an edge map of each sample picture;
2) vectorizing the edge graph by using a Bezier curve to be used as a compact representation for driving a machine vision task; then, extracting key points from the vectorized edge image, and taking the extracted key points as auxiliary information;
3) respectively performing entropy coding lossless compression on the compact representation and the auxiliary information to obtain two paths of code streams;
4) preliminarily decoding the two paths of code streams to obtain an edge graph and auxiliary information;
5) inputting the edge graph obtained by decoding and auxiliary information into a neural network to perform forward calculation of the network;
6) performing loss function calculation according to the calculation result obtained in the step 5) and the corresponding original picture, and reversely transmitting the calculated loss to a neural network for updating the network weight;
7) and repeating the steps 2) -6) until the loss of the neural network is converged, and obtaining a double-path code stream decoder aiming at the task of reconstructing the image by human vision.
10. The system of claim 9, wherein the method of training the compact representation bitstream decoder is:
1) extracting an edge map of each sample picture;
2) vectorizing an edge map as a compact representation of a driving machine vision task;
3) performing entropy coding lossless compression on the compact representation to obtain a path of code stream;
4) carrying out preliminary decoding on the code stream to obtain an edge graph;
5) inputting the edge graph obtained by decoding into a neural network, and carrying out forward calculation on the network;
6) performing loss function calculation according to the calculation result obtained in the step 5) and the corresponding original picture, and reversely transmitting the calculated loss to a neural network for updating the network weight;
7) and repeating the steps 2) to 6) until the loss of the neural network is converged, and obtaining a compact representation code stream decoder aiming at the task of machine vision reconstruction image.
CN201911415561.7A 2019-12-31 2019-12-31 Method and system for encoding extensible man-machine cooperative image and method for training decoder Active CN113132755B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911415561.7A CN113132755B (en) 2019-12-31 2019-12-31 Method and system for encoding extensible man-machine cooperative image and method for training decoder

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911415561.7A CN113132755B (en) 2019-12-31 2019-12-31 Method and system for encoding extensible man-machine cooperative image and method for training decoder

Publications (2)

Publication Number Publication Date
CN113132755A (en) 2021-07-16
CN113132755B (en) 2022-04-01

Family

ID=76770772

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911415561.7A Active CN113132755B (en) 2019-12-31 2019-12-31 Method and system for encoding extensible man-machine cooperative image and method for training decoder

Country Status (1)

Country Link
CN (1) CN113132755B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113949880A (en) * 2021-09-02 2022-01-18 北京大学 Extremely-low-bit-rate man-machine collaborative image coding training method and coding and decoding method


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160283645A1 (en) * 2004-02-25 2016-09-29 Mentor Graphics Corporation Fragmentation point and simulation site adjustment for resolution enhancement techniques
CN106846253A (en) * 2017-02-14 2017-06-13 深圳市唯特视科技有限公司 A kind of image super-resolution rebuilding method based on reverse transmittance nerve network
CN107610140A (en) * 2017-08-07 2018-01-19 中国科学院自动化研究所 Near edge detection method, device based on depth integration corrective networks
CN108364262A (en) * 2018-01-11 2018-08-03 深圳大学 A kind of restored method of blurred picture, device, equipment and storage medium
WO2019141258A1 (en) * 2018-01-18 2019-07-25 杭州海康威视数字技术股份有限公司 Video encoding method, video decoding method, device, and system
CN109255794A (en) * 2018-09-05 2019-01-22 华南理工大学 A kind of full convolution edge feature detection method of standard component depth
CN109920049A (en) * 2019-02-26 2019-06-21 清华大学 Marginal information assists subtle three-dimensional facial reconstruction method and system
US10482603B1 (en) * 2019-06-25 2019-11-19 Artificial Intelligence, Ltd. Medical image segmentation using an integrated edge guidance module and object segmentation network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ROBERT TORFASON et al.: "TOWARDS IMAGE UNDERSTANDING FROM DEEP COMPRESSION WITHOUT DECODING", ICLR 2018 *
YUEYU HU, JIAYING LIU et al.: "Real-Time Deep Image Super-Resolution via Global Context Aggregation and Local Queue Jumping", IEEE *
XIE Zhenzhu et al.: "Image super-resolution reconstruction with an edge-enhanced deep network", Journal of Image and Graphics *
JIA Chuanmin et al.: "Neural network based image and video coding", Telecommunications Science *


Also Published As

Publication number Publication date
CN113132755B (en) 2022-04-01

Similar Documents

Publication Publication Date Title
US11153566B1 (en) Variable bit rate generative compression method based on adversarial learning
US11276231B2 (en) Semantic deep face models
CN110290387B (en) Image compression method based on generative model
CN110225341A (en) A kind of code flow structure image encoding method of task-driven
CN108960333B (en) Hyperspectral image lossless compression method based on deep learning
CN109996073B (en) Image compression method, system, readable storage medium and computer equipment
CN103607591A (en) Image compression method combining super-resolution reconstruction
CN111669587A (en) Mimic compression method and device of video image, storage medium and terminal
CN110870310A (en) Image encoding method and apparatus
CN105392009B (en) Low bit rate image sequence coding method based on block adaptive sampling and super-resolution rebuilding
CN113259676A (en) Image compression method and device based on deep learning
CN113132727B (en) Scalable machine vision coding method and training method of motion-guided image generation network
WO2023143101A1 (en) Facial video encoding method and apparatus, and facial video decoding method and apparatus
CN113132735A (en) Video coding method based on video frame generation
CN105590296B (en) A kind of single-frame images Super-Resolution method based on doubledictionary study
CN112203098A (en) Mobile terminal image compression method based on edge feature fusion and super-resolution
CN114374846A (en) Video compression method, device, equipment and storage medium
Akbari et al. Learned multi-resolution variable-rate image compression with octave-based residual blocks
CN113132755B (en) Method and system for encoding extensible man-machine cooperative image and method for training decoder
He et al. Beyond coding: Detection-driven image compression with semantically structured bit-stream
CN113660386B (en) Color image encryption compression and super-resolution reconstruction system and method
CN112492313B (en) Picture transmission system based on generation countermeasure network
CN115880762B (en) Human-machine hybrid vision-oriented scalable face image coding method and system
WO2024032119A1 (en) Joint encoding method for multiple modality information sources
CN111479286B (en) Data processing method for reducing communication flow of edge computing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant