CN111953989A

CN111953989A - Image compression method and device based on combination of user interaction and semantic segmentation technology

Info

Publication number: CN111953989A
Application number: CN202010702091.9A
Authority: CN
Inventors: 高陈强; 朱俊; 陈旭; 冉洁; 叶盛; 陈志乾
Original assignee: Chongqing University of Post and Telecommunications
Current assignee: Chongqing University of Post and Telecommunications
Priority date: 2020-07-21
Filing date: 2020-07-21
Publication date: 2020-11-17

Abstract

The invention belongs to the technical field of image processing and computer vision, and particularly relates to an image compression method and device based on the combination of user interaction and semantic segmentation technology. The method comprises: preprocessing an image to be compressed with a semantic segmentation network to obtain a semantic segmentation of the image; dividing the image into image blocks of a plurality of categories according to the semantic segmentation; having the user set the compression quality for each category of image blocks; compressing the image blocks of each category at the set compression quality with a BPG (Better Portable Graphics) encoding tool to obtain an intermediate file; decoding the intermediate file with a BPG decoding tool to obtain decompressed image blocks; and combining the decompressed image blocks. The method can compress local areas of an image to different degrees as a specific scene requires, can automatically identify regions of interest and adjust the compression quality of different local areas of the image, and has considerable potential commercial and practical value.

Description

Image compression method and device based on combination of user interaction and semantic segmentation technology

Technical Field

The invention belongs to the technical field of image processing and computer vision, and particularly relates to an image compression method and device based on combination of user interaction and semantic segmentation technology.

Background

On the one hand, image compression is an important supporting technology in the field of information technology and a research hotspot in computer vision. Conventional image compression standards, such as JPEG and JPEG2000, usually produce image distortion, such as blocking effects and compression artifacts, when compressing at a low bit rate. BPG is a newer image format based on High Efficiency Video Coding (HEVC). Compared with JPEG and JPEG2000, BPG gives higher image quality at the same compression ratio, and it also supports lossless compression. However, BPG applies a single compression ratio to the whole image and cannot compress local regions of interest at different compression ratios, so some specific image compression requirements cannot be met.

On the other hand, image segmentation is a preprocessing step of many image processing tasks. Conventional image segmentation methods include threshold-based segmentation, region-based segmentation, watershed algorithms, and the like. However, the result of a conventional segmentation method carries no semantic information; that is, the algorithm itself does not know what each segmented region represents. With the development of deep learning, semantic segmentation based on deep learning can decompose the scene of an image into individual entities, classify each entity at pixel-level granularity, and mark an accurate boundary. This type of algorithm can separate the foreground and background of an image and identify the class to which each foreground pixel belongs. Further, by labeling and training on the categories of interest, the network can adaptively identify and segment regions of interest, and the result can be used to guide other high-level image processing tasks.

Therefore, in order to solve the problem of local compression of the image, the invention guides the image compression task by using the result of semantic segmentation to realize local compression with different compression qualities on different interest areas of the same image. The method can meet the specific requirements of local image compression, and has strong application value and wide market value.

Disclosure of Invention

In order to obtain a better compressed image, the invention provides an image compression method and device based on the combination of user interaction and semantic segmentation technology, wherein the method comprises the following steps:

inputting an image to be compressed, and preprocessing the image by utilizing a semantic segmentation network to obtain semantic segmentation of the image;

dividing the image into a plurality of categories of image blocks according to semantic segmentation;

the user sets the compression quality of each category of image blocks;

based on the set compression quality, compressing the image blocks of each category by using a BPG (Better Portable Graphics) encoding tool to obtain an intermediate file;

decoding the intermediate file by using a BPG decoding tool to obtain a decompressed image block;

and combining the decompressed image blocks to obtain a compression result of the original image.

Furthermore, the semantic segmentation network comprises a convolution encoding end, a convolution decoding end and a softmax layer, wherein the convolution encoding end is provided with three first convolution layers, and each first convolution layer comprises a convolution operation, a normalization operation, a pooling operation and an activation operation; the convolution decoding end, symmetric to the convolution encoding end, is provided with three second convolution layers, and each second convolution layer comprises a convolution operation, a normalization operation, an up-sampling operation and an activation operation.
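The symmetry between the encoding end and the decoding end can be illustrated by tracing the feature-map size through the network. The following is only a sketch under stated assumptions: the text does not give the pooling or up-sampling factor, so a factor of 2 per layer is assumed here, and the function name `trace_feature_sizes` is hypothetical.

```python
def trace_feature_sizes(height, width, stages=3, factor=2):
    """Return the (height, width) of the feature map after each stage.

    Assumption (not stated in the text): every pooling step shrinks the map
    by `factor` and every up-sampling step enlarges it by the same factor.
    """
    if height % factor ** stages or width % factor ** stages:
        raise ValueError("input size must be divisible by factor ** stages")
    sizes = [(height, width)]
    # Encoding end: three first convolution layers, each containing a pooling step.
    for _ in range(stages):
        height, width = height // factor, width // factor
        sizes.append((height, width))
    # Decoding end: three second convolution layers, each containing an up-sampling step.
    for _ in range(stages):
        height, width = height * factor, width * factor
        sizes.append((height, width))
    return sizes
```

Under these assumptions the last entry equals the first, which is what lets the softmax layer assign one class to every pixel of the input image.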

Further, the dividing the image block according to semantic division specifically includes: according to the semantic labels, a plurality of divided areas are aggregated into a specific number of areas by a k-means method, so that the boundary contour of the image area is continuous and smooth, and the boundary contour information is mapped to an original image, so that the image to be compressed is divided into different image blocks.

Further, aggregating the plurality of divided regions into a specific number of regions by a k-means method includes: aggregating the semantic information of the segmentation result into N clusters according to the Euclidean distance between the pixels of the image block, dividing the image into N regions based on the N clusters obtained after clustering, mapping the N regions to the image to be compressed so that the image to be compressed is divided into N regions, and storing the coordinate information of the upper left corner of each region in the image to be compressed.
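Storing the upper-left corner of each region can be sketched as follows. This is an illustration only: it assumes each region is cropped as the bounding rectangle of its cluster, which the text does not specify, and the name `region_bounding_boxes` is hypothetical.

```python
def region_bounding_boxes(cluster_map):
    """Map each cluster id to (top, left, bottom, right).

    `cluster_map` is a list of rows holding one cluster id per pixel.
    `bottom` and `right` are exclusive. Assumption: a region is represented
    by the bounding rectangle of all pixels carrying its cluster id.
    """
    boxes = {}
    for row_idx, row in enumerate(cluster_map):
        for col_idx, cluster_id in enumerate(row):
            if cluster_id not in boxes:
                boxes[cluster_id] = [row_idx, col_idx, row_idx + 1, col_idx + 1]
                continue
            box = boxes[cluster_id]
            box[0] = min(box[0], row_idx)
            box[1] = min(box[1], col_idx)
            box[2] = max(box[2], row_idx + 1)
            box[3] = max(box[3], col_idx + 1)
    return {cluster_id: tuple(box) for cluster_id, box in boxes.items()}
```

The first two values of each tuple are the upper-left coordinates that are later needed to put the decompressed block back in place.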

Furthermore, according to the coordinate information of the upper left corner in the image to be compressed of each area, the decompressed image blocks are placed in the corresponding positions of the original image areas to form a compressed result image with the size consistent with that of the original image.

The invention also provides an image compression device based on the combination of user interaction and semantic segmentation technology, which comprises a semantic segmenter, an image block clustering device, a custom compression quality module, a BPG encoder, a BPG decoder and a compressed image output device, wherein:

the semantic segmenter is used for performing semantic segmentation on an input image to be compressed, acquiring a boundary contour of the image and dividing the image into a plurality of image blocks according to the boundary contour;

the image block clustering device is used for further dividing the image blocks divided by the semantic segmenter and clustering the image blocks into N areas;

the user-defined compression quality module is used for the user to specify the compression quality of each area;

the BPG coder is used for compressing each area according to the compression quality specified by the user to produce an intermediate file;

the BPG decoder is used for decoding the intermediate file to obtain a decompressed image block;

and the compressed image output device is used for combining the decompressed image blocks to obtain a compression result of the original image and outputting the compression result to a user.

The invention has the beneficial effects that:

1) the invention provides an image compression method based on combination of user interaction and semantic segmentation technology, which can compress local regions of interest in an image with different compression qualities, so that the compressed image can locally adjust the compression quality, and meanwhile, the compressed image is more in line with human eye perception characteristics;

2) the method adopts a semantic segmentation technology based on deep learning, and the technology can adaptively extract boundary information of different categories, is used for segmenting the image interesting region and further guides a local region compression task.

Drawings

FIG. 1 is a schematic overall flow chart of a codec according to the present invention;

FIG. 2 is a schematic diagram of a semantic segmentation network;

FIG. 3 is a schematic diagram of a process of generating image blocks of different areas.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The invention provides an image compression method based on combination of user interaction and semantic segmentation technology, which comprises the following steps:

the user sets the compression quality of each category of image blocks;

Example 1

As shown in fig. 1, the processing of the picture in this embodiment includes two steps, namely, performing compression coding on the picture, and decoding the coded and compressed picture, and specifically includes:

(1) encoding process

Performing semantic segmentation on an image to be compressed, and further performing region division by using k-means to obtain region contour information of the image;

dividing the image into a plurality of interest areas according to the area outline;

the user specifies a compression quality for each region;

compressing the image blocks of each category by using a BPG coding tool to obtain an intermediate file;

(2) decoding process

Example 2

This example is described in addition to example 1.

The present embodiment uses the semantic segmentation network for preprocessing, so the network needs to be pre-trained. Here, an open-source data set is used as the training data set of the network. The data set is labeled with multiple categories (including a background category), such as people, cars and trees, and covers targets that commonly attract human attention; these targets are regarded as the objects of interest in this embodiment.

In order to obtain the boundary contour information of the target of interest, the image to be compressed is input to the pre-trained semantic segmentation network, whose structure is shown schematically in fig. 2. The convolution encoding end comprises three first convolution layers, each comprising a convolution operation, a normalization operation, an activation function and a pooling layer; deep features are extracted by successive convolution operations, and the pooling layers select the important features and reduce the feature size. The convolution decoding end adopts a network structure symmetric to the convolution encoding end and has three second convolution layers, each comprising a convolution operation, a normalization operation, an activation function and an upsampling layer; the upsampling layers enlarge the feature map. In other words, the decoding end uses the convolution layers to enrich the feature information and uses upsampling in place of pooling to increase the image resolution. Further, in order to obtain richer information, the features of each layer of the convolution encoding end are fused with the features of the corresponding layer of the symmetric convolution decoding end to obtain a feature layer with the size of the original image. This feature layer is sent to the softmax layer for pixel classification, which gives the probability of each category for every pixel, and the category with the maximum probability is selected as the category to which the pixel belongs.
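The final pixel-classification step (softmax followed by selecting the most probable category) can be sketched in plain Python. The helper names `softmax` and `classify_pixels` and the nested-list layout of the score map are illustrative assumptions; a real network would do this on tensors.

```python
import math


def softmax(scores):
    """Turn raw per-class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def classify_pixels(score_map):
    """Return a label map from a map of per-class scores.

    `score_map[row][col]` is a list with one score per category
    (an assumed layout). Each pixel gets the index of the category
    with the highest softmax probability.
    """
    label_map = []
    for row in score_map:
        label_row = []
        for scores in row:
            probabilities = softmax(scores)
            best = max(range(len(probabilities)), key=probabilities.__getitem__)
            label_row.append(best)
        label_map.append(label_row)
    return label_map
```

For example, a one-row score map with two pixels and three categories yields one category index per pixel.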

A schematic diagram of the method for generating image blocks of different regions of interest is shown in fig. 3. Since the semantic segmentation result is not necessarily accurate, considerable segmentation noise often appears, which makes the segmented areas too scattered and makes it difficult to extract an effective region of interest. Therefore, this embodiment uses a k-means clustering algorithm to aggregate the semantic information of the segmentation result into N clusters, where the aggregation is based on the Euclidean distance between the pixels of the segmentation result. The number N of cluster centers is a hyperparameter that can be set by the user and is smaller than the number of labels in the semantic segmentation result. The boundary contour information of the N regions is then extracted and mapped to the original image, dividing the original image into N regions; preferably, the output image in fig. 2 is divided into 5 clusters, i.e. 5 areas of different colors. Meanwhile, the coordinate information of the upper left corner of each area in the original image is saved for the subsequent combination.
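The clustering step can be sketched with a plain Lloyd's k-means. The text only states that the Euclidean distance between pixels of the segmentation result is used, so the per-pixel feature chosen here, (row, column, label), is an assumption, as are the names `pixel_features` and `kmeans` and the deterministic initialisation.

```python
import math


def pixel_features(label_map, label_weight=1.0):
    """Assumed feature per pixel: (row, column, weighted semantic label)."""
    return [
        (row_idx, col_idx, label_weight * label)
        for row_idx, row in enumerate(label_map)
        for col_idx, label in enumerate(row)
    ]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means. Returns (centers, assignments)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic initialisation: k points spread evenly through the input.
    step = len(points) / k
    centers = [tuple(points[int(i * step)]) for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center (Euclidean distance).
        assignments = [
            min(range(k), key=lambda c: math.dist(point, centers[c]))
            for point in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assignments
```

Calling `kmeans(pixel_features(label_map), N)` gives one cluster id per pixel in row-major order, which can be reshaped into a cluster map with N regions.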

For each region, the compression quality is specified by the user and may be any integer from 1 to 50; this value serves as a parameter of the BPG compression encoding. The compression quality reflects the degree of compression of the image. The larger the value, the larger the compression ratio and the poorer the quality of the compressed image; conversely, the smaller the value, the smaller the compression ratio and the better the quality of the compressed image.
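A user-supplied quality setting can be checked before encoding. The sketch below assumes regions are numbered 0 to N-1 and that the settings arrive as a dictionary; both choices, and the name `validate_region_quality`, are illustrative.

```python
def validate_region_quality(region_quality, num_regions, low=1, high=50):
    """Check that every region has an integer quality value in [low, high].

    `region_quality` maps a region index (0 .. num_regions - 1) to the value
    chosen by the user. A larger value means stronger compression.
    """
    missing = [r for r in range(num_regions) if r not in region_quality]
    if missing:
        raise ValueError(f"no compression quality set for regions {missing}")
    for region, quality in region_quality.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(quality, bool) or not isinstance(quality, int):
            raise ValueError(f"quality for region {region} must be an integer")
        if not low <= quality <= high:
            raise ValueError(
                f"quality for region {region} must be between {low} and {high}"
            )
    return dict(region_quality)
```

For instance, a region containing a person might be given a small value such as 10, and the background a large value such as 45.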

In the invention, the open-source libbpg tool is used: the specified compression quality parameter is passed to it, and each image block is compressed into an intermediate file. At the decoding end, the intermediate file is decoded by using the BPG decoding tool to obtain a decompressed image block, and the image blocks are combined according to the semantic segmentation result to obtain the compressed output picture.
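One way to drive the encoder and decoder is through the command-line programs that ship with libbpg. The sketch below only builds the argument lists. It assumes the programs are installed as `bpgenc` and `bpgdec` and that `-q` sets the quantizer and `-o` the output file, which should be checked against the help output of the installed version; the function names and file names are hypothetical.

```python
def bpg_encode_command(src_png, dst_bpg, quality, encoder="bpgenc"):
    """Build the argument list that compresses one image block.

    Assumption: `-q` is the quantizer parameter (larger = stronger
    compression) and `-o` names the output file.
    """
    return [encoder, "-q", str(quality), "-o", dst_bpg, src_png]


def bpg_decode_command(src_bpg, dst_png, decoder="bpgdec"):
    """Build the argument list that decodes one intermediate file."""
    return [decoder, "-o", dst_png, src_bpg]


def commands_for_regions(region_quality):
    """Return (encode, decode) command pairs, one pair per region.

    File names such as `region_0.png` are placeholders for illustration.
    """
    pairs = []
    for region, quality in sorted(region_quality.items()):
        block = f"region_{region}.png"
        intermediate = f"region_{region}.bpg"
        decoded = f"region_{region}_decoded.png"
        pairs.append(
            (
                bpg_encode_command(block, intermediate, quality),
                bpg_decode_command(intermediate, decoded),
            )
        )
    return pairs
```

Each list can be passed to `subprocess.run(command, check=True)` once the tools are available on the system.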

The decompressed image blocks are kept consistent with the input image blocks in size, and the decompressed image blocks are placed in the corresponding positions of the original image areas according to the stored coordinate information of the upper left corner of each area, so that a compressed result image with the size consistent with that of the original image can be combined.
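The recombination step can be sketched with nested lists standing in for pixel arrays. It assumes rectangular blocks; the name `paste_blocks` and the (top, left, block) tuple layout are illustrative.

```python
def paste_blocks(height, width, blocks):
    """Rebuild a full-size image from decompressed blocks.

    `blocks` is an iterable of (top, left, block), where `block` is a list
    of pixel rows and (top, left) is the stored upper-left corner.
    Assumption: blocks are rectangular; later blocks overwrite earlier
    ones where they overlap.
    """
    canvas = [[None] * width for _ in range(height)]
    for top, left, block in blocks:
        for offset, row in enumerate(block):
            if top + offset >= height or left + len(row) > width:
                raise ValueError("block does not fit inside the image")
            # Same-length slice assignment keeps every canvas row at `width`.
            canvas[top + offset][left:left + len(row)] = row
    return canvas
```

If the regions are irregular rather than rectangular, a per-region mask would additionally be needed so that only the pixels belonging to each region are copied.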

Example 3

The present embodiment provides an image compression apparatus based on a combination of user interaction and a semantic segmentation technology, the apparatus includes a semantic segmenter, an image block clusterer, a custom compression quality module, a BPG encoder, a BPG decoder, and a compressed image outputter, wherein:

In this embodiment, the image block clustering device has a similar structure to the semantic segmentation network in embodiment 1, and includes a convolution encoding end, a convolution decoding end, and a softmax layer, where the convolution encoding end is provided with three first convolution layers, and each first convolution layer includes a convolution operation, a normalization operation, a pooling operation, and an activation operation; the convolution decoding end, symmetric to the convolution encoding end, is provided with three second convolution layers, and each second convolution layer includes a convolution operation, a normalization operation, an up-sampling operation, and an activation operation.

In this embodiment, the image block clustering unit specifically operates to aggregate a plurality of partitioned areas into a specific number of areas by a k-means method according to semantic labels, so that the boundary contour of the image area is continuous and smooth, and the boundary contour information is mapped to the original image, thereby dividing the image to be compressed into different image blocks.

In this embodiment, aggregating a plurality of divided regions into a specific number of regions by a k-means method includes: aggregating the semantic information of the segmentation result into N clusters according to the Euclidean distance between the pixels of the image block, dividing the image into N regions based on the N clusters obtained after clustering, mapping the N regions to the image to be compressed so that the image to be compressed is divided into N regions, and storing the coordinate information of the upper left corner of each region in the image to be compressed.

In this embodiment, the compressed image output device puts the decompressed image blocks into the corresponding positions of the original image areas according to the coordinate information of the upper left corner in the image to be compressed in each area, and combines the decompressed image blocks into a compressed result image with the same size as the original image.

The invention does not carry out overall compression on the original image, but carries out compression with specified compression quality on the local part of the image, so that the compression quality of the compressed image can be locally adjusted, and a user can customize the local compression quality of the image to accord with the visual perception and the aesthetic feeling of the user, thereby having greater flexibility.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. The image compression method based on the combination of the user interaction and the semantic segmentation technology is characterized by comprising the following steps of:

the user sets the compression quality of each category of image blocks;

2. The image compression method based on the combination of the user interaction and the semantic segmentation technology as claimed in claim 1, wherein the semantic segmentation network comprises a convolution encoding end, a convolution decoding end and a softmax layer, the convolution encoding end is provided with three first convolution layers, and each first convolution layer comprises a convolution operation, a normalization operation, a pooling operation and an activation operation; the convolution decoding end, symmetric to the convolution encoding end, is provided with three second convolution layers, and each second convolution layer comprises a convolution operation, a normalization operation, an up-sampling operation and an activation operation.

3. The image compression method based on the combination of user interaction and semantic segmentation technology as claimed in claim 1, wherein dividing image blocks according to semantic segmentation specifically comprises: according to the semantic labels, a plurality of divided areas are aggregated into a specific number of areas by a k-means method, so that the boundary contour of the image area is continuous and smooth, and the boundary contour information is mapped to an original image, so that the image to be compressed is divided into different image blocks.

4. The image compression method based on the combination of the user interaction and the semantic segmentation technology as claimed in claim 3, wherein the aggregating of the plurality of segmented regions into a specific number of regions by a k-means method comprises: the semantic information obtained by the segmentation network is aggregated into N clusters according to the Euclidean distance between image pixels, the image is divided into N semantic regions based on the N clusters obtained after clustering, the N semantic regions are mapped to the image to be compressed, the image to be compressed is divided into N image regions, and the coordinate information of the upper left corner of each region in the image to be compressed is stored.

5. The image compression method based on the combination of the user interaction and the semantic segmentation technology as claimed in claim 4, wherein the decompressed image blocks are placed in the corresponding positions of the original image areas according to the coordinate information of the upper left corner in the image to be compressed of each area to form a compressed result image with the same size as the original image.

6. The image compression apparatus based on user interaction and semantic segmentation technology combination as claimed in claim 1, comprising a semantic segmenter, an image block clustering device, a custom compression quality module, a BPG encoder, a BPG decoder and a compressed image outputter, wherein:

and the compressed image output device is used for combining the decompressed image blocks to obtain a compression result of the original image and outputting the compression result to a user.

7. The image compression device based on the combination of the user interaction and the semantic segmentation technology as claimed in claim 6, wherein the image block clustering device comprises a convolution encoding end, a convolution decoding end and a softmax layer, the convolution encoding end is provided with three first convolution layers, and each first convolution layer comprises a convolution operation, a normalization operation, a pooling operation and an activation operation; the convolution decoding end, symmetric to the convolution encoding end, is provided with three second convolution layers, and each second convolution layer comprises a convolution operation, a normalization operation, an up-sampling operation and an activation operation.

8. The image compression device based on the combination of the user interaction and the semantic segmentation technology as claimed in claim 1, wherein the image block clusterer specifically operates to aggregate a plurality of segmented regions into a specific number of regions by a k-means method according to semantic tags, so that the image region boundary contour is continuous and smooth, and the boundary contour information is mapped to an original image, so as to divide the image to be compressed into different image blocks.

9. The image compression apparatus based on user interaction combined with semantic segmentation technology as claimed in claim 8, wherein aggregating a plurality of segmented regions into a certain number of regions by k-means method comprises: aggregating the semantic information of the segmentation result into N clusters according to the Euclidean distance between the pixels of the image block, dividing the image into N regions based on the N clusters obtained after clustering, mapping the boundary contours of the N regions to the image to be compressed so that the image to be compressed is divided into N regions, and storing the coordinate information of the upper left corner of each region in the image to be compressed.

10. The image compression apparatus based on the combination of user interaction and semantic segmentation technology as claimed in claim 9, wherein the compressed image output device puts the decompressed image blocks into the corresponding positions of the original image areas according to the coordinate information of the upper left corner in the image to be compressed of each area, and combines them into a compressed result image with the same size as the original image.