CN111193932A - Image processing method and device - Google Patents


Info

Publication number
CN111193932A
CN111193932A
Authority
CN
China
Prior art keywords
image
saliency
gray value
region
coded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911284776.XA
Other languages
Chinese (zh)
Inventor
陈海宝
孙浩然
刘奕晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Wanxiang Electronics Technology Co Ltd
Original Assignee
Xian Wanxiang Electronics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Wanxiang Electronics Technology Co Ltd filed Critical Xian Wanxiang Electronics Technology Co Ltd
Priority to CN201911284776.XA priority Critical patent/CN111193932A/en
Publication of CN111193932A publication Critical patent/CN111193932A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/30Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/37Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability with arrangements for assigning different transmission priorities to video input data or to video coded data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/182Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure provides an image processing method and apparatus, relating to the technical field of computer images. The method comprises: acquiring an image to be coded; generating a saliency region image according to a preset model, wherein the pixel points of the image to be coded correspond one-to-one to the pixel points of the saliency region image; dividing the image to be coded into N regions according to the saliency region image, wherein N is an integer greater than or equal to 2; and coding and transmitting the N regions respectively according to a preset priority order. With the method and apparatus, regions of high saliency can be processed preferentially in bandwidth-limited scenarios, improving user experience.

Description

Image processing method and device
Technical Field
The present disclosure relates to the field of computer image technologies, and in particular, to an image processing method and apparatus.
Background
With the development of multimedia technology and network technology, people have higher and higher requirements on the image display effect on terminal equipment.
To improve image transmission efficiency and user experience, the two widely used encoding modes for JPG images are baseline encoding and progressive encoding. With baseline encoding, the receiving end displays the image pixel by pixel, from top to bottom and from left to right; with progressive encoding, the receiving end first displays the whole outline of the image and then refines it from blurry to clear.
In scenarios with limited transmission bandwidth, users must wait a long time under either mode before the image becomes clear, resulting in a poor experience.
Disclosure of Invention
The embodiments of the present disclosure provide an image processing method and device that can preferentially process regions of high saliency in bandwidth-limited scenarios and thereby improve user experience. The technical scheme is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided an image processing method, including:
acquiring an image to be coded;
generating a saliency region image according to a preset model, wherein the pixel points of the image to be coded correspond one-to-one to the pixel points of the saliency region image;
dividing an image to be coded into N regions according to the saliency region image, wherein N is an integer greater than or equal to 2;
and respectively coding and transmitting the N areas according to a preset priority order.
In one embodiment, generating the saliency region image according to the preset model comprises:
extracting the salient region characteristics of each pixel point in the image to be coded according to a preset model;
quantizing the salient region feature value of each pixel point into a saliency gray value according to a preset rule;
and generating a saliency region image based on the gray values.
In one embodiment, dividing the image to be encoded into N regions according to the saliency region image comprises:
judging whether the significance gray value of each pixel point belongs to a gray value interval corresponding to the ith area, wherein each area corresponds to at least one gray value interval and 1 ≤ i ≤ N;
and if the significance gray value belongs to the gray value interval corresponding to the ith area, dividing the corresponding pixel point into the ith area.
In one embodiment, the pre-set model comprises a visual geometry group network VGGNet model.
In one embodiment, the encoding and transmitting the N regions according to a preset priority order respectively includes:
and coding and transmitting the N areas according to the sequence of gray value intervals from high to low.
In one embodiment, before acquiring the image to be encoded, the method further comprises:
monitoring the bandwidth and determining that the bandwidth meets a preset condition.
According to a second aspect of the embodiments of the present disclosure, there is provided an image processing apparatus including:
the acquisition module is used for acquiring an image to be coded;
the generating module is used for generating a saliency region image according to a preset model, wherein the pixel points of the image to be coded correspond one-to-one to the pixel points of the saliency region image;
the dividing module is used for dividing the image to be coded into N areas according to the saliency area image, wherein N is an integer greater than or equal to 2;
and the processing module is used for coding and sending the N areas according to the preset priority order.
In one embodiment, the generating module comprises:
the extraction submodule is used for extracting the salient region characteristics of each pixel point in the image to be coded according to a preset model;
the quantization submodule is used for quantizing the salient region feature value of each pixel point into a saliency gray value according to a preset rule;
and the generation submodule is used for generating a saliency area image based on the gray value.
In one embodiment, the partitioning module includes:
the judging submodule is used for judging whether the significance gray value of each pixel point belongs to the gray value interval corresponding to the ith area, wherein each area corresponds to at least one gray value interval and 1 ≤ i ≤ N;
and the dividing submodule is used for dividing the corresponding pixel point into the ith area if the significance gray value belongs to the gray value interval corresponding to the ith area.
In one embodiment, the pre-set model comprises a visual geometry group network VGGNet model.
In one embodiment, the processing module is specifically configured to:
and coding and transmitting the N areas according to the sequence of gray value intervals from high to low.
In one embodiment, the above apparatus further comprises:
and the determining module is used for monitoring the bandwidth before the image to be coded is obtained and determining that the bandwidth meets the preset condition.
With this image processing method, the sending end encodes and sends the N areas respectively according to the preset priority order, so that at the receiving end the user first sees the high-priority key areas of the image, while the non-key areas decoded from low-priority data are displayed later. In bandwidth-limited scenarios, the areas of high saliency are thus processed preferentially, improving user experience.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart of an image processing method provided by an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a saliency region image provided by an embodiment of the present disclosure;
fig. 3a is a schematic diagram illustrating an effect of an image processing method according to an embodiment of the disclosure;
fig. 3b is a schematic diagram illustrating an effect of an image processing method according to an embodiment of the disclosure;
fig. 4 is a structural diagram of an image processing apparatus provided in an embodiment of the present disclosure;
fig. 5 is a structural diagram of an image processing apparatus provided in an embodiment of the present disclosure;
fig. 6 is a structural diagram of an image processing apparatus provided in an embodiment of the present disclosure;
fig. 7 is a structural diagram of an image processing apparatus according to an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The present disclosure provides an image processing method and apparatus which can preferentially process regions of high saliency in bandwidth-limited scenarios and thereby improve user experience.
The embodiment of the present disclosure provides an image processing method, which is applied to an encoding end, and as shown in fig. 1, the image processing method includes the following steps:
step 101, obtaining an image to be coded;
Step 102, generating a saliency region image according to a preset model, wherein the pixel points of the image to be coded correspond one-to-one to the pixel points of the saliency region image;
in an optional embodiment, generating the saliency region image according to the preset model includes:
extracting the salient region characteristics of each pixel point in the image to be coded according to a preset model;
quantizing the salient region feature value of each pixel point into a saliency gray value according to a preset rule;
and generating a saliency region image based on the gray values.
Optionally, the preset model is a Visual Geometry Group network (VGGNet) model.
VGG16 is a deep convolutional network developed jointly by the Visual Geometry Group of the University of Oxford and DeepMind, and is often used to extract image features; trained on ImageNet, a large visual database built for visual object recognition research, VGG16 can recognize 1000 object categories. VGGNet here is a Convolutional Neural Network (CNN) whose structure is built on the VGG16 network; it performs saliency recognition on the objects in an image to generate a saliency region image. In this step, the sending end passes the picture through the trained object saliency detection neural network VGGNet, which performs saliency recognition on the objects in the picture and generates a saliency region image.
It should be noted that the saliency value output by the last fully connected layer of the VGGNet network lies in the interval [0, 1]; for easier understanding, it is quantized to the gray value interval [0, 255] and visualized as a grayscale map, as shown by the saliency region image in fig. 2.
Specifically, through the mapping of the saliency detection neural network, the pixel point at each position of the original image (for example, the pixel point (35, 25, 56) carrying the information of the three RGB channels) corresponds one-to-one in spatial position to a saliency value in the saliency region map (Map). To give an intuitive impression of the Map, the saliency values in it can be quantized to 0-255 (gray scale) and shown as a grayscale image, where higher saliency appears brighter, as shown by the Map in fig. 2. A grayscale image is a monochrome image with 256 gray levels from black to white; each pixel is represented by 8 bits of data, so its value falls on one of the 256 shades of gray between black and white. To understand the concept of saliency more deeply, the original image and the saliency region image can be combined to obtain a multi-level saliency region map as in fig. 2.
The saliency region image is essentially a matrix with values in the range [0, 255]: the saliency values output by VGGNet are quantized to 0-255, so the saliency region image stores the saliency information of every pixel point in the original image.
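For illustration, a minimal Python sketch of this step is given below (PyTorch and torchvision are assumed; the 1 × 1 prediction head, the upsampling, and the function names are hypothetical stand-ins, since the disclosure does not specify the trained network's exact layers). It produces a per-pixel saliency map in [0, 1] from a VGG16 feature trunk and quantizes it to the [0, 255] gray range:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models

class SaliencyNet(nn.Module):
    """VGG16 trunk plus a hypothetical 1x1 head producing per-pixel saliency."""
    def __init__(self):
        super().__init__()
        self.trunk = models.vgg16(weights=None).features  # (B, 512, H/32, W/32)
        self.head = nn.Conv2d(512, 1, kernel_size=1)      # assumed decoder head

    def forward(self, x):
        h, w = x.shape[-2:]
        s = torch.sigmoid(self.head(self.trunk(x)))       # saliency in [0, 1]
        # Upsample so each saliency value aligns one-to-one with an input pixel.
        return F.interpolate(s, size=(h, w), mode="bilinear", align_corners=False)

def saliency_gray_image(model: SaliencyNet, frame: torch.Tensor) -> torch.Tensor:
    """frame: uint8 tensor (3, H, W); returns a uint8 saliency gray map (H, W)."""
    x = frame.float().unsqueeze(0) / 255.0
    with torch.no_grad():
        s = model(x)[0, 0]
    return (s * 255.0).round().to(torch.uint8)  # quantize [0, 1] -> [0, 255]
```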
The present disclosure makes use of the Visual Attention (VA) mechanism, also called the saliency mechanism, which refers to the way a human facing a scene automatically processes regions of interest and selectively ignores regions of no interest; these regions of human interest are called saliency regions.
The saliency metric value of a saliency region is higher than that of a non-saliency region, where the saliency metric value is the value of a saliency metric parameter computed by a preset method.
Visual saliency detection refers to simulating the human visual attention mechanism by methods such as mathematical modeling, so as to compute the importance of the information in the visual field. In recent years, object detection and saliency detection on pictures using CNNs have developed greatly; this disclosure adopts an object saliency detection neural network, VGGNet, based on the VGG16 network to detect the saliency of objects in pictures. For each pixel point in the original image, the pre-trained VGGNet computes a saliency value according to the following formula (reproduced in the original only as an image):
[Formula — original figure BDA0002317700730000061]
The output value is taken as the saliency value and used as the standard for measuring saliency; finally, the saliency values are divided by thresholds, and the five highest value intervals are selected as five levels of saliency regions.
Step 103, dividing the image to be coded into N regions according to the saliency region image, wherein N is an integer greater than or equal to 2;
optionally, dividing the image to be encoded into N regions according to the saliency region image includes:
judging whether the significance gray value of each pixel point belongs to a gray value interval corresponding to the ith area, wherein each area corresponds to at least one gray value interval and 1 ≤ i ≤ N;
and if the significance gray value belongs to the gray value interval corresponding to the ith area, dividing the corresponding pixel point into the ith area.
In practical applications, the saliency gray values can be graded according to user requirements to form multi-level saliency regions.
For example, the saliency gray values ranging from 0 to 255 are divided into five intervals: [0, 50), [51, 100), [101, 150), [151, 200), and [201, 255]. The higher the saliency gray value, the more important that part of the image is to the user. These five intervals comprise key regions and non-key regions: there may be at least one key region, such as a primary key region and a secondary key region, and at least one non-key region, such as a three-level non-key region and a four-level non-key region. For instance, [151, 200) and [201, 255] can be determined as the secondary key region and the primary key region, while [0, 50), [51, 100), and [101, 150) can be determined as the five-level, four-level, and three-level non-key regions.
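A minimal sketch of this division, using NumPy and the five example intervals above (the function name and the exact interval edges follow the example and are not a normative definition):

```python
import numpy as np

# Lower edges of the example intervals [0,50), [51,100), [101,150),
# [151,200), [201,255]; np.digitize maps a gray value to bin 0..4.
EDGES = np.array([51, 101, 151, 201])

def divide_into_regions(gray_map: np.ndarray) -> np.ndarray:
    """gray_map: uint8 (H, W) of saliency gray values.
    Returns an int map (H, W) of levels 1..5, where level 1 is the
    primary key region (brightest) and level 5 the least salient."""
    idx = np.digitize(gray_map, EDGES)  # 0 = least salient .. 4 = most salient
    return 5 - idx                      # flip so that level 1 = most salient
```

np.digitize is used here simply because it assigns each pixel to its interval in one vectorized call; any equivalent per-pixel comparison against the interval bounds would do.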
Step 104, coding and transmitting the N areas respectively according to a preset priority order.
Optionally, the step specifically encodes and transmits the N regions according to the order of the gray value intervals from high to low.
Specifically, according to the N divided regions, N level layers are generated, which include a key layer and a non-key layer.
Specifically, the pixel points corresponding to the saliency values included in the key region may be attributed to the key layer, and the pixel points corresponding to the saliency values included in the non-key region may be attributed to the non-key layer.
If there is at least one key area, such as a primary key area and a secondary key area, the generated layers are a primary key layer and a secondary key layer; if there is at least one non-key area, such as a three-level non-key area and a four-level non-key area, the generated layers are a three-level non-key layer and a four-level non-key layer.
For example, the saliency value regions [151,200 ] and [201,255] are a secondary key region and a primary key region, then, the pixel points corresponding to the saliency value [151,200 ] are attributed to a secondary key layer, and the pixel points corresponding to the saliency value [201,255] are attributed to a primary key layer. The saliency value regions [0, 50), [51,100), [101,150) are five-level non-key regions, four-level non-key regions, and three-level non-key regions, so that the pixel points corresponding to the saliency value [0, 50) are attributed to five-level non-key layers, the pixel points corresponding to the saliency value [51,100) are attributed to four-level non-key layers, and the pixel points corresponding to the saliency value [101,150) are attributed to three-level non-key layers.
In this embodiment, the first-level key layer has the highest priority, followed by the second-level key layer and then the three-level and four-level non-key layers, with the five-level non-key layer having the lowest priority.
When resources are limited, the multi-level layers are encoded and transmitted in this priority order, and some parts of the non-key regions may even be discarded.
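One way this priority scheme could look in code is sketched below; `encode_region` and `send` are hypothetical placeholders for the codec and the transport, and the byte budget stands in for the "limited resources" mentioned above:

```python
def transmit_by_priority(image, level_map, budget_bytes, encode_region, send):
    """Encode and send levels 1..5 in priority order; once the budget is
    spent, the remaining (lowest-priority, non-key) levels are discarded."""
    spent = 0
    for level in range(1, 6):                        # level 1 first (highest priority)
        mask = level_map == level
        if not mask.any():
            continue
        payload = encode_region(image, mask, level)  # hypothetical codec call
        if spent + len(payload) > budget_bytes:
            break                                    # drop lower-priority layers
        send(payload)                                # hypothetical transport call
        spent += len(payload)
```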
The sending end divides the key layer into a plurality of key macro blocks and divides the non-key layer into a plurality of non-key macro blocks; and preferentially coding and transmitting the key macro block according to a preset transmission rule.
Because the key macro blocks have higher saliency, they can be transmitted preferentially, ensuring that they are received and displayed first at the receiving end, which achieves a good display effect and user experience.
Specifically, first, a key layer and a non-key layer need to be divided into a plurality of macro blocks, a macro block corresponding to the key layer is recorded as a key macro block, and a macro block corresponding to the non-key layer is recorded as a non-key macro block. Typically, each macroblock is 8 × 8 pixels or 16 × 16 pixels.
For example, a plurality of macro blocks obtained by dividing the first-level key layer are marked as first-level key macro blocks, a plurality of macro blocks obtained by dividing the second-level key layer are marked as second-level key macro blocks, a plurality of macro blocks obtained by dividing the third-level non-key layer are marked as third-level non-key macro blocks, a plurality of macro blocks obtained by dividing the fourth-level non-key layer are marked as fourth-level non-key macro blocks, and a plurality of macro blocks obtained by dividing the fifth-level non-key layer are marked as fifth-level non-key macro blocks.
The preset transmission rule refers to the coding transmission sequence of the macro blocks, for example, the transmission sequence is: a first-level key macro block, a second-level key macro block, a third-level non-key macro block, a fourth-level non-key macro block and a fifth-level non-key macro block.
That is, the key macro blocks are encoded first and the non-key macro blocks afterwards; the encoded key macro blocks are then transmitted first, followed by the encoded non-key macro blocks. The preset transmission rule can be set according to user requirements.
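A sketch of the macroblock flow just described, again with `encode_block` and `send` as hypothetical placeholders; each 16 × 16 block inherits the priority level of its most salient pixel, and blocks are emitted level by level:

```python
def macroblocks_by_priority(image, level_map, block=16):
    """Yield (level, y, x, pixels) in transmission order: all first-level key
    macro blocks first, then level 2, ..., with level 5 last."""
    h, w = level_map.shape
    order = []
    for y in range(0, h, block):
        for x in range(0, w, block):
            # Level 1 is most salient, so min() picks the block's highest priority.
            order.append((int(level_map[y:y+block, x:x+block].min()), y, x))
    for level, y, x in sorted(order):
        yield level, y, x, image[y:y+block, x:x+block]

def transmit_macroblocks(image, level_map, encode_block, send):
    for level, y, x, pixels in macroblocks_by_priority(image, level_map):
        send(encode_block(pixels, level, (y, x)))  # hypothetical codec/transport
```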
And the receiving end decodes and displays the key macro block received earlier, and then decodes and displays the received non-key macro block.
Specifically, the receiving end decodes and displays the received first-level key macro block and the second-level key macro block in sequence, and then decodes and displays the received third-level non-key macro block, fourth-level non-key macro block and fifth-level non-key macro block in sequence.
The disclosure provides an image processing method based on significance, which can determine a key area in an image to be transmitted through a neural network model, wherein the key area is usually an area which is focused by a user in the image to be transmitted; then, the sending end carries out preferential coding and transmission on the key area; then, the receiving end can preferentially decode and display the key region received earlier.
Therefore, the user can see the key regions of most concern first at the receiving end, while the non-key regions of less concern are displayed afterwards. Compared with the two existing coding modes, the coding mode provided by the present disclosure can preferentially satisfy the user's key needs in scenarios with limited transmission bandwidth, improving user experience.
Fig. 3a and 3b are schematic diagrams illustrating the effect of an image processing method provided by an embodiment of the present disclosure. For example, the jellyfish in the ocean scene of fig. 3a has higher saliency relative to the ocean; in the computer desktop image of fig. 3b, the icons attract the human eye more than the background desktop and therefore have higher saliency. The regions of high saliency in the picture, namely the key regions, are identified, then encoded preferentially and transmitted to the receiving end, so that they are displayed first at the receiving end, which improves user experience.
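Tying the sketches above together, an end-to-end run at the sending end could look like this (all names come from the earlier sketches and remain hypothetical; the dummy frame and the toy "codec" merely make the example runnable):

```python
import torch

frame = torch.randint(0, 256, (3, 480, 640), dtype=torch.uint8)  # dummy image

model = SaliencyNet()                       # untrained here; real weights assumed
gray = saliency_gray_image(model, frame)    # step 102: saliency gray map
levels = divide_into_regions(gray.numpy())  # step 103: five-level region map

sent = []
encode_block = lambda px, lv, pos: bytes([lv]) + px.tobytes()  # toy "codec"
transmit_macroblocks(frame.permute(1, 2, 0).numpy(), levels,
                     encode_block, sent.append)                # step 104
print(f"{len(sent)} macro blocks queued, key blocks first")
```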
Fig. 4 is a structural diagram of an image processing apparatus according to an embodiment of the present disclosure, and the image processing apparatus 40 shown in fig. 4 includes: an acquisition module 401, a generation module 402, a division module 403 and a processing module 404,
an obtaining module 401, configured to obtain an image to be encoded;
a generating module 402, configured to generate a saliency region image according to a preset model, wherein the pixel points of the image to be coded correspond one-to-one to the pixel points of the saliency region image;
a dividing module 403, configured to divide an image to be encoded into N regions according to the saliency region image, where N is an integer greater than or equal to 2;
and the processing module 404 is configured to encode and send the N regions according to a preset priority order.
Fig. 5 is a structural diagram of an image processing apparatus according to an embodiment of the present disclosure, in which the generating module 402 includes:
the extraction submodule 4021 is used for extracting the salient region characteristics of each pixel point in the image to be encoded according to a preset model;
the quantization submodule 4022 is configured to quantize the saliency region feature value of each pixel into a saliency gray value according to a preset rule;
the generating sub-module 4023 is configured to generate a saliency region image based on the grayscale value.
Fig. 6 is a structural diagram of an image processing apparatus according to an embodiment of the present disclosure, in which the dividing module 403 includes:
the judgment submodule 4031 is configured to judge whether the significance gray value of each pixel point belongs to the gray value interval corresponding to the ith area, wherein each area corresponds to at least one gray value interval and 1 ≤ i ≤ N;
the dividing submodule 4032 is configured to divide the corresponding pixel point into an ith area if the saliency grayscale value belongs to a grayscale value interval corresponding to the ith area.
In one embodiment, the pre-set model comprises a visual geometry group network VGGNet model.
In one embodiment, the processing module is specifically configured to:
and coding and transmitting the N areas according to the sequence of gray value intervals from high to low.
Fig. 7 is a structural diagram of an image processing apparatus according to an embodiment of the present disclosure, in which the apparatus further includes a determining module 400, configured to monitor a bandwidth before acquiring an image to be encoded, and determine that the bandwidth satisfies a preset condition.
Based on the image processing method described in the embodiment corresponding to fig. 1, an embodiment of the present disclosure further provides a computer-readable storage medium, for example, the non-transitory computer-readable storage medium may be a Read Only Memory (ROM), a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like. The storage medium stores computer instructions for executing the image processing method described in the embodiment corresponding to fig. 1, which is not described herein again.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (10)

1. An image processing method, characterized in that the method comprises:
acquiring an image to be coded;
generating a saliency region image according to a preset model, wherein the pixel points of the image to be coded correspond one-to-one to the pixel points of the saliency region image;
dividing an image to be coded into N regions according to the saliency region image, wherein N is an integer greater than or equal to 2;
and respectively coding and transmitting the N areas according to a preset priority order.
2. The image processing method according to claim 1, wherein generating the saliency region image according to a preset model comprises:
extracting the salient region characteristics of each pixel point in the image to be coded according to a preset model;
quantizing the salient region feature value of each pixel point into a saliency gray value according to a preset rule;
and generating a saliency region image based on the gray values.
3. The image processing method according to claim 2, wherein dividing the image to be encoded into N regions according to the saliency region image comprises:
judging whether the significance gray value of each pixel point belongs to a gray value interval corresponding to the ith area, wherein each area corresponds to at least one gray value interval and 1 ≤ i ≤ N;
and if the significance gray value belongs to the gray value interval corresponding to the ith area, dividing the corresponding pixel point into the ith area.
4. The image processing method according to claim 1, wherein the preset model comprises a visual geometry group network (VGGNet) model.
5. The image processing method according to claim 4, wherein said encoding and transmitting the N regions respectively in a priority order set in advance comprises:
and coding and transmitting the N areas according to the sequence of gray value intervals from high to low.
6. The image processing method according to any one of claims 1 to 5, wherein before acquiring the image to be encoded, the method further comprises:
monitoring the bandwidth and determining that the bandwidth meets a preset condition.
7. An image processing apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring an image to be coded;
the generating module is used for generating a saliency region image according to a preset model, wherein the pixel points of the image to be coded correspond one-to-one to the pixel points of the saliency region image;
the dividing module is used for dividing the image to be coded into N areas according to the saliency area image, wherein N is an integer greater than or equal to 2;
and the processing module is used for coding and sending the N areas according to the preset priority order.
8. The image processing apparatus according to claim 7, wherein the generation module includes:
the extraction submodule is used for extracting the salient region characteristics of each pixel point in the image to be coded according to a preset model;
the quantization submodule is used for quantizing the salient region feature value of each pixel point into a saliency gray value according to a preset rule;
and the generation submodule is used for generating a saliency area image based on the gray value.
9. The image processing apparatus according to claim 8, wherein the dividing module includes:
the judging submodule is used for judging whether the significance gray value of each pixel point belongs to the gray value interval corresponding to the ith area, wherein each area corresponds to at least one gray value interval and 1 ≤ i ≤ N;
and the dividing submodule is used for dividing the corresponding pixel point into the ith area if the significance gray value belongs to the gray value interval corresponding to the ith area.
10. The image processing apparatus according to claim 9, wherein the processing module is specifically configured to:
and coding and transmitting the N areas according to the sequence of gray value intervals from high to low.
CN201911284776.XA 2019-12-13 2019-12-13 Image processing method and device Pending CN111193932A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911284776.XA CN111193932A (en) 2019-12-13 2019-12-13 Image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911284776.XA CN111193932A (en) 2019-12-13 2019-12-13 Image processing method and device

Publications (1)

Publication Number Publication Date
CN111193932A true CN111193932A (en) 2020-05-22

Family

ID=70709222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911284776.XA Pending CN111193932A (en) 2019-12-13 2019-12-13 Image processing method and device

Country Status (1)

Country Link
CN (1) CN111193932A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101262604A (en) * 2008-04-23 2008-09-10 哈尔滨工程大学 A telescopic video coding method for optimized transmission of interested area
CN101420618A (en) * 2008-12-02 2009-04-29 西安交通大学 Adaptive telescopic video encoding and decoding construction design method based on interest zone
CN102186067A (en) * 2011-03-31 2011-09-14 深圳超多维光电子有限公司 Image frame transmission method, device, display method and system
CN105721879A (en) * 2016-01-26 2016-06-29 北京空间飞行器总体设计部 Region-of-interest transmission method under segment protection of deep space exploration image
WO2019106336A2 (en) * 2017-11-29 2019-06-06 Displaylink (Uk) Limited Managing display data
CN108337515A (en) * 2018-01-19 2018-07-27 浙江大华技术股份有限公司 A kind of method for video coding and device
CN109151479A (en) * 2018-08-29 2019-01-04 南京邮电大学 Significance extracting method based on H.264 compression domain model with feature when sky
CN109309834A (en) * 2018-11-21 2019-02-05 北京航空航天大学 Video-frequency compression method based on convolutional neural networks and the significant information of HEVC compression domain
CN110310343A (en) * 2019-05-28 2019-10-08 西安万像电子科技有限公司 Image processing method and device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022127865A1 (en) * 2020-12-18 2022-06-23 中兴通讯股份有限公司 Video processing method, apparatus, electronic device, and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200522)