CN110310343B - Image processing method and device

Info

Publication number
CN110310343B
CN110310343B
Authority
CN
China
Prior art keywords
image
key
target image
key area
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910451850.6A
Other languages
Chinese (zh)
Other versions
CN110310343A (en)
Inventor
陈海宝
孙浩然
刘奕晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Wanxiang Electronics Technology Co Ltd
Original Assignee
Xian Wanxiang Electronics Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Wanxiang Electronics Technology Co Ltd filed Critical Xian Wanxiang Electronics Technology Co Ltd
Priority to CN201910451850.6A
Publication of CN110310343A
Application granted
Publication of CN110310343B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00Image coding
    • G06T9/002Image coding using neural networks

Abstract

The disclosure provides an image processing method and apparatus, relates to the technical field of image processing, and can solve the problem that image processing in the prior art cannot achieve both high-quality display and low occupied bandwidth. The specific technical solution is as follows: a target image is input into a preset saliency detection neural network to determine a key region and a non-key region of the target image; the image corresponding to the key region is encoded to obtain its image data, and the image data corresponding to the non-key region is generated by a preset generator of a generative adversarial network. The present disclosure is useful for balancing high-quality display with low bandwidth usage in image processing.

Description

Image processing method and device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method and apparatus.
Background
Images are a primary channel through which people acquire information. As technology develops, image quality grows ever higher and the corresponding image data volume ever larger; because transmission bandwidth is limited, an image generally has to be compressed before it is transmitted.
Existing image compression falls into two broad categories: lossy and lossless. The current industry mainstream is lossy algorithms such as JPEG, whose design goal is to shrink the file size as much as possible without affecting the image quality that humans can resolve, which means removing part of the original image information. Lossless algorithms such as PNG-24 use a direct-color bitmap; the data volume after lossless compression is roughly five times that of lossy compression, while the improvement in display quality is relatively small.
Current image processing schemes therefore face the following problem: lossy compression displays poorly, while lossless compression occupies high bandwidth, so with limited transmission bandwidth, image processing cannot achieve both high-quality display and low occupied bandwidth.
Disclosure of Invention
The embodiments of the present disclosure provide an image processing method and apparatus, which can solve the problem that image processing in the prior art cannot achieve both high-quality display and low occupied bandwidth. The technical solution is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided an image processing method, the method including:
inputting the target image into a preset saliency detection neural network to determine a key region and a non-key region of the target image;
and encoding the image corresponding to the key region to obtain image data corresponding to the key region, and generating image data corresponding to the non-key region through a preset generator of a generative adversarial network.
The key region and the non-key region of the target image are determined by the saliency detection neural network; the image of the non-key region is generated by the generative adversarial network, which yields a better display effect with a smaller data volume, while the image of the key region is spared high-compression-ratio processing. Under limited transmission bandwidth, image processing can thus achieve both high-quality display and low occupied bandwidth.
In one embodiment, inputting the target image into a preset saliency detection neural network to determine the key region and the non-key region of the target image includes:
inputting the target image into a saliency detection neural network to obtain a saliency map corresponding to the target image;
and determining a key region and a non-key region of the target image according to the saliency map.
In one embodiment, encoding the image corresponding to the key region to obtain the image data corresponding to the key region includes:
processing the image corresponding to the key region with a lossless compression algorithm to obtain the image data corresponding to the key region.
Processing the key-region image with a lossless compression algorithm effectively guarantees the quality of the partial image content the user focuses on, improving the user experience.
In one embodiment, determining the key region and the non-key region of the target image according to the saliency map includes:
processing the saliency map according to a preset threshold, wherein pixels whose gray value in the saliency map is greater than the preset threshold are assigned 1, and pixels whose gray value is less than or equal to the preset threshold are assigned 0;
and performing a matrix dot-product operation on the processed saliency map and the target image to determine the key region, and determining the non-key region according to the key region and the target image.
In one embodiment, the data type in the convolutional neural networks contained in the saliency detection neural network and the generative adversarial network is 16-bit fixed point, and sparse matrices in the saliency detection neural network and the generative adversarial network are stored as follows: the non-zero values in the sparse matrix are stored in a preset order, the non-zero positions are then marked as 1 so that the matrix is converted into a binary matrix, and the binary matrix is stored.
Simplifying the data type in the convolutional neural networks and the storage of the sparse matrices effectively saves storage space and increases processing speed.
In one embodiment, the saliency detection neural network is built based on a VGG convolutional neural network.
According to a second aspect of the embodiments of the present disclosure, there is provided an image processing apparatus including:
the determining module is used for inputting the target image into a preset saliency detection neural network to determine a key region and a non-key region of the target image;
the processing module is used for encoding the image corresponding to the key region to obtain image data corresponding to the key region, and for generating image data corresponding to the non-key region through a preset generator of a generative adversarial network.
The key region and the non-key region of the target image are determined by the saliency detection neural network; the image of the non-key region is generated by the generative adversarial network, which yields a better display effect with a smaller data volume, while the image of the key region is spared high-compression-ratio processing. Under limited transmission bandwidth, image processing can thus achieve both high-quality display and low occupied bandwidth.
In one embodiment, the determining module is specifically configured to:
inputting the target image into a saliency detection neural network to obtain a saliency map corresponding to the target image;
and determining a key region and a non-key region of the target image according to the saliency map.
In one embodiment, the processing module is specifically configured to:
and processing the image corresponding to the key region by adopting a lossless compression algorithm to obtain image data corresponding to the key region.
And the image of the key region is processed by adopting a lossless compression algorithm, so that the image quality of partial image content focused by a user can be effectively ensured, and the user experience is improved.
In one embodiment, the determining module is specifically configured to include:
processing the saliency map according to a preset threshold; wherein, the pixel point with the gray value larger than the preset threshold value in the saliency map is assigned 1, and the pixel point with the gray value smaller than or equal to the preset threshold value in the saliency map is assigned 0;
and performing matrix point multiplication operation on the processed saliency map and the target image to determine a key region, and determining a non-key region according to the key region and the target image.
In one embodiment, the significance detection neural network and the generation of a 16-bit fixed point for the data type in the convolutional neural network contained in the countermeasure network; the significance detection neural network and the generation of sparse matrices in the antagonism network are stored as follows: and storing non-zero values in the sparse matrix according to a preset sequence, and then marking the non-zero values as 1 to be converted into a binary matrix and storing the binary matrix.
By simplifying the data types in the convolutional neural network and the data storage of the sparse matrix, the storage space can be effectively saved and the processing speed can be increased.
In one embodiment, the significance detection neural network is built based on a VGG convolutional neural network.
According to the image processing method and apparatus of the present disclosure, the key region and the non-key region of the target image are determined by the saliency detection neural network; the image of the non-key region is generated by the generative adversarial network, which yields a better display effect with a smaller data volume, while the image of the key region is spared high-compression-ratio processing, so that under limited transmission bandwidth, image processing can achieve both high-quality display and low occupied bandwidth.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic flow chart of an image processing method according to an embodiment of the disclosure;
FIG. 2 is a schematic diagram illustrating saliency in an embodiment of the present disclosure;
FIG. 3 is a schematic illustration of a saliency map provided by embodiments of the present disclosure;
FIG. 4 is a flow diagram of image generation by the generative adversarial network in an embodiment of the disclosure;
FIG. 5 is a schematic diagram of an implementation of an image processing method according to an embodiment of the disclosure;
fig. 6 is an effect schematic diagram of an image processing method provided in an embodiment of the present disclosure;
FIG. 7 is a schematic diagram depicting sparse matrix storage in an embodiment of the present disclosure;
fig. 8 is a schematic structural view of an image processing apparatus according to an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
An embodiment of the present disclosure provides an image processing method applied to an image sending device, for example, a server, a mobile terminal, and other devices, as shown in fig. 1, where the image processing method includes the following steps:
101. inputting the target image into a preset saliency detection neural network to determine a key area and a non-key area of the target image.
Visual saliency (Visual Attention, VA) refers to the fact that, when facing a scene, humans automatically process the regions they are interested in, called the salient regions, while selectively ignoring the regions they are not interested in. For example, the jellyfish in the ocean of fig. 2 is more salient than the ocean around it.
Visual saliency detection means using a convolutional neural network (Convolutional Neural Networks, CNN) to simulate the human visual attention mechanism and compute the importance of the information in the visual field.
In one embodiment of the present disclosure, the saliency detection neural network may be built based on a VGG convolutional neural network. The VGG convolutional neural network has the advantage of a simplified network structure; building the saliency detection neural network on it improves training efficiency during the training stage and performs well in practical application. Specifically, in the embodiments of the present disclosure, VGG16 may be used to construct the saliency detection neural network for saliency detection on images. VGG16 is a deep convolutional neural network developed jointly by the Visual Geometry Group of the University of Oxford and DeepMind; it is often used to extract image features, is trained on ImageNet, a large visual database for visual object recognition research, and can recognize 1000 object classes.
It should be noted that, after the saliency detection neural network of the embodiments of the present disclosure is built, it must be trained on a large number of images before it achieves the expected effect. The specific training method and process can be understood with reference to the related art and are not detailed in the embodiments of the present disclosure.
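For illustration, a minimal sketch of how such a saliency network might be assembled on a VGG16 backbone follows. The decoder head, its channel counts, and the use of torchvision's pretrained VGG16 are assumptions made for the example, not the network of the embodiments; the sigmoid output and the [0,255] quantization match the description below.
```python
import torch
import torch.nn as nn
import torchvision.models as models

class SaliencyNet(nn.Module):
    """Sketch: per-pixel saliency from a VGG16 feature extractor."""
    def __init__(self):
        super().__init__()
        # Reuse the VGG16 convolutional layers as the encoder (assumption:
        # torchvision's ImageNet-pretrained weights stand in for training).
        self.backbone = models.vgg16(weights="IMAGENET1K_V1").features
        # Hypothetical head: upsample back to input resolution and squash
        # each pixel to a [0, 1] saliency value with a sigmoid.
        self.head = nn.Sequential(
            nn.Conv2d(512, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return self.head(self.backbone(x))

net = SaliencyNet().eval()
image = torch.rand(1, 3, 224, 224)            # dummy target image
with torch.no_grad():
    saliency = net(image)                     # values in [0, 1]
saliency_map = (saliency * 255).byte()        # quantized to [0, 255]
```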
In one embodiment, step 101 may specifically include:
1011. inputting the target image into a saliency detection neural network to obtain a saliency map corresponding to the target image.
The saliency value of a salient region is higher than that of a non-salient region. The saliency value is the value of the saliency measurement parameter, computed by a preset method, and the saliency map can be obtained from the saliency values.
Specifically, after the target image is input into the saliency detection neural network, the sigmoid function in the last fully connected layer of the VGG convolutional neural network outputs, for each pixel of the target image, a saliency value in the interval [0,1], which is quantized to [0,255]: the higher the saliency value, the brighter the pixel in the gray-scale map, as in the saliency map of fig. 3. A gray-scale map is a single-color image with 256 gray levels from black to white; each pixel is represented by 8 bits of data, so pixel values fall among the 256 gray levels between black and white.
1012. And determining a key region and a non-key region of the target image according to the saliency map.
In one embodiment, step 1012 may specifically include:
1012a, processing the saliency map according to a preset threshold; pixels whose gray value in the saliency map is greater than the preset threshold are assigned 1, and pixels whose gray value is less than or equal to the preset threshold are assigned 0.
The saliency map is a gray-scale map obtained by quantizing the saliency values that the saliency detection neural network outputs for the target image to 0-255; it stores the saliency information of every pixel of the target image and essentially corresponds to a matrix whose values lie in the [0,255] interval.
Specifically, the saliency map is processed according to the preset threshold: every pixel whose gray value is above the threshold is assigned 1, and every pixel whose gray value is at or below the threshold is assigned 0.
It should be noted that different preset thresholds change the size of the key region. The higher the threshold, the stricter the key-region judgment and the smaller the key region. As the threshold decreases, more pixels around the salient content of the image are classified as key pixels and the key region grows. A larger key region lets the decoder side reconstruct a higher-quality picture, but more data must be preserved losslessly or compressed at higher quality, so the image storage grows.
1012b, performing a matrix dot-product operation on the processed saliency map and the target image to determine the key region, and determining the non-key region according to the key region and the target image.
Specifically, after the saliency map is processed in step 1012a, each element of the corresponding matrix is 0 or 1. Taking the dot product of the processed saliency-map matrix and the target-image matrix yields the key-region matrix, which determines the key region; subtracting the key-region matrix from the target-image matrix yields the non-key-region matrix, which determines the non-key region. Note that when the preset threshold in step 1012a is 255, every element of the saliency-map matrix is 0, and after the dot product the whole target image is determined to be non-key. The non-key region can likewise be determined directly by a dot-product operation; in the embodiments of the present disclosure, the key/non-key regions are chosen according to actual needs, and the high-saliency area may serve as either the key region or the non-key region.
It should also be noted that multi-level key regions can be formed for the target image by grading the saliency values. For example, the saliency range 0-255 can be divided into five intervals: [0,50], [51,100], [101,150], [151,200], and [201,255]. The gray-value thresholds can be set according to this division, e.g. 0, 50, 100, 150, or 200. In step 1012a the saliency map can be processed with each of these 5 thresholds, and in step 1012b the dot products of the target-image matrix with the 5 processed saliency-map matrices determine a 5-level key region and the corresponding 5-level non-key region, as the sketch below also illustrates.
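As a concrete illustration of steps 1012a and 1012b, a minimal numpy sketch follows; the threshold value, array shapes and function name are illustrative assumptions.
```python
import numpy as np

def split_regions(target_image: np.ndarray, saliency_map: np.ndarray,
                  threshold: int = 128):
    """Split a target image into key / non-key regions via its saliency map.

    target_image: H x W (gray) or H x W x 3 array
    saliency_map: H x W array with gray values in [0, 255]
    """
    # Step 1012a: binarize the saliency map against the preset threshold
    # (greater than threshold -> 1, less than or equal -> 0).
    mask = (saliency_map > threshold).astype(target_image.dtype)
    if target_image.ndim == 3:            # broadcast the mask over channels
        mask = mask[:, :, np.newaxis]
    # Step 1012b: the element-wise (dot) product keeps only the key region;
    # subtracting it from the target image leaves the non-key region.
    key_region = target_image * mask
    non_key_region = target_image - key_region
    return key_region, non_key_region

img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
sal = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
key, non_key = split_regions(img, sal, threshold=128)
```
With threshold 255 every mask entry is 0 and the whole image falls into the non-key region, matching the note above.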
102. Encoding the image corresponding to the key region to obtain the image data corresponding to the key region, and generating the image data corresponding to the non-key region through a preset generator of a generative adversarial network.
Generative adversarial networks (Generative Adversarial Networks, GAN) were proposed by Ian Goodfellow in his 2014 paper; the idea originates from game theory. A GAN consists of a Generator and a Discriminator. The problem a GAN solves is how to learn new samples from training samples: if the training samples are pictures, it generates new pictures; if they are articles, it outputs new articles; and so on.
Specifically, after the key region, i.e. the region the user focuses on, the high-saliency region of the target image, is determined, the corresponding image part is encoded losslessly or with a low compression ratio, preserving higher quality and more detail. Because a GAN generates images from the distribution of the image data, the result agrees better with human perception: both image-quality evaluation metrics and subjective visual similarity can exceed current image compression standards such as JPEG, JPEG2000, WebP and PNG. At the same image size, compared with high-compression-ratio encoding, a GAN offers the potential advantage of better subjective quality and closer resemblance to the original as judged by the human eye, so the image data of the non-key region can be generated by the GAN. Preferably, if conditions allow, the image corresponding to the key region is processed in step 102 with a lossless compression algorithm.
The training goal of a GAN is for the generator to fool the discriminator as far as possible. The generator and the discriminator oppose each other and continually adjust their parameters, with two final aims: 1) the discriminator cannot judge whether the output of the generator network is real; 2) the generator of the generative adversarial network can produce pictures good enough to pass for real.
A GAN must be trained on a large number of sample images. The training process is as follows: a sample image passes through the saliency detection neural network of the embodiments of the present disclosure to determine its key and non-key regions, and an encoder and a quantizer produce a feature map for the sample image. The first-generation generator randomly generates some very poor images from the feature map of the key area, which are fed to the first-generation discriminator; the discriminator, acting as a binary classifier, can classify them accurately against the original images, outputting 0 for generated images far from the original and 1 for generated images close to it. The generator's parameters are then trained along the descent of the objective function so that the second-generation generator produces better images that the first-generation discriminator recognizes as real, i.e. outputs 1; the discriminator's parameters are then trained so that the second-generation discriminator judges the pictures from the second-generation generator as fake. The third and fourth rounds proceed likewise. As training iterates, the generator's weights change so that the pixel distribution of the generated image approaches that of the original, until the discriminator can no longer distinguish generated from original, i.e. the network has fit; at that point the discriminator outputs 0.5 for both, and that round of training ends. A constraint on the size of the generated image is also added during training, so that the generated image is smaller than the original, thereby achieving image compression.
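Purely for illustration, the alternating update described above might be sketched in PyTorch as below. The tiny network bodies, the binary cross-entropy loss and the optimizers are assumptions; the saliency-guided conditioning and the generated-image size constraint of the embodiment are elided.
```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: G decodes a feature map into an image,
# D scores an image as original (1) or generated (0).
G = nn.Sequential(
    nn.ConvTranspose2d(8, 16, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid())
D = nn.Sequential(
    nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(16, 1, 4, stride=2, padding=1), nn.Flatten(),
    nn.Linear(64, 1), nn.Sigmoid())   # 1 x 8 x 8 flattened -> 64

bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.rand(4, 3, 32, 32)   # stand-in for original sample images
feat = torch.rand(4, 8, 8, 8)     # stand-in for quantized feature maps

for step in range(100):
    # Train D: output 1 for originals, 0 for generated images.
    fake = G(feat).detach()
    loss_d = bce(D(real), torch.ones(4, 1)) + bce(D(fake), torch.zeros(4, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Train G: make D label the generated image as real (1).
    loss_g = bce(D(G(feat)), torch.ones(4, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
# Training has converged when D outputs about 0.5 for both kinds of input.
```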
The process of generating an image with the trained GAN is as follows: as shown in fig. 4, the original image passes through an encoder and a quantizer to obtain the corresponding feature map; storing this feature map is what yields the actual compressed size, and the feature map is sent to the generator for decoding (the decoder in fig. 4 is the generator; the decoder's decoding process is the generator's process) to restore the image. The feature map obtained after encoding (convolution) and quantization stores the information of the image, and decoding it with the decoder (deconvolution) yields an image close to the original.
In addition, the storage of the quantized feature map from this intermediate step is the compressed image size. When the whole image is compressed with the GAN network, the compressed storage size S_GAN is:
S_GAN = (W / N) × (H / N) × C × L (unit: bit)
Referring to fig. 4, W is the width of the picture to be compressed, H is its height, N is the downscaling factor (the factor by which the feature map's width/height shrink after encoding and quantization), C is the number of channels of the feature map, and L is the number of quantization bits (for example, before quantization each weight parameter is stored as a 32-bit float, i.e. 32 bits; with a quantization order of 8 the per-value storage becomes log2 8 = 3 bits). The storage size of the generated image is controlled by adjusting N, C and L. Entropy coding such as arithmetic coding can further be applied to the feature map to reduce storage and obtain a higher compression ratio.
For the additional high-saliency regions, their positions in the original image can be recorded with a binary encoding for superposition. The size of the saliency region must be added on top; its maximum storage size S_key is:
S_key = Sum(key pixel) × log2(D_key) (unit: bit)
where Sum(key pixel) is the number of pixels in the key (higher-saliency) region and D_key is the pixel depth of each pixel. Since the key picture can itself be compressed with lossless or other lossy compression, its storage size is generally smaller than S_key. The image size S under the technical scheme of the embodiments of the present disclosure therefore satisfies the inequality:
S ≤ S_GAN + S_key
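Plugging illustrative numbers (assumptions, not values from the embodiments) into the two formulas gives a quick feel for the bound:
```python
import math

W, H = 1920, 1080        # width / height of the picture to compress
N = 16                   # downscaling factor of the feature map
C = 8                    # feature-map channels
L = math.log2(8)         # 8 quantization levels -> 3 bits per value

S_GAN = (W / N) * (H / N) * C * L          # bits for the whole-image GAN path
key_pixels = 50_000                        # assumed key-region pixel count
D_key = 256                                # pixel depth: 256 values -> 8 bits
S_key = key_pixels * math.log2(D_key)      # upper bound for the key region

bound = S_GAN + S_key                      # S <= S_GAN + S_key
print(f"S_GAN = {S_GAN/8/1024:.1f} KiB, S_key <= {S_key/8/1024:.1f} KiB, "
      f"S <= {bound/8/1024:.1f} KiB")
```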
It should be noted that when the trained GAN is actually used, the discriminator may be omitted and only the generator retained, since the generator has already learned to generate images; alternatively the discriminator may be kept, and the embodiments of the present disclosure do not limit this. In addition, different requirements may be placed on the data volume after processing the target image. If a higher compression ratio is required, the key region can be shrunk, i.e. a smaller key region is selected: for example, in the multi-level saliency regions of fig. 5, only the figure at the center of the kayak is compressed losslessly or at a low compression ratio, and the rest is generated by the GAN. If a lower compression ratio is acceptable, the key region can be enlarged, i.e. a larger key region is selected: in the multi-level saliency regions of fig. 5, the figure and the kayak are compressed losslessly or at a low compression ratio, and the rest is generated by the GAN. Clearly the latter leaves a smaller area to GAN compression, so more superimposed picture data is stored and the compression ratio is lower. In practical applications, as shown in fig. 5, different key regions can be delimited so that the key-region size can be selected according to the compression-ratio requirement.
It is also worth noting that a GAN network generates backgrounds with high quality, but small objects with a large amount of detail are not effectively preserved; the object's information may not even be recognizable. As shown in fig. 6, fig. 6a shows the original image and the corresponding salient region (not the saliency map; it is drawn this way to present the visually salient content more intuitively). Fig. 6b is a picture generated directly by the GAN: the detail views show that the tall buildings at the far end of the road lose a great amount of detail and the roadside parking sign "P" cannot be generated properly, yet objects such as buildings and signs often carry high information content. Fig. 6c shows the image obtained with the technical scheme provided by the embodiments of the present disclosure: the information-rich salient regions of the image are identified first and compressed with high-quality retention or at a low compression ratio, and the rest is generated by the GAN at a high compression ratio. Compared with the directly GAN-generated image of fig. 6b, after the high-saliency region is preserved at high quality, the detail views of fig. 6c show that the tall building at the far end of the road retains rich detail and the roadside parking sign "P" is recognizable.
In one embodiment, the data type in the convolutional neural networks contained in the saliency detection neural network and the generative adversarial network is 16-bit fixed point, and sparse matrices in the saliency detection neural network and the generative adversarial network are stored as follows: the non-zero values in the sparse matrix are stored in a preset order, the non-zero positions are then marked as 1 so that the matrix is converted into a binary matrix, and the binary matrix is stored.
Specifically, the weights of the CNNs used in the saliency detection neural network and the generative adversarial network are parameter-quantized (for example, the 32-bit floating-point data type is converted to 16-bit fixed point), which effectively saves storage while guaranteeing the basic training effect. For the sparse matrices appearing in the saliency detection neural network and the generative adversarial network, a new storage scheme is adopted. As shown in fig. 7, each value in a 4×4 sparse matrix is 16-bit integer data, occupying 16×4×4 = 256 bits in total. A binary matrix records the non-zero positions in a specified order, occupying 4×4 = 16 bits; the non-zero values themselves are recorded in the same order, each as 16-bit integer data, occupying 16×5 = 80 bits for five non-zero values. Storing the matrix with the new scheme thus saves 256 − (16 + 80) = 160 bits, a 62.5% reduction in matrix storage.
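A numpy sketch of this storage scheme, reproducing the fig. 7 arithmetic (row-major order stands in for the "specified order"; the matrix values are invented), might look like this:
```python
import numpy as np

# 4x4 sparse matrix of 16-bit integers: dense storage = 16 * 4 * 4 = 256 bits.
m = np.array([[0, 7, 0, 0],
              [0, 0, 3, 0],
              [9, 0, 0, 0],
              [0, 2, 0, 5]], dtype=np.int16)

mask = m != 0                    # binary position matrix: 4 * 4 = 16 bits
values = m[mask]                 # non-zeros in row-major order: 5 * 16 = 80 bits

dense_bits = m.size * 16
sparse_bits = mask.size * 1 + values.size * 16
print(dense_bits, sparse_bits)   # 256, 96 -> 160 bits saved, a 62.5% reduction

# Lossless reconstruction: scatter the values back into the masked positions.
restored = np.zeros_like(m)
restored[mask] = values
assert (restored == m).all()
```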
It is worth mentioning that compressing the neural network model makes it easier to port to portable terminals such as mobile phones and increases the running speed of the network. Compressing and accelerating the neural-network-based image compression model improves the model's portability and its compression rate (pictures compressed per second).
In one embodiment, the method may further include:
103. Sending the processed key-region image data and the generated non-key-region image data to an image receiving device.
Specifically, adding the matrix corresponding to the key-region image data to the matrix corresponding to the generated non-key-region image data yields the complete image-data matrix, i.e. the finished image data. The image sending device may combine the two parts into the finished image data and send it to the image receiving device, which then decodes it to obtain a high-quality image of the target image content; alternatively, the image sending device may send the two parts directly, and the image receiving device combines them after reception into the finished image data and then decodes it to obtain the high-quality image of the target image content.
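Because the key-region and non-key-region matrices come from complementary masks of the same target image, the recombination at either end is a plain element-wise addition; a tiny numpy sketch (with invented 2×2 data) is:
```python
import numpy as np

key = np.array([[0, 12], [0, 0]], dtype=np.uint8)      # decoded key region
non_key = np.array([[5, 0], [7, 9]], dtype=np.uint8)   # GAN-generated region
full_image = key + non_key                             # finished image data
```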
The image receiving device may be a mobile terminal, a server, or the like, and in this embodiment of the present disclosure, the image sending device and the image receiving device are a way of functionally dividing the devices, and in practical application, a certain device may be an image sending device in one scene and may be an image receiving device in another scene.
According to the image processing method provided by the embodiments of the present disclosure, the key region and the non-key region of the target image are determined by the saliency detection neural network; the image of the non-key region is generated by the generative adversarial network, which yields a better display effect with a smaller data volume, while the image of the key region is spared high-compression-ratio processing, so that under limited transmission bandwidth, image processing can achieve both high-quality display and low occupied bandwidth.
Based on the image processing method described in the above corresponding embodiment, the following is an embodiment of the apparatus of the present disclosure, which may be used to execute the above embodiment of the method of the present disclosure.
An embodiment of the present disclosure provides an image processing apparatus, as shown in fig. 8, the image processing apparatus 80 including:
a determining module 801, configured to input a target image into a preset saliency detection neural network to determine a key region and a non-key region of the target image;
a processing module 802, configured to encode the image corresponding to the key region to obtain the image data corresponding to the key region, and to generate the image data corresponding to the non-key region through a preset generator of a generative adversarial network.
The key region and the non-key region of the target image are determined by the saliency detection neural network; the image of the non-key region is generated by the generative adversarial network, which yields a better display effect with a smaller data volume, while the image of the key region is spared high-compression-ratio processing. Under limited transmission bandwidth, image processing can thus achieve both high-quality display and low occupied bandwidth.
In one embodiment, the determining module 801 is specifically configured to:
inputting the target image into a saliency detection neural network to obtain a saliency map corresponding to the target image;
and determining a key region and a non-key region of the target image according to the saliency map.
In one embodiment, the processing module 802 is specifically configured to:
and processing the image corresponding to the key region by adopting a lossless compression algorithm to obtain image data corresponding to the key region.
And the image of the key region is processed by adopting a lossless compression algorithm, so that the image quality of partial image content focused by a user can be effectively ensured, and the user experience is improved.
In one embodiment, the determining module 801 is specifically configured to include:
processing the saliency map according to a preset threshold; wherein, the pixel point with the gray value larger than the preset threshold value in the saliency map is assigned 1, and the pixel point with the gray value smaller than or equal to the preset threshold value in the saliency map is assigned 0;
and performing matrix point multiplication operation on the processed saliency map and the target image to determine a key region, and determining a non-key region according to the key region and the target image.
In one embodiment, the significance detection neural network and the generation of a 16-bit fixed point for the data type in the convolutional neural network contained in the countermeasure network; the significance detection neural network and the generation of sparse matrices in the antagonism network are stored as follows: and storing non-zero values in the sparse matrix according to a preset sequence, and then marking the non-zero values as 1 to be converted into a binary matrix and storing the binary matrix.
By simplifying the data types in the convolutional neural network and the data storage of the sparse matrix, the storage space can be effectively saved and the processing speed can be increased.
In one embodiment, the significance detection neural network is built based on a VGG convolutional neural network.
The VGG convolutional neural network has the advantages that the structure of the neural network is simplified, the training efficiency can be improved in the training stage by building the saliency detection neural network based on the VGG convolutional neural network, and a good effect is achieved in practical application.
According to the image processing device provided by the embodiment of the disclosure, the key area and the non-key area of the target image are determined through the saliency detection neural network, the image of the non-key area is generated by adopting the image with better display effect generated by the countermeasure network and smaller data volume, and the image of the key area is not subjected to compression processing with high compression ratio, so that the processing of the image can be realized with both high-quality display and low occupied bandwidth under the condition of limited transmission bandwidth.
Based on the image processing method described in the above embodiment corresponding to fig. 1, the embodiment of the present disclosure further provides a computer readable storage medium, for example, a non-transitory computer readable storage medium may be a Read Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like. The storage medium stores computer instructions for executing the image processing method described in the corresponding embodiment of fig. 1, which is not described herein.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following the general principles thereof and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

Claims (4)

1. An image processing method, the method comprising:
inputting a target image into a preset saliency detection neural network to determine a key area and a non-key area of the target image;
encoding the image corresponding to the key area to obtain the image data corresponding to the key area, and generating the image data corresponding to the non-key area through a preset generator of a generative adversarial network;
inputting the target image into a preset saliency detection neural network to determine a key area and a non-key area of the target image, wherein the method comprises the following steps:
inputting the target image into the saliency detection neural network to obtain a saliency map corresponding to the target image;
determining a key area and a non-key area of the target image according to the saliency map;
the encoding the image corresponding to the key area to obtain the image data corresponding to the key area includes:
processing the image corresponding to the key area with a lossless compression algorithm to obtain the image data corresponding to the key area;
the determining the key area and the non-key area of the target image according to the saliency map comprises the following steps:
processing the saliency map according to a preset threshold, wherein pixels of the saliency map whose gray value is greater than the preset threshold are assigned 1, and pixels whose gray value is less than or equal to the preset threshold are assigned 0;
and performing matrix dot product operation on the processed saliency map and the target image to determine the key region, and determining the non-key region according to the key region and the target image.
2. The method of claim 1, wherein the data type in the convolutional neural networks included in the saliency detection neural network and the generative adversarial network is 16-bit fixed point; sparse matrices in the saliency detection neural network and the generative adversarial network are stored as follows: the non-zero values in the sparse matrix are stored in a preset order, and the non-zero positions are then marked as 1 so that the matrix is converted into a binary matrix and stored.
3. An image processing apparatus, characterized in that the apparatus comprises:
the determining module is used for inputting the target image into a preset saliency detection neural network to determine a key area and a non-key area of the target image;
the processing module is used for encoding the image corresponding to the key area to obtain the image data corresponding to the key area, and for generating the image data corresponding to the non-key area through a preset generator of a generative adversarial network;
the determining module is specifically configured to:
inputting the target image into the saliency detection neural network to obtain a saliency map corresponding to the target image;
determining a key area and a non-key area of the target image according to the saliency map;
the processing module is specifically configured to:
process the image corresponding to the key region with a lossless compression algorithm to obtain the image data corresponding to the key region;
the determining module is specifically configured to include:
process the saliency map according to a preset threshold, wherein pixels of the saliency map whose gray value is greater than the preset threshold are assigned 1, and pixels whose gray value is less than or equal to the preset threshold are assigned 0;
and perform a matrix dot-product operation on the processed saliency map and the target image to determine the key region, and determine the non-key region according to the key region and the target image.
4. The apparatus of claim 3, wherein the type of data in the saliency detection neural network and the convolutional neural network included in the generation countermeasure network is a 16-bit fixed point; the significance detection neural network and the generating sparse matrix in the antagonism network are stored as follows: and storing non-zero values in the sparse matrix according to a preset sequence, and then marking the non-zero values as 1 to be converted into a binary matrix and storing the binary matrix.
CN201910451850.6A 2019-05-28 2019-05-28 Image processing method and device Active CN110310343B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910451850.6A CN110310343B (en) 2019-05-28 2019-05-28 Image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910451850.6A CN110310343B (en) 2019-05-28 2019-05-28 Image processing method and device

Publications (2)

Publication Number Publication Date
CN110310343A CN110310343A (en) 2019-10-08
CN110310343B (en) 2023-10-03

Family

ID=68075231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910451850.6A Active CN110310343B (en) 2019-05-28 2019-05-28 Image processing method and device

Country Status (1)

Country Link
CN (1) CN110310343B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111193932A (en) * 2019-12-13 2020-05-22 西安万像电子科技有限公司 Image processing method and device
CN111027576B (en) * 2019-12-26 2020-10-30 郑州轻工业大学 Cooperative significance detection method based on cooperative significance generation type countermeasure network
CN111563517B (en) * 2020-04-20 2023-07-04 腾讯科技(深圳)有限公司 Image processing method, device, electronic equipment and storage medium
CN114650421A (en) * 2020-12-18 2022-06-21 中兴通讯股份有限公司 Video processing method and device, electronic equipment and storage medium
CN112651940B (en) * 2020-12-25 2021-09-17 郑州轻工业大学 Collaborative visual saliency detection method based on dual-encoder generation type countermeasure network
CN113450313B (en) * 2021-06-04 2022-03-15 电子科技大学 Image significance visualization method based on regional contrast learning
CN113362315B (en) * 2021-06-22 2022-09-30 中国科学技术大学 Image quality evaluation method and evaluation model based on multi-algorithm fusion
CN114445437A (en) * 2021-12-29 2022-05-06 福建慧政通信息科技有限公司 Image compression clipping method of license picture and storage medium
CN114937019B (en) * 2022-05-30 2022-12-23 杭州健培科技有限公司 Key point detection method and device based on self-adaptive local gray scale balance and application
CN115396670A (en) * 2022-07-28 2022-11-25 西安空间无线电技术研究所 Image data compression method for local area processing
CN115841486B (en) * 2023-02-20 2023-04-25 深圳市特安电子有限公司 Gas perception infrared image processing method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107016409A (en) * 2017-03-20 2017-08-04 华中科技大学 A kind of image classification method and system based on salient region of image
CN109300128A (en) * 2018-09-29 2019-02-01 聚时科技(上海)有限公司 The transfer learning image processing method of structure is implied based on convolutional Neural net

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11003995B2 (en) * 2017-05-19 2021-05-11 Huawei Technologies Co., Ltd. Semi-supervised regression with generative adversarial networks

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107016409A (en) * 2017-03-20 2017-08-04 华中科技大学 A kind of image classification method and system based on salient region of image
CN109300128A (en) * 2018-09-29 2019-02-01 聚时科技(上海)有限公司 The transfer learning image processing method of structure is implied based on convolutional Neural net

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Multi-pose face image frontalization method based on encoder-decoder networks; Xu Haiyue et al.; Scientia Sinica Informationis (中国科学:信息科学); 2019-04-30 (No. 04); full text *

Also Published As

Publication number Publication date
CN110310343A (en) 2019-10-08

Similar Documents

Publication Publication Date Title
CN110310343B (en) Image processing method and device
CN111798400B (en) Non-reference low-illumination image enhancement method and system based on generation countermeasure network
Guarda et al. Point cloud coding: Adopting a deep learning-based approach
US10582217B2 (en) Methods and apparatuses for coding and decoding depth map
US8260067B2 (en) Detection technique for digitally altered images
CN104284190B (en) Compressed image steganography encoding method based on AMBTC high and low average optimization
US10965948B1 (en) Hierarchical auto-regressive image compression system
CN111433821A (en) Method and apparatus for reconstructing a point cloud representing a 3D object
US11893762B2 (en) Method and data processing system for lossy image or video encoding, transmission and decoding
CN113784129A (en) Point cloud quality evaluation method, encoder, decoder and storage medium
JP2023507968A (en) Method and apparatus in video coding for machines
CN104428793A (en) Method for transforming image descriptor based on gradient histogram and relative image processing apparatus
Alam et al. An improved JPEG image compression algorithm by modifying luminance quantization table
Jallouli et al. A preprocessing technique for improving the compression performance of JPEG 2000 for images with sparse or locally sparse histograms
CN116233445A (en) Video encoding and decoding processing method and device, computer equipment and storage medium
CN114897189A (en) Model training method, video coding method and decoding method
Löhdefink et al. Focussing learned image compression to semantic classes for V2X applications
CN115442609A (en) Characteristic data encoding and decoding method and device
Kekre et al. Image Reconstruction using Fast Inverse Half tone and Huffman Coding Technique
CN116600119A (en) Video encoding method, video decoding method, video encoding device, video decoding device, computer equipment and storage medium
CN111193932A (en) Image processing method and device
Hu et al. Improved color image coding schemes based on single bit map block truncation coding
CN113518229A (en) Method and device for training loop filter network, computer equipment and storage medium
CN106375768B (en) Video steganalysis method based on intra prediction mode calibration
CN107124559A (en) A kind of communication data compression method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant