CN113096202A - Image compression method and device, electronic equipment and computer readable storage medium
- Publication number
- CN113096202A (application number CN202110341971.2A)
- Authority
- CN
- China
- Prior art keywords
- image
- model
- loss
- target
- compressed
- Legal status: Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/90—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
- H04N19/91—Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
Abstract
The application discloses an image compression method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: acquiring an image to be compressed containing a target object and an image compression model, wherein the image compression model is obtained based on first reconstruction loss training, the first reconstruction loss is obtained by weighted summation of one or more pixel value differences at target co-located points, the weight of each pixel value difference is positively correlated with the probability that the corresponding target co-located point belongs to the region covered by the target object, a target co-located point is a co-located point in a training image and a decompressed image, and the decompressed image is an image obtained by processing the training image with the image compression model; and processing the image to be compressed with the image compression model to obtain a compressed image of the image to be compressed.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image compression method and apparatus, an electronic device, and a computer-readable storage medium.
Background
As the hardware configuration of photographing devices improves, the data volume of the images they capture grows ever larger, which puts great pressure on image storage. Therefore, how to compress images effectively is of great significance.
Disclosure of Invention
The application provides an image compression method and device, electronic equipment and a computer readable storage medium.
In a first aspect, an image compression method is provided, the method comprising:
acquiring an image to be compressed containing a target object and an image compression model, wherein the image compression model is obtained based on first reconstruction loss training, the first reconstruction loss is obtained by weighted summation of one or more pixel value differences at target co-located points, the weight of each pixel value difference is positively correlated with the probability that the corresponding target co-located point belongs to the region covered by the target object, a target co-located point is a co-located point in a training image and a decompressed image, and the decompressed image is an image obtained by processing the training image with the image compression model;
and processing the image to be compressed by using the image compression model to obtain a compressed image of the image to be compressed.
With reference to any embodiment of the present application, the acquiring of the image compression model, which is obtained based on the first reconstruction loss training described above, includes:
acquiring a model to be trained, the training image and a target object area image of the training image, wherein the target object area image comprises the probability that pixel points in the training image belong to an area covered by the target object;
processing the training image by using the model to be trained to obtain the decompressed image;
calculating pixel value differences between co-located points in the decompressed image and the training image to obtain a pixel value difference set;
taking the pixel value in the target object area image as the weight of the corresponding pixel value difference, and carrying out weighted summation on the pixel value differences in the pixel value difference set to obtain the first reconstruction loss;
obtaining the loss of the model to be trained according to the first reconstruction loss;
and updating the parameters of the model to be trained according to the loss of the model to be trained to obtain the image compression model.
With reference to any embodiment of the present application, the processing the training image by using the model to be trained to obtain the decompressed image includes:
coding the training image by using the model to be trained, and extracting the feature data of the target object in the training image as target feature data;
decoding the target characteristic data by using the model to be trained to obtain the decompressed image;
before obtaining the loss of the model to be trained according to the first reconstruction loss, the method further includes:
acquiring reference characteristic data of the target object in the training image;
obtaining a second reconstruction loss according to a difference between the reference feature data and the target feature data;
obtaining the loss of the model to be trained according to the first reconstruction loss, wherein the obtaining of the loss of the model to be trained comprises:
and obtaining the loss of the model to be trained according to the first reconstruction loss and the second reconstruction loss.
With reference to any embodiment of the present application, before obtaining the loss of the model to be trained according to the first reconstruction loss and the second reconstruction loss, the method further includes:
acquiring a discrimination model;
processing the decompressed image by using the discrimination model to obtain the probability that the decompressed image is not the image output by the model to be trained;
obtaining a generated countermeasure loss according to the difference between the probability and a reference label, wherein the reference label represents that the decompressed image is not an image output by the model to be trained;
obtaining the loss of the model to be trained according to the first reconstruction loss and the second reconstruction loss, including:
and obtaining the loss of the model to be trained according to the first reconstruction loss, the second reconstruction loss and the generated countermeasure loss.
In combination with any embodiment of the present application, before the obtaining the loss of the model to be trained according to the first reconstruction loss, the second reconstruction loss, and the generation countermeasure loss, the method further includes:
calculating a perceptual loss between the training image and the decompressed image;
obtaining the loss of the model to be trained according to the first reconstruction loss, the second reconstruction loss and the generated countermeasure loss, including:
and obtaining the loss of the model to be trained according to the first reconstruction loss, the second reconstruction loss, the generated countermeasure loss and the perceptual loss.
With reference to any embodiment of the present application, the processing the image to be compressed by using the image compression model to obtain a compressed image of the image to be compressed includes:
coding the image to be compressed by using the image compression model, and extracting the characteristics of the target object in the image to be compressed to obtain first characteristic data;
and decoding the first characteristic data by using the image compression model to obtain the compressed image.
With reference to any embodiment of the present application, the encoding processing, performed on the image to be compressed by using the image compression model, extracting features of the target object in the image to be compressed to obtain first feature data includes:
performing feature extraction processing on the image to be compressed by using the image compression model to obtain second feature data of the target object;
and performing entropy coding processing on the second characteristic data to obtain the first characteristic data.
With reference to any embodiment of the present application, before performing entropy encoding processing on the second feature data to obtain the first feature data, the method further includes:
rounding the second characteristic data to obtain third characteristic data;
the performing entropy coding processing on the second feature data to obtain the first feature data includes:
and performing entropy coding processing on the third characteristic data to obtain the first characteristic data.
In combination with any embodiment of the present application, the target object includes a human face.
In a second aspect, there is provided an image compression apparatus comprising:
an acquiring unit, configured to acquire an image to be compressed containing a target object and an image compression model, wherein the image compression model is obtained based on first reconstruction loss training, the first reconstruction loss is obtained by weighted summation of one or more pixel value differences at target co-located points, the weight of each pixel value difference is positively correlated with the probability that the corresponding target co-located point belongs to the region covered by the target object, a target co-located point is a co-located point in a training image and a decompressed image, and the decompressed image is an image obtained by processing the training image with the image compression model;
and the processing unit is used for processing the image to be compressed by using the image compression model to obtain a compressed image of the image to be compressed.
With reference to any embodiment of the present application, the obtaining unit is configured to:
acquiring a model to be trained, the training image and a target object area image of the training image, wherein the target object area image comprises the probability that pixel points in the training image belong to an area covered by the target object;
processing the training image by using the model to be trained to obtain the decompressed image;
calculating pixel value differences between co-located points in the decompressed image and the training image to obtain a pixel value difference set;
taking the pixel value in the target object area image as the weight of the corresponding pixel value difference, and carrying out weighted summation on the pixel value differences in the pixel value difference set to obtain the first reconstruction loss;
obtaining the loss of the model to be trained according to the first reconstruction loss;
and updating the parameters of the model to be trained according to the loss of the model to be trained to obtain the image compression model.
With reference to any embodiment of the present application, the obtaining unit is configured to:
coding the training image by using the model to be trained, and extracting the feature data of the target object in the training image as target feature data;
decoding the target characteristic data by using the model to be trained to obtain the decompressed image;
acquiring reference characteristic data of the target object in the training image;
obtaining a second reconstruction loss according to a difference between the reference feature data and the target feature data;
and obtaining the loss of the model to be trained according to the first reconstruction loss and the second reconstruction loss.
With reference to any embodiment of the present application, the obtaining unit is configured to:
processing the decompressed image by using the discrimination model to obtain the probability that the decompressed image is not the image output by the model to be trained;
obtaining a generated countermeasure loss according to the difference between the probability and a reference label, wherein the reference label represents that the decompressed image is not an image output by the model to be trained;
and obtaining the loss of the model to be trained according to the first reconstruction loss, the second reconstruction loss and the generated countermeasure loss.
With reference to any embodiment of the present application, the obtaining unit is configured to:
calculating a perceptual loss between the training image and the decompressed image;
and obtaining the loss of the model to be trained according to the first reconstruction loss, the second reconstruction loss, the generated countermeasure loss and the perceptual loss.
In combination with any embodiment of the present application, the processing unit is configured to:
coding the image to be compressed by using the image compression model, and extracting the characteristics of the target object in the image to be compressed to obtain first characteristic data;
and decoding the first characteristic data by using the image compression model to obtain the compressed image.
In combination with any embodiment of the present application, the processing unit is configured to:
performing feature extraction processing on the image to be compressed by using the image compression model to obtain second feature data of the target object;
and performing entropy coding processing on the second characteristic data to obtain the first characteristic data.
In combination with any embodiment of the present application, the processing unit is configured to:
rounding the second characteristic data to obtain third characteristic data;
and performing entropy coding processing on the third characteristic data to obtain the first characteristic data.
In combination with any embodiment of the present application, the target object includes a human face.
In a third aspect, an electronic device is provided, which includes: a processor and a memory for storing computer program code comprising computer instructions, the electronic device performing the method of the first aspect and any one of its possible implementations as described above, if the processor executes the computer instructions.
In a fourth aspect, another electronic device is provided, including: a processor, transmitting means, input means, output means, and a memory for storing computer program code comprising computer instructions, which, when executed by the processor, cause the electronic device to perform the method of the first aspect and any one of its possible implementations.
In a fifth aspect, there is provided a computer-readable storage medium having stored therein a computer program comprising program instructions which, if executed by a processor, cause the processor to perform the method of the first aspect and any one of its possible implementations.
In a sixth aspect, there is provided a computer program product comprising a computer program or instructions which, when run on a computer, cause the computer to perform the method of the first aspect and any one of its possible implementations.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments or the background art of the present application, the drawings required to be used in the embodiments or the background art of the present application will be described below.
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic diagram of a first image provided in an embodiment of the present application;
fig. 2 is a schematic diagram of a second image according to an embodiment of the present application;
fig. 3 is a schematic flowchart of an image compression method according to an embodiment of the present application;
fig. 4 is a schematic diagram of a target area image according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a training image provided in an embodiment of the present application;
fig. 6 is a schematic structural diagram of an image compression model according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an image compression apparatus according to an embodiment of the present application;
fig. 8 is a schematic diagram of a hardware structure of an image compression apparatus according to an embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," and the like in the description and claims of the present application and in the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more, and "at least two" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: only A, only B, or both A and B, where A and B may be singular or plural. The character "/" may indicate an "or" relationship between the associated objects, meaning any combination of the items, including single item(s) or multiple items. For example, "at least one (item) of a, b, or c" may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c may each be single or plural. The character "/" may also represent division in a mathematical operation, e.g., a/b means a divided by b, and 6/3 = 2.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
By performing image compression processing on an image, the data amount of the image can be reduced. The traditional image compression method is realized by compressing an image through an image compression algorithm. However, since the parameters of the image compression algorithm are fixed, the compression rate obtained by compressing the image by the method is low, or the quality of the compressed image obtained by compressing the image by the method is low, wherein the image quality is positively correlated with the data amount of the image.
For example, if the data amount of a compressed image obtained by a conventional image compression method is small, more information is lost, and the quality of the compressed image obtained by compression is low. If the quality of a compressed image obtained by the conventional image compression method is made high, the data amount of the compressed image is large, and the compression rate is low.
Based on this, the embodiments of the present application provide an image compression technical solution to improve the compression rate while improving the quality of the compressed image.
Before proceeding with the following explanation, the co-located points that appear below are first defined. In the embodiments of the present application, co-located points are pixel points at the same position in different images.
For example, in the first image shown in fig. 1 and the second image shown in fig. 2, the position of pixel point A in the first image is the same as the position of pixel point a in the second image, the position of pixel point B in the first image is the same as the position of pixel point b in the second image, the position of pixel point C in the first image is the same as the position of pixel point c in the second image, and the position of pixel point D in the first image is the same as the position of pixel point d in the second image. Therefore, in the first image and the second image, pixel point A and pixel point a are co-located points, pixel point B and pixel point b are co-located points, pixel point C and pixel point c are co-located points, and pixel point D and pixel point d are co-located points.
The execution subject of the embodiment of the present application is an image compression device, where the image compression device may be any electronic device that can execute the technical solution disclosed in the embodiment of the present application. Optionally, the image compression device may be one of the following: cell-phone, computer, panel computer, wearable smart machine.
It should be understood that the method embodiments of the present application may also be implemented by means of a processor executing computer program code. The embodiments of the present application will be described below with reference to the drawings. Referring to fig. 3, fig. 3 is a schematic flowchart illustrating an image compression method according to an embodiment of the present disclosure.
301. Acquire an image to be compressed containing a target object and an image compression model, wherein the image compression model is obtained based on first reconstruction loss training, the first reconstruction loss is obtained by weighted summation of one or more pixel value differences at target co-located points, the weight of each pixel value difference is positively correlated with the probability that the corresponding target co-located point belongs to the region covered by the target object, a target co-located point is a co-located point in a training image and a decompressed image, and the decompressed image is an image obtained by processing the training image with the image compression model.
In the embodiment of the present application, the target object may be any object. In one possible implementation, the target object includes one of: human body, human face, vehicle.
In the embodiment of the present application, the image compression model is a deep learning model having an image compression function. For example, the image compression model may be stacked or composed of neural network layers such as a downsampling layer, an upsampling layer, and the like in a certain manner. The structure of the image compression model is not limited in the present application.
In the embodiment of the application, the image compression model is obtained based on the first reconstruction loss training, that is, in the training process of the image compression model, a gradient is obtained based on the first reconstruction loss, and the gradient is propagated in the image compression model through gradient back propagation so as to update the parameters of the image compression model.
In the embodiment of the present application, the first reconstruction loss is obtained by weighted summation of one or more pixel value differences at target co-located points, where the weight of each pixel value difference is positively correlated with the probability that the corresponding target co-located point belongs to the region covered by the target object, and a target co-located point is a co-located point in the training image and the decompressed image. The decompressed image is the image obtained by processing the training image during the training of the image compression model.
For example, during the training of the image compression model, the model processes a training image to obtain a decompressed image, where the training image contains pixel points A and B, and the decompressed image contains pixel points a and b. Pixel point A in the training image and pixel point a in the decompressed image are co-located points, and pixel point B in the training image and pixel point b in the decompressed image are co-located points.
Assume that the target object is a human face, the probability that pixel point A belongs to the face region is p1, the probability that pixel point B belongs to the face region is p2, the difference between the pixel value of pixel point A and the pixel value of pixel point a is d1, and the difference between the pixel value of pixel point B and the pixel value of pixel point b is d2.
The image compression apparatus obtains a weight w1 for d1 from p1 and a weight w2 for d2 from p2, where w1 = k × p1, w2 = k × p2, and k is a positive number. The image compression apparatus then obtains the first reconstruction loss w1 × d1 + w2 × d2 by weighted summation of d1 and d2.
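The two-pixel example above can be checked with a few lines of arithmetic. The following minimal Python sketch uses assumed illustrative values for k, p1, p2, d1 and d2, none of which are fixed by the text:

```python
# Illustrative values only; k, p1, p2, d1 and d2 are assumptions.
k = 1.0              # any positive number
p1, p2 = 0.9, 0.2    # assumed face-region probabilities for A and B
d1, d2 = 4.0, 7.0    # assumed pixel value differences at the two co-located points

w1 = k * p1          # weight grows with the face-region probability
w2 = k * p2
first_reconstruction_loss = w1 * d1 + w2 * d2
print(first_reconstruction_loss)   # 3.6 + 1.4 = 5.0
```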
In computing the first reconstruction loss, the higher the probability that a co-located point belongs to the region covered by the target object, the greater the weight of the pixel value difference corresponding to that co-located point. Updating the parameters of the image compression model based on the first reconstruction loss therefore reduces the difference between the region covered by the target object in the decompressed image and the corresponding region in the training image, while allowing a larger difference between the regions not covered by the target object in the two images.
That is to say, when the image compression model obtained based on the first reconstruction loss training processes an image, the region covered by the target object is restored with high fidelity, while the region not covered by the target object is restored at a lower fidelity; in other words, information of the region not covered by the target object is discarded.
In one implementation of obtaining an image to be compressed, an image compression apparatus receives an image to be compressed input by a user through an input component to obtain the image to be compressed. The above-mentioned input assembly includes: keyboard, mouse, touch screen, touch pad, audio input device, etc.
In another implementation manner of acquiring the image to be compressed, the image compression device receives the image to be compressed sent by the terminal to acquire the image to be compressed. The terminal may be any one of the following: cell-phone, computer, panel computer, server.
In one implementation of obtaining the image compression model, the image compression apparatus receives the image compression model input by the user through the input component to obtain the image compression model.
In another implementation manner of obtaining the image compression model, the image compression device receives the image compression model sent by the terminal to obtain the image compression model.
It should be understood that, in the embodiment of the present application, the image compression apparatus may obtain the image to be compressed and the image compression model simultaneously or separately, which is not limited in this application.
302. And processing the image to be compressed by using the image compression model to obtain a compressed image of the image to be compressed.
The image compression device processes the image to be compressed by using the image compression model, can restore the pixel point region covered by the target object in the image to be compressed, and simultaneously discards the information of the pixel point region not covered by the target object in the image to be compressed to obtain the compressed image of the image to be compressed.
Let the pixel point region covered by the target object in the image to be compressed be called the first reference region, and the pixel point region covered by the target object in the compressed image be called the second reference region. By processing the image to be compressed with the image compression model to obtain the compressed image, the image compression apparatus can keep the difference between the image quality of the first reference region and that of the second reference region small, while making the data volume of the compressed image smaller than that of the image to be compressed.
Therefore, in the embodiment of the present application, the image compression apparatus can improve the compression rate of the image to be compressed while preserving the image quality of the first reference region.
As an alternative embodiment, the image compression apparatus performs steps 1 to 6 in the process of performing step 301:
1. and acquiring a model to be trained, the training image and a target object area image of the training image, wherein the target object area image comprises the probability that pixel points in the training image belong to an area covered by the target object.
In this step, the model to be trained is a deep learning model. The training image is an image containing the target object. The size of the target object region image is the same as the size of the training image. The pixel value of each pixel point in the target object region image represents the probability that the co-located pixel point in the training image belongs to the region covered by the target object.
For example, assume that the target area image shown in fig. 4 is the target object region image of the training image shown in fig. 5, and that the target object is a human face. Then the pixel value of pixel point A characterizes the probability that pixel point a belongs to the face region, the pixel value of pixel point B characterizes the probability that pixel point b belongs to the face region, the pixel value of pixel point C characterizes the probability that pixel point c belongs to the face region, and the pixel value of pixel point D characterizes the probability that pixel point d belongs to the face region.
In one implementation of acquiring training images, an image compression device receives training images input by a user through an input component to acquire the training images.
In another implementation of acquiring the training image, the image compression apparatus receives the training image transmitted by the terminal to acquire the training image.
In one implementation of obtaining the model to be trained, the image compression apparatus receives the model to be trained input by the user through the input component to obtain the model to be trained.
In another implementation manner of obtaining the model to be trained, the image compression device receives the model to be trained sent by the terminal to obtain the model to be trained.
In one implementation of acquiring a target object area image of a training image, an image compression apparatus receives a target object area image input by a user through an input component to acquire the target object area image.
In another implementation of acquiring the target object area image of the training image, the image compression apparatus receives the target object area image sent by the terminal to acquire the target object area image.
In yet another implementation of obtaining the target object region image of the training image, the image compression apparatus obtains a target region model, and processes the training image using the target region model to obtain the target object region image. The target region model is a deep learning model capable of determining the probability that a pixel point in an image belongs to the region covered by the target object. For example, assuming that the target object is a face, the target region model may be a Face Alignment Network (FAN).
Optionally, the pixel value of a pixel point in the target object region image represents the importance, for identifying the target object, of the co-located pixel point in the training image, and the pixel value is positively correlated with that importance.
For example, assume that the target object is a human face. For face recognition, the importance of the five sense organ region is higher than that of the forehead region, and the importance of the forehead region is higher than that of the non-face region. Therefore, if the first pixel point belongs to the five sense organs region, the second pixel point belongs to the forehead region, and the third pixel point does not belong to the face region in the training image, the importance of the first pixel point to the face recognition is higher than the importance of the second pixel point to the face recognition, and the importance of the second pixel point to the face recognition is higher than the importance of the third pixel point to the face recognition.
Correspondingly, in the target area image, the fourth pixel point is the co-located point of the first pixel point, the fifth pixel point is the co-located point of the second pixel point, and the sixth pixel point is the co-located point of the third pixel point. Then the pixel value of the fourth pixel point is greater than the pixel value of the fifth pixel point, and the pixel value of the fifth pixel point is greater than the pixel value of the sixth pixel point.
It should be understood that, in the embodiment of the present application, the image compression apparatus may obtain the training image, the target object region image, and the model to be trained simultaneously, or the image compression apparatus may obtain the training image, the target object region image, and the model to be trained separately, which is not limited in this application.
2. And processing the training image by using the model to be trained to obtain the decompressed image.
3. And calculating the pixel value difference between the collocated points in the decompressed image and the training image to obtain a pixel value difference set.
4. And taking the pixel value in the target object area image as the weight of the corresponding pixel value difference, and performing weighted summation on the pixel value differences in the pixel value difference set to obtain the first reconstruction loss.
For example, in the target area image shown in fig. 4 and the training image shown in fig. 5, pixel point A and pixel point a are co-located points, pixel point B and pixel point b are co-located points, pixel point C and pixel point c are co-located points, and pixel point D and pixel point d are co-located points.
Suppose the pixel value difference between the training image and the decompressed image at pixel point a is d1, at pixel point b is d2, at pixel point c is d3, and at pixel point d is d4.
Suppose further that, in the target area image, the pixel value of pixel point A is p1, the pixel value of pixel point B is p2, the pixel value of pixel point C is p3, and the pixel value of pixel point D is p4. Then the first reconstruction loss is d1 × p1 + d2 × p2 + d3 × p3 + d4 × p4.
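As a minimal sketch of steps 3 and 4 over whole images, the following PyTorch code uses the target object region image directly as a per-pixel weight map. The absolute difference and the tensor shapes are assumptions, since the text does not fix the exact form of the pixel value difference:

```python
import torch

def first_reconstruction_loss(training_img, decompressed_img, region_img):
    # Pixel value difference set: per-pixel difference between co-located
    # points in the decompressed image and the training image (step 3).
    diff = (training_img - decompressed_img).abs()
    # Weighted summation, with the target object region image's pixel values
    # (co-located probabilities) as the weights (step 4).
    return (region_img * diff).sum()

# Usage with random stand-ins for the three images (assumed shapes):
train = torch.rand(1, 3, 64, 64)
decomp = torch.rand(1, 3, 64, 64)
mask = torch.rand(1, 1, 64, 64)   # e.g. a soft face mask from a FAN-style model
loss = first_reconstruction_loss(train, decomp, mask)
```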
5. And obtaining the loss of the model to be trained according to the first reconstruction loss.
In one possible implementation, the image compression apparatus treats the first reconstruction penalty as a penalty for the model to be trained.
6. And updating the parameters of the model to be trained according to the loss of the model to be trained to obtain the image compression model.
In one possible implementation, the image compression apparatus obtains the gradient of the model to be trained according to the loss of the model to be trained. And carrying out backward propagation on the gradient of the model to be trained by a backward gradient propagation method so as to update the parameters of the model to be trained until the loss of the model to be trained is converged, completing the training of the model to be trained, and obtaining the image compression model.
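A minimal sketch of this update loop, assuming an encoder-decoder network as the model to be trained, a data loader yielding (training image, target object region image) pairs, and the first_reconstruction_loss helper from the sketch above; the optimizer and learning rate are illustrative choices, not from the text:

```python
import torch

# `model_to_be_trained` and `loader` are assumed to exist as described above.
optimizer = torch.optim.Adam(model_to_be_trained.parameters(), lr=1e-4)

for training_img, region_img in loader:
    decompressed_img = model_to_be_trained(training_img)             # step 2
    loss = first_reconstruction_loss(training_img, decompressed_img,
                                     region_img)                     # steps 3-4
    optimizer.zero_grad()
    loss.backward()    # backward gradient propagation of the model's gradient
    optimizer.step()   # parameter update; training stops once the loss converges
```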
As an alternative embodiment, the image compression apparatus performs step 7 and step 8 in the process of performing step 2:
7. and performing encoding processing on the training image by using the model to be trained, and extracting the feature data of the target object in the training image as target feature data.
In the embodiment of the present application, the encoding process is used to extract feature data of a target object from an image. In one possible implementation, the encoding process includes a convolution process.
8. And decoding the target characteristic data by using the model to be trained to obtain the decompressed image.
In the embodiment of the present application, the decoding process is used to restore the feature data to an image. In one possible implementation, the decoding process includes a deconvolution process.
Upon execution of step 7 and step 8, the image compression apparatus further executes step 9 to step 11 before executing step 5:
9. and acquiring reference characteristic data of the target object in the training image.
In this step, the reference feature data is obtained by extracting features of the target object in the training image. The features of the target object carried by the reference feature data may be considered to include all features of the target object in the training image.
In one implementation of obtaining the reference feature data, the image compression apparatus receives the reference feature data input by the user through the input component to obtain the reference feature data.
In another implementation manner of acquiring the reference characteristic data, the image compression device receives the reference characteristic data transmitted by the terminal to acquire the reference characteristic data.
In another implementation of obtaining the reference feature data, the image compression apparatus obtains a feature extraction model, and extracts features of the target object in the training image using the feature extraction model to obtain the reference feature data. For example, the target object is a human face, and the feature extraction model may be a human face feature extraction model.
10. A second reconstruction loss is obtained based on a difference between the reference feature data and the target feature data.
In one possible implementation, the image compression apparatus calculates a cosine similarity between the reference feature data and the target feature data. And obtaining a second reconstruction loss according to the cosine similarity, wherein the second reconstruction loss is in negative correlation with the cosine similarity.
Optionally, the second reconstruction loss is a reconstruction loss between the reference feature data and the target feature data.
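A minimal sketch of this implementation. The text only states that the loss is negatively correlated with the cosine similarity, so the particular mapping "1 - cosine similarity" is an assumption:

```python
import torch.nn.functional as F

def second_reconstruction_loss(reference_feature_data, target_feature_data):
    # Cosine similarity between the reference and target feature tensors;
    # the loss decreases as the similarity increases.
    cos = F.cosine_similarity(reference_feature_data, target_feature_data, dim=-1)
    return (1.0 - cos).mean()
```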
After obtaining the second reconstruction loss, the image compression apparatus performs the following steps in performing step 5:
11. and obtaining the loss of the model to be trained according to the first reconstruction loss and the second reconstruction loss.
Suppose the first reconstruction loss is L1, the second reconstruction loss is L2, and the loss of the model to be trained is Lt. In one possible implementation, L1, L2 and Lt satisfy the following formula:
Lt = L1 + L2 + c1 … formula (1)
where c1 is a real number. Optionally, c1 = 0.
In another possible implementation, L1, L2 and Lt satisfy the following formula:
Lt = α1 × (L1 + L2) … formula (2)
where α1 is a real number. Optionally, α1 = 1.
In yet another possible implementation, L1, L2 and Lt satisfy the following formula:
Lt = α1 × (L1 + L2) + c1 … formula (3)
where α1 and c1 are both real numbers. Optionally, c1 = 0 and α1 = 1.
In step 11, the image compression apparatus obtains the loss of the model to be trained from the first reconstruction loss and the second reconstruction loss. Updating the parameters of the model to be trained according to this loss to obtain the image compression model makes the features of the target object extracted by the image compression model during encoding richer, and likewise makes the features of the target object carried by the image obtained by decoding the feature data richer. This can reduce the difference between the image quality of the first reference region and that of the second reference region, improving the compression effect of the image compression model on the image to be compressed.
As an alternative embodiment, before performing step 11, the image compression model further performs steps 12 to 14:
12. and acquiring a discrimination model.
In this step, the discrimination model is a deep learning model with a classification function. Specifically, the discrimination model may obtain a probability that the image belongs to a generated image by processing the image, where the generated image is an image output by the model to be trained.
In one possible implementation manner, the discriminant model is obtained by training the neural network by using the labeled image set as training data, wherein the label of the image in the labeled image set includes whether the image is a generated image or not.
In the training process of the neural network, the neural network processes the images in the labeled image set (hereinafter referred to as training images) to obtain the probability that the training images are generated images. A first training loss is obtained according to the probability and the label of the training image. And updating parameters of the neural network according to the first training loss to obtain a discrimination model.
In one implementation of obtaining the discrimination model, the image compression apparatus receives the discrimination model input by the user through the input component.
In another implementation of obtaining the discrimination model, the image compression apparatus receives the discrimination model sent by the terminal.
13. And processing the decompressed image by using the discrimination model to obtain the probability that the decompressed image is not the image output by the model to be trained.
In this step, the image output by the model to be trained is the generated image. The image compression apparatus processes the decompressed image using the discrimination model, and can obtain a probability that the decompressed image is not the generated image.
14. And obtaining the generation countermeasure loss according to the difference between the probability and a reference label, wherein the reference label represents that the decompressed image is not the image output by the model to be trained.
In this step, the reference label represents that the decompressed image is not an image output by the model to be trained. For example, assume that a label of 1 indicates that the probability that an image is not a generated image is 1, and a label of 0 indicates that this probability is 0. Then the reference label is 1.
It should be understood that the image compression apparatus may label the decompressed image before processing it with the discrimination model, so that the label of the decompressed image is the reference label.
An image that is not output by the model to be trained is called a real image; for example, the training image is a real image, and the image to be compressed is also a real image. The image compression apparatus thus obtains the probability that the decompressed image is a real image by executing step 13.
The image compression apparatus obtains the generated countermeasure loss according to the difference between this probability and the reference label, and this loss reflects the difference between the decompressed image and a real image.
For example, assume that the target object is a human face. The two eyes in the face in a real image are symmetric about the center of the brow, while the two eyes in the decompressed image are not. The difference between the probability that the decompressed image is a real image and the reference label then includes: the two eyes in the decompressed image are not symmetric about the center of the brow.
Optionally, the generated countermeasure loss is a generated countermeasure network (GAN) loss between the probability and the reference tag.
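A minimal sketch of steps 13 and 14 under the assumptions that the discrimination model outputs a probability in [0, 1] and that binary cross-entropy is used to measure the difference from the reference label; the patent does not fix the exact form of the GAN loss:

```python
import torch
import torch.nn.functional as F

def generated_countermeasure_loss(discrimination_model, decompressed_img):
    # Step 13: probability that the input is NOT an image output by the
    # model to be trained (i.e. is a real image).
    prob = discrimination_model(decompressed_img)
    # Step 14: the reference label represents "not a model output", taken
    # here as 1, matching the labeling convention in the text above.
    reference_label = torch.ones_like(prob)
    return F.binary_cross_entropy(prob, reference_label)
```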
After the generation countermeasure loss is obtained, the image compression model performs step 15 in the process of performing step 11:
15. and obtaining the loss of the model to be trained according to the first reconstruction loss, the second reconstruction loss and the generated countermeasure loss.
Suppose the first reconstruction loss is L1, the second reconstruction loss is L2, the generated countermeasure loss is L3, and the loss of the model to be trained is Lt. In one possible implementation, L1, L2, L3 and Lt satisfy the following formula:
Lt = L1 + L2 + L3 + c2 … formula (4)
where c2 is a real number. Optionally, c2 = 0.
In another possible implementation, L1, L2, L3 and Lt satisfy the following formula:
Lt = α2 × (L1 + L2 + L3) … formula (5)
where α2 is a real number. Optionally, α2 = 1.
In yet another possible implementation, L1, L2, L3 and Lt satisfy the following formula:
Lt = α2 × (L1 + L2 + L3) + c2 … formula (6)
where α2 and c2 are both real numbers. Optionally, c2 = 0 and α2 = 1.
In step 15, the image compression apparatus obtains the loss of the model to be trained according to the first reconstruction loss, the second reconstruction loss and the generated countermeasure loss. Updating the parameters of the model to be trained according to this loss yields the image compression model. The image compression model can then not only extract richer features of the target object when encoding an image, but also produce a compressed image closer to the image to be compressed that was input to it, thereby improving the compression effect on the image to be compressed.
As an alternative embodiment, the image compression apparatus further performs step 16 before performing step 15:
16. a perceptual loss between said training image and said decompressed image is calculated.
In this step, the image compression apparatus obtains the perceptual loss between the training image and the decompressed image by calculating the difference between the feature data of the training image and the feature data of the decompressed image. The perceptual loss characterizes the difference between the training image and the decompressed image in high-level features, where high-level features are features observable by the human eye in the image. For example, in a face image, the outline of the face, the size of the nose, the skin color of the face, and the distance between the eyes are all features that the human eye can observe in the face image.
In one implementation, the image compression apparatus obtains a perceptual loss function (perceptual loss) and calculates a perceptual loss between the training image and the decompressed image through the perceptual loss function.
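One common realization of a perceptual loss function compares feature maps from a fixed, pretrained network. The patent does not name the feature extractor, so the use of VGG-16's early layers below is an assumption:

```python
import torch.nn.functional as F
from torchvision import models

# Fixed feature extractor; frozen so it only measures, never trains.
_vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(training_img, decompressed_img):
    # Compare high-level feature maps rather than raw pixel values.
    return F.mse_loss(_vgg(training_img), _vgg(decompressed_img))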
After obtaining the perceptual loss, the image compression apparatus performs step 17 in the process of performing step 15:
17. And obtaining the loss of the model to be trained according to the first reconstruction loss, the second reconstruction loss, the generated countermeasure loss and the perceptual loss.
Suppose the first reconstruction loss is L1, the second reconstruction loss is L2, the generated countermeasure loss is L3, the perceptual loss is L4, and the loss of the model to be trained is Lt. In one possible implementation, L1, L2, L3, L4 and Lt satisfy the following formula:
Lt = L1 + L2 + L3 + L4 + c3 … formula (7)
where c3 is a real number. Optionally, c3 = 0.
In another possible implementation, L1, L2, L3, L4 and Lt satisfy the following formula:
Lt = α3 × (L1 + L2 + L3 + L4) … formula (8)
where α3 is a real number. Optionally, α3 = 1.
In yet another possible implementation, L1, L2, L3, L4 and Lt satisfy the following formula:
Lt = α3 × (L1 + L2 + L3 + L4) + c3 … formula (9)
where α3 and c3 are both real numbers. Optionally, c3 = 0 and α3 = 1.
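Formula (9) subsumes formulas (7) and (8), so a single helper covers all three cases; a minimal sketch:

```python
def model_loss(l1, l2, l3, l4, alpha3=1.0, c3=0.0):
    # Formula (9); with the optional values alpha3 = 1 and c3 = 0 it
    # reduces to formula (7): Lt = L1 + L2 + L3 + L4.
    return alpha3 * (l1 + l2 + l3 + l4) + c3
```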
In step 17, the image compression apparatus obtains the loss of the model to be trained according to the first reconstruction loss, the second reconstruction loss, the generated countermeasure loss and the perceptual loss. Updating the parameters of the model to be trained according to this loss yields the image compression model. This enriches the features of the target object extracted by the image compression model during encoding, makes the compressed image produced by the image compression model closer to the image to be compressed that was input to it, and reduces the difference in high-level features between the compressed image and the image to be compressed, thereby improving the compression effect of the image compression model on the image to be compressed.
As an alternative embodiment, the image compression apparatus executes step 18 and step 19 in the process of executing step 302:
18. and performing encoding processing on the image to be compressed by using the image compression model, and extracting the characteristics of the target object in the image to be compressed to obtain first characteristic data.
The encoding process in this step is the same as the encoding process in step 7, and will not be described herein again.
19. And decoding the first feature data by using the image compression model to obtain the compressed image.
The decoding process in this step is the same as the decoding process in step 8, and will not be described herein again.
As an alternative embodiment, the image compression apparatus performs step 20 in the process of performing step 18:
20. and performing feature extraction processing on the image to be compressed by using the image compression model to obtain second feature data of the target object.
In one possible implementation, the feature extraction process in this step may be a convolution process.
21. And performing entropy coding processing on the second characteristic data to obtain the first characteristic data.
The image compression device can remove redundant data in the second feature data by entropy encoding it, thereby reducing the data volume of the second feature data and obtaining the first feature data. For example, suppose the second feature data contains 3 values, in order: 1, 3, 3. Since the second and third values in the second feature data are both 3, the first feature data obtained by entropy encoding the second feature data contains only 2 values: 1, 3. If, in the second feature data, the second value is the feature value of channel A and the third value is the feature value of channel B, then after decoding the feature values of channel A and channel B are both restored to 3.
As an alternative embodiment, before executing step 21, the image compression apparatus further executes step 22:
22. and rounding the second characteristic data to obtain third characteristic data.
In this step, rounding processing is used to convert the decimal into an integer. In the storage medium of the image compression device, the decimal is stored in floating point type data, the integer can be stored in integer type data, and the image compression device can reduce the data volume of the second characteristic data by rounding the second characteristic data to obtain the third characteristic data.
After obtaining the third feature data, the image compression apparatus performs step 23 in the process of performing step 21:
23. and performing entropy coding processing on the third feature data to obtain the first feature data.
The entropy coding process in this step is the same as the entropy coding process in step 21, and will not be described herein again.
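A minimal sketch of steps 22 and 23 together. A deployed codec would drive an actual arithmetic or range coder with a probability model; here the coded size is only estimated from the empirical symbol distribution (the Shannon bound), which is a simplification for illustration:

```python
import torch

def quantize_and_estimate_bits(second_feature_data):
    # Step 22: rounding converts floating-point features to integers,
    # reducing the data volume before entropy coding.
    third_feature_data = torch.round(second_feature_data)
    # Step 23 (estimate): cost of entropy coding the rounded symbols.
    _, counts = torch.unique(third_feature_data, return_counts=True)
    probs = counts.float() / counts.sum()
    estimated_bits = -(counts.float() * probs.log2()).sum()
    return third_feature_data, estimated_bits
```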
As an alternative embodiment, the structure of the image compression model is shown in fig. 6. In fig. 6, the image compression model includes an encoding module including 9 encoding layers and a decoding module including 8 decoding layers. Optionally, each of the 9 encoding layers includes a convolution kernel, and each of the 8 decoding layers includes a deconvolution kernel, where the convolution kernels are used to implement convolution processing, and the deconvolution kernels are used to implement deconvolution processing.
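For illustration, a structural sketch consistent with the fig. 6 description: an encoding module of 9 encoding layers (each with a convolution kernel) and a decoding module of 8 decoding layers (each with a deconvolution kernel). Channel width, strides and activations are not given in the text and are assumptions here:

```python
import torch
import torch.nn as nn

class ImageCompressionModel(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        enc = []
        for i in range(9):                       # 9 encoding layers
            enc += [nn.Conv2d(3 if i == 0 else ch, ch, 3,
                              stride=2 if i < 4 else 1, padding=1),
                    nn.ReLU()]
        self.encoder = nn.Sequential(*enc)
        dec = []
        for i in range(8):                       # 8 decoding layers
            out_ch = 3 if i == 7 else ch
            if i < 4:   # upsample back to the input resolution
                dec += [nn.ConvTranspose2d(ch, out_ch, 4, stride=2, padding=1)]
            else:
                dec += [nn.ConvTranspose2d(ch, out_ch, 3, stride=1, padding=1)]
            if i < 7:
                dec += [nn.ReLU()]
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Quick shape check: a 3x64x64 input comes back as 3x64x64.
y = ImageCompressionModel()(torch.rand(1, 3, 64, 64))
```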
Based on the technical scheme provided by the embodiment of the application, the embodiment of the application also provides a possible application scene.
At present, in order to enhance safety in work, life and public environments, surveillance cameras are installed in all kinds of places so that security protection can be carried out based on video stream information. With the rapid growth in the number of cameras in public places, it is of great significance to efficiently determine, from massive video streams, which images contain a target person, and to derive information such as the target person's trajectory from those images.
Given the large number of images contained in the video streams captured by monitoring devices, storing these images consumes a huge amount of storage space. Therefore, when storing them, it is necessary to compress the images so as to reduce both the amount of data to be processed and the storage space required.
With the rapid development of face recognition technology, the identity of a person in an image can be determined by performing face recognition on the image, and it can then be decided whether the image contains the target person. When performing face recognition on an image, the face features in the image need to be extracted, whereas the non-face features contribute nothing to identifying the person. Therefore, based on the technical solution provided by the embodiment of the present application, the target object is set to be a human face: first feature data is obtained by extracting the face features of an image captured by a monitoring camera (hereinafter referred to as the original image), and the first feature data is decoded to obtain a compressed image, so that the data volume of the original image is reduced without losing the face feature information it contains. Further, the first feature data may be used to determine the identity of the person in the original image.
Specifically, based on the technical scheme provided by the embodiment of the application, the model to be trained is trained to obtain the image compression model, and the image compression model is used for processing the original image to obtain the compressed image of the original image.
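A minimal sketch of this deployment is given below; the checkpoint name, the direct use of the encoder and decoder submodules, and the stand-in camera frame are hypothetical details, not part of the embodiment:

```python
import torch

# Assumes the ImageCompressionModel sketch defined after fig. 6 above.
model = ImageCompressionModel()
model.load_state_dict(torch.load("face_codec.pt"))  # hypothetical checkpoint name
model.eval()

with torch.no_grad():
    original_image = torch.rand(1, 3, 128, 128)          # stand-in for a camera frame
    first_feature_data = model.encoder(original_image)   # face features of the frame
    compressed_image = model.decoder(first_feature_data)

# first_feature_data can be stored in place of the full frame and can also be
# fed to a face recognition module to determine the identity of the person.
```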
It will be understood by those skilled in the art that, in the method of the present invention, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation process; the specific order of execution of the steps should be determined by their functions and possible inherent logic.
The method of the embodiments of the present application is set forth above in detail and the apparatus of the embodiments of the present application is provided below.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an image compression apparatus according to an embodiment of the present disclosure. The apparatus 1 comprises an acquisition unit 11 and a processing unit 12, wherein:
an acquisition unit 11, configured to acquire an image to be compressed containing a target object, and an image compression model, wherein the image compression model is obtained based on first reconstruction loss training, the first reconstruction loss is obtained by performing weighted summation on one or more target collocated point pixel value differences, the weight of each target collocated point pixel value difference is in positive correlation with the probability that the target collocated point corresponding to the pixel value difference belongs to an area covered by the target object, a target collocated point is a pair of collocated points in a training image and a decompressed image, and the decompressed image is an image obtained by processing the training image by the image compression model;
and the processing unit 12 is configured to process the image to be compressed by using the image compression model to obtain a compressed image of the image to be compressed.
With reference to any embodiment of the present application, the acquisition unit 11 is configured to:
acquiring a model to be trained, the training image and a target object area image of the training image, wherein the target object area image carries, for each pixel point in the training image, the probability that the pixel point belongs to the area covered by the target object;
processing the training image by using the model to be trained to obtain the decompressed image;
calculating the pixel value difference between the collocated points in the decompressed image and the training image to obtain a pixel value difference set;
taking the pixel values in the target object area image as the weights of the corresponding pixel value differences, and carrying out weighted summation on the pixel value differences in the pixel value difference set to obtain the first reconstruction loss (a sketch of this computation is given below);
obtaining the loss of the model to be trained according to the first reconstruction loss;
and updating the parameters of the model to be trained according to the loss of the model to be trained to obtain the image compression model.
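A minimal sketch of the first reconstruction loss computed by the acquisition unit is given below. Treating each pixel value difference as an absolute difference, and the tensor shapes, are assumptions; the embodiment specifies only a weighted summation of collocated-point pixel value differences, with the probabilities in the target object region image as weights:

```python
import torch

def first_reconstruction_loss(training_image, decompressed_image, target_object_region_image):
    # Pixel value differences between collocated points of the two images;
    # the absolute difference is an assumption.
    pixel_value_differences = (training_image - decompressed_image).abs()
    # Weighted summation, with the probability that each pixel belongs to the
    # area covered by the target object acting as the weight.
    return (target_object_region_image * pixel_value_differences).sum()

training_image = torch.rand(1, 3, 128, 128)
decompressed_image = torch.rand(1, 3, 128, 128)
target_object_region_image = torch.rand(1, 1, 128, 128)  # probabilities in [0, 1]
print(first_reconstruction_loss(training_image, decompressed_image, target_object_region_image))
```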
With reference to any embodiment of the present application, the acquisition unit 11 is configured to:
coding the training image by using the model to be trained, and extracting the feature data of the target object in the training image as target feature data;
decoding the target feature data by using the model to be trained to obtain the decompressed image;
acquiring reference characteristic data of the target object in the training image;
obtaining a second reconstruction loss according to a difference between the reference feature data and the target feature data (a sketch is given below);
and obtaining the loss of the model to be trained according to the first reconstruction loss and the second reconstruction loss.
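A sketch of the second reconstruction loss follows. Mean squared error as the "difference" is an assumption; the embodiment states only that the loss is obtained from the difference between the reference feature data and the target feature data:

```python
import torch
import torch.nn.functional as F

def second_reconstruction_loss(reference_feature_data, target_feature_data):
    # Difference between the reference features of the target object and the
    # features extracted by the model to be trained; MSE is an assumed metric.
    return F.mse_loss(target_feature_data, reference_feature_data)

print(second_reconstruction_loss(torch.rand(1, 64, 8, 8), torch.rand(1, 64, 8, 8)))
```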
With reference to any embodiment of the present application, the acquisition unit 11 is configured to:
acquiring a discrimination model;
processing the decompressed image by using the discrimination model to obtain the probability that the decompressed image is not an image output by the model to be trained;
obtaining a generated countermeasure loss according to the difference between the probability and a reference label, wherein the reference label represents that the decompressed image is not an image output by the model to be trained (a sketch of this loss is given below);
and obtaining the loss of the model to be trained according to the first reconstruction loss, the second reconstruction loss and the generated countermeasure loss.
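The generated countermeasure loss can be sketched as follows; binary cross-entropy as the measure of the difference between the probability and the reference label is an assumption:

```python
import torch
import torch.nn.functional as F

def generated_countermeasure_loss(discrimination_model, decompressed_image):
    # The discrimination model outputs the probability that the decompressed
    # image is NOT an image output by the model to be trained.
    probability = discrimination_model(decompressed_image)
    # Reference label 1 represents "not an image output by the model".
    reference_label = torch.ones_like(probability)
    # Binary cross-entropy as the "difference" is an assumption.
    return F.binary_cross_entropy(probability, reference_label)
```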
With reference to any embodiment of the present application, the acquisition unit 11 is configured to:
calculating a perceptual loss between the training image and the decompressed image;
and obtaining the loss of the model to be trained according to the first reconstruction loss, the second reconstruction loss, the generated countermeasure loss and the perceptual loss (one possible combination is sketched below).
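The embodiment does not specify how the perceptual loss is computed or how the four losses are combined. The sketch below assumes VGG16 feature distance for the perceptual loss and a weighted sum with hypothetical coefficients for the combination:

```python
import torch.nn.functional as F
from torchvision.models import vgg16

# Perceptual loss: distance between deep features of the training image and
# the decompressed image. VGG16 features are a common but assumed choice;
# ImageNet normalization of the inputs is omitted for brevity.
_vgg_features = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
for p in _vgg_features.parameters():
    p.requires_grad_(False)

def perceptual_loss(training_image, decompressed_image):
    return F.mse_loss(_vgg_features(training_image), _vgg_features(decompressed_image))

def model_loss(first_reconstruction_loss, second_reconstruction_loss,
               generated_countermeasure_loss, perceptual_loss_value):
    # Hypothetical weights: the embodiment says only that the total loss is
    # obtained according to the four terms, not how they are weighted.
    return (first_reconstruction_loss
            + 0.1 * second_reconstruction_loss
            + 0.01 * generated_countermeasure_loss
            + 0.1 * perceptual_loss_value)
```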
In combination with any embodiment of the present application, the processing unit 12 is configured to:
encoding the image to be compressed by using the image compression model, and extracting the features of the target object in the image to be compressed to obtain first feature data;
and decoding the first feature data by using the image compression model to obtain the compressed image.
In combination with any embodiment of the present application, the processing unit 12 is configured to:
performing feature extraction processing on the image to be compressed by using the image compression model to obtain second feature data of the target object;
and performing entropy coding processing on the second feature data to obtain the first feature data.
In combination with any embodiment of the present application, the processing unit 12 is configured to:
rounding the second feature data to obtain third feature data;
and performing entropy coding processing on the third feature data to obtain the first feature data.
In combination with any embodiment of the present application, the target object includes a human face.
In some embodiments, the functions of, or the modules included in, the apparatus provided in the embodiments of the present application may be used to execute the method described in the above method embodiments; for specific implementation, reference may be made to the description of the above method embodiments, which, for brevity, is not repeated here.
Fig. 8 is a schematic diagram of a hardware structure of an image compression apparatus according to an embodiment of the present disclosure. The image compression apparatus 2 comprises a processor 21, a memory 22, an input device 23, an output device 24. The processor 21, the memory 22, the input device 23 and the output device 24 are coupled by a connector, which includes various interfaces, transmission lines or buses, etc., and the embodiment of the present application is not limited thereto. It should be appreciated that in various embodiments of the present application, coupled refers to being interconnected in a particular manner, including being directly connected or indirectly connected through other devices, such as through various interfaces, transmission lines, buses, and the like.
The processor 21 may be one or more Graphics Processing Units (GPUs). In the case that the processor 21 is one GPU, the GPU may be a single-core GPU or a multi-core GPU. Alternatively, the processor 21 may be a processor group composed of a plurality of GPUs that are coupled to each other through one or more buses. Alternatively, the processor may be another type of processor, and the embodiment of the present application is not limited in this respect.
The input means 23 are for inputting data and/or signals and the output means 24 are for outputting data and/or signals. The input device 23 and the output device 24 may be separate devices or may be an integral device.
It is understood that, in the embodiment of the present application, the memory 22 may be used to store not only the relevant instructions, but also relevant data, for example, the memory 22 may be used to store the image to be compressed acquired through the input device 23, or the memory 22 may also be used to store the compressed image obtained through the processor 21, and the like, and the embodiment of the present application is not limited to the data specifically stored in the memory.
It will be appreciated that fig. 8 only shows a simplified design of an image compression apparatus. In practical applications, the image compression apparatus may further include other necessary components, including but not limited to any number of input/output devices, processors, memories, etc., and all image compression apparatuses that can implement the embodiments of the present application are within the scope of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. It is also clear to those skilled in the art that the descriptions of the various embodiments of the present application have different emphasis, and for convenience and brevity of description, the same or similar parts may not be repeated in different embodiments, so that the parts that are not described or not described in detail in a certain embodiment may refer to the descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., Digital Versatile Disc (DVD)), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
One of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the above method embodiments. And the aforementioned storage medium includes: various media that can store program codes, such as a read-only memory (ROM) or a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Claims (12)
1. A method of image compression, the method comprising:
acquiring an image to be compressed containing a target object, and an image compression model, wherein the image compression model is obtained based on first reconstruction loss training, the first reconstruction loss is obtained by performing weighted summation on one or more target collocated point pixel value differences, the weight of each target collocated point pixel value difference is in positive correlation with the probability that the target collocated point corresponding to the pixel value difference belongs to a region covered by the target object, a target collocated point is a pair of collocated points in a training image and a decompressed image, and the decompressed image is an image obtained by processing the training image by the image compression model;
and processing the image to be compressed by using the image compression model to obtain a compressed image of the image to be compressed.
2. The method according to claim 1, wherein the acquiring of the image to be compressed containing the target object and the image compression model comprises:
acquiring a model to be trained, the training image and a target object area image of the training image, wherein the target object area image comprises the probability that pixel points in the training image belong to an area covered by the target object;
processing the training image by using the model to be trained to obtain the decompressed image;
calculating the pixel value difference between the collocated points in the decompressed image and the training image to obtain a pixel value difference set;
taking the pixel value in the target object area image as the weight of the corresponding pixel value difference, and carrying out weighted summation on the pixel value differences in the pixel value difference set to obtain the first reconstruction loss;
obtaining the loss of the model to be trained according to the first reconstruction loss;
and updating the parameters of the model to be trained according to the loss of the model to be trained to obtain the image compression model.
3. The method of claim 2, wherein the processing the training image using the model to be trained to obtain the decompressed image comprises:
coding the training image by using the model to be trained, and extracting the feature data of the target object in the training image as target feature data;
decoding the target feature data by using the model to be trained to obtain the decompressed image;
before obtaining the loss of the model to be trained according to the first reconstruction loss, the method further includes:
acquiring reference characteristic data of the target object in the training image;
obtaining a second reconstruction loss according to a difference between the reference feature data and the target feature data;
obtaining the loss of the model to be trained according to the first reconstruction loss, wherein the obtaining of the loss of the model to be trained comprises:
and obtaining the loss of the model to be trained according to the first reconstruction loss and the second reconstruction loss.
4. The method of claim 3, wherein before deriving the loss of the model to be trained from the first reconstruction loss and the second reconstruction loss, the method further comprises:
acquiring a discrimination model;
processing the decompressed image by using the discrimination model to obtain the probability that the decompressed image is not the image output by the model to be trained;
obtaining a generated countermeasure loss according to the difference between the probability and a reference label, wherein the reference label represents that the decompressed image is not an image output by the model to be trained;
obtaining the loss of the model to be trained according to the first reconstruction loss and the second reconstruction loss, including:
and obtaining the loss of the model to be trained according to the first reconstruction loss, the second reconstruction loss and the generated countermeasure loss.
5. The method of claim 4, wherein before deriving the loss of the model to be trained from the first reconstruction loss, the second reconstruction loss and the generated countermeasure loss, the method further comprises:
calculating a perceptual loss between the training image and the decompressed image;
obtaining the loss of the model to be trained according to the first reconstruction loss, the second reconstruction loss and the generated countermeasure loss, including:
and obtaining the loss of the model to be trained according to the first reconstruction loss, the second reconstruction loss, the generated countermeasure loss and the perceptual loss.
6. The method according to any one of claims 1 to 5, wherein the processing the image to be compressed by using the image compression model to obtain a compressed image of the image to be compressed comprises:
encoding the image to be compressed by using the image compression model, and extracting features of the target object in the image to be compressed to obtain first feature data;
and decoding the first feature data by using the image compression model to obtain the compressed image.
7. The method according to claim 6, wherein the encoding the image to be compressed by using the image compression model, extracting features of the target object in the image to be compressed, and obtaining first feature data comprises:
performing feature extraction processing on the image to be compressed by using the image compression model to obtain second feature data of the target object;
and performing entropy coding processing on the second feature data to obtain the first feature data.
8. The method according to claim 7, wherein before the entropy encoding of the second feature data to obtain the first feature data, the method further comprises:
rounding the second feature data to obtain third feature data;
the performing entropy coding processing on the second feature data to obtain the first feature data includes:
and performing entropy coding processing on the third feature data to obtain the first feature data.
9. The method of any one of claims 1 to 8, wherein the target object comprises a human face.
10. An image compression apparatus, characterized in that the apparatus comprises:
an acquisition unit, configured to acquire an image to be compressed containing a target object, and an image compression model, wherein the image compression model is obtained based on first reconstruction loss training, the first reconstruction loss is obtained by performing weighted summation on one or more target collocated point pixel value differences, the weight of each target collocated point pixel value difference is in positive correlation with the probability that the target collocated point corresponding to the pixel value difference belongs to an area covered by the target object, a target collocated point is a pair of collocated points in a training image and a decompressed image, and the decompressed image is an image obtained by processing the training image by the image compression model;
and a processing unit, configured to process the image to be compressed by using the image compression model to obtain a compressed image of the image to be compressed.
11. An electronic device, comprising: a processor and a memory for storing computer program code comprising computer instructions which, when executed by the processor, cause the electronic device to perform the method of any of claims 1 to 9.
12. A computer-readable storage medium, in which a computer program is stored, which computer program comprises program instructions which, if executed by a processor, cause the processor to carry out the method of any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110341971.2A CN113096202B (en) | 2021-03-30 | 2021-03-30 | Image compression method and device, electronic equipment and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110341971.2A CN113096202B (en) | 2021-03-30 | 2021-03-30 | Image compression method and device, electronic equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113096202A true CN113096202A (en) | 2021-07-09 |
CN113096202B CN113096202B (en) | 2023-01-31 |
Family
ID=76670977
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110341971.2A Active CN113096202B (en) | 2021-03-30 | 2021-03-30 | Image compression method and device, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113096202B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107578375A (en) * | 2017-08-21 | 2018-01-12 | 北京陌上花科技有限公司 | Image processing method and device |
CN108062780A (en) * | 2017-12-29 | 2018-05-22 | 百度在线网络技术(北京)有限公司 | Method for compressing image and device |
CN108921926A (en) * | 2018-07-02 | 2018-11-30 | 广州云从信息科技有限公司 | A kind of end-to-end three-dimensional facial reconstruction method based on single image |
CN110084216A (en) * | 2019-05-06 | 2019-08-02 | 苏州科达科技股份有限公司 | Human face recognition model training and face identification method, system, equipment and medium |
CN110189385A (en) * | 2019-06-05 | 2019-08-30 | 京东方科技集团股份有限公司 | Model training and picture compression, decompressing method, device, medium and electronic equipment |
CN111047657A (en) * | 2019-11-26 | 2020-04-21 | 深圳壹账通智能科技有限公司 | Picture compression method, device, medium and electronic equipment |
CN111598238A (en) * | 2020-07-17 | 2020-08-28 | 支付宝(杭州)信息技术有限公司 | Compression method and device of deep learning model |
- 2021-03-30: application CN202110341971.2A filed in CN; granted as patent CN113096202B (en), status Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107578375A (en) * | 2017-08-21 | 2018-01-12 | 北京陌上花科技有限公司 | Image processing method and device |
CN108062780A (en) * | 2017-12-29 | 2018-05-22 | 百度在线网络技术(北京)有限公司 | Method for compressing image and device |
US20190206091A1 (en) * | 2017-12-29 | 2019-07-04 | Baidu Online Network Technology (Beijing) Co., Ltd | Method And Apparatus For Compressing Image |
CN108921926A (en) * | 2018-07-02 | 2018-11-30 | 广州云从信息科技有限公司 | A kind of end-to-end three-dimensional facial reconstruction method based on single image |
CN110084216A (en) * | 2019-05-06 | 2019-08-02 | 苏州科达科技股份有限公司 | Human face recognition model training and face identification method, system, equipment and medium |
CN110189385A (en) * | 2019-06-05 | 2019-08-30 | 京东方科技集团股份有限公司 | Model training and picture compression, decompressing method, device, medium and electronic equipment |
CN111047657A (en) * | 2019-11-26 | 2020-04-21 | 深圳壹账通智能科技有限公司 | Picture compression method, device, medium and electronic equipment |
CN111598238A (en) * | 2020-07-17 | 2020-08-28 | 支付宝(杭州)信息技术有限公司 | Compression method and device of deep learning model |
Also Published As
Publication number | Publication date |
---|---|
CN113096202B (en) | 2023-01-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113569791B (en) | Image processing method and device, processor, electronic device and storage medium | |
CN107330408B (en) | Video processing method and device, electronic equipment and storage medium | |
Li et al. | GaitSlice: A gait recognition model based on spatio-temporal slice features | |
CN110599395A (en) | Target image generation method, device, server and storage medium | |
CN113111842B (en) | Action recognition method, device, equipment and computer readable storage medium | |
CN112950471A (en) | Video super-resolution processing method and device, super-resolution reconstruction model and medium | |
CN110009018B (en) | Image generation method and device and related equipment | |
US20200285859A1 (en) | Video summary generation method and apparatus, electronic device, and computer storage medium | |
CN110264398B (en) | Image processing method and device | |
CN110969572B (en) | Face changing model training method, face exchange device and electronic equipment | |
CN109389076B (en) | Image segmentation method and device | |
CN115239860B (en) | Expression data generation method and device, electronic equipment and storage medium | |
CN113627314B (en) | Face image blurring detection method and device, storage medium and electronic equipment | |
CN113920023B (en) | Image processing method and device, computer readable medium and electronic equipment | |
CN113177483B (en) | Video object segmentation method, device, equipment and storage medium | |
CN113723310A (en) | Image identification method based on neural network and related device | |
CN113096202B (en) | Image compression method and device, electronic equipment and computer readable storage medium | |
CN111340085B (en) | Data processing method and device, processor, electronic equipment and storage medium | |
US20230267671A1 (en) | Apparatus and method for synchronization with virtual avatar, and system for synchronization with virtual avatar | |
CN113838159B (en) | Method, computing device and storage medium for generating cartoon images | |
CN116030201B (en) | Method, device, terminal and storage medium for generating multi-color hairstyle demonstration image | |
CN115984426B (en) | Method, device, terminal and storage medium for generating hairstyle demonstration image | |
CN118522441A (en) | Depression tendency detection method, storage medium and equipment based on multi-feature fusion | |
Wan et al. | Omnidirectional Image Quality Assessment With a Superpixel-Based Sparse Model | |
CN116129532A (en) | Model training method, image reconstruction device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |