CN112929666B - Method, device and equipment for training coding and decoding network and storage medium - Google Patents

Method, device and equipment for training coding and decoding network and storage medium

Info

Publication number
CN112929666B
Authority
CN
China
Prior art keywords
image
network
decoding
coding
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110303982.1A
Other languages
Chinese (zh)
Other versions
CN112929666A (en
Inventor
任文龙
倪煜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Cloud Network Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd
Priority to CN202110303982.1A
Publication of CN112929666A
Application granted
Publication of CN112929666B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application discloses a method, a device, equipment and a storage medium for training a coding and decoding network. The method comprises the following steps: acquiring a sample image when each round of training is performed; inputting the sample image into a coding and decoding network, and sequentially performing compression coding processing and decoding recovery processing on the sample image by using the coding and decoding network to obtain a decoded image corresponding to the sample image; determining an image loss value and a decoding loss value corresponding to the coding and decoding network according to the sample image and the decoded image corresponding to the sample image; determining a network loss value corresponding to the coding and decoding network according to the image loss value and the decoding loss value; and, after multiple rounds of training, determining that the coding and decoding network converges when the network loss value corresponding to the coding and decoding network is within a preset network convergence range. The application ensures the definition of the decoded image while controlling its distortion, so that the restored image is of high quality and the decoded image does not need to be repaired by a repair network.

Description

Method, device and equipment for training coding and decoding network and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for training a coding/decoding network.
Background
With the popularization of smart devices, more and more images are being produced. High-resolution images tend to have a larger data volume, and images with a large data volume occupy more resources (storage space and transmission resources) in both storage and transmission scenarios. Image compression techniques have been developed to address this problem. An image compression technique encodes the original image into compressed data with a small data volume; decoding the compressed data then restores it to the original image.
Currently, the commonly used image compression technologies are the WebP image compression technology and the BPG (Better Portable Graphics) image compression technology. WebP supports both lossy and lossless compression and adopts the VP8 encoding mode; many websites have adopted the WebP image format. BPG is a fee-based project with a very high usage cost. It adopts the HEVC (High Efficiency Video Coding) encoding mode, and a file in the BPG format is only about half the size of a file in the JPEG (Joint Photographic Experts Group) format at comparable image quality.
However, the WebP and BPG image compression technologies only consider how to compress an image into compressed data with a small data volume. They do not consider how to avoid serious distortion during image recovery, so the quality of the recovered image is not high.
Disclosure of Invention
The application provides a training method, a device, equipment and a storage medium of a coding and decoding network, which aim to solve the problem of image distortion in image recovery of the existing image compression technology.
In view of the above technical problems, the present application is implemented by the following technical solutions:
the embodiment of the application provides a training method of a coding and decoding network, which comprises the following steps: acquiring a sample image while performing each round of training; inputting the sample image into an encoding and decoding network, and sequentially performing compression encoding processing and decoding recovery processing on the sample image by using the encoding and decoding network to obtain a decoded image corresponding to the sample image; determining an image loss value and a decoding loss value corresponding to the coding and decoding network according to the sample image and a decoding image corresponding to the sample image; determining a network loss value corresponding to the coding and decoding network according to the image loss value and the decoding loss value corresponding to the coding and decoding network; and determining the convergence of the coding and decoding network when the network loss value corresponding to the coding and decoding network is in a preset network convergence range through multiple rounds of training.
Wherein, the determining the decoding loss value corresponding to the coding and decoding network according to the sample image and the decoding image corresponding to the sample image comprises: inputting the sample image and a decoded image corresponding to the sample image into a preset enhancement network; determining, by the enhancement network, a distortion rate of the sample image and a distortion rate of a decoded image corresponding to the sample image; and determining a decoding loss value corresponding to the coding and decoding network according to the distortion rate of the sample image and the distortion rate of the decoded image.
Wherein the acquiring a sample image comprises: acquiring at least one sample image; the inputting the sample image into a coding and decoding network, and sequentially performing compression coding processing and decoding recovery processing on the sample image by using the coding and decoding network to obtain a decoded image corresponding to the sample image, includes: generating a sample image matrix corresponding to the at least one sample image, each row vector in the sample image matrix being a one-dimensional sample image; inputting the sample image matrix into the coding and decoding network, and sequentially performing compression coding processing and decoding recovery processing on the sample image matrix by using the coding and decoding network to obtain a decoded image matrix, each row vector in the decoded image matrix being a one-dimensional decoded image corresponding to one sample image.
Wherein, the coding and decoding network comprises: an image encoding network and an image decoding network; the output end of the image coding network is connected with a preset storage device; the input end of the image decoding network is connected with the storage device; the inputting the sample image into a coding and decoding network, and sequentially performing compression coding processing and decoding recovery processing on the sample image by using the coding and decoding network to obtain a decoded image corresponding to the sample image, includes: inputting the sample image into the image coding network, performing compression coding processing on the sample image by using the image coding network to obtain compressed data corresponding to the sample image and outputting the compressed data corresponding to the sample image to the storage device; acquiring compressed data corresponding to the sample image from the storage device through the image decoding network; and performing decoding recovery processing on the compressed data corresponding to the sample image by using the image decoding network to obtain a decoded image corresponding to the sample image.
After determining that the coding and decoding network converges, the method further comprises: receiving an image storage instruction, the image storage instruction being used for indicating to store a first target image; inputting the first target image into the coding and decoding network, and selecting the image coding network in the coding and decoding network according to the image storage instruction; and performing compression coding processing on the first target image by using the image coding network to obtain compressed data corresponding to the first target image, and outputting the compressed data to the storage device so as to store the first target image.
Wherein the image storage instruction is used for indicating to store at least one first target image; the inputting the first target image into the coding and decoding network comprises: generating a first target image matrix corresponding to the at least one first target image, each row vector in the first target image matrix being a one-dimensional first target image; and inputting the first target image matrix into the coding and decoding network; the performing, by using the image coding network, compression coding processing on the first target image to obtain compressed data corresponding to the first target image includes: performing compression coding processing on the first target image matrix by using the image coding network to obtain a compressed data matrix corresponding to the first target image matrix, each row vector in the compressed data matrix being the compressed data corresponding to one first target image.
After determining that the coding and decoding network converges, the method further comprises: receiving an image reading instruction; the image reading instruction is used for indicating to read a second target image; acquiring compressed data corresponding to the second target image from the storage device according to the image reading instruction, and inputting the compressed data corresponding to the second target image into the coding and decoding network; and selecting an image decoding network in the coding and decoding network according to the image reading instruction, and executing decoding recovery processing on the compressed data corresponding to the second target image by using the image decoding network to obtain a decoded image corresponding to the second target image.
Wherein the image reading instruction is used for indicating to read at least one second target image; before the compressed data corresponding to the second target image is input into the codec network, the method further includes: generating a compressed data matrix according to the compressed data respectively corresponding to the at least one second target image; each row vector in the compressed data matrix corresponds to one second target image; inputting the compressed data matrix into the coding and decoding network; the decoding recovery processing is performed on the compressed data corresponding to the second target image by using the image decoding network to obtain a decoded image corresponding to the second target image, and the decoding recovery processing includes: utilizing the image decoding network to perform decoding recovery processing on the compressed data matrix to obtain a decoded image matrix corresponding to the compressed data matrix; each row vector in the decoded image matrix is a one-dimensional decoded image corresponding to one second target image.
When the network loss value corresponding to the coding and decoding network is in a preset network convergence range after multiple rounds of training, determining that the coding and decoding network converges comprises the following steps: in the training of continuous preset rounds, when the network loss values corresponding to the coding and decoding network are all in the network convergence range, determining that the coding and decoding network converges; or, after multiple rounds of training, when the network loss value corresponding to the coding and decoding network is in the network convergence range for the first time, determining that the coding and decoding network converges.
The embodiment of the present application further provides a training apparatus for a coding and decoding network, including: the acquisition module is used for acquiring a sample image when each round of training is executed; the coding and decoding module is used for inputting the sample images into a coding and decoding network, and sequentially executing compression coding processing and decoding recovery processing on the sample images by using the coding and decoding network to obtain decoded images corresponding to the sample images; a first determining module, configured to determine an image loss value and a decoding loss value corresponding to the coding and decoding network according to the sample image and a decoded image corresponding to the sample image; a second determining module, configured to determine a network loss value corresponding to the coding and decoding network according to the image loss value and the decoding loss value corresponding to the coding and decoding network; and the third determining module is used for determining the convergence of the coding and decoding network when the network loss value corresponding to the coding and decoding network is in a preset network convergence range after multiple rounds of training.
The embodiment of the application also provides electronic equipment which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus; a memory for storing a computer program; and a processor for implementing the steps of the method for training the codec network described in any one of the above when executing the program stored in the memory.
The embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for training a codec network described in any one of the above.
Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:
In the embodiment of the application, the coding and decoding network performs compression coding and decoding recovery on the image. In the process of training the coding and decoding network, attention is paid not only to the image loss value of the whole coding and decoding network but also to the decoding loss value within the coding and decoding network. The image loss value measures the coding and decoding loss, and the decoding loss value supervises the decoding accuracy. Evaluating the coding and decoding effect of the network against both the image loss value and the decoding loss value ensures the definition of the decoded image while controlling its distortion, so that the quality of the recovered image is higher and the decoded image does not need to be repaired by a repair network.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It will be apparent to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flow chart of a method of training a codec network according to an embodiment of the present application;
FIG. 2 is a schematic diagram of the operation of an autoencoder according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a multi-layer AE network according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a training network structure of a codec network according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating the training steps of a codec network according to an embodiment of the present invention;
FIG. 6 is a flowchart of the steps of image storage according to one embodiment of the present invention;
FIG. 7 is a flowchart of the steps of image reading according to one embodiment of the present invention;
FIG. 8 is a block diagram of a training apparatus of a codec network according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 10 is a diagram illustrating image access in duplex communication mode according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making creative efforts shall fall within the protection scope of the present application.
The embodiment of the application provides a training method of an encoding and decoding network. Fig. 1 is a flowchart of a training method for a coding/decoding network according to an embodiment of the present disclosure.
Step S110, acquiring a sample image when each round of training is performed.
The sample image refers to an original image used for training the coding and decoding network.
A preset sample image set is acquired, the sample image set including a plurality of sample images; in each round of training, at least one sample image is acquired from the sample image set.
Step S120, inputting the sample image into a coding and decoding network, and sequentially performing compression coding processing and decoding recovery processing on the sample image by using the coding and decoding network to obtain a decoded image corresponding to the sample image.
The coding and decoding network is used for performing compression coding processing on an image to obtain compressed data corresponding to the image, and for performing decoding recovery processing on the compressed data to obtain a decoded image corresponding to the image. The coding and decoding network is a deep neural network. Further, the coding and decoding network may be a CED (Compression Encode and Decode) network, a VAE (Variational Auto-Encoder), a DAE (Deep Auto-Encoder), or a feature extraction network.
The compression coding processing compresses the image; the data volume of the resulting compressed data is smaller than that of the sample image.
The decoding recovery processing restores the compressed data to an image.
The decoded image refers to an image restored by the coding and decoding network from the compressed data.
Specifically, the at least one sample image obtained from the sample image set is input into a coding and decoding network, and the coding and decoding network is used to perform compression coding processing and decoding recovery processing on the at least one sample image, so as to obtain a decoded image corresponding to each sample image.
Further, when at least one sample image is acquired, a sample image matrix corresponding to the at least one sample image is generated, each row vector in the sample image matrix being a one-dimensional sample image; the sample image matrix is input into the coding and decoding network, and the coding and decoding network sequentially performs compression coding processing and decoding recovery processing on the sample image matrix to obtain a decoded image matrix, each row vector in the decoded image matrix being a one-dimensional decoded image corresponding to one sample image.
Generating a sample image matrix corresponding to the at least one sample image, including: converting each sample image in the at least one sample image into a one-dimensional image vector, and generating a sample image matrix according to the one-dimensional image vectors corresponding to the at least one sample image; each row vector in the sample image matrix represents a one-dimensional image vector corresponding to one sample image. The multidimensional sample image can be converted into a one-dimensional image vector by using a preset dimension conversion algorithm. The dimension conversion algorithm includes, but is not limited to: matrix () function algorithm.
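By way of illustration only, the generation of the sample image matrix described above can be sketched as follows. The sketch assumes that each sample image is a two-dimensional grayscale image held as nested Python lists and that flattening is row-major; the names `flatten_image` and `build_sample_matrix` are hypothetical and do not come from the application, which names only a matrix () function as one possible dimension conversion algorithm.

```python
from typing import List

Image = List[List[float]]  # assumed layout: a 2-D grayscale image as nested lists


def flatten_image(image: Image) -> List[float]:
    """Convert one 2-D image into a one-dimensional image vector (row-major)."""
    return [pixel for row in image for pixel in row]


def build_sample_matrix(images: List[Image]) -> List[List[float]]:
    """Stack the flattened images so that each row vector is one sample image."""
    if not images:
        raise ValueError("at least one sample image is required")
    vectors = [flatten_image(img) for img in images]
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all sample images must have the same number of pixels")
    return vectors


# Two hypothetical 2x2 sample images become a 2x4 sample image matrix.
batch = [
    [[0.0, 0.1], [0.2, 0.3]],
    [[0.9, 0.8], [0.7, 0.6]],
]
sample_matrix = build_sample_matrix(batch)
```

Each row of `sample_matrix` then corresponds to exactly one sample image, which is the property the batch processing relies on.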
In order to visualize the decoded image, after the decoded image matrix is obtained, the row vector corresponding to each sample image in the decoded image matrix may be converted into a multidimensional array according to the dimensions of that sample image, so as to obtain a multidimensional decoded image corresponding to the sample image.
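As a minimal sketch of this conversion, assuming a two-dimensional image whose height and width are known from the corresponding sample image and assuming row-major order, one row vector of the decoded image matrix can be reshaped as shown below. The function name `unflatten_image` is hypothetical.

```python
from typing import List


def unflatten_image(vector: List[float], height: int, width: int) -> List[List[float]]:
    """Rebuild a height x width image from a one-dimensional row vector (row-major)."""
    if len(vector) != height * width:
        raise ValueError("vector length does not match the requested image shape")
    return [vector[r * width:(r + 1) * width] for r in range(height)]


# One hypothetical row of a decoded image matrix, restored to a 2x3 image.
decoded_row = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
decoded_image = unflatten_image(decoded_row, height=2, width=3)
```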
Step S130, determining an image loss value and a decoding loss value corresponding to the coding and decoding network according to the sample image and the decoding image corresponding to the sample image.
The image loss value is used for measuring the degree of difference (coding and decoding loss) between the sample image and the decoded image.
The decoding loss value is used for measuring the decoding accuracy of the coding and decoding network.
The lower the coding and decoding loss and the higher the decoding accuracy, the clearer the decoded image output by the coding and decoding network and the lower its distortion rate. The higher the coding and decoding loss and the lower the decoding accuracy, the less clear the decoded image output by the coding and decoding network and the higher its distortion rate.
How the image loss value and the decoding loss value are determined is described in detail later and is therefore not repeated here.
Step S140, determining the network loss value corresponding to the coding and decoding network according to the image loss value and the decoding loss value corresponding to the coding and decoding network.
The sum or the weighted sum of the image loss value and the decoding loss value corresponding to the coding and decoding network is determined as the network loss value corresponding to the coding and decoding network.
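The combination of the two loss values can be sketched as follows. This is an illustration under stated assumptions, not the method of the application: the image loss is assumed here to be a mean squared error between pixel vectors, the decoding loss is passed in as a placeholder number, and the weights and the function names `mean_squared_error` and `network_loss` are hypothetical.

```python
from typing import Sequence


def mean_squared_error(sample: Sequence[float], decoded: Sequence[float]) -> float:
    """Assumed image loss: mean squared difference between two pixel vectors."""
    if len(sample) != len(decoded) or not sample:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((s - d) ** 2 for s, d in zip(sample, decoded)) / len(sample)


def network_loss(image_loss: float, decoding_loss: float,
                 image_weight: float = 1.0, decoding_weight: float = 1.0) -> float:
    """Weighted sum of the two loss values; with both weights at 1.0 it is the plain sum."""
    return image_weight * image_loss + decoding_weight * decoding_loss


sample_vector = [0.0, 1.0, 1.0, 0.0]
decoded_vector = [0.0, 1.0, 0.5, 0.5]
image_loss = mean_squared_error(sample_vector, decoded_vector)
decoding_loss = 0.25  # placeholder value; its computation is not covered by this sketch

plain_sum = network_loss(image_loss, decoding_loss)
weighted = network_loss(image_loss, decoding_loss, image_weight=0.5, decoding_weight=2.0)
```

The weights let one loss value dominate the other; how they would be chosen in practice is not stated in this passage.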
Step S150, after multiple rounds of training, determining that the coding and decoding network converges when the network loss value corresponding to the coding and decoding network is within a preset network convergence range.
The network convergence range is used for measuring whether the network loss value of the coding and decoding network meets the application requirement. The two end values of the network convergence range may be empirical values or experimental values.
The network convergence range can control the codec loss of the codec network and the decoding accuracy.
Specifically, when the network loss values corresponding to the coding and decoding network are all within the network convergence range over a preset number of consecutive training rounds, it may be determined that the coding and decoding network converges; or, after multiple rounds of training, when the network loss value corresponding to the coding and decoding network falls within the network convergence range for the first time, it is determined that the coding and decoding network converges. When the network loss values of the coding and decoding network over multiple rounds of training are all within the preset network convergence range, the loss value of the coding and decoding network meets the application requirements and tends to be stable.
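The two convergence criteria can be sketched as follows. The function names, the loss history, and the bounds of the convergence range are hypothetical values chosen for illustration; the application states only that the end values of the range may be empirical or experimental.

```python
from typing import Iterable, Optional


def first_round_in_range(losses: Iterable[float], lower: float, upper: float) -> Optional[int]:
    """Return the 1-based round at which the loss first falls inside [lower, upper]."""
    for round_index, loss in enumerate(losses, start=1):
        if lower <= loss <= upper:
            return round_index
    return None


def first_round_after_consecutive(losses: Iterable[float], lower: float, upper: float,
                                  required_rounds: int) -> Optional[int]:
    """Return the 1-based round that completes `required_rounds` consecutive in-range losses."""
    streak = 0
    for round_index, loss in enumerate(losses, start=1):
        streak = streak + 1 if lower <= loss <= upper else 0
        if streak >= required_rounds:
            return round_index
    return None


# Hypothetical network loss values recorded over seven training rounds.
loss_history = [0.90, 0.40, 0.08, 0.30, 0.07, 0.06, 0.05]
converged_first = first_round_in_range(loss_history, 0.0, 0.1)
converged_stable = first_round_after_consecutive(loss_history, 0.0, 0.1, required_rounds=3)
```

With this history the first-time criterion is met at round 3, while the consecutive criterion with three rounds is met only at round 7, because the loss leaves the range again at round 4. The consecutive criterion is therefore the stricter of the two.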
In this embodiment, the coding and decoding network performs compression coding and decoding recovery on the image. In the process of training the coding and decoding network, attention is paid not only to the image loss value of the whole coding and decoding network but also to the decoding loss value within the coding and decoding network. The image loss value measures the coding and decoding loss, and the decoding loss value supervises the decoding accuracy. Evaluating the coding and decoding effect of the network against both the image loss value and the decoding loss value ensures the definition of the decoded image while controlling its distortion, so that the quality of the recovered image is higher and the decoded image does not need to be repaired by a repair network. A repair network is a network used to repair a distorted image.
In this embodiment, the coding and decoding network may be deployed in a high-speed arithmetic unit. Further, to obtain a better compression effect and sufficiently high encoding and decoding efficiency, the embodiment of the present application may use a trained deep neural network in combination with a high-speed arithmetic unit to compress and decode images. Moreover, because a deep neural network can process matrices, the embodiment of the application can compress and decode images in batches.
In order to make the embodiments of the present application clearer, a network structure of a codec network is described below.
The coding and decoding network comprises an image coding network and an image decoding network.
The image coding network is used for performing compression coding processing on an image to obtain compressed data.
The image decoding network is used for performing decoding recovery processing on the compressed data to restore the compressed data to an image.
In the embodiment of the application, the coding and decoding network is a deep neural network. The deep neural network compresses an image by extracting features from it: the fewer features extracted, the more the image is compressed; the more features extracted, the less the image is compressed. Such features include, but are not limited to: color features, texture features, shape features, and spatial relationship features.
The image encoding network includes a plurality of encoding layers and the image decoding network includes a plurality of decoding layers. The coding layer and the decoding layer are symmetric, namely: the number of the coding layers is the same as that of the decoding layers, and the architecture of the coding layers is the same as that of the decoding layers.
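One way to picture this symmetry is to derive the decoding layers by mirroring the coding layers. The sketch below assumes fully connected layers described only by their input and output widths; the widths and the helper name `layer_shapes` are hypothetical, and mirroring is one reading of the symmetry rather than a detail stated in the application.

```python
from typing import List, Tuple


def layer_shapes(widths: List[int]) -> List[Tuple[int, int]]:
    """Pair consecutive widths into (input_width, output_width) layer shapes."""
    return list(zip(widths, widths[1:]))


# Hypothetical widths: the coding layers shrink the representation step by step.
encoder_widths = [784, 256, 64, 16]
# Assumed mirroring: the decoding layers expand it again in reverse order.
decoder_widths = list(reversed(encoder_widths))

encoder_layers = layer_shapes(encoder_widths)
decoder_layers = layer_shapes(decoder_widths)
```

Under this assumption the number of coding layers equals the number of decoding layers, and each decoding layer has the transposed shape of its counterpart coding layer.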
Codec networks may be applied in various application scenarios. For example: the codec network can be applied in an image storage scenario, and the codec network can be applied in an image transmission scenario.
In an image storage scene, the output end of the image coding network is connected with a preset storage device; the input end of the image decoding network is connected with the storage device. Inputting the sample image into the image coding network, performing compression coding processing on the sample image by using the image coding network to obtain compressed data corresponding to the sample image and outputting the compressed data corresponding to the sample image to the storage device; acquiring compressed data corresponding to the sample image from the storage device through the image decoding network; and performing decoding recovery processing on the compressed data corresponding to the sample image by using the image decoding network to obtain a decoded image corresponding to the sample image.
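The data flow of the image storage scene can be sketched as follows. Everything here is a stand-in chosen for illustration: a dictionary plays the role of the preset storage device, and `toy_encode` and `toy_decode` are hypothetical placeholders for the image coding network and the image decoding network (they simply drop and repeat every second value and are not neural networks).

```python
from typing import Dict, List

# Stand-in for the preset storage device connected to both networks.
storage: Dict[str, List[float]] = {}


def toy_encode(image_vector: List[float]) -> List[float]:
    """Placeholder for the image coding network: keep every second value."""
    return image_vector[::2]


def toy_decode(compressed: List[float]) -> List[float]:
    """Placeholder for the image decoding network: repeat each stored value twice."""
    restored: List[float] = []
    for value in compressed:
        restored.extend([value, value])
    return restored


def store_image(image_id: str, image_vector: List[float]) -> None:
    """Encode the image and write only the compressed data to the storage device."""
    storage[image_id] = toy_encode(image_vector)


def read_image(image_id: str) -> List[float]:
    """Fetch the compressed data from the storage device and decode it."""
    return toy_decode(storage[image_id])


store_image("sample-1", [0.1, 0.1, 0.6, 0.6])
restored_image = read_image("sample-1")
```

The point of the sketch is that only the compressed data passes through the storage device; the decoded image is produced on the reading side.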
In an image transmission scene, the output end of the image coding network is connected with a preset communication transmitter; and the input end of the image decoding network is connected with a preset communication receiver. Inputting the sample image into the image coding network, performing compression coding processing on the sample image by using the image coding network to obtain compressed data corresponding to the sample image, and outputting the compressed data corresponding to the sample image to the communication transmitter; receiving compressed data corresponding to the sample image through the communication receiver and inputting the compressed data into the image decoding network; and performing decoding recovery processing on the compressed data corresponding to the sample image by using the image decoding network to obtain a decoded image corresponding to the sample image.
Specifically, the codec network may be constructed using an AE (Auto-Encoder) network. The encoding layer and the decoding layer may be AE layers. Of course, other network or encoder configurations may be chosen.
The AE network comprises two processes: compression encoding f(X) and decoding recovery g(Y).
f(X) refers to the operation that takes an image from the input layer to the feature extraction layer: Y = f(X) = S(WX + b), where X represents the image input to the AE network, W and b are preset encoding parameters, S is an activation function, and Y is the extracted feature vector.
g(Y) refers to the operation that restores the feature vector to the output layer: Z = g(Y) = S'(W'Y + b'), where W' and b' are decoding parameters, S' is an activation function, and Z is the decoded image. The closer image Z is to image X, the higher the quality of image Z.
Fig. 2 is a schematic diagram illustrating the operation of an auto-encoder according to an embodiment of the present application. When the dimension of Y is lower than that of X, the auto-encoder acts as an image encoding network, as shown in the left diagram of Fig. 2; when the dimension of Y is higher than that of X, the auto-encoder acts as an image decoding network, as shown in the right diagram of Fig. 2.
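As a minimal sketch of the two AE operations Y = S(WX + b) and Z = S'(W'Y + b'), the following Python fragment runs one encoding step and one decoding step on a small vector. The sigmoid activation, the 4-to-2 dimension choice, and all parameter values are assumptions made for illustration only; the embodiment does not prescribe them.

```python
import math

def sigmoid(v):
    """Element-wise logistic activation, used here for both S and S'."""
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def affine(W, x, b):
    """Compute W x + b for a matrix W (list of rows) and vectors x, b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def encode(X, W, b):
    """Compression encoding: Y = f(X) = S(W X + b)."""
    return sigmoid(affine(W, X, b))

def decode(Y, W_dec, b_dec):
    """Decoding recovery: Z = g(Y) = S'(W' Y + b')."""
    return sigmoid(affine(W_dec, Y, b_dec))

# Hypothetical parameters: a 4-dimensional input compressed to 2 dimensions.
X = [0.1, 0.9, 0.4, 0.7]
W = [[0.5, -0.2, 0.1, 0.3],
     [-0.4, 0.6, 0.2, -0.1]]
b = [0.0, 0.1]
# For simplicity the decoding weights are the transpose of W (an assumption).
W_dec = [[0.5, -0.4], [-0.2, 0.6], [0.1, 0.2], [0.3, -0.1]]
b_dec = [0.0, 0.0, 0.0, 0.0]

Y = encode(X, W, b)          # lower dimension than X: encoding direction
Z = decode(Y, W_dec, b_dec)  # same dimension as X: decoding direction
print(len(X), len(Y), len(Z))  # prints: 4 2 4
```

With untrained parameters Z will not resemble X; training adjusts W, b, W' and b' so that Z approaches X.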
In order to maximize compression, multiple layers of AE networks may be used in combination to extract and restore features in an image multiple times, thereby forming a compression codec network (CED).
To illustrate the specific structure of the present embodiment, an image encoding network composed of two encoding AEs (AE1, AE2) and an image decoding network composed of two decoding AEs (AE3, AE4) will be described below. Fig. 3 is a schematic structural diagram of a multi-layer AE network according to an embodiment of the present application.
When an image is compressed and encoded, the image encoding network is used. From left to right, the first layer is the input layer of the network, which receives the X vector (one-dimensional image vector) generated from the image. The AE1 network compresses the X vector into the h1 vector, which is the feature vector extracted by AE1. The h1 vector is input into the next-layer AE2 network, which extracts features further and outputs a feature vector h2 of lower dimension; at this point the feature vector h1 has been compressed into the feature vector h2. These steps are repeated to finally obtain the low-dimensional coded Y vector.
When the image is decompressed and restored, the image decoding network is used, that is, the feature vector Y is restored to the image X. The AE3 network decodes the low-dimensional coded Y vector into the h3 vector; the h3 vector is input into the next-layer AE4 network and decoded into the h4 vector; the decoded image vector Z is finally obtained, thereby restoring the image X.
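The dimension flow through the four AE layers can be sketched as follows. This is an illustrative assumption, not the network of the embodiment: the layer sizes (8, 4, 2, 4, 8), the random initial weights, and the helper names `make_layer` and `apply_layer` are hypothetical.

```python
import math
import random

def make_layer(in_dim, out_dim, rng):
    """Create one AE layer as (weights, bias) with small random weights."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return W, [0.0] * out_dim

def apply_layer(layer, x):
    """Apply S(Wx + b) with a sigmoid activation."""
    W, b = layer
    return [1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + bi)))
            for row, bi in zip(W, b)]

rng = random.Random(0)
# Encoder: AE1 (8 -> 4), AE2 (4 -> 2). Decoder: AE3 (2 -> 4), AE4 (4 -> 8).
AE1, AE2 = make_layer(8, 4, rng), make_layer(4, 2, rng)
AE3, AE4 = make_layer(2, 4, rng), make_layer(4, 8, rng)

X = [i / 8 for i in range(8)]   # one-dimensional image vector
h1 = apply_layer(AE1, X)        # features extracted by AE1
Y = apply_layer(AE2, h1)        # h2, the low-dimensional code
h3 = apply_layer(AE3, Y)        # first decoding step
Z = apply_layer(AE4, h3)        # h4, the decoded image vector
print([len(v) for v in (X, h1, Y, h3, Z)])  # prints: [8, 4, 2, 4, 8]
```

Only the vector Y needs to be stored or transmitted; the decoder layers recover a vector of the original dimension from it.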
The following description will be made taking a storage scenario as an example.
High-resolution images tend to occupy a large amount of storage space. To store them, the common practice is to purchase a storage device with a larger capacity, which necessarily costs more. Furthermore, artificial intelligence technology plays an increasingly important role in the field of computer vision, and deep learning performs well in image classification and image reconstruction; however, training a machine vision model often requires a large number of images, which in turn require more storage space. Therefore, in order to save storage space and reduce capital investment in storage equipment, the coding and decoding network of the application can be applied to a storage scene.
Firstly, a coding and decoding network is built, the input of the whole coding and decoding network is an original image to be compressed (a sample image in a training stage), and the original image is coded into a low-dimensional vector by an image coding network in the coding and decoding network and is stored. The image decoding network in the codec network can restore the low-dimensional vectors to the desired image, so that storing the low-dimensional vectors is equivalent to storing the original image.
Next, the codec network is built into a high-speed arithmetic unit. When an image is written, the original image is written into the high-speed arithmetic unit, and the codec network computes the low-dimensional vector, which is then stored; when an image is read, the low-dimensional vector is read into the high-speed arithmetic unit and decoded into the original image by the codec network. Experimental comparison shows that this embodiment can not only compress and store images in batches but also obtain a higher compression ratio and higher storage and reading efficiency.
The codec network needs to be trained before application, and the training network of the codec network of the present application is further described below. Fig. 4 is a schematic diagram of a training network structure of a codec network according to an embodiment of the present invention.
The training network of the coding and decoding network comprises: an image encoding network, an image decoding network, a storage device and an enhancement network.
The output of the image coding network is connected to the input of the storage device. The input of the image decoding network is connected with the output of the storage device. The input of the image coding network and the output of the image decoding network are both connected to the input of the enhancement network.
In practical application, compressing an image into a low-dimensional vector is not complicated; restoring the image is more complicated. That is, when the low-dimensional feature vector Y read from the storage device is decoded into the image Z, the image is often not clear enough and the distortion is large. In order to enhance the recovery capability of the codec network, so that the restored image is more realistic and clear, the embodiment provides an enhancement network, which is used for determining the decoding loss value of the codec network.
Further, the enhancement network may employ an LSGAN (Least Squares Generative Adversarial Networks) network, but other networks may also be employed.
Through the adversarial game between original images (sample images) and decoded images, the LSGAN network improves the quality of the decoded images, thereby enhancing the decoded images output by the codec network, minimizing the distance between the real distribution and the generated distribution, and improving decoding accuracy.
When the coding and decoding network is trained, the coding and decoding network and the enhancement network are trained simultaneously so as to pay attention to the image loss value and the decoding loss value of the coding and decoding network simultaneously.
The basic principle of this embodiment is to use the enhancement network as a discriminator, which discriminates whether the output decoded image is a sample image. If the discriminator can tell the decoded image from the sample image, the decoding accuracy of the image decoding network is not high and the decoded image is heavily distorted; the codec network is then adjusted, so that the image decoding network is supervised and generates better images. This continues until the discriminator cannot discriminate whether the image output by the codec network is a restored image or an original sample image. At this point a better codec network has been obtained, and the sample image can be restored to the maximum extent.
Further, when the codec network is trained, in order to prevent the discriminator from distinguishing that the image Z is a decoded and restored image, the distortion of the image Z relative to the image X needs to be as small as possible, so that by continuously training the codec network and the enhancement network at the same time, the image output by the codec network becomes clearer and clearer, and the distortion degree becomes lower and lower.
The following describes the training method of the codec network according to the embodiment of the present application with respect to the above-mentioned training network structure diagram. Fig. 5 is a flowchart illustrating the training procedure of the codec network according to an embodiment of the present invention.
Step S510, a sample image is acquired.
Step S520, inputting the sample image into the image coding network, performing compression coding processing on the sample image by using the image coding network to obtain compressed data corresponding to the sample image, and outputting the compressed data corresponding to the sample image to the storage device.
The image coding network and the image decoding network are two parallel sub-networks in the coding and decoding network. Since the compression encoding process uses only the image encoding network and the decoding restoration process uses only the image decoding network, the image encoding network or the image decoding network can be selected among the encoding and decoding networks by sub-network parameter selection. When the sub-network parameters corresponding to the image coding network are selected (for example, all the sub-network parameters corresponding to the image decoding network are set to zero), the input data is processed only in the image coding network, and the image decoding network does not work. When the sub-network parameters corresponding to the image decoding network are selected (for example, all the sub-network parameters corresponding to the image coding network are set to zero), the input data is processed only in the image decoding network, and the image coding network does not work.
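The selection between the two sub-networks can be illustrated with a small sketch. Note the substitution: the text above selects a sub-network through its parameters (for example by zeroing the other sub-network's parameters), whereas this sketch uses a plain mode flag to show the same routing behaviour. The class name `CodecNetwork`, the method `select`, and the stand-in encoder and decoder are all hypothetical.

```python
class CodecNetwork:
    """Toy container with an encoding sub-network and a decoding sub-network."""

    def __init__(self, encoder, decoder):
        self.encoder = encoder
        self.decoder = decoder
        self.mode = "encode"

    def select(self, mode):
        """Choose which sub-network processes the next input."""
        if mode not in ("encode", "decode"):
            raise ValueError("mode must be 'encode' or 'decode'")
        self.mode = mode

    def __call__(self, data):
        # Only the selected sub-network processes the input; the other is idle.
        return self.encoder(data) if self.mode == "encode" else self.decoder(data)

# Stand-in sub-networks: keep every second value / repeat each value twice.
net = CodecNetwork(encoder=lambda x: x[::2],
                   decoder=lambda y: [v for v in y for _ in range(2)])

net.select("encode")
code = net([1, 2, 3, 4])
net.select("decode")
restored = net(code)
print(code, restored)  # prints: [1, 3] [1, 1, 3, 3]
```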
Step S530, obtaining compressed data corresponding to the sample image from the storage device through the image decoding network; and performing decoding recovery processing on the compressed data corresponding to the sample image by using the image decoding network to obtain a decoded image corresponding to the sample image.
Step S540, determining a loss value between the sample image and the decoded image corresponding to the sample image as an image loss value of the codec network by using a preset loss function.
The categories of loss functions include, but are not limited to: a cross entropy loss function and an average error function.
For example, when a mean error function is used, the image loss value L_CED(X, Z) can be calculated as follows:
[Formula image BDA0002987380510000081, defining L_CED(X, Z), is not reproduced in this text.]
where Ω denotes a penalty term, which prevents the codec network from overfitting during training; X is a sample image, Z is a decoded image, and N is the number of sample images.
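Because the formula image is not available in this text, the following sketch only illustrates one plausible shape of such a loss: a mean squared reconstruction error over the N sample images plus a penalty term Ω. Both the squared-error form and the L2 weight penalty used for Ω are assumptions; the function name `image_loss` and its parameters are hypothetical.

```python
def image_loss(X_batch, Z_batch, weights=(), penalty_coeff=0.0):
    """Mean squared reconstruction error over N samples plus a penalty term.

    Assumed form: (1/N) * sum_i ||X_i - Z_i||^2 + Omega,
    with Omega = penalty_coeff * sum of squared weights (an L2 penalty).
    """
    n = len(X_batch)
    mse = sum(sum((x - z) ** 2 for x, z in zip(X, Z))
              for X, Z in zip(X_batch, Z_batch)) / n
    omega = penalty_coeff * sum(w * w for w in weights)
    return mse + omega

# Two one-dimensional sample images and their decoded counterparts.
X_batch = [[1.0, 0.0], [0.0, 1.0]]
Z_batch = [[0.5, 0.0], [0.0, 0.5]]

loss_plain = image_loss(X_batch, Z_batch)
loss_penalized = image_loss(X_batch, Z_batch, weights=[1.0, 2.0], penalty_coeff=0.01)
print(loss_plain, loss_penalized)
```

The penalty grows with the magnitude of the network weights, so the penalized value is larger than the plain reconstruction error.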
Step S550, inputting the sample image and the decoded image corresponding to the sample image into a preset enhancement network; determining, by the enhancement network, a distortion rate of the sample image and a distortion rate of a decoded image corresponding to the sample image; and determining a decoding loss value corresponding to the coding and decoding network according to the distortion rate of the sample image and the distortion rate of the decoded image.
The enhancement network may be built as a discriminator, and the discriminator may employ an LSGAN network. The discriminator scores the sample image and the decoded image separately. The score represents the distortion rate between an input image (sample image or decoded image) and the real image. The score D lies between 0 and 1. The closer the score is to 1, the closer the input image is to the real image; the closer the score is to 0, the less the input image resembles the real image.
The real image is an image input to the codec network. In the present embodiment, the real image is a sample image.
Further, the present embodiment trains the codec network and the discriminator (enhancement network) synchronously. When training is started, the decoded image is greatly different from the sample image, the score of the discriminator on the sample image is close to or equal to 1, and the score on the decoded image is close to or equal to 0. A score of 1 for the sample image indicates that the discriminator can recognize the real image, and a score of 0 for the decoded image indicates that the decoded image is not the real image. Therefore, the discriminator can monitor the image decoding network, the discrimination result can measure the decoding accuracy of the image decoding network, the decoding capability of the image decoding network is improved, meanwhile, the decoded image output by the image decoding network can promote the training of the discriminator, and the discrimination capability of the discriminator on the input image is improved. With the training, the correctness of the image decoding network will be higher and higher, that is, the decoded image and the sample image are more and more difficult to distinguish, until the scores of the sample image and the decoded image by the discriminator are infinitely close, that is, the decoded image output by the image decoding network can be regarded as a real image, so that the image recovery capability of the encoding and decoding network is optimal.
The decoding loss value can measure the decoding correctness degree of the image decoding network. For example: the function used to calculate the decoding loss value may be as follows:
[Formula image BDA0002987380510000091, defining L_LSGAN(D, X, Z), is not reproduced in this text.]
where D is the discriminator; X is a sample image; Z is a decoded image; D(X) is the score given by the discriminator to the sample image; D(Z) is the score given by the discriminator to the decoded image; N is the number of sample images; a and b are preset parameters.
Further, a = -1 and b = 1 may be set, in which case the function used to calculate the decoding loss value is equivalent to the Pearson chi-squared divergence function.
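Since the formula image for the decoding loss is not reproduced here, the sketch below assumes the standard least-squares GAN discriminator objective, in which b is the target score for sample images and a is the target score for decoded images. This form is an assumption and may differ from the formula in the original drawing; the function name `decoding_loss` is hypothetical.

```python
def decoding_loss(scores_sample, scores_decoded, a=-1.0, b=1.0):
    """Least-squares loss over discriminator scores.

    Assumed form:
        1/(2N) * sum_i (D(X_i) - b)^2  +  1/(2N) * sum_i (D(Z_i) - a)^2
    where D(X_i) are scores of sample images and D(Z_i) scores of decoded images.
    """
    n = len(scores_sample)
    sample_term = sum((d - b) ** 2 for d in scores_sample) / (2 * n)
    decoded_term = sum((d - a) ** 2 for d in scores_decoded) / (2 * n)
    return sample_term + decoded_term

# Scores that hit both targets exactly give zero loss.
perfect = decoding_loss([1.0, 1.0], [-1.0, -1.0])
# With targets a = 0 and b = 1, a sample image scored 0.5 is penalized.
partial = decoding_loss([0.5], [0.0], a=0.0, b=1.0)
print(perfect, partial)  # prints: 0.0 0.125
```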
Step S560, determining a network loss value corresponding to the coding and decoding network according to the image loss value and the decoding loss value corresponding to the coding and decoding network.
The image loss value and the decoding loss value are combined to form a network loss value L. The network loss value L is an index for judging whether the coding and decoding network reaches the optimal value. The optimal coding and decoding network means that the compression ratio of the image coding network is optimal, and the recovery effect of the image decoding network is optimal.
For example, the function used to calculate the network loss value L may be as follows:
$$L = L_{CED}(X, Z) + \lambda \cdot L_{LSGAN}(D, X, Z)$$
where λ is a weight applied to the decoding loss value, that is, it expresses the degree of importance of the decoding loss value within the network loss value. λ may be a preset value, which may be an empirical value or an experimental value. For example, it may be set to 0.2 in actual operation.
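The combination of the two loss values can be written directly from the formula above. The function name `network_loss` and the input values are hypothetical; the default λ = 0.2 is the example value mentioned in the text.

```python
def network_loss(image_loss_value, decoding_loss_value, lam=0.2):
    """Network loss L = L_CED + lambda * L_LSGAN."""
    return image_loss_value + lam * decoding_loss_value

# Example inputs: an image loss of 0.25 and a decoding loss of 0.5.
total = network_loss(0.25, 0.5)
print(round(total, 6))  # prints: 0.35
```

A larger λ makes the training pay more attention to fooling the discriminator; λ = 0 reduces the network loss to the image loss alone.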
Step S570, determining whether the coding and decoding network meets the convergence condition according to the network loss value corresponding to the coding and decoding network; if so, go to step S580; if not, it jumps to step S510.
The convergence conditions include: in continuous multi-round training, the network loss values corresponding to the coding and decoding networks are all in a preset network convergence range.
If the coding and decoding network does not converge, the parameters of the coding and decoding network need to be adjusted, and then the next round of training is performed on the coding and decoding network.
Step S580, determining that the codec network has converged.
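The convergence condition of step S570 (network loss values within a preset range over several consecutive rounds) might be checked as in the following sketch. The function name `has_converged`, the range bounds, and the number of rounds are illustrative assumptions.

```python
def has_converged(loss_history, lower, upper, rounds):
    """Return True if the last `rounds` loss values all lie in [lower, upper]."""
    if len(loss_history) < rounds:
        return False
    return all(lower <= value <= upper for value in loss_history[-rounds:])

# Network loss values recorded after each training round (made-up numbers).
history = [0.9, 0.4, 0.09, 0.08, 0.07]

print(has_converged(history, 0.0, 0.1, rounds=3))  # prints: True
print(has_converged(history, 0.0, 0.1, rounds=4))  # prints: False
```

When the check returns False, the parameters of the codec network are adjusted and another round starts from step S510.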
An image transmission scenario may be performed with reference to the present embodiment. The only difference is that after the image coding network obtains the compressed data corresponding to the sample image, the image coding network outputs the compressed data corresponding to the sample image to the communication transmitter (at this time, the image decoding network does not work), and the communication transmitter transmits the compressed data to the receiving end device. After the communication receiver receives the compressed data, the compressed data is input into an image decoding network (at this time, the image coding network does not work), and the image decoding network performs decoding recovery processing on the compressed data to obtain a decoded image corresponding to the sample image.
After the codec network converges, the training of the codec network is stopped, and the trained codec network can be applied to a specific scene.
The following description will take the storage process in the image storage scene as an example. The execution subject of the present embodiment is an image storage system. FIG. 6 is a flowchart illustrating steps of image storage according to an embodiment of the present invention.
Step S610, receiving an image storage instruction; the image storage instructions are to instruct to store a first target image.
The image storage instruction may instruct storage of at least one first target image.
After receiving the image storage instruction, the (at least one) first target image may be received or acquired.
For example: the client sends an image storage instruction to the image storage system and then immediately sends the first target image; after receiving the image storage instruction, the image storage system starts to receive the first target image.
Step S620, inputting the first target image into the coding and decoding network, and selecting the image coding network in the coding and decoding network according to the image storage instruction.
And selecting the sub-network parameters corresponding to the image coding network according to the image storage instruction, so that the image coding network in the coding and decoding network works normally, and the image decoding network in the coding and decoding network stops working. Thus, after the first target image is input into the codec network, the first target image can be processed by the image coding network and the result can be output.
Specifically, when the number of the first target images is at least one, a first target image matrix corresponding to the at least one first target image is generated; each row vector in the first target image matrix is a one-dimensional first target image; and inputting the first target image matrix into the coding and decoding network.
Further, each first target image in the at least one first target image is converted into a one-dimensional image vector, and a first target image matrix is generated according to the one-dimensional image vectors corresponding to the at least one first target image; each row vector in the first target image matrix represents a one-dimensional image vector corresponding to one first target image. The multidimensional first target image may be converted into a one-dimensional image vector using a preset dimension conversion algorithm. The dimension conversion algorithm includes, but is not limited to: matrix () function algorithm.
For example: the first target image A is [1,2,3,3,2], the first target image B is [3,2,3,4,1], and the first target image C is [4,2,1,4,1]; the first target image matrix composed of these three one-dimensional image vectors is therefore:
[1,2,3,3,2]
[3,2,3,4,1]
[4,2,1,4,1].
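The construction of the first target image matrix can be sketched as follows: each multidimensional image is flattened into a one-dimensional image vector and becomes one row of the matrix. The helper names `flatten` and `build_image_matrix` and the 2x2 example images are hypothetical; they stand in for whatever dimension conversion algorithm is actually used.

```python
def flatten(image):
    """Flatten a 2-D image (list of rows) into a one-dimensional image vector."""
    return [pixel for row in image for pixel in row]

def build_image_matrix(images):
    """Stack the flattened images so that each row is one image vector."""
    vectors = [flatten(img) for img in images]
    if len({len(v) for v in vectors}) > 1:
        raise ValueError("all images must flatten to the same length")
    return vectors

# Two hypothetical 2x2 images.
image_a = [[1, 2], [3, 3]]
image_b = [[3, 2], [3, 4]]

matrix = build_image_matrix([image_a, image_b])
print(matrix)  # prints: [[1, 2, 3, 3], [3, 2, 3, 4]]
```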
Step S630, performing compression encoding processing on the first target image by using the image encoding network, obtaining compressed data corresponding to the first target image, and outputting the compressed data to the storage device so as to store the first target image.
Performing compression coding processing on the first target image matrix by using the image coding network to obtain a compressed data matrix corresponding to the first target image matrix; each row vector in the compressed data matrix is compressed data corresponding to one first target image.
After the compressed data corresponding to the first target image is output to the storage device, the storage device stores the compressed data corresponding to the first target image and returns the storage address of the first target image for subsequent use in reading the first target image.
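The store-and-return-address behaviour described above can be pictured with an in-memory stand-in for the storage device. The class `StorageDevice`, its integer addresses, and its method names are assumptions for illustration; a real storage device would use its own addressing scheme.

```python
class StorageDevice:
    """In-memory stand-in that stores compressed data and returns an address."""

    def __init__(self):
        self._blocks = {}
        self._next_address = 0

    def store(self, compressed):
        """Store one compressed vector and return its storage address."""
        address = self._next_address
        self._blocks[address] = list(compressed)
        self._next_address += 1
        return address

    def load(self, address):
        """Return the compressed vector stored at the given address."""
        return list(self._blocks[address])

device = StorageDevice()
# Each row of the compressed data matrix corresponds to one first target image.
compressed_matrix = [[0.2, 0.7], [0.9, 0.1]]
addresses = [device.store(row) for row in compressed_matrix]
print(addresses)                  # prints: [0, 1]
print(device.load(addresses[1]))  # prints: [0.9, 0.1]
```

The returned address is what a later image reading instruction would carry to locate the compressed data.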
In this embodiment, after receiving the image storage instruction, the first target image may be input into the trained codec network, and the image coding network in the codec network is used to perform compression coding on the first target image, so as to implement the dimension reduction processing on the first target image, achieve the purpose of compressing the first target image, and finally store the compressed data corresponding to the first target image in the storage device. Because the coding and decoding network also comprises the image decoding network, the storage of the compressed data of the first target image is equal to the storage of the first target image, the demand and the investment cost of the storage space are effectively reduced, and the utilization rate of the storage space is improved.
By comparison with compressing and encoding images using an image coding network: although the WebP image compression technique yields smaller image sizes, it consumes a longer encoding time, usually several times or even dozens of times that of a common compression algorithm, and is therefore not suitable for wide adoption; the BPG image compression technique compresses and decompresses a single picture at a time, is not suitable for simultaneous compression and decompression (decoding) of a large number of images, and is costly. The image coding network of this embodiment can perform compression and decompression not only of a single image but also of batches of images, and can achieve more efficient storage.
The following description will take the reading process in the image storage scene as an example. Fig. 7 is a flowchart illustrating steps of image reading according to an embodiment of the present invention.
Step S710, receiving an image reading instruction; the image reading instruction is used for indicating to read a second target image.
The image reading instruction may instruct reading of at least one second target image.
Information of the second target image may be carried in the image reading instruction. The information of the second target image includes, but is not limited to: a storage address of the second target image.
The storage address of the second target image is also the storage address of the compressed data corresponding to the second target image.
Step S720, obtaining the compressed data corresponding to the second target image from the storage device according to the image reading instruction, and inputting the compressed data corresponding to the second target image into the codec network.
When the number of the second target images is at least one, generating a compressed data matrix according to the compressed data corresponding to the at least one second target image; each row vector in the compressed data matrix corresponds to one second target image; and inputting the compressed data matrix into the coding and decoding network.
Step S730, selecting the image decoding network in the coding and decoding network according to the image reading instruction, and performing decoding recovery processing on the compressed data corresponding to the second target image by using the image decoding network to obtain a decoded image corresponding to the second target image.
Performing decoding recovery processing on the compressed data matrix by using the image decoding network to obtain a decoded image matrix corresponding to the compressed data matrix; each row vector in the decoded image matrix is a one-dimensional decoded image corresponding to one second target image.
In order to visualize the decoded image, the row vector corresponding to each second target image in the decoded image matrix may be converted into a multidimensional array according to the number of dimensions of that second target image, so as to obtain a multidimensional decoded image.
In this embodiment, after the image reading instruction is received, the compressed data corresponding to the second target image may be obtained from the storage device and input into the trained codec network, and the image decoding network in the codec network performs decoding recovery on the compressed data, so as to raise the dimension of the compressed data and finally restore the compressed data to the second target image.
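Converting a decoded row vector back into a multidimensional image can be sketched as the inverse of flattening. The helper name `unflatten` and the 2x2 target shape are assumptions for illustration; in practice the shape would come from the recorded dimensions of the second target image.

```python
def unflatten(vector, height, width):
    """Reshape a one-dimensional decoded image into height x width rows."""
    if len(vector) != height * width:
        raise ValueError("vector length does not match the requested shape")
    return [vector[r * width:(r + 1) * width] for r in range(height)]

# One row of a decoded image matrix, restored to a 2x2 image.
decoded_row = [1, 2, 3, 3]
image = unflatten(decoded_row, height=2, width=2)
print(image)  # prints: [[1, 2], [3, 3]]
```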
When the image is required to be obtained, the stored low-dimensional characteristic vector Y is only required to be decoded and restored, and then a clear image can be obtained. The embodiment can compress the image to a greater extent, and supports batch parallel compression and decoding processing of a plurality of images. For example: 100 images are compressed or decompressed at one time, and the compression storage efficiency is improved.
In this embodiment, the codec network may be deployed on a high-speed computing unit, which may be a device such as a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a GPU (Graphics Processing Unit). The high-speed computing unit may be a high-speed storage unit.
When an image needs to be written into the storage device, the image is first written into the high-speed computing unit, which computes the low-dimensional vector of the image using the trained codec network and writes the low-dimensional vector into the storage device.
The storage device may be any of a variety of memories, such as SRAM (Static Random-Access Memory), DRAM (Dynamic Random-Access Memory), or other types of RAM (Random-Access Memory), as well as flash memory or other memory technology, DVD (Digital Video Disc) or other optical storage, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium.
When the image needs to be read from the storage device, the high-speed calculation unit decodes the low-dimensional vector read from the storage device, restores the image data and finishes reading the image.
For AI (Artificial Intelligence) platforms related to cloud computing and machine learning, a large number of network models and images need to be processed, and generally, high-speed computing units are provided, so that the coding and decoding network of the embodiment is more suitable for a cloud storage system, and generates greater economic benefit for the cloud storage system.
The embodiment of the application also provides a training device of the coding and decoding network. Fig. 8 is a block diagram of a training apparatus of a codec network according to an embodiment of the present application.
The training device of the coding and decoding network comprises: an obtaining module 810, a coding and decoding module 820, a first determining module 830, a second determining module 840 and a third determining module 850.
An obtaining module 810 for obtaining a sample image while performing each round of training.
The coding and decoding module 820 is configured to input the sample image into a coding and decoding network, and sequentially perform compression coding processing and decoding recovery processing on the sample image by using the coding and decoding network to obtain a decoded image corresponding to the sample image.
The first determining module 830 is configured to determine an image loss value and a decoding loss value corresponding to the coding and decoding network according to the sample image and a decoded image corresponding to the sample image.
A second determining module 840, configured to determine a network loss value corresponding to the coding and decoding network according to the image loss value and the decoding loss value corresponding to the coding and decoding network.
The third determining module 850 is configured to determine that the codec network converges when the network loss value corresponding to the codec network is within a preset network convergence range after multiple rounds of training.
The functions of the apparatus according to the embodiment of the present invention have been described in the foregoing method embodiments, so that reference may be made to the related descriptions in the foregoing embodiments for details that are not described in the foregoing embodiments, and further details are not described herein.
As shown in fig. 9, an electronic device according to an embodiment of the present application includes a processor 910, a communication interface 920, a memory 930, and a communication bus 940, where the processor 910, the communication interface 920, and the memory 930 complete communication with each other through the communication bus 940.
A memory 930 for storing computer programs.
In an embodiment of the present application, when the processor 910 is configured to execute the program stored in the memory 930, the method for training a codec network provided in any one of the foregoing method embodiments is implemented, including: acquiring a sample image while performing each round of training; inputting the sample image into an encoding and decoding network, and sequentially performing compression encoding processing and decoding recovery processing on the sample image by using the encoding and decoding network to obtain a decoded image corresponding to the sample image; determining an image loss value and a decoding loss value corresponding to the coding and decoding network according to the sample image and a decoding image corresponding to the sample image; determining a network loss value corresponding to the coding and decoding network according to the image loss value and the decoding loss value corresponding to the coding and decoding network; and determining the convergence of the coding and decoding network when the network loss value corresponding to the coding and decoding network is in a preset network convergence range through multiple rounds of training.
Wherein, the determining a decoding loss value corresponding to the coding and decoding network according to the sample image and the decoded image corresponding to the sample image comprises: inputting the sample image and a decoded image corresponding to the sample image into a preset enhancement network; determining, by the enhancement network, a distortion rate of the sample image and a distortion rate of a decoded image corresponding to the sample image; and determining a decoding loss value corresponding to the coding and decoding network according to the distortion rate of the sample image and the distortion rate of the decoded image.
Wherein, the acquiring a sample image comprises: acquiring at least one sample image; the inputting the sample image into a coding and decoding network, and sequentially performing compression coding processing and decoding recovery processing on the sample image by using the coding and decoding network to obtain a decoded image corresponding to the sample image, includes: generating a sample image matrix corresponding to the at least one sample image, where each row vector in the sample image matrix is a one-dimensional sample image; inputting the sample image matrix into the coding and decoding network, and sequentially performing compression coding processing and decoding recovery processing on the sample image matrix by using the coding and decoding network to obtain a decoded image matrix, where each row vector in the decoded image matrix is a one-dimensional decoded image corresponding to one sample image.
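The construction of the sample image matrix can be illustrated with a short sketch. It assumes that each sample image is a two-dimensional list of pixel values and that all images have the same number of pixels; the function names are hypothetical and do not appear in the original text.

```python
from typing import List

Image2D = List[List[float]]


def flatten_image(image: Image2D) -> List[float]:
    """Turn a 2-D image into a one-dimensional image (row-major order)."""
    return [pixel for row in image for pixel in row]


def build_image_matrix(images: List[Image2D]) -> List[List[float]]:
    """Stack flattened images so that each row vector is one image."""
    if not images:
        raise ValueError("at least one sample image is required")
    matrix = [flatten_image(image) for image in images]
    width = len(matrix[0])
    if any(len(row) != width for row in matrix):
        raise ValueError("all images must have the same number of pixels")
    return matrix


def unflatten_image(row: List[float], height: int, width: int) -> Image2D:
    """Restore a 2-D image from one row vector of a decoded image matrix."""
    if len(row) != height * width:
        raise ValueError("row length does not match height * width")
    return [row[r * width:(r + 1) * width] for r in range(height)]
```

Processing the whole matrix in one pass lets the network handle several images per round, with row i of the decoded image matrix corresponding to row i of the sample image matrix.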
Wherein, the coding and decoding network comprises: an image encoding network and an image decoding network; the output end of the image coding network is connected with a preset storage device; the input end of the image decoding network is connected with the storage device; the inputting the sample image into a coding and decoding network, and sequentially performing compression coding processing and decoding recovery processing on the sample image by using the coding and decoding network to obtain a decoded image corresponding to the sample image, includes: inputting the sample image into the image coding network, performing compression coding processing on the sample image by using the image coding network to obtain compressed data corresponding to the sample image, and outputting the compressed data corresponding to the sample image to the storage device; acquiring compressed data corresponding to the sample image from the storage device through the image decoding network; and performing decoding recovery processing on the compressed data corresponding to the sample image by using the image decoding network to obtain a decoded image corresponding to the sample image.
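The data path through the storage device can be sketched as follows. The pair-averaging encoder and the value-repeating decoder are toy stand-ins chosen only so that the example runs; they are not the image coding network or image decoding network of the application, and InMemoryStorage is a hypothetical substitute for the preset storage device.

```python
from typing import Callable, Dict, List

Vector = List[float]


class InMemoryStorage:
    """Hypothetical stand-in for the preset storage device."""

    def __init__(self) -> None:
        self._blobs: Dict[str, Vector] = {}

    def write(self, key: str, data: Vector) -> None:
        self._blobs[key] = list(data)

    def read(self, key: str) -> Vector:
        return list(self._blobs[key])


def toy_encode(image: Vector) -> Vector:
    """Toy compression: average each adjacent pair of pixels (2:1)."""
    if len(image) % 2 != 0:
        raise ValueError("toy encoder expects an even number of pixels")
    return [(image[i] + image[i + 1]) / 2 for i in range(0, len(image), 2)]


def toy_decode(code: Vector) -> Vector:
    """Toy recovery: repeat each compressed value twice."""
    return [value for value in code for _ in range(2)]


def encode_store_decode(key: str, image: Vector, storage: InMemoryStorage,
                        encode: Callable[[Vector], Vector] = toy_encode,
                        decode: Callable[[Vector], Vector] = toy_decode) -> Vector:
    """Encoder output goes to storage; the decoder reads its input from storage."""
    storage.write(key, encode(image))
    return decode(storage.read(key))
```

Routing the compressed data through storage during training means that the decoder is trained on exactly the data it will later read back when serving image reading instructions.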
After determining that the coding and decoding network converges, the method further comprises: receiving an image storage instruction, where the image storage instruction instructs storage of a first target image; inputting the first target image into the coding and decoding network, and selecting the image coding network in the coding and decoding network according to the image storage instruction; and performing compression coding processing on the first target image by using the image coding network to obtain compressed data corresponding to the first target image, and outputting the compressed data to the storage device so as to store the first target image.
Wherein, the image storage instruction instructs storage of at least one first target image; the inputting the first target image into the coding and decoding network comprises: generating a first target image matrix corresponding to the at least one first target image, where each row vector in the first target image matrix is a one-dimensional first target image; and inputting the first target image matrix into the coding and decoding network; the performing, by using the image coding network, compression coding processing on the first target image to obtain compressed data corresponding to the first target image includes: performing compression coding processing on the first target image matrix by using the image coding network to obtain a compressed data matrix corresponding to the first target image matrix, where each row vector in the compressed data matrix is compressed data corresponding to one first target image.
After determining that the coding and decoding network converges, the method further comprises: receiving an image reading instruction; the image reading instruction is used for indicating to read a second target image; acquiring compressed data corresponding to the second target image from the storage device according to the image reading instruction, and inputting the compressed data corresponding to the second target image into the coding and decoding network; and selecting an image decoding network in the coding and decoding network according to the image reading instruction, and performing decoding recovery processing on the compressed data corresponding to the second target image by using the image decoding network to obtain a decoded image corresponding to the second target image.
Wherein, the image reading instruction instructs reading of at least one second target image; before the compressed data corresponding to the second target image is input into the coding and decoding network, the method further includes: generating a compressed data matrix according to the compressed data respectively corresponding to the at least one second target image, where each row vector in the compressed data matrix corresponds to one second target image; and inputting the compressed data matrix into the coding and decoding network; the performing, by using the image decoding network, decoding recovery processing on the compressed data corresponding to the second target image to obtain a decoded image corresponding to the second target image includes: performing decoding recovery processing on the compressed data matrix by using the image decoding network to obtain a decoded image matrix corresponding to the compressed data matrix, where each row vector in the decoded image matrix is a one-dimensional decoded image corresponding to one second target image.
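The handling of image storage and image reading instructions after convergence can be sketched as a simple dispatch. The instruction layout (a dictionary with "kind", "keys", and "images" entries), the plain dictionary used as the storage device, and the toy encoder and decoder are all assumptions made for illustration; the application does not specify an instruction format.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def toy_encode(image: Vector) -> Vector:
    """Toy stand-in for the image coding network: average adjacent pairs."""
    if len(image) % 2 != 0:
        raise ValueError("toy encoder expects an even number of pixels")
    return [(image[i] + image[i + 1]) / 2 for i in range(0, len(image), 2)]


def toy_decode(code: Vector) -> Vector:
    """Toy stand-in for the image decoding network: repeat each value twice."""
    return [value for value in code for _ in range(2)]


def handle_instruction(instruction: dict, storage: Dict[str, Vector]) -> Matrix:
    """Select the coding or decoding half according to the instruction kind."""
    kind = instruction["kind"]
    keys = instruction["keys"]
    if kind == "store":
        # Rows of the first target image matrix are flattened target images.
        compressed = [toy_encode(row) for row in instruction["images"]]
        for key, row in zip(keys, compressed):
            storage[key] = row
        return compressed  # compressed data matrix, one row per image
    if kind == "read":
        # Build the compressed data matrix from storage, then decode each row.
        compressed = [storage[key] for key in keys]
        return [toy_decode(row) for row in compressed]
    raise ValueError(f"unknown instruction kind: {kind!r}")
```

A store instruction only exercises the coding half and a read instruction only exercises the decoding half, which mirrors the selection of one sub-network per instruction described above.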
Wherein, the determining, after multiple rounds of training, that the coding and decoding network converges when the network loss value corresponding to the coding and decoding network is within a preset network convergence range comprises: determining that the coding and decoding network converges when the network loss values corresponding to the coding and decoding network are all within the network convergence range over a preset number of consecutive training rounds; or, determining that the coding and decoding network converges when, after multiple rounds of training, the network loss value corresponding to the coding and decoding network falls within the network convergence range for the first time.
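The two alternative convergence criteria can be sketched as follows. The representation of the network convergence range as a closed interval [low, high], and the function and parameter names, are assumptions made for illustration.

```python
from typing import Optional, Sequence


def converged_consecutive(losses: Sequence[float], low: float, high: float,
                          rounds: int) -> Optional[int]:
    """Return the 1-based round at which the loss has stayed inside
    [low, high] for `rounds` consecutive rounds, or None if it never does."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    streak = 0
    for index, loss in enumerate(losses, start=1):
        streak = streak + 1 if low <= loss <= high else 0
        if streak >= rounds:
            return index
    return None


def converged_first_hit(losses: Sequence[float], low: float,
                        high: float) -> Optional[int]:
    """Return the 1-based round at which the loss first enters [low, high],
    or None if it never does."""
    for index, loss in enumerate(losses, start=1):
        if low <= loss <= high:
            return index
    return None
```

The consecutive-rounds criterion guards against a loss value that dips into the range once and then leaves it again, while the first-hit criterion stops training earlier.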
In a typical configuration of the present application, a device includes one or more processors, input/output interfaces, and memory. Communication between the computing unit and the storage requires a communication interface. For point-to-point communication between devices, the communication mode may be any one of simplex, half-duplex, and full-duplex communication, depending on the relationship between the message transmission direction and time. Because read and write operations are required between the devices, the present invention adopts the full-duplex communication mode to increase the applicability of the device; that is, signals for image storage and image reading can be transmitted in both directions on the line at any time during communication, which improves network efficiency. Fig. 10 is a schematic diagram illustrating image access in a duplex communication mode according to an embodiment of the present application. Under the duplex communication model, images can be stored (written) and read simultaneously, which further improves image access efficiency.
The present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the training method for the codec network provided in any one of the foregoing method embodiments. The computer-readable storage medium may include volatile memory, such as random access memory; it may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; and it may also include a combination of the above kinds of memory.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A method for training a codec network, comprising:
acquiring a sample image while performing each round of training;
inputting the sample image into an encoding and decoding network, and sequentially performing compression encoding processing and decoding recovery processing on the sample image by using the encoding and decoding network to obtain a decoded image corresponding to the sample image;
determining an image loss value and a decoding loss value corresponding to the coding and decoding network according to the sample image and a decoded image corresponding to the sample image, wherein determining the decoding loss value corresponding to the coding and decoding network according to the sample image and the decoded image corresponding to the sample image comprises: inputting the sample image and a decoded image corresponding to the sample image into a preset enhancement network; determining, by the enhancement network, a distortion rate of the sample image and a distortion rate of a decoded image corresponding to the sample image; determining a decoding loss value corresponding to the coding and decoding network according to the distortion rate of the sample image and the distortion rate of the decoded image;
determining a network loss value corresponding to the coding and decoding network according to the image loss value and the decoding loss value corresponding to the coding and decoding network;
and determining the convergence of the coding and decoding network when the network loss value corresponding to the coding and decoding network is in a preset network convergence range through multiple rounds of training.
2. The method of claim 1,
the acquiring of the sample image comprises: acquiring at least one sample image;
the inputting the sample image into a coding and decoding network, and sequentially performing compression coding processing and decoding recovery processing on the sample image by using the coding and decoding network to obtain a decoded image corresponding to the sample image, includes:
generating a sample image matrix corresponding to the at least one sample image; each row vector in the sample image matrix is a one-dimensional sample image;
inputting the sample image matrix into the coding and decoding network, and sequentially performing compression coding processing and decoding recovery processing on the sample image matrix by using the coding and decoding network to obtain a decoded image matrix; and each row vector in the decoded image matrix is a one-dimensional decoded image corresponding to one sample image.
3. The method of claim 1,
the coding and decoding network comprises: an image encoding network and an image decoding network; the output end of the image coding network is connected with a preset storage device; the input end of the image decoding network is connected with the storage device;
the inputting the sample image into a coding and decoding network, and performing compression coding processing and decoding recovery processing on the sample image in sequence by using the coding and decoding network to obtain a decoded image corresponding to the sample image, includes:
inputting the sample image into the image coding network, performing compression coding processing on the sample image by using the image coding network to obtain compressed data corresponding to the sample image, and outputting the compressed data corresponding to the sample image to the storage device;
acquiring compressed data corresponding to the sample image from the storage device through the image decoding network; and performing decoding recovery processing on the compressed data corresponding to the sample image by using the image decoding network to obtain a decoded image corresponding to the sample image.
4. The method of claim 3, further comprising, after determining convergence of the codec network:
receiving an image storage instruction, where the image storage instruction instructs storage of a first target image;
inputting the first target image into the coding and decoding network, and selecting the image coding network in the coding and decoding network according to the image storage instruction;
and performing compression coding processing on the first target image by using the image coding network to obtain compressed data corresponding to the first target image and outputting the compressed data to the storage device so as to store the first target image.
5. The method of claim 4,
the image storage instruction instructs storage of at least one first target image;
the inputting the first target image into the coding and decoding network comprises:
generating a first target image matrix corresponding to the at least one first target image; each row vector in the first target image matrix is a one-dimensional first target image;
inputting the first target image matrix into the coding and decoding network;
the performing, by using the image coding network, compression coding processing on the first target image to obtain compressed data corresponding to the first target image includes:
performing compression coding processing on the first target image matrix by using the image coding network to obtain a compressed data matrix corresponding to the first target image matrix; each row vector in the compressed data matrix is compressed data corresponding to one first target image.
6. The method of claim 3, further comprising, after determining that the codec network has converged:
receiving an image reading instruction; the image reading instruction is used for indicating to read a second target image;
acquiring compressed data corresponding to the second target image from the storage device according to the image reading instruction, and inputting the compressed data corresponding to the second target image into the coding and decoding network;
and selecting an image decoding network in the coding and decoding network according to the image reading instruction, and performing decoding recovery processing on the compressed data corresponding to the second target image by using the image decoding network to obtain a decoded image corresponding to the second target image.
7. The method of claim 6,
the image reading instruction is used for indicating to read at least one second target image;
before the compressed data corresponding to the second target image is input into the codec network, the method further includes:
generating a compressed data matrix according to the compressed data corresponding to the at least one second target image respectively; each row vector in the compressed data matrix corresponds to one second target image;
inputting the compressed data matrix into the coding and decoding network;
the performing, by using the image decoding network, decoding recovery processing on the compressed data corresponding to the second target image to obtain a decoded image corresponding to the second target image includes:
utilizing the image decoding network to perform decoding recovery processing on the compressed data matrix to obtain a decoded image matrix corresponding to the compressed data matrix; each row vector in the decoded image matrix is a one-dimensional decoded image corresponding to one second target image.
8. The method according to any one of claims 1 to 7, wherein determining convergence of the codec network when the network loss value corresponding to the codec network is within a preset network convergence range after multiple rounds of training comprises:
determining that the coding and decoding network converges when the network loss values corresponding to the coding and decoding network are all within the network convergence range over a preset number of consecutive training rounds; or,
determining that the coding and decoding network converges when, after multiple rounds of training, the network loss value corresponding to the coding and decoding network falls within the network convergence range for the first time.
9. An apparatus for training a codec network, comprising:
the acquisition module is used for acquiring a sample image when each round of training is executed;
the coding and decoding module is used for inputting the sample images into a coding and decoding network, and sequentially performing compression coding processing and decoding recovery processing on the sample images by using the coding and decoding network to obtain decoded images corresponding to the sample images;
a first determining module, configured to determine an image loss value and a decoding loss value corresponding to the coding and decoding network according to the sample image and a decoded image corresponding to the sample image, where the determining a decoding loss value corresponding to the coding and decoding network according to the sample image and the decoded image corresponding to the sample image includes: inputting the sample image and a decoded image corresponding to the sample image into a preset enhancement network; determining, by the enhancement network, a distortion rate of the sample image and a distortion rate of a decoded image corresponding to the sample image; determining a decoding loss value corresponding to the coding and decoding network according to the distortion rate of the sample image and the distortion rate of the decoded image;
a second determining module, configured to determine a network loss value corresponding to the coding and decoding network according to the image loss value and the decoding loss value corresponding to the coding and decoding network;
and the third determining module is used for determining the convergence of the coding and decoding network when the network loss value corresponding to the coding and decoding network is in a preset network convergence range after multiple rounds of training.
10. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the steps of the method for training a codec network according to any one of claims 1 to 8 when executing a program stored in a memory.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of training a codec network according to any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110303982.1A CN112929666B (en) 2021-03-22 2021-03-22 Method, device and equipment for training coding and decoding network and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110303982.1A CN112929666B (en) 2021-03-22 2021-03-22 Method, device and equipment for training coding and decoding network and storage medium

Publications (2)

Publication Number Publication Date
CN112929666A CN112929666A (en) 2021-06-08
CN112929666B true CN112929666B (en) 2023-04-14

Family

ID=76175378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110303982.1A Active CN112929666B (en) 2021-03-22 2021-03-22 Method, device and equipment for training coding and decoding network and storage medium

Country Status (1)

Country Link
CN (1) CN112929666B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113746870B (en) * 2021-11-05 2022-02-08 山东万网智能科技有限公司 Intelligent data transmission method and system for Internet of things equipment
US11711449B2 (en) 2021-12-07 2023-07-25 Capital One Services, Llc Compressing websites for fast data transfers
CN116095339A (en) * 2023-01-16 2023-05-09 北京智芯微电子科技有限公司 Image transmission method, training method, electronic device, and readable storage medium
CN116737607B (en) * 2023-08-16 2023-11-21 之江实验室 Sample data caching method, system, computer device and storage medium
CN117459727B (en) * 2023-12-22 2024-05-03 浙江省北大信息技术高等研究院 Image processing method, device and system, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108416752A (en) * 2018-03-12 2018-08-17 中山大学 A method of image is carried out based on production confrontation network and removes motion blur
CN109255769A (en) * 2018-10-25 2019-01-22 厦门美图之家科技有限公司 The training method and training pattern and image enchancing method of image enhancement network
CN110070174A (en) * 2019-04-10 2019-07-30 厦门美图之家科技有限公司 A kind of stabilization training method generating confrontation network
CN110225350A (en) * 2019-05-30 2019-09-10 西安电子科技大学 Natural image compression method based on production confrontation network
EP3633990A1 (en) * 2018-10-02 2020-04-08 Nokia Technologies Oy An apparatus, a method and a computer program for running a neural network
CN111462000A (en) * 2020-03-17 2020-07-28 北京邮电大学 Image recovery method and device based on pre-training self-encoder
CN111565314A (en) * 2019-02-13 2020-08-21 合肥图鸭信息科技有限公司 Image compression method, coding and decoding network training method and device and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10540574B2 (en) * 2017-12-07 2020-01-21 Shanghai Cambricon Information Technology Co., Ltd Image compression method and related device

Also Published As

Publication number Publication date
CN112929666A (en) 2021-06-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant