CN116614637B - Data processing method, device, equipment and readable storage medium - Google Patents

Data processing method, device, equipment and readable storage medium

Info

Publication number
CN116614637B
CN116614637B (application CN202310885764.2A)
Authority
CN
China
Prior art keywords
image
embedding layer
decoding
feature embedding
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310885764.2A
Other languages
Chinese (zh)
Other versions
CN116614637A (en)
Inventor
吕悦
项进喜
张军
韩骁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202310885764.2A
Publication of CN116614637A
Application granted
Publication of CN116614637B

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/44Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application discloses a data processing method, device, equipment and readable storage medium, the method comprising the following steps: performing image encoding and decoding processing on an original image through an image encoder and an image decoder to obtain a first reconstructed image corresponding to the original image; acquiring the feature to be processed that is input to a target feature embedding layer of the image decoder during the image encoding and decoding processing, and performing binary mapping processing on the feature to be processed through a gating network configured for the target feature embedding layer to obtain an optimized control value corresponding to the target feature embedding layer; and, if the optimized control value corresponding to the target feature embedding layer is determined to be a valid value, training and optimizing the initial decoding parameters of the target feature embedding layer through a first error loss value between the original image and the first reconstructed image to obtain the optimized decoding parameters of the target feature embedding layer. By adopting the application, the training efficiency of the decoder and the image compression performance can be improved in the image compression service.

Description

Data processing method, device, equipment and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method, apparatus, device, and readable storage medium.
Background
Image compression based on deep learning can be regarded as an Auto-Encoder (AE) architecture. An image to be compressed is first passed through an encoder, whose nonlinear transformation yields the implicit expression features of the image; these features are then quantized and encoded based on an entropy model to obtain an encoded byte stream. At decompression, the decompression side (e.g., a client) can recover the implicit expression features from the byte stream and input them to a decoder to obtain the reconstructed image.
To improve image compression performance (i.e., the image quality of the reconstructed image), the encoder and the decoder may be pre-trained and optimized so that the final reconstructed image is of higher quality. When training and optimizing a decoder, a fixed number of learnable parameters is usually introduced. Specifically, since the decoder is composed of multiple network layers (such as convolution layers), training and optimizing the decoder actually means optimizing each network layer in it; the related art fixes the number of network layers to be optimized and then, according to that fixed number, randomly selects some (or all) of the network layers for optimization.
However, this approach may leave the network layers that genuinely need optimization without timely updates, which harms not only the training efficiency but also the performance of the optimized decoder. A parameter optimization method for the decoder is therefore needed to improve image compression performance.
Disclosure of Invention
The embodiment of the application provides a data processing method, a device, equipment and a readable storage medium, which can improve the training efficiency of a decoder and the image compression performance in image compression service.
In one aspect, an embodiment of the present application provides a data processing method, including:
performing image coding and decoding processing on the original image through an image encoder and an image decoder to obtain a first reconstructed image corresponding to the original image;
acquiring the feature to be processed, which is input to a target feature embedding layer of an image decoder, in the image encoding and decoding processing process, and performing binary mapping processing on the feature to be processed through a gating network configured for the target feature embedding layer to obtain an optimized control value corresponding to the target feature embedding layer;
if the optimized control value corresponding to the target feature embedding layer is determined to be a valid value, training and optimizing the initial decoding parameters of the target feature embedding layer through a first error loss value between the original image and the first reconstructed image to obtain the optimized decoding parameters of the target feature embedding layer.
In one aspect, an embodiment of the present application provides a data processing apparatus, including:
the encoding and decoding module is used for carrying out image encoding and decoding processing on the original image through the image encoder and the image decoder to obtain a first reconstructed image corresponding to the original image;
the feature acquisition module is used for acquiring the feature to be processed of the target feature embedding layer input to the image decoder in the image encoding and decoding processing process;
the feature mapping module is used for carrying out binary mapping treatment on the feature to be treated through a gating network configured for the target feature embedding layer to obtain an optimized control value corresponding to the target feature embedding layer;
and the parameter training module is used for training and optimizing the initial decoding parameters of the target feature embedding layer through a first error loss value between the original image and the first reconstructed image, if the optimized control value corresponding to the target feature embedding layer is determined to be a valid value, so as to obtain the optimized decoding parameters of the target feature embedding layer.
In one embodiment, the optimized control value obtained by the gating network is used to reflect decoding suitability, i.e., the degree of fit between the decoding parameters of the target feature embedding layer and the feature to be processed that is input to the target feature embedding layer;
when the optimized control value is a valid value, it indicates that the decoding parameters of the target feature embedding layer do not fit the feature to be processed, and should be optimized;
when the optimized control value is an invalid value, it indicates that the decoding parameters of the target feature embedding layer already fit the feature to be processed, and need not be optimized.
In one embodiment, the specific manner of performing, by the codec module, image encoding and decoding processing on an original image by using an image encoder and an image decoder to obtain a first reconstructed image corresponding to the original image includes:
performing image coding processing on the original image through an image coder to obtain implicit expression features corresponding to the original image;
carrying out quantization processing on the implicit expression characteristics to obtain first quantization characteristics corresponding to the implicit expression characteristics;
and decoding the first quantized features through an image decoder to obtain a first reconstructed image corresponding to the original image.
In one embodiment, after obtaining the optimized decoding parameters of the target feature embedding layer, the data processing apparatus further includes:
the increment matrix determining module is used for determining an increment parameter matrix between the optimized decoding parameter and the initial decoding parameter;
the matrix decomposition module is used for performing low-rank decomposition processing on the incremental parameter matrix to obtain a decomposition matrix corresponding to the incremental parameter matrix, the matrix dimension of the decomposition matrix being lower than that of the incremental parameter matrix (see the sketch after this list);
a determining module, configured to determine a target feature embedding layer including a decomposition matrix and initial decoding parameters as an optimized feature embedding layer, and determine an image decoder including the optimized feature embedding layer as an optimized image decoder;
the matrix fine tuning module is used for acquiring implicit expression characteristics output by the image encoder in the image encoding and decoding processing process, and carrying out fine tuning processing on the decomposition matrix through the implicit expression characteristics and the optimized image decoder to obtain a fine tuning decomposition matrix corresponding to the decomposition matrix;
and the sending module is used for sending the implicit expression features, the fine-tuning decomposition matrix corresponding to the decomposition matrix and the optimized control value corresponding to the target feature embedding layer to the decoding client, so that the decoding client decodes them to obtain a decoded image corresponding to the original image.
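For ease of understanding, the following is a minimal sketch, not the patented implementation, of how an incremental parameter matrix can be approximated by a low-rank decomposition; the rank r, the matrix names A and B, and the use of a truncated SVD are illustrative assumptions (Python/PyTorch):

    import torch

    def low_rank_decompose(delta_w: torch.Tensor, r: int):
        # Approximate the incremental matrix delta_w (out x in) by B @ A,
        # where B is (out x r) and A is (r x in) with r << min(out, in).
        u, s, vh = torch.linalg.svd(delta_w, full_matrices=False)
        b = u[:, :r] * s[:r]   # singular values folded into B
        a = vh[:r, :]
        return b, a

    # Usage: decompose the difference between optimized and initial parameters.
    w_init = torch.randn(256, 256)                  # initial decoding parameters
    w_opt = w_init + 0.01 * torch.randn(256, 256)   # optimized decoding parameters
    b, a = low_rank_decompose(w_opt - w_init, r=8)
    w_effective = w_init + b @ a                    # optimized feature embedding layer
    print((w_opt - w_effective).abs().max())        # approximation error

Only B and A, which contain far fewer entries than the full incremental parameter matrix, then need to be fine-tuned and transmitted to the decoding client.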
In one embodiment, the specific manner in which the matrix fine-tuning module performs fine-tuning processing on the decomposition matrix through the implicit expression features and the optimized image decoder, to obtain the fine-tuning decomposition matrix corresponding to the decomposition matrix, includes:
performing fine-tuning processing on the implicit expression features through the first error loss value to obtain fine-tuning expression features;
performing quantization processing on the fine-tuning expression features to obtain second quantization features corresponding to the fine-tuning expression features;
decoding the second quantized features through an optimized image decoder to obtain a second reconstructed image corresponding to the original image;
and determining a second error loss value between the original image and the second reconstructed image, and performing fine tuning processing on the decomposition matrix through the second error loss value to obtain a fine tuning decomposition matrix corresponding to the decomposition matrix.
In one embodiment, the specific manner in which the matrix fine-tuning module performs fine-tuning processing on the implicit expression features through the first error loss value, to obtain the fine-tuning expression features, includes:
performing gradient calculation processing on the first error loss value and the implicit expression features to obtain a first gradient value corresponding to the implicit expression features;
and performing fine-tuning processing on the implicit expression features through a first fine-tuning function and the first gradient value corresponding to the implicit expression features to obtain the fine-tuning expression features.
In one embodiment, the specific manner in which the matrix fine-tuning module performs fine-tuning processing on the decomposition matrix through the second error loss value, to obtain the fine-tuning decomposition matrix corresponding to the decomposition matrix, includes:
performing gradient calculation processing on the second error loss value and the decomposition matrix to obtain a second gradient value corresponding to the decomposition matrix;
and performing fine-tuning processing on the decomposition matrix through a second fine-tuning function and the second gradient value corresponding to the decomposition matrix to obtain the fine-tuning decomposition matrix corresponding to the decomposition matrix.
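Neither embodiment spells out the form of the first or second fine-tuning function; the following sketch assumes a plain gradient-descent update, with the step size lr as an illustrative parameter:

    import torch

    def fine_tune_step(tensor, loss, lr=1e-3):
        # Gradient calculation processing: gradient of the error loss value
        # with respect to the quantity being fine-tuned.
        (grad,) = torch.autograd.grad(loss, tensor, retain_graph=True)
        # Assumed form of the fine-tuning function: one gradient-descent step.
        return tensor - lr * grad

    # e.g. fine-tune the implicit expression features with the first error
    # loss value, then a decomposition matrix with the second one:
    #   y_ft = fine_tune_step(y, first_error_loss)
    #   m_ft = fine_tune_step(decomposition_matrix, second_error_loss)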
In one embodiment, the specific manner in which the sending module sends the implicit expression features, the fine-tuning decomposition matrix corresponding to the decomposition matrix, and the optimized control value corresponding to the target feature embedding layer to the decoding client includes:
acquiring the fine-tuning expression features obtained after fine-tuning processing is performed on the implicit expression features through the first error loss value;
respectively performing quantization processing on the fine-tuning expression features and the fine-tuning decomposition matrix to obtain quantized fine-tuning features corresponding to the fine-tuning expression features and a quantized fine-tuning matrix corresponding to the fine-tuning decomposition matrix;
respectively performing arithmetic coding processing on the quantized fine-tuning features, the quantized fine-tuning matrix and the optimized control value corresponding to the target feature embedding layer to obtain a first bitstream corresponding to the quantized fine-tuning features, a second bitstream corresponding to the quantized fine-tuning matrix and a third bitstream corresponding to the optimized control value;
the first, second and third bitstreams are sent to a decoding client.
In one embodiment, the specific manner in which the feature acquisition module acquires the feature to be processed that is input to the target feature embedding layer of the image decoder during the image encoding and decoding processing includes:
acquiring a feature embedding network used for performing feature embedding processing in an image decoder; the feature embedding network consists of a feature embedding layer sequence, wherein the feature embedding layer sequence comprises a target feature embedding layer;
when the target feature embedding layer is at the sequence start position of the feature embedding layer sequence, performing quantization processing on the implicit expression features output by the image encoder during the image encoding and decoding processing to obtain the first quantized features, and determining the first quantized features as the feature to be processed of the target feature embedding layer during the image encoding and decoding processing;
and when the target feature embedding layer is at a non-start position of the feature embedding layer sequence, determining the layer output feature, during the image encoding and decoding processing, of the feature embedding layer immediately preceding the target feature embedding layer in the sequence as the feature to be processed of the target feature embedding layer during the image encoding and decoding processing.
In one embodiment, after acquiring the implicit expression features output by the image encoder during the image codec process, the data processing apparatus further includes:
the network parameter optimization module is used for optimizing the network parameters of the gating network through the implicit expression features and the optimized image decoder to obtain optimized network parameters; the gating network, carrying the optimized network parameters, is used for performing binary mapping processing on the updated feature to be processed, after acquiring the updated feature to be processed of the target feature embedding layer in a new round of image encoding and decoding processing, to obtain an updated optimized control value corresponding to the target feature embedding layer.
In one embodiment, the specific manner in which the network parameter optimization module optimizes the network parameters of the gating network through the implicit expression features and the optimized image decoder, to obtain the optimized network parameters, includes:
acquiring a fine adjustment expression characteristic obtained after fine adjustment processing is carried out on the implicit expression characteristic through a first error loss value;
performing quantization processing on the fine-tuning expression features to obtain second quantization features corresponding to the fine-tuning expression features;
decoding the second quantized features through an optimized image decoder to obtain a second reconstructed image corresponding to the original image;
and determining a second error loss value between the original image and the second reconstructed image, and performing fine-tuning processing on the network parameters of the gating network through the second error loss value to obtain the optimized network parameters corresponding to the network parameters of the gating network.
In one aspect, an embodiment of the present application provides a computer device, including: a processor and a memory;
the memory stores a computer program that, when executed by the processor, causes the processor to perform the methods of embodiments of the present application.
In one aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program, the computer program comprising program instructions that, when executed by a processor, perform a method according to embodiments of the present application.
In one aspect of the present application, a computer program product is provided, the computer program product comprising a computer program stored in a computer readable storage medium. A processor of a computer device reads the computer program from a computer-readable storage medium, and the processor executes the computer program to cause the computer device to perform a method provided in an aspect of an embodiment of the present application.
In the embodiment of the application, when training and optimizing an image decoder in the image compression service, a gating network is configured for each feature embedding layer of the image decoder to adaptively determine whether the decoding parameters of that feature embedding layer should be updated. Based on the feature input to a feature embedding layer, the output of its gating network adaptively selects whether new information is added to the layer (where the new information can be understood as the decoding parameters introduced for the feature embedding layer).

Specifically, taking a target feature embedding layer in the image decoder as an example: after the original image has undergone image encoding and decoding processing to obtain a first reconstructed image, and before the decoding parameters of the target feature embedding layer (the decoding parameters before training optimization may be called initial decoding parameters) are trained and optimized through the first error loss value between the original image and the first reconstructed image, binary mapping processing can be performed on the feature to be processed of the target feature embedding layer through its gating network; the resulting output value can serve as the optimized control value of the target feature embedding layer. If the optimized control value is a valid value, training and optimization processing can be performed on the initial decoding parameters of the target feature embedding layer based on the first error loss value, obtaining the optimized decoding parameters of the target feature embedding layer.

It should be understood that by configuring a gating network for each feature embedding layer of the image decoder, whether a feature embedding layer undergoes parameter optimization can be adaptively determined, through the gating network, from the feature input to that layer. The layers that need parameter optimization are thus optimized accurately and in time, while the layers that do not need it are left un-optimized. This improves the optimization timeliness of each feature embedding layer and reduces unnecessary optimization of some layers, which benefits both the training efficiency of the image decoder and the performance of the optimized decoder, thereby improving image compression performance. In conclusion, the application can improve the training efficiency of the decoder and the image compression performance in the image compression service.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a network architecture diagram of a data processing system provided in an embodiment of the present application;
FIG. 2 is a schematic flow chart of a data processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of performing image encoding and decoding according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a gating network according to an embodiment of the present application;
FIG. 5 is a schematic flow chart of transmitting data to a terminal device according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an architecture of data interaction between a server and a decoding end according to an embodiment of the present application;
FIG. 7 is a diagram of a system logic architecture according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a data processing apparatus according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The application relates to related technologies such as artificial intelligence, and in order to facilitate understanding, related concepts such as artificial intelligence will be described in the following.
Artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and extend human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar way to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.
The artificial intelligence technology is a comprehensive subject, and relates to the technology with wide fields, namely the technology with a hardware level and the technology with a software level. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
The scheme provided by the embodiment of the application belongs to Computer Vision technology (CV) and Machine Learning (ML) which belong to the field of artificial intelligence.
Computer Vision (CV) is the science of how to make machines "see"; more specifically, it uses cameras and computers, in place of human eyes, to recognize and measure targets and perform further graphic processing, so that the processed image is better suited for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies in an attempt to build artificial intelligence systems that can acquire information from images or multidimensional data. Computer vision techniques typically include image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D techniques, virtual reality, augmented reality, and simultaneous localization and mapping, among others.
Machine Learning (ML) is a multi-domain interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to endow computers with intelligence; it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from instruction. Pre-trained models are the latest development of deep learning and integrate these techniques.
The scheme of the application particularly relates to an image processing technology in a computer vision technology, which can realize compression processing of images so as to obtain compressed images. Meanwhile, the scheme of the application particularly relates to a machine learning technology, and particularly can be used for training an image encoder and an image decoder in an image compression service so as to improve the image quality (such as clearer images) of an output compressed image (or called a reconstructed image).
For ease of understanding, FIG. 1 is a network architecture diagram of a data processing system according to an embodiment of the present application. As shown in FIG. 1, the network architecture may include a service server 1000 and a terminal device cluster, which may include one or more terminal devices; the number of terminal devices is not limited here. As shown in FIG. 1, the plurality of terminal devices may include a terminal device 100a, a terminal device 100b, terminal devices 100c, …, and a terminal device 100n, each of which may establish a network connection with the service server 1000, so that each terminal device can perform data interaction with the service server 1000 through the network connection. In addition, any terminal device in the terminal device cluster may be an intelligent device running an operating system; the embodiment of the present application does not specifically limit the operating system of the terminal device.
The terminal device in the data processing system shown in FIG. 1 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a palm computer, a desktop computer, a mobile internet device (MID), a point-of-sale (POS) machine, a smart speaker, a smart television, a smart watch, a smart in-vehicle terminal, a virtual reality (VR) device, an augmented reality (AR) device, and the like. The terminal device is often configured with a display device, which may be a display, a display screen, a touch screen, and the like; the touch screen may be a touch-sensitive screen, a touch panel, and the like.
The service server in the data processing system shown in fig. 1 may be a single physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, basic cloud computing services such as big data and artificial intelligence platforms, and the like. The terminal device and the service server may be directly or indirectly connected through wired or wireless communication, which is not limited herein.
In one possible implementation, a terminal device (e.g., terminal device 100a) runs a client (a client may also be referred to as an application), such as a video client, a browser client, a game client, an educational client, a web client, or a compression client, which will not be listed one by one. Each client may be provided with an image compression function. Taking the compression client as an example, an object (i.e., a user of the client) may run the compression client in the terminal device; the compression client may provide a compression function (for example, an image compression function), and based on this function the object may upload an image in the compression client so as to compress it and obtain a compressed image (for ease of distinction, the pre-compression image uploaded in the compression client by the object may be referred to as an original image).
It can be understood that, for the original image uploaded by the object using the compression client, the service server 1000 may obtain the original image and then perform image encoding processing on it through an image encoder deployed in the service server 1000, thereby encoding the implicit expression features corresponding to the original image. Further, the service server 1000 may return the implicit expression features to the terminal device, where an image decoder deployed in the terminal device may perform decoding reconstruction processing on the implicit expression features to obtain a reconstructed image, which may serve as the compressed image corresponding to the original image. To improve image compression performance, the image encoder and the image decoder can be trained and optimized in advance, so that the implicit expression features output by the image encoder better reflect the image features of the original image, and the image decoder outputs a reconstructed image of higher quality. The image decoder includes multiple convolution layers, and training and optimizing the image decoder means training and optimizing the parameters of each of its convolution layers (the parameters included in the image decoder may be referred to as decoding parameters). To improve both the training optimization efficiency of the image decoder and its compression performance, the application provides a parameter optimization method for the image decoder that can dynamically and adaptively select the parameters of a subset of convolution layers for updating.
Specifically, since each convolution layer of the image decoder is mainly used for performing feature embedding processing (or feature convolution processing) on acquired features, the neural network layers (such as convolution layers) used for feature embedding processing in the image decoder are called feature embedding layers in the present application. As described above, the original image can be subjected to image encoding processing by the image encoder, and the content output by the image encoder can be subjected to decoding processing by the image decoder, finally yielding a reconstructed image. The image encoding processing by the image encoder together with the decoding processing by the image decoder is referred to here as one pass of image encoding and decoding processing performed on the original image; in other words, by performing image encoding and decoding processing on the original image once through the image encoder and the image decoder, a reconstructed image corresponding to the original image can be obtained (for ease of distinction, referred to in the present application as a first reconstructed image).
Then, an error loss value between the original image and the first reconstructed image (referred to in the present application as a first error loss value) may be determined, and the decoding parameters of each feature embedding layer in the image decoder may be trained and optimized through this first error loss value. In the application, a gating network is configured for each feature embedding layer; based on the input feature of the feature embedding layer, the gating network outputs a value (which may be called an optimized control value) that determines whether the corresponding feature embedding layer undergoes parameter optimization. That is, for a given feature embedding layer (called the target feature embedding layer), before its decoding parameters are trained and optimized through the first error loss value, the application can acquire the feature input to it during the image encoding and decoding processing (i.e., the feature to be processed of the target feature embedding layer) and perform binary mapping processing on this feature through the gating network configured for the target feature embedding layer, thereby obtaining the optimized control value corresponding to the target feature embedding layer. The application can set the gating network as a hard gating network, which outputs only an invalid value (such as 0) or a valid value (such as 1). When the optimized control value is the invalid value, it indicates suitability between the network structure (i.e., the decoding parameters) of the target feature embedding layer and the input feature (i.e., the feature to be processed of the target feature embedding layer): the decoding parameters can adapt to the feature to be processed, and no training optimization of them is needed. When the optimized control value is the valid value, it indicates that the suitability between the network structure (i.e., the decoding parameters) of the target feature embedding layer and the input feature is low, and that training and optimizing the decoding parameters would yield a rate-distortion gain larger than that of leaving them un-optimized; the decoding parameters of the target feature embedding layer should therefore be trained and optimized. In short, when the gating network outputs the valid value, the suitability between the network structure (decoding parameters) of the target feature embedding layer and the input feature is low and parameter optimization is required; when the gating network outputs the invalid value, the suitability is high and no parameter optimization is needed.
To make the decision on whether to optimize the decoding parameters of the target feature embedding layer based on the gating network's output clearer, it can be stated plainly: when the optimized control value output by the gating network is the invalid value, the network structure of the target feature embedding layer can be directly determined to fit the feature to be processed, and no parameter optimization is performed; when the optimized control value output by the gating network is the valid value, the decoding parameters of the target feature embedding layer can be directly determined to be unable to adapt to the feature to be processed, and they need to be trained and optimized so that feature embedding processing can be performed on the feature to be processed accurately.
Based on the above, after the optimized control value corresponding to the target feature embedding layer is obtained, if it is determined to be the valid value, the decoding parameters of the target feature embedding layer (the decoding parameters before each round of training optimization may be referred to as initial decoding parameters) are trained and optimized through the first error loss value between the original image and the first reconstructed image, yielding the optimized decoding parameters of the target feature embedding layer. If the optimized control value corresponding to the target feature embedding layer is determined to be the invalid value, the decoding parameters of the target feature embedding layer need not be trained and optimized. It should be understood that the output of the gating network decides whether the decoding parameters of a feature embedding layer are updated, so its accuracy is critical; therefore, when the training optimization processing is performed on the feature embedding layers of the image decoder, the network parameters of the gating networks may be synchronously trained and optimized based on the error loss value between the original image and the reconstructed image, so that the gating networks' outputs become more accurate and the image quality of the reconstructed image obtained by the image decoder becomes higher, with the error loss value between the original image and the reconstructed image decreasing until the convergence condition is satisfied.
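A condensed sketch of this gate-controlled optimization round is given below; decoder.embedding_layers, the per-layer gate attribute and cached_input are illustrative names rather than the patent's actual interfaces:

    import torch

    def selective_update(decoder, original, reconstructed, optimizer, loss_fn):
        first_error_loss = loss_fn(original, reconstructed)   # first error loss value
        optimizer.zero_grad()
        first_error_loss.backward()
        for layer in decoder.embedding_layers:
            # Binary mapping of the layer's feature to be processed: 1 = valid.
            gate_value = layer.gate(layer.cached_input)
            if gate_value.item() == 0:           # invalid value: parameters fit,
                for p in layer.parameters():     # so skip optimizing this layer
                    p.grad = None
        optimizer.step()                          # updates only the gated-on layers

Clearing the gradients of gated-off layers before optimizer.step() is one simple way to realize "no training optimization" for layers whose gating network outputs the invalid value; PyTorch optimizers skip parameters whose gradient is None.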
It will be appreciated that the method provided by the embodiments of the present application may be performed by a computer device, including but not limited to the terminal device or service server mentioned in fig. 1.
In the specific embodiment of the present application, the data related to the user information, the user data (such as the image uploaded in the client) and the like are all obtained by the user manually authorizing the license (i.e. by the user agreeing). That is, when the above embodiments of the present application are applied to specific products or technologies, the methods and related functions provided by the embodiments of the present application are operated under the permission or agreement of the user (the functions provided by the embodiments of the present application may be actively started by the user), and the collection, use and processing of the related data need to comply with the relevant laws and regulations and standards of the relevant territories and regions.
For ease of understanding, the data processing method provided by the embodiment of the present application will be described in detail below with reference to the accompanying drawings. Referring to fig. 2, fig. 2 is a flow chart of a data processing method according to an embodiment of the application. The method may be performed by a terminal device (e.g., any terminal device in the terminal device cluster shown in fig. 1, such as the terminal device 100 a), or may be performed by a server (e.g., the service server 1000 in the embodiment corresponding to fig. 1), or may be performed by both the terminal device and the server. For ease of understanding, this embodiment will be described in terms of this method being performed by a server as an example. As shown in fig. 2, the data processing method may at least include the following steps S101 to S103:
Step S101, performing image encoding and decoding processing on an original image through an image encoder and an image decoder to obtain a first reconstructed image corresponding to the original image.
In the image compression service of the present application, for an image to be compressed (an image that has not undergone compression processing, which may be called an original image), nonlinear transformation (i.e., image encoding processing) is first performed on it by the image encoder to obtain the implicit expression features of the image; these features can then be quantized and arithmetic-encoded based on an entropy model to obtain an encoded byte stream corresponding to the implicit expression features. The image decoder may then perform arithmetic decoding processing on this encoded byte stream to recover the implicit expression features and obtain a reconstructed image. The image encoding processing of the original image by the image encoder together with the decoding processing of the encoder output by the image decoder is regarded in the present application as one pass of image encoding and decoding processing; that is, by encoding and decoding the original image through the image encoder and the image decoder, a reconstructed image corresponding to the original image can be obtained (for ease of distinction, this reconstructed image may be referred to as the first reconstructed image, i.e., the compressed image).
The specific implementation manner of performing image encoding and decoding processing on the original image through the image encoder and the image decoder to obtain the first reconstructed image corresponding to the original image may be: the image encoder can be used for carrying out image encoding processing on the original image, so that the implicit expression characteristics corresponding to the original image can be obtained; then, the implicit expression features can be quantized, so that first quantized features corresponding to the implicit expression features can be obtained; further, the first quantization characteristic may be decoded by an image decoder, so that a first reconstructed image corresponding to the original image may be obtained.
The image encoding processing performed on the original image by the image encoder can be expressed as formula (1):

y = g_a(x; φ)    formula (1)

where g_a denotes the image encoder; φ denotes the encoding parameters of the image encoder; x denotes the original image (for a computer device, the input original image can be regarded as a matrix); and y denotes the implicit expression features output by the image encoder.

The quantization processing of the implicit expression features can be expressed as formula (2):

ŷ = Q(y)    formula (2)

where Q denotes the quantization function; y denotes the implicit expression features output by the image encoder; and ŷ denotes the quantized feature (e.g., the first quantized feature) resulting from the quantization processing.

Further, the decoding of the reconstructed image (such as the first reconstructed image) by the image decoder can be expressed as formula (3):

x̂ = g_s(ŷ; θ)    formula (3)

where g_s denotes the image decoder; θ denotes the decoding parameters of the image decoder; ŷ denotes the quantized feature in formula (2); and x̂ denotes the reconstructed image decoded by the image decoder.
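A minimal runnable sketch of formulas (1) to (3) follows; the two-layer convolution stacks and rounding quantizer are placeholders, and a real codec would be deeper and include an entropy model:

    import torch
    import torch.nn as nn

    g_a = nn.Sequential(                 # image encoder g_a with parameters phi
        nn.Conv2d(3, 64, 5, stride=2, padding=2), nn.ReLU(),
        nn.Conv2d(64, 128, 5, stride=2, padding=2),
    )
    g_s = nn.Sequential(                 # image decoder g_s with parameters theta
        nn.ConvTranspose2d(128, 64, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
        nn.ConvTranspose2d(64, 3, 5, stride=2, padding=2, output_padding=1),
    )

    x = torch.rand(1, 3, 64, 64)         # original image x
    y = g_a(x)                           # formula (1): y = g_a(x; phi)
    y_hat = torch.round(y)               # formula (2): y_hat = Q(y)
    x_hat = g_s(y_hat)                   # formula (3): x_hat = g_s(y_hat; theta)
    print(x_hat.shape)                   # torch.Size([1, 3, 64, 64])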
Step S102, obtaining the feature to be processed of the target feature embedding layer input to the image decoder in the image encoding and decoding processing process, and performing binary mapping processing on the feature to be processed through a gating network configured for the target feature embedding layer to obtain an optimized control value corresponding to the target feature embedding layer.
In the present application, the image encoder may be any neural network with an image encoding function; for example, it may be a convolutional neural network (CNN) such as a residual network, a self-attention network, or a Transformer network. Similarly, the image decoder may be any neural network with an image decoding function, for example a convolutional neural network. A convolutional neural network can be formed by stacking network layers such as convolution layers, activation function layers and normalization layers, and for each layer the output feature of the previous layer serves as the input feature of the current layer. For example, for a given convolution layer, the output feature of the previous convolution layer is its input feature; the convolution layer performs convolution calculation processing on this input feature, the resulting convolution calculation result is the output feature of the current convolution layer, and this output feature in turn serves as the input feature of the next convolution layer. In other words, during the image encoding and decoding processing, the first quantized feature is first input to the first convolution layer of the image decoder, i.e., the input feature of the first convolution layer is the first quantized feature; the first convolution layer performs convolution calculation processing on it, and the result is input to the second convolution layer, whose input feature is thus the output feature of the first convolution layer. The second convolution layer then performs convolution calculation processing on its input feature and passes the result to the third convolution layer, and so on until the convolution result is input to the last convolution layer; the output feature of the last convolution layer is input to the activation function layer for activation processing, whose result is input to the normalization layer for normalization processing, until the reconstructed image is output.
The feature embedding layer in the present application may refer to a convolution layer in the image decoder (it could also refer to an activation function layer or a normalization layer; however, since updating and optimizing the decoding parameters of the convolution layers improves the performance of the image decoder more than updating the decoding parameters of its other components, the convolution layers are selected as the feature embedding layers for subsequent parameter update optimization). Because the decoding parameters contained in a convolution layer are of high dimension, directly updating the full parameters would introduce a large additional memory overhead; the present application therefore first performs low-rank decomposition processing on the parameters and then computes on the decomposed content to carry out the update optimization. During the image encoding and decoding processing, the feature to be processed that is input to the target feature embedding layer is simply the input feature of the target feature embedding layer. It follows that when the target feature embedding layer is the first feature embedding layer, located at the start position, the feature to be processed is the first quantized feature; and when the target feature embedding layer is not the first feature embedding layer, the feature to be processed is the output of the previous feature embedding layer.
Specifically, the feature to be processed that is input to the target feature embedding layer of the image decoder during the image encoding and decoding processing may be acquired as follows. First, the feature embedding network used for feature embedding processing in the image decoder (which can be understood as a convolutional network formed by stacking multiple convolution layers) may be acquired; this feature embedding network consists of a feature embedding layer sequence, and the feature embedding layer sequence includes the target feature embedding layer. Then, the sequence position of the target feature embedding layer is determined. When the target feature embedding layer is at the sequence start position of the feature embedding layer sequence, the implicit expression features output by the image encoder during the image encoding and decoding processing are quantized to obtain the first quantized feature, and the first quantized feature is determined as the feature to be processed of the target feature embedding layer. When the target feature embedding layer is at a non-start position of the feature embedding layer sequence, the layer output feature, during the image encoding and decoding processing, of the feature embedding layer immediately preceding the target feature embedding layer in the sequence is determined as the feature to be processed of the target feature embedding layer.
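This position-dependent selection can be summarized by the following sketch, in which layer_outputs[i] is an assumed cache of the output of embedding layer i during one codec pass:

    def feature_to_process(target_idx, first_quantized, layer_outputs):
        # Sequence start position: the quantized implicit expression features.
        if target_idx == 0:
            return first_quantized
        # Non-start position: the layer output feature of the previous layer.
        return layer_outputs[target_idx - 1]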
To facilitate understanding of each feature embedding layer's feature to be processed during the image encoding and decoding processing, please refer to FIG. 3, which is a schematic diagram of image encoding and decoding processing according to an embodiment of the present application. As shown in FIG. 3, the image decoder may be composed of a convolutional network, an activation function layer and a normalization layer, where the convolutional network is formed by stacking convolution layer 301, convolution layer 302, convolution layer 303, …, and convolution layer 30n. The convolutional network in the image decoder corresponds to the feature embedding network in the present application, each convolution layer corresponds to a feature embedding layer, and the stacked convolution layers shown in FIG. 3 can be understood as a feature embedding layer sequence, in which convolution layer 301 is at the sequence start position and convolution layer 30n is at the sequence end position. The output of the convolutional network (i.e., the layer output feature of convolution layer 30n) is the input of the activation function layer, whose output is in turn the input of the normalization layer.
For the original image, the image encoder may perform image encoding processing to obtain the implicit expression features, which may then be quantized to obtain a quantized feature (referred to as the first quantized feature). The first quantized feature is first input to the first convolution layer of the convolutional network in the image decoder (i.e., convolution layer 301 shown in FIG. 3); that is, the feature to be processed of convolution layer 301 is the first quantized feature. Convolution layer 301 performs convolution calculation processing (which can be understood as feature embedding processing) on the first quantized feature to obtain a convolution result, which is the layer output feature of convolution layer 301. The layer output feature of convolution layer 301 is then input to convolution layer 302, so the feature to be processed of convolution layer 302 during this image encoding and decoding processing is the layer output feature of convolution layer 301; likewise, the feature to be processed of convolution layer 303 is the layer output feature of convolution layer 302, and so on, until the feature to be processed of convolution layer 30n during this image encoding and decoding processing is the layer output feature of convolution layer 30(n-1).
It should be appreciated that after the first reconstructed image is obtained by performing the image encoding and decoding process once on the original image, the image encoder and the image decoder may be optimized based on the first error loss value between the original image and the first reconstructed image; for the image decoder, this optimization may update the decoding parameters of each feature embedding layer. In the present application, in order to improve the optimization efficiency of the image decoder and reduce unnecessary layer optimization, a gating network is configured for each feature embedding layer and used for adaptively controlling whether the decoding parameters of that feature embedding layer are optimally updated. The update locations (i.e., which layers have their parameters updated) and the number of updated layers in the image decoder can thus be adaptively adjusted by the gating networks of the individual feature embedding layers. The gating network maps the output of the previous layer into a binary decision, and based on the output result of the gating network it can be decided whether to skip or execute the parameter update (parameter optimization) of the current layer. The present application may apply hard gating to the output of the gating network so that its output is a standard binary output of either a valid value (e.g., the number 1) or an invalid value (e.g., the number 0). The gating network in the present application can be any neural network with a gating mechanism, for example a CNN or an RNN. For ease of understanding, please refer to fig. 4, which is a schematic structural diagram of a gating network according to an embodiment of the present application. As shown in fig. 4, the gating network may include at least a convolution layer, an activation function, an adaptive average pooling layer, a fully connected layer, and a normalization layer. It should be noted that the gating network may contain more than one convolution layer, and the convolution layers contained in the gating network and those contained in the image decoder may be different convolution layers (for example, with different layer parameters and different network structures). For ease of understanding, the various parts of the gating network are briefly described below:
Convolution layer: the convolution layer can perform convolution calculation processing on the received input features (i.e. the features to be processed input to the current feature embedding layer) to obtain a convolution calculation result.
Activation function: the activation function here may be a ReLU, i.e., a linear rectification function (also known as a rectified linear unit), which is a commonly used activation function in artificial neural networks. The content output by the convolution layer can be subjected to feature preservation and mapping processing through the activation function.
Adaptive average pooling layer (adaptive average pooling): the adaptive average pooling layer can perform adaptive average pooling processing on the content output by the activation function. The output size of the adaptive average pooling layer may be specified in advance, and the layer then automatically adjusts its stride and kernel size based on the input content so as to produce output of the specified size.
Fully connected layer: can be used to synthesize the feature content obtained by the preceding layers.
Normalization layer (softmax): the normalization layer may be used to normalize the content obtained by the fully connected layer.
The input of the current feature embedding layer can be mapped to a binary output (e.g., mapped to 0 or 1) through the processing of the convolutional layer, the activation function, the adaptive averaging pooling layer, the fully connected layer, and the normalization layer described above.
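For ease of understanding, a minimal sketch of such a gating network is given below, assuming PyTorch; the channel count, the two-class head and the 1x1 pooled output size are illustrative assumptions:

```python
import torch
import torch.nn as nn

class GatingNetwork(nn.Module):
    """Illustrative gating network: convolution -> ReLU -> adaptive average
    pooling -> fully connected layer -> softmax, mapping the feature to be
    processed of a layer to a binary skip/update decision."""

    def __init__(self, channels: int = 192):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.pool = nn.AdaptiveAvgPool2d(1)  # output size specified in advance
        self.fc = nn.Linear(channels, 2)     # two classes: skip / update
        self.softmax = nn.Softmax(dim=-1)

    def forward(self, feature_to_process: torch.Tensor) -> torch.Tensor:
        h = self.pool(self.relu(self.conv(feature_to_process))).flatten(1)
        probs = self.softmax(self.fc(h))
        # hard gating: collapse the soft probabilities to a standard binary
        # output (1 = valid value, update the layer; 0 = invalid value, skip)
        return probs.argmax(dim=-1).float()
```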
It can be understood that after a feature embedding layer performs parameter optimization, in the next image encoding and decoding process that layer performs feature embedding processing on its input features based on the optimized parameters (i.e., the optimized decoding parameters), so the input received by the next feature embedding layer includes content processed by the incremental decoding parameters introduced by that layer (initial decoding parameters + incremental decoding parameters = optimized decoding parameters). Since the feature embedding layers are stacked, the incremental decoding parameters introduced by the feature embedding layers accumulate layer by layer. The accumulation speed of this information can be controlled by the gating networks: new information can be selectively added, and it can be determined whether parameter optimization is to be performed for a certain layer; if parameter optimization is not performed, the current layer does not introduce new decoding parameters as incremental decoding parameters.
Based on the above, an input feature can be mapped by the gating network to a value that takes one of two values, a valid value or an invalid value, so the mapping process of the gating network on the input feature may be referred to as binary mapping processing. Then, for a certain feature embedding layer in the image decoder (referred to as the target feature embedding layer; i.e., the target feature embedding layer is any feature embedding layer in the image decoder), before its decoding parameters are optimized by the first error loss value, the feature to be processed of the target feature embedding layer may be subjected to binary mapping processing by the gating network configured for the target feature embedding layer, and the output result may be used as the optimized control value corresponding to the target feature embedding layer.
It should be understood that the valid value and the invalid value output by the gating network can be used to determine whether the parameters of a certain feature embedding layer are optimized. The image decoder can perform image decoding processing on the result output by the image encoder (the quantized implicit expression feature, such as the first quantized feature) based on each optimized or non-optimized feature embedding layer to obtain a new reconstructed image, and the parameters of the current image decoder can be judged based on the new error loss value between the new reconstructed image and the original image. If the error loss value between the new reconstructed image and the original image is smaller or reduced, it can be said that the parameters of each feature embedding layer in the image decoder are more accurate and matched with the input features, so that the input features can be accurately convolved to obtain more accurate output features for the next layer. In other words, after the positions and number of parameter updates are controlled by the gating networks, the output content of the image encoder can be decoded again by the optimized image decoder to obtain a new reconstructed image; the control effect of the gating networks on parameter updates can be judged through the image quality of the new reconstructed image, and the network parameters of the gating networks can be updated through the error loss value between the new reconstructed image and the original image, so that the output of the gating networks becomes more accurate, the parameter update control effect is improved, and the image quality of the reconstructed image output by the image decoder is higher. That is, for the gating network, the output result can be used to reflect whether the decoding parameters of the current feature embedding layer match the input feature (i.e., the feature to be processed). If the output result of the gating network is an invalid value, it reflects that the decoding parameters of the feature embedding layer are adapted to the feature to be processed of that layer and accurate convolution processing can be performed without optimization; if the output result of the gating network is a valid value, it reflects that the decoding parameters of the feature embedding layer are not adapted to the feature to be processed of that layer, and further optimization is needed to perform accurate convolution processing.
In other words, for the target feature embedding layer, the optimized control value obtained by the gating network is used to reflect decoding suitability, where decoding suitability refers to the suitability between a decoding parameter of the target feature embedding layer and the feature to be processed input to the target feature embedding layer. When the optimized control value is a valid value, it indicates that the decoding parameter of the target feature embedding layer does not have suitability with the feature to be processed of the target feature embedding layer, and the current decoding parameter (such as the initial decoding parameter) of the target feature embedding layer needs to be optimally updated; when the optimized control value is an invalid value, it indicates that the decoding parameter of the target feature embedding layer has suitability with the feature to be processed of the target feature embedding layer, and the current decoding parameter (such as the initial decoding parameter) of the target feature embedding layer does not need to be optimally updated.
Step S103, if the optimized control value corresponding to the target feature embedding layer is determined to be an effective value, training and optimizing the initial decoding parameters of the target feature embedding layer through a first error loss value between the original image and the first reconstructed image to obtain optimized decoding parameters of the target feature embedding layer.
In the present application, after the optimized control value corresponding to the target feature embedding layer is obtained, if the optimized control value is determined to be a valid value, the initial decoding parameters of the target feature embedding layer are trained and optimized through the first error loss value between the original image and the first reconstructed image to obtain the optimized decoding parameters of the target feature embedding layer. A loss function over the original image and the reconstructed image can be preset, and the parameters of the image encoder and the image decoder can be trained and optimized through this loss function. The present application may adopt a Rate-Distortion (RD) loss function for the optimization; when the image encoder and the image decoder are neural networks, they can be rapidly and accurately optimized through the rate-distortion loss function.
Based on the above, the specific manner of determining the first error loss value may be as shown in equation (4):
$$L_1 = R(\hat{y}) + \lambda \cdot D(x, \hat{x}) \qquad \text{formula (4)}$$
Wherein, as shown in formula (4), $x$ can be used to characterize the original image (for a computer device, the input original image can be regarded as a matrix); $\hat{x}$ can be used to characterize the reconstructed image (e.g., the first reconstructed image) decoded by the image decoder; $D(\cdot,\cdot)$ may refer to a distortion measure function that can measure the distortion of the image reconstruction (i.e., measure the degree of distortion of the reconstructed image compared to the original image); $\hat{y}$ can be used to characterize the quantized feature (such as the first quantized feature) obtained by quantization processing; $R(\hat{y})$ is the bit rate (i.e., the bit rate of the quantized feature, where $R(\cdot)$ is the bit-rate determining function); $\lambda$ may be a Lagrangian multiplier that controls the trade-off between $R$ and $D$; $L_1$ may be used to characterize the error loss value (e.g., the first error loss value). By substituting the quantized feature output by the image encoder, the original image and the reconstructed image into the loss function shown in formula (4), an error loss value can be obtained, and the image decoder can be optimized through this error loss value.
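For ease of understanding, formula (4) may be sketched in code as follows (a minimal PyTorch sketch; mean squared error stands in for the distortion measure D, the bit-rate term is assumed to come from an entropy model, and the value of the Lagrangian multiplier is illustrative):

```python
import torch

def rd_loss(bit_rate: torch.Tensor, original: torch.Tensor,
            reconstructed: torch.Tensor, lam: float = 0.01) -> torch.Tensor:
    """Formula (4): L1 = R(y_hat) + lam * D(x, x_hat)."""
    distortion = torch.mean((original - reconstructed) ** 2)  # MSE stands in for D()
    return bit_rate + lam * distortion
```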
It should be noted that, after the decoding parameters of each feature embedding layer are controlled and optimized through the gating network, a new reconstructed image can be obtained by performing decoding processing again through the optimized image decoder, a new error loss value can be determined for the new reconstructed image, and the network parameters of the gating network can be optimized through the new error loss value, so that the output result of the gating network is more and more accurate, and therefore, whether the parameters are optimized or not can be accurately controlled through each optimized gating network in the process of new round of image encoding and decoding processing, and further, the decoding effect of the image decoder is better and better. The optimization of the gating network may be seen from the description of the corresponding embodiment of fig. 5, which follows.
In the embodiment of the present application, in the process of training and optimizing the image decoder in the image compression service, a gating network is configured for each feature embedding layer of the image decoder to adaptively determine whether the decoding parameters of that feature embedding layer should be updated. The output of the gating network may adaptively select, based on the features input to the feature embedding layer, whether to add new information to that layer (where the new information may be understood as the decoding parameters introduced for the feature embedding layer). Specifically, taking a target feature embedding layer in the image decoder as an example: after the original image is subjected to image encoding and decoding processing to obtain the first reconstructed image, and before the decoding parameters of the target feature embedding layer (the decoding parameters before training optimization may be called initial decoding parameters) are trained and optimized by the first error loss value between the original image and the first reconstructed image, binary mapping processing can be performed on the feature to be processed of the target feature embedding layer through the gating network of the target feature embedding layer, and the output value can be used as the optimized control value of the target feature embedding layer; if the optimized control value is a valid value, the initial decoding parameters of the target feature embedding layer can be trained and optimized based on the first error loss value to obtain the optimized decoding parameters of the target feature embedding layer. It should be understood that by configuring a gating network for each feature embedding layer of the image decoder, whether a feature embedding layer performs parameter optimization can be adaptively determined through the gating network based on the features input to that layer, so that the feature embedding layers that need parameter optimization can be optimized accurately and in a timely manner, while the feature embedding layers that do not need parameter optimization are left unoptimized. This improves the optimization timeliness of each feature embedding layer and reduces unnecessary optimization of some feature embedding layers, which is beneficial both to the training efficiency of the image decoder and to the performance of the optimized decoder, thereby improving the image compression performance. In conclusion, the present application can improve the training efficiency of the decoder and the image compression performance in the image compression service.
Further, it will be appreciated that, based on the foregoing, an image encoder may be deployed in a server (such as the service server 1000), through which image encoding processing may be performed on the original image, and an image decoder may be deployed in a terminal device, through which decoding and reconstruction processing may be performed on the encoded byte stream sent from the server to obtain a reconstructed image as the compressed image. In the present application, the image decoder can also be synchronously deployed in the server; by training this image decoder in advance to carry out parameter adaptation of the feature embedding layers, the feature embedding layers that need parameter optimization and those that do not can be determined. Then, the server may send the incremental decoding parameters (i.e., the difference between the optimized decoding parameters and the initial decoding parameters) of the feature embedding layers that need parameter optimization, together with the content (implicit expression features) encoded and output by the image encoder, to the terminal device, so that the terminal device may determine the optimized decoding parameters based on the initial decoding parameters of its local original feature embedding layers and the received incremental decoding parameters, and the terminal device may call the image decoder containing the optimized decoding parameters to decode the content output by the image encoder to obtain a reconstructed image with higher image quality.
It will be appreciated that, for each feature embedding layer in the image decoder, the decoding parameters are usually in the form of a matrix. Because of the higher complexity of the network structure of each feature embedding layer, the dimension of the parameter matrix of the feature embedding layer is usually high and the number of parameters it contains is large, so the number of parameters contained in the corresponding incremental decoding parameters is also large; a large amount of calculation will then be required when computing the error loss value after optimization or when transmitting to the terminal device, and the generated bit stream (i.e., the encoded byte stream) has a large overhead. The feature embedding layer in the present application refers to a convolution layer, and the parameters contained in a convolution layer have a low-rank decomposable characteristic. Based on this characteristic, after the feature embedding layer is optimized to obtain the optimized decoding parameters, the incremental decoding parameters between the optimized decoding parameters and the initial decoding parameters can be determined, and the incremental decoding parameters can be subjected to low-rank decomposition processing so that they are represented by two learnable matrices. It is understood that the matrix dimensions of the learnable matrices after the low-rank decomposition processing are lower, the number of contained parameters is smaller, the corresponding amount of calculation is smaller, and the generated bit stream overhead is smaller.
For ease of understanding, please refer to fig. 5, fig. 5 is a schematic flow chart of transmitting data to a terminal device according to an embodiment of the present application. The flow may correspond to the flow after obtaining the optimized decoding parameters of the target feature embedding layer in the embodiment corresponding to fig. 2. As shown in fig. 5, the flow may include at least the following steps S501 to S505:
In step S501, an incremental parameter matrix between the optimized decoding parameters and the initial decoding parameters is determined.
Specifically, for the decoding parameters of the feature embedding layer, the decoding parameters may be in the form of a matrix, and then the optimized decoding parameters and the initial decoding parameters may both be matrices, and a difference matrix may be obtained by calculating the difference between the two, where the difference matrix is an incremental parameter matrix.
Step S502, performing low-rank decomposition processing on the increment parameter matrix to obtain a decomposition matrix corresponding to the increment parameter matrix; the matrix dimensions of the decomposition matrix are lower than the matrix dimensions of the delta parameter matrix.
Specifically, for the incremental parameter matrix, its low-rank attribute can be utilized to perform low-rank decomposition processing, so that two learnable matrices corresponding to the incremental parameter matrix can be obtained, and the product of the two learnable matrices can be used to represent the incremental parameter matrix. Due to the principle of low-rank decomposition, the matrix dimension of each learnable matrix is lower than the matrix dimension of the incremental parameter matrix. The decomposition matrix in the present application may refer to such a learnable matrix.
The relationship between the delta parameter matrix and the learnable matrix may be as shown in equation (5):
$$\Delta W = B \cdot A \qquad \text{formula (5)}$$
Wherein, as shown in formula (5), $\Delta W$ can be used to characterize the incremental parameter matrix; $A$ can be used to characterize one learnable matrix, which may be initialized with a random Gaussian; $B$ can be used to characterize the other learnable matrix, which may be initialized to 0. The principle of low-rank decomposition will not be described in detail here.
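For ease of understanding, the decomposition of formula (5) may be sketched as follows (a minimal PyTorch sketch; the rank value and dimensions are illustrative assumptions, and the zero initialization of B makes the initial increment zero):

```python
import torch
import torch.nn as nn

def make_decomposition(out_dim: int, in_dim: int, rank: int = 4):
    """Two learnable matrices whose product represents the incremental
    parameter matrix: delta_W = B @ A."""
    A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)  # random Gaussian init
    B = nn.Parameter(torch.zeros(out_dim, rank))        # zero init
    return A, B

A, B = make_decomposition(out_dim=192, in_dim=192)
delta_W = B @ A  # far fewer parameters than a full 192 x 192 increment
```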
In step S503, the target feature embedding layer including the decomposition matrix and the initial decoding parameters is determined as an optimized feature embedding layer, and the image decoder including the optimized feature embedding layer is determined as an optimized image decoder.
Specifically, after the delta parameter matrix is subjected to low-rank decomposition processing, for convenience of distinction, a target feature embedding layer including the decomposition matrix and initial decoding parameters may be determined as an optimized feature embedding layer, and an image decoder including the optimized feature embedding layer may be determined as an optimized image decoder. The delta parameter matrix of each of the optimized feature embedding layers may be transmitted to the terminal device, i.e., at this time, the decomposition matrix of each of the feature embedding layers, the optimized control values, and the implicit expression features of the image encoder may be transmitted to the terminal device. After the incremental parameter matrix is subjected to low-rank decomposition processing to obtain a decomposition matrix corresponding to the incremental parameter matrix, the implicit expression features, the decomposition matrix and the optimization control values corresponding to the target feature embedding layer can be sent to a decoding client (such as a terminal device provided with an image decoder), the terminal device can determine whether the feature embedding layer needs parameter optimization or not based on the optimization control values of the feature embedding layers, and when the parameter optimization is determined to be needed, the decomposition matrix used for representing the incremental parameter matrix is acquired, the terminal device can determine the optimization decoding parameters based on the initial decoding parameters of the decomposition matrix and the image decoder, and decode and reconstruct the implicit expression features based on the optimization decoding parameters.
Step S504, obtaining implicit expression characteristics output by the image encoder in the image encoding and decoding process, and performing fine adjustment processing on the decomposition matrix through the implicit expression characteristics and the optimized image decoder to obtain a fine adjustment decomposition matrix corresponding to the decomposition matrix.
Specifically, it should be understood that, in order to further improve the image quality (i.e. improve the image compression performance) of the reconstructed image (such as the first reconstructed image) output by the image decoder, the present application may fine tune each parameter based on the error loss value, and the fine tuning of the parameter is beneficial to improving the rate-distortion performance of the image, so that the image is clearer, and by fine tuning the decomposition matrix, the rate-distortion performance of the final output result of the image decoder may also be improved.
The specific way for performing fine adjustment processing on the decomposition matrix through the implicit expression features and the optimized image decoder to obtain the fine adjustment decomposition matrix corresponding to the decomposition matrix can be as follows: the implicit expression characteristic can be subjected to fine adjustment processing through the first error loss value, so that the fine adjustment expression characteristic can be obtained; then, the fine tuning expression feature can be quantized, so that a second quantized feature corresponding to the fine tuning expression feature can be obtained; further, the second quantized feature may be decoded by an optimized image decoder, so that a second reconstructed image corresponding to the original image may be obtained; further, a second error loss value between the original image and the second reconstructed image can be determined, and fine adjustment processing can be performed on the decomposition matrix through the second error loss value, so as to obtain a fine adjustment decomposition matrix corresponding to the decomposition matrix.
Wherein, for performing fine adjustment processing on the implicit expression feature through the first error loss value, a specific manner of obtaining the fine adjustment expression feature may be: gradient calculation processing can be carried out on the first error loss value and the implicit expression characteristic, so that a first gradient value corresponding to the implicit expression characteristic can be obtained; and then, carrying out fine adjustment processing on the implicit expression features through a first fine adjustment function and a first gradient value corresponding to the implicit expression features, thereby obtaining the fine adjustment expression features.
The specific way to fine-tune the implicit expression features can be as shown in equation (6):
$$y' = y - \eta \cdot \nabla_{y} L_1 \qquad \text{formula (6)}$$
Wherein, as shown in formula (6), $\eta$ can be used to characterize the learning rate parameter; $L_1$ may be used to characterize the error loss value (e.g., the first error loss value); $y$ can be used to characterize the implicit expression feature; $\nabla_{y} L_1$ can be used to characterize the gradient value (the first gradient value) of the implicit expression feature $y$ with respect to the first error loss value; $y'$ can be used to characterize the fine-tuning expression feature. The fine-tuning expression feature can be obtained by subtracting the first gradient value, scaled by the learning rate, from the implicit expression feature.
It can be understood that, in order to further improve the image quality (i.e. improve the image compression performance) of the reconstructed image (e.g. the first reconstructed image) output by the image decoder, the present application can fine-tune the parameters (implicit expression features) input to the image decoder, and since the fine-tuning of the parameters is beneficial to improving the rate-distortion performance of the image, the image is clearer, and then by fine-tuning the implicit expression features, the rate-distortion performance of the compressed image can also be improved. That is, the implicit expression feature output from the image encoder may be subjected to a trimming process before being subjected to a quantization process and input to the image decoder, and the trimmed trim expression feature may be subjected to a quantization process and input to the image decoder. For fine tuning of the implicit expression feature, fine tuning can be performed based on an error loss value between the original image and the reconstructed image after the reconstructed image is obtained by one-time image codec processing.
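For ease of understanding, one fine-tuning step of formula (6) may be sketched as follows (a minimal PyTorch sketch; it assumes the implicit expression feature participated in the loss computation with gradients enabled, and the learning rate value is illustrative):

```python
import torch

def finetune_latent(y: torch.Tensor, loss: torch.Tensor,
                    lr: float = 1e-3) -> torch.Tensor:
    """Formula (6): y' = y - lr * dL1/dy."""
    grad_y, = torch.autograd.grad(loss, y)  # the first gradient value
    return (y - lr * grad_y).detach().requires_grad_(True)
```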
Further, based on the above, after the fine-tuning expression feature is quantized to obtain the second quantized feature, the second quantized feature may be input to the optimized image decoder, and the optimized image decoder may perform decoding processing on the second quantized feature to obtain a new reconstructed image (the second reconstructed image) corresponding to the original image; based on the second error loss value between the original image and the second reconstructed image, fine-tuning processing may be performed on the decomposition matrix to obtain the fine-tuning decomposition matrix. The specific implementation of performing fine-tuning processing on the decomposition matrix through the second error loss value to obtain the fine-tuning decomposition matrix corresponding to the decomposition matrix may be as follows: gradient calculation processing can be performed on the second error loss value and the decomposition matrix to obtain a second gradient value corresponding to the decomposition matrix; the decomposition matrix is then fine-tuned through a second fine-tuning function and the second gradient value to obtain the fine-tuning decomposition matrix corresponding to the decomposition matrix.
Since the second reconstructed image is an image decoded and reconstructed after the delta decoding parameters are introduced, the influence of the delta decoding parameters needs to be considered when determining the error loss value between the original image and the second reconstructed image, and the specific manner of determining the second error loss value between the original image and the second reconstructed image can be as shown in formula (7):
$$L_2 = R(\hat{y}) + R(\Delta\hat{w}) + \lambda \cdot D\big(x,\; g_{\theta}(\hat{y};\, \Delta\hat{w})\big) \qquad \text{formula (7)}$$
Wherein, as shown in formula (7), $R(\cdot)$ is a bit rate calculation function; $\hat{y}$ may be used to characterize the quantized feature (e.g., the second quantized feature) received by the optimized image decoder; $\Delta\hat{w}$ can be used to characterize the quantized incremental parameter matrix (in practice it can be used to characterize the quantized decomposition matrix); $g$ can be used to characterize the image decoder; $\hat{x} = g_{\theta}(\hat{y}; \Delta\hat{w})$ can be used to characterize the reconstructed image (e.g., the second reconstructed image) output by the optimized image decoder; $\theta$ can be used to characterize the initial decoding parameters of the optimized image decoder; $D(\cdot,\cdot)$ may refer to a distortion measure function that can measure the distortion of the image reconstruction (i.e., measure the degree of distortion of the reconstructed image compared to the original image); $L_2$ may be used to characterize the error loss value (here, the second error loss value); $\lambda$ may be a Lagrangian multiplier (which may be understood as a coefficient that trades off between rate and distortion) that controls the ratio between R and D. The bit rate $R(\Delta\hat{w})$ is computed from $p(\Delta\hat{w})$, the probability estimate corresponding to the decomposition matrix.
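For ease of understanding, formula (7) may be sketched as follows (a minimal PyTorch sketch under the same assumptions as the sketch of formula (4); both bit-rate terms are assumed to be precomputed):

```python
import torch

def rd_loss_with_delta(bit_rate_y: torch.Tensor, bit_rate_delta: torch.Tensor,
                       original: torch.Tensor, reconstructed: torch.Tensor,
                       lam: float = 0.01) -> torch.Tensor:
    """Formula (7): L2 = R(y_hat) + R(delta_w_hat) + lam * D(x, x_hat)."""
    distortion = torch.mean((original - reconstructed) ** 2)  # MSE stands in for D()
    return bit_rate_y + bit_rate_delta + lam * distortion
```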
Since the decomposition matrix corresponding to the incremental decoding parameters has no prior probability, the present application may determine the probability estimate of the decomposition matrix in the manner shown in formula (8):
$$\Delta\hat{w}_i = \Delta w_i + u, \quad u \sim \mathcal{U}\left(-\tfrac{w}{2},\, \tfrac{w}{2}\right) \qquad \text{formula (8)}$$
Wherein $\Delta w_i$ can be used to characterize the $i$-th element of the incremental parameter matrix ($i$ is the position of the element); $u$ is uniform noise used as a substitute for quantization; $w$ may be a predetermined parameter. The probability estimate of $\Delta\hat{w}_i$ may be as shown in equation (9):
$$p\left(\Delta\hat{w}_i\right) = c\left(\Delta\hat{w}_i + \tfrac{w}{2}\right) - c\left(\Delta\hat{w}_i - \tfrac{w}{2}\right) \qquad \text{formula (9)}$$
Wherein $w$ shown in formula (9) may be a preset parameter, consistent with $w$ shown in formula (8); $c(\cdot)$ can be used to characterize the cumulative distribution function of $\pi$, which obeys a logistic distribution as in equation (10):
$$c(x) = \frac{1}{1 + \exp\left(-\frac{x - \mu}{s}\right)} \qquad \text{formula (10)}$$
Wherein, as shown in formula (10), $c(x)$ is the result of obeying the logistic distribution; $\mu$ and $s$ are predetermined parameters. In the present application, $w = 0.01$, $\mu = 0$ and $s = 0.05$ can be set. In summary, by substituting formulas (8) to (10) layer by layer, the probability estimate $p(\Delta\hat{w})$ in formula (7) can be determined.
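For ease of understanding, formulas (8) to (10) may be combined into the following sketch, which estimates the bit rate of the quantized incremental parameters (a minimal PyTorch sketch; the clamp guarding against log(0) is an added assumption):

```python
import torch

def delta_bit_rate(delta_w_hat: torch.Tensor, w: float = 0.01,
                   mu: float = 0.0, s: float = 0.05) -> torch.Tensor:
    """p(x) = c(x + w/2) - c(x - w/2), with c the logistic CDF of
    formula (10); the bit rate is R = -sum(log2 p) over all elements."""
    def logistic_cdf(x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid((x - mu) / s)

    p = logistic_cdf(delta_w_hat + w / 2) - logistic_cdf(delta_w_hat - w / 2)
    p = torch.clamp(p, min=1e-9)  # numerical guard (added assumption)
    return -torch.log2(p).sum()
```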
Further, a second error loss value between the original image and the second reconstructed image can be determined based on the loss function shown in the formula (7), and then the fine tuning processing can be performed on the decomposition matrix through the second error loss value to obtain a fine tuning decomposition matrix. The specific implementation manner of performing fine tuning treatment on the decomposition matrix to obtain the fine tuning decomposition matrix can be as shown in formula (11):
$$A' = A - \eta \cdot \nabla_{A} L_2 \qquad \text{formula (11)}$$
Wherein, as shown in formula (11), $\eta$ can be used to characterize the learning rate parameter; $A$ can be used to characterize the decomposition matrix (the same update applies to each learnable matrix obtained by the low-rank decomposition); $L_2$ may be used to characterize the error loss value (e.g., the second error loss value) determined by formula (7); $\nabla_{A} L_2$ can be used to characterize the gradient value (the second gradient value) of the decomposition matrix with respect to the second error loss value; $A'$ can be used to characterize the fine-tuning decomposition matrix. The fine-tuning decomposition matrix can be obtained by subtracting the second gradient value, scaled by the learning rate, from the decomposition matrix.
It should be noted that, after the second error loss value is determined based on formula (7), not only can fine-tuning optimization be performed on the decomposition matrix, but the network parameters of the gating network can also be optimized. That is, after the implicit expression features output by the image encoder in the image encoding and decoding process are obtained, the network parameters of the gating network can be optimized through the implicit expression features and the optimized image decoder to obtain optimized network parameters; the gating network containing the optimized network parameters can be used, after acquiring the updated features to be processed of the target feature embedding layer in a new round of image encoding and decoding processing, to perform binary mapping processing on the updated features to be processed to obtain an updated optimized control value corresponding to the target feature embedding layer.
Based on the above, the specific implementation manner of optimizing the network parameters of the gating network through the implicit expression features and the optimized image decoder to obtain the optimized network parameters may be: the fine-tuning expression feature obtained after the fine-tuning processing of the implicit expression feature by the first error loss value may be obtained (for a specific implementation manner, reference may be made to the above description, and details will not be repeated here); then, the fine adjustment expression features can be quantized to obtain second quantized features corresponding to the fine adjustment expression features; the second quantized features can be decoded by an optimized image decoder to obtain a second reconstructed image corresponding to the original image; and then, determining a second error loss value between the original image and the second reconstructed image, and performing fine adjustment processing on the network parameters of the gating network through the second error loss value to obtain optimized network parameters corresponding to the network parameters of the gating network.
The method for performing fine tuning processing on the network parameters of the gating network through the second error loss value to obtain the optimized network parameters corresponding to the network parameters of the gating network may be as follows:
$$\phi' = \phi - \eta \cdot \nabla_{\phi} L_2 \qquad \text{formula (12)}$$
Wherein, as shown in formula (12), $\eta$ can be used to characterize the learning rate parameter; $L_2$ may be used to characterize the error loss value (e.g., the second error loss value); $\phi$ can be used to characterize the network parameters of the gating network; $\nabla_{\phi} L_2$ can be used to characterize the gradient value (which may be called the third gradient value) of the network parameters of the gating network with respect to the second error loss value; $\phi'$ can be used to characterize the optimized network parameters. The optimized network parameters can be obtained by subtracting the third gradient value, scaled by the learning rate, from the network parameters.
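For ease of understanding, formulas (11) and (12) share the same update rule and may be sketched together as follows (a minimal PyTorch sketch using a plain hand-rolled gradient step; the present application notes below that the Adam optimizer may be used in practice, and the helper names are hypothetical):

```python
import torch

def finetune_step(params: list, loss: torch.Tensor, lr: float) -> None:
    """One gradient step shared by formulas (11) and (12): each parameter is
    moved against its gradient of the second error loss value."""
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():
        for p, g in zip(params, grads):
            p -= lr * g  # the second or third gradient value, scaled by lr

# Hypothetical usage:
#   finetune_step([A, B], second_loss, lr=1e-3)                         # formula (11)
#   finetune_step(list(gating_net.parameters()), second_loss, lr=1e-5)  # formula (12)
```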
Step S505, the implicit expression features, the fine adjustment decomposition matrix corresponding to the decomposition matrix and the optimized control value corresponding to the target feature embedding layer are sent to the decoding client, so that the decoding client decodes the implicit expression features, the fine adjustment decomposition matrix corresponding to the decomposition matrix and the optimized control value corresponding to the target feature embedding layer to obtain a decoded image corresponding to the original image.
Specifically, the implicit expression features, the fine tuning decomposition matrix corresponding to the decomposition matrix, and the optimized control value corresponding to the target feature embedding layer may be sent to the decoding client. The respective values may be first arithmetically encoded to obtain a bit stream, and then the bit stream may be transmitted to the decoding client.
The specific implementation manner for sending the implicit expression features, the fine tuning decomposition matrix corresponding to the decomposition matrix and the optimized control value corresponding to the target feature embedding layer to the decoding client side can be as follows: the fine adjustment expression characteristic obtained after the fine adjustment processing of the implicit expression characteristic through the first error loss value can be obtained; then, the fine-tuning expression features and the fine-tuning decomposition matrix can be respectively quantized, so that quantized fine-tuning features corresponding to the fine-tuning expression features and quantized fine-tuning matrices corresponding to the fine-tuning decomposition matrix can be obtained; further, arithmetic coding processing can be performed on the quantized trim feature, the quantized trim matrix and the optimized control value corresponding to the target feature embedding layer, so that a first bit stream corresponding to the quantized trim feature, a second bit stream corresponding to the quantized trim matrix and a third bit stream corresponding to the optimized control value can be obtained; the first, second and third bitstreams may then be sent to a decoding client.
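For ease of understanding, the assembly of the three bit streams may be sketched as follows (a minimal sketch; `arithmetic_encode` is a placeholder for a real arithmetic coder, and the quantization intervals are illustrative assumptions):

```python
import torch

def quantize(x: torch.Tensor, interval: float = 1.0) -> torch.Tensor:
    """Round each element to the nearest multiple of the quantization interval."""
    return torch.round(x / interval) * interval

def build_streams(y_tuned, decomp_tuned, gate_values, arithmetic_encode):
    """Quantize, then arithmetically encode, the three pieces of content
    sent to the decoding client."""
    first_stream = arithmetic_encode(quantize(y_tuned))              # quantized fine-tuning feature
    second_stream = arithmetic_encode(quantize(decomp_tuned, 0.01))  # quantized fine-tuning matrix
    third_stream = arithmetic_encode(gate_values)                    # optimized control values
    return first_stream, second_stream, third_stream
```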
It should be appreciated that by fine tuning the implicit expression features with the decomposition matrix, the rate-distortion performance can be improved well, resulting in a higher quality of the output reconstructed image; the increment decoding parameters are subjected to low-rank decomposition processing, so that the calculation parameter quantity and the storage space can be well reduced, the network bandwidth in the data transmission process can be reduced, and in conclusion, the bit cost and the transmission bandwidth can be reduced more while the storage space is reduced.
In the embodiment of the present application, by configuring a gating network for each feature embedding layer of the image decoder, whether a feature embedding layer performs parameter optimization can be adaptively determined through the gating network based on the features input into that layer, so that the feature embedding layers that need parameter optimization can be optimized accurately and in a timely manner, while the feature embedding layers that do not need parameter optimization are left unoptimized. This improves the optimization timeliness of each feature embedding layer and reduces redundant optimization of some feature embedding layers, which benefits both the training efficiency of the image decoder and the performance of the optimized decoder, thereby improving the image compression performance. In addition, low-rank decomposition processing can be performed on each incremental decoding parameter, whereby the parameter storage space and bit overhead can be reduced; and by fine-tuning the various parameters, the rate-distortion performance of the image can be improved. In summary, when performing parameter adaptation for the image decoder, the present application applies a low-rank constraint to the incremental update parameters, thereby reducing the additionally introduced parameter quantity and bit stream overhead; whether a certain layer undergoes parameter updating is controlled by a dynamic gating network, so the number and positions of the updated layers can be adaptively adjusted according to the input features, and the bit stream overhead of each layer can be adaptively determined by the gating network, so that a remarkable rate-distortion performance improvement can be obtained; in addition, the rate-distortion performance can be further improved by fine-tuning the various parameters, thereby improving the image compression performance.
For a clearer understanding of parameter adaptation in the image decoder, an example is set forth below. Specifically, for an original image x0, the present application can perform image encoding and decoding processing on it through the image encoder and the image decoder to obtain a reconstructed image 1. In the first round of image encoding and decoding processing, the image encoder performs image encoding processing on the original image x0 to obtain an implicit expression feature y1; subsequently, the image decoder performs decoding processing on the quantized implicit expression feature y1 and reconstructs it to obtain the reconstructed image 1. Further, based on the error loss value between the original image x0 and the reconstructed image 1, the image decoder may undergo a first round of optimization; during this optimization, whether each feature embedding layer performs parameter optimization may be determined by the output value of the gating network of that feature embedding layer, so that the optimized decoding parameters of each feature embedding layer may be obtained (some feature embedding layers may not be optimized; for a feature embedding layer that is not optimized, its optimized decoding parameters may be regarded as the initial decoding parameters, i.e., the optimized decoding parameters are consistent with the initial decoding parameters). Then, the implicit expression feature y1 may be fine-tuned through the error loss value between the original image x0 and the reconstructed image 1 to obtain a fine-tuning expression feature y1'; each incremental decoding parameter may then undergo low-rank decomposition processing, and the quantized fine-tuning expression feature y1' may be decoded again based on the optimized image decoder containing each decomposition matrix, so as to reconstruct a new reconstructed image 2. Based on the error loss value between the original image x0 and the reconstructed image 2, the decomposition matrix of each feature embedding layer and the network parameters of the gating networks may be fine-tuned, and the fine-tuning decomposition matrix may then be added to the initial decoding parameters of the feature embedding layer to obtain the final optimized decoding parameters of the feature embedding layer, whereby one round of optimization of the image decoder is adaptively realized.
Subsequently, the image encoder and the image decoder after the first round of optimization can perform image encoding and decoding processing once more, thereby obtaining a reconstructed image 3. In the second round of image encoding and decoding processing, the image encoder performs image encoding processing on the original image x0 to obtain an implicit expression feature y2; subsequently, the image decoder performs decoding processing on the quantized implicit expression feature y2 and reconstructs it to obtain the reconstructed image 3. Further, based on the error loss value between the original image x0 and the reconstructed image 3, the image decoder may undergo a second round of optimization; during this optimization, whether each feature embedding layer performs parameter optimization may again be determined by the output value of the gating network of that feature embedding layer (the network parameters of the gating network at this time are the optimized network parameters after the first round of optimization, and the output value is determined based on these optimized network parameters), so that the optimized decoding parameters of each feature embedding layer in the second round may be obtained (some feature embedding layers may not be optimized; for a feature embedding layer that is not optimized, the optimized decoding parameters of the second round may be regarded as its initial decoding parameters, where in the second round the initial decoding parameters of the feature embedding layer refer to the optimized decoding parameters after the first round of optimization, i.e., the optimized decoding parameters are consistent with the initial decoding parameters). Then, the implicit expression feature y2 may be fine-tuned through the error loss value between the original image x0 and the reconstructed image 3 to obtain a fine-tuning expression feature y2'; each incremental decoding parameter may then undergo low-rank decomposition processing, and the quantized fine-tuning expression feature y2' may be decoded again based on the optimized image decoder containing each decomposition matrix, so as to reconstruct a new reconstructed image 4. Based on the error loss value between the original image x0 and the reconstructed image 4, the decomposition matrix of each feature embedding layer and the network parameters of the gating networks may be fine-tuned, and the fine-tuning decomposition matrix may then be added to the initial decoding parameters of the feature embedding layer to obtain the final optimized decoding parameters of the feature embedding layer, whereby one round of optimization of the image decoder is adaptively realized. Following the same principle, the image decoder and the gating networks can be optimized for multiple rounds in the same way until a training convergence condition is met (e.g., the number of optimization rounds reaches a preset value). The present application may preset the number of fine-tuning rounds for the implicit expression feature (e.g., set to 2000), or preset the number of optimization rounds for the image decoder (e.g., set to 2000).
It should be noted that the present application may perform an optimization warm-up of 100 steps in advance; specifically, the output values of the gating networks may all be fixed to 1 for the first 100 rounds, which reduces the problem of degraded reconstruction performance in the initial stage of parameter adaptation of the image decoder. The present application can use the Adam optimizer with a learning rate of 10^-3 to fine-tune the decomposition matrix and the implicit expression features, and the learning rate for the gating networks can be set to 10^-5.
In summary, for a specific way to optimize the network parameters of the gating network and the decoding parameters of the feature embedding layer in the image decoder end to end, it can be as shown in equation (13):
$$L = R(\hat{y}) + \sum_{k=1}^{K} g_k \cdot R\left(\Delta\hat{w}_k\right) + \lambda \cdot D(x, \hat{x}) \qquad \text{formula (13)}$$
Wherein, as shown in formula (13), $k$ can be used to characterize the $k$-th feature embedding layer, and $K$ can be used to characterize the total number of feature embedding layers; $g_k$ can be used to characterize the output of the gating network of the $k$-th feature embedding layer; $\Delta\hat{w}_k$ can be used to characterize the quantized values of the decomposition matrix of the $k$-th feature embedding layer (quantized mapping may be performed at quantization interval $w$ to obtain discrete values, and $w$ may be 0.01).
It should be noted that the output of the gating network is hard-gated, so that it is a standard binary output with an invalid value of 0 or a valid value of 1; in the back-propagation process (i.e., the process of optimizing the gating network based on the loss value), however, soft gating may be used for gradient propagation. That is, $g_k$ as shown in formula (13) can be determined as shown in equation (14):
$$g_k = \operatorname{sg}\left(\mathbb{1}\left[G_k(h_{k-1})\right] - G_k(h_{k-1})\right) + G_k(h_{k-1}) \qquad \text{formula (14)}$$
Wherein, as shown in formula (14), $\mathbb{1}[\cdot]$ is the indication function yielding the hard binary decision; $G_k$ is the gating network of the $k$-th layer; $\operatorname{sg}(\cdot)$ represents the stop-gradient operator, which computes normally when propagating in the forward direction and whose gradient is 0 when propagating in the reverse direction; $h_{k-1}$ can be used to characterize the input features of the $k$-th layer (i.e., the layer output features of layer $k-1$). In the forward pass, formula (14) evaluates to the hard binary decision, while in the backward pass the gradient flows through the soft gating output $G_k(h_{k-1})$.
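For ease of understanding, the straight-through behavior of formula (14) may be sketched as follows (a minimal PyTorch sketch; `detach()` plays the role of the stop-gradient operator, and the 0.5 threshold on the soft output is an illustrative assumption):

```python
import torch

def hard_gate(soft_prob: torch.Tensor) -> torch.Tensor:
    """Formula (14): the forward pass returns the hard binary decision,
    while gradients flow through the soft gating output."""
    hard = (soft_prob >= 0.5).float()  # the indication function
    # detach() computes normally forward and has zero gradient backward,
    # matching the stop-gradient operator sg(.)
    return hard.detach() + soft_prob - soft_prob.detach()
```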
After the gating network is introduced to optimize the decoding parameters of each feature embedding layer of the image decoder, the layer output for a certain feature embedding layer in the image decoder can be shown as formula (15):
$$h_k = \left(W_k + g_k \cdot \Delta W_k\right) * h_{k-1} \qquad \text{formula (15)}$$
Wherein, as shown in formula (15), $W_k$ can be used to characterize the initial decoding parameters of the $k$-th feature embedding layer (i.e., the decoding parameters before a given round of parameter optimization); $\Delta W_k$ can be used to characterize the incremental decoding parameters of the $k$-th feature embedding layer (i.e., the difference between the optimized decoding parameters obtained by parameter optimization and the initial decoding parameters); $g_k$ can be used to characterize the output result of the gating network of the $k$-th feature embedding layer; $*$ denotes the convolution operation; $h_k$ can be used to characterize the layer output features of the $k$-th feature embedding layer (i.e., the input features of layer $k+1$).
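For ease of understanding, the gated layer output of formula (15) may be sketched as follows (a minimal PyTorch sketch; the 2-D convolution, padding and shapes are illustrative assumptions):

```python
import torch
import torch.nn.functional as F

def gated_layer_forward(h_prev: torch.Tensor, W_k: torch.Tensor,
                        delta_W_k: torch.Tensor, g_k: torch.Tensor) -> torch.Tensor:
    """Formula (15): h_k = (W_k + g_k * delta_W_k) * h_{k-1}; the gate output
    g_k (0 or 1) switches the incremental decoding parameters on or off."""
    effective_weight = W_k + g_k * delta_W_k
    return F.conv2d(h_prev, effective_weight, padding=1)
```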
Further, for easy understanding, please refer to fig. 6, fig. 6 is a schematic diagram of an architecture of data interaction between a server and a decoding end according to an embodiment of the present application. As shown in fig. 6, the architecture may include at least a hidden variable fine tuning component, an image decoder parameter adapting component, a bit stream transmission component, and an image reconstruction component. The individual components will be described below:
hidden variable fine tuning component: the hidden variable fine tuning component can be used to fine tune the implicit expression features obtained by the image encoder. As shown in fig. 6, the specific process of fine-tuning the implicit expression feature may include: the image encoder performs image encoding processing on an input original image to obtain an implicit expression feature, then the quantized implicit expression feature can be input to the image decoder, the image decoder can perform decoding reconstruction processing on the implicit expression feature, and therefore a reconstructed image (such as the first reconstructed image) can be output, and fine adjustment processing can be performed on the implicit expression feature based on an error loss value (such as the first error loss value) between the original image and the reconstructed image, so as to obtain a fine adjustment expression feature.
An image decoder adaptation component: the image decoder adapting component can optimize and adapt parameters of each feature embedding layer in the image decoder based on the fine-tuning expression features output by the hidden variable fine-tuning component, wherein when the parameters of each feature embedding layer are optimized and adapted, the feature embedding layer needing parameter optimization can be adaptively controlled through the output result of the gating network.
Bit stream transmission component: the bit stream transmission component may arithmetically encode the implicit expression features (or the fine-tuning expression features), the incremental decoding parameters of some feature embedding layers (which in practice may be a decomposition matrix or a fine-tuning decomposition matrix), and the output values of the gating network of each layer (i.e., the optimized control values) to obtain a bit stream, and then send the bit stream to the image reconstruction component. Before arithmetic coding is performed on the implicit expression features (or fine-tuning expression features) and the incremental decoding parameters, they are each quantized, and arithmetic coding processing is then performed on the quantized results. The arithmetically encoded content of the implicit expression features (or fine-tuning expression features) may be called a content stream; the arithmetically encoded content of the incremental decoding parameters and the optimized control values may be called a model stream. The content stream and the model stream are both bit streams.
An image reconstruction component: the image reconstruction component may perform arithmetic decoding processing on the bit stream (including the content stream and the model stream) to obtain the implicit expression features (or fine-tuning expression features), and the image decoder may perform decoding and reconstruction processing on the implicit expression features (or fine-tuning expression features) based on the optimized control values and the incremental decoding parameters of each feature embedding layer, finally obtaining the reconstructed image.
For specific implementation manners of the hidden variable fine tuning component, the image decoder parameter adapting component, the bit stream transmission component and the image reconstruction component, reference may be made to the descriptions in the foregoing embodiments, and details will not be repeated here. The beneficial effects brought by the method are not repeated.
Further, referring to fig. 7, fig. 7 is a schematic diagram of a system logic structure according to an embodiment of the present application. The system logic architecture may include at least an image encoder, an image decoder, and an optimization update component. The individual components will be described below:
an image encoder: The image encoder may be configured to perform image encoding processing on an input original image to obtain implicit expression features. The present application can take the implicit expression features output by the image encoder as a content stream; after arithmetic decoding, the quantized implicit expression features can be obtained, and they can be decoded and reconstructed to obtain a reconstructed image.
An image decoder: the image decoder may be configured to decode and reconstruct the quantized implicit expression features to obtain a reconstructed image. In the process of decoding and reconstructing the quantized implicit expression features by the image decoder, the image decoder can perform decoding and reconstructing processing based on decoding parameters, and in order to improve the quality of reconstructed images output by the image decoder, the application can adaptively train and optimize the decoding parameters of the image decoder based on an optimizing and updating component.
An optimization updating component: the optimization updating component can adaptively update the decoding parameters of the image decoder, adaptively control the parameter optimization position and the layer optimization number of the feature embedded layer in the image decoder. That is, the feature embedding layer for parameter optimization in the image decoder and the feature embedding layer without parameter optimization can be adaptively determined through the optimization updating component, and as for the feature embedding layer for parameter optimization, the optimal control value output by the gating network of the feature embedding layer is an effective value 1, and the feature embedding layer comprises a decomposition matrix corresponding to the incremental decoding parameter; and for the feature embedded layer which is not subjected to parameter optimization, the optimized control value output by the gating network is an invalid value 0, and the feature embedded layer does not contain the incremental decoding parameters. For the decomposition matrix of each feature embedded layer and the increment decoding parameter output by the optimization updating component (when the optimization control value is an invalid value, the decomposition matrix can be regarded as a null value), the decomposition matrix can be respectively quantized and subjected to arithmetic coding, and bit streams obtained by respectively quantizing and arithmetic coding the optimization control value and the decomposition matrix of each layer can be used as model streams together.
In summary, the parameter optimization method of the image decoder provided by this application additionally transmits, on top of the content stream, a model stream used for adaptively optimizing the decoding parameters of each feature embedding layer of the image decoder; with the content stream and the model stream together, the image decoder can output a reconstructed image of higher quality.
Further, referring to fig. 8, fig. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application. The data processing apparatus may be a computer program (including program code) running in a computer device; for example, the data processing apparatus is application software. The apparatus may be used to perform the method shown in fig. 3. As shown in fig. 8, the data processing apparatus 1 may include: a codec module 11, a feature acquisition module 12, a feature mapping module 13, and a parameter training module 14.
The encoding and decoding module 11 is configured to perform image encoding and decoding processing on the original image through the image encoder and the image decoder, so as to obtain a first reconstructed image corresponding to the original image;
a feature acquisition module 12, configured to acquire a feature to be processed, which is input to a target feature embedding layer of an image decoder during an image encoding and decoding process;
The feature mapping module 13 is configured to perform binary mapping processing on the feature to be processed through a gating network configured for the target feature embedding layer, so as to obtain an optimized control value corresponding to the target feature embedding layer;
and the parameter training module 14 is configured to, if it is determined that the optimized control value corresponding to the target feature embedding layer is an effective value, perform training optimization processing on the initial decoding parameters of the target feature embedding layer through a first error loss value between the original image and the first reconstructed image, so as to obtain the optimized decoding parameters of the target feature embedding layer.
The specific implementation manners of the codec module 11, the feature acquiring module 12, the feature mapping module 13, and the parameter training module 14 may be referred to the description of step S101 to step S103 in the embodiment corresponding to fig. 2, which will not be described herein.
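Before turning to the individual embodiments, a hedged sketch of how these four modules might interact in one optimization pass is given below; the MSE loss, the feature_for/binary_map helpers, and the optimizer choice are all assumptions introduced purely for illustration.

```python
# Hypothetical one-pass interplay of modules 11-14 (names assumed).
import torch.nn.functional as F

def training_step(encoder, decoder, original, target_layer, optimizer):
    latent = encoder(original)                  # implicit expression features
    recon = decoder(latent)                     # first reconstructed image
    first_loss = F.mse_loss(recon, original)    # first error loss value

    # Assumed helper: the feature that reached the target embedding layer.
    feat = decoder.feature_for(target_layer)
    gate_value = target_layer.binary_map(feat)  # optimized control value (0/1)

    if gate_value == 1:                         # effective value
        optimizer.zero_grad()
        first_loss.backward()
        optimizer.step()                        # optimize initial decoding params
    return gate_value, first_loss
```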
In one embodiment, the optimized control value obtained by the gating network is used to reflect decoding suitability, that is, the suitability between the decoding parameters of the target feature embedding layer and the feature to be processed input into the target feature embedding layer;
when the optimized control value is an effective value, it indicates that the decoding parameters of the target feature embedding layer do not possess suitability with the feature to be processed input into the target feature embedding layer;
when the optimized control value is an invalid value, it indicates that the decoding parameters of the target feature embedding layer possess suitability with the feature to be processed input into the target feature embedding layer.
In one embodiment, the specific manner in which the codec module 11 performs image encoding and decoding processing on the original image through the image encoder and the image decoder to obtain the first reconstructed image corresponding to the original image includes:
performing image encoding processing on the original image through the image encoder to obtain implicit expression features corresponding to the original image;
carrying out quantization processing on the implicit expression features to obtain first quantization features corresponding to the implicit expression features;
and decoding the first quantization features through the image decoder to obtain a first reconstructed image corresponding to the original image.
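The rounding step between the encoder and the decoder can be sketched as follows; the additive-noise proxy for rounding during training is common practice in learned image compression and is an assumption here, not something this document mandates.

```python
# Sketch of the encode -> quantize -> decode path (assumed interfaces).
import torch

def first_reconstruction(encoder, decoder, original, training=False):
    y = encoder(original)                       # implicit expression features
    if training:
        y_hat = y + (torch.rand_like(y) - 0.5)  # differentiable rounding proxy
    else:
        y_hat = torch.round(y)                  # first quantization features
    return decoder(y_hat)                       # first reconstructed image
```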
In one embodiment, after obtaining the optimized decoding parameters of the target feature embedding layer, the data processing apparatus 1 further includes: an incremental matrix determination module 15, a matrix decomposition module 16, a determination module 17, a matrix fine tuning module 18, and a transmission module 19.
An incremental matrix determining module 15 for determining an incremental parameter matrix between the optimized decoding parameters and the initial decoding parameters;
the matrix decomposition module 16 is configured to perform low-rank decomposition processing on the incremental parameter matrix to obtain a decomposition matrix corresponding to the incremental parameter matrix; the matrix dimension of the decomposition matrix is lower than the matrix dimension of the incremental parameter matrix;
A determining module 17, configured to determine a target feature embedding layer including the decomposition matrix and the initial decoding parameters as an optimized feature embedding layer, and determine an image decoder including the optimized feature embedding layer as an optimized image decoder;
the matrix fine adjustment module 18 is configured to obtain implicit expression features output by the image encoder in the image encoding and decoding process, and perform fine adjustment processing on the decomposition matrix through the implicit expression features and the optimized image decoder to obtain a fine adjustment decomposition matrix corresponding to the decomposition matrix;
and the sending module 19 is configured to send the implicit expression features, the fine-tuning decomposition matrix corresponding to the decomposition matrix, and the optimized control value corresponding to the target feature embedding layer to the decoding client, so that the decoding client decodes them to obtain a decoded image corresponding to the original image.
The specific implementation manners of the incremental matrix determining module 15, the matrix decomposing module 16, the determining module 17, the matrix fine tuning module 18, and the transmitting module 19 may be referred to the description of step S501-step S505 in the embodiment corresponding to fig. 5, and will not be described herein.
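One plausible realization of the low-rank decomposition processing is a truncated SVD of the incremental parameter matrix, sketched below; the rank r and the function name are assumptions of this sketch.

```python
# Truncated-SVD sketch of decomposing delta_W = W_opt - W_init.
import torch

def low_rank_decompose(w_opt: torch.Tensor, w_init: torch.Tensor, r: int = 4):
    delta = w_opt - w_init                      # incremental parameter matrix
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    b = u[:, :r] * s[:r]                        # (out_dim, r)
    a = vh[:r, :]                               # (r, in_dim)
    return b, a                                 # delta is approximately b @ a
```

For an out_dim × in_dim increment, transmitting b and a costs r·(out_dim + in_dim) values instead of out_dim·in_dim, which is where the reduction in additionally introduced parameters and bit stream overhead comes from.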
In one embodiment, the specific manner in which the matrix fine-tuning module 18 performs fine-tuning processing on the decomposition matrix through the implicit expression features and the optimized image decoder to obtain the fine-tuning decomposition matrix corresponding to the decomposition matrix includes:
performing fine-tuning processing on the implicit expression features through the first error loss value to obtain fine-tuning expression features;
performing quantization processing on the fine-tuning expression features to obtain second quantization features corresponding to the fine-tuning expression features;
decoding the second quantized features through an optimized image decoder to obtain a second reconstructed image corresponding to the original image;
and determining a second error loss value between the original image and the second reconstructed image, and performing fine tuning processing on the decomposition matrix through the second error loss value to obtain a fine tuning decomposition matrix corresponding to the decomposition matrix.
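The four fine-tuning steps above might look as follows; the single-gradient-step fine-tuning functions, the MSE loss, and the optimized_decoder(y_hat, decomp) signature are illustrative assumptions. The same single gradient step also stands in for the first and second fine-tuning functions described in the next two embodiments.

```python
# Hedged sketch of latent and decomposition-matrix fine-tuning.
import torch
import torch.nn.functional as F

def finetune_decomposition(latent, decomp, optimized_decoder, original,
                           first_grad, lr=1e-3):
    # First fine-tuning function: one gradient step on the implicit
    # expression features using the first error loss gradient.
    tuned_latent = latent - lr * first_grad
    y_hat = torch.round(tuned_latent)            # second quantization features
    decomp = decomp.clone().requires_grad_(True)
    recon = optimized_decoder(y_hat, decomp)     # second reconstructed image
    second_loss = F.mse_loss(recon, original)    # second error loss value
    # Second fine-tuning function: one gradient step on the decomposition matrix.
    second_grad = torch.autograd.grad(second_loss, decomp)[0]
    return tuned_latent, (decomp - lr * second_grad).detach()
```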
In one embodiment, the specific manner in which the matrix fine-tuning module 18 performs fine-tuning processing on the implicit expression features through the first error loss value to obtain the fine-tuning expression features includes:
performing gradient calculation processing on the first error loss value and the implicit expression features to obtain a first gradient value corresponding to the implicit expression features;
and carrying out fine-tuning processing on the implicit expression features through a first fine-tuning function corresponding to the implicit expression features and the first gradient value, to obtain the fine-tuning expression features.
In one embodiment, the specific manner in which the matrix fine-tuning module 18 performs fine-tuning processing on the decomposition matrix through the second error loss value to obtain the fine-tuning decomposition matrix corresponding to the decomposition matrix includes:
performing gradient calculation processing on the second error loss value and the decomposition matrix to obtain a second gradient value corresponding to the decomposition matrix;
and performing fine-tuning processing on the decomposition matrix through a second fine-tuning function corresponding to the decomposition matrix and the second gradient value, to obtain the fine-tuning decomposition matrix corresponding to the decomposition matrix.
In one embodiment, the specific manner of sending the implicit expression feature, the fine tuning decomposition matrix corresponding to the decomposition matrix, and the optimized control value corresponding to the target feature embedding layer to the decoding client by the sending module 19 includes:
acquiring the fine-tuning expression features obtained after fine-tuning processing is carried out on the implicit expression features through the first error loss value;
respectively carrying out quantization processing on the fine-tuning expression features and the fine-tuning decomposition matrix to obtain quantized fine-tuning features corresponding to the fine-tuning expression features and quantized fine-tuning matrices corresponding to the fine-tuning decomposition matrix;
respectively carrying out arithmetic coding processing on the quantized fine-tuning features, the quantized fine-tuning matrix and the optimized control value corresponding to the target feature embedding layer, to obtain a first bit stream corresponding to the quantized fine-tuning features, a second bit stream corresponding to the quantized fine-tuning matrix and a third bit stream corresponding to the optimized control value;
The first, second and third bitstreams are sent to a decoding client.
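A sketch of the three-bitstream packing is shown below. The arithmetic_encode helper is a named stand-in for a real arithmetic coder; only the interface shape is illustrated, not an actual entropy coding algorithm.

```python
# Sketch of packing the three bit streams (interfaces assumed).
import numpy as np

def arithmetic_encode(symbols: np.ndarray) -> bytes:
    return symbols.astype(np.int32).tobytes()    # placeholder, not real coding

def pack_streams(tuned_latent, tuned_decomp, control_value):
    q_latent = np.round(tuned_latent)            # quantized fine-tuning features
    q_decomp = np.round(tuned_decomp)            # quantized fine-tuning matrix
    first = arithmetic_encode(q_latent)          # first bit stream (content)
    second = arithmetic_encode(q_decomp)         # second bit stream (model)
    third = arithmetic_encode(np.array([control_value]))  # third bit stream
    return first, second, third
```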
In one embodiment, the specific manner in which the feature acquisition module 12 acquires the feature to be processed that is input to the target feature embedding layer of the image decoder during the image encoding and decoding process includes:
acquiring a feature embedding network used for performing feature embedding processing in an image decoder; the feature embedding network consists of a feature embedding layer sequence, wherein the feature embedding layer sequence comprises a target feature embedding layer;
when the target feature embedding layer is located at the sequence starting position of the feature embedding layer sequence, carrying out quantization processing on the implicit expression features output by the image encoder in the image encoding and decoding process to obtain first quantization features, and determining the first quantization features as the feature to be processed of the target feature embedding layer in the image encoding and decoding process;
and when the target feature embedding layer is located at a non-starting position of the feature embedding layer sequence, determining the layer output feature, in the image encoding and decoding process, of the previous feature embedding layer of the target feature embedding layer in the feature embedding layer sequence as the feature to be processed of the target feature embedding layer in the image encoding and decoding process.
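Expressed as code, this position-dependent choice of the feature to be processed might look like the following minimal sketch (zero-based indexing is an assumption):

```python
# Sketch of selecting the feature to be processed by layer position.
import torch

def feature_to_process(layer_index, latent, layer_outputs):
    if layer_index == 0:                  # sequence starting position
        return torch.round(latent)        # first quantization features
    # Non-starting position: output of the previous feature embedding layer.
    return layer_outputs[layer_index - 1]
```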
In one embodiment, after acquiring the implicit expression features output by the image encoder during the image codec process, the data processing apparatus 1 further includes: a network parameter optimization module 20.
The network parameter optimization module 20 is configured to perform optimization processing on the network parameters of the gating network through the implicit expression features and the optimized image decoder to obtain optimized network parameters. The gating network containing the optimized network parameters is used for, after acquiring an updated feature to be processed of the target feature embedding layer in a new round of image encoding and decoding processing, performing binary mapping processing on the updated feature to be processed to obtain an updated optimized control value corresponding to the target feature embedding layer.
For a specific implementation of the network parameter optimization module 20, reference may be made to the description in step S504 in the embodiment corresponding to fig. 5, and a detailed description will not be given here.
In one embodiment, the specific manner in which the network parameter optimization module 20 performs optimization processing on the network parameters of the gating network through the implicit expression features and the optimized image decoder to obtain the optimized network parameters includes:
acquiring the fine-tuning expression features obtained after fine-tuning processing is carried out on the implicit expression features through the first error loss value;
Performing quantization processing on the fine-tuning expression features to obtain second quantization features corresponding to the fine-tuning expression features;
decoding the second quantized features through an optimized image decoder to obtain a second reconstructed image corresponding to the original image;
and determining a second error loss value between the original image and the second reconstructed image, and performing fine adjustment processing on the network parameters of the gating network through the second error loss value to obtain optimized network parameters corresponding to the network parameters of the gating network.
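Since a hard binary mapping blocks gradients, one hedged way to realize the network-parameter optimization above is to relax the gate to a sigmoid during this update; the relaxation, the loss, and the decode_fn interface are assumptions of this sketch.

```python
# Hedged sketch of updating the gating network with the second error loss.
import torch
import torch.nn.functional as F

def update_gate_parameters(gate_net, feat, decode_fn, original, lr=1e-4):
    soft_gate = torch.sigmoid(gate_net(feat))   # relaxed optimized control value
    recon = decode_fn(soft_gate)                # second reconstructed image
    second_loss = F.mse_loss(recon, original)   # second error loss value
    grads = torch.autograd.grad(second_loss, tuple(gate_net.parameters()))
    with torch.no_grad():                       # one gradient (fine-tuning) step
        for p, g in zip(gate_net.parameters(), grads):
            p -= lr * g                         # optimized network parameters
    return second_loss.item()
```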
In the embodiments of the present application, when parameter adaptation is performed for the image decoder, the parameters are updated incrementally under a low-rank constraint, which reduces the additionally introduced parameter quantity and bit stream overhead; whether a given layer undergoes parameter updating is controlled by a dynamic gating network, so that the number and positions of the updated layers can be adaptively adjusted according to the input features, and the bit stream overhead of each layer can be adaptively determined by the gating network, thereby obtaining a significant rate-distortion performance improvement; in addition, fine-tuning the various parameters further improves rate-distortion performance, thereby improving image compression performance.
Further, referring to fig. 9, fig. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 9, the computer device 8000 may include: a processor 8001, a network interface 8004, a memory 8005, a user interface 8003, and at least one communication bus 8002. The communication bus 8002 is used to enable connection communication between these components. The user interface 8003 may include a display screen (Display) and a keyboard (Keyboard), and optionally may also include standard wired and wireless interfaces. The network interface 8004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 8005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 8005 may optionally also be at least one storage device located remotely from the aforementioned processor 8001. As shown in fig. 9, the memory 8005, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 8000 shown in fig. 9, the network interface 8004 may provide a network communication function, the user interface 8003 is mainly used to provide an interface for user input, and the processor 8001 may be used to invoke the device control application program stored in the memory 8005 to implement:
performing image coding and decoding processing on the original image through an image encoder and an image decoder to obtain a first reconstructed image corresponding to the original image;
acquiring the feature to be processed, which is input to a target feature embedding layer of an image decoder, in the image encoding and decoding processing process, and performing binary mapping processing on the feature to be processed through a gating network configured for the target feature embedding layer to obtain an optimized control value corresponding to the target feature embedding layer;
if it is determined that the optimized control value corresponding to the target feature embedding layer is an effective value, performing training optimization processing on the initial decoding parameters of the target feature embedding layer through a first error loss value between the original image and the first reconstructed image, to obtain the optimized decoding parameters of the target feature embedding layer.
It should be understood that the computer device 8000 according to the embodiment of the present application may perform the description of the data processing method according to the embodiment of fig. 2 to 5, and may also perform the description of the data processing apparatus 1 according to the embodiment of fig. 8, which is not repeated herein. In addition, the description of the beneficial effects of the same method is omitted.
Furthermore, it should be noted here that: the embodiment of the present application further provides a computer-readable storage medium, in which the computer program executed by the aforementioned computer device 8000 for data processing is stored; the computer program includes program instructions, and when the processor executes the program instructions, the data processing method in the embodiments corresponding to fig. 2 to fig. 5 can be performed, which will therefore not be repeated here. The description of the beneficial effects of the same method is likewise omitted. For technical details not disclosed in the embodiments of the computer-readable storage medium of the present application, please refer to the description of the method embodiments of the present application.
The computer readable storage medium may be the data processing apparatus provided in any one of the foregoing embodiments or an internal storage unit of the computer device, for example, a hard disk or a memory of the computer device. The computer readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card (flash card) or the like, which are provided on the computer device. Further, the computer-readable storage medium may also include both internal storage units and external storage devices of the computer device. The computer-readable storage medium is used to store the computer program and other programs and data required by the computer device. The computer-readable storage medium may also be used to temporarily store data that has been output or is to be output.
In one aspect of the present application, a computer program product is provided, the computer program product comprising a computer program stored in a computer readable storage medium. A processor of a computer device reads the computer program from a computer-readable storage medium, and the processor executes the computer program to cause the computer device to perform a method provided in an aspect of an embodiment of the present application.
The terms "first", "second" and the like in the description, claims and drawings of the embodiments of the present application are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the term "include" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, article, or device that comprises a list of steps or elements is not limited to the listed steps or modules, but may further include steps or modules not listed, or other steps or modules inherent to such process, method, apparatus, article, or device.
Those of ordinary skill in the art will appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein may be implemented in electronic hardware, in computer software, or in a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of function. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementations should not be considered to go beyond the scope of the present application.
The method and related apparatus provided in the embodiments of the present application are described with reference to the method flowcharts and/or schematic structural diagrams provided in the embodiments of the present application. Each flow and/or block of the method flowcharts and/or schematic structural diagrams, and combinations of flows and/or blocks therein, may be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the schematic structural diagrams. These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the schematic structural diagrams. These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the schematic structural diagrams.
The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (14)

1. A method of data processing, comprising:
performing image coding and decoding processing on an original image through an image encoder and an image decoder to obtain a first reconstructed image corresponding to the original image;
acquiring a feature to be processed, which is input to a target feature embedding layer of the image decoder, in the image encoding and decoding processing process, and performing binary mapping processing on the feature to be processed through a gating network configured for the target feature embedding layer to obtain an optimized control value corresponding to the target feature embedding layer;
and if the optimized control value corresponding to the target feature embedding layer is determined to be an effective value, training and optimizing the initial decoding parameters of the target feature embedding layer through a first error loss value between the original image and the first reconstructed image to obtain optimized decoding parameters of the target feature embedding layer.
2. The method according to claim 1, wherein the optimized control value obtained by the gating network is used to reflect decoding suitability, which is the suitability between a decoding parameter of the target feature embedding layer and a feature to be processed input to the target feature embedding layer;
when the optimized control value is an effective value, it indicates that the decoding parameters of the target feature embedding layer do not possess suitability with the feature to be processed input to the target feature embedding layer;
and when the optimized control value is an invalid value, it indicates that the decoding parameters of the target feature embedding layer possess suitability with the feature to be processed input to the target feature embedding layer.
3. The method according to claim 1, wherein the performing, by the image encoder and the image decoder, the image encoding and decoding process on the original image to obtain a first reconstructed image corresponding to the original image, includes:
performing image coding processing on the original image through the image coder to obtain implicit expression characteristics corresponding to the original image;
carrying out quantization processing on the implicit expression features to obtain first quantization features corresponding to the implicit expression features;
and decoding the first quantized features through the image decoder to obtain a first reconstructed image corresponding to the original image.
4. The method of claim 1, wherein after obtaining the optimized decoding parameters for the target feature embedding layer, the method further comprises:
Determining an incremental parameter matrix between the optimized decoding parameters and the initial decoding parameters;
performing low-rank decomposition treatment on the increment parameter matrix to obtain a decomposition matrix corresponding to the increment parameter matrix; the matrix dimension of the decomposition matrix is lower than the matrix dimension of the increment parameter matrix;
determining a target feature embedding layer containing the decomposition matrix and the initial decoding parameters as an optimized feature embedding layer, and determining an image decoder containing the optimized feature embedding layer as an optimized image decoder;
acquiring implicit expression characteristics output by the image encoder in the image encoding and decoding processing process, and performing fine adjustment processing on the decomposition matrix through the implicit expression characteristics and the optimized image decoder to obtain a fine adjustment decomposition matrix corresponding to the decomposition matrix;
and sending the implicit expression features, the fine tuning decomposition matrix corresponding to the decomposition matrix and the optimized control value corresponding to the target feature embedding layer to a decoding client so that the decoding client decodes the implicit expression features, the fine tuning decomposition matrix corresponding to the decomposition matrix and the optimized control value corresponding to the target feature embedding layer to obtain a decoded image corresponding to the original image.
5. The method according to claim 4, wherein the performing fine adjustment processing on the decomposition matrix by using the implicit expression features and the optimized image decoder to obtain a fine adjustment decomposition matrix corresponding to the decomposition matrix, includes:
performing fine adjustment processing on the implicit expression characteristic through the first error loss value to obtain a fine adjustment expression characteristic;
performing quantization processing on the fine-tuning expression features to obtain second quantization features corresponding to the fine-tuning expression features;
decoding the second quantized features through the optimized image decoder to obtain a second reconstructed image corresponding to the original image;
and determining a second error loss value between the original image and the second reconstructed image, and performing fine adjustment processing on the decomposition matrix through the second error loss value to obtain a fine adjustment decomposition matrix corresponding to the decomposition matrix.
6. The method of claim 5, wherein said performing fine tuning of said implicit expression signature via said first error loss value results in a fine tuned expression signature, comprising:
performing gradient calculation processing on the first error loss value and the implicit expression characteristic to obtain a first gradient value corresponding to the implicit expression characteristic;
And carrying out fine adjustment processing on the implicit expression features through a first fine adjustment function corresponding to the implicit expression features and the first gradient value to obtain fine adjustment expression features.
7. The method according to claim 5, wherein the performing fine adjustment processing on the decomposition matrix by using the second error loss value to obtain a fine adjustment decomposition matrix corresponding to the decomposition matrix includes:
performing gradient calculation processing on the second error loss value and the decomposition matrix to obtain a second gradient value corresponding to the decomposition matrix;
and performing fine tuning treatment on the decomposition matrix through a second fine tuning function corresponding to the decomposition matrix and the second gradient value to obtain a fine tuning decomposition matrix corresponding to the decomposition matrix.
8. The method of claim 4, wherein the sending the implicit expression features, the fine-tuning decomposition matrix corresponding to the decomposition matrix, and the optimized control value corresponding to the target feature embedding layer to a decoding client comprises:
acquiring a fine adjustment expression characteristic obtained after fine adjustment processing is carried out on the implicit expression characteristic through the first error loss value;
respectively carrying out quantization processing on the fine-tuning expression features and the fine-tuning decomposition matrix to obtain quantized fine-tuning features corresponding to the fine-tuning expression features and quantized fine-tuning matrices corresponding to the fine-tuning decomposition matrix;
Respectively carrying out arithmetic coding treatment on the quantized fine tuning feature, the quantized fine tuning matrix and the optimized control value corresponding to the target feature embedding layer to obtain a first bit stream corresponding to the quantized fine tuning feature, a second bit stream corresponding to the quantized fine tuning matrix and a third bit stream corresponding to the optimized control value;
and sending the first bit stream, the second bit stream and the third bit stream to a decoding client.
9. The method according to claim 1, wherein the acquiring the feature to be processed of the target feature embedding layer input to the image decoder during the image codec process includes:
acquiring a feature embedding network used for performing feature embedding processing in the image decoder; the feature embedding network is composed of a feature embedding layer sequence, and the feature embedding layer sequence comprises the target feature embedding layer;
when the target feature embedding layer is positioned at the sequence starting position of the feature embedding layer sequence, carrying out quantization processing on implicit expression features output by the image encoder in the image encoding and decoding processing process to obtain first quantization features, and determining the first quantization features as features to be processed of the target feature embedding layer in the image encoding and decoding processing process;
and when the target feature embedding layer is located at a non-starting position of the feature embedding layer sequence, determining the layer output feature, in the image encoding and decoding processing process, of the previous feature embedding layer of the target feature embedding layer in the feature embedding layer sequence as the feature to be processed of the target feature embedding layer in the image encoding and decoding processing process.
10. The method of claim 4, wherein after obtaining the implicit expression features output by the image encoder during the image codec process, the method further comprises:
optimizing the network parameters of the gating network through the implicit expression characteristics and the optimized image decoder to obtain optimized network parameters; and the gating network comprising the optimized network parameters is used for carrying out binary mapping processing on the updated characteristics to be processed after acquiring the updated characteristics to be processed of the target characteristic embedding layer in a new round of image encoding and decoding processing process, so as to obtain updated optimized control values corresponding to the target characteristic embedding layer.
11. The method according to claim 10, wherein said optimizing network parameters of the gating network by the implicit expression feature and the optimized image decoder to obtain optimized network parameters comprises:
Acquiring a fine adjustment expression characteristic obtained after fine adjustment processing is carried out on the implicit expression characteristic through the first error loss value;
performing quantization processing on the fine-tuning expression features to obtain second quantization features corresponding to the fine-tuning expression features;
decoding the second quantized features through the optimized image decoder to obtain a second reconstructed image corresponding to the original image;
and determining a second error loss value between the original image and the second reconstructed image, and performing fine adjustment processing on the network parameters of the gating network through the second error loss value to obtain optimized network parameters corresponding to the network parameters of the gating network.
12. A data processing apparatus, comprising:
the encoding and decoding module is used for carrying out image encoding and decoding processing on the original image through the image encoder and the image decoder to obtain a first reconstructed image corresponding to the original image;
the feature acquisition module is used for acquiring the feature to be processed of the target feature embedding layer input to the image decoder in the image encoding and decoding processing process;
the feature mapping module is used for performing binary mapping processing on the feature to be processed through a gating network configured for the target feature embedding layer to obtain an optimized control value corresponding to the target feature embedding layer;
And the parameter training module is used for training and optimizing the initial decoding parameters of the target feature embedding layer through a first error loss value between the original image and the first reconstructed image if the optimized control value corresponding to the target feature embedding layer is determined to be an effective value, so as to obtain the optimized decoding parameters of the target feature embedding layer.
13. A computer device, comprising: a processor, a memory, and a network interface;
the processor is connected to the memory and the network interface, wherein the network interface is configured to provide a network communication function, the memory is configured to store a computer program, and the processor is configured to invoke the computer program to cause the computer device to perform the method of any of claims 1-11.
14. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program adapted to be loaded by a processor and to perform the method of any of claims 1-11.
CN202310885764.2A 2023-07-19 2023-07-19 Data processing method, device, equipment and readable storage medium Active CN116614637B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310885764.2A CN116614637B (en) 2023-07-19 2023-07-19 Data processing method, device, equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310885764.2A CN116614637B (en) 2023-07-19 2023-07-19 Data processing method, device, equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN116614637A 2023-08-18
CN116614637B 2023-09-12

Family

ID=87676850

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310885764.2A Active CN116614637B (en) 2023-07-19 2023-07-19 Data processing method, device, equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN116614637B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117421199B (en) * 2023-12-19 2024-04-02 湖南三湘银行股份有限公司 Behavior determination method and system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109996071A (en) * 2019-03-27 2019-07-09 上海交通大学 Variable bit rate image coding, decoding system and method based on deep learning
CN110769263A (en) * 2019-11-01 2020-02-07 合肥图鸭信息科技有限公司 Image compression method and device and terminal equipment
CN111667006A (en) * 2020-06-06 2020-09-15 大连民族大学 Method for generating family font based on AttGan model
CN112991192A (en) * 2019-12-18 2021-06-18 杭州海康威视数字技术股份有限公司 Image processing method, device, equipment and system thereof
US11153566B1 (en) * 2020-05-23 2021-10-19 Tsinghua University Variable bit rate generative compression method based on adversarial learning
WO2022235785A1 (en) * 2021-05-04 2022-11-10 Innopeak Technology, Inc. Neural network architecture for image restoration in under-display cameras
WO2023031632A1 (en) * 2021-09-06 2023-03-09 Imperial College Innovations Ltd Encoder, decoder and communication system and method for conveying sequences of correlated data items from an information source across a communication channel using joint source and channel coding, and method of training an encoder neural network and decoder neural network for use in a communication system
CN116233445A (en) * 2023-05-10 2023-06-06 腾讯科技(深圳)有限公司 Video encoding and decoding processing method and device, computer equipment and storage medium
CN116320435A (en) * 2023-03-20 2023-06-23 北京计算机技术及应用研究所 Visual analysis-oriented image compression method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10999606B2 (en) * 2019-01-08 2021-05-04 Intel Corporation Method and system of neural network loop filtering for video coding
CN113132723B (en) * 2019-12-31 2023-11-14 武汉Tcl集团工业研究院有限公司 Image compression method and device
US11849118B2 (en) * 2021-04-30 2023-12-19 Tencent America LLC Content-adaptive online training with image substitution in neural image compression

Also Published As

Publication number Publication date
CN116614637A (en) 2023-08-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
REG Reference to a national code (ref country code: HK; ref legal event code: DE; ref document number: 40091033)