WO2021155832A1 - Image processing method and related device - Google Patents

Image processing method and related device

Info

Publication number
WO2021155832A1
Authority
WO
WIPO (PCT)
Prior art keywords: target, values, compression code, code rate, gain
Application number
PCT/CN2021/075405
Other languages
English (en)
French (fr)
Inventor
王晶
崔泽
白博
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to AU2021215764A (AU2021215764A1)
Priority to KR1020227030515A (KR20220137076A)
Priority to CA3167227A (CA3167227A1)
Priority to EP21751079.1A (EP4090022A4)
Priority to JP2022548020A (JP2023512570A)
Priority to BR112022015510A (BR112022015510A2)
Priority to CN202180013213.6A (CN115088257A)
Priority to MX2022009686A
Publication of WO2021155832A1
Priority to US17/881,432 (US20220375133A1)

Classifications

    • G06T 9/00 Image coding
    • G06T 9/002 Image coding using neural networks
    • G06T 3/4007 Scaling of whole images or parts thereof based on interpolation, e.g. bilinear interpolation
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; encoder-decoder networks
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06V 10/48 Extraction of image or video features by mapping characteristic values of the pattern into a parameter space, e.g. Hough transformation
    • H04N 19/124 Quantisation (adaptive coding of digital video signals)
    • H04N 19/146 Data rate or code amount at the encoder output
    • H04N 19/172 Adaptive coding characterised by the coding unit, the unit being an image region, e.g. a picture, frame or field
    • H04N 19/42 Implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N 19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Definitions

  • This application relates to the field of artificial intelligence, and in particular to an image processing method and related equipment.
  • Ballé et al. proposed an end-to-end optimized image coding method, which has achieved the best image coding performance and even surpassed the current best traditional coding standard BPG.
  • Most current image coding based on deep convolutional networks has a defect: a trained model can only output one coding result for a given input image and cannot achieve a coding effect at a target compression rate chosen according to actual needs.
  • This application provides an image processing method for realizing compression code rate control in the same compression model.
  • the present application provides an image processing method, the method including:
  • the target compression code rate corresponds to M target gain values, each target gain value corresponds to a first feature value, and M is a positive integer less than or equal to N;
  • the corresponding first feature values are processed according to the M target gain values to obtain M second feature values;
  • the processed at least one first feature map may be used to replace the original at least one first feature map;
  • the processed at least one first feature map is quantized and entropy encoded to obtain encoded data, and the processed at least one first feature map includes the M second feature values.
  • the information entropy of the quantized data obtained by quantizing the processed at least one first feature map satisfies a preset condition, and the preset condition is related to the target compression code rate.
  • the greater the target compression code rate, the greater the information entropy of the quantized data.
  • the difference between the compression code rate corresponding to the encoded data and the target compression code rate is within a preset range.
  • the M second feature values are obtained by multiplying the M target gain values by the corresponding first feature values, respectively.
  • the at least one first feature map includes a first target feature map, the first target feature map includes P first feature values, the target gain value corresponding to each of the P first feature values is the same, and P is a positive integer less than or equal to M.
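A minimal sketch of the gain step described above, assuming the first feature maps are stored as a channel x height x width array and that all first feature values in a channel share one target gain value; the shapes, values, and use of NumPy are illustrative, not prescribed by the application.

```python
import numpy as np

def apply_gain(feature_maps: np.ndarray, gain_vector: np.ndarray) -> np.ndarray:
    """Multiply the first feature values by their target gain values.

    feature_maps: array of shape (C, H, W) holding the first feature maps.
    gain_vector:  array of shape (C,); every first feature value in channel c
                  is scaled by the same target gain value gain_vector[c].
    Returns the processed feature maps containing the second feature values.
    """
    return feature_maps * gain_vector[:, None, None]

# Toy usage: 4 channels of 8x8 features, one gain value per channel.
features = np.random.randn(4, 8, 8).astype(np.float32)
gains = np.array([1.6, 1.2, 0.9, 0.5], dtype=np.float32)
second_values = apply_gain(features, gains)
```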
  • the method further includes: determining the M target gain values corresponding to the target compression code rate according to a target mapping relationship, where the target mapping relationship is used to indicate the association between the compression code rate and the gain vector;
  • the target mapping relationship includes multiple compression code rates and multiple gain vectors, and an association relationship between the multiple compression code rates and the multiple gain vectors; the target compression code rate is one of the multiple compression code rates, and the M target gain values are elements of one of the multiple gain vectors; or,
  • the target mapping relationship includes an objective function mapping relationship, and when the input of the objective function relationship includes the target compression code rate, the output of the objective function relationship includes the M target gain values.
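The target mapping relationship described above can take either a table form (compression code rates associated with gain vectors) or a function form. The sketch below shows both; the rates, gain values, and the names RATE_TO_GAIN, gains_for_rate, and gain_function are made up for illustration.

```python
import numpy as np

# Table form: each candidate compression code rate (e.g. bits per pixel,
# illustrative values) is associated with one gain vector.
RATE_TO_GAIN = {
    0.25: np.array([0.5, 0.4, 0.3, 0.2]),
    0.50: np.array([1.0, 0.8, 0.6, 0.4]),
    1.00: np.array([2.0, 1.6, 1.2, 0.8]),
}

def gains_for_rate(target_rate: float) -> np.ndarray:
    """Look up the M target gain values for a target compression code rate."""
    return RATE_TO_GAIN[target_rate]

# Function form: the mapping may instead be a function whose input includes the
# target compression code rate and whose output includes the M target gain values.
def gain_function(target_rate: float,
                  base: np.ndarray = np.array([2.0, 1.6, 1.2, 0.8])) -> np.ndarray:
    return base * target_rate  # purely illustrative functional relationship
```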
  • the target compression code rate is greater than a first compression code rate and less than a second compression code rate, the first compression code rate corresponds to M first gain values, the second compression code rate corresponds to M second gain values, and the M target gain values are obtained by performing interpolation operations on the M first gain values and the M second gain values.
  • the M first gain values include a first target gain value, the M second gain values include a second target gain value, and the M target gain values include a third target gain value; the first target gain value, the second target gain value, and the third target gain value correspond to the same feature value among the M first feature values, and the third target gain value is obtained by performing an interpolation operation on the first target gain value and the second target gain value.
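For a target compression code rate that lies between two rates with known gain vectors, the target gain values are obtained by interpolating element-wise between the corresponding first and second gain values. The linear interpolation below is only one possible interpolation operation; the application does not fix the formula.

```python
import numpy as np

def interpolate_gains(rate_lo, gains_lo, rate_hi, gains_hi, target_rate):
    """Element-wise interpolation between two gain vectors.

    rate_lo < target_rate < rate_hi; gains_lo and gains_hi are the M first and
    M second gain values associated with those two compression code rates.
    Linear interpolation is used here purely as an example.
    """
    t = (target_rate - rate_lo) / (rate_hi - rate_lo)
    return (1.0 - t) * np.asarray(gains_lo) + t * np.asarray(gains_hi)

# The third target gain value is interpolated from the first and second target
# gain values that correspond to the same feature value.
target_gains = interpolate_gains(0.25, [0.5, 0.4, 0.3], 1.0, [2.0, 1.6, 1.2], 0.5)
```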
  • the first image includes a target object
  • the M first feature values are feature values corresponding to the target object in the at least one feature map.
  • each of the M target gain values corresponds to an inverse gain value; the inverse gain value is used to process the feature values obtained in the process of decoding the encoded data, and the product of each of the M target gain values and the corresponding inverse gain value is within a preset range.
  • the method further includes: performing entropy decoding on the encoded data to obtain at least one second feature map, where the at least one second feature map includes N third feature values and each third feature value corresponds to a first feature value; obtaining M target inverse gain values, where each target inverse gain value corresponds to a third feature value; processing the corresponding third feature values according to the M target inverse gain values to obtain M fourth feature values; and performing image reconstruction on the at least one second feature map after inverse gain processing to obtain a second image, where the at least one second feature map after inverse gain processing includes the M fourth feature values.
  • the M fourth feature values are obtained by multiplying the M target inverse gain values by the corresponding third feature values, respectively.
  • the at least one second feature map includes a second target feature map, the second target feature map includes P third feature values, the target inverse gain value corresponding to each of the P third feature values is the same, and P is a positive integer less than or equal to M.
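A matching sketch of the decoding side: the third feature values obtained by entropy decoding are scaled by the target inverse gain values to produce the fourth feature values that are fed to image reconstruction. Entropy decoding and the decoding network are stubbed out; shapes and values are illustrative.

```python
import numpy as np

def apply_inverse_gain(decoded_maps: np.ndarray, inverse_gains: np.ndarray) -> np.ndarray:
    """Scale the third feature values by their target inverse gain values,
    yielding the fourth feature values used for image reconstruction.

    decoded_maps:  (C, H, W) second feature maps produced by entropy decoding.
    inverse_gains: (C,) one target inverse gain value per channel.
    """
    return decoded_maps * inverse_gains[:, None, None]

# Toy decode-side flow (entropy decoding and the decoding network are stubbed):
decoded = np.random.randn(4, 8, 8).astype(np.float32)          # third feature values
inv_gains = 1.0 / np.array([1.6, 1.2, 0.9, 0.5], dtype=np.float32)
fourth_values = apply_inverse_gain(decoded, inv_gains)
# fourth_values would then be fed to the decoding network to obtain the second image.
```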
  • the method further includes: determining M target inverse gain values corresponding to the target compression code rate according to a target mapping relationship, where the target mapping relationship is used to indicate the association between the compression code rate and the inverse gain vector.
  • the target mapping relationship includes multiple compression code rates and multiple inverse gain vectors, and an association relationship between the multiple compression code rates and the multiple inverse gain vectors; the target compression code rate is one of the multiple compression code rates, and the M target inverse gain values are elements of one of the multiple inverse gain vectors.
  • the target mapping relationship includes an objective function mapping relationship, and when the input of the objective function relationship includes the target compression code rate, the output of the objective function relationship includes the M target inverse gain values.
  • the second image includes a target object
  • the M third feature values are feature values corresponding to the target object in the at least one feature map.
  • the product of each of the M target gain values and the corresponding target inverse gain value is within a preset range.
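The constraint that each target gain value and its corresponding target inverse gain value multiply to a value within a preset range can be checked directly; the range around 1.0 used below is an assumed example, not a value stated in the application.

```python
import numpy as np

def gains_consistent(gains, inverse_gains, target=1.0, tol=0.05):
    """Check that each gain/inverse-gain product stays inside a preset range.

    The preset range used here, [target - tol, target + tol] around 1.0, is
    only an example; the application merely requires the product to lie within
    some preset range.
    """
    products = np.asarray(gains) * np.asarray(inverse_gains)
    return bool(np.all(np.abs(products - target) <= tol))

assert gains_consistent([1.6, 1.2, 0.9], [0.625, 0.8333, 1.1111])
```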
  • the target compression code rate is greater than a first compression code rate and less than a second compression code rate, the first compression code rate corresponds to M first inverse gain values, the second compression code rate corresponds to M second inverse gain values, and the M target inverse gain values are obtained by interpolating the M first inverse gain values and the M second inverse gain values.
  • the M first inverse gain values include a first target inverse gain value, the M second inverse gain values include a second target inverse gain value, and the M target inverse gain values include a third target inverse gain value; the first target inverse gain value, the second target inverse gain value, and the third target inverse gain value correspond to the same feature value among the M first feature values, and the third target inverse gain value is obtained by performing an interpolation operation on the first target inverse gain value and the second target inverse gain value.
  • this application provides an image processing method, the method including:
  • processing the corresponding third feature values according to the M target inverse gain values to obtain M fourth feature values; and performing image reconstruction on the processed at least one second feature map to obtain a second image, where the processed at least one second feature map includes the M fourth feature values.
  • the M fourth feature values are obtained by multiplying the M target inverse gain values by the corresponding third feature values, respectively.
  • the at least one second feature map includes a second target feature map, the second target feature map includes P third feature values, the target inverse gain value corresponding to each of the P third feature values is the same, and P is a positive integer less than or equal to M.
  • the method further includes: obtaining a target compression code rate; and determining M target inverse gain values corresponding to the target compression code rate according to a target mapping relationship, where the target mapping relationship is used to represent the association between the compression code rate and the inverse gain vector; wherein the target mapping relationship includes multiple compression code rates and multiple inverse gain vectors, and an association relationship between the multiple compression code rates and the multiple inverse gain vectors, the target compression code rate is one of the multiple compression code rates, and the M target inverse gain values are elements of one of the multiple inverse gain vectors; or, the target mapping relationship includes an objective function mapping relationship, and when the input of the objective function relationship includes the target compression code rate, the output of the objective function relationship includes the M target inverse gain values.
  • the second image includes a target object
  • the M third feature values are feature values corresponding to the target object in the at least one feature map.
  • the target compression code rate is greater than a first compression code rate and less than a second compression code rate, the first compression code rate corresponds to M first inverse gain values, the second compression code rate corresponds to M second inverse gain values, and the M target inverse gain values are obtained by interpolating the M first inverse gain values and the M second inverse gain values.
  • the M first inverse gain values include a first target inverse gain value, the M second inverse gain values include a second target inverse gain value, and the M target inverse gain values include a third target inverse gain value; the first target inverse gain value, the second target inverse gain value, and the third target inverse gain value correspond to the same feature value, and the third target inverse gain value is obtained by performing an interpolation operation on the first target inverse gain value and the second target inverse gain value.
  • this application provides an image processing method, the method including:
  • the target compression code rate corresponds to M initial gain values and M initial inverse gain values, each initial gain value corresponds to a first feature value, each initial inverse gain value corresponds to a first feature value, and M is a positive integer less than or equal to N;
  • the at least one first feature map after gain processing includes the M second feature values;
  • the image distortion value is related to the bit rate loss and the distortion loss
  • the codec network includes the coding network, the quantization network, the entropy coding network, and the entropy decoding network;
  • the second codec network is a model obtained after iterative training of the first codec network.
  • the M target gain values and the M target inverse gain values are obtained after the M initial gain values and the M initial inverse gain values are iteratively trained.
  • the information entropy of the quantized data obtained by quantizing the at least one first feature map after gain processing satisfies a preset condition, and the preset condition is related to the target compression code rate.
  • the preset condition at least includes: the larger the target compression code rate, the larger the information entropy of the quantized data.
  • the M second eigenvalues are obtained by multiplying the M target gain values with the corresponding first eigenvalues.
  • the at least one first feature map includes a first target feature map, the first target feature map includes P first feature values, the target gain value corresponding to each of the P first feature values is the same, and P is a positive integer less than or equal to M.
  • the first image includes a target object
  • the M first feature values are feature values corresponding to the target object in the at least one feature map.
  • the product of each of the M target gain values and the corresponding target inverse gain value is within a preset range, and the product of each of the M initial gain values and the corresponding initial inverse gain value is within a preset range.
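The third aspect above trains the codec network together with the initial gain and inverse gain values until an image distortion value related to the bit rate loss and the distortion loss reaches a preset degree. The application does not fix how the two losses are combined; the weighted sum below (a common rate-distortion objective) and all names and numeric values are illustrative assumptions.

```python
import numpy as np

def rate_distortion_loss(bitrate_loss: float, distortion_loss: float,
                         trade_off: float) -> float:
    """One common way to combine the bit rate loss and the distortion loss.

    The application only states that the image distortion value is related to
    both losses; the weighted sum below, and the trade-off weight (which could
    be tied to the target compression code rate), is an assumed instantiation.
    """
    return bitrate_loss + trade_off * distortion_loss

# Illustrative values: a bits-per-pixel estimate standing in for the bit rate
# loss, and a mean-squared error between the first and second images standing
# in for the distortion loss.
first_image = np.random.rand(64, 64, 3)
second_image = first_image + 0.01 * np.random.randn(64, 64, 3)
bpp = 0.42
mse = float(np.mean((first_image - second_image) ** 2))
loss = rate_distortion_loss(bpp, mse, trade_off=100.0)
```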
  • an image processing device which includes:
  • a feature extraction module configured to perform feature extraction on the first image to obtain at least one first feature map, where the at least one first feature map includes N first feature values, where N is a positive integer;
  • the acquisition module is further configured to acquire a target compression code rate, where the target compression code rate corresponds to M target gain values, each target gain value corresponds to a first feature value, and M is a positive integer less than or equal to N;
  • a gain module configured to process corresponding first eigenvalues according to the M target gain values to obtain M second eigenvalues
  • the quantization and entropy coding module is configured to perform quantization and entropy coding on the processed at least one first feature map to obtain encoded data, and the processed at least one first feature map includes the M second feature values.
  • the information entropy of the quantized data obtained by quantizing the processed at least one first feature map satisfies a preset condition
  • the preset condition is related to the target compression code rate
  • the preset condition includes at least:
  • the difference between the compression code rate corresponding to the encoded data and the target compression code rate is within a preset range.
  • the M second characteristic values are obtained by multiplying the M target gain values and the corresponding first characteristic values respectively.
  • the at least one first feature map includes a first target feature map, the first target feature map includes P first feature values, the target gain value corresponding to each of the P first feature values is the same, and P is a positive integer less than or equal to M.
  • the device further includes:
  • a determining module configured to determine M target gain values corresponding to the target compression code rate according to a target mapping relationship, where the target mapping relationship is used to indicate the association relationship between the compression code rate and the M target gain values;
  • the target mapping relationship includes multiple compression code rates and multiple gain vectors, and an association relationship between the multiple compression code rates and the multiple gain vectors; the target compression code rate is one of the multiple compression code rates, and the M target gain values are elements of one of the multiple gain vectors; or,
  • the target mapping relationship includes an objective function mapping relationship, and when the input of the objective function relationship includes the target compression code rate, the output of the objective function relationship includes the M target gain values.
  • the target compression code rate is greater than a first compression code rate and less than a second compression code rate, the first compression code rate corresponds to M first gain values, the second compression code rate corresponds to M second gain values, and the M target gain values are obtained by performing interpolation operations on the M first gain values and the M second gain values.
  • the M first gain values include a first target gain value, the M second gain values include a second target gain value, and the M target gain values include a third target gain value; the first target gain value, the second target gain value, and the third target gain value correspond to the same feature value among the M first feature values, and the third target gain value is obtained by performing an interpolation operation on the first target gain value and the second target gain value.
  • the first image includes a target object
  • the M first feature values are feature values corresponding to the target object in the at least one feature map.
  • each of the M target gain values corresponds to an inverse gain value; the inverse gain value is used to process the feature values obtained in the process of decoding the encoded data, and the product of each of the M target gain values and the corresponding inverse gain value is within a preset range.
  • the device further includes:
  • the decoding module is configured to perform entropy decoding on the encoded data to obtain at least one second feature map, where the at least one second feature map includes N third feature values, and each third feature value corresponds to a first feature value ;
  • the acquiring module is further configured to acquire M target inverse gain values, where each target inverse gain value corresponds to a third feature value;
  • the device also includes:
  • the inverse gain module is configured to process the corresponding third feature values respectively according to the M target inverse gain values to obtain M fourth feature values;
  • the reconstruction module is configured to perform image reconstruction on at least one second feature map after inverse gain processing to obtain a second image, where the at least one second feature map after inverse gain processing includes the M fourth feature values .
  • the M fourth characteristic values are obtained by multiplying the M target inverse gain values and the corresponding third characteristic values respectively.
  • the at least one second feature map includes a second target feature map, the second target feature map includes P third feature values, the target inverse gain value corresponding to each of the P third feature values is the same, and P is a positive integer less than or equal to M.
  • the determining module is further used for:
  • the M target inverse gain values corresponding to the target compression code rate are determined according to the target mapping relationship, and the target mapping relationship is used to indicate the association relationship between the compression code rate and the inverse gain vector.
  • the target mapping relationship includes multiple compression code rates and multiple inverse gain vectors, and an association relationship between multiple compression code rates and multiple inverse gain vectors.
  • the target compression code rate is one of the plurality of compression code rates
  • the M target inverse gain values are elements of one of the plurality of inverse gain vectors.
  • the target mapping relationship includes an objective function mapping relationship, and when the input of the objective function relationship includes the target compression code rate, the output of the objective function relationship includes the M target anti-gain values.
  • the second image includes a target object
  • the M third feature values are feature values corresponding to the target object in the at least one feature map.
  • the product of each of the M target gain values and the corresponding target inverse gain value is within a preset range.
  • the target compression code rate is greater than a first compression code rate and less than a second compression code rate, the first compression code rate corresponds to M first inverse gain values, the second compression code rate corresponds to M second inverse gain values, and the M target inverse gain values are obtained by interpolating the M first inverse gain values and the M second inverse gain values.
  • the M first inverse gain values include a first target inverse gain value, the M second inverse gain values include a second target inverse gain value, and the M target inverse gain values include a third target inverse gain value; the first target inverse gain value, the second target inverse gain value, and the third target inverse gain value correspond to the same feature value among the M first feature values, and the third target inverse gain value is obtained by performing an interpolation operation on the first target inverse gain value and the second target inverse gain value.
  • the present application provides an image processing device, the device including:
  • a decoding module configured to perform entropy decoding on the encoded data to obtain at least one second feature map, where the at least one second feature map includes N third feature values, where N is a positive integer;
  • the acquiring module is further configured to acquire M target inverse gain values, each target inverse gain value corresponds to a third characteristic value, and the M is a positive integer less than or equal to N;
  • the inverse gain module is configured to process the corresponding third characteristic values according to the M target inverse gain values to obtain M fourth characteristic values;
  • the reconstruction module is configured to perform image reconstruction on the processed at least one second feature map to obtain a second image, and the processed at least one second feature map includes the M fourth feature values.
  • the M fourth characteristic values are obtained by multiplying the M target inverse gain values and the corresponding third characteristic values respectively.
  • the at least one second feature map includes a second target feature map, the second target feature map includes P third feature values, the target inverse gain value corresponding to each of the P third feature values is the same, and P is a positive integer less than or equal to M.
  • the acquiring module is further configured to acquire a target compression code rate
  • the device also includes:
  • a determining module configured to determine M target inverse gain values corresponding to the target compression code rate according to a target mapping relationship, where the target mapping relationship is used to indicate the association relationship between the compression code rate and the inverse gain vector;
  • the target mapping relationship includes multiple compression code rates and multiple inverse gain vectors, and association relationships between multiple compression code rates and multiple inverse gain vectors, and the target compression code rate is the multiple compression One of the code rates, the M target inverse gain values are an element of one of the plurality of inverse gain vectors; or,
  • the target mapping relationship includes an objective function mapping relationship, and when the input of the objective function relationship includes the target compression code rate, the output of the objective function relationship includes the M target inverse gain values.
  • the second image includes a target object
  • the M third feature values are feature values corresponding to the target object in the at least one feature map.
  • the target compression code rate is greater than a first compression code rate and less than a second compression code rate, the first compression code rate corresponds to M first inverse gain values, the second compression code rate corresponds to M second inverse gain values, and the M target inverse gain values are obtained by interpolating the M first inverse gain values and the M second inverse gain values.
  • the M first inverse gain values include a first target inverse gain value, the M second inverse gain values include a second target inverse gain value, and the M target inverse gain values include a third target inverse gain value; the first target inverse gain value, the second target inverse gain value, and the third target inverse gain value correspond to the same feature value, and the third target inverse gain value is obtained by performing an interpolation operation on the first target inverse gain value and the second target inverse gain value.
  • an image processing device which includes:
  • a feature extraction module configured to perform feature extraction on the first image according to an encoding network to obtain at least one first feature map, the at least one first feature map includes N first feature values, and N is a positive integer;
  • the acquisition module is further configured to acquire a target compression code rate, where the target compression code rate corresponds to M initial gain values and M initial inverse gain values, each initial gain value corresponds to a first feature value, each initial inverse gain value corresponds to a first feature value, and M is a positive integer less than or equal to N;
  • a gain module configured to process the corresponding first eigenvalues according to the M initial gain values to obtain M second eigenvalues
  • the quantization and entropy coding module is configured to perform quantization and entropy coding on the processed at least one first feature map according to the quantization network and the entropy coding network to obtain encoded data and a bit rate loss, where the at least one first feature map after gain processing includes the M second feature values;
  • the decoding module is configured to perform entropy decoding on the encoded data according to the entropy decoding network to obtain at least one second feature map, where the at least one second feature map includes M third feature values, and each third feature value corresponds to one first feature value;
  • the inverse gain module is configured to process corresponding third characteristic values according to the M initial inverse gain values to obtain M fourth characteristic values;
  • a reconstruction module configured to perform image reconstruction on the processed at least one second feature map according to the decoding network to obtain a second image, and the processed at least one feature map includes the M fourth feature values;
  • the acquiring module is further configured to acquire the distortion loss of the second image relative to the first image
  • the training module is configured to use a loss function to perform joint training on the first codec network, the M initial gain values, and the M initial inverse gain values until the image distortion value between the first image and the second image reaches a first preset degree, where the image distortion value is related to the bit rate loss and the distortion loss, and the codec network includes the coding network, the quantization network, the entropy coding network, and the entropy decoding network;
  • the output module is used to output a second codec network, M target gain values, and M target inverse gain values.
  • the second codec network is a model obtained after the first codec network has performed iterative training.
  • the M target gain values and the M target inverse gain values are obtained after the M initial gain values and the M initial inverse gain values are iteratively trained.
  • the information entropy of the quantized data obtained by quantizing the at least one first feature map after gain processing satisfies a preset condition, and the preset condition is related to the target compression code rate.
  • the preset condition includes at least: the larger the target compression code rate, the larger the information entropy of the quantized data.
  • the M second eigenvalues are obtained by multiplying the M target gain values with the corresponding first eigenvalues.
  • the at least one first feature map includes a first target feature map, the first target feature map includes P first feature values, the target gain value corresponding to each of the P first feature values is the same, and P is a positive integer less than or equal to M.
  • the first image includes a target object
  • the M first feature values are feature values corresponding to the target object in the at least one feature map.
  • the product of each of the M target gain values and the corresponding target inverse gain value is within a preset range, and the product of each of the M initial gain values and the corresponding initial inverse gain value is within a preset range.
  • an embodiment of the present application provides an execution device, which may include a memory, a processor, and a bus system, where the memory is used to store a program, and the processor is used to execute the program in the memory, including the following steps:
  • target compression code rate corresponds to M target gain values, each target gain value corresponds to a first eigenvalue, and the M is a positive integer less than or equal to N;
  • the execution device is a virtual reality VR device, a mobile phone, a tablet, a notebook computer, a server, or a smart wearable device.
  • the processor may also be used to execute steps executed by the executing device in each possible implementation manner of the first aspect.
  • steps executed by the executing device in each possible implementation manner of the first aspect.
  • an embodiment of the present application provides an execution device, which may include a memory, a processor, and a bus system, where the memory is used to store a program, and the processor is used to execute the program in the memory, including the following steps:
  • each target anti-gain value corresponds to a third eigenvalue, where M is a positive integer less than or equal to N;
  • Image reconstruction is performed on the processed at least one second feature map to obtain a second image, and the processed at least one second feature map includes the M fourth feature values.
  • the execution device is a virtual reality VR device, a mobile phone, a tablet, a notebook computer, a server, or a smart wearable device.
  • the processor may also be used to execute steps executed by the execution device in each possible implementation manner of the second aspect.
  • steps executed by the execution device in each possible implementation manner of the second aspect.
  • an embodiment of the present application provides a training device, which may include a memory, a processor, and a bus system, where the memory is used to store programs, and the processor is used to execute programs in the memory, including the following steps:
  • the target compression code rate corresponds to M initial gain values and M initial inverse gain values, each initial gain value corresponds to a first feature value, each initial inverse gain value corresponds to a first feature value, and M is a positive integer less than or equal to N;
  • the at least one first feature map after gain processing includes the M second feature values;
  • the image distortion value is related to the bit rate loss and the distortion loss
  • the codec network includes the coding network, the quantization network, the entropy coding network, and the entropy decoding network;
  • the second codec network is a model obtained after iterative training of the first codec network.
  • the M target gain values and the M target inverse gain values are obtained after the M initial gain values and the M initial inverse gain values are iteratively trained.
  • the processor may also be used to execute steps executed by the executing device in each possible implementation manner of the third aspect.
  • steps executed by the executing device in each possible implementation manner of the third aspect.
  • an embodiment of the present application provides a computer-readable storage medium in which a computer program is stored; when the computer program runs on a computer, the computer is caused to execute the image processing method described in any one of the first to third aspects.
  • an embodiment of the present application provides a computer program that, when running on a computer, causes the computer to execute the image processing method described in any one of the first to third aspects.
  • this application provides a chip system that includes a processor for supporting an execution device or a training device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
  • the chip system further includes a memory for storing program instructions and data necessary for the execution device or the training device.
  • the chip system may consist of a chip, or may include a chip and other discrete devices.
  • the embodiment of the present application provides an image processing method: obtain a first image; perform feature extraction on the first image to obtain at least one first feature map, where the at least one first feature map includes N first feature values and N is a positive integer; obtain a target compression code rate, where the target compression code rate corresponds to M target gain values, each target gain value corresponds to a first feature value, and M is a positive integer less than or equal to N; process the corresponding first feature values respectively according to the M target gain values to obtain M second feature values; and perform quantization and entropy coding on the processed at least one first feature map to obtain encoded data, where the processed at least one first feature map includes the M second feature values.
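The method summary above can be read as the following control flow. The sketch stubs out the encoding network and the entropy coder with placeholders (extract_features and entropy_encode are hypothetical names); only the ordering of the steps (feature extraction, gain, quantization, entropy coding) follows the summary.

```python
import numpy as np

def encode(first_image: np.ndarray, target_rate: float,
           rate_to_gains: dict) -> bytes:
    """Sketch of the overall encoding flow described in the summary above."""
    feature_maps = extract_features(first_image)        # first feature maps
    gains = rate_to_gains[target_rate]                  # M target gain values (np.ndarray)
    gained = feature_maps * gains[:, None, None]        # second feature values
    quantized = np.round(gained).astype(np.int32)       # quantization
    return entropy_encode(quantized)                    # encoded data

def extract_features(image: np.ndarray) -> np.ndarray:
    # Placeholder for the encoding network; returns (C, H, W) feature maps.
    c, h, w = 4, image.shape[0] // 4, image.shape[1] // 4
    return np.random.randn(c, h, w).astype(np.float32)

def entropy_encode(quantized: np.ndarray) -> bytes:
    # Placeholder for arithmetic/entropy coding of the quantized feature maps.
    return quantized.tobytes()
```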
  • Figure 1 is a schematic diagram of a structure of the main frame of artificial intelligence
  • Figure 2a is a schematic diagram of an application scenario of an embodiment of this application.
  • Figure 2b is a schematic diagram of an application scenario of an embodiment of this application.
  • FIG. 3 is a schematic diagram of an embodiment of an image processing method provided by an embodiment of this application.
  • Figure 4 is a schematic diagram of a CNN-based image processing process
  • FIG. 5a is a schematic diagram of information entropy distribution of feature maps of different compression code rates in an embodiment of this application;
  • FIG. 5b is a schematic diagram of the information entropy distribution of feature maps of different compression code rates in an embodiment of this application;
  • FIG. 6 is a schematic diagram of an objective mapping function relationship provided by an embodiment of this application.
  • FIG. 7 is a schematic diagram of an embodiment of an image processing method provided by an embodiment of this application.
  • FIG. 8 is a schematic diagram of an image compression process provided by an embodiment of this application.
  • FIG. 9 is a schematic diagram of a compression effect of an embodiment of this application.
  • FIG. 10 is a schematic diagram of a training process according to an embodiment of the application.
  • FIG. 11 is a schematic diagram of an image processing process according to an embodiment of this application.
  • FIG. 12 is a system architecture diagram of an image processing system provided by an embodiment of the application.
  • FIG. 13 is a schematic flowchart of an image processing method provided by an embodiment of the application.
  • FIG. 14 is a schematic structural diagram of an image processing device provided by an embodiment of the application.
  • FIG. 15 is a schematic structural diagram of an image processing device provided by an embodiment of the application.
  • FIG. 16 is a schematic diagram of a structure of an image processing device provided by an embodiment of the application.
  • FIG. 17 is a schematic structural diagram of an execution device provided by an embodiment of this application.
  • FIG. 18 is a schematic structural diagram of a training device provided by an embodiment of the application.
  • FIG. 19 is a schematic diagram of a structure of a chip provided by an embodiment of the application.
  • Figure 1 shows a schematic diagram of the main framework of artificial intelligence.
  • the "intelligent information chain” reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensing process of "data-information-knowledge-wisdom".
  • the "IT value chain” from the underlying infrastructure of human intelligence, information (providing and processing technology realization) to the industrial ecological process of the system, reflects the value that artificial intelligence brings to the information technology industry.
  • the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the basic platform.
  • smart chips are hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs;
  • basic platforms include distributed computing frameworks and network-related platform guarantees and support, which can include cloud storage and computing, interconnection networks, and so on.
  • sensors communicate with the outside to obtain data, and these data are provided to the smart chip in the distributed computing system provided by the basic platform for calculation.
  • the data in the upper layer of the infrastructure is used to represent the data source in the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as the Internet of Things data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can symbolize and formalize data for intelligent information modeling, extraction, preprocessing, training, etc.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formal information to conduct machine thinking and solving problems based on reasoning control strategies.
  • the typical function is search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, and usually provides functions such as classification, ranking, and prediction.
  • some general capabilities can be formed based on the results of the data processing, such as an algorithm or a general system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productizing intelligent information decision-making and realizing landing applications. Application fields mainly include intelligent terminals, intelligent transportation, smart medical care, autonomous driving, safe city, and so on.
  • This application can be applied to the field of image processing in the field of artificial intelligence.
  • the following will introduce multiple application scenarios that are implemented in the product.
  • the image compression method provided in the embodiments of the present application can be applied to the image compression process in the terminal device, and specifically, can be applied to the photo album, video surveillance, etc. on the terminal device.
  • FIG. 2a is a schematic diagram of an application scenario of an embodiment of this application.
  • the terminal device can extract features from the acquired image to be compressed through an artificial intelligence (AI) encoding unit in an embedded neural-network processing unit (NPU), transforming the image data into an output feature with lower redundancy, and generate a probability estimate for each point in the output feature.
  • the central processing unit (CPU) performs arithmetic coding on the extracted output feature using the probability estimate of each point in the output feature, which reduces the coding redundancy of the output feature and further reduces the amount of data transmitted in the image compression process; the encoded data obtained by encoding is stored in the corresponding storage location in the form of a data file.
  • the CPU can obtain and load the saved file from the corresponding storage location, obtain the decoded feature map based on arithmetic decoding, and use the AI decoding unit in the NPU to reconstruct the feature map to obtain a reconstructed image.
  • the image compression method provided in the embodiments of the present application can be applied to the image compression process on the cloud side, and specifically, can be applied to functions such as cloud photo albums on the cloud side server.
  • FIG. 2b is a schematic diagram of the application scenario of an embodiment of this application.
  • the terminal device can use the CPU to perform lossless encoding and compression on the picture to be compressed to obtain encoded data.
  • the terminal device can transmit the encoded data to the server on the cloud side, and the server can perform corresponding lossless decoding on the received encoded data to obtain the image to be compressed.
  • the server can use the AI encoding unit in a graphics processing unit (GPU) to perform feature extraction on the acquired image to be compressed, transforming the image data into output features with lower redundancy, and generate a probability estimate for each point in the output feature.
  • the CPU performs arithmetic coding on the extracted output feature using the probability estimate of each point in the output feature, which reduces the coding redundancy of the output feature and further reduces the amount of data transferred in the image compression process; the encoded data obtained by encoding is stored in the corresponding storage location in the form of a data file.
  • the CPU can obtain and load the saved file from the corresponding storage location, obtain the decoded feature map based on arithmetic decoding, and use the AI decoding unit in the NPU to reconstruct the feature map to obtain a reconstructed image.
  • the server can use the CPU to perform lossless encoding and compression on the compressed image to obtain encoded data (for example, but not limited to, any lossless compression method based on the prior art); the server can transmit the encoded data to the terminal device, and the terminal device can perform corresponding lossless decoding on the received encoded data to obtain the decoded image.
  • a step that applies a gain to the feature values in the feature map can be added between the AI coding unit and the quantization unit, and a step that applies an inverse gain to the feature values in the feature map can be added between the arithmetic decoding unit and the AI decoding unit.
  • a neural network can be composed of neural units.
  • a neural unit can refer to an arithmetic unit that takes x_s and an intercept of 1 as inputs.
  • the output of the arithmetic unit can be: h(x) = f(Σ_s W_s·x_s + b), where W_s is the weight of x_s, b is the bias of the neural unit, and f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal of the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer, and the activation function can be a sigmoid function.
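A numeric illustration of the neural unit described above, assuming the usual weighted-sum-plus-bias form with a sigmoid activation; the specific weights, inputs, and bias are arbitrary example values.

```python
import numpy as np

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

def neural_unit(xs: np.ndarray, ws: np.ndarray, b: float) -> float:
    """Output of a single neural unit: f(sum_s Ws * xs + b), with sigmoid
    used here as the example activation function f."""
    return sigmoid(float(np.dot(ws, xs)) + b)

print(neural_unit(np.array([0.5, -1.0, 2.0]), np.array([0.3, 0.2, 0.1]), b=0.05))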
  • a neural network is a network formed by connecting multiple above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
  • A deep neural network (DNN), also known as a multi-layer neural network, is divided according to the positions of its different layers: the neural network inside the DNN can be divided into three categories, namely the input layer, the hidden layers, and the output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the number of layers in the middle are all hidden layers.
  • the layers are fully connected, that is to say, any neuron in the i-th layer must be connected to any neuron in the i+1th layer.
  • Although a DNN looks complicated, the work of each layer is not complicated; simply put, each layer computes the linear relationship y = α(W·x + b), where x is the input vector, y is the output vector, b is the offset (bias) vector, W is the weight matrix (also called the coefficients), and α() is the activation function.
  • Each layer simply performs this operation on the input vector x to obtain the output vector y. Because a DNN has many layers, the number of coefficient matrices W and offset vectors b is also relatively large.
  • These parameters are defined in the DNN as follows, taking the coefficient W as an example: suppose that in a three-layer DNN, the linear coefficient from the fourth neuron of the second layer to the second neuron of the third layer is defined as W^3_24, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
  • In general, the coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer is defined as W^L_jk.
  • Convolutional neural network (convolutional neuron network, CNN) is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a sub-sampling layer.
  • the feature extractor can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • a neuron can be connected to only part of the neighboring neurons.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights here are the convolution kernels.
  • Weight sharing can be understood as meaning that the way image information is extracted is independent of position.
  • the convolution kernel can be initialized in the form of a matrix of random size. In the training process of the convolutional neural network, the convolution kernel can obtain reasonable weights through learning. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, and at the same time reduce the risk of overfitting.
  • the neural network can use the back propagation (BP) algorithm to modify the size of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, the input signal is forward-propagated until the output produces an error loss, and the parameters in the initial neural network model are updated by back-propagating the error loss information, so that the error loss converges.
  • the back-propagation algorithm is a back-propagation motion dominated by error loss, and aims to obtain the optimal parameters of the neural network model, such as the weight matrix.
  • the embodiment of the present application first takes an application scenario as a terminal device as an example for description.
  • the terminal device may be a mobile phone, a tablet, a notebook computer, a smart wearable device, etc., and the terminal device may compress the acquired pictures.
  • the terminal device may be a virtual reality (VR) device.
  • the embodiments of the present application can also be applied to intelligent surveillance. A camera can be configured in the intelligent surveillance system, and the system can then obtain images to be compressed through the camera. It should be understood that the embodiments of the present application can also be applied to other scenarios where image compression is required; other application scenarios will not be listed here.
  • FIG. 3 is a schematic diagram of an embodiment of an image processing method provided by an embodiment of the present application.
  • an image processing method provided by an embodiment of the present application includes:
  • The first image is an image to be compressed, where the first image may be an image captured by the above-mentioned terminal device through a camera, or the first image may be an image obtained from inside the terminal device (for example, an image stored in the album of the terminal device, or a picture obtained by the terminal device from the cloud). It should be understood that the above-mentioned first image may be any image with an image compression requirement, and this application does not limit the source of the image to be processed.
  • the terminal device may perform feature extraction on the first image based on CNN to obtain at least one first feature map.
  • The first feature map may also be referred to as a channel feature map image, where each semantic channel corresponds to one first feature map (channel feature map image).
  • FIG. 4 is a schematic diagram of a CNN-based image processing process.
  • As shown in FIG. 4, CNN 402 is a CNN layer.
  • CNN 402 may multiply the upper left 3 ⁇ 3 pixels of the input data (first image) by the weight, and map it to the neuron at the upper left end of the first feature map.
  • the weight to be multiplied will also be 3 ⁇ 3.
  • CNN 402 scans the input data (first image) one by one from left to right and top to bottom, and multiplies the weights to map the neurons of the feature map.
  • the 3 ⁇ 3 weights used are called filters or filter kernels.
  • The process of applying the filter in CNN 402 is the process of performing a convolution operation with the filter kernel, and the extracted result is called the "first feature map", which may also be called a channel feature map image.
  • the term “multi-channel feature map image” may refer to a set of feature map images corresponding to multiple channels.
  • a multi-channel feature map image may be generated by CNN 402, which is also referred to as the "feature extraction layer” or “convolutional layer” of CNN.
  • the CNN layer can define the mapping of output to input.
  • the mapping defined by the layer is executed as one or more filter kernels (convolution kernels) to be applied to the input data to generate a feature map image to be output to the next layer.
  • the input data can be an image or a feature map image of a specific layer.
  • CNN 402 receives the first image 401 and generates a multi-channel feature map image 403 as an output.
  • The next layer receives the multi-channel feature map image 403 as input and generates a further multi-channel feature map image as output.
  • each subsequent layer will receive the multi-channel feature map image generated in the previous layer, and generate the next multi-channel feature map image as output.
  • In this way, the multi-channel feature map image of a given layer is generated by receiving the multi-channel feature map image generated in the preceding (N)th layer.
  • In addition to applying the convolution kernels that map the input feature map image to the output feature map image, other processing operations can also be performed. Examples of other processing operations may include, but are not limited to, activation functions, pooling, resampling, and the like.
  • the original image (the first image) is transformed into another space (at least one first feature map) through the CNN convolutional neural network in the above-mentioned manner.
  • the number of first feature maps is 192, that is, the number of semantic channels is 192, and each semantic channel corresponds to a first feature map.
  • In one implementation, the at least one first feature map may be in the form of a three-dimensional tensor, and its size may be 192×w×h, where w×h are the width and length of the matrix corresponding to the first feature map of a single channel.
  • In one implementation, feature extraction may be performed on the first image to obtain multiple feature values, and the at least one first feature map may contain some or all of the multiple feature values. For feature maps corresponding to semantic channels that have little impact on the compression result, gain processing may not be performed; in that case, the at least one first feature map contains only some of the multiple feature values.
  • the at least one first characteristic map includes N first characteristic values, and the N is a positive integer.
  • the terminal device may obtain the target compression code rate, where the target compression code rate may be specified by the user or determined by the terminal device based on the first image, which is not limited here.
  • the target compression code rate corresponds to M target gain values, and each target gain value corresponds to a first characteristic value, where M is a positive integer less than or equal to N. That is, there is a certain correlation between the target compression code rate and the M target gain values.
  • the terminal device can determine the corresponding M target gain values according to the obtained target compression code rate.
  • the terminal device may determine M target gain values corresponding to the target compression code rate according to a target mapping relationship, and the target mapping relationship is used to indicate the compression code rate and the M target gain values The relationship between.
  • the target mapping relationship may be a pre-stored mapping relationship. After the terminal device obtains the target compression code rate, it can directly find the target mapping relationship corresponding to the target compression code rate in the corresponding storage location.
  • the target mapping relationship may include multiple compression code rates and multiple gain vectors, and an association relationship between multiple compression code rates and multiple gain vectors.
  • the target compression code rate is one of the multiple compression code rates, and
  • the M target gain values are elements of one of the multiple gain vectors.
  • In one implementation, the target mapping relationship may be a preset table or take another form, and it includes multiple compression code rates and a gain vector corresponding to each compression code rate.
  • The gain vector may include multiple elements; each compression code rate corresponds to M target gain values, where the M target gain values are the elements included in the gain vector corresponding to that compression code rate.
  • the target mapping relationship may include an objective function mapping relationship, and when the input of the objective function relationship includes the target compression code rate, the output of the objective function relationship includes the M target gain values.
  • In another implementation, the target mapping relationship may be a preset objective function mapping relationship or take another form; the objective function mapping relationship may at least represent the correspondence between the compression code rate and the gain values, so that when the input of the objective function relationship includes the target compression code rate, the output of the objective function relationship includes the M target gain values.
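  • For illustration only, the following sketch shows the two forms of the target mapping relationship described above: a pre-stored table from compression code rates to gain vectors, and an objective function whose input includes the target compression code rate and whose output is the M target gain values. All numerical values, and the small M used here, are hypothetical.
```python
import numpy as np

# Hypothetical pre-stored table: each compression code rate (in bpp) maps to a
# gain vector whose elements are the M target gain values (M = 4 for brevity).
gain_table = {
    0.15: np.array([0.6, 0.5, 0.4, 0.3]),
    0.50: np.array([1.0, 0.9, 0.8, 0.7]),
    1.00: np.array([1.6, 1.5, 1.4, 1.3]),
}

def target_gains_from_table(target_bpp):
    """Table form of the target mapping relationship."""
    return gain_table[target_bpp]

def target_gains_from_function(target_bpp, num_channels=4):
    """Function form: an illustrative rule, not the mapping actually used."""
    channel_index = np.arange(num_channels)
    return target_bpp * (1.0 + 0.1 * channel_index)
```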
  • the at least one first feature map includes a first target feature map
  • the first target feature map includes P first feature values
  • each of the P first feature values The target gain values corresponding to the first eigenvalues are the same
  • the P is a positive integer less than or equal to the M, that is, the P first eigenvalues are eigenvalues of the same semantic channel, and their corresponding target gain values are the same.
  • In this case, one gain value can be used to represent the target gain values of the above-mentioned P first feature values.
  • That is, the same number of gain values as semantic channels can be used to represent the M first feature values.
  • the number of semantic channels (first feature maps) is 192
  • 192 gain values can be used to represent the M first feature values.
  • In one implementation, the target gain values corresponding to the first feature values included in all or some of the feature maps in the at least one first feature map may be the same.
  • For example, the at least one first feature map includes a first target feature map, the first target feature map includes P first feature values, the target gain value corresponding to each of the P first feature values is the same, and P is a positive integer less than or equal to M.
  • That is, the first target feature map is one of the at least one first feature map, it includes P first feature values, and each of the P first feature values corresponds to the same target gain value.
  • The N first feature values may be all of the feature values included in the at least one first feature map.
  • When M equals N, every feature value included in the at least one first feature map has a corresponding target gain value; when M is less than N, only some of the feature values included in the at least one first feature map have corresponding target gain values.
  • In the latter case, the number of first feature maps is greater than 1, where every feature value included in some of the first feature maps has a corresponding target gain value, and only some of the feature values included in the remaining first feature maps have corresponding target gain values.
  • In one implementation, the first image includes a target object, and
  • the M first feature values are the feature values corresponding to the target object in the at least one feature map; that is, the M first feature values are the feature values corresponding to one or more target objects among the N first feature values.
  • After the M target gain values are obtained, the corresponding first feature values may be processed respectively according to the M target gain values to obtain M second feature values.
  • In one implementation, the M second feature values are obtained by multiplying the M target gain values by the corresponding first feature values; that is, after a first feature value is multiplied by its corresponding target gain value, the corresponding second feature value is obtained.
  • In this way, different target gain values can be obtained for different target compression code rates, and after the corresponding first feature values are processed according to the M target gain values to obtain the M second feature values, the distribution of the N first feature values included in the at least one feature map corresponding to the original first image is changed for the M first feature values that undergo gain processing, as in the sketch below.
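  • A minimal sketch of the gain processing just described, assuming one target gain value per semantic channel; the tensor sizes and gain values are illustrative.
```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal((192, 16, 16))          # at least one first feature map, 192 x w x h
gain_vector = rng.uniform(0.5, 1.5, size=192)   # one target gain value per semantic channel (assumption)

# Gain processing: each first feature value of channel c is multiplied by the
# target gain value of that channel, yielding the second feature values.
y_gained = y * gain_vector[:, None, None]
```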
  • Figures 5a and 5b are schematic diagrams of the distribution of feature maps at different compression code rates in the embodiments of this application, where different compression code rates correspond to different bits per pixel (bits per pixel, bpp), bpp representing the number of bits used to store each pixel; the smaller the bpp, the smaller the compression code rate.
  • Figure 5a shows the distribution of N first eigenvalues when bpp is 1.
  • Figure 5b shows the distribution of N first eigenvalues when bpp is 0.15.
  • It can be seen that the output features (the N first feature values) of the coding network of the high compression code rate model have a greater variance in the statistical histogram, so the information entropy after quantization is also greater.
  • Therefore, the rule for selecting the M target gain values is: the larger the target compression code rate, the more dispersed the distribution of the N first feature values obtained after the corresponding first feature values are processed according to the M target gain values, and the greater the information entropy after quantization.
  • In one implementation, the target gain values corresponding to the feature values included in each of the multiple first feature maps are the same, that is, each first feature map corresponds to one target gain value.
  • Each feature value is then multiplied by the corresponding target gain value to change the distribution of the N first feature values included in the multiple first feature maps, where the larger the target compression code rate, the more scattered the distribution of the N first feature values.
  • In another implementation, the target gain values corresponding to the feature values included in each first feature map in one part of the first feature maps are the same, while the target gain values corresponding to the feature values included in each first feature map in the remaining part are not all the same; that is, each first feature map in one part of the first feature maps corresponds to one target gain value, and each first feature map in the remaining part corresponds to multiple target gain values (different feature values in the same feature map may correspond to different target gain values).
  • In one implementation, feature extraction is performed on the first image to obtain the first feature maps (for feature maps corresponding to semantic channels that have little impact on the compression result, the gain may not be performed). The number of first feature maps obtained by extraction needs to be greater than 1, and the target gain values corresponding to the feature values included in each of the multiple first feature maps are the same, that is, each first feature map corresponds to one target gain value. The feature values included in each of the multiple first feature maps are multiplied by the corresponding target gain value, which changes the distribution of the N first feature values included in the multiple first feature maps, where the larger the target compression code rate, the more scattered the distribution of the N first feature values.
  • In another implementation, feature extraction is performed on the first image to obtain the first feature maps (for feature maps corresponding to semantic channels that have little impact on the compression result, the gain may not be performed). The number of first feature maps obtained by extraction needs to be greater than 1, where the target gain values corresponding to the feature values included in each first feature map in one part of the first feature maps are the same, and the target gain values corresponding to the feature values included in each first feature map in the remaining part are not all the same; that is, each first feature map in one part corresponds to one target gain value, and each first feature map in the remaining part corresponds to multiple target gain values (different feature values in the same feature map may correspond to different target gain values). The feature values included in the first feature maps of both parts are multiplied by the corresponding target gain values, which changes the distribution of the N first feature values included in the multiple first feature maps, where the larger the target compression code rate, the more scattered the distribution of the N first feature values.
  • In another implementation, feature extraction is performed on the first image to obtain the first feature maps (for feature maps corresponding to semantic channels that have little impact on the compression result, the gain may not be performed). The number of first feature maps on which gain is performed is equal to 1, and the target gain values corresponding to the feature values included in that first feature map are the same, that is, the first feature map corresponds to one target gain value. The feature values included in the first feature map are multiplied by the corresponding target gain value, which changes the distribution of the N first feature values, where the larger the target compression code rate, the more scattered the distribution of the N first feature values.
  • In another implementation, feature extraction is performed on the first image to obtain the first feature maps, and for some of them the gain may not be performed. The number of first feature maps on which gain is performed is equal to 1, and the target gain values corresponding to the feature values included in that first feature map are not all the same, that is, the first feature map corresponds to multiple target gain values (different feature values in the same feature map may correspond to different target gain values). The feature values included in the first feature map are multiplied by the corresponding target gain values, which changes the distribution of the N first feature values included in the first feature map, where the larger the target compression code rate, the more scattered the distribution of the N first feature values.
  • It should be understood that gain processing may also be performed on only some of the first feature values included in a first feature map.
  • In the embodiments of the present application, the basic operation unit of the gain is set at the semantic channel level (the first feature values included in the first feature maps corresponding to at least two of all the semantic channels have different target gain values) or at the feature value level (at least two of all the first feature values included in the first feature map corresponding to one semantic channel have different target gain values), which can make the compression effect better.
  • In one implementation, an objective function mapping relationship can be determined manually. For the case where the first feature values included in the first feature map corresponding to each semantic channel have the same target gain value, the input of the objective function mapping relationship can be the semantic channel and the target compression code rate, and the output is the corresponding target gain value (because the first feature values included in one first feature map have the same target gain value, one target gain value can be used to represent the target gain values of the corresponding semantic channel). For example, a linear function, quadratic function, cubic function, or quartic function can be used to determine the target gain value corresponding to each semantic channel.
  • Refer to FIG. 6, which is a schematic diagram of the target mapping function relationship provided by an embodiment of the application.
  • As shown in FIG. 6, the target mapping function relationship is a linear function.
  • The input of this function is the semantic channel number (for example, the semantic channel numbers are 1 to 192) and the output is the corresponding target gain value; each target compression code rate corresponds to a different target mapping function relationship, where the larger the target compression code rate, the smaller the slope of the corresponding target mapping function relationship.
  • The approximate distribution rule of a quadratic nonlinear function or a cubic nonlinear function is similar to this and is not repeated here.
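  • The following sketch mimics the kind of linear target mapping function relationship shown in FIG. 6; the slope and intercept constants are invented for illustration and are not the values used in the embodiment.
```python
import numpy as np

def channel_gain(channel_index, target_bpp, slope_scale=0.004, base=1.5):
    """Illustrative linear target mapping function in the spirit of FIG. 6.

    The semantic channel number is the input; the slope becomes smaller as the
    target compression code rate grows.  All constants are assumptions."""
    slope = slope_scale / max(target_bpp, 1e-6)
    return base - slope * channel_index

channels = np.arange(1, 193)                               # semantic channels 1..192
gains_low_rate = channel_gain(channels, target_bpp=0.15)   # steeper line
gains_high_rate = channel_gain(channels, target_bpp=1.0)   # flatter line (smaller slope)
```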
  • It should be understood that the target gain value corresponding to each of the M first feature values may also be determined manually, as long as the larger the target compression code rate, the more dispersed the distribution of the N first feature values; the specific setting method is not limited in this application.
  • In one implementation, obtaining the M target gain values corresponding to each target compression code rate through training also involves the processing on the decoding side; therefore, how the M target gain values corresponding to each target compression code rate are obtained through training is described in detail in the following embodiments and is not repeated here.
  • After the M second feature values are obtained, the processed at least one first feature map can be quantized and entropy encoded to obtain encoded data, where the processed at least one first feature map includes the M second feature values.
  • In the quantization step, the feature values are converted to quantization centers according to a specified rule, so that entropy coding can subsequently be performed.
  • The quantization operation can convert the feature values from floating point numbers into a bit stream (for example, a bit stream using integers of a specific bit width, such as 8-bit or 4-bit integers).
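  • A minimal sketch of this quantization step, assuming the quantization centers are simply the nearest integers stored as 8-bit values; real implementations may use other quantization rules.
```python
import numpy as np

def quantize(values):
    """Round each processed feature value to the nearest quantization center
    (here the nearest integer) and store it as an 8-bit integer for entropy coding."""
    return np.rint(values).astype(np.int8)

quantized = quantize(np.array([0.3, -1.7, 2.49, 5.51]))   # -> [0, -2, 2, 6]
bitstream_bytes = quantized.tobytes()                      # raw bytes prior to entropy coding
```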
  • the information entropy of the quantized data obtained by quantizing the processed at least one first feature map satisfies a preset condition
  • the preset condition is related to the target compression code rate. Specifically, the greater the target compression code rate, the greater the information entropy of the quantized data.
  • In one implementation, an entropy estimation network can be used to obtain a probability estimate of each point in the output feature, and the probability estimate is used to perform entropy coding on the output feature to obtain a binary code stream. It should be noted that the entropy coding process mentioned in this application may use existing entropy coding technology, which is not repeated in this application.
  • In one implementation, the difference between the compression code rate corresponding to the encoded data and the target compression code rate is within a preset range, where the preset range can be selected according to practical applications, as long as the difference between the compression code rate corresponding to the encoded data and the target compression code rate is within an acceptable range; this application does not limit the specific preset range.
  • the encoded data can be sent to the terminal device for decompression, and the image processing device for decompression can decompress the data.
  • the terminal device used for compression may store the encoded data in the storage device, and when necessary, the terminal device may obtain the encoded data from the storage device, and may decompress the encoded data.
  • the target compression code rate is greater than a first compression code rate and less than a second compression code rate
  • the first compression code rate corresponds to M first gain values
  • the The second compression code rate corresponds to M second gain values
  • the M target gain values are obtained by performing interpolation operations on the M first gain values and the M second gain values.
  • the M first gain values include a first target gain value
  • the M second gain values include a second target gain value
  • the M target gain values include a third target gain value
  • the first target gain value, the second target gain value, and the third target gain value correspond to the same eigenvalue among the M first eigenvalues
  • the third target gain value is obtained by performing an interpolation operation on the first target gain value and the second target gain value.
  • In this way, in the embodiments of the present application, the compression effects of multiple compression code rates can be achieved with a single model. Specifically, different target gain values can be set for multiple target compression code rates to achieve the compression effects of those compression code rates, and on this basis an interpolation algorithm can be used to interpolate the target gain values, so that new gain values achieving any compression effect within the compression code rate range can be obtained.
  • the M first gain values include a first target gain value
  • the M second gain values include a second target gain value
  • the M target gain values include a third target gain value
  • the first target gain value, the second target gain value, and the third target gain value correspond to the same feature value among the M first feature values, and
  • the third target gain value is obtained by performing an interpolation operation on the first target gain value and the second target gain value, where the interpolation operation may be based on the following formula:
  • m_l = (m_i)^l × (m_j)^(1-l);
  • where m_l represents the third target gain value, m_i represents the first target gain value, m_j represents the second target gain value, m_l, m_i and m_j correspond to the same feature value, and l ∈ (0, 1) is an adjustment coefficient.
  • That is, after the M target gain values corresponding to each of the multiple compression code rates are obtained, if compression corresponding to a target compression code rate between them is to be performed, two groups of target gain values adjacent to the target compression code rate (each group including M target gain values) can be determined from the multiple compression code rates, and the above interpolation processing is performed on the two groups of target gain values to obtain the M target gain values corresponding to the target compression code rate.
  • arbitrary compression effects of the AI compression model within the compression code rate range can be achieved.
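  • A small sketch of this interpolation operation on two groups of target gain values, following m_l = (m_i)^l × (m_j)^(1-l); the gain values shown are hypothetical.
```python
import numpy as np

def interpolate_gain(m_i, m_j, l):
    """Element-wise interpolation between the gain vectors of two adjacent
    compression code rates: m_l = (m_i)**l * (m_j)**(1 - l), with l in (0, 1)."""
    return (m_i ** l) * (m_j ** (1.0 - l))

m_i = np.array([1.6, 1.4, 1.2])           # hypothetical gains for the higher code rate
m_j = np.array([0.8, 0.7, 0.6])           # hypothetical gains for the lower code rate
m_l = interpolate_gain(m_i, m_j, l=0.5)   # gains for an intermediate target code rate
```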
  • each of the M target gain values corresponds to an inverse gain value
  • the inverse gain value is used to process the characteristic value obtained in the process of decoding the encoded data.
  • the product of each of the M target gain values and the corresponding inverse gain value is within a preset range.
  • the inverse gain process on the decoding side will be described in the following embodiments, and will not be repeated here.
  • In summary, the embodiment of the present application provides an image processing method: obtaining a first image; performing feature extraction on the first image to obtain at least one first feature map, where the at least one first feature map includes N first feature values and N is a positive integer; obtaining a target compression code rate, where the target compression code rate corresponds to M target gain values, each target gain value corresponds to a first feature value, and M is a positive integer less than or equal to N; processing the corresponding first feature values respectively according to the M target gain values to obtain M second feature values; and performing quantization and entropy coding on the processed at least one first feature map to obtain encoded data, where the processed at least one first feature map includes the M second feature values.
  • FIG. 7 is a schematic diagram of an embodiment of an image processing method provided by an embodiment of the application. As shown in FIG. 7, the image processing method provided in this embodiment includes:
  • The encoded data obtained in the embodiment corresponding to FIG. 3 can be acquired.
  • the encoded data can be sent to the terminal device for decompression, and the image processing device for decompression can acquire the encoded data and decompress the data.
  • the terminal device used for compression may store the encoded data in the storage device, and when necessary, the terminal device may obtain the encoded data from the storage device, and may decompress the encoded data.
  • The encoded data can be decoded by using an existing entropy decoding technology to obtain the reconstructed output feature (at least one second feature map), where the at least one second feature map includes N third feature values.
  • the at least one second characteristic map in the embodiment of the present application may be the same as the at least one first characteristic map after the foregoing processing.
  • each target inverse gain value corresponds to a third characteristic value, where M is a positive integer less than or equal to N.
  • a target compression code rate may be obtained, and M target inverse gain values corresponding to the target compression code rate may be determined according to a target mapping relationship, where the target mapping relationship is used to indicate the compression code rate An association relationship with an inverse gain vector; wherein the target mapping relationship includes a plurality of compression code rates and a plurality of inverse gain vectors, and an association relationship between a plurality of compression code rates and a plurality of inverse gain vectors, the The target compression code rate is one of the multiple compression code rates, and the M target inverse gain values are elements of one of the multiple inverse gain vectors; or, the target mapping relationship includes an objective function mapping When the input of the objective function relationship includes the target compression code rate, the output of the objective function relationship includes the M target inverse gain values.
  • The target inverse gain values can be obtained in the same step in which the target gain values are obtained in the embodiment corresponding to FIG. 3, which is not limited here.
  • In one implementation, the at least one second feature map includes a second target feature map,
  • the second target feature map includes P third feature values,
  • the target inverse gain value corresponding to each of the P third feature values is the same, and P is a positive integer less than or equal to M.
  • the second image includes a target object
  • the M third feature values are feature values corresponding to the target object in the at least one feature map.
  • In one implementation, the M fourth feature values may be obtained by multiplying the M target inverse gain values by the corresponding third feature values respectively. Specifically, in the embodiment of the present application, the M third feature values in the at least one second feature map are respectively multiplied by the corresponding inverse gain values to obtain the M fourth feature values, and the at least one second feature map that undergoes inverse gain processing then includes the M fourth feature values.
  • After the M fourth feature values are obtained, image reconstruction may be performed on the processed at least one second feature map to obtain a second image, where the processed at least one second feature map includes the M fourth feature values.
  • That is, the at least one second feature map is reconstructed into the second image in the above-mentioned manner.
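  • A minimal sketch of the decoding-side inverse gain processing, assuming one target inverse gain value per semantic channel and taking the inverse gains as reciprocals of the gains; sizes and values are illustrative, and the reconstruction (decoding) network itself is not shown.
```python
import numpy as np

rng = np.random.default_rng(0)
y_hat = rng.standard_normal((192, 16, 16))     # at least one second feature map (after entropy decoding)
gain_vector = rng.uniform(0.5, 1.5, size=192)
inverse_gain_vector = 1.0 / gain_vector        # one possible choice: reciprocal of the gain

# Inverse gain processing: each third feature value is multiplied by the target
# inverse gain value of its semantic channel, yielding the fourth feature values.
y_restored = y_hat * inverse_gain_vector[:, None, None]
# y_restored would then be fed to the decoding network for image reconstruction.
```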
  • the target compression code rate is greater than a first compression code rate and less than a second compression code rate
  • the first compression code rate corresponds to M first inverse gain values
  • the second compression code rate corresponds to M second inverse gain values
  • the M target inverse gain values are obtained by interpolating the M first inverse gain values and the M second inverse gain values of.
  • In one implementation, the M first inverse gain values include a first target inverse gain value,
  • the M second inverse gain values include a second target inverse gain value,
  • and the M target inverse gain values include a third target inverse gain value;
  • the first target inverse gain value, the second target inverse gain value, and the third target inverse gain value correspond to the same feature value among the M first feature values, and the third target inverse gain value is obtained by performing an interpolation operation on the first target inverse gain value and the second target inverse gain value.
  • each of the M target gain values corresponds to an inverse gain value
  • the inverse gain value is used to process the characteristic value obtained in the process of decoding the encoded data.
  • the product of each of the M target gain values and the corresponding inverse gain value is within the preset range; that is, for the same feature value, there is a certain relationship between the corresponding target gain value and the corresponding inverse gain value:
  • the product of the two is within a preset range, and the preset range can be a range of values near the value 1, which is not limited here.
  • In summary, the embodiment of the application provides an image processing method: obtaining encoded data; performing entropy decoding on the encoded data to obtain at least one second feature map, where the at least one second feature map includes N third feature values and N is a positive integer; obtaining M target inverse gain values, where each target inverse gain value corresponds to a third feature value and M is a positive integer less than or equal to N; processing the corresponding third feature values respectively according to the M target inverse gain values to obtain M fourth feature values; and performing image reconstruction on the processed at least one second feature map to obtain a second image, where the processed at least one second feature map includes the M fourth feature values.
  • In this way, different target inverse gain values are set for different target compression code rates, so as to realize control of the compression code rate.
  • A variational autoencoder is a type of autoencoder used for data compression or denoising.
  • FIG. 8 is a schematic diagram of an image compression process provided by an embodiment of this application.
  • In this embodiment, the case where the target gain values corresponding to the same semantic channel are the same and the target inverse gain values corresponding to the same semantic channel are the same is taken as an example, and multiple compression code rates (for example, 4) are used for training.
  • Each compression code rate corresponds to a target gain vector and a target inverse gain vector.
  • In FIG. 8, the target gain vector m_i is a vector of size 192×1 corresponding to a certain compression code rate;
  • the target inverse gain vector m′_i is a vector of size 192×1 corresponding to that compression code rate;
  • y is the output feature of the encoding network (including the at least one first feature map),
  • whose size is 192×w×h,
  • where w×h are the width and length of the feature map of a single semantic channel;
  • y′ is the new output feature obtained after gain processing, quantization, entropy coding, entropy decoding, and inverse gain processing,
  • and its size is the same as that of y.
  • the VAE method is used as the basic framework of the model, and the gain unit and the inverse gain unit are added.
  • the model operation can be the following steps:
  • the first image enters the coding network to obtain an output feature y.
  • the output feature y is multiplied by the corresponding gain vector m_i channel by channel to obtain the output feature after gain processing;
  • the entropy estimation module is used to obtain a probability estimate of each point in the output feature, and the probability estimate is used to perform entropy coding on the output feature to obtain a binary code stream.
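  • The following sketch stands in for the entropy estimation and rate estimation idea: a discretized Gaussian is used as a toy probability model for the quantized symbols, and the ideal code length an entropy coder would need is computed from it. The probability model, symbol alphabet, and parameters are assumptions; the actual entropy estimation network and entropy coder of the embodiment are not reproduced here.
```python
import numpy as np

def gaussian_pmf(symbols, mu, sigma):
    """Toy discretized Gaussian probability estimate for integer symbols
    (a simplified stand-in for the entropy estimation module)."""
    grid = np.arange(-32, 33)                       # finite symbol alphabet (assumption)
    logits = -0.5 * ((grid - mu) / sigma) ** 2
    probs = np.exp(logits) / np.exp(logits).sum()   # normalize over the grid
    return probs[np.clip(symbols, -32, 32) + 32]

rng = np.random.default_rng(0)
symbols = np.rint(rng.standard_normal(1000) * 3).astype(int)   # quantized output feature (toy)
p = gaussian_pmf(symbols, mu=0.0, sigma=3.0)
estimated_bits = -np.sum(np.log2(p))   # code length an ideal entropy coder would approach
```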
  • The left graph in FIG. 9 compares the rate-distortion performance of the single model of this embodiment (solid lines) with four compression models trained separately by the prior-art VAE method (dotted lines), using the multi-scale structural similarity index measure (MS-SSIM) as the evaluation index, where the abscissa is bpp and the ordinate is MS-SSIM;
  • the right graph in FIG. 9 makes the same comparison using the peak signal-to-noise ratio (PSNR) as the evaluation index, where the abscissa is bpp and the ordinate is PSNR.
  • the compression effect of any code rate can be achieved on the two evaluation indicators, and the compression effect is not weaker than that of the multiple models of the VAE method.
  • In terms of the achieved effect, the model storage requirement can be reduced to 1/N (N being the number of models the VAE method needs to realize the compression effects of the different code rates in the example of the present invention).
  • Fig. 10 is a schematic diagram of a training process of an embodiment of this application. As shown in Fig. 10, the loss function of the model in this embodiment is:
  • loss = λ·l_d + l_r;
  • where l_d is the distortion loss of the second image relative to the first image calculated according to the evaluation index,
  • l_r is the rate loss calculated by the entropy estimation network (also called rate estimation),
  • and λ is the Lagrangian coefficient that adjusts the trade-off between the distortion loss and the rate loss.
  • the model training process can be shown in Figure 10:
  • The Lagrangian coefficient λ in the loss function is continuously varied during the model training process, and the corresponding gain/inverse gain vector pair {m_i, m′_i} is selected from the randomly initialized gain/inverse gain matrices {M, M′} and placed at the back end of the encoding network and the front end of the decoding network, respectively, to perform the gain/inverse gain.
  • Through the joint optimization of the gain/inverse gain matrices {M, M′} and the model, the compression effects of multiple compression code rates can be achieved on a single model.
  • Further, the element-wise products of the gain/inverse gain vector pairs corresponding to different compression code rates are approximately the same constant vector, that is, m_i ⊙ m′_i = m_j ⊙ m′_j = C, where [m_i, m′_i] and [m_j, m′_j] are the gain/inverse gain vector pairs corresponding to different compression code rates, C is a vector whose elements are all constants, and i, j ∈ (1, 4).
  • Based on the above relationship, this embodiment can make the following derivation:
  • m_l = (m_i)^l × (m_j)^(1-l);
  • m′_l = (m′_i)^l × (m′_j)^(1-l);
  • where m_i and m_j are two adjacent gain vectors in the gain matrix (and m′_i, m′_j the corresponding inverse gain vectors), and l ∈ (0, 1) is the adjustment coefficient.
  • In this way, interpolation can be performed between adjacent ones of the four gain/inverse gain vector pairs obtained by training to obtain new gain/inverse gain vector pairs.
  • the training process is as follows:
  • the Lagrangian coefficient in the loss function is continuously varied during the model training process, and the corresponding gain vector m_i and inverse gain vector m′_i are selected from the randomly initialized gain matrix M, where the inverse gain vector m′_i may be generated by taking the reciprocal of the gain vector m_i.
  • For the selection rule of the target gain value and the target inverse gain value, reference may be made to step 705 in the foregoing embodiment, which is not repeated here.
  • the gain vector m_i and the inverse gain vector m′_i are placed at the back end of the encoding network and the front end of the decoding network, respectively, so as to realize the joint optimization of the gain matrix M and the model, so that the compression effects of 4 code rates can be realized on a single model.
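  • A simplified sketch of one such training iteration: a code-rate index (and thus a Lagrangian coefficient) is varied, the matching gain/inverse gain vector pair is applied around quantization, and a loss of the form λ·l_d + l_r is formed. The λ values, the distortion and rate proxies, and the hard rounding are assumptions made for illustration; actual training would use the full encoding/decoding networks, an entropy model for the rate loss, a differentiable quantization surrogate, and gradient updates of both the model and {M, M′}, all of which are omitted here.
```python
import numpy as np

rng = np.random.default_rng(0)

num_rates = 4
lambdas = np.array([0.0016, 0.0032, 0.0075, 0.015])   # hypothetical Lagrangian coefficients
M = rng.uniform(0.5, 1.5, size=(num_rates, 192))      # randomly initialized gain matrix
M_prime = 1.0 / M                                     # inverse gains taken as reciprocals

def training_iteration(encoder_output):
    """One simplified iteration: select a code rate, apply gain, quantize,
    apply inverse gain, then form lambda * l_d + l_r with toy proxies."""
    i = rng.integers(num_rates)                       # vary the target code rate per step
    y = encoder_output                                # stand-in for the coding network output
    y_gained = y * M[i][:, None, None]
    y_hat = np.rint(y_gained)                         # quantization (non-differentiable here)
    y_restored = y_hat * M_prime[i][:, None, None]
    l_d = np.mean((y_restored - y) ** 2)              # toy distortion proxy
    l_r = np.mean(np.abs(y_hat))                      # toy rate proxy
    return lambdas[i] * l_d + l_r

loss = training_iteration(rng.standard_normal((192, 8, 8)))
```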
  • FIG. 11 is a schematic diagram of an image processing process according to an embodiment of the application.
  • Then the interpolation algorithm is used to perform interpolation operations on adjacent gain vector pairs among the four obtained by training, so that a new gain vector with any compression effect within the code rate interval can be obtained.
  • In this way, the compression effect of any code rate can be achieved, the compression effect is not weaker than the effect of training each code rate independently, and the model storage requirement can be reduced to 1/N (N being the number of models the VAE method needs to realize the compression effects of the different code rates in the example of the present invention).
  • VAE AI compression model architecture
  • FIG. 12 is a system architecture diagram of the image processing system provided by an embodiment of the application.
  • the image processing system 200 includes an execution device 210, a training device 220, a database 230, a client device 240, and a data storage system 250.
  • the execution device 210 includes a calculation module 211.
  • The first image set is stored in the database 230, and the training device 220 generates a target model/rule 201 for processing the first image and uses the first images in the database to perform iterative training on the target model/rule 201 to obtain a mature target model/rule 201.
  • In this embodiment, the case where the target model/rule 201 includes a second codec network, M target gain values corresponding to each compression code rate, and M target inverse gain values is taken as an example for description.
  • The second codec network obtained by the training device 220, the M target gain values corresponding to each compression code rate, and the M target inverse gain values can be applied to different systems or devices, such as mobile phones, tablets, laptops, VR devices, surveillance systems, and so on.
  • the execution device 210 can call data, codes, etc. in the data storage system 250, and can also store data, instructions, etc. in the data storage system 250.
  • the data storage system 250 may be placed in the execution device 210, or the data storage system 250 may be an external memory relative to the execution device 210.
  • the calculation module 211 may perform feature extraction on the first image received by the client device 240 through the second codec network to obtain at least one first feature map.
  • the at least one first feature map includes N first feature values. Is a positive integer; to obtain the target compression code rate, the target compression code rate corresponds to M target gain values, each target gain value corresponds to a first eigenvalue, the M is a positive integer less than or equal to N;
  • the M target gain values respectively process the corresponding first feature values to obtain M second feature values.
  • the calculation module 211 may also perform entropy decoding on the encoded data through the second encoding and decoding network to obtain at least one second feature map, where the at least one second feature map includes N third feature values, where N is a positive integer ; Acquire M target inverse gain values, each target inverse gain value corresponds to a third characteristic value, where M is a positive integer less than or equal to N; according to the M target inverse gain values, the corresponding third characteristic Values are processed to obtain M fourth feature values; image reconstruction is performed on the processed at least one second feature map to obtain a second image, and the processed at least one second feature map includes the M fourth feature maps. Eigenvalues.
  • the execution device 210 and the client device 240 may be separate and independent devices.
  • the execution device 210 is equipped with an I/O interface 212 to perform data interaction with the client device 240, and the "user" may input the first image to the I/O interface 212 through the client device 240;
  • the execution device 210 returns the second image to the client device 240 through the I/O interface 212 and provides it to the user.
  • FIG. 12 is only a schematic diagram of the architecture of the image processing system provided by the embodiment of the present invention, and the positional relationship among the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • the execution device 210 may be configured in the client device 240.
  • For example, the execution device 210 may be a module used for image processing in the main processor (CPU) of the mobile phone or tablet; the execution device 210 may also be a graphics processing unit (GPU) or a neural network processor (NPU) in the mobile phone or tablet, where the GPU or NPU is mounted on the main processor as a coprocessor and the main processor assigns tasks to it.
  • FIG. 13 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
  • the image processing method provided by an embodiment of the present application may include:
  • the target compression code rate corresponds to M initial gain values and M initial inverse gain values
  • each initial gain value corresponds to a first characteristic value
  • each initial inverse gain value corresponds to a first feature value, and the M is a positive integer less than or equal to N;
  • the codec network includes the coding network, the quantization network, the entropy coding network, and the entropy decoding network.
  • A second codec network, M target gain values, and M target inverse gain values are output.
  • the second codec network is a model obtained after iterative training is performed on the first codec network, and the M target gain values and the M target inverse gain values are obtained after iterative training is performed on the M initial gain values and the M initial inverse gain values.
  • For details of step 1301 to step 1311, reference may be made to the description in the foregoing embodiments, which is not repeated here.
  • the information entropy of the quantized data obtained by quantizing the processed at least one first feature map satisfies a preset condition
  • the preset condition is related to the target compression code rate
  • the preset conditions include at least: the difference between the compression code rate corresponding to the encoded data and the target compression code rate is within a preset range.
  • the M second characteristic values are obtained by multiplying the M initial gain values with the corresponding first characteristic values.
  • the M fourth characteristic values are obtained by multiplying the M initial inverse gain values and the corresponding third characteristic values respectively.
  • the product of each of the M target gain values and the corresponding target inverse gain value is within a preset range, and each of the M initial gain values is associated with the corresponding The product of the initial inverse gain value is within the preset range.
  • FIG. 14 is a schematic structural diagram of an image processing apparatus 1400 according to an embodiment of the application.
  • the image processing apparatus 1400 may be a terminal device or a server, and the image processing apparatus 1400 includes:
  • the obtaining module 1401 is used to obtain the first image
  • the feature extraction module 1402 is configured to perform feature extraction on the first image to obtain at least one first feature map, where the at least one first feature map includes N first feature values, where N is a positive integer;
  • the acquisition module 1401 is further configured to acquire a target compression code rate, the target compression code rate corresponds to M target gain values, and each target gain value corresponds to a first characteristic value, where M is less than or equal to N Positive integer;
  • the gain module 1403 is configured to process the corresponding first characteristic values respectively according to the M target gain values to obtain M second characteristic values;
  • the quantization and entropy encoding module 1404 is configured to perform quantization and entropy encoding on the processed at least one first feature map to obtain encoded data, and the processed at least one first feature map includes the M second feature values.
  • the information entropy of the quantized data obtained by quantizing the processed at least one first feature map satisfies a preset condition
  • the preset condition is related to the target compression code rate
  • the preset conditions include at least:
  • the difference between the compression code rate corresponding to the encoded data and the target compression code rate is within a preset range.
  • the M second characteristic values are obtained by performing a product operation on the M target gain values and the corresponding first characteristic values, respectively.
  • the at least one first feature map includes a first target feature map
  • the first target feature map includes P first feature values
  • each first feature value of the P first feature values The corresponding target gain values are the same, and the P is a positive integer less than or equal to the M.
  • the device further includes:
  • a determining module configured to determine M target gain values corresponding to the target compression code rate according to a target mapping relationship, where the target mapping relationship is used to indicate the association relationship between the compression code rate and the M target gain values;
  • the target mapping relationship includes multiple compression code rates and multiple gain vectors, and an association relationship between the multiple compression code rates and the multiple gain vectors; the target compression code rate is one of the multiple compression code rates, and the M target gain values are elements of one of the multiple gain vectors; or,
  • the target mapping relationship includes an objective function mapping relationship, and when the input of the objective function relationship includes the target compression code rate, the output of the objective function relationship includes the M target gain values.
  • the target compression code rate is greater than a first compression code rate and less than a second compression code rate
  • the first compression code rate corresponds to M first gain values
  • the second compression code rate corresponds to M second gain values
  • the M target gain values are obtained by performing an interpolation operation on the M first gain values and the M second gain values.
  • the M first gain values include a first target gain value
  • the M second gain values include a second target gain value
  • the M target gain values include a third target gain value
  • the first target gain value, the second target gain value, and the third target gain value correspond to the same feature value among the M first feature values, and the third target gain value is obtained by performing an interpolation operation on the first target gain value and the second target gain value.
  • the first image includes a target object
  • the M first feature values are feature values corresponding to the target object in the at least one feature map.
  • each of the M target gain values corresponds to an inverse gain value
  • the inverse gain value is used to process the feature value obtained in the process of decoding the encoded data
  • the M The product of each target gain value in the target gain value and the corresponding inverse gain value is within a preset range.
  • the device further includes:
  • the decoding module is configured to perform entropy decoding on the encoded data to obtain at least one second feature map, where the at least one second feature map includes N third feature values, and each third feature value corresponds to a first feature value ;
  • the acquiring module is further configured to acquire M target anti-gain values, and each target anti-gain value corresponds to a third characteristic value;
  • the device also includes:
  • the inverse gain module is configured to process the corresponding third feature values respectively according to the M target inverse gain values to obtain M fourth feature values;
  • the reconstruction module is configured to perform image reconstruction on at least one second feature map after inverse gain processing to obtain a second image, where the at least one second feature map after inverse gain processing includes the M fourth feature values .
  • the M fourth characteristic values are obtained by multiplying the M target inverse gain values and the corresponding third characteristic values respectively.
  • the at least one second characteristic map includes a second target characteristic map, the second target characteristic map includes P third characteristic values, and each third characteristic value of the P third characteristic values The corresponding target inverse gain values are the same, and the P is a positive integer less than or equal to the M.
  • the determining module is further used for:
  • the M target inverse gain values corresponding to the target compression code rate are determined according to the target mapping relationship, and the target mapping relationship is used to indicate the association relationship between the compression code rate and the inverse gain vector.
  • the target mapping relationship includes multiple compression code rates and multiple inverse gain vectors, and an association relationship between multiple compression code rates and multiple inverse gain vectors, and the target compression code rate is the multiple One of the compression code rates, the M target inverse gain values are an element of one of the plurality of inverse gain vectors.
  • the target mapping relationship includes an objective function mapping relationship, and when the input of the objective function relationship includes the target compression code rate, the output of the objective function relationship includes the M target inverse gain values.
  • the second image includes a target object
  • the M third feature values are feature values corresponding to the target object in the at least one feature map.
  • the product of each of the M target gain values and the corresponding target inverse gain value is within a preset range.
  • the target compression code rate is greater than a first compression code rate and less than a second compression code rate
  • the first compression code rate corresponds to M first inverse gain values
  • the second compression code rate corresponds to M second inverse gain values, and
  • the M target inverse gain values are obtained by performing an interpolation operation on the M first inverse gain values and the M second inverse gain values.
  • the M first inverse gain values include a first target inverse gain value
  • the M second inverse gain values include a second target inverse gain value
  • the M target inverse gain values include a third target inverse gain value; the first target inverse gain value, the second target inverse gain value, and the third target inverse gain value correspond to the same feature value among the M first feature values,
  • and the third target inverse gain value is obtained by performing an interpolation operation on the first target inverse gain value and the second target inverse gain value.
  • An embodiment of the present application provides an image processing device 1400.
  • the acquisition module 1401 acquires a first image; the feature extraction module 1402 performs feature extraction on the first image to obtain at least one first feature map, where the at least one first feature map
  • includes N first feature values and N is a positive integer; the acquisition module 1401 acquires a target compression code rate, where the target compression code rate corresponds to M target gain values, each target gain value corresponds to a first feature value, and M is a positive integer less than or equal to N; the gain module 1403 processes the corresponding first feature values respectively according to the M target gain values to obtain M second feature values; and the quantization and entropy coding module 1404 performs quantization and entropy coding on the processed at least one first feature map to obtain encoded data, where the processed at least one first feature map includes the M second feature values.
  • different target gain values are set for different target compression code rates, so as to realize the control of the compression code rate.
  • FIG. 15 is a schematic structural diagram of an image processing apparatus 1500 according to an embodiment of the application.
  • the image processing apparatus 1500 may be a terminal device or a server, and the image processing apparatus 1500 includes:
  • the obtaining module 1501 is used to obtain encoded data
  • the decoding module 1502 is configured to perform entropy decoding on the encoded data to obtain at least one second feature map, where the at least one second feature map includes N third feature values, where N is a positive integer;
  • the acquiring module 1501 is further configured to acquire M target inverse gain values, each target inverse gain value corresponds to a third characteristic value, and the M is a positive integer less than or equal to N;
  • the inverse gain module 1503 is configured to process corresponding third characteristic values according to the M target inverse gain values to obtain M fourth characteristic values;
  • the reconstruction module 1504 is configured to perform image reconstruction on the processed at least one second feature map to obtain a second image, and the processed at least one second feature map includes the M fourth feature values.
  • the M fourth characteristic values are obtained by multiplying the M target inverse gain values and the corresponding third characteristic values respectively.
  • the at least one second characteristic map includes a second target characteristic map, the second target characteristic map includes P third characteristic values, and each third characteristic value of the P third characteristic values The corresponding target inverse gain values are the same, and the P is a positive integer less than or equal to the M.
  • the obtaining module is also used to obtain a target compression code rate
  • the device also includes:
  • a determining module configured to determine M target inverse gain values corresponding to the target compression code rate according to a target mapping relationship, where the target mapping relationship is used to indicate the association relationship between the compression code rate and the inverse gain vector;
  • the target mapping relationship includes multiple compression code rates and multiple inverse gain vectors, and association relationships between multiple compression code rates and multiple inverse gain vectors, and the target compression code rate is the multiple compression One of the code rates, the M target inverse gain values are an element of one of the plurality of inverse gain vectors; or,
  • the target mapping relationship includes an objective function mapping relationship, and when the input of the objective function relationship includes the target compression code rate, the output of the objective function relationship includes the M target inverse gain values.
  • the second image includes a target object
  • the M third feature values are feature values corresponding to the target object in the at least one feature map.
  • the target compression code rate is greater than a first compression code rate and less than a second compression code rate
  • the first compression code rate corresponds to M first inverse gain values
  • the second compression code rate corresponds to M second inverse gain values
  • the M target inverse gain values are obtained by performing an interpolation operation on the M first inverse gain values and the M second inverse gain values.
  • the M first inverse gain values include a first target inverse gain value
  • the M second inverse gain values include a second target inverse gain value
  • the M target inverse gain values include a third target inverse gain value; the first target inverse gain value, the second target inverse gain value, and the third target inverse gain value correspond to the same feature value among the M first feature values, and the third target inverse gain value is obtained by performing an interpolation operation on the first target inverse gain value and the second target inverse gain value.
  • the embodiment of the present application provides an image processing device.
  • the acquiring module 1501 acquires encoded data; the decoding module 1502 entropy-decodes the encoded data to obtain at least one second feature map.
  • the at least one second feature map includes N third feature values, where N is a positive integer; the acquisition module 1501 acquires M target inverse gain values, each target inverse gain value corresponds to a third feature value, and M is a positive integer less than or equal to N;
  • the inverse gain module 1503 respectively processes the corresponding third feature values according to the M target inverse gain values to obtain M fourth feature values; the reconstruction module 1504 performs image reconstruction on the processed at least one second feature map to obtain a second image, and the processed at least one second feature map includes the M fourth feature values.
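  • A matching decoding-side sketch (same illustrative assumptions as the encoding sketch earlier; the inverse gain table and the placeholder reconstruction step are not the disclosed decoding network) applies the M target inverse gain values to the entropy-decoded third feature values; because each inverse gain here is the reciprocal of the corresponding gain, their product stays within a preset range around 1.

```python
import numpy as np

# Hypothetical inverse gain table: reciprocal of the encoding-side gains, so that
# gain * inverse gain == 1 for every channel.
INVERSE_GAIN_VECTORS = {
    0.15: 1.0 / np.linspace(0.4, 0.1, 192),
    1.00: 1.0 / np.linspace(1.6, 0.8, 192),
}

def decode(encoded: bytes, shape, target_rate: float) -> np.ndarray:
    """Entropy-decode (stubbed), apply inverse gains, and hand off to reconstruction."""
    third_values = (np.frombuffer(encoded, dtype=np.int32)
                      .reshape(shape)
                      .astype(np.float32))                      # N third feature values
    inv_gains = INVERSE_GAIN_VECTORS[target_rate]               # M target inverse gain values
    fourth_values = third_values * inv_gains[:, None, None]     # M fourth feature values
    # The decoding network would reconstruct the second image from these maps;
    # it is represented here only by returning the processed feature maps.
    return fourth_values
```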
  • FIG. 16 is a schematic structural diagram of an image processing apparatus 1600 provided by an embodiment of the application.
  • the image processing apparatus 1600 may be a terminal device or a server, and the image processing apparatus 1600 includes:
  • the obtaining module 1601 is used to obtain the first image
  • the feature extraction module 1602 is configured to perform feature extraction on the first image according to the coding network to obtain at least one first feature map, where the at least one first feature map includes N first feature values, where N is a positive integer ;
  • the acquisition module 1601 is also used to acquire a target compression code rate, the target compression code rate corresponds to M initial gain values and M initial inverse gain values; each initial gain value corresponds to a first feature value, each initial inverse gain value corresponds to a first feature value, and M is a positive integer less than or equal to N;
  • the gain module 1603 is configured to process the corresponding first eigenvalues according to the M initial gain values to obtain M second eigenvalues;
  • the quantization and entropy encoding module 1604 is configured to perform quantization and entropy encoding on the processed at least one first feature map according to the quantization network and the entropy encoding network to obtain encoded data and a bit rate loss.
  • the at least one first feature map after the gain processing includes the M second feature values;
  • the decoding module 1605 is configured to perform entropy decoding on the encoded data according to the entropy decoding network to obtain at least one second feature map, where the at least one second feature map includes M third feature values, and each third feature value corresponds to a first feature value;
  • the inverse gain module 1606 is configured to process corresponding third characteristic values according to the M initial inverse gain values to obtain M fourth characteristic values;
  • the reconstruction module 1607 is configured to perform image reconstruction on the processed at least one second feature map according to the decoding network to obtain a second image, where the processed at least one feature map includes the M fourth feature values;
  • the acquiring module 1601 is further configured to acquire the distortion loss of the second image relative to the first image
  • the training module 1608 is used to perform joint training on the first codec network, the M initial gain values, and the M initial inverse gain values using a loss function, until the image distortion value between the first image and the second image reaches a first preset degree, where the image distortion value is related to the bit rate loss and the distortion loss, and the codec network includes the coding network, the quantization network, the entropy coding network, and the entropy decoding network;
  • the output module 1609 is configured to output a second codec network, M target gain values, and M target inverse gain values.
  • the second codec network is a model obtained after iterative training is performed on the first codec network,
  • the M target gain values and the M target inverse gain values are obtained after the M initial gain values and the M initial inverse gain values are iteratively trained.
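  • The joint training described above can be sketched roughly as follows; codec.encode, codec.quantize_and_entropy_code and codec.decode are hypothetical module methods standing in for the coding, quantization/entropy-coding and decoding networks, and the mean-squared-error distortion and the weighting factor lam are illustrative choices rather than the disclosed loss function.

```python
import torch

def training_step(first_image, codec, gains, inverse_gains, optimizer, lam=0.01):
    """One joint optimisation step over the codec network and the gain /
    inverse-gain parameters (all torch tensors / modules, illustrative names)."""
    features = codec.encode(first_image)                        # first feature maps
    gained = features * gains.view(1, -1, 1, 1)                 # second feature values
    quantized, rate_loss = codec.quantize_and_entropy_code(gained)
    decoded = quantized * inverse_gains.view(1, -1, 1, 1)       # fourth feature values
    second_image = codec.decode(decoded)
    distortion_loss = torch.mean((second_image - first_image) ** 2)
    loss = distortion_loss + lam * rate_loss                    # image distortion value
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```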
  • the information entropy of the quantized data obtained by quantizing the at least one first feature map after the gain processing satisfies a preset condition
  • the preset condition is related to the target compression code rate
  • N is a positive integer greater than or equal to M.
  • the preset conditions at least include: the greater the target compression code rate, the greater the information entropy of the quantized data.
  • the M second characteristic values are obtained by performing a product operation on the M target gain values and the corresponding first characteristic values, respectively.
  • the at least one first feature map includes a first target feature map
  • the first target feature map includes P first feature values
  • the target gain values corresponding to each of the P first feature values are the same, and P is a positive integer less than or equal to M.
  • the first image includes a target object
  • the M first feature values are feature values corresponding to the target object in the at least one feature map.
  • the product of each target gain value in the M target gain values and the corresponding target inverse gain value is within a preset range, and the product of each initial gain value in the M initial gain values and the corresponding initial inverse gain value is within the preset range.
  • FIG. 17 is a schematic structural diagram of an execution device provided by an embodiment of this application. The execution device 1700 may specifically be a mobile phone, a tablet, a laptop, a smart wearable device, a monitoring data processing device, or the like, which is not limited here.
  • the image processing device described in the embodiment corresponding to FIG. 14 and FIG. 15 may be deployed on the execution device 1700 to implement the function of the image processing device in the embodiment corresponding to FIG. 14 and FIG. 15.
  • the execution device 1700 includes: a receiver 1701, a transmitter 1702, a processor 1703, and a memory 1704 (the number of processors 1703 in the execution device 1700 may be one or more, and one processor is taken as an example in FIG. 17).
  • the processor 1703 may include an application processor 17031 and a communication processor 17032.
  • the receiver 1701, the transmitter 1702, the processor 1703, and the memory 1704 may be connected by a bus or other methods.
  • the memory 1704 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1703. A part of the memory 1704 may also include a non-volatile random access memory (NVRAM).
  • the memory 1704 stores a processor and operating instructions, executable modules or data structures, or a subset of them, or an extended set of them.
  • the operating instructions may include various operating instructions for implementing various operations.
  • the processor 1703 controls the operation of the execution device.
  • the various components of the execution device are coupled together through a bus system, where the bus system may include a power bus, a control bus, and a status signal bus in addition to a data bus.
  • various buses are referred to as bus systems in the figure.
  • the method disclosed in the foregoing embodiment of the present application may be applied to the processor 1703 or implemented by the processor 1703.
  • the processor 1703 may be an integrated circuit chip with signal processing capabilities. In the implementation process, the steps of the foregoing method can be completed by an integrated logic circuit of hardware in the processor 1703 or instructions in the form of software.
  • the above-mentioned processor 1703 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor, or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processor 1703 can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers.
  • the storage medium is located in the memory 1704, and the processor 1703 reads the information in the memory 1704, and completes the steps of the foregoing method in combination with its hardware.
  • the receiver 1701 can be used to receive input digital or character information, and generate signal input related to the relevant settings and function control of the execution device.
  • the transmitter 1702 can be used to output digital or character information through the first interface; the transmitter 1702 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1702 can also include display devices such as a display .
  • the processor 1703 is configured to execute the image processing method executed by the execution device in the embodiment corresponding to FIG. 9 to FIG. 11.
  • the application processor 17031 is configured to obtain the first image;
  • obtain a target compression code rate, where the target compression code rate corresponds to M target gain values, each target gain value corresponds to a first feature value, and M is a positive integer less than or equal to N;
  • the information entropy of the quantized data obtained by quantizing the processed at least one first feature map satisfies a preset condition
  • the preset condition is related to the target compression code rate
  • the preset conditions include at least:
  • the difference between the compression code rate corresponding to the encoded data and the target compression code rate is within a preset range.
  • the M second characteristic values are obtained by performing a product operation on the M target gain values and the corresponding first characteristic values, respectively.
  • the at least one first feature map includes a first target feature map
  • the first target feature map includes P first feature values
  • the target gain values corresponding to each of the P first feature values are the same, and P is a positive integer less than or equal to M.
  • the application processor 17031 is also used for: determining the M target gain values corresponding to the target compression code rate according to a target mapping relationship, where the target mapping relationship is used to indicate the association relationship between the compression code rate and the M target gain values;
  • the target mapping relationship includes multiple compression code rates and multiple gain vectors, and an association relationship between the multiple compression code rates and the multiple gain vectors; the target compression code rate is one of the multiple compression code rates, and the M target gain values are elements of one of the multiple gain vectors; or,
  • the target mapping relationship includes an objective function mapping relationship, and when the input of the objective function relationship includes the target compression code rate, the output of the objective function relationship includes the M target gain values.
  • the target compression code rate is greater than a first compression code rate and less than a second compression code rate
  • the first compression code rate corresponds to M first gain values
  • the second compression code rate corresponds to M second gain values
  • the M target gain values are obtained by performing an interpolation operation on the M first gain values and the M second gain values.
  • the M first gain values include a first target gain value
  • the M second gain values include a second target gain value
  • the M target gain values include a third target gain value
  • the first target gain value, the second target gain value, and the third target gain value correspond to the same feature value among the M first feature values, and the third target gain value is obtained by performing an interpolation operation on the first target gain value and the second target gain value.
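  • A hedged sketch of the element-wise interpolation described above: given the M first gain values for the first compression code rate and the M second gain values for the second compression code rate, each target gain value is interpolated from the pair of gain values that corresponds to the same feature value. The geometric form used here is one possible choice; the text does not fix the interpolation formula, and a linear blend would fit it equally well.

```python
import numpy as np

def interpolate_gains(rate_t: float,
                      rate_1: float, gains_1: np.ndarray,
                      rate_2: float, gains_2: np.ndarray) -> np.ndarray:
    """Element-wise interpolation: the k-th target gain value is obtained from the
    k-th first gain value and the k-th second gain value (same feature value)."""
    assert rate_1 < rate_t < rate_2
    t = (rate_t - rate_1) / (rate_2 - rate_1)
    # Geometric interpolation (one possible choice); a linear interpolation
    # gains_1 + t * (gains_2 - gains_1) would also satisfy the description.
    return gains_1 ** (1.0 - t) * gains_2 ** t
```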
  • the first image includes a target object
  • the M first feature values are feature values corresponding to the target object in the at least one feature map.
  • each of the M target gain values corresponds to an inverse gain value
  • the inverse gain value is used to process the feature value obtained in the process of decoding the encoded data
  • the product of each target gain value in the M target gain values and the corresponding inverse gain value is within a preset range.
  • the application processor 17031 is also used for:
  • the M fourth characteristic values are obtained by multiplying the M target inverse gain values and the corresponding third characteristic values respectively.
  • the at least one second feature map includes a second target feature map, the second target feature map includes P third feature values, the target inverse gain values corresponding to each of the P third feature values are the same, and P is a positive integer less than or equal to M.
  • the application processor 17031 is further configured to: determine M target inverse gain values corresponding to the target compression code rate according to a target mapping relationship, where the target mapping relationship is used to indicate the relationship between the compression code rate and the inverse gain vector Relationship.
  • the target mapping relationship includes multiple compression code rates and multiple inverse gain vectors, and an association relationship between the multiple compression code rates and the multiple inverse gain vectors; the target compression code rate is one of the multiple compression code rates, and the M target inverse gain values are elements of one of the multiple inverse gain vectors.
  • the target mapping relationship includes an objective function mapping relationship, and when the input of the objective function relationship includes the target compression code rate, the output of the objective function relationship includes the M target inverse gain values.
  • the second image includes a target object
  • the M third feature values are feature values corresponding to the target object in the at least one feature map.
  • the product of each of the M target gain values and the corresponding target inverse gain value is within a preset range.
  • the target compression code rate is greater than a first compression code rate and less than a second compression code rate
  • the first compression code rate corresponds to M first inverse gain values
  • the second compression code rate corresponds to M second inverse gain values
  • the M target inverse gain values are obtained by performing an interpolation operation on the M first inverse gain values and the M second inverse gain values.
  • the M first inverse gain values include a first target inverse gain value
  • the M second inverse gain values include a second target inverse gain value
  • the M target inverse gain values include a third target inverse gain value; the first target inverse gain value, the second target inverse gain value, and the third target inverse gain value correspond to the same feature value among the M first feature values, and the third target inverse gain value is obtained by performing an interpolation operation on the first target inverse gain value and the second target inverse gain value.
  • the application processor 17031 is used to:
  • acquire encoded data and perform entropy decoding on the encoded data to obtain at least one second feature map including N third feature values; acquire M target inverse gain values, where each target inverse gain value corresponds to a third feature value and M is a positive integer less than or equal to N; and process the corresponding third feature values respectively according to the M target inverse gain values to obtain M fourth feature values;
  • Image reconstruction is performed on the processed at least one second feature map to obtain a second image, and the processed at least one second feature map includes the M fourth feature values.
  • the M fourth characteristic values are obtained by multiplying the M target inverse gain values and the corresponding third characteristic values respectively.
  • the at least one second feature map includes a second target feature map, the second target feature map includes P third feature values, the target inverse gain values corresponding to each of the P third feature values are the same, and P is a positive integer less than or equal to M.
  • the application processor 17031 is further configured to: obtain a target compression code rate; determine M target inverse gain values corresponding to the target compression code rate according to a target mapping relationship, where the target mapping relationship is used to indicate the association relationship between the compression code rate and the inverse gain vector; the target mapping relationship includes multiple compression code rates and multiple inverse gain vectors, and an association relationship between the multiple compression code rates and the multiple inverse gain vectors, the target compression code rate is one of the multiple compression code rates, and the M target inverse gain values are elements of one of the multiple inverse gain vectors; or, the target mapping relationship includes an objective function mapping relationship, and when the input of the objective function relationship includes the target compression code rate, the output of the objective function relationship includes the M target inverse gain values.
  • the second image includes a target object
  • the M third feature values are feature values corresponding to the target object in the at least one feature map.
  • the target compression code rate is greater than a first compression code rate and less than a second compression code rate
  • the first compression code rate corresponds to M first inverse gain values
  • the second compression code rate corresponds to M second inverse gain values
  • the M target inverse gain values are obtained by performing an interpolation operation on the M first inverse gain values and the M second inverse gain values.
  • the M first inverse gain values include a first target inverse gain value
  • the M second inverse gain values include a second target inverse gain value
  • the M target inverse gain values include a third target inverse gain value; the first target inverse gain value, the second target inverse gain value, and the third target inverse gain value correspond to the same feature value among the M first feature values, and the third target inverse gain value is obtained by performing an interpolation operation on the first target inverse gain value and the second target inverse gain value.
  • FIG. 18 is a schematic structural diagram of a training device provided in an embodiment of the present application.
  • the image processing apparatus described in the embodiment corresponding to FIG. 16 may be deployed on the training device 1800 to implement the functions of the image processing apparatus in the embodiment corresponding to FIG. 16.
  • the training device 1800 is implemented by one or more servers.
  • the training device 1800 may vary considerably due to different configurations or performance, and may include one or more central processing units (CPU) 1822 (for example, one or more processors), memory 1832, and one or more storage media 1830 (for example, one or more mass storage devices) storing application programs 1842 or data 1844.
  • the memory 1832 and the storage medium 1830 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1830 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the training device.
  • the central processing unit 1822 may be configured to communicate with the storage medium 1830, and execute a series of instruction operations in the storage medium 1830 on the training device 1800.
  • the training device 1800 may also include one or more power supplies 1826, one or more wired or wireless network interfaces 1850, one or more input and output interfaces 1858, and/or one or more operating systems 1841, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
  • the central processing unit 1822 is configured to execute the image processing method executed by the image processing apparatus in the embodiment corresponding to FIG. 16. Specifically, the central processing unit 1822 is used for
  • obtain a target compression code rate, where the target compression code rate corresponds to M initial gain values and M initial inverse gain values; each initial gain value corresponds to a first feature value, each initial inverse gain value corresponds to a first feature value, and M is a positive integer less than or equal to N;
  • the at least one first feature map after the gain processing includes the M second feature values;
  • the image distortion value is related to the bit rate loss and the distortion loss
  • the codec network includes the coding network, the quantization network, the entropy coding network, and the entropy decoding network;
  • the second codec network is a model obtained after iterative training of the first codec network.
  • the M target gain values and the M target inverse gain values are obtained after the M initial gain values and the M initial inverse gain values are iteratively trained.
  • the information entropy of the quantized data obtained by quantizing the at least one first feature map after the gain processing satisfies a preset condition
  • the preset condition is related to the target compression code rate
  • the preset condition at least includes: the greater the target compression code rate, the greater the information entropy of the quantized data.
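  • The preset condition above (a larger target compression code rate corresponds to larger information entropy of the quantized data) can be checked numerically with a small sketch; the Gaussian feature values and the two gain settings are synthetic and only illustrate why scaling the feature values by a larger gain before rounding spreads them over more quantization bins and therefore raises the entropy.

```python
import numpy as np

def quantized_entropy(feature_values: np.ndarray, gain: float) -> float:
    """Information entropy (bits per symbol) of gained-then-quantized feature values."""
    q = np.round(feature_values * gain).astype(np.int64)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
features = rng.normal(0.0, 1.0, size=100_000)   # synthetic first feature values
print(quantized_entropy(features, gain=0.5))    # smaller gain -> lower entropy
print(quantized_entropy(features, gain=4.0))    # larger gain  -> higher entropy
```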
  • the M second characteristic values are obtained by performing a product operation on the M target gain values and the corresponding first characteristic values, respectively.
  • the at least one first feature map includes a first target feature map
  • the first target feature map includes P first feature values
  • the target gain values corresponding to each of the P first feature values are the same, and P is a positive integer less than or equal to M.
  • the first image includes a target object
  • the M first feature values are feature values corresponding to the target object in the at least one feature map.
  • the product of each of the M target gain values and the corresponding target inverse gain value is within a preset range, and each of the M initial gain values is associated with the corresponding The product of the initial inverse gain value is within the preset range.
  • the embodiment of the present application also provides a computer program product which, when running on a computer, causes the computer to execute the steps executed by the execution device in the method described in the embodiment shown in FIG. 17, or causes the computer to execute the steps performed by the training device in the method described in the foregoing embodiment shown in FIG. 18.
  • an embodiment of the present application also provides a computer-readable storage medium, which stores a program for signal processing; when the program runs on a computer, the computer is caused to execute the steps executed by the execution device in the embodiment shown in FIG. 17, or to execute the steps executed by the training device in the method described in the embodiment shown in FIG. 18.
  • the execution device, training device, or terminal device provided by the embodiments of the present application may specifically be a chip.
  • the chip includes a processing unit and a communication unit.
  • the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, Pins or circuits, etc.
  • the processing unit can execute the computer-executable instructions stored in the storage unit, so that the chip in the execution device executes the image processing method described in the embodiments shown in FIGS. 3 to 7, or so that the chip in the training device executes the image processing method described in the embodiment shown in FIG. 13.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as Read-only memory (ROM) or other types of static storage devices that can store static information and instructions, random access memory (RAM), etc.
  • Figure 19 is a schematic diagram of a structure of a chip provided by an embodiment of the application; the chip may be provided as a neural-network processing unit (NPU), and the Host CPU assigns tasks to it.
  • the core part of the NPU is the arithmetic circuit 2003.
  • the controller 2004 controls the arithmetic circuit 2003 to extract matrix data from the memory and perform multiplication.
  • the arithmetic circuit 2003 includes multiple processing units (Process Engine, PE). In some implementations, the arithmetic circuit 2003 is a two-dimensional systolic array. The arithmetic circuit 2003 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 2003 is a general-purpose matrix processor.
  • the arithmetic circuit fetches the corresponding data of matrix B from the weight memory 2002 and caches it on each PE in the arithmetic circuit.
  • the arithmetic circuit fetches the matrix A data and matrix B from the input memory 2001 to perform matrix operations, and the partial result or final result of the obtained matrix is stored in an accumulator 2008.
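  • The matrix path described above (matrix B cached on the processing elements, partial results accumulated in accumulator 2008) behaves functionally like the following tiled multiply-accumulate sketch; the tiling factor and the NumPy implementation are illustrative and say nothing about the actual hardware.

```python
import numpy as np

def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = 16) -> np.ndarray:
    """Accumulate partial matrix products tile by tile; the acc array plays the
    role of the accumulator that stores partial and final results."""
    m, k = a.shape
    k2, n = b.shape
    assert k == k2, "inner dimensions must match"
    acc = np.zeros((m, n), dtype=np.float64)
    for start in range(0, k, tile):
        end = min(start + tile, k)
        acc += a[:, start:end] @ b[start:end, :]   # partial result for this tile
    return acc
```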
  • the unified memory 2006 is used to store input data and output data.
  • the weight data is directly transferred to the weight memory 2002 through the direct memory access controller (DMAC) 2005.
  • the input data is also transferred to the unified memory 2006 through the DMAC.
  • the BIU is the Bus Interface Unit, that is, the bus interface unit 2010, which is used for the interaction between the AXI bus and the DMAC and the instruction fetch buffer (IFB) 2009.
  • the bus interface unit 2010 (Bus Interface Unit, BIU for short) is used for the instruction fetch memory 2009 to obtain instructions from the external memory, and is also used for the storage unit access controller 2005 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 2006 or the weight data to the weight memory 2002 or the input data to the input memory 2001.
  • the vector calculation unit 2007 includes multiple arithmetic processing units, if necessary, further processing the output of the arithmetic circuit, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison and so on. It is mainly used in the calculation of non-convolutional/fully connected layer networks in neural networks, such as Batch Normalization, pixel-level summation, and upsampling of feature planes.
  • the vector calculation unit 2007 can store the processed output vector to the unified memory 2006.
  • the vector calculation unit 2007 may apply a linear function and/or a non-linear function to the output of the arithmetic circuit 2003, for example, linear interpolation of the feature plane extracted by the convolutional layer, or a vector of accumulated values, so as to generate the activation value.
  • the vector calculation unit 2007 generates normalized values, pixel-level summed values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 2003, for example for use in a subsequent layer in a neural network.
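  • As a rough functional analogue of the vector calculation unit's post-processing (normalization followed by an activation applied to the accumulator output), the following sketch uses a batch-normalization-style rescaling and a ReLU; both are example choices, since the text lists several possible operations.

```python
import numpy as np

def postprocess(acc_output: np.ndarray) -> np.ndarray:
    """Illustrative non-convolutional post-processing of the accumulator output:
    normalization-style rescaling followed by a nonlinear activation."""
    normalized = (acc_output - acc_output.mean()) / (acc_output.std() + 1e-6)
    return np.maximum(normalized, 0.0)   # ReLU as one possible activation
```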
  • the instruction fetch buffer 2009 connected to the controller 2004 is used to store instructions used by the controller 2004;
  • the unified memory 2006, the input memory 2001, the weight memory 2002, and the fetch memory 2009 are all On-Chip memories.
  • the external memory is private to the NPU hardware architecture.
  • processor mentioned in any of the foregoing may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the program of the method in the first aspect.
  • the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physically separate.
  • the physical unit can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between the modules indicates that they have a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • this application can be implemented by means of software plus necessary general-purpose hardware; of course, it can also be implemented by dedicated hardware, including dedicated integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and so on. Under normal circumstances, all functions completed by computer programs can easily be implemented with corresponding hardware, and the specific hardware structures used to achieve the same function can be diverse, such as analog circuits, digital circuits, or dedicated circuits. However, for this application, a software program implementation is the better implementation in most cases. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product.
  • the computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disk, and includes several instructions to make a computer device (which can be a personal computer, training device, or network device, etc.) execute the methods of the various embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or in a wireless manner (such as infrared, radio, or microwave).
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a training device or a data center, integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).


Abstract

This application relates to the field of artificial intelligence and discloses an image processing method, including: acquiring a first image; performing feature extraction on the first image to obtain at least one first feature map, where the at least one first feature map includes N first feature values and N is a positive integer; acquiring a target compression code rate, where the target compression code rate corresponds to M target gain values, each target gain value corresponds to a first feature value, and M is a positive integer less than or equal to N; processing the corresponding first feature values respectively according to the M target gain values to obtain M second feature values; and performing quantization and entropy coding on the processed at least one first feature map to obtain encoded data, where the processed at least one first feature map includes the M second feature values. This application can control the compression code rate within a single compression model.

Description

一种图像处理方法以及相关设备
本申请要求于2020年02月07日提交中国国家知识产权局、申请号为202010082808.4、发明名称为“一种图像处理方法以及相关设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能领域,尤其涉及一种图像处理方法以及相关设备。
背景技术
如今多媒体数据占据了互联网的绝大部分流量。对于图像数据的压缩对于多媒体数据的存储和高效传输有着至关重要的作用。所以图像编码是一项具有重大实用价值的技术。
对于图像编码的研究已经有较长的历史了,研究人员提出了大量的方法,并制定了多种国际标准,比如JPEG,JPEG2000,WebP,BPG等图像编码标准。这些编码方法虽然在目前都得到了广泛应用,但是针对现在不断增长的图像数据量及不断出现的新媒体类型,这些传统方法显示出了某些局限性。
近年来,开始有研究人员开展了基于深度学习图像编码方法的研究。有些研究人员已经取得了不错的成果,比如Ballé等人提出了一种端到端优化的图像编码方法,取得了超越目前最好的图像编码性能,甚至超越了目前最好的传统编码标准BPG。不过目前大多数基于深度卷积网络的图像编码都有一个缺陷,即一个训练好的模型针对一种输入图像只能输出一种编码结果,而不能根据实际需求,得到目标压缩码率的编码效果。
发明内容
本申请提供了一种图像处理方法,用于在同一个压缩模型中实现压缩码率的控制。
第一方面,本申请提供一种图像处理方法,所述方法包括:
获取第一图像;对所述第一图像进行特征提取,得到至少一个第一特征图,所述至少一个第一特征图包括N个第一特征值,所述N为正整数;获取目标压缩码率,所述目标压缩码率对应于M个目标增益值,每个目标增益值对应一个第一特征值,所述M为小于或等于N的正整数;根据所述M个目标增益值分别对对应的第一特征值进行处理,得到M个第二特征值,本申请实施例中,可以用处理后的至少一个第一特征图替代原有的第一特征图;对处理后的至少一个第一特征图进行量化和熵编码,得到编码数据,所述处理后的至少一个第一特征图包括所述M个第二特征值。通过上述方式,针对于不同的目标压缩码率,设置不同的目标增益值,从而实现压缩码率的控制。
在第一方面的一种可选设计中,对处理后的至少一个第一特征图进行量化得到的量化数据的信息熵满足预设条件,所述预设条件与所述目标压缩码率有关。
在第一方面的一种可选设计中,所述目标压缩码率越大,所述量化数据的信息熵越大。
在第一方面的一种可选设计中,所述编码数据对应的压缩码率与所述目标压缩码率的差值在预设范围内。
在第一方面的一种可选设计中,所述M个第二特征值为通过将M个目标增益值分别与对应的第一特征值进行乘积运算得到的。
在第一方面的一种可选设计中,所述至少一个第一特征图包括第一目标特征图,所述第一目标特征图包括P个第一特征值,所述P个第一特征值中的每个第一特征值对应的目标增益值相同,所述P为小于或等于所述M的正整数。
在第一方面的一种可选设计中,所述方法还包括:
根据目标映射关系确定所述目标压缩码率对应的M个目标增益值,所述目标映射关系用于表示压缩码率与M个目标增益值之间的关联关系;
其中,所述目标映射关系包括多个压缩码率以及多个增益向量、以及多个压缩码率与多个增益向量之间的关联关系,所述目标压缩码率为所述多个压缩码率中的一个,所述M个目标增益值为所述多个增益向量中的一个向量的元素;或,
所述目标映射关系包括目标函数映射关系,当所述目标函数关系的输入包括所述目标压缩码率时,所述目标函数关系的输出包括所述M个目标增益值。
在第一方面的一种可选设计中,所述目标压缩码率大于第一压缩码率,且小于第二压缩码率,所述第一压缩码率对应于M个第一增益值,所述第二压缩码率对应于M个第二增益值,所述M个目标增益值为对所述M个第一增益值和所述M个第二增益值进行插值运算得到的。
在第一方面的一种可选设计中,所述M个第一增益值包括第一目标增益值,所述M个第二增益值包括第二目标增益值,所述M个目标增益值包括第三目标增益值,所述第一目标增益值、所述第二目标增益值和所述第三目标增益值对应于所述M个第一特征值中的同一个特征值,所述第三目标增益值为通过对所述第一目标增益值和所述第二目标增益值进行插值运算得到的。
在第一方面的一种可选设计中,所述第一图像包括目标对象,所述M个第一特征值为所述至少一个特征图中与所述目标对象对应的特征值。
在第一方面的一种可选设计中,所述M个目标增益值中的每个目标增益值对应于一个反增益值,反增益值用于对所述编码数据进行解码过程中得到的特征值进行处理,所述M个目标增益值中的每个目标增益值与对应的反增益值的乘积在预设范围内。
在第一方面的一种可选设计中,所述方法还包括:对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括N个第三特征值,每个第三特征值对应一个第一特征值;获取M个目标反增益值,每个目标反增益值对应一个第三特征值;根据所述M个目标反增益值分别对对应的第三特征值进行增益,得到M个第四特征值;对反增益处理后的至少一个第二特征图进行图像重构,得到第二图像,所述反增益处理后的至少一个第二特征图包括所述M个第四特征值。
在第一方面的一种可选设计中,所述M个第四特征值为将M个目标反增益值分别与对应的第三特征值进行乘积运算得到的。
在第一方面的一种可选设计中,所述至少一个第二特征图包括第二目标特征图,所述第二目标特征图包括P个第三特征值,所述P个第三特征值中的每个第三特征值对应的目 标反增益值相同,所述P为小于或等于所述M的正整数。
在第一方面的一种可选设计中,所述方法还包括:根据目标映射关系确定所述目标压缩码率对应的M个目标反增益值,所述目标映射关系用于表示压缩码率与反增益向量之间的关联关系。
在第一方面的一种可选设计中,所述目标映射关系包括多个压缩码率以及多个反增益向量、以及多个压缩码率与多个反增益向量之间的关联关系,所述目标压缩码率为所述多个压缩码率中的一个,所述M个目标反增益值为所述多个反增益向量中的一个向量的元素。
在第一方面的一种可选设计中,所述目标映射关系包括目标函数映射关系,当所述目标函数关系的输入包括所述目标压缩码率时,所述目标函数关系的输出包括所述M个目标反增益值。
在第一方面的一种可选设计中,所述第二图像包括目标对象,所述M个第三特征值为所述至少一个特征图中与所述目标对象对应的特征值。
在第一方面的一种可选设计中,所述M个目标增益值中的每个目标增益值与对应的目标反增益值的乘积在预设范围内。
在第一方面的一种可选设计中,所述目标压缩码率大于第一压缩码率,且小于第二压缩码率,所述第一压缩码率对应于M个第一反增益值,所述第二压缩码率对应于M个第二反增益值,所述M个目标反增益值为通过对所述M个第一反增益值和所述M个第二反增益值进行插值运算得到的。
在第一方面的一种可选设计中,所述M个第一反增益值包括第一目标反增益值,所述M个第二反增益值包括第二目标反增益值,所述M个目标反增益值包括第三目标反增益值,所述第一目标反增益值、所述第二目标反增益值和所述第三目标反增益值对应于所述M个第一特征值中的同一个特征值,所述第三目标反增益值为通过对所述第一目标反增益值和所述第二目标反增益值进行插值运算得到的。
第二方面,本申请提供了一种图像处理方法,所述方法包括:
获取编码数据;对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括N个第三特征值,所述N为正整数;获取M个目标反增益值,每个目标反增益值对应一个第三特征值,所述M为小于或等于N的正整数;根据所述M个目标反增益值分别对对应的第三特征值进行处理,得到M个第四特征值;对处理后的至少一个第二特征图进行图像重构,得到第二图像,所述处理后的至少一个第二特征图包括所述M个第四特征值。
在第二方面的一种可选设计中,所述M个第四特征值为通过将M个目标反增益值分别与对应的第三特征值进行乘积运算得到的。
在第二方面的一种可选设计中,所述至少一个第二特征图包括第二目标特征图,所述第二目标特征图包括P个第三特征值,所述P个第三特征值中的每个第三特征值对应的目标反增益值相同,所述P为小于或等于所述M的正整数。
在第二方面的一种可选设计中,所述方法还包括:获取目标压缩码率;根据目标映射关系确定所述目标压缩码率对应的M个目标反增益值,所述目标映射关系用于表示压缩码 率与反增益向量之间的关联关系;其中,所述目标映射关系包括多个压缩码率以及多个反增益向量、以及多个压缩码率与多个反增益向量之间的关联关系,所述目标压缩码率为所述多个压缩码率中的一个,所述M个目标反增益值为所述多个反增益向量中的一个向量的元素;或,所述目标映射关系包括目标函数映射关系,当所述目标函数关系的输入包括所述目标压缩码率时,所述目标函数关系的输出包括所述M个目标反增益值。
在第二方面的一种可选设计中,所述第二图像包括目标对象,所述M个第三特征值为所述至少一个特征图中与所述目标对象对应的特征值。
在第二方面的一种可选设计中,所述目标压缩码率大于第一压缩码率,且小于第二压缩码率,所述第一压缩码率对应于M个第一反增益值,所述第二压缩码率对应于M个第二反增益值,所述M个目标反增益值为通过对所述M个第一反增益值和所述M个第二反增益值进行插值运算得到的。
在第二方面的一种可选设计中,所述M个第一反增益值包括第一目标反增益值,所述M个第二反增益值包括第二目标反增益值,所述M个目标反增益值包括第三目标反增益值,所述第一目标反增益值、所述第二目标反增益值和所述第三目标反增益值对应于所述M个第一特征值中的同一个特征值,所述第三目标反增益值为通过对所述第一目标反增益值和所述第二目标反增益值进行插值运算得到的。
第三方面,本申请提供了一种图像处理方法,所述方法包括:
获取第一图像;
根据编码网络对所述第一图像进行特征提取,得到至少一个第一特征图,所述至少一个第一特征图包括N个第一特征值,所述N为正整数;
获取目标压缩码率,所述目标压缩码率对应于M个初始增益值以及M个初始反增益值,每个初始增益值对应一个第一特征值,每个初始反增益值对应一个第一特征值,所述M为小于或等于N的正整数;
根据所述M个初始增益值分别对对应的第一特征值进行处理,得到M个第二特征值;
根据量化网络和熵编码网络对处理后的至少一个第一特征图进行量化和熵编码,得到编码数据以及码率损失,所述增益处理后的至少一个第一特征图包括所述M个第二特征值;
根据熵解码网络对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括M个第三特征值,每个第三特征值对应一个第一特征值;
根据所述M个初始反增益值分别对对应的第三特征值进行处理,得到M个第四特征值;
根据解码网络对处理后的至少一个第二特征图进行图像重构,得到第二图像,所述处理后的至少一个特征图包括所述M个第四特征值;
获取所述第二图像相对于所述第一图像的失真损失;
利用损失函数对第一编解码网络、M个初始增益值以及M个初始反增益值,进行联合训练,直至第一图像与所述第二图像之间的图像失真值达到第一预设程度,所述图像失真值与所述码率损失以及所述失真损失有关,所述编解码网络包括所述编码网络、量化网络、熵编码网络以及熵解码网络;
输出第二编解码网络、M个目标增益值以及M个目标反增益值,所述第二编解码网络 为所述第一编解码网络执行过迭代训练后得到的模型,所述M个目标增益值以及M个目标反增益值为所述M个初始增益值以及M个初始反增益值执行过迭代训练后得到的。
在第三方面的一种可选设计中,所述增益处理后的至少一个第一特征图进行量化得到的量化数据的信息熵满足预设条件,所述预设条件与所述目标压缩码率有关。
在第三方面的一种可选设计中,所述预设条件至少包括:所述目标压缩码率越大,所述量化数据的信息熵越大。
在第三方面的一种可选设计中,所述M个第二特征值为通过将M个目标增益值分别与对应的第一特征值进行乘积运算得到的。
在第三方面的一种可选设计中,所述至少一个第一特征图包括第一目标特征图,所述第一目标特征图包括P个第一特征值,所述P个第一特征值中的每个第一特征值对应的目标增益值相同,所述P为小于或等于所述M的正整数。
在第三方面的一种可选设计中,所述第一图像包括目标对象,所述M个第一特征值为所述至少一个特征图中与所述目标对象对应的特征值。
在第三方面的一种可选设计中,所述M个目标增益值中的每个目标增益值与对应的目标反增益值的乘积在预设范围内,所述M个初始增益值中的每个初始增益值与对应的初始反增益值的乘积在预设范围内。
第四方面,本申请提供了一种图像处理装置,所述装置包括:
获取模块,用于获取第一图像;
特征提取模块,用于对所述第一图像进行特征提取,得到至少一个第一特征图,所述至少一个第一特征图包括N个第一特征值,所述N为正整数;
所述获取模块,还用于获取目标压缩码率,所述目标压缩码率对应于M个目标增益值,每个目标增益值对应一个第一特征值,所述M为小于或等于N的正整数;
增益模块,用于根据所述M个目标增益值分别对对应的第一特征值进行处理,得到M个第二特征值;
量化和熵编码模块,用于对处理后的至少一个第一特征图进行量化和熵编码,得到编码数据,所述处理后的至少一个第一特征图包括所述M个第二特征值。
在第四方面的一种可选设计中,对处理后的至少一个第一特征图进行量化得到的量化数据的信息熵满足预设条件,所述预设条件与所述目标压缩码率有关。
在第四方面的一种可选设计中,所述预设条件至少包括:
所述目标压缩码率越大,所述量化数据的信息熵越大。
在第四方面的一种可选设计中,所述编码数据对应的压缩码率与所述目标压缩码率的差值在预设范围内。
在第四方面的一种可选设计中,所述M个第二特征值为通过将M个目标增益值分别与对应的第一特征值进行乘积运算得到的。
在第四方面的一种可选设计中,所述至少一个第一特征图包括第一目标特征图,所述第一目标特征图包括P个第一特征值,所述P个第一特征值中的每个第一特征值对应的目标增益值相同,所述P为小于或等于所述M的正整数。
在第四方面的一种可选设计中,所述装置还包括:
确定模块,用于根据目标映射关系确定所述目标压缩码率对应的M个目标增益值,所述目标映射关系用于表示压缩码率与M个目标增益值之间的关联关系;
其中,所述目标映射关系包括多个压缩码率以及多个增益向量、以及多个压缩码率与多个增益向量之间的关联关系,所述目标压缩码率为所述多个压缩码率中的一个,所述M个目标增益值为所述多个增益向量中的一个向量的元素;或,
所述目标映射关系包括目标函数映射关系,当所述目标函数关系的输入包括所述目标压缩码率时,所述目标函数关系的输出包括所述M个目标增益值。
在第四方面的一种可选设计中,所述目标压缩码率大于第一压缩码率,且小于第二压缩码率,所述第一压缩码率对应于M个第一增益值,所述第二压缩码率对应于M个第二增益值,所述M个目标增益值为对所述M个第一增益值和所述M个第二增益值进行插值运算得到的。
在第四方面的一种可选设计中,所述M个第一增益值包括第一目标增益值,所述M个第二增益值包括第二目标增益值,所述M个目标增益值包括第三目标增益值,所述第一目标增益值、所述第二目标增益值和所述第三目标增益值对应于所述M个第一特征值中的同一个特征值,所述第三目标增益值为通过对所述第一目标增益值和所述第二目标增益值进行插值运算得到的。
在第四方面的一种可选设计中,所述第一图像包括目标对象,所述M个第一特征值为所述至少一个特征图中与所述目标对象对应的特征值。
在第四方面的一种可选设计中,所述M个目标增益值中的每个目标增益值对应于一个反增益值,反增益值用于对所述编码数据进行解码过程中得到的特征值进行处理,所述M个目标增益值中的每个目标增益值与对应的反增益值的乘积在预设范围内。
在第四方面的一种可选设计中,所述装置还包括:
解码模块,用于对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括N个第三特征值,每个第三特征值对应一个第一特征值;
所述获取模块,还用于获取M个目标反增益值,每个目标反增益值对应一个第三特征值;
所述装置还包括:
反增益模块,用于根据所述M个目标反增益值分别对对应的第三特征值进行增益,得到M个第四特征值;
重构模块,用于对反增益处理后的至少一个第二特征图进行图像重构,得到第二图像,所述反增益处理后的至少一个第二特征图包括所述M个第四特征值。
在第四方面的一种可选设计中,所述M个第四特征值为将M个目标反增益值分别与对应的第三特征值进行乘积运算得到的。
在第四方面的一种可选设计中,所述至少一个第二特征图包括第二目标特征图,所述第二目标特征图包括P个第三特征值,所述P个第三特征值中的每个第三特征值对应的目标反增益值相同,所述P为小于或等于所述M的正整数。
在第四方面的一种可选设计中,所述确定模块,还用于:
根据目标映射关系确定所述目标压缩码率对应的M个目标反增益值,所述目标映射关系用于表示压缩码率与反增益向量之间的关联关系。
在第四方面的一种可选设计中,所述目标映射关系包括多个压缩码率以及多个反增益向量、以及多个压缩码率与多个反增益向量之间的关联关系,所述目标压缩码率为所述多个压缩码率中的一个,所述M个目标反增益值为所述多个反增益向量中的一个向量的元素。
在第四方面的一种可选设计中,所述目标映射关系包括目标函数映射关系,当所述目标函数关系的输入包括所述目标压缩码率时,所述目标函数关系的输出包括所述M个目标反增益值。
在第四方面的一种可选设计中,所述第二图像包括目标对象,所述M个第三特征值为所述至少一个特征图中与所述目标对象对应的特征值。
在第四方面的一种可选设计中,所述M个目标增益值中的每个目标增益值与对应的目标反增益值的乘积在预设范围内。
在第四方面的一种可选设计中,所述目标压缩码率大于第一压缩码率,且小于第二压缩码率,所述第一压缩码率对应于M个第一反增益值,所述第二压缩码率对应于M个第二反增益值,所述M个目标反增益值为通过对所述M个第一反增益值和所述M个第二反增益值进行插值运算得到的。
在第四方面的一种可选设计中,所述M个第一反增益值包括第一目标反增益值,所述M个第二反增益值包括第二目标反增益值,所述M个目标反增益值包括第三目标反增益值,所述第一目标反增益值、所述第二目标反增益值和所述第三目标反增益值对应于所述M个第一特征值中的同一个特征值,所述第三目标反增益值为通过对所述第一目标反增益值和所述第二目标反增益值进行插值运算得到的。
第五方面,本申请提供了一种图像处理装置,所述装置包括:
获取模块,用于获取编码数据;
解码模块,用于对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括N个第三特征值,所述N为正整数;
所述获取模块,还用于获取M个目标反增益值,每个目标反增益值对应一个第三特征值,所述M为小于或等于N的正整数;
反增益模块,用于根据所述M个目标反增益值分别对对应的第三特征值进行处理,得到M个第四特征值;
重构模块,用于对处理后的至少一个第二特征图进行图像重构,得到第二图像,所述处理后的至少一个第二特征图包括所述M个第四特征值。
在第五方面的一种可选设计中,所述M个第四特征值为通过将M个目标反增益值分别与对应的第三特征值进行乘积运算得到的。
在第五方面的一种可选设计中,所述至少一个第二特征图包括第二目标特征图,所述第二目标特征图包括P个第三特征值,所述P个第三特征值中的每个第三特征值对应的目标反增益值相同,所述P为小于或等于所述M的正整数。
在第五方面的一种可选设计中,所述获取模块,还用于获取目标压缩码率;
所述装置还包括:
确定模块,用于根据目标映射关系确定所述目标压缩码率对应的M个目标反增益值,所述目标映射关系用于表示压缩码率与反增益向量之间的关联关系;
其中,所述目标映射关系包括多个压缩码率以及多个反增益向量、以及多个压缩码率与多个反增益向量之间的关联关系,所述目标压缩码率为所述多个压缩码率中的一个,所述M个目标反增益值为所述多个反增益向量中的一个向量的元素;或,
所述目标映射关系包括目标函数映射关系,当所述目标函数关系的输入包括所述目标压缩码率时,所述目标函数关系的输出包括所述M个目标反增益值。
在第五方面的一种可选设计中,所述第二图像包括目标对象,所述M个第三特征值为所述至少一个特征图中与所述目标对象对应的特征值。
在第五方面的一种可选设计中,所述目标压缩码率大于第一压缩码率,且小于第二压缩码率,所述第一压缩码率对应于M个第一反增益值,所述第二压缩码率对应于M个第二反增益值,所述M个目标反增益值为通过对所述M个第一反增益值和所述M个第二反增益值进行插值运算得到的。
在第五方面的一种可选设计中,所述M个第一反增益值包括第一目标反增益值,所述M个第二反增益值包括第二目标反增益值,所述M个目标反增益值包括第三目标反增益值,所述第一目标反增益值、所述第二目标反增益值和所述第三目标反增益值对应于所述M个第一特征值中的同一个特征值,所述第三目标反增益值为通过对所述第一目标反增益值和所述第二目标反增益值进行插值运算得到的。
第六方面,本申请提供了一种图像处理装置,所述装置包括:
获取模块,用于获取第一图像;
特征提取模块,用于根据编码网络对所述第一图像进行特征提取,得到至少一个第一特征图,所述至少一个第一特征图包括N个第一特征值,所述N为正整数;
所述获取模块,还用于获取目标压缩码率,所述目标压缩码率对应于M个初始增益值以及M个初始反增益值,每个初始增益值对应一个第一特征值,每个初始反增益值对应一个第一特征值,所述M为小于或等于N的正整数;
增益模块,用于根据所述M个初始增益值分别对对应的第一特征值进行处理,得到M个第二特征值;
量化和熵编码模块,用于根据量化网络和熵编码网络对处理后的至少一个第一特征图进行量化和熵编码,得到编码数据以及码率损失,所述增益处理后的至少一个第一特征图包括所述M个第二特征值;
解码模块,用于根据熵解码网络对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括M个第三特征值,每个第三特征值对应一个第一特征值;
反增益模块,用于根据所述M个初始反增益值分别对对应的第三特征值进行处理,得到M个第四特征值;
重构模块,用于根据解码网络对处理后的至少一个第二特征图进行图像重构,得到第 二图像,所述处理后的至少一个特征图包括所述M个第四特征值;
所述获取模块,还用于获取所述第二图像相对于所述第一图像的失真损失;
训练模块,用于利用损失函数对第一编解码网络、M个初始增益值以及M个初始反增益值,进行联合训练,直至第一图像与所述第二图像之间的图像失真值达到第一预设程度,所述图像失真值与所述码率损失以及所述失真损失有关,所述编解码网络包括所述编码网络、量化网络、熵编码网络以及熵解码网络;
输出模块,用于输出第二编解码网络、M个目标增益值以及M个目标反增益值,所述第二编解码网络为所述第一编解码网络执行过迭代训练后得到的模型,所述M个目标增益值以及M个目标反增益值为所述M个初始增益值以及M个初始反增益值执行过迭代训练后得到的。
在第六方面的一种可选设计中,所述增益处理后的至少一个第一特征图进行量化得到的量化数据的信息熵满足预设条件,所述预设条件与所述目标压缩码率有关。
在第六方面的一种可选设计中,所述预设条件至少包括:
所述目标压缩码率越大,所述量化数据的信息熵越大。
在第六方面的一种可选设计中,所述M个第二特征值为通过将M个目标增益值分别与对应的第一特征值进行乘积运算得到的。
在第六方面的一种可选设计中,所述至少一个第一特征图包括第一目标特征图,所述第一目标特征图包括P个第一特征值,所述P个第一特征值中的每个第一特征值对应的目标增益值相同,所述P为小于或等于所述M的正整数。
在第六方面的一种可选设计中,所述第一图像包括目标对象,所述M个第一特征值为所述至少一个特征图中与所述目标对象对应的特征值。
在第六方面的一种可选设计中,所述M个目标增益值中的每个目标增益值与对应的目标反增益值的乘积在预设范围内,所述M个初始增益值中的每个目标增益值与对应的初始反增益值的乘积在预设范围内。
第七方面,本申请实施例提供了一种执行设备,可以包括存储器、处理器以及总线系统,其中,存储器用于存储程序,处理器用于执行存储器中的程序,包括如下步骤:
获取第一图像;
对所述第一图像进行特征提取,得到至少一个第一特征图,所述至少一个第一特征图包括N个第一特征值,所述N为正整数;
获取目标压缩码率,所述目标压缩码率对应于M个目标增益值,每个目标增益值对应一个第一特征值,所述M为小于或等于N的正整数;
根据所述M个目标增益值分别对对应的第一特征值进行处理,得到M个第二特征值;
对处理后的至少一个第一特征图进行量化和熵编码,得到编码数据,所述处理后的至少一个第一特征图包括所述M个第二特征值。
在第七方面的一种可选设计中,执行设备为虚拟现实VR设备、手机、平板、笔记本电脑、服务器或者智能穿戴设备。
本申请第七方面中,处理器还可以用于执行第一方面的各个可能实现方式中执行设备 执行的步骤,具体均可以参阅第一方面,此处不再赘述。
第八方面,本申请实施例提供了一种执行设备,可以包括存储器、处理器以及总线系统,其中,存储器用于存储程序,处理器用于执行存储器中的程序,包括如下步骤:
获取编码数据;
对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括N个第三特征值,所述N为正整数;
获取M个目标反增益值,每个目标反增益值对应一个第三特征值,所述M为小于或等于N的正整数;
根据所述M个目标反增益值分别对对应的第三特征值进行处理,得到M个第四特征值;
对处理后的至少一个第二特征图进行图像重构,得到第二图像,所述处理后的至少一个第二特征图包括所述M个第四特征值。
在第八方面的一种可选设计中,执行设备为虚拟现实VR设备、手机、平板、笔记本电脑、服务器或者智能穿戴设备。
本申请第八方面中,处理器还可以用于执行第二方面的各个可能实现方式中执行设备执行的步骤,具体均可以参阅第二方面,此处不再赘述。
第九方面,本申请实施例提供了一种训练设备,可以包括存储器、处理器以及总线系统,其中,存储器用于存储程序,处理器用于执行存储器中的程序,包括如下步骤:
获取第一图像;
根据编码网络对所述第一图像进行特征提取,得到至少一个第一特征图,所述至少一个第一特征图包括N个第一特征值,所述N为正整数;
获取目标压缩码率,所述目标压缩码率对应于M个初始增益值以及M个初始反增益值,每个初始增益值对应一个第一特征值,每个初始反增益值对应一个第一特征值,所述M为小于或等于N的正整数;
根据所述M个初始增益值分别对对应的第一特征值进行处理,得到M个第二特征值;
根据量化网络和熵编码网络对处理后的至少一个第一特征图进行量化和熵编码,得到编码数据以及码率损失,所述增益处理后的至少一个第一特征图包括所述M个第二特征值;
根据熵解码网络对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括M个第三特征值,每个第三特征值对应一个第一特征值;
根据所述M个初始反增益值分别对对应的第三特征值进行处理,得到M个第四特征值;
根据解码网络对处理后的至少一个第二特征图进行图像重构,得到第二图像,所述处理后的至少一个特征图包括所述M个第四特征值;
获取所述第二图像相对于所述第一图像的失真损失;
利用损失函数对第一编解码网络、M个初始增益值以及M个初始反增益值,进行联合训练,直至第一图像与所述第二图像之间的图像失真值达到第一预设程度,所述图像失真值与所述码率损失以及所述失真损失有关,所述编解码网络包括所述编码网络、量化网络、熵编码网络以及熵解码网络;
输出第二编解码网络、M个目标增益值以及M个目标反增益值,所述第二编解码网络 为所述第一编解码网络执行过迭代训练后得到的模型,所述M个目标增益值以及M个目标反增益值为所述M个初始增益值以及M个初始反增益值执行过迭代训练后得到的。
本申请第九方面中,处理器还可以用于执行第三方面的各个可能实现方式中执行设备执行的步骤,具体均可以参阅第三方面,此处不再赘述。
第十方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,当其在计算机上运行时,使得计算机执行上述第一方面至第三方面任一所述的图像处理方法。
第十一方面,本申请实施例提供了一种计算机程序,当其在计算机上运行时,使得计算机执行上述第一方面至第三方面任一所述的图像处理方法。
第十二方面,本申请提供了一种芯片系统,该芯片系统包括处理器,用于支持执行设备或训练设备实现上述方面中所涉及的功能,例如,发送或处理上述方法中所涉及的数据和/或信息。在一种可能的设计中,所述芯片系统还包括存储器,所述存储器,用于保存执行设备或训练设备必要的程序指令和数据。该芯片系统,可以由芯片构成,也可以包括芯片和其他分立器件。
本申请实施例提供了一种图像处理方法,获取第一图像;对所述第一图像进行特征提取,得到至少一个第一特征图,所述至少一个第一特征图包括N个第一特征值,所述N为正整数;获取目标压缩码率,所述目标压缩码率对应于M个目标增益值,每个目标增益值对应一个第一特征值,所述M为小于或等于N的正整数;根据所述M个目标增益值分别对对应的第一特征值进行处理,得到M个第二特征值;对处理后的至少一个第一特征图进行量化和熵编码,得到编码数据,所述处理后的至少一个第一特征图包括所述M个第二特征值。通过上述方式,针对于不同的目标压缩码率,设置不同的目标增益值,从而实现压缩码率的控制。
附图说明
图1为人工智能主体框架的一种结构示意图;
图2a为本申请实施例的应用场景示意;
图2b为本申请实施例的应用场景示意;
图3为本申请实施例提供的一种图像处理方法的实施例示意;
图4为一种基于CNN的图像处理过程示意;
图5a为本申请实施例中不同压缩码率的特征图的信息熵分布示意;
图5b为本申请实施例中不同压缩码率的特征图的信息熵分布示意;
图6为本申请实施例提供的一种目标映射函数关系的示意;
图7为本申请实施例提供的一种图像处理方法的实施例示意;
图8为本申请实施例提供的一种图像压缩的流程示意;
图9为本申请实施例的一种压缩效果示意;
图10为本申请实施例的一种训练过程示意;
图11为本申请实施例的一种图像处理过程示意;
图12为本申请实施例提供的图像处理系统的一种系统架构图;
图13为本申请实施例提供的图像处理方法的一种流程示意图;
图14为本申请实施例提供的图像处理装置的一种结构示意图;
图15为本申请实施例提供的图像处理装置的一种结构示意图;
图16为本申请实施例提供的图像处理装置的一种结构示意图;
图17为本申请实施例提供的执行设备的一种结构示意图;
图18为本申请实施例提供的训练设备一种结构示意图;
图19为本申请实施例提供的芯片的一种结构示意图。
具体实施方式
下面结合本发明实施例中的附图对本发明实施例进行描述。本发明的实施方式部分使用的术语仅用于对本发明的具体实施例进行解释,而非旨在限定本发明。
下面结合附图,对本申请的实施例进行描述。本领域普通技术人员可知,随着技术的发展和新场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的术语在适当情况下可以互换,这仅仅是描述本申请的实施例中对相同属性的对象在描述时所采用的区分方式。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,以便包含一系列单元的过程、方法、系统、产品或设备不必限于那些单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它单元。
首先对人工智能系统总体工作流程进行描述,请参见图1,图1示出的为人工智能主体框架的一种结构示意图,下面从“智能信息链”(水平轴)和“IT价值链”(垂直轴)两个维度对上述人工智能主题框架进行阐述。其中,“智能信息链”反映从数据的获取到处理的一列过程。举例来说,可以是智能信息感知、智能信息表示与形成、智能推理、智能决策、智能执行与输出的一般过程。在这个过程中,数据经历了“数据—信息—知识—智慧”的凝练过程。“IT价值链”从人智能的底层基础设施、信息(提供和处理技术实现)到系统的产业生态过程,反映人工智能为信息技术产业带来的价值。
(1)基础设施
基础设施为人工智能系统提供计算能力支持,实现与外部世界的沟通,并通过基础平台实现支撑。通过传感器与外部沟通;计算能力由智能芯片(CPU、NPU、GPU、ASIC、FPGA等硬件加速芯片)提供;基础平台包括分布式计算框架及网络等相关的平台保障和支持,可以包括云存储和计算、互联互通网络等。举例来说,传感器和外部沟通获取数据,这些数据提供给基础平台提供的分布式计算系统中的智能芯片进行计算。
(2)数据
基础设施的上一层的数据用于表示人工智能领域的数据来源。数据涉及到图形、图像、语音、文本,还涉及到传统设备的物联网数据,包括已有系统的业务数据以及力、位移、液位、温度、湿度等感知数据。
(3)数据处理
数据处理通常包括数据训练,机器学习,深度学习,搜索,推理,决策等方式。
其中,机器学习和深度学习可以对数据进行符号化和形式化的智能信息建模、抽取、预处理、训练等。
推理是指在计算机或智能系统中,模拟人类的智能推理方式,依据推理控制策略,利用形式化的信息进行机器思维和求解问题的过程,典型的功能是搜索与匹配。
决策是指智能信息经过推理后进行决策的过程,通常提供分类、排序、预测等功能。
(4)通用能力
对数据经过上面提到的数据处理后,进一步基于数据处理的结果可以形成一些通用的能力,比如可以是算法或者一个通用系统,例如,翻译,文本的分析,计算机视觉的处理,语音识别,图像的识别等等。
(5)智能产品及行业应用
智能产品及行业应用指人工智能系统在各领域的产品和应用,是对人工智能整体解决方案的封装,将智能信息决策产品化、实现落地应用,其应用领域主要包括:智能终端、智能交通、智能医疗、自动驾驶、平安城市等。
本申请可以应用于人工智能领域的图像处理领域中,下面将对多个落地到产品的多个应用场景进行介绍。
一、应用于终端设备中的图像压缩过程
本申请实施例提供的图像压缩方法可以应用于终端设备中的图像压缩过程,具体的,可以应用于终端设备上的相册、视频监控等。具体的,可以参照图2a,图2a为本申请实施例的应用场景示意,如图2a中示出的那样,终端设备可以获取到待压缩图片,其中待压缩图片可以是相机拍摄的照片或是从视频中截取的一帧画面。终端设备可以通过嵌入式神经网络(neural-network processing unit,NPU)中的人工智能(artificial intelligence,AI)编码单元对获取到的待压缩图片进行特征提取,将图像数据变换成冗余度更低的输出特征,且产生输出特征中各点的概率估计,中央处理器(central processing unit,CPU)通过输出特征中各点的概率估计对提取获得的输出特征进行算术编码,降低输出特征的编码冗余,进一步降低图像压缩过程中的数据传输量,并将编码得到的编码数据以数据文件的形式保存在对应的存储位置。当用户需要获取上述存储位置中保存的文件时,CPU可以在相应的存储位置获取并加载上述保存的文件,并基于算数解码获取到解码得到的特征图,通过NPU中的AI解码单元对特征图进行重构,得到重构的图像。
二、应用于云侧的图像压缩过程
本申请实施例提供的图像压缩方法可以应用于云侧的图像压缩过程,具体的,可以应用于云侧服务器上的云相册等功能。具体的,可以参照图2b,图2b为本申请实施例的应用场景示意,如图2b中示出的那样,终端设备可以获取到待压缩图片,其中待压缩图片可以是相机拍摄的照片或是从视频中截取的一帧画面。终端设备可以通过CPU对待压缩图片进行无损编码压缩,得到编码数据,例如但不限于基于现有技术中的任意一种无损压缩方法, 终端设备可以将编码数据传输至云侧的服务器,服务器可以对接收到的编码数据进行相应的无损解码,得到待压缩图像,服务器可以通过图形处理器(graphics processing unit,GPU)中的AI编码单元对获取到的待压缩图片进行特征提取,将图像数据变换成冗余度更低的输出特征,且产生输出特征中各点的概率估计,CPU通过输出特征中各点的概率估计对提取获得的输出特征进行算术编码,降低输出特征的编码冗余,进一步降低图像压缩过程中的数据传输量,并将编码得到的编码数据以数据文件的形式保存在对应的存储位置。当用户需要获取上述存储位置中保存的文件时,CPU可以在相应的存储位置获取并加载上述保存的文件,并基于算数解码获取到解码得到的特征图,通过NPU中的AI解码单元对特征图进行重构,得到重构的图像,服务器可以通过CPU对待压缩图片进行无损编码压缩,得到编码数据,例如但不限于基于现有技术中的任意一种无损压缩方法,服务器可以将编码数据传输至终端设备,终端设备可以对接收到的编码数据进行相应的无损解码,得到解码后的图像。
本申请实施例中,可以在AI编码单元至量化单元之间增加对特征图中的特征值进行增益的步骤,以及在算数解码和AI解码单元之间增加对特征图中的特征值进行反增益的步骤,接下来将对本申请实施例中的图像处理方法进行详细的描述。
由于本申请实施例涉及大量神经网络的应用,为了便于理解,下面先对本申请实施例可能涉及的神经网络的相关术语和概念进行介绍。
(1)神经网络
神经网络可以是由神经单元组成的,神经单元可以是指以xs和截距1为输入的运算单元,该运算单元的输出可以为:
$$f\left(\sum_{s=1}^{n} W_s x_s + b\right)$$
其中,s=1、2、……、n,n为大于1的自然数,Ws为Xs的权重,b为神经单元的偏置。f为神经单元的激活函数(activation functions),用于将非线性特性引入神经网络中,来将神经单元中的输入信号转换为输出信号。该激活函数的输出信号可以作为下一层卷积层的输入,激活函数可以是sigmoid函数。神经网络是将多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。
(2)深度神经网络
深度神经网络(deep neural network,DNN),也称多层神经网络,可以理解为具有多层隐含层的神经网络。按照不同层的位置对DNN进行划分,DNN内部的神经网络可以分为三类:输入层,隐含层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是隐含层。层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。
虽然DNN看起来很复杂,但是就每一层的工作来说,其实并不复杂,简单来说就是如下线性关系表达式:
$$\vec{y}=\alpha\left(W\vec{x}+\vec{b}\right)$$
其中，$\vec{x}$是输入向量，$\vec{y}$是输出向量，$\vec{b}$是偏移向量，W是权重矩阵（也称系数），α()是激活函数。每一层仅仅是对输入向量$\vec{x}$经过如此简单的操作得到输出向量$\vec{y}$。由于DNN层数多，系数W和偏移向量$\vec{b}$的数量也比较多。这些参数在DNN中的定义如下所述：以系数W为例：假设在一个三层的DNN中，第二层的第4个神经元到第三层的第2个神经元的线性系数定义为$W^{3}_{24}$，上标3代表系数W所在的层数，而下标对应的是输出的第三层索引2和输入的第二层索引4。综上，第L-1层的第k个神经元到第L层的第j个神经元的系数定义为$W^{L}_{jk}$。
需要注意的是,输入层是没有W参数的。在深度神经网络中,更多的隐含层让网络更能够刻画现实世界中的复杂情形。理论上而言,参数越多的模型复杂度越高,“容量”也就越大,也就意味着它能完成更复杂的学习任务。训练深度神经网络的也就是学习权重矩阵的过程,其最终目的是得到训练好的深度神经网络的所有层的权重矩阵(由很多层的向量W形成的权重矩阵)。
(3)卷积神经网络
卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器,该特征抽取器可以看作是滤波器。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取图像信息的方式与位置无关。卷积核可以以随机大小的矩阵的形式初始化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。
(4)损失函数
在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有初始化的过程,即为深度神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断地调整,直到深度神经网络能够预测出真正想要的目标值或与真正想要的目标值非常接近的值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。
(5)反向传播算法
神经网络可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始的神经网络模型中参数的大小,使得神经网络模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的神经网络模型中参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的神经网络模型的参数,例如权重矩阵。
本申请实施例首先以应用场景为终端设备为例进行说明。
作为一种示例,所述终端设备可以为手机、平板、笔记本电脑、智能穿戴设备等,终端设备可以对获取到的图片进行压缩处理。作为另一示例,所述终端设备可以为虚拟现实(virtual reality,VR)设备。作为另一示例,本申请实施例也可以应用于智能监控中,可以在所述智能监控中配置相机,则智能监控可以通过相机获取待压缩图片等,应当理解,本申请实施例还可以应用于其他需要进行图像压缩的场景中,此处不再对其他应用场景进行一一列举。
参照图3,图3为本申请实施例提供的一种图像处理方法的实施例示意,如图3示出的那样,本申请实施例提供的一种图像处理方法包括:
301、获取第一图像。
本申请实施例中,第一图像为待压缩的图像,其中,第一图像可以是上述终端设备通过摄像头拍摄到的图像,或者,该第一图像还可以是从终端设备内部获得的图像(例如,终端设备的相册中存储的图像,或者,终端设备从云端获取的图片)。应理解,上述第一图像可以是具有图像压缩需求的图像,本申请并不对待处理图像的来源作任何限定。
302、对所述第一图像进行特征提取,得到至少一个第一特征图,所述至少一个第一特征图包括N个第一特征值,所述N为正整数。
本申请实施例中,可选的,终端设备可以基于CNN对所述第一图像进行特征提取,得到至少一个第一特征图,在下文中,第一特征图也可以称为通道特征图图像,其中每个语义通道对应一个第一特征图(通道特征图图像)。
本申请实施例中,参照图4,图4为一种基于CNN的图像处理过程示意,图4示出了第一图像401、CNN 402以及多个第一特征图403,其中CNN 402可以包括多个CNN层。
例如,CNN 402可以将输入数据(第一图像)的左上3×3像素乘以权重,并将其映射至第一特征图的左上端的神经元。要被乘的权重将也是3×3。此后,在相同的处理中,CNN 402从左到右以及从上到下逐个地扫描输入数据(第一图像),并且乘以权重以映射特征图的神经元。这里,使用的3×3权重被称为滤波器或滤波器核。也就是说,在CNN 402中应用滤波器的过程是使用滤波器核执行卷积运算的过程,并且所提取的结果被称为“第一特征图”,其中,第一特征图也可以为称为多通道特征图图像,术语“多通道特征图图像”可以指与多个通道对应的特征图图像集。根据实施例,可以由CNN 402生成多通道特征图图像,CNN 402也被称为CNN的“特征提取层”或“卷积层”。CNN的层可以定义输出到输入的映射。将由层定义的映射作为一个或多个要被应用于输入数据的滤波器核(卷积核)来执行,以生成要被输出到下一层的特征图图像。输入数据可以是图像或特定层的特征映像图像。
参照图4,在向前执行期间,CNN 402接收第一图像401并作为输出而生成多通道特征图图像403。另外,在向前执行期间,下一层402接收多通道特征图图像403作为输入,并作为输出而生成多通道特征图图像403。然后,每一个后续层将接收在前一层中生成的多通道特征图图像,并作为输出而生成下一多通道特征图图像。最后,通过接收在第(N)层中生成的多通道特征图图像。
同时,除了应用将输入特征图图像映射到输出特征图图像的卷积核的操作之外,还可 以执行其他处理操作。其他处理操作的示例可以包括但不限于诸如激活功能、池化、重采样等的应用。
需要说明的是,以上仅为对所述第一图像进行特征提取的一种实现方式,在实际应用中,并不限定特征提取的具体实现方式。
本申请实施例中,通过上述方式,通过CNN卷积神经网络将原始图像(第一图像)变换到另一空间(至少一个第一特征图)。可选的,第一特征图的数量为192,即语义通道的数量为192,每一个语义通道对应一个第一特征图。本申请实施例中,至少一个第一特征图可以为一个三维张量的形式,其尺寸可以为192×w×h,其中,w×h为单个通道的第一特征图对应的矩阵的宽与长。
本申请实施例中,可以对所述第一图像进行特征提取,得到多个特征值,其中至少一个第一特征图可以为多个特征值中的部分或全部特征图。对比某些对压缩结果产生影响较小的语义通道对应的特征图,可以不进行增益,此时至少一个第一特征图为多个特征值中的部分特征图。
本申请实施例中,所述至少一个第一特征图包括N个第一特征值,所述N为正整数。
303、获取目标压缩码率,所述目标压缩码率对应于M个目标增益值,每个目标增益值对应一个第一特征值,所述M为小于或等于N的正整数。
本申请实施例中,终端设备可以获取目标压缩码率,其中目标压缩码率可以是用户指定的,或者是终端设备基于第一图像确定的,这里并不限定。
本申请实施例中,目标压缩码率对应于M个目标增益值,每个目标增益值对应一个第一特征值,所述M为小于或等于N的正整数。即目标压缩码率与M个目标增益值之间存在一定的关联关系,终端设备当获取到目标压缩码率之后,就可以根据获取到的目标压缩码率确定对应的M个目标增益值。
可选地,在一种实施例中,终端设备可以根据目标映射关系确定所述目标压缩码率对应的M个目标增益值,所述目标映射关系用于表示压缩码率与M个目标增益值之间的关联关系。其中,目标映射关系可以为预先存储好的映射关系,当终端设备获取到目标压缩码率之后,可以直接在相应的存储位置查找到与目标压缩码率对应的目标映射关系。
可选地,在一种实施例中,所述目标映射关系可以包括多个压缩码率以及多个增益向量、以及多个压缩码率与多个增益向量之间的关联关系,所述目标压缩码率为所述多个压缩码率中的一个,所述M个目标增益值为所述多个增益向量中的一个向量的元素。
本申请实施例中,目标映射关系可以为预设的表格或者是其他形式,其包括了多个压缩码率以及各个压缩码率对应的增益向量,其中增益向量可以包括多个元素,每个压缩码率对应M个目标增益值,其中M个目标增益值为各个压缩码率对应的增益向量中包括的元素。
可选地,在一种实施例中,所述目标映射关系可以包括目标函数映射关系,当所述目标函数关系的输入包括所述目标压缩码率时,所述目标函数关系的输出包括所述M个目标增益值。
本申请实施例中,目标映射关系可以为预设的目标函数映射关系或者是其他形式,该 目标函数映射关系可以至少表示压缩码率与增益值的对应关系,其中,当所述目标函数关系的输入包括所述目标压缩码率时,所述目标函数关系的输出包括所述M个目标增益值。
需要说明的是,本申请实施例中,M个目标增益值中可能有部分或全部增益值的大小是相同的,此时,可以用少于M的数量来表示M个第一特征值中与各个第一特征值对应的目标增益值。例如,在一种实施例中,所述至少一个第一特征图包括第一目标特征图,所述第一目标特征图包括P个第一特征值,所述P个第一特征值中的每个第一特征值对应的目标增益值相同,所述P为小于或等于所述M的正整数,即P个第一特征值为相同语义通道的特征值,其对应的目标增益值相同,此时,可以用一个增益值来表示上述P个第一特征值。
在另一种实施例中,如果每个语义通道对应的各个第一特征值的值都相同的话,可以用和语义通道数量相同的增益值来表示上述M个第一特征值。具体的,当语义通道(第一特征图)的数量为192时,可以用192个增益值来表示上述M个第一特征值。
本申请实施例中,至少一个第一特征图中的全部或部分特征图包括的第一特征值对应的目标增益值可以是相同的,此时,所述至少一个第一特征图包括第一目标特征图,所述第一目标特征图包括P个第一特征值,所述P个第一特征值中的每个第一特征值对应的目标增益值相同,所述P为小于或等于所述M的正整数,即第一目标特征图为至少一个第一特征图中的一个,其包括P个第一特征值,所述P个第一特征值中的每个第一特征值对应的目标增益值相同。
本申请实施例中,N个第一特征值可以为至少一个第一特征图包括的全部特征值。当M和N的数量相同时,相当于至少一个第一特征图包括的全部特征值中的每个特征值都有对应的目标增益值,当M小于N时,相当于至少一个第一特征图包括的部分特征值有对应的目标增益值。在一种实施例中,若第一特征图的数量大于1,其中,至少一个第一特征图中的部分特征图包括的全部特征值中的每个特征值都有对应的目标增益值,且至少一个第一特征图中的部分特征图包括的部分特征值有对应的目标增益值。
可选的,在一种实施例中,所述第一图像包括目标对象,所述M个第一特征值为所述至少一个特征图中与所述目标对象对应的特征值。
本申请实施例中,在一些场景中,M个第一特征值为N个第一特征值中与某一个或多个目标对象对应的特征值,比如对于监控器拍摄的视频内容,针对于场景相对固定的区域,可以不进行增益,而其中经过的物体或者人的内容可以进行增益。
304、根据所述M个目标增益值分别对对应的第一特征值进行处理,得到M个第二特征值。
本申请实施例中,在获取到目标压缩码率以及目标压缩码率对应的M个目标增益值之后,可以根据所述M个目标增益值分别对对应的第一特征值进行处理,得到M个第二特征值。在一种实施例中,所述M个第二特征值为通过将M个目标增益值分别与对应的第一特征值进行乘积运算得到的,即一个第一特征值乘以对应的目标增益值之后,可以得到相应的第二特征值。
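作为步骤304中增益处理的一个非限定性示意，下面给出一段Python代码草图（其中特征图尺寸与增益数值均为假设），说明将每个第一特征值与对应的目标增益值相乘得到第二特征值的过程：

```python
import numpy as np

def apply_gains(feature_maps: np.ndarray, gains: np.ndarray) -> np.ndarray:
    """对每个语义通道的第一特征值乘以对应的目标增益值，得到第二特征值。

    feature_maps: 形状为 [C, h, w] 的第一特征图（C个语义通道）
    gains:        形状为 [C] 的目标增益值（同一语义通道共用一个增益值）
    """
    return feature_maps * gains[:, None, None]

y = np.random.randn(192, 16, 16)        # 至少一个第一特征图
m_i = np.random.uniform(0.5, 2.0, 192)  # 目标增益向量（数值为假设）
y_gained = apply_gains(y, m_i)          # 处理后的特征图，包含M个第二特征值
```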
本申请实施例中,为了实现可以在同一个AI压缩模型中实现不同的压缩码率的效果, 针对于获取到的不同目标压缩码率,可以获取到不同的目标增益值,在根据所述M个目标增益值分别对对应的第一特征值进行处理,得到M个第二特征值之后,原有的第一图像对应的至少一个特征图包括的N个第一特征值的分布会因为其中进行增益处理的M个第一特征值而发生改变。
本申请实施例中,参照图5a和图5b,图5a和图5b为本申请实施例中不同压缩码率的特征图的分布示意,其中,不同压缩码率用不同的每像素位数(bits per pixel,bpp)来表示,其中,bpp表示存储每个像素所用的位数,越小代表压缩码率越小,图5a中示出了bpp为1时,N个第一特征值的分布,图5b中示出了bpp为0.15时,N个第一特征值的分布,高压缩码率模型编码网络输出特征(N个第一特征值)在统计直方图中的方差更大,因此量化后的信息熵也更大,因此,只要使得不同压缩码率对应于不同的目标增益值,以使得不同的目标压缩码率,N个第一特征值进行不同程度的增益,就可以实现在单个AI压缩模型上实现多码率的重建效果。具体的,M个目标增益值的选取规则是:目标压缩码率越大,根据所述M个目标增益值分别对对应的第一特征值进行处理之后得到的所述N个第一特征值的分布越分散,因此量化后的信息熵越大。
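作为“增益越大、特征分布越分散、量化后信息熵越大”这一关系的一个非限定性数值示意，下面给出一段Python代码草图，其中特征的分布与增益尺度均为假设：

```python
import numpy as np

def quantized_entropy(values: np.ndarray) -> float:
    """对四舍五入量化后的特征值估计信息熵（单位：比特）。"""
    q = np.round(values).astype(int)
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

y = np.random.randn(192 * 16 * 16)
print(quantized_entropy(y * 0.5))  # 较小增益：分布集中，量化后信息熵较小
print(quantized_entropy(y * 4.0))  # 较大增益：分布分散，量化后信息熵较大
```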
本申请实施例中,当对所述第一图像进行特征提取得到多个第一特征图后,需要对提取得到的全部第一特征图进行处理,其中,多个第一特征图包括的特征值对应的目标增益值都相同,此时,通过对多个第一特征图包括的全部特征值乘以对应的目标增益值,以改变多个第一特征图包括的N个第一特征值的分布,其中,目标压缩码率越大,所述N个第一特征值的分布越分散。
本申请实施例中,当对所述第一图像进行特征提取得到多个第一特征图后,需要对提取得到的全部第一特征图进行处理,其中,多个第一特征图中的每个第一特征图包括的特征值对应的目标增益值相同,即,每个第一特征图对应一个目标增益值,此时,通过对多个第一特征图中的每个第一特征图包括的特征值乘以对应的目标增益值,以改变多个第一特征图包括的N个第一特征值的分布,其中,目标压缩码率越大,所述N个第一特征值的分布越分散。
本申请实施例中,当对所述第一图像进行特征提取得到多个第一特征图后,需要对提取得到的全部第一特征图进行处理,其中,一部分的第一特征图中的每个第一特征图包括的特征值对应的目标增益值相同,剩余部分的第一特征图中的每个第一特征图包括的特征值对应的目标增益值不相同,即,一部分的第一特征图中的每个第一特征图对应一个目标增益值,剩余部分的第一特征图中的每个第一特征图对应于多个目标增益值(同一个特征图中的不同特征值可能对应于不同的目标增益值),此时,通过对多个第一特征图中的部分第一特征图包括的特征值乘以对应的目标增益值,以及,对剩余部分的第一特征图包括的特征值乘以对应的目标增益值,以改变多个第一特征图包括的N个第一特征值的分布,其中,目标压缩码率越大,所述N个第一特征值的分布越分散。
本申请实施例中,当对所述第一图像进行特征提取得到多个第一特征图后,需要对提取得到的部分第一特征图进行处理(对比某些对压缩结果产生影响较小的语义通道对应的第一特征图,可以不进行增益)。需要对提取得到的部分第一特征图的数量大于1,其中, 多个第一特征图中的每个第一特征图包括的特征值对应的目标增益值相同,即,每个第一特征图对应一个目标增益值,此时,通过对多个第一特征图中的每个第一特征图包括的特征值乘以对应的目标增益值,以改变多个第一特征图包括的N个第一特征值的分布,其中,目标压缩码率越大,所述N个第一特征值的分布越分散。
本申请实施例中,当对所述第一图像进行特征提取得到多个第一特征图后,需要对提取得到的部分第一特征图进行处理(对比某些对压缩结果产生影响较小的语义通道对应的第一特征图,可以不进行增益)。需要对提取得到的部分第一特征图的数量大于1,其中,一部分的第一特征图中的每个第一特征图包括的特征值对应的目标增益值相同,剩余部分的第一特征图中的每个第一特征图包括的特征值对应的目标增益值不相同,即,一部分的第一特征图中的每个第一特征图对应一个目标增益值,剩余部分的第一特征图中的每个第一特征图对应于多个目标增益值(同一个特征图中的不同特征值可能对应于不同的目标增益值),此时,通过对多个第一特征图中的部分第一特征图包括的特征值乘以对应的目标增益值,以及,对剩余部分的第一特征图包括的特征值乘以对应的目标增益值,以改变多个第一特征图包括的N个第一特征值的分布,其中,目标压缩码率越大,所述N个第一特征值的分布越分散。
本申请实施例中,当对所述第一图像进行特征提取得到多个第一特征图后,需要对提取得到的部分第一特征图进行处理(对比某些对压缩结果产生影响较小的语义通道对应的第一特征图,可以不进行增益)。需要对提取得到的部分第一特征图的数量等于1,且该第一特征图包括的特征值对应的目标增益值相同,即,该第一特征图对应一个目标增益值,此时,通过对该第一特征图包括的特征值乘以对应的目标增益值,以改变多个第一特征图包括的N个第一特征值的分布,其中,目标压缩码率越大,所述N个第一特征值的分布越分散。
本申请实施例中,当对所述第一图像进行特征提取得到多个第一特征图后,需要对提取得到的部分第一特征图进行处理(对比某些对压缩结果产生影响较小的语义通道对应的第一特征图,可以不进行增益)。需要对提取得到的部分第一特征图的数量等于1,且该第一特征图包括的特征值对应的目标增益值不相同,即,该第一特征图中的每个第一特征图对应于多个目标增益值(同一个特征图中的不同特征值可能对应于不同的目标增益值),此时,对该第一特征图包括的特征值乘以对应的目标增益值,以改变多个第一特征图包括的N个第一特征值的分布,其中,目标压缩码率越大,所述N个第一特征值的分布越分散。
需要说明的是,可以仅仅对第一特征图包括的部分第一特征值进行增益处理。
需要说明的是，若各个语义通道的特征值都进行相同尺度的增益，即所有语义通道对应的多个第一特征图包括的第一特征值对应于相同的目标增益值，虽然可以实现对N个第一特征值的信息熵进行改变，但是压缩效果较差，因此将增益的基本运算单元设定为语义通道级别（所有语义通道中至少有两个语义通道对应的第一特征图包括的第一特征值的目标增益值不同）或者特征值级别（语义通道对应的第一特征图包括的全部第一特征值中至少有两个第一特征值对应的目标增益值不同），可以使得压缩效果较好。
接下来描述,如何获取到可以实现上述技术效果的M个目标增益值:
一、通过人工确定的方式
本申请实施例中，可以通过人工方式确定一个目标函数映射关系，其中，针对于各个语义通道对应的第一特征图包括的第一特征值对应的目标增益值相同的情况，目标函数映射关系的输入可以为语义通道和目标压缩码率，输出为对应的目标增益值（由于第一特征图包括的第一特征值对应的目标增益值相同，可以用一个目标增益值表示该语义通道对应的全部目标增益值），示例性的，可以利用一个线性函数、二次函数、三次函数或四次函数等来确定对应每个语义通道的目标增益值，参照图6，图6为本申请实施例提供的一种目标映射函数关系的示意，如图6中示出的那样，目标映射函数关系为线性函数，该函数的输入为语义通道序号（例如语义通道序号为1至192），输出为对应的目标增益值，每个目标压缩码率对应不同的目标映射函数关系，其中，目标压缩码率越大，其对应的目标映射函数关系的斜率越小，二次非线性函数或三次非线性函数的大致分布规律也与此类似，此处不再赘述。
本申请实施例中,可以通过人工方式确定M个第一特征值中每个第一特征值对应的目标增益值。只要使得目标压缩码率越大,所述N个第一特征值的分布越分散即可,关于具体的设定方式,本申请并不限定。
二、通过训练的方式
本申请实施例中,由于在通过训练的方式来获取与各个目标压缩码率对应的M个目标增益值的方式中,需要结合解码侧的过程,因此,通过训练方式来获取与各个目标压缩码率对应的M个目标增益值将在之后的实施例中进行详细描述,这里不再赘述。
305、对处理后的至少一个第一特征图进行量化和熵编码,得到编码数据,所述处理后的至少一个第一特征图包括所述M个第二特征值。
本申请实施例中,在根据所述M个目标增益值分别对对应的第一特征值进行处理,得到M个第二特征值之后,可以对处理后的至少一个第一特征图进行量化和熵编码,得到编码数据,所述处理后的至少一个第一特征图包括所述M个第二特征值。
本申请实施例中,将N个第一特征值根据指定规则转换至量化中心,以便后续进行熵编码。量化操作可以将N个第一特征值由浮点数转换为比特流(例如,使用8位整数或4位整数等特定位整数的比特流)。在一些实施例中,可以但不限于采用四舍五入round对N个第一特征值执行量化操作。
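作为量化操作的一个非限定性示意，下面给出一段Python代码草图，其中以四舍五入并截断到8位整数为例（具体位数仅为假设）：

```python
import numpy as np

def quantize(features: np.ndarray) -> np.ndarray:
    """采用四舍五入将第一特征值转换至量化中心，并以8位整数表示（仅作示意）。"""
    return np.clip(np.round(features), -128, 127).astype(np.int8)

y_q = quantize(np.random.randn(192, 16, 16) * 10.0)
```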
本申请实施例中,对处理后的至少一个第一特征图进行量化得到的量化数据的信息熵满足预设条件,所述预设条件与所述目标压缩码率有关。具体的,所述目标压缩码率越大,所述量化数据的信息熵越大。
本申请实施例中,可以利用熵估计网络得到输出特征中各点概率估计,利用该概率估计对输出特征进行熵编码,得到二进制的码流,需要说明的是,本申请提及的熵编码过程可采用现有的熵编码技术,本申请对此不再赘述。
本申请实施例中，所述编码数据对应的压缩码率与所述目标压缩码率的差值在预设范围内，其中，预设范围可以在实际应用中选择，只要编码数据对应的压缩码率与目标压缩码率的差值在可以接受的范围内，本申请并不限定具体的预设范围。
本申请实施例中,在得到编码数据之后,可以将编码数据发送给用于解压缩的终端设备,则用于解压缩的图像处理设备可以对该数据进行解压缩。或者,用于压缩的终端设备可以将编码数据存储在存储设备中,在需要时,终端设备可以从存储设备中获取编码数据,并可以对该编码数据进行解压缩。
可选的,在一种实施例中,所述目标压缩码率大于第一压缩码率,且小于第二压缩码率,所述第一压缩码率对应于M个第一增益值,所述第二压缩码率对应于M个第二增益值,所述M个目标增益值为对所述M个第一增益值和所述M个第二增益值进行插值运算得到的。本申请实施例中,所述M个第一增益值包括第一目标增益值,所述M个第二增益值包括第二目标增益值,所述M个目标增益值包括第三目标增益值,所述第一目标增益值、所述第二目标增益值和所述第三目标增益值对应于所述M个第一特征值中的同一个特征值,所述第三目标增益值为通过对所述第一目标增益值和所述第二目标增益值进行插值运算得到的。
本申请实施例中,可在单个模型上实现多个压缩码率的压缩效果,具体的,针对于多个目标压缩码率,可以对应设置不同的目标增益值,以此实现不同压缩码率的压缩效果,之后可以利用插值算法对目标增益值进行插值运算,可得到压缩码率区间内任意压缩效果的新的增益值。具体的,所述M个第一增益值包括第一目标增益值,所述M个第二增益值包括第二目标增益值,所述M个目标增益值包括第三目标增益值,所述第一目标增益值、所述第二目标增益值和所述第三目标增益值对应于所述M个第一特征值中的同一个特征值,所述第三目标增益值为通过对所述第一目标增益值和所述第二目标增益值进行插值运算得到的,其中,插值运算可以为基于如下的公式进行运算:
m_l=[(m_i)^l·(m_j)^(1-l)];
其中，m_l表示第三目标增益值，m_i表示第一目标增益值，m_j表示第二目标增益值，m_l、m_i和m_j对应于相同的特征值，l∈(0,1)为调节系数，可以根据目标压缩码率的大小来确定。
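作为上述插值运算的一个非限定性示意，下面给出一段Python代码草图，其中两组相邻压缩码率对应的增益向量数值均为假设：

```python
import numpy as np

def interpolate_gains(m_i: np.ndarray, m_j: np.ndarray, l: float) -> np.ndarray:
    """按 m_l = (m_i)^l · (m_j)^(1-l) 对相邻压缩码率的增益向量做插值，l∈(0,1)为调节系数。"""
    return (m_i ** l) * (m_j ** (1 - l))

m_i = np.full(192, 1.8)  # 对应较高压缩码率的增益向量（数值为假设）
m_j = np.full(192, 0.6)  # 对应较低压缩码率的增益向量（数值为假设）
m_l = interpolate_gains(m_i, m_j, l=0.3)  # 目标压缩码率对应的M个目标增益值
```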
本申请实施例中,在获取到多个压缩码率中每个压缩码率对应的M个目标增益值之后,若要进行目标压缩码率对应的压缩时,可以从多个压缩码率中确定与目标压缩码率相邻的两组目标增益值(每组包括M个目标增益值),并对两组目标增益值进行上述的插值处理,以得到目标压缩码率对应的M个目标增益值。本申请实施例中,可实现AI压缩模型在压缩码率区间内任意的压缩效果。
本申请实施例中,所述M个目标增益值中的每个目标增益值对应于一个反增益值,反增益值用于对所述编码数据进行解码过程中得到的特征值进行处理,所述M个目标增益值中的每个目标增益值与对应的反增益值的乘积在预设范围内。关于解码侧的反增益过程将在之后的实施例中描述,这里不再赘述。
本申请实施例提供了一种图像处理方法,获取第一图像;对所述第一图像进行特征提取,得到至少一个第一特征图,所述至少一个第一特征图包括N个第一特征值,所述N为 正整数;获取目标压缩码率,所述目标压缩码率对应于M个目标增益值,每个目标增益值对应一个第一特征值,所述M为小于或等于N的正整数;根据所述M个目标增益值分别对对应的第一特征值进行处理,得到M个第二特征值;对处理后的至少一个第一特征图进行量化和熵编码,得到编码数据,所述处理后的至少一个第一特征图包括所述M个第二特征值。通过上述方式,针对于不同的目标压缩码率,设置不同的目标增益值,从而实现压缩码率的控制。
参照图7,图7为本申请实施例提供的一种图像处理方法的实施例示意,如图7中示出的那样,本实施例中提供的图像处理方法,包括:
701、获取编码数据。
本申请实施例中,可以获取如图3以及对应的实施例中得到的编码数据。
本申请实施例中,在得到编码数据之后,可以将编码数据发送给用于解压缩的终端设备,则用于解压缩的图像处理设备可以获取编码数据,并对该数据进行解压缩。或者,用于压缩的终端设备可以将编码数据存储在存储设备中,在需要时,终端设备可以从存储设备中获取编码数据,并可以对该编码数据进行解压缩。
702、对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括N个第三特征值,所述N为正整数。
本申请实施例中,可以利用现有技术中的熵解码技术对编码数据进行解码,得到重建的输出特征(至少一个第二特征图),其中,所述至少一个第二特征图包括N个第三特征值。
需要说明的是,本申请实施例中的至少一个第二特征图可以与上述处理后的至少一个第一特征图相同。
703、获取M个目标反增益值,每个目标反增益值对应一个第三特征值,所述M为小于或等于N的正整数。
可选的,在一种实施例中,可以获取目标压缩码率,并根据目标映射关系确定所述目标压缩码率对应的M个目标反增益值,所述目标映射关系用于表示压缩码率与反增益向量之间的关联关系;其中,所述目标映射关系包括多个压缩码率以及多个反增益向量、以及多个压缩码率与多个反增益向量之间的关联关系,所述目标压缩码率为所述多个压缩码率中的一个,所述M个目标反增益值为所述多个反增益向量中的一个向量的元素;或,所述目标映射关系包括目标函数映射关系,当所述目标函数关系的输入包括所述目标压缩码率时,所述目标函数关系的输出包括所述M个目标反增益值。
本申请实施例中,可以在图3对应的实施例中获取目标增益值步骤的同时获取到反目标增益值,这里并不限定。
可选的,在一种实施例中,所述至少一个第二特征图包括第二目标特征图,所述第二目标特征图包括P个第三特征值,所述P个第三特征值中的每个第三特征值对应的目标反增益值相同,所述P为小于或等于所述M的正整数。
可选的,在一种实施例中,所述第二图像包括目标对象,所述M个第三特征值为所述至少一个特征图中与所述目标对象对应的特征值。
704、根据所述M个目标反增益值分别对对应的第三特征值进行处理,得到M个第四特征值。
本申请实施例中,所述M个第四特征值可以为通过将M个目标反增益值分别与对应的第三特征值进行乘积运算得到的。具体的,本申请实施例中,至少一个第二特征图中的M个第三特征值分别与对应的反增益值相乘,得到M个第四特征值,进而经过反增益处理的至少一个第二特征图,其包括M个第四特征值。通过上述的反增益处理,结合图3对应的实施例中的增益处理,可以保证图像的正常解析。
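作为解码侧反增益处理的一个非限定性示意，下面给出一段Python代码草图，其中反增益向量按与增益向量互为倒数的方式给出，该取法仅是使两者乘积落在预设范围内的一种假设：

```python
import numpy as np

def apply_inverse_gains(decoded_maps: np.ndarray, inv_gains: np.ndarray) -> np.ndarray:
    """对熵解码得到的第三特征值逐通道乘以目标反增益值，得到第四特征值。"""
    return decoded_maps * inv_gains[:, None, None]

m_i = np.random.uniform(0.5, 2.0, 192)  # 编码侧使用的目标增益向量（数值为假设）
m_inv = 1.0 / m_i                       # 目标反增益向量，与对应增益值的乘积约为1
y_hat = np.random.randn(192, 16, 16)    # 熵解码得到的至少一个第二特征图
y_prime = apply_inverse_gains(y_hat, m_inv)
```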
705、对处理后的至少一个第二特征图进行图像重构,得到第二图像,所述处理后的至少一个第二特征图包括所述M个第四特征值。
本申请实施例中,在得到M个第四特征值之后,可以对处理后的至少一个第二特征图进行图像重构,得到第二图像,所述处理后的至少一个第二特征图包括所述M个第四特征值,通过上述方式将至少一个第二特征图解析重建为第二图像。
可选的,在一种实施例中,所述目标压缩码率大于第一压缩码率,且小于第二压缩码率,所述第一压缩码率对应于M个第一反增益值,所述第二压缩码率对应于M个第二反增益值,所述M个目标反增益值为通过对所述M个第一反增益值和所述M个第二反增益值进行插值运算得到的。本申请实施例中,所述M个第一反增益值包括第一目标反增益值,所述M个第二反增益值包括第二目标反增益值,所述M个目标反增益值包括第三目标反增益值,所述第一目标反增益值、所述第二目标反增益值和所述第三目标反增益值对应于所述M个第一特征值中的同一个特征值,所述第三目标反增益值为通过对所述第一目标反增益值和所述第二目标反增益值进行插值运算得到的。
本申请实施例中,所述M个目标增益值中的每个目标增益值对应于一个反增益值,反增益值用于对所述编码数据进行解码过程中得到的特征值进行处理,所述M个目标增益值中的每个目标增益值与对应的反增益值的乘积在预设范围内,即针对于同一个特征值,其对应的目标增益值和反增益值之间存在一定的数值关系:两者的乘积在预设范围内,该预设范围可以为数值“1”附近的数值范围,这里并不限定。
本申请实施例提供了一种图像处理方法,获取编码数据;对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括N个第三特征值,所述N为正整数;获取M个目标反增益值,每个目标反增益值对应一个第三特征值,所述M为小于或等于N的正整数;根据所述M个目标反增益值分别对对应的第三特征值进行处理,得到M个第四特征值;对处理后的至少一个第二特征图进行图像重构,得到第二图像,所述处理后的至少一个第二特征图包括所述M个第四特征值。通过上述方式,针对于不同的目标压缩码率,设置不同的反目标增益值,从而实现压缩码率的控制。
接下来以变分自编码器(variational autoencoder,VAE)的架构为例,介绍本申请实施例提供的图像压缩方法,其中,变分自编码器是用于数据压缩或去噪的自动编码器的一种。
参照图8,图8为本申请实施例提供的一种图像压缩的流程示意。
本实施例以相同语义通道对应的目标增益值相同、以及相同语义通道对应的目标反增益值相同为例进行说明，语义通道数为192，在训练中需要在4个指定码点（4个压缩码率）进行训练，每个压缩码率对应一个目标增益向量以及一个目标反增益向量，目标增益向量m_i为对应某一压缩码率的尺寸为192×1的向量，目标反增益向量m′_i为对应某一压缩码率的尺寸为192×1的向量，y是编码网络输出特征（包括至少一个第一特征图），尺寸为192×w×h，w×h为单个语义通道特征图的宽与长，ȳ、ŷ、y′分别为经过增益处理、量化、熵编码以及熵解码和反增益处理后得到的新的输出特征，尺寸均与y相同。本实施例采用VAE方法作为模型基础框架，加入了增益单元与反增益单元，如图8中示出的那样，模型运行可以为如下步骤：
801、第一图像经过编码网络得到输出特征y。
802、输出特征y与对应的增益向量m_i逐通道相乘，得到经过增益处理的输出特征ȳ。
803、输出特征ȳ经过量化，得到特征ŷ。
804、利用熵估计模块得到输出特征中各点概率估计,利用该概率估计对输出特征进行熵编码,得到二进制的码流。
805、利用熵解码器对二进制码流进行熵解码，得到重建的输出特征ŷ。
806、输出特征ŷ与对应的反增益向量m′_i逐通道相乘，得到经过反增益处理的输出特征y′。
807、输出特征y′进入解码网络,被解析重建为第二图像。
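作为步骤801至步骤807整体流程的一个非限定性示意，下面给出一段Python代码草图，其中encoder、decoder与entropy_codec均为假设的可调用对象，并非本申请实施例的具体实现：

```python
import numpy as np

def compress_decompress(x, encoder, decoder, m_i, m_inv, entropy_codec):
    """按步骤801~807的顺序给出的流程示意（各网络与熵编解码器均为假设对象）。"""
    y = encoder(x)                           # 801：编码网络输出特征y
    y_g = y * m_i[:, None, None]             # 802：与增益向量m_i逐通道相乘
    y_q = np.round(y_g)                      # 803：量化
    bitstream = entropy_codec.encode(y_q)    # 804：熵估计并熵编码，得到二进制码流
    y_hat = entropy_codec.decode(bitstream)  # 805：熵解码，重建输出特征
    y_prime = y_hat * m_inv[:, None, None]   # 806：与反增益向量m′_i逐通道相乘
    return decoder(y_prime)                  # 807：解码网络重建第二图像
```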
参照图9，图9中的左图为本实施例单个模型（非虚线）与现有技术中的VAE方法分别训练四个压缩模型（虚线）在以多尺度结构相似性（multi-scale structural similarity index measure，MS-SSIM）为评价指标的条件下率失真性能的对比，其中，横坐标为BPP，纵坐标为MS-SSIM；图9中的右图为本实施例中单个模型（非虚线）与现有技术中的VAE方法分别训练4个压缩模型（虚线）在以峰值信噪比（peak signal to noise ratio，PSNR）为评价指标的条件下率失真性能的对比，其中，横坐标为BPP，纵坐标为PSNR。可以看出，本实施例中在模型参数量与VAE方法单个模型基本一致的前提下，在两项评价指标上均可实现任意码率的压缩效果，且压缩效果不弱于VAE方法多个模型实现效果，可减少N倍的模型存储量（N为VAE方法实现本发明实例不同码率压缩效果所需的模型个数）。
参照图10,图10为本申请实施例的一种训练过程示意,如图10中示出的那样,本实施例中模型的损失函数为:
loss=l_d+β·l_r
其中，l_d为根据评价指标计算第二图像相对于第一图像的失真损失，l_r为熵估计网络计算得出的码率损失（或者称为码率估计），β为调整失真损失与码率估计间权衡的拉格朗日系数。
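作为上述损失函数的一个非限定性示意，下面给出一段基于PyTorch的代码草图，其中以均方误差作为失真度量仅为举例，实际也可替换为MS-SSIM等评价指标：

```python
import torch

def rate_distortion_loss(x, x_rec, rate_estimate, beta: float) -> torch.Tensor:
    """loss = l_d + β·l_r 的示意性计算。"""
    l_d = torch.mean((x - x_rec) ** 2)  # 失真损失：第二图像相对于第一图像的失真
    l_r = rate_estimate                 # 码率损失：熵估计网络给出的码率估计
    return l_d + beta * l_r
```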
为了获得与不同压缩码率匹配的增益/反增益矩阵{M,M′}，模型训练过程可以如图10所示：在模型训练过程中不断变换损失函数中的拉格朗日系数β，并从随机初始化的增益/反增益矩阵{M,M′}中选定对应的增益/反增益向量{m_i,m′_i}分别放置在编码网络后端/解码网络前端，从而实现增益/反增益矩阵{M,M′}与模型的联合优化，这样可在单个模型上实现多个压缩码率的压缩效果。
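作为上述联合优化过程的一个非限定性示意，下面给出一段基于PyTorch的单次训练迭代代码草图，其中model的调用方式、码点数量与各参数形状均为假设：

```python
import torch

def train_step(model, gain_matrix, inv_gain_matrix, betas, batch, optimizer):
    """联合优化编解码网络与增益/反增益矩阵{M, M′}的一次示意性迭代。

    gain_matrix / inv_gain_matrix: 形状为 [码点数, 192] 的可学习参数（假设4个码点）。
    """
    i = torch.randint(len(betas), (1,)).item()  # 随机选定一个码点（对应一个拉格朗日系数β）
    m_i, m_inv_i = gain_matrix[i], inv_gain_matrix[i]
    x_rec, rate = model(batch, m_i, m_inv_i)    # 前向：增益、量化、熵编/解码、反增益、重建
    loss = torch.mean((batch - x_rec) ** 2) + betas[i] * rate
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss
```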
以单个模型上可以实现4个压缩码率的压缩效果为例，将训练得到的4个增益向量与对应的反增益向量相乘，不同压缩码率对应的目标增益向量与目标反增益向量中对应元素相乘结果近似相等，可以得到下面的关系式：
m_i·m′_i≈m_j·m′_j=C;
其中，[m_i,m′_i]、[m_j,m′_j]分别为对应不同压缩码率的增益/反增益向量对，C为元素均为常数的向量，i,j∈(1,4)。
为了在单个模型上实现码率的连续调节，本实施例可以利用上述公式进行如下推导：
(m_i·m′_i)^l·(m_j·m′_j)^(1-l)=C^l·C^(1-l)=C;
[(m_i)^l·(m_j)^(1-l)]·[(m′_i)^l·(m′_j)^(1-l)]=C;
m_l=[(m_i)^l·(m_j)^(1-l)]，m′_l=[(m′_i)^l·(m′_j)^(1-l)];
其中，m_i与m_j为增益/反增益矩阵中两个相邻的增益/反增益向量，l∈(0,1)为调节系数。
本申请实施例中,可以对训练得到的相邻的四个增益/反增益向量对进行插值运算,得到新的增益/反增益向量对。
其中，为了获得与不同压缩码率匹配的增益矩阵M，训练过程如下：本实施例在模型训练过程中不断变换损失函数中的拉格朗日系数，并从随机初始化的增益矩阵M中选定对应的增益向量m_i和反增益向量m′_i，其中，反增益向量m′_i可以是对增益向量m_i取倒数生成的。具体细节可以参照上述实施例中，步骤705中关于目标增益值和目标反增益值的选择规则的描述，这里不再赘述。
本申请实施例中，增益向量m_i和反增益向量m′_i分别放置在编码网络后端/解码网络前端，从而实现增益矩阵M与模型的联合优化，这样可在单个模型上实现4个码率的压缩效果，具体可以参照图11，图11为本申请实施例的一种图像处理过程示意。再利用插值算法对训练得到的相邻的四个增益向量对进行插值运算，可得到码率区间内任意压缩效果的新的增益向量。
本实施例在模型参数量与单个VAE方法模型基本一致的前提下,可实现任意码率的压缩效果,且压缩效果不弱于各码率独自训练的效果,且可减少N倍的模型存储量(N为VAE方法实现本发明实例不同码率压缩效果所需的模型个数)。
需要说明的是,以上仅仅以VAE为架构进行了说明,在实际应用中,还可以应用在其他AI压缩模型架构中(例如自动编码器auto-encoder或其他图像压缩模型),本申请并不限定。
请先参阅图12,图12为本申请实施例提供的图像处理系统的一种系统架构图,在图12中,图像处理系统200包括执行设备210、训练设备220、数据库230、客户设备240和数据存储系统250,执行设备210中包括计算模块211。
其中，数据库230中存储有第一图像集合，训练设备220生成用于处理第一图像的目标模型/规则201，并利用数据库中的第一图像集合对目标模型/规则201进行迭代训练，得到成熟的目标模型/规则201。本申请实施例中以目标模型/规则201包括第二编解码网络、各个压缩码率对应的M个目标增益值以及M个目标反增益值为例进行说明。
训练设备220得到的第二编解码网络、各个压缩码率对应的M个目标增益值以及M个目标反增益值可以应用不同的系统或设备中,例如手机、平板、笔记本电脑、VR设备、监控系统等等。其中,执行设备210可以调用数据存储系统250中的数据、代码等,也可以将数据、指令等存入数据存储系统250中。数据存储系统250可以置于执行设备210中,也可以为数据存储系统250相对执行设备210是外部存储器。
计算模块211可以通过第二编解码网络对客户设备240接收的第一图像进行特征提取，得到至少一个第一特征图，所述至少一个第一特征图包括N个第一特征值，所述N为正整数；获取目标压缩码率，所述目标压缩码率对应于M个目标增益值，每个目标增益值对应一个第一特征值，所述M为小于或等于N的正整数；根据所述M个目标增益值分别对对应的第一特征值进行处理，得到M个第二特征值；对处理后的至少一个第一特征图进行量化和熵编码，得到编码数据，所述处理后的至少一个第一特征图包括所述M个第二特征值。
计算模块211还可以通过第二编解码网络对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括N个第三特征值,所述N为正整数;获取M个目标反增益值,每个目标反增益值对应一个第三特征值,所述M为小于或等于N的正整数;根据所述M个目标反增益值分别对对应的第三特征值进行处理,得到M个第四特征值;对处理后的至少一个第二特征图进行图像重构,得到第二图像,所述处理后的至少一个第二特征图包括所述M个第四特征值。
本申请的一些实施例中,请参阅图12,执行设备210和客户设备240可以为分别独立的设备,执行设备210配置有I/O接口212,与客户设备240进行数据交互,“用户”可以通过客户设备240向I/O接口212输入第一图像,执行设备210通过I/O接口212将第二图像返回给客户设备240,提供给用户。
值得注意的,图12仅是本发明实施例提供的图像处理系统的架构示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制。例如,在本申请的另一些实施例中,执行设备210可以配置于客户设备240中,作为示例,例如当客户设备为手机或平板时,执行设备210可以为手机或平板的主处理器(Host CPU)中用于进行阵列图像处理的模块,执行设备210也可以为手机或平板中的图形处理器(graphics processing unit,GPU)或者神经网络处理器(NPU),GPU或NPU作为协处理器挂载到主处理器上,由主处理器分配 任务。
结合上述描述,下面开始对本申请实施例提供的图像处理方法的训练阶段的具体实现流程进行描述。
一、训练阶段
具体的,请参阅图13,图13为本申请实施例提供的图像处理方法的一种流程示意图,本申请实施例提供的图像处理方法可以包括:
1301、获取第一图像;
1302、根据编码网络对所述第一图像进行特征提取,得到至少一个第一特征图,所述至少一个第一特征图包括N个第一特征值,所述N为正整数;
1303、获取目标压缩码率,所述目标压缩码率对应于M个初始增益值以及M个初始反增益值,每个初始增益值对应一个第一特征值,每个初始反增益值对应一个第一特征值,所述M为小于或等于N的正整数;
1304、根据所述M个初始增益值分别对对应的第一特征值进行处理,得到M个第二特征值;
1305、根据量化网络和熵编码网络对处理后的至少一个第一特征图进行量化和熵编码,得到编码数据以及码率损失,所述增益处理后的至少一个第一特征图包括所述M个第二特征值;
1306、根据熵解码网络对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括M个第三特征值,每个第三特征值对应一个第一特征值;
1307、根据所述M个初始反增益值分别对对应的第三特征值进行处理,得到M个第四特征值;
1308、根据解码网络对处理后的至少一个第二特征图进行图像重构,得到第二图像,所述处理后的至少一个特征图包括所述M个第四特征值;
1309、获取所述第二图像相对于所述第一图像的失真损失。
1310、利用损失函数对第一编解码网络、M个初始增益值以及M个初始反增益值,进行联合训练,直至第一图像与所述第二图像之间的图像失真值达到第一预设程度,所述图像失真值与所述码率损失以及所述失真损失有关,所述编解码网络包括所述编码网络、量化网络、熵编码网络以及熵解码网络。
1311、输出第二编解码网络、M个目标增益值以及M个目标反增益值,所述第二编解码网络为所述第一编解码网络执行过迭代训练后得到的模型,所述M个目标增益值以及M个目标反增益值为所述M个初始增益值以及M个初始反增益值执行过迭代训练后得到的。
步骤1301至步骤1311的具体描述可以参照上述实施例中的描述，这里不再赘述。
可选的,对处理后的至少一个第一特征图进行量化得到的量化数据的信息熵满足预设条件,所述预设条件与所述目标压缩码率有关。
可选的,所述预设条件至少包括:
所述目标压缩码率越大,所述量化数据的信息熵越大。
可选的,所述M个第二特征值为通过将M个初始增益值分别与对应的第一特征值进行 乘积运算得到的。
可选的,所述M个第四特征值为通过将M个初始反增益值分别与对应的第三特征值进行乘积运算得到的。
可选的,所述M个目标增益值中的每个目标增益值与对应的目标反增益值的乘积在预设范围内,所述M个初始增益值中的每个初始增益值与对应的初始反增益值的乘积在预设范围内。
在图1至图13所对应的实施例的基础上,为了更好的实施本申请实施例的上述方案,下面还提供用于实施上述方案的相关设备。具体参阅图14,图14为本申请实施例提供的图像处理装置1400的一种结构示意图,图像处理装置1400可以是终端设备或服务器,图像处理装置1400包括:
获取模块1401,用于获取第一图像;
特征提取模块1402,用于对所述第一图像进行特征提取,得到至少一个第一特征图,所述至少一个第一特征图包括N个第一特征值,所述N为正整数;
所述获取模块1401,还用于获取目标压缩码率,所述目标压缩码率对应于M个目标增益值,每个目标增益值对应一个第一特征值,所述M为小于或等于N的正整数;
增益模块1403,用于根据所述M个目标增益值分别对对应的第一特征值进行处理,得到M个第二特征值;
量化和熵编码模块1404,用于对处理后的至少一个第一特征图进行量化和熵编码,得到编码数据,所述处理后的至少一个第一特征图包括所述M个第二特征值。
可选的,对处理后的至少一个第一特征图进行量化得到的量化数据的信息熵满足预设条件,所述预设条件与所述目标压缩码率有关。
可选的,所述预设条件至少包括:
所述目标压缩码率越大,所述量化数据的信息熵越大。
可选的,所述编码数据对应的压缩码率与所述目标压缩码率的差值在预设范围内。
可选的,所述M个第二特征值为通过将M个目标增益值分别与对应的第一特征值进行乘积运算得到的。
可选的,所述至少一个第一特征图包括第一目标特征图,所述第一目标特征图包括P个第一特征值,所述P个第一特征值中的每个第一特征值对应的目标增益值相同,所述P为小于或等于所述M的正整数。
可选的,所述装置还包括:
确定模块,用于根据目标映射关系确定所述目标压缩码率对应的M个目标增益值,所述目标映射关系用于表示压缩码率与M个目标增益值之间的关联关系;
其中,所述目标映射关系包括多个压缩码率以及多个增益向量、以及多个压缩码率与多个增益向量之间的关联关系,所述目标压缩码率为所述多个压缩码率中的一个,所述M个目标增益值为所述多个增益向量中的一个向量的元素;或,
所述目标映射关系包括目标函数映射关系,当所述目标函数关系的输入包括所述目标 压缩码率时,所述目标函数关系的输出包括所述M个目标增益值。
可选的,所述目标压缩码率大于第一压缩码率,且小于第二压缩码率,所述第一压缩码率对应于M个第一增益值,所述第二压缩码率对应于M个第二增益值,所述M个目标增益值为对所述M个第一增益值和所述M个第二增益值进行插值运算得到的。
可选的,所述M个第一增益值包括第一目标增益值,所述M个第二增益值包括第二目标增益值,所述M个目标增益值包括第三目标增益值,所述第一目标增益值、所述第二目标增益值和所述第三目标增益值对应于所述M个第一特征值中的同一个特征值,所述第三目标增益值为通过对所述第一目标增益值和所述第二目标增益值进行插值运算得到的。
可选的,所述第一图像包括目标对象,所述M个第一特征值为所述至少一个特征图中与所述目标对象对应的特征值。
可选的,所述M个目标增益值中的每个目标增益值对应于一个反增益值,反增益值用于对所述编码数据进行解码过程中得到的特征值进行处理,所述M个目标增益值中的每个目标增益值与对应的反增益值的乘积在预设范围内。
可选的,所述装置还包括:
解码模块,用于对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括N个第三特征值,每个第三特征值对应一个第一特征值;
所述获取模块,还用于获取M个目标反增益值,每个目标反增益值对应一个第三特征值;
所述装置还包括:
反增益模块,用于根据所述M个目标反增益值分别对对应的第三特征值进行增益,得到M个第四特征值;
重构模块,用于对反增益处理后的至少一个第二特征图进行图像重构,得到第二图像,所述反增益处理后的至少一个第二特征图包括所述M个第四特征值。
可选的,所述M个第四特征值为将M个目标反增益值分别与对应的第三特征值进行乘积运算得到的。
可选的,所述至少一个第二特征图包括第二目标特征图,所述第二目标特征图包括P个第三特征值,所述P个第三特征值中的每个第三特征值对应的目标反增益值相同,所述P为小于或等于所述M的正整数。
可选的,所述确定模块,还用于:
根据目标映射关系确定所述目标压缩码率对应的M个目标反增益值,所述目标映射关系用于表示压缩码率与反增益向量之间的关联关系。
可选的,所述目标映射关系包括多个压缩码率以及多个反增益向量、以及多个压缩码率与多个反增益向量之间的关联关系,所述目标压缩码率为所述多个压缩码率中的一个,所述M个目标反增益值为所述多个反增益向量中的一个向量的元素。
可选的,所述目标映射关系包括目标函数映射关系,当所述目标函数关系的输入包括所述目标压缩码率时,所述目标函数关系的输出包括所述M个目标反增益值。
可选的，所述第二图像包括目标对象，所述M个第三特征值为所述至少一个特征图中与所述目标对象对应的特征值。
可选的,所述M个目标增益值中的每个目标增益值与对应的目标反增益值的乘积在预设范围内。
可选的,所述目标压缩码率大于第一压缩码率,且小于第二压缩码率,所述第一压缩码率对应于M个第一反增益值,所述第二压缩码率对应于M个第二反增益值,所述M个目标反增益值为通过对所述M个第一反增益值和所述M个第二反增益值进行插值运算得到的。
可选的,所述M个第一反增益值包括第一目标反增益值,所述M个第二反增益值包括第二目标反增益值,所述M个目标反增益值包括第三目标反增益值,所述第一目标反增益值、所述第二目标反增益值和所述第三目标反增益值对应于所述M个第一特征值中的同一个特征值,所述第三目标反增益值为通过对所述第一目标反增益值和所述第二目标反增益值进行插值运算得到的。
本申请实施例提供了一种图像处理装置1400,获取模块1401获取第一图像;特征提取模块1402对所述第一图像进行特征提取,得到至少一个第一特征图,所述至少一个第一特征图包括N个第一特征值,所述N为正整数;所述获取模块1401获取目标压缩码率,所述目标压缩码率对应于M个目标增益值,每个目标增益值对应一个第一特征值,所述M为小于或等于N的正整数;增益模块1403根据所述M个目标增益值分别对对应的第一特征值进行处理,得到M个第二特征值;量化和熵编码模块1404对处理后的至少一个第一特征图进行量化和熵编码,得到编码数据,所述处理后的至少一个第一特征图包括所述M个第二特征值。通过上述方式,针对于不同的目标压缩码率,设置不同的目标增益值,从而实现压缩码率的控制。
参阅图15,图15为本申请实施例提供的图像处理装置1500的一种结构示意图,图像处理装置1500可以是终端设备或服务器,图像处理装置1500包括:
获取模块1501,用于获取编码数据;
解码模块1502,用于对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括N个第三特征值,所述N为正整数;
所述获取模块1501,还用于获取M个目标反增益值,每个目标反增益值对应一个第三特征值,所述M为小于或等于N的正整数;
反增益模块1503,用于根据所述M个目标反增益值分别对对应的第三特征值进行处理,得到M个第四特征值;
重构模块1504,用于对处理后的至少一个第二特征图进行图像重构,得到第二图像,所述处理后的至少一个第二特征图包括所述M个第四特征值。
可选的,所述M个第四特征值为通过将M个目标反增益值分别与对应的第三特征值进行乘积运算得到的。
可选的,所述至少一个第二特征图包括第二目标特征图,所述第二目标特征图包括P个第三特征值,所述P个第三特征值中的每个第三特征值对应的目标反增益值相同,所述P为小于或等于所述M的正整数。
可选的,所述获取模块,还用于获取目标压缩码率;
所述装置还包括:
确定模块,用于根据目标映射关系确定所述目标压缩码率对应的M个目标反增益值,所述目标映射关系用于表示压缩码率与反增益向量之间的关联关系;
其中,所述目标映射关系包括多个压缩码率以及多个反增益向量、以及多个压缩码率与多个反增益向量之间的关联关系,所述目标压缩码率为所述多个压缩码率中的一个,所述M个目标反增益值为所述多个反增益向量中的一个向量的元素;或,
所述目标映射关系包括目标函数映射关系,当所述目标函数关系的输入包括所述目标压缩码率时,所述目标函数关系的输出包括所述M个目标反增益值。
可选的,所述第二图像包括目标对象,所述M个第三特征值为所述至少一个特征图中与所述目标对象对应的特征值。
可选的,所述目标压缩码率大于第一压缩码率,且小于第二压缩码率,所述第一压缩码率对应于M个第一反增益值,所述第二压缩码率对应于M个第二反增益值,所述M个目标反增益值为通过对所述M个第一反增益值和所述M个第二反增益值进行插值运算得到的。
可选的,所述M个第一反增益值包括第一目标反增益值,所述M个第二反增益值包括第二目标反增益值,所述M个目标反增益值包括第三目标反增益值,所述第一目标反增益值、所述第二目标反增益值和所述第三目标反增益值对应于所述M个第一特征值中的同一个特征值,所述第三目标反增益值为通过对所述第一目标反增益值和所述第二目标反增益值进行插值运算得到的。
本申请实施例提供了一种图像处理装置，获取模块1501获取编码数据；解码模块1502对所述编码数据进行熵解码，得到至少一个第二特征图，所述至少一个第二特征图包括N个第三特征值，所述N为正整数；所述获取模块1501获取M个目标反增益值，每个目标反增益值对应一个第三特征值，所述M为小于或等于N的正整数；反增益模块1503根据所述M个目标反增益值分别对对应的第三特征值进行处理，得到M个第四特征值；重构模块1504对处理后的至少一个第二特征图进行图像重构，得到第二图像，所述处理后的至少一个第二特征图包括所述M个第四特征值。通过上述方式，针对于不同的目标压缩码率，设置不同的目标增益值，从而实现压缩码率的控制。
参阅图16,图16为本申请实施例提供的图像处理装置1600的一种结构示意图,图像处理装置1600可以是终端设备或服务器,图像处理装置1600包括:
获取模块1601,用于获取第一图像;
特征提取模块1602,用于根据编码网络对所述第一图像进行特征提取,得到至少一个第一特征图,所述至少一个第一特征图包括N个第一特征值,所述N为正整数;
所述获取模块1601,还用于获取目标压缩码率,所述目标压缩码率对应于M个初始增益值以及M个初始反增益值,每个初始增益值对应一个第一特征值,每个初始反增益值对应一个第一特征值,所述M为小于或等于N的正整数;
增益模块1603，用于根据所述M个初始增益值分别对对应的第一特征值进行处理，得到M个第二特征值；
量化和熵编码模块1604,用于根据量化网络和熵编码网络对处理后的至少一个第一特征图进行量化和熵编码,得到编码数据以及码率损失,所述增益处理后的至少一个第一特征图包括所述M个第二特征值;
解码模块1605,用于根据熵解码网络对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括M个第三特征值,每个第三特征值对应一个第一特征值;
反增益模块1606,用于根据所述M个初始反增益值分别对对应的第三特征值进行处理,得到M个第四特征值;
重构模块1607,用于根据解码网络对处理后的至少一个第二特征图进行图像重构,得到第二图像,所述处理后的至少一个特征图包括所述M个第四特征值;
所述获取模块1601,还用于获取所述第二图像相对于所述第一图像的失真损失;
训练模块1608,用于利用损失函数对第一编解码网络、M个初始增益值以及M个初始反增益值,进行联合训练,直至第一图像与所述第二图像之间的图像失真值达到第一预设程度,所述图像失真值与所述码率损失以及所述失真损失有关,所述编解码网络包括所述编码网络、量化网络、熵编码网络以及熵解码网络;
输出模块1609,用于输出第二编解码网络、M个目标增益值以及M个目标反增益值,所述第二编解码网络为所述第一编解码网络执行过迭代训练后得到的模型,所述M个目标增益值以及M个目标反增益值为所述M个初始增益值以及M个初始反增益值执行过迭代训练后得到的。
可选的,所述增益处理后的至少一个第一特征图量化后的量化数据的信息熵满足预设条件,所述预设条件与所述目标压缩码率有关,所述N为大于或等于所述M的正整数。
可选的,所述预设条件至少包括:
所述目标压缩码率越大,所述量化数据的信息熵越大。
可选的,所述M个第二特征值为通过将M个目标增益值分别与对应的第一特征值进行乘积运算得到的。
可选的,所述至少一个第一特征图包括第一目标特征图,所述第一目标特征图包括P个第一特征值,所述P个第一特征值中的每个第一特征值对应的目标增益值相同,所述P为小于或等于所述M的正整数。
可选的,所述第一图像包括目标对象,所述M个第一特征值为所述至少一个特征图中与所述目标对象对应的特征值。
可选的，所述M个目标增益值中的每个目标增益值与对应的目标反增益值的乘积在预设范围内，所述M个初始增益值中的每个初始增益值与对应的初始反增益值的乘积在预设范围内。
接下来介绍本申请实施例提供的一种执行设备,请参阅图17,图17为本申请实施例提供的执行设备的一种结构示意图,执行设备1700具体可以表现为虚拟现实VR设备、手机、平板、笔记本电脑、智能穿戴设备、监控数据处理设备等,此处不做限定。其中,执 行设备1700上可以部署有图14和图15对应实施例中所描述的图像处理装置,用于实现图14和图15对应实施例中图像处理装置的功能。具体的,执行设备1700包括:接收器1701、发射器1702、处理器1703和存储器1704(其中执行设备1700中的处理器1703的数量可以一个或多个,图17中以一个处理器为例),其中,处理器1703可以包括应用处理器17031和通信处理器17032。在本申请的一些实施例中,接收器1701、发射器1702、处理器1703和存储器1704可通过总线或其它方式连接。
存储器1704可以包括只读存储器和随机存取存储器,并向处理器1703提供指令和数据。存储器1704的一部分还可以包括非易失性随机存取存储器(non-volatile random access memory,NVRAM)。存储器1704存储有处理器和操作指令、可执行模块或者数据结构,或者它们的子集,或者它们的扩展集,其中,操作指令可包括各种操作指令,用于实现各种操作。
处理器1703控制执行设备的操作。具体的应用中,执行设备的各个组件通过总线系统耦合在一起,其中总线系统除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都称为总线系统。
上述本申请实施例揭示的方法可以应用于处理器1703中,或者由处理器1703实现。处理器1703可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过处理器1703中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器1703可以是通用处理器、数字信号处理器(digital signal processing,DSP)、微处理器或微控制器,还可进一步包括专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field-programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。该处理器1703可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1704,处理器1703读取存储器1704中的信息,结合其硬件完成上述方法的步骤。
接收器1701可用于接收输入的数字或字符信息,以及产生与执行设备的相关设置以及功能控制有关的信号输入。发射器1702可用于通过第一接口输出数字或字符信息;发射器1702还可用于通过第一接口向磁盘组发送指令,以修改磁盘组中的数据;发射器1702还可以包括显示屏等显示设备。
本申请实施例中,在一种情况下,处理器1703,用于执行图9至图11对应实施例中的执行设备执行的图像处理方法。具体的,应用处理器17031,用于获取第一图像;
对所述第一图像进行特征提取,得到至少一个第一特征图,所述至少一个第一特征图包括N个第一特征值,所述N为正整数;
获取目标压缩码率,所述目标压缩码率对应于M个目标增益值,每个目标增益值对应一个第一特征值,所述M为小于或等于N的正整数;
根据所述M个目标增益值分别对对应的第一特征值进行处理,得到M个第二特征值;
对处理后的至少一个第一特征图进行量化和熵编码,得到编码数据,所述处理后的至少一个第一特征图包括所述M个第二特征值。
可选的,对处理后的至少一个第一特征图进行量化得到的量化数据的信息熵满足预设条件,所述预设条件与所述目标压缩码率有关。
可选的,所述预设条件至少包括:
所述目标压缩码率越大,所述量化数据的信息熵越大。
可选的,所述编码数据对应的压缩码率与所述目标压缩码率的差值在预设范围内。
可选的,所述M个第二特征值为通过将M个目标增益值分别与对应的第一特征值进行乘积运算得到的。
可选的,所述至少一个第一特征图包括第一目标特征图,所述第一目标特征图包括P个第一特征值,所述P个第一特征值中的每个第一特征值对应的目标增益值相同,所述P为小于或等于所述M的正整数。
可选的,应用处理器17031,还用于:
根据目标映射关系确定所述目标压缩码率对应的M个目标增益值,所述目标映射关系用于表示压缩码率与M个目标增益值之间的关联关系;
其中,所述目标映射关系包括多个压缩码率以及多个增益向量、以及多个压缩码率与多个增益向量之间的关联关系,所述目标压缩码率为所述多个压缩码率中的一个,所述M个目标增益值为所述多个增益向量中的一个向量的元素;或,
所述目标映射关系包括目标函数映射关系,当所述目标函数关系的输入包括所述目标压缩码率时,所述目标函数关系的输出包括所述M个目标增益值。
可选的,所述目标压缩码率大于第一压缩码率,且小于第二压缩码率,所述第一压缩码率对应于M个第一增益值,所述第二压缩码率对应于M个第二增益值,所述M个目标增益值为对所述M个第一增益值和所述M个第二增益值进行插值运算得到的。
可选的,所述M个第一增益值包括第一目标增益值,所述M个第二增益值包括第二目标增益值,所述M个目标增益值包括第三目标增益值,所述第一目标增益值、所述第二目标增益值和所述第三目标增益值对应于所述M个第一特征值中的同一个特征值,所述第三目标增益值为通过对所述第一目标增益值和所述第二目标增益值进行插值运算得到的。
可选的,所述第一图像包括目标对象,所述M个第一特征值为所述至少一个特征图中与所述目标对象对应的特征值。
可选的,所述M个目标增益值中的每个目标增益值对应于一个反增益值,反增益值用于对所述编码数据进行解码过程中得到的特征值进行处理,所述M个目标增益值中的每个目标增益值与对应的反增益值的乘积在预设范围内。
可选的,应用处理器17031,还用于:
对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括N个第三特征值,每个第三特征值对应一个第一特征值;获取M个目标反增益值,每个目标反增益值对应一个第三特征值;根据所述M个目标反增益值分别对对应的第三特征值 进行增益,得到M个第四特征值;对反增益处理后的至少一个第二特征图进行图像重构,得到第二图像,所述反增益处理后的至少一个第二特征图包括所述M个第四特征值。
可选的,所述M个第四特征值为将M个目标反增益值分别与对应的第三特征值进行乘积运算得到的。
可选的,所述至少一个第二特征图包括第二目标特征图,所述第二目标特征图包括P个第三特征值,所述P个第三特征值中的每个第三特征值对应的目标反增益值相同,所述P为小于或等于所述M的正整数。
可选的,应用处理器17031,还用于:根据目标映射关系确定所述目标压缩码率对应的M个目标反增益值,所述目标映射关系用于表示压缩码率与反增益向量之间的关联关系。
可选的,所述目标映射关系包括多个压缩码率以及多个反增益向量、以及多个压缩码率与多个反增益向量之间的关联关系,所述目标压缩码率为所述多个压缩码率中的一个,所述M个目标反增益值为所述多个反增益向量中的一个向量的元素。
可选的,所述目标映射关系包括目标函数映射关系,当所述目标函数关系的输入包括所述目标压缩码率时,所述目标函数关系的输出包括所述M个目标反增益值。
可选的,所述第二图像包括目标对象,所述M个第三特征值为所述至少一个特征图中与所述目标对象对应的特征值。
可选的,所述M个目标增益值中的每个目标增益值与对应的目标反增益值的乘积在预设范围内。
可选的,所述目标压缩码率大于第一压缩码率,且小于第二压缩码率,所述第一压缩码率对应于M个第一反增益值,所述第二压缩码率对应于M个第二反增益值,所述M个目标反增益值为通过对所述M个第一反增益值和所述M个第二反增益值进行插值运算得到的。
可选的,所述M个第一反增益值包括第一目标反增益值,所述M个第二反增益值包括第二目标反增益值,所述M个目标反增益值包括第三目标反增益值,所述第一目标反增益值、所述第二目标反增益值和所述第三目标反增益值对应于所述M个第一特征值中的同一个特征值,所述第三目标反增益值为通过对所述第一目标反增益值和所述第二目标反增益值进行插值运算得到的。
具体的,应用处理器17031,用于:
获取编码数据;
对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括N个第三特征值,所述N为正整数;
获取M个目标反增益值,每个目标反增益值对应一个第三特征值,所述M为小于或等于N的正整数;
根据所述M个目标反增益值分别对对应的第三特征值进行处理,得到M个第四特征值;
对处理后的至少一个第二特征图进行图像重构,得到第二图像,所述处理后的至少一个第二特征图包括所述M个第四特征值。
可选的,所述M个第四特征值为通过将M个目标反增益值分别与对应的第三特征值进行乘积运算得到的。
可选的,所述至少一个第二特征图包括第二目标特征图,所述第二目标特征图包括P个第三特征值,所述P个第三特征值中的每个第三特征值对应的目标反增益值相同,所述P为小于或等于所述M的正整数。
可选的,应用处理器17031,还用于:获取目标压缩码率;根据目标映射关系确定所述目标压缩码率对应的M个目标反增益值,所述目标映射关系用于表示压缩码率与反增益向量之间的关联关系;其中,所述目标映射关系包括多个压缩码率以及多个反增益向量、以及多个压缩码率与多个反增益向量之间的关联关系,所述目标压缩码率为所述多个压缩码率中的一个,所述M个目标反增益值为所述多个反增益向量中的一个向量的元素;或,所述目标映射关系包括目标函数映射关系,当所述目标函数关系的输入包括所述目标压缩码率时,所述目标函数关系的输出包括所述M个目标反增益值。
可选的,所述第二图像包括目标对象,所述M个第三特征值为所述至少一个特征图中与所述目标对象对应的特征值。
可选的,所述目标压缩码率大于第一压缩码率,且小于第二压缩码率,所述第一压缩码率对应于M个第一反增益值,所述第二压缩码率对应于M个第二反增益值,所述M个目标反增益值为通过对所述M个第一反增益值和所述M个第二反增益值进行插值运算得到的。
可选的,所述M个第一反增益值包括第一目标反增益值,所述M个第二反增益值包括第二目标反增益值,所述M个目标反增益值包括第三目标反增益值,所述第一目标反增益值、所述第二目标反增益值和所述第三目标反增益值对应于所述M个第一特征值中的同一个特征值,所述第三目标反增益值为通过对所述第一目标反增益值和所述第二目标反增益值进行插值运算得到的。
本申请实施例还提供了一种训练设备,请参阅图18,图18是本申请实施例提供的训练设备一种结构示意图,训练设备1800上可以部署有图16对应实施例中所描述的图像处理装置,用于实现图16对应实施例中图像处理装置的功能,具体的,训练设备1800由一个或多个服务器实现,训练设备1800可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)1822(例如,一个或一个以上处理器)和存储器1832,一个或一个以上存储应用程序1842或数据1844的存储介质1830(例如一个或一个以上海量存储设备)。其中,存储器1832和存储介质1830可以是短暂存储或持久存储。存储在存储介质1830的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对训练设备中的一系列指令操作。更进一步地,中央处理器1822可以设置为与存储介质1830通信,在训练设备1800上执行存储介质1830中的一系列指令操作。
训练设备1800还可以包括一个或一个以上电源1826,一个或一个以上有线或无线网络接口1850,一个或一个以上输入输出接口1858,和/或,一个或一个以上操作系统1841,例如Windows ServerTM,Mac OS XTM,UnixTM,LinuxTM,FreeBSDTM等等。
本申请实施例中,中央处理器1822,用于执行图16对应实施例中的图像处理装置执行的图像处理方法。具体的,中央处理器1822,用于
获取第一图像;
根据编码网络对所述第一图像进行特征提取,得到至少一个第一特征图,所述至少一个第一特征图包括N个第一特征值,所述N为正整数;
获取目标压缩码率,所述目标压缩码率对应于M个初始增益值以及M个初始反增益值,每个初始增益值对应一个第一特征值,每个初始反增益值对应一个第一特征值,所述M为小于或等于N的正整数;
根据所述M个初始增益值分别对对应的第一特征值进行处理,得到M个第二特征值;
根据量化网络和熵编码网络对处理后的至少一个第一特征图进行量化和熵编码,得到编码数据以及码率损失,所述增益处理后的至少一个第一特征图包括所述M个第二特征值;
根据熵解码网络对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括M个第三特征值,每个第三特征值对应一个第一特征值;
根据所述M个初始反增益值分别对对应的第三特征值进行处理,得到M个第四特征值;
根据解码网络对处理后的至少一个第二特征图进行图像重构,得到第二图像,所述处理后的至少一个特征图包括所述M个第四特征值;
获取所述第二图像相对于所述第一图像的失真损失;
利用损失函数对第一编解码网络、M个初始增益值以及M个初始反增益值,进行联合训练,直至第一图像与所述第二图像之间的图像失真值达到第一预设程度,所述图像失真值与所述码率损失以及所述失真损失有关,所述编解码网络包括所述编码网络、量化网络、熵编码网络以及熵解码网络;
输出第二编解码网络、M个目标增益值以及M个目标反增益值,所述第二编解码网络为所述第一编解码网络执行过迭代训练后得到的模型,所述M个目标增益值以及M个目标反增益值为所述M个初始增益值以及M个初始反增益值执行过迭代训练后得到的。
可选的,所述增益处理后的至少一个第一特征图进行量化得到的量化数据的信息熵满足预设条件,所述预设条件与所述目标压缩码率有关。
可选的,所述预设条件至少包括:所述目标压缩码率越大,所述量化数据的信息熵越大。
可选的,所述M个第二特征值为通过将M个目标增益值分别与对应的第一特征值进行乘积运算得到的。
可选的,所述至少一个第一特征图包括第一目标特征图,所述第一目标特征图包括P个第一特征值,所述P个第一特征值中的每个第一特征值对应的目标增益值相同,所述P为小于或等于所述M的正整数。
可选的,所述第一图像包括目标对象,所述M个第一特征值为所述至少一个特征图中与所述目标对象对应的特征值。
可选的,所述M个目标增益值中的每个目标增益值与对应的目标反增益值的乘积在预设范围内,所述M个初始增益值中的每个初始增益值与对应的初始反增益值的乘积在预设范围内。
本申请实施例中还提供一种计算机程序产品，当其在计算机上运行时，使得计算机执行如前述图17所示实施例描述的方法中执行设备所执行的步骤，或者，使得计算机执行如前述图18所示实施例描述的方法中训练设备所执行的步骤。
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有用于进行信号处理的程序,当其在计算机上运行时,使得计算机执行如前述图17所示实施例描述的方法中执行设备所执行的步骤,或者,使得计算机执行如前述图18所示实施例描述的方法中训练设备所执行的步骤。
本申请实施例提供的执行设备、训练设备或终端设备具体可以为芯片,芯片包括:处理单元和通信单元,所述处理单元例如可以是处理器,所述通信单元例如可以是输入/输出接口、管脚或电路等。该处理单元可执行存储单元存储的计算机执行指令,以使执行设备内的芯片执行上述图3至图7所示实施例描述的图像处理方法,或者,以使训练设备内的芯片执行上述图13所示实施例描述的图像处理方法。可选地,所述存储单元为所述芯片内的存储单元,如寄存器、缓存等,所述存储单元还可以是所述无线接入设备端内的位于所述芯片外部的存储单元,如只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)等。
具体的,请参阅图19,图19为本申请实施例提供的芯片的一种结构示意图,所述芯片可以表现为神经网络处理器NPU 2000,NPU 2000作为协处理器挂载到主CPU(Host CPU)上,由Host CPU分配任务。NPU的核心部分为运算电路2003,通过控制器2004控制运算电路2003提取存储器中的矩阵数据并进行乘法运算。
在一些实现中,运算电路2003内部包括多个处理单元(Process Engine,PE)。在一些实现中,运算电路2003是二维脉动阵列。运算电路2003还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路2003是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器2002中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器2001中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器(accumulator)2008中。
统一存储器2006用于存放输入数据以及输出数据。权重数据直接通过存储单元访问控制器（Direct Memory Access Controller，DMAC）2005被搬运到权重存储器2002中。输入数据也通过DMAC被搬运到统一存储器2006中。
BIU即Bus Interface Unit，总线接口单元2010，用于AXI总线与DMAC和取指存储器（Instruction Fetch Buffer，IFB）2009的交互。
总线接口单元2010(Bus Interface Unit,简称BIU),用于取指存储器2009从外部存储器获取指令,还用于存储单元访问控制器2005从外部存储器获取输入矩阵A或者权重矩阵B的原数据。
DMAC主要用于将外部存储器DDR中的输入数据搬运到统一存储器2006或将权重数据搬运到权重存储器2002中或将输入数据搬运到输入存储器2001中。
向量计算单元2007包括多个运算处理单元,在需要的情况下,对运算电路的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。主要用于神经网络中非卷积/全连接层网络计算,如Batch Normalization(批归一化),像素级求和,对特征平面进行上采样等。
在一些实现中,向量计算单元2007能将经处理的输出的向量存储到统一存储器2006。例如,向量计算单元2007可以将线性函数和/或非线性函数应用到运算电路2003的输出,例如对卷积层提取的特征平面进行线性插值,再例如累加值的向量,用以生成激活值。在一些实现中,向量计算单元2007生成归一化的值、像素级求和的值,或二者均有。在一些实现中,处理过的输出的向量能够用作到运算电路2003的激活输入,例如用于在神经网络中的后续层中的使用。
控制器2004连接的取指存储器(instruction fetch buffer)2009,用于存储控制器2004使用的指令;
统一存储器2006,输入存储器2001,权重存储器2002以及取指存储器2009均为On-Chip存储器。外部存储器私有于该NPU硬件架构。
其中,上述任一处提到的处理器,可以是一个通用中央处理器,微处理器,ASIC,或一个或多个用于控制上述第一方面方法的程序执行的集成电路。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、ROM、RAM、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,训练设备,或者网络设备等)执行本申请各个实施例所述的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储 在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、训练设备或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、训练设备或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的训练设备、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。

Claims (33)

  1. 一种图像处理方法,其特征在于,所述方法包括:
    获取第一图像;
    对所述第一图像进行特征提取,得到至少一个第一特征图,所述至少一个第一特征图包括N个第一特征值,所述N为正整数;
    获取目标压缩码率,所述目标压缩码率对应于M个目标增益值,每个目标增益值对应一个第一特征值,所述M为小于或等于N的正整数;
    根据所述M个目标增益值分别对对应的第一特征值进行处理,得到M个第二特征值;
    对处理后的至少一个第一特征图进行量化和熵编码,得到编码数据,所述处理后的至少一个第一特征图包括所述M个第二特征值。
  2. 根据权利要求1所述的方法,其特征在于,对处理后的至少一个第一特征图进行量化得到的量化数据的信息熵满足预设条件,所述预设条件与所述目标压缩码率有关。
  3. 根据权利要求2所述的方法,其特征在于,所述预设条件至少包括:
    所述目标压缩码率越大,所述量化数据的信息熵越大。
  4. 根据权利要求1至3任一所述的方法,其特征在于,所述编码数据对应的压缩码率与所述目标压缩码率的差值在预设范围内。
  5. 根据权利要求1至4任一所述的方法,其特征在于,所述M个第二特征值为通过将M个目标增益值分别与对应的第一特征值进行乘积运算得到的。
  6. 根据权利要求1至5任一所述的方法,其特征在于,所述至少一个第一特征图包括第一目标特征图,所述第一目标特征图包括P个第一特征值,所述P个第一特征值中的每个第一特征值对应的目标增益值相同,所述P为小于或等于所述M的正整数。
  7. 根据权利要求1至6任一所述的方法,其特征在于,所述方法还包括:
    根据目标映射关系确定所述目标压缩码率对应的M个目标增益值,所述目标映射关系用于表示压缩码率与M个目标增益值之间的关联关系;
    其中,所述目标映射关系包括多个压缩码率以及多个增益向量、以及多个压缩码率与多个增益向量之间的关联关系,所述目标压缩码率为所述多个压缩码率中的一个,所述M个目标增益值为所述多个增益向量中的一个向量的元素;或,
    所述目标映射关系包括目标函数映射关系,当所述目标函数关系的输入包括所述目标压缩码率时,所述目标函数关系的输出包括所述M个目标增益值。
  8. 根据权利要求1至7任一所述的方法，其特征在于，所述目标压缩码率大于第一压缩码率，且小于第二压缩码率，所述第一压缩码率对应于M个第一增益值，所述第二压缩码率对应于M个第二增益值，所述M个目标增益值为对所述M个第一增益值和所述M个第二增益值进行插值运算得到的。
  9. 根据权利要求1至8任一所述的方法,其特征在于,所述第一图像包括目标对象,所述M个第一特征值为所述至少一个特征图中与所述目标对象对应的特征值。
  10. 根据权利要求1至9任一所述的方法,其特征在于,所述M个目标增益值中的每个目标增益值对应于一个反增益值,反增益值用于对所述编码数据进行解码过程中得到的特征值进行处理,所述M个目标增益值中的每个目标增益值与对应的反增益值的乘积在预设范围内。
  11. 一种图像处理方法,其特征在于,所述方法包括:
    获取编码数据;
    对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括N个第三特征值,所述N为正整数;
    获取M个目标反增益值,每个目标反增益值对应一个第三特征值,所述M为小于或等于N的正整数;
    根据所述M个目标反增益值分别对对应的第三特征值进行处理,得到M个第四特征值;
    对处理后的至少一个第二特征图进行图像重构,得到第二图像,所述处理后的至少一个第二特征图包括所述M个第四特征值。
  12. 根据权利要求11所述的方法,其特征在于,所述M个第四特征值为通过将M个目标反增益值分别与对应的第三特征值进行乘积运算得到的。
  13. 根据权利要求11或12所述的方法,其特征在于,所述至少一个第二特征图包括第二目标特征图,所述第二目标特征图包括P个第三特征值,所述P个第三特征值中的每个第三特征值对应的目标反增益值相同,所述P为小于或等于所述M的正整数。
  14. 根据权利要求11至13任一所述的方法,其特征在于,所述方法还包括:
    获取目标压缩码率;
    根据目标映射关系确定所述目标压缩码率对应的M个目标反增益值,所述目标映射关系用于表示压缩码率与反增益向量之间的关联关系;
    其中,所述目标映射关系包括多个压缩码率以及多个反增益向量、以及多个压缩码率与多个反增益向量之间的关联关系,所述目标压缩码率为所述多个压缩码率中的一个,所述M个目标反增益值为所述多个反增益向量中的一个向量的元素;或,
    所述目标映射关系包括目标函数映射关系，当所述目标函数关系的输入包括所述目标压缩码率时，所述目标函数关系的输出包括所述M个目标反增益值。
  15. 根据权利要求11至14任一所述的方法,其特征在于,所述第二图像包括目标对象,所述M个第三特征值为所述至少一个特征图中与所述目标对象对应的特征值。
  16. 根据权利要求11至15任一所述的方法,其特征在于,所述目标压缩码率大于第一压缩码率,且小于第二压缩码率,所述第一压缩码率对应于M个第一反增益值,所述第二压缩码率对应于M个第二反增益值,所述M个目标反增益值为通过对所述M个第一反增益值和所述M个第二反增益值进行插值运算得到的。
  17. 一种图像处理装置,其特征在于,所述装置包括:
    获取模块,用于获取第一图像;
    特征提取模块,用于对所述第一图像进行特征提取,得到至少一个第一特征图,所述至少一个第一特征图包括N个第一特征值,所述N为正整数;
    所述获取模块,还用于获取目标压缩码率,所述目标压缩码率对应于M个目标增益值,每个目标增益值对应一个第一特征值,所述M为小于或等于N的正整数;
    增益模块,用于根据所述M个目标增益值分别对对应的第一特征值进行处理,得到M个第二特征值;
    量化和熵编码模块,用于对处理后的至少一个第一特征图进行量化和熵编码,得到编码数据,所述处理后的至少一个第一特征图包括所述M个第二特征值。
  18. 根据权利要求17所述的装置,其特征在于,对处理后的至少一个第一特征图进行量化得到的量化数据的信息熵满足预设条件,所述预设条件与所述目标压缩码率有关。
  19. 根据权利要求18所述的装置,其特征在于,所述预设条件至少包括:
    所述目标压缩码率越大,所述量化数据的信息熵越大。
  20. 根据权利要求17至19任一所述的装置,其特征在于,所述编码数据对应的压缩码率与所述目标压缩码率的差值在预设范围内。
  21. 根据权利要求17至20任一所述的装置,其特征在于,所述M个第二特征值为通过将M个目标增益值分别与对应的第一特征值进行乘积运算得到的。
  22. 根据权利要求17至21任一所述的装置,其特征在于,所述至少一个第一特征图包括第一目标特征图,所述第一目标特征图包括P个第一特征值,所述P个第一特征值中的每个第一特征值对应的目标增益值相同,所述P为小于或等于所述M的正整数。
  23. 根据权利要求17至22任一所述的装置,其特征在于,所述装置还包括:
    确定模块,用于根据目标映射关系确定所述目标压缩码率对应的M个目标增益值,所述目标映射关系用于表示压缩码率与M个目标增益值之间的关联关系;
    其中,所述目标映射关系包括多个压缩码率以及多个增益向量、以及多个压缩码率与多个增益向量之间的关联关系,所述目标压缩码率为所述多个压缩码率中的一个,所述M个目标增益值为所述多个增益向量中的一个向量的元素;或,
    所述目标映射关系包括目标函数映射关系,当所述目标函数关系的输入包括所述目标压缩码率时,所述目标函数关系的输出包括所述M个目标增益值。
  24. 根据权利要求17至23任一所述的装置,其特征在于,所述目标压缩码率大于第一压缩码率,且小于第二压缩码率,所述第一压缩码率对应于M个第一增益值,所述第二压缩码率对应于M个第二增益值,所述M个目标增益值为对所述M个第一增益值和所述M个第二增益值进行插值运算得到的。
  25. 根据权利要求17至24任一所述的装置,其特征在于,所述第一图像包括目标对象,所述M个第一特征值为所述至少一个特征图中与所述目标对象对应的特征值。
  26. 根据权利要求17至25任一所述的装置,其特征在于,所述M个目标增益值中的每个目标增益值对应于一个反增益值,反增益值用于对所述编码数据进行解码过程中得到的特征值进行处理,所述M个目标增益值中的每个目标增益值与对应的反增益值的乘积在预设范围内。
  27. 一种图像处理装置,其特征在于,所述装置包括:
    获取模块,用于获取编码数据;
    解码模块,用于对所述编码数据进行熵解码,得到至少一个第二特征图,所述至少一个第二特征图包括N个第三特征值,所述N为正整数;
    所述获取模块,还用于获取M个目标反增益值,每个目标反增益值对应一个第三特征值,所述M为小于或等于N的正整数;
    反增益模块,用于根据所述M个目标反增益值分别对对应的第三特征值进行处理,得到M个第四特征值;
    重构模块,用于对处理后的至少一个第二特征图进行图像重构,得到第二图像,所述处理后的至少一个第二特征图包括所述M个第四特征值。
  28. 根据权利要求27所述的装置,其特征在于,所述M个第四特征值为通过将M个目标反增益值分别与对应的第三特征值进行乘积运算得到的。
  29. 根据权利要求27或28所述的装置，其特征在于，所述至少一个第二特征图包括第二目标特征图，所述第二目标特征图包括P个第三特征值，所述P个第三特征值中的每个第三特征值对应的目标反增益值相同，所述P为小于或等于所述M的正整数。
  30. 根据权利要求27至29任一所述的装置,其特征在于,所述获取模块,还用于获取目标压缩码率;
    所述装置还包括:
    确定模块,用于根据目标映射关系确定所述目标压缩码率对应的M个目标反增益值,所述目标映射关系用于表示压缩码率与反增益向量之间的关联关系;
    其中,所述目标映射关系包括多个压缩码率以及多个反增益向量、以及多个压缩码率与多个反增益向量之间的关联关系,所述目标压缩码率为所述多个压缩码率中的一个,所述M个目标反增益值为所述多个反增益向量中的一个向量的元素;或,
    所述目标映射关系包括目标函数映射关系,当所述目标函数关系的输入包括所述目标压缩码率时,所述目标函数关系的输出包括所述M个目标反增益值。
  31. 根据权利要求27至30任一所述的装置,其特征在于,所述第二图像包括目标对象,所述M个第三特征值为所述至少一个特征图中与所述目标对象对应的特征值。
  32. 根据权利要求27至31任一所述的装置,其特征在于,所述目标压缩码率大于第一压缩码率,且小于第二压缩码率,所述第一压缩码率对应于M个第一反增益值,所述第二压缩码率对应于M个第二反增益值,所述M个目标反增益值为通过对所述M个第一反增益值和所述M个第二反增益值进行插值运算得到的。
  33. 一种图像处理设备,包括:相互耦合的非易失性存储器和处理器,所述处理器调用存储在所述存储器中的程序代码以执行如权利要求1-16任一项所描述的方法。
PCT/CN2021/075405 2020-02-07 2021-02-05 一种图像处理方法以及相关设备 WO2021155832A1 (zh)

Priority Applications (9)

Application Number Priority Date Filing Date Title
AU2021215764A AU2021215764A1 (en) 2020-02-07 2021-02-05 Image processing method and related device
KR1020227030515A KR20220137076A (ko) 2020-02-07 2021-02-05 이미지 프로세싱 방법 및 관련된 디바이스
CA3167227A CA3167227A1 (en) 2020-02-07 2021-02-05 Method and device for feature extraction and compression in image processing
EP21751079.1A EP4090022A4 (en) 2020-02-07 2021-02-05 IMAGE PROCESSING METHOD AND DEVICE ASSOCIATED
JP2022548020A JP2023512570A (ja) 2020-02-07 2021-02-05 画像処理方法および関連装置
BR112022015510A BR112022015510A2 (pt) 2020-02-07 2021-02-05 Método de processamento de imagem e dispositivos relacionados
CN202180013213.6A CN115088257A (zh) 2020-02-07 2021-02-05 一种图像处理方法以及相关设备
MX2022009686A MX2022009686A (es) 2020-02-07 2021-02-05 Metodo de procesamiento de imagenes y dispositivo relacionado.
US17/881,432 US20220375133A1 (en) 2020-02-07 2022-08-04 Image processing method and related device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010082808.4A CN113259665B (zh) 2020-02-07 2020-02-07 一种图像处理方法以及相关设备
CN202010082808.4 2020-02-07

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/881,432 Continuation US20220375133A1 (en) 2020-02-07 2022-08-04 Image processing method and related device

Publications (1)

Publication Number Publication Date
WO2021155832A1 true WO2021155832A1 (zh) 2021-08-12

Family

ID=77200542

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/075405 WO2021155832A1 (zh) 2020-02-07 2021-02-05 一种图像处理方法以及相关设备

Country Status (10)

Country Link
US (1) US20220375133A1 (zh)
EP (1) EP4090022A4 (zh)
JP (1) JP2023512570A (zh)
KR (1) KR20220137076A (zh)
CN (2) CN113259665B (zh)
AU (1) AU2021215764A1 (zh)
BR (1) BR112022015510A2 (zh)
CA (1) CA3167227A1 (zh)
MX (1) MX2022009686A (zh)
WO (1) WO2021155832A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822955A (zh) * 2021-11-18 2021-12-21 腾讯医疗健康(深圳)有限公司 图像数据处理方法、装置、计算机设备及存储介质
CN114051082A (zh) * 2021-10-19 2022-02-15 河南师范大学 基于失真度和信息增益比的隐写检测特征选取方法及装置
CN114630125A (zh) * 2022-03-23 2022-06-14 徐州百事利电动车业有限公司 基于人工智能与大数据的车辆图像压缩方法与系统
WO2023051335A1 (zh) * 2021-09-30 2023-04-06 华为技术有限公司 数据编码方法、数据解码方法以及数据处理装置
JP7476631B2 (ja) 2019-05-22 2024-05-01 富士通株式会社 画像コーディング方法及び装置並びに画像デコーディング方法及び装置

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113840145B (zh) * 2021-09-23 2023-06-09 鹏城实验室 一种面向人眼观看和视觉分析联合优化的图像压缩方法
CN116778003A (zh) * 2022-03-10 2023-09-19 华为技术有限公司 一种特征图编码、特征图解码方法及装置
CN114944945A (zh) * 2022-05-09 2022-08-26 江苏易安联网络技术有限公司 一种基于变分自编码器和属性的动态访问控制方法
CN118250463A (zh) * 2022-12-23 2024-06-25 维沃移动通信有限公司 图像处理方法、装置及设备

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190132591A1 (en) * 2017-10-26 2019-05-02 Intel Corporation Deep learning based quantization parameter estimation for video encoding
CN110225342A (zh) * 2019-04-10 2019-09-10 中国科学技术大学 基于语义失真度量的视频编码的比特分配系统及方法

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103903271B (zh) * 2014-04-11 2017-01-18 北京航空航天大学 一种针对自然图像和基于dwt压缩篡改图像的图像的取证方法
ITUB20153912A1 (it) * 2015-09-25 2017-03-25 Sisvel Tech S R L Metodi e apparati per codificare e decodificare immagini digitali mediante superpixel
CN109996066A (zh) * 2017-12-29 2019-07-09 富士通株式会社 图像编码装置,图像解码装置和电子设备
US11257254B2 (en) * 2018-07-20 2022-02-22 Google Llc Data compression using conditional entropy models
CN110222717B (zh) * 2019-05-09 2022-01-14 华为技术有限公司 图像处理方法和装置
CN110163370B (zh) * 2019-05-24 2021-09-17 上海肇观电子科技有限公司 深度神经网络的压缩方法、芯片、电子设备及介质
CN110222758B (zh) * 2019-05-31 2024-04-23 腾讯科技(深圳)有限公司 一种图像处理方法、装置、设备及存储介质
WO2022155245A1 (en) * 2021-01-12 2022-07-21 Qualcomm Incorporated Variable bit rate compression using neural network models
US11943460B2 (en) * 2021-01-12 2024-03-26 Qualcomm Incorporated Variable bit rate compression using neural network models

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190132591A1 (en) * 2017-10-26 2019-05-02 Intel Corporation Deep learning based quantization parameter estimation for video encoding
CN110225342A (zh) * 2019-04-10 2019-09-10 中国科学技术大学 基于语义失真度量的视频编码的比特分配系统及方法

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
See also references of EP4090022A4
TONG CHEN; HAOJIE LIU; ZHAN MA; QIU SHEN; XUN CAO; YAO WANG: "Neural Image Compression via Non-Local Attention Optimization and Improved Context Modeling", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 11 October 2019 (2019-10-11), 201 Olin Library Cornell University Ithaca, NY 14853, XP081514975 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7476631B2 (ja) 2019-05-22 2024-05-01 富士通株式会社 画像コーディング方法及び装置並びに画像デコーディング方法及び装置
WO2023051335A1 (zh) * 2021-09-30 2023-04-06 华为技术有限公司 数据编码方法、数据解码方法以及数据处理装置
CN114051082A (zh) * 2021-10-19 2022-02-15 河南师范大学 基于失真度和信息增益比的隐写检测特征选取方法及装置
CN114051082B (zh) * 2021-10-19 2023-10-27 河南师范大学 基于失真度和信息增益比的隐写检测特征选取方法及装置
CN113822955A (zh) * 2021-11-18 2021-12-21 腾讯医疗健康(深圳)有限公司 图像数据处理方法、装置、计算机设备及存储介质
CN113822955B (zh) * 2021-11-18 2022-02-22 腾讯医疗健康(深圳)有限公司 图像数据处理方法、装置、计算机设备及存储介质
CN114630125A (zh) * 2022-03-23 2022-06-14 徐州百事利电动车业有限公司 基于人工智能与大数据的车辆图像压缩方法与系统
CN114630125B (zh) * 2022-03-23 2023-10-27 徐州百事利电动车业有限公司 基于人工智能与大数据的车辆图像压缩方法与系统

Also Published As

Publication number Publication date
AU2021215764A1 (en) 2022-09-15
CN115088257A (zh) 2022-09-20
CN113259665A (zh) 2021-08-13
MX2022009686A (es) 2022-11-16
CA3167227A1 (en) 2021-08-12
BR112022015510A2 (pt) 2022-09-27
KR20220137076A (ko) 2022-10-11
JP2023512570A (ja) 2023-03-27
EP4090022A1 (en) 2022-11-16
EP4090022A4 (en) 2023-06-07
CN113259665B (zh) 2022-08-09
US20220375133A1 (en) 2022-11-24

Similar Documents

Publication Publication Date Title
WO2021155832A1 (zh) 一种图像处理方法以及相关设备
US11272188B2 (en) Compression for deep neural network
US11729406B2 (en) Video compression using deep generative models
US20210125070A1 (en) Generating a compressed representation of a neural network with proficient inference speed and power consumption
WO2022021938A1 (zh) 图像处理方法与装置、神经网络训练的方法与装置
WO2022022176A1 (zh) 一种图像处理方法以及相关设备
WO2023231794A1 (zh) 一种神经网络参数量化方法和装置
WO2022179588A1 (zh) 一种数据编码方法以及相关设备
WO2022028197A1 (zh) 一种图像处理方法及其设备
WO2023207836A1 (zh) 一种图像编码方法、图像解压方法以及装置
WO2021175278A1 (zh) 一种模型更新方法以及相关装置
US20240078414A1 (en) Parallelized context modelling using information shared between patches
US20240267568A1 (en) Attention Based Context Modelling for Image and Video Compression
CN115409697A (zh) 一种图像处理方法及相关装置
WO2023174256A1 (zh) 一种数据压缩方法以及相关设备
TWI826160B (zh) 圖像編解碼方法和裝置
Fraihat et al. A novel lossy image compression algorithm using multi-models stacked AutoEncoders
CN114501031B (zh) 一种压缩编码、解压缩方法以及装置
TW202348029A (zh) 使用限幅輸入數據操作神經網路
CN114693811A (zh) 一种图像处理方法以及相关设备
WO2024007820A1 (zh) 数据编解码方法及相关设备
CN118318441A (zh) 特征图编解码方法和装置
TW202345034A (zh) 使用條件權重操作神經網路

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21751079

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022548020

Country of ref document: JP

Kind code of ref document: A

Ref document number: 3167227

Country of ref document: CA

ENP Entry into the national phase

Ref document number: 2021751079

Country of ref document: EP

Effective date: 20220809

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112022015510

Country of ref document: BR

ENP Entry into the national phase

Ref document number: 20227030515

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021215764

Country of ref document: AU

Date of ref document: 20210205

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 112022015510

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20220805