WO2019105179A1 - Intra prediction method and apparatus for color components (颜色分量的帧内预测方法及装置) - Google Patents

Intra prediction method and apparatus for color components (颜色分量的帧内预测方法及装置)

Info

Publication number
WO2019105179A1
WO2019105179A1 · PCT/CN2018/113779 · CN2018113779W
Authority
WO
WIPO (PCT)
Prior art keywords
color component
information
data
layer
input data
Application number
PCT/CN2018/113779
Other languages
English (en)
French (fr)
Inventor
王莉
Original Assignee
杭州海康威视数字技术股份有限公司
Priority date
Application filed by 杭州海康威视数字技术股份有限公司
Publication of WO2019105179A1

Classifications

    • G06N 3/04 — Neural networks; architecture, e.g. interconnection topology
    • G06N 3/08 — Neural networks; learning methods
    • G06T 7/90 — Image analysis; determination of colour characteristics
    • H04N 19/117 — Adaptive coding of digital video signals; filters, e.g. for pre-processing or post-processing
    • H04N 19/176 — Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N 19/186 — Adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component
    • H04N 19/194 — Adaptive coding characterised by the adaptation method, the method being iterative or recursive involving only two passes
    • H04N 19/593 — Predictive coding involving spatial prediction techniques
    • H04N 19/82 — Details of filtering operations specially adapted for video compression, involving filtering within a prediction loop

Definitions

  • the present disclosure relates to the field of video coding and decoding, and in particular, to an intra prediction method and apparatus for color components.
  • Video compression coding removes redundancy from image frames in various dimensions, adopting prediction techniques to reduce redundancy in the spatial and temporal domains and thereby increase the coding compression ratio.
  • In the YUV color space, pixel information (also referred to as color information) includes information of a luminance component Y, a chrominance component U, and a chrominance component V. The existing intra prediction technique for the chrominance components is based on an assumed linear correlation between the luminance component and the chrominance components.
  • In that technique, the luminance values of the reconstructed pixels around the target area are first downsampled; a scaling parameter and an offset parameter for the target area are then derived from the downsampled luminance values and the chromaticity values of those reconstructed pixels; the reconstructed luminance values within the target area are likewise downsampled (equivalent to downsampling the luminance values of the reconstructed pixels), and the chromaticity prediction values of the pixels in the target region are finally obtained from the scaling parameter and the offset parameter.
  • This intra prediction technique rests on an assumed linear correlation between the luminance component and the chrominance component; in reality, a linear relationship cannot accurately express the relationship between the two, so the prediction result of the chrominance component predicted on this basis has low reliability.
  • Embodiments of the present disclosure provide an intra prediction method and apparatus for color components, which can solve the problem of low reliability of prediction results of color components in the prior art.
  • the technical solution is as follows:
  • In a first aspect, an intra prediction method for a color component is provided, the method comprising:
  • inputting first input data to a convolutional neural network through a first channel, the first input data including information of a first color component of a target area in an image frame to be processed;
  • acquiring first output data output by the convolutional neural network, the first output data comprising a predicted value, produced by the convolutional neural network, of the information of a second color component of the target region;
  • wherein the first color component and the second color component are different color components of the target area (see the sketch below).
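As an illustration only, the following Python (PyTorch) sketch shows the shape of this prediction step; the function name, the model, and the tensor shapes are assumptions, not details fixed by the disclosure.

```python
# Hedged sketch: first-channel input -> CNN -> predicted second component.
# `cnn` is any trained torch.nn.Module; shapes are illustrative assumptions.
import torch

def predict_second_component(cnn: torch.nn.Module,
                             first_input: torch.Tensor) -> torch.Tensor:
    """first_input: (1, 1, H, W) plane built from the first color component
    of the target area; returns the predicted second-component plane."""
    with torch.no_grad():
        return cnn(first_input)
```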
  • Optionally, the first input data includes information of the reconstructed second color component in a first peripheral area and information of the reconstructed first color component in the target area, where the first peripheral area of the target area is a strip-shaped area located to the left of and/or above the target area.
  • Optionally, before the inputting of the first input data to the convolutional neural network through the first channel, the method further includes: determining a sampling rate relationship between the first color component and the second color component in the image frame to be processed, and determining the first input data according to the sampling rate relationship.
  • Determining the first input data according to the sampling rate relationship, based on the information of the reconstructed second color component in the first peripheral area and the information of the reconstructed first color component in the target area, includes:
  • when the sampling rate relationship between the first color component and the second color component is that the sampling rate ratio is 1:1, determining the information of the reconstructed second color component in the first peripheral region of the target region and the information of the reconstructed first color component in the target area as the first input data;
  • when the sampling rate ratio is greater than 1:1, upsampling the information of the reconstructed second color component in the first peripheral region based on the sampling rate ratio, so that the distribution density of the second color component in the first peripheral region after upsampling is equal to the distribution density of the first color component in the target region, and determining the information of the second color component obtained by upsampling and the information of the reconstructed first color component in the target area as the first input data;
  • when the sampling rate ratio is less than 1:1, downsampling the information of the reconstructed second color component in the first peripheral region based on the sampling rate ratio, so that the distribution density of the second color component in the first peripheral region after downsampling is equal to the distribution density of the first color component in the target region, and determining the information of the second color component obtained by downsampling and the information of the reconstructed first color component in the target area as the first input data (see the resampling sketch below).
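A hedged sketch of this data assembly, using nearest-neighbour resampling as one possible filter (the disclosure does not fix the resampling method); the array shapes and helper names are illustrative.

```python
import numpy as np

def resample_nearest(plane: np.ndarray, factor: float) -> np.ndarray:
    """Nearest-neighbour resampling of a 2-D sample plane by `factor`
    (> 1 upsamples, < 1 downsamples)."""
    h, w = plane.shape
    new_h = max(1, round(h * factor))
    new_w = max(1, round(w * factor))
    rows = np.minimum((np.arange(new_h) / factor).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / factor).astype(int), w - 1)
    return plane[np.ix_(rows, cols)]

def build_first_input(first_in_target: np.ndarray,
                      second_in_peripheral: np.ndarray,
                      ratio: float):
    """ratio = sampling rate of the first component / that of the second.
    Returns the two planes forming the first input data, with the peripheral
    second-component samples resampled so their distribution density matches
    that of the first component in the target area."""
    if ratio == 1.0:
        second = second_in_peripheral                     # 1:1 -> use as-is
    else:
        second = resample_nearest(second_in_peripheral, ratio)
    return first_in_target, second
```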
  • the method further includes:
  • training an initial convolutional neural network to obtain the convolutional neural network, where the training process includes:
  • inputting second input data to the initial convolutional neural network through the first channel, the second input data comprising information of a first color component of a training region in a first specified image frame, where the training area in the first specified image frame has the same size as the target area, and the second input data is acquired in the same manner as the first input data;
  • using the raw data corresponding to the training area in the first specified image frame as a training label and training the initial convolutional neural network to obtain the convolutional neural network, the raw data being information consisting of the known second color component in the training region of the first specified image frame (see the training sketch below).
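A minimal training-loop sketch under common assumptions (PyTorch, the Adam optimiser, and mean-squared error against the raw second-component data used as the training label); the dataset object and hyper-parameters are not specified by the disclosure.

```python
import torch

def train(initial_cnn: torch.nn.Module, loader, epochs: int = 10, lr: float = 1e-4):
    """`loader` yields (second_input, raw_label) pairs, where raw_label is the
    known second-color-component data of the training region (the training tag)."""
    opt = torch.optim.Adam(initial_cnn.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for second_input, raw_label in loader:
            pred = initial_cnn(second_input)
            loss = loss_fn(pred, raw_label)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return initial_cnn   # the trained convolutional neural network
```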
  • the method further includes:
  • determining at least one first side information data, each first side information data including information other than the information of the color components included in the first input data; and
  • inputting the at least one first side information data to the convolutional neural network through at least one second channel respectively, the at least one second channel being in one-to-one correspondence with the at least one first side information data.
  • the determining the at least one first side information data includes:
  • Optionally, the color coding format of the image frame to be processed is a YUV format, and the first input data includes information of color components of first sample blocks of x rows and y columns, where x and y are both integers greater than or equal to 1.
  • Determining the at least one first side information data based on the related information of the reconstructed first color component in the target area includes: combining the identification values of all intra prediction modes of the reconstructed first color component in the target area into one first side information data.
  • Determining the at least one first side information data based on the information of the reconstructed second color component in a second peripheral area of the target area includes: computing the average value of that information, one first side information data then consisting of x rows and y columns of the average value (see the sketch below).
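The two side-information channels just described might be assembled as follows; the helper names and the x-by-y plane shapes are illustrative assumptions.

```python
import numpy as np

def mode_id_channel(intra_mode_ids: np.ndarray) -> np.ndarray:
    """One side-information plane built from the identification values of the
    intra prediction modes of the first color component in the target area
    (here assumed already arranged per sample block)."""
    return intra_mode_ids.astype(np.float32)

def mean_channel(second_in_peripheral: np.ndarray, x: int, y: int) -> np.ndarray:
    """Broadcast the mean of the reconstructed second color component in the
    second peripheral area into an x-by-y plane."""
    return np.full((x, y), second_in_peripheral.mean(), dtype=np.float32)
```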
  • the method further includes:
  • when the value range of any one of the at least one first side information data differs from the value range of the first input data, that side information data is normalized, so that the value range of the processed side information data is the same as the value range of the first input data (see the sketch below).
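A min-max normalization sketch for matching value ranges; the target range bounds are assumptions (e.g., the range of 8-bit sample values).

```python
import numpy as np

def normalise_to_range(side: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Rescale a side-information plane into [lo, hi], the assumed value
    range of the first input data."""
    s_min, s_max = side.min(), side.max()
    if s_max == s_min:                      # constant plane: map to lower bound
        return np.full_like(side, lo, dtype=np.float32)
    return lo + (side - s_min) * (hi - lo) / (s_max - s_min)
```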
  • the method further includes:
  • training an initial convolutional neural network to obtain the convolutional neural network, where the training process includes:
  • inputting third input data to the initial convolutional neural network through the first channel, where the manner of acquiring the third input data is the same as the manner of acquiring the first input data;
  • inputting at least one second side information data to the initial convolutional neural network through the at least one second channel, the at least one second channel being in one-to-one correspondence with the at least one second side information data, and the at least one second side information data being acquired in the same manner as the at least one first side information data;
  • using the raw data of the second color component corresponding to the training area in a second specified image frame as a training label and training the initial convolutional neural network to obtain the convolutional neural network, the raw data being information consisting of the known second color component in the training region of the second specified image frame.
  • Optionally, the convolutional neural network includes an input layer, a hidden layer, and an output layer;
  • before the acquiring of the first output data output by the convolutional neural network, the method further includes:
  • when only one channel of the input layer has input data, performing multidimensional convolution filtering and nonlinear mapping on the first input data through the input layer to obtain output data of the input layer;
  • when at least two channels of the input layer have input data, performing multidimensional convolution filtering and nonlinear mapping on the data input through each channel respectively, and merging the multidimensionally convolution-filtered and nonlinearly mapped data of the different channels to obtain the output data of the input layer;
  • performing multidimensional convolution filtering and nonlinear mapping on the output data of the input layer through the hidden layer to obtain high-dimensional image data; and
  • aggregating (e.g., summing) the high-dimensional image data through the output layer to obtain the first output data.
  • Optionally, the input layer includes, for each channel, at least one sequentially connected convolution layer, as well as a merge layer, and each convolution layer includes a feature extraction layer and a feature mapping layer.
  • Performing multidimensional convolution filtering and nonlinear mapping on the data input through each channel, and merging the results of the different channels to obtain the output data of the input layer, includes:
  • in each convolution layer, performing multidimensional convolution filtering on the input data through the feature extraction layer, and nonlinearly mapping the filtered data through the feature mapping layer; and
  • merging the data processed by the at least one convolution layer corresponding to the different channels through the merge layer to obtain the output data of the input layer.
  • the hidden layer includes at least one convolution layer sequentially connected, and each of the convolution layers includes a feature extraction layer and a feature mapping layer.
  • Performing multidimensional convolution filtering and nonlinear mapping on the output data of the input layer through the hidden layer to obtain high-dimensional image data includes:
  • in each convolution layer, performing multidimensional convolution filtering on the input data through the feature extraction layer, and nonlinearly mapping the filtered data through the feature mapping layer; and
  • using the data processed by the at least one convolution layer as the high-dimensional image data (see the network sketch below).
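Putting the three parts together, here is a hedged PyTorch sketch of one network with this input-layer / hidden-layer / output-layer structure. Channel counts, kernel sizes, and depths are assumptions, and the 1x1 convolution stands in for the output layer's aggregation (a learned weighted sum over feature maps; the disclosure names summing as one example).

```python
import torch
import torch.nn as nn

class IntraPredCNN(nn.Module):
    def __init__(self, n_channels: int = 2, feat: int = 32):
        super().__init__()
        # Input layer: one conv branch per channel (feature extraction conv
        # followed by a ReLU feature mapping), then a merge by concatenation.
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(1, feat, 3, padding=1), nn.ReLU())
            for _ in range(n_channels))
        # Hidden layer: sequentially connected convolution layers.
        self.hidden = nn.Sequential(
            nn.Conv2d(n_channels * feat, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU())
        # Output layer: aggregate the high-dimensional data into one plane.
        self.out = nn.Conv2d(feat, 1, 1)

    def forward(self, planes: list[torch.Tensor]) -> torch.Tensor:
        """planes: one (N, 1, H, W) tensor per input channel
        (first input data plus any side-information planes)."""
        merged = torch.cat([b(p) for b, p in zip(self.branches, planes)], dim=1)
        return self.out(self.hidden(merged))
```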
  • Optionally, when the color coding format of the image frame to be processed is a YUV format, the first color component and the second color component are two of the luminance component Y, the chrominance component U, and the chrominance component V;
  • when the color coding format of the image frame to be processed is an RGB format, the first color component and the second color component are two of a red component, a green component, and a blue component.
  • In a second aspect, an intra prediction apparatus for a color component is provided, the apparatus comprising:
  • a first input module configured to input first input data to a convolutional neural network through a first channel, the first input data including information of a first color component of a target area in an image frame to be processed; and
  • an acquiring module configured to acquire first output data output by the convolutional neural network, the first output data comprising a predicted value, produced by the convolutional neural network, of the information of a second color component of the target region;
  • wherein the first color component and the second color component are different color components of the target area.
  • Optionally, the first input data includes information of the reconstructed second color component in a first peripheral area and information of the reconstructed first color component in the target area, where the first peripheral area of the target area is a strip-shaped area located to the left of and/or above the target area.
  • the device further includes:
  • a first determining module configured to determine a sampling rate relationship between the first color component and the second color component in the image frame to be processed before the inputting the first input data to the convolutional neural network through the first channel;
  • a second determining module configured to determine the first input data according to the sampling rate relationship, wherein, in the first input data, the distribution density of the second color component in the first peripheral area is equal to the distribution density of the first color component in the target area.
  • the second determining module includes:
  • a first acquiring submodule configured to acquire information of the reconstructed second color component in the first peripheral area of the target area;
  • a second acquiring submodule configured to acquire information of the reconstructed first color component in the target area; and
  • a first determining submodule configured to determine the first input data according to the sampling rate relationship, based on the information of the reconstructed second color component in the first peripheral area and the information of the reconstructed first color component in the target area.
  • the first determining submodule is configured to:
  • when the sampling rate relationship between the first color component and the second color component is that the sampling rate ratio is 1:1, determine the information of the reconstructed second color component in the first peripheral region of the target region and the information of the reconstructed first color component in the target area as the first input data;
  • when the sampling rate ratio is greater than 1:1, upsample the information of the reconstructed second color component in the first peripheral region based on the sampling rate ratio, so that the distribution density of the second color component in the first peripheral region after upsampling is equal to the distribution density of the first color component in the target region, and determine the information of the second color component obtained by upsampling and the information of the reconstructed first color component in the target area as the first input data;
  • when the sampling rate ratio is less than 1:1, downsample the information of the reconstructed second color component in the first peripheral region based on the sampling rate ratio, so that the distribution density of the second color component in the first peripheral region after downsampling is equal to the distribution density of the first color component in the target region, and determine the information of the second color component obtained by downsampling and the information of the reconstructed first color component in the target area as the first input data.
  • the device further includes:
  • a first training module configured to train an initial convolutional neural network to obtain the convolutional neural network, where the training process includes:
  • inputting second input data to the initial convolutional neural network through the first channel, the second input data comprising information of a first color component of a training region in a first specified image frame, where the training area in the first specified image frame has the same size as the target area, and the second input data is acquired in the same manner as the first input data; and
  • using the raw data corresponding to the training area in the first specified image frame as a training label and training the initial convolutional neural network to obtain the convolutional neural network, the raw data being information consisting of the known second color component in the training region of the first specified image frame.
  • the device further includes:
  • a third determining module configured to determine at least one first side information data, each of the first side information data includes information other than information of a color component included in the first input data;
  • a second input module configured to input the at least one first side information data to the convolutional neural network through at least one second channel, the at least one second channel being in one-to-one correspondence with the at least one first side information data.
  • the third determining module includes:
  • a second determining submodule configured to determine the at least one first side information data based on related information of the reconstructed first color component in the target area
  • a third determining submodule configured to determine the at least one first side information data based on information of the reconstructed second color component in a second peripheral area of the target area, the second peripheral area of the target area being a strip-shaped area located to the left of and/or above the target area.
  • Optionally, the color coding format of the image frame to be processed is a YUV format, and the first input data includes information of color components of first sample blocks of x rows and y columns, where x and y are both integers greater than or equal to 1.
  • The second determining submodule is configured to combine the identification values of all intra prediction modes of the reconstructed first color component in the target area into one first side information data.
  • The third determining submodule is configured to compute the average value of the information of the reconstructed second color component in the second peripheral area, one first side information data then consisting of x rows and y columns of the average value.
  • the device further includes:
  • a normalization module configured to perform normalization processing on any one of the at least one first side information data whose value range differs from the value range of the first input data, so that the value range of the processed side information data is the same as the value range of the first input data.
  • the device further includes:
  • a second training module configured to train an initial convolutional neural network to obtain the convolutional neural network, where the training process includes:
  • inputting third input data to the initial convolutional neural network through the first channel, where the manner of acquiring the third input data is the same as the manner of acquiring the first input data;
  • inputting at least one second side information data to the initial convolutional neural network through the at least one second channel, the at least one second channel being in one-to-one correspondence with the at least one second side information data, and the at least one second side information data being acquired in the same manner as the at least one first side information data; and
  • using the raw data of the second color component corresponding to the training area in a second specified image frame as a training label and training the initial convolutional neural network to obtain the convolutional neural network, the raw data being information consisting of the known second color component in the training region of the second specified image frame.
  • Optionally, the convolutional neural network includes an input layer, a hidden layer, and an output layer; the device further includes:
  • a first processing module configured to, before the first output data output by the convolutional neural network is acquired, perform multidimensional convolution filtering and nonlinear mapping on the first input data through the input layer when only one channel of the input layer has input data, to obtain output data of the input layer;
  • a second processing module configured to, when at least two channels of the input layer have input data, perform multidimensional convolution filtering and nonlinear mapping on the data input through each channel respectively through the input layer, and merge the multidimensionally convolution-filtered and nonlinearly mapped data of the different channels to obtain the output data of the input layer;
  • a high-dimensional processing module configured to perform multidimensional convolution filtering and nonlinear mapping on the output data of the input layer through the hidden layer to obtain high-dimensional image data; and
  • an aggregation module configured to aggregate the high-dimensional image data through the output layer to obtain the first output data.
  • Optionally, the input layer includes, for each channel, at least one sequentially connected convolution layer, as well as a merge layer, and each convolution layer includes a feature extraction layer and a feature mapping layer.
  • the second processing module is configured to:
  • in each convolution layer, perform multidimensional convolution filtering on the input data through the feature extraction layer, and nonlinearly map the filtered data through the feature mapping layer; and
  • merge the data processed by the at least one convolution layer corresponding to the different channels through the merge layer to obtain the output data of the input layer.
  • the hidden layer includes at least one convolution layer sequentially connected, and each of the convolution layers includes a feature extraction layer and a feature mapping layer.
  • the high dimensional processing module is configured to:
  • in each convolution layer, perform multidimensional convolution filtering on the input data through the feature extraction layer, and nonlinearly map the filtered data through the feature mapping layer; and
  • use the data processed by the at least one convolution layer as the high-dimensional image data.
  • Optionally, when the color coding format of the image frame to be processed is a YUV format, the first color component and the second color component are two of the luminance component Y, the chrominance component U, and the chrominance component V;
  • when the color coding format of the image frame to be processed is an RGB format, the first color component and the second color component are two of a red component, a green component, and a blue component.
  • In a further aspect, a computer device is provided, where the computer device is an encoding end device or a decoding end device, and the computer device includes:
  • a processor and a memory for storing instructions executable by the processor;
  • wherein the processor is configured to perform the intra prediction method for a color component provided by the above first aspect, for example:
  • inputting first input data to a convolutional neural network through a first channel, the first input data including information of a first color component of a target area in an image frame to be processed;
  • acquiring first output data output by the convolutional neural network, the first output data comprising a predicted value, produced by the convolutional neural network, of the information of a second color component of the target region;
  • wherein the first color component and the second color component are different color components of the target area.
  • Optionally, the first input data includes information of the reconstructed second color component in a first peripheral area and information of the reconstructed first color component in the target area, where the first peripheral area of the target area is a strip-shaped area located to the left of and/or above the target area.
  • Optionally, the processor is further configured to: before the first input data is input to the convolutional neural network through the first channel, determine a sampling rate relationship between the first color component and the second color component in the image frame to be processed, and determine the first input data according to the sampling rate relationship.
  • Determining the first input data according to the sampling rate relationship, based on the information of the reconstructed second color component in the first peripheral area and the information of the reconstructed first color component in the target area, includes:
  • when the sampling rate relationship between the first color component and the second color component is that the sampling rate ratio is 1:1, determining the information of the reconstructed second color component in the first peripheral region of the target region and the information of the reconstructed first color component in the target area as the first input data;
  • when the sampling rate ratio is greater than 1:1, upsampling the information of the reconstructed second color component in the first peripheral region based on the sampling rate ratio, so that the distribution density of the second color component in the first peripheral region after upsampling is equal to the distribution density of the first color component in the target region, and determining the information of the second color component obtained by upsampling and the information of the reconstructed first color component in the target area as the first input data;
  • when the sampling rate ratio is less than 1:1, downsampling the information of the reconstructed second color component in the first peripheral region based on the sampling rate ratio, so that the distribution density of the second color component in the first peripheral region after downsampling is equal to the distribution density of the first color component in the target region, and determining the information of the second color component obtained by downsampling and the information of the reconstructed first color component in the target area as the first input data.
  • the processor is further configured to:
  • train an initial convolutional neural network to obtain the convolutional neural network, where the training process includes:
  • inputting second input data to the initial convolutional neural network through the first channel, the second input data comprising information of a first color component of a training region in a first specified image frame, where the training area in the first specified image frame has the same size as the target area, and the second input data is acquired in the same manner as the first input data; and
  • using the raw data corresponding to the training area in the first specified image frame as a training label and training the initial convolutional neural network to obtain the convolutional neural network, the raw data being information consisting of the known second color component in the training region of the first specified image frame.
  • the processor is further configured to:
  • determine at least one first side information data, each first side information data including information other than the information of the color components included in the first input data; and
  • input the at least one first side information data to the convolutional neural network through at least one second channel respectively, the at least one second channel being in one-to-one correspondence with the at least one first side information data.
  • the determining the at least one first side information data includes:
  • Optionally, the color coding format of the image frame to be processed is a YUV format, and the first input data includes information of color components of first sample blocks of x rows and y columns, where x and y are both integers greater than or equal to 1.
  • Determining the at least one first side information data based on the related information of the reconstructed first color component in the target area includes: combining the identification values of all intra prediction modes of the reconstructed first color component in the target area into one first side information data.
  • Determining the at least one first side information data based on the information of the reconstructed second color component in the second peripheral area of the target area includes: computing the average value of that information, one first side information data then consisting of x rows and y columns of the average value.
  • the processor is further configured to:
  • when the value range of any one of the at least one first side information data differs from the value range of the first input data, normalize that side information data, so that the value range of the processed side information data is the same as the value range of the first input data.
  • the processor is further configured to:
  • train an initial convolutional neural network to obtain the convolutional neural network, where the training process includes:
  • inputting third input data to the initial convolutional neural network through the first channel, where the manner of acquiring the third input data is the same as the manner of acquiring the first input data;
  • inputting at least one second side information data to the initial convolutional neural network through the at least one second channel, the at least one second channel being in one-to-one correspondence with the at least one second side information data, and the at least one second side information data being acquired in the same manner as the at least one first side information data; and
  • using the raw data of the second color component corresponding to the training area in a second specified image frame as a training label and training the initial convolutional neural network to obtain the convolutional neural network, the raw data being information consisting of the known second color component in the training region of the second specified image frame.
  • Optionally, the convolutional neural network includes an input layer, a hidden layer, and an output layer;
  • the processor is further configured to: before the first output data output by the convolutional neural network is acquired, when only one channel of the input layer has input data, perform multidimensional convolution filtering and nonlinear mapping on the first input data through the input layer to obtain output data of the input layer;
  • when at least two channels of the input layer have input data, perform multidimensional convolution filtering and nonlinear mapping on the data input through each channel respectively through the input layer, and merge the multidimensionally convolution-filtered and nonlinearly mapped data of the different channels to obtain the output data of the input layer;
  • perform multidimensional convolution filtering and nonlinear mapping on the output data of the input layer through the hidden layer to obtain high-dimensional image data; and
  • aggregate the high-dimensional image data through the output layer to obtain the first output data.
  • Optionally, the input layer includes, for each channel, at least one sequentially connected convolution layer, as well as a merge layer, and each convolution layer includes a feature extraction layer and a feature mapping layer.
  • Performing multidimensional convolution filtering and nonlinear mapping on the data input through each channel, and merging the results of the different channels to obtain the output data of the input layer, includes:
  • in each convolution layer, performing multidimensional convolution filtering on the input data through the feature extraction layer, and nonlinearly mapping the filtered data through the feature mapping layer; and
  • merging the data processed by the at least one convolution layer corresponding to the different channels through the merge layer to obtain the output data of the input layer.
  • the hidden layer includes at least one convolution layer sequentially connected, and each of the convolution layers includes a feature extraction layer and a feature mapping layer.
  • Performing multidimensional convolution filtering and nonlinear mapping on the output data of the input layer through the hidden layer to obtain high-dimensional image data includes:
  • in each convolution layer, performing multidimensional convolution filtering on the input data through the feature extraction layer, and nonlinearly mapping the filtered data through the feature mapping layer; and
  • using the data processed by the at least one convolution layer as the high-dimensional image data.
  • Optionally, when the color coding format of the image frame to be processed is a YUV format, the first color component and the second color component are two of the luminance component Y, the chrominance component U, and the chrominance component V;
  • when the color coding format of the image frame to be processed is an RGB format, the first color component and the second color component are two of a red component, a green component, and a blue component.
  • In the intra prediction method and apparatus for color components provided above, first input data including information of a first color component of a target region in an image frame to be processed is input to a convolutional neural network, and the convolutional neural network processes it to obtain first output data including a prediction of the information of a second color component, thereby realizing intra prediction of color components by a convolutional neural network; owing to the deep-learning characteristics of the convolutional neural network, the reliability of the predicted second color component is high.
  • FIG. 1 is a schematic diagram of an encoding principle of H.265 according to related art
  • FIG. 2 is a schematic diagram showing a decoding principle of H.265 according to the related art
  • FIG. 3 is a flowchart of an intra prediction method of a color component according to an exemplary embodiment
  • FIG. 4 is a schematic diagram of an image frame that is not encoded, according to an exemplary embodiment
  • FIG. 5 is a schematic diagram showing a rendering effect of information of a luminance component Y of the image frame shown in FIG. 4;
  • FIG. 6 is a schematic diagram showing a rendering effect of information of a chrominance component U of the image frame shown in FIG. 4;
  • FIG. 7 is a schematic diagram showing a rendering effect of information of a chrominance component V of the image frame shown in FIG. 4;
  • FIG. 8 is a flowchart of an intra prediction method of another color component according to an exemplary embodiment
  • FIG. 9 is a flowchart of a method for determining first input data, according to an exemplary embodiment
  • FIG. 10 is a schematic diagram of an area in an image frame to be processed, according to an exemplary embodiment
  • FIG. 11 is a schematic diagram of an area in another image frame to be processed, according to an exemplary embodiment
  • FIG. 12 is a schematic diagram of an upsampling process according to an exemplary embodiment
  • FIG. 13 is a schematic diagram showing the constituent elements of a first input data according to an exemplary embodiment
  • FIG. 14 is a schematic diagram of a downsampling process according to an exemplary embodiment
  • FIG. 15 is a schematic diagram showing the constituent elements of another first input data according to an exemplary embodiment
  • FIG. 16 is a schematic structural diagram of a convolutional neural network according to an exemplary embodiment
  • FIG. 17 is a flowchart of still another method for intra-prediction of color components, according to an exemplary embodiment
  • FIG. 18 is a schematic structural diagram of another convolutional neural network according to an exemplary embodiment.
  • FIG. 19 is a schematic structural diagram of an intra prediction apparatus for a color component according to an exemplary embodiment
  • FIG. 20 is a schematic structural diagram of an intra prediction apparatus of another color component according to an exemplary embodiment
  • FIG. 21 is a schematic structural diagram of a second determining module according to an exemplary embodiment
  • FIG. 22 is a schematic structural diagram of an intra prediction apparatus of still another color component according to an exemplary embodiment
  • FIG. 23 is a schematic structural diagram of an intra prediction apparatus of still another color component according to an exemplary embodiment
  • FIG. 24 is a schematic structural diagram of a third determining module according to an exemplary embodiment
  • FIG. 25 is a schematic structural diagram of an intra prediction apparatus of a color component according to another exemplary embodiment.
  • FIG. 26 is a schematic structural diagram of an intra prediction apparatus of still another color component according to another exemplary embodiment.
  • FIG. 27 is a schematic structural diagram of an intra prediction apparatus of still another color component according to another exemplary embodiment.
  • FIG. 28 is a schematic structural diagram of a computer device according to another exemplary embodiment.
  • An embodiment of the present disclosure provides an intra prediction method for a color component.
  • The intra prediction method for the color component performs intra prediction by means of a convolutional neural network (CNN). For ease of understanding, a brief explanation of convolutional neural networks follows.
  • A convolutional neural network is a kind of feedforward neural network and one of the most representative network architectures in deep learning. Its artificial neurons respond to units within a local region of coverage and process the image according to its features.
  • The basic structure of a convolutional neural network includes two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to a local receptive field of the previous layer, and the features of that local receptive field are extracted.
  • The second is the feature mapping layer.
  • Each feature mapping layer of the network is composed of multiple feature maps, and each feature map is a plane.
  • The feature mapping layer is provided with an activation function; the usual activation function is a nonlinear mapping function, which can be a sigmoid function or a rectified linear unit (ReLU) function.
  • the convolutional neural network is formed by interconnecting a large number of nodes (also called “neurons” or “units”), each node representing a specific output function.
  • the connection between every two nodes carries a weighted value called a weight.
  • One of the advantages of the convolutional neural network compared to the traditional image processing algorithm is that it avoids the complicated pre-processing process of the image (extracting artificial features, etc.), and can directly input the original image for end-to-end learning.
  • One of the advantages of convolutional neural networks over traditional neural networks is that traditional networks use full connection, i.e., every neuron of the input layer is connected to every neuron of the hidden layer; this results in a huge number of parameters, making network training time-consuming and even intractable. Convolutional neural networks avoid this problem through local connections and weight sharing (compare the parameter counts below).
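A back-of-the-envelope comparison behind this point, with illustrative sizes:

```python
# A fully connected layer between two 64x64 planes has one weight per
# connection; a convolution layer reuses one small shared kernel everywhere.
fc_params = (64 * 64) * (64 * 64)   # 16,777,216 weights
conv_params = 3 * 3                 # 9 shared weights for a 3x3 kernel
print(fc_params, conv_params)
```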
  • the intra prediction method of the color component provided by the embodiment of the present disclosure can be applied to the field of video coding and decoding.
  • the video encoding process and the decoding process are briefly explained below.
  • The current video coding standards mainly include H.261 through H.265 and MPEG-4 V1 through MPEG-4 V3. Among them, H.264, also known as Advanced Video Coding (AVC), and H.265, also known as High Efficiency Video Coding (HEVC), both adopt a motion-compensated hybrid coding algorithm; the embodiments of the present disclosure use H.265 as an example for explanation.
  • FIG. 1 is a schematic diagram of the coding principle of H.265.
  • The coding architecture of H.265 is similar to that of H.264. It mainly includes an intra prediction module, an inter prediction module, a transform module, a quantization module, an entropy coding module, an inverse transform module, an inverse quantization module, a reconstructed image module, and a loop filtering module (also called an in-loop filter module).
  • The inter prediction module may include a motion estimation module and a motion compensation module, and the loop filtering module includes a deblocking module (also called a deblocking filter) and a sample adaptive offset (SAO) module.
  • In the encoding process, the image to be encoded is generally divided into a plurality of regions of equal (or unequal) size arranged in a matrix, each region corresponding to one image block (also called a coding block); each region may be square or rectangular.
  • The image blocks are processed sequentially from top to bottom and from left to right.
  • The intra prediction module predicts the pixel values of the current image block based on reconstructed surrounding pixel values in the same image frame, to remove spatial redundancy; the inter prediction module uses the temporal correlation of video to predict the pixel values of the image to be encoded from pixel values in adjacent reconstructed image frames, to remove temporal redundancy; the quantization module maps continuous values of image blocks into a number of discrete amplitudes;
  • the deblocking filtering module filters pixels at image block boundaries to remove blocking artifacts;
  • the SAO module performs pixel value compensation processing, and the reconstructed image module adds the predicted values and the reconstructed residual values to obtain reconstructed pixel values (without loop filtering);
  • the reconstructed frames obtained by the loop filtering module form a reference frame list for inter prediction, and the entropy coding module processes the obtained mode information and residual information to obtain a bitstream.
  • the intra prediction module independently encodes the luminance component and the chrominance component of the image block of the image frame to be processed.
  • The encoding process of the chroma component involves a chroma intra prediction technique.
  • This chroma intra prediction technique is a cross-component chroma prediction technique: after the information of the luminance component of the image block has been encoded and reconstructed, that is, before the loop filtering module performs loop filtering, the chrominance component is predicted using the reconstructed luminance component.
  • FIG. 2 is a schematic diagram of the decoding principle of H.265.
  • the decoding architecture of H.265 is similar to the decoding architecture of H.264, and mainly includes: entropy decoding module, intra prediction module, inter prediction module, inverse transform module, inverse quantization module and loop filtering module.
  • the loop filtering module includes a deblocking filtering module and an SAO module. The reconstructed frame obtained by the loop filtering module forms a reference frame list for inter-frame prediction, and the entropy decoding module processes the obtained code stream to obtain mode information and residual information.
  • An embodiment of the present disclosure provides an intra prediction method for a color component.
  • The intra prediction method is substantially a cross-component intra prediction method: based on a convolutional neural network, the first color component is used to predict the information of the second color component.
  • The method includes:
  • Step 101 Input first input data to the convolutional neural network through the first channel, where the first input data includes information of a first color component of the target area in the image frame to be processed.
  • the target area is an area of the image frame to be processed to be predicted by the second color component.
  • the information of the color component refers to the value of the color component, which is also called the component value
  • the information of the first color component of the target region is also the value of the first color component of the target region.
  • Step 102 Acquire first output data output by the convolutional neural network, where the first output data includes a predicted value of the information of the second color component of the target area by the convolutional neural network.
  • the convolutional neural network is configured to predict the first output data based on the first input data.
  • In different application scenarios, the type of the image frame to be processed differs, and the first input data differs accordingly.
  • When the image frame to be processed is an image frame to be encoded, the first input data is the information of the first color component reconstructed after encoding in the target region; this reconstructed information is recovered from the encoded information of the color component. Taking FIG. 1 as an example, the information of the first color component reconstructed after encoding is obtained by passing the encoded information (that is, the code stream) of the first color component in the target region through the inverse transform and inverse quantization modules and adding the result to the prediction information of the first color component of the target area; this is the image information processed by the reconstructed image module in FIG. 1.
  • When the image frame to be processed is an image frame to be decoded, the first input data is the information of the reconstructed first color component decoded in the target area, recovered from the decoded information of the first color component. Taking FIG. 2 as an example, the decoded reconstructed information of the first color component is obtained by adding the code-stream information processed by the decoding, inverse transform and inverse quantization modules to the prediction information predicted by the intra prediction module or the inter prediction module; the acquisition process is the same as the process of obtaining the reconstructed information shown in FIG. 2 (see the sketch below).
  • The information of the first color component reconstructed after encoding and the information of the reconstructed first color component after decoding may each be referred to as reconstructed information of the first color component.
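Conceptually, the reconstruction used in both paths can be sketched as follows; the scalar quantizer and the omitted inverse transform are simplifying assumptions, not the codec's actual operations.

```python
import numpy as np

def reconstruct(quantised_residual: np.ndarray,
                prediction: np.ndarray,
                qstep: float) -> np.ndarray:
    """Recover a reconstructed color-component plane from the coded residual
    and the prediction."""
    residual = quantised_residual * qstep   # inverse quantisation (assumed scalar)
    # (an inverse transform, e.g. an inverse DCT, would be applied here)
    return prediction + residual            # reconstructed component values
```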
  • The embodiment of the present disclosure inputs first input data, including information of a first color component of a target region in an image frame to be processed, to a convolutional neural network, which processes it to obtain first output data including the information of the second color component, thereby realizing intra prediction of the color component by the convolutional neural network; owing to the deep-learning characteristics of the convolutional neural network, the reliability of the finally predicted second color component is higher.
  • the intra prediction method of the color component provided by the embodiment of the present disclosure can implement different color component predictions for different color coding formats of the image frame to be processed.
  • Two commonly used color coding formats for image frames are the YUV format and the RGB format.
  • For the YUV format, the basic encoding principle may be as follows: an image acquisition device such as a three-tube color camera or a color charge-coupled device (CCD) camera captures the image; the obtained color image signal is then color-separated and separately amplified to obtain an RGB signal; the RGB signal is passed through a matrix conversion circuit to obtain a signal of the luminance component Y and two color-difference signals, B-Y (that is, the signal of the chrominance component U) and R-Y (that is, the signal of the chrominance component V).
  • This representation of color is the so-called YUV color space representation.
  • In the YUV color space representation, the signal of the luminance component Y, the signal of the chrominance component U, and the signal of the chrominance component V are separated.
  • The above YUV format can also be obtained by other means, which is not limited by the embodiments of the present disclosure.
  • In the process of obtaining an image in YUV format, the sampling rates (also called sample rates) of the luminance component Y, the chrominance component U, and the chrominance component V may be different.
  • Assuming the distribution density of each color component in the initial image is the same, that is, the distribution density ratio of the color components is 1:1:1, then because the sampling rates of the color components differ, the distribution densities of the different color components in the final target image differ; in general, in the target image, the distribution density ratio of the color components equals the sampling rate ratio.
  • The distribution density of a color component refers to the number of pieces of information of that color component contained in a unit size.
  • For example, the distribution density of the luminance component refers to the number of luminance values included in a unit size.
• The current YUV format is divided into multiple sampling formats based on different sampling rate ratios. A sampling format can be expressed as a sampling rate ratio; this representation is called A:B:C notation, and the current sampling formats can be divided into 4:4:4, 4:2:2, 4:2:0, and 4:1:1.
• A sampling format of 4:4:4 indicates that the luminance component Y, the chrominance component U, and the chrominance component V in the target image have the same sampling rate: no downsampling is performed on the original image, and the distribution density ratio of the color components of the target image is 1:1:1. A sampling format of 4:2:2 indicates that every two luminance components Y in the target image share one set of a chrominance component U and a chrominance component V, so the distribution density ratio of the color components of the target image is 2:1:1; that is, with the pixel as the sampling unit, the luminance component of the original image is not downsampled, and the chrominance components of the original image are downsampled 2:1 in the horizontal direction and not downsampled in the vertical direction to obtain the target image. A sampling format of 4:2:0 indicates that, for each of the chrominance component U and the chrominance component V in the target image, the downsampling ratio in both the horizontal direction and the vertical direction is 2:1, so the ratio of the distribution density of the luminance component Y to that of the chrominance component U is 2:1 in each direction, and likewise for the luminance component Y and the chrominance component V; that is, with the pixel as the sampling unit, the luminance component of the original image is not downsampled, and the chrominance components of the original image are downsampled 2:1 in the horizontal direction and 2:1 in the vertical direction to obtain the target image.
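• As an illustration of these sampling formats, the following is a minimal numpy sketch that produces 4:2:0 chroma planes from a full-resolution YUV 4:4:4 image by 2:1 downsampling in both directions; the function name, the averaging choice, and the array layout are illustrative assumptions, not part of the embodiment:

```python
import numpy as np

def yuv444_to_yuv420(y: np.ndarray, u: np.ndarray, v: np.ndarray):
    """Downsample the chroma planes of a 4:4:4 image 2:1 horizontally and
    2:1 vertically (here by averaging each 2x2 block), yielding the 4:2:0
    distribution densities described above. The luma plane is untouched."""
    h, w = y.shape
    u420 = u.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    v420 = v.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return y, u420, v420

# Example: an 8x8 frame; after conversion, the ratio of the distribution
# density of Y to each chroma component is 2:1 in each direction.
y = np.random.randint(0, 256, (8, 8)).astype(np.float64)
u = np.random.randint(0, 256, (8, 8)).astype(np.float64)
v = np.random.randint(0, 256, (8, 8)).astype(np.float64)
_, u420, v420 = yuv444_to_yuv420(y, u, v)
print(u420.shape)  # (4, 4)
```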
• The first color component and the second color component are different types of color components of the target region. When the color coding format of the image frame to be processed is the YUV format, the pixel information (also referred to as color information) of each pixel in the image frame to be processed includes information of the luminance component Y, the chrominance component U, and the chrominance component V, and the first color component and the second color component may be any two of the luminance component Y, the chrominance component U, and the chrominance component V.
• For example, FIG. 4 is an image frame that has not been encoded, and FIG. 5 to FIG. 7 are schematic diagrams of the rendering effect of, respectively, the information of the luminance component Y of the image frame (which may also be referred to as a luminance image frame), the information of the chrominance component U (which may also be referred to as a chrominance U image frame), and the information of the chrominance component V (which may also be referred to as a chrominance V image frame). FIG. 4 is a schematic diagram of a color image frame, and the letters Y, U, and V in FIG. 5 to FIG. 7 are identification information, not contents of the image frames.
• When the color coding format of the image frame to be processed is the RGB format, the pixel information (also referred to as color information) of each pixel in the image frame to be processed includes information of a transparency component and a plurality of color components, where the plurality of color components refers to at least two color components; for example, the plurality of color components may include a red component, a green component, and a blue component, and the first color component and the second color component are any two of the red component, the green component, and the blue component. When the color coding format of the image frame to be processed is the RGB format, the sampling rate ratio of the red component, the green component, and the blue component is 1:1:1, and the distribution density ratio of the three in the image frame to be processed is also 1:1:1.
• However, the scope of protection of the embodiments of the present disclosure is not limited thereto. When the color coding format of the image frame to be processed is another format, any person skilled in the art can, within the technical scope disclosed by the embodiments of the present disclosure, easily adapt the intra prediction method of the color component provided by the embodiments of the present disclosure to predict the corresponding color components. Such changes or replacements are easily conceivable and are also covered by the protection scope of the embodiments of the present disclosure.
• The convolutional neural network may include an input layer, a hidden layer, and an output layer.
• The input layer may include at least one channel, through which data may be input to the convolutional neural network. There may be at least two implementable manners of inputting data to the convolutional neural network for prediction of the color components, and the intra prediction method of the color component differs between these implementable manners, as follows:
• In a first implementable manner, the first input data is input to the convolutional neural network through the first channel, so that the convolutional neural network performs cross-component intra prediction of the color components to obtain the first output data. The first input data may include information of a first color component of a plurality of first sampling blocks of the target area in the image frame to be processed, and the first output data includes information of a second color component of a plurality of second sampling blocks of the target area output by the convolutional neural network, where the first sampling block is a sampling unit for the first color component. The first sampling block includes at least one first color component point, the first color component point being a minimum area unit from which information of the first color component can be acquired; it may also be referred to as a first color component pixel point or a first color component pixel location.
• For example, when the first color component is a luminance component, the first color component point is a luminance point. If each pixel in the target region has a luminance value, the size of one luminance point is the same as the size of one pixel, and the first sampling block consists of at least one luminance point, that is to say, of at least one pixel.
• The second sampling block is a sampling unit for the second color component; the second sampling block includes at least one second color component point, the second color component point being a minimum area unit from which information of the second color component can be acquired, and the second color component point may also be referred to as a second color component pixel point or a second color component pixel location.
• For example, when the second color component is a chrominance component, the second color component point is a chrominance point. If every two pixels in the target region share one chrominance value, then the size of one chrominance point is the same as the size of two pixels, and the second sampling block is composed of at least one chrominance point, that is, of at least two pixels.
• Each first sampling block and each second sampling block may be composed of one or more pixels. For example, assuming that the first sampling block is composed of 2 × 2 pixels, the first input data may include the information of the first color component sampled with every 2 × 2 pixels of the target area in the image frame to be processed as the sampling unit, where each first sampling block includes one piece of information of the first color component; that information may be the information of the first color component point at a specified position in the first sampling block, or may be the average value of the information of all the first color component points in the sampling unit. For example, each first sampling block includes one luminance value, which may be the luminance value of a specified luminance point in the first sampling block (such as the luminance point in the upper left corner or at the central position), or the average of the luminance values of all the luminance points in the first sampling block. Correspondingly, the first output data may include information of the second color component sampled with every 2 × 2 pixels of the target area in the image frame to be processed as the sampling unit, and this data is prediction data of a sampling result, where each second sampling block includes one piece of information of the second color component, which may be the information of the second color component point at a specified position in the second sampling block, or the average of the information of all second color component points in the second sampling block. For example, each second sampling block includes one chrominance value, which may be the chrominance value of a specified chrominance point in the second sampling block (such as the chrominance point in the upper left corner or at the central position), or the chrominance average of all the chrominance points in the second sampling block.
• The size of the sampling block in the embodiment of the present application is only a schematic description; in practical applications, the size of the first sampling block and the second sampling block may be 8 × 8 pixels.
• In another example, the first sampling block is composed of one first color component point, and the second sampling block is composed of one second color component point. The first input data then includes information of all first color components of the target area in the image frame to be processed (i.e., information of the first color component of all pixels), and the first output data includes information of all second color components of the target area output by the convolutional neural network (i.e., information of the second color component of all pixels).
• In the following, it is assumed that the first input data includes information of all first color components of the target area in the image frame to be processed, that the first output data includes information of all second color components of the target area output by the convolutional neural network, and that the image frame to be processed is a video image frame. The intra prediction method of the color component may then include:
  • Step 201 Determine a sampling rate relationship between the first color component and the second color component in the image frame to be processed.
• The image frame to be processed is generally divided into a plurality of regions of equal size arranged in a matrix, each region corresponding to one image block (also referred to as a coding block in the field of video coding and decoding). When image processing is performed, in the video encoding field, the target area is an area of the image frame to be processed in which the second color component is to be predicted; since prediction proceeds in order from top to bottom and from left to right, when the second color component of the target area is predicted, the second color component of the areas above and to the left of the target area has already completed the corresponding prediction. In the video decoding field, the target area is an area of the image frame to be processed in which the second color component is to be reconstructed, and when the second color component of the target area is reconstructed, the second color component of the areas above and to the left of the target area has already completed the corresponding reconstruction.
• As described above, in an image frame, the sampling rates of different color components may be the same or different, so the sampling rate relationships may also be the same or different; the sampling rate relationship is determined by the sampling format of the actual color coding format. For example, the sampling format may be YUV4:2:0 or YUV4:4:4. When the sampling format is YUV4:2:0, in the same region of the image frame to be encoded, the sampling rate relationship of the luminance component Y, the chrominance component U, and the chrominance component V is: the ratio of the sampling rates of the luminance component Y and the chrominance component V in the horizontal and vertical directions is 2:1; the ratio of the sampling rates of the luminance component Y and the chrominance component U in the horizontal and vertical directions is 2:1; and the ratio of the sampling rates of the chrominance component U and the chrominance component V is 1:1. When the sampling format is YUV4:4:4, in the same region of the image frame to be encoded, the sampling rate relationship of the luminance component Y, the chrominance component U, and the chrominance component V is: the sampling rate ratio of the luminance component Y and the chrominance component U is 1:1, and the sampling rate ratio of the luminance component Y and the chrominance component V is 1:1. In practical applications, the image frame to be encoded may also use other sampling formats, which will not be described in detail in the embodiments of the present disclosure. The above sampling rate relationship is ultimately reflected in the distribution density of the color components; for example, when the sampling rate ratio of two color components is 1:1, the distribution densities of the two color components in the same region are the same.
• In a related cross-component prediction technique, the intra prediction is based on an assumed linear correlation between the luminance component and the chrominance component; the principle is that the local luminance of the image is linearly related to the chrominance. In fact, however, the texture of the luminance component is much stronger than the texture of the chrominance components. Taking the 4 × 4 pixel region W at the corner position of the face image in FIG. 4 as an example, and assuming the sampling format is YUV4:4:4, the sampling rate relationship of the YUV color components of each pixel in the region W is a sampling rate ratio of 1:1:1. Each pixel in the region W therefore has one piece of information (i.e., a numerical value) of the luminance component Y, one piece of information of the chrominance component U, and one piece of information of the chrominance component V; see FIG. 5 to FIG. 7 and Tables 1 to 3, where FIG. 5 to FIG. 7 are schematic diagrams of the information of the luminance component Y, the information of the chrominance component U, and the information of the chrominance component V of the image frame, respectively, and Tables 1 to 3 give, respectively, the numerical values of the luminance component Y, the chrominance component U, and the chrominance component V of the pixels in the region W. As can be seen from FIG. 5 to FIG. 7 and Tables 1 to 3, the luminance component and the chrominance components are not simply linearly related. By predicting across color components with the convolutional neural network, a prediction result can be generated using image features such as texture extracted in the receptive field of the convolutional neural network, thereby avoiding simply assuming a linear correlation between the luminance component and the chrominance component, and the correlation of the luminance component Y, the chrominance component U, and the chrominance component V can be fully considered.
• In order to effectively analyze the correlation of the luminance component Y, the chrominance component U, and the chrominance component V while simplifying the network architecture of the convolutional neural network, the first input data may include not only information of the reconstructed first color component in the target area but also information of the reconstructed second color component of the first peripheral area of the target area. The information of the reconstructed second color component can reflect the texture characteristic of the second color component in the image to be predicted, and based on input that includes the information of the reconstructed second color component, the convolutional neural network can more accurately predict the information of the second color component of the target region; please refer to step 102 above. When the image frame to be processed is processed at the encoder, the information of the reconstructed second color component is the information of the second color component reconstructed after encoding; when the image frame to be processed is processed at the decoder, the information of the reconstructed second color component is the information of the reconstructed second color component obtained by decoding.
• The first peripheral area of the target area is a strip-shaped area (also referred to as a band-shaped area) located on the left side and/or the upper side of the target area, and the strip-shaped area is adjacent to the target area. The range of the strip-shaped area may be set according to the actual situation; for example, the strip-shaped area is composed of at least p columns of pixels located on the left side of the target area and/or at least q rows of pixels above it, where p and q are both integers greater than or equal to 1.
• As described above, the sampling rate ratio of the color components determines the distribution density of the color components in the finally obtained target image. For the object processed by the intra prediction of the color component in the embodiment of the present disclosure, namely the image frame to be processed (that is, the target image), the sampling rate ratios of the color components may differ, and the corresponding distribution densities may also differ; therefore, the distribution density of the information of the reconstructed first color component and the distribution density of the information of the reconstructed second color component included in the first input data may also differ. In the embodiment of the present disclosure, a process of uniformizing the distribution density of the first input data may be performed based on the sampling rate relationship between the first color component and the second color component in the image frame to be processed (this process may refer to the subsequent step 2023), so that the distribution density of the second color component of the first peripheral region included in the first input data obtained through the uniformization process is equal to the distribution density of the first color component in the target area, making the distribution density of each color component included in the first input data uniform. Since the prediction is mainly based on the information of the first color component in the target area, in the process of determining the first input data, the distribution density of the first color component in the target area is kept constant and the density of the second color component in the first peripheral region is adjusted, so that the two densities become equal.
  • Step 202 Determine first input data based on a sampling rate relationship.
• In the video encoding field, the information of the first color component of the target region in the image frame to be processed included in the first input data is the information of the first color component that has already been reconstructed in the target region. Assume that the first sampling block is one first color component point and that the second sampling block is one second color component point. The process of determining the first input data based on the sampling rate relationship may then include:
  • Step 2021 Acquire information of the reconstructed second color component in the first peripheral area of the target area.
• Assume that the target area is the area H in FIG. 10, the first color component is the luminance component Y, the second color component is the chrominance component U, the sampling format is YUV4:4:4, and one square of FIG. 10 represents one pixel. The first peripheral area K is composed of the two columns of pixels located on the left side of the target area and the two rows of pixels above it, as shown in FIG. 10. In the first peripheral area K and the target area H, the sampling rate relationship of the YUV color components of each pixel is a ratio of 1:1:1, and the acquired information of the reconstructed second color component in the first peripheral region is the information of the chrominance component U in the first peripheral region K.
  • Step 2022 Acquire information of the reconstructed first color component in the target area.
• The information of the reconstructed first color component in the target area is the information of the luminance component in the target area H.
• Step 2023 Determine, according to the sampling rate relationship, the first input data based on the information of the reconstructed second color component in the first peripheral area and the information of the reconstructed first color component in the target area.
• In a first case, step 2023 includes: when the sampling rate relationship between the first color component and the second color component in the target region is a sampling rate ratio of 1:1, determining the information of the reconstructed second color component in the first peripheral region of the target region and the information of the reconstructed first color component in the target region as the first input data. Still taking FIG. 10 as an example, the information of the reconstructed chrominance component in the first peripheral region K and the information of the reconstructed luminance component in the target region H are directly determined as the first input data. Assuming that one square of FIG. 10 represents one pixel, the distribution density of the chrominance component U of the first peripheral region K is one chrominance value per pixel, and the distribution density of the luminance component Y in the target region H is one luminance value per pixel; at this time, the distribution density of the chrominance component U of the first peripheral region K is equal to the distribution density of the luminance component Y in the target region H.
• In a second case, when the sampling rate relationship between the first color component and the second color component in the target region is a sampling rate ratio greater than 1:1, the information of the reconstructed second color component in the first peripheral region is upsampled based on the sampling rate ratio, such that the distribution density of the second color component in the first peripheral region after the upsampling is equal to the distribution density of the first color component in the target region, and the information of the second color component obtained by the upsampling and the information of the reconstructed first color component in the target area are determined as the first input data.
• For example, when the sampling format is YUV4:2:2, the first color component is the luminance component Y, and the second color component is the chrominance component U, the sampling rate relationship of the luminance component Y and the chrominance component U is a sampling rate ratio of 2:1, which is greater than 1:1. It is therefore necessary to upsample the information of the reconstructed chrominance component U in the first peripheral region based on the sampling rate ratio of 2:1, and to determine the information of the upsampled chrominance component U and the information of the reconstructed luminance component Y in the target area as the first input data.
• In practical applications, the information of the reconstructed second color component in the first peripheral area may be upsampled using an upsampling filter, or information of new second color components may be inserted using a suitable interpolation algorithm based on the information of the second color component of the original image.
• It should be noted that the sampling rate ratio of the first color component and the second color component being greater than 1:1 means that the size of the first sampling block corresponding to the first color component in the target area is smaller than the size of the second sampling block corresponding to the second color component; since the distribution density of the first color component in the target area needs to be kept constant, the basic unit of the upsampled image is the first sampling block.
• For example, when the sampling rate relationship between the first color component and the second color component in the target region is a sampling rate ratio equal to r:1, where r is an integer greater than 1, the information of the second color component of the plurality of second sampling blocks in the first peripheral region is upsampled by a factor of r to obtain information of the second color component of a plurality of first sampling blocks; that is, the distribution density of the second color component in the first peripheral region after the upsampling is equal to the distribution density of the first color component in the target area, and the information of the upsampled second color component and the information of the reconstructed first color component in the target area are determined as the first input data. Upsampling by an interpolation algorithm may mean inserting information of the second color component on the basis of the information of the original second color component of the first peripheral area, so that the distribution density of the second color component in the first peripheral region after the interpolation is equal to the distribution density of the first color component of the target region. For example, the upsampling may be: copying the information of the second color component of each second sampling block in the first peripheral area, dividing each second sampling block into r² first sampling blocks, and filling the position of each first sampling block with the copied information of the second color component; that is, for any first sampling block obtained by the division, the information of the second color component of the original second sampling block to which the first sampling block belongs is filled into that first sampling block. The above filling process interpolates the r² − 1 positions adjacent to each second sampling block, and the information of the second color component finally obtained by upsampling is actually information of [(M × N − m × n) × r²] second color components, where M × N − m × n is the number of second sampling blocks in the first peripheral area.
• Assume that the target area is the area H in FIG. 10, the first color component is the luminance component Y, the second color component is the chrominance component U, and the sampling format is YUV4:2:2. The first peripheral area K is composed of the two columns of pixels located on the left side of the target area and the two rows of pixels above it, as shown in FIG. 10; in the first peripheral area K and the target area H, the sampling rate relationship of the YUV color components of each pixel is a ratio of 2:1:1. As shown in FIG. 12, the information of the chrominance component U in the first peripheral area K is acquired and upsampled by a factor of 2 to obtain the upsampled first peripheral area K: the information of the chrominance component U of each second sampling block is copied, each second sampling block is divided into four first sampling blocks, and the position of each first sampling block is filled with the copied information of the chrominance component U. That is, the information obtained by copying the chrominance component U is interpolated at the three surrounding positions, namely the adjacent positions to the right of, below, and to the lower right of the sampling block where the chrominance component U is located; the other positions are interpolated in the same manner, finally obtaining the information of the chrominance component U in the first peripheral area K shown at the bottom of FIG. 12 (a sketch of this replication appears after this paragraph). The information of the upsampled chrominance component and the information of the reconstructed luminance component in the target area are finally determined as the first input data.
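• The replication-based upsampling described above can be illustrated with a minimal numpy sketch; the function name and the factor r = 2 are illustrative assumptions, not part of the embodiment:

```python
import numpy as np

def upsample_by_replication(chroma: np.ndarray, r: int = 2) -> np.ndarray:
    """Divide each second sampling block into r*r first sampling blocks and
    fill every one of them with the copied chroma value, i.e. interpolate
    the r*r - 1 adjacent positions with the same value."""
    return np.repeat(np.repeat(chroma, r, axis=0), r, axis=1)

# A 2x2 block of reconstructed chroma values from the first peripheral area.
k = np.array([[100, 102],
              [ 98, 101]])
print(upsample_by_replication(k))
# [[100 100 102 102]
#  [100 100 102 102]
#  [ 98  98 101 101]
#  [ 98  98 101 101]]
```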
• In a third case, when the sampling rate relationship between the first color component and the second color component in the target region is a sampling rate ratio less than 1:1, the information of the reconstructed second color component in the first peripheral region is downsampled (English: subsampled) based on the sampling rate ratio, such that the distribution density of the second color component in the first peripheral region after downsampling is equal to the distribution density of the first color component in the target region, and the information of the second color component obtained by downsampling and the information of the reconstructed first color component in the target area are determined as the first input data.
• For example, when the sampling format is YUV4:2:2, the first color component is the chrominance component U, and the second color component is the luminance component Y, the sampling rate relationship of the chrominance component U and the luminance component Y is a sampling rate ratio of 1:2, which is less than 1:1. It is therefore necessary to downsample the information of the reconstructed luminance component Y in the first peripheral region based on the sampling rate ratio of 1:2, and to determine the information of the luminance component Y obtained by downsampling and the information of the reconstructed chrominance component U in the target area as the first input data.
• In practical applications, the information of the reconstructed second color component in the first peripheral area may be downsampled using a downsampling filter, or the information of the downsampled second color component may be obtained based on the information of the second color component of the original image. It should be noted that the sampling rate ratio of the first color component and the second color component being less than 1:1 means that the size of the first sampling block corresponding to the first color component in the target region is larger than the size of the second sampling block corresponding to the second color component; since the distribution density of the first color component in the target region needs to be kept constant, the basic unit of the downsampled image should be the first sampling block.
• For example, when the sampling rate relationship between the first color component and the second color component in the target region is a sampling rate ratio equal to 1:s, where s is an integer greater than 1, the information of the second color component of the plurality of second sampling blocks in the first peripheral region is downsampled by a factor of s to obtain information of the second color component of a plurality of first sampling blocks; that is, the distribution density of the second color component in the first peripheral region after downsampling is equal to the distribution density of the first color component, and the information of the downsampled second color component and the information of the reconstructed first color component in the target area are determined as the first input data. Assuming the first peripheral area includes M × N − m × n second sampling blocks having information of the second color component, downsampling by a factor of s means that the average value of the information of the second color component of every s × s second sampling blocks in the first peripheral region is determined as the information of the second color component of one first sampling block, and the information of the second color component of all the first sampling blocks is used as the information of the second color component obtained by downsampling; the information of the second color component finally obtained by downsampling is actually information of [(M × N − m × n)/s²] second color components of first sampling blocks.
• Assume that the first peripheral region includes the region W in FIG. 4, the first color component is the chrominance component U, the second color component is the luminance component Y, and the sampling rate ratio is 1:2; the luminance component Y in the region W is as shown in Table 1, and the region W includes 4 × 4 second sampling blocks having information of the luminance component Y. Then the information of the downsampled luminance component Y, obtained by downsampling based on the information of the luminance component Y shown in Table 1, can be as shown in Table 4; the information of the downsampled luminance component Y includes 2 × 2 downsampled points of luminance component Y information. The downsampled luminance component Y corresponding to Table 4 includes four first sampling blocks, whose luminance component Y values are 128.25, 122.5, 119.25, and 100.5, respectively. Among them, the luminance value 128.25 of the first first sampling block is the average of the luminance values at row 1 column 1, row 1 column 2, row 2 column 1, and row 2 column 2 in the region W; the luminance value of the second first sampling block is the average of the luminance values at row 1 column 3, row 1 column 4, row 2 column 3, and row 2 column 4 in the region W; and the luminance value 119.25 of the third first sampling block is the average of the luminance values at row 3 column 1, row 3 column 2, row 4 column 1, and row 4 column 2 in the region W.
• The above example only describes downsampling of the partial region W in the first peripheral region. As shown in FIG. 14, taking the downsampling of the information of the luminance component Y in the first peripheral region K at the top of the figure as an example, the information of the luminance component Y obtained by sampling is as shown in the first peripheral region K at the bottom, as shown in FIG. 15 (a sketch of this averaging appears after this paragraph). Finally, the information of the luminance component obtained by downsampling and the information of the reconstructed chrominance component in the target region are determined as the first input data.
• It should be noted that the above steps 201 and 202 uniformize the distribution density of the first input data based on the sampling rate relationship between the first color component and the second color component in the image frame to be processed. In another implementable manner, however, the information of the reconstructed second color component in the first peripheral area of the target area and the information of the reconstructed first color component in the target area may be directly acquired (refer to steps 2021 and 2022 above); a first distribution density of the information of the reconstructed first color component in the target region and a second distribution density of the information of the reconstructed second color component in the first peripheral region are then determined, and the uniformization processing provided in step 2023 is performed based on the ratio of the first distribution density to the second distribution density. In yet another implementable manner, the information of the reconstructed second color component in the first peripheral area of the target area and the information of the reconstructed first color component in the target area may be directly acquired (refer to steps 2021 and 2022 above) and used as the first input data, in which case the above steps 201 and 2023 need not be performed.
• Step 203 Input first input data to the convolutional neural network through the first channel, where the first input data contains information of a first color component of the target area in the image frame to be processed.
• As described above, the first input data may include information of the reconstructed second color component in the first peripheral region of the target region (information that has been upsampled, downsampled, or not sampled) and information of the reconstructed first color component in the target area. In practical applications, the first input data may also contain only the information of the reconstructed first color component in the target area, in which case step 201 and step 202 need not be performed.
  • Step 204 Perform multidimensional convolution filtering and nonlinear mapping on the first input data through the input layer to obtain output data of the input layer.
• The input layer may include at least one channel, where the at least one channel includes a first channel for inputting the first input data. The input layer may separately perform multidimensional convolution filtering and nonlinear mapping on the data input through each channel, and combine the data of the different channels after the multidimensional convolution filtering and nonlinear mapping to obtain the output data of the input layer. When the input layer has only one channel, the input layer need not perform the above-mentioned combining action and directly uses the data obtained by performing multidimensional convolution filtering and nonlinear mapping on the first input data as the output data of the input layer.
• As described above, the convolutional neural network may include an input layer, a hidden layer, and an output layer, and the input layer may include at least one convolution layer connected in sequence corresponding to the first channel. The embodiment of the present disclosure does not limit the number of convolution layers, the connection manner of the convolution layers, or the attributes of the convolution layers included in the input layer.
• Each convolution layer includes a feature extraction layer and a feature mapping layer; each feature extraction layer includes a convolution filter bank, and each convolution filter bank includes at least one convolution filter (also called a convolution kernel). Assuming the nonlinear mapping function of the feature mapping layer is r(), the output data of the j-th convolutional layer in the input layer satisfies:

F_j(J) = r(W_j ∗ F_{j-1}(J) + B_j), with F_0(J) = J

where F_j(J) represents the output data of the j-th convolutional layer in the input layer, J is the first input data, ∗ is the convolution operation, W_j is the weight coefficient of the convolution filter bank in the j-th convolutional layer of the input layer, and B_j is the offset coefficient of the convolution filter bank in the j-th convolutional layer; when the input layer includes M convolution layers, the output data of the input layer is F_M(J). Assume the convolution filter bank of the j-th convolutional layer includes n_j convolution filters: the n_j convolution filters act on the input data of the j-th convolution layer and output n_j image partitions. The size of each convolution filter of the j-th convolutional layer is c_j × f_j × f_j, where c_j is the number of input channels of the j-th convolutional layer and f_j × f_j is the size (or dimension) of each convolution filter of the j-th convolutional layer in space.
• FIG. 16 is a schematic structural diagram of a convolutional neural network according to an embodiment of the present disclosure. As shown in FIG. 16, the input layer includes one convolution layer, which includes a feature extraction layer X1 and a feature mapping layer X2; the feature mapping layer X2 is provided with an activation function, which is a nonlinear mapping function. Assuming that the feature extraction layer X1 includes n_1 convolution filters, where n_1 is a positive integer, multidimensional convolution filtering is performed on the first input data by the n_1 convolution filters of the feature extraction layer X1 to obtain n_1 image data; nonlinear mapping is performed on the n_1 image data by the feature mapping layer X2 to obtain n_1 mapped image data, and the n_1 mapped image data are the output data of the input layer.
• That is, the output data of the input layer satisfies:

F_1(J) = r(W_1 ∗ J + B_1)

where J is the first input data, ∗ represents the convolution operation, W_1 represents the weight coefficients of the n_1 convolution filters, B_1 is the offset coefficient of the n_1 convolution filters, and r() is the activation function of the feature mapping layer, which may be a nonlinear mapping function such as a sigmoid function or a ReLU function.
• For example, assume that n_1 is 64 and that the activation function is a ReLU function; the output data of the input layer then satisfies:

F_1(J) = max(0, W_1 ∗ J + B_1)

where J is the first input data, ∗ represents the convolution operation, W_1 represents the weight coefficients of the 64 convolution filters, B_1 is the offset coefficient of the 64 convolution filters, and the size of each convolution filter is 2 × 5 × 5.
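• Under the example parameters just given (64 filters of size 2 × 5 × 5 acting on a two-channel input, followed by a ReLU), the input layer could be sketched in PyTorch as follows; the padding choice and the tensor shapes are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Input layer of the example: 2 input channels (e.g. reconstructed luma of the
# target area plus upsampled chroma of the peripheral area), 64 filters of
# spatial size 5x5, followed by the ReLU nonlinearity r() = max(0, .).
input_layer = nn.Sequential(
    nn.Conv2d(in_channels=2, out_channels=64, kernel_size=5, padding=2),
    nn.ReLU(),
)

J = torch.randn(1, 2, 8, 8)   # first input data, a batch of one 8x8 area
F1 = input_layer(J)           # F_1(J) = max(0, W_1 * J + B_1)
print(F1.shape)               # torch.Size([1, 64, 8, 8])
```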
• Step 205 Perform multidimensional convolution filtering and nonlinear mapping on the output data of the input layer through the hidden layer to obtain high-dimensional image data (also referred to as high-dimensional image blocks).
• In an implementable manner, the hidden layer includes at least one convolution layer connected in sequence; the embodiment of the present disclosure does not limit the number of convolution layers, the connection manner of the convolution layers, the attributes of the convolution layers, and the like included in the hidden layer. Each convolution layer includes a feature extraction layer and a feature mapping layer. For the structure of each convolution layer in the hidden layer, reference may be made to the structure of the convolution layer in the input layer in step 204 above, and the function of each convolution layer in the hidden layer may likewise refer to the function of the convolution layer in the input layer. In each convolution layer of the hidden layer, the input data can be multidimensionally convolution-filtered through the feature extraction layer and nonlinearly mapped through the feature mapping layer; the data processed by the at least one convolution layer is then used as high-dimensional image data, and the high-dimensional image data is the output data of the hidden layer.
• Assume that, in the hidden layer, each feature extraction layer includes a convolution filter bank, each convolution filter bank includes at least one convolution filter, and the nonlinear mapping function of the feature mapping layer is g(); the output data of the i-th convolutional layer then satisfies:

H_i(I) = g(O_i ∗ H_{i-1}(I) + A_i), with H_0(I) = I

where H_i(I) represents the output data of the i-th convolutional layer in the hidden layer, I is the output data of the input layer (that is, F_M(J) in step 204 above), ∗ is the convolution operation, O_i is the weight coefficient of the convolution filter bank in the i-th convolutional layer of the hidden layer, and A_i is the offset coefficient of the convolution filter bank in the i-th convolution layer; when the hidden layer includes N convolution layers, the output data of the hidden layer is H_N(I). Assume the convolution filter bank of the i-th convolutional layer includes m_i convolution filters: the m_i convolution filters act on the input data of the i-th convolution layer and output m_i image blocks. The size of each convolution filter of the i-th convolutional layer is d_i × k_i × k_i, where d_i is the number of input channels of the i-th convolution layer and k_i × k_i is the size of each convolution filter of the i-th convolutional layer in space.
• For example, assume that the hidden layer includes one convolution layer; the output data of the hidden layer then satisfies:

H_1(I) = max(0, O_1 ∗ I + A_1)

where H_1(I) is the output data of the hidden layer, I is the output data of the input layer (that is, F_M(J) in step 204 above), ∗ represents the convolution operation, O_1 is the weight coefficient of the 32 convolution filters in the convolutional layer, A_1 is the offset coefficient of the 32 convolution filters, and the size of each convolution filter is 64 × 1 × 1.
• Step 206 Aggregate the high-dimensional image data through the output layer to obtain the first output data.
• It should be noted that when the intra prediction method of the color component is applied to the video coding and decoding field, since the data output by the output layer is the reconstruction data of the second color component, the output layer is also referred to as a reconstruction layer. The output layer can aggregate the high-dimensional image data output by the hidden layer to output the final first output data; the embodiments of the present disclosure do not limit the structure of the output layer. In an implementable manner, the structure of the output layer may be a direct learning structure, in which the output layer directly aggregates the high-dimensional image data output by the hidden layer and outputs the data of the reconstructed image, where the data of the reconstructed image is the first output data.
• At this time, the output data of the output layer satisfies the first reconstruction formula, which is:

P(V) = U_v ∗ V + C_v

where P(V) is the output data of the output layer, that is, the first output data; V is the output data of the hidden layer, that is, H_N(I) in step 205; ∗ is the convolution operation; U_v is the weight coefficient of the output layer; and C_v is the offset coefficient of the output layer. Assume the output layer includes one convolution filter: the one convolution filter acts on the output data of the hidden layer and outputs one image data, thereby realizing the aggregation of the high-dimensional image data. The size of each convolution filter of the output layer is e × t × t, where e is the number of input channels and t × t is the spatial size of each convolution filter of the output layer.
• For example, assume that the structure of the output layer is a direct learning structure and that the output layer includes one convolution layer containing one convolution filter; the convolution filtering of the output layer then satisfies:

P(V) = U_v ∗ V + C_v

where P(V) is the output data of the output layer, that is, the first output data; V is the output data of the hidden layer, that is, H_N(I) in step 205; ∗ is the convolution operation; U_v is the weight coefficient of the one convolution filter; C_v is the offset coefficient of the one convolution filter; and the size of the convolution filter is 32 × 3 × 3.
• In another implementable manner, the structure of the output layer may be a residual learning structure, in which the output layer performs a convolution operation on the high-dimensional image data output by the hidden layer, aggregates the processed data with the output data of the input layer, and outputs the data of the reconstructed image, where the data of the reconstructed image is the first output data. At this time, the output data of the output layer satisfies the second reconstruction formula, which is:

P(V) = U_v ∗ V + C_v + I

where P(V) is the output data of the output layer, that is, the first output data; V is the output data of the hidden layer, that is, H_N(I) in step 205; I is the output data of the input layer, that is, F_M(J) in step 204 above; ∗ is the convolution operation; U_v is the weight coefficient of the output layer; and C_v is the offset coefficient of the output layer.
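• Putting the three layers of the FIG. 16 example together, the following PyTorch sketch mirrors the sizes quoted above (input layer: 64 filters of 2 × 5 × 5; hidden layer: 32 filters of 64 × 1 × 1; output layer: 1 filter of 32 × 3 × 3) and shows both reconstruction variants; the padding values and the residual wiring are illustrative assumptions, not the embodiment's definitive design:

```python
import torch
import torch.nn as nn

class CrossComponentCNN(nn.Module):
    """Three-layer sketch: input layer, hidden layer, output layer."""

    def __init__(self, residual: bool = True):
        super().__init__()
        self.residual = residual
        self.input_layer = nn.Sequential(
            nn.Conv2d(2, 64, kernel_size=5, padding=2), nn.ReLU())
        self.hidden_layer = nn.Sequential(
            nn.Conv2d(64, 32, kernel_size=1), nn.ReLU())
        self.output_layer = nn.Conv2d(32, 1, kernel_size=3, padding=1)

    def forward(self, J: torch.Tensor) -> torch.Tensor:
        I = self.input_layer(J)    # F_M(J)
        V = self.hidden_layer(I)   # H_N(I)
        P = self.output_layer(V)   # U_v * V + C_v
        if self.residual:
            # Second reconstruction formula: aggregate with the input-layer
            # output (summing its 64 feature maps is one illustrative way
            # to match the single-channel prediction).
            P = P + I.sum(dim=1, keepdim=True)
        return P

net = CrossComponentCNN(residual=False)      # direct learning structure
pred = net(torch.randn(1, 2, 8, 8))          # predicted second color component
print(pred.shape)                            # torch.Size([1, 1, 8, 8])
```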
  • Step 207 Acquire first output data output by the convolutional neural network, where the first output data includes a predicted value of the information of the second color component of the target region by the convolutional neural network.
• In the video coding and decoding field, the obtained first output data is the information of the reconstructed second color component, and subsequent operations may be performed based on the first output data; for that process, reference may be made to the processes of FIG. 1 and FIG. 2 above, which will not be described again in this embodiment.
• It should be noted that FIG. 16 is merely an example in which the convolutional neural network includes an input layer, a hidden layer, and an output layer and the target area is 3 × 3 pixels; the convolutional neural network may have other structures, which is not limited by the present disclosure. In practical applications, the cross-component intra prediction method is determined for a given image block size. For example, when encoding with the video coding standard H.265, the smallest image block (or processing block) size is 4 × 4 pixels, so the cross-component intra prediction method provided by the embodiment of the present disclosure may perform cross-component intra prediction for every 4 × 4 pixels, and the parameter set of the corresponding convolutional neural network needs to be obtained through training (also called pre-training).
• That is, after the network architecture of an initial convolutional neural network is determined, such as the number of convolution layers, the connection manner of the convolution layers, the number of convolution filters per convolution layer, and the size of the convolution kernels, the weight coefficients of each convolution layer (i.e., the weight coefficients of the respective convolution filters) and the offset coefficients of each convolution layer (i.e., the offset coefficients of the respective convolution filters) are obtained through training, and the trained network is the above convolutional neural network. In other words, the initial convolutional neural network needs to be trained to obtain the above convolutional neural network, and the initial convolutional neural network has the same network architecture as the convolutional neural network described above.
• The training process of the convolutional neural network includes:
  • Step A1 Input second input data to the initial convolutional neural network through the first channel.
• The design of the initial convolutional neural network needs to fully consider the network receptive field, the complexity, and the ability to solve the problem; the embodiments of the present disclosure do not limit the network architecture of the initial convolutional neural network.
• The second input data includes information of a first color component of a training area in a first specified image frame. The first specified image frame may be a preset test image frame or a randomly selected image frame, and the first specified image frame is usually different from the image frame to be processed described above. The training area of the first specified image frame has the same size as the target area, and the second input data is obtained in the same manner as the first input data; for details, refer to steps 201 to 202 above.
• Step B1 Use the original data corresponding to the training area in the first specified image frame as a training label, and train the initial convolutional neural network to obtain the convolutional neural network.
• The original data consists of the information of the second color component known in the training region of the first specified image frame. The information of the second color component known in the training area is the information of the second color component that has not been processed in the training area; it is the ideal result of prediction, that is, if the prediction of the second color component of the training area were completely accurate, the obtained data would be the original data.
• In practical applications, the initial convolutional neural network can be trained on a specified training platform, and the training can include a process of configuring parameters such as a learning rate. The above training process can be implemented based on a supervised learning training method (English: supervised learning), which trains the corresponding parameters on an existing training set (also called training samples), that is, known data and its corresponding training labels, where a training label may be an explicit identification or an output result. In practical applications, the training process may also be implemented by manual calibration, an unsupervised learning algorithm, or a semi-supervised learning algorithm, which is not limited by the embodiments of the present disclosure.
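• A minimal supervised-training sketch for the initial network, assuming the layer sizes of the FIG. 16 example, mean-squared-error against the original data as the training label, and illustrative hyperparameters (learning rate, batch size, iteration count, and the placeholder tensors are all assumptions):

```python
import torch
import torch.nn as nn

# Direct-learning network matching the sizes of the FIG. 16 example.
net = nn.Sequential(
    nn.Conv2d(2, 64, 5, padding=2), nn.ReLU(),   # input layer
    nn.Conv2d(64, 32, 1), nn.ReLU(),             # hidden layer
    nn.Conv2d(32, 1, 3, padding=1),              # output (reconstruction) layer
)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)  # illustrative rate
loss_fn = nn.MSELoss()

# Placeholder tensors: second input data (first color component of the
# training area plus peripheral information) and the original data (label).
second_input = torch.randn(16, 2, 8, 8)
original = torch.randn(16, 1, 8, 8)

for step in range(100):                     # illustrative iteration count
    prediction = net(second_input)
    loss = loss_fn(prediction, original)    # supervised learning against label
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```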
• The embodiment of the present disclosure inputs first input data, which contains information of a first color component of a target region in an image frame to be processed, to a convolutional neural network, and the convolutional neural network processes it to obtain first output data containing information of a second color component, thereby realizing intra prediction of the color component by the convolutional neural network. Owing to the deep-learning characteristics of the convolutional neural network, the reliability of the finally predicted second color component is higher.
• In a second implementable manner, the first input data is input to the convolutional neural network through the first channel, and at least one piece of first side information data is respectively input to the convolutional neural network through at least one second channel to perform cross-component intra prediction of the color components; the convolutional neural network is configured to predict the first output data based on the first input data and the at least one piece of first side information data. Side information refers to existing information other than the information to be processed, and side information data is data capable of carrying side information. For example, when intra prediction of a color component is performed, the information to be processed is the first input data, and the first side information data is different from the first input data: the first side information data may contain information other than the information of the color components contained in the first input data, and can provide a prediction reference for the convolutional neural network. For example, the intra prediction mode (such as the direction mode of the intra prediction) can serve as one type of side information, and the data of the intra prediction mode is then side information data. The first side information data in the embodiment of the present disclosure is the side information data input into the convolutional neural network.
• The content included in the first input data and the first output data may refer to the foregoing first implementable manner and is not repeated in the embodiment of the present disclosure. In the following, it is assumed that the first input data includes information of all first color components of the target area in the image frame to be processed, that the first output data includes information of all second color components of the target area output by the convolutional neural network, and that the image frame to be processed is a video image frame. The intra prediction method of the color component may then include:
  • Step 301 Determine a sampling rate relationship between the first color component and the second color component in the image frame to be processed.
• Step 301 can refer to the foregoing step 201, and details are not described herein again.
• Step 302 Determine, according to the sampling rate relationship, the first input data based on the information of the first color component in the target area.
• Step 302 can refer to step 202 above, and details are not described herein again.
  • Step 303 Determine at least one first side information data, where each first side information data includes information other than information of color components included in the first input data.
• The at least one first side information data may include related information T1 of the reconstructed first color component in the target area and/or the average value or weighted average value of the information T2 of the reconstructed second color component in the second peripheral area of the target area (i.e., the mean or weighted mean of T2).
• The second peripheral area of the target area is a strip-shaped area located on the left side and/or the upper side of the target area, and the strip-shaped area is adjacent to the target area; for the definition of the second peripheral area, refer to the first peripheral area in step 201 above, which will not be described again in this embodiment.
• It should be noted that each first side information data should be consistent with the first input data in size and in the number of values. For example, if the first input data includes the information of the color components of x rows and y columns of first sampling blocks, that is, information of x × y color components (which may also be referred to as component values or numerical values), then each first side information data also includes x × y values, except that the values contained in the first side information data are not information of color components but, for example, average values or weighted average values.
• As described above, the first input data may include only the information of the first color component in the target area, or may include both the information of the first color component in the target area and the information of the second color component in the first peripheral area; therefore, the first input data includes information of one or two color components. The first side information data does not need to distinguish which color components are involved in the first input data; it is generated according to the needs of the convolutional neural network used in the embodiment of the present disclosure, with reference to the size and the number of values of the first input data. In practical applications, there may be only one first side information data.
  • At least one first side information data is determined based on related information of the reconstructed first color component in the target area.
  • the color coding format of the image frame to be processed is a YUV format
  • the first input data includes information of color components of the first sample block of x rows and y columns, where x and the y are integers greater than or equal to 1. Assume that there can be only one information data on the first side.
  • an identifier value of an intra prediction mode of the reconstructed first color component in each first sampling block may be acquired; and identifier values of all intra prediction modes are combined into one first side information data.
• the finally obtained first side information data includes x rows and y columns of identifier values, and each identifier value is a numerical value.
• For example, the intra prediction mode may be a directional mode, the first color component is the luminance component, and there are 35 intra prediction modes in H.265. Assume that each first sampling block is 1 pixel, the first input data includes 8 × 8 pixels, and the 8 × 8 block consists of four 4 × 4 sub-blocks whose luminance intra prediction modes have the identifier values 3, 17, 22, and 33, respectively. Then the first side information data can be as shown in Table 5.
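• For illustration only (not part of the original disclosure), the following is a minimal NumPy sketch of how such a first side information data could be assembled from per-sub-block intra prediction mode identifier values; the function name, the raster ordering of the sub-blocks, and the data type are assumptions:

```python
import numpy as np

def mode_side_info(block_size, sub_size, mode_ids):
    """Build a side information plane in which every position inside a
    sub-block carries that sub-block's intra prediction mode identifier
    (the Table 5 layout described above)."""
    side = np.zeros((block_size, block_size), dtype=np.float32)
    n = block_size // sub_size              # sub-blocks per row/column
    for idx, mode in enumerate(mode_ids):
        r, c = divmod(idx, n)               # sub-block position, raster order
        side[r*sub_size:(r+1)*sub_size, c*sub_size:(c+1)*sub_size] = mode
    return side

# The example above: 8x8 pixels, four 4x4 sub-blocks with modes 3, 17, 22, 33.
print(mode_side_info(8, 4, [3, 17, 22, 33]))
```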
• In a second manner, at least one first side information data is determined based on the information of the reconstructed second color component in the second peripheral area of the target area.
• Assume again that the first input data includes the information of the color components of x rows and y columns of first sampling blocks, where x and y are integers greater than or equal to 1.
• the information of the reconstructed second color component in the second peripheral area of the target area may be acquired; an average value of this information may be determined (in actual application, a weighted average may also be used); and one first side information data is generated, where the first side information data includes x rows and y columns of the average value.
• For example, the second peripheral area may be the same size as the first peripheral area K and be composed of the two columns of pixels on the left side of the target area and the two rows of pixels above it, and the second color component is the chrominance component U. Assuming that the average value of the information of the reconstructed second color component in the second peripheral area is 117, then, if the first input data includes the values of the color components of 3 rows and 3 columns of first sampling blocks, the first side information data includes 3 rows and 3 columns of values of the chrominance component U, each of which is 117, as shown in Table 6.
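• Similarly, a minimal sketch (illustrative only, with an unweighted mean assumed) of this second kind of first side information data, a constant plane filled with the average of the reconstructed chrominance samples of the second peripheral area:

```python
import numpy as np

def mean_side_info(peripheral_samples, x, y):
    """Fill an x-by-y plane with the average of the reconstructed second
    color component in the second peripheral area (the Table 6 layout)."""
    avg = float(np.mean(peripheral_samples))  # a weighted average may also be used
    return np.full((x, y), avg, dtype=np.float32)

# Example above: mean chroma U of 117 -> a 3x3 plane in which every value is 117.
print(mean_side_info(np.full(16, 117.0), 3, 3))
```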
  • Step 304 Input first input data to the convolutional neural network through the first channel, where the first input data includes information of a first color component of the target area in the image frame to be processed.
  • Step 304 may refer to step 203 above, and details are not described herein again.
  • Step 305 Input at least one first side information data to the convolutional neural network by using at least one second channel, where the at least one second channel is in one-to-one correspondence with the at least one first side information data.
• Step 306 Perform multidimensional convolution filtering and nonlinear mapping on the input data of each channel through the input layer, and combine (for example, add) the multidimensionally convolution-filtered and nonlinearly mapped data of the different channels to obtain the output data of the input layer.
• the input layer may include at least one channel. In this embodiment, since the first input data and the at least one first side information data need to be input to the input layer, the input layer includes at least two channels, that is, one first channel and at least one second channel.
  • the above-mentioned steps 304 and 305 may be performed at the same time, or may be performed in sequence, which is not limited by the embodiment of the present disclosure.
• the intra prediction device of the color component can perform multidimensional convolution filtering and nonlinear mapping on the input data of each channel through the input layer, and combine (for example, add) the multidimensionally convolution-filtered and nonlinearly mapped data of the different channels to obtain the output data of the input layer.
• the input layer includes, corresponding to each channel, at least one sequentially connected convolution layer, as well as a merge layer.
  • Each convolution layer includes a feature extraction layer and a feature mapping layer.
  • Step A2 In each convolution layer: multi-dimensional convolution filtering is performed on the input data through the feature extraction layer, and the input data is nonlinearly mapped through the feature mapping layer.
• For the structure of the convolution layer in the input layer provided in step 306, reference may be made to the structure of the convolution layer provided in the above step 204, which is not described in detail in the embodiment of the present disclosure.
  • Step B2 Combine the data processed by the at least one convolution layer corresponding to different channels by the merge layer to obtain output data of the input layer.
• Assume that each feature extraction layer includes a convolution filter bank, each convolution filter bank includes at least one convolution filter (also called a convolution kernel), and the nonlinear mapping function of the feature mapping layer is r(). Then the output data of the input layer satisfies:
• F_M(J) = r(W_M * F_{M-1}(J) + B_M + Σ_{i=1}^{s1} (W_si * S_i + B_si)), where F_0(J) = J;
• where F_M(J) represents the output data of the M-th convolution layer in the input layer, that is, the output data of the input layer; J is the first input data; * is the convolution operation; W_M is the weight coefficient of the convolution filter bank in the M-th convolution layer of the input layer; B_M is the offset coefficient of the convolution filter bank in the M-th convolution layer; S_i is the i-th first side information data; W_si is the weight coefficient of the i-th first side information data; B_si is the offset coefficient of the i-th first side information data; and s1 is the number of first side information data.
  • FIG. 18 is a schematic structural diagram of another convolutional neural network according to an embodiment of the present disclosure.
• the input layer includes two channels, namely a first input channel and a second input channel, and each channel is connected to one convolution layer, each convolution layer including a feature extraction layer and a feature mapping layer.
  • the feature mapping layer is provided with an activation function, which is a nonlinear mapping function.
  • the output data of the input layer satisfies:
• F_1(J) = r(W_1 * J + B_1 + W_s1 * S_1 + B_s1).
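• As a reading of this two-channel input layer, here is a minimal PyTorch sketch of F_1(J) = r(W_1 * J + B_1 + W_s1 * S_1 + B_s1) with r() taken as the ReLU function; the filter count and kernel size are illustrative assumptions rather than values fixed by the disclosure:

```python
import torch
import torch.nn as nn

class TwoChannelInputLayer(nn.Module):
    """One convolution layer per input channel; the merge layer adds the
    two filtered results before the nonlinear mapping r() (here ReLU)."""
    def __init__(self, n_filters=64, kernel=5):
        super().__init__()
        pad = kernel // 2
        self.conv_main = nn.Conv2d(1, n_filters, kernel, padding=pad)  # W_1, B_1
        self.conv_side = nn.Conv2d(1, n_filters, kernel, padding=pad)  # W_s1, B_s1
        self.relu = nn.ReLU()

    def forward(self, J, S1):
        # F_1(J) = r(W_1*J + B_1 + W_s1*S_1 + B_s1)
        return self.relu(self.conv_main(J) + self.conv_side(S1))

J = torch.rand(1, 1, 8, 8)    # first input data (first channel)
S1 = torch.rand(1, 1, 8, 8)   # first side information data (second channel)
print(TwoChannelInputLayer()(J, S1).shape)  # torch.Size([1, 64, 8, 8])
```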
• In addition, when the value range of any one of the at least one first side information data is different from the value range of the first input data, that side information data may be subjected to standardization processing such that the value range of the processed side information data is the same as the value range of the first input data. The standardization processing can be a linear mapping process or a normalization process.
• For example, assume that the value range of any one side information data is [PredMode_MIN, PredMode_MAX] and the value range of the first input data is [Pixel_MIN, Pixel_MAX]. If a piece of information in that side information data is x (referred to as the first information), the corresponding normalization formula is:
• Norm(x) = (x - PredMode_MIN) × (Pixel_MAX - Pixel_MIN) / (PredMode_MAX - PredMode_MIN) + Pixel_MIN;
• where the first information is any one of the x rows and y columns of information included in the side information data, and Norm(x) is the normalized first information.
• For example, a certain first side information data of the at least one first side information data includes identifier values of intra prediction modes whose value range is 1 to 35, while the value range of the first input data is 0 to 255. Then all the information in that first side information data is substituted into the normalization formula to perform normalization processing on it, so that the value range of the processed first side information data is 0 to 255.
• the above-mentioned standardization processing may be performed before the data is input to the convolutional neural network, or may be performed in the convolutional neural network, which is not limited in the embodiment of the present disclosure.
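• A one-function sketch of the normalization formula above (plain Python, for illustration; the example maps intra prediction mode identifiers in 1–35 to the pixel value range 0–255):

```python
def norm(x, pm_min, pm_max, px_min, px_max):
    """Linearly map a side information value x from [pm_min, pm_max] to the
    value range [px_min, px_max] of the first input data."""
    return (x - pm_min) * (px_max - px_min) / (pm_max - pm_min) + px_min

print(norm(1, 1, 35, 0, 255))   # 0.0   (minimum maps to Pixel_MIN)
print(norm(35, 1, 35, 0, 255))  # 255.0 (maximum maps to Pixel_MAX)
print(norm(17, 1, 35, 0, 255))  # 120.0
```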
  • Step 307 Perform multidimensional convolution filtering and nonlinear mapping on the output data of the input layer through the hidden layer to obtain high-dimensional image data.
  • Step 307 can refer to step 205 above, and details are not described herein again.
  • Step 308 The high-dimensional image data is aggregated through the output layer to obtain the first output data.
• Step 308 can refer to step 206 above, and details are not described herein again.
  • Step 309 Acquire first output data output by the convolutional neural network, where the first output data includes a predicted value of the information of the second color component of the target region by the convolutional neural network.
• Step 309 can refer to step 207 above, and details are not described herein again.
• In addition, an initial convolutional neural network needs to be trained to obtain the above convolutional neural network, and the training process of the convolutional neural network includes:
  • Step A3 Input third input data to the convolutional neural network through the first channel.
• the design of the initial convolutional neural network needs to fully consider the network receptive field, the complexity, and the ability to solve the problem. The embodiments of the present disclosure do not limit the network architecture of the initial convolutional neural network.
• the third input data includes information of the first color component of the training area in the second specified image frame; the second specified image frame may be a preset test image frame or a randomly selected image frame, and is usually different from the image frame to be processed described above.
  • the training area in the second specified image frame is the same size as the target area, and the third input data is acquired in the same manner as the first input data. For details, please refer to steps 201 to 202 above.
  • Step B3 Input at least one second side information data to the initial convolutional neural network by using at least one second channel.
  • the at least one second channel is in one-to-one correspondence with the at least one second side information data, and the at least one second side information data is acquired in the same manner as the at least one first side information data.
• For the specific process, please refer to step 303 above.
  • Step C3 The original data corresponding to the training area in the second specified image frame is used as a training tag, and the initial convolutional neural network is trained to obtain a convolutional neural network.
  • the raw data consists of information of a second color component known in the training region of the second specified image frame.
• the information of the known second color component in the training area is the information of the second color component that has not been processed in the training area, and it is the ideal result of prediction; that is, if the prediction of the second color component of the training area were completely accurate, the obtained data would be the original data.
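• As one possible concrete form of steps A3 to C3 (a sketch only: the mean-squared-error loss and the use of an optimizer such as torch.optim.Adam(net.parameters()) are assumptions, since the disclosure does not fix a training objective; net stands for the initial convolutional neural network), a single training iteration could look like this:

```python
import torch
import torch.nn as nn

def train_step(net, optimizer, third_input, side_infos, original_data):
    """One iteration: predict the second color component of the training
    area and regress it toward the original data used as the training tag."""
    optimizer.zero_grad()
    prediction = net(third_input, *side_infos)   # first channel + second channels
    loss = nn.functional.mse_loss(prediction, original_data)
    loss.backward()
    optimizer.step()
    return loss.item()
```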
• It should be noted that FIG. 18 is an example in which the convolutional neural network includes an input layer, a hidden layer, and an output layer and the target area is 3 × 3 pixels; the convolutional neural network may also have other structures, and the present disclosure is not limited thereto.
• In summary, in the embodiment of the present disclosure, first input data including the information of the first color component of the target region in the image frame to be processed is input to the convolutional neural network, and the convolutional neural network processes it to obtain first output data including the information of the second color component, thereby realizing intra prediction of a color component by the convolutional neural network; owing to the deep-learning characteristics of the convolutional neural network, the second color component finally obtained by prediction has high reliability, and the accuracy of the prediction is further increased.
  • An embodiment of the present disclosure provides an intra prediction device 40 for color components. As shown in FIG. 19, the device 40 includes:
  • a first input module 401 configured to input, by using a first channel, first input data to a convolutional neural network, where the first input data includes information of a first color component of a target area in an image frame to be processed;
  • An obtaining module 402 configured to acquire first output data output by the convolutional neural network, where the first output data includes a predicted value of information of a second color component of the target region by the convolutional neural network;
• where the first color component and the second color component are different color components of the target area.
• In the embodiment of the present disclosure, the first input module inputs the first input data, which includes the information of the first color component of the target region in the image frame to be processed, to the convolutional neural network, and the convolutional neural network processes it to obtain the first output data including the information of the second color component, thereby realizing intra prediction of a color component by the convolutional neural network; owing to the deep-learning characteristics of the convolutional neural network, the second color component finally obtained by prediction has high reliability.
• the first input data includes the information of the reconstructed second color component in the first peripheral area and the information of the reconstructed first color component in the target area, where the first peripheral area of the target area is a strip-shaped area located to the left of and/or above the target area.
  • the device 40 further includes:
  • a first determining module 403 configured to determine a sampling rate relationship between the first color component and the second color component in the image frame to be processed before the inputting the first input data to the convolutional neural network through the first channel;
• a second determining module 404 configured to determine the first input data according to the sampling rate relationship, where, in the first input data, the distribution density of the second color component in the first peripheral area is equal to the distribution density of the first color component in the target area.
  • the second determining module 404 includes:
  • the first obtaining sub-module 4041 is configured to acquire information about the reconstructed second color component in the first peripheral area of the target area;
  • a second obtaining sub-module 4042 configured to acquire information about the reconstructed first color component in the target area
• a first determining submodule 4043 configured to determine the first input data based on the sampling rate relationship, according to the information of the reconstructed second color component in the first peripheral region and the information of the reconstructed first color component in the target region.
  • the first determining submodule 4043 is configured to:
• when the sampling rate relationship between the first color component and the second color component in the target region is that the sampling rate ratio is 1:1, determine the information of the reconstructed second color component in the first peripheral region of the target region and the information of the reconstructed first color component in the target region as the first input data;
• when the sampling rate ratio is greater than 1:1, upsample, based on the sampling rate ratio, the information of the reconstructed second color component in the first peripheral region such that the distribution density of the second color component in the first peripheral region after upsampling is equal to the distribution density of the first color component in the target region, and determine the information of the second color component obtained by upsampling and the information of the reconstructed first color component in the target region as the first input data;
• when the sampling rate ratio is less than 1:1, downsample, based on the sampling rate ratio, the information of the reconstructed second color component in the first peripheral region such that the distribution density of the second color component in the first peripheral region after downsampling is equal to the distribution density of the first color component in the target region, and determine the information of the second color component obtained by downsampling and the information of the reconstructed first color component in the target region as the first input data.
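• A minimal sketch of these three cases (the nearest-neighbor replication for upsampling and the s × s block averaging for downsampling match the examples given in this disclosure; treating the peripheral samples as a rectangular array is a simplification, since the first peripheral area is actually an L-shaped strip):

```python
import numpy as np

def equalize_density(peripheral_c2, ratio):
    """Resample the reconstructed second color component of the first
    peripheral area so that its distribution density equals that of the
    first color component in the target area."""
    if ratio == 1:                 # sampling rate ratio 1:1 -> use as-is
        return peripheral_c2
    if ratio > 1:                  # ratio r:1 -> replicate each value r*r times
        r = int(ratio)
        return np.kron(peripheral_c2, np.ones((r, r), peripheral_c2.dtype))
    s = int(round(1 / ratio))      # ratio 1:s -> average over s*s blocks
    h, w = peripheral_c2.shape
    return (peripheral_c2[:h - h % s, :w - w % s]
            .reshape(h // s, s, w // s, s).mean(axis=(1, 3)))
```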
  • the device 40 further includes:
  • the first training module 405 is configured to train the initial convolutional neural network to obtain the convolutional neural network, and the training process of the convolutional neural network includes:
• inputting second input data to the initial convolutional neural network through the first channel, the second input data comprising information of the first color component of a training region in a first specified image frame, where the training region in the first specified image frame is the same size as the target area, and the second input data is acquired in the same manner as the first input data;
• using the raw data corresponding to the training area in the first specified image frame as a training tag, and training the initial convolutional neural network to obtain the convolutional neural network, where the raw data consists of the information of the known second color component in the training region of the first specified image frame.
  • the device 40 further includes:
  • a third determining module 406 configured to determine at least one first side information data, each of the first side information data includes information other than information of a color component included in the first input data;
• a second input module 407 configured to input the at least one first side information data to the convolutional neural network through at least one second channel, the at least one second channel being in one-to-one correspondence with the at least one first side information data.
  • the third determining module 406 includes:
• a second determining submodule 4061 configured to determine the at least one first side information data based on related information of the reconstructed first color component in the target area;
• and/or, a third determining submodule 4062 configured to determine the at least one first side information data based on information of the reconstructed second color component in the second peripheral area of the target area, where the second peripheral area of the target area is a strip-shaped area located to the left of and/or above the target area.
• Optionally, the color coding format of the image frame to be processed is a YUV format, and the first input data includes the information of the color components of x rows and y columns of first sampling blocks, where x and y are both integers greater than or equal to 1;
  • the second determining submodule 4061 is configured to:
• acquire an identifier value of the intra prediction mode of the reconstructed first color component in each first sampling block; and combine the identifier values of all the intra prediction modes into one first side information data.
• Optionally, the first input data includes the information of the color components of x rows and y columns of first sampling blocks, where x and y are integers greater than or equal to 1;
  • the third determining submodule 4062 is configured to:
• acquire the information of the reconstructed second color component in the second peripheral area of the target area; determine the average value of the information of the reconstructed second color component in the second peripheral area of the target area; and generate one first side information data, where the first side information data includes x rows and y columns of the average value.
  • the device 40 further includes:
• the standardization module 408 is configured to perform standardization processing on any one of the at least one first side information data when the value range of that side information data is different from the value range of the first input data, so that the value range of the processed side information data is the same as the value range of the first input data.
  • the device 40 further includes:
  • the second training module 409 is configured to train the initial convolutional neural network to obtain the convolutional neural network, and the training process of the convolutional neural network includes:
• inputting third input data to the convolutional neural network through the first channel, where the third input data includes the information of the first color component of the training region in a second specified image frame, the training region in the second specified image frame is the same size as the target area, and the manner of acquiring the third input data is the same as the manner of acquiring the first input data;
• inputting, through the at least one second channel, at least one second side information data to the initial convolutional neural network, where the at least one second channel is in one-to-one correspondence with the at least one second side information data, and the at least one second side information data is acquired in the same manner as the at least one first side information data; and
• using the raw data of the second color component corresponding to the training area in the second specified image frame as a training tag, and training the initial convolutional neural network to obtain the convolutional neural network, where the raw data consists of the information of the known second color component in the training region of the second specified image frame.
• the convolutional neural network includes an input layer, a hidden layer, and an output layer; and the apparatus 40 further includes:
• a first processing module 410 configured to, before the first output data output by the convolutional neural network is acquired and when one channel of the input layer has input data, perform multidimensional convolution filtering and nonlinear mapping on the first input data through the input layer to obtain the output data of the input layer;
• a second processing module 411 configured to, when at least two channels of the input layer have input data, perform multidimensional convolution filtering and nonlinear mapping on the data input by each channel through the input layer, and combine the multidimensionally convolution-filtered and nonlinearly mapped data of the different channels to obtain the output data of the input layer;
  • a high-dimensional processing module 412 configured to perform multidimensional convolution filtering and non-linear mapping on the output data of the input layer by using the hidden layer to obtain high-dimensional image data
  • the aggregation module 413 is configured to aggregate the high-dimensional image data by using the output layer to obtain the first output data.
• the input layer includes, corresponding to each channel, at least one sequentially connected convolution layer, as well as a merge layer, and each of the convolution layers includes a feature extraction layer and a feature mapping layer.
  • the second processing module 411 is configured to:
• in each convolution layer: perform multidimensional convolution filtering on the input data through the feature extraction layer, and nonlinearly map the input data through the feature mapping layer;
  • the data processed by the at least one convolution layer corresponding to different channels is combined by the merge layer to obtain output data of the input layer.
  • the hidden layer includes at least one convolution layer sequentially connected, and each of the convolution layers includes a feature extraction layer and a feature mapping layer.
  • the high-dimensional processing module 412 is configured to:
• in each convolution layer: perform multidimensional convolution filtering on the input data through the feature extraction layer, and nonlinearly map the input data through the feature mapping layer;
  • the data processed by the at least one convolution layer is used as the high-dimensional image data.
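• As an illustration of such a hidden layer (a sketch only; the single 1 × 1 convolution with 32 filters on 64 input channels mirrors the hidden-layer example given elsewhere in this disclosure, and ReLU stands in for the nonlinear mapping function g()):

```python
import torch.nn as nn

# Hidden layer: at least one sequentially connected convolution layer,
# each one a feature extraction layer (Conv2d) followed by a feature
# mapping layer (ReLU): H_1(I) = max(0, O_1 * I + A_1).
hidden_layer = nn.Sequential(
    nn.Conv2d(64, 32, kernel_size=1),  # 32 filters of size 64x1x1
    nn.ReLU(),
)
```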
• Optionally, the color coding format of the image frame to be processed is a YUV format, and the first color component and the second color component are two of the luminance component Y, the chrominance component U, and the chrominance component V;
• or, the color coding format of the image frame to be processed is an RGB format, and the first color component and the second color component are two of the red component, the green component, and the blue component.
• In the embodiment of the present disclosure, the first input module inputs the first input data, which includes the information of the first color component of the target region in the image frame to be processed, to the convolutional neural network, and the convolutional neural network processes it to obtain the first output data including the information of the second color component, thereby realizing intra prediction of a color component by the convolutional neural network; owing to the deep-learning characteristics of the convolutional neural network, the second color component finally obtained by prediction has high reliability.
  • the embodiment of the present disclosure further provides a computer device, including:
• a processor; and
• a memory for storing executable instructions of the processor;
• wherein the processor is configured to:
• input first input data to a convolutional neural network through a first channel, the first input data including information of a first color component of a target area in an image frame to be processed; and
• acquire first output data output by the convolutional neural network, the first output data comprising a predicted value, by the convolutional neural network, of information of a second color component of the target area;
• where the first color component and the second color component are different color components of the target area.
  • FIG. 28 is a schematic structural diagram of a computer device according to an exemplary embodiment.
  • the computer device 500 includes a central processing unit (CPU) 501, a system memory 504 including a random access memory (RAM) 502 and a read only memory (ROM) 503, and a system bus 505 that connects the system memory 504 and the central processing unit 501.
• the computer device 500 also includes a basic input/output system (I/O system) 506 that facilitates transfer of information between various devices within the computer, and a mass storage device 507 for storing an operating system 513, applications 514, and other program modules 515.
  • the basic input/output system 506 includes a display 508 for displaying information and an input device 509 such as a mouse or keyboard for user input of information. Both the display 508 and the input device 509 are connected to the central processing unit 501 via an input and output controller 510 that is coupled to the system bus 505.
  • the basic input/output system 506 can also include an input and output controller 510 for receiving and processing input from a plurality of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input and output controller 510 also provides output to a display screen, printer, or other type of output device.
  • the mass storage device 507 is connected to the central processing unit 501 by a mass storage controller (not shown) connected to the system bus 505.
  • the mass storage device 507 and its associated computer readable medium provide non-volatile storage for the computer device 500. That is, the mass storage device 507 can include a computer readable medium (not shown) such as a hard disk or a CD-ROM drive.
  • the computer readable medium can include computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media include RAM, ROM, EPROM, EEPROM, flash memory or other solid state storage technologies, CD-ROM, DVD or other optical storage, tape cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices.
• where RAM is random access memory, ROM is read-only memory, EPROM is erasable programmable read-only memory, and EEPROM is electrically erasable programmable read-only memory.
• the computer device 500 may also be operated through a remote computer connected to a network, such as the Internet. That is, the computer device 500 can be connected to the network 512 through a network interface unit 511 connected to the system bus 505, or the network interface unit 511 can be used to connect to other types of networks or remote computer systems (not shown).
• the memory further includes one or more programs, the one or more programs are stored in the memory, and the central processing unit 501 implements the intra prediction method of the color components by executing the one or more programs.
• In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions is also provided, such as a memory comprising instructions executable by a processor of a computer device to perform the intra prediction method of color components shown in the various embodiments of the present disclosure. For example, the non-transitory computer readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, or an optical data storage device.
• An embodiment of the present disclosure also provides a readable storage medium, which is a non-volatile readable storage medium having instructions stored therein; when the instructions in the readable storage medium are run on a processing component, the processing component is caused to perform the intra prediction method of the color components provided by any of the embodiments of the present disclosure.

Abstract

The present disclosure relates to an intra prediction method and device for color components, belonging to the field of video encoding and decoding. The method includes: inputting first input data to a convolutional neural network through a first channel, the first input data including information of a first color component of a target area in an image frame to be processed; and acquiring first output data output by the convolutional neural network, the first output data including a predicted value, by the convolutional neural network, of information of a second color component of the target area, where the first color component and the second color component are different color components of the target area. The present disclosure solves the problem of low reliability of prediction results in related intra prediction techniques.

Description

颜色分量的帧内预测方法及装置
本公开要求于2017年11月29日提交、申请号为201711223298.2、发明名称为“颜色分量的帧内预测方法及装置”的中国专利申请的优先权,其全部内容通过引用结合在本公开中。
技术领域
本公开涉及视频编解码领域,特别涉及一种颜色分量的帧内预测方法及装置。
背景技术
随着视频编解码技术的飞速发展,目前提出了一种高效的视频压缩编码技术,该视频压缩编码技术是指在图像帧的各种维度上去除冗余,采用帧内预测技术通过降低图像帧在空域和时域上的冗余来提高编码压缩率。
在YUV编码技术中,像素信息(也称颜色信息)包括:亮度分量Y、色度分量U和色度分量V的信息,其中,色度分量的帧内预测技术是根据亮度分量和色度分量间的线性相关性,利用待处理图像帧的目标区域周边的已重建的亮度值来预测色度值,在这个过程中需要对目标区域周边的已重建的像素点的亮度值进行下采样得到下采样点的亮度值,然后根据下采样点的亮度值和已重建的像素点的色度值求得目标区域的缩放参数和偏置参数,再对目标区域中的已重建的亮度点进行下采样(相当于对已重建的像素点的亮度值进行下采样),并根据缩放参数和偏置参数来求得目标区域中的像素点的色度预测值。
但是,这种帧内预测技术是根据亮度分量和色度分量间的线性相关性来进行预测的,但实际上,线性关系无法准确地表达亮度分量和色度分量间的关系,因此基于该原理预测得到的色度分量的预测结果可靠性较低。
发明内容
本公开实施例提供了一种颜色分量的帧内预测方法及装置,可以解决现有 技术中的颜色分量的预测结果可靠性较低的问题。所述技术方案如下:
根据本公开实施例的第一方面,提供一种颜色分量的帧内预测方法,所述方法包括:
通过第一通道向卷积神经网络输入第一输入数据,所述第一输入数据包含待处理图像帧中目标区域的第一颜色分量的信息;
获取所述卷积神经网络输出的第一输出数据,所述第一输出数据包含所述卷积神经网络对所述目标区域的第二颜色分量的信息的预测值;
其中,所述第一颜色分量和所述第二颜色分量为所述目标区域具有的不同的颜色分量。
可选的,所述第一输入数据包括所述第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息,所述目标区域的第一周边区域为位于所述目标区域左侧和/或上方的带状区域。
可选的,在所述通过第一通道向卷积神经网络输入第一输入数据之前,所述方法还包括:
确定所述待处理图像帧中第一颜色分量和第二颜色分量的采样率关系;
基于所述采样率关系,确定所述第一输入数据,所述第一输入数据中,所述第一周边区域中第二颜色分量的分布密度等于所述目标区域中第一颜色分量的分布密度。
可选的,所述基于所述采样率关系,确定所述第一输入数据,包括:
获取所述目标区域的第一周边区域中已重建的第二颜色分量的信息;
获取所述目标区域中已重建的第一颜色分量的信息;
基于所述采样率关系,根据所述第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息,确定所述第一输入数据。
可选的,所述基于所述采样率关系,根据所述第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息,确定所述第一输入数据,包括:
当所述目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例为1:1,将所述目标区域的第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息确定为所述第一输入数据;
当所述目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率 比例大于1:1,基于所述采样率比例,对所述第一周边区域中已重建的第二颜色分量的信息进行上采样,使得上采样后所述第一周边区域中第二颜色分量的分布密度等于所述目标区域中第一颜色分量的分布密度,并将上采样得到的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息确定为所述第一输入数据;
当所述目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例小于1:1,基于所述采样率比例,对所述第一周边区域中已重建的第二颜色分量的信息进行下采样,使得下采样后所述第一周边区域中第二颜色分量的分布密度等于所述目标区域中第一颜色分量的分布密度,并将下采样得到的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息确定为所述第一输入数据。
可选的,所述方法还包括:
对初始卷积神经网络进行训练以得到所述卷积神经网络,所述卷积神经网络的训练过程包括:
通过所述第一通道向所述初始卷积神经网络输入第二输入数据,所述第二输入数据包括第一指定图像帧中训练区域的第一颜色分量的信息,所述第一指定图像帧中训练区域与所述目标区域的尺寸相同,所述第二输入数据的获取方式与所述第一输入数据的获取方式相同;
将所述第一指定图像帧中训练区域对应的原始数据作为训练标签,对所述初始卷积神经网络进行训练以得到所述卷积神经网络,所述原始数据由所述第一指定图像帧中训练区域中已知的第二颜色分量的信息组成。
可选的,所述方法还包括:
确定至少一个第一边信息数据,每个所述第一边信息数据包含除所述第一输入数据包含的颜色分量的信息之外的信息;
通过至少一个第二通道分别向所述卷积神经网络输入所述至少一个第一边信息数据,所述至少一个第二通道与所述至少一个第一边信息数据一一对应。
可选的,所述确定至少一个第一边信息数据,包括:
基于所述目标区域中已重建的第一颜色分量的相关信息,确定所述至少一个第一边信息数据;
和/或,基于所述目标区域的第二周边区域中已重建的第二颜色分量的信 息,确定所述至少一个第一边信息数据,所述目标区域的第二周边区域为位于所述目标区域左侧和/或上方的带状区域。
可选的,所述待处理图像帧的颜色编码格式为YUV格式,所述第一输入数据包括x行y列个第一采样块的颜色分量的信息,所述x和所述y均为大于或等于1的整数;
所述基于所述目标区域中已重建的第一颜色分量的相关信息,确定所述至少一个第一边信息数据,包括:
获取每个所述第一采样块中已重建的第一颜色分量的帧内预测模式的标识值;
将所有所述帧内预测模式的标识值组成一个所述第一边信息数据。
可选的,所述第一输入数据包括x行y列个第一采样块的颜色分量的信息,所述x和所述y均为大于或等于1的整数;
所述基于所述目标区域的第二周边区域中已重建的第二颜色分量的信息,确定所述至少一个第一边信息数据,包括:
获取所述目标区域的第二周边区域中已重建的第二颜色分量的信息;
确定所述目标区域的第二周边区域中已重建的第二颜色分量的信息的平均值;
生成一个所述第一边信息数据,其中,所述第一边信息数据包括x行y列个所述平均值。
可选的,所述方法还包括:
当所述至少一个第一边信息数据中任一边信息数据的取值范围与所述第一输入数据的取值范围不同时,对所述任一边信息数据进行标准化处理,使得处理后的所述任一边信息数据的取值范围与所述第一输入数据的取值范围相同。
可选的,所述方法还包括:
对初始卷积神经网络进行训练以得到所述卷积神经网络,所述卷积神经网络的训练过程包括:
通过所述第一通道向卷积神经网络输入第三输入数据,所述第三输入数据包括第二指定图像帧中训练区域的第一颜色分量的信息,所述第二指定图像帧中训练区域与所述目标区域的尺寸相同,所述第三输入数据的获取方式与所述第一输入数据的获取方式相同;
通过所述至少一个第二通道分别向所述初始卷积神经网络输入至少一个第二边信息数据,所述至少一个第二通道与所述至少一个第二边信息数据一一对应,所述至少一个第二边信息数据的获取方式与所述至少一个第一边信息数据的获取方式相同;
将所述第二指定图像帧中训练区域对应的第二颜色分量的原始数据作为训练标签,对所述初始卷积神经网络进行训练以得到所述卷积神经网络,所述原始数据由所述第二指定图像帧中训练区域中已知的第二颜色分量的信息组成。
可选的,所述卷积神经网络包括输入层、隐含层和输出层;
在所述获取所述卷积神经网络输出的第一输出数据之前,所述方法还包括:
当所述输入层有一个通道存在输入数据时,通过所述输入层对第一输入数据进行多维卷积滤波和非线性映射,得到所述输入层的输出数据;
当所述输入层有至少两个通道存在输入数据时,通过所述输入层对每个通道输入的数据分别进行多维卷积滤波和非线性映射,并将不同通道的所述多维卷积滤波和非线性映射后的输入数据进行合并,得到所述输入层的输出数据;
通过所述隐含层对所述输入层的输出数据进行多维卷积滤波和非线性映射,得到高维图像数据;
通过所述输出层对所述高维图像数据进行聚合(如求和),得到所述第一输出数据。
可选的,所述输入层包括分别与所述每个通道对应的依次连接的至少一个卷积层,以及合并层,每个所述卷积层包括一个特征提取层和一个特征映射层,
所述通过所述输入层对每个通道输入的数据分别进行多维卷积滤波和非线性映射,并将不同通道的所述多维卷积滤波和非线性映射后的输入数据进行合并,得到所述输入层的输出数据,包括:
在每个卷积层中:通过所述特征提取层对输入的数据进行多维卷积滤波,并通过所述特征映射层对所述输入的数据进行非线性映射;
通过所述合并层将经过不同通道对应的所述至少一个卷积层处理后的数据进行合并,得到所述输入层的输出数据。
可选的,所述隐含层包括依次连接的至少一个卷积层,每个所述卷积层包括一个特征提取层和一个特征映射层,
所述通过所述隐含层对所述输入层的输出数据进行多维卷积滤波和非线性映射,得到高维图像数据,包括:
在每个卷积层中:通过所述特征提取层对输入的数据进行多维卷积滤波,并通过所述特征映射层对所述输入的数据进行非线性映射;
将经过所述至少一个卷积层处理的数据作为所述高维图像数据。
可选的,所述待处理图像帧的颜色编码格式为YUV格式,所述第一颜色分量和所述第二颜色分量为亮度分量Y、色度分量U和色度分量V中的两种;
或者,所述待处理图像帧的颜色编码格式为RGB格式,所述第一颜色分量和所述第二颜色分量为红色分量、绿色分量和蓝色分量中的两种。
根据本公开实施例的第二方面,提供一种颜色分量的帧内预测装置,所述装置包括:
第一输入模块,用于通过第一通道向卷积神经网络输入第一输入数据,所述第一输入数据包含待处理图像帧中目标区域的第一颜色分量的信息;
获取模块,用于获取所述卷积神经网络输出的第一输出数据,所述第一输出数据包含所述卷积神经网络对所述目标区域的第二颜色分量的信息的预测值;
其中,所述第一颜色分量和所述第二颜色分量为所述目标区域具有的不同的颜色分量。
可选的,所述第一输入数据包括所述第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息,所述目标区域的第一周边区域为位于所述目标区域左侧和/或上方的带状区域。
可选的,所述装置还包括:
第一确定模块,用于在所述通过第一通道向卷积神经网络输入第一输入数据之前,确定所述待处理图像帧中第一颜色分量和第二颜色分量的采样率关系;
第二确定模块,用于基于所述采样率关系,确定所述第一输入数据,所述第一输入数据中,所述第一周边区域中第二颜色分量的分布密度等于所述目标区域中第一颜色分量的分布密度。
可选的,所述第二确定模块,包括:
第一获取子模块,用于获取所述目标区域的第一周边区域中已重建的第二颜色分量的信息;
第二获取子模块,用于获取所述目标区域中已重建的第一颜色分量的信息;
第一确定子模块,用于基于所述采样率关系,根据所述第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息,确定所述第一输入数据。
可选的,所述第一确定子模块,用于:
当所述目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例为1:1,将所述目标区域的第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息确定为所述第一输入数据;
当所述目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例大于1:1,基于所述采样率比例,对所述第一周边区域中已重建的第二颜色分量的信息进行上采样,使得上采样后所述第一周边区域中第二颜色分量的分布密度等于所述目标区域中第一颜色分量的分布密度,并将上采样得到的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息确定为所述第一输入数据;
当所述目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例小于1:1,基于所述采样率比例,对所述第一周边区域中已重建的第二颜色分量的信息进行下采样,使得下采样后所述第一周边区域中第二颜色分量的分布密度等于所述目标区域中第一颜色分量的分布密度,并将下采样得到的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息确定为所述第一输入数据。
可选的,所述装置还包括:
第一训练模块,用于对初始卷积神经网络进行训练以得到所述卷积神经网络,所述卷积神经网络的训练过程包括:
通过所述第一通道向所述初始卷积神经网络输入第二输入数据,所述第二输入数据包括第一指定图像帧中训练区域的第一颜色分量的信息,所述第一指定图像帧中训练区域与所述目标区域的尺寸相同,所述第二输入数据的获取方式与所述第一输入数据的获取方式相同;
将所述第一指定图像帧中训练区域对应的原始数据作为训练标签,对所述初始卷积神经网络进行训练以得到所述卷积神经网络,所述原始数据由所述第一指定图像帧中训练区域中已知的第二颜色分量的信息组成。
可选的,所述装置还包括:
第三确定模块,用于确定至少一个第一边信息数据,每个所述第一边信息数据包含除所述第一输入数据包含的颜色分量的信息之外的信息;
第二输入模块,用于通过至少一个第二通道分别向所述卷积神经网络输入所述至少一个第一边信息数据,所述至少一个第二通道与所述至少一个第一边信息数据一一对应。
可选的,所述第三确定模块,包括:
第二确定子模块,用于基于所述目标区域中已重建的第一颜色分量的相关信息,确定所述至少一个第一边信息数据;
和/或,第三确定子模块,用于基于所述目标区域的第二周边区域中已重建的第二颜色分量的信息,确定所述至少一个第一边信息数据,所述目标区域的第二周边区域为位于所述目标区域左侧和/或上方的带状区域。
可选的,所述待处理图像帧的颜色编码格式为YUV格式,所述第一输入数据包括x行y列个第一采样块的颜色分量的信息,所述x和所述y均为大于或等于1的整数;
所述第二确定子模块,用于:
获取每个所述第一采样块中已重建的第一颜色分量的帧内预测模式的标识值;
将所有所述帧内预测模式的标识值组成一个所述第一边信息数据。
可选的,所述第一输入数据包括x行y列个第一采样块的颜色分量的信息,所述x和所述y均为大于或等于1的整数;
所述第三确定子模块,用于:
获取所述目标区域的第二周边区域中已重建的第二颜色分量的信息;
确定所述目标区域的第二周边区域中已重建的第二颜色分量的信息的平均值;
生成一个所述第一边信息数据,其中,所述第一边信息数据包括x行y列个所述平均值。
可选的,所述装置还包括:
标准化模块,用于当所述至少一个第一边信息数据中任一边信息数据的取值范围与所述第一输入数据的取值范围不同时,对所述任一边信息数据进行标准化处理,使得处理后的所述任一边信息数据的取值范围与所述第一输入数据 的取值范围相同。
可选的,所述装置还包括:
第二训练模块,用于对初始卷积神经网络进行训练以得到所述卷积神经网络,所述卷积神经网络的训练过程包括:
通过所述第一通道向卷积神经网络输入第三输入数据,所述第三输入数据包括第二指定图像帧中训练区域的第一颜色分量的信息,所述第二指定图像帧中训练区域与所述目标区域的尺寸相同,所述第三输入数据的获取方式与所述第一输入数据的获取方式相同;
通过所述至少一个第二通道分别向所述初始卷积神经网络输入至少一个第二边信息数据,所述至少一个第二通道与所述至少一个第二边信息数据一一对应,所述至少一个第二边信息数据的获取方式与所述至少一个第一边信息数据的获取方式相同;
将所述第二指定图像帧中训练区域对应的第二颜色分量的原始数据作为训练标签,对所述初始卷积神经网络进行训练以得到所述卷积神经网络,所述原始数据由所述第二指定图像帧中训练区域中已知的第二颜色分量的信息组成。
可选的,所述卷积神经网络包括输入层、隐含层和输出层;所述装置还包括:
第一处理模块,用于在所述获取所述卷积神经网络输出的第一输出数据之前,当所述输入层有一个通道存在输入数据时,通过所述输入层对第一输入数据进行多维卷积滤波和非线性映射,得到所述输入层的输出数据;
第二处理模块,用于当所述输入层有至少两个通道存在输入数据时,通过所述输入层对每个通道输入的数据分别进行多维卷积滤波和非线性映射,并将不同通道的所述多维卷积滤波和非线性映射后的输入数据进行合并,得到所述输入层的输出数据;
高维处理模块,用于通过所述隐含层对所述输入层的输出数据进行多维卷积滤波和非线性映射,得到高维图像数据;
聚合模块,用于通过所述输出层对所述高维图像数据进行聚合,得到所述第一输出数据。
可选的,所述输入层包括分别与所述每个通道对应的依次连接的至少一个卷积层,以及合并层,每个所述卷积层包括一个特征提取层和一个特征映射层,
所述第二处理模块,用于:
在每个卷积层中:通过所述特征提取层对输入的数据进行多维卷积滤波,并通过所述特征映射层对所述输入的数据进行非线性映射;
通过所述合并层将经过不同通道对应的所述至少一个卷积层处理后的数据进行合并,得到所述输入层的输出数据。
可选的,所述隐含层包括依次连接的至少一个卷积层,每个所述卷积层包括一个特征提取层和一个特征映射层,
所述高维处理模块,用于:
在每个卷积层中:通过所述特征提取层对输入的数据进行多维卷积滤波,并通过所述特征映射层对所述输入的数据进行非线性映射;
将经过所述至少一个卷积层处理的数据作为所述高维图像数据。
可选的,所述待处理图像帧的颜色编码格式为YUV格式,所述第一颜色分量和所述第二颜色分量为亮度分量Y、色度分量U和色度分量V中的两种;
或者,所述待处理图像帧的颜色编码格式为RGB格式,所述第一颜色分量和所述第二颜色分量为红色分量、绿色分量和蓝色分量中的两种。
根据本公开实施例的第三方面,提供一种计算机设备,所述计算机设备为编码端设备或解码端设备,所述计算机设备包括:
处理器;
用于存储所述处理器的可执行指令的存储器;
其中,所述处理器被配置为执行上述第一方面提供的颜色分量的帧内预测方法,例如:
通过第一通道向卷积神经网络输入第一输入数据,所述第一输入数据包含待处理图像帧中目标区域的第一颜色分量的信息;
获取所述卷积神经网络输出的第一输出数据,所述第一输出数据包含所述卷积神经网络对所述目标区域的第二颜色分量的信息的预测值;
其中,所述第一颜色分量和所述第二颜色分量为所述目标区域具有的不同的颜色分量。
可选的,所述第一输入数据包括所述第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息,所述目标区域的第一周边区域为位于所述目标区域左侧和/或上方的带状区域。
可选的,所述处理器还用于:在所述通过第一通道向卷积神经网络输入第一输入数据之前,确定所述待处理图像帧中第一颜色分量和第二颜色分量的采样率关系;
基于所述采样率关系,确定所述第一输入数据,所述第一输入数据中,所述第一周边区域中第二颜色分量的分布密度等于所述目标区域中第一颜色分量的分布密度。
可选的,所述基于所述采样率关系,确定所述第一输入数据,包括:
获取所述目标区域的第一周边区域中已重建的第二颜色分量的信息;
获取所述目标区域中已重建的第一颜色分量的信息;
基于所述采样率关系,根据所述第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息,确定所述第一输入数据。
可选的,所述基于所述采样率关系,根据所述第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息,确定所述第一输入数据,包括:
当所述目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例为1:1,将所述目标区域的第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息确定为所述第一输入数据;
当所述目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例大于1:1,基于所述采样率比例,对所述第一周边区域中已重建的第二颜色分量的信息进行上采样,使得上采样后所述第一周边区域中第二颜色分量的分布密度等于所述目标区域中第一颜色分量的分布密度,并将上采样得到的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息确定为所述第一输入数据;
当所述目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例小于1:1,基于所述采样率比例,对所述第一周边区域中已重建的第二颜色分量的信息进行下采样,使得下采样后所述第一周边区域中第二颜色分量的分布密度等于所述目标区域中第一颜色分量的分布密度,并将下采样得到的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息确定为所述第一输入数据。
可选的,所述处理器还用于:
对初始卷积神经网络进行训练以得到所述卷积神经网络,所述卷积神经网络的训练过程包括:
通过所述第一通道向所述初始卷积神经网络输入第二输入数据,所述第二输入数据包括第一指定图像帧中训练区域的第一颜色分量的信息,所述第一指定图像帧中训练区域与所述目标区域的尺寸相同,所述第二输入数据的获取方式与所述第一输入数据的获取方式相同;
将所述第一指定图像帧中训练区域对应的原始数据作为训练标签,对所述初始卷积神经网络进行训练以得到所述卷积神经网络,所述原始数据由所述第一指定图像帧中训练区域中已知的第二颜色分量的信息组成。
可选的,所述处理器还用于:
确定至少一个第一边信息数据,每个所述第一边信息数据包含除所述第一输入数据包含的颜色分量的信息之外的信息;
通过至少一个第二通道分别向所述卷积神经网络输入所述至少一个第一边信息数据,所述至少一个第二通道与所述至少一个第一边信息数据一一对应。
可选的,所述确定至少一个第一边信息数据,包括:
基于所述目标区域中已重建的第一颜色分量的相关信息,确定所述至少一个第一边信息数据;
和/或,基于所述目标区域的第二周边区域中已重建的第二颜色分量的信息,确定所述至少一个第一边信息数据,所述目标区域的第二周边区域为位于所述目标区域左侧和/或上方的带状区域。
可选的,所述待处理图像帧的颜色编码格式为YUV格式,所述第一输入数据包括x行y列个第一采样块的颜色分量的信息,所述x和所述y均为大于或等于1的整数;
所述基于所述目标区域中已重建的第一颜色分量的相关信息,确定所述至少一个第一边信息数据,包括:
获取每个所述第一采样块中已重建的第一颜色分量的帧内预测模式的标识值;
将所有所述帧内预测模式的标识值组成一个所述第一边信息数据。
可选的,所述第一输入数据包括x行y列个第一采样块的颜色分量的信息,所述x和所述y均为大于或等于1的整数;
所述基于所述目标区域的第二周边区域中已重建的第二颜色分量的信息,确定所述至少一个第一边信息数据,包括:
获取所述目标区域的第二周边区域中已重建的第二颜色分量的信息;
确定所述目标区域的第二周边区域中已重建的第二颜色分量的信息的平均值;
生成一个所述第一边信息数据,其中,所述第一边信息数据包括x行y列个所述平均值。
可选的,所述处理器还用于:
当所述至少一个第一边信息数据中任一边信息数据的取值范围与所述第一输入数据的取值范围不同时,对所述任一边信息数据进行标准化处理,使得处理后的所述任一边信息数据的取值范围与所述第一输入数据的取值范围相同。
可选的,所述处理器还用于:
对初始卷积神经网络进行训练以得到所述卷积神经网络,所述卷积神经网络的训练过程包括:
通过所述第一通道向卷积神经网络输入第三输入数据,所述第三输入数据包括第二指定图像帧中训练区域的第一颜色分量的信息,所述第二指定图像帧中训练区域与所述目标区域的尺寸相同,所述第三输入数据的获取方式与所述第一输入数据的获取方式相同;
通过所述至少一个第二通道分别向所述初始卷积神经网络输入至少一个第二边信息数据,所述至少一个第二通道与所述至少一个第二边信息数据一一对应,所述至少一个第二边信息数据的获取方式与所述至少一个第一边信息数据的获取方式相同;
将所述第二指定图像帧中训练区域对应的第二颜色分量的原始数据作为训练标签,对所述初始卷积神经网络进行训练以得到所述卷积神经网络,所述原始数据由所述第二指定图像帧中训练区域中已知的第二颜色分量的信息组成。
可选的,所述卷积神经网络包括输入层、隐含层和输出层;
所述处理器还用于在所述获取所述卷积神经网络输出的第一输出数据之前,当所述输入层有一个通道存在输入数据时,通过所述输入层对第一输入数据进行多维卷积滤波和非线性映射,得到所述输入层的输出数据;
当所述输入层有至少两个通道存在输入数据时,通过所述输入层对每个通道输入的数据分别进行多维卷积滤波和非线性映射,并将不同通道的所述多维卷积滤波和非线性映射后的输入数据进行合并,得到所述输入层的输出数据;
通过所述隐含层对所述输入层的输出数据进行多维卷积滤波和非线性映射,得到高维图像数据;
通过所述输出层对所述高维图像数据进行聚合,得到所述第一输出数据。
可选的,所述输入层包括分别与所述每个通道对应的依次连接的至少一个卷积层,以及合并层,每个所述卷积层包括一个特征提取层和一个特征映射层,
所述通过所述输入层对每个通道输入的数据分别进行多维卷积滤波和非线性映射,并将不同通道的所述多维卷积滤波和非线性映射后的输入数据进行合并,得到所述输入层的输出数据,包括:
在每个卷积层中:通过所述特征提取层对输入的数据进行多维卷积滤波,并通过所述特征映射层对所述输入的数据进行非线性映射;
通过所述合并层将经过不同通道对应的所述至少一个卷积层处理后的数据进行合并,得到所述输入层的输出数据。
可选的,所述隐含层包括依次连接的至少一个卷积层,每个所述卷积层包括一个特征提取层和一个特征映射层,
所述通过所述隐含层对所述输入层的输出数据进行多维卷积滤波和非线性映射,得到高维图像数据,包括:
在每个卷积层中:通过所述特征提取层对输入的数据进行多维卷积滤波,并通过所述特征映射层对所述输入的数据进行非线性映射;
将经过所述至少一个卷积层处理的数据作为所述高维图像数据。
可选的,所述待处理图像帧的颜色编码格式为YUV格式,所述第一颜色分量和所述第二颜色分量为亮度分量Y、色度分量U和色度分量V中的两种;
或者,所述待处理图像帧的颜色编码格式为RGB格式,所述第一颜色分量和所述第二颜色分量为红色分量、绿色分量和蓝色分量中的两种。
本公开的实施例提供的技术方案可以包括以下有益效果:
本公开实施例提供的颜色分量的帧内预测方法及装置,将包含待处理图像帧中目标区域的第一颜色分量的信息的第一输入数据输入至卷积神经网络,由卷积神经网络进行处理得到包含第二颜色分量的信息的第一输出数据,从而实现了卷积神经网络对颜色分量的帧内预测,由于卷积神经网络所具有的深度学 习等特性,使得最终预测得到的第二颜色分量可靠性较高。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性的,并不能限制本公开。
附图说明
为了更清楚地说明本公开的实施例,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本公开的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是根据相关技术示出的一种H.265的编码原理示意图;
图2是根据相关技术示出的一种H.265的解码原理示意图;
图3是根据一示例性实施例示出的一种颜色分量的帧内预测方法的流程图;
图4是根据一示例性实施例示出的一种未进行编码的图像帧的示意图;
图5是图4所示的图像帧的亮度分量Y的信息的呈现效果示意图;
图6是图4所示的图像帧的色度分量U的信息的呈现效果示意图;
图7是图4所示的图像帧的色度分量V的信息的呈现效果示意图;
图8是根据一示例性实施例示出的另一种颜色分量的帧内预测方法的流程图;
图9是根据一示例性实施例示出的一种确定第一输入数据的方法流程图;
图10是根据一示例性实施例示出的一种待处理图像帧中的区域示意图;
图11是根据一示例性实施例示出的另一种待处理图像帧中的区域示意图;
图12是根据一示例性实施例示出的一种上采样过程示意图;
图13是根据一示例性实施例示出的一种第一输入数据的组成元素示意图;
图14是根据一示例性实施例示出的一种下采样过程示意图;
图15是根据一示例性实施例示出的另一种第一输入数据的组成元素示意图;
图16是根据一示例性实施例示出的一种卷积神经网络的结构示意图;
图17是根据一示例性实施例示出的又一种颜色分量的帧内预测方法的流程图;
图18是根据一示例性实施例示出的另一种卷积神经网络的结构示意图;
图19是根据一示例性实施例示出的一种颜色分量的帧内预测装置的结构示意图;
图20是根据一示例性实施例示出的另一种颜色分量的帧内预测装置的结构示意图;
图21是根据一示例性实施例示出的一种第二确定模块的结构示意图;
图22是根据一示例性实施例示出的又一种颜色分量的帧内预测装置的结构示意图;
图23是根据一示例性实施例示出的再一种颜色分量的帧内预测装置的结构示意图;
图24是根据一示例性实施例示出的一种第三确定模块的结构示意图;
图25是根据另一示例性实施例示出的一种颜色分量的帧内预测装置的结构示意图;
图26是根据另一示例性实施例示出的又一种颜色分量的帧内预测装置的结构示意图;
图27是根据另一示例性实施例示出的再一种颜色分量的帧内预测装置的结构示意图;
图28是根据另一示例性实施例示出的一种计算机设备的结构示意图。
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本公开的实施例,并与说明书一起用于解释本公开的原理。
具体实施方式
为了使本公开的目的、技术方案和优点更加清楚,下面将结合附图对本公开作进一步地详细描述,显然,所描述的实施例仅仅是本公开一部份实施例,而不是全部的实施例。基于本公开中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其它实施例,都属于本公开保护的范围。
本公开实施例提供一种颜色分量的帧内预测方法,该颜色分量的帧内预测方法是通过卷积神经网络(英文:Convolutional Neural Network;简称:CNN)来进行帧内预测的,为了便于读者理解,下面对卷积神经网络进行简单的解释。
卷积神经网络是一种前馈神经网络,是深度学习技术中极具代表的网络架 构之一,它的人工神经元(英文:Neuron)可以响应一部分覆盖范围内的周围单元,能根据图像特征进行处理。
一般地,卷积神经网络的基本结构包括两层,其一为特征提取层,每个神经元的输入与前一层的局部接受域相连,并提取该局部接受域的特征。其二是特征映射层,网络的每个特征映射层由多个特征映射组成,每个特征映射为一个平面。特征映射层设置有激活函数(英文:activation function),通常的激活函数为非线性映射函数,可以为sigmoid函数或神经网络回顾(英文:Rectified linear unit;简称:ReLU)函数。需要说明的是,卷积神经网络由大量的节点(也称“神经元”或“单元”)相互连接而成,每个节点代表一种特定的输出函数。每两个节点之间的连接代表加权值,称之为权重(英文:weight)。不同的权重和激活函数,则会导致卷积神经网络不同的输出。
卷积神经网络相较于传统的图像处理算法的优点之一在于,避免了对图像复杂的前期预处理过程(提取人工特征等),可以直接输入原始图像,进行端到端的学习。卷积神经网络相较于传统的神经网络的优点之一在于,传统的神经网络都是采用全连接的方式,即输入层到隐藏层的神经元都是全部连接的,这样做将导致参数量巨大,使得网络训练耗时甚至难以训练,而卷积神经网络则通过局部连接和权值共享等方式避免了这一问题。
进一步的,本公开实施例所提供的颜色分量的帧内预测方法,可以应用于视频编解码领域,为了便于读者理解,下面对视频编码过程和解码过程进行简单的解释。
第一、视频编码过程。
目前的视频编码标准主要有H.261至H.265,以及MPEG-4V1至MPEG-4V3等多种,其中,H.264,又称视频编码(英文:Advanced Video Coding;简称:AVC),H.265,又称高效率视频编码(英文:High Efficiency Video Coding;简称:HEVC),两者均采用运动补偿混合编码算法,本公开实施例以H.265为例进行解释。
如图1所示,图1为H.265的编码原理示意图。H.265的编码架构大致上和H.264的编码架构相似,主要也包含:帧内预测(英文:intra prediction)模块、帧间预测(英文:inter prediction)模块、变换(英文:transform)模块、量化(英文:quantization)模块、熵编码(英文:entropy coding)模块、反变换模块、反量化模块、重建图像模块和环路滤波模块(也称环内滤波模块)等模块,其中,帧间 预测模块可以包括运动估计模块和运动补偿模块,环路滤波模块包括去块滤波(英文:deblocking)模块(也称去块滤波器(英文:deblocking filter))和采样点自适应偏移(英文:Sample Adaptive Offset;简称:SAO)模块。
其中,在进行视频编码时,通常会将待编码图像划分为矩阵状排布的尺寸相等(也可以不等)的多个区域,每个区域对应一个图像块(也称编码块),每个区域可以是正方形区域,也可以为长方形区域,在进行编码时,通常是按照从上至下,从左至右的顺序对各个图像块依次进行处理。例如,上述帧内预测模块用于基于同一图像帧中的已重建的周围像素值对当前图像块的像素值进行预测,以去除空间冗余信息;上述帧间预测模块用于利用视频时域的相关性,使用邻近已重建的图像帧中的像素值预测待编码图像的像素值,以去除时间上的关联性;量化模块用于将图像块的连续取值映射成多个离散的幅值;上述去块滤波模块用于对图像块边界处的像素进行滤波以去除块效应;SAO模块用于进行像素值的补偿处理,重建图像模块将预测值和重建残差值相加获得重建像素值(未经环路滤波)。经过环路滤波模块得到的重建帧形成参考帧列表,用于帧间预测;熵编码模块对得到的模式信息和残差信息进行处理得到码流(英文:bitstream)。
在视频编码标准H.265中,帧内预测模块对待处理图像帧的图像块的亮度分量和色度分量进行独立编码。其中,色度分量的编码过程涉及色度的帧内预测技术,该色度的帧内预测技术是跨分量的色度预测技术,是在编码并重建完图像块的亮度分量的信息后,也即在环路滤波模块进行环路滤波前,利用已重建的亮度分量对色度分量进行预测。
第二、视频解码过程。
如图2所示,图2为H.265的解码原理示意图。H.265的解码架构大致上和H.264的解码架构相似,主要也包含:熵解码模块、帧内预测模块、帧间预测模块、反变换模块、反量化模块和环路滤波模块等模块,其中,环路滤波模块包括去块滤波模块和SAO模块。经过环路滤波模块得到的重建帧形成参考帧列表,用于帧间预测,熵解码模块对得到的码流进行处理得到模式信息和残差信息。
H.265的解码原理可以参考上述H.265的编码过程,本公开实施例对此不再赘述。
本公开实施例提供一种颜色分量的帧内预测方法,该帧内预测方法实质上是跨分量的帧内预测方法,其原理为基于卷积神经网络,通过第一颜色分量的信息来预测第二颜色分量的信息,如图3所示,包括:
步骤101、通过第一通道向卷积神经网络输入第一输入数据,该第一输入数据包含待处理图像帧中目标区域的第一颜色分量的信息。
其中,目标区域为待处理图像帧中待进行第二颜色分量预测的区域。需要说明的是,本公开实施例中,颜色分量的信息是指颜色分量的数值,也称分量值,则上述目标区域的第一颜色分量的信息也即是目标区域的第一颜色分量的数值。
步骤102、获取卷积神经网络输出的第一输出数据,该第一输出数据包含卷积神经网络对所述目标区域的第二颜色分量的信息的预测值。
其中,第一颜色分量和第二颜色分量为目标区域具有的不同的颜色分量,两者属于同一颜色空间。该卷积神经网络用于基于第一输入数据预测得到第一输出数据。
值的说明的是,在不同的应用场景中,待处理图像帧的类型不同,第一输入数据相应不同。例如,当待处理图像帧为待编码图像帧时,第一输入数据为目标区域中编码后重建的第一颜色分量的信息,编码后重建的第一颜色分量的信息是基于已编码得到的第一颜色分量的信息恢复得到的,以图1为例,该编码后重建的第一颜色分量的信息是将目标区域中已编码得到第一颜色分量的信息(也即是码流)通过反变化处理和反量化处理后,与目标区域的第一颜色分量的预测信息相加得到的信息,是由图1中的重建图像模块处理得到的图像信息;当待处理图像帧为待解码图像帧时,第一输入数据为目标区域中解码得到的重建的第一颜色分量的信息,解码得到的重建的第一颜色分量的信息是基于已解码得到的第一颜色分量的信息恢复得到的,以图2为例,该解码得到的重建的第一颜色分量的信息是将目标区域中已解码得到第一颜色分量的信息(也即是经过了熵解码模块的码流)通过反变化模块和反量化模块处理得到的信息,与帧内预测模块或帧间预测模块所预测得到的预测信息相加所得到的信息,其获取过程与图2中所标示的重建信息的获取过程相同。编码后重建的第一颜色分量的信息和解码得到的重建的第一颜色分量的信息均可以称为已重建的第一颜色分量的信息。
综上所述,本公开实施例将包含待处理图像帧中目标区域的第一颜色分量 的信息的第一输入数据输入至卷积神经网络,由卷积神经网络进行处理得到包含第二颜色分量的信息的第一输出数据,从而实现了卷积神经网络对颜色分量的帧内预测,由于卷积神经网络所具有的深度学习等特性,使得最终预测得到的第二颜色分量可靠性较高。
本公开实施例所提供的颜色分量的帧内预测方法针对不同的待处理图像帧的颜色编码格式可以实现不同的颜色分量的预测,目前较为常用的两种图像帧的颜色编码格式有YUV格式和RGB格式。
第一种情况,当颜色编码格式为YUV格式时,其基本编码原理可以为:采用三管彩色摄影机或彩色电荷耦合元件(英文:Charge-coupled Device;简称:CCD)摄影机或摄像机等图像采集装置进行取像,然后把取得的彩色图像信号经分色和分别放大校正后得到RGB信号,再将RGB信号经过矩阵变换电路得到亮度分量Y的信号和两个色差信号B-Y(即色度分量U的信号)、R-Y(即色度分量V的信号),最后将亮度分量Y的信号、色度分量U的信号和色度分量V的信号分别进行编码,采用同一信道发送出去。这种色彩的表示方法就是所谓的YUV色彩空间表示。采用YUV色彩空间表示的亮度分量Y的信号、色度分量U的信号和色度分量V的信号是分离的。当然,上述YUV格式也可以通过其他方式获取,本公开实施例对此不做限定。
由于YUV格式的图像(后文简称目标图像)通常是通过对图像采集装置,如摄像机,拍摄的初始图像进行一系列处理后(例如进行格式转换)采样得到的,亮度分量Y、色度分量U和色度分量V的采样率(也称抽样率)可能不同,初始图像中各个颜色分量的分布密度相同,即各个颜色分量的分布密度比例为1:1:1,由于各个颜色分量的采样率不同,最终得到的目标图像的不同颜色分量的分布密度不同,通常,目标图像中,各颜色分量的分布密度比例等于采样率比例,需要说明的是,一种颜色分量的分布密度指的是指单位尺寸中所包含的该种颜色分量的信息的个数。例如亮度分量的分布密度是指单位尺寸中所包含的亮度值的个数。
目前的YUV格式基于不同的采样率比例划分为多种采样格式,该采样格式可以采用采样率比例的方式进行表示,这种表示方式称为A:B:C表示法,目前的采样格式可以分为:4:4:4、4:2:2、4:2:0和4:1:1等。例如,采样格式为4:4:4表示目标图像中亮度分量Y,色度分量U和色度分量V的采样率相同,在原始图像上没有进行下采样,目标图像的各个颜色分量的分布密度比例为 1:1:1;采样格式为4:2:2表示目标图像中每两个亮度分量Y共用一组色度分量U和色度分量V,目标图像的各个颜色分量的分布密度比例为2:1:1,即以像素点为采样单位,对原始图像的亮度分量未进行下采样,对原始图像的色度分量进行水平方向的2:1下采样,垂直方向未进行下采样得到目标图像;采样格式为4:2:0表示对目标图像中的色度分量U和色度分量V中每个色度分量来说,水平方向和竖直方向的采样率都是2:1,目标图像的亮度分量Y与色度分量U的分布密度比例为2:1,目标图像的亮度分量Y与色度分量V的分布密度比例为2:1,即以像素点为采样单位,对原始图像的亮度分量未进行下采样,对原始图像的色度分量进行水平方向的2:1下采样,以及垂直方向的2:1下采样得到目标图像。
在本公开实施例中,第一颜色分量和第二颜色分量为目标区域具有的不同类型的颜色分量。当待处理图像帧的颜色编码格式为YUV格式,该待处理图像帧中各个像素点的像素信息(也称颜色信息)包括亮度分量Y、色度分量U和色度分量V的信息,则上述第一颜色分量和第二颜色分量可以为亮度分量Y、色度分量U和色度分量V中的任两种。
请参考图4至图7,图4为一未进行编码的图像帧,图5至图7分别为该图像帧的亮度分量Y的信息(图5也可以称为亮度图像帧)、色度分量U的信息(图6也可以称为色度U图像帧)以及色度分量V的信息(图7也可以称为色度V图像帧)的呈现效果示意图。其中,图4为一彩色图像帧的示意图,图5至图7中的Y、U和V为标识信息,并不是图像帧中的内容。
第二种情况,当待处理图像帧的颜色编码格式为RGB格式时,该待处理图像帧中各个像素点的像素信息(也称颜色信息)包括透明度分量和多个颜色分量的信息,该多个颜色分量指的是至少两种颜色分量,例如,该多个颜色分量可以包括红色分量、绿色分量和蓝色分量,则第一颜色分量和第二颜色分量为红色分量、绿色分量和蓝色分量中的任两种。需要说明的是,当待处理图像帧的颜色编码格式为RGB格式时,红色分量、绿色分量和蓝色分量采样率比例为1:1:1,三者在待处理图像帧中的分布密度比例也为1:1:1。
值得说明的是,本公开实施例的保护范围并不局限于此,当待处理图像帧的颜色编码格式为其他格式时,任何熟悉本技术领域的技术人员在本公开实施例揭露的技术范围内,也可以采用本公开实施例提供的颜色分量的帧内预测方法轻易想到变换或替换来进行相应的颜色分量的预测,因此,这些可轻易想到 变化或替换,也涵盖在本公开实施例保护范围内。
在本公开实施例中,上述卷积神经网络包括输入层(英文:Input layer)、隐含层(英文:Hidden layer)和输出层(英文:Output layer)。可选的,该卷积神经网络可以包括一个输入层、一个隐含层和一个输出层。输入层可以包括至少一个通道,通过该至少一个通道可以向卷积神经网络输入数据,在本公开实施例中,向卷积神经网络输入数据来进行颜色分量的预测的过程可以有至少两种可实现方式,在不同的可实现方式中,颜色分量的帧内预测方法不同,分别如下:
第一种可实现方式,通过第一通道向卷积神经网络输入第一输入数据,以使卷积神经网络进行颜色分量的跨分量帧内预测,得到第一输出数据。
其中,该第一输入数据可以包括待处理图像帧中目标区域的多个第一采样块的第一颜色分量的信息,第一输出数据包括卷积神经网络输出的目标区域的多个第二采样块的第二颜色分量的信息,其中,第一采样块为针对第一颜色分量的采样单位,该第一采样块包括至少一个第一颜色分量点,第一颜色分量点为能够采集到第一颜色分量的信息的最小区域单位,该第一颜色分量点也可以称为第一颜色分量像素点或者第一颜色分量像素位置。
示例的,假设第一颜色分量为亮度分量,第一颜色分量点即为亮度点,若目标区域中每个像素点都具有一个亮度值,则一个亮度点的尺寸与一个像素点的尺寸相同,第一采样块由至少一个亮度点组成,也即是由至少一个像素点组成。
第二采样块为针对第二颜色分量的采样块,该第二采样块包括至少一个第二颜色分量点,该第二颜色分量点为能够采集到第二颜色分量的信息的最小区域单位,该第二颜色分量点也可以称为第二颜色分量像素点或者第二颜色分量像素位置。
示例的,假设第二颜色分量为色度分量,第二颜色分量点即为色度点,若目标区域中每两个像素点具有一个色度值(或者说共用一个色度值),则一个色度点的尺寸与两个像素点的尺寸相同,第二采样块由至少一个色度点组成,也即是由至少两个像素点组成。
则由上可知,每个第一采样块和每个第二采样块均可以由一个或多个像素点组成,例如,假设第一采样块由2×2个像素点组成,则第一输入数据可以包 括待处理图像帧中目标区域的以每2×2个像素点为采样单位采样得到的第一颜色分量的信息,其中,每个第一采样块包含一个第一颜色分量的信息,该信息可以为该第一采样块中指定位置的第一颜色分量点的信息,也可以是该采样单位中所有第一颜色分量点的信息平均值。示例的,当第一颜色分量为亮度分量时,每个第一采样块包含一个亮度值,该亮度值可以为该第一采样块中指定亮度点(如位于左上角的亮度点或者位于中心位置的亮度点)的亮度值,也可以是该第一采样块中所有亮度点的亮度平均值。
假设第二采样块由2×2个像素点组成,则第一输出数据可以包括待处理图像帧中目标区域的以每2×2个像素点为采样单位采样得到的第二颜色分量的信息(该数据为一采样结果的预测数据),其中,每个第二采样块包含一个第二颜色分量的信息,该信息可以为该第二采样块中指定位置的第二颜色分量点的信息,也可以是该第二采样块中所有第二颜色分量点的信息平均值。示例的,当第二颜色分量为色度分量(如色度分量U或色度分量V)时,每个第二采样块包含一个色度值,该色度值可以为该第二采样块中指定色度点(如位于左上角的色度点或者位于中心位置的色度点)的色度值,也可以是该第二采样块中所有色度点的色度平均值。
需要说明的是,本申请实施例中采样块的尺寸只是示意性说明,本申请在实际应用时,上述第一采样块和第二采样块的尺寸可以均为8×8个像素点。
当然,由于采样单位越精细,预测的颜色分量的细致程度越高,因此,在一种可选的实现方式中,该第一采样块由一个第一颜色分量点组成,第二采样块由一个第二颜色分量点组成。则第一输入数据包括待处理图像帧中目标区域的所有第一颜色分量的信息(也即所有像素点的第一颜色分量的信息),第一输出数据包括卷积神经网络对目标区域的所有第二颜色分量的信息(也即所有像素点的第二颜色分量的信息)。
请参考图8,假设第一输入数据包括待处理图像帧中目标区域的所有第一颜色分量的信息,该第一输出数据包括卷积神经网络对目标区域的所有第二颜色分量的信息,例如,该待处理图像帧为视频图像帧,该颜色分量的帧内预测方法,可以包括:
步骤201、确定待处理图像帧中第一颜色分量和第二颜色分量的采样率关系。
示例性地,待处理图像帧通常会被划分为矩阵状排布的尺寸相等的多个区 域,每个区域对应一个图像块(在视频编解码领域也称编码块),在进行图像处理时,通常是按照从上至下,从左至右的顺序对各个区域依次进行处理,本公开实施例中,目标区域为待处理图像帧中待进行第二颜色分量预测的区域,在对该目标区域的第二颜色分量进行预测时,该目标区域上方和左侧的区域的第二颜色分量已经完成相应的预测。示例的,在编解码领域,该目标区域为待处理图像帧中待进行第二颜色分量重建的区域,在对该目标区域的第二颜色分量进行重建时,该目标区域上方和左侧的区域的第二颜色分量已经完成相应的重建。
在同一区域中,不同颜色分量的采样率可以相同也可以不同,相应的,相互之间的采样率关系可以相同,也可以不同,该采样率关系是由实际的颜色编码格式的采样格式决定的,如前所述,例如颜色编码格式为YUV格式时,采样格式可以为YUV4:2:0或YUV4:4:4等,其中,采样格式为YUV4:2:0时,待编码图像帧的同一区域中,亮度分量Y、色度分量U和色度分量V的采样率关系为:亮度分量Y和色度分量V在水平和垂直方向采样率比例各为2:1,亮度分量Y和色度分量U在水平和垂直方向采样率比例各为2:1;色度分量U和色度分量V的采样率比例为1:1;采样格式为YUV4:4:4时,待编码图像帧的同一区域中,亮度分量Y、色度分量U和色度分量V的采样率关系为:亮度分量Y和色度分量U的采样率比例为1:1,亮度分量Y和色度分量V的采样率比例为1:1。当然,待编码图像帧还可以是其他采样格式,本公开实施例对此不再赘述。并且,上述采样率关系,最终反映了颜色分量的分布密度,例如,两个颜色分量的采样率比例为1:1时,该两个颜色分量在同一区域中的分布密度相同。
如果根据亮度分量和色度分量间的线性相关性来进行帧内预测,其原理是依据图像的局部亮度与色度线性相关,但实际上亮度分量的纹理特性会远强于色度分量的纹理特性,以图4中人脸图像嘴角位置4x4个像素的区域W为例,假设采样格式为YUV4:4:4,则区域W中每个像素所具有的YUV颜色分量的采样率关系为:采样率比例为1:1:1,此时区域W中每个像素点具有1个亮度分量Y的信息(即数值),1个色度分量U的信息和1个色度分量V的信息,参见图5至图7,以及表1至表3,图5至图7分别为该图像帧的亮度分量Y的信息、色度分量U的信息以及色度分量V的信息的呈现效果示意图,表1至表3分别为该区域W中,像素点所分别具有的亮度分量Y的数值、色度分 量U的数值以及色度分量V的数值。由图5至图7和表1至表3可知,在区域W中该亮度分量Y的信息有显著变化的情况下,对应区域的色度分量U的信息和色度分量V的信息并无明显变化。当包含亮度分量Y、色度分量U和色度分量V的信息的图像块分别呈现时,可以看出这三个图像块都有相似的轮廓信息,因此对应于同一帧图像的同一区域的亮度分量Y、色度分量U和色度分量V有一定的相关性。在本公开实施例中,通过卷积神经网络进行跨颜色分量的预测,可以实现通过在卷积神经网络的感知野范围内提取的纹理等图像特征生成预测结果,从而既可以避免亮度分量和色度分量被简单设定为具有线性相关的关系,又可以充分考虑亮度分量Y、色度分量U和色度分量V的相关性。
表1
表2
表3
在本公开实施例中,为了保证能够进行颜色分量的准确预测,有效分析亮度分量Y、色度分量U和色度分量V的相关性,简化卷积神经网络的网络架构,该第一输入数据不仅可以包括目标区域中已重建的第一颜色分量的信息,还可以包括目标区域的第一周边区域的已重建的第二颜色分量的信息,该已重建的第二颜色分量的信息可以反映第二颜色分量在待预测图像中的纹理特性, 基于包含该已重建的第二颜色分量的信息,卷积神经网络可以更为准确地预测目标区域的第二颜色分量的信息,请参考上述步骤102对已重建的第一颜色分量的信息的解释,当待处理图像帧为待编码图像帧时,已重建的第二颜色分量的信息为编码后重建的第二颜色分量的信息,当待处理图像帧为待解码图像帧时,已重建的第二颜色分量的信息为解码得到的重建的第二颜色分量的信息。
其中,目标区域的第一周边区域为位于目标区域左侧和/或上方的带状区域(也称条状区域),该带状区域与目标区域邻接。该带状区域的范围可以根据实际情况设定,示例的,该带状区域由位于目标区域左侧的至少一列像素和/或上方的至少一行像素组成,该p和q均为大于或等于1的整数。
由前述对颜色编码格式的介绍可知,由于目标图像是由初始图像进行下采样(或者不采样,不采样时可以视为各个颜色分量的采样率相同)得到的,各个颜色分量的采样率比例决定了最终得到的目标图像中颜色分量的分布密度,而本公开实施例中帧内跨分量预测的对象:待处理图像,即为上述目标图像,其各个颜色分量的采样率比例可能不同,相应的,分布密度也可能不同,因此,第一输入数据包括的已重建的第一颜色分量的信息的分布密度和已重建的第二颜色分量的信息的分布密度也可能不同。
为了使得卷积神经网络的架构更简单,运算更简洁,在向卷积神经网络的输入第一输入数据之前,可以基于待处理图像帧中第一颜色分量和第二颜色分量的采样率关系,进行第一输入数据的分布密度的一致化处理,该一致化处理过程可以参考后续步骤2023,经过一致化处理后得到的第一输入数据中包含的第一周边区域中第二颜色分量的分布密度等于目标区域中第一颜色分量的分布密度,使得第一输入数据中包含的各个颜色分量的分布密度均匀,并且,由于预测主要是以目标区域中第一颜色分量的信息做参考,因此,在确定第一输入数据的过程中,是通过保持目标区域中第一颜色分量的分布密度不变,调整第一周边区域中第二颜色分量的分布密度来实现两者密度相等的。
步骤202、基于采样率关系,确定第一输入数据。
示例的,当本公开实施例所提供的颜色分量的帧内预测方法应用于编解码领域时,第一输入数据包含的待处理图像帧中目标区域的第一颜色分量的信息为目标区域中已重建的第一颜色分量的信息,则假设该第一采样块为一个第一颜色分量点,第二采样块为一个第二颜色分量点,如图9所示,基于采样率关系,确定第一输入数据的过程可以包括:
步骤2021、获取目标区域的第一周边区域中已重建的第二颜色分量的信息。
例如,假设目标区域为图10中的区域H,第一颜色分量为亮度分量Y,第二颜色分量为色度分量U,采样格式可以分为:YUV4:4:4,图10的一个方格代表一个像素点,则第一周边区域K由位于目标区域左侧的2列像素和上方的2行像素组成,如图10所示,第一周边区域K和目标区域H中,每个像素所具有的YUV颜色分量的采样率关系为1:1:1,则获取的第一周边区域中已重建的第二颜色分量的信息即为第一周边区域K中的色度分量U的信息。
步骤2022、获取目标区域中已重建的第一颜色分量的信息。
仍然以上述图10的例子为例,目标区域中已重建的第一颜色分量的信息即为目标区域H中的亮度分量的信息。
步骤2023、基于采样率关系,根据第一周边区域中已重建的第二颜色分量的信息,与目标区域中已重建的第一颜色分量的信息,确定第一输入数据。
示例的,步骤2023包括:
S1、当目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例为1:1,将目标区域的第一周边区域中已重建的第二颜色分量的信息,与目标区域中已重建的第一颜色分量的信息确定为第一输入数据。
仍然以上述步骤2021中图10的例子为例,由于亮度分量Y和色度分量U的采样关系为采样率比例为1:1,则如图11所示,直接将第一周边区域K中已重建的色度分量的信息和目标区域H中已重建的亮度分量的信息确定为第一输入数据。假设,图10的一个方格代表一个像素点,则第一周边区域K的色度分量U的分布密度为每个像素点上具有一个色度值,目标区域H中的亮度分量Y的分布密度为每个像素点上具有一个亮度值,此时,第一周边区域K的色度分量U的分布密度等于目标区域H中的亮度分量Y的分布密度。
S2、当目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例大于1:1,基于采样率比例,对第一周边区域中已重建的第二颜色分量的信息进行上采样(英文:upsampling),使得上采样后第一周边区域中第二颜色分量的分布密度等于目标区域中第一颜色分量的分布密度,并将上采样得到的第二颜色分量的信息,与目标区域中已重建的第一颜色分量的信息确定为第一输入数据。
例如,颜色编码格式为YUV4:2:2,第一颜色分量为亮度分量Y和第二颜 色分量为色度分量U,则亮度分量Y和色度分量U的采样率关系为:采样率比例为2:1,大于1:1,则需要基于采样率比例:2:1,对第一周边区域中已重建的色度分量U的信息进行上采样,并将上采样得到的色度分量U的信息,与目标区域中已重建的亮度分量Y的信息确定为第一输入数据。
示例的,本公开实施例中,可以使用上采样滤波器对第一周边区域中已重建的第二颜色分量的信息进行上采样,或者,在原有图像的第二颜色分量的信息的基础上采用合适的插值算法插入新的第二颜色分量的信息。
以采用插值算法为例,由于第一颜色分量和第二颜色分量的采样率比例大于1:1,即目标区域中第一颜色分量对应的第一采样块的尺寸要小于第二颜色分量对应的第二采样块的尺寸,且需要保持目标区域中第一颜色分量的分布密度不变,则上采样后的图像的基本单位为第一采样块。
在本公开实施例中,当目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例等于r:1,该r为大于1的整数,则对第一周边区域中多个第二采样块的第二颜色分量的信息进行r倍的上采样,得到多个第一采样块的第二颜色分量的信息,也即是上采样后第一周边区域中第二颜色分量的分布密度和目标区域中第一颜色分量的分布密度相等,并将上采样得到的第二颜色分量的信息,与目标区域中已重建的第一颜色分量的信息确定为第一输入数据。
进一步的,采用插值算法实现的上采样可以指的是,在第一周边区域的原有第二颜色分量的信息的基础上插入新的第二颜色分量的信息,以使得插值后的第一周边区域中第二颜色分量的分布密度等于目标区域的第一颜色分量的分布密度。其中,假设第一周边区域包括M×N-m×n个具有第二颜色分量的信息的第二采样块,则对该M×N-m×n个具有第二颜色分量的信息的第二采样块进行r倍上采样可以为:将第一周边区域中每个第二采样块上的第二颜色分量的信息复制,并将每个第二采样块划分为r 2个第一采样块,在每个第一采样块所在位置填充复制得到的第二颜色分量的信息,也即是,对于划分得到的任一第一采样块,将该第一采样块所属的原第二采样块中的第二颜色分量的信息填充至该第一采样块中。上述填充过程即在每个第二采样块相邻的r 2-1个位置进行插值,最终上采样得到的第二颜色分量的信息实际上为[(M×N-m×n)×r 2]个第二颜色分量的信息。
例如,假设目标区域为图10中的区域H,第一颜色分量为亮度分量Y,第二颜色分量为色度分量U,采样格式可以分为:YUV4:2:2,第一周边区域K 由位于目标区域左侧的2列像素和上方的2行像素组成,则如图10所示,第一周边区域K和目标区域H中,每个像素所具有的YUV颜色分量的采样率关系为2:1:1,则如图12所示,获取第一周边区域K中的色度分量U的信息,并进行2倍上采样,得到上采样后的第一周边区域K。以图12上方的第一周边区域K中的第一行第一列的第二采样块的色度分量U的上采样为例,将该色度分量U的信息复制,将每个第二采样块划分为4个第一采样块,将每个第一采样块所在位置填充复制得到色度分量U的信息,也即是基于复制得到色度分量U的信息分别对其周围的3个位置进行插值,即该色度分量U所在采样块处的右侧、下侧和右下侧相邻位置插值,其他位置的插值方式同理,最终得到图12下方的第一周边区域K。
如图13所示,最终将上采样得到的色度分量的信息,与目标区域中已重建的亮度分量的信息确定为第一输入数据。
S3、当目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例小于1:1,基于采样率比例,对第一周边区域中已重建的第二颜色分量的信息进行下采样(英文:subsampled),使得下采样后第一周边区域中第二颜色分量的分布密度等于目标区域中第一颜色分量的分布密度,并将下采样得到的第二颜色分量的信息,与目标区域中已重建的第一颜色分量的信息确定为第一输入数据。
例如,颜色编码格式为YUV4:2:2,第一颜色分量为色度分量U,第二颜色分量为亮度分量Y,则色度分量U和亮度分量Y的采样率关系为:采样率比例为1:2,小于1:1,则需要基于采样率比例1:2,对第一周边区域中已重建的亮度分量Y的信息进行下采样,并将下采样得到的亮度分量Y的信息,与目标区域中已重建的色度分量U的信息确定为第一输入数据。
示例的,本公开实施例中,可以使用下采样滤波器对第一周边区域中已重建的第二颜色分量的信息进行下采样,或者,基于原有图像的第二颜色分量的信息进行下采样得到采样后的第二颜色分量的信息。
以上述第二种下采样方式为例,由于第一颜色分量和第二颜色分量的采样率比例小于1:1,即目标区域中第一颜色分量对应的第一采样块的尺寸要大于第二颜色分量的第二采样块的尺寸,且需要保持目标区域中第一颜色分量的分布密度不变,则下采样后的图像的基本单位应该为第一采样块。
在本公开实施例中,当目标区域中第一颜色分量和第二颜色分量的采样率 关系为:采样率比例等于1:s,该s为大于1的整数,则对第一周边区域中多个第二采样块的第二颜色分量的信息进行s倍的下采样,得到多个第一采样块的第二颜色分量的信息,也即是下采样后第一周边区域中第二颜色分量的分布密度和第一颜色分量的分布密度相等,并将下采样得到的第二颜色分量的信息,与目标区域中已重建的第一颜色分量的信息确定为第一输入数据。
其中,假设第一周边区域包括M×N-m×n个具有第二颜色分量的信息的第二采样块,则对该M×N-m×n个具有第二颜色分量的信息的第二采样块进行s倍的下采样指的是,将第一周边区域中每s×s个第二采样块的第二颜色分量的信息的平均值确定为一个第一采样块的第二颜色分量的信息,将所有第一采样块的第二颜色分量的信息作为下采样得到的第二颜色分量的信息,最终下采样得到的第二颜色分量的信息实际上为[(M×N-m×n)/s 2]个第一采样块的第二颜色分量的信息。
例如,假设第一周边区域包括图4中的区域W,第一颜色分量为色度分量U,第二颜色分量为亮度分量Y,采样率比例为:1:2,区域W中亮度分量Y的信息如表1所示,区域W包括4×4个具有亮度分量Y的信息的第二采样块。则基于表2所示的亮度分量Y的信息进行下采样得到的下采样后的亮度分量Y的信息可以如表4所示,下采样后的亮度分量Y的信息包括2×2个具有亮度分量Y的信息的下采样点。请参见表4,表4所对应的下采样后的亮度分量Y包括4个第一采样块,对应的亮度分量Y的值,简称亮度值分别为128.25、122.5、119.25和100.5,其中,第一个第一采样块的亮度值128.25为区域W中第1行第1列、第1行第2列、第2行第1列和第2行第2列的亮度值的平均值;第二个第一采样块的亮度值97.5为区域W中第1行第3列、第1行第4列、第2行第3列和第2行第4列的亮度值的平均值;第三个第一采样块的亮度值119.25为区域W中第3行第1列、第3行第2列、第4行第1列和第4行第2列的亮度值的平均值;第四个第一采样块的亮度值100.5为区域W中第4行第3列、第4行第4列、第4行第3列和第4行第4列的亮度值的平均值。
表4
上述例子只是以第一周边区域中的部分区域W的下采样为例进行说明,实际应用中,以对图14上方的第一周边区域K中的亮度分量Y的信息进行下 采样为例,采样后得到的亮度分量Y的信息如图14下方的第一周边区域K中的亮度分量Y的信息,如图15所示,最终将下采样得到的亮度分量的信息,与目标区域中已重建的色度分量的信息确定为第一输入数据。
上述步骤201和步骤202是以待处理图像帧中第一颜色分量和第二颜色分量的采样率关系为依据，来进行第一输入数据的分布密度的一致化处理的。在另一种可选的实现方式中，也可以直接获取目标区域的第一周边区域中已重建的第二颜色分量的信息，以及目标区域中已重建的第一颜色分量的信息（参考上述步骤2021和步骤2022），然后确定已重建的第一颜色分量的信息在目标区域的第一分布密度，以及已重建的第二颜色分量的信息在第一周边区域的第二分布密度，再基于第一分布密度与第二分布密度的比值（可选的，该比值与上述的第一颜色分量和第二颜色分量的采样率比例相等），来进行如步骤2023所提供的一致化处理过程。
在再一种可选的实现方式中,也可以直接获取目标区域的第一周边区域中已重建的第二颜色分量的信息,获取目标区域中已重建的第一颜色分量的信息(参考上述步骤2021和步骤2022),并将两者作为第一输入数据,则无需执行上述步骤201和步骤2023。
步骤203、通过第一通道向卷积神经网络输入第一输入数据。
第一输入数据包含待处理图像帧中目标区域的第一颜色分量的信息。在一种可选的情况中,由步骤202可知,第一输入数据可以包括目标区域的第一周边区域中已重建的第二颜色分量的信息(该信息为上采样、下采样或者不采样得到的信息)和目标区域中已重建的第一颜色分量的信息。在另一种可选的情况中,第一输入数据也可以仅包含目标区域中已重建的第一颜色分量的信息,则无需执行上述步骤201和步骤202。
步骤204、通过输入层对第一输入数据进行多维卷积滤波和非线性映射,得到输入层的输出数据。
可选的，输入层可以包含至少一个通道，该至少一个通道包括用于输入第一输入数据的第一通道，通过输入层可以对每个通道输入的数据分别进行多维卷积滤波和非线性映射，并将经过不同通道的多维卷积滤波和非线性映射后的输入数据进行合并，得到输入层的输出数据。当输入层有一个通道存在输入数据时，也即是当输入层仅包括第一通道，或者输入层包括多个通道，但是只通过第一通道进行了数据输入时，输入层可以无需执行上述合并动作，直接将对第一输入数据进行多维卷积滤波和非线性映射所得到的数据作为输入层的输出。
本公开实施例提供的卷积神经网络可以包括一个输入层、一个隐含层和一个输出层。该输入层可以包括与第一通道对应的依次连接的至少一个卷积层,本公开实施例中不对输入层中包含的卷积层层数、卷积层连接方式和卷积层属性等作限定。每个卷积层包括一个特征提取层和一个特征映射层。
假设输入层包含M个卷积层,M≥1,每个特征提取层包括一个卷积滤波器组,每个卷积滤波器组包括至少一个卷积滤波器(也称卷积核),特征映射层的非线性映射函数为r(),则第j个卷积层的输出数据满足:
F_j(J) = r(W_j * F_{j-1}(J) + B_j)，其中1≤j≤M，且F_0(J) = J；
其中，F_j(J)表示输入层中第j个卷积层的输出数据，J为第一输入数据，*为卷积操作，W_j为该输入层第j个卷积层中卷积滤波器组的权重系数，B_j为第j个卷积层中卷积滤波器组的偏移系数。
假设第j个卷积层的卷积滤波器组包括n_j个卷积滤波器，该n_j个卷积滤波器作用于第j个卷积层的输入数据后，输出n_j个图像分块。可选的，第j个卷积层的每个卷积滤波器的大小为c_j×f_j×f_j，其中，c_j为第j个卷积层的输入通道数，f_j×f_j为第j个卷积层的每个卷积滤波器在空间上的大小（或者称为尺寸）。
示例的,如图16所示,图16为本公开实施例提供的一种卷积神经网络的结构示意图,输入层包括一个卷积层,该卷积层包括特征提取层X1和特征映射层X2。其中,特征映射层X2设置有激活函数,该激活函数为非线性映射函数。
假设特征提取层X1包括n_1个卷积滤波器，n_1为正整数，则通过特征提取层X1的n_1个卷积滤波器对第一输入数据进行多维卷积滤波，得到n_1个图像数据；通过特征映射层X2对该n_1个图像数据进行非线性映射，得到n_1个映射图像数据，则n_1个映射图像数据即为输入层的输出数据。
相应的，输入层的输出数据F_1(J)满足：
F_1(J) = r(W_1*J + B_1)；
其中，J为第一输入数据，*表示卷积，W_1表示n_1个卷积滤波器的权重系数，B_1为该n_1个卷积滤波器的偏移系数，r()为特征映射层的激活函数，该激活函数可以为sigmoid函数或ReLU函数等非线性映射函数。
进一步的，假设n_1=64，每个卷积滤波器的参数为：c_1=2，f_1=5，使用ReLU函数作为上述非线性映射函数r()，r()的函数表达式为r(x)=max(0,x)，则该输入层的输出数据满足：
F_1(J) = max(0, W_1*J + B_1)；
其中，J为第一输入数据，*表示卷积，W_1表示64个卷积滤波器的权重系数，B_1为该64个卷积滤波器的偏移系数，每个卷积滤波器的大小为2×5×5。
步骤205、通过隐含层对输入层的输出数据进行多维卷积滤波和非线性映射,得到高维图像数据(也称高维图像分块)。
可选的,隐含层包括依次连接的至少一个卷积层,本公开实施例中不对隐含层中包含的卷积层层数、卷积层连接方式和卷积层属性等作限定。每个卷积层包括一个特征提取层和一个特征映射层,该隐含层中每个卷积层的结构可以参考上述步骤204中输入层中卷积层的结构,隐含层中各个卷积层的功能也可以参考上述输入层中卷积层的功能。
则在每个卷积层中:可以通过特征提取层对输入的数据进行多维卷积滤波,并通过特征映射层对输入的数据进行非线性映射;然后将经过该至少一个卷积层处理的数据作为高维图像数据,该高维图像数据即为隐含层的输出数据。
假设隐含层包含N个卷积层,N≥1,每个特征提取层包括一个卷积滤波器组,每个卷积滤波器组包括至少一个卷积滤波器,特征映射层的非线性映射函数为g(),则第i个卷积层的输出数据满足:
H_i(I) = g(O_i * H_{i-1}(I) + A_i)，其中1≤i≤N，且H_0(I) = I；
其中，H_i(I)表示隐含层中第i个卷积层的输出数据，I为输入层的输出数据，即上述步骤204中的F_M(J)，*为卷积操作，O_i为该隐含层第i个卷积层中卷积滤波器组的权重系数，A_i为第i个卷积层中卷积滤波器组的偏移系数。
假设第i个卷积层的卷积滤波器组包括m_i个卷积滤波器，该m_i个卷积滤波器作用于第i个卷积层的输入数据后，输出m_i个图像分块。可选的，第i个卷积层的每个卷积滤波器的大小为d_i×k_i×k_i，其中，d_i为第i个卷积层的输入通道数，k_i×k_i为第i个卷积层的每个卷积滤波器在空间上的大小。
例如，假设该隐含层包括1个卷积层，即上述N=1，该卷积层中的卷积滤波器组包括m_2=32个卷积滤波器，每个卷积滤波器的参数为：d_2=64，k_2=1，使用ReLU函数作为上述非线性映射函数g()，g()的函数表达式为g(x)=max(0,x)，则该隐含层的输出数据满足高维映射公式（也称卷积处理表达式），该高维映射公式为：
H_1(I) = max(0, O_1*I + A_1)；
其中，H_1(I)为隐含层的输出数据，I为输入层的输出数据，即上述步骤204中的F_M(J)，*表示卷积，O_1为该卷积层中32个卷积滤波器的权重系数，A_1为32个卷积滤波器的偏移系数，每个卷积滤波器的大小为64×1×1。
步骤206、通过输出层对高维图像数据进行聚合,得到第一输出数据。
在本公开实施例中,当该颜色分量的帧内预测方法应用于视频编解码领域时,由于输出层输出的数据为第二颜色分量的重建数据,因此,输出层也称重建层,输出层可以对隐含层输出的高维图像数据进行聚合,输出最终的第一输出数据。本公开实施例不对输出层的结构作限定。
示例的,输出层的结构可以为直接学习(英文:Direct Learning)结构,当输出层的结构为Direct Learning结构时,输出层可以对隐含层输出的高维图像数据进行卷积操作后直接输出重建图像的数据,该重建图像的数据即为第一输出数据。输出层的输出数据满足第一重建公式,该第一重建公式为:
P(V) = U_v*V + C_v；
其中，P(V)为输出层的输出数据，也即是第一输出数据，V为隐含层的输出数据，也即是步骤205中的H_N(I)，*为卷积操作，U_v为输出层的权重系数，C_v为输出层的偏移系数。
进一步的,输出层包括1个卷积滤波器,即有1个卷积滤波器作用于隐含层的输出数据,输出1个图像数据,从而实现高维图像数据的聚合;每个卷积滤波器的大小为e×t×t,其中,e为输入通道数,t×t为输出层的每个卷积滤波器在空间上的大小。
示例的,假设该输出层如图16所示,输出层的结构为Direct Learning结构,输出层包括1个卷积层,该卷积层包括1个卷积滤波器,该输出层的卷积滤波器的参数为:e=32,t=3,则输出层的输出数据满足:
P(V) = U_v*V + C_v；
其中，P(V)为输出层的输出数据，也即是第一输出数据，V为隐含层的输出数据，也即是步骤205中的H_N(I)，*为卷积操作，U_v为1个卷积滤波器的权重系数，C_v为1个卷积滤波器的偏移系数，该卷积滤波器的大小为32×3×3。
示例的,输出层的结构可以为残差学习(英文:Residual learning)结构,当输出层的结构为Residual learning结构时,输出层可以对隐含层输出的高维图像数据进行卷积操作后,将处理后的数据与输入层的输出数据进行聚合以输出重建图像的数据,该重建图像的数据即为第一输出数据。输出层的输出数据满足第二重建公式,该第二重建公式为:
P(V) = U_v*V + C_v + I；
其中，P(V)为输出层的输出数据，也即是第一输出数据，V为隐含层的输出数据，也即是步骤205中的H_N(I)，I为输入层的输出数据，即上述步骤204中的F_M(J)，*为卷积操作，U_v为输出层的权重系数，C_v为输出层的偏移系数。
步骤207、获取卷积神经网络输出的第一输出数据,第一输出数据包含卷积神经网络对目标区域的第二颜色分量的信息的预测值。
在视频编解码领域,获取的第一输出数据即为重建后的第二颜色分量的信息,可以基于该第一输出数据进行后续操作,其过程可以参考上述图1和图2的过程,本公开实施例对此不再赘述。
需要说明的是,上述图16是以卷积神经网络包括一个输入层,一个隐含层和一个输出层,且目标区域为3×3像素点为例进行说明,卷积神经网络还可以有其他结构,本公开实施例对此不做限定。
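示例的，按照图16的参数（输入层：64个大小为2×5×5的卷积滤波器加ReLU；隐含层：32个大小为64×1×1的卷积滤波器加ReLU；输出层：1个大小为32×3×3的卷积滤波器，Direct Learning结构），步骤204至206可以用如下PyTorch代码示意。该代码仅为基于上述参数假设的草图，padding方式等细节为示例性选择，类名CrossComponentCNN亦为假设名称：

```python
import torch
import torch.nn as nn

class CrossComponentCNN(nn.Module):
    """示意性的跨分量帧内预测网络：输入层 -> 隐含层 -> 输出层（Direct Learning）。"""
    def __init__(self, in_channels: int = 2):
        super().__init__()
        # 输入层：n_1=64个卷积滤波器，每个大小为c_1×f_1×f_1 = 2×5×5
        self.input_layer = nn.Conv2d(in_channels, 64, kernel_size=5, padding=2)
        # 隐含层：m_2=32个卷积滤波器，每个大小为d_2×k_2×k_2 = 64×1×1
        self.hidden_layer = nn.Conv2d(64, 32, kernel_size=1)
        # 输出层（重建层）：1个卷积滤波器，大小为e×t×t = 32×3×3
        self.output_layer = nn.Conv2d(32, 1, kernel_size=3, padding=1)

    def forward(self, j: torch.Tensor) -> torch.Tensor:
        f1 = torch.relu(self.input_layer(j))    # F_1(J) = max(0, W_1*J + B_1)
        h1 = torch.relu(self.hidden_layer(f1))  # H_1(I) = max(0, O_1*I + A_1)
        return self.output_layer(h1)            # P(V) = U_v*V + C_v

# 用法示例：2个输入通道（已重建的第一颜色分量与一致化后的第二颜色分量），8×8区域
x = torch.randn(1, 2, 8, 8)
print(CrossComponentCNN()(x).shape)  # torch.Size([1, 1, 8, 8])
```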
对于图像块大小确定的跨分量帧内预测（比如在采用视频编码标准H.265进行编码时，最小的图像块（或称处理块）大小为4×4个像素点，本公开实施例提供的跨分量帧内预测方法可按照每4×4个像素点进行），其对应的卷积神经网络的参数集需要通过训练（也称预训练）获得。在确定一初始卷积神经网络的网络架构（比如卷积层数、卷积层的连接方式、每一层卷积层的卷积滤波器数量及其卷积核大小等参数）后，每个卷积层的权重系数（即各个卷积滤波器的权重系数）和每个卷积层的偏移系数（即各个卷积滤波器的偏移系数）需要通过训练获得，最终对初始卷积神经网络训练得到的网络即为上述卷积神经网络。因此，为了保证卷积神经网络的预测准确性，在步骤201之前，需要对初始卷积神经网络进行训练以得到上述卷积神经网络，该初始卷积神经网络的网络架构与上述卷积神经网络相同，该卷积神经网络的训练过程包括：
步骤A1、通过第一通道向初始卷积神经网络输入第二输入数据。
初始卷积神经网络在设计时需充分考虑网络感受野、复杂度以及解决问题的能力等。本公开实施例并不对该初始卷积神经网络的网络架构进行限定。
其中,第二输入数据包括第一指定图像帧中训练区域的第一颜色分量的信息,该第一指定图像帧可以是预先设置的测试图像帧,也可以是随机选取的图像帧,该第一指定图像帧与上述待处理图像帧通常是不同的。第一指定图像帧中训练区域与目标区域的尺寸相同,第二输入数据的获取方式与第一输入数据的获取方式相同,具体过程请参考上述步骤201至202。
步骤B1、将第一指定图像帧中训练区域对应的原始数据作为训练标签,对初始卷积神经网络进行训练以得到卷积神经网络。
该原始数据由第一指定图像帧中训练区域中已知的第二颜色分量的信息组成。该训练区域中已知的第二颜色分量的信息是该训练区域中未进行处理的第二颜色分量的信息,训练区域中已知的第二颜色分量的信息是预测的理想结果,也即是若对训练区域的第二颜色分量的预测完全准确,得到的数据即为该原始数据。
目前可以通过指定训练平台对该初始卷积神经网络进行训练,该训练过程可以包括配置学习率等参数的过程。示例的,上述训练过程可以基于监督学习算法(英文:supervised learning)的训练方式来实现,监督学习算法是通过已有的训练集(也称训练样本,即已知数据以及其对应的训练标签,该训练标签可以为明确的标识或者输出结果)来训练,以训练得到相应参数。示例的,训练过程还可以通过人工标定,或者无监督学习算法,或者半监督学习算法等方式实现,本公开实施例对此不作限定。
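示例的，上述基于监督学习的训练过程可以用如下PyTorch代码示意：以第二输入数据为网络输入，以训练区域已知的第二颜色分量（原始数据）为训练标签，用均方误差做回归。其中学习率、迭代次数等参数均为示例性假设，并非本公开限定的训练配置：

```python
import torch

def train(model, second_input, label, lr: float = 1e-4, steps: int = 1000):
    """以原始的第二颜色分量为训练标签，对初始卷积神经网络做监督训练（示意）。"""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(steps):
        optimizer.zero_grad()
        pred = model(second_input)   # 网络对训练区域第二颜色分量的预测值
        loss = loss_fn(pred, label)  # 与训练标签（原始数据）比较
        loss.backward()
        optimizer.step()
    return model
```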
综上所述,本公开实施例将包含待处理图像帧中目标区域的第一颜色分量的信息的第一输入数据输入至卷积神经网络,由卷积神经网络进行处理得到包含第二颜色分量的信息的第一输出数据,从而实现了卷积神经网络对颜色分量的帧内预测,由于卷积神经网络所具有的深度学习等特性,使得最终预测得到的第二颜色分量可靠性较高。
第二种可实现方式，通过第一通道向卷积神经网络输入第一输入数据，并且通过至少一个第二通道分别向卷积神经网络输入至少一个第一边信息数据，以进行颜色分量的跨分量帧内预测，该卷积神经网络用于基于第一输入数据和至少一个第一边信息数据预测得到第一输出数据。边信息（英文：side information）是指待处理信息外的已有的先验知识，边信息数据为能够作为边信息的数据。比如在进行颜色分量的帧内预测时，待处理信息是第一输入数据，则第一边信息数据与第一输入数据不同，该第一边信息数据可以包含除第一输入数据包含的颜色分量的信息之外的信息，能够为卷积神经网络提供预测参考。例如，帧内预测模式（例如帧内预测的方向模式）便可以作为一种边信息，则上述帧内预测模式的数据即为边信息数据。本公开实施例中的第一边信息数据为输入至卷积神经网络中的边信息数据。
在第二种可实现方式中,第一输入数据和第一输出数据所包含的内容可以参考上述第一种可实现方式,本公开实施例对此不再赘述。
请参考图17,假设第一输入数据包括待处理图像帧中目标区域的所有第一颜色分量的信息,该第一输出数据包括卷积神经网络对目标区域的所有第二颜色分量的信息,例如,该待处理图像帧为视频图像帧,该颜色分量的帧内预测方法,可以包括:
步骤301、确定待处理图像帧中第一颜色分量和第二颜色分量的采样率关系。
步骤301可以参考上述步骤201,本公开实施例对此不再赘述。
步骤302、基于采样率关系,根据目标区域中第一颜色分量的信息,确定第一输入数据。
步骤302可以参考上述步骤202,本公开实施例对此不再赘述。
步骤303、确定至少一个第一边信息数据,每个第一边信息数据包含除第一输入数据包含的颜色分量的信息之外的信息。
示例的,该至少一个第一边信息数据可以包括目标区域中已重建的第一颜色分量的相关信息T1和/或目标区域的第二周边区域中已重建的第二颜色分量的信息T2的平均值或加权平均值(即T1的平均值或加权平均值,T2的平均值或加权平均值,或者,T1和T2的平均值,或T1和T2的加权平均值),只要起到提供额外信息(与第一输入数据包含的颜色分量的信息不同),提高预测准确度的作用即可。目标区域的第二周边区域为位于目标区域左侧和/或上方的带状区域,该带状区域与目标区域邻接,第二周边区域的定义可以参考上述步骤201的第一周边区域,本公开实施例对此不再赘述。
需要说明的是，每个第一边信息数据的尺寸和数值个数应该与第一输入数据的尺寸和数值个数对应一致，例如，第一输入数据包括x行y列个第一采样块的颜色分量的信息，且包括x×y个颜色分量的信息（也可以称为分量值或数值），相应的，每个第一边信息数据也包含x×y个信息，不过，第一边信息数据包含的信息不是颜色分量的信息，而是平均值或加权平均值等。
并且,参考上述步骤301和302可知,第一输入数据可以仅包含目标区域中第一颜色分量的信息,也可以同时包含目标区域中第一颜色分量的信息,以及第一周边区域中第二颜色分量的信息,因此,第一输入数据包含一种或两种颜色分量的信息,本公开实施例中,第一边信息数据无需区分第一输入数据所涉及的颜色分量,仅根据本公开实施例中所使用的卷积神经网络的需要,参考第一输入数据的尺寸和数值个数来生成。
可选的,第一边信息数据可以只有一个。上述确定至少一个第一边信息数据的方式有多种,本公开实施例以以下两种方式为例进行说明:
第一种方式,基于目标区域中已重建的第一颜色分量的相关信息,确定至少一个第一边信息数据。
可选的,待处理图像帧的颜色编码格式为YUV格式,第一输入数据包括x行y列个第一采样块的颜色分量的信息,该x和该y均为大于或等于1的整数,假设第一边信息数据可以只有一个。
则可以获取每个第一采样块中已重建的第一颜色分量的帧内预测模式的标识值;将所有帧内预测模式的标识值组成一个第一边信息数据。最终得到的第一边信息数据包括x行y列个标识值,该标识值为数值。例如帧内预测模式可以为方向性模式。
例如，第一颜色分量为亮度分量，在H.265中有35种帧内预测模式，第一采样块为1个像素点，第一输入数据包括8×8个像素点，8×8块内每4×4个子块的亮度的帧内预测模式的标识值分别为3、17、22和33，则第一边信息数据可以如表5所示。
表5
3 3 3 3 17 17 17 17
3 3 3 3 17 17 17 17
3 3 3 3 17 17 17 17
3 3 3 3 17 17 17 17
22 22 22 22 33 33 33 33
22 22 22 22 33 33 33 33
22 22 22 22 33 33 33 33
22 22 22 22 33 33 33 33
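示例的，第一种方式中将各子块的帧内预测模式标识值扩展为与第一输入数据同尺寸的第一边信息数据（即表5的形式），可以用如下NumPy代码示意。函数名与子块尺寸参数均为示例性假设：

```python
import numpy as np

def mode_map_from_subblocks(mode_ids: np.ndarray, sub: int = 4) -> np.ndarray:
    """将每个sub×sub子块的帧内预测模式标识值
    扩展为x行y列的第一边信息数据（与第一输入数据尺寸一致）。"""
    return np.repeat(np.repeat(mode_ids, sub, axis=0), sub, axis=1)

# 用法示例：8×8区域内4个4×4子块的模式标识值分别为3、17、22、33
modes = np.array([[3, 17],
                  [22, 33]])
side_info = mode_map_from_subblocks(modes)  # 得到表5所示的8×8矩阵
```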
第二种方式,基于目标区域的第二周边区域中已重建的第二颜色分量的信息,确定至少一个第一边信息数据。
可选的,假设第一边信息数据只有一个,第一输入数据包括x行y列个第一采样块的颜色分量的信息,该x和该y均为大于或等于1的整数。
则可以获取目标区域的第二周边区域中已重建的第二颜色分量的信息;确定目标区域的第二周边区域中已重建的第二颜色分量的信息的平均值(实际应用中也可以为加权平均值);并生成一个第一边信息数据,其中,第一边信息数据包括x行y列个平均值。
例如，请参考图10，假设第二周边区域可以与第一周边区域K的尺寸相同，均由位于目标区域左侧的2列像素点和上方的2行像素点组成，第二颜色分量为色度分量U，假设该第二周边区域中已重建的第二颜色分量的信息的平均值为117。若第一输入数据包括3行3列个第一采样块的颜色分量的数值，则如表6所示，第一边信息数据包括3行3列个色度分量U的数值，每个数值均为117。
表6
117 117 117
117 117 117
117 117 117
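示例的，第二种方式可以用如下NumPy代码示意：求第二周边区域中已重建的第二颜色分量的平均值，并将其铺满x行y列。函数名为示例性假设：

```python
import numpy as np

def mean_side_info(peripheral_info: np.ndarray, x: int, y: int) -> np.ndarray:
    """用第二周边区域中第二颜色分量的平均值填充x行y列的第一边信息数据。"""
    return np.full((x, y), peripheral_info.mean())

# 用法示例：第二周边区域的色度分量U平均值为117时，得到表6所示的3×3矩阵
print(mean_side_info(np.array([116.0, 118.0]), 3, 3))
```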
步骤304、通过第一通道向卷积神经网络输入第一输入数据,第一输入数据包含待处理图像帧中目标区域的第一颜色分量的信息。
步骤304可以参考上述步骤203,本公开实施例对此不再赘述。
步骤305、通过至少一个第二通道分别向卷积神经网络输入至少一个第一边信息数据,该至少一个第二通道与至少一个第一边信息数据一一对应。
步骤306、通过输入层对每个通道输入的数据分别进行多维卷积滤波和非线性映射,并将不同通道的多维卷积滤波和非线性映射后的输入数据进行合并(如相加),得到输入层的输出数据。
通常输入层可以包含至少一个通道，在本公开实施例中，由于需要向输入层分别输入第一输入数据和至少一个第一边信息数据，因此，该输入层包括至少两个通道，即一个第一通道和至少一个第二通道。上述步骤304和305可以同时执行，也可以依次执行，本公开实施例对此不做限定。颜色分量的帧内预测装置可以通过输入层对每个通道输入的数据分别进行多维卷积滤波和非线性映射，并将不同通道的多维卷积滤波和非线性映射后的输入数据进行合并（如相加），得到输入层的输出数据。
示例的,输入层包括分别与每个通道对应的依次连接的至少一个卷积层,以及合并层,每个卷积层包括一个特征提取层和一个特征映射层,则上述步骤306包括:
步骤A2、在每个卷积层中:通过特征提取层对输入的数据进行多维卷积滤波,并通过特征映射层对输入的数据进行非线性映射。
步骤306中所提供的输入层中卷积层的结构可以参考上述步骤204中所提供的卷积层的结构,本公开实施例对此不再赘述。
步骤B2、通过合并层将经过不同通道对应的至少一个卷积层处理后的数据进行合并,得到输入层的输出数据。
假设输入层包含M个卷积层,M≥1,每个特征提取层包括一个卷积滤波器组,每个卷积滤波器组包括至少一个卷积滤波器(也称卷积核),特征映射层的非线性映射函数为r(),则输入层的输出数据满足:
F_M(J) = r(W_M * F_{M-1}(J) + B_M + ∑_{i=1}^{s1}(W_si * S_i + B_si))　（1）
其中，F_M(J)表示输入层中第M个卷积层的输出数据，也即是输入层的输出数据（F_0(J) = J），J为第一输入数据，*为卷积操作，W_M为该输入层第M个卷积层中卷积滤波器组的权重系数，B_M为第M个卷积层中卷积滤波器组的偏移系数，S_i为第i个第一边信息数据，W_si为第i个第一边信息数据的权重系数，B_si为第i个第一边信息数据的偏移系数，s1为第一边信息数据的个数。
示例的,如图18所示,图18为本公开实施例提供的另一种卷积神经网络的结构示意图,输入层包括第一输入通道和第二输入通道共两个通道,每个通道连接一个卷积层,每个卷积层包括特征提取层和特征映射层。其中,特征映射层设置有激活函数,该激活函数为非线性映射函数。输入层的输出数据满足:
F_1(J) = r(W_1*J + B_1 + W_s1*S_1 + B_s1)。
其中,各个参数的含义参考上述公式(1),本公开实施例对此不再赘述。
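示例的，图18所示的“逐通道卷积、在非线性映射前相加合并”的输入层，可以用如下PyTorch代码示意。该代码仅为依照图18的两通道假设写出的草图，卷积核大小等参数为示例性选择：

```python
import torch
import torch.nn as nn

class MergingInputLayer(nn.Module):
    """示意性输入层：对第一输入数据与第一边信息数据分别卷积，再相加并做非线性映射。"""
    def __init__(self):
        super().__init__()
        self.conv_main = nn.Conv2d(2, 64, kernel_size=5, padding=2)  # 作用于第一输入数据J
        self.conv_side = nn.Conv2d(1, 64, kernel_size=5, padding=2)  # 作用于第一边信息数据S_1

    def forward(self, j: torch.Tensor, s1: torch.Tensor) -> torch.Tensor:
        # F_1(J) = r(W_1*J + B_1 + W_s1*S_1 + B_s1)，r取ReLU
        return torch.relu(self.conv_main(j) + self.conv_side(s1))
```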
需要说明的是，在上述步骤306中，在对经过不同通道的输入数据进行卷积之前，可以检测至少一个第一边信息数据中是否存在取值范围与第一输入数据的取值范围不同的边信息数据。当至少一个第一边信息数据中任一边信息数据的取值范围与第一输入数据的取值范围不同时，可以对该任一边信息数据进行标准化处理，使得处理后的该任一边信息数据的取值范围与第一输入数据的取值范围相同。
该标准化处理过程可以是线性映射过程，或者归一化过程。例如，该任一边信息数据的取值范围为[PredMode_MIN, PredMode_MAX]，第一输入数据的取值范围为[Pixel_MIN, Pixel_MAX]，若该任一边信息数据中的第一信息为x，则对于该第一信息，相应的归一化公式为：
norm(x) = (x - PredMode_MIN) × (Pixel_MAX - Pixel_MIN) / (PredMode_MAX - PredMode_MIN) + Pixel_MIN；
其中,第一信息为该任一边信息数据包含的x行y列个信息中的任一信息,norm(x)为归一化后的第一信息。
例如,至少一个第一边信息数据中的某一第一边信息数据包含帧内预测模式的标识值,其取值范围为1-35,而第一输入数据的取值范围为0-255,则将该某一第一边信息数据中的所有信息分别代入上述归一化公式,以对该某一第一边信息数据进行标准化处理,使得处理后的该某一第一边信息数据的取值范围为0-255。
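示例的，上述归一化公式可以直接写成一个小函数（示意草图，函数名为示例性假设）：

```python
def normalize(x: float, src_min: float, src_max: float,
              dst_min: float, dst_max: float) -> float:
    """把取值范围[src_min, src_max]内的边信息线性映射到[dst_min, dst_max]。"""
    return (x - src_min) * (dst_max - dst_min) / (src_max - src_min) + dst_min

# 用法示例：帧内预测模式标识值1~35映射到0~255
print(normalize(35, 1, 35, 0, 255))  # 255.0
print(normalize(1, 1, 35, 0, 255))   # 0.0
```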
值得说明的是，上述标准化处理过程可以在第一边信息数据输入卷积神经网络前执行，也可以在卷积神经网络中执行，本公开实施例对此不作限定。
步骤307、通过隐含层对输入层的输出数据进行多维卷积滤波和非线性映射,得到高维图像数据。
步骤307可以参考上述步骤205,本公开实施例对此不再赘述。
步骤308、通过输出层对高维图像数据进行聚合,得到第一输出数据。
步骤308可以参考上述步骤206，本公开实施例对此不再赘述。
步骤309、获取卷积神经网络输出的第一输出数据,第一输出数据包含卷积神经网络对目标区域的第二颜色分量的信息的预测值。
步骤309可以参考上述步骤207，本公开实施例对此不再赘述。
请参考第一种可实现方式,为了保证卷积神经网络的预测准确性,在步骤301之前,需要对初始卷积神经网络进行训练以得到上述卷积神经网络,该卷积神经网络的训练过程包括:
步骤A3、通过第一通道向初始卷积神经网络输入第三输入数据。
初始卷积神经网络在设计时需充分考虑网络感受野、复杂度以及解决问题的能力等。本公开实施例并不对该初始卷积神经网络的网络架构进行限定。
其中,第三输入数据包括第二指定图像帧中训练区域的第一颜色分量的信息,该第二指定图像帧可以是预先设置的测试图像帧,也可以是随机选取的图像帧,该第二指定图像帧与上述待处理图像帧通常不同。第二指定图像帧中训练区域与目标区域的尺寸相同,第三输入数据的获取方式与第一输入数据的获取方式相同。具体过程请参考上述步骤201至202。
步骤B3、通过至少一个第二通道分别向初始卷积神经网络输入至少一个第二边信息数据。
至少一个第二通道与至少一个第二边信息数据一一对应,至少一个第二边信息数据的获取方式与至少一个第一边信息数据的获取方式相同。具体过程请参考上述步骤303。
步骤C3、将第二指定图像帧中训练区域对应的原始数据作为训练标签,对初始卷积神经网络进行训练以得到卷积神经网络。
原始数据由第二指定图像帧中训练区域中已知的第二颜色分量的信息组成。该训练区域中已知的第二颜色分量的信息是该训练区域中未进行处理的第二颜色分量的信息,训练区域中已知的第二颜色分量的信息是预测的理想结果,也即是若对训练区域的第二颜色分量的预测完全准确,得到的数据即为该原始数据。
上述步骤A3和步骤C3可以参考第一种可实现方式中的步骤A1和步骤B1，本公开实施例对此不再赘述。
需要说明的是,上述图18是以卷积神经网络包括一个输入层,一个隐含层和一个输出层,且目标区域为3×3像素点为例进行说明,卷积神经网络还可以有其他结构,本公开实施例对此不做限定。
综上所述,本公开实施例将包含待处理图像帧中目标区域的第一颜色分量的信息的第一输入数据输入至卷积神经网络,由卷积神经网络进行处理得到包含第二颜色分量的信息的第一输出数据,从而实现了卷积神经网络对颜色分量的帧内预测,由于卷积神经网络所具有的深度学习等特性,使得最终预测得到的第二颜色分量可靠性较高,并且,通过向卷积神经网络中输入至少一个第一边信息数据,进一步增加了预测的准确性。
本公开实施例提供一种颜色分量的帧内预测装置40，如图19所示，所述装置40包括：
第一输入模块401,用于通过第一通道向卷积神经网络输入第一输入数据,所述第一输入数据包含待处理图像帧中目标区域的第一颜色分量的信息;
获取模块402,用于获取所述卷积神经网络输出的第一输出数据,所述第一输出数据包含所述卷积神经网络对所述目标区域的第二颜色分量的信息的预测值;
其中,所述第一颜色分量和所述第二颜色分量为所述目标区域具有的不同的颜色分量。
综上所述,本公开实施例第一输入模块将包含待处理图像帧中目标区域的第一颜色分量的信息的第一输入数据输入至卷积神经网络,由卷积神经网络进行处理得到包含第二颜色分量的信息的第一输出数据,从而实现了卷积神经网络对颜色分量的帧内预测,由于卷积神经网络所具有的深度学习等特性,使得最终预测得到的第二颜色分量可靠性较高。
可选的,所述第一输入数据包括所述第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息,所述目标区域的第一周边区域为位于所述目标区域左侧和/或上方的带状区域。
如图20所示,所述装置40还包括:
第一确定模块403,用于在所述通过第一通道向卷积神经网络输入第一输入数据之前,确定所述待处理图像帧中第一颜色分量和第二颜色分量的采样率关系;
第二确定模块404,用于基于所述采样率关系,确定所述第一输入数据,所述第一输入数据中,所述第一周边区域中第二颜色分量的分布密度等于所述目标区域中第一颜色分量的分布密度。
可选的,如图21所示,所述第二确定模块404,包括:
第一获取子模块4041,用于获取所述目标区域的第一周边区域中已重建的第二颜色分量的信息;
第二获取子模块4042,用于获取所述目标区域中已重建的第一颜色分量的信息;
第一确定子模块4043,用于基于所述采样率关系,根据所述第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息,确定所述第一输入数据。
可选的,所述第一确定子模块4043,用于:
当所述目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例为1:1,将所述目标区域的第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息确定为所述第一输入数据;
当所述目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例大于1:1,基于所述采样率比例,对所述第一周边区域中已重建的第二颜色分量的信息进行上采样,使得上采样后所述第一周边区域中第二颜色分量的分布密度等于所述目标区域中第一颜色分量的分布密度,并将上采样得到的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息确定为所述第一输入数据;
当所述目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例小于1:1,基于所述采样率比例,对所述第一周边区域中已重建的第二颜色分量的信息进行下采样,使得下采样后所述第一周边区域中第二颜色分量的分布密度等于所述目标区域中第一颜色分量的分布密度,并将下采样得到的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息确定为所述第一输入数据。
进一步的,如图22所示,所述装置40还包括:
第一训练模块405,用于对初始卷积神经网络进行训练以得到所述卷积神经网络,所述卷积神经网络的训练过程包括:
通过所述第一通道向所述初始卷积神经网络输入第二输入数据,所述第二输入数据包括第一指定图像帧中训练区域的第一颜色分量的信息,所述第一指定图像帧中训练区域与所述目标区域的尺寸相同,所述第二输入数据的获取方式与所述第一输入数据的获取方式相同;
将所述第一指定图像帧中训练区域对应的原始数据作为训练标签,对所述初始卷积神经网络进行训练以得到所述卷积神经网络,所述原始数据由所述第一指定图像帧中训练区域中已知的第二颜色分量的信息组成。
可选的,如图23所示,所述装置40还包括:
第三确定模块406,用于确定至少一个第一边信息数据,每个所述第一边信息数据包含除所述第一输入数据包含的颜色分量的信息之外的信息;
第二输入模块407，用于通过至少一个第二通道分别向所述卷积神经网络输入所述至少一个第一边信息数据，所述至少一个第二通道与所述至少一个第一边信息数据一一对应。
可选的,如图24所示,所述第三确定模块406,包括:
第二确定子模块4061,用于基于所述目标区域中已重建的第一颜色分量的相关信息,确定所述至少一个第一边信息数据;
和/或,第三确定子模块4062,用于基于所述目标区域的第二周边区域中已重建的第二颜色分量的信息,确定所述至少一个第一边信息数据,所述目标区域的第二周边区域为位于所述目标区域左侧和/或上方的带状区域。
可选的,所述待处理图像帧的颜色编码格式为YUV格式,所述第一输入数据包括x行y列个第一采样块的颜色分量的信息,所述x和所述y均为大于或等于1的整数;
所述第二确定子模块4061,用于:
获取每个所述第一采样块中已重建的第一颜色分量的帧内预测模式的标识值;
将所有所述帧内预测模式的标识值组成一个所述第一边信息数据。
可选的,所述第一输入数据包括x行y列个第一采样块的颜色分量的信息,所述x和所述y均为大于或等于1的整数;
所述第三确定子模块4062,用于:
获取所述目标区域的第二周边区域中已重建的第二颜色分量的信息;
确定所述目标区域的第二周边区域中已重建的第二颜色分量的信息的平均值;
生成一个所述第一边信息数据,其中,所述第一边信息数据包括x行y列个所述平均值。
可选的,如图25所示,所述装置40还包括:
标准化模块408,用于当所述至少一个第一边信息数据中任一边信息数据的取值范围与所述第一输入数据的取值范围不同时,对所述任一边信息数据进行标准化处理,使得处理后的所述任一边信息数据的取值范围与所述第一输入数据的取值范围相同。
可选的,如图26所示,所述装置40还包括:
第二训练模块409,用于对初始卷积神经网络进行训练以得到所述卷积神经网络,所述卷积神经网络的训练过程包括:
通过所述第一通道向所述初始卷积神经网络输入第三输入数据，所述第三输入数据包括第二指定图像帧中训练区域的第一颜色分量的信息，所述第二指定图像帧中训练区域与所述目标区域的尺寸相同，所述第三输入数据的获取方式与所述第一输入数据的获取方式相同；
通过所述至少一个第二通道分别向所述初始卷积神经网络输入至少一个第二边信息数据,所述至少一个第二通道与所述至少一个第二边信息数据一一对应,所述至少一个第二边信息数据的获取方式与所述至少一个第一边信息数据的获取方式相同;
将所述第二指定图像帧中训练区域对应的第二颜色分量的原始数据作为训练标签,对所述初始卷积神经网络进行训练以得到所述卷积神经网络,所述原始数据由所述第二指定图像帧中训练区域中已知的第二颜色分量的信息组成。
可选的,如图27所示,所述卷积神经网络包括输入层、隐含层和输出层;所述装置40还包括:
第一处理模块410,用于在所述获取所述卷积神经网络输出的第一输出数据之前,当所述输入层有一个通道存在输入数据时,通过所述输入层对第一输入数据进行多维卷积滤波和非线性映射,得到所述输入层的输出数据;
第二处理模块411,用于当所述输入层有至少两个通道存在输入数据时,通过所述输入层对每个通道输入的数据分别进行多维卷积滤波和非线性映射,并将不同通道的所述多维卷积滤波和非线性映射后的输入数据进行合并,得到所述输入层的输出数据;
高维处理模块412,用于通过所述隐含层对所述输入层的输出数据进行多维卷积滤波和非线性映射,得到高维图像数据;
聚合模块413,用于通过所述输出层对所述高维图像数据进行聚合,得到所述第一输出数据。
可选的,所述输入层包括分别与所述每个通道对应的依次连接的至少一个卷积层,以及合并层,每个所述卷积层包括一个特征提取层和一个特征映射层,
所述第二处理模块411,用于:
在每个卷积层中:通过所述特征提取层对输入的数据进行多维卷积滤波,并通过所述特征映射层对所述输入的数据进行非线性映射;
通过所述合并层将经过不同通道对应的所述至少一个卷积层处理后的数据进行合并,得到所述输入层的输出数据。
可选的,所述隐含层包括依次连接的至少一个卷积层,每个所述卷积层包括一个特征提取层和一个特征映射层,
所述高维处理模块412,用于:
在每个卷积层中:通过所述特征提取层对输入的数据进行多维卷积滤波,并通过所述特征映射层对所述输入的数据进行非线性映射;
将经过所述至少一个卷积层处理的数据作为所述高维图像数据。
可选的,所述待处理图像帧的颜色编码格式为YUV格式,所述第一颜色分量和所述第二颜色分量为亮度分量Y、色度分量U和色度分量V中的两种;
或者,所述待处理图像帧的颜色编码格式为RGB格式,所述第一颜色分量和所述第二颜色分量为红色分量、绿色分量和蓝色分量中的两种。
综上所述,本公开实施例第一输入模块将包含待处理图像帧中目标区域的第一颜色分量的信息的第一输入数据输入至卷积神经网络,由卷积神经网络进行处理得到包含第二颜色分量的信息的第一输出数据,从而实现了卷积神经网络对颜色分量的帧内预测,由于卷积神经网络所具有的深度学习等特性,使得最终预测得到的第二颜色分量可靠性较高。
本公开实施例还提供一种计算机设备,包括:
处理器;
用于存储所述处理器的可执行指令的存储器;
其中,所述处理器被配置为:
通过第一通道向卷积神经网络输入第一输入数据,所述第一输入数据包含待处理图像帧中目标区域的第一颜色分量的信息;
获取所述卷积神经网络输出的第一输出数据,所述第一输出数据包含所述卷积神经网络对所述目标区域的第二颜色分量的信息的预测值;
其中,所述第一颜色分量和所述第二颜色分量为所述目标区域具有的不同的颜色分量。
图28是根据一示例性实施例示出的一种计算机设备的结构示意图。所述计算机设备500包括中央处理单元(CPU)501、包括随机存取存储器(RAM)502和只读存储器(ROM)503的系统存储器504，以及连接系统存储器504和中央处理单元501的系统总线505。所述计算机设备500还包括帮助计算机内的各个器件之间传输信息的基本输入/输出系统(I/O系统)506，和用于存储操作系统513、应用程序514和其他程序模块515的大容量存储设备507。
所述基本输入/输出系统506包括有用于显示信息的显示器508和用于用户输入信息的诸如鼠标、键盘之类的输入设备509。其中所述显示器508和输入设备509都通过连接到系统总线505的输入输出控制器510连接到中央处理单元501。所述基本输入/输出系统506还可以包括输入输出控制器510以用于接收和处理来自键盘、鼠标、或电子触控笔等多个其他设备的输入。类似地,输入输出控制器510还提供输出到显示屏、打印机或其他类型的输出设备。
所述大容量存储设备507通过连接到系统总线505的大容量存储控制器(未示出)连接到中央处理单元501。所述大容量存储设备507及其相关联的计算机可读介质为计算机设备500提供非易失性存储。也就是说,所述大容量存储设备507可以包括诸如硬盘或者CD-ROM驱动器之类的计算机可读介质(未示出)。
不失一般性，所述计算机可读介质可以包括计算机存储介质和通信介质。计算机存储介质包括以用于存储诸如计算机可读指令、数据结构、程序模块或其他数据等信息的任何方法或技术实现的易失性和非易失性、可移动和不可移动介质。计算机存储介质包括RAM、ROM、EPROM、EEPROM、闪存或其他固态存储技术，CD-ROM、DVD或其他光学存储、磁带盒、磁带、磁盘存储或其他磁性存储设备。当然，本领域技术人员可知所述计算机存储介质不局限于上述几种。上述的系统存储器504和大容量存储设备507可以统称为存储器。
根据本发明的各种实施例,所述计算机设备500还可以通过诸如因特网等网络连接到网络上的远程计算机运行。也即计算机设备500可以通过连接在所述系统总线505上的网络接口单元511连接到网络512,或者说,也可以使用网络接口单元511来连接到其他类型的网络或远程计算机系统(未示出)。
所述存储器还包括一个或者一个以上的程序,所述一个或者一个以上程序存储于存储器中,中央处理器501通过执行该一个或一个以上程序来实现上述颜色分量的帧内预测方法。
在示例性实施例中，还提供了一种包括指令的非临时性计算机可读存储介质，例如包括指令的存储器，上述指令可由计算机设备的处理器执行以完成本发明各个实施例所示的颜色分量的帧内预测方法。例如，所述非临时性计算机可读存储介质可以是ROM、随机存取存储器(RAM)、CD-ROM、磁带、软盘和光数据存储设备等。
本公开实施例提供一种可读存储介质，该可读存储介质为非易失性可读存储介质，所述可读存储介质中存储有指令，当所述可读存储介质在处理组件上运行时，使得处理组件执行本公开实施例提供的任一所述的颜色分量的帧内预测方法。
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。
本领域技术人员在考虑说明书及实践这里公开的发明后,将容易想到本公开的其它实施方案。本公开旨在涵盖本公开的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本公开的一般性原理并包括本公开未公开的本技术领域中的公知常识或惯用技术手段。说明书和实施例仅被视为示例性的,本公开的真正范围和精神由权利要求指出。
应当理解的是,本公开并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围进行各种修改和改变。本公开的范围仅由所附的权利要求来限制。

Claims (34)

  1. 一种颜色分量的帧内预测方法,所述方法包括:
    通过第一通道向卷积神经网络输入第一输入数据,所述第一输入数据包含待处理图像帧中目标区域的第一颜色分量的信息;
    获取所述卷积神经网络输出的第一输出数据,所述第一输出数据包含所述卷积神经网络对所述目标区域的第二颜色分量的信息的预测值;
    其中,所述第一颜色分量和所述第二颜色分量为所述目标区域具有的不同的颜色分量。
  2. 根据权利要求1所述的方法,其中,所述第一输入数据包括所述第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息,所述目标区域的第一周边区域为位于所述目标区域左侧和/或上方的带状区域。
  3. 根据权利要求2所述的方法,其中,在所述通过第一通道向卷积神经网络输入第一输入数据之前,所述方法还包括:
    确定所述待处理图像帧中第一颜色分量和第二颜色分量的采样率关系;
    基于所述采样率关系,确定所述第一输入数据,所述第一输入数据中,所述第一周边区域中第二颜色分量的分布密度等于所述目标区域中第一颜色分量的分布密度。
  4. 根据权利要求3所述的方法,其中,
    所述基于所述采样率关系,确定所述第一输入数据,包括:
    获取所述目标区域的第一周边区域中已重建的第二颜色分量的信息;
    获取所述目标区域中已重建的第一颜色分量的信息;
    基于所述采样率关系,根据所述第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息,确定所述第一输入数据。
  5. 根据权利要求4所述的方法,其中,
    所述基于所述采样率关系,根据所述第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息,确定所述第一输入数据,包括:
    当所述目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例为1:1,将所述目标区域的第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息确定为所述第一输入数据;
    当所述目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例大于1:1,基于所述采样率比例,对所述第一周边区域中已重建的第二颜色分量的信息进行上采样,使得上采样后所述第一周边区域中第二颜色分量的分布密度等于所述目标区域中第一颜色分量的分布密度,并将上采样得到的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息确定为所述第一输入数据;
    当所述目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例小于1:1,基于所述采样率比例,对所述第一周边区域中已重建的第二颜色分量的信息进行下采样,使得下采样后所述第一周边区域中第二颜色分量的分布密度等于所述目标区域中第一颜色分量的分布密度,并将下采样得到的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息确定为所述第一输入数据。
  6. 根据权利要求1至5任一所述的方法,其中,所述方法还包括:
    对初始卷积神经网络进行训练以得到所述卷积神经网络,所述卷积神经网络的训练过程包括:
    通过所述第一通道向所述初始卷积神经网络输入第二输入数据,所述第二输入数据包括第一指定图像帧中训练区域的第一颜色分量的信息,所述第一指定图像帧中训练区域与所述目标区域的尺寸相同,所述第二输入数据的获取方式与所述第一输入数据的获取方式相同;
    将所述第一指定图像帧中训练区域对应的原始数据作为训练标签,对所述初始卷积神经网络进行训练以得到所述卷积神经网络,所述原始数据由所述第一指定图像帧中训练区域中已知的第二颜色分量的信息组成。
  7. 根据权利要求1所述的方法,其中,所述方法还包括:
    确定至少一个第一边信息数据,每个所述第一边信息数据包含除所述第一输入数据包含的颜色分量的信息之外的信息;
    通过至少一个第二通道分别向所述卷积神经网络输入所述至少一个第一边信息数据,所述至少一个第二通道与所述至少一个第一边信息数据一一对应。
  8. 根据权利要求7所述的方法,其中,所述确定至少一个第一边信息数据,包括:
    基于所述目标区域中已重建的第一颜色分量的相关信息,确定所述至少一个第一边信息数据;
    和/或,基于所述目标区域的第二周边区域中已重建的第二颜色分量的信息,确定所述至少一个第一边信息数据,所述目标区域的第二周边区域为位于所述目标区域左侧和/或上方的带状区域。
  9. 根据权利要求8所述的方法,其中,所述待处理图像帧的颜色编码格式为YUV格式,所述第一输入数据包括x行y列个第一采样块的颜色分量的信息,所述x和所述y均为大于或等于1的整数;
    所述基于所述目标区域中已重建的第一颜色分量的相关信息,确定所述至少一个第一边信息数据,包括:
    获取每个所述第一采样块中已重建的第一颜色分量的帧内预测模式的标识值;
    将所有所述帧内预测模式的标识值组成一个所述第一边信息数据。
  10. 根据权利要求8所述的方法,其中,所述第一输入数据包括x行y列个第一采样块的颜色分量的信息,所述x和所述y均为大于或等于1的整数;
    所述基于所述目标区域的第二周边区域中已重建的第二颜色分量的信息,确定所述至少一个第一边信息数据,包括:
    获取所述目标区域的第二周边区域中已重建的第二颜色分量的信息;
    确定所述目标区域的第二周边区域中已重建的第二颜色分量的信息的平均值;
    生成一个所述第一边信息数据，其中，所述第一边信息数据包括x行y列个所述平均值。
  11. 根据权利要求8至10任一所述的方法,其中,所述方法还包括:
    当所述至少一个第一边信息数据中任一边信息数据的取值范围与所述第一输入数据的取值范围不同时,对所述任一边信息数据进行标准化处理,使得处理后的所述任一边信息数据的取值范围与所述第一输入数据的取值范围相同。
  12. 根据权利要求7所述的方法,其中,所述方法还包括:
    对初始卷积神经网络进行训练以得到所述卷积神经网络,所述卷积神经网络的训练过程包括:
    通过所述第一通道向所述初始卷积神经网络输入第三输入数据，所述第三输入数据包括第二指定图像帧中训练区域的第一颜色分量的信息，所述第二指定图像帧中训练区域与所述目标区域的尺寸相同，所述第三输入数据的获取方式与所述第一输入数据的获取方式相同；
    通过所述至少一个第二通道分别向所述初始卷积神经网络输入至少一个第二边信息数据,所述至少一个第二通道与所述至少一个第二边信息数据一一对应,所述至少一个第二边信息数据的获取方式与所述至少一个第一边信息数据的获取方式相同;
    将所述第二指定图像帧中训练区域对应的第二颜色分量的原始数据作为训练标签,对所述初始卷积神经网络进行训练以得到所述卷积神经网络,所述原始数据由所述第二指定图像帧中训练区域中已知的第二颜色分量的信息组成。
  13. 根据权利要求1所述的方法,其中,所述卷积神经网络包括输入层、隐含层和输出层;
    在所述获取所述卷积神经网络输出的第一输出数据之前,所述方法还包括:
    当所述输入层有一个通道存在输入数据时,通过所述输入层对第一输入数据进行多维卷积滤波和非线性映射,得到所述输入层的输出数据;
    当所述输入层有至少两个通道存在输入数据时,通过所述输入层对每个通道输入的数据分别进行多维卷积滤波和非线性映射,并将不同通道的所述多维卷积滤波和非线性映射后的输入数据进行合并,得到所述输入层的输出数据;
    通过所述隐含层对所述输入层的输出数据进行多维卷积滤波和非线性映射，得到高维图像数据；
    通过所述输出层对所述高维图像数据进行聚合,得到所述第一输出数据。
  14. 根据权利要求13所述的方法,其中,所述输入层包括分别与所述每个通道对应的依次连接的至少一个卷积层,以及合并层,每个所述卷积层包括一个特征提取层和一个特征映射层,
    所述通过所述输入层对每个通道输入的数据分别进行多维卷积滤波和非线性映射,并将不同通道的所述多维卷积滤波和非线性映射后的输入数据进行合并,得到所述输入层的输出数据,包括:
    在每个卷积层中:通过所述特征提取层对输入的数据进行多维卷积滤波,并通过所述特征映射层对所述输入的数据进行非线性映射;
    通过所述合并层将经过不同通道对应的所述至少一个卷积层处理后的数据进行合并,得到所述输入层的输出数据。
  15. 根据权利要求13所述的方法,其中,所述隐含层包括依次连接的至少一个卷积层,每个所述卷积层包括一个特征提取层和一个特征映射层,
    所述通过所述隐含层对所述输入层的输出数据进行多维卷积滤波和非线性映射,得到高维图像数据,包括:
    在每个卷积层中:通过所述特征提取层对输入的数据进行多维卷积滤波,并通过所述特征映射层对所述输入的数据进行非线性映射;
    将经过所述至少一个卷积层处理的数据作为所述高维图像数据。
  16. 根据权利要求1所述的方法,其中,所述待处理图像帧的颜色编码格式为YUV格式,所述第一颜色分量和所述第二颜色分量为亮度分量Y、色度分量U和色度分量V中的两种;
    或者,所述待处理图像帧的颜色编码格式为RGB格式,所述第一颜色分量和所述第二颜色分量为红色分量、绿色分量和蓝色分量中的两种。
  17. 一种颜色分量的帧内预测装置,所述装置包括:
    第一输入模块,用于通过第一通道向卷积神经网络输入第一输入数据,所述第一输入数据包含待处理图像帧中目标区域的第一颜色分量的信息;
    获取模块,用于获取所述卷积神经网络输出的第一输出数据,所述第一输出数据包含所述卷积神经网络对所述目标区域的第二颜色分量的信息的预测值;
    其中,所述第一颜色分量和所述第二颜色分量为所述目标区域具有的不同的颜色分量。
  18. 根据权利要求17所述的装置,其中,所述第一输入数据包括所述第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息,所述目标区域的第一周边区域为位于所述目标区域左侧和/或上方的带状区域。
  19. 根据权利要求18所述的装置,其中,所述装置还包括:
    第一确定模块,用于在所述通过第一通道向卷积神经网络输入第一输入数据之前,确定所述待处理图像帧中第一颜色分量和第二颜色分量的采样率关系;
    第二确定模块,用于基于所述采样率关系,确定所述第一输入数据,所述第一输入数据中,所述第一周边区域中第二颜色分量的分布密度等于所述目标区域中第一颜色分量的分布密度。
  20. 根据权利要求19所述的装置,其中,
    所述第二确定模块,包括:
    第一获取子模块,用于获取所述目标区域的第一周边区域中已重建的第二颜色分量的信息;
    第二获取子模块,用于获取所述目标区域中已重建的第一颜色分量的信息;
    第一确定子模块,用于基于所述采样率关系,根据所述第一周边区域中已重建的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息,确定所述第一输入数据。
  21. 根据权利要求20所述的装置,其中,
    所述第一确定子模块,用于:
    当所述目标区域中第一颜色分量和第二颜色分量的采样率关系为：采样率比例为1:1，将所述目标区域的第一周边区域中已重建的第二颜色分量的信息，与所述目标区域中已重建的第一颜色分量的信息确定为所述第一输入数据；
    当所述目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例大于1:1,基于所述采样率比例,对所述第一周边区域中已重建的第二颜色分量的信息进行上采样,使得上采样后所述第一周边区域中第二颜色分量的分布密度等于所述目标区域中第一颜色分量的分布密度,并将上采样得到的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息确定为所述第一输入数据;
    当所述目标区域中第一颜色分量和第二颜色分量的采样率关系为:采样率比例小于1:1,基于所述采样率比例,对所述第一周边区域中已重建的第二颜色分量的信息进行下采样,使得下采样后所述第一周边区域中第二颜色分量的分布密度等于所述目标区域中第一颜色分量的分布密度,并将下采样得到的第二颜色分量的信息,与所述目标区域中已重建的第一颜色分量的信息确定为所述第一输入数据。
  22. 根据权利要求17至21任一所述的装置,其中,所述装置还包括:
    第一训练模块,用于对初始卷积神经网络进行训练以得到所述卷积神经网络,所述卷积神经网络的训练过程包括:
    通过所述第一通道向所述初始卷积神经网络输入第二输入数据,所述第二输入数据包括第一指定图像帧中训练区域的第一颜色分量的信息,所述第一指定图像帧中训练区域与所述目标区域的尺寸相同,所述第二输入数据的获取方式与所述第一输入数据的获取方式相同;
    将所述第一指定图像帧中训练区域对应的原始数据作为训练标签,对所述初始卷积神经网络进行训练以得到所述卷积神经网络,所述原始数据由所述第一指定图像帧中训练区域中已知的第二颜色分量的信息组成。
  23. 根据权利要求17所述的装置,其中,所述装置还包括:
    第三确定模块,用于确定至少一个第一边信息数据,每个所述第一边信息数据包含除所述第一输入数据包含的颜色分量的信息之外的信息;
    第二输入模块,用于通过至少一个第二通道分别向所述卷积神经网络输入所述至少一个第一边信息数据,所述至少一个第二通道与所述至少一个第一边信息数据一一对应。
  24. 根据权利要求23所述的装置,其中,所述第三确定模块,包括:
    第二确定子模块,用于基于所述目标区域中已重建的第一颜色分量的相关信息,确定所述至少一个第一边信息数据;
    和/或,第三确定子模块,用于基于所述目标区域的第二周边区域中已重建的第二颜色分量的信息,确定所述至少一个第一边信息数据,所述目标区域的第二周边区域为位于所述目标区域左侧和/或上方的带状区域。
  25. 根据权利要求24所述的装置,其中,所述待处理图像帧的颜色编码格式为YUV格式,所述第一输入数据包括x行y列个第一采样块的颜色分量的信息,所述x和所述y均为大于或等于1的整数;
    所述第二确定子模块,用于:
    获取每个所述第一采样块中已重建的第一颜色分量的帧内预测模式的标识值;
    将所有所述帧内预测模式的标识值组成一个所述第一边信息数据。
  26. 根据权利要求24所述的装置,其中,所述第一输入数据包括x行y列个第一采样块的颜色分量的信息,所述x和所述y均为大于或等于1的整数;
    所述第三确定子模块,用于:
    获取所述目标区域的第二周边区域中已重建的第二颜色分量的信息;
    确定所述目标区域的第二周边区域中已重建的第二颜色分量的信息的平均值;
    生成一个所述第一边信息数据,其中,所述第一边信息数据包括x行y列个所述平均值。
  27. 根据权利要求24至26任一所述的装置,其中,所述装置还包括:
    标准化模块,用于当所述至少一个第一边信息数据中任一边信息数据的取值范围与所述第一输入数据的取值范围不同时,对所述任一边信息数据进行标准化处理,使得处理后的所述任一边信息数据的取值范围与所述第一输入数据的取值范围相同。
  28. 根据权利要求23所述的装置,其中,所述装置还包括:
    第二训练模块,用于对初始卷积神经网络进行训练以得到所述卷积神经网络,所述卷积神经网络的训练过程包括:
    通过所述第一通道向所述初始卷积神经网络输入第三输入数据，所述第三输入数据包括第二指定图像帧中训练区域的第一颜色分量的信息，所述第二指定图像帧中训练区域与所述目标区域的尺寸相同，所述第三输入数据的获取方式与所述第一输入数据的获取方式相同；
    通过所述至少一个第二通道分别向所述初始卷积神经网络输入至少一个第二边信息数据,所述至少一个第二通道与所述至少一个第二边信息数据一一对应,所述至少一个第二边信息数据的获取方式与所述至少一个第一边信息数据的获取方式相同;
    将所述第二指定图像帧中训练区域对应的第二颜色分量的原始数据作为训练标签,对所述初始卷积神经网络进行训练以得到所述卷积神经网络,所述原始数据由所述第二指定图像帧中训练区域中已知的第二颜色分量的信息组成。
  29. 根据权利要求17所述的装置,其中,所述卷积神经网络包括输入层、隐含层和输出层;所述装置还包括:
    第一处理模块,用于在所述获取所述卷积神经网络输出的第一输出数据之前,当所述输入层有一个通道存在输入数据时,通过所述输入层对第一输入数据进行多维卷积滤波和非线性映射,得到所述输入层的输出数据;
    第二处理模块,用于当所述输入层有至少两个通道存在输入数据时,通过所述输入层对每个通道输入的数据分别进行多维卷积滤波和非线性映射,并将不同通道的所述多维卷积滤波和非线性映射后的输入数据进行合并,得到所述输入层的输出数据;
    高维处理模块,用于通过所述隐含层对所述输入层的输出数据进行多维卷积滤波和非线性映射,得到高维图像数据;
    聚合模块,用于通过所述输出层对所述高维图像数据进行聚合,得到所述第一输出数据。
  30. 根据权利要求29所述的装置，其中，所述输入层包括分别与所述每个通道对应的依次连接的至少一个卷积层，以及合并层，每个所述卷积层包括一个特征提取层和一个特征映射层，
    所述第二处理模块,用于:
    在每个卷积层中:通过所述特征提取层对输入的数据进行多维卷积滤波,并通过所述特征映射层对所述输入的数据进行非线性映射;
    通过所述合并层将经过不同通道对应的所述至少一个卷积层处理后的数据进行合并,得到所述输入层的输出数据。
  31. 根据权利要求29所述的装置,其中,所述隐含层包括依次连接的至少一个卷积层,每个所述卷积层包括一个特征提取层和一个特征映射层,
    所述高维处理模块,用于:
    在每个卷积层中:通过所述特征提取层对输入的数据进行多维卷积滤波,并通过所述特征映射层对所述输入的数据进行非线性映射;
    将经过所述至少一个卷积层处理的数据作为所述高维图像数据。
  32. 根据权利要求17所述的装置,其中,所述待处理图像帧的颜色编码格式为YUV格式,所述第一颜色分量和所述第二颜色分量为亮度分量Y、色度分量U和色度分量V中的两种;
    或者,所述待处理图像帧的颜色编码格式为RGB格式,所述第一颜色分量和所述第二颜色分量为红色分量、绿色分量和蓝色分量中的两种。
  33. 一种计算机设备,包括:
    处理器;
    用于存储所述处理器的可执行指令的存储器;
    其中,所述处理器被配置为:
    通过第一通道向卷积神经网络输入第一输入数据,所述第一输入数据包含待处理图像帧中目标区域的第一颜色分量的信息;
    获取所述卷积神经网络输出的第一输出数据,所述第一输出数据包含所述卷积神经网络对所述目标区域的第二颜色分量的信息的预测值;
    其中,所述第一颜色分量和所述第二颜色分量为所述目标区域具有的不同的颜色分量。
  34. 一种可读存储介质,所述可读存储介质中存储有指令,当所述可读存储介质在处理组件上运行时,使得处理组件执行权利要求1至16任一所述的颜色分量的帧内预测方法。
PCT/CN2018/113779 2017-11-29 2018-11-02 颜色分量的帧内预测方法及装置 WO2019105179A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711223298.2 2017-11-29
CN201711223298.2A CN109842799B (zh) 2017-11-29 2017-11-29 颜色分量的帧内预测方法、装置及计算机设备

Publications (1)

Publication Number Publication Date
WO2019105179A1 true WO2019105179A1 (zh) 2019-06-06

Family

ID=66664687

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/113779 WO2019105179A1 (zh) 2017-11-29 2018-11-02 颜色分量的帧内预测方法及装置

Country Status (2)

Country Link
CN (1) CN109842799B (zh)
WO (1) WO2019105179A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115422986A (zh) * 2022-11-07 2022-12-02 深圳传音控股股份有限公司 处理方法、处理设备及存储介质

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
MX2022000207A (es) * 2019-06-27 2022-03-22 Hfi Innovation Inc Método y aparato de filtracion de bucle adaptable con componente cruzado para codificación de video.
WO2021035717A1 (zh) * 2019-08-30 2021-03-04 中国科学院深圳先进技术研究院 帧内色度预测方法、装置、设备及视频编解码系统
CN110602491B (zh) * 2019-08-30 2022-07-19 中国科学院深圳先进技术研究院 帧内色度预测方法、装置、设备及视频编解码系统
WO2022088101A1 (zh) * 2020-10-30 2022-05-05 Oppo广东移动通信有限公司 编码方法、解码方法、编码器、解码器及存储介质
CN116686288A (zh) * 2021-01-22 2023-09-01 Oppo广东移动通信有限公司 编码方法、解码方法、编码器、解码器以及电子设备
CN118042192A (zh) * 2021-03-12 2024-05-14 腾讯科技(深圳)有限公司 点云编码、解码的方法、装置及设备
WO2024022390A1 (en) * 2022-07-27 2024-02-01 Mediatek Inc. Method and apparatus of improving performance of convolutional cross-component model in video coding system
WO2024077520A1 (zh) * 2022-10-12 2024-04-18 Oppo广东移动通信有限公司 编解码方法、码流、编码器、解码器以及存储介质
CN116343708B (zh) * 2023-05-30 2023-08-04 深圳市深远通科技有限公司 一种消除动态图像色彩偏移的方法及系统

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106254879A (zh) * 2016-08-31 2016-12-21 广州精点计算机科技有限公司 一种应用自编码神经网络的有损图像压缩方法
CN107277520A (zh) * 2017-07-11 2017-10-20 中国科学技术大学 帧内预测的码率控制方法
WO2017200447A1 (en) * 2016-05-16 2017-11-23 Telefonaktiebolaget Lm Ericsson (Publ) Pixel processing with color component

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017200447A1 (en) * 2016-05-16 2017-11-23 Telefonaktiebolaget Lm Ericsson (Publ) Pixel processing with color component
CN106254879A (zh) * 2016-08-31 2016-12-21 广州精点计算机科技有限公司 一种应用自编码神经网络的有损图像压缩方法
CN107277520A (zh) * 2017-07-11 2017-10-20 中国科学技术大学 帧内预测的码率控制方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KALCHBRENNER, NAL ET AL.: "Video Pixel Networks", 3 October 2016 (2016-10-03), pages 1 - 16, XP055427937, Retrieved from the Internet <URL:http://arxiv.org/pdf/1610.00527.pdf> *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115422986A (zh) * 2022-11-07 2022-12-02 深圳传音控股股份有限公司 处理方法、处理设备及存储介质
CN115422986B (zh) * 2022-11-07 2023-08-22 深圳传音控股股份有限公司 处理方法、处理设备及存储介质

Also Published As

Publication number Publication date
CN109842799A (zh) 2019-06-04
CN109842799B (zh) 2021-02-09

Similar Documents

Publication Publication Date Title
WO2019105179A1 (zh) 颜色分量的帧内预测方法及装置
CN113596482B (zh) 环路滤波实现方法、装置及计算机存储介质
JP7239711B2 (ja) クロマブロック予測方法及び装置
US20230069953A1 (en) Learned downsampling based cnn filter for image and video coding using learned downsampling feature
WO2020103800A1 (zh) 视频解码方法和视频解码器
TW202234890A (zh) 通過指示特徵圖資料進行編碼
US20230262212A1 (en) Picture prediction method, encoder, decoder, and computer storage medium
TWI805085B (zh) 基於機器學習的圖像解碼中色度子採樣格式的處理方法
JP2023547941A (ja) ニューラルネットワーク・ベースのビットストリームのデコードとエンコード
US20230076920A1 (en) Global skip connection based convolutional neural network (cnn) filter for image and video coding
CN116250235A (zh) 具有基于神经网络的环路滤波的视频编解码
JP2023548507A (ja) セグメンテーション情報のシグナリングを用いた復号化
CN109996083B (zh) 帧内预测方法及装置
JP2023528641A (ja) チャネル間相関情報を用いた適応的画像向上
Hu et al. An adaptive two-layer light field compression scheme using GNN-based reconstruction
TWI807491B (zh) 基於機器學習的圖像編解碼中的色度子採樣格式處理方法
WO2022111233A1 (zh) 帧内预测模式的译码方法和装置
WO2022166462A1 (zh) 编码、解码方法和相关设备
WO2022266955A1 (zh) 图像解码及处理方法、装置及设备
EP4210327A1 (en) Intra frame prediction method and device
WO2022077490A1 (zh) 一种帧内预测方法、编码器、解码器及存储介质
TWI834087B (zh) 用於從位元流重建圖像及用於將圖像編碼到位元流中的方法及裝置、電腦程式產品
WO2023197194A1 (zh) 编解码方法、装置、编码设备、解码设备以及存储介质
TW202416712A (zh) 使用神經網路進行圖像區域的並行處理-解碼、後濾波和rdoq
WO2024002497A1 (en) Parallel processing of image regions with neural networks – decoding, post filtering, and rdoq

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18884702

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18884702

Country of ref document: EP

Kind code of ref document: A1