WO2020083385A1 - Image processing method, apparatus and system - Google Patents

Image processing method, apparatus and system

Info

Publication number
WO2020083385A1
Authority
WO
WIPO (PCT)
Prior art keywords
video data
frequency domain
information component
processed video
component
Prior art date
Application number
PCT/CN2019/113356
Other languages
English (en)
French (fr)
Inventor
王莉
Original Assignee
杭州海康威视数字技术股份有限公司
Application filed by 杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Publication of WO2020083385A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/11 Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
    • H04N19/124 Quantisation
    • H04N19/13 Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/184 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being bits, e.g. of the compressed video stream
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/567 Motion estimation based on rate distortion criteria

Definitions

  • This application relates to the field of video encoding and decoding, and in particular to an image processing method, device, and system.
  • When encoding an original video image, the original video image is processed multiple times to obtain a reconstructed image.
  • the reconstructed image can be used as a reference image to encode the original video image.
  • the reconstructed image obtained after the original video image is processed multiple times may have pixel shifts relative to the original video image; that is, the reconstructed image has distortion, which affects the subjective quality of the reconstructed image.
  • Embodiments of the present application provide an image processing method, video decoding method, device, and system to remove image distortion.
  • the technical solution is as follows:
  • the present application provides an image processing method, the method including:
  • the processed video data is distorted relative to the original video data input to the encoding system, and the side information component indicates the distortion characteristics of the processed video data relative to the original video data;
  • the frequency domain information component and the side information component are input into a convolutional neural network model and filtered to obtain a de-distorted frequency domain information component.
  • the de-distorted frequency domain information component is obtained after the frequency domain information component is filtered under the guidance of the side information component;
  • a de-distorted image corresponding to the processed video data is generated.
  • the present application provides an image processing method, the method including:
  • the processed video data is distorted relative to the original video data before encoding corresponding to the video bit stream input to the decoding system, and the side information component indicates the distortion characteristics of the processed video data relative to the original video data;
  • the frequency domain information component and the side information component are input into a convolutional neural network model to perform convolution filtering to obtain a de-distorted frequency domain information component, which is obtained after the frequency domain information component is filtered under the guidance of the side information component;
  • a de-distorted image corresponding to the processed video data is generated.
  • the present application provides an image processing apparatus, the apparatus including:
  • An obtaining module, configured to obtain the frequency domain information component and the side information component corresponding to the processed video data, where the processed video data is distorted relative to the original video data input to the encoding system, and the side information component represents the distortion characteristics of the processed video data relative to the original video data;
  • the filtering module is configured to input the frequency domain information component and the side information component into a convolutional neural network model and perform filtering processing to obtain a de-distorted frequency domain information component, which is obtained after the frequency domain information component is filtered under the guidance of the side information component;
  • the generating module is configured to generate a de-distorted image corresponding to the processed video data according to the de-distorted frequency domain information component.
  • the present application provides an image processing apparatus, the apparatus including:
  • An obtaining module, configured to obtain the frequency domain information component and the side information component corresponding to the processed video data, where the processed video data is distorted relative to the original video data before encoding corresponding to the video bit stream input to the decoding system, and the side information component represents the distortion characteristics of the processed video data relative to the original video data;
  • the filtering module is configured to input the frequency domain information component and the side information component into a convolutional neural network model and perform convolution filtering to obtain a de-distorted frequency domain information component, which is obtained after the frequency domain information component is filtered under the guidance of the side information component;
  • the generating module is configured to generate a de-distorted image corresponding to the processed video data according to the de-distorted frequency domain information component.
  • the present application provides a computer-readable storage medium, in which a computer program is stored, and the computer program is executed by a processor to implement the method steps provided in the first aspect or any optional manner of the first aspect, or the method steps provided in the second aspect or any optional manner of the second aspect.
  • the present application provides an electronic device, wherein the electronic device includes:
  • at least one processor; and
  • at least one memory;
  • the at least one memory stores one or more programs, and the one or more programs are configured to be executed by the at least one processor to execute the method steps provided in the first aspect or any optional manner of the first aspect, or the method steps provided in the second aspect or any optional manner of the second aspect.
  • an embodiment of the present application provides an image processing system including the video encoding device provided in the third aspect and the video decoding device provided in the fourth aspect.
  • the frequency domain information component and the side information component corresponding to the processed video data are filtered by the convolutional neural network model to obtain the de-distorted frequency domain information component. Because the de-distorted frequency domain information component removes the distortion that occurs in the frequency domain, the image generated based on the de-distorted frequency domain information component is free of that distortion, which improves the subjective quality of the image.
  • FIG. 1 is a flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 2-1 is a flowchart of another image processing method provided by an embodiment of the present application.
  • FIG. 2-2 is a structural block diagram of a video encoding system provided by an embodiment of the present application.
  • FIG. 2-3 is a structural block diagram of another video encoding system provided by an embodiment of the present application.
  • FIG. 2-4 is a first schematic diagram of side information components provided by an embodiment of the present application.
  • FIG. 2-5 is a second schematic diagram of side information components provided by an embodiment of the present application.
  • FIG. 2-8 is a schematic diagram of obtaining de-distorted frequency domain information components according to an embodiment of the present application.
  • FIG. 3 is a flowchart of an image processing method provided by an embodiment of the present application.
  • FIG. 4-1 is a flowchart of another image processing method provided by an embodiment of the present application.
  • FIG. 4-2 is a structural block diagram of a video decoding system provided by an embodiment of the present application.
  • FIG. 4-3 is a structural block diagram of another video decoding system provided by an embodiment of the present application.
  • FIG. 4-4 is a structural block diagram of another video decoding system provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of an image processing system provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of an apparatus provided by an embodiment of the present application.
  • an embodiment of the present application provides an image processing method.
  • the method includes:
  • Step 101 Obtain the frequency domain information component and the side information component corresponding to the processed video data.
  • the processed video data has distortion relative to the original video data input to the encoding system.
  • the side information component indicates the distortion characteristics of the processed video data relative to the original video data.
  • Step 102 The frequency domain information component and the side information component are input into a convolutional neural network model and filtered to obtain a de-distorted frequency domain information component.
  • the de-distorted frequency domain information component is obtained after the frequency domain information component is filtered under the guidance of the side information component.
  • Step 103 Generate a de-distorted image corresponding to the processed video data according to the de-distorted frequency domain information component.
  • the frequency domain information component and the side information component corresponding to the processed video data are obtained, and the frequency domain information component and the side information component generated by the video encoding system are filtered through the convolutional neural network model to obtain the de-distorted frequency domain information component. Because the filtered de-distorted frequency domain information component removes the distortion that occurs in the frequency domain, the image generated using the de-distorted frequency domain information component is free of that distortion, which improves the subjective quality of the generated image. The generated de-distorted image can also be used as a reference image to encode the original video data following the current original video data, which improves the accuracy of subsequently encoded video data and the de-distortion performance of the video encoding process.
  • the detailed implementation process of the method may include:
  • Step 201 Acquire the frequency domain information component and the side information component corresponding to the processed video data.
  • a video encoding system may be used for video encoding, and the frequency domain information component and the side information component corresponding to the processed video data may be obtained from the video encoding system.
  • There are many types of video encoding systems; in this step, the following two video encoding systems are described.
  • The first video encoding system includes a prediction module, an adder, a first transform unit, a quantization unit, an entropy encoder, an inverse quantization unit, a first inverse transform unit, a reconstruction unit, a second transform unit, a CNN (convolutional neural network model), a second inverse transform unit, and a buffer. For the structure of the first video encoding system, see the schematic structural diagram shown in FIG. 2-2.
  • the encoding process of the video encoding system is: the current original video data is input into the prediction module and the adder; the prediction module predicts the input current original video data according to the reference image in the buffer to obtain the mode information, and the mode information is input to the adder, the entropy encoder, and the reconstruction unit.
  • the prediction module includes an intra prediction unit, a motion estimation and motion compensation unit, and a switch.
  • the intra prediction unit can perform intra prediction on the current original video data to obtain intra mode information, and input the intra mode information to the entropy encoder.
  • the motion estimation and motion compensation unit performs inter prediction on the current original video data according to the reference image buffered in the buffer to obtain inter-mode information.
  • the inter-mode information is input to the entropy encoder.
  • the switch selects whether to output the intra-mode information or output the inter-mode information to the adder and reconstruction unit.
  • the adder generates initial residual data according to the mode information and the current original video data.
  • the first transform unit transforms the initial residual data and outputs the transform processing result to the quantization unit;
  • the quantization unit quantizes the transform result according to the quantization parameter to obtain quantized residual information, and outputs the quantized residual information to the entropy encoder and the inverse quantization unit;
  • the entropy encoder encodes the quantized residual information and the mode information (the mode information includes intra-mode information and inter-mode information) to form a video bitstream, and the video bitstream may include the encoding information of each coding unit in the original video data.
  • the inverse quantization unit performs inverse quantization on the quantized residual information to obtain a first residual coefficient and inputs the first residual coefficient to the first inverse transform unit; the first inverse transform unit inverse transforms the first residual coefficient to obtain second residual information, and the second residual information is input into the reconstruction unit;
  • the reconstruction unit generates distortion reconstructed video data according to the second residual information and the mode information (intra mode information and inter mode information).
  • the distortion-reconstructed video data is input to the second transform unit as processed video data, and the second transform unit transforms the processed video data to obtain video data frequency domain information corresponding to the processed video data.
  • In this way, the frequency domain information of the video data can be obtained, the frequency domain information component corresponding to the processed video data can be generated according to the frequency domain information of the video data, and the quantization parameter used by the quantization unit, which can characterize the quantization step size, can be obtained; the side information component corresponding to the processed video data is then generated according to the quantization parameter.
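  • For orientation only, the following Python sketch (not part of the patent) walks a single 8×8 residual block through a simplified version of the path just described: transform, quantization, inverse quantization, inverse transform, reconstruction, and a second transform that yields the frequency domain information which, together with a QP-derived side information component, would be fed to the convolutional neural network model. The block size, the orthonormal DCT, the QP-to-step mapping, and the placeholder cnn_filter function are all illustrative assumptions.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis, used here as a stand-in for the codec transforms."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def cnn_filter(freq_component, side_component):
    """Placeholder for the trained CNN; a real model would de-distort freq_component
    under the guidance of side_component (see the model sketch later in the text)."""
    return freq_component

rng = np.random.default_rng(0)
C = dct_matrix(8)
qp = 32
q_step = 2.0 ** ((qp - 4) / 6.0)                       # illustrative QP-to-step mapping

residual = rng.normal(scale=10.0, size=(8, 8))         # initial residual data from the adder
coeffs = C @ residual @ C.T                            # first transform
quantized = np.round(coeffs / q_step)                  # quantization -> quantized residual information
dequantized = quantized * q_step                       # inverse quantization (first residual coefficient)
rec_residual = C.T @ dequantized @ C                   # first inverse transform (second residual information)
reconstructed = rec_residual                           # reconstruction (prediction omitted in this toy case)

freq_component = C @ reconstructed @ C.T               # second transform: video data frequency domain information
side_component = np.full_like(freq_component, qp / 51.0)  # QP guide map with the same shape
de_distorted_freq = cnn_filter(freq_component, side_component)
reference_block = C.T @ de_distorted_freq @ C          # second inverse transform -> candidate reference data
```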
  • For the second video encoding system, please refer to the schematic structural diagram of the second video encoding system shown in FIG. 2-3.
  • The difference between the second video encoding system and the first video encoding system is that, in the second video encoding system, the convolutional neural network model can be connected in series between the inverse quantization unit and the first inverse transform unit, and the second transform unit and the second inverse transform unit are omitted.
  • the encoding process of the video encoding system is: the current original video data is input into the prediction module and the adder; the prediction module predicts the input current original video data according to the reference image in the buffer to obtain the mode information, and the mode information is input to the adder, the entropy encoder, and the reconstruction unit.
  • the intra prediction unit included in the prediction module can perform intra prediction on the current original video data to obtain intra mode information and input the intra mode information to the entropy encoder; the motion estimation and motion compensation unit included in the prediction module performs inter prediction on the current original video data according to the reference image buffered in the buffer to obtain inter mode information, and the inter mode information is input to the entropy encoder; the switch included in the prediction module selects whether to output the intra mode information or the inter mode information to the adder and reconstruction unit.
  • the adder generates initial residual data according to the mode information and the current original video data.
  • the first transform unit transforms the initial residual data and outputs the transform processing result to the quantization unit; the quantization unit transforms the transform result according to the quantization parameter Quantize to obtain the video data to be encoded.
  • the video data to be encoded is the processed video data, which is also the quantization residual information.
  • the processed video data is output to the entropy encoder and the inverse quantization unit; the entropy encoder encodes the processed video data Encode information such as mode information (mode information includes intra-mode information and inter-mode information) to form a video bitstream, and the video comparison stream may include encoding information for each encoding unit in the original video data.
  • the inverse quantization unit performs inverse quantization on the processed video data to obtain a first residual coefficient, and then generates a frequency domain information component corresponding to the processed video data according to the first residual coefficient, where the generation process may be:
  • the first residual coefficient is input to the first inverse transform unit, the first inverse transform unit performs an inverse transform process on the first residual coefficient to obtain a second residual coefficient, and the second residual coefficient is input to the reconstruction unit;
  • the reconstruction unit generates distortion reconstructed video data according to the second residual coefficient and the mode information (intra mode information and inter mode information), inputs the distortion reconstructed video data to the second transform unit, and the second transform unit transforms the distortion reconstructed video data to obtain the frequency domain information of the video data corresponding to the processed video data.
  • In this way, the frequency domain information of the video data can be obtained, the frequency domain information component corresponding to the processed video data can be generated according to the frequency domain information of the video data, and the quantization parameter used by the quantization unit, which can characterize the quantization step size, as well as the inter-frame mode information corresponding to the processed video data, can be obtained; side information components are then generated according to the quantization parameter and the inter-frame mode information.
  • the operation of generating side information components according to the quantization parameter and the inter-frame mode information may be:
  • a side information guide map is generated, and the side information guide map is a guide map of the same height and width as the current original video data generated according to the quantization parameter;
  • the side information guide map matching the inter-frame mode information is determined as the side information component.
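  • A minimal sketch (an assumption for illustration, not the patent's exact procedure) of building such a side information guide map is shown below: one plane of the same height and width as the current original video data is filled with the normalized quantization parameter, and a second per-block plane marks which blocks used inter prediction, so that the guide map matching the inter-frame mode information can be selected. The block size, the normalization by 51, and all names are hypothetical.

```python
import numpy as np

def qp_guide_map(height, width, qp, max_qp=51.0):
    """Guide map with the same height/width as the frame, filled with the (normalized) QP."""
    return np.full((height, width), qp / max_qp, dtype=np.float32)

def mode_guide_map(height, width, inter_flags, block=16):
    """Per-block map marking inter-coded blocks (1.0) versus intra-coded blocks (0.0)."""
    m = np.zeros((height, width), dtype=np.float32)
    for (by, bx), is_inter in inter_flags.items():
        m[by * block:(by + 1) * block, bx * block:(bx + 1) * block] = float(is_inter)
    return m

h, w, qp = 64, 64, 37
inter_flags = {(0, 0): True, (0, 1): False, (1, 0): True}   # toy per-block inter/intra decisions
side_info = np.stack([qp_guide_map(h, w, qp), mode_guide_map(h, w, inter_flags)], axis=0)
print(side_info.shape)   # (2, 64, 64): two side information channels
```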
  • Step 202 The frequency domain information component and the side information component are input into a convolutional neural network model and filtered to obtain a de-distorted frequency domain information component.
  • the side information component corresponding to the processed video data represents the distortion characteristics of the processed video data relative to the original video data.
  • the distortion characteristics may include at least one of the following distortion characteristics:
  • the side information component may indicate the degree of distortion of the processed video data relative to the original video data.
  • the side information component can also indicate the type of distortion of the processed video data relative to the original video data.
  • different coding units in the image may use different prediction modes. Different prediction modes affect the distribution of residual data and thus the characteristics of the distorted target image block. Therefore, the mode information of the coding unit can be used as a kind of side information characterizing the type of distortion.
  • the matrix structure of the side information component is the same as the matrix structure of the frequency domain information component. In the example shown in FIG. 2-4, the coordinates [0, 0] and [0, 1] represent the distortion position, and the matrix element value 1 represents the degree of distortion; that is, the side information component can simultaneously indicate the degree of distortion and the position of the distortion.
  • In the example shown in FIG. 2-5, the coordinates [0, 0], [0, 1], [2, 0], and [2, 4] represent the distortion positions, and the matrix element values 1 and 2 represent the distortion type; that is, the side information component can simultaneously indicate the type of distortion and the location of the distortion.
  • the above solution provided by the embodiment of the present application may simultaneously include two side information components illustrated in FIGS. 2-4 and 2-5, respectively.
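  • As a purely numerical illustration of the two matrices described above (the 3×5 shape is an arbitrary example, not from the patent), the sketch below builds a degree-of-distortion map with value 1 at positions [0, 0] and [0, 1], and a distortion-type map with values 1 and 2 at positions [0, 0], [0, 1], [2, 0] and [2, 4], then stacks them so both side information components can be supplied together:

```python
import numpy as np

shape = (3, 5)                                   # same matrix structure as the frequency domain component

degree_map = np.zeros(shape)
degree_map[0, 0] = degree_map[0, 1] = 1          # element value 1 encodes the degree of distortion;
                                                 # the nonzero coordinates encode the distortion position

type_map = np.zeros(shape)
type_map[0, 0] = type_map[0, 1] = 1              # distortion type 1 at these positions
type_map[2, 0] = type_map[2, 4] = 2              # distortion type 2 at these positions

side_components = np.stack([degree_map, type_map])   # both side information components together
print(side_components.shape)                          # (2, 3, 5)
```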
  • the side information component may include side information components corresponding to each frequency domain information component.
  • the convolutional neural network model includes: a side information component generation module 11, a convolutional neural network 12, and a network training module 13;
  • the side information component generation module 11 can be used to generate side information components; the network training module 13 can train the convolutional neural network model according to original sample images, so that the trained convolutional neural network model can filter the input frequency domain information components and side information components to obtain de-distorted frequency domain information components.
  • the convolutional neural network 12 may include the following three-layer structure:
  • the input layer processing unit 121 is used to receive the input data of the convolutional neural network, which in this solution includes the frequency domain information components and the side information components, and to perform the first layer of convolution filtering on the input data;
  • the hidden layer processing unit 122 performs at least one layer of convolution filtering on the output data of the input layer processing unit 121;
  • the output layer processing unit 123 performs the final layer of convolution filtering on the output data of the hidden layer processing unit 122, and the output result is used as the de-distorted frequency domain information component, which is used to generate a de-distorted image.
  • FIG. 2-7 is a schematic diagram of the data flow for realizing the solution, in which the frequency domain information component and the side information component are used as input data and input into the pre-trained convolutional neural network model; or, a side information guide map is generated according to the side information (the side information may be quantization parameters and/or inter-frame mode information), and the frequency domain information component and the side information guide map are used as input data and input into the pre-trained convolutional neural network model.
  • the convolutional neural network model can be represented by a convolutional neural network with a preset structure and a set of network parameters. After the input data is processed by the convolution filtering of the input layer, the hidden layer, and the output layer, the de-distorted frequency domain information component is obtained.
  • input data of the convolutional neural network model may include one or more side information components, and may also include one or more frequency domain information components.
  • the stored data of each pixel of an image is the data saved at that pixel position, including the values of all color components of the pixel. To obtain the frequency domain information component corresponding to the processed video data, the value of one or more color components can be extracted from the stored data of each pixel as needed, so as to obtain the frequency domain information component corresponding to the processed video data.
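  • As a small illustration of this extraction, assuming the decoded pixels are stored as an H×W×3 array of YUV values (the storage layout and array names are assumptions), the Y color component can be pulled out pixel by pixel to form the plane whose transform gives the frequency domain information component:

```python
import numpy as np

h, w = 16, 16
yuv_pixels = np.random.default_rng(1).integers(0, 256, size=(h, w, 3))   # stored data of each pixel

y_plane = yuv_pixels[:, :, 0].astype(np.float32)   # extract the Y color component values
u_plane = yuv_pixels[:, :, 1].astype(np.float32)   # other color components can be extracted as needed
print(y_plane.shape)                               # (16, 16) plane to be transformed to the frequency domain
```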
  • this step may specifically include the following processing steps:
  • the structure of the convolutional neural network model including the input layer, the hidden layer, and the output layer is taken as an example to describe the scheme.
  • Step 61 The frequency domain information component and the side information component are used as input data of the pre-established convolutional neural network model, and the input layer performs the first layer of convolution filtering, which may be as follows:
  • the input data may be fed to the network through respective channels. In this step, the frequency domain information component Y with c_y channels and the side information component M with c_m channels are combined along the channel dimension to form the input data I with c_y + c_m channels, and multidimensional convolution filtering and nonlinear mapping are performed on the input data I using the following formula to produce n_1 image blocks represented in sparse form:
  • F_1(I) = g(W_1 * I + B_1);
  • where F_1(I) is the output of the input layer, I is the input of the convolutional layer in the input layer, * is the convolution operation, W_1 is the weight coefficient of the convolutional layer filter bank of the input layer, B_1 is the offset coefficient of the convolutional layer filter bank of the input layer, and g() is a nonlinear mapping function.
  • W_1 corresponds to n_1 convolution filters, that is, n_1 convolution filters act on the input of the convolutional layer of the input layer and output n_1 image blocks; the convolution kernel size of each convolution filter is c_1 × f_1 × f_1, where c_1 is the number of input channels and f_1 is the spatial size of each convolution kernel.
  • The input layer convolution processing expression is thus F_1(I) = g(W_1 * I + B_1).
  • Step 62 The hidden layer performs further high-dimensional mapping on the sparsely represented image blocks F_1(I) output by the input layer.
  • the number of convolutional layers included in the hidden layer, the connection method of the convolutional layer, the attributes of the convolutional layer, etc. are not limited, and various known structures may be used, but the hidden layer contains at least 1 convolutional layer.
  • the hidden layer contains N-1 (N ≥ 2) convolutional layers, and the hidden layer processing is represented by the following formula:
  • F_i(I) = g(W_i * F_{i-1}(I) + B_i), i ∈ {2, 3, ..., N};
  • where F_i(I) represents the output of the i-th convolutional layer in the convolutional neural network, * is the convolution operation, W_i is the weight coefficient of the filter bank of the i-th convolutional layer, B_i is the offset coefficient of the filter bank of the i-th convolutional layer, and g() is a nonlinear mapping function.
  • W_i corresponds to n_i convolution filters, that is, n_i convolution filters act on the input of the i-th convolutional layer and output n_i image blocks; the convolution kernel size of each convolution filter is c_i × f_i × f_i, where c_i is the number of input channels and f_i is the spatial size of each convolution kernel.
  • Step 63 The output layer aggregates the high-dimensional image blocks F_N(I) output by the hidden layer, and outputs the de-distorted frequency domain information components, which are used to generate a de-distorted image.
  • the structure of the output layer is not limited.
  • the output layer may be a Residual Learning structure, a Direct Learning structure, or other structures.
  • the processing using Residual Learning structure is as follows:
  • a convolution operation is performed on the output of the hidden layer to obtain a compensation residual, which is then added to the input frequency domain information component to obtain a distortion-free frequency domain information component.
  • the output layer processing can be expressed by the following formula:
  • F(I) = W_{N+1} * F_N(I) + B_{N+1} + Y;
  • where F(I) is the output of the output layer, F_N(I) is the output of the hidden layer, * is the convolution operation, W_{N+1} is the weight coefficient of the convolutional layer filter bank of the output layer, B_{N+1} is the offset coefficient of the convolutional layer filter bank of the output layer, and Y is the frequency domain information component that has not been subjected to convolution filtering and is to be de-distorted.
  • W_{N+1} corresponds to n_{N+1} convolution filters, that is, n_{N+1} convolution filters act on the input of the (N+1)-th convolutional layer and output n_{N+1} image blocks; n_{N+1} is the number of de-distorted frequency domain information components that are output and is generally equal to the number of input frequency domain information components. If only one type of frequency domain information component is output, n_{N+1} generally takes the value 1. The convolution kernel size of each convolution filter is c_{N+1} × f_{N+1} × f_{N+1}, where c_{N+1} is the number of input channels and f_{N+1} is the spatial size of each convolution kernel.
  • When the Direct Learning structure is used, the de-distorted frequency domain information component is output directly, that is, the de-distorted image block is obtained.
  • the output layer processing can be expressed by the following formula:
  • F(I) = W_{N+1} * F_N(I) + B_{N+1};
  • where F(I) is the output of the output layer, F_N(I) is the output of the hidden layer, * is the convolution operation, W_{N+1} is the weight coefficient of the convolutional layer filter bank of the output layer, and B_{N+1} is the offset coefficient of the convolutional layer filter bank of the output layer.
  • W_{N+1} corresponds to n_{N+1} convolution filters, that is, n_{N+1} convolution filters act on the input of the (N+1)-th convolutional layer and output n_{N+1} image blocks; n_{N+1} is the number of de-distorted frequency domain information components that are output and is generally equal to the number of input frequency domain information components. If only one type of frequency domain information component is output, n_{N+1} generally takes the value 1. The convolution kernel size of each convolution filter is c_{N+1} × f_{N+1} × f_{N+1}, where c_{N+1} is the number of input channels and f_{N+1} is the spatial size of each convolution kernel.
  • the output layer adopts the Residual Learning structure.
  • the output layer includes one convolutional layer.
  • the expression of the convolution processing of the output layer in this embodiment is:
  • F(I) = W_3 * F_3(I) + B_3 + Y.
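  • To make the layer structure of steps 61 to 63 concrete, here is a minimal PyTorch-style sketch (an illustration, not the patent's implementation): the frequency domain information component Y and the side information component M are concatenated along the channel dimension, passed through an input convolution with a nonlinear mapping, one or more hidden convolutions, and an output convolution whose result is added back to Y, i.e. the Residual Learning structure. Channel counts, kernel sizes, and the choice of ReLU as g() are assumptions.

```python
import torch
import torch.nn as nn

class FreqDomainCNN(nn.Module):
    """Sketch of the input/hidden/output layer structure with a residual output."""
    def __init__(self, c_y=1, c_m=1, n_feat=64, hidden_layers=1, kernel=3):
        super().__init__()
        pad = kernel // 2
        self.input_layer = nn.Conv2d(c_y + c_m, n_feat, kernel, padding=pad)    # W_1, B_1
        self.hidden = nn.ModuleList(
            nn.Conv2d(n_feat, n_feat, kernel, padding=pad) for _ in range(hidden_layers)
        )
        self.output_layer = nn.Conv2d(n_feat, c_y, kernel, padding=pad)         # W_{N+1}, B_{N+1}
        self.g = nn.ReLU()                                                      # nonlinear mapping g()

    def forward(self, y, m):
        i = torch.cat([y, m], dim=1)        # combine along the channel dimension: c_y + c_m channels
        f = self.g(self.input_layer(i))     # F_1(I) = g(W_1 * I + B_1)
        for conv in self.hidden:
            f = self.g(conv(f))             # F_i(I) = g(W_i * F_{i-1}(I) + B_i)
        return self.output_layer(f) + y     # F(I) = W_{N+1} * F_N(I) + B_{N+1} + Y

# Example: one 64x64 frequency domain component with a QP guide map as the side information component
model = FreqDomainCNN()
y = torch.randn(1, 1, 64, 64)
m = torch.full((1, 1, 64, 64), 37 / 51.0)
de_distorted = model(y, m)
print(de_distorted.shape)                   # torch.Size([1, 1, 64, 64])
```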
  • Step 71 Obtain a preset training set. The preset training set includes the original sample image, frequency domain information components corresponding to multiple processed video data corresponding to the original sample image, and side information components corresponding to each processed video data, where the side information component corresponding to the processed video data represents the distortion characteristics of the processed video data relative to the original sample image.
  • the distortion characteristics of the plurality of distorted images are different.
  • the original sample image (that is, an undistorted natural image) can be pre-processed with image processing of different degrees of distortion to obtain the corresponding processed video data, and, according to the steps in the above-mentioned de-distortion method, corresponding side information components are generated for each processed video data, so that each original sample image, the corresponding processed video data, and the corresponding side information components form an image pair, and these image pairs form a preset training set Ω.
  • the processed video data corresponding to the original sample image and the side information component corresponding to the original sample image are used as the training sample of the CNN, and the original image color component in the original sample image is used as the labeling information of the training sample.
  • Each training sample in the training set corresponds to an original sample image.
  • the training set may include an original sample image, and the above image processing is performed on the original sample image to obtain multiple processed video data with different distortion characteristics, and side information components corresponding to each processed video data;
  • the training set may also include multiple original sample images, and perform the above image processing on each original sample image to obtain multiple processed video data with different distortion characteristics, and side information components corresponding to each processed video data.
  • Step 72 Configure a convolutional neural network CNN with a preset structure and initialize the network parameter set of the CNN. The initial parameter set may be denoted by Θ_1, and the initial parameters can be set according to actual needs and experience.
  • the high-level parameters related to training, such as the learning rate and the gradient descent algorithm, can also be set reasonably.
  • various methods in the prior art can be adopted, and detailed descriptions will not be given here.
  • Step 73 Perform forward calculation, as follows:
  • the frequency domain information component and the corresponding side information component of each processed video data in the preset training set are input into the convolutional neural network of the preset structure, and the convolutional neural network performs convolution filtering on the frequency domain information components corresponding to the plurality of processed video data corresponding to the original sample images in the training set, to obtain the de-distorted frequency domain information components corresponding to the processed video data.
  • Specifically, the forward calculation of the convolutional neural network CNN with the parameter set Θ_i can be performed on the preset training set Ω to obtain the output F(Y) of the convolutional neural network, that is, the de-distorted frequency domain information component corresponding to each processed video data.
  • For the first forward calculation, the current parameter set is Θ_1.
  • For subsequent forward calculations, the current parameter set Θ_i is obtained by adjusting the previously used parameter set Θ_{i-1}.
  • H training samples can be selected from the training set, that is, the side information components and frequency domain information components corresponding to H processed video data are selected, where H is an integer greater than or equal to 1.
  • forward calculation is performed on the side information components and frequency domain information components corresponding to the selected H processed video data.
  • In this way, the de-distorted frequency domain information component corresponding to each training sample can be obtained.
  • Step 74 Determine the loss values of the multiple original sample images based on the original image color components of the multiple original sample images and the obtained de-distorted frequency domain information components.
  • For example, the loss value may be calculated as the mean square error (MSE):
  • L(Θ_i) = (1/H) Σ_{h=1}^{H} ||F(I_h | Θ_i) - X_h||²;
  • where H represents the number of image pairs selected from the preset training set in a single training; I_h represents the input data corresponding to the h-th processed video data, which is composed of the side information component and the frequency domain information component; F(I_h | Θ_i) represents the de-distorted frequency domain information component calculated by the convolutional neural network CNN under the parameter set Θ_i for the h-th processed video data; X_h represents the original image color component in the corresponding original sample image; and i is the count of the number of forward calculations that have been performed so far.
  • Step 75 Determine whether the convolutional neural network of the preset structure using the current parameter set converges based on the loss value. If it does not converge, go to step 76; if it converges, go to step 77.
  • the convergence may be determined when the loss value is less than the preset loss value threshold; or the convergence may be determined when the difference between the calculated loss value and the last calculated loss value is less than the preset change threshold value. No limitation here.
  • Step 76 Adjust the parameters in the current parameter set to obtain the adjusted parameter set, and then proceed to step 73 for the next forward calculation.
  • the back propagation algorithm can be used to adjust the parameters in the current parameter set.
  • the parameters in the current parameter set may be adjusted according to the loss values of multiple original sample images to obtain the adjusted parameter set.
  • Specifically, the parameters in the current parameter set may be adjusted according to the difference between the de-distorted frequency domain information component corresponding to each selected training sample and the original image color component in the original sample image corresponding to that training sample, so as to obtain the adjusted parameter set.
  • Step 77 The current parameter set is used as the output final parameter set Θ_final, and the convolutional neural network with the preset structure using the final parameter set Θ_final is used as the trained convolutional neural network model.
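  • The following compact training-loop sketch ties steps 72 to 77 together, using randomly generated tensors as stand-ins for the image pairs of the preset training set Ω, mean square error as the loss, a fixed loss threshold as the convergence test, and back propagation via stochastic gradient descent to adjust the parameter set. The model definition, hyperparameters, and convergence threshold are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

# Minimal stand-in model: input conv -> hidden conv -> output conv with residual add
class TinyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.c1 = nn.Conv2d(2, 16, 3, padding=1)
        self.c2 = nn.Conv2d(16, 16, 3, padding=1)
        self.c3 = nn.Conv2d(16, 1, 3, padding=1)

    def forward(self, y, m):
        f = torch.relu(self.c1(torch.cat([y, m], dim=1)))
        f = torch.relu(self.c2(f))
        return self.c3(f) + y

# Stand-in training set: H pairs of (distorted frequency component, side component, label X_h)
H = 8
labels = torch.randn(H, 1, 32, 32)                    # labels X_h taken from the original sample images
freq = labels + 0.1 * torch.randn(H, 1, 32, 32)       # distorted frequency domain components
side = torch.full((H, 1, 32, 32), 37 / 51.0)          # side information components (QP guide maps)

model = TinyCNN()                                     # step 72: preset structure, initial parameter set
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
mse = nn.MSELoss()
loss_threshold = 1e-3                                 # illustrative convergence criterion

for step in range(1000):
    out = model(freq, side)                           # step 73: forward calculation F(I_h | Θ_i)
    loss = mse(out, labels)                           # step 74: loss over the H selected samples
    if loss.item() < loss_threshold:                  # step 75: convergence check
        break                                         # step 77: keep current parameters as the final set
    optimizer.zero_grad()
    loss.backward()                                   # step 76: back propagation
    optimizer.step()                                  # adjust the current parameter set
```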
  • Step 203 Generate a de-distorted image corresponding to the processed video data according to the de-distorted frequency domain information component.
  • the de-distorted image may be used as a reference image, and the reference image is used to encode the original video data after the current original video data to obtain a video bit stream.
  • In the first video encoding system, the de-distorted frequency domain information component is a frequency domain reconstructed image. Therefore, in this step, the second inverse transform unit inversely transforms the de-distorted frequency domain information component, the inverse-transformed video data is determined as the de-distorted image, and the de-distorted image can also be saved in the buffer as a reference image. In this way, the motion estimation and motion compensation unit performs inter prediction on the original video data following the current original video data according to the reference image buffered in the buffer to obtain inter mode information, so that the reference image is used to encode the original video data following the current original video data to obtain the video bitstream.
  • In the second video encoding system, the de-distorted frequency domain information component is a frequency domain residual coefficient. Therefore, in this step, the de-distorted frequency domain information component is inversely transformed by the first inverse transform unit, the inverse-transformed frequency domain information is input to the reconstruction unit, and the reconstruction unit outputs de-distorted reconstructed video data according to the inverse-transformed frequency domain information and the mode information (intra-mode information and inter-mode information). The de-distorted reconstructed video data is the de-distorted image, which can be used as a reference image and stored in the buffer. The motion estimation and motion compensation unit then performs inter prediction on the original video data following the current original video data according to the reference image buffered in the buffer to obtain inter mode information, so that the reference image is used to encode the original video data following the current original video data to obtain the video bitstream.
  • the frequency domain information component and the side information component generated by the video encoding system during the video encoding process are obtained and filtered by the CNN to obtain the de-distorted frequency domain information component. Because the filtered de-distorted frequency domain information component removes the distortion that occurs in the frequency domain, using the de-distorted frequency domain information component to generate a de-distorted image as a reference image can improve the subjective quality of the reference image, and then using the reference image to encode the original video data following the current original video data improves the accuracy of subsequently encoded video data.
  • an embodiment of the present application provides an image processing method.
  • the method includes:
  • Step 301 Obtain the frequency domain information component and the side information component corresponding to the processed video data.
  • the processed video data is distorted relative to the original video data before encoding corresponding to the video bit stream input to the decoding system.
  • the side information component indicates the distortion characteristics of the processed video data relative to the original video data.
  • Step 302 The frequency domain information component and the side information component are input into a convolutional neural network model and subjected to convolution filtering to obtain a de-distorted frequency domain information component.
  • the de-distorted frequency domain information component is obtained after the frequency domain information component is filtered under the guidance of the side information component.
  • Step 303 Generate a de-distorted image corresponding to the processed video data according to the de-distorted frequency domain information component.
  • the frequency domain information component and the side information component generated by the video decoding system during the video decoding process are obtained and filtered by the CNN to obtain the de-distorted frequency domain information component. Because the filtered de-distorted frequency domain information component removes the distortion occurring in the frequency domain, using the de-distorted frequency domain information component to generate a de-distorted image improves the subjective quality of the image.
  • the detailed implementation process of the method may include:
  • Step 401 Entropy decode the received video bit stream to obtain current entropy decoded data.
  • Step 402 Acquire the frequency domain information component and the side information component corresponding to the processed video data.
  • the frequency domain information component and the side information component are generated when decoding the current entropy decoded data, and the side information component represents the distortion characteristics of the processed video data relative to the original video data, and the original video data is the video data corresponding to the current entropy decoded data .
  • a video decoding system can be used for video decoding, and frequency domain information components and side information components can be obtained from the video decoding system.
  • the first video decoding system includes a prediction module, an entropy decoder, an inverse quantization unit, a first inverse transform unit, a reconstruction unit, a CNN (convolutional neural network model), a buffer, and other parts.
  • the process of decoding using the first video decoding system is: the received video bitstream is input into the entropy decoder, and the entropy decoder performs entropy decoding on the bitstream to obtain entropy decoded data, which includes mode information, quantization parameters, quantized residual information, and the like. The quantized residual information is the processed video data. The mode information is input to the prediction module, the quantized residual information is input to the inverse quantization unit, and the inverse quantization unit performs inverse quantization on the quantized residual information to obtain the second residual coefficient.
  • the prediction module predicts the input mode information according to the reference image in the buffer to obtain prediction mode information, and inputs the prediction mode information to the reconstruction unit.
  • the prediction module includes an intra prediction unit, a motion compensation unit, and a switch.
  • the mode information may include intra mode information and inter mode information.
  • the switch selects whether to input the intra mode information or the inter mode information to the reconstruction unit.
  • the intra prediction unit can predict the intra mode information to obtain the intra prediction mode information.
  • the motion compensation unit performs inter prediction on the inter mode information according to the reference image buffered in the buffer to obtain the inter prediction mode information.
  • the switch selects whether to output the intra prediction mode information or the inter prediction mode information to the reconstruction unit.
  • In this way, the second residual coefficient generated by the inverse quantization unit is obtained as the frequency domain information component corresponding to the processed video data, the quantization parameter and inter mode information generated by the entropy decoder are obtained, and side information components corresponding to the processed video data are generated according to the quantization parameter and the inter-mode information.
  • the operation of generating the side information component may be:
  • a side information guide map is generated, and the side information guide map is a guide map of the same height and width as the current original video data generated according to the quantization parameter;
  • the side information guide map matching the inter-frame mode information is determined as the side information component.
  • For the second video decoding system, see FIG. 4-3. The difference between the second video decoding system and the first video decoding system is that, in the second video decoding system, the inverse quantization unit is connected to the first inverse transform unit, and the transform unit, the CNN, and the second inverse transform unit are connected in series between the reconstruction unit and the buffer.
  • the process of decoding using the second video decoding system differs from the process using the first video decoding system in that the inverse quantization unit performs inverse quantization on the quantized residual information input by the entropy decoder to obtain a second residual coefficient and inputs the second residual coefficient to the first inverse transform unit; the first inverse transform unit performs inverse transform processing on the second residual coefficient to obtain fifth residual information and inputs the fifth residual information to the reconstruction unit; the reconstruction unit generates distortion reconstructed video data according to the input fifth residual information and the intra prediction mode information, or according to the fifth residual information and the inter prediction mode information, where the distortion reconstructed video data is the processed video data, and inputs the distortion reconstructed video data to the transform unit; the transform unit performs transform processing on the distortion reconstructed video data to obtain frequency domain information components.
  • In this way, the frequency domain information component corresponding to the processed video data, obtained by transforming the distortion reconstructed video data, and the quantization parameter generated by the entropy decoder are obtained, and the side information component corresponding to the processed video data is generated according to the quantization parameter.
  • the third video decoding system includes an intra prediction module, an entropy decoder, an inverse quantization unit, a first inverse transform unit, a reconstruction unit, a transform unit, a CNN (convolutional neural network model), a second inverse transform unit, a buffer, and other components.
  • the process of decoding using the third video decoding system is as follows: the received video bitstream is input into the entropy decoder, and the entropy decoder decodes the bitstream to obtain entropy decoded data, which includes intra-mode information, quantization parameters, quantized residual information, and the like. The intra-mode information is input to the intra prediction module, the quantized residual information is input to the inverse quantization unit, and the inverse quantization unit performs inverse quantization on the quantized residual information to obtain the second residual coefficient and inputs the second residual coefficient to the first inverse transform unit; the first inverse transform unit inversely transforms the second residual coefficient to obtain fifth residual information, and inputs the fifth residual information to the reconstruction unit.
  • the intra prediction module predicts the input intra mode information according to the reference image in the buffer to obtain the intra prediction mode information, and inputs the intra prediction mode information to the reconstruction unit.
  • the reconstruction unit generates distortion reconstructed video data based on the input fifth residual information and the intra prediction mode information; the distortion reconstructed video data is the processed video data and is input to the transform unit; the transform unit performs transform processing on the distortion reconstructed video data to obtain the frequency domain information component.
  • Correspondingly, in this step, the frequency domain information component corresponding to the processed video data is obtained by transforming the distortion reconstructed video data, the quantization parameter generated by the entropy decoder is obtained, and the side information component corresponding to the processed video data is generated according to the quantization parameter.
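The text above only states that a transform unit maps the distortion reconstructed video data into the frequency domain; it does not fix the transform type. A minimal sketch, assuming for illustration a block-wise 2-D DCT with an 8x8 block size (both assumptions, not taken from the disclosure), could look like this:

```python
import numpy as np
from scipy.fft import dctn

def to_frequency_domain(reconstructed, block=8):
    """Apply a block-wise 2-D DCT to the distortion reconstructed video data.

    The DCT and the 8x8 block size are illustrative assumptions; the document only
    requires that a transform unit produce the frequency domain information component.
    """
    height, width = reconstructed.shape
    coeffs = np.zeros_like(reconstructed)
    for y in range(0, height, block):
        for x in range(0, width, block):
            coeffs[y:y + block, x:x + block] = dctn(
                reconstructed[y:y + block, x:x + block], norm="ortho")
    return coeffs

frequency_component = to_frequency_domain(np.random.rand(64, 64).astype(np.float32))
print(frequency_component.shape)  # (64, 64)
```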
  • Step 403 Input the frequency domain information component and the side information component into the convolutional neural network model to perform convolution filtering processing to obtain the de-distorted frequency domain information component corresponding to the processed video data.
  • the convolutional neural network model is obtained by training based on a preset training set.
  • the preset training set includes the image information of the original sample images, multiple frequency domain information components corresponding to each original sample image, and the side information component corresponding to the processed video data of each original sample image.
  • the processed video data corresponding to the original sample image and the side information component corresponding to the original sample image are used as the training sample of the convolutional neural network model, and the original image color component in the original sample image is used as the labeling information of the training sample.
  • Each training sample in the training set corresponds to an original sample image.
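To make the filtering step concrete, the following PyTorch sketch stacks the frequency domain information component and the side information component as two input channels and applies a three-layer structure (5x5/64, 1x1/32, 3x3/1 with a residual output) matching the example layer parameters given elsewhere in this description; the padding choices, tensor sizes, and the single training step shown are assumptions added for illustration only:

```python
import torch
import torch.nn as nn

class FreqDomainCNN(nn.Module):
    """Sketch of the de-distortion CNN: the two components are merged along the
    channel dimension, filtered by convolutional layers, and the output layer adds
    the input frequency component back (residual learning)."""
    def __init__(self):
        super().__init__()
        self.input_layer = nn.Conv2d(2, 64, kernel_size=5, padding=2)
        self.hidden_layer = nn.Conv2d(64, 32, kernel_size=1)
        self.output_layer = nn.Conv2d(32, 1, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, freq_component, side_info_component):
        x = torch.cat([freq_component, side_info_component], dim=1)  # channel-wise merge
        x = self.relu(self.input_layer(x))
        x = self.relu(self.hidden_layer(x))
        return self.output_layer(x) + freq_component  # residual-learning output

model = FreqDomainCNN()
freq = torch.randn(1, 1, 64, 64)   # frequency domain information component
side = torch.randn(1, 1, 64, 64)   # side information component (guide map)
de_distorted = model(freq, side)
print(de_distorted.shape)          # torch.Size([1, 1, 64, 64])

# One illustrative training step: the label stands in for the labeling information
# taken from the original sample image, and MSE is used as the loss.
label = torch.randn(1, 1, 64, 64)
loss = nn.functional.mse_loss(model(freq, side), label)
loss.backward()
```

A full training run would iterate such steps over the preset training set, adjusting the parameter set until the loss converges, as described for the training procedure.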
  • Step 404 Generate a de-distorted image according to the de-distorted frequency domain information component.
  • Optionally, when the first video decoding system is used, the de-distorted frequency domain information component is a frequency domain reconstructed image. Therefore, in this step, the first inverse transform unit performs an inverse transform on the de-distorted frequency domain information component output by the CNN to obtain fifth residual information, and inputs the fifth residual information to the reconstruction unit; the reconstruction unit generates de-distorted reconstructed video data according to the intra prediction mode information and the fifth residual information, or according to the inter prediction mode information and the fifth residual information; the de-distorted reconstructed video data is the de-distorted image.
  • Optionally, when the second or third video decoding system is used, the de-distorted frequency domain information component is a frequency domain residual coefficient. Therefore, in this step, the second inverse transform unit performs an inverse transform on the de-distorted frequency domain information component output by the CNN to obtain the de-distorted image.
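Continuing the block-wise DCT assumption from the earlier sketch (again an illustrative choice, not mandated by the text), the inverse transform that turns the de-distorted frequency domain information component back into a de-distorted image could be sketched as:

```python
import numpy as np
from scipy.fft import idctn

def to_pixel_domain(coeffs, block=8):
    """Inverse of the block-wise DCT sketch above: maps the de-distorted frequency
    domain information component back to a pixel-domain de-distorted image."""
    height, width = coeffs.shape
    image = np.zeros_like(coeffs)
    for y in range(0, height, block):
        for x in range(0, width, block):
            image[y:y + block, x:x + block] = idctn(
                coeffs[y:y + block, x:x + block], norm="ortho")
    return image

de_distorted_image = to_pixel_domain(np.random.rand(64, 64).astype(np.float32))
print(de_distorted_image.shape)  # (64, 64)
```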
  • Step 405 Use the de-distorted image as a reference image, and decode the subsequently received video bit stream according to the reference image.
  • the obtained de-distorted image is used as a reference image and stored in the buffer.
  • Alternatively, when the third video decoding system is used, the de-distorted image can be directly displayed.
  • In the embodiments of the present application, during video decoding, the frequency domain information component and the side information component generated by the video decoding system are obtained and filtered by the CNN to obtain the de-distorted frequency domain information component corresponding to the processed video data. Since the filtered de-distorted frequency domain information component has removed the distortion occurring in the frequency domain, using the de-distorted frequency domain information component to generate the reference image improves the subjective quality of the reference image, and using the reference image to decode the video bitstream following the current original video data improves the accuracy of decoding.
  • Referring to FIG. 5, an embodiment of the present application provides an image processing apparatus 500.
  • the apparatus 500 includes:
  • the obtaining module 501 is used to obtain the frequency domain information component and the side information component corresponding to the processed video data, where the processed video data is distorted relative to the original video data input to the encoding system, and the side information component represents the distortion characteristics of the processed video data relative to the original video data;
  • the filtering module 502 is configured to input the frequency domain information component and the side information component into a convolutional neural network model and perform filtering processing to obtain a de-distorted frequency domain information component.
  • the de-distorted frequency domain information component is obtained by filtering the frequency domain information component under the guidance of the side information component;
  • the generating module 503 is configured to generate a de-distorted image corresponding to the processed video data according to the de-distorted frequency domain information component.
  • the obtaining module 501 is used to:
  • generate the side information component according to the quantization parameter and the inter mode information.
  • the generating module 503 is used to:
  • determine the de-distorted reconstructed video data as the de-distorted image.
  • the obtaining module 501 is used to:
  • generate a side information guide map according to the quantization parameter, the side information guide map being a guide map that has the same height and width as the original video data;
  • update the side information guide map according to the inter mode information to generate a side information guide map matching the inter mode information;
  • determine the side information guide map matching the inter mode information as the side information component.
  • the processed video data is distortion reconstructed video data corresponding to the original video data
  • the obtaining module 501 is used to:
  • generate the side information component according to the quantization parameter.
  • the generating module 503 is used to:
  • perform an inverse transform on the de-distorted frequency domain component, and determine the video data obtained after the inverse transform as the de-distorted image.
  • In the embodiments of the present application, during video encoding, the frequency domain information component and the side information component generated by the video encoding system are obtained and filtered by the CNN to obtain the de-distorted frequency domain information component. Since the filtered de-distorted frequency domain information component has removed the distortion occurring in the frequency domain, the image generated from the de-distorted frequency domain information component is free of that distortion; this image is used as a reference image, and the reference image is used to encode the original video data following the current original video data, which improves the accuracy of subsequently encoded images.
  • an embodiment of the present application provides an image processing apparatus 600.
  • the apparatus 600 includes:
  • the obtaining module 601 is used to obtain the frequency domain information component and the side information component corresponding to the processed video data.
  • the processed video data is distorted relative to the original video data before encoding corresponding to the video bitstream input to the decoding system.
  • the side information component represents the distortion characteristics of the processed video data relative to the original video data;
  • the filtering module 602 is configured to input the frequency domain information component and the side information component into a convolutional neural network model and perform a convolution filtering process to obtain a de-distorted frequency domain information component.
  • the de-distorted frequency domain information component is obtained by filtering the frequency domain information component under the guidance of the side information component;
  • the generating module 603 is configured to generate a de-distorted image corresponding to the processed video data according to the de-distorted frequency domain information component.
  • the obtaining module 601 is used to:
  • generate the side information component according to the quantization parameter and the inter mode information.
  • the generating module 603 is used to:
  • determine the de-distorted reconstructed video data as the de-distorted image.
  • the obtaining module 601 is used to:
  • generate a side information guide map according to the quantization parameter, the side information guide map being a guide map that has the same height and width as the original video data;
  • update the side information guide map according to the inter mode information to generate a side information guide map matching the inter mode information;
  • determine the side information guide map matching the inter mode information as the side information component.
  • the processed video data is distortion reconstructed video data corresponding to the original video data
  • the obtaining module 601 is used to:
  • generate the side information component according to the quantization parameter.
  • the generating module 603 is used to:
  • perform an inverse transform on the de-distorted frequency domain component, and determine the video data obtained after the inverse transform as the de-distorted image.
  • In the embodiments of the present application, during video decoding, the frequency domain information component and the side information component generated by the video decoding system are obtained and filtered by the CNN to obtain the de-distorted frequency domain information component. Since the filtered de-distorted frequency domain information component has removed the distortion occurring in the frequency domain, the de-distorted frequency domain information component can be used to generate a de-distorted image, and the de-distorted image is used as a reference image to decode the video bitstream following the current original video data, which improves the accuracy of decoding.
  • an embodiment of the present application provides an image processing system 700.
  • the system 700 includes a video encoding device 701 provided in the embodiment shown in FIG. 5 and a video decoding device 702 provided in the embodiment shown in FIG. 6.
  • FIG. 8 shows a structural block diagram of an electronic device 800 provided by an exemplary embodiment of the present invention.
  • the electronic device 800 may be a portable mobile terminal, such as a smart phone, a tablet computer, a notebook computer, or a desktop computer.
  • the electronic device 800 may also be called other names such as user equipment, portable terminal, laptop terminal, and desktop terminal.
  • the electronic device 800 includes a processor 801 and a memory 802.
  • the processor 801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on.
  • the processor 801 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), or PLA (Programmable Logic Array).
  • the processor 801 may also include a main processor and a co-processor.
  • the main processor is a processor for processing data in a wake-up state, also known as a CPU (Central Processing Unit).
  • the coprocessor is a low-power processor for processing data in the standby state.
  • the processor 801 may be integrated with a GPU (Graphics Processing Unit).
  • the GPU is used to render and draw content that needs to be displayed on the display screen.
  • the processor 801 may further include an AI (Artificial Intelligence) processor, which is used to process computing operations related to machine learning.
  • the memory 802 may include one or more computer-readable storage media, which may be non-transitory.
  • the memory 802 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices.
  • the non-transitory computer-readable storage medium in the memory 802 is used to store at least one instruction that is executed by the processor 801 to implement the video encoding provided by the method embodiment in the present application Method or video decoding method.
  • the electronic device 800 may optionally further include: a peripheral device interface 803 and at least one peripheral device.
  • the processor 801, the memory 802, and the peripheral device interface 803 may be connected by a bus or a signal line.
  • Each peripheral device may be connected to the peripheral device interface 803 through a bus, a signal line, or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 804, a touch display screen 805, a camera 806, an audio circuit 807, a positioning component 808, and a power supply 809.
  • the peripheral device interface 803 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 801 and the memory 802.
  • In some embodiments, the processor 801, the memory 802, and the peripheral device interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral device interface 803 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
  • the radio frequency circuit 804 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 804 communicates with the communication network and other communication devices through electromagnetic signals.
  • the radio frequency circuit 804 converts the electrical signal into an electromagnetic signal for transmission, or converts the received electromagnetic signal into an electrical signal.
  • the radio frequency circuit 804 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a user identity module card, and so on.
  • the radio frequency circuit 804 can communicate with other terminals through at least one wireless communication protocol.
  • the wireless communication protocol includes, but is not limited to: the World Wide Web, metropolitan area networks, intranets, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks.
  • the radio frequency circuit 804 may further include NFC (Near Field Communication) related circuits, which is not limited in this application.
  • the display screen 805 is used to display a UI (User Interface, user interface).
  • the UI may include graphics, text, icons, video, and any combination thereof.
  • When the display screen 805 is a touch display screen, the display screen 805 also has the ability to collect touch signals on or above the surface of the display screen 805.
  • the touch signal can be input to the processor 801 as a control signal for processing.
  • the display screen 805 can also be used to provide virtual buttons and / or virtual keyboards, also called soft buttons and / or soft keyboards.
  • In some embodiments, there may be one display screen 805, which is provided on the front panel of the electronic device 800; in other embodiments, there may be at least two display screens 805, respectively disposed on different surfaces of the electronic device 800 or in a folded design; in still other embodiments, the display screen 805 may be a flexible display screen, disposed on a curved surface or folding surface of the electronic device 800. The display screen 805 may even be set as a non-rectangular irregular figure, that is, a special-shaped screen.
  • the display screen 805 can be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
  • the camera component 806 is used to collect images or videos.
  • the camera assembly 806 includes a front camera and a rear camera.
  • the front camera is set on the front panel of the electronic device, and the rear camera is set on the back of the electronic device.
  • In some embodiments, there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blur function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fusion shooting functions.
  • the camera assembly 806 may also include a flash.
  • the flash can be a single color temperature flash or a dual color temperature flash. A dual color temperature flash refers to a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.
  • the audio circuit 807 may include a microphone and a speaker.
  • the microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals and input them to the processor 801 for processing, or input them to the radio frequency circuit 804 to implement voice communication.
  • the microphone can also be an array microphone or an omnidirectional acquisition microphone.
  • the speaker is used to convert the electrical signal from the processor 801 or the radio frequency circuit 804 into sound waves.
  • the speaker can be a traditional thin-film speaker or a piezoelectric ceramic speaker.
  • When the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but can also convert electrical signals into sound waves inaudible to humans for purposes such as ranging.
  • the audio circuit 807 may also include a headphone jack.
  • the positioning component 808 is used to locate the current geographic location of the electronic device 800 to implement navigation or LBS (Location Based Service, location-based service).
  • the positioning component 808 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
  • the power supply 809 is used to supply power to various components in the electronic device 800.
  • the power source 809 may be alternating current, direct current, disposable batteries, or rechargeable batteries.
  • the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery.
  • the wired rechargeable battery is a battery charged through a wired line
  • the wireless rechargeable battery is a battery charged through a wireless coil.
  • the rechargeable battery can also be used to support fast charging technology.
  • the electronic device 800 further includes one or more sensors 810.
  • the one or more sensors 810 include, but are not limited to: an acceleration sensor 811, a gyro sensor 812, a pressure sensor 813, a fingerprint sensor 814, an optical sensor 815, and a proximity sensor 816.
  • the acceleration sensor 811 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the electronic device 800.
  • the acceleration sensor 811 may be used to detect components of gravity acceleration on three coordinate axes.
  • the processor 801 may control the touch screen 805 to display the user interface in a landscape view or a portrait view according to the gravity acceleration signal collected by the acceleration sensor 811.
  • the acceleration sensor 811 can also be used for game or user movement data collection.
  • the gyro sensor 812 can detect the body direction and rotation angle of the electronic device 800, and the gyro sensor 812 can cooperate with the acceleration sensor 811 to collect a 3D action of the user on the electronic device 800. Based on the data collected by the gyro sensor 812, the processor 801 can realize the following functions: motion sensing (such as changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
  • the pressure sensor 813 may be disposed on the side frame of the electronic device 800 and / or the lower layer of the touch display screen 805.
  • the pressure sensor 813 can detect the user's grip signal on the electronic device 800, and the processor 801 can perform left-right hand recognition or shortcut operation according to the grip signal collected by the pressure sensor 813.
  • the processor 801 controls the operability control on the UI interface according to the pressure operation of the user on the touch screen 805.
  • the operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
  • the fingerprint sensor 814 is used to collect the user's fingerprint, and the processor 801 identifies the user's identity based on the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 identifies the user's identity based on the collected fingerprint. When the user's identity is recognized as a trusted identity, the processor 801 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings.
  • the fingerprint sensor 814 may be provided on the front, back, or side of the electronic device 800. When a physical button or manufacturer logo is provided on the electronic device 800, the fingerprint sensor 814 may be integrated with the physical button or manufacturer logo.
  • the optical sensor 815 is used to collect the ambient light intensity.
  • the processor 801 may control the display brightness of the touch display 805 according to the ambient light intensity collected by the optical sensor 815. Specifically, when the ambient light intensity is high, the display brightness of the touch display 805 is increased; when the ambient light intensity is low, the display brightness of the touch display 805 is decreased.
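As a toy illustration of the brightness behavior just described (the thresholds, step size, and function name are assumptions, not values from the document):

```python
def adjust_brightness(ambient_lux, current_brightness, high=500, low=50, step=10):
    """Raise the display brightness when ambient light is high and lower it when
    ambient light is low, as described for the optical sensor 815."""
    if ambient_lux > high:
        return min(100, current_brightness + step)
    if ambient_lux < low:
        return max(0, current_brightness - step)
    return current_brightness

print(adjust_brightness(800, 60))  # bright environment -> 70
print(adjust_brightness(20, 60))   # dim environment -> 50
```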
  • the processor 801 can also dynamically adjust the shooting parameters of the camera assembly 806 according to the ambient light intensity collected by the optical sensor 815.
  • the proximity sensor 816, also called a distance sensor, is usually provided on the front panel of the electronic device 800.
  • the proximity sensor 816 is used to collect the distance between the user and the front of the electronic device 800.
  • When the proximity sensor 816 detects that the distance between the user and the front of the electronic device 800 gradually decreases, the processor 801 controls the touch display screen 805 to switch from the screen-on state to the screen-off state; when the proximity sensor 816 detects that the distance between the user and the front of the electronic device 800 gradually increases, the processor 801 controls the touch display screen 805 to switch from the screen-off state to the screen-on state.
  • Those skilled in the art can understand that the structure shown in FIG. 8 does not constitute a limitation on the electronic device 800; the electronic device may include more or fewer components than shown, or combine certain components, or adopt a different component arrangement.

Abstract

The present application relates to an image processing method, apparatus and system, and belongs to the field of video coding and decoding. The method includes: obtaining a frequency domain information component and a side information component corresponding to processed video data, where the processed video data is distorted relative to original video data input to an encoding system, and the side information component represents distortion characteristics of the processed video data relative to the original video data; inputting the frequency domain information component and the side information component into a convolutional neural network model for filtering to obtain a de-distorted frequency domain information component, where the de-distorted frequency domain information component is obtained by filtering the frequency domain information component under the guidance of the side information component; and generating, according to the de-distorted frequency domain information component, a de-distorted image corresponding to the processed video data. The present application can remove distortion from images.

Description

图像处理的方法、装置及系统
本申请要求于2018年10月25日提交的申请号为201811253559.X、发明名称为“一种图像处理的方法、装置及系统”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及视频编解码领域,特别涉及一种图像处理的方法、装置及系统。
背景技术
在视频编码系统中,在对原始视频图像进行编码时,原始视频图像会被进行多次处理得到重构图像。在视频编码的过程中,该重构图像又可以作为参考图像,被用于对原始视频图像进行编码。
原始视频图像会被进行多次处理后得到的重构图像相对原始视频图像可能已经发生像素偏移,即重构图像存在失真,影响重构图像的主观质量。
发明内容
本申请实施例提供了一种图像处理的方法、视频解码方法、装置及系统,以去除图像的失真,。所述技术方案如下:
第一方面,本申请提供了一种图像处理的方法,所述方法包括:
获取已处理视频数据对应的频域信息分量和边信息分量,所述已处理视频数据相对于输入编码系统的原始视频数据存在失真,所述边信息分量表示所述已处理视频数据相对所述原始视频数据的失真特征;
将所述频域信息分量和所述边信息分量输入卷积神经网络模型进行滤波处理得到去失真频域信息分量,所述去失真频域信息分量是以所述边信息分量为引导对所述频域信息分量进行滤波之后得到的;
根据所述去失真频域信息分量,生成所述已处理视频数据对应的去失真图像。
第二方面,本申请提供了一种图像处理的方法,所述方法包括:
获取已处理视频数据对应的频域信息分量和边信息分量,所述已处理视频数据相对于输入解码系统的视频比特流对应的编码前的原始视频数据存在失真,所述边信息分量表示所述已处理视频数据相对所述原始视频数据的失真特征;
将所述频域信息分量和所述边信息分量输入卷积神经网络模型进行卷积滤波处理得到去失真频域信息分量,所述去失真频域信息分量是以所述边信息分量为引导对所述频域信息分量进行滤波之后得到的;
根据所述去失真频域信息分量,生成所述已处理视频数据对应的去失真图像。
第三方面,本申请提供了一种图像处理的装置,所述装置包括:
获取模块,用于获取已处理视频数据对应的频域信息分量和边信息分量,所述已处理视频数据相对于输入编码系统的原始视频数据存在失真,所述边信息分量表示所述已处理视频数据相对所述原始视频数据的失真特征;
滤波模块,用于将所述频域信息分量和所述边信息分量输入卷积神经网络模型进行滤波处理得到去失真频域信息分量,所述去失真频域信息分量是以所述边信息分量为引导对所述频域信息分量进行滤波之后得到的;
生成模块,用于根据所述去失真频域信息分量,生成所述已处理视频数据对应的去失真图像。
第四方面,本申请提供了一种图像处理的装置,所述装置包括:
获取模块,用于获取已处理视频数据对应的频域信息分量和边信息分量,所述已处理视频数据相对于输入解码系统的视频比特流对应的编码前的原始视频数据存在失真,所述边信息分量表示所述已处理视频数据相对所述原始视频数据的失真特征;
滤波模块,用于将所述频域信息分量和所述边信息分量输入卷积神经网络模型进行卷积滤波处理得到去失真频域信息分量,所述去失真频域信息分量是以所述边信息分量为引导对所述频域信息分量进行滤波之后得到的;
生成模块,用于根据所述去失真频域信息分量,生成所述已处理视频数据对应的去失真图像。
第五方面,本申请提供了一种计算机可读存储介质,所述计算机可读存储介质内存储有计算机程序,所述计算机程序被处理器执行时实现所述第一方面或第一方面任可选的方式提供的方法步骤或实现所述第二方面或第二方面任可选的方式提供的方法步骤。
第六方面,本申请提供了一种电子设备,其特征在于,所述电子设备包括:
至少一个处理器;和
至少一个存储器;
所述至少一个存储器存储有一个或多个程序,所述一个或多个程序被配置成由所述至少一个处理器执行,以执行所述第一方面或第一方面任可选的方式提供的方法步骤或实现所述第二方面或第二方面任可选的方式提供的方法步骤。
第七方面,本申请实施例提供了一种图像处理的系统,所述系统包括所述第三方面提供的视频编码装置和如所述第四方面提供的视频解码装置。
本申请实施例提供的技术方案可以包括以下有益效果:
通过已处理视频数据对应的频域信息分量和边信息分量,通过卷积神经网络模型对该频域信息分量和边信息分量进行滤波处理,得到去失真频域信息分量,由于去失真频域信息分量去除了在频域上发生的失真,所以根据去失真频域信息分量生成的图像去除了失真,提高了图像的主观质量。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,并不能限制本申请。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,示出了符合本申请的实施例,并与说明书一起用于解释本申请的原理。
图1是本申请实施例提供的一种图像处理的方法流程图;
图2-1是本申请实施例提供的另一种图像处理的方法流程图;
图2-2是本申请实施例提供的一种视频编码系统的结构框图;
图2-3是本申请实施例提供的另一种视频编码系统的结构框图;
图2-4是本申请实施例提供的边信息分量的示意图之一;
图2-5是本申请实施例提供的边信息分量的示意图之二;
图2-6是本申请实施例提供的技术方案的系统架构图;
图2-7是本申请实施例提供的技术方案的数据流示意图;
图2-8是本申请实施例获得去失真频域信息分量的示意图;
图2-9是本申请实施例提供的去失真方法的流程图;
图2-10是本申请实施例提供的去失真方法的数据流图;
图3是本申请实施例提供的一种图像处理的方法流程图;
图4-1是本申请实施例提供的另一种图像处理的方法流程图;
图4-2是本申请实施例提供的一种视频解码系统的结构框图;
图4-3是本申请实施例提供的另一种视频解码系统的结构框图;
图4-4是本申请实施例提供的另一种视频解码系统的结构框图;
图5是本申请实施例提供的一种图像处理的装置结构示意图;
图6是本申请实施例提供的一种图像处理的装置结构示意图;
图7是本申请实施例提供的一种图像处理的系统结构示意图;
图8是本申请实施例提供的一种装置结构示意图。
通过上述附图,已示出本申请明确的实施例,后文中将有更详细的描述。这些附图和文字描述并不是为了通过任何方式限制本申请构思的范围,而是通过参考特定实施例为本领域技术人员说明本申请的概念。
具体实施方式
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本申请相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本申请的一些方面相一致的装置和方法的例子。
参见图1,本申请实施例提供了一种图像处理的方法,所述方法包括:
步骤101:获取已处理视频数据对应的频域信息分量和边信息分量,已处理视频数据相 对于输入编码系统的原始视频数据存在失真,该边信息分量表示已处理视频数据相对原始视频数据的失真特征。
步骤102:将该频域信息分量和该边信息分量输入卷积神经网络模型进行滤波处理得到去失真频域信息分量,该去失真频域信息分量是以该边信息分量为引导对该频域信息分量进行滤波之后得到的。
步骤103:根据该去失真频域信息分量,生成已处理视频数据对应的去失真图像。
在本申请实施例中,在视频编码过程中,获取已处理视频数据对应的频域信息分量和边信息分量,通过卷积神经网络模型对视频编码系统产生的频域信息分量和边信息分量进行滤波处理,得到去失真频域信息分量,由于滤波后的去失真频域信息分量去除了在频域上发生的失真,所以使用去失真频域信息分量生成的图像去除了失真,提高生成的图像的主观质量,还可以使用生成的去失真图像作为参考图像并对当前原始视频数据之后的原始视频数据进行编码,提高了后续编码视频数据的准确性,提高了在视频编码过程中去失真性能。
对于图1所示的图像处理的方法,参见图2-1,该方法的详细实现过程,可以包括:
步骤201:获取已处理视频数据对应的频域信息分量和边信息分量。
可选的,可以使用视频编码系统进行视频编码,从视频编码系统中获取已处理视频数据对应的频域信息分量和边信息分量。视频编码系统有多种,在本步骤中列举了如下两种视频编码系统。
第一种视频编码系统,参见图2-2所示的第一种视频编码系统的结构示意图,第一种视频编码系统包括预测模块、加法器、第一变换单元、量化单元、熵编码器、反量化单元、第一反变换单元、重建单元、第二变换单元、CNN(卷积神经网络模型)、第二反变换单元和缓存器等部分组成。
该视频编码系统编码的过程为:将当前原始视频数据输入到预测模块和加法器中,预测模块根据缓存器中的参考图像对输入的当前原始视频数据进行预测得到模式信息,并将该模式信息输入到加法器、熵编码器和重建单元。其中,预测模块包括帧内预测单元、运动估计与运动补偿单元和开关。帧内预测单元可以对当前原始视频数据进行帧内预测得到帧内模式信息,将该帧内模式信息输入到熵编码器,运动估计与运动补偿单元根据缓存器中缓存的参考图像对当前原始视频数据进行帧间预测得到帧间模式信息,将该帧间模式信息输入到熵编码器,开关选择将帧内模式信息或将帧间模式信息输出给加法器和重建单元。
加法器根据该模式信息和当前原始视频数据产生初始残差数据,第一变换单元对初始残差数据进行变换处理,将变换处理的结果输出给量化单元;量化单元根据量化参数对变换处理的结果进行量化得到量化残差信息,将该量化残差信息输出给熵编码器和反量化单元;熵编码器对该量化残差信息和模式信息等信息(模式信息包括帧内模式信息和帧间模式信息)进行编码形成视频比特流,该视频比较流中可以包括原始视频数据中的每个编码单元的编码信息。
同时,反量化单元对该量化残差信息进行反量化处理得到第一残差系数,将第一残差系数输入到第一反变换单元,第一反变换单元对第一残差系数进行反变换处理得到第二残差信息,将第二残差信息输入到重建单元中;重建单元根据第二残差信息和该模式信息(帧 内模式信息和帧间模式信息)生成失真重建视频数据,将失真重建视频数据作为已处理视频数据输入到第二变换单元,第二变换单元对已处理视频数据进行变换得到已处理视频数据对应的视频数据频域信息。相应的,在本步骤中,可以获取该视频数据频域信息,根据该视频数据频域信息生成已处理视频数据对应的频域信息分量,以及获取量化单元采用的量化参数,该量化参数用于表征量化步长,根据该量化参数生成已处理视频数据对应的边信息分量。
第二种视频编码系统,参见图2-3所示的第二种视频编码系统的结构示意图,第二种视频编码系统与第一种视频编码系统的区别在于:在第二种视频编码系统中,卷积神经网络模型可以串联在反量化单元和第一反变化单元之间,并且在第二种视频编码系统省去第二变化单元和第二反变化单元。
该视频编码系统编码的过程为:将当前原始视频数据输入到预测模块和加法器中,预测模块根据缓存器中的参考图像对输入的当前原始视频数据进行预测得到模式信息,并将该模式信息输入到加法器、熵编码器和重建单元。其中,预测模块包括的帧内预测单元可以对当前原始视频数据进行帧内预测得到帧内模式信息,将该帧内模式信息输入到熵编码器,预测模块包括的运动估计与运动补偿单元根据缓存器中缓存的参考图像对当前原始视频数据进行帧间预测得到帧间模式信息,将该帧间模式信息输入到熵编码器,预测模块包括的开关选择将帧内模式信息或将帧间模式信息输出给加法器和重建单元。
加法器根据该模式信息和当前原始视频数据产生初始残差数据,第一变换单元对初始残差数据进行变换处理,将变换处理的结果输出给量化单元;量化单元根据量化参数对变换处理的结果进行量化得到待编码视频数据,待编码视频数据就是已处理视频数据,其也是量化残差信息,将该已处理视频数据输出给熵编码器和反量化单元;熵编码器对该已处理视频数据和模式信息等信息(模式信息包括帧内模式信息和帧间模式信息)进行编码形成视频比特流,该视频比较流中可以包括原始视频数据中的每个编码单元的编码信息。
同时,反量化单元对该已处理视频数据进行反量化处理得到第一残差系数,然后再根据第一残差系数生成已处理视频数据对应的频域信息分量,其中,该生成过程可以为:将第一残差系数输入到第一反变换单元,第一反变换单元对第一残差系数进行反变换处理得到第二残差系数,将第二残差系数输入到重建单元中;重建单元根据第二残差系数和该模式信息(帧内模式信息和帧间模式信息)生成失真重建视频数据,将失真重建视频数据输入到第二变换单元,第二变换单元对失真重建视频数据进行变换得到已处理视频数据对应的视频数据频域信息。相应的,在本步骤中,可以获取该视频数据频域信息,根据该视频数据频域信息生成已处理视频数据对应的频域信息分量,以及获取量化单元采用的量化参数,该量化参数用于表征量化步长,获取已处理视频数据对应的帧间模式信息,根据该量化参数和该帧间模式信息,生成边信息分量。
可选的,根据该量化参数和该帧间模式信息,生成边信息分量的操作,可以为:
根据该量化参数,生成边信息引导图,该边信息引导图是根据该量化参数生成的与当前原始视频数据等高等宽的引导图;
根据该帧间模式信息,对该边信息引导图进行更新,生成与该帧间模式信息匹配的边信息引导图;
将与该帧间模式信息匹配的边信息引导图确定为该边信息分量。
步骤202:将该频域信息分量和该边信息分量输入卷积神经网络模型进行滤波处理得到去失真频域信息分量。
已处理视频数据对应的边信息分量表示已处理视频数据相对原始视频数据的失真特征。
可选的,失真特征可以至少包括如下失真特征之一:
失真程度、失真位置,失真类型:
边信息分量可以表示已处理视频数据相对原始视频数据的失真程度。
边信息分量也可以表示已处理视频数据相对原始视频数据的失真类型,例如在视频编解码应用中,图像中不同编码单元可能采用不同预测模式,不同预测模式会影响残差数据的分布,从而影响失真的目标图像块的特征,因此,编码单元的模式信息可以作为一种表征失真类型的边信息。
如图2-4所示,边信息分量的矩阵结构与频域信息分量的矩阵结构相同,其中,坐标[0,0]、[0,1]表示失真位置,矩阵的元素值1表示失真程度,即边信息分量同时能表示失真程度与失真位置。
又如图2-5所示,坐标[0,0]、[0,1]、[2,0]、[2,4]表示失真位置,矩阵的元素值1、2表示失真类型,即边信息分量同时能表示失真类型与失真位置。
并且,本申请实施例提供的上述解决方案中,可以同时包括图2-4和图2-5分别所示意的两个边信息分量。
进一步的,根据方案的实际应用情况和需要,当频域信息分量包括多种时,边信息分量可以包括分别与每种频域信息分量对应的边信息分量。
本申请实施例提供的上述解决方案,可以应用于目前已知的各种实际应用场景中,例如,可应用于对图像进行超分辨率处理的应用场景中,本发明在此不做限定。
可选的,参见图2-6,卷积神经网络模型,包括:边信息分量生成模块11,卷积神经网络12,网络训练模块13;
边信息分量生成模块11可以用于生成边信息分量;网络训练模块13可以根据原始样本图像对卷积神经网络模型进行训练,使得训练后的卷积神经网络模型可以对输入的频域信息分量和边信息分量进行滤波得到去失真频域信息分量。
其中,卷积神经网络12可以包括如下三层结构:
输入层处理单元121,用于接收卷积神经网络的输入,本方案中包括频域信息分量,以及边信息分量;并对输入的数据进行第一层的卷积滤波处理;
隐含层处理单元122,对输入层处理单元121的输出数据,进行至少一层的卷积滤波处理;
输出层处理单元123,对隐含层处理单元122的输出数据,进行最后一层的卷积滤波处理,输出结果作为去失真频域信息分量,用于生成去失真图像。
图2-7为实现该解决方案的数据流的示意图,其中,频域信息分量以及边信息分量作为输入数据,输入到预先训练的卷积神经网络模型中;或者,根据边信息生成边信息引导图,该边信息可以为量化参数和/或帧间模式信息,频域信息分量以及边信息引导图作为输入数据,输入到预先训练的卷积神经网络模型中。卷积神经网络模型可以由预设结构的卷积神经网络和配置的网络参数集进行表示,输入数据经过输入层、隐含层和输出层的卷积滤波 处理之后,得到去失真频域信息分量。
作为卷积神经网络模型的输入数据,根据实际需要,可以包括一种或多种边信息分量,也可以包括一种或多种频域信息分量。
对于一个图像的每个像素点的存储数据,该存储数据为在该图像位于该像素点位置保存的数据,包括该像素点的所有颜色分量的值,在获得已处理视频数据对应的频域信息分量时,可以根据需要,从每个像素点的存储数据中,提取出需要的一种或多种颜色分量的值,从而得到已处理视频数据对应的频域信息分量。
参见图2-8,本步骤可以具体包括如下处理步骤:
本发明实施例中,以卷积神经网络模型包括输入层、隐含层和输出层的结构为例,对方案进行描述。
步骤61、将该频域信息分量以及该边信息分量,作为预先建立的卷积神经网络模型的输入数据,由输入层进行第一层的卷积滤波处理,具体可以如下:
参见图2-9,在卷积神经网络模型中,输入数据可以是通过各自的通道输入到网络中,本步骤中,可以将c y通道的频域信息分量Y与c m通道的边信息分量M,在通道的维度上进行合并,共同组成c y+c m通道的输入数据I,并采用如下公式对输入数据I进行多维卷积滤波和非线性映射,产生n 1个以稀疏形式表示的图像块:
F 1(I)=g(W 1*I+B 1);
其中,F 1(I)为输入层的输出,I为输入层中卷积层的输入,*为卷积操作,W 1为输入层的卷积层滤波器组的权重系数,B 1为输入层的卷积层滤波器组的偏移系数,g()为非线性映射函数。
其中,W 1对应于n 1个卷积滤波器,即有n 1个卷积滤波器作用于输入层的卷积层的输入,输出n 1个图像块;每个卷积滤波器的卷积核的大小为c 1×f 1×f 1,其中c 1为输入通道数,f 1为每个卷积核在空间上的大小。
在一个具体的实施例中,该输入层的参数可以为:c 1=2,f 1=5,n 1=64,使用ReLU(Rectified linear unit)函数作为g(),它的函数表达式为:
g(x)=max(0,x);
则该实施例中输入层卷积处理表达式为:
F 1(I)=max(0,W 1*I+B 1);
步骤62、隐含层对输入层输出的稀疏表示的图像块F 1(I)进行进一步的高维映射。
本发明实施例中,不对隐含层中包含的卷积层层数、卷积层连接方式、卷积层属性等作限定,可以采用目前已知的各种结构,但隐含层中包含至少1个卷积层。
例如,参见图2-9,隐含层包含N-1(N≥2)层卷积层,隐含层处理由下式表示:
F i(I)=g(W i*F i-1(I)+B i),i∈{2,3,…,N};
其中,F i(I)表示卷积神经网络中第i层卷积层的输出,*为卷积操作,W i为第i层卷积层滤波器组的权重系数,B i为第i层卷积层滤波器组的偏移系数,g()为非线性映射函数。
其中,W i对应于n i个卷积滤波器,即有n i个卷积滤波器作用于第i层卷积层的输入,输出n i个图像块;每个卷积滤波器的卷积核的大小为c i×f i×f i,其中c i为输入通道数,f i为每个卷积核在空间上的大小。
在一个具体的实施例中,该隐含层可以包括1个卷积层,该卷积层的卷积滤波器参数为: c 2=64,f 2=1,n 2=32,使用ReLU(Rectified linear unit)函数作为g(),则该实施例中隐含层的卷积处理表达式为:
F 2(I)=max(0,W 2*F 1(I)+B 2);
步骤63、输出层对隐含层输出的高维图像块F N(I)进行聚合,输出去失真频域信息分量,用于生成去失真图像。
本发明实施例中不对输出层的结构作限定,输出层可以是Residual Learning结构,也可以是Direct Learning结构,或者其他的结构。
采用Residual Learning结构的处理如下:
对隐含层的输出进行卷积操作获取补偿残差,再与输入的频域信息分量相加,得到去失真频域信息分量。输出层处理可由下式表示:
F(I)=W N+1*F N(I)+B N+1+Y;
其中,F(I)为输出层的输出,F N(I)为隐含层的输出,*为卷积操作,W N+1为输出层的卷积层滤波器组的权重系数,B N+1为输出层的卷积层滤波器组的偏移系数,Y为未经过卷积滤波处理、欲进行去失真处理的频域信息分量。
其中,W N+1对应于n N+1个卷积滤波器,即有n N+1个卷积滤波器作用于第N+1层卷积层的输入,输出n N+1个图像块,n N+1为输出的去失真频域信息分量个数,一般与输入的频域信息分量的个数相等,如果只输出一种去失真频域信息分量,则n N+1一般取值为1;每个卷积滤波器的卷积核的大小为c N+1×f N+1×f N+1,其中c N+1为输入通道数,f N+1为每个卷积核在空间上的大小。
采用Direct Learning结构的处理如下:
对隐含层的输出进行卷积操作后直接输出去失真频域信息分量,即得到去失真的图像块。输出层处理可由下式表示:
F(I)=W N+1*F N(I)+B N+1
其中,F(I)为输出层输出,F N(I)为隐含层的输出,*为卷积操作,W N+1为输出层的卷积层滤波器组的权重系数,B N+1为输出层的卷积层滤波器组的偏移系数。
其中,W N+1对应于n N+1个卷积滤波器,即有n N+1个卷积滤波器作用于第N+1层卷积层的输入,输出n N+1个图像块,n N+1为输出的去失真频域信息分量个数,一般与输入的频域信息分量的个数相等,如果只输出一种去失真频域信息分量,则n N+1一般取值为1;每个卷积滤波器的卷积核的大小为c N+1×f N+1×f N+1,其中c N+1为输入通道数,f N+1为每个卷积核在空间上的大小。
在一个具体的实施例中,该输出层采用Residual Learning结构,输出层包括1个卷积层,该输出层的卷积滤波器参数为:c 3=32,f 3=3,n 3=1,则该实施例中输出层的卷积处理表达式为:
F(I)=W 3*F 3(I)+B 3+Y。
在本发明实施例提供的上述解决方案中,还提出了一种卷积神经网络模型训练方法,如图2-10所示,具体包括如下处理步骤:
步骤71、获取预设训练集,预设训练集包括原始样本图像,以及原始样本图像对应的多个已处理视频数据对应的频域信息分量,以及每个已处理视频数据对应的边信息分量,其中,已处理视频数据对应的边信息分量表示该已处理视频数据相对原始样本图像的失真 特征。该多个失真图像的失真特征不同。
本步骤中,可以预先对原始样本图像(即未失真的自然图像),进行不同失真程度的一种图像处理,得到各自对应的已处理视频数据,并按照上述去失真方法中的步骤,针对每个已处理视频数据,生成对应的边信息分量,从而将每个原始样本图像、对应的已处理视频数据以及对应的边信息分量组成图像对,由这些图像对组成预设训练集Ω。
在训练集中,原始样本图像对应的已处理视频数据以及该原始样本图像对应的边信息分量作为卷积神经网络CNN的训练样本,而原始样本图像中的原始图像颜色分量作为该训练样本的标注信息。在训练集中每个训练样本对应一个原始样本图像。
进一步的,训练集可以包括一个原始样本图像,针对该原始样本图像进行上述图像处理,得到失真特征不同的多个已处理视频数据,以及每个已处理视频数据对应的边信息分量;
训练集也可以包括多个原始样本图像,分别针对每个原始样本图像进行上述图像处理,得到失真特征不同的多个已处理视频数据,以及每个已处理视频数据对应的边信息分量。
步骤72、针对预设结构的卷积神经网络CNN,初始化该卷积神经网络CNN的网络参数集中的参数,初始化的参数集可以由θ 1表示,初始化的参数可以根据实际需要和经验进行设置。
本步骤中,还可以对训练相关的高层参数如学习率、梯度下降算法等进行合理的设置,具体可以采用现有技术中的各种方式,在此不再进行详细描述。
步骤73、进行前向计算,具体如下:
将预设训练集中的每个已处理视频数据对应的频域信息分量以及对应的边信息分量,输入预设结构的卷积神经网络,由该卷积神经网络对训练集中的原始样本图像对应的多个已处理视频数据以及该原样本图像对应的频域信息分量进行卷积滤波处理,得到该已处理视频数据对应的去失真频域信息分量。
本步骤中,具体可以为对预设训练集Ω进行参数集为θ i的卷积神经网络CNN的前向计算,获取卷积神经网络的输出F(Y),即每个已处理视频数据对应的去失真频域信息分量。
第一次进入本步骤处理时,当前参数集为θ 1,后续再次进入本步骤处理时,当前参数集θ i为对上一次使用的参数集θ i-1进行调整后得到的,详见后续描述。
可选的,每次进行前向计算时,可以从训练集中选择H个训练样本,即选择H个已处理视频数据对应的由边信息分量和频域信息分量,H为大于或等于1的整数。每次对选择的H个已处理视频数据对应的由边信息分量和频域信息分量进行前向计算。
每次对选择的H个训练样本进行前向计算时,可以得到每个训练样本对应的去失真频域信息分量。
步骤74、基于多个原始样本图像的原始图像颜色分量和得到的去失真频域信息分量,确定多个原始样本图像的损失值。
具体可以使用均方误差(MSE)公式作为损失函数,得到损失值L(θ i),详见如下公式:
Figure PCTCN2019113356-appb-000001
其中,H表示单次训练中从预设训练集中选取的图像对个数,I h表示第h个已处理视频数据对应的由边信息分量和频域信息分量合并后的输入数据,F(I hi)表示针对第h个已处 理视频数据,卷积神经网络CNN在参数集θ i下前向计算得到的去失真频域信息分量,X h表示第h个已处理视频数据对应的原始样本图像中的原始图像颜色分量,i为当前已进行前向计算的次数计数。
步骤75、基于损失值确定采用当前参数集的该预设结构的卷积神经网络是否收敛,如果不收敛,进入步骤76,如果收敛,进入步骤77。
具体的,可以当损失值小于预设损失值阈值时,确定收敛;也可以当本次计算得到损失值与上一次计算得到的损失值之差,小于预设变化阈值时,确定收敛,本发明在此不做限定。
步骤76,对当前参数集中的参数进行调整,得到调整后的参数集,然后进入步骤73,用于下一次前向计算。
具体可以利用反向传播算法对当前参数集中的参数进行调整。
可选的,在本步骤中,可以根据多个原始样本图像的损失值,对当前参数集中的参数进行调整,得到调整后的参数集。或者,
可选的,在本步骤中,对于该选择的每个训练样本对应的去失真频域信息分量和每个训练样本对应的原始样本图像,根据每个训练样本对应的去失真频域信息分量和原始样本图像中的原始图像颜色分量之间的差异,对当前参数集中的参数进行调整,得到调整后的参数集。
步骤77、将当前参数集作为输出的最终参数集θ final,并将采用最终参数集θ final的该预设结构的卷积神经网络,作为训练完成的卷积神经网络模型。
步骤203:根据去失真频域信息分量生成已处理视频数据对应的去失真图像。
可选的,可以将去失真图像作为参考图像,使用参考图像对当前原始视频数据之后的原始视频数据进行编码得到视频比特流。
可选的,当采用第一种视频编码系统进行视频编码时,去失真频域信息分量是频域重建图像。所以在本步骤中,通过第二反变换单元对去失真频域信息分量进行反变换,将反变换后的视频数据确定为去失真图像,还可以将去失真图像作为参考图像保存在缓存器中。这样运动估计与运动补偿单元根据缓存器中缓存的参考图像对当前原始视频数据之后的原始视频数据进行帧间预测得到帧间模式信息,以实现使用参考图像对当前原始视频数据之后的原始视频数据进行编码得到视频比特流。
可选的,当采用第二种视频编码系统进行视频编码时,去失真频域信息分量是频域残差系数。所以在本步骤中,通过第一反变换单元对去失真频域信息分量进行反变换,将反变换之后的频域信息输入到重建单元,通过重建单元根据反变换之后的频域信息和模式信息(帧内模式信息和帧间模式信息)输出去失真重建视频数据,该去失真重建视频数据为去失真图像,可以将该去失真图像作为参考图像并保存在缓存器中。这样运动估计与运动补偿单元根据缓存器中缓存的参考图像对当前原始视频数据之后的原始视频数据进行帧间预测得到帧间模式信息,以实现使用参考图像对当前原始视频数据之后的原始视频数据进行编码得到视频比特流。
在本申请实施例中,在视频编码过程中,获取视频编码过程中视频编码系统产生的频域信息分量和边信息分量,通过CNN对视频编码系统产生的频域信息分量和边信息分量进行滤波处理,得到去失真频域信息分量,由于滤波后的去失真频域信息分量去除了在频域 上发生的失真,所以使用去失真频域信息分量生成去失真图像并作为参考图像,可以提高参考图像的主观质量,进而使用参考图像对当前原始视频数据之后的原始视频数据进行编码,提高了后续编码视频数据的准确性。
参见图3,本申请实施例提供了一种图像处理的方法,所述方法包括:
步骤301:获取已处理视频数据对应的频域信息分量和边信息分量,已处理视频数据相对于输入解码系统的视频比特流对应的编码前的原始视频数据存在失真,该边信息分量表示已处理视频数据相对原始视频数据的失真特征。
步骤302:将该频域信息分量和该边信息分量输入卷积神经网络模型进行卷积滤波处理得到去失真频域信息分量,该去失真频域信息分量是以该边信息分量为引导对该频域信息分量进行滤波之后得到的。
步骤303:根据该去失真频域信息分量,生成已处理视频数据对应的去失真图像。
在本申请实施例中,在视频解码过程中,获取视频解码过程中视频解码系统产生的频域信息分量和边信息分量,通过CNN对视频编码系统产生的频域信息分量和边信息分量进行滤波处理,得到去失真频域信息分量,由于滤波后的去失真频域信息分量去除了在频域上发生的失真,所以使用去失真频域信息分量生成去除失真的图像,可以提高该图像的主观质量。
对于图3所示的图像处理的方法,参见图4-1,该方法的详细实现过程,可以包括:
步骤401:对接收的视频比特流进行熵解码,得到当前熵解码数据。
步骤402:获取已处理视频数据对应的频域信息分量和边信息分量。
其中,频域信息分量和边信息分量为对当前熵解码数据进行解码时生成的,边信息分量表示已处理视频数据相对原始视频数据的失真特征,原始视频数据是当前熵解码数据对应的视频数据。
可选的,可以使用视频解码系统进行视频解码,从视频解码系统中获取频域信息分量和边信息分量。视频解码系统有多种,在本步骤中列举了如下三种视频编码系统。
第一种视频解码系统,参见图4-2所示的第一种视频解码系统的结构示意图,第一种视频解码系统包括预测模块、熵解码器、反量化单元、第一反变换单元、重建单元、CNN(卷积神经网络模型)和缓存器等部分组成。
使用第一种视频解码系统解码的过程为:将接收的视频比特流输入到熵解码器中,熵解码器对该比特流进行熵解码得到熵解码数据,该熵解码数据包括模式信息、量化参数、量化残差信息等,该量化残差信息即为已处理视频数据,将该模式信息输入到预测模块中,将该量化残差信息输入到反量化单元中,反量化单元对该量化残差信息进行反量化处理得到第二残差系数。预测模块根据缓存器中的参考图像对输入的该模式信息进行预测得到预测模式信息,并将该预测模式信息输入重建单元。其中,预测模块包括帧内预测单元、运动补偿单元和开关,模式信息可以包括帧内模式信息和帧间模式信息,加法器选择将帧内模式信息或帧间模式信息输入到重建单元。帧内预测单元可以对帧内模式信息进行预测得到帧内预测模式信息,运动补偿单元根据缓存器中缓存的参考图像对帧间模式信息进行帧间预测得到帧间预测模式信息,开关选择将帧内预测模式信息或将帧间预测模式信息输出 给重建单元。相应的,在本步骤中,获取反量化单元产生的第二残差系数作为已处理视频数据对应的频域信息分量以及获取熵解码器产生的量化参数和帧间模式信息,根据该量化参数和帧间模式信息生成已处理视频数据对应的边信息分量。
可选的,根据该量化参数和帧间模式信息,生成边信息分量的操作,可以为:
根据该量化参数,生成边信息引导图,该边信息引导图是根据该量化参数生成的与当前原始视频数据等高等宽的引导图;
根据该帧间模式信息,对该边信息引导图进行更新,生成与该帧间模式信息匹配的边信息引导图;
将与该帧间模式信息匹配的边信息引导图确定为该边信息分量。
第二种视频解码系统,参见图4-3,第二种视频解码系统与第一种视频解码系统的差别在于:在第二种视频解码系统中,反量化单元与第一反变换单元相连,在重建单元和缓存器之间串联变换单元、CNN和第二反变换单元。使用第二种视频解码系统解码的过程与使用第一种视频解码系统的过程不同在于:反量化单元对熵解码器输入的量化残差信息进行反量化处理得到第二残差系数,向第一反变换单元输入第二残差系数;第一反变换单元对第二残差系数进行反变换处理得到第五残差信息,将第五残差信息输入到重建单元;重建单元根据输入的第五残差信息和帧内预测模式信息或者根据第五残差信息和帧间预测模式,生成失真重建视频数据,该失真重建视频数据为已处理视频数据,将失真重建视频数据输入到变换单元;变换单元对失真重建视频数据进行变换处理得到频域信息分量。相应的,在本步骤中,获取对失真重建视频数据进行变换得到已处理视频数据对应的频域信息分量以及获取熵解码器产生的量化参数,根据该量化参数生成已处理视频数据对应的边信息分量。
第三种视频解码系统,参见图4-4所示的第三种视频解码系统的结构示意图,第三种视频解码系统包括帧内预测模块、熵解码器、反量化单元、第一反变换单元、重建单元、变换单元、CNN(卷积神经网络模型)、第二反变换单元和缓存器等部分组成。
使用第三种视频解码系统解码的过程为:将接收的视频比特流输入到熵解码器中,熵解码器对该比特流进行解码得到熵解码数据,该熵解码数据包括帧内模式信息、量化参数、量化残差信息等,将该帧内模式信息输入到帧内预测模块中,将该量化残差信息输入到反量化单元中,反量化单元对该量化残差信息进行反量化处理得到第二残差系数,向第一反变换单元输入第二残差系数;第一反变换单元对第二残差系数进行反变换处理得到第五残差信息,将第五残差信息输入到重建单元。帧内预测模块根据缓存器中的参考图像对输入的该帧内模式信息进行预测得到帧内预测模式信息,并将该帧内预测模式信息输入到重建单元。重建单元根据输入的第五残差信息和帧内预测模式信息,生成失真重建视频数据,该失真重建视频数据为已处理视频数据,将失真重建视频数据输入到变换单元;变换单元对失真重建视频数据进行变换处理得到频域信息分量。相应的,在本步骤中,获取对失真重建视频数据进行变换得到已处理视频数据对应的频域信息分量以及获取熵解码器产生的量化参数,根据该量化参数生成已处理视频数据对应的边信息分量。
步骤403:将频域信息分量和边信息分量输入卷积神经网络模型进行卷积滤波处理,得到已处理视频数据对应的去失真频域信息分量。
卷积神经网络模型为基于预设训练集进行训练得到的,预设训练集包括原始样本图像 的图像信息,原始样本图像对应的多个频域信息分量,以及每个原始样本图像对应的已处理视频数据所对应的边信息分量。
在训练集中,原始样本图像对应的已处理视频数据以及该原始样本图像对应的边信息分量作为卷积神经网络模型的训练样本,而原始样本图像中的原始图像颜色分量作为该训练样本的标注信息。在训练集中每个训练样本对应一个原始样本图像。
具体训练过程可以参见上述图2-10中的步骤71至77的流程,在此不再详细说明。
步骤404:根据去失真频域信息分量生成去失真图像。
可选的,当采用第一种视频编码系统进行视频编码时,去失真频域信息分量是频域重建图像。所以在本步骤中,第一反变换单元对CNN输出的去失真频域信息分量进行反变换得到第五残差信息,向重建单元输入第五残差信息;重建单元根据帧内预测模式信息和第五残差信息或者根据帧间预测模式信息和第五残差信息,生成去失真重建视频数据,该去失真重建视频数据为去失真图像。
可选的,当采用第二或第三种视频解码系统进行视频编码时,去失真频域信息分量是频域残差系数。所以在本步骤中,第二反变换单元对CNN输出的去失真频域信息分量进行反变换得到去失真图像。
步骤405:将该去失真图像作为参考图像,根据该参考图像,对后续接收的视频比特流进行解码。
在本步骤中,将得到的去失真图像作为参考图像,保存在缓存器中。或者,当采用第三种视频解码系统进行视频编码时,可以直接显示去失真图像。
在本申请实施例中,在视频解码过程中,获取视频解码过程中视频解码系统产生的频域信息分量和边信息分量,通过CNN对视频编码系统产生的频域信息分量和边信息分量进行滤波处理,得到已处理视频数据对应的去失真频域信息分量,由于滤波后的去失真频域信息分量去除了在频域上发生的失真,所以使用去失真频域信息分量生成参考图像,提高参考图像的主观质量,使用参考图像对当前原始视频数据之后的视频比特流进行编码,提高了解码的准确性。
下述为本申请装置实施例,可以用于执行本申请方法实施例。对于本申请装置实施例中未披露的细节,请参照本申请方法实施例。
参见图5,本申请实施例提供了本申请提供了一种图像处理的装置500,所述装置500包括:
获取模块501,用于获取已处理视频数据对应的频域信息分量和边信息分量,所述已处理视频数据相对于输入编码系统的原始视频数据存在失真,所述边信息分量表示所述已处理视频数据相对所述原始视频数据的失真特征;
滤波模块502,用于将所述频域信息分量和所述边信息分量输入卷积神经网络模型进行滤波处理得到去失真频域信息分量,所述去失真频域信息分量是以所述边信息分量为引导对所述频域信息分量进行滤波之后得到的;
生成模块503,用于根据所述去失真频域信息分量,生成所述已处理视频数据对应的去失真图像。
可选的,所述获取模块501,用于:
从所述编码系统中获取所述已处理视频数据,所述已处理视频数据是所述原始视频数据的初始残差数据经变换和量化之后产生的待编码视频数据;
对所述已处理视频数据进行反量化处理,生成所述已处理视频数据经过反量化之后产生的第一残差系数;
根据所述第一残差系数,生成所述频域信息分量;
获取所述原始视频数据在编码中产生的初始残差数据经变换处理之后进行量化处理时采用的量化参数,所述量化参数用于表征量化步长;
从所述编码系统中获取所述已处理视频数据对应的帧间模式信息;
根据所述量化参数和所述帧间模式信息,生成所述边信息分量。
可选的,所述生成模块503,用于:
对所述去失真频域分量进行反变换,根据反变换之后的频域信息生成所述已处理视频数据对应的去失真重建视频数据;
将所述去失真重建视频数据确定为所述去失真图像。
可选的,所述获取模块501,用于:
根据所述量化参数,生成边信息引导图,所述边信息引导图是根据所述量化参数生成的与所述原始视频数据等高等宽的引导图;
根据所述帧间模式信息,对所述边信息引导图进行更新,生成与所述帧间模式信息匹配的边信息引导图;
将与所述帧间模式信息匹配的边信息引导图确定为所述边信息分量。
可选的,所述已处理视频数据为所述原始视频数据对应的失真重建视频数据;
所述获取模块501,用于:
对所述已处理视频数据进行变换处理,根据变换处理之后得到的视频数据频域信息生成所述频域信息分量;
获取所述原始视频数据在编码中产生的初始残差数据经变换处理之后进行量化处理时采用的量化参数,所述量化参数用于表征量化步长;
根据所述量化参数,生成所述边信息分量。
可选的,所述生成模块503,用下:
对所述去失真频域分量进行反变换,将反变换之后的视频数据确定所述去失真图像。
在本申请实施例中,在视频编码过程中,获取视频编码过程中视频编码系统产生的频域信息分量和边信息分量,通过CNN对视频编码系统产生的频域信息分量和边信息分量进行滤波处理,得到去失真频域信息分量,由于滤波后的去失真频域信息分量去除了在频域上发生的失真,所以使用去失真频域信息分量生成的图像去除了失真,使用该图像作为参考图像,使用参考图像对当前原始视频数据之后的原始视频数据进行编码,提高了后续编码图像的确定性。
参见图6,本申请实施例提供了一种图像处理的装置600,所述装置600包括:
获取模块601,用于获取已处理视频数据对应的频域信息分量和边信息分量,所述已处理视频数据相对于输入解码系统的视频比特流对应的编码前的原始视频数据存在失真,所 述边信息分量表示所述已处理视频数据相对所述原始视频数据的失真特征;
滤波模块602,用于将所述频域信息分量和所述边信息分量输入卷积神经网络模型进行卷积滤波处理得到去失真频域信息分量,所述去失真频域信息分量是以所述边信息分量为引导对所述频域信息分量进行滤波之后得到的;
生成模块603,用于根据所述去失真频域信息分量,生成所述已处理视频数据对应的去失真图像。
可选的,所述获取模块601,用于:
从所述解码系统中获取所述已处理视频数据,所述已处理视频数据是所述解码系统对视频比特流进行熵解码输出的量化残差信息;
对所述已处理视频数据进行反量化处理,生成所述已处理视频数据经过反量化之后产生的第二残差系数;
根据所述第二残差系数,生成所述频域信息分量;
获取所述解码系统对视频比特流进行熵解码输出的量化残差信息量化参数和帧间模式信息,所述量化参数用于表征量化步长;
根据所述量化参数和所述帧间模式信息,生成所述边信息分量。
可选的,所述生成模块603,用于:
对所述去失真频域分量进行反变换,根据反变换之后的频域信息生成所述已处理视频数据对应的去失真重建视频数据;
将所述去失真重建视频数据确定为所述去失真图像。
可选的,所述获取模块601,用于:
根据所述量化参数,生成边信息引导图,所述边信息引导图是根据所述量化参数生成的与所述原始视频数据等高等宽的引导图;
根据所述帧间模式信息,对所述边信息引导图进行更新,生成与所述帧间模式信息匹配的边信息引导图;
将与所述帧间模式信息匹配的边信息引导图确定为所述边信息分量。
可选的,所述已处理视频数据为所述原始视频数据对应的失真重建视频数据;
所述获取模块601,用于:
对所述已处理视频数据进行变换处理,根据变换处理之后得到的视频数据频域信息生成所述频域信息分量;
获取所述解码系统对视频比特流进行熵解码输出的量化残差信息量化参数,所述量化参数用于表征量化步长;
根据所述量化参数,生成所述边信息分量。
可选的,所述生成模块603,用于:
对所述去失真频域分量进行反变换,将反变换之后的视频数据确定所述去失真图像。
在本申请实施例中,在视频解码过程中,获取视频解码过程中视频解码系统产生的频域信息分量和边信息分量,通过CNN对视频编码系统产生的频域信息分量和边信息分量进行滤波处理,得到去失真频域信息分量,由于滤波后的去失真频域信息分量去除了在频域上发生的失真,所以使用去失真频域信息分量可以生成去失真图像,使用去失真图像作为参考图像对当前原始视频数据之后的视频比特流进行编码,提高了解码的确定性。
参见图7,本申请实施例提供了一种图像处理的系统700,所述系统700包括如图5所示实施例提供的视频编码装置701和如图6所示实施例提供的视频解码装置702。
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。
图8示出了本发明一个示例性实施例提供的电子设备800的结构框图。该电子设备800可以是便携式移动终端,比如:智能手机、平板电脑、笔记本电脑或台式电脑。电子设备800还可能被称为用户设备、便携式终端、膝上型终端、台式终端等其他名称。
通常,电子设备800包括有:处理器801和存储器802。
处理器801可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器801可以采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。处理器801也可以包括主处理器和协处理器,主处理器是用于对在唤醒状态下的数据进行处理的处理器,也称CPU(Central Processing Unit,中央处理器);协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中,处理器801可以在集成有GPU(Graphics Processing Unit,图像处理器),GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中,处理器801还可以包括AI(Artificial Intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器802可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。存储器802还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器802中的非暂态的计算机可读存储介质用于存储至少一个指令,该至少一个指令用于被处理器801所执行以实现本申请中方法实施例提供的视频编码方法或视频解码方法。
在一些实施例中,电子设备800还可选包括有:外围设备接口803和至少一个外围设备。处理器801、存储器802和外围设备接口803之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口803相连。具体地,外围设备包括:射频电路804、触摸显示屏805、摄像头806、音频电路807、定位组件808和电源809中的至少一种。
外围设备接口803可被用于将I/O(Input/Output,输入/输出)相关的至少一个外围设备连接到处理器801和存储器802。在一些实施例中,处理器801、存储器802和外围设备接口803被集成在同一芯片或电路板上;在一些其他实施例中,处理器801、存储器802和外围设备接口803中的任意一个或两个可以在单独的芯片或电路板上实现,本实施例对此不加以限定。
射频电路804用于接收和发射RF(Radio Frequency,射频)信号,也称电磁信号。射频电路804通过电磁信号与通信网络以及其他通信设备进行通信。射频电路804将电信号转换为电磁信号进行发送,或者,将接收到的电磁信号转换为电信号。可选地,射频电路804包括:天线系统、RF收发器、一个或多个放大器、调谐器、振荡器、数字信号处理器、 编解码芯片组、用户身份模块卡等等。射频电路804可以通过至少一种无线通信协议来与其它终端进行通信。该无线通信协议包括但不限于:万维网、城域网、内联网、各代移动通信网络(2G、3G、4G及5G)、无线局域网和/或WiFi(Wireless Fidelity,无线保真)网络。在一些实施例中,射频电路804还可以包括NFC(Near Field Communication,近距离无线通信)有关的电路,本申请对此不加以限定。
显示屏805用于显示UI(User Interface,用户界面)。该UI可以包括图形、文本、图标、视频及其它们的任意组合。当显示屏805是触摸显示屏时,显示屏805还具有采集在显示屏805的表面或表面上方的触摸信号的能力。该触摸信号可以作为控制信号输入至处理器801进行处理。此时,显示屏805还可以用于提供虚拟按钮和/或虚拟键盘,也称软按钮和/或软键盘。在一些实施例中,显示屏805可以为一个,设置电子设备800的前面板;在另一些实施例中,显示屏805可以为至少两个,分别设置在电子设备800的不同表面或呈折叠设计;在再一些实施例中,显示屏805可以是柔性显示屏,设置在电子设备800的弯曲表面上或折叠面上。甚至,显示屏805还可以设置成非矩形的不规则图形,也即异形屏。显示屏805可以采用LCD(Liquid Crystal Display,液晶显示屏)、OLED(Organic Light-Emitting Diode,有机发光二极管)等材质制备。
摄像头组件806用于采集图像或视频。可选地,摄像头组件806包括前置摄像头和后置摄像头。通常,前置摄像头设置在电子设备的前面板,后置摄像头设置在电子设备的背面。在一些实施例中,后置摄像头为至少两个,分别为主摄像头、景深摄像头、广角摄像头、长焦摄像头中的任意一种,以实现主摄像头和景深摄像头融合实现背景虚化功能、主摄像头和广角摄像头融合实现全景拍摄以及VR(Virtual Reality,虚拟现实)拍摄功能或者其它融合拍摄功能。在一些实施例中,摄像头组件806还可以包括闪光灯。闪光灯可以是单色温闪光灯,也可以是双色温闪光灯。双色温闪光灯是指暖光闪光灯和冷光闪光灯的组合,可以用于不同色温下的光线补偿。
音频电路807可以包括麦克风和扬声器。麦克风用于采集用户及环境的声波,并将声波转换为电信号输入至处理器801进行处理,或者输入至射频电路804以实现语音通信。出于立体声采集或降噪的目的,麦克风可以为多个,分别设置在电子设备800的不同部位。麦克风还可以是阵列麦克风或全向采集型麦克风。扬声器则用于将来自处理器801或射频电路804的电信号转换为声波。扬声器可以是传统的薄膜扬声器,也可以是压电陶瓷扬声器。当扬声器是压电陶瓷扬声器时,不仅可以将电信号转换为人类可听见的声波,也可以将电信号转换为人类听不见的声波以进行测距等用途。在一些实施例中,音频电路807还可以包括耳机插孔。
定位组件808用于定位电子设备800的当前地理位置,以实现导航或LBS(Location Based Service,基于位置的服务)。定位组件808可以是基于美国的GPS(Global Positioning System,全球定位系统)、中国的北斗系统或俄罗斯的伽利略系统的定位组件。
电源809用于为电子设备800中的各个组件进行供电。电源809可以是交流电、直流电、一次性电池或可充电电池。当电源809包括可充电电池时,该可充电电池可以是有线充电电池或无线充电电池。有线充电电池是通过有线线路充电的电池,无线充电电池是通过无线线圈充电的电池。该可充电电池还可以用于支持快充技术。
在一些实施例中,电子设备800还包括有一个或多个传感器810。该一个或多个传感器 810包括但不限于:加速度传感器811、陀螺仪传感器812、压力传感器813、指纹传感器814、光学传感器815以及接近传感器816。
加速度传感器811可以检测以电子设备800建立的坐标系的三个坐标轴上的加速度大小。比如,加速度传感器811可以用于检测重力加速度在三个坐标轴上的分量。处理器801可以根据加速度传感器811采集的重力加速度信号,控制触摸显示屏805以横向视图或纵向视图进行用户界面的显示。加速度传感器811还可以用于游戏或者用户的运动数据的采集。
陀螺仪传感器812可以检测电子设备800的机体方向及转动角度,陀螺仪传感器812可以与加速度传感器811协同采集用户对电子设备800的3D动作。处理器801根据陀螺仪传感器812采集的数据,可以实现如下功能:动作感应(比如根据用户的倾斜操作来改变UI)、拍摄时的图像稳定、游戏控制以及惯性导航。
压力传感器813可以设置在电子设备800的侧边框和/或触摸显示屏805的下层。当压力传感器813设置在电子设备800的侧边框时,可以检测用户对电子设备800的握持信号,由处理器801根据压力传感器813采集的握持信号进行左右手识别或快捷操作。当压力传感器813设置在触摸显示屏805的下层时,由处理器801根据用户对触摸显示屏805的压力操作,实现对UI界面上的可操作性控件进行控制。可操作性控件包括按钮控件、滚动条控件、图标控件、菜单控件中的至少一种。
指纹传感器814用于采集用户的指纹,由处理器801根据指纹传感器814采集到的指纹识别用户的身份,或者,由指纹传感器814根据采集到的指纹识别用户的身份。在识别出用户的身份为可信身份时,由处理器801授权该用户执行相关的敏感操作,该敏感操作包括解锁屏幕、查看加密信息、下载软件、支付及更改设置等。指纹传感器814可以被设置电子设备800的正面、背面或侧面。当电子设备800上设置有物理按键或厂商Logo时,指纹传感器814可以与物理按键或厂商Logo集成在一起。
光学传感器815用于采集环境光强度。在一个实施例中,处理器801可以根据光学传感器815采集的环境光强度,控制触摸显示屏805的显示亮度。具体地,当环境光强度较高时,调高触摸显示屏805的显示亮度;当环境光强度较低时,调低触摸显示屏805的显示亮度。在另一个实施例中,处理器801还可以根据光学传感器815采集的环境光强度,动态调整摄像头组件806的拍摄参数。
接近传感器816,也称距离传感器,通常设置在电子设备800的前面板。接近传感器816用于采集用户与电子设备800的正面之间的距离。在一个实施例中,当接近传感器816检测到用户与电子设备800的正面之间的距离逐渐变小时,由处理器801控制触摸显示屏805从亮屏状态切换为息屏状态;当接近传感器816检测到用户与电子设备800的正面之间的距离逐渐变大时,由处理器801控制触摸显示屏805从息屏状态切换为亮屏状态。
本领域技术人员可以理解,图8中示出的结构并不构成对电子设备800的限定,可以包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
本领域技术人员在考虑说明书及实践这里公开的申请后,将容易想到本申请的其它实施方案。本申请旨在涵盖本申请的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本申请的一般性原理并包括本申请未公开的本技术领域中的公知常识或惯 用技术手段。说明书和实施例仅被视为示例性的,本申请的真正范围和精神由下面的权利要求指出。
应当理解的是,本申请并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围进行各种修改和改变。本申请的范围仅由所附的权利要求来限制。

Claims (27)

  1. 一种图像处理的方法,其特征在于,所述方法包括:
    获取已处理视频数据对应的频域信息分量和边信息分量,所述已处理视频数据相对于输入编码系统的原始视频数据存在失真,所述边信息分量表示所述已处理视频数据相对所述原始视频数据的失真特征;
    将所述频域信息分量和所述边信息分量输入卷积神经网络模型进行滤波处理得到去失真频域信息分量,所述去失真频域信息分量是以所述边信息分量为引导对所述频域信息分量进行滤波之后得到的;
    根据所述去失真频域信息分量,生成所述已处理视频数据对应的去失真图像。
  2. 如权利要求1所述的方法,其特征在于,所述获取所述已处理视频数据对应的频域信息分量,包括:
    从所述编码系统中获取所述已处理视频数据,所述已处理视频数据是所述原始视频数据的初始残差数据经变换和量化之后产生的待编码视频数据;
    对所述已处理视频数据进行反量化处理,生成所述已处理视频数据经过反量化之后产生的第一残差系数;
    根据所述第一残差系数,生成所述频域信息分量;
    所述获取所述已处理视频数据对应的边信息分量,包括:
    获取所述原始视频数据在编码中产生的初始残差数据经变换处理之后进行量化处理时采用的量化参数,所述量化参数用于表征量化步长;
    从所述编码系统中获取所述已处理视频数据对应的帧间模式信息;
    根据所述量化参数和所述帧间模式信息,生成所述边信息分量。
  3. 如权利要求2所述的方法,其特征在于,所述根据所述去失真频域信息分量,生成所述已处理视频数据对应的去失真图像,包括:
    对所述去失真频域分量进行反变换,根据反变换之后的频域信息生成所述已处理视频数据对应的去失真重建视频数据;
    将所述去失真重建视频数据确定为所述去失真图像。
  4. 如权利要求3所述的方法,其特征在于,所述根据所述量化参数和所述帧间模式信息,生成所述边信息分量,包括:
    根据所述量化参数,生成边信息引导图,所述边信息引导图是根据所述量化参数生成的与所述原始视频数据等高等宽的引导图;
    根据所述帧间模式信息,对所述边信息引导图进行更新,生成与所述帧间模式信息匹配的边信息引导图;
    将与所述帧间模式信息匹配的边信息引导图确定为所述边信息分量。
  5. 如权利要求1所述的方法,其特征在于,所述已处理视频数据为所述原始视频数据对 应的失真重建视频数据;
    所述获取所述已处理视频数据对应的频域信息分量,包括:
    对所述已处理视频数据进行变换处理,根据变换处理之后得到的视频数据频域信息生成所述频域信息分量;
    所述获取所述已处理视频数据对应的边信息分量,包括:
    获取所述原始视频数据在编码中产生的初始残差数据经变换处理之后进行量化处理时采用的量化参数,所述量化参数用于表征量化步长;
    根据所述量化参数,生成所述边信息分量。
  6. 如权利要求5所述的方法,其特征在于,所述根据所述去失真频域信息分量,生成所述已处理视频数据对应的去失真图像,包括:
    对所述去失真频域分量进行反变换,将反变换之后的视频数据确定所述去失真图像。
  7. 一种图像处理的方法,其特征在于,所述方法包括:
    获取已处理视频数据对应的频域信息分量和边信息分量,所述已处理视频数据相对于输入解码系统的视频比特流对应的编码前的原始视频数据存在失真,所述边信息分量表示所述已处理视频数据相对所述原始视频数据的失真特征;
    将所述频域信息分量和所述边信息分量输入卷积神经网络模型进行卷积滤波处理得到去失真频域信息分量,所述去失真频域信息分量是以所述边信息分量为引导对所述频域信息分量进行滤波之后得到的;
    根据所述去失真频域信息分量,生成所述已处理视频数据对应的去失真图像。
  8. 如权利要求7所述的方法,其特征在于,所述获取已处理视频数据对应的频域信息分量和边信息分量,包括:
    从所述解码系统中获取所述已处理视频数据,所述已处理视频数据是所述解码系统对视频比特流进行熵解码输出的量化残差信息;
    对所述已处理视频数据进行反量化处理,生成所述已处理视频数据经过反量化之后产生的第二残差系数;
    根据所述第二残差系数,生成所述频域信息分量;
    所述获取所述已处理视频数据对应的边信息分量,包括:
    获取所述解码系统对视频比特流进行熵解码输出的量化残差信息量化参数和帧间模式信息,所述量化参数用于表征量化步长;
    根据所述量化参数和所述帧间模式信息,生成所述边信息分量。
  9. 如权利要求8所述的方法,其特征在于,所述根据所述去失真频域信息分量,生成所述已处理视频数据对应的去失真图像,包括:
    对所述去失真频域分量进行反变换,根据反变换之后的频域信息生成所述已处理视频数据对应的去失真重建视频数据;
    将所述去失真重建视频数据确定为所述去失真图像。
  10. 如权利要求8所述的方法,其特征在于,所述根据所述量化参数和所述帧间模式信息,生成所述边信息分量,包括:
    根据所述量化参数,生成边信息引导图,所述边信息引导图是根据所述量化参数生成的与所述原始视频数据等高等宽的引导图;
    根据所述帧间模式信息,对所述边信息引导图进行更新,生成与所述帧间模式信息匹配的边信息引导图;
    将与所述帧间模式信息匹配的边信息引导图确定为所述边信息分量。
  11. 如权利要求7所述的方法,其特征在于,所述已处理视频数据为所述原始视频数据对应的失真重建视频数据;
    所述获取所述已处理视频数据对应的频域信息分量,包括:
    对所述已处理视频数据进行变换处理,根据变换处理之后得到的视频数据频域信息生成所述频域信息分量;
    所述获取所述已处理视频数据对应的边信息分量,包括:
    获取所述解码系统对视频比特流进行熵解码输出的量化残差信息量化参数,所述量化参数用于表征量化步长;
    根据所述量化参数,生成所述边信息分量。
  12. 如权利要求11所述的方法,其特征在于,所述根据所述去失真频域信息分量,生成所述已处理视频数据对应的去失真图像,包括:
    对所述去失真频域分量进行反变换,将反变换之后的视频数据确定所述去失真图像。
  13. 一种图像处理的装置,其特征在于,所述装置包括:
    获取模块,用于获取已处理视频数据对应的频域信息分量和边信息分量,所述已处理视频数据相对于输入编码系统的原始视频数据存在失真,所述边信息分量表示所述已处理视频数据相对所述原始视频数据的失真特征;
    滤波模块,用于将所述频域信息分量和所述边信息分量输入卷积神经网络模型进行滤波处理得到去失真频域信息分量,所述去失真频域信息分量是以所述边信息分量为引导对所述频域信息分量进行滤波之后得到的;
    生成模块,用于根据所述去失真频域信息分量,生成所述已处理视频数据对应的去失真图像。
  14. 如权利要求13所述的装置,其特征在于,所述获取模块,用于:
    从所述编码系统中获取所述已处理视频数据,所述已处理视频数据是所述原始视频数据的初始残差数据经变换和量化之后产生的待编码视频数据;
    对所述已处理视频数据进行反量化处理,生成所述已处理视频数据经过反量化之后产生的第一残差系数;
    根据所述第一残差系数,生成所述频域信息分量;
    获取所述原始视频数据在编码中产生的初始残差数据经变换处理之后进行量化处理时采用的量化参数,所述量化参数用于表征量化步长;
    从所述编码系统中获取所述已处理视频数据对应的帧间模式信息;
    根据所述量化参数和所述帧间模式信息,生成所述边信息分量。
  15. 如权利要求14所述的装置,其特征在于,所述生成模块,用于:
    对所述去失真频域分量进行反变换,根据反变换之后的频域信息生成所述已处理视频数据对应的去失真重建视频数据;
    将所述去失真重建视频数据确定为所述去失真图像。
  16. 如权利要求15所述的装置,其特征在于,所述获取模块,用于:
    根据所述量化参数,生成边信息引导图,所述边信息引导图是根据所述量化参数生成的与所述原始视频数据等高等宽的引导图;
    根据所述帧间模式信息,对所述边信息引导图进行更新,生成与所述帧间模式信息匹配的边信息引导图;
    将与所述帧间模式信息匹配的边信息引导图确定为所述边信息分量。
  17. 如权利要求13所述的装置,其特征在于,所述已处理视频数据为所述原始视频数据对应的失真重建视频数据;
    所述获取模块,用于:
    对所述已处理视频数据进行变换处理,根据变换处理之后得到的视频数据频域信息生成所述频域信息分量;
    获取所述原始视频数据在编码中产生的初始残差数据经变换处理之后进行量化处理时采用的量化参数,所述量化参数用于表征量化步长;
    根据所述量化参数,生成所述边信息分量。
  18. 如权利要求17所述的装置,其特征在于,所述生成模块,用下:
    对所述去失真频域分量进行反变换,将反变换之后的视频数据确定所述去失真图像。
  19. 一种图像处理的装置,其特征在于,所述装置包括:
    获取模块,用于获取已处理视频数据对应的频域信息分量和边信息分量,所述已处理视频数据相对于输入解码系统的视频比特流对应的编码前的原始视频数据存在失真,所述边信息分量表示所述已处理视频数据相对所述原始视频数据的失真特征;
    滤波模块,用于将所述频域信息分量和所述边信息分量输入卷积神经网络模型进行卷积滤波处理得到去失真频域信息分量,所述去失真频域信息分量是以所述边信息分量为引导对所述频域信息分量进行滤波之后得到的;
    生成模块,用于根据所述去失真频域信息分量,生成所述已处理视频数据对应的去失真图像。
  20. 如权利要求19所述的装置,其特征在于,所述获取模块,用于:
    从所述解码系统中获取所述已处理视频数据,所述已处理视频数据是所述解码系统对视频比特流进行熵解码输出的量化残差信息;
    对所述已处理视频数据进行反量化处理,生成所述已处理视频数据经过反量化之后产生的第二残差系数;
    根据所述第二残差系数,生成所述频域信息分量;
    获取所述解码系统对视频比特流进行熵解码输出的量化残差信息量化参数和帧间模式信息,所述量化参数用于表征量化步长;
    根据所述量化参数和所述帧间模式信息,生成所述边信息分量。
  21. 如权利要求20所述的装置,其特征在于,所述生成模块,用于:
    对所述去失真频域分量进行反变换,根据反变换之后的频域信息生成所述已处理视频数据对应的去失真重建视频数据;
    将所述去失真重建视频数据确定为所述去失真图像。
  22. 如权利要求20所述的装置,其特征在于,所述获取模块,用于:
    根据所述量化参数,生成边信息引导图,所述边信息引导图是根据所述量化参数生成的与所述原始视频数据等高等宽的引导图;
    根据所述帧间模式信息,对所述边信息引导图进行更新,生成与所述帧间模式信息匹配的边信息引导图;
    将与所述帧间模式信息匹配的边信息引导图确定为所述边信息分量。
  23. 如权利要求19所述的装置,其特征在于,所述已处理视频数据为所述原始视频数据对应的失真重建视频数据;
    所述获取模块,用于:
    对所述已处理视频数据进行变换处理,根据变换处理之后得到的视频数据频域信息生成所述频域信息分量;
    获取所述解码系统对视频比特流进行熵解码输出的量化残差信息量化参数,所述量化参数用于表征量化步长;
    根据所述量化参数,生成所述边信息分量。
  24. 如权利要求23所述的装置,其特征在于,所述生成模块,用于:
    对所述去失真频域分量进行反变换,将反变换之后的视频数据确定所述去失真图像。
  25. 一种电子设备,其特征在于,所述电子设备包括:
    至少一个处理器;和
    至少一个存储器;
    所述至少一个存储器存储有一个或多个程序,所述一个或多个程序被配置成由所述至少一个处理器执行,以执行如权利要求1至6任一项所述的方法的指令,或者,以执行如权利 要求7至12任一项所述的方法的指令。
  26. 一种非易失性计算机可读存储介质,其特征在于,用于存储计算机程序,所述计算机程序通过处理器进行加载,来执行如权利要求1至6任一项所述的方法的指令,或者,来执行如权利要求7至12任一项所述的方法的指令。
  27. 一种图像处理的系统,其特征在于,所述系统包括如权利要求13至18任一项所述的视频编码装置和如权利要求19至24任一项所述的视频解码装置。
PCT/CN2019/113356 2018-10-25 2019-10-25 图像处理的方法、装置及系统 WO2020083385A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811253559.XA CN111107357B (zh) 2018-10-25 2018-10-25 一种图像处理的方法、装置、系统及存储介质
CN201811253559.X 2018-10-25

Publications (1)

Publication Number Publication Date
WO2020083385A1 true WO2020083385A1 (zh) 2020-04-30

Family

ID=70330942

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/113356 WO2020083385A1 (zh) 2018-10-25 2019-10-25 图像处理的方法、装置及系统

Country Status (2)

Country Link
CN (1) CN111107357B (zh)
WO (1) WO2020083385A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111787187B (zh) * 2020-07-29 2021-07-02 上海大学 利用深度卷积神经网络进行视频修复的方法、系统、终端
CN113177451B (zh) * 2021-04-21 2024-01-12 北京百度网讯科技有限公司 图像处理模型的训练方法、装置、电子设备及存储介质
TWI779957B (zh) * 2021-12-09 2022-10-01 晶睿通訊股份有限公司 影像分析模型建立方法及其影像分析設備

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105850136A (zh) * 2013-12-22 2016-08-10 Lg电子株式会社 使用预测信号和变换编译信号预测视频信号的方法和装置
CN107925762A (zh) * 2015-09-03 2018-04-17 联发科技股份有限公司 基于神经网络的视频编解码处理方法和装置
EP3319039A1 (en) * 2016-11-07 2018-05-09 UMBO CV Inc. A method and system for providing high resolution image through super-resolution reconstruction
WO2018099579A1 (en) * 2016-12-02 2018-06-07 Huawei Technologies Co., Ltd. Apparatus and method for encoding an image
CN108491926A (zh) * 2018-03-05 2018-09-04 东南大学 一种基于对数量化的低比特高效深度卷积神经网络硬件加速设计方法、模块及系统

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7088860B2 (en) * 2001-03-28 2006-08-08 Canon Kabushiki Kaisha Dynamically reconfigurable signal processing circuit, pattern recognition apparatus, and image processing apparatus
CN107197260B (zh) * 2017-06-12 2019-09-13 清华大学深圳研究生院 基于卷积神经网络的视频编码后置滤波方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105850136A (zh) * 2013-12-22 2016-08-10 Lg电子株式会社 使用预测信号和变换编译信号预测视频信号的方法和装置
CN107925762A (zh) * 2015-09-03 2018-04-17 联发科技股份有限公司 基于神经网络的视频编解码处理方法和装置
EP3319039A1 (en) * 2016-11-07 2018-05-09 UMBO CV Inc. A method and system for providing high resolution image through super-resolution reconstruction
WO2018099579A1 (en) * 2016-12-02 2018-06-07 Huawei Technologies Co., Ltd. Apparatus and method for encoding an image
CN108491926A (zh) * 2018-03-05 2018-09-04 东南大学 一种基于对数量化的低比特高效深度卷积神经网络硬件加速设计方法、模块及系统

Also Published As

Publication number Publication date
CN111107357B (zh) 2022-05-31
CN111107357A (zh) 2020-05-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19875137

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19875137

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14/12/2021)