CN110996101B - Video coding method and device - Google Patents

Video coding method and device

Info

Publication number
CN110996101B
Authority
CN
China
Prior art keywords
image
frame image
coded
target
coding
Prior art date
Legal status
Active
Application number
CN201911157969.9A
Other languages
Chinese (zh)
Other versions
CN110996101A (en)
Inventor
郑振贵
黄学辉
林鹏程
Current Assignee
Wangsu Science and Technology Co Ltd
Original Assignee
Wangsu Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Wangsu Science and Technology Co Ltd filed Critical Wangsu Science and Technology Co Ltd
Priority to CN201911157969.9A priority Critical patent/CN110996101B/en
Priority to PCT/CN2020/070701 priority patent/WO2021098030A1/en
Publication of CN110996101A publication Critical patent/CN110996101A/en
Application granted granted Critical
Publication of CN110996101B publication Critical patent/CN110996101B/en

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/124 — Adaptive coding characterised by the element, parameter or selection affected or controlled: quantisation
    • H04N 19/136 — Adaptive coding controlled by incoming video signal characteristics or properties
    • H04N 19/146 — Adaptive coding controlled by the data rate or code amount at the encoder output
    • H04N 19/176 — Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N 19/625 — Transform coding using discrete cosine transform [DCT]

Abstract

The invention discloses a video coding method and device, belonging to the technical field of video processing. The method comprises the following steps: extracting a frame image to be coded of a target video, and determining a coding weight corresponding to each image area in the frame image to be coded through an image analysis technology; setting a quantization parameter corresponding to each image area according to the coding weight corresponding to each image area; and coding the frame image to be coded based on the set quantization parameter corresponding to each image area. By adopting the invention, coding resources can be allocated more reasonably, and the quality of the coded video picture and the video viewing experience can be improved overall.

Description

Video coding method and device
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a method and an apparatus for video encoding.
Background
After image information is acquired by image acquisition equipment such as a camera, original video data is generated. The original video data is composed of a large number of sequentially arranged frame images, so its data volume is very large. In order to facilitate the transmission and storage of video, a video coding technique may be used to encode and compress the original video data, removing its redundant information, before the encoded video data is transmitted or stored.
Existing video coding mainly includes intra-frame coding and inter-frame coding. In intra-frame coding, the frame image is processed sequentially by discrete cosine transform, quantization, entropy coding and the like to obtain compressed image data. In inter-frame coding, a motion vector between the frame image to be coded and a reference frame image is calculated, a predicted image is generated from the reference frame image and the motion vector, the predicted image is compared with the frame image to be coded to generate a difference image, and the difference image is then processed sequentially by discrete cosine transform, quantization, entropy coding and the like to obtain the compressed image data.
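The transform-then-quantize step common to both coding modes can be illustrated with a minimal numpy sketch. This is a generic textbook 8 x 8 DCT-plus-uniform-quantization example, not the patent's implementation, and it omits the entropy-coding stage:

```python
import numpy as np

def dct2_matrix(n=8):
    # Orthonormal DCT-II basis: row k holds cos(pi*(2j+1)*k/(2n)) over samples j
    idx = np.arange(n)
    C = np.cos(np.pi * (2 * idx[None, :] + 1) * idx[:, None] / (2 * n))
    C[0, :] /= np.sqrt(2)          # DC row scaled for orthonormality
    return C * np.sqrt(2.0 / n)

def encode_block(block, q_step):
    # Forward 2-D DCT followed by uniform scalar quantization
    C = dct2_matrix(block.shape[0])
    coeffs = C @ block @ C.T
    return np.round(coeffs / q_step).astype(int)

# A flat 8x8 block compresses to a single DC coefficient after quantization.
q = encode_block(np.full((8, 8), 16.0), q_step=16.0)
```

A larger `q_step` discards more detail, which is exactly the lever the quantization parameter controls in the method described below.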
In the process of implementing the invention, the inventor finds that the prior art has at least the following problems:
at present, when a frame image is quantized, it is often divided into a plurality of image blocks, and then either all image blocks are quantized with the same quantization parameter, or the quantization parameter is selected directly according to the content complexity of each image block. However, both quantization methods may cause secondary picture content to be encoded too finely while key picture content cannot be encoded finely enough; that is, the device's encoding resources are not allocated reasonably, so the encoded video picture offers a poor viewing experience and poor picture quality.
Disclosure of Invention
To solve the problems of the prior art, embodiments of the present invention provide a method and apparatus for video encoding. The technical scheme is as follows:
in a first aspect, a method for video coding is provided, the method comprising:
extracting a frame image to be coded of a target video, and determining a coding weight corresponding to each image area in the frame image to be coded through an image analysis technology;
setting a quantization parameter corresponding to each image area according to the coding weight corresponding to each image area;
and coding the frame image to be coded based on the set quantization parameter corresponding to each image area.
In a second aspect, an apparatus for video encoding is provided, the apparatus comprising:
the image analysis module is used for extracting a frame image to be coded of a target video and determining a coding weight corresponding to each image area in the frame image to be coded through an image analysis technology;
the parameter setting module is used for setting the quantization parameter corresponding to each image area according to the coding weight corresponding to each image area;
and the video coding module is used for coding the frame image to be coded based on the set quantization parameter corresponding to each image area.
In a third aspect, there is provided a network device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement the method of video encoding according to the first aspect.
In a fourth aspect, there is provided a computer readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement the method of video encoding according to the first aspect.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, a frame image to be coded of a target video is extracted, and a coding weight corresponding to each image area in the frame image to be coded is determined through an image analysis technology; setting a quantization parameter corresponding to each image area according to the coding weight corresponding to each image area; and coding the frame image to be coded based on the set quantization parameter corresponding to each image area. Therefore, different quantization parameters are set for different image areas in the same video frame image, so that coding processing with different fine degrees can be respectively realized for a plurality of image areas, the coding quality of key picture content can be improved, the coding quality of secondary picture content can be reduced to a certain degree, coding resources can be more reasonably distributed, and the quality of coded video pictures and the video watching experience can be integrally improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a method for video encoding according to an embodiment of the present invention;
FIG. 2 is a simplified diagram of a two-dimensional mask map according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a clipped two-dimensional mask map according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an apparatus for video encoding according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an apparatus for video encoding according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a network device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The embodiment of the invention provides a video coding method, whose executing entity may be any network device with a video frame image processing function. The network device may be configured to perform image analysis on the acquired video frame image and to encode the video frame image based on the image analysis result, so that original analog data can be converted into digital data, facilitating the transmission and storage of video data. During encoding, the network device can encode different image areas in the same video frame image with different degrees of fineness, so that different image areas in the same video frame image can have different levels of picture quality. The network device may include a processor, a memory and a transceiver: the processor may be configured to perform the video encoding described in the following procedures, the memory may be configured to store the data required and generated in the following procedures, and the transceiver may be configured to receive and transmit related data in the following procedures.
The process flow shown in fig. 1 will be described in detail below with reference to specific embodiments, and the contents may be as follows:
step 101, extracting a frame image to be coded of a target video, and determining a coding weight corresponding to each image area in the frame image to be coded through an image analysis technology.
In implementation, after acquiring the original video data of the target video, the network device may first acquire a video frame sequence of the target video by using a video processing tool such as ffmpeg, and then sequentially extract frame images in the target video according to the video frame sequence as frame images to be encoded. Then, for each frame image to be encoded, the network device may analyze the image content of the frame image to be encoded through an image analysis technique, so as to determine the encoding weight corresponding to each image region in the frame image to be encoded. Here, the coding weight may be a numerical value indicating a coding fineness of each image region, and it may be set that the larger the coding weight is, the higher the coding fineness of the corresponding image region is, and the better the picture quality of the coded image region is. It should be noted that the image area in the frame image to be encoded may be divided after being analyzed by the image analysis technique, or may be divided manually by a technician in advance, for example, the frame image to be encoded is divided into 400 image areas according to a 20 × 20 specification, or may be divided according to a specified area size, for example, each image area is set to be equal to a size of 9 macro blocks.
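The 20 x 20 region division mentioned above can be sketched as follows; `region_grid` is an illustrative helper name, not a function from the patent:

```python
import numpy as np

def region_grid(height, width, rows=20, cols=20):
    """Return (y0, y1, x0, x1) bounds for a rows x cols division of a frame."""
    ys = np.linspace(0, height, rows + 1, dtype=int)
    xs = np.linspace(0, width, cols + 1, dtype=int)
    return [(ys[i], ys[i + 1], xs[j], xs[j + 1])
            for i in range(rows) for j in range(cols)]

# A 1080p frame divided into the 400 regions of the 20 x 20 example
regions = region_grid(1080, 1920)
```

The same helper could implement the fixed-area alternative by deriving `rows`/`cols` from a target region size instead.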
Optionally, the coding weight of the image region may be calculated by the coding weights of all pixel points included in the image region, and correspondingly, the processing in step 101 may be as follows: determining the statistical characteristic value of each pixel point of a frame image to be coded by an image analysis technology; determining the coding weight of each pixel point based on the statistical characteristic value of each pixel point and a preset characteristic value selection rule; and calculating the coding weight corresponding to each image area in the frame image to be coded according to the coding weight of each pixel point.
In implementation, the network device may determine the statistical characteristic value of each pixel point in the frame image to be encoded through an image analysis technique. The statistical characteristic value may be the initial analysis data obtained directly after analyzing the frame image to be encoded through the image analysis technique, and statistical characteristic values of different dimensions may be obtained through different image analysis techniques. Then, the network device may determine the coding weight of each pixel point based on its statistical characteristic value, using the characteristic value selection rule for the dimension to which that statistical characteristic value belongs. For example, a statistical characteristic value of dimension A is obtained through a first image analysis technique, and the characteristic value selection rule of dimension A is that the closer the value is to 1, the higher the coding weight, with the coding weight ranging from 0 to 100; a statistical characteristic value of dimension B is obtained through a second image analysis technique, and the characteristic value selection rule of dimension B is that the lower the value, the higher the coding weight, with the coding weight again ranging from 0 to 100. Furthermore, the network device may calculate, from the coding weights of all pixel points in each image region, the coding weight corresponding to each image region in the frame image to be coded according to a preset calculation formula. For example, for any image region, the average of the coding weights of all pixel points in the region (an arithmetic, geometric, square, harmonic or weighted average may be adopted as the case requires) may be used as the coding weight corresponding to that region.
Optionally, based on different image analysis techniques, different data may be used as statistical characteristic values of the pixel points, and several cases are given as follows:
in the first case, when the image analysis technology is a salient target detection technology, the frame image to be encoded is input into a salient target detection model based on a CAM (Class Activation Mapping) technology, a feature map generated in the penultimate layer in the salient target detection model is obtained, and feature data corresponding to each pixel point in the feature map is used as a statistical feature value of each pixel point of the frame image to be encoded.
In implementation, when the network device determines the statistical characteristic value of each pixel point of the frame image to be encoded using the salient object detection technique, the frame image to be encoded may be input into a salient object detection model based on the CAM technique. The CAM-based salient object detection model can be built on convolutional neural networks such as GoogLeNet, ResNet and DenseNet, with a global average pooling layer replacing the fully connected layer of the convolutional neural network so that the model gains target localization capability. Furthermore, technicians can train the salient object detection model on an image set annotated with salient objects to improve the model's detection accuracy. Then, the network device may obtain the feature map generated by the penultimate layer (i.e., the global average pooling layer) of the salient object detection model, and use the feature data corresponding to each pixel point in the feature map as the statistical characteristic value of each pixel point of the frame image to be encoded.
And in the second case, when the image analysis technology is an optical flow method, calculating an optical flow value corresponding to each pixel point of the frame image to be encoded based on the optical flow method, and taking the optical flow value as a statistical characteristic value of each pixel point.
In the implementation, the optical flow method is a method for determining the correspondence between the previous frame image and the current frame image by using the change of the pixel points in the image sequence in the time domain and the correlation between the adjacent frame images, so as to calculate the motion information of the object between the adjacent frame images. Therefore, the network device can compare the frame image to be encoded with the previous frame image, calculate the optical flow value corresponding to each pixel point of the frame image to be encoded by using the change condition of the pixel point between the two frame images, and use the optical flow value as the statistical characteristic value of each pixel point.
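Computing dense optical flow properly requires a vision library (e.g. the Farneback or Lucas-Kanade algorithms in OpenCV). As a crude, self-contained stand-in, the sketch below uses the absolute temporal difference between two frames as a per-pixel motion value; this is only a proxy for the optical flow values described above, not real optical flow:

```python
import numpy as np

def motion_feature(prev_frame, cur_frame):
    """Crude per-pixel motion proxy: absolute temporal difference between two
    8-bit grayscale frames, normalized to [0, 1]. A real encoder would use a
    dense optical flow algorithm here instead."""
    diff = np.abs(cur_frame.astype(float) - prev_frame.astype(float))
    return diff / 255.0

prev = np.zeros((8, 8))
cur = np.zeros((8, 8))
cur[2:5, 2:5] = 255.0            # a patch that "moved" between the frames
feat = motion_feature(prev, cur)
```

Pixels in the changed patch get high statistical characteristic values, mirroring how moving content would attract higher coding weights.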
And thirdly, when the image analysis technology is texture analysis, calculating the texture characteristic value of each pixel point of the frame image to be coded, and taking the texture characteristic value as the statistical characteristic value of each pixel point.
In implementation, texture is an inherent characteristic of an object surface, and may be considered as an appearance feature formed by gray scale or color in a certain change rule in space, and different regions in an image often have different textures. Therefore, the network equipment can calculate the texture characteristic value of each pixel point of the frame image to be coded in a texture analysis mode, and the texture characteristic value can be used as the statistical characteristic value of each pixel point.
In case four, when the image analysis technology is the target detection technology, inputting the frame image to be coded into the target detection model; and setting corresponding statistical characteristic values for pixel points of each image content in the frame image to be coded according to the image content detection result output by the target detection model.
In implementation, an object detection model may be preset on the network device, and a plurality of objects in an image may be respectively located and classified through the object detection model. In this way, when analyzing the frame image to be encoded, the network device may input the frame image to be encoded into the object detection model, so that the object detection model may output an image content detection result, where the positions and classifications of a plurality of things (i.e., image contents) in the frame image to be encoded may be marked. Then, the network device can set corresponding statistical characteristic values for the pixel points of each image content according to the position and classification of each image content in the frame image to be encoded. Specifically, the network device may assign the same preset statistical characteristic value to the pixel points in the same type of image content, and assign different preset statistical characteristic values to the pixel points in different types of image content.
It should be noted that, to facilitate subsequent calculation of the coding weights, the statistical characteristic values of the pixel points may be normalized to the range [0, 1]. Letting cam be an original statistical characteristic value, min be the minimum and max be the maximum of the statistical characteristic values, the normalization may be computed as (cam - min)/(max - min). Further, to reduce the data processing load of image analysis, the frame image to be encoded may be pre-processed: for example, it may first be scaled to a fixed resolution; then the image data may be standardized per channel (an image generally comprises the three channels R, G and B), where mean and std are the per-channel average and standard deviation set from empirical values. For the three RGB channels, typical values are mean = (0.485, 0.456, 0.406) and std = (0.229, 0.224, 0.225), and the standardization may be computed as (image_channel - mean_channel)/std_channel.
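Both normalizations can be sketched directly in numpy; note that min-max normalization divides by (max - min) so the result lands exactly in [0, 1]:

```python
import numpy as np

def normalize_cam(cam):
    """Min-max normalize statistical characteristic values into [0, 1]."""
    return (cam - cam.min()) / (cam.max() - cam.min())

def normalize_image(image, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Per-channel standardization of an HxWx3 RGB image scaled to [0, 1]."""
    mean = np.asarray(mean)
    std = np.asarray(std)
    return (image - mean) / std           # broadcasts over the channel axis

cam = np.array([[2.0, 4.0], [6.0, 10.0]])
n = normalize_cam(cam)
out = normalize_image(np.full((2, 2, 3), 0.5))
```

The mean/std defaults are the empirical RGB values quoted in the text.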
Optionally, the coding weight corresponding to each image region may be calculated by taking a macroblock as a unit, and the corresponding processing may be as follows: for a target image area, determining all macro blocks contained in the target image area; taking the average value of the coding weight values of all pixel points contained in each macro block as the coding weight value of the macro block; and recording the coding weight values of all the macro blocks in the target image area as the corresponding coding weight values of the target image area.
The target image area may be any image area in the frame image to be encoded.
In the implementation, a frame image in video coding can be generally divided into a plurality of macro blocks, and the macro blocks are used as units in the coding process, and the macro blocks are coded one by one, so that a continuous video code stream can be organized. Therefore, when the network device calculates the coding weight corresponding to the target image region, the network device may divide the target image region into a plurality of macro blocks, and then calculate an average value of the coding weights of all pixel points included in each macro block. Furthermore, the network device may use the average value as a coding weight of each macroblock, and then use the coding weights of all macroblocks in the target image region as the coding weights corresponding to the target image region at the same time. It can be understood that when an image region is subdivided into macroblock granularities, one image region can simultaneously correspond to a plurality of coding weights, so that when quantization parameters are subsequently set, different quantization parameters can be set in one image region by taking a macroblock as a unit, thereby further improving the fineness of video frame image coding and improving the quality of a coded video picture.
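Averaging pixel weights over macroblocks can be done with a single reshape; the 16 x 16 macroblock size is an assumption (common in H.264), and the frame dimensions are assumed to be exact multiples of it:

```python
import numpy as np

def macroblock_weights(pixel_weights, mb=16):
    """Arithmetic mean of per-pixel coding weights over each mb x mb
    macroblock. Assumes frame dimensions are multiples of mb."""
    h, w = pixel_weights.shape
    blocks = pixel_weights.reshape(h // mb, mb, w // mb, mb)
    return blocks.mean(axis=(1, 3))       # one weight per macroblock

weights = np.zeros((32, 32))
weights[:16, :16] = 1.0                   # one salient macroblock
mb_w = macroblock_weights(weights)
```

Each entry of `mb_w` is the per-macroblock coding weight that the quantization step can later consume individually.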
Optionally, the encoding weight of each pixel point may be recorded in the form of a two-dimensional mask map, and the following processing may correspondingly exist: and constructing a two-dimensional mask image with the resolution being the same as that of the frame image to be coded, and recording the coding weight of each pixel point of the frame image to be coded at the position corresponding to each pixel point in the two-dimensional mask image.
In implementation, after determining the encoding weight of each pixel point in the frame image to be encoded, the network device may first construct a two-dimensional mask image with the same resolution as the frame image to be encoded, where the number and arrangement of the pixel points in the two-dimensional mask image are consistent with the frame image to be encoded, and the pixel points therein correspond to the pixel points in the frame image to be encoded one to one. Then, the network device may record the coding weight of each pixel point in the frame image to be coded by using the two-dimensional mask image, and specifically may record the coding weight of the corresponding pixel point in the frame image to be coded at the position corresponding to each pixel point in the two-dimensional mask image. Specifically, as shown in fig. 2, the left side is a frame image to be encoded, which includes 64 pixel points, and the right side is a corresponding two-dimensional mask image, and 64 encoding weights are recorded (specific numerical values are only used for illustration and are not actual values). Therefore, the two-dimensional mask image is adopted to record the coding weight, the data can be accurately and orderly recorded, and the subsequent manual consulting and checking of the coding weight are facilitated.
Optionally, the data may be compressed by clipping the two-dimensional mask map, and the corresponding processing may be as follows: clipping the two-dimensional mask image according to the coding weight recorded at each pixel point in the two-dimensional mask image; and reserving all target pixel points of which the coding weights in the two-dimensional mask image meet the preset value standard, and recording the position information of the target pixel points.
In implementation, a technician may set a preset value standard for the coding weight in advance; when the coding weight of a pixel point does not satisfy the preset value standard, a uniform default value may be used as its coding weight, so that even if that pixel point's coding weight is not recorded, the subsequent determination of the corresponding quantization parameter is not affected. Based on this setting, after the network device records the coding weight of each pixel point of the frame image to be coded in the two-dimensional mask image, it can clip the two-dimensional mask image according to the coding weight recorded at each pixel point and the preset value standard, retaining all target pixel points whose coding weights satisfy the preset value standard while recording the position information of those target pixel points. Then, in the subsequent determination of quantization parameters, the default value can be used as the coding weight of the clipped pixel points. Specifically, during clipping, the two-dimensional mask map can first be traversed to find continuous regions whose area exceeds a preset threshold but which contain fewer target pixel points than a certain value, and those continuous regions can be determined as regions to be clipped. Further, when recording the position information of the target pixel points, for a target area composed of a large number of target pixel points, only the position information of its first and last target pixel points needs to be recorded, rather than that of every target pixel point. For example, as shown in fig. 3, if the preset value standard is that the coding weight is non-zero and differs from zero by more than a preset value, the two-dimensional mask map may contain a plurality of regions to be clipped, and the clipped two-dimensional mask map retains a large number of target pixel points and only a small number of pixel points whose coding weight is zero. Therefore, clipping the two-dimensional mask image compresses the intermediate data of the encoding process to a certain extent, saving data storage space and reducing the amount of data transmitted within the network device during video encoding.
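The clipping idea (keep only pixels whose weight meets the value standard, record their positions, and fall back to a default for everything else) can be sketched with a simple coordinate list. The first/last-position run encoding described in the text is a further compression of the same information, omitted here for brevity:

```python
import numpy as np

def clip_mask(mask, threshold=0.0):
    """Keep only pixels whose coding weight exceeds `threshold`; return
    (row, col, weight) entries for the retained target pixel points."""
    ys, xs = np.nonzero(mask > threshold)
    return list(zip(ys.tolist(), xs.tolist(), mask[ys, xs].tolist()))

def restore_mask(entries, shape, default=0.0):
    """Rebuild the mask, using `default` for every clipped pixel."""
    mask = np.full(shape, default)
    for y, x, v in entries:
        mask[y, x] = v
    return mask

m = np.zeros((4, 4))
m[1, 2] = 0.8
m[3, 0] = 0.5
entries = clip_mask(m)
```

For a mostly-default mask this sparse form stores far fewer values than the full two-dimensional array.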
And 102, setting a quantization parameter corresponding to each image area according to the coding weight corresponding to each image area.
In implementation, after determining a coding weight corresponding to each image region in a frame image to be coded, the network device may set a quantization parameter corresponding to each image region according to the coding weight. The quantization parameter (may be abbreviated as QP) may be used to reflect a compression condition of image details, when the QP value is small, most of the image details are retained, when the QP value is large, some of the image details are lost, and the distortion of the encoded image is high and the picture quality is degraded. Therefore, for an image region with a larger coding weight, the required coding fineness is higher, and the quantization parameter corresponding to the image region can be properly reduced; for an image region with a smaller encoding weight, the required encoding fineness is lower, and the quantization parameter corresponding to the image region can be increased appropriately.
Optionally, based on the above situation that one image region simultaneously corresponds to the coding weights of multiple macroblocks, the processing in step 102 may specifically be as follows: and for the target macro block contained in the target image area, calculating the quantization parameter corresponding to the target macro block according to the preset quantization parameter fluctuation range and the coding weight of the target macro block.
The target macroblock may be any macroblock included in the target image area.
In implementation, after recording the coding weights of all macroblocks contained in each image area of the frame image to be coded, the network device may calculate the quantization parameter corresponding to each macroblock according to a preset quantization parameter fluctuation range and the coding weight of each macroblock. The quantization parameter fluctuation range may be preset by a technician according to the image quality requirements of the video frame images and configured in the network device. Specifically, the average image quality of the encoded video frame images and the fineness difference between image areas can be adjusted by setting different maximum and minimum values for the quantization parameter fluctuation range. For example, if higher average picture quality is required, the maximum value of the fluctuation range can be appropriately reduced; if the fineness difference between image areas needs to be small, the difference between the maximum and minimum values of the fluctuation range can be appropriately reduced. Taking a target macroblock contained in the target image area as an example, let QP_i denote the quantization parameter corresponding to the target macroblock, QP_max and QP_min the maximum and minimum values of the quantization parameter fluctuation range, and B_i the coding weight of the target macroblock (with a value range of 0 to 1); then the following formula holds:
QP_i = QP_min + (QP_max − QP_min) × (1 − B_i)
From the above formula it can be seen that the larger a macroblock's coding weight, the smaller its corresponding quantization parameter QP; when the coding weight is 1, the macroblock takes the minimum quantization parameter QP_min. Conversely, the smaller the coding weight, the larger the corresponding QP; when the coding weight is 0, the macroblock takes the maximum quantization parameter QP_max.
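A direct transcription of the formula can be sketched as below. The fluctuation-range bounds 18 and 42 used as defaults are illustrative assumptions, not values specified in the patent.

```python
def macroblock_qp(weight, qp_min=18, qp_max=42):
    """QP_i = QP_min + (QP_max - QP_min) * (1 - B_i), weight in [0, 1]."""
    # A weight of 1 yields the minimum QP (finest encoding);
    # a weight of 0 yields the maximum QP (coarsest encoding).
    return qp_min + (qp_max - qp_min) * (1 - weight)
```

Narrowing the gap between `qp_min` and `qp_max` reduces the fineness difference between image areas, exactly as the preceding paragraph describes.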
And 103, coding the frame image to be coded based on the set quantization parameter corresponding to each image area.
In implementation, after the network device sets the quantization parameter corresponding to each image region of the image to be encoded, it may encode the frame image to be coded based on those quantization parameters. Specifically, the network device may perform a discrete cosine transform on the frame image to be coded, transforming the image data in the spatial domain into DCT coefficients in the frequency domain. The network device may then quantize the transformed DCT coefficients of the frame image to be coded using the formula: q(x, y) = round(F(x, y) / Q + 0.5), where F(x, y) is the DCT coefficient obtained from the discrete cosine transform; Q is the quantization step, which has a fixed correspondence with the quantization parameter and increases as the quantization parameter increases; round() is a rounding function; and q(x, y) is the quantized value. For example, if the DCT coefficient of a pixel after the discrete cosine transform is 203 and the quantization step Q is 28, then q(x, y) = round(203/28 + 0.5) = round(7.75) = 8. Next, the network device may scan the quantized frame image to convert the image data from a two-dimensional matrix into a one-dimensional array, and then perform entropy coding, encapsulation, and other processing on the scanned one-dimensional array, completing the encoding of the frame image to be coded. Further, the network device may repeat the processing of steps 101 to 103 to encode all frame images in the video frame sequence of the target video.
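The quantization formula above can be transcribed directly; this is a sketch for scalar inputs only, and note that Python's built-in `round()` rounds exact ties to the nearest even integer, which may differ from the integer rounding a real codec uses.

```python
def quantize(dct_coeff, q_step):
    """q(x, y) = round(F(x, y) / Q + 0.5): quantize one DCT coefficient.

    A larger quantization step Q discards more detail, so fewer
    distinct output levels survive quantization.
    """
    return round(dct_coeff / q_step + 0.5)
```

With the worked example from the text, a coefficient of 203 and a step of 28 quantize to 8.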
Optionally, based on the above situation that one image region simultaneously corresponds to the coding weights of multiple macroblocks, the processing in step 103 may specifically be as follows: and coding the frame image to be coded based on the calculated quantization parameter corresponding to each macro block in each image area.
In implementation, after the network device calculates the quantization parameters corresponding to all the macroblocks in each image region in the frame image to be encoded, the network device may determine the quantization step corresponding to each macroblock by referring to the corresponding relationship between the quantization parameters and the quantization steps, then complete the quantization processing of each macroblock included in each image region according to the quantization step corresponding to each macroblock, and then perform subsequent scanning, entropy coding, encapsulation and other processing, so as to implement the encoding processing of the frame image to be encoded. In this way, the quantization processing of different degrees is executed on a plurality of macro blocks with different quantization step sizes, so that the fineness of the video frame image coding can be further improved, and the quality of the coded video picture can be improved.
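The correspondence between quantization parameter and quantization step referred to above is codec-specific. In H.264, for example, the quantization step roughly doubles for every increase of 6 in QP, with a step of about 0.625 at QP 0. A sketch of that approximate relation (offered as an illustrative mapping, not the patent's own table) is:

```python
def qp_to_qstep(qp):
    """Approximate H.264 mapping: Qstep doubles every 6 QP units.

    Qstep(QP) ~= 0.625 * 2 ** (QP / 6), so the step grows
    monotonically with the quantization parameter, as the text states.
    """
    return 0.625 * 2 ** (qp / 6)
```

An encoder would look up (or compute) this step per macroblock before running the quantization formula of step 103.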
In the embodiment of the invention, a frame image to be coded of a target video is extracted, and a coding weight corresponding to each image area in the frame image to be coded is determined through an image analysis technology; setting a quantization parameter corresponding to each image area according to the coding weight corresponding to each image area; and coding the frame image to be coded based on the set quantization parameter corresponding to each image area. Therefore, different quantization parameters are set for different image areas in the same video frame image, so that coding processing with different fine degrees can be respectively realized for a plurality of image areas, the coding quality of key picture content can be improved, the coding quality of secondary picture content can be reduced to a certain degree, coding resources can be more reasonably distributed, and the quality of coded video pictures and the video watching experience can be integrally improved.
Based on the same technical concept, an embodiment of the present invention further provides an apparatus for video encoding, as shown in fig. 4, the apparatus including:
the image analysis module 401 is configured to extract a frame image to be encoded of a target video, and determine, by using an image analysis technology, a coding weight corresponding to each image region in the frame image to be encoded;
a parameter setting module 402, configured to set a quantization parameter corresponding to each image region according to the coding weight corresponding to each image region;
the video encoding module 403 is configured to encode the frame image to be encoded based on the set quantization parameter corresponding to each image region.
Optionally, the image analysis module 401 is specifically configured to:
determining the statistical characteristic value of each pixel point of the frame image to be coded through an image analysis technology;
determining the coding weight of each pixel point based on the statistical characteristic value of each pixel point and a preset characteristic value selection rule;
and calculating the coding weight corresponding to each image area in the frame image to be coded according to the coding weight of each pixel point.
Optionally, as shown in fig. 5, the apparatus further includes:
a weight recording module 404, configured to construct a two-dimensional mask image with the same resolution as that of the frame image to be encoded, and record the encoding weight of each pixel point of the frame image to be encoded at a position corresponding to each pixel point in the two-dimensional mask image.
Fig. 6 is a schematic structural diagram of a network device according to an embodiment of the present invention. The network device 600 may vary significantly depending on configuration or performance, and may include one or more central processors 622 (e.g., one or more processors) and memory 632, and one or more storage media 630 (e.g., one or more mass storage devices) storing applications 642 or data 644. The memory 632 and the storage medium 630 may each be transient or persistent storage. The program stored on the storage medium 630 may include one or more modules (not shown), each of which may include a sequence of instructions operating on the network device 600. Still further, the central processor 622 may be configured to communicate with the storage medium 630 to execute the sequences of instructions in the storage medium 630 on the network device 600.
The network device 600 may also include one or more power supplies 629, one or more wired or wireless network interfaces 650, one or more input-output interfaces 658, one or more keyboards 656, and/or one or more operating systems 641, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc.
Network device 600 may include memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the video encoding method described above.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (12)

1. A method of video encoding, the method comprising:
extracting a frame image to be coded of a target video, dividing the frame image to be coded into a plurality of image areas through an image analysis technology, and determining a statistical characteristic value of each pixel point of the frame image to be coded;
determining the coding weight of each pixel point based on the statistical characteristic value of each pixel point and a preset characteristic value selection rule, and calculating the coding weight corresponding to each image area in the frame image to be coded according to the coding weight of each pixel point;
setting a quantization parameter corresponding to each image area according to the coding weight corresponding to each image area;
coding the frame image to be coded based on the set quantization parameter corresponding to each image area;
the calculating the coding weight corresponding to each image area in the frame image to be coded according to the coding weight of each pixel point includes: for a target image area, determining all macro blocks contained in the target image area; taking the average value of the coding weight values of all pixel points contained in each macro block as the coding weight value of the macro block; recording the coding weight values of all macro blocks in the target image area as the corresponding coding weight values of the target image area;
the setting of the quantization parameter corresponding to each image region according to the coding weight corresponding to each image region includes: for a target macro block contained in a target image area, calculating a quantization parameter corresponding to the target macro block according to a preset quantization parameter fluctuation range and a coding weight of the target macro block;
the quantization parameter corresponding to the target macro block is calculated by the following formula:
QP_i = QP_min + (QP_max − QP_min) × (1 − B_i)
wherein QP_i is the quantization parameter corresponding to the target macroblock, QP_max and QP_min are the maximum and minimum values of the quantization parameter fluctuation range, and B_i is the coding weight of the target macroblock;
the quantization parameter fluctuation range is determined by:
improving the average image quality of the coded video frame image by reducing the maximum value of the fluctuation range of the quantization parameter; the difference in fineness of the image areas is reduced by reducing the difference between the maximum value and the minimum value of the fluctuation range of the quantization parameter.
2. The method according to claim 1, wherein the determining the statistical characteristic value of each pixel point of the frame image to be encoded by using an image analysis technique comprises:
inputting the frame image to be encoded into a significant target detection model based on a CAM technology;
and acquiring a feature map generated by the second last layer in the significant target detection model, and taking feature data corresponding to each pixel point in the feature map as a statistical feature value of each pixel point of the frame image to be coded.
3. The method according to claim 1, wherein the determining the statistical characteristic value of each pixel point of the frame image to be encoded by using an image analysis technique comprises:
and calculating an optical flow value corresponding to each pixel point of the frame image to be coded based on an optical flow method, and taking the optical flow value as a statistical characteristic value of each pixel point.
4. The method according to claim 1, wherein the determining the statistical characteristic value of each pixel point of the frame image to be encoded through an image analysis technique comprises:
and calculating the texture characteristic value of each pixel point of the frame image to be coded, and taking the texture characteristic value as the statistical characteristic value of each pixel point.
5. The method according to claim 1, wherein the determining the statistical characteristic value of each pixel point of the frame image to be encoded through an image analysis technique comprises:
inputting the frame image to be coded into a target detection model;
and setting corresponding statistical characteristic values for pixel points of each image content in the frame image to be coded according to the image content detection result output by the target detection model.
6. The method according to claim 1, wherein the encoding the frame image to be encoded based on the set quantization parameter corresponding to each image region comprises:
and coding the frame image to be coded based on the calculated quantization parameter corresponding to each macro block in each image area.
7. The method according to claim 1, wherein after determining the coding weight of each pixel point based on the statistical eigenvalue of each pixel point and a preset eigenvalue selection rule, the method further comprises:
and constructing a two-dimensional mask image with the same resolution as the frame image to be coded, and recording the coding weight of each pixel point of the frame image to be coded at the position corresponding to each pixel point in the two-dimensional mask image.
8. The method of claim 7, further comprising:
cutting the two-dimensional mask image according to the coding weight recorded at each pixel point in the two-dimensional mask image;
and reserving all target pixel points in the two-dimensional mask image, of which the coding weights meet preset value standards, and recording position information of the target pixel points.
9. An apparatus for video encoding, the apparatus comprising:
the image analysis module is used for extracting a frame image to be coded of a target video, dividing the frame image to be coded into a plurality of image areas through an image analysis technology, determining a statistical characteristic value of each pixel point of the frame image to be coded, determining a coding weight of each pixel point based on the statistical characteristic value of each pixel point and a preset characteristic value selection rule, and calculating a coding weight corresponding to each image area in the frame image to be coded according to the coding weight of each pixel point;
the parameter setting module is used for setting the quantization parameter corresponding to each image area according to the coding weight corresponding to each image area;
the video coding module is used for coding the frame image to be coded based on the set quantization parameter corresponding to each image area;
the calculating the coding weight corresponding to each image area in the frame image to be coded according to the coding weight of each pixel point includes: for a target image area, determining all macro blocks contained in the target image area; taking the average value of the coding weight values of all pixel points contained in each macro block as the coding weight value of the macro block; recording the coding weight values of all macro blocks in the target image area as the corresponding coding weight values of the target image area;
the setting of the quantization parameter corresponding to each image region according to the coding weight corresponding to each image region includes: for a target macro block contained in a target image area, calculating a quantization parameter corresponding to the target macro block according to a preset quantization parameter fluctuation range and a coding weight of the target macro block;
the quantization parameter corresponding to the target macro block is calculated by the following formula:
QP_i = QP_min + (QP_max − QP_min) × (1 − B_i)
wherein QP_i is the quantization parameter corresponding to the target macroblock, QP_max and QP_min are the maximum and minimum values of the quantization parameter fluctuation range, and B_i is the coding weight of the target macroblock;
the quantization parameter fluctuation range is determined by:
improving the average image quality of the coded video frame image by reducing the maximum value of the fluctuation range of the quantization parameter; the difference in fineness of the image areas is reduced by reducing the difference between the maximum value and the minimum value of the fluctuation range of the quantization parameter.
10. The apparatus of claim 9, further comprising:
and the weight recording module is used for constructing a two-dimensional mask image with the resolution being the same as that of the frame image to be coded, and recording the coding weight of each pixel point of the frame image to be coded at the position corresponding to each pixel point in the two-dimensional mask image.
11. A network device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the method of video encoding according to any one of claims 1 to 8.
12. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of video encoding according to any one of claims 1 to 8.
CN201911157969.9A 2019-11-22 2019-11-22 Video coding method and device Active CN110996101B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911157969.9A CN110996101B (en) 2019-11-22 2019-11-22 Video coding method and device
PCT/CN2020/070701 WO2021098030A1 (en) 2019-11-22 2020-01-07 Method and apparatus for video encoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911157969.9A CN110996101B (en) 2019-11-22 2019-11-22 Video coding method and device

Publications (2)

Publication Number Publication Date
CN110996101A CN110996101A (en) 2020-04-10
CN110996101B true CN110996101B (en) 2022-05-27

Family

ID=70086001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911157969.9A Active CN110996101B (en) 2019-11-22 2019-11-22 Video coding method and device

Country Status (2)

Country Link
CN (1) CN110996101B (en)
WO (1) WO2021098030A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114025200B (en) * 2021-09-15 2022-09-16 湖南广播影视集团有限公司 Ultra-high definition post-production solution based on cloud technology
CN116156196B (en) * 2023-04-19 2023-06-30 探长信息技术(苏州)有限公司 Efficient transmission method for video data
CN116260976B (en) * 2023-05-15 2023-07-18 深圳比特耐特信息技术股份有限公司 Video data processing application system
CN116385706B (en) * 2023-06-06 2023-08-25 山东外事职业大学 Signal detection method and system based on image recognition technology

Citations (2)

Publication number Priority date Publication date Assignee Title
CN110087075A (en) * 2019-04-22 2019-08-02 浙江大华技术股份有限公司 A kind of coding method of image, code device and computer storage medium
CN110163076A (en) * 2019-03-05 2019-08-23 腾讯科技(深圳)有限公司 A kind of image processing method and relevant apparatus

Family Cites Families (12)

Publication number Priority date Publication date Assignee Title
CN101309422B (en) * 2008-06-23 2010-09-29 北京工业大学 Macroblock level quantized parameter process method and apparatus
CN102263943B (en) * 2010-05-25 2014-06-04 财团法人工业技术研究院 Video bit rate control device and method
CN103079063B (en) * 2012-12-19 2015-08-26 华南理工大学 A kind of method for video coding of vision attention region under low bit rate
JP2016082395A (en) * 2014-10-16 2016-05-16 キヤノン株式会社 Encoder, coding method and program
CN107770525B (en) * 2016-08-15 2020-07-24 华为技术有限公司 Image coding method and device
CN106101602B (en) * 2016-08-30 2019-03-29 北京北信源软件股份有限公司 A kind of method that bandwidth self-adaption improves network video quality
CN106412594A (en) * 2016-10-21 2017-02-15 乐视控股(北京)有限公司 Panoramic image encoding method and apparatus
US10904531B2 (en) * 2017-03-23 2021-01-26 Qualcomm Incorporated Adaptive parameters for coding of 360-degree video
CN107147912B (en) * 2017-05-04 2020-09-29 浙江大华技术股份有限公司 Video coding method and device
JP6946979B2 (en) * 2017-11-29 2021-10-13 富士通株式会社 Video coding device, video coding method, and video coding program
CN110225342B (en) * 2019-04-10 2021-03-09 中国科学技术大学 Video coding bit distribution system and method based on semantic distortion measurement
CN110177277B (en) * 2019-06-28 2022-04-12 广东中星微电子有限公司 Image coding method and device, computer readable storage medium and electronic equipment

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN110163076A (en) * 2019-03-05 2019-08-23 腾讯科技(深圳)有限公司 A kind of image processing method and relevant apparatus
CN110087075A (en) * 2019-04-22 2019-08-02 浙江大华技术股份有限公司 A kind of coding method of image, code device and computer storage medium

Also Published As

Publication number Publication date
CN110996101A (en) 2020-04-10
WO2021098030A1 (en) 2021-05-27

Similar Documents

Publication Publication Date Title
CN110996101B (en) Video coding method and device
JP7110370B2 (en) Using Nonlinear Functions Applied to Quantization Parameters of Machine Learning Models for Video Coding
US20210051322A1 (en) Receptive-field-conforming convolutional models for video coding
CN104135629B (en) The method, apparatus and computer readable storage medium encoded to image
JP4514818B2 (en) Video decoding device
EP1938613A2 (en) Method and apparatus for using random field models to improve picture and video compression and frame rate up conversion
CN111988611A (en) Method for determining quantization offset information, image coding method, image coding device and electronic equipment
US20030103676A1 (en) Data compression method and recording medium with data compression program recorded therein
WO2020061008A1 (en) Receptive-field-conforming convolution models for video coding
EP4344479A1 (en) Systems and methods for image filtering
CN108353193B (en) Method and apparatus for processing video data based on multiple graph-based models
CN116250008A (en) Encoding and decoding methods, encoder, decoder and encoding and decoding system of point cloud
US20230269385A1 (en) Systems and methods for improving object tracking in compressed feature data in coding of multi-dimensional data
CN108182712B (en) Image processing method, device and system
CN111182301A (en) Method, device, equipment and system for selecting optimal quantization parameter during image compression
Kavitha et al. A survey of image compression methods for low depth-of-field images and image sequences
CN116325732A (en) Decoding and encoding method, decoder, encoder and encoding and decoding system of point cloud
US11503292B2 (en) Method and apparatus for encoding/decoding video signal by using graph-based separable transform
CN101310534A (en) Method and apparatus for using random field models to improve picture and video compression and frame rate up conversion
US20230328246A1 (en) Point cloud encoding method and decoding method, and encoder and decoder
CN117115433B (en) Display abnormality detection method, device, equipment and storage medium
WO2022217472A1 (en) Point cloud encoding and decoding methods, encoder, decoder, and computer readable storage medium
US20240137506A1 (en) Systems and methods for image filtering
Antony et al. A Lossless Image Compression Based On Hierarchical Prediction and Context Adaptive Coding
Alhadi Compression of Medical Images Based on 2D-Discrete Cosine Transform and Vector Quantization Algorithms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant