CN110996101B - Video coding method and device - Google Patents

Video coding method and device

Info

Publication number
CN110996101B
Authority
CN
China
Prior art keywords
image
frame image
coded
target
coding
Prior art date
Legal status
Active
Application number
CN201911157969.9A
Other languages
Chinese (zh)
Other versions
CN110996101A (en)
Inventor
郑振贵
黄学辉
林鹏程
Current Assignee
Wangsu Science and Technology Co Ltd
Original Assignee
Wangsu Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Wangsu Science and Technology Co Ltd filed Critical Wangsu Science and Technology Co Ltd
Priority to CN201911157969.9A priority Critical patent/CN110996101B/en
Priority to PCT/CN2020/070701 priority patent/WO2021098030A1/en
Publication of CN110996101A publication Critical patent/CN110996101A/en
Application granted granted Critical
Publication of CN110996101B publication Critical patent/CN110996101B/en

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/124 — Adaptive coding characterised by the element, parameter or selection affected or controlled: quantisation
    • H04N 19/136 — Adaptive coding controlled by incoming video signal characteristics or properties
    • H04N 19/146 — Adaptive coding controlled by the data rate or code amount at the encoder output
    • H04N 19/176 — Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N 19/625 — Transform coding using discrete cosine transform [DCT]

Abstract

The invention discloses a video coding method and device, belonging to the technical field of video processing. The method comprises the following steps: extracting a frame image to be coded of a target video, and determining a coding weight corresponding to each image area in the frame image to be coded through an image analysis technology; setting a quantization parameter corresponding to each image area according to the coding weight corresponding to each image area; and coding the frame image to be coded based on the set quantization parameter corresponding to each image area. By adopting the invention, coding resources can be allocated more reasonably, and the quality of the coded video picture and the video viewing experience can be improved overall.

Description

Video coding method and device
Technical Field
The present invention relates to the field of video processing technologies, and in particular, to a method and an apparatus for video encoding.
Background
After image information is acquired by image acquisition equipment such as a camera, original video data is generated. The original video data is composed of a large number of sequentially arranged frame images, so its data volume is very large. In order to facilitate the transmission and storage of video, a video coding technique may be used to encode and compress the original video data, removing its redundant information, before the encoded video data is transmitted or stored.
Existing video coding mainly includes intra-frame coding and inter-frame coding. In intra-frame coding, the frame image is processed sequentially by discrete cosine transform, quantization, entropy coding and the like to obtain compressed image data. In inter-frame coding, a motion vector between the frame image to be coded and a reference frame image is calculated, a predicted image is generated from the reference frame image and the motion vector, the predicted image is compared with the frame image to be coded to generate a difference image, and the difference image is then processed sequentially by discrete cosine transform, quantization, entropy coding and the like to obtain the compressed image data.
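The transform-then-quantize step common to both coding modes can be illustrated with a minimal numpy sketch. This is a generic textbook 8 x 8 DCT-plus-uniform-quantization example, not the patent's implementation, and it omits the entropy-coding stage:

```python
import numpy as np

def dct2_matrix(n=8):
    # Orthonormal DCT-II basis: row k holds cos(pi*(2j+1)*k/(2n)) over samples j
    idx = np.arange(n)
    C = np.cos(np.pi * (2 * idx[None, :] + 1) * idx[:, None] / (2 * n))
    C[0, :] /= np.sqrt(2)          # DC row scaled for orthonormality
    return C * np.sqrt(2.0 / n)

def encode_block(block, q_step):
    # Forward 2-D DCT followed by uniform scalar quantization
    C = dct2_matrix(block.shape[0])
    coeffs = C @ block @ C.T
    return np.round(coeffs / q_step).astype(int)

# A flat 8x8 block compresses to a single DC coefficient after quantization.
q = encode_block(np.full((8, 8), 16.0), q_step=16.0)
```

A larger `q_step` discards more detail, which is exactly the lever the quantization parameter controls in the method described below.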
In the process of implementing the invention, the inventor finds that the prior art has at least the following problems:
at present, when a frame image is quantized, it is often divided into a plurality of image blocks, and then either all image blocks are quantized with the same quantization parameter, or the quantization parameter is selected directly according to the content complexity of each image block. However, both quantization methods may cause secondary picture content to be encoded too finely while key picture content cannot be encoded finely enough; that is, the device's encoding resources are not allocated reasonably, so the encoded video picture offers a poor viewing experience and poor picture quality.
Disclosure of Invention
To solve the problems of the prior art, embodiments of the present invention provide a method and apparatus for video encoding. The technical scheme is as follows:
in a first aspect, a method for video coding is provided, the method comprising:
extracting a frame image to be coded of a target video, and determining a coding weight corresponding to each image area in the frame image to be coded through an image analysis technology;
setting a quantization parameter corresponding to each image area according to the coding weight corresponding to each image area;
and coding the frame image to be coded based on the set quantization parameter corresponding to each image area.
In a second aspect, an apparatus for video encoding is provided, the apparatus comprising:
the image analysis module is used for extracting a frame image to be coded of a target video and determining a coding weight corresponding to each image area in the frame image to be coded through an image analysis technology;
the parameter setting module is used for setting the quantization parameter corresponding to each image area according to the coding weight corresponding to each image area;
and the video coding module is used for coding the frame image to be coded based on the set quantization parameter corresponding to each image area.
In a third aspect, there is provided a network device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement the method of video encoding according to the first aspect.
In a fourth aspect, there is provided a computer readable storage medium having stored therein at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by a processor to implement the method of video encoding according to the first aspect.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, a frame image to be coded of a target video is extracted, and a coding weight corresponding to each image area in the frame image to be coded is determined through an image analysis technology; setting a quantization parameter corresponding to each image area according to the coding weight corresponding to each image area; and coding the frame image to be coded based on the set quantization parameter corresponding to each image area. Therefore, different quantization parameters are set for different image areas in the same video frame image, so that coding processing with different fine degrees can be respectively realized for a plurality of image areas, the coding quality of key picture content can be improved, the coding quality of secondary picture content can be reduced to a certain degree, coding resources can be more reasonably distributed, and the quality of coded video pictures and the video watching experience can be integrally improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of a method for video encoding according to an embodiment of the present invention;
FIG. 2 is a simplified diagram of a two-dimensional mask map according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a clipped two-dimensional mask map according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an apparatus for video encoding according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an apparatus for video encoding according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a network device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The embodiment of the invention provides a video coding method, whose executing entity may be any network device with a video frame image processing function. The network device may be configured to perform image analysis on the acquired video frame image and to encode the video frame image based on the image analysis result, so that original analog data can be converted into digital data, facilitating the transmission and storage of video data. During encoding, the network device can encode different image areas in the same video frame image with different degrees of fineness, so that different image areas in the same video frame image can have different levels of picture quality. The network device may include a processor, a memory and a transceiver: the processor may be configured to perform the video encoding described in the following procedures, the memory may be configured to store the data required and generated in the following procedures, and the transceiver may be configured to receive and transmit related data in the following procedures.
The process flow shown in fig. 1 will be described in detail below with reference to specific embodiments, and the contents may be as follows:
step 101, extracting a frame image to be coded of a target video, and determining a coding weight corresponding to each image area in the frame image to be coded through an image analysis technology.
In implementation, after acquiring the original video data of the target video, the network device may first acquire a video frame sequence of the target video by using a video processing tool such as ffmpeg, and then sequentially extract frame images in the target video according to the video frame sequence as frame images to be encoded. Then, for each frame image to be encoded, the network device may analyze the image content of the frame image to be encoded through an image analysis technique, so as to determine the encoding weight corresponding to each image region in the frame image to be encoded. Here, the coding weight may be a numerical value indicating a coding fineness of each image region, and it may be set that the larger the coding weight is, the higher the coding fineness of the corresponding image region is, and the better the picture quality of the coded image region is. It should be noted that the image area in the frame image to be encoded may be divided after being analyzed by the image analysis technique, or may be divided manually by a technician in advance, for example, the frame image to be encoded is divided into 400 image areas according to a 20 × 20 specification, or may be divided according to a specified area size, for example, each image area is set to be equal to a size of 9 macro blocks.
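The 20 x 20 region division mentioned above can be sketched as follows; `region_grid` is an illustrative helper name, not a function from the patent:

```python
import numpy as np

def region_grid(height, width, rows=20, cols=20):
    """Return (y0, y1, x0, x1) bounds for a rows x cols division of a frame."""
    ys = np.linspace(0, height, rows + 1, dtype=int)
    xs = np.linspace(0, width, cols + 1, dtype=int)
    return [(ys[i], ys[i + 1], xs[j], xs[j + 1])
            for i in range(rows) for j in range(cols)]

# A 1080p frame divided into the 400 regions of the 20 x 20 example
regions = region_grid(1080, 1920)
```

The same helper could implement the fixed-area alternative by deriving `rows`/`cols` from a target region size instead.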
Optionally, the coding weight of the image region may be calculated by the coding weights of all pixel points included in the image region, and correspondingly, the processing in step 101 may be as follows: determining the statistical characteristic value of each pixel point of a frame image to be coded by an image analysis technology; determining the coding weight of each pixel point based on the statistical characteristic value of each pixel point and a preset characteristic value selection rule; and calculating the coding weight corresponding to each image area in the frame image to be coded according to the coding weight of each pixel point.
In implementation, the network device may determine the statistical characteristic value of each pixel point in the frame image to be encoded through an image analysis technique. The statistical characteristic value may be the initial analysis data obtained directly after analyzing the frame image to be encoded through the image analysis technique, and statistical characteristic values of different dimensions may be obtained through different image analysis techniques. Then, the network device may determine the coding weight of each pixel point based on its statistical characteristic value, using the characteristic value selection rule for the dimension to which that statistical characteristic value belongs. For example, a statistical characteristic value of dimension A is obtained through a first image analysis technique, and the characteristic value selection rule of dimension A is that the closer the value is to 1, the higher the coding weight, with the coding weight ranging from 0 to 100; a statistical characteristic value of dimension B is obtained through a second image analysis technique, and the characteristic value selection rule of dimension B is that the lower the value, the higher the coding weight, with the coding weight again ranging from 0 to 100. Furthermore, the network device may calculate, from the coding weights of all pixel points in each image region, the coding weight corresponding to each image region in the frame image to be coded according to a preset calculation formula. For example, for any image region, the average of the coding weights of all pixel points in the region (an arithmetic, geometric, square, harmonic or weighted average may be adopted as the case requires) may be used as the coding weight corresponding to that region.
Optionally, based on different image analysis techniques, different data may be used as statistical characteristic values of the pixel points, and several cases are given as follows:
in the first case, when the image analysis technology is a salient target detection technology, the frame image to be encoded is input into a salient target detection model based on a CAM (Class Activation Mapping) technology, a feature map generated in the penultimate layer in the salient target detection model is obtained, and feature data corresponding to each pixel point in the feature map is used as a statistical feature value of each pixel point of the frame image to be encoded.
In implementation, when the network device determines the statistical characteristic value of each pixel point of the frame image to be encoded using the salient object detection technique, the frame image to be encoded may be input into a salient object detection model based on the CAM technique. The CAM-based salient object detection model can be built on convolutional neural networks such as GoogLeNet, ResNet and DenseNet, with a global average pooling layer replacing the fully connected layer of the convolutional neural network so that the model gains target localization capability. Furthermore, technicians can train the salient object detection model on an image set annotated with salient objects to improve the model's detection accuracy. Then, the network device may obtain the feature map generated by the penultimate layer (i.e., the global average pooling layer) of the salient object detection model, and use the feature data corresponding to each pixel point in the feature map as the statistical characteristic value of each pixel point of the frame image to be encoded.
And in the second case, when the image analysis technology is an optical flow method, calculating an optical flow value corresponding to each pixel point of the frame image to be encoded based on the optical flow method, and taking the optical flow value as a statistical characteristic value of each pixel point.
In the implementation, the optical flow method is a method for determining the correspondence between the previous frame image and the current frame image by using the change of the pixel points in the image sequence in the time domain and the correlation between the adjacent frame images, so as to calculate the motion information of the object between the adjacent frame images. Therefore, the network device can compare the frame image to be encoded with the previous frame image, calculate the optical flow value corresponding to each pixel point of the frame image to be encoded by using the change condition of the pixel point between the two frame images, and use the optical flow value as the statistical characteristic value of each pixel point.
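Computing dense optical flow properly requires a vision library (e.g. the Farneback or Lucas-Kanade algorithms in OpenCV). As a crude, self-contained stand-in, the sketch below uses the absolute temporal difference between two frames as a per-pixel motion value; this is only a proxy for the optical flow values described above, not real optical flow:

```python
import numpy as np

def motion_feature(prev_frame, cur_frame):
    """Crude per-pixel motion proxy: absolute temporal difference between two
    8-bit grayscale frames, normalized to [0, 1]. A real encoder would use a
    dense optical flow algorithm here instead."""
    diff = np.abs(cur_frame.astype(float) - prev_frame.astype(float))
    return diff / 255.0

prev = np.zeros((8, 8))
cur = np.zeros((8, 8))
cur[2:5, 2:5] = 255.0            # a patch that "moved" between the frames
feat = motion_feature(prev, cur)
```

Pixels in the changed patch get high statistical characteristic values, mirroring how moving content would attract higher coding weights.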
And thirdly, when the image analysis technology is texture analysis, calculating the texture characteristic value of each pixel point of the frame image to be coded, and taking the texture characteristic value as the statistical characteristic value of each pixel point.
In implementation, texture is an inherent characteristic of an object surface, and may be considered as an appearance feature formed by gray scale or color in a certain change rule in space, and different regions in an image often have different textures. Therefore, the network equipment can calculate the texture characteristic value of each pixel point of the frame image to be coded in a texture analysis mode, and the texture characteristic value can be used as the statistical characteristic value of each pixel point.
In case four, when the image analysis technology is the target detection technology, inputting the frame image to be coded into the target detection model; and setting corresponding statistical characteristic values for pixel points of each image content in the frame image to be coded according to the image content detection result output by the target detection model.
In implementation, an object detection model may be preset on the network device, and a plurality of objects in an image may be respectively located and classified through the object detection model. In this way, when analyzing the frame image to be encoded, the network device may input the frame image to be encoded into the object detection model, so that the object detection model may output an image content detection result, where the positions and classifications of a plurality of things (i.e., image contents) in the frame image to be encoded may be marked. Then, the network device can set corresponding statistical characteristic values for the pixel points of each image content according to the position and classification of each image content in the frame image to be encoded. Specifically, the network device may assign the same preset statistical characteristic value to the pixel points in the same type of image content, and assign different preset statistical characteristic values to the pixel points in different types of image content.
It should be noted that, to facilitate subsequent calculation of the coding weights, the statistical characteristic values of the pixel points may be normalized to the range [0, 1]. Letting cam be an original statistical characteristic value, min be the minimum and max be the maximum of the statistical characteristic values, the normalization may be computed as (cam - min)/(max - min). Further, to reduce the data processing load of image analysis, the frame image to be encoded may be pre-processed: for example, it may first be scaled to a fixed resolution; then the image data may be standardized per channel (an image generally comprises the three channels R, G and B), where mean and std are the per-channel average and standard deviation set from empirical values. For the three RGB channels, typical values are mean = (0.485, 0.456, 0.406) and std = (0.229, 0.224, 0.225), and the standardization may be computed as (image_channel - mean_channel)/std_channel.
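Both normalizations can be sketched directly in numpy; note that min-max normalization divides by (max - min) so the result lands exactly in [0, 1]:

```python
import numpy as np

def normalize_cam(cam):
    """Min-max normalize statistical characteristic values into [0, 1]."""
    return (cam - cam.min()) / (cam.max() - cam.min())

def normalize_image(image, mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)):
    """Per-channel standardization of an HxWx3 RGB image scaled to [0, 1]."""
    mean = np.asarray(mean)
    std = np.asarray(std)
    return (image - mean) / std           # broadcasts over the channel axis

cam = np.array([[2.0, 4.0], [6.0, 10.0]])
n = normalize_cam(cam)
out = normalize_image(np.full((2, 2, 3), 0.5))
```

The mean/std defaults are the empirical RGB values quoted in the text.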
Optionally, the coding weight corresponding to each image region may be calculated by taking a macroblock as a unit, and the corresponding processing may be as follows: for a target image area, determining all macro blocks contained in the target image area; taking the average value of the coding weight values of all pixel points contained in each macro block as the coding weight value of the macro block; and recording the coding weight values of all the macro blocks in the target image area as the corresponding coding weight values of the target image area.
The target image area may be any image area in the frame image to be encoded.
In the implementation, a frame image in video coding can be generally divided into a plurality of macro blocks, and the macro blocks are used as units in the coding process, and the macro blocks are coded one by one, so that a continuous video code stream can be organized. Therefore, when the network device calculates the coding weight corresponding to the target image region, the network device may divide the target image region into a plurality of macro blocks, and then calculate an average value of the coding weights of all pixel points included in each macro block. Furthermore, the network device may use the average value as a coding weight of each macroblock, and then use the coding weights of all macroblocks in the target image region as the coding weights corresponding to the target image region at the same time. It can be understood that when an image region is subdivided into macroblock granularities, one image region can simultaneously correspond to a plurality of coding weights, so that when quantization parameters are subsequently set, different quantization parameters can be set in one image region by taking a macroblock as a unit, thereby further improving the fineness of video frame image coding and improving the quality of a coded video picture.
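Averaging pixel weights over macroblocks can be done with a single reshape; the 16 x 16 macroblock size is an assumption (common in H.264), and the frame dimensions are assumed to be exact multiples of it:

```python
import numpy as np

def macroblock_weights(pixel_weights, mb=16):
    """Arithmetic mean of per-pixel coding weights over each mb x mb
    macroblock. Assumes frame dimensions are multiples of mb."""
    h, w = pixel_weights.shape
    blocks = pixel_weights.reshape(h // mb, mb, w // mb, mb)
    return blocks.mean(axis=(1, 3))       # one weight per macroblock

weights = np.zeros((32, 32))
weights[:16, :16] = 1.0                   # one salient macroblock
mb_w = macroblock_weights(weights)
```

Each entry of `mb_w` is the per-macroblock coding weight that the quantization step can later consume individually.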
Optionally, the encoding weight of each pixel point may be recorded in the form of a two-dimensional mask map, and the following processing may correspondingly exist: and constructing a two-dimensional mask image with the resolution being the same as that of the frame image to be coded, and recording the coding weight of each pixel point of the frame image to be coded at the position corresponding to each pixel point in the two-dimensional mask image.
In implementation, after determining the encoding weight of each pixel point in the frame image to be encoded, the network device may first construct a two-dimensional mask image with the same resolution as the frame image to be encoded, where the number and arrangement of the pixel points in the two-dimensional mask image are consistent with the frame image to be encoded, and the pixel points therein correspond to the pixel points in the frame image to be encoded one to one. Then, the network device may record the coding weight of each pixel point in the frame image to be coded by using the two-dimensional mask image, and specifically may record the coding weight of the corresponding pixel point in the frame image to be coded at the position corresponding to each pixel point in the two-dimensional mask image. Specifically, as shown in fig. 2, the left side is a frame image to be encoded, which includes 64 pixel points, and the right side is a corresponding two-dimensional mask image, and 64 encoding weights are recorded (specific numerical values are only used for illustration and are not actual values). Therefore, the two-dimensional mask image is adopted to record the coding weight, the data can be accurately and orderly recorded, and the subsequent manual consulting and checking of the coding weight are facilitated.
Optionally, the data may be compressed by clipping the two-dimensional mask map, and the corresponding processing may be as follows: clipping the two-dimensional mask image according to the coding weight recorded at each pixel point in the two-dimensional mask image; and reserving all target pixel points of which the coding weights in the two-dimensional mask image meet the preset value standard, and recording the position information of the target pixel points.
In implementation, a technician may set a preset value standard for the coding weight in advance; when the coding weight of a pixel point does not satisfy the preset value standard, a uniform default value may be used as its coding weight, so that even if that pixel point's coding weight is not recorded, the subsequent determination of the corresponding quantization parameter is not affected. Based on this setting, after the network device records the coding weight of each pixel point of the frame image to be coded in the two-dimensional mask image, it can clip the two-dimensional mask image according to the coding weight recorded at each pixel point and the preset value standard, retaining all target pixel points whose coding weights satisfy the preset value standard while recording the position information of those target pixel points. Then, in the subsequent determination of quantization parameters, the default value can be used as the coding weight of the clipped pixel points. Specifically, during clipping, the two-dimensional mask map can first be traversed to find continuous regions whose area exceeds a preset threshold but which contain fewer target pixel points than a certain value, and those continuous regions can be determined as regions to be clipped. Further, when recording the position information of the target pixel points, for a target area composed of a large number of target pixel points, only the position information of its first and last target pixel points needs to be recorded, rather than that of every target pixel point. For example, as shown in fig. 3, if the preset value standard is that the coding weight is non-zero and differs from zero by more than a preset value, the two-dimensional mask map may contain a plurality of regions to be clipped, and the clipped two-dimensional mask map retains a large number of target pixel points and only a small number of pixel points whose coding weight is zero. Therefore, clipping the two-dimensional mask image compresses the intermediate data of the encoding process to a certain extent, saving data storage space and reducing the amount of data transmitted within the network device during video encoding.
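The clipping idea (keep only pixels whose weight meets the value standard, record their positions, and fall back to a default for everything else) can be sketched with a simple coordinate list. The first/last-position run encoding described in the text is a further compression of the same information, omitted here for brevity:

```python
import numpy as np

def clip_mask(mask, threshold=0.0):
    """Keep only pixels whose coding weight exceeds `threshold`; return
    (row, col, weight) entries for the retained target pixel points."""
    ys, xs = np.nonzero(mask > threshold)
    return list(zip(ys.tolist(), xs.tolist(), mask[ys, xs].tolist()))

def restore_mask(entries, shape, default=0.0):
    """Rebuild the mask, using `default` for every clipped pixel."""
    mask = np.full(shape, default)
    for y, x, v in entries:
        mask[y, x] = v
    return mask

m = np.zeros((4, 4))
m[1, 2] = 0.8
m[3, 0] = 0.5
entries = clip_mask(m)
```

For a mostly-default mask this sparse form stores far fewer values than the full two-dimensional array.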
And 102, setting a quantization parameter corresponding to each image area according to the coding weight corresponding to each image area.
In implementation, after determining a coding weight corresponding to each image region in a frame image to be coded, the network device may set a quantization parameter corresponding to each image region according to the coding weight. The quantization parameter (may be abbreviated as QP) may be used to reflect a compression condition of image details, when the QP value is small, most of the image details are retained, when the QP value is large, some of the image details are lost, and the distortion of the encoded image is high and the picture quality is degraded. Therefore, for an image region with a larger coding weight, the required coding fineness is higher, and the quantization parameter corresponding to the image region can be properly reduced; for an image region with a smaller encoding weight, the required encoding fineness is lower, and the quantization parameter corresponding to the image region can be increased appropriately.
Optionally, based on the above situation that one image region simultaneously corresponds to the coding weights of multiple macroblocks, the processing in step 102 may specifically be as follows: and for the target macro block contained in the target image area, calculating the quantization parameter corresponding to the target macro block according to the preset quantization parameter fluctuation range and the coding weight of the target macro block.
The target macroblock may be any macroblock included in the target image area.
In implementation, after recording the coding weights of all macroblocks contained in each image area of the frame image to be coded, the network device may calculate the quantization parameter corresponding to each macroblock according to a preset quantization parameter fluctuation range and the coding weight of each macroblock. The quantization parameter fluctuation range may be preset by a technician according to the image quality requirements of the video frame images and configured in the network device. Specifically, the average image quality of the encoded video frame images and the fineness difference between image areas can be adjusted by setting different maximum and minimum values for the quantization parameter fluctuation range. For example, if higher average picture quality is required, the maximum value of the fluctuation range can be appropriately reduced; if the fineness difference between image areas needs to be small, the difference between the maximum and minimum values of the fluctuation range can be appropriately reduced. Taking a target macroblock contained in the target image area as an example, let QP_i denote the quantization parameter corresponding to the target macroblock, QP_max and QP_min the maximum and minimum values of the quantization parameter fluctuation range, and B_i the coding weight of the target macroblock (with a value range of 0 to 1); then the following formula holds:
QP_i = QP_min + (QP_max − QP_min) × (1 − B_i)
From the above formula it can be seen that the larger a macroblock's coding weight, the smaller its corresponding quantization parameter QP; when the coding weight is 1, the macroblock takes the minimum quantization parameter QP_min. Conversely, the smaller the coding weight, the larger the corresponding QP; when the coding weight is 0, the macroblock takes the maximum quantization parameter QP_max.
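A direct transcription of the formula can be sketched as below. The fluctuation-range bounds 18 and 42 used as defaults are illustrative assumptions, not values specified in the patent.

```python
def macroblock_qp(weight, qp_min=18, qp_max=42):
    """QP_i = QP_min + (QP_max - QP_min) * (1 - B_i), weight in [0, 1]."""
    # A weight of 1 yields the minimum QP (finest encoding);
    # a weight of 0 yields the maximum QP (coarsest encoding).
    return qp_min + (qp_max - qp_min) * (1 - weight)
```

Narrowing the gap between `qp_min` and `qp_max` reduces the fineness difference between image areas, exactly as the preceding paragraph describes.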
And 103, coding the frame image to be coded based on the set quantization parameter corresponding to each image area.
In implementation, after the network device sets the quantization parameter corresponding to each image region of the image to be encoded, it may encode the frame image to be coded based on those quantization parameters. Specifically, the network device may perform a discrete cosine transform on the frame image to be coded, transforming the image data in the spatial domain into DCT coefficients in the frequency domain. The network device may then quantize the transformed DCT coefficients of the frame image to be coded using the formula: q(x, y) = round(F(x, y) / Q + 0.5), where F(x, y) is the DCT coefficient obtained from the discrete cosine transform; Q is the quantization step, which has a fixed correspondence with the quantization parameter and increases as the quantization parameter increases; round() is a rounding function; and q(x, y) is the quantized value. For example, if the DCT coefficient of a pixel after the discrete cosine transform is 203 and the quantization step Q is 28, then q(x, y) = round(203/28 + 0.5) = round(7.75) = 8. Next, the network device may scan the quantized frame image to convert the image data from a two-dimensional matrix into a one-dimensional array, and then perform entropy coding, encapsulation, and other processing on the scanned one-dimensional array, completing the encoding of the frame image to be coded. Further, the network device may repeat the processing of steps 101 to 103 to encode all frame images in the video frame sequence of the target video.
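The quantization formula above can be transcribed directly; this is a sketch for scalar inputs only, and note that Python's built-in `round()` rounds exact ties to the nearest even integer, which may differ from the integer rounding a real codec uses.

```python
def quantize(dct_coeff, q_step):
    """q(x, y) = round(F(x, y) / Q + 0.5): quantize one DCT coefficient.

    A larger quantization step Q discards more detail, so fewer
    distinct output levels survive quantization.
    """
    return round(dct_coeff / q_step + 0.5)
```

With the worked example from the text, a coefficient of 203 and a step of 28 quantize to 8.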
Optionally, based on the above situation that one image region simultaneously corresponds to the coding weights of multiple macroblocks, the processing in step 103 may specifically be as follows: and coding the frame image to be coded based on the calculated quantization parameter corresponding to each macro block in each image area.
In implementation, after the network device calculates the quantization parameters corresponding to all the macroblocks in each image region in the frame image to be encoded, the network device may determine the quantization step corresponding to each macroblock by referring to the corresponding relationship between the quantization parameters and the quantization steps, then complete the quantization processing of each macroblock included in each image region according to the quantization step corresponding to each macroblock, and then perform subsequent scanning, entropy coding, encapsulation and other processing, so as to implement the encoding processing of the frame image to be encoded. In this way, the quantization processing of different degrees is executed on a plurality of macro blocks with different quantization step sizes, so that the fineness of the video frame image coding can be further improved, and the quality of the coded video picture can be improved.
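The correspondence between quantization parameter and quantization step referred to above is codec-specific. In H.264, for example, the quantization step roughly doubles for every increase of 6 in QP, with a step of about 0.625 at QP 0. A sketch of that approximate relation (offered as an illustrative mapping, not the patent's own table) is:

```python
def qp_to_qstep(qp):
    """Approximate H.264 mapping: Qstep doubles every 6 QP units.

    Qstep(QP) ~= 0.625 * 2 ** (QP / 6), so the step grows
    monotonically with the quantization parameter, as the text states.
    """
    return 0.625 * 2 ** (qp / 6)
```

An encoder would look up (or compute) this step per macroblock before running the quantization formula of step 103.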
In the embodiment of the invention, a frame image to be coded of a target video is extracted, and a coding weight corresponding to each image area in the frame image to be coded is determined through an image analysis technology; setting a quantization parameter corresponding to each image area according to the coding weight corresponding to each image area; and coding the frame image to be coded based on the set quantization parameter corresponding to each image area. Therefore, different quantization parameters are set for different image areas in the same video frame image, so that coding processing with different fine degrees can be respectively realized for a plurality of image areas, the coding quality of key picture content can be improved, the coding quality of secondary picture content can be reduced to a certain degree, coding resources can be more reasonably distributed, and the quality of coded video pictures and the video watching experience can be integrally improved.
Based on the same technical concept, an embodiment of the present invention further provides an apparatus for video encoding, as shown in fig. 4, the apparatus including:
the image analysis module 401 is configured to extract a frame image to be encoded of a target video, and determine, by using an image analysis technology, a coding weight corresponding to each image region in the frame image to be encoded;
a parameter setting module 402, configured to set a quantization parameter corresponding to each image region according to the coding weight corresponding to each image region;
the video encoding module 403 is configured to encode the frame image to be encoded based on the set quantization parameter corresponding to each image region.
Optionally, the image analysis module 401 is specifically configured to:
determining the statistical characteristic value of each pixel point of the frame image to be coded through an image analysis technology;
determining the coding weight of each pixel point based on the statistical characteristic value of each pixel point and a preset characteristic value selection rule;
and calculating the coding weight corresponding to each image area in the frame image to be coded according to the coding weight of each pixel point.
Optionally, as shown in fig. 5, the apparatus further includes:
a weight recording module 404, configured to construct a two-dimensional mask image with the same resolution as that of the frame image to be encoded, and record the encoding weight of each pixel point of the frame image to be encoded at a position corresponding to each pixel point in the two-dimensional mask image.
Fig. 6 is a schematic structural diagram of a network device according to an embodiment of the present invention. The network device 600 may vary significantly depending on configuration or performance, and may include one or more central processors 622 (e.g., one or more processors) and memory 632, and one or more storage media 630 (e.g., one or more mass storage devices) storing applications 642 or data 644. The memory 632 and the storage medium 630 may each be transient or persistent storage. The program stored on the storage medium 630 may include one or more modules (not shown), each of which may include a sequence of instructions operating on the network device 600. Still further, the central processor 622 may be configured to communicate with the storage medium 630 to execute the sequences of instructions in the storage medium 630 on the network device 600.
The network device 600 may also include one or more power supplies 629, one or more wired or wireless network interfaces 650, one or more input-output interfaces 658, one or more keyboards 656, and/or one or more operating systems 641, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc.
Network device 600 may include memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for performing the video encoding method described above.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (12)

1. A method of video encoding, the method comprising:
extracting a frame image to be coded of a target video, dividing the frame image to be coded into a plurality of image areas through an image analysis technology, and determining a statistical characteristic value of each pixel point of the frame image to be coded;
determining the coding weight of each pixel point based on the statistical characteristic value of each pixel point and a preset characteristic value selection rule, and calculating the coding weight corresponding to each image area in the frame image to be coded according to the coding weight of each pixel point;
setting a quantization parameter corresponding to each image area according to the coding weight corresponding to each image area;
coding the frame image to be coded based on the set quantization parameter corresponding to each image area;
the calculating the coding weight corresponding to each image area in the frame image to be coded according to the coding weight of each pixel point includes: for a target image area, determining all macro blocks contained in the target image area; taking the average value of the coding weight values of all pixel points contained in each macro block as the coding weight value of the macro block; recording the coding weight values of all macro blocks in the target image area as the corresponding coding weight values of the target image area;
the setting of the quantization parameter corresponding to each image region according to the coding weight corresponding to each image region includes: for a target macro block contained in a target image area, calculating a quantization parameter corresponding to the target macro block according to a preset quantization parameter fluctuation range and a coding weight of the target macro block;
the quantization parameter corresponding to the target macro block is calculated by the following formula:
QP_i = QP_min + (QP_max − QP_min) × (1 − B_i)
wherein QP_i is the quantization parameter corresponding to the target macroblock, QP_max and QP_min are the maximum and minimum values of the quantization parameter fluctuation range, and B_i is the coding weight of the target macroblock;
the quantization parameter fluctuation range is determined by:
improving the average image quality of the coded video frame image by reducing the maximum value of the fluctuation range of the quantization parameter; the difference in fineness of the image areas is reduced by reducing the difference between the maximum value and the minimum value of the fluctuation range of the quantization parameter.
2. The method according to claim 1, wherein the determining the statistical characteristic value of each pixel point of the frame image to be encoded by using an image analysis technique comprises:
inputting the frame image to be encoded into a significant target detection model based on a CAM technology;
and acquiring a feature map generated by the second last layer in the significant target detection model, and taking feature data corresponding to each pixel point in the feature map as a statistical feature value of each pixel point of the frame image to be coded.
3. The method according to claim 1, wherein the determining the statistical characteristic value of each pixel point of the frame image to be encoded by using an image analysis technique comprises:
and calculating an optical flow value corresponding to each pixel point of the frame image to be coded based on an optical flow method, and taking the optical flow value as a statistical characteristic value of each pixel point.
4. The method according to claim 1, wherein the determining the statistical characteristic value of each pixel point of the frame image to be encoded through an image analysis technique comprises:
and calculating the texture characteristic value of each pixel point of the frame image to be coded, and taking the texture characteristic value as the statistical characteristic value of each pixel point.
5. The method according to claim 1, wherein the determining the statistical characteristic value of each pixel point of the frame image to be encoded through an image analysis technique comprises:
inputting the frame image to be coded into a target detection model;
and setting corresponding statistical characteristic values for pixel points of each image content in the frame image to be coded according to the image content detection result output by the target detection model.
6. The method according to claim 1, wherein the encoding the frame image to be encoded based on the set quantization parameter corresponding to each image region comprises:
and coding the frame image to be coded based on the calculated quantization parameter corresponding to each macro block in each image area.
7. The method according to claim 1, wherein after determining the coding weight of each pixel point based on the statistical eigenvalue of each pixel point and a preset eigenvalue selection rule, the method further comprises:
and constructing a two-dimensional mask image with the same resolution as the frame image to be coded, and recording the coding weight of each pixel point of the frame image to be coded at the position corresponding to each pixel point in the two-dimensional mask image.
8. The method of claim 7, further comprising:
cutting the two-dimensional mask image according to the coding weight recorded at each pixel point in the two-dimensional mask image;
and reserving all target pixel points in the two-dimensional mask image, of which the coding weights meet preset value standards, and recording position information of the target pixel points.
9. An apparatus for video encoding, the apparatus comprising:
the image analysis module is used for extracting a frame image to be coded of a target video, dividing the frame image to be coded into a plurality of image areas through an image analysis technology, determining a statistical characteristic value of each pixel point of the frame image to be coded, determining a coding weight of each pixel point based on the statistical characteristic value of each pixel point and a preset characteristic value selection rule, and calculating a coding weight corresponding to each image area in the frame image to be coded according to the coding weight of each pixel point;
the parameter setting module is used for setting the quantization parameter corresponding to each image area according to the coding weight corresponding to each image area;
the video coding module is used for coding the frame image to be coded based on the set quantization parameter corresponding to each image area;
the calculating the coding weight corresponding to each image area in the frame image to be coded according to the coding weight of each pixel point includes: for a target image area, determining all macro blocks contained in the target image area; taking the average value of the coding weight values of all pixel points contained in each macro block as the coding weight value of the macro block; recording the coding weight values of all macro blocks in the target image area as the corresponding coding weight values of the target image area;
the setting of the quantization parameter corresponding to each image region according to the coding weight corresponding to each image region includes: for a target macro block contained in a target image area, calculating a quantization parameter corresponding to the target macro block according to a preset quantization parameter fluctuation range and a coding weight of the target macro block;
the quantization parameter corresponding to the target macro block is calculated by the following formula:
QP_i = QP_min + (QP_max − QP_min) × (1 − B_i)
wherein QP_i is the quantization parameter corresponding to the target macroblock, QP_max and QP_min are the maximum and minimum values of the quantization parameter fluctuation range, and B_i is the coding weight of the target macroblock;
the quantization parameter fluctuation range is determined by:
improving the average image quality of the coded video frame image by reducing the maximum value of the fluctuation range of the quantization parameter; the difference in fineness of the image areas is reduced by reducing the difference between the maximum value and the minimum value of the fluctuation range of the quantization parameter.
10. The apparatus of claim 9, further comprising:
and the weight recording module is used for constructing a two-dimensional mask image with the resolution being the same as that of the frame image to be coded, and recording the coding weight of each pixel point of the frame image to be coded at the position corresponding to each pixel point in the two-dimensional mask image.
11. A network device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the method of video encoding according to any one of claims 1 to 8.
12. A computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the method of video encoding according to any one of claims 1 to 8.
CN201911157969.9A 2019-11-22 2019-11-22 Video coding method and device Active CN110996101B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201911157969.9A CN110996101B (en) 2019-11-22 2019-11-22 Video coding method and device
PCT/CN2020/070701 WO2021098030A1 (en) 2019-11-22 2020-01-07 Method and apparatus for video encoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911157969.9A CN110996101B (en) 2019-11-22 2019-11-22 Video coding method and device

Publications (2)

Publication Number Publication Date
CN110996101A CN110996101A (en) 2020-04-10
CN110996101B true CN110996101B (en) 2022-05-27

Family

ID=70086001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911157969.9A Active CN110996101B (en) 2019-11-22 2019-11-22 Video coding method and device

Country Status (2)

Country Link
CN (1) CN110996101B (en)
WO (1) WO2021098030A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114025200B (en) * 2021-09-15 2022-09-16 湖南广播影视集团有限公司 Ultra-high definition post-production solution based on cloud technology
CN116156196B (en) * 2023-04-19 2023-06-30 探长信息技术(苏州)有限公司 Efficient transmission method for video data
CN116260976B (en) * 2023-05-15 2023-07-18 深圳比特耐特信息技术股份有限公司 Video data processing application system
CN116385706B (en) * 2023-06-06 2023-08-25 山东外事职业大学 Signal detection method and system based on image recognition technology

Citations (2)

Publication number Priority date Publication date Assignee Title
CN110087075A (en) * 2019-04-22 2019-08-02 浙江大华技术股份有限公司 A kind of coding method of image, code device and computer storage medium
CN110163076A (en) * 2019-03-05 2019-08-23 腾讯科技(深圳)有限公司 A kind of image processing method and relevant apparatus

Family Cites Families (12)

Publication number Priority date Publication date Assignee Title
CN101309422B (en) * 2008-06-23 2010-09-29 北京工业大学 Macroblock level quantized parameter process method and apparatus
CN102263943B (en) * 2010-05-25 2014-06-04 财团法人工业技术研究院 Video bit rate control device and method
CN103079063B (en) * 2012-12-19 2015-08-26 华南理工大学 A kind of method for video coding of vision attention region under low bit rate
JP2016082395A (en) * 2014-10-16 2016-05-16 キヤノン株式会社 Encoder, coding method and program
CN107770525B (en) * 2016-08-15 2020-07-24 华为技术有限公司 Image coding method and device
CN106101602B (en) * 2016-08-30 2019-03-29 北京北信源软件股份有限公司 A kind of method that bandwidth self-adaption improves network video quality
CN106412594A (en) * 2016-10-21 2017-02-15 乐视控股(北京)有限公司 Panoramic image encoding method and apparatus
US10904531B2 (en) * 2017-03-23 2021-01-26 Qualcomm Incorporated Adaptive parameters for coding of 360-degree video
CN107147912B (en) * 2017-05-04 2020-09-29 浙江大华技术股份有限公司 Video coding method and device
JP6946979B2 (en) * 2017-11-29 2021-10-13 富士通株式会社 Video coding device, video coding method, and video coding program
CN110225342B (en) * 2019-04-10 2021-03-09 中国科学技术大学 Video coding bit distribution system and method based on semantic distortion measurement
CN110177277B (en) * 2019-06-28 2022-04-12 广东中星微电子有限公司 Image coding method and device, computer readable storage medium and electronic equipment

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN110163076A (en) * 2019-03-05 2019-08-23 腾讯科技(深圳)有限公司 A kind of image processing method and relevant apparatus
CN110087075A (en) * 2019-04-22 2019-08-02 浙江大华技术股份有限公司 A kind of coding method of image, code device and computer storage medium

Also Published As

Publication number Publication date
CN110996101A (en) 2020-04-10
WO2021098030A1 (en) 2021-05-27

Similar Documents

Publication Publication Date Title
CN110996101B (en) Video coding method and device
JP7110370B2 (en) Using Nonlinear Functions Applied to Quantization Parameters of Machine Learning Models for Video Coding
US20210051322A1 (en) Receptive-field-conforming convolutional models for video coding
CN104135629B (en) The method, apparatus and computer readable storage medium encoded to image
JP4514818B2 (en) Video decoding device
EP1938613A2 (en) Method and apparatus for using random field models to improve picture and video compression and frame rate up conversion
CN111988611A (en) Method for determining quantization offset information, image coding method, image coding device and electronic equipment
US20030103676A1 (en) Data compression method and recording medium with data compression program recorded therein
WO2020061008A1 (en) Receptive-field-conforming convolution models for video coding
EP4344479A1 (en) Systems and methods for image filtering
CN108353193B (en) Method and apparatus for processing video data based on multiple graph-based models
CN116250008A (en) Encoding and decoding methods, encoder, decoder and encoding and decoding system of point cloud
US20230269385A1 (en) Systems and methods for improving object tracking in compressed feature data in coding of multi-dimensional data
CN108182712B (en) Image processing method, device and system
CN111182301A (en) Method, device, equipment and system for selecting optimal quantization parameter during image compression
Kavitha et al. A survey of image compression methods for low depth-of-field images and image sequences
CN116325732A (en) Decoding and encoding method, decoder, encoder and encoding and decoding system of point cloud
US11503292B2 (en) Method and apparatus for encoding/decoding video signal by using graph-based separable transform
CN101310534A (en) Method and apparatus for using random field models to improve picture and video compression and frame rate up conversion
US20230328246A1 (en) Point cloud encoding method and decoding method, and encoder and decoder
CN117115433B (en) Display abnormality detection method, device, equipment and storage medium
WO2022217472A1 (en) Point cloud encoding and decoding methods, encoder, decoder, and computer readable storage medium
US20240137506A1 (en) Systems and methods for image filtering
Antony et al. A Lossless Image Compression Based On Hierarchical Prediction and Context Adaptive Coding
Alhadi Compression of Medical Images Based on 2D-Discrete Cosine Transform and Vector Quantization Algorithms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant