CN108737825B - Video data encoding method, apparatus, computer device and storage medium


Info

Publication number
CN108737825B
Authority
CN
China
Prior art keywords
frame
encoded
original image
images
frames
Prior art date
Legal status
Active
Application number
CN201710241529.6A
Other languages
Chinese (zh)
Other versions
CN108737825A (en)
Inventor
奚驰
王新亮
李斌
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201710241529.6A
Publication of CN108737825A
Application granted
Publication of CN108737825B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N 19/142: Detection of scene cut or scene change
    • H04N 19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51: Motion estimation or motion compensation
    • H04N 19/573: Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
    • H04N 7/00: Television systems
    • H04N 7/14: Systems for two-way working
    • H04N 7/141: Systems for two-way working between two video terminals, e.g. videophone
    • H04N 7/15: Conference systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a video data encoding method, apparatus, computer device and storage medium. The method comprises the following steps: acquiring video frames to be encoded; when an I frame has been encoded, caching the original image that was encoded as the I frame; when the next I frame is to be encoded, comparing the original image that needs to be encoded as an I frame with the original image in the cache; when the comparison result is that the images are relatively still, encoding the original image that needs to be encoded as an I frame as a P frame; and when the comparison result is that the images are in relative motion, encoding it as an I frame. With this video data encoding method, apparatus, computer device and storage medium, after receiving the encoded frame data the receiving terminal can take both the clarity and the smoothness of the video frames into account when decoding and playing them.

Description

Video data encoding method, apparatus, computer device and storage medium
Technical Field
The present invention relates to the field of video encoding technologies, and in particular, to a video data encoding method, apparatus, computer device, and storage medium.
Background
A video consists of a sequence of consecutive pictures, each still picture being a video frame. To improve transmission efficiency, video sent over a network usually needs to be encoded before transmission; after receiving the encoded video, the receiving terminal decodes it in the corresponding decoding mode and plays the decoded video.
As the quality of video capture increases, applications such as conference sharing and desktop sharing generally need to transmit large-size video, for example 1000×2000. For such large-size video, if the code rate (bit rate) of the transmitting terminal is set too low, the encoded video data cannot be transmitted completely, so the picture decoded by the receiving terminal is blurred and not clear enough. If the code rate is set too high, the bandwidth ceiling is easily reached, encoded data is lost during transmission, the receiving terminal has difficulty decoding the data it receives, the video picture cannot be displayed normally, and playback smoothness is low.
In summary, conventional video data encoding methods cannot balance the clarity and the smoothness of the video picture received and displayed by the video receiving terminal.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a video data encoding method, apparatus, computer device, and storage medium that can achieve both sharpness and smoothness of video pictures received and displayed by a receiving terminal.
A method of video data encoding, comprising:
acquiring a video frame to be encoded;
when an I frame has been encoded, caching the original image encoded as the I frame;
when the next I frame is to be encoded, comparing the original image that needs to be encoded as an I frame with the original image in the cache;
when the comparison result is that the images are relatively still, encoding the original image that needs to be encoded as an I frame as a P frame; and
when the comparison result is that the images are in relative motion, encoding the original image that needs to be encoded as an I frame as an I frame.
A video data encoding apparatus comprising:
a video frame acquisition module, configured to acquire a video frame to be encoded;
an image caching module, configured to cache the original image encoded as an I frame when the I frame has been encoded;
an image comparison module, configured to compare the original image that needs to be encoded as an I frame with the original image in the cache when the next I frame is to be encoded; and
a video encoding module, configured to encode the original image that needs to be encoded as an I frame as a P frame when the comparison result is that the images are relatively still, and to encode it as an I frame when the comparison result is that the images are in relative motion.
A computer apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the video data encoding method when executing the program.
A computer readable storage medium having stored thereon a computer program which when executed by a processor implements the video data encoding method.
With the video data encoding method, apparatus, computer device and storage medium above, when the next I frame is to be encoded, the original image that needs to be encoded as an I frame is compared with the original image in the cache, and the comparison result determines whether that image is encoded as an I frame or as a P frame. If the comparison result is that the images are relatively still, the similarity between the two original images is high, and the image that needs to be encoded as an I frame is encoded as a P frame instead; this reduces the amount of encoded data, lowers the probability of packet loss during transmission, and correspondingly improves the decoding efficiency of the receiving terminal and the playback smoothness of the corresponding video frames. If the comparison result is that the images are in relative motion, the difference between the two original images is large, and the image that needs to be encoded as an I frame is encoded as an I frame, so that its complete information is preserved; encoding it as a P frame would cause the picture decoded and displayed by the receiving terminal to become blurred. By dynamically adjusting the encoding mode of the image that needs to be encoded as an I frame according to the comparison result, the receiving terminal can take both the clarity and the smoothness of the video frames into account when decoding and playing them after receiving the encoded frame data.
Drawings
FIG. 1 is an application scenario diagram of a video data encoding method in one embodiment;
FIG. 2 is an application scenario diagram of a video data encoding method according to another embodiment;
FIG. 3 is an internal block diagram of a first terminal in one embodiment;
FIG. 4 is a flow chart of a video data encoding method in one embodiment;
FIG. 5 is a diagram illustrating video frame encoding in one embodiment;
FIG. 6 is a flowchart of a video data encoding method in another embodiment;
FIG. 7 is a schematic diagram of region division of an original image in one embodiment;
FIG. 8 is a diagram of a receiving terminal displaying a picture in a video frame in one embodiment;
FIG. 9 is a block diagram showing a structure of a video data encoding apparatus in one embodiment;
FIG. 10 is a block diagram showing the structure of a video data encoding apparatus in another embodiment.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The video data coding method provided by the embodiment of the invention can be applied to an application environment shown in fig. 1. Referring to fig. 1, a first terminal 110 and a second terminal 120 may perform data transmission through a network. The first terminal 110 may act as a transmitting terminal of video data and the second terminal 120 may act as a receiving terminal of video data. The first terminal 110 may acquire a video frame to be encoded; and encodes the video frame image of each frame as an I-frame or a P-frame. After encoding of the video frame image of one or more frames is completed, the encoded I-frame or P-frame data may be transmitted to the second terminal. When the second terminal 120 receives the encoded I frame or P frame, it can decode the corresponding video according to the corresponding decoding mode and play the video. It will be appreciated that, in other scenarios, the first terminal 110 may also be used as a receiving terminal of video data, and the second terminal 120 may be used as a transmitting terminal of video data, where the second terminal 120 encodes the video data and then transmits the encoded video data to the first terminal 110 for decoding and playing.
In one embodiment, as shown in fig. 2, a specific application environment of the video data encoding method is provided. The application environment is also exemplified by the first terminal 110 as a terminal transmitting video data and the second terminal 120 as a terminal receiving video data. The first terminal 110 may collect, according to a preset frequency, picture information displayed on a desktop window of the first terminal as a video frame to be encoded. And encoding the acquired video frames into I frames or P frames according to a preset encoding mode, and transmitting the encoded I frames or P frames to a second terminal in real time, wherein the transmitted data also carries an encoding identifier. After receiving the data sent by the first terminal, the second terminal 120 may decode the received I frame or P frame into a corresponding picture according to the coding identifier by adopting a corresponding decoding manner, and render and display the decoded picture. The application scene in this embodiment may be a scene of real-time video transmission, for example, may be a scene of performing a video conference, and the video data encoded by the first terminal 110 may be video data in the video conference or shared desktop data.
In one embodiment, as shown in fig. 3, an internal structure diagram of the first terminal is provided. The first terminal includes a processor, a non-volatile storage medium, an internal memory, an encoder, a network interface, and a display screen connected by a system bus. The processor of the first terminal provides the computing and control capabilities that support the operation of the entire terminal. The non-volatile storage medium stores an operating system and a video data encoding apparatus; the video data encoding apparatus is used to implement the video data encoding method provided in the following embodiments. The internal memory may store computer readable instructions that, when executed by the processor, cause the processor to perform the video data encoding method provided in the following embodiments. The network interface is used to communicate with an external terminal or server for data transmission, for example sending the encoded data to the second terminal. The display screen may be used to display video pictures, such as the images displayed on the desktop.
It will be appreciated by those skilled in the art that the structure shown in fig. 3 is merely a block diagram of a portion of the structure associated with the present application and does not constitute a limitation of the first terminal to which the present application is applied, and that a particular first terminal may include more or fewer components than shown, or may combine certain components, or may have a different arrangement of components. For example, the first terminal may further include a camera, which is configured to scan the visible area to generate a video frame.
In one embodiment, as shown in fig. 4, a video data encoding method is provided, which is applicable to the application environment as shown in fig. 1 or fig. 2. The embodiment is described by taking the application of the method to the first terminal as an example, and includes:
step S402, a video frame to be encoded is acquired.
In one embodiment, the video frame to be encoded is a set of original images in the video to be encoded. The video frame to be encoded includes all of the original images constituting the corresponding video, or includes only the original images constituting a portion of the corresponding video. For example, the original images in the video may be extracted according to a preset extraction frequency, and the extracted original image set is used as a video frame to be encoded. The video to be encoded can be an existing complete video, and can also be a video generated in real time through camera scanning.
In one embodiment, the first terminal may invoke the camera to scan the visible area, generate a video frame according to a preset frame rate, and present the video frame on the display interface in real time. For the original image presented in real time, the original image can be acquired according to the sampling frequency which is less than or equal to the frame rate, and the acquired original image set is taken as a video frame to be encoded. In one embodiment, the first terminal may also directly collect the picture on the desktop window according to the preset frequency, as the video frame to be encoded.
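For illustration only, the following minimal Python sketch shows one way of sampling original images from a real-time frame stream at a frequency no greater than the capture frame rate; the function name, rates and frame list are assumptions of the sketch and are not part of the patent disclosure.

```python
def sample_frames(frames, frame_rate, sample_rate):
    """Keep roughly sample_rate original images per second out of frame_rate captured frames."""
    if sample_rate >= frame_rate:
        return list(frames)                 # sampling frequency <= frame rate: keep everything
    step = frame_rate / sample_rate         # e.g. 30 fps captured, 10 fps sampled -> every 3rd frame
    picked, next_pick = [], 0.0
    for index, frame in enumerate(frames):
        if index >= next_pick:
            picked.append(frame)
            next_pick += step
    return picked

# Example: 90 captured frames at 30 fps, sampled at 10 fps -> 30 original images to encode.
video_frame_to_encode = sample_frames(list(range(90)), frame_rate=30, sample_rate=10)
```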
In step S404, when encoding an I frame, the original image encoded as the I frame is buffered.
In this embodiment, an I frame is encoded with an intra-frame prediction algorithm; intra-frame prediction is predictive coding performed in the spatial domain. The I frame is typically the first frame of each GOP (Group of Pictures). A GOP is a set of consecutively encoded frames: an I frame plus the P frames that reference it for encoding. A P frame is encoded with an inter-frame prediction algorithm; inter-frame prediction is predictive coding performed in the temporal domain, for example encoding the current frame by prediction with reference to the previous frame. A P frame records the difference information between its original image and another original image, so decoding a P frame requires the encoded frame it references in order to recover the corresponding video frame.
The GOP length is the total number of frames in the group, that is, the I frame plus the P frames that follow it. When an original image to be encoded is encoded as an I frame, it marks the start of a new GOP and the end of the previous one.
In one embodiment, the first terminal may set the GOP length in advance and encode the video frames according to the set length. The GOP length may be any predetermined value, and it may also be switched or adjusted among several different lengths rather than being fixed.
The first frame original image in the video frame to be encoded may be encoded as an I frame. When an I frame is encoded, the original image of the I frame may be buffered.
In one embodiment, the original image may be the complete frame data corresponding to the video frame, or it may be the frame data stored after part of the data has been discarded. For example, the luminance component of the video frame may be extracted while the colour components are discarded, and the frame with only the luminance component retained is used as the original image of the I frame. Retaining only the luminance component reduces the amount of data cached and saves storage space.
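As an illustrative sketch of this caching step (step S404), the following Python snippet keeps only a luminance plane computed with the common BT.601 weights; the pixel layout and names are assumptions of the sketch, not details taken from the patent.

```python
def luma_plane(rgb_pixels):
    """Return only the luminance component of an image given as a list of (R, G, B) tuples."""
    return [int(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in rgb_pixels]

# Step S404: cache the luma of the image just encoded as an I frame; the colour
# components are discarded so that the cached data stays small.
cached_i_frame_image = luma_plane([(255, 0, 0), (0, 255, 0), (12, 34, 56)])
```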
In one embodiment, the acquired video frames to be encoded may be encoded according to a preset codec standard. The preset encoding and decoding standard may be an H264 encoding standard, and may also be any one of VP8, VP9, H263, and H265 encoding and decoding standard. For example, an original image to be encoded may be encoded as an I frame or a P frame according to the H264 encoding standard.
In step S406, when the next I frame is to be encoded, the original image that needs to be encoded as an I frame is compared with the original image in the cache, and it is determined whether the comparison result is that the images are relatively still or in relative motion.
In one embodiment, encoding to the next I frame refers to the situation in which the original image currently to be encoded is due to be encoded as an I frame. Specifically, it means that the current GOP is about to end and the next GOP of the same length is about to begin.
For example, as shown in fig. 5, if the preset GOP length is 30, the first original image 501 of a GOP is the original image to be encoded as an I frame, and the remaining 29 original images are the original images to be encoded as P frames. The 30 frames from original image 501 to original image 503 all belong to the current GOP. When original images 502 to 503 have all been encoded as P frames and encoding reaches original image 504, the next I frame is due.
In one embodiment, an original image to be encoded as an I frame and a buffered original image of a previous frame encoded as an I frame may be compared, and whether the original image to be encoded as an I frame is still or moving with respect to the buffered original image to be compared may be determined based on the comparison result. Specifically, all the data contained in the two frames of original images may be compared, or only the data of the luminance component in the two frames of original images may be compared.
In step S408, when the comparison result is that the image is relatively still, the original image to be encoded as the I frame is encoded as the P frame.
When it is determined that the original image to be encoded as an I frame is still relative to the cached original image it was compared with, that is, the similarity between the two original images is high, the original image to be encoded as an I frame may be encoded as a P frame. Specifically, it may be encoded as a P frame with reference to any encoded frame from the last I frame onward. For example, original image 504 in fig. 5 may be encoded as a P frame.
In one embodiment, the original image that needs to be encoded as an I frame may be encoded as a P frame with reference to the last I frame, or the original image that needs to be encoded as an I frame may be encoded as a P frame with reference to the last encoded frame.
Because a P frame reflects the difference information between its original image and the referenced image, the higher the similarity between the original image to be encoded and the referenced original image, the smaller the data volume occupied by the resulting P frame. Encoding an original image whose comparison result is relatively still as a P frame therefore reduces the volume of the encoded data, so the encoding quality of the original image is higher at the same code rate. When the receiving terminal later decodes the P frame, decoding efficiency also improves, which improves the smoothness of playing the decoded video.
In step S410, when the comparison result is the image relative motion, the original image to be encoded as the I frame is encoded as the I frame.
When it is determined that the original image to be encoded as an I frame is in motion relative to the cached original image it was compared with, the difference between the two original images is large. If it were still encoded as a P frame, the amount of data generated by encoding its frame data with reference to the previous I frame would also be relatively large and could not be reduced significantly compared with encoding it as an I frame. When the receiving terminal decoded such a P frame, decoding efficiency would not improve much relative to an I frame and might even fall, so the corresponding picture might not be decoded in time. Moreover, because of the larger data volume, continuing to encode it as a P frame would increase the probability of encoding or decoding errors and blur the image decoded by the receiving terminal. Therefore the encoding mode is left unchanged: the original image that needs to be encoded as an I frame is encoded as an I frame, so that the complete information of the changed image is preserved and the picture displayed after the receiving terminal decodes the I frame retains a certain clarity.
According to this video data encoding method, when the next I frame is to be encoded, the original image that needs to be encoded as an I frame is compared with the original image in the cache, and the comparison result determines whether that image is encoded as an I frame or as a P frame. If the comparison result is that the images are relatively still, the similarity between the two original images is high, and the image that needs to be encoded as an I frame is encoded as a P frame instead; this reduces the amount of encoded data, lowers the probability of packet loss during transmission, and correspondingly improves the decoding efficiency of the receiving terminal and the playback smoothness of the corresponding video frames. If the comparison result is that the images are in relative motion, the difference between the two original images is large, and the image that needs to be encoded as an I frame is encoded as an I frame, so that the complete information of the changed image is preserved; encoding it as a P frame would cause the picture decoded and displayed by the receiving terminal to become blurred. By dynamically adjusting the encoding mode of the image that needs to be encoded as an I frame according to the comparison result, the receiving terminal can take both the clarity and the smoothness of the video frames into account when decoding and playing them after receiving the encoded frame data.
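The decision logic of steps S402 to S410 can be summarised with the hedged Python sketch below. It assumes a fixed GOP period and leaves the comparison and the actual encoding as caller-supplied functions; the adaptive GOP adjustment and scene change handling described later are intentionally omitted.

```python
GOP_LENGTH = 30            # preset GOP length, as in the example of fig. 5

def encode_video(frames, images_are_still, encode_as_i, encode_as_p):
    """Sketch of steps S402-S410: decide I or P at every I-frame position."""
    cached = None                              # original image of the last real I frame (step S404)
    for index, image in enumerate(frames):
        i_frame_due = index % GOP_LENGTH == 0
        if index == 0:
            encode_as_i(image)                 # the first original image is encoded as an I frame
            cached = image
        elif i_frame_due:
            if images_are_still(image, cached):
                encode_as_p(image)             # step S408: relatively still, demote to a P frame
            else:
                encode_as_i(image)             # step S410: relative motion, keep the I frame
                cached = image                 # refresh the cache with the new I-frame image
        else:
            encode_as_p(image)                 # ordinary P-frame position inside the GOP
```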
In one embodiment, the video data encoding method further includes: when a P frame is to be encoded, detecting whether the original image that needs to be encoded as a P frame has undergone a scene change relative to the original image of the previous frame; if so, encoding that original image as an I frame, and otherwise encoding it as a P frame.
In one embodiment, encoding into one P frame represents a case where the original image to be currently encoded is ready to be encoded into a P frame.
Similar to the relative motion of images, the scene change also indicates that there is some difference between the original image and the compared original image. When an original image to be encoded into a P frame is encoded, whether scene change occurs to the original image relative to an original image of a previous frame is detected, and an encoding mode of the original image to be encoded into the P frame is determined according to a detection result.
When it is determined that a scene change occurs, it is encoded as an I frame for the original image that needs to be encoded as a P frame. When it is determined that no scene change occurs, an original image that needs to be encoded as a P frame is encoded as a P frame.
In one embodiment, when the original image that was due to be encoded as a P frame is encoded as an I frame, step S404 above may be performed to cache that original image. The resulting I frame is then treated as the first frame of a new GOP of the preset length and serves as the reference frame for the original images after it that are to be encoded as P frames.
In this embodiment, scene change detection is performed on the original image that is due to be encoded as a P frame, and when a scene change is determined to have occurred, that image is encoded as an I frame instead. If it were still encoded as a P frame, the amount of data could not be reduced significantly compared with encoding it as an I frame; when the receiving terminal decoded the P frame, decoding efficiency would not improve much relative to an I frame and might even fall, so the corresponding picture might not be decoded in time. Because of the larger data volume, continuing to encode it as a P frame would also increase the probability of encoding or decoding errors and blur the image decoded by the receiving terminal. Encoding the original image that was due to be encoded as a P frame as an I frame instead preserves the complete information of the changed image, so the picture displayed after the receiving terminal decodes the I frame retains a certain clarity. By dynamically adjusting the encoding mode of the original image that was due to be encoded as a P frame according to the detection result, the receiving terminal can take both the clarity and the smoothness of the video frames into account when decoding and playing them after receiving the encoded frame data.
In one embodiment, the original image that needs to be encoded as an I frame is compared with the original image in the cache in at least one of the following ways: comparing the binary data of the two images, where the comparison result is that the images are relatively still if the binary data are identical and that the images are in relative motion otherwise; comparing the two images pixel by pixel and calculating the absolute error between their pixels, where the comparison result is that the images are relatively still if the absolute error is smaller than a preset error threshold and that the images are in relative motion otherwise; or dividing both images into a preset number of regions and comparing the binary data of the two images region by region, where the comparison result is that the images are relatively still if the number of regions with identical binary data exceeds a preset count and that the images are in relative motion otherwise.
In one embodiment, the binary data of the image includes all binary data in the original image to be compared. If the cached original image is the brightness component in the original image, the binary data to be compared is the binary data of the corresponding brightness component. When the comparison results are the same, the images displayed by the two frames of images are the same, and the comparison result is judged to be that the images are relatively static. If the comparison results are different, the images displayed by the two frames of images are different, and the comparison result is judged to be the relative motion of the images. The binary data of the two frames of images are directly compared, so that a comparison result can be obtained rapidly, and the comparison efficiency of the two frames of images is improved.
In one embodiment, the first terminal may acquire pixel data of two frames of images, compare pixels at each corresponding position in the two frames of images one by one, calculate an absolute error of the pixel at each position, and accumulate the absolute errors of the pixels at each position, where the sum of the absolute errors is the absolute error of the pixels of the two frames of images. Comparing the calculated absolute error sum with a preset error threshold, wherein when the absolute error is smaller than the preset error threshold, the comparison result is that the image is relatively static, otherwise, the comparison result is that the image is relatively moving.
Specifically, the absolute error of the two images may be calculated by the formula SAD = Σ abs(i₁ - i₀), where i₁ is the pixel value of the i-th pixel in the original image to be encoded as an I frame, i₀ is the pixel value of the i-th pixel in the cached original image of the previous frame, and SAD is the absolute error of the two images. The smaller the absolute error, the smaller the difference between the two images, so the calculated absolute error accurately reflects how large the difference between the two images is.
In one embodiment, the first terminal may divide the two images to be compared into a preset number of regions according to a preset division scheme, where the regions may be of the same or of different sizes. The first terminal can then compare, region by region, whether the binary data of the corresponding regions in the two images are identical and count the matching regions; if the number of regions with identical binary data exceeds the preset count, the comparison result is that the images are relatively still, and otherwise the images are in relative motion. The preset count may be less than or equal to the preset number of regions, for example half or two thirds of it.
Dividing the two images into regions, comparing their binary data region by region, and judging the comparison result by the number of regions with identical binary data balances comparison efficiency with the accuracy of the measured difference between the two images.
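The three comparison options can be sketched as follows in Python, operating on flat lists of luminance values of equal length; for simplicity the regions are taken as equal slices of the flattened data, and the thresholds are illustrative defaults rather than values given by the patent.

```python
def still_by_binary(img_a, img_b):
    """Way 1: the images are relatively still only if their binary data are identical."""
    return bytes(img_a) == bytes(img_b)

def still_by_sad(img_a, img_b, error_threshold):
    """Way 2: SAD = sum(|i1 - i0|); relatively still if the SAD is below the preset threshold."""
    sad = sum(abs(a - b) for a, b in zip(img_a, img_b))
    return sad < error_threshold

def still_by_regions(img_a, img_b, region_count=6, min_equal=3):
    """Way 3: split both images into region_count parts and count the parts that match exactly."""
    size = len(img_a) // region_count
    equal = sum(
        img_a[k * size:(k + 1) * size] == img_b[k * size:(k + 1) * size]
        for k in range(region_count)
    )
    return equal > min_equal    # still only if more than the preset count of regions match
```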
In one embodiment, the method further comprises: when a request of a newly added video receiving terminal is received, the video frame to be coded currently is coded as an I frame.
In this embodiment, the first terminal may encode the original images of the acquired video frame in real time and send the encoded data to the receiving terminals. There may be more than one receiving terminal, and the first terminal may receive a request from a newly added video receiving terminal at any time during encoding.
When a request from a newly added video receiving terminal is received, the original image currently to be encoded is encoded directly as an I frame, regardless of whether it was due to be encoded as an I frame or as a P frame. Step S404 above may then be performed to cache this original image.
In this embodiment, when a newly added video receiving terminal is detected, the video frame currently to be encoded is, from that terminal's point of view, the first frame, so it can be encoded directly as an I frame. This avoids encoding it as a P frame, which the newly added receiving terminal could not decode and display because it has not received the earlier reference frames. Encoding the current video frame as an I frame therefore lets the newly added receiving terminal decode the corresponding picture without referring to other encoded frames, improving how quickly it can display the video picture.
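A minimal sketch of this rule, with hypothetical names, is shown below: whenever a join request from a new receiving terminal has been seen, the frame currently waiting to be encoded is forced to an I frame regardless of its planned type.

```python
def frame_type_for_next_frame(planned_type, new_receiver_joined):
    """Force an I frame whenever a newly added receiving terminal has requested the video."""
    return "I" if new_receiver_joined else planned_type

# Example: a frame that was due to be a P frame is encoded as an I frame instead.
assert frame_type_for_next_frame("P", new_receiver_joined=True) == "I"
```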
In one embodiment, the P-frame is encoded with a first quantization parameter and the I-frame is encoded with a second quantization parameter, wherein the second quantization parameter is greater than the first quantization parameter.
In this embodiment, the first terminal further sets a first quantization parameter and a second quantization parameter, where the second quantization parameter is greater than the first. Quantization is the process of mapping a continuous range of signal values (or a large set of possible discrete values) to a finite, smaller set of discrete values. The quantization parameter (QP) reflects how strongly the quantized data is compressed relative to the data before quantization: the smaller the quantization parameter, the finer the quantization, the lower the compression rate, and the less of the original data is lost; the larger the quantization parameter, the coarser the quantization, the higher the compression rate, and the more of the original data is lost.
Since I frame data occupies more space than P frame data, an original image determined to be encoded as an I frame can be encoded with the larger second quantization parameter, which appropriately reduces the I-frame data volume and the probability of packet loss when the I frame data is transmitted. An original image determined to be encoded as a P frame is encoded with the smaller first quantization parameter, so more data is retained and the receiving end obtains a clearer picture when it later decodes the P frame.
In one embodiment, the first and second quantization parameters may be set or adjusted according to the configured code rate (i.e. bit rate). If the code rate is higher, smaller first and second quantization parameters can be set so that more data is retained by the encoding, improving encoding quality and giving the receiving terminal a clearer picture. If the code rate is lower, larger first and second quantization parameters can be set to appropriately reduce the amount of encoded data, so that the receiving terminal obtains smoother playback.
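The following sketch illustrates one way the two quantization parameters could be chosen per frame type and adjusted with the configured code rate; the concrete QP values and bit-rate breakpoints are assumptions made for the example, not values specified by the patent.

```python
def pick_quantization_parameters(bitrate_kbps):
    """Return (first_qp for P frames, second_qp for I frames); the second is always larger."""
    if bitrate_kbps >= 2000:        # generous code rate: finer quantization, clearer picture
        return 24, 28
    if bitrate_kbps >= 800:
        return 28, 32
    return 32, 36                   # tight code rate: coarser quantization, smoother playback

def qp_for_frame(frame_type, bitrate_kbps):
    first_qp, second_qp = pick_quantization_parameters(bitrate_kbps)
    return second_qp if frame_type == "I" else first_qp
```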
In one embodiment, video frames are encoded with an encoding rate corresponding to CPU utilization, and the encoding rate is dynamically adjusted during encoding based on CPU utilization acquired in real time.
In this embodiment, the lower the encoding speed, the higher the encoding quality and the more CPU the encoding occupies; the higher the encoding speed, the lower the encoding quality and the less CPU it occupies.
The first terminal may establish in advance a correspondence between its own CPU utilization and the encoding speed: the higher the CPU utilization, the higher the corresponding encoding speed, so that the CPU load imposed by encoding is reduced; the lower the CPU utilization, the lower the corresponding encoding speed, so that encoding can occupy more of the CPU and the encoding quality is improved.
The first terminal can sample the current CPU utilization at a certain frequency, obtain the corresponding encoding speed for the sampled utilization, and encode the video frames at that speed. An original image determined to be encoded as an I frame is encoded as an I frame at that encoding speed, and an original image determined to be encoded as a P frame is encoded as a P frame at that speed. Dynamically adjusting the encoding speed according to the CPU utilization obtained in real time, and encoding the video frames at the adjusted speed, improves the overall playback quality of the video obtained by the receiving terminal.
In one embodiment, as shown in fig. 6, another video data encoding method is provided, which is illustrated as applied to a first terminal, including:
in step S602, a video frame to be encoded is acquired.
In one embodiment, a video frame is composed of original images of multiple frames. The first terminal may acquire an original image of each frame in the video frame to encode, or may extract the original image in the video according to a certain extraction frequency to encode. The original image can be a frame image acquired by a camera in real time, and can also be an image displayed on a desktop of the first terminal.
In step S604, when one I frame is encoded, the original image encoded as the I frame is buffered.
In one embodiment, the cached original image is used for comparison with an original image that subsequently needs to be encoded as an I frame, to measure the difference between the two images. Therefore only the data of the luminance component of the I frame's original image may be cached, which reduces the amount of cached data without affecting the comparison.
In step S606, it is determined whether the original image to be encoded is an original image to be encoded as an I frame or an original image to be encoded as a P frame. If the original image is to be encoded as an I frame, step S608 is performed, and if the original image is to be encoded as a P frame, step S610 is performed.
In one embodiment, according to the length of the current GOP, the original image currently to be encoded and the preset number of original images after it may be periodically designated as the original image to be encoded as an I frame and the original images to be encoded as P frames, respectively.
In step S608, the original image to be encoded into the I frame is compared with the original image in the buffer, and whether the comparison result is the image relative still or the image relative motion is determined. If the image is relatively stationary, step S612 is performed, and if the image is relatively moving, step S614 is performed.
In one embodiment, the comparison determines whether the original image to be encoded is relatively still or in relative motion with respect to the cached original image of the previous frame that was encoded as an I frame.
In one embodiment, the two frames of images can be divided into a preset number of areas, binary data of the two frames of images in each area are compared one by one, if the number of areas with the same binary data exceeds the preset number, the comparison result is that the images are relatively static, otherwise, the comparison result is determined to be that the images are relatively moving.
For example, the preset number of regions may be set to 6 and the preset count to 3. As shown in fig. 7, the cached original image of the previous frame encoded as an I frame is divided equally into 6 regions, each corresponding to one piece of binary data. The original image to be encoded is divided into 6 regions in the same way and the binary data of its 6 regions are obtained, and the two images are compared region by region to see whether the binary data are equal. When the number of equal regions exceeds 3, the original image to be encoded is determined to be still relative to the cached original image of the previous I frame, and step S612 is performed. Otherwise the comparison result is determined to be relative motion, and step S614 is performed.
In step S610, it is detected whether a scene change occurs in the original image to be encoded with respect to the original image of the previous frame. If yes, step S614 is executed, otherwise step S612 is executed.
When the original image to be encoded is an original image to be encoded as a P frame in the current GOP encoding mode, it is detected whether a scene change occurs in the original image to be encoded relative to the original image of the previous frame.
In one embodiment, it may be detected whether a difference between the original image with respect to an image of a previous frame exceeds a preset difference size, and it is determined whether a scene change occurs according to the difference size.
In one embodiment, the first terminal may preset a scenecut parameter value, which is the critical value for deciding whether a scene change has occurred in the original image to be encoded relative to the original image of the previous frame. The first terminal may have the encoder calculate a metric value for each original image that reflects its degree of difference from the previous original image; the larger the metric value, the greater the difference between the original image to be encoded and the previous original image. The first terminal may compare the calculated metric value with the scenecut parameter value: if the metric is greater, the difference relative to the previous original image is too large and a scene change is determined to have occurred; otherwise it is determined that no scene change has occurred.
When it is determined that scene change occurs, step S614 is performed, otherwise step S612 is performed.
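As an illustration of this decision, the sketch below compares a simple per-frame difference metric (mean absolute luminance difference) with a preset scenecut-style threshold. A real encoder such as x264 computes a more elaborate scene-cut metric; the metric and threshold here are assumptions of the sketch only.

```python
def difference_metric(current_image, previous_image):
    """Mean absolute difference between two flat lists of luminance values."""
    total = sum(abs(a - b) for a, b in zip(current_image, previous_image))
    return total / max(len(current_image), 1)

def scene_changed(current_image, previous_image, scenecut_threshold):
    """True when the metric exceeds the preset scenecut parameter value."""
    return difference_metric(current_image, previous_image) > scenecut_threshold

# A frame due to be a P frame is promoted to an I frame on scene change (step S614),
# and otherwise stays a P frame (step S612):
# frame_type = "I" if scene_changed(cur, prev, scenecut_threshold=12.0) else "P"
```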
In step S612, the original image to be encoded is encoded as a P frame.
In step S614, the original image to be encoded is encoded as an I frame.
In one embodiment, video frames may be encoded with an encoding rate corresponding to CPU utilization, and the encoding rate may be dynamically adjusted during encoding based on CPU utilization acquired in real-time.
The first terminal may establish in advance a correspondence between its own CPU utilization and the encoding speed. For example, three encoding speed levels may be preset, from lowest to highest a first, a second and a third encoding speed, and two CPU utilization thresholds may be preset, from lowest to highest a first and a second CPU utilization threshold.
The first terminal may periodically collect the CPU utilization, and when the utilization is lower than the first CPU utilization threshold, reduce the current encoding speed by one level if it is not the lowest encoding speed. For example, the current encoding speed may be reduced from the third encoding speed to the second encoding speed, or the current encoding speed may be reduced from the second encoding speed to the first encoding speed. When the utilization is greater than the second CPU utilization threshold, if the current encoding speed is not the highest encoding speed, the current encoding speed is increased by one level. For example, the current encoding speed may be increased from the first encoding speed to the second encoding speed, or the current encoding speed may be increased from the second encoding speed to the third encoding speed.
The original image to be encoded is then encoded as an I frame or a P frame, according to the determined encoding mode, at the adjusted encoding speed. Dynamically adjusting the encoding speed according to the CPU utilization obtained in real time, and encoding the video frames at the adjusted speed, improves the overall playback quality of the video obtained by the receiving terminal.
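The three-level adjustment described above can be sketched as follows; the two threshold values are illustrative assumptions, and the function is meant to be called once per CPU-utilization sample.

```python
ENCODING_SPEEDS = ["first", "second", "third"]      # slowest (best quality) to fastest

def adjust_encoding_speed(current_level, cpu_utilization,
                          low_threshold=0.40, high_threshold=0.75):
    """Return the new encoding speed level after one periodic CPU-utilization sample."""
    if cpu_utilization < low_threshold and current_level > 0:
        return current_level - 1    # CPU is idle enough: slow down one level, raise quality
    if cpu_utilization > high_threshold and current_level < len(ENCODING_SPEEDS) - 1:
        return current_level + 1    # CPU is busy: speed up one level, reduce CPU occupation
    return current_level

# Example: starting at the second speed with a busy CPU moves up to the third speed.
new_level = adjust_encoding_speed(current_level=1, cpu_utilization=0.90)   # -> 2
```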
According to this video data encoding method, the encoding mode of the original image to be encoded is determined by comparing the difference between two images. When the difference between consecutive original images in the video frames is large, the corresponding original image is encoded as an I frame to preserve the complete information of the changed image; encoding it as a P frame would cause the picture decoded and displayed by the receiving terminal to become blurred. When the difference between consecutive original images in the video frames is small, the original image that needs to be encoded as an I frame is encoded as a P frame instead, which reduces the amount of encoded data and, because less data is encoded, correspondingly improves the decoding efficiency of the receiving terminal and the playback smoothness of the corresponding video frames.
In one embodiment, the method further comprises: when it is determined that an original picture to be encoded as an I frame is encoded as a P frame or an original picture to be encoded as a P frame is encoded as an I frame, the length of the current GOP is adjusted.
Two GOP lengths, a first length and a second length, may be preset, where the first length is greater than the second length. The first original image of each GOP is taken as the original image that needs to be encoded as an I frame. When a new GOP begins, its length may be kept the same as that of the previous GOP, and whenever an original image to be encoded is actually encoded as an I frame, the next GOP begins.
Specifically, if the original image that was due to be encoded as an I frame is encoded as a P frame and the length of the current GOP is the second length, the GOP length is adjusted to the first length, so as to reduce the number of encoded I frames in the whole video while the original images to be encoded remain relatively still. If the original image that was due to be encoded as a P frame is encoded as an I frame and the length of the current GOP is the first length, the GOP length is adjusted to the second length, so as to reduce the number of encoded P frames in the whole video while the scene of the original images to be encoded is changing.
The differences between consecutive video frame images in a video file are not always of the same size. According to this video data encoding method, when the original image that needs to be encoded as an I frame is detected to be relatively still, the difference between consecutive video frame images is small, so the length of the current GOP can be extended and the image is encoded as a P frame. When a scene change is detected in the original image that needs to be encoded as a P frame, the difference between consecutive video frame images is large, so the length of the current GOP can be shortened and the image is encoded as an I frame. The length of the current GOP is thus adaptively adjusted according to how large the differences between the video frame pictures to be encoded are, and when the video receiving terminal receives the encoded data and decodes and plays the pictures in the video frames, both the clarity and the smoothness of the displayed video picture are taken into account.
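A sketch of this GOP length switch is shown below, consistent with the rule described above; the two concrete lengths are assumptions chosen only for illustration.

```python
FIRST_GOP_LENGTH = 60      # the longer first length
SECOND_GOP_LENGTH = 15     # the shorter second length

def adjust_gop_length(current_length, planned_type, actual_type):
    """Return the GOP length to use from the next group of pictures onward."""
    if planned_type == "I" and actual_type == "P" and current_length == SECOND_GOP_LENGTH:
        return FIRST_GOP_LENGTH     # still content: lengthen the GOP, fewer I frames overall
    if planned_type == "P" and actual_type == "I" and current_length == FIRST_GOP_LENGTH:
        return SECOND_GOP_LENGTH    # scene changes: shorten the GOP, fewer P frames per group
    return current_length           # otherwise keep the previous GOP length
```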
In one embodiment, the first terminal may send the encoded data to the second terminal in real time, with the transmitted data carrying a coding identifier. The first terminal may send the encoded data as soon as each original image has been encoded, or it may package and send the encoded data to the second terminal after a preset number of original images have been encoded.
After receiving the encoded data sent by the first terminal, the second terminal obtains the coding identifier carried in the data, from which it can identify the coding standard used and whether the data is an I frame or a P frame, and then decodes and displays the corresponding frame image in the matching decoding mode. FIG. 8 shows the pictures decoded by the second terminal when the encoded data was produced by the video data encoding method provided in the embodiments above: the picture decoded from the data of the first I frame is relatively blurred, while the pictures of the subsequent P frames are clearer, and the smoothness of video playback is not affected. While decoding and displaying the encoded data sent by the first terminal according to this video data encoding method, playback remains smooth even at a low code rate, and the displayed video picture goes from blurred to clear and finally becomes very clear.
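For illustration, the sketch below models the coding identifier carried with each piece of encoded data so that the receiving terminal can pick the matching decoding mode; the field names and the packet layout are assumptions of the sketch, not a wire format defined by the patent.

```python
from dataclasses import dataclass

@dataclass
class EncodedPacket:
    frame_index: int     # position of the original image within the video frame sequence
    frame_type: str      # "I" or "P": the encoding mode part of the coding identifier
    codec: str           # e.g. "H264" or "VP9": the coding standard part of the identifier
    payload: bytes       # the encoded frame data itself

def choose_decoding_mode(packet: EncodedPacket) -> str:
    """Receiver side: pick the decoding mode from the coding identifier (placeholder only)."""
    if packet.frame_type == "I":
        return f"decode {packet.codec} intra frame {packet.frame_index}"
    return f"decode {packet.codec} inter frame {packet.frame_index} using its reference frame"
```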
In one embodiment, as shown in fig. 9, a video data encoding apparatus is provided. The apparatus includes a video frame acquisition module 902, an image buffer module 904, an image contrast module 906, and a video encoding module 908. Wherein:
a video frame acquisition module 902, configured to acquire a video frame to be encoded.
The image buffering module 904 is configured to buffer an original image encoded as an I frame when encoding of the I frame is completed.
The image comparing module 906 is configured to compare an original image that needs to be encoded into an I frame with an original image in a buffer when encoding into a next I frame.
The video encoding module 908 is configured to encode an original image to be encoded as an I frame into a P frame when the comparison result is that the image is relatively still; when the comparison result is the relative motion of the images, the original images needed to be encoded into I frames are encoded into the I frames.
In one embodiment, as shown in fig. 10, another video data encoding apparatus is provided. The apparatus further comprises:
the scene change detection module 910 is configured to detect, when encoding into a P frame, whether a scene change occurs in an original image that needs to be encoded into the P frame relative to an original image of a previous frame.
The video encoding module 908 is further configured to encode the original image that needs to be encoded as a P frame as an I frame when a scene change has occurred, and otherwise to encode it as a P frame.
In one embodiment, the image comparison module 906 is further configured to compare the original image that needs to be encoded as an I frame with the original image in the cache in at least one of the following ways:
comparing the binary data of the two images, where the comparison result is that the images are relatively still if the binary data are identical and that the images are in relative motion otherwise;
comparing the two images pixel by pixel and calculating the absolute error between their pixels, where the comparison result is that the images are relatively still if the absolute error is smaller than a preset error threshold and that the images are in relative motion otherwise;
dividing both images into a preset number of regions and comparing the binary data of the two images region by region, where the comparison result is that the images are relatively still if the number of regions with identical binary data exceeds a preset count and that the images are in relative motion otherwise.
In one embodiment, the video encoding module 908 is further configured to encode the video frame currently to be encoded into an I frame when a request from a new video receiving terminal is received.
In one embodiment, the video encoding module 908 is further configured to encode a P-frame with a first quantization parameter and an I-frame with a second quantization parameter, wherein the second quantization parameter is greater than the first quantization parameter.
In one embodiment, the video encoding module 908 is further configured to encode video frames with an encoding rate corresponding to the CPU utilization, and dynamically adjust the encoding rate according to the CPU utilization obtained in real time during the encoding process.
Each of the modules in the above video data encoding apparatus may be implemented wholly or partly in software, in hardware, or in a combination of the two. The modules may be built into the terminal in hardware form or be independent of it, or they may be stored in the memory of the terminal in software form, so that the processor can invoke them and execute the operations corresponding to each module. The processor may be a central processing unit (CPU), a microprocessor, a single-chip microcomputer, or the like.
When the next I frame is to be encoded, the video data encoding apparatus compares the original image that needs to be encoded as an I frame with the original image in the cache, and the comparison result determines whether that image is encoded as an I frame or as a P frame. If the comparison result is that the images are relatively still, the similarity between the two original images is high, and the image that needs to be encoded as an I frame is encoded as a P frame instead; this reduces the amount of encoded data, lowers the probability of packet loss during transmission, and correspondingly improves the decoding efficiency of the receiving terminal and the playback smoothness of the corresponding video frames. If the comparison result is that the images are in relative motion, the difference between the two original images is large, and the image that needs to be encoded as an I frame is encoded as an I frame, so that the complete information of the changed image is preserved; encoding it as a P frame would cause the picture decoded and displayed by the receiving terminal to become blurred. By dynamically adjusting the encoding mode of the image that needs to be encoded as an I frame according to the comparison result, the receiving terminal can take both the clarity and the smoothness of the video frames into account when decoding and playing them after receiving the encoded frame data.
In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the program, the processor implements the following steps: acquiring a video frame to be encoded; when an I frame is encoded, caching the original image encoded into the I frame; when the next I frame is to be encoded, comparing the original image that needs to be encoded into an I frame with the original image in the buffer; when the comparison result is that the images are relatively static, encoding the original image that needs to be encoded into an I frame as a P frame; and when the comparison result is that the images are relatively moving, encoding the original image that needs to be encoded into an I frame as an I frame.
In one embodiment, the processor further implements the following steps when executing the program: when encoding to a P frame, detecting whether a scene change has occurred in the original image that needs to be encoded into the P frame relative to the original image of the previous frame; if so, encoding that original image as a P frame, and otherwise encoding it as an I frame.
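The scene change test itself is not specified in this embodiment; one simple possibility is a normalized histogram difference between consecutive original images, as sketched below. The grayscale conversion, bin count, and threshold are illustrative assumptions, and the boolean result would then be mapped to the frame type exactly as described above:

import numpy as np

def scene_changed(prev_original, curr_original, threshold=0.35, bins=64):
    # Compare grayscale intensity histograms of the previous and current
    # original images; a large normalized difference is treated as a scene change.
    def gray_hist(img):
        gray = img.mean(axis=2) if img.ndim == 3 else img
        hist, _ = np.histogram(gray, bins=bins, range=(0, 255))
        return hist / max(hist.sum(), 1)
    diff = np.abs(gray_hist(prev_original) - gray_hist(curr_original)).sum()
    return diff > threshold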
In one embodiment, when executing the program, the processor compares the original image that needs to be encoded into an I frame with the original image in the buffer in at least one of the following ways: comparing the binary data of the two frames of images, wherein if the binary data are equal, the comparison result is that the images are relatively static, and otherwise the comparison result is that the images are relatively moving; comparing the two frames of images pixel by pixel and calculating the absolute error of their pixels, wherein if the absolute error is smaller than a preset error threshold, the comparison result is that the images are relatively static, and otherwise the comparison result is that the images are relatively moving; or dividing the two frames of images into a preset number of areas and comparing the binary data of the two frames area by area, wherein if the number of areas with identical binary data exceeds a preset number, the comparison result is that the images are relatively static, and otherwise the comparison result is that the images are relatively moving.
In one embodiment, the processor further implements the following step when executing the program: when a request from a newly added video receiving terminal is received, encoding the video frame currently to be encoded as an I frame.
In one embodiment, the P-frame is encoded with a first quantization parameter and the I-frame is encoded with a second quantization parameter, wherein the second quantization parameter is greater than the first quantization parameter.
In one embodiment, video frames are encoded with an encoding rate corresponding to CPU utilization, and the encoding rate is dynamically adjusted during encoding based on CPU utilization acquired in real time.
A computer-readable storage medium is also provided, on which a computer program is stored; when executed by a processor, the program implements the following steps: acquiring a video frame to be encoded; when an I frame is encoded, caching the original image encoded into the I frame; when the next I frame is to be encoded, comparing the original image that needs to be encoded into an I frame with the original image in the buffer; when the comparison result is that the images are relatively static, encoding the original image that needs to be encoded into an I frame as a P frame; and when the comparison result is that the images are relatively moving, encoding the original image that needs to be encoded into an I frame as an I frame.
In one embodiment, the program, when executed by the processor, further implements the following steps: when encoding to a P frame, detecting whether a scene change has occurred in the original image that needs to be encoded into the P frame relative to the original image of the previous frame; if so, encoding that original image as a P frame, and otherwise encoding it as an I frame.
In one embodiment, when executed by the processor, the program compares the original image that needs to be encoded into an I frame with the original image in the buffer in at least one of the following ways: comparing the binary data of the two frames of images, wherein if the binary data are equal, the comparison result is that the images are relatively static, and otherwise the comparison result is that the images are relatively moving; comparing the two frames of images pixel by pixel and calculating the absolute error of their pixels, wherein if the absolute error is smaller than a preset error threshold, the comparison result is that the images are relatively static, and otherwise the comparison result is that the images are relatively moving; or dividing the two frames of images into a preset number of areas and comparing the binary data of the two frames area by area, wherein if the number of areas with identical binary data exceeds a preset number, the comparison result is that the images are relatively static, and otherwise the comparison result is that the images are relatively moving.
In one embodiment, the program, when executed by the processor, further implements the following step: when a request from a newly added video receiving terminal is received, encoding the video frame currently to be encoded as an I frame.
In one embodiment, the P-frame is encoded with a first quantization parameter and the I-frame is encoded with a second quantization parameter, wherein the second quantization parameter is greater than the first quantization parameter.
In one embodiment, video frames are encoded with an encoding rate corresponding to CPU utilization, and the encoding rate is dynamically adjusted during encoding based on CPU utilization acquired in real time.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not every possible combination of these technical features is described; nevertheless, any combination of technical features that involves no contradiction should be considered to fall within the scope of this specification.
The above examples express only a few embodiments of the invention, and their description is specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the invention, and these all fall within the protection scope of the invention. Therefore, the protection scope of the invention shall be subject to the appended claims.

Claims (14)

1. A method of video data encoding, comprising:
acquiring a video frame to be encoded;
entering a current GOP, and, when an I frame is encoded, caching the original image encoded into the I frame;
when encoding to the next I frame, comparing the original image that needs to be encoded into the I frame with the original image in the buffer, the original image to be encoded into the I frame being the original image corresponding to the first frame in the next GOP;
when the comparison result is that the images are relatively static, encoding the original image that needs to be encoded into an I frame as a P frame and extending the length of the current GOP, the GOP length being the total number of the encoded I frame and the P frames located after the encoded I frame;
when the comparison result is that the images are relatively moving, entering the next GOP and encoding the original image that needs to be encoded into an I frame as an I frame.
2. The method according to claim 1, wherein the method further comprises:
when encoding to a P frame, detecting whether a scene change has occurred in the original image that needs to be encoded into the P frame relative to the original image of the previous frame; if so, encoding the original image that needs to be encoded into the P frame as a P frame, and otherwise encoding it as an I frame.
3. The method of claim 1, wherein comparing the original image to be encoded as an I-frame with the original image in the buffer is performed by at least one of:
comparing binary data of the two frames of images, wherein if the binary data are equal, the comparison result is that the images are relatively static, and otherwise the comparison result is that the images are relatively moving;
comparing the two frames of images pixel by pixel and calculating the absolute error of their pixels, wherein if the absolute error is smaller than a preset error threshold, the comparison result is that the images are relatively static, and otherwise the comparison result is that the images are relatively moving; or
dividing the two frames of images into a preset number of areas and comparing the binary data of the two frames area by area, wherein if the number of areas with identical binary data exceeds a preset number, the comparison result is that the images are relatively static, and otherwise the comparison result is that the images are relatively moving.
4. The method according to claim 1, wherein the method further comprises:
when a request from a newly added video receiving terminal is received, encoding the video frame currently to be encoded as an I frame.
5. The method of claim 1, wherein the P-frame is encoded with a first quantization parameter and the I-frame is encoded with a second quantization parameter, wherein the second quantization parameter is greater than the first quantization parameter.
6. The method of claim 1, wherein the video frames are encoded with an encoding rate corresponding to CPU utilization, and wherein the encoding rate is dynamically adjusted during encoding based on CPU utilization obtained in real time.
7. A video data encoding apparatus, the apparatus comprising:
the video frame acquisition module is used for acquiring a video frame to be encoded;
the image caching module is used for entering a current GOP, and caching an original image coded into an I frame when the I frame is coded;
the image comparison module is used for comparing an original image which needs to be encoded into an I frame with an original image in a cache when the next I frame is encoded; the original image to be encoded into the I frame is an original image corresponding to the first frame in the next GOP;
the video coding module is used for encoding the original image that needs to be encoded into an I frame as a P frame when the comparison result is that the images are relatively static, and extending the length of the current GOP, the GOP length being the total number of the encoded I frame and the P frames located after the encoded I frame; and for encoding the original image that needs to be encoded into an I frame as an I frame when the comparison result is that the images are relatively moving.
8. The apparatus of claim 7, wherein the apparatus further comprises:
the scene change detection module is used for detecting, when encoding to a P frame, whether a scene change has occurred in the original image that needs to be encoded into the P frame relative to the original image of the previous frame;
the video coding module is further configured to encode the original image that needs to be encoded into the P frame as a P frame when a scene change occurs, and otherwise to encode it as an I frame.
9. The apparatus of claim 7, wherein the image comparison module is further configured to compare an original image that needs to be encoded as an I-frame with an original image in the buffer by at least one of:
comparing binary data of the two frames of images, wherein if the binary data are equal, the comparison result is that the images are relatively static, and otherwise the comparison result is that the images are relatively moving;
comparing the two frames of images pixel by pixel and calculating the absolute error of their pixels, wherein if the absolute error is smaller than a preset error threshold, the comparison result is that the images are relatively static, and otherwise the comparison result is that the images are relatively moving; or
dividing the two frames of images into a preset number of areas and comparing the binary data of the two frames area by area, wherein if the number of areas with identical binary data exceeds a preset number, the comparison result is that the images are relatively static, and otherwise the comparison result is that the images are relatively moving.
10. The apparatus of claim 7, wherein
the video coding module is further configured to encode the video frame currently to be encoded into an I frame when a request from a newly added video receiving terminal is received.
11. The apparatus of claim 7, wherein
the video encoding module is further configured to encode a P-frame using a first quantization parameter and encode an I-frame using a second quantization parameter, wherein the second quantization parameter is greater than the first quantization parameter.
12. The apparatus of claim 7, wherein
the video coding module is further configured to encode video frames at an encoding rate corresponding to the CPU utilization, and to dynamically adjust the encoding rate according to the CPU utilization obtained in real time during encoding.
13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 6 when the program is executed.
14. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN201710241529.6A 2017-04-13 2017-04-13 Video data encoding method, apparatus, computer device and storage medium Active CN108737825B (en)

Publications (2)

Publication Number Publication Date
CN108737825A CN108737825A (en) 2018-11-02
CN108737825B true CN108737825B (en) 2023-05-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TG01 Patent term adjustment