CN116916032A - Video encoding method, video encoding device, electronic equipment and storage medium - Google Patents


Info

Publication number: CN116916032A
Application number: CN202311013733.4A
Authority: CN (China)
Prior art keywords: encoder, frame, encoding, type, target
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status, assignees, or dates listed)
Other languages: Chinese (zh)
Inventor: 黄震坤
Current assignee: Beijing Zhongguancun Kejin Technology Co Ltd
Original assignee: Beijing Zhongguancun Kejin Technology Co Ltd
Application filed by Beijing Zhongguancun Kejin Technology Co Ltd
Priority to CN202311013733.4A
Publication of CN116916032A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/20: … using video object coding
    • H04N 19/10: … using adaptive coding
    • H04N 19/169: … using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17: … the unit being an image region, e.g. an object
    • H04N 19/172: … the region being a picture, frame or field

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure provides a video encoding method, apparatus, electronic device, and storage medium. The method comprises: encoding a plurality of image frames contained in a received original video stream to obtain a plurality of corresponding frame encoded data; and combining the frame encoded data to obtain a joint encoded data stream corresponding to the original video stream. The frame encoded data corresponding to each image frame is obtained as follows: dynamically monitoring the current resource occupancy corresponding to the image frame, dynamically selecting from at least two encoder types a target encoder type matched with the current resource occupancy, and encoding the image frame through a target encoder of the target encoder type to obtain the frame encoded data corresponding to the image frame. The method thus realizes joint encoding of the original video stream by adaptively chosen video encoders that are switched dynamically according to the real-time resource occupancy, improving video encoding efficiency while allocating resources reasonably.

Description

Video encoding method, video encoding device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the technical field of audio and video, and in particular to a video encoding method, a video encoding apparatus, an electronic device, and a storage medium.
Background
Video is a continuous sequence of images consisting of successive frames, each frame being one image. Due to the persistence-of-vision effect of the human eye, a sequence of frames played at a sufficient rate is perceived as continuous motion. Because successive frames are extremely similar, the original video needs to be encoded and compressed before storage and transmission to remove redundancy in the spatial and temporal dimensions. Video encoding refers to converting a file in an original video format into a file in another video format through compression techniques. The most important codec standards in video streaming at present are the International Telecommunication Union's H.261, H.263, H.264, and the like.
In the prior art, compression encoding of original video is typically implemented with only a single type of video encoder. However, different types of video encoders have different drawbacks. For example, some support only a single encoding scheme, resulting in lower encoding efficiency; others have a high resource occupancy, which may cause the system to stall. Therefore, when compression encoding is performed with a single type of video encoder, the encoder cannot be configured dynamically according to the actual resource occupancy, and resource occupancy and encoding efficiency cannot both be achieved at once.
Disclosure of Invention
The disclosure provides a video coding method, a video coding device, an electronic device and a storage medium, which are used for improving video coding efficiency on the premise of reasonably allocating resources.
In a first aspect, the present disclosure provides a video encoding method, the video encoding method comprising:
encoding a plurality of image frames contained in a received original video stream to obtain a plurality of frame encoding data corresponding to the plurality of image frames;
combining the plurality of frame coding data corresponding to the plurality of image frames to obtain a joint coding data stream corresponding to the original video stream;
wherein the frame encoded data corresponding to each image frame is obtained by: dynamically monitoring the current resource occupancy corresponding to the image frame; dynamically selecting, according to the current resource occupancy, a target encoder type matched with the current resource occupancy from at least two encoder types; and encoding the image frame through a target encoder of the target encoder type to obtain the frame encoded data corresponding to the image frame.
In a second aspect, the present disclosure provides a video encoding apparatus, comprising:
an encoding module, configured to encode a plurality of image frames contained in a received original video stream to obtain a plurality of frame encoded data corresponding to the plurality of image frames; and
a merging module, configured to combine the plurality of frame encoded data corresponding to the plurality of image frames to obtain a joint encoded data stream corresponding to the original video stream;
wherein the frame encoded data corresponding to each image frame is obtained by: dynamically monitoring the current resource occupancy corresponding to the image frame; dynamically selecting, according to the current resource occupancy, a target encoder type matched with the current resource occupancy from at least two encoder types; and encoding the image frame through a target encoder of the target encoder type to obtain the frame encoded data corresponding to the image frame.
In a third aspect, the present disclosure provides an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores one or more computer programs executable by the at least one processor, the one or more computer programs being executable by the at least one processor to enable the at least one processor to perform the video encoding method described above.
In a fourth aspect, the present disclosure provides a computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor/processing core implements the video encoding method described above.
According to the video encoding method provided by the present disclosure, a plurality of image frames contained in a received original video stream are first encoded to obtain a plurality of frame encoded data corresponding to the plurality of image frames; the plurality of frame encoded data are then combined to obtain a joint encoded data stream corresponding to the original video stream. The frame encoded data corresponding to each image frame is obtained by dynamically monitoring the current resource occupancy corresponding to the image frame, dynamically selecting from at least two encoder types a target encoder type matched with the current resource occupancy, and encoding the image frame through a target encoder of the target encoder type. The method can therefore dynamically switch between different types of video encoders according to the real-time resource occupancy and realize joint encoding of the original video stream with multiple encoder types, so that an adaptively selected encoder balances resource occupancy against encoding efficiency, improving video encoding efficiency while allocating resources reasonably.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure, without limitation to the disclosure. The above and other features and advantages will become more readily apparent to those skilled in the art by describing in detail exemplary embodiments with reference to the attached drawings, in which:
fig. 1 is a flowchart of a video encoding method according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a video encoding method according to another embodiment of the present disclosure;
fig. 3 is a block diagram of a video encoding apparatus according to an embodiment of the present disclosure;
fig. 4 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
For a better understanding of the technical solutions of the present disclosure, exemplary embodiments of the present disclosure will be described below with reference to the accompanying drawings, in which various details of the embodiments of the present disclosure are included to facilitate understanding, and they should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Embodiments of the disclosure and features of embodiments may be combined with each other without conflict.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Terms such as "connected" are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The video encoding method according to the embodiments of the present disclosure may be performed by an electronic device such as a terminal device or a server, where the terminal device may be user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, etc., and the server may be a separate physical server, a server cluster composed of a plurality of physical servers, or a cloud server capable of cloud computing. The method may be implemented by a processor invoking computer-readable program instructions stored in a memory.
In the prior art, compression encoding of original video is typically implemented with only a single type of video encoder. However, different types of video encoders have different drawbacks: some support only a single encoding scheme, resulting in lower encoding efficiency, while others have a high resource occupancy, which may cause the system to stall. A single-encoder pipeline cannot be reconfigured dynamically according to the actual resource occupancy, so resource occupancy and encoding efficiency cannot both be achieved at once. To solve these problems, the present disclosure provides a video encoding method that selects a video encoder matched with the current resource occupancy, encodes each image frame of the original video stream separately, and then combines the encoding results corresponding to the image frames to obtain a joint encoded data stream corresponding to the original video stream. Different types of video encoders can thus be switched dynamically according to the real-time resource occupancy, and the original video stream is jointly encoded with multiple encoder types, so that an adaptively selected encoder balances resource occupancy against encoding efficiency, improving video encoding efficiency while allocating resources reasonably.
Fig. 1 is a flowchart of a video encoding method according to an embodiment of the present disclosure. Referring to fig. 1, the method includes:
step S110: and carrying out coding processing on a plurality of image frames contained in the received original video stream to obtain a plurality of frame coding data corresponding to the plurality of image frames.
Video is a continuous sequence of images consisting of successive frames, each frame being one image. Due to the persistence-of-vision effect of the human eye, a sequence of frames played at a sufficient rate is perceived as continuous motion. Because successive frames are extremely similar, the original video needs to be encoded and compressed before storage and transmission to remove redundancy in the spatial and temporal dimensions. Video encoding refers to converting a file in an original video format into a file in another video format through compression techniques. The most important codec standards in video streaming at present are the International Telecommunication Union's H.261, H.263, H.264, and the like.
In an alternative implementation, there are multiple video encoders implemented based on the same coding standard for the same coding standard, and different types of encoders have different advantages and disadvantages. For example, commonly used encoders implemented based on the h.264 standard include an OpenH264 encoder and an X264 encoder, where the encoding efficiency and resource occupancy of the X264 encoder are higher than those of the OpenH264 encoder. Therefore, in order to more reasonably configure system resources and further improve the efficiency of video image coding, when coding processing is performed for each image frame, an adaptive type encoder is selected to realize coding according to the current resource occupancy rate.
Specifically, the frame encoded data corresponding to each image frame in step S110 is obtained by: dynamically monitoring the current resource occupancy corresponding to the image frame; dynamically selecting, according to the current resource occupancy, a target encoder type matched with the current resource occupancy from at least two encoder types; and encoding the image frame through a target encoder of the target encoder type to obtain the frame encoded data corresponding to the image frame.
In an alternative implementation, the current system resource occupancy is obtained before each image frame is encoded and used as the current resource occupancy corresponding to that frame. Alternatively, a monitoring time interval is preset and the current system resource occupancy is sampled once per interval, realizing dynamic monitoring of the resource occupancy. The preset monitoring interval can be set to a small value, for example smaller than the time required to encode one image frame. It should be noted that the specific implementation of dynamically monitoring the resource occupancy may be determined by those skilled in the art according to actual service scenario requirements and is not limited herein.
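The interval-based monitoring alternative described above can be sketched as follows. This is an illustrative Python sketch, not part of the disclosure: the function names and the caching scheme are assumptions, and `sample_fn` stands in for any real occupancy probe (for instance, a CPU-usage sampler from a system-metrics library).

```python
import time

def make_monitor(sample_fn, interval_s=0.01):
    """Build a monitor that re-samples the occupancy at most once per
    `interval_s` seconds (the preset monitoring interval), returning the
    cached value in between. `sample_fn` is any zero-argument callable
    returning the current occupancy as a percentage.
    """
    state = {"t": 0.0, "value": None}

    def current_occupancy():
        now = time.monotonic()
        if state["value"] is None or now - state["t"] >= interval_s:
            state["value"] = sample_fn()  # re-sample once the interval elapses
            state["t"] = now
        return state["value"]

    return current_occupancy
```

Choosing `interval_s` smaller than the per-frame encoding time, as the description suggests, guarantees that every frame sees an occupancy reading at most one interval old.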
In an alternative implementation, the resource occupancy includes the central processing unit (CPU) resource occupancy, the system memory resource occupancy, and the like. Which resources' real-time occupancy serves as the basis for encoder selection may be determined by those skilled in the art according to actual service scenario requirements and is not limited herein.
Step S120: and combining the plurality of frame coding data corresponding to the plurality of image frames to obtain a joint coding data stream corresponding to the original video stream.
Video image data has strong correlation, i.e., it contains a large amount of redundant information, which can be divided into spatial-domain redundancy and temporal-domain redundancy. Video coding (compression) removes this redundant information, i.e., the correlation between data; compression techniques include intra-frame compression, inter-frame compression, and entropy coding.
In step S120, the plurality of frame encoded data are the encoding results obtained in step S110 by encoding the plurality of image frames respectively. The compression-encoded frame data removes the redundant information of the corresponding image frame and retains only the data unique to that frame. Compared with the original image frames, the frame encoded data is far smaller and occupies less memory, alleviating the strain on communication lines and data storage capacity that transmitting raw frame data directly could cause.
In an alternative implementation, the plurality of frame encoded data corresponding to the plurality of image frames are produced by encoders of multiple different types. These encoders are implemented based on the same encoding standard; for example, the OpenH264 encoder and the X264 encoder are both implemented based on the H.264 standard. The frame encoded data they produce therefore share the same encoding format, so the merging of the plurality of frame encoded data can be implemented directly by sequential splicing at the binary level. Merging the frame encoded data corresponding to the image frames of the received original video stream yields the joint encoded data stream corresponding to the original video stream, obtained by joint encoding with multiple encoder types.
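The binary-level sequential splicing described above amounts to concatenating the per-frame bitstream chunks. A minimal sketch follows (the function name is an assumption); note that this is valid precisely because all sub-encoders share one standard, e.g. in the H.264 Annex B byte-stream format each chunk carries its own start codes, so simple concatenation yields one decodable stream.

```python
def merge_frame_data(frame_chunks):
    """Merge per-frame encoded chunks into one joint encoded data stream.

    Valid because every sub-encoder implements the same standard (e.g.
    H.264), so all chunks share one bitstream format and can simply be
    spliced in order at the binary level.
    """
    return b"".join(frame_chunks)
```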
In summary, according to the video encoding method provided by the embodiments of the present disclosure, a plurality of image frames contained in a received original video stream are first encoded to obtain a plurality of frame encoded data; the frame encoded data are then combined to obtain a joint encoded data stream corresponding to the original video stream. The frame encoded data corresponding to each image frame is obtained by dynamically monitoring the current resource occupancy, dynamically selecting from at least two encoder types a target encoder type matched with the current resource occupancy, and encoding the image frame through a target encoder of the target encoder type. The method can thus dynamically switch between different types of video encoders according to the real-time resource occupancy and jointly encode the original video stream with multiple encoder types, so that an adaptively selected encoder balances resource occupancy against encoding efficiency, improving video encoding efficiency while allocating resources reasonably.
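The per-frame flow of this embodiment can be sketched end to end. This is an illustrative Python sketch under assumed names (the "first"/"second" type labels, an occupancy callable returning a percentage, and an 80% threshold are all assumptions, not from the disclosure):

```python
def encode_stream(frames, encoders, occupancy_fn, threshold=80.0):
    """Joint-encode raw frames, re-selecting the encoder type per frame.

    `frames` yields raw image frames; `encoders` maps the assumed type
    labels "first" (lightweight) and "second" (efficient) to per-frame
    encode callables; `occupancy_fn` returns the current occupancy in
    percent. The 80.0 threshold is an illustrative value.
    """
    chunks = []
    for frame in frames:
        # Re-check the resource occupancy before every frame (dynamic monitoring).
        etype = "first" if occupancy_fn() > threshold else "second"
        chunks.append(encoders[etype](frame))
    # Same-standard outputs share a bitstream format, so binary splicing suffices.
    return b"".join(chunks)
```

In this sketch a loaded system (occupancy above the threshold) falls back to the lightweight encoder, and an idle one uses the more efficient encoder, mirroring the selection rule of the embodiment below.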
Fig. 2 is a flowchart of a video encoding method according to another embodiment of the present disclosure. Referring to fig. 2, the method includes:
step S210: for each image frame contained in the received original video stream, the current resource occupancy corresponding to the image frame is dynamically monitored.
Video is a continuous sequence of images, consisting of successive frames, one frame being an image. Due to the persistence of vision effect of the human eye, a video with continuous motion can be seen when a sequence of frames is played at a certain rate. An image frame is the smallest unit that constitutes a video.
In an alternative implementation, the current system resource occupancy is obtained and used as the current resource occupancy corresponding to the current image frame, realizing dynamic monitoring of that frame's resource occupancy. Alternatively, a monitoring time interval is preset and the current system resource occupancy is sampled once per interval. The preset monitoring interval can be set to a small value, for example smaller than the time required to encode one image frame. It should be noted that the specific implementation of dynamically monitoring the resource occupancy may be determined by those skilled in the art according to actual service scenario requirements and is not limited herein.
In an alternative implementation, the resource occupancy includes the central processing unit (CPU) resource occupancy, the system memory resource occupancy, and the like. Which resources' real-time occupancy is dynamically monitored may be determined by those skilled in the art according to actual service scenario requirements and is not limited herein.
Step S220: for each image frame contained in the received original video stream, dynamically selecting a target encoder type matched with the current resource occupancy rate from at least two encoder types according to the current resource occupancy rate, and carrying out encoding processing on the image frame through a target encoder of the target encoder type to obtain frame encoding data corresponding to the image frame.
An image frame is the smallest unit that constitutes a video. Because successive frames are extremely similar, the original video needs to be encoded and compressed before storage and transmission to remove redundancy in the spatial and temporal dimensions. Video encoding refers to converting a file in an original video format into a file in another video format through compression techniques. The most important codec standards in video streaming at present are the International Telecommunication Union's H.261, H.263, H.264, and the like.
In an alternative implementation, there are multiple video encoders implemented based on the same coding standard for the same coding standard, and different types of encoders have different advantages and disadvantages. In order to more reasonably configure system resources and further improve the efficiency of video image coding, when coding is carried out on each image frame, an adaptive type encoder is selected to realize coding according to the current resource occupancy rate.
Thus, in step S220, at least two encoder types include: a first type and a second type. Wherein the resource occupancy of the first type of encoder is less than the resource occupancy of the second type of encoder, and the encoding efficiency of the first type of encoder is lower than the encoding efficiency of the second type of encoder. It can be seen that the first type of encoder has both the advantage of less resource occupancy and the disadvantage of lower coding efficiency, and the second type of encoder has both the advantage of higher coding efficiency and the disadvantage of greater resource occupancy.
Based on the first type of encoder and the second type of encoder, in step S220, according to the current resource occupancy rate, a target encoder type matching with the current resource occupancy rate is dynamically selected from at least two encoder types, and a specific implementation manner of encoding the image frame by the target encoder of the target encoder type includes:
If the current resource occupancy exceeds a preset threshold, the first type is selected as the target encoder type, and the image frame is encoded through a target encoder of the first type; if the current resource occupancy does not exceed the preset threshold, the second type is selected as the target encoder type, and the image frame is encoded through a target encoder of the second type. The preset threshold for evaluating the current resource occupancy may be set by those skilled in the art according to actual service scenario requirements and is not limited herein.
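The threshold rule above can be sketched as a small selection function. The 80% default and the type labels are illustrative assumptions; as stated, the disclosure leaves the actual threshold to the implementer.

```python
FIRST_TYPE = "first"    # e.g. a lightweight encoder: lower occupancy, lower efficiency
SECOND_TYPE = "second"  # e.g. a heavier encoder: higher occupancy, higher efficiency

def select_encoder_type(current_occupancy, preset_threshold=80.0):
    """Select the target encoder type for one image frame.

    Above the threshold the system is already loaded, so the lightweight
    first type is chosen; at or below it, the more efficient second type.
    The 80.0 default is an illustrative value, not from the disclosure.
    """
    if current_occupancy > preset_threshold:
        return FIRST_TYPE
    return SECOND_TYPE
```

Note the boundary case: an occupancy exactly at the threshold "does not exceed" it, so the second (efficient) type is selected.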
Through the specific implementation manner, when each image frame is subjected to coding processing, the encoder which can simultaneously consider the current resource occupancy rate and the coding processing efficiency is selected to realize coding based on the current resource occupancy rate, so that system resources are more reasonably configured, and the video image coding efficiency is further improved.
H.264 is a highly compressed digital video codec standard jointly proposed by the Joint Video Team (JVT), consisting of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). H.264 has a high data compression ratio, which significantly saves users' download time and data traffic charges. At the same time, H.264 delivers high-quality, smooth images; video compressed with H.264 requires less bandwidth during network transmission and is therefore more economical.
In an alternative implementation, the video encoding method in this embodiment is implemented based on the H.264 standard; the first type of encoder includes the OpenH264 encoder, and the second type of encoder includes the X264 encoder. The OpenH264 encoder is a lightweight H.264 encoder with a small code base; it is easy to port to various systems and devices and is relatively fast and easy to use. However, the OpenH264 encoder supports only the baseline-profile encoding mode of H.264 and not the other H.264 profiles, resulting in lower encoding efficiency. The X264 encoder is a high-efficiency H.264 encoder with advantages in performance and quality stability, and is commonly used to produce high-quality, low-bit-rate video. However, the X264 encoder is relatively slow and requires more computing resources.
Comparing the two encoder types: OpenH264 can match X264 in speed only at X264's fastest ("superfast") setting, while its compression efficiency lags behind by more than 20%. OpenH264's rate control is also weaker than that of X264, whose rate control design is more accurate. On the other hand, X264 is more complex in design, unlike the lightweight and easily integrated OpenH264. Therefore, the resource occupancy rate of the OpenH264 encoder is lower than that of the X264 encoder, and its encoding efficiency is likewise lower than that of the X264 encoder.
Video resolution refers to the precision of a video image, commonly measured in PPI (pixels per inch). When video is displayed on a terminal device, a version at a suitable resolution can be selected according to factors such as the intended use of the video, the viewing device, available bandwidth, and storage capacity. The original video is encoded at different degrees of compression to obtain compressed encoded data, and the compressed data corresponding to the different degrees of compression is then decoded to obtain corresponding videos at different resolutions.
In an alternative implementation, to achieve multi-resolution encoding of the original video stream, the target encoder of each encoder type further includes a plurality of sub-encoders corresponding to a plurality of preset resolutions, and the sub-encoders run in parallel with each other. The preset resolutions are set by a person skilled in the art according to the requirements of the actual service scenario when implementing the method, and are not limited herein; for example, they may include 1080p, 720p, and 480p.
Based on the above implementation, encoding the image frame by the target encoder of the target encoder type includes: encoding the image frame by the plurality of sub-encoders corresponding to the plurality of preset resolutions, respectively, to obtain a plurality of frame encoded data corresponding to the current image frame, where the plurality of frame encoded data correspond to the plurality of preset resolutions, respectively. The sub-encoders corresponding to the plurality of preset resolutions all belong to target encoders of the same type.
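The parallel per-resolution encoding above can be sketched with a thread pool that runs one sub-encoder per preset resolution. The "encoding" below is a stand-in placeholder (a real implementation would invoke OpenH264 or X264 bindings); all names are hypothetical.

```python
# Sketch: one sub-encoder per preset resolution, run in parallel,
# producing a {resolution: frame_encoded_data} dict for a single frame.
from concurrent.futures import ThreadPoolExecutor

PRESET_RESOLUTIONS = ["1080p", "720p", "480p"]

def encode_at_resolution(frame: bytes, resolution: str) -> tuple:
    # Placeholder sub-encoder: tags the frame data with its resolution.
    return resolution, b"ENC[" + resolution.encode() + b"]" + frame

def encode_frame_all_resolutions(frame: bytes) -> dict:
    """Encode one image frame at every preset resolution in parallel."""
    with ThreadPoolExecutor(max_workers=len(PRESET_RESOLUTIONS)) as pool:
        results = pool.map(lambda r: encode_at_resolution(frame, r),
                           PRESET_RESOLUTIONS)
    return dict(results)

encoded = encode_frame_all_resolutions(b"frame0")
print(sorted(encoded))  # ['1080p', '480p', '720p']
```

Since each sub-encoder works on an independent copy of the frame and writes to its own output slot, the sub-encoders have no shared mutable state and can safely run concurrently.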
Step S230: and combining the plurality of frame coding data corresponding to the plurality of image frames contained in the original video stream to obtain a joint coding data stream corresponding to the original video stream.
Video image data is strongly correlated, i.e., it contains a large amount of redundant information, which can be divided into spatial redundancy and temporal redundancy. Video coding (compression) removes this redundant information from the data (removes the correlation between data); compression techniques include intra-frame compression, inter-frame compression, and entropy coding.
In step S230, the plurality of frame encoded data are the results obtained by encoding the plurality of image frames, respectively, and have already been obtained in step S220. Frame encoded data produced by compression encoding has the redundant information of the corresponding image frame removed and retains only the data information unique to that frame. Compared with the corresponding image frame, the data volume of the frame encoded data is greatly reduced and it occupies less memory, which alleviates the communication line failures and storage capacity pressure that may be caused by transmitting raw image frame data directly.
As can be seen from the implementation provided in step S220, the plurality of frame encoded data corresponding to the plurality of image frames are each obtained by a target encoder dynamically selected from at least two types of encoders. The at least two types of encoders are implemented based on the same encoding standard; for example, the OpenH264 encoder and the X264 encoder are both implemented based on the H.264 standard. Therefore, the frame encoded data produced by the dynamically selected target encoders all share the same encoding format, and the merging of the plurality of frame encoded data can be realized directly by sequential splicing at the binary level. Merging the plurality of frame encoded data corresponding to the plurality of image frames contained in the received original video stream yields a jointly encoded data stream, corresponding to the original video stream, that is jointly produced by at least two types of encoders.
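The "binary-based sequential splicing" merge above reduces to byte concatenation in frame order, because both encoder types emit the same H.264 format. This is a minimal sketch; the frame payloads are placeholders (the leading bytes mimic H.264 Annex-B start codes for illustration).

```python
# Minimal sketch of merging per-frame encoded buffers into one
# jointly encoded data stream by sequential binary splicing.

def merge_frames(frame_encoded_list: list) -> bytes:
    """Concatenate per-frame encoded data, in frame order, into one stream."""
    return b"".join(frame_encoded_list)

frames = [b"\x00\x00\x00\x01frameA", b"\x00\x00\x00\x01frameB"]
stream = merge_frames(frames)
print(len(stream))  # 20
```

This works only because every frame was encoded to the same standard and format; mixing incompatible bitstream formats would require remuxing rather than plain concatenation.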
In an alternative implementation, in order to realize multi-resolution encoding of the original video stream, the frame encoded data corresponding to each image frame in step S230 includes a plurality of frame encoded data corresponding to the current image frame and respectively corresponding to the plurality of preset resolutions. For any one of the preset resolutions, step S230 specifically includes:
determining any one of the plurality of preset resolutions as a target resolution, extracting the target frame encoded data corresponding to the target resolution from the plurality of frame encoded data corresponding to each image frame, and merging the plurality of target frame encoded data corresponding to the plurality of image frames to obtain a jointly encoded data stream corresponding to the target resolution.
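The per-resolution variant of step S230 can be sketched as: select a target resolution, pull that resolution's encoded data out of every frame's per-resolution map, and splice the results in frame order. All data values below are placeholders.

```python
# Sketch of building the jointly encoded data stream for one target
# resolution from per-frame, per-resolution encoded data.

def joint_stream_for_resolution(per_frame: list, target_resolution: str) -> bytes:
    """per_frame: list of {resolution: frame_encoded_data} dicts, in frame order."""
    return b"".join(frame[target_resolution] for frame in per_frame)

per_frame = [
    {"1080p": b"A-hi", "480p": b"A-lo"},  # frame A at two resolutions
    {"1080p": b"B-hi", "480p": b"B-lo"},  # frame B at two resolutions
]
print(joint_stream_for_resolution(per_frame, "480p"))  # b'A-loB-lo'
```

Repeating this for each preset resolution yields one independent jointly encoded stream per resolution, which matches the per-resolution delivery described in step S240.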
Step S240: sending the jointly encoded data stream to a decoder for decoding to obtain a video image corresponding to the original video stream, and transmitting the video image to a display end for display.
A video decoder is a program (video player) or device that restores encoded digital video by decoding it. In an alternative implementation, the at least two types of encoders in step S220 are implemented based on the same encoding standard; for example, the OpenH264 encoder and the X264 encoder are both implemented based on the H.264 standard. Therefore, the plurality of frame encoded data corresponding to the plurality of image frames, each obtained by a target encoder dynamically selected from the at least two types of encoders, share the same encoding format.
Thus, in step S240, the decoding of jointly encoded data streams produced by encoders of the at least two candidate encoder types can be performed by a single decoder; that is, the decoder supports decoding encoded data produced by encoders of at least two candidate encoder types.
In an alternative implementation, the video encoding method in this embodiment is implemented based on the H.264 standard, and the decoder is specifically an FFmpeg decoder. FFmpeg is an open-source library integrating a variety of audio and video codecs; it supports common formats such as H.264, AAC, and MP3, and is widely used in scenarios such as video editing software, transcoding tools, and streaming media servers.
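As a hedged illustration of the FFmpeg decoding step, one common way to decode a raw H.264 elementary stream is via the ffmpeg command-line tool (assumed to be installed). The flags below are standard ffmpeg options; the file names are hypothetical, and here we only construct the command rather than run it.

```python
# Sketch: build an ffmpeg command that decodes a raw H.264 elementary
# stream (the jointly encoded data stream) into raw YUV frames.

def ffmpeg_decode_command(input_path: str, output_path: str) -> list:
    return [
        "ffmpeg",
        "-f", "h264",        # treat the input as a raw H.264 elementary stream
        "-i", input_path,
        "-f", "rawvideo",    # emit decoded raw frames
        "-pix_fmt", "yuv420p",
        output_path,
    ]

cmd = ffmpeg_decode_command("joint.h264", "out.yuv")
print(" ".join(cmd))
```

In a real pipeline this command could be launched with `subprocess.run(cmd, check=True)`, or replaced with library-level bindings to FFmpeg's decoders.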
In an alternative implementation, in order to realize multi-resolution encoding of the original video stream, the jointly encoded data stream in step S240 includes jointly encoded data streams respectively corresponding to the plurality of preset resolutions. For any preset resolution, the jointly encoded data stream corresponding to that resolution is sent to the decoder for decoding to obtain a video image at that resolution, and the video image is transmitted to the display end for display, thereby realizing the display of video images at a specified resolution.
In an alternative implementation, the video encoding method provided in this embodiment is implemented by a WebRTC-based system, where the original program code of the system contains a first code segment corresponding to the first type of encoder, and a call interface for calling the second type of encoder; the second code segment corresponding to the second type of encoder includes code for implementing the following operations: resolution conversion, rate conversion, frame rate conversion, and/or key frame transmission.
WebRTC (Web Real-Time Communications) is a real-time communication technology that allows web applications or sites to establish peer-to-peer connections between browsers without intermediaries, enabling the transmission of video streams, audio streams, and/or any other data. WebRTC provides the core technology for video conferencing, including audio and video capture, codecs, network transmission, and display. Among these, the video codec technology is particularly important, as encoding efficiency directly determines the quality of a video call.
In existing WebRTC, OpenH264 is adopted as the default H.264 encoder; that is, when an H.264 code stream is needed, WebRTC calls the OpenH264 encoder to encode the video, while the X264 encoder is not integrated into the original program code of WebRTC.
It can be seen that, in an alternative implementation, the video encoding method in this embodiment is implemented based on the H.264 standard; the first type of encoder includes an OpenH264 encoder, and the second type of encoder includes an X264 encoder. In addition, the video encoding method provided in this embodiment is implemented by a WebRTC-based system. In order to realize dynamic switching of encoder types, the original program code of the system contains a first code segment corresponding to the OpenH264 encoder and a call interface for calling the X264 encoder; the first code segment corresponding to the OpenH264 encoder and the second code segment corresponding to the X264 encoder each include code for implementing the following operations: resolution conversion, rate conversion, frame rate conversion, and/or key frame transmission.
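The dynamic switching between the built-in encoder and the externally called one can be sketched as a small registry behind a common interface. The encoder classes below are stubs standing in for the real OpenH264 code segment and the X264 call interface; all names and the 0.8 threshold are hypothetical illustrations, not the patent's actual WebRTC integration code.

```python
# Sketch: a registry that routes each frame either to the built-in
# (first-type) encoder or, via a call interface, to the external
# (second-type) encoder, based on the current resource occupancy.

class OpenH264Stub:
    """Stands in for the built-in first-type encoder code segment."""
    def encode(self, frame: bytes) -> bytes:
        return b"OH264:" + frame

class X264Stub:
    """Stands in for the second-type encoder reached via a call interface."""
    def encode(self, frame: bytes) -> bytes:
        return b"X264:" + frame

ENCODER_REGISTRY = {"first": OpenH264Stub(), "second": X264Stub()}

def encode_with_selected(frame: bytes, occupancy: float,
                         threshold: float = 0.8) -> bytes:
    # High occupancy -> lightweight first-type encoder; otherwise second type.
    key = "first" if occupancy > threshold else "second"
    return ENCODER_REGISTRY[key].encode(frame)

print(encode_with_selected(b"f", 0.9))  # b'OH264:f'
print(encode_with_selected(b"f", 0.3))  # b'X264:f'
```

Because both stubs expose the same `encode` interface, the selection logic stays independent of which concrete encoder is active, which is what makes per-frame switching possible.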
In summary, according to the video encoding method provided by the embodiments of the present disclosure: first, for each image frame contained in the received original video stream, the current resource occupancy rate corresponding to the image frame is dynamically monitored; second, a target encoder type matching the current resource occupancy rate is dynamically selected from at least two encoder types according to the current resource occupancy rate, and the image frame is encoded by a target encoder of the target encoder type to obtain frame encoded data corresponding to the image frame; then, the plurality of frame encoded data corresponding to the plurality of image frames contained in the original video stream are merged to obtain a jointly encoded data stream corresponding to the original video stream; finally, the jointly encoded data stream is sent to a decoder for decoding to obtain a video image corresponding to the original video stream, and the video image is transmitted to a display end for display. In addition, for each image frame contained in the received original video stream, multi-resolution encoding of the original video stream is realized by a plurality of sub-encoders running in parallel. The method can therefore dynamically switch among different types of video encoders according to the real-time resource occupancy rate and realize joint encoding of the original video stream with multiple types of video encoders, so that an encoder of a suitable type balances resource occupancy against encoding efficiency and video encoding efficiency is improved on the premise of reasonable resource allocation; meanwhile, multi-resolution encoding of the original video stream by sub-encoders running in parallel satisfies more diversified encoding requirements.
It will be appreciated that the above-mentioned method embodiments of the present disclosure may be combined with one another to form combined embodiments without departing from their principles and logic; for brevity, such combinations are not described again in this disclosure. It will be appreciated by those skilled in the art that, in the above methods of the embodiments, the specific order of execution of the steps should be determined by their functions and possible inherent logic.
Fig. 3 is a block diagram of a video encoding apparatus according to an embodiment of the present disclosure.
Referring to fig. 3, an embodiment of the present disclosure provides a video encoding apparatus 30, the video encoding apparatus 30 including:
an encoding module 31, configured to perform encoding processing on a plurality of image frames included in the received original video stream, so as to obtain a plurality of frame encoded data corresponding to the plurality of image frames;
a merging module 32, configured to perform merging processing on a plurality of frame encoded data corresponding to a plurality of image frames, so as to obtain a jointly encoded data stream corresponding to an original video stream;
wherein the frame encoded data corresponding to each image frame is obtained by: the method comprises the steps of dynamically monitoring the current resource occupancy rate corresponding to an image frame, dynamically selecting a target encoder type matched with the current resource occupancy rate from at least two encoder types according to the current resource occupancy rate, and encoding the image frame through a target encoder of the target encoder type to obtain frame encoding data corresponding to the image frame.
Optionally, the at least two encoder types include: the first type and the second type, the resource occupancy rate of the encoder of the first type is smaller than the resource occupancy rate of the encoder of the second type; the encoding module 31 is specifically configured to:
if the current resource occupancy rate exceeds a preset threshold, selecting a first type as a target encoder type, and encoding the image frame through the target encoder of the first type;
if the current resource occupancy rate does not exceed the preset threshold value, selecting the second type as the type of the target encoder, and encoding the image frame through the target encoder of the second type.
Optionally, the first type of encoder comprises: the OpenH264 encoder, the second type of encoder includes: an X264 encoder.
Optionally, the target encoder of each encoder type further comprises: a plurality of sub-encoders corresponding to a plurality of preset resolutions, and the plurality of sub-encoders are operated in parallel with each other; the encoding module 31 is specifically configured to:
encoding the image frames through a plurality of sub-encoders corresponding to a plurality of preset resolutions respectively to obtain a plurality of frame encoding data corresponding to the current image frames; wherein, the plurality of frame coding data respectively correspond to a plurality of preset resolutions.
Optionally, the merging module 32 is specifically configured to:
and determining any one of a plurality of preset resolutions as a target resolution, extracting target frame coding data corresponding to the target resolution from the plurality of frame coding data corresponding to each image frame, and combining the plurality of target frame coding data corresponding to the plurality of image frames to obtain a joint coding data stream corresponding to the target resolution.
Optionally, the video encoding device 30 further comprises a decoding module for:
transmitting the joint coding data stream to a decoder for decoding processing to obtain a video image corresponding to the original video stream, and transmitting the video image to a display end for display; wherein the decoder supports decoding of encoded data encoded by encoders of at least two candidate encoder types.
Optionally, the video encoding device 30 is implemented by a WebRTC-based system, where the original program code of the system includes a first code segment corresponding to the first type of encoder, and a call interface for calling the second type of encoder; wherein the second code segment corresponding to the second encoder includes code for: resolution conversion operations, rate conversion operations, frame rate conversion operations, and/or key frame transmission operations.
The specific structure and working principle of each module may refer to the description of the corresponding parts of the method embodiment, and are not repeated here.
The various modules in the video encoding apparatus described above may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
Fig. 4 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Referring to fig. 4, an embodiment of the present disclosure provides an electronic device 40, the electronic device 40 comprising: at least one processor 401; at least one memory 402; and one or more I/O interfaces 403 connected between the processor 401 and the memory 402; wherein the memory 402 stores one or more computer programs executable by the at least one processor 401, the one or more computer programs being executed by the at least one processor 401 to enable the at least one processor 401 to perform the video encoding method described above.
The various modules in the electronic device described above may be implemented in whole or in part in software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
The disclosed embodiments also provide a computer readable storage medium having a computer program stored thereon, wherein the computer program when executed by a processor/processing core implements the video encoding method described above. The computer readable storage medium may be a volatile or nonvolatile computer readable storage medium.
The disclosed embodiments also provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when executed in a processor of an electronic device, performs the video encoding method described above.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer-readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
The term computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable program instructions, data structures, program modules or other data, as known to those skilled in the art. Computer storage media includes, but is not limited to, Random Access Memory (RAM), Read Only Memory (ROM), Erasable Programmable Read Only Memory (EPROM), Static Random Access Memory (SRAM), flash memory or other memory technology, portable Compact Disc Read Only Memory (CD-ROM), Digital Versatile Discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable program instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and may include any information delivery media.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for performing the operations of the present disclosure can be assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present disclosure are implemented by personalizing electronic circuitry, such as programmable logic circuitry, Field Programmable Gate Arrays (FPGAs), or Programmable Logic Arrays (PLAs), with state information of computer readable program instructions, which can execute the computer readable program instructions.
The computer program product described herein may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purpose of limitation. In some instances, it will be apparent to one skilled in the art that features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with other embodiments unless explicitly stated otherwise. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims (10)

1. A video encoding method, comprising:
encoding a plurality of image frames contained in a received original video stream to obtain a plurality of frame encoding data corresponding to the plurality of image frames;
combining the frame coding data corresponding to the image frames to obtain a joint coding data stream corresponding to the original video stream;
wherein the frame encoded data corresponding to each image frame is obtained by: dynamically monitoring the current resource occupancy rate corresponding to the image frame, dynamically selecting a target encoder type matched with the current resource occupancy rate from at least two encoder types according to the current resource occupancy rate, and carrying out encoding processing on the image frame through a target encoder of the target encoder type to obtain frame encoding data corresponding to the image frame.
2. The method of claim 1, wherein the at least two encoder types comprise: a first type and a second type, the resource occupancy of the first type of encoder being less than the resource occupancy of the second type of encoder;
the step of dynamically selecting a target encoder type matched with the current resource occupancy rate from at least two encoder types according to the current resource occupancy rate, and the step of encoding the image frame through a target encoder of the target encoder type comprises the following steps:
if the current resource occupancy rate exceeds a preset threshold, selecting a first type as the type of the target encoder, and encoding the image frame through the target encoder of the first type;
and if the current resource occupancy rate does not exceed the preset threshold value, selecting a second type as the type of the target encoder, and carrying out encoding processing on the image frame through the target encoder of the second type.
3. The method of claim 2, wherein the first type of encoder comprises: an OpenH264 encoder, the second type of encoder comprising: an X264 encoder.
4. The method of claim 1, wherein the target encoder for each encoder type further comprises: a plurality of sub-encoders corresponding to a plurality of preset resolutions, and the plurality of sub-encoders are operated in parallel with each other;
the encoding of the image frame by the target encoder of the target encoder type includes:
encoding the image frames through a plurality of sub-encoders corresponding to a plurality of preset resolutions respectively to obtain a plurality of frame encoded data corresponding to the current image frame; wherein the plurality of frame encoded data corresponds to the plurality of preset resolutions, respectively.
5. The method of claim 4, wherein the merging the plurality of frame encoded data corresponding to the plurality of image frames to obtain a jointly encoded data stream corresponding to the original video stream comprises:
and determining any one of the multiple preset resolutions as a target resolution, extracting target frame coding data corresponding to the target resolution from multiple frame coding data corresponding to each image frame, and combining the multiple target frame coding data corresponding to the multiple image frames to obtain a joint coding data stream corresponding to the target resolution.
6. The method of claim 1, wherein after said deriving a jointly encoded data stream corresponding to said original video stream, further comprising:
transmitting the joint coding data stream to a decoder for decoding processing to obtain a video image corresponding to the original video stream, and transmitting the video image to a display end for display; wherein the decoder supports decoding of encoded data obtained by encoding of the encoders of the at least two candidate encoder types.
7. The method according to claim 2, wherein the method is implemented by a WebRTC-based system, the original program code of which contains a first code segment corresponding to the first type of encoder, and a call interface for calling the second type of encoder; wherein the second code segment corresponding to the second encoder includes code for: resolution conversion operations, rate conversion operations, frame rate conversion operations, and/or key frame transmission operations.
8. A video encoding apparatus, comprising:
an encoding module, configured to encode a plurality of image frames contained in a received original video stream to obtain a plurality of pieces of frame encoded data corresponding to the plurality of image frames; and
a merging module, configured to merge the plurality of pieces of frame encoded data corresponding to the plurality of image frames to obtain a jointly encoded data stream corresponding to the original video stream;
wherein the frame encoded data corresponding to each image frame is obtained by: dynamically monitoring a current resource occupancy rate corresponding to the image frame, dynamically selecting, from at least two candidate encoder types according to the current resource occupancy rate, a target encoder type matched with the current resource occupancy rate, and encoding the image frame through a target encoder of the target encoder type to obtain the frame encoded data corresponding to the image frame.
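The per-frame selection rule above can be sketched as follows. The 0.8 threshold, the "software"/"hardware" type names, and the occupancy probe are illustrative assumptions; the patent does not specify a concrete policy:

```python
def select_encoder_type(cpu_occupancy: float, threshold: float = 0.8) -> str:
    """Pick the encoder type matching the current resource occupancy:
    offload to the hardware (second-type) encoder when the CPU is busy,
    otherwise stay on the software (first-type) encoder."""
    return "hardware" if cpu_occupancy > threshold else "software"

def encode_with_dynamic_selection(frame, occupancy_probe, encoders):
    occupancy = occupancy_probe()                  # dynamic monitoring
    encoder_type = select_encoder_type(occupancy)  # dynamic selection
    return encoder_type, encoders[encoder_type](frame)
```

Because the probe runs per frame, the stream can switch encoder types mid-stream as load changes, which is why the decoder of claim 6 must accept both types.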
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores one or more computer programs executable by the at least one processor, to enable the at least one processor to perform the video encoding method according to any one of claims 1-7.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the video encoding method according to any one of claims 1-7.
CN202311013733.4A 2023-08-11 2023-08-11 Video encoding method, video encoding device, electronic equipment and storage medium Pending CN116916032A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311013733.4A CN116916032A (en) 2023-08-11 2023-08-11 Video encoding method, video encoding device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116916032A true CN116916032A (en) 2023-10-20

Family

ID=88351092


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination