CN113965714B - Video stream processing method and device, electronic equipment and storage medium - Google Patents
Video stream processing method and device, electronic equipment and storage medium
- Publication number
- CN113965714B (application number CN202111060659.2A)
- Authority
- CN
- China
- Prior art keywords
- request
- key frame
- source terminal
- terminal
- video stream
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
- H04N7/155—Conference systems involving storage of or access to video conference sessions
Abstract
The disclosure provides a video stream processing method and device, an electronic device and a storage medium, and relates to the technical fields of information flow and computer vision. The specific scheme is as follows: after a requesting terminal requests the video stream of a source terminal, a video frame returned by the source terminal is detected and determined to be a non-key frame, and a key frame request is sent to the source terminal. According to this technical scheme, video playing efficiency can be effectively improved.
Description
Technical Field
The disclosure relates to the technical field of computers, in particular to the technical fields of information flow and computer vision, and especially to a video stream processing method and device, an electronic device and a storage medium.
Background
In recent years, as online video conferencing has been used more and more widely, users' requirements for its real-time performance and picture quality have also increased.
In online video conferencing, video frames are divided into I-frames and P-frames. An I-frame does not need to reference previous video frames and can be decoded immediately, whereas a P-frame requires a forward reference and relies on previous video frames for decoding. Because I-frame coding occupies far more bandwidth than P-frame coding, especially in scenes such as screen sharing, I-frames are generally not encoded during the conference unless necessary, in order to reduce traffic and maintain video quality.
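Purely as an illustration and not as part of the disclosed scheme, the following Python sketch shows one way a media server could distinguish the two frame types for H.264 content; the Annex-B framing assumption and the function name are this sketch's own:

```python
# Illustrative only: classify an H.264 Annex-B payload as a key frame (I-frame/IDR)
# or a non-key frame. Real transports (e.g. RTP) would need depacketizing first.
def is_key_frame(annexb_payload: bytes) -> bool:
    """Return True if the payload contains an IDR (key frame) NAL unit."""
    i, n = 0, len(annexb_payload)
    while i < n - 3:
        # Look for a 3- or 4-byte start code (00 00 01 / 00 00 00 01).
        if annexb_payload[i:i + 3] == b"\x00\x00\x01":
            nal_header = annexb_payload[i + 3]
        elif i < n - 4 and annexb_payload[i:i + 4] == b"\x00\x00\x00\x01":
            nal_header = annexb_payload[i + 4]
        else:
            i += 1
            continue
        nal_unit_type = nal_header & 0x1F
        if nal_unit_type == 5:    # coded slice of an IDR picture: key frame
            return True
        if nal_unit_type == 1:    # non-IDR slice: P-frame (or B-frame)
            return False
        i += 1                    # skip SPS/PPS/SEI and keep scanning
    return False
```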
Disclosure of Invention
The disclosure provides a video stream processing method, a video stream processing device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a method for processing a video stream, including:
after a request terminal requests a video stream of a source terminal, detecting and determining that a video frame returned by the source terminal is a non-key frame;
and sending a key frame request to the source terminal.
According to another aspect of the present disclosure, there is provided a processing apparatus for a video stream, including:
the detection module is used for detecting whether a video frame returned by the source terminal is a non-key frame after the request terminal requests the video stream of the source terminal;
and the sending module is used for sending a key frame request to the source terminal if the video frame is a non-key frame.
According to still another aspect of the present disclosure, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the aspects and methods of any one of the possible implementations described above.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method of the aspects and any possible implementation described above.
According to yet another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of the aspects and any one of the possible implementations described above.
According to the technology disclosed by the invention, the video playing efficiency can be effectively improved.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is an exemplary system architecture diagram of the present embodiment;
FIG. 3 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 4 is a schematic diagram according to a third embodiment of the present disclosure;
FIG. 5 is another exemplary system architecture diagram of the present embodiment;
FIG. 6 is a schematic diagram according to a fourth embodiment of the present disclosure;
FIG. 7 is a schematic diagram according to a fifth embodiment of the present disclosure;
FIG. 8 is a block diagram of an electronic device for implementing a method of processing a video stream according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It will be apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments in this disclosure without inventive faculty, are intended to be within the scope of this disclosure.
It should be noted that, the terminal device in the embodiments of the present disclosure may include, but is not limited to, smart devices such as a mobile phone, a personal digital assistant (Personal Digital Assistant, PDA), a wireless handheld device, and a Tablet Computer (Tablet Computer); the display device may include, but is not limited to, a personal computer, a television, or the like having a display function.
In addition, the term "and/or" herein is merely an association relationship describing an association object, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure; as shown in fig. 1, the present embodiment provides a method for processing a video stream, which specifically includes the following steps:
s101, after a request terminal requests a video stream of a source terminal, detecting and determining that a video frame returned by the source terminal is a non-key frame;
s102, sending a key frame request to the source terminal, and ending.
The execution subject of the video stream processing method of the present embodiment may be a processing apparatus of a video stream, for example, the processing apparatus of a video stream may be a server. The key frame of the present embodiment may refer to a frame, such as an I frame, that can be independently decoded without relying on other frames in video decoding; non-key frames may refer to frames in video decoding that rely on previous frames to be decodable, such as P-frames.
The video stream processing method of this embodiment can be applied to a video conference that includes a plurality of users. Specifically, in the video conference, a plurality of users participate together through their respective terminals via a video stream processing apparatus such as a server. In a video conference, each user can see the video streams of other users through his or her terminal. To save bandwidth and avoid network congestion, each user may also choose to view only the video streams of some of the users, such as the host, the user who is currently speaking, or other users of interest. For example, when a user wants to view the video stream of a specified user, the user may click the avatar of the specified user on the interface of the video conferencing application to select that video stream for viewing. Alternatively, the video stream of the specified user can be selected for viewing through the settings of the video conferencing application.
In this embodiment, a terminal requesting to view a specified user may be referred to as a requesting terminal, and a terminal requested to view a video stream of its corresponding user may be referred to as a source terminal providing a video stream.
Specifically, when the requesting terminal requests the video stream of the source terminal, the requesting terminal specifically sends a video stream request carrying the source terminal identifier to a processing device of the video stream, such as a server, that is, indicates that the user of the requesting terminal wants to watch the video stream corresponding to the source terminal identifier. Meanwhile, the video stream request can also carry the identification of the request terminal.
After receiving the video stream request, the server updates the forwarding relationship of the video stream corresponding to the source terminal on a processing device of the video stream, such as a server side, so as to add the identifier of the request terminal into the forwarding relationship of the video stream corresponding to the source terminal, and thus, the video stream corresponding to the source terminal is forwarded to the request terminal. Thus, after that, the processing device of the video stream, such as a server, receives the video frame of the source terminal, and also needs to forward the video frame of the source terminal to the requesting terminal.
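Purely for illustration, with hypothetical names not defined by the disclosure, such a forwarding relationship and its update could be represented on the server as a simple mapping:

```python
# Illustrative only: the forwarding relationship as a mapping from a source
# terminal identifier to the set of terminal identifiers subscribed to it.
from collections import defaultdict

forwarding_relations: dict = defaultdict(set)   # source_id -> {terminal ids}

def on_video_stream_request(requester_id: str, source_id: str) -> None:
    """Add the requesting terminal to the source terminal's forwarding relation."""
    forwarding_relations[source_id].add(requester_id)

def terminals_to_forward(source_id: str):
    """Terminals that should receive video frames of this source terminal."""
    return forwarding_relations[source_id]
```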
However, if, after the requesting terminal requests the video stream of the source terminal, the first video frame received from the source terminal is a non-key frame such as a P-frame, that P-frame must be decoded with reference to the previous frame, and because the requesting terminal does not have the source terminal's earlier video frames, it cannot decode and view the video stream of the source terminal. The requesting terminal then has to initiate a request for a key frame such as an I-frame. Considering that online meetings are becoming larger and larger, participants frequently join, leave, and switch their attention to specified users during a meeting, and every such switch requires an I-frame request before decoding and playback can begin. However, due to the limitations of rate control and bandwidth, this reduces the frame rate of the picture and increases the quantization coefficient of the I-frame, so that the picture may appear stuttered and blurred.
Based on this, in order to overcome the above problem, in this embodiment, after the requesting terminal requests the video stream of the source terminal, the video stream processing device such as the server may detect whether the video frame returned by the source terminal is a non-key frame such as a P-frame. If it is, the video stream processing device such as the server sends a key frame request to the source terminal without returning the video frame to the requesting terminal, which reduces the transmission of invalid data between the video stream processing device such as the server and the requesting terminal and reduces the waste of bandwidth resources. If the video frame returned by the source terminal is a key frame such as an I-frame, the key frame is sent to the requesting terminal and the process ends.
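The following Python sketch (all data structures and callback names are hypothetical, not taken from the disclosure) illustrates this server-side decision; the key frame request it issues plays a role comparable to a PLI/FIR request in WebRTC-style media servers:

```python
# Hypothetical sketch: withhold non-key frames from subscribers that have not
# yet received a decodable key frame, and ask the source terminal for one instead.
from dataclasses import dataclass, field

@dataclass
class Subscription:
    requester_id: str
    got_first_key_frame: bool = False   # True once a key frame has been forwarded

@dataclass
class SourceStream:
    source_id: str
    subscribers: dict = field(default_factory=dict)   # requester_id -> Subscription

def on_video_frame(stream: SourceStream, frame_payload: bytes, frame_is_key: bool,
                   send_to_terminal, request_key_frame) -> None:
    need_key_frame = False
    for sub in stream.subscribers.values():
        if frame_is_key or sub.got_first_key_frame:
            send_to_terminal(sub.requester_id, frame_payload)
            if frame_is_key:
                sub.got_first_key_frame = True
        else:
            # The new subscriber could not decode this P-frame anyway, so do not
            # forward it; request a key frame from the source terminal instead.
            need_key_frame = True
    if need_key_frame:
        request_key_frame(stream.source_id)
```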
Fig. 2 is an exemplary system architecture diagram of the present embodiment. As shown in fig. 2, in this embodiment the video stream processing device is a server; specifically, a single server may be used, or a server group formed by a plurality of servers may be used. The requesting terminal of this embodiment may be a terminal used by any user in the video conference, and the source terminal may be a terminal used by any user other than the user of the requesting terminal. As shown in fig. 2, the requesting terminal first sends a request for the video stream of the source terminal to the server, and the server may then modify the forwarding relationship of the video stream of the source terminal. After receiving a video frame returned by the source terminal, the server performs processing according to steps S101 to S102 of this embodiment. This avoids the situation in which a P-frame is forwarded directly to the requesting terminal according to the forwarding relationship of the source terminal's video stream, the requesting terminal cannot decode it, and the requesting terminal then has to request an I-frame again, causing unnecessary resource waste.
With the technical solution of this embodiment, the video stream processing method detects, after the requesting terminal requests the video stream of the source terminal, whether the video frame returned by the source terminal is a non-key frame, and if so sends a key frame request to the source terminal; the non-key frame does not need to be sent to the requesting terminal. This reduces unnecessary resource waste, reduces network congestion, effectively reduces the delay of video playback on the requesting terminal side, and improves video playback efficiency in the video conference.
FIG. 3 is a schematic diagram according to a second embodiment of the present disclosure; as shown in fig. 3, the processing method of the video stream according to the present embodiment is further described in more detail by taking the example that the processing device of the video stream is a server on the basis of the technical solution of the embodiment shown in fig. 1. As shown in fig. 3, the method for processing a video stream in this embodiment may specifically include the following steps:
s301, a server receives a video stream request carrying a request terminal identifier and a source terminal identifier of a request terminal; step S302 is performed;
for example, the application scenario in this embodiment may be that the user requesting the terminal is a user newly joining in a video conference, where the user needs to watch a video stream corresponding to the conference source terminal. The video stream request can be initiated by a user of the request terminal by clicking the video conference application, or can be initiated by the video conference application control request terminal, namely, the video conference application detects a newly added terminal, and then the newly added terminal is controlled to actively initiate a request to check the video stream of the terminal corresponding to the conference host and the speaking user. The system architecture corresponding to this embodiment may be shown in fig. 2, and in this embodiment, the request terminal and the source terminal access the same server is taken as an example.
S302, the server updates the forwarding relation of the video stream corresponding to the source terminal based on the source terminal identification and the request terminal identification; step S303 is performed;
that is, the forwarding relationship of the video stream corresponding to the source terminal may be: source terminal identification→terminal identification 1, terminal identification 2, … …, terminal identification n. If the request terminal identifier is added to the forwarding relationship of the video stream corresponding to the source terminal after receiving the video stream request carrying the source terminal identifier sent by the request terminal, if the request terminal identifier is n+1, the forwarding relationship of the video stream corresponding to the source terminal may be updated as follows: source terminal identification→terminal identification 1, terminal identifications 2, … …, terminal identification n, request terminal identification n+1. Thus, after receiving the video frame of the source terminal, the subsequent server needs to forward to the terminal corresponding to the n+1 terminal identifiers corresponding to the forwarding relationship.
S303, the server receives a video frame carrying a source terminal identifier sent by a source terminal; step S304 is executed;
s304, the server detects whether the downlink of the streaming media between the server and the request terminal is created, and if so, the step S305 is executed; if not, initiating a request for creating the downlink of the streaming media to the requesting terminal, and ending with the request for creating the downlink of the streaming media.
S305, the server detects whether the video frame is a key frame; if it is a key frame, the server sends the key frame to the requesting terminal so that the requesting terminal decodes it according to the key frame and starts watching the video, and the process ends. If it is a non-key frame, step S306 is executed;
alternatively, this step S305 may be performed simultaneously with the above step S304, or the order of operations may be interchanged.
S306, the server detects whether a key frame of the source terminal has been received within a preset time period or whether a key frame request has been sent to the source terminal within the preset time period; if neither has occurred, step S307 is executed; if it is detected that a key frame request has been sent to the source terminal within the preset time period, step S312 is executed; if it is detected that a key frame of the source terminal has been received within the preset time period, step S313 is executed;
s307, the server sends a key frame request to the source terminal; simultaneously recording a first moment for sending a key frame request to a source terminal, and simultaneously starting a first timer with a first preset duration; step S308 is performed;
the first timer set in the step can automatically fail when the duration reaches the first preset duration after the first time. In addition, the server may also simultaneously create a buffer of the key frame request, so that after the first timer expires, the key frame request is retrieved from the buffer and sent again.
Correspondingly, after receiving the key frame request, the source terminal encodes the key frame and returns the key frame to the server.
S308, the server detects whether the first timer has failed; if not, step S309 is executed; if it has failed, the method returns to step S307, and a key frame request is sent to the source terminal again.
Resending the key frame request deals with the case in which a communication failure between the server and the source terminal causes the previous key frame request to be lost, so that the source terminal may never have received it; when the first timer fails, the key frame request can therefore be sent again.
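A minimal asyncio sketch of steps S307 and S308 follows; the function names, parameters and the default duration are assumptions of this sketch rather than part of the disclosed method:

```python
# Hypothetical sketch: resend the key frame request whenever the first timer
# fails before a key frame arrives from the source terminal.
import asyncio
import time

async def request_key_frame_with_retry(source_id: str,
                                        send_key_frame_request,
                                        key_frame_received: asyncio.Event,
                                        first_preset_duration: float = 1.0):
    request_moments = []                           # "first moment" of each request
    while not key_frame_received.is_set():
        request_moments.append(time.monotonic())
        send_key_frame_request(source_id)          # S307: send the key frame request
        try:
            # S308: wait for the key frame until the first timer fails.
            await asyncio.wait_for(key_frame_received.wait(), first_preset_duration)
        except asyncio.TimeoutError:
            continue                               # timer failed: send the request again
    second_moment = time.monotonic()               # moment the key frame was received
    return request_moments, second_moment
```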
S309, the server detects whether a key frame carrying a source terminal identifier returned by the source terminal is received; if so, executing step S310; if not, returning to the step S308 to continue detection;
in this step, the method may include that the server receives a video frame carrying the source terminal identifier returned by the source terminal, and then determines that the video frame is a key frame through detection.
S310, the server records a second moment when the key frame is received, and controls the first timer to fail; step S311 is performed;
and S311, the server sends the key frame to the request terminal according to the forwarding relation of the video stream corresponding to the source terminal, so that the request terminal decodes according to the key frame, starts to watch the video and ends.
It should be noted that the server may also send the key frame to the other corresponding terminals according to the forwarding relationship of the video stream.
S312, the server temporarily does not send a key frame request to the source terminal, waits for the source terminal to return a key frame, and the process ends.
S313, the server detects whether a buffer memory of a key frame request and a corresponding second timer are set; if not, go to step S314; if yes, go to step S312;
s314, the server creates a buffer of the key frame request, and simultaneously starts a second timer with a second preset duration; step S315 is performed;
the second preset duration is used for identifying the duration of the current moment from the third moment of last time of receiving the key frame of the source terminal;
s315, the server detects whether the second timer is invalid, and if so, the step S316 is executed; if not, go to step S317;
s316, the server acquires a key frame request from the cache; returning to step S307, the key frame request is sent again to the source terminal.
S317, the server detects whether a key frame returned by the source terminal is received; if so, recording a fourth moment when the key frame is received; and controlling the second timer to fail; returning to step S311; if not, the process returns to step S315 to continue the detection.
Similarly, the key frame in this embodiment may be an I frame, and the non-key frame may be a P frame.
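The cache-and-second-timer behaviour of steps S313 to S317 above can be pictured with the following hypothetical sketch, in which the class and attribute names are illustrative only: a key frame request arriving shortly after a key frame was already received is cached, and it is actually resent only once the second preset duration has elapsed without another key frame.

```python
# Hypothetical sketch of the cache-and-second-timer logic (steps S313-S317).
import time

class KeyFrameRequestGate:
    def __init__(self, second_preset_duration: float = 2.0):
        self.second_preset_duration = second_preset_duration
        self.last_key_frame_moment = None          # the "third moment"
        self.cached_request = None

    def on_key_frame_received(self) -> None:
        self.last_key_frame_moment = time.monotonic()
        self.cached_request = None                 # second timer effectively cancelled

    def submit_request(self, request, send_to_source) -> None:
        now = time.monotonic()
        recently = (self.last_key_frame_moment is not None and
                    now - self.last_key_frame_moment < self.second_preset_duration)
        if recently:
            self.cached_request = request          # cache the request and wait
        else:
            send_to_source(request)                # no recent key frame: send now

    def poll_second_timer(self, send_to_source) -> None:
        """Call periodically; resends the cached request once the second timer fails."""
        if (self.cached_request is not None
                and self.last_key_frame_moment is not None
                and time.monotonic() - self.last_key_frame_moment
                        >= self.second_preset_duration):
            send_to_source(self.cached_request)    # timer failed without a key frame
            self.cached_request = None
```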
With the above technical solution, key frame detection is performed by the server, so that not all video frames need to be sent to the requesting terminal, and the operation in which the requesting terminal requests a key frame again after detecting a non-key frame is effectively reduced. This effectively reduces unnecessary resource waste, reduces network congestion, alleviates stuttering and blurring when each terminal watches the video, effectively reduces the delay of video playback on the requesting terminal side, and improves video playback efficiency in the video conference.
FIG. 4 is a schematic diagram according to a third embodiment of the present disclosure; as shown in fig. 4, the processing method of the video stream according to the present embodiment is further described in more detail by taking the example that the processing device of the video stream is a server cluster on the basis of the technical solution of the embodiment shown in fig. 1. As shown in fig. 4, the method for processing a video stream in this embodiment may specifically include the following steps:
s401, an access server 1 receives a video stream request carrying a request terminal identifier and a source terminal identifier of a request terminal; step S402 is performed;
The application scenario of this embodiment is the same as that of the embodiment shown in fig. 3 described above. Unlike the embodiment of fig. 3, which uses the system architecture of fig. 2 in which the requesting terminal and the source terminal access the same server, this embodiment takes as an example a system architecture in which the requesting terminal and the source terminal access different servers. Fig. 5 is another exemplary system architecture diagram of the present embodiment; as shown in fig. 5, the requesting terminal accesses server 1 and the source terminal accesses server 2. That is, the video stream processing apparatus in this embodiment is a server cluster including server 1 and server 2.
S402, the server 1 detects whether the streaming-media downlink between the server 1 and the requesting terminal has been created, and if so, step S403 is executed; if not, the server 1 initiates a request to the requesting terminal to create the streaming-media downlink, and the process ends after that request is sent.
S403, the server 1 sends a video stream request carrying a request terminal identifier, an identifier of the server 1 and a source terminal identifier to the server 2; step S404 is performed;
s404, the server 2 updates the forwarding relation of the video stream corresponding to the source terminal based on the source terminal identification and the request terminal identification; step S405 is performed;
unlike the forwarding relationship of the video stream in the embodiment shown in fig. 3, in this embodiment, since different terminals correspond to different access servers, the identifier of the access server of each terminal may be further identified in the forwarding relationship of the video stream, for example, may be identified as: (source terminal identification: access server identification 0) → (terminal identification 1: access server identification 1), (terminal identification 2: access server identification 2), … …, (terminal identification n: access server identification n).
S405, the server 2 receives a video frame carrying a source terminal identifier sent by a source terminal; step S406 is performed;
s406, the server 2 sends a video frame carrying the source terminal identification to the server 1 based on the forwarding relation of the video stream corresponding to the source terminal; step S407 is performed;
s407, the server 1 detects whether the video frame is a key frame, if so, the step S408 is executed; if the frame is a non-key frame, step S409 is executed;
it should be noted that, whether the downlink of the streaming media between the detection of step S402 and the requesting terminal is created or not may be performed after the detection and determination of step S407 that the video frame is a non-key frame, and after the downlink of the streaming media has been created, step S409 may be performed.
S408, the server 1 sends the key frame to the request terminal so that the request terminal decodes according to the key frame, starts watching the video and ends.
S409, the server 1 sends a key frame request carrying a request terminal identifier, a server 1 identifier and a source terminal identifier to the server 2; step S410 is performed;
s410, the server 2 detects whether a key frame of the source terminal is received or not within a preset time period or whether a key frame request is sent to the source terminal; if none of them is present, step S411 is performed; if it is detected that the key frame request is sent to the source terminal within the preset time period, step S416 is executed; if it is detected that the key frame of the source terminal is received within the preset time period, step S417 is executed;
S411, the server 2 sends a key frame request to the source terminal; simultaneously recording a first moment for sending a key frame request to a source terminal, and simultaneously starting a first timer with a first preset duration; step S412 is performed;
the first timer set in the step can automatically fail when the duration reaches the first preset duration after the first time. In addition, the server may also simultaneously create a buffer of the key frame request, so that after the first timer expires, the key frame request is retrieved from the buffer and sent again.
Correspondingly, after receiving the key frame request, the source terminal encodes the key frame and returns the key frame to the server 2.
S412, the server 2 detects whether the first timer has failed; if not, step S413 is executed; if it has failed, the method returns to step S411, and a key frame request is sent to the source terminal again.
Resending the key frame request deals with the case in which a communication failure between the server 2 and the source terminal causes the previous key frame request to be lost, so that the source terminal may never have received it; when the first timer fails, the key frame request can therefore be sent again.
S413, the server 2 detects whether a video frame carrying a source terminal identifier returned by the source terminal is received; if so, execute step S414; if not, returning to the step S412 to continue detection;
S414, the server 2 records the second moment of receiving the video frame and controls the first timer to fail; step S415 is performed;
s415, the server 2 sends a video frame carrying a source terminal identifier and a request terminal identifier to the server 1 according to the forwarding relation of the video stream corresponding to the source terminal; step S407 is performed;
and S416, the server 2 temporarily does not send a key frame request to the source terminal, waits for the source terminal to return a key frame, and the process ends.
S417, the server 2 detects whether a buffer memory of a key frame request and a corresponding second timer are set; if not, go to step S418; if yes, go to step S416;
s418, the server 2 creates a buffer memory of the key frame request and starts a second timer with a second preset duration; step S419 is performed;
the second preset duration is used for identifying the duration of the current moment from the third moment of last time of receiving the key frame of the source terminal;
s419, the server 2 detects whether the second timer is invalid, and if so, the step S420 is executed; if not, executing step S423;
s420, the server 2 acquires a key frame request from the cache; returning to step S411, sending a key frame request to the source terminal again;
s421, the server 2 detects whether a video frame carrying a source terminal identifier returned by a source terminal is received; if so, go to step S422; if not, returning to the step S419 to continue detection;
S422, the server 2 records the fourth moment of receiving the video frame and controls the second timer to be invalid; step S415 is performed;
similarly, the key frame in this embodiment may be an I frame, and the non-key frame may be a P frame.
With the above technical solution, key frame detection is performed by the access server corresponding to the requesting terminal, so that not all video frames need to be sent to the requesting terminal, and the operation in which the requesting terminal requests a key frame again after detecting a non-key frame is effectively reduced. This effectively reduces unnecessary resource waste, reduces network congestion, effectively reduces the delay of video playback on the requesting terminal side, and improves video playback efficiency in the video conference.
FIG. 6 is a schematic diagram according to a fourth embodiment of the present disclosure; as shown in fig. 6, the present embodiment provides a processing apparatus 600 for video stream, including:
the detecting module 601 is configured to detect whether a video frame returned by the source terminal is a non-key frame after the video stream of the source terminal is requested by the requesting terminal;
and the sending module 602 is configured to send a key frame request to the source terminal if yes.
The implementation principle and technical effect of video stream processing using the above modules of the video stream processing device 600 of this embodiment are the same as those of the related method embodiments above; reference may be made to the detailed description of the related method embodiments, which is not repeated here.
FIG. 7 is a schematic diagram according to a fifth embodiment of the present disclosure; as shown in fig. 7, the present embodiment further describes the technical solution of the present application in more detail on the basis of the technical solution of the embodiment shown in fig. 6.
In the processing device 600 for video stream of the present embodiment, the detection module 601 is further configured to:
and detecting and determining that the key frame of the source terminal is not received or the key frame request is not sent to the source terminal within a preset time period.
Further optionally, the processing device 600 for video stream of the present embodiment further includes a recording module 603 and a control module 604;
a recording module 603, configured to record a first time when a key frame request is sent to a source terminal;
a control module 604, configured to simultaneously start a first timer for a first preset duration;
the detection module 601 is further configured to detect whether the first timer fails;
the sending module 602 is further configured to resend the key frame request to the source terminal if the first timer fails.
Further optionally, the detecting module 601 is further configured to detect whether a key frame returned by the source terminal is received after the first timer has not expired;
the recording module 603 is further configured to record a second time when the key frame is received if the key frame is received;
The control module 604 is further configured to control the first timer to fail.
Further optionally, the sending module 602 is further configured to send the key frame to the requesting terminal after receiving the key frame returned by the source terminal.
As a further alternative, as shown in fig. 7, in the processing apparatus 600 for a video stream of the present embodiment, a creating module 605 and an obtaining module 606 are further included;
the detection module 601 is further configured to detect whether a buffer of a key frame request and a corresponding second timer are set if a key frame of the source terminal is received within a preset period of time;
a creating module 605, configured to create a buffer of key frame requests if not;
the control module 604 is further configured to simultaneously start a second timer for a second preset duration; the second preset duration is used for identifying the duration of the current moment from the third moment of last time of receiving the key frame of the source terminal;
the detection module 601 is further configured to detect whether the second timer is invalid;
an obtaining module 606, configured to obtain the key frame request from the cache if the second timer fails;
and a sending module 602, configured to send a key frame request to the source terminal.
Further optionally, the detecting module 601 is further configured to detect whether a key frame returned by the source terminal is received after the second timer has not expired;
A recording module 603, configured to record a fourth time when the key frame is received if the key frame is received;
the control module 604 is further configured to control the second timer to fail.
Further optionally, the detection module 601 is further configured to detect and determine that a downlink of the streaming media with the requesting terminal has been created.
As described in the related method embodiments, the processing apparatus 600 for video streaming of the present embodiment may be a server or a server cluster.
The implementation principle and technical effect of video stream processing using the above modules of the video stream processing device 600 of this embodiment are the same as those of the related method embodiments above; reference may be made to the detailed description of the related method embodiments, which is not repeated here.
In the technical solution of the present disclosure, the acquisition, storage, application and the like of the personal information of involved users all comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 8 illustrates a schematic block diagram of an example electronic device 800 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the apparatus 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Various components in device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the respective methods and processes described above, for example, a processing method of a video stream. For example, in some embodiments, the method of processing a video stream may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 800 via ROM 802 and/or communication unit 809. When a computer program is loaded into RAM 803 and executed by computing unit 801, one or more steps of the video stream processing method described above may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the processing method of the video stream in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs, where the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.
Claims (18)
1. A method for processing a video stream, an execution subject of the method for processing a video stream being a server, the method being applied in a scenario in which a request terminal newly joins a video conference, the method comprising:
after the request terminal joins the video conference, receiving a video stream request carrying a request terminal identifier and a source terminal identifier initiated by the request terminal, wherein the video stream request is initiated by the request terminal under the control of the video conferencing application after the video conferencing application detects that the request terminal is a newly joined terminal; detecting and determining that a video frame returned by the source terminal is a non-key frame; determining that the video frame does not need to be returned to the request terminal; wherein the video frame is a video frame received before the server receives, for the first time after the request terminal sends the video stream request to the server, a key frame returned by the source terminal;
Sending a key frame request to the source terminal;
the method further comprises the steps of:
after receiving the video stream request sent by the request terminal, the server updates the forwarding relation of the video stream corresponding to the source terminal so as to add the identification of the request terminal into the forwarding relation of the video stream corresponding to the source terminal; the forwarding relation comprises correspondence between the source terminal identification and identifications of a plurality of other terminals requesting the video stream of the source terminal.
2. The method of claim 1, wherein prior to sending a key frame request to the source terminal, the method further comprises:
and detecting and determining that the key frame of the source terminal is not received in a preset time period and the key frame request is not sent to the source terminal in the preset time period.
3. The method of claim 2, wherein the method further comprises:
recording a first moment of sending the key frame request to the source terminal, and simultaneously starting a first timer with a first preset duration;
detecting whether the first timer fails;
and if the first timer fails, sending the key frame request to the source terminal again.
4. A method according to claim 3, wherein if it is detected that the first timer has not expired, the method further comprises:
detecting whether a key frame returned by the source terminal is received or not;
if so, recording a second moment when the key frame is received;
and controlling the first timer to be invalid.
5. The method of claim 4, wherein after receiving the key frame returned by the source terminal, the method further comprises:
and sending the key frame to the request terminal.
6. The method according to any one of claims 2-5, wherein if the key frame of the source terminal is received within the preset time period, the method further comprises:
detecting whether a buffer memory of a key frame request and a corresponding second timer are set;
if not, creating a buffer memory of the key frame request, and simultaneously starting a second timer with a second preset duration; the second preset duration is used for identifying the duration of the current moment from the third moment when the key frame of the source terminal is last received;
detecting whether the second timer is invalid;
if the second timer fails, acquiring the key frame request from the cache;
and sending the key frame request to the source terminal.
7. The method of claim 6, wherein if the second timer is detected as not being expired, the method further comprises:
detecting whether a key frame returned by the source terminal is received or not;
if so, recording a fourth moment when the key frame is received;
and controlling the second timer to be invalid.
8. The method according to any of claims 1-7, wherein prior to sending a key frame request to the source terminal, the method further comprises:
a downlink of streaming media with the requesting terminal is detected and determined to have been created.
9. A video stream processing apparatus, wherein the apparatus is a server applied in a scenario in which a request terminal newly joins a video conference, the apparatus comprising:
the receiving module is used for receiving, after the request terminal joins the video conference, a video stream request carrying a request terminal identifier and a source terminal identifier initiated by the request terminal, wherein the video stream request is initiated by the request terminal under the control of the video conferencing application after the video conferencing application detects that the request terminal is a newly joined terminal;
the detection module is used for detecting and determining that a video frame returned by the source terminal is a non-key frame after the video stream of the source terminal is requested by the request terminal; determining that the video frame does not need to be returned to the request terminal; the video frames are video frames before the server receives the key frames returned by the source terminal for the first time after the request terminal sends the video stream request to the server;
A sending module, configured to send a key frame request to the source terminal;
the detection module is further configured to update a forwarding relationship of the video stream corresponding to the source terminal after the server receives the video stream request sent by the request terminal, so as to add the identifier of the request terminal to the forwarding relationship of the video stream corresponding to the source terminal; the forwarding relation comprises correspondence between the source terminal identification and identifications of a plurality of other terminals requesting the video stream of the source terminal.
10. The apparatus of claim 9, wherein the detection module is further configured to:
and detecting and determining that the key frame of the source terminal is not received in a preset time period and the key frame request is not sent to the source terminal in the preset time period.
11. The apparatus of claim 10, further comprising a recording module and a control module;
the recording module is used for recording a first moment when the key frame request is sent to the source terminal;
the control module is used for simultaneously starting a first timer with a first preset duration;
the detection module is also used for detecting whether the first timer fails;
And the sending module is further used for sending the key frame request to the source terminal again if the first timer fails.
12. The apparatus of claim 11, wherein;
the detection module is further used for detecting whether a key frame returned by the source terminal is received after detecting that the first timer is not invalid;
the recording module is further used for recording a second moment when the key frame is received if the key frame is received;
the control module is also used for controlling the first timer to fail.
13. The apparatus of claim 12, wherein the sending module is further configured to send the key frame to the requesting terminal after receiving the key frame returned by the source terminal.
14. The apparatus of any of claims 10-13, further comprising a creation module and an acquisition module;
the detection module is further configured to detect whether a buffer of a key frame request and a corresponding second timer are set if the key frame of the source terminal is received within the preset time period;
the creation module is used for creating the buffer of the key frame request if the buffer and the corresponding second timer are not set;
the control module is also used for simultaneously starting a second timer with a second preset duration; the second preset duration is used for identifying the duration of the current moment from the third moment when the key frame of the source terminal is last received;
The detection module is further used for detecting whether the second timer is invalid;
the acquisition module is used for acquiring the key frame request from the cache if the second timer fails;
and the sending module is used for sending the key frame request to the source terminal.
15. The apparatus of claim 14, wherein:
the detection module is further configured to detect whether a key frame returned by the source terminal is received after the second timer is not failed;
the recording module is used for recording a fourth moment when the key frame is received if the key frame is received;
the control module is also used for controlling the second timer to fail.
16. The apparatus according to any of claims 9-15, wherein the detection module is further configured to detect and determine that a downlink of streaming media with the requesting terminal has been created.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111060659.2A CN113965714B (en) | 2021-09-10 | 2021-09-10 | Video stream processing method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111060659.2A CN113965714B (en) | 2021-09-10 | 2021-09-10 | Video stream processing method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113965714A (en) | 2022-01-21
CN113965714B (en) | 2023-06-23
Family
ID=79461374
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111060659.2A Active CN113965714B (en) | 2021-09-10 | 2021-09-10 | Video stream processing method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113965714B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108259815A (en) * | 2018-03-20 | 2018-07-06 | 广州视源电子科技股份有限公司 | Video key frame forwarding method and device and video live broadcast system |
CN111010603A (en) * | 2019-12-18 | 2020-04-14 | 浙江大华技术股份有限公司 | Video caching and forwarding processing method and device |
CN113242446A (en) * | 2021-04-30 | 2021-08-10 | 北京字节跳动网络技术有限公司 | Video frame caching method, video frame forwarding method, communication server and program product |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105519121B (en) * | 2014-06-27 | 2018-11-23 | 北京新媒传信科技有限公司 | A kind of method and media server of key frame routing |
CN109996088A (en) * | 2017-12-29 | 2019-07-09 | 阿里巴巴集团控股有限公司 | A kind of live data processing method and processing device |
CN113132807B (en) * | 2019-12-30 | 2023-04-07 | 成都鼎桥通信技术有限公司 | Video-based key frame request method, device, equipment and storage medium |
CN111240984A (en) * | 2020-01-15 | 2020-06-05 | 中国平安财产保险股份有限公司 | Abnormal page identification method and device, computer equipment and storage medium |
CN112367527A (en) * | 2020-10-28 | 2021-02-12 | 广州市网星信息技术有限公司 | Method, device and equipment for generating transport stream file and storage medium |
- 2021-09-10: CN application CN202111060659.2A granted as patent CN113965714B (status: Active)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108259815A (en) * | 2018-03-20 | 2018-07-06 | 广州视源电子科技股份有限公司 | Video key frame forwarding method and device and video live broadcast system |
CN111010603A (en) * | 2019-12-18 | 2020-04-14 | 浙江大华技术股份有限公司 | Video caching and forwarding processing method and device |
CN113242446A (en) * | 2021-04-30 | 2021-08-10 | 北京字节跳动网络技术有限公司 | Video frame caching method, video frame forwarding method, communication server and program product |
Also Published As
Publication number | Publication date |
---|---|
CN113965714A (en) | 2022-01-21 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |