CN113596556A

CN113596556A - Video transmission method, server and storage medium

Info

Publication number: CN113596556A
Application number: CN202110753126.6A
Authority: CN
Inventors: 汪维
Original assignee: China Mobile Communications Group Co Ltd; MIGU Interactive Entertainment Co Ltd; MIGU Culture Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; MIGU Interactive Entertainment Co Ltd; MIGU Culture Technology Co Ltd
Priority date: 2021-07-02
Filing date: 2021-07-02
Publication date: 2021-11-02
Anticipated expiration: 2041-07-02
Also published as: CN113596556B

Abstract

The invention discloses a video transmission method, a server and a computer readable storage medium, wherein the method comprises the following steps: determining an original key frame according to the similarity between video frames of a video to be transmitted; when a preset condition is met, generating a smooth key frame according to adjacent original key frames; and generating a target frame transmission sequence according to the original key frame and the smooth key frame, and sequentially transmitting the video frames in the target frame sequence. The invention aims to achieve the effect of avoiding the picture pause phenomenon of the client.

Description

Video transmission method, server and storage medium

Technical Field

The present invention relates to the field of communications technologies, and in particular, to a video transmission method, a server, and a computer-readable storage medium.

Background

With the development of technology, the functions of the mobile terminal become more and more powerful. Currently, many mobile terminals have a cloud game function. In a cloud game implementation, the cloud end needs to transmit audio and video to the mobile terminal to control the display content of the mobile terminal.

In the related art, to reduce video transmission overhead, the key frames in a video to be transmitted are generally determined according to the similarity between video frames; the non-key video frames are then discarded, and only the key frames and the number of discarded frames are transmitted to the terminal. When a static animation is transmitted, the video frames may remain highly similar for a long time, so the video frames of a long time period may all be discarded, which causes the video to stutter.

The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.

Disclosure of Invention

The present invention is directed to a video transmission method, a server and a computer readable storage medium, which are capable of avoiding the occurrence of a picture pause phenomenon at a client.

In order to achieve the above object, the present invention provides a video transmission method, including the steps of:

determining an original key frame according to the similarity between video frames of a video to be transmitted;

when a preset condition is met, generating a smooth key frame according to adjacent original key frames;

and generating a target frame transmission sequence according to the original key frame and the smooth key frame, and sequentially transmitting the video frames in the target frame sequence.

Optionally, the step of generating a smooth key frame according to the adjacent original key frames includes:

determining a frame background corresponding to an original key frame;

determining a first foreground image corresponding to a first original key frame and a second foreground image corresponding to a second original key frame based on the frame background, wherein the first original key frame is adjacent to the second original key frame;

determining a smooth foreground map according to the first foreground map and the second foreground map;

and generating the smooth key frame according to the frame background and the smooth foreground image.

Optionally, the preset condition includes at least one of:

the frame interval duration between adjacent original key frames is greater than the preset duration;

and receiving a control instruction for raising the frame rate.

Optionally, after the step of determining the original key frame corresponding to the video to be transmitted according to the similarity between the video frames of the video to be transmitted, the method further includes:

when the preset condition is not met, determining a frame tag corresponding to each original key frame;

and sending the video to be transmitted to a client based on the frame tag, wherein when the video to be transmitted is sent, the code stream corresponding to the original key frame comprises the frame tag and frame data, and the code stream corresponding to the non-original key frame comprises the frame tag and a frameless data identifier.

Optionally, the video transmission method further includes:

caching a transmission frame sequence corresponding to the video to be transmitted;

and feeding back the transmission frame sequence when receiving the transmission request corresponding to the video to be transmitted again.

In addition, the present invention also provides another video transmission method, which includes the steps of:

receiving a video transmission code stream of a server;

determining a frame tag corresponding to a current frame based on the transmission code stream, and determining whether the code stream comprises frame data;

and when the code stream does not comprise the frame data, determining the current corresponding frame data in pre-stored data based on the frame tag.

Optionally, after the steps of determining a frame tag corresponding to the current frame based on the transmission code stream and determining whether the code stream includes frame data, the method further includes:

and when the code stream comprises the frame data, the frame tag and the frame data are stored in an associated manner.

In addition, to achieve the above object, the present invention further provides a server, which includes a memory, a processor, and a video transmission program stored on the memory and executable on the processor, wherein the video transmission program, when executed by the processor, implements the steps of the video transmission method as described above.

In addition, to achieve the above object, the present invention also provides a server, including:

the determining module is used for determining an original key frame according to the similarity between video frames of the video to be transmitted;

the generating module is used for generating a smooth key frame according to the adjacent original key frame when the preset condition is met;

and the sending module is used for generating a target frame transmission sequence according to the original key frame and the smooth key frame and sequentially sending the video frames in the target frame sequence.

Further, to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a video transmission program which, when executed by a processor, implements the steps of the video transmission method as described above.

According to the video transmission method, the server and the computer readable storage medium provided by the embodiment of the invention, original key frames are determined according to the similarity between the video frames of a video to be transmitted; when a preset condition is met, smooth key frames are generated according to adjacent original key frames; a target transmission frame sequence is generated according to the original key frames and the smooth key frames; and the video frames in the target frame sequence are sequentially sent to a client. The smooth key frames can be inserted at the corresponding positions in the initial transmission frame sequence to generate the target transmission frame sequence, and the video frames in the target frame sequence are then sent to the client in order. After receiving the smooth key frames, the client can render the transition pictures between the original key frames based on them, which makes the display output of the client smoother. Stuttering during video playback at the client is thereby avoided, and the smoothness of the client's video display is improved.

Drawings

Fig. 1 is a schematic terminal structure diagram of a hardware operating environment according to an embodiment of the present invention;

Fig. 2 is a flowchart illustrating a video transmission method according to an embodiment of the present invention;

Fig. 3 is a schematic flow chart of an alternative embodiment of the video transmission method of the present invention;

Fig. 4 is a flowchart illustrating a video transmission method according to another embodiment of the present invention;

Fig. 5 is a schematic block diagram of a server according to an embodiment of the present invention.

The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

As shown in fig. 1, fig. 1 is a schematic terminal structure diagram of a hardware operating environment according to an embodiment of the present invention.

The terminal of the embodiment of the invention can be a server or a computer.

As shown in fig. 1, the terminal may include: a processor 1001, such as a CPU, a network interface 1003, a memory 1004, a communication bus 1002. Wherein a communication bus 1002 is used to enable connective communication between these components. The network interface 1003 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 1004 may be a high-speed RAM memory or a non-volatile memory (e.g., a disk memory). The memory 1004 may alternatively be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.

As shown in fig. 1, the memory 1004, which is a kind of computer storage medium, may include therein an operating system, a network communication module, and a video transmission program.

In the terminal shown in fig. 1, the processor 1001 may be configured to call a video transmission program stored in the memory 1004 and perform the following operations:

Further, the processor 1001 may call the video transmission program stored in the memory 1004, and also perform the following operations:

determining a frame background corresponding to an original key frame;

and receiving a control instruction for raising the frame rate.

Further, the processor 1001 may be configured to call a video transmission program stored in the memory 1004 and perform the following operations:

receiving a video transmission code stream of a server;

In the related art, to reduce video transmission overhead, the key frames in a video to be transmitted are generally determined according to the similarity between video frames; the non-key video frames are then discarded, and only the key frames and the number of discarded frames are transmitted to the terminal. When a static animation is transmitted, the video frames may remain highly similar for a long time, so the video frames of a long time period may all be discarded, which causes the video to stutter.

To solve the above defects in the related art, an embodiment of the present invention provides a video transmission method that generates a smooth key frame between two adjacent key frames based on those key frames, so that the frame sequence received by the client plays back more smoothly and video stuttering is avoided.

Referring to fig. 2, in an embodiment of the video transmission method of the present invention, the video transmission method includes the following steps:

step S10, determining an original key frame corresponding to the video to be transmitted according to the similarity between the video frames of the video to be transmitted;

step S20, when a preset condition is met, generating a smooth key frame according to the adjacent original key frame;

and step S30, generating a target frame transmission sequence according to the original key frame and the smooth key frame, and sequentially sending video frames in the target frame sequence to the client.

In this embodiment, after determining the video to be transmitted, the server may determine the original key frames corresponding to the video according to the similarity between its video frames. The video to be transmitted refers to video data that the server is preparing to send to the client. The video data may be composed of a plurality of video picture frames (i.e., the video frames). The original key frames are the one or more video frames in the video to be transmitted that have low similarity to the frames preceding them.

Illustratively, suppose the video to be transmitted consists of video frame 1, video frame 2, video frame 3, video frame 4, …, video frame n in sequence. The similarity between video frame 1 and video frame 2 is obtained first. When this similarity is greater than the preset similarity, video frame 1 is taken as an original key frame and video frame 2 as a non-original key frame. The similarity between video frame 1 and video frame 3 is then determined. If this similarity is greater than the preset similarity, video frame 3 is also taken as a non-original key frame, and the similarity between video frame 1 and video frame 4 is obtained to determine whether video frame 4 is an original key frame. If instead the similarity between video frame 1 and video frame 3 is less than or equal to the preset similarity, video frame 3 is determined to be an original key frame, and the similarity between video frame 3 and video frame 4 is obtained next. This continues until video frame n has been processed, that is, until all video frames of the video to be transmitted have been processed, so that at least one original key frame corresponding to the video to be transmitted is determined. The preset similarity is a user-defined value; it may be set to any value between 90% and 95%, for example, and is preferably set to 92%.
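The comparison procedure described above can be sketched as follows. This is a minimal illustration only, not the implementation of the embodiment: the function names, the representation of a frame as a flat list of pixel values, and the `pixel_match_ratio` metric are assumptions introduced here, since the embodiment does not specify how the similarity is computed. The sketch also assumes that the first frame always serves as the first reference frame.

```python
from typing import Callable, List, Sequence

Pixels = Sequence[int]  # hypothetical frame representation: flat list of pixel values


def pixel_match_ratio(a: Pixels, b: Pixels) -> float:
    """Hypothetical similarity metric: fraction of positions with equal pixels."""
    same = sum(1 for p, q in zip(a, b) if p == q)
    return same / max(len(a), 1)


def select_original_key_frames(
    frames: Sequence[Pixels],
    similarity: Callable[[Pixels, Pixels], float] = pixel_match_ratio,
    preset_similarity: float = 0.92,
) -> List[int]:
    """Return the 0-based indices of the original key frames."""
    if not frames:
        return []
    key_indices = [0]      # assumption: the first frame is the first reference
    reference = frames[0]
    for i in range(1, len(frames)):
        # A frame that is NOT more similar than the threshold becomes the
        # next original key frame and the new reference for later frames.
        if similarity(reference, frames[i]) <= preset_similarity:
            key_indices.append(i)
            reference = frames[i]
    return key_indices
```

Each frame is compared against the most recent original key frame rather than against its immediate predecessor, which matches the example in which video frame 4 is compared with video frame 1 while video frames 2 and 3 are non-original key frames.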

Optionally, in an embodiment, after the original key frames corresponding to a video to be transmitted are determined according to the similarity between its video frames, the video is sent to a client as follows: if the video frame to be sent is an original key frame, the frame tag and frame data corresponding to the video frame are sent; if the video frame to be sent is a non-original key frame, the frame tag and a frameless data identifier are sent.

Illustratively, suppose the sequence of frames to be transmitted for the video is video frame 1, video frame 2, video frame 3, video frame 4, …, video frame n, where video frame 1 is an original key frame, video frames 2 and 3 are non-original key frames, video frame 4 is an original key frame, …, and video frame n is an original key frame. During transmission, when video frame 1 is transmitted, frame tag 1 corresponding to video frame 1 and the specific frame data of video frame 1 are sent. When video frames 2 and 3 are transmitted, the frame tags corresponding to video frames 2 and 3 and the frameless data identifier are sent; the frame tags of video frames 2 and 3 are the same as that of video frame 1. Other original key frames and non-original key frames are transmitted in the same manner. On receiving the code stream sent by the server, the client can parse it to determine the frame tag corresponding to the current frame and the content to be displayed. When the current frame is an original key frame, the frame data is parsed further and the content corresponding to the frame data is used as the content to be displayed. When the current frame is a non-original key frame, the frame data of the original key frame that has the same frame tag is obtained according to the frame tag and used as the display content of the current frame. Because non-original key frames occupy very little transmission overhead, the transmission mode provided by this embodiment reduces the transmission overhead of the video data.
It can be understood that, in the present solution, each video frame has transmission data, so that the present solution can be well adapted to a scene requiring uninterrupted transmission of video frames. For example, in a cloud game scene, since video frames need to be sent continuously to transmit game pictures, the related scheme of discarding non-key frames cannot be applied to the game transmission scene.
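The tagging scheme above can be sketched on the server side as follows. The `Packet` type, the use of `None` to stand in for the frameless data identifier, and the choice of the 1-based key frame index as the frame tag are all assumptions made for illustration; the embodiment does not define the actual code stream layout.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Packet:
    frame_tag: int               # tag shared by a key frame and the frames that reuse it
    frame_data: Optional[bytes]  # None stands in for the "frameless data identifier"


def build_initial_sequence(
    frames: Sequence[bytes], key_indices: Sequence[int]
) -> List[Packet]:
    """Build one packet per frame; only original key frames carry frame data."""
    key_set = set(key_indices)
    packets: List[Packet] = []
    current_tag: Optional[int] = None
    for i, data in enumerate(frames):
        if i in key_set:
            current_tag = i + 1  # hypothetical tag scheme: 1-based key frame number
            packets.append(Packet(current_tag, data))
        else:
            if current_tag is None:
                raise ValueError("a non-key frame cannot precede the first key frame")
            # Non-original key frame: same tag as the preceding key frame, no data.
            packets.append(Packet(current_tag, None))
    return packets
```

For the example in the text (video frames 1 and 4 are original key frames, video frames 2 and 3 are not), `build_initial_sequence([b"A", b"A", b"A", b"B"], [0, 3])` yields tags 1, 1, 1, 4, and only the first and last packets carry frame data. Every frame still produces a packet, which is what keeps the stream uninterrupted.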

Further, once the server has determined the original key frames, an initial transmission frame sequence may be generated based on them. In the initial transmission frame sequence, an original key frame includes specific frame data and a frame tag, whereas a non-original key frame includes only the frame tag and no specific frame data. The frame data of the corresponding original key frame can therefore be selected as the display data of a non-original key frame through that frame's tag. When the video to be transmitted needs to be sent to the client, the video frames can be transmitted directly according to the initial transmission frame sequence, without repeating the similarity comparison to determine the original key frames.

Optionally, when the client has a caching function, the client may also cache the initial transmission frame sequence when it receives that sequence from the server. When the client requests the video to be transmitted a second time, the server may send only the corresponding frame tags to the client, and the client looks up the corresponding frame data in its cached data based on the frame tags to render the display picture.

It should be noted that the transmission method above, and the approach in which the server and the client cache the initial transmission frame sequence, are best suited to the static animation transmission scenario of a cloud game. For example, in the login interface animation of a cloud game, the picture repetition rate is high, and each time a client logs in to the game it needs to request the static animation corresponding to the login screen from the server. In addition, because of the game's synchronization mechanism, the server in a cloud game needs to continuously send the client the frame information of the game interface picture rendered for the user at each moment. The method can therefore save considerable transmission overhead in this scenario.

In the cloud game video transmission scenario, the server may preset a start condition and an exit condition for this transmission mode. For example, the transmission mode is enabled when the server detects that a static animation transmission scene has been entered. When a touch operation occurs, the mode is exited and normal audio and video transmission resumes. Alternatively, the server may return to the normal mode when, after more complex image recognition, it determines that the scene is no longer a static animation. During looped playback, monitoring is performed periodically: every few seconds, a captured frame is compared for similarity against the static animation sequence, and if no frame with high similarity can be found, it is determined that the mode should be exited.
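The periodic exit check during looped playback might look like the sketch below. The function name, the pluggable `similarity` callable, and the reuse of 0.92 as the "high similarity" threshold are assumptions; the embodiment only states that a captured frame is compared against the static animation sequence every few seconds.

```python
from typing import Callable, Sequence, TypeVar

F = TypeVar("F")  # any frame representation accepted by the similarity function


def should_exit_static_mode(
    captured: F,
    static_sequence: Sequence[F],
    similarity: Callable[[F, F], float],
    preset_similarity: float = 0.92,  # assumed threshold for "high similarity"
) -> bool:
    """Return True when no frame of the static animation resembles the captured frame."""
    return not any(
        similarity(captured, reference) > preset_similarity
        for reference in static_sequence
    )
```

A scheduler on the server would call this every few seconds with a freshly captured frame and fall back to normal audio and video transmission as soon as it returns `True`.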

Further, after the original key frame corresponding to the video to be transmitted is determined according to the similarity between the video frames of the video to be transmitted, whether a preset condition is met can be judged. Wherein the preset condition may include at least one of:

the frame interval duration between the adjacent original key frames is greater than the preset duration;

and receiving a control instruction for raising the frame rate.

It should be noted that the frame interval duration refers to the interval duration T between two adjacent original key frames. The preset duration may be determined according to the refresh rate of the video data played by the client, i.e., the number of frames per second (FPS). For example, it may be set to 2 × (1000/FPS), that is, twice the duration of one frame in milliseconds. When the interval duration T between the original key frames is greater than 2 × (1000/FPS), it is determined that the preset condition is met.

To meet the frame rate requirements of different scenarios, when the originally acquired frame sequence has a low frame rate, a frame sequence with a higher frame rate may need to be generated; for example, a 60-frame-per-second transport stream may need to be generated from an original 30-frame-per-second sequence. In this case, the control instruction for raising the frame rate may be triggered. When a control instruction for raising the frame rate is detected, it may be determined that the preset condition is satisfied.
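The two preset conditions can be combined in a small check such as the one below. The function and parameter names are hypothetical, and the sketch assumes that interval durations are measured in milliseconds, consistent with the 1000/FPS term.

```python
def preset_duration_ms(fps: float, factor: float = 2.0) -> float:
    """Preset duration: `factor` frame periods, i.e. factor * (1000 / FPS) ms."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return factor * (1000.0 / fps)


def preset_condition_met(
    interval_ms: float,
    fps: float,
    raise_frame_rate_requested: bool = False,
) -> bool:
    """True if either preset condition holds (at least one is sufficient)."""
    return raise_frame_rate_requested or interval_ms > preset_duration_ms(fps)
```

At a client refresh rate of 50 FPS, for instance, one frame lasts 20 ms, so the preset duration is 40 ms and a gap of 41 ms between adjacent original key frames satisfies the condition, while a gap of exactly 40 ms does not.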

Further, when the preset condition is satisfied, a smooth key frame may be generated from the adjacent original key frames.

Optionally, referring to fig. 3, when the preset condition is satisfied, the smooth key frame may be generated according to an adjacent original key frame, where when the smooth key frame is generated according to an adjacent original key frame, the method may include the following steps:

step S21, determining a frame background corresponding to the original key frame;

step S22, determining a first foreground map corresponding to a first original key frame and a second foreground map corresponding to a second original key frame based on the frame background, where the first original key frame is adjacent to the second original key frame;

step S23, determining a smooth foreground image according to the first foreground image and the second foreground image;

and step S24, generating the smooth key frame according to the frame background and the smooth foreground image.

Illustratively, interval statistics can be performed on the gray values of the original frame sequence to obtain a statistically significant background, that is, the frame background F_b. The difference between the first original key frame F_x and the frame background F_b is taken to obtain a first background difference image, and the difference between the second original key frame F_{x+1} and the frame background F_b is taken to obtain a second background difference image. The first background difference image and the second background difference image are then binarized to obtain the moving foreground regions. In this way, image segmentation yields the first foreground map F_{fx} corresponding to the first original key frame F_x and, by the same processing, the second foreground map F_{f(x+1)} corresponding to the second original key frame F_{x+1}.

According to the first foreground map F_{fx}, the second foreground map F_{f(x+1)}, and the frame interval duration T_x between the first original key frame F_x and the second original key frame F_{x+1}, the smooth foreground map F_{fx'} of the smooth key frame F_{x'} is computed. The smooth key frame is then generated according to the frame background and the smooth foreground map.
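Steps S21 to S24 can be sketched in pure Python on small grayscale frames as follows. Several choices here are assumptions rather than the method of the embodiment: the background is estimated as the most frequent gray value per pixel (one possible reading of "interval statistics"), the binarization threshold of 30 is arbitrary, and the smooth foreground is computed as a linear blend weighted by the position of the smooth frame within the interval T_x, since the text does not say how the two foreground maps are combined.

```python
from collections import Counter
from typing import List

Frame = List[List[int]]  # grayscale image: rows of values in 0..255


def estimate_background(frames: List[Frame]) -> Frame:
    """Assumed background model: most frequent gray value at each pixel."""
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [Counter(f[y][x] for f in frames).most_common(1)[0][0] for x in range(width)]
        for y in range(height)
    ]


def foreground_mask(frame: Frame, background: Frame, threshold: int = 30) -> Frame:
    """Background difference followed by binarization (1 = moving foreground)."""
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frow, brow)]
        for frow, brow in zip(frame, background)
    ]


def smooth_key_frame(
    key_a: Frame, key_b: Frame, background: Frame,
    t_ms: float, interval_ms: float, threshold: int = 30,
) -> Frame:
    """Compose a smooth key frame located t_ms after key_a within the interval."""
    if not 0 < t_ms < interval_ms:
        raise ValueError("t_ms must lie strictly inside the key frame interval")
    alpha = t_ms / interval_ms  # assumed linear blend weight
    mask_a = foreground_mask(key_a, background, threshold)
    mask_b = foreground_mask(key_b, background, threshold)
    result: Frame = []
    for y, brow in enumerate(background):
        row = []
        for x, b in enumerate(brow):
            if mask_a[y][x] or mask_b[y][x]:
                # Foreground in either key frame: blend the two key frames.
                row.append(round((1 - alpha) * key_a[y][x] + alpha * key_b[y][x]))
            else:
                row.append(b)  # elsewhere keep the frame background
        result.append(row)
    return result
```

A linear blend produces a cross-fade rather than true motion interpolation; a production system would more likely estimate the displacement of the foreground region between F_x and F_{x+1}, which this sketch does not attempt.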

After the smooth key frame is obtained, a target frame transmission sequence is generated according to the original key frame and the smooth key frame, and the video frames in the target frame sequence are sequentially sent to a client.

Optionally, when the preset condition is not met, a frame tag corresponding to each original key frame may be determined, and then the video to be transmitted is sent to a client based on the frame tag, where when the video to be transmitted is sent, a code stream corresponding to the original key frame includes the frame tag and frame data, and a code stream corresponding to a non-original key frame includes the frame tag and a frameless data identifier.

Optionally, the transmission frame sequence corresponding to the video to be transmitted may also be cached, and when a transmission request for the video to be transmitted is received again, the cached transmission frame sequence is sent to the terminal that initiated the request.

Illustratively, the smooth key frame may be inserted at the corresponding position in the initial transmission frame sequence to generate the target transmission frame sequence, and the video frames in the target frame sequence are then sent to the client in order. After receiving the smooth key frames, the client can render the transition pictures between the original key frames based on them, which makes the display output of the client smoother. Stuttering during video playback at the client is thereby avoided, and the smoothness of the client's video display is improved.
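One way to picture the insertion is the sketch below, which works on key frame timestamps only. The function name, the `("original" | "smooth", timestamp)` entries, and the choice to insert a single smooth key frame at the midpoint of each over-long gap are assumptions for illustration; the embodiment does not state how many smooth key frames are inserted or where exactly they are placed.

```python
from typing import List, Tuple


def build_target_sequence(
    key_times_ms: List[float], preset_duration_ms: float
) -> List[Tuple[str, float]]:
    """Interleave smooth key frames between original key frames that are too far apart."""
    target: List[Tuple[str, float]] = []
    for i, t in enumerate(key_times_ms):
        target.append(("original", t))
        if i + 1 < len(key_times_ms):
            gap = key_times_ms[i + 1] - t
            if gap > preset_duration_ms:
                # Assumed placement: one smooth key frame at the midpoint of the gap.
                target.append(("smooth", t + gap / 2))
    return target
```

With original key frames at 0 ms, 20 ms and 120 ms and a preset duration of 40 ms, only the 100 ms gap exceeds the threshold, so a single smooth key frame is placed at 70 ms.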

Referring to fig. 4, in another embodiment of the video transmission method of the present invention, the video transmission method includes the steps of:

step S1, receiving a video transmission code stream of the server;

step S2, determining a frame label corresponding to the current frame based on the transmission code stream, and determining whether the code stream includes frame data;

and step S3, when the code stream does not include the frame data, determining the current corresponding frame data in the pre-stored data based on the frame tag.

In this embodiment, the video transmission method may be used by a client. The client may actively request video data from the server. For example, when the client is in a running interface of a cloud game, it may request the video data corresponding to the game interface, i.e., the rendering parameters of the game interface. It is understood that data transmission between the server and the client may be based on the Real-time Transport Protocol (RTP). During transmission, the server may add the frame tag corresponding to each video frame to the head of the code stream corresponding to that video frame. The client receives the video transmission code stream from the server. After receiving the code stream, the client can parse the frame tag of the video frame carried in it and determine whether the code stream contains the frame data corresponding to the current video frame.

It should be noted that, when the server sends the generated code stream, the following cases arise:

Case 1: during the initial transmission, if the code stream corresponds to a key frame, the frame tag and frame data corresponding to the key frame are added to the code stream.

Case 2: during the initial transmission, if a non-key frame is transmitted, the frame tag corresponding to the non-key frame and a frameless data identifier are added to the code stream. The frame tag of the non-key frame is the same as the frame tag of the key frame that precedes it.

Case 3: when the video is sent again, only the frame tag corresponding to the current frame is sent.

Further, after the frame tag corresponding to the current frame is determined, if the code stream includes frame data corresponding to the current frame, the display picture corresponding to the current frame is rendered based on that frame data. If the code stream does not include frame data corresponding to the current frame, the frame data received before the current time for the frame having the same frame tag as the current frame is obtained, and the display picture of the current frame is rendered based on the obtained frame data.

Optionally, when the code stream includes the frame data, the frame tag and the frame data are stored in association.

And when the received code stream only comprises the frame tags, obtaining frame data matched with the frame tags from the buffered data, and rendering the current picture based on the obtained data.
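The client-side handling described above can be sketched as a small tag-indexed cache. The `FrameCache` class and its `resolve` method are hypothetical names, and `None` again stands in for a code stream that carries no frame data; how the real code stream signals this is not specified in the text.

```python
from typing import Dict, Optional


class FrameCache:
    """Stores frame data by frame tag and resolves tag-only packets."""

    def __init__(self) -> None:
        self._by_tag: Dict[int, bytes] = {}

    def resolve(self, frame_tag: int, frame_data: Optional[bytes]) -> bytes:
        """Return the data to render for the current frame."""
        if frame_data is not None:
            # Code stream includes frame data: store it in association with its tag.
            self._by_tag[frame_tag] = frame_data
            return frame_data
        # Code stream has no frame data: look it up in the pre-stored data.
        if frame_tag not in self._by_tag:
            raise LookupError(f"no cached frame data for tag {frame_tag}")
        return self._by_tag[frame_tag]
```

The same lookup path serves both a non-key frame during the initial transmission and a tag-only packet when the video is sent again; the sketch raises `LookupError` when a tag has never been seen, a failure case the embodiment does not address.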

In the technical solution disclosed in this embodiment, a video transmission code stream is received from a server; a frame tag corresponding to the current frame is then determined based on the transmission code stream, and it is determined whether the code stream includes frame data; when the code stream does not include frame data, the frame data corresponding to the current frame is determined in pre-stored data based on the frame tag. Because only the frame data of key frames is transmitted, transmission overhead is saved.

In addition, an embodiment of the present invention further provides a server, where the server includes a memory, a processor, and a video transmission program stored in the memory and executable on the processor, and when the video transmission program is executed by the processor, the steps of the video transmission method according to the above embodiments are implemented.

Referring to fig. 5, an embodiment of the present invention further provides a server 100, where the server includes:

a determining module 101, configured to determine an original key frame according to the similarity between video frames of a video to be transmitted;

a generating module 102, configured to generate a smooth key frame according to an adjacent original key frame when a preset condition is met;

a sending module 103, configured to generate a target frame transmission sequence according to the original key frame and the smooth key frame, and sequentially send video frames in the target frame sequence.

Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where a video transmission program is stored, and the video transmission program, when executed by a processor, implements the steps of the video transmission method according to the above embodiments.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for causing a server to execute the method according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A video transmission method is applied to a server, and comprises the following steps:

2. The video transmission method of claim 1, wherein the step of generating a smooth key frame from neighboring original key frames comprises:

determining a frame background corresponding to an original key frame;

3. The video transmission method according to claim 1, wherein the preset condition includes at least one of:

and receiving a control instruction for raising the frame rate.

4. The video transmission method according to claim 1, wherein after the step of determining the original key frame corresponding to the video to be transmitted according to the similarity between the video frames of the video to be transmitted, the method further comprises:

5. The video transmission method according to claim 4, wherein the video transmission method further comprises:

6. A video transmission method is applied to a client, and comprises the following steps:

receiving a video transmission code stream of a server;

7. The video transmission method according to claim 6, wherein after the steps of determining a frame tag corresponding to the current frame based on the transmitted code stream and determining whether the code stream includes frame data, further comprising:

8. A server, characterized in that the server comprises: memory, processor and video transmission program stored on the memory and executable on the processor, the video transmission program when executed by the processor implementing the steps of the video transmission method according to any one of claims 1 to 5.

9. A server, characterized in that the server comprises:

10. A computer-readable storage medium, characterized in that a video transmission program is stored thereon, which when executed by a processor implements the steps of the video transmission method according to any one of claims 1 to 7.