CN118101992A - Video processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN118101992A
Authority
CN
China
Prior art keywords
video
stream data
video stream
data packet
data packets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410172006.0A
Other languages
Chinese (zh)
Inventor
方晶
潘峥正
任韵雯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN202410172006.0A priority Critical patent/CN118101992A/en
Publication of CN118101992A publication Critical patent/CN118101992A/en
Pending legal-status Critical Current

Landscapes

  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

An embodiment of the application discloses a video processing method and apparatus, an electronic device, and a storage medium. The method includes: receiving video play requests sent by N terminals, where each video play request includes a first video and N is a positive integer greater than 1; in response to the video play requests, converting each first video into a first video stream data packet, the first video stream data packet including first video frames and a timestamp of each first video frame; converting the N first video stream data packets into N second video stream data packets respectively, where the number of second video frames in any two second video stream data packets is the same; synthesizing a target data packet from the N second video stream data packets based on the timestamps; and outputting the target data packet to the N terminals.

Description

Video processing method and device, electronic equipment and storage medium
Technical Field
The application belongs to the technical field of internet, and particularly relates to a video processing method, a video processing device, electronic equipment and a storage medium.
Background
Currently, many video scenarios require mixing multiple video streams and returning the mixed result to the participants, for example video conferencing, online classrooms, and multi-terminal live streaming.
In these scenarios, a server receives multiple source videos sent by multiple terminals, processes them, and sends the processed result back to each terminal. Because the source videos sent by the terminals differ in format and frame rate, the server must transmit a separately processed video to each terminal, which occupies a large amount of transmission resources.
Therefore, current video transmission schemes have high transmission cost and low transmission efficiency.
Disclosure of Invention
The embodiments of the application provide a video processing method, an apparatus, a device, and a storage medium, which can solve the problems of high transmission cost and low transmission efficiency in existing video transmission schemes.
In a first aspect, an embodiment of the present application provides a video processing method, including:
receiving video play requests sent by N terminals, where each video play request includes a first video, and N is a positive integer greater than 1;
converting each first video into a first video stream data packet in response to the video play request, where the first video stream data packet includes first video frames and a timestamp of each first video frame;
converting the N first video stream data packets into N second video stream data packets respectively, where the number of second video frames in any two second video stream data packets is the same;
synthesizing a target data packet from the N second video stream data packets based on the timestamps;
and outputting the target data packet to the N terminals.
Optionally, in response to the video playing request, converting each of the first videos into a first video stream data packet includes:
splitting each first video into Real-time Transport Protocol (RTP) data packets, where the RTP data packets include: audio stream data packets, the first video stream data packets, and subtitle stream data packets.
Optionally, converting the N first video stream packets into N second video stream packets respectively, including:
parsing the video play request to obtain a preset number of video frames;
and for any one of the N first video stream data packets, processing the first video stream data packet according to the number of first video frames in the first video stream data packet and the preset number of video frames to obtain a second video stream data packet, wherein the second video stream data packet comprises second video frames with the preset number of video frames.
Optionally, processing the first video stream data packet according to the number of first video frames in the first video stream data packet and the preset number of video frames to obtain a second video stream data packet, including:
Deleting H first video frames from the first video stream data packet to obtain the second video stream data packet under the condition that the number of the first video frames is larger than the preset number of the video frames; the H is the difference between the number of the first video frames and the preset video frame number;
Copying R first video frames into the first video stream data packet to obtain the second video stream data packet under the condition that the number of the first video frames is smaller than the preset video frame number; and R is the difference between the preset video frame number and the first video frame number.
Optionally, synthesizing a target data packet according to the N second video stream data packets based on the time stamp, including:
generating a target video stream data packet according to the N second video stream data packets;
And generating a target data packet according to the target video stream data packet, the N audio stream data packets and the N subtitle stream data packets based on the time stamp.
Optionally, generating a target video stream packet according to the N second video stream packets includes:
splicing video frames in the N second video stream data packets based on the time stamp, the position information and the size information to generate one target video stream data packet; the video play request includes: the position information and the size information.
Optionally, synthesizing a target data packet according to the N second video stream data packets based on the time stamp, including:
Synthesizing N second video stream data packets into a target video based on the time stamps;
and encapsulating the target video into a target data packet in a preset format, so that when a terminal receives the target data packet, it can parse the target data packet in the preset format to obtain the target video.
In a second aspect, an embodiment of the present application provides a video processing apparatus, including:
The receiving module is used for receiving video playing requests sent by N terminals, and each video playing request comprises: the first video, N is a positive integer greater than 1;
The conversion module is used for responding to the video playing request, converting each first video into a first video stream data packet, wherein the first video stream data packet comprises first video frames and time stamps of each first video frame;
the conversion module is further configured to convert the N first video stream data packets into N second video stream data packets respectively; the number of second video frames in any two second video stream data packets is consistent;
The synthesizing module is used for synthesizing a target data packet according to the N second video stream data packets based on the time stamp;
and the output module is used for outputting the target data packet to N terminals.
Optionally, the conversion module is specifically configured to:
splitting each first video into real-time transport protocol RTP packets, the RTP packets comprising: audio stream packets, first video stream packets, and subtitle stream packets.
Optionally, the conversion module is specifically configured to:
parsing the video play request to obtain the preset number of video frames;
And for any one of the N first video stream data packets, processing the first video stream data packet according to the number of the first video frames in the first video stream data packet and the number of the preset video frames to obtain a second video stream data packet, wherein the second video stream data packet comprises the second video frames with the number of the preset video frames.
Optionally, the conversion module is specifically configured to:
Deleting H first video frames from the first video stream data packet under the condition that the number of the first video frames is larger than the preset number of the video frames, so as to obtain a second video stream data packet; h is the difference between the number of the first video frames and the number of the preset video frames;
Copying R first video frames into a first video stream data packet to obtain a second video stream data packet under the condition that the number of the first video frames is smaller than the number of the preset video frames; r is the difference between the number of preset video frames and the number of first video frames.
Optionally, the synthesis module is specifically configured to:
Generating a target video stream data packet according to the N second video stream data packets;
and generating a target data packet according to the target video stream data packet, the N audio stream data packets and the N subtitle stream data packets based on the time stamp.
Optionally, the synthesis module is specifically configured to:
Splicing video frames in the N second video stream data packets based on the time stamp, the position information and the size information to generate a target video stream data packet; the video play request includes: position information and size information.
Optionally, the synthesis module is specifically configured to:
based on the time stamps, synthesizing N second video stream data packets into a target video;
and encapsulating the target video into a target data packet in a preset format, so that when the terminal receives the target data packet, it can parse the target data packet in the preset format to obtain the target video.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory storing computer program instructions; the processor, when executing the computer program instructions, implements the method as in the first aspect or any of the possible implementations of the first aspect.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement a method as in the first aspect or any of the possible implementations of the first aspect.
In the embodiment of the present application, video play requests sent by N terminals are received, each carrying a first video; in response, each first video is converted into a first video stream data packet containing first video frames and a timestamp of each first video frame. Because the first videos sent by the terminals differ in frame rate, transmitting them separately would occupy N shares of storage and transmission resources; the N first video stream data packets are therefore converted into N second video stream data packets, where any two second video stream data packets contain the same number of second video frames. In this way, first videos with different frame rates sent by the terminals are converted into second video stream data packets with the same frame rate. A target data packet is then synthesized from the N second video stream data packets based on the timestamps; it occupies only one share of storage and transmission resources instead of N. Outputting this target data packet to the N terminals thus reduces network bandwidth consumption and improves video transmission efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below; a person skilled in the art may derive other drawings from them without inventive effort.
Fig. 1 is a flowchart of a video processing method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a video processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a video processing architecture according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a video processing apparatus according to an embodiment of the present application;
fig. 5 is a schematic hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
Features and exemplary embodiments of various aspects of the present application are described in detail below. To make the objects, technical solutions, and advantages of the application clearer, the application is described in further detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely intended to illustrate the application, not to limit it. It will be apparent to one skilled in the art that the application may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the application by showing examples of it.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between them. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to it. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The technical terms related to the present application will be briefly described.
Frame number: the video frame number refers to the number of still images displayed per second of video, generally expressed as a frame rate in frames per second (FPS). The frame rate determines the smoothness and visual effect of the video: a higher frame rate makes the video appear smoother, while a lower frame rate may make it appear choppy or incoherent.
The Real-time transport protocol (Real-time Transport Protocol, RTP) is a network transport protocol. The RTP protocol specifies the standard packet format for delivering audio and video over the internet. It was originally designed as a multicast protocol, but was later used in many unicast applications.
YUV is a color encoding method, often used in video processing pipelines. Taking human perception into account, YUV allows the bandwidth of the chroma components to be reduced when encoding video.
YUV is a color space. "Y" represents luminance (Luma), i.e. the gray-scale value; "U" and "V" represent chrominance (Chroma), which describes the color and saturation of a pixel.
The Real-Time Streaming Protocol (RTSP) is a text-based multimedia playback control protocol at the application layer. RTSP works in a client-server fashion, providing play, pause, rewind, fast-forward, and similar operations on streaming media. It is mainly used to control the transmission of data with real-time characteristics.
The Real-Time Messaging Protocol (RTMP) is an application-layer protocol; reliable delivery is guaranteed by an underlying reliable transport-layer protocol. After the transport-layer link is established, the client and server additionally establish an RTMP link over it through a handshake.
HTTP Live Streaming (HLS) is an HTTP-based streaming media transmission protocol that slices the media stream directly into short segments; versions at different bitrates are each cut into corresponding segments and listed in a playlist file. The player can request the stream data directly over HTTP and switch freely between bitrate variants for seamless playback, avoiding the overhead of using other protocols.
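As an illustration of such a playlist file, a hedged sketch of an HLS master playlist follows. The segment paths and bitrates are hypothetical; only the idea of listing bitrate variants comes from the text above.

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
high/index.m3u8
```

The player fetches this playlist over plain HTTP, then requests segments from whichever variant playlist matches its available bandwidth.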
Fig. 1 is a flowchart of a video processing method according to an embodiment of the present application.
As shown in fig. 1, the video processing method may include steps 110 to 150, and the method is applied to a video processing apparatus, specifically as follows:
Step 110, receiving video playing requests sent by N terminals, where each video playing request includes: the first video, N is a positive integer greater than 1;
Step 120, in response to the video playing request, converting each first video into a first video stream data packet, where the first video stream data packet includes a first video frame and a timestamp of each first video frame;
Step 130, converting the N first video stream packets into N second video stream packets, respectively; the number of the second video frames in any two second video stream data packets is consistent;
Step 140, synthesizing a target data packet according to the N second video stream data packets based on the time stamp;
and step 150, outputting the target data packet to N terminals.
In the embodiment of the present application, video play requests sent by N terminals are received, each carrying a first video; in response, each first video is converted into a first video stream data packet containing first video frames and a timestamp of each first video frame. Because the first videos sent by the terminals differ in frame rate, transmitting them separately would occupy N shares of storage and transmission resources; the N first video stream data packets are therefore converted into N second video stream data packets, where any two second video stream data packets contain the same number of second video frames. In this way, first videos with different frame rates sent by the terminals are converted into second video stream data packets with the same frame rate. A target data packet is then synthesized from the N second video stream data packets based on the timestamps; it occupies only one share of storage and transmission resources instead of N. Outputting this target data packet to the N terminals thus reduces network bandwidth consumption and improves video transmission efficiency.
Steps 110 to 150 are described below:
Step 110, receiving video playing requests sent by N terminals, where each video playing request includes: the first video, N is a positive integer greater than 1;
The execution subject of the embodiment of the application is a server. The server receives video play requests from N terminals, each containing a video captured by that terminal. Because the terminals differ in performance and configuration parameters, the frame rates of the first videos also differ.
Illustratively, N is 9: the server receives first videos from 9 terminals, some with a frame rate of 15 FPS and others with 30 FPS.
In response to the video play request, each first video is converted into a first video stream packet including a first video frame and a timestamp of each first video frame, step 120.
The video stream in the first video stream data packet is a raw stream in the original YUV format.
In a possible embodiment, step 120 includes:
splitting each first video into real-time transport protocol RTP packets, the RTP packets comprising: audio stream packets, first video stream packets, and subtitle stream packets.
For each of the N first videos, the audio stream, video stream, and subtitle stream are demultiplexed and output as RTP data packets, yielding audio stream data packets, first video stream data packets, and subtitle stream data packets, all in RTP format.
The step of splitting each first video into RTP packets includes:
Transcoding each RTP data packet to obtain transcoded RTP data packets, where the transcoded RTP data packets include the first video stream data packets.
Each RTP data packet is transcoded by a video decode module and uniformly decoded into a raw stream in the original YUV format.
Therefore, each first video is split into real-time transmission protocol RTP data packets, and the subsequent video processing of the first video stream data packets in the RTP data packets is facilitated.
Step 130, converting the N first video stream packets into N second video stream packets, respectively; the number of the second video frames in any two second video stream data packets is consistent;
The N first video stream data packets are converted into N second video stream data packets respectively, and finally merged and output with a unified frame count, so that the multiple video streams share the same frame rate.
In a possible embodiment, step 130 includes:
parsing the video play request to obtain the preset number of video frames;
And for any one of the N first video stream data packets, processing the first video stream data packet according to the number of the first video frames in the first video stream data packet and the number of the preset video frames to obtain a second video stream data packet, wherein the second video stream data packet comprises the second video frames with the number of the preset video frames.
Illustratively, the number of preset video frames is 20FPS, the number of first video frames in the first video stream data packet is 30FPS, and the second video stream data packet is obtained by processing the first video stream data packet according to the number of first video frames in the first video stream data packet and the number of preset video frames.
The second video stream data packet includes second video frames at 20 FPS.
Therefore, the first video stream data packet is processed according to the number of the first video frames in the first video stream data packet and the number of the preset video frames to obtain the second video stream data packet, the number of the video frames in each second video stream data packet can be unified, and subsequent transmission is facilitated.
The step of processing the first video stream data packet according to the number of first video frames in the packet and the preset number of video frames to obtain the second video stream data packet may specifically include the following steps:
Deleting H first video frames from the first video stream data packet under the condition that the number of the first video frames is larger than the preset number of the video frames, so as to obtain a second video stream data packet; h is the difference between the number of the first video frames and the number of the preset video frames;
Copying R first video frames into a first video stream data packet to obtain a second video stream data packet under the condition that the number of the first video frames is smaller than the number of the preset video frames; r is the difference between the number of preset video frames and the number of first video frames.
Illustratively, the preset number of video frames is 20 FPS and the number of first video frames in the first video stream data packet is 30 FPS; that is, the number of first video frames is greater than the preset number, so H = 30 - 20 = 10 first video frames are deleted from the first video stream data packet to obtain the second video stream data packet.
For example, the preset number of video frames is 20 FPS and the number of first video frames in the first video stream data packet is 15 FPS; that is, the number of first video frames is smaller than the preset number, so R = 20 - 15 = 5 first video frames are copied into the first video stream data packet to obtain the second video stream data packet.
Because the frame rates of the first videos sent by the terminals differ, a 30 FPS first video stream data packet and a 15 FPS first video stream data packet, for example, cannot be merged for transmission and must each occupy separate storage and transmission resources.
Based on the embodiment of the application, 10 first video frames are deleted from the 30 FPS first video stream data packet to obtain a 20 FPS second video stream data packet, and 5 first video frames are copied into the 15 FPS first video stream data packet to obtain a 20 FPS second video stream data packet.
Therefore, frames of the multiple video streams are automatically padded or discarded according to the preset number of video frames specified by the user, unifying the number of video frames in each second video stream data packet and facilitating subsequent video transmission. The 20 FPS second video stream data packets can then be merged for transmission instead of each occupying separate storage and transmission resources, which reduces network bandwidth consumption and improves video transmission efficiency.
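The padding-or-discarding logic above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: a packet is modeled as a plain list of frames, and surplus frames are dropped (or missing frames duplicated) with an even spread, where H and R follow the definitions given earlier.

```python
def normalize_frame_count(frames, preset):
    """Return a frame list of exactly `preset` frames.

    If len(frames) > preset, H = len(frames) - preset frames are dropped;
    if len(frames) < preset, R = preset - len(frames) frames are duplicated.
    Frames are sampled or repeated with an even spread across the packet.
    """
    n = len(frames)
    if n == preset:
        return list(frames)
    # Evenly sample (n > preset) or evenly repeat (n < preset) source frames.
    return [frames[int(i * n / preset)] for i in range(preset)]
```

For a 30-frame packet with a preset of 20, H = 10 frames are discarded; for a 15-frame packet, R = 5 frames are duplicated, matching the worked examples above.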
Step 140, synthesizing a target data packet according to the N second video stream data packets based on the time stamp;
For example, the 20 FPS second video stream data packets may be synthesized into the target data packet based on the timestamps, facilitating transmission of the target data packet to each terminal.
As shown in fig. 3, N is 9; transmitting the 9 first videos separately would consume 9 M of network bandwidth. Each first video stream data packet is now processed into a second video stream data packet, and a target data packet is synthesized from the N second video stream data packets based on the timestamps. Transmitting the target data packet consumes only 1 M of network bandwidth, greatly reducing the bandwidth required to deliver the 9 first videos.
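The bandwidth figures above reduce to simple arithmetic. The sketch below assumes, as in the figure, that each stream costs roughly 1 M of bandwidth; the variable names are illustrative.

```python
N = 9                     # number of terminals / source videos (as in fig. 3)
per_stream_bandwidth = 1  # assumed bandwidth cost of one stream, in M

unmixed = N * per_stream_bandwidth  # sending all 9 first videos separately
mixed = 1 * per_stream_bandwidth    # sending one composited target packet
saved = unmixed - mixed             # bandwidth saved by mixing
```

With these assumptions, mixing replaces a 9 M transmission with a 1 M transmission, saving 8 M of bandwidth per delivery.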
Therefore, N first video stream data packets are respectively converted into N second video stream data packets with the same number of video frames, and then based on the time stamps, the N second video stream data packets are synthesized into the target data packet, so that the consumption of network bandwidth can be reduced, and the video transmission efficiency is improved.
In a possible embodiment, step 140 includes:
Generating a target video stream data packet according to the N second video stream data packets;
and generating a target data packet according to the target video stream data packet, the N audio stream data packets and the N subtitle stream data packets based on the time stamp.
Wherein generating a target video stream data packet according to the N second video stream data packets includes:
Based on the timestamps, the N second video frames are extracted from a memory scheduling queue and arranged as an overlay to generate the target video stream data packet. The memory scheduling queue contains the N second video stream data packets. An overlay network builds a virtual network on top of the existing physical network to virtualize network resources.
Then, based on the timestamps, the target data packet is generated from the target video stream data packet, the N audio stream data packets, and the N subtitle stream data packets; the original subtitle and audio streams are preserved while the number of video frames in each second video stream data packet is unified, improving the video playback experience.
In this way, based on the timestamps, the target video stream data packet, the N audio stream data packets, and the N subtitle stream data packets can be quickly combined into a target data packet carrying subtitles and audio.
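A hedged sketch of this timestamp-based combination: each stream is modeled as a list of (timestamp, payload) tuples already sorted by timestamp, and a k-way merge interleaves the target video packets with the N audio and N subtitle streams into one timestamp-ordered sequence. The function name and packet shape are illustrative assumptions, not from the patent.

```python
import heapq

def mux_by_timestamp(video_pkts, audio_streams, subtitle_streams):
    """Interleave one video packet stream with N audio and N subtitle
    streams into a single sequence ordered by timestamp.

    Each stream is an iterable of (timestamp, payload) tuples sorted by
    timestamp, so a k-way merge preserves global timestamp order.
    """
    streams = [video_pkts, *audio_streams, *subtitle_streams]
    return list(heapq.merge(*streams, key=lambda pkt: pkt[0]))
```

Because every input stream is already timestamp-sorted, `heapq.merge` produces the combined sequence without re-sorting all packets.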
The step of generating a target video stream data packet according to the N second video stream data packets may specifically include the following steps:
Splicing video frames in the N second video stream data packets based on the time stamp, the position information and the size information to generate a target video stream data packet; the video play request includes: position information and size information.
The position information may be coordinate information, and the size information may include a length value and a width value.
The video frames in each second video stream data packet are scaled according to the size information, and the scaled video frames are then spliced according to the timestamps and position information to generate the target video stream data packet.
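The scaling-and-splicing step can be sketched as placing each scaled sub-frame onto a shared canvas at the position and size carried in the play request. The sketch below uses a single grayscale fill value as a stand-in for real YUV planes; the tuple layout is an assumption for illustration.

```python
def compose_canvas(tiles, canvas_w, canvas_h):
    """Splice N sub-frames onto one canvas.

    `tiles` is a list of (x, y, w, h, pixel) tuples: the position and size
    information from the play request, plus a fill value standing in for
    the scaled sub-frame's pixels. Returns the canvas as a list of rows.
    """
    canvas = [[0] * canvas_w for _ in range(canvas_h)]
    for x, y, w, h, pixel in tiles:
        # Clip each tile to the canvas bounds before painting it.
        for row in range(y, min(y + h, canvas_h)):
            for col in range(x, min(x + w, canvas_w)):
                canvas[row][col] = pixel
    return canvas
```

In a real pipeline the fill value would be the scaled frame's pixel data and the composition would run once per synchronized timestamp.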
In this way, multi-stream merging on the cloud server enables same-screen playback with automatically synchronized frame rates, that is, a same-screen playback method that is differentiated and customized per user, flexibly combinable, manageable, and easy to maintain.
In a possible embodiment, step 140 includes:
based on the time stamps, synthesizing N second video stream data packets into a target video;
and encapsulating the target video into a target data packet in a preset format, so that, upon receiving the target data packet, the terminal can parse the target data packet in the preset format to obtain the target video in the preset format.
Encapsulating the target video into a target data packet in a preset format yields a single-channel video stream playing address for that data packet. The preset format may include RTSP, RTMP, or HLS, to meet the video playing requirements of different terminals.
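The single-channel playing address differs by protocol. The following sketch maps a preset format to a playback URL; the host, stream key, path conventions, and default ports here are illustrative assumptions, not details from the patent.

```python
def playback_url(host: str, stream_key: str, fmt: str) -> str:
    """Build a single-channel play address for the encapsulated target
    packet, using the customary default port/path for each protocol."""
    if fmt == "RTSP":
        return f"rtsp://{host}:554/{stream_key}"
    if fmt == "RTMP":
        return f"rtmp://{host}:1935/live/{stream_key}"
    if fmt == "HLS":
        # HLS is delivered over HTTP as a playlist file
        return f"https://{host}/hls/{stream_key}/index.m3u8"
    raise ValueError(f"unsupported format: {fmt}")
```

Each of the N terminals then pulls the same address, so only one output stream needs to be maintained server-side.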
And step 150, outputting the target data packet to N terminals.
Based on an operation strategy and resource scheduling rules, the video streams of the multiple first videos are mixed, so that the target data packet is output to the N terminals, meeting users' personalized display requirements while reducing the load on the server and on client network bandwidth.
In the video processing method provided by the application, video play requests sent by N terminals are received, each video play request including a first video. In response to the video play requests, each first video is converted into a first video stream data packet that includes first video frames and a timestamp of each first video frame. Because the frame counts of the first videos sent by the terminals differ, transmitting them separately would occupy N sets of storage and transmission resources; therefore, the N first video stream data packets are converted into N second video stream data packets, in which the number of second video frames in any two packets is consistent. In this way, first videos with different frame counts can be converted into second video stream data packets with the same frame count. A target data packet is then synthesized from the N second video stream data packets based on the time stamps, so that only one storage resource and one transmission resource are needed instead of N. Outputting this single target data packet to the N terminals reduces network bandwidth consumption and improves video transmission efficiency.
Based on the video processing method shown in fig. 1, an embodiment of the present application further provides a video processing architecture, as shown in fig. 3, where the video processing architecture may include:
The video source comprises N first videos respectively corresponding to the terminals.
The access layer receives video play requests sent by N terminals, each video play request comprising a first video, where N is a positive integer greater than 1.
The decoding layer splits each first video into Real-time Transport Protocol (RTP) data packets, which include: audio stream data packets, first video stream data packets, and subtitle stream data packets. A first video stream data packet includes first video frames and a timestamp for each first video frame.
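The decoding layer's demultiplexing step can be sketched as grouping RTP packets by payload type. This is a simplified illustration: the dynamic payload-type values (96/97/98) and the dict representation of a packet are assumptions for the example, not values specified by the patent or the RTP standard.

```python
def split_rtp_packets(packets):
    """Group RTP packets into audio / video / subtitle streams by their
    payload type (PT) field.  The PT values below are assumed dynamic
    assignments, negotiated out of band in a real session."""
    VIDEO_PT, AUDIO_PT, SUBTITLE_PT = 96, 97, 98
    streams = {"audio": [], "video": [], "subtitle": []}
    for pkt in packets:
        if pkt["pt"] == VIDEO_PT:
            streams["video"].append(pkt)
        elif pkt["pt"] == AUDIO_PT:
            streams["audio"].append(pkt)
        elif pkt["pt"] == SUBTITLE_PT:
            streams["subtitle"].append(pkt)
    return streams
```

The video sub-list is what the processing layer then normalizes into the second video stream data packets.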
The processing layer is used for respectively converting the N first video stream data packets into N second video stream data packets; the number of second video frames in any two second video stream packets is consistent.
And synthesizing the target data packet according to the N second video stream data packets based on the time stamps.
Specifically, when the number of first video frames is greater than the preset number of video frames, H first video frames are deleted from the first video stream data packet to obtain the second video stream data packet, where H is the difference between the number of first video frames and the preset number of video frames;
when the number of first video frames is smaller than the preset number of video frames, R first video frames are copied into the first video stream data packet to obtain the second video stream data packet, where R is the difference between the preset number of video frames and the number of first video frames.
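The delete-H / copy-R normalization can be sketched as below. Note one assumption on top of the text: the patent only specifies how many frames to delete or duplicate, not which ones; spreading the deletions/duplications evenly across the stream is a choice made here for the example.

```python
def normalize_frame_count(frames: list, preset: int) -> list:
    """Return a list of exactly `preset` frames.  If there are too many,
    H = len(frames) - preset frames are dropped; if too few, R = preset -
    len(frames) frames are duplicated.  Drops/copies are spread evenly."""
    n = len(frames)
    if n == preset:
        return frames
    step = n / preset
    # for n > preset this skips H frames; for n < preset it repeats R frames
    return [frames[min(int(i * step), n - 1)] for i in range(preset)]
```

For example, 4 frames normalized to a preset of 2 keeps every other frame, while 2 frames normalized to 4 repeats each frame once.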
The coding layer encodes the target data packet and transmits the successfully encoded video stream in the form of RTP video packets.
The packaging layer encapsulates the target video into a target data packet in a preset format, so that upon receiving the target data packet, the terminal can parse it to obtain the target video in the preset format.
And the distribution layer outputs the target data packet to N terminals.
The client (i.e., the terminal) parses the target data packet in the preset format upon receiving it, to obtain the target video in the preset format.
Based on the video processing method shown in fig. 1, an embodiment of the present application further provides a video processing apparatus, as shown in fig. 4, the video processing apparatus 400 may include:
the receiving module 410 is configured to receive video play requests sent by N terminals, where each video play request includes: the first video, N is a positive integer greater than 1;
a conversion module 420, configured to convert each of the first videos into a first video stream packet in response to the video playing request, where the first video stream packet includes a first video frame and a timestamp of each of the first video frames;
The conversion module 420 is further configured to convert N first video stream packets into N second video stream packets, respectively; the number of second video frames in any two second video stream data packets is consistent;
A synthesizing module 430, configured to synthesize a target packet according to N second video stream packets based on the time stamps;
and an output module 440, configured to output the target data packet to N terminals.
In one possible embodiment, the conversion module 420 is specifically configured to:
splitting each first video into real-time transport protocol RTP packets, the RTP packets comprising: audio stream packets, first video stream packets, and subtitle stream packets.
In one possible embodiment, the conversion module 420 is specifically configured to:
analyzing the video playing request to obtain the preset number of video frames;
And for any one of the N first video stream data packets, processing the first video stream data packet according to the number of the first video frames in the first video stream data packet and the number of the preset video frames to obtain a second video stream data packet, wherein the second video stream data packet comprises the second video frames with the number of the preset video frames.
In one possible embodiment, the conversion module 420 is specifically configured to:
In the case that the number of first video frames is greater than the preset number of video frames, deleting H first video frames from the first video stream data packet to obtain the second video stream data packet, where H is the difference between the number of first video frames and the preset number of video frames;
in the case that the number of first video frames is smaller than the preset number of video frames, copying R first video frames into the first video stream data packet to obtain the second video stream data packet, where R is the difference between the preset number of video frames and the number of first video frames.
In one possible embodiment, the synthesis module 430 is specifically configured to:
Generating a target video stream data packet according to the N second video stream data packets;
and generating a target data packet according to the target video stream data packet, the N audio stream data packets and the N subtitle stream data packets based on the time stamp.
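The timestamp-based generation of the target data packet from the composed video stream and the N original audio and subtitle streams can be sketched as a merge of already-sorted packet sequences. This is an assumed representation (packets as dicts with a `"ts"` field), not the patent's actual packet layout.

```python
import heapq

def mux_by_timestamp(video_pkts, audio_pkts, subtitle_pkts):
    """Interleave the composed video packets with the original audio and
    subtitle packets into one timestamp-ordered output sequence.  Each
    input list must already be sorted by its "ts" field, as decoded
    elementary streams normally are."""
    return list(heapq.merge(video_pkts, audio_pkts, subtitle_pkts,
                            key=lambda p: p["ts"]))
```

Because `heapq.merge` streams its inputs lazily, this pattern also works when the three sequences are generators fed from live queues.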
In one possible embodiment, the synthesis module 430 is specifically configured to:
Splicing video frames in the N second video stream data packets based on the time stamp, the position information and the size information to generate a target video stream data packet; the video play request includes: position information and size information.
In one possible embodiment, the synthesis module 430 is specifically configured to:
based on the time stamps, synthesizing N second video stream data packets into a target video;
and encapsulating the target video into a target data packet with a preset format, so that the terminal can analyze the target data packet with the preset format under the condition that the terminal receives the target data packet to obtain the target video with the preset format.
In the embodiment of the present application, video play requests sent by N terminals are received, each video play request including a first video. In response to the video play requests, each first video is converted into a first video stream data packet that includes first video frames and a timestamp of each first video frame. Because the frame counts of the first videos sent by the terminals differ, transmitting them separately would occupy N sets of storage and transmission resources; therefore, the N first video stream data packets are converted into N second video stream data packets, in which the number of second video frames in any two packets is consistent. A target data packet is then synthesized from the N second video stream data packets based on the time stamps, so that only one storage resource and one transmission resource are needed instead of N. Outputting this single target data packet to the N terminals reduces network bandwidth consumption and improves video transmission efficiency.
Fig. 5 shows a schematic hardware structure of an electronic device according to an embodiment of the present application.
A processor 501 and a memory 502 storing computer program instructions may be included in an electronic device.
In particular, the processor 501 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing embodiments of the present application.
Memory 502 may include mass storage for data or instructions. By way of example, and not limitation, memory 502 may comprise a Hard Disk Drive (HDD), a floppy disk drive, flash memory, an optical disk, a magneto-optical disk, magnetic tape, or a Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 502 may include removable or non-removable (or fixed) media, where appropriate. Memory 502 may be internal or external to the electronic device, where appropriate. In a particular embodiment, the memory 502 is a non-volatile solid-state memory. In a particular embodiment, the memory 502 includes Read-Only Memory (ROM). Where appropriate, the ROM may be mask-programmed ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), Electrically Alterable ROM (EAROM), or flash memory, or a combination of two or more of these.
The processor 501 implements any one of the video processing methods of the illustrated embodiments by reading and executing computer program instructions stored in the memory 502.
In one example, the electronic device may also include a communication interface 505 and a bus 510. As shown in fig. 5, the processor 501, the memory 502, and the communication interface 505 are connected to each other by a bus 510 and perform communication with each other.
The communication interface 505 is mainly used for implementing communication between each module, apparatus, unit and/or device in the embodiment of the present application.
Bus 510 includes hardware, software, or both that couple the components of the electronic device to one another. By way of example, and not limitation, the bus may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), another suitable bus, or a combination of two or more of these. Bus 510 may include one or more buses, where appropriate. Although embodiments of the application are described and illustrated with respect to a particular bus, the application contemplates any suitable bus or interconnect.
The electronic device may perform the video processing method in the embodiment of the present application, thereby implementing the video processing method described in connection with fig. 2.
In addition, in connection with the video processing method in the above embodiments, an embodiment of the present application may provide a computer-readable storage medium. Computer program instructions are stored on the computer-readable storage medium; when executed by a processor, the computer program instructions implement the video processing method of fig. 1.
It should be understood that the application is not limited to the particular arrangements and instrumentality described above and shown in the drawings. For the sake of brevity, a detailed description of known methods is omitted here. In the above embodiments, several specific steps are described and shown as examples. The method processes of the present application are not limited to the specific steps described and shown, but various changes, modifications and additions, or the order between steps may be made by those skilled in the art after appreciating the spirit of the present application.
The functional blocks shown in the above structural block diagrams may be implemented in hardware, software, firmware, or a combination thereof. When implemented in hardware, it may be, for example, an electronic circuit, an Application-Specific Integrated Circuit (ASIC), suitable firmware, a plug-in, a function card, or the like. When implemented in software, the elements of the application are the programs or code segments used to perform the required tasks. The program or code segments may be stored in a machine-readable medium or transmitted over transmission media or communication links by a data signal carried in a carrier wave. A "machine-readable medium" may include any medium that can store or transfer information. Examples of machine-readable media include electronic circuitry, semiconductor memory devices, ROM, flash memory, Erasable ROM (EROM), floppy disks, CD-ROMs, optical disks, hard disks, fiber-optic media, Radio Frequency (RF) links, and the like. The code segments may be downloaded via computer networks such as the Internet or an intranet.
It should also be noted that the exemplary embodiments mentioned in this disclosure describe some methods or systems based on a series of steps or devices. The present application is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, or may be performed in a different order from the order in the embodiments, or several steps may be performed simultaneously.
In the foregoing, only the specific embodiments of the present application are described, and it will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, which are not repeated herein. It should be understood that the scope of the present application is not limited thereto, and any equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the present application, and they should be included in the scope of the present application.

Claims (10)

1. A method of video processing, the method comprising:
Receiving video playing requests sent by N terminals, wherein each video playing request comprises: the first video, N is a positive integer greater than 1;
Converting each first video into a first video stream data packet in response to the video playing request, wherein the first video stream data packet comprises a first video frame and a time stamp of each first video frame;
respectively converting the N first video stream data packets into N second video stream data packets; the number of second video frames in any two second video stream data packets is consistent;
synthesizing a target data packet according to the N second video stream data packets based on the time stamp;
And outputting the target data packet to N terminals.
2. The method of claim 1, wherein said converting each of said first videos into a first video stream packet in response to said video play request comprises:
splitting each first video into RTP data packets of a real-time transmission protocol, wherein the RTP data packets comprise: audio stream data packets, the first video stream data packets, and subtitle stream data packets.
3. The method of claim 1, wherein converting the N first video stream packets into N second video stream packets, respectively, comprises:
Analyzing the video playing request to obtain the preset number of video frames;
and for any one of the N first video stream data packets, processing the first video stream data packet according to the number of first video frames in the first video stream data packet and the preset number of video frames to obtain a second video stream data packet, wherein the second video stream data packet comprises second video frames with the preset number of video frames.
4. The method according to claim 3, wherein said processing the first video stream packet according to the number of the first video frames in the first video stream packet and the preset number of the video frames to obtain the second video stream packet includes:
Deleting H first video frames from the first video stream data packet to obtain the second video stream data packet under the condition that the number of the first video frames is larger than the preset number of the video frames; the H is the difference between the number of the first video frames and the preset video frame number;
Copying R first video frames into the first video stream data packet to obtain the second video stream data packet under the condition that the number of the first video frames is smaller than the preset video frame number; and R is the difference between the preset video frame number and the first video frame number.
5. The method according to claim 2, wherein synthesizing the target data packet from the N second video stream data packets based on the time stamps comprises:
generating a target video stream data packet according to the N second video stream data packets;
And generating a target data packet according to the target video stream data packet, the N audio stream data packets and the N subtitle stream data packets based on the time stamp.
6. The method of claim 5, wherein generating a target video stream packet from the N second video stream packets comprises:
splicing video frames in the N second video stream data packets based on the time stamp, the position information and the size information to generate one target video stream data packet; the video play request includes: the position information and the size information.
7. The method of claim 1, wherein synthesizing the target data packet from the N second video stream data packets based on the time stamps comprises:
Synthesizing N second video stream data packets into a target video based on the time stamps;
And packaging the target video into a target data packet with a preset format, so that the terminal can analyze the target data packet with the preset format under the condition that the terminal receives the target data packet to obtain the target video with the preset format.
8. A video processing apparatus, the video processing apparatus comprising:
The receiving module is used for receiving video playing requests sent by N terminals, and each video playing request comprises: the first video, N is a positive integer greater than 1;
The conversion module is used for responding to the video playing request, converting each first video into a first video stream data packet, wherein the first video stream data packet comprises first video frames and time stamps of each first video frame;
the conversion module is further configured to convert the N first video stream data packets into N second video stream data packets respectively; the number of second video frames in any two second video stream data packets is consistent;
The synthesizing module is used for synthesizing a target data packet according to the N second video stream data packets based on the time stamp;
and the output module is used for outputting the target data packet to N terminals.
9. An electronic device, the electronic device comprising: a processor and a memory storing computer program instructions; wherein the computer program instructions, when executed by the processor, implement the video processing method according to any one of claims 1-7.
10. A computer readable storage medium, having stored thereon computer program instructions which, when executed by a processor, implement the video processing method of any of claims 1-7.
CN202410172006.0A 2024-02-06 2024-02-06 Video processing method and device, electronic equipment and storage medium Pending CN118101992A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410172006.0A CN118101992A (en) 2024-02-06 2024-02-06 Video processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410172006.0A CN118101992A (en) 2024-02-06 2024-02-06 Video processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN118101992A true CN118101992A (en) 2024-05-28

Family

ID=91143361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410172006.0A Pending CN118101992A (en) 2024-02-06 2024-02-06 Video processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118101992A (en)

Similar Documents

Publication Publication Date Title
CN112822537B (en) Method, apparatus and medium for adapting video content to display characteristics
CN105765980B (en) Transmission device, transmission method, reception device, and reception method
US8457214B2 (en) Video compositing of an arbitrary number of source streams using flexible macroblock ordering
US11722636B2 (en) Transmission device, transmission method, reception device, and reception method
CN110661752A (en) Plug-in-free real-time video playing system and method
US11533522B2 (en) Transmission apparatus, transmission method, reception apparatus, and reception method
CN111818295B (en) Image acquisition method and device
CN111031389B (en) Video processing method, electronic device and storage medium
CN110662086A (en) 5G high-definition live broadcast system and video processing method
CN110602522A (en) Multi-path real-time live webRTC stream synthesis method
CN110996122B (en) Video frame transmission method, device, computer equipment and storage medium
CN115702562A (en) Video throughput improvement using long-term referencing, deep learning, and load balancing
CN114500914A (en) Audio and video forwarding method, device, terminal and system
CN112804471A (en) Video conference method, conference terminal, server and storage medium
CN118101992A (en) Video processing method and device, electronic equipment and storage medium
WO2022116822A1 (en) Data processing method and apparatus for immersive media, and computer-readable storage medium
CN111385081B (en) End-to-end communication method and device, electronic equipment and medium
CN112565799B (en) Video data processing method and device
CN103959796A (en) Digital video code stream decoding method, splicing method and apparatus
CN113645485A (en) Method and device for realizing conversion from any streaming media protocol to NDI (network data interface)
CN114600468B (en) Combiner system, receiver device, computer-implemented method and computer-readable medium for combining video streams in a composite video stream with metadata
CN113727183B (en) Live push method, apparatus, device, storage medium and computer program product
CN110858916B (en) Identification method and system supporting large-span correlation information coding
CN101814969A (en) Method and system for reducing bit stream and electronic device
CN117768687A (en) Live stream switching method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination