CN112073810B - Multi-layout cloud conference recording method and system and readable storage medium - Google Patents


Info

Publication number
CN112073810B
Authority
CN
China
Prior art keywords: data, layout, yuv, source video, time
Legal status
Active
Application number
CN202011274901.1A
Other languages
Chinese (zh)
Other versions
CN112073810A (en)
Inventor
唐国华
Current Assignee
G Net Cloud Service Co Ltd
Original Assignee
G Net Cloud Service Co Ltd
Application filed by G Net Cloud Service Co Ltd
Priority to CN202011274901.1A
Publication of CN112073810A
Application granted
Publication of CN112073810B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N 21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N 21/440218: Reformatting operations by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • H04N 21/60: Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
    • H04N 21/61: Network physical structure; Signal processing
    • H04N 21/6156: Signal processing specially adapted to the upstream path of the transmission network
    • H04N 21/6175: Upstream signal processing involving transmission via Internet
    • H04N 21/63: Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N 21/64: Addressing
    • H04N 21/6405: Multicasting

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention provides a multi-layout cloud conference recording method, system, and readable storage medium. The method comprises the following steps: acquiring a plurality of source video data; decoding the source video data to obtain YUV data corresponding to each source video data; synthesizing the plurality of YUV data according to a preset layout rule to obtain first YUV data; encoding the first YUV data to obtain encoded source video data; and generating a video file from the encoded source video data. By decoding the different source video data into YUV, synthesizing them according to the layout rule, and then encoding the result to generate the final video file, the invention speeds up video synthesis and simplifies the generation process. It is particularly suitable for situations where the number of videos is large, various layout changes must be accommodated, or other more complex conditions apply.

Description

Multi-layout cloud conference recording method and system and readable storage medium
Technical Field
The present application relates to the field of video processing, and more particularly, to a multi-layout cloud conference recording method, system, and readable storage medium.
Background
A cloud conference is an efficient, convenient, and low-cost conference form based on cloud computing technology. Participants around the world can share audio, video, terminal desktops, documents, whiteboards, and the like through terminals such as telephones, mobile phones, computers, and dedicated devices, enabling remote communication and remote assistance.
MP4 cloud recording means that the entity image data and audio data collected by remote clients are received at the cloud server and, after cloud-computing processing, packaged into a playable MP4 media file according to the MP4 standard. The data that clients collect, process, and send to the server is the source data of the MP4. Here a source is defined as one instance of the audio, video, desktop, document, whiteboard, and similar content shared during the cloud conference; source data is the data generated by these sources, which is also the data received by the service. For example, the data collected by one camera corresponds to one source, and the data collected by several cameras corresponds to several sources. Generating a single-source MP4 is simple: image data is collected and encoded into H264 format, and the H264 data is written into an MP4 file according to the MP4 standard, so data collected from multiple sources can produce multiple MP4 files. To fuse the data of multiple sources into one MP4 file, existing technology such as ffmpeg commands or the Python video-editing library moviepy can post-process the several media files and merge them into a single playable media file, which is entirely feasible. However, this process is complex and slow and produces large files; it is especially unsuitable when the number of videos is large and various layout changes must be accommodated, or in other more complex situations.
Multi-layout means that, with multiple sources, the position at which each source is displayed on the screen can change at any time, and each source's image can be inserted into or removed from the displayed layout at any time. Recording means restoring the scene at playback: the pictures of the multiple sources must stay synchronized with one another and with the audio. Achieving correct playback of multiple sources under multiple layouts in the most direct and convenient form requires a special technical solution that produces a media file, such as an MP4 file, playable by a regular player.
In a cloud conference, not only multiple video sources but also desktops, documents, whiteboards, annotations, speaker voice, computer playback audio, and other data take part in the fusion, which is an even more complex scene than multi-source video fusion.
Disclosure of Invention
In order to solve at least one technical problem, the invention provides a multi-layout cloud conference recording method, a multi-layout cloud conference recording system and a readable storage medium.
The invention provides a multi-layout cloud conference recording method in a first aspect, which comprises the following steps:
acquiring a plurality of source video data;
decoding the source video data to obtain YUV data corresponding to each source video data;
synthesizing the multiple YUV data according to a preset layout rule to obtain first YUV data;
encoding the first YUV data to obtain encoded source video data;
and generating a video file according to the coded source video data.
In this scheme, the synthesizing of the plurality of YUV data according to a preset layout rule specifically includes:
acquiring a serial number of a preset layout rule;
searching a preset layout rule according to the serial number of the layout rule;
determining the number of windows and coordinate data information corresponding to each window according to the preset layout rule;
determining YUV data corresponding to each source video data in each window;
and synthesizing the YUV data corresponding to each source video data according to the position determined by the coordinate data information of each window to obtain first YUV data.
In this scheme, the method further includes:
judging whether a preset layout rule is acquired before a plurality of source video data;
if not, determining key frame data of the plurality of source video data and a first layout rule of the corresponding source video data;
and synthesizing the YUV data corresponding to each source video data according to the first layout rule, and storing the YUV data until a preset layout rule is obtained.
In this scheme, the method further includes:
detecting whether the plurality of source video data are one or more of documents, white boards and annotations;
if yes, generating image data from the source video data according to a time sequence;
and converting the image data into corresponding YUV data.
In this scheme, synthesizing the plurality of YUV data according to a preset layout rule further includes:
calculating the time interval between the current first YUV data and the last generated first YUV data;
judging whether the time interval is greater than a minimum generation time;
if the time interval is greater than the minimum generation time, triggering encoding of new first YUV data to obtain encoded source video data;
and if the time interval is less than the minimum generation time, storing the YUV data corresponding to each source video data.
In this scheme, the method further includes:
acquiring recording start time and recording end time;
generating a time space of a preset unit according to the starting time and the ending time;
forming a mapping relation between the time space and the corresponding data space;
and generating the coded source video data through the mapping relation.
In this scheme, the method further includes:
processing the encoding of the first YUV data by adopting a first thread to obtain encoded source video data;
processing the audio information by adopting a second thread to obtain encoded audio data;
detecting whether the first thread finishes the encoding of the first YUV data or not;
if so, merging the encoded source video data and the encoded audio data in the second thread to obtain a video file.
In a second aspect of the present invention, a multi-layout cloud conference recording system includes a memory and a processor, where the memory includes a multi-layout cloud conference recording method program, and when executed by the processor, the multi-layout cloud conference recording method program implements the following steps:
acquiring a plurality of source video data;
decoding the source video data to obtain YUV data corresponding to each source video data;
synthesizing the multiple YUV data according to a preset layout rule to obtain first YUV data;
encoding the first YUV data to obtain encoded source video data;
and generating a video file according to the coded source video data.
In this scheme, the synthesizing of the plurality of YUV data according to a preset layout rule specifically includes:
acquiring a serial number of a preset layout rule;
searching a preset layout rule according to the serial number of the layout rule;
determining the number of windows and coordinate data information corresponding to each window according to the preset layout rule;
determining YUV data corresponding to each source video data in each window;
and synthesizing the YUV data corresponding to each source video data according to the position determined by the coordinate data information of each window to obtain first YUV data.
In this scheme, the steps further include:
judging whether a preset layout rule is acquired before a plurality of source video data;
if not, determining key frame data of the plurality of source video data and a first layout rule of the corresponding source video data;
and synthesizing the YUV data corresponding to each source video data according to the first layout rule, and storing the YUV data until a preset layout rule is obtained.
In this scheme, the steps further include:
detecting whether the plurality of source video data are one or more of documents, white boards and annotations;
if yes, generating image data from the source video data according to a time sequence;
and converting the image data into corresponding YUV data.
In this scheme, synthesizing the plurality of YUV data according to a preset layout rule further includes:
calculating the time interval between the current first YUV data and the last generated first YUV data;
judging whether the time interval is greater than a minimum generation time;
if the time interval is greater than the minimum generation time, triggering encoding of new first YUV data to obtain encoded source video data;
and if the time interval is less than the minimum generation time, storing the YUV data corresponding to each source video data.
In this scheme, the steps further include:
acquiring recording start time and recording end time;
generating a time space of a preset unit according to the starting time and the ending time;
forming a mapping relation between the time space and the corresponding data space;
and generating the coded source video data through the mapping relation.
In this scheme, the steps further include:
processing the encoding of the first YUV data by adopting a first thread to obtain encoded source video data;
processing the audio information by adopting a second thread to obtain encoded audio data;
detecting whether the first thread finishes the encoding of the first YUV data or not;
if so, merging the encoded source video data and the encoded audio data in the second thread to obtain a video file.
A third aspect of the present invention provides a computer-readable storage medium, where the computer-readable storage medium includes a multi-layout cloud conference recording method program, and when the multi-layout cloud conference recording method program is executed by a processor, the steps of the multi-layout cloud conference recording method described in any one of the above are implemented.
According to the multi-layout cloud conference recording method, system, and readable storage medium provided by the invention, different source video data are converted to YUV, synthesized according to the layout rule, and then encoded to generate the final video file, which speeds up video synthesis and simplifies the generation process. The method is applicable to situations where the number of videos is large, various layout changes must be accommodated, or other more complex conditions apply.
Drawings
Fig. 1 shows a flowchart of a multi-layout cloud conference recording method according to the present invention;
FIG. 2 shows a schematic diagram of different sources after synthesis by layout according to the present invention;
FIG. 3 is a schematic diagram illustrating the layout rules of the present invention;
fig. 4 shows a block diagram of a multi-layout cloud conference recording system according to the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
YUV: a color space used in video images, with three components Y, U, and V. Y denotes brightness (Luma), i.e., the gray value; U and V denote chroma (Chroma), which describes the color and saturation of the image and specifies the color of each pixel.
The P in YUV420P: planar storage in the YUV format; the Y values of all pixels are stored contiguously, followed by the U values of all pixels, and then the V values of all pixels.
H264: a video codec standard; H264 data is data encoded and decoded according to this standard.
FFMPEG: a suite of open-source libraries for recording and converting digital audio and video, and for turning them into streams.
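To make the YUV420P storage format concrete, the plane sizes and offsets of one frame can be computed as in the following C++ sketch (an illustration of the format described above, not code from the patent):

    #include <cstddef>

    // Plane sizes and offsets of one YUV420P frame of width w and height h:
    // Y is stored first at full resolution, then U, then V at quarter resolution.
    struct Yuv420pOffsets {
        std::size_t ySize;   // w * h luma samples
        std::size_t uSize;   // (w / 2) * (h / 2) chroma samples per plane
        std::size_t uOffset; // U plane starts right after Y
        std::size_t vOffset; // V plane starts right after U
        std::size_t total;   // whole frame: w * h * 3 / 2 bytes
    };

    Yuv420pOffsets yuv420pOffsets(std::size_t w, std::size_t h) {
        Yuv420pOffsets o;
        o.ySize   = w * h;
        o.uSize   = (w / 2) * (h / 2);
        o.uOffset = o.ySize;
        o.vOffset = o.ySize + o.uSize;
        o.total   = o.ySize + 2 * o.uSize;
        return o;
    }

For a 1280 × 720 frame this gives a 921600-byte Y plane followed by two 230400-byte chroma planes, 1382400 bytes in total.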
Fig. 1 shows a flowchart of a multi-layout cloud conference recording method according to the present invention.
As shown in fig. 1, the invention discloses a multi-layout cloud conference recording method, which includes:
s102, acquiring a plurality of source video data;
s104, decoding the source video data to obtain YUV data corresponding to each source video data;
s106, synthesizing the multiple YUV data according to a preset layout rule to obtain first YUV data;
s108, encoding the first YUV data to obtain encoded source video data;
and S110, generating a video file according to the coded source video data.
It should be noted that the various embodiments provided by the present invention serve to better explain the inventive concept and show how to generate a playable recording playback file in this scenario; the MP4 file is used as an example, but the invention is not limited to MP4 files.
It should be noted that the source video data may be one or more of video data, desktop data, document data, whiteboard data, and annotation data.
Fig. 2 shows a schematic diagram of different sources after synthesis by layout according to the present invention. The data unit constituting the MP4 picture is one frame of H264 data. With multiple layout windows, one frame of H264 data merges the picture information of all sources in the current layout; displayed as one picture, the sources are distributed according to the positions divided by the layout.
A frame of H264 data is not simply padded, position by position, with each source's H264 data; it is obtained through a complete decoding and encoding process. H264 data is obtained by encoding YUV data, and YUV data can in turn be obtained by decoding H264 data. Each specific YUV type corresponds to a fixed data storage format; therefore, to obtain a customized H264 data frame, customized YUV data is needed. The present invention uses the YUV420P storage method. A space of fixed pixel size is allocated in advance, and the storage positions of the three components Y, U, and V are calculated within this pixel space according to the YUV420P storage format. This fixed pixel space corresponds to the total layout space into which the multiple sources are to be synthesized, and is hereafter collectively called the YUV data layout, or YUV layout. When the H264 data of each source arrives, each frame of each source's H264 data is decoded into YUV data, the YUV data is mapped into the YUV layout according to the source's window position in the layout, and each YUV component is copied to its mapped position. The result of mapping multiple sources is thus the YUV distribution of those sources within the YUV layout space; in other words, the YUV of the multiple sources is synthesized into one large YUV, the first YUV data. Finally, the first YUV data is encoded into one frame of H264 data and output, and the resulting picture is the total picture of the multiple sources' pictures distributed across the large layout window.
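The component-by-component copy described above can be sketched as follows (a minimal illustration, not the patent's actual code; it assumes each source frame has already been scaled to its window size, for example with FFmpeg's sws_scale, that planes are tightly packed, and that window positions and sizes are even):

    #include <cstdint>
    #include <cstring>

    // Copy one plane row by row from a source frame into a rectangular
    // region of the layout buffer. Strides are in bytes per row.
    static void copyPlane(const uint8_t* src, int srcStride,
                          uint8_t* dst, int dstStride,
                          int dstX, int dstY, int w, int h) {
        for (int row = 0; row < h; ++row) {
            std::memcpy(dst + (dstY + row) * dstStride + dstX,
                        src + row * srcStride, w);
        }
    }

    // Map a decoded YUV420P source frame of size w x h into the layout's
    // YUV420P buffer at window position (x, y).
    void mapSourceIntoLayout(const uint8_t* srcY, const uint8_t* srcU,
                             const uint8_t* srcV, int w, int h,
                             uint8_t* layY, uint8_t* layU, uint8_t* layV,
                             int layoutW, int x, int y) {
        copyPlane(srcY, w,     layY, layoutW,     x,     y,     w,     h);     // luma, full resolution
        copyPlane(srcU, w / 2, layU, layoutW / 2, x / 2, y / 2, w / 2, h / 2); // chroma, quarter resolution
        copyPlane(srcV, w / 2, layV, layoutW / 2, x / 2, y / 2, w / 2, h / 2);
    }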
According to the embodiment of the present invention, the synthesizing of the plurality of YUV data according to the preset layout rule specifically includes:
acquiring a serial number of a preset layout rule;
searching a preset layout rule according to the serial number of the layout rule;
determining the number of windows and coordinate data information corresponding to each window according to the preset layout rule;
determining YUV data corresponding to each source video data in each window;
and synthesizing the YUV data corresponding to each source video data according to the position determined by the coordinate data information of each window to obtain first YUV data.
It should be noted that, as described above, one frame of H264 data presents multiple sources distributed across multiple small windows in a specific layout form. The distribution of these windows changes and can be adjusted at any time during the cloud conference, so each frame of H264 data generated during recording must respond to the latest layout change and be adjusted accordingly. The layout form is defined by the client and transmitted to the server as a message according to a specific protocol; the server only needs to interpret, according to the agreed convention, which layout the client has sent, and then control the generation of each frame of data according to the layout changes.
Fig. 3 shows a schematic diagram of the layout rules of the present invention.
As shown in FIG. 3, the server first knows all the layouts that have been defined, assigns each layout a layout name, and agrees with the client on the window ID of each layout window. According to the window presentation form, the server calculates the position of each small layout window relative to the total layout window. Relative positions are recorded as the relative coordinates of the starting point plus a relative width and height, where each value represents a proportion of the total layout's width or height; the real coordinates and real width and height are finally obtained by multiplying these values by the width and height of the total layout window.

Take the evenly distributed four-window "average4" layout as an example. Its first small window has starting coordinates (0, 0), a relative width of 0.5, and a relative height of 0.5. The second small window has starting coordinates (0.5, 0), a relative width of 0.5, and a relative height of 0.5. Likewise, the starting coordinates of the third small window are (0, 0.5) and those of the fourth are (0.5, 0.5); since this is an equally divided layout, the relative width and height of every window are 0.5. If the total layout window is 1280 × 720, i.e., a total width of 1280 and a total height of 720, the true starting coordinates of the third window are (0, 720 × 0.5 = 360), and all four windows have width 1280 × 0.5 = 640 and height 720 × 0.5 = 360.

Of course, absolute values could be configured for each layout window; relative values are used instead because the size of the total layout window can be adjusted at any time. For example, a total layout configured as 1280 × 720 may be changed to 1920 × 1080 whenever required. Each layout window is calculated in advance and written to a configuration file readable by the service. These configurations are loaded into the program's memory when the service starts; when layout information uploaded by a client is received, the uploaded layout name is parsed, the relative coordinates of the corresponding windows are found in the loaded configuration for that layout name, and the absolute coordinates are obtained by calculation.
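A sketch of this relative-to-absolute conversion, using the average4 example (the structures and field names are illustrative, not the patent's configuration format):

    #include <vector>

    // A layout window described by relative coordinates, as loaded from
    // the configuration file described above.
    struct RelWindow { int id; double x, y, w, h; };
    struct AbsWindow { int id; int x, y, w, h; };

    // Convert the relative windows of a layout to pixel coordinates for
    // a total layout window of totalW x totalH.
    std::vector<AbsWindow> toAbsolute(const std::vector<RelWindow>& rel,
                                      int totalW, int totalH) {
        std::vector<AbsWindow> out;
        for (const RelWindow& r : rel) {
            out.push_back({r.id,
                           static_cast<int>(r.x * totalW),
                           static_cast<int>(r.y * totalH),
                           static_cast<int>(r.w * totalW),
                           static_cast<int>(r.h * totalH)});
        }
        return out;
    }

    // The evenly divided four-window "average4" layout from the example.
    const std::vector<RelWindow> average4 = {
        {1, 0.0, 0.0, 0.5, 0.5}, {2, 0.5, 0.0, 0.5, 0.5},
        {3, 0.0, 0.5, 0.5, 0.5}, {4, 0.5, 0.5, 0.5, 0.5},
    };

Calling toAbsolute(average4, 1280, 720) reproduces the numbers above: window 3 starts at (0, 360) and every window becomes 640 × 360.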
After the position of each window of the layout is determined, the YUV data of each source is mapped into the whole large layout window in program memory. This large layout window is uniformly called the YUV data layout, abbreviated as YUV layout; it is the intermediate data that carries all source data and determines the form of the finally output picture. Which source corresponds to which small window is established when the client sends layout information upon a layout change. Once the layout is stable, the window corresponding to each source is fixed and its display position is fixed; only when the layout changes again is the source's display position recalculated and adjusted.
According to the embodiment of the invention, the method further comprises the following steps:
judging whether a preset layout rule is acquired before a plurality of source video data;
if not, determining key frame data of the plurality of source video data and a first layout rule of the corresponding source video data;
and synthesizing the YUV data corresponding to each source video data according to the first layout rule, and storing the YUV data until a preset layout rule is obtained.
It should be noted that video and desktop are the two most important sources in the source data. When the layout is in place, if a complete frame of H264 video or desktop data arrives, it only needs to be decoded into YUV and then placed into the overall large YUV layout. If the H264 data has been split into multiple packets, a framing operation is needed first to reassemble the incoming packets into one frame of H264 data before YUV decoding. If the incoming data is JPEG data, it is converted to YUV data by calling an interface provided by FFMPEG; data in other coding formats likewise needs YUV conversion. Each converted frame of YUV data is one synthesis unit in the total YUV layout. Video and desktop data also require some layout-related exception handling. The layout is the place where the data lives: without a layout, arriving data cannot find a storage position. Ideally the layout information arrives before the data, but this cannot be guaranteed; one option is to discard the data and wait for the layout. In network transmission, message loss and the ordering of messages and data cannot be fully guaranteed, so some abnormal situations must be handled. If a key frame arrives but the layout message has been lost or has not yet arrived, the data actively triggers creation of a temporary layout for that source as a temporary storage place, which is overwritten when the real layout arrives. Since a key frame is required as a precondition for decoding and encoding a frame of data, when the first arriving frame is not a key frame, all non-key frames before the next key frame must be discarded.
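The exception handling just described can be sketched as follows (the types and helper functions are hypothetical placeholders, not the patent's actual code):

    // Sketch of layout/key-frame exception handling for one source.
    struct Frame  { bool isKeyFrame = false; /* plus H264 payload */ };
    struct Layout { int windowId = -1;       /* plus window rect  */ };
    struct Source {
        int id = 0;
        bool hasLayout = false;   // layout message received for this source?
        bool sawKeyFrame = false; // decoder primed with a key frame?
        Layout layout;
    };

    Layout makeTemporaryLayout(int /*sourceId*/) { return Layout{}; } // stub
    void decodeAndMapIntoLayout(Source&, const Frame&) {}             // stub

    bool onVideoFrame(Source& src, const Frame& frame) {
        if (!src.hasLayout) {
            // Layout message lost or not yet arrived: the data actively
            // creates a temporary layout as a storage place; it will be
            // overwritten when the real layout message arrives.
            src.layout = makeTemporaryLayout(src.id);
            src.hasLayout = true;
        }
        if (!src.sawKeyFrame) {
            if (!frame.isKeyFrame)
                return false;       // drop non-key frames until a key frame
            src.sawKeyFrame = true; // decoding can start from here
        }
        decodeAndMapIntoLayout(src, frame);
        return true;
    }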
According to the embodiment of the invention, the method further comprises the following steps:
detecting whether the plurality of source video data are one or more of documents, white boards and annotations;
if yes, generating image data from the source video data according to a time sequence;
and converting the image data into corresponding YUV data.
Document. During a cloud conference, a document is uploaded to the cloud and printed into JPEG pictures for storage. Every page turn of the document changes the layout once, and the layout carries which page the document is currently turned to; the corresponding JPEG picture is read from disk using this page information. Because a document generally has a scroll bar, the layout information also gives the display proportion of the current document page. The JPEG picture read into memory is then cropped according to this proportion, converted into YUV data, and mapped into the document's layout window.
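A minimal sketch of the proportional crop described above, under the assumption that the layout message carries the visible vertical range of the page as two proportions (the field names and their meaning are assumptions made for illustration):

    // Crop rectangle for the visible part of a document page image of
    // width w and height h, given the display proportion from the layout
    // message (e.g. topRatio 0.0 and bottomRatio 0.4 when the top 40% of
    // the page is visible).
    struct CropRect { int x, y, w, h; };

    CropRect visibleRegion(int w, int h, double topRatio, double bottomRatio) {
        CropRect r;
        r.x = 0;
        r.y = static_cast<int>(h * topRatio);
        r.w = w;                                           // full page width
        r.h = static_cast<int>(h * (bottomRatio - topRatio));
        return r;  // this region is then converted to YUV and mapped
    }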
Whiteboard. The whiteboard consists of specific point-line-surface data produced during the cloud conference. The whiteboard data received by the server are structures recording whiteboard information, including the size, shape, coordinates, and colors of the points, lines, and surfaces. Before any such whiteboard information arrives, a whiteboard YUV buffer of fixed size and white background color, i.e., a blank whiteboard, can be created. While the cloud conference whiteboard is being drawn, every stroke of every pen is transmitted to the server in real time, and each piece of data is stored cumulatively in the recording server. Whenever a piece of whiteboard information reaches the server, a redraw of the whiteboard is triggered: each redraw regenerates a new whiteboard YUV from all currently existing information plus a blank whiteboard, and then maps this whiteboard YUV into the layout window. The whiteboard has pages, and the whiteboard information of each page is kept in memory; when page-turn information is received, all the whiteboard information of that page is taken out to regenerate YUV data, which is then mapped into the layout, covering the data of the previous page.
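The accumulate-and-redraw behavior described above might look like the following sketch (class and member names are illustrative; the actual stroke rendering and layout mapping are omitted):

    #include <cstdint>
    #include <map>
    #include <utility>
    #include <vector>

    // One stroke of whiteboard point-line-surface data; the field set
    // is illustrative of the structure described above.
    struct Stroke {
        int shape;
        uint32_t color;
        std::vector<std::pair<int, int>> points;
    };

    // Per-page stroke history: every stroke is accumulated, and each
    // update or page turn redraws the whole page.
    class Whiteboard {
    public:
        void addStroke(int page, const Stroke& s) {
            pages_[page].push_back(s);
            redraw(page);               // each new stroke triggers a redraw
        }
        void turnToPage(int page) { redraw(page); }
    private:
        void redraw(int /*page*/) {
            // Re-render a blank white YUV canvas plus all strokes of this
            // page, then map the result into the whiteboard's layout
            // window (rendering and mapping omitted in this sketch).
        }
        std::map<int, std::vector<Stroke>> pages_;
    };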
Annotation. The content of an annotation is whiteboard content; an annotation differs from a whiteboard in that its drawing background is not a blank whiteboard but a background image with content. For example, a desktop annotation uses as background the frame of desktop image selected at annotation time, and a document annotation uses the current document page as background. The background YUV must therefore be acquired before the annotation YUV is generated. To generate a desktop annotation, the current frame of desktop YUV data is needed; but the YUV data in the layout is already synthesized YUV and cannot be extracted, so the frame of desktop YUV data must be saved for annotation use before the desktop data is mapped into the layout. Similarly, when a document's JPEG picture is loaded and converted to YUV, one frame of YUV must be saved. When the annotation YUV is generated, the background YUV data is taken out and redrawn together with the whiteboard point-line-surface information to synthesize a new YUV, which is then mapped into the overall layout, replacing the original annotation background at its position.
According to the embodiment of the present invention, synthesizing the plurality of YUV data according to a preset layout rule further includes:
calculating the time interval between the current first YUV data and the last generated first YUV data;
judging whether the time interval is greater than a minimum generation time;
if the time interval is greater than the minimum generation time, triggering encoding of new first YUV data to obtain encoded source video data;
and if the time interval is less than the minimum generation time, storing the YUV data corresponding to each source video data.
It should be noted that the method for determining the minimum time specifically includes:
acquiring the frame rate of each source video data;
and determining the time corresponding to the maximum frame rate as the minimum time.
Synthesis frame rate. In the cloud recording process, frames of H264 data need to be arranged on a time axis at a certain frame rate, so a fixed scheme is needed to continuously generate H264 data along the time axis. The interval at which one frame of H264 data is synthesized determines the synthesis frame rate.
The synthesis frame rate is the frequency at which the YUV in the layout is synthesized into one frame of H264 data. At a frame rate of 10 frames/second, the YUV layout data described above generates one frame of H264 data every 100 milliseconds. Given this concept, the first idea is to set a timer: configure a frame rate, calculate how many milliseconds one frame takes at that rate, and let the timer fire at that interval.
For a cloud conference with multiple sources, each with a different frame rate, the timer method has drawbacks and should be used selectively. Suppose the source data is 15 frames/second video. If a rate of 10 frames/second is set for generating H264 data, 5 frames of video are lost every second: they are lost because the time to generate the next H264 frame has not yet arrived, so the data stored in the layout is overwritten by the next mapping. If 15 frames/second is set instead, every source below that rate is forced up to this synthesis frame rate; a whiteboard, for example, may produce one frame per second for a time, and forcing it to 15 frames/second means one whiteboard frame has to be reused 15 times. Moreover, 15 frames/second, i.e., 66 milliseconds per frame, is an average: video does not arrive strictly every 66 milliseconds, but sometimes after 50 or 60 milliseconds and sometimes after 70 or 80. Among the various sources, the question then becomes whether to set the timer to 50, 60, 70, or 80 milliseconds; it is difficult to choose a proper timing value. When setting the frame rate, if it follows the source with the highest frame rate, the YUV data of low-frame-rate sources in the layout is reused continually, which both increases encoding time (reducing encoding efficiency) and enlarges the file (occupying more storage space). If the configured frame rate is lower than a source's frame rate, that source's data is lost.
The invention therefore mainly generates H264 data in a data-driven manner, without setting a timer. A minimum time for generating one frame of data is derived from the frame rate. For all sources in the current layout, every time data is mapped into the layout, the difference between the time of the current data and the time of the last generation is calculated; if it is greater than the minimum generation time, generation of a new frame of H264 data is triggered, and if it is smaller, the data is stored in the YUV layout to await the next data. Since each type of source has a different frame rate during the cloud conference, e.g., 15 frames/second for video, 8 frames/second for desktop, and lower still for document and whiteboard, the data-driven method suits the synthesis frame rate well. With this method, the minimum synthesis interval is set to the frame time of the source with the maximum frame rate, e.g., 66 milliseconds for 15 frames/second video. Thus, when video is present among the sources, each arrival of video data triggers synthesis of an H264 frame as long as the computed interval between two adjacent arrivals is not less than 66 milliseconds. Because the lower bound is the frame time of the fastest source, data from slower sources arrives at intervals substantially larger than this bound and therefore also triggers synthesis of H264 data.
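A sketch of this data-driven trigger (names are illustrative; 66 ms corresponds to the 15 frames/second example above):

    #include <cstdint>

    // Data-driven composition: no timer. Every mapped frame checks how
    // long it has been since the last composite frame was generated.
    class Compositor {
    public:
        explicit Compositor(int64_t minIntervalMs) : minIntervalMs_(minIntervalMs) {}

        // Called after each incoming frame is mapped into the YUV layout;
        // nowMs is the server receive time of the frame in milliseconds.
        void onFrameMapped(int64_t nowMs) {
            if (nowMs - lastCompositeMs_ >= minIntervalMs_) {
                encodeLayoutAsH264Frame();   // emit one composite H264 frame
                lastCompositeMs_ = nowMs;
            }
            // otherwise the data just stays in the YUV layout and waits
        }
    private:
        void encodeLayoutAsH264Frame() { /* encode first YUV data; omitted */ }
        int64_t minIntervalMs_;
        int64_t lastCompositeMs_ = 0;
    };

Constructed as Compositor c(66), any mapped frame arriving at least 66 ms after the last composite triggers one new H264 frame; slower sources naturally exceed the bound and also trigger synthesis.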
According to the embodiment of the invention, the minimum frame rate of the encoded source video data is preset.
When multiple sources coexist in a layout, the synthesis frame rate follows the source with the highest frame rate: with video, desktop, document, and whiteboard present, it is the video frame rate; without video it is determined by the desktop frame rate; and without a desktop, document and whiteboard are synthesized at an unfixed rate. When only a document and a whiteboard exist, a minimum synthesis frame rate can be set to avoid player compatibility problems caused by a frame rate that is too low. For example, with a minimum synthesis frame rate of 3 frames/second, when document or whiteboard data arrives, the time of the last synthesis is compared and a frame-padding operation is performed according to the time difference: if the data arrives after 1 second, 2 extra frames must be padded. The two padded frames repeat the data already in the YUV layout; that is, before the current data is synthesized and output, the previous YUV layout data is used first, the timestamps of the padded frames are set to their corresponding positions on the time axis, and H264 data is synthesized and output from the same data with different timestamps.
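The padding logic of the 3 frames/second example can be sketched as follows (the emit helpers are hypothetical stand-ins for the encoding calls):

    #include <cstdint>

    void emitPreviousLayoutWithPts(int64_t /*ptsMs*/) { /* reuse YUV layout data */ }
    void emitCurrentLayoutWithPts(int64_t /*ptsMs*/)  { /* synthesize new data  */ }

    // Frame padding at a minimum synthesis frame rate; intervalMs is
    // ~333 ms for 3 frames/second.
    void onSlowSourceData(int64_t nowMs, int64_t& lastMs, int64_t intervalMs) {
        // Frames missing in the gap beyond the one about to be emitted:
        // a 1000 ms gap at ~333 ms/frame needs 2 padded frames.
        int64_t missing = (nowMs - lastMs) / intervalMs - 1;
        for (int64_t i = 1; i <= missing; ++i)
            emitPreviousLayoutWithPts(lastMs + i * intervalMs); // repeat old layout
        emitCurrentLayoutWithPts(nowMs);                        // then the new frame
        lastMs = nowMs;
    }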
According to the embodiment of the invention, the method further comprises the following steps:
acquiring recording start time and recording end time;
generating a time space of a preset unit according to the starting time and the ending time;
forming a mapping relation between the time space and the corresponding data space;
and generating the coded source video data through the mapping relation.
It should be noted that the MP4 generated by recording the cloud conference must play multiple sources synchronously, which requires setting an accurate presentation timestamp (PTS) when generating each H264 frame. In the present invention, the PTS of each frame of H264 data is calculated with reference to the millisecond time at which the server receives each piece of data. The time at which the start-recording message is received is taken as the start time, and the time of stopping recording or ending the conference as the end time; the time space from start to end is mapped into a simple data space, and the arrival times of all subsequent data at the server are mapped into this data space with their time intervals preserved. The preset unit can be chosen by those skilled in the art according to actual needs; milliseconds, for example. If the time space is (1599469826000, 1599469886000), then 1599469886000 - 1599469826000 = 60000, and the recording is mapped to the data space (0, 60000), or (1, 60001). When there is a pause in the middle of recording, the pause duration is subtracted from the simple data space: for a half-minute pause, half a minute = 30000 milliseconds, and 60001 - 30000 = 30001, so the data space is compressed to (1, 30001), and the times of data arriving after the half-minute pause are likewise compressed by 30000. The PTS of each piece of data thus corresponds to its arrival time at the server, and sources that arrive within the same frame interval are synthesized and displayed together; that is, each frame seen when the resulting MP4 plays contains the data that arrived within that frame's time. By taking the server receive time as the standard, the data of multiple sources is displayed synchronously, realizing synchronized playback among the videos of the multi-source layout.
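A sketch of this time-space mapping with pause compression (an illustrative class; all times in milliseconds):

    #include <cstdint>

    // Maps server receive times to PTS values on the recording's own
    // time axis, subtracting paused intervals as described above.
    class PtsMapper {
    public:
        explicit PtsMapper(int64_t recordStartMs) : startMs_(recordStartMs) {}

        void pause(int64_t atMs)  { pausedAtMs_ = atMs; }
        void resume(int64_t atMs) { pausedTotalMs_ += atMs - pausedAtMs_; }

        // PTS of data received at serverMs: offset from the start time,
        // with all completed pauses compressed out of the time axis.
        int64_t toPts(int64_t serverMs) const {
            return (serverMs - startMs_) - pausedTotalMs_;
        }
    private:
        int64_t startMs_;
        int64_t pausedAtMs_ = 0;
        int64_t pausedTotalMs_ = 0;
    };

With PtsMapper m(1599469826000) and one 30000 ms pause recorded via pause/resume, data received at 1599469886000 gets PTS 30000, matching the compressed data space of the example.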
According to the embodiment of the invention, the method further comprises the following steps:
processing the encoding of the first YUV data by adopting a first thread to obtain encoded source video data;
processing the audio information by adopting a second thread to obtain encoded audio data;
detecting whether the first thread finishes the encoding of the first YUV data or not;
if so, merging the encoded source video data and the encoded audio data in the second thread to obtain a video file.
It should be noted that the above mainly describes generation of the MP4 picture; besides the video picture, the MP4 file also needs audio. Both audio and video processing in the cloud conference are complex encoding and decoding processes. In the prior art, FFmpeg provides a way to merge audio and video when generating an MP4: when the MP4 file is created, an audio stream and a video stream are created at the same time, and data is then written into each stream. However, when recording MP4 in a cloud conference, audio and video processing are both time-consuming and need to run in two different threads. With multiple threads accessing one file stream simultaneously, the stream would have to be locked, and the resulting copy control would also reduce efficiency. The method used by the invention is to dedicate one thread to audio processing and another to H264 encoding of the video. Because video processing is generally more time-consuming than audio processing, stream creation and merging are placed in the audio thread, which writes the MP4 file; the encoded video must therefore be handed over to the audio thread. A video queue is designed inside the audio thread: each time video encoding completes, the data is pushed into the queue, and each time audio encoding completes and is written, the video queue is checked. If data is present, the video data is written into the MP4's video stream and the audio data into the MP4's audio stream. The whole process then only needs a simple lock on the video data queue in the audio thread, achieving efficient merging of encoded audio and video.
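The thread hand-off described above can be sketched with a small locked queue (illustrative; the MP4 writer calls are placeholders for the muxer API, for example FFmpeg's av_interleaved_write_frame):

    #include <cstdint>
    #include <mutex>
    #include <queue>
    #include <utility>
    #include <vector>

    struct EncodedFrame { int64_t pts; std::vector<uint8_t> data; };

    // The video thread encodes H264 and pushes into this queue; the
    // audio thread, which owns the MP4 streams, drains it after writing
    // each audio frame. Only the queue itself needs a lock.
    class VideoFrameQueue {
    public:
        void push(EncodedFrame f) {          // called from the video thread
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(f));
        }
        bool pop(EncodedFrame& out) {        // called from the audio thread
            std::lock_guard<std::mutex> lock(m_);
            if (q_.empty()) return false;
            out = std::move(q_.front());
            q_.pop();
            return true;
        }
    private:
        std::mutex m_;
        std::queue<EncodedFrame> q_;
    };

    // In the audio thread, after each encoded audio frame:
    //   writeAudioToMp4(audioFrame);               // placeholder muxer call
    //   EncodedFrame v;
    //   while (videoQueue.pop(v)) writeVideoToMp4(v);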
Fig. 4 shows a block diagram of a multi-layout cloud conference recording system according to the present invention.
The second aspect of the present invention further provides a multi-layout cloud conference recording system 4, which includes a memory 41 and a processor 42, where the memory includes a multi-layout cloud conference recording method program, and when executed by the processor, the multi-layout cloud conference recording method program implements the following steps:
acquiring a plurality of source video data;
decoding the source video data to obtain YUV data corresponding to each source video data;
synthesizing the multiple YUV data according to a preset layout rule to obtain first YUV data;
encoding the first YUV data to obtain encoded source video data;
and generating a video file according to the coded source video data.
It should be noted that, the various embodiments provided in the present invention are for better explaining the invention point of the present invention and solving how to generate a playable recording playback file under such a scenario, and the MP4 file is taken as an example but not limited to the MP4 file.
It should be noted that the source video data may be one or more of video data, desktop data, document data, whiteboard data, and annotation data.
Fig. 2 shows a schematic diagram of the different sources of the invention after synthesis by layout. The data unit constituting the MP4 picture is a frame of H264 data, and in the case of a plurality of layout windows, the frame of H264 data is merged with all source picture information in the current layout, which are distributed in order of positions divided by the layout if displayed as a frame of picture.
A frame of H264 data is not simply padded per source H264 data by position, but is obtained by a complex decoding and encoding process. The H264 data can be obtained by YUV data coding, and the YUV data can also be obtained by H264 data decoding. The specific YUV type corresponds to a fixed data storage format, and therefore. To obtain the customized H264 data frame, the customized YUV data is needed. The storage method of YUV420P is used in the invention, when storing, the space size of a fixed pixel is preset, the storage positions of 3 components of Y, U, V are calculated in the pixel space according to the storage format of YUV420P, the fixed pixel space corresponds to the total layout space of a plurality of sources to be synthesized, and the layout space is collectively called YUV data layout or YUV layout later. When H264 data of each source comes in, each frame of H264 data of each source is decoded into YUV data, then the YUV data is mapped into a YUV layout according to the window position of the source in the layout, each YUV component data is copied to the mapped position, therefore, the YUV distribution of a plurality of sources in a YUV layout space is obtained as the result of the mapping of a plurality of sources, namely, the YUV of the plurality of sources synthesizes a large YUV, namely, first YUV data, and finally the first YUV data is coded into a frame of H264 data to be output, and the obtained picture is the total picture of the pictures of the plurality of sources distributed in a large layout window.
According to the embodiment of the present invention, the synthesizing of the plurality of YUV data according to the preset layout rule specifically includes:
acquiring a serial number of a preset layout rule;
searching a preset layout rule according to the serial number of the layout rule;
determining the number of windows and coordinate data information corresponding to each window according to the preset layout rule;
determining YUV data corresponding to each source video data in each window;
and synthesizing the YUV data corresponding to each source video data according to the position determined by the coordinate data information of each window to obtain first YUV data.
It should be noted that the H264 data of the previous frame is a plurality of small windows distributed and displayed by a plurality of sources according to a specific layout form, and the distribution of the windows is changed and can be adjusted at any time in the cloud conference process, so that each frame of H264 data to be generated is adjusted and displayed according to the latest layout change in response to the latest layout change in the recording process of the cloud conference. The layout form is defined by the client, and is transmitted to the server end according to a specific protocol in the form of a message, and the server end only needs to understand what layout the client transmits according to the convention, and then controls the generation of each frame data according to the change of the layout.
Fig. 3 shows a schematic diagram of the layout rules of the present invention.
As shown in FIG. 3, first the server knows all possible layouts that have been defined, takes a layout name for each layout, and agrees with the client the window ID of each layout window. The server calculates the relative position of each small layout window relative to the total large layout window according to the window presentation form, the relative positions are recorded by the relative coordinates and relative width and height of the starting point, and each value represents the proportion of the total layout width and height, namely the value is obtained by multiplying the width and height of the total layout window when the real coordinates and the real width and height are finally calculated. Such as a uniformly distributed 4-window average4 layout, the first of its small windows has a starting coordinate of (0, 0), a relative width of 0.5, and a relative height of 0.5. The second small window has a starting coordinate of (0.5, 0), a relative width of 0.5 and a relative height of 0.5. Also the starting coordinates of the third small window are (0, 0.5) and the starting coordinates of the fourth small window are (0.5 ), since this is an equally divided layout, the relative width and height are both 0.5. If the total layout has a window size of 1280 × 720, i.e., a total width of 1280 and a total height of 720, the true starting coordinates of the 3 rd window are (0, 720 × 0.5= 360), the widths of the 4 windows are all 1280 × 0.5=640, and the heights are all 720 × 0.5= 360. Of course, the absolute value of each layout window may be configured, and the relative value is configured instead of the absolute value because the size of the total layout window can be adjusted at any time, for example, a large layout configured with 1280 × 720 may be changed to a large layout configured with 1920 × 1080, which is adjusted at any time as required. Each window of the layout is configured in a configuration file which can be read by a service after being artificially calculated, the configurations are loaded into a memory of a program when the service is started, the uploaded layout name is analyzed when the layout information uploaded by a client is received, then the relative coordinate of the corresponding window is found in the loaded configuration file corresponding to the layout name, and the absolute coordinate is obtained through calculation.
After the position of each window of the layout is determined, YUV data of each source is mapped to the whole large layout window in the program memory, the large layout window is uniformly called YUV data layout, which can be abbreviated as YUV layout, and is intermediate data for bearing all source data and determining the finally output picture form. The method is characterized in that the method is carried out when a client sends layout information when the layout of which source corresponds to which small window is changed, once the layout is stable, the window corresponding to the source is fixed, the display position is also fixed, and the display position of the source can be recalculated and adjusted along with the change of the layout only when the layout is changed again.
According to the embodiment of the invention, the method further comprises the following steps:
judging whether a preset layout rule is acquired before a plurality of source video data;
if not, determining key frame data of the plurality of source video data and a first layout rule of the corresponding source video data;
and synthesizing the YUV data corresponding to each source video data according to the first layout rule, and storing the YUV data until a preset layout rule is obtained.
It should be noted that video and desktop are the two most important sources in the source data, and in the case of good layout, if the data of a complete frame of H264 video and desktop come, it is only necessary to decode the data into YUV and then perform registration in the overall YUV large layout. If the H264 data is split into a plurality of data packets, framing operation is needed, the incoming packets are reassembled into a frame of H264 data, then YUV decoding operation is carried out, if the incoming packets are JPEG data, the JPEG data are converted into YUV data by calling an interface provided by FFMPEG, if other data in other coding modes also need YUV conversion, and the converted frame of YUV data is a synthesizing unit in a YUV total layout. Video and desktop data also have some layout-related exception handling operations. The layout is a place where the data live, and if the data can not find the position of the data storage before the data comes without the layout, the layout information must come before the data comes without guarantee, and the lost data is selected to wait for the layout. In network transmission, the loss of messages or the sequence of messages and data may not be guaranteed at all, and some abnormal situations need to be handled. If a key frame comes but a layout message is lost or a layout message does not come yet, the data driver actively creates the layout of the source as a temporary storage place and waits for the temporary layout to be covered when the layout comes. Since a key frame is required as a precondition when decoding and encoding a frame of data, all non-key frames before the next key frame need to be discarded when the first frame of data that appears is not a key frame.
According to the embodiment of the invention, the method further comprises the following steps:
detecting whether the plurality of source video data are one or more of documents, white boards and annotations;
if yes, generating image data from the source video data according to a time sequence;
and converting the image data into corresponding YUV data.
And (5) a document. In the cloud conference process, a document is uploaded to a cloud end and printed into a JPEG picture for storage, the layout of each page of the document is changed once, which page of the document is currently turned is provided in the layout, the corresponding JPEG picture is read from a disk through the page of document information, the display proportion of the current document page is informed in the layout information because the document is generally provided with a scroll bar, then the JPEG picture read into an internal memory is intercepted according to the proportion and converted into YUV data, and then the YUV data is mapped into a layout window of the YUV data.
A white board. The white board is specific point-line surface data in the cloud conference process, the white board data received by the server are structures for recording white board information, and the information comprises the size, shape, coordinates, colors and the like of the point-line surface. Without such whiteboard information, a whiteboard YUV data with a fixed size and white background color, i.e., a blank whiteboard, can be created. In the process of drawing the cloud conference whiteboard, each stroke of each pen is transmitted to the server in real time. Each data is accumulatively stored in the recording server, when each whiteboard information is transmitted to the server, the whiteboard is triggered to be redrawn, each time of drawing is to generate a new whiteboard YUV by all the current existing information and a blank whiteboard again, and then the whiteboard YUV is mapped into a layout window. The white board has pages, the white board information of each page is stored in the memory, when the page turning information is received, all the white board information of the page is taken out to regenerate YUV data, and then the YUV data is mapped to the data covering the previous page in the layout.
Annotations. The content of an annotation is the same kind of content as a whiteboard; the difference is that the background an annotation is drawn on is not a blank whiteboard but a background image with content. For example, a desktop annotation uses the frame of desktop image at the moment annotation started as its background, and a document annotation uses the current document page. The background YUV is therefore acquired in advance of generating the annotation YUV. For a desktop annotation, the current frame of desktop YUV data must be taken out; but the YUV data already in the layout is synthesized YUV and cannot be extracted, so the frame of desktop YUV can only be saved for annotation before the desktop data is mapped into the layout. Similarly, when a document's JPEG picture is loaded and converted into YUV, one frame of YUV must be saved. When the annotation YUV is generated, the saved background YUV is taken out and redrawn with the whiteboard point-line-surface information to synthesize a new YUV, which is then mapped into the overall layout, replacing the position of the original annotation background.
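Continuing the sketch, an annotation differs from the whiteboard redraw only in its starting canvas; here `saved_background_yuv` stands for the desktop or document frame saved before mapping, and the conversion and drawing helpers are again assumed callbacks:

```python
def render_annotation(saved_background_yuv, shapes,
                      yuv_to_rgb, draw_shape, to_yuv):
    """Redraw all annotation strokes over the saved background frame,
    producing a new YUV to map over the original background's window."""
    canvas = yuv_to_rgb(saved_background_yuv)   # recover a drawable image
    for s in shapes:                            # same point/line/surface records
        draw_shape(canvas, s)                   # as the whiteboard above
    return to_yuv(canvas)
```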
According to the embodiment of the present invention, the synthesizing the plurality of YUV data according to a preset layout rule further includes:
calculating the time interval between the current first YUV data and the last generation of the first YUV data;
judging whether the time interval is greater than the minimum generation time;
if the time interval is greater than the minimum generation time, triggering encoding of new first YUV data to obtain encoded source video data;
and if the time interval is less than the minimum generation time, storing the YUV data corresponding to each source video data.
It should be noted that the method for determining the minimum generation time specifically includes:
acquiring the frame rate of each source video data;
and determining the frame time corresponding to the maximum frame rate as the minimum generation time.
Composite frame rate. In the cloud recording process, a plurality of H264 frames need to be arranged on a time axis at a certain frame rate, so a fixed mechanism is needed to continuously generate H264 data along the time axis. One frame of H264 data is synthesized per interval; this interval defines the composite frame rate.
The composite frame rate is the frequency at which the YUV in the layout is synthesized into frames of H264 data. Calculated at a frame rate of 10 frames/second, the above YUV layout data generates one frame of H264 data every 100 milliseconds. From the concept and characteristics of the composite frame rate, the first idea is to use a timer: configure a frame rate, calculate how many milliseconds one frame occupies at that rate, and let the timer fire at that period.
For a cloud conference with multiple sources, each at a different frame rate, the timer method has defects. For example, if the source is 15 frames/second video data and a composite frame rate of 10 frames/second is set, 5 frames of video are lost every second: those frames arrive before the time to generate an H264 frame is reached, so the data stored in the layout is overwritten by the next mapping. If 15 frames/second is set instead, other sources below this frame rate are forced up to the composite rate; a whiteboard, for instance, may produce one frame per second for a certain period, and forcing it to 15 frames/second means that one frame of whiteboard data has to be reused 15 times. Moreover, for video data, 15 frames per second means 66 milliseconds per frame only on average: frames are not strictly 66 milliseconds apart, but may arrive after 50 or 60 milliseconds, or after 70 or 80. A timer therefore faces the question of whether it should be set to 50, 60, 70, or 80 milliseconds among the various sources, and it is difficult to choose a proper timing value. If the frame rate is set according to the source with the highest frame rate, the YUV data of low-frame-rate sources in the layout is continuously reused, which both increases encoding time, reducing encoding efficiency, and increases file size, occupying more storage space. If the frame rate is set lower than that of the source data, source data is lost.
The invention therefore mainly generates H264 data in a data-driven way, without setting a timer. A minimum time for generating one frame of data is set from the frame rate and applies to all sources in the current layout. Every time data is mapped into the layout, the difference between the time of the current data and the time of the last generation is calculated; if the difference is greater than the minimum generation time, generation of a new frame of H264 data is triggered, and if it is less, the data is stored in the YUV layout to wait for the next arrival. In the cloud conference process the frame rate of each type of source differs: for example, video is 15 frames per second, the desktop is 8 frames per second, and the document and whiteboard rates are low, so the data-driven method suits the composite frame rate well. With this method, the minimum synthesis interval is set to the frame time of the source with the maximum frame rate, for example 66 ms for video at 15 frames per second. Thus, when video is present among the sources, each arrival of video data triggers the composition of an H264 frame as long as the interval between two adjacent arrivals is not less than 66 milliseconds. Since the minimum is set to the frame time of the fastest source, data from slower sources arrives at intervals substantially larger than this, so it too can trigger the synthesis of H264 data.
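A compact sketch of this data-driven composition, assuming a monotonic clock and an injected `encode_frame` callback (for example an x264 encoder driven through FFmpeg); the names are illustrative, not the patented implementation:

```python
import time

class DataDrivenCompositor:
    """No timer: every time source data is mapped into the layout, check
    whether one frame time of the fastest source has elapsed; if so,
    synthesize one H264 frame from the current YUV layout."""
    def __init__(self, encode_frame, max_source_fps=15):
        self.min_interval = 1.0 / max_source_fps   # ~0.066 s at 15 fps
        self.last_emit = None
        self.encode_frame = encode_frame           # assumed encoder callback

    def on_mapped(self, layout_yuv, now=None):
        now = time.monotonic() if now is None else now
        if self.last_emit is None or now - self.last_emit >= self.min_interval:
            self.encode_frame(layout_yuv)          # trigger H264 synthesis
            self.last_emit = now
        # otherwise the data simply stays in the YUV layout and waits
```

For example, with `max_source_fps=15`, video arriving roughly every 66 ms triggers one H264 frame per arrival, while a once-per-second whiteboard update also triggers immediately because its interval far exceeds the minimum.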
According to the embodiment of the invention, the minimum frame rate of the encoded source video data is preset.
When multiple sources share a layout, the composite frame rate follows the source with the highest frame rate: when video, desktop, document, and whiteboard all exist, it is the frame rate of the video; when there is no video, it is determined by the frame rate of the desktop; and when there is no desktop, the document and whiteboard are composited at an unfixed rate. When only a document and a whiteboard exist, a minimum composite frame rate can be set to avoid the playback compatibility problems some players have with very low frame rates. If the minimum composite frame rate is set to 3 frames per second and document or whiteboard data arrives, the time of the last composition is compared and a frame-supplementing operation is performed according to the time difference: if the data arrives after 1 second, 2 extra frames need to be supplemented. The two supplemented frames repeat the data already in the YUV layout, that is, before the current data is synthesized and output, the previous YUV layout data is used first, timestamps on the time axis are set for the two frames, and H264 data is synthesized and output from the same data with different timestamps.
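The frame-supplementing arithmetic can be sketched as below; `emit` is an assumed callback that encodes the repeated layout YUV with the given timestamp. With `min_fps=3`, a 1-second gap yields exactly the 2 supplemented frames described above:

```python
def pad_frames(last_emit_ms, now_ms, min_fps, emit, last_layout_yuv):
    """Re-emit the previous layout YUV with interpolated timestamps when a
    gap exceeds the minimum composite frame rate (document/whiteboard only)."""
    frame_ms = 1000.0 / min_fps                      # ~333 ms at 3 fps
    missing = int((now_ms - last_emit_ms) / frame_ms) - 1
    for i in range(1, max(missing, 0) + 1):          # 1 s gap at 3 fps -> 2 pads
        emit(last_layout_yuv, pts_ms=last_emit_ms + i * frame_ms)
```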
According to the embodiment of the invention, the method further comprises the following steps:
acquiring recording start time and recording end time;
generating a time space of a preset unit according to the starting time and the ending time;
forming a mapping relation between the time space and the corresponding data space;
and generating the coded source video data through the mapping relation.
It should be noted that the MP4 generated by recording the cloud conference needs synchronous playing among multiple sources, which requires setting an accurate presentation time stamp (PTS) when generating each H264 frame. In the present invention the PTS of each H264 frame is calculated using the millisecond time at which the server receives the data as the reference. The time of the message that starts recording is taken as the start time, and the time of stopping recording or ending the conference as the end time; the time space from start to end is mapped into a simple data space, and the arrival time of all subsequent data at the server is mapped into that data space with the same interval distribution. The preset unit can be set by a person skilled in the art according to actual needs, for example milliseconds. If the time space is (1599469826000, 1599469886000), then 1599469886000 - 1599469826000 = 60000, and the recording is mapped to the data space (0, 60000) or (1, 60001). When there is a pause in the middle of recording, the pause duration is subtracted from the simple data space: a half-minute pause is 30000 ms, and 60001 - 30000 = 30001, so the data space is compressed to (1, 30001), and the times of data arriving after the half-minute pause are likewise compressed by 30000. Thus the PTS of each piece of data corresponds to its arrival time at the server, and data from multiple sources arriving within one frame interval is composited for simultaneous display, i.e., each frame seen when the resulting MP4 plays contains exactly the data that arrived within that frame time. Displaying the data of multiple sources synchronously, keyed to the time the server received it, realizes synchronized playback among the videos in the multi-source layout.
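One way to realize this mapping, sketched under the assumption that all times are server-side millisecond timestamps and that pause/resume notifications are available:

```python
class PtsMapper:
    """Maps server-side receive times (ms) into a data space that starts
    near zero and has paused intervals removed; the result is the PTS."""
    def __init__(self, record_start_ms):
        self.start = record_start_ms
        self.paused_total = 0
        self.pause_began = None

    def pause(self, now_ms):
        self.pause_began = now_ms

    def resume(self, now_ms):
        self.paused_total += now_ms - self.pause_began
        self.pause_began = None

    def pts(self, arrival_ms):
        return arrival_ms - self.start - self.paused_total

# e.g. with start 1599469826000 ms, data arriving 60000 ms later after a
# 30000 ms pause gets PTS 60000 - 30000 = 30000, matching the compressed
# data space in the example above.
```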
According to the embodiment of the invention, the method further comprises the following steps:
processing the code of the first YUV data by adopting a first thread to obtain coded source video data;
processing the audio information by adopting a second thread to obtain encoded audio data;
detecting whether the first thread finishes the encoding of the first YUV data or not;
if so, merging the encoded source video data and the encoded audio data in the second thread to obtain a video file.
It should be noted that the above mainly describes the generation of the MP4 picture; besides the video picture, the MP4 file also needs audio. Both audio and video processing in the cloud conference are complex encoding and decoding processes. In the prior art, FFmpeg provides a way to merge audio and video when generating MP4: when the MP4 file is created, an audio stream and a video stream are created at the same time, and data is then written into each stream respectively. However, when recording MP4 in a cloud conference, the processing of audio and of video are both time-consuming, so two different threads are needed. With multiple threads accessing one file stream at the same time, the stream would have to be locked, and the resulting copy control would also reduce efficiency. The method used by the invention dedicates one thread to audio processing and another to H264 encoding of the video. Because video processing is generally more time-consuming than audio processing, stream creation and the writing of the MP4 file are placed in the audio thread, and the encoded video is handed over to it: a queue is designed for video inside the audio thread, encoded video data is pushed into the queue each time video encoding completes, and each time audio encoding completes and its data is written, the video queue is checked; if data is present, the video data is written into the video stream of the MP4 and the audio data into the audio stream. The whole process then only needs a simple lock on the video data queue in the audio thread, achieving efficient merging of encoded audio and video.
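A skeletal version of this two-thread arrangement in Python, where `encode_h264`, `encode_audio`, and the `mp4` writer are stand-ins for the real FFmpeg-based components; `queue.Queue` is internally locked, which corresponds to the single simple lock on the video queue described above:

```python
import queue

video_q = queue.Queue()   # encoded H264 packets; Queue handles its own locking

def video_thread(yuv_frames, encode_h264):
    """Video thread: only encodes; never touches the MP4 file streams."""
    for frame in yuv_frames:
        video_q.put(encode_h264(frame))

def audio_thread(audio_chunks, encode_audio, mp4):
    """Audio thread: owns the MP4 writer; after each audio write it drains
    whatever encoded video is ready, so the file streams need no lock."""
    for chunk in audio_chunks:
        mp4.write_audio(encode_audio(chunk))
        while True:
            try:
                pkt = video_q.get_nowait()
            except queue.Empty:
                break
            mp4.write_video(pkt)

# Typical wiring (assumed component names):
# threading.Thread(target=video_thread, args=(frames, h264enc)).start()
# threading.Thread(target=audio_thread, args=(chunks, aacenc, mp4)).start()
```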
A third aspect of the present invention provides a computer-readable storage medium, where the computer-readable storage medium includes a multi-layout cloud conference recording method program, and when the multi-layout cloud conference recording method program is executed by a processor, the steps of the multi-layout cloud conference recording method described in any one of the above are implemented.
According to the multi-layout cloud conference recording method and system and the readable storage medium provided by the invention, different source video data are converted to YUV and then encoded according to the layout rule to generate the final video file, which speeds up video synthesis and simplifies the generation process. The method is applicable to scenarios with a large number of videos, frequently changing layouts, or other more complex situations.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units is only a logical functional division, and there may be other divisions in actual implementation, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (6)

1. A multi-layout cloud conference recording method is characterized by comprising the following steps:
acquiring a plurality of source video data, wherein the source video data can be one or more of video data, desktop data, document data, whiteboard data and annotation data;
decoding the source video data to obtain YUV data corresponding to each source video data;
synthesizing the multiple YUV data according to a preset layout rule to obtain first YUV data;
encoding the first YUV data to obtain encoded source video data;
generating a video file according to the coded source video data;
the synthesizing the plurality of YUV data according to a preset layout rule further includes:
calculating the time interval between the current first YUV data and the last generation of first YUV data;
judging whether the time interval is greater than the minimum generation time;
if the time interval is greater than the minimum generation time, triggering encoding of new first YUV data to obtain encoded source video data;
if the time interval is less than the minimum generation time, storing YUV data corresponding to each source video data;
the method for determining the minimum generation time comprises the following steps:
acquiring the frame rate of each source video data;
determining the frame time corresponding to the maximum frame rate as the minimum generation time;
when the source video data only comprise document data and whiteboard data, triggering new first YUV data to encode according to a set minimum composite frame rate;
the method further comprises the following steps:
acquiring recording start time and recording end time;
generating a time space of a preset unit according to the starting time and the ending time;
forming a mapping relation between the time space and the corresponding data space;
generating encoded source video data through the mapping relation;
the method further comprises the following steps:
processing the code of the first YUV data by adopting a first thread to obtain coded source video data;
processing the audio information by adopting a second thread to obtain encoded audio data;
detecting whether the first thread finishes the encoding of the first YUV data or not;
if so, merging the encoded source video data and the encoded audio data in the second thread to obtain a video file.
2. The method for recording the multi-layout cloud conference according to claim 1, wherein the synthesizing the plurality of YUV data according to a preset layout rule specifically comprises:
acquiring a serial number of a preset layout rule;
searching a preset layout rule according to the serial number of the layout rule;
determining the number of windows and coordinate data information corresponding to each window according to the preset layout rule;
determining YUV data corresponding to each source video data of each window;
and synthesizing the YUV data corresponding to each source video data according to the position determined by the coordinate data information of each window to obtain first YUV data.
3. The method for recording the multi-layout cloud conference according to claim 1, further comprising:
judging whether a preset layout rule is acquired before the plurality of source video data arrive;
if not, determining key frame data of the plurality of source video data and a first layout rule of the corresponding source video data;
and synthesizing the YUV data corresponding to each source video data according to the first layout rule, and storing the YUV data until a preset layout rule is obtained.
4. The method for recording the multi-layout cloud conference according to claim 1, further comprising:
detecting whether the plurality of source video data are one or more of documents, white boards and annotations;
if yes, generating image data from the source video data according to a time sequence;
and converting the image data into corresponding YUV data.
5. A multi-layout cloud conference recording system, comprising a memory and a processor, wherein the memory includes a multi-layout cloud conference recording method program, and when the multi-layout cloud conference recording method program is executed by the processor, the steps of the multi-layout cloud conference recording method according to any one of claims 1 to 4 are implemented.
6. A computer-readable storage medium, wherein the computer-readable storage medium includes a multi-layout cloud conference recording method program, and when the multi-layout cloud conference recording method program is executed by a processor, the steps of the multi-layout cloud conference recording method according to any one of claims 1 to 4 are implemented.
CN202011274901.1A 2020-11-16 2020-11-16 Multi-layout cloud conference recording method and system and readable storage medium Active CN112073810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011274901.1A CN112073810B (en) 2020-11-16 2020-11-16 Multi-layout cloud conference recording method and system and readable storage medium

Publications (2)

Publication Number Publication Date
CN112073810A (en) 2020-12-11
CN112073810B (en) 2021-02-02 (granted)

Family

ID=73656032

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011274901.1A Active CN112073810B (en) 2020-11-16 2020-11-16 Multi-layout cloud conference recording method and system and readable storage medium

Country Status (1)

Country Link
CN (1) CN112073810B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112689119B (en) * 2021-03-11 2021-06-18 全时云商务服务股份有限公司 Processing method and device for screen combination of recorded videos in cloud conference
CN113259621B (en) * 2021-07-15 2021-10-15 全时云商务服务股份有限公司 Cloud conference step-by-step recording method and system
CN115879423A (en) * 2021-09-29 2023-03-31 中兴通讯股份有限公司 Data processing method, apparatus, computer-readable storage medium, and program product
CN114615457B (en) * 2022-05-10 2022-08-16 全时云商务服务股份有限公司 Method and device for smooth switching of real-time screen-closing layout in cloud conference

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101742221A (en) * 2009-11-09 2010-06-16 中兴通讯股份有限公司 Method and device for synthesizing multiple pictures in video conference system
CN101741587A (en) * 2008-11-18 2010-06-16 中兴通讯股份有限公司 Multimedia terminal playing PPT and method for playing PPT thereon
CN102547213A (en) * 2011-12-23 2012-07-04 南京超然科技有限公司 Video imaging preview method for video conference system
CN102577369A (en) * 2009-03-10 2012-07-11 思科系统国际公司 Interface unit between video conferencing codec and interactive whiteboard
CN102572368A (en) * 2010-12-16 2012-07-11 中兴通讯股份有限公司 Processing method and system of distributed video and multipoint control unit
CN104301657A (en) * 2013-07-19 2015-01-21 中兴通讯股份有限公司 Conference television terminal and auxiliary flow data access method thereof
CN107241598A (en) * 2017-06-29 2017-10-10 贵州电网有限责任公司 A kind of GPU coding/decoding methods for multichannel h.264 video conference
WO2017173953A1 (en) * 2016-04-08 2017-10-12 中兴通讯股份有限公司 Server, conference terminal, and cloud conference processing method
CN107888953A (en) * 2016-09-29 2018-04-06 上海禾鸟电子科技有限公司 A kind of implementation method of new live broadcast system
CN108322691A (en) * 2018-02-06 2018-07-24 中兴通讯股份有限公司 Video meeting implementing method, device and system, computer readable storage medium
CN111741324A (en) * 2020-07-03 2020-10-02 全时云商务服务股份有限公司 Recording playback method and device and electronic equipment
CN111755017A (en) * 2020-07-06 2020-10-09 全时云商务服务股份有限公司 Audio recording method and device for cloud conference, server and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10347300B2 (en) * 2017-03-01 2019-07-09 International Business Machines Corporation Correlation of recorded video presentations and associated slides
US10630738B1 (en) * 2018-09-28 2020-04-21 Ringcentral, Inc. Method and system for sharing annotated conferencing content among conference participants
CN109168076B (en) * 2018-11-02 2021-03-19 北京字节跳动网络技术有限公司 Online course recording method, device, server and medium

Also Published As

Publication number Publication date
CN112073810A (en) 2020-12-11

Similar Documents

Publication Publication Date Title
CN112073810B (en) Multi-layout cloud conference recording method and system and readable storage medium
US6989868B2 (en) Method of converting format of encoded video data and apparatus therefor
US6559846B1 (en) System and process for viewing panoramic video
US8270493B2 (en) Capture, editing and encoding of motion pictures encoded with repeating fields or frames
EP3151548A1 (en) Video recording method and device
US20100033484A1 (en) Personal-oriented multimedia studio platform apparatus and method for authorization 3d content
CN111641838A (en) Browser video playing method and device and computer storage medium
JPH11187398A (en) Encoding and decoding system
JPH11243542A (en) Multimedia information editing device
CN109547724B (en) Video stream data processing method, electronic equipment and storage device
CN112073543B (en) Cloud video recording method and system and readable storage medium
CN111899322A (en) Video processing method, animation rendering SDK, device and computer storage medium
CN112087642B (en) Cloud guide playing method, cloud guide server and remote management terminal
WO2022225750A1 (en) Augmented reality video stream synchronization
WO2022021519A1 (en) Video decoding method, system and device and computer-readable storage medium
CN110049347B (en) Method, system, terminal and device for configuring images on live interface
CN113490047A (en) Android audio and video playing method
KR20120019872A (en) A apparatus generating interpolated frames
US20230025664A1 (en) Data processing method and apparatus for immersive media, and computer-readable storage medium
CN114531528B (en) Method for video processing and image processing apparatus
US20230239422A1 (en) A method for coding space information in continuous dynamic images
CN112511768B (en) Multi-picture synthesis method, device, equipment and storage medium
US10893229B1 (en) Dynamic pixel rate-based video
CN110798715A (en) Video playing method and system based on image string
JP2016103808A (en) Image processing device, image processing method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 100010 room 203-35, 2 / F, building 2, No.1 and 3, Qinglong Hutong, Dongcheng District, Beijing

Patentee after: G-NET CLOUD SERVICE Co.,Ltd.

Address before: 100102 room 1102, 9th floor, Penghuan international building, building 4, yard 1, Shangdi East Road, Haidian District, Beijing

Patentee before: G-NET CLOUD SERVICE Co.,Ltd.