CN114827663B - Distributed live broadcast frame inserting system and method - Google Patents

Distributed live broadcast frame inserting system and method

Info

Publication number
CN114827663B
CN114827663B
Authority
CN
China
Prior art keywords
frame
video
metadata
frames
frame number
Prior art date
Legal status
Active
Application number
CN202210381421.8A
Other languages
Chinese (zh)
Other versions
CN114827663A (en)
Inventor
朱侠 (Zhu Xia)
Current Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
MIGU Culture Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and MIGU Culture Technology Co Ltd
Priority to CN202210381421.8A
Publication of CN114827663A
Application granted
Publication of CN114827663B
Legal status: Active
Anticipated expiration

Classifications

    • H04N 21/23424: Processing of video elementary streams involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H04N 19/20: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding
    • H04N 19/44: Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N 19/587: Predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence

Abstract

The invention discloses a distributed live broadcast frame insertion system and method, comprising a decoder, an encoder, and a plurality of AI engines deployed on different hosts. The decoder decodes the video stream, stores the decoded video frames in a database, and stores the audio data and the video frame metadata in a message queue. Each AI engine obtains video frame metadata from the message queue, pulls the corresponding video frames from the database, generates intermediate frames and frame insertion metadata, stores the intermediate frames in the database, and stores the frame insertion metadata in the message queue. The encoder obtains the audio data and the frame insertion metadata from the message queue, pulls the corresponding video frames and intermediate frames from the database, merges and encodes the acquired data, and outputs the target video stream. By executing the frame insertion task for high-quality video with multiple AI engines in a distributed deployment, the invention meets the interpolated-frame-count requirement of live broadcast scenes, improves live frame insertion efficiency, and improves the fluency of live video.

Description

Distributed live broadcast frame inserting system and method
Technical Field
The invention relates to the technical field of video processing, in particular to a distributed live broadcast frame inserting system and method.
Background
Existing frame insertion techniques mainly rely on an AI engine to process adjacent frames and create interpolated frames between them, raising the playback frame rate and thereby the fluency of video playback. Current live broadcast frame insertion is generally usable only for relatively smooth scenes and slow motion; for scenes with intense motion, such as sports events, it is difficult to achieve a significant improvement. A common requirement in live scenes is to raise 25 fps video to 50 fps, but AI engines are computationally demanding: a typical AI engine achieves only about 5 fps in single-machine operation, which cannot meet the interpolated-frame-count requirement of a live scene and cannot support live sports broadcasts.
The foregoing is provided merely to aid understanding of the technical solution of the present invention and does not constitute an admission that it is prior art.
Disclosure of Invention
The main object of the invention is to provide a distributed live broadcast frame insertion system and method, aiming to solve the technical problem that ordinary live frame insertion approaches are inefficient and cannot meet the interpolated-frame-count requirement of live broadcast scenes.
In order to achieve the above object, the present invention provides a distributed live broadcast frame insertion system, including: a decoder, an encoder, and a plurality of AI engines respectively disposed on different hosts;
The decoder is used for decoding the acquired video stream to obtain audio data, video frames and video frame metadata, storing the video frames into a target database, and storing the audio data and the video frame metadata into a message queue;
each AI engine is configured to obtain video frame metadata decoded by the decoder from the message queue, pull a video frame corresponding to the video frame metadata from the target database, generate an intermediate frame according to the pulled video frame, generate frame inserting metadata corresponding to the intermediate frame, store the intermediate frame into the target database, and store the frame inserting metadata into the message queue;
the encoder is used for acquiring the audio data decoded by the decoder and the frame inserting metadata generated by the AI engine from the message queue, pulling the video frames and the intermediate frames corresponding to the frame inserting metadata from the target database, merging and encoding the video frames, the intermediate frames and the audio data, and then outputting a target video stream.
Optionally, the encoder is further configured to: when the video frame and the intermediate frame corresponding to the frame insertion metadata have been obtained in the local buffer, delete the intermediate frame from the target database and record a first frame number corresponding to the intermediate frame in a preset integer array; search forward in the preset integer array, based on the first frame number, for the previous intermediate frame number; and, if the previous intermediate frame number is found in the preset integer array, delete the video frame from the target database.
Optionally, the encoder is further configured to search backward in the preset integer array, based on the first frame number, for the next intermediate frame number and, if it is found, delete the next video frame corresponding to the intermediate frame from the target database.
Optionally, the encoder is further configured to delete, after outputting the target video stream, the video frames and intermediate frames in the target database whose frame numbers are smaller than the first frame number.
Optionally, the decoder is further configured to number the video frame obtained by decoding, generate a second frame number corresponding to the video frame, generate video frame metadata corresponding to the video frame according to the second frame number, store the video frame including the second frame number into the target database, and store the audio data and the video frame metadata into a message queue;
each AI engine is further configured to determine a third frame number corresponding to the pulled video frame, generate a fourth frame number corresponding to the intermediate frame according to the third frame number, generate, according to the third frame number and the fourth frame number, insert frame metadata corresponding to the intermediate frame, store the intermediate frame including the fourth frame number into the target database, and store the insert frame metadata into the message queue;
The encoder is further configured to extract a fifth frame number and a sixth frame number from the obtained inserted frame metadata, and pull a video frame and an intermediate frame corresponding to the fifth frame number and the sixth frame number from the target database.
Optionally, the decoder is further configured to number the video frame obtained by decoding, generate a first odd frame number corresponding to the video frame, and generate video frame metadata corresponding to the video frame according to the first odd frame number;
each AI engine is further configured to determine a second odd frame number corresponding to the pulled video frame, generate a first even frame number corresponding to the intermediate frame according to an even number adjacent to the second odd frame number, and generate the inserted frame metadata corresponding to the intermediate frame according to the second odd frame number and the first even frame number.
Optionally, the decoder is further configured to number the video frames obtained by decoding, generate a second even frame number corresponding to each video frame, and generate video frame metadata corresponding to the video frame according to the second even frame number;
and each AI engine is further configured to determine a third even frame number corresponding to the pulled video frame, generate a third odd frame number corresponding to the intermediate frame according to an odd number adjacent to the third even frame number, and generate the inserted frame metadata corresponding to the intermediate frame according to the third even frame number and the third odd frame number.
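The alternating numbering schemes above can be sketched as follows; the function names are illustrative, not from the patent. With odd source numbering, the i-th decoded frame takes number 2i+1 and the intermediate frame generated after a source frame takes the adjacent even number (the parities swap for even source numbering):

```python
def source_frame_number(index, source_odd=True):
    """Map the i-th decoded frame (0-based) to its stored frame number.

    With source_odd=True decoded frames take odd numbers 1, 3, 5, ...;
    with source_odd=False they take even numbers 2, 4, 6, ...
    """
    return 2 * index + 1 if source_odd else 2 * index + 2

def intermediate_frame_number(prev_source_number):
    """Number of the intermediate frame generated between a source frame
    and its successor: the number adjacent to (one greater than) the
    preceding source frame's number."""
    return prev_source_number + 1
```

This leaves a free slot between every two consecutive source frames, so an intermediate frame can always be numbered without renumbering anything already stored.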
Optionally, the encoder is further configured to obtain a video absolute time and an audio absolute time, determine whether the video absolute time is greater than the audio absolute time, obtain audio data obtained by decoding by the decoder from the message queue when the video absolute time is greater than the audio absolute time, and update the audio absolute time according to a display timestamp corresponding to the audio data.
Optionally, when the video absolute time is smaller than the audio absolute time, the encoder is further configured to obtain, from the message queue, the frame inserting metadata generated by the AI engine, pull, according to the frame inserting metadata, a corresponding video frame and an intermediate frame from the target database, and update the video absolute time according to a display timestamp corresponding to the video frame or the intermediate frame.
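The audio/video pacing logic of the two preceding paragraphs can be sketched as a single decision step; `mux_step` and its argument names are illustrative assumptions, not the patent's API. The encoder compares the two running absolute times and consumes whichever stream lags, advancing that stream's clock from the display timestamp of the consumed item:

```python
def mux_step(video_abs, audio_abs, next_audio_pts, next_video_pts):
    """Decide which stream the encoder consumes next.

    If the video absolute time is ahead of the audio absolute time, consume
    audio data and advance the audio clock to its display timestamp;
    otherwise consume the next video or intermediate frame and advance the
    video clock. Returns (stream, new_video_abs, new_audio_abs).
    """
    if video_abs > audio_abs:
        return "audio", video_abs, next_audio_pts
    return "video", next_video_pts, audio_abs
```

Repeating this step keeps the merged output roughly interleaved by presentation time, which is what the combined encoding requires.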
In addition, in order to achieve the above object, the present invention also proposes a distributed live broadcast frame inserting method, which is applied to the distributed live broadcast frame inserting system as described above, the distributed live broadcast frame inserting system includes: a decoder, an encoder, and a plurality of AI engines respectively deployed on different hosts, the method comprising:
The decoder decodes the acquired video stream to obtain audio data, video frames and video frame metadata, stores the video frames in a target database, and stores the audio data and the video frame metadata in a message queue;
each AI engine acquires video frame metadata obtained by decoding by the decoder from the message queue, pulls a video frame corresponding to the video frame metadata from the target database, generates an intermediate frame according to the pulled video frame, generates inserting frame metadata corresponding to the intermediate frame, stores the intermediate frame into the target database, and stores the inserting frame metadata into the message queue;
the encoder acquires the audio data decoded by the decoder and the inserted frame metadata generated by the AI engine from the message queue, pulls the video frames and the intermediate frames corresponding to the inserted frame metadata from the target database, combines and encodes the video frames, the intermediate frames and the audio data, and then outputs a target video stream.
The distributed live broadcast frame insertion system of the invention includes a decoder, an encoder, and a plurality of AI engines deployed on different hosts. The decoder decodes the acquired video stream to obtain audio data, video frames and video frame metadata, stores the video frames in a target database, and stores the audio data and the video frame metadata in a message queue. Each AI engine obtains the video frame metadata decoded by the decoder from the message queue, pulls the video frames corresponding to that metadata from the target database, generates an intermediate frame from the pulled video frames together with the frame insertion metadata corresponding to the intermediate frame, stores the intermediate frame in the target database, and stores the frame insertion metadata in the message queue. The encoder obtains the audio data decoded by the decoder and the frame insertion metadata generated by the AI engines from the message queue, pulls the video frames and intermediate frames corresponding to the frame insertion metadata from the target database, merges and encodes the video frames, intermediate frames and audio data, and outputs the target video stream. In this manner, the frame insertion task for high-quality video is executed by multiple AI engines in a distributed deployment, so that many interpolated frames can be generated in a short time, meeting the interpolated-frame-count requirement of live broadcast scenes, improving live frame insertion efficiency, and improving the fluency of live video.
Drawings
Fig. 1 is a structural block diagram of a first embodiment of the distributed live broadcast frame insertion system of the present invention;
Fig. 2 is a schematic diagram of video frame cleanup in an embodiment of the present invention;
Fig. 3 is a flowchart of a first embodiment of the distributed live broadcast frame insertion method of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The embodiment of the invention provides a distributed live broadcast frame inserting system, and referring to fig. 1, fig. 1 is a structural block diagram of a first embodiment of the distributed live broadcast frame inserting system.
In this embodiment, the distributed live broadcast frame inserting system includes: a decoder 10, an encoder 30, and a plurality of AI engines 20 respectively disposed on different hosts;
the decoder 10 is configured to decode the acquired video stream to obtain audio data, video frames and video frame metadata, store the video frames in a target database, and store the audio data and the video frame metadata in a message queue.
Preferably, the target database is a Key-value database in which video frame data is stored as key-value pairs: the decoder 10 numbers the decoded video frames, generates a frame number for each video frame, forms key-value data from each frame number and its corresponding video frame, and stores the key-value data in the Key-value database.
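As a rough illustration only (not the patented implementation), a Python dict can stand in for the Key-value database, with frame numbers as keys and raw frame data as values; the class and method names are hypothetical:

```python
class FrameStore:
    """Minimal stand-in for the Key-value database holding frames.

    Keys are frame numbers, values are raw frame bytes. Unlike a message
    bus, a get() does not delete the entry, so the same video frame can
    serve several interpolation tasks without being re-uploaded.
    """
    def __init__(self):
        self._frames = {}

    def put(self, frame_no, data):
        self._frames[frame_no] = data

    def get(self, frame_no):
        return self._frames[frame_no]

    def delete(self, frame_no):
        # Idempotent: deleting an absent frame is a no-op.
        self._frames.pop(frame_no, None)

    def __contains__(self, frame_no):
        return frame_no in self._frames
```

In a real deployment this role would be played by a networked key-value store shared by all hosts; the non-destructive read is the property the next paragraph relies on.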
It should be appreciated that, compared with a message bus, using a Key-value database for video frame transmission in this embodiment has the following advantages. Video frames are large (each frame of 1080P video is about 3 megabytes), and transmitting them over a conventional message bus would degrade the bus's performance. In addition, a message bus deletes a message once it is pulled, so if the bus were used for transmission, the AI engine 20 would have to re-upload each video frame after pulling it, wasting bandwidth. With a Key-value database, after pulling video frames the AI engine 20 only needs to store the newly generated intermediate frame into the database; the pulled video frames need not be uploaded again, giving better transmission performance.
Accordingly, the decoder 10 generates video frame metadata from the frame number of each video frame and its display timestamp (Presentation Time Stamp, PTS), which tells the player when to display the frame. In this embodiment the video frames and the metadata are split and transmitted over different channels; the metadata tells the AI engine 20 which video frames are ready. Optionally, the decoder 10 generates one metadata record per video frame, i.e. each record contains only the frame number and display timestamp of a single frame, for example: {Frameno1:1, Pts1:1234567}. Preferably, the decoder 10 generates one metadata record from the frame numbers and display timestamps of each pair of adjacent video frames. For example, having decoded video frames A, B, C, ..., the decoder 10 generates one metadata record from the frame numbers and display timestamps of A and B, and another from those of B and C. Each record thus describes one interpolation task: when an AI engine 20 obtains a metadata record, it pulls the two adjacent video frames indicated by the record and generates the intermediate frame between them.
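The preferred pairwise metadata generation can be sketched as follows, assuming the `Frameno`/`Pts` record layout shown in the example above; the function name is illustrative:

```python
def pair_metadata(frames):
    """Emit one metadata record per pair of adjacent decoded frames.

    frames: list of (frame_no, pts) tuples in decode order. Each record
    tells an AI engine which two frames to pull for one interpolation task.
    """
    return [
        {"Frameno1": frames[i][0], "Frameno2": frames[i + 1][0],
         "Pts1": frames[i][1], "Pts2": frames[i + 1][1]}
        for i in range(len(frames) - 1)
    ]
```

Note that each frame except the first and last appears in two records, matching the A/B then B/C pairing described in the text.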
In another implementation, the decoder 10 generates the video frame metadata according to the insertion frequency of the frame insertion task: taking the first decoded video frame as the starting point, it determines from the insertion frequency which pairs of adjacent reference video frames anchor interpolation tasks. For each such pair the decoder 10 generates one metadata record from the two frame numbers and display timestamps; for video frames that do not serve as interpolation references, it generates individual metadata records. For example, if the insertion frequency is one intermediate frame per 4 frames, the adjacent reference frames of the first task are the fourth and fifth video frames: the decoder generates individual metadata records for the first three frames and one record from the frame numbers and display timestamps of the fourth and fifth frames.
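A possible sketch of the frequency-based variant, under the assumption that every `every_n`-th decoded frame and its successor anchor one interpolation task while the remaining frames get single-frame records; the function name and record layout are illustrative:

```python
def metadata_by_frequency(frames, every_n):
    """Generate metadata records according to an insertion frequency.

    frames: list of (frame_no, pts) in decode order. The every_n-th decoded
    frame and its successor form one reference pair (one interpolation
    task); all other frames get single-frame records so the encoder still
    receives them.
    """
    records, i, n = [], 0, len(frames)
    while i < n:
        # Decode positions are 1-based: pairs anchor at every_n, 2*every_n, ...
        if (i + 1) % every_n == 0 and i + 1 < n:
            a, b = frames[i], frames[i + 1]
            records.append({"Frameno1": a[0], "Frameno2": b[0],
                            "Pts1": a[1], "Pts2": b[1]})
            i += 2  # both reference frames are covered by the pair record
        else:
            records.append({"Frameno1": frames[i][0], "Pts1": frames[i][1]})
            i += 1
    return records
```

With nine frames and `every_n=4` this yields singles for the first three frames, a pair for the fourth and fifth, singles for the sixth and seventh, and a pair for the eighth and ninth, matching the worked example above.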
Each AI engine 20 is configured to obtain, from the message queue, video frame metadata decoded by the decoder 10, pull, from the target database, a video frame corresponding to the video frame metadata, generate, according to the pulled video frame, an intermediate frame, generate, from the video frame, frame insertion metadata corresponding to the intermediate frame, store the intermediate frame in the target database, and store the frame insertion metadata in the message queue.
It should be noted, taking as an example the case where the decoder 10 stores in the message queue one metadata record per pair of adjacent video frames (so each AI engine 20 pulls two video frames from the target database per interpolation task): the AI engines 20 obtain video frame metadata from the message queue in queue order, pull the adjacent frames X and Y from the target database according to the frame numbers in the metadata, generate an intermediate frame from the two adjacent frames, and assign the intermediate frame a frame number. To prevent the encoder from missing the first frame of the pair, the frame insertion metadata is generated from the frame numbers and display timestamps of the preceding video frame and the intermediate frame, i.e. of frame X and the intermediate frame. For example, the AI engine 20 obtains the metadata {Frameno1:1, Frameno2:3, Pts1:1234567, Pts2:1234569}, where frame numbers 1 and 3 identify two adjacent video frames; after generating intermediate frame 2, the AI engine 20 stores the frame insertion metadata {Frameno1:1, Frameno2:2, Pts1:1234567, Pts2:1234568} into the message queue.
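One work cycle of an AI engine, as described above, might look like the following sketch; the helper names are assumptions, the store is a plain dict, and the odd source/even intermediate numbering of the earlier embodiments is assumed:

```python
from collections import deque

def ai_engine_step(task_queue, frame_store, result_queue, interpolate):
    """One work cycle of an AI engine (hypothetical helper names).

    Pops one pair-metadata record, pulls both source frames from the store,
    generates the intermediate frame, stores it under the even number
    between the two odd source numbers, and publishes frame insertion
    metadata keyed to the preceding source frame so the encoder never
    misses the first frame of the pair.
    """
    meta = task_queue.popleft()
    x_no, y_no = meta["Frameno1"], meta["Frameno2"]
    x, y = frame_store[x_no], frame_store[y_no]
    mid_no = x_no + 1                       # even slot between odd neighbours
    mid_pts = (meta["Pts1"] + meta["Pts2"]) // 2
    frame_store[mid_no] = interpolate(x, y)
    result_queue.append({"Frameno1": x_no, "Frameno2": mid_no,
                         "Pts1": meta["Pts1"], "Pts2": mid_pts})
```

Because the pulled source frames stay in the store, several engines can run this loop concurrently against the same queue and database without coordinating with each other.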
It should be understood that each of the AI engines 20 of this embodiment generates an intermediate frame from two adjacent video frames according to its own interpolation algorithm, for example analyzing the two adjacent frames with an optical-flow method and determining the intermediate position of each displaced point between the two frames, thereby generating the intermediate frame. After completing an interpolation task, the AI engines 20 generate the target video frame metadata corresponding to the intermediate frame and store it in the message queue, so that the encoder 30 can obtain the interpolated video frames.
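The patent leaves the interpolation algorithm to each AI engine; a per-pixel average is a trivial stand-in used here only to make the pipeline concrete (a real engine would estimate optical flow and warp both frames toward the midpoint):

```python
def midpoint_frame(frame_a, frame_b):
    """Trivial stand-in for an AI interpolator.

    Frames are nested lists of integer samples; the result is the
    per-pixel average, i.e. a naive "middle" between the two inputs.
    """
    return [[(a + b) // 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]
```

Any function with this signature (two frames in, one frame out) can be plugged into the engine loop, which is what lets different hosts run different interpolation models.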
The encoder 30 is configured to obtain, from the message queue, the audio data decoded by the decoder 10 and the frame inserting metadata generated by the AI engine 20, pull, from the target database, a video frame and an intermediate frame corresponding to the frame inserting metadata, and combine and encode the video frame, the intermediate frame and the audio data, and then output a target video stream.
It should be noted, again taking the case where the decoder 10 generates one metadata record per pair of adjacent video frames: the AI engines 20 pull the metadata generated by the decoder 10 from the message queue, whereupon the message queue automatically deletes it. After generating an intermediate frame, an AI engine generates frame insertion metadata from the frame numbers of the preceding video frame and the intermediate frame and stores it into the message queue. The frame insertion metadata that the encoder 30 obtains from the message queue is therefore the post-interpolation video frame metadata generated by the AI engine 20, containing the frame numbers of the video frame and the intermediate frame. The encoder extracts the frame numbers from the frame insertion metadata, pulls the corresponding video frame and intermediate frame from the target database based on them, merges and encodes the obtained video frames, intermediate frames and audio data, and outputs the target video stream.
In a specific implementation, when the encoder 30 has pulled video frames and intermediate frames from the target database and buffered them locally, it cleans the data in the target database according to a first cleanup policy, and after outputting the target video stream it cleans the target database again according to a second cleanup policy. The first cleanup policy is to delete the currently pulled intermediate frame from the target database, then judge whether the previous intermediate frame has already been deleted and, if so, delete the currently pulled video frame as well. The second cleanup policy is to delete from the target database all data whose frame number is smaller than that of the currently pulled intermediate frame.
Specifically, the decoder 10 is further configured to number a video frame obtained by decoding, generate a second frame number corresponding to the video frame, generate video frame metadata corresponding to the video frame according to the second frame number, store the video frame including the second frame number into the target database, and store the audio data and the video frame metadata into a message queue;
each AI engine 20 is further configured to determine a third frame number corresponding to the pulled video frame, generate a fourth frame number corresponding to the intermediate frame according to the third frame number, generate, according to the third frame number and the fourth frame number, frame insertion metadata corresponding to the intermediate frame, store the intermediate frame including the fourth frame number into the target database, and store the frame insertion metadata into the message queue;
The encoder 30 is further configured to extract a fifth frame number and a sixth frame number from the obtained frame inserting metadata, and pull a video frame and an intermediate frame corresponding to the fifth frame number and the sixth frame number from the target database, respectively.
In a specific implementation, the decoder 10 preferably generates the video frame metadata from the frame numbers of two adjacent video frames, which lets the AI engine 20 directly pull the two reference frames of an interpolation task, and lets the encoder 30 pull the frames to be merged and encoded according to the frame numbers of the pulled video frame and the generated intermediate frame. The first through sixth frame numbers merely distinguish one another and denote the frame numbers of different video frames or intermediate frames.
The distributed live broadcast frame insertion system of this embodiment includes a decoder 10, an encoder 30, and a plurality of AI engines 20 deployed on different hosts. The decoder 10 decodes the acquired video stream to obtain audio data, video frames and video frame metadata, stores the video frames in a target database, and stores the audio data and the video frame metadata in a message queue. Each AI engine 20 obtains the video frame metadata decoded by the decoder 10 from the message queue, pulls the video frames corresponding to that metadata from the target database, generates an intermediate frame from the pulled video frames together with the frame insertion metadata corresponding to the intermediate frame, stores the intermediate frame in the target database, and stores the frame insertion metadata in the message queue. The encoder 30 obtains the audio data decoded by the decoder 10 and the frame insertion metadata generated by the AI engines 20 from the message queue, pulls the video frames and intermediate frames corresponding to the frame insertion metadata from the target database, merges and encodes the video frames, intermediate frames and audio data, and outputs the target video stream. In this manner, the frame insertion task for high-quality video is executed by multiple AI engines 20 in a distributed deployment, so that many interpolated frames can be generated in a short time, meeting the interpolated-frame-count requirement of live broadcast scenes, improving live frame insertion efficiency, and improving the fluency of live video.
Referring to fig. 1, in a second embodiment of the distributed live broadcast frame insertion system of the present invention, the encoder 30 is further configured to: when the video frame and the intermediate frame corresponding to the frame insertion metadata have been buffered locally, delete the intermediate frame from the target database and record the first frame number corresponding to the intermediate frame in a preset integer array; search forward in the preset integer array, based on the first frame number, for the previous intermediate frame number; and, if it is found, delete the video frame from the target database.
It will be appreciated that the encoder 30 obtains metadata from the message queue, pulls video frames from the target database according to the metadata, and buffers them locally; once the buffer reaches a certain size or a certain time, it merges and encodes the audio data, video frames and intermediate frames to produce the output target video stream. The encoder 30 pulls video frames and intermediate frames from the target database in frame pairs, and once a frame pair is buffered locally it directly deletes the pair's intermediate frame from the target database.
It should be noted that when the decoder 10 assigns odd frame numbers to the decoded video frames and the AI engine 20 assigns even numbers to the generated intermediate frames, the first frame number is even; when the decoder 10 assigns even frame numbers to the decoded video frames and the AI engine 20 assigns odd numbers to the generated intermediate frames, the first frame number is odd. The preset integer array is an array set up in advance for recording deleted frame numbers; for example, a 512-element integer array named frameSlot[512] may be defined, and recording frame X as deleted corresponds to setting frameSlot[X % 512] = X (to save resources, this record is not exactly equivalent to the conventional notion of frame X having been deleted). Since frames whose numbers fall outside this range have usually timed out, they are cleaned up directly.
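A minimal sketch of this record array in Python (the helper names and the -1 initial value are assumptions; the text leaves the array's initial contents unspecified):

```python
SLOT_SIZE = 512
frame_slot = [-1] * SLOT_SIZE   # -1: no deletion recorded in this slot yet

def record_deleted(frame_no: int) -> None:
    """Record that frame `frame_no` was deleted: frameSlot[X % 512] = X."""
    frame_slot[frame_no % SLOT_SIZE] = frame_no

def is_recorded_deleted(frame_no: int) -> bool:
    """True only while the slot still holds this exact frame number. A newer
    frame sharing the same slot overwrites the record, which is acceptable
    because frames that far behind have timed out and are cleaned directly."""
    return frame_slot[frame_no % SLOT_SIZE] == frame_no
```

The modulo indexing is what makes the array bounded: it never grows with the stream, at the cost of forgetting deletions more than 512 frames old.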
It should be understood that, because deleted frames are recorded in the preset integer array and this embodiment implements a distributed frame inserting method, intermediate frames are not necessarily generated in order. A video frame in this embodiment serves as reference data both for the intermediate frame after it and for the intermediate frame before it; therefore, only when the first frame number of an intermediate frame and the frame number of the previous intermediate frame are both recorded in the preset integer array is the video frame between them deleted from the target database. The forward search then continues, so that the video frames lying between any two adjacent intermediate frames that have both been cleaned are likewise searched for and cleaned.
The method of this embodiment will be described with reference to fig. 2, which is a schematic diagram of video frame cleaning in an embodiment of the distributed live broadcast frame inserting system of the present invention. In this embodiment, video frames correspond to odd frame numbers and intermediate frames correspond to even frame numbers; the rectangles represent frames arranged in sequence, and the arrows below the rectangles indicate that the corresponding frames have been marked as deleted. Suppose the frame pair already buffered locally is (7, 8). Based on the starting frame number 7 of the current pair, the even frame 8 corresponding to the pair can be deleted from the target database directly, and the deletion of frame 8 is recorded as frameSlot[8 % 512] = 8. Searching forward from the starting frame number (or the frame number of the intermediate frame): when the previous even frame 6 is recorded as deleted, frame 7 is deleted from the target database and recorded; and when frame 4 is recorded as deleted, if frame 5 is not yet recorded as deleted, frame 5 is deleted from the target database and recorded.
The encoder 30 is further configured to search backward in the preset integer array, based on the first frame number, for the next intermediate frame number adjacent to the first frame number, and, if that next intermediate frame number is found recorded in the preset integer array, to delete the video frame following the intermediate frame from the target database.
It should be understood that the video frame following an intermediate frame serves as reference data for both that intermediate frame and the next intermediate frame; when the first frame number of the intermediate frame and the frame number of the next intermediate frame are both recorded in the preset integer array, the video frame following the intermediate frame is deleted from the target database. Referring to fig. 2, searching backward from the starting frame number (or the frame number of the intermediate frame): when the next even frame 10 is recorded as deleted, frame 9 is deleted from the target database and recorded; and when frame 12 is recorded as deleted, if frame 11 is not yet recorded as deleted, frame 11 is deleted from the target database and recorded.
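Combining the forward and backward searches, the cleaning step for one buffered frame pair might look like the sketch below (db is a dict standing in for the target database, and frame_slot is a 512-element integer array initialized to -1; all names are illustrative):

```python
SLOT_SIZE = 512

def clean_around(mid_no, frame_slot, db):
    """After the frame pair whose intermediate frame is `mid_no` has been
    locally buffered: delete the intermediate frame, then delete any video
    frame lying between two intermediate frames that are both recorded as
    deleted, searching toward smaller and then larger frame numbers."""
    def recorded(n):
        return n >= 0 and frame_slot[n % SLOT_SIZE] == n

    def delete(n):
        db.pop(n, None)
        frame_slot[n % SLOT_SIZE] = n

    delete(mid_no)                      # the intermediate frame itself
    n = mid_no
    while recorded(n - 2):              # previous intermediate frame deleted?
        if not recorded(n - 1):
            delete(n - 1)               # the video frame between them can go
        n -= 2
    n = mid_no
    while recorded(n + 2):              # next intermediate frame deleted?
        if not recorded(n + 1):
            delete(n + 1)
        n += 2
```

Walking the chain of already-deleted intermediate frames in both directions is what gives the self-healing behavior described next: a video frame missed by one cleaning pass is picked up by a later one.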
In a specific implementation, the cleaning of frames is executed in parallel, and there is a very low probability that a parallel access anomaly occurs and a frame is missed during cleaning. To guard against this anomaly, this embodiment cleans the target database by searching for the intermediate frames both before and after the current one.
The encoder 30 is further configured to delete the video frames and the intermediate frames having the frame numbers smaller than the first frame number in the target database after outputting the target video stream.
It should be noted that this embodiment also sets up a cleaning flow for timed-out frames: after the current frame is encoded into the target video stream and output, all frames in the target database whose frame numbers are smaller than that of the current frame are cleaned up.
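A sketch of this timed-out-frame cleaning flow (db again stands in for the target database; the helper name is hypothetical, not from the patent):

```python
def clean_timed_out(db: dict, current_no: int) -> int:
    """After the frame numbered `current_no` has been encoded and output,
    drop every frame with a smaller number. Returns how many were removed."""
    stale = [n for n in db if n < current_no]
    for n in stale:
        del db[n]
    return len(stale)
```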
When the encoder 30 in this embodiment has locally buffered the video frame and the intermediate frame corresponding to the frame inserting metadata, it deletes the intermediate frame from the target database, records the first frame number corresponding to the intermediate frame in the preset integer array, searches forward in the preset integer array, based on the first frame number, for the previous intermediate frame number, and, if that previous intermediate frame number is recorded in the preset integer array, deletes the video frame from the target database. In this way, video frames and intermediate frames that have already been encoded and output are deleted, saving memory resources and improving their utilization efficiency, while avoiding the impact of parallel access anomalies on frame inserting efficiency in the distributed live broadcast frame inserting process, further improving live broadcast frame inserting efficiency and the fluency of the live video.
Referring to fig. 1, in a third embodiment of the distributed live broadcast frame inserting system of the present invention, the decoder 10 is further configured to number a video frame obtained by decoding, generate a first odd frame number corresponding to the video frame, and generate video frame metadata corresponding to the video frame according to the first odd frame number;
Each AI engine 20 is further configured to determine a second odd frame number corresponding to the pulled video frame, generate a first even frame number corresponding to the intermediate frame according to an even number adjacent to the second odd frame number, and generate the inserted frame metadata corresponding to the intermediate frame according to the second odd frame number and the first even frame number.
It should be appreciated that, to meet processing requirements, the decoder 10 in this embodiment assigns an odd frame number to each frame, for example: 1, 3, 5, …; the decoder 10 generates video frame metadata from the frame numbers of two adjacent video frames, for example: {frameno1:1, frameno2:3, pts1:1234567, pts2:1234569} is generated from frame numbers 1 and 3 of two adjacent video frames.
Note that, after generating an intermediate frame from two adjacent video frames, the AI engine 20 in this embodiment takes the even number between the two odd frame numbers of the adjacent video frames as the frame number of the intermediate frame, and generates the frame inserting metadata from the odd frame number of the earlier of the two video frames and the even frame number of the intermediate frame. For example, the AI engine 20 obtains the video frame metadata {frameno1:1, frameno2:3, pts1:1234567, pts2:1234569} from the message queue, takes 2 as the even frame number of the generated intermediate frame, and stores {frameno1:1, frameno2:2, pts1:1234567, pts2:1234568} into the message queue. In this embodiment, odd and even numbers distinguish the video frames of the original video stream from the intermediate frames generated by the AI engine 20, so that the encoder 30 can conveniently distinguish and encode the data, further improving data processing efficiency and the fluency of the live video.
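The numbering scheme of this embodiment can be sketched as follows (field names follow the examples above; taking the midpoint of the two pts values as the intermediate frame's pts is an assumption consistent with the pts values shown):

```python
def video_frame_metadata(pts_list):
    """Decoder side: frames are numbered 1, 3, 5, ...; one metadata record
    is emitted per adjacent pair of video frames."""
    return [{"frameno1": 2 * i + 1, "frameno2": 2 * i + 3,
             "pts1": pts_list[i], "pts2": pts_list[i + 1]}
            for i in range(len(pts_list) - 1)]

def frame_inserting_metadata(meta):
    """AI-engine side: the intermediate frame takes the even number between
    the pair's two odd numbers, and its metadata pairs the earlier video
    frame with the intermediate frame."""
    return {"frameno1": meta["frameno1"], "frameno2": meta["frameno1"] + 1,
            "pts1": meta["pts1"], "pts2": (meta["pts1"] + meta["pts2"]) // 2}
```

A consumer can then classify any frame by parity alone: odd means original video frame, even means AI-generated intermediate frame.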
Referring to fig. 1, in a fourth embodiment of the distributed live broadcast frame inserting system of the present invention, the decoder 10 is further configured to number the decoded video frames, generate a second even frame number corresponding to each video frame, and generate video frame metadata corresponding to the video frames according to the second even frame number;
each AI engine 20 is further configured to determine a third even frame number corresponding to the pulled video frame, generate a third odd frame number corresponding to the intermediate frame according to an odd number adjacent to the third even frame number, and generate the inserted frame metadata corresponding to the intermediate frame according to the third even frame number and the third odd frame number.
It should be appreciated that the decoder 10 in this embodiment assigns an even frame number to each frame, for example: 0, 2, 4, …; the decoder 10 generates video frame metadata from the frame numbers of two adjacent video frames, for example: {frameno1:2, frameno2:4, pts1:1234568, pts2:1234570} is generated from frame numbers 2 and 4 of two adjacent video frames.
Note that, after generating an intermediate frame from two adjacent video frames, the AI engine 20 in this embodiment takes the odd number between the two even frame numbers of the adjacent video frames as the frame number of the intermediate frame, and generates the frame inserting metadata from the even frame number of the earlier of the two video frames and the odd frame number of the intermediate frame. For example, the AI engine 20 obtains the video frame metadata {frameno1:2, frameno2:4, pts1:1234568, pts2:1234570} from the message queue, takes 3 as the odd frame number of the generated intermediate frame, and stores {frameno1:2, frameno2:3, pts1:1234568, pts2:1234569} into the message queue. In this embodiment, even and odd numbers distinguish the video frames of the original video stream from the intermediate frames generated by the AI engine 20, so that the encoder 30 can conveniently distinguish and encode the data, further improving data processing efficiency and the fluency of the live video.
Referring to fig. 1, in a fifth embodiment of the distributed live insertion system according to the present invention, the encoder 30 is further configured to obtain an absolute video time and an absolute audio time, determine whether the absolute video time is greater than the absolute audio time, obtain audio data decoded by the decoder 10 from the message queue when the absolute video time is greater than the absolute audio time, and update the absolute audio time according to a display timestamp corresponding to the audio data.
It should be understood that two timers are defined in advance in this embodiment: vtime and atime, where vtime records the video absolute time of the current video and atime records the audio absolute time; initially vtime = 0 and atime = 0. When vtime > atime, audio data decoded by the decoder 10 is obtained from the message queue in queue order, the corresponding time is calculated from the display timestamp of the audio data and assigned to atime, thereby updating atime, and the step of judging whether the video absolute time is greater than the audio absolute time is executed again.
In a specific implementation, the time corresponding to the audio data is calculated according to the following formula:
atime = packet.pts * timebase;
where packet.pts is the display timestamp corresponding to the audio data, and timebase is a time base set in advance.
The encoder 30 is further configured to obtain, from the message queue, the frame inserting metadata generated by the AI engine 20 when the absolute time of the video is less than the absolute time of the audio, pull, from the target database, a corresponding video frame and an intermediate frame according to the frame inserting metadata, and update the absolute time of the video according to a display timestamp corresponding to the video frame or the intermediate frame.
It should be noted that when atime > vtime, the frame inserting metadata generated by the AI engine 20 is obtained from the message queue in queue order, the corresponding video frames and intermediate frames are pulled from the target database, the larger of the display timestamps of the video frame and the intermediate frame is selected as the basis for updating the video absolute time, and the time corresponding to the video data is calculated by the following formula:
vtime = frame.pts * timebase;
where frame.pts is the display timestamp corresponding to the video frame or the intermediate frame, and timebase is a time base set in advance.
vtime is updated with the calculated time corresponding to the video data, and the flow returns to the step of judging whether the video absolute time is greater than the audio absolute time. In this way, the scheduling of audio data and video data in the distributed live broadcast frame inserting process is realized, and the coding efficiency of the encoder 30 is improved.
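The vtime/atime scheduling loop can be sketched as follows (audio packets are represented by their pts values and video frame pairs by (frame_pts, intermediate_pts) tuples; the timebase value and the data shapes are illustrative assumptions):

```python
def mux_schedule(audio_pts, frame_pairs, timebase=1.0):
    """Interleave audio packets and video frame pairs: while vtime > atime,
    consume audio and set atime = packet.pts * timebase; otherwise consume a
    frame pair and update vtime from the larger of its two pts values."""
    vtime = atime = 0.0
    out = []
    ai, vi = iter(audio_pts), iter(frame_pairs)
    a, v = next(ai, None), next(vi, None)
    while a is not None or v is not None:
        if v is None or (a is not None and vtime > atime):
            out.append(("audio", a))
            atime = a * timebase        # atime = packet.pts * timebase
            a = next(ai, None)
        else:
            out.append(("video", v))
            vtime = max(v) * timebase   # vtime = frame.pts * timebase
            v = next(vi, None)
    return out
```

Comparing the two running clocks keeps neither stream from running far ahead of the other, which is what keeps audio and video aligned at the encoder.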
Referring to fig. 3, fig. 3 is a flowchart illustrating a first embodiment of a distributed live insertion method according to the present invention.
The distributed live broadcast frame inserting method of this embodiment is applied to the distributed live broadcast frame inserting system described above, the distributed live broadcast frame inserting system comprising: a decoder, an encoder, and a plurality of AI engines respectively deployed on different hosts. The method comprises:
step S10: the decoder decodes the acquired video stream to obtain audio data, video frames and video frame metadata, stores the video frames in a target database, and stores the audio data and the video frame metadata in a message queue.
Step S20: each AI engine obtains video frame metadata obtained by decoding by the decoder from the message queue, pulls video frames corresponding to the video frame metadata from the target database, generates intermediate frames according to the pulled video frames, generates inserting frame metadata corresponding to the intermediate frames, stores the intermediate frames into the target database, and stores the inserting frame metadata into the message queue.
Step S30: the encoder acquires the audio data decoded by the decoder and the inserted frame metadata generated by the AI engine from the message queue, pulls the video frames and the intermediate frames corresponding to the inserted frame metadata from the target database, combines and encodes the video frames, the intermediate frames and the audio data, and then outputs a target video stream.
It should be understood that the foregoing is illustrative only and is not limiting, and that in specific applications, those skilled in the art may set the invention as desired, and the invention is not limited thereto.
The distributed live broadcast frame inserting system in this embodiment comprises: a decoder, an encoder, and a plurality of AI engines respectively disposed on different hosts. The decoder decodes the acquired video stream to obtain audio data, video frames and video frame metadata, stores the video frames in a target database, and stores the audio data and the video frame metadata in a message queue. Each AI engine obtains video frame metadata decoded by the decoder from the message queue, pulls the video frames corresponding to the video frame metadata from the target database, generates an intermediate frame from the pulled video frames, generates frame inserting metadata corresponding to the intermediate frame, stores the intermediate frame into the target database, and stores the frame inserting metadata into the message queue. The encoder obtains the audio data decoded by the decoder and the frame inserting metadata generated by the AI engines from the message queue, pulls the video frames and intermediate frames corresponding to the frame inserting metadata from the target database, merges and encodes the video frames, intermediate frames and audio data, and outputs the target video stream. In this way, the frame inserting task for high-quality video is executed by multiple AI engines deployed in a distributed manner, so that interpolated frame data for many frames can be generated in a short time, meeting the demand for inserted frames in a live broadcast scene, improving live broadcast frame inserting efficiency, and improving the fluency of the live video.
It should be noted that the above-described working procedure is merely illustrative, and does not limit the scope of the present invention, and in practical application, a person skilled in the art may select part or all of them according to actual needs to achieve the purpose of the embodiment, which is not limited herein.
In addition, technical details not described in detail in this embodiment may refer to the distributed live broadcast frame insertion system provided in any embodiment of the present invention, which is not described herein.
Furthermore, it should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, and of course may also be implemented by means of hardware, but in many cases the former is the preferred embodiment. Based on such understanding, the technical solution of the present invention, essentially or the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and including several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (10)

1. A distributed live insertion system, the distributed live insertion system comprising: a decoder, an encoder, and a plurality of AI engines respectively disposed on different hosts;
The decoder is used for decoding the acquired video stream to obtain audio data, video frames and video frame metadata, storing the video frames into a target database, and storing the audio data and the video frame metadata into a message queue;
each AI engine is configured to obtain video frame metadata decoded by the decoder from the message queue, pull a video frame corresponding to the video frame metadata from the target database, generate an intermediate frame according to the pulled video frame, generate frame inserting metadata corresponding to the intermediate frame, store the intermediate frame into the target database, and store the frame inserting metadata into the message queue;
the encoder is used for acquiring the audio data decoded by the decoder and the frame inserting metadata generated by the AI engine from the message queue, pulling a video frame and an intermediate frame corresponding to the frame inserting metadata from the target database, merging and encoding the video frame, the intermediate frame and the audio data, and then outputting a target video stream;
the decoder is used for determining a reference video frame according to the frame inserting frequency in the frame inserting task and generating video frame metadata according to the frame number and the display time stamp of the reference video frame.
2. The distributed live broadcast frame inserting system according to claim 1, wherein the encoder is further configured to delete the intermediate frame in the target database when the video frame and the intermediate frame corresponding to the frame inserting metadata are obtained by local buffering, record a first frame number corresponding to the intermediate frame in a preset integer array, search forward in the preset integer array, based on the first frame number, for a previous intermediate frame number corresponding to the first frame number, and delete the video frame in the target database if the previous intermediate frame number corresponding to the first frame number is found recorded in the preset integer array.
3. The distributed live insertion system of claim 2, wherein the encoder is further configured to search backward in the preset integer array, based on the first frame number, for a next intermediate frame number corresponding to the first frame number, and delete a next video frame corresponding to the intermediate frame in the target database if the next intermediate frame number corresponding to the first frame number is found recorded in the preset integer array.
4. The distributed live insertion system of claim 2, wherein the encoder is further configured to delete video frames and intermediate frames having a frame number less than the first frame number in the target database after outputting the target video stream.
5. The distributed live insertion system of claim 1, wherein the decoder is further configured to number the decoded video frames, generate a second frame number corresponding to the video frames, generate video frame metadata corresponding to the video frames according to the second frame number, store the video frames including the second frame number in the target database, and store the audio data and the video frame metadata in a message queue;
each AI engine is further configured to determine a third frame number corresponding to the pulled video frame, generate a fourth frame number corresponding to the intermediate frame according to the third frame number, generate, according to the third frame number and the fourth frame number, insert frame metadata corresponding to the intermediate frame, store the intermediate frame including the fourth frame number into the target database, and store the insert frame metadata into the message queue;
the encoder is further configured to extract a fifth frame number and a sixth frame number from the obtained inserted frame metadata, and pull a video frame and an intermediate frame corresponding to the fifth frame number and the sixth frame number from the target database.
6. The distributed live insertion system of claim 5, wherein the decoder is further configured to number the decoded video frames, generate a first odd frame number corresponding to the video frames, and generate video frame metadata corresponding to the video frames according to the first odd frame number;
Each AI engine is further configured to determine a second odd frame number corresponding to the pulled video frame, generate a first even frame number corresponding to the intermediate frame according to an even number adjacent to the second odd frame number, and generate the inserted frame metadata corresponding to the intermediate frame according to the second odd frame number and the first even frame number.
7. The distributed live insertion system of claim 5, wherein the decoder is further configured to number the decoded video frames, generate a second even frame number corresponding to the video frames, and generate video frame metadata corresponding to the video frames according to the second even frame number;
and each AI engine is further configured to determine a third even frame number corresponding to the pulled video frame, generate a third odd frame number corresponding to the intermediate frame according to an odd number adjacent to the third even frame number, and generate the inserted frame metadata corresponding to the intermediate frame according to the third even frame number and the third odd frame number.
8. The distributed live insertion system of claim 1, wherein the encoder is further configured to obtain an absolute video time and an absolute audio time, determine whether the absolute video time is greater than the absolute audio time, obtain audio data decoded by the decoder from the message queue when the absolute video time is greater than the absolute audio time, and update the absolute audio time according to a display timestamp corresponding to the audio data.
9. The distributed live insertion system of claim 8, wherein the encoder is further configured to obtain the insertion metadata generated by the AI engine from the message queue when the absolute time of the video is less than the absolute time of the audio, pull corresponding video frames and intermediate frames from the target database according to the insertion metadata, and update the absolute time of the video according to a display timestamp corresponding to the video frame or the intermediate frame.
10. A distributed live insertion method, wherein the method is applied to a distributed live insertion system as claimed in any one of claims 1 to 9, the distributed live insertion system comprising: a decoder, an encoder, and a plurality of AI engines respectively deployed on different hosts, the method comprising:
the decoder decodes the acquired video stream to obtain audio data, video frames and video frame metadata, stores the video frames in a target database, and stores the audio data and the video frame metadata in a message queue;
each AI engine acquires video frame metadata obtained by decoding by the decoder from the message queue, pulls a video frame corresponding to the video frame metadata from the target database, generates an intermediate frame according to the pulled video frame, generates inserting frame metadata corresponding to the intermediate frame, stores the intermediate frame into the target database, and stores the inserting frame metadata into the message queue;
The encoder acquires audio data decoded by the decoder and the inserted frame metadata generated by the AI engine from the message queue, pulls video frames and intermediate frames corresponding to the inserted frame metadata from the target database, combines and encodes the video frames, the intermediate frames and the audio data, and then outputs a target video stream;
the decoder determines a reference video frame according to the frame inserting frequency in the frame inserting task, and generates video frame metadata according to the frame number and the display time stamp of the reference video frame.
Publications (2)

Publication Number Publication Date
CN114827663A CN114827663A (en) 2022-07-29
CN114827663B true CN114827663B (en) 2023-11-21

Family

ID=82534866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210381421.8A Active CN114827663B (en) 2022-04-12 2022-04-12 Distributed live broadcast frame inserting system and method

Country Status (1)

Country Link
CN (1) CN114827663B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115883869B (en) * 2022-11-28 2024-04-19 江汉大学 Processing method, device and processing equipment of video frame insertion model based on Swin converter
CN116886961B (en) * 2023-09-06 2023-12-26 中移(杭州)信息技术有限公司 Distributed live video frame inserting method, device, system and storage medium

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106341696A (en) * 2016-09-28 2017-01-18 北京奇虎科技有限公司 Live video stream processing method and device
CN109086457A (en) * 2018-09-05 2018-12-25 华南理工大学 A kind of metadata service system that distributed video frame is read at random and working method
US10225587B1 (en) * 2017-12-19 2019-03-05 Novatek Microelectronics Corp. Motion estimation method for frame rate converter and video processor using the same
CN109891891A (en) * 2016-11-08 2019-06-14 Ati科技无限责任公司 It is converted using the video frame rate of streaming metadata
CN110267098A (en) * 2019-06-28 2019-09-20 连尚(新昌)网络科技有限公司 A kind of method for processing video frequency and terminal
CN111010589A (en) * 2019-12-19 2020-04-14 腾讯科技(深圳)有限公司 Live broadcast method, device, equipment and storage medium based on artificial intelligence
CN111641835A (en) * 2020-05-19 2020-09-08 Oppo广东移动通信有限公司 Video processing method, video processing device and electronic equipment
CN112584077A (en) * 2020-12-11 2021-03-30 北京百度网讯科技有限公司 Video frame interpolation method and device and electronic equipment
CN112788235A (en) * 2020-12-31 2021-05-11 深圳追一科技有限公司 Image processing method, image processing device, terminal equipment and computer readable storage medium
CN113014936A (en) * 2021-02-24 2021-06-22 北京百度网讯科技有限公司 Video frame insertion method, device, equipment and storage medium
CN113014937A (en) * 2021-02-24 2021-06-22 北京百度网讯科技有限公司 Video frame insertion method, device, equipment and storage medium
CN215453147U (en) * 2021-08-13 2022-01-07 北京都是科技有限公司 Live broadcast equipment and live broadcast system
US11250888B1 (en) * 2020-12-14 2022-02-15 Shanghai Imilab Technology Co., Ltd. Flash memory and method for storing and retrieving embedded audio video data
CN114245035A (en) * 2021-12-17 2022-03-25 深圳市慧鲤科技有限公司 Video generation method and device, equipment and medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6357042B2 (en) * 1998-09-16 2002-03-12 Anand Srinivasan Method and apparatus for multiplexing separately-authored metadata for insertion into a video data stream
JP5201408B2 (en) * 2008-09-30 2013-06-05 ソニー株式会社 Frame frequency conversion apparatus, frame frequency conversion method, program for executing the method, computer-readable recording medium recording the program, motion vector detection apparatus, and prediction coefficient generation apparatus
US8498334B1 (en) * 2010-02-03 2013-07-30 Imagination Technologies Limited Method and system for staggered parallelized video decoding
US10123020B2 (en) * 2016-12-30 2018-11-06 Axis Ab Block level update rate control based on gaze sensing
FR3103667A1 (en) * 2019-11-27 2021-05-28 Sagemcom Broadband Sas Decoder equipment transmitting metadata to auxiliary equipment to control it

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106341696A (en) * 2016-09-28 2017-01-18 Beijing Qihoo Technology Co., Ltd. Live video stream processing method and device
CN109891891A (en) * 2016-11-08 2019-06-14 ATI Technologies ULC Video frame rate conversion using streaming metadata
US10225587B1 (en) * 2017-12-19 2019-03-05 Novatek Microelectronics Corp. Motion estimation method for frame rate converter and video processor using the same
CN109086457A (en) * 2018-09-05 2018-12-25 South China University of Technology Metadata service system for random reading of distributed video frames, and working method thereof
CN110267098A (en) * 2019-06-28 2019-09-20 Lianshang (Xinchang) Network Technology Co., Ltd. Video processing method and terminal
CN111010589A (en) * 2019-12-19 2020-04-14 Tencent Technology (Shenzhen) Co., Ltd. Artificial-intelligence-based live broadcast method, apparatus, device, and storage medium
CN111641835A (en) * 2020-05-19 2020-09-08 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Video processing method, video processing apparatus, and electronic device
CN112584077A (en) * 2020-12-11 2021-03-30 Beijing Baidu Netcom Science and Technology Co., Ltd. Video frame interpolation method and apparatus, and electronic device
US11250888B1 (en) * 2020-12-14 2022-02-15 Shanghai Imilab Technology Co., Ltd. Flash memory and method for storing and retrieving embedded audio video data
CN112788235A (en) * 2020-12-31 2021-05-11 Shenzhen Zhuiyi Technology Co., Ltd. Image processing method and apparatus, terminal device, and computer-readable storage medium
CN113014936A (en) * 2021-02-24 2021-06-22 Beijing Baidu Netcom Science and Technology Co., Ltd. Video frame interpolation method, apparatus, device, and storage medium
CN113014937A (en) * 2021-02-24 2021-06-22 Beijing Baidu Netcom Science and Technology Co., Ltd. Video frame interpolation method, apparatus, device, and storage medium
CN215453147U (en) * 2021-08-13 2022-01-07 Beijing Doushi Technology Co., Ltd. Live broadcast device and live broadcast system
CN114245035A (en) * 2021-12-17 2022-03-25 Shenzhen Huili Technology Co., Ltd. Video generation method and apparatus, device, and medium

Also Published As

Publication number Publication date
CN114827663A (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN114827663B (en) Distributed live broadcast frame inserting system and method
US10448031B2 (en) Method of generating media file and storage medium storing media file generation program
US8483551B2 (en) Method for generating double-speed IDR-unit for trick play, and trick play system and method using the same
CN101843099B (en) Apparatus and method of storing video data
WO2000014741A1 (en) Method and device for managing multimedia file
JP2006172432A (en) System and method for converting compact media format files to synchronized multimedia integration language
WO2010151785A1 (en) Time compressing video content
EP1997109A1 (en) Converting a still image in a slide show to a plurality of video frame images
CN103702133A (en) Image compression display method and image compression display device
CN112653904B (en) Rapid video clipping method based on PTS and DTS modification
US6693959B1 (en) Method and apparatus for indexing and locating key frames in streaming and variable-frame-length data
CN114205613A (en) Method and system for synchronously compressing internet audio and video data
US20070154164A1 (en) Converting a still image in a slide show to a plurality of video frame images
US20050141861A1 (en) Method and system for reverse playback of compressed data
EP1455360A2 (en) Disc apparatus, disc recording method, disc playback method, recording medium, and program
JP4667356B2 (en) Video display device, control method therefor, program, and recording medium
CN112019878B (en) Video decoding and editing method, device, equipment and storage medium
EP1643764A1 (en) Video reproducing apparatus
CN112449196B (en) Decoding method of concurrent video session IP frame image group
JP2003023600A (en) Image processor, animation recording/playback equipment, image processing method, program, and computer- readable storage medium
JP3547210B2 (en) Video data generator with sound
CN111447486A (en) Reverse playing method for forward pushing of historical code stream
JP2004096474A (en) Reproducing method and system of moving picture data
JP2000050208A (en) Image reproduction method
CN114245231B (en) Multi-video synchronous skipping method, device and equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant