CN115119009A - Video alignment method, video encoding device and storage medium - Google Patents
- Publication number
- CN115119009A (application CN202210759941.8A)
- Authority
- CN
- China
- Prior art keywords
- image group
- time
- video
- encoding
- index number
- Prior art date
- Legal status
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/2187—Live feed
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
- H04N21/8547—Content authoring involving timestamps for synchronizing content
Abstract
The application relates to a video alignment method, a video encoding device, and a storage medium. During encoding, the standard index number of an image group is corrected based on that group's encoding delay, the corrected index number being larger than the standard index number. When slices are generated, the encoding delay can therefore be compensated based on the corrected index number. This avoids the situation in which a playback end, downloading slices from the generation server on a standard-frame-rate schedule, requests a slice that has not yet been produced, which would cause slice-production delay at the live broadcast cloud. Live code streams of different definitions are thereby accurately aligned in a live broadcast scene.
Description
Technical Field
The present application relates to the field of computers, and in particular, to a video alignment method, a video encoding device, and a storage medium.
Background
Currently, the HLS (HTTP Live Streaming) protocol is a mainstream technology for live broadcast services. Live streaming based on the HLS protocol can mitigate playback stalling. For example, in scenes with obvious network fluctuation, such as on a subway, live video frequently stalls, and switching to a lower-definition multimedia stream can largely avoid the stalling.
However, since slices of multimedia streams with different definitions are usually distributed across different servers, or across different task processes of the same server, switching between streams of different definitions in a live video scene raises the problem that the slices of the different-definition streams and their audio/video timestamps cannot be accurately aligned.
Disclosure of Invention
The application provides a video alignment method, a video coding device and a storage medium, which are used for solving the problem that slices of multimedia streams with different definitions and audio and video timestamps cannot be aligned accurately.
In a first aspect, a video encoding method is provided, including:
for any one of the N live broadcast code streams, acquiring the encoding time, the standard index number, and the starting reference time of any image group in that stream; the starting reference time is the encoding completion time, on the encoding-production virtual time axis, of the first video frame among the N live broadcast code streams;
calculating the coding delay of any one image group based on the coding time, the standard index number and the starting reference time;
updating the standard index number according to the coding delay to obtain the index number of any image group; and when the coding delay is longer than the time length of a standard image group, the index number is larger than the standard index number.
Optionally, calculating an encoding delay of the arbitrary one group of pictures based on the encoding time, the standard index number, and the start reference time includes:
calculating the product of the standard index number and the standard image group duration to obtain the encoding time offset of any one image group on the encoding production virtual time axis;
calculating the sum of the coding time offset and the initial reference time to obtain a summation result;
and calculating the difference between the coding time and the summation result to obtain the coding delay.
Optionally, updating the standard index number according to the encoding delay to obtain the index number of the any one image group, including:
judging whether the encoding delay is greater than N standard image group durations;
if so, determining the index number of the image group as the standard index number plus N;
if not, updating N to N-1 and returning to the judging step, until N is updated to 1; at that point, if the encoding delay is greater than one standard image group duration, determining the index number of the image group as the standard index number plus 1, and otherwise determining it as the standard index number.
In a second aspect, a video alignment method is provided, including:
acquiring any image group in a path of live broadcast code stream;
analyzing the any image group to obtain an alignment parameter of the any image group, wherein the alignment parameter comprises a starting reference time, an index number of the any image group and a timestamp offset of a first video frame in the any image group, the starting reference time is a coding completion time of the first video frame in the N paths of live broadcast code streams on a coding production virtual time axis, and the index number of the any image group is related to coding delay of the any image group; the timestamp offset is the offset of the first video frame relative to the encoding timestamp of the first video frame in a target image group, and the target image group is a first image group processed when a slicing task for slicing any one image group is started;
determining the segmentation position of the any image group in the live broadcast code stream according to the index number of the any image group;
and determining the audio playing time and the video playing time of any one image group based on the starting reference time and the timestamp offset.
Optionally, determining a segmentation position of the any image group in the live broadcast code stream according to the index number of the any image group includes:
calculating the product of the index number and the time length of the standard image group to obtain the standard encoding time offset of any one image group on the encoding production virtual time axis;
dividing the standard encoding time offset by a preset slice period duration to obtain a quotient and a remainder;
determining the serial number of the slice to which the arbitrary image group belongs based on the quotient value, and determining the position of the arbitrary image group in the slice to which the arbitrary image group belongs based on the remainder value;
and taking the serial number of the slice and the position in the slice as the slicing position.
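The steps above can be sketched as a small calculation. The following is an illustrative Python sketch, not part of the patent; it assumes the 2 s image group and 6 s slice period used as examples elsewhere in the document, and the function name is hypothetical:

```python
def slice_position(index_number: int, gop_time_ms: int = 2000,
                   slice_period_ms: int = 6000) -> tuple[int, int]:
    """Map a group of pictures to (slice number, GOP position within slice)."""
    # Standard encoding time offset of this GOP on the encoding-production
    # virtual time axis.
    offset_ms = index_number * gop_time_ms
    # The quotient selects the slice; the remainder locates the GOP inside it.
    slice_no, remainder_ms = divmod(offset_ms, slice_period_ms)
    return slice_no, remainder_ms // gop_time_ms
```

Because the position is derived from the index number rather than from arrival time, encoders for every definition place the same content at the same slice number.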
Optionally, determining an audio playing time and a video playing time of the arbitrary one of the groups of pictures based on the starting reference time and the timestamp offset includes:
calculating the sum of the starting reference time and the timestamp offset to obtain the time offset of the first video frame in any one image group on the coding production virtual time axis;
for each video frame in any one image group, respectively correcting the video coding timestamp and the video display timestamp of each video frame by using the time offset to obtain a corrected video coding timestamp corresponding to the video coding timestamp and a corrected video display timestamp corresponding to the video display timestamp; respectively correcting the audio coding time stamp and the audio display time stamp of each audio frame by adopting the time offset to obtain a corrected audio coding time stamp corresponding to the audio coding time stamp and a corrected audio display time stamp corresponding to the audio display time stamp;
taking the modified video coding timestamp and the modified video display timestamp as the video timestamp of each video frame; and taking the modified audio encoding timestamp and the modified audio display timestamp as the audio timestamp of each audio frame;
taking the playing time indicated by the video time stamps of all the video frames in the any one image group as the video playing time of the any one image group; and taking the playing time indicated by the audio time stamps of all the audio frames in the arbitrary image group as the audio playing time of the arbitrary image group.
Optionally, for each video frame in the any one group of pictures, respectively correcting the video coding timestamp and the video display timestamp of the each video frame by using the time offset, so as to obtain a corrected video coding timestamp corresponding to the video coding timestamp and a corrected video display timestamp corresponding to the video display timestamp, including:
calculating the sum of the video coding time stamp of each video frame and the time offset to obtain a first summation result; calculating the sum of the video display time of each video frame and the time offset to obtain a second summation result;
and using the first summation result as the modified video coding time stamp, and using the second summation result as the modified video display time stamp.
Optionally, the modifying, with the time offset, the audio coding time stamp and the audio display time stamp of each audio frame respectively to obtain a modified audio coding time stamp corresponding to the audio coding time stamp and a modified audio display time stamp corresponding to the audio display time stamp includes:
calculating the sum of the audio coding time stamp of each audio frame and the time offset to obtain a first summation result; calculating the sum of the audio display time of each audio frame and the time offset to obtain a second summation result;
taking the first summation result as the modified audio encoding timestamp and the second summation result as the modified audio display timestamp.
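The timestamp corrections in the preceding paragraphs are all the same additive shift applied to each frame's encoding and display timestamps. A minimal sketch follows (illustrative Python; `Frame` and `align_timestamps` are hypothetical names, timestamps in milliseconds):

```python
from dataclasses import dataclass

@dataclass
class Frame:
    dts: int  # encoding (decode) timestamp, ms
    pts: int  # display (presentation) timestamp, ms

def align_timestamps(frames: list[Frame], start_ref_ms: int,
                     ts_offset_ms: int) -> list[Frame]:
    """Shift every frame's DTS and PTS by the time offset of the group's
    first video frame on the encoding-production virtual time axis."""
    time_offset = start_ref_ms + ts_offset_ms
    return [Frame(f.dts + time_offset, f.pts + time_offset) for f in frames]
```

The same function applies unchanged to audio frames, since the audio encoding and display timestamps are corrected by the identical offset.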
In a third aspect, a video encoding apparatus is provided, including:
the first acquisition module is used for acquiring the coding time, the standard index number and the starting reference time of any image group in any one path of live broadcast code stream in the N paths of live broadcast code streams; the starting reference time is the encoding completion time of the first frame of video frame in the N paths of live broadcast code streams on the virtual time axis of encoding production;
a calculating module, configured to calculate an encoding delay of the any one image group based on the encoding time, the standard index number, and the starting reference time;
the updating module is used for updating the standard index number according to the coding delay to obtain the index number of any image group; and when the encoding delay is longer than the time length of the standard image group, the index number is larger than the standard index number.
In a fourth aspect, a video alignment apparatus is provided, comprising:
the second acquisition module is used for acquiring any image group in a path of live broadcast code stream;
the analyzing module is configured to analyze the image group and obtain its alignment parameters, where the alignment parameters include a starting reference time, the index number of the image group, and a timestamp offset of the first video frame in the image group; the starting reference time is the encoding completion time, on the encoding-production virtual time axis, of the first video frame among the N live broadcast code streams, and the index number of the image group is related to the encoding delay of the image group; the timestamp offset is the offset of the first video frame relative to the encoding timestamp of the first video frame in a target image group, where the target image group is the first image group processed when the slicing task that slices the image group is started;
the first determining module is used for determining the segmentation position of any image group in the live broadcast code stream according to the index number of the any image group;
and the second determining module is used for determining the audio playing time and the video playing time of any one image group based on the starting reference time and the timestamp offset.
In a fifth aspect, a video alignment system is provided, comprising:
an encoding server and a slicing server;
the encoding server is used for acquiring the encoding time, the standard index number and the starting reference time of any image group in any one path of live broadcast code stream in the N paths of live broadcast code streams; the starting reference time is the encoding completion time of the first frame of video frame in the N paths of live broadcast code streams on the virtual time axis of encoding production; calculating the coding delay of any one image group based on the coding time, the standard index number and the starting reference time; updating the standard index number according to the coding delay to obtain the index number of any image group; when the coding delay is longer than the time length of a standard image group, the index number is larger than the standard index number;
the slicing server is used for acquiring any image group in a path of live broadcast code stream; analyzing the any image group to obtain an alignment parameter of the any image group, wherein the alignment parameter comprises a starting reference time, an index number of the any image group and a timestamp offset of a first video frame in the any image group, the starting reference time is a coding completion time of the first video frame in the N paths of live broadcast code streams on a coding production virtual time axis, and the index number of the any image group is related to coding delay of the any image group; the timestamp offset is the offset of the first video frame relative to the encoding timestamp of the first video frame in a target image group, and the target image group is a first image group processed when a slicing task for slicing any one image group is started; determining the segmentation position of the any image group in the live broadcast code stream according to the index number of the any image group; and determining the audio playing time and the video playing time of any one image group based on the starting reference time and the timestamp offset.
In a sixth aspect, an electronic device is provided, which includes: the system comprises a processor, a memory and a communication bus, wherein the processor and the memory are communicated with each other through the communication bus;
the memory for storing a computer program;
the processor is configured to execute the program stored in the memory to implement the video encoding method of the first aspect or the video alignment method of the second aspect.
In a seventh aspect, a computer-readable storage medium is provided, which stores a computer program, wherein the computer program is configured to implement the video encoding method according to the first aspect or the video alignment method according to the second aspect when executed by a processor.
Compared with the prior art, the technical solution provided by the embodiments of the application has the following advantages. During encoding, the standard index number of an image group is corrected based on that group's encoding delay, and the corrected index number is larger than the standard index number. When slices are generated, the encoding delay can therefore be compensated based on the corrected index number. This avoids the situation in which a playback end, downloading slices from the generation server on a standard-frame-rate schedule, requests a slice that has not yet been produced, which would cause slice-production delay at the live broadcast cloud, and thus accurately aligns live code streams of different definitions in a live broadcast scene.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a flowchart illustrating a video encoding method according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a video alignment method according to an embodiment of the present application;
FIG. 3 is a block diagram of an exemplary video encoding apparatus;
FIG. 4 is a schematic structural diagram of a video alignment apparatus according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a video alignment system according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In the related art, slice alignment is usually performed assuming a constant frame rate, i.e., the signal source stream to be transcoded for live broadcast is assumed to be a standard stream. Test observation shows that, in this ideal case, slice-aligned production behaves normally. However, when the source stream for transcoded live broadcast is not delivered at the standard constant frame rate, the playback end experiences download delay.
For example, at the standard frame rate, a group of pictures (GOP) of 2 s contains 50 frames. When slicing by GOP, if a slice is 6 s, it should contain 3 GOPs, i.e., 150 video frames. If the frame rate is insufficient, for example 24.9 fps, the interval between two adjacent video frames is slightly larger than 1000/25 = 40 ms. Since each GOP still consists of 50 video frames, its actual duration is slightly longer than 2 s, so each live slice lasts slightly longer than 6 s and slice production runs slightly "late", i.e., each slice is produced slightly later than it would be at the standard frame rate. These small per-slice differences accumulate, and the deviation grows over time. When the playback end downloads slices from the generation server on a standard-frame-rate schedule, the corresponding slice has not yet been produced, causing slice-production delay at the live broadcast cloud.
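The drift described above is easy to quantify. The sketch below (an illustrative Python calculation, not part of the patent) computes how far slice production falls behind the standard-frame-rate schedule after a given number of 6 s slices, using the 24.9 fps example from the text:

```python
def slice_drift_ms(slice_count: int, frames_per_gop: int = 50,
                   gops_per_slice: int = 3, actual_fps: float = 24.9,
                   standard_fps: float = 25.0) -> float:
    """Milliseconds by which slice production lags the standard-frame-rate
    schedule after `slice_count` slices (example figures from the text)."""
    frames = slice_count * frames_per_gop * gops_per_slice
    actual_end_ms = frames / actual_fps * 1000.0      # real wall-clock duration
    standard_end_ms = frames / standard_fps * 1000.0  # ideal 6 s-per-slice schedule
    return actual_end_ms - standard_end_ms
```

With these example numbers each slice adds roughly 24 ms of lag, so after 100 slices (ten minutes of live content) the cloud is already more than 2 s behind the playback end's download schedule.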
In order to solve the above problems in the related art, the present embodiment provides a video encoding method, which can be applied to an encoding server.
As shown in fig. 1, the method may include the following steps.
Step 101: for any one of the N live broadcast code streams, acquire the encoding time, the standard index number, and the starting reference time of any image group in that stream.
It should be understood that, among the N channels of live broadcast code streams, multiple channels of live broadcast code streams with mutually different definitions are obtained by transcoding the same signal source stream by the encoding server. Each path of live broadcast code stream consists of a plurality of GOPs.
In this embodiment, the encoding time of any one group of pictures is the time when the encoding server actually encodes any one group of pictures. It should be understood that when the temporal interval of the video frames is unstable, so that the frame rate of the signal source stream is lower than the standard frame rate, the encoding time of any image group is later than that of the image group at the standard frame rate.
It should be understood that, in this embodiment, the standard index number of any image group is the index number of the image group determined when the frame rate of the signal source stream is the standard frame rate. Accordingly, the standard index number indicates the encoding order in which the group of pictures is encoded when the frame rate of the signal source stream is the standard frame rate. For example, when the standard index number of the group of pictures is 2, it indicates that the group of pictures is the second group of pictures that have been encoded in the live broadcast stream to which the group of pictures belongs.
In application, the coding server can synchronously start the codes of the N paths of live code streams and can also successively start the codes of the N paths of live code streams. In this embodiment, in order to achieve alignment of multiple paths of live broadcast code streams, a time when a first frame of video frame in N paths of live broadcast code streams is encoded on an encoding production virtual time axis is used as an initial reference time, that is, for each path of live broadcast code stream in the N paths of live broadcast code streams, it is considered that each path of live broadcast code stream is encoded from the first frame of video frame, and a time when the first frame of video frame is encoded is the encoding completion time.
In this embodiment, the virtual time axis of encoding production is kept consistent with the time local to the encoding server, that is, the time on the virtual time axis of encoding production is actually the time local to the encoding server.
Step 102: calculate the encoding delay of the image group based on the encoding time, the standard index number, and the starting reference time.
It should be understood that the encoding delay indicates when any one group of pictures is delayed from being encoded with respect to the signal source stream at the standard frame rate.
In an alternative embodiment, the specific calculation process of the coding delay may include: calculating the product of the standard index number and the standard image group duration to obtain the encoding time offset of any image group on the encoding production virtual time axis; calculating the sum of the coding time offset and the initial reference time to obtain a summation result; and calculating the difference between the coding time and the summation result to obtain the coding delay.
It should be understood that the standard image group duration is the preset duration of one image group assuming the frame rate of the signal source stream is the standard frame rate. For example, when the standard frame rate is 25 fps and one image group contains 50 video frames, the standard image group duration is (1/25 fps) × 50 = 2 s.
In application, each group of pictures consists of one key frame and a plurality of non-key frames. The key frame is typically the first frame in the group of pictures. The index number of the group of pictures is carried by the key frame. Therefore, the encoding time offset in this embodiment indicates that the encoding of the first frame video frame of any image group is later than the time of the first frame video frame in the N-way live broadcast code stream.
In application, when the encoding server transcodes the signal source stream, it does so GOP by GOP; during transcoding, the transcoding server encapsulates the index number as parameter information into an SEI data unit and attaches it to the key frame of the image group. SEI (Supplemental Enhancement Information) is a data unit for carrying side information defined in the H.265/H.264 video coding standards.
It should be understood that the summation result calculated based on the encoding time offset and the starting reference time represents the time when any one image group is encoded assuming that the frame rate of the signal source stream is the standard frame rate.
In this embodiment, the formula used for calculating the coding delay is as follows:
delay=current_ntp_time–(start_ntp_time+gop_index*gop_time)
wherein, delay is coding delay, current _ ntp _ time is coding time, start _ ntp _ time is starting reference time, gop _ index is standard index number of any group of pictures, and gop _ time is standard group of pictures duration.
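As a minimal sketch (Python used purely for illustration; the variable names follow the formula above, and the 2000 ms default assumes the 2 s standard image group duration from the examples):

```python
def encoding_delay_ms(current_ntp_time: int, start_ntp_time: int,
                      gop_index: int, gop_time_ms: int = 2000) -> int:
    """delay = current_ntp_time - (start_ntp_time + gop_index * gop_time).

    All times are in milliseconds on the encoding-production virtual
    time axis (which tracks the encoding server's local time).
    """
    return current_ntp_time - (start_ntp_time + gop_index * gop_time_ms)
```

A positive result means the image group finished encoding later than it would have at the standard frame rate.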
Step 103: update the standard index number according to the encoding delay to obtain the index number of the image group; when the encoding delay is greater than one standard image group duration, the index number is larger than the standard index number.
The embodiment adopts a mode of increasing the index number to catch up with the coding delay. During specific implementation, the updating size of the standard index number is determined based on the relative relation between the coding delay and the time length of the standard image group.
In an optional embodiment: judge whether the coding delay is greater than the duration of N standard image groups; if it is, determine the index number of any image group as the standard index number plus N; if it is not, update N to N-1 and return to the step of judging whether the coding delay is greater than the duration of N standard image groups, until N is updated to 1, at which point the index number of any image group is determined as the standard index number plus 1 if the coding delay is greater than the duration of 1 standard image group, and as the standard index number otherwise.
In practical applications, in order to offset coding delay in time and avoid the occurrence of the slice skipping phenomenon, N may be preferably set to 1.
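The iterative N-check described above can be sketched as follows; the function name and signature are illustrative assumptions:

```python
def update_index(std_index: int, delay: float, gop_time: float,
                 n_max: int = 1) -> int:
    """Return the corrected index number: std_index + N for the largest
    N (N <= n_max) with delay > N * gop_time, else std_index unchanged."""
    for n in range(n_max, 0, -1):          # judge N, then N-1, ... down to 1
        if delay > n * gop_time:
            return std_index + n
    return std_index

# With gop_time = 2 s: a 2.5 s delay bumps the index by 1, while a 1 s
# delay leaves it unchanged (N is preferably set to 1, per the text above).
print(update_index(3, 2.5, 2.0))  # 4
print(update_index(3, 1.0, 2.0))  # 3
```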
In the technical scheme provided by this embodiment, the standard index number of an image group is corrected during encoding based on the coding delay of that image group, and the corrected index number is larger than the standard index number. When slices are generated, the coding delay can therefore be absorbed based on the corrected index number. This avoids the problem that, when a playing end requests a slice from the generation server according to standard-frame-rate timing, the corresponding slice has not yet been produced, causing slice production delay at the live broadcast cloud; alignment of live broadcast code streams of different definitions is thus accurately achieved in a live broadcast scene.
The embodiment provides a video alignment method which can be applied to a slicing server. It should be understood that the live code streams with different definitions may be sliced in different threads of the same process, i.e., on the same slicing server; of course, the live code streams with different definitions may also be sliced on different slicing servers, which is not specifically limited in this embodiment.
As shown in fig. 2, the method may include the steps of:
and step 204, determining the audio playing time and the video playing time of any one image group based on the starting reference time and the timestamp offset.
In application, the slicing server can pull a stream from the encoding server or the transit server to acquire any image group in a live broadcast code stream. When pulling from the transit server, the encoding server transcodes the signal source stream and uploads the transcoded result to the transit server so that the slicing server can pull the stream from there.
It should be understood that the slicing server may perform slicing processing on one live broadcast code stream or multiple live broadcast code streams to obtain sliced data meeting HLS live broadcast rules. When the slicing server slices the multi-path live code streams, different processes in the slicing server or different threads in the same process respectively slice different live code streams.
In this embodiment, the slicing server performs slicing processing on the live code stream with a GOP (group of pictures) as the minimum unit. In order to realize data alignment between different paths of live broadcast code streams and avoid download delay at the playing end, each image group in each of the N paths of live broadcast code streams carries an index number (gop_index), so that after pulling a GOP, the slicing server can determine its slicing position based on that index number. It should be understood that, since the gop_index carried in a group of pictures is related to the coding delay of that group of pictures on the encoding server side, aligning groups of pictures based on gop_index eliminates the "slice production delay" problem caused by coding delay during slicing.
In a specific implementation, in an optional embodiment: the product of the index number and the standard group-of-pictures duration is calculated to obtain the standard encoding time offset of any image group on the encoding production virtual time axis; a quotient value and a remainder value are obtained by dividing the standard encoding time offset by a preset slice period duration; the serial number of the slice to which any image group belongs is determined based on the quotient value, and the position of any image group within that slice is determined based on the remainder value; the slice serial number and the position within the slice are taken as the slicing position.
It should be understood that the standard image group duration here refers to the duration of one image group when the frame rate of the signal source stream is the standard frame rate. For example, with a standard frame rate of 25 fps, an image group consisting of 50 video frames has a standard image group duration of (1/25) s × 50 = 2 s. It should be noted that the source stream usually carries a frame rate, which is the standard frame rate; however, due to various uncertain factors in practical application, the source stream is often not transmitted exactly at the standard frame rate, and the time difference between two video frames may be slightly larger than 1/25 s, i.e., 40 ms.
In application, in order to improve the alignment efficiency of video data, after downloading a first GOP, the slicing server calculates the standard group-of-pictures duration (gop_time) of the first GOP and caches the first GOP. Whenever the standard group-of-pictures duration needs to be acquired, the gop_time of the first GOP is retrieved and used as the standard group-of-pictures duration. It should be understood that the first GOP here refers to the first GOP downloaded after the slicing server starts: if the slicing server has not failed, the gop_index of the first GOP is 0; if the slicing server fails and restarts in the course of pulling and slicing, the gop_index of the first GOP is not necessarily 0.
In this embodiment, the slice period duration reflects the slicing rule. The slice period duration is the least common multiple of the slice duration and the standard image group duration. For example, if the slice duration segment_time is 5 s and gop_time is 2 s, the slice period duration is 10 s (the least common multiple of 5 and 2).
In this embodiment, the slicing duration is preset manually according to the service requirement. For example, the slice duration may be set to 5s or 6s, and so on. The slice duration is the same for different way (different definition) slices.
Taking a slice duration of 5 s and a gop_time of 2 s as an example, the slicing rule reflected by the slice period duration is described. Since each slice must contain only complete GOPs, in this example it can only be guaranteed that the average slice duration within one slice period is 5 s. The resulting slicing rule may thus be 6 s, 4 s, 6 s, 4 s, where a 6 s slice contains 3 GOPs and a 4 s slice contains 2 GOPs; the first 6 s slice and 4 s slice constitute the first slice period, and the second 6 s slice and 4 s slice constitute the second slice period.
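One plausible reading of this slicing rule is to cut at the first GOP boundary at or after each multiple of segment_time within a slice period. The sketch below (Python 3.9+ for math.lcm; all names are assumptions) reproduces the 6 s / 4 s pattern:

```python
from math import lcm  # Python 3.9+

def slice_durations(segment_time: int, gop_time: int) -> list:
    """Slice durations within one slice period (lcm of segment_time and
    gop_time), cutting at the first GOP boundary >= each segment boundary."""
    period = lcm(segment_time, gop_time)
    cuts, target = [0], segment_time
    while target < period:
        cut = -(-target // gop_time) * gop_time  # ceil target to GOP boundary
        if cut < period:
            cuts.append(cut)
        target += segment_time
    cuts.append(period)
    return [b - a for a, b in zip(cuts, cuts[1:])]

print(slice_durations(5, 2))  # [6, 4]: averages 5 s over a 10 s period
print(slice_durations(6, 2))  # [6]: segment_time is already a GOP multiple
```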
In this embodiment, since the standard index number of any image group is increased when its coding delay is greater than the standard group-of-pictures duration, the index number of that image group jumps relative to the index number of the adjacent previous image group. Thus, when the slice position is determined based on the index number, some slices may lack a group of pictures.
In one example, the segment _ time is set to 6s, the gop _ time is set to 2s, and the least common multiple of the segment _ time and the gop _ time is 6s, so that the slice period duration can be determined to be 6 s.
In the first case, a path of live broadcast code stream includes 5 image groups with index numbers 0, 2, 3, 4, and 5. According to the above calculation method for determining the slice position, for index number 2 the standard encoding time offset is gop_index × gop_time = 2 × 2 s = 4 s; dividing 4 s by 6 s gives quotient 0 and remainder 4, so the image group belongs to slice 1, and its position within the slice is 4/2 + 1 = 3, i.e., it is the 3rd GOP of the first slice. Similarly, GOP0 is the 1st GOP of the first slice, GOP3 is the 1st GOP of the second slice, GOP4 is the 2nd GOP of the second slice, and GOP5 is the 3rd GOP of the second slice.
That is, in this case, the jumped GOP (index number 2) still belongs to the first slice, which therefore lacks its intermediate GOP.
In the second case, the index numbers of the 5 image groups included in a path of live broadcast code stream are 0, 1, 3, 4, and 5, respectively; here the image group with index number 3 is the one whose index jumps. According to the above calculation method for determining the slice position, it is the 1st GOP of the second slice, so the first slice lacks one GOP, namely its last GOP.
In the third case, the index numbers of the 5 image groups included in a path of live broadcast code stream are 0, 1, 2, 4, and 5, respectively; here GOP4 is the image group whose index jumps. According to the calculation method for determining the slice position, it is the 2nd GOP of the second slice, so the second slice lacks one GOP, namely its first GOP.
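The quotient/remainder rule and the three jump cases above can be checked with a small sketch (names are illustrative; slice numbers and in-slice positions are 1-based, matching the text):

```python
def slice_position(gop_index: int, gop_time: int, period: int):
    """Slice serial number and in-slice position from the standard encoding
    time offset (gop_index * gop_time), per the quotient/remainder rule."""
    offset = gop_index * gop_time
    return offset // period + 1, (offset % period) // gop_time + 1

# gop_time = 2 s, slice period 6 s (3 GOPs per slice):
print(slice_position(2, 2, 6))  # (1, 3): 1st case, GOP2 is 3rd GOP of slice 1
print(slice_position(3, 2, 6))  # (2, 1): 2nd case, GOP3 opens slice 2
print(slice_position(4, 2, 6))  # (2, 2): 3rd case, slice 2 lacks its 1st GOP
```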
In this embodiment, a delay-compensation effect is achieved through the image group whose index number jumps. Taking the third case as an example: when the encoding server detects the delay, the gop_index jumps from 2 to 4, skipping 3; the slicing side then groups GOP4 and GOP5 into one slice, i.e., the 2nd slice has only 2 GOPs. The reason the second slice has only two GOPs is that GOP0, GOP1 and GOP2 accumulated delay: the actual duration of each was slightly longer than the standard group-of-pictures duration (2 s), so the total duration of the first slice also exceeded 6 s. Once the encoding server detects the delay, gop_index jumps, producing a 2nd slice with only 2 GOPs (possibly slightly longer than 4 s but shorter than 6 s), thereby compensating the delay.
That is, whenever a gop_index jump of +1 is detected, a nearby slice is one GOP short; the slight extra delay accumulated by the previous slices is thus compensated in the form of a slice missing one GOP.
In this embodiment, the gop_index is based on a unified encoding production virtual time axis, and the slicing rule determined from gop_time and segment_time also has cross-task consistency, so multi-way slice data alignment can be guaranteed. Even if the one-in multi-out encoding task or a certain slicing task is interrupted, after restart the data stream can still be positioned to its slicing position according to the gop_index carried in it. Moreover, since the index number of an image group takes the coding delay into account, slice production delay can be avoided on the basis of guaranteeing multi-way slice data alignment.
In this embodiment, the alignment of video data includes the alignment of slicing any image group, and the video playback time and the audio playback time in any image group need to be corrected.
In an optional embodiment, the sum of the starting reference time and the timestamp offset is calculated to obtain the time offset of the first video frame in any image group on the coding production virtual time axis; for each video frame in any image group, respectively correcting the video coding timestamp and the video display timestamp of each video frame by adopting a time offset to obtain a corrected video coding timestamp corresponding to the video coding timestamp and a corrected video display timestamp corresponding to the video display timestamp; respectively correcting the audio coding time stamp and the audio display time stamp of each audio frame by adopting time offset to obtain a corrected audio coding time stamp corresponding to the audio coding time stamp and a corrected audio display time stamp corresponding to the audio display time stamp; taking the modified video coding timestamp and the modified video display timestamp as a video timestamp of each video frame; and using the modified audio encoding timestamp and the modified audio display timestamp as an audio timestamp for each audio frame; taking the playing time indicated by the video time stamps of all the video frames in any image group as the video playing time of any image group; and taking the playing time indicated by the audio time stamps of all the audio frames in any one image group as the audio playing time of any one image group.
The formula for calculating the time offset in this embodiment may be:
video_ts_offset=start_ntp_time+first_gop_dts_drift;
the video _ ts _ offset is a time offset of a first video frame in any one of the gop on an encoding production virtual time axis, the start _ ntp _ time is a starting reference time, and the first _ gop _ dts _ drift is a timestamp offset.
In this embodiment, the calculation formula for respectively correcting the video encoding timestamp and the video display timestamp of each video frame by using the time offset may be as follows:
video_packet_dts’=video_packet_dts+video_ts_offset;
video_packet_pts’=video_packet_pts+video_ts_offset;
the video _ packet _ dts is a video coding time stamp, the video _ packet _ pts is a video display time stamp, the video _ packet _ dts 'is a modified video coding time stamp, and the video _ packet _ pts' is a modified video display time stamp.
It should be understood that video_packet_dts and video_packet_pts are native values of the live code stream: the video encoding timestamp and the video display timestamp of each path of live code stream data are obtained automatically when the slicing server pulls the stream.
In the embodiment, the alignment of the time stamps of the multi-channel slicing video is ensured by converting the video time stamps of the video frames into a uniform coding production time axis.
In this embodiment, the calculation formula for respectively correcting the audio encoding timestamp and the audio display timestamp of each audio frame by using the time offset may be as follows:
audio_packet_dts’=audio_packet_dts+video_ts_offset;
audio_packet_pts’=audio_packet_pts+video_ts_offset;
where audio_packet_dts is the audio encoding timestamp, audio_packet_pts is the audio display timestamp, audio_packet_dts' is the modified audio encoding timestamp, and audio_packet_pts' is the modified audio display timestamp.
It should be understood that audio_packet_dts and audio_packet_pts are native values of the live code stream: the audio encoding timestamp and the audio display timestamp of each path of live code stream data are obtained automatically when the slicing server pulls the stream.
In the embodiment, the alignment of the multi-channel slice audio time stamps is ensured by converting the audio time stamps of the audio frames into a uniform coding production time axis.
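The video and audio timestamp corrections above share a single time offset; the following is a minimal sketch with a hypothetical packet representation (field names are assumptions):

```python
def correct_timestamps(packets, start_ntp_time, first_gop_dts_drift):
    """Shift each packet's dts/pts onto the encoding production virtual
    time axis: ts' = ts + (start_ntp_time + first_gop_dts_drift)."""
    ts_offset = start_ntp_time + first_gop_dts_drift  # video_ts_offset
    return [dict(p, dts=p["dts"] + ts_offset, pts=p["pts"] + ts_offset)
            for p in packets]

# Hypothetical packets from one image group (times in ms):
pkts = [{"kind": "video", "dts": 0, "pts": 40},
        {"kind": "audio", "dts": 0, "pts": 0}]
print(correct_timestamps(pkts, 1000, 12))
# video: dts 1012, pts 1052; audio: dts 1012, pts 1012
```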
In the technical scheme provided by the embodiment of the application, the segmentation position of any image group in the live broadcast code stream, together with its audio playing time and video playing time, is determined based on the alignment parameters carried in the image group, so that video data are aligned. Because the encoding production virtual time axis on the encoding server serves as the basis for producing both the segmentation positions and the timestamps, and every image group carries an index number, the position of any image group on that time axis can be sensed in real time whenever a slicing task starts, even after transcoding or any slicing task is interrupted and restarted; segmentation positions are thus determined and timestamps corrected synchronously across multiple paths, giving strong robustness and anti-interference performance. Meanwhile, since the index number of any image group is related to the coding delay, slice production delay can be avoided on the basis of guaranteeing multi-way slice data alignment.
According to the embodiment, transcoding and the multiple slicing paths can be distributed across different servers, making task deployment more flexible. For example, the encoded data can be used to produce an rtmp stream, the rtmp server can then act as a transit server for pulling streams, and multiple slicing paths can be produced from it, yielding an rtmp data stream and an hls slice data stream at the same time, thereby reducing the number of tasks and saving computing resources.
In this embodiment, combined with adaptive bitrate technology, playback stuttering can be reduced through automatic bitrate adjustment in an HLS slice live scene, on the basis of cross-task slice alignment.
Based on cross-task slice alignment, this embodiment can be extended to server-side advertisement insertion (SSAI) technology for aligning source stream slices with personalized advertisement slices.
The cross-task slice alignment of this embodiment can ensure that slice data is actually valid, and playback stuttering can be mitigated through means such as data filling when encoding efficiency is insufficient.
Based on the same concept, embodiments of the present application provide a video encoding apparatus, and specific implementation of the apparatus can refer to the description of the method embodiment, and repeated details are not repeated, as shown in fig. 3, the apparatus mainly includes:
a first obtaining module 301, configured to obtain, for any one of the N live broadcast code streams, a coding time, a standard index number, and an initial reference time of any one image group in the any one live broadcast code stream; the starting reference time is the encoding completion time of the first frame of video frame in the N paths of live broadcast code streams on the virtual time axis of encoding production;
a calculating module 302, configured to calculate an encoding delay of any one image group based on the encoding time, the standard index number, and the starting reference time;
an updating module 303, configured to update the standard index number according to the coding delay, to obtain an index number of any image group; when the coding delay is longer than the time length of the standard image group, the index number is larger than the standard index number.
The calculation module 302 is configured to:
calculating the product of the standard index number and the standard image group duration to obtain the encoding time offset of any image group on the encoding production virtual time axis;
calculating the sum of the coding time offset and the initial reference time to obtain a summation result;
and calculating the difference between the coding time and the summation result to obtain the coding delay.
The update module 303 is configured to:
judging whether the coding delay is longer than the time length of N standard image groups or not;
if it is, determining the index number of any image group as the standard index number plus N;
if it is not, updating N to N-1 and returning to the step of judging whether the coding delay is greater than the duration of N standard image groups, until N is updated to 1, at which point the index number of any image group is determined as the standard index number plus 1 if the coding delay is greater than the duration of 1 standard image group, and as the standard index number otherwise.
Based on the same concept, embodiments of the present application provide a video alignment apparatus, and specific implementation of the apparatus may refer to descriptions in the method embodiment section, and repeated details are not repeated, as shown in fig. 4, the apparatus mainly includes:
a second obtaining module 401, configured to obtain any image group in a live broadcast code stream;
the analysis module 402 is configured to analyze any one image group and acquire an alignment parameter of any one image group, where the alignment parameter includes an initial reference time, an index number of any one image group, and a timestamp offset of a first video frame in any one image group, the initial reference time is a coding completion time of the first video frame in N paths of live broadcast code streams on a coding production virtual time axis, and the index number of any one image group is related to coding delay of any one image group; the time stamp offset is the offset of the encoding time stamp of the first video frame relative to the first video frame in the target image group, and the target image group is the first image group processed when a slicing task for slicing any image group is started;
a first determining module 403, configured to determine, according to the index number of any image group, a segmentation position of any image group in a live broadcast code stream;
a second determining module 404, configured to determine an audio playing time and a video playing time of any one of the groups of pictures based on the starting reference time and the timestamp offset.
The first determining module 403 is configured to:
calculating the product of the index number and the time length of the standard image group to obtain the standard encoding time offset of any image group on the encoding production virtual time axis;
acquiring a quotient value and a remainder value obtained by dividing the standard encoding time offset by a preset slice period duration;
determining the serial number of the slice to which any one image group belongs based on the quotient value, and determining the position of any one image group in the slice to which the any one image group belongs based on the remainder value;
the slice number and the position in the slice to which the slice belongs are used as the slicing position.
The second determining module 404 is configured to:
calculating the sum of the initial reference time and the timestamp offset to obtain the time offset of the first video frame in any image group on the coding production virtual time axis;
for each video frame in any image group, respectively correcting the video coding timestamp and the video display timestamp of each video frame by adopting a time offset to obtain a corrected video coding timestamp corresponding to the video coding timestamp and a corrected video display timestamp corresponding to the video display timestamp; respectively correcting the audio coding time stamp and the audio display time stamp of each audio frame by adopting the time offset to obtain a corrected audio coding time stamp corresponding to the audio coding time stamp and a corrected audio display time stamp corresponding to the audio display time stamp;
taking the modified video coding timestamp and the modified video display timestamp as a video timestamp of each video frame; and using the modified audio encoding timestamp and the modified audio display timestamp as an audio timestamp for each audio frame;
taking the playing time indicated by the video time stamps of all the video frames in any image group as the video playing time of any image group; and taking the playing time indicated by the audio time stamps of all the audio frames in any one image group as the audio playing time of any one image group.
The second determining module 404 is configured to:
calculating the sum of the video coding time stamp and the time offset of each video frame to obtain a first summation result; calculating the sum of the video display time and the time offset of each video frame to obtain a second summation result;
the first summation result is taken as a modified video encoding timestamp and the second summation result is taken as a modified video display timestamp.
The second determining module 404 is configured to:
calculating the sum of the audio coding time stamp and the time offset of each audio frame to obtain a first summation result; calculating the sum of the audio display time and the time offset of each audio frame to obtain a second summation result;
the first summation result is taken as a modified audio encoding timestamp and the second summation result is taken as a modified audio display timestamp.
Based on the same concept, the embodiment of the present application provides a video alignment system, and the specific implementation of the system may refer to the description of the method embodiment, and repeated details are not repeated, as shown in fig. 5, the system mainly includes:
an encoding server 501 and a slicing server 502;
the encoding server 501 is configured to, for any one of the N live broadcast code streams, acquire an encoding time, a standard index number, and an initial reference time of any one image group in the any one live broadcast code stream; the starting reference time is the encoding completion time of the first frame of video frame in the N paths of live broadcast code streams on the virtual time axis of encoding production; calculating the coding delay of any image group based on the coding time, the standard index number and the starting reference time; updating the standard index number according to the coding delay to obtain the index number of any image group; when the coding delay is longer than the time length of the standard image group, the index number is larger than the standard index number;
the slicing server 502 is configured to obtain any image group in a live broadcast code stream; analyzing any image group to obtain an alignment parameter of any image group, wherein the alignment parameter comprises an initial reference time, an index number of any image group and a timestamp offset of a first video frame in any image group, the initial reference time is a coding completion time of the first video frame in N paths of live broadcast code streams on a coding production virtual time axis, and the index number of any image group is related to coding delay of any image group; the time stamp offset is the offset of the encoding time stamp of the first video frame relative to the first video frame in the target image group, and the target image group is the first image group processed when a slicing task for slicing any image group is started; determining the segmentation position of any image group in a path of live broadcast code stream according to the index number of any image group; based on the start reference time and the timestamp offset, an audio playback time and a video playback time for any one of the groups of pictures are determined.
Based on the same concept, an embodiment of the present application further provides an electronic device, as shown in fig. 6, the electronic device mainly includes: a processor 601, a memory 602, and a communication bus 603, wherein the processor 601 and the memory 602 communicate with each other via the communication bus 603. The memory 602 stores a program executable by the processor 601, and the processor 601 executes the program stored in the memory 602 to implement the following steps:
for any one path of live broadcast code stream in the N paths of live broadcast code streams, acquiring the coding time, the standard index number and the initial reference time of any one image group in any one path of live broadcast code stream; the starting reference time is the encoding completion time of the first frame of video frame in the N paths of live broadcast code streams on the virtual time axis of encoding production; calculating the coding delay of any image group based on the coding time, the standard index number and the starting reference time; updating the standard index number according to the coding delay to obtain the index number of any image group; when the coding delay is longer than the time length of the standard image group, the index number is larger than the standard index number;
or,
acquiring any image group in a path of live broadcast code stream; analyzing any image group to obtain an alignment parameter of any image group, wherein the alignment parameter comprises an initial reference time, an index number of any image group and a timestamp offset of a first video frame in any image group, the initial reference time is a coding completion time of the first video frame in N paths of live broadcast code streams on a coding production virtual time axis, and the index number of any image group is related to coding delay of any image group; the time stamp offset is the offset of the encoding time stamp of the first video frame relative to the first video frame in the target image group, and the target image group is the first image group processed when a slicing task for slicing any image group is started; determining the segmentation position of any image group in a path of live broadcast code stream according to the index number of any image group; based on the start reference time and the timestamp offset, an audio playback time and a video playback time of any one of the groups of pictures are determined.
The communication bus 603 mentioned in the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 603 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 6, but this is not intended to represent only one bus or type of bus.
The Memory 602 may include a Random Access Memory (RAM) or a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. Alternatively, the memory may be at least one storage device located remotely from the processor 601.
The Processor 601 may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like, and may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic devices, discrete gates or transistor logic devices, and discrete hardware components.
In yet another embodiment of the present application, there is also provided a computer-readable storage medium having stored therein a computer program which, when run on a computer, causes the computer to execute the video encoding method or the video alignment method described in the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wire (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that includes one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, tapes, etc.), optical media (e.g., DVDs), or semiconductor media (e.g., solid state drives), among others.
It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. The terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a(n) …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above description is merely illustrative of particular embodiments of the invention that enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (13)
1. A video encoding method, comprising:
for any one of N paths of live broadcast code streams, acquiring the encoding time, the standard index number, and the starting reference time of any image group in that path of live broadcast code stream, wherein the starting reference time is the encoding completion time, on an encoding production virtual time axis, of the first video frame among the N paths of live broadcast code streams;
calculating the encoding delay of the image group based on the encoding time, the standard index number, and the starting reference time;
and updating the standard index number according to the encoding delay to obtain the index number of the image group, wherein, when the encoding delay is greater than the duration of one standard image group, the index number is greater than the standard index number.
2. The method of claim 1, wherein calculating the encoding delay of the image group based on the encoding time, the standard index number, and the starting reference time comprises:
calculating the product of the standard index number and the standard image group duration to obtain the encoding time offset of the image group on the encoding production virtual time axis;
calculating the sum of the encoding time offset and the starting reference time to obtain a summation result;
and calculating the difference between the encoding time and the summation result to obtain the encoding delay.
3. The method according to claim 1 or 2, wherein updating the standard index number according to the encoding delay to obtain the index number of the image group comprises:
judging whether the encoding delay is greater than the duration of N standard image groups;
if so, determining the index number of the image group as the standard index number plus N;
if not, updating N to N-1 and returning to the step of judging whether the encoding delay is greater than the duration of N standard image groups; when N has been updated to 1, if the encoding delay is greater than the duration of one standard image group, determining the index number of the image group as the standard index number plus 1, and otherwise determining the index number of the image group as the standard index number.
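The procedure of claims 2 and 3 can be sketched as follows. This is only an illustrative reading of the claims, not the patentee's implementation; the function names, millisecond units, and parameter names are assumptions.

```python
# Illustrative sketch of claims 2-3; names and millisecond units are assumptions.

def encoding_delay(encode_time_ms, std_index, start_ref_ms, gop_ms):
    """Claim 2: delay = encoding time - (starting reference time + index * GOP duration)."""
    offset = std_index * gop_ms  # encoding time offset on the encoding production virtual time axis
    return encode_time_ms - (start_ref_ms + offset)

def updated_index(std_index, delay_ms, gop_ms, n):
    """Claim 3: add the largest k (k <= n) such that the delay exceeds k GOP durations;
    if the delay does not exceed even one GOP duration, keep the standard index number."""
    while n >= 1:
        if delay_ms > n * gop_ms:
            return std_index + n
        n -= 1
    return std_index
```

For example, with a 500 ms standard GOP and a delay of 1200 ms, the index is advanced by 2, since the delay exceeds two but not three GOP durations.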
4. A method of video alignment, comprising:
acquiring any image group in one path of live broadcast code stream;
parsing the image group to obtain its alignment parameters, wherein the alignment parameters comprise a starting reference time, the index number of the image group, and the timestamp offset of the first video frame in the image group; the starting reference time is the encoding completion time, on an encoding production virtual time axis, of the first video frame among N paths of live broadcast code streams, and the index number of the image group is related to the encoding delay of the image group; the timestamp offset is the offset of the first video frame relative to the encoding timestamp of the first video frame in a target image group, the target image group being the first image group processed when the slicing task that slices the image group is started;
determining the slicing position of the image group in the path of live broadcast code stream according to the index number of the image group;
and determining the audio playing time and the video playing time of the image group based on the starting reference time and the timestamp offset.
5. The method according to claim 4, wherein determining the slicing position of the image group in the path of live broadcast code stream according to the index number of the image group comprises:
calculating the product of the index number and the standard image group duration to obtain the standard encoding time offset of the image group on the encoding production virtual time axis;
obtaining the quotient and remainder of dividing the standard encoding time offset by a preset slice period duration;
determining the serial number of the slice to which the image group belongs based on the quotient, and determining the position of the image group within that slice based on the remainder;
and taking the serial number of the slice and the position within the slice as the slicing position.
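Claim 5 reduces to a quotient/remainder computation. The sketch below is an illustrative assumption about units (milliseconds) and naming, not the claimed implementation itself:

```python
def slice_position(index, gop_ms, slice_period_ms):
    """Claim 5: map a GOP index to (slice serial number, position within slice).
    Units and names are illustrative assumptions."""
    std_offset = index * gop_ms  # standard encoding time offset on the virtual time axis
    slice_no, pos_in_slice = divmod(std_offset, slice_period_ms)
    return slice_no, pos_in_slice
```

Because the position is derived purely from the index and fixed durations, independent slicing servers reach the same slice boundaries for the same GOP, which is the point of the alignment scheme.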
6. The method of claim 4, wherein determining the audio playing time and the video playing time of the image group based on the starting reference time and the timestamp offset comprises:
calculating the sum of the starting reference time and the timestamp offset to obtain the time offset of the first video frame in the image group on the encoding production virtual time axis;
for each video frame in the image group, correcting the video encoding timestamp and the video display timestamp of the video frame with the time offset to obtain a corrected video encoding timestamp and a corrected video display timestamp; and, for each audio frame in the image group, correcting the audio encoding timestamp and the audio display timestamp of the audio frame with the time offset to obtain a corrected audio encoding timestamp and a corrected audio display timestamp;
taking the corrected video encoding timestamp and the corrected video display timestamp as the video timestamp of each video frame, and taking the corrected audio encoding timestamp and the corrected audio display timestamp as the audio timestamp of each audio frame;
taking the playing time indicated by the video timestamps of all video frames in the image group as the video playing time of the image group, and taking the playing time indicated by the audio timestamps of all audio frames in the image group as the audio playing time of the image group.
7. The method according to claim 6, wherein, for each video frame in the image group, correcting the video encoding timestamp and the video display timestamp of the video frame with the time offset comprises:
calculating the sum of the video encoding timestamp of the video frame and the time offset to obtain a first summation result, and calculating the sum of the video display timestamp of the video frame and the time offset to obtain a second summation result;
and taking the first summation result as the corrected video encoding timestamp and the second summation result as the corrected video display timestamp.
8. The method of claim 6, wherein correcting the audio encoding timestamp and the audio display timestamp of each audio frame with the time offset comprises:
calculating the sum of the audio encoding timestamp of the audio frame and the time offset to obtain a first summation result, and calculating the sum of the audio display timestamp of the audio frame and the time offset to obtain a second summation result;
and taking the first summation result as the corrected audio encoding timestamp and the second summation result as the corrected audio display timestamp.
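The corrections of claims 6-8 are a uniform shift of every frame's encoding (DTS) and display (PTS) timestamps. A minimal sketch, assuming millisecond timestamps and dictionary field names that are not part of the claims:

```python
def corrected_timestamps(frames, start_ref_ms, ts_offset_ms):
    """Claims 6-8: shift each frame's encoding timestamp ("dts") and display
    timestamp ("pts") by the GOP's time offset on the encoding production
    virtual time axis. Applies identically to video and audio frames."""
    time_offset = start_ref_ms + ts_offset_ms  # claim 6: starting reference + timestamp offset
    return [
        {"dts": f["dts"] + time_offset, "pts": f["pts"] + time_offset}
        for f in frames
    ]
```

Since every stream is shifted onto the same virtual time axis, frames from different bitrate renditions that were encoded at the same source instant end up with identical corrected timestamps.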
9. A video encoding apparatus, comprising:
a first acquisition module, configured to acquire, for any one of N paths of live broadcast code streams, the encoding time, the standard index number, and the starting reference time of any image group in that path of live broadcast code stream, wherein the starting reference time is the encoding completion time, on an encoding production virtual time axis, of the first video frame among the N paths of live broadcast code streams;
a calculating module, configured to calculate the encoding delay of the image group based on the encoding time, the standard index number, and the starting reference time;
and an updating module, configured to update the standard index number according to the encoding delay to obtain the index number of the image group, wherein, when the encoding delay is greater than the duration of one standard image group, the index number is greater than the standard index number.
10. A video alignment apparatus, comprising:
a second acquisition module, configured to acquire any image group in one path of live broadcast code stream;
a parsing module, configured to parse the image group to obtain its alignment parameters, wherein the alignment parameters comprise a starting reference time, the index number of the image group, and the timestamp offset of the first video frame in the image group; the starting reference time is the encoding completion time, on an encoding production virtual time axis, of the first video frame among N paths of live broadcast code streams, and the index number of the image group is related to the encoding delay of the image group; the timestamp offset is the offset of the first video frame relative to the encoding timestamp of the first video frame in a target image group, the target image group being the first image group processed when the slicing task that slices the image group is started;
a first determining module, configured to determine the slicing position of the image group in the path of live broadcast code stream according to the index number of the image group;
and a second determining module, configured to determine the audio playing time and the video playing time of the image group based on the starting reference time and the timestamp offset.
11. A video alignment system, comprising:
an encoding server and a slicing server;
the encoding server is configured to acquire, for any one of N paths of live broadcast code streams, the encoding time, the standard index number, and the starting reference time of any image group in that path of live broadcast code stream, wherein the starting reference time is the encoding completion time, on an encoding production virtual time axis, of the first video frame among the N paths of live broadcast code streams; calculate the encoding delay of the image group based on the encoding time, the standard index number, and the starting reference time; and update the standard index number according to the encoding delay to obtain the index number of the image group, wherein, when the encoding delay is greater than the duration of one standard image group, the index number is greater than the standard index number;
and the slicing server is configured to acquire any image group in one path of live broadcast code stream; parse the image group to obtain its alignment parameters, wherein the alignment parameters comprise a starting reference time, the index number of the image group, and the timestamp offset of the first video frame in the image group; the starting reference time is the encoding completion time, on the encoding production virtual time axis, of the first video frame among the N paths of live broadcast code streams, and the index number of the image group is related to the encoding delay of the image group; the timestamp offset is the offset of the first video frame relative to the encoding timestamp of the first video frame in a target image group, the target image group being the first image group processed when the slicing task that slices the image group is started; determine the slicing position of the image group in the path of live broadcast code stream according to the index number of the image group; and determine the audio playing time and the video playing time of the image group based on the starting reference time and the timestamp offset.
12. An electronic device, comprising: a processor, a memory, and a communication bus, wherein the processor and the memory communicate with each other through the communication bus;
the memory for storing a computer program;
the processor, configured to execute the program stored in the memory, to implement the video encoding method of any one of claims 1 to 3 or the video alignment method of any one of claims 4 to 8.
13. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the video encoding method of any one of claims 1-3 or the video alignment method of any one of claims 4-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210759941.8A CN115119009B (en) | 2022-06-29 | 2022-06-29 | Video alignment method, video encoding device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115119009A true CN115119009A (en) | 2022-09-27 |
CN115119009B CN115119009B (en) | 2023-09-01 |
Family
ID=83330401
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210759941.8A Active CN115119009B (en) | 2022-06-29 | 2022-06-29 | Video alignment method, video encoding device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115119009B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115834921A (en) * | 2022-11-17 | 2023-03-21 | 北京奇艺世纪科技有限公司 | Video processing method, video processing apparatus, video processing server, storage medium, and program product |
CN116629844A (en) * | 2023-07-26 | 2023-08-22 | 山东碧汀智能科技有限公司 | Automatic dispatch method and system for operation and maintenance of drinking water equipment |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020048450A1 (en) * | 2000-09-15 | 2002-04-25 | International Business Machines Corporation | System and method of processing MPEG streams for file index insertion |
WO2008053557A1 (en) * | 2006-11-02 | 2008-05-08 | Pioneer Corporation | Moving picture re-encoding device, moving picture re-encoding method, moving picture re-encoding program, and recording medium containing the moving picture re-encoding program |
CN101466041A (en) * | 2009-01-16 | 2009-06-24 | 清华大学 | Task scheduling method for multi-eyepoint video encode of multi-nuclear processor |
US20110268178A1 (en) * | 2009-08-18 | 2011-11-03 | Anthony Neal Park | Encoding video streams for adaptive video streaming |
WO2012085504A1 (en) * | 2010-12-23 | 2012-06-28 | British Telecommunications Public Limited Company | A method for delivering video content encoded at one or more quality levels over a data network |
CN104506866A (en) * | 2014-11-28 | 2015-04-08 | 北京奇艺世纪科技有限公司 | Video coding processing method suitable for multiple code streams and video coder |
WO2016011823A1 (en) * | 2014-07-22 | 2016-01-28 | 中兴通讯股份有限公司 | Method for acquiring live video slice, server, and storage medium |
US20170289599A1 (en) * | 2016-03-30 | 2017-10-05 | Le Holdings(Beijing)Co., Ltd. | Live broadcast delaying method and apparatus |
WO2018042036A1 (en) * | 2016-09-05 | 2018-03-08 | Nanocosmos Informationstechnologien Gmbh | Method for transmitting real-time-based digital video signals in networks |
CN111031385A (en) * | 2019-12-20 | 2020-04-17 | 北京爱奇艺科技有限公司 | Video playing method and device |
CN112040233A (en) * | 2020-11-04 | 2020-12-04 | 北京金山云网络技术有限公司 | Video encoding method, video decoding method, video encoding device, video decoding device, electronic device, and storage medium |
CN113163222A (en) * | 2021-03-31 | 2021-07-23 | 杭州奥点科技股份有限公司 | Video frame synchronization method, system, equipment and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115119009B (en) | 2023-09-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10841667B2 (en) | Producing video data | |
US20190075342A1 (en) | Codec techniques for fast switching | |
KR101789086B1 (en) | Concept for determining the quality of a media data stream with varying quality-to-bitrate | |
JP7313445B2 (en) | Dynamic shortening in replacement content playback to help align the end of replacement content with the end of replaced content | |
CN107277558B (en) | Player client, system and method for realizing synchronization of live video | |
AU2014223523B2 (en) | Adaptive streaming techniques | |
CN115119009A (en) | Video alignment method, video encoding device and storage medium | |
CN107147919B (en) | Live broadcast quick starting method and system | |
CN110933449B (en) | Method, system and device for synchronizing external data and video pictures | |
WO2018076998A1 (en) | Method and device for generating playback video file | |
CN105721811A (en) | Live video recording method and system | |
CN106470352B (en) | Live channel playing method, device and system | |
CN115134622A (en) | Video data alignment method, device, equipment and storage medium | |
CN111031385B (en) | Video playing method and device | |
US20180376195A1 (en) | Live streaming quick start method and system | |
US10951959B2 (en) | Video management | |
EP4192020B1 (en) | Channel change method and apparatus | |
EP3318034B1 (en) | Low latency media streaming | |
EP3824638B1 (en) | Advanced preparation for content revision based on expected latency in obtaining new content | |
CN113852824A (en) | Video transcoding method and device, electronic equipment and storage medium | |
US20180146230A1 (en) | Content item aggregation method, related apparatus, and communications system | |
CN114189711A (en) | Video processing method and device, electronic equipment and storage medium | |
US11451879B2 (en) | Controlling playout of advertisement content during video-on-demand video streaming on an end-user terminal | |
CN110139128B (en) | Information processing method, interceptor, electronic equipment and storage medium | |
US20180324480A1 (en) | Client and Method for Playing a Sequence of Video Streams, and Corresponding Server and Computer Program Product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||