CN116567308A - Method, device, equipment and storage medium for synchronizing multi-stream network video - Google Patents


Info

Publication number
CN116567308A
Authority
CN
China
Prior art keywords
video
multimedia data
time
time point
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310451718.1A
Other languages
Chinese (zh)
Inventor
邹颖思
胡钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ava Electronic Technology Co Ltd
Original Assignee
Ava Electronic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ava Electronic Technology Co Ltd filed Critical Ava Electronic Technology Co Ltd
Priority to CN202310451718.1A priority Critical patent/CN116567308A/en
Publication of CN116567308A publication Critical patent/CN116567308A/en
Pending legal-status Critical Current


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/242Synchronization processes, e.g. processing of PCR [Program Clock References]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client 
    • H04N21/63Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
    • H04N21/633Control signals issued by server directed to the network components or client
    • H04N21/6332Control signals issued by server directed to the network components or client directed to client

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a method, a device, equipment and a storage medium for synchronizing multi-stream network video. The method comprises the following steps: sending a setting request for multimedia data and a clock synchronization instruction to each client through a signaling channel; receiving the multimedia data sent by each client after it completes the setting request, and performing an operation (e.g. recording) on each path to generate each path of operated multimedia data; determining a start time point and a first calibration time point based on the local clock; calculating a first duration between the start time point and the first calibration time point, and calculating the second duration of each path of operated multimedia data at the first calibration time point; obtaining the audio time deviation and the video time deviation of each path of operated multimedia data from the first duration and the second duration; and eliminating those audio and video time deviations. The invention achieves multi-stream network video synchronization without encoding or decoding, reducing system overhead.

Description

Method, device, equipment and storage medium for synchronizing multi-stream network video
Technical Field
The present invention relates to the field of video image recording technologies, and in particular, to a method, an apparatus, a device, and a storage medium for synchronizing video in a multi-stream network.
Background
In recent years, with the development of internet technology, online education has become increasingly widespread. Online education mainly involves shooting and recording live teaching video that students watch online in real time. The recording and playback server pulls the audio and video streams of each client to record and store them, so that users can play them back later.
When the recording and playback server pulls multiple RTP audio/video streams for simultaneous recording, the pictures and audio of those streams often need to be synchronized with one another. Because the RTP streams come from different clients whose clocks differ, the recorded files of the individual RTP video streams can be misaligned in picture or sound.
At present, to solve this alignment problem, the recording server generally decodes the multiple video streams into corresponding image streams and caches them, synchronizes the image streams according to synchronization information, and finally records together the images belonging to the same moment. In this process the per-stream video decoders are independent and the decoded image streams are discrete, so every image must be time-stamped and cached according to synchronization information obtained during decoding. This consumes considerable time and memory, and the system overhead is large.
Disclosure of Invention
The invention provides a method, a device, equipment and a storage medium for synchronizing multi-stream network video, aiming to solve the prior-art technical problem of high system overhead when recording multiple RTP audio/video streams.
In a first aspect, the present invention provides a method for synchronizing video in a multi-stream network, including:
connecting control signaling channels of at least two clients;
sending a setting request and a clock synchronization instruction of multimedia data to each client through a signaling channel; wherein the multimedia data comprises audio data and video data, and the setting request is used for indicating a client to generate an I frame at a specified time point;
receiving the multimedia data sent by each client after completing the setting request, and respectively executing operation on each path of multimedia data to generate multimedia data after each path of operation;
determining a start time point of performing an operation and a first calibration time point based on a local clock;
calculating a first time length between a starting time point and a first calibration time point, and calculating respective second time lengths of the multimedia data after each path of operation at the first calibration time point;
according to the first time length and the second time length, obtaining the audio time deviation and the video time deviation of the multimedia data after each path of operation;
and eliminating the audio time deviation and the video time deviation of the multimedia data after each path of operation.
In one embodiment, the process of eliminating the audio time deviation and the video time deviation of the multimedia data after each path of operation includes:
when the audio time deviation of the operated multimedia data is not within a first threshold, adjusting the inter-frame interval of some audio frames processed after the first calibration time point according to the audio time deviation, so that the audio time deviation returns to within the first threshold range;
and when the video time deviation of the operated multimedia data is not within a second threshold, adjusting the inter-frame interval of some video frames processed after the first calibration time point according to the video time deviation, so that the video time deviation returns to within the second threshold range.
In one embodiment, when the audio time deviation of the operated multimedia data is not within the first threshold, adjusting the inter-frame interval of some audio frames processed after the first calibration time point according to the audio time deviation, so that the audio time deviation returns to within the first threshold range, includes:
according to the audio time deviation, the interval of n1 frames of audio frames after the first calibration time point is adjusted, wherein the interval of the n1 frames of audio frames is the same.
In one embodiment, when the video time deviation of the operated multimedia data is not within the second threshold, adjusting the inter-frame interval of some video frames processed after the first calibration time point according to the video time deviation, so that the video time deviation returns to within the second threshold range, includes:
and adjusting the interval of n2 frames of video frames after the first calibration time point according to the video time deviation, wherein the interval of the n2 frames of video frames is the same.
In one embodiment, the method further comprises:
calculating a third time length between a starting time point and a second calibration time point, and calculating a fourth time length of each path of operated multimedia data at the second calibration time point, wherein the second calibration time point is a time point of a local clock;
and according to the third time length and the fourth time length, obtaining the audio time deviation and the video time deviation of the multimedia data after each path of operation.
In one embodiment, the method further comprises:
the request of NTP clock is sent to each client through signaling channel at fixed time.
In one embodiment, the process of determining a start time point of an execution operation and a first calibration time point based on a local clock includes:
determining the start time point of the operation according to the start time point of the path of multimedia data on which the operation is performed earliest.
In a second aspect, the present invention provides an apparatus for synchronizing video of a multi-stream network, comprising:
the connection module is used for connecting control signaling channels of at least two clients;
the sending module is used for sending a setting request and a clock synchronization instruction of the multimedia data to each client through a signaling channel; wherein the multimedia data comprises audio data and video data, and the setting request is used for indicating a client to generate an I frame at a specified time point;
the receiving module is used for receiving the multimedia data sent by each client after the setting request is completed, respectively executing operation on each path of multimedia data, and generating the multimedia data after each path of operation;
a determining module for determining a start time point of an execution operation and a first calibration time point based on a local clock;
the computing module is used for computing a first duration between a starting time point and a first calibration time point and computing a second duration of each path of operated multimedia data at the first calibration time point;
the comparison module is used for obtaining the audio time deviation and the video time deviation of the multimedia data after each path of operation according to the first time length and the second time length;
and the alignment module is used for eliminating the audio time deviation and the video time deviation of the multimedia data after each path of operation.
In a third aspect, the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any of the above embodiments when executing the program.
In a fourth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor implements the method of any of the above embodiments.
In the invention, by having each client generate an I frame at the specified time point, the server can use that I frame as a starting point, so that the times at which each path of video begins its operation are close to one another. The reference duration on the server's local clock is then used as the standard, and the deviation between each path's duration and that standard is adjusted out. Multi-stream network video synchronization is thus achieved without encoding or decoding, reducing system overhead.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of the present invention.
Fig. 2 is an alignment diagram of a first embodiment of the present invention.
Fig. 3 is a schematic representation of the content of mdhd.
Fig. 4 is a schematic representation of the content of stts.
Fig. 5 is a schematic diagram of the overall structure of a second embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It should be noted that the terms "first", "second", etc. in the embodiments of the present invention merely distinguish similar objects and do not imply a specific ordering; where permitted, "first" and "second" may exchange their specific order or sequence. The objects so identified may be interchanged where appropriate, so that the embodiments of the invention described herein can be practiced in sequences other than those illustrated or described.
Example 1
Referring to fig. 1, fig. 1 is a flowchart of a method for synchronizing multi-stream network video according to an embodiment of the invention, comprising steps S110 to S170. It should be noted that the labels S110 to S170 merely clarify the correspondence between this embodiment and fig. 1, and do not constrain the order of the steps in this embodiment.
Step S110, a control signaling channel of at least two clients is connected;
step S120, sending a setting request and a clock synchronization instruction of multimedia data to each client through a signaling channel;
wherein the multimedia data comprises audio data and video data, and the setting request is used for indicating a client to generate an I frame at a specified time point;
step S130, receiving the multimedia data sent by each client after completing the setting request, and respectively executing operation on each path of multimedia data to generate multimedia data after each path of operation;
step S140, determining a start time point of an execution operation and a first calibration time point based on a local clock;
step S150, calculating a first duration between a starting time point and a first calibration time point, and calculating a second duration of each path of operated multimedia data at the first calibration time point;
step S160, obtaining the audio time deviation and the video time deviation of the multimedia data after each path of operation according to the first time length and the second time length;
step S170, eliminating the audio time deviation and the video time deviation of the multimedia data after each path operation.
The method can be used by a server performing various operations, such as video recording, video playback and video forwarding, so the operation mentioned in step S130 includes recording, playback or forwarding. For ease of explanation, the method is described below taking a recording and playback server as the example.
The recording and playback server initiates connections to all clients, receives RTP audio/video data, and at the same time connects the TCP control signaling channel of each client, completing the preparation work. The first frame recorded by the server must be an I frame, but since the clients start at different times, the time points of the original I frames in the RTP video streams also differ, and without decoding the video the start positions of the streams cannot be aligned. Step S120 therefore sends a clock synchronization instruction to each client so that every client synchronizes its clock with the server. After clock synchronization, the setting request for the multimedia data is sent, uniformly commanding each client to generate an I frame at the specified time point. That time point is taken as the starting point of each path of multimedia data, and the server correspondingly takes each such I frame as the starting point of its recording.
Preferably, an NTP clock request is sent to each client through the signaling channel so that every client synchronizes its clock with the server. The specified time point is determined from the time point of the client's NTP synchronization plus a preset duration; that is, every client is directed to generate the starting I frame at the same moment.
After each client completes the setting request, it sends RTP multimedia data to the recording and playback server. Because of network delay, and because clients respond to the I-frame command at different speeds, the I frames generated at the specified time point arrive at the server at different times, as shown in fig. 2. Since recording of each RTP stream starts when its data is received, the server starts recording each stream at a different time point. These different start times make the resulting recordings run ahead of or behind one another, so the videos are out of sync. A benchmark therefore needs to be set up, and each path of video pulled toward that benchmark to achieve alignment.
In the present invention, preferably, the benchmark recording time point, i.e. the recording start time point t1, is determined by the start time of recording one path of RTP multimedia data. For ease of explanation, in this embodiment t1 is determined by the RTP stream recorded earliest (RTP 1 in fig. 2). Note that recording generally begins once the I frame generated at the specified time point is received, and the preparation time is the same for every path, so the path whose I frame arrives earliest is the path recorded earliest. Note also that, since the entire alignment process runs in the server, the start time point t1 is determined by the server's local clock. As shown in fig. 2, a calibration time point t2 is set on the local clock, and the duration between t1 and t2 is taken as the first duration used for calibration. At t2, each recording has its own already-recorded duration, i.e. its second duration. In fig. 2, because t1 is determined by RTP 1, the second duration of RTP 1 exactly equals the first duration, while the other RTP streams started recording later than RTP 1, so at t2 they show a time deviation (the shaded boxes in fig. 2). From the first duration and the second durations, the audio time deviation and the video time deviation of each recording can then be obtained. Since the multimedia data comprises both audio data and video data, an audio time deviation and a video time deviation are derived separately for each recording.
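The bookkeeping in this step can be sketched in a few lines (a minimal illustration, not taken from the patent; the stream names and the millisecond time unit are assumptions):

```python
# Sketch of the deviation computation at calibration point t2.
# All times are on the server's local clock; each stream's recorded
# ("second") duration comes from its own timestamps.

def time_deviations(t1, t2, recorded_durations):
    """Return {stream: deviation} where deviation = first duration
    minus that stream's second duration.

    t1: recording start time, set by the earliest stream
    t2: calibration time point on the server clock
    recorded_durations: dict mapping stream id to the duration that
        stream has actually recorded by t2
    """
    first_duration = t2 - t1  # the reference duration
    return {s: first_duration - d for s, d in recorded_durations.items()}

# RTP 1 set the start point, so its second duration equals the first
# duration and its deviation is zero; later-starting streams lag.
devs = time_deviations(0, 10_000, {"RTP1": 10_000, "RTP2": 9_940, "RTP3": 9_880})
```

Here a positive deviation means the stream has recorded less than the server-clock reference and must be sped up toward the benchmark.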
In one embodiment, the process of step S170 includes: step S171 and step S172.
Step S171, when the audio time deviation of the operated multimedia data is not in the first threshold value, adjusting the inter-frame interval of part of the audio frames in the operation process after the first calibration time point according to the audio time deviation so as to enable the audio time deviation to return to the first threshold value range;
step S172, when the video time deviation of the operated multimedia data is not within the second threshold, adjusting the inter-frame interval of the partial video frames in the operation process after the first calibration time point according to the video time deviation, so as to make the video time deviation return to the second threshold range.
In this embodiment, a first threshold for the audio time deviation and a second threshold for the video time deviation are preset. By adjusting the inter-frame intervals of the audio data and the video data after the first calibration time point, the audio time deviation and the video time deviation are each brought back within their threshold ranges, thereby eliminating them.
In one embodiment, the process of step S171 includes:
according to the audio time deviation, the interval of n1 frames of audio frames after the first calibration time point is adjusted, wherein the interval of the n1 frames of audio frames is the same.
In this embodiment, a number n1 is preset, the interval of n1 frames of audio frames after the first calibration time point is adjusted, and the audio time offset is evenly distributed to the intervals of the following n1 frames of audio frames.
In one embodiment, the process of step S172 includes:
and adjusting the interval of n2 frames of video frames after the first calibration time point according to the video time deviation, wherein the interval of the n2 frames of video frames is the same.
Similar to the previous embodiment, a number n2 is preset, the interval of n2 frames of video frames after the first calibration time point is adjusted, and the video time deviation is evenly distributed to the intervals of the following n2 frames of video frames.
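The even distribution over the next n1 audio frames or n2 video frames described in these two embodiments can be sketched as follows (a hypothetical helper; the function name and the generic time units are assumptions):

```python
def spread_deviation(frame_intervals, deviation, n):
    """Add deviation/n to each of the first n inter-frame intervals,
    leaving later intervals untouched, so the total offset is absorbed
    evenly over n frames."""
    per_frame = deviation / n
    return [iv + per_frame if i < n else iv
            for i, iv in enumerate(frame_intervals)]

# A lag of 600 units absorbed over the next 3 frames of a stream whose
# nominal inter-frame interval is 3000 units:
adjusted = spread_deviation([3000] * 5, 600, 3)
```

Because every adjusted frame receives the same per-frame share, the intervals of the n frames remain equal, as the embodiments require.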
The method is mainly applied to operating on MP4 files, e.g. MP4 recording; the MP4 file format conforms to the ISO/IEC 14496-12 standard. All data are encapsulated in data structures called boxes; an MP4 file consists of several boxes, typically the four boxes ftyp, free, mdat and moov. Audio/video synchronization involves the mdhd and stts boxes in the audio and video tracks inside the moov box. The method targets live scenarios, so the recorded RTP streams contain no B frames and hence no ctts box is needed.
Fig. 3 shows the content of mdhd: Duration / Timescale = the recording duration in seconds. The audio track and the video track each have their own mdhd, whose Duration fields give the respective track durations. Normal audio/video synchronization presupposes that these two durations agree.
Fig. 4 shows the content of stts; the audio track and the video track each have their own stts, which records the DTS-to-sample-number mapping table. In fig. 4, "No." is the index of each entry, Sample count is the number of frames, and Sample delta is the interval between consecutive frames. The player derives each frame's decoding time from the sample delta values inside stts, so these values govern the synchronization of file playback.
For example, in the mdhd of the encapsulated MP4, the video Timescale is 90000 and the audio Timescale is the audio sampling rate. These two values are exactly the units of the audio/video timestamps encapsulated in the RTP packets, so the timestamp difference between consecutive received RTP frames can be used directly, without complex conversion, as the sample delta in the stts box. Accumulating these sample deltas yields the Duration inside mdhd. The video recording duration in seconds is therefore Duration / 90000, and the audio recording duration in seconds is Duration / audio sampling rate.
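The Duration/Timescale arithmetic can be checked with a short sketch (the 48 kHz audio rate is an assumed example; the patent fixes only the video timescale at 90000):

```python
VIDEO_TIMESCALE = 90000     # video Timescale in mdhd (from the description)
AUDIO_SAMPLE_RATE = 48000   # audio Timescale = sampling rate (assumed value)

def duration_seconds(mdhd_duration, timescale):
    """Recording duration in seconds = mdhd Duration / Timescale."""
    return mdhd_duration / timescale

# 900 frames of 30 fps video accumulate 900 * (90000 / 30) = 2,700,000
# Duration units, i.e. 30 seconds of recording.
video_secs = duration_seconds(900 * (VIDEO_TIMESCALE // 30), VIDEO_TIMESCALE)
```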
If the duration of the recorded MP4's mdhd box (video: Duration / 90000; audio: Duration / audio sampling rate) deviates from (t2 − t1) by delta milliseconds, that delta is converted into sample delta units of the corresponding stts box.
The specific error calculation and conversion is as follows:
the video duration that should have been recorded according to the server's clock, Duration_svr_video = (t2 − t1) × 90000;
the audio duration that should have been recorded according to the server's clock, Duration_svr_audio = (t2 − t1) × audio sampling rate;
the actual MP4 video duration, Duration_client_video = the Duration value in the mdhd box of the video track;
the actual MP4 audio duration, Duration_client_audio = the Duration value in the mdhd box of the audio track;
the video sample delta of the stts box that finally needs correcting = Duration_svr_video − Duration_client_video;
the audio sample delta of the stts box that finally needs correcting = Duration_svr_audio − Duration_client_audio.
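The formulas above reduce to a few lines (a sketch; the function names and the 48 kHz audio rate are assumptions, not from the patent):

```python
def server_durations(t1, t2, video_timescale=90000, audio_rate=48000):
    """Durations the server clock says should have been recorded,
    expressed in each track's own Timescale units."""
    secs = t2 - t1
    return secs * video_timescale, secs * audio_rate

def stts_corrections(svr_video, svr_audio, client_video, client_audio):
    """Total sample delta to feed back into the video and audio stts
    boxes: server-expected duration minus actually-recorded duration."""
    return svr_video - client_video, svr_audio - client_audio

svr_v, svr_a = server_durations(0, 2)                 # 2 s of recording
corr = stts_corrections(svr_v, svr_a, 177_000, 94_976)
```

In this example the video track is one nominal 30 fps frame short (3000 units) and the audio track one AAC frame short (1024 units).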
The sample delta in the stts box of an audio or video track is normally a fixed value; for example, at a frame rate of 30, video sample delta = 90000 / 30 = 3000, and for AAC audio sample delta = 1024.
Since the sample delta in the stts box determines the player's decoding time interval, applying a large correction within a single frame would make playback jerky. The error value is therefore averaged into the sample deltas of the stts entries of the next N recorded audio/video frames. N determines how many frames it takes for audio/video synchronization to be restored, and can be obtained from the error value together with the current video frame rate and audio sampling rate: the sample delta to be corrected is distributed evenly according to the theoretical per-frame sample delta (e.g. 3000 for video, 1024 for audio). For example, if the video sample delta to be corrected is 90000 and the audio sample delta to be corrected is 10240, the video difference can be allocated to the next 30 frames, each of those frames getting sample delta = (RTP timestamp difference between consecutive frames) + 90000/30; the audio difference can be allocated to the next 10 frames, each getting sample delta = (RTP timestamp difference between consecutive frames) + 10240/10.
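The N-frame averaging just described (90000 spread over 30 video frames, 10240 over 10 audio frames) can be sketched as follows (the function names are assumptions for illustration):

```python
VIDEO_NOMINAL_DELTA = 90000 // 30   # 3000 per frame at 30 fps
AAC_NOMINAL_DELTA = 1024            # fixed AAC frame duration in samples

def frames_to_correct(total_delta, nominal_delta):
    """Choose N so each subsequent frame absorbs roughly one nominal
    inter-frame interval of correction."""
    return total_delta // nominal_delta

def corrected_deltas(rtp_deltas, total_delta, n):
    """Add total_delta/n to the stts sample delta of each of the next
    n recorded frames; later frames keep their RTP timestamp deltas."""
    per_frame = total_delta // n
    return [d + per_frame if i < n else d for i, d in enumerate(rtp_deltas)]

n_video = frames_to_correct(90_000, VIDEO_NOMINAL_DELTA)   # spread over 30 frames
n_audio = frames_to_correct(10_240, AAC_NOMINAL_DELTA)     # spread over 10 frames
```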
In one embodiment, the method for synchronizing multi-stream network video further comprises: step S180 and step S190.
Step S180, calculating a third time length between a starting time point and a second calibration time point, and calculating a fourth time length of each path of operated multimedia data at the second calibration time point, wherein the second calibration time point is a time point of a local clock;
step S190, according to the third time length and the fourth time length, obtaining the audio time deviation and the video time deviation of the multimedia data after each path of operation.
Because the network may lose frames, or the clocks of the RTP sources may drift relative to the local server clock, a calibration action is performed at fixed intervals, e.g. every hour. The calibration action consists of obtaining the time deviations and eliminating them: the obtaining part is steps S180 and S190, whose procedure is essentially the same as that of steps S150 and S160, and the eliminating part is step S170. In other words, step S170 runs throughout the multi-stream video synchronization process: whenever an audio time deviation or video time deviation appears, the elimination operation is performed. While calibrating, a request to synchronize NTP is also sent to each client to calibrate its clock.
In this method, by having each client generate an I frame at the specified time point, the server can use that I frame as a starting point, so the times at which each path of video begins its operation are close to one another. The reference duration of the server's local clock is then taken as the standard, and the deviation between each path's duration and that standard is adjusted out, achieving multi-stream network video synchronization without encoding or decoding and reducing system overhead.
Example two
Corresponding to the method of the first embodiment, as shown in fig. 5, the present invention further provides a device for synchronizing multi-stream network video, including: a connection module 510, a transmission module 520, a reception module 530, a determination module 540, a calculation module 550, a comparison module 560, and an alignment module 570.
A connection module 510, configured to connect control signaling channels of at least two clients;
a sending module 520, configured to send a setting request and a clock synchronization instruction of the multimedia data to each client through a signaling channel; wherein the multimedia data comprises audio data and video data, and the setting request is used for indicating a client to generate an I frame at a specified time point;
the receiving module 530 is configured to receive multimedia data sent by each client after completing the setting request, perform an operation on each path of multimedia data, and generate multimedia data after each path of operation;
a determining module 540 for determining a start time point of performing an operation and a first calibration time point based on a local clock;
a calculating module 550, configured to calculate a first duration between the start time point and the first calibration time point, and to calculate, according to the time stamps of each path of operated multimedia data, the respective second durations of each path at the first calibration time point;
the comparison module 560 is configured to obtain an audio time deviation and a video time deviation of the multimedia data after each path of operation according to the first duration and the second duration;
the alignment module 570 is configured to eliminate audio time deviation and video time deviation of the multimedia data after each path of operation.
In one embodiment, the process by which the alignment module eliminates the audio time deviation and the video time deviation of the multimedia data after each path of operation includes the steps of:
when the audio time deviation of the operated multimedia data is not within a first threshold range, adjusting, according to the audio time deviation, the inter-frame interval of some audio frames in the operation process after the first calibration time point, so that the audio time deviation returns to within the first threshold range;
and when the video time deviation of the operated multimedia data is not within a second threshold range, adjusting, according to the video time deviation, the inter-frame interval of some video frames in the operation process after the first calibration time point, so that the video time deviation returns to within the second threshold range.
In one embodiment, when the audio time deviation of the operated multimedia data is not within the first threshold range, the process of adjusting the inter-frame interval of some audio frames in the operation process after the first calibration time point according to the audio time deviation, so that the audio time deviation returns to within the first threshold range, includes:
adjusting, according to the audio time deviation, the intervals of the n1 audio frames after the first calibration time point, wherein the intervals of the n1 audio frames are the same.
In one embodiment, when the video time deviation of the operated multimedia data is not within the second threshold range, the process of adjusting the inter-frame interval of some video frames in the operation process after the first calibration time point according to the video time deviation, so that the video time deviation returns to within the second threshold range, includes:
adjusting, according to the video time deviation, the intervals of the n2 video frames after the first calibration time point, wherein the intervals of the n2 video frames are the same.
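The adjustment described above spreads the measured deviation evenly over the next n frames, so that every one of the n inter-frame intervals is identical. A minimal sketch (the function name, the 25 fps nominal interval, and n = 10 are illustrative assumptions):

```python
def redistribute_intervals(nominal_interval, deviation, n):
    """Spread `deviation` seconds evenly over the next n inter-frame
    intervals; per the embodiment, all n adjusted intervals are equal.
    A positive deviation (stream behind the clock) lengthens each interval."""
    return [nominal_interval + deviation / n] * n

# 25 fps video (0.04 s nominal interval), 0.1 s behind, corrected over 10 frames:
intervals = redistribute_intervals(0.04, 0.1, 10)  # ten intervals of 0.05 s
```

Total playout time over the n frames grows by exactly the deviation (10 × 0.04 + 0.1 = 0.5 s here), which is why the deviation returns to within the threshold range after the n2-th frame.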
In one embodiment, the determination module 540 is further for determining a second calibration time point;
the calculating module 550 is further configured to calculate a third duration between the start time point and a second calibration time point, and calculate a fourth duration of each path of the operated multimedia data at the second calibration time point, where the second calibration time point is a time point of the local clock;
the comparison module 560 is further configured to obtain an audio time deviation and a video time deviation of the multimedia data after each path of operation according to the third duration and the fourth duration.
In one embodiment, the sending module 520 is further configured to periodically send an NTP clock request to each client through the signaling channel.
In one embodiment, the process of determining, based on the local clock, the start time point of the execution operation and the first calibration time point includes the step of:
determining the start time point of the execution operation according to the start time point of the path of multimedia data on which the operation is executed earliest.
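That is, the operation's start time point is taken from whichever path begins first. A trivial sketch of this selection (stream names and times are illustrative, not from the patent):

```python
def operation_start_point(stream_starts):
    """Given each path's own start time (local-clock seconds), return the
    path that begins the operation earliest and its start time, which the
    embodiment uses as the start time point of the execution operation."""
    earliest = min(stream_starts, key=stream_starts.get)
    return earliest, stream_starts[earliest]

# Two client streams whose operations begin 0.2 s apart:
path, t0 = operation_start_point({"camera": 5.2, "screen": 5.0})  # "screen", 5.0
```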
In the device, each client is made to generate an I frame at the specified time point, so the server can use that I frame as a starting point and the times at which the operation initially starts on each video path are close to one another. The reference duration of the server's local clock is then used as the standard, and the deviation between the duration of each video path and this standard duration is adjusted. Multi-stream network video synchronization is thus achieved without encoding or decoding, which reduces system overhead.
Embodiment Three
The embodiment of the invention also provides a storage medium on which computer instructions are stored; when executed by a processor, the instructions implement the method for synchronizing multi-stream network video according to any of the above embodiments.
Those skilled in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware associated with program instructions. The foregoing program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The aforementioned storage medium includes any medium that can store program code, such as a removable storage device, Random Access Memory (RAM), Read-Only Memory (ROM), a magnetic disk, or an optical disk.
Alternatively, the above-described integrated units of the present invention, if implemented in the form of software functional modules and sold or used as separate products, may be stored in a computer-readable storage medium. Based on this understanding, the part of the technical solution of the embodiments of the present invention that in essence contributes over the related art may be embodied as a computer software product stored in a storage medium, including several instructions that cause a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the methods of the embodiments of the present invention. The aforementioned storage medium includes any medium capable of storing program code, such as a removable storage device, RAM, ROM, a magnetic disk, or an optical disk.
Corresponding to the above computer storage medium, in one embodiment there is also provided a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of multi-stream network video synchronization according to any of the above embodiments when executing the program.
According to the computer device, each client generates an I frame at the specified time point, so the server can use the I frames as starting points and the times at which the operation initially starts on each video path are close to one another. The reference duration of the server's local clock is then used as the standard, and the deviation between the duration of each video path and this standard duration is adjusted. Multi-stream network video synchronization is thus achieved without encoding or decoding, which reduces system overhead.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination of them that involves no contradiction should be considered within the scope of this description.
It is to be understood that the above examples of the present invention are provided by way of illustration only and are not a limitation of its embodiments. Other variations or modifications will be apparent to those of ordinary skill in the art from the above teachings; it is neither necessary nor possible to exhaust all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention is intended to be covered by the following claims.

Claims (10)

1. A method for synchronizing video over a multi-stream network, comprising:
connecting control signaling channels of at least two clients;
sending a setting request and a clock synchronization instruction of multimedia data to each client through a signaling channel; wherein the multimedia data comprises audio data and video data, and the setting request is used for indicating a client to generate an I frame at a specified time point;
receiving the multimedia data sent by each client after completing the setting request, and respectively executing operation on each path of multimedia data to generate multimedia data after each path of operation;
determining a start time point of performing an operation and a first calibration time point based on a local clock;
calculating a first time length between a starting time point and a first calibration time point, and calculating respective second time lengths of the multimedia data after each path of operation at the first calibration time point;
according to the first time length and the second time length, obtaining the audio time deviation and the video time deviation of the multimedia data after each path of operation;
and eliminating the audio time deviation and the video time deviation of the multimedia data after each path of operation.
2. The method of claim 1, wherein the process of eliminating the audio time deviation and the video time deviation of the multimedia data after each path of operation comprises:
when the audio time deviation of the operated multimedia data is not within a first threshold range, adjusting, according to the audio time deviation, the inter-frame interval of some audio frames in the operation process after the first calibration time point, so that the audio time deviation returns to within the first threshold range;
and when the video time deviation of the operated multimedia data is not within a second threshold range, adjusting, according to the video time deviation, the inter-frame interval of some video frames in the operation process after the first calibration time point, so that the video time deviation returns to within the second threshold range.
3. The method of claim 2, wherein, when the audio time deviation of the operated multimedia data is not within the first threshold range, adjusting the inter-frame interval of some audio frames in the operation process after the first calibration time point according to the audio time deviation, so that the audio time deviation returns to within the first threshold range, comprises:
adjusting, according to the audio time deviation, the intervals of the n1 audio frames after the first calibration time point, wherein the intervals of the n1 audio frames are the same.
4. The method of claim 2, wherein, when the video time deviation of the operated multimedia data is not within the second threshold range, adjusting the inter-frame interval of some video frames in the operation process after the first calibration time point according to the video time deviation, so that the video time deviation returns to within the second threshold range, comprises:
adjusting, according to the video time deviation, the intervals of the n2 video frames after the first calibration time point, wherein the intervals of the n2 video frames are the same.
5. The method of multi-stream network video synchronization according to any one of claims 1-4, further comprising:
calculating a third time length between a starting time point and a second calibration time point, and calculating a fourth time length of each path of operated multimedia data at the second calibration time point, wherein the second calibration time point is a time point of a local clock;
and according to the third time length and the fourth time length, obtaining the audio time deviation and the video time deviation of the multimedia data after each path of operation.
6. The method of multi-stream network video synchronization of claim 5, further comprising:
sending an NTP clock request to each client through the signaling channel at fixed intervals.
7. The method of multi-stream network video synchronization according to any one of claims 1-4, 6, wherein the process of determining a start time point of an execution operation and a first calibration time point based on a local clock comprises:
and determining the starting time point of the execution operation according to the starting time point of the multimedia data of the path of the earliest execution operation.
8. An apparatus for synchronizing multi-stream network video, comprising:
the connection module is used for connecting control signaling channels of at least two clients;
the sending module is used for sending a setting request and a clock synchronization instruction of the multimedia data to each client through a signaling channel; wherein the multimedia data comprises audio data and video data, and the setting request is used for indicating a client to generate an I frame at a specified time point;
the receiving module is used for receiving the multimedia data sent by each client after the setting request is completed, respectively executing operation on each path of multimedia data, and generating the multimedia data after each path of operation;
a determining module for determining a start time point of an execution operation and a first calibration time point based on a local clock;
the computing module is used for computing a first duration between a starting time point and a first calibration time point and computing a second duration of each path of operated multimedia data at the first calibration time point;
the comparison module is used for obtaining the audio time deviation and the video time deviation of the multimedia data after each path of operation according to the first time length and the second time length;
and the alignment module is used for eliminating the audio time deviation and the video time deviation of the multimedia data after each path of operation.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1-7 when the program is executed by the processor.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-7.
CN202310451718.1A 2023-04-24 2023-04-24 Method, device, equipment and storage medium for synchronizing multi-stream network video Pending CN116567308A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310451718.1A CN116567308A (en) 2023-04-24 2023-04-24 Method, device, equipment and storage medium for synchronizing multi-stream network video


Publications (1)

Publication Number Publication Date
CN116567308A true CN116567308A (en) 2023-08-08

Family

ID=87502904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310451718.1A Pending CN116567308A (en) 2023-04-24 2023-04-24 Method, device, equipment and storage medium for synchronizing multi-stream network video

Country Status (1)

Country Link
CN (1) CN116567308A (en)

Similar Documents

Publication Publication Date Title
RU2492587C2 (en) Apparatus and method for storing and reading file, having media data storage and metadata storage
KR101704619B1 (en) Determining available media data for network streaming
US7269338B2 (en) Apparatus and method for synchronizing presentation from bit streams based on their content
CN102215429B (en) Recording method for mobile TV
US20040170383A1 (en) System and method for real-time data archival
KR100482287B1 (en) Apparatus and method for injection of synchronized stream data in digital broadcasting environment
JP2001513606A (en) Processing coded video
US10503460B2 (en) Method for synchronizing an alternative audio stream
CN112929713B (en) Data synchronization method, device, terminal and storage medium
EP3852380A1 (en) Method and device for switching media service channels
CN110519627B (en) Audio data synchronization method and device
CN106385525A (en) Video play method and device
CN113507617B (en) SEI frame playback data synchronization method, system, device and medium based on live video stream
JPH11275524A (en) Data recording method, data reproduction method, data recorder and data reproduction device
JP2018182677A (en) Information processing apparatus, information processing method, program, and recording medium manufacturing method
JP2018182617A (en) Information processing apparatus, information processing method, program, and recording medium manufacturing method
CN116567308A (en) Method, device, equipment and storage medium for synchronizing multi-stream network video
EP1292124A1 (en) Method for DVD recording of a data steam and DVD recorder
CN113965786A (en) Method for accurately controlling video output and playing
KR102166780B1 (en) Method of synchronisation during the processing, by a multimedia player, of an item of multimedia content transmitted by an mbms service
US20240171792A1 (en) A method of providing a time-synchronized multi-stream data transmission
CA2651701A1 (en) Generation of valid program clock reference time stamps for duplicate transport stream packets
CN117201854A (en) Method and system for accurate seek video frames applied to video synchronous playing system
JP2001094907A (en) Partial reproduction method for video/audio signal in storage type digital broadcast and receiver
JP2001094945A (en) Partial reproduction method for video and audio data in digital broadcast and receiver

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination