CN114257771B - Video playback method and device for multipath audio and video, storage medium and electronic equipment - Google Patents

Video playback method and device for multipath audio and video, storage medium and electronic equipment

Info

Publication number
CN114257771B
Authority
CN
China
Prior art keywords
video
audio
stream
video stream
path
Prior art date
Legal status
Active
Application number
CN202111572204.9A
Other languages
Chinese (zh)
Other versions
CN114257771A (en)
Inventor
金宏宇
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202111572204.9A
Publication of CN114257771A
Application granted
Publication of CN114257771B

Classifications

    • H04N 5/76 — Television signal recording
    • H04N 21/43072 — Synchronising the rendering of multiple content streams or additional data on the same device
    • H04N 21/4334 — Recording operations
    • H04N 21/4398 — Processing of audio elementary streams involving reformatting operations of audio signals
    • H04N 21/440218 — Reformatting operations of video signals by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • H04N 21/6437 — Real-time Transport Protocol [RTP]
    • H04N 21/8547 — Content authoring involving timestamps for synchronizing content


Abstract

The application discloses a method for recording multi-channel audio and video, comprising: acquiring the audio streams and video streams of a plurality of devices engaged in multi-channel audio/video communication; mixing the audio streams of the devices to obtain a mixed audio stream, storing the mixed audio stream, and retaining its timestamp information; and storing the video streams of the plurality of devices separately, retaining their respective timestamp information. The application also provides a playback method for multi-channel audio and video, comprising: acquiring the separately stored video streams of the plurality of devices, together with their respective timestamp information; acquiring the stored mixed audio stream and its timestamp information; and playing each video stream and the mixed audio stream synchronously according to the timestamp information of each video stream and of the mixed audio stream. With this scheme, the processing performed by recording and playback devices can be simplified.

Description

Video playback method and device for multipath audio and video, storage medium and electronic equipment
Technical Field
The present application relates to video recording and playback technologies, and in particular to a method and apparatus for recording and playing back multi-channel audio and video, a storage medium, and an electronic device.
Background
Existing schemes for recording and playing back two-way audio and video generally include:
1. Storing a video file for each of the two audio/video channels on a network video server; during playback, data is transmitted from several network video servers to the playback device. With this approach, the played-back audio and video data are unstable because of network transmission effects. In particular, each audio channel may jump during synchronization, and such abrupt changes in sound are easily perceived as playback anomalies, giving a poor user experience.
2. Decoding the two independent video streams separately, tiling all decoded YUV images into a single YUV image at a certain ratio, and transcoding and compressing the composite image into a single video stream (a process called transcoding); the transcoded single stream is stored and, during playback, decoded and played. Because the stored stream is the result of compositing two channels of images into one, the newly generated stream cannot restore the original image quality and details of each channel.
Disclosure of Invention
The application provides a method, apparatus, storage medium, and electronic device for recording and playing back multi-channel audio and video, which can simplify the processing performed by recording and playback devices.
In order to achieve the above purpose, the application adopts the following technical scheme:
a video recording method of multipath audio and video includes:
acquiring audio streams and video streams of a plurality of devices for multipath audio and video communication;
mixing the audio streams of each device to obtain a mixed audio stream, storing the mixed audio stream, and retaining time stamp information;
and respectively storing the video streams of the plurality of devices, and reserving the respective time stamp information.
Preferably, when the device storing the audio streams and video streams is a designated device among the plurality of devices, acquiring the audio streams and video streams of the plurality of devices engaged in multi-channel audio/video communication includes:
the designated device receives the audio/video streams sent by the devices other than itself, unpacks each audio/video stream, and extracts the corresponding device's video stream and audio stream; the designated device also captures and generates a local video stream and audio stream.
Preferably, storing the mixed audio stream includes encoding and packaging the mixed audio stream and then storing it;
storing the video streams of the plurality of devices separately includes packaging and storing the video streams of the other devices, and encoding, packaging, and storing the local video stream.
Preferably, the mixed audio stream and the video stream of each device are packaged together and stored in a video file corresponding to that device.
Preferably, the multi-channel audio/video communication is two-way audio/video communication, the plurality of devices are a first device and a second device, and the mixed audio stream and the video streams of the plurality of devices are stored on the second device;
acquiring the audio streams and video streams of the plurality of devices engaged in multi-channel audio/video communication includes:
the second device receives the audio/video stream of the first device sent by the first device, unpacks it, and extracts the first device's audio stream and video stream; the second device captures and generates a local audio stream and video stream;
mixing the audio streams of the devices to obtain a mixed audio stream includes:
decoding the audio stream of the first device, and mixing the decoded audio stream with the local audio stream to obtain the mixed audio stream;
storing the mixed audio stream and the video streams of the plurality of devices includes:
encoding the mixed audio stream;
and packaging and storing the encoded mixed audio stream together with the video stream of the first device into a first video file corresponding to the first device, and, after encoding the local video stream, packaging and storing it together with the encoded mixed audio stream into a second video file corresponding to the second device.
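As a rough sketch of the two-file layout described above, the snippet below simulates the packaging step: each per-device "file" holds that device's video stream plus the shared mixed audio stream, with timestamps carried along. The dictionary-based files and field names are illustrative stand-ins for real container handling, not part of the patent.

```python
def package_recording(video1, video2, mixed_audio):
    """Build the two per-device 'video files' described above.

    video1, video2: lists of (timestamp, frame) for the first/second device.
    mixed_audio:    list of (timestamp, sample) for the shared mixed stream.
    Each file pairs one device's video with the same mixed audio track.
    """
    file1 = {"video": list(video1), "audio": list(mixed_audio)}  # first device
    file2 = {"video": list(video2), "audio": list(mixed_audio)}  # second device
    return file1, file2
```

Note that the mixed audio track is duplicated into both files, so either file can later be played back on its own with sound.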
A playback method for multi-channel audio and video includes:
acquiring the separately stored video streams of the plurality of devices engaged in multi-channel audio/video communication, together with their respective timestamp information;
acquiring the stored mixed audio stream and its timestamp information, the mixed audio stream having been obtained by mixing the audio streams of the plurality of devices;
and playing each video stream and the mixed audio stream synchronously according to the timestamp information of each video stream and of the mixed audio stream.
Preferably, when playing the video streams and the mixed audio stream synchronously, the playing progress of the mixed audio stream is used as the reference, and each video stream is synchronized to that progress.
Preferably, synchronizing any video stream to the playing progress of the mixed audio stream includes:
comparing V2-A2-(V0-A0) with SyncT: if V2-A2-(V0-A0) < -SyncT, determining that the first delay time of the current frame of the video stream relative to the previous frame is V2-V1-λ1; if -SyncT ≤ V2-A2-(V0-A0) ≤ SyncT, determining that the first delay time is V2-V1; and if V2-A2-(V0-A0) > SyncT, determining that the first delay time is V2-V1+λ2;
subtracting the time consumed by pre-playback processing from the first delay time to obtain the delayed synchronization time, and playing the current frame of the video stream according to that delayed synchronization time;
where V2 and A2 are the timestamps of the current frame of the video stream and of the mixed audio stream respectively, V0 and A0 are the start timestamps of the video stream and of the mixed audio stream respectively, V1 is the timestamp of the previous frame of the video stream, SyncT is the maximum of a preset audio/video synchronization threshold and V2-V1, and λ1 and λ2 are a preset first step size and a preset second step size respectively.
Preferably, before comparing V2-A2-(V0-A0) with SyncT, the method further includes:
determining whether |V2-A2-(V0-A0)| is greater than or equal to a preset allowable synchronization threshold, and if so, performing the comparison of V2-A2-(V0-A0) with SyncT;
otherwise, when V2-A2-(V0-A0) > 0, suspending the playback processing of the video stream for a set time and, after the suspension expires, re-determining whether |V2-A2-(V0-A0)| is greater than or equal to the preset allowable synchronization threshold; and when V2-A2-(V0-A0) < 0, playing the current frame of the video stream directly without delay processing.
Preferably, when the device playing back the mixed audio stream and the video streams is a designated device among the plurality of devices, before comparing V2-A2-(V0-A0) with SyncT, the method further includes:
determining whether V2-A2-(V0-A0) is smaller than a preset allowable synchronization threshold; if so, playing the current frame of the designated device's video stream normally according to its timestamp information; otherwise, performing the comparison of V2-A2-(V0-A0) with SyncT.
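The three-way comparison above can be sketched as follows, assuming all timestamps are in a common unit such as milliseconds; the function and parameter names are illustrative, not from the patent.

```python
def frame_delay(v2, v1, a2, v0, a0, sync_threshold, step1, step2):
    """First delay time of the current video frame relative to the
    previous frame, per the three-way comparison above.

    v2, a2: current timestamps of the video stream / mixed audio stream.
    v0, a0: start timestamps of the video stream / mixed audio stream.
    v1:     timestamp of the previous video frame.
    step1, step2: the preset step sizes λ1 and λ2.
    """
    sync_t = max(sync_threshold, v2 - v1)   # SyncT
    drift = (v2 - a2) - (v0 - a0)           # V2-A2-(V0-A0)
    if drift < -sync_t:                     # video lags audio: shorten the delay
        return v2 - v1 - step1
    if drift > sync_t:                      # video leads audio: stretch the delay
        return v2 - v1 + step2
    return v2 - v1                          # within tolerance: nominal frame gap
```

The time consumed by pre-playback processing would then be subtracted from this value to obtain the actual delayed synchronization time.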
Preferably, the multi-channel audio/video communication is two-way audio/video communication, and the plurality of devices are two devices.
A recording apparatus for multi-channel audio and video, the apparatus comprising a receiving unit, a mixing unit, and a storage unit;
the receiving unit is configured to acquire the audio streams and video streams of a plurality of devices engaged in multi-channel audio/video communication, store the video stream of each device separately in the storage unit, and retain the respective timestamp information;
the mixing unit is configured to mix the audio streams of the devices to obtain a mixed audio stream, store the mixed audio stream in the storage unit, and retain its timestamp information;
the storage unit is configured to store the mixed audio stream and its timestamp information, and to store each device's video stream and its respective timestamp information.
A playback apparatus for multi-channel audio and video, comprising a video stream processing unit, an audio stream processing unit, and a playback unit;
the video stream processing unit is configured to acquire the separately stored video streams of the plurality of devices engaged in multi-channel audio/video communication, together with their respective timestamp information;
the audio stream processing unit is configured to acquire the stored mixed audio stream and its timestamp information, the mixed audio stream having been obtained by mixing the audio streams of the plurality of devices;
and the playback unit is configured to play each video stream and the mixed audio stream synchronously according to the timestamp information of each video stream and of the mixed audio stream.
A computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the recording or playback method for multi-channel audio and video described above.
An electronic device comprising at least a computer-readable storage medium and a processor;
the processor is configured to read executable instructions from the computer-readable storage medium and execute them to implement the recording or playback method for multi-channel audio and video.
As can be seen from the above technical solution, in the present application, when recording during multi-channel audio/video communication, the audio streams and video streams of the communicating devices are acquired; the audio streams are mixed into a single mixed audio stream, which is stored with its timestamp information retained; and the video streams of the devices are stored separately, each with its own timestamp information retained. Correspondingly, during playback, the separately stored video streams and their timestamp information are acquired, along with the stored mixed audio stream and its timestamp information, and each video stream is played synchronously with the mixed audio stream according to this timestamp information. Because the audio streams of the devices are mixed before storage, the burden of mixing and playing audio during playback is reduced, simplifying the playback device's processing. At the same time, the multiple video streams are not transcoded into a single combined stream, which avoids the time and memory cost of rendering them into a picture-in-picture composite and transcoding, lowers the performance requirement on the recording device's system-on-chip (SoC), and thus also simplifies the recording device's processing.
Drawings
FIG. 1 is a basic flow chart of the recording method for multi-channel audio and video in the present application;
FIG. 2 is a basic flow chart of the playback method for multi-channel audio and video in the present application;
FIG. 3 is a basic flow chart of the two-way audio/video recording and playback method in an embodiment of the present application;
FIG. 4 is a block diagram of synchronous recording for two-way video intercom;
FIG. 5 is a block diagram of video playback for two-way video intercom;
FIG. 6 is a schematic diagram of the recording apparatus for multi-channel audio and video in the present application;
FIG. 7 is a schematic diagram of the playback apparatus for multi-channel audio and video in the present application;
FIG. 8 is a schematic structural diagram of an electronic device according to the present application.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings, in order to make its objects, technical means, and advantages more apparent.
The application provides a recording scheme for multi-channel audio and video and a corresponding synchronized playback scheme. The technical solution can be widely applied to scenarios with two or more channels of audio/video communication, such as two-way or multi-way video intercom. Consider a multi-channel audio/video communication scenario with several audio/video communication devices (such as video intercoms): during communication, each device simultaneously previews its local camera picture and plays the remote audio and video of the other parties, displaying the multiple video pictures in a picture-in-picture layout whose individual picture sizes can be adjusted at any time according to the user's habits. Each channel of audio and video can be recorded and stored during communication and played back synchronously when the recording needs to be reviewed; the processing of the present application addresses how to record and synchronously play back these audio/video channels.
FIG. 1 is a basic flow chart of the recording method for multi-channel audio and video in the present application. As shown in FIG. 1, the method includes:
Step 101, acquiring the audio streams and video streams of a plurality of devices engaged in multi-channel audio/video communication;
Step 102, mixing the audio streams of the devices to obtain a mixed audio stream, storing the mixed audio stream, and retaining its timestamp information;
Step 103, storing the video streams of the plurality of devices separately, retaining their respective timestamp information.
This concludes the basic flow shown in FIG. 1. In this method, the audio streams of the devices engaged in multi-channel audio/video communication are mixed before storage to ease subsequent playback processing; at the same time, the video streams are stored separately, preserving the original image quality and detail as much as possible for playback.
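Steps 101–103 can be sketched as a small simulation in which timestamps travel with every stored item. The sample-aligned summing mix and the in-memory "store" are illustrative simplifications, not the patent's actual mixing or container handling.

```python
def record_session(streams):
    """streams: {device: {"audio": [(ts, sample)], "video": [(ts, frame)]}}.

    Returns one mixed audio track plus per-device video tracks,
    all with timestamp information retained (steps 101-103).
    """
    mixed = {}
    for dev in streams.values():                    # step 102: mix all audio
        for ts, sample in dev["audio"]:
            mixed[ts] = mixed.get(ts, 0) + sample   # naive aligned sum
    return {
        "mixed_audio": sorted(mixed.items()),       # stored with timestamps
        "video": {name: list(dev["video"])          # step 103: stored separately
                  for name, dev in streams.items()},
    }
```

The key property is visible in the return value: a single audio track, but one untouched video track per device.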
FIG. 2 is a basic flow chart of the playback method for multi-channel audio and video in the present application. As shown in FIG. 2, the method includes:
Step 201, acquiring the separately stored video streams of the plurality of devices engaged in multi-channel audio/video communication, together with their respective timestamp information;
Step 202, acquiring the stored mixed audio stream and its timestamp information;
the mixed audio stream is obtained by mixing the audio streams of the plurality of devices;
Step 203, playing the video streams and the mixed audio stream synchronously according to the timestamp information of each video stream and of the mixed audio stream.
This concludes the basic flow shown in FIG. 2. During playback, the mixed audio stream stored after mixing the multiple audio streams is played directly, synchronized with each video stream. No additional mixing is therefore needed when playing audio, which simplifies the playback device's processing.
In the recording and playback method of the present application, recording and playback may be implemented on a single physical device or on different devices; likewise, the device performing recording or playback may be one of the devices engaged in the multi-channel audio/video communication, or a third-party device distinct from them. For example, to reduce terminal storage usage, recording may be performed on a network server according to the method above, with the corresponding media files stored there; when playback is required, the synchronized playback of the multi-channel audio and video may then be carried out on the communication devices or on another third-party device. Alternatively, to make viewing convenient and avoid a second network transfer, recording and playback may be performed on one or more of the communicating devices themselves. For ease of description, the device that performs recording or playback is referred to herein as the designated device. This term does not imply that the device is designated by a user or by the system; it merely denotes the device that performs the storing or playback processing. The designated device may equally be a randomly chosen device, a preset device, and so on.
The following embodiment takes as an example a recording and playback method performed on one of two devices engaged in two-way audio/video communication, and describes specific implementations of the recording and playback methods of the present application. Since the recording and playback processing correspond to each other, the embodiment describes the complete flow, covering both.
FIG. 3 is a basic flow chart of the two-way audio/video recording and playback method in an embodiment of the present application. The two communicating devices are called the first device and the second device; the second device performs the storage and playback of the audio/video streams, i.e., it is the designated device mentioned above. As shown in FIG. 3, the method includes:
in step 301, the second device extracts the video stream of the first device from the audio/video stream sent by the first device, and stores the video stream, and retains the timestamp information.
When the first equipment and the second equipment carry out audio and video communication, the first equipment sends the audio and video stream which is locally collected and generated to the second equipment. The first device may generate the audio/video stream according to the existing manner, for example, encode and compress the audio/video stream and package the audio/video stream into RTP packets for transmission to the second device.
The second device receives the audio and Video stream of the first device, extracts the Video stream Video1 from the audio and Video stream to save, synchronously plays the audio and Video stream for the subsequent step, and also needs to keep time stamp information when the Video stream of the first device is saved. The method of extracting the Video stream is performed according to the format of the audio/Video stream, for example, if the audio/Video stream is transmitted in the form of RTP packets, the audio/Video stream needs to be first RTP unpacked when the Video stream is extracted, and then the Video1 part of the Video stream is extracted from the unpacked audio/Video stream.
In particular, when the extracted Video stream Video1 is stored, in order to save processing resources, it is preferable that the extracted Video stream is stored after being directly packaged without performing decoding processing. Of course, the extracted video stream may be decoded and packaged, and then the packaged video stream may be stored, which obviously requires more computing resources, but the second device may use a different encoding mode than the first device when recoding the video stream of the first device. The specific coding and packing modes can be selected according to the needs, for example, the specific coding and packing modes are packed into PS packets for storage.
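A minimal sketch of this step, assuming the RTP-unpacked stream is already available as a list of tagged packets; the tuple layout is a stand-in for real RTP/PS handling. Nothing is decoded — the video payloads are only filtered out, with their timestamps preserved for later re-synchronization.

```python
def extract_video(av_packets):
    """Pull the video portion out of an unpacked audio/video packet list
    without decoding it, keeping each packet's timestamp so the stored
    stream can be re-synchronised at playback time.

    av_packets: list of (kind, timestamp, payload),
                where kind is "audio" or "video".
    """
    return [(ts, payload)
            for kind, ts, payload in av_packets
            if kind == "video"]
```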
In this embodiment, because the device performing the recording is itself one of the communicating devices, only the other device's video stream needs to be received and extracted in this step; the local device's video stream is handled in step 302. If, however, the recording device is a third-party device distinct from the first and second devices, such as a network server, the video streams of both devices are extracted and processed in the manner of this step 301.
Step 302, the second device captures and generates its local video stream, stores it, and retains the timestamp information.
The second device may capture and generate the local video stream Video2 in the existing manner, which is not described further here.
Video2 is stored for later synchronized playback, so its timestamp information must be retained when it is stored. Specifically, Video2 may be encoded and packaged for storage; the encoding and packaging formats can be chosen as needed, for example packaging into PS packets for storage.
Step 303, the second device extracts the audio stream from the audio/video stream sent by the first device, mixes it with the local audio stream collected and generated by the second device, stores the resulting mixed audio stream, and retains the timestamp information.
The manner in which the second device collects and generates the local audio stream Audio2 may be any existing manner, which will not be described herein.
For the audio stream extracted from the first device's audio/video stream, the extraction is similar to the video stream extraction in step 301 and is performed according to the format of the audio/video stream. For example, if the audio/video stream is transmitted as RTP packets, the audio/video stream is first RTP-unpacked and the Audio1 portion is then extracted from the unpacked stream; since mixing must be performed subsequently, Audio1 is then decoded to recover the original audio stream.
The audio stream Audio1 of the first device (i.e. the recovered original audio stream) and the local audio stream Audio2 of the second device are mixed to obtain the mixed audio stream Audio3. Any of various existing mixing methods may be used; this application does not limit the mixing method.
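The text leaves the mixing algorithm open, so as one hedged example, the simplest common approach is sample-wise addition with clipping, assuming both decoded streams are 16-bit signed PCM at the same sample rate:

```python
def mix_pcm16(audio1, audio2):
    """Mix two decoded PCM streams (sequences of int16 samples) into one.
    Sample-wise addition with clipping; an assumed, minimal mixing
    method, not the one mandated by the text."""
    n = max(len(audio1), len(audio2))
    mixed = []
    for i in range(n):
        s = (audio1[i] if i < len(audio1) else 0) + \
            (audio2[i] if i < len(audio2) else 0)
        # Clip to the int16 range to avoid wrap-around distortion.
        mixed.append(max(-32768, min(32767, s)))
    return mixed
```

Production mixers typically add headroom scaling or soft clipping, but the principle of combining Audio1 and Audio2 into a single Audio3 stream is the same.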
The mixed audio stream Audio3 obtained after mixing is stored, and for later synchronous playback the timestamp information is retained when it is stored. Specifically, Audio3 may be encoded and packaged for storage; the specific encoding and packaging modes can be selected as needed, for example packaging into PS packets for storage.
In this embodiment, since the device performing the recording process is one of the parties to the audio/video communication, the audio streams of the first device and the second device are not processed in exactly the same way in this step. However, if the device performing the recording process is a third-party device different from the first device and the second device, such as a web server, the audio streams of both devices may be processed in the manner described for the first device's audio stream, then mixed, and the mixed audio stream encoded, packaged and stored.
Through the steps 301, 302 and 303, video1, video2 and Audio3 can be saved, thus completing the Video processing method. In fact, generally, when audio and video files are stored, audio and video may be stored in one file to facilitate the playback process. Based on this, preferably, in the present application, video1 and Audio3 may be packaged together and stored in a first Video file corresponding to a first device, and Video2 and Audio3 may be packaged together and stored in a second Video file corresponding to a second device. Therefore, when the video playback is carried out, the first video file and the second video file can be independently played, and because the Audio3 is a mixed Audio stream, the Audio playback device has the sound of both Audio and video communication sides when any video file is played, and the voice background environment of the two-way Audio and video communication is effectively restored. Of course, if taking account of the occupation of storage resources and the unilateral picture when playing Audio-Video communication alone is not required, audio3 may be packaged and stored alone or together with one of Video streams of Video1 and Video 2.
In addition, steps 301, 302 and 303 described above may be performed in parallel.
Step 304, when the second device performs video playback, the first device's video stream, the local video stream and the mixed audio stream are played synchronously according to their respective timestamp information.
Through the steps 301, 302 and 303, video1, video2 and Audio3 can be recorded and stored, and because corresponding timestamp information is reserved when corresponding Audio and Video streams are stored, video1, video2 and Audio3 can be synchronously played according to the corresponding timestamp information, so that recording and playback are realized. Because the Audio3 is the Audio stream after mixing the two paths of Audio, the two paths of Audio do not need to be mixed and played again during video playback, and the video playback process is greatly simplified. In addition, if the playback process and the previous recording process are performed in different devices, it is also necessary to acquire the mixed audio stream and the timestamp information thereof stored in the recording location, and the two video streams and the timestamp information thereof stored separately before step 304.
Specifically, during synchronous playback the playing progress of one video stream A may be used as the reference, with the audio stream and the other video stream played synchronously against it. Taking the foregoing process as an example, the playing progress of Video2 may serve as the reference, with Audio3 and Video1 synchronized to it; or the playing progress of Video1 may serve as the reference, with Audio3 and Video2 synchronized to it.
However, since people are more sensitive to jumps in sound, it is preferable to use the playing progress of the mixed audio stream as the reference and to play the two video streams synchronously against it. That is, for each video stream, the delay synchronization time of the current frame relative to the previous frame is calculated from the difference of the start timestamps of the video stream and the mixed audio stream, the time consumed by pre-playback processing, the video frame interval, and so on, and the frame is played according to the calculated time. A calculation method for the delay synchronization time follows:
The meaning of the individual quantities involved is described first.

V0 represents the start timestamp of Video2; A0 represents the start timestamp of Audio3;

V1 represents the previous-frame timestamp of Video2; A1 represents the previous-frame timestamp of Audio3;

V2 represents the current-frame timestamp of Video2; A2 represents the current-frame timestamp of Audio3;

V0' represents the start timestamp of Video1; V1' represents the previous-frame timestamp of Video1;

V2' represents the current-frame timestamp of Video1;

SyncT represents the maximum of V2-V1 and the single-channel audio/video synchronization threshold;

SyncT' represents the maximum of V2'-V1' and the two-way audio/video synchronization threshold.
Next, the calculation of the synchronization delay and the video playing method are introduced.
1. Delay synchronization calculation for the non-same-path video Video1 synchronized to the audio Audio3
When the video stream Video1 is played on its own, the timestamp difference between the current frame and the previous frame is V2'-V1', i.e. the delay between the two frames is V2'-V1'. When the video is played with reference to the audio, the timestamp information of the audio playback must be considered in addition to the video's own timestamp information, and the delay between two video frames adjusted to stay synchronized with the audio playback.
Specifically, V2'-A2-(V0'-A0) may be compared with SyncT'. If V2'-A2-(V0'-A0) < -SyncT', the video stream Video1 lags the audio Audio3 and the delay should be reduced, so the first delay time of the current frame of Video1 relative to the previous frame is determined to be V2'-V1'-λ1' (i.e. a preset step λ1' ahead of the normal video frame interval V2'-V1'). If -SyncT' ≤ V2'-A2-(V0'-A0) ≤ SyncT', the current delay is within the normal synchronous-playing range, and the current frame of Video1 is played normally according to its timestamp information, i.e. the first delay time of the current frame relative to the previous frame is the normal video frame interval V2'-V1'. If V2'-A2-(V0'-A0) > SyncT', the video stream Video1 is ahead of the audio Audio3 and an additional delay should be applied, so the first delay time of the current frame relative to the previous frame is determined to be V2'-V1'+λ2' (i.e. a preset step λ2' added to the normal video frame interval V2'-V1'). After the first delay time is determined in this way, the time consumed by pre-playback processing is subtracted from it to give the delay synchronization time, and the current frame of Video1 is played according to the delay synchronization time. λ1' and λ2' may be empirical values set according to the current device performance, operating environment, and so on.
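The three-branch rule just described can be sketched as follows. This is an illustrative sketch under the assumption that all timestamps share one time base; the same function serves either stream by passing the primed (Video1) or unprimed (Video2) quantities, with SyncT'/SyncT and λ values as appropriate.

```python
def first_delay_time(v2, v1, v0, a2, a0, sync_t, lam1, lam2):
    """Return the first delay time of the current video frame relative
    to the previous frame, per the three-branch rule above."""
    drift = (v2 - a2) - (v0 - a0)   # how far video runs ahead of audio
    interval = v2 - v1              # normal video frame interval
    if drift < -sync_t:
        return interval - lam1      # video lags audio: shorten the delay
    if drift > sync_t:
        return interval + lam2      # video leads audio: lengthen the delay
    return interval                 # within the synchronization window

def delay_sync_time(first_delay, preprocess_cost):
    """Delay synchronization time = first delay time minus the time
    already consumed by pre-playback processing (decode, display setup)."""
    return first_delay - preprocess_cost
```

For example, with a 40 ms frame interval and the video 50 ms behind the audio beyond a 40 ms threshold, the delay is shortened by λ1 so the video catches up.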
In addition, λ1' may be set directly to V2'-V1'; then, when V2'-A2-(V0'-A0) < -SyncT', the first delay time is simply 0, the current frame is not delayed relative to the previous frame, and the current frame is decoded and played immediately. In that case the delay synchronization time need not be calculated at all, and the audio's playing progress can be caught up more quickly.
In addition, consider a special scenario. In the initial stage of establishing the video communication, recording of the second device's local audio and video begins first, and recording of the audio and video sent by the first device begins only some time later, so the recording start time of the first device's video is far later than that of the mixed audio stream containing the local audio, beyond any reasonable audio/video synchronization range. The processing that restores the real scene is then: the second device's local video and the mixed audio stream start playing first, while the first device's picture remains a black screen with no video content; only after some time does the first device's video begin to display. If, under the processing described above, V2'-A2-(V0'-A0) > SyncT' were found and an additional delay applied before playing, this would not match the actual scene. Taking this into account, and in order to restore the actual two-way communication scene as faithfully as possible, this embodiment preferably adds the following processing before comparing V2'-A2-(V0'-A0) with SyncT':
|V2'-A2-(V0'-A0)| is compared with a preset allowable synchronization threshold X. If |V2'-A2-(V0'-A0)| ≤ X, the comparison of V2'-A2-(V0'-A0) with SyncT' and the subsequent processing continue as above, i.e. the video's synchronization delay time is calculated and the frame played. If |V2'-A2-(V0'-A0)| > X, the time difference between the video stream and the mixed audio stream is very large, beyond the synchronization range, and processing splits into the following two cases:
a. When V2'-A2-(V0'-A0) > 0, the video stream Video1 is far ahead of the audio Audio3, beyond any reasonable synchronization range; the communication should be in the initial stage of establishment, where the second device's recording began before the first device's. Processing of Video1 is then paused for a set time, with no video frame decoded or played; when the pause expires, the current frame's timestamp is judged again, and this repeats until V2'-A2-(V0'-A0) enters the reasonable synchronization range (i.e. 0 < V2'-A2-(V0'-A0) ≤ X), after which the audio-referenced synchronous video playing described above is performed;
b. When V2'-A2-(V0'-A0) < 0, the video stream Video1 lags far behind the audio Audio3. The first delay time is set to V2'-V1'-λ1' as in the earlier processing, and preferably set to 0, so the frame is decoded and played directly without delay processing, catching up with the audio's playing progress as quickly as possible.
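The gate described in cases a and b can be sketched as a small decision function that runs before the normal SyncT' comparison. A sketch under the same shared-time-base assumption; the action names are hypothetical labels, not terms from the text.

```python
def playback_action(v2, v0, a2, a0, allow_x):
    """Decide how to handle the current frame of the non-same-path
    video stream before the normal delay calculation:
      'pause'    - video far ahead of audio (black-screen start-up phase)
      'play_now' - video far behind audio; decode and play immediately
      'sync'     - within the allowable range; do the SyncT' comparison
    """
    drift = (v2 - a2) - (v0 - a0)
    if abs(drift) <= allow_x:
        return "sync"
    return "pause" if drift > 0 else "play_now"
```

In the start-up scenario, repeated "pause" results keep Video1's picture black until its timestamps drift back within X of Audio3, after which normal audio-referenced synchronization takes over.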
With this preferred processing, when the second device's audio/video recording starts earlier than the first device's, the first device's picture is displayed as a black screen during the initial playback stage, consistent with the actual scene, and its picture only starts playing once the timestamps of Video1 and Audio3 are close enough to fall within the synchronization range.
Further, V2'-A2-(V0'-A0) is used in the above comparisons because the stored video stream and the mixed audio stream may have different reference times for their timestamp information. Generally, when the received audio/video stream is stored, the timestamp information of the video stream and the audio stream is adjusted to the same reference time; in that case V2'-A2 may be used in place of V2'-A2-(V0'-A0) in all the operations and comparisons above.
2. Delay synchronization time calculation for the same-path video Video2 synchronized to the audio Audio3
Specifically, V2-A2-(V0-A0) may be compared with SyncT. If V2-A2-(V0-A0) < -SyncT, the local video playing lags the audio playing and the delay should be reduced, so the first delay time of the current frame of the local video stream Video2 relative to the previous frame is determined to be V2-V1-λ1 (i.e. a preset step λ1 ahead of the normal video frame interval V2-V1). If -SyncT ≤ V2-A2-(V0-A0) ≤ SyncT, the current delay is within the normal synchronous-playing range, and the current frame of Video2 is played normally according to its timestamp information, i.e. the first delay time of the current frame relative to the previous frame is the normal video frame interval V2-V1. If V2-A2-(V0-A0) > SyncT, the local video playing is ahead of the audio playing and an additional delay should be applied, so the first delay time of the current frame relative to the previous frame is determined to be V2-V1+λ2 (i.e. a preset step λ2 added to the normal video frame interval V2-V1). After the first delay time is determined in this way, the time consumed by pre-playback processing is subtracted from it to give the delay synchronization time, and the current frame of the local video stream is played accordingly.
λ1 and λ2 may be empirical values set according to the current device performance and operating environment. λ1 may be set directly to V2-V1, so that when V2-A2-(V0-A0) < -SyncT the first delay time is simply 0, the current frame is not delayed relative to the previous frame, and it is decoded and played immediately; in that case the delay synchronization time need not be calculated, and the audio's playing progress can be caught up more quickly. In this embodiment, λ1, λ2, λ1' and λ2' may be equal or different.
The above processing ensures that the video and the audio remain synchronized. Since the same-path video is the locally recorded video, the situation where the first device's video begins recording earlier than the local video cannot arise, so under normal conditions |V2-A2-(V0-A0)| stays within the reasonable synchronization range, and the delay-synchronization-time judgment above does not compare |V2-A2-(V0-A0)| with the allowable synchronization threshold X. However, |V2-A2-(V0-A0)| > X may still occur due to a file abnormality, so this embodiment preferably adds the following processing before comparing V2-A2-(V0-A0) with SyncT:
|V2-A2-(V0-A0)| is compared with the allowable synchronization threshold X. If |V2-A2-(V0-A0)| > X, the file is abnormal; the local video stream Video2 is then played without reference to the mixed audio stream Audio3, playing on its own according to its video frame interval. If |V2-A2-(V0-A0)| ≤ X, the comparison of V2-A2-(V0-A0) with SyncT and the subsequent processing continue, i.e. the local video stream Video2 is played synchronously with reference to the mixed audio stream Audio3.
In this embodiment, since the device performing the recording process is one of the parties to the audio/video communication, the synchronous playback in this step distinguishes the same-path video processing from the non-same-path video processing. However, if the device performing the recording process is a third-party device different from the first device and the second device, such as a web server, the video streams of both devices are synchronized to the mixed audio according to the non-same-path processing described above.
This concludes the method flow shown in fig. 3. With this recording and playback method, the two audio channels are mixed before storage, reducing the mixing-and-playing load during synchronized playback of the recording. The two video channels are stored and played separately, which preserves the image quality and detail of the original videos as far as possible, avoids the time and memory cost of fusing the two videos into a picture-in-picture and re-encoding, and places low demands on SOC performance; at the same time, because the two video pictures are played separately, their positions can be dragged and adjusted freely by the user according to personal habit. The video streams and the audio stream are stored locally on the playback device, so the audio and video need not be fetched over the network, avoiding the poor user experience caused by variable network quality. Furthermore, during playback the video streams are synchronized to the playing progress of the audio stream, so the original audio scene is preserved; even if a video picture jumps, the jump is hard to perceive, greatly improving the user experience.
In addition, this embodiment describes the recording and playback processing for two-way audio/video communication, but the above processing method applies equally to N-way (N > 2) audio/video communication. Specifically, if the recording and playback device is one device A among the N communicating devices, the collection, storage and synchronous playing of device A's local audio stream and video stream are handled as for the local streams Audio2 and Video2 in this embodiment, while the extraction, storage and synchronous playing of the audio streams and video streams of the other N-1 devices are handled as for the far-end streams Audio1 and Video1 in this embodiment. If the recording and playback device is not one of the N devices but a further device B different from them, such as a web server, then the audio stream and video stream of each of the N devices are handled as for the far-end streams Audio1 and Video1 described above.
For N-way audio/video communication scenarios, this recording and playback method likewise reduces the mixing-and-playing load during synchronized playback, preserves the image quality and detail of the original videos as far as possible, avoids the time and memory cost of fusing multiple videos into a picture-in-picture and re-transcoding, places low demands on SOC performance, lets the multiple video pictures be played separately with their positions freely draggable by the user according to personal habit, avoids the poor user experience caused by variable network quality, and preserves the original audio scene so that even a jumping video picture is hard to perceive, greatly improving the user experience.
An example of two-way synchronized audio/video recording and playback is given below, taking two-way video intercom as an example. Fig. 4 is a synchronized recording block diagram of the two-way video intercom; device 1 and device 2 carry out real-time two-way visual intercom. Video1 and Audio1 of device 1 are transmitted to device 2 over a network protocol; after device 2 obtains Audio1 by RTP unpacking and decoding, it mixes Audio1 with the locally collected Audio2 into Audio3 using a mixing algorithm. Audio3 serves as the audio played during the two-way real-time visual intercom and is stored in both the video1 file and the video2 file, with its timestamps retained. Video1 is RTP-unpacked and then stored in the video1 file in PS packages; the locally collected Video2 is encoded and stored in the video2 file in PS packages; Video1 and Video2 each retain their own timestamps.
Fig. 5 is a playback block diagram of the two-way video intercom. Device 2 starts unpacking, decoding and playing the two audio/video channels. The Audio3 in the video2 file is PS-unpacked, decoded and played directly, while Video2 calculates its delay synchronization time from the difference of its start timestamp from Audio3's, the decode-and-display time cost, the video frame interval, and so on, and plays accordingly. Similarly, Video1 from the video1 file calculates its delay synchronization time against the Audio3 of the video2 file in the same way and plays accordingly; the Audio3 in the video1 file is identical to that in the video2 file and may simply be discarded. When the pictures are displayed, device 2's local video is shown full screen and device 1's video is shown as a small picture-in-picture.
The above is a specific implementation of the multi-channel audio/video recording method and playback method. The application further provides a recording apparatus and a playback apparatus for multi-channel audio and video, which can implement the recording method and the playback method respectively.
Fig. 6 is a schematic diagram of a recording apparatus for multi-channel audio and video according to the present application. As shown in fig. 6, the apparatus includes: a receiving unit, a mixing unit and a storage unit.
The receiving unit is configured to acquire the audio streams and video streams of the plurality of devices engaged in multi-channel audio/video communication, store the video streams of the plurality of devices separately in the storage unit, and retain their respective timestamp information. The mixing unit is configured to mix the audio streams of the devices to obtain a mixed audio stream, store it in the storage unit, and retain its timestamp information. The storage unit is configured to store the mixed audio stream and its timestamp information, and the video streams of the plurality of devices with their respective timestamp information.
Optionally, when the recording apparatus is located in a designated device among the plurality of devices, the receiving unit may include a receiving subunit, an audio processing subunit and a video processing subunit.
The receiving subunit is configured to receive the audio/video streams sent by the devices other than the designated device, and to extract each corresponding device's video stream and audio stream after unpacking its audio/video stream. The audio processing subunit is configured to collect and generate the local audio stream and send it to the mixing unit, and to decode the other devices' audio streams extracted by the receiving subunit and send them to the mixing unit. The video processing subunit is configured to collect and generate the local video stream, encode and package it, store it in the storage unit, and retain the timestamp information; and also to package the other devices' video streams extracted by the receiving subunit, store them in the storage unit, and retain the timestamp information. When storing the mixed audio stream, the mixing unit encodes and packages it before storage.
Optionally, the mixed audio stream and the video stream of each device may be packaged together and stored in a video file corresponding to that device.
Optionally, the multi-channel audio/video communication may be two-channel audio/video communication, where the devices are a first device and a second device, and the apparatus is located in the second device; the receiving unit may include a receiving subunit, an audio processing subunit and a video processing subunit; and the apparatus may further include a packing unit.
The receiving subunit is configured to receive the audio/video stream sent by the first device, extract the first device's audio stream and video stream after unpacking it, send the first device's audio stream to the audio processing subunit, and send the first device's video stream to the packing unit.
The audio processing subunit is configured to collect and generate the local audio stream and send it to the mixing unit, and to decode the first device's audio stream extracted by the receiving subunit and send it to the mixing unit.
The video processing subunit is configured to collect and generate the local video stream and send it to the packing unit.
In the mixing unit, mixing the audio streams of the devices to obtain the mixed audio stream may include:
mixing the decoded first device's audio stream sent by the audio processing subunit with the local audio stream sent by the audio processing subunit to obtain the mixed audio stream, and sending the mixed audio stream to the packing unit.
The packing unit is configured to encode the mixed audio stream from the mixing unit; to package the encoded mixed audio stream together with the first device's video stream sent by the receiving subunit and store them in a first video file, corresponding to the first device, in the storage unit; and to package the local video stream sent by the video processing subunit together with the encoded mixed audio stream and store them in a second video file, corresponding to the second device, in the storage unit.
Fig. 7 is a schematic diagram of a playback apparatus for multi-channel audio and video in the present application. As shown in fig. 7, the apparatus includes: a video stream processing unit, an audio stream processing unit, and a playback unit.
The video stream processing unit is configured to acquire the separately stored video streams of the plurality of devices engaged in multi-channel audio/video communication, together with their respective timestamp information. The audio stream processing unit is configured to acquire the stored mixed audio stream and its timestamp information, the mixed audio stream being obtained by mixing the audio streams of the plurality of devices. The playback unit is configured to play each video stream and the mixed audio stream synchronously according to the timestamp information of each video stream and of the mixed audio stream.
Optionally, when the playback unit plays each video stream and the mixed audio stream synchronously, the playing progress of the mixed audio stream is used as the reference and each video stream is played synchronously against it.
Optionally, in the playback unit, the method for synchronously playing any one of the video streams with reference to the playing progress of the audio mixing stream includes:
comparing V2-A2-(V0-A0) with SyncT: if V2-A2-(V0-A0) < -SyncT, determining that the first delay time of the current frame of the video stream relative to the previous frame is V2-V1-λ1; if -SyncT ≤ V2-A2-(V0-A0) ≤ SyncT, determining that the first delay time of the current frame relative to the previous frame is V2-V1; if V2-A2-(V0-A0) > SyncT, determining that the first delay time of the current frame relative to the previous frame is V2-V1+λ2;

subtracting the time consumed by pre-playback processing from the first delay time to obtain the delay synchronization time, and playing the current frame of the video stream according to the delay synchronization time;

where V2 and A2 are the timestamps of the current frames of the video stream and the mixed audio stream respectively, V0 and A0 are the start timestamps of the video stream and the mixed audio stream respectively, V1 is the timestamp of the previous frame of the video stream, SyncT is the maximum of a preset audio/video synchronization threshold and V2-V1, and λ1 and λ2 are a preset first step and a preset second step respectively.
Optionally, before comparing V2-A2- (V0-A0) with SyncT, the method may further comprise:
judging whether |V2-A2-(V0-A0)| is less than or equal to a preset allowable synchronization threshold, and if so, performing the operation of comparing V2-A2-(V0-A0) with SyncT;

otherwise: when V2-A2-(V0-A0) > 0, suspending the playing processing of the video stream for a set time and, when the suspension time expires, again judging whether |V2-A2-(V0-A0)| is less than or equal to the preset allowable synchronization threshold; and when V2-A2-(V0-A0) < 0, playing the current frame of the video stream directly without delay processing.
Optionally, when the playback apparatus is located in a specified device among the plurality of devices, the playback unit may, for video stream playback of the specified device and before determining whether V2-A2-(V0-A0) is less than or equal to SyncT, further: judge whether V2-A2-(V0-A0) is smaller than a preset allowable synchronization threshold, and if so, normally play the current frame of the video stream of the specified device according to the timestamp information of the current frame; otherwise, perform the operation of comparing V2-A2-(V0-A0) with SyncT.
Optionally, the multi-channel audio-video communication is two-channel audio-video communication, and the plurality of devices are two devices.
The present application also provides a computer readable storage medium storing instructions that, when executed by a processor, perform the steps of the recording method and playback method of multi-channel audio and video described above. In practice, the computer readable medium may be included in the apparatus/device/system of the above embodiments, or may exist separately without being incorporated into that apparatus/device/system.
According to an embodiment of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing, without limiting the scope of the application. In the disclosed embodiments, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Fig. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. Specifically:
the electronic device can include a processor 801 with one or more processing cores, a memory 802 comprising one or more computer-readable storage media, and a computer program stored on the memory and executable on the processor. When the program stored in the memory 802 is executed, the recording method and playback method of multi-channel audio and video can be implemented.
Specifically, in practical application, the electronic device may further include a power supply 803, an input/output unit 804, and other components. It will be appreciated by those skilled in the art that the structure of the electronic device shown in fig. 8 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Wherein:
the processor 801 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of a server and processes data by running or executing software programs and/or modules stored in the memory 802, and calling data stored in the memory 802, thereby performing overall monitoring of the electronic device.
Memory 802 may be used to store software programs and modules, i.e., the computer-readable storage media described above. The processor 801 executes various functional applications and data processing by running the software programs and modules stored in the memory 802. The memory 802 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, application programs required for at least one function, and the like; the storage data area may store data created according to the use of the server, etc. In addition, memory 802 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 802 may also include a memory controller to provide the processor 801 with access to the memory 802.
The electronic device further comprises a power supply 803 for supplying power to the various components. The power supply 803 may be logically connected to the processor 801 via a power management system, so that charging, discharging, and power-consumption management functions are performed through the power management system. The power supply 803 may also include one or more components such as a direct-current or alternating-current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The electronic device may further comprise an input/output unit 804. The input/output unit 804 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. The input/output unit 804 may also be used to display information entered by or provided to the user, as well as various graphical user interfaces, which may be composed of graphics, text, icons, video, and any combination thereof.
The foregoing description is merely of preferred embodiments of the invention and is not intended to limit it; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (13)

1. A video playback method of multipath audio and video is characterized by comprising the following steps:
acquiring audio streams and video streams of a plurality of devices for multipath audio and video communication;
mixing the audio streams of each device to obtain a mixed audio stream, storing the mixed audio stream, and retaining time stamp information;
respectively storing the video streams of the plurality of devices, and reserving respective time stamp information;
acquiring the respectively stored video streams of the plurality of devices performing multi-channel audio and video communication and their respective time stamp information;
acquiring the stored mixed audio stream and its timestamp information; the mixed audio stream is obtained by mixing the audio streams of the plurality of devices;
synchronously playing each video stream and the audio mixing stream according to the time stamp information of each video stream and the time stamp information of the audio mixing stream;
when each video stream and the audio mixing stream are synchronously played, taking the playing progress of the audio mixing stream as a reference, and synchronously playing any path of video stream by referring to the playing progress of the audio mixing stream, specifically comprising the following steps:
comparing V2-A2-(V0-A0) with SyncT: if V2-A2-(V0-A0) < -SyncT, determining that the first delay time of the current frame of any path of video stream relative to the previous frame is V2-V1-λ1; if -SyncT ≤ V2-A2-(V0-A0) ≤ SyncT, determining that the first delay time of the current frame of any path of video stream relative to the previous frame is V2-V1; if V2-A2-(V0-A0) > SyncT, determining that the first delay time of the current frame of any path of video stream relative to the previous frame is V2-V1+λ2;
subtracting the time consumed by processing before playing from the first delay time to obtain a delay synchronization time, and playing the current frame of any path of video stream according to the delay synchronization time;
wherein V2 and A2 are respectively the time stamps of the current frame of any path of video stream and the current frame of the audio mixing stream, V0 and A0 are respectively the start time stamps of any path of video stream and the audio mixing stream, V1 is the time stamp of the previous frame of any path of video stream, SyncT is the maximum of a preset audio-video synchronization threshold and V2-V1, and λ1 and λ2 are respectively a preset first step length and a preset second step length.
2. The video playback method of claim 1, wherein when the device that holds the mixed audio stream and the video stream is a designated device of the plurality of devices, the acquiring the audio streams and the video streams of the plurality of devices that perform the multiplexed audio-video communication comprises:
the appointed equipment receives audio and video streams of corresponding equipment transmitted by other equipment except the appointed equipment in the plurality of equipment, and extracts video streams and audio streams of the corresponding equipment after unpacking the audio and video streams; the designated device collects and generates local video and audio streams.
3. The video playback method of claim 2, wherein said saving said mixed audio stream comprises: encoding and packaging the mixed audio stream, and then saving it;
the storing the video streams of the plurality of devices respectively includes: packaging and saving the video streams of the other devices, and encoding, packaging, and saving the local video stream.
4. The video playback method of claim 1, wherein the mixed audio stream and the video stream of each device are packaged and stored into a video file corresponding to each device.
5. The video playback method of claim 1, wherein the multi-channel audio-video communication is two-channel audio-video communication, the plurality of devices are a first device and a second device, respectively, and the audio mix stream and the video streams of the plurality of devices are stored in the second device;
the obtaining the audio streams and the video streams of the plurality of devices for multi-path audio-video communication comprises the following steps:
the second device receives an audio-video stream of the first device sent by the first device, unpacks it, and then extracts the audio stream and the video stream of the first device; the second device collects and generates a local audio stream and a local video stream;
the step of mixing the audio streams of each device to obtain a mixed audio stream comprises the following steps:
decoding the audio stream of the first device, and then performing audio mixing processing on the decoded audio stream and the local audio stream to obtain the audio mixing stream;
the storing the mixed audio stream and the video streams of the plurality of devices comprises:
encoding the mixed audio stream;
packaging and saving the encoded mixed audio stream together with the video stream of the first device into a first video file corresponding to the first device; and, after encoding the local video stream, packaging and saving the encoded local video stream together with the encoded mixed audio stream into a second video file corresponding to the second device.
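As an illustration of the two-party flow in this claim, the toy sketch below pairs each device's video with the single shared mixed-audio track. The averaging mixer, the dict-based "files", and all names are simplified stand-ins, not the patent's actual codec or container handling:

```python
def mix(remote_audio, local_audio):
    """Toy mixer: average paired samples (stands in for real decoding,
    mixing, and re-encoding of the two audio streams)."""
    return [(a + b) / 2 for a, b in zip(remote_audio, local_audio)]

def record_two_party(remote_audio, remote_video, local_audio, local_video):
    """Produce the two per-device files: each carries one device's
    video plus the same mixed-audio track."""
    mixed = mix(remote_audio, local_audio)                  # one shared mix
    first_file = {"video": remote_video, "audio": mixed}    # first device's file
    second_file = {"video": local_video, "audio": mixed}    # second device's file
    return first_file, second_file
```

The key design point carried over from the claim is that both files reference the same mixed track, so either file can be played back alone with complete conversation audio.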
6. A playback method of multipath audio and video, characterized by comprising the following steps:
acquiring the respectively stored video streams of the plurality of devices performing multi-channel audio and video communication and their respective time stamp information;
acquiring the stored mixed audio stream and its timestamp information; the mixed audio stream is obtained by mixing the audio streams of the plurality of devices;
synchronously playing each video stream and the audio mixing stream according to the time stamp information of each video stream and the time stamp information of the audio mixing stream;
when each video stream and the audio mixing stream are synchronously played, taking the playing progress of the audio mixing stream as a reference, and synchronously playing any path of video stream by referring to the playing progress of the audio mixing stream, specifically comprising the following steps:
comparing V2-A2-(V0-A0) with SyncT: if V2-A2-(V0-A0) < -SyncT, determining that the first delay time of the current frame of any path of video stream relative to the previous frame is V2-V1-λ1; if -SyncT ≤ V2-A2-(V0-A0) ≤ SyncT, determining that the first delay time of the current frame of any path of video stream relative to the previous frame is V2-V1; if V2-A2-(V0-A0) > SyncT, determining that the first delay time of the current frame of any path of video stream relative to the previous frame is V2-V1+λ2;
subtracting the time consumed by processing before playing from the first delay time to obtain a delay synchronization time, and playing the current frame of any path of video stream according to the delay synchronization time;
wherein V2 and A2 are respectively the time stamps of the current frame of any path of video stream and the current frame of the audio mixing stream, V0 and A0 are respectively the start time stamps of any path of video stream and the audio mixing stream, V1 is the time stamp of the previous frame of any path of video stream, SyncT is the maximum of a preset audio-video synchronization threshold and V2-V1, and λ1 and λ2 are respectively a preset first step length and a preset second step length.
7. The playback method of claim 6, wherein, prior to comparing V2-A2-(V0-A0) with SyncT, the method further comprises:
judging whether |V2-A2-(V0-A0)| is greater than or equal to a preset allowable synchronization threshold, and if so, executing the operation of comparing V2-A2-(V0-A0) with SyncT;
otherwise, when V2-A2-(V0-A0) > 0, suspending the playing of any path of video stream for a set time, and after the suspension time is up, re-executing the operation of judging whether |V2-A2-(V0-A0)| is greater than or equal to the preset allowable synchronization threshold; and when V2-A2-(V0-A0) < 0, directly playing the current frame of any path of video stream without delay processing.
8. The playback method of claim 6, wherein, when the device that plays back the mixed audio stream and the video streams is a specified device among the plurality of devices, for video stream playback of the specified device, before determining whether V2-A2-(V0-A0) is less than or equal to SyncT, the method further comprises:
judging whether V2-A2-(V0-A0) is smaller than a preset allowable synchronization threshold, and if so, normally playing the current frame of the video stream of the specified device according to the timestamp information of the current frame; otherwise, performing the operation of comparing V2-A2-(V0-A0) with SyncT.
9. The playback method as recited in any one of claims 6 to 8, wherein the multi-way audio-video communication is a two-way audio-video communication, and the plurality of devices are two devices.
10. A video recording and playback system of multipath audio and video, characterized in that the system comprises a video recording device and a playback device; wherein the video recording device includes: a receiving unit, an audio mixing unit, and a storage unit;
the receiving unit is used for acquiring audio streams and video streams of a plurality of devices for multipath audio and video communication, respectively storing the video streams of each device into the storage unit, and reserving respective time stamp information;
the audio mixing unit is used for mixing the audio streams of each device to obtain a mixed audio stream, saving the mixed audio stream in the storage unit, and retaining timestamp information;
the storage unit is used for storing the mixed audio stream and the timestamp information thereof, and storing the video stream and the respective timestamp information of each device;
the playback apparatus includes: a video stream processing unit, an audio stream processing unit and a playback unit;
the video stream processing unit is used for acquiring the respectively stored video streams of the plurality of devices performing multi-channel audio and video communication and their respective time stamp information;
the audio stream processing unit is used for acquiring the stored mixed audio stream and its timestamp information; the mixed audio stream is obtained by mixing the audio streams of the plurality of devices;
the playback unit is used for synchronously playing each video stream and the mixed audio stream according to the time stamp information of each video stream and the time stamp information of the mixed audio stream;
when synchronously playing each video stream and the mixed audio stream, the playback unit takes the playing progress of the mixed audio stream as a reference, and any one path of video stream is synchronously played with reference to the playing progress of the mixed audio stream, which specifically comprises the following steps:
comparing V2-A2-(V0-A0) with SyncT: if V2-A2-(V0-A0) < -SyncT, determining that the first delay time of the current frame of any path of video stream relative to the previous frame is V2-V1-λ1; if -SyncT ≤ V2-A2-(V0-A0) ≤ SyncT, determining that the first delay time of the current frame of any path of video stream relative to the previous frame is V2-V1; if V2-A2-(V0-A0) > SyncT, determining that the first delay time of the current frame of any path of video stream relative to the previous frame is V2-V1+λ2;
subtracting the time consumed by processing before playing from the first delay time to obtain a delay synchronization time, and playing the current frame of any path of video stream according to the delay synchronization time;
wherein V2 and A2 are respectively the time stamps of the current frame of any path of video stream and the current frame of the audio mixing stream, V0 and A0 are respectively the start time stamps of any path of video stream and the audio mixing stream, V1 is the time stamp of the previous frame of any path of video stream, SyncT is the maximum of a preset audio-video synchronization threshold and V2-V1, and λ1 and λ2 are respectively a preset first step length and a preset second step length.
11. A playback apparatus for multiplexed audio and video, comprising: a video stream processing unit, an audio stream processing unit and a playback unit;
the video stream processing unit is used for acquiring the respectively stored video streams of the plurality of devices performing multi-channel audio and video communication and their respective time stamp information;
the audio stream processing unit is used for acquiring the stored mixed audio stream and its timestamp information; the mixed audio stream is obtained by mixing the audio streams of the plurality of devices;
the playback unit is used for synchronously playing each video stream and the mixed audio stream according to the time stamp information of each video stream and the time stamp information of the mixed audio stream;
when synchronously playing each video stream and the mixed audio stream, the playback unit takes the playing progress of the mixed audio stream as a reference, and any one path of video stream is synchronously played with reference to the playing progress of the mixed audio stream, which specifically comprises the following steps:
comparing V2-A2-(V0-A0) with SyncT: if V2-A2-(V0-A0) < -SyncT, determining that the first delay time of the current frame of any path of video stream relative to the previous frame is V2-V1-λ1; if -SyncT ≤ V2-A2-(V0-A0) ≤ SyncT, determining that the first delay time of the current frame of any path of video stream relative to the previous frame is V2-V1; if V2-A2-(V0-A0) > SyncT, determining that the first delay time of the current frame of any path of video stream relative to the previous frame is V2-V1+λ2;
subtracting the time consumed by processing before playing from the first delay time to obtain a delay synchronization time, and playing the current frame of any path of video stream according to the delay synchronization time;
wherein V2 and A2 are respectively the time stamps of the current frame of any path of video stream and the current frame of the audio mixing stream, V0 and A0 are respectively the start time stamps of any path of video stream and the audio mixing stream, V1 is the time stamp of the previous frame of any path of video stream, SyncT is the maximum of a preset audio-video synchronization threshold and V2-V1, and λ1 and λ2 are respectively a preset first step length and a preset second step length.
12. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the method of video playback of multiple audio and video according to any one of claims 1 to 9.
13. An electronic device comprising at least a computer-readable storage medium and a processor;
the processor is configured to read executable instructions from the computer readable storage medium and execute the instructions to implement the method for video playback of multiple audio and video according to any one of claims 1 to 9.
CN202111572204.9A 2021-12-21 2021-12-21 Video playback method and device for multipath audio and video, storage medium and electronic equipment Active CN114257771B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111572204.9A CN114257771B (en) 2021-12-21 2021-12-21 Video playback method and device for multipath audio and video, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN114257771A CN114257771A (en) 2022-03-29
CN114257771B 2023-12-01

Family

ID=80796327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111572204.9A Active CN114257771B (en) 2021-12-21 2021-12-21 Video playback method and device for multipath audio and video, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114257771B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115643442A (en) * 2022-10-25 2023-01-24 广州市保伦电子有限公司 Audio and video converging recording and playing method, device, equipment and storage medium

Citations (9)

Publication number Priority date Publication date Assignee Title
CN1205599A (en) * 1997-05-15 1999-01-20 松下电器产业株式会社 Compressed code decoding device and audio decoding device
EP2448265A1 (en) * 2010-10-26 2012-05-02 Google, Inc. Lip synchronization in a video conference
CN104601863A (en) * 2013-09-12 2015-05-06 深圳锐取信息技术股份有限公司 IP matrix system for recording and playing
CN108965971A (en) * 2018-07-27 2018-12-07 北京数码视讯科技股份有限公司 MCVF multichannel voice frequency synchronisation control means, control device and electronic equipment
CN109714634A (en) * 2018-12-29 2019-05-03 青岛海信电器股份有限公司 A kind of decoding synchronous method, device and the equipment of live data streams
CN112235597A (en) * 2020-09-17 2021-01-15 深圳市捷视飞通科技股份有限公司 Method and device for synchronous protection of streaming media live broadcast audio and video and computer equipment
CN112702559A (en) * 2021-03-23 2021-04-23 浙江华创视讯科技有限公司 Recorded broadcast abnormity feedback method, system, equipment and readable storage medium
CN112738451A (en) * 2021-04-06 2021-04-30 浙江华创视讯科技有限公司 Video conference recording and playing method, device, equipment and readable storage medium
CN113205822A (en) * 2021-04-02 2021-08-03 苏州开心盒子软件有限公司 Multi-channel audio data recording and sound mixing method and device and storage medium

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US20110187927A1 (en) * 2007-12-19 2011-08-04 Colin Simon Device and method for synchronisation of digital video and audio streams to media presentation devices
WO2013082965A1 (en) * 2011-12-05 2013-06-13 优视科技有限公司 Streaming media data processing method and apparatus and streaming media data reproducing device
US10034036B2 (en) * 2015-10-09 2018-07-24 Microsoft Technology Licensing, Llc Media synchronization for real-time streaming
US9979997B2 (en) * 2015-10-14 2018-05-22 International Business Machines Corporation Synchronization of live audio and video data streams
CN105933800A (en) * 2016-04-29 2016-09-07 联发科技(新加坡)私人有限公司 Video play method and control terminal
CN111510755A (en) * 2019-01-30 2020-08-07 上海哔哩哔哩科技有限公司 Audio and video switching method and device, computer equipment and readable storage medium

Also Published As

Publication number Publication date
CN114257771A (en) 2022-03-29

Similar Documents

Publication Publication Date Title
CN109168078B (en) Video definition switching method and device
CN110868600B (en) Target tracking video plug-flow method, display method, device and storage medium
WO2015031548A1 (en) Audio video playback synchronization for encoded media
CN105393547B (en) Sending method, method of reseptance, sending device and reception device
WO2020215453A1 (en) Video recording method and system
JP2008500752A (en) Adaptive decoding of video data
US11388472B2 (en) Temporal placement of a rebuffering event
WO2020215454A1 (en) Screen recording method, client, and terminal device
CN114257771B (en) Video playback method and device for multipath audio and video, storage medium and electronic equipment
WO2016008131A1 (en) Techniques for separately playing audio and video data in local networks
Tang et al. Audio and video mixing method to enhance WebRTC
CN105992049A (en) RTMP live broadcast playback method and system
CN110351576B (en) Method and system for rapidly displaying real-time video stream in industrial scene
JP2011244328A (en) Video reproduction apparatus and video reproduction apparatus control method
JP5954489B2 (en) Movie data editing device, movie data editing method, playback device, playback method, and program
CN111836071B (en) Multimedia processing method and device based on cloud conference and storage medium
CN111064698B (en) Method and device for playing multimedia stream data
CN113409801A (en) Noise processing method, system, medium, and apparatus for real-time audio stream playback
JP5205900B2 (en) Video conference system, server terminal, and client terminal
US9253441B2 (en) Conference system, program and conference method
KR20200005968A (en) Apparatus and method for generating contents
JP2012054693A (en) Video audio transmission/reception device, video audio transmission/reception system, computer program, and recording medium
CN115695918A (en) Multi-camera broadcast guide control method and device, readable storage medium and terminal equipment
CN117041613A (en) Live video live broadcast pause method based on RTMP, live video live broadcast pause device based on RTMP and live broadcast system
KR20120088148A (en) Method and apparatus for media trick playing in universal plug and play

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant