CN114257771A - Video playback method and device for multi-channel audio and video, storage medium and electronic equipment - Google Patents


Info

Publication number: CN114257771A (application CN202111572204.9A; granted as CN114257771B)
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: video, audio, stream, streams, mixed
Inventor: 金宏宇 (Jin Hongyu)
Assignee (current and original): Hangzhou Hikvision Digital Technology Co Ltd
Legal status: Active, granted (the status is an assumption, not a legal conclusion; Google has not performed a legal analysis)

Classifications

    • H04N 5/76 — Television signal recording
    • H04N 21/43072 — Synchronising the rendering of multiple content streams on the same device
    • H04N 21/4334 — Recording operations
    • H04N 21/4398 — Processing of audio elementary streams involving reformatting operations of audio signals
    • H04N 21/440218 — Reformatting of video elementary streams by transcoding between formats or standards
    • H04N 21/6437 — Real-time Transport Protocol [RTP]
    • H04N 21/8547 — Content authoring involving timestamps for synchronizing content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application discloses a method for recording multi-channel audio and video, comprising the following steps: acquiring the audio streams and video streams of multiple devices engaged in multi-channel audio/video communication; mixing the audio streams of the devices to obtain a mixed audio stream, storing the mixed audio stream, and retaining its timestamp information; and storing the video streams of the multiple devices separately, retaining their respective timestamp information. The application also provides a multi-channel audio/video playback method, comprising the following steps: acquiring the separately stored video streams of the multiple devices engaged in multi-channel audio/video communication and their respective timestamp information; acquiring the stored mixed audio stream and its timestamp information; and playing each video stream and the mixed audio stream synchronously according to the timestamp information of each video stream and of the mixed audio stream. With this method and apparatus, the processing performed by the recording and playback equipment can be simplified.

Description

Video playback method and device for multi-channel audio and video, storage medium and electronic equipment
Technical Field
The present application relates to video recording and playback technologies, and in particular to a method and apparatus for recording and playing back multi-channel audio and video, a storage medium, and an electronic device.
Background
Existing two-way audio/video recording and playback adopts one of the following schemes:
1. The video files of the two audio/video channels are stored on network video servers, and during playback the data is transmitted from several network video servers to the playback device. In this approach, however, network transmission makes the played-back audio and video data unstable; in particular, each audio channel may jump during synchronization, and such abrupt changes in sound are easily perceived as playback errors, resulting in a poor user experience.
2. The two independent video streams are decoded separately, all decoded YUV images are arranged into a single composite YUV image at a certain scale, and the composite image is then re-encoded and compressed into one video stream — a process called transcoding; the transcoded stream is stored and decoded for playback. Because the stored stream is the result of merging two pictures into one, the newly generated video stream cannot restore the original image quality and detail of each picture.
Disclosure of Invention
The application provides a method and apparatus for recording and playing back multi-channel audio and video, a storage medium, and an electronic device, which can simplify the processing performed by the recording and playback equipment.
To this end, the following technical scheme is adopted in the application:
a method for recording multi-channel audio and video comprises the following steps:
acquiring audio streams and video streams of a plurality of devices for multi-path audio and video communication;
mixing audio streams of each device to obtain mixed audio streams, storing the mixed audio streams, and keeping timestamp information;
and respectively storing the video streams of the plurality of devices, and reserving respective time stamp information.
Preferably, when the device storing the audio streams and video streams is a designated device among the multiple devices, acquiring the audio streams and video streams of the multiple devices engaged in multi-channel audio/video communication comprises:
the designated device receives the audio/video streams sent by the other devices among the multiple devices, unpacks them, and extracts the video stream and audio stream of each corresponding device; the designated device itself captures and generates a local video stream and audio stream.
Preferably, storing the mixed audio stream comprises: encoding and packaging the mixed audio stream before storing it;
and storing the video streams of the multiple devices separately comprises: packaging and storing the video streams of the other devices as received, and encoding, packaging, and storing the local video stream.
Preferably, the mixed audio stream and the video stream of each device are packaged and stored together in a video file corresponding to that device.
Preferably, the multi-channel audio/video communication is two-way audio/video communication between a first device and a second device, and the mixed audio stream and the video streams of the devices are stored on the second device;
acquiring the audio streams and video streams of the multiple devices engaged in multi-channel audio/video communication comprises:
the second device receives the audio/video stream of the first device, unpacks it, and extracts the audio stream and video stream of the first device; the second device captures and generates a local audio stream and a local video stream;
mixing the audio streams of the devices to obtain the mixed audio stream comprises:
decoding the audio stream of the first device and mixing it with the local audio stream to obtain the mixed audio stream;
storing the mixed audio stream and the video streams of the multiple devices comprises:
encoding the mixed audio stream;
and packaging and storing the encoded mixed audio stream together with the video stream of the first device into a first video file corresponding to the first device, and packaging and storing the encoded local video stream together with the encoded mixed audio stream into a second video file corresponding to the second device.
A method for playing back multi-channel audio and video comprises the following steps:
acquiring the separately stored video streams of multiple devices engaged in multi-channel audio/video communication and their respective timestamp information;
acquiring the stored mixed audio stream and its timestamp information, the mixed audio stream being an audio stream obtained by mixing the audio streams of the multiple devices;
and playing each video stream and the mixed audio stream synchronously according to the timestamp information of each video stream and of the mixed audio stream.
Preferably, when the video streams and the mixed audio stream are played synchronously, each video stream is played synchronously with the playing progress of the mixed audio stream as the reference.
Preferably, playing any one video stream synchronously with reference to the playing progress of the mixed audio stream comprises:
comparing V2 − A2 − (V0 − A0) with SyncT: if V2 − A2 − (V0 − A0) < −SyncT, the first delay time of the current frame of the video stream relative to the previous frame is determined to be V2 − V1 − λ1; if −SyncT ≤ V2 − A2 − (V0 − A0) ≤ SyncT, the first delay time is determined to be V2 − V1; and if V2 − A2 − (V0 − A0) > SyncT, the first delay time is determined to be V2 − V1 + λ2;
subtracting the time already consumed before playing from the first delay time to obtain the delay synchronization time, and playing the current frame of the video stream according to the delay synchronization time;
where V2 and A2 are the timestamps of the current frames of the video stream and the mixed audio stream, V0 and A0 are the starting timestamps of the video stream and the mixed audio stream, V1 is the timestamp of the previous frame of the video stream, SyncT is the maximum of a preset audio/video synchronization threshold and V2 − V1, and λ1 and λ2 are preset first and second step sizes, respectively.
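To make the comparison concrete, the delay rule above can be sketched in Python; the function and parameter names are illustrative, not part of the patent:

```python
def first_delay(v2, v1, v0, a2, a0, sync_threshold, lam1, lam2):
    """First delay time of the current video frame relative to the previous one.

    v2/a2: timestamps of the current video / mixed-audio frames,
    v0/a0: starting timestamps of the video / mixed-audio streams,
    v1: timestamp of the previous video frame,
    lam1/lam2: the preset first and second step sizes (lambda1, lambda2).
    """
    drift = (v2 - a2) - (v0 - a0)          # video progress relative to the mixed audio
    sync_t = max(sync_threshold, v2 - v1)  # SyncT: max of preset threshold and V2 - V1
    if drift < -sync_t:
        return v2 - v1 - lam1              # video lags audio: shorten the wait
    if drift > sync_t:
        return v2 - v1 + lam2              # video leads audio: lengthen the wait
    return v2 - v1                         # within tolerance: nominal frame spacing
```

The actual wait applied before rendering is this value minus the time already consumed preparing the frame, as described above.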
Preferably, before comparing V2 − A2 − (V0 − A0) with SyncT, the method further comprises:
judging whether |V2 − A2 − (V0 − A0)| is greater than or equal to a preset allowed synchronization threshold; if so, performing the comparison of V2 − A2 − (V0 − A0) with SyncT;
otherwise, when V2 − A2 − (V0 − A0) > 0, pausing the playing of the video stream for a set time and, after the pause expires, re-judging whether |V2 − A2 − (V0 − A0)| is greater than or equal to the preset allowed synchronization threshold; and when V2 − A2 − (V0 − A0) < 0, playing the current frame of the video stream directly without delay processing.
Preferably, when the device playing back the mixed audio stream and the video streams is a designated device among the multiple devices, before comparing V2 − A2 − (V0 − A0) with SyncT for the video stream of the designated device, the method further comprises:
judging whether V2 − A2 − (V0 − A0) is smaller than the preset allowed synchronization threshold; if so, playing the current frame of the designated device's video stream normally according to its timestamp information; otherwise, performing the comparison of V2 − A2 − (V0 − A0) with SyncT.
Preferably, the multi-channel audio/video communication is two-way audio/video communication between two devices.
A multi-channel audio/video recording apparatus, comprising a receiving unit, an audio mixing unit, and a storage unit;
the receiving unit is configured to acquire the audio streams and video streams of multiple devices engaged in multi-channel audio/video communication, and to store the video stream of each device separately in the storage unit, retaining its timestamp information;
the audio mixing unit is configured to mix the audio streams of the devices into a mixed audio stream and store it in the storage unit, retaining its timestamp information;
the storage unit is configured to store the mixed audio stream and its timestamp information, as well as the video stream of each device and its respective timestamp information.
A multi-channel audio/video playback apparatus, comprising a video stream processing unit, an audio stream processing unit, and a playback unit;
the video stream processing unit is configured to acquire the separately stored video streams of multiple devices engaged in multi-channel audio/video communication and their respective timestamp information;
the audio stream processing unit is configured to acquire the stored mixed audio stream and its timestamp information, the mixed audio stream being an audio stream obtained by mixing the audio streams of the multiple devices;
the playback unit is configured to play each video stream and the mixed audio stream synchronously according to the timestamp information of each video stream and of the mixed audio stream.
A computer-readable storage medium having computer instructions stored thereon which, when executed by a processor, implement the above method for recording or playing back multi-channel audio and video.
An electronic device comprising at least a computer-readable storage medium and a processor;
the processor is configured to read executable instructions from the computer-readable storage medium and execute them to implement the above method for recording or playing back multi-channel audio and video.
According to the technical scheme above, when recording a multi-channel audio/video communication session, the audio streams and video streams of the multiple devices engaged in the communication are acquired; the audio streams are mixed into a mixed audio stream, which is stored with its timestamp information; and the video streams of the devices are stored separately, each with its own timestamp information. Correspondingly, during playback, the separately stored video streams and their respective timestamp information are acquired along with the stored mixed audio stream and its timestamp information, and each video stream and the mixed audio stream are played synchronously according to that timestamp information. Because the audio streams of the multiple devices are mixed before storage, no mixing is needed during playback, which reduces the load of audio playing and simplifies the processing of the playback device. At the same time, the multiple video streams are not transcoded and merged into a single stream, which avoids the time and memory cost of fusing and rendering multiple videos into a picture-in-picture image and transcoding it, lowers the performance requirements on the recording device's system-on-chip (SoC), and thus simplifies the processing of the recording device.
Drawings
Fig. 1 is a schematic diagram of the basic flow of the multi-channel audio/video recording method in the present application;
Fig. 2 is a schematic diagram of the basic flow of the multi-channel audio/video playback method in the present application;
Fig. 3 is a schematic diagram of the basic flow of the two-way audio/video recording and playback method in the embodiment of the present application;
Fig. 4 is a synchronized recording block diagram of a two-way video intercom;
Fig. 5 is a video playback block diagram of a two-way video intercom;
Fig. 6 is a schematic diagram of the multi-channel audio/video recording apparatus in the present application;
Fig. 7 is a schematic diagram of the multi-channel audio/video playback apparatus in the present application;
Fig. 8 is a schematic structural diagram of the electronic device provided in the present application.
Detailed Description
For the purpose of making the objects, technical means and advantages of the present application more apparent, the present application will be described in further detail with reference to the accompanying drawings.
The technical scheme of the present application can be widely applied to scenarios of two-way or multi-way audio/video communication, such as two-way or multi-way video intercom. In a multi-channel audio/video communication scenario involving several audio/video communication devices (e.g., video intercoms), each device simultaneously previews the picture of its local camera and plays the remote audio and video of the other parties, maintaining a picture-in-picture layout of all parties in which the size of each video picture can be adjusted at any time according to the user's habits. Each channel of audio and video can be recorded and stored during the call and played back synchronously when the recording needs to be reviewed; the processing of the present application solves the problem of how to record, and synchronously play back, two or more channels of audio and video.
Fig. 1 is a schematic diagram of the basic flow of the multi-channel audio/video recording method in the present application. As shown in Fig. 1, the method includes:
step 101, acquiring the audio streams and video streams of multiple devices engaged in multi-channel audio/video communication;
step 102, mixing the audio streams of the devices to obtain a mixed audio stream, storing it, and retaining its timestamp information;
and step 103, storing the video streams of the multiple devices separately, retaining their respective timestamp information.
This concludes the basic flow shown in Fig. 1. In this method, the audio streams of the multiple devices in the multi-channel audio/video communication are mixed before being stored, which facilitates subsequent playback processing; at the same time, the video streams are stored separately so that the image quality and detail of the original video are preserved as far as possible during playback.
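As an illustration only — the class and storage layout below are assumptions, not the patent's implementation — steps 101 to 103 can be sketched as:

```python
from dataclasses import dataclass, field

@dataclass
class Recorder:
    mixed_audio: list = field(default_factory=list)    # [(timestamp, mixed samples)]
    video_streams: dict = field(default_factory=dict)  # device id -> [(timestamp, frame)]

    def on_audio(self, ts, per_device_samples):
        # Step 102: mix the devices' audio into one stream, keeping the timestamp.
        mixed = [sum(samples) for samples in zip(*per_device_samples)]
        self.mixed_audio.append((ts, mixed))

    def on_video(self, device_id, ts, frame):
        # Step 103: store each device's video stream separately with its timestamp.
        self.video_streams.setdefault(device_id, []).append((ts, frame))
```

Note that only one mixed audio list is kept, while video frames are filed per device, mirroring the single-mixed-stream / separate-video-streams split described above.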
Fig. 2 is a schematic diagram of the basic flow of the multi-channel audio/video playback method in the present application. As shown in Fig. 2, the method includes:
step 201, acquiring the separately stored video streams of the multiple devices engaged in multi-channel audio/video communication and their respective timestamp information;
step 202, acquiring the stored mixed audio stream and its timestamp information, the mixed audio stream being an audio stream obtained by mixing the audio streams of the multiple devices;
and step 203, playing each video stream and the mixed audio stream synchronously according to the timestamp information of the video streams and of the mixed audio stream.
This concludes the basic flow shown in Fig. 2. In this method, the mixed audio stream stored after mixing the multiple audio streams is played directly during playback, synchronized with each video stream. No additional mixing is therefore required for audio playing, which simplifies the processing of the playback device.
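The playback idea — the mixed audio stream acting as the master clock against which every stored video stream is scheduled — can be sketched as follows; the function is illustrative and assumes all timestamps share one timebase:

```python
def due_frames(video_frames, audio_position):
    """Return the stored video frames whose timestamps are due at the current
    playing position of the mixed audio stream (same timebase assumed).

    video_frames: list of (timestamp, frame) pairs for one video stream.
    """
    return [(ts, frame) for ts, frame in video_frames if ts <= audio_position]
```

Each stored video stream is queried independently against the same audio position, which is how several streams stay synchronized without ever being merged.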
In the recording and playback methods of the present application, recording and playback may be implemented on a single physical device or on different devices; likewise, the device performing recording or playback may be one of the devices engaged in the multi-channel audio/video communication, or a third-party device distinct from them. For example, to reduce the storage occupied on the terminals, the recording processing may be performed on a network server that stores the corresponding media files, with playback then performed on the communication devices or on another third-party device; alternatively, to make viewing convenient and avoid the impact of a second network transmission, recording and playback may be performed on one or more of the communicating devices themselves. For convenience of description, the device performing recording or playback is referred to here as the designated device; this does not mean a device designated by a user or a system, but is merely a name indicating the device that performs the storing or playback processing. The designated device may, for example, be a randomly chosen device or a preset device.
The following embodiment takes the case where the recording and playback methods are performed on one of two devices engaged in two-way audio/video communication, and illustrates specific implementations of the recording method and playback method of the present application. Since the recording and playback processes correspond to each other, the embodiment is described as one complete flow covering both.
Fig. 3 is a schematic diagram of the basic flow of the two-way audio/video recording and playback method in the embodiment of the present application. The two devices engaged in two-way audio/video communication are referred to as the first device and the second device; the second device performs the storage and playback of the audio/video streams, i.e., it is the designated device. As shown in Fig. 3, the method includes:
step 301, the second device extracts the video stream of the first device from the audio/video stream sent by the first device for storage, and retains the timestamp information.
When the first device and the second device carry out audio and video communication, the first device sends the locally acquired and generated audio and video stream to the second device. The first device may generate the audio/video stream according to an existing manner, for example, encode and compress the audio/video stream and then pack the encoded audio/video stream into an RTP packet to transmit the RTP packet to the second device.
The second device extracts the Video stream Video1 from the audio/Video stream of the first device for storage after receiving the audio/Video stream, and meanwhile, time stamp information needs to be reserved when the Video stream of the first device is stored for subsequent synchronous playing. For example, if the audio/Video stream is transmitted in the form of an RTP packet, the audio/Video stream needs to be unpacked by RTP first, and then a part of the Video stream Video1 is extracted from the unpacked audio/Video stream.
Specifically, when the extracted Video stream Video1 is saved, in order to save processing resources, it is preferable that the extracted Video stream be directly packaged and saved without being decoded. Of course, the extracted video stream may be decoded, encoded and packetized, and then the packetized video stream is stored, but such processing obviously consumes more computing resources, but the second device may adopt a different encoding method when re-encoding the video stream of the first device. The specific encoding and packaging mode can be selected according to the requirement, for example, the encoding and packaging mode can be packaged into a PS packet for storage.
In this embodiment, the device performing the recording is one party of the audio/video communication, so only the peer device's video stream needs to be received and extracted in this step, while the local device's video stream is handled in step 302. If the device performing the recording were instead a third-party device distinct from the first and second devices, such as a network server, then unlike this embodiment, the video streams of both devices would be extracted and processed in the manner of this step 301.
Step 302, the second device captures and generates a local video stream, stores it, and retains the timestamp information.
The manner in which the second device captures and generates the local video stream Video2 may be conventional and is not described here.
The local video stream Video2 is stored, and timestamp information must be retained when storing it, for subsequent synchronous playing. Specifically, Video2 may be encoded and packaged for storage; the encoding and packaging format can be chosen as needed, for example PS packets.
And step 303, the second device extracts an audio stream from the audio/video stream sent by the first device, mixes the audio stream with the local audio stream collected and generated by the second device, obtains the mixed audio stream, stores the mixed audio stream, and retains the timestamp information.
The manner in which the second device collects and generates the local Audio stream Audio2 may be conventional and will not be described further herein.
For example, if the audio/video stream is transmitted in the form of RTP packets, the stream first needs to be RTP-unpacked, after which the audio stream Audio1 is extracted from the unpacked audio/video stream; because audio mixing needs to be performed subsequently, Audio1 must also be decoded to restore the original audio stream.
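As a non-authoritative sketch of the unpacking step, the fixed 12-byte RTP header (RFC 3550) can be stripped as follows to recover the payload and its timestamp; the function name `parse_rtp_packet` is illustrative, and padding, header extensions and payload-specific depacketization are deliberately ignored:

```python
import struct

def parse_rtp_packet(packet: bytes):
    """Strip the fixed 12-byte RTP header (RFC 3550) and return
    (payload_type, sequence_number, timestamp, payload).
    Padding and header extensions are ignored for brevity."""
    if len(packet) < 12:
        raise ValueError("shorter than the fixed 12-byte RTP header")
    b0, b1, seq, ts, _ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not an RTP version-2 packet")
    csrc_count = b0 & 0x0F              # each CSRC entry adds 4 header bytes
    payload_type = b1 & 0x7F
    return payload_type, seq, ts, packet[12 + 4 * csrc_count:]
```

The RTP timestamp recovered here is what allows the timestamp information to be preserved when the extracted streams are later saved.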
The Audio stream Audio1 of the first device (i.e., the restored original Audio stream) and the local Audio stream Audio2 of the second device are subjected to Audio mixing processing to obtain an Audio mixed stream Audio3, and the Audio mixing processing mode may adopt various existing modes, which is not limited in this application.
The Audio mixed stream Audio3 obtained after mixing is stored, and its timestamp information is retained for subsequent synchronous playing. Specifically, Audio3 may be encoded and packaged for storage. The specific encoding and packaging mode can be selected as required; for example, the stream can be packaged into PS packets for storage.
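The mixing algorithm itself is left open above; as one minimal example under the assumption of decoded signed 16-bit PCM input, two channels can be mixed by sample-wise addition with clipping (the function name `mix_pcm16` is illustrative):

```python
def mix_pcm16(samples_a, samples_b):
    """Mix two equal-length sequences of signed 16-bit PCM samples by
    sample-wise addition, clamping the sum to the int16 range.
    This is only the simplest of the many mixing methods the text
    leaves open (weighted averaging, normalization, etc.)."""
    return [max(-32768, min(32767, a + b))
            for a, b in zip(samples_a, samples_b)]
```

Feeding the decoded Audio1 and the locally collected Audio2 through such a mixer yields the single stream Audio3 that is then encoded, packaged and stored.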
In this embodiment, since the device performing the video recording processing is one party to the audio/video communication, the audio stream processing for the first device and for the second device in this step is not entirely the same. If, unlike the present embodiment, the device performing the video recording processing is a third-party device different from the first device and the second device, such as a network server, the audio streams of both devices may be processed according to the processing method used for the audio stream of the first device, after which the processed audio streams are mixed, encoded, packaged and stored.
Through steps 301, 302 and 303, Video1, Video2 and Audio3 can be saved, completing the video recording processing method. In practice, when the audio/video files are saved, the audio and the video can be saved in one file to facilitate later playing. Based on this, it is preferable that Video1 and Audio3 be packaged together and stored in a first video file corresponding to the first device, and that Video2 and Audio3 be packaged together and stored in a second video file corresponding to the second device. In this way, during video playback the first video file and the second video file can each be played independently; because Audio3 is a mixed stream, playing either video file carries the sound of both parties to the audio/video communication, effectively restoring the voice background environment of the two-way audio/video communication. Of course, if the occupation of storage resources is a concern and there is no need to play a single picture of the communication on its own, Audio3 may be packaged and stored separately, or together with only one of Video1 and Video2.
In addition, the above steps 301, 302 and 303 may be performed in parallel.
Step 304: during video playback, the second device synchronously plays the video stream of the first device, the local video stream and the mixed audio stream according to their respective timestamp information.
Through steps 301, 302 and 303, Video1, Video2 and Audio3 can be recorded and stored, with the corresponding timestamp information retained when each stream is saved, so that Video1, Video2 and Audio3 can be played synchronously according to that timestamp information, realizing video playback. Because Audio3 is an audio stream obtained by mixing the two audio channels, there is no need to mix the two channels again during playback, which greatly simplifies the playback processing. In addition, if the playback and the earlier recording are performed on different devices, the mixed audio stream with its timestamp information, and the two video streams with their respective timestamp information, need to be obtained from where they were recorded before step 304.
Specifically, during synchronous playback, one stream can serve as the reference: taking the playing progress of some video stream A as the baseline, the audio stream and the other video stream are played synchronously with reference to A. For the foregoing processing, Audio3 and Video1 may be played synchronously with reference to the playing progress of Video2, or Audio3 and Video2 may be played synchronously with reference to the playing progress of Video1.
However, considering that humans are more sensitive to jumps in sound, it is preferable that the two video streams be played synchronously with reference to the playing progress of the mixed audio stream. That is, for each video stream, the delay synchronization time of the current frame relative to the previous frame is calculated from the difference between the start timestamps of that video stream and the mixed audio stream, the time consumed before playing, the interval between video frames, and so on, and playback then proceeds according to the calculated time. A calculation method for the delay synchronization time is provided below:
The meanings of the quantities involved are first presented.
V0 denotes the start timestamp of Video2; A0 denotes the start timestamp of Audio3;
V1 denotes the previous-frame timestamp of Video2; A1 denotes the previous-frame timestamp of Audio3;
V2 denotes the current-frame timestamp of Video2; A2 denotes the current-frame timestamp of Audio3;
V0' denotes the start timestamp of Video1; V1' denotes the previous-frame timestamp of Video1;
V2' denotes the current-frame timestamp of Video1;
SyncT denotes the maximum of V2-V1 and the one-way audio-video synchronization threshold;
SyncT' denotes the maximum of V2'-V1' and the two-way audio-video synchronization threshold.
Next, the calculation method of the synchronization delay and the video playing method are introduced.
1. Delay synchronization time calculation for the non-same-path Video1 synchronized with Audio3
When Video1 is played on its own, the difference between the timestamps of the current frame and the previous frame is V2'-V1', i.e., the delay time between the two frames is V2'-V1'. When the video is played with reference to the audio, the timestamp information of the audio playback must be considered in addition to that of the video, and the delay time between two video frames is adjusted to stay synchronized with the audio playback.
Specifically, V2'-A2-(V0'-A0) may be compared with SyncT'. If V2'-A2-(V0'-A0) < -SyncT', indicating that the Video1 playback lags the Audio3 playback and the delay should be reduced, the first delay time of the current frame of Video1 relative to the previous frame is determined to be V2'-V1'-λ1' (i.e., the normal video frame interval V2'-V1' advanced by a preset step λ1'). If -SyncT' ≤ V2'-A2-(V0'-A0) ≤ SyncT', the current delay processing is within the normal synchronized-playback range, and the current frame of Video1 is played normally according to its timestamp information, i.e., the first delay time of the current frame relative to the previous frame is determined to be the normal video frame interval V2'-V1'. If V2'-A2-(V0'-A0) > SyncT', indicating that the Video1 playback leads the Audio3 playback and an additional delay should be applied, the first delay time of the current frame relative to the previous frame is determined to be V2'-V1'+λ2' (i.e., the normal video frame interval V2'-V1' delayed by a preset step λ2'). After the first delay time is determined in this manner, the time consumed before playing is subtracted from it to give the delay synchronization time, and the current frame of Video1 is played according to that delay synchronization time. The time consumed before playing must enter this calculation because unpacking and decoding are required before playback and their processing time is not negligible. λ1' and λ2' may be set to empirical values according to the current device performance, operating environment and so on.
In addition, λ1' can be set directly to V2'-V1', in which case, when V2'-A2-(V0'-A0) < -SyncT', the first delay time is simply 0: the current frame is not delayed relative to the previous frame and is decoded and played immediately.
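The three-way comparison above can be sketched as a small function. The name `first_delay_time` and the plain-integer timestamps are illustrative assumptions; the caller would still subtract the time consumed before playing to obtain the delay synchronization time:

```python
def first_delay_time(v2, v1, a2, v0, a0, sync_t, step1, step2):
    """Return the first delay time of the current video frame relative
    to the previous frame, following the three-way comparison in the
    text. For Video1 the arguments correspond to V2', V1', A2, V0',
    A0, SyncT', lambda1' and lambda2'; all timestamps share one unit."""
    drift = v2 - a2 - (v0 - a0)      # video progress relative to audio
    if drift < -sync_t:              # video lags audio: shorten the delay
        return v2 - v1 - step1
    if drift > sync_t:               # video leads audio: lengthen the delay
        return v2 - v1 + step2
    return v2 - v1                   # in sync: use the normal frame interval
```

Passing `step1 = v2 - v1` reproduces the variant just described, in which a lagging frame is decoded and played with zero delay. The same function covers the same-path Video2 case by substituting the unprimed quantities.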
In addition, a special scenario should be considered. At the initial stage of establishing the video communication, the local audio and video of the second device start being recorded first, and the audio and video sent by the first device only start being recorded after a period of time, so the video recording start time of the first device is much later than the recording start time of the mixed stream that includes the local audio, exceeding the reasonable range of audio-video synchronization processing. The processing that restores the real scene in this case should be: the local video and the audio stream of the second device start playing first, the picture of the first device shows a black screen with no video content, and the picture only appears after a period of time. However, under the processing manner above, if V2'-A2-(V0'-A0) were found to be greater than SyncT', playback would proceed after an extra delay, which is inconsistent with the actual scene. In view of this, and to restore the actual scene of the two-way audio/video communication as faithfully as possible, this embodiment preferably further includes the following processing before comparing V2'-A2-(V0'-A0) with SyncT':
comparing | V2' -A2- (V0' -A0) | with a preset allowed synchronization threshold value X, and if | V2' -A2- (V0' -A0) | is less than or equal to X, continuing to compare V2' -A2- (V0' -A0) with SyncT ' and subsequent processing, namely the calculation and playing processing of the synchronization delay time of the video. If | V2'-A2- (V0' -A0) | > X, the time difference between the video stream and the mixed stream is large, exceeds the synchronous processing range, and the following two cases are continuously processed:
a. When V2'-A2-(V0'-A0) > 0, the Video1 playback is far ahead of the Audio3 playback, beyond the reasonable synchronization processing range; this should correspond to the initial stage of establishing the video communication, where the video of the second device started being recorded earlier than that of the first device. The processing of Video1 is therefore paused for a set time, during which its video frames are neither decoded nor played; after the pause, the timestamp of the current frame is judged again, and once V2'-A2-(V0'-A0) enters the reasonable synchronization processing range (i.e., 0 < V2'-A2-(V0'-A0) < X), the video is played synchronously with reference to the audio in the manner described above;
b. When V2'-A2-(V0'-A0) < 0, the Video1 playback lags far behind the Audio3 playback. The first delay time is set to V2'-V1'-λ1' as in the foregoing processing, and preferably to 0, so that the frame is decoded and played immediately without delay processing, catching up with the audio playing progress as soon as possible.
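A sketch of this pre-check, again with assumed names and integer timestamps, returns one of three actions before the SyncT' comparison is attempted:

```python
def pre_check(v2, a2, v0, a0, allowed_threshold_x):
    """Classify the current Video1 frame before the SyncT' comparison.
    'sync'     -- drift within X: proceed with the normal comparison;
    'pause'    -- video far ahead of audio (early-recording scenario):
                  hold decoding for a set time, then re-check;
    'catch_up' -- video far behind audio: decode and play with zero delay."""
    drift = v2 - a2 - (v0 - a0)
    if abs(drift) <= allowed_threshold_x:
        return "sync"
    return "pause" if drift > 0 else "catch_up"
```

Only the "sync" outcome proceeds to the three-way SyncT' comparison; "pause" re-enters this check after the set pause time expires.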
With this preferred processing, when the second device begins recording the audio and video earlier than the first device does, the picture of the first device at the start of playback is consistent with the actual scene and shows a black screen, and the picture is not played until the timestamps of Video1 and Audio3 approach the synchronization processing range.
Further, the comparisons above use V2'-A2-(V0'-A0) because the reference times of the timestamp information may differ between the saved video stream and the mixed audio stream. In general, when the received audio/video stream is stored, the timestamp information of the audio stream and the video stream is adjusted to the same reference time; in that case V2'-A2 may be used in place of V2'-A2-(V0'-A0) in the operations and comparisons above.
2. Delay synchronization time calculation for the same-path Video2 synchronized with Audio3
Specifically, V2-A2-(V0-A0) may be compared with SyncT. If V2-A2-(V0-A0) < -SyncT, indicating that the local video playback lags the audio playback and the delay should be reduced, the first delay time of the current frame of the local video stream Video2 relative to the previous frame is determined to be V2-V1-λ1 (i.e., the normal video frame interval V2-V1 advanced by a preset step λ1). If -SyncT ≤ V2-A2-(V0-A0) ≤ SyncT, the current delay processing is within the normal synchronized-playback range, and the current frame of Video2 is played normally according to its timestamp information, i.e., the first delay time of the current frame relative to the previous frame is determined to be the normal video frame interval V2-V1. If V2-A2-(V0-A0) > SyncT, indicating that the local video playback leads the audio playback and an additional delay should be applied, the first delay time of the current frame of Video2 relative to the previous frame is determined to be V2-V1+λ2 (i.e., the normal video frame interval V2-V1 delayed by a preset step λ2). After the first delay time is determined in this manner, the time consumed before playing is subtracted from it to give the delay synchronization time, and the current frame of the local video stream is played according to that delay synchronization time. λ1 and λ2 may be set to empirical values according to the current device performance, operating environment and so on; in particular, λ1 can be set directly to V2-V1, in which case, when V2-A2-(V0-A0) < -SyncT, the first delay time is simply 0 and the current frame is decoded and played immediately without delay relative to the previous frame.
In this embodiment, λ 1, λ 2, λ 1 'and λ 2' may be equal or different.
The above processing keeps the video and the audio synchronized at all times. Since the same-path video is the locally recorded video, the situation in which the video of the first device starts being recorded earlier than the local video does not arise, so under normal conditions |V2-A2-(V0-A0)| stays within the reasonable synchronization processing range, and the determination of the delay synchronization time need not compare |V2-A2-(V0-A0)| with the allowed synchronization threshold X. However, considering that a file exception could still cause |V2-A2-(V0-A0)| > X, this embodiment may preferably further include the following processing before comparing V2-A2-(V0-A0) with SyncT:
comparing | V2-A2- (V0-A0) | with an allowable synchronization threshold value X, if | V2-A2- (V0-A0) | > X, indicating that a file is abnormal, and not referring to the mixed stream Audio3 to synchronously play the local Video stream Video2, but automatically playing the local Video stream Video2 according to the Video frame interval; if the | V2-A2- (V0-A0) | is less than or equal to X, the comparison between V2-A2- (V0-A0) and SyncT and the subsequent processing are continued, namely the Audio stream Audio3 is referred to for synchronous playing of the local Video stream Video 2.
In this embodiment, since the device performing the video recording processing is one party to the audio/video communication, the synchronous playing in this step distinguishes between synchronizing the same-path video to the audio and synchronizing the non-same-path video to the audio. If, unlike the present embodiment, the device performing the video recording processing is a third-party device different from the first device and the second device, such as a network server, the synchronization of each device's video stream to the mixed audio stream is performed according to the processing for the non-same-path video.
This concludes the method flow shown in fig. 3. With this video recording and playback method, the two channels of audio are mixed before being stored, which reduces the audio-mixing pressure during synchronized playback. The two channels of video are stored and played separately, which preserves the image quality and details of the original videos as far as possible, avoids the time and memory consumed by fusion-rendering the two videos into a picture-in-picture and re-encoding, and places low demands on SOC performance; moreover, because the two video pictures are played separately, their positions can be freely dragged and adjusted according to the user's habits. The video streams and the audio stream are stored on the local playback device, so the playback audio and video need not be fetched over the network, avoiding the poor user experience caused by variable network quality. Furthermore, during playback the video streams are played synchronously with the playing progress of the audio stream as the reference, so the original audio scene is preserved; even if the video picture jumps somewhat, this is hard to perceive, and the user experience is greatly improved.
In addition, although this embodiment describes the video recording and playback processing for two-way audio/video communication, the processing method is also applicable to N-way (N > 2) audio/video communication. Specifically, if the recording and playback device is a device A among the N communicating devices, the collection, saving and synchronized playing of device A's local audio stream and video stream follow the processing of the local audio stream Audio2 and the local video stream Video2 in this embodiment, while the extraction, saving and synchronized playing of the audio streams and video streams of the other N-1 devices follow the processing of the remote audio stream Audio1 and the remote video stream Video1. If the recording and playback device is not one of the N devices but some other device B, such as a web server, then the audio stream and video stream of each of the N devices are processed according to the processing of the remote audio stream Audio1 and the remote video stream Video1 described above.
For an N-way audio/video communication scene, this video recording and playback method likewise reduces the audio-mixing pressure during synchronized playback, preserves the image quality and details of the original videos as far as possible, avoids the time and memory consumed by fusion-rendering multiple videos into a picture-in-picture and re-encoding, and places low demands on SOC performance. The multiple video pictures are played separately, so their positions can be freely dragged and adjusted according to the user's habits; the poor user experience caused by variable network quality is avoided; and the original audio scene is preserved, so that even if a video picture jumps somewhat it is hard to perceive, greatly improving the user experience.
An example of two-way synchronized audio/video recording and playback is given below, taking two-way video intercom as an example. Fig. 4 is a synchronized recording block diagram of the two-way video intercom, in which device 1 and device 2 carry out real-time two-way audio/video intercom. Video1 and Audio1 of device 1 are transmitted to device 2 over a network protocol; device 2 obtains Audio1 through RTP unpacking and decoding, and then synthesizes Audio1 with the locally collected Audio2 into Audio3 through an audio mixing algorithm. Audio3 serves as the audio played during the two-way real-time video intercom and is stored in both the video1 file and the video2 file, with its timestamps retained. Video1 is RTP-unpacked, then PS-packaged and stored in the video1 file; the locally collected Video2 is encoded, PS-packaged and stored in the video2 file; Video1 and Video2 each retain their own timestamps.
Fig. 5 is a video playback block diagram of the two-way video intercom. Device 2 starts unpacking, decoding and playing the two channels of audio and video. Audio3 from the video2 file is PS-unpacked, decoded and played directly, while Video2 calculates its delay synchronization time from the difference between its start timestamp and that of Audio3, the decoding and display time, the video frame interval, and so on, and plays according to that time. Similarly, Video1 from the video1 file calculates its delay synchronization time in the same manner to play against Audio3 of the video2 file; the copy of Audio3 in the video1 file is identical to that in the video2 file and can be discarded directly. When the pictures are displayed, the local video of device 2 is shown full-screen and the video of device 1 is shown as the small picture of a picture-in-picture.
The foregoing is a specific implementation of the recording method and the playback method of the multi-channel audio and video in the present application. The application also provides a video recording device and a playback device of the multi-channel audio and video, which can be respectively used for implementing the video recording method and the playback method.
Fig. 6 is a schematic diagram of a multi-channel audio/video recording apparatus according to the present application. As shown in fig. 6, the apparatus includes: a receiving unit, a mixing unit and a storage unit.
The receiving unit is used for acquiring audio streams and video streams of a plurality of devices for multi-path audio and video communication, respectively storing the video streams of the plurality of devices in the storage unit, and keeping respective timestamp information. And the audio mixing unit is used for mixing the audio stream of each device to obtain a mixed stream, storing the mixed stream in the storage unit and keeping the timestamp information. And the storage unit is used for storing the mixed stream and the time stamp information thereof, and also used for storing the video streams of the plurality of devices and the respective time stamp information thereof.
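As an illustrative, non-normative sketch of how these three units could fit together (all class and attribute names here are assumptions, and the mixer is the simplest clipping adder):

```python
class StorageUnit:
    """Holds the saved streams together with their timestamp information."""
    def __init__(self):
        self.video_streams = {}   # device id -> (frames, timestamps)
        self.mixed_audio = None   # (samples, timestamps)

class ReceivingUnit:
    """Saves each device's video stream, keeping its own timestamps."""
    def __init__(self, storage):
        self.storage = storage
    def save_video(self, device_id, frames, timestamps):
        self.storage.video_streams[device_id] = (frames, timestamps)

class MixingUnit:
    """Mixes the per-device audio streams and stores the mixed stream."""
    def __init__(self, storage):
        self.storage = storage
    def mix_and_store(self, audio_streams, timestamps):
        # sample-wise addition with int16 clipping, as one possible mixer
        mixed = [max(-32768, min(32767, sum(group)))
                 for group in zip(*audio_streams)]
        self.storage.mixed_audio = (mixed, timestamps)
```

A real implementation would of course package and encode the streams before storage; the sketch only shows the division of responsibilities between the units.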
Alternatively, when the video recording apparatus is located in a specified device among the plurality of devices, the receiving unit may include a receiving sub-unit, an audio processing sub-unit, and a video processing sub-unit.
The receiving subunit is configured to receive the audio/video streams sent by the devices other than the designated device among the multiple devices, unpack them, and extract the video stream and the audio stream of each corresponding device. The audio processing subunit is configured to collect and generate a local audio stream and send it to the mixing unit, and to decode the audio streams of the other devices extracted by the receiving subunit and send them to the mixing unit. The video processing subunit is configured to collect and generate a local video stream, encode and package it, store it in the storage unit, and retain the timestamp information; and to package the video streams of the other devices extracted by the receiving subunit, store them in the storage unit, and retain the timestamp information. The mixing unit encodes and packages the mixed stream when storing it.
Alternatively, the mixed audio stream and the video stream of each device may be packaged together and saved in a video file corresponding to that device.
Optionally, the multi-channel audio/video communication may be two-channel audio/video communication, the multi-channel devices are a first device and a second device respectively, and the apparatus is located in the second device; the receiving unit may include a receiving sub-unit, an audio processing sub-unit, and a video processing sub-unit; the apparatus may further comprise a packing unit;
the receiving subunit is used for receiving the audio and video stream of the first device sent by the first device, extracting the audio stream and the video stream of the first device after unpacking the audio and video stream, sending the audio stream of the first device to the audio processing subunit, and sending the video stream of the first device to the packing unit;
the audio processing subunit is used for acquiring and generating a local audio stream and sending the local audio stream to the sound mixing unit; the audio mixing unit is also used for decoding the audio stream of the first device extracted by the receiving subunit and then sending the decoded audio stream to the audio mixing unit;
the video processing subunit is used for acquiring, generating and sending the local video stream to the packing unit;
in the mixing unit, the processing of mixing the audio stream of each device to obtain the mixed stream may include:
mixing the decoded result of the audio stream of the first device sent by the audio processing subunit with the local audio stream sent by the audio processing subunit to obtain a mixed stream, and sending the mixed stream to the packing unit;
the packing unit is configured to encode the mixed stream from the mixing unit, and is further configured to package the encoded mixed stream with the video stream of the first device sent by the receiving subunit and store them in a first video file corresponding to the first device in the storage unit, and to package the encoded local video stream sent by the video processing subunit with the encoded mixed stream and store them in a second video file corresponding to the second device in the storage unit.
Fig. 7 is a schematic diagram of a multi-channel audio and video playback device in the present application. As shown in fig. 7, the apparatus includes: a video stream processing unit, an audio stream processing unit, and a playback unit.
And the video stream processing unit is used for acquiring the video streams of the multiple devices which are respectively stored for carrying out the multi-channel audio and video communication and the respective timestamp information. The audio stream processing unit is used for acquiring the stored mixed stream and the timestamp information thereof; the audio mixing stream is an audio stream obtained by mixing audio streams of a plurality of devices. And the playback unit is used for synchronously playing each video stream and the mixed audio stream according to the time stamp information of each video stream and the time stamp information of the mixed audio stream.
Alternatively, when each video stream and the mixed audio stream are played synchronously in the playback unit, each video stream is played synchronously with the playing progress of the mixed audio stream as the reference.
Optionally, in the playback unit, the manner of synchronously playing any one of the video streams with reference to the playing progress of the mixed stream includes:
comparing V2-A2- (V0-A0) with SyncT, and if V2-A2- (V0-A0) < -SyncT, determining that the first delay time of the current frame of any one video stream relative to the last frame is V2-V1-lambda 1; if-SyncT is less than or equal to V2-A2- (V0-A0) and less than or equal to SyncT, determining that the first delay time of the current frame of any one video stream relative to the last frame is V2-V1; if V2-A2- (V0-A0) > SyncT, determining that the first delay time of the current frame of any one video stream relative to the last frame is V2-V1+ lambda 2;
the time consumed before playing is subtracted from the first delay time to serve as delay synchronization time, and the current frame of any one path of video stream is played according to the delay synchronization time;
V2 and A2 are the timestamps of the current frames of the video stream in question and of the mixed stream, respectively; V0 and A0 are the start timestamps of the video stream in question and of the mixed stream; V1 is the timestamp of the previous frame of the video stream in question; SyncT is the maximum of a preset audio-video synchronization threshold and V2-V1; and λ1 and λ2 are preset first and second step sizes, respectively.
Optionally, before comparing V2-a2- (V0-a0) with SyncT, the method may further comprise:
judging whether |V2-A2-(V0-A0)| is less than or equal to a preset allowed synchronization threshold, and if so, executing the operation of comparing V2-A2-(V0-A0) with SyncT;
otherwise, when V2-A2-(V0-A0) > 0, pausing the playing processing of the video stream for a set time and, after the pause, continuing with the operation of judging whether |V2-A2-(V0-A0)| is less than or equal to the preset allowed synchronization threshold; and when V2-A2-(V0-A0) < 0, directly playing the current frame of the video stream without delay processing.
Alternatively, when the playback apparatus is located in a designated device among the multiple devices, the playback unit may, for playback of the video stream of the designated device, further perform the following before the comparison with SyncT: judging whether |V2-A2-(V0-A0)| is greater than the preset allowed synchronization threshold; if so, playing the current frame of the video stream of the designated device automatically according to the video frame interval; otherwise, executing the operation of comparing V2-A2-(V0-A0) with SyncT.
Optionally, the multiple audio-video communication is two-way audio-video communication, and the multiple devices are two devices.
The present application also provides a computer-readable storage medium storing instructions that, when executed by a processor, can perform the steps of the above recording method and playback method for multi-channel audio and video. In practical applications, the computer-readable medium may be included in the apparatus/device/system of each of the above embodiments, or may exist separately without being assembled into that apparatus/device/system.
According to embodiments disclosed herein, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, but is not limited to: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the embodiments disclosed herein, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Fig. 8 is a schematic structural diagram of an electronic device according to still another embodiment of the present application. As shown in fig. 8, specifically:
the electronic device may include a processor 801 with one or more processing cores, a memory 802 of one or more computer-readable storage media, and a computer program stored on the memory and executable on the processor. When the program in the memory 802 is executed, the above recording and playback methods for multi-channel audio and video can be realized.
Specifically, in practical applications, the electronic device may further include a power supply 803, an input-output unit 804, and the like. Those skilled in the art will appreciate that the configuration shown in fig. 8 does not limit the electronic device, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Wherein:
the processor 801 is the control center of the electronic device; it connects the various parts of the entire electronic device using various interfaces and lines, and performs the various functions of the device and processes data by running or executing the software programs and/or modules stored in the memory 802 and calling the data stored in the memory 802, thereby monitoring the electronic device as a whole.
The memory 802 may be used to store software programs and modules, i.e., the computer-readable storage media described above. The processor 801 executes various functional applications and data processing by running the software programs and modules stored in the memory 802. The memory 802 may mainly include a program storage area and a data storage area: the program storage area may store an operating system, an application program required for at least one function, and the like; the data storage area may store data created during use of the device, and the like. Further, the memory 802 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 802 may also include a memory controller to provide the processor 801 with access to the memory 802.
The electronic device further comprises a power supply 803 for supplying power to each component. The power supply 803 may be logically connected to the processor 801 through a power management system, so that charging, discharging, power-consumption management, and other functions are handled through the power management system. The power supply 803 may also include one or more DC or AC power sources, recharging systems, power-failure detection circuitry, power converters or inverters, power status indicators, and other such components.
The electronic device may also include an input-output unit 804. The input-output unit 804 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. The input-output unit 804 may also be used to display information input by or provided to the user, as well as various graphical user interfaces, which may be composed of graphics, text, icons, video, and any combination thereof.
The above description covers only preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the present invention shall fall within the scope of the present invention.

Claims (15)

1. A method for recording multi-channel audio and video, characterized by comprising the following steps:
acquiring the audio streams and video streams of a plurality of devices performing multi-channel audio-video communication;
mixing the audio streams of the devices to obtain a mixed audio stream, saving the mixed audio stream, and retaining its timestamp information;
and saving the video streams of the plurality of devices separately, retaining their respective timestamp information.
2. The video recording method according to claim 1, wherein, when the device that stores the mixed audio stream and the video streams is a designated device among the plurality of devices, the acquiring the audio streams and video streams of the plurality of devices performing multi-channel audio-video communication comprises:
the designated device receives the audio-video streams sent by the devices other than the designated device among the plurality of devices, unpacks them, and extracts the video stream and the audio stream of each corresponding device; and the designated device collects and generates a local video stream and a local audio stream.
3. The video recording method according to claim 2, wherein said saving the mixed audio stream comprises: encoding and packaging the mixed audio stream and then saving it;
and said saving the video streams of the plurality of devices separately comprises: packaging and saving the video streams of the other devices, and encoding, packaging, and saving the local video stream.
4. The video recording method according to claim 1, wherein the mixed audio stream and the video stream of each device are packaged and saved into a recording file corresponding to that device.
5. The video recording method according to claim 1, wherein the multi-channel audio-video communication is two-channel audio-video communication, the plurality of devices are a first device and a second device, and the mixed audio stream and the video streams of the devices are saved in the second device;
the acquiring the audio streams and video streams of a plurality of devices performing multi-channel audio-video communication comprises:
the second device receives the audio-video stream of the first device from the first device, unpacks it, and extracts the audio stream and the video stream of the first device; and the second device collects and generates a local audio stream and a local video stream;
the mixing the audio streams of the devices to obtain the mixed audio stream comprises:
decoding the audio stream of the first device and then mixing it with the local audio stream to obtain the mixed audio stream;
and the saving the mixed audio stream and the video streams of the plurality of devices comprises:
encoding the mixed audio stream;
and packaging and saving the encoded mixed audio stream and the video stream of the first device into a first recording file corresponding to the first device, and packaging and saving the encoded local video stream and the encoded mixed audio stream into a second recording file corresponding to the second device.
6. A method for playing back multi-channel audio and video, characterized by comprising the following steps:
acquiring the separately saved video streams, and their respective timestamp information, of a plurality of devices performing multi-channel audio-video communication;
acquiring the saved mixed audio stream and its timestamp information, the mixed audio stream being obtained by mixing the audio streams of the plurality of devices;
and synchronously playing each video stream and the mixed audio stream according to the timestamp information of each video stream and the timestamp information of the mixed audio stream.
7. The playback method according to claim 6, wherein, when each video stream and the mixed audio stream are played synchronously, each video stream is played synchronously with reference to the playing progress of the mixed audio stream.
8. The playback method according to claim 7, wherein synchronously playing any one video stream with reference to the playing progress of the mixed audio stream comprises:
comparing V2-A2-(V0-A0) with SyncT: if V2-A2-(V0-A0) < -SyncT, determining that the first delay time of the current frame of the video stream relative to the previous frame is V2-V1-λ1; if -SyncT ≤ V2-A2-(V0-A0) ≤ SyncT, determining that the first delay time is V2-V1; and if V2-A2-(V0-A0) > SyncT, determining that the first delay time is V2-V1+λ2;
subtracting the time consumed before playing from the first delay time to obtain a delay synchronization time, and playing the current frame of the video stream according to the delay synchronization time;
wherein V2 and A2 are the timestamps of the current frame of the video stream and the current frame of the mixed audio stream, V0 and A0 are the starting timestamps of the video stream and the mixed audio stream, V1 is the timestamp of the previous frame of the video stream, SyncT is the maximum of a preset audio-video synchronization threshold and V2-V1, and λ1 and λ2 are preset first and second step sizes, respectively.
9. The playback method of claim 8, wherein before comparing V2-A2-(V0-A0) with SyncT, the method further comprises:
judging whether |V2-A2-(V0-A0)| is greater than or equal to a preset allowed synchronization threshold, and if so, performing the operation of comparing V2-A2-(V0-A0) with SyncT;
otherwise, when V2-A2-(V0-A0)>0, pausing the playing of the video stream for a set time, and after the pause time expires, re-executing the judgment of whether |V2-A2-(V0-A0)| is greater than or equal to the preset allowed synchronization threshold; and when V2-A2-(V0-A0)<0, playing the current frame of the video stream directly without delay processing.
10. The playback method according to claim 8, wherein, when the device that plays back the mixed audio stream and the video streams is a designated device among the plurality of devices, before determining whether V2-A2-(V0-A0) is less than or equal to SyncT for playback of the designated device's video stream, the method further comprises:
judging whether V2-A2-(V0-A0) is smaller than a preset allowed synchronization threshold; if so, playing the current frame of the designated device's video stream normally according to the timestamp information of the current frame; otherwise, performing the operation of comparing V2-A2-(V0-A0) with SyncT.
11. The playback method according to any one of claims 6 to 10, wherein the multi-channel audio-video communication is two-channel audio-video communication, and the plurality of devices are two devices.
12. A multi-channel audio-video recording apparatus, characterized by comprising: a receiving unit, an audio mixing unit, and a storage unit;
the receiving unit is used for acquiring the audio streams and video streams of a plurality of devices performing multi-channel audio-video communication, saving the video stream of each device separately in the storage unit, and retaining the respective timestamp information;
the audio mixing unit is used for mixing the audio streams of the devices to obtain a mixed audio stream, saving the mixed audio stream to the storage unit, and retaining its timestamp information;
and the storage unit is used for storing the mixed audio stream and its timestamp information, and storing the video stream of each device with its respective timestamp information.
13. A multi-channel audio-video playback apparatus, characterized by comprising: a video stream processing unit, an audio stream processing unit, and a playback unit;
the video stream processing unit is used for acquiring the separately saved video streams, and their respective timestamp information, of a plurality of devices performing multi-channel audio-video communication;
the audio stream processing unit is used for acquiring the saved mixed audio stream and its timestamp information, the mixed audio stream being obtained by mixing the audio streams of the plurality of devices;
and the playback unit is used for synchronously playing each video stream and the mixed audio stream according to the timestamp information of each video stream and the timestamp information of the mixed audio stream.
14. A computer-readable storage medium having computer instructions stored thereon, characterized in that the instructions, when executed by a processor, implement the method for recording or playing back multi-channel audio and video according to any one of claims 1 to 11.
15. An electronic device, comprising at least a computer-readable storage medium, and further comprising a processor;
the processor is used for reading the executable instructions from the computer-readable storage medium and executing the instructions to implement the method for recording or playing back multi-channel audio and video according to any one of claims 1 to 11.
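The recording side described in claims 1 and 12 — one mixed audio track plus a per-device video track, each retaining its own timestamps — could be sketched as follows. This is an illustrative assumption, not the patented implementation: the `Frame` structure is hypothetical, and the byte-wise "mixing" stands in for real mixing, which would decode each stream to PCM, sum the samples, and re-encode.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    ts: int      # timestamp carried with the frame
    data: bytes  # payload

def mix_audio(streams):
    # Mix time-aligned audio frames from all devices into one stream,
    # keeping the timestamp of the first device's frame.
    # (Illustrative byte-wise sum; real mixing sums decoded PCM samples.)
    mixed = []
    for frames in zip(*streams):
        pcm = bytes(sum(samples) % 256
                    for samples in zip(*(f.data for f in frames)))
        mixed.append(Frame(ts=frames[0].ts, data=pcm))
    return mixed

def record(devices):
    # devices: {name: (audio_frames, video_frames)}
    # Returns one mixed audio track plus each device's video track,
    # with every track retaining its own timestamp information.
    mixed = mix_audio([audio for audio, _ in devices.values()])
    videos = {name: video for name, (_, video) in devices.items()}
    return mixed, videos
```

Keeping the timestamps with both the mixed audio track and each video track is what allows the playback side (claims 6 to 11) to re-synchronize every video stream against the single mixed audio stream.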
CN202111572204.9A 2021-12-21 2021-12-21 Video playback method and device for multipath audio and video, storage medium and electronic equipment Active CN114257771B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111572204.9A CN114257771B (en) 2021-12-21 2021-12-21 Video playback method and device for multipath audio and video, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN114257771A true CN114257771A (en) 2022-03-29
CN114257771B CN114257771B (en) 2023-12-01

Family

ID=80796327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111572204.9A Active CN114257771B (en) 2021-12-21 2021-12-21 Video playback method and device for multipath audio and video, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114257771B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115643442A (en) * 2022-10-25 2023-01-24 广州市保伦电子有限公司 Audio and video converging recording and playing method, device, equipment and storage medium

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1205599A (en) * 1997-05-15 1999-01-20 松下电器产业株式会社 Compressed code decoding device and audio decoding device
US20110187927A1 (en) * 2007-12-19 2011-08-04 Colin Simon Device and method for synchronisation of digital video and audio streams to media presentation devices
EP2448265A1 (en) * 2010-10-26 2012-05-02 Google, Inc. Lip synchronization in a video conference
US20140049689A1 (en) * 2011-12-05 2014-02-20 Guangzhou Ucweb Computer Technology Co., Ltd Method and apparatus for streaming media data processing, and streaming media playback equipment
CN104601863A (en) * 2013-09-12 2015-05-06 深圳锐取信息技术股份有限公司 IP matrix system for recording and playing
US20170105038A1 (en) * 2015-10-09 2017-04-13 Microsoft Technology Licensing, Llc Media Synchronization for Real-Time Streaming
US20170111680A1 (en) * 2015-10-14 2017-04-20 International Business Machines Corporation Synchronization of live audio and video data streams
US20170318323A1 (en) * 2016-04-29 2017-11-02 Mediatek Singapore Pte. Ltd. Video playback method and control terminal thereof
CN108965971A (en) * 2018-07-27 2018-12-07 北京数码视讯科技股份有限公司 MCVF multichannel voice frequency synchronisation control means, control device and electronic equipment
CN109714634A (en) * 2018-12-29 2019-05-03 青岛海信电器股份有限公司 A kind of decoding synchronous method, device and the equipment of live data streams
US20200241835A1 (en) * 2019-01-30 2020-07-30 Shanghai Bilibili Technology Co., Ltd. Method and apparatus of audio/video switching
CN112235597A (en) * 2020-09-17 2021-01-15 深圳市捷视飞通科技股份有限公司 Method and device for synchronous protection of streaming media live broadcast audio and video and computer equipment
CN112702559A (en) * 2021-03-23 2021-04-23 浙江华创视讯科技有限公司 Recorded broadcast abnormity feedback method, system, equipment and readable storage medium
CN112738451A (en) * 2021-04-06 2021-04-30 浙江华创视讯科技有限公司 Video conference recording and playing method, device, equipment and readable storage medium
CN113205822A (en) * 2021-04-02 2021-08-03 苏州开心盒子软件有限公司 Multi-channel audio data recording and sound mixing method and device and storage medium

Also Published As

Publication number Publication date
CN114257771B (en) 2023-12-01

Similar Documents

Publication Publication Date Title
CN107846633B (en) Live broadcast method and system
US9172979B2 (en) Experience or “sentio” codecs, and methods and systems for improving QoE and encoding based on QoE experiences
CN104735470B (en) A kind of streaming media data transmission method and device
US9398315B2 (en) Multi-source video clip online assembly
CN106488265A (en) A kind of method and apparatus sending Media Stream
CN112584087B (en) Video conference recording method, electronic device and storage medium
JP2008500752A (en) Adaptive decoding of video data
US11115706B2 (en) Method, client, and terminal device for screen recording
CN112954433B (en) Video processing method, device, electronic equipment and storage medium
CN114546308A (en) Application interface screen projection method, device, equipment and storage medium
CN108366044B (en) VoIP remote audio/video sharing method
Tang et al. Audio and video mixing method to enhance WebRTC
WO2016008131A1 (en) Techniques for separately playing audio and video data in local networks
CN114257771B (en) Video playback method and device for multipath audio and video, storage medium and electronic equipment
WO2017016266A1 (en) Method and device for implementing synchronous playing
US20210227005A1 (en) Multi-user instant messaging method, system, apparatus, and electronic device
CN107040748A (en) One kind monitoring and video conference application integration platform and method
WO2024001661A1 (en) Video synthesis method and apparatus, device, and storage medium
CN110351576B (en) Method and system for rapidly displaying real-time video stream in industrial scene
WO2023011408A1 (en) Multi-window video communication method, device and system
CN114554277B (en) Multimedia processing method, device, server and computer readable storage medium
CN104754285B (en) Video conference system
CN115243074A (en) Video stream processing method and device, storage medium and electronic equipment
CN110392285B (en) Media stream processing method and device
JP5205900B2 (en) Video conference system, server terminal, and client terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant