CN115643442A

CN115643442A - Audio and video converging recording and playing method, device, equipment and storage medium

Info

Publication number: CN115643442A
Application number: CN202211322012.7A
Authority: CN
Inventors: 崔杰城; 蔡文生; 张常华; 朱正辉; 赵定金
Original assignee: Guangzhou Baolun Electronics Co Ltd
Current assignee: Guangzhou Baolun Electronics Co Ltd
Priority date: 2022-10-25
Filing date: 2022-10-25
Publication date: 2023-01-24

Abstract

The invention discloses an audio and video converging recording and playing method, device, equipment and storage medium. The method comprises: acquiring the video stream of each camera and the audio stream of each microphone; determining data of a target file according to the video streams and the audio streams, and generating header information of the target file; writing each video stream and each audio stream into the target file according to the header information of the target file, so that the audio streams correspond one-to-one to the video streams, thereby completing recording of the target file; obtaining the audio streams and video streams contained in the recorded target file by analyzing its header information; and, in response to the user determining the video stream to be played in each window, reading each such video stream together with its corresponding audio stream in synchronization, thereby playing the audio and video.

Description

Audio and video converging recording and playing method, device, equipment and storage medium

Technical Field

The invention relates to the technical field of audio and video recording and playing, in particular to an audio and video converging recording and playing method, device, equipment and storage medium.

Background

At present, recording and broadcasting hosts on the market perform converging recording of the audio and video streams of cameras and microphones by splicing all input video streams into a single video stream according to width and height, splicing all input audio streams into a single audio stream, and finally synthesizing the spliced video stream and audio stream into one media file.

However, the prior art has obvious defects, mainly: (1) audio and video decoding and encoding are required during synthesis, so the software overhead is high and the demands on the hardware performance of the recording and broadcasting host are high; (2) the synthesized media file contains only one video stream and one audio stream, so playback can only show the combined picture of all cameras, and the user cannot choose between playing all video pictures and playing the picture of a single camera, which is inflexible.

Disclosure of Invention

The invention provides an audio and video converging recording and playing method, device and equipment, which aim to solve the technical problems in the prior art that the audio and video synthesis steps are complex and that the desired camera pictures cannot be selected for playback.

In order to solve the technical problem, an embodiment of the present invention provides an audio and video converging recording and playing method, including:

acquiring video streams of all cameras and audio streams of all microphones;

determining data of a target file according to the video stream and the audio stream, and generating header information of the target file;

writing each video stream and each audio stream into the target file according to the header information of the target file, so that each audio stream corresponds to each video stream one by one, and recording of the target file is completed;

acquiring an audio stream and a video stream contained in the recorded target file by analyzing the header information of the recorded target file;

and, in response to the user determining the video stream to be played in each window, reading each such video stream together with its corresponding audio stream in synchronization, thereby playing the audio and video.

Compared with the prior art, the method acquires the video stream of each camera and the audio stream of each microphone, determines the data and header information of the target file, and writes each video stream and each audio stream into the target file according to that header information, ensuring a one-to-one correspondence between audio streams and video streams and thereby completing recording of the target file. No audio and video decoding or encoding is needed during synthesis, which simplifies the recording steps, and the header information allows the synthesized target file to hold multiple video streams and multiple audio streams instead of only one of each. During playback, in response to the video streams the user selects, each video stream is read together with its synchronized audio stream, so the user can play the pictures of all cameras or select the picture of a single camera, which improves the user experience.

Preferably, the data of the target file includes: the playing time of the target file, the number of the contained audio streams and video streams, the stream index of each video stream in the target file, and the stream index of each audio stream in the target file.
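The data items listed above can be sketched as a small planning step. This is a minimal Python illustration under stated assumptions, not the patented implementation: InputStream, TargetFileData and plan_target_file are hypothetical names, stream indexes are assumed to start at 0 with video streams numbered before audio streams, and the playing time is assumed to be the longest input duration.

```python
from dataclasses import dataclass


@dataclass
class InputStream:
    """Hypothetical description of one captured input stream."""
    source: str        # e.g. "camera-1" or "mic-1"
    kind: str          # "video" or "audio"
    duration_s: float  # captured length in seconds


@dataclass
class TargetFileData:
    duration_s: float
    video_count: int
    audio_count: int
    video_indexes: dict  # source -> stream index in the target file
    audio_indexes: dict


def plan_target_file(streams):
    """Assign stream indexes and a playing time (assumed rules, see lead-in)."""
    videos = [s for s in streams if s.kind == "video"]
    audios = [s for s in streams if s.kind == "audio"]
    video_indexes = {s.source: i for i, s in enumerate(videos)}
    # Audio indexes continue after the last video index.
    audio_indexes = {s.source: len(videos) + i for i, s in enumerate(audios)}
    duration = max((s.duration_s for s in streams), default=0.0)
    return TargetFileData(duration, len(videos), len(audios),
                          video_indexes, audio_indexes)


data = plan_target_file([
    InputStream("camera-1", "video", 60.0),
    InputStream("camera-2", "video", 59.5),
    InputStream("mic-1", "audio", 60.2),
    InputStream("mic-2", "audio", 60.0),
])
print(data.video_indexes, data.audio_indexes, data.duration_s)
```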

As a preferred scheme, the generating of the header information of the target file specifically includes:

respectively writing the stream indexes of the video streams and the audio streams into the header track of the target file, thereby generating the header information of the target file.

It can be understood that writing the stream indexes of the video streams and audio streams into the header track of the target file allows each video stream and audio stream to be distinguished and located accurately, and subsequent playback can call the streams directly through the header track. Because the stream indexes are recorded in the header information of the target file, retrieving the actual audio and video data of each stream from the target file is neither cumbersome nor error-prone.

As a preferred scheme, writing each video stream and each audio stream into the target file according to the header information of the target file, so that each audio stream corresponds one-to-one to each video stream and the recording of the target file is completed, specifically includes:

according to the stream indexes of the video streams and the audio streams in the header information of the target file, registering each audio stream and each video stream to enable each audio stream to be in one-to-one correspondence with each video stream;

and directly writing the video streams and the audio streams which are registered into the media data of the target file, thereby completing the recording of the target file.

It can be understood that registering the audio and video data by the stream indexes of the video streams and audio streams in the header information of the target file gives a one-to-one correspondence between audio streams and video streams, which improves the accuracy of recording and playing the audio and video files. Once the stream indexes are registered, the corresponding video and audio data are written directly into the media data of the target file, so the target file is recorded accurately and efficiently.
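Registration and direct writing can be illustrated with a toy container. In this sketch, which is an assumption and not the MP4 layout, the header is a plain dictionary whose tracks record (offset, size) entries into a single media-data buffer; register_pairs and write_media_data are hypothetical names, and camera i is assumed to pair with microphone i.

```python
# Hypothetical toy container, not the real MP4 layout: a header dict that lists
# each track's (offset, size) entries, plus one media-data byte buffer.

def register_pairs(video_indexes, audio_indexes):
    """Pair the i-th video stream index with the i-th audio stream index.

    Assumption: camera i and microphone i belong together.
    """
    if len(video_indexes) != len(audio_indexes):
        raise ValueError("one-to-one registration needs equal stream counts")
    return dict(zip(video_indexes, audio_indexes))


def write_media_data(header, samples):
    """Append each (stream_index, payload) unchanged and record (offset, size).

    The payload bytes are copied as-is: nothing is decoded or re-encoded.
    """
    media = bytearray()
    for stream_index, payload in samples:
        header["tracks"][stream_index]["samples"].append((len(media), len(payload)))
        media += payload
    return bytes(media)


video_ids, audio_ids = [0, 1], [2, 3]
header = {
    "pairs": register_pairs(video_ids, audio_ids),
    "tracks": {i: {"kind": "video" if i in video_ids else "audio", "samples": []}
               for i in video_ids + audio_ids},
}
# Interleaved encoded samples as they arrive from the cameras and microphones.
mdat = write_media_data(header, [(0, b"V0a"), (2, b"A2a"), (1, b"V1a"),
                                 (3, b"A3a"), (0, b"V0b")])
print(header["pairs"], header["tracks"][0]["samples"])
```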

As a preferred scheme, the obtaining of the audio stream and the video stream contained in the recorded target file by analyzing the header information of the recorded target file specifically includes:

obtaining the stream indexes of the video streams and audio streams from the header track in the header information of the recorded target file, and obtaining the corresponding video streams and audio streams of the recorded target file from its media data.

It can be understood that the stream index of each audio/video stream is obtained through the header track in the header information of the recorded target file, and then the video stream and the audio stream written in the media data in the recorded target file can be quickly and accurately positioned.
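Continuing the same idea with a hypothetical toy layout (a header dictionary with per-track (offset, size) entries, not the real MP4 structure), locating each stream's data from the header reduces to slicing the media-data buffer. read_streams is an illustrative name.

```python
def read_streams(header, media_data):
    """Return {stream_index: [sample bytes, ...]} using offsets from the header.

    Assumes the toy layout: header["tracks"][index]["samples"] holds
    (offset, size) pairs that point into media_data.
    """
    streams = {}
    for index, track in header["tracks"].items():
        streams[index] = [media_data[off:off + size] for off, size in track["samples"]]
    return streams


header = {"tracks": {0: {"kind": "video", "samples": [(0, 3), (6, 3)]},
                     1: {"kind": "audio", "samples": [(3, 3)]}}}
media = b"V0aA1aV0b"
streams = read_streams(header, media)
print(streams[0], streams[1])
```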

As a preferred scheme, responding to the user determining the video stream to be played in each window, and reading each video stream together with its corresponding audio stream in synchronization so as to play the audio and video, specifically includes:

in response to the user selecting a target audio and video to be played in a window, determining the stream index corresponding to the target audio and video, and selecting the track corresponding to that stream index;

determining and reading, according to the selected track, the corresponding video stream in the media data and the audio stream synchronized with it, and then creating as many video playing windows as there are selected stream indexes, so as to play a single video or multiple videos.

It can be understood that determining the stream index of the target audio and video selected by the user for a window, and selecting the track corresponding to that index, allows the video stream in the media data and the audio stream synchronized with it to be read accurately through the track during playback; a video playing window is then created for each selected stream index to play a single video or multiple videos.
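The window selection step can be sketched as follows, assuming a hypothetical pairs mapping from each registered video stream index to its audio stream index. plan_windows is an illustrative name; one window is created per selected video stream index, as the text describes.

```python
def plan_windows(selected_video_indexes, pairs):
    """Create one playback window per selected video stream index.

    pairs maps video stream index -> registered audio stream index
    (a hypothetical structure). Returns a list of window descriptions.
    """
    windows = []
    for window_id, v in enumerate(selected_video_indexes):
        if v not in pairs:
            raise KeyError(f"stream index {v} is not a registered video stream")
        windows.append({"window": window_id, "video": v, "audio": pairs[v]})
    return windows


pairs = {0: 2, 1: 3}
single = plan_windows([1], pairs)       # one camera picture
split = plan_windows([0, 1], pairs)     # all camera pictures, split screen
print(single, split)
```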

Correspondingly, the invention also provides an audio and video converging recording and playing device, which comprises an acquisition module, a header information module, a recording module, an analysis module and a playing module;

the acquisition module is used for acquiring the video stream of each camera and the audio stream of each microphone;

the header information module is used for determining data of the target file according to the video stream and the audio stream and generating header information of the target file;

the recording module is used for writing each video stream and each audio stream into the target file according to the header information of the target file, so that each audio stream corresponds to each video stream one by one, and the recording of the target file is completed;

the analysis module is used for analyzing the header information of the recorded target file to obtain an audio stream and a video stream contained in the recorded target file;

and the playing module is used for responding to the video stream that the user determines is to be played in each window, and reading each such video stream together with its corresponding audio stream in synchronization, so as to play the audio and video.

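Preferably, the recording module registers each audio stream and each video stream according to their stream indexes in the header information of the target file, and directly writes the registered video streams and audio streams into the media data of the target file, thereby completing the recording of the target file.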

Preferably, the playing module responds to a target audio and video selected by the user to be played in a window, determines the stream index corresponding to the target audio and video, and selects the track corresponding to that stream index.

Correspondingly, the invention further provides a terminal device, which comprises a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, wherein the processor realizes the audio and video confluence recording and playing method when executing the computer program.

Accordingly, the present invention also provides a computer readable storage medium comprising a stored computer program; when the computer program runs, the device where the computer readable storage medium is located is controlled to execute the audio and video confluence recording and playing method.

Drawings

FIG. 1: a flow chart of the steps of the audio and video converging recording and playing method provided by an embodiment of the invention;

FIG. 2: a schematic structural diagram of the mp4 file format provided by an embodiment of the present invention;

FIG. 3: a schematic diagram of a player playing audio and video, provided by an embodiment of the invention;

FIG. 4: a schematic structural diagram of the audio and video converging recording and playing device provided by an embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.

Example one

Referring to fig. 1, the audio and video converging recording and playing method provided by the embodiment of the present invention includes the following steps S101 to S105:

Step S101: acquiring the video stream of each camera and the audio stream of each microphone.

It should be noted that, in this embodiment, the video stream is acquired through each camera, and the audio stream is acquired through each microphone; it will be appreciated that the video stream and the audio stream acquired for a certain moment or time period are synchronized, i.e. the pictures in the video stream correspond to the sound of the audio stream.

Step S102: determining data of the target file according to the video streams and the audio streams, and generating header information of the target file.

As a preferable solution of this embodiment, the data of the target file includes: the playing time of the target file, the number of the contained audio streams and video streams, the stream index of each video stream in the target file, and the stream index of each audio stream in the target file.

It should be noted that the target file is preferably in the mp4 file format; in this embodiment, the data of the mp4 file include its playing time, the number of audio and video streams it contains, and the stream index of each audio and video stream in the final mp4 file.

To explain further, referring to fig. 2, an mp4 file is composed of boxes, each divided into a Header portion and a Data portion. The Header contains the type and size of the box, and the Data contains sub-boxes or data, so boxes can be nested. The media data box Mdat contains the actual media data; all the audio and video data that are finally decoded and played reside in it. The Track box trak describes one track; an mp4 file may contain one or more tracks (e.g. video tracks and audio tracks), and the track-related information is held in trak, which is a container box containing at least two boxes, tkhd and mdia.
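The box layout described above can be walked with a few lines of Python. This sketch follows the generic ISO base media file format box header (32-bit big-endian size and a four-character type; size 1 means a 64-bit size follows, size 0 means the box runs to the end of the enclosing range). The helper names iter_boxes and box are illustrative, and the demo file is synthetic rather than a playable mp4.

```python
import struct


def iter_boxes(buf, start=0, end=None):
    """Yield (box_type, payload_start, box_end) for each box in buf[start:end]."""
    end = len(buf) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", buf, pos)
        header_len = 8
        if size == 1:                       # 64-bit largesize follows the type
            size = struct.unpack_from(">Q", buf, pos + 8)[0]
            header_len = 16
        elif size == 0:                     # box extends to the end of the range
            size = end - pos
        if size < header_len or pos + size > end:
            raise ValueError(f"malformed box at offset {pos}")
        yield box_type.decode("latin-1"), pos + header_len, pos + size
        pos += size


def box(box_type, payload=b""):
    """Serialize a box with a 32-bit size header (used only for the demo)."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("latin-1")) + payload


# Synthetic file: ftyp, a moov holding two empty trak boxes, and a small mdat.
data = (box("ftyp", b"isom\x00\x00\x02\x00")
        + box("moov", box("trak") + box("trak"))
        + box("mdat", b"\x01\x02"))
top = list(iter_boxes(data))
moov = next(b for b in top if b[0] == "moov")
inner = [t for t, _, _ in iter_boxes(data, moov[1], moov[2])]
print([t for t, _, _ in top], inner)   # ['ftyp', 'moov', 'mdat'] ['trak', 'trak']
```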

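As a preferred solution of this embodiment, generating the header information of the target file specifically includes: respectively writing the stream indexes of the video streams and the audio streams into the header track of the target file, thereby generating the header information of the target file.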

It should be noted that, in this embodiment, the stream index of each audio and video stream is written into a trak of the mp4 file header, thereby generating the header information of the mp4 file to be synthesized.
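Writing a stream index into a trak can be illustrated by serializing a track header box (tkhd), whose track_ID field identifies the track. This is a sketch, not the patent's code: the mapping track_ID = stream index + 1 is an assumption (ISO/IEC 14496-12 does not allow a zero track_ID), the helper names are hypothetical, and the mdia box holds only a handler box, so the output is not a playable file.

```python
import struct


def box(box_type, payload=b""):
    """Serialize one box: 32-bit big-endian size, 4-char type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("latin-1")) + payload


def tkhd(track_id, duration, width=0, height=0, is_audio=False):
    """Version-0 track header box (field order per ISO/IEC 14496-12)."""
    matrix = (0x00010000, 0, 0, 0, 0x00010000, 0, 0, 0, 0x40000000)  # identity
    payload = struct.pack(
        ">IIIIII8xhhhH9iII",
        0x000003,                   # version 0, flags: track enabled | in movie
        0, 0,                       # creation_time, modification_time
        track_id,                   # the stream index is recorded here
        0,                          # reserved
        duration,                   # in movie timescale units
        0, 0,                       # layer, alternate_group
        0x0100 if is_audio else 0,  # volume, 8.8 fixed point
        0,                          # reserved
        *matrix,
        width << 16, height << 16,  # 16.16 fixed point
    )
    return box("tkhd", payload)


def hdlr(handler_type, name):
    """Handler reference box: 'vide' for video tracks, 'soun' for audio tracks."""
    payload = struct.pack(">II4s12x", 0, 0, handler_type.encode("latin-1"))
    return box("hdlr", payload + name.encode("utf-8") + b"\x00")


def trak_for_stream(stream_index, kind, duration, width=1920, height=1080):
    """Hypothetical helper: one trak per stream, track_ID = stream_index + 1.

    mdia here only holds hdlr; a playable file also needs mdhd and minf/stbl.
    """
    is_audio = kind == "audio"
    head = tkhd(stream_index + 1, duration,
                0 if is_audio else width, 0 if is_audio else height, is_audio)
    return box("trak", head + box("mdia", hdlr("soun" if is_audio else "vide", kind)))


t = trak_for_stream(2, "audio", 600)
print(t[4:8], struct.unpack_from(">I", t, 28)[0])   # b'trak' 3
```

Offset 28 is the trak header (8 bytes) plus the tkhd header (8 bytes) plus version/flags and the two 32-bit timestamps (12 bytes).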

Step S103: writing each video stream and each audio stream into the target file according to the header information of the target file, so that each audio stream corresponds one-to-one to each video stream, thereby completing the recording of the target file.

As a preferable solution of this embodiment, writing each video stream and each audio stream into the target file according to its header information, so that the audio streams and video streams correspond one-to-one and the recording of the target file is completed, specifically includes:

according to the stream indexes of the video streams and the audio streams in the header information of the target file, registering each audio stream and each video stream to enable each audio stream to be in one-to-one correspondence with each video stream; and directly writing the video streams and the audio streams which are subjected to the registration into the media data of the target file, thereby completing the recording of the target file.

In this embodiment, after the stream indexes of the audio and video streams are written into the trak boxes of the mp4 file header, the actual audio and video data are written directly into Mdat without decoding or encoding, with the index of each stream corresponding one-to-one to its actual audio and video data, to synthesize the mp4 file. The mp4 file synthesized in this way contains multiple video streams and multiple audio streams.
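One way to obtain the same kind of multi-track result with an existing tool, offered as an assumption since the patent names no software, is ffmpeg's stream copy: each input stream is mapped to its own output stream with -map and copied with -c copy, so nothing is decoded or re-encoded. The helper below (build_remux_command, a hypothetical name) only builds the argument list; it assumes one video stream per camera file and one audio stream per microphone file, in codecs the mp4 muxer accepts (for example H.264 and AAC).

```python
import shlex


def build_remux_command(camera_files, mic_files, output="merged.mp4"):
    """Build an ffmpeg command that stream-copies every input into one MP4.

    Each input becomes its own track: no splicing, no re-encoding.
    """
    cmd = ["ffmpeg"]
    for path in camera_files + mic_files:
        cmd += ["-i", path]
    # Output streams are created in -map order: cameras first, then mics.
    for i in range(len(camera_files)):
        cmd += ["-map", f"{i}:v:0"]
    for j in range(len(mic_files)):
        cmd += ["-map", f"{len(camera_files) + j}:a:0"]
    cmd += ["-c", "copy", output]
    return cmd


cmd = build_remux_command(["cam1.mp4", "cam2.mp4"], ["mic1.m4a", "mic2.m4a"])
print(shlex.join(cmd))
```

With ffmpeg installed, the list could be run with subprocess.run(cmd, check=True); here the cameras would become output streams 0 and 1 and the microphones streams 2 and 3.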

Step S104: obtaining the audio streams and video streams contained in the recorded target file by analyzing its header information.

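As a preferred solution of this embodiment, obtaining the audio streams and video streams contained in the recorded target file by analyzing its header information specifically includes: obtaining the stream indexes of the video streams and audio streams from the header track in the header information of the recorded target file, and obtaining the corresponding video streams and audio streams from the media data of the target file.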

It should be noted that, during playback, the player parses the header information of the mp4 file to obtain the stream index of each stream from its trak, and then obtains the actual audio and video data from Mdat for playing.
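The parsing step can be sketched by walking moov, then each trak, and reading track_ID from tkhd and the handler type ('vide' or 'soun') from mdia/hdlr. The offsets follow ISO/IEC 14496-12 (track_ID comes after version/flags and the creation/modification times, which are 64-bit in tkhd version 1); the function names are illustrative, and the demo uses truncated synthetic boxes that contain only the fields this parser reads.

```python
import struct


def iter_boxes(buf, start, end):
    """Yield (box_type, payload_start, box_end) for boxes in buf[start:end]."""
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", buf, pos)
        header_len = 8
        if size == 1:
            size, header_len = struct.unpack_from(">Q", buf, pos + 8)[0], 16
        elif size == 0:
            size = end - pos
        if size < header_len or pos + size > end:
            raise ValueError(f"malformed box at offset {pos}")
        yield box_type.decode("latin-1"), pos + header_len, pos + size
        pos += size


def child(buf, start, end, wanted):
    """Return (payload_start, box_end) of the first child box of a given type."""
    for box_type, s, e in iter_boxes(buf, start, end):
        if box_type == wanted:
            return s, e
    return None


def list_tracks(buf):
    """Return [(track_ID, handler_type), ...] from moov/trak/tkhd and mdia/hdlr."""
    moov = child(buf, 0, len(buf), "moov")
    if moov is None:
        raise ValueError("no moov box")
    tracks = []
    for box_type, s, e in iter_boxes(buf, *moov):
        if box_type != "trak":
            continue
        tk = child(buf, s, e, "tkhd")
        version = buf[tk[0]]
        track_id = struct.unpack_from(">I", buf, tk[0] + (12 if version == 0 else 20))[0]
        handler = None
        mdia = child(buf, s, e, "mdia")
        if mdia:
            hd = child(buf, *mdia, "hdlr")
            if hd:  # handler_type follows version/flags and pre_defined
                handler = buf[hd[0] + 8:hd[0] + 12].decode("latin-1")
        tracks.append((track_id, handler))
    return tracks


def box(t, payload=b""):
    return struct.pack(">I4s", 8 + len(payload), t.encode("latin-1")) + payload


def fake_trak(track_id, handler):
    tkhd = box("tkhd", struct.pack(">IIII", 0, 0, 0, track_id))  # truncated, version 0
    hdlr = box("hdlr", struct.pack(">II4s12x", 0, 0, handler.encode("latin-1")) + b"\x00")
    return box("trak", tkhd + box("mdia", hdlr))


sample = (box("ftyp", b"isom\x00\x00\x02\x00")
          + box("moov", fake_trak(1, "vide") + fake_trak(3, "soun"))
          + box("mdat"))
print(list_tracks(sample))
```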

It can be understood that the stream index of each audio/video stream is obtained through the header track in the header information of the recorded target file, and then the video stream and the audio stream written in the media data in the recorded target file can be quickly and accurately located.

Step S105: in response to the user determining the video stream to be played in each window, reading each video stream together with its corresponding audio stream in synchronization, thereby playing the audio and video.

As a preferred solution of the embodiment, responding to the user determining the video stream to be played in each window, and reading each video stream together with its synchronized audio stream so as to play the audio and video, specifically includes:

in response to the user selecting a target audio and video to be played in a window, determining the stream index corresponding to the target audio and video, and selecting the track corresponding to that stream index; determining and reading, according to the selected track, the corresponding video stream in the media data and the audio stream synchronized with it, and then creating as many video playing windows as there are selected stream indexes, so as to play a single video or multiple videos.
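Reading the audio that is synchronized with a given video frame can be sketched as a timestamp lookup. Assumptions: each track stores sorted integer timestamps in its own timescale (as MP4 tracks do), and the audio sample chosen is the last one starting at or before the video time. audio_sample_for is a hypothetical name; the patent does not specify the synchronization rule.

```python
from bisect import bisect_right


def audio_sample_for(video_ts, video_timescale, audio_ts_list, audio_timescale):
    """Return the index of the audio sample playing at a video sample's time.

    audio_ts_list must be sorted ascending, in audio_timescale ticks.
    """
    # Convert the video time into audio ticks with integer arithmetic.
    target = video_ts * audio_timescale // video_timescale
    i = bisect_right(audio_ts_list, target) - 1
    return max(i, 0)


# AAC frames of 1024 samples at 48 kHz; frame 50 of 25 fps video in a 90 kHz timescale.
audio_ts = [k * 1024 for k in range(200)]
print(audio_sample_for(180000, 90000, audio_ts, 48000))
```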

It should be noted that, during playback, the video stream to be played is determined from the stream index of the video the user selects; the trak corresponding to that index is selected, the corresponding audio and video data are read from Mdat, and the player creates as many video playing windows as there are selected video stream indexes to play a single video or multiple videos.
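Locating a track's data in Mdat relies on the sample tables stored under the trak (stsc, stsz and stco in ISO/IEC 14496-12). The sketch below assumes those tables have already been parsed into plain lists, with stsc entries reduced to (first_chunk, samples_per_chunk) by dropping the sample description index, and that every sample has an explicit size; sample_locations is an illustrative name.

```python
def sample_locations(stsc, stsz, stco):
    """Compute (file_offset, size) for every sample of one track.

    stsc: (first_chunk, samples_per_chunk) runs, first_chunk is 1-based.
    stsz: per-sample sizes. stco: file offset of each chunk.
    """
    locations = []
    sample = 0
    for chunk_index, chunk_offset in enumerate(stco, start=1):
        # The run that applies is the last one whose first_chunk <= chunk_index.
        per_chunk = next(spc for first, spc in reversed(stsc) if first <= chunk_index)
        offset = chunk_offset
        for _ in range(per_chunk):
            if sample >= len(stsz):
                break
            locations.append((offset, stsz[sample]))
            offset += stsz[sample]  # samples in a chunk are stored back to back
            sample += 1
    return locations


result = sample_locations([(1, 2), (3, 1)], [100, 120, 90, 80, 70], [1000, 5000, 9000])
print(result)
```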

Further, the playback principle is equivalent to creating several sub-players in the player, one per video stream, so that the videos appear visually spliced together. Since each track in the mp4 file contains the position and size of its actual media data, the data in Mdat can be located through the track; the user can therefore select a specific camera picture to play through its track, or play all camera pictures in split-screen mode.
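The sub-player arrangement can be illustrated with a simple split-screen layout. The near-square grid filled row by row is an assumption; the patent only states that one sub-player is created per selected video stream. split_screen is a hypothetical name.

```python
import math


def split_screen(n, width, height):
    """Return (x, y, w, h) rectangles for n sub-player windows."""
    if n <= 0:
        return []
    cols = math.ceil(math.sqrt(n))
    rows = math.ceil(n / cols)
    w, h = width // cols, height // rows
    # Fill the grid row by row, left to right.
    return [((k % cols) * w, (k // cols) * h, w, h) for k in range(n)]


print(split_screen(1, 1920, 1080))
print(split_screen(3, 1920, 1080))
```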

In this embodiment, when the recording and broadcasting host performs converging recording, it analyzes the stream information of each camera video stream and each microphone audio stream, and determines the playing duration of the final merged mp4 file, the number of audio and video streams it contains, and the stream index of each audio and video stream in that file.

According to this result, the header information of the mp4 file to be synthesized is generated, and the media data of each audio and video stream are stored independently, as a separate track identified by its stream index, in the mp4 file to be synthesized, without physically splicing the media data by width and height; the result is output as the mp4 file.

Referring to fig. 3, during playback the player parses the header information of the mp4 file to obtain the number of audio and video streams it contains and the duration and index of each stream, and determines, according to the stream indexes, the video stream to be played in each playing window and the audio stream to be synchronized with it, thereby playing all video pictures in split screen or a single video picture alone. Because the invention does not physically splice the media data by width and height during converging, no audio and video decoding or encoding is needed, the software overhead is lower, and the hardware performance requirements on the recording and broadcasting host are lower; in this mode, the user can choose during playback to watch any single camera picture alone or all camera pictures simultaneously.

The above embodiment has the following effects:

Compared with the prior art, the data and header information of the target file are determined from the video stream of each camera and the audio stream of each microphone, and each video stream and audio stream is written into the target file according to that header information with a one-to-one correspondence between audio and video streams, which completes recording of the target file. No audio and video decoding or encoding is needed during synthesis, so the recording steps are simplified, and the header information lets the synthesized target file hold multiple video streams and multiple audio streams rather than a single one of each. During playback, the video stream the user selects for each window is read together with its synchronized audio stream, so the user can play the pictures of all cameras or select the picture of one camera, which improves the user experience.

Example two

Referring to fig. 4, an audio and video converging recording and playing device provided by the present invention comprises: an acquisition module 201, a header information module 202, a recording module 203, an analysis module 204, and a play module 205.

The acquisition module 201 is configured to acquire the video stream of each camera and the audio stream of each microphone.

The header information module 202 is configured to determine data of the target file according to the video stream and the audio stream, and generate header information of the target file.

The recording module 203 is configured to write each video stream and each audio stream into the target file according to the header information of the target file, so that each audio stream corresponds one-to-one to each video stream, thereby completing recording of the target file.

The analysis module 204 is configured to obtain an audio stream and a video stream contained in the recorded target file by analyzing header information of the recorded target file.

The playing module 205 is configured to respond to the user determining the video stream to be played in each window, and to read each video stream together with its corresponding audio stream in synchronization, thereby playing the audio and video.

Preferably, the data of the target file includes: the playing time of the target file, the number of the contained audio streams and video streams, the stream index of each video stream in the target file, and the stream index of each audio stream in the target file.

Preferably, the recording module 203 is specifically configured for: registering each audio stream and each video stream according to their stream indexes in the header information of the target file, so that the audio streams and video streams correspond one-to-one; and directly writing the registered video streams and audio streams into the media data of the target file, thereby completing the recording of the target file.

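As a preferred scheme of this embodiment, obtaining the audio streams and video streams contained in the recorded target file by analyzing its header information specifically includes: obtaining the stream indexes of the video streams and audio streams from the header track in the header information of the recorded target file, and obtaining the corresponding video streams and audio streams from the media data of the target file.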

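As a preferred scheme of this embodiment, responding to the user determining the video stream to be played in each window, and reading each video stream together with its synchronized audio stream so as to play the audio and video, specifically includes: in response to the user selecting a target audio and video to be played in a window, determining the stream index corresponding to the target audio and video, and selecting the track corresponding to that stream index; determining and reading, according to the selected track, the corresponding video stream in the media data and the audio stream synchronized with it, and then creating as many video playing windows as there are selected stream indexes, so as to play a single video or multiple videos.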

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.

The embodiment of the invention has the following effects:

Example three

Correspondingly, the invention also provides a terminal device, comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the audio and video converging recording and playing method according to any one of the above embodiments.

The terminal device of this embodiment includes a processor, a memory, and a computer program (computer instructions) stored in the memory and executable on the processor. When executing the computer program, the processor implements the steps in the first embodiment, such as steps S101 to S105 shown in fig. 1; alternatively, it implements the functions of the modules/units in the above device embodiment, such as the recording module 203.

Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor to implement the invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, used to describe the execution process of the computer program in the terminal device. For example, the recording module 203 is configured to write each video stream and each audio stream into the target file according to the header information of the target file, so that the audio streams and video streams correspond one-to-one, thereby completing recording of the target file.

The terminal device can be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. The terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that the schematic diagram is merely an example of a terminal device and does not limit it; the terminal device may include more or fewer components than shown, combine certain components, or use different components; for example, it may also include input/output devices, network access devices, buses, and the like.

The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor; the processor is the control center of the terminal device and connects the various parts of the whole terminal device through various interfaces and lines.

The memory may be used to store the computer programs and/or modules, and the processor may implement various functions of the terminal device by running or executing the computer programs and/or modules stored in the memory and calling data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the mobile terminal, and the like. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.

If the modules/units integrated in the terminal device are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods of the above embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.

Example four

Correspondingly, the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, and when the computer program runs, the apparatus where the computer-readable storage medium is located is controlled to execute the audio and video merging recording and playing method according to any of the above embodiments.

The embodiments described above explain the objects, technical solutions and advantages of the present invention in further detail. It should be understood that they are merely specific embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent substitution, improvement, or the like made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.

Claims

1. An audio and video confluence recording and playing method, characterized by comprising the following steps:

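acquiring video streams of all cameras and audio streams of all microphones;

determining data of a target file according to the video streams and the audio streams, and generating header information of the target file;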

writing each video stream and each audio stream into the target file according to the header information of the target file, such that the audio streams correspond one-to-one to the video streams, thereby completing recording of the target file;

obtaining the audio streams and video streams contained in the recorded target file by parsing the header information of the recorded target file;

and, in response to the user determining the video stream to be played in each window, reading, synchronously with each video stream, the audio stream corresponding to that video stream, thereby playing the audio and video.

2. The audio and video confluence recording and playing method according to claim 1, wherein the data of the target file comprises: the playing time of the target file, the number of audio streams and video streams contained in it, the stream index of each video stream in the target file, and the stream index of each audio stream in the target file.

3. The audio and video confluence recording and playing method according to claim 2, wherein generating the header information of the target file specifically comprises:

4. The audio and video confluence recording and playing method according to claim 3, wherein writing each video stream and each audio stream into the target file according to the header information of the target file, such that the audio streams correspond one-to-one to the video streams, thereby completing recording of the target file, specifically comprises:

5. The audio and video confluence recording and playing method according to claim 4, wherein obtaining the audio streams and video streams contained in the recorded target file by parsing the header information of the recorded target file specifically comprises:

6. The audio and video confluence recording and playing method according to claim 5, wherein, in response to the user determining the video stream to be played in each window, reading, synchronously with each video stream, the audio stream corresponding to that video stream so as to play the audio and video specifically comprises:

7. An audio and video confluence recording and playing device, characterized by comprising: an acquisition module, a header information module, a recording module, an analysis module and a playing module;

8. The audio and video confluence recording and playing device according to claim 7, wherein the data of the target file comprises: the playing time of the target file, the number of audio streams and video streams contained in it, the stream index of each video stream in the target file, and the stream index of each audio stream in the target file.

9. A terminal device, comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, wherein the processor, when executing the computer program, implements the audio and video confluence recording and playing method according to any one of claims 1 to 6.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program, wherein the computer program, when run, controls the device on which the computer-readable storage medium is located to execute the audio and video confluence recording and playing method according to any one of claims 1 to 6.
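The claims list the data carried in the header of the target file (claims 2 and 8) but do not give a concrete layout. As a hedged illustration only, the Python sketch below assumes a hypothetical binary header: a magic tag `AVCF`, the playing time in milliseconds, the video and audio stream counts, and then the stream index of each video stream followed by the stream index of each audio stream, all little-endian. It also assumes that the one-to-one correspondence of claim 1 is expressed positionally, so the i-th video index is paired with the i-th audio index. It then shows how a player could find the audio stream for the video stream the user chose in each window. The names `TargetFileHeader` and `audio_for_windows`, the magic tag, and the field widths are all hypothetical and do not come from the patent.

```python
import struct
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical header layout (an assumption, not specified by the patent):
#   magic b"AVCF" (4 bytes), playing time in ms (uint64),
#   video stream count (uint16), audio stream count (uint16),
#   then one uint16 stream index per video stream, then one per audio stream.
_MAGIC = b"AVCF"
_FIXED = struct.Struct("<4sQHH")  # 16 bytes, no padding with "<"


@dataclass
class TargetFileHeader:
    playing_time_ms: int
    video_indices: List[int] = field(default_factory=list)
    audio_indices: List[int] = field(default_factory=list)

    def pairing(self) -> Dict[int, int]:
        """Map each video stream index to its audio stream index (assumed positional, one-to-one)."""
        if len(self.video_indices) != len(self.audio_indices):
            raise ValueError("one-to-one pairing needs equal video and audio stream counts")
        return dict(zip(self.video_indices, self.audio_indices))

    def pack(self) -> bytes:
        """Serialize the header: fixed part first, then video indices, then audio indices."""
        n_v, n_a = len(self.video_indices), len(self.audio_indices)
        body = struct.pack(f"<{n_v + n_a}H", *self.video_indices, *self.audio_indices)
        return _FIXED.pack(_MAGIC, self.playing_time_ms, n_v, n_a) + body

    @classmethod
    def unpack(cls, data: bytes) -> "TargetFileHeader":
        """Parse a header produced by pack() to recover the contained streams."""
        magic, playing_time_ms, n_v, n_a = _FIXED.unpack_from(data, 0)
        if magic != _MAGIC:
            raise ValueError("not a target file header")
        indices = struct.unpack_from(f"<{n_v + n_a}H", data, _FIXED.size)
        return cls(playing_time_ms, list(indices[:n_v]), list(indices[n_v:]))


def audio_for_windows(header: TargetFileHeader, window_choices: Dict[str, int]) -> Dict[str, int]:
    """For each window's chosen video stream index, return the paired audio stream index."""
    pairs = header.pairing()
    return {window: pairs[video] for window, video in window_choices.items()}


if __name__ == "__main__":
    # Three cameras and three microphones, with streams interleaved in the file.
    header = TargetFileHeader(60_000, video_indices=[0, 2, 4], audio_indices=[1, 3, 5])
    parsed = TargetFileHeader.unpack(header.pack())
    print(audio_for_windows(parsed, {"left": 2, "right": 4}))
```

Running the demo prints `{'left': 3, 'right': 5}`: the window showing video stream 2 reads audio stream 3, and the window showing video stream 4 reads audio stream 5. Storing explicit stream indices in the header, rather than splicing all inputs into a single audio and a single video stream as in the background art, is what lets the player select each camera's picture and the matching microphone independently. The positional pairing used here is only one possible way to record that correspondence.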