CN110650307A - QT-based audio and video stream pushing method, device, equipment and storage medium

QT-based audio and video stream pushing method, device, equipment and storage medium

Info

Publication number
CN110650307A
Authority
CN
China
Prior art keywords
video
audio
encoding
original
original audio
Prior art date
Legal status
Pending
Application number
CN201911046707.5A
Other languages
Chinese (zh)
Inventor
曾义
杜其昌
吴艳茹
Current Assignee
Guangzhou Hedong Technology Co Ltd
Original Assignee
Guangzhou Hedong Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Guangzhou Hedong Technology Co Ltd
Priority to CN201911046707.5A
Publication of CN110650307A

Classifications

    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
                • H04N 7/00 Television systems
                    • H04N 7/14 Systems for two-way working
                        • H04N 7/141 Systems for two-way working between two video terminals, e.g. videophone
                    • H04N 7/18 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
                        • H04N 7/183 Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a single remote source
                            • H04N 7/186 Video door telephones
                • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
                    • H04N 19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
                • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
                    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
                        • H04N 21/27 Server based end-user applications
                            • H04N 21/274 Storing end-user multimedia data in response to end-user request, e.g. network recorder
                                • H04N 21/2743 Video hosting of uploaded data from client
                    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
                        • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
                            • H04N 21/4302 Content synchronisation processes, e.g. decoder synchronisation
                                • H04N 21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
                            • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
                                • H04N 21/4402 Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
                            • H04N 21/443 OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB
                                • H04N 21/4431 OS processes characterized by the use of Application Program Interface [API] libraries

Abstract

The embodiments of the present application disclose a QT-based audio and video stream pushing method and device, an electronic device, and a storage medium. The method comprises the following steps: accessing a camera and a microphone of the video intercom system through the API (application programming interface) of QT, and acquiring original audio and video; encoding the original audio and video based on the FFmpeg library; and merging the encoded original audio and video into rtmp streaming media in flv format, and pushing the rtmp streaming media to an nginx server to be pulled by clients on different platforms. With this technical solution, the video intercom system can meet the audio and video encoding requirements of clients on a variety of operating systems and platforms, cross-platform audio and video stream pushing is realized, and development cost is reduced. In addition, during audio and video acquisition, encoding and merging, the audio and video parameters are adjusted according to performance requirements so as to adapt to hardware devices and streaming media servers of different performance, thereby making the audio and video performance adjustable.

Description

QT-based audio and video stream pushing method, device, equipment and storage medium
Technical Field
The embodiments of the present application relate to the technical field of video intercom, and in particular to a QT-based audio and video stream pushing method, device, equipment and storage medium.
Background
At present, as living standards improve, people's awareness of protecting personal and property safety is growing. Residential buildings generally have a unified security door, and when visitors arrive they press the doorbell of the relevant resident to ask the resident to open the door. To better confirm a visitor's identity, building video intercom systems have come into being. As a modern residential community service facility, a video intercom system provides two-way audio and video communication between visitors and residents, so that a visitor and an owner can talk directly over video and the owner can open the security door lock for the visitor, achieving dual recognition by image and voice and improving safety and reliability.
However, in existing video intercom systems, the clients that connect to the door host may run on different platforms such as Android, iOS and Windows. If a separate audio and video encoding scheme is developed for each platform, development takes longer and maintenance becomes costly and difficult.
Disclosure of Invention
The embodiments of the present application provide a QT-based audio and video stream pushing method, device, equipment and storage medium, which can adapt to different operating systems and realize cross-platform audio and video stream pushing.
In a first aspect, an embodiment of the present application provides a QT-based audio and video stream pushing method, including:
accessing a camera and a microphone of the video intercom system through the API (application programming interface) of QT, and acquiring original audio and video;
encoding the original audio and video based on an FFmpeg library;
and merging the encoded original audio and video into rtmp streaming media in flv format, and pushing the rtmp streaming media to an nginx server to be pulled by clients on different platforms.
Further, during the acquisition of the original audio and video, a specified format is selected for acquiring the original audio and video according to the enumerated parameter information of the camera and the microphone of the video intercom system.
Further, the encoding the original audio and video based on the FFmpeg library includes:
converting the original audio into a planar audio frame with a specified format, and converting the original video into a planar video frame with a specified format;
and encoding the planar audio frame and the planar video frame using the corresponding encoding standards to obtain compressed video data and compressed audio data in the corresponding formats.
Further, in converting the original audio into the planar audio frame in the specified format and converting the original video into the planar video frame in the specified format, the planar audio frame is in FLTP format and the planar video frame is in YUV420p format.
Further, in the encoding of the planar audio frame and the planar video frame by using the corresponding encoding standards, the planar audio frame is encoded by using the AAC encoding standard, and the planar video frame is encoded by using the H264 encoding standard.
Further, during the acquisition of the original audio and video, the encoding of the original audio and video, and the merging of the encoded original audio and video into rtmp streaming media in flv format, the audio parameters and the video parameters are adjusted according to performance requirements.
In a second aspect, an embodiment of the present application provides an audio and video stream pushing device based on QT, including:
an acquisition module, used for accessing a camera and a microphone of the video intercom system through the API of QT and acquiring original audio and video;
an encoding module, used for encoding the original audio and video based on the FFmpeg library;
and a merging module, used for merging the encoded original audio and video into rtmp streaming media in flv format and pushing the rtmp streaming media to the nginx server to be pulled by clients on different platforms.
Specifically, in the process of acquiring the original audio and video through QT, the acquisition module selects a specified format for acquiring the original audio and video according to the enumerated parameter information of the camera and the microphone of the video intercom system.
Specifically, the device further includes:
a parameter adjustment module, used for adjusting audio parameters and video parameters according to performance requirements during the acquisition of the original audio and video, the encoding of the original audio and video, and the merging of the encoded original audio and video into rtmp streaming media in flv format.
Specifically, the encoding module includes:
the conversion unit is used for converting the original audio into a planar audio frame with a specified format and converting the original video into a planar video frame with the specified format;
and the compression unit is used for encoding the planar audio frame and the planar video frame using the corresponding encoding standards to obtain compressed video data and compressed audio data in the corresponding formats.
Specifically, the conversion unit converts the original audio into a planar audio frame in a specified format, and converts the original video into a planar video frame in a specified format, where the planar audio frame is in an FLTP format, and the planar video frame is in a YUV420p format.
Specifically, the compression unit encodes the planar audio frame using the AAC coding standard and encodes the planar video frame using the H264 coding standard in encoding the planar audio frame and the planar video frame using the corresponding coding standards.
In a third aspect, an embodiment of the present application provides an electronic device, including:
a memory and one or more processors;
the memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the QT-based audio and video stream pushing method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the QT-based audio and video stream pushing method according to the first aspect.
According to the embodiments of the present application, the original audio and video are acquired through the API of QT, the FFmpeg library is used to encode the original audio and video, the encoded audio and video data are merged into rtmp streaming media in flv format, and the rtmp streaming media is pushed to an nginx server for clients on different platforms to pull. With this technical solution, the video intercom system can meet the audio and video encoding requirements of clients on a variety of operating systems and platforms, cross-platform audio and video stream pushing is realized, and development cost is reduced. In addition, during audio and video acquisition, encoding and merging, the audio and video parameters are adjusted according to performance requirements so as to adapt to hardware devices and streaming media servers of different performance, thereby making the audio and video performance adjustable.
Drawings
Fig. 1 is a flowchart of a QT-based audio and video stream pushing method provided in an embodiment of the present application;
Fig. 2 is a flowchart of the encoding process provided in an embodiment of the present application;
Fig. 3 is a schematic diagram of video capture parameter settings provided in an embodiment of the present application;
Fig. 4 is a schematic diagram of audio and video encoding parameter settings provided in an embodiment of the present application;
Fig. 5 is a schematic diagram of stream pushing parameter settings provided in an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a QT-based audio and video stream pushing device according to the second embodiment of the present application;
Fig. 7 is a schematic structural diagram of an electronic device according to the third embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, specific embodiments of the present application will be described in detail with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some but not all of the relevant portions of the present application are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The QT-based audio and video stream pushing method of the present application aims to acquire on-site audio and video through QT (a cross-platform C++ graphical user interface application framework), encode the audio and video data through the FFmpeg library and push the encoded data to the server side in real time, so that the video intercom system can meet the client audio and video encoding requirements of a variety of operating systems and platforms. In existing video intercom systems, when pushing on-site audio and video, a dedicated audio and video encoding and stream pushing system is usually built for the operating system used by the client, that is, the encoding and stream pushing system only works with clients running that operating system. Because the clients of different video intercom systems may use different operating systems, in order to make one audio and video encoding and stream pushing system suitable for the clients of different video intercom systems, the QT-based audio and video stream pushing method of the embodiments of the present application is provided to solve the incompatibility between the client operating system and the encoding and stream pushing system, thereby realizing cross-platform audio and video stream pushing.
Embodiment 1:
Fig. 1 shows a flowchart of the QT-based audio and video stream pushing method provided in an embodiment of the present application. The method may be executed by a QT-based audio and video stream pushing device, which may be implemented in software and/or hardware and may consist of one physical entity or of two or more physical entities. Typically, the QT-based audio and video stream pushing device is a client terminal such as the door host of a video intercom system.
The following description takes the QT-based audio and video stream pushing device as the body executing the QT-based audio and video stream pushing method. Referring to Fig. 1, the QT-based audio and video stream pushing method specifically includes:
and S110, accessing a camera and a microphone of the visual intercom system through an API (application programming interface) of the QT, and acquiring original audio and video.
Specifically, the embodiments of the present application use the multimedia technology of QT (a cross-platform C++ graphical user interface application framework) to realize audio and video acquisition, relying on its excellent cross-platform support and its rich set of APIs. During audio and video acquisition, the camera and microphone devices of the video intercom system are accessed through the API of QT. For example, the interface provided by the QCamera class in Qt is used to acquire the video data; correspondingly, the QAudioInput class in Qt is used to acquire audio from local devices such as microphones and radio receivers.
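The following is a minimal capture sketch of this step, assuming Qt 5 Multimedia. The QVideoProbe tap on the camera, the Capture class and its slot names are illustrative choices rather than details taken from the patent, and the fixed 44.1 kHz stereo 16-bit PCM format is only an example.

```cpp
// Minimal capture sketch, assuming Qt 5 Multimedia; names are illustrative.
#include <QObject>
#include <QCamera>
#include <QCameraInfo>
#include <QVideoProbe>
#include <QVideoFrame>
#include <QAudioInput>
#include <QAudioFormat>
#include <QAudioDeviceInfo>
#include <QIODevice>

class Capture : public QObject {
    Q_OBJECT
public:
    void start() {
        // Open the door host's default camera and tap its raw frames.
        camera_ = new QCamera(QCameraInfo::defaultCamera(), this);
        probe_  = new QVideoProbe(this);
        if (probe_->setSource(camera_))
            connect(probe_, &QVideoProbe::videoFrameProbed,
                    this, &Capture::onVideoFrame);
        camera_->start();

        // Open the default microphone; QAudioInput::start() returns a
        // QIODevice from which interleaved PCM samples are read.
        QAudioFormat fmt;
        fmt.setSampleRate(44100);
        fmt.setChannelCount(2);
        fmt.setSampleSize(16);
        fmt.setCodec("audio/pcm");
        fmt.setByteOrder(QAudioFormat::LittleEndian);
        fmt.setSampleType(QAudioFormat::SignedInt);
        audio_ = new QAudioInput(QAudioDeviceInfo::defaultInputDevice(), fmt, this);
        pcm_   = audio_->start();
        connect(pcm_, &QIODevice::readyRead, this, &Capture::onAudioReady);
    }

private slots:
    void onVideoFrame(const QVideoFrame &frame) { /* hand the raw frame to the encoder */ }
    void onAudioReady() { const QByteArray pcm = pcm_->readAll(); /* hand PCM to the encoder */ }

private:
    QCamera     *camera_ = nullptr;
    QVideoProbe *probe_  = nullptr;
    QAudioInput *audio_  = nullptr;
    QIODevice   *pcm_    = nullptr;
};
```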
Illustratively, when a visitor uses the door host of the video intercom system to call a resident, the door host acquires the on-site original video data and original audio data through the camera and the microphone configured on it. Correspondingly, the QT-based audio and video stream pushing device acquires the visitor's on-site original audio data and original video data through the API of QT.
When audio and video acquisition is carried out, a specified format is selected for acquiring the original audio and video according to the enumerated parameter information of the camera and the microphone of the video intercom system. For example, cameras with different hardware capabilities differ in the frame rate, format and definition of the video they can capture, and if the format captured by the camera does not meet the system requirements, the quality of the acquired video is affected. Therefore, a specified format is selected for acquiring the original audio and video according to the enumerated parameter information of the camera and the microphone of the video intercom system.
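A sketch of such enumeration is shown below, again assuming Qt 5 Multimedia. The target resolution, frame rate and audio format are placeholder values, and falling back to QAudioDeviceInfo::nearestFormat() is only one possible policy when the requested format is unsupported.

```cpp
// Sketch of format enumeration before capture, assuming Qt 5 Multimedia.
// 640x480 @ 25 fps and 44.1 kHz stereo S16 are placeholder targets.
#include <QCamera>
#include <QCameraViewfinderSettings>
#include <QAudioDeviceInfo>
#include <QAudioFormat>
#include <QSize>
#include <QDebug>

void pickCaptureFormats(QCamera *camera)
{
    camera->load();  // viewfinder settings are only known once the camera is loaded

    // Enumerate the resolutions, frame rates and pixel formats the camera
    // offers and pick the one the stream pushing pipeline expects.
    QCameraViewfinderSettings chosen;
    const auto settings = camera->supportedViewfinderSettings();
    for (const QCameraViewfinderSettings &s : settings) {
        qDebug() << s.resolution() << s.maximumFrameRate() << s.pixelFormat();
        if (s.resolution() == QSize(640, 480) && s.maximumFrameRate() >= 25.0)
            chosen = s;
    }
    if (!chosen.isNull())
        camera->setViewfinderSettings(chosen);

    // Enumerate what the microphone supports and fall back to the nearest format.
    const QAudioDeviceInfo mic = QAudioDeviceInfo::defaultInputDevice();
    qDebug() << mic.supportedSampleRates() << mic.supportedChannelCounts();

    QAudioFormat wanted;
    wanted.setSampleRate(44100);
    wanted.setChannelCount(2);
    wanted.setSampleSize(16);
    wanted.setCodec("audio/pcm");
    wanted.setSampleType(QAudioFormat::SignedInt);
    if (!mic.isFormatSupported(wanted))
        wanted = mic.nearestFormat(wanted);   // closest format the device can deliver
}
```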
S120, encoding the original audio and video based on the FFmpeg library.
In the embodiments of the present application, the acquired original audio and video data are encoded into audio and video data in the corresponding compressed formats using the FFmpeg open-source library. FFmpeg is cross-platform open-source software that can record and convert digital audio and video and turn them into streams. Besides the audio and video codec library libavcodec, it also includes the following components: libavformat, which generates and parses various audio and video container formats, obtains the information needed for decoding to build a decoding context structure, and reads audio and video frames; libavutil, which contains common utility functions and is the foundation of FFmpeg; and libswscale, which performs image format conversion, video scaling and color space conversion. Although FFmpeg is developed on the Linux platform, it can also be compiled and run in other operating system environments, so it meets the audio and video encoding and stream pushing requirements of different operating systems.
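As a minimal illustration, the headers of these components are pulled into the C++ stream pushing code through an extern "C" block, since FFmpeg is a C library. libswresample is not named in the text above but is assumed here as the component normally used for the audio sample-format conversion described later.

```cpp
// FFmpeg is a C library; its headers are wrapped in extern "C" when used
// from Qt/C++ code. One header per component named above, plus libswresample
// (an assumption) for the audio sample-format conversion.
extern "C" {
#include <libavformat/avformat.h>     // container handling, flv/rtmp output
#include <libavcodec/avcodec.h>       // H264 / AAC encoders
#include <libavutil/opt.h>            // common utility helpers
#include <libswscale/swscale.h>       // RGB32 -> YUV420P image conversion
#include <libswresample/swresample.h> // interleaved S16 -> planar FLTP conversion
}
```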
Furthermore, when the original audio and video are encoded, the acquired original audio and video are first converted from interleaved frames into planar frames, and the audio and video data converted into planar frames are then compressed.
Referring to fig. 2, the audio and video encoding process includes:
S1201, converting the original audio into a planar audio frame in a specified format, and converting the original video into a planar video frame in the specified format;
S1202, encoding the planar audio frame and the planar video frame using the corresponding encoding standards to obtain compressed video data and compressed audio data in the corresponding formats.
Because the acquired audio and video data are in an interleaved frame format, the original audio and video data first need to be converted from interleaved frames into planar frames, and the planar audio and video frames are then encoded using the corresponding encoding standards.
Specifically, the acquired original video data are in the RGB32 interleaved frame format and need to be converted into the YUV420P planar video frame format, after which the H264 encoding function of FFmpeg is used to encode the video data and obtain compressed video data in the H264 format. Correspondingly, the original audio data in interleaved frame format are converted into planar audio frames in FLTP format, after which the AAC encoding function of FFmpeg is used to encode the audio data and obtain compressed audio data in the AAC format.
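A condensed sketch of this convert-then-encode step is given below. It assumes that the SwsContext (RGB32 to YUV420P), the SwrContext (interleaved S16 to planar FLTP) and the opened H264/AAC encoder contexts have already been created elsewhere; the helper names are illustrative and error handling is trimmed.

```cpp
// Convert-then-encode sketch for the formats named above; contexts are
// assumed to be set up elsewhere, and helper names are illustrative.
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/channel_layout.h>
#include <libswscale/swscale.h>
#include <libswresample/swresample.h>
}

// Video path: RGB32 interleaved frame -> planar YUV420P -> H264 packet.
AVPacket *encodeVideoFrame(AVCodecContext *h264Ctx, SwsContext *sws,
                           const uint8_t *rgb32, int width, int height, int64_t pts)
{
    AVFrame *yuv = av_frame_alloc();
    yuv->format = AV_PIX_FMT_YUV420P;
    yuv->width  = width;
    yuv->height = height;
    av_frame_get_buffer(yuv, 0);

    const uint8_t *srcSlice[1] = { rgb32 };
    const int srcStride[1] = { 4 * width };            // RGB32: 4 bytes per pixel
    sws_scale(sws, srcSlice, srcStride, 0, height, yuv->data, yuv->linesize);
    yuv->pts = pts;

    AVPacket *pkt = av_packet_alloc();
    avcodec_send_frame(h264Ctx, yuv);
    if (avcodec_receive_packet(h264Ctx, pkt) < 0) {    // encoder may still be buffering
        av_packet_free(&pkt);
        pkt = nullptr;
    }
    av_frame_free(&yuv);
    return pkt;                                        // compressed H264 data, or nullptr
}

// Audio path: S16 interleaved PCM -> planar FLTP -> AAC packet.
// nbSamples must match the AAC encoder's frame_size (typically 1024).
AVPacket *encodeAudioFrame(AVCodecContext *aacCtx, SwrContext *swr,
                           const uint8_t *pcmS16, int nbSamples, int64_t pts)
{
    AVFrame *fltp = av_frame_alloc();
    fltp->format         = AV_SAMPLE_FMT_FLTP;
    fltp->nb_samples     = nbSamples;
    fltp->channel_layout = AV_CH_LAYOUT_STEREO;
    fltp->sample_rate    = aacCtx->sample_rate;
    av_frame_get_buffer(fltp, 0);

    const uint8_t *in[1] = { pcmS16 };
    swr_convert(swr, fltp->data, nbSamples, in, nbSamples);
    fltp->pts = pts;

    AVPacket *pkt = av_packet_alloc();
    avcodec_send_frame(aacCtx, fltp);
    if (avcodec_receive_packet(aacCtx, pkt) < 0) {
        av_packet_free(&pkt);
        pkt = nullptr;
    }
    av_frame_free(&fltp);
    return pkt;                                        // compressed AAC data, or nullptr
}
```

Note that both encoders may buffer frames internally, so a null return here simply means that no packet is ready yet rather than an error.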
S130, merging the encoded original audio and video into rtmp streaming media in flv format, and pushing the rtmp streaming media to an nginx server to be pulled by clients on different platforms.
Correspondingly, the compressed video data and compressed audio data obtained from encoding need to be encapsulated with a transport protocol before they can be pushed as stream data. Common streaming protocols include rtsp, rtmp and hls. The latency of rtmp transmission is usually 1 to 3 seconds, so for a scene with very high real-time requirements such as a video intercom system, the rtmp streaming protocol can meet the demand for real-time streaming media transmission. The video data and audio data are therefore merged into rtmp streaming media using the flv packaging format. A streaming media file packaged in flv format is very small and loads very quickly, so the client can watch the video over the network; this effectively avoids the problem that an SWF file exported after importing the video into Flash is too large to be used well on the network.
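The sketch below illustrates this merging and pushing step with libavformat: an flv output context is opened directly on an rtmp URL and interleaved packets are written to it. The URL and helper names are illustrative assumptions, and the nginx server is assumed to run an rtmp module with a matching application name; error handling is trimmed.

```cpp
// Sketch of merging the compressed streams into flv over rtmp and pushing
// them to an nginx rtmp endpoint; URL and helper names are illustrative.
extern "C" {
#include <libavformat/avformat.h>
}

AVFormatContext *openRtmpOutput(AVCodecContext *h264Ctx, AVCodecContext *aacCtx,
                                AVStream **videoSt, AVStream **audioSt)
{
    const char *url = "rtmp://192.168.1.100/live/door1";      // nginx rtmp application/stream

    AVFormatContext *oc = nullptr;
    avformat_alloc_output_context2(&oc, nullptr, "flv", url); // flv container over rtmp

    // One output stream per encoder, carrying the encoder parameters.
    *videoSt = avformat_new_stream(oc, nullptr);
    avcodec_parameters_from_context((*videoSt)->codecpar, h264Ctx);
    *audioSt = avformat_new_stream(oc, nullptr);
    avcodec_parameters_from_context((*audioSt)->codecpar, aacCtx);

    avio_open(&oc->pb, url, AVIO_FLAG_WRITE);                 // connect to the rtmp server
    avformat_write_header(oc, nullptr);                       // write flv header and metadata
    return oc;
}

// For every encoded packet: rescale its timestamps from the encoder time base
// to the stream time base, then interleave it into the rtmp stream.
// av_interleaved_write_frame() takes ownership of the packet's data.
void pushPacket(AVFormatContext *oc, AVPacket *pkt,
                AVRational encTimeBase, AVStream *st)
{
    av_packet_rescale_ts(pkt, encTimeBase, st->time_base);
    pkt->stream_index = st->index;
    av_interleaved_write_frame(oc, pkt);
}
```

Opening the output context with the "flv" muxer name directly on the rtmp URL is what binds the flv packaging format to the rtmp transport described above.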
Illustratively, the merged streaming media is pushed to the nginx server to be pulled by the indoor extension clients of the video intercom system. By pulling the rtmp streaming media from the nginx server, the indoor extensions perform protocol de-encapsulation, demuxing and decoding on the rtmp streaming media, restore the data to the original audio and video, and play the audio and video on the indoor extension clients after audio and video synchronization. The resident can then view the audio and video through the indoor extension to see the on-site situation at the door host where the visitor is, check the visitor information, and better confirm the visitor's identity to ensure personal safety.
Furthermore, because the original audio and video are acquired using QT and encoded with the FFmpeg library, the method and device can adapt to different operating systems and realize cross-platform audio and video stream pushing. The client pulling the rtmp streaming media can therefore also be a mobile terminal such as a resident's mobile phone. After the audio and video data at the visitor's side are acquired by the door host, they are encoded, merged and pushed to the server. The resident installs QT-based audio and video stream pulling software on the mobile terminal, uses it to pull the rtmp streaming media, performs protocol de-encapsulation, demuxing and decoding, restores the data to the original audio and video, and plays the audio and video on the mobile terminal after audio and video synchronization. In this way cross-platform audio and video stream pushing is further realized, and the user experience is further optimized.
In addition, referring to Figs. 3 to 5, during audio and video acquisition, encoding and stream pushing, the QT-based audio and video stream pushing method of the embodiments of the present application adjusts the audio parameters and video parameters according to performance requirements. As shown in Fig. 3, the video acquisition parameters are adjusted to adapt to cameras of different capability, so that the performance of the acquired video frames, and hence the video performance, can be adjusted. Similarly, for the audio acquisition parameters, values such as the number of channels, the encoder id, the channel layout and the sampling rate are adjusted, so that the performance of the acquired audio frames, and hence the audio performance, can be adjusted. As shown in Fig. 4, during audio and video encoding, the parameters of the audio encoder and the video encoder are adjusted separately. For the video encoder, parameters such as the number of encoding threads, the bit rate, the width and height of the output image, the key-frame period, the maximum number of B-frames, the pixel format of the output frame and the compression rate are adjusted so as to tune the compressed video data frames; for the audio encoder, parameters such as the audio sampling rate, the number of audio channels, the number of bytes in a frame of audio data, the sampling format, the resampling format, the number of encoding threads, the bit rate and the per-channel sample size are adjusted so as to tune the compressed audio data frames. As shown in Fig. 5, during stream pushing, the streaming protocol settings and the packaging format are adjusted; the embodiments of the present application use the flv packaging format and the rtmp streaming protocol to merge the audio and video data. Through this parameter adjustment during acquisition, encoding and merging, the QT-based audio and video stream pushing system keeps the audio and video data parameters controllable, so that the audio and video performance at each stage can be adjusted and audio and video data with different performance characteristics can be obtained.
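As a rough illustration of the tunable encoder parameters listed above, the sketch below sets them on the FFmpeg encoder contexts. Every concrete value is a placeholder for a modest 640x480, 25 fps door host, and the channels/channel_layout fields follow the older FFmpeg channel-layout API.

```cpp
// Placeholder encoder configuration illustrating the adjustable parameters.
extern "C" {
#include <libavcodec/avcodec.h>
#include <libavutil/channel_layout.h>
}

void configureEncoders(AVCodecContext *h264Ctx, AVCodecContext *aacCtx)
{
    // Video encoder: threads, bit rate, output size, key-frame period,
    // maximum number of B-frames, output pixel format.
    h264Ctx->thread_count = 4;
    h264Ctx->bit_rate     = 800 * 1000;          // 800 kbit/s
    h264Ctx->width        = 640;
    h264Ctx->height       = 480;
    h264Ctx->time_base    = AVRational{1, 25};   // 25 fps
    h264Ctx->gop_size     = 25;                  // one key frame per second
    h264Ctx->max_b_frames = 0;                   // no B-frames keeps intercom latency low
    h264Ctx->pix_fmt      = AV_PIX_FMT_YUV420P;

    // Audio encoder: sample rate, channel count/layout, sample format, bit rate, threads.
    aacCtx->thread_count   = 2;
    aacCtx->sample_rate    = 44100;
    aacCtx->channels       = 2;
    aacCtx->channel_layout = AV_CH_LAYOUT_STEREO;
    aacCtx->sample_fmt     = AV_SAMPLE_FMT_FLTP;
    aacCtx->bit_rate       = 128 * 1000;         // 128 kbit/s
}
```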
In summary, the original audio and video are acquired through the API (application programming interface) of QT, the FFmpeg library is used to encode the original audio and video, the encoded audio and video data are merged into rtmp streaming media in flv format, and the rtmp streaming media is pushed to an nginx server for clients on different platforms to pull. With this technical solution, the video intercom system can meet the audio and video encoding requirements of clients on a variety of operating systems and platforms, cross-platform audio and video stream pushing is realized, and development cost is reduced. In addition, during audio and video acquisition, encoding and merging, the audio and video parameters are adjusted according to performance requirements so as to adapt to hardware devices and streaming media servers of different performance, thereby making the audio and video performance adjustable.
Embodiment 2:
On the basis of the foregoing embodiment, Fig. 6 is a schematic structural diagram of a QT-based audio and video stream pushing device according to the second embodiment of the present application. Referring to Fig. 6, the QT-based audio and video stream pushing device provided in this embodiment specifically includes: an acquisition module 21, an encoding module 22 and a merging module 23.
The acquisition module 21 is used for accessing a camera and a microphone of the video intercom system through the API (application programming interface) of QT and acquiring original audio and video;
the encoding module 22 is used for encoding the original audio and video based on the FFmpeg library;
and the merging module 23 is used for merging the encoded original audio and video into rtmp streaming media in flv format and pushing the rtmp streaming media to the nginx server to be pulled by clients on different platforms.
In this way, the original audio and video are acquired through the API (application programming interface) of QT, the FFmpeg library is used to encode the original audio and video, the encoded audio and video data are merged into rtmp streaming media in flv format, and the rtmp streaming media is pushed to an nginx server for clients on different platforms to pull. With this technical solution, the video intercom system can meet the audio and video encoding requirements of clients on a variety of operating systems and platforms, cross-platform audio and video stream pushing is realized, and development cost is reduced. In addition, during audio and video acquisition, encoding and merging, the audio and video parameters are adjusted according to performance requirements so as to adapt to hardware devices and streaming media servers of different performance, thereby making the audio and video performance adjustable.
Specifically, in the process of acquiring the original audio and video through QT, the acquisition module selects a specified format for acquiring the original audio and video according to the enumerated parameter information of the camera and the microphone of the video intercom system.
Specifically, the device further includes:
a parameter adjustment module, used for adjusting audio parameters and video parameters according to performance requirements during the acquisition of the original audio and video, the encoding of the original audio and video, and the merging of the encoded original audio and video into rtmp streaming media in flv format.
Specifically, the encoding module includes:
the conversion unit is used for converting the original audio into a planar audio frame with a specified format and converting the original video into a planar video frame with the specified format;
and the compression unit is used for encoding the planar audio frame and the planar video frame using the corresponding encoding standards to obtain compressed video data and compressed audio data in the corresponding formats.
Specifically, the conversion unit converts the original audio into a planar audio frame in a specified format, and converts the original video into a planar video frame in a specified format, where the planar audio frame is in an FLTP format, and the planar video frame is in a YUV420p format.
Specifically, the compression unit encodes the planar audio frame using the AAC coding standard and encodes the planar video frame using the H264 coding standard in encoding the planar audio frame and the planar video frame using the corresponding coding standards.
The QT-based audio and video stream pushing device provided by the second embodiment of the application can be used for executing the QT-based audio and video stream pushing method provided by the first embodiment of the application, and has corresponding functions and beneficial effects.
Embodiment 3:
An embodiment of the present application provides an electronic device. Referring to Fig. 7, the electronic device includes: a processor 31, a memory 32, a communication module 33, an input device 34, and an output device 35. The number of processors in the electronic device may be one or more, and the number of memories in the electronic device may be one or more. The processor, memory, communication module, input device and output device of the electronic device may be connected by a bus or in other ways.
The memory 32, as a computer-readable storage medium, can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the QT-based audio and video stream pushing method of any embodiment of the present application (for example, the acquisition module, the encoding module and the merging module in the QT-based audio and video stream pushing device). The memory 32 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application program required for at least one function, and the data storage area may store data created according to the use of the device. Further, the memory may include high-speed random access memory, and may also include non-volatile memory such as at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some examples, the memory may further include memory located remotely from the processor, and such remote memory may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The communication module 33 is used for data transmission.
The processor 31 executes the various functional applications and data processing of the device by running the software programs, instructions and modules stored in the memory, thereby implementing the QT-based audio and video stream pushing method described above.
The input device 34 may be used to receive entered numeric or character information and to generate key signal inputs relating to user settings and function controls of the apparatus. The output device 35 may include a display device such as a display screen.
The electronic device provided above can be used to execute the QT-based audio and video stream pushing method provided in the first embodiment, and has the corresponding functions and beneficial effects.
Embodiment 4:
the embodiment of the present application further provides a storage medium containing computer executable instructions, which when executed by a computer processor, are configured to perform a QT-based audio and video plug flow method, where the QT-based audio and video plug flow method includes: accessing a camera and a microphone of the visual intercom system through an API (application programming interface) of the QT, and acquiring original audio and video; encoding the original audio and video based on an FFmpeg library; and merging the coded original audio and video into an rtmp streaming media in an flv format, and pushing the rtmp streaming media to an nginx server for being pulled by clients of different platforms.
Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROMs, floppy disks or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory or magnetic media (e.g., a hard disk or optical storage); registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in the first computer system in which the program is executed, or in a different, second computer system connected to the first computer system through a network (such as the internet). The second computer system may provide program instructions to the first computer for execution. The term "storage medium" may include two or more storage media residing in different locations, e.g., in different computer systems connected by a network. The storage medium may store program instructions (e.g., embodied as a computer program) executable by one or more processors.
Of course, the computer-executable instructions contained in the storage medium provided in the embodiments of the present application are not limited to the method operations described above, and may also perform related operations in the QT-based audio and video stream pushing method provided in any embodiment of the present application.
The QT-based audio and video stream pushing device, storage medium and electronic device provided in the above embodiments can execute the QT-based audio and video stream pushing method provided in any embodiment of the present application; for technical details not described in detail in the above embodiments, reference may be made to the QT-based audio and video stream pushing method provided in any embodiment of the present application.
The foregoing is considered as illustrative of the preferred embodiments of the invention and the technical principles employed. The present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present application has been described in more detail with reference to the above embodiments, the present application is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the claims.

Claims (10)

1. A QT-based audio and video stream pushing method, characterized by comprising the following steps:
accessing a camera and a microphone of the video intercom system through the API (application programming interface) of QT, and acquiring original audio and video;
encoding the original audio and video based on an FFmpeg library;
and merging the encoded original audio and video into rtmp streaming media in flv format, and pushing the rtmp streaming media to an nginx server to be pulled by clients on different platforms.
2. The QT-based audio and video stream pushing method according to claim 1, characterized in that, in the acquisition of the original audio and video through QT, a specified format is selected for the acquisition of the original audio and video according to enumerated parameter information of the camera and the microphone of the video intercom system.
3. The QT-based audio and video stream pushing method according to claim 1, wherein the encoding of the original audio and video based on the FFmpeg library comprises:
converting the original audio into a planar audio frame with a specified format, and converting the original video into a planar video frame with a specified format;
and encoding the planar audio frame and the planar video frame using the corresponding encoding standards to obtain the compressed video data and the compressed audio data in the corresponding formats.
4. The QT-based audio and video stream pushing method according to claim 3, wherein, in the converting of the original audio into the planar audio frame of the specified format and the converting of the original video into the planar video frame of the specified format, the planar audio frame is in FLTP format and the planar video frame is in YUV420p format.
5. The QT-based audio and video stream pushing method according to claim 3, wherein, in the encoding of the planar audio frame and the planar video frame using the corresponding encoding standards, the planar audio frame is encoded using the AAC encoding standard and the planar video frame is encoded using the H264 encoding standard.
6. The QT-based audio and video stream pushing method according to claim 1, wherein audio parameters and video parameters are adjusted according to performance requirements during the acquisition of the original audio and video, the encoding of the original audio and video, and the merging of the encoded original audio and video into rtmp streaming media in flv format.
7. A QT-based audio and video stream pushing device, characterized by comprising:
an acquisition module, used for accessing a camera and a microphone of the video intercom system through the API of QT and acquiring original audio and video;
an encoding module, used for encoding the original audio and video based on the FFmpeg library;
and a merging module, used for merging the encoded original audio and video into rtmp streaming media in flv format and pushing the rtmp streaming media to an nginx server to be pulled by clients on different platforms.
8. The QT-based audio and video stream pushing device according to claim 7, further comprising:
a parameter adjustment module, used for adjusting audio parameters and video parameters according to performance requirements during the acquisition of the original audio and video, the encoding of the original audio and video, and the merging of the encoded original audio and video into rtmp streaming media in flv format.
9. An electronic device, comprising:
a memory and one or more processors;
the memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the QT-based audio and video stream pushing method according to any one of claims 1-6.
10. A storage medium containing computer-executable instructions for performing the QT-based audio and video stream pushing method according to any one of claims 1-6 when executed by a computer processor.
CN201911046707.5A 2019-10-30 2019-10-30 QT-based audio and video plug flow method, device, equipment and storage medium Pending CN110650307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911046707.5A CN110650307A (en) 2019-10-30 2019-10-30 QT-based audio and video plug flow method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911046707.5A CN110650307A (en) 2019-10-30 2019-10-30 QT-based audio and video plug flow method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110650307A true CN110650307A (en) 2020-01-03

Family

ID=68995179

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911046707.5A Pending CN110650307A (en) 2019-10-30 2019-10-30 QT-based audio and video plug flow method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110650307A (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102821308A (en) * 2012-06-04 2012-12-12 西安交通大学 Multi-scene streaming media courseware recording and direct-broadcasting method
CN104469122A (en) * 2013-09-17 2015-03-25 程媛 Image acquisition system
US9875504B1 (en) * 2014-02-16 2018-01-23 Evan Gates Roe Real-time video streaming of marine life for sale
CN106454279A (en) * 2016-12-03 2017-02-22 河池学院 Embedded-type video monitoring system
CN106998268A (en) * 2017-04-05 2017-08-01 网宿科技股份有限公司 A kind of optimization method and system and plug-flow terminal based on plug-flow terminal network situation
CN108616722A (en) * 2018-04-18 2018-10-02 中南大学 A kind of embedded high definition video acquisition and data streaming system
CN109743643A (en) * 2019-01-16 2019-05-10 成都合盛智联科技有限公司 The processing method and processing device of building conversational system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
彭力 (Peng Li): 《嵌入式物联网技术应用》 [Applications of Embedded Internet of Things Technology], Xidian University Press, 31 January 2015 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111314646A (en) * 2020-02-27 2020-06-19 Oppo(重庆)智能科技有限公司 Image acquisition method, image acquisition device, terminal device and readable storage medium
CN111314646B (en) * 2020-02-27 2021-04-23 Oppo(重庆)智能科技有限公司 Image acquisition method, image acquisition device, terminal device and readable storage medium
CN111478915A (en) * 2020-04-14 2020-07-31 广州酷狗计算机科技有限公司 Live broadcast data stream pushing method and device, terminal and storage medium
CN111478915B (en) * 2020-04-14 2022-10-14 广州酷狗计算机科技有限公司 Live broadcast data stream pushing method and device, terminal and storage medium
CN112423018A (en) * 2020-10-27 2021-02-26 深圳Tcl新技术有限公司 Media file coding transmission method, device, equipment and readable storage medium
CN112437263A (en) * 2020-11-12 2021-03-02 成都麦赛科技有限公司 Video plug flow method and system
CN113157607A (en) * 2021-05-20 2021-07-23 中国第一汽车股份有限公司 Equipment adaptation method, device, storage medium and computer equipment
CN113784073A (en) * 2021-09-28 2021-12-10 深圳万兴软件有限公司 Method, device and related medium for synchronizing sound and picture of sound recording and video recording
CN114007138A (en) * 2021-11-01 2022-02-01 南京淡兰消防科技有限公司 Method for realizing h5 webpage end playing with video control through rtsp video stream-to-flv format

Similar Documents

Publication Publication Date Title
CN110650307A (en) QT-based audio and video plug flow method, device, equipment and storage medium
CN110753202B (en) Audio and video synchronization method, device, equipment and storage medium of video intercom system
CN106412517B (en) Remote high-definition video monitoring and unlocking system
CN110430441B (en) Cloud mobile phone video acquisition method, system, device and storage medium
US10742901B2 (en) Audio/video recording and communication devices with multiple cameras for superimposing image data
CN101505365A (en) Real-time video monitoring system implementing method based on network television set-top box
CN112565224B (en) Video processing method and device
CN101790085A (en) Implementation method of family video monitoring system based on DaVinci technology
CN103414890A (en) Realization method of remote desktop and device thereof
CN108055595B (en) Video image redirection method and computer-readable storage medium
CN113225585A (en) Video definition switching method and device, electronic equipment and storage medium
CN112995730A (en) Sound and picture synchronous adjustment method and device, electronic equipment and medium
CN111510720A (en) Real-time streaming media data transmission method, electronic device and server
CN111885412B (en) HDMI signal screen transmission method and wireless screen transmission device
CN114630051A (en) Video processing method and system
WO2024022317A1 (en) Video stream processing method and apparatus, storage medium, and electronic device
CN110650308A (en) QT-based audio and video stream pulling method, device, equipment and storage medium
EP3013025A1 (en) Multimedia data transmission method, and apparatus
CN101699857A (en) Implementation method for cross-platform videophone system between set-top box and computer
TWI718957B (en) Remote-end instant image supporting system and method
CN115396621A (en) Network push flow control method, device, equipment and storage medium based on RK628D
JP2015513731A (en) Method, apparatus and system for implementing video monitoring based on universal plug and play
CN102364934A (en) Method and system for realizing remote monitoring video transmission by utilizing video telephone
CN103945160A (en) Method and device for realizing remote video conversation in Android system
Xueya et al. Development of wireless mobile video surveillance on windows mobile using DirectShow technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200103