CN110650308A - QT-based audio and video stream pulling method, device, equipment and storage medium - Google Patents


Info

Publication number
CN110650308A
CN110650308A (application CN201911047950.9A)
Authority
CN
China
Prior art keywords
audio
video
data
decoding
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911047950.9A
Other languages
Chinese (zh)
Inventor
曾义
杜其昌
吴艳茹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Hedong Technology Co Ltd
Original Assignee
Guangzhou Hedong Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Hedong Technology Co Ltd filed Critical Guangzhou Hedong Technology Co Ltd
Priority to CN201911047950.9A priority Critical patent/CN110650308A/en
Publication of CN110650308A publication Critical patent/CN110650308A/en
Pending legal-status Critical Current


Classifications

    • H: Electricity
    • H04: Electric communication technique
    • H04N: Pictorial communication, e.g. television
    • H04N7/141: Systems for two-way working between two video terminals, e.g. videophone
    • H04N19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/44: Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N21/4307: Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/4341: Demultiplexing of audio and video streams
    • H04N21/439: Processing of audio elementary streams
    • H04N21/4402: Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/8547: Content authoring involving timestamps for synchronizing content
    • H04N7/186: Video door telephones

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiments of the application disclose a QT-based audio and video stream pulling method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: using the FFmpeg library to pull a streaming media data packet from a server in real time, and carrying out protocol decoding on the streaming media data packet to obtain audio and video compressed data; decoding the audio and video compressed data based on the FFmpeg library to obtain original audio data and original video data; and drawing the original video data on a Qt interface, and writing the original audio data into a Qt loudspeaker IO for playing. By adopting these technical means, the dependence of the audio/video stream on a specific architecture can be eliminated, ensuring the portability of the system, allowing the intercom system to adapt to different operating systems, and realizing cross-platform audio and video stream pulling. Furthermore, during audio and video decoding and playing, audio and video parameters are adjusted according to performance requirements so as to adapt to hardware devices with different performance and decoding requirements, thereby realizing adjustment of audio and video performance.

Description

QT-based audio and video stream pulling method, device, equipment and storage medium
Technical Field
The embodiments of the application relate to the technical field of video intercom, and in particular to a QT-based audio and video stream pulling method, apparatus, device, and storage medium.
Background
At present, along with the improvement of people's living standards, awareness of protecting personal and property safety has gradually strengthened. A residential building typically has a unified security door; when visitors arrive, they press the doorbell of the relevant resident to ask the resident to open the door. To better confirm a visitor's identity, building video intercom systems have emerged. As a modern residential community service facility, a video intercom system provides two-way video communication between visitors and residents: a visitor and an owner can communicate directly through video, and the owner can open the anti-theft door lock for the visitor. Double recognition by image and voice thereby improves safety and reliability.
However, in existing video intercom systems, the clients of indoor extensions may run on different platforms such as Android, iOS, and Windows. If a separate audio/video decoding system is built for each platform, development time is long, and maintenance is costly and difficult.
Disclosure of Invention
The embodiment of the application provides an audio and video stream pulling method and device based on QT, electronic equipment and a storage medium, which can adapt to different operating systems and realize cross-platform audio and video stream pulling.
In a first aspect, an embodiment of the present application provides an audio and video stream pulling method based on QT, including:
using an FFmpeg library to pull a streaming media data packet from a server in real time, and carrying out protocol decoding on the streaming media data packet to obtain audio and video compression data;
decoding the audio and video compressed data based on an FFmpeg library to obtain original audio data and original video data;
and drawing the original video data on a Qt interface, and writing the original audio data into a Qt loudspeaker IO for playing.
Further, in the process of carrying out protocol decoding on the streaming media data packet to obtain audio and video compressed data, the signaling data is deleted from the streaming media data packet to obtain the audio and video compressed data.
Further, the decoding the audio/video compressed data based on the FFmpeg library to obtain original audio data and original video data includes:
decapsulating the audio and video compressed data, and dividing the audio and video compressed data into audio compressed data and video compressed data;
respectively decoding the audio compressed data and the video compressed data by using corresponding decoders to obtain corresponding plane audio frames and plane video frames;
and resampling the planar audio frame into an interleaved frame corresponding to the original audio data, and converting the planar video frame into an interleaved frame corresponding to the original video data.
Further, in the step of decoding the audio compressed data and the video compressed data by using the corresponding decoders, the corresponding decoding modules are selected for decoding according to the compression coding standard.
Further, in the obtaining of the corresponding planar audio frame and planar video frame, the planar audio frame is in an FLTP format, and the planar video frame is in a YUV420p format.
Further, in the processes of decoding the audio and video compressed data, drawing the original video data on a Qt interface, and writing the original audio data into a Qt loudspeaker IO for playing, audio parameters and video parameters are adjusted according to performance requirements.
In a second aspect, an embodiment of the present application provides an audio and video stream pulling apparatus based on QT, including:
the pulling module is used for pulling a streaming media data packet from a server in real time by using the FFmpeg library, and carrying out protocol decoding on the streaming media data packet to obtain audio and video compressed data;
the decoding module is used for decoding the audio and video compressed data based on the FFmpeg library to obtain original audio data and original video data;
and the playing module is used for drawing the original video data on a Qt interface and writing the original audio data into a Qt loudspeaker IO for playing.
Specifically, the pulling module deletes signaling data from the streaming media data packet to obtain audio/video compressed data in the process of carrying out protocol decoding on the streaming media data packet to obtain the audio/video compressed data.
Specifically, the apparatus further includes:
and the parameter adjusting module is used for adjusting the audio parameters and the video parameters according to the performance requirements when decoding the audio and video compressed data, drawing the original video data on a Qt interface and writing the original audio data into a Qt loudspeaker IO for playing.
Specifically, the decoding module includes:
the splitting unit is used for decapsulating the audio and video compressed data and splitting it into audio compressed data and video compressed data;
the decoding unit is used for respectively decoding the audio compressed data and the video compressed data by using corresponding decoders to obtain corresponding plane audio frames and plane video frames;
and the conversion unit is used for resampling the planar audio frame into an interleaved frame corresponding to the original audio data and converting the planar video frame into an interleaved frame corresponding to the original video data.
Specifically, the decoding unit selects a corresponding decoding module to decode according to a compression coding standard in decoding the audio compressed data and the video compressed data respectively by using a corresponding decoder.
Specifically, in the case that the decoding unit obtains the corresponding planar audio frame and the planar video frame, the planar audio frame is in the FLTP format, and the planar video frame is in the YUV420p format.
In a third aspect, an embodiment of the present application provides an electronic device, including:
a memory and one or more processors;
the memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the QT-based audio and video stream pulling method as described in the first aspect.
In a fourth aspect, embodiments of the present application provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the QT-based audio and video stream pulling method as described in the first aspect.
According to the embodiments of the application, a streaming media data packet is pulled from the server side in real time through the FFmpeg library, audio and video compressed data are obtained through protocol decoding, the original video data are drawn on a Qt interface after the audio and video compressed data are decoded, and the original audio data are written into a Qt loudspeaker IO for playing. By adopting these technical means, the dependence of the audio/video stream on a specific architecture can be eliminated, ensuring the portability of the system, allowing the intercom system to adapt to different operating systems, and realizing cross-platform audio and video stream pulling. Furthermore, during audio and video decoding and playing, audio and video parameters are adjusted according to performance requirements so as to adapt to hardware devices with different performance and decoding requirements, thereby realizing adjustment of audio and video performance.
Drawings
Fig. 1 is a flowchart of an audio and video stream pulling method based on QT according to an embodiment of the present application;
fig. 2 is a flowchart of audio/video decoding according to an embodiment of the present application;
FIG. 3 is a flow chart of the FFmpeg library processing audio and video;
fig. 4 is a schematic structural diagram of an audio and video stream pulling device based on QT according to a second embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to a third embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, specific embodiments of the present application will be described in detail with reference to the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be further noted that, for the convenience of description, only some but not all of the relevant portions of the present application are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The application provides a QT-based audio and video stream pulling method that pulls a streaming media data packet from a server in real time through the FFmpeg library, performs protocol decoding, de-encapsulation and decoding operations, and plays the decoded original audio and video data through Qt (a cross-platform C++ graphical user interface framework). The method thereby satisfies the audio and video decoding and playing requirements of video intercom clients on various operating systems and platforms. In existing video intercom systems, by contrast, an audio/video decoding and playing system is usually built specifically for the operating system used by the indoor extension client of the video intercom system. Such a system is generally applicable only to the operating system of the corresponding client and cannot shed its dependence on a specific architecture. Because the clients of a video intercom system may use different operating systems, and in order to make the audio and video stream pulling system adaptable to the operating systems of different clients and therefore portable, the QT-based audio and video stream pulling method provided by the embodiments of the application uses Qt and the FFmpeg library to solve the incompatibility between the operating systems of indoor extension clients and the audio/video streaming system, thereby achieving cross-platform audio and video stream pulling.
The first embodiment is as follows:
fig. 1 shows a flowchart of a QT-based audio and video stream pulling method provided in an embodiment of the present application. The method may be executed by a QT-based audio and video stream pulling device, which may be implemented in software and/or hardware and may be formed by one physical entity or by two or more physical entities. Generally, the QT-based audio and video stream pulling device is a client such as an indoor extension of a video intercom system.
The following description will be given by taking a QT-based audio/video stream pulling device as an example of a device for executing a QT-based audio/video stream pulling method. Referring to fig. 1, the QT-based audio and video stream pulling method specifically includes:
s110, using an FFmpeg library to pull a streaming media data packet from a server in real time, and carrying out protocol decoding on the streaming media data packet to obtain audio and video compression data.
Specifically, in the embodiment of the application, the FFmpeg library is used to pull the streaming media data packet from an nginx server. The streaming media data packet is transmitted using a streaming protocol, and the FFmpeg library conveniently supports data acquisition at the protocol layer. The streaming media transmission protocol used in the embodiment of the application is the Real-Time Messaging Protocol (RTMP), a network protocol designed for real-time data communication, which satisfies the video intercom requirement of pushing live audio and video data from the door-entry host to the indoor extension for real-time playing. After pulling the RTMP streaming media data packet, the QT-based audio and video stream pulling device deletes the signaling data from the original streaming protocol data in the packet and keeps only the audio and video compressed data. After this protocol decoding, the audio and video compressed data are output in the flv format.
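The signaling-stripping step above can be sketched as pure filtering logic. In the FLV container, the first byte of a tag header gives the tag type: 8 carries audio, 9 carries video, and 18 carries script/signaling data. The sketch below is a simplified model for illustration, not FFmpeg's actual API; the type names and layout are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Simplified model of an FLV tag: in a real tag header, the first byte is
// the tag type (8 = audio, 9 = video, 18 = script data / signaling).
struct FlvTag {
    uint8_t type;
    std::vector<uint8_t> payload; // compressed audio/video or script data
};

// Keep only the compressed audio and video tags, dropping signaling data,
// mirroring what the pulling module does during protocol decoding.
std::vector<FlvTag> keepAudioVideo(const std::vector<FlvTag>& tags) {
    std::vector<FlvTag> out;
    for (const FlvTag& t : tags) {
        if (t.type == 8 || t.type == 9)
            out.push_back(t);
    }
    return out;
}
```

Feeding this filter a mixed tag sequence leaves only the compressed audio/video payloads that the decoding stage consumes.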
And S120, decoding the audio and video compressed data based on the FFmpeg library to obtain original audio data and original video data.
When the audio and video compressed data are decoded, they first need to be de-encapsulated, that is, the compressed and encoded audio data and video data are separated. Common container formats are mp4, rmvb, flv and avi; during encapsulation, the compressed and encoded audio and video data are multiplexed together for streaming, so they must be separated again before decompression. Corresponding to the flv-format audio and video compressed data obtained by protocol decoding in the embodiment of the application, de-encapsulation outputs H264-encoded video compressed data and AAC-encoded audio compressed data.
Then, the H264 encoded video compressed data and the AAC encoded audio compressed data are decoded, and the decoded data are restored to uncompressed audio data and video data.
The audio and video decoding process comprises the following steps:
s1201, decapsulating the audio and video compressed data, and dividing the audio and video compressed data into audio compressed data and video compressed data;
s1202, decoding the audio compression data and the video compression data by using corresponding decoders respectively to obtain corresponding plane audio frames and plane video frames;
s1203, resampling the planar audio frame into a cross access frame corresponding to the original audio data, and converting the planar video frame into a cross access frame corresponding to the original video data.
Specifically, when the FFmpeg library de-encapsulates the audio and video compressed data obtained by protocol decoding, it determines the format type of the file according to the file header information, parses the file format type, and then selects a suitable splitter for demultiplexing. Different file types indicate that different standards were adopted when the audio and video streams were encapsulated, so different container formats need different splitters: for example, an mkv-format file needs an mkv splitter, and an flv-format file needs an flv splitter. The splitter stores the separated audio and video streams in their respective buffers and, at the same time, extracts and collects the timestamps that ensure the audio and video streams stay synchronized when output. Through de-encapsulation, the flv-format audio and video compressed data are split into audio compressed data using the AAC coding standard and video compressed data using the H264 coding standard.
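Determining the container from the file header, as described above, amounts to checking a few well-known signatures. The sketch below is an illustrative stand-in for FFmpeg's format probing, assuming the standard magic bytes: FLV files begin with "FLV", Matroska (mkv) with the EBML magic 0x1A 0x45 0xDF 0xA3, and MP4 carries "ftyp" at byte offset 4.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Pick a container name from the leading bytes of a file, so that the
// matching splitter (demuxer) can be selected.
std::string detectContainer(const std::vector<uint8_t>& header) {
    if (header.size() >= 3 && std::memcmp(header.data(), "FLV", 3) == 0)
        return "flv";
    static const uint8_t ebml[4] = {0x1A, 0x45, 0xDF, 0xA3}; // Matroska magic
    if (header.size() >= 4 && std::memcmp(header.data(), ebml, 4) == 0)
        return "mkv";
    if (header.size() >= 8 && std::memcmp(header.data() + 4, "ftyp", 4) == 0)
        return "mp4";
    return "unknown";
}
```

A real demuxer probes more deeply (and can score ambiguous inputs), but the principle of dispatching on header bytes is the same.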
After the audio compressed data and the video compressed data separated by the splitter are stored in buffers in their compressed formats, they can be output only after being decoded and converted by a decoder. The decoder comprises decoding modules and a selection module; the selection module selects the corresponding decoding modules to decode the audio compressed data and the video compressed data respectively according to the coding format of the audio and video streams, and the corresponding planar audio frames and planar video frames are obtained after decoding. The audio compressed data corresponding to the AAC coding standard are decoded into planar audio frames in FLTP format; the video compressed data corresponding to the H264 coding standard are decoded into planar video frames in YUV420p format.
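The selection module's dispatch by compression coding standard can be modelled as a small lookup. This is an illustrative stand-in for a decoder registry such as FFmpeg's; the decoder names here are assumptions, not real API identifiers.

```cpp
#include <cassert>
#include <string>

// Map a stream's compression coding standard to the decoding module that
// handles it, as the selection module described above does. The comments
// note the planar frame format each decoder emits in this embodiment.
std::string selectDecoder(const std::string& codec) {
    if (codec == "h264") return "h264_decoder"; // emits YUV420p planar frames
    if (codec == "aac")  return "aac_decoder";  // emits FLTP planar frames
    return "unsupported";
}
```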
It should be noted that audio and video frames decoded into the planar model need to be converted into interleaved frames before they can be output for playing. The planar video frame in YUV420p format is converted into an interleaved frame, and the planar audio frame in FLTP format is resampled into an interleaved frame.
Referring to fig. 3, the demultiplexed and decoded audio and video data are sent to an output layer. Besides audio output and video output, the output layer performs the indispensable function of audio and video synchronization: it selects a suitable output device to drive the audio and video streams and, with the help of the audio and video synchronization module, ensures that they are played back synchronously according to the timestamp information.
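Timestamp-based synchronization can be sketched with a common scheme in which the audio clock acts as the master and a video frame is displayed once its presentation timestamp (PTS) has been reached. This is a toy model of the output layer's decision, not the patent's exact algorithm; the tolerance value is an assumption.

```cpp
#include <cassert>

// Decide whether a decoded video frame should be displayed now, given the
// current audio playback clock. A small tolerance lets a frame lead the
// audio slightly so playback does not stutter.
bool shouldDisplayVideoFrame(double videoPtsSec, double audioClockSec,
                             double toleranceSec = 0.040) {
    // Display when the frame's PTS is at or before the (tolerance-extended)
    // audio clock; otherwise keep waiting.
    return videoPtsSec <= audioClockSec + toleranceSec;
}
```

Frames that arrive far behind the audio clock would typically be dropped rather than displayed, which this sketch omits.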
S130, drawing the original video data on a Qt interface, and writing the original audio data into a Qt loudspeaker IO for playing.
When the audio and video are output, Qt (a cross-platform C++ graphical user interface framework) is used for playing. A QImage is constructed from the output original video data and then drawn on the Qt interface, so that the original video is displayed on the client; the audio frames are written into the Qt loudspeaker IO, so that the original audio is output at the client.
In one embodiment, a visitor accesses a corresponding resident through a client of a door host, the door host acquires field audio and video data through a camera and a microphone and pushes the audio and video data to a server, the client pulls the audio and video data from the server according to corresponding IP address information to perform protocol de-encoding, de-encapsulating, decoding and audio and video synchronization operations, and finally the audio and video are displayed on the client of an indoor extension of the resident. The resident can obtain the specific information of the visitor through the audio and video information displayed by the indoor extension.
Furthermore, in the embodiment of the application, the audio and video data are decoded through the FFmpeg library, and the original audio and video are played using Qt. Since the Qt and FFmpeg libraries adapt to different operating systems, cross-platform audio and video stream pulling is achieved. The client pulling the rtmp streaming media can therefore also be a mobile terminal device such as a resident's mobile phone. After the audio and video data of the visitor's scene are collected by the door-entry host, they are encoded, muxed and pushed to the server. A resident installs the QT-based audio and video stream pulling software of the embodiment of the application on a mobile terminal, uses it to pull the rtmp streaming media, performs protocol decoding, de-encapsulation and decoding operations to restore the data to the original audio and video data, and, through audio and video synchronization, plays the audio and video on the user's mobile terminal. Cross-platform audio and video stream pulling is thus further realized, and the user experience is further optimized.
In addition, during the decoding and playing of the audio and video compressed data, the audio parameters and video parameters are adjusted according to performance requirements. By adjusting these parameters, the audio and video stream can be adapted to different hardware devices. For example, parameters such as the video decoding bit rate, the image width, the image height, and the pixel format of the frame are adjusted to tune the performance of decoding the video data frames. This technical means of adjusting the audio and video parameters makes the audio and video data parameters of the QT-based audio and video stream pulling system controllable, so that the audio and video performance can be adjusted at each stage and audio and video data with different performance effects can be obtained.
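The performance-driven adjustment above can be sketched as a decode profile holding the tunable values (bit rate, width, height, pixel format) plus a helper that scales them down for weaker hardware. The structure, tier choice, and scaling factors here are assumptions for illustration, not the patent's concrete scheme.

```cpp
#include <cassert>

// Tunable decode parameters named in the description: bit rate, image
// width, image height, and the frame's pixel format.
struct DecodeProfile {
    int bitrateKbps;
    int width;
    int height;
    const char* pixelFormat; // e.g. "yuv420p"
};

// Halve resolution and bit rate for a low-end device, keeping dimensions
// even because YUV420p chroma subsampling requires even width and height.
DecodeProfile downscaleForLowEnd(DecodeProfile p) {
    p.width = (p.width / 2) & ~1;
    p.height = (p.height / 2) & ~1;
    p.bitrateKbps /= 2;
    return p;
}
```

A fuller implementation would pick among several tiers based on measured device capability rather than a single halving step.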
A streaming media data packet is pulled from the server side in real time through the FFmpeg library, protocol decoding is carried out to obtain audio and video compressed data, the original video data are drawn on a Qt interface after the audio and video compressed data are decoded, and the original audio data are written into a Qt loudspeaker IO for playing. By adopting these technical means, the dependence of the audio/video stream on a specific architecture can be eliminated, ensuring the portability of the system, allowing the intercom system to adapt to different operating systems, and realizing cross-platform audio and video stream pulling. Furthermore, during audio and video decoding and playing, audio and video parameters are adjusted according to performance requirements so as to adapt to hardware devices with different performance and decoding requirements, thereby realizing adjustment of audio and video performance.
Example two:
on the basis of the foregoing embodiment, fig. 4 is a schematic structural diagram of an audio and video stream pulling apparatus based on QT according to a second embodiment of the present application. Referring to fig. 4, the QT-based audio/video stream pulling apparatus provided in this embodiment specifically includes: a pull module 21, a decode module 22 and a play module 23.
The pulling module 21 is configured to pull a streaming media data packet from a server in real time by using an FFmpeg library, and perform a protocol decoding on the streaming media data packet to obtain audio/video compression data;
the decoding module 22 is configured to decode the audio/video compressed data based on an FFmpeg library to obtain original audio data and original video data;
and the playing module 23 is configured to draw the original video data on a Qt interface, and write the original audio data into a Qt speaker IO for playing.
The streaming media data packet is pulled from the server side in real time through the FFmpeg library and protocol-decoded to obtain the audio and video compressed data; after the audio and video compressed data are decoded, the original video data are drawn on a Qt interface and the original audio data are written into a Qt speaker IO for playing. By adopting the above technical means, the dependence of the audio and video stream on a specific architecture can be eliminated, thereby ensuring the portability of the system, enabling the intercom system to adapt to different operating systems, and realizing cross-platform audio and video stream pulling. In addition, in the process of audio and video decoding and playing, the audio and video parameters are adjusted according to performance requirements so as to adapt to hardware devices with different performance and decoding requirements, thereby realizing the adjustment of the audio and video performance.
Specifically, in the process of carrying out protocol decoding on the streaming media data packet to obtain the audio and video compressed data, the pulling module deletes the signaling data from the streaming media data packet to obtain the audio and video compressed data.
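A minimal sketch of this filtering step, assuming each pulled packet carries a type tag that distinguishes media payloads from protocol signaling (the tag values used here are hypothetical, not taken from any concrete protocol):

```python
def strip_signaling(packets):
    """Drop protocol signaling (handshakes, acknowledgements, control
    messages) and keep only the audio/video compressed payloads."""
    return [p for p in packets if p["type"] in ("audio", "video")]
```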
Specifically, the apparatus further includes:
and the parameter adjusting module is used for adjusting the audio parameters and the video parameters according to performance requirements in the process of decoding the audio and video compressed data, drawing the original video data on the Qt interface, and writing the original audio data into the Qt speaker IO for playing.
Specifically, the decoding module includes:
the splitting unit is used for decapsulating the audio and video compressed data and splitting the audio and video compressed data into audio compressed data and video compressed data;
the decoding unit is used for decoding the audio compressed data and the video compressed data respectively by using corresponding decoders to obtain corresponding planar audio frames and planar video frames;
and the conversion unit is used for resampling the planar audio frame into an interleaved frame corresponding to the original audio data, and converting the planar video frame into an interleaved frame corresponding to the original video data.
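The resampling of planar frames into interleaved (cross-access) frames can be shown in miniature: planar storage keeps each channel in its own array (L L L …, R R R …), while interleaved storage alternates one sample per channel (L R L R …). The helper below is a simplified stand-in for what FFmpeg's resampler performs on real sample buffers:

```python
def planar_to_interleaved(planes):
    """Resample planar channel data ([[L, L, ...], [R, R, ...]]) into a single
    interleaved sequence [L, R, L, R, ...], one sample per channel per step."""
    if not planes:
        return []
    n = len(planes[0])
    if any(len(p) != n for p in planes):
        raise ValueError("all channel planes must have the same length")
    out = []
    for i in range(n):           # for each sample position ...
        for plane in planes:     # ... emit one sample from every channel
            out.append(plane[i])
    return out
```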
Specifically, in decoding the audio compressed data and the video compressed data respectively by using corresponding decoders, the decoding unit selects the corresponding decoding module for decoding according to the compression coding standard.
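The selection of a decoder by compression coding standard amounts to a lookup from codec name to decoder. The registry below is an illustrative stub (a real system would register FFmpeg decoders for standards such as H.264 and AAC rather than the toy callables shown here):

```python
def make_decoder_registry():
    """Map compression coding standards to decoder callables; the entries
    here are placeholder stubs standing in for real codec decoders."""
    return {
        "h264": lambda data: ("video-frame", data),
        "aac":  lambda data: ("audio-frame", data),
    }

def decode_packet(registry, codec, data):
    """Dispatch a compressed packet to the decoder registered for its codec."""
    try:
        return registry[codec](data)
    except KeyError:
        raise ValueError(f"no decoder registered for {codec!r}")
```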
Specifically, when the decoding unit obtains the corresponding planar audio frame and planar video frame, the planar audio frame is in the FLTP format and the planar video frame is in the YUV420p format.
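As background on the YUV420p format mentioned above: it stores a full-resolution luma (Y) plane followed by two chroma (U, V) planes subsampled 2x2 in both directions, for 1.5 bytes per pixel overall, while FLTP is analogously the planar 32-bit float audio layout. A small helper, invented for this example, makes the plane arithmetic concrete:

```python
def yuv420p_plane_sizes(width: int, height: int):
    """Return byte sizes of the Y, U and V planes of one YUV420p frame
    (width and height are assumed even, as 2x2 chroma subsampling requires)."""
    y = width * height                    # full-resolution luma
    chroma = (width // 2) * (height // 2) # each chroma plane is 1/4 of Y
    return y, chroma, chroma
```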
The QT-based audio and video stream pulling device provided by the second embodiment of the application can be used for executing the QT-based audio and video stream pulling method provided by the first embodiment of the application, and has corresponding functions and beneficial effects.
Example three:
an embodiment of the present application provides an electronic device, and with reference to fig. 5, the electronic device includes: a processor 31, a memory 32, a communication module 33, an input device 34, and an output device 35. The number of processors in the electronic device may be one or more, and the number of memories in the electronic device may be one or more. The processor, memory, communication module, input device, and output device of the electronic device may be connected by a bus or other means.
The memory 32 is a computer readable storage medium that can be used to store software programs, computer executable programs, and modules, such as the program instructions/modules corresponding to the QT-based audio and video stream pulling method according to any embodiment of the present application (for example, the pulling module, the decoding module, and the playing module in the QT-based audio and video stream pulling apparatus). The memory 32 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the device, and the like. Further, the memory 32 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory may further include memory located remotely from the processor, and these remote memories may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The communication module 33 is used for data transmission.
The processor 31 executes various functional applications and data processing of the device by running the software programs, instructions and modules stored in the memory, that is, implements the QT-based audio and video stream pulling method described above.
The input device 34 may be used to receive entered numeric or character information and to generate key signal inputs relating to user settings and function controls of the apparatus. The output device 35 may include a display device such as a display screen.
The electronic device provided by the above can be used to execute the QT-based audio/video stream pulling method provided by the first embodiment, and has corresponding functions and beneficial effects.
Example four:
The embodiment of the present application further provides a storage medium containing computer executable instructions which, when executed by a computer processor, perform a QT-based audio and video stream pulling method, the method comprising: pulling a streaming media data packet from a server in real time by using an FFmpeg library, and carrying out protocol decoding on the streaming media data packet to obtain audio and video compressed data; decoding the audio and video compressed data based on an FFmpeg library to obtain original audio data and original video data; and drawing the original video data on a Qt interface, and writing the original audio data into a Qt speaker IO for playing.
Storage medium - any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media such as CD-ROM, floppy disk, or tape devices; computer system memory or random access memory such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory such as flash memory, magnetic media (e.g., a hard disk), or optical storage; registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in a first computer system in which the program is executed, or may be located in a different second computer system connected to the first computer system through a network (such as the internet). The second computer system may provide program instructions to the first computer for execution. The term "storage medium" may include two or more storage media residing in different locations, e.g., in different computer systems connected by a network. The storage medium may store program instructions (e.g., embodied as a computer program) that are executable by one or more processors.
Of course, the storage medium containing the computer-executable instructions provided in the embodiments of the present application is not limited to the QT-based audio and video stream pulling method described above, and may also perform related operations in the QT-based audio and video stream pulling method provided in any embodiment of the present application.
The QT-based audio and video stream pulling apparatus, the storage medium, and the electronic device provided in the above embodiments may execute the QT-based audio and video stream pulling method provided in any embodiment of the present application; for technical details not described in detail in the above embodiments, refer to the QT-based audio and video stream pulling method provided in any embodiment of the present application.
The foregoing is considered as illustrative of the preferred embodiments of the invention and the technical principles employed. The present application is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present application has been described in more detail with reference to the above embodiments, the present application is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present application, and the scope of the present application is determined by the scope of the claims.

Claims (10)

1. A QT-based audio and video stream pulling method, characterized by comprising the following steps:
using an FFmpeg library to pull a streaming media data packet from a server in real time, and carrying out protocol decoding on the streaming media data packet to obtain audio and video compressed data;
decoding the audio and video compressed data based on an FFmpeg library to obtain original audio data and original video data;
and drawing the original video data on a Qt interface, and writing the original audio data into a Qt loudspeaker IO for playing.
2. The QT-based audio and video stream pulling method of claim 1, wherein in the process of carrying out protocol decoding on the streaming media data packet to obtain the audio and video compressed data, signaling data is deleted from the streaming media data packet to obtain the audio and video compressed data.
3. The QT-based audio and video stream pulling method of claim 1, wherein the FFmpeg-based library is used for decoding the audio and video compressed data to obtain original audio data and original video data, and the method comprises the following steps:
decapsulating the audio and video compressed data, and splitting the audio and video compressed data into audio compressed data and video compressed data;
respectively decoding the audio compressed data and the video compressed data by using corresponding decoders to obtain corresponding planar audio frames and planar video frames;
and resampling the planar audio frame into an interleaved frame corresponding to the original audio data, and converting the planar video frame into an interleaved frame corresponding to the original video data.
4. The QT-based audio and video stream pulling method according to claim 3, wherein in the decoding of the audio compressed data and the video compressed data respectively using corresponding decoders, a corresponding decoding module is selected for decoding according to the compression coding standard.
5. The QT-based audio and video stream pulling method of claim 3, wherein in the obtaining of the corresponding planar audio frame and planar video frame, the planar audio frame is in the FLTP format and the planar video frame is in the YUV420p format.
6. The QT-based audio and video stream pulling method of claim 1, wherein audio parameters and video parameters are adjusted according to performance requirements during decoding of the audio and video compressed data, rendering of the original video data on a Qt interface, and writing of the original audio data into a Qt speaker IO for playing.
7. A QT-based audio and video stream pulling apparatus, characterized by comprising:
a pulling module, configured to pull a streaming media data packet from a server in real time by using an FFmpeg library, and carry out protocol decoding on the streaming media data packet to obtain audio and video compressed data;
the decoding module is used for decoding the audio and video compressed data based on the FFmpeg library to obtain original audio data and original video data;
and the playing module is used for drawing the original video data on a Qt interface and writing the original audio data into a Qt loudspeaker IO for playing.
8. The QT-based audiovisual stream pulling device of claim 7, further comprising:
and the parameter adjusting module is used for adjusting the audio parameters and the video parameters according to performance requirements in the process of decoding the audio and video compressed data, drawing the original video data on the Qt interface, and writing the original audio data into the Qt speaker IO for playing.
9. An electronic device, comprising:
a memory and one or more processors;
the memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the QT-based audio and video stream pulling method of any one of claims 1-6.
10. A storage medium containing computer executable instructions which, when executed by a computer processor, perform the QT-based audio and video stream pulling method of any one of claims 1-6.
CN201911047950.9A 2019-10-30 2019-10-30 QT-based audio and video stream pulling method, device, equipment and storage medium Pending CN110650308A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911047950.9A CN110650308A (en) 2019-10-30 2019-10-30 QT-based audio and video stream pulling method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN110650308A true CN110650308A (en) 2020-01-03

Family

ID=68995251

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911047950.9A Pending CN110650308A (en) 2019-10-30 2019-10-30 QT-based audio and video stream pulling method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110650308A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113852850A (en) * 2020-11-24 2021-12-28 广东朝歌智慧互联科技有限公司 Audio and video stream playing device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102256061A (en) * 2011-07-29 2011-11-23 武汉大学 Two-dimensional and three-dimensional hybrid video stabilizing method
CN102821308A (en) * 2012-06-04 2012-12-12 西安交通大学 Multi-scene streaming media courseware recording and direct-broadcasting method
CN108616722A (en) * 2018-04-18 2018-10-02 中南大学 A kind of embedded high definition video acquisition and data streaming system


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113852850A (en) * 2020-11-24 2021-12-28 广东朝歌智慧互联科技有限公司 Audio and video stream playing device
CN113852850B (en) * 2020-11-24 2024-01-09 广东朝歌智慧互联科技有限公司 Audio/video stream playing device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200103