WO2023231478A1 - Audio and video sharing method and device, and computer-readable storage medium - Google Patents

Audio and video sharing method and device, and computer-readable storage medium

Info

Publication number
WO2023231478A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
video
receiving end
stream
application
Prior art date
Application number
PCT/CN2023/078498
Other languages
French (fr)
Chinese (zh)
Inventor
杨海城
黄图斌
董桥桥
钱宇
严敏之
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2023231478A1 publication Critical patent/WO2023231478A1/en


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 - User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07 - User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail, characterised by the inclusion of specific contents
    • H04L51/10 - Multimedia information
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 - Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 - Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 - End-user applications
    • H04N21/478 - Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 - Supplemental services, e.g. displaying phone caller identification, shopping application, communicating with other users, e.g. chatting
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 - Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 - Assembly of content; Generation of multimedia applications
    • H04N21/854 - Content authoring
    • H04N21/8547 - Content authoring involving timestamps for synchronizing content
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/14 - Systems for two-way working
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 - Television systems
    • H04N7/14 - Systems for two-way working
    • H04N7/15 - Conference systems

Definitions

  • the present disclosure relates to the field of communication technology, and in particular to an audio and video sharing method, device and computer-readable storage medium.
  • the main purpose of the present disclosure is to provide an audio and video sharing method, device and storage medium, aiming to solve the existing technical problem of being unable to share audio and video files that synchronize audio and video during audio and video calls.
  • the present disclosure provides an audio and video sharing method.
  • the audio and video sharing method includes: obtaining the audio stream and video stream in the audio and video to be shared, and parsing the video stream into video frames; determining, based on the timestamp information of the audio and video to be shared, the audio information corresponding to each video frame; and, when an audio and video sharing instruction is received, sending the video frames and the audio information corresponding to the video frames to the receiving end to share the audio and video to be shared.
  • the present disclosure also provides an audio and video sharing device.
  • the audio and video sharing device includes a processor, a memory, and an audio and video sharing program stored on the memory and executable by the processor, wherein, when the audio and video sharing program is executed by the processor, the steps of the above audio and video sharing method are implemented.
  • the present disclosure also provides a computer-readable storage medium.
  • An audio and video sharing program is stored on the computer-readable storage medium.
  • when the audio and video sharing program is executed by a processor, the steps of the above audio and video sharing method are implemented.
  • Figure 1 is a schematic diagram of the hardware structure of the audio and video sharing device involved in the embodiment of the present disclosure
  • Figure 2 is a schematic flowchart of the first embodiment of the audio and video sharing method of the present disclosure
  • Figure 3 is a schematic flow chart of the second embodiment of the audio and video sharing method of the present disclosure.
  • Figure 4 is a schematic flowchart of a third embodiment of the audio and video sharing method of the present disclosure.
  • Figure 5 is a schematic flowchart of the fourth embodiment of the audio and video sharing method of the present disclosure.
  • FIG. 6 is a functional module diagram of the first embodiment of the audio and video sharing device of the present disclosure.
  • the audio and video sharing method involved in the embodiments of the present disclosure is mainly applied to audio and video sharing devices.
  • the audio and video sharing devices may be devices with display and processing functions such as PCs, portable computers, and mobile terminals.
  • FIG. 1 is a schematic diagram of the hardware structure of the audio and video sharing device involved in the embodiment of the present disclosure.
  • the audio and video sharing device may include a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • the communication bus 1002 is used to realize connection and communication between these components;
  • the user interface 1003 can include a display screen (Display) and an input unit such as a keyboard (Keyboard);
  • the network interface 1004 can optionally include a standard wired interface and a wireless interface.
  • the memory 1005 can be a high-speed RAM memory or a stable memory (non-volatile memory), such as a disk memory.
  • the memory 1005 can optionally be a storage device independent of the aforementioned processor 1001 .
  • the hardware structure shown in Figure 1 does not constitute a limitation on the audio and video sharing device, which may include more or fewer components than shown in the figure, combine certain components, or use a different arrangement of components.
  • the memory 1005 as a computer-readable storage medium in Figure 1 can include an operating system, a network communication module, and an audio and video sharing program.
  • the network communication module is mainly used to connect to the server and perform data communication with the server; and the processor 1001 can call the audio and video sharing program stored in the memory 1005 and execute the audio and video sharing method provided by the embodiment of the present disclosure.
  • Embodiments of the present disclosure provide an audio and video sharing method.
  • Figure 2 is a schematic flowchart of the first embodiment of the audio and video sharing method of the present disclosure.
  • the audio and video sharing method includes the following steps:
  • Step S10: obtain the audio stream and video stream in the audio and video to be shared, and parse the video stream into video frames;
  • the audio and video to be shared can be played through the video player in the mobile terminal; when the video player reads the audio and video file, it transcodes and decodes the video stream and the audio stream in the file separately, converting the format of the audio and video file into a code stream supported by the video player, and then decoding and outputting the transcoded audio and video streams.
  • step S10 specifically includes:
  • the audio and video to be shared are read through the currently running video player, and the audio and video file to be shared is parsed into the audio stream and the video frames corresponding to the video stream.
  • an audio and video file is compressed in one or more formats, that is, video encoding and audio encoding.
  • the purpose of encoding is to reduce the amount of data and facilitate storage and transmission.
  • because the audio and video file contains both audio and video, and the audio stream and the video stream are compressed separately with different compression algorithms, their decoding also differs, so the audio stream and the video stream need to be decoded separately.
  • for streaming media, the audio and video files need to be parsed into standard encapsulation-format data, for example parsing data transmitted over the RTMP protocol and outputting data in FLV format.
  • although the audio stream and the video stream are compressed separately, they are bundled together for transmission. Therefore, when parsing an audio and video file, the audio stream and the video stream must first be separated, that is, demultiplexed or decapsulated: the input encapsulated-format data is separated into compressed audio stream data and compressed video stream data.
  • there are many encapsulation formats, such as MP4, MKV, RMVB, TS, FLV, AVI and so on; their function is to put the compressed and encoded video data and audio data together in a certain format. For example, after decapsulating data in FLV format, an H.264-encoded video stream and an AAC-encoded audio stream are output.
  • the compressed audio/video data is decoded into uncompressed raw audio/video data through a decoder, where the decoded output of the compressed video data is uncompressed color data, such as YUV or RGB, while the decoded output of the compressed audio data is uncompressed audio sample data, such as PCM data.
  • the video stream is parsed into a series of video frames according to the specified size through the parsing module of the video player, and the video frames are numbered to obtain the frame number of each video frame.
  • when an IMS video call (such as VoNR, VoLTE or VoWiFi) is established, both parties to the call negotiate the picture size, frame rate, etc. of the video call.
  • the parsing module can parse audio and video files (such as .mp4 files) into static video frames of the corresponding size (such as 480×640) according to the picture size of the video call.
  • the video player synchronizes the decoded video data and audio data based on the parameter information obtained during the processing of the decapsulation module, and plays them through the graphics card and sound card of the sharing end.
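As an illustration of the demultiplexing and decoding described above, the following minimal sketch uses the third-party PyAV library, an assumption made purely for illustration (the disclosure does not name any particular library), to split a container file into its compressed streams, decode them into raw frames, and number the video frames as they are produced.

```python
# Minimal demux/decode sketch using PyAV (pip install av); an illustrative
# assumption, not the implementation described in the disclosure.
import av

def demux_and_decode(path: str):
    """Split an encapsulated file (MP4/FLV/...) into decoded video frames
    and audio frames, numbering the video frames in decode order."""
    container = av.open(path)                  # decapsulation / demultiplexing
    video_stream = container.streams.video[0]  # e.g. H.264 compressed data
    audio_stream = container.streams.audio[0]  # e.g. AAC compressed data

    video_frames, audio_frames = [], []
    for packet in container.demux(video_stream, audio_stream):
        for frame in packet.decode():          # decode to raw YUV / PCM data
            if packet.stream.type == "video":
                frame_number = len(video_frames)   # assign a frame number
                video_frames.append((frame_number, frame))
            else:
                audio_frames.append(frame)
    return video_frames, audio_frames
```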
  • Step S20: determine the audio information corresponding to each video frame based on the timestamp information of the audio and video to be shared;
  • when parsing the audio and video to be shared, each parsed data packet carries a display timestamp and a decoding timestamp.
  • for audio, the display timestamp and the decoding timestamp are the same.
  • for video, because of image compression, the decoding order differs from the display order, that is, the display timestamp and the decoding timestamp are different.
  • the display timestamp refers to the display order of data decoded from the data packet;
  • the decoding timestamp refers to the decoding order of the data packet.
  • in existing technical solutions, audio and video are mostly synchronized by calculating the relative playback times of the video stream and the audio stream and judging the difference between the two.
  • when the difference is small, a delay can be used to catch up.
  • if the video stream lags far behind the audio stream, it has to catch up by dropping frames.
  • this method cannot achieve precise synchronization adjustment, so the sound and picture may still be out of sync; and if frames are dropped, playback becomes choppy, degrading the user experience.
  • in this embodiment, after the audio and video to be shared are parsed into an audio stream and a video stream, they are not transmitted or played immediately; instead, the timestamps of the parsed audio stream and video stream are aligned to determine the audio information corresponding to each video frame.
  • specifically, a reference clock (such as the system clock of the processor) can first be selected; during encoding, each audio and video data block is timestamped according to the reference clock.
  • when the audio stream and the video stream are parsed, each video frame and each piece of audio information also carries timestamp information, from which the correspondence between video frames and audio information can be determined.
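To make the timestamp alignment concrete, the sketch below pairs each video frame with the audio block whose presentation timestamp is closest; the data structures and field names (pts, data, pixels) are illustrative assumptions rather than terms from the disclosure.

```python
from dataclasses import dataclass
from bisect import bisect_left

@dataclass
class AudioBlock:
    pts: float   # presentation (display) timestamp on the shared reference clock
    data: bytes  # uncompressed audio samples, e.g. PCM

@dataclass
class VideoFrame:
    frame_number: int
    pts: float   # presentation timestamp of this frame
    pixels: bytes

def align_audio_to_video(frames: list[VideoFrame],
                         audio: list[AudioBlock]) -> dict[int, AudioBlock]:
    """Map each video frame number to the audio block whose timestamp is
    nearest, so frame and sound can later be output together."""
    if not audio:
        return {}
    audio = sorted(audio, key=lambda a: a.pts)
    pts_list = [a.pts for a in audio]
    mapping = {}
    for frame in frames:
        i = bisect_left(pts_list, frame.pts)
        # choose the closer of the two neighbouring audio blocks
        candidates = [c for c in (i - 1, i) if 0 <= c < len(audio)]
        best = min(candidates, key=lambda c: abs(audio[c].pts - frame.pts))
        mapping[frame.frame_number] = audio[best]
    return mapping
```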
  • Step S30: when an audio and video sharing instruction is received, the video frame and the audio information corresponding to the video frame are sent to the receiving end to share the audio and video to be shared.
  • in the existing technology, the audio and video files can be sent to the receiving end and then played and displayed at the receiving end.
  • however, because the playback speeds of the video stream and the audio stream are inconsistent, the audio and video file played at the receiving end suffers from audio and video being out of sync.
  • video frames and audio information corresponding to the same timestamp information are synchronously output to the receiving end according to the timestamp order.
  • therefore, the audio and video files received by the receiving end are audio stream and video frame files corrected and synchronized based on timestamp information.
  • the sharing end first plays the video frame and the corresponding audio information synchronously according to the synchronization signal (that is, the frame number of the currently played video frame), thereby outputting a synchronized audio and video stream; the receiving end, on receiving this synchronized audio and video stream, can play an audio and video file whose sound and picture are synchronized.
  • the call type for establishing the audio and video call may be an IMS call type or an OTT call type.
  • IMS is also called IP multimedia subsystem, which can implement voice services under packet switching networks, including VoNR, VoLTE, VoWiFi and other types.
  • VoNR is a call solution based on pure 5G access, which enables both voice and data services to be carried on the 5G network.
  • VoLTE is an IP data transmission technology that does not require a 2G/3G network: all services are carried on the 4G network, unifying data and voice services on the same network. VoLTE is an end-to-end voice solution built on the 4G network under all-IP conditions. Compared with traditional calls, VoLTE high-definition calling offers fast connection times and a near-zero call drop rate, and LTE's spectrum utilization efficiency is far superior to that of traditional calling standards.
  • VoWiFi is a voice service provided through WiFi networks. Users can make calls without mobile signals.
  • VoWiFi is a complementary technology to VoLTE. Compared with traditional calling services, VoWiFi prioritizes the use of WiFi networking to implement calling functions and automatically and seamlessly switches between mobile networks and WiFi networks. Users can make calls in different locations without special settings. More importantly, using WiFi networking overcomes the problem of poor signal indoors or in the basement. In places with weak network coverage or interference, you can make or receive calls as long as you can connect to WiFi.
  • OTT refers to Internet companies going beyond the operators (such as China Telecom, China Mobile and China Unicom) to develop Internet-based value-added services such as video, social networking, games and data services, including WeChat, Skype and other types.
  • WeChat is an application that provides instant messaging services for smart terminals. It can quickly send voice messages, videos, pictures, text, etc. across communication operators and operating system platforms through the network.
  • Skype is an instant messaging software that has the functions required for IM, such as video chat, multi-person voice conferencing, multi-person chat, file transfer, file chat, etc.
  • the sharing terminal can switch the current display interface to display the pictures captured by the terminal camera in real time, switch the current audio device to the call audio device to capture the user's voice information, and send the captured pictures and voice information to the receiving end through the call data channel until the audio and video call ends.
  • the present disclosure provides an audio and video sharing method.
  • the method obtains the audio stream and video stream in the audio and video to be shared, and parses the video stream into video frames; according to the timestamp information of the audio and video to be shared, the audio information corresponding to each video frame is determined from the audio stream; and, based on the call data channel of the audio and video call or a newly created data channel, each video frame and the audio information corresponding to it are synchronously sent to the receiving end, thereby sharing the audio and video to be shared.
  • the present disclosure parses the video stream of the audio and video file into video frames, and parses the timestamp information corresponding to the video stream.
  • in this way, the timestamp information of the audio and video can be determined, so the audio information corresponding to each video frame can be determined, achieving precise synchronization of the video stream and the audio stream; during an audio and video call, the video frame and the audio information corresponding to the current video frame are transmitted synchronously through the call data channel established for the audio and video call or through a newly created data channel.
  • the receiving end can thus receive the accurately synchronized video stream and audio stream, that is, the receiving end receives an audio and video file with synchronized sound and picture, which improves the user experience and solves the current technical problem that audio and video files with synchronized sound and picture cannot be shared during video calls.
  • Figure 3 is a schematic flow chart of a second embodiment of the audio and video sharing method of the present disclosure.
  • step S30 specifically includes:
  • Step S31: determine the current video frame among the video frames, and synchronously output the current video frame and the audio information corresponding to the current video frame according to the frame number of the current video frame, to obtain a synchronized audio and video stream;
  • the video stream and audio stream with time stamp alignment are simultaneously played through the video player on the sharing end.
  • the video stream is further parsed into video frames, and the video frames are numbered to obtain the frame number of each video frame.
  • while playing, the video frame corresponding to the frame number of the currently played video stream is output synchronously, and the audio stream is output synchronously to the audio front end, thereby achieving synchronous output of video frames and audio stream and obtaining a synchronously output audio and video stream.
  • Step S32: send the synchronized audio and video stream to the receiving end based on the data transmission channel, so that the receiving end plays the audio and video to be shared synchronously.
  • the synchronously output audio and video streams are sent to the receiving end by establishing a call data channel for the audio and video calls or creating a new data channel, and the receiving end can call the corresponding target application (such as video playback devices, audio and video applications, etc.) to directly play the synchronized audio and video streams.
  • step S32 specifically includes:
  • the synchronized audio and video stream is sent to the receiving end through a transmission data channel in the data transmission channel.
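A sharing-end send loop consistent with the step described above might look like the following sketch; the channel/render interfaces and the pacing by frame interval are assumptions made for illustration, not an API defined by the disclosure.

```python
import json
import time

def share_loop(frames, audio_by_frame, channel, render, fps=25.0):
    """Sharing-end loop: play each frame locally and, at the same moment,
    push the frame number, the video frame and its audio over the data
    transmission channel so the receiving end can follow."""
    interval = 1.0 / fps
    for frame in frames:                                   # ordered by frame_number
        audio = audio_by_frame[frame.frame_number]
        render(frame, audio)                               # local playback on the sharing end
        channel.send(json.dumps({"type": "sync",
                                 "frame_number": frame.frame_number}))
        channel.send(frame.pixels)                         # video frame payload
        channel.send(audio.data)                           # audio payload with the same timestamp
        time.sleep(interval)                               # pace output at the negotiated frame rate
```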
  • for example, when establishing a VoNR+ video call, a non-Bootstrap data channel can be established to transmit the audio stream and video frames to the receiving end, and a Bootstrap data channel can be established to transmit to the receiving end the application data information corresponding to the target application used to play the audio and video file.
  • before sending the application data information corresponding to the target application to the receiving end through the application data channel, the method further includes:
  • the local application data information is sent to the receiving end based on the application data channel.
  • the sharing end can send a SIP UPDATE message carrying the dcmap field to the network side, and then establish a data channel with the network data server or the receiving end; the sharing end can pre-save the audio and video to be shared in the network data server.
  • when the sharing terminal and the receiving terminal establish a data connection, the receiving terminal simultaneously establishes a data channel with the network data server; when an audio and video file sharing instruction is received, the sharing terminal accesses the network data server according to the sharing instruction and queries the network data server for the target application. If the target application exists, it points directly to the location of the target application according to the query result and sends the application data information corresponding to the target application through the data channel between the receiving end and the network data server; if the target application does not exist in the network data server, the query returns no result.
  • in this case, the sharing end can choose to close the connection to the network data server and query for the target application locally.
  • the sharing end can select audio and video to be shared that are pre-stored in the network data server and transmit them to the receiving end through the data channel of the audio and video call, or directly transmit them through the data channel between the network data server and the receiving end.
  • the sharing end can select audio and video files and applications in the network data server, or can choose audio and video files and applications stored locally on the sharing end.
  • the dcmap field represents the need to establish a data channel.
  • the data channel can be a Bootstrap data channel used to transmit data channel applications, or a non-Bootstrap data channel used to transmit data information required by data channel applications.
  • if the audio and video to be shared and the application data information are not in the network data server, locally stored audio and video files or streaming videos, as well as the corresponding applications, can be selected and transmitted to the receiving end through the data channel established for the call.
  • the application data information is sent to the receiving end through the network data server.
  • the sharing end can select the corresponding audio and video file to be shared in the network data server and complete operations such as parsing and synchronization of the file on the sharing end; the synchronized audio stream and video frames are then sent to the receiving end through the data channel, and by transmitting in real time the frame number of the video frame currently played by the sharing end, the corresponding video frames and audio stream are played at the receiving end, achieving synchronous playback of the audio and video file at both ends of the call.
  • alternatively, the receiving end can directly obtain the audio and video file and the application from the network data server, open the audio and video file through the application, and, as the sharing end transmits in real time the frame number of the currently played video frame, play the corresponding video frames and audio stream, thereby achieving synchronous playback of the audio and video file at both ends of the call.
  • the sharing end can also replace the output video frames with the picture currently captured by the camera, and replace the output audio stream with the audio captured by the microphone.
  • the video frame and audio information corresponding to the frame number are output synchronously, and, through the call data channel of the audio and video call, the video frame and audio information currently played in the call are transmitted to the receiving end and played synchronously at the receiving end.
  • the sharing end can also directly send to the receiving end the audio and video file whose video frames and audio stream are output synchronously.
  • the receiving end can save the audio and video streams locally.
  • the audio and video file can include a video frame file and an audio stream file; during the audio and video call, the sender synchronously sends the frame number of the currently played video frame to the receiving end, and the receiving end calls the corresponding playback application to play the audio and video file, plays the video frame corresponding to the received frame number, and, based on the timestamp information of that video frame, synchronously plays the audio information corresponding to the same timestamp, thereby achieving synchronous playback of the same audio and video file on the sharing end and the receiving end.
  • when the sharing end adjusts the playback speed of the audio and video file (such as fast forward, rewind or pause), the refresh rate of the frame numbers received by the receiving end is adjusted synchronously, so the receiving end also adjusts the playback speed of the audio and video file synchronously, further ensuring synchronous playback of the same audio and video file on the sharing end and the receiving end.
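On the receiving side, playback can simply follow whatever frame number arrives, so that fast forward, rewind or pause on the sharing end is reflected automatically; the sketch below assumes a hypothetical channel object and a local player capable of seeking to a numbered frame.

```python
import json

def follow_shared_playback(channel, player):
    """Receiving-end loop: follow whatever frame number the sharing end reports.
    Fast forward, rewind or pause on the sharing end changes the incoming frame
    numbers, so local playback adjusts automatically."""
    while True:
        message = channel.receive()                  # blocking read (assumed channel API)
        if message is None:                          # channel closed, sharing finished
            break
        sync = json.loads(message)
        if sync.get("type") != "sync":
            continue                                 # ignore non-sync payloads in this sketch
        frame_number = sync["frame_number"]
        player.seek_to_frame(frame_number)           # show the frame the sender is showing
        player.play_audio_for_frame(frame_number)    # audio sharing the same timestamp
```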
  • before step S32, the method also includes:
  • establishing an audio and video call with the receiving end, and determining the data transmission channel based on the call data channel of the audio and video call.
  • if the current audio and video call is a VoNR+ video call, a new data channel can be created as the data transmission channel to transmit the audio and video stream or the application data information; otherwise, the call data channel of the current audio and video call can be used directly as the data transmission channel to transmit the audio and video stream to the receiving end.
  • for example, when establishing a VoNR+ video call, a non-Bootstrap data channel can be established to transmit the audio stream and video frames to the receiving end, and a Bootstrap data channel can be established to transmit to the receiving end the application data information corresponding to the target application used to play the audio and video file.
  • VoNR+ is also known as 5G New Call.
  • VoNR+ refers to adding a new data transmission channel to 5G VoNR multimedia real-time communication, providing users with richer real-time interactive services in addition to high-definition audio and video, and establishing a unified, open network architecture centered on a multimedia real-time communication capability platform, enabling agile development and rapid deployment of innovative services while remaining compatible with existing services.
  • VoNR+ is a real-time communication network architecture based on VoNR (5GNR), which can quickly integrate new business forms to meet people's diverse communication needs.
  • the interactive channel carrying real-time interactive information is wider, the types of interactive content are more numerous, and the forms of interaction are richer.
  • the sharing end first transmits the application data information for playing the audio and video to be shared to the receiving end through the Bootstrap data channel.
  • the receiving end generates the corresponding target application from the application data information, which can be a temporary application (such as a mini program) or an application download link through which the receiving end downloads the corresponding media playback application.
  • after completing the transmission of the target application, the sharing end simultaneously plays the parsed and synchronized audio stream and video stream, synchronously transmits the audio stream and video frames to the receiving end through a non-Bootstrap data channel, and transmits to the receiving end in real time the frame number of the video frame currently being played on the sharing end.
  • the receiving end synchronously plays the audio and video files transmitted by the sharing end through the target application based on the frame number of the current video frame.
  • the Bootstrap data channel is a data channel with a Stream ID less than 1000. It is used by the terminal to obtain HTML web pages from the network side, that is, the data channel application defined in the 3GPP TS 26.114 specification (such an application generally includes HTML, JavaScript scripts, CSS, etc.).
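Based on the Stream ID rule quoted above (Bootstrap data channels use Stream IDs below 1000), routing content onto the two channel types could be sketched as follows; the threshold comes from the text, while the function names and concrete IDs are illustrative assumptions.

```python
BOOTSTRAP_STREAM_ID_LIMIT = 1000   # per the text: Bootstrap channels use Stream IDs below 1000

def is_bootstrap(stream_id: int) -> bool:
    """Bootstrap channels carry the data channel application itself
    (HTML, JavaScript, CSS); other channels carry the application's data."""
    return stream_id < BOOTSTRAP_STREAM_ID_LIMIT

def choose_stream_id(payload_kind: str,
                     bootstrap_id: int = 10,
                     media_id: int = 1001) -> int:
    # Application data information (the player pushed to the receiving end) goes
    # over the Bootstrap channel; audio streams and video frames go over a
    # non-Bootstrap channel.  The concrete IDs are illustrative assumptions.
    return bootstrap_id if payload_kind == "application" else media_id
```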
  • Figure 4 is a schematic flowchart of a third embodiment of the audio and video sharing method of the present disclosure.
  • before step S30, the method specifically includes:
  • Step S031: determine the target audio channel corresponding to the audio information according to the data type of the audio information;
  • Step S032: switch the first audio channel corresponding to the video player and the second audio channel corresponding to the audio and video call to the target audio channel, so as to play the audio information through the target audio channel while synchronously sending the audio information to the receiving end.
  • when the sharing end transmits the audio stream and video frames, it first needs to synchronize them through the timestamp information; after synchronization is completed, the audio stream and video frames are not transmitted immediately. Instead, the video player plays the audio stream and video frames simultaneously so that they start playing from the same starting time, and, while playing, the audio stream and video frames are transmitted synchronously to the receiving end so that the receiving end can play the synchronized audio and video stream.
  • because the audio stream is transmitted to the receiving end while it is being played, in order to avoid audio and video falling out of sync due to transmission delays caused by operations such as transcoding, the target audio channel must be determined before the sharing end starts playing the audio stream, and the audio channels of the video player and of the call data channel are then switched to this audio channel.
  • in this way, while the audio stream is being played it can be output synchronously, the transcoding step is avoided, and the timestamps of the transmitted and played audio streams remain synchronized, thereby achieving synchronous output of the audio stream and video frames. For example, before playback, the audio channel of the video player can be switched to the audio channel corresponding to the call mode; when the audio and video file is played through the video player, the audio stream can be output and simultaneously transmitted to the receiving end, where it is played through the receiving end's call audio channel.
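The audio-route switch described above amounts to: pick the target channel from the data type, then point both the player output and the call audio at it before playback starts. The sketch below expresses that ordering; all class and method names are illustrative assumptions.

```python
class AudioRouter:
    """Toy model of the audio channel switch: pick the target channel from the
    data type, then point both the video player and the call at it before
    playback starts, so played and transmitted audio stay aligned."""

    def __init__(self, player, call):
        self.player = player   # hypothetical video player handle
        self.call = call       # hypothetical call / data channel handle

    def target_channel_for(self, data_type: str) -> str:
        # e.g. decoded PCM sample data is routed to the call (voice) audio channel
        return "call_audio" if data_type == "pcm" else "media_audio"

    def prepare_for_sharing(self, data_type: str) -> None:
        target = self.target_channel_for(data_type)
        # Switch BEFORE playback starts, so no extra transcoding step is needed later
        self.player.set_audio_channel(target)   # first audio channel -> target
        self.call.set_audio_channel(target)     # second audio channel -> target

    def play_and_send(self, audio_block) -> None:
        self.player.play(audio_block)            # local playback through the target channel
        self.call.send_audio(audio_block)        # simultaneous transmission to the receiving end
```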
  • Figure 5 is a schematic flowchart of a fourth embodiment of the audio and video sharing method of the present disclosure.
  • step S30 specifically includes:
  • Step S310: determine, based on the application information to which the audio and video to be shared belong, the application whitelist corresponding to the application to which the audio and video to be shared belong, and determine whether the application to which the audio and video call belongs is in the application whitelist;
  • Step S320: if the application to which the audio and video call belongs is in the application whitelist, send the audio and video to be shared to the receiving end based on the respective video frames and the audio information corresponding to the respective video frames.
  • an OTT audio and video call (such as WeChat, Skype, etc.) is a call type that goes beyond the operators and is carried over the Internet.
  • if the audio and video to be shared come from a short video application (such as Douyin, Kuaishou, etc.), it is necessary to obtain the shareable application whitelist of that application; the whitelist shows the call applications that are allowed to share audio and video files derived from the short video application.
  • when the call application is in the whitelist, the audio and video to be shared can be shared to the receiving end directly through the call application; when the call application is not in the whitelist and needs to be used to share the audio and video to be shared with the receiving end, the permissions of the call application must first be set and the call application added to the whitelist before the audio and video to be shared can be shared.
  • for example, the sharing terminal needs to share an audio and video file from Douyin during a WeChat call. At this time, the sharing terminal needs to determine whether WeChat has permission to share that file: if WeChat is not selected in the whitelist, WeChat does not have the sharing permission, so the permissions need to be set so that WeChat can share Douyin audio and video files.
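The permission check in steps S310 and S320 amounts to a whitelist lookup before sending; a minimal sketch with hypothetical application names is shown below.

```python
def can_share(source_app: str, call_app: str,
              whitelists: dict[str, set[str]]) -> bool:
    """Return True if the call application is in the whitelist of the
    application the shared audio and video come from."""
    return call_app in whitelists.get(source_app, set())

def share_if_allowed(source_app, call_app, whitelists, send, frames_with_audio):
    if not can_share(source_app, call_app, whitelists):
        # Sharing is refused until the user sets the permission, i.e. adds the
        # call application to the source application's whitelist.
        raise PermissionError(f"{call_app} is not whitelisted by {source_app}")
    for frame, audio in frames_with_audio:
        send(frame, audio)        # send each video frame with its audio information

# Hypothetical example: a Douyin whitelist that already contains WeChat.
# can_share("Douyin", "WeChat", {"Douyin": {"WeChat"}})  ->  True
```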
  • embodiments of the present disclosure also provide an audio and video sharing device.
  • FIG. 6 is a schematic diagram of functional modules of the audio and video sharing device according to the first embodiment of the present disclosure.
  • the audio and video sharing device includes:
  • the audio and video parsing module 10 is used to obtain the audio stream and video stream in the audio and video to be shared, and parse the video stream into video frames;
  • the audio and video stream synchronization module 20 is used to determine the audio information corresponding to each video frame based on the timestamp information of the audio and video to be shared;
  • the audio and video sending module 30 is configured to, when receiving an audio and video sharing instruction, send the video frame and the audio information corresponding to the video frame to the receiving end to share the audio and video to be shared.
  • the audio and video sending module 30 specifically includes:
  • An audio and video stream synchronization output unit is used to determine the current video frame among the video frames, and to synchronously output the current video frame and the audio information corresponding to the current video frame according to the frame number of the current video frame, to obtain a synchronized audio and video stream;
  • the audio and video stream sending unit is configured to send the synchronized audio and video stream to the receiving end based on the data transmission channel, so that the receiving end can synchronously play the audio and video to be shared.
  • the audio and video stream sending unit specifically includes:
  • the application data information sending subunit is configured to send application data information corresponding to the target application to the receiving end through the application data channel in the data transmission channel; wherein the target application is used to play the audio and video stream;
  • the audio and video stream sending subunit is configured to send the synchronized audio and video stream to the receiving end through the transmission data channel in the data transmission channel.
  • the audio and video sharing device also includes a network data server module.
  • the network data server module specifically includes:
  • a network server query unit used to determine whether the application data information exists in the network data server
  • a local application information sending unit is configured to send the local application data information to the receiving end based on the application data channel when the application data information does not exist in the network data server.
  • a network-side application sending unit is configured to send the application data information to the receiving end through the network data server if the application data information exists in the network data server.
  • the audio and video sharing device also includes a data transmission channel determination module.
  • the data transmission channel determination module specifically includes:
  • An audio and video call type judgment unit is used to establish an audio and video call with the receiving end. If the audio and video call is not the VoNR+ call type, use the call data channel of the audio and video call as the data transmission channel;
  • the audio and video analysis module 10 includes:
  • the audio and video file parsing unit is used to read the audio and video to be shared through the currently running video player, and parse the audio and video file to be shared into the audio stream and the video frame corresponding to the video stream.
  • the audio and video sharing device includes an audio channel switching module, and the audio channel switching module specifically includes:
  • a target audio channel determination unit configured to determine the target audio channel corresponding to the audio information according to the data type of the audio information
  • An audio channel switching unit is used to switch the first audio channel corresponding to the video player and the second audio channel corresponding to the audio and video call to the target audio channel, so as to play the audio information through the target audio channel while playing The audio information is synchronously sent to the receiving end.
  • the audio and video sharing device includes a whitelist module
  • the whitelist module specifically includes:
  • An application whitelist determination unit is configured to determine, based on the application information to which the audio and video to be shared belong, the application whitelist corresponding to that application, and to determine whether the application to which the audio and video call belongs is in the application whitelist;
  • An audio and video file sending unit is configured to, if the application to which the audio and video call belongs is in the application whitelist, send the audio and video to be shared to the receiving end based on the respective video frames and the audio information corresponding to the respective video frames.
  • Each module in the above-mentioned audio and video sharing device corresponds to each step in the above-mentioned audio and video sharing method embodiment, and their functions and implementation processes will not be described in detail here.
  • embodiments of the present disclosure also provide a computer-readable storage medium.
  • the computer-readable storage medium of the present disclosure stores an audio and video sharing program.
  • the audio and video sharing program is executed by a processor, the steps of the above audio and video sharing method are implemented.
  • the terms "include", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or system that includes a list of elements not only includes those elements, but also includes other elements not expressly listed or that are inherent to the process, method, article or system. Without further limitation, an element qualified by the statement "comprises a ..." does not exclude the presence of additional identical elements in the process, method, article or system that includes that element.
  • the present disclosure may be used in numerous general purpose or special purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics devices, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices, etc.
  • the present disclosure may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • the present disclosure may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communications network.
  • program modules may be located in both local and remote computer storage media including storage devices.
  • the embodiment method can be implemented by means of software plus the necessary general hardware platform. Of course, it can also be implemented by hardware, but in many cases the former is a better implementation method.
  • the technical solution of the present disclosure, in essence or in the part that contributes beyond the existing technology, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium as mentioned above (such as a ROM/RAM, a magnetic disk or an optical disk), and includes several instructions to cause a terminal device (which can be a mobile phone, computer, server, air conditioner, or network device, etc.) to execute the methods described in the various embodiments of the present disclosure.

Abstract

The present disclosure relates to communication technology. Provided in the present disclosure are an audio and video sharing method and device, and a computer-readable storage medium. The method comprises: acquiring an audio stream and a video stream in audio and video to be shared, and parsing the video stream into video frames; determining audio information corresponding to each video frame according to timestamp information of said audio and video; and when an audio and video sharing instruction has been received, sending to a receiving end the video frames and the audio information corresponding to the video frames, so as to share said audio and video. By means of the method in the present disclosure, audio information corresponding to each video frame is determined according to timestamp information of an audio and video file, thus realizing accurate synchronization of a video stream and an audio stream; and according to an audio and video sharing instruction, the video frames and the audio information corresponding to the current video frames are synchronously transmitted, and the audio and video file with synchronized audio and picture is received at a receiving end, thus improving the user experience.

Description

Audio and video sharing method, device and computer-readable storage medium
Cross-reference to related applications
The present disclosure is based on, and claims priority to, Chinese patent application CN202210606704.8, entitled "Audio and video sharing method, device and computer-readable storage medium" and filed on May 31, 2022, the entire disclosure of which is incorporated into the present disclosure by reference.
Technical field
The present disclosure relates to the field of communication technology, and in particular to an audio and video sharing method, device and computer-readable storage medium.
Background technique
With the continuous development of communication technology, network communication has an increasing impact on users' daily lives. In daily life, people often use network communication to communicate promptly across spatial constraints and to transmit information rapidly. The video sharing function based on network communication can be used to share the local end's information with the other party. However, it is currently not possible to share audio and video files synchronously during a call: the receiving end needs to view the audio and video files sent by the sending end locally, and the audio and video files suffer from audio and video being out of sync when output, which degrades the user experience. Therefore, how to share audio and video files with synchronized sound and picture during audio and video calls has become an urgent technical problem.
Contents of the invention
The main purpose of the present disclosure is to provide an audio and video sharing method, device and storage medium, aiming to solve the existing technical problem that audio and video files with synchronized sound and picture cannot be shared during audio and video calls.
The present disclosure provides an audio and video sharing method. The audio and video sharing method includes: obtaining the audio stream and video stream in the audio and video to be shared, and parsing the video stream into video frames; determining, based on the timestamp information of the audio and video to be shared, the audio information corresponding to each video frame; and, when an audio and video sharing instruction is received, sending the video frames and the audio information corresponding to the video frames to the receiving end to share the audio and video to be shared.
The present disclosure also provides an audio and video sharing device. The audio and video sharing device includes a processor, a memory, and an audio and video sharing program stored on the memory and executable by the processor, wherein, when the audio and video sharing program is executed by the processor, the steps of the above audio and video sharing method are implemented.
The present disclosure also provides a computer-readable storage medium. An audio and video sharing program is stored on the computer-readable storage medium, wherein, when the audio and video sharing program is executed by a processor, the steps of the above audio and video sharing method are implemented.
Description of the drawings
Figure 1 is a schematic diagram of the hardware structure of the audio and video sharing device involved in the embodiments of the present disclosure;
Figure 2 is a schematic flowchart of the first embodiment of the audio and video sharing method of the present disclosure;
Figure 3 is a schematic flowchart of the second embodiment of the audio and video sharing method of the present disclosure;
Figure 4 is a schematic flowchart of the third embodiment of the audio and video sharing method of the present disclosure;
Figure 5 is a schematic flowchart of the fourth embodiment of the audio and video sharing method of the present disclosure;
Figure 6 is a schematic diagram of the functional modules of the first embodiment of the audio and video sharing device of the present disclosure.
The realization of the purpose, functional features and advantages of the present disclosure will be further described with reference to the embodiments and the accompanying drawings.
Detailed description
It should be understood that the specific embodiments described here are only used to explain the present disclosure and are not used to limit the present disclosure.
The audio and video sharing method involved in the embodiments of the present disclosure is mainly applied to an audio and video sharing device, which may be a device with display and processing functions such as a PC, a portable computer or a mobile terminal.
Referring to Figure 1, Figure 1 is a schematic diagram of the hardware structure of the audio and video sharing device involved in the embodiments of the present disclosure. In the embodiments of the present disclosure, the audio and video sharing device may include a processor 1001 (such as a CPU), a communication bus 1002, a user interface 1003, a network interface 1004 and a memory 1005. The communication bus 1002 is used to realize connection and communication between these components; the user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard); the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface); the memory 1005 may be a high-speed RAM memory or a non-volatile memory such as a disk memory, and may optionally be a storage device independent of the aforementioned processor 1001.
Those skilled in the art can understand that the hardware structure shown in Figure 1 does not constitute a limitation on the audio and video sharing device, which may include more or fewer components than shown in the figure, combine certain components, or use a different arrangement of components.
Continuing to refer to Figure 1, the memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module and an audio and video sharing program.
In Figure 1, the network communication module is mainly used to connect to a server and perform data communication with the server, and the processor 1001 can call the audio and video sharing program stored in the memory 1005 and execute the audio and video sharing method provided by the embodiments of the present disclosure.
The embodiments of the present disclosure provide an audio and video sharing method.
Referring to Figure 2, Figure 2 is a schematic flowchart of the first embodiment of the audio and video sharing method of the present disclosure.
In this embodiment, the audio and video sharing method includes the following steps:
Step S10: obtain the audio stream and video stream in the audio and video to be shared, and parse the video stream into video frames.
In this embodiment, the audio and video to be shared can be played through the video player in the mobile terminal; when the video player reads the audio and video file, it transcodes and decodes the video stream and the audio stream in the file separately, converting the format of the audio and video file into a code stream supported by the video player, and then decoding and outputting the transcoded audio and video streams.
Further, step S10 specifically includes: reading the audio and video to be shared through the currently running video player, and parsing the audio and video file to be shared into the audio stream and the video frames corresponding to the video stream.
In this embodiment, an audio and video file is compressed in one or more formats, that is, video encoding and audio encoding; the purpose of encoding is to reduce the amount of data and facilitate storage and transmission. When opening an audio and video file, it is necessary to determine whether the current video player has a parsing protocol for the compression format of the file; if not, a video player that can parse it must be used instead, or the audio and video file must be converted into a compression format that the player can parse.
Specifically, because the audio and video file contains both audio and video, and the audio stream and the video stream are compressed separately with different compression algorithms and therefore decoded differently, the audio stream and the video stream need to be decoded separately.
Understandably, for streaming media, the audio and video files need to be parsed into standard encapsulation-format data, for example parsing data transmitted over the RTMP protocol and outputting data in FLV format.
In a specific embodiment, although the audio stream and the video stream are compressed separately, they are bundled together during transmission. Therefore, when parsing an audio and video file, the audio stream and the video stream first need to be separated, that is, demultiplexed or decapsulated: the input encapsulated-format data is separated into compressed audio stream data and compressed video stream data. There are many encapsulation formats, such as MP4, MKV, RMVB, TS, FLV, AVI and so on; their function is to put the compressed and encoded video data and audio data together in a certain format. For example, after decapsulating data in FLV format, an H.264-encoded video stream and an AAC-encoded audio stream are output.
In a specific embodiment, a decoder decodes the compressed audio/video data into uncompressed raw audio/video data, where the decoded output of the compressed video data is uncompressed color data such as YUV or RGB, and the decoded output of the compressed audio data is uncompressed audio sample data such as PCM data.
Specifically, the parsing module of the video player parses the video stream into a series of video frames of the specified size and numbers the video frames to obtain the frame number of each video frame.
In a specific embodiment, when an IMS video call (such as VoNR, VoLTE or VoWiFi) is established, both parties to the call negotiate the picture size, frame rate, etc. of the video call. The parsing module can parse the audio and video file (such as a .mp4 file) into static video frames of the corresponding size (such as 480×640) according to the picture size of the video call.
In a specific embodiment, the video player synchronizes the decoded video data and audio data based on the parameter information obtained during the processing of the decapsulation module, and plays them through the graphics card and sound card of the sharing end.
步骤S20,根据所述待分享音视频的时间戳信息,确定各视频帧对应的音频信息;Step S20: Determine the audio information corresponding to each video frame based on the timestamp information of the audio and video to be shared;
In this embodiment, when the audio and video to be shared are parsed, both a display (presentation) timestamp and a decoding timestamp exist. For audio, the display timestamp and the decoding timestamp are identical; for video, inter-frame compression causes the decoding order to differ from the display order, so the display timestamp and the decoding timestamp are not the same.
其中,显示时间戳是指从数据包解码出来的数据的显示顺序;解码时间戳是指数据包的解码顺序。Among them, the display timestamp refers to the display order of data decoded from the data packet; the decoding timestamp refers to the decoding order of the data packet.
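The difference between the two orderings can be seen in a small, self-contained example; the timestamp values below are invented purely for illustration.

    # Illustration: with bidirectionally predicted frames, decode order (DTS) and
    # display order (PTS) differ for video, while they coincide for audio.
    packets = [
        ("I", 0, 0),   # (frame type, decoding timestamp, presentation timestamp)
        ("P", 1, 3),
        ("B", 2, 1),
        ("B", 3, 2),
    ]
    decode_order = [t for t, _, _ in sorted(packets, key=lambda p: p[1])]   # ['I', 'P', 'B', 'B']
    display_order = [t for t, _, _ in sorted(packets, key=lambda p: p[2])]  # ['I', 'B', 'B', 'P']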
Specifically, in existing technical solutions, the audio stream and the video stream are mostly synchronized by calculating their relative playback times and judging the difference between the two; when the difference is small, the lagging stream catches up by delaying, and when the video stream lags far behind the audio stream, it has to catch up by dropping frames. This approach cannot achieve precise synchronization adjustment, so the sound and picture can still drift out of sync; and if frames are dropped, playback becomes choppy and the user experience is poor.
In this embodiment, after the audio and video to be shared are parsed into an audio stream and a video stream, they are not transmitted or played immediately; instead, the timestamps of the parsed audio stream and video stream are aligned to determine the audio information corresponding to each video frame.
Specifically, a reference clock (such as the processor's system clock) can first be selected; during encoding, each audio and video data block is stamped with a timestamp based on this reference clock; when the audio stream and the video stream are parsed, every video frame and every piece of audio information therefore also carries timestamp information, from which the correspondence between video frames and audio information can be determined.
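One possible way to realize this alignment step is sketched below; the data structures and field names are illustrative only and do not appear in the disclosure.

    from bisect import bisect_right
    from dataclasses import dataclass

    @dataclass
    class VideoFrame:
        frame_no: int   # frame number assigned while parsing the video stream
        pts: float      # presentation timestamp on the shared reference clock (seconds)

    @dataclass
    class AudioChunk:
        pts: float      # presentation timestamp of the first sample in the chunk
        pcm: bytes      # uncompressed audio samples

    def align(frames, chunks):
        """Map each video frame number to the audio chunk covering its timestamp."""
        starts = [c.pts for c in chunks]                 # chunks assumed sorted by pts
        return {
            f.frame_no: chunks[max(bisect_right(starts, f.pts) - 1, 0)]
            for f in frames
        }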
步骤S30,在接收到音视频分享的指令时,向接收端发送所述视频帧以及所述视频帧对应的音频信息,以进行所述待分享音视频的共享。Step S30: When receiving an audio and video sharing instruction, the video frame and the audio information corresponding to the video frame are sent to the receiving end to share the audio and video to be shared.
In the prior art, to share an audio and video file, the file can be sent to the receiving end and then played and displayed there; however, because the playback speeds of the video stream and the audio stream are inconsistent, the file played at the receiving end suffers from the sound and picture being out of sync.
In this embodiment, by using the data channel of the established audio and video call or by creating a new data channel, video frames and audio information corresponding to the same timestamp are output to the receiving end synchronously, in timestamp order, so that what the receiving end receives are audio stream and video frame files that have been corrected and synchronized according to the timestamp information. Moreover, before sending the audio stream and video frames, the sharing end first plays the video frame and its corresponding audio information synchronously according to the synchronization signal (that is, the frame number of the currently played video frame), thereby outputting a synchronized audio and video stream; having received this synchronized stream, the receiving end can play an audio and video file whose sound and picture are in sync.
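A highly simplified sender loop in this spirit is shown below. The callable send_to_receiver stands in for whichever data channel is used and is a hypothetical name, and pacing is reduced to a fixed frame rate; it is a sketch, not the disclosed implementation.

    import time

    def share(frames, audio_by_frame, send_to_receiver, fps=25.0):
        """Play and transmit each frame together with its matching audio, in timestamp order."""
        for frame in frames:                         # frames already sorted by timestamp
            chunk = audio_by_frame[frame.frame_no]   # audio aligned to the same timestamp
            send_to_receiver({
                "frame_no": frame.frame_no,          # synchronization signal for the receiving end
                "video": frame,
                "audio": chunk,
            })
            time.sleep(1.0 / fps)                    # pace output at the negotiated frame rate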
本实施例中,建立音视频通话的通话类型可以是IMS通话类型,也可以是OTT通话类型。In this embodiment, the call type for establishing the audio and video call may be an IMS call type or an OTT call type.
其中,IMS又叫IP多媒体子系统,可以在分组交换网络下实现语音业务,包括VoNR、VoLTE、VoWiFi等类型。Among them, IMS is also called IP multimedia subsystem, which can implement voice services under packet switching networks, including VoNR, VoLTE, VoWiFi and other types.
具体地,VoNR是基于纯5G接入的通话解决方案,实现话音业务和数据业务均承载在5G网络。Specifically, VoNR is a call solution based on pure 5G access, which enables both voice and data services to be carried on the 5G network.
Specifically, VoLTE is an IP data transmission technology that does not require a 2G/3G network; all services are carried on the 4G network, unifying data and voice services on the same network. VoLTE is an end-to-end voice solution built on an all-IP 4G network. Compared with traditional calls, VoLTE high-definition calls connect faster and have a drop rate close to zero, and LTE's spectrum efficiency is far better than that of legacy systems.
具体地,VoWiFi是通过WiFi网络提供的语音业务,用户可以在没有移动信号的条件下拨打电话,VoWiFi是VoLTE的互补技术。相比传统的通话服务,VoWiFi优先利用WiFi连网实现通话功能且自动无缝转换,实现移动网络及WiFi网络间的自动转换,用户无需特别设置就可以在不同地点实现通话。更重要的是,利用WiFi连网克服了室内或地下室信号不良的问题。在网络覆盖较弱或受干扰的地方,只要能连上WiFi,就可拨打或接听电话。Specifically, VoWiFi is a voice service provided through WiFi networks. Users can make calls without mobile signals. VoWiFi is a complementary technology to VoLTE. Compared with traditional calling services, VoWiFi prioritizes the use of WiFi networking to implement calling functions and automatically and seamlessly switches between mobile networks and WiFi networks. Users can make calls in different locations without special settings. More importantly, using WiFi networking overcomes the problem of poor signal indoors or in the basement. In places with weak network coverage or interference, you can make or receive calls as long as you can connect to WiFi.
Here, OTT refers to Internet companies bypassing the network operators (China Telecom, China Mobile, China Unicom) to provide Internet-based value-added services such as video, social networking, games and data services, including applications such as WeChat and Skype.
具体地,微信是一个为智能终端提供即时通讯服务的应用程序,能够实现跨通信运营商、跨操作系统平台、通过网络快速发送语音短信、视频、图片和文字等。Specifically, WeChat is an application that provides instant messaging services for smart terminals. It can quickly send voice messages, videos, pictures, text, etc. across communication operators and operating system platforms through the network.
具体地,Skype是一款即时通讯软件,其具备IM所需的功能,比如视频聊天、多人语音会议、多人聊天、传送文件、文件聊天等功能。Specifically, Skype is an instant messaging software that has the functions required for IM, such as video chat, multi-person voice conferencing, multi-person chat, file transfer, file chat, etc.
Understandably, after the audio and video sharing is completed, the video player is closed and the sharing ends; the sharing end can then switch the current display interface back to the pictures captured in real time by the terminal camera, switch the current audio device back to the call audio device to capture the user's voice, and send the pictures and voice captured in real time to the receiving end through the call data channel until the audio and video call ends.
The present disclosure provides an audio and video sharing method. The method obtains the audio stream and the video stream of the audio and video to be shared and parses the video stream into video frames; determines, in the audio stream, the audio information corresponding to each video frame according to the timestamp information of the audio and video to be shared; and, based on the call data channel of the audio and video call or a newly created data channel, the video frames and the audio information corresponding to each video frame, sends the audio and video to be shared to the receiving end synchronously. In this way, the present disclosure parses the video stream of an audio and video file into video frames and parses the timestamp information corresponding to the video stream, so that a correspondence exists between video frames and timestamps; the audio stream of the file likewise corresponds to the same timestamp information, so the audio information corresponding to each video frame can be determined and precise synchronization of the video stream and the audio stream can be achieved. During an audio and video call, by using the call's data channel or establishing a new data channel for data transmission, the video frame and the audio information corresponding to the current video frame are transmitted synchronously, so the receiving end receives precisely synchronized video and audio streams, that is, an audio and video file whose sound and picture are in sync, which improves the user experience and solves the current technical problem that audio and video files with synchronized sound and picture cannot be shared during a video call.
参照图3,图3为本公开音视频共享方法第二实施例的流程示意图。Referring to Figure 3, Figure 3 is a schematic flow chart of a second embodiment of the audio and video sharing method of the present disclosure.
基于上述图2所示实施例,本实施例中,所述步骤S30,具体包括:Based on the above embodiment shown in Figure 2, in this embodiment, step S30 specifically includes:
步骤S31,在所述视频帧中确定当前视频帧,并根据所述当前视频帧的帧号,同步输出所述当前视频帧以及所述当前视频帧对应的所述音频信息,获得同步的音视频流;Step S31, determine the current video frame in the video frame, and synchronously output the current video frame and the audio information corresponding to the current video frame according to the frame number of the current video frame, to obtain synchronized audio and video flow;
In this embodiment, the video stream and the audio stream whose timestamps have been aligned are played simultaneously by the video player of the sharing end; during parsing, the video stream is further parsed into video frames, and the video frames are numbered so that each video frame has a frame number.
Further, when the video stream and the audio stream are played synchronously by the video player, the video frame corresponding to the frame number of the currently played position is output synchronously, while the audio stream is output to the audio front end at the same time as it is played, so that the video frames and the audio stream are output together and a synchronized audio and video stream is obtained.
步骤S32,基于数据传输通道,向所述接收端发送所述同步的音视频流,以使所述接收端同步播放所述待分享音视频。Step S32: Send the synchronized audio and video stream to the receiving end based on the data transmission channel, so that the receiving end plays the audio and video to be shared synchronously.
In this embodiment, according to the type of the audio and video call, the synchronously output audio and video stream is sent to the receiving end through the call data channel of the audio and video call or through a newly created data channel, and the receiving end can invoke a corresponding target application (such as a video player or an audio and video application) to play the synchronized audio and video stream directly.
进一步地,所述步骤S32具体包括:Further, the step S32 specifically includes:
通过所述数据传输通道中的应用数据通道向所述接收端发送目标应用对应的应用数据信息;其中,所述目标应用用于播放所述音视频流;Send application data information corresponding to the target application to the receiving end through the application data channel in the data transmission channel; wherein the target application is used to play the audio and video stream;
通过所述数据传输通道中的传输数据通道向所述接收端发送所述同步的音视频流。The synchronized audio and video stream is sent to the receiving end through a transmission data channel in the data transmission channel.
In this embodiment, when a VoNR+ video call is established, a non-bootstrap data channel can be established to transmit the audio stream and video frames to the receiving end, and a bootstrap data channel can additionally be established to transmit, to the receiving end, the application data information of the target application used to play the audio and video file.
进一步地,所述通过所述应用数据通道向所述接收端发送目标应用对应的应用数据信息之前,还包括:Further, before sending the application data information corresponding to the target application to the receiving end through the application data channel, the method further includes:
判断网络数据服务器中是否存在所述应用数据信息; Determine whether the application data information exists in the network data server;
若所述网络数据服务器中不存在所述应用数据信息,则基于所述应用数据通道,将本地的所述应用数据信息发送至所述接收端。If the application data information does not exist in the network data server, the local application data information is sent to the receiving end based on the application data channel.
In this embodiment, the sharing end can send a SIP UPDATE message carrying the dcmap field to the network side and then establish a data channel with the network data server or with the receiving end; the sharing end may also store the audio and video to be shared in the network data server in advance.
Specifically, when the audio and video call is established and a data connection is set up between the sharing end and the receiving end, the receiving end also establishes a data channel with the network data server. On receiving an audio and video file sharing instruction, the sharing end displays the network data server according to the instruction and queries the target application on the network data server; if the target application exists, the query result points directly to its location, and the application data information corresponding to the target application is sent over the data channel between the receiving end and the network data server. If the target application does not exist on the network data server, the query returns no result, and the sharing end can choose to close the network data server and look up the target application locally.
具体地,共享端可以选择预先保存在网络数据服务器中的待分享音视频,通过音视频通话的数据通道传输给接收端,或者通过网络数据服务器与接收端之间的数据通道直接传输。在需要进行音视频文件共享时,共享端可以选择网络数据服务器中的音视频文件和应用,也可以选择存储于共享端本地的音视频文件和应用。Specifically, the sharing end can select audio and video to be shared that are pre-stored in the network data server and transmit them to the receiving end through the data channel of the audio and video call, or directly transmit them through the data channel between the network data server and the receiving end. When audio and video files need to be shared, the sharing end can select audio and video files and applications in the network data server, or can choose audio and video files and applications stored locally on the sharing end.
其中,dcmap字段表征需要建立数据通道,该数据通道可以是用于传输数据通道应用的Bootstrap的数据通道,也可以是用于传输数据通道应用需要的数据信息的非Bootstrap的数据通道。Among them, the dcmap field represents the need to establish a data channel. The data channel can be a Bootstrap data channel used to transmit data channel applications, or a non-Bootstrap data channel used to transmit data information required by data channel applications.
In a specific embodiment, if the audio and video to be shared and the application data information are not on the network data server, locally stored audio and video files or streaming video, together with the corresponding application, can be selected and transmitted to the receiving end through the data channel established for the call.
进一步地,若所述网络数据服务器中存在所述应用数据信息,则通过所述网络数据服务器向所述接收端发送所述应用数据信息。Further, if the application data information exists in the network data server, the application data information is sent to the receiving end through the network data server.
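The choice between the server-side copy and the locally stored application data could look roughly like the sketch below; every class and function name here is a placeholder invented for illustration and is not part of the disclosure.

    class NetworkDataServer:
        """Placeholder model of the network data server (illustration only)."""
        def __init__(self, stored=None):
            self.stored = stored or {}
        def has(self, app_id):
            return app_id in self.stored
        def push(self, app_id, deliver):
            deliver(self.stored[app_id])             # server delivers over its own channel to the receiver

    def send_application_data(app_id, server, app_channel_send, load_local, deliver):
        """Prefer the server-side copy; otherwise send the local copy over the application data channel."""
        if server.has(app_id):
            server.push(app_id, deliver)
        else:
            app_channel_send(load_local(app_id))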
In a specific embodiment, if the audio and video to be shared and the application data information have already been stored in the network data server in advance, the sharing end can select the corresponding audio and video file on the network data server and complete operations such as parsing and synchronization of the file on the sharing end; the synchronized audio stream and video frames are then sent to the receiving end through the data channel, and the frame number of the video frame currently played on the sharing end is transmitted in real time, so that the receiving end plays the corresponding video frame and audio stream and synchronized playback of the audio and video file is achieved at both ends of the call.
In a specific embodiment, the receiving end can obtain the audio and video file and the application directly from the network data server and open the file with that application; the sharing end transmits the frame number of the currently played video frame in real time, and the receiving end plays the video frame and audio stream corresponding to that frame number, thereby achieving synchronized playback of the audio and video file at both ends of the call.
Optionally, the sharing end can substitute the output video frames for the pictures currently captured by the camera and route the audio stream to the audio device normally fed by the microphone; using the frame number of the currently played video stream, the video frame and audio information corresponding to that frame number are output synchronously and transmitted to the receiving end over the call data channel of the audio and video call, where they are played synchronously.
As another implementation, the sharing end can send the audio and video file, in which the video frames and the audio stream are output synchronously, directly to the receiving end; after receiving the synchronized stream, the receiving end can save it locally as a local audio and video file, which may include a video frame file and an audio stream file. Then, during the audio and video call, the sending end synchronously sends the frame number of the currently played video frame to the receiving end; the receiving end invokes the corresponding playback application to play the file, plays the current video frame with that frame number, and, according to the timestamp information of the current video frame, synchronously plays the audio information carrying the same timestamp, thereby achieving synchronized playback of the same audio and video file at the sharing end and the receiving end.
Specifically, because synchronized output of the audio and video file is driven by frame numbers, when the sharing end adjusts the playback speed of the file (for example fast forward, rewind or pause), the refresh rate of the frame numbers received by the receiving end is adjusted accordingly, so the playback speed at the receiving end is adjusted in the same way, further ensuring synchronized playback of the same audio and video file at both ends.
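On the receiving side, playback driven by the frame numbers received from the sharing end might be organized as in the sketch below; the file layout and the player callable are placeholders for illustration only.

    def on_sync_signal(frame_no, local_file, player):
        """Jump local playback to the frame number just received from the sharing end."""
        frame = local_file["frames"][frame_no]       # video frame carrying the received number
        chunk = local_file["audio"][frame_no]        # audio aligned to the same timestamp
        player(frame, chunk)                         # render picture and sound together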
进一步地,所述步骤S32之前,还包括:Further, before step S32, it also includes:
与接收端建立音视频通话,若所述音视频通话不是所述VoNR+通话类型,则将所述音视频通话的通话数据通道作为所述数据传输通道;Establish an audio and video call with the receiving end. If the audio and video call is not the VoNR+ call type, use the call data channel of the audio and video call as the data transmission channel;
若所述音视频通话是VoNR+通话类型,则基于所述通话数据通道,与接收端建立数据通道,作为所述数据传输通道。If the audio and video call is a VoNR+ call type, a data channel is established with the receiving end based on the call data channel as the data transmission channel.
本实施例中,根据音视频通话的类型,确定是直接通过当前音视频通话的通话数据通道进行数据传输,或者是通过新建数据通道进行数据传输。In this embodiment, according to the type of the audio and video call, it is determined whether data transmission is performed directly through the call data channel of the current audio and video call, or data transmission is performed through a new data channel.
Specifically, if the current audio and video call is a VoNR+ video call, a new data channel can be created as the data transmission channel for carrying the audio and video stream or the application data information; otherwise, the call data channel of the current audio and video call can be used directly as the data transmission channel over which the audio and video stream is transmitted to the receiving end.
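This channel selection amounts to a simple branch; the sketch below uses invented names for the call object and its channels and is illustrative only.

    def choose_transport(call):
        """Reuse the call data channel, unless this is a VoNR+ call with a dedicated data channel."""
        if call["type"] == "VoNR+":
            return call["new_data_channel"]          # newly created channel carried alongside the call
        return call["call_data_channel"]             # ordinary calls reuse the existing call data channel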
In this embodiment, when a VoNR+ video call is established, a non-bootstrap data channel can be established to transmit the audio stream and video frames to the receiving end, and a bootstrap data channel can additionally be established to transmit, to the receiving end, the application data information of the target application used to play the audio and video file.
Here, VoNR+ (5G new calling) refers to adding a new data transmission channel on top of 5G VoNR multimedia real-time communication, providing users with richer real-time interactive services beyond high-definition audio and video, and building a unified, open network architecture centred on a multimedia real-time communication capability platform, enabling agile development and rapid deployment of innovative services while remaining compatible with existing services. VoNR+ is a real-time communication network architecture based on VoNR (5G NR) that can quickly integrate new service forms to meet diverse communication needs; the channel carrying real-time interactive information is wider, and the types and forms of interaction are richer.
In a specific embodiment, the sharing end first transmits the application data information used to play the audio and video to be shared to the receiving end through the bootstrap data channel; from this information the receiving end generates the corresponding target application, which may be a temporary application (such as a mini program) or an application download link through which the receiving end downloads the corresponding media playback application.
In a specific embodiment, after the transmission of the target application is completed, the sharing end, while playing the parsed and synchronized audio and video streams, transmits the audio stream and video frames to the receiving end synchronously through the non-bootstrap data channel and transmits the frame number of the video frame currently being played on the sharing end to the receiving end in real time; based on that frame number, the receiving end plays the transmitted audio and video file synchronously through the target application.
Here, a bootstrap data channel is a data channel whose Stream ID is less than 1000 and is used by the terminal to obtain HTML pages from the network side (that is, the data channel application defined in the 3GPP TS 26.114 specification, which generally includes HTML, JavaScript scripts, CSS and the like).
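The distinction can be captured in one line; the constant below simply restates the Stream ID threshold quoted above, and the function name is illustrative.

    BOOTSTRAP_STREAM_ID_LIMIT = 1000   # Stream IDs below this value identify bootstrap channels

    def is_bootstrap(stream_id: int) -> bool:
        """Bootstrap channels carry the data channel application (HTML/JS/CSS); the rest carry its data."""
        return stream_id < BOOTSTRAP_STREAM_ID_LIMIT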
参照图4,图4为本公开音视频共享方法第三实施例的流程示意图。Referring to Figure 4, Figure 4 is a schematic flowchart of a third embodiment of the audio and video sharing method of the present disclosure.
基于上述图2所示实施例,本实施例中,所述步骤S30之前,具体还包括:Based on the above embodiment shown in Figure 2, in this embodiment, before step S30, it specifically includes:
步骤S031,根据所述音频信息的数据类型,确定所述音频信息对应的目标音频通道; Step S031: Determine the target audio channel corresponding to the audio information according to the data type of the audio information;
Step S032: switch the first audio channel corresponding to the video player and the second audio channel corresponding to the audio and video call to the target audio channel, so that the audio information is sent to the receiving end synchronously, through the target audio channel, while it is being played.
In this embodiment, when transmitting the audio stream and video frames, the sharing end first synchronizes them using the timestamp information; once synchronization is complete, they are not transmitted immediately. Instead, the video player plays the audio stream and the video frames simultaneously so that both start playing from the same starting time, and while they are being played, the audio stream and video frames are transmitted to the receiving end synchronously, so that the receiving end can play an audio and video stream whose sound and picture are in sync.
Further, the audio stream is transmitted to the receiving end while it is being played. To avoid lip-sync problems caused by transmission delays from operations such as transcoding, before the sharing end starts playing the audio stream it first determines the audio channel over which the stream will be transmitted, and then switches both the video player's audio path and the audio path of the call data channel to that channel; the audio stream can then be output synchronously as it is played, the transcoding step is avoided, the timestamps of the output audio stream and the played audio stream remain aligned, and the audio stream and the video frames are output synchronously. For example, before playback, the audio channel of the video player can be switched to the loudspeaker audio channel used in the call mode, so that when the audio and video file is played by the video player the audio stream is output, its data is simultaneously transmitted to the receiving end, and it is played through the receiving end's call audio channel.
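Switching both audio paths to a single target channel before playback begins could be sketched as follows; the objects, the audio-type value and the channel names are hypothetical and stand in for whatever the platform audio service actually exposes.

    def prepare_audio_route(player, call, audio_type):
        """Route the video player and the call through one audio channel so no transcoding is needed."""
        target = "call_speaker" if audio_type == "pcm" else "media_speaker"   # illustrative mapping
        player["audio_channel"] = target             # first audio channel: the video player
        call["audio_channel"] = target               # second audio channel: the audio/video call
        return target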
参照图5,图5为本公开音视频共享方法第四实施例的流程示意图。Referring to Figure 5, Figure 5 is a schematic flowchart of a fourth embodiment of the audio and video sharing method of the present disclosure.
基于上述图2所示实施例,本实施例中,所述步骤S30,具体还包括:Based on the above embodiment shown in Figure 2, in this embodiment, step S30 specifically includes:
步骤S310,根据所述待分享音视频所属应用信息,确定所述待分享音视频所属应用对应的应用白名单,并判断所述音视频通话所属应用是否属于所述应用白名单;Step S310: Determine an application whitelist corresponding to the application to which the audio and video to be shared belongs based on the application information to which the audio and video to be shared belongs, and determine whether the application to which the audio and video call belongs belongs to the application whitelist;
步骤S320,若所述音视频通话所属应用属于所述应用白名单,则基于所述各个视频帧以及所述各个视频帧对应的音频信息,向所述接收端发送所述待分享音视频。Step S320: If the application to which the audio and video call belongs belongs to the application whitelist, the audio and video to be shared are sent to the receiving end based on the respective video frames and the audio information corresponding to the respective video frames.
In this embodiment, when an OTT audio and video call (for example via WeChat or Skype) is established, because an OTT call is an Internet-based call type that bypasses the operators, the permissions of the application used to establish the OTT call need to be checked before audio and video files are shared.
Specifically, when the audio and video to be shared come from a short-video application (such as Douyin or Kuaishou), the shareable-application whitelist of that application needs to be obtained; this whitelist lists the call applications that are allowed to share audio and video files originating from the short-video application.
Specifically, when the call application used to establish the OTT audio and video call is on the whitelist, the audio and video to be shared can be shared with the receiving end directly through that application; when it is not on the whitelist, the application must first be added to the whitelist by setting its permissions before the audio and video can be shared. For example, when an audio and video call is established through WeChat and the sharing end wants to share an audio and video file from Douyin, the sharing end needs to determine whether WeChat has permission to share that file; if WeChat is not in the whitelist, WeChat does not have sharing permission, and the permission must be set so that WeChat can share Douyin audio and video files.
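The permission check reduces to a membership test on the whitelist; the sketch below is illustrative only and the application names in the usage line are merely examples.

    def may_share(call_app, whitelist):
        """The call application may forward the file only if it is on the source app's whitelist."""
        return call_app in whitelist

    # Example: sharing a Douyin clip over a WeChat call only proceeds if WeChat is whitelisted.
    assert may_share("WeChat", {"WeChat", "Skype"})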
此外,本公开实施例还提供一种音视频共享装置。In addition, embodiments of the present disclosure also provide an audio and video sharing device.
参照图6,图6为本公开音视频共享装置第一实施例的功能模块示意图。Referring to FIG. 6 , FIG. 6 is a schematic diagram of functional modules of the audio and video sharing device according to the first embodiment of the present disclosure.
本实施例中,所述音视频共享装置包括:In this embodiment, the audio and video sharing device includes:
音视频解析模块10,用于获取待分享音视频中的音频流和视频流,并将所述视频流解析成视频帧;The audio and video parsing module 10 is used to obtain the audio stream and video stream in the audio and video to be shared, and parse the video stream into video frames;
音视频流同步模块20,用于根据所述待分享音视频的时间戳信息,确定各视频帧对应的音频信息;The audio and video stream synchronization module 20 is used to determine the audio information corresponding to each video frame based on the timestamp information of the audio and video to be shared;
音视频发送模块30,用于在接收到音视频分享的指令时,向接收端发送所述视频帧以及所述视频帧对应的音频信息,以进行所述待分享音视频的共享。The audio and video sending module 30 is configured to, when receiving an audio and video sharing instruction, send the video frame and the audio information corresponding to the video frame to the receiving end to share the audio and video to be shared.
进一步地,所述音视频发送模块30具体包括:Further, the audio and video sending module 30 specifically includes:
an audio and video stream synchronous output unit, configured to determine a current video frame among the video frames and, according to the frame number of the current video frame, synchronously output the current video frame and the audio information corresponding to the current video frame to obtain a synchronized audio and video stream;
音视频流发送单元,用于基于数据传输通道,向所述接收端发送所述同步的音视频流,以使所述接收端同步播放所述待分享音视频。The audio and video stream sending unit is configured to send the synchronized audio and video stream to the receiving end based on the data transmission channel, so that the receiving end can synchronously play the audio and video to be shared.
进一步地,所述音视频流发送单元具体包括:Further, the audio and video stream sending unit specifically includes:
应用数据信息发送子单元,用于通过所述数据传输通道中的应用数据通道向所述接收端发送目标应用对应的应用数据信息;其中,所述目标应用用于播放所述音视频流;The application data information sending subunit is configured to send application data information corresponding to the target application to the receiving end through the application data channel in the data transmission channel; wherein the target application is used to play the audio and video stream;
音视频流发送子单元,用于通过所述数据传输通道中的传输数据通道向所述接收端发送所述同步的音视频流。 The audio and video stream sending subunit is configured to send the synchronized audio and video stream to the receiving end through the transmission data channel in the data transmission channel.
进一步地,所述音视频共享装置还包括网络数据服务器模块,所述网络数据服务器模块具体包括:Further, the audio and video sharing device also includes a network data server module. The network data server module specifically includes:
网络服务器查询单元,用于判断网络数据服务器中是否存在所述应用数据信息;A network server query unit, used to determine whether the application data information exists in the network data server;
本地应用信息发送单元,用于在所述网络数据服务器中不存在所述应用数据信息的情况下,则基于所述应用数据通道,将本地的所述应用数据信息发送至所述接收端。A local application information sending unit is configured to send the local application data information to the receiving end based on the application data channel when the application data information does not exist in the network data server.
网络端应用发送单元,用于若所述网络数据服务器中存在所述应用数据信息,则通过所述网络数据服务器向所述接收端发送所述应用数据信息。A network-side application sending unit is configured to send the application data information to the receiving end through the network data server if the application data information exists in the network data server.
进一步地,所述音视频共享装置还包括数据传输通道确定模块,所述数据传输通道确定模块具体包括:Further, the audio and video sharing device also includes a data transmission channel determination module. The data transmission channel determination module specifically includes:
音视频通话类型判断单元,用于与接收端建立音视频通话,若所述音视频通话不是所述VoNR+通话类型,则将所述音视频通话的通话数据通道作为所述数据传输通道;An audio and video call type judgment unit is used to establish an audio and video call with the receiving end. If the audio and video call is not the VoNR+ call type, use the call data channel of the audio and video call as the data transmission channel;
a new data channel unit, configured to establish, if the audio and video call is a VoNR+ call type, a data channel with the receiving end based on the call data channel, as the data transmission channel.
进一步地,音视频解析模块10包括:Further, the audio and video analysis module 10 includes:
音视频文件解析单元,用于通过当前正在运行的视频播放器,读取所述待分享音视频,将所述待分享音视频文件解析成所述音频流和所述视频流对应的视频帧。The audio and video file parsing unit is used to read the audio and video to be shared through the currently running video player, and parse the audio and video file to be shared into the audio stream and the video frame corresponding to the video stream.
进一步地,所述音视频共享装置包括音频通道切换模块,所述音频通道切换模块具体包括:Further, the audio and video sharing device includes an audio channel switching module, and the audio channel switching module specifically includes:
目标音频通道确定单元,用于根据所述音频信息的数据类型,确定所述音频信息对应的目标音频通道;A target audio channel determination unit, configured to determine the target audio channel corresponding to the audio information according to the data type of the audio information;
an audio channel switching unit, configured to switch the first audio channel corresponding to the video player and the second audio channel corresponding to the audio and video call to the target audio channel, so that the audio information is sent to the receiving end synchronously, through the target audio channel, while it is being played.
进一步地,所述音视频共享装置包括白名单模块,所述白名单模块具体包括:Further, the audio and video sharing device includes a whitelist module, and the whitelist module specifically includes:
an application whitelist determination unit, configured to determine, according to the information on the application to which the audio and video to be shared belong, the application whitelist corresponding to that application, and to determine whether the application to which the audio and video call belongs is on the application whitelist;
an audio and video file sending unit, configured to send, if the application to which the audio and video call belongs is on the application whitelist, the audio and video to be shared to the receiving end based on the video frames and the audio information corresponding to each video frame.
其中,上述音视频共享装置中各个模块与上述音视频共享方法实施例中各步骤相对应,其功能和实现过程在此处不再一一赘述。Each module in the above-mentioned audio and video sharing device corresponds to each step in the above-mentioned audio and video sharing method embodiment, and their functions and implementation processes will not be described in detail here.
此外,本公开实施例还提供一种计算机可读存储介质。In addition, embodiments of the present disclosure also provide a computer-readable storage medium.
本公开计算机可读存储介质上存储有音视频共享程序,其中所述音视频共享程序被处理器执行时,实现如上述的音视频共享方法的步骤。The computer-readable storage medium of the present disclosure stores an audio and video sharing program. When the audio and video sharing program is executed by a processor, the steps of the above audio and video sharing method are implemented.
其中,音视频共享程序被执行时所实现的方法可参照本公开音视频共享方法的各个实施例,此处不再赘述。For the method implemented when the audio and video sharing program is executed, reference may be made to various embodiments of the audio and video sharing method of the present disclosure, and details will not be described again here.
It should be noted that, as used herein, the terms "comprise", "include" or any variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or system that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or system. Without further limitation, an element defined by the statement "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or system that includes that element.
上述本公开实施例序号仅仅为了描述,不代表实施例的优劣。The above serial numbers of the embodiments of the present disclosure are only for description and do not represent the advantages and disadvantages of the embodiments.
本公开可用于众多通用或专用的计算机系统环境或配置中。例如:个人计算机、服务器计算机、手持设备或便携式设备、平板型设备、多处理器系统、基于微处理器的系统、置顶盒、可编程的消费电子设备、网络PC、小型计算机、大型计算机、包括以上任何系统或设备的分布式计算环境等等。本公开可以在由计算机执行的计算机可执行指令的一般上下文中描述,例如程序模块。一般地,程序模块包括执行特定任务或实现特定抽象数据类型的例程、程序、对象、组件、数据结构等等。也可以在分布式计算环境中实践本公开,在这些分布式计算环境中,由通过通信网络而被连接的远程处理设备来执行任务。在分布式计算环境中,程序模块可以位于包括存储设备在内的本地和远程计算机存储介质中。The present disclosure may be used in numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics devices, network PCs, minicomputers, mainframe computers, including Distributed computing environment for any of the above systems or devices, etc. The present disclosure may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types. The present disclosure may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including storage devices.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present disclosure, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk or an optical disc) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the various embodiments of the present disclosure.
The above are merely preferred embodiments of the present disclosure and do not therefore limit its patent scope; any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present disclosure, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present disclosure.

Claims (10)

  1. 一种音视频共享方法,所述方法包括:An audio and video sharing method, the method includes:
    获取待分享音视频中的音频流和视频流,并将所述视频流解析成视频帧;Obtain the audio stream and video stream in the audio and video to be shared, and parse the video stream into video frames;
    根据所述待分享音视频的时间戳信息,确定各视频帧对应的音频信息;Determine the audio information corresponding to each video frame according to the timestamp information of the audio and video to be shared;
    在接收到音视频分享的指令时,向接收端发送所述视频帧以及所述视频帧对应的音频信息,以进行所述待分享音视频的共享。When receiving an instruction to share audio and video, the video frame and the audio information corresponding to the video frame are sent to the receiving end to share the audio and video to be shared.
  2. The audio and video sharing method according to claim 1, wherein the sending, upon receiving an audio and video sharing instruction, of the video frame and the audio information corresponding to the video frame to the receiving end so as to share the audio and video to be shared comprises:
    在所述视频帧中确定当前视频帧,并根据所述当前视频帧的帧号,同步输出所述当前视频帧以及所述当前视频帧对应的所述音频信息,获得同步的音视频流;Determine the current video frame in the video frame, and synchronously output the current video frame and the audio information corresponding to the current video frame according to the frame number of the current video frame, to obtain a synchronized audio and video stream;
    基于数据传输通道,向所述接收端发送所述同步的音视频流,以使所述接收端同步播放所述待分享音视频。Based on the data transmission channel, the synchronized audio and video stream is sent to the receiving end, so that the receiving end plays the audio and video to be shared synchronously.
  3. 如权利要求2所述的音视频共享方法,其中,所述基于数据传输通道,向所述接收端发送所述同步的音视频流,包括:The audio and video sharing method according to claim 2, wherein the sending the synchronized audio and video stream to the receiving end based on the data transmission channel includes:
    通过所述数据传输通道中的应用数据通道向所述接收端发送目标应用对应的应用数据信息;其中,所述目标应用用于播放所述音视频流;Send application data information corresponding to the target application to the receiving end through the application data channel in the data transmission channel; wherein the target application is used to play the audio and video stream;
    通过所述数据传输通道中的传输数据通道向所述接收端发送所述同步的音视频流。The synchronized audio and video stream is sent to the receiving end through a transmission data channel in the data transmission channel.
  4. 如权利要求3所述的音视频共享方法,其中,所述通过所述数据传输通道中的应用数据通道向所述接收端发送目标应用对应的应用数据信息之前,还包括:The audio and video sharing method according to claim 3, wherein before sending the application data information corresponding to the target application to the receiving end through the application data channel in the data transmission channel, it further includes:
    判断网络数据服务器中是否存在所述应用数据信息;Determine whether the application data information exists in the network data server;
    在所述网络数据服务器中不存在所述应用数据信息的情况下,基于所述应用数据通道,将本地的所述应用数据信息发送至所述接收端。If the application data information does not exist in the network data server, the local application data information is sent to the receiving end based on the application data channel.
    若所述网络数据服务器中存在所述应用数据信息,则通过所述网络数据服务器向所述接收端发送所述应用数据信息。If the application data information exists in the network data server, the application data information is sent to the receiving end through the network data server.
  5. 如权利要求2所述的音视频共享方法,其中,所述基于数据传输通道,向所述接收端发送所述同步的音视频流之前,包括: The audio and video sharing method according to claim 2, wherein before sending the synchronized audio and video stream to the receiving end based on the data transmission channel, the method includes:
    与接收端建立音视频通话,若所述音视频通话不是所述VoNR+通话类型,则将所述音视频通话的通话数据通道作为所述数据传输通道;Establish an audio and video call with the receiving end. If the audio and video call is not the VoNR+ call type, use the call data channel of the audio and video call as the data transmission channel;
    若所述音视频通话是VoNR+通话类型,则基于所述通话数据通道,与接收端建立数据通道,作为所述数据传输通道。If the audio and video call is a VoNR+ call type, a data channel is established with the receiving end based on the call data channel as the data transmission channel.
  6. 如权利要求1所述的音视频共享方法,其中,所述获取待分享音视频中的音频流和视频流,并将所述视频流解析成视频帧,包括:The audio and video sharing method according to claim 1, wherein said obtaining the audio stream and video stream in the audio and video to be shared, and parsing the video stream into video frames includes:
    通过当前正在运行的视频播放器,读取所述待分享音视频,将所述待分享音视频文件解析成所述音频流和所述视频流对应的视频帧。The audio and video files to be shared are read through the currently running video player, and the audio and video files to be shared are parsed into video frames corresponding to the audio stream and the video stream.
  7. The audio and video sharing method according to claim 1, wherein before the sending, upon receiving an audio and video sharing instruction, of the video frame and the audio information corresponding to the video frame to the receiving end so as to share the audio and video to be shared, the method further comprises:
    根据所述音频信息的数据类型,确定所述音频信息对应的目标音频通道;Determine the target audio channel corresponding to the audio information according to the data type of the audio information;
    switching a first audio channel corresponding to a video player and a second audio channel corresponding to the audio and video call to the target audio channel, so that the audio information is sent to the receiving end synchronously, through the target audio channel, while the audio information is being played.
  8. The audio and video sharing method according to any one of claims 1 to 7, wherein the sending, upon receiving an audio and video sharing instruction, of the video frame and the audio information corresponding to the video frame to the receiving end further comprises:
    根据所述待分享音视频所属应用信息,确定所述待分享音视频所属应用对应的应用白名单,并判断所述音视频通话所属应用是否属于所述应用白名单;According to the application information to which the audio and video to be shared belongs, determine the application whitelist corresponding to the application to which the audio and video to be shared belongs, and determine whether the application to which the audio and video call belongs belongs to the application whitelist;
    若所述音视频通话所属应用属于所述应用白名单,则基于所述各个视频帧以及所述各个视频帧对应的音频信息,向所述接收端发送所述待分享音视频。If the application to which the audio and video call belongs belongs to the application whitelist, the audio and video to be shared are sent to the receiving end based on the respective video frames and the audio information corresponding to the respective video frames.
  9. An audio and video sharing device, wherein the audio and video sharing device comprises a processor, a memory, and an audio and video sharing program stored on the memory and executable by the processor, wherein the audio and video sharing program, when executed by the processor, implements the steps of the audio and video sharing method according to any one of claims 1 to 8.
  10. A computer-readable storage medium, wherein an audio and video sharing program is stored on the computer-readable storage medium, and the audio and video sharing program, when executed by a processor, implements the steps of the audio and video sharing method according to any one of claims 1 to 8.
PCT/CN2023/078498 2022-05-31 2023-02-27 Audio and video sharing method and device, and computer-readable storage medium WO2023231478A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210606704.8 2022-05-31
CN202210606704.8A CN117201719A (en) 2022-05-31 2022-05-31 Audio and video sharing method, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
WO2023231478A1 true WO2023231478A1 (en) 2023-12-07

Family

ID=88998456

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/078498 WO2023231478A1 (en) 2022-05-31 2023-02-27 Audio and video sharing method and device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN117201719A (en)
WO (1) WO2023231478A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117439976B (en) * 2023-12-13 2024-03-26 深圳大数信科技术有限公司 Audio and video call system based on WebRTC


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101279366B1 (en) * 2012-02-23 2013-07-04 (주)제이이모바일 Sharing method for contents by mobile devices
CN103237191A (en) * 2013-04-16 2013-08-07 成都飞视美视频技术有限公司 Method for synchronously pushing audios and videos in video conference
US9549152B1 (en) * 2014-06-09 2017-01-17 Google Inc. Application content delivery to multiple computing environments using existing video conferencing solutions
CN105577818A (en) * 2016-01-22 2016-05-11 腾讯科技(深圳)有限公司 Data transmission methods and apparatuses, and vehicle-mounted terminal
CN106657305A (en) * 2016-12-12 2017-05-10 掌阅科技股份有限公司 Data sharing method, data sharing device, terminal equipment and server
CN114339454A (en) * 2022-03-11 2022-04-12 浙江大华技术股份有限公司 Audio and video synchronization method and device, electronic device and storage medium

Also Published As

Publication number Publication date
CN117201719A (en) 2023-12-08

Similar Documents

Publication Publication Date Title
US10187668B2 (en) Method, system and server for live streaming audio-video file
US9591262B2 (en) Flow-control based switched group video chat and real-time interactive broadcast
US10057662B2 (en) Flow controlled based synchronized playback of recorded media
US20060085823A1 (en) Media communications method and apparatus
CN105338425A (en) System and method for realizing video seamless switching between multiple screens
US20090106288A1 (en) Method and system for supporting media data of various coding formats
US20150181003A1 (en) Method and apparatus for transmitting and receiving packets in hybrid transmission service of mmt
RU2504090C2 (en) Method, apparatus and system for making video call
KR20080038251A (en) Method for signaling a device to perform no synchronization or include a synchronization delay on multimedia streams
Boronat et al. HbbTV-compliant platform for hybrid media delivery and synchronization on single-and multi-device scenarios
EP1603046B1 (en) Reception apparatus and information browsing method
CN109495761A (en) Video switching method and device
US20130204973A1 (en) Method for transmitting a scalable http stream for natural reproduction upon the occurrence of expression-switching during http streaming
CN105354002A (en) System and method for implementing video seamless switching among multiple screens
CN111526387B (en) Video processing method and device, electronic equipment and storage medium
CN112954433B (en) Video processing method, device, electronic equipment and storage medium
WO2023231478A1 (en) Audio and video sharing method and device, and computer-readable storage medium
CN108494792A (en) A kind of flash player plays the converting system and its working method of hls video flowings
WO2015180446A1 (en) System and method for maintaining connection channel in multi-device interworking service
CN113194278A (en) Conference control method and device and computer readable storage medium
CN205230019U (en) System for realize video seamless handover between many screens
EP3316593B1 (en) Method and device for implementing synchronous playing
CN113014950A (en) Live broadcast synchronization method and system and electronic equipment
CN108989737B (en) Data playing method and device and electronic equipment
CN108881793B (en) Data processing method and device for video network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23814677

Country of ref document: EP

Kind code of ref document: A1