CN112399190B - Audio and video data acquisition method and device

Info

Publication number: CN112399190B (application CN202011110897.5A; other versions: CN112399190A)
Authority: CN (China)
Prior art keywords: audio, video data, acquisition mode, buffer, network side
Legal status: Active (granted)
Other languages: Chinese (zh)
Inventors: 赵一开, 罗喆, 郭君健
Assignee (original and current): Beijing Dajia Internet Information Technology Co Ltd

Classifications

    • H04N 21/2187 Live feed (selective content distribution; servers; source of audio or video content)
    • H04N 21/2183 Cache memory (selective content distribution; servers; source of audio or video content)
    • H04N 21/23406 Processing of video elementary streams involving management of server-side video buffer
    • H04N 21/472 End-user interface for requesting content, additional data or services; end-user interface for interacting with content

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The disclosure relates to an audio and video data acquisition method and device in the technical field of streaming media, intended to guarantee the fluency of audio and video playback. The disclosed method comprises the following steps: obtaining the buffer state of a player, where the buffer is used to buffer audio and video data obtained from the network side so as to provide the data to the player for playback; determining an audio and video data acquisition mode according to the buffer state; and acquiring audio and video data from the network side in the determined acquisition mode and storing the acquired data into the buffer of the player.

Description

Audio and video data acquisition method and device
Technical Field
The present disclosure relates to the field of streaming media technologies, and in particular, to a method and an apparatus for acquiring audio and video data.
Background
Network live broadcast builds on the advantages of the Internet: by streaming video, content such as product demonstrations, conferences, interviews and online training can be published to the Internet directly from the event site, and the intuitiveness, strong interactivity and lack of geographic restriction of the Internet enhance the promotional effect of the live event.
One form of network live broadcast works as follows: signal acquisition equipment is set up on site, the captured audio and video are fed to a broadcast directing end (a directing device or platform), and the stream is uploaded through the network to a server for publication so that users can watch it.
A terminal device can download the network live broadcast data and play it. In this process, the terminal device acquires the audio and video data stream from the server storing the live broadcast data to the local device (the pull-stream process), decodes it and plays it. How to control this acquisition process so as to guarantee smooth viewing and improve the user experience is therefore a problem the industry needs to solve.
Disclosure of Invention
The disclosure provides an audio and video data acquisition method and device, so as to ensure the fluency and stability of audio and video playback on the terminal device side. The technical scheme of the present disclosure is as follows:
according to a first aspect of the present disclosure, there is provided an audio/video data acquisition method, including:
the method comprises the steps of obtaining a buffer area state of a player, wherein the buffer area is used for buffering audio and video data obtained from a network side so as to provide the audio and video data for the player to play;
determining an audio and video data acquisition mode according to the buffer area state;
And acquiring the audio and video data from the network side by adopting the determined audio and video data acquisition mode, and storing the acquired audio and video data into a buffer area of the player.
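As a non-limiting illustration of the three steps above, the following sketch shows one possible control loop; the object names (player, sources) and their methods (is_playing, get_buffer_state, read, buffer_write) are hypothetical and are not part of the disclosure.

```python
import time

def choose_mode(buffer_state, current_mode):
    """Placeholder decision rule; concrete stability checks are
    described in the embodiments below."""
    if buffer_state.get("stable") and current_mode == "CDN":
        return "PCDN"
    if not buffer_state.get("stable") and current_mode == "PCDN":
        return "CDN"
    return current_mode

def acquisition_loop(player, sources, poll_interval=1.0):
    """Sketch of the claimed method: poll the player's buffer state,
    determine the acquisition mode, pull data from the network side
    and store it into the player's buffer."""
    mode = "CDN"                              # start with the CDN-based mode
    while player.is_playing():
        state = player.get_buffer_state()     # step 1: obtain buffer state
        mode = choose_mode(state, mode)       # step 2: determine acquisition mode
        data = sources[mode].read()           # step 3: acquire from the network side
        if data:
            player.buffer_write(data)         # store into the player's buffer
        time.sleep(poll_interval)
```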
In some embodiments, the determining an audio/video data acquisition mode according to the buffer status includes:
if the buffer area state is stable and the current acquisition mode of the audio and video data is the acquisition mode based on the CDN, determining to switch the acquisition mode of the audio and video data from the current acquisition mode based on the CDN to the acquisition mode based on the PCDN.
In some embodiments, the buffer status is determined to be stable when the following condition is satisfied:
and the data quantity of the data buffered by the buffer area in the duration of the first set duration is larger than a first threshold value.
In some embodiments, when the playing duration of the data buffered in the buffer remains greater than a first threshold for at least the first set duration, the buffer state is determined to be stable if at least one of the following conditions is also satisfied: the time interval since the last switch of the audio/video data acquisition mode exceeds a set time interval, and the number of failed switches of the audio/video data acquisition mode is less than a set number.
In some embodiments, the switching the acquisition mode of the audio and video data from the current acquisition mode based on the CDN to the acquisition mode based on the PCDN includes:
the method comprises the steps of obtaining audio and video data from a network side in a CDN-based obtaining mode which is adopted currently, and starting a PCDN-based obtaining mode to obtain the audio and video data from the network side; the method comprises the steps that audio and video data acquired by adopting a CDN-based acquisition mode are cached to a first cache queue, and audio and video data acquired by adopting a PCDN-based acquisition mode are cached to a second cache queue;
determining a switching starting position according to the data units matched with the time information and the frame format in the first buffer queue and the second buffer queue;
and stopping outputting the audio and video data of the first buffer queue to a buffer zone of the player at the switching starting position, starting outputting the audio and video data of the second buffer queue to the buffer zone of the player, and stopping acquiring the audio and video data from a network side by adopting a CDN-based acquisition mode.
In some embodiments, the method further comprises:
if, while the currently adopted CDN-based acquisition mode is kept to acquire audio and video data from the network side and after the PCDN-based acquisition mode has been started to acquire audio and video data from the network side, no data units with matching time stamps and frame types are obtained from the first buffer queue and the second buffer queue, stopping acquiring audio and video data from the network side in the PCDN-based acquisition mode.
In some embodiments, the determining an audio/video data acquisition mode according to the buffer status includes:
if the buffer area state is unstable and the current audio and video data acquisition mode is a PCDN-based acquisition mode, determining to switch the audio and video data acquisition mode from the current PCDN-based acquisition mode to a CDN-based acquisition mode.
In some embodiments, the buffer is determined to be unstable when the following condition is satisfied:
and the data volume of the data buffered by the buffer area is smaller than or equal to a second threshold value within the duration of the second set duration.
In some embodiments, the switching the acquisition mode of the audio and video data from the current acquisition mode based on the PCDN to the acquisition mode based on the CDN includes:
the method comprises the steps of obtaining audio and video data from a network side in a PCDN-based obtaining mode adopted currently, and starting the CDN-based obtaining mode to obtain the audio and video data from the network side; the method comprises the steps that audio and video data acquired by adopting a CDN-based acquisition mode are cached to a first cache queue, and audio and video data acquired by adopting a PCDN-based acquisition mode are cached to a second cache queue;
determining a switching starting position according to the data units matched with the time information and the frame format in the first buffer queue and the second buffer queue;
And stopping outputting the audio and video data of the second buffer queue to a buffer zone of the player at the switching starting position, starting outputting the audio and video data of the first buffer queue to the buffer zone of the player, and stopping acquiring the audio and video data from a network side by adopting a PCDN-based acquisition mode.
In some embodiments, the method further comprises:
if, while the currently adopted PCDN-based acquisition mode is kept to acquire audio and video data from the network side and after the CDN-based acquisition mode has been started to acquire audio and video data from the network side, no data units with matching time stamps and frame types are obtained from the first buffer queue and the second buffer queue, stopping acquiring audio and video data from the network side in the PCDN-based acquisition mode.
In some embodiments, the determining the handover start position according to the data units in the first buffer queue and the second buffer queue, where the time information and the frame format match, includes:
acquiring data units with matched time stamps and frame types in the first buffer queue and the second buffer queue;
and determining the starting position of the data unit matched with the time stamp and the frame type as a switching starting position.
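For illustration only, the switch start position described above could be located with a helper like the following; the dictionary layout of a buffered data unit (timestamp_ms, frame_type) is an assumption made for the sketch, not part of the disclosure.

```python
def find_switch_position(first_queue, second_queue):
    """Return the (timestamp, frame type) pair of a data unit present in both
    buffer queues, i.e. the switch start position, or None if no match exists."""
    seen = {(unit["timestamp_ms"], unit["frame_type"]) for unit in first_queue}
    for unit in second_queue:
        key = (unit["timestamp_ms"], unit["frame_type"])
        if key in seen:
            return key
    return None
```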
In some embodiments, the obtaining the buffer status of the player includes:
and acquiring the buffer area state of the player according to the set time interval.
According to a second aspect of the present disclosure, there is provided an audio-video data acquisition apparatus including:
the player is configured to decode the audio and video data and play the decoded audio and video data;
the controller is configured to acquire the buffer area state of the player, determine an audio and video data acquisition mode according to the buffer area state, and store the acquired audio and video data into the buffer area of the player; the buffer area is used for buffering the audio and video data acquired from the network side so as to provide the audio and video data for the player to play;
the first audio and video acquisition module is configured to acquire audio and video data from a network side in a CDN-based mode according to the CDN-based acquisition mode determined by the controller;
and the second audio and video acquisition module is configured to acquire audio and video data from a network side in a PCDN-based mode according to the PCDN-based acquisition mode determined by the controller.
In some embodiments, the controller is specifically configured to:
If the buffer area state is stable and the current acquisition mode of the audio and video data is the acquisition mode based on the content delivery network CDN, determining to switch the acquisition mode of the audio and video data from the current acquisition mode based on the CDN to the acquisition mode based on the point-to-point content delivery network PCDN.
In some embodiments, the controller is specifically configured to:
determining that the buffer status is stable when the following conditions are satisfied: and the data quantity of the data buffered by the buffer area in the duration of the first set duration is larger than a first threshold value.
In some embodiments, the controller is specifically configured to:
when the playing duration of the data buffered in the buffer remains greater than a first threshold for at least the first set duration, determine that the buffer state is stable if at least one of the following conditions is also satisfied: the time interval since the last switch of the audio/video data acquisition mode exceeds a set time interval, and the number of failed switches of the audio/video data acquisition mode is less than a set number.
In some embodiments, the controller is specifically configured to:
the method comprises the steps of obtaining audio and video data from a network side in a CDN-based obtaining mode which is adopted currently, and starting a PCDN-based obtaining mode to obtain the audio and video data from the network side; the method comprises the steps that audio and video data acquired by adopting a CDN-based acquisition mode are cached to a first cache queue, and audio and video data acquired by adopting a PCDN-based acquisition mode are cached to a second cache queue;
Determining a switching starting position according to the data units matched with the time information and the frame format in the first buffer queue and the second buffer queue;
and stopping outputting the audio and video data of the first buffer queue to a buffer zone of the player at the switching starting position, starting outputting the audio and video data of the second buffer queue to the buffer zone of the player, and stopping acquiring the audio and video data from a network side by adopting a CDN-based acquisition mode.
In some embodiments, the controller is further configured to:
if, while the currently adopted CDN-based acquisition mode is kept to acquire audio and video data from the network side and after the PCDN-based acquisition mode has been started to acquire audio and video data from the network side, no data units with matching time stamps and frame types are obtained from the first buffer queue and the second buffer queue, stop acquiring audio and video data from the network side in the PCDN-based acquisition mode.
In some embodiments, the controller is specifically configured to:
if the buffer area state is unstable and the current audio and video data acquisition mode is a PCDN-based acquisition mode, determining to switch the audio and video data acquisition mode from the current PCDN-based acquisition mode to a CDN-based acquisition mode.
In some embodiments, the controller is specifically configured to:
determining that the buffer is unstable when the following conditions are met: and the data volume of the data buffered by the buffer area is smaller than or equal to a second threshold value within the duration of the second set duration.
In some embodiments, the controller is specifically configured to:
the method comprises the steps of obtaining audio and video data from a network side in a PCDN-based obtaining mode adopted currently, and starting the CDN-based obtaining mode to obtain the audio and video data from the network side; the method comprises the steps that audio and video data acquired by adopting a CDN-based acquisition mode are cached to a first cache queue, and audio and video data acquired by adopting a PCDN-based acquisition mode are cached to a second cache queue;
determining a switching starting position according to the data units matched with the time information and the frame format in the first buffer queue and the second buffer queue;
and stopping outputting the audio and video data of the second buffer queue to a buffer zone of the player at the switching starting position, starting outputting the audio and video data of the first buffer queue to the buffer zone of the player, and stopping acquiring the audio and video data from a network side by adopting a PCDN-based acquisition mode.
In some embodiments, the controller is further configured to:
if, while the currently adopted PCDN-based acquisition mode is kept to acquire audio and video data from the network side and after the CDN-based acquisition mode has been started to acquire audio and video data from the network side, no data units with matching time stamps and frame types are obtained from the first buffer queue and the second buffer queue, stop acquiring audio and video data from the network side in the PCDN-based acquisition mode.
In some embodiments, the controller is specifically configured to:
acquiring data units with matched time stamps and frame types in the first buffer queue and the second buffer queue;
and determining the starting position of the data unit matched with the time stamp and the frame type as a switching starting position.
In some embodiments, the controller is specifically configured to:
and acquiring the buffer area state of the player according to the set time interval.
According to a third aspect of the present disclosure, there is provided a terminal device comprising: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the audio-video data acquisition method according to any one of the first aspects above.
According to a fourth aspect of the present disclosure, there is provided a non-volatile readable storage medium; when instructions in the storage medium are executed by a processor of a terminal device, the terminal device is caused to perform the audio and video data acquisition method according to any one of the first aspects above.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
in the audio and video data playback process, the current audio and video data acquisition mode is determined according to the buffer state of the player, and audio and video data are acquired from the network side in the determined mode. The acquisition mode can therefore be switched in time according to the buffer state of the player, which ensures the fluency and reliability of audio and video playback on the terminal device side and improves the user experience.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
FIG. 1 is a schematic diagram of a CDN architecture, according to an example embodiment;
FIG. 2 is a block diagram of a terminal device, according to an example embodiment;
FIG. 3 is a diagram illustrating a data queue according to an example embodiment;
fig. 4 is a schematic structural view of a terminal device according to an exemplary embodiment;
FIG. 5 is a flow chart illustrating a method of audio video data acquisition according to an exemplary embodiment;
FIG. 6 is a schematic diagram illustrating a handoff at a matched FLV TAG, according to an example embodiment;
fig. 7 is a schematic diagram illustrating a switching flow of an audio and video data acquisition mode in a live scene according to an exemplary embodiment;
FIG. 8 is a block diagram of a terminal device, according to an example embodiment;
fig. 9 is a block diagram of a terminal device according to an exemplary embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
The application scenario described in the embodiments of the present disclosure is for more clearly describing the technical solution of the embodiments of the present disclosure, and does not constitute a limitation on the technical solution provided by the embodiments of the present disclosure, and as a person of ordinary skill in the art can know that, with the appearance of a new application scenario, the technical solution provided by the embodiments of the present disclosure is equally applicable to similar technical problems.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure.
The term "and/or" in the embodiments of the present disclosure describes an association relationship of association objects, which indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship.
The term "plurality" in the presently disclosed embodiments refers to two or more than two.
The embodiment of the disclosure provides an audio and video data acquisition method and device, and the embodiment of the disclosure determines an audio and video data acquisition mode based on a buffer area state of a player, so that smoothness and reliability of a data code stream are ensured in an audio and video playing process, and thus viewing experience of a user on audio and video can be improved.
The embodiment of the disclosure can be applied to various audio and video playing scenes, such as a network live broadcast scene, an audio and video on demand scene and the like. In the embodiment of the disclosure, switching among different audio and video acquisition modes can be realized, so that the switched audio and video acquisition modes are more matched with the buffer area state of the current player, and the smoothness and stability of audio and video playing at the terminal equipment side are ensured.
The audio/video data acquisition modes may include an acquisition mode based on a content delivery network (Content Delivery Network or Content Distribution Network, CDN), also referred to as the CDN streaming mode, and an acquisition mode based on a peer-to-peer content delivery network (P2P CDN, PCDN), also referred to as the PCDN streaming mode.
A CDN is a computer network system whose nodes are interconnected through the Internet; it is a distributed network node structure for delivering Internet content such as streaming media. It selects an optimal edge network node (also called an edge server or edge CDN node) for a user according to the user's geographic location, so that streaming media data (such as music, pictures, videos, applications and other files) are sent to the user from the edge network node closest to that user.
A CDN is a system that includes components such as distributed storage, load balancing, redirection of network requests and content management. It can distribute content resources from an Internet Data Center (IDC) to network edge nodes (servers) that are closer to the user, redirect user access requests to edge network nodes using its load-balancing techniques, shorten the response time of access, and relieve the pressure on the backbone network. A CDN network is mainly composed of edge network nodes.
Fig. 1 illustrates an exemplary live CDN architecture. As shown in the figure, the live webcast data provider sends live webcast data to the source station, and the source station sends live webcast data to the intermediate source CDN nodes and the edge CDN nodes (there may be a plurality of edge CDN nodes, only one CDN edge node is taken as an example in the figure); when a user sends a network live broadcast request through the terminal equipment where the user is located to request to watch the network live broadcast, the dispatching center selects the optimal edge CDN node based on the geographic position of the user, the load balancing and other strategies, and informs the edge CDN node to send network live broadcast data to the terminal equipment where the user is located.
PCDN (P2P CDN) is a technology that combines peer-to-peer transmission with a CDN to provide CDN-like delivery of Internet content through edge network nodes. A PCDN is based on peer-to-peer (P2P) technology and builds a content distribution network by mining and exploiting the massive fragmented idle resources at the network edge.
One typical PCDN architecture uses a CDN at the upper layer and P2P at the lower layer. On top of the CDN (the central CDN node still serves the edge CDN nodes), a manageable P2P autonomous region is introduced at the edge CDN nodes, so that the edge CDN nodes (there may be several) and the terminal devices together relieve the pressure on the backbone network. Another PCDN architecture builds a P2P topology between edge servers, i.e. connects the edge CDN nodes in a P2P manner to reduce the load on the central CDN node.
Compared with CDNs, PCDNs have the characteristics of low bandwidth cost and large capacity, and are suitable for network live broadcast service scenes, but the quality (performance or stability) of the network content service provided by the PCDNs is lower than that of the CDNs.
In the embodiment of the disclosure, the stability of the audio and video data can be determined according to the state of the buffer area of the player at the terminal equipment side, and the audio and video acquisition modes are automatically switched between the PCDN acquisition mode and the CDN acquisition mode based on the characteristics of the two modes, so that the bandwidth cost can be saved and the playing quality can be guaranteed.
Fig. 2 is a schematic structural diagram of a terminal device in an embodiment of the present disclosure, where the terminal device may perform audio and video data acquisition and playing functions.
The term "terminal device" in the embodiments of the present disclosure may be an electronic device having a streaming media processing function and an audio/video display function, and may support audio/video data processing and playing, such as processing and playing of a network live stream. The terminal device may be a smart phone, a tablet computer, a smart television, a medical terminal device, a personal digital assistant, etc.
As shown in the figure, the terminal device includes a player 21, a controller 22, a first audio/video acquisition module 23, and a second audio/video acquisition module 24. These modules may be implemented in software, or in a combination of software and hardware, as embodiments of the present disclosure are not limited in this regard.
And a player 21 configured to decode the audio-video data and play the decoded audio-video data. The player 21 includes a Buffer for buffering the audio/video data to be played, and the player 21 reads the audio/video data from the Buffer for playing.
In order to ensure smooth playback, in the embodiment of the present disclosure, audio and video data may be acquired from the network side and stored in the buffer as fully as possible, so that the player 21 can read the audio and video data from it and play them. If the speed of acquiring audio and video data from the network side stays lower than the speed required by the player 21 for playback, the audio and video data buffered in the buffer of the player 21 will shrink until the buffer is empty, and playback will stutter.
The controller 22 is configured to acquire a buffer status of the player, determine an audio/video data acquisition mode according to the buffer status, and store the acquired audio/video data in a buffer of the player.
Wherein the buffer status is used to characterize the data buffering situation in the data buffer of the player 21. For example, the buffer status may be characterized using at least one of the following information: the number of data units (or data packets) of the audio/video data buffered in the buffer, the data amount (such as the number of bits or the number of bytes) of the audio/video data buffered in the buffer, the number of frames of the audio/video data buffered in the buffer, and the time stamp of the audio/video data buffered in the buffer (such as each video frame has a respective time stamp), and may further include the percentage of the buffered audio/video data in the buffer capacity. The playing time length of the cached audio and video data can be determined through the data quantity and the time stamp of the cached audio and video data in the buffer area.
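As a small illustrative sketch (the data-unit layout is assumed, not specified by the disclosure), the playing duration of the buffered data can be estimated from the time stamps of the buffered data units:

```python
def buffered_duration_ms(units):
    """Estimate buffered playback duration as the span between the oldest and
    newest buffered time stamps; `units` are dicts with a 'timestamp_ms' key."""
    timestamps = [u["timestamp_ms"] for u in units]
    if len(timestamps) < 2:
        return 0
    return max(timestamps) - min(timestamps)
```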
The audio and video data buffered in the buffer may be stored in a specified data format, for example the FLV (Flash Video) format; a video data packet stored in the FLV format is referred to as an FLV TAG, and the buffer state may be represented using the number of FLV TAGs and the time stamp of each FLV TAG.
FLV is a streaming media format that developed with the promotion of Flash MX. FLV files are small and load quickly, can also help protect copyright, and are well suited to network transmission of video data.
The FLV data structure consists of an FLV header and an FLV body. The FLV header may include an "FLV" format indication, version information, a stream indication (whether video information and/or audio information is present), header length information, and so on. The FLV body contains a number of TAGs, which fall into three types: script (script data), video (video stream) and audio (audio stream).
An FLV TAG consists of a TAG header and TAG data (TAG body). The TAG header may include the data type (script, audio or video), the data area length, a time stamp and other information. For a video TAG, the TAG data may include frame type indication information, a codec ID and the video data.
The frame type indication information may indicate one of the following frame types: keyframe, inter frame, disposable inter frame, or generated keyframe.
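A simplified sketch of reading these fields from a raw FLV TAG follows; it reflects the standard FLV layout (an 11-byte TAG header, with the frame type in the high nibble of the first video-data byte) and is provided only as an illustration.

```python
FRAME_TYPES = {1: "keyframe", 2: "inter frame", 3: "disposable inter frame",
               4: "generated keyframe", 5: "video info/command frame"}

def parse_flv_tag(tag: bytes) -> dict:
    """Parse the 11-byte FLV TAG header and, for video TAGs, the frame type."""
    tag_type = tag[0]                                  # 8 = audio, 9 = video, 18 = script
    data_size = int.from_bytes(tag[1:4], "big")        # length of the TAG data area
    timestamp = int.from_bytes(tag[4:7], "big") | (tag[7] << 24)  # ms, with extension byte
    info = {"type": tag_type, "data_size": data_size, "timestamp_ms": timestamp}
    if tag_type == 9 and len(tag) > 11:                # video TAG: inspect first data byte
        first = tag[11]
        info["frame_type"] = FRAME_TYPES.get(first >> 4, "unknown")
        info["codec_id"] = first & 0x0F                # e.g. 7 = AVC/H.264
    return info
```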
In the embodiment of the disclosure, whether the buffer status is stable or not can be judged according to the buffer status information of the player, and then a proper audio/video acquisition mode can be selected according to the current buffer status. The buffer status may be determined by a decision condition, for example, if the buffer status meets a preset steady state decision condition, the buffer status is determined to be steady, and if the buffer status meets a preset unsteady state decision condition, the buffer status is determined to be unsteady. For example, in some embodiments, if the amount of the audio and video data buffered in the buffer of the player is small, which indicates that the playing duration of the buffered audio and video data is short, the current buffer status is unstable; in other embodiments, if the amount of audio-video data buffered by the buffer is small and the condition persists for more than a certain period of time, the current buffer status is unstable.
The first audio/video acquisition module 23 may, under the control of the controller 22, acquire audio and video data from the network side using the CDN-based acquisition mode determined by the controller 22.
The second audio/video acquisition module 24 may, under the control of the controller 22, acquire audio and video data from the network side using the PCDN-based acquisition mode determined by the controller 22.
In the above structure of the terminal device, the interface between the player 21 and the controller 22 may include a standard socket-like interface (comprising an open interface, a close interface and a read interface). Based on the open interface, the player 21 can establish a connection with the controller 22 over which audio and video data are delivered to the player 21; based on the close interface, the player 21 can disconnect from the controller 22 to stop receiving audio and video data from it; based on the read interface, the player 21 can receive audio and video data from the controller 22 and buffer them in its buffer.
In the embodiment of the present disclosure, the interface between the player 21 and the controller 22 may further include an interface for providing and transmitting the buffer status information of the player 21, through which the controller 22 may obtain the buffer status information of the player 21, so as to serve as a basis for switching the audio/video obtaining mode. In some embodiments, the player 21 may send its buffer status information to the controller 22 at set time intervals or at set periods or time intervals; in other embodiments, the controller 22 may also query the buffer status information of the player 21 at set time intervals or at set periods or time intervals.
In the above structure of the terminal device, the controller 22 interacts with the first audio/video acquisition module 23 and the second audio/video acquisition module 24 at the bottom layer through standard socket-like interfaces (including open interface, close interface and read interface) respectively, so as to acquire audio/video data from two channels respectively. The controller 22 provides a standard socket-like interface (including open interface, close interface, read interface) for the player 21 at the upper layer, so as to provide an uninterrupted audio/video data stream, so that the upper layer (player) does not need to perceive a switching process between different audio/video acquisition modes.
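To illustrate the socket-like open/close/read contract mentioned above, a minimal hypothetical wrapper might look as follows; the class and its fields are assumptions made for the sketch.

```python
class PullSource:
    """Minimal socket-like interface exposed between the layers:
    open establishes the connection, close stops it, read returns data."""

    def __init__(self, url: str):
        self.url = url
        self.connected = False

    def open(self) -> None:
        # establish the connection used to deliver audio and video data
        self.connected = True

    def close(self) -> None:
        # disconnect and stop receiving audio and video data
        self.connected = False

    def read(self, max_bytes: int = 4096) -> bytes:
        # return up to max_bytes of pulled audio/video data; b"" if none is ready
        if not self.connected:
            raise RuntimeError("source not opened")
        return b""
```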
In some embodiments, a data queue is configured inside the controller 22 for the first audio-video acquisition module 23 and the second audio-video acquisition module 24, respectively.
Fig. 3 illustrates a schematic diagram of a data queue. As shown in the figure, the data queue 25 configured for the first audio/video acquisition module 23 is used for storing the audio/video data acquired by the module from the network side, and the data queue 26 configured for the second audio/video acquisition module 24 is used for storing the audio/video data acquired by the module from the network side. Optionally, the data queue 25 and the data queue 26 may store audio and video data in FLV format, i.e. the acquired audio and video data is stored as FLV TAGs, each FLV TAG being a data unit.
For the audio/video data acquired from the CDN by the first audio/video acquisition module 23, the controller 22 parses the data and puts the parsed data into the data queue 25 (each box filled with oblique lines in the data queue 25 represents a corresponding data unit, such as an FLV TAG); for the audio/video data acquired from the PCDN by the second audio/video acquisition module 24, the controller 22 parses the data and places the parsed data into the data queue 26 (each diagonally filled box in the data queue 26 represents a corresponding data unit, such as an FLV TAG).
The audio and video data that the controller 22 provides to the player 21 come from data queue 25 and data queue 26 inside the controller 22. At any given moment, the controller 22 selects one of the data queues to read from, based on its acquisition mode switching judgment, and the data read are reassembled and provided to the player 21 for playback. Specifically, the controller 22 switches the pull mode by switching the data queue bound to the read interface, so that the audio and video data acquired in the corresponding acquisition mode are delivered from the corresponding data queue to the player 21.
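The queue arrangement of Fig. 3 can be sketched as below; switching the queue behind the read interface is what switches the pull mode, so the player above never perceives the switch. Names and structure are illustrative assumptions, not the disclosed implementation.

```python
from collections import deque

class PullController:
    """One data queue per acquisition module; read() always serves the player
    from whichever queue is currently active."""

    def __init__(self):
        self.cdn_queue = deque()    # data queue 25: units pulled via the CDN mode
        self.pcdn_queue = deque()   # data queue 26: units pulled via the PCDN mode
        self.active = "CDN"

    def _active_queue(self):
        return self.cdn_queue if self.active == "CDN" else self.pcdn_queue

    def read(self):
        """Called through the read interface by the player; returns the next
        data unit (e.g. an FLV TAG) or None when the active queue is empty."""
        queue = self._active_queue()
        return queue.popleft() if queue else None

    def switch_to(self, mode: str) -> None:
        # swapping the queue bound to read() switches the pull mode transparently
        self.active = mode
```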
With respect to the above-described terminal device shown in fig. 3, details of operations performed by the respective constituent parts and the implemented functions may be referred to in the related description in the audio/video acquisition flow provided in the following embodiments, and will not be described in detail herein.
Fig. 4 schematically illustrates the structure of one controller 22 provided by an embodiment of the present disclosure.
As shown, the controller 22 may include: an acquisition unit 41, a processing unit 42, a control unit 43.
An obtaining unit 41, configured to obtain a buffer area state of a player, where the buffer area is used to buffer audio and video data obtained from a network side, so as to provide the audio and video data to the player for playing;
a processing unit 42, configured to determine an audio/video data acquisition mode according to the buffer status;
the control unit 43 is configured to instruct the corresponding audio/video acquisition module to acquire audio/video data from the network side according to the determined audio/video data acquisition mode, and store the acquired audio/video data in the buffer area of the player.
With respect to the above-described apparatus shown in fig. 4, details of operations performed by respective units and functions implemented therein may be referred to as related descriptions in the audio/video data acquisition process provided in the following embodiments, and will not be described in detail herein.
Fig. 5 illustrates a flowchart of an audio and video data acquisition method according to an embodiment of the present disclosure, where the flowchart may be implemented by an audio and video data acquisition device, such as the device shown in fig. 2, 3, or 4. In particular, the apparatus may be a terminal device or an apparatus integrated in a terminal device.
As shown, the process may include:
s501: the buffer status of the player is obtained.
The buffer area is used for buffering the audio and video data acquired from the network side so as to provide the audio and video data for the player to play.
Alternatively, the buffer status information may be acquired at a set period or a set time interval, and the buffer status may be determined according to the buffer status information. Wherein the buffer status information may include at least one of the following information: the number of data units (or data packets) of the audio and video data buffered in the buffer, the data amount (such as the number of bits or the number of bytes) of the audio and video data buffered in the buffer, the number of frames of the audio and video data buffered in the buffer, and the time stamp of the audio and video data buffered in the buffer, and the like, and further may further include the percentage of the buffered audio and video data in the buffer capacity.
Since the buffer status can reflect the data buffer status in the data buffer area of the player, it can be determined whether the current audio/video data acquisition status is stable or not according to the buffer status.
Specifically, the buffer status may include stable and unstable, where a stable buffer status indicates that the current audio and video data acquisition status is stable, and an unstable buffer status indicates that the current audio and video data acquisition status is unstable.
For example, if the buffer state indicates that the speed of acquiring audio and video data from the network side is continuously greater than or equal to the playback speed, so that smooth playback is ensured and stuttering is avoided, the current acquisition status is stable; if the buffer state indicates that the speed of acquiring audio and video data from the network side is continuously lower than the playback speed, the amount of buffered data may shrink or even run empty, causing the player to stutter, and the current acquisition status is unstable.
For example, if it is determined that the data amount of the buffered data is greater than the first threshold value within the duration time not less than the first set duration according to the buffer status information of the player, it indicates that the speed of acquiring the audio and video data from the network side is continuously greater than or equal to the playing speed, so that it can be determined that the current audio and video data acquisition status is stable, that is, the buffer status is stable. By the judging method, whether the buffer area state is stable or not can be accurately judged, and then the audio and video acquisition mode determined by taking the buffer area state as a basis can be matched with the buffer area state, so that the smoothness and stability of audio and video playing are ensured.
For another example, if it is determined that the data amount of the buffered data is greater than the first threshold value within the duration not less than the first set duration according to the buffer status information of the player, the following condition is further satisfied: and if the time interval from the last audio/video data acquisition mode switching exceeds the set time interval and/or the number of times of audio/video data acquisition mode switching failures is less than the set number of times, the speed of acquiring the audio/video data from the network side is continuously greater than or equal to the playing speed, so that the current audio/video data acquisition state is judged to be stable, namely the buffer state is judged to be stable.
The set time interval and the set times may be preset, specifically may be set according to the network performance and in combination with the actual measurement result, which is not limited in the embodiment of the present disclosure.
Judging whether the buffer state is stable by combining the buffering situation of the audio and video data with the switching state of the acquisition mode makes the judgment, and hence the switching decision, more reasonable and helps guarantee smooth playback. For example, suppose the number of data units buffered within n seconds (the first set duration) is never lower than m (the first threshold), but only a short time has passed since the last switch of the acquisition mode (the set time interval has not been exceeded) or the acquisition mode has already failed to switch many times. In that case it cannot be determined whether the good buffering situation is due to the current acquisition mode or to the mode used before the switch, so the state is not yet declared stable; only when the buffering situation and the additional conditions are all satisfied is the current acquisition state judged to be stable, which improves the accuracy and reasonableness of the judgment.
An example of determining that the buffer status is unstable is: if the data amount of the cached data is less than or equal to the second threshold value within the duration time not less than the second set duration time according to the buffer zone state information of the player, the speed of acquiring the audio and video data from the network side is indicated to be continuously less than the playing speed, so that the current audio and video data acquisition state is not stable, namely the buffer zone state is not stable. By the judging method, whether the buffer area state is stable or not can be accurately judged, and then the audio and video acquisition mode determined by taking the buffer area state as a basis can be matched with the buffer area state, so that the smoothness and stability of audio and video playing are ensured.
The first threshold and the second threshold may be equal, or the second threshold may be smaller than the first threshold. Making the second threshold smaller than the first threshold reduces frequent switching of the acquisition mode, i.e. the ping-pong phenomenon, and thus improves stability.
Optionally, the first threshold and the second threshold are related to a current play speed or frame rate. For example, if the playing speed or the frame rate is higher, the value of the first threshold is larger, otherwise, the value of the first threshold is smaller. Corresponding first threshold and second threshold can be set in advance for possible playing speed or frame rate, and in the audio/video playing process, the corresponding first threshold and second threshold can be selected according to the current audio/video playing speed or frame rate. In other embodiments, the first threshold and the second threshold may be set only for a normal or default playing speed or a frame rate, and in the audio/video playing process, the first threshold and the second threshold that are configured in advance may be scaled (for example, scaling is performed in a corresponding proportion) according to a multiple relationship between the current audio/video playing speed and the default playing speed, so as to obtain the first threshold and the second threshold corresponding to the current audio/video playing speed.
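The scaling option mentioned above can be illustrated with a trivial helper (hypothetical, with assumed defaults):

```python
def scaled_thresholds(first_threshold, second_threshold,
                      current_speed, default_speed=1.0):
    """Scale preconfigured thresholds by the ratio of the current playback
    speed to the default playback speed."""
    ratio = current_speed / default_speed
    return first_threshold * ratio, second_threshold * ratio
```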
The first set duration and the second set duration may be the same or different, for example, the second set duration may be smaller than the first set duration, so that the situation that the speed of acquiring the audio and video data from the network side is continuously smaller than the playing speed can be timely judged to be unstable in the audio and video data acquiring state (that is, the buffer area state is unstable), and accordingly the audio and video acquiring mode can be timely switched to ensure the smoothness of playing of the player.
The first threshold and the second threshold, and the first set duration and the second set duration may be preset, specifically may be set according to the network performance and in combination with the actual measurement result, which is not limited in the embodiment of the disclosure.
Taking buffering of the audio and video data in the FLV format as an example: if the controller determines, from the buffer status information reported by the player (such as the number of FLV TAGs), that the number of FLV TAGs stays greater than n (the first threshold) for m seconds (the first set duration), then the speed of acquiring audio and video data from the network side is sufficient to keep the player playing smoothly for a certain time, so the current acquisition status, i.e. the buffer state, is determined to be stable.
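Purely as an illustrative sketch of the stability and instability judgments described above (all numeric values are made-up placeholders; real deployments would tune them and the disclosure does not fix them):

```python
import time

FIRST_THRESHOLD = 50        # e.g. minimum number of buffered FLV TAGs
FIRST_DURATION = 10.0       # seconds the amount must stay above the first threshold
SECOND_THRESHOLD = 10       # "unstable" threshold (here smaller than the first)
SECOND_DURATION = 3.0       # shorter window so instability is detected quickly
MIN_SWITCH_INTERVAL = 30.0  # minimum gap since the last mode switch, in seconds
MAX_FAILED_SWITCHES = 3     # stop trying after this many failed switches

def _recent(samples, window, now):
    # samples: list of (sample_time, buffered_amount); keep only those in the window
    return [amount for t, amount in samples if now - t <= window]

def is_stable(samples, last_switch_time, failed_switches, now=None):
    now = time.time() if now is None else now
    recent = _recent(samples, FIRST_DURATION, now)   # assumed to cover the whole window
    return (bool(recent)
            and all(amount > FIRST_THRESHOLD for amount in recent)
            and now - last_switch_time > MIN_SWITCH_INTERVAL
            and failed_switches < MAX_FAILED_SWITCHES)

def is_unstable(samples, now=None):
    now = time.time() if now is None else now
    recent = _recent(samples, SECOND_DURATION, now)
    return bool(recent) and all(amount <= SECOND_THRESHOLD for amount in recent)
```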
S502: and determining an audio and video data acquisition mode according to the buffer state.
In this step, if the buffer state is stable and the CDN-based acquisition mode is currently in use, it is determined that the acquisition mode needs to be switched, i.e. switched to the PCDN-based acquisition mode. When the CDN-based mode is currently in use and the buffer state is stable, switching to the PCDN-based mode exploits the low bandwidth cost of PCDN, so that network bandwidth cost is saved while smooth playback of the audio and video data is still ensured.
If the buffer state is unstable and the PCDN-based acquisition mode is currently in use, it is determined that the acquisition mode needs to be switched, i.e. switched to the CDN-based acquisition mode. Switching back to the CDN-based mode in this case exploits the higher network content service quality provided by the CDN to ensure smooth playback of the audio and video data.
S503: and acquiring the audio and video data from the network side by adopting the determined audio and video data acquisition mode, and storing the acquired audio and video data into a buffer area of the player.
In this step, in some embodiments, if it is determined that the currently adopted CDN-based acquisition mode is switched to the PCDN-based acquisition mode, the currently adopted CDN-based acquisition mode is maintained to acquire audio and video data from the network side, and the PCDN-based acquisition mode is started to acquire audio and video data from the network side, where the audio and video data acquired by adopting the CDN-based acquisition mode is cached in the first cache queue, and the audio and video data acquired by adopting the PCDN-based acquisition mode is cached in the second cache queue; determining a switching starting position according to the data units matched with the time information and the frame format in the first buffer queue and the second buffer queue; and stopping outputting the audio and video data of the first buffer queue to a buffer zone of the player at the switching starting position, starting outputting the audio and video data of the second buffer queue to the buffer zone of the player, and stopping acquiring the audio and video data from a network side by adopting an acquisition mode based on a CDN (content delivery network), thereby realizing the switching of the acquisition modes of the audio and video data.
Taking the case of storing the audio and video data in the FLV format as an example, if there is one FLV TAG in the second data queue, and the time stamp of the FLV TAG matches the time stamp of one FLV TAG in the first data queue, the two FLV TAGs are aligned in time (i.e. the start time is the same), and if the frame types of the two FLV TAGs are also the same, the FLV TAG may be switched from the first data queue to the second data queue from the time point to be read for being transferred to the player.
As shown in fig. 6, the live webcast data pulled in the CDN-based acquisition mode is stored in the CDN data queue in the FLV TAG form, and the live webcast data pulled in the PCDN-based acquisition mode is stored in the PCDN data queue in the FLV TAG form. The FLV TAG 61 (e.g. box filled with oblique lines in the figure) contained in the CDN data queue and the FLV TAG 62 (e.g. box filled with oblique lines in the figure) contained in the PCDN data queue are both keyframes and have the same time stamp, and the two FLV TAGs are aligned in time, so when the audio/video data acquisition mode is switched, the FLV TAG is switched from the CDN data queue to the PCDN data queue at time t0, i.e. reading of data from the CDN queue is stopped at that time, and reading of the FLV TAG from the PCDN data queue is started to be provided to the buffer of the player.
Further, if no data unit whose time stamp and frame type match a data unit in the first data queue (the data queue corresponding to the CDN-based acquisition mode) is found in the second data queue (the data queue corresponding to the PCDN-based acquisition mode), the switch of the acquisition mode may be abandoned: the currently adopted CDN-based acquisition mode continues to be used to acquire audio and video data from the network side, data continue to be taken from the first data queue and sent to the buffer of the player, and the acquisition of audio and video data from the network side in the PCDN-based acquisition mode is stopped. Abandoning the switch when the frame types or time stamps of the data obtained by the original CDN-based mode and the newly started PCDN-based mode do not match ensures the fluency of audio and video playback.
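Building on the controller and matcher sketches above, the tentative CDN to PCDN switch with this abandon path might look as follows; fill_queues and the timeout value are assumptions made for the sketch, not disclosed behaviour.

```python
import time

def try_switch_to_pcdn(controller, pcdn_source, cdn_source, timeout=5.0):
    """Keep pulling via the CDN mode, start the PCDN mode, and commit the switch
    only once matching data units appear in both queues; otherwise abandon it."""
    pcdn_source.open()                                  # start the target mode
    deadline = time.time() + timeout
    while time.time() < deadline:
        controller.fill_queues()                        # parse pulled data into both queues
        if find_switch_position(controller.cdn_queue, controller.pcdn_queue):
            controller.switch_to("PCDN")                # output from the PCDN queue from here on
            cdn_source.close()                          # stop pulling via the CDN mode
            return True
        time.sleep(0.1)
    pcdn_source.close()                                 # no match found: give up the switch
    return False
```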
In other embodiments, if it is determined that the currently adopted PCDN-based acquisition mode is switched to the CDN-based acquisition mode, the currently adopted PCDN-based acquisition mode is maintained to acquire audio and video data from the network side, and the CDN-based acquisition mode is started to acquire audio and video data from the network side, where the audio and video data acquired by adopting the CDN-based acquisition mode is cached in the first cache queue, and the audio and video data acquired by adopting the PCDN-based acquisition mode is cached in the second cache queue; determining a switching starting position according to the data units matched with the time information and the frame format in the first buffer queue and the second buffer queue; and stopping outputting the audio and video data of the second buffer queue to a buffer zone of the player at the switching starting position, starting outputting the audio and video data of the first buffer queue to the buffer zone of the player, and stopping acquiring the audio and video data from a network side by adopting a PCDN-based acquisition mode.
Further, if data units whose time stamps and frame types match each other are not found in the first buffer queue and the second buffer queue, acquisition of audio and video data from the network side in the PCDN-based acquisition mode is stopped.
In the above switching process of the audio and video data acquisition mode, the original acquisition mode is maintained to acquire audio and video data while the target acquisition mode is started to acquire audio and video data from the network side. In this case, the CDN-based acquisition mode and the PCDN-based acquisition mode run simultaneously, which makes it possible to find data units whose time stamps and frame types match between the data pulled by the CDN and the data pulled by the PCDN.
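The dual-pull handover described in the preceding paragraphs could be sketched roughly as below, reusing find_switch_point from the earlier sketch. The module interface (start/stop and a per-module queue), the player-buffer API, and the timeout used to abandon the switch are all hypothetical; this is a minimal sketch of the idea, not the disclosed implementation.

```python
import time

def switch_acquisition_mode(current_module, target_module, player_buffer,
                            match_timeout_s: float = 5.0) -> bool:
    """Keep the current mode pulling, start the target mode in parallel,
    and hand output over to the target queue at a matching data unit.
    If no matching unit appears before the (assumed) timeout, abandon
    the switch and shut the newly started mode down again."""
    target_module.start()                       # both modes now pull from the network
    deadline = time.monotonic() + match_timeout_s
    while time.monotonic() < deadline:
        switch_ts = find_switch_point(current_module.queue, target_module.queue)
        if switch_ts is not None:
            # Feed the player buffer from the current queue up to the switch position.
            while current_module.queue and current_module.queue[0].timestamp_ms < switch_ts:
                player_buffer.append(current_module.queue.popleft())
            # Discard target-queue units that precede the switch position.
            while target_module.queue and target_module.queue[0].timestamp_ms < switch_ts:
                target_module.queue.popleft()
            current_module.stop()               # old mode stops pulling from the network
            return True                         # caller now feeds the buffer from target_module
        time.sleep(0.05)   # playback continues from data already in the player's buffer
    target_module.stop()                        # no match found: abandon the switch
    return False
```

The abandonment branch here shuts down the newly started mode, mirroring the CDN-to-PCDN case above; the reverse direction is handled as described for that case in the disclosure.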
Based on a combination of one or more of the above embodiments, fig. 7 illustrates a pull-stream control flow for live webcast in a specific application scenario. As shown, the process may include:
S700 to S701: when the player starts to play the network live broadcast, the pull-stream controller starts the first audio/video acquisition module to pull the network live broadcast data in the CDN-based acquisition mode; at this point, the second audio/video acquisition module, which uses the PCDN-based acquisition mode, is in a closed state.
When playback of the network live broadcast starts, the CDN-based acquisition mode is selected because a CDN can provide a higher-quality network content service, so that the player can play the network live broadcast smoothly.
S702: and judging whether the audio and video data acquisition state is stable in the current CDN-based acquisition mode according to the buffer state of the player, if so, turning to S703, otherwise, keeping the current use of the first audio and video acquisition module to pull the network live broadcast data.
In this step, after playback of the network live broadcast data has continued for a period of time, if the controller determines from the buffer status information of the player that the buffer status is stable, the process goes to S703 to switch to the PCDN-based acquisition mode. If the buffer status is stable in the CDN-based acquisition mode, switching to the PCDN-based acquisition mode to pull the network live broadcast data can save system bandwidth overhead, based on the characteristics of the PCDN.
S703: and switching from pulling the network live broadcast data by using the first audio/video acquisition module to pulling the network live broadcast data by using the second audio/video acquisition module, namely switching from a CDN-based acquisition mode to a PCDN-based acquisition mode.
S704: and judging whether the buffer area state is stable in the current PCDN-based acquisition mode according to the buffer area state of the player, if so, turning to S705, otherwise, keeping the current use of the second audio/video acquisition module to pull the network live broadcast data.
In this step, after the network live broadcast data has been pulled in the PCDN-based acquisition mode for a period of time, if the controller determines from the buffer status information of the player that the buffer status is unstable, the process goes to S705 to switch to the CDN-based acquisition mode. If the buffer status is unstable in the PCDN-based acquisition mode, switching to the CDN-based acquisition mode to pull the network live broadcast data can improve streaming stability, based on the characteristics of the CDN, so as to ensure smooth playback.
S705: and switching from pulling the network live broadcast data by using the second audio/video acquisition module to pulling the network live broadcast data by using the first audio/video acquisition module, namely switching from a PCDN-based acquisition mode to a CDN-based acquisition mode.
This flow is executed throughout live broadcast playback until the live broadcast ends or the user stops playback.
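Putting the pieces together, the control loop of fig. 7 might be sketched as follows; the thresholds, observation window, and controller interface are assumptions for illustration, and the stability test loosely follows the criteria stated earlier (buffered data kept above a threshold for a set duration, a minimum interval since the last switch, and a limit on failed switches).

```python
def buffer_is_stable(buffered_bytes_samples, first_threshold,
                     seconds_since_last_switch, failed_switch_count,
                     min_switch_interval_s=30.0, max_failed_switches=3) -> bool:
    """Stable when every sample over the observation window exceeds the
    threshold, enough time has passed since the last switch, and switching
    has not failed too often (interval and limit values are assumptions)."""
    return (all(sample > first_threshold for sample in buffered_bytes_samples)
            and seconds_since_last_switch > min_switch_interval_s
            and failed_switch_count < max_failed_switches)

def buffer_is_unstable(buffered_bytes_samples, second_threshold) -> bool:
    """Unstable when the buffered amount stays at or below the (lower)
    threshold for the whole observation window."""
    return all(sample <= second_threshold for sample in buffered_bytes_samples)

def pull_control_loop(controller, cdn_module, pcdn_module):
    """S700-S705: start with the CDN-based mode, move to the PCDN-based mode
    once the buffer is stable, and fall back to the CDN-based mode if the
    buffer becomes unstable; repeat until playback ends."""
    active = cdn_module
    active.start()                                         # S700-S701
    while controller.is_playing():
        samples = controller.poll_buffer_status()          # buffer status reported by the player
        if active is cdn_module and buffer_is_stable(      # S702
                samples, controller.first_threshold,
                controller.seconds_since_last_switch(),
                controller.failed_switch_count()):
            if switch_acquisition_mode(cdn_module, pcdn_module, controller.player_buffer):
                active = pcdn_module                       # S703
        elif active is pcdn_module and buffer_is_unstable(samples, controller.second_threshold):  # S704
            if switch_acquisition_mode(pcdn_module, cdn_module, controller.player_buffer):
                active = cdn_module                        # S705
```

The switch_acquisition_mode helper is the one sketched earlier; controller.is_playing, poll_buffer_status, and the threshold attributes are placeholders standing in for however buffer status is actually reported by the player.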
The implementation of the relevant steps in the flow shown in fig. 7 is substantially the same as in the previous embodiment and will not be repeated here.
In the above embodiments of the present disclosure, the buffer status information of the player is transferred to the controller as the basis for deciding whether to switch the audio/video acquisition mode, so that the player and the controller are deeply integrated and the controller can flexibly and accurately decide, in real time, on switching between the CDN-based acquisition mode and the PCDN-based acquisition mode according to the buffer status of the player. Further, when the controller switches the audio and video data acquisition mode, it does not switch immediately; instead it keeps the original acquisition mode and the target acquisition mode acquiring audio and video data at the same time and switches at a suitable moment, so that the output audio and video stream is not interrupted. The switching is transparent to the upper layer, situations of poor PCDN quality can be handled more robustly, and the PCDN-based acquisition mode can be fully utilized whenever PCDN resources allow. Overall, the embodiments of the present disclosure can effectively improve PCDN-based audio/video playing quality.
Based on the same technical concept, the embodiment of the disclosure also provides a terminal device.
Fig. 8 is a block diagram of a terminal device 800 according to an exemplary embodiment. The device comprises a processor 801 and a memory 802 for storing instructions executable by the processor 801. The processor 801 is configured to execute the instructions to implement any one of the audio and video data acquisition methods according to the embodiments of the present disclosure.
In an exemplary embodiment, a non-transitory readable storage medium is also provided, which includes instructions executable by the processor 801 of the terminal device 800 to perform the above-described method. Optionally, the non-transitory readable storage medium may be a non-transitory computer-readable storage medium, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
An embodiment of the present disclosure further provides a terminal device for the audio/video data acquisition method, whose structure is shown in fig. 9. The terminal device 900 includes: Radio Frequency (RF) circuitry 910, a power supply 920, a processor 930, a memory 940, an input unit 950, a display unit 960, a camera 970, a communication interface 980, and a wireless fidelity (Wireless Fidelity, Wi-Fi) module 990. Those skilled in the art will appreciate that the structure of the terminal device shown in fig. 9 does not constitute a limitation of the terminal device; the terminal provided by the embodiments of the present disclosure may include more or fewer components than illustrated, may combine certain components, or may have a different arrangement of components.
The following describes the respective constituent elements of the terminal device 900 in detail with reference to fig. 9:
The RF circuitry 910 may be used for receiving and transmitting data during a communication or session. Specifically, the RF circuit 910 sends downlink data received from the base station to the processor 930 for processing, and transmits uplink data to be sent to the base station. Typically, the RF circuitry 910 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (Low Noise Amplifier, LNA), a duplexer, and the like.
In addition, the RF circuitry 910 may also communicate with networks and other terminals via wireless communications. The wireless communication may use any communication standard or protocol including, but not limited to, global system for mobile communications (Global System of Mobile communication, GSM), general packet radio service (General Packet Radio Service, GPRS), code division multiple access (Code Division Multiple Access, CDMA), wideband code division multiple access (Wideband Code Division Multiple Access, WCDMA), long term evolution (Long Term Evolution, LTE), email, short message service (Short Messaging Service, SMS), and the like.
Wi-Fi is a short-range wireless transmission technology; the terminal device 900 can connect to an Access Point (AP) through the Wi-Fi module 990, thereby gaining access to a data network. The Wi-Fi module 990 may be used for receiving and transmitting data during communication.
Terminal device 900 may be physically coupled to other terminals through communication interface 980. Optionally, the communication interface 980 is connected with communication interfaces of other terminals through a cable, so as to implement data transmission between the terminal 900 and the other terminals.
Since, in the embodiments of the present disclosure, the terminal device 900 interacts with the network side to request and acquire audio and video data, the terminal device 900 needs to have a data transmission function, that is, the terminal device 900 needs to include a communication module. Although fig. 9 illustrates communication modules such as the RF circuitry 910, the Wi-Fi module 990, and the communication interface 980, it will be appreciated that at least one of the above components, or another communication module (e.g., a Bluetooth module) for enabling communication, is present in the terminal device 900 for data transmission.
For example, when terminal device 900 is a cell phone, terminal device 900 may include RF circuitry 910 and may also include Wi-Fi module 990; when terminal device 900 is a computer, terminal device 900 may include a communication interface 980 and may also include a Wi-Fi module 990; when the terminal device 900 is a tablet computer, the terminal device 900 may include a Wi-Fi module.
Memory 940 may be used to store software programs and modules. Processor 930 executes various functional applications and data processing of terminal device 900 by running software programs and modules stored in memory 940, and when processor 930 executes the program code in memory 940, some or all of the processes in fig. 5 or 7 of the embodiments of the present disclosure can be implemented.
Alternatively, the memory 940 may mainly include a storage program area and a storage data area. The storage program area can store an operating system, various application programs (such as communication application), a face recognition module and the like; the storage data area may store data created according to the use of the terminal (such as multimedia files such as various pictures, video files, and the like, and face information templates), and the like.
In addition, memory 940 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The input unit 950 may be used to receive numeric or character information input by a user and to generate key signal inputs related to user settings and function control of the terminal 900.
Alternatively, the input unit 950 may include a touch panel 951 and other input terminals 952.
The touch panel 951, also referred to as a touch screen, may collect touch operations on or near it by a user (such as operations performed by the user on or near the touch panel 951 with a finger, a stylus, or any other suitable object or accessory), and drive the corresponding connection device according to a preset program. Optionally, the touch panel 951 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends the touch point coordinates to the processor 930, and can also receive and execute commands sent by the processor 930. Further, the touch panel 951 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave.
Alternatively, other input terminals 952 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, mouse, joystick, etc.
The display unit 960 may be used to display information input by a user or information provided to the user and various menus of the terminal device 900. The display unit 960 is a display system of the terminal device 900, and is used for presenting an interface to realize man-machine interaction.
The display unit 960 may include a display panel 961. Alternatively, the display panel 961 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an Organic Light-Emitting Diode (OLED), or the like.
Further, the touch panel 951 may cover the display panel 961. When the touch panel 951 detects a touch operation on or near it, the touch operation is transferred to the processor 930 to determine the type of touch event, and the processor 930 then provides the corresponding visual output on the display panel 961 according to the type of touch event.
Although in fig. 9 the touch panel 951 and the display panel 961 are two independent components implementing the input and output functions of the terminal device 900, in some embodiments the touch panel 951 may be integrated with the display panel 961 to implement the input and output functions of the terminal device 900.
Processor 930 is a control center of terminal device 900, connects the various components using various interfaces and lines, and performs various functions of terminal device 900 and processes data by running or executing software programs and/or modules stored in memory 940 and invoking data stored in memory 940, thereby implementing various services based on the terminal device.
In the alternative, processor 930 may include one or more processing units. Alternatively, processor 930 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 930.
The camera 970 is configured to implement the photographing function of the terminal device 900, to capture pictures or videos. The camera 970 may also be used to implement a scanning function of the terminal device 900, to scan a scanning object (a two-dimensional code or bar code).
The terminal device 900 also includes a power source 920 (e.g., a battery) for powering the various components. Alternatively, the power supply 920 may be logically connected to the processor 930 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system.
It should be noted that, the processor 930 of the embodiment of the disclosure may perform the functions of the processor 801 in fig. 8, and the memory 940 stores the contents of the memory 802.
The embodiments of the present disclosure also provide a computer program product which, when run on a terminal device, causes the terminal device to perform any one of the above-mentioned audio/video data acquisition methods of the embodiments of the present disclosure, or any method to which they may relate.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following its general principles and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (26)

1. An audio/video data acquisition method, comprising:
the method comprises the steps of obtaining a buffer area state of a player, wherein the buffer area is used for buffering audio and video data obtained from a network side so as to provide the audio and video data for the player to play;
determining an audio and video data acquisition mode according to the buffer area state;
acquiring audio and video data from a network side by adopting the determined audio and video data acquisition mode, and storing the acquired audio and video data into a buffer area of the player;
If the audio/video data acquisition mode is switched from the current audio/video data acquisition mode to the target audio/video data acquisition mode, the step of acquiring the audio/video data from the network side and storing the acquired audio/video data in the buffer area of the player includes:
maintaining the current audio and video data acquisition mode to acquire audio and video data from a network side, and starting the target audio and video data acquisition mode to acquire audio and video data from the network side; the audio and video data acquired by adopting the current audio and video data acquisition mode are cached to a cache queue corresponding to the current audio and video data acquisition mode, and the audio and video data acquired by adopting the target audio and video data acquisition mode are cached to a cache queue corresponding to the target audio and video data acquisition mode;
determining a switching starting position according to a data unit matched with time information and a frame format in a buffer queue corresponding to the current audio and video data acquisition mode and a buffer queue corresponding to the target audio and video data acquisition mode;
and stopping outputting the audio and video data of the buffer queue corresponding to the current audio and video data acquisition mode to a buffer zone of the player at the switching starting position, starting outputting the audio and video data of the buffer queue corresponding to the target audio and video data acquisition mode to the buffer zone of the player, and stopping acquiring the audio and video data from a network side by adopting the current audio and video data acquisition mode.
2. The method of claim 1, wherein determining an audio-video data acquisition mode according to the buffer status comprises:
if the buffer area state is stable and the current audio and video data acquisition mode is the acquisition mode based on the content delivery network CDN, determining to switch the audio and video data acquisition mode from the current acquisition mode based on the CDN to the acquisition mode based on the point-to-point content delivery network PCDN.
3. The method of claim 2, wherein the buffer status is determined to be stable when the following conditions are met:
and the data quantity of the data buffered by the buffer area in the duration of the first set duration is larger than a first threshold value.
4. The method of claim 3, wherein when the play time of the data buffered by the buffer is greater than a first threshold value within the first set duration, determining that the buffer status is stable if at least one of the following conditions is also satisfied: the time interval from the last audio/video data acquisition mode switching exceeds the set time interval, and the times of audio/video data acquisition mode switching failure are less than the set times.
5. The method of claim 2, wherein switching the audio-video data acquisition mode from the current CDN-based acquisition mode to the PCDN-based acquisition mode comprises:
The method comprises the steps of obtaining audio and video data from a network side in a CDN-based obtaining mode which is adopted currently, and starting a PCDN-based obtaining mode to obtain the audio and video data from the network side; the method comprises the steps that audio and video data acquired by adopting a CDN-based acquisition mode are cached to a first cache queue, and audio and video data acquired by adopting a PCDN-based acquisition mode are cached to a second cache queue;
determining a switching starting position according to the data units matched with the time information and the frame format in the first buffer queue and the second buffer queue;
and stopping outputting the audio and video data of the first buffer queue to a buffer zone of the player at the switching starting position, starting outputting the audio and video data of the second buffer queue to the buffer zone of the player, and stopping acquiring the audio and video data from a network side by adopting a CDN-based acquisition mode.
6. The method of claim 5, wherein the method further comprises:
if the currently adopted acquisition mode based on the CDN is kept to acquire the audio and video data from the network side, and after the acquisition mode based on the PCDN is started to acquire the audio and video data from the network side, the data units with the time stamps and the frame types matched with each other are not acquired in the first cache queue and the second cache queue, and then the acquisition of the audio and video data from the network side by the acquisition mode based on the PCDN is stopped.
7. The method of claim 1, wherein determining an audio-video data acquisition mode according to the buffer status comprises:
if the buffer area state is unstable and the current audio and video data acquisition mode is a PCDN-based acquisition mode, determining to switch the audio and video data acquisition mode from the current PCDN-based acquisition mode to a CDN-based acquisition mode.
8. The method of claim 7, wherein the buffer is determined to be unstable when the following conditions are met:
and the data volume of the data buffered by the buffer area is smaller than or equal to a second threshold value within the duration of the second set duration.
9. The method of claim 7, wherein switching the audio-video data acquisition mode from the current PCDN-based acquisition mode to the CDN-based acquisition mode comprises:
the method comprises the steps of obtaining audio and video data from a network side in a PCDN-based obtaining mode adopted currently, and starting the CDN-based obtaining mode to obtain the audio and video data from the network side; the method comprises the steps that audio and video data acquired by adopting a CDN-based acquisition mode are cached to a first cache queue, and audio and video data acquired by adopting a PCDN-based acquisition mode are cached to a second cache queue;
Determining a switching starting position according to the data units matched with the time information and the frame format in the first buffer queue and the second buffer queue;
and stopping outputting the audio and video data of the second buffer queue to a buffer zone of the player at the switching starting position, starting outputting the audio and video data of the first buffer queue to the buffer zone of the player, and stopping acquiring the audio and video data from a network side by adopting a PCDN-based acquisition mode.
10. The method of claim 9, wherein the method further comprises:
if the currently adopted acquisition mode based on the PCDN is kept to acquire the audio and video data from the network side, and after the acquisition mode based on the CDN is started to acquire the audio and video data from the network side, the data units with the time stamps and the frame types matched with each other are not acquired in the first cache queue and the second cache queue, and then the acquisition of the audio and video data from the network side by adopting the acquisition mode based on the PCDN is stopped.
11. The method according to claim 5 or 9, wherein determining the handover start position according to the data units in the first buffer queue and the second buffer queue, where the time information and the frame format match, includes:
Acquiring data units with matched time stamps and frame types in the first buffer queue and the second buffer queue;
and determining the starting position of the data unit matched with the time stamp and the frame type as a switching starting position.
12. The method of claim 1, wherein the obtaining the buffer status of the player comprises:
and acquiring the buffer area state of the player according to the set time interval.
13. An audio/video data acquisition apparatus, comprising:
the player is configured to decode the audio and video data and play the decoded audio and video data;
the controller is configured to acquire the buffer area state of the player, determine an audio and video data acquisition mode according to the buffer area state, and store the acquired audio and video data into the buffer area of the player; the buffer area is used for buffering the audio and video data acquired from the network side so as to provide the audio and video data for the player to play;
the first audio and video acquisition module is configured to acquire audio and video data from a network side in a CDN-based mode according to the CDN acquisition mode based on the content delivery network determined by the controller;
The second audio and video acquisition module is configured to acquire audio and video data from a network side in a PCDN-based mode according to the acquisition mode based on the point-to-point content distribution network PCDN determined by the controller;
wherein the controller is specifically configured to:
if the audio/video data acquisition mode is switched from the current audio/video data acquisition mode to the target audio/video data acquisition mode, the following operations are executed:
maintaining the current audio and video data acquisition mode to acquire audio and video data from a network side, and starting the target audio and video data acquisition mode to acquire audio and video data from the network side; the audio and video data acquired by adopting the current audio and video data acquisition mode are cached to a cache queue corresponding to the current audio and video data acquisition mode, and the audio and video data acquired by adopting the target audio and video data acquisition mode are cached to a cache queue corresponding to the target audio and video data acquisition mode;
determining a switching starting position according to a data unit matched with time information and a frame format in a buffer queue corresponding to the current audio and video data acquisition mode and a buffer queue corresponding to the target audio and video data acquisition mode;
And stopping outputting the audio and video data of the buffer queue corresponding to the current audio and video data acquisition mode to a buffer zone of the player at the switching starting position, starting outputting the audio and video data of the buffer queue corresponding to the target audio and video data acquisition mode to the buffer zone of the player, and stopping acquiring the audio and video data from a network side by adopting the current audio and video data acquisition mode.
14. The apparatus of claim 13, wherein the controller is specifically configured to:
if the buffer area state is stable and the current audio and video data acquisition mode is the acquisition mode based on the content delivery network CDN, determining to switch the audio and video data acquisition mode from the current acquisition mode based on the CDN to the acquisition mode based on the point-to-point content delivery network PCDN.
15. The apparatus of claim 14, wherein the controller is specifically configured to:
determining that the buffer status is stable when the following conditions are satisfied: and the data quantity of the data buffered by the buffer area in the duration of the first set duration is larger than a first threshold value.
16. The apparatus of claim 15, wherein the controller is specifically configured to:
When the playing time length of the data buffered in the buffer area within the duration of the first set time length is longer than a first threshold value, if at least one of the following conditions is also met, determining that the buffer area state is stable: the time interval from the last audio/video data acquisition mode switching exceeds the set time interval, and the times of audio/video data acquisition mode switching failure are less than the set times.
17. The apparatus of claim 14, wherein the controller is specifically configured to:
the method comprises the steps of obtaining audio and video data from a network side in a CDN-based obtaining mode which is adopted currently, and starting a PCDN-based obtaining mode to obtain the audio and video data from the network side; the method comprises the steps that audio and video data acquired by adopting a CDN-based acquisition mode are cached to a first cache queue, and audio and video data acquired by adopting a PCDN-based acquisition mode are cached to a second cache queue;
determining a switching starting position according to the data units matched with the time information and the frame format in the first buffer queue and the second buffer queue;
and stopping outputting the audio and video data of the first buffer queue to a buffer zone of the player at the switching starting position, starting outputting the audio and video data of the second buffer queue to the buffer zone of the player, and stopping acquiring the audio and video data from a network side by adopting a CDN-based acquisition mode.
18. The apparatus of claim 17, wherein the controller is further configured to:
if the currently adopted acquisition mode based on the CDN is kept to acquire the audio and video data from the network side, and after the acquisition mode based on the PCDN is started to acquire the audio and video data from the network side, the data units with the time stamps and the frame types matched with each other are not acquired in the first cache queue and the second cache queue, and then the acquisition of the audio and video data from the network side by the acquisition mode based on the PCDN is stopped.
19. The apparatus of claim 13, wherein the controller is specifically configured to:
if the buffer area state is unstable and the current audio and video data acquisition mode is a PCDN-based acquisition mode, determining to switch the audio and video data acquisition mode from the current PCDN-based acquisition mode to a CDN-based acquisition mode.
20. The apparatus of claim 19, wherein the controller is specifically configured to:
determining that the buffer is unstable when the following conditions are met: and the data volume of the data buffered by the buffer area is smaller than or equal to a second threshold value within the duration of the second set duration.
21. The apparatus of claim 19, wherein the controller is specifically configured to:
the method comprises the steps of obtaining audio and video data from a network side in a PCDN-based obtaining mode adopted currently, and starting the CDN-based obtaining mode to obtain the audio and video data from the network side; the method comprises the steps that audio and video data acquired by adopting a CDN-based acquisition mode are cached to a first cache queue, and audio and video data acquired by adopting a PCDN-based acquisition mode are cached to a second cache queue;
determining a switching starting position according to the data units matched with the time information and the frame format in the first buffer queue and the second buffer queue;
and stopping outputting the audio and video data of the second buffer queue to a buffer zone of the player at the switching starting position, starting outputting the audio and video data of the first buffer queue to the buffer zone of the player, and stopping acquiring the audio and video data from a network side by adopting a PCDN-based acquisition mode.
22. The apparatus of claim 21, wherein the controller is further configured to:
if the currently adopted acquisition mode based on the PCDN is kept to acquire the audio and video data from the network side, and after the acquisition mode based on the CDN is started to acquire the audio and video data from the network side, the data units with the time stamps and the frame types matched with each other are not acquired in the first cache queue and the second cache queue, and then the acquisition of the audio and video data from the network side by adopting the acquisition mode based on the PCDN is stopped.
23. The apparatus of claim 17 or 21, wherein the controller is specifically configured to:
acquiring data units with matched time stamps and frame types in the first buffer queue and the second buffer queue;
and determining the starting position of the data unit matched with the time stamp and the frame type as a switching starting position.
24. The apparatus of claim 13, wherein the controller is specifically configured to:
and acquiring the buffer area state of the player according to the set time interval.
25. A terminal device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the audio-visual data acquisition method of any one of claims 1-12.
26. A non-transitory readable storage medium, characterized in that instructions in the non-transitory readable storage medium, when executed by a processor of a terminal device, enable the terminal device to perform the audio-video data acquisition method according to any one of claims 1-12.
CN202011110897.5A 2020-10-16 2020-10-16 Audio and video data acquisition method and device Active CN112399190B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011110897.5A CN112399190B (en) 2020-10-16 2020-10-16 Audio and video data acquisition method and device

Publications (2)

Publication Number Publication Date
CN112399190A CN112399190A (en) 2021-02-23
CN112399190B true CN112399190B (en) 2023-05-09

Family

ID=74596679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011110897.5A Active CN112399190B (en) 2020-10-16 2020-10-16 Audio and video data acquisition method and device

Country Status (1)

Country Link
CN (1) CN112399190B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant