CN117319752A

CN117319752A - Audio and video processing method and device, electronic equipment and storage medium

Info

Publication number: CN117319752A
Application number: CN202311181573.4A
Authority: CN
Inventors: 曾凡志
Original assignee: Douyin Vision Co Ltd
Current assignee: Douyin Vision Co Ltd
Priority date: 2023-09-13
Filing date: 2023-09-13
Publication date: 2023-12-29

Abstract

The embodiments of the disclosure provide an audio and video processing method, an audio and video processing device, electronic equipment and a storage medium. The method comprises: determining, according to packet information of a received data packet, a jitter feature corresponding to the data packet; when it is detected that a jitter feature extraction condition is satisfied, determining a first delay according to at least one of the jitter feature, historical jitter features corresponding to historical data packets, and a preset confidence level parameter; determining a second delay based on the jitter feature and the historical jitter features; and determining, based on the first delay and the second delay, a target delay corresponding to the data packet, so that the multimedia data stream stored in a jitter buffer is processed based on the target delay. The technical solution takes both latency and stalling into account during delay prediction, and thereby adaptively adjusts the jitter delay during transmission of the multimedia data stream.

Description

Audio and video processing method and device, electronic equipment and storage medium

Technical Field

The embodiments of the disclosure relate to the technical field of audio and video processing, and in particular to an audio and video processing method, an audio and video processing device, electronic equipment and a storage medium.

Background

During playback of a multimedia data stream, latency and stalling are important metrics for measuring real-time communication quality. The jitter buffer, as the last stage of the packet transmission link before data packets reach the decoding module, is an important module in real-time audio/video processing. The jitter buffer handles packet loss, out-of-order arrival, late arrival and similar conditions, outputs packets/frames smoothly to the decoding module, mitigates the impact of weak network environments on playback/rendering, reduces stalling, and improves the user's viewing experience. In practice, reasonably predicting the jitter delay of the jitter buffer is key to improving the user experience.

In the related art, the jitter delay of the jitter buffer is typically predicted by having developers analyze historical jitter delays with hand-crafted statistical algorithms, from which the predicted delay is finally obtained.

However, a delay predicted in this way may fail to balance the playback smoothness and clarity of the multimedia data stream, which can degrade playback and, in turn, the user's viewing experience.

Disclosure of Invention

The disclosure provides an audio and video processing method, an audio and video processing device, electronic equipment and a storage medium, which take both latency and stalling into account during delay prediction and thereby adaptively adjust the jitter delay during transmission of the multimedia data stream.

In a first aspect, an embodiment of the present disclosure provides an audio/video processing method, including:

determining, according to packet information of a received data packet, a jitter feature corresponding to the data packet, wherein the data packet contains a multimedia data stream;

when it is detected that a jitter feature extraction condition is satisfied, determining a first delay according to at least one of the jitter feature, historical jitter features corresponding to historical data packets, and a preset confidence level parameter;

determining a second delay based on the jitter feature and the historical jitter features, wherein the historical jitter features correspond to historical data packets received before the current moment;

and determining a target delay corresponding to the data packet based on the first delay and the second delay, so as to process the multimedia data stream stored in the jitter buffer based on the target delay.

In a second aspect, an embodiment of the present disclosure further provides an audio/video processing apparatus, including:

a jitter feature determining module, configured to determine, according to packet information of a received data packet, a jitter feature corresponding to the data packet, wherein the data packet contains a multimedia data stream;

a first delay determining module, configured to determine, when it is detected that a jitter feature extraction condition is satisfied, a first delay according to at least one of the jitter feature, historical jitter features corresponding to historical data packets, and a preset confidence level parameter;

a second delay determining module, configured to determine a second delay based on the jitter feature and the historical jitter features, wherein the historical jitter features correspond to historical data packets received before the current moment;

and a target delay determining module, configured to determine, based on the first delay and the second delay, a target delay corresponding to the data packet, so that the multimedia data stream stored in the jitter buffer is processed based on the target delay.

In a third aspect, embodiments of the present disclosure further provide an electronic device, including:

one or more processors;

a storage means for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the audio video processing method as described in any of the embodiments of the present disclosure.

In a fourth aspect, embodiments of the present disclosure further provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the audio and video processing method according to any embodiment of the present disclosure.

In the technical solution of the embodiments of the present disclosure, a jitter feature corresponding to a received data packet is determined according to the packet information of the data packet; when it is detected that the jitter feature extraction condition is satisfied, a first delay is determined according to at least one of the jitter feature, historical jitter features corresponding to historical data packets, and a preset confidence level parameter; a second delay is determined based on the jitter feature and the historical jitter features, the historical jitter features corresponding to historical data packets received before the current moment; and a target delay corresponding to the data packet is determined based on the first delay and the second delay, so that the multimedia data stream stored in the jitter buffer is processed based on the target delay. This solves the problem in the related art that the predicted delay cannot balance the playback smoothness and clarity of the multimedia data stream: latency and stalling are both taken into account during delay prediction, the jitter delay is adjusted adaptively, playback of the multimedia data stream is improved, and the user's viewing experience is enhanced.

Drawings

The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and components are not necessarily drawn to scale.

Fig. 1 is a schematic diagram of a multimedia data stream transmission process in the related art;

Fig. 2 is a diagram of the system architecture of a jitter buffer in the related art;

Fig. 3 is a schematic diagram of the case in the related art where the target prediction delay is too small;

Fig. 4 is a schematic diagram of the case in the related art where the target prediction delay is too large;

Fig. 5 is a flowchart of an audio and video processing method according to an embodiment of the present disclosure;

Fig. 6 is a flowchart of an audio and video processing method according to an embodiment of the present disclosure;

Fig. 7 is a flowchart of an audio and video processing method according to an embodiment of the present disclosure;

Fig. 8 is a schematic diagram of a jitter feature probability histogram according to an embodiment of the present disclosure;

Fig. 9 is a flowchart of an audio and video processing method according to an embodiment of the present disclosure;

Fig. 10 is a schematic structural diagram of an audio and video processing device according to an embodiment of the present disclosure;

Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.

Detailed Description

Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration only and are not intended to limit the scope of protection of the present disclosure.

It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order and/or performed in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.

The term "including" and its variations as used herein are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; and the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.

It should be noted that the terms "first," "second," and the like in this disclosure are merely used to distinguish between different devices, modules, or units and are not used to define an order or interdependence of functions performed by the devices, modules, or units.

It should be noted that the modifiers "one" and "a plurality of" in this disclosure are illustrative rather than limiting, and those of ordinary skill in the art will understand that they should be read as "one or more" unless the context clearly indicates otherwise.

The names of messages or information interacted between the various devices in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

It will be appreciated that, before the technical solutions disclosed in the embodiments of the present disclosure are used, the user should be informed, in an appropriate manner and in accordance with relevant laws and regulations, of the type, scope of use, and usage scenarios of the personal information involved in the present disclosure, and the user's authorization should be obtained.

For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly inform the user that the requested operation will require obtaining and using the user's personal information. The user can then decide, according to the prompt information, whether to provide personal information to the software or hardware (such as an electronic device, application program, server, or storage medium) that performs the operations of the technical solution of the present disclosure.

As an optional but non-limiting implementation, in response to receiving an active request from the user, the prompt information may be sent to the user in, for example, a pop-up window, in which the prompt information may be presented as text. The pop-up window may also carry a selection control that allows the user to choose 'agree' or 'disagree' to providing personal information to the electronic device.

It will be appreciated that the above-described notification and user authorization process is merely illustrative and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.

It will be appreciated that the data involved in the present technical solution (including but not limited to the data itself and the acquisition or use of the data) should comply with the requirements of the applicable laws, regulations, and related provisions.

Before the present technical solution is introduced, an application scenario is described by way of example. The technical solution of the embodiments of the present disclosure can be applied to predicting the jitter delay of data packets during transmission of a multimedia data stream. The transmission process of a multimedia data stream is first explained with reference to Fig. 1. As shown in Fig. 1, the process may include capture, pre-analysis, pre-processing, encoding, packetization and sending, network transmission to the media backend, network transmission to the jitter buffer, framing and packet-loss feedback in the jitter buffer, jitter estimation in the jitter buffer, decoding, post-processing, and rendering. The technical solution provided by this embodiment is directed to predicting the jitter delay in the jitter buffer. As the last stage of the transmission link before data reaches the decoder, the jitter buffer is an important module in real-time audio and video: it handles packet loss, out-of-order arrival, late arrival and similar conditions, outputs packets/frames smoothly to the decoding module, mitigates the impact of weak network environments on playback/rendering, reduces stalling, and improves the viewing experience. Generally, determining a target prediction delay for each time slice of an audio/video signal, and processing the audio/video data buffered in the jitter buffer based on that target prediction delay, is an important way to improve audio/video playback.

By way of example, the overall architecture of the jitter buffer is described with reference to Fig. 2. As shown in Fig. 2, the jitter buffer may include a negative acknowledgement module (NACK module), a packet buffer module, a jitter estimation module, and a policy control module. Various delay prediction methods can be deployed in the jitter estimation module, optionally including jitter calculation, out-of-order histograms, jitter estimation filtering, jitter histograms, and peak detection. Various policy control methods can be deployed in the policy control module, optionally including policy control and water-level filtering of buffered packets. In practice, the jitter of a received data packet may be estimated by the jitter estimation module, and the policy control module may then process the data packets buffered in the jitter buffer according to the resulting jitter estimate. The processed data packets may then be transmitted to the decoding module for decoding. In the related art, the target delay is usually predicted by methods such as probability histogram jitter estimation, peak-value jitter estimation, or Kalman filter jitter estimation.

However, when the predicted target prediction delay is too small, the actual delay of some time slices may be greater than the target prediction delay while that of others is less, as shown in Fig. 3. In Fig. 3, the dashed line indicates the predicted target delay for each time slice, each rectangular bar represents a time slice, and the height of the bar indicates the actual delay of that time slice. For a time slice whose actual delay is greater than the target prediction delay (a bar that rises above the dashed line), the difference obtained by subtracting the target prediction delay from the actual delay corresponds to the part of the bar above the dashed line; this difference is a stall loss and indicates that the audio and video data played in that time slice may stall. For a time slice whose actual delay is less than the target prediction delay (a bar that stays below the dashed line), the difference obtained by subtracting the actual delay from the target prediction delay corresponds to the blank area below the dashed line; this difference is a delay loss and indicates that delay is wasted for the audio and video data played in that time slice.
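The two loss types in Fig. 3 reduce to simple per-slice arithmetic. The Python sketch below sums them for a given target delay; the function name `slice_losses` and the millisecond sample values are illustrative assumptions, not data read from the figure.

```python
from typing import Sequence, Tuple


def slice_losses(actual_delays_ms: Sequence[float],
                 target_delay_ms: float) -> Tuple[float, float]:
    """Return (stall_loss, delay_loss) summed over all time slices."""
    stall_loss = 0.0
    delay_loss = 0.0
    for actual in actual_delays_ms:
        if actual > target_delay_ms:
            # Bar rises above the dashed line: the excess is stall loss.
            stall_loss += actual - target_delay_ms
        else:
            # Bar stays at or below the dashed line: the gap is delay loss.
            delay_loss += target_delay_ms - actual
    return stall_loss, delay_loss


stall, waste = slice_losses([40, 90, 60, 120], target_delay_ms=80)
print(stall, waste)  # 50.0 60.0
```

A target delay that is too small inflates the stall loss, while one that is too large (the Fig. 4 case) inflates the delay loss.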

Alternatively, when the predicted target prediction delay is too large, the actual delay of every time slice may be smaller than the target prediction delay, as shown in Fig. 4. In Fig. 4, the dashed line is the predicted target delay for each time slice and the height of each rectangular bar indicates the actual delay of that time slice; because every bar lies below the dashed line, the predicted target delay is unreasonably large and delay is wasted. On this basis, the technical solution provided by the embodiments of the present disclosure optimizes the probability histogram jitter estimation method and the peak-value jitter estimation method. That is, after a data packet is received, its jitter feature is determined; when it is detected that the jitter feature extraction condition is satisfied, a first delay of the data packet is determined using the optimized probability histogram jitter estimation method; and if the jitter feature of the data packet is detected to be a peak jitter feature, a second delay of the data packet is determined using the optimized peak-value jitter estimation method. After the first delay and the second delay are determined, the target delay of the data packet is determined from them. The technical solution provided by the embodiments of the present disclosure thus determines the target delay adaptively, improves the accuracy of the jitter delay, and improves playback of the multimedia data stream.
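The text does not specify how the first and second delays are combined into the target delay. As a hypothetical illustration only, the sketch below takes the larger of the two predictions and clamps the result to an assumed operating range; `target_delay_ms`, the max rule, and the 0 to 2000 ms bounds are all assumptions rather than the disclosed method.

```python
def target_delay_ms(first_delay_ms: float, second_delay_ms: float,
                    min_ms: float = 0.0, max_ms: float = 2000.0) -> float:
    """Hypothetical combination rule: take the larger prediction, then clamp."""
    # Taking the maximum favors avoiding stalls when either estimator sees risk.
    combined = max(first_delay_ms, second_delay_ms)
    # Clamp to an assumed operating range so one outlier cannot run away.
    return min(max(combined, min_ms), max_ms)


print(target_delay_ms(90.0, 35.0))   # 90.0
print(target_delay_ms(90.0, 120.0))  # 120.0
```

A weighted blend of the two delays would trade some stall protection for lower latency; which rule fits best depends on how the two estimators are tuned.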

Before the solution of the embodiments of the present disclosure is described, it should also be noted that the delay determination algorithm of the embodiments of the present disclosure may be deployed on a server or a client. The server may be a dedicated service program that provides services and resources to clients, and the device running this program is the server device. Correspondingly, the client is a program that corresponds to the server and provides local services to the user. The client and the server may communicate using various text transfer protocols, such as the Hypertext Transfer Protocol (HTTP). Illustratively, the delay determination algorithm of the embodiments of the present disclosure is integrated into application software supporting audio and video processing and other functions, and the software may be installed on an electronic device, which may optionally be a mobile device, a PC, or the like. The application software may be any software that processes audio, video, or combined audio-video data; the specific software is not detailed here, as long as it can process such data. It may also be a purpose-built application integrated into corresponding software or web pages, so that the user can process the relevant data through pages integrated on the PC side.

Fig. 5 is a schematic flowchart of an audio and video processing method provided by an embodiment of the present disclosure. This embodiment is applicable to any scenario in which the jitter delay of data packets needs to be predicted. The method may be performed by an audio and video processing apparatus, which may be implemented in software and/or hardware and, optionally, by an electronic device such as a mobile terminal, a PC, or a server.

As shown in fig. 5, the method includes:

S110, determining, according to packet information of a received data packet, a jitter feature corresponding to the data packet.

The data packet may be an encapsulated packet used to transmit communication data. Those skilled in the art will appreciate that a data packet is typically an organized unit of data encapsulating text, images, executables, or other data so that it can be transmitted over a network reliably and efficiently. In this embodiment, the data packet contains a multimedia data stream; that is, in practice, the multimedia data stream may be transmitted in data packets. The multimedia data stream may be a data stream of any form, optionally an audio data stream, a video data stream, or a combined audio and video data stream, which is not specifically limited in the embodiments of the present disclosure. Packet information can be understood as information characterizing the attributes and the transmission of the data packet. Generally, before data packets are transmitted to the corresponding device, they may be numbered in sending order to obtain a packet sequence number for each data packet, and each data packet may be stamped with its sending time before the numbered data packets are sent to the device. When the device receives a data packet, the data packet may further be stamped with its receiving time. The packet sequence number, sending time, and receiving time marked on the data packet may be used as the packet information of the data packet. The packet information may further include other information associated with the data packet, for example the data type in the data packet or the transmission protocol of the data packet.

A jitter feature can be understood as feature data characterizing the delay variation of data packets, i.e., time difference data characterizing that delay variation. As those skilled in the art will appreciate, jitter, also known as delay variation, refers to the variation in delay exhibited by different packets of the same data stream. Typically, packets leave the sender at regular intervals; however, the different delays the packets experience as they traverse the network disrupt this regular spacing, which produces jitter.
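As a sketch of one common way to express delay variation as time difference data, the code below computes, for each pair of consecutive packets, how much the receive spacing differs from the send spacing. This is the per-pair difference used by RFC 3550 interarrival jitter; the document does not state that this exact formula is used, and `PacketInfo` and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PacketInfo:
    seq: int         # packet sequence number assigned by the sender
    send_ms: float   # sending time stamped by the sender
    recv_ms: float   # receiving time stamped by the receiver


def jitter_features(packets: List[PacketInfo]) -> List[float]:
    """Per-pair delay variation: receive spacing minus send spacing."""
    features = []
    for prev, cur in zip(packets, packets[1:]):
        # Positive: cur was delayed more than prev; negative: less.
        features.append((cur.recv_ms - prev.recv_ms) - (cur.send_ms - prev.send_ms))
    return features


pkts = [PacketInfo(1, 0.0, 100.0), PacketInfo(2, 20.0, 125.0),
        PacketInfo(3, 40.0, 140.0), PacketInfo(4, 60.0, 170.0)]
print(jitter_features(pkts))  # [5.0, -5.0, 10.0]
```

Packets are paired in list order here; with out-of-order arrival, the pairing rule is exactly where a different jitter feature determination policy would come into play.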

In practice, various situations can cause delay jitter during transmission of a multimedia data stream, for example out-of-order or lost data packets. Different jitter scenarios may correspond to different jitter feature determination policies, so the jitter feature of a data packet is determined by analyzing its packet information case by case.

Specifically, whether data packets are out of order may be determined from the packet information. If packets are out of order, the jitter feature determination policy may be set to an optimization policy, and the jitter feature of the data packet is determined according to the optimization policy and the packet information. If packets are not out of order, the jitter feature determination policy may be set to a base policy, and the jitter feature is determined according to the base policy and the packet information.
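A minimal sketch of the out-of-order check, assuming sequence numbers are plain integers listed in arrival order and ignoring sequence-number wrap-around. The contents of the optimization and base policies are not described in the text, so the sketch only returns a policy label; `has_out_of_order` and `select_policy` are hypothetical names.

```python
from typing import Iterable


def has_out_of_order(seqs_in_arrival_order: Iterable[int]) -> bool:
    """True if any packet arrives after a packet with a larger sequence number."""
    highest = None
    for seq in seqs_in_arrival_order:
        if highest is not None and seq < highest:
            return True
        highest = seq if highest is None else max(highest, seq)
    return False


def select_policy(seqs_in_arrival_order: Iterable[int]) -> str:
    # Policy internals are not specified in the text; only the choice is shown.
    return "optimization" if has_out_of_order(seqs_in_arrival_order) else "base"


print(select_policy([1, 2, 3, 4]))  # base
print(select_policy([1, 3, 2, 4]))  # optimization
```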

It should be noted that determining the jitter features of the data packets carrying the multimedia data stream during its transmission makes it possible to analyze the transmission condition of the stream and, based on the analysis result, optimize its playback on the terminal device.

S120, when it is detected that the jitter feature extraction condition is satisfied, determining a first delay according to at least one of the jitter feature, the historical jitter features corresponding to historical data packets, and a preset confidence level parameter.

The jitter feature extraction condition can be understood as the condition under which the jitter feature corresponding to a data packet is extracted, and may be any condition that enables such extraction. Optionally, the jitter feature extraction condition may be a sampling interval at which jitter features are sampled, i.e., the time interval between two adjacent samplings of the jitter feature. In this embodiment, since different jitter conditions can be identified from the packet information of the data packets, the sampling interval may be dynamically updated based on the packet information so that jitter feature sampling adapts to different jitter conditions. In practice, different jitter conditions may correspond to different ways of updating the sampling interval, as described below:

The first case: if out-of-order arrival is determined from the packet sequence numbers in the packet information, the sampling interval is adjusted to a first preset multiple of the preset interval duration.

The packet sequence number may be a number indicating the sending order of the data packet. Generally, before a data packet is transmitted to the next processing module, it may be numbered to obtain its packet sequence number, so that during subsequent processing the transmission status of the data packets can be seen clearly and intuitively from their sequence numbers. The preset interval duration is a preset jitter feature sampling interval and may be of any length. The first preset multiple may be any multiple, optionally 0.7.

In practice, after the packet information of the received data packets is determined, the packet sequence numbers in the packet information may be compared to determine whether the data packets were received in an order different from their sending order. If out-of-order arrival is detected, the sampling interval of the jitter feature can be changed from the preset interval duration to the product of the preset interval duration and the first preset multiple, thereby dynamically updating the sampling interval. The advantage of this arrangement is that a shorter sampling interval speeds up jitter delay updates, so that the target delay can be increased more readily to avoid stalling.

It should be noted that, if it is determined that no disorder situation exists based on the packet sequence number in the packet information, the sampling interval may be continuously maintained at the preset interval duration, and the jitter feature may be sampled based on the preset interval duration.
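The first case, together with the sampling-interval form of the extraction condition, can be sketched as a small sampler. The 200 ms preset interval is an assumed value, 0.7 is the example multiple given above, and `JitterSampler` with its methods is hypothetical.

```python
from typing import Optional


class JitterSampler:
    """Hypothetical sampler: the extraction condition is 'interval elapsed'."""

    def __init__(self, preset_interval_ms: float = 200.0,
                 first_multiple: float = 0.7) -> None:
        self.preset_interval_ms = preset_interval_ms  # assumed preset value
        self.first_multiple = first_multiple          # example value from the text
        self.interval_ms = preset_interval_ms
        self.last_sample_ms: Optional[float] = None

    def update_interval(self, out_of_order: bool) -> None:
        # First case: shrink the interval on disorder, else keep the preset.
        if out_of_order:
            self.interval_ms = self.preset_interval_ms * self.first_multiple
        else:
            self.interval_ms = self.preset_interval_ms

    def should_sample(self, now_ms: float) -> bool:
        # Jitter feature extraction condition: the sampling interval has elapsed.
        if self.last_sample_ms is None or now_ms - self.last_sample_ms >= self.interval_ms:
            self.last_sample_ms = now_ms
            return True
        return False


s = JitterSampler()
s.update_interval(out_of_order=True)
print(s.should_sample(0.0), s.should_sample(100.0), s.should_sample(140.0))  # True False True
```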

The second case: if the number of data packets received within a preset duration is lower than a preset number, the sampling interval is adjusted to a second preset multiple of the preset interval duration.

In general, if fewer data packets than the preset number are received within the preset duration, it may be determined that one or more data packets were not delivered to the corresponding terminal device on time, i.e., data packets are congested and accumulating during transmission. The preset duration may be any duration, optionally 500 milliseconds. The preset number may be any number, optionally 10. The second preset multiple may be any multiple, optionally 2.

In practice, the number of data packets received within the preset duration can be counted. If it is lower than the preset number, it can be determined that data packets are congested; the sampling interval of the jitter feature can then be changed from the preset interval duration to the product of the preset interval duration and the second preset multiple, thereby dynamically updating the sampling interval. The advantage of this arrangement is that extending the sampling interval avoids sustained congestion and also makes it easier to increase the target delay to avoid stalling.
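For the second case, a sliding window can count recent arrivals. This sketch assumes the example values given above (a 500 ms window, a threshold of 10 packets, and a multiple of 2); `ArrivalRateMonitor` and `interval_for_congestion` are hypothetical names, and a real implementation would likely suppress the check until the window has filled at start-up.

```python
from collections import deque


class ArrivalRateMonitor:
    """Counts packets received within a sliding window of window_ms."""

    def __init__(self, window_ms: float = 500.0, min_packets: int = 10) -> None:
        self.window_ms = window_ms
        self.min_packets = min_packets
        self.arrivals: deque = deque()

    def _evict(self, now_ms: float) -> None:
        # Drop arrivals older than the window.
        while self.arrivals and now_ms - self.arrivals[0] > self.window_ms:
            self.arrivals.popleft()

    def on_packet(self, recv_ms: float) -> None:
        self.arrivals.append(recv_ms)
        self._evict(recv_ms)

    def congested(self, now_ms: float) -> bool:
        # Note: also returns True during start-up before the window fills.
        self._evict(now_ms)
        return len(self.arrivals) < self.min_packets


def interval_for_congestion(preset_ms: float, congested: bool,
                            second_multiple: float = 2.0) -> float:
    # Second case: lengthen the interval when too few packets arrived.
    return preset_ms * second_multiple if congested else preset_ms


m = ArrivalRateMonitor()
for t in range(0, 500, 20):   # 25 packets, one every 20 ms
    m.on_packet(float(t))
print(m.congested(480.0), m.congested(1000.0))  # False True
print(interval_for_congestion(200.0, True))     # 400.0
```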

In this embodiment, when it is detected that the jitter feature extraction condition is satisfied, the first delay may be determined according to at least one of the jitter feature, the historical jitter feature corresponding to the historical data packet, and a preset confidence level parameter.

A historical data packet may be a data packet received before the current time. Correspondingly, a historical jitter feature may be a jitter feature determined from the packet information of a historical data packet. In statistics, the confidence level is the proportion of intervals, among the many intervals constructed from repeated samples, that contain the population parameter. It expresses the reliability of a sample-based estimate, i.e., the probability that the interval around the sample statistic covers the parameter value. Correspondingly, the confidence level parameter is the percentage or probability value representing the confidence level. In this embodiment, the confidence level parameter may take any value, optionally 0.85, 0.95, or 0.99. The confidence level parameter can be understood as the value at which the jitter features of the received data packets satisfy the jitter delay prediction requirement. The first delay can be understood as the jitter prediction delay determined from the jitter features of the received data packets and the preset confidence level parameter when the jitter feature extraction condition is satisfied.

In practical application, when it is detected that the jitter feature extraction condition is satisfied, the jitter feature of the data packet may be extracted so that the delay of the jitter buffer can be optimized based on it. Specifically, the jitter feature of the currently received data packet and the historical jitter features of the historical data packets received before the current time may be determined and accumulated together. When the accumulated total jitter feature is detected to reach the preset confidence level parameter, the delay corresponding to the point at which the confidence level parameter is reached may be taken as the first delay.

S130, determining a second delay based on the jitter characteristic and the historical jitter characteristic.

Here, the historical jitter feature corresponds to the historical data packets received before the current time. The second delay may be a jitter prediction delay determined from the jitter features when a preset condition is satisfied. Optionally, the preset condition may relate to a jitter peak feature.

Jitter delay prediction based on jitter peaks is a common delay optimization mode. In practical application, when predicting the target delay of the currently received data packet, after its jitter feature has been determined, the current jitter feature may be checked against a preset jitter peak determination condition by comparing it with the previously determined historical jitter features. If the current jitter feature satisfies the condition, it may be determined to be a peak jitter feature, jitter delay prediction may be performed based on that peak jitter feature, and the predicted delay may be used as the second delay.
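The text does not specify the jitter peak determination condition. The sketch below assumes, purely for illustration, that a jitter feature counts as a peak when it exceeds a hypothetical `peak_factor` times the mean of the historical features, and that the second delay is the largest peak kept in a short history. `PeakDelayEstimator` and all of its parameters are hypothetical names, not the patent's method.

```python
from collections import deque
from statistics import mean
from typing import Deque, Optional


class PeakDelayEstimator:
    """Hypothetical peak-based estimator for the second delay (assumed criterion)."""

    def __init__(self, peak_factor: float = 2.0, max_peaks: int = 8,
                 max_history: int = 100) -> None:
        self.peak_factor = peak_factor
        self.history: Deque[float] = deque(maxlen=max_history)  # historical jitter features
        self.peaks: Deque[float] = deque(maxlen=max_peaks)      # recent peak jitter features

    def update(self, iat_ms: float) -> Optional[float]:
        # Compare the current feature with the historical ones before storing it.
        if self.history and iat_ms > self.peak_factor * mean(self.history):
            self.peaks.append(iat_ms)
        self.history.append(iat_ms)
        # No peak seen yet means no second delay is available.
        return max(self.peaks) if self.peaks else None
```

Feeding 10, 12, 11 yields no second delay; a following 50 is flagged as a peak and becomes the second delay until a larger peak appears or it ages out of the peak history.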

S140, determining a target delay corresponding to the data packet based on the first delay and the second delay, so as to process the multimedia data stream stored in the jitter buffer based on the target delay.

The target delay may be understood as a delay that meets the delay optimization requirements of the jitter buffer. In this embodiment, after the first delay and the second delay of the data packet are obtained, they may be processed according to a preset rule to obtain the target delay. The preset rule may be a preset delay optimization rule and may take any form; optionally, it may take the maximum value as the target delay, take the minimum value as the target delay, or compute a weighted sum of multiple delays. The jitter buffer is an important module in the audio and video processing flow. In short video development, a jitter buffer can effectively handle packet loss, out-of-order packets, and late arrival. It outputs data packets or audio and video frames smoothly to the decoding module, mitigates the impact of weak network conditions on playback or rendering, reduces how often the audio and video content stutters, and improves the user's viewing experience. In general, there are at least two common jitter buffer arrangements: a static jitter buffer implemented in system hardware, and a dynamic jitter buffer implemented in system software. Both adapt to network changes by adjusting the amount of buffering. The multimedia data stream stored in the jitter buffer may be understood as the stream that has been buffered but not yet played.

In practical application, the first delay and the second delay are determined by different delay optimization modes. To select the optimal delay between them, the two delays may be processed to obtain the target delay of the data packet.

Optionally, determining the target delay corresponding to the data packet based on the first delay and the second delay includes: taking the larger of the first delay and the second delay as the target delay corresponding to the data packet.

In practical application, after the first delay and the second delay are obtained, the larger of the two, that is, the maximum delay, may be determined and used as the target delay of the data packet. The advantage of this arrangement is that the final target delay accounts for both the delay and the stuttering of the data stream at the same time, which optimizes the playback quality of the multimedia data stream.
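The maximum rule, together with the case where only one of the two delays is available, can be sketched as follows. `target_delay` is a hypothetical helper, and returning `None` when neither delay exists (so the caller keeps its previous target) is an assumption.

```python
from typing import Optional


def target_delay(first_ms: Optional[float],
                 second_ms: Optional[float]) -> Optional[float]:
    """Combine the first and second delays with the maximum rule.

    If only one delay is available it is used directly; if neither is
    available, None is returned (assumed: the caller keeps its old target).
    """
    candidates = [d for d in (first_ms, second_ms) if d is not None]
    return max(candidates) if candidates else None
```

Taking the maximum trades a little extra latency for fewer stalls; a minimum or weighted-sum rule, which the text also allows, would only change the final expression.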

Further, after the target delay is determined, the multimedia data stream stored in the jitter buffer may be processed according to it, so that the adjusted jitter buffer outputs the stream smoothly to the decoding module and both stuttering and unnecessary delay during playback occur less often.

In practical application, after the target delay is determined, the timestamps of the multimedia data stream stored in the jitter buffer may be determined, and the playing duration of that stream may be derived from them. The target delay and the playing duration may then be compared numerically, and the multimedia data stream stored in the jitter buffer may be processed according to the result. The comparison result may be that the playing duration is longer than the target delay, or that the playing duration is smaller than a preset multiple of the target delay. Different results correspond to different jitter buffer processing modes; the two cases are described below.

One case is: when the playing duration is longer than the target delay, the transmission rate of the multimedia data stream stored in the jitter buffer is increased.

The playing duration is the time the multimedia data stream stored in the jitter buffer would take to play. The transmission rate may be understood as the rate at which the multimedia data stream is delivered in the network.

In practical application, if the playing duration is longer than the target delay, the transmission efficiency of the multimedia data stream stored in the jitter buffer may be increased so that the stream leaves the jitter buffer faster. The advantage is that draining the buffered stream faster shortens the jitter buffer and reduces the amount of buffered data, thereby reducing the end-to-end delay of the multimedia data stream.

Another case is: when the playing duration is smaller than the preset multiple of the target delay, the transmission efficiency of the stored multimedia data stream is reduced.

The preset multiple may be any number; optionally, it may be 0.75. The transmission efficiency may be understood as the speed at which a data stream is transmitted from a source to a destination within a given time period.

In practical application, when the playing duration is smaller than the product of the target delay and the preset multiple, the transmission efficiency of the multimedia data stream stored in the jitter buffer may be reduced, which lengthens the time the stream stays in the jitter buffer. The advantage of this arrangement is that slowing the output increases the amount of data held in reserve in the jitter buffer, so stuttering is reduced at the cost of a longer delay.
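The two buffer-processing cases can be sketched as a small decision function. It assumes the preset multiple is 0.75 read as a plain multiple, that the playing duration is the span of the buffered timestamps, and that the band where neither case applies leaves the rate unchanged; `playout_action` and `Playout` are hypothetical names.

```python
from enum import Enum


class Playout(Enum):
    SPEED_UP = "speed_up"    # drain the buffer faster to cut end-to-end delay
    SLOW_DOWN = "slow_down"  # drain the buffer slower to build a reserve
    NORMAL = "normal"        # assumed: neither case applies, keep the rate


def playout_action(buffered_ms: float, target_ms: float,
                   low_multiple: float = 0.75) -> Playout:
    """Decide how to process the buffered stream.

    `buffered_ms` is the playing duration of the buffered stream, assumed
    here to be the span between its newest and oldest timestamps.
    """
    if buffered_ms > target_ms:
        return Playout.SPEED_UP
    if buffered_ms < low_multiple * target_ms:
        return Playout.SLOW_DOWN
    return Playout.NORMAL
```

With a 200 ms target, 300 ms of buffered media triggers a speed-up, 100 ms (below 150 ms) triggers a slow-down, and 180 ms is left alone.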

It should be noted that, in practical applications, only the first delay or only the second delay may be obtained. In that case, the available delay may be used as the target delay, and the multimedia data stream stored in the jitter buffer is processed based on it.

Fig. 6 is a flowchart of an audio/video processing method according to an embodiment of the disclosure. Based on the above embodiment, the target policy may be determined according to the packet information of the data packet, and further, the jitter feature corresponding to the data packet may be determined according to the target policy and the packet information. Reference is made to the description of this example for a specific implementation. The technical features that are the same as or similar to those of the foregoing embodiments are not described herein.

As shown in fig. 6, the method of this embodiment may specifically include:

S210, determining a target policy according to the packet sequence number in the packet information and the packet sequence numbers of the data packets in the time window.

The packet sequence number is a number indicating the order in which data packets are sent. It may take any form; optionally, it may be a number. A time window may be understood as a pre-built window of adaptive size that can store a preset number of data packets. When the number of data packets stored in the time window reaches a preset number threshold and a new data packet arrives, the stored data packet whose storage time is furthest from the current time, that is, the earliest-stored one, may be moved out of the time window so that the new data packet can be stored. For example, assume that the preset number threshold of the time window is 10. When 10 data packets are already stored and a new data packet arrives, the storage times of the 10 stored data packets may be determined, the data packet stored first among them may be moved out of the time window, and the new data packet may be stored. In this way, a first-in first-out effect is achieved once the number of stored data packets reaches the preset threshold.
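The eviction rule of the time window can be sketched with Python's `collections.deque`, whose `maxlen` argument discards the oldest item when a new one is appended to a full deque, which is first-in first-out eviction. `PacketInfo` and `TimeWindow` are hypothetical names, and the capacity of 10 follows the example.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque


@dataclass
class PacketInfo:
    seq: int        # packet sequence number
    rtp_ms: float   # acquisition (sender-side) time
    rev_ms: float   # receiving time


class TimeWindow:
    """Fixed-capacity window: once full, the earliest-stored packet is evicted."""

    def __init__(self, capacity: int = 10) -> None:
        # deque(maxlen=...) drops the leftmost (oldest) item on overflow.
        self.packets: Deque[PacketInfo] = deque(maxlen=capacity)

    def push(self, pkt: PacketInfo) -> None:
        self.packets.append(pkt)
```

Pushing packets 1 to 11 into a window of capacity 10 leaves packets 2 to 11, with packet 1 evicted.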

In practical applications, a newly received data packet may be processed with reference to the data packets stored in the time window to determine its jitter feature. For example, assume that the time window contains data packet 1 and that a newly received packet is data packet 2. The jitter feature of data packet 2 may be determined from the associated information of data packet 1. When another new packet, data packet 3, is received, its jitter feature may be determined from the associated information of data packets 1 and 2.

The target policy may be understood as a policy for determining the jitter feature of a data packet, and may include several policies corresponding to different jitter situations. Optionally, the target policy may be a basic policy or an optimization policy. The optimization policy corresponds to out-of-order packets: when the data packet whose jitter feature is to be determined is an out-of-order packet, its jitter feature may be determined based on the optimization policy. Accordingly, the basic policy corresponds to non-out-of-order packets: when the data packet is not out of order, its jitter feature may be determined based on the basic policy.

In practical application, after the packet information of a newly received data packet is determined, its packet sequence number may be obtained from the packet information and compared with the packet sequence numbers of the data packets stored in the time window to determine whether it is an out-of-order packet. If the sequence number does not fit the order of the stored sequence numbers, the newly received data packet may be determined to be an out-of-order packet, and the target policy may be determined to be the optimization policy. If it does fit that order, the newly received data packet may be determined to be a non-out-of-order packet, and the target policy may be determined to be the basic policy. For example, assume that 5 data packets with sequence numbers 1, 2, 3, 4, and 5 are stored in the time window. If the newly received data packet has sequence number 6, there is no out-of-order condition and the target policy may be the basic policy; if it has sequence number 8, an out-of-order condition exists and the target policy may be the optimization policy.
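One possible reading of the ordering check, consistent with both examples (6 is in order, 8 is not), is that a packet is in order only when its sequence number directly follows the largest one in the window. The sketch assumes that reading, treats an empty window as in order, and ignores RTP sequence-number wraparound; `select_policy` and `Policy` are hypothetical names.

```python
from enum import Enum
from typing import Sequence


class Policy(Enum):
    BASIC = "basic"                # non-out-of-order packet
    OPTIMIZATION = "optimization"  # out-of-order packet


def select_policy(new_seq: int, window_seqs: Sequence[int]) -> Policy:
    """Pick the target policy for a newly received packet (assumed rule).

    Gaps, late packets and duplicates are all treated as out of order.
    """
    if not window_seqs:
        return Policy.BASIC  # assumption: nothing to compare against yet
    if new_seq == max(window_seqs) + 1:
        return Policy.BASIC
    return Policy.OPTIMIZATION
```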

S220, determining jitter characteristics corresponding to the data packets based on the target strategy and the packet information.

In this embodiment, after the target policy is determined, the jitter feature of the received data packet may be determined based on the target policy and the packet information of the data packets in the time window. Here, the packet information includes the packet information of each data packet stored in the time window as well as that of the received data packet.

In practical applications, out-of-order and non-out-of-order packets correspond to different target policies, so their jitter features are determined in different ways. The way of determining the jitter feature under each target policy is described below.

Optionally, if the target policy is the basic policy, determining the jitter feature according to the acquisition time and the receiving time of the plurality of data packets in the time window.

The acquisition time may be the sender-side time at which the data packet was collected and sent toward the time window (rtp_ms in the formula below). Correspondingly, the receiving time is the time at which the data packet is received into the time window.

In practical application, after a received data packet is determined to be a non-out-of-order packet, its acquisition time and receiving time may be determined from its packet information. For each data packet stored in the time window, the difference between its receiving time and its acquisition time may be taken as the delay of that packet. After the delays of all data packets in the time window are obtained, the minimum delay may be found, which identifies the data packet with the shortest transmission time in the time window. Then, the difference between the receiving time of the received data packet and the receiving time of that shortest-transmission-time packet may be determined as a first value to be processed, and the difference between the acquisition time of the received data packet and the acquisition time of that packet may be determined as a second value to be processed. Finally, the difference between the first value and the second value may be used as the jitter feature of the received data packet.

By way of example, the process of determining the jitter feature based on the basic policy may be expressed by the following formula:

iat_ms = (rev_ms - min_jitter_rev_ms) - (rtp_ms - min_jitter_rtp_ms)

where iat_ms represents the jitter feature of the received data packet; rev_ms represents the receiving time of the received data packet; min_jitter_rev_ms represents the receiving time of the data packet in the time window with the shortest transmission time; rtp_ms represents the acquisition time of the received data packet; and min_jitter_rtp_ms represents the acquisition time of the data packet in the time window with the shortest transmission time.
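The basic-policy formula translates directly into a short function. `iat_basic` is a hypothetical name, and representing the window as `(rtp_ms, rev_ms)` pairs in milliseconds is an assumption.

```python
from typing import Sequence, Tuple


def iat_basic(rtp_ms: float, rev_ms: float,
              window: Sequence[Tuple[float, float]]) -> float:
    """Jitter feature under the basic policy.

    `window` holds (rtp_ms, rev_ms) pairs for the packets in the time window;
    the reference packet is the one with the shortest transit time
    rev_ms - rtp_ms.
    """
    min_jitter_rtp_ms, min_jitter_rev_ms = min(window, key=lambda p: p[1] - p[0])
    return (rev_ms - min_jitter_rev_ms) - (rtp_ms - min_jitter_rtp_ms)
```

For a window `[(0, 50), (20, 65), (40, 100)]` the shortest transit is 45 ms (packet `(20, 65)`), so a packet collected at 60 ms and received at 130 ms gets `(130 - 65) - (60 - 20) = 25`.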

Optionally, if the target policy is an optimization policy, determining a target historical data packet in the time window, and determining the jitter characteristic based on the target historical data packet and a preset frame number.

The target historical data packet may be the data packet with the largest packet sequence number in the time window. The preset frame number (frame_len in the formula below) may be a preset data frame length.

In practical application, when determining that the received data packets are out-of-order packets, the packet sequence numbers of all the stored data packets in the time window can be determined, and the data packet with the largest packet sequence number is screened out to be used as the target historical data packet. Further, the jitter characteristic corresponding to the currently received data packet may be determined according to the packet information of the target historical data packet, the packet information of the currently received data packet, and the preset frame number.

It should be noted that determining the jitter feature of the data packet based on the corresponding target policy improves the accuracy of the jitter feature and, in turn, the prediction accuracy of the target delay of the data packet.

Optionally, determining the jitter feature based on the target historical data packet and the preset frame number includes: determining a relative delay according to the acquisition time and the receiving time of the target historical data packet; determining a first value according to the packet sequence number of the target historical data packet and the packet sequence number of the currently received data packet; and determining the jitter feature of the data packet based on the relative delay, the first value, the preset frame number, and the jitter feature of the target historical data packet.

The jitter feature corresponding to the target historical data packet may be feature data representing a delay variation corresponding to the target historical data packet, that is, a predicted delay corresponding to the target historical data packet.

In practical application, after the target historical data packet is selected from the data packets stored in the time window, its acquisition time and receiving time may be determined from its packet information, and the difference between the receiving time and the acquisition time may be taken as its relative delay. Next, the difference between the packet sequence number of the target historical data packet and that of the currently received data packet may be taken as the first value. The product of the first value and the preset frame number may then be added to the relative delay to obtain a value to be processed. Finally, the jitter feature of the target historical data packet may be obtained and added to the value to be processed, and the sum may be used as the jitter feature of the currently received data packet. The advantage of this arrangement is that it effectively reduces the dimension of the jitter feature and the amount of reported data, so the jitter delay can be predicted accurately even when the computing power of the terminal is limited.

By way of example, the process of determining the jitter feature based on the optimization policy may be expressed by the following formula:

iat_ms = newest_packet_rev_elapse + (newest_seq - cur_seq) * frame_len + iat_delay_ms(newest_packet)

where iat_ms represents the jitter feature of the currently received data packet; newest_packet_rev_elapse represents the relative delay of the target historical data packet; newest_seq represents the packet sequence number of the target historical data packet; cur_seq represents the packet sequence number of the currently received data packet; frame_len represents the preset frame number; iat_delay_ms(newest_packet) represents the jitter feature of the target historical data packet; and * denotes multiplication.
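Similarly, the optimization-policy formula can be sketched as below. `iat_optimized` and its argument names are hypothetical, and `frame_len` is assumed to be expressed in milliseconds per sequence-number step.

```python
def iat_optimized(newest_rtp_ms: float, newest_rev_ms: float, newest_seq: int,
                  newest_iat_ms: float, cur_seq: int, frame_len: float) -> float:
    """Jitter feature for an out-of-order packet under the optimization policy.

    The "newest" arguments describe the target historical data packet, i.e.
    the packet with the largest sequence number in the time window.
    """
    # Relative delay of the target historical packet: receiving minus acquisition time.
    newest_packet_rev_elapse = newest_rev_ms - newest_rtp_ms
    # Sequence-number gap scaled by the preset frame length, plus the stored feature.
    return (newest_packet_rev_elapse
            + (newest_seq - cur_seq) * frame_len
            + newest_iat_ms)
```

For instance, a target historical packet collected at 100 ms, received at 140 ms, with sequence number 10 and jitter feature 5, combined with a late packet of sequence number 8 and a 20 ms frame length, gives `40 + 2 * 20 + 5 = 85`.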

S230, when it is detected that the jitter feature extraction condition is satisfied, determining a first delay according to at least one of the jitter feature, the historical jitter feature corresponding to the historical data packet, and a preset confidence level parameter.

S240, determining a second delay based on the jitter characteristic and the historical jitter characteristic.

S250, determining a target delay corresponding to the data packet based on the first delay and the second delay, so as to process the multimedia data stream stored in the jitter buffer based on the target delay.

According to the technical solution of this embodiment, a target policy is determined according to the packet sequence number in the packet information and the packet sequence numbers of the data packets in the time window; the jitter feature of the data packet is determined based on the target policy and the packet information; when it is detected that the jitter feature extraction condition is satisfied, a first delay is determined according to at least one of the jitter feature, the historical jitter features of the historical data packets, and the preset confidence level parameter; a second delay is determined based on the jitter feature and the historical jitter features; and a target delay of the data packet is determined based on the first delay and the second delay, so that the multimedia data stream stored in the jitter buffer is processed based on the target delay. This allows jitter features to be determined both when packets arrive in order and when they arrive out of order, which makes the determination more flexible and more accurate.

Fig. 7 is a flowchart of an audio/video processing method according to an embodiment of the disclosure. On the basis of the above embodiment, when it is detected that the jitter feature extraction condition is satisfied, the total jitter feature corresponding to the index value is adjusted based on the jitter feature and the adjustment step, and the first delay is determined according to the adjusted total jitter features and the confidence level parameter. Reference is made to the description of this example for a specific implementation. Technical features that are the same as or similar to those of the foregoing embodiments are not described again herein.

As shown in fig. 7, the method of this embodiment may specifically include:

S310, determining the jitter feature corresponding to the data packet according to the packet information of the received data packet.

S320, when it is detected that the jitter feature extraction condition is satisfied, determining an index value corresponding to the data packet based on the jitter feature and a preset adjustment step, adjusting the total jitter feature corresponding to the index value based on a first preset weight, and adjusting the total jitter features corresponding to the other index values based on a second preset weight.

The adjustment step may be a parameter for adjusting the magnitude of the jitter feature value. It may take any value; optionally, it may be 5 to 20 milliseconds. It should be noted that the preset adjustment step may be determined from the jitter feature probability histogram, with the time interval between two adjacent bars of the histogram used as the adjustment step. The index value may be an identifier indicating the order of the jitter features; it may be represented in any form, optionally as a number. In general, when there are multiple jitter features, the index value can be used to locate the jitter feature corresponding to it. The first preset weight may be any value; optionally, it may be 0.01. The second preset weight may be any value; optionally, it may be 0.99. The total jitter feature may be determined based on the historical jitter features of the historical data packets and the jitter feature of the currently received data packet.

In practical application, when it is detected that the jitter feature extraction condition is satisfied, the probability histogram formed by the jitter features may be updated, and the first delay of the received data packet may be determined from the updated histogram. Specifically, after the jitter feature of the data packet is determined, a probability histogram describing the distribution of jitter features may be built from the jitter feature of the currently received data packet and the historical jitter features of the historical data packets, and the total jitter features of the various index values may be determined from it. Fig. 8 shows such a probability histogram of jitter features: the bars are arranged from left to right, their order is given by the index values, and the height of each bar is the total jitter feature of the corresponding index value, expressed as a probability.

Further, to locate the jitter feature in the probability histogram, the preset adjustment step may be obtained and the ratio of the jitter feature to the adjustment step may be used as the index value of the data packet. Then, for that index value, the product of its total jitter feature and the first preset weight may be used as its adjusted total jitter feature; for each of the other index values, the product of its total jitter feature and the second preset weight may be used as its adjusted total jitter feature.

For example, the index value corresponding to the data packet may be determined based on the following formula:

index = iat_ms / bucket_ms

where index represents the index value of the data packet; iat_ms represents the jitter feature of the data packet; and bucket_ms represents the adjustment step.

It should be noted that setting the first preset weight to a value far smaller than the second preset weight speeds up the attenuation of the total jitter feature of the matched index value, which facilitates subsequently determining the prediction delay of the currently received data packet.
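A sketch of the histogram update follows. Read literally, the text multiplies the matched bar by the first preset weight; this sketch instead uses the reading found in delay histograms such as WebRTC NetEq's, where every bar decays by the second weight and the matched bar additionally gains the first weight, so the bars keep summing to 1 when the two weights sum to 1. That reading, the flooring and clamping of `index`, and the name `update_histogram` are all assumptions; only the default weights 0.01 and 0.99 come from the text.

```python
from typing import List


def update_histogram(hist: List[float], iat_ms: float, bucket_ms: float,
                     first_weight: float = 0.01,
                     second_weight: float = 0.99) -> int:
    """Update the jitter probability histogram in place; return the index used."""
    # index = iat_ms / bucket_ms, floored and clamped to the histogram range
    # (negative jitter features land in bar 0, very large ones in the last bar).
    index = min(max(int(iat_ms // bucket_ms), 0), len(hist) - 1)
    # Every bar decays by the second weight...
    for i in range(len(hist)):
        hist[i] *= second_weight
    # ...and the matched bar additionally receives the first weight.
    hist[index] += first_weight
    return index
```

Starting from all probability mass in bar 0, a 45 ms jitter feature with a 20 ms step maps to index 2, leaving 0.99 in bar 0 and 0.01 in bar 2.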

S330, sequentially accumulating the corresponding total jitter characteristics from the preset index value until the confidence level parameter is reached.

The preset index value may be the index value from which the accumulation of total jitter features starts. It may be any value; optionally, it may be 0.

In practical application, after the total jitter features of the index values are adjusted, they may be accumulated in order starting from the preset index value, and the accumulation may stop once the accumulated total jitter feature reaches the preset confidence level parameter, that is, once it is equal to or greater than the confidence level parameter.

Illustratively, with continued reference to FIG. 8, the confidence level parameter is assumed to be 0.95. The index value of the first bar in the graph may be used as a preset index value, and further, from the first bar, the total jitter feature corresponding to each bar may be accumulated sequentially, that is, the ordinate value corresponding to each bar in the probability histogram may be accumulated sequentially, until the accumulated ordinate total value reaches 0.95, and then the accumulation may be stopped.

S340, taking the delay corresponding to the point at which the confidence level parameter is reached as the first delay.

In practical application, when the accumulated total jitter characteristics reach the confidence level parameters, the index value corresponding to the confidence level parameters can be determined, and then the first delay corresponding to the data packet can be determined according to the adjustment step length and the index value. That is, the product between the adjustment step size and the index value corresponding to when the confidence level parameter is reached is determined and may be taken as the first delay corresponding to the data packet.

By way of example, continuing the above example, assume that the accumulated ordinate value reaches 0.95 at the 6th bar. As can be seen from FIG. 8, the index value of the 6th bar is 5, so the product of 5 and the adjustment step size is taken as the first delay.
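The histogram update and accumulation of S320 to S340 can be illustrated with the following minimal Python sketch. It is written under stated assumptions and is not the disclosed implementation: the index value is taken as the jitter feature divided by the adjustment step (floored and clamped to the last bucket), and the first and second preset weights are modelled as an exponential forget factor, in which every bucket is scaled by `forget` and the bucket at the observed index additionally receives `1 - forget`. The function names and the default `forget` value are hypothetical.

```python
from typing import List


def index_for(jitter_ms: float, step_ms: float, num_buckets: int) -> int:
    """Map a jitter feature to a histogram index (assumed: floor, clamped)."""
    return min(int(jitter_ms // step_ms), num_buckets - 1)


def update_histogram(hist: List[float], index: int, forget: float = 0.99) -> None:
    """Assumed weighting: scale every bucket by `forget`, then add `1 - forget`
    to the observed bucket, so the bucket values keep summing to 1."""
    for i in range(len(hist)):
        hist[i] *= forget
    hist[index] += 1.0 - forget


def first_delay_ms(hist: List[float], step_ms: float,
                   confidence: float = 0.95, start_index: int = 0) -> float:
    """Accumulate buckets from `start_index` until `confidence` is reached and
    return index * step as the first delay."""
    total = 0.0
    for i in range(start_index, len(hist)):
        total += hist[i]
        if total >= confidence:
            return i * step_ms
    # Confidence never reached (e.g. rounding): fall back to the last bucket.
    return (len(hist) - 1) * step_ms


# Hypothetical values in the spirit of FIG. 8: the running sum first reaches
# 0.95 at the 6th bar (index 5), so the first delay is 5 * step.
histogram = [0.30, 0.20, 0.20, 0.15, 0.05, 0.06, 0.04]
print(first_delay_ms(histogram, step_ms=20))  # 100
```

With a forget factor close to 1, older observations fade slowly, so the percentile read from the histogram adapts gradually rather than jumping on a single late packet.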

It should be noted that, after the accumulated total jitter feature reaches the confidence level parameter, the total jitter features of further index values may continue to be accumulated to obtain a new confidence level parameter. If the difference between the new confidence level parameter and the preset confidence level parameter is greater than a preset value, the index value corresponding to the new confidence level parameter is used as the index value for determining the first delay; if the difference is smaller than the preset value, the index value corresponding to the preset confidence level parameter continues to be used. The advantage of this arrangement is that the confidence level parameter can be configured reasonably, so that the target delay can be adjusted adaptively so as to maximize the target delay.

S350, determining a second delay based on the jitter characteristic and the historical jitter characteristic.

S360, determining a target delay corresponding to the data packet based on the first delay and the second delay, so as to process the multimedia data stream stored in the jitter buffer based on the target delay.

According to the technical solution of this embodiment, the jitter feature corresponding to the data packet is determined according to the packet information of the received data packet. When the jitter feature extraction condition is met, the index value corresponding to the data packet is determined based on the jitter feature and the preset adjustment step size; the total jitter feature at that index value is adjusted based on the first preset weight, and the total jitter features at the other index values are adjusted based on the second preset weight. The total jitter features are then accumulated in sequence from the preset index value until the confidence level parameter is reached, and the delay corresponding to that point is taken as the first delay. A second delay is determined based on the jitter feature and the historical jitter feature, and the target delay corresponding to the data packet is determined based on the first delay and the second delay, so that the multimedia data stream stored in the jitter buffer is processed based on the target delay. This optimizes delay estimation based on the probability histogram and improves the accuracy of delay prediction.

Fig. 9 is a flowchart of an audio/video processing method according to an embodiment of the disclosure. On the basis of the above embodiment, when the jitter feature is determined to be a peak jitter feature, the second delay is determined according to the peak interval corresponding to the jitter feature, the peak jitter feature, and the preset reference data corresponding to the preset jitter feature. For a specific implementation, refer to the description of this embodiment. Technical features that are the same as or similar to those of the foregoing embodiments are not described again here.

As shown in fig. 9, the method of this embodiment may specifically include:

S410, determining the jitter feature corresponding to the data packet according to the packet information of the received data packet.

S420, when it is detected that the jitter feature extraction condition is met, determining a first delay according to at least one of the jitter feature, the historical jitter feature corresponding to the historical data packet, and a preset confidence level parameter.

S430, when the jitter feature is determined to be a peak based on the jitter feature and the preset jitter feature, determining the jitter feature to be a peak jitter feature, and determining the peak interval between the peak jitter feature and the previous historical peak jitter feature.

Typically, after the jitter feature corresponding to a received data packet is determined, it may be checked to determine whether it is a peak jitter feature. If it is, the jitter feature may be processed according to the peak optimization mode to obtain the second delay corresponding to the data packet; if it is not, the jitter feature may be processed by other delay optimization methods to obtain the target delay corresponding to the data packet.

The preset jitter feature may be set in advance and serves as the reference for determining whether another jitter feature is a peak jitter feature. A peak jitter feature can be understood as the jitter feature corresponding to the maximum value among a plurality of jitter features. The previous historical peak jitter feature is the peak jitter feature that was determined before the current time and is closest to the current peak jitter feature. The peak interval is the time interval between two adjacent peak jitter features.

In practical applications, after the jitter feature corresponding to the received data packet is determined, it may be checked against the preset jitter feature to determine whether it is a peak jitter feature. Specifically, the predicted jitter feature determined from the probability histogram is first obtained; when the jitter feature corresponding to the data packet is detected to be greater than the sum of the preset (predicted) jitter feature and a preset value, the jitter feature corresponding to the data packet can be determined to be a peak.

The peak jitter feature determination process described above may be expressed, for example, based on the following formula:

iat_ms>target_level+78

Wherein iat_ms represents the jitter feature corresponding to the received data packet; target_level represents the predicted jitter feature determined based on the probability histogram; and 78 is the preset value.

Alternatively, when the preset jitter feature is detected to be greater than the product of the predicted jitter feature and a preset multiple, the jitter feature corresponding to the received data packet can be determined to be a peak:

iat_packets>2*target_level

wherein iat_packets represents the preset jitter feature; 2 is the preset multiple; and * denotes multiplication.

If the jitter feature is greater than the preset jitter feature, the jitter feature may be determined to be a peak, that is, a peak jitter feature. Next, the previous historical peak jitter feature closest to this peak jitter feature is determined, together with the first occurrence time of the previous peak jitter feature and the second occurrence time of the current peak jitter feature. The difference between the second occurrence time and the first occurrence time is taken as the peak interval.
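The two peak conditions and the peak interval computation can be sketched as follows. This is a minimal sketch, assuming that the two formulas are combined with a logical OR (the text presents them as alternatives) and that occurrence times are millisecond timestamps; `is_peak` and `peak_interval_ms` are hypothetical names. As in the formulas above, the same `target_level` appears in both conditions.

```python
from typing import Optional

PEAK_HEIGHT = 78   # preset value in the first condition (78 in the text)
PEAK_MULTIPLE = 2  # preset multiple in the second condition (2 in the text)


def is_peak(iat_ms: float, iat_packets: float, target_level: float) -> bool:
    """Either condition from the text marks the jitter feature as a peak."""
    return (iat_ms > target_level + PEAK_HEIGHT
            or iat_packets > PEAK_MULTIPLE * target_level)


def peak_interval_ms(prev_peak_ts_ms: Optional[float],
                     peak_ts_ms: float) -> Optional[float]:
    """Second occurrence time minus first occurrence time; None if there is
    no previous peak yet."""
    if prev_peak_ts_ms is None:
        return None
    return peak_ts_ms - prev_peak_ts_ms


print(is_peak(iat_ms=200, iat_packets=1, target_level=100))  # True (200 > 178)
print(peak_interval_ms(1_000, 7_000))  # 6000
```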

S440, determining a second delay based on the peak interval, the peak jitter characteristic and preset reference data corresponding to the preset jitter characteristic.

In this embodiment, the preset jitter feature may include at least three level ranges, and each level range has corresponding preset reference data. A level range can be understood as a preset interval of peak values; for example, a level range may be [400 ms, 1280 ms], [200 ms, 500 ms], or [50 ms, 300 ms]. The preset reference data can be understood as the predetermined reference data required to determine the jitter delay of a peak jitter feature falling within the corresponding level range, and includes a preset minimum interval duration and a preset maximum interval duration. The preset minimum interval duration is the minimum value of the peak interval between two adjacent peak jitter features, and the preset maximum interval duration is the maximum value of that peak interval. For example, the preset jitter feature includes three level ranges, namely level range 1, level range 2, and level range 3. Level range 1 is [400 ms, 1280 ms], with a preset minimum interval duration of 5 seconds and a preset maximum interval duration of 10 seconds; level range 2 is [200 ms, 500 ms], with a preset minimum interval duration of 8 seconds and a preset maximum interval duration of 16 seconds; level range 3 is [50 ms, 300 ms], with a preset minimum interval duration of 20 seconds and a preset maximum interval duration of 40 seconds.
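The example level ranges and their preset reference data can be held in a small lookup table. The following is a minimal sketch; `LevelRange` and `find_level_range` are hypothetical names. Because the example ranges overlap (a 450 ms peak falls in both level range 1 and level range 2), the sketch assumes the first matching range in the listed order applies; the text does not state a tie-breaking rule.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class LevelRange:
    low_ms: float          # lower bound of the peak value range
    high_ms: float         # upper bound of the peak value range
    min_interval_s: float  # preset minimum interval duration
    max_interval_s: float  # preset maximum interval duration


# Example values from the text (level ranges 1, 2 and 3).
LEVEL_RANGES: Tuple[LevelRange, ...] = (
    LevelRange(400, 1280, 5, 10),
    LevelRange(200, 500, 8, 16),
    LevelRange(50, 300, 20, 40),
)


def find_level_range(peak_ms: float) -> Optional[LevelRange]:
    """Return the reference data for a peak (assumed: first match wins)."""
    for level in LEVEL_RANGES:
        if level.low_ms <= peak_ms <= level.high_ms:
            return level
    return None


ref = find_level_range(450)
print(ref.min_interval_s, ref.max_interval_s)  # 5 10
```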

In practical application, after the peak jitter feature is determined, the level range corresponding to the peak jitter feature may be determined according to the preset jitter feature, and then the preset reference data corresponding to that level range may be determined. The current peak jitter feature is then checked against the peak interval and this preset reference data to determine whether it is an effective jitter feature, and the second delay corresponding to the data packet is determined by the delay determination mode that applies when the current peak jitter feature is an effective or an invalid jitter feature, respectively.

Optionally, determining the second delay based on the peak interval, the peak jitter feature, and preset reference data corresponding to the preset jitter feature includes: determining whether the current peak jitter feature is a valid jitter feature based on the peak interval and preset reference data corresponding to the preset jitter feature; if the current peak jitter feature is an effective jitter feature, a second delay is determined based on the effective jitter feature.

An effective jitter feature is a peak jitter feature that is deemed valid according to the preset reference data.

In practical applications, after the peak jitter feature is determined, it may be compared with the level ranges included in the preset jitter feature to determine the level range in which it falls and the preset reference data corresponding to that level range. The determined peak interval is then compared with this preset reference data to determine whether it is greater than the preset minimum interval duration and less than the preset maximum interval duration. If so, the current peak jitter feature is determined to be an effective jitter feature, and the effective jitter feature may be taken as the second delay. The advantage of this arrangement is that it can be detected whether a peak jitter feature is an effective peak, so that the delay corresponding to the data packet under peak optimization can be determined accurately.
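The effectiveness check can be sketched as a single comparison. In this minimal sketch the bounds are strict, following "greater than the preset minimum interval duration and less than the preset maximum interval duration"; the function name `second_delay_if_effective` is hypothetical, and the reference data are passed in directly so the check does not depend on how the level range was looked up.

```python
from typing import Optional


def second_delay_if_effective(peak_ms: float, interval_s: float,
                              min_interval_s: float,
                              max_interval_s: float) -> Optional[float]:
    """Return the peak as the second delay when the peak interval lies strictly
    between the preset minimum and maximum interval durations, else None."""
    if min_interval_s < interval_s < max_interval_s:
        return peak_ms  # effective jitter feature, taken as the second delay
    return None  # not an effective jitter feature


# Level range 1 example: 5 s < 7 s < 10 s, so a 450 ms peak is effective.
print(second_delay_if_effective(450, 7, 5, 10))  # 450
```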

In practical application, if the detected peak interval is greater than the preset maximum interval duration in the corresponding preset reference data, the current peak jitter feature can be determined to be an invalid jitter feature, and the peak interval corresponding to the next peak jitter feature is then determined based on the timestamp of the current peak jitter feature.

Specifically, if the current peak jitter feature is determined to be an invalid jitter feature, it is not stored in the pre-constructed peak list, but it is still used as the basis for determining the peak interval of the next peak jitter feature. That is, when the next peak jitter feature is detected, the current peak jitter feature serves as the previous peak jitter feature and the next peak jitter feature becomes the current peak jitter feature; the timestamps of the two are determined, and the difference between them is taken as the peak interval corresponding to the current peak jitter feature. The advantage of this arrangement is that invalid peak jitter features can be detected and kept out of the peak list, while their role in the jitter estimation process is still clearly defined.
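The handling of effective and invalid peaks can be sketched as a small tracker: only effective peaks enter the peak list, while every peak, effective or not, becomes the reference timestamp for the next interval. This is a minimal sketch; `PeakTracker` is a hypothetical class, the bound of 8 entries reuses the optional peak list size given in the description of the peak list, and treating the very first peak (which has no predecessor) as not effective is an assumption.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class PeakTracker:
    """Hypothetical helper: keeps effective peaks in a bounded peak list and
    remembers the timestamp of the last peak, effective or not."""

    def __init__(self, max_entries: int = 8) -> None:
        # Each entry: (peak interval in ms, peak jitter feature in ms).
        self.peak_list: Deque[Tuple[float, float]] = deque(maxlen=max_entries)
        self.last_peak_ts_ms: Optional[float] = None

    def on_peak(self, ts_ms: float, peak_ms: float,
                min_interval_ms: float, max_interval_ms: float) -> bool:
        """Record a detected peak; return True if it is an effective jitter feature."""
        effective = False
        if self.last_peak_ts_ms is not None:
            interval = ts_ms - self.last_peak_ts_ms
            if min_interval_ms < interval < max_interval_ms:
                self.peak_list.append((interval, peak_ms))
                effective = True
        # Effective or not, this peak is the reference for the next interval.
        self.last_peak_ts_ms = ts_ms
        return effective


tracker = PeakTracker()
print(tracker.on_peak(0, 500, 5_000, 10_000))       # False (no previous peak)
print(tracker.on_peak(7_000, 500, 5_000, 10_000))   # True (7 s interval)
print(tracker.on_peak(30_000, 500, 5_000, 10_000))  # False (23 s > 10 s), not stored
print(len(tracker.peak_list))                       # 1
```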

In this embodiment, besides determining the peak interval from the previous peak jitter feature, the peak interval may also be determined in other ways. Another way of determining the peak interval is described below.

Optionally, determining the peak interval further includes: determining historical peak intervals corresponding to each historical peak jitter characteristic in a peak list; and determining a weight coefficient based on the historical peak interval, and determining a peak interval corresponding to the jitter feature based on the weight coefficient and the maximum value in the historical peak interval.

The peak list is a list storing information associated with peaks whose peak jitter features are effective jitter features. The peak list may store information for a preset number of peaks; the preset number may be any value and, optionally, may be 8. The stored information may be of any kind and, optionally, may be the peak interval.

In practical application, the historical peak intervals corresponding to the stored historical peak jitter features can be determined from the peak list. These historical peak intervals are summed to obtain a total historical peak interval, and the ratio of this total to the number of historical peak jitter features in the peak list gives the average peak interval. The maximum and minimum of the historical peak intervals are then determined, and a reference peak interval is determined from them. The ratio of the reference peak interval to the average peak interval is taken as the weight coefficient, and the product of the weight coefficient and the maximum historical peak interval is taken as the peak interval corresponding to the jitter feature. The jitter feature may then be processed according to this peak interval to determine the second delay corresponding to the data packet. The advantage of this arrangement is that it adds another way of determining the peak interval, which can improve the accuracy of the target delay.

The above-described peak interval determination process may be expressed, for example, based on the following formula:

peak_delay = max_peak_delay * factor

factor = base_peak_period / avg_peak_period

where base_peak_period generally lies within [min_peak_period, max_peak_period].

Wherein peak_delay may represent a peak interval corresponding to the jitter feature; max_peak_delay represents the maximum value in the history peak interval; factor represents a weight coefficient; base_peak_period represents the reference peak interval; avg_peak_period represents the average peak interval; min_peak_period represents the minimum peak interval; max_peak_period represents the maximum peak interval.
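The weighted peak interval can be computed as follows. This is a minimal sketch that reuses the variable names of the formula; the text does not say how the reference peak interval is derived from the minimum and maximum historical intervals, so taking their midpoint (which always lies within [min_peak_period, max_peak_period]) is purely an assumption, as is the function name `weighted_peak_interval`.

```python
from typing import Sequence


def weighted_peak_interval(history: Sequence[float]) -> float:
    """peak_delay = max_peak_delay * factor, with
    factor = base_peak_period / avg_peak_period."""
    if not history:
        raise ValueError("peak list is empty")
    avg_peak_period = sum(history) / len(history)
    min_peak_period, max_peak_period = min(history), max(history)
    # Assumed rule: midpoint of the minimum and maximum historical intervals.
    base_peak_period = (min_peak_period + max_peak_period) / 2
    factor = base_peak_period / avg_peak_period
    max_peak_delay = max_peak_period  # maximum value in the historical intervals
    return max_peak_delay * factor


print(weighted_peak_interval([2_000, 2_000, 8_000]))  # 10000.0
```

Under this assumed rule the factor exceeds 1 when most historical intervals are short and one is long, stretching the estimate beyond the largest interval seen.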

It should be noted that other ways of determining the peak interval are also possible. Optionally, the historical peak jitter features in the peak list are sorted from high to low, and the peak interval corresponding to the historical peak jitter feature at a preset position is taken as the peak interval corresponding to the jitter feature. The preset position may be any position and, optionally, may be the second position.
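The rank-based alternative can be sketched as follows, assuming each peak list entry stores a (peak interval, peak jitter feature) pair and that the sort key is the peak jitter feature; `interval_at_rank` is a hypothetical name and the default position of 2 follows the optional value in the text.

```python
from typing import Optional, Sequence, Tuple


def interval_at_rank(peak_list: Sequence[Tuple[float, float]],
                     position: int = 2) -> Optional[float]:
    """Sort (interval_ms, peak_ms) entries by peak value, high to low, and return
    the interval at the 1-based `position`; None if there are too few entries."""
    ranked = sorted(peak_list, key=lambda entry: entry[1], reverse=True)
    if len(ranked) < position:
        return None
    return ranked[position - 1][0]


print(interval_at_rank([(6_000, 400), (9_000, 800), (7_000, 600)]))  # 7000
```

Taking the second-highest peak rather than the highest makes the estimate less sensitive to a single outlier peak.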

In practical application, to obtain an accurate peak interval, note that peak jitter features recorded at historical times far from the current time are of limited reference value for determining the peak interval at the current time. Such peak jitter features may therefore be processed first, and the peak interval corresponding to the current time determined from the processed values. Optionally, for every preset time length by which a historical time differs from the current time, the corresponding peak jitter feature is adjusted by a preset attenuation coefficient. For example, the coefficient decays by 1% for every second between that historical time and the current time.
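The age-based attenuation can be sketched as follows. This is a minimal sketch under two assumptions the text leaves open: the decay is applied once per whole preset time length elapsed (1 second by default), and the 1% decay compounds rather than being subtracted linearly. The function name `decayed_peak` is hypothetical.

```python
def decayed_peak(peak_ms: float, age_s: float,
                 period_s: float = 1.0, decay: float = 0.01) -> float:
    """Attenuate a historical peak jitter feature by `decay` for every whole
    `period_s` between its timestamp and the current time (assumed compounding)."""
    steps = int(age_s // period_s)  # whole preset time lengths elapsed
    return peak_ms * (1.0 - decay) ** steps


print(round(decayed_peak(500, 10.4), 2))  # 452.19
```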

In practical application, if the current peak jitter feature is an effective jitter feature and the size of the peak list (its number of entries) is greater than a preset value, the maximum peak jitter feature in the peak list is determined.

If the current peak jitter feature is an invalid jitter feature and the size of the peak list is greater than the preset value, the peak interval corresponding to the current peak jitter feature is determined; if this peak interval is smaller than the product of the maximum peak interval in the peak list and a preset multiple, the maximum peak jitter feature in the peak list is determined.

Illustratively, when iat_ms is an effective peak and the size of peak_history is greater than 2, max_iat_ms in peak_history is determined; when iat_ms is an invalid peak and the size of peak_history is greater than 2, the peak interval corresponding to iat_ms is calculated, and if this peak interval is smaller than 2 * max_peak_period in peak_history, max_iat_ms in peak_history is determined.

Wherein peak_history represents the peak list; size represents the number of entries in the peak list; max_iat_ms represents the maximum peak jitter feature; and max_peak_period represents the maximum peak interval.
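The two cases above can be sketched as one decision function that returns max_iat_ms when either condition holds and None otherwise. This is a minimal sketch, assuming peak_history entries are (peak interval, iat_ms) pairs; the name `max_peak_if_active` is hypothetical, and the defaults of 2 for the preset size and the preset multiple are the example values from the text. How max_iat_ms is then used is not modelled here.

```python
from typing import Optional, Sequence, Tuple


def max_peak_if_active(effective: bool,
                       peak_history: Sequence[Tuple[float, float]],
                       interval_ms: float,
                       min_size: int = 2,
                       multiple: float = 2) -> Optional[float]:
    """Return max_iat_ms from peak_history when the conditions hold, else None."""
    if len(peak_history) <= min_size:  # size must exceed the preset value
        return None
    max_iat_ms = max(iat for _, iat in peak_history)
    if effective:
        return max_iat_ms
    # Invalid peak: still use the list maximum if the interval is short enough.
    max_peak_period = max(period for period, _ in peak_history)
    if interval_ms < multiple * max_peak_period:
        return max_iat_ms
    return None


history = [(6_000, 400), (7_000, 600), (8_000, 500)]
print(max_peak_if_active(False, history, 15_000))  # 600 (15 s < 2 * 8 s)
```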

In practical applications, peak clusters may occur, that is, a preset number of peak jitter features are detected within a preset duration, or several peak jitter features are detected consecutively. The preset duration may be any value and, optionally, may be 3 seconds; the preset number may be any value and, optionally, may be 2. For the peak jitter features in the same peak cluster, the timestamp of the cluster may be the creation time of the first peak jitter feature, and the peak jitter feature of the cluster may be the maximum peak jitter feature in the cluster.
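Peak clustering can be sketched as follows, assuming peaks arrive as time-ordered (timestamp, peak jitter feature) pairs and that the preset duration is measured from the first peak of the cluster; `merge_peak_clusters` is a hypothetical name and the defaults (3 seconds, 2 peaks) are the optional values from the text. Each cluster collapses to the first peak's timestamp and the cluster's maximum peak value.

```python
from typing import List, Tuple


def merge_peak_clusters(peaks: List[Tuple[float, float]],
                        window_ms: float = 3_000,
                        min_count: int = 2) -> List[Tuple[float, float]]:
    """Collapse time-ordered (ts_ms, peak_ms) peaks into clusters."""
    merged: List[Tuple[float, float]] = []
    i, n = 0, len(peaks)
    while i < n:
        start_ts = peaks[i][0]
        j = i
        # Extend the cluster while peaks stay within the window of its first peak.
        while j + 1 < n and peaks[j + 1][0] - start_ts <= window_ms:
            j += 1
        group = peaks[i:j + 1]
        if len(group) >= min_count:
            merged.append((start_ts, max(p for _, p in group)))
        else:
            merged.extend(group)  # too few peaks: keep them as they are
        i = j + 1
    return merged


print(merge_peak_clusters([(0, 300), (1_000, 500), (2_500, 400), (10_000, 200)]))
# [(0, 500), (10000, 200)]
```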

S450, determining a target delay corresponding to the data packet based on the first delay and the second delay, so as to process the multimedia data stream stored in the jitter buffer based on the target delay.

According to the technical solution provided by this embodiment of the disclosure, the jitter feature corresponding to the data packet is determined according to the packet information of the received data packet. When it is detected that the jitter feature extraction condition is met, the first delay is determined according to the jitter feature, the historical jitter feature corresponding to the historical data packet, and the preset confidence level parameter. When the jitter feature is determined to be a peak based on the jitter feature and the preset jitter feature, it is determined to be a peak jitter feature, and the peak interval between it and the previous historical peak jitter feature is determined. The second delay is determined based on the peak interval, the peak jitter feature, and the preset reference data corresponding to the preset jitter feature, and the target delay corresponding to the data packet is determined based on the first delay and the second delay, so that the multimedia data stream stored in the jitter buffer is processed based on the target delay. This optimizes delay estimation through peak detection and improves the accuracy of jitter delay prediction.

Fig. 10 is a schematic structural diagram of an audio/video processing apparatus according to an embodiment of the present disclosure. As shown in fig. 10, the apparatus includes: a jitter characteristic determination module 510, a first delay determination module 520, a second delay determination module 530, and a target delay determination module 540.

The jitter feature determining module 510 is configured to determine the jitter feature corresponding to a received data packet according to the packet information of the data packet, wherein the data packet includes a multimedia data stream; the first delay determining module 520 is configured to determine, when it is detected that the jitter feature extraction condition is met, a first delay according to at least one of the jitter feature, the historical jitter feature corresponding to the historical data packet, and a preset confidence level parameter; the second delay determining module 530 is configured to determine a second delay based on the jitter feature and the historical jitter feature, wherein the historical jitter feature corresponds to a historical data packet received before the current moment; and the target delay determining module 540 is configured to determine a target delay corresponding to the data packet based on the first delay and the second delay, so as to process the multimedia data stream stored in the jitter buffer based on the target delay.

On the basis of the above technical solutions, the jitter feature determining module 510 includes: a target policy determination submodule and a jitter feature determination submodule.

A target policy determining submodule, configured to determine a target policy according to a packet sequence number of each data packet in the packet information and a packet sequence number of each data packet in a time window; wherein the target policy comprises a base policy or an optimization policy, the optimization policy corresponding to a case of an out-of-order packet; and the jitter characteristic determining submodule is used for determining jitter characteristics corresponding to the data packets based on the target strategy and the packet information.

On the basis of the above technical solutions, the jitter feature determination submodule includes: a first jitter characteristic determining unit and a second jitter characteristic determining unit.

The first jitter feature determining unit is configured to determine the jitter feature according to the acquisition times and receiving times of a plurality of data packets in a time window if the target policy is the basic policy; the second jitter feature determining unit is configured to determine, if the target policy is the optimization policy, the target historical data packet in the time window, and to determine the jitter feature based on the target historical data packet and a preset frame number; wherein the target historical data packet is the data packet with the largest packet sequence number in the time window.

On the basis of the above technical solutions, the jitter feature determination submodule includes: a relative delay determining unit, a first value determining unit, and a third jitter feature determining unit.

The relative delay determining unit is configured to determine a relative delay according to the acquisition time and the receiving time of the target historical data packet; the first value determining unit is configured to determine a first value according to the packet sequence number of the target historical data packet and the packet sequence number of the currently received data packet; and the third jitter feature determining unit is configured to determine the jitter feature of the data packet based on the first value, the preset frame number, and the jitter feature corresponding to the target historical data packet.

On the basis of the above technical solutions, the jitter feature extraction condition is a sampling interval for sampling the jitter feature, and the sampling interval is dynamically updated based on the packet information. Updating the sampling interval based on the packet information includes: if out-of-order packets are determined to exist based on the packet sequence numbers in the packet information, adjusting the sampling interval to a first preset multiple of a preset interval duration; and if the number of data packets within a preset time period is lower than a preset number, adjusting the sampling interval to a second preset multiple of the preset interval duration.
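The sampling interval update can be sketched as follows. This is a minimal sketch under several assumptions not fixed by the text: reordering is detected as any sequence number smaller than its predecessor (sequence-number wraparound is ignored), the out-of-order rule takes precedence when both conditions hold, the preset interval duration is kept when neither holds, and all names and numeric values are hypothetical.

```python
from typing import Sequence


def has_reordering(seq_numbers: Sequence[int]) -> bool:
    """Assumed rule: out of order if any sequence number drops below its
    predecessor (wraparound ignored)."""
    return any(b < a for a, b in zip(seq_numbers, seq_numbers[1:]))


def update_sampling_interval(base_interval_ms: float,
                             out_of_order: bool,
                             packets_in_period: int,
                             min_packets: int,
                             first_multiple: float,
                             second_multiple: float) -> float:
    """Return the sampling interval used as the jitter feature extraction condition."""
    if out_of_order:
        return first_multiple * base_interval_ms
    if packets_in_period < min_packets:
        return second_multiple * base_interval_ms
    return base_interval_ms  # assumed: keep the preset interval duration


print(update_sampling_interval(100, has_reordering([1, 2, 3]), 5, 10, 2, 4))  # 400
```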

On the basis of the above technical solutions, the first delay determining module 520 includes: a total jitter feature adjustment unit, a total jitter feature accumulation unit, and a first delay determining unit.

The total jitter feature adjustment unit is used for determining an index value corresponding to the data packet based on the jitter feature and a preset adjustment step length, adjusting the total jitter feature corresponding to the index value based on a first preset weight, and adjusting the total jitter feature corresponding to other index values based on a second preset weight; the total jitter feature accumulation unit is used for sequentially accumulating the corresponding total jitter features from a preset index value until the confidence level parameter is reached; and the first delay determining unit is used for taking the delay corresponding to the time when the confidence level parameter is reached as the first delay.

Based on the above technical solutions, the second delay determining module 530 includes: a peak interval determination sub-module and a second delay determination sub-module.

The peak interval determining submodule is used for determining that the jitter characteristic is a peak jitter characteristic when the jitter characteristic is determined to be a peak based on the jitter characteristic and a preset jitter characteristic, and determining a peak interval between the peak jitter characteristic and a previous historical peak jitter characteristic; and the second delay determining submodule is used for determining the second delay based on the peak value interval, the peak value jitter characteristic and preset reference data corresponding to the preset jitter characteristic.

On the basis of the above technical solutions, the second delay determining submodule includes: an effective jitter feature determination unit and a second delay determination unit.

An effective jitter feature determining unit, configured to determine whether a current peak jitter feature is an effective jitter feature based on the peak interval and preset reference data corresponding to the preset jitter feature; and the second delay determining unit is used for determining the second delay based on the effective jitter characteristic if the current peak jitter characteristic is the effective jitter characteristic.

On the basis of the above technical solutions, the apparatus further includes a peak interval determining module.

The peak interval determining module is configured to, if the current peak jitter feature is an invalid jitter feature, use the current peak jitter feature to determine the peak interval when the next peak jitter feature is determined.

On the basis of the technical schemes, the preset jitter feature comprises at least three level ranges, and each level range comprises corresponding preset reference data, wherein the preset reference data comprises preset minimum interval duration and maximum interval duration.

On the basis of the technical schemes, the device further comprises: a historical peak interval determination module and a peak interval determination module.

The historical peak interval determining module is used for determining historical peak intervals corresponding to each historical peak jitter characteristic in the peak list; and the peak interval determining module is used for determining a weight coefficient based on the historical peak interval and determining the peak interval corresponding to the jitter characteristic based on the weight coefficient and the maximum value in the historical peak interval.

Based on the above technical solutions, the target delay determining module 540 is specifically configured to take the maximum delay of the first delay and the second delay as the target delay corresponding to the data packet.

The audio and video processing apparatus provided by the embodiments of the disclosure can execute the audio and video processing method provided by any embodiment of the disclosure, and has the functional modules and beneficial effects corresponding to the executed method.

It should be noted that each unit and module included in the above apparatus are only divided according to the functional logic, but not limited to the above division, so long as the corresponding functions can be implemented; in addition, the specific names of the functional units are also only for convenience of distinguishing from each other, and are not used to limit the protection scope of the embodiments of the present disclosure.

Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the disclosure. Referring now to fig. 11, there is shown a schematic structural diagram of an electronic device 500 (e.g., the terminal device or server in fig. 11) suitable for implementing embodiments of the present disclosure. The terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., in-vehicle navigation terminals), and the like, and stationary terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 11 is merely an example, and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.

As shown in fig. 11, the electronic device 500 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 501, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other via a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.

In general, the following means may be connected to the I/O interface 505: input means 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output means 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, etc.; storage means 508 including, for example, a magnetic tape, a hard disk, etc.; and communication means 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 11 shows the electronic device 500 having various means, it should be understood that not all of the illustrated means are required to be implemented or provided; more or fewer means may alternatively be implemented or provided.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a non-transitory computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or from the storage means 508, or from the ROM 502. The above-described functions defined in the methods of the embodiments of the present disclosure are performed when the computer program is executed by the processing device 501.

The electronic device provided by this embodiment of the present disclosure belongs to the same inventive concept as the audio/video processing method provided by the foregoing embodiments. For technical details not described in this embodiment, refer to the foregoing embodiments; this embodiment has the same beneficial effects as the foregoing embodiments.

The embodiment of the present disclosure provides a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the audio/video processing method provided by the above embodiment.

It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

In some implementations, the clients and servers may communicate using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks.

The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.

The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to:

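determine jitter features corresponding to the data packets according to packet information of the received data packets;

when it is detected that the jitter feature extraction condition is met, determine a first delay according to at least one of the jitter feature, a historical jitter feature corresponding to a historical data packet, and a preset confidence level parameter;

determine a second delay based on the jitter feature and the historical jitter feature;

and determine a target delay corresponding to the data packet based on the first delay and the second delay, so as to process the multimedia data stream stored in the jitter buffer based on the target delay.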

Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including, but not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The foregoing description is merely of the preferred embodiments of the present disclosure and an explanation of the technical principles employed. Persons skilled in the art will appreciate that the scope of the disclosure is not limited to technical solutions formed by the specific combination of the features described above, but also covers other technical solutions formed by any combination of those features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the features described above with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Moreover, although operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.

Claims

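1. An audio/video processing method, comprising:

determining jitter characteristics corresponding to the data packets according to packet information of the received data packets;

when it is detected that a jitter feature extraction condition is met, determining a first delay according to at least one of the jitter feature, a historical jitter feature corresponding to a historical data packet, and a preset confidence level parameter;

determining a second delay based on the jitter feature and the historical jitter feature;

and determining a target delay corresponding to the data packet based on the first delay and the second delay, so as to process a multimedia data stream stored in a jitter buffer based on the target delay.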

2. The method of claim 1, wherein determining jitter characteristics corresponding to the data packets based on packet information of the received data packets comprises:

determining a target policy according to the packet sequence number in the packet information and the packet sequence number of each data packet in a time window; wherein the target policy comprises a basic policy or an optimization policy, the optimization policy corresponding to the case of out-of-order packets;

and determining jitter characteristics corresponding to the data packets based on the target policy and the packet information.

3. The method of claim 2, wherein the determining jitter characteristics corresponding to the data packets based on the target policy and the packet information comprises:

if the target policy is the basic policy, determining the jitter characteristic according to the acquisition time and the receiving time of a plurality of data packets in the time window;

if the target policy is the optimization policy, determining a target historical data packet in the time window, and determining the jitter characteristic based on the target historical data packet and a preset frame number;

the target historical data packet is the data packet with the largest packet sequence number in the time window.
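To make the basic policy of claim 3 concrete, the following Python sketch derives a jitter value from the acquisition (capture) and receive times of the packets in a sliding time window. The claim does not give a formula; this sketch assumes that the jitter is the current packet's relative delay (receive time minus capture time) minus the smallest relative delay in the window. The names `Packet` and `BasicJitterEstimator` and the 2000 ms window are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int           # packet sequence number
    capture_ms: float  # acquisition (capture) time, sender clock
    recv_ms: float     # receive time, receiver clock


class BasicJitterEstimator:
    """Hypothetical basic-policy jitter estimator over a sliding time window."""

    def __init__(self, window_ms: float = 2000.0) -> None:
        self.window_ms = window_ms  # assumed window length
        self.window = deque()       # packets received within the last window_ms

    def update(self, pkt: Packet) -> float:
        self.window.append(pkt)
        # Evict packets whose receive time lies outside the window; the newest
        # packet itself is never evicted, so the window never becomes empty.
        while pkt.recv_ms - self.window[0].recv_ms > self.window_ms:
            self.window.popleft()
        # Relative delay = receive time - capture time. The unknown clock offset
        # between sender and receiver cancels when the minimum is subtracted.
        delays = [p.recv_ms - p.capture_ms for p in self.window]
        return delays[-1] - min(delays)
```

Because only differences of relative delays are used, the sender and receiver clocks need not be synchronized; the sketch only assumes their offset stays roughly constant over the window.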

4. The method of claim 3, wherein the determining the jitter characteristic based on the target historical data packet and a preset number of frames comprises:

determining relative delay according to the acquisition time and the receiving time of the target historical data packet;

determining a first numerical value according to the packet sequence number of the target historical data packet and the packet sequence number of the currently received data packet;

and determining jitter characteristics of the data packet based on the relative delay, the first numerical value, the preset frame number and the jitter characteristics corresponding to the target historical data packet.
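Claim 4 names the inputs of the optimization policy but not how they are combined. The Python sketch below shows one hypothetical combination, assuming that the relative delay compares the current (out-of-order) packet with the target historical packet, that the "first numerical value" is the sequence-number gap between them, and that reordering deeper than the preset frame number (for example, a retransmission) leaves the jitter unchanged. The function name `optimized_jitter_ms` and the default of 3 frames are illustrative assumptions, not taken from the disclosure.

```python
def optimized_jitter_ms(
    cur_seq: int, cur_capture_ms: float, cur_recv_ms: float,
    tgt_seq: int, tgt_capture_ms: float, tgt_recv_ms: float,
    tgt_jitter_ms: float, preset_frames: int = 3,
) -> float:
    """Hypothetical optimization-policy jitter for an out-of-order packet."""
    # Relative delay: how much later the current packet arrived than the target
    # packet (largest sequence number in the window), net of capture-time difference.
    relative_delay = (cur_recv_ms - cur_capture_ms) - (tgt_recv_ms - tgt_capture_ms)
    # "First numerical value": how many sequence numbers the current packet lags.
    first_value = tgt_seq - cur_seq
    if first_value > preset_frames:
        # Reordered too deeply (assumed to be a retransmission): keep the
        # jitter feature of the target historical packet.
        return tgt_jitter_ms
    # Shallow reordering: add any extra lateness on top of the target's jitter.
    return tgt_jitter_ms + max(0.0, relative_delay)
```

For example, with a target packet (sequence 10, captured at 200 ms, received at 300 ms, jitter 5 ms) and a late packet (sequence 8, captured at 160 ms, received at 320 ms), the sketch returns 65.0.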

5. The method of claim 1, wherein the jitter feature extraction condition is a sampling interval at which the jitter feature is sampled, the sampling interval being dynamically updated based on the packet information, and updating the sampling interval based on the packet information comprises:

if it is determined from the packet sequence number in the packet information that packets are out of order, adjusting the sampling interval to a first preset multiple of a preset interval duration;

and if the number of data packets received within a preset time period is lower than a preset number, adjusting the sampling interval to a second preset multiple of the preset interval duration.
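The interval update of claim 5 can be sketched as a small Python function. All numeric defaults (a 20 ms base interval, 10 packets per period, multiples of 2 and 4) are hypothetical, and the claim does not say which rule wins when both conditions hold; this sketch assumes the out-of-order rule is checked first. The function name `update_sampling_interval` is illustrative.

```python
def update_sampling_interval(
    seq: int,
    max_seq_seen: int,
    packets_in_period: int,
    base_interval_ms: float = 20.0,  # preset interval duration (assumed value)
    min_packets: int = 10,           # preset number of packets (assumed value)
    reorder_multiple: float = 2.0,   # first preset multiple (assumed value)
    low_rate_multiple: float = 4.0,  # second preset multiple (assumed value)
) -> float:
    """Return the jitter-feature sampling interval for the current packet."""
    # A sequence number below the largest one seen means the packet arrived out
    # of order (16-bit sequence-number wraparound is ignored in this sketch).
    if seq < max_seq_seen:
        return reorder_multiple * base_interval_ms
    # Too few packets in the preset period, e.g. a low-bitrate or stalled stream.
    if packets_in_period < min_packets:
        return low_rate_multiple * base_interval_ms
    return base_interval_ms
```

A longer interval under reordering or sparse traffic means fewer, more stable jitter samples are fed to the delay estimators.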

6. The method of claim 1, wherein determining the first delay based on at least one of the jitter characteristics, historical jitter characteristics corresponding to historical data packets, and a preset confidence level parameter comprises:

determining an index value corresponding to the data packet based on the jitter characteristic and a preset adjustment step length, adjusting a total jitter characteristic corresponding to the index value based on a first preset weight, and adjusting total jitter characteristics corresponding to other index values based on a second preset weight; wherein the total jitter feature is determined based on a historical jitter feature and the jitter feature;

sequentially accumulating the corresponding total jitter features, starting from a preset index value, until the confidence level parameter is reached;

and taking the delay corresponding to the time when the confidence level parameter is reached as the first delay.
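Claim 6 resembles a forgetting histogram with a quantile lookup. The Python sketch below assumes the "second preset weight" is a decay factor applied to every bucket and the "first preset weight" adds `1 - forget` to the sample's own bucket. Under this assumption the sample's bucket becomes `b * forget + (1 - forget)` and every other bucket becomes `b * forget`, so the buckets keep summing to 1. The first delay is read off where the cumulative mass reaches the confidence level. The class name, bucket count, 10 ms step, 0.98 decay, and 0.95 confidence are hypothetical.

```python
class JitterHistogram:
    """Hypothetical forgetting histogram of jitter samples (sketch of claim 6)."""

    def __init__(self, num_buckets: int = 100, step_ms: float = 10.0,
                 forget: float = 0.98, confidence: float = 0.95) -> None:
        self.step_ms = step_ms        # preset adjustment step length
        self.forget = forget          # assumed decay applied to every bucket
        self.confidence = confidence  # preset confidence level parameter
        self.buckets = [0.0] * num_buckets
        self.buckets[0] = 1.0         # start with all mass at zero jitter

    def add(self, jitter_ms: float) -> None:
        # Index value of the sample, clamped to the histogram range.
        idx = min(max(int(jitter_ms // self.step_ms), 0), len(self.buckets) - 1)
        # Decay every bucket (the weight applied to the other index values)...
        self.buckets = [b * self.forget for b in self.buckets]
        # ...and give the sample's bucket the remaining 1 - forget of mass.
        self.buckets[idx] += 1.0 - self.forget

    def first_delay_ms(self) -> float:
        # Accumulate from index 0 until the confidence level is reached and
        # return the upper edge of that bucket as the first delay.
        total = 0.0
        for i, mass in enumerate(self.buckets):
            total += mass
            if total >= self.confidence:
                return (i + 1) * self.step_ms
        return len(self.buckets) * self.step_ms  # guard against rounding error
```

Returning the upper edge of the bucket errs on the side of a slightly larger delay; returning `i * self.step_ms` would be an equally valid design choice.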

7. The method of claim 1, wherein the determining a second delay based on the jitter characteristics and historical jitter characteristics comprises:

when the jitter feature is determined to be a peak value based on the jitter feature and a preset jitter feature, determining the jitter feature to be a peak value jitter feature, and determining a peak value interval between the peak value jitter feature and a previous historical peak value jitter feature;

the second delay is determined based on the peak interval, the peak jitter feature, and preset reference data corresponding to the preset jitter feature.

8. The method of claim 7, wherein the determining the second delay based on the peak interval, the peak jitter feature, and preset reference data corresponding to the preset jitter feature comprises:

determining whether a current peak jitter feature is a valid jitter feature based on the peak interval and preset reference data corresponding to the preset jitter feature;

and if the current peak jitter feature is a valid jitter feature, determining the second delay based on the valid jitter feature.

9. The method as recited in claim 8, further comprising:

and if the current peak jitter feature is an invalid jitter feature, determining the peak interval corresponding to the next peak jitter feature based on the timestamp corresponding to the current peak jitter feature.

10. The method of claim 7, wherein the preset jitter feature comprises at least three level ranges, each level range having corresponding preset reference data, the preset reference data including a preset minimum interval duration and a preset maximum interval duration.
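Claims 7 to 10 together describe a peak detector. The Python sketch below is one hypothetical reading: a sample at or above a threshold (standing in for the preset jitter feature) is a peak; its interval from the previous peak is checked against the minimum and maximum interval of the level range the peak falls into; a valid peak sets the second delay; and, valid or not, the next interval is measured from the current peak's timestamp. The names `Level` and `PeakDetector`, the threshold, and the choice of "latest valid peak" as the second delay are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Level:
    min_jitter_ms: float    # lower edge of this level's jitter range
    min_interval_ms: float  # preset minimum interval duration
    max_interval_ms: float  # preset maximum interval duration


class PeakDetector:
    """Hypothetical peak-based second-delay estimator (sketch of claims 7-10)."""

    def __init__(self, levels: List[Level], peak_threshold_ms: float) -> None:
        self.levels = sorted(levels, key=lambda lvl: lvl.min_jitter_ms)
        self.peak_threshold_ms = peak_threshold_ms  # assumed peak threshold
        self.last_peak_ts: Optional[float] = None
        self.second_delay_ms = 0.0

    def _level_for(self, jitter_ms: float) -> Level:
        # Highest level range whose lower edge the jitter reaches.
        chosen = self.levels[0]
        for lvl in self.levels:
            if jitter_ms >= lvl.min_jitter_ms:
                chosen = lvl
        return chosen

    def update(self, jitter_ms: float, ts_ms: float) -> float:
        if jitter_ms < self.peak_threshold_ms:
            return self.second_delay_ms  # not a peak; keep the current value
        if self.last_peak_ts is not None:
            interval = ts_ms - self.last_peak_ts  # peak interval
            lvl = self._level_for(jitter_ms)
            if lvl.min_interval_ms <= interval <= lvl.max_interval_ms:
                self.second_delay_ms = jitter_ms  # valid peak jitter feature
        # Valid or not, the next peak interval is measured from this peak.
        self.last_peak_ts = ts_ms
        return self.second_delay_ms
```

With three levels and a threshold of 80 ms, a peak arriving only 50 ms after the previous one is rejected as invalid, whereas a peak 1000 ms later is accepted and becomes the second delay.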

11. The method of claim 7, wherein determining the peak interval further comprises:

determining historical peak intervals corresponding to each historical peak jitter characteristic in a peak list;

and determining a weight coefficient based on the historical peak interval, and determining a peak interval corresponding to the jitter feature based on the weight coefficient and the maximum value in the historical peak interval.

12. The method of claim 1, wherein the determining a target delay corresponding to the data packet based on the first delay and the second delay comprises:

taking the larger of the first delay and the second delay as the target delay corresponding to the data packet.
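Claim 12 fixes the combination rule as a maximum. The Python sketch below shows it together with a hypothetical playout check of how a jitter buffer might use the target delay; the release rule and the function name `should_release_frame` are illustrative assumptions, not taken from the disclosure.

```python
def target_delay_ms(first_delay_ms: float, second_delay_ms: float) -> float:
    # Claim 12: the target delay is the larger of the two candidate delays,
    # so neither the histogram estimate nor the peak estimate is undercut.
    return max(first_delay_ms, second_delay_ms)


def should_release_frame(buffered_ms: float, target_ms: float) -> bool:
    # Hypothetical playout rule: pass the oldest frame to the decoder once the
    # jitter buffer holds at least the target delay's worth of media.
    return buffered_ms >= target_ms
```

For example, `target_delay_ms(40.0, 150.0)` returns 150.0, so a burst detected by the peak estimator raises the buffering target even when the histogram alone would suggest 40 ms.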

13. An audio/video processing apparatus, comprising:

14. An electronic device, the electronic device comprising:

One or more processors;

storage means for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the audio/video processing method of any one of claims 1-12.

15. A storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the audio/video processing method of any one of claims 1-12.