WO2024103961A1

WO2024103961A1 - Low-delay playing method and apparatus, and electronic device and storage medium

Info

Publication number: WO2024103961A1
Application number: PCT/CN2023/120692
Authority: WO
Inventors: 侯锐填 (Hou Ruitian)
Original assignee: Guangdong Oppo Mobile Telecommunications Corp., Ltd. (Oppo广东移动通信有限公司)
Priority date: 2022-11-18
Filing date: 2023-09-22
Publication date: 2024-05-23
Also published as: CN118057841A

Abstract

The present application relates to a low-delay playing method and apparatus, an electronic device, and a storage medium. The method is applied to an audio playing device having a playing storage area, the playing storage area comprising a first storage area and a second storage area. The method comprises: when a playing trigger moment Tt arrives, reading the playing storage area starting from a first preset position of the first storage area, and playing it; triggering decoding at a first moment T1 to obtain first data, and storing the first data in the playing storage area according to a first storage rule; and triggering decoding at a second moment T2 to obtain second data, and storing the second data in the playing storage area according to a second storage rule, wherein T1 is the same as or later than the time Tr1 at which the audio playing device receives the first data, T2 is the same as or later than the time Tr2 at which the audio playing device receives the second data, T2 is later than Tt, and the first data starts to be played before the second storage area is read for the first time after Tt.

Description

Low-delay playback method, device, electronic device and storage medium

This application claims priority to Chinese patent application No. 202211449488.7, filed with the China Patent Office on November 18, 2022 and entitled "Low-latency playback method, device, electronic device and storage medium", the entire contents of which are incorporated herein by reference.

Technical Field

The present application relates to the field of audio, and in particular to a low-latency playback method, device, electronic device and storage medium.

Background

With the widespread use of wireless audio playback devices such as wireless earphones, and especially with the development of Bluetooth true wireless stereo (TWS) technology, the playback latency of wireless audio has drawn more user attention than that of traditional wired headphones.

Summary of the invention

Embodiments of the present application provide a low-latency playback method, device, electronic device, and storage medium.

In a first aspect, a low-latency playback method is provided, which is applied to an audio playback device, wherein the audio playback device includes a playback storage area, and the playback storage area includes a first storage area and a second storage area. The method includes: when a playback trigger time Tt arrives, the playback storage area is read from a first preset position in the first storage area and played; at a first time T1, decoding is triggered to obtain first data, and the first data is stored in the playback storage area according to a first storage rule; at a second time T2, decoding is triggered to obtain second data, and the second data is stored in the playback storage area according to a second storage rule; wherein T1 is equal to or later than the time Tr1 when the audio playback device receives the first data, T2 is equal to or later than the time Tr2 when the audio playback device receives the second data, and T2 is equal to or later than Tt, and the first data starts to play before the second storage area is read for the first time since Tt.

In a second aspect, a low-latency playback method is provided, applied to audio playback devices that include a first audio playback device and a second audio playback device, each having a playback storage area as described in the first aspect. The first audio playback device and the second audio playback device each execute the method of the first aspect to achieve synchronous playback, wherein the first data of the first audio playback device and the first data of the second audio playback device have the same timestamp Ts1, the second data of the first audio playback device and the second data of the second audio playback device have the same timestamp Ts2, and Tt is determined based on Ts1.

In a third aspect, a low-latency playback apparatus is provided, comprising: a playback storage area, the playback storage area comprising a first storage area and a second storage area;

a playback module, configured to read and play the playback storage area starting from the first preset position of the first storage area when the playback trigger time Tt arrives; and

an audio module, configured to trigger decoding at a first time T1 to obtain first data and store the first data in the playback storage area according to a first storage rule, and to trigger decoding at a second time T2 to obtain second data and store the second data in the playback storage area according to a second storage rule; wherein T1 is equal to or later than the time Tr1 at which the first data is received, T2 is equal to or later than the time Tr2 at which the second data is received, T2 is equal to or later than Tt, and the first data starts to play before the second storage area is read for the first time starting from Tt.

In a fourth aspect, an audio playback device is provided, comprising: a playback storage area, the playback storage area comprising a first storage area and a second storage area;

a decoding unit, configured to trigger decoding at a first time T1 to obtain first data and store the first data in the playback storage area according to a first storage rule, and to trigger decoding at a second time T2 to obtain second data and store the second data in the playback storage area according to a second storage rule; and

a playback unit, configured to read and play the playback storage area starting from a first preset position in the first storage area when a playback trigger time Tt arrives;

wherein T1 is equal to or later than the time Tr1 at which the audio playback device receives the first data, T2 is equal to or later than the time Tr2 at which the audio playback device receives the second data, T2 is equal to or later than Tt, and the first data starts to play before the second storage area is read for the first time since Tt.

In a fifth aspect, an electronic device is provided, comprising at least one processor, at least one wireless transceiver and at least one memory, wherein the memory stores a computer program which, when executed by the electronic device, performs the method of any one of the preceding aspects.

In a sixth aspect, a storage medium is provided, which stores a computer program that, when executed by an audio playback device, implements the method of any one of the above aspects.

BRIEF DESCRIPTION OF THE DRAWINGS

To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1A is a schematic diagram of wireless communication between an audio playback device and an audio source device in some embodiments;

FIG. 1B is a schematic diagram of wireless communication between a mobile phone and a TWS headset in some embodiments;

FIG. 2 is a flowchart of switching to a low-latency mode in some embodiments;

FIG. 3 is a flowchart of switching the game low-latency mode in some embodiments;

FIG. 4 is a schematic diagram of full-link delay based on LE Audio Bluetooth technology;

FIG. 5 is a schematic diagram of data transmission between a mobile phone and a TWS headset in some embodiments;

FIG. 6 is a diagram of a data processing architecture of a Bluetooth audio playback device in some embodiments;

FIG. 7 is a schematic diagram of ping-pong buffer decoding, storage, and playback;

FIG. 8 is a basic flowchart of low-latency playback in some embodiments;

FIG. 9 is a flowchart of low-latency playback in Embodiment A;

FIG. 10 is a flowchart of low-latency playback of headphones in Embodiment A;

FIG. 11 is a flowchart of low-latency playback in Embodiment B;

FIG. 12 is a flowchart of low-latency playback of headphones in Embodiment B;

FIG. 13 is a flowchart of low-latency playback of headphones in Embodiment C;

FIG. 14 is a flowchart of low-latency playback in Embodiment D;

FIG. 15 is a flowchart of low-latency playback of headphones in Embodiment D.

Detailed Description

The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of this application.

It should be noted that the terms "including" and "having" and any variations thereof in the embodiments of the present application and the accompanying drawings are intended to cover non-exclusive inclusions. For example, a process, method, system, product or device including a series of steps or units is not limited to the listed steps or units, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to these processes, methods, products or devices.

It should be understood that the terms "first", "second", etc. may be used herein to describe various elements, but these elements are not limited by these terms. Unless otherwise specified, these terms are used only to distinguish one element from another. The term "plurality" in this application means two or more. The term "and/or" in this application refers to any one of the listed solutions or any combination of them.

With the widespread use of wireless audio playback devices such as wireless earphones, and especially with the development of Bluetooth true wireless stereo (TWS) technology, the playback latency of wireless audio has drawn more user attention than that of traditional wired headphones. In scenarios such as playing games with TWS Bluetooth earphones, high sound latency has long been a pain point of the mobile gaming experience.

As shown in FIG. 1A, the audio playback device 200 communicates with the sound source device 100 over a wireless link: the sound source device 100 transmits audio data to the audio playback device 200 over the link for playback, and the wireless link may be based on Bluetooth technology. Taking LE Audio Bluetooth technology as an example, LE Audio is an audio transmission technology promoted by the Bluetooth SIG that provides wireless audio services with lower power consumption, lower cost, higher quality, and lower latency. Unlike classic Bluetooth, it supports audio data encoded with LC3 (Low Complexity Communication Codec), so that audio transmission can better balance power consumption, real-time performance, and sound quality. FIG. 1B shows a scenario in which TWS earphones and a mobile phone are networked using LE Audio. A Bluetooth transmission link based on LE Audio, such as a Connected Isochronous Stream (CIS), can be established between the mobile phone 100 and each of the left earphone 201 and the right earphone 202. Some embodiments of the present application ensure that the left and right earphones play synchronously while further reducing audio delay. It should be noted that although the present application is described using LE Audio Bluetooth technology as an example, it can also be applied to other wireless communication technologies where no conflict arises.

It should be noted that the underlying packet transmission of LE Audio differs from that of classic Bluetooth (BT). LE Audio transmits packets mainly over isochronous (ISO) channels: the connection-based mode is called Connected Isochronous Stream (CIS), and the broadcast-based mode is called Broadcast Isochronous Stream (BIS). Taking CIS as an example, compared with traditional Bluetooth transmission, CIS uses an isochronous transmission mechanism in which the number of times the sender may retransmit a packet is limited, which is qualitatively different from the unlimited retransmission mechanism of BT. This is one of the reasons why CIS achieves lower latency. However, in certain scenarios, such as gaming, even lower latency is required to meet user needs.

In some embodiments, as shown in FIG. 2, a method for switching to a low-latency mode based on LE Audio technology includes:

S200: After determining to enter a preset mode, the audio source device notifies the audio playback device so that the audio playback device enters the low-latency mode;

S201: In the preset mode, the audio source device updates the CIS parameters, which include the number of retransmissions of a data packet;

S202: When determining to exit the preset mode, the audio source device notifies the audio playback device to exit the low-latency mode and updates the CIS parameters.
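The S200 to S202 flow can be sketched as a small state machine. This is a minimal Python sketch, not the implementation described in this application: the class and method names (`CisParams`, `PlaybackDevice`, `SourceDevice`, `enter_preset_mode`, `exit_preset_mode`) are hypothetical, and the retransmission count is the only CIS parameter modeled because it is the only one the text names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CisParams:
    # Only the retransmission count is named in the text; other CIS fields are omitted.
    retransmission_number: int


@dataclass
class PlaybackDevice:
    low_latency_mode: bool = False

    def on_mode_notification(self, enter: bool) -> None:
        # S200/S202: the audio playback device follows the source's notification.
        self.low_latency_mode = enter


class SourceDevice:
    def __init__(self, sink: PlaybackDevice, normal: CisParams, low_latency: CisParams):
        self.sink = sink
        self.normal = normal
        self.low_latency = low_latency
        self.cis_params = normal
        self.in_preset_mode = False

    def enter_preset_mode(self) -> None:
        self.in_preset_mode = True
        self.sink.on_mode_notification(enter=True)   # S200: notify the playback device
        self.cis_params = self.low_latency            # S201: fewer retransmissions

    def exit_preset_mode(self) -> None:
        self.in_preset_mode = False
        self.sink.on_mode_notification(enter=False)  # S202: notify the playback device
        self.cis_params = self.normal                 # S202: restore CIS parameters
```

For example, constructing `SourceDevice(PlaybackDevice(), CisParams(4), CisParams(1))` and calling `enter_preset_mode()` switches the link from four retransmissions to one; these values are illustrative, not taken from any specification.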

The preset mode may be a game mode. FIG. 3 takes as an example the process of switching the game low-latency mode when a mobile phone serves as the sound source device, a headset serves as the audio playback device, and LE Audio Bluetooth technology is used. While the mobile phone maintains an LE Audio Bluetooth connection with the headset, if the mobile phone enters a game, it notifies the headset that it has entered game mode. In response, the headset enters the game low-latency mode and switches to the low-latency parameters. After entering the game, the mobile phone switches to the CIS parameters of the game low-latency mode, mainly reducing the number of packet retransmissions to achieve lower transmission latency. When there is a CIS data stream, the mobile phone establishes a CIS with the headset and communicates with it using the low-latency CIS parameters. When the mobile phone exits the game, it notifies the headset to exit the game low-latency mode and updates the CIS parameters.

It should be noted that this embodiment takes low latency in a gaming scenario as an example, but it is not limited to this; other scenarios with low-latency requirements may also refer to this embodiment. The low-latency mode may also be entered in other ways, including manually by the user, or low latency may be treated as a regular requirement that needs no special entry. This can be adjusted according to actual product requirements and does not limit other embodiments of this application.

In related technologies, the delay of the entire link from the audio source device to the audio playback device, the Total System Delay, may include the Audio Processing Time of the audio source, the air transmission delay, and the decoding and playback delay (Presentation Delay) of the audio playback end (a sink device such as headphones). In other words, it consists of the delay at the audio source device, the delay over the air, and the delay at the audio playback device. The presentation delay may refer to the delay from the synchronization reference time until the headphones finally produce sound. It is counted from the sync point of the left- and right-ear packets and may include decoding of the audio data as well as the time consumed by subsequent sound-effect processing and other stages in the path. In LE Audio and related standards, the default presentation delay configured in the specification is 40 ms.
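Written as a formula, the full-link delay described above is the sum of its three segments. The symbol names are chosen here for illustration; only the 40 ms default presentation delay comes from the text.

```latex
T_{\text{total}} = T_{\text{audio processing}} + T_{\text{transport}} + T_{\text{presentation}},
\qquad T_{\text{presentation}} = 40\ \text{ms (specification default)}
```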

For the full-link delay of TWS audio playback devices based on LE Audio Bluetooth technology, refer to FIG. 4. The inventors found that the audio-source Audio Processing Time differs little between LE Audio Bluetooth and classic Bluetooth (BT); the main differences between the two technologies lie in the air transmission delay (Transport Latency) and the headset decoding and playback delay (Presentation Delay). Taking a mobile phone and TWS headphones as an example, Transport Latency can include the time from when the Bluetooth chip of the mobile phone obtains a data packet until the Bluetooth chip of the headset receives the packet and the synchronization reference time (sync) is reached. Transport Latency is affected by the configured number of retransmissions and by parameters such as packet size and frame length, whose values differ between service scenarios.

As shown in FIG. 5, a mobile phone serves as the audio source device and is connected to a pair of TWS headphones (the audio playback devices), with a CIS established between the mobile phone and each earphone. The mobile phone sends a data packet pk1 on the first CIS channel CIS_1 and simultaneously sends a data packet pk2 on the second CIS channel CIS_2. The contents of pk1 and pk2 are essentially the same, differing mainly in that they carry the left-ear and right-ear channels respectively. The earphones may correctly receive data on different CIS channels at different times. For pk1 on CIS_1, after the mobile phone sends it the first time, the earphone does not reply with an ACK (indicating that it correctly received pk1) but with a NACK (indicating that it did not correctly receive pk1, so the mobile phone needs to retransmit); after the mobile phone retransmits pk1, the earphone replies with an ACK. For pk2 on CIS_2, the mobile phone sends it once and the earphone replies with an ACK. As a result, the left and right earphones receive their corresponding packets at different times. If each earphone executed its playback logic based on the time at which it received the packet, the left and right earphones would inevitably be out of sync. To solve the asynchrony between different CIS links, the relevant standards define a Service Data Unit synchronization reference time (SDU Sync). This time can be determined according to the maximum transport latency set by the mobile phone, and it represents the latest point in time at which the earphone can validly receive the packet data.

If the earphone receives the packet data after this point, the data is discarded as timed out. Although the left and right earphones receive pk1 and pk2 at different times, pk1 and pk2 carry the same timestamp, because the playback timestamp of packet data is determined by the synchronization reference time rather than by the receive time. This avoids asynchronous playback of the left and right earphones caused by differing receive times.
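The SDU Sync rule can be illustrated with a minimal sketch, assuming times are integer microseconds and that the SDU sync reference time has already been derived from the maximum transport latency. `ReceivedPacket` and `playback_timestamp` are hypothetical names, and the sketch models only the timeout and timestamping decisions, not the CIS link itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReceivedPacket:
    channel: str          # e.g. "CIS_1" (left ear) or "CIS_2" (right ear)
    receive_time_us: int  # when this earphone actually received the packet


def playback_timestamp(pkt: ReceivedPacket, sdu_sync_us: int) -> Optional[int]:
    """Timestamp used for playback, or None if the packet must be discarded."""
    if pkt.receive_time_us > sdu_sync_us:
        # Arrived after the SDU sync reference time: discarded as timed out.
        return None
    # Both ears use the sync reference, not their individual receive times.
    return sdu_sync_us
```

With the sync reference at 5 000 µs, a left-ear packet received at 3 750 µs after a retransmission and a right-ear packet received at 1 250 µs both get the timestamp 5 000 µs, while a packet arriving at 6 000 µs is discarded.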

FIG. 6 shows a data processing architecture of a Bluetooth audio playback device, to which Bluetooth technologies such as LE Audio can be applied. After the Bluetooth module of the device receives a data stream (such as a CIS data stream), the audio data carried in its data packets (which can be understood as the valid data) is stored in the Bluetooth buffer. The audio module of the device obtains data from the Bluetooth buffer (generally one frame at a time), decodes it, and stores the result in the decoder buffer; the data then undergoes digital-to-analog conversion by the digital-to-analog converter (DAC), and the analog signal is played through the speaker. For example, after the Bluetooth chip of the audio playback device receives data sent by the Bluetooth audio source device over a transmission channel such as CIS, it adds a timestamp field to each frame of data according to the synchronization reference time. The timestamp field identifies the synchronization start point of that frame, and each frame also has a corresponding playback order. Taking LE Audio as an example, the timestamps of two adjacent frames differ by one frame length. For the LC3 codec it uses, the frame length (frame duration) is mainly 10 ms or 7.5 ms, which is how long each frame plays, so the timestamp difference between adjacent frames is generally 10 ms or 7.5 ms.
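The timestamp relationship between adjacent frames can be sketched as follows. This assumes, for illustration only, that the first frame's timestamp equals the synchronization reference time; times are kept in integer microseconds so that 7.5 ms is represented exactly. `frame_timestamps` is a hypothetical helper.

```python
from typing import List

LC3_FRAME_DURATIONS_US = (10_000, 7_500)  # 10 ms and 7.5 ms LC3 frames


def frame_timestamps(sync_ref_us: int, n_frames: int, frame_duration_us: int) -> List[int]:
    """Timestamps for consecutive frames; adjacent frames differ by one frame length."""
    if frame_duration_us not in LC3_FRAME_DURATIONS_US:
        raise ValueError("expected a 10 ms or 7.5 ms LC3 frame duration")
    # Assumption: frame 0 is stamped with the synchronization reference time itself.
    return [sync_ref_us + i * frame_duration_us for i in range(n_frames)]
```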

From the above architecture, the data processing of the audio playback device can be divided into three parts. The first part is the data receiving end, which receives data packets from the sound source device, unpacks them, and stores the valid data (such as the audio data to be decoded) in the communication storage area (such as the Bluetooth buffer in FIG. 6); this part is mainly executed by a communication module such as a Bluetooth chip. The second part is the decoding and storage end, which obtains data from the communication storage area, decodes it, and stores the decoded data in the playback storage area; this part is mainly executed by the audio module. The third part is the playback end, which reads data from the playback storage area, processes it (for example, by digital-to-analog conversion), and plays it through a speaker. Low-latency, continuous, and stable audio playback depends on the cooperation of these three parts, and this application reduces latency by improving the settings of the relevant parts.
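The three-part split can be sketched as a toy pipeline in which each part owns its own storage area. This is a simplified, single-threaded Python illustration: `ThreeStagePipeline` and its methods are hypothetical, and `decode` is a placeholder callable standing in for a real decoder such as LC3.

```python
from collections import deque
from typing import Callable, Deque, Optional


class ThreeStagePipeline:
    """Receive -> decode -> play, each stage with its own storage area."""

    def __init__(self, decode: Callable[[bytes], bytes]):
        self.comm_buffer: Deque[bytes] = deque()  # communication storage area (e.g. Bluetooth buffer)
        self.play_buffer: Deque[bytes] = deque()  # playback storage area
        self.decode = decode                      # placeholder for the real decoder

    def on_packet(self, payload: bytes) -> None:
        # Part 1 (communication module): store the unpacked valid data.
        self.comm_buffer.append(payload)

    def decode_one_frame(self) -> bool:
        # Part 2 (audio module): decode one frame into the playback storage area.
        if not self.comm_buffer:
            return False
        self.play_buffer.append(self.decode(self.comm_buffer.popleft()))
        return True

    def render_one_frame(self) -> Optional[bytes]:
        # Part 3 (playback end): hand one decoded frame to the DAC; None means silence.
        return self.play_buffer.popleft() if self.play_buffer else None
```

The point of the separation is that a frame only becomes audible after it has passed through all three storage areas, so latency can be attacked at each hand-off.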

For example, the DAC is a hardware process that is usually asynchronous with decoding: once the DAC is turned on, it continuously converts data from digital to analog and outputs sound, and this process overlaps with decoding. Therefore, to ensure smooth playback, the software can use a ping-pong buffer (two buffers) or a ring buffer (multiple buffers); other storage types can of course also be used. Taking the ping-pong buffer as an example, its read and write logic is shown in FIG. 7. Each buffer stores one frame of LC3 data, that is, LC3 data with a playback length of 10 ms or 7.5 ms. While the DAC moves (reads) data from the ping buffer via DMA (Direct Memory Access), data can be written into the pong buffer, and while data is read from the pong buffer, data is written into the ping buffer. Because decoding takes time (usually 1 to 2 ms), decoding and writing a buffer lags by 1 to 2 ms; if one buffer were read and written at the same time, some data could easily be lost while the DAC moves it. Therefore, when DMA starts moving data from the ping buffer, decoded data is written into the pong buffer, and when DMA starts moving data from the pong buffer, decoded data is written into the ping buffer, so that each DMA transfer obtains complete decoded data from its buffer. According to this principle, however, when the ping buffer is first rendered, no sound is played because no data has yet been decoded into it. After the ping buffer has been played, an interrupt request triggers decoding, and the resulting data must be placed in the ping buffer because the pong buffer is now being rendered.

Without optimization, therefore, one full round of the ping buffer and the pong buffer must be rendered before there is any sound output that the user can hear; that is, playback may be delayed by two frames.
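The two-frame startup delay of the unoptimized ping-pong scheme can be reproduced with a short simulation. It is a simplified model that assumes decoding is triggered only by the end-of-buffer interrupt, that neither buffer is prefilled, and that the 1 to 2 ms decoding time is negligible; `baseline_first_sound_us` is a hypothetical name.

```python
def baseline_first_sound_us(frame_us: int) -> int:
    """Time (from DAC start) at which decoded audio is first rendered, without optimization."""
    buffers = [None, None]  # index 0 = ping, index 1 = pong; None = nothing decoded yet
    next_frame = 0
    t = 0
    for k in range(4):      # a few buffer periods are enough to reach the first sound
        current = k % 2     # the DAC alternates ping, pong, ping, ...
        if buffers[current] is not None:
            return t        # this buffer holds decoded data: sound starts now
        t += frame_us       # the empty buffer is rendered as silence for one frame
        # End-of-buffer interrupt: decode the next frame. The other buffer is now
        # being read, so the result goes into the buffer that just finished.
        buffers[current] = next_frame
        next_frame += 1
    raise RuntimeError("no sound within the simulated window")
```

For 10 ms frames the first audible output arrives 20 ms after the DAC starts, and for 7.5 ms frames after 15 ms, matching the two-frame delay described above.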

The inventor's research shows that, for LE Audio, under the parameter configuration tables in the official Bluetooth specification, changing parameters such as the number of retransmissions, frame length, or bit rate only affects the transport latency and does not affect the presentation delay, which is always configured as a fixed value, such as the 40 ms specified in the standard. Therefore, to reduce delay on the headset side, the data processing logic of the audio playback device must be modified. To this end, this application proposes a low-latency playback method, which is described in detail below with reference to the embodiments.

In some embodiments, a low-latency playback method is applied to an audio playback device that includes a playback storage area, the playback storage area including a first storage area and a second storage area. The method includes: when a playback trigger time Tt arrives, reading and playing the playback storage area starting from a first preset position in the first storage area; triggering decoding at a first time T1 to obtain first data, and storing the first data in the playback storage area according to a first storage rule; and triggering decoding at a second time T2 to obtain second data, and storing the second data in the playback storage area according to a second storage rule. T1 is equal to or later than the time Tr1 at which the audio playback device receives the first data, which ensures that the first data can be decoded. T2 is equal to or later than the time Tr2 at which the audio playback device receives the second data, which ensures that the second data can be decoded; and T2 is equal to or later than Tt, so playback does not have to wait for the decoded second data and can start earlier. The first data starts playing before the second storage area is read for the first time after Tt, so the user hears the first data earlier. This reduces audio delay, improves the performance of the audio playback device, and improves the user experience.
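The timing conditions of the method can be collected into a simple checker. This sketch only restates the inequalities in the text; the field names are hypothetical, times are integer microseconds, and `first_data_play_start` and `second_area_first_read` must be supplied by whatever storage rule is in use.

```python
from dataclasses import dataclass


@dataclass
class PlaybackTiming:
    tr1: int                     # Tr1: first data received
    tr2: int                     # Tr2: second data received
    t1: int                      # T1: decoding of the first data triggered
    t2: int                      # T2: decoding of the second data triggered
    tt: int                      # Tt: playback trigger time
    first_data_play_start: int   # when the first data begins to sound
    second_area_first_read: int  # first read of the second storage area after Tt


def meets_core_conditions(s: PlaybackTiming) -> bool:
    return (
        s.t1 >= s.tr1                                           # first data can be decoded
        and s.t2 >= s.tr2                                       # second data can be decoded
        and s.t2 >= s.tt                                        # playback need not wait for second data
        and s.first_data_play_start < s.second_area_first_read  # first data is heard early
    )
```

For instance, with one-frame (10 ms) storage areas and Tt = 2 000 µs, storing the first data at the start of the first storage area makes it sound at Tt, before the second storage area is first read at Tt + 10 000 µs.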

In some embodiments, the audio playback device also includes a third storage area for storing data to be decoded, the data to be decoded including first data to be decoded and/or second data to be decoded; the method also includes: sequentially receiving a first data packet and a second data packet sent by the sound source device, the first data packet including the first data to be decoded, and the second data packet including the second data to be decoded; storing the first data to be decoded and/or the second data to be decoded in the third storage area.

Because the settings of Tt, T1, T2, etc. must be adjusted to the storage rules in use, they are explained below for each storage rule:

In some embodiments, storing the first data in the playback storage area according to the first storage rule includes: storing the first data in the first storage area; storing the second data in the playback storage area according to the second storage rule includes: storing the second data in the second storage area.

In this embodiment, the playback trigger time Tt is determined according to the timestamp Ts1 corresponding to the first data and a first preset time period, the first preset time period being long enough to decode the first data and store it in the first storage area. In some embodiments, the first preset time period is less than the total time of reading the first storage area and the second storage area, which guarantees that the playback delay is less than that total time. If the first storage area and the second storage area each hold one frame, the delay is guaranteed to be less than two frames.

It can be understood that T1 is earlier than or equal to Ta, and the time from Ta to Tt is sufficient to decode the first data and store it in the first storage area; thereby ensuring that decoding at T1 can store the first data in the first storage area before Tt arrives, so that there will be no delay in the playback phase after Tt. From the user's perspective, the sound can be heard when Tt arrives.

It can be understood that T2 is earlier than or equal to Tb, and the time from Tb to the completion of reading the first data is sufficient to decode the second data and store the second data in the second storage area, so that the second data can be played immediately after the first data is played, ensuring the continuity of data playback.
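For the first storage rule, the deadlines Ta and Tb follow directly from Tt, the frame length, and the decoding time. A minimal sketch, assuming one-frame storage areas, a combined decode-and-store time `decode_us` (the text cites 1 to 2 ms for decoding), and that decoding can start no later than Ts1 so that a first preset period of at least `decode_us` suffices; `rule1_schedule` is a hypothetical helper.

```python
def rule1_schedule(ts1_us: int, frame_us: int, decode_us: int, preset_us: int) -> dict:
    """Storage rule 1: first data -> first storage area, second data -> second storage area.

    Assumes each storage area holds exactly one frame and decode_us covers
    decoding plus writing one frame into its storage area.
    """
    if decode_us > frame_us:
        raise ValueError("decoding must fit within one frame")
    if not decode_us <= preset_us < 2 * frame_us:
        raise ValueError("first preset period must allow decoding yet stay under two frames")
    tt = ts1_us + preset_us         # playback trigger time
    ta = tt - decode_us             # latest T1: first data in place when Tt arrives
    tb = tt + frame_us - decode_us  # latest T2: second data in place when the first data ends
    return {"Tt": tt, "Ta": ta, "Tb": tb, "latency_us": preset_us}
```

Under these assumptions, T1 must fall in [Tr1, Ta] and T2 in [max(Tr2, Tt), Tb]. With 10 ms frames and a 2 ms decode time, the first data sounds 2 ms after Ts1, and the delay stays below two frames for any first preset period under 20 ms.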

In some embodiments, storing the first data in the playback storage area according to the first storage rule includes: dividing the first data into a first portion of data and a second portion of data according to the playback order, the first portion of data is stored in a first area of the first storage area, the second portion of data is stored in a second area of the second storage area, and the first area is adjacent to the second area; storing the second data in the playback storage area according to the second storage rule includes: dividing the second data into a third portion of data and a fourth portion of data according to the playback order, storing the third portion of data in an area of the second storage area except the second area; storing the fourth portion of data in an area of the first storage area except the first area; or

Storing the second data in the playback storage area according to the second storage rule includes: dividing the second data into a third portion of data and a fourth portion of data, storing the third portion of data in an area of the second storage area other than the second area, and storing the fourth portion of data in a third area of a fourth storage area, the third area being adjacent to the second storage area, and the fourth storage area being a storage area in the playback storage area adjacent to the second storage area.

In this embodiment, T1 is earlier than or equal to Tc, where the time from Tc until the position in the first storage area that stores the first data is read is sufficient to decode the first data and store it according to the first storage rule of this embodiment; this ensures there is sound output when that position is read. T2 is earlier than or equal to Td, where the time from Td until the first data has been read is sufficient to decode the second data and store it according to the second storage rule, ensuring that the second data plays continuously after the first data. Tt is determined according to the timestamp Ts1 corresponding to the first data and a second preset time period, the second preset time period being greater than or equal to 0. In some embodiments, the second preset time period may be less than the total time of reading the first storage area and half of the second storage area, which ensures that the playback delay is less than the total time of reading the first storage area and the second storage area. The second preset time period can also be adjusted as T2 changes.

In some embodiments, the first data may be divided equally into a first portion of data and a second portion of data: the first portion is stored in the second half of the first storage area and the second portion in the first half of the second storage area, so that the two portions are adjacent. The second data may likewise be divided equally into a third portion of data and a fourth portion of data: the third portion is stored in the second half of the second storage area, adjacent to the second portion, and the fourth portion is stored either in the first half of the first storage area or in the first half of the fourth storage area, so that the third portion is adjacent to the fourth portion in playback order.
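The equal-division layout can be checked with a small simulation. The sketch below is illustrative only and rests on assumptions that the application does not state: the playback storage area is modeled as a Python list of sample slots, a frame is a hypothetical 8 samples, and the helper names `store_at`, `layout`, and `read_from` are introduced purely for illustration. With `num_areas=2` it models the ping-pong case (the fourth portion wraps to the first half of the first storage area); with `num_areas=4` it models a ring buffer in which the area following the second storage area serves as the fourth storage area.

```python
FRAME = 8  # samples per frame (hypothetical, kept small for illustration)
HALF = FRAME // 2


def store_at(buf, start, data):
    """Write data into the playback storage area, wrapping at the end."""
    n = len(buf)
    for i, sample in enumerate(data):
        buf[(start + i) % n] = sample


def layout(num_areas):
    """Place two consecutive frames using the equal-division storage rules.

    Area k occupies indices [k*FRAME, (k+1)*FRAME). Area 0 is the first
    storage area and area 1 is the second storage area.
    """
    buf = [None] * (num_areas * FRAME)
    first = [f"a{i}" for i in range(FRAME)]
    second = [f"b{i}" for i in range(FRAME)]
    # First storage rule: first portion -> second half of area 0,
    # second portion -> first half of area 1 (the two halves are adjacent).
    store_at(buf, HALF, first)
    # Second storage rule: third portion -> second half of area 1,
    # fourth portion -> next HALF slots (first half of area 0 after the
    # ping-pong wrap, or first half of area 2, the fourth storage area).
    store_at(buf, HALF + FRAME, second)
    return buf, first, second


def read_from(buf, start, count):
    """Read count samples in playback order starting at index start."""
    return [buf[(start + i) % len(buf)] for i in range(count)]


for areas in (2, 4):
    buf, first, second = layout(areas)
    # Reading from the start of the first portion yields both frames back to back.
    assert read_from(buf, HALF, 2 * FRAME) == first + second
```

In both configurations, reading from the start of the first portion returns the two frames contiguously, which is the continuity property the two storage rules are meant to guarantee.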

In some embodiments, storing the first data in the playback storage area according to the first storage rule includes: storing the first data in the second storage area; storing the second data in the playback storage area according to the second storage rule includes: storing the second data in the first storage area; or storing the second data in a fourth storage area, the fourth storage area being a storage area in the playback storage area adjacent to the second storage area. The fourth storage area may, for example, be a storage area in a ring buffer.

In this embodiment, T1 is earlier than or equal to Te, where the time from Te until reading of the first storage area is completed (that is, until reading reaches the second storage area) is sufficient to decode the first data and store it in the second storage area; this ensures playback of the first data while reducing latency.

Similarly, T2 is earlier than or equal to Tf, where the time from Tf to the completion of reading the first data is sufficient to decode the second data and store it in the first storage area or the fourth storage area; this ensures that the first data and the second data play continuously.

It can be understood that Tt is determined based on the timestamp Ts1 corresponding to the first data and the third preset time period; the third preset time period is greater than or equal to 0 and less than the duration of reading the first storage area, thereby ensuring that the delay of the first data is controlled within the total duration of reading the first storage area and the second storage area.
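The bounds Te and Tf, and the resulting delay bound, can be expressed numerically. This is a sketch under assumptions: each storage area holds exactly one frame, reading proceeds at a constant rate, the 10 ms frame duration is an assumed example value, and `decode_us` is a hypothetical fixed cost for decoding and storing one frame. The function names are illustrative, not taken from the application.

```python
FRAME_US = 10_000  # assumed frame duration: 10 ms


def latest_decode_times(tt_us, decode_us, frame_us=FRAME_US):
    """Return (Te, Tf): latest T1 and T2 for the layout in which the first data
    is stored in the second storage area and the second data in the first (or
    fourth) storage area."""
    # Reading reaches the second storage area one frame after Tt.
    te = tt_us + frame_us - decode_us
    # Reading of the first data (the whole second storage area) ends two frames after Tt.
    tf = tt_us + 2 * frame_us - decode_us
    return te, tf


def audible_delay(ts1_us, third_preset_us, frame_us=FRAME_US):
    """Delay from Ts1 until the first data is heard at the start of the second storage area."""
    if not 0 <= third_preset_us < frame_us:
        raise ValueError("third preset time period must be in [0, frame duration)")
    tt = ts1_us + third_preset_us
    return tt + frame_us - ts1_us


te, tf = latest_decode_times(tt_us=5_000, decode_us=3_000)
delay = audible_delay(ts1_us=0, third_preset_us=5_000)
assert delay < 2 * FRAME_US  # delay stays within the two storage areas
```

For example, with Ts1 = 0, a third preset time period of 5 ms and a 3 ms decode-and-store cost, Tt = 5 ms, Te = 12 ms, Tf = 22 ms, and the first data becomes audible 15 ms after Ts1, inside the two-frame bound.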

In some embodiments, the playback trigger time Tt is equal to the sum of the timestamp Ts1 corresponding to the first data and the duration of the half-frame data.

In some embodiments, triggering decoding at a first time T1 to obtain first data includes: triggering decoding to obtain first data upon receiving a first data packet.

In some embodiments, triggering decoding at the second time T2 to obtain second data includes:

triggering decoding to obtain the second data when the middle position of the first storage area is read; or

triggering decoding to obtain the second data when the end position of the first storage area is read.

In some embodiments, Tt is equal to the timestamp Ts1 corresponding to the first data.

In some embodiments, triggering decoding at a first moment T1 to obtain first data includes: triggering decoding at the playback trigger moment Tt to obtain the first data; and triggering decoding at the second time T2 to obtain the second data includes:

generating a first interrupt request at the second time T2, and, in response to the first interrupt request, obtaining data and decoding it to obtain the second data;

the method further includes: generating a second interrupt request at a third time T3, decoding the acquired data in response to the second interrupt request to obtain third data, and storing the third data in the playback storage area according to a third storage rule;

The time interval between the first interrupt request and the second interrupt request is at least the duration of one frame of data.

In some embodiments, the first storage area is a ping buffer, and the second storage area is a pong buffer; or, the first storage area and the second storage area are two adjacent buffer areas in a ring buffer.

In some embodiments, the audio playback device and the audio source device communicate via LE Audio Bluetooth technology.

In some embodiments, the first data and the second data are both one frame of data; and the first storage area and the second storage area are both sized to store one frame of data.

In some embodiments, T2 is less than or equal to Td, and the time from Td to the completion of reading the first data is sufficient to decode and obtain the second data and store the second data according to the second storage rule.

The method is further described below with reference to the accompanying drawings. In some embodiments, as shown in FIG. 8, a low-latency playback method is provided, which is applied to an audio playback device. The audio playback device includes a playback storage area, and the playback storage area includes a first storage area and a second storage area. The method includes:

On the playback end, that is, the part that reads data from the playback storage area for playback, the method may include the following steps, as shown in FIG. 8:

S800: Determine the playback triggering time Tt;

The playback trigger time can be understood as the trigger time described in the related art above, but in some embodiments the setting of Tt can be adjusted. For example, Tt can be determined based on the timestamp Ts1 corresponding to the first data and a first preset time period, where the first preset time period is greater than or equal to 0; when it is 0, Tt equals Ts1. In some embodiments, the first preset time period is less than the total time to read the first storage area and the second storage area, which ensures that the delay is less than the total duration of the two storage areas. Tt can therefore also be adjusted as other parameters change: for example, Tt is determined based on Ts1 and the second preset time period, which is greater than or equal to 0; or Tt is determined based on Ts1 and the third preset time period, which is greater than or equal to 0 and less than the duration of reading the first storage area. When LE Audio Bluetooth technology is applied, Ts1 can be determined from the SDU synchronization reference time corresponding to the first data; for example, Ts1 equals the SDU synchronization reference time, in which case the first preset time period can be considered to be 0. In some embodiments, the first preset time period is the duration of half a frame of data (half a frame length), so that Tt = Ts1 + half a frame length. The inventors have found that a half-frame duration is generally sufficient for the audio module to decode a frame of data and store it in the relevant storage area, so this half-frame duration can be reserved for decoding and storing the first data.
Of course, in some embodiments decoding and storage may take less than half a frame, so the first preset time period may also be shorter than half a frame, as long as it covers the time required to decode a frame of data and store it in the relevant storage area, so that decoding and storage are completed before Tt arrives. In some embodiments, Tt = Ts1; in this case playback proceeds while decoding is still in progress. However, because no data has yet been stored in the playback storage area (which may correspond to the decoding buffer area of FIG. 6 and stores data for playback so that the DAC can fetch it for conversion and playback), the audio module is reading and playing but no sound is actually emitted, and the user hears nothing. To reduce the delay in this case, other parameters, such as the decoding time, need to be adjusted, as described in detail later.
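The choice of Tt and its effect on the first frame can be sketched as follows. The sketch assumes integer microsecond timestamps, one frame per storage area, and a fixed decode-and-store cost; `playback_trigger_time` and `first_frame_ready_at_tt` are hypothetical names, and half a frame is used as the default first preset time period only because the description cites it as a typical choice.

```python
def playback_trigger_time(ts1_us, frame_us, preset_us=None):
    """Tt = Ts1 + first preset time period (defaults to half a frame)."""
    if preset_us is None:
        preset_us = frame_us // 2
    # Preset must be >= 0 and shorter than reading both storage areas (two frames).
    if not 0 <= preset_us < 2 * frame_us:
        raise ValueError("first preset time period out of range")
    return ts1_us + preset_us


def first_frame_ready_at_tt(tr1_us, decode_store_us, tt_us):
    """True if a frame whose decoding starts at T1 = Tr1 is stored by Tt."""
    return tr1_us + decode_store_us <= tt_us


tt_half = playback_trigger_time(ts1_us=100_000, frame_us=10_000)  # 105_000
tt_zero = playback_trigger_time(ts1_us=100_000, frame_us=10_000, preset_us=0)
print(first_frame_ready_at_tt(99_000, 4_000, tt_half))  # True
print(first_frame_ready_at_tt(99_000, 4_000, tt_zero))  # False
```

With Ts1 = 100 ms, a 10 ms frame, Tr1 = 99 ms and a 4 ms decode-and-store cost, the half-frame default gives Tt = 105 ms and the frame is ready at 103 ms; with Tt = Ts1, part of the first storage area would be read before the frame is stored.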

S801: When the time Tt arrives, the playback storage area is read and played from the first preset position of the first storage area.

The first preset position can be the starting position of the first storage area; that is, when the playback trigger time Tt arrives, reading of the first storage area for playback begins. Note, however, that in the different embodiments above and their combinations, sound may be produced as soon as Tt arrives in some cases, while in others sound is produced only once reading reaches another position of the first storage area, such as its middle position, or even the second storage area. This is because, when Tt arrives, parts of the first storage area may contain no valid data, so even though the playback time render_time is running, that is, the buffer is being read and played, the user hears no sound.

It should be noted that once Tt is reached and playback is triggered, data is read from the playback storage area for playback according to the reading rule, for example from the ping buffer to the pong buffer and then back to the ping buffer for the next round of reading and playback. Even if an interrupt request is generated in the meantime to trigger decoding, reading and playback of the playback storage area do not stop. Data reception, decoding, and storage must therefore be timely to keep playback continuous; otherwise, stuttering or silence may occur because the playback storage area holds no data available for playback.
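Because reading never pauses once Tt has passed, the position being read is a pure function of elapsed time. A minimal sketch, assuming the reader consumes samples at a constant rate and using a hypothetical 48 kHz output with 10 ms frames (480 samples per storage area); `read_position` is an illustrative name, not part of any device API.

```python
def read_position(t_us, tt_us, frame_us=10_000, frame_samples=480, num_areas=2):
    """Return (area_index, sample_offset) being read at time t, or None before Tt.

    Area 0 is the first storage area (ping), area 1 the second (pong); reading
    wraps back to area 0 and never stops, whether or not valid data is present.
    """
    if t_us < tt_us:
        return None
    elapsed = (t_us - tt_us) * frame_samples // frame_us  # samples consumed since Tt
    area = (elapsed // frame_samples) % num_areas
    return area, elapsed % frame_samples


assert read_position(4_000, tt_us=5_000) is None
print(read_position(10_000, tt_us=5_000))  # (0, 240): middle of the ping buffer
print(read_position(30_000, tt_us=5_000))  # (0, 240) again after one full ping-pong cycle
```

A function of this kind can be used to check whether a decoded frame will be in place before the reader reaches the position where it is stored.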

As shown in FIG. 8, the decoding and storage end, that is, the part that triggers decoding and stores the decoded data in the playback storage area, may include the following steps:

S802: Trigger decoding at a first moment T1 to obtain first data, and store the first data in a playback storage area according to a first storage rule.

It should be noted that in some embodiments the first data may be the first frame of data to be played, such as the first frame of audio data when music starts playing. The first storage area and the second storage area may be adjacent storage areas, or the renderer may play at a uniform rate from the first storage area into the second storage area. For example, the first storage area may be the ping buffer of a ping-pong buffer and the second storage area the pong buffer; in some embodiments, the first storage area and the second storage area are two adjacent buffer areas in a ring buffer, for example, the first storage area is the first buffer area of the ring buffer. Other storage types may also be applicable and are not limited here.

In some embodiments, T1 may be equal to or later than the time Tr1 at which the audio playback device receives the first data, and Tr1 may refer to the time at which the communication module of the audio playback device receives the data packet carrying the first data, thereby ensuring that decoding can be triggered at time T1 to obtain the first data. Triggering decoding can be understood as notifying the audio module to obtain the data to be decoded for decoding operations, etc.

In some embodiments, T1 is earlier than the time at which the first storage area is read for the first time after Tt, so there is no need to wait for the first storage area to be read before triggering decoding to obtain the first data; the earlier the first data is obtained, the earlier its sound can be played, which reduces the delay. In addition, to ensure that the data has been received, in some embodiments T1 is greater than or equal to the reception time Tr1 of the first data. Tr1 can be understood as the time at which the first data is actually received; if the first data is sent from the sound source device to the audio playback device in a first data packet, Tr1 can be the time at which the audio playback device receives the first data packet. In some embodiments, if Tr1 exceeds the synchronization reference time corresponding to the first data (for example, the SDU synchronization reference time specified in the LE Audio standard), the data packet is discarded as timed out to keep playback synchronized. Further, in some embodiments T1 = Tr1, that is, decoding is triggered to obtain the first data as soon as the first data packet is received. In some embodiments, the audio playback device may further include a third storage area (also referred to as a communication storage area), which stores valid data to be decoded (for example, audio data for playback); the audio module can then obtain data from the third storage area frame by frame for decoding. For example, after receiving the first data packet sent by the sound source device, the method further includes: storing the first data to be decoded from the first data packet in the third storage area, so that the audio module can obtain and decode it before playing. In the case of T1 = Tr1, the encoded first data can also be decoded directly and stored in the playback storage area without first being stored in the third storage area.
Of course, it can also be stored in the third storage area first, then immediately read, decoded, and stored in the playback storage area.
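The timeout discard and the third (communication) storage area can be modeled with a simple queue. This is a sketch only: `CommStorage` and its methods are hypothetical, and the synchronization reference time is passed in as a plain integer rather than derived from the LE Audio protocol. In the T1 = Tr1 path described above, a frame could instead be handed straight to the decoder rather than queued.

```python
from collections import deque


class CommStorage:
    """Hypothetical model of the third (communication) storage area."""

    def __init__(self):
        self.frames = deque()  # encoded frames awaiting decoding, in arrival order

    def on_packet(self, encoded_frame, sync_ref_us, tr_us):
        """Store a received frame; drop it if it arrived after its sync reference time."""
        if tr_us > sync_ref_us:
            return False  # timed out: discarded to keep playback synchronized
        self.frames.append(encoded_frame)
        return True

    def next_frame(self):
        """Hand the next encoded frame to the audio module, or None if empty."""
        return self.frames.popleft() if self.frames else None


store = CommStorage()
store.on_packet(b"frame-1", sync_ref_us=100_000, tr_us=98_000)   # kept
store.on_packet(b"frame-2", sync_ref_us=110_000, tr_us=112_000)  # late, dropped
```

The audio module then calls `next_frame()` once per decode trigger; a late packet never reaches the decoder.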

In some embodiments, T1 is any time from Tr1 to the first time the second preset position of the playback storage area is played, where the time from the second preset position to the start of the second storage area is sufficient to decode the first data and store it in the second storage area. In other words, T1 can be equal to Tr1, equal to the time at which the second preset position is first played, or any time in between, as actual needs dictate. T1 can also be later than Tt; for example, T1 is any time from when half of the first storage area has been read after Tt to when the second preset position is read, again provided that the time from the second preset position to the start of the second storage area is sufficient to decode the first data and store it in the second storage area. Thus T1 can be the time at which the first storage area is read, the time at which its middle position is read, the time at which the second preset position is read, or any moment in between. According to the inventors' analysis, it is sufficient that the time from T1 until reading reaches the starting position where the first data is stored is long enough to decode and store the first data; this ensures that the first data can be read and played when that starting position is reached.
The inventors have further found that when each of the first and second storage areas holds one frame of data and the first and second data are each one frame long, a half-frame duration basically covers the decoding, storage, and other system overheads of one frame. Decoding can therefore be triggered when the half position of the first storage area is read for the first time after Tt, obtaining one frame of data (for example, the first data) and storing it in the second storage area. The user then hears the audio of the first data when the second storage area is read, so the delay is kept within one frame of data while playback continuity is guaranteed.

After decoding is triggered at the first moment T1 to obtain the first data, in some embodiments the first data is stored in the first storage area according to the first storage rule; in some embodiments it is stored in the second storage area according to the first storage rule; and in some embodiments it is divided into at least two parts, which are stored in the first storage area and the second storage area respectively according to the first storage rule. To keep playback continuous, data split across different storage areas must still play back contiguously. For example, in some embodiments the first data is divided by playback order into a first part and a second part: the first part is stored in a first area of the first storage area that borders the second storage area, and the second part is stored in a second area of the second storage area that borders the first storage area. The two parts are adjacent, or equivalently the first area is adjacent to the second area, so the first data plays continuously. For example, the first part is stored in the second half of the first storage area and the second part in the first half of the second storage area; in this case the first data is split into two equal parts, the first area corresponds to the second half of the first storage area, and the second area corresponds to the first half of the second storage area. The first data may also be divided unequally, which is not restricted; in that case the first part may be larger than the second part.
In this way, the first part can be stored closer to the starting playback position of the first storage area and therefore be read and played earlier. However, the time required to decode and store the first data should be estimated so that the first data has already been stored by the time reading proceeds from the start of the first storage area to the position where the first data begins; this reduces the delay as much as possible.
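The estimate described above determines how unequal the split can be. A sketch under assumptions: reading starts at Tt from offset 0 of the first storage area and proceeds at a constant rate, decode-and-store cost is fixed, and times are integer microseconds; `split_first_frame` is a hypothetical helper. It returns the earliest offset in the first storage area at which the first data can begin without the reader overtaking it, and hence the lengths of the first and second parts.

```python
def split_first_frame(t1_us, decode_store_us, tt_us, frame_us, frame_samples):
    """Return (first_part_len, second_part_len) in samples for the first frame.

    The first part occupies the tail of the first storage area and must be
    stored before the reader (started at Tt from offset 0) reaches it; the
    second part fills the head of the second storage area.
    """
    ready_after_tt = t1_us + decode_store_us - tt_us  # storage completion, relative to Tt
    if ready_after_tt <= 0:
        start = 0  # already stored at Tt: the whole frame can sit in the first area
    else:
        # Earliest offset the reader reaches no sooner than ready_after_tt (ceiling division).
        start = -(-ready_after_tt * frame_samples // frame_us)
    start = min(start, frame_samples)  # too late for the first area: all goes to the second
    return frame_samples - start, start


print(split_first_frame(9_000, 4_000, 10_000, 10_000, 480))  # (336, 144)
```

For a 480-sample, 10 ms frame with Tt = 10 ms, T1 = 9 ms and a 4 ms decode-and-store cost, the data is ready 3 ms after Tt, so the first part may occupy the last 336 samples of the first storage area and the second part the first 144 samples of the second storage area; with a 6 ms cost the split becomes equal halves.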

It is understandable that the first storage rule may be different in different embodiments. In order to achieve the effect of reducing the delay, it is also necessary to combine the settings of Tt or T1, etc., and refer to the following embodiments for details.

In some embodiments, the first data is a frame of data; the first storage area and the second storage area are both sized to store a frame of data. In particular, when applied to LE Audio Bluetooth technology, each buffer (such as the first storage area and the second storage area) stores a frame of LC3 data. In some embodiments, the first data, the second data, and subsequent data are decoded with a data length of one frame. Of course, a frame of data can be sent from a sound source device to an audio playback device through a data packet, or sent to an audio playback device through multiple data packets, or multiple frames of data can be sent in one data packet. If LE Audio Bluetooth technology is applied, a frame of data can be compressed and transmitted through a data packet. Other compression and transmission methods may be used in other wireless communication protocols.

S803: Trigger decoding at a second time T2 to obtain second data, and store the second data in the playback storage area according to a second storage rule.

The second data can be data adjacent to the first data. For example, the first data and the second data can each be one frame long, with the second data being the frame following the first data. Therefore T1 is earlier than T2; that is, decoding is first triggered to obtain the first data and then triggered to obtain the second data. The setting of T2 must also take reception of the second data into account: decoding can be triggered to obtain the second data no earlier than when the audio playback device receives the data packet carrying the second data from the sound source device. In some embodiments, T2 is the moment at which the first storage area is read for the first time after Tt; for example, when the first storage area is read, data is obtained from the third storage area and decoded to obtain the second data. The specific moment may fluctuate within a certain range, all of which falls within the protection scope of this application.

In some embodiments, the audio playback device further includes a third storage area; the method further includes: receiving a second data packet sent by the audio source device, the second data packet including encoded second data; storing the encoded second data in the third storage area. It can be understood that the third storage area can be a buffer area of the Bluetooth module, which is used to store the data to be decoded received by the Bluetooth module. In addition, the second data packet may include the second data to be decoded, wherein the size of the second data is one frame.

In some embodiments, when the third preset position of the first storage area is read, the data is obtained from the third storage area for decoding to obtain the second data. Optionally, the third preset position is located between the middle position of the first storage area and the end position of the first storage area, and the duration of playing from the third preset position to the end position of the first storage area is sufficient to decode and store the second data according to the second storage rule. It can be understood that in some embodiments, the third preset position is located between the middle position of the first storage area and the end position of the first storage area, which can include the case where the third preset position is the middle position of the first storage area. For application scenarios such as LE Audio Bluetooth technology, the third preset position is set at the middle position of the first storage area, which can meet the requirement that the duration of playing from the third preset position to the end position of the first storage area is sufficient to decode and store the second data according to the second storage rule, such as storing the second data in the second storage area. Of course, the decoding time of the second data can be calculated more finely, so as to set the third preset position accordingly to ensure the continuity of the playback. In some embodiments, the second data is stored in the second storage area, so that when it is played to the second storage area, the second data has been decoded and placed in the second storage area. That is to say, the above method can ensure the continuity of the first data and the second data, and reduce the delay, for example, it can be controlled to be a half-frame delay. 
In some embodiments, a first interrupt request may be generated at the second time T2, and in response to the first interrupt request, data is acquired from the third storage area and decoded to obtain second data.
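The condition on the third preset position can be written as a simple check. Assumptions for this sketch: each storage area holds one frame of `frame_samples` samples read at a constant rate, and `decode_store_us` is a hypothetical fixed cost for decoding the second data and storing it according to the second storage rule; the function names are illustrative.

```python
def latest_third_preset(frame_samples, frame_us, decode_store_us):
    """Latest read offset in the first storage area from which the remaining
    playback time still covers decoding and storing the second data."""
    budget = -(-decode_store_us * frame_samples // frame_us)  # samples needed, rounded up
    return frame_samples - budget


def third_preset_ok(pos, frame_samples, frame_us, decode_store_us):
    """Position must lie between the middle and the latest feasible offset."""
    return frame_samples // 2 <= pos <= latest_third_preset(frame_samples, frame_us, decode_store_us)


print(latest_third_preset(480, 10_000, 4_000))   # 288
print(third_preset_ok(240, 480, 10_000, 5_000))  # True: middle position just fits
print(third_preset_ok(240, 480, 10_000, 6_000))  # False: decoding exceeds half a frame
```

With 480 samples per 10 ms frame, a 4 ms cost allows any position from sample 240 (the middle) to sample 288, while a cost above half a frame leaves no valid position; this is consistent with choosing the middle position when decoding and storage fit within half a frame.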

It can be understood that when the data amounts of the first data and the second data are the same, the second preset position and the third preset position can be the same position.

It should be noted that after the second data is decoded, it must be stored for reading and playback. In some embodiments, according to the second storage rule, the second data is stored in the first storage area, as in the case of a ping-pong storage area. In some embodiments, the audio playback device further includes a fourth storage area, and the second data is stored there according to the second storage rule; this mainly applies to a ring buffer, in which the fourth storage area can be a storage area adjacent to the second storage area (the two being adjacent buffers). In some embodiments, the second data can also be stored partly or entirely in the second storage area. Taking the ping-pong storage area as an example, in some embodiments the second data is divided into a third part and a fourth part; according to the second storage rule, the third part is stored in the area of the second storage area other than the aforementioned second area, and the fourth part is stored in the area of the first storage area other than the aforementioned first area. When the first area is the second half of the first storage area and the second area is the first half of the second storage area, the third part is stored in the second half of the second storage area and the fourth part in the first half of the first storage area, which keeps the second data continuous with the first data and continuous within itself.
In the case of a circular storage area, in some embodiments, the audio playback device further includes a fourth storage area, which is adjacent to the second storage area; the second data can be divided into a third portion of data and a fourth portion of data, and the third portion of data is stored in an area of the second storage area other than the second area; the fourth portion of data is stored in a third area of the fourth storage area, and the third portion of data and the fourth portion of data are adjacent, which can also be understood as the third area being adjacent to the second area, thereby ensuring the continuity of the second data playback. In the case of a circular storage area (such as a circular buffer), other parts can also be implemented based on some of the previous embodiments.

In some embodiments of the present application, as mentioned above, in addition to the above-mentioned decoding storage end and playback end, a receiving end is also included. In the receiving part, a data packet sent by the sound source device is received through a radio transceiver module (such as a Bluetooth module), and is stored in the aforementioned communication storage area after unpacking, thereby obtaining valid data such as the first data to be decoded and/or the second data to be decoded waiting for decoding. Information such as the reception time of the relevant data packet may also be related to the settings of the subsequent decoding storage and playback parts, as described in the description of the relevant embodiments for details.

It can be understood that in some embodiments of the present application, T1 is equal to or later than the time Tr1 at which the audio playback device receives the first data, T2 is equal to or later than the time Tr2 at which the audio playback device receives the second data, T2 is equal to or later than Tt, and playback of the first data starts before the second storage area is read for the first time after Tt, thereby ensuring low-latency playback and playback continuity.

In some embodiments, T1 is earlier than T2 and Tt is earlier than or equal to T2. T2 is set so that the audio playback device has already received the second data from the sound source device by T2, which ensures that the second data can be decoded while reducing audio delay and improving user experience. Making Tt earlier than or equal to T2 allows decoding to obtain the second data to be triggered earlier; combining playback and decoding with appropriate storage rules reduces the delay while keeping playback continuous. In some embodiments, the duration from T1 to T2 only needs to be sufficient to decode the first data and store it according to the first storage rule, whereas the interval between subsequent interrupt requests that trigger decoding is kept at no less than one frame duration. Taking LE Audio Bluetooth technology as an example, the second data is one frame long and is encoded and sent to the audio playback device by the sound source device in the second data packet. The timestamps of two adjacent frames generally differ by one frame duration, so to ensure that the second data has been received, the duration from the timestamp Ts1 of the first data to T2 must be at least one frame duration. Likewise, the interval between two subsequent interrupt requests is kept at no less than one frame duration to ensure data reception.
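The timing relations in this paragraph can be collected into a single checker. This is a sketch, not the application's implementation: all times are assumed to be integer microseconds on one clock, and `check_schedule` is a hypothetical helper that lists whichever stated condition is violated.

```python
def check_schedule(ts1, tt, t1, tr1, t2, tr2, frame_us):
    """Return the timing conditions that are violated (empty list if all hold)."""
    problems = []
    if t1 < tr1:
        problems.append("T1 is earlier than Tr1")
    if t2 < tr2:
        problems.append("T2 is earlier than Tr2")
    if not t1 < t2:
        problems.append("T1 is not earlier than T2")
    if tt > t2:
        problems.append("Tt is later than T2")
    if t2 - ts1 < frame_us:
        problems.append("Ts1 to T2 is shorter than one frame")
    return problems


# Tt = Ts1 + half frame; T2 at the middle of the first storage area.
ok = check_schedule(ts1=100_000, tt=105_000, t1=98_000, tr1=98_000,
                    t2=110_000, tr2=108_000, frame_us=10_000)
# T2 too early: before Tr2 and less than one frame after Ts1.
bad = check_schedule(ts1=100_000, tt=105_000, t1=98_000, tr1=98_000,
                     t2=107_000, tr2=108_000, frame_us=10_000)
```

Here `ok` is an empty list, while `bad` reports both that T2 precedes Tr2 and that T2 is less than one frame after Ts1.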

The above-mentioned embodiments or the combination of the technical features of the embodiments can enable the first data to be heard by the user earlier during playback, thereby reducing playback delay.

It should be noted that the first data and the second data in the above embodiments can be obtained by receiving a data packet sent by the sound source device, the first data or the second data can be divided into multiple data packets for transmission, or the first data can be transmitted by the first data packet and the second data can be transmitted by the second data packet, and there is no limitation on this. In addition, the order between S800-S803 is only for example, and can be executed in other orders without conflict, and some can even be executed in parallel, for example, the relevant parts of the aforementioned playback end and the relevant parts of the storage end may be parallel, and in different embodiments, the settings of each moment are different, so the order may also be different, but there is a connection between the two, for example, the determination of Tt in S800 and the setting of T1 in S802 can be related to the first data received from the sound source device, and the specific relationship is described in the previous or subsequent parts, which will not be repeated here; it can be understood that the above different embodiments and their technical features can be partially or completely combined to achieve the effect of reducing delay without conflict.

In some embodiments, based on a combination of part or all of the above embodiments or technical features, the method may further include:

S804: Generate a second interrupt request at a third moment T3, and in response to the second interrupt request, obtain data from a third storage area for decoding to obtain third data, wherein the third storage area is used to store one or more of the first data to be decoded, the second data to be decoded, and the third data to be decoded; the time interval between the first interrupt request and the second interrupt request is at least the duration of one frame of data.

It can be understood that, except for the interval from T1 to T2, the intervals between subsequent adjacent interrupt requests are at least as long as one frame of data, thereby ensuring the reception of data and ensuring that there is data for decoding.
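A decode-interrupt schedule that honors this minimum spacing can be generated as below. The sketch assumes a fixed period starting at T2 (the description only requires each interval to be at least one frame), and `interrupt_times` is an illustrative name.

```python
def interrupt_times(t2_us, frame_us, count, period_us=None):
    """Times of the first `count` decode interrupts, starting at T2."""
    period_us = frame_us if period_us is None else period_us
    if period_us < frame_us:
        raise ValueError("interrupt interval must be at least one frame duration")
    return [t2_us + k * period_us for k in range(count)]


print(interrupt_times(110_000, 10_000, 3))  # [110000, 120000, 130000]
```

Rejecting periods shorter than a frame mirrors the reasoning above: a shorter period could fire before the next frame has been received.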

S805: Store the third data in the playback storage area according to the third storage rule.

It can be understood that, for a ping-pong buffer, the third data can be stored with reference to the first storage rule. In the case of a ring buffer, the storage rules may differ: if the wrap-around position of the ring buffer has not been reached, storage simply continues in the following area; otherwise, storage can proceed as in the ping-pong buffer, with reference to the first storage rule.

It should be noted that the order of the above steps is not limited, and they can even exist in parallel. For example, while playing, decoding and storing are performed at the same time. For the playback buffer, data is read out for playback while data is stored in the playback buffer. Storage and playback are balanced to ensure the continuity and stability of audio playback.

The following are some examples of combinations of the above embodiments and their technical features:

Embodiment A, as shown in FIG. 9, provides a low-latency playback method that can use wireless communication technologies such as LE Audio Bluetooth technology and, by adjusting the timing of interrupt requests and the like, keep the delay within half a frame. The method includes:

S900: Receive a first data packet sent by a sound source device, where the first data packet includes first data to be decoded.

It can be understood that, taking LE Audio technology as an example, the audio source device can establish a CIS (Connected Isochronous Stream) channel with the audio playback device through LE Audio Bluetooth technology to transmit the first data packet. Normally, after the first data packet is received, it is unpacked to extract the valid data (such as the first data to be decoded), which is then stored in a third storage area (such as a Bluetooth buffer) to await decoding and playback. In this embodiment, to reduce the delay, the first data is decoded in advance: the first data to be decoded need not be stored in the third storage area but can be decoded directly to obtain the first data.

S901: Obtain a timestamp Ts1 corresponding to the first data, and use the sum of Ts1 and the half-frame duration as the playback trigger time Tt.

It can be understood that the timestamp Ts1 corresponding to the first data can be determined from a synchronization reference time. Taking LE Audio Bluetooth technology as an example, the synchronization reference time can be the SDU synchronization reference time specified in the relevant standard, which is determined by the maximum transport latency set by the audio source device; for the specific determination method, refer to the relevant description of the LE Audio Bluetooth protocol. Other communication protocols may provide similar time parameters as a reference for synchronized playback, and no restriction is imposed here. In this embodiment, Ts1 can be the synchronization reference time (the SDU synchronization reference time in LE Audio Bluetooth technology), so Tt can equal the synchronization reference time plus the half-frame duration. The half-frame duration is added because the inventor found that it generally leaves enough room for decoding and other system overheads. That is, half a frame reliably suffices to decode the first data and store it according to the first storage rule before it is read and played.
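As a worked example of S901, the sketch below computes Tt from Ts1 and checks whether the reserved half frame covers decoding and storing when decoding is triggered at the reception time Tr1. The 10 ms frame is an assumption (LC3, the LE Audio codec, supports 7.5 ms and 10 ms frames), and `decode_store_us` is a hypothetical measured cost. Neither value comes from the application, and the function names are illustrative.

```python
FRAME_US = 10_000  # assumed frame duration: 10 ms


def playback_trigger_time(ts1_us: int, frame_us: int = FRAME_US) -> int:
    """S901 (Embodiment A): Tt = Ts1 + half-frame duration."""
    return ts1_us + frame_us // 2


def first_frame_ready_in_time(tr1_us: int, ts1_us: int, decode_store_us: int,
                              frame_us: int = FRAME_US) -> bool:
    """True if decoding started at Tr1 has stored the first data by Tt."""
    return tr1_us + decode_store_us <= playback_trigger_time(ts1_us, frame_us)


print(playback_trigger_time(1_000_000))                        # 1005000
print(first_frame_ready_in_time(1_000_000, 1_000_000, 3_000))  # True
print(first_frame_ready_in_time(1_000_000, 1_000_000, 6_000))  # False
```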

S902: Trigger decoding upon receiving the first data packet to obtain the first data, and store the first data in the first storage area.

It can be understood that, to reduce the delay, this embodiment does not wait for playback (render_time) to finish the first storage area and generate a preset interrupt request before decoding the first data. Instead, decoding is triggered as soon as the first data packet is received. The moment of reception, Tr1, is regarded as the actual receiving time of the first data packet; in other words, the audio module is notified to fetch the data and decode it as soon as possible after the first data packet arrives. In this case, the unpacked first data to be decoded need not be stored in the aforementioned third storage area but can be decoded directly and stored in the playback storage area. Alternatively, the first data to be decoded can be stored in the third storage area after unpacking, and the audio module is then notified to fetch it from the third storage area for decoding.

It is understandable that the playback trigger time Tt in S901 includes an extra half frame, and this half frame is enough to decode and store the first data. The first data can therefore be stored directly in the first storage area without any risk that the first storage area holds no valid data when Tt arrives, and the delay on the playback device side can be shortened to half a frame.

It should be noted that there is no restriction on the execution order of S901 and S902; for example, they can be processed in parallel.

S903: When Tt arrives, start reading and playing from the first preset position in the first storage area.

It is understandable that since Tt is set to arrive half a frame later than the timestamp Ts1 corresponding to the first data, the first data has been stored in the first storage area when Tt arrives, so the first data can be played at Tt so that the user can hear the sound earlier.

It should be noted that reading and playback begin when Tt arrives. For example, DMA then continuously moves data out of the playback storage area according to preset rules for playback. With a ping pang buffer, DMA reads and plays from the ping buffer through the pang buffer and then returns to the ping buffer for a new round of reading and playback. When a preset interrupt request generated at a certain position triggers decoding of the next frame, DMA keeps moving data out of the playback storage area, so decoding and playback can be regarded as running in parallel. If decoding and storing are not completed in time, DMA may have no valid data to move, so no audio is output and the sound stutters.
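The stutter condition described above can be expressed as a simple timing check. The sketch assumes each frame fills exactly one storage area and that DMA starts reading the area holding frame k at Tt + k * frame duration (ping, pang, ping, ...). `find_underruns` is a hypothetical helper, and `ready_times_us` (the moments at which decoding and storing of each frame finished) is illustrative input data.

```python
def find_underruns(tt_us, frame_us, ready_times_us):
    """Return the indices of frames not yet stored when DMA starts reading them.

    Frame k is read by DMA from Tt + k * frame_us onward. If its decoded data
    is stored later than that, DMA has no valid data to move and the output
    stutters.
    """
    return [k for k, ready_us in enumerate(ready_times_us)
            if ready_us > tt_us + k * frame_us]


# Tt = 5 ms, 10 ms frames; frame 2 is stored 1 ms too late.
print(find_underruns(5_000, 10_000, [4_000, 14_000, 26_000]))  # [2]
```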

S904: Receive a second data packet sent by the audio source device, where the second data packet includes encoded second data; and store the encoded second data in a third storage area.

It can be understood that, after the second data packet is received, it is unpacked to obtain the second data to be decoded (that is, the encoded second data), which is stored in a third storage area. The third storage area (such as a Bluetooth buffer) stores the valid data obtained by unpacking, from which the audio module fetches data frames for decoding.

It should be noted that S904 may be executed after S900, and may be executed in parallel with decoding and playing of the first data, such as in parallel with S901, S902 or S903.

S905: When the third preset position of the first storage area is read, trigger the acquisition of data from the third storage area for decoding to obtain second data; and store the second data in the second storage area.

It should be noted that the third preset position can lie between the middle position and the end position of the first storage area. The playback time from the third preset position to the end of the first storage area must be long enough to decode the second data and store it according to the second storage rule, so that second data is available when playback reaches the second storage area.

It is understandable that T2 can be the moment when the middle position of the first storage area is read, that is, when half of the first data has been played; at this moment a first interrupt request, such as dma_irq, is generated. In response to the first interrupt request, the audio module of the audio playback device reads the second data to be decoded (for example, one frame of data) from the third storage area and decodes it to obtain the second data. The first interrupt request is generated after half of the first data has been played because the inventor found that the playback time of the remaining half is sufficient to decode and store the second data. Moreover, since Tt is delayed by half a frame and the first half of the first data then plays for another half frame, a full frame has elapsed since Ts1 when T2 arrives. In some cases, such as when LE Audio technology is applied, this is enough time for the second data sent by the sound source device to have been received. The playback time from the third preset position to the end of the first storage area may also be less than half of the playback time of the first storage area; therefore, as long as the second data can still be decoded and stored in time, the distance from the third preset position to the end of the first storage area can be shortened further.
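The two constraints on the third preset position, namely that the second data has been received (T2 no earlier than Tr2) and that the remaining playback time of the first storage area covers decoding and storing, together define a window of admissible positions. The sketch below computes that window in samples. The 48 kHz rate (480 samples per 10 ms frame), the decode-and-store cost and the function names are illustrative assumptions.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (also correct for negative a)."""
    return -(-a // b)


def third_preset_window(tt_us, tr2_us, frame_us, samples_per_frame, decode_store_us):
    """Return (earliest, latest) sample offsets in the first storage area at which
    the first interrupt may be raised, or None if no offset satisfies both constraints.

    earliest: playback has reached Tr2, so the second data has been received.
    latest:   the samples left in the first storage area last at least decode_store_us.
    """
    earliest = max(0, ceil_div((tr2_us - tt_us) * samples_per_frame, frame_us))
    latest = samples_per_frame - ceil_div(decode_store_us * samples_per_frame, frame_us)
    return (earliest, latest) if earliest <= latest else None


# Tt = 5 ms, second packet received at 8 ms, 480-sample frames, 3 ms decode+store.
print(third_preset_window(5_000, 8_000, 10_000, 480, 3_000))  # (144, 336)
print(third_preset_window(5_000, 8_000, 10_000, 480, 8_000))  # None
```

In the first call, the middle of the first storage area (offset 240) lies inside the window, which is consistent with raising the first interrupt request at the middle position as described above.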

It can be understood that after the second data is obtained by decoding, the second data is stored in the second storage area, so that the second storage area can continue to be read after the first storage area is read, thereby ensuring the continuity of playback.

It should be noted that, during S905, the audio playback device continues to read data from the playback storage area and play it. The receiving side, the decoding-and-storage side and the playback side can therefore run in parallel, but the data must still pass between them in order: data can be decoded and stored only after it has been received, and played only after it has been stored. A failure in any link may interrupt playback. This embodiment further reduces the delay while preserving playback continuity.

Further, building on the scheme of Embodiment A, for the third data (such as the third frame), the second interrupt request is set one frame after the first interrupt request, that is, at the middle position of the second storage area (such as the pang buffer), so that decoding of the third data is triggered when reading and playback reach the middle of the second storage area. With a ping pang buffer, the third data can be stored in the ping buffer, because the first data in the ping buffer has already been played by then and the space can be reused. With a circular buffer, the third data can be stored in the fourth storage area so that reading and playback (render) can continue. By analogy, the interval between interrupt requests remains one frame, which ensures packet reception and keeps decoding synchronized with playback consumption. The first data can thus be played earlier and subsequent data played continuously, reducing the delay to half a frame and improving playback stability.
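Putting the Embodiment A timing together, the sketch below lists, for each frame, the moment its decoding is triggered, the area it is stored in, when DMA starts playing it, and its delay relative to its own timestamp. It assumes a ping pang buffer with one frame per area and timestamps exactly one frame apart; `embodiment_a_schedule` is a hypothetical name. Under these assumptions each interrupt falls exactly on the timestamp of the frame it decodes, and every frame plays half a frame after its timestamp.

```python
def embodiment_a_schedule(ts1_us, frame_us, num_frames):
    """Return (decode_trigger_us, area, play_start_us, delay_us) for each frame."""
    half = frame_us // 2
    tt_us = ts1_us + half                      # S901: Tt = Ts1 + half frame
    rows = []
    for k in range(num_frames):
        ts_k = ts1_us + k * frame_us           # adjacent timestamps differ by one frame
        play_start = tt_us + k * frame_us      # DMA reaches this frame's area
        # Frame 0 is decoded on reception (T1 = Tr1, not modeled here); later
        # frames are decoded by the interrupt at the middle of the previous area.
        trigger = None if k == 0 else play_start - half
        area = "ping" if k % 2 == 0 else "pang"
        rows.append((trigger, area, play_start, play_start - ts_k))
    return rows


for row in embodiment_a_schedule(0, 10_000, 3):
    print(row)
# (None, 'ping', 5000, 5000)
# (10000, 'pang', 15000, 5000)
# (20000, 'ping', 25000, 5000)
```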

In conjunction with FIG. 10, Embodiment A is illustrated below, taking a Bluetooth headset as the audio playback device and a mobile phone as the audio source device, using a ping-pang buffer and applying LE Audio Bluetooth technology:

S1000: The headset receives a first data packet sent by the mobile phone at time Tr1, and triggers decoding of the data therein to obtain first data;

S1001: The headset stores the first data in the ping buffer;

S1002: When the playback trigger time reaches Tt, the headset starts reading and playing from the start position of the ping buffer, where the playback trigger time is equal to the timestamp Ts1 corresponding to the first data + half frame duration;

S1003: When the headset plays to the middle position of the ping buffer, it generates a first interrupt request to trigger decoding of data to obtain the second data;

S1004: The headset stores the second data in the pang buffer;

S1005: When the headset plays to the middle position of the pang buffer, it generates a second interrupt request to trigger decoding of data to obtain the third data;

S1006: The headset stores the third data in the ping buffer.

It can be understood that in Embodiment A, the sound source device (such as a mobile phone) sends the first data packet, the second data packet and the third data packet to the audio playback device (such as a headset) in sequence, and the valid data in these packets (the first data, the second data and the third data) can each be one frame of data. After receiving the first data packet, the audio playback device continues to receive the second data packet, the third data packet and subsequent data packets from the sound source device. The valid data unpacked from these packets (such as the first, second and/or third data to be decoded) can be stored in the communication storage area (that is, the third storage area, such as the Bluetooth buffer) and then decoded to obtain the corresponding first, second or third data. The first data to be decoded may instead bypass the communication storage area and be decoded directly, so that it is decoded and stored earlier. The timestamp Ts1 of the first data and the timestamp Ts2 of the second data are one frame apart, as are Ts2 and the timestamp Ts3 of the third data, and the ping buffer and the pang buffer are each one frame long.

With the above method, decoding is triggered to obtain the first data as soon as the first data packet is received, so there is no need to wait for the first storage area to be played out and generate the interrupt request dma_irq. To avoid a conflict between DMA data movement and the decoding-and-storage process, Tt is set to Ts1 plus the half-frame duration, where Ts1 is the timestamp corresponding to the first data. The half-frame duration is mainly reserved for decoding and other system overheads: during this delayed half frame, the first data is decoded and stored in the first storage area (such as the ping buffer), so that when Tt arrives the first data can be read directly from the first storage area for playback. For example, when Tt arrives, DMA moves the first data out of the ping buffer to the DAC for digital-to-analog conversion and playback, so the playback delay of the first data is limited to half a frame. Further, Tt is Ts1 plus the half-frame duration, and the timestamps of adjacent frames (such as the first data and the second data) differ by one frame period frame_duration, that is, the length of one frame. Consequently, by the time playback reaches the middle of the first storage area (after half a frame of playback), the second data (carried, for example, by the second data packet) has generally also been received by the audio playback device. Generating the first interrupt request at this point ensures that the second data has been received, so that the decoding triggered for the next frame can obtain it. The half-frame playback time of the second half of the first storage area is then used to decode the second data and store it in the second storage area, so that second data is available when playback reaches the pang buffer.
This method reduces the delay to only half a frame while ensuring playback continuity. It should be noted that the second data must be received no later than its corresponding timestamp Ts2; otherwise, it is discarded. If the audio playback device fails to receive the second data normally, packet loss concealment (PLC) can be performed. The playback delay of Embodiment A can therefore be kept within half a frame, a significant improvement over a two-frame delay.
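The reception deadline can be sketched as follows. `silence_frame` stands in for whatever packet loss concealment the device actually uses; it and `data_for_decoding` are hypothetical placeholders, not a real codec API.

```python
from typing import Callable, Optional


def data_for_decoding(recv_us: Optional[int], ts_us: int, payload: bytes,
                      plc_frame: Callable[[], bytes]) -> bytes:
    """Return the frame to decode for a given timestamp.

    The packet is used only if it arrived no later than its timestamp;
    a late or missing packet is discarded and replaced by a PLC frame.
    """
    if recv_us is None or recv_us > ts_us:
        return plc_frame()
    return payload


def silence_frame() -> bytes:
    """Placeholder concealment frame (hypothetical; real PLC is codec-specific)."""
    return bytes(4)


print(data_for_decoding(9_500, 10_000, b"DATA", silence_frame))   # b'DATA'
print(data_for_decoding(10_200, 10_000, b"DATA", silence_frame))  # b'\x00\x00\x00\x00'
```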

Embodiment B, as shown in FIG. 11, provides a low-latency playback method that can transmit data using wireless communication such as LE Audio Bluetooth technology and can limit the delay to half a frame by, among other things, changing where decoded data is stored. The method includes:

S1100: Receive a first data packet sent by a sound source device, where the first data packet includes first data to be decoded.

For example, the audio source device can establish a CIS channel through LE Audio Bluetooth technology to transmit the first data packet. Normally, after the first data packet is received, it is unpacked to extract the valid data (such as the first data to be decoded), which is then stored in a third storage area (such as a Bluetooth buffer) to await decoding and playback. In this embodiment, to reduce latency, the first data is decoded in advance: the first data to be decoded need not be stored in the third storage area but can be decoded directly to obtain the first data.

S1101: Obtain a timestamp Ts1 corresponding to the first data, and use Ts1 as Tt; when Tt arrives, read and play from a first preset position in the first storage area.

It can be understood that the timestamp Ts1 corresponding to the first data can be determined from a synchronization reference time. Taking LE Audio Bluetooth technology as an example, the synchronization reference time can be the SDU synchronization reference time specified in the relevant standards, which is determined by the maximum transport latency set by the audio source device; for the specific determination method, refer to the relevant description of the LE Audio Bluetooth protocol. Other communication protocols may provide similar time parameters as a reference for synchronized playback, and no restriction is imposed here. In Embodiment B, Ts1 can be the synchronization reference time (the SDU synchronization reference time in LE Audio Bluetooth technology), and Tt is set equal to Ts1.

It should be noted that, for reading and playing from the first preset position of the first storage area when Tt arrives, reference may be made to the description in other embodiments, such as S903, which is not repeated here.

S1102: Trigger decoding upon receiving a first data packet to obtain first data.

It can be understood that, to reduce the delay, this embodiment does not wait for playback (render_time) to finish the first storage area and generate an interrupt request before decoding the first data. Instead, decoding is triggered when the first data packet is received. The moment Tr1 at which the first data packet is received is regarded as its actual receiving time; in other words, the audio module is notified to fetch the data and decode it as soon as possible after the first data packet arrives, so T1 = Tr1. The audio module can decode the first data to be decoded directly after unpacking, or the first data to be decoded can first be stored in the communication storage area (such as the Bluetooth buffer) to wait for the audio module to read and decode it.

S1103: Divide the first data into a first part of data and a second part of data in playback order, wherein the first part of data is stored in a first area of the first storage area, the first area being adjacent to the second storage area; the second part of data is stored in a second area of the second storage area, the second area being adjacent to the first storage area; and the first part of data is adjacent to the second part of data.

It can be understood that the first storage area and the second storage area are adjacent, so the first data can be split across the two areas, with the first part stored at the end of the first storage area next to the second storage area. The time spent playing the front part of the first storage area is then available for decoding and storing the first data. It should be noted that "close to" and "front part" are relative terms, and the split is not necessarily at the middle. In addition, "the first part of the data is adjacent to the second part of the data" means that the two parts play back-to-back, for example across the boundary between two adjacent buffers.

In some embodiments, the first portion of data is stored in the second half of the first storage area, the second portion of data is stored in the first half of the second storage area, and the first portion of data and the second portion of data are adjacent. That is, the first data is evenly divided into two parts, one half is placed in the second half of the first storage area, and the other half is placed in the first half of the second storage area.

In some embodiments, the first data may be divided unequally into two parts; for example, the first part may be longer than the second half of the first storage area, so that some of it is stored in the first half of the first storage area. In principle, it is sufficient that the time from T1 until playback reaches the start position of the first part of the data is long enough to decode and store the first data.
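A minimal sketch of S1103 for a ping pang buffer, assuming `ping` and `pang` are each one frame long and the first data is split at `first_part_len` bytes in playback order (an equal split gives half a frame; an unequal split gives more or less). `store_first_data` is a hypothetical helper. It returns the offset in the ping buffer at which playback of the first data begins, that is, how far into the ping buffer DMA has read after Tt when the first data starts to play.

```python
def store_first_data(ping: bytearray, pang: bytearray,
                     frame: bytes, first_part_len: int) -> int:
    """Place the first data across the ping/pang boundary (Embodiment B, S1103).

    The first part goes to the tail of ping (the area next to pang) and the
    second part to the head of pang, so the two parts play back-to-back.
    Returns the ping offset at which playback of the first data starts.
    """
    n = len(ping)
    if len(pang) != n or len(frame) != n:
        raise ValueError("ping, pang and the frame must each be one frame long")
    if not 0 < first_part_len < n:
        raise ValueError("both parts must be non-empty")
    first, second = frame[:first_part_len], frame[first_part_len:]
    ping[n - first_part_len:] = first   # tail of ping
    pang[:len(second)] = second         # head of pang
    return n - first_part_len


ping, pang = bytearray(4), bytearray(4)
print(store_first_data(ping, pang, b"ABCD", 2))  # 2: playback reaches the data half a frame after Tt
print(bytes(ping + pang))                         # b'\x00\x00ABCD\x00\x00'
```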

S1104: After reading the first storage area, trigger the acquisition of data from the third storage area for decoding to obtain second data.

It can be understood that T2 is the moment when reading of the first storage area is completed. In some embodiments, a first interrupt request is generated at T2, and in response, data is obtained from the third storage area and decoded to obtain the second data; the third storage area is as described above. Before the second data is decoded, it is generally necessary to ensure that the corresponding second data packet has been received by the audio playback device. When LE Audio Bluetooth technology is applied in this embodiment, the timestamps of two adjacent frames differ by one frame duration, so by the time the first storage area has been read, the second data has generally been received by the audio playback device.

S1105: Divide the second data into a third part of data and a fourth part of data in playback order; store the third part of data in the area of the second storage area other than the second area, and store the fourth part of data in the area of the first storage area other than the first area.

It is understandable that each of the first storage area, the second storage area and the fourth storage area in this embodiment can be one frame of data in size. This solution applies to the ping-pang buffer: to keep playback continuous, the ping buffer and the pang buffer are read and played alternately in a loop, so the storage method of this embodiment ensures both low latency and continuous playback.

In some embodiments, the second data is divided equally into a third portion of data and a fourth portion of data in sequence, the third portion of data is stored in the second half of the second storage area, and the fourth portion of data is stored in the first half of the first storage area.

In some embodiments, which are applicable to a circular buffer area, S1105 can be replaced by: dividing the second data into a third portion of data and a fourth portion of data, storing the third portion of data in an area of the second storage area other than the second area; storing the fourth portion of data in a third area of the fourth storage area, and the third portion of data and the fourth portion of data are adjacent.

It is understandable that a circular buffer contains more than the first storage area and the second storage area; for example, it also has a fourth storage area adjacent to the second storage area. The second data can therefore be placed in the remaining part of the second storage area and the front part of the fourth storage area, which keeps playback of the first data and the second data continuous as reading proceeds from the second storage area to the fourth storage area. It should be noted that, when the method is applied to a circular buffer, the other steps of this embodiment can, where there is no conflict, be the same as for the ping-pang buffer, and are not repeated.
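Generalizing S1103 and S1105 to successive frames, each decoded frame starts at the middle of one storage area and ends at the middle of the next, wrapping around. The hypothetical helper below treats the playback storage area as one flat buffer of equal one-frame areas. With two areas it reproduces the ping-pang layout, and with more areas it reproduces the circular-buffer variant. Equal half-frame splits are assumed.

```python
def store_frame_half_shifted(playback_buf: bytearray, frame_index: int,
                             frame: bytes, frame_len: int) -> None:
    """Store frame k from the middle of area k to the middle of area k + 1 (mod areas)."""
    total = len(playback_buf)
    if len(frame) != frame_len or total % frame_len:
        raise ValueError("frame and areas must all be exactly one frame long")
    start = (frame_index * frame_len + frame_len // 2) % total
    for i, byte in enumerate(frame):
        playback_buf[(start + i) % total] = byte  # wrap past the last area


ring = bytearray(12)  # three 4-byte areas
for k, f in enumerate([b"AAaa", b"BBbb", b"CCcc"]):
    store_frame_half_shifted(ring, k, f, 4)
# Reading from the middle of the first area yields the frames back-to-back.
print(bytes(ring[2:] + ring[:2]))  # b'AAaaBBbbCCcc'
```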

On the basis of the above part, there may be subsequent data such as third data and fourth data. The decoding and storage of subsequent data according to different storage area types can refer to the rules of previous data, such as first data and second data. For example, the above low-latency playback method also includes:

S1106: Generate a second interrupt request at a third time T3, and obtain data from the third storage area in response to the second interrupt request and decode to obtain third data.

It is understandable that the third storage area can store one or more of the first data to be decoded, the second data to be decoded and the third data to be decoded, as well as other valid data to be decoded. To ensure that each frame is received, in some embodiments the interval between the first interrupt request and the second interrupt request is set to at least one frame duration, and in some embodiments every subsequent pair of adjacent interrupt requests is likewise kept at least one frame apart to ensure packet reception. In some embodiments, T3 is the moment when reading of the second storage area is completed; it can also be the moment when reading of the first storage area starts, or the moment when reading of the fourth storage area starts.

S1107: Sequentially divide the third data into a fifth part of data and a sixth part of data, and store them according to a third storage rule.

It is understandable that different storage types have different storage rules. For example, in a ping-pang buffer, the fifth part of the third data can be stored in the second half of the ping buffer and the sixth part in the first half of the pang buffer. Ring buffers and other types can store data according to their own rules, which are not described in detail here.

It should be noted that, during decoding and storage, the data in the playback storage area can be read and played continuously; for example, DMA moves data out of the playback storage area, and after DAC conversion the data is sent to the speaker for playback. Playback and decoding-and-storage can therefore run in parallel, and to ensure continuous, low-latency playback, the decoding timing and the playback timing must be set appropriately. Choosing when decoding obtains the first data and how the first data is stored reduces the delay, and setting the interrupt requests appropriately further ensures the continuity of subsequent playback.

In conjunction with FIG. 12, Embodiment B is described below, taking a Bluetooth headset as the audio playback device and a mobile phone as the audio source device, using a ping-pang buffer and applying LE Audio Bluetooth technology:

S1200: The headset receives a first data packet sent by the mobile phone at time Tr1, and triggers decoding of the data therein to obtain first data;

S1201: When the playback trigger time Tt arrives, the headset starts reading and playing from the start position of the ping buffer, where the playback trigger time Tt is equal to the timestamp Ts1 corresponding to the first data;

S1202: The headset divides the first data into a first part of data and a second part of data according to a playback order, the first part of data is stored in the second half of the ping buffer, the second part of data is stored in the first half of the pang buffer, and the first part of data is adjacent to the second part of data;

S1203: When the headset finishes reading the ping buffer (it can also be understood as starting to read the pang buffer), it triggers to obtain data from the Bluetooth buffer for decoding to obtain the second data;

S1204: The headset divides the second data into a third part of data and a fourth part of data in order, stores the third part of data in the second half of the pang buffer, and stores the fourth part of data in the first half of the ping buffer;

S1205: When the headset finishes playing the pang buffer (or starts playing the ping buffer), it generates a second interrupt request to trigger decoding of the third data packet to obtain the third data;

S1206: The headset divides the third data into the fifth part of data and the sixth part of data in sequence, and stores the fifth part of data in the second half of the ping buffer and stores the sixth part of data in the first half of the pang buffer.

It can be understood that, compared with the related example of Embodiment A, the above example of Embodiment B does not need to change the interrupt logic; only the logic for storing decoded data in the ping pang buffer is modified. When the headset receives the first data packet seq1 (the first packet), it does not wait for the first storage area to be played before decoding; instead, it notifies the audio module to decode the first data at the reception time Tr1. The first data is split into two half frames in playback order: the first half frame is placed in the second half of the ping buffer and the second half frame in the first half of the pang buffer, so that each buffer holds one half of the first data. Tt (the trigger time) is set to the timestamp Ts1 corresponding to the first data. When DMA playback (render) reaches the half-frame position of the ping buffer, the first data starts to play, so the playback delay of the first data is half a frame. Subsequently, when DMA render reaches the start of the pang buffer, the system generates the preset first interrupt request; the interval between interrupt requests is at least one frame period frame_duration. On receiving the interrupt request, the audio module fetches one frame of data from the third storage area (Bluetooth buffer), decodes it to obtain the second data, and splits the second data into two half frames: the first half frame is placed in the second half of the pang buffer and the second half frame in the first half of the ping buffer. Subsequent frames are handled by analogy. With this design, the delay of each frame from its timestamp time_stamp to the moment it is actually played can be stably kept within half a frame.
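The Embodiment B timing can be tabulated in the same way. The sketch assumes timestamps exactly one frame apart, Tt = Ts1, a first decode triggered at Tr1, later decodes triggered each time DMA enters a new area, and a hypothetical fixed decode-and-store cost. For each frame it reports the trigger time, the moment DMA reaches the frame (the middle of its area), the delay from the frame's timestamp, and whether the data was stored in time. `embodiment_b_schedule` is an illustrative name.

```python
def embodiment_b_schedule(ts1_us, tr1_us, frame_us, num_frames, decode_store_us):
    """Return (trigger_us, play_start_us, delay_us, stored_in_time) for each frame."""
    half = frame_us // 2
    tt_us = ts1_us                                 # S1101: Tt = Ts1
    rows = []
    for k in range(num_frames):
        # Frame 0: decoded on reception (T1 = Tr1). Frame k >= 1: interrupt
        # raised when DMA finishes one area and enters the next.
        trigger = tr1_us if k == 0 else tt_us + k * frame_us
        play_start = tt_us + k * frame_us + half   # frame k starts mid-area
        delay = play_start - (ts1_us + k * frame_us)
        rows.append((trigger, play_start, delay, trigger + decode_store_us <= play_start))
    return rows


for row in embodiment_b_schedule(10_000, 9_000, 10_000, 3, 3_000):
    print(row)
# (9000, 15000, 5000, True)
# (20000, 25000, 5000, True)
# (30000, 35000, 5000, True)
```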

Embodiment C provides a low-latency playback method that builds on Embodiment B, with S1102 of Embodiment B modified to: trigger decoding at the playback trigger time Tt to obtain the first data. The other parts can follow the corresponding steps of Embodiment B and are not described in detail here. Embodiment C likewise keeps the delay to half a frame while ensuring playback continuity.

In conjunction with FIG. 13, Embodiment C is illustrated below, taking a Bluetooth headset as the audio playback device and a mobile phone as the audio source device, using a ping-pang buffer and applying LE Audio Bluetooth technology:

S1300: The headset receives the first data packet sent by the mobile phone at time Tr1, unpacks it and stores it in the Bluetooth buffer;

S1301: When time Tt arrives, the headset starts reading and playing the ping buffer and triggers decoding of data from the Bluetooth buffer to obtain the first data, where Tt is equal to the timestamp Ts1 corresponding to the first data;

S1302: The headset divides the first data into a first part of data and a second part of data according to a playback order, the first part of data is stored in the second half of the ping buffer, and the second part of data is stored in the first half of the pang buffer, and the first part of data and the second part of data are adjacent to each other;

S1303: When the headset finishes reading the ping buffer, it triggers fetching data from the Bluetooth buffer and decoding it to obtain the second data;

S1304: The headset divides the second data, in playback order, into a third part and a fourth part. It stores the third part in the second half of the pang buffer and the fourth part in the first half of the ping buffer;

S1305: When the headset finishes playing the pang buffer (that is, when it starts playing the ping buffer again), it generates a second interrupt request. This request triggers fetching data from the Bluetooth buffer and decoding it to obtain the third data;

S1306: The headset divides the third data, in playback order, into a fifth part and a sixth part. It stores the fifth part in the second half of the ping buffer and the sixth part in the first half of the pang buffer.

It can be understood that, compared with Embodiment B, Embodiment C moves the trigger for decoding the first data from moment Tr1 to moment Tt. The inventor's research indicates that half a frame is long enough to decode and store the first data. The first part of the first data is stored in the second half of the ping buffer, which is not read until half a frame after Tt. Triggering decoding at Tt therefore still achieves a half-frame delay and continuous playback. The effect is similar to that of Embodiment B; only the timing of the decoding trigger for the first data differs. Of course, the time actually needed for decoding and storage can be determined more precisely. The trigger timing or the storage location of the first data can then be adjusted further, possibly shortening the delay to less than half a frame. All such variations fall within the scope of protection of the present application.
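The timing argument of Embodiment C reduces to two small checks.

1. Decoding triggered at Tt is safe if decoding plus storage completes within half a frame.
2. If the decoding-plus-storage time is known more precisely, it bounds how far the first data's storage offset (and hence its delay) could be shortened.

The sketch below is a hypothetical Python illustration of both checks. The function names and the example decode-and-store durations are assumptions, not values from the application.

```python
def half_frame_is_enough(frame_us, decode_store_us):
    """Embodiment C condition (sketch): decoding triggered at Tt must have stored the
    first data before playback reaches the second half of the ping buffer."""
    return decode_store_us <= frame_us // 2


def min_first_data_offset(decode_store_us, sample_rate_hz):
    """Smallest offset, in samples from the start of the ping buffer, at which the
    first data could be stored if decoding is triggered exactly at Tt. Under this
    sketch's assumptions the offset is also the playback delay of the first data."""
    # Integer ceiling of decode_store_us * sample_rate_hz / 1e6, avoiding floats.
    return -(-(decode_store_us * sample_rate_hz) // 1_000_000)


print(half_frame_is_enough(10_000, 2_000))  # True: 2 ms fits in a 5 ms half-frame
print(half_frame_is_enough(7_500, 4_000))   # False: 4 ms exceeds a 3.75 ms half-frame
print(min_first_data_offset(2_000, 48_000))  # 96 samples, versus 240 for half of a 10 ms frame
```

Using integer microseconds and an integer ceiling keeps the sample offset exact. Rounding down could otherwise place the first data where playback would reach it before it has been stored.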

Embodiment D, as shown in FIG. 14, provides a low-delay playback method that limits the delay to one frame of data, including:

S1400: Receive a first data packet sent by a sound source device, where the first data packet includes first data to be decoded.

It can be understood that the description of S1400 can refer to S1100.

S1401: Obtain a timestamp Ts1 corresponding to the first data, and use Ts1 as Tt; when Tt arrives, read and play from a first preset position in the first storage area.

It can be understood that the description of S1401 can refer to S1101.

S1402: Trigger decoding upon receiving a first data packet to obtain first data.

It can be understood that the description of S1402 can refer to S1102.

S1403: Store the first data in the second storage area.

It can be understood that the second storage area can be the pang buffer of the ping-pang buffer, the second buffer of a ring buffer, or part of another storage area; this is not limited.

S1404: After reading the first storage area, trigger the acquisition of data from the third storage area for decoding to obtain second data.

It can be understood that S1404 can refer to the relevant description of S1104.

S1405: Store the second data in the first storage area.

It can be understood that the first storage area may be the ping buffer of a ping-pang buffer, or may be the third storage area of a ring buffer. When a ring buffer is used, S1405 may be replaced by: storing the second data in the fourth storage area. For example, in a ring buffer, the fourth storage area may be the storage area adjacent to the second storage area.

It should be noted that Embodiment D may also involve third data, for example the frame following the second data; the third data can be stored in the same way as the first data. Different buffer types may use different storage methods. Storage and playback can follow the storage rules described above, which are not repeated here. Where there is no conflict, the technical features of Embodiment D apply to ping-pang buffers and ring buffers, and can also be applied to other types of storage. Embodiment D satisfies three conditions. First, T1 is earlier than the time at which the first storage area is read for the first time since Tt. Second, Tt is earlier than or equal to T2. Third, T2 is set so that the audio playback device has already received the second data sent by the sound source device at time T2.

In conjunction with FIG. 15, Example D below takes a Bluetooth headset as the audio playback device and a mobile phone as the audio source device. It uses a ping-pang buffer and applies LE Audio Bluetooth technology:

S1500: The headset receives a first data packet sent by the mobile phone at time Tr1, and triggers decoding of the data therein to obtain first data;

S1501: When the playback trigger time Tt arrives, the headset starts reading and playing from the start position of the ping buffer, where Tt is equal to the timestamp Ts1 corresponding to the first data;

S1502: The headset stores the first data in the pang buffer;

S1503: When the headset finishes reading the ping buffer, it triggers fetching data from the Bluetooth buffer and decoding it to obtain the second data;

S1504: The headset stores the second data in the ping buffer;

S1505: When the headset finishes playing the pang buffer (that is, when it starts playing the ping buffer again), it generates a second interrupt request. This request triggers decoding of the third data packet to obtain the third data;

S1506: The headset stores the third data in the pang buffer.

It can be understood that in Embodiment D, the headset can notify its audio module to decode the first packet at the time Tr1 at which it receives the first data packet seq1. The headset does not need to wait for the interrupt request dma_irq that is generated after the first storage area has been read. Instead, decoding is triggered at Tr1 to obtain the first data, which is placed in the second storage area, for example the pang buffer. The play trigger time Tt is set to the timestamp Ts1 corresponding to the first data. The theoretical delay of the first data is therefore the length of one ping buffer, that is, one frame (7.5 ms or 10 ms). Because the first data is already in the pang buffer, the headset starts to produce sound as soon as the pang buffer is played. This is exactly one buffer, that is, one frame duration, after the start point Tt. When render_time reaches the starting point of the pang buffer, one interrupt period irq_intervel has elapsed and the preset first interrupt request is generated. The audio module then fetches data from the Bluetooth buffer, decodes it to obtain the second data, and places the second data in the ping buffer.
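For comparison with the half-frame rule, the whole-frame storage of Embodiment D can be modelled the same way. The sketch below is a hypothetical Python model that assumes the following:
- an 8-sample toy frame;
- instantaneous decoding;
- one sample rendered per tick from Tt;
- an interrupt at every ping/pang boundary.

The first data goes into the pang buffer at Tr1, and later frames alternate between ping and pang.

```python
FRAME = 8  # toy frame length in samples (assumption)


def simulate_embodiment_d(num_frames):
    """Return {frame index: tick of its first rendered sample}, ticks counted from Tt = Ts1."""
    ping_pang = [None] * (2 * FRAME)  # ping = [0, FRAME), pang = [FRAME, 2 * FRAME)

    def store(k):
        # Whole-frame rule: frame 0 -> pang, frame 1 -> ping, frame 2 -> pang, ...
        start = ((k + 1) * FRAME) % len(ping_pang)
        for i in range(FRAME):
            assert ping_pang[start + i] is None, "overwrote unplayed samples"
            ping_pang[start + i] = (k, i)

    store(0)  # first data decoded at Tr1, before Tt
    first_play = {}
    for t in range((num_frames + 1) * FRAME):
        pos = t % len(ping_pang)
        # Interrupt at each ping/pang boundary triggers decoding of the next frame.
        if t > 0 and pos % FRAME == 0 and t // FRAME < num_frames:
            store(t // FRAME)
        sample, ping_pang[pos] = ping_pang[pos], None  # render and free the slot
        if sample is not None and sample[1] == 0:
            first_play[sample[0]] = t
    return first_play


print(simulate_embodiment_d(3))  # {0: 8, 1: 16, 2: 24}
```

Each frame starts exactly one frame (FRAME ticks) after its timestamp. This matches the one-frame theoretical delay described above. It is twice the half-frame delay obtained with the split storage of Embodiments B and C, in exchange for simpler whole-frame storage.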

Embodiment E provides a low-delay playback method that builds on Embodiment D. It also keeps the delay within one frame while ensuring continuous playback. In Embodiment D, decoding to obtain the first data is triggered when the first data sent by the sound source device is received (at moment Tr1, for example when the first data packet is received). According to the inventor's analysis, T1 can also be set in other ways. For example, T1 can be any moment from Tr1 until playback first reaches the second preset position. The second preset position is chosen so that the time from it to the end of playback of the first storage area is sufficient to decode the first data and store it in the second storage area. In other words, it suffices that the interval from T1 until playback reaches the position where the first data is stored (for example, the start of the second storage area) is long enough to decode the first data and store it there. This embodiment keeps the delay within one frame and ensures continuous playback. Embodiment E can be obtained from Embodiment D by, for example, replacing S1402 with: trigger decoding to obtain the first data at any moment from Tr1 until playback first reaches the second preset position. Here the time from the second preset position to the end of playback of the first storage area is sufficient to decode the first data and store it in the second storage area. For the other steps, refer to Embodiment D; they are not repeated here.
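The window for T1 in Embodiment E can be expressed with a little arithmetic. The sketch below is hypothetical Python and assumes the following:
- reading of the first storage area starts at its beginning at Tt and lasts one frame;
- decoding plus storage takes a fixed, known time.

The function names and the example numbers (a 10 ms frame at 48 kHz, 2.5 ms to decode and store) are illustrative, not taken from the application.

```python
def second_preset_position(frame_samples, decode_store_us, sample_rate_hz):
    """Sample index in the first storage area after which there is no longer
    enough time left to decode the first data and store it in the second area."""
    needed = -(-(decode_store_us * sample_rate_hz) // 1_000_000)  # ceil to whole samples
    return frame_samples - needed


def t1_window(tr1_us, tt_us, frame_us, decode_store_us):
    """Allowed range for T1: from Tr1 until playback first reaches the second preset
    position. Returns None if the first data arrives too late for a one-frame delay."""
    latest = tt_us + frame_us - decode_store_us
    return (tr1_us, latest) if tr1_us <= latest else None


print(second_preset_position(480, 2_500, 48_000))  # 360
print(t1_window(-3_000, 0, 10_000, 2_500))         # (-3000, 7500)
print(t1_window(8_000, 0, 10_000, 2_500))          # None
```

Embodiment D is the special case T1 = Tr1, the earliest point of this window.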

It can be understood that, in different embodiments, the ranges of T1 and T2 are described either by preset positions of the playback storage area, such as the first preset position and the second preset position, or by preset moments, such as Ta, Tb, Tc and Td. These descriptions serve similar functions: where there is no conflict, they correspond to one another and can be used interchangeably.

It should be noted that, where there is no conflict, the technical features in the above embodiments apply to ping-pang buffers and ring buffers and can also be applied to other types of storage. Embodiments A-E all satisfy the following conditions:
- T1 is equal to or later than the time Tr1 at which the audio playback device receives the first data;
- T2 is equal to or later than the time Tr2 at which it receives the second data;
- T2 is equal to or later than Tt;
- the first data starts to play before the second storage area is read for the first time since Tt.

Optimizing Tt, T1, T2 and the storage rules therefore achieves low-latency audio playback while preserving playback continuity, which improves the user experience.
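These shared conditions can be written as a small checker. The sketch below is hypothetical Python: the Schedule fields, the microsecond units and the example values are assumptions chosen to resemble Embodiments B and D with a 10 ms frame. The last condition is checked with "not later than" rather than strictly "before". This is an interpretation: in Embodiment D the first data is itself stored in the second storage area and starts exactly when that area is first read.

```python
from dataclasses import dataclass


@dataclass
class Schedule:
    """Times in microseconds, relative to an arbitrary origin (assumption)."""
    tr1: int                     # first data received
    t1: int                      # decoding of the first data triggered
    tr2: int                     # second data received
    t2: int                      # decoding of the second data triggered
    tt: int                      # playback trigger time
    first_play_start: int        # first data starts to sound
    second_area_first_read: int  # second storage area first read since Tt


def check_schedule(s):
    """Return the conditions from the text that the schedule violates."""
    problems = []
    if s.t1 < s.tr1:
        problems.append("T1 is earlier than Tr1")
    if s.t2 < s.tr2:
        problems.append("T2 is earlier than Tr2")
    if s.t2 < s.tt:
        problems.append("T2 is earlier than Tt")
    # Equality is accepted so that Embodiment D passes (see the lead-in).
    if s.first_play_start > s.second_area_first_read:
        problems.append("first data starts after the second storage area is read")
    return problems


# Illustrative values: Embodiment B (half-frame delay) and Embodiment D (one-frame delay).
emb_b = Schedule(-2_000, -2_000, 6_000, 10_000, 0, 5_000, 10_000)
emb_d = Schedule(-2_000, -2_000, 6_000, 10_000, 0, 10_000, 10_000)
print(check_schedule(emb_b), check_schedule(emb_d))  # [] []
```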

Some of the above embodiments can also be applied to TWS (true wireless stereo) audio playback devices, such as TWS earphones and TWS speakers. A TWS audio playback device may include a first audio playback device and a second audio playback device, both of which can apply the low-latency playback method of the above embodiments. Each of the two devices receives data packets sent by the audio source device, and the packets received by the two devices may be the same or different. For example, the first audio playback device receives left-channel data and the second audio playback device receives right-channel data. The left- and right-channel data of the same frame carry the same timestamp, which ensures that the two devices can play audio synchronously.

In some embodiments, a low-latency playback method is applied to a playback device comprising a first audio playback device and a second audio playback device. Each of the two devices has the playback storage area described in any of the foregoing embodiments. Each device executes the method of any of the foregoing embodiments (see those embodiments for the specific steps) to achieve synchronous playback. The first data of the first audio playback device and the first data of the second audio playback device have the same timestamp Ts1, and their second data have the same timestamp Ts2. Tt is determined according to Ts1, so the two devices start reading their respective playback storage areas at the same playback trigger time Tt. In addition, the two devices use the same time T1, so they decode their first data synchronously. Because they also share Ts2, their second data is played synchronously as well. For example, if a playback error causes the playback time of the second data on one device to exceed Ts2, that second data is not played, which keeps playback synchronized.
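The synchronization rule can be sketched as follows. Both earbuds render a frame only at its timestamp plus a shared offset (Tt - Ts1), and drop it if it cannot be ready by then. The Python below is a hypothetical illustration. It assumes each earbud knows the earliest time each frame could start. The function name, the offset and the example times are illustrative, not taken from the application.

```python
def render_schedule(frames, offset_us):
    """frames: list of (timestamp_us, ready_us) for one earbud, where ready_us is the
    earliest time the frame could start. Returns {timestamp: render time} for the
    frames that are played; a frame not ready by timestamp + offset is dropped."""
    return {ts: ts + offset_us for ts, ready in frames if ready <= ts + offset_us}


OFFSET = 5_000  # shared Tt - Ts1, e.g. half of a 10 ms frame (assumption)
left = [(0, 5_000), (10_000, 15_000), (20_000, 25_000)]
right = [(0, 5_000), (10_000, 15_400), (20_000, 25_000)]  # second frame late on the right

left_plan = render_schedule(left, OFFSET)
right_plan = render_schedule(right, OFFSET)
print(left_plan)   # {0: 5000, 10000: 15000, 20000: 25000}
print(right_plan)  # {0: 5000, 20000: 25000}
```

The right earbud skips its late frame instead of playing it 0.4 ms late. The third frame therefore still starts at the same time on both sides.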

In some embodiments, the first data of the first audio playback device and the first data of the second audio playback device may be the same. In some embodiments, the first data of the first audio playback device and the first data of the second audio playback device may also be different, for example, one is the left channel and the other is the right channel. For example, the first audio playback device is the left earphone of the TWS headset, and the second audio playback device is the right earphone of the TWS headset.

In some embodiments, the timestamp corresponding to each frame of data played by the first audio playback device is the same as the timestamp corresponding to each frame of data played by the second audio playback device, thereby ensuring the synchronous playback of each frame of data by the first audio playback device and the second audio playback device, so that the sound heard by the user remains consistent. Each of the corresponding frames can be understood as a frame of data with the same synchronization reference time sent to the first audio playback device and the second audio playback device respectively.

It can be understood that low latency and continuous playback are ensured by the following settings:
- T1 is set earlier than T2, that is, the first data is decoded before the second data.
- T1 is set earlier than the moment at which the first storage area is read for the first time since Tt. Decoding and storage are therefore triggered early, which reduces the audio delay.
- Tt is set earlier than or equal to T2. Playback therefore does not wait for the second data to be decoded, which would otherwise add delay.
- T2 is set so that the audio playback device has already received the second data sent by the sound source device at T2. The next frame can then be decoded to obtain the second data, which keeps playback continuous.

The present application also provides a low-latency playback device, the low-latency playback device comprising: a playback storage area, the playback storage area comprising a first storage area and a second storage area;

The audio module is configured to trigger decoding at a first time T1 to obtain first data, and store the first data in a playback storage area according to a first storage rule; trigger decoding at a second time T2 to obtain second data, and store the second data in the playback storage area according to a second storage rule;

where T1 is equal to or later than the time Tr1 at which the first data is received, T2 is equal to or later than the time Tr2 at which the second data is received, T2 is equal to or later than Tt, and the first data starts to be played before the second storage area is read for the first time since Tt.

In some embodiments, the device also includes: a communication module, the communication module includes a wireless transceiver and a communication storage area, the wireless transceiver is configured to receive a data packet sent by a sound source device, and the communication storage area is configured to store data to be decoded carried in the data packet, and the data to be decoded includes first data to be decoded and/or second data to be decoded.

In some embodiments, the wireless transceiver is configured to sequentially receive a first data packet and a second data packet sent by a sound source device, wherein the first data packet contains first data to be decoded and the second data packet contains second data to be decoded; and store the first data to be decoded and/or the second data to be decoded in a communication storage area.

In some embodiments, the audio module is configured to store the first data in a first storage area; and store the second data in a second storage area.

In some embodiments, the audio module is configured to divide the first data into a first portion of data and a second portion of data in playback order. According to a first storage rule, it stores the first portion of data in a first area of the first storage area and the second portion of data in a second area of the second storage area, the first area being adjacent to the second area. The audio module is further configured to divide the second data into a third portion of data and a fourth portion of data in playback order. According to a second storage rule, it stores the third portion of data in an area of the second storage area other than the second area, and stores the fourth portion of data in an area of the first storage area other than the first area; or

The audio module is configured to divide the second data into a third part of data and a fourth part of data, and store the third part of data in an area of the second storage area except the second area according to the second storage rule; and store the fourth part of data in a third area of the fourth storage area, the third area is adjacent to the second storage area, and the fourth storage area is a storage area in the playback storage area adjacent to the second storage area.

In some embodiments, the audio playback device is configured to equally divide the first data into a first portion of data and a second portion of data, the first portion of data is stored in the second half of the first storage area, the second portion of data is stored in the first half of the second storage area, and the first portion of data and the second portion of data are adjacent; to equally divide the second data into a third portion of data and a fourth portion of data, the third portion of data is stored in the second half of the second storage area, the fourth portion of data is stored in the first half of the first storage area, or the fourth portion of data is stored in the first half of the fourth storage area, and the third portion of data is adjacent to the fourth portion of data.
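The equal-split rule generalizes from the ping-pang buffer to a ring buffer: every frame straddles the boundary between two adjacent storage areas. The sketch below is hypothetical Python. It assumes areas are numbered 0 (first storage area), 1 (second storage area), 2 (fourth storage area), and so on, and that frame 0 starts in the second half of area 0. The function and constant names are illustrative.

```python
FIRST_HALF, SECOND_HALF = 0, 1


def half_frame_placement(k, num_areas):
    """Return [(area, half) for the first portion, (area, half) for the second portion]
    of frame k. num_areas == 2 models the ping-pang buffer; num_areas > 2 models a
    ring buffer in which the next frame moves on to the adjacent area."""
    if num_areas < 2:
        raise ValueError("need at least two storage areas")
    first_portion = (k % num_areas, SECOND_HALF)
    second_portion = ((k + 1) % num_areas, FIRST_HALF)
    return [first_portion, second_portion]


print(half_frame_placement(1, 2))  # [(1, 1), (0, 0)]: fourth portion back in the first area
print(half_frame_placement(1, 3))  # [(1, 1), (2, 0)]: fourth portion in the fourth storage area
```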

In some embodiments, the audio module stores the first data in the second storage area according to the first storage rule; stores the second data in the first storage area according to the second storage rule; or stores the second data in the fourth storage area, which is a storage area in the playback storage area adjacent to the second storage area.

In some embodiments, the audio module is configured to trigger decoding to obtain the first data when the communication module receives the first data packet.

In some embodiments, the audio module is configured to trigger decoding to obtain the second data when the playback module reads a middle position of the first storage area.

In some embodiments, the audio module is configured to trigger decoding to obtain the second data when the playback module reads the end position of the first storage area.

In some embodiments, the audio module is configured to trigger decoding at a play triggering time Tt to obtain the first data.

In some embodiments, the audio module is configured to, in response to a first interrupt request generated at the second time T2, obtain data and decode it to obtain the second data.

In some embodiments, the audio module is configured to, in response to a second interrupt request generated at a third time T3, obtain data and decode it to obtain third data, and store the third data in the playback storage area according to a third storage rule.

In some embodiments, the time interval between the first interrupt request and the second interrupt request is at least the duration of a frame of data.

In some embodiments, the first data and the second data are each one frame of data, and the first storage area and the second storage area are each sized to store one frame of data.

The above embodiments, implementation methods and related technical features can be combined with each other to correspond to the previous embodiments, such as embodiments A-E, when they do not conflict, and will not be described in detail. Through the above various combinations, the playback delay of the first data, such as the first frame data of music, can be reduced, so that the user can hear the music earlier when playing music. In addition, the continuity of audio playback can be guaranteed by setting parameters such as T1, T2, storage rules, etc., thereby improving the user experience.

The embodiment of the present application provides an audio playback device, comprising: a playback buffer area, the playback buffer area comprising a first buffer area and a second buffer area;

In some embodiments, the playback unit includes a digital-to-analog converter DAC and a speaker, the digital-to-analog converter is configured to perform digital-to-analog conversion on the data in the playback storage area; the speaker is configured to play the data after the digital-to-analog conversion;

The audio playback device also includes: a communication module, the communication module includes a wireless transceiver and a communication buffer area, the wireless transceiver is configured to receive a data packet sent by the audio source device, and the communication buffer area is configured to store the data to be decoded carried in the data packet, and the data to be decoded includes first data to be decoded and/or second data to be decoded.

For the settings of the relevant modules and the relevant parameters such as T1/T2/Tt in the audio playback device, reference may be made to the description of the previous relevant embodiments, which will not be repeated here.

An embodiment of the present application discloses an electronic device, comprising at least one processor, at least one wireless transceiver and at least one memory, wherein the memory stores a computer program, and the computer program is executed by the at least one processor so that the electronic device executes any one of the low-latency playback methods described in the above embodiments.

An embodiment of the present application discloses a storage medium storing a computer program, wherein when the computer program is executed by a processor, an audio playback device implements any one of the low-latency playback methods described in the above embodiments.

Those skilled in the art can understand that all or part of the processes in the above-mentioned embodiments can be implemented by instructing related hardware through a computer program, and the program can be stored in a non-volatile computer-readable storage medium, and when the program is executed, it can include the processes of the embodiments of the above-mentioned methods. The storage medium can be a disk, an optical disk, a ROM, etc.

As used herein, any reference to memory, storage, database, or other medium may include nonvolatile and/or volatile memory. Suitable nonvolatile memory may include ROM, Programmable ROM (PROM), Erasable PROM (EPROM), Electrically Erasable PROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM), which is used as external cache memory. By way of illustration and not limitation, RAM can be in many forms, such as static RAM (SRAM), dynamic random access memory (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced synchronous DRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus DRAM (RDRAM) and direct Rambus dynamic RAM (DRDRAM).

It should be understood that the "one embodiment" or "an embodiment" mentioned throughout the specification means that the specific features, structures or characteristics related to the embodiment are included in at least one embodiment of the present application. Therefore, the "in one embodiment" or "in an embodiment" appearing throughout the specification does not necessarily refer to the same embodiment. In addition, these specific features, structures or characteristics can be combined in one or more embodiments in any suitable manner. Those skilled in the art should also know that the embodiments described in the specification are all optional embodiments. The actions and modules are not necessarily required by this application. It should be noted that "multiple" in this application includes "two or more".

In the various embodiments of the present application, it should be understood that the size of the serial numbers of the above-mentioned processes does not necessarily mean the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.

The units described above as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above-mentioned integrated unit may be implemented in the form of hardware or in the form of software functional units.

The technical features of the above-described embodiments may be arbitrarily combined. To make the description concise, not all possible combinations of the technical features in the above-described embodiments are described. However, as long as there is no contradiction in the combination of these technical features, they should be considered to be within the scope of this specification.

The above is a detailed introduction to a low-latency playback method, device, electronic device, and storage medium disclosed in the embodiments of the present application. Specific examples are used in this article to illustrate the principles and implementation methods of the present application. The description of the above embodiments is only used to help understand the method of the present application and its core idea. At the same time, for those of ordinary skill in the art, according to the ideas of the present application, there will be changes in the specific implementation methods and application scopes. In summary, the content of this specification should not be understood as a limitation on the present application.

Claims

1. A low-latency playback method, applied to an audio playback device, wherein the audio playback device comprises a playback storage area, the playback storage area comprises a first storage area and a second storage area, and the method comprises:

When the play trigger time Tt arrives, reading the playback storage area starting from the first preset position of the first storage area and playing the data read;

Trigger decoding at a first time T1 to obtain first data, and store the first data in the playback storage area according to a first storage rule;

Trigger decoding at a second time T2 to obtain second data, and store the second data in the playback storage area according to a second storage rule;

wherein T1 is equal to or later than the time Tr1 at which the audio playback device receives the first data, T2 is equal to or later than the time Tr2 at which the audio playback device receives the second data, T2 is equal to or later than Tt, and the first data starts to be played before the second storage area is read for the first time starting from Tt.
2. The method according to claim 1, wherein the audio playback device further comprises a third storage area for storing data to be decoded, the data to be decoded comprising first data to be decoded and/or second data to be decoded; and the method further comprises: sequentially receiving a first data packet and a second data packet sent by a sound source device, the first data packet including the first data to be decoded, and the second data packet including the second data to be decoded; and storing the first data to be decoded and/or the second data to be decoded in the third storage area.
3. The method according to claim 1, wherein storing the first data in the playback storage area according to a first storage rule comprises: storing the first data in the first storage area;

The storing the second data in the playback storage area according to a second storage rule includes: storing the second data in the second storage area.
4. The method according to claim 3, wherein the playback trigger moment Tt is determined according to the timestamp Ts1 corresponding to the first data and a first preset time period; the first preset time period is sufficient to decode and obtain the first data and store the first data in the first storage area, and the first preset time period is less than the total time length for reading the first storage area and the second storage area; T1 is earlier than or equal to Ta, and the time from Ta to Tt is sufficient to decode and obtain the first data and store it in the first storage area; T2 is earlier than or equal to Tb, and the time from Tb to the completion of reading the first data is sufficient to decode and obtain the second data and store the second data in the second storage area.
5. The method according to claim 1, wherein storing the first data in the playback storage area according to the first storage rule comprises: dividing the first data into a first portion of data and a second portion of data according to a playback order, the first portion of data is stored in a first area of the first storage area, the second portion of data is stored in a second area of the second storage area, and the first area is adjacent to the second area; storing the second data in the playback storage area according to the second storage rule comprises: dividing the second data into a third portion of data and a fourth portion of data according to a playback order, storing the third portion of data in an area of the second storage area excluding the second area; storing the fourth portion of data in an area of the first storage area excluding the first area; or,

Storing the second data in the playback storage area according to a second storage rule includes dividing the second data into a third portion of data and a fourth portion of data, storing the third portion of data in an area of the second storage area excluding the second area; storing the fourth portion of data in a third area of a fourth storage area, the third area being adjacent to the second storage area, and the fourth storage area being a storage area in the playback storage area adjacent to the second storage area.
6. The method according to claim 5, wherein T1 is earlier than or equal to Tc, and the time from Tc until playback reaches the location in the first storage area where the first data is stored is sufficient to decode the first data and store it according to the first storage rule; T2 is earlier than or equal to Td, and the time from Td until the first data has been read is sufficient to decode the second data and store it according to the second storage rule; Tt is determined according to the timestamp Ts1 corresponding to the first data and a second preset time period, the second preset time period being greater than or equal to 0.
7. The method according to claim 5, wherein the first data is equally divided into the first portion of data and the second portion of data, the first portion of data is stored in the second half of the first storage area, the second portion of data is stored in the first half of the second storage area, and the first portion of data and the second portion of data are adjacent; the second data is equally divided into the third portion of data and the fourth portion of data, the third portion of data is stored in the second half of the second storage area, the fourth portion of data is stored in the first half of the first storage area or in the first half of the fourth storage area, and the third portion of data is adjacent to the fourth portion of data.
The method according to claim 1, wherein storing the first data in the playback storage area according to the first storage rule comprises storing the first data in the second storage area; and storing the second data in the playback storage area according to the second storage rule comprises storing the second data in the first storage area, or storing the second data in a fourth storage area, the fourth storage area being a storage area in the playback storage area adjacent to the second storage area.
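Under this whole-frame rule each frame occupies exactly one storage area, so the target area is simply the next slot of a ring. A minimal sketch, assuming the areas are held in a Python list and frames are numbered from 0 starting with the first data; `store_whole_frame` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def store_whole_frame(areas: List[Optional[List[int]]], frame_index: int,
                      frame: Sequence[int]) -> int:
    """Store one whole frame and return the index of the area used.

    Assumed mapping: frame 0 (the first data) goes to area 1 (the second storage
    area); frame 1 (the second data) goes to area 0 when there are two areas
    (ping/pong), or to area 2, the 'fourth storage area', when the ring is larger.
    """
    index = (1 + frame_index) % len(areas)
    areas[index] = list(frame)
    return index
```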
The method according to claim 8, wherein T1 is earlier than or equal to Te, and the interval from Te to the completion of reading the first storage area is sufficient to decode the first data and store it in the second storage area; T2 is earlier than or equal to Tf, and the interval from Tf to the completion of reading the first data is sufficient to decode the second data and store it in the first storage area or the fourth storage area; and Tt is determined according to the timestamp Ts1 corresponding to the first data and a third preset time period, the third preset time period being greater than or equal to 0 and less than the duration of reading the first storage area.
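A sketch of the two timing quantities in this claim, assuming that "determined according to Ts1 and a third preset time period" means a plain sum and that reading runs at a constant sample rate from the first preset position; both function names are hypothetical.

```python
def whole_frame_trigger_us(ts1_us: int, offset_us: int, area_read_us: int) -> int:
    """Tt = Ts1 + third preset period (assumed to be a plain sum), in microseconds.

    The claim requires 0 <= offset < duration of reading the first storage area.
    """
    if not 0 <= offset_us < area_read_us:
        raise ValueError("offset must satisfy 0 <= offset < duration of reading the first area")
    return ts1_us + offset_us


def latest_first_decode_us(tt_us: int, start_pos: int, area_len: int,
                           sample_rate_hz: int, decode_store_us: int) -> int:
    """Te-like deadline: reading of the first area ends after its remaining samples
    (start_pos .. area_len - 1) have been read; decode and store must finish by then."""
    end_of_first_area_us = tt_us + (area_len - start_pos) * 1_000_000 // sample_rate_hz
    return end_of_first_area_us - decode_store_us
```

For instance, with Ts1 = 20 ms, a 2 ms preset period, and a 10 ms area (480 samples at 48 kHz) read from its start, Tt is 22 ms, reading of the first area ends at 32 ms, and with 1.5 ms to decode and store the first data must be triggered by 30.5 ms.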
The method according to any one of claims 1-4, wherein the playback trigger moment Tt is equal to the sum of the timestamp Ts1 corresponding to the first data and the duration of half a frame of data.
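Written as a formula, with $N$ samples per frame and a sampling rate $f_s$ (symbols introduced here for illustration, not taken from the claims):

$$T_t = T_{s1} + \frac{1}{2}\cdot\frac{N}{f_s}$$

Assuming, for example, 10 ms frames (480 samples at 48 kHz), this gives $T_t = T_{s1} + 5\,\text{ms}$.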
The method according to any one of claims 1 to 9, wherein triggering decoding at a first moment T1 to obtain the first data comprises: triggering decoding to obtain the first data upon receiving the first data packet.
The method according to any one of claims 1 to 4, wherein triggering decoding at the second time T2 to obtain the second data comprises:

Triggering decoding to obtain the second data when the middle position of the first storage area is read.
The method according to any one of claims 1-2 or 5-9, wherein the triggering decoding at the second time T2 to obtain the second data comprises:

Triggering decoding to obtain the second data when the end position of the first storage area is read.
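The middle-position and end-position triggers can be checked with a small slot-level simulation. This is a hypothetical model rather than the application's implementation: `trigger_offset = area_len // 2` stands for the middle position, `trigger_offset = area_len - 1` for the end position (taken here as the last slot of an area), decoding is assumed to take a fixed `decode_latency` measured in samples, and a trigger is assumed to fire at that offset in every area so that later frames are handled like the second data.

```python
from typing import List, Tuple


def plays_without_underrun(area_len: int, num_areas: int, first_store: int,
                           trigger_offset: int, decode_latency: int, frames: int) -> bool:
    """Simulate slot-by-slot reading and report whether every frame is ready in time.

    Hypothetical model, all units in samples (one tick = one sample read):
    - at Tt the reader starts at offset 0 of area 0 and advances one slot per tick;
    - the first data (frame 0) is already stored from offset `first_store`;
    - whenever the read position is at `trigger_offset` within an area, the next
      frame is decoded; it becomes visible `decode_latency` ticks later and is
      written contiguously after the previous frame, wrapping around the ring.
    """
    total = area_len * num_areas
    slots = [-1] * total                          # frame id held by each slot, -1 = stale
    for i in range(area_len):
        slots[(first_store + i) % total] = 0
    next_frame = 1
    next_write = (first_store + area_len) % total
    pending: List[Tuple[int, int, int]] = []      # (ready tick, frame id, write offset)
    lead_in = first_store % total                 # ticks until the reader reaches frame 0

    for tick in range(lead_in + frames * area_len):
        # Apply decodes whose latency has elapsed before this tick's read.
        for job in [j for j in pending if j[0] <= tick]:
            _, frame_id, offset = job
            for i in range(area_len):
                slots[(offset + i) % total] = frame_id
            pending.remove(job)
        pos = tick % total
        if tick >= lead_in and slots[pos] != (tick - lead_in) // area_len:
            return False                          # stale or not-yet-decoded data would play
        if pos % area_len == trigger_offset and next_frame < frames:
            pending.append((tick + decode_latency, next_frame, next_write))
            next_frame += 1
            next_write = (next_write + area_len) % total
    return True
```

With two 4-sample areas and the first data stored from offset 2, for example, a middle trigger with a decode latency of 3 samples plays every frame in order, whereas a latency of 5 samples does not.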
The method according to any one of claims 1-2 or 5-9, wherein Tt is equal to the timestamp Ts1 corresponding to the first data.
The method according to any one of claims 1-2 or 5-9, wherein triggering decoding at the first moment T1 to obtain the first data includes: triggering decoding at the playback triggering moment Tt to obtain the first data.
The method according to claim 2, wherein triggering decoding at the second time T2 to obtain the second data comprises:

Generating a first interrupt request at the second time T2, and, in response to the first interrupt request, acquiring data and decoding it to obtain the second data;

The method further comprises: generating a second interrupt request at a third time T3, decoding data acquired in response to the second interrupt request to obtain third data, and storing the third data in the playback storage area according to a third storage rule;

The time interval between the first interrupt request and the second interrupt request is at least the duration of a frame of data.
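The interrupt-driven variant can be sketched as a handler that decodes one frame per interrupt and rejects interrupts spaced closer than one frame. The class name, its fields, and the choice to raise an error on a too-early interrupt are assumptions for illustration; fetching, decoding, and storing are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DecodeIrqHandler:
    """Hypothetical handler for the periodic decode interrupts (times in microseconds)."""

    frame_us: int                      # duration of one frame of data
    last_irq_us: Optional[int] = None  # moment of the previous decode interrupt
    next_frame: int = 1                # frame 0 (first data) was decoded at T1

    def on_interrupt(self, now_us: int) -> int:
        """Handle one decode interrupt and return the index of the frame it decodes.

        Index 1 is the second data (interrupt at T2), index 2 the third data (T3), and so on.
        """
        if self.last_irq_us is not None and now_us - self.last_irq_us < self.frame_us:
            raise RuntimeError("decode interrupts closer than one frame duration")
        self.last_irq_us = now_us
        index = self.next_frame
        # A real handler would fetch encoded data here, decode it, and store the result
        # according to the storage rule for this frame; that part is not modelled.
        self.next_frame += 1
        return index
```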
The method according to claim 1, wherein the first storage area is a ping buffer, and the second storage area is a pong buffer; or, the first storage area and the second storage area are two adjacent buffer areas in a ring buffer.
The method according to claim 1, wherein the audio playback device and the audio source device communicate via LE Audio Bluetooth technology.
The method according to claim 1, wherein the first data and the second data are both one frame of data; and the first storage area and the second storage area are both sized to store one frame of data.
The method according to claim 1, wherein T2 is earlier than or equal to Td, and the interval from Td to the completion of reading the first data is sufficient to decode the second data and store it according to the second storage rule.
A low-latency playback method, applied to a playback device, wherein the playback device comprises a first audio playback device and a second audio playback device, each of which has a playback storage area as claimed in claim 1, and the first audio playback device and the second audio playback device each execute the method as claimed in claim 1 to achieve synchronous playback, wherein the first data of the first audio playback device and the first data of the second audio playback device have the same timestamp Ts1, the second data of the first audio playback device and the second data of the second audio playback device have the same timestamp Ts2, and Tt is determined according to Ts1.
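Synchronization follows from both devices deriving Tt from the same Ts1 by the same rule. The sketch assumes each device can map the shared time base onto its own clock through a known offset (how that mapping is established is not modelled) and that Tt is Ts1 plus a common preset period; `Earbud` and its methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Earbud:
    """Hypothetical model of one audio playback device in a pair (times in microseconds)."""

    name: str
    clock_offset_us: int  # assumed: local clock reading = shared clock reading + offset

    def local_trigger_us(self, ts1_us: int, preset_us: int) -> int:
        """Tt expressed on this device's own clock, derived only from the shared Ts1."""
        return ts1_us + preset_us + self.clock_offset_us

    def to_shared_us(self, local_us: int) -> int:
        """Convert a local clock reading back to the shared time base."""
        return local_us - self.clock_offset_us
```

Although the two devices schedule playback at different local clock readings, both moments map to the same point on the shared time base, which is what makes the playback synchronous.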
A low-latency playback device, wherein the low-latency playback device comprises: a playback storage area, the playback storage area comprises a first storage area and a second storage area;

A playback module configured to read and play the playback storage area from a first preset position of the first storage area when a playback trigger moment Tt arrives;

An audio module configured to trigger decoding at a first time T1 to obtain first data and store the first data in the playback storage area according to a first storage rule, and to trigger decoding at a second time T2 to obtain second data and store the second data in the playback storage area according to a second storage rule;

Wherein T1 is equal to or later than the time Tr1 at which the first data is received, T2 is equal to or later than the time Tr2 at which the second data is received, T2 is equal to or later than Tt, and the first data starts to play before the second storage area is read for the first time after Tt.
The device according to claim 22, wherein the device further comprises: a communication module, the communication module comprising a wireless transceiver and a communication storage area, the wireless transceiver is configured to receive a data packet sent by a sound source device, and the communication storage area is configured to store data to be decoded carried in the data packet, and the data to be decoded includes the first data to be decoded and/or the second data to be decoded.
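The split between the communication module and the audio module can be sketched as a queue hand-off: received packets wait in the communication storage area until a decode trigger consumes them, so a trigger cannot take effect before the data has been received. The class names, the `(timestamp, payload)` packet shape, and the identity "decode" placeholder are assumptions for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple


@dataclass
class CommunicationModule:
    """Stands in for the wireless transceiver plus the communication storage area."""

    pending: Deque[Tuple[int, bytes]] = field(default_factory=deque)

    def on_packet(self, timestamp_us: int, encoded: bytes) -> None:
        """Called when the transceiver receives a packet from the sound source device."""
        self.pending.append((timestamp_us, encoded))


@dataclass
class AudioModule:
    """Decodes data taken from the communication storage area."""

    comm: CommunicationModule
    decoded: List[Tuple[int, bytes]] = field(default_factory=list)

    def trigger_decode(self) -> Tuple[int, bytes]:
        """Decode the oldest received frame; fails if nothing has been received yet,
        mirroring the requirement that a decode trigger is never earlier than reception."""
        if not self.comm.pending:
            raise LookupError("no received data to decode yet")
        timestamp_us, encoded = self.comm.pending.popleft()
        frame = encoded  # placeholder: a real device would run its audio codec here
        self.decoded.append((timestamp_us, frame))
        return timestamp_us, frame
```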
An audio playback device, comprising: a playback storage area, the playback storage area comprising a first storage area and a second storage area;

A decoding unit configured to trigger decoding at a first time T1 to obtain first data and store the first data in the playback storage area according to a first storage rule, and to trigger decoding at a second time T2 to obtain second data and store the second data in the playback storage area according to a second storage rule;

A playback unit configured to read and play the playback storage area from a first preset position in the first storage area when a playback trigger moment Tt arrives;

Wherein T1 is equal to or later than the time Tr1 at which the audio playback device receives the first data, T2 is equal to or later than the time Tr2 at which the audio playback device receives the second data, T2 is equal to or later than Tt, and the first data starts to be played before the second storage area is read for the first time after Tt.
The audio playback device according to claim 24, wherein:

The playback unit includes a digital-to-analog converter and a speaker, wherein the digital-to-analog converter is configured to perform digital-to-analog conversion on the data in the playback storage area; and the speaker is configured to play the data after the digital-to-analog conversion;

The audio playback device also includes: a communication module, the communication module includes a wireless transceiver and a communication buffer area, the wireless transceiver is configured to receive a data packet sent by a sound source device, and the communication buffer area is configured to store data to be decoded carried in the data packet, and the data to be decoded includes the first data to be decoded and/or the second data to be decoded.
An electronic device, comprising: at least one processor, at least one wireless transceiver and at least one memory, wherein the memory stores a computer program, and the computer program, when executed by the processor, causes the electronic device to implement the method according to any one of claims 1-21.
A storage medium storing a computer program, wherein the computer program, when executed by a processor, enables an audio playback device to implement the method as described in any one of claims 1 to 21.