WO2022179246A1

WO2022179246A1 - Audio playback method, device and system

Info

Publication number: WO2022179246A1
Application number: PCT/CN2021/136897
Authority: WO
Inventors: Peng Zhengyuan (彭正元)
Original assignee: Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date: 2021-02-27
Filing date: 2021-12-09
Publication date: 2022-09-01
Also published as: CN114974321B; CN114974321A

Abstract

An audio playback method, device and system, relating to the technical field of audio. The playback progress of an audio playback device can be kept consistent with that of an audio source, which improves the listening experience of users. In the audio playback method, after receiving audio data sent by an audio source device, the audio playback device divides the audio data into a plurality of audio segments and adjusts the expected playback time points of subsequent audio segments according to the change trend of the number of audio segments in its buffer, so as to keep its playback progress consistent with the playback progress (or delivery progress) of the audio source device. Alternatively, the audio playback device can adjust its playback speed according to the deviation between the actual playback time point and the expected playback time point of each audio segment, so that its playback speed matches the playback speed (or delivery speed) of the audio source device and its playback progress stays consistent with the playback progress (or delivery progress) of the audio source device.

Description

An audio playback method, device and system

This application claims priority to Chinese Patent Application No. 202110221879.2, filed with the State Intellectual Property Office on February 27, 2021 and entitled "An audio playback method, device and system", the entire contents of which are incorporated by reference in this application.

Technical field

The present application relates to the field of audio technology, and in particular, to an audio playback method, device, and system.

Background

Smart devices can be connected to audio playback devices (such as headphones and speakers) through wireless communication methods such as Bluetooth and Wi-Fi and play audio content through these audio playback devices, while the smart device itself displays the video picture. Because the crystal oscillators of the smart device and the audio playback device differ, the two devices play back at slightly different speeds, so the video picture displayed by the smart device may drift out of sync with the audio content played by the audio playback device. The drift becomes more obvious the longer playback continues, resulting in a poor user experience.

SUMMARY OF THE INVENTION

To solve the above technical problem, the present application provides an audio playback method, device and system. The technical solution provided by the present application keeps the playback progress of the audio playback device consistent with that of the audio source device (i.e., the smart device), improving the user experience, especially the listening experience.

In a first aspect, an audio playback method is provided, which is applied to a first audio playback device that communicates wirelessly with an audio source device. The method includes: receiving audio data sent by the audio source device; dividing the audio data into N audio segments; buffering the N audio segments; obtaining an expected playback time point of each audio segment according to a first adjustment coefficient; playing the audio segments in sequence; periodically collecting the current number of buffered audio segments and the collection time point corresponding to the current number; after the periodic collection has lasted a preset duration, or after the number of collections reaches a preset number of times, obtaining a second adjustment coefficient according to the current number collected each time and the collection time point corresponding to it; obtaining expected playback time points of subsequent audio segments according to the second adjustment coefficient; and playing the subsequent audio segments in sequence. N is a positive integer greater than or equal to 2, and the first adjustment coefficient is a preset coefficient.
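The first aspect does not state how an adjustment coefficient is turned into expected playback time points. The Python sketch below assumes one plausible reading: every segment has the same nominal duration, and the coefficient scales the interval between consecutive expected playback time points (1.0 meaning nominal speed). The helper names, the 48 kHz / 20 ms figures and the linear formula are illustrative assumptions, not details taken from the application.

```python
from typing import List


def split_into_segments(samples: List[int], samples_per_segment: int) -> List[List[int]]:
    """Divide received audio data into consecutive segments of equal size.

    Hypothetical helper: the application only requires N >= 2 segments;
    equal-size splitting is an assumption (the last segment may be shorter).
    """
    if samples_per_segment <= 0:
        raise ValueError("samples_per_segment must be positive")
    return [samples[i:i + samples_per_segment]
            for i in range(0, len(samples), samples_per_segment)]


def expected_playback_times(start_time: float, count: int,
                            segment_duration: float,
                            coefficient: float) -> List[float]:
    """Expected playback time point of each of `count` segments.

    Assumed model: segment k is expected at
    start_time + k * segment_duration * coefficient, so a coefficient below
    1.0 shortens the interval (faster playback) and above 1.0 lengthens it.
    """
    interval = segment_duration * coefficient
    return [start_time + k * interval for k in range(count)]


if __name__ == "__main__":
    pcm = list(range(4800))                    # 0.1 s of mono 48 kHz samples (assumed)
    segments = split_into_segments(pcm, 960)   # 960 samples = 20 ms at 48 kHz
    first_coefficient = 1.0                    # preset first adjustment coefficient (assumed value)
    times = expected_playback_times(0.0, len(segments), 0.02, first_coefficient)
    print(len(segments), [round(t, 3) for t in times])
```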

It can be understood that the change trend of the amount of data in the buffer of the first audio playback device reflects the deviation between the playback speeds of the audio source device and the first audio playback device. Therefore, the time point at which each audio segment in the audio data is expected to be played (referred to as the expected playback time point) can be adjusted according to this trend, so that the playback speed of the first audio playback device is synchronized with that of the audio source device. In this way, data overflow or exhaustion in the buffer of the first audio playback device can be avoided, the sound does not stutter or pop during playback, and the listening experience of the played audio is improved.

In a possible implementation manner, before receiving the audio data sent by the audio source device, the method further includes: receiving an instruction to play audio sent by the audio source device.

In a possible implementation manner, obtaining the second adjustment coefficient according to the current number collected each time and the collection time point corresponding to it includes: performing linear fitting on the current number collected each time and the corresponding collection time point to obtain a first slope; and obtaining the second adjustment coefficient according to the first slope.

Exemplarily, with time as the X-axis and the number of first audio segments buffered by the first audio playback device as the Y-axis, discrete points are plotted on a two-dimensional plane; each point represents the number of first audio segments collected at the corresponding collection time point. A linear regression is then performed on these points to obtain a straight line. The slope of this line (i.e., the first slope) represents the trend of the buffered first audio segments, that is, the number of first audio segments added or removed per unit time. A positive first slope gives the number of first audio segments added per unit time and means that the playback speed (or delivery speed) of the audio source device is faster than the playback speed of the first audio playback device. A negative first slope gives the number of first audio segments removed per unit time and means that the playback speed (or delivery speed) of the audio source device is slower than the playback speed of the first audio playback device. Calculating the adjustment coefficient from the first slope allows playback of the subsequent second audio segments to be adjusted quickly, so that the first audio playback device soon keeps pace with the playback speed or delivery speed of the audio source device.
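The linear fit described above can be sketched in Python with an ordinary least-squares slope. How the application converts the first slope into the second adjustment coefficient is not specified; the conversion below is an assumed model in which the coefficient scales the interval between expected playback time points (segment duration multiplied by the coefficient), and the new coefficient is chosen so that the consumption rate matches the observed delivery rate. Function names and sample values are hypothetical.

```python
from typing import Sequence


def fit_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys against xs (the 'first slope')."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (time, count) pairs of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("collection time points must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def second_adjustment_coefficient(first_coefficient: float, first_slope: float,
                                  segment_duration: float) -> float:
    """Assumed model for turning the first slope into a new coefficient.

    The device consumes 1 / (segment_duration * c) segments per second.
    A buffer slope of s segments per second means the source delivers that
    rate plus s, so matching it requires
        c' = c / (1 + s * segment_duration * c).
    A positive slope gives c' < c (play faster); a negative slope gives c' > c.
    """
    denominator = 1.0 + first_slope * segment_duration * first_coefficient
    if denominator <= 0:
        raise ValueError("slope too negative for this model")
    return first_coefficient / denominator


if __name__ == "__main__":
    times = [0.0, 10.0, 20.0, 30.0, 40.0]   # collection time points in seconds (sample data)
    counts = [10, 11, 12, 13, 14]           # buffered segment counts (sample data)
    slope = fit_slope(times, counts)        # 0.1 segments per second
    print(slope, second_adjustment_coefficient(1.0, slope, 0.02))
```

A least-squares fit over several samples is less sensitive to momentary network jitter than reacting to any single buffer reading, which is the motivation the text gives for fitting a trend.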

In a possible implementation manner, periodically collecting the current number of buffered audio segments and the collection time point corresponding to the current number includes: when the absolute value of the difference between the actual playback time point and the expected playback time point of any audio segment is greater than a first threshold, the first audio playback device starts to periodically collect the current number of buffered audio segments and the corresponding collection time point. The actual playback time point of an audio segment is the expected speaker output time point of the audio segment, which the first audio playback device obtains by calling a query interface of its audio output driver.
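The trigger condition can be sketched in Python as a small collector that stays idle until the deviation exceeds the first threshold and then records one (collection time point, count) sample per collection period. The class name, the preset number of samples and the use of a count limit (rather than a preset duration) are assumptions; the returned samples are the input a linear fit of the first slope would consume.

```python
from typing import List, Optional, Tuple


class BufferCountCollector:
    """Collects (collection time point, buffered segment count) samples.

    Hypothetical helper: collection starts once the absolute deviation between
    a segment's actual and expected playback time points exceeds
    first_threshold, and ends after max_samples samples.
    """

    def __init__(self, first_threshold: float, max_samples: int) -> None:
        self.first_threshold = first_threshold
        self.max_samples = max_samples
        self.active = False
        self.samples: List[Tuple[float, int]] = []

    def check_deviation(self, actual_time: float, expected_time: float) -> None:
        """actual_time stands for the speaker output time reported by the
        audio output driver; the query itself is platform specific."""
        if not self.active and abs(actual_time - expected_time) > self.first_threshold:
            self.active = True

    def on_timer(self, now: float, buffered_count: int) -> Optional[List[Tuple[float, int]]]:
        """Call once per collection period; returns all samples when complete."""
        if not self.active:
            return None
        self.samples.append((now, buffered_count))
        if len(self.samples) < self.max_samples:
            return None
        completed, self.samples = self.samples, []
        self.active = False
        return completed
```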

This provides a trigger condition for starting to collect the number of buffered first audio segments.

In a second aspect, an audio playback method is provided, which is applied to a first audio playback device that communicates wirelessly with an audio source device. The method includes: receiving audio data sent by the audio source device; dividing the audio data into N audio segments; buffering the N audio segments; obtaining an expected playback time point of each audio segment according to a first adjustment coefficient; playing the audio segments in sequence; and, when the absolute value of the difference between the actual playback time point and the expected playback time point of an audio segment is greater than a preset threshold, adjusting the number of buffered audio segments. The actual playback time point of an audio segment is the expected speaker output time point of that audio segment, which the first audio playback device obtains by calling an interface of its audio output driver. N is a positive integer greater than or equal to 2, and the first adjustment coefficient is a preset coefficient.

This provides a method for adjusting the playback speed of the first audio playback device so that it matches the playback speed or delivery speed of the audio source, which helps keep the playback progress or delivery progress consistent with the audio source over a long period.

In a possible implementation manner, adjusting the number of buffered audio segments when the absolute value of the difference between the actual playback time point and the expected playback time point of the audio segment is greater than the preset threshold includes: if the absolute value of the difference is greater than the preset threshold and the difference is negative, adding a first number of audio segments; if the absolute value of the difference is greater than the preset threshold and the difference is positive, deleting the first number of audio segments.

In a possible implementation manner, the first quantity is related to the quotient of the absolute value of the difference divided by the playing duration of the audio segment.

In a possible implementation manner, the first quantity is the quotient of the absolute value of the difference divided by the playing duration of the audio segment.

In a possible implementation manner, the added first number of audio segments are mute data.
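The add/delete rule in the implementations above can be sketched in Python as follows. The sketch assumes that the difference is the actual playback time point minus the expected one, that the first number is the quotient rounded down, that mute segments are all-zero signed PCM bytes inserted at the head of the buffer (so they play next), and that deleted segments are also taken from the head; these placements and all names are assumptions for illustration.

```python
from collections import deque
from typing import Deque


def adjust_buffer(buffer: Deque[bytes], actual_time: float, expected_time: float,
                  threshold: float, segment_duration: float, segment_bytes: int) -> int:
    """Insert mute segments or drop segments when the deviation is too large.

    Returns the signed change in the number of buffered segments.
    """
    difference = actual_time - expected_time
    if abs(difference) <= threshold:
        return 0                                      # within tolerance: no change
    # First number = quotient of |difference| / segment duration, rounded down;
    # the small epsilon guards against float error such as 0.06 / 0.02 = 2.999...
    first_number = int(abs(difference) / segment_duration + 1e-9)
    if difference < 0:
        # Played earlier than expected: pad with silence to delay later segments.
        for _ in range(first_number):
            buffer.appendleft(bytes(segment_bytes))   # zero bytes = mute for signed PCM (assumed)
        return first_number
    # Played later than expected: drop segments to catch up.
    dropped = min(first_number, len(buffer))
    for _ in range(dropped):
        buffer.popleft()
    return -dropped


if __name__ == "__main__":
    buf: Deque[bytes] = deque([b"a", b"b", b"c", b"d"])
    print(adjust_buffer(buf, 0.95, 1.0, 0.01, 0.02, 4), len(buf))
```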

In a possible implementation manner, when the absolute value of the difference between the actual playback time point and the expected playback time point of the audio segment is less than or equal to the preset threshold, the playback speed of the first audio playback device is adjusted.

In a possible implementation manner, when the absolute value of the difference between the actual playback time point and the expected playback time point of the audio segment is less than or equal to the preset threshold, the first audio playback device collects the actual playback time point of the audio segment and the difference between that actual playback time point and the expected playback time point. After the number of collections reaches a preset number of times, or after the collection duration reaches a preset duration, linear fitting is performed on the actual playback time point collected each time and the difference corresponding to it, to obtain a second slope. The current playback speed of the first audio playback device is then obtained, an adjusted playback speed is obtained according to the current playback speed and the second slope, and subsequent audio segments are played in sequence at the adjusted playback speed.
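The second-slope calculation can be sketched in Python as follows. The application does not give the formula that combines the current playback speed with the second slope. The sketch assumes that, with the difference defined as actual minus expected, a second slope k means each actual interval between segments is 1 / (1 - k) times the expected interval, and it derives the speed that would make the two intervals equal, i.e. new speed = current speed / (1 - k). Names and sample values are hypothetical.

```python
from typing import Sequence


def fit_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys against xs (the 'second slope')."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (time, difference) pairs of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("actual playback time points must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def adjusted_playback_speed(current_speed: float, actual_times: Sequence[float],
                            differences: Sequence[float]) -> float:
    """Assumed model: new speed = current speed / (1 - second slope).

    A slope k > 0 means the device falls further behind over time and must
    speed up; k < 0 means it runs ahead and must slow down.
    """
    k = fit_slope(actual_times, differences)
    if k >= 1.0:
        raise ValueError("second slope must be below 1 for this model")
    return current_speed / (1.0 - k)


if __name__ == "__main__":
    actual = [0.0, 1.0, 2.0, 3.0]          # actual playback time points (sample data)
    diffs = [0.0, 0.01, 0.02, 0.03]        # actual minus expected (sample data)
    print(adjusted_playback_speed(1.0, actual, diffs))
```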

Thus, a specific method for calculating the speed deviation between the first audio playback device and the audio source device is provided.

In a possible implementation manner, the first audio playback device is connected to a second audio playback device, and the method further includes: sending N audio segments to the second audio playback device.

That is to say, the first audio playback device can play audio together with the second audio playback device. Once the playback of the first audio playback device is synchronized with the audio source device, the playback of the second audio playback device is also synchronized with the audio source device. Further, the second audio playback device may adjust its own playback speed using the same method as the first audio playback device.

In a possible implementation manner, before the first audio playback device plays the first audio segment, the method further includes: sending an instruction to start playing the audio segment to the second audio playback device.
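The ordering described in the two implementations above (forward the N audio segments to the second audio playback device, send the start instruction, then play locally) can be sketched with in-memory stand-ins for the two devices. The class and method names are hypothetical, direct method calls stand in for the wireless link, and no timing or clock synchronization is modelled.

```python
from typing import List, Tuple


class SecondDevice:
    """Stand-in for the second audio playback device (hypothetical)."""

    def __init__(self) -> None:
        self.buffer: List[bytes] = []
        self.started = False

    def receive_segment(self, segment: bytes) -> None:
        self.buffer.append(segment)

    def receive_start(self) -> None:
        self.started = True


class FirstDevice:
    """Stand-in for the first audio playback device (hypothetical)."""

    def __init__(self, peer: SecondDevice) -> None:
        self.peer = peer
        self.events: List[Tuple[str, int]] = []   # ordered record of actions

    def forward_and_play(self, segments: List[bytes]) -> None:
        for i, segment in enumerate(segments):
            self.peer.receive_segment(segment)    # send the N segments
            self.events.append(("forward", i))
        self.peer.receive_start()                 # start instruction before local playback
        self.events.append(("start", -1))
        for i in range(len(segments)):
            self.events.append(("play", i))       # placeholder for real playback


if __name__ == "__main__":
    second = SecondDevice()
    first = FirstDevice(second)
    first.forward_and_play([b"x", b"y"])
    print(first.events, second.started)
```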

In a third aspect, a first audio playback device is provided. The first audio playback device includes a processor, an audio output device, and a memory; the audio output device and the memory are both coupled to the processor, and the memory is configured to store a computer program. When the computer program is executed by the processor, the first audio playback device is caused to perform the method in the first aspect or any possible implementation of the first aspect, or the method in the second aspect or any possible implementation of the second aspect.

In a fourth aspect, an apparatus is provided. The apparatus is included in a first audio playback device, and the apparatus has the function of implementing the behavior of the first audio playback device in any of the above-mentioned aspects and possible implementation manners of the above-mentioned aspects. This function can be implemented by hardware or by executing corresponding software by hardware. The hardware or software includes at least one module or unit corresponding to the above-mentioned functions. For example, a communication module or unit, a processing module or unit, and a playback module or unit, etc.

In a fifth aspect, a computer-readable storage medium is provided. The computer-readable storage medium includes a computer program that, when run on the first audio playback device, causes the first audio playback device to perform the method in the first aspect or any possible implementation of the first aspect, or the method in the second aspect or any possible implementation of the second aspect.

In a sixth aspect, a computer program product is provided. When the computer program product runs on a computer, the computer is caused to perform the method in the first aspect or any possible implementation of the first aspect, or the method in the second aspect or any possible implementation of the second aspect.

In a seventh aspect, a chip system is provided. The chip system includes a processor, and when the processor executes instructions, the processor performs the method in the first aspect or any possible implementation of the first aspect, or the method in the second aspect or any possible implementation of the second aspect.

In an eighth aspect, a system is provided. The system includes an audio source device and a first audio playback device, where the first audio playback device performs the method in the first aspect or any possible implementation of the first aspect, or the method in the second aspect or any possible implementation of the second aspect.

In a possible implementation manner, the system further includes a second audio playback device, and the second audio playback device executes the method in the second aspect and any possible implementation manner of the second aspect.

It can be understood that, for the beneficial effects achievable by the first audio playback device described in the third aspect, the apparatus described in the fourth aspect, the computer-readable storage medium described in the fifth aspect, the computer program product described in the sixth aspect, the chip system described in the seventh aspect, and the system described in the eighth aspect, reference may be made to the beneficial effects of the first aspect or the second aspect and any possible design thereof; details are not repeated here.

Description of drawings

FIG. 1 is a schematic diagram of a scene of an audio playback method provided by an embodiment of the present application;

FIG. 2 is a schematic structural diagram of an audio source device provided by an embodiment of the present application;

FIG. 3 is a schematic structural diagram of an audio playback device provided by an embodiment of the present application;

FIG. 4 is a schematic flowchart of an audio playback method provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a fitting method for a trend in the number of audio fragments buffered by an audio playback device according to an embodiment of the present application;

FIG. 6 is a schematic flowchart of an audio playback method provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of a fitting method of a variation trend of the difference between an actual playback time point and an expected playback time point of an audio playback device provided by an embodiment of the present application;

FIG. 8 is a schematic flowchart of an audio playback method provided by an embodiment of the present application;

FIG. 9 is a schematic flowchart of an audio playback method provided by an embodiment of the present application;

FIG. 10 is a schematic structural diagram of a chip system provided by an embodiment of the present application.

Detailed description

In the description of the embodiments of the present application, unless otherwise specified, "/" means "or". For example, A/B can mean A or B. "And/or" in this document merely describes an association relationship between associated objects and indicates that three relationships can exist. For example, A and/or B can mean that A exists alone, A and B exist at the same time, or B exists alone.

Hereinafter, the terms "first" and "second" are only used for descriptive purposes, and should not be construed as indicating or implying relative importance or implicitly indicating the number of indicated technical features. Thus, a feature defined as "first" or "second" may expressly or implicitly include one or more of that feature. In the description of the embodiments of the present application, unless otherwise specified, "plurality" means two or more.

In the embodiments of the present application, words such as "exemplary" or "for example" are used to indicate an example, instance, or illustration. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application should not be construed as preferred over or more advantageous than other embodiments or designs. Rather, such words are intended to present related concepts in a concrete manner.

FIG. 1 is a schematic diagram of a scenario of an audio playback method provided by an embodiment of the present application. FIG. 1 shows a communication system provided by an embodiment of the present application. The communication system includes an audio source device 100 and an audio playback device 200. Optionally, the communication system may further include an audio playback device 300.

The audio source device 100 is configured to provide audio content to the audio playback device 200. Exemplarily, the audio source device 100 in this embodiment of the present application may be a mobile phone, a tablet computer, a personal computer (PC), a personal digital assistant (PDA), a netbook, a wearable device, an augmented reality (AR) device, a virtual reality (VR) device, an in-vehicle device, a smart screen, or the like. The specific form of the audio source device 100 is not particularly limited in this application.

The audio playback device 200 is configured to receive the audio content sent by the audio source device 100 and play the audio content. Exemplarily, the audio playback device 200 may be, for example, a wireless headset, a wireless speaker, a wearable device, an AR device, a VR device, etc. The specific form of the audio playback device 200 is not particularly limited in this application.

In an application scenario, when the audio source device 100 plays a video, it can display the picture content of the video on its own display screen and send the audio content of the video to the audio playback device 200, which plays the audio content.

Generally, to reduce the influence of network jitter during wireless transmission, the audio playback device 200 buffers the audio content received from the audio source device 100 and delays its playback. In addition, because the audio source device 100 and the audio playback device 200 are different devices with hardware differences (for example, different crystal oscillator frequencies), their playback speeds differ. This can cause the buffer of the audio playback device 200 to overflow or run dry, so that the sound stutters or pops during playback.
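To give a sense of scale, the Python sketch below computes how far two free-running clocks drift apart over a playback session. The 40 ppm frequency difference and the 20 ms segment duration are assumed example values, not figures from the application.

```python
from typing import Tuple


def clock_drift(ppm_difference: float, duration_s: float,
                segment_duration_s: float) -> Tuple[float, float]:
    """Return (drift in seconds, drift in segments) after duration_s seconds."""
    drift_s = ppm_difference * 1e-6 * duration_s
    return drift_s, drift_s / segment_duration_s


if __name__ == "__main__":
    drift, segments = clock_drift(40.0, 3600.0, 0.02)
    print(f"{drift * 1000:.0f} ms, about {segments:.1f} segments per hour")
```

Under these assumed values the buffer gains or loses roughly seven 20 ms segments per hour; whether it tends toward overflow or exhaustion depends on which device's clock runs faster.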

In one solution, a maximum threshold and a minimum threshold may be set for the data buffered by the audio playback device 200. When the data in the buffer of the audio playback device 200 exceeds the maximum threshold, the playback speed of the audio playback device 200 is increased by a certain ratio. When the data in the buffer of the audio playback device 200 falls below the minimum threshold, the playback speed of the audio playback device 200 is reduced by a certain ratio. In this way, the amount of data buffered by the audio playback device 200 is kept within a preset range, and data overflow or exhaustion in its buffer becomes less likely.
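The threshold-based solution described above can be sketched in Python as follows; the thresholds and the 1% ratio are assumed example values. The sketch makes the drawbacks easy to see: the speed changes by the same fixed ratio no matter how large the actual speed deviation is, and every excursion of the buffer level outside the range triggers another change.

```python
def threshold_speed_update(buffered: int, speed: float, max_threshold: int,
                           min_threshold: int, ratio: float) -> float:
    """Fixed-ratio speed change whenever the buffer leaves [min, max]."""
    if buffered > max_threshold:
        return speed * (1.0 + ratio)   # too much data buffered: play faster
    if buffered < min_threshold:
        return speed * (1.0 - ratio)   # buffer running low: play slower
    return speed                        # within range: unchanged


if __name__ == "__main__":
    speed = 1.0
    for level in [8, 12, 3, 9]:        # buffer levels under jittery delivery (sample data)
        speed = threshold_speed_update(level, speed, 10, 4, 0.01)
        print(level, round(speed, 4))
```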

In this solution, when the wireless transmission speed is unstable, the amount of data in the buffer of the audio playback device 200 changes constantly, and the playback speed of the audio playback device 200 may need to be adjusted frequently. The audio content then plays alternately fast and slow, which degrades the user experience. In addition, the playback speed of the audio playback device 200 is usually adjusted by a fixed ratio that does not reflect the actual speed deviation, so the adjustment accuracy is low.

To address these problems, the technical solution provided by the present application keeps the playback progress of the audio playback device consistent with that of the audio source device (i.e., the smart device), improving the user experience, especially the listening experience.

Exemplarily, FIG. 2 shows the hardware structure of the audio source device 100.

The audio source device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.

It can be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the audio source device 100. In other embodiments of the present application, the audio source device 100 may include more or fewer components than shown, combine some components, split some components, or use a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated into one or more processors.

The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the audio source device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example, saving files such as music and videos on the external memory card.

Internal memory 121 may be used to store computer executable program code, which includes instructions. The internal memory 121 may include a storage program area and a storage data area. The storage program area can store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like. The storage data area may store data (such as audio data, phone book, etc.) created during the use of the audio source device 100 and the like. In addition, the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like. The processor 110 executes various functional applications and data processing of the audio source device 100 by executing instructions stored in the internal memory 121, and/or instructions stored in a memory provided in the processor.

In some embodiments, the processor 110 may include one or more interfaces, such as the USB interface 130. The USB interface 130 complies with the USB standard specification and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 130 may be used to connect a charger to charge the audio source device 100, to transmit data between the audio source device 100 and peripheral devices, or to connect headphones and play audio through them. The interface may also be used to connect other electronic devices, such as AR devices.

The wireless communication function of the audio source device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.

The mobile communication module 150 may provide wireless communication solutions, including 2G/3G/4G/5G, applied to the audio source device 100.

The wireless communication module 160 may provide wireless communication solutions applied to the audio source device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR). The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive a to-be-sent signal from the processor 110, perform frequency modulation and amplification on it, and convert it into electromagnetic waves radiated through the antenna 2.

In this embodiment of the present application, the audio source device 100 may establish a communication connection with the audio playback device 200 through the wireless communication module 160 and send the to-be-played audio content to the audio playback device 200 over the wireless connection, and the audio playback device 200 plays it. The to-be-played audio content may be the sound of a video or standalone audio such as music. In some examples, the audio playback device 200 may further forward the audio content to the audio playback device 300, and the audio playback device 200 and the audio playback device 300 play the audio content together. The audio playback device 200 is the master playback device and the audio playback device 300 is a slave playback device; there may be one or more audio playback devices 300.

The audio source device 100 implements a display function through a GPU, a display screen 194, an application processor, and the like.

The audio source device 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone jack 170D, the application processor, and the like.

Exemplarily, FIG. 3 shows the hardware structure of the audio playback device 200.

The audio playback device 200 may include a processor 210, a memory 220, a wireless communication module 230, an antenna 240, a speaker 250, a power module 260, and the like. It can be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the audio playback device 200. In other embodiments of the present application, the audio playback device 200 may include more or fewer components than shown, combine some components, split some components, or use a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.

The processor 210 may include one or more processing units. For example, the processor 210 may include a distribution module, a playback module, a cache module, an adjustment coefficient calculation module, a playback speed module, and the like. Optionally, the processor 210 may further include a clock synchronization module and the like. The specific functions of each module are described in detail below with reference to specific embodiments. Different processing units may be independent devices or may be integrated into one or more processors.

Memory 220 may be used to store computer-executable program code, which includes instructions. In some examples, memory 220 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, universal flash storage (UFS), and the like. The processor 210 executes various functional applications and data processing of the audio playback device 200 by executing the instructions stored in the memory 220 and/or the instructions stored in the memory provided in the processor.

The wireless communication function of the audio playback device 200 may be implemented by the antenna 240, the wireless communication module 230, the modem processor in the processor 210, the baseband processor, and the like.

The wireless communication module 230 can provide wireless communication solutions applied on the audio playback device 200, including WLAN (e.g., a Wi-Fi network), BT, GNSS, FM, NFC, IR, and the like. The wireless communication module 230 may be one or more devices integrating at least one communication processing module. The wireless communication module 230 receives electromagnetic waves via the antenna 240, performs frequency modulation and filtering on the electromagnetic wave signal, and sends the processed signal to the processor 210. The wireless communication module 230 can also receive a to-be-sent signal from the processor 210, perform frequency modulation and amplification on it, and convert it into electromagnetic waves radiated through the antenna 240.

In this embodiment of the present application, the audio playback device 200 may establish a communication connection with the audio source device 100 through the wireless communication module 230, receive over that wireless connection the audio content sent by the audio source device 100, and play it through the speaker 250. In some examples, the audio playback device 200 may also establish a communication connection with other audio playback devices 300 through the wireless communication module 230. The audio playback device 200 can then forward the audio content to the audio playback device 300, and the two devices play the audio content together to produce a stereo effect. The audio playback device 200 is the master playback device, the audio playback device 300 is a slave playback device, and there may be one or more audio playback devices 300.

The power module 260 supplies power to the components of the audio playback device 200, such as the processor 210, the memory 220, and the wireless communication module 230.

It should be noted that, for the structure of the audio playback device 300, reference may be made to the audio playback device 200. The structure of the audio playback device 300 may be the same as or different from that of the audio playback device 200, which is not limited in this application.

The technical solutions provided in the embodiments of the present application are applicable to the communication system shown in FIG. 1, in which the audio source device 100 has the structure shown in FIG. 2 and the audio playback device 200 has the structure shown in FIG. 3.

In one technical solution provided by the present application, the audio playback device 200 collects the amount of data in its buffer area at multiple adjacent time points (or moments) and calculates the change trend of that amount by linear fitting. The change trend of the buffered data amount reflects the deviation between the playback speeds of the audio source device 100 and the audio playback device 200. Therefore, the time point at which each audio segment in the audio content is expected to be played (referred to as the expected playback time point) can be adjusted according to this change trend, so that the playback speeds of the audio playback device 200 and the audio source device 100 stay synchronized, which improves the user's listening experience.

In yet another technical solution provided by the present application, because the playback speeds of the audio playback device 200 and the audio source device 100 differ, the playback speed of the audio playback device 200 can instead be adjusted so that it matches the playback speed of the audio source device 100. Specifically, the audio playback device 200 may collect, for each audio segment, the deviation between the expected playback time point carried in the segment and the time point at which the audio playback device 200 actually starts playing that segment (referred to as the actual playback time point of the audio segment), calculate the variation trend of the deviation by linear fitting, and use this trend to adjust the playback speed of the audio playback device 200 so that it is consistent with the playback speed of the audio source device 100.

In another application scenario, when the audio source device 100 plays a video, it can display the picture content of the video on its own display screen and send the audio content of the video to the audio playback device 200 and the audio playback device 300, which jointly play the audio content. The audio playback device 200 is the master playback device, and the audio playback device 300 is the slave playback device.

For synchronous playback between the audio source device 100 and the audio playback device 200, refer to the description of the previous application scenario; details are not repeated here. Usually, the audio playback device 200 and the audio playback device 300 are made by the same manufacturer and can perform clock synchronization. Even with clock synchronization, however, they are still different devices, and hardware differences (e.g., different crystal oscillator frequencies) still cause their playback speeds to differ. As playback continues over a longer period, the playback progress of the two devices drifts out of sync.

Similarly, the audio playback device 300 can collect the deviation between the expected playback time point carried in each audio segment and the time point at which the audio playback device 300 actually starts playing that segment (referred to as the actual playback time point), and then calculate the variation trend of the deviation by linear fitting, so as to adjust its own playback speed according to this trend.

In another application scenario, when the audio source device 100 plays audio (e.g., music), it may send the audio directly to the audio playback device 200, which plays the audio content.

Generally, the speed at which the audio source device 100 sends the audio stream to the audio playback device 200 (also referred to as the delivery speed) differs from the playback speed of the audio playback device 200, which may cause the data in the buffer area of the audio playback device 200 to overflow or be exhausted, so that the sound stutters or pops during playback. Therefore, the audio playback device 200 needs to adjust its playback speed to match the delivery speed of the audio source device 100, so as to avoid data overflow or exhaustion in its buffer area and thereby avoid stuttering or popping during playback.

In yet another application scenario, when the audio source device 100 plays audio (e.g., music), it can send the audio directly to both the audio playback device 200 and the audio playback device 300, which jointly play the audio content.

In this case, the audio playback device 200 needs to adjust its playback speed to match the delivery speed of the audio source device 100, and the audio playback device 300 likewise needs to adjust its playback speed to match the delivery speed of the audio source device 100.

The technical solutions of the present application will be described below with reference to the accompanying drawings.

FIG. 4 shows the flow of an audio playback method provided by the present application. As shown in FIG. 4, the audio playback method may include the following steps.

S401. The audio source device 100 receives an instruction to play audio.

Exemplarily, the audio source device 100 is connected to at least one audio playback device, for example, the audio playback device 200. The audio source device 100 may establish a wireless connection with the audio playback device 200 through the wireless communication module 160, using, for example, Bluetooth, WLAN, or NFC.

When the user, through an operation on the audio source device 100, instructs it to start playing a video, the audio source device 100 can play the picture content of the video and send the audio content of the video (that is, the audio stream) to the audio playback device 200 for playback, so that the audio is played externally. Alternatively, when the user, through an operation on the audio source device 100, instructs it to start playing pure audio (e.g., music or a recording), the audio source device 100 sends the audio stream to the audio playback device 200.

It should be noted that when the audio source device 100 plays a video, the playback speed of the picture content on the audio source device 100 side should be consistent with the playback speed of the audio content on the audio playback device 200 side. When the audio source device 100 plays pure audio, the speed at which the audio source device 100 delivers the audio stream to the audio playback device 200 should be consistent with the playback speed of the audio content on the audio playback device 200 side.

S402. The audio source device 100 sends an audio stream to the audio playback device 200.

S403. The audio playback device 200 encodes and decodes the received audio stream.

Exemplarily, taking as an example an audio playback device 200 that includes a distribution module, an adjustment coefficient calculation module, a playback module, and a cache module, the distribution module receives the audio stream sent by the audio source device 100 and encodes and decodes it to obtain audio data in the device's own playback format. The encoding and decoding depend on the number of channels, the number of sampling bits, and the sampling frequency of the audio playback device 200. For the specific encoding and decoding process, refer to related audio codec technologies; details are not described here.

S404. The audio playback device 200 segments the encoded and decoded audio stream, and calculates the expected playback time point of each audio segment according to the adjustment coefficient.

Exemplarily, taking an audio playback device 200 that includes a distribution module, an adjustment coefficient calculation module, a playback module, and a cache module as an example, step S404 may specifically include steps S404a to S404e.

S404a. After obtaining the encoded and decoded audio stream, the distribution module requests the current adjustment coefficient from the adjustment coefficient calculation module.

S404b. The adjustment coefficient calculation module returns the current adjustment coefficient to the distribution module.

It should be noted that the adjustment coefficient calculation module can periodically update the value of the adjustment coefficient, and the distribution module calculates the expected playback time point of each audio segment according to the current latest adjustment coefficient. For the calculation process of updating the adjustment coefficient by the adjustment coefficient calculation module, reference may be made to the following step S406. The initial value of the adjustment coefficient can be set to 1.

S404c. The distribution module segments the encoded and decoded audio stream and calculates the expected playback time point of each audio segment.

Exemplarily, the data structure of each audio fragment after fragmentation may be:

data type
-index: int
-len: int
-playtime: long long

index is the sequence number of the audio segment; it starts from 1 and increases by 1 for each subsequent segment.

len is the data length of the audio segment. The relationship between the data length and the playback duration of the audio segment is: len = number of channels * sampling bits * sampling rate * audio segment playback duration / 8. The playback duration of an audio segment is a preset value, for example, 10 ms (milliseconds). In other words, the data length of an audio segment is a fixed value.
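The len formula can be checked with a small Python helper. The function name is hypothetical; it assumes the sampling rate is given in Hz and the segment duration in milliseconds, which is why the product is divided by 1000 before the bits are converted to bytes.

```python
def segment_length_bytes(channels: int, sample_bits: int,
                         sample_rate_hz: int, duration_ms: int) -> int:
    """len = channels * sampling bits * sampling rate * duration / 8.

    sample_rate_hz * duration_ms / 1000 is the number of samples per channel
    in one segment; multiplying by channels and sample_bits gives bits.
    """
    total_bits = channels * sample_bits * sample_rate_hz * duration_ms // 1000
    return total_bits // 8


# The example from the text: 1 channel, 32-bit samples, 96 kHz, 10 ms segments.
print(segment_length_bytes(1, 32, 96_000, 10))  # 3840
```

Because the segment duration is fixed, this value only needs to be computed once per playback format.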

For example, if the audio playback device 200 has 1 channel, 32 sampling bits, a sampling rate of 96 kHz, and an audio segment playback duration of 10 ms, then len = 1*32*96*10/8 = 3840 bytes (96 kHz * 10 ms = 960 samples per segment).

playtime is the expected playback time point of the audio segment, in units of, for example, μs (microseconds). The expected playback time point of the first audio segment (that is, audio segment 1#) = current time + preset delay time (for example, 1 s). The preset delay time allows the audio playback device 200 to buffer audio segment data in advance, preventing abnormal playback caused by network transmission jitter. Expected playback time point of the Nth audio segment = expected playback time point of the first audio segment + (N-1) * audio segment playback duration * 1000 * adjustment coefficient.
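The segmentation and timestamping of S404c can be sketched in Python as follows. This is a minimal illustration under stated assumptions: times are integer microseconds on an arbitrary clock, the current adjustment coefficient is applied to every segment in the batch, a trailing partial chunk is simply ignored, and the names `AudioSegment`, `expected_playtime_us`, and `split_into_segments` are hypothetical (the field `length` stands in for `len` to avoid confusion with Python's built-in `len`).

```python
from dataclasses import dataclass


@dataclass
class AudioSegment:
    index: int     # sequence number, starting from 1
    length: int    # data length in bytes (the "len" field)
    playtime: int  # expected playback time point, in microseconds
    data: bytes


def expected_playtime_us(first_us: int, n: int, seg_ms: int, coeff: float) -> int:
    """first + (N - 1) * segment duration (ms) * 1000 * adjustment coefficient."""
    return first_us + round((n - 1) * seg_ms * 1000 * coeff)


def split_into_segments(pcm: bytes, seg_len: int, first_us: int,
                        seg_ms: int, coeff: float = 1.0) -> list[AudioSegment]:
    segments = []
    # Only whole segments are emitted; a trailing partial chunk is ignored here
    # (a real implementation would presumably carry it over to the next batch).
    for offset in range(0, len(pcm) - seg_len + 1, seg_len):
        n = offset // seg_len + 1
        segments.append(AudioSegment(
            index=n,
            length=seg_len,
            playtime=expected_playtime_us(first_us, n, seg_ms, coeff),
            data=pcm[offset:offset + seg_len],
        ))
    return segments
```

With `first_us` set to the current time plus a 1 s delay, 3840-byte segments of 10 ms, and a coefficient of 1, consecutive segments are spaced exactly 10 000 μs apart, as in the example that follows.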

It can be seen that the expected playback time point of an audio segment in this application is determined by the expected playback time point of the first audio segment, the sequence number of the audio segment, and the adjustment coefficient. The adjustment coefficient changes dynamically according to the number of audio segments buffered in the audio playback device 200; its calculation is described in detail in step S406 below. The initial value of the adjustment coefficient is 1.

Following the above example, suppose the current time is 2020/11/11 00:00:00.000 000 and the preset delay time is 1 s; then the expected playback time point of the first audio segment is 2020/11/11 00:00:01.000 000. The expected playback time point of the second audio segment = 2020/11/11 00:00:01.000 000 + (2-1)*10*1000*1 μs = 2020/11/11 00:00:01.010 000.

The data information of the first audio segment is shown in Table 1:

Table 1

Field	Value
index	1
len	3840
playTime	2020/11/11 00:00:01.000 000

The data information of the second audio segment is shown in Table 2:

Table 2

Field	Value
index	2
len	3840
playTime	2020/11/11 00:00:01.010 000

It should be noted that, in this document, the audio segment playback duration is expressed in ms (milliseconds), and the expected playback time point of an audio segment is expressed in μs (microseconds).

It should also be noted that, in some embodiments, the distribution module may periodically request the current adjustment coefficient from the adjustment coefficient calculation module and calculate the expected playback time points of the audio segments according to it. Alternatively, the distribution module may request the current adjustment coefficient from the adjustment coefficient calculation module each time it has received a specific amount of audio stream data. In other embodiments, after updating the adjustment coefficient, the adjustment coefficient calculation module may send the updated value to the distribution module, which then calculates the expected playback time points of the audio segments according to the updated adjustment coefficient. In other words, the distribution module can also passively receive the latest adjustment coefficient sent by the adjustment coefficient calculation module.

S404d. The distribution module sends each audio segment to the cache module.

The distribution module sends the generated audio segments to the cache module in sequence for caching. Each audio segment carries its expected playback time point.

S404e. The cache module caches each audio segment.

S405. When the current time reaches the expected playback time point of the first audio segment (that is, audio segment 1#), the audio playback device 200 starts audio playback.

Exemplarily, taking an audio playback device 200 that includes a distribution module, an adjustment coefficient calculation module, a playback module, and a cache module as an example, step S405 may specifically include steps S405a and S405b.

S405a. When the current time is later than or equal to the expected playback time point of the first audio segment, the distribution module notifies the playback module to start audio playback.

S405b. The playback module reads the data of the first audio segment and of the subsequent audio segments from the cache module, and plays the audio segments in sequence.

In some embodiments, after reading the data of an audio segment from the cache module, the playback module writes it into an audio output driver in the playback module, such as the Advanced Linux Sound Architecture (ALSA). The expected output time of the currently written audio segment is then obtained through an interface of the audio output driver; this expected output time can be regarded as the actual playback time point of the audio segment.
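One plausible way to turn an audio output driver's queue depth into an actual playback time point is sketched below in Python. It assumes a driver that, like ALSA's `snd_pcm_delay()`, reports how many frames sit between the application's write position and the frame currently being played; the function name and these timing semantics are assumptions, and a given driver may instead expose an output timestamp directly.

```python
def actual_playback_time_us(now_us: int, delay_frames: int,
                            segment_frames: int, sample_rate_hz: int) -> int:
    """Estimate when the first frame of the just-written segment reaches the speaker.

    Assumes delay_frames was read immediately after writing the segment, so it
    includes the segment's own segment_frames frames at the end of the queue.
    """
    frames_ahead = max(delay_frames - segment_frames, 0)
    return now_us + frames_ahead * 1_000_000 // sample_rate_hz
```

For instance, at 96 kHz with 960-frame (10 ms) segments, a reported delay of 4800 frames means 3840 frames (40 ms) are still ahead of the new segment.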

Case 1. If the actual playback time point of an audio segment is later than the expected playback time point carried in it, the playback progress of the audio playback device 200 is behind that of the audio source device 100, and the playback module can notify the cache module to delete some audio segments, so that the audio playback device 200 catches up with the playback progress of the audio source device 100 as soon as possible. For example, the number of audio segments to delete may be determined from the quotient of the absolute value of the difference divided by the playback duration of an audio segment. If the quotient is an integer, the number of deleted audio segments equals the quotient. If the quotient is not an integer, it may be rounded and the resulting integer used as the number of deleted audio segments; the rounding method may be rounding to the nearest integer, rounding up, rounding down, or the like.
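The counting rule can be written as a short Python function. This is a sketch: the function name and the `mode` strings are hypothetical, and "nearest" is implemented as round-half-up rather than Python's built-in `round()`, which rounds halves to even.

```python
import math


def segments_to_adjust(diff_us: int, segment_ms: int, mode: str = "nearest") -> int:
    """Number of audio segments to delete (diff_us > 0) or insert (diff_us < 0).

    diff_us: actual playback time point minus expected playback time point, in us.
    mode: "nearest" (round half up), "up", or "down".
    """
    quotient = abs(diff_us) / (segment_ms * 1000)
    if mode == "nearest":
        return math.floor(quotient + 0.5)
    if mode == "up":
        return math.ceil(quotient)
    if mode == "down":
        return math.floor(quotient)
    raise ValueError(f"unknown rounding mode: {mode}")
```

Rounding down is the most conservative choice (it never over-corrects), while rounding up removes the whole deviation at the cost of possibly overshooting by up to one segment.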

For example, if the actual playback time point of an audio segment minus its expected playback time point is 0.6 s (600 ms) and the playback duration of each audio segment is 10 ms, the cache module needs to delete 600 ms/10 ms = 60 audio segments.

Case 2. If the actual playback time point of an audio segment is earlier than the expected playback time point carried in it, the playback progress of the audio playback device 200 is ahead of that of the audio source device 100, and the playback module can notify the cache module to insert some audio segments. The inserted audio segments may be silent audio data, copies of the currently written audio segment data, or other audio data. This is equivalent to the audio playback device 200 waiting for the corresponding time before playing the subsequent audio segments, so that the playback progress of the audio playback device 200 matches that of the audio source device 100.

For the number of inserted audio segments, refer to the method of calculating the number of deleted audio segments in Case 1; details are not repeated here.

For example, if the actual playback time point of an audio segment minus its expected playback time point is -0.6 s (-600 ms) and the playback duration of each audio segment is 10 ms, the cache module needs to insert 600 ms/10 ms = 60 audio segments. The data of the inserted audio segments is all zeros, that is, silent data.
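Cases 1 and 2 can be combined into one cache correction step. The Python sketch below is illustrative only: it assumes the cache is a `collections.deque` of raw segment payloads whose head is the next segment to play, that segments are deleted from and inserted at the head, and that the audio is signed PCM (so all-zero bytes are silence); `correct_cache` is a hypothetical name.

```python
import math
from collections import deque


def correct_cache(cache: deque, diff_us: int, segment_ms: int, segment_len: int) -> int:
    """Delete or insert segments at the head of the cache; returns the count changed.

    cache holds raw segment payloads (bytes); cache[0] is the next segment to play.
    diff_us: actual playback time point minus expected playback time point, in us.
    """
    count = math.floor(abs(diff_us) / (segment_ms * 1000) + 0.5)  # round half up
    if diff_us > 0:
        # Case 1: playing late, so drop not-yet-played segments from the head.
        count = min(count, len(cache))
        for _ in range(count):
            cache.popleft()
    elif diff_us < 0:
        # Case 2: playing early, so insert all-zero (silent) segments at the head.
        for _ in range(count):
            cache.appendleft(bytes(segment_len))
    return count
```

Applied to the two examples in the text, a +600 ms difference removes 60 segments and a -600 ms difference prepends 60 silent segments.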

It should be emphasized that, as explained above, the expected playback time point (playtime) carried by each audio segment is determined by the sequence number of the audio segment, the expected playback time point of the first audio segment, and the adjustment coefficient. The adjustment coefficient changes dynamically according to the number of audio segments buffered in the audio playback device 200. The calculation and update of the adjustment coefficient are described in detail below.

Specifically, after determining that the current time has reached the expected playback time point of the first audio segment and starting audio playback, the audio playback device 200 also records how the number of audio segments buffered in it changes over time, and calculates the adjustment coefficient from this change. That is, while performing step S405, the audio playback device 200 also performs step S406, as follows:

S406. The audio playback device 200 records how the number of audio segments buffered in it changes over time, and calculates the adjustment coefficient according to this change.

Exemplarily, taking an audio playback device 200 that includes a distribution module, an adjustment coefficient calculation module, a playback module, and a cache module as an example, step S406 may specifically include steps S406a to S406d.

S406a. After receiving the notification to start audio playback, the playback module notifies the adjustment coefficient calculation module to periodically collect the number of audio segments in the cache module.

In some embodiments, the playback module may notify the adjustment coefficient calculation module to start periodically collecting the number of audio segments in the cache module immediately after receiving the notification to start audio playback, or a period of time (for example, 1 second) after receiving it. In other embodiments, the playback module may notify the adjustment coefficient calculation module to start collection when it detects that the absolute value of the difference between the expected playback time point and the actual playback time point of an audio segment is greater than a preset threshold A (or another threshold). That is, the present application does not specifically limit when the adjustment coefficient calculation module starts periodically collecting the number of audio segments in the cache module.

S406b. The adjustment coefficient calculation module periodically collects the number of audio segments in the cache module.

In some embodiments, the adjustment coefficient calculation module may set a timer and periodically collect from the cache module the number of audio segments currently stored in it, for example, once every 200 μs. In other embodiments, the adjustment coefficient calculation module may instead instruct the cache module to periodically report the number of audio segments it stores. That is, the adjustment coefficient calculation module sends the cache module an instruction to periodically report the number of buffered audio segments; after receiving the instruction, the cache module sets a timer and periodically reports the number of audio segments it stores.

S406c. The adjustment coefficient calculation module stores each collected number of audio segments in the cache module together with its collection time point.

Exemplarily, the adjustment coefficient calculation module records, in sampling queue 1, each collection time point and the number of audio segments in the cache module collected at that time point. The content stored in sampling queue 1 is shown in Table 3.

Table 3

S406d. When a preset condition is met, the adjustment coefficient calculation module calculates and updates the adjustment coefficient according to the collected numbers of audio segments in the cache module and the corresponding collection time points.

The preset condition may be that the number of data entries in sampling queue 1 (for example, each row in Table 3 is one entry) reaches a predetermined number, such as 100, or that a preset time period (for example, 3 minutes) has elapsed since the adjustment coefficient was last calculated and updated.
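The collection of S406b and S406c and the preset trigger condition of S406d can be sketched as a small Python class. The class and method names are hypothetical; the defaults mirror the example values in the text (100 entries or 3 minutes), requiring at least two entries before the time-based trigger fires is an added assumption (a line fit needs two points), and the clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable


class SamplingQueue:
    """Sampling queue 1: (collection time point, buffered segment count) entries."""

    def __init__(self, max_entries: int = 100, max_age_s: float = 180.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.entries: list[tuple[float, int]] = []
        self.max_entries = max_entries  # e.g. 100 entries
        self.max_age_s = max_age_s      # e.g. 3 minutes since the last update
        self.clock = clock
        self.last_update = clock()

    def record(self, segment_count: int) -> None:
        self.entries.append((self.clock(), segment_count))

    def ready(self) -> bool:
        if len(self.entries) >= self.max_entries:
            return True
        # Time-based trigger; a line fit needs at least two entries.
        return (len(self.entries) >= 2
                and self.clock() - self.last_update >= self.max_age_s)

    def drain(self) -> list[tuple[float, int]]:
        """Return all entries and clear the queue for the next cycle."""
        entries, self.entries = self.entries, []
        self.last_update = self.clock()
        return entries
```

A timer callback would call `record()` with the cache module's current segment count and, whenever `ready()` is true, pass the result of `drain()` to the fitting step.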

In some embodiments, the adjustment coefficient calculation module can perform linear fitting on the data in sampling queue 1 to obtain the change trend of the number of audio segments in sampling queue 1 (that is, in the cache module), that is, the relationship between the number of audio segments and time.

Generally, the audio source device 100 sends the audio stream to the audio playback device 200 according to its own playback progress. The cache module of the audio playback device 200 buffers the received audio stream to form a cache queue. The head of the cache queue holds the audio segments obtained from the audio stream received earliest, and the tail holds those obtained from the audio stream received most recently. The playback speed of the audio source device 100 (or the speed at which it delivers the audio stream to the audio playback device 200, referred to as the delivery speed) therefore determines how fast audio segments are added at the tail of the cache queue. Once the audio playback device 200 starts audio playback, it takes audio segments from the head of the cache queue for playback and deletes the segments that have been played, so its playback speed determines how fast audio segments are removed from the head. Overall, the difference between the playback speed (or delivery speed) of the audio source device 100 and the playback speed of the audio playback device 200 is reflected in the change trend of the number of audio segments in the cache queue. Because network transmission between the audio source device 100 and the audio playback device 200 also affects the number of audio segments in the cache queue at individual time points, linear fitting is used to reduce the influence of such individual outliers.

Specifically, as shown in FIG. 5, discrete points are plotted on a two-dimensional plane with time as the X-axis and the number of audio segments in the cache module as the Y-axis; each discrete point in FIG. 5 represents the number of audio segments collected at the corresponding collection time point. The straight line in FIG. 5 is obtained by performing linear regression on the discrete points. The slope of this line (denoted as slope 1) represents the trend of increase or decrease of audio segments in the cache module, that is, the number of audio segments added or removed per unit time. When slope 1 is positive, it indicates the number of audio segments added per unit time, which means that the playback speed (or delivery speed) of the audio source device 100 is faster than the playback speed of the audio playback device 200. When slope 1 is negative, it indicates the number of audio segments removed per unit time, which means that the playback speed (or delivery speed) of the audio source device 100 is slower than the playback speed of the audio playback device 200.
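The linear regression of FIG. 5 can be carried out as ordinary least squares on the (collection time, segment count) pairs. A minimal Python version is sketched below; the function name `fit_slope` is hypothetical, and with collection times in seconds the returned slope 1 is in audio segments per second.

```python
def fit_slope(samples: list[tuple[float, int]]) -> float:
    """Ordinary least-squares slope of buffered-segment count versus time."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in samples)
    sxy = sum((t - mean_t) * (c - mean_c) for t, c in samples)
    if sxx == 0:
        raise ValueError("collection times must differ")
    return sxy / sxx
```

A single noisy sample, for example a short burst of late packets, shifts the fitted slope only slightly instead of dominating it, which is why a fit is preferred over comparing just the first and last counts.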

Then, formula (1) can be used to calculate the adjustment coefficient:

$$\text{adjustment coefficient} = 1 - \frac{\text{slope 1} \times \text{value of audio segment playback duration}}{1000} \qquad (1)$$
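Formula (1) translates directly into code. This Python sketch assumes that slope 1 is measured in audio segments per second (that is, collection times on the X-axis are in seconds); under that reading, slope 1 multiplied by the segment duration in ms and divided by 1000 is the number of seconds of audio gained or lost in the cache per second. The function name is hypothetical.

```python
def adjustment_coefficient(slope1: float, segment_ms: float) -> float:
    """Formula (1): 1 - slope1 * segment playback duration (ms value) / 1000.

    slope1 is assumed to be in audio segments per second, so
    slope1 * segment_ms / 1000 is the seconds of audio accumulated (or drained)
    in the cache per second of elapsed time.
    """
    return 1.0 - slope1 * segment_ms / 1000.0


print(adjustment_coefficient(0.1, 10))   # source slightly faster: below 1
print(adjustment_coefficient(-0.1, 10))  # source slightly slower: above 1
```

A flat trend (slope 1 = 0) leaves the coefficient at its initial value of 1.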

Here, the value of the audio segment playback duration is the numeric value of the playback duration expressed in milliseconds, taken without a unit; the adjustment coefficient and slope 1 are likewise treated as unitless numbers.

After obtaining the latest adjustment coefficient, the adjustment coefficient calculation module clears the data in sampling queue 1. The adjustment coefficient for the next cycle is then calculated from the numbers of audio segments in the cache module collected during that cycle.

Formula (2) was given in step S404c:

$$\text{expected playback time point of the } N\text{th audio segment} = \text{expected playback time point of the first audio segment} + (N-1) \times \text{audio segment playback duration} \times 1000 \times \text{adjustment coefficient} \qquad (2)$$

It can be understood that when slope 1 is positive, the playback speed (or delivery speed) of the audio source device 100 is faster than the playback speed of the audio playback device 200, so the playback progress of the audio source device 100 is also ahead of that of the audio playback device 200. From formula (1), the adjustment coefficient is then less than 1, and from formula (2), the expected playback time point of the Nth audio segment becomes smaller (compared with an adjustment coefficient of 1), that is, it is moved earlier. In other words, the playback progress of the audio playback device 200 is accelerated, helping it catch up with the playback progress of the audio source device 100 as soon as possible.

When slope 1 is negative, the playback speed (or delivery speed) of the audio source device 100 is slower than the playback speed of the audio playback device 200, so the playback progress of the audio source device 100 is also behind that of the audio playback device 200. From formula (1), the adjustment coefficient is then greater than 1, and from formula (2), the expected playback time point of the Nth audio segment becomes larger (compared with an adjustment coefficient of 1), that is, it is delayed. In other words, the playback progress of the audio playback device 200 is slowed down to match the playback progress of the audio source device 100.
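A worked example with hypothetical values makes the direction of the correction concrete. Assume slope 1 = 0.1 audio segments per second (collection times measured in seconds), an audio segment playback duration of 10 ms, and N = 1001:

$$\text{adjustment coefficient} = 1 - \frac{0.1 \times 10}{1000} = 0.999$$

$$(1001 - 1) \times 10 \times 1000 \times 0.999\ \mu\text{s} = 9\,990\,000\ \mu\text{s}$$

With an adjustment coefficient of 1 the same offset would be 10 000 000 μs, so the expected playback time point of the 1001st audio segment moves 10 ms earlier. With slope 1 = -0.1 the coefficient becomes 1.001, the offset becomes 10 010 000 μs, and the same time point moves 10 ms later.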

To sum up, the change trend of the number of audio segments buffered in the audio playback device 200 reflects the difference between the playback speed of the audio playback device 200 and the playback speed (or delivery speed) of the audio source device 100. By updating the adjustment coefficient according to this trend and then calculating the expected playback time points of the audio segments from the adjustment coefficient, the playback speed of the audio playback device 200 can be kept consistent with the playback speed (or delivery speed) of the audio source device 100. In this way, data overflow or exhaustion in the buffer area of the audio playback device 200 is avoided, which in turn avoids stuttering or popping during playback and improves the listening experience of externally played audio.

In other embodiments of the present application, the playback progress of the audio playback device 200 and the audio source device 100 may diverge because of hardware differences between the two devices (e.g., different crystal oscillator frequencies). To address this, the audio playback device 200 can also adjust its own playback speed to match the playback speed of the audio source device 100, so that it keeps its playback progress consistent with that of the audio source device 100 over long periods.

The following description uses, as an example, an audio playback device 200 that includes a distribution module, a playback module, a cache module, and a playback speed module.

FIG. 6 shows the flow of another audio playback method provided by the present application. As shown in FIG. 6, the audio playback method includes steps S401 to S403, S404c to S404e, S405a, S601 to S611, and S406.

For steps S401 to S403, S404c to S404e, S405a, and S406, refer to the related description of FIG. 4. Details are not repeated here.

S601. The playback module obtains the content of the audio segment from the cache module.

When the current time equals the expected playback time point of the first audio segment, the distribution module notifies the playback module to start audio playback. The playback module then reads the content of the first audio segment and the subsequent audio segments from the cache module in sequence.

S602. The playback module writes the acquired audio segment content to the audio output driver.

The playback module writes the read audio segments in sequence into the audio output driver in the playback module (e.g., ALSA), and the audio output driver plays the currently written audio segment.

S603: The playback module invokes the interface of the audio output driver to query the speaker expected output time of the currently written audio segment, that is, the actual playback time point of the currently written audio segment.

After writing an audio segment to the audio output driver, the playback module also calls the interface of the audio output driver to query the speaker expected output time of the currently written audio segment. This speaker expected output time can be regarded as the actual playback time point of the audio segment.
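How the speaker expected output time is obtained depends on the audio output driver. One possible illustration uses ALSA's `snd_pcm_delay()`, which reports, in frames, how long a frame written right after the call will take to become audible. If that value is read immediately after a segment is written, the start time of that segment can be estimated as shown below. The function and parameter names are hypothetical. The driver query is represented by an input value, not by a real ALSA call.

```python
def estimate_actual_playback_time(
    query_time_s: float,
    delay_frames: int,
    segment_frames: int,
    sample_rate_hz: int,
) -> float:
    """Estimate when the first frame of the just-written segment becomes audible.

    Assumption: delay_frames is the driver-reported delay for a frame written right
    after the query (as ALSA's snd_pcm_delay() defines it for playback), so the
    just-written segment occupies the last segment_frames frames of that delay.
    """
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    frames_ahead_of_segment = delay_frames - segment_frames
    # A result earlier than query_time_s means the segment has already started playing.
    return query_time_s + frames_ahead_of_segment / sample_rate_hz


def playback_difference(actual_s: float, expected_s: float) -> float:
    """Difference used in S604: positive means playback is running late."""
    return actual_s - expected_s
```

For example, consider a 48 kHz stream with a reported delay of 4800 frames and a 960-frame (20 ms) segment. The segment starts about 0.08 s after the query time.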

S604: The playback module calculates the difference between the actual playback time point of the audio fragment and the expected playback time point (eg, playtime) carried in the audio fragment, and determines whether the absolute value of the difference is greater than the preset threshold A.

In some embodiments, the playback module compares the absolute value of the difference with a preset threshold A (for example, 1 second).

If the absolute value of the difference is greater than threshold A, the playback progress of the audio playback device 200 differs considerably from that of the audio source device 100. Step S605 can then be executed to quickly reduce this difference.

If the absolute value of the difference is less than or equal to threshold A, the difference is small, and steps S606 to S611 may be performed. That is, the playback speed of the audio playback device 200 is adjusted so that its playback progress stays consistent with that of the audio source device 100 and its playback speed matches that of the audio source device 100.

Of course, in some other embodiments, the comparison with threshold A may be skipped, and the playback speed of the audio playback device 200 is adjusted directly through the following steps. That is, after calculating the difference between the actual playback time point of the audio segment and the expected playback time point (such as playtime) carried in the audio segment, the playback module does not compare the absolute value of the difference with threshold A but directly executes step S606.
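The threshold comparison in S604 amounts to a two-way branch. The sketch below assumes times are in seconds and uses the 1-second example value for threshold A. The `Action` names are hypothetical labels for the two branches.

```python
from enum import Enum
from typing import Tuple


class Action(Enum):
    """Which branch of the FIG. 6 flow handles the segment (names are illustrative)."""
    ADJUST_CACHE = "S605"  # large gap: add or delete cached segments
    ADJUST_SPEED = "S606"  # small gap: feed the playback speed module


def choose_action(
    actual_s: float, expected_s: float, threshold_a_s: float = 1.0
) -> Tuple[Action, float]:
    """Compare |actual - expected| with preset threshold A and pick the branch."""
    difference = actual_s - expected_s
    if abs(difference) > threshold_a_s:
        return Action.ADJUST_CACHE, difference
    # Equal to the threshold counts as "small", matching "less than or equal to".
    return Action.ADJUST_SPEED, difference
```

In the embodiments that skip the comparison, every segment would simply take the `ADJUST_SPEED` branch.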

S605. The playback module notifies the cache module to delete or add audio segments.

After determining that the absolute value of the difference between the actual playback time point and the expected playback time point of the audio segment is greater than the preset threshold A, the playback module checks which of the two time points is later. Based on this, it decides whether to delete one or more audio segments or to add one or more audio segments.

If the actual playback time point of the audio segment is later than the expected playback time point carried in the audio segment, the playback progress of the audio playback device 200 is behind that of the audio source device 100. In this case, the playback module may notify the cache module to delete some audio segments, so that the audio playback device 200 catches up with the playback progress of the audio source device 100 as soon as possible.

If the actual playback time point of the audio segment is earlier than the expected playback time point carried in the audio segment, the playback progress of the audio playback device 200 is ahead of that of the audio source device 100. In this case, the playback module may notify the cache module to add some audio segments. The added audio segments may be mute audio data, a copy of the currently written audio segment, or other audio data. Adding them is equivalent to the audio playback device 200 waiting for the corresponding time before playing the subsequent audio segments. As a result, the audio playback device 200 and the audio source device 100 reach the same playback progress.
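S605 can be sketched using the sign convention from the claims. The difference is the actual time minus the expected time: a positive difference means segments are deleted, and a negative difference means segments are added. The number of segments is the absolute difference divided by one segment's playback duration. The following choices are assumptions made for illustration:

- rounding the count down;
- the function names;
- the 16-bit stereo PCM format of the filler.

```python
import math
from typing import Tuple


def cache_correction(difference_s: float, segment_duration_s: float) -> Tuple[str, int]:
    """Decide how many cached segments to delete or add when |difference| exceeds threshold A.

    difference_s is actual playback time minus expected playback time. The count is
    |difference| / segment duration rounded down, one reading of "quotient".
    """
    if segment_duration_s <= 0:
        raise ValueError("segment duration must be positive")
    count = math.floor(abs(difference_s) / segment_duration_s)
    if difference_s > 0:
        return "delete", count  # playing late: drop segments to skip ahead
    if difference_s < 0:
        return "add", count  # playing early: insert filler (e.g. mute) segments
    return "none", 0


def mute_segment(frames: int, channels: int = 2, bytes_per_sample: int = 2) -> bytes:
    """Silent filler for signed linear PCM, where silence is all-zero samples."""
    return bytes(frames * channels * bytes_per_sample)
```

For example, a device running 1.5 s late with 0.25 s segments would delete 6 segments. A device running 1.3 s early with 0.5 s segments would add 2 mute segments.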

Subsequently, the playback module continues to monitor the difference between the actual playback time point and the expected playback time point of subsequent audio segments. It determines whether the absolute value of the difference is greater than the preset threshold A and takes the corresponding action.

S606: The playback module sends the actual playback time point of the audio segment and the corresponding difference to the playback speed module.

To adjust the playback speed of the audio playback device 200, the device needs to know how the difference changes over the actual playback time. Therefore, when the playback module determines that the absolute value of the difference is less than or equal to the preset threshold A, it sends the difference and the corresponding actual playback time point to the playback speed module.

S607: The playback speed module stores the actual playback time point of the audio segment and its corresponding difference.

Exemplarily, the playback speed module records each received actual playback time point and its corresponding difference in sampling queue 2. The content stored in sampling queue 2 is shown in Table 4.

Table 4

S608: The playback speed module calculates the playback speed deviation according to the actual playback time points of the audio segments and their corresponding differences.

When the playback speed module determines that a certain condition is met, it may calculate the deviation between the playback speeds of the audio playback device 200 and the audio source device 100 from the data in sampling queue 2. For example, the condition may be that the number of data entries in sampling queue 2 reaches a predetermined number, such as 100, where each row of Table 4 is one entry.
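A minimal sketch of sampling queue 2 and its trigger condition follows, assuming the example value of 100 entries. Clearing the queue after each calculation is an assumption. A sliding window that keeps the most recent entries would be an equally plausible reading. The class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, List, Tuple


class SamplingQueue:
    """Hypothetical model of sampling queue 2: (actual playback time, difference) pairs."""

    def __init__(self, required_entries: int = 100) -> None:
        self.required_entries = required_entries
        self._entries: Deque[Tuple[float, float]] = deque()

    def add(self, actual_s: float, difference_s: float) -> bool:
        """Record one entry; return True once enough entries exist to compute the deviation."""
        self._entries.append((actual_s, difference_s))
        return len(self._entries) >= self.required_entries

    def drain(self) -> List[Tuple[float, float]]:
        """Hand the collected entries to the fitting step and start a fresh batch."""
        batch = list(self._entries)
        self._entries.clear()
        return batch
```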

In some embodiments, the playback speed module may perform linear fitting on the data in sampling queue 2 to obtain the variation trend of the deviation between the playback speeds of the audio playback device 200 and the audio source device 100.

Specifically, the samples are plotted as discrete points on a two-dimensional plane. The X-axis is the actual playback time point, and the Y-axis is the difference between the actual playback time point and the expected playback time point. Linear regression on these points yields a fitted straight line. The slope of this line (referred to as slope 2) represents the deviation between the actual playback speed and the expected playback speed, that is, how much the deviation grows per unit of time.

- Slope 2 positive, as shown in (1) of FIG. 7: the actual playback speed of the audio playback device 200 is slower than the expected playback speed, and its playback speed needs to be increased.
- Slope 2 negative, as shown in (2) of FIG. 7: the actual playback speed of the audio playback device 200 is faster than the expected playback speed, and its playback speed needs to be decreased.
- Slope 2 zero: the actual playback speed of the audio playback device 200 equals the expected playback speed, and no adjustment is needed.
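The fitting step in S608 can be sketched with an ordinary least-squares slope. The sketch assumes each sample is a pair (actual playback time, difference), both in seconds, so slope 2 is dimensionless: seconds of drift per second of playback. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_slope2(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of difference (Y) versus actual playback time (X)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("actual playback time points must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return sxy / sxx


def speed_correction_direction(slope2: float) -> str:
    """Map the sign of slope 2 to the adjustment described for FIG. 7."""
    if slope2 > 0:
        return "increase"  # falling further behind: play faster
    if slope2 < 0:
        return "decrease"  # pulling ahead: play slower
    return "keep"
```

For example, if the difference grows by 1 ms every 10 s, slope 2 is about 0.0001. The device is falling behind by 0.1 ms per second and should speed up.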

S609, the playback speed module calculates the target playback speed according to the deviation of the playback speed.

The target playback speed is the desired playback speed of the audio playback device 200. It can be calculated using formula (3):

$$\text{target playback speed} = \text{current playback speed} \times (1 + \text{slope 2}) \tag{3}$$

It can be understood that when the slope 2 is a positive value, the target playback speed calculated according to formula (3) will increase. When the slope 2 is a negative value, the target playback speed calculated according to formula (3) will decrease.

In other examples, a preset threshold B can also be set. When the absolute value of slope 2 is less than threshold B, the deviation between the actual and expected playback of the audio playback device 200 is considered small, and its playback speed does not need to be adjusted. When the absolute value of slope 2 is greater than or equal to threshold B, formula (3) is used to adjust the playback speed of the audio playback device 200.
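Formula (3) and the optional threshold B can be combined into a single function. The sketch assumes the following:

- playback speed is a rate multiplier, where 1.0 is the nominal speed;
- slope 2 is dimensionless, measured in seconds of drift per second.

Applying the resulting multiplier to the audio output driver is device-specific and is not shown.

```python
def target_playback_speed(current_speed: float, slope2: float, threshold_b: float) -> float:
    """Formula (3) with the optional dead band of preset threshold B."""
    if threshold_b < 0:
        raise ValueError("threshold B must be non-negative")
    if abs(slope2) < threshold_b:
        return current_speed  # deviation small enough: leave speed unchanged
    return current_speed * (1.0 + slope2)
```

For example, `target_playback_speed(1.0, 0.0001, 0.00005)` gives about 1.0001. A slope of 0.00001 with the same threshold B leaves the speed at 1.0.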

S610. The playback speed module sends the target playback speed to the playback module.

S611. The playback module modifies the playback speed value of the audio output driver to the target playback speed.

For example, the playback module adjusts the playback speed of the audio output driver to (1+slope 2) times the current speed.

To sum up, the playback speed of the audio playback device 200 is adjusted according to the difference between the actual playback time point and the expected playback time point of its audio segments. The adjustment continues until the playback speed is consistent with the playback speed (or delivery speed) of the audio source device 100. This avoids a persistent speed mismatch between the two devices, which would otherwise force the audio playback device 200 to frequently add or delete audio segments in its cache.

In still other embodiments of the present application, the technical solution described in FIG. 4 and the technical solution described in FIG. 6 may be combined.

First, the adjustment coefficient is calculated from the change trend of the number of audio segments cached in the audio playback device 200. The expected playback time point of each audio segment is then calculated according to the adjustment coefficient. This aligns the time points at which the audio playback device 200 and the audio source device 100 expect each audio segment to start playing.

Next, the audio playback device 200 further adjusts its own playback speed to match the playback speed of the audio source device 100. As a result, the audio playback device 200 keeps its playback progress consistent with that of the audio source device 100 over a long period.

In still other embodiments of the present application, the audio playback device 200 is further connected to another audio playback device, for example, an audio playback device 300. The two devices play the audio content jointly. In this case, the playback speed of the audio playback device 200 should be consistent with the playback speed (or delivery speed) of the audio source device 100. The playback speed of the audio playback device 300 should be consistent with it as well.

FIG. 8 shows the flow of another audio playback method provided by the present application. As shown in FIG. 8, the audio playback method includes steps S401 to S406 and steps S801 to S805.

For steps S401 to S406, refer to the related content of the flow in FIG. 4. Only the differences from the flow in FIG. 4 are described here.

First, a wired connection or a wireless connection may be established between the audio playback device 200 and the audio playback device 300, where the wireless connection may be, for example, Bluetooth, WLAN, NFC, or the like.

S801, the audio playback device 200 is time synchronized with the audio playback device 300.

In some examples, the audio playback device 200 and the audio playback device 300 establish a wireless connection and perform time synchronization before they play audio content together. For example, the two devices may perform time synchronization at any of the following points:

- after the audio playback device 200 receives the audio stream (that is, step S402);
- after the audio playback device 200 sends the audio segments to the audio playback device 300 (that is, step S802);
- after the audio playback device 200 notifies the audio playback device 300 to start audio playback (that is, step S804).

Specifically, the audio playback device 200 (for example, its time synchronization module) may synchronize its time with the audio playback device 300 (for example, the time synchronization module of the audio playback device 300). It may do so using the Simple Network Time Protocol (SNTP), the Precision Time Protocol (PTP), or a similar protocol.
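For the SNTP option, the clock offset and round-trip delay follow from the four timestamps of one request/response exchange, using the standard NTP on-wire formulas. The sketch assumes the audio playback device 300 acts as the client and the audio playback device 200 as the time reference. The class and field names are hypothetical. PTP derives its offset from a different message sequence, which is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SntpExchange:
    """Timestamps of one SNTP request/response exchange, in seconds."""
    t1: float  # request sent, client (device 300) clock
    t2: float  # request received, reference (device 200) clock
    t3: float  # response sent, reference clock
    t4: float  # response received, client clock

    def offset(self) -> float:
        """Amount to add to the client clock so that it matches the reference clock."""
        return ((self.t2 - self.t1) + (self.t3 - self.t4)) / 2

    def round_trip_delay(self) -> float:
        """Network round-trip time, excluding the reference device's processing time."""
        return (self.t4 - self.t1) - (self.t3 - self.t2)
```

In this example, the client clock is 0.5 s behind, the one-way latency is 10 ms, and the processing time is 2 ms. `SntpExchange(100.0, 100.51, 100.512, 100.022)` gives an offset of about 0.5 s and a round-trip delay of about 0.02 s.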

S802: The audio playback device 200 sends the audio fragment to the audio playback device 300.

Exemplarily, after performing step S404c, the distribution module of the audio playback device 200 sends the obtained audio segments to two destinations: its own cache module (that is, step S404d) and the audio playback device 300 (for example, the cache module of the audio playback device 300).

S803. The audio playback device 300 caches the audio fragment.

Exemplarily, the cache module of the audio playback device 300 caches the received audio segments.

S804, the audio playback device 200 notifies the audio playback device 300 to start audio playback.

Exemplarily, the distribution module of the audio playback device 200 may determine that the current time is later than or equal to the expected playback time point of the first audio segment. It then notifies its own playback module to start audio playback (that is, step S405a). It also notifies the audio playback device 300 (for example, the playback module of the audio playback device 300) to start audio playback.

S805. The audio playback device 300 starts audio playback.

Exemplarily, the audio playback device 300 reads the first audio segment and subsequent audio segments from its own cache module, and starts playing.

It should be noted that each audio segment received by the audio playback device 300 carries an expected playback time point. The audio playback device 200 periodically updates this time point according to the playback progress difference between the audio source device 100 and the audio playback device 200.

The playback process of the audio playback device 300 is similar to that of the audio playback device 200 (refer to the related content in step S405b). When playing an audio segment, the audio playback device 300 writes the audio segment into its audio output driver (for example, ALSA). It then calls the interface of the audio output driver to read the speaker output time point of the currently written audio segment.

This speaker output time point can be regarded as the time point at which the audio playback device 300 actually plays the currently written audio segment, referred to as the actual playback time point of that segment. The audio playback device 300 compares the actual playback time point with the expected playback time point of the currently written audio segment. Based on this comparison, it deletes or adds audio segments in its cache module, so that its playback speed stays consistent with the playback speed (or delivery speed) of the audio source device 100.

FIG. 9 shows the flow of another audio playback method provided by the present application. As shown in FIG. 9, the method includes steps S401 to S404, steps S405a, S406, and S601 to S611, steps S801 to S804, and steps S901 to S911.

This flow differs from the flow in FIG. 8 in one respect. The audio playback device 200 records the difference between the actual playback time point of each audio segment and the expected playback time point carried in that segment. It then adjusts its playback speed according to the difference, so that its playback speed is consistent with that of the audio source device 100. That is, the audio playback device 200 executes steps S601 to S611. For details, refer to the related content of the flow in FIG. 6.

In addition, similar to the adjustment of the playback speed of the audio playback device 200 , the audio playback device 300 also uses a similar method to adjust its own playback speed, so that the playback speed of the audio playback device 300 is also consistent with the playback speed of the audio source device 100 . That is, the audio playback device 300 executes steps S901 to S911. In some examples, the audio playback device 300 further includes a playback speed module.

S901. The playback module of the audio playback device 300 acquires the content of the audio segment from the cache module.

S902, the playback module writes the acquired audio segment content to the audio output driver.

S903: The playback module invokes the interface of the audio output driver to query the speaker expected output time of the currently written audio segment, that is, the actual playback time point of the currently written audio segment.

S904: The playback module calculates the difference between the actual playback time point of the audio segment and the expected playback time point (e.g., playtime) carried in the audio segment, and determines whether the absolute value of the difference is greater than the preset threshold A.

S905, the playback module notifies the cache module to delete or add audio segments.

S906. The playback module sends the actual playback time point of the audio segment and the corresponding difference to the playback speed module.

S907: The playback speed module stores the actual playback time point of the audio segment and its corresponding difference.

S908. The playback speed module calculates the playback speed deviation according to the actual playback time points of the audio segments and their corresponding differences.

S909, the playback speed module calculates the target playback speed according to the deviation of the playback speed.

S910. The playback speed module sends the target playback speed to the playback module.

S911, the playback module modifies the playback speed value of the audio output driver to the target playback speed.

The specific contents of steps S901 to S911 may refer to the related contents of steps S601 to S611 in FIG. 6 , which will not be repeated here.

It should be noted that the target playback speed of the audio playback device 300 calculated in steps S901 to S911 is the same or approximately the same as the target playback speed of the audio playback device 200 calculated in steps S601 to S611.

Of course, in some other embodiments, the audio playback device 200 may determine the expected playback time point of each audio segment using an existing technical solution, that is, without using adjustment coefficients. Instead, it directly adjusts its playback speed according to the difference between the actual playback time point and the expected playback time point of each audio segment. The adjustment continues until its playback speed is consistent with the playback speed (or delivery speed) of the audio source device.

Likewise, the audio playback device 300 can directly determine the actual playback time point of each audio segment through its own audio output driver. It then calculates the difference between the actual playback time point and the expected playback time point of each audio segment. It adjusts its playback speed until it is consistent with the playback speed (or delivery speed) of the audio source device.

In FIG. 8 and FIG. 9, the audio playback device 200 may be a master speaker or a master earphone, and the audio playback device 300 may be a slave speaker or a slave earphone. Optionally, the audio playback device 200 may be a master speaker and the audio playback device 300 a slave earphone. Optionally, the audio playback device 200 may be a master earphone and the audio playback device 300 a slave speaker.

It should be noted that the audio playback device is not limited to special audio playback devices such as speakers and earphones, and may also be a composite device such as a mobile device with speakers.

It should be noted that, all or part of any features of the above-mentioned embodiments of the present application can be freely combined to obtain technical solutions, which are also within the scope of the present application.

The embodiments of the present application also provide a chip system. As shown in FIG. 10 , the chip system includes at least one processor 2101 and at least one interface circuit 1102 . The processor 2101 and the interface circuit 1102 may be interconnected by wires. For example, the interface circuit 1102 may be used to receive signals from other devices (eg, the memory of the audio playback device 200). As another example, the interface circuit 1102 may be used to send signals to other devices (eg, the processor 2101). Exemplarily, the interface circuit 1102 may read the instructions stored in the memory and send the instructions to the processor 2101 . When the instructions are executed by the processor 2101, the electronic device can be made to execute various steps executed by the audio playback device 200 (eg, a sound box) in the above-mentioned embodiment. Of course, the chip system may also include other discrete devices, which are not specifically limited in this embodiment of the present application.

It can be understood that, in order to realize the above-mentioned functions, the above-mentioned terminal and the like include corresponding hardware structures and/or software modules for executing each function. Those skilled in the art should easily realize that, in conjunction with the units and algorithm steps of each example described in the embodiments disclosed herein, the embodiments of the present application can be implemented in hardware or a combination of hardware and computer software. Whether a function is performed by hardware or computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of the embodiments of the present invention.

In the embodiments of the present application, the above terminal and the like may be divided into functional modules according to the above method examples. For example, one functional module may be provided for each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division of modules in the embodiments of the present invention is schematic and is only a logical function division; other division manners may be used in actual implementation.

From the description of the above embodiments, those skilled in the art can clearly understand the following. For convenience and brevity of description, only the division of the above functional modules is used as an example. In practical applications, the above functions can be allocated to different functional modules as required. That is, the internal structure of the apparatus can be divided into different functional modules to complete all or part of the functions described above. For the specific working processes of the system, apparatus, and units described above, refer to the corresponding processes in the foregoing method embodiments. Details are not repeated here.

The functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application may be embodied in the form of a software product. This may cover the solutions in essence, the part that contributes to the prior art, or all or part of the technical solutions.

The computer software product is stored in a storage medium. It includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as flash memory, a removable hard disk, read-only memory, random access memory, a magnetic disk, or an optical disc.

The above are only specific embodiments of the present application, and the protection scope of the present application is not limited to them. Any change or substitution within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

An audio playback method, applied to a first audio playback device that communicates wirelessly with the audio source device, wherein the method includes:

receiving audio data sent by the audio source device;

dividing the audio data into N audio segments;

Cache the N audio fragments; wherein, the expected playback time point of each audio fragment is obtained according to the first adjustment coefficient;

Play each audio segment in turn;

Periodically collect the current number of buffered audio fragments and the collection time point corresponding to the current number;

After the period of periodical collection reaches the preset period, or after the number of times of periodical collection reaches the preset number of times,

According to the current quantity collected each time and the collection time point corresponding to the current quantity collected each time, a second adjustment coefficient is obtained; according to the second adjustment coefficient, the expected playback time point of each subsequent audio fragment is obtained;

Play subsequent audio segments in sequence;

Wherein, N is a positive integer greater than or equal to 2; the first adjustment coefficient is a preset coefficient.
The method according to claim 1, wherein before receiving the audio data sent by the audio source device, the method further comprises:

Received an instruction to play audio from the audio source device.
The method according to claim 1 or 2, wherein the obtaining the second adjustment coefficient according to the current quantity collected each time and the collection time point corresponding to the current quantity collected each time; comprising:

Perform linear fitting on the current quantity collected each time and the collection time point corresponding to the current quantity collected each time to obtain the first slope;

A second adjustment coefficient is obtained according to the first slope.
The method according to any one of claims 1-3, wherein the periodically collecting the current number of buffered audio fragments and the collection time point corresponding to the current number; comprising:

When the absolute value of the difference between the actual playback time point and the expected playback time point of any audio segment is greater than the first threshold, the first audio playback device starts to periodically collect the current number of buffered audio segments and the collection time point corresponding to the current quantity;

Wherein, the actual playback time point of the audio fragment is the expected output time point of the speaker of the audio fragment; the speaker expected output time point of the audio fragment is when the first audio playback device calls the first audio Obtained by querying the interface driven by the audio output of the playback device.
An audio playback method, applied to a first audio playback device that communicates wirelessly with the audio source device, wherein the method includes:

receiving audio data sent by the audio source device;

dividing the audio data into N audio segments;

Cache the N audio fragments; wherein, the expected playback time point of each audio fragment is obtained according to the first adjustment coefficient;

Play each audio segment in turn;

After the absolute value of the difference between the actual playback time point of the audio fragment and the expected playback time point is greater than the preset threshold, adjust the number of buffered audio fragments;

Wherein, the actual playback time point of the audio fragment is the expected output time point of the speaker of the audio fragment; the speaker expected output time point of the audio fragment calls the first audio through the first audio playback device Obtained by querying the interface of the audio output driver of the playback device; N is a positive integer greater than or equal to 2; and the first adjustment coefficient is a preset coefficient.
The method according to claim 5, wherein after the absolute value of the difference between the actual playback time point of the audio fragment and the expected playback time point is greater than a preset threshold, adjusting the buffered audio fragment Quantity; includes:

After the absolute value of the difference between the actual playback time point of the audio fragment and the expected playback time point is greater than the preset threshold, and the difference is a negative value, the first number of audio fragments is added;

when the absolute value of the difference between the actual playback time point and the expected playback time point of the audio segment is greater than the preset threshold and the difference is positive, deleting the first number of audio segments.
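A minimal sketch of the decision in claim 6, assuming the difference is computed as actual minus expected playback time point, so a negative value means the segment is played earlier than expected. The function name and the signed return convention are illustrative assumptions.

```python
def buffer_adjustment(actual_time: float, expected_time: float,
                      threshold: float, first_number: int) -> int:
    """Signed change to the number of cached segments.

    Assumes difference = actual - expected.
    Returns +first_number (add segments), -first_number (delete segments),
    or 0 when the deviation is within the threshold.
    """
    difference = actual_time - expected_time
    if abs(difference) <= threshold:
        return 0  # within tolerance: leave the cache unchanged
    if difference < 0:
        return first_number   # played too early: add segments to delay playback
    return -first_number      # played too late: delete segments to catch up


print(buffer_adjustment(0.90, 1.00, 0.05, 3))  # 3
print(buffer_adjustment(1.10, 1.00, 0.05, 3))  # -3
```

Adding segments pushes later audio back in time, and deleting them pulls it forward, which is why the sign of the difference selects the action.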
The method according to claim 6, wherein the first number is associated with the quotient of the absolute value of the difference divided by the playback duration of an audio segment.
The method according to claim 7, wherein the first number is the quotient of the absolute value of the difference divided by the playback duration of an audio segment.
The method according to any one of claims 6-8, wherein the added first number of audio segments consist of mute (silent) data.
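Claims 7 to 9 can be sketched together: the first number is taken as the integer quotient of |difference| by the segment playback duration, added segments are zero-filled silence, and removed segments are taken from the head of the cache. Times are kept in integer microseconds so the quotient is not disturbed by floating-point rounding. Placing the silent segments at the head of the queue, using zero bytes as silence (valid for signed PCM, not for unsigned 8-bit), and the names `first_number` and `adjust_cache` are assumptions, not details given in the claims.

```python
from collections import deque


def first_number(difference_us: int, segment_duration_us: int) -> int:
    """Integer quotient of |difference| divided by the segment playback duration."""
    return abs(difference_us) // segment_duration_us


def adjust_cache(cache: deque, difference_us: int, threshold_us: int,
                 segment_duration_us: int, segment_bytes: int) -> int:
    """Add silent segments or drop cached segments; return the signed change.

    Assumes difference = actual - expected, all times in integer microseconds.
    """
    if abs(difference_us) <= threshold_us:
        return 0
    count = first_number(difference_us, segment_duration_us)
    if difference_us < 0:
        # Playing early: insert zero-filled (silent, for signed PCM) segments
        # at the head of the queue so later audio starts later.
        for _ in range(count):
            cache.appendleft(bytes(segment_bytes))
        return count
    # Playing late: drop the segments that are next in line to catch up.
    removed = min(count, len(cache))
    for _ in range(removed):
        cache.popleft()
    return -removed


cache = deque([b"a", b"b", b"c", b"d"])
print(adjust_cache(cache, 50_000, 30_000, 20_000, 4), list(cache))  # -2 [b'c', b'd']
```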
The method according to any one of claims 5-9, wherein when the absolute value of the difference between the actual playback time point and the expected playback time point of the audio segment is less than or equal to the preset threshold, the playback speed of the first audio playback device is adjusted.
The method according to claim 10, wherein, when the absolute value of the difference between the actual playback time point and the expected playback time point of the audio segment is less than or equal to the preset threshold, adjusting the playback speed of the first audio playback device comprises:

collecting the actual playback time point of the audio segment and the difference between the actual playback time point and the expected playback time point of the audio segment;

after the number of collections reaches a preset number, or after the collection period reaches a preset duration,

performing linear fitting on the actual playback time points collected each time and the differences corresponding to those time points to obtain a second slope;

obtaining the current playback speed of the first audio playback device;

obtaining an adjusted playback speed according to the current playback speed and the second slope; and

playing subsequent audio segments in sequence at the adjusted playback speed.
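Claim 11 does not state how the adjusted speed is derived from the current speed and the second slope. One plausible reading, offered only as an assumption: if each segment holds T seconds of audio, the source delivers audio at rate r and the device plays at speed v, then the difference d = actual - expected grows with slope s = 1 - v/r against the actual playback time, so matching the source requires v_new = v / (1 - s). The sketch fits s by ordinary least squares and applies that formula; `fit_slope` and `adjusted_speed` are hypothetical names.

```python
from typing import Sequence


def fit_slope(times: Sequence[float], diffs: Sequence[float]) -> float:
    """Least-squares slope of difference versus actual playback time."""
    n = len(times)
    if n < 2 or n != len(diffs):
        raise ValueError("need at least two paired samples")
    mean_t = sum(times) / n
    mean_d = sum(diffs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("actual playback time points must not all be equal")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, diffs))
    return sxy / sxx


def adjusted_speed(current_speed: float, slope: float) -> float:
    """Assumed rule v_new = v / (1 - s); s > 0 means playback is falling behind."""
    if slope >= 1.0:
        raise ValueError("slope must be below 1")
    return current_speed / (1.0 - slope)


# Difference grows by 10 ms per second of playback: device is about 1% slow.
s = fit_slope([0.0, 1.0, 2.0, 3.0], [0.0, 0.01, 0.02, 0.03])
print(round(s, 6), round(adjusted_speed(1.0, s), 6))  # 0.01 1.010101
```

For small slopes, v / (1 - s) is close to v * (1 + s), so either form gives nearly the same correction.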
The method according to any one of claims 5-11, wherein the first audio playback device is connected to a second audio playback device, and the method further comprises:

sending the N audio segments to the second audio playback device.
The method according to claim 12, wherein before the first audio playback device plays the first audio segment, the method further comprises:

sending, to the second audio playback device, an instruction to start playing the audio segments.
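Claims 12 and 13 fix only an ordering: the segments are sent to the second device, and a start instruction is sent before the first device plays its first segment. The sketch models the link with a hypothetical `SecondaryLink` class that records messages; the message tuples and the choice to forward all segments before the start instruction are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SecondaryLink:
    """Hypothetical stand-in for the connection to the second playback device."""
    sent: List[Tuple] = field(default_factory=list)

    def send(self, message: Tuple) -> None:
        self.sent.append(message)  # a real link would transmit this message


def forward_and_start(link: SecondaryLink, segments: List[bytes],
                      play_first: Callable[[bytes], None]) -> None:
    # Send the N audio segments to the second device.
    for i, seg in enumerate(segments):
        link.send(("SEGMENT", i, seg))
    # Send the start instruction before local playback begins.
    link.send(("START",))
    play_first(segments[0])


link = SecondaryLink()
played = []
forward_and_start(link, [b"s0", b"s1"], played.append)
print(link.sent[-1], played)  # ('START',) [b's0']
```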
A first audio playback device, comprising a processor, an audio output device, and a memory, wherein the audio output device and the memory are both coupled to the processor, the memory is configured to store a computer program, and when the computer program is executed by the processor, the first audio playback device is caused to perform the method according to any one of claims 1-13.
A computer-readable storage medium, comprising a computer program, wherein when the computer program runs on a first audio playback device, the first audio playback device is caused to perform the method according to any one of claims 1-13.
A computer program product, wherein when the computer program product runs on a computer, the computer is caused to perform the method according to any one of claims 1-13.