CN114974321B - Audio playing method, equipment and system - Google Patents

Audio playing method, equipment and system

Info

Publication number
CN114974321B
Authority
CN
China
Prior art keywords
audio
playing
time point
speed
module
Prior art date
Legal status
Active
Application number
CN202110221879.2A
Other languages
Chinese (zh)
Other versions
CN114974321A (en)
Inventor
彭正元
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110221879.2A priority Critical patent/CN114974321B/en
Priority to PCT/CN2021/136897 priority patent/WO2022179246A1/en
Publication of CN114974321A publication Critical patent/CN114974321A/en
Application granted granted Critical
Publication of CN114974321B publication Critical patent/CN114974321B/en

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B 20/10 Digital recording or reproducing
    • G11B 20/10527 Audio or video recording; Data buffering arrangements
    • G11B 2020/10537 Audio or video recording
    • G11B 2020/10546 Audio or video recording specifically adapted for audio data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/4302 Content synchronisation processes, e.g. decoder synchronisation
    • H04N 21/4307 Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen

Abstract

An audio playing method, device, and system relate to the field of audio technologies and can bring the playing progress of an audio playing device into agreement with that of an audio source device, improving the user's listening experience. The audio playing method includes: after receiving audio data sent by the audio source device, the audio playing device divides the audio data into multiple audio slices and adjusts the expected playing time points of subsequent audio slices according to the trend in the number of audio slices held in its buffer, so as to stay consistent with the playing progress (or delivery progress) of the audio source device. Alternatively, the audio playing device may adjust its playing speed according to the deviation between the actual playing time point and the expected playing time point of each audio slice, so that its speed matches the playing speed (or delivery speed) of the audio source device and its playing progress stays consistent with that of the audio source device.

Description

Audio playing method, equipment and system
Technical Field
The present application relates to the field of audio technologies, and in particular, to an audio playing method, device, and system.
Background
A smart device can connect to an audio playing device (such as a headset or a speaker) over a wireless link such as Bluetooth or Wi-Fi and play audio content through the audio playing device while the smart device itself displays the video picture. Because the crystal oscillators of the two devices differ, their playing speeds differ as well, so the video picture played by the smart device and the audio content played by the audio playing device drift out of sync; the drift grows increasingly noticeable over time, degrading the user experience.
Disclosure of Invention
To solve these technical problems, the present application provides an audio playing method, device, and system. The technical solution provided by the application keeps the playing progress of the audio playing device consistent with that of the audio source device (i.e., the smart device), improving the user experience and in particular the listening experience.
In a first aspect, an audio playing method is provided, applied to a first audio playing device that is in wireless communication with an audio source device. The method includes: receiving audio data sent by the audio source device; dividing the audio data into N audio slices; buffering the N audio slices; obtaining an expected playing time point of each audio slice according to a first adjustment coefficient; playing each audio slice in sequence; periodically collecting the current number of buffered audio slices together with the acquisition time point of each collection; after the collection has run for a preset duration or a preset number of times, obtaining a second adjustment coefficient from the collected numbers and their corresponding acquisition time points; obtaining the expected playing time point of each subsequent audio slice according to the second adjustment coefficient; and playing the subsequent audio slices in sequence. N is a positive integer greater than or equal to 2, and the first adjustment coefficient is a preset coefficient.
It can be understood that the trend in the amount of data in the buffer of the first audio playing device reflects the deviation between the playing speeds of the audio source device and the first audio playing device. Therefore, the time point at which each audio slice is expected to be played (the expected playing time point for short) can be adjusted according to this trend, synchronizing the playing speeds of the first audio playing device and the audio source device. This avoids overflow or exhaustion of the first audio playing device's buffer, which in turn avoids stuttering or popping during playback and improves the listening experience of externally played audio.
In one possible implementation, before receiving the audio data sent by the audio source device, the method further includes: receiving an indication to play audio sent by the audio source device.
In one possible implementation, obtaining the second adjustment coefficient according to the collected numbers and their corresponding acquisition time points includes: performing a linear fit on the collected numbers against their acquisition time points to obtain a first slope; and obtaining the second adjustment coefficient from the first slope.
Illustratively, with time on the X-axis and the number of audio slices buffered by the first audio playing device on the Y-axis, discrete points are plotted on a two-dimensional plane; each point represents the number of buffered audio slices observed at the corresponding acquisition time point. A straight line is then fitted to these points by linear regression, and its slope (the first slope) characterizes the trend of the buffered audio slices increasing or decreasing, i.e., how many slices are gained or lost per unit time. A positive first slope means the number of buffered slices grows per unit time, which means the playing speed (or delivery speed) of the audio source device is faster than the playing speed of the first audio playing device. A negative first slope means the number of buffered slices shrinks per unit time, which means the playing speed (or delivery speed) of the audio source device is slower than that of the first audio playing device. An adjustment coefficient is then calculated from the first slope, so that the expected playing time points of subsequent audio slices can be corrected promptly and the playing speed of the first audio playing device matches the playing speed or delivery speed of the audio source device.
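As a minimal sketch of this fitting step (the sample values and names below are illustrative, not from the patent), the first slope can be estimated by ordinary least squares over the collected (acquisition time, buffered count) pairs:

    #include <cstdio>
    #include <vector>

    struct Sample {
        double t;      // acquisition time point, in seconds
        double count;  // number of buffered audio slices observed at time t
    };

    // Ordinary least-squares slope of count against time: audio slices gained
    // (positive) or lost (negative) per second.
    double firstSlope(const std::vector<Sample>& s) {
        double n = s.size(), sumT = 0, sumC = 0, sumTT = 0, sumTC = 0;
        for (const auto& p : s) {
            sumT += p.t; sumC += p.count;
            sumTT += p.t * p.t; sumTC += p.t * p.count;
        }
        return (n * sumTC - sumT * sumC) / (n * sumTT - sumT * sumT);
    }

    int main() {
        // Buffer grows by roughly 0.5 slices per second: the audio source is
        // playing (or delivering) faster than the first audio playing device.
        std::vector<Sample> queue = {{0, 100}, {1, 100.5}, {2, 101}, {3, 101.6}, {4, 102}};
        printf("first slope = %.3f slices/s\n", firstSlope(queue));
        return 0;
    }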
In one possible implementation, periodically collecting the current number of buffered audio slices and the corresponding acquisition time points includes: when the absolute value of the difference between the actual playing time point and the expected playing time point of any audio slice is greater than a first threshold, the first audio playing device starts to periodically collect the current number of buffered audio slices and the corresponding acquisition time points. The actual playing time point of an audio slice is its expected speaker-output time point, which the first audio playing device obtains by querying an interface of its audio output driver.
This provides a trigger for starting the collection of the number of buffered audio slices.
In a second aspect, an audio playing method is provided, applied to a first audio playing device that is in wireless communication with an audio source device. The method includes: receiving audio data sent by the audio source device; dividing the audio data into N audio slices; buffering the N audio slices; obtaining an expected playing time point of each audio slice according to a first adjustment coefficient; playing each audio slice in sequence; and, after the absolute value of the difference between the actual playing time point and the expected playing time point of an audio slice exceeds a preset threshold, adjusting the number of buffered audio slices. The actual playing time point of an audio slice is its expected speaker-output time point, which the first audio playing device obtains by calling an interface of its audio output driver. N is a positive integer greater than or equal to 2, and the first adjustment coefficient is a preset coefficient.
In this way, the playing speed of the first audio playing device can be adjusted to match the playing speed or delivery speed of the audio source, which helps keep the two devices' playing progress (or delivery progress) consistent over long periods.
In one possible implementation, adjusting the number of buffered audio slices after the absolute value of the difference between the actual and expected playing time points exceeds the preset threshold includes: adding a first number of audio slices when the absolute value of the difference exceeds the preset threshold and the difference is negative; and deleting a first number of audio slices when the absolute value of the difference exceeds the preset threshold and the difference is positive.
In one possible implementation, the first number is related to the quotient of the absolute value of the difference divided by the playing duration of one audio slice.
In one possible implementation, the first number is the quotient of the absolute value of the difference divided by the playing duration of one audio slice.
In one possible implementation, the added first number of audio slices consists of silence data.
In one possible implementation, the playing speed of the first audio playing device is adjusted when the absolute value of the difference between the actual and expected playing time points of the audio slices is smaller than or equal to the preset threshold.
In one possible implementation, when the absolute value of the difference between the actual and expected playing time points of the audio slices is smaller than or equal to the preset threshold, the actual playing time point of each audio slice and the difference between its actual and expected playing time points are collected; after the number of collections reaches a preset count or the collection has run for a preset duration, a linear fit is performed on the collected differences against the corresponding actual playing time points to obtain a second slope; the current playing speed of the first audio playing device is obtained; an adjusted playing speed is obtained from the current playing speed and the second slope; and subsequent audio slices are played in sequence at the adjusted playing speed.
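The following sketch illustrates this procedure under stated assumptions: the patent does not give a closed-form update rule, so the mapping from second slope to adjusted speed in the last lines (speeding up by the measured drift rate) is an illustrative assumption, as are all names and sample values:

    #include <cstdio>
    #include <vector>

    struct DriftSample {
        double t;      // actual playing time point, in seconds
        double drift;  // actual minus expected playing time point, in seconds
    };

    // Least-squares slope of drift against time: seconds of lag (positive) or
    // lead (negative) accumulated per second of playback.
    double secondSlope(const std::vector<DriftSample>& s) {
        double n = s.size(), sumT = 0, sumD = 0, sumTT = 0, sumTD = 0;
        for (const auto& p : s) {
            sumT += p.t; sumD += p.drift;
            sumTT += p.t * p.t; sumTD += p.t * p.drift;
        }
        return (n * sumTD - sumT * sumD) / (n * sumTT - sumT * sumT);
    }

    int main() {
        // Drift grows by about 2 ms per second: the device lags the audio source.
        std::vector<DriftSample> q = {{0, 0.000}, {1, 0.002}, {2, 0.004}, {3, 0.006}};
        double k = secondSlope(q);
        double currentSpeed = 1.0;                        // nominal playback rate
        double adjustedSpeed = currentSpeed * (1.0 + k);  // assumed rule: cancel the drift
        printf("second slope = %.4f, adjusted speed = %.4f\n", k, adjustedSpeed);
        return 0;
    }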
Thus, a specific method of calculating a speed deviation of a first audio playback device from an audio source device is provided.
In one possible implementation, the first audio playing device is connected to a second audio playing device, and the method further includes: sending the N audio slices to the second audio playing device.
That is, the first audio playing device may play audio together with the second audio playing device, and once the first audio playing device is synchronized with the audio source device, the second audio playing device is synchronized with the audio source device as well. Further, the second audio playing device may adjust its own playing speed using the same speed-adjustment method as the first audio playing device.
In one possible implementation, before the first audio playing device plays the first audio slice, the method further includes: sending the second audio playing device an indication to start playing the audio slices.
In a third aspect, a first audio playing device is provided. The first audio playing device includes a processor, and an audio output means and a memory that are both coupled to the processor. The memory stores a computer program which, when executed by the processor, causes the first audio playing device to perform the method of the first aspect or any possible implementation of the first aspect, or the method of the second aspect or any possible implementation of the second aspect.
In a fourth aspect, an apparatus is provided. The apparatus is included in a first audio playing device and has the functionality to implement the behavior of the first audio playing device in any of the above aspects and their possible implementations. The functionality may be implemented in hardware, or in hardware executing corresponding software. The hardware or software includes at least one module or unit corresponding to the functionality, such as a communication module or unit, a processing module or unit, or a playing module or unit.
In a fifth aspect, a computer-readable storage medium is provided. The computer-readable storage medium includes a computer program which, when run on a first audio playing device, causes the first audio playing device to perform the method of the first aspect or any possible implementation of the first aspect, or the method of the second aspect or any possible implementation of the second aspect.
In a sixth aspect, a computer program product is provided. The computer program product, when run on a computer, causes the computer to perform the method of the first aspect or any possible implementation of the first aspect, or the method of the second aspect or any possible implementation of the second aspect.
In a seventh aspect, a system on a chip is provided. The system on a chip includes a processor which, when executing instructions, performs the method of the first aspect or any possible implementation of the first aspect, or the method of the second aspect or any possible implementation of the second aspect.
In an eighth aspect, a system is provided. The system includes an audio source device and a first audio playing device that performs the method of the first aspect or any possible implementation of the first aspect, or the method of the second aspect or any possible implementation of the second aspect.
In a possible implementation manner, the system further includes a second audio playing device, where the second audio playing device performs the method in the second aspect and any one of the possible implementation manners of the second aspect.
It will be appreciated that, for the advantages achieved by the first audio playing device of the third aspect, the apparatus of the fourth aspect, the computer storage medium of the fifth aspect, the computer program product of the sixth aspect, the chip system of the seventh aspect, and the system of the eighth aspect, reference may be made to the advantages of any possible design of the first aspect or the second aspect; they are not repeated here.
Drawings
Fig. 1 is a schematic view of a scenario of an audio playing method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of an audio source device according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an audio playing device according to an embodiment of the present application;
fig. 4 is a flowchart of an audio playing method according to an embodiment of the present application;
fig. 5 is a schematic diagram of a method for fitting the trend in the number of buffered audio slices of an audio playing device according to an embodiment of the present application;
fig. 6 is a flowchart of an audio playing method according to an embodiment of the present application;
fig. 7 is a schematic diagram of a method for fitting the trend in the difference between the actual and expected playing time points of an audio playing device according to an embodiment of the present application;
fig. 8 is a flowchart of an audio playing method according to an embodiment of the present application;
fig. 9 is a flowchart of an audio playing method according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a chip system according to an embodiment of the present application.
Detailed Description
In the description of the embodiments of the present application, "/" means "or" unless otherwise indicated. For example, A/B may represent A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may represent: A exists alone, both A and B exist, or B exists alone.
The terms "first" and "second" are used below for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the embodiments of the present application, unless otherwise indicated, the meaning of "a plurality" is two or more.
In embodiments of the application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described as "exemplary" or "such as" in an embodiment is not to be construed as preferred or advantageous over other embodiments or designs. Rather, such words are intended to present related concepts in a concrete fashion.
Fig. 1 is a schematic view of a scenario of an audio playing method according to an embodiment of the present application. Fig. 1 shows a communication system provided by an embodiment of the present application. The communication system includes an audio source device 100 and an audio playback device 200. Optionally, the communication system may further comprise an audio playback device 300.
Wherein the audio source device 100 is used to provide audio content to the audio playback device 200. For example, the audio source device 100 in the embodiment of the present application may be, for example, a mobile phone, a tablet computer, a personal computer (personal computer, PC), a personal digital assistant (personal digital assistant, PDA), a netbook, a wearable device, an augmented reality (augmented reality, AR) device, a Virtual Reality (VR) device, a vehicle-mounted device, an intelligent screen, or the like, and the specific form of the audio source device 100 is not limited by the present application.
The audio playing device 200 is configured to receive the audio content sent from the audio source device 100 and play the audio content. By way of example, the audio playback device 200 may be, for example, a wireless earphone, a wireless speaker, a wearable device, an AR device, a VR device, etc., and the specific form of the audio playback device 200 is not particularly limited by the present application.
In an application scenario, when the audio source device 100 plays a video, the audio source device may play the picture content of the video through its own display screen, and send the audio content in the video to the audio playing device 200; the audio content is played by the audio playback device 200.
In general, to reduce the effect of network jitter during wireless transmission, the audio playing device 200 buffers the audio content received from the audio source device 100 and delays playback. Because the audio source device 100 and the audio playing device 200 are different devices, hardware differences (for example, different crystal oscillator frequencies) cause their playing speeds to differ, which can eventually make the data in the buffer of the audio playing device 200 overflow or run out, producing stutters or pops during playback.
In one approach, a maximum threshold and a minimum threshold may be set for the data buffered by the audio playing device 200. When the buffered data exceeds the maximum threshold, the playing speed of the audio playing device 200 is increased by a fixed ratio; when the buffered data falls below the minimum threshold, the playing speed is decreased by a fixed ratio. The amount of buffered data is thus kept within a predetermined range, reducing buffer overflow or exhaustion at the audio playing device 200.
In this scheme, when the wireless transmission speed is unstable, the amount of data in the buffer of the audio playing device 200 may change continuously and the playing speed may need frequent adjustment, so audio playback stutters and the user experience is poor. In addition, adjusting the playing speed by a fixed ratio does not match the actual speed deviation, so the adjustment precision is low.
In contrast, the technical solution provided by the application keeps the playing progress of the audio playing device consistent with that of the audio source device (i.e., the smart device), improving the user experience and in particular the listening experience.
Illustratively, fig. 2 shows a hardware structure of the audio source device 100.
The audio source device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It will be appreciated that the structure illustrated by the embodiments of the present application does not constitute a specific limitation on the audio source device 100. In other embodiments of the application, the audio source device 100 may include more or less components than illustrated, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the storage capabilities of the audio source device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer executable program code including instructions. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the audio source device 100 (e.g., audio data, phonebook, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like. The processor 110 performs various functional applications of the audio source device 100 and data processing by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
In some embodiments, the processor 110 may include one or more interfaces, including for example, the USB interface 130, which is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the audio source device 100, or may be used to transfer data between the audio source device 100 and a peripheral device. And can also be used for connecting with a headset, and playing audio through the headset. The interface may also be used to connect other electronic devices, such as AR devices, etc.
The wireless communication function of the audio source device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc. applied on the audio source device 100.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc., applied to the audio source device 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In the embodiment of the present application, the audio source device 100 may establish a communication connection with the audio playing device 200 through the wireless communication module 160 and send the audio content to be played to the audio playing device 200 over the wireless connection, so that the audio content is played by the audio playing device 200. The audio content to be played may be the sound content of a video or standalone audio such as music. In some examples, the audio playing device 200 may then forward the audio content to the audio playing device 300, so that the two devices play it together. The audio playing device 200 is the master playing device, the audio playing device 300 is the slave playing device, and there may be one or more audio playing devices 300.
The audio source device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like.
The audio source device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
Illustratively, fig. 3 shows a hardware structure of the audio playback apparatus 200.
The audio playback device 200 may include a processor 210, a memory 220, a wireless communication module 230, an antenna 240, a speaker 250, a power module 260, and the like. It should be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the audio playback apparatus 200. In other embodiments of the application, the audio playback device 200 may include more or less components than illustrated, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 210 may include one or more processing units such as, for example: processor 210 may include a distribution module, a play module, a cache module, an adjustment coefficient calculation module, a play speed module, and the like. Optionally, the processor 210 may further include a clock synchronization module or the like. The specific roles of the respective modules will be described in detail below in connection with specific embodiments. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
Memory 220 may be used to store computer executable program code that includes instructions. In some examples, memory 220 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, universal flash memory (universal flash storage, UFS), and the like. The processor 210 performs various functional applications and data processing of the audio playback device 200 by executing instructions stored in the memory 220 and/or instructions stored in a memory provided in the processor.
The wireless communication function of the audio playback apparatus 200 can be implemented by the antenna 240, the wireless communication module 230, a modem processor in the processor 210, a baseband processor, and the like.
The wireless communication module 230 may provide solutions for wireless communication applied on the audio playing device 200, including WLAN (e.g., Wi-Fi network), BT, GNSS, FM, NFC, IR, etc. The wireless communication module 230 may be one or more devices integrating at least one communication processing module. The wireless communication module 230 receives electromagnetic waves via the antenna 240, modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 210. The wireless communication module 230 may also receive a signal to be transmitted from the processor 210, frequency modulate and amplify it, and convert it into electromagnetic waves for radiation via the antenna 240.
In an embodiment of the present application, the audio playing device 200 may establish a communication connection with the audio source device 100 through the wireless communication module 230, receive the audio content transmitted by the audio source device 100 over the wireless connection, and play the audio content through the speaker 250. In some examples, the audio playing device 200 may also establish communication connections with other audio playing devices 300 through the wireless communication module 230, forward the audio content to the audio playing device 300, and play it together with the audio playing device 300, producing sound with stereo effects. The audio playing device 200 is the master playing device, the audio playing device 300 is the slave playing device, and there may be one or more audio playing devices 300.
The power module 260 provides power to the various components of the audio playback device 200, such as the processor 210, the memory 220, the wireless communication module 230, and the like.
It should be noted that, the structure of the audio playing device 300 may refer to the audio playing device 200, and of course, the structure of the audio playing device 300 may be the same as or different from the structure of the audio playing device 200, which is not limited by the present application.
The technical solution provided by the embodiment of the present application is applicable to the communication system shown in fig. 1, where the audio source device 100 has the structure shown in fig. 2, and the audio playing device 200 has the structure shown in fig. 3.
In one embodiment of the present application, the audio playing device 200 collects the amount of data in its buffer at multiple adjacent time points (or moments) and computes the trend of the buffered data amount by linear fitting. This trend reflects the deviation between the playing speeds of the audio source device 100 and the audio playing device 200. Therefore, the time point at which each audio slice of the audio content is expected to start playing (the expected playing time point for short) can be adjusted according to the trend, synchronizing the playing speeds of the audio playing device 200 and the audio source device 100 and improving the user's listening experience.
In another aspect of the present application, since the playing speeds of the audio playing device 200 and the audio source device 100 differ, the playing speed of the audio playing device 200 may also be adjusted directly so that it matches the playing speed of the audio source device 100. Specifically, the audio playing device 200 may collect the deviation between each audio slice's expected playing time point and the time point at which the audio playing device 200 actually starts playing that slice (the actual playing time point for short), compute the trend of the deviation by linear fitting, and use the trend to adjust its playing speed so that it matches the playing speed of the audio source device 100.
In another application scenario, when the audio source device 100 plays a video, it may display the picture content of the video on its own screen and send the audio content of the video to the audio playing device 200 and the audio playing device 300, which play the audio content together. The audio playing device 200 is the master playing device, and the audio playing device 300 is the slave playing device.
For synchronous playing between the audio source device 100 and the audio playing device 200, refer to the description of the previous application scenario, which is not repeated here. Typically, the audio playing device 200 and the audio playing device 300 come from the same manufacturer and can perform clock synchronization. Even so, they are still different devices, and hardware differences (e.g., crystal oscillator frequency) produce a difference in playing speed, so over long playback their playing progress drifts out of sync.
Similarly, the audio playing device 300 may also collect the deviation between the predicted playing time point in the audio clip and the time point when the audio playing device 300 actually starts playing the audio clip (simply referred to as the actual playing time point), and then calculate the variation trend of the deviation by means of linear fitting; and adjusts the playing speed of the audio playing device 300 by using the variation trend so that the playing speed of the audio playing device 300 is consistent with the playing speed of the audio source device 100.
In yet another application scenario, the audio source device 100 may send audio directly to the audio playback device 200 while playing audio (e.g., music), with the audio content being played by the audio playback device 200.
Generally, the speed at which the audio source device 100 sends the audio stream to the audio playing device 200 (also called the delivery speed) differs from the playing speed of the audio playing device 200, which can still cause the buffered data to overflow or run out, producing stutters or pops during playback. The audio playing device 200 therefore needs to adjust its playing speed to match the delivery speed of the audio source device 100, avoiding buffer overflow or exhaustion and, in turn, stutters or pops when playing audio.
In yet another application scenario, the audio source device 100 may send audio directly to the audio playback device 200 and the audio playback device 300 while playing audio (e.g., music), with the audio content being played back jointly by the audio playback device 200 and the audio playback device 300.
Then, the audio playback apparatus 200 needs to adjust the playback speed to be consistent with the delivery speed of the audio source apparatus 100, and the audio playback apparatus 300 also needs to adjust the playback speed to be consistent with the delivery speed of the audio source apparatus 100.
The technical scheme of the application will be described below with reference to the accompanying drawings.
Fig. 4 shows a flow of an audio playing method provided by the present application. As shown in fig. 4, the method for playing audio may include:
s401, the audio source device 100 receives an instruction to play audio.
Illustratively, at least one audio playback device, such as audio playback device 200, is connected to audio source device 100. The audio source device 100 may establish a wireless connection with the audio playing device 200 through the wireless communication module 160, where the wireless connection mode may be, for example, bluetooth, WLAN, NFC, etc.
When the user operates the audio source device 100 and instructs it to start playing a video, the audio source device 100 plays the picture content of the video and sends the audio content of the video (i.e., the audio stream) to the audio playing device 200, which plays it; that is, the audio is played externally. Alternatively, the user operates the audio source device 100 to start playing pure audio (e.g., music or a recording), and the audio source device 100 sends the audio stream to the audio playing device 200.
Note that when the audio source device 100 plays a video, the picture content on the audio source device 100 side should stay in step with the playing of the audio content on the audio playing device 200 side. When the audio source device 100 plays pure audio, the speed at which it delivers the audio stream to the audio playing device 200 should stay consistent with the playing speed of the audio content on the audio playing device 200 side.
S402, the audio source device 100 transmits an audio stream to the audio playback device 200.
S403, the audio playback apparatus 200 encodes and decodes the received audio stream.
Illustratively, taking the audio playing device 200 as including a distribution module, an adjustment coefficient calculation module, a playing module, and a buffer module, the distribution module receives the audio stream sent by the audio source device 100 and performs codec processing on it to obtain audio data conforming to the playback format. The codec processing is related to the number of channels, the number of sampling bits, and the sampling frequency of the audio playing device 200; for the specific procedure, refer to related audio codec techniques, which are not described here.
S404, the audio playing device 200 segments the encoded and decoded audio stream, and calculates the predicted playing time point of each audio segment according to the adjustment coefficient.
For example, taking the audio playing device 200 including the distribution module, the adjustment coefficient calculating module, the playing module, and the buffering module as an example, the step S404 may include steps S404a to S404e.
S404a, the distribution module requests to acquire the current adjustment coefficient from the adjustment coefficient calculation module after obtaining the encoded and decoded audio stream.
S404b, the adjustment coefficient calculation module returns the current adjustment coefficient to the distribution module.
It should be noted that the adjustment coefficient calculation module may update the values of the adjustment coefficients periodically, and the distribution module calculates the predicted play time points of the respective audio clips according to the current latest adjustment coefficient. The process of calculating the adjustment coefficient by the adjustment coefficient calculating module may refer to the following step S406. Wherein the initial value of the adjustment coefficient may be set to 1.
S404c, the distribution module segments the encoded and decoded audio stream and calculates the expected playing time point of each audio segment.
Illustratively, the data structure of each audio slice after slicing may be:
Data type
- index: int
- len: int
- playtime: long long
index: the number of the audio slice, increasing sequentially from 1.
len: the data length of the audio slice, in bytes. The relationship between the data length and the slice playing duration is: len = number of channels × number of sampling bits × sampling rate × slice playing duration / 8. The slice playing duration is a preset value, for example 10 ms (milliseconds). In other words, the data length of each audio slice is a fixed value.
For example, if the audio playing device 200 has 1 channel, 32 sampling bits, a 96 kHz sampling rate, and a 10 ms slice playing duration, then len = 1 × 32 × 96000 × 0.010 / 8 = 3840 bytes.
playtime: the expected playing time point of the audio slice, expressed here in μs (microseconds). The expected playing time point of the first audio slice (i.e., audio slice 1#) = current time + a preset delay time (e.g., 1 s). The preset delay lets the audio playing device 200 buffer slice data, preventing playback glitches caused by network-transmission jitter. The expected playing time point of the N-th audio slice = expected playing time point of the first audio slice + (N − 1) × slice playing duration × 1000 × adjustment coefficient, where the factor of 1000 converts the duration from ms to μs.
It can be seen that the expected playing time point of an audio slice is determined by the expected playing time point of the first audio slice, the slice number, and the adjustment coefficient. The adjustment coefficient changes dynamically with the number of audio slices buffered in the audio playing device 200; its calculation is described in detail in step S406 below. The initial value of the adjustment coefficient is 1.
Following the above example, the current time is 2020/11/11 00:00:00.000000. With a preset delay of 1 s, the expected playing time point of the first audio slice is 2020/11/11 00:00:01.000000. The expected playing time point of the second audio slice = 2020/11/11 00:00:01.000000 + (2 − 1) × 10 × 1000 × 1 μs = 2020/11/11 00:00:01.010000.
The data information of the 1st audio slice is shown in Table 1:
Table 1
index    1
len      3840
playTime 2020/11/11 00:00:01.000000
The data information of the 2nd audio slice is shown in Table 2:
Table 2
index    2
len      3840
playTime 2020/11/11 00:00:01.010000
Here, as an example, the slice playing duration is expressed in ms (milliseconds) and the expected playing time point in μs (microseconds); the same convention is used below without further note.
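The slice structure and the two formulas above can be exercised with a short sketch; the constants (1 channel, 32 sampling bits, 96 kHz, 10 ms slices) follow the worked example, while the struct name, the epoch-relative first playing time, and everything else are illustrative assumptions:

    #include <cstdio>

    struct AudioSlice {
        int index;           // slice number, starting at 1
        int len;             // data length in bytes
        long long playtime;  // expected playing time point, in microseconds
    };

    int main() {
        const int channels = 1, bits = 32, rateHz = 96000, sliceMs = 10;
        const long long firstPlaytimeUs = 1000000;  // stand-in for current time + 1 s preset delay
        const double adjustment = 1.0;              // initial adjustment coefficient

        // len = channels * bits * rate * duration / 8 = 3840 bytes for these values
        const int len = channels * bits * rateHz * sliceMs / 1000 / 8;

        for (int n = 1; n <= 3; ++n) {
            AudioSlice s;
            s.index = n;
            s.len = len;
            // playtime_N = playtime_1 + (N - 1) * slice duration (ms) * 1000 * coefficient
            s.playtime = firstPlaytimeUs + (long long)((n - 1) * sliceMs * 1000 * adjustment);
            printf("slice %d: len=%d bytes, playtime=%lld us\n", s.index, s.len, s.playtime);
        }
        return 0;
    }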
It should also be noted that, in some embodiments, the distribution module may periodically request the current adjustment coefficient from the adjustment coefficient calculation module, so as to calculate the predicted play time point of the audio clip according to the current adjustment coefficient. Alternatively, the distribution module may request the current adjustment coefficient from the adjustment coefficient calculation module after receiving the audio stream data of the specific data amount, so as to calculate the predicted play time point of the audio clip according to the current adjustment coefficient. In other embodiments, when the adjustment coefficient calculating module updates the adjustment coefficient, the updated adjustment coefficient may also be sent to the distributing module, so that the distributing module calculates the predicted playing time point of the audio slice according to the updated adjustment coefficient. In other words, the distribution module may passively receive the latest adjustment coefficient sent by the adjustment coefficient calculation module to calculate the predicted play time point of the audio clip.
S404d, the distribution module sends each audio fragment to the cache module.
The distribution module sequentially sends the generated audio fragments to the caching module for caching. Each audio clip carries an expected play time point.
S404e, the buffer module buffers each audio fragment.
S405, when the current time is the expected playing time point of the first audio slice (i.e., audio slice 1#), the audio playing device 200 starts audio playback.
Illustratively, the audio playback apparatus 200 includes a distribution module, an adjustment coefficient calculation module, a playback module, and a buffer module. Then, step S405 may specifically include step S405a and step S405b.
And S405a, when the current time is later than or equal to the expected playing time point of the first audio fragment, the distribution module informs the playing module to start audio playing.
S405b, the playing module reads the data of the first audio fragment and the data of the following audio fragments from the buffer module, and starts to play each audio fragment in turn.
In some embodiments, after the playing module reads the data of an audio slice from the buffer module, it writes the slice data into an audio output driver in the playing module, such as an Advanced Linux Sound Architecture (ALSA) driver. The expected output time of the currently written audio slice is then obtained through the interface of the audio output driver, and this expected output time can be regarded as the actual playing time point of the audio slice.
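For instance, with an ALSA driver the expected output time can be derived from the queued-frame delay reported by the standard snd_pcm_delay() call; the helper below is a minimal sketch of such a query (the patent does not name the exact interface, so this pairing is an assumption):

    #include <alsa/asoundlib.h>
    #include <time.h>

    // Returns the expected speaker-output time of the slice just written to the
    // driver, in microseconds since the epoch, or -1 on error. `pcm` must be an
    // open, configured playback handle; rateHz is the configured sample rate.
    long long expectedOutputTimeUs(snd_pcm_t* pcm, unsigned int rateHz) {
        snd_pcm_sframes_t delayFrames = 0;
        // Frames still queued between the application and the speaker.
        if (snd_pcm_delay(pcm, &delayFrames) < 0)
            return -1;
        struct timespec now;
        clock_gettime(CLOCK_REALTIME, &now);
        long long nowUs = (long long)now.tv_sec * 1000000 + now.tv_nsec / 1000;
        return nowUs + (long long)delayFrames * 1000000 / rateHz;
    }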
In case 1, if the actual playing time point of an audio slice is later than the expected playing time point carried in the slice, the playing progress of the audio playing device 200 is behind that of the audio source device 100, and the playing module may notify the buffer module to delete some audio slices so that the two progresses match again as soon as possible. For example, the number of deleted slices may be determined from the quotient of the absolute value of the difference divided by the slice playing duration. If the quotient is an integer, that integer is the number of slices to delete; otherwise the quotient is rounded, and the rounded integer is used as the number of deleted slices. The rounding method may be rounding to the nearest integer, rounding up, or rounding down.
For example, if the actual playing time point minus the expected playing time point is 0.6 s (600 ms) and each audio slice plays for 10 ms, the buffer module needs to delete 600 ms / 10 ms = 60 audio slices.
In case 2, if the actual playing time point of an audio slice is earlier than the expected playing time point carried in the slice, the playing progress of the audio playing device 200 is ahead of that of the audio source device 100, and the playing module may notify the buffer module to add some audio slices. The added slices may be silent audio data, copies of the currently written slice data, or other audio data. This is equivalent to the audio playing device 200 waiting for the corresponding time before playing subsequent slices, bringing its playing progress back in line with the audio source device 100.
The number of audio slices to add is determined in the same way as the number to delete in case 1, and is not described again here.
For example, if the actual playing time point minus the expected playing time point is −0.6 s (−600 ms) and each audio slice plays for 10 ms, the buffer module needs to add 600 ms / 10 ms = 60 audio slices, all of whose data is 0, i.e., silence data.
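Cases 1 and 2 can be folded into one small helper; the sketch below is illustrative (rounding to the nearest integer is just one of the rounding choices the text allows, and the 100 ms threshold is a hypothetical value for the claimed preset threshold):

    #include <cmath>
    #include <cstdio>

    // diffMs = actual playing time point minus expected playing time point, in ms.
    // Returns +n to delete n slices (device behind the source), -n to insert n
    // silent slices (device ahead), and 0 when within the preset threshold.
    int sliceCorrection(double diffMs, double sliceDurMs, double thresholdMs) {
        if (std::fabs(diffMs) <= thresholdMs) return 0;
        int n = (int)std::lround(std::fabs(diffMs) / sliceDurMs);  // one allowed rounding choice
        return diffMs > 0 ? n : -n;
    }

    int main() {
        // 600 ms behind with 10 ms slices: delete 60 slices, as in the example above.
        printf("%d\n", sliceCorrection(600.0, 10.0, 100.0));   // prints 60
        // 600 ms ahead: insert 60 slices of silence.
        printf("%d\n", sliceCorrection(-600.0, 10.0, 100.0));  // prints -60
        return 0;
    }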
It should be emphasized that, as described above, the expected playing time point (playtime) carried by each audio slice is determined by the slice number, the expected playing time point of the first slice, and the adjustment coefficient, which changes dynamically with the number of audio slices buffered in the audio playing device 200. The calculation and update of the adjustment coefficient are described in detail below.
Specifically, when the audio playback device 200 determines that the current time is the predicted playing time point of the first audio clip and starts audio playing, it records how the number of audio clips buffered in the audio playback device 200 changes over time, and calculates the adjustment coefficient from this change. That is, while the audio playback device 200 performs step S405, it also performs step S406, as follows:
S406, the audio playback device 200 records how the number of audio clips buffered in it changes over time, and calculates the adjustment coefficient from this change.
Illustratively, the audio playback device 200 includes a distribution module, an adjustment coefficient calculation module, a playback module, and a buffer module. Step S406 may then specifically include steps S406a to S406d.
S406a, after receiving the notification to start audio playing, the playback module notifies the adjustment coefficient calculation module to start periodically collecting the number of audio clips in the buffer module.
In some embodiments, the playback module may notify the adjustment coefficient calculation module to start the periodic collection immediately after receiving the notification to start audio playing, or only after a period of time (for example, 1 second) has elapsed since that notification. In other embodiments, the playback module may notify the adjustment coefficient calculation module to start the collection when it detects that the absolute value of the difference between the predicted playing time point and the actual playing time point of some audio clip is greater than a preset threshold A (other thresholds are also possible). That is, the moment at which the adjustment coefficient calculation module starts periodically collecting the number of audio clips in the buffer module is not specifically limited.
S406b, the adjustment coefficient calculation module periodically collects the number of audio clips in the buffer module.
In some embodiments, the adjustment coefficient calculation module may set a timer and then periodically (for example, every 200 μs) read from the buffer module the number of audio clips it currently stores. In other embodiments, the adjustment coefficient calculation module may instead instruct the buffer module to report the number of stored audio clips periodically; that is, the adjustment coefficient calculation module sends the buffer module an instruction to periodically report the buffered audio clip count, and upon receiving it the buffer module sets a timer and periodically reports the number of audio clips it stores.
S406c, the adjustment coefficient calculation module stores the collected numbers of audio clips in the buffer module together with their collection time points.
Illustratively, the adjustment coefficient calculation module records, in sample queue 1, each collection time point and the number of audio clips in the buffer module collected at that time point; the contents stored in sample queue 1 are shown in Table 3.
Table 3
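The rows of Table 3 are not reproduced here; as a hedged sketch, each entry of sample queue 1 might hold the following (the type and function names are illustrative, not from the patent; the capacity is an assumed preset count, see S406d):

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_SAMPLES 100          /* assumed preset count, see S406d */

typedef struct {
    int64_t t_us;                /* collection time point (microseconds) */
    size_t  n_clips;             /* audio clips in the buffer module */
} sample_t;

static sample_t sample_queue1[MAX_SAMPLES];
static size_t n_samples;

/* S406c: store one collected (time point, clip count) pair. */
static void record_sample(int64_t t_us, size_t n_clips)
{
    if (n_samples < MAX_SAMPLES) {
        sample_queue1[n_samples].t_us    = t_us;
        sample_queue1[n_samples].n_clips = n_clips;
        n_samples++;
    }
}
```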
S406d, when a preset condition is met, the adjustment coefficient calculation module calculates and updates the adjustment coefficient from the collected numbers of audio clips in the buffer module and their collection time points.
The preset condition may be that the number of entries in sample queue 1 (e.g., one entry per row of Table 3) reaches a predetermined count, for example 100, or that a preset period (e.g., 3 minutes) has elapsed since the adjustment coefficient was last calculated and updated.
In some embodiments, the adjustment coefficient calculation module may perform a linear fit on the data in sample queue 1 to obtain the trend of the number of audio clips in sample queue 1 (i.e., in the buffer module), that is, the relationship between the number of buffered audio clips and time.
In general, the audio source device 100 transmits the audio stream to the audio playback device 200 according to its own playing progress. The received audio stream is buffered in the buffer module of the audio playback device 200, forming a buffer queue. The head of the buffer queue holds the audio clips obtained from the audio stream received earliest, and the tail holds those obtained from the audio stream received later. The playing speed of the audio source device 100 (or the speed at which it delivers the audio stream to the audio playback device 200, referred to simply as the delivery speed) therefore determines how fast audio clips are added at the tail of the buffer queue. Once the audio playback device 200 starts audio playing, it takes audio clips from the head of the buffer queue to play and deletes the played clips, so its playing speed determines how fast audio clips are removed from the head of the buffer queue. In summary, the difference between the playing speed (or delivery speed) of the audio source device 100 and the playing speed of the audio playback device 200 is reflected in the trend of the number of audio clips in the buffer queue. Since the network transmission conditions between the audio source device 100 and the audio playback device 200 also affect the number of audio clips at individual time points, individual abnormal data points can be excluded by means of the linear fit.
Specifically, as shown in fig. 5, with time on the X axis and the number of audio clips in the buffer module on the Y axis, discrete points are plotted on a two-dimensional plane; each discrete point represents the number of audio clips collected at the corresponding collection time point. A linear regression is then performed on the discrete points to obtain the straight line in fig. 5, whose slope (denoted slope 1) represents the increasing or decreasing trend of the audio clips in the buffer module, i.e., the change in the number of audio clips per unit time. When slope 1 is positive, the number of buffered audio clips grows per unit time, which means that the playing speed (or delivery speed) of the audio source device 100 is faster than the playing speed of the audio playback device 200. When slope 1 is negative, the number of buffered audio clips shrinks per unit time, which means that the playing speed (or delivery speed) of the audio source device 100 is slower than the playing speed of the audio playback device 200.
Then, the adjustment coefficient can be calculated using equation (1):
adjustment coefficient = 1 − (slope 1 × audio clip playing duration / 1000)    formula (1)
The value of the audio clip playing duration is its numeric value when the duration is expressed in milliseconds; the number of audio clips carries no unit, and the adjustment coefficient and slope 1 are likewise unitless.
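Continuing the sketch above (reusing the illustrative sample_t records; a least-squares fit stands in for the linear regression of fig. 5, and the function names are assumptions):

```c
#include <math.h>
#include <stddef.h>

/* Slope 1: least-squares slope of clip count over time, in clips
 * per second. Assumes n >= 2 samples with distinct time points. */
static double slope1_clips_per_sec(const sample_t *s, size_t n)
{
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (size_t i = 0; i < n; i++) {
        double x = s[i].t_us / 1e6;          /* seconds */
        double y = (double)s[i].n_clips;
        sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    /* Standard simple-regression slope: (nΣxy − ΣxΣy)/(nΣx² − (Σx)²). */
    return (n * sxy - sx * sy) / (n * sxx - sx * sx);
}

/* Formula (1): clip_ms is the playing duration of one clip in ms. */
static double adjustment_coefficient(double slope1, double clip_ms)
{
    return 1.0 - slope1 * clip_ms / 1000.0;
}
```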
After the adjustment coefficient calculation module obtains the latest adjustment coefficient, it clears the data in sample queue 1. The adjustment coefficient for the next period is then calculated from the numbers of audio clips in the buffer module collected in that period.
Formula (2) was already obtained in step S404c:
predicted playing time point of the Nth audio clip = predicted playing time point of the first audio clip + (N − 1) × audio clip playing duration × 1000 × adjustment coefficient    formula (2)
It will be appreciated that when slope 1 is positive, meaning that the playing speed (or delivery speed) of the audio source device 100 is faster than the playing speed of the audio playback device 200, the playing progress of the audio source device 100 is also ahead of that of the audio playback device 200. From formula (1), the adjustment coefficient is then less than 1, and from formula (2) the predicted playing time point of the Nth audio clip becomes smaller (compared with an adjustment coefficient of 1); that is, the predicted playing time point of the Nth audio clip is advanced. In other words, the playing progress of the audio playback device 200 is accelerated, helping it catch up with the playing progress of the audio source device 100 as soon as possible.
When slope 1 is negative, meaning that the playing speed (or delivery speed) of the audio source device 100 is slower than the playing speed of the audio playback device 200, the playing progress of the audio source device 100 also lags behind that of the audio playback device 200. From formula (1), the adjustment coefficient is greater than 1, and from formula (2) the predicted playing time point of the Nth audio clip becomes larger (compared with an adjustment coefficient of 1); that is, the predicted playing time point of the Nth audio clip is delayed. In other words, the playing progress of the audio playback device 200 is slowed down, helping it stay in step with the audio source device 100.
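Formula (2) itself, sketched under the assumption that playtime values are kept in microseconds (the patent does not fix the time base; the ×1000 then converts the per-clip duration from milliseconds):

```c
#include <stdint.h>

/* Formula (2): predicted playing time point of the Nth audio clip.
 * first_us: predicted playing time point of the first clip (µs);
 * clip_ms:  playing duration of one clip in milliseconds;
 * coeff:    the current adjustment coefficient. */
static int64_t predicted_playtime_us(int64_t first_us, int n,
                                     double clip_ms, double coeff)
{
    return first_us + (int64_t)((n - 1) * clip_ms * 1000.0 * coeff);
}
```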
In summary, since the trend of the number of audio clips buffered in the audio playback device 200 reflects the difference between its playing speed and the playing speed (or delivery speed) of the audio source device 100, the audio playback device 200 updates the adjustment coefficient according to this trend and calculates the predicted playing time points of the audio clips from the adjustment coefficient, so that its playing speed stays consistent with the playing speed (or delivery speed) of the audio source device 100. This prevents the buffer at the audio playback device 200 from overflowing or running dry, thereby avoiding stuttering or popping during audio playing and improving the listening experience of the played-out audio.
In other embodiments of the present application, the playing progress of the audio playback device 200 and that of the audio source device 100 may diverge because of hardware differences between the two devices (e.g., different crystal oscillator frequencies). For this reason, the audio playback device 200 may also adjust its own playing speed so that it matches the playing speed of the audio source device 100, allowing the audio playback device 200 to stay consistent with the audio source device 100 in playing progress over the long term.
Illustratively, the audio playback device 200 includes a distribution module, a playback module, a buffer module, and a playing speed module.
Fig. 6 shows a flow of yet another audio playing method provided by the present application. As shown in fig. 6, the audio playing method includes steps S401 to S403, steps S404c to S404e, step S405a, steps S601 to S611, and step S406.
For steps S401 to S403, S404c to S404e, S405a, and S406, refer to the descriptions of the corresponding contents in fig. 4, which are not repeated here.
S601, the playback module acquires the content of an audio clip from the buffer module.
When the current time equals the predicted playing time point of the first audio clip, the distribution module notifies the playback module to start audio playing, and the playback module starts reading the contents of the first audio clip and the following audio clips from the buffer module in sequence.
S602, the playback module writes the acquired audio clip content into an audio output driver.
The playback module writes the read audio clips into the audio output driver (such as ALSA) in the playback module in sequence, and the currently written audio clip is played through the audio output driver.
S603, the playback module calls an interface of the audio output driver to query the expected speaker output time of the currently written audio clip, i.e., the actual playing time point of the currently written audio clip.
After writing an audio clip into the audio output driver, the playback module also calls an interface of the audio output driver to query the expected speaker output time of the currently written audio clip; this expected speaker output time can be regarded as the actual playing time point of the audio clip.
S604, the playback module calculates the difference between the actual playing time point of the audio clip and the predicted playing time point (e.g., playtime) carried in the audio clip, and determines whether the absolute value of the difference is greater than a preset threshold A.
In some embodiments, the absolute value of the difference is compared with a preset threshold A (e.g., 1 second). If the absolute value of the difference is greater than the preset threshold A, the playing progress of the audio playback device 200 deviates considerably from that of the audio source device 100, and step S605 may be executed to reduce the difference in playing progress quickly. If the absolute value of the difference is less than or equal to the preset threshold A, the difference in playing progress between the audio playback device 200 and the audio source device 100 is small, and steps S606 to S611 may be executed; that is, by adjusting the playing speed of the audio playback device 200, its playing progress is brought into line with that of the audio source device 100, and its playing speed into line with the playing speed of the audio source device 100. Of course, in other embodiments, the playing speed of the audio playback device 200 may be adjusted directly by the following steps, without comparing the absolute value of the difference between the actual playing time point and the predicted playing time point (e.g., playtime) with the preset threshold A. That is, after the playback module calculates the difference between the actual playing time point of the audio clip and the predicted playing time point carried in it, step S606 may be performed directly, without the comparison against the preset threshold A.
S605, the playback module notifies the buffer module to delete or insert audio clips.
When it is determined that the absolute value of the difference between the actual playing time point and the predicted playing time point of the audio clip is greater than the preset threshold A, whether to delete one or more audio clips or to insert one or more audio clips is further determined from the relative order of the actual playing time point and the predicted playing time point.
If the actual playing time point of the audio clip is later than the predicted playing time point carried in it, the playing progress of the audio playback device 200 lags behind that of the audio source device 100, and the playback module may notify the buffer module to delete some audio clips so that the playing progress of the audio playback device 200 matches that of the audio source device 100 as soon as possible.
If the actual playing time point of the audio clip is earlier than the predicted playing time point carried in it, the playing progress of the audio playback device 200 is ahead of that of the audio source device 100, and the playback module may notify the buffer module to insert some audio clips. The inserted audio clips may be mute audio data, copies of the currently written audio clip, or other audio data; this is equivalent to the audio playback device 200 waiting for a corresponding time before playing the subsequent audio clips, so that the playing progress of the two devices becomes the same.
Subsequently, the playback module keeps monitoring the difference between the actual playing time point and the predicted playing time point of the following audio clips, judges whether its absolute value is greater than the preset threshold A, and applies the corresponding method.
S606, the playback module sends the difference between the actual and predicted playing time points of the audio clip, together with the corresponding actual playing time point, to the playing speed module.
Since the playing speed of the audio playback device 200 is to be adjusted, the trend of the difference against the actual playing time point needs to be computed. Therefore, when the playback module determines that the absolute value of the difference is less than or equal to the preset threshold A, it sends the difference and its corresponding actual playing time point to the playing speed module.
S607, the playing speed module stores the difference between the actual and predicted playing time points of the audio clip, together with the corresponding actual playing time point.
Illustratively, the playing speed module records each received actual playing time point and the corresponding difference in sample queue 2; the contents stored in sample queue 2 are shown in Table 4.
Table 4
S608, the playing speed module calculates the deviation of the playing speed from the differences between the actual and predicted playing time points of the audio clips.
When the playing speed module determines that a certain condition is satisfied, it may calculate the deviation of the playing speed of the audio playback device 200 from that of the audio source device 100 using the data in sample queue 2. The condition may be, for example, that the number of entries in sample queue 2 (e.g., one entry per row of Table 4) reaches a predetermined count, for example 100.
In some embodiments, the playing speed module may perform a linear fit on the data in sample queue 2 to obtain the trend of the deviation between the playing speeds of the audio playback device 200 and the audio source device 100.
Specifically, with the actual playing time point on the X axis and the difference between the actual and predicted playing time points on the Y axis, discrete points are plotted on a two-dimensional plane. A linear regression is performed on the discrete points to obtain a fitted straight line, whose slope (denoted slope 2) represents the deviation of the actual playing speed from the expected playing speed, i.e., how much time drifts per unit time. As shown in (1) of fig. 7, when slope 2 of the fitted line is positive, the actual playing speed of the audio playback device 200 is slower than the expected playing speed, and the playing speed of the audio playback device 200 needs to be sped up. As shown in (2) of fig. 7, when slope 2 of the fitted line is negative, the actual playing speed of the audio playback device 200 is faster than the expected playing speed, and the playing speed of the audio playback device 200 needs to be slowed down. If slope 2 is zero, the actual playing speed of the audio playback device 200 equals the expected playing speed, and no adjustment is needed.
S609, the playing speed module calculates the target playing speed from the deviation of the playing speed.
Here the target playing speed is the playing speed desired of the audio playback device 200. The target playing speed can then be calculated using formula (3):
target playing speed = current playing speed × (1 + slope 2)    formula (3)
It will be appreciated that when slope 2 is positive, the target playing speed calculated from formula (3) increases; when slope 2 is negative, the target playing speed calculated from formula (3) decreases.
In other examples, a preset threshold B may also be set. When the absolute value of slope 2 is smaller than the preset threshold B, the difference between the actual and predicted playing time points of the audio playback device 200 can be considered small, and the playing speed of the audio playback device 200 need not be adjusted. When the absolute value of slope 2 is greater than or equal to the preset threshold B, the playing speed of the audio playback device 200 is adjusted using formula (3).
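A sketch of formula (3) with the optional threshold B gate (slope 2 would come from the same kind of least-squares fit as above, here over (actual playing time point, difference) pairs; threshold_b is an assumed parameter):

```c
#include <math.h>

/* Formula (3): target playing speed, gated by preset threshold B. */
static double target_playing_speed(double current_speed, double slope2,
                                   double threshold_b)
{
    if (fabs(slope2) < threshold_b)
        return current_speed;               /* deviation negligible */
    return current_speed * (1.0 + slope2);  /* speed up or slow down */
}
```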
S610, the playing speed module sends the target playing speed to the playback module.
S611, the playback module modifies the playing speed value of the audio output driver to the target playing speed.
For example, the playback module adjusts the playing speed of the audio output driver to (1 + slope 2) times the current speed.
Subsequently, the playback module keeps monitoring the difference between the actual and predicted playing time points of the following audio clips, judges whether its absolute value is greater than the preset threshold A, and applies the corresponding method.
In summary, by adjusting the playing speed of the audio playback device 200 according to the difference between its actual and predicted playing time points until it matches the playing speed (or delivery speed) of the audio source device 100, the situation where the audio playback device 200 must frequently delete buffered audio clips because the two devices run at different speeds can be avoided.
In still other embodiments of the present application, the solution described in fig. 4 may be combined with the solution described in fig. 6. That is, the adjustment coefficient is calculated from the trend of the number of audio clips buffered in the audio playback device 200, and the predicted playing time point of each audio clip, i.e., the time point at which the audio playback device 200 and the audio source device 100 are expected to start playing the audio clip, is calculated from the adjustment coefficient. The audio playback device 200 may then further adjust its own playing speed so that it matches the playing speed of the audio source device 100, allowing the audio playback device 200 to stay consistent with the audio source device 100 in playing progress over the long term.
In still other embodiments of the present application, another audio playback device, such as the audio playback device 300, is also connected to the audio playback device 200, and the audio content is played by the audio playback device 200 and the audio playback device 300 together. Then, in addition to keeping the playing speed of the audio playback device 200 consistent with the playing speed (or delivery speed) of the audio source device 100, the playing speed of the audio playback device 300 must also be kept consistent with the playing speed (or delivery speed) of the audio source device 100.
Fig. 8 shows a flow of yet another audio playing method provided by the present application. As shown in fig. 8, the audio playing method includes steps S401 to S406, and steps S801 to S805.
For steps S401 to S406, refer to the related contents of the flow in fig. 4. The differences from the flow in fig. 4 are described here with emphasis.
First, a wired or wireless connection may be established between the audio playback device 200 and the audio playback device 300, where the wireless connection may be, for example, Bluetooth, WLAN, or NFC.
S801, the audio playback device 200 performs time synchronization with the audio playback device 300.
In some examples, the audio playback device 200 and the audio playback device 300 establish a wireless connection. Before they play the audio content together, the audio playback device 200 and the audio playback device 300 perform time synchronization. For example, the audio playback device 200 may perform time synchronization with the audio playback device 300 after receiving the audio stream (i.e., step S402), after sending audio clips to the audio playback device 300 (i.e., step S802), or after notifying the audio playback device 300 to start audio playing (i.e., step S804).
Specifically, the audio playback device 200 (e.g., its time synchronization module) may perform time synchronization with the audio playback device 300 (e.g., its time synchronization module) using the Simple Network Time Protocol (SNTP) or the high-precision Precision Time Protocol (PTP), or the like.
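For reference, the clock offset in an SNTP-style exchange is computed from four timestamps; a hedged sketch follows (this is the standard SNTP offset formula, not text from the patent):

```c
/* t1: client transmit, t2: server receive,
 * t3: server transmit, t4: client receive (all in the same unit).
 * Returns the estimated offset of the server clock vs. the client;
 * network delay is assumed roughly symmetric. */
static double sntp_clock_offset(double t1, double t2, double t3, double t4)
{
    return ((t2 - t1) + (t3 - t4)) / 2.0;
}
```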
S802, the audio playback device 200 sends the audio clips to the audio playback device 300.
Illustratively, after the distribution module of the audio playback device 200 performs step S404c, it sends the obtained audio clips to its own buffer module (i.e., step S404d) and also sends them to the audio playback device 300 (e.g., to the buffer module of the audio playback device 300).
S803, the audio playback device 300 buffers the audio clips.
Illustratively, the buffer module of the audio playback device 300 buffers the received audio clips.
S804, the audio playback device 200 notifies the audio playback device 300 to start audio playing.
Illustratively, when the distribution module of the audio playback device 200 determines that the current time is later than or equal to the predicted playing time point of the first audio clip, it notifies its own playback module to start audio playing (i.e., step S405a) and also notifies the audio playback device 300 (e.g., the playback module of the audio playback device 300) to start audio playing.
S805, the audio playback device 300 starts audio playback.
Illustratively, the audio playback device 300 reads the first audio clip and the subsequent audio clips from its own buffer module and starts playing.
Note that the audio clips received by the audio playback device 300 carry predicted playing time points, and these predicted playing time points are periodically updated by the audio playback device 200 according to the difference in playing progress between the audio source device 100 and the audio playback device 200.
As in the playing process of the audio playback device 200 (refer to the related contents of step S405b), when playing an audio clip the audio playback device 300 writes it into an audio output driver (e.g., ALSA) and calls an interface of the audio output driver to read the speaker output time point of the currently written audio clip. The speaker output time point can be regarded as the time point at which the currently written audio clip is actually played by the audio playback device 300, referred to simply as its actual playing time point. By comparing the actual playing time point of the currently written audio clip with its predicted playing time point, audio clips in the buffer module of the audio playback device 300 are deleted or inserted so that the playing speed of the audio playback device 300 stays consistent with the playing speed (or delivery speed) of the audio source device 100.
Fig. 9 shows the flow of yet another audio playing method provided by the present application. As shown in fig. 9, the method includes steps S401 to S404, S405a, S406, S601 to S611, S801 to S804, and S901 to S911.
The audio playback device 200 records the difference between the actual playing time point of each audio clip and the predicted playing time point carried in it, and adjusts its playing speed according to the difference so that the playing speed of the audio playback device 200 stays consistent with the playing speed of the audio source device 100. That is, the audio playback device 200 performs steps S601 to S611; for details, refer to the related contents of the flow in fig. 6.
In addition, similarly to how the audio playback device 200 adjusts its playing speed, the audio playback device 300 adjusts its own playing speed in a similar manner so that the playing speed of the audio playback device 300 also stays consistent with the playing speed of the audio source device 100; that is, the audio playback device 300 performs steps S901 to S911. In some examples, the audio playback device 300 also includes a playing speed module.
S901, the playback module of the audio playback device 300 acquires the content of an audio clip from the buffer module.
S902, the playback module writes the acquired audio clip content into the audio output driver.
S903, the playback module calls an interface of the audio output driver to query the expected speaker output time of the currently written audio clip, i.e., the actual playing time point of the currently written audio clip.
After writing an audio clip into the audio output driver, the playback module also calls an interface of the audio output driver to query the expected speaker output time of the currently written audio clip; this can be regarded as the actual playing time point of the audio clip.
S904, the playback module calculates the difference between the actual playing time point of the audio clip and the predicted playing time point (e.g., playtime) carried in it, and determines whether the absolute value of the difference is greater than a preset threshold A.
S905, the playback module notifies the buffer module to delete or insert audio clips.
S906, the playback module sends the difference between the actual and predicted playing time points of the audio clip, together with the corresponding actual playing time point, to the playing speed module.
S907, the playing speed module stores the difference and the corresponding actual playing time point.
S908, the playing speed module calculates the deviation of the playing speed from the stored differences and their corresponding actual playing time points.
S909, the playing speed module calculates the target playing speed from the deviation of the playing speed.
S910, the playing speed module sends the target playing speed to the playback module.
S911, the playback module modifies the playing speed value of the audio output driver to the target playing speed.
For the specific contents of steps S901 to S911, refer to the related contents of steps S601 to S611 in fig. 6, which are not repeated here.
It should be noted that the target playing speed of the audio playing device 300 calculated in step S901 to step S911 is the same or substantially the same as the target playing speed of the audio playing device 200 calculated in step S601 to step S611.
Of course, in other embodiments, the audio playback device 200 may also determine the predicted playing time point of each audio clip according to the existing technical scheme, that is, without adjusting the predicted playing time points with the adjustment coefficient, and instead adjust the playing speed of the audio playback device 200 directly according to the difference between the actual and predicted playing time points of each audio clip until it matches the playing speed (or delivery speed) of the audio source device. Meanwhile, the audio playback device 300 may likewise determine the actual playing time point of each audio clip directly from its own audio output driver, calculate the difference between the actual and predicted playing time points of each audio clip, and adjust the playing speed of the audio playback device 300 until it matches the playing speed (or delivery speed) of the audio source device.
In fig. 8 and 9, the audio playback device 200 may be a master speaker or a master earphone, and the audio playback device 300 may be a slave speaker or a slave earphone. Alternatively, the audio playback device 200 may be a master speaker and the audio playback device 300 a slave earphone; or the audio playback device 200 may be a master earphone and the audio playback device 300 a slave speaker.
It should be noted that the audio playback device is not limited to a dedicated audio playback device such as a speaker or a headphone; it may also be a composite device such as a mobile device with a speaker.
All or some of the features of the above embodiments of the present application may be freely combined, and the resulting combinations also fall within the scope of the present application.
The embodiment of the application also provides a chip system. As shown in fig. 10, the chip system includes at least one processor 2101 and at least one interface circuit 1102. The processor 2101 and the interface circuit 1102 may be interconnected by wires. For example, the interface circuit 1102 may be used to receive signals from other devices (e.g., a memory of the audio playback device 200). For another example, the interface circuit 1102 may be used to send signals to other devices (e.g., the processor 2101). The interface circuit 1102 may, for example, read instructions stored in a memory and send the instructions to the processor 2101. When executed by the processor 2101, the instructions may cause the electronic device to perform the various steps performed by the audio playback device 200 (e.g., a speaker) in the above embodiments. Of course, the chip system may also include other discrete devices, which are not specifically limited in the embodiments of the present application.
It will be appreciated that the above-described terminal, etc. may comprise hardware structures and/or software modules that perform the respective functions in order to achieve the above-described functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The embodiment of the application can divide the functional modules of the terminal and the like according to the method example, for example, each functional module can be divided corresponding to each function, and two or more functions can be integrated in one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present application, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be implemented by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to implement all or part of the functions described above. The specific working processes of the above-described systems, devices and units may refer to the corresponding processes in the foregoing method embodiments, which are not described herein.
The functional units in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application may be essentially or a part contributing to the prior art or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to perform all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: flash memory, removable hard disk, read-only memory, random access memory, magnetic or optical disk, and the like.
The foregoing is merely illustrative of specific embodiments of the present application, and the scope of the present application is not limited thereto, but any changes or substitutions within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (16)

1. An audio playing method applied to a first audio playback device, wherein the first audio playback device is in wireless communication with an audio source device, the method comprising:
receiving audio data sent by the audio source device;
dividing the audio data into N audio clips;
buffering the N audio clips, wherein the predicted playing time point of each audio clip is obtained according to a first adjustment coefficient;
sequentially playing each audio clip;
periodically collecting the current number of buffered audio clips and the collection time point corresponding to the current number;
after the duration of the periodic collection reaches a preset duration, or after the number of periodic collections reaches a preset count,
obtaining a second adjustment coefficient according to each collected current number and the collection time point corresponding to it, and obtaining the predicted playing time point of each subsequent audio clip according to the second adjustment coefficient;
sequentially playing the subsequent audio clips;
wherein N is a positive integer greater than or equal to 2, and the first adjustment coefficient is a preset coefficient.
2. The method of claim 1, wherein before receiving the audio data sent by the audio source device, the method further comprises:
receiving an indication to play audio sent by the audio source device.
3. The method according to claim 1 or 2, wherein obtaining the second adjustment coefficient according to each collected current number and the collection time point corresponding to it comprises:
performing a linear fit on each collected current number and its corresponding collection time point to obtain a first slope;
and obtaining the second adjustment coefficient according to the first slope.
4. The method according to any one of claims 1-3, wherein periodically collecting the current number of buffered audio clips and the collection time point corresponding to the current number comprises:
when the absolute value of the difference between the actual playing time point and the predicted playing time point of any audio clip is greater than a first threshold, starting, by the first audio playback device, the periodic collection of the current number of buffered audio clips and the collection time point corresponding to the current number;
wherein the actual playing time point of an audio clip is the expected speaker output time point of the audio clip, and the expected speaker output time point of the audio clip is obtained by the first audio playback device by calling an interface of the audio output driver of the first audio playback device for query.
5. An audio playing method applied to a first audio playback device, wherein the first audio playback device is in wireless communication with an audio source device, the method comprising:
receiving audio data sent by the audio source device;
dividing the audio data into N audio clips;
buffering the N audio clips, wherein the predicted playing time point of each audio clip is obtained according to a first adjustment coefficient;
sequentially playing each audio clip;
adjusting the number of buffered audio clips after the absolute value of the difference between the actual playing time point and the predicted playing time point of an audio clip is greater than a preset threshold;
wherein the actual playing time point of an audio clip is the expected speaker output time point of the audio clip; the expected speaker output time point of the audio clip is obtained by the first audio playback device by calling an interface of the audio output driver of the first audio playback device for query; N is a positive integer greater than or equal to 2; and the first adjustment coefficient is a preset coefficient.
6. The method of claim 5, wherein adjusting the number of buffered audio clips after the absolute value of the difference between the actual playing time point and the predicted playing time point of the audio clip is greater than the preset threshold comprises:
inserting a first number of audio clips after the absolute value of the difference between the actual playing time point and the predicted playing time point of the audio clip is greater than the preset threshold and the difference is negative;
and deleting a first number of audio clips after the absolute value of the difference between the actual playing time point and the predicted playing time point of the audio clip is greater than the preset threshold and the difference is positive.
7. The method of claim 6, wherein the first number is associated with the quotient of the absolute value of the difference divided by the playing duration of one audio clip.
8. The method of claim 7, wherein the first number is the quotient of the absolute value of the difference divided by the playing duration of one audio clip.
9. The method of any one of claims 6-8, wherein the inserted first number of audio clips are mute data.
10. The method according to any one of claims 5-9, wherein the playing speed of the first audio playback device is adjusted after the absolute value of the difference between the actual playing time point and the predicted playing time point of the audio clip is less than or equal to the preset threshold.
11. The method of claim 10, wherein, after the absolute value of the difference between the actual playing time point and the predicted playing time point of the audio clip is less than or equal to the preset threshold,
collecting the actual playing time point of the audio clip and the difference between its actual playing time point and predicted playing time point;
after the number of collections reaches a preset count, or after the duration of collection reaches a preset duration,
performing a linear fit on the collected differences against their corresponding actual playing time points to obtain a second slope;
acquiring the current playing speed of the first audio playback device;
obtaining an adjusted playing speed according to the current playing speed and the second slope;
and sequentially playing the subsequent audio clips at the adjusted playing speed.
12. The method according to any one of claims 5-11, wherein a second audio playback device is connected to the first audio playback device, the method further comprising:
sending the N audio clips to the second audio playback device.
13. The method of claim 12, wherein before the first audio playback device plays the first audio clip, the method further comprises:
sending an instruction to start playing the audio clips to the second audio playback device.
14. A first audio playback device, comprising a processor, an audio output means and a memory, the audio output means and the memory both being coupled to the processor, the memory being configured to store a computer program which, when executed by the processor, causes the first audio playback device to perform the method of any one of claims 1-13.
15. A computer-readable storage medium comprising a computer program which, when run on a first audio playback device, causes the first audio playback device to perform the method of any one of claims 1-13.
16. A computer program product which, when run on a computer, causes the computer to perform the method of any one of claims 1-13.
CN202110221879.2A 2021-02-27 2021-02-27 Audio playing method, equipment and system Active CN114974321B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110221879.2A CN114974321B (en) 2021-02-27 2021-02-27 Audio playing method, equipment and system
PCT/CN2021/136897 WO2022179246A1 (en) 2021-02-27 2021-12-09 Audio playback method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110221879.2A CN114974321B (en) 2021-02-27 2021-02-27 Audio playing method, equipment and system

Publications (2)

Publication Number Publication Date
CN114974321A CN114974321A (en) 2022-08-30
CN114974321B true CN114974321B (en) 2023-11-03

Family

ID=82974161

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110221879.2A Active CN114974321B (en) 2021-02-27 2021-02-27 Audio playing method, equipment and system

Country Status (2)

Country Link
CN (1) CN114974321B (en)
WO (1) WO2022179246A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115866309B (en) * 2022-11-29 2023-09-22 广州后为科技有限公司 Audio and video caching method and device supporting multipath video synchronization
CN115629733B (en) * 2022-12-20 2023-03-28 翱捷科技(深圳)有限公司 Audio playing method, chip, system and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107801080A (en) * 2017-11-10 2018-03-13 普联技术有限公司 A kind of audio and video synchronization method, device and equipment
CN109918038A (en) * 2019-01-14 2019-06-21 珠海慧联科技有限公司 A kind of audio broadcasting speed synchronous method and system
CN111918093A (en) * 2020-08-13 2020-11-10 腾讯科技(深圳)有限公司 Live broadcast data processing method and device, computer equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105812902B (en) * 2016-03-17 2018-09-04 联发科技(新加坡)私人有限公司 Method, equipment and the system of data playback
US11848972B2 (en) * 2016-09-14 2023-12-19 Haptech Holdings, Inc. Multi-device audio streaming system with synchronization
CN108495239B (en) * 2018-01-17 2020-09-29 深圳聚点互动科技有限公司 Method, device, equipment and storage medium for accurately and synchronously playing audio among multiple equipment
CN110134362A (en) * 2019-05-16 2019-08-16 北京小米移动软件有限公司 Audio frequency playing method, device, playback equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107801080A (en) * 2017-11-10 2018-03-13 普联技术有限公司 A kind of audio and video synchronization method, device and equipment
CN109918038A (en) * 2019-01-14 2019-06-21 珠海慧联科技有限公司 A kind of audio broadcasting speed synchronous method and system
CN111918093A (en) * 2020-08-13 2020-11-10 腾讯科技(深圳)有限公司 Live broadcast data processing method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
WO2022179246A1 (en) 2022-09-01
CN114974321A (en) 2022-08-30

Similar Documents

Publication Publication Date Title
CN114974321B (en) Audio playing method, equipment and system
US11375312B2 (en) Method, device, loudspeaker equipment and wireless headset for playing audio synchronously
US10147440B2 (en) Method for playing data and apparatus and system thereof
US10191715B2 (en) Systems and methods for audio playback
KR100711177B1 (en) Broadcasting synchronization method and portable terminal
WO2020211535A1 (en) Network delay control method and apparatus, electronic device, and storage medium
CN110636600B (en) Audio synchronous playing method for wireless equipment
JP2004007140A (en) Voice reproducing device and voice reproduction control method to be used for the same device
RU2651215C1 (en) Method and device for network jitter processing, as well as a terminal
US20150319556A1 (en) Audio player with bluetooth function and audio playing method thereof
CN110636349B (en) Audio synchronous playing method for wireless equipment
WO2017000554A1 (en) Audio and video file generation method, apparatus and system
CN112165645A (en) Control method of playback device, and computer storage medium
US20180158468A1 (en) Device and method for synchronizing speakers
EP3968651A1 (en) Bitrate switching method and device
WO2017190346A1 (en) Audio and video data synchronization method and device
CN115833984A (en) Clock synchronization method and device, electronic equipment and readable storage medium
CN113613221A (en) TWS master device, TWS slave device, audio device and system
CN113473425A (en) Bluetooth low-delay transmission method for audio signals based on PC-SPDIF connection
EP4097562A1 (en) Receiver for high precision synchronization in a shared medium
CN113038224B (en) Sound and picture synchronization method and device
KR20210079017A (en) Method of video/audio playback synchronization of digital contents and apparatus using the same
WO2023273601A1 (en) Audio synchronization method, audio playback device, audio source, and storage medium
CN112235863B (en) Audio equipment
CN112235685A (en) Sound box networking method and sound box system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant