CN116110411A - Audio transmission method, chip, user terminal and audio playing device - Google Patents


Info

Publication number
CN116110411A
CN116110411A (application CN202310066078.2A)
Authority
CN
China
Prior art keywords
audio
playing device
user terminal
encoding
audio data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310066078.2A
Other languages
Chinese (zh)
Inventor
颜廷管
余庆华
王泷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zeku Technology Shanghai Corp Ltd
Original Assignee
Zeku Technology Shanghai Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zeku Technology Shanghai Corp Ltd filed Critical Zeku Technology Shanghai Corp Ltd
Priority to CN202310066078.2A priority Critical patent/CN116110411A/en
Publication of CN116110411A publication Critical patent/CN116110411A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
        • G10 MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
                • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
                    • G10L19/04 Using predictive techniques
                        • G10L19/16 Vocoder architecture
                            • G10L19/18 Vocoders using multiple modes
                    • G10L19/0017 Lossless audio signal coding; Perfect reconstruction of coded audio signal by transmission of coding error
                    • G10L19/008 Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04B TRANSMISSION
                • H04B5/00 Near-field transmission systems, e.g. inductive or capacitive transmission systems
                    • H04B5/40 Characterised by components specially adapted for near-field transmission
                        • H04B5/48 Transceivers
                    • H04B5/70 Specially adapted for specific purposes
                        • H04B5/72 For local intradevice communication
            • H04W WIRELESS COMMUNICATION NETWORKS
                • H04W28/00 Network traffic management; Network resource management
                    • H04W28/02 Traffic management, e.g. flow control or congestion control
                        • H04W28/10 Flow control between communication endpoints
                            • H04W28/14 Flow control between communication endpoints using intermediate storage
                • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
                    • H04W4/80 Services using short range communication, e.g. near-field communication [NFC], radio-frequency identification [RFID] or low energy communication

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Telephonic Communication Services (AREA)
  • Communication Control (AREA)
  • Detection And Prevention Of Errors In Transmission (AREA)

Abstract

Embodiments of the present application relate to an audio transmission method, a chip, a user terminal, an audio playback device, a computer-readable storage medium, and a computer program product. The audio transmission method is applied to the user terminal and comprises the following steps: based on determining that the number of frames to be played buffered by the audio playing device exceeds a first threshold, encoding one frame of audio data according to a first encoding mode; and sending a first audio coding packet corresponding to that frame of audio data to the audio playing device. By providing a buffer area in the audio playing device, the technical solution of the application can reduce audio playback stuttering caused by the audio playing device not receiving data when the communication quality between the user terminal and the audio playing device is poor.

Description

Audio transmission method, chip, user terminal and audio playing device
Technical Field
Embodiments of the present disclosure relate to the field of data transmission technologies, and in particular, to an audio transmission method, a chip, a user terminal, an audio playing device, a computer-readable storage medium, and a computer program product.
Background
With the popularization of wireless audio playing devices, users have increasingly high requirements on playback quality. During wireless transmission of audio data, interference is likely to occur, so large delays can arise, and playback may even stutter or go silent.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an audio transmission method, a chip, a user terminal, an audio playback device, a computer-readable storage medium, and a computer program product that enable playback without stuttering and allow the audio playback quality to meet user demands.
In a first aspect, the present application provides an audio transmission method, applied to a user terminal, where the audio transmission method includes:
based on determining that the number of frames to be played buffered by the audio playing device exceeds a first threshold, encoding one frame of audio data according to a first encoding mode; and
sending a first audio coding packet corresponding to that frame of audio data to the audio playing device.
In a second aspect, the present application provides an audio transmission method, applied to an audio playing device, where the audio transmission method includes:
generating and sending a feedback signal to the user terminal based on whether the buffered number of frames to be played exceeds a first threshold and/or is lower than a second threshold, so that the user terminal can select an audio coding mode based on the feedback signal.
In a third aspect, the present application provides an audio transmission method, applied to an audio playing device, where the audio transmission method includes:
updating the occupancy information of the buffer according to the received audio coding packet; and
generating and sending the feedback signal to the user terminal according to the updated occupancy information of the buffer, so as to instruct the user terminal to determine, according to the feedback signal, whether the buffered number of frames to be played exceeds a first threshold and/or is lower than a second threshold.
In a fourth aspect, the present application provides a chip configured to perform an audio transmission method as described above.
In a fifth aspect, the present application provides a user terminal comprising a memory storing a computer program and a processor implementing the steps of the method as described above when the processor executes the computer program.
In a sixth aspect, the present application provides an audio playing device comprising a memory storing a computer program and a processor implementing the steps of the method as described above when the processor executes the computer program.
In a seventh aspect, the present application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method described above.
In an eighth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method described above.
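The threshold-based feedback described in the second and third aspects can be sketched as follows. This is a minimal illustration only: the patent does not fix any field names, threshold values, or signal formats, so everything here is an assumption.

```python
def make_feedback(buffered_frames, first_threshold=20, second_threshold=5):
    """Build a feedback signal describing the playback device's buffer occupancy.

    Hypothetical helper: field names and default thresholds are assumed, not
    specified by the patent. The second aspect reports threshold comparisons;
    the third aspect reports raw occupancy so the terminal compares itself.
    """
    return {
        "above_first_threshold": buffered_frames > first_threshold,
        "below_second_threshold": buffered_frames < second_threshold,
        "buffered_frames": buffered_frames,  # raw occupancy, per the third aspect
    }
```

A terminal receiving such a signal could switch to a higher-quality encoding mode when `above_first_threshold` is set, or to a faster/lossy mode when `below_second_threshold` is set.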
According to the audio transmission method, the chip, the user terminal, the audio playing device, and the storage medium, a buffer area is provided in the audio playing device so that a certain number of frames of audio data can be buffered. Because of this buffer, when the communication quality between the user terminal and the audio playing device is poor, the audio playing device can play the buffered audio data. Therefore, the technical solution of the application can reduce audio playback stuttering caused by the audio playing device not receiving data when the communication quality is poor. In addition, the buffered number of frames to be played exceeding the first threshold indicates that the audio playing device can retrieve data directly from the buffer for multiple consecutive frames; accordingly, the user terminal can encode the audio data in an appropriate encoding mode to deliver audio playback quality that meets user requirements, for example, higher quality. Therefore, the audio transmission method can keep playback free of stuttering while the audio playback quality meets user demands.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings required by the embodiments or the related descriptions are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; a person of ordinary skill in the art may obtain other drawings from these drawings without inventive effort.
FIG. 1 is a schematic diagram of a related art audio processing and transmission method;
FIG. 2 is a flowchart of an audio transmission method according to an embodiment;
FIG. 3 is a second flowchart of an audio transmission method according to an embodiment;
FIG. 4 is a third flowchart of an audio transmission method according to an embodiment;
FIG. 5 is a flowchart illustrating steps performed in determining that the number of frames to be played buffered by an audio playback device exceeds a first threshold;
FIG. 6 is a schematic diagram of an encapsulation format of an audio encoding packet according to an embodiment;
FIG. 7 is a fourth flowchart of an audio transmission method according to an embodiment;
FIG. 8 is a diagram illustrating an encapsulation format of a real-time audio encoding packet according to an embodiment;
FIG. 9 is a fifth flowchart of an audio transmission method according to an embodiment;
FIG. 10 is a flowchart of a method for audio transmission according to an embodiment;
FIG. 11 is a schematic diagram of a package format of a feedback signal according to an embodiment;
FIG. 12 is a flow chart of a method of audio transmission according to an embodiment;
fig. 13 is a block diagram of an audio transmission apparatus according to an embodiment;
fig. 14 is an internal structural diagram of a user terminal of an embodiment;
fig. 15 is an internal structural diagram of an audio playback apparatus of an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
It will be understood that the terms "first," "second," and the like, as used herein, may be used to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another element. For example, a first audio encoding packet may be referred to as a third audio encoding packet, and similarly, a third audio encoding packet may be referred to as a first audio encoding packet, without departing from the scope of the present application. Both the first audio encoding packet and the third audio encoding packet are audio encoding packets, but they are not the same audio encoding packet.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" is at least two, such as two, three, etc., unless explicitly defined otherwise. In the description of the present application, the meaning of "several" means at least one, such as one, two, etc., unless explicitly defined otherwise.
The embodiment of the application provides an audio transmission method for transmitting audio data between a user terminal and an audio playing device. Specifically, the user terminal transmits audio data to the audio playing device, and the audio playing device receives and plays the audio data. A wireless communication connection such as Bluetooth or Wi-Fi may be established between the user terminal and the audio playing device. User terminals include, but are not limited to, cell phones, smart wearable devices, in-vehicle terminals, tablet computers, personal computers (PCs), personal digital assistants (PDAs), and the like. Audio playing devices include, but are not limited to, Bluetooth headphones, Bluetooth speakers, etc. Taking true wireless stereo (TWS) earbuds as an example, a user can listen to music and make calls with the TWS device in daily life and work. Moreover, users have increasingly high requirements on the playback quality of audio playing devices. For example, many users wish to listen to high-quality music, even lossless music, through an audio playing device.
For ease of presentation, the following description assumes that audio data is transmitted via Bluetooth communication (including Classic Bluetooth and Bluetooth Low Energy, BLE); however, it should be understood that audio data may also be transmitted via Wi-Fi, other short-range communication, 4G/5G cellular communication, or a wired connection.
Fig. 1 is a schematic diagram of an audio processing and transmission method according to the related art. Referring to fig. 1, the user terminal starts by processing data from a non-real-time audio source, which may be a lossless music source, for example, a Free Lossless Audio Codec (FLAC) audio source. Besides lossless sources, non-real-time audio sources may also be lossy, such as MP3 sources. The user terminal performs pulse code modulation (PCM) decoding on the audio data to obtain PCM audio data. Based on determining that a real-time audio source exists, such as audio data from games, key presses, or telephone calls, the PCM audio data of the non-real-time source is mixed with that of the real-time source. Then, the mixed PCM audio data is audio-encoded to obtain an audio coding packet. In some embodiments, the audio encoding may be lossless, such as FLAC or Apple Lossless Audio Codec (ALAC); audio sources using lossless compression coding may be collectively referred to as lossless audio sources. In other embodiments, the audio encoding may be any of Advanced Audio Coding (AAC), sub-band coding (SBC), aptX, LDAC, LHDC, and the like. After the above processing, the sending module wirelessly transmits the audio coding packet to the audio playing device.
The receiving module of the audio playing device receives the transmitted audio coding packet and decodes it to obtain PCM audio data. The bit depth of the PCM audio data decoded by the audio playing device is the same as that used by the user terminal. The PCM audio data is converted into a playable analog audio signal by a digital-to-analog converter (DAC) and a power amplifier (AMP), and playback of the analog signal is completed by a playback unit.
It will be appreciated that, during the above wireless transmission, the data received by the audio playing device may be delayed or corrupted due to a variety of factors. These factors may come from interference in other frequency bands, from other audio playing devices, or from the radio-frequency performance limits of the user terminal itself. An example of such a frequency band is the 2.4 GHz band used by Wi-Fi. Especially in non-ideal transmission environments, factors such as environmental radio-frequency interference may further increase the delay caused by data retransmission. A non-ideal transmission environment may be a high-interference scene such as a subway or bus, or a scene in which the user terminal is moving. Thus, these effects may delay the transmission of audio coding packets, and may even cause playback to stutter or go silent.
In order to solve the above-mentioned problems, an embodiment of the present application proposes an audio transmission method, where the audio transmission method of the present embodiment is applied to a user terminal. Fig. 2 is one of flowcharts of an audio transmission method according to an embodiment, and referring to fig. 2, the audio transmission method includes steps 202 to 204.
Step 202, based on determining that the number of frames to be played buffered by the audio playing device exceeds a first threshold, encoding a frame of audio data according to a first encoding mode.
The buffered audio data may be non-real-time audio data. Non-real-time audio data is audio data with low immediacy, which may be understood as data with low relevance to the user's immediate operations; non-real-time audio data may be, for example, music. Based on these characteristics, non-real-time audio data is well suited to buffered scenarios. Real-time audio data is audio data with high immediacy, which may be understood as data with high relevance to the user's immediate operations; real-time audio data may be, for example, any of video sound, background sound of an application running on the user terminal, call voice, and alert sounds. Therefore, the subsequent embodiments of the present application take buffered non-real-time audio data as an example, but it should be understood that the buffered audio data may also be real-time audio data; the embodiment is not limited thereto.
For example, whether audio data is non-real-time may be determined according to its source: a source table of non-real-time audio data is pre-stored in the processor of the user terminal, and the source of the audio data is matched against the pre-stored table. For example, based on determining that the audio data originates from a music application and that the music application is in the pre-stored source table, the audio data may be determined to be non-real-time. As another example, whether audio data is non-real-time may also be determined from an identification of the audio data, which may be tagged by the application generating it. For example, based on determining that the audio data carries a non-real-time identification, the audio data may be determined to be non-real-time. It will be appreciated that an application may generate either real-time or non-real-time audio data. For example, take an instant-messaging application: when the user performs voice or video chat, the application generates real-time audio data from the chat content, whereas when the user reads official-account articles, the application may generate non-real-time audio data from the articles' background music. In this scenario, the identification-based method may be more accurate, but adding identifications places higher requirements on the application.
Therefore, in actual processing, it may first be checked whether the audio data carries an identification; based on determining that an identification exists, whether the audio data to be transmitted is real-time or non-real-time is determined from the identification; based on determining that no identification exists, it is determined from the application generating the audio data and the pre-stored source table. It should be noted that the above ways of determining whether the data to be transmitted is real-time or non-real-time are only illustrative and do not limit the protection scope of this embodiment; other ways may be chosen as needed.
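The two-stage check described above can be sketched in Python. This is an illustrative sketch: the tag value, dictionary keys, and source-table entries are assumptions, not part of the claimed method.

```python
# Hypothetical pre-stored source table of non-real-time audio sources.
NON_REALTIME_SOURCES = {"music_app", "podcast_app"}

def is_non_realtime(audio):
    """Classify audio data; `audio` is a dict with an optional 'tag'
    (set by the producing application) and a 'source' name.

    Stage 1: trust the explicit identification if present.
    Stage 2: fall back to the pre-stored source table.
    """
    tag = audio.get("tag")
    if tag is not None:
        return tag == "non_realtime"
    return audio.get("source") in NON_REALTIME_SOURCES
```

The fallback order mirrors the text: the identification is more accurate when available, while the source table requires no cooperation from the application.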
A buffer area is provided in the audio playing device. The buffer is a portion of memory: the audio playing device reserves a certain amount of storage space for buffering input or output data, and this reserved storage space is called the buffer. In this embodiment, the buffer may buffer audio data. In particular, if non-real-time audio data is buffered, then because of its low immediacy the user will not normally operate on it in real time, and it will be played in its own order. Therefore, at a given moment the user terminal can know the non-real-time audio data to be transmitted at multiple future moments, and correspondingly the audio playing device can know the non-real-time audio data to be played at multiple future moments. The buffer can hold the non-real-time audio data to be played at those future moments, so that when the audio playing device needs to play it, it can fetch the data directly from its internal buffer rather than from the user terminal. Thus, when the audio being played is not generated from instantaneously transmitted data, even a drop in communication quality does not produce a corresponding playback stutter, which greatly reduces the probability of stuttering.
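A minimal frame buffer of the kind described above might look like the following sketch. The class name, capacity, and underrun behavior are assumptions for illustration; the patent only requires that a reserved region buffer frames of audio data.

```python
from collections import deque

class FrameBuffer:
    """Illustrative frame buffer for a playback device.

    Received frames are appended; the playback path consumes one frame
    per frame period. `pop_for_playback` returning None models an
    underrun (no buffered data to play).
    """
    def __init__(self, capacity=50):
        # deque with maxlen silently drops the oldest frame if overfilled
        self.frames = deque(maxlen=capacity)

    def push(self, frame):
        self.frames.append(frame)

    def pop_for_playback(self):
        return self.frames.popleft() if self.frames else None

    @property
    def occupancy(self):
        return len(self.frames)  # the "number of frames to be played"
```

The `occupancy` property is what the device would compare against the first and second thresholds when generating feedback.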
Specifically, each time the audio playing device plays one frame, the buffer is reduced by one frame of audio data. As wireless communication speeds increase, the transmission speed of audio data may exceed its consumption speed. Based on determining that the number of frames to be played buffered by the audio playing device exceeds a first threshold, the buffer may not be able to hold much more data. Therefore, the user terminal can determine the transmission frequency of the non-real-time audio data according to the frame rate of the audio playing device, reducing cases where the audio playing device cannot buffer a received audio coding packet. For example, based on determining that each frame of the audio playing device lasts 10 ms, the user terminal may transmit one frame of audio data every 10 ms.
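The frame-paced transmission in the example above (one frame every 10 ms) can be sketched as follows; the function name and callback shape are assumptions.

```python
import time

FRAME_DURATION_S = 0.010  # 10 ms per frame, matching the example above

def paced_send(frames, send):
    """Send one frame per frame period so that the transmission rate
    matches the playback device's consumption rate and the buffer
    is not overfilled. `send` is a hypothetical transmit callback."""
    for frame in frames:
        send(frame)
        time.sleep(FRAME_DURATION_S)
```

In a real stack the pacing would be driven by the link layer or the feedback signal rather than a blocking sleep; the sleep only makes the rate-matching idea concrete.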
The user terminal may encode and encapsulate the audio data according to a data format specified by the communication protocol to obtain a first audio coding packet comprising a header portion and a data portion. Specifically, taking non-real-time audio data as an example, the header portion may include indication information of the communication protocol and header information corresponding to the audio data, and the data portion includes the encoded data generated by encoding the audio data. The indication information of the communication protocol may at least comprise a vendor identifier (Vendor ID), an encoder identifier (Codec ID), and the like. The vendor identifier may be understood as the code of the company that issued the communication protocol. The encoder identifier may be used to identify the encoding format of the non-real-time audio data. It will be appreciated that, based on determining that the encoder is lossy, the encoder can only be used for lossy-format encoding of non-real-time audio data; based on determining that the encoder is lossless, it can be used for both lossy-format and lossless-format encoding of non-real-time audio data. Thus, the header portion may further include a lossy/lossless flag to indicate whether the encoding format is lossy or lossless. Optionally, the indication information of the communication protocol may further include information such as the version number of the communication protocol, which is not limited in this embodiment. It should be noted that the vendor identifier, encoder identifier, version number, etc. may each be one or more of digits, characters, symbols, and the like, which are not limited here.
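A header carrying a Vendor ID, a Codec ID, and a lossy/lossless flag could be packed as in the sketch below. The field widths and byte order are assumptions chosen for illustration; the patent does not specify a concrete binary layout.

```python
import struct

# Assumed layout (NOT from the patent): 2-byte vendor ID, 1-byte codec ID,
# 1-byte lossless flag, 2-byte payload length, all big-endian.
HEADER_FMT = ">HBBH"
HEADER_LEN = struct.calcsize(HEADER_FMT)  # 6 bytes

def encapsulate(vendor_id, codec_id, lossless, payload):
    """Build a packet: protocol indication header + encoded audio data."""
    header = struct.pack(HEADER_FMT, vendor_id, codec_id,
                         1 if lossless else 0, len(payload))
    return header + payload

def parse_header(packet):
    """Recover (vendor_id, codec_id, lossless, payload) from a packet."""
    vendor_id, codec_id, lossless, length = struct.unpack_from(HEADER_FMT, packet)
    return vendor_id, codec_id, bool(lossless), packet[HEADER_LEN:HEADER_LEN + length]
```

The receiver would use the Codec ID plus the lossless flag to select the matching decoder before producing PCM data.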
Specifically, the first encoding mode is the encoding mode that minimizes the information loss caused by encoding the audio data while meeting user requirements. For example, audio data may be losslessly encoded based on determining that the user's demand for audio playback quality is high, and lossily encoded based on determining that the demand is low. The user's requirement for audio playback quality may be preset in the user terminal; for example, a correspondence between a music application and a quality requirement may be preset, and based on determining that the audio data comes from a music application associated with a high quality requirement, the audio data may be losslessly encoded. Optionally, the user may set the quality requirement inside the application, and before the application sends the audio data, it sends the user's setting to the processor, so that an appropriate encoding mode can be selected according to the state of the buffer.
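The selection rule above can be condensed into a small function. The return values and parameter names are assumptions; the patent describes the rule ("lossless when buffered frames exceed the first threshold and the quality demand is high") without naming concrete codecs here.

```python
def choose_encoding(buffered_frames, first_threshold, wants_high_quality):
    """Pick an encoding for the next frame (illustrative names).

    Lossless encoding (minimal information loss) is chosen only when the
    playback buffer is comfortably filled AND the user demands high
    quality; otherwise lossy encoding keeps the bitrate low.
    """
    if buffered_frames > first_threshold and wants_high_quality:
        return "lossless"   # e.g. FLAC/ALAC-style first encoding mode
    return "lossy"          # e.g. SBC/AAC-style lower-bitrate mode
```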
Step 204, sending a first audio coding packet corresponding to that frame of audio data to the audio playing device.
Specifically, since the user terminal processes audio data faster than the audio playing device consumes frames, the user terminal may choose when to generate and transmit the first audio coding packet according to the actual situation. For example, the user terminal may generate one first audio coding packet every preset interval derived from the frame rate, for example every 10 ms, and transmit each packet to the audio playing device once it is generated. As another example, the user terminal may generate a plurality of first audio coding packets in a batch, but still transmit one packet per preset interval according to the frame rate, for example one packet every 10 ms. The former example is more real-time than the latter: based on determining that the user has changed the playback (such as skipping to another track), it will not encode, encapsulate, and send the audio data after the skip, whereas the latter example, because it generates packets in batches, may encode, encapsulate, and send invalid audio data. Invalid audio data is audio data whose target playback time is after the moment the user performed the track skip. However, since the latter example processes in batches and need not switch frequently between this processing and the processing of other data, it requires fewer instructions and simpler control logic. It should be noted that the above transmission methods are illustrative only and do not limit the protection scope of this embodiment.
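The batch strategy's invalid-data problem can be sketched as follows; `skip_requested` is a hypothetical callback, and the packet representation is a placeholder, both assumptions made for illustration.

```python
def batch_send(frames, send, skip_requested):
    """Batch-encode packets up front, but stop sending once the user
    skips the track, since the remaining packets are now invalid
    (their target playback time lies after the skip)."""
    packets = [("packet", f) for f in frames]  # stands in for batch encode+encapsulate
    sent = []
    for pkt in packets:
        if skip_requested():
            break  # discard now-invalid packets instead of transmitting them
        send(pkt)
        sent.append(pkt)
    return sent
```

Note the batch was still wastefully *encoded*; the per-frame strategy avoids even that, at the cost of more frequent scheduling.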
In this embodiment, a buffer area is provided in the audio playing device so that audio data can be buffered. Because of the buffer, when the communication quality between the user terminal and the audio playing device is poor, the audio playing device can play the buffered audio data. Therefore, the technical solution of the application can reduce audio playback stuttering caused by the audio playing device not receiving data when the communication quality is poor. In addition, the buffered number of frames to be played exceeding the first threshold indicates that the audio playing device can retrieve data directly from the buffer for multiple consecutive frames; accordingly, the user terminal can encode the audio data in an appropriate encoding mode to deliver audio playback quality that meets user requirements, for example, higher quality. Therefore, the audio transmission method can keep playback free of stuttering while the audio playback quality meets user demands.
Fig. 3 is a second flowchart of an audio transmission method according to an embodiment, referring to fig. 3, in one embodiment, the audio transmission method further includes steps 302 to 304.
Step 302, based on determining that the number of frames to be played cached by the audio playing device does not exceed a first threshold, and the signal quality information of the audio transmission environment meets a preset condition, encoding the multi-frame audio data synchronously according to a first encoding mode.
Wherein the buffered number of frames to be played not exceeding the first threshold indicates that the buffer area can still accept more data. The signal quality information meeting the preset condition can be understood as the signal quality being good enough to support rapid transmission of a plurality of second audio encoding packets. Specifically, one or more frames of audio data may be encoded in one pass and encapsulated into corresponding second audio encoding packets. That is, the multiple frames of audio data may correspond to the plurality of second audio encoding packets one by one, or may all be encapsulated into one second audio encoding packet. Specifically, the signal quality information may be information detected by a coupler or the like in the radio frequency system, further processed by the radio frequency transceiver, and then fed back to the processor. Alternatively, the signal quality information may be, but is not limited to, at least one of signal strength, received signal level RSL, quality of the scrambling timestamp sequence STS, reference signal received power RSRP, received signal strength indication RSSI, reference signal received quality RSRQ, and signal-to-interference-plus-noise ratio RS-SINR. Optionally, the signal quality information satisfying the preset condition may mean that any one of the above parameters is greater than its corresponding reference threshold, or that a plurality of the parameters are all greater than their corresponding reference thresholds.
Step 304, sending a second audio coding packet corresponding to the multi-frame audio data to the audio playing device.
The second audio encoding packet is the encoding packet generated by the encoding in step 302. In this embodiment, when the buffered number of frames to be played does not exceed the first threshold, by acquiring the signal quality information, an appropriate audio encoding packet generation manner may be adopted according to the signal quality information, so as to provide the best audio playing quality possible while avoiding stuttering. When the signal quality information meets the preset condition, a first encoding mode with smaller information loss is adopted and multiple frames of audio data are encoded synchronously, so that the audio playing quality can be ensured and the amount of data in the buffer area can be replenished as soon as possible, thereby mitigating the impact of any future drop in the communication quality of the audio transmission environment on the fluency of audio playback.
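The "any one passes" versus "all pass" alternatives for the preset condition can be sketched as follows; the metric names and reference thresholds here are hypothetical illustrations, since the embodiment leaves the concrete parameters and thresholds open:

```python
# Hypothetical reference thresholds for a few of the metrics named above (dB / dBm scale).
REFERENCE_THRESHOLDS = {"rsrp": -100.0, "rssi": -70.0, "rsrq": -15.0, "sinr": 10.0}

def quality_meets_condition(measurements, require_all=False):
    """Return True if the measured metrics exceed their reference thresholds.

    With require_all=False, a single passing metric satisfies the condition;
    with require_all=True, every reported metric must pass. These are the two
    alternatives described in the embodiment.
    """
    checks = [measurements[k] > REFERENCE_THRESHOLDS[k]
              for k in measurements if k in REFERENCE_THRESHOLDS]
    if not checks:
        return False  # nothing measured: treat the condition as unmet
    return all(checks) if require_all else any(checks)
```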
Further, in one embodiment, based on determining that the signal quality information does not meet the preset condition, it is indicated that the signal quality is poor and cannot support rapid transmission of a plurality of first audio encoding packets. In this case, therefore, the user terminal may choose to generate only one first audio encoding packet at a time, or to generate other audio encoding packets using an encoding mode with relatively large information loss. Still further, it may also be determined, according to the signal quality information, whether the current communication quality allows the user terminal to send first audio encoding packets faster than the audio playing device plays them, so as to decide whether to encode in the first encoding mode and encapsulate the encoded data into corresponding first audio encoding packets. For example, based on determining that the audio playing device plays one frame of data every 10 ms and the user terminal can send a first audio encoding packet to the audio playing device within 10 ms, the audio data can be encoded in the first encoding mode and encapsulated into a corresponding first audio encoding packet. However, if the user terminal needs more than 10 ms to send a first audio encoding packet to the audio playing device, and the buffer area holds little unplayed data, another encoding mode can be adopted to encode the audio data and encapsulate it into corresponding other audio encoding packets, so that the volume of the audio encoding packets to be transmitted is reduced, the packets can reach the audio playing device faster, and the problem of playback stuttering is reduced.
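The 10 ms decision described in this paragraph can be sketched as follows; the link-rate estimate, the low-watermark value, and the mode names are illustrative assumptions:

```python
FRAME_PERIOD_MS = 10.0  # the 10 ms per-frame playback period used in the example

def choose_encoding(packet_bits, link_rate_bps, frames_buffered, low_watermark=2):
    """Pick an encoding mode from the estimated packet delivery time.

    If a first-mode (higher-quality) packet can be delivered within one frame
    period, keep the first encoding mode; otherwise, when the buffer is nearly
    empty, fall back to a smaller, lossier packet so it arrives before underrun.
    """
    send_time_ms = packet_bits / link_rate_bps * 1000.0
    if send_time_ms <= FRAME_PERIOD_MS:
        return "first"   # the link keeps up with playback: keep quality mode
    if frames_buffered < low_watermark:
        return "other"   # smaller packet reaches the device before the buffer drains
    return "first"       # buffer can absorb the slower delivery
```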
Fig. 4 is a third flowchart of an audio transmission method according to an embodiment, referring to fig. 4, in one embodiment, the audio transmission method further includes steps 402 to 404.
Step 402, based on determining that the signal quality information does not meet the preset condition and the number of frames to be played buffered by the audio playing device is lower than a second threshold, encoding one frame of the audio data according to a second encoding mode.
Wherein the buffered number of frames to be played being lower than the second threshold means that there is no, or only a little, playable data in the buffer area. When the buffered number of frames to be played is lower than the second threshold, the audio playing device can only play audio according to the audio encoding packets transmitted by the user terminal in real time.
The second encoding mode may be the same as or different from the first encoding mode, depending on the specific situation of the first encoding mode. Specifically, based on determining that the first encoding mode is a lossy encoding mode, the second encoding mode may be the same lossy encoding mode as the first. Based on determining that the first encoding mode is a lossless encoding mode, the second encoding mode may be a lossy encoding mode different from the first. That is, the information loss of the second encoding mode during encoding is greater than that of the first encoding mode, and the volume of the first audio encoding packet is greater than that of the third audio encoding packet. It can be appreciated that, because the volume of the third audio encoding packet is smaller than that of the first audio encoding packet, the requirement on the audio transmission environment for transmitting the third audio encoding packet is lower, and even when the current audio transmission environment has poor communication quality, the third audio encoding packet can be transmitted timely and effectively. Therefore, when the buffered number of frames to be played is lower than the second threshold, the audio playing device has a more urgent need to receive audio encoding packets, and timely and effective transmission of the audio encoding packets should be taken as the transmission target. It should be noted that the third audio encoding packet and the first audio encoding packet are distinguished only by their encoding modes, not by the specific content of the encoded non-real-time audio data; that is, the same non-real-time audio data may be encoded and encapsulated into either a first audio encoding packet or a third audio encoding packet, as required.
Step 404, sending a third audio coding packet corresponding to the one frame of audio data to the audio playing device.
The third audio encoding packet refers to the encoding packet generated by the encoding in step 402. In this embodiment, when the buffered number of frames to be played is lower than the second threshold and the audio transmission environment is poor, the user terminal may choose to encode the non-real-time data in the second encoding mode with relatively large information loss, so as to generate a third audio encoding packet with smaller volume, thereby ensuring that the data reaches the audio playing device before it needs to be played and that operations such as decapsulation and decoding are completed in time, so as to reduce the problem of playback stuttering and improve the fluency of audio playback.
In one embodiment, the first audio encoding packet and the third audio encoding packet are configured in the same encapsulation format, the audio encoding packet in the encapsulation format including a header portion and a data portion, the header portion including a real-time/non-real-time identification and an encoding packet identification. Wherein the real-time/non-real-time identification is used to identify whether the data portion includes non-real-time audio data and the encoded packet identification is used to identify a decoding order of different audio encoded packets when the data portion includes non-real-time audio data. In this embodiment, by setting the identification bit, the audio playing device can conveniently obtain the coding related information after decapsulating the audio coding packet, so as to accurately decode the data portion according to the coding related information, thereby ensuring the accuracy and fluency of audio playing.
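A minimal sketch of the shared encapsulation format follows. The field widths and layout are illustrative assumptions; the embodiment specifies only that the header carries a real-time/non-real-time identification and an encoding packet identification, not concrete sizes:

```python
import struct

# Assumed layout: 1-byte real-time flag, 1-byte reserved, 4-byte big-endian packet id.
HEADER_FMT = ">BBI"

def pack_audio_packet(is_realtime, packet_id, payload):
    """Build a packet in a shared header+data format, as described above."""
    header = struct.pack(HEADER_FMT, 1 if is_realtime else 0, 0, packet_id)
    return header + payload

def unpack_audio_packet(packet):
    """Recover (real-time flag, packet id, data portion) from a packed packet."""
    flag, _reserved, packet_id = struct.unpack_from(HEADER_FMT, packet)
    payload = packet[struct.calcsize(HEADER_FMT):]
    return bool(flag), packet_id, payload
```

Because both the first and third audio encoding packets share this format, the playing device can decapsulate either kind with the same routine and branch on the flag and identifier afterwards.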
In one embodiment, the audio transmission method further comprises the steps of: based on determining that the number of frames to be played buffered by the audio playing device is not lower than a second threshold, encoding one frame of audio data according to the first encoding mode; and sending a fourth audio encoding packet corresponding to the frame of audio data to the audio playing device. Specifically, when the number of frames to be played is not lower than the second threshold, the audio playing device can still play audio from the buffered data, so the problem of playback stuttering is unlikely to occur. Therefore, the non-real-time data can be encoded in a better encoding mode to improve the audio playing quality.
In one embodiment, the preset condition includes at least one of the following: the RSSI of the Bluetooth signal measured by the user terminal is smaller than a third threshold; the packet error rate PER of the audio encoding packets measured by the audio playing device is greater than a fourth threshold; the signal-to-noise ratio measured by the audio playing device is greater than a fifth threshold; and the retransmission rate of the audio encoding packets measured by the user terminal is greater than a sixth threshold. Specifically, the determination may be made according to the comparison result of a single parameter, or evaluated comprehensively according to the comparison results of a plurality of parameters, which is not limited in this embodiment.
In one embodiment, before determining that the number of frames to be played buffered by the audio playing device exceeds the first threshold, the method further includes: acquiring the number of frames to be played buffered by the audio playing device according to a feedback signal sent by the audio playing device. In particular, the feedback signal may carry occupancy information, including either occupied capacity or unoccupied capacity. The occupied capacity is positively correlated with the number of frames to be played, and the unoccupied capacity is negatively correlated with it. Thus, based on determining that the occupancy information includes occupied capacity, the audio playing device may increase the corresponding occupied capacity after receiving an audio encoding packet. Based on determining that the occupancy information includes unoccupied capacity, the audio playing device may reduce the corresponding unoccupied capacity after receiving an audio encoding packet. After the audio playing device finishes updating the occupancy information, it can choose to generate a feedback signal carrying the occupancy information, so that the user terminal can acquire the number of frames to be played buffered by the audio playing device according to the feedback signal. Similarly, the user terminal also judges via the occupancy information whether the number of frames to be played buffered by the audio playing device is lower than the second threshold.
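The correlation between occupancy information and the number of frames to be played can be sketched as follows; the field names and the per-frame size are hypothetical:

```python
def frames_to_play(occupancy, total_capacity, frame_bytes):
    """Estimate buffered frames from feedback-signal occupancy information.

    `occupancy` carries either "occupied" or "unoccupied" bytes, the two
    alternatives described above: occupied capacity correlates positively
    with frames to play, unoccupied capacity negatively.
    """
    if "occupied" in occupancy:
        used = occupancy["occupied"]
    else:
        used = total_capacity - occupancy["unoccupied"]
    return used // frame_bytes
```

With this, the user terminal's threshold checks reduce to comparing the returned frame count against the first and second thresholds.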
Fig. 5 is a flowchart illustrating steps for determining that the number of frames to be played buffered by the audio playing device exceeds a first threshold, and referring to fig. 5, in one embodiment, the steps include steps 502 to 504.
Step 502, obtaining the total capacity of the buffer area of the audio playing device.
The total capacity of the buffer area is determined by the memory configuration of the audio playing device. For example, the user terminal may acquire the total capacity of the buffer area of the audio playing device when establishing a connection with the audio playing device, store the total capacity, and retrieve it when necessary. For example, when the user terminal starts its wireless communication function and performs a handshake with the audio playing device to establish a connection, the user terminal requests the total capacity of the buffer area of the audio playing device, or the audio playing device actively sends the total capacity of the buffer area to the user terminal. In this way, the user terminal need not acquire the total capacity of the buffer area again until it is disconnected from the audio playing device. For another example, the user terminal may acquire the total capacity of the buffer area of the audio playing device when non-real-time audio data needs to be transmitted, for example, immediately before sending the non-real-time audio data. It can be understood that the user terminal may also obtain the total capacity of the buffer area of the audio playing device on other occasions, which is not limited in this embodiment.
Step 504, judging, according to the total capacity and the number of frames to be played, that the number of frames to be played buffered by the audio playing device exceeds the first threshold.
In this embodiment, the user terminal may determine the occupation condition of the buffer area according to the feedback signal sent by the audio playing device, so as to accurately determine the encoding mode of the audio encoding packet, so that the non-real-time audio data can be timely transmitted to the audio playing device.
Further, fig. 6 is a schematic diagram of an encapsulation format of an audio encoding packet according to an embodiment. Referring to fig. 6, the header portion may further include at least one of a sampling rate, a sampling depth, a frame length, a channel number, channel identifiers, and an encoding packet length for each channel. The sampling rate, that is, the sampling frequency, is the number of times the analog signal is sampled per unit time, and may be, for example, 192k, 96k, 48k, 44.1k, or the like. The sampling depth refers to the number of bits in the binary coded stream obtained after the analog signal is sampled at the sampling rate and quantized; it may also be referred to as quantization accuracy. The sampling depth of PCM audio may be represented by the bit width of the PCM audio data, which may be 24 bits, 16 bits, etc.; when the PCM audio data is 24-bit, one encoding packet contains 24-bit PCM audio data. The frame length refers to the duration of one frame of audio data, and may be, for example, 10 ms, 5 ms, or the like. The channel number refers to the number of channels, and may be, for example, 1, 2, 3, etc.; headphones typically have 2 channels, while speakers and the like may have 2 or more channels. The channel identifiers may be, for example, a left channel, a right channel, an Nth channel, etc. The number of per-channel encoding packet length fields equals the number of channel identifiers, each encoding packet length corresponding one-to-one with a channel identifier, and the encoding packet length of a channel may be, for example, 1000, 2025, etc.
Still further, the header portion may further include a header check code. The user terminal can compute the check code over the non-real-time audio data according to a preset calculation method. For example, the user terminal may divide the read non-real-time audio data into a plurality of groups of bit data, each group containing an M-bit value, where M is a positive integer, for example 1, 4, or 8. Starting from the first group of the read non-real-time audio data, the user terminal may exclusive-OR the current group with the next group, repeating until the next group is the last group, so as to obtain the check code. For another example, a piece of bit data may be selected from the read non-real-time audio data, and each bit value inverted starting from the selected first bit, to obtain the check code. It should be noted that the check code may also be calculated in other ways, which is not limited in this embodiment.
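The first (XOR-folding) check-code method above, taking M = 8 so each group is one byte, can be sketched as:

```python
def xor_check_code(data: bytes) -> int:
    """Fold the payload with successive XORs over byte-wide groups (M = 8).

    Starting from the first group, XOR the running value with the next group,
    continuing through the last group, as the check-code method above describes.
    """
    code = 0
    for group in data:
        code ^= group
    return code
```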
In one embodiment, the packet header further includes a buffering parameter, where the buffering parameter is determined by the sampling rate, the sampling depth, and the frame length, and is used to identify the size of buffer area occupied by data of a certain frame length. For example, based on determining a sampling rate of 192k, a sampling depth of 24 bits, and a frame length of 5 ms, the buffering parameter is 192k × 24 bit × 0.005 s / 8 = 2880 bytes. Taking music playback as an example, only multiple frames within the same song are counted toward the same buffering parameter, so based on determining that a user performs a song-switching operation, the audio playing device can, according to the buffering parameter, discard the relevant buffered data in a batch without playing it, thereby reducing power consumption.
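The buffering-parameter arithmetic reduces to rate × depth × duration ÷ 8; a one-function sketch (the function name is illustrative):

```python
def buffer_param_bytes(sample_rate_hz, sample_depth_bits, frame_len_s):
    """Bytes of buffer occupied by one frame: rate x depth x duration, in bytes."""
    return int(sample_rate_hz * sample_depth_bits * frame_len_s / 8)
```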
Fig. 7 is a flowchart of an audio transmission method according to an embodiment, referring to fig. 7, in one embodiment, the audio transmission method includes steps 702 to 708. Step 704 and step 708 may refer to the foregoing embodiments, and are not described herein.
Step 702 encodes real-time audio data.
The real-time audio data refers to audio data with high timeliness requirements, and can be understood as data highly relevant to the user's immediate operations. The real-time audio data may be, for example, any one of video sound, background sound of an application program run by the user terminal, call voice, alert sound, and the like. In particular, real-time audio data needs to be transmitted to the audio playing device more promptly than non-real-time audio data. Therefore, when there is real-time audio data to be transmitted, the real-time audio data can be encoded and encapsulated preferentially, so as to shorten the time difference between its generation time and playing time and improve the user experience. It can be appreciated that, since the data processing speed of current user terminals tends to be high, the time difference between the generation time and playing time of real-time audio data is affected more by the communication quality of the audio transmission environment, and the processing order of the audio data has little effect on the playing time. Thus, in some embodiments, real-time audio data need not be processed first; non-real-time audio data may be processed first instead. It should be understood that fig. 8 is a schematic diagram of an encapsulation format of a real-time audio encoding packet according to an embodiment; referring to fig. 8, the encapsulation format of the real-time audio encoding packet may be similar to that of a non-real-time audio encoding packet, except that it does not include the buffering parameter, so a description of the encapsulation format of the real-time audio encoding packet is omitted here.
Step 704, based on determining that the number of frames to be played buffered by the audio playing device exceeds a first threshold, encoding a frame of audio data according to a first encoding mode.
Step 706, before sending the first audio encoding packet, sending a fifth audio encoding packet corresponding to the real-time audio data to the audio playing device.
Step 708, sending the first audio coding packet to the audio playing device.
In this embodiment, since the user's timeliness requirement on real-time audio data is higher than that on non-real-time audio data, when real-time audio data exists, sending the real-time audio encoding packet first can effectively improve the playing timeliness of the real-time audio data and improve the user experience.
In one embodiment, the audio transmission method is applied to an audio playing device, and the audio transmission method includes generating and sending a feedback signal to a user terminal based on determining whether the buffered number of frames to be played exceeds a first threshold and/or is lower than a second threshold, so that the user terminal can select an audio coding mode based on the feedback signal. The feedback signal is used for identifying whether the number of frames to be played, which are cached by the audio playing device, exceeds a first threshold and/or is lower than a second threshold. In this embodiment, the audio playing device returns a feedback signal to the user terminal in response to receiving the audio coding packet including the audio data, so that the user terminal can learn the occupation change condition of the buffer area of the audio playing device in time, so as to improve the accuracy of selecting the coding mode and sending the audio coding packet by the user terminal, and select the coding mode with better audio playing quality on the premise of not generating a clip, thereby improving the use experience of the user.
Further, due to the high timeliness requirement of real-time audio data, the real-time audio data can be decoded and played directly after reaching the audio playing device, while non-real-time audio data can be buffered in the buffer area for subsequent playback. Therefore, only when the real-time/non-real-time identification indicates that the audio encoding packet includes non-real-time audio data does a feedback signal need to be generated and sent according to the data amount, to inform the user terminal of the current buffer situation so that the user terminal can accurately select the encoding mode. Alternatively, the audio playing device may return a feedback signal to the user terminal each time non-real-time audio data is received, or may return a feedback signal after receiving several pieces of non-real-time audio data, which is not limited in this embodiment. Through the decapsulation result of the audio encoding packet, the audio playing device can learn from the real-time/non-real-time identification in the header portion whether the data portion includes real-time or non-real-time audio data. Moreover, because the timeliness requirements and subsequent operations of real-time and non-real-time audio data differ, acquiring the real-time/non-real-time identification makes it convenient for the audio playing device to perform the correct subsequent operations.
In one embodiment, the method further comprises: and selecting a first decoding mode to decode the received audio coding packet based on the fact that the buffered number of frames to be played exceeds a first threshold or is not lower than a second threshold. The first threshold is greater than the second threshold, and the first decoding mode corresponds to the first encoding mode. Specifically, if the buffered number of frames to be played exceeds the first threshold or is not lower than the second threshold, it indicates that there is a large number of frames to be played, and when the user terminal generates the audio coding packet, the user terminal also encodes the audio coding packet in a better encoding mode, which means that the quality of the generated audio coding packet is high. In this embodiment, based on the buffered frame number to be played, the audio encoded packet may be accurately decoded.
In one embodiment, the method further comprises: and based on the fact that the buffered frame number to be played is lower than a second threshold value, and the signal quality information when the audio coding packet is received does not meet the preset condition, selecting a second decoding mode to decode the received audio coding packet. Wherein the second decoding mode corresponds to the second encoding mode. Specifically, if the buffered number of frames to be played exceeds the first threshold or is not lower than the second threshold, it indicates that there is a smaller number of frames to be played, and when the user terminal generates the audio coding packet, the user terminal also encodes the audio coding packet in a worse coding mode, that is, the quality of the generated audio coding packet is lower, but the volume of the audio coding packet is smaller and the transmission speed is faster. In this embodiment, based on the buffered frame number to be played, the audio encoded packet may be accurately decoded.
Fig. 9 is a flowchart of an audio transmission method according to an embodiment, referring to fig. 9, in one embodiment, the audio transmission method includes steps 902 to 904.
And step 902, updating the occupation amount information of the buffer area according to the received audio coding packet.
Wherein the occupancy information includes occupied capacity or unoccupied capacity. Specifically, based on determining that the occupancy information includes occupied capacity, the audio playing device may increase the corresponding occupied capacity after receiving an audio encoding packet. Based on determining that the occupancy information includes unoccupied capacity, the audio playing device may reduce the corresponding unoccupied capacity after receiving an audio encoding packet.
Step 904, generating and sending a feedback signal to the user terminal according to the updated occupancy information of the buffer area, so as to instruct the user terminal to determine whether the buffered frame number to be played exceeds a first threshold and/or is lower than a second threshold according to the feedback signal.
In this embodiment, after the audio playing device completes updating the occupation amount information, the audio playing device may select to generate a feedback signal carrying the occupation amount information, so that the user terminal may obtain the occupation amount information of the buffer area according to the feedback signal. Based on the occupation amount information, the user terminal can accurately judge the condition of the number of frames to be played, and accurately encode the audio data to be transmitted.
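Steps 902 to 904 on the device side can be sketched as follows; the class and field names, and the shape of the feedback payload, are hypothetical:

```python
class PlaybackBuffer:
    """Device-side sketch: update occupancy on receipt, emit a feedback signal."""

    def __init__(self, total_capacity):
        self.total_capacity = total_capacity
        self.occupied = 0

    def on_packet(self, payload_bytes):
        # Step 902: update occupancy for the received audio encoding packet.
        self.occupied += payload_bytes
        # Step 904: feedback signal carrying the updated occupancy information.
        return {"occupied": self.occupied}

    def on_frame_played(self, payload_bytes):
        # Playback drains the buffer; occupancy never goes below zero.
        self.occupied = max(0, self.occupied - payload_bytes)
```

The user terminal then compares the fed-back occupancy against its thresholds exactly as described in the embodiments above.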
Fig. 10 is a flowchart of an audio transmission method according to an embodiment, referring to fig. 10, in one embodiment, the audio transmission method includes steps 1002 to 1010.
Step 1002, obtaining an encoder identifier and an encoded packet identifier of a header portion of the audio encoded packet.
Wherein the encoder identifier may be used to identify the encoding format of the non-real-time audio data. It will be appreciated that, based on determining that the encoder is a lossy encoder, the encoder can only be used for lossy-format encoding of the non-real-time audio data. Based on determining that the encoder is a lossless encoder, the encoder can be used for both lossy-format and lossless-format encoding of the non-real-time audio data. Therefore, when the encoder identifier indicates a lossless encoder, the audio playing device can further acquire the lossy/lossless identification in the header portion, thereby achieving accurate decoding.
And step 1004, updating the occupation amount information of the buffer area according to the received audio coding packet.
Step 1006, generating and sending a feedback signal to the user terminal according to the updated occupancy information of the buffer area, so as to instruct the user terminal to determine whether the buffered frame number to be played exceeds a first threshold and/or is lower than a second threshold according to the feedback signal.
Step 1008, obtaining an encoder identifier and an encoded packet identifier of a header portion of the audio encoded packet.
And step 1010, decoding the audio data to be played according to the corresponding encoder identifications in sequence according to the coded packet identifications corresponding to the audio data to be played of each frame in the buffer area so as to obtain non-real-time original audio code stream data of each frame.
Specifically, when a poor audio transmission environment causes audio encoding packets to fail to transmit, the user terminal may retransmit the missing audio encoding packets, which can result in the actual receiving order of the audio encoding packets differing from the target playing order. In this case, the target playing order may be determined according to the encoded packet identifications, and decoding performed sequentially in the target playing order, thereby ensuring the accuracy of audio playback. It should be noted that the above example of a cause of out-of-order audio encoding packets is only for illustration and does not limit the protection scope of the present embodiment; other cases that may cause out-of-order audio encoding packets also fall within the scope of the present embodiment.
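Step 1010's identifier-ordered decoding can be sketched as follows; the packet tuple layout and the decoder lookup table are illustrative assumptions:

```python
def decode_in_play_order(packets, decoders):
    """Decode buffered packets in encoded-packet-identifier order, per step 1010.

    Each packet is a (packet_id, encoder_id, payload) tuple; packets may arrive
    out of order (for example after retransmission), so sort by identifier
    before handing each payload to its encoder-specific decoder.
    """
    out = []
    for _pid, enc_id, payload in sorted(packets, key=lambda p: p[0]):
        out.append(decoders[enc_id](payload))
    return out
```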
Further, fig. 11 is a schematic diagram of an encapsulation format of a feedback signal according to an embodiment. Referring to fig. 11, the header portion of the feedback signal further includes a vendor identifier (Vendor ID), an encoder identifier (Codec ID), an encoded packet identification, a buffering parameter, and a check code. The meaning of each identifier may refer to the foregoing embodiments and will not be described herein.
Fig. 12 is a flowchart of an audio transmission method according to an embodiment, referring to fig. 12, in one embodiment, the audio transmission method includes steps 1202 to 1210.
Step 1202, updating the buffer occupancy information according to the received audio coding packet.
And 1204, generating and sending a feedback signal to the user terminal according to the updated occupancy information of the buffer area, so as to instruct the user terminal to determine whether the buffered frame number to be played exceeds a first threshold and/or is lower than a second threshold according to the feedback signal.
Step 1206, obtaining an encoder identification and a coded packet identification of a header portion of the audio coded packet.
Step 1208, based on determining that the audio encoding packet includes real-time audio data, decoding the real-time audio data according to the corresponding encoder identifier to obtain real-time original audio bitstream data.
Step 1210, performing audio mixing processing on the real-time original audio code stream data and the non-real-time original audio code stream data of the current frame.
Specifically, in the related art, real-time audio data and non-real-time audio data are typically mixed on the user terminal side. In the present embodiment, however, because the non-real-time audio data is buffered by the audio playing device, if mixing were still performed on the user terminal side, the audio playing device would need to split the received data back into real-time audio data and non-real-time audio data, which would greatly increase the processing complexity. Therefore, in the present embodiment, the user terminal does not perform the mixing process; instead, the audio playing device mixes the real-time audio data and the non-real-time audio data of the current frame according to the playing progress, thereby greatly reducing the complexity of the process. The specific operation of the mixing process may refer to the related art and will not be described in detail in this embodiment.
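A minimal sketch of the device-side mixing step is shown below: the decoded real-time PCM samples of the current frame are summed with the buffered non-real-time PCM samples and clipped to the 16-bit sample range. The sample format and the clipping strategy are illustrative assumptions, not specified by the patent.

```python
# Hedged sketch: per-sample mixing of real-time and buffered non-real-time
# PCM data on the audio playing device side, assuming signed 16-bit samples.

def mix_frame(realtime_pcm, nonrealtime_pcm):
    mixed = []
    for rt, nrt in zip(realtime_pcm, nonrealtime_pcm):
        s = rt + nrt
        mixed.append(max(-32768, min(32767, s)))  # clip to the int16 range
    return mixed

# 32000 + 2000 overflows int16 and is clipped to 32767:
assert mix_frame([1000, -2000, 32000], [500, -500, 2000]) == [1500, -2500, 32767]
```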
It should be understood that, although the steps in the flowcharts are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include a plurality of sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments; the execution order of these sub-steps or stages is not necessarily sequential, and they may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps.
Fig. 13 is a block diagram of an audio transmission apparatus according to an embodiment. Referring to fig. 13, the audio transmission apparatus is applied to a user terminal and includes a first encoding module and a first sending module. The first encoding module is configured to encode one frame of audio data according to a first encoding mode based on determining that the number of frames to be played buffered by the audio playing device exceeds a first threshold. The first sending module is configured to send a first audio coding packet corresponding to the one frame of audio data to the audio playing device. In this embodiment, based on the characteristic that non-real-time audio data has weaker real-time requirements, the non-real-time audio data can be buffered by setting a buffer area in the audio playing device. Owing to the buffer area, when the communication quality between the user terminal and the audio playing device is poor, the audio playing device can play the buffered non-real-time audio data. Therefore, the technical solution of the present application can reduce the problem of audio playback stuttering caused by the audio playing device failing to receive data when the communication quality between the user terminal and the audio playing device is poor. In addition, the buffered number of frames to be played exceeding the first threshold indicates that the audio playing device can directly retrieve data from the buffer area for playing over multiple consecutive frames; accordingly, the user terminal can encode the non-real-time audio data in an appropriate encoding mode so as to provide audio playback quality that meets user requirements, for example, higher audio playback quality. Therefore, the audio transmission apparatus can keep playback free of stuttering while making the audio playback quality meet user requirements.
In one embodiment, the audio transmission apparatus further includes a second encoding module and a second transmitting module. The second encoding module is configured to encode the plurality of frames of audio data synchronously according to a first encoding mode based on a determination that a number of frames to be played buffered by the audio playing device does not exceed a first threshold, and signal quality information of an audio transmission environment meets a preset condition. The second sending module is used for sending a second audio coding packet corresponding to the multi-frame audio data to the audio playing device.
In one embodiment, the audio transmission apparatus further includes a third encoding module and a third transmitting module. The third encoding module is configured to encode one frame of the audio data according to a second encoding mode based on a determination that the signal quality information does not meet a preset condition, and a number of frames to be played buffered by the audio playing device is lower than a second threshold. And the third sending module is used for sending a third audio coding packet corresponding to the one frame of audio data to the audio playing device.
In one embodiment, the audio transmission apparatus further includes a fourth encoding module and a fourth transmitting module. And the fourth coding module is used for coding one frame of the audio data according to the first coding mode based on the fact that the number of frames to be played, which are cached by the audio playing equipment, is not lower than a second threshold value. The fourth sending module is configured to send a fourth audio coding packet corresponding to the one frame of audio data to the audio playing device.
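The encoding-mode selection performed by the first through fourth encoding modules above can be sketched as a single decision function. The threshold values and the signal-quality predicate below are illustrative assumptions; the patent does not fix concrete numbers.

```python
# Hedged sketch of the user-terminal encoding-mode selection implied by the
# modules above. Thresholds are placeholders; quality_meets_condition stands
# for "signal quality information of the audio transmission environment
# meets the preset condition".

def select_encoding(frames_buffered, quality_meets_condition,
                    first_threshold=10, second_threshold=3):
    if frames_buffered > first_threshold:
        return ("first_mode", 1)         # first module: one frame, first mode
    if quality_meets_condition:
        return ("first_mode", "multi")   # second module: multi-frame, first mode
    if frames_buffered < second_threshold:
        return ("second_mode", 1)        # third module: one frame, second mode
    return ("first_mode", 1)             # fourth module: not below second threshold

assert select_encoding(12, False) == ("first_mode", 1)
assert select_encoding(5, True) == ("first_mode", "multi")
assert select_encoding(2, False) == ("second_mode", 1)
assert select_encoding(5, False) == ("first_mode", 1)
```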
In one embodiment, the module for determining that the number of buffered frames exceeds the first threshold includes a total capacity acquisition unit, a first occupancy acquisition unit, and a determination unit. The total capacity acquisition unit is configured to acquire the total capacity of the buffer area of the audio playing device. The first occupancy acquisition unit is configured to acquire occupancy information of the buffer area according to a feedback signal sent by the audio playing device; the occupancy information includes the occupied capacity or the unoccupied capacity. The determination unit is configured to determine, according to the total capacity and the occupancy information, that the number of frames to be played buffered by the audio playing device exceeds the first threshold.
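The threshold determination described above, deriving the buffered frame count from the total capacity and the occupancy information, might look like the following sketch. The fixed frame size and the threshold value are assumptions for illustration only.

```python
# Illustrative computation of the buffered number of frames to be played,
# using either form of occupancy information (occupied or unoccupied
# capacity) carried in the feedback signal. Frame size is a placeholder.

def frames_buffered(total_capacity, occupied=None, unoccupied=None,
                    frame_size=256):
    # Derive the occupied amount from whichever field the feedback carries.
    if occupied is None:
        occupied = total_capacity - unoccupied
    return occupied // frame_size

def exceeds_first_threshold(total_capacity, first_threshold=10, **occ):
    return frames_buffered(total_capacity, **occ) > first_threshold

assert frames_buffered(4096, occupied=1024) == 4
assert frames_buffered(4096, unoccupied=3072) == 4      # same result either way
assert exceeds_first_threshold(4096, occupied=3072)      # 12 frames > 10
```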
In one embodiment, the audio transmission apparatus further includes a fifth encoding module and a fifth sending module. The fifth encoding module is configured to encode real-time audio data. The fifth sending module is configured to send a fifth audio coding packet corresponding to the real-time audio data to the audio playing device before the first audio coding packet is sent.
In one embodiment, an audio transmission apparatus is applied to an audio playing device, and the audio transmission apparatus includes a first feedback module. The first feedback module is used for generating and sending a feedback signal to the user terminal based on whether the buffered frame number to be played exceeds a first threshold value and/or is lower than a second threshold value, so that the user terminal can select an audio coding mode based on the feedback signal.
In one embodiment, the audio transmission apparatus further comprises a first decoding module. The first decoding module is used for selecting a first decoding mode to decode the received audio coding packet based on the fact that the buffered frame number to be played exceeds a first threshold value or is not lower than a second threshold value.
In one embodiment, the audio transmission apparatus further comprises a second decoding module. The second decoding module is configured to select a second decoding mode to decode the received audio encoded packet based on determining that the buffered frame number to be played is lower than a second threshold, and the signal quality information when the audio encoded packet is received does not satisfy a preset condition.
In one embodiment, an audio transmission apparatus is applied to an audio playing device, and the audio transmission apparatus includes an updating module and a second feedback module. The updating module is used for updating the occupation amount information of the buffer area according to the received audio coding packet. The second feedback module is used for generating and sending a feedback signal to the user terminal according to the updated occupancy information of the buffer area so as to instruct the user terminal to determine whether the buffered frame number to be played exceeds a first threshold value and/or is lower than a second threshold value according to the feedback signal.
In one embodiment, the audio data to be played cached by the audio playing device is non-real-time audio data, and the audio transmission device further includes an identifier obtaining module and a third decoding module. The identification acquisition module is used for acquiring the coder identification and the coding packet identification of the packet head part of the audio coding packet. The third decoding module is used for decoding the audio data to be played according to the corresponding coding packet identifiers of the audio data to be played of each frame in the buffer area and the corresponding encoder identifiers in sequence so as to obtain non-real-time original audio code stream data of each frame.
In one embodiment, the audio transmission apparatus further includes a fourth decoding module and a mixing module. The fourth decoding module is configured to decode the real-time audio data according to the corresponding encoder identifier based on determining that the audio encoding packet includes real-time audio data, so as to obtain real-time original audio code stream data. And the sound mixing module is used for carrying out sound mixing processing on the real-time original audio code stream data and the non-real-time original audio code stream data of the current frame.
The division of the modules in the above audio transmission apparatus is only for illustration; in other embodiments, the audio transmission apparatus may be divided into different modules as needed to implement all or part of its functions. For specific limitations of the audio transmission apparatus, reference may be made to the limitations of the audio transmission method above, which will not be repeated here. Each module in the above audio transmission apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in or independent of a processor in the user terminal in hardware form, or may be stored in a memory in the user terminal in software form, so that the processor can call and execute the operations corresponding to the above modules.
The embodiment of the present application also provides a chip configured to execute the audio transmission method. By employing the audio transmission method described above, the chip can reduce audio playback stuttering when the communication quality between the user terminal and the audio playing device is poor, while making the audio playback quality meet user requirements.
In one embodiment, a user terminal is provided, and fig. 14 is an internal structure diagram of the user terminal of one embodiment. The user terminal includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the user terminal is configured to provide computing and control capabilities. The memory of the user terminal includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The communication interface of the user terminal is used for wired or wireless communication with an external device; the wireless communication can be realized through WIFI, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements an audio transmission method. The display screen of the user terminal may be a liquid crystal display or an electronic ink display, and the input device of the user terminal may be a touch layer covering the display screen, a key, a trackball, or a touchpad arranged on the housing of the user terminal, or an external keyboard, touchpad, mouse, or the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 14 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the user terminal to which the present application is applied, and that a particular user terminal may include more or fewer components than shown in fig. 14, or may combine certain components, or have a different arrangement of components.
In one embodiment, an audio playing device is provided, and fig. 15 is an internal structural diagram of the audio playing device of one embodiment. The audio playing device includes a processor, a memory, a communication interface, and an input device connected by a system bus. The processor of the audio playing device is configured to provide computing and control capabilities. The memory of the audio playing device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The communication interface of the audio playing device is used for wired or wireless communication with an external device; the wireless communication can be realized through WIFI, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements an audio transmission method. The input device of the audio playing device may be a touch layer covering a display screen, a key, a trackball, or a touchpad arranged on the housing of the audio playing device, or an external keyboard, touchpad, mouse, or the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 15 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the audio playback apparatus to which the present application is applied, and that a particular audio playback apparatus may include more or fewer components than shown in fig. 15, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method embodiments described above.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the computer program may include the steps of the above method embodiments. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, data processing logic units based on quantum computing, and the like.
The technical features of the above-described embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above-described embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above embodiments merely represent several implementations of the present application; their description is relatively specific and detailed, but it is not to be construed as limiting the scope of the invention. It should be noted that a person skilled in the art could make several variations and improvements without departing from the spirit of the embodiments of the present application, all of which fall within the protection scope of the embodiments of the present application. Accordingly, the protection scope of the embodiments of the present application shall be subject to the appended claims.

Claims (17)

1. An audio transmission method, applied to a user terminal, comprising:
based on the fact that the number of frames to be played cached by the audio playing device exceeds a first threshold, encoding one frame of audio data according to a first encoding mode;
and sending a first audio coding packet corresponding to the frame of audio data to the audio playing device.
2. The audio transmission method according to claim 1, further comprising:
based on the fact that the number of frames to be played cached by the audio playing equipment does not exceed a first threshold value, and the signal quality information of an audio transmission environment meets preset conditions, encoding a plurality of frames of audio data synchronously according to a first encoding mode;
and sending a second audio coding packet corresponding to the multi-frame audio data to the audio playing device.
3. The audio transmission method according to claim 2, further comprising:
based on the fact that the signal quality information does not meet the preset condition, the number of frames to be played cached by the audio playing device is lower than a second threshold, and one frame of audio data is encoded according to a second encoding mode;
and sending a third audio coding packet corresponding to the frame of audio data to the audio playing device.
4. The audio transmission method according to claim 2 or 3, wherein the preset condition includes at least one of:
the RSSI of the Bluetooth signal measured by the user terminal is smaller than a third threshold;
the packet error rate (PER) of the audio coding packets measured by the audio playing device is larger than a fourth threshold;
The signal to noise ratio measured by the audio playing device is larger than a fifth threshold value;
and the retransmission rate of the audio coding packet measured by the user terminal is larger than a sixth threshold.
5. The audio transmission method according to claim 2, further comprising:
based on the fact that the number of frames to be played cached by the audio playing device is not lower than a second threshold, encoding one frame of audio data according to a first encoding mode;
and sending a fourth audio coding packet corresponding to the frame of audio data to the audio playing device.
6. The audio transmission method according to claim 1, characterized in that the method further comprises:
and acquiring the number of frames to be played cached by the audio playing device according to the feedback signal sent by the audio playing device.
7. The audio transmission method according to any one of claims 1 to 6, wherein the encoding a frame of audio data according to the first encoding mode includes:
and encoding a frame of non-real-time audio data according to a first encoding mode.
8. The audio transmission method according to claim 7, further comprising:
encoding real-time audio data;
and before the first audio coding packet is sent, sending a fifth audio coding packet corresponding to the real-time audio data to the audio playing device.
9. An audio transmission method, characterized by being applied to an audio playing device, comprising:
and generating and sending a feedback signal to the user terminal based on whether the buffered frame number to be played exceeds a first threshold value and/or is lower than a second threshold value, so that the user terminal can select an audio coding mode based on the feedback signal.
10. The audio transmission method according to claim 9, characterized in that the method further comprises:
and selecting a first decoding mode to decode the received audio coding packet based on the fact that the buffered number of frames to be played exceeds a first threshold or is not lower than a second threshold.
11. The audio transmission method according to claim 9, characterized in that the method further comprises:
and based on the fact that the buffered frame number to be played is lower than a second threshold value, and the signal quality information when the audio coding packet is received does not meet the preset condition, selecting a second decoding mode to decode the received audio coding packet.
12. An audio transmission method, characterized by being applied to an audio playing device, comprising:
updating the occupation amount information of the buffer area according to the received audio coding packet;
Generating and sending a feedback signal to the user terminal according to the updated occupancy information of the buffer area so as to instruct the user terminal to determine whether the buffered frame number to be played exceeds a first threshold value and/or is lower than a second threshold value according to the feedback signal.
13. A chip configured to perform the audio transmission method of any one of claims 1 to 12.
14. A user terminal comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 8 when the computer program is executed.
15. An audio playback device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 9 to 12 when the computer program is executed.
16. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 12.
17. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any one of claims 1 to 12.
CN202310066078.2A 2023-01-12 2023-01-12 Audio transmission method, chip, user terminal and audio playing device Pending CN116110411A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310066078.2A CN116110411A (en) 2023-01-12 2023-01-12 Audio transmission method, chip, user terminal and audio playing device

Publications (1)

Publication Number Publication Date
CN116110411A true CN116110411A (en) 2023-05-12

Family

ID=86261148

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310066078.2A Pending CN116110411A (en) 2023-01-12 2023-01-12 Audio transmission method, chip, user terminal and audio playing device

Country Status (1)

Country Link
CN (1) CN116110411A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination