CN110033781B - Audio processing method, apparatus and non-transitory computer readable medium - Google Patents


Publication number
CN110033781B
CN110033781B (application CN201810494561.XA)
Authority
CN
China
Prior art keywords
audio
segment
segments
compressed
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810494561.XA
Other languages
Chinese (zh)
Other versions
CN110033781A (en)
Inventor
李敬祥
张丰盛
陈继健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shengwei Advanced Technology Co ltd
Original Assignee
Shengwei Advanced Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US15/867,674 external-priority patent/US10650834B2/en
Application filed by Shengwei Advanced Technology Co ltd filed Critical Shengwei Advanced Technology Co ltd
Publication of CN110033781A publication Critical patent/CN110033781A/en
Application granted granted Critical
Publication of CN110033781B publication Critical patent/CN110033781B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 - Speech or audio signals analysis-synthesis techniques for redundancy reduction using predictive techniques
    • G10L19/16 - Vocoder architecture
    • G10L19/18 - Vocoders using multiple modes

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

An audio processing method. The audio processing method comprises the following steps: dividing, by a processor, an audio file into a plurality of audio segments; and compressing, by a processor, the plurality of audio segments to generate a plurality of compressed audio segments, comprising: down-sampling a first audio segment of the plurality of audio segments to generate a first compressed audio segment of the plurality of compressed audio segments, wherein a first target bandwidth of the first audio segment is less than a bandwidth threshold; and sampling a second audio segment of the plurality of audio segments to generate a second compressed audio segment of the plurality of compressed audio segments, and adding a delay time to the second compressed audio segment, wherein a second target bandwidth of the second audio segment is not less than the bandwidth threshold. These implementations compress the audio data stream more effectively when the bandwidth changes, and prevent popping sounds caused by audio discontinuity.

Description

Audio processing method, apparatus and non-transitory computer readable medium
Technical Field
The present disclosure relates to audio processing methods, apparatuses, and non-transitory computer readable media, and more particularly, to audio processing methods, apparatuses, and non-transitory computer readable media for compressing audio files.
Background
Conventionally, if an audio file is transmitted to an audio playing device through a wireless transmission protocol that supports only a low bandwidth, such as Bluetooth, a lossy compression format such as MP3 is used to greatly reduce the amount of data; however, a larger compression ratio is likely to cause audio distortion and generate noise or popping.
In addition, general compression techniques usually involve many operations, such as converting the audio file between the time domain and the frequency domain, so the continuous audio data stream is divided into fixed-size audio segments (frames) for computation and compression, and the receiving end decompresses the audio segments and restores them into an audio data stream. Generally, a slightly larger audio segment yields better compression efficiency, but too large an audio segment increases the sound delay and requires more memory. However, small playing devices such as Bluetooth headsets and Bluetooth speakers usually have only a low-performance microprocessor and a small storage space, so when decompressing audio files these devices consume considerable processing time and cannot play in real time.
Disclosure of Invention
An embodiment of the present disclosure provides an audio processing method. The audio processing method comprises the following steps: dividing, by a processor, an audio file into a plurality of audio segments; and compressing, by a processor, the plurality of audio segments to generate a plurality of compressed audio segments, comprising: down-sampling a first audio segment of the plurality of audio segments to generate a first compressed audio segment of the plurality of compressed audio segments, wherein a first target bandwidth of the first audio segment is less than a bandwidth threshold; and sampling a second audio segment of the plurality of audio segments to generate a second compressed audio segment of the plurality of compressed audio segments, and adding a delay time to the second compressed audio segment, wherein a second target bandwidth of the second audio segment is not less than the bandwidth threshold.
In some embodiments, compressing the audio segments by the processor to generate the compressed audio segments further comprises: respectively calculating a first compression rate for compressing one of the audio segments by a first algorithm and a second compression rate for compressing the one of the audio segments by a second algorithm; and compressing the one of the audio segments with the first algorithm in response to the first compression rate being higher than the second compression rate.
In some embodiments, the one of the audio segments comprises a header, and the header comprises a tag indicating the first algorithm.
In some embodiments, compressing the audio segments by the processor to generate the compressed audio segments further comprises: each of the compressed audio segments is partitioned into a plurality of audio regions.
In some embodiments, the method further comprises: transmitting the compressed audio segments to an audio playing device, so that the compressed audio segments are decompressed by the audio playing device according to the audio regions.
In some embodiments, the delay time is equal to a delay time of a low pass filter of the processor.
In some embodiments, the method further comprises setting the first target bandwidth according to a first instruction; and setting the second target bandwidth according to a second instruction.
Another embodiment of the present disclosure provides an apparatus comprising a memory and a processor. The memory is used for storing audio files. The processor is configured to divide the audio file into a plurality of audio segments, down-sample a first audio segment of the plurality of audio segments to generate a first compressed audio segment, sample a second audio segment of the plurality of audio segments to generate a second compressed audio segment, and add a delay time to the second compressed audio segment. The first target bandwidth of the first audio segment is smaller than the bandwidth threshold, and the second target bandwidth of the second audio segment is not smaller than the bandwidth threshold.
In some embodiments, the processor is further configured to partition each of the compressed audio segments into a plurality of audio regions.
Another embodiment of the present disclosure provides a non-transitory computer readable medium storing instructions that, when executed by a processor, perform the following steps: dividing an audio file into a plurality of audio sections; down-sampling a first audio segment of the plurality of audio segments to generate a first compressed audio segment, wherein a first target bandwidth of the first audio segment is less than a bandwidth threshold; and sampling a second audio segment of the plurality of audio segments to generate a second compressed audio segment, and adding a delay time to the second compressed audio segment, wherein a second target bandwidth of the second audio segment is not less than the bandwidth threshold.
Accordingly, embodiments of the present disclosure provide an audio processing method, apparatus, and non-transitory computer readable medium for compressing an audio file. By dynamically down-sampling and up-sampling, the audio data stream is compressed more effectively when the bandwidth varies, and popping sounds caused by audio discontinuity are prevented. In addition, embodiments of the disclosure evaluate two or more different compression algorithms during compression to achieve better compression efficiency. Furthermore, embodiments of the disclosure divide an audio segment into multiple audio regions (chunks) during compression, so that the receiving end needs only a small memory space to decompress the audio data.
Drawings
To make the aforementioned and other objects, features, and advantages of the present disclosure more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings:
FIG. 1 is a schematic diagram of an apparatus shown in accordance with some embodiments of the present disclosure;
FIG. 2 is a waveform diagram of an audio segment shown in accordance with some embodiments of the present disclosure;
FIG. 3 is a waveform diagram of an audio segment shown in accordance with some embodiments of the present disclosure;
FIG. 4 is a schematic diagram of an audio segment shown in accordance with some embodiments of the present disclosure; and
fig. 5 is a flow chart illustrating an audio processing method according to some embodiments of the present disclosure.
Description of reference numerals:
100: device for measuring the position of a moving object
110: memory device
130: processor with a memory having a plurality of memory cells
200: wave form diagram
300: wave form diagram
400: audio section
900: audio playing device
500: audio processing method
S510, S530 and S550: step (ii) of
Detailed Description
The following disclosure provides many different embodiments or illustrations for implementing different features of the invention. Elements and configurations in the specific illustrations are used in the following discussion to simplify the present disclosure. Any examples discussed are intended for illustrative purposes only and do not limit the scope or meaning of the invention or its illustrations in any way. Moreover, the present disclosure may repeat reference numerals and/or letters in the various examples, which are for purposes of simplicity and clarity, and do not in themselves dictate a relationship between the various embodiments and/or configurations discussed below.
Unless otherwise indicated, the terms used throughout the specification and claims have their ordinary meanings as commonly understood in the art, in the context of this disclosure, and in the specific context in which each term is used. Certain terms used to describe the disclosure are discussed below, or elsewhere in this specification, to provide additional guidance to those skilled in the art.
As used herein, "coupled" or "connected" means that two or more elements are in direct or indirect physical or electrical contact with each other, or that two or more elements operate or act on each other.
It will be understood that the terms first, second, third, etc. may be used herein to describe various elements, components, regions, layers and/or sections. These elements, components, regions, layers and/or sections should not be limited by these terms; the terms are only used to distinguish one element, component, region, layer or section from another. Thus, a first element, component, region, layer or section discussed below could be termed a second element, component, region, layer or section without departing from the teachings of the present disclosure. As used herein, the word "and/or" includes any combination of one or more of the associated listed items; references to "and/or" in this disclosure refer to any combination of any, all, or at least one of the listed elements.
Please refer to fig. 1. Fig. 1 is a schematic diagram of an apparatus 100 shown in accordance with some embodiments of the present disclosure. The device 100 is configured to be communicatively connected to the audio playback device 900. In some embodiments, the device 100 processes the audio file and transmits the processed audio data to the audio playback device 900 through wireless communication transmission. The audio playing apparatus 900 decompresses the processed audio data to play the audio quickly and in real time.
In this regard, the apparatus 100 includes a memory 110 and a processor 130. The processor 130 is coupled to the memory 110. In operation, the processor 130 divides the audio file into a plurality of audio segments and processes each audio segment individually. The audio file may be partitioned according to any rule, such as time length, number of sample points, and/or file size. The processor 130 processes each audio segment according to the time sequence of the audio content, and each audio segment may have the same or a different time length, number of sampling points, and/or file size; the disclosure is not limited in this regard.
The processor 130 compresses the plurality of audio segments. Since the bandwidth of the audio data transmission is variable, different audio segments of the same audio file may have different target bandwidths. For example, the user can adjust the bandwidth of the audio data transmission during playback, and the target bandwidth of each audio segment changes according to the transmission bandwidth set by the user.
The first audio segment of the plurality of audio segments in the audio file is compressed first. After the first audio segment has been processed, the second audio segment is processed, and then the next, until the entire audio file has been processed.
In some embodiments, if the user sets the bandwidth of the audio data transmission to 400 Kbps before the processor 130 compresses the first audio segment, the processor 130 receives an instruction indicating that the transmission bandwidth is 400 Kbps and sets the target bandwidth of the first audio segment to 400 Kbps according to the instruction. If the user sets the transmission bandwidth to 1 Mbps before the processor 130 compresses the second audio segment, the processor 130 receives an instruction indicating that the transmission bandwidth is 1 Mbps and sets the target bandwidth of the second audio segment to 1 Mbps according to that instruction.
The processor 130 compresses each audio segment according to its target bandwidth. If the target bandwidth of the audio segment is less than the bandwidth threshold, the audio segment is down-sampled to generate a compressed audio segment. If the target bandwidth of the audio segment is not less than the bandwidth threshold, the audio segment is sampled to generate a compressed audio segment, and a delay time is added to the compressed audio segment.
Please refer to fig. 2 and fig. 3. Fig. 2 is a waveform diagram 200 of an audio segment shown in accordance with some embodiments of the present disclosure. Fig. 3 is a waveform diagram 300 of an audio segment shown in accordance with some embodiments of the present disclosure. As shown in fig. 2, the processor 130 samples the audio segment to obtain a plurality of sampling points. Under normal sampling, the processor 130 samples at 96 kHz. When the target bandwidth of the audio segment is less than the bandwidth threshold, the processor 130 down-samples the audio segment; that is, the processor 130 samples at a lower frequency, such as 48 kHz or 32 kHz, to generate the compressed audio segment. On the other hand, when the target bandwidth of the audio segment is not less than the bandwidth threshold, the processor 130 samples the audio segment at the normal sampling frequency to generate a compressed audio segment and adds a delay time to it. For example, as shown in fig. 3, a delay time td is added to the compressed audio segment.
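The branch logic just described can be sketched as follows. This is a minimal illustration only: the bandwidth threshold, the decimation factor, and the delay length are assumed values that the patent does not specify, and a real encoder would low-pass filter before decimating.

```python
# Assumed values for illustration; the patent does not fix these.
BANDWIDTH_THRESHOLD = 500_000   # bits per second
DELAY_SAMPLES = 16              # chosen to match the low-pass filter delay

def down_sample(samples, factor=2):
    # Naive decimation: keep every `factor`-th sample. A real encoder
    # would apply an anti-aliasing low-pass filter first.
    return samples[::factor]

def compress_segment(samples, target_bandwidth):
    if target_bandwidth < BANDWIDTH_THRESHOLD:
        # Low target bandwidth: down-sample for a better compression rate.
        return down_sample(samples)
    # Otherwise keep the normal rate and prepend a delay, so playback
    # stays continuous when the target bandwidth changes dynamically.
    return [0] * DELAY_SAMPLES + list(samples)

segment = list(range(8))
assert compress_segment(segment, 400_000) == [0, 2, 4, 6]        # down-sampled
assert compress_segment(segment, 1_000_000)[:16] == [0] * 16     # delayed
```

The two assertions mirror the 400 Kbps and 1 Mbps example above: one path shrinks the segment, the other keeps every sample but shifts it in time.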
As can be seen from the above, under a low target bandwidth the audio segment is down-sampled to achieve a better compression rate. In addition, the sound is delayed when down-sampling is performed, but not when down-sampling is skipped. Therefore, when down-sampling is not performed, i.e., when the target bandwidth is not less than the bandwidth threshold, the delay time is added to the compressed audio segment, so that when the target bandwidth changes dynamically the played audio does not produce popping sounds due to audio discontinuity.
In some embodiments, when the processor 130 down-samples the audio segment, the audio segment passes through a low pass filter (not shown) of the processor 130. In some embodiments, the low pass filter may be a sinc filter. After the audio segment is processed by the low pass filter, the compressed audio segment generated by the processor 130 carries a delay time introduced by the filter. In some embodiments, the delay time may range from 16 samples to 256 samples; for example, if the sampling frequency is 96 kHz, the delay time is between 16/96000 seconds and 256/96000 seconds. For audio segments that are not down-sampled, the processor 130 adds a delay time equal to that of the low pass filter to the compressed audio segment so that audio playback remains continuous. These delay times are only exemplary, and the disclosure is not limited thereto.
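The delay arithmetic above is a simple division of the filter delay in samples by the sampling rate; a quick check using the figures quoted in the text (16 to 256 samples at 96 kHz):

```python
def filter_delay_seconds(delay_samples, sample_rate_hz):
    # Delay of an N-sample filter at a given sampling rate, in seconds.
    return delay_samples / sample_rate_hz

# The range quoted above: 16/96000 s to 256/96000 s.
assert abs(filter_delay_seconds(16, 96_000) - 16 / 96000) < 1e-12
assert abs(filter_delay_seconds(256, 96_000) - 256 / 96000) < 1e-12
assert filter_delay_seconds(16, 96_000) < filter_delay_seconds(256, 96_000)
```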
In some embodiments, the processor 130 is further configured to partition the compressed audio segment into a plurality of audio regions. Please refer to fig. 4. Fig. 4 is a schematic diagram of an audio segment 400 shown in accordance with some embodiments of the present disclosure. As shown in fig. 4, each audio segment contains a header, and the processor 130 partitions the audio data of the compressed audio segment 400 into a plurality of audio regions C1-C8. When the apparatus 100 transmits the compressed audio segment 400 to the audio playing apparatus 900, the audio playing apparatus 900 decompresses it region by region. That is, the audio playing apparatus 900 decompresses the data of audio region C1, then the data of audio region C2, and so on. In this way, the amount of computation performed by the audio playing apparatus 900 during decompression is reduced, and the audio playing apparatus 900 can decompress within a smaller memory space.
For example, assume that a compressed audio segment 400 contains 1024 samples and that the audio playing apparatus 900 would need 6 Kbytes of memory to decompress it as a whole. If the audio playing apparatus 900 instead decompresses region by region, and the compressed audio segment 400 is divided into 8 audio regions of only 128 samples each, the audio playing apparatus 900 needs only 750 bytes of memory for decompression.
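The region-by-region arithmetic above can be checked with a short sketch; the function name and the equal-size split are illustrative assumptions, since the patent does not fix how regions are sized.

```python
def split_into_regions(samples, n_regions):
    # Partition a compressed segment's data into equal audio regions,
    # so the receiver can decompress one region at a time.
    size = len(samples) // n_regions
    return [samples[i * size:(i + 1) * size] for i in range(n_regions)]

# 1024 samples split into 8 regions of 128 samples each.
assert len(split_into_regions(list(range(1024)), 8)) == 8
assert all(len(r) == 128 for r in split_into_regions(list(range(1024)), 8))
# Working memory scales with one region rather than the whole segment:
# 6000 bytes for the full segment becomes 6000 / 8 = 750 bytes per region.
assert 6000 // 8 == 750
```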
As can be seen from the above, by dividing the compressed audio segment into a plurality of audio regions, the audio playing apparatus 900 can perform decompression with a smaller memory space and a reduced amount of computation.
In some embodiments, before the processor 130 compresses an audio segment, the processor 130 respectively calculates a first compression rate for compressing the audio segment with a first algorithm and a second compression rate for compressing it with a second algorithm, and compresses the audio segment with the first algorithm in response to the first compression rate being higher than the second compression rate. For example, before the processor 130 compresses the first audio segment, it calculates a first compression rate for compressing the first audio segment with a Rice coding algorithm, and then calculates a second compression rate for compressing it with an LZ algorithm. If the first compression rate is higher than the second, the processor 130 compresses the first audio segment with the Rice coding algorithm to generate the first compressed audio segment; otherwise, the processor 130 compresses it with the LZ algorithm. In addition, different audio segments of the same audio file may be compressed with different algorithms. The compression algorithms listed above are exemplary only, and the disclosure is not limited thereto.
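The try-both-and-keep-the-best selection can be illustrated as follows. The patent names Rice coding and an LZ algorithm; here two Python standard-library codecs stand in purely to show the selection logic, not the actual coders.

```python
import bz2
import zlib

# Stand-in codecs for illustration; the patent's Rice and LZ coders
# would take their place in a real encoder.
CODECS = {"zlib": zlib.compress, "bz2": bz2.compress}

def compress_with_best(segment_bytes):
    # Compress the segment with every candidate algorithm and keep the
    # one whose output is smallest, i.e. the highest compression rate.
    candidates = {name: fn(segment_bytes) for name, fn in CODECS.items()}
    best = min(candidates, key=lambda name: len(candidates[name]))
    return best, candidates[best]

name, payload = compress_with_best(bytes(1000))  # a highly compressible segment
assert name in CODECS
assert len(payload) < 1000
```

Because the choice is made per segment, different segments of one file can end up using different algorithms, exactly as the paragraph above describes.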
In some embodiments, the header of the audio segment includes a tag indicating the algorithm used to compress that segment. For example, if the first audio segment is compressed with the Rice coding algorithm, the header of the first audio segment includes a tag indicating Rice coding; conversely, if the second audio segment is compressed with an LZ algorithm, the header of the second audio segment includes a tag indicating the LZ algorithm.
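One way such a header tag might look is sketched below; the one-byte layout and the tag values are hypothetical, as the patent only states that the header carries a tag identifying the compression algorithm.

```python
import struct

# Hypothetical one-byte tag layout.
ALGORITHM_TAGS = {"rice": 0, "lz": 1}
TAG_NAMES = {v: k for k, v in ALGORITHM_TAGS.items()}

def add_header(algorithm, payload):
    # Prepend a one-byte algorithm tag to the compressed payload.
    return struct.pack("B", ALGORITHM_TAGS[algorithm]) + payload

def parse_header(blob):
    # The receiver reads the tag to pick the matching decompressor.
    (tag,) = struct.unpack_from("B", blob)
    return TAG_NAMES[tag], blob[1:]

blob = add_header("lz", b"\x01\x02\x03")
assert parse_header(blob) == ("lz", b"\x01\x02\x03")
```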
As can be seen from the above, in the embodiments of the present disclosure a better algorithm can be selected for each audio segment of the same audio file, thereby achieving better compression efficiency.
Please refer to fig. 5. Fig. 5 is a flow diagram illustrating an audio processing method 500 according to some embodiments of the present disclosure. As shown in fig. 5, the audio processing method 500 includes steps S510 to S550.
In step S510, the audio file is divided into a plurality of audio segments. In some embodiments, step S510 may be performed by the processor 130 in fig. 1. For example, the processor 130 divides the audio file into a plurality of audio segments and performs individual processing for each audio segment.
For example, the first audio segment of the plurality of audio segments in the audio file first goes through steps S530 to S550. After the first audio segment is compressed, the second audio segment is processed in steps S530 to S550, and then the next audio segment, until the entire audio file is processed. "First" and "second" are used for illustrative purposes only.
In step S530, the plurality of audio segments are compressed to generate a plurality of compressed audio segments. In some embodiments, step S530 may be executed by the processor 130 in fig. 1. In detail, in step S530, if the target bandwidth of the audio segment is smaller than the bandwidth threshold, the audio segment is down-sampled to generate the compressed audio segment; if the target bandwidth is not less than the bandwidth threshold, the audio segment is sampled to generate a compressed audio segment, and a delay time is added to it.
In some embodiments, step S530 further includes respectively calculating a first compression rate for compressing the audio segment with a first algorithm and a second compression rate for compressing it with a second algorithm, the processor 130 compressing the audio segment with the first algorithm in response to the first compression rate being higher than the second. In addition, the header of the compressed audio segment includes a tag indicating the compression algorithm used, so that the audio playing device 900 can identify, during decompression, the algorithm the processor 130 used to compress the segment.
In step S550, the compressed audio segments are transmitted to an audio playing device. In some embodiments, step S550 can be executed by the processor 130 in fig. 1 to transmit the plurality of compressed audio segments to the audio playing device 900 in fig. 1. After receiving a compressed audio segment, the audio playing device 900 decompresses it to play the audio file in real time.
In some embodiments, step S530 further includes dividing each of the plurality of compressed audio segments into a plurality of audio regions, so that in step S550 the audio playing device 900 can perform decompression in units of audio regions.
In some embodiments, the audio processing method 500 can be implemented as a non-transitory computer readable medium storing a plurality of program code instructions; when these instructions are executed by a processor, steps S510 to S550 of the audio processing method 500, or a combination of those steps, are performed. The non-transitory computer readable medium may reside in a computer, a cell phone, or a standalone audio encoder, and the processor may be a general-purpose processor, a system chip, or the like.
In some embodiments of the present disclosure, the processor 130 may be a server, a circuit, a central processing unit (CPU), a microcontroller (MCU), or another device with equivalent functionality capable of storing, computing, and reading data, and of receiving and transmitting signals or information.
In some embodiments of the present disclosure, the memory 110 may be a circuit with a data storage function or another functionally equivalent device or circuit. In some embodiments, the device 100 may be a device with higher computing capability, such as a computer, and the audio playing device 900 may be a device with lower computing capability, such as a Bluetooth device. Computing capability here refers to parameters such as the processor's clock rate, performance, floating-point capability, and bit width, and the capacity of the memory. Devices with higher computing capability may include sound systems, smart phones, tablet computers, and portable music players; devices with lower computing capability may include Bluetooth earphones and Bluetooth speakers.
In view of the foregoing, embodiments of the present disclosure provide an audio processing method, apparatus, and non-transitory computer readable medium for compressing an audio file. By dynamically down-sampling and up-sampling, the audio data stream is compressed more effectively when the bandwidth varies, and popping sounds caused by audio discontinuity are prevented. In addition, two or more different compression algorithms can be evaluated when compressing an audio segment, to achieve better compression efficiency. Furthermore, an audio segment is divided into a plurality of audio regions during compression, so that during decompression the receiving end (e.g., an audio playing apparatus) can decompress the audio data with only a small memory space and low computing capability.
Additionally, the above illustration includes exemplary steps in a sequential order, but the steps need not be performed in the order shown. It is within the contemplation of the disclosure that the steps may be performed in a different order. Steps may be added, substituted, changed in order, and/or omitted as appropriate within the spirit and scope of embodiments of the disclosure.
Although the present disclosure has been described with reference to the above embodiments, it should be understood that various changes and modifications can be made therein by those skilled in the art without departing from the spirit and scope of the disclosure, and therefore, the scope of the disclosure should be determined by that of the appended claims.

Claims (9)

1. An audio processing method, comprising:
dividing an audio file into a plurality of audio segments by a processor; and
setting a first target bandwidth of a first audio segment of the audio segments by the processor according to a first instruction, wherein the first instruction corresponds to a first transmission bandwidth;
setting a second target bandwidth of a second audio segment of the audio segments by the processor according to a second instruction, wherein the second instruction corresponds to a second transmission bandwidth; and
compressing, by the processor, the audio segments to generate a plurality of compressed audio segments, comprising:
down-sampling the first audio segment of the audio segments to generate a first compressed audio segment of the compressed audio segments, wherein the first target bandwidth of the first audio segment is less than a bandwidth threshold; and
sampling the second audio segment of the audio segments to generate a second compressed audio segment of the compressed audio segments, and adding a delay time to the second compressed audio segment, wherein the second target bandwidth of the second audio segment is not less than the bandwidth threshold.
2. The audio processing method of claim 1, wherein compressing the audio segments by the processor to generate the compressed audio segments further comprises:
respectively calculating a first compression ratio of compressing one of the audio segments with a first algorithm and a second compression ratio of compressing the one of the audio segments with a second algorithm; and
compressing the one of the audio segments with the first algorithm in response to the first compression ratio being higher than the second compression ratio.
3. The audio processing method of claim 2, wherein the one of the audio segments comprises a header, and the header comprises a tag indicating the first algorithm.
4. The audio processing method of claim 1, wherein compressing the audio segments by the processor to generate the compressed audio segments further comprises:
partitioning each of the compressed audio segments into a plurality of audio regions.
5. The audio processing method of claim 4, further comprising:
transmitting the compressed audio segments to an audio playing device, such that the audio playing device decompresses the compressed audio segments according to the audio regions.
6. The audio processing method of claim 1, wherein the delay time is equal to a delay time of a low-pass filter of the processor.
7. An apparatus, comprising:
a memory for storing an audio file; and
a processor for partitioning the audio file into a plurality of audio segments, wherein the processor sets a first target bandwidth of a first audio segment of the audio segments according to a first instruction and sets a second target bandwidth of a second audio segment of the audio segments according to a second instruction, wherein the processor down-samples the first audio segment of the audio segments to generate a first compressed audio segment, wherein the processor samples the second audio segment of the audio segments to generate a second compressed audio segment, and adds a delay time to the second compressed audio segment,
wherein the first instruction corresponds to a first transmission bandwidth and the second instruction corresponds to a second transmission bandwidth,
wherein the first target bandwidth of the first audio segment is less than a bandwidth threshold, and the second target bandwidth of the second audio segment is not less than the bandwidth threshold.
8. The apparatus of claim 7, wherein the processor is further configured to partition each of the compressed audio segments into a plurality of audio regions.
9. A non-transitory computer readable medium having stored thereon instructions that, when executed by a processor, cause the processor to perform:
dividing an audio file into a plurality of audio segments;
setting a first target bandwidth of a first audio segment of the audio segments according to a first instruction, wherein the first instruction corresponds to a first transmission bandwidth;
setting a second target bandwidth of a second audio segment of the audio segments according to a second instruction, wherein the second instruction corresponds to a second transmission bandwidth;
down-sampling the first audio segment of the audio segments to generate a first compressed audio segment, wherein the first target bandwidth of the first audio segment is less than a bandwidth threshold; and
sampling the second audio segment of the audio segments to generate a second compressed audio segment, and adding a delay time to the second compressed audio segment, wherein the second target bandwidth of the second audio segment is not less than the bandwidth threshold.
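The claimed flow can be illustrated with a short sketch: compress each segment by down-sampling when its target bandwidth falls below a threshold, otherwise keep the original sampling and prepend a delay (claims 1, 7, 9), then pack the result with whichever of two algorithms yields the higher compression ratio, recording the choice in a header tag (claims 2-3). This is not the patented implementation; the threshold value, the zlib/lzma algorithm pair, and the single-byte tags are hypothetical choices for illustration only.

```python
import lzma
import zlib

BANDWIDTH_THRESHOLD_HZ = 24_000  # hypothetical bandwidth threshold


def compress_segment(samples, target_bandwidth_hz, filter_delay_samples):
    """Down-sample a low-bandwidth segment; otherwise keep the original
    rate and prepend a delay matching the low-pass filter's latency."""
    if target_bandwidth_hz < BANDWIDTH_THRESHOLD_HZ:
        return samples[::2]  # down-sampling branch: drop every other sample
    # delay branch: pad with silence equal to the filter delay
    return [0.0] * filter_delay_samples + list(samples)


def pack_with_best_algorithm(payload):
    """Compress with two candidate algorithms, keep the one with the
    higher compression ratio, and record it in a 1-byte header tag."""
    candidates = {b"Z": zlib.compress(payload), b"X": lzma.compress(payload)}
    tag = max(candidates, key=lambda t: len(payload) / len(candidates[t]))
    return tag + candidates[tag]
```

A receiver can then dispatch on the header tag alone to pick the matching decompressor, which is the role the tag plays in claim 3.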
CN201810494561.XA 2018-01-10 2018-05-22 Audio processing method, apparatus and non-transitory computer readable medium Active CN110033781B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US15/867,674 2018-01-10
US15/867,674 US10650834B2 (en) 2018-01-10 2018-01-10 Audio processing method and non-transitory computer readable medium
TW107116322A TWI690920B (en) 2018-01-10 2018-05-14 Audio processing method, audio processing device, and non-transitory computer-readable medium for audio processing
TW107116322 2018-05-14

Publications (2)

Publication Number Publication Date
CN110033781A CN110033781A (en) 2019-07-19
CN110033781B true CN110033781B (en) 2021-06-01

Family

ID=67234893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810494561.XA Active CN110033781B (en) 2018-01-10 2018-05-22 Audio processing method, apparatus and non-transitory computer readable medium

Country Status (1)

Country Link
CN (1) CN110033781B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101184109A (en) * 2007-12-14 2008-05-21 西安三茗科技有限责任公司 Computer hard disk data network transmission accelerating layer
CN101227482A (en) * 2008-02-02 2008-07-23 中兴通讯股份有限公司 System, apparatus and method of media negotiation in network telephone call
CN101517948A (en) * 2006-09-12 2009-08-26 雅马哈株式会社 Communication device, communication method, and recording medium
CN101816191A (en) * 2007-09-26 2010-08-25 弗劳恩霍夫应用研究促进协会 Apparatus and method for obtaining weighting coefficients for extracting an ambient signal, apparatus and method for extracting an ambient signal, and computer program
US20120195381A1 (en) * 2011-02-02 2012-08-02 Fujitsu Limited Image processing apparatus and method for processing image
CN102754151A (en) * 2010-02-11 2012-10-24 杜比实验室特许公司 System and method for non-destructively normalizing loudness of audio signals within portable devices
CN102821303A (en) * 2012-08-31 2012-12-12 哈尔滨工程大学 Network real-time graded compression and transmission method for pixel-level video information
CN102915736A (en) * 2012-10-16 2013-02-06 广东威创视讯科技股份有限公司 Sound mixing processing method and system
CN103077723A (en) * 2013-01-04 2013-05-01 鸿富锦精密工业(深圳)有限公司 Audio transmission system
CN103248877A (en) * 2013-05-14 2013-08-14 重庆讯美电子有限公司 Decoding method and system capable of dynamically adjusting code rate
CN103501429A (en) * 2013-10-22 2014-01-08 中国农业银行股份有限公司 Data coding method and device based on dynamic code rate
CN103974328A (en) * 2014-04-29 2014-08-06 华为技术有限公司 Method for cooperative communication, cloud terminal server and core network server
CN104683762A (en) * 2015-01-29 2015-06-03 中国人民解放军理工大学 Unmanned aerial vehicle video wireless self-adapted transmission method and system based on buffering occupation ratio
CN105723678A (en) * 2013-10-29 2016-06-29 瑞典爱立信有限公司 Dynamic compression coverage

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6041227A (en) * 1997-08-27 2000-03-21 Motorola, Inc. Method and apparatus for reducing transmission time required to communicate a silent portion of a voice message
US6006189A (en) * 1997-10-10 1999-12-21 Nortel Networks Corporation Method and apparatus for storing and forwarding voice signals
CN101171626B (en) * 2005-03-11 2012-03-21 高通股份有限公司 Time warping frames inside the vocoder by modifying the residual
GB2432765B (en) * 2005-11-26 2008-04-30 Wolfson Microelectronics Plc Audio device
US10199043B2 (en) * 2012-09-07 2019-02-05 Dts, Inc. Scalable code excited linear prediction bitstream repacked from a higher to a lower bitrate by discarding insignificant frame data
US9135920B2 (en) * 2012-11-26 2015-09-15 Harman International Industries, Incorporated System for perceived enhancement and restoration of compressed audio signals
TW201705010A (en) * 2015-07-31 2017-02-01 盛微先進科技股份有限公司 Apparatus and method of USB audio transmission adjustment

Also Published As

Publication number Publication date
CN110033781A (en) 2019-07-19

Similar Documents

Publication Publication Date Title
CN108738006B (en) Data transmission method and device based on Bluetooth
KR101016251B1 (en) Coding of stereo signals
KR101019398B1 (en) Processing of excitation in audio coding and decoding
US20220366924A1 (en) Audio decoding device, audio encoding device, audio decoding method, audio encoding method, audio decoding program, and audio encoding program
RU2607230C2 (en) Adaptation of weighing analysis or synthesis windows for encoding or decoding by conversion
CN110249384B (en) Quantizer with index coding and bit arrangement
US10229688B2 (en) Data compression apparatus, computer-readable storage medium having stored therein data compression program, data compression system, data compression method, data decompression apparatus, data compression/decompression apparatus, and data structure of compressed data
JP2023510831A (en) Audio encoding and decoding method and audio encoding and decoding device
KR20190066438A (en) Apparatus for data compression and decompression based on error vector magnitude calculation and method for the same
JP7389651B2 (en) Variable alphabet size in digital audio signals
KR20200123395A (en) Method and apparatus for processing audio data
TWI690920B (en) Audio processing method, audio processing device, and non-transitory computer-readable medium for audio processing
KR20200012861A (en) Difference Data in Digital Audio Signals
CN110033781B (en) Audio processing method, apparatus and non-transitory computer readable medium
JP2006337508A (en) Method and circuit for data compression, and circuit for data expansion
CN113096670B (en) Audio data processing method, device, equipment and storage medium
CN115223577A (en) Audio processing method, chip, device, equipment and computer readable storage medium
KR101748039B1 (en) Sampling rate conversion method and system for efficient voice call
CN109473116B (en) Voice coding method, voice decoding method and device
JP2021174472A (en) Storage system
CN116566962A (en) Audio data transmission method and device, electronic equipment and storage medium
JP2006350090A (en) Client/server speech recognizing method, speech recognizing method of server computer, speech feature quantity extracting/transmitting method, and system and device using these methods, and program and recording medium
CN107680607B (en) Signal compression method, signal decompression method and device thereof
US20210390965A1 (en) End Node Spectrogram Compression For Machine Learning Speech Recognition
JP6125808B2 (en) Data compression apparatus, data compression program, data compression system, and data compression method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant