WO2000022736A1 - System and method for compressing an audio signal

Info

Publication number: WO2000022736A1
Application number: PCT/US1999/023586
Authority: WO
Inventors: Angél L. DECEGAMA
Original assignee: Westford Technology Corporation
Priority date: 1998-10-09
Filing date: 1999-10-08
Publication date: 2000-04-20
Also published as: AU1107600A

Abstract

An audio signal is typically formed from several periodic segments, each being formed from successive repetitions of a constituent period. A real-time compression method takes advantage of this phenomenon by identifying the periodic segments and forming a truncated signal in which each periodic segment is represented by one constituent period, together with information indicative of the number of constituent periods in the segment. The truncated signal is then provided to a compressor to generate a compressed version of the audio signal.

Description

SYSTEM AND METHOD FOR COMPRESSING AN AUDIO SIGNAL

FIELD OF THE INVENTION

The invention relates generally to a system and method for compressing a digital representation of an analog signal and more particularly to a system and method for compressing a digital representation of an audio analog signal by identifying periodic data to achieve both efficient storage and transmission of the signal.

BACKGROUND OF THE INVENTION

There are many advantages to transmitting a digital signal that is representative of an analog signal rather than sending the analog signal itself. For example, a digital signal is far less prone to corruption from noise. Error checking techniques, such as hashing, can be used to verify the proper reception of the digital signal. A digital signal can readily be encrypted so that the signal is secure from interception and detection by an unauthorized third party.

While there are clear advantages to the transmission of digital signals, these advantages do not come without a cost. It is well known that in order to provide a high-fidelity digital representation of an analog signal, the analog signal must be sampled at a rate greater than the Nyquist rate, which is twice the highest frequency component contained in the analog signal. Since a human with normal hearing can detect sounds having a frequency as high as 20,000 Hz, an analog audio signal must be sampled at a rate of at least 40,000 Hz. Because of the very fast sampling circuitry required to generate the digital signal, sampling at such a high rate is impractical. It has been found, however, that most humans will tolerate a voice signal that has been band limited to 4,000 Hz. While this limitation reduces the required sampling rate to 8,000 Hz, the resulting digital signal continues to be too large for efficient real-time signal processing. In particular, such a signal cannot readily be compressed in real-time prior to transmission.

It is thus desirable to employ techniques and methods for compressing digital data to further reduce the amount of digital data that must be transmitted in order to allow reconstruction of an audio signal without reducing the quality of the signal. Additionally, compression is important in applications other than telephony or communications, since compression of data allows one to store information with reduced memory requirements. As an example, compression can be utilized to increase the amount of audio that can be stored on a compact disk, thus effectively enlarging the storage capacity of the disk.

OBJECTS OF THE INVENTION

It is an object of the invention to provide a method and device to compress a digital representation of an analog signal that overcomes the disadvantages of the prior art.

It is a further object of the invention to provide an efficient means for identifying duplicative portions of an analog signal so as to more efficiently transmit and store a digital audio signal while maintaining the fidelity of the signal.

It is a still further object of the invention to provide for efficient and economical computer data structures to implement the above noted objects of the invention on a general purpose computer system.

SUMMARY OF THE INVENTION

The current invention is directed to a system and software-implemented method for real-time compression of an audio signal. By implementing the method in software, a system for carrying out the method of the invention can be a generic computer system. This avoids the cost associated with acquisition of specialized hardware to perform the compression.

It is known that most audio signals of interest are periodic. This is the natural result of the fact that most audio signals are generated by vibrating structures. For example, human speech is generated by vibrating vocal cords. Musical sounds are formed by strings, columns of air, or membranes, all of which are set into vibration by some excitation source under the control of the musician. As a result, a considerable amount of audio information is redundant and need not be saved.

For example, when a musician holds the same note for one or more measures, it is wasteful to keep all samples from the resulting analog audio signal since that signal contains an extended periodic segment having a fundamental frequency determined by the pitch of the note being held. However, when a musician plays a complex melody, it is clearly desirable to keep many more samples from the resulting audio signal. This is because the resulting audio signal contains a large number of short periodic segments each having a different fundamental frequency. In human speech, vowels are known to result in predominantly periodic portions of a speech signal whereas consonants and other fricatives generate aperiodic portions of the speech signal. A typical digital audio signal thus includes periodic segments consisting of successive constituent periods. These periodic segments are separated from each other by relatively aperiodic segments.

The method of the invention exploits this inherent periodicity of audio signals by determining the temporal extent of these constituent periods and determining the temporal extent of the periodic segment formed by successive constituent periods. Because the periodic segment can be recovered from knowledge of the constituent period and the number of such constituent periods, the method of the invention compresses only the constituent period rather than the entire periodic segment. This is accomplished by selecting, from the successive constituent periods that make up a periodic segment, a subset of constituent periods having fewer constituent periods than there are in the entire periodic segment. The constituent periods in this subset are provided to a compression routine to generate a portion of the compressed audio signal. Since there is little to gain by compressing more than one constituent period, the subset of constituent periods that are provided to the compression routine generally has only one constituent period.
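The representation described above, one constituent period plus a repetition count, can be illustrated with a minimal sketch. The function names `truncate_periodic_segment` and `reconstruct_periodic_segment` are hypothetical and do not appear in the specification, and the example assumes an idealized segment whose repetitions are exactly identical. Real audio is only approximately periodic, so the reconstruction would in general be approximate.

```python
from typing import List, Tuple


def truncate_periodic_segment(segment: List[int], period_len: int) -> Tuple[List[int], int]:
    """Keep only the first constituent period and the number of periods.

    Assumes (for illustration only) that the segment holds a whole
    number of constituent periods of length period_len.
    """
    if period_len <= 0 or len(segment) % period_len != 0:
        raise ValueError("segment must contain a whole number of periods")
    count = len(segment) // period_len
    return segment[:period_len], count


def reconstruct_periodic_segment(period: List[int], count: int) -> List[int]:
    """Rebuild the periodic segment by repeating the retained period."""
    return period * count


# An idealized periodic segment: one 8-sample period repeated 4 times.
period = [0, 3, 5, 3, 0, -3, -5, -3]
segment = period * 4

kept, count = truncate_periodic_segment(segment, len(period))
restored = reconstruct_periodic_segment(kept, count)
```

In this idealized case only 8 of the 32 samples, together with the count, need to be passed to the compression routine.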

The method of determining the extent of the constituent period for a periodic segment recognizes that the digital signal is made up of segments that are above and below a selected threshold. If a selected signal segment and a successive signal segment have sample values that are on the same side of the selected threshold, and if the selected signal segment and the successive signal segment have approximately the same area or approximately the same maxima, it is a reasonable inference that the temporal extent of the constituent period is related to the interval separating the two segments. In particular, if no other signal segment between the selected signal segment and the successive signal segment has the properties of the successive signal segment, then the temporal separation between the selected signal segment and the successive signal segment is likely to be approximately equal to the temporal extent of the constituent period.

In one aspect of the invention, the audio is sampled to produce a digital representation of the audio. The sampled data is grouped into segments to more easily operate on the data.

In another aspect of the invention a transform is applied to the sampled data to better emphasize the periodicity of the audio. The transform is applied only to the positive valued sample values. Negative sample values are ignored.

In another aspect of the invention, the transformed segments are analyzed to determine the periodicity of the data so as to reduce the amount of sampled data that must be retained.

In yet a further aspect of the invention, a wavelet transform is used to transform the periodic and non-periodic portions of the audio data. For periodic data only the first period is transformed.

These and other features and advantages of the invention will be apparent in the following detailed description and the accompanying drawings, in which:

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a representative computer system for implementing the method of the invention;

FIG. 2 shows representative architecture for software executed by the system of FIG. 1 for implementing the method of the invention; and

FIG. 3 illustrates the transformation performed by the periodicity enhancer of FIG. 2.

DESCRIPTION OF AN ILLUSTRATIVE EMBODIMENT

According to one practice of the invention, a computer implemented method compresses periodic segments of a digital audio signal to provide efficient storage and transmission of the signal. The scope of the invention envisions a method for sampling an analog signal to create a digital representation of that signal. The method then identifies periodic segments of that signal. Since a periodic segment can be reconstructed from only the first constituent period in the periodic segment and an indication of the number of constituent periods, compression can be achieved by discarding all but one constituent period. While the current invention is not limited to specific types of audio signals, the inherent periodicity of speech renders it peculiarly applicable to voice signals. One of ordinary skill in the art will recognize that the methods of the instant invention are easily extended to other types of audio signals, including but not limited to music or sounds of nature.

FIG. 1 depicts a computer system 10 for practicing the present invention. The illustrated computer system 10 includes a processor 12 for executing programmed instructions to implement the method of the instant invention. The processor 12 can be a standard microprocessor, such as a PENTIUM® processor from INTEL or a 68000 series processor from MOTOROLA. Preferably, the processor 12 should be capable of operating at a speed that permits real-time capture and compression of the audio signal. The computer system 10 also includes a memory element 14 for storing both the programmed instructions to be executed by the processor 12 and the data upon which the instructions are to operate. The computer system 10 also includes an application specific integrated circuit (ASIC) 16 for doing certain special comparisons that are specific to the present application. The ASIC 16 has an on-board processor and an on-board memory to store data. The processor 12 is coupled to the memory element 14 and the ASIC 16 through a standard computer bus 18. In one practice of the invention, the ASIC 16 is mounted on a printed circuit board (PCB) 23 which is in communication with the processor 12 and the memory element 14 through the bus 18. A communications transceiver 20 allows for the reception and/or transmission of an analog speech signal to the computer system 10. An analog filter 22, coupled to the communications transceiver 20, can also be included within computer system 10 to bandpass filter the analog signal prior to digitizing the analog signal in an A/D converter 24. It is well known to those of ordinary skill in the art that, to prevent aliasing, an analog signal must be sampled by an A/D converter at a sampling rate greater than twice the highest frequency component of the analog signal. Consequently, it may be advantageous to filter high frequency components of the audio signal to reduce the required sampling frequency at the A/D converter 24.

In one embodiment of the current invention, the analog speech signal is input to computer system 10 through the communications transceiver 20. The communications transceiver 20 may be connected to any analog signal source, such as a telephone handset, a microphone, or a cassette player. In one alternative embodiment of the invention, the analog speech signal is filtered to eliminate its high frequency components before being passed to the A/D converter 24. The A/D converter 24 periodically samples the analog signal to obtain a digital audio signal representative of the analog signal. In one practice of the invention, the analog signal is sampled at a rate of 11,025 Hz to obtain 11,025 digital samples per second. The resulting digital samples are then grouped into units of 256 sample values, each of which is referred to as an "audio segment." Audio segments of different sizes (e.g. 128 or 512 samples) can also be used depending on the delay that can be tolerated in the particular application.

FIG. 2 shows a representative system 26 embodying the principles of the invention. The representative system includes an optional periodicity enhancer 28 that operates on a digital signal x_i to render any underlying period in that signal more apparent. The periodicity enhancer 28 generates a transformed signal y_i that, in the preferred embodiment, is an offset replica of the digital signal x_i in which those samples from the digital signal x_i that, following transformation, are below a selected threshold are disregarded. In the preferred embodiment, the transformation performed by the periodicity enhancer 28 is given by

$$y_i = x_i - (2^{N-1} - 1)$$

where N is the number of bits used to represent each sample. Thus, in a typical implementation, with 8 bits being used to represent each sample, the preferred transformation would be

$$y_i = x_i - 127$$
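The offset transformation can be sketched in a few lines. The function name `enhance_periodicity` is hypothetical, and the example assumes that each sample is stored as an unsigned N-bit integer in the range 0 to 2^N - 1. The specification does not state the sample encoding explicitly, so this is an assumption made for illustration.

```python
from typing import List


def enhance_periodicity(samples: List[int], bits_per_sample: int = 8) -> List[int]:
    """Subtract the offset 2**(N-1) - 1 from every sample.

    Assumption: samples are unsigned N-bit integers, so roughly half
    of the resulting values are negative for a centered signal.
    """
    offset = 2 ** (bits_per_sample - 1) - 1
    return [x - offset for x in samples]


# 8-bit samples: the offset is 2**7 - 1 = 127.
x = [0, 100, 127, 128, 200, 255]
y = enhance_periodicity(x)
# y is [-127, -27, 0, 1, 73, 128]
```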

The offset (2^{N-1} - 1) applied to the original signal x_i is selected to ensure that in the resulting offset signal y_i approximately half the sample values will be negative and approximately half will be positive. This is advantageous because in subsequent processing for identifying periodic segments in the transformed signal y_i (which, of course, correspond to periodic segments in the original signal x_i), those sample values that are negative can be disregarded.

To see why one can disregard the negative sample values in y_i, consider a simple periodic function such as a sine wave. In order to determine the period of a sine wave, one selects a point on the sine wave and travels forward or backward along the wave to find another point which has the same value as the selected point. Once the period of the sine wave is established in this manner, it is possible to reconstruct the entire sine wave. Clearly, if the selected point is positive, then it is known that the second point that one seeks is also positive. As a result, there is no need to save any of the negative values of the sine wave in order to correctly reconstruct it. While the sine wave is a somewhat simple example, it is clear that this principle applies to the more complex, but nevertheless periodic, wave forms in an audio signal.

By offsetting the original signal, the periodicity enhancer 28 makes it easier to search for a point that is one period away from a selected point in the offset signal y_i. It does so by causing approximately half the samples in the original signal to be rendered negative, thereby enabling them to be disregarded. The choice of the offset value determines the fraction of the points that are rendered negative and hence disregarded. If the offset value is too small, the number of points that are rendered negative will be too small to make an appreciable difference in performance. On the other hand, if the offset value is too large, there is a risk that the point that would otherwise establish the first period away from the selected point will inadvertently be rendered negative, and hence disregarded. This will result in an error in identifying a periodic segment and determining the temporal extent of the constituent period of that periodic segment. The offset value (2^{N-1} - 1) has been selected experimentally to satisfy the constraint that approximately half the sample values be rendered negative in normal speech. It is of course possible to select other offset values for different applications without departing from the spirit and scope of the invention.

The offset signal y_i is then provided to a feature extractor 30. The feature extractor 30 identifies those features of the transformed signal that are potentially manifestations of a period.

FIG. 3 shows the effect of the transformation performed by the periodicity enhancer on an original signal x_i, shown as a sequence 42 in the uppermost graph of FIG. 3. Following the transformation, the marginally positive sample values in the sequence are pushed below the horizontal axis as shown in the first transformed sequence 44. In the second transformed sequence 46, these values, together with all the values that were originally negative in the original sequence 42, are set to zero and disregarded in subsequent signal processing steps associated with identifying a periodic segment and the temporal extent of its constituent periods.

The second transformed sequence includes a plurality of inter-zero segments 48a, 48b that carry features indicative of the extent of the periodic segments in the original signal and the temporal extent of the constituent periods of each such periodic segment. These features are extracted by a feature extractor 30.
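As a rough illustration of how the second transformed sequence and its inter-zero segments might be obtained, the sketch below sets non-positive values to zero and then locates each run of positive samples. The helper names `clip_negative` and `inter_zero_segments`, and the choice to describe each segment as a half-open index pair, are assumptions made for this example and are not taken from the specification.

```python
from typing import List, Tuple


def clip_negative(y: List[int]) -> List[int]:
    """Set every non-positive value of the offset signal to zero."""
    return [v if v > 0 else 0 for v in y]


def inter_zero_segments(clipped: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for each run of
    positive samples lying between zeros."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, v in enumerate(clipped):
        if v > 0 and start is None:
            start = i                      # a new inter-zero segment begins
        elif v <= 0 and start is not None:
            segments.append((start, i))    # the segment ended at i - 1
            start = None
    if start is not None:                  # segment runs to the end of the data
        segments.append((start, len(clipped)))
    return segments


y = [-5, 2, 6, 3, -1, -4, 1, 7, 2, -3]
clipped = clip_negative(y)
segments = inter_zero_segments(clipped)
# clipped is [0, 2, 6, 3, 0, 0, 1, 7, 2, 0]; segments is [(1, 4), (6, 9)]
```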

The feature extractor 30 identifies indicia of periodicity in the original signal x_i. Among the indicia of periodicity are the maxima associated with successive inter-zero segments, the normalized area of successive inter-zero segments, and the separation n between inter-zero segments. One or more of these indicia of periodicity are provided by the feature extractor 30 to a period extractor 32.

In one aspect of the invention, the feature extractor 30 obtains maxima across successive inter-zero segments and provides those maxima to the period extractor 32. The period extractor 32 searches the set of maxima provided by the feature extractor 30 for maxima having similar values. This can be accomplished, for example, by selecting a maximum from a selected inter-zero segment and proceeding from the selected inter-zero segment toward successive inter-zero segments. Once an inter-zero segment is found that has a maximum within a selected threshold of the maximum of the selected inter-zero segment, the period extractor 32 obtains the temporal distance between the selected inter-zero segment and the inter-zero segment having a maximum within a selected threshold of the maximum of the selected inter-zero segment. In most cases, this temporal distance corresponds to the temporal extent of a constituent period of a periodic segment.
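The maxima-based search can be sketched as follows. The function name `estimate_period_from_maxima`, the `tolerance` parameter, and the decision to measure the temporal distance between the positions of the two matching maxima are all assumptions made for illustration. The specification says only that the distance between the two segments is obtained.

```python
from typing import List, Optional, Tuple


def estimate_period_from_maxima(
    clipped: List[int],
    segments: List[Tuple[int, int]],
    tolerance: int,
) -> Optional[int]:
    """Estimate the constituent period, in samples, from segment maxima.

    segments holds (start, end) pairs with end exclusive. The first
    segment is the selected segment; the search proceeds forward until
    a segment with a maximum within tolerance of it is found.
    """
    if not segments:
        return None
    peaks = []
    for start, end in segments:
        window = clipped[start:end]
        peak_value = max(window)
        peak_index = start + window.index(peak_value)
        peaks.append((peak_index, peak_value))

    ref_index, ref_value = peaks[0]
    for index, value in peaks[1:]:
        if abs(value - ref_value) <= tolerance:
            return index - ref_index   # distance between matching maxima
    return None


# Two lobes per period (peak 9 and peak 4); the period is 10 samples.
clipped = [0, 5, 9, 5, 0, 2, 4, 2, 0, 0] * 2
segments = [(1, 4), (5, 8), (11, 14), (15, 18)]
period_estimate = estimate_period_from_maxima(clipped, segments, tolerance=1)
# period_estimate is 10
```

The smaller lobe with maximum 4 is skipped because it differs from the selected maximum by more than the tolerance, which is why the estimate spans the full period rather than the gap between adjacent lobes.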

Another index of periodicity identified by the feature extractor 30 is the normalized area of successive inter-zero segments. In this specification, the normalized area for a particular inter-zero segment Z_k refers to the summation

$$\sum_{i \in Z_k} y_i$$

where the summation is evaluated over those samples that are within the inter-zero segment Z_k. However, it will be apparent that other definitions are possible without departing from the scope of the invention.

In another aspect of the invention, the feature extractor 30 obtains normalized areas of successive inter-zero segments and provides those normalized areas to the period extractor 32. The period extractor 32 searches the set of normalized areas provided by the feature extractor 30 for normalized areas having similar values. This can be accomplished, for example, by selecting a normalized area from a selected inter-zero segment and proceeding from the selected inter-zero segment toward successive inter-zero segments. Once an inter-zero segment is found that has a normalized area within a selected threshold of the normalized area of the selected inter-zero segment, the period extractor 32 obtains the temporal distance between the selected inter-zero segment and the inter-zero segment having a normalized area within a selected threshold of the normalized area of the selected inter-zero segment. In most cases, this temporal distance corresponds to the temporal extent of a constituent period of a periodic segment.
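A minimal sketch of the area-based search is given below. It takes the area of an inter-zero segment to be the plain sum of the sample values within that segment, and it measures the temporal distance between segment start positions. Both choices, along with the names `segment_areas`, `estimate_period_from_areas` and `tolerance`, are assumptions made for this example.

```python
from typing import List, Optional, Tuple


def segment_areas(clipped: List[int], segments: List[Tuple[int, int]]) -> List[int]:
    """Sum the sample values inside each (start, end) segment, end exclusive."""
    return [sum(clipped[start:end]) for start, end in segments]


def estimate_period_from_areas(
    clipped: List[int],
    segments: List[Tuple[int, int]],
    tolerance: int,
) -> Optional[int]:
    """Estimate the constituent period, in samples, from segment areas.

    The first segment is the selected segment; the first later segment
    whose area is within tolerance of it fixes the estimate.
    """
    if not segments:
        return None
    areas = segment_areas(clipped, segments)
    ref_start = segments[0][0]
    for (start, _end), area in zip(segments[1:], areas[1:]):
        if abs(area - areas[0]) <= tolerance:
            return start - ref_start   # distance between segment starts
    return None


# Two lobes per period with areas 19 and 8; the period is 10 samples.
clipped = [0, 5, 9, 5, 0, 2, 4, 2, 0, 0] * 2
segments = [(1, 4), (5, 8), (11, 14), (15, 18)]
areas = segment_areas(clipped, segments)
period_estimate = estimate_period_from_areas(clipped, segments, tolerance=2)
# areas is [19, 8, 19, 8]; period_estimate is 10
```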

The output of the period extractor 32 is an estimate of the temporal extent T_0 of a constituent period of a periodic segment of the original signal x_i. This estimate, together with the transformed signal y_i, is provided to a truncator 34. The output of the truncator 34 is a truncated signal z_i generated by removing all but one constituent period from each periodic segment in the transformed signal y_i. The truncated signal is then provided to a compression module 36 which generates a compressed version w_i of the truncated signal. In the preferred embodiment, the compression module 36 implements a wavelet transform based compression method disclosed in pending U.S. Application entitled "Improved Estimator for Recovering High Frequency Components from Compressed Data" filed March 28, 1998, having Ser. No. 09/047,868, and naming the same inventor as the present application, the contents of which are incorporated by reference herein. As described in that application, the wavelet transform of the truncated signal results in a set of wavelet coefficients corresponding to the frequencies present in the signal. The resulting spectrum is divided into the low frequency components and high frequency components. In one embodiment of the invention, the spectrum is divided so that there are as many high frequency components as there are low frequency components. However, one of ordinary skill in the art will recognize that other apportionments may also be used, and remain within the scope of the invention. After applying a first level of the wavelet transform to the data to derive the low frequency components and high frequency components, a second level of transformation is applied to the low frequency components using the same wavelet transform as previously applied. The high frequency components are not wavelet transformed and are discarded. This second level of transformation produces a second spectrum of frequency coefficients which is again divided up into low frequency components and high frequency components.

This process of wavelet transforming the previously derived low frequency spectrum, and discarding the derived high frequency spectrum, may continue to any number of levels, depending on the degree of compression that is desired. As more levels of wavelet transformation are applied to the data, more of the high frequency components are discarded.

In one embodiment of the invention, three levels of wavelet transformation are applied, but again more transform levels (limited only by the size of the data being transformed) may be applied to further compress the data. After the last level of transformation has been applied, both the resulting low frequency and high frequency components are retained for encoding as the compressed data corresponding to a single constituent period of a periodic segment.
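The multi-level procedure can be illustrated with a short sketch. The specification does not name the wavelet used by the compression module, so the example below substitutes a simple averaging Haar transform purely for illustration. The names `haar_step` and `wavelet_compress` are hypothetical, and the signal length is assumed to be divisible by 2 raised to the number of levels.

```python
from typing import List, Tuple


def haar_step(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of an averaging Haar transform (an assumed wavelet).

    Returns pairwise averages (low frequency components) and pairwise
    half-differences (high frequency components), equal in number.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    low = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return low, high


def wavelet_compress(signal: List[float], levels: int = 3) -> Tuple[List[float], List[float]]:
    """Re-transform the low band at each level, discarding earlier high bands.

    Only the low and high components of the last level are retained.
    """
    if levels < 1:
        raise ValueError("at least one level is required")
    low = list(signal)
    high: List[float] = []
    for _ in range(levels):
        # The high band from the previous level is overwritten, i.e. discarded.
        low, high = haar_step(low)
    return low, high


signal = [4, 6, 10, 12, 8, 6, 2, 0]
low, high = wavelet_compress(signal, levels=3)
# low is [6.0] and high is [2.0]: 8 samples are reduced to 2 coefficients
```

Because the high frequency components of the earlier levels are thrown away, the scheme is lossy, and each additional level halves the number of retained coefficients.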

Although the preceding discussion relates to the compression of periodic segments from the audio signal, the same wavelet transform procedure can readily be applied to the aperiodic segments from the audio signal.

It is thus seen that the invention efficiently attains the objects set forth above, among those made apparent from the preceding description. Since certain changes may be made in the above constructions without departing from the scope of the invention, it is intended that all matter contained in the above description or shown in the accompanying drawings be interpreted as illustrative and not in a limiting sense.

It is also to be understood that the following claims are to cover all generic and specific features of the invention described herein, and all statements of the scope of the invention which, as a matter of language, might be said to fall therebetween.

Claims

WHAT IS CLAIMED

1. A method for encoding a digital audio signal into a digitally compressed format, said digital audio signal having a periodic segment formed by a plurality of successive repetitions of a constituent period, said method comprising the steps of: determining a temporal extent of said constituent period of said periodic segment; determining a temporal extent of said periodic segment; selecting, from said plurality of constituent periods, a subset of constituent periods, said subset of constituent periods having a cardinality smaller than said plurality of constituent periods, said subset of constituent periods and said temporal extent of said periodic segment together being representative of said periodic segment of said digital audio signal; and encoding said subset of constituent periods into a digitally compressed format.

2. The method of claim 1 further comprising the step of sampling an analog audio signal to generate said digital audio signal.

3. The method of claim 1 wherein said step of determining said temporal extent of said periodic segment comprises the step of determining a cardinality of said plurality of constituent periods.

4. The method of claim 1 wherein said step of determining the extent of said constituent period comprises the step of: defining an offset threshold such that said digital audio signal includes a plurality of successive threshold crossings at which said digital audio signal crosses said selected threshold, said plurality of threshold crossings defining a plurality of successive signal segments having sample values on opposite sides of said selected threshold, each signal segment having substantially the same plurality of sample values; and disregarding those signal segments having sample values less than said selected threshold.

5. The method of claim 4 further comprising the step of selecting said selected threshold to be zero.

6. The method of claim 4 wherein said step of determining a temporal extent of said constituent period comprises the steps of: selecting a signal segment, determining an area of said selected signal segment, and identifying a successive signal segment having an area substantially equal to said area of said selected signal segment, said successive signal being separated from said selected signal by a temporal extent indicative of said temporal extent of said constituent period.

7. The method of claim 1 further comprising the step of selecting said cardinality of said subset of constituent periods to be equal to one.

8. The method of claim 4 wherein said step of determining a temporal extent of said constituent period comprises the steps of: selecting a signal segment, determining a maximum sample value from said plurality of sample values for said selected signal segment, and identifying a successive signal segment having maximum value substantially equal to said maximum value of said selected signal segment, said successive signal being separated from said selected signal by a temporal extent indicative of said temporal extent of said constituent period.

9. A computer-readable medium for having encoded thereon software for encoding a digital audio signal into a digitally compressed format, said digital audio signal having a periodic segment formed by a plurality of successive repetitions of a constituent period, said software comprising instructions for executing the steps of: determining a temporal extent of said constituent period of said periodic segment; determining a temporal extent of said periodic segment; selecting, from said plurality of constituent periods, a subset of constituent periods, said subset of constituent periods having a cardinality smaller than said plurality of constituent periods, said subset of constituent periods and said temporal extent of said periodic segment together being representative of said periodic segment of said digital audio signal; and encoding said subset of constituent periods into a digitally compressed format.

10. The computer-readable medium of claim 9 wherein said software instructions further comprise instructions for executing the step of sampling an analog audio signal to generate said digital audio signal.

11. The computer-readable medium of claim 9 wherein said instructions for executing the step of determining said temporal extent of said periodic segment comprise instructions for executing the step of determining a cardinality of said plurality of constituent periods.

12. The computer-readable medium of claim 9 wherein said instructions for executing the step of determining the extent of said constituent period comprise instructions for executing the step of: defining an offset threshold such that said digital audio signal includes a plurality of successive threshold crossings at which said digital audio signal crosses said selected threshold, said plurality of threshold crossings defining a plurality of successive signal segments having sample values on opposite sides of said selected threshold, each signal segment having substantially the same plurality of sample values; and disregarding those signal segments having sample values less than said selected threshold.

13. The computer-readable medium of claim 12 wherein said software further comprises instructions for executing the step of selecting said selected threshold to be zero.

14. The computer-readable medium of claim 12 wherein said instructions for executing the step of determining a temporal extent of said constituent period comprise instructions for executing the steps of: selecting a signal segment, determining an area of said selected signal segment, and identifying a successive signal segment having an area substantially equal to said area of said selected signal segment, said successive signal being separated from said selected signal by a temporal extent indicative of said temporal extent of said constituent period.

15. The computer-readable medium of claim 9 wherein said software instructions further comprise instructions for executing the step of selecting said cardinality of said subset of constituent periods to be equal to one.

16. The computer-readable medium of claim 12 wherein said instructions for executing the step of determining a temporal extent of said constituent period comprise instructions for executing the steps of: selecting a signal segment, determining a maximum sample value from said plurality of sample values for said selected signal segment, and identifying a successive signal segment having maximum value substantially equal to said maximum value of said selected signal segment, said successive signal being separated from said selected signal by a temporal extent indicative of said temporal extent of said constituent period.

17. A system for compressing a digital audio signal, said digital audio signal having a periodic segment formed by a plurality of successive repetitions of a constituent period, said system comprising a feature extractor for identifying indicia of periodicity in said digital audio signal; a period extractor in communication with said feature extractor, said period extractor estimating, on the basis of said indicia of periodicity, a temporal extent of said periodic segment and a temporal extent of said constituent period; a signal truncator in communication with said period extractor and receiving said digital audio signal as an input, said signal truncator generating a truncated signal representative of said digital audio signal, said truncated signal having at least one constituent period from each of said periodic segments; and a compression module in communication with said signal truncator, said compression module generating a compressed version of said truncated signal, said compressed version being representative of said digital audio signal.

18. The system of claim 17 further comprising a periodicity enhancer for generating a transformed digital audio signal and providing said transformed digital audio signal to said feature extractor.

19. The system of claim 18 wherein said periodicity enhancer comprises signal offset means for causing a selected fraction of sample values from said digital audio signal to be negative in said transformed digital audio signal.

20. The system of claim 19 wherein said signal offset means comprises means for selecting said selected fraction to be approximately one half.

21. The system of claim 17 wherein said compression module is a wavelet transform compression module.

22. The system of claim 17 wherein said feature extractor comprises means for evaluating a normalized area for selected segments of said digital audio signal.

23. The system of claim 17 wherein said feature extractor comprises means for evaluating maxima for selected segments of said digital audio signal.

24. The system of claim 17 further comprising a signal sampler for generating said digital audio signal from an analog audio signal.

25. A method for compressing a digital audio signal, said method comprising the steps of: sampling a digital audio signal, said digital audio signal having a periodic segment, determining the extent of said periodic segment, discarding a portion of said periodic segment and retaining a retained portion of said periodic segment, and compressing said retained portion of said periodic segment.