WO2008001316A2 - Decoding sound parameters - Google Patents

Decoding sound parameters

Info

Publication number
WO2008001316A2
WO2008001316A2 PCT/IB2007/052488 IB2007052488W WO2008001316A2 WO 2008001316 A2 WO2008001316 A2 WO 2008001316A2 IB 2007052488 W IB2007052488 W IB 2007052488W WO 2008001316 A2 WO2008001316 A2 WO 2008001316A2
Authority
WO
WIPO (PCT)
Prior art keywords
sound
transient
parameters
frame
components
Application number
PCT/IB2007/052488
Other languages
French (fr)
Other versions
WO2008001316A3 (en
Inventor
Marek Szczerba
Andreas Gerrits
Marc Middelink
Original Assignee
Nxp B.V.
Application filed by Nxp B.V.
Priority to EP07789815A (EP2038882A2)
Priority to US12/306,605 (US20090308229A1)
Priority to JP2009517552A (JP2009543112A)
Publication of WO2008001316A2
Publication of WO2008001316A3

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/093 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters using sinusoidal excitation models
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/022 Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
    • G10L19/025 Detection of transients or attacks for time/frequency resolution switching
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/26 Pre-filtering or post-filtering

Definitions

  • the present invention relates to decoding sound parameters and synthesizing sound. More in particular, the present invention relates to a device for and a method of producing sound samples from sound parameters representing transient sound components, sinusoidal sound components and/or other sound components.
  • Parametric decoders are capable of decoding such parameters and producing sound samples which can subsequently be converted into an analog sound signal.
  • Parametric synthesizers likewise use sound parameters to produce sound samples.
  • the sound parameters and the resulting sound samples are typically arranged in frames: sets of data that may be processed in a single routine.
  • Each frame may contain one or more parameters, which may be processed to produce a number of sound samples.
  • the parameters typically constitute an efficient representation of the sound.
  • sound parameters may be used to represent different components of the sound. For example, some sound parameters may represent only transient sound components, while other sound parameters may represent other sound components, for example sinusoidal components and/or noise components. As these sound components have different properties, they can be represented more efficiently by different sets of parameters.
  • the number of sound components per frame may be very large. However, synthesizing many sound components may require a large number of computations. This requires a device having a relatively large processing power, which is not feasible in many applications.
  • the present invention provides a device for producing sound samples from sound parameters representing transient sound components and other sound components, the device comprising means for reducing the number of sound parameters to be synthesized. More in particular, the present invention provides a device for producing sound samples from sound parameters representing sound components, the device comprising: at least one selection unit for receiving frames containing sound parameters which represent sound components and for selecting, for each frame, a limited number of sound components, and at least one synthesis unit for synthesizing selected sound components from their parameters.
  • the selection unit may be a transient selection unit for selecting a single transient sound component per frame
  • the synthesis unit may be a transient synthesis unit for synthesizing any selected transient components.
  • By selecting only a single transient sound component in each frame containing transient sound components, the synthesis of multiple transient (sound) components per frame is avoided. It has been found that the synthesis of multiple transient components is computationally very demanding, and that the processing required can be significantly reduced by synthesizing only one transient component per frame. It has further been found that the quality of the sound is in most cases hardly affected. Thus the efficiency of the sound production is greatly improved while the omission of the further transients of each frame is hardly audible.
  • some frames may contain no transient sound components, in which case no transient component will be synthesized.
  • Other frames may contain only a single transient component, which will accordingly be selected.
  • the transient selection unit may select the single transient to be synthesized in various ways. It is possible to select the first transient of each frame and ignore the (parameters of the) remaining ones. However, other criteria can be used to select a transient sound component. In a preferred embodiment, the selection unit is provided with means for selecting the transient sound component having the largest energy content.
  • the transient synthesis unit is preferably provided with a discontinuation unit for discontinuing a transient sound component of a previous frame when synthesizing a transient sound component in the present frame.
  • the device of the present invention may additionally, or alternatively, comprise a sinusoid selection unit for selecting one or more sinusoidal sound components for each frame containing sinusoidal sound components, and a sinusoid synthesis unit for synthesizing selected sinusoidal sound components from their parameters.
  • the sinusoid selection unit may advantageously be dependent on the transient selection unit and may produce fewer sinusoidal sound components if the transient selection unit selects a transient for the same frame.
  • the sinusoid selection unit is preferably controlled by the transient selection unit, the number of selected sinusoidal components depending on the presence of a transient component in the same frame.
  • reducing the number of sinusoids if a transient is being synthesized reduces the required number of computations. It has been found that this measure hardly affects the sound quality, as the transient "masks" the sinusoids. In frames containing no transients, all sinusoidal sound components may be selected and synthesized.
  • the feature of producing fewer sinusoidal sound components if the transient synthesis unit produces a transient for the same frame can be used independently, and can therefore also be used in devices that synthesize more than one transient per frame. If a particular frame contains no transients but the previous frame did, a transient may still be synthesized. In such cases, the number of sinusoids may also be reduced to reduce the computational load.
  • the selection of sinusoidal components and transient components is preferably based on their psycho-acoustical relevance, while the sinusoid selection and the transient selection may mutually influence each other.
  • the sinusoidal sound parameters represent transform domain coefficients, or represent data that can be converted into transform domain coefficients.
  • the device preferably further comprises an inverse transform unit for transforming transform domain coefficients into time domain samples.
  • the transform domain preferably is the frequency domain, in particular the complex spectrum domain, the inverse transform being an inverse fast Fourier transform (IFFT).
  • other transform domains and associated (inverse) transforms may be used, for example the (discrete) cosine transform domain or the quadrature mirror filter (QMF) transform domain.
  • the sound parameters may be transform domain coefficients, such as Fourier coefficients, but it may also be possible to generate transform domain coefficients from the sound parameters. In the former case the sound parameters are equal to transform domain coefficients, while in the latter case the sound parameters represent such coefficients or equivalent data and may be converted into transform domain sound coefficients.
  • the sinusoidal synthesis unit comprises a convolution unit for convolving the transform domain sound coefficients with a transform domain representation of a time window, and a coefficient limiting unit for limiting the number of additional transform domain sound coefficients resulting from the convolution.
  • the coefficient limiting unit may effectively limit the number of sound coefficients after convolution by selecting a sub-set of the available set of coefficients.
  • the processing may involve multiplication when the sound parameters represent time domain coefficients, or convolution when the sound parameters represent transform domain coefficients.
  • a convolution typically causes an increase in the number of non-zero transform domain coefficients. This, however, also increases the amount of processing required.
  • the coefficient limiting unit may be arranged for limiting the number of transform domain coefficients in a frame in dependence of the original number of sound parameters in the frame.
  • the number of selected additional coefficients may be small if the original number of coefficients is large.
  • the total number of coefficients may be kept approximately constant, or at least below a certain maximum.
  • the number of additional coefficients may be kept approximately constant or below a certain maximum.
  • the number of additional coefficients may be limited in various ways.
  • the number of additional coefficients in a frame is equal to: six if the original number of coefficients is smaller than three, four if the original number of coefficients is between three and five, two if the original number of coefficients is greater than four.
  • the device of the present invention may comprise a noise selection unit for selecting, for each frame, noise sound components to be synthesized, and a noise synthesis unit for synthesizing selected noise sound components from their parameters. By selecting noise components prior to the synthesis, the computational load can be further reduced.
  • the selection of noise components may be independent or may depend on the selection of transient and/or sinusoidal components.
  • the device of the present invention may further comprise an output unit for outputting the sound samples, the output unit preferably being provided with means for adding overlapping frames. That is, the output unit may use the well-known overlap-and-add technique to combine the frames into an output signal.
  • the device of the present invention may comprise a frame forming unit for forming frames containing sound parameters, in which case the transient selection unit, the sinusoid selection unit and/or the noise selection unit receives the frames from the frame forming unit.
  • the present invention further provides a consumer device comprising a device as defined above, as well as a sound system comprising a device as defined above.
  • the consumer device of the present invention may be a portable consumer device, such as a mobile (US: cellular) telephone apparatus, a solid state music player, such as an MP3 player, a music synthesizer, or any other suitable device.
  • the present invention also provides a method of producing sound samples from sound parameters representing transient sound components and other sound components, the method comprising the steps of: receiving frames containing sound parameters which represent sound components, selecting, for each frame, a limited number of sound components, and synthesizing any selected sound components from their parameters.
  • the method of the present invention has the same advantages as the device discussed above.
  • the selected sound components may comprise only a single transient component per frame.
  • the method of the present invention may further comprise the step of synthesizing sinusoidal sound components from sinusoidal sound parameters contained in a frame, and producing fewer sinusoidal sound components if at least one transient sound component for the same frame is produced.
  • the sound parameters may represent transform domain parameters or data that can be converted into transform domain parameters, the method preferably further comprising the step of inversely transforming parameters.
  • the method of the present invention may comprise the step of convolving the transform domain sound coefficients with a transform domain representation of a time window, and limiting the number of additional sound coefficients resulting from the convolution.
  • the method of the present invention may also comprise the step of forming frames containing sound parameters which represent one or more sound components.
  • the present invention additionally provides a computer program product for carrying out the method as defined above.
  • a computer program product may comprise a set of computer executable instructions stored on a data carrier, such as a CD or a DVD.
  • the set of computer executable instructions which allow a programmable computer to carry out the method as defined above, may also be available for downloading from a remote server, for example via the Internet.
  • Fig. 1 schematically shows an exemplary embodiment of a device according to the present invention.
  • Fig. 2 schematically shows the process of limiting the number of parameters after convolution in accordance with the present invention.
  • Fig. 3 schematically shows limiting the duration of transient sound components of adjacent frames in accordance with the present invention.
  • Fig. 4 schematically shows a transients synthesis unit according to the present invention.
  • Fig. 5 schematically shows a sinusoid synthesis unit according to the present invention.
  • Fig. 6 schematically shows a consumer device according to the present invention.
  • the inventive device 1 shown merely by way of non-limiting example in Fig. 1 comprises a bitstream parser (BP) unit 10, a transient selection (SEL) unit 11, a transients synthesis (TS) unit 14, a sinusoid selection (SEL) unit 12, a sinusoid synthesis (SS) unit 15, a noise selection (SEL) unit 13, a noise synthesis (NS) unit 15, a spectrum building (SB) unit 16, an inverse fast Fourier transform (IFFT) unit 17, an overlap-and-add (OLA) unit 18, and a mixing (MIX) and output unit 19.
  • the bitstream parser 10 parses the input bitstream A and forms frames containing sound parameters.
  • the frames may contain transient parameters (TP), sinusoidal parameters (SP) and/or noise parameters (NP) representing transient, sinusoidal and noise sound components respectively.
  • the parameters of each frame are supplied to the transients synthesis unit 14, the sinusoidal synthesis unit 15 and the noise synthesis unit 16 respectively. It is noted that in some embodiments only one or two types of sound parameters may be distinguished, while in other embodiments three, four or more different types of sound parameters may be used.
  • the bitstream parser 10 may have multiple input terminals to receive multiple channels (for example multiple instruments in a synthesizer).
  • the transient parameters TP are not fed directly to the transients synthesis unit 14. Instead, the transient parameters TP are first supplied to the transient selection unit 11 which selects one transient out of the transients present in the particular frame (it is noted that in alternative embodiments more than a single transient per frame may be selected, for example two transients, while still obtaining at least part of the advantages of the present invention).
  • the selection unit 11 selects a single transient, for example the transient having the largest energy content, and outputs the parameters TP' of the selected transient.
  • the selection data sd which indicate whether a transient was selected, are sent to the sinusoid selection unit 12.
  • transient selection unit 11 is shown as a separate unit. However, the transient selection unit 11 may alternatively be incorporated in the transients synthesis unit 14. The transient selection unit 11 will later be explained in more detail with reference to Fig. 4.
  • the transients synthesis unit 14 synthesizes transient (sound) components TC using the selected transient parameters TP' and feeds the resulting samples Ts of this transient component to the mixing and output unit 19.
  • the sinusoid selection unit 12 receives the sinusoidal parameters SP and selects the parameters of one or more sinusoidal sound components. In the embodiment shown, this selection depends on the selection data sd received from the transient selection unit 11. If no transient is selected (typically, this means that no transient, or no transient having a significant amplitude is present in the current frame), the number of sinusoids can be relatively large, and all sinusoidal components of the current frame may be selected, for example.
  • If a transient is selected, the number of sinusoids may be reduced, as effected by the sinusoid selection unit 12. If only a relatively small transient is present in the frame, it may be omitted in favor of relatively large sinusoids, in dependence on control data sd sent from the sinusoid selection unit 12 to the transient selection unit 11.
  • a preferred embodiment of the sinusoid selection unit 12 will later be explained in more detail with reference to Fig. 5.
  • the sinusoid synthesis unit 15 synthesizes the selected sinusoidal (sound) components using the selected sinusoidal parameters SP' and produces sinusoidal sound coefficients Sc, which in the present embodiment are spectral (that is, Fourier) coefficients.
  • the coefficients Sc are inversely transformed by the inverse FFT (IFFT) unit 17.
  • the resulting time domain samples are combined in the overlap-and-add (OLA) unit 18 to produce sinusoidal sound samples Ss, which are fed to the mixing and output unit 19.
  • the noise selection unit 13 similarly receives the noise parameters NP and selects the parameters of one or more noise sound components. In the embodiment shown, this selection depends on the selection data sd received from the transient selection unit 11 and the sinusoid selection unit 12. If no transient is selected (typically, this means that no transient, or no transient having a significant amplitude is present in the current frame), the number of noise components can be relatively large, and all noise components of the current frame may be selected, for example. If a transient is selected, as indicated by the selection data sd, the number of noise components may be reduced, also because the sinusoidal components will typically have less psycho-acoustic relevance. If a relatively large number of sinusoidal components is selected, as shown by the selection data sd received from the sinusoid selection unit 12, the number of noise components to be synthesized may be reduced.
  • the selection data sd may also be transferred in the opposite direction, for example reducing the number of transients if a certain number of sinusoids is synthesized, or suppressing a transient having a relatively low energy if the same frame contains sinusoids having a relatively high energy.
  • the noise synthesis unit 16 synthesizes noise (sound) components using the selected noise parameters NP', and also feeds the noise sound samples Ns of the synthesized components to the mixing and output unit 19, where they are combined with the transients sound samples Ts and the sinusoidal sound samples Ss to produce the output signal B.
  • the sinusoid selection unit 12 and the noise selection unit 13 are shown to be separate units. In alternative embodiments, the sinusoid selection unit 12 and/or the noise selection unit 13 may be incorporated in the sinusoid synthesis unit 15 and/or the noise synthesis unit 16 respectively. Similarly, the inverse transform unit 17 and the overlap-and-add unit 18 could be incorporated into the sinusoid synthesis unit 15 to form a single, combined unit.
  • the sinusoid synthesis unit 15 comprises a convolution unit which performs a convolution of the spectral (or other transform domain) coefficients represented by the selected sinusoidal parameters SP' and a spectral (or other transform domain) representation of a suitable time window. The result of this convolution is a frame of spectral coefficients (in general: transform domain data), the length of the frame corresponding with a suitable transform length, for example 256 or 512 coefficients.
  • the convolution performed by the convolution unit (151 in Fig. 5) is schematically illustrated in Fig. 2, where an exemplary transform domain representation P has a single coefficient, which may for example represent a sinusoidal component.
  • This transform domain representation P is convolved with the transform domain representation Q of a time window, the symbol "*" denoting convolution (in Fig. 2 only the absolute values of representations P and Q are shown for the sake of clarity).
  • the resulting transform domain representation R has nine coefficients, eight more than the original representation P.
  • the convolution typically results in an increased number of non-zero coefficients, which may be referred to as additional transform domain coefficients.
  • this number of additional transform domain coefficients (typically spectral bins) is limited by a coefficient limiting (CL) unit (152 in Fig. 5).
  • the additional transform domain coefficients which are the result of the convolution operation increase the number of computations required for processing the coefficients. For this reason, the coefficient limiting unit (152 in Fig. 5) reduces the number of coefficients, if necessary, in order to increase the computational efficiency. In the illustration of Fig. 2, the number of coefficients is limited to a set S of five, thus discarding the other coefficients and reducing the number of parameters to be processed. It is noted that the number of additional coefficients generated also determines the time-frequency resolution of the synthesized signal.
  • the number of additional coefficients used depends advantageously on the original number of coefficients, and therefore on the number of sinusoidal components.
  • the number of additional coefficients used (contained in S in Fig. 2) is in a preferred embodiment inversely proportional to the number of original coefficients (P in Fig. 2).
  • the number of additional transform domain coefficients in a frame is equal to: six if the original number of transform domain coefficients is smaller than three, four if the original number of transform domain coefficients is between three and five, two if the original number of transform domain coefficients is greater than four.
  • A preferred embodiment of a transient synthesis (TS) unit 14 is illustrated in Fig. 4.
  • the embodiment shown is provided with a transients discontinuation (TD) unit 141 which serves to discontinue transients of a previous frame if a transient of the present frame is synthesized.
  • the transient T1 of the first frame F1 will continue into the second frame F2, causing the synthesis of both T1 and T2 in at least part of the second frame F2.
  • a further increase of the synthesis efficiency may be achieved when the sinusoidal synthesis (SS) unit 15 is provided with a coefficient limiting (CL) unit 152, as illustrated in Fig. 5.
  • the coefficient limiting (CL) unit 152 limits the number of sinusoids synthesized in a frame, depending on the presence of a synthesized transient in the same frame, and optionally also on psycho-acoustic criteria. As a result, the number of sinusoidal coefficients Sc is reduced, thus reducing the number of computations required.
  • the coefficient limiting unit 152 may be used in addition to, or instead of, the sinusoid selection unit 12.
  • the sinusoidal synthesis (SS) unit 15 is shown to further comprise a convolution (CON) unit 151 for convolving the transform domain coefficients represented by the selected sinusoidal parameters SP' with the transform domain representation of a time window.
  • the sinusoidal synthesis unit 15 may further comprise a coefficients generating unit (not shown) for generating the transform domain coefficients referred to above from the selected sinusoidal parameters SP', and a storage unit (not shown) for storing the transform domain representation of the time window.
  • the length of the time window is preferably chosen so as to allow an efficient transform and may have a length of, for example, 128, 256, 512 or 1024 coefficients, or 128 x N, 256 x N, etc. if oversampling is used, where N is the oversampling factor, which may for example be equal to 32.
  • a consumer device is schematically illustrated in Fig. 6.
  • the consumer device 9 is shown to comprise a sound synthesis device 1 according to the present invention.
  • the consumer device 9 may comprise additional elements, for example a sound data storage 2, an amplifier, loudspeaker, power source, control panel (not shown), etc.
  • the consumer device 9 may be a portable audio player, a cellular (mobile) telephone apparatus, a portable digital assistant (PDA), a music synthesizer, a gaming device, or any other consumer device capable of outputting a digital or acoustical sound signal.
  • the sound synthesis device 1 according to the present invention may also be used in sound systems, and is particularly suitable for use in parametric decoders and parametric synthesizers.
  • the present invention is based upon the insight that the efficiency of sound synthesis can be increased by selecting sound components to be synthesized, in particular when psycho-acoustic criteria are taken into account.
  • the present invention benefits from the further insight that only a single transient per frame can be synthesized without substantially affecting the sound quality.
  • the present invention benefits from the further insights that the number of sinusoids synthesized per frame may be reduced if a transient component is synthesized in the same frame, and that the number of additional coefficients produced by a transform domain convolution may be decreased while leaving the sound quality virtually unchanged.
  • any terms used in this document should not be construed so as to limit the scope of the present invention.
  • the words “comprise(s)” and “comprising” are not meant to exclude any elements not specifically stated.
  • Single (circuit) elements may be substituted with multiple (circuit) elements or with their equivalents.
  • Each of the embodiments may be used in isolation, or be combined with any of the other embodiments.
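
The coefficient limiting described above for Fig. 2 and Fig. 5 can be illustrated with a minimal sketch. It is not the patented implementation. The function names, the sparse dictionary layout of the spectrum and the circular bin indexing are assumptions made for illustration. Two further assumptions are needed because the text leaves them open: "between three and five" is read as covering three and four original coefficients, so that it does not overlap "greater than four", and the additional coefficients are split evenly on both sides of each original coefficient. Computing only the retained taps is equivalent to performing the full convolution and then discarding the other coefficients.

```python
from typing import Dict, List


def additional_coefficient_count(original_count: int) -> int:
    """Number of additional coefficients allowed, per the rule in the text.

    Assumption: "between three and five" is taken to mean 3 or 4, so the
    three cases do not overlap.
    """
    if original_count < 3:
        return 6
    if original_count <= 4:
        return 4
    return 2


def limited_convolution(spectrum: Dict[int, complex],
                        window: List[complex],
                        length: int) -> List[complex]:
    """Convolve a sparse spectrum with a window spectrum, keeping few taps.

    spectrum maps a bin index to its coefficient (hypothetical layout).
    window holds an odd number of taps centred on its middle element.
    """
    half = len(window) // 2
    # Assumption: the allowed additional coefficients are split evenly
    # between the lower and upper side of each original coefficient.
    keep = min(additional_coefficient_count(len(spectrum)) // 2, half)
    result = [0j] * length
    for bin_index, coeff in spectrum.items():
        for offset in range(-keep, keep + 1):
            target = (bin_index + offset) % length  # circular bin indexing
            result[target] += coeff * window[half + offset]
    return result


# One sinusoid and a nine-tap window: only the centre tap and three taps
# on each side are computed, the two outermost window taps are dropped.
window = [0.01, 0.05, 0.2, 0.5, 1.0, 0.5, 0.2, 0.05, 0.01]
limited = limited_convolution({10: 1 + 0j}, window, 32)
```

With more original coefficients in the frame, fewer taps are kept around each one, which keeps the total number of non-zero coefficients, and therefore the amount of later processing, roughly bounded.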

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Stereophonic System (AREA)

Abstract

A device (1) for producing sound samples from sound parameters representing sound components comprises a transient synthesis unit (14) for synthesizing transient sound components from transient sound parameters contained in each frame. To increase the efficiency of the synthesis, a transient selection unit (11) is arranged for selecting only a single transient sound component per frame. Additionally, the device may be arranged for producing fewer sinusoidal sound components if a transient is produced. Transform domain coefficients may be convolved with a transform domain representation of a time window, the number of resulting transform domain coefficients being controlled to further enhance the efficiency of the synthesis.
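
The idea of producing fewer sinusoidal components when a transient is produced in the same frame can be sketched as follows. This is only an illustration under stated assumptions. The tuple layout, the function name `select_sinusoids`, the default limit of eight components and the use of amplitude as a stand-in for psycho-acoustic relevance are all hypothetical and do not come from the patent text.

```python
from typing import List, Tuple

# Hypothetical layout: (frequency in Hz, amplitude).
Sinusoid = Tuple[float, float]


def select_sinusoids(sinusoids: List[Sinusoid],
                     transient_selected: bool,
                     max_with_transient: int = 8) -> List[Sinusoid]:
    """Choose the sinusoids to synthesize for one frame.

    Without a transient in the frame, all sinusoids are kept. With a
    transient, only the strongest ones are kept. Amplitude is used here
    as an assumed, simplified measure of relevance.
    """
    if not transient_selected:
        return list(sinusoids)
    strongest = sorted(sinusoids, key=lambda s: s[1], reverse=True)
    return strongest[:max_with_transient]


frame_sinusoids = [(440.0, 0.5), (880.0, 0.1), (1320.0, 0.9)]
kept = select_sinusoids(frame_sinusoids, transient_selected=True,
                        max_with_transient=2)
```

The flag `transient_selected` plays the role of the selection data passed from the transient selection to the sinusoid selection.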

Description

Decoding sound parameters
FIELD OF THE INVENTION
The present invention relates to decoding sound parameters and synthesizing sound. More in particular, the present invention relates to a device for and a method of producing sound samples from sound parameters representing transient sound components, sinusoidal sound components and/or other sound components.
BACKGROUND OF THE INVENTION
It is well known to produce sound samples from sound parameters, such as temporal and/or spectral envelope parameters, spectral coefficients, and other parameters. Parametric decoders, for example, are capable of decoding such parameters and producing sound samples which can subsequently be converted into an analog sound signal. Parametric synthesizers likewise use sound parameters to produce sound samples.
The sound parameters and the resulting sound samples are typically arranged in frames: sets of data that may be processed in a single routine. Each frame may contain one or more parameters, which may be processed to produce a number of sound samples. As the number of sound samples may be much greater than the number of parameters from which they are derived, the parameters typically constitute an efficient representation of the sound.
Different types of sound parameters may be used to represent different components of the sound. For example, some sound parameters may represent only transient sound components, while other sound parameters may represent other sound components, for example sinusoidal components and/or noise components. As these sound components have different properties, they can be represented more efficiently by different sets of parameters.
The number of sound components per frame may be very large. However, synthesizing many sound components may require a large number of computations. This requires a device having a relatively large processing power, which is not feasible in many applications.
SUMMARY OF THE INVENTION
It is an object of the present invention to overcome these and other problems of the Prior Art and to provide a device for and method of producing sound samples from sound parameters which involve fewer computations. Accordingly, the present invention provides a device for producing sound samples from sound parameters representing transient sound components and other sound components, the device comprising means for reducing the number of sound parameters to be synthesized. More in particular, the present invention provides a device for producing sound samples from sound parameters representing sound components, the device comprising: at least one selection unit for receiving frames containing sound parameters which represent sound components and for selecting, for each frame, a limited number of sound components, and at least one synthesis unit for synthesizing selected sound components from their parameters.
The selection unit may be a transient selection unit for selecting a single transient sound component per frame, and the synthesis unit may be a transient synthesis unit for synthesizing any selected transient components.
By selecting only a single transient sound component in each frame containing transient sound components, the synthesis of multiple transient (sound) components per frame is avoided. It has been found that the synthesis of multiple transient components is computationally very demanding, and that the processing required can be significantly reduced by synthesizing only one transient component per frame. It has further been found that the quality of the sound is in most cases hardly affected. Thus the efficiency of the sound production is greatly improved while the omission of the further transients of each frame is hardly audible.
It will be understood that some frames may contain no transient sound components, in which case no transient component will be synthesized. Other frames may contain only a single transient component, which will accordingly be selected.
The transient selection unit may select the single transient to be synthesized in various ways. It is possible to select the first transient of each frame and ignore the (parameters of the) remaining ones. However, other criteria can be used to select a transient sound component. In a preferred embodiment, the selection unit is provided with means for selecting the transient sound component having the largest energy content.
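The energy-based selection described above can be sketched as follows. This is a minimal illustration only: the `Transient` record, its field names and the assumption that an energy value is directly available per transient are hypothetical and not taken from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transient:
    """Hypothetical transient parameter set; field names are illustrative."""
    position: int   # onset position within the frame, in samples
    energy: float   # energy content, assumed to be known per transient


def select_transient(transients: List[Transient]) -> Optional[Transient]:
    """Return the single transient with the largest energy, or None."""
    if not transients:
        return None  # a frame without transients yields nothing to synthesize
    # max() returns the first of several equally energetic transients.
    return max(transients, key=lambda t: t.energy)
```

A frame holding a single transient simply returns that transient, which matches the behaviour described for such frames.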
Sound components of a particular frame, and in particular transients, may extend into the next frame. When synthesizing the sound of a frame, it is possible that part of the sound of the previous frame is also being synthesized. In such cases, it is still possible for two (or possibly even more than two) transient sound components to be synthesized simultaneously, even when the present invention is utilized. To further increase the efficiency of the synthesis, the transient synthesis unit is preferably provided with a discontinuation unit for discontinuing a transient sound component of a previous frame when synthesizing a transient sound component in the present frame.
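One possible reading of the discontinuation step is sketched below. It assumes, for simplicity, that a transient starts at the frame boundary and that the previous transient is cut off abruptly; the function name and the list-of-samples representation are illustrative assumptions, and the text does not specify whether any fade-out is applied.

```python
from typing import List, Optional, Tuple


def synthesize_frame(new_transient: Optional[List[float]],
                     carried_tail: List[float],
                     frame_len: int) -> Tuple[List[float], List[float]]:
    """Return (frame_samples, tail_for_next_frame).

    new_transient: samples of the transient selected for this frame, which
                   may be longer than the frame; None if none was selected.
    carried_tail:  samples of the previous frame's transient that extend
                   into this frame.
    """
    # Discontinue the previous transient whenever a new one is synthesized.
    source = new_transient if new_transient is not None else carried_tail
    frame = list(source[:frame_len])
    frame += [0.0] * (frame_len - len(frame))  # pad a short source with silence
    tail = list(source[frame_len:])            # part spilling into the next frame
    return frame, tail
```

With this arrangement at most one transient is ever being synthesized at a time, regardless of how far a transient extends beyond its own frame.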
The device of the present invention may additionally, or alternatively, comprise a sinusoid selection unit for selecting one or more sinusoidal sound components for each frame containing sinusoidal sound components, and a sinusoid synthesis unit for synthesizing selected sinusoidal sound components from their parameters. If the device also comprises a transient synthesis unit, the sinusoid selection unit may advantageously be dependent on the transient selection unit and may produce fewer sinusoidal sound components if the transient selection unit selects a transient for the same frame. Accordingly, the sinusoid selection unit is preferably controlled by the transient selection unit, the number of selected sinusoidal components depending on the presence of a transient component in the same frame.
In an embodiment comprising a sinusoid selection unit, reducing the number of sinusoids if a transient is being synthesized reduces the required number of computations. It has been found that this measure hardly affects the sound quality, as the transient "masks" the sinusoids. In frames containing no transients, all sinusoidal sound components may be selected and synthesized.
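A minimal sketch of such a transient-dependent sinusoid selection is given below. The budget of eight sinusoids and the use of amplitude as a stand-in for psycho-acoustic relevance are assumptions made purely for illustration; the text does not prescribe either.

```python
from typing import List, Tuple

Sinusoid = Tuple[float, float]  # (frequency in Hz, amplitude); layout is illustrative


def select_sinusoids(sinusoids: List[Sinusoid],
                     transient_selected: bool,
                     max_with_transient: int = 8) -> List[Sinusoid]:
    """Keep all sinusoids, or only the strongest few if a transient is present."""
    if not transient_selected:
        return list(sinusoids)  # no transient: every sinusoid may be synthesized
    # Transient present: keep only the largest amplitudes (assumed criterion).
    strongest = sorted(sinusoids, key=lambda s: s[1], reverse=True)
    return strongest[:max_with_transient]
```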
It is noted that the feature of producing fewer sinusoidal sound components if the transient synthesis unit produces a transient for the same frame can be used independently, and can therefore also be used in devices that synthesize more than one transient per frame. If a particular frame contains no transients but the previous frame did, a transient may still be synthesized. In such cases, the number of sinusoids may also be reduced to reduce the computational load.
The selection of sinusoidal components and transient components is preferably based on their psycho-acoustical relevance, while the sinusoid selection and the transient selection may mutually influence each other.
As the synthesis of sinusoids in a transform domain is generally more efficient than in the time domain, it is preferred that the sinusoidal sound parameters represent transform domain coefficients, or represent data that can be converted into transform domain coefficients. In addition, the device preferably further comprises an inverse transform unit for transforming transform domain coefficients into time domain samples. The transform domain preferably is the frequency domain, in particular the complex spectrum domain, the inverse transform being an inverse fast Fourier transform (IFFT). However, other transform domains and associated (inverse) transforms may be used, for example the (discrete) cosine transform domain or the quadrature mirror filter (QMF) transform domain.
It is noted that the sound parameters may be transform domain coefficients, such as Fourier coefficients, but that it may also be possible to generate transform domain coefficients from the sound parameters. In the former case the sound parameters are equal to transform domain coefficients, while in the latter case the sound parameters represent such coefficients or equivalent data and may be converted into transform domain sound coefficients.
In a preferred embodiment, the sinusoidal synthesis unit comprises a convolution unit for convolving the transform domain sound coefficients with a transform domain representation of a time window, and a coefficient limiting unit for limiting the number of additional transform domain sound coefficients resulting from the convolution. The coefficient limiting unit may effectively limit the number of sound coefficients after convolution by selecting a sub-set of the available set of coefficients.
It is advantageous to process the sound coefficients using a representation of a time window so as to produce sound data (coefficients or samples) corresponding with a suitable time duration. The processing may involve multiplication when the sound parameters represent time domain coefficients, or convolution when the sound parameters represent transform domain coefficients. A convolution typically causes an increase in the number of non-zero transform domain coefficients. This, however, also increases the amount of processing required.
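The equivalence between time domain multiplication and transform domain convolution can be checked numerically. The sketch below assumes the discrete Fourier transform as the transform and a Hann window as the time window; both choices, and all function names, are illustrative. For a transform length n, the transform of the windowed signal equals the circular convolution of the two transforms scaled by 1/n.

```python
import cmath
import math


def dft(x):
    """Plain O(n^2) discrete Fourier transform, adequate for a small demo."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]


def windowed_spectrum_two_ways(n=16, cycles=3):
    """Compute the spectrum of a windowed sinusoid by both routes."""
    signal = [math.cos(2 * math.pi * cycles * t / n) for t in range(n)]
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]  # Hann
    # Route 1: multiply in the time domain, then transform.
    via_time = dft([s * w for s, w in zip(signal, window)])
    # Route 2: transform both, then convolve in the transform domain (scaled by 1/n).
    via_transform = [c / n for c in circular_convolve(dft(signal), dft(window))]
    return via_time, via_transform
```

Both routes agree to within rounding error, and the windowed spectrum has more non-zero bins than the unwindowed sinusoid, which is the increase referred to above.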
According to a further aspect of the present invention, the coefficient limiting unit may be arranged for limiting the number of transform domain coefficients in a frame in dependence of the original number of sound parameters in the frame. For example, the number of selected additional coefficients may be small if the original number of coefficients is large. In this way, the total number of coefficients may be kept approximately constant, or at least below a certain maximum. Alternatively, the number of additional coefficients may be kept approximately constant or below a certain maximum.
The number of additional coefficients may be limited in various ways. In a particularly advantageous embodiment, the number of additional coefficients in a frame is equal to: six if the original number of coefficients is smaller than three, four if the original number of coefficients is between three and five, two if the original number of coefficients is greater than four.
It will be understood, however, that these numbers may depend on the particular frame length and other considerations, such as the energy of the respective sinusoidal components, and will generally depend on the particular embodiment. In particular, the numbers stated above may apply per frequency band, preferably per ERB band or similar band, as the well-known ERB (Equivalent Rectangular Bandwidth) scale takes psycho-acoustic considerations into account.
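The rule stated above can be written as a small lookup. Note that the stated ranges overlap at an original count of five ("between three and five" versus "greater than four"); the sketch below assumes that five falls under "greater than four", which is one possible reading rather than something the text settles. The function name is illustrative.

```python
def additional_coefficients(original_count: int) -> int:
    """Number of additional transform domain coefficients for a frame or band."""
    if original_count < 3:
        return 6
    if original_count <= 4:   # assumption: an original count of five maps to two
        return 4
    return 2
```

The fewer original coefficients there are, the more additional coefficients are retained, so the overall count stays bounded.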
The device of the present invention may comprise a noise selection unit for selecting, for each frame, noise sound components to be synthesized, and a noise synthesis unit for synthesizing selected noise sound components from their parameters. By selecting noise components prior to the synthesis, the computational load can be further reduced. The selection of noise components may be independent or may depend on the selection of transient and/or sinusoidal components. The device of the present invention may further comprise an output unit for outputting the sound samples, the output unit preferably being provided with means for adding overlapping frames. That is, the output unit may use the well-known overlap-and-add technique to combine the frames into an output signal.
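The overlap-and-add technique used by the output unit can be sketched as follows, assuming that all frames have the same length and are spaced by a fixed hop size. The function and parameter names are illustrative.

```python
from typing import List


def overlap_add(frames: List[List[float]], hop: int) -> List[float]:
    """Sum equally long frames into one signal, each shifted by `hop` samples."""
    if not frames:
        return []
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        start = i * hop
        for j, sample in enumerate(frame):
            out[start + j] += sample  # overlapping regions accumulate
    return out
```

With a hop of half the frame length, the second half of each frame is added to the first half of the next one.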
Additionally, or alternatively, the device of the present invention may comprise a frame forming unit for forming frames containing sound parameters, in which case the transient selection unit, the sinusoid selection unit and/or the noise selection unit receives the frames from the frame forming unit.
The present invention further provides a consumer device comprising a device as defined above, as well as a sound system comprising a device as defined above. The consumer device of the present invention may be a portable consumer device, such as a mobile (US: cellular) telephone apparatus, a solid state music player, such as an MP3 player, a music synthesizer, or any other suitable device.
The present invention also provides a method of producing sound samples from sound parameters representing transient sound components and other sound components, the method comprising the steps of: receiving frames containing sound parameters which represent sound components, selecting, for each frame, a limited number of sound components, and synthesizing any selected sound components from their parameters. The method of the present invention has the same advantages as the device discussed above.
The selected sound components may comprise only a single transient component per frame. The method of the present invention may further comprise the step of synthesizing sinusoidal sound components from sinusoidal sound parameters contained in a frame, and producing fewer sinusoidal sound components if at least one transient sound component for the same frame is produced.
The sound parameters may represent transform domain parameters or data that can be converted into transform domain parameters, the method preferably further comprising the step of inversely transforming parameters.
Advantageously, the method of the present invention may comprise the step of convolving the transform domain sound coefficients with a transform domain representation of a time window, and limiting the number of additional sound coefficients resulting from the convolution. The method of the present invention may also comprise the step of forming frames containing sound parameters which represent one or more sound components.
Further method steps according to the present invention will become apparent from the detailed description of the invention below.
The present invention additionally provides a computer program product for carrying out the method as defined above. A computer program product may comprise a set of computer executable instructions stored on a data carrier, such as a CD or a DVD. The set of computer executable instructions, which allow a programmable computer to carry out the method as defined above, may also be available for downloading from a remote server, for example via the Internet.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will further be explained below with reference to exemplary embodiments illustrated in the accompanying drawings, in which:
Fig. 1 schematically shows an exemplary embodiment of a device according to the present invention.
Fig. 2 schematically shows the process of limiting the number of parameters after convolution in accordance with the present invention.
Fig. 3 schematically shows limiting the duration of transient sound components of adjacent frames in accordance with the present invention.
Fig. 4 schematically shows a transients synthesis unit according to the present invention.
Fig. 5 schematically shows a sinusoid synthesis unit according to the present invention.
Fig. 6 schematically shows a consumer device according to the present invention.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
The inventive device 1 shown merely by way of non-limiting example in Fig. 1 comprises a bitstream parser (BP) unit 10, a transient selection (SEL) unit 11, a transients synthesis (TS) unit 14, a sinusoid selection (SEL) unit 12, a sinusoid synthesis (SS) unit 15, a noise selection (SEL) unit 13, a noise synthesis (NS) unit 16, a spectrum building (SB) unit 16, an inverse fast Fourier transform (IFFT) unit 17, an overlap-and-add (OLA) unit 18, and a mixing (MIX) and output unit 19. In the embodiment shown, the device 1 receives an input bitstream A which comprises sound parameters, and produces an output signal B which comprises time domain sound samples.
The bitstream parser 10 parses the input bitstream A and forms frames containing sound parameters. The frames may contain transient parameters (TP), sinusoidal parameters (SP) and/or noise parameters (NP) representing transient, sinusoidal and noise sound components respectively. The parameters of each frame are supplied to the transients synthesis unit 14, the sinusoidal synthesis unit 15 and the noise synthesis unit 16 respectively. It is noted that in some embodiments only one or two types of sound parameters may be distinguished, while in other embodiments three, four or more different types of sound parameters may be used. The bitstream parser 10 may have multiple input terminals to receive multiple channels (for example multiple instruments in a synthesizer).
According to the present invention, the transient parameters TP are not fed directly to the transients synthesis unit 14. Instead, the transient parameters TP are first supplied to the transient selection unit 11 which selects one transient out of the transients present in the particular frame (it is noted that in alternative embodiments more than a single transient per frame may be selected, for example two transients, while still obtaining at least part of the advantages of the present invention). The selection unit 11 selects a single transient, for example the transient having the largest energy content, and outputs the parameters TP' of the selected transient. The selection data sd, which indicate whether a transient was selected, are sent to the sinusoid selection unit 12.
In the embodiment of Fig. 1 the transient selection unit 11 is shown as a separate unit. However, the transient selection unit 11 may alternatively be incorporated in the transients synthesis unit 14. The transient selection unit 11 will later be explained in more detail with reference to Fig. 4.
The transients synthesis unit 14 synthesizes transient (sound) components TC using the selected transient parameters TP' and feeds the resulting samples Ts of this transient component to the mixing and output unit 19. The sinusoid selection unit 12 receives the sinusoidal parameters SP and selects the parameters of one or more sinusoidal sound components. In the embodiment shown, this selection depends on the selection data sd received from the transient selection unit 11. If no transient is selected (typically, this means that no transient, or no transient having a significant amplitude is present in the current frame), the number of sinusoids can be relatively large, and all sinusoidal components of the current frame may be selected, for example. If a transient is selected, as indicated by the selection data sd, the number of sinusoids may be reduced, as effected by the sinusoid selection unit 12. If only a relatively small transient is present in the frame, it may be omitted in favor of relatively large sinusoids, in dependence on control data sd sent from the sinusoid selection unit 12 to the transient selection unit 11. A preferred embodiment of the sinusoid selection unit 12 will later be explained in more detail with reference to Fig. 5.
The sinusoid synthesis unit 15 synthesizes the selected sinusoidal (sound) components using the selected sinusoidal parameters SP' and produces sinusoidal sound coefficients Sc, which in the present embodiment are spectral (that is, Fourier) coefficients. The coefficients Sc are inversely transformed by the inverse FFT (IFFT) unit 17. The resulting time domain samples are combined in the overlap-and-add (OLA) unit 18 to produce sinusoidal sound samples Ss, which are fed to the mixing and output unit 19.
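The step from spectral coefficients to time domain samples can be illustrated with a direct inverse discrete Fourier transform standing in for the IFFT. The sketch assumes a zero-phase sinusoid whose frequency lies exactly on a bin strictly between 0 and n/2; the function names and the scaling convention are illustrative assumptions.

```python
import cmath
import math


def sinusoid_spectrum(n, bin_index, amplitude):
    """Spectrum of amplitude * cos(2*pi*bin_index*t/n), for 0 < bin_index < n/2."""
    spec = [0j] * n
    spec[bin_index] = amplitude * n / 2
    spec[-bin_index] = amplitude * n / 2  # conjugate-symmetric partner (zero phase)
    return spec


def inverse_dft(coeffs):
    """Direct O(n^2) inverse DFT; an IFFT computes the same result faster."""
    n = len(coeffs)
    return [sum(c * cmath.exp(2j * cmath.pi * k * t / n)
                for k, c in enumerate(coeffs)).real / n
            for t in range(n)]
```

Two non-zero coefficients thus yield a full frame of sinusoidal samples, which is why building the spectrum first and transforming once per frame can be efficient.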
The noise selection unit 13 similarly receives the noise parameters NP and selects the parameters of one or more noise sound components. In the embodiment shown, this selection depends on the selection data sd received from the transient selection unit 11 and the sinusoid selection unit 12. If no transient is selected (typically, this means that no transient, or no transient having a significant amplitude is present in the current frame), the number of noise components can be relatively large, and all noise components of the current frame may be selected, for example. If a transient is selected, as indicated by the selection data sd, the number of noise components may be reduced, also because the sinusoidal components will typically have less psycho-acoustic relevance. If a relatively large number of sinusoidal components is selected, as shown by the selection data sd received from the sinusoid selection unit 12, the number of noise components to be synthesized may be reduced.
The selection data sd may also be transferred in the opposite direction, for example reducing the number of transients if a certain number of sinusoids is synthesized, or suppressing a transient having a relatively low energy if the same frame contains sinusoids having a relatively high energy.
The noise synthesis unit 16 synthesizes noise (sound) components using the selected noise parameters NP', and also feeds the noise sound samples Ns of the synthesized components to the mixing and output unit 19, where they are combined with the transients sound samples Ts and the sinusoidal sound samples Ss to produce the output signal B.
The sinusoid selection unit 12 and the noise selection unit 13 are shown to be separate units. In alternative embodiments, the sinusoid selection unit 12 and/or the noise selection unit 13 may be incorporated in the sinusoid synthesis unit 15 and/or the noise synthesis unit 16 respectively. Similarly, the inverse transform unit 17 and the overlap-and-add unit 18 could be incorporated into the sinusoid synthesis unit 15 to form a single, combined unit.
In the exemplary embodiment of Fig. 1, the sinusoid synthesis unit 15 comprises a convolution unit which performs a convolution of the spectral (or other transform domain) coefficients represented by the selected sinusoidal parameters SP' and a spectral (or other transform domain) representation of a suitable time window. The result of this convolution is a frame of spectral coefficients (in general: transform domain data), the length of the frame corresponding with a suitable transform length, for example 256 or 512 coefficients.
The convolution performed by the convolution unit (151 in Fig. 5) is schematically illustrated in Fig. 2, where an exemplary transform domain representation P has a single coefficient, which may for example represent a sinusoidal component. This transform domain representation P is convolved with the transform domain representation Q of a time window, the symbol "*" denoting convolution (in Fig. 2 only the absolute values of representations P and Q are shown for the sake of clarity). In the present example, the resulting transform domain representation R has nine coefficients, eight more than the original representation P. Although the total number of transform domain coefficients may not be altered, the convolution typically results in an increased number of non-zero coefficients, which may be referred to as additional transform domain coefficients. According to a further aspect of the present invention, this number of additional transform domain coefficients (typically spectral bins) is limited by a coefficient limiting (CL) unit (152 in Fig. 5).
The additional transform domain coefficients (or "side bins") which are the result of the convolution operation increase the number of computations required for processing the coefficients. For this reason, the coefficient limiting unit (152 in Fig. 5) reduces the number of coefficients, if necessary, in order to increase the computational efficiency. In the illustration of Fig. 2, the number of coefficients is limited to a set S of five, thus discarding the other coefficients and reducing the number of parameters to be processed. It is noted that the number of additional coefficients generated also determines the time-frequency resolution of the synthesized signal.
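The situation of Fig. 2 can be mimicked with a small numerical sketch: a representation P with a single coefficient is convolved with a nine-tap representation Q, giving nine non-zero coefficients R, of which five are kept as the set S. The tap values of Q, the choice of keeping the peak plus two coefficients on either side, and all names are assumptions made for illustration; only magnitudes are modelled, as in the figure.

```python
def convolve(p, q):
    """Linear convolution of two real sequences."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        if a == 0.0:
            continue  # skip empty bins; only non-zero coefficients spread
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def limit_side_bins(spectrum, center, side_bins):
    """Keep the bin at `center` plus `side_bins` neighbours, zeroing the rest."""
    half = side_bins // 2
    lo = max(0, center - half)
    hi = min(len(spectrum), center + half + 1)
    return [v if lo <= k < hi else 0.0 for k, v in enumerate(spectrum)]


# P: a single coefficient; Q: hypothetical window representation with nine taps.
P = [0.0] * 16
P[6] = 1.0
Q = [0.05, 0.1, 0.2, 0.5, 1.0, 0.5, 0.2, 0.1, 0.05]
R = convolve(P, Q)                               # peak moves to index 6 + 4 = 10
S = limit_side_bins(R, center=10, side_bins=4)   # peak plus four side bins
```

Discarding the outermost side bins removes the smallest contributions first, which is consistent with the observation that the sound quality is left virtually unchanged.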
The number of additional coefficients used depends advantageously on the original number of coefficients, and therefore on the number of sinusoidal components. To reduce the total number of coefficients, the number of additional coefficients used (contained in S in Fig. 2) is in a preferred embodiment inversely proportional to the number of original coefficients (P in Fig. 2). In a particularly preferred embodiment, the number of additional transform domain coefficients in a frame is equal to: six if the original number of transform domain coefficients is smaller than three, four if the original number of transform domain coefficients is between three and five, two if the original number of transform domain coefficients is greater than four.
It will be understood that the actual number of additional transform domain coefficients used will depend on the particular embodiment. These numbers may apply per frequency band, preferably per ERB band or similar band.
A preferred embodiment of a transient synthesis (TS) unit 14 is illustrated in Fig. 4. The embodiment shown is provided with a transients discontinuation (TD) unit 141 which serves to discontinue transients of a previous frame if a transient of the present frame is synthesized. As further illustrated in Fig. 3, transients T1 and T2 may be synthesized in adjacent frames F1 and F2, first frame F1 starting at t = 0 and second frame F2 starting at t = 1. The transient T1 of the first frame F1 will continue into the second frame F2, causing the synthesis of both T1 and T2 in at least part of the second frame F2. To prevent the synthesis of multiple transients, the first transient T1 is discontinued when the second frame F2 starts at t = 1.
A further increase of the synthesis efficiency may be achieved when the sinusoidal synthesis (SS) unit 15 is provided with a coefficient limiting (CL) unit 152, as illustrated in Fig. 5. The coefficient limiting (CL) unit 152 limits the number of sinusoids synthesized in a frame, depending on the presence of a synthesized transient in the same frame, and optionally also on psycho-acoustic criteria. As a result, the number of sinusoidal coefficients Sc is reduced, thus reducing the number of computations required. The coefficient limiting unit 152 may be used in addition to, or instead of, the sinusoid selection unit 12.
The sinusoidal synthesis (SS) unit 15 is shown to further comprise a convolution (CON) unit 151 for convolving the transform domain coefficients represented by the selected sinusoidal parameters SP' with the transform domain representation of a time window. The sinusoidal synthesis unit 15 may further comprise a coefficients generating unit (not shown) for generating the transform domain coefficients referred to above from the selected sinusoidal parameters SP', and a storage unit (not shown) for storing the transform domain representation of the time window. The length of the time window is preferably chosen so as to allow an efficient transform and may have a length of, for example, 128, 256, 512 or 1024 coefficients, or 128 x N, 256 x N, etc. if oversampling is used, where N is the oversampling factor, which may for example be equal to 32.
A consumer device according to the present invention is schematically illustrated in Fig. 6. The consumer device 9 is shown to comprise a sound synthesis device 1 according to the present invention. In addition, the consumer device 9 may comprise additional elements, for example a sound data storage 2, an amplifier, loudspeaker, power source, control panel (not shown), etc. The consumer device 9 may be a portable audio player, a cellular (mobile) telephone apparatus, a personal digital assistant (PDA), a music synthesizer, a gaming device, or any other consumer device capable of outputting a digital or acoustical sound signal. The sound synthesis device 1 according to the present invention may also be used in sound systems, and is particularly suitable for use in parametric decoders and parametric synthesizers.
The present invention is based upon the insight that the efficiency of sound synthesis can be increased by selecting sound components to be synthesized, in particular when psycho-acoustic criteria are taken into account. The present invention benefits from the further insight that only a single transient per frame can be synthesized without substantially affecting the sound quality. The present invention benefits from the further insights that the number of sinusoids synthesized per frame may be reduced if a transient component is synthesized in the same frame, and that the number of additional coefficients produced by a transform domain convolution may be decreased while leaving the sound quality virtually unchanged.
It is noted that any terms used in this document should not be construed so as to limit the scope of the present invention. In particular, the words "comprise(s)" and "comprising" are not meant to exclude any elements not specifically stated. Single (circuit) elements may be substituted with multiple (circuit) elements or with their equivalents. Each of the embodiments may be used in isolation, or be combined with any of the other embodiments.
It will therefore be understood by those skilled in the art that the present invention is not limited to the embodiments illustrated above and that many modifications and additions may be made without departing from the scope of the invention as defined in the appended claims.

Claims

CLAIMS:
1. A device (1) for producing sound samples from sound parameters representing sound components, the device comprising: at least one selection unit (11, 12, 13) for receiving frames containing sound parameters which represent sound components and for selecting, for each frame, a limited number of sound components, and at least one synthesis unit (14, 15, 16) for synthesizing any selected sound components from their parameters.
2. The device according to claim 1, comprising a transient selection unit (11) for selecting, for each frame containing transient sound components, a single transient sound component, and a transient synthesis unit (14) for synthesizing any selected transient sound components from their parameters.
3. The device according to claim 2, wherein the transient selection unit (11) is provided with means for selecting the transient sound component having the largest energy content.
4. The device according to claim 2, wherein the transient synthesis unit (14) is provided with a discontinuation unit (141) for discontinuing a transient sound component of a previous frame when synthesizing a transient sound component in the present frame.
5. The device according to claim 1, comprising a sinusoidal selection unit (12) for selecting, for each frame, one or more sinusoidal sound components, and a sinusoidal synthesis unit (15) for synthesizing selected sinusoidal sound components from their parameters.
6. The device according to claims 2 and 5, wherein the sinusoidal selection unit (12) reduces the number of selected sinusoidal components if the transient selection unit (11) selects a transient component for the same frame.
7. The device according to claim 5, further comprising an inverse transform unit (17).
8. The device according to claim 5, wherein the sinusoidal synthesis unit (15) comprises a convolution unit (151) for convolving the transform domain coefficients with a transform domain representation of a time window, and wherein the sinusoidal synthesis unit (15) is preferably also provided with a coefficient limiting unit (152) for limiting the number of additional transform domain coefficients resulting from the convolution.
9. The device according to claim 8, wherein the coefficient limiting unit (152) limits the number of additional transform domain coefficients in a frame in dependence on the original number of sound parameters in the frame, preferably per frequency band.
10. The device according to claim 1, comprising a noise selection unit (13) for selecting, for each frame, noise sound components to be synthesized, and a noise synthesis unit (16) for synthesizing noise sound components from their parameters.
11. A consumer device comprising a device (1) according to claim 1.
12. A sound system comprising a device (1) according to claim 1.
13. A method of producing sound samples from sound parameters representing transient sound components and other sound components, the method comprising the steps of: receiving frames containing sound parameters which represent sound components, selecting, for each frame, a limited number of sound components, and synthesizing any selected sound components from their parameters.
14. The method according to claim 13, wherein the selecting step involves selecting, for each frame, a single transient sound component, and wherein the synthesizing step involves synthesizing any selected transient sound components from their parameters.
15. The method according to claim 14, wherein the selecting step involves selecting the transient sound component having the largest energy content.
16. The method according to claim 14, wherein the synthesizing step involves discontinuing a transient sound component of a previous frame when synthesizing a transient sound component in the present frame.
17. The method according to claim 13, further comprising the step of synthesizing sinusoidal sound components from sinusoidal sound parameters contained in a frame, and selecting sinusoidal sound components prior to the synthesis.
18. The method according to claims 14 and 17, further comprising the step of reducing the number of selected sinusoidal components if a transient sound component for the same frame is produced.
19. The method according to claim 13, wherein the sound parameters represent transform domain coefficients, the method preferably further comprising the step of inversely transforming said transform domain coefficients.
20. The method according to claim 19, further comprising the step of convolving the transform domain coefficients with a transform domain representation of a time window, and preferably limiting the number of additional transform domain coefficients resulting from the convolution.
21. The method according to claim 13, further comprising the steps of synthesizing noise sound components from noise sound parameters contained in a frame, and selecting noise sound components prior to the synthesis.
22. A computer program product for carrying out the method according to claim 13.
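As an illustration only, the selecting steps of claims 13 to 15 and 18 might be sketched as follows. The class names, the function select_components, and the parameters max_sinusoids and reduction_if_transient are hypothetical and do not appear in the application. The claims do not state how sinusoidal components are ranked, so ranking by amplitude is an assumption made here for the sake of a runnable example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Transient:
    energy: float  # hypothetical energy measure of the transient component


@dataclass
class Sinusoid:
    amplitude: float
    frequency_hz: float


@dataclass
class Frame:
    transients: List[Transient] = field(default_factory=list)
    sinusoids: List[Sinusoid] = field(default_factory=list)


def select_components(
    frame: Frame,
    max_sinusoids: int = 8,
    reduction_if_transient: int = 2,
) -> Tuple[Optional[Transient], List[Sinusoid]]:
    """Select a limited number of sound components for one frame."""
    # Claims 14 and 15: keep at most one transient, the one with the
    # largest energy content (None if the frame has no transients).
    transient = max(frame.transients, key=lambda t: t.energy, default=None)

    # Claim 18: reduce the number of selected sinusoids when a transient
    # is produced for the same frame. The size of the reduction is an
    # assumed parameter, not a value taken from the application.
    budget = max_sinusoids
    if transient is not None:
        budget = max(0, max_sinusoids - reduction_if_transient)

    # Assumption: sinusoids are ranked by amplitude, strongest first.
    ranked = sorted(frame.sinusoids, key=lambda s: s.amplitude, reverse=True)
    return transient, ranked[:budget]
```

The selected components would then be passed to the synthesizing step. Keeping the selection separate from the synthesis means the per-frame limit can be changed without touching the synthesis code.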
PCT/IB2007/052488 2006-06-29 2007-06-27 Decoding sound parameters WO2008001316A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP07789815A EP2038882A2 (en) 2006-06-29 2007-06-27 Decoding sound parameters
US12/306,605 US20090308229A1 (en) 2006-06-29 2007-06-27 Decoding sound parameters
JP2009517552A JP2009543112A (en) 2006-06-29 2007-06-27 Decoding speech parameters

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP06116297.0 2006-06-29
EP06116297 2006-06-29

Publications (2)

Publication Number Publication Date
WO2008001316A2 (en) 2008-01-03
WO2008001316A3 (en) 2008-02-21

Family

ID=38704357

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2007/052488 WO2008001316A2 (en) 2006-06-29 2007-06-27 Decoding sound parameters

Country Status (5)

Country Link
US (1) US20090308229A1 (en)
EP (1) EP2038882A2 (en)
JP (1) JP2009543112A (en)
CN (1) CN101479789A (en)
WO (1) WO2008001316A2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5063364B2 (en) * 2005-02-10 2012-10-31 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Speech synthesis method

Citations (1)

Publication number Priority date Publication date Assignee Title
WO1991016769A1 (en) 1990-04-12 1991-10-31 Dolby Laboratories Licensing Corporation Adaptive-block-length, adaptive-transform, and adaptive-window transform coder, decoder, and encoder/decoder for high-quality audio

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
US5886276A (en) * 1997-01-16 1999-03-23 The Board Of Trustees Of The Leland Stanford Junior University System and method for multiresolution scalable audio signal encoding
US5903872A (en) * 1997-10-17 1999-05-11 Dolby Laboratories Licensing Corporation Frame-based audio coding with additional filterbank to attenuate spectral splatter at frame boundaries
US6266003B1 (en) * 1998-08-28 2001-07-24 Sigma Audio Research Limited Method and apparatus for signal processing for time-scale and/or pitch modification of audio signals
US6266644B1 (en) * 1998-09-26 2001-07-24 Liquid Audio, Inc. Audio encoding apparatus and methods
JP3751001B2 (en) * 2002-03-06 2006-03-01 株式会社東芝 Audio signal reproducing method and reproducing apparatus
CN1886783A (en) * 2003-12-01 2006-12-27 皇家飞利浦电子股份有限公司 Audio coding
US7454332B2 (en) * 2004-06-15 2008-11-18 Microsoft Corporation Gain constrained noise suppression
WO2006003813A1 (en) * 2004-07-02 2006-01-12 Matsushita Electric Industrial Co., Ltd. Audio encoding and decoding apparatus
US8476518B2 (en) * 2004-11-30 2013-07-02 Stmicroelectronics Asia Pacific Pte. Ltd. System and method for generating audio wavetables
KR101315075B1 (en) * 2005-02-10 2013-10-08 코닌클리케 필립스 일렉트로닉스 엔.브이. Sound synthesis
JP5063364B2 (en) * 2005-02-10 2012-10-31 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Speech synthesis method

Non-Patent Citations (1)

Title
MARENTAKIS G. ET AL.: "Sinusoid synthesis optimization", ICMC, INTERNATIONAL COMPUTER MUSIC CONFERENCE, PROCEEDINGS, 2002, pages 1 - 4

Cited By (5)

Publication number Priority date Publication date Assignee Title
CN107452392A (en) * 2013-01-08 2017-12-08 杜比国际公司 Model-based prediction in critically sampled filterbanks
CN107452392B (en) * 2013-01-08 2020-09-01 杜比国际公司 Model-based prediction in critically sampled filterbanks
US10971164B2 (en) 2013-01-08 2021-04-06 Dolby International Ab Model based prediction in a critically sampled filterbank
US11651777B2 (en) 2013-01-08 2023-05-16 Dolby International Ab Model based prediction in a critically sampled filterbank
US11915713B2 (en) 2013-01-08 2024-02-27 Dolby International Ab Model based prediction in a critically sampled filterbank

Also Published As

Publication number Publication date
US20090308229A1 (en) 2009-12-17
CN101479789A (en) 2009-07-08
EP2038882A2 (en) 2009-03-25
JP2009543112A (en) 2009-12-03
WO2008001316A3 (en) 2008-02-21

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase (Ref document number: 200780024376.4; Country of ref document: CN)
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 07789815; Country of ref document: EP; Kind code of ref document: A2)
WWE Wipo information: entry into national phase (Ref document number: 2007789815; Country of ref document: EP)
WWE Wipo information: entry into national phase (Ref document number: 12306605; Country of ref document: US)
WWE Wipo information: entry into national phase (Ref document number: 2009517552; Country of ref document: JP)
NENP Non-entry into the national phase (Ref country code: DE)
NENP Non-entry into the national phase (Ref country code: RU)