WO2012058650A2 - Low bit rate signal coder and decoder - Google Patents


Info

Publication number
WO2012058650A2
WO2012058650A2 (PCT/US2011/058479)
Authority
WO
WIPO (PCT)
Prior art keywords
model
data
parameters
frame
complete
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2011/058479
Other languages
English (en)
French (fr)
Other versions
WO2012058650A3 (en)
Inventor
Anton Yen
Irina Gorodnitsky
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority to US13/882,195 priority Critical patent/US10084475B2/en
Priority to RU2013124363/08A priority patent/RU2565995C2/ru
Priority to AU2011320141A priority patent/AU2011320141B2/en
Priority to BR112013010518A priority patent/BR112013010518A2/pt
Priority to JP2013536900A priority patent/JP5815723B2/ja
Priority to MX2013004802A priority patent/MX337311B/es
Priority to KR1020137013787A priority patent/KR101505341B1/ko
Priority to EP11837236.6A priority patent/EP2633625A4/en
Application filed by Individual filed Critical Individual
Priority to CN201180063393.5A priority patent/CN103348597B/zh
Publication of WO2012058650A2 publication Critical patent/WO2012058650A2/en
Publication of WO2012058650A3 publication Critical patent/WO2012058650A3/en
Priority to IL226045A priority patent/IL226045A/en
Anticipated expiration legal-status Critical
Priority to US16/044,329 priority patent/US10686465B2/en


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/04: Speech or audio signals analysis-synthesis techniques using predictive techniques
    • G10L 19/08: Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03M: CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M 7/00: Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M 7/30: Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction

Definitions

  • One or more embodiments of the invention generally relate to the field of signal and data modeling, compression/decompression (lossless and lossy), coding/decoding, and analysis such as detection and classification. More particularly, one or more embodiments of the invention relate to an excitation model, and based on it, systems for obtaining new signal models.
  • The process of transforming a source sequence into a set of model parameters is called encoding, and restoring it is referred to as decoding. Therefore, the same methods can be applied to either signal modeling or coding.
  • a coder is assumed to be used in combination with a second process, a decoder which reconstructs the signal from its coded parameters.
  • coding can be viewed as a technique that encompasses modeling as part of its process.
  • an input signal is divided into intervals, often called frames, sections, or events.
  • Each frame can be transformed by windowing and/or filtering, and possibly other operations, to obtain a windowed/filtered/transformed frame.
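The framing step described above can be sketched as follows; the frame length, hop size, and window are illustrative parameters, not values taken from the patent.

```python
import numpy as np

def make_frames(signal, L, hop, window=None):
    # Split a 1-D signal into L-sample frames taken every `hop` samples;
    # hop < L produces overlapping frames. An optional window is applied
    # to each frame before it is returned.
    w = np.ones(L) if window is None else np.asarray(window)
    starts = range(0, len(signal) - L + 1, hop)
    return np.array([signal[s:s + L] * w for s in starts])
```

With a hop of half the frame length, consecutive frames share L/2 samples, which is one common way of creating the overlapping frames mentioned later in the description.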
  • Standard oscillator models transform a current data frame into a small set of parameters consisting of delays or pointers and weight coefficients associated with them.
  • the pointers reference fixed-length blocks in a buffer containing a restored version of the earlier acquired data frames.
  • the restoration of a frame takes place once its model parameters have been estimated, and the restored frame is kept in memory, creating a sequence of historical data that represents a restored version of the input sequence.
  • the blocks of these historic data are chosen so that their weighted sum provides the 'best match' to the current data frame, where 'best match' may be defined, in many typical applications, as the one which minimizes the mean squared error between the current frame and its model.
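The 'best match' search over historical blocks can be illustrated with a minimal single-block sketch; the function name and the one-tap restriction are assumptions for illustration, and practical codecs search several weighted blocks and quantize the results.

```python
import numpy as np

def fit_oscillator(history, frame, min_delay=1):
    # Search all admissible delays d; for each, take the L-sample block
    # of restored history ending d samples before the present, compute
    # the least-squares weight w, and keep the (d, w) pair with minimal
    # mean squared error against the current frame.
    L = len(frame)
    best_d, best_w, best_err = None, 0.0, np.inf
    for d in range(min_delay, len(history) - L + 1):
        block = history[len(history) - d - L : len(history) - d]
        denom = block @ block
        w = (block @ frame) / denom if denom > 0 else 0.0
        err = float(np.sum((frame - w * block) ** 2))
        if err < best_err:
            best_d, best_w, best_err = d, w, err
    return best_d, best_w, best_err
```

For a periodic history the search locks onto the period: a frame that continues the pattern is matched exactly by the block one period back, with unit weight and zero error.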
  • the Adaptive Multi-Rate (AMR) family of codecs used in mobile telecommunications typically utilizes three models in tandem: first a linear predictor (LP) for modeling short-scale patterns, followed by an "adaptive codebook" (AC), which is an improved SEV-like model that can encode mid-to-long scale structures, and finally a third model, which encodes the residual remaining after the first two models have been applied.
  • the AC model in AMR improves on the traditional SEV by allowing some limited section of data from the current input frame to be used for modeling that data. This extends the range of structures that one can model with AC to mid-to-long scale structures. However, this improvement still may not allow modeling of all source scales, which is why LP is used prior to AC in AMR.
  • MBE (Multiband Excitation), IMBE, and AMBE
  • Coding a single frame in a form of multiple models or components means that the frame is represented by the corresponding multiple sets of coding parameters, each typically assigned a fixed coding budget. Encoding signals with multiple sets of parameters may not be efficient if a comparable modeling quality can be achieved with a smaller, single set of parameters. The need to represent signals efficiently in a small set of parameters in order to extract information, maximize transmission rates, and to minimize memory in storage systems, all motivate development of the more efficient coding technologies.
  • Fig. 1 illustrates an exemplary block diagram depicting three basic components of the COMPLETE-based analysis/coding system, in accordance with an embodiment of the present invention;
  • Fig. 2 illustrates an exemplary block diagram of the essential analysis components for estimating parameters of the COMPLETE model (Eq. (1)), which shows basic blocks of the code generating module 170 in greater detail;
  • Fig. 3 illustrates an exemplary block diagram of the COMPLETE synthesizer/decoder that restores the signal from the received COMPLETE parameters, in accordance with an embodiment of the present invention;
  • FIG. 4 illustrates an exemplary block diagram depicting components of a general multimodal COMPLETE/KAM system, in accordance with an embodiment of the present invention;
  • FIG. 5 illustrates an exemplary block diagram depicting an example of a speech analysis (encoding) system utilizing a multimodal COMPLETE/PACT implementation, in accordance with an embodiment of the present invention;
  • FIG. 6 illustrates a typical computer system that, when appropriately configured or designed, can serve as a computer system in which the invention may be embodied.
  • a reference to "a step" or "a means" is a reference to one or more steps or means and may include sub-steps and subservient means. All conjunctions used are to be understood in the most inclusive sense possible. Thus, the word "or" should be understood as having the definition of a logical "or" rather than that of a logical "exclusive or" unless the context clearly necessitates otherwise. Structures described herein are to be understood also to refer to functional equivalents of such structures. Language that may be construed to express approximation should be so understood unless the context clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art to which this invention belongs.
  • references to "one embodiment," "an embodiment," "example embodiment," "various embodiments," etc., may indicate that the embodiment(s) of the invention so described may include a particular feature, structure, or characteristic, but not every embodiment necessarily includes the particular feature, structure, or characteristic. Further, repeated use of the phrase "in one embodiment" does not necessarily refer to the same embodiment, although it may.
  • a commercial implementation in accordance with the spirit and teachings of the present invention may be configured according to the needs of the particular application, whereby any aspect(s), feature(s), function(s), result(s), component(s), approach(es), or step(s) of the teachings related to any described embodiment of the present invention may be suitably omitted, included, adapted, mixed and matched, or improved and/or optimized by those skilled in the art, using their average skills and known techniques, to achieve the desired implementation that addresses the needs of the particular application.
  • a "computer” may refer to one or more apparatus and/or one or more systems that are capable of accepting a structured input, processing the structured input according to prescribed rules, and producing results of the processing as output.
  • Examples of a computer may include: a computer; a stationary and/or portable computer; a computer having a single processor, multiple processors, or multi-core processors, which may operate in parallel and/or not in parallel; a general purpose computer; a supercomputer; a mainframe; a super mini-computer; a mini-computer; a workstation; a micro-computer; a server; a client; an interactive television; a web appliance; a telecommunications device with internet access; a hybrid combination of a computer and an interactive television; a portable computer; a tablet personal computer (PC); a personal digital assistant (PDA); a portable telephone; and application-specific hardware to emulate a computer and/or software, such as, for example, a digital signal processor (DSP), a field-programmable gate array (FPGA), or an application-specific integrated circuit (ASIC).
  • Software may refer to prescribed rules to operate a computer. Examples of software may include: code segments in one or more computer-readable languages; graphical and/or textual instructions; applets; pre-compiled code; interpreted code; compiled code; and computer programs.
  • a "computer-readable medium" may refer to any storage device used for storing data accessible by a computer. Examples of a computer-readable medium may include: a magnetic hard disk; a floppy disk; an optical disk, such as a CD-ROM and a DVD; a magnetic tape; a flash memory; a memory chip; and/or other types of media that can store machine-readable instructions thereon.
  • a "computer system” may refer to a system having one or more computers, where each computer may include a computer-readable medium embodying software to operate the computer or one or more of its components.
  • Examples of a computer system may include: a distributed computer system for processing information via computer systems linked by a network; two or more computer systems connected together via a network for transmitting and/or receiving information between the computer systems; a computer system including two or more processors within a single computer; and one or more apparatuses and/or one or more systems that may accept data, may process data in accordance with one or more stored software programs, may generate results, and typically may include input, output, storage, arithmetic, logic, and control units.
  • a "network” may refer to a number of computers and associated devices that may be connected by communication facilities.
  • a network may involve permanent connections such as cables or temporary connections such as those made through telephone or other communication links.
  • a network may further include hard-wired connections (e.g., coaxial cable, twisted pair, optical fiber, waveguides, etc.) and/or wireless connections (e.g., radio frequency waveforms, free- space optical waveforms, acoustic waveforms, etc.).
  • Examples of a network may include: an internet, such as the Internet; an intranet; a local area network (LAN); a wide area network (WAN); and a combination of networks, such as an internet and an intranet.
  • Exemplary networks may operate with any of a number of protocols, such as Internet protocol (IP), asynchronous transfer mode (ATM), synchronous optical network (SONET), user datagram protocol (UDP), IEEE 802.x, etc.
  • Embodiments of the present invention may include apparatuses for performing the operations disclosed herein.
  • An apparatus may be specially constructed for the desired purposes, or it may comprise a general-purpose device selectively activated or reconfigured by a program stored in the device.
  • Embodiments of the invention may also be implemented in one or a combination of hardware, firmware, and software. They may be implemented as instructions stored on a machine- readable medium, which may be read and executed by a computing platform to perform the operations described herein.
  • “computer readable medium” may be used to generally refer to media such as, but not limited to, removable storage drives, a hard disk installed in hard disk drive, and the like. These computer program products may provide software to a computer system. Embodiments of the invention may be directed to such computer program products.
  • An algorithm is here, and generally, considered to be a self-consistent sequence of acts or operations leading to a desired result. These include physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers or the like. It should be understood, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities.
  • processors may refer to any device or portion of a device that processes electronic data from registers and/or memory to transform that electronic data into other electronic data that may be stored in registers and/or memory.
  • a “computing platform” may comprise one or more processors.
  • a non-transitory computer readable medium includes, but is not limited to, a hard drive, compact disc, flash memory, volatile memory, random access memory, magnetic memory, optical memory, semiconductor based memory, phase change memory, periodically refreshed memory, and the like; however, the non-transitory computer readable medium does not include a pure transitory signal per se.
  • some embodiments of the invention will be referred to here as a coder/decoder, with the understanding that the coding part is equally applicable to signal and data modeling and analysis.
  • compression is often associated with coding discrete memoryless sources - where any existing pattern in the source evolution is treated statistically rather than in a model form.
  • Some embodiments of present invention are at least useful with regard to the type of coding where a data sequence evolves according to some quantifiable rule, and more specifically, useful to obtaining a model - a closed form representation of this rule.
  • a goal of practical embodiments of the present invention is to model/encode as much of the signal information as possible using one parsimonious model, so as to replace, as much as possible, the multiple sets of parameters used for encoding in the current art. Many practical embodiments decode the information back with as much fidelity as possible.
  • the modeling method termed the Complete Oscillator Plus External Excitation (COMPLETE)
  • COMPLETE improves on the current models, in that it may account simultaneously for all scales in the source pattern as well as random features in the data, thus enabling users in many cases to apply a single highly efficient model in place of multiple models used in the existing art.
  • COMPLETE improves on the standard oscillator models as follows.
  • the reconstructed frame may not be equal to the modeled signal.
  • the discrepancy between the restored and the modeled input signal may accumulate with each subsequent frame being modeled and eventually lead to model instability.
  • when the maximal number of data points from the current input is allowed to be used for modeling that input and the entire input content may be encoded with such a model, as provided by this invention, the discrepancy that leads to model instability can grow rapidly, quickly making the model unstable. This is one challenge solved in many practical embodiments of the invention, which ensure model stability by evaluating multiple candidate reconstructed frames during the model estimation process.
  • the standard model fitting methods are extended in the embodiments to incorporate multiple evaluation metrics, which among other functions, ensures model stability.
  • COMPLETE model is not equivalent to a combination of various existing short and long-scale models. Rather, the resulting COMPLETE representations are distinct from any existing models. Furthermore, using COMPLETE to comprehensively capture the entire signal structure at once leads to very parsimonious models with far fewer parameters than the total required for separate short and long-scale models in existing art. Due to this efficiency, a wide range of various signal classes can be modeled with high accuracy according to the invention, some with as few as two model terms. Many practical embodiments of COMPLETE pertain to lossy coding of signals that may contain noise and nonstationary features. Further, some embodiments of the invention may provide lossless coding for completely deterministic sources.
  • Oscillators use redundancies in the structure of the acquired data to develop a model.
  • oscillators are generally considered poorly suited for modeling transient features, such as isolated events, discontinuities, and noise-like features, that are not well defined in the acquired data patterns.
  • Speech is one example of a fast-changing signal in which the ratio between semi-oscillatory and noise-like energy can change abruptly.
  • the COMPLETE model can be robust to some such conditions considered unfavorable for typical oscillators, for example in the presence of colored noise and certain transients, with the degree of COMPLETE robustness being determined by the complexity of the specific chosen functional form of the COMPLETE model and the specific embodiment of the external excitation vectors.
  • the performance of COMPLETE can diminish when such unfavorable conditions are pronounced.
  • the potential loss of performance is not desirable in applications in which a certain level of performance must be met.
  • Many practical embodiments of the invention constitute systems that combine COMPLETE with known-in-the-art methods (KAMs), for the purpose of reaching the desired level of performance of a COMPLETE-based coder or improving the coding efficiency of a KAM.
  • Another utility of such multimodal COMPLETE/KAM systems, according to embodiments of the present invention, is to provide initialization for COMPLETE.
  • innovations to apply the above principles encompass the following: an improved excitation model that extends the range of model references to include a mixture of information derived from data history, the maximal causal range of data from the current input, additional information derived by the system from sources other than data history, and a dictionary of predetermined waveforms; innovative methods for estimating such models that can employ multiple metrics and several different types of model outputs for the purposes of selecting the optimal model and ensuring stability of the model; improved methods for decoding signals that reconstruct the unavailable reference data using the model parameters derived from that data; methods and systems for combining at least some embodiments of the invention with known-in-the-art methods that can be used to initialize COMPLETE and to enhance efficiency of COMPLETE and/or existing coding/compression methods; and a sample speech coder/decoder realized from these innovations, as detailed next.
  • methods for modeling and encoding an input frame use an improved range of model references.
  • all the data from the current input frame, except for the very last point, can be used as internal excitation input.
  • This allows the short- and long-scale patterns in the source to be encoded by one comprehensive source model.
  • external excitations that are not part of the previously acquired frames are included as possible model references.
  • the external excitation vectors enable one to model unstructured features in data as well as recently emerged structures. External excitations can also be used to initialize the COMPLETE model.
  • the mixture of internal and external references allows one to model both the structured and the unstructured content of the signal with a single model.
  • methods are provided for estimating the parameters of such complete oscillator models and for restoring (decoding) signals from such parameters, which use an innovative process of reconstructing unavailable model references point-by-point during both model estimation (the evaluation stage) and decoding.
  • embodiments of the COMPLETE model expand the range of metric options that can be used to estimate the best model tailored to a specific application. Further, a multi-step process is provided for evaluating a single model using multiple metrics. In addition, the expanded range of metrics is applied to different model outputs, including the output that is restored from the model parameters without using data from the current input frame. A key aspect of this improvement comes from the fact that such a restored frame may not be equal to the frame model used to estimate its parameters. The restored model output used in evaluation helps ensure stability of the COMPLETE model across frames.
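The two-output evaluation described above might be sketched as follows; the candidate format and the `synthesize` callback are hypothetical interfaces, and the weighted sum of the two errors is just one possible way to combine metrics.

```python
import numpy as np

def select_stable_model(candidates, frame, synthesize, stability_weight=1.0):
    # Each candidate is (params, modeled_frame). The score combines the
    # fit error of the modeled frame with the error of the frame a
    # decoder would restore from the parameters alone, since the two
    # can differ; penalizing the restored output favors stable models.
    best_params, best_score = None, np.inf
    for params, modeled in candidates:
        fit_err = float(np.mean((frame - modeled) ** 2))
        restored = synthesize(params)
        restore_err = float(np.mean((frame - restored) ** 2))
        score = fit_err + stability_weight * restore_err
        if score < best_score:
            best_params, best_score = params, score
    return best_params
```

A candidate that fits the frame well but restores poorly from its parameters alone is thereby rejected in favor of one that behaves consistently on both outputs.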
  • a method and system for speech coding/decoding is provided based on a multimodal COMPLETE/KAM system.
  • Fig. 1 illustrates an exemplary block diagram depicting three basic components of the
  • COMPLETE-based analysis/coding system in accordance with an embodiment of the present invention.
  • Fig. 1 shows a diagram of the basic blocks of a Complete Oscillator Plus External Excitation (COMPLETE) based encoding system 10.
  • the COMPLETE system 10 can contain a preprocessor 120, which builds an L-sample input frame and which can transform an input, for example by windowing and filtering.
  • the preprocessing operations can also include an optional analog sampling, performed in unit 90, which can convert an analog input into a digital signal.
  • the COMPLETE system 10 can further include COMPLETE generator module 170, which is the main COMPLETE code generating module, and a postprocessor 160, which can organize/store/send the derived code parameters and which can also analyze/transform these parameters.
  • Input can be a signal evolving in time or a spatial vector, such as a data sequence extracted from a 2-D image.
  • Digital or analog input can be supplied to the preprocessing module in Fig. 1. Analog inputs can be first sampled in unit 90 prior to being passed to preprocessor 120, while digital signal inputs can be directly received by preprocessor 120.
  • the subscript 'k' refers to the order in which the frame was acquired. Throughout the description, the subscript 'k' will indicate the current frame Xk being modeled, and the subscript 'k-s', where 's' is an integer value, will refer to the frame acquired 's' frames prior to the current frame.
  • Preprocessor 120 can also filter, window, or otherwise transform an input frame, using known methods that would be appropriate for the application. Further, overlapping frames Xk may be created, where some part of the data from the preceding frame Xk-1 is retained in preprocessor 120 and used to make up a part of the current frame Xk.
  • Fig. 2 illustrates an exemplary block diagram of the essential analysis components for estimating parameters of the COMPLETE model (Eq. (1)), which shows basic blocks of the code generating module 170 in greater detail.
  • COMPLETE generator module 170 can include storage unit 110, reference buffer (RB) 130, model estimator/evaluator 140, and a signal synthesizer (decoder) 150.
  • Unit 140 models the input frame Xk as a function of reference vectors, as described in detail below.
  • the reference vectors are supplied to unit 140 from reference buffer RB 130, which itself receives and organizes inputs from storage unit 110 and preprocessor 120.
  • Storage unit 110 can store some form of history of the received signal and, in some embodiments, the external reference vectors defined below.
  • Storage unit 110 in Fig. 2 stores reference information that can be used to model the current frame.
  • One type of information unit 110 can store is the data history derived from the input received prior to the current frame Xk. Such historical values can provide part of the reference data for the COMPLETE.
  • Each Yk-s stored in 110 has been synthesized from the derived COMPLETE parameters for the corresponding frame 'k-s' by the decoder 150, using methods described more fully below.
  • storage unit 110 can store some form of the actual earlier inputs, optionally transformed as described above in preprocessor 120. If overlapping frames are used, the appropriately weighted actual/restored overlapping frames can be combined in 110 to reproduce the actual/restored input sequence within the overlap.
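The recombination of weighted overlapping frames can be sketched as a standard overlap-add with window normalization; the normalization rule is an assumption, since the patent does not fix a specific weighting.

```python
import numpy as np

def overlap_add(frame_list, hop, window):
    # Sum windowed frames at their hop positions, then divide by the
    # accumulated window weight so overlapped samples are averaged back
    # to the underlying sequence.
    L = len(window)
    n = hop * (len(frame_list) - 1) + L
    out = np.zeros(n)
    wsum = np.zeros(n)
    for i, f in enumerate(frame_list):
        out[i * hop : i * hop + L] += f
        wsum[i * hop : i * hop + L] += window
    wsum[wsum == 0] = 1.0  # avoid division by zero where no frame lands
    return out / wsum
```

If the frames are consistent windowed segments of one sequence, the division by the summed window weight reproduces that sequence exactly within the covered span.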
  • references that can be stored in unit 110 are called 'external references' or 'external excitations', to distinguish them from the 'internal' excitations derived from the previously acquired frames.
  • external references [E1, ..., Eh] can be a set of predefined waveforms; a basic example would be a set of unit-amplitude sinusoids of various frequencies. These waveforms, or the parameters from which they can be synthesized, can be placed in unit 110 prior to the start of the system operation. The choice and the number of such waveforms would typically be dictated by the application at hand and the hardware constraints.
  • external references can be inferred in several of the units of system 10 during its operation.
  • the inferred excitations are encoded by system 10 as a set of parameters that can be used by a decoder to reconstruct the inferred excitation waveforms.
  • preprocessor 120 can estimate parameters that measure noise-like energy in the current frame Xk and use those parameters to generate an external excitation. Such estimation can be done using methods known in the art. For instance, some speech coders use Fourier-transform-based methods to estimate parameters of noise-like energy in the input.
  • when external excitations are inferred in 140 or 150 using an output of an estimated model, some embodiments may use the inferred external excitation, in addition to the existing reference vectors, to estimate a new model for the current data frame.
  • some embodiment options can employ both types of external excitations, the a priori defined and the inferred external references.
  • When at least some of the frame content must be reconstructed using external references, it may be because the frame contains random events, such as a pulse or noise-like energy. Alternatively, it may be because the available signal history does not contain sufficient source pattern information for COMPLETE, for example during the initialization of the COMPLETE system operation. Yet another reason can be a change in the source structure itself, in which case the historical data may not have all the new source features. In the latter two cases, the content modeled by the external references is a part of the source structure, and in the preferred embodiment it is incorporated into the data history. In this case, unit 110 can store the Yk frame that was reconstructed from all the used references, internal and external.
  • unit 110 can store a version of the input reconstructed only from the internal references included in its model, which would correspond to a source model based purely on the past source patterns. Yet, in other embodiments, unit 110 can store a version of the input reconstructed from the internal references included in its model and only some of the included external references. For example, only the a priori defined external references may be used and not the inferred ones. The choice among these options depends on the specific application and many implementations of the above embodiments can be designed based on basic principles.
  • storage unit 110 has a fixed length storage capacity.
  • One cycle of the source pattern is required to model the entire source pattern with COMPLETE, but using more cycles provides robustness when operating in non-ideal environments.
  • the chosen storage capacity of unit 110 would depend on the specific application, but it can also be constrained by other considerations, such as the hardware limits.
  • the size of the storage unit 110 can be maintained at a specified capacity by deleting the oldest internal reference frames each time a new frame Yk is placed in unit 110.
  • a similar strategy may be used to update external reference vectors in some embodiments.
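The fixed-capacity update rule above amounts to a bounded queue of frames; a minimal sketch (the class name is illustrative):

```python
from collections import deque

class FrameStore:
    # Holds at most `capacity` restored frames; appending a new frame
    # Yk once the store is full silently evicts the oldest frame,
    # matching the deletion rule described above.
    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)

    def add(self, frame):
        self.frames.append(frame)
```

The same bounded-queue update could, as noted above, also govern a pool of inferred external reference vectors.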
  • Unit 130 combines and arranges the reference information from unit 110 and all except the most recent point of the current frame Xk, i.e., the L-1 points [x(n-L+1), ..., x(n-1)], which in Fig. 2 are received from preprocessor 120.
  • the information is arranged in the reference buffer labeled as RB in Fig. 2 and it is accessed by units 140 and 150.
  • the vectors in RB are arranged sequentially, starting with the external reference waveforms [E1, ..., Eh], which can be supplied by unit 110 or can be synthesized from their corresponding parameters in 110, if appropriate, prior to being supplied to unit 130; followed by the [Yk-p, ..., Yk-1] sequence derived from data history as described above; and then followed by the most recent L-1 data points of Xk at the end.
  • An infinite number of other buffer configurations can be used for RB, as long as the different parts of RB are indexed consistently by the various units of system 10.
  • storage unit 110 does not contain frames of historical data when the system starts its operation and such data can also be cleared in some embodiments anytime the source pattern changes significantly.
  • the COMPLETE system 10 can start to generate data history in storage unit 110 using a priori provided external references and/or the L-1 data points [x(n-L+1), ..., x(n-1)] from the current input Xk, using all except the most recent point.
  • a full L-point reference block can be produced in this case from the current input by augmenting the L-1 input points with an additional point which can be created in unit 130 by either repeating one of the existing points, for example by creating [x(n-L+1), x(n-L+1), ...
  • An L-point reference block then can be created from the L-1 point output of such a model by either repeating some data point in the restored frame or by extrapolating a data point from some of the points in the restored frame. Once such an L-point reference block is created, it can be stored in unit 110 and used as a reference for modeling subsequent frames.
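The two options for completing an L-point block from L-1 points can be sketched as below. Both helper names are hypothetical, and the linear extrapolation rule in the second helper is an illustrative choice among many possible extrapolation rules.

```python
def block_by_repeating(points):
    """Create a full L-point reference block from L-1 points by repeating
    the earliest point, i.e. [x(n-L+1), x(n-L+1), ..., x(n-1)]."""
    return [points[0]] + list(points)

def block_by_extrapolating(points):
    """Alternative: append one point linearly extrapolated from the last
    two points (the extrapolation rule here is an assumed example)."""
    return list(points) + [2 * points[-1] - points[-2]]
```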
  • storage unit 110, preprocessor 120, and buffer 130 do not need to be implemented as physically separate components, but can be implemented in a single software or hardware component, or split arbitrarily across multiple components, as long as all the information can be accessed through appropriate software or hardware to emulate the operations described above.
  • the data sequences in units 110, 120, and 130 can refer to all types of transformed, filtered, windowed, or approximated versions of input data.
  • the sequences in storage unit 110, preprocessor 120, and buffer 130 can refer to some component, such as a wavelet component, of some raw/transformed/approximate version of the input signal.
  • Estimator/evaluator unit 140 performs the COMPLETE model estimating function.
  • G denotes the specified function form, linear or nonlinear
  • N is the number of Bd(i) blocks used in Eq. (1);
  • Bd(i) is the i-th block, referred to as the i-th 'tap', that is drawn from the RB in 130 beginning with the entry d(i) of RB;
  • d(i) denotes the i-th delay (may also be written as di for convenience) and is the pointer to the entry in RB 130 that is the first element in the block Bd(i).
  • the COMPLETE model defined by Eq. (1) accounts simultaneously for the short- and long-scale patterns as well as unstructured components in the input.
  • the set of delays {di} provides a means of identifying the appropriate blocks {Bd(i)} in RB 130.
  • Model estimation amounts to estimating the values of delays ⁇ di ⁇ and any variables that are specific to the function G.
  • the function form represented by G is typically tailored to the particular application at hand, and can be nonlinear. In many applications, however, linear COMPLETE containing a few taps is sufficient to model with high accuracy many types of signals. For this reason, to facilitate the description of the more pertinent features of the present embodiment of the invention, the subsequent description of the current mode will place emphasis on linear COMPLETE, with nonlinear COMPLETE being envisioned in some embodiments of the present invention.
  • Linear COMPLETE expresses Xk as a linear combination of blocks {Bd(i)}:
  • N, di, Bd(i) are as defined above in Eq. (1);
  • ai is the relative scaling (weight) of the corresponding block Bd(i);
  • the parameters {ai, di} can be estimated in COMPLETE estimator/evaluator 140 by adapting one of several known methods for fitting parametric models to data.
  • the embodiments described here generally perform two basic procedures. The first procedure generates a multitude of candidate models. The results are then passed to the second procedure, which evaluates the candidate models and selects the best performing one.
  • the 'best performing' model is defined as the model that provides the best outcome as measured by preset criteria.
  • [Bd(i)] is a matrix whose columns are the blocks Bd(i) selected from 130;
  • inv[Bd(i)] is a pseudo-inverse of the [Bd(i)] matrix, computed using known methods, for example the singular value decomposition method;
  • [a1, ..., aN]' is a column vector composed of the coefficients {ai};
  • X'k is a column vector composed of the elements of the input Xk.
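The pseudo-inverse solution of Eq. (3) for one candidate delay set can be sketched numerically as follows. This is a minimal illustration under assumed conventions (1-D reference buffer, block Bd(i) starting at index d(i)); the function name and demo values are hypothetical, and NumPy's SVD-based `pinv` stands in for the "known methods" mentioned above.

```python
import numpy as np

def fit_linear_complete(rb, xk, delays):
    """Solve Eq. (3) for one candidate delay set {d_i}:
    [a_1, ..., a_N]' = inv[B_d(i)] X'_k, via the Moore-Penrose
    pseudo-inverse (computed internally by singular value decomposition).

    rb     : 1-D reference buffer (external refs, history, recent points)
    xk     : current input frame of length L
    delays : candidate delays; d_i indexes the first element of B_d(i)
    """
    L = len(xk)
    B = np.column_stack([rb[d:d + L] for d in delays])  # columns are blocks B_d(i)
    a = np.linalg.pinv(B) @ xk                          # pseudo-inverse solution
    xk_hat = B @ a                                      # model estimate of X_k, Eq. (2)
    return a, xk_hat

# Demo on synthetic data: xk is constructed as 2*B_0 + 1*B_3, so the
# fitted weights should recover {2, 1} (all values here are illustrative).
rb = np.arange(1.0, 11.0)            # toy reference buffer
xk = 2.0 * rb[0:4] + 1.0 * rb[3:7]   # frame of length L = 4
a, xk_hat = fit_linear_complete(rb, xk, delays=[0, 3])
```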
  • COMPLETE parameters may be quantized, by means described below.
  • the third step computes, for each candidate set {ai, di}j, the COMPLETE model output as defined in detail later.
  • A number of implementation options exist for steps 1 and 3 in particular. Several implementations of step 1 (and the corresponding adjustments to step 2) are described next.
  • [0070] Three methods for generating candidate parameter sets are described below, which can correspondingly use an exhaustive search, a sequential search, and a constrained search to generate candidate delay sets. All three methods draw delay values from the integer set D, which itself is constructed from the range [1, maxd]. Some values in the [1, maxd] set do not provide meaningful delays and do not need to be included as part of D. Specifically, all the points in the sequence composed of [Yk-P, ..., Yk-1] typically provide meaningful references, so the delays that index Bd(i) blocks of these points would typically be included in D. On the other hand, blocks Bd(i) that span two external references, e.g. Ei and Ei+1, or the external reference EH and Yk-P, typically do not provide meaningful references, so the delays which index those blocks would not be included in D.
  • An exhaustive search method selects combinations of N delays from D and then computes the corresponding coefficients {a1^, a2^, ..., aN^} for each combination by solving Eq. 3 above.
  • the exhaustive search method can generate all feasible sets of delay and the corresponding coefficient values first and then evaluate the outcomes from all the resulting candidate models to identify the optimal parameter set that produces the best model outcome.
  • feasible sets of parameters can be generated and evaluated in sets of groups.
  • One of many existing intelligent search methods, such as evolutionary programming, may be used to implement exhaustive searches.
  • the exhaustive search method can produce the closest to the optimal model among the three search methods but it can be computationally expensive when estimating COMPLETE models that contain more than two delays.
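The exhaustive search can be sketched as below, using the MSE of Eq. (4) as the selection criterion. This is an illustrative implementation under assumed conventions (blocks indexed from the start of the buffer, evaluation against the unquantized model output); the demo data and names are hypothetical.

```python
import numpy as np
from itertools import combinations

def exhaustive_search(rb, xk, D, N):
    """Sketch of the exhaustive search: try every combination of N delays
    from the feasible set D, compute the coefficients by solving Eq. 3
    (pseudo-inverse), score each candidate model by its MSE (Eq. 4), and
    keep the best. Returns (mse, delays, coefficients)."""
    L = len(xk)
    best = None
    for delays in combinations(D, N):
        B = np.column_stack([rb[d:d + L] for d in delays])
        a = np.linalg.pinv(B) @ xk
        mse = float(np.mean((xk - B @ a) ** 2))
        if best is None or mse < best[0]:
            best = (mse, delays, a)
    return best

# Demo: xk is built from the blocks at delays 2 and 7 of a random buffer,
# so the search should recover exactly that pair (toy data, fixed seed).
rng = np.random.default_rng(0)
rb = rng.standard_normal(12)
xk = 1.5 * rb[2:6] - 0.5 * rb[7:11]
mse, delays, a = exhaustive_search(rb, xk, D=range(8), N=2)
```

As the surrounding text notes, the cost grows combinatorially with N, which is why the sequential and constrained alternatives below exist.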
  • Another method is a sequential search approach, which uses an iterative process where each step finds a subset of best parameter values.
  • the optimal d2* value is found by evaluating all the candidate two-delay COMPLETES in which the d1* value is kept fixed to the optimal value found in the first iteration, and the candidate d2^ values are selected from D.
  • the coefficients {a1^, a2^} corresponding to a candidate set of delays {d1*, d2^} are calculated by solving Eq. 3 as before. Note that while the value of d1* is fixed after the first iteration, the value of the coefficient a1 is not and must be recalculated in each subsequent iteration. The process repeats until the optimal values for all the COMPLETE parameters are obtained.
  • the sequential search method can produce near-optimal results at a significantly lower complexity than the exhaustive search.
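The sequential (greedy) procedure just described can be sketched as follows. The key detail from the text is preserved: fixed delays stay fixed, but all coefficients are recomputed at every iteration. Names and demo data are hypothetical.

```python
import numpy as np

def sequential_search(rb, xk, D, N):
    """Sketch of the sequential search: each iteration fixes the delays
    found so far, tries every remaining candidate delay from D for the
    next tap, and keeps the one minimizing the MSE. Coefficients are
    recomputed in full at every iteration; only delays stay fixed."""
    L = len(xk)
    chosen, a, mse = [], None, None
    for _ in range(N):
        best = None
        for d in D:
            if d in chosen:
                continue
            B = np.column_stack([rb[c:c + L] for c in chosen + [d]])
            a_cand = np.linalg.pinv(B) @ xk
            m = float(np.mean((xk - B @ a_cand) ** 2))
            if best is None or m < best[0]:
                best = (m, d, a_cand)
        mse, d_star, a = best
        chosen.append(d_star)
    return chosen, a, mse

# Toy demo: the first iteration picks delay 2 (the best single tap here),
# and the second iteration finds a partner that makes the fit exact.
rb = np.array([3.0, 0.0, 0.0, 4.0, 0.0, 0.0])
xk = np.array([6.0, 8.0])
chosen, a, mse = sequential_search(rb, xk, D=[0, 1, 2, 3, 4], N=2)
```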
  • the third method is a constrained search which can combine certain aspects of the preceding two methods.
  • a sequential search is performed as described above to produce a "seed" estimate {d1^, d2^, ..., dN^}.
  • the exhaustive search procedure described above is used within this constrained candidate delay range to generate new sets of candidate parameters and evaluate the corresponding model outcomes to identify the parameter values that produce the best model outcome. The performance of this method is typically between that of the first two methods.
  • candidate parameter sets can be sorted in the order of decreasing values of the smallest delay in each set, so that models referencing the most recent data history (indexed by largest delays) can be evaluated first during the candidate model evaluation process.
  • the candidate model evaluation/selection process can then terminate when the first model that meets the desired performance criteria is found.
  • Another strategy which can be used by itself or in conjunction with the previous strategy, is to generate only a subset of candidate parameter sets that correspond to the most preferable references and evaluate this subset first. As with the first strategy, the candidate model evaluation/selection process can terminate if a model that meets the desired performance criteria is found from the first subset. Otherwise, candidate sets involving less preferable references can be created and evaluated next.
  • More advanced embodiments can include complex trade-off criteria that can allow users to favor specific references in the models even when the choice leads to subpar results. For example, in some embodiments, a predetermined loss in performance can be allowed for models which depend only on 'internal' references [Yk-P, ..., Yk-1], if such models are preferred. In general, many other criteria are possible.
  • the first procedure outputs a set of parameterized candidate models derived according to Eq. (2). Adaptations of various methods that can be used by estimator/evaluator 140 for the purpose of evaluating the quality of these candidate models are described next.
  • a model is defined by an equation
  • its output is computed using this equation.
  • such a standard output X^k(j) would be used to evaluate model quality.
  • a common measure in this regard is the mean squared error (MSE), given for the instance of candidate parameter set {ai, di}j as:
  • the best candidate model is identified in this case as the model which minimizes the MSE.
  • a large number of metrics can be substituted in place of the MSE to analyze statistical, temporal and frequency domain properties of a model output.
  • the choice of a metric or metrics can be determined by the needs of the specific application.
  • the invention without limitation, covers use of any metric, either existing in the art or designed based on basic known principles, individually or in conjunction with other known metrics, for the purpose of evaluating the quality of the candidate COMPLETE models. Several such metrics will be described later in this section.
  • a second model output, which is Yk synthesized in synthesizer 150 from the parameters supplied by unit 140, is used as part of the model evaluation/selection process and it is also used in creating data history references in unit 110.
  • the synthesized Yk may not be equal to the estimated X^k for a given parameter set in the case of COMPLETE, because Yk may be derived from the reconstructed data in the input frame, while X^k references the actual input Xk.
  • both estimates, X^k and Yk, can be computed using approximate rather than exact model parameters, for example quantized parameters, where quantization is done by means described below.
  • the existing metrics used to evaluate closed-form models, such as the MSE in Eq. (4), can be modified where all instances of the model estimate X^k are replaced with an estimate X computed using Eq. (2) with approximate, e.g. quantized, parameters, or alternatively replaced with an estimate Yk synthesized either from the exact or approximate model parameters.
  • it is more meaningful to use certain metrics with the synthesized Yk and other metrics with X^k, and the choice would be determined by the application at hand.
  • model evaluations based on Yk can be viewed as related to the analysis-by-synthesis technique in which system outputs synthesized from various inputs are compared to select the best output.
  • a metric based on synthesized output used in the COMPLETE model selection process in many embodiments is coupled with other metrics, typically involving Xk or X^k, so the entire COMPLETE model estimation procedure comprises a multi-step evaluation process.
  • an embodiment that uses two metrics sequentially may first generate q best candidate models according to the minimum MSE criterion given by Eq. (4), then synthesize in synthesizer (decoder) 150 the q outputs Yk from the quantized parameters of these q best candidate models and pass the q synthesized outputs to estimator/evaluator unit 140, where they are then evaluated using a second metric, for instance the PWE perceptual metric given in Eq.
  • Mode 3 of the present embodiment of the invention provides a specific case of a speech coder that utilizes multiple evaluation metrics.
  • the design of evaluation procedures based on multiple metrics is a part of the COMPLETE that not only adapts it to practical applications but is used to ensure stability of the COMPLETE model for the expanded range of references that it employs.
  • Non-limiting examples of some metrics are given next, written for the instance of evaluating Yk, but, as stated above, these and other metrics can alternatively be utilized with Xk or X^k to evaluate COMPLETE quality, if justified by a given application.
  • SIGNAL-TO-NOISE RATIO Maximum signal-to-noise ratio (SNR) is a common criterion used for selecting the optimal model. For Yk, SNR can be computed as
  • fs is the sampling frequency;
  • f is the frequency bin of interest, ranging over [0, fs];
  • j is the index of the candidate parameter set {ai, di}j.
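As a concrete illustration of the SNR criterion, a sketch follows. Note an assumption: the SNR definition above is written over frequency bins, while this sketch uses the equivalent unweighted time-domain form (equal by Parseval's relation); the function name is hypothetical.

```python
import math

def snr_db(x, y):
    """SNR of a restored frame y against the input frame x, in dB:
    10*log10( sum(x^2) / sum((x - y)^2) ). Time-domain form assumed;
    the candidate maximizing this quantity would be preferred."""
    signal = sum(v * v for v in x)
    noise = sum((v - w) ** 2 for v, w in zip(x, y))
    return float('inf') if noise == 0.0 else 10.0 * math.log10(signal / noise)
```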
  • PWE denotes the perceptually-weighted error.
  • | · | indicates the magnitude spectrum;
  • Xk(u) = [x(n-u-L+1), ..., x(n-u-1)] denotes the length-L data sequence that has latency u-1 with respect to the last point of the current frame.
  • model parameters may be quantized in the process of their estimation in 140 or afterwards in post-processor 160.
  • Quantization can be implemented using any number of methods from the existing art, including but not limited to vector quantization for the coefficients ⁇ ai ⁇ , scalar quantization for the delays ⁇ di ⁇ , and all derivatives thereof.
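Two of the quantization options named above can be sketched minimally as follows. These are generic textbook forms, not the patent's specific quantizers; the step size and codebook are illustrative assumptions.

```python
def quantize_uniform(values, step):
    """Scalar quantization sketch for the coefficients {ai}: round each
    value to the nearest multiple of `step` (step size is illustrative)."""
    return [round(v / step) * step for v in values]

def vq_index(vec, codebook):
    """Vector quantization sketch: return the index of the codebook entry
    nearest to `vec` in squared Euclidean distance. The codebook itself
    is a hypothetical example; real codebooks would be trained."""
    dists = [sum((a - b) ** 2 for a, b in zip(vec, c)) for c in codebook]
    return dists.index(min(dists))
```

The delays {di} are already integers, so they can be coded directly with a scalar scheme, while the coefficient vector is a natural candidate for vector quantization.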
  • the outcome of the evaluations performed in estimator/evaluator 140 is the parameter set {ai, di} that produces the best model outcome. If the optimal model utilizes inferred external references, the parameters needed to reconstruct these references also become part of the output code from unit 140.
  • the COMPLETE code for the input frame can include model parameters and, if applicable, parameters for the external references.
  • the final code can be outputted from estimator/evaluator 140 to synthesizer 150 and also to the post-processor 160 for storage and/or transmission. According to the embodiment shown in Fig. 2, if the desired form of the optimal restored frame Yk was not saved during the model estimation process, it is synthesized in unit 150 from the supplied parameters and outputted to storage unit 110.
  • Unit 160 can further process or transform the COMPLETE code prior to storing/transmitting it by means appropriate for the application at hand. For example, if the parameters have not been quantized in 140, they may be quantized in post-processor 160 using methods in existing art as stated above.
  • the analysis steps described above can be transferred in a straightforward way to any nonlinear model that consists of a weighted sum of linear and nonlinear functions of Bd(i). Further, a general nonlinear function can be approximated by a truncated polynomial expansion which consists of a weighted sum of the Bd(i) blocks themselves as well as the elements of the Bd(i) blocks raised to some power.
  • the analysis methods described above can be adapted to estimate such polynomial expansions of nonlinear models, as follows.
  • the delays di can index data blocks in the reference buffer 130 as before.
  • the blocks of data Bd(i) are retrieved for the selected delay values as above and are used to compute sets of new blocks {Bterm(c)}, where each Bterm(c) corresponds to the c-th term of the polynomial expansion.
  • their corresponding weights in the polynomial expansion are computed analogous to the coefficients {ai} in the linear COMPLETE case, by substituting in Eq. 3 the terms {Bterm(c)} for the {Bd(i)}.
  • the rest of the analysis can proceed as described for linear COMPLETE above.
  • Fig. 3 illustrates an exemplary block diagram of the COMPLETE synthesizer/decoder that restores the signal from the received COMPLETE parameters, in accordance with an embodiment of the present invention.
  • Decoding refers to the operation performed in synthesizer 150 in Fig 2 and also in unit 250 in Fig 3.
  • the synthesizer (decoder) 150/250 restores to some precision the original input frame from the supplied parameters.
  • the synthesis parameters are inputted from estimator/evaluator 140, and in the case of the standalone decoding system in Fig. 3, the parameters are obtained from the transmitted/stored code.
  • unit 210 stores and arranges the restored 'signal history' [Yk-P, ..., Yk-1] and also any a priori defined external references, either their actual waveforms or the parameters needed for their generation.
  • the arrangement in 210 mirrors the arrangement of these references in unit 130 in Fig. 2.
  • the parameters needed to generate inferred external references, if any, are also supplied as part of the transmitted/stored code to the decoder in Fig 3 and are used to generate these external reference waveforms.
  • the current frame Yk is restored from the supplied parameter set analogous to the model defined in Eq. (2).
  • synthesizers 150/250 synthesize the entries of Y k point-by-point, beginning with the earliest point of the current frame and advancing toward the end of the frame, estimating each point as:
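The point-by-point synthesis procedure can be sketched as below. This is an assumed reading of the linear case: the working buffer grows with each synthesized point, so a tap whose reference reaches into the current frame reads already synthesized points rather than the unavailable original input. The indexing convention and function name are assumptions of this sketch.

```python
def synthesize_frame(rb, a, delays, L):
    """Point-by-point linear COMPLETE synthesis sketch (units 150/250).
    rb is the decoder's reference buffer (history plus any external
    references); a and delays are the received model parameters."""
    buf = list(rb)
    for m in range(L):
        # each tap contributes one point of its block B_d(i), offset m into it
        y = sum(ai * buf[d + m] for ai, d in zip(a, delays))
        buf.append(y)  # later points may reference this synthesized point
    return buf[-L:]    # the restored frame Y_k
```

The second assertion below shows the recursive behavior: once the delay index runs past the original buffer, the synthesis feeds on its own output.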
  • Fig. 4 is an exemplary block diagram that illustrates components of a general multimodal COMPLETE/KAM system, in accordance with an embodiment of the present invention.
  • Fig. 4 shows the general structure of a hybrid, multi-mode COMPLETE/KAM system 400, which encodes an input frame by choosing among various forms of the COMPLETE units 10a, 10b, ..., 10n, and various known in the art methods (KAMs) 405a, 405b, ..., 405n, and combinations thereof, the various blocks and units of which may be implemented in hardware, software, or a combination thereof.
  • the embodiments of system 400 can provide two practical functions: 1) Initialization of the COMPLETE; and 2) Improving performance of a KAM or alternatively, performance of the COMPLETE in applications where the COMPLETE by itself does not provide the desired level of performance.
  • 'pattern-breaking' events in the input signal such as significant rises in unstructured energy, discontinuities, and transients that occur on a short time-scale relative to the scale on which on-going patterns in the signal evolve, can negatively impact COMPLETE performance.
  • KAM 405 can be used to encode some parts of the signal and COMPLETE can be used to encode other parts, to enhance the overall performance over what can be provided by the KAM or the COMPLETE alone.
  • the COMPLETE/KAM system 400 in Fig. 4 can include a preselector 410, which can analyze the input signal Xk and choose which COMPLETE 10, KAM 405, or combination of COMPLETE 10/KAM 405 is to be used to model Xk; a COMPLETE/KAM encoding module 415, which can contain a bank of various COMPLETE 10 and KAM 405 model estimation units, which can be activated by the preselector 410 and/or the postselector 430; a storage unit 420, which contains restored earlier input frames [Yk-P, ..., Yk-1] that can be accessed by the COMPLETES 10 and, if required, by the KAM 405 unit(s) of COMPLETE/KAM Module 415; and a postselector 430, which routes the relevant output from the ultimately selected 'best' model to storage 420 and postprocessor 440 and, optionally, can evaluate the outputs of the candidate models supplied from COMPLETE/KAM Module 415.
  • the data preprocessing functions in preselector 410 that produce input frames for the multimodal COMPLETE/KAM system can be analogous to the preprocessing functions in the preprocessor 120 of Fig. 1 described in Mode 1, so the description of this component and associated preprocessing steps is not repeated for Mode 2.
  • the functions in postprocessor unit 440 in Fig. 4 can be implemented analogously to the functions in postprocessor 160 in Mode 1 (see Fig. 1), with the exception that postprocessor 440 can perform an additional function, which is to package the parameters of the final model together with their model identifier code.
  • the model identifier code is supplied to 440 by postselector 430, along with the model parameters, and consists of the minimum number of bits necessary for a given system to specify which model or combination of models has been used for encoding the current frame.
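The "minimum number of bits" for the identifier code follows directly from the number of model combinations a given system supports. A small sketch (the function name is hypothetical):

```python
import math

def identifier_bits(num_models):
    """Minimum number of bits needed to specify which of `num_models`
    models or model combinations encoded the current frame."""
    return max(1, math.ceil(math.log2(num_models)))
```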
  • Preselector 410 and postselector 430, and COMPLETE/KAM Module 415 in Fig. 4 provide the main blocks for various embodiments of the COMPLETE/KAM system. Embodiments of three basic implementations of the COMPLETE/KAM system will be described below following the description of operations performed by units 410 and 430, and COMPLETE/KAM Module 415.
  • preselector 410 can select a set of COMPLETES 10 and/or KAMs 405.
  • preselector 410 can be a simple de-multiplexer that selects between just two models, a single COMPLETE 10 and a single implemented KAM 405, and may, optionally, also select a combination of the two.
  • preselector 410 can perform sophisticated processes of selecting methods in COMPLETE/KAM Module 415 based on the nature of the input signal Xk. Such selection processes can involve computing parameters for the input Xk that reflect statistical and/or deterministic properties of the signal, analyzing these properties and then using the results to select the combinations of multiple COMPLETES 10a-10n and KAMs 405a-405n to model Xk.
  • the computed parameters for the input X k can reflect any number of statistical, temporal, frequency, and time-frequency properties of the signal, which can be obtained using prior art methods.
  • the computed parameter values can be analyzed relative to preset baselines/thresholds or other predetermined metrics.
  • preselector 410 can be used for detecting 'pattern-breaking' events.
  • preselector 410 can analyze the consistency of certain parameters across the current and the preceding frames, using known methods. For example, preselector 410 can compare the distribution of the deterministic energy in Xk relative to that in some preceding input frames [Xk-P, ..., Xk-1]. The distribution can be measured, for example, by computing the fundamental frequency (called pitch period in speech coding) and other parameters which can reveal the proportion of quasi-periodic energy (V) and noise-like energy (U) in the frames. These parameters can be estimated using known in the art methods.
  • some speech coders compute U and V parameters using Fourier Transform (FT) based methods, such as Fast Fourier Transforms (FFTs), to make voiced/unvoiced determination for each frame.
  • the computed parameters V(t, w), for the quasi-periodic energy and U(t, w), for the noise-like energy are functions of time (t) and frequency (w).
  • Other known methods for computing these parameters can also be used.
  • the computed distribution of the quasi-periodic and noise-like energy in time and frequency in the given frame relative to the distribution of these quantities in the preceding frames could control whether and how many COMPLETES 10 and KAMs 405 can be selected by preselector 410.
  • Such control process can be implemented in a number of known ways as well, taking into account the desired quality of the output.
  • the distribution of the quasi-periodic energy V and noise-like energy U can be partitioned into ranges or bins and a particular choice of a COMPLETE and/or KAM can be assigned to each bin.
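The bin-based assignment described above can be sketched as follows. The thresholds, method labels, and the use of a voiced-energy ratio are illustrative assumptions of this sketch, not values from the source.

```python
def select_method(voiced_ratio):
    """Sketch of bin-based preselection: the ratio of quasi-periodic
    energy V to total energy V + U is partitioned into ranges, and each
    range is assigned a modeling method. Thresholds are hypothetical."""
    if voiced_ratio >= 0.7:
        return "COMPLETE"        # strongly patterned input
    if voiced_ratio >= 0.3:
        return "COMPLETE+KAM"    # mixed content: combine methods
    return "KAM"                 # noise-like input
```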
  • Preselector 410 can also receive control commands from external resources, which can modify the settings in preselector 410 or, alternatively, the commands can be integrated as part of the decision logic in preselector 410.
  • knowledge of when 'pattern-breaking' events occur may be available outside the COMPLETE/KAM system, in which case the external command can supply this information to preselector 410, thus freeing preselector 410 from performing such analyses.
  • COMPLETE/KAM Module 415 in Fig. 4 contains a bank of one or more COMPLETE 10 and KAM 405 estimators. Each COMPLETE 10 in COMPLETE/KAM Module 415 estimates a different functional form of the COMPLETE. For example,
  • COMPLETE/KAM Module 415 can contain a bank of 4 COMPLETE units, where each individual unit estimates a linear COMPLETE with a specific number of delays, ranging from 1 to 4. Each COMPLETE 10 or KAM 405 can be assumed to stay inactive until it is switched "on” by an input either from preselector 410, postselector 430, or from another COMPLETE 10 or KAM 405 within COMPLETE/KAM Module 415. Thus, the COMPLETE and KAM units can be switched "on” and applied to the provided input individually or in various combinations, that is in-series, in-parallel, or a mix of in-series and in-parallel combinations.
  • the first selected unit encodes X k
  • the next unit encodes the residual output of the first, and so forth, the end result being a serial model, for example (COMPLETE 10a + KAM 405a + ... + KAM 405e).
  • the first selected unit encodes a part of X k
  • the next unit encodes another part of X k
  • these described approaches can be used in conjunction with each other to create any combination of COMPLETES and KAMs.
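The in-series combination, where each successive unit encodes the residual of the previous one, can be sketched generically as below. Each `model` here is a hypothetical callable standing in for a COMPLETE 10 or KAM 405 estimator; the demo stages are illustrative only.

```python
def serial_encode(xk, models):
    """Sketch of the in-series combination: the first selected unit
    models the frame, the next models the residual of the first, and so
    on. The restored frame is the sum of all stage outputs."""
    residual = list(xk)
    stage_outputs = []
    for model in models:
        approx = model(residual)          # this stage's approximation
        stage_outputs.append(approx)
        residual = [r - a for r, a in zip(residual, approx)]
    restored = [sum(vals) for vals in zip(*stage_outputs)]
    return restored, residual

# Demo: a coarse "rounding" first stage followed by a perfect residual stage.
coarse = lambda v: [float(round(x)) for x in v]
perfect = lambda v: list(v)
restored, residual = serial_encode([1.4, 2.6], [coarse, perfect])
```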
  • KAM 405 units can use known methods to estimate their respective models. For the COMPLETE 10 units in Fig. 4, the same implementations can be used as for COMPLETE estimator/evaluator 140 and synthesizer 150 of COMPLETE 10 described above (see Fig. 2), with the following exception.
  • the evaluation of candidate model quality which was described for COMPLETE evaluator/estimator 140 above may be split in Mode 2 between the model estimation units of COMPLETE/KAM Module 415 and postselector 430. The way this part of the process may be split can depend on the choice of a particular COMPLETE/KAM system implementation, with some choices being described more fully below.
  • model evaluation function may be divided between COMPLETE/KAM Module 415 and postselector 430, however, the overall process and the metrics used for evaluating the candidate models to select the optimal model for the given method are analogous to those described for unit 140. Further, it should be noted that in several embodiments, candidate model outputs are synthesized within the respective COMPLETE or KAM estimation unit. An alternative embodiment can use other components within or outside COMPLETE/KAM Module 415 to synthesize these model outputs for some of the embodiments.
  • COMPLETE/KAM Module 415 shown in Fig. 4 are used to represent the different forms of the COMPLETE 10 and the KAM strictly for the sake of clarity of the description. Estimation of several model types can be accomplished within a single unit or split in some way across several units, in which cases software or hardware would be used to select the specific terms appropriate for the desired model. For example, instead of using four separate units to estimate the four linear COMPLETES, each having a different number of delays ranging from 1 to 4, COMPLETE/KAM Module 415 may have a single COMPLETE unit allowing up to four delays and the desired number of delays would be chosen during the model estimation process.
  • the modeling results can be supplied to postselector 430 for further processing.
  • postselector 430 can receive results from the COMPLETE/KAM Module 415 and may assess the supplied results.
  • the choice of a particular logical structure of the COMPLETE/KAM system 400 controls how much processing is performed in postselector 430.
  • One function that can be performed in postselector 430 is an evaluation of analysis outcomes received from COMPLETE/KAM Module 415.
  • Two types of evaluations can be performed. The first type evaluates model quality and can be used to help select among the various candidate models obtained from a single modeling method. The second type of evaluation can be used to choose among the results obtained from different COMPLETES 10 and/or KAMs 405 in COMPLETE/KAM Module 415.
  • the first type of evaluation can be implemented in postselector 430 using methods for evaluating model quality which were described for COMPLETE
  • the second type can be implemented using the same methods for evaluating model quality as in the COMPLETE estimator/evaluator 140 in Mode 1, but it can also include performance measures other than those related to the model quality. Examples include coding efficiency in terms of the number of bits required to encode the given parameter set, computational complexity, model robustness with respect to environmental noise, quantization robustness, and other performance metrics that are known in the art and suitable for the specific applications at hand. All these known measures can be computed using prior art methods. Further, multiple performance metrics can be used in conjunction with each other and with measures related to model quality, in which case the evaluation would typically involve a performance trade-off based on multiple metrics. One example is a selection criterion that involves a trade-off between the model quality and coding efficiency.
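One simple way to realize a quality/efficiency trade-off criterion is a weighted score, sketched below. The tuple layout, weights, and candidate values are illustrative assumptions.

```python
def select_best(candidates, quality_weight=1.0, bits_weight=0.1):
    """Sketch of a multi-metric trade-off selection: each candidate is a
    (name, mse, bits) tuple, and the combined score weighs model quality
    (MSE) against coding cost in bits. Weights are hypothetical."""
    score = lambda c: quality_weight * c[1] + bits_weight * c[2]
    return min(candidates, key=score)

# Candidate "A" fits slightly better but costs far more bits, so the
# trade-off favors candidate "B".
best = select_best([("A", 0.10, 40), ("B", 0.12, 12)])
```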
  • the evaluation outcome can control the decision process made in postselector 430.
  • One implementation option is for postselector 430 to always select the best model according to some preset criteria and this model is taken as the final outcome of the analysis, in which case postselector 430 outputs the selected model parameters together with the model identifier code to postprocessor 440, and, if available, outputs to storage 420 the final Yk frame restored from the parameters of the selected optimal model. If the final Yk is not available, postselector 430 instructs COMPLETE/KAM Module 415 to synthesize this Yk and to output it to unit 420. Alternatively, in some implementation options postselector 430 can choose to continue the model estimation process in COMPLETE/KAM Module 415.
  • postselector 430 turns on the selected model estimation units in COMPLETE/KAM Module 415 and supplies any necessary input to them.
  • the data frames supplied to units in 415 through postselector 430 may contain some form of data derived from Xk or, alternatively, this input may be obtained from a previous iteration, for instance the residual error obtained from a previous iteration.
  • These model estimation steps may be repeated iteratively until postselector 430 chooses the final model and terminates the model estimation process by outputting the selected model parameters together with the model identifier code to unit 440, and also outputting to unit 420 the Y k frame restored by the selected model, as described immediately above.
  • There are three basic logical structures for the COMPLETE/KAM system 400, which can combine in different logical sequences the various functions performed in preselector 410, COMPLETE/KAM Module 415 and postselector 430. These embodiments can be referred to as Decision-First (DF), Decision-Later (DL), and Mixed-Decision (MD) embodiments. A specific example of an MD embodiment for a speech coder will be provided in Mode 3.
  • the DF embodiment makes all the decisions regarding the choice of a model in the preselector 410 in Fig. 4, and selects one specific method, which can be a COMPLETE, a KAM, or a combination of COMPLETES and/or KAMs, for encoding a given frame X k .
  • a basic example of the DF embodiment is a system for COMPLETE initialization, in which the basic COMPLETE/KAM system consists of one COMPLETE and one KAM.
  • unit 410 is a simple switch set to select the KAM at the start of the system operation (and after events requiring re-initialization), until enough signal history [Y_{k-P}, …, Y_{k-1}] is generated in unit 420 to enable COMPLETE operations. After this occurs, preselector 410 can be set to select the COMPLETE. More complex DF implementations, which can select among multiple choices of COMPLETEs 10a-10n and KAMs 405a-405n, can be obtained by employing analyses of the input X_k as described above for preselector 410.
  • the DL embodiment makes all the decisions regarding the choice of a model or models in the postselector 430, instead of in preselector 410.
  • the DL strategy allows several possible embodiments.
  • the most basic DL strategy computes candidate models for all available method options in COMPLETE/KAM 415 and then postselector 430 selects among all the method options by comparing results obtained from their respective best models, using one or more evaluation metrics for assessing model quality and other performance measures that were given above under the description of postselector 430.
  • This strategy may be used, for example, when the goal is to choose the best overall performing model according to some predetermined set of criteria.
  • More complex DL strategies can consist of several iterative steps, each of which involves generating multiple candidate models in COMPLETE/KAM 415 and evaluating the results from these models in postselector 430, until the desired performance is obtained.
  • This DL strategy may be used, for example, when the evaluation criteria involve trade-offs, such as a criterion to find the COMPLETE/KAM model that provides the lowest bit rate while meeting or exceeding a preset requirement for model quality.
  • the model providing the lowest bit rate can be found first using the above methods, and if its output quality does not meet the desired requirement, the process is repeated for the next-lowest bit rate model, until a model of the desired quality is reached.
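The lowest-bit-rate-first search just described amounts to walking the candidate models in rate order until one clears the quality bar. A minimal sketch, with hypothetical names (`pick_lowest_rate_model`, the `"bits"` field, `quality_of`) standing in for the patent's unspecified bookkeeping:

```python
def pick_lowest_rate_model(candidates, quality_of, min_quality):
    """Return the cheapest candidate (by bit cost) whose quality meets or
    exceeds `min_quality`, or None if no candidate qualifies."""
    for model in sorted(candidates, key=lambda m: m["bits"]):
        if quality_of(model) >= min_quality:
            return model
    return None
```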
  • the iterative DL embodiment can incorporate more complex logic based on known decision-making protocols. For example, the outcome from one iterative step may be evaluated and, based on the results, a set of completely different COMPLETEs 10 or KAMs 405 from the set used in the previous step may be chosen for the next iteration by postselector 430.
  • postselector 430 may switch the methodology used from a COMPLETE 10 to a KAM 405 based on the outcome of the evaluation from a given iteration. Further, postselector 430 can direct such new model to be estimated for the signal derived from the original input frame X k or, alternatively, to be estimated for the residual error obtained from a model evaluated in one of the previous iterations. The iterative process can terminate once a predetermined number of iterations have been completed. Alternatively, postselector 430 can make the decision to terminate iterations once it finds the model that satisfies the preset criteria.
  • the MD embodiment can use both the preselector 410 and the postselector 430 to combine attributes of the DF and DL strategies.
  • preselector 410 can select a set of potential methods to be evaluated for each frame, rather than specifying a single method as done in the DF embodiment.
  • Unit 430 can accomplish further selection from among the chosen models after they have been evaluated.
  • preselector 410 can be used to narrow down the choice of COMPLETEs 10 and KAMs 405 in COMPLETE/KAM Module 415 that need to be considered for a given frame.
  • postselector 430 can change this determination after the chosen models have been evaluated and choose another model not originally selected by preselector 410.
  • the decoder appraises the received model identifier code and reconstructs the output signal Y k accordingly, using the method that corresponds to the one used to encode X k .
  • a method used by the coder may be a COMPLETE or a KAM, or a combination of COMPLETES and/or KAMs.
  • Each part of the signal that was encoded by a KAM is decoded using the known decoding method for that KAM.
  • Each part of the signal that was encoded by a COMPLETE is decoded using the corresponding COMPLETE decoding method described in Mode 1.
  • the restored frames are accumulated in a storage unit of the decoder in a way that mirrors the accumulation of the restored frames in storage 420 on the coder side, and are used in restoring the future frames as needed.
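The decoder behavior described in the preceding bullets, dispatching on the received model identifier code and mirroring the coder-side frame store, can be sketched as follows. The dictionary of decoders and the toy frame contents are illustrative assumptions, not the patent's actual decoding routines:

```python
# Hypothetical dispatch table: one decoding routine per model identifier.
DECODERS = {
    "COM":  lambda params, history: ["COM-decoded", params, len(history)],
    "PACT": lambda params, history: ["PACT-decoded", params, len(history)],
}

def decode_frame(identifier, params, history):
    """Decode one frame with the method matching the coder's choice, then
    accumulate the restored frame, mirroring storage 420 on the coder side,
    so that future frames can reference it."""
    frame = DECODERS[identifier](params, history)
    history.append(frame)
    return frame
```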
  • Mode 3 provides a specific example of some of the embodiments discussed in Modes 1 and 2.
  • Fig. 5 illustrates an exemplary block diagram depicting an example of a speech analysis (encoding) system utilizing a multimodal COMPLETE/PACT implementation, in accordance with an embodiment of the present invention.
  • Fig. 5 shows a block diagram of the essential portion of a speech coder 50 that is capable of producing toll quality speech at bit rates comparable to those of modern codecs and which is implemented using the mixed COMPLETE/KAM strategy described in Mode 2 of this invention.
  • Mode 3 uses the Complete Oscillator (COM) part of the COMPLETE model.
  • the COM part of the model uses references derived only from the data history and the current frame and does not use external references.
  • the model used in Mode 3 will be referred to as COM.
  • the speech coder in Fig. 5 includes a preprocessor module 500; preselector unit 510, which controls the initialization process; and COM/PACT encoding module 515, which contains one COM estimation unit 525 and one KAM estimation unit 535 which implements the Pulsed Autoregressive Compensator (PACT) method described in detail below.
  • Speech coder 50 can further include a storage unit 520 which contains restored earlier inputs [Y_{k-P}, …, Y_{k-1}] that can be accessed by the COM 525; and a postselector 530, which controls the model selection process in the regular (non-initialization) mode and which initiates the output process after the 'best' model is found by routing the relevant information described below to unit 520 and to the postprocessor 540.
  • the postprocessor 540 operates the same as postprocessor 440 of Fig. 4, described in Mode 2, and therefore its description is omitted.
  • the preprocessor 500 in Fig. 5 processes input data using the same general methods described in Mode 2.
  • the windowing operation may be applied to create frames of input data that overlap their preceding input frame by some fixed number of data points. Typically the frames are also filtered to remove noise using one of many known methods. Windows spanning from 5 milliseconds (ms) to 40 ms are common in speech coding. In some embodiments, Mode 3 uses 20 ms triangular windows and 50% frame overlap. Each created frame can be outputted by the preprocessor 500 to preselector 510.
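At the 8 kHz sampling rate assumed later in this Mode, a 20 ms window spans 160 samples and 50% overlap gives an 80-sample hop. The following sketch illustrates that framing arithmetic; the function names are illustrative, and the simple symmetric triangular window shown is an assumption (the patent does not specify the exact window shape):

```python
def triangular_window(n):
    # Symmetric triangular window: zero at the ends, peak 1.0 at the centre.
    half = (n - 1) / 2.0
    return [1.0 - abs(i - half) / half for i in range(n)]

def frame_signal(signal, frame_len=160, overlap=0.5):
    """Cut `signal` into windowed frames: 160 samples (20 ms at 8 kHz)
    with 50% overlap, i.e. an 80-sample hop between frame starts."""
    hop = int(frame_len * (1 - overlap))
    win = triangular_window(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        block = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(block, win)])
    return frames
```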
  • Preselector 510 supplies the current input frame to either the COM 525 or the PACT 535, and this controls which model is initially used in COM/PACT module 515. Operations in preselector 510 follow the general description provided for the initialization process in preselector 410 in Mode 2, but with some differences. Specifically, preselector 510 switches to an 'initialization' mode at the start of system operation and after events requiring re-initialization.
  • unit 510 remains in 'initialization' mode until enough frames have been accumulated in unit 520 to capture what is expected to be the maximum time span of one pitch period. For example, in coding speech spoken in standard American English, it is typically sufficient to capture 32 ms of continuous speech in the frames in unit 520. Other considerations described in Mode 1 can also influence the number of frames that are being accumulated in storage unit 520. After accumulation is completed, unit 510 can switch to 'regular' mode.
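The mode switch just described reduces to a sample count: stay in 'initialization' until the restored-frame store covers the assumed maximum pitch span. A minimal sketch with hypothetical names, using the 32 ms span and 8 kHz rate given above:

```python
def coder_mode(samples_in_store, sample_rate=8000, min_span_ms=32):
    """Return the preselector mode: 'regular' once storage 520 holds at
    least min_span_ms of restored speech (256 samples at 8 kHz)."""
    needed = sample_rate * min_span_ms // 1000
    return "regular" if samples_in_store >= needed else "initialization"
```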
  • while in the 'initialization' mode, unit 510 can activate the PACT unit 535 in module 515 by inputting to it the current input frame, to initiate modeling of this frame. While in the 'regular' mode, preselector 510 activates the COM unit 525 in COM/PACT module 515 by inputting the received frames to the COM 525.
  • COM/PACT module 515 in Fig. 5 can contain a COM 525 which estimates the linear 4-delay COM of the form shown in Eq. (2), and a PACT 535 which estimates an autoregressive linear predictor model described below.
  • the 8th-order autoregressive linear predictor model is used.
  • Alternative embodiments can use COM 525 with a different number of delays and nonlinear COM function forms, as well as different orders of the PACT 535 and a different KAM in place of PACT. Estimation of the COM is performed in the COM 525 using the following embodiment of the general COM estimation procedure described in Mode 1 above.
  • the superscript * is used here to indicate a candidate value
  • the notation [·] indicates the method being used to compute the given variable
  • the subscript j is the index used to indicate the individual candidate parameter sets.
  • the quality of each candidate model Y_k[COM_j] can then be evaluated using the perceptually weighted error (PWE) metric of Eq. (7) above. The parameter values which yield the minimum PWE are identified as those providing the optimal model, and these parameters are supplied to postselector 530, along with the corresponding output Y_k[COM].
  • X k as before indicates an input sequence of some length L
  • X u (z) is a length L block of data with latency z, which means that the block starts z samples prior to the last point in X k
  • E k is the modeling error known as the prediction error
  • b_u represents coefficients that are found by minimizing E_k using, for example, the Levinson-Durbin algorithm or another method known in the prior art.
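For reference, the Levinson-Durbin recursion mentioned above solves the Toeplitz normal equations of linear prediction from an autocorrelation sequence. The sketch below is a standard textbook formulation, not taken from the patent; it uses the convention x[n] + Σ a[j]·x[n−j] = e[n], so the predictor coefficients b_u of Eq. (8) correspond to the negated a[j]:

```python
def levinson_durbin(r, order):
    """Compute AR coefficients from autocorrelations r[0..order].
    Returns (a, err): a[0] == 1.0, and `err` is the final prediction
    error variance."""
    a = [0.0] * (order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient from the current prediction error.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err
```

For an ideal AR(1) autocorrelation sequence the recursion recovers the single true coefficient and leaves the higher-order coefficient at zero.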
  • the input is regenerated from the estimated parameters according to Eq. (9), where E k is commonly approximated as shown in Eq. (10).
  • the approximation, denoted R_k, is computed as a combination of scaled, time-shifted pulses that are represented by Dirac delta functions as
  • δ_{p(v)} denotes a Dirac delta function with amplitude 1 at a point p(v) within the current frame and zero otherwise;
  • p(v) indicates the position of pulse v within the current frame
  • c v indicates the gain for pulse v
  • the candidate parameter sets for the PACT model consist of the pulse positions {p_v} and the coefficients {c_v}, which can be generated using a number of methods.
  • One option is to adapt the same method that is employed to generate the COM parameters in this mode.
  • pulse positions {p_v} can be chosen the same way as the delays {d_i}
  • coefficients {c_v} can be computed the same way as the {a_i} in Eq. (3).
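As a concrete illustration of one simple way to generate the pulse positions and gains, the sketch below places pulses at the largest-magnitude residual samples and takes the sample values as gains. This is an assumed stand-in chosen for clarity, not one of the patent's claimed candidate-generation methods:

```python
def pulse_approximation(residual, num_pulses):
    """Approximate `residual` by num_pulses scaled Dirac deltas, as in
    Eq. (10): pick the largest-magnitude samples as positions p(v) and
    their values as gains c_v."""
    order = sorted(range(len(residual)), key=lambda i: -abs(residual[i]))
    positions = sorted(order[:num_pulses])
    gains = [residual[p] for p in positions]
    r_k = [0.0] * len(residual)
    for p, c in zip(positions, gains):
        r_k[p] = c              # c_v * delta at position p(v)
    return positions, gains, r_k
```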
  • Other basic parameter generation methods can also be used without diminishing the claims made in this invention.
  • the parameters of the COM and the PACT models can be quantized in their respective units in COM/PACT module 515.
  • the following embodiment assumes a narrow-band speech coder with input being sampled at an 8 kHz rate.
  • this embodiment of Mode 3 can use the following known method of partitioning the delays and pulse locations into the interleaved subsets.
  • the delays can be partitioned into subsets containing 64 entries, and individual positions within each subset are represented using 6 bits.
  • the PACT pulse locations are partitioned into subsets containing 32 entries, and individual positions within each grouping are represented using 5 bits.
  • the COM coefficients {a_i} are quantized to 12 bits using one of the known vector quantization methods.
  • the coefficients of the PACT linear predictor model can be converted to line spectral frequencies, as is standard to the art, and can be quantized using known split vector quantization methods.
  • the weights of the pulses in the PACT model are quantized to 13 bits using one of the known vector quantization methods, which can be the same method that is used to quantize the COM coefficients.
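The quantization figures above allow a back-of-the-envelope bit budget for a COM-coded frame: four delays at 6 bits each plus 12 bits for the vector-quantized coefficients {a_i}. The per-frame model identifier overhead below is an assumed 1 bit, and the frame rate follows from the 20 ms windows with 50% overlap (a 10 ms hop); neither figure is stated in the text, so the resulting rate is illustrative arithmetic only:

```python
def com_frame_bits(num_delays=4, delay_bits=6, coeff_bits=12, id_bits=1):
    """Bits for one COM frame: delay indices + VQ coefficients + identifier."""
    return num_delays * delay_bits + coeff_bits + id_bits

def bitrate_kbps(bits_per_frame, hop_ms=10.0):
    """Convert a per-frame bit count to kilobits per second."""
    frames_per_second = 1000.0 / hop_ms
    return bits_per_frame * frames_per_second / 1000.0
```

Under these assumptions a pure-COM frame costs 37 bits, about 3.7 kbps before the PACT stage's pulse positions, gains, and line-spectral-frequency bits are added.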
  • the operating mode of the coder controls the sequence of functions performed in postselector 530.
  • postselector 530 receives the quantized parameter set for the optimal PACT model and initiates the process of outputting the code, which will be described below, while the system 50 advances to process the next frame.
  • postselector 530 receives from module 515 the synthesized output Y_k[COM*] obtained from the best candidate COM model, along with the corresponding quantized parameter set, and computes the Signal-to-Noise Ratio (SNR) metric given in Eq. (5) using Y_k[COM*].
  • After PACT 535 receives a frame containing either the input X_k or the residual error entries E_k, the second model estimation is performed, where the PACT model is estimated the same way as already described above, treating the received frame in each case as the input to be modeled.
  • once the optimal PACT is found in the current iteration, the corresponding synthesized output Y_k[PACT*] (if the input X_k was modeled) or Y_k[COM*+PACT*] (if the residual error E_k was modeled), along with the corresponding quantized parameter sets, is supplied by the PACT 535 to postselector 530 and used to compute the Log-Spectral Distance (LSD) metric given by Eq. (6).
  • postselector 530 initiates the process of outputting the code, which will be described below, while the system advances to process the next frame. Otherwise, the LSD metric is computed for both synthesized outputs Y_k[COM*] and Y_k[COM*+PACT*]. If the difference between the two is less than some threshold value, for example LSD[COM*] − LSD[COM*+PACT*] < 0.5 dB as used in this embodiment, the COM 525 is chosen as the final model; otherwise, PACT 535 is chosen. In both cases, postselector 530 initiates the process of outputting the code, as described below, while the system advances to process the next frame X_{k+1}.
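The regular-mode decision above can be sketched as a small function. Only the 0.5 dB LSD margin comes from the text; the SNR acceptance threshold and all names are illustrative assumptions:

```python
def choose_final_model(snr_com, lsd_com, lsd_com_pact,
                       snr_ok=20.0, lsd_margin=0.5):
    """Postselector sketch: accept the COM alone when its SNR is adequate,
    or when adding the PACT stage improves the log-spectral distance by
    less than `lsd_margin` dB; otherwise keep the combined model."""
    if snr_com >= snr_ok:
        return "COM"                       # COM alone is good enough
    if lsd_com - lsd_com_pact < lsd_margin:
        return "COM"                       # PACT gain below the 0.5 dB margin
    return "COM+PACT"
```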
  • the process of outputting the code in postselector 530 consists of supplying the chosen parameters and the model identifier code to the postprocessor 540. Further, as part of the output process, postselector 530 supplies to storage 520 the signal synthesized from the chosen parameters, which has typically been computed in the process of model identification, or, if not, postselector 530 can initiate that computation.
  • the described basic embodiment produces toll-quality speech in noise-free testing conditions at an average of 9.77 kilobits per second (kbps).
  • the overall performance of this coder, in terms of both bit rate and perceptual quality, is between those of the current state-of-the-art G.729 and AMR coders operating in their highest-quality modes.
  • the basic embodiment used in the current Mode provides an example, chosen for the sake of simplicity and clarity of the presentation.
  • in Mode 3, the speech frames are reconstructed at the decoder using the decoding methods of Modes 1 and 2 described above.
  • any of the foregoing steps and/or system modules may be suitably replaced, reordered, or removed, and additional steps and/or system modules may be inserted, depending upon the needs of the particular application. The systems of the foregoing embodiments may be implemented using any of a wide variety of suitable processes and system modules and are not limited to any particular computer hardware, software, middleware, firmware, microcode, and the like.
  • a typical computer system can, when appropriately configured or designed, serve as a computer system in which those aspects of the invention may be embodied.
  • Fig. 6 illustrates a typical computer system that, when appropriately configured or designed, can serve as a computer system in which the invention may be embodied.
  • the computer system 600 includes any number of processors 602 (also referred to as central processing units, or CPUs) that are coupled to storage devices including primary storage 606 (typically a random access memory, or RAM) and primary storage 604 (typically a read-only memory, or ROM).
  • processors 602 may be of various types, including microcontrollers (e.g., with embedded RAM/ROM) and microprocessors such as programmable devices (e.g., RISC or CISC based, or CPLDs and FPGAs).
  • primary storage 604 acts to transfer data and instructions uni-directionally to the CPU and primary storage 606 is used typically to transfer data and instructions in a bi-directional manner. Both of these primary storage devices may include any suitable non-transitory computer- readable media such as those described above.
  • a mass storage device 608 may also be coupled bi-directionally to CPU 602 and provides additional data storage capacity and may include any of the non-transitory computer-readable media described above. Mass storage device 608 may be used to store programs, data and the like and is typically a secondary storage medium such as a hard disk.
  • mass storage device 608 may, in appropriate cases, be incorporated in standard fashion as part of primary storage 606 as virtual memory.
  • a specific mass storage device such as a CD-ROM 614 may also pass data uni-directionally to the CPU.
  • CPU 602 may also be coupled to an interface 610 that connects to one or more input/output devices such as video monitors, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, or other well-known input devices including, of course, other computers.
  • CPU 602 optionally may be coupled to an external device such as a database or a computer or telecommunications or internet network using an external connection as shown generally at 612, which may be implemented as a hardwired or wireless communications link using suitable conventional technologies. With such a connection, it is contemplated that the CPU might receive information from the network, or might output information to the network in the course of performing the method steps described in the teachings of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Analogue/Digital Conversion (AREA)
  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
PCT/US2011/058479 2010-10-29 2011-10-28 Low bit rate signal coder and decoder Ceased WO2012058650A2 (en)

Priority Applications (11)

Application Number Priority Date Filing Date Title
KR1020137013787A KR101505341B1 (ko) 2010-10-29 2011-10-28 로우 비트 레이트 신호 코더 및 디코더
AU2011320141A AU2011320141B2 (en) 2010-10-29 2011-10-28 Low bit rate signal coder and decoder
BR112013010518A BR112013010518A2 (pt) 2010-10-29 2011-10-28 low bit rate signal coder and decoder
JP2013536900A JP5815723B2 (ja) 2010-10-29 2011-10-28 低ビットレート信号コーダおよびデコーダ
MX2013004802A MX337311B (es) 2010-10-29 2011-10-28 Codificador y decodificador de señal de velocidad de bits baja.
EP11837236.6A EP2633625A4 (en) 2010-10-29 2011-10-28 CODING DEVICE AND DECODER WITH LOW BITRATE
CN201180063393.5A CN103348597B (zh) 2010-10-29 2011-10-28 低比特率信号的编码及解码方法
US13/882,195 US10084475B2 (en) 2010-10-29 2011-10-28 Low bit rate signal coder and decoder
RU2013124363/08A RU2565995C2 (ru) 2010-10-29 2011-10-28 Кодирующее и декодирующее устройство для низкоскоростных сигналов
IL226045A IL226045A (en) 2010-10-29 2013-04-29 Encryption device and decoder with low signal rate
US16/044,329 US10686465B2 (en) 2010-10-29 2018-07-24 Low bit rate signal coder and decoder

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US12/915,989 US8620660B2 (en) 2010-10-29 2010-10-29 Very low bit rate signal coder and decoder
US12/915,989 2010-10-29

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US12/915,989 Continuation-In-Part US8620660B2 (en) 2010-10-29 2010-10-29 Very low bit rate signal coder and decoder

Related Child Applications (2)

Application Number Title Priority Date Filing Date
US13/882,195 A-371-Of-International US10084475B2 (en) 2010-10-29 2011-10-28 Low bit rate signal coder and decoder
US16/044,329 Continuation US10686465B2 (en) 2010-10-29 2018-07-24 Low bit rate signal coder and decoder

Publications (2)

Publication Number Publication Date
WO2012058650A2 true WO2012058650A2 (en) 2012-05-03
WO2012058650A3 WO2012058650A3 (en) 2012-09-27

Family

ID=45994838

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2011/058479 Ceased WO2012058650A2 (en) 2010-10-29 2011-10-28 Low bit rate signal coder and decoder

Country Status (11)

Country Link
US (3) US8620660B2 (en)
EP (1) EP2633625A4 (en)
JP (1) JP5815723B2 (en)
KR (1) KR101505341B1 (en)
CN (1) CN103348597B (en)
AU (1) AU2011320141B2 (en)
BR (1) BR112013010518A2 (en)
IL (1) IL226045A (en)
MX (1) MX337311B (en)
RU (1) RU2565995C2 (en)
WO (1) WO2012058650A2 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8620660B2 (en) * 2010-10-29 2013-12-31 The United States Of America, As Represented By The Secretary Of The Navy Very low bit rate signal coder and decoder
EP2731377A4 (en) * 2012-03-02 2014-09-10 Huawei Tech Co Ltd METHOD AND DEVICE FOR IDENTIFICATION AND GENERATION OF AMBE ENCODING AND DECODING RATING INFORMATION IN SDP
FR3023646A1 (fr) * 2014-07-11 2016-01-15 Orange Mise a jour des etats d'un post-traitement a une frequence d'echantillonnage variable selon la trame
US9456075B2 (en) * 2014-10-13 2016-09-27 Avaya Inc. Codec sequence detection
WO2016103222A2 (en) * 2014-12-23 2016-06-30 Dolby Laboratories Licensing Corporation Methods and devices for improvements relating to voice quality estimation
US10542961B2 (en) 2015-06-15 2020-01-28 The Research Foundation For The State University Of New York System and method for infrasonic cardiac monitoring
RU2610285C1 (ru) * 2016-02-15 2017-02-08 федеральное государственное казенное военное образовательное учреждение высшего образования "Военная академия связи имени Маршала Советского Союза С.М. Буденного" Министерства обороны Российской Федерации Способ распознавания протоколов низкоскоростного кодирования
RU2667462C1 (ru) * 2017-10-24 2018-09-19 федеральное государственное казенное военное образовательное учреждение высшего образования "Военная академия связи имени Маршала Советского Союза С.М. Буденного" Министерства обороны Российской Федерации Способ распознавания протоколов низкоскоростного кодирования речи
CN110768680B (zh) * 2019-11-04 2024-03-29 重庆邮电大学 一种scl剪枝技术联合球型列表译码的方法及装置
US11373639B2 (en) * 2019-12-12 2022-06-28 Mitsubishi Electric Research Laboratories, Inc. System and method for streaming end-to-end speech recognition with asynchronous decoders pruning prefixes using a joint label and frame information in transcribing technique
CN116110409B (zh) * 2023-04-10 2023-06-20 南京信息工程大学 一种ASIP架构的大容量并行Codec2声码器系统及编解码方法

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IT1179803B (it) * 1984-10-30 1987-09-16 Cselt Centro Studi Lab Telecom Metodo e dispositivo per la correzione di errori causati da rumore di tipo impulsivo su segnali vocali codificati con bassa velocita di ci fra e trasmessi su canali di comunicazione radio
JP3343965B2 (ja) 1992-10-31 2002-11-11 ソニー株式会社 音声符号化方法及び復号化方法
US5701390A (en) 1995-02-22 1997-12-23 Digital Voice Systems, Inc. Synthesis of MBE-based coded speech using regenerated phase information
US5717819A (en) * 1995-04-28 1998-02-10 Motorola, Inc. Methods and apparatus for encoding/decoding speech signals at low bit rates
US6014622A (en) * 1996-09-26 2000-01-11 Rockwell Semiconductor Systems, Inc. Low bit rate speech coder using adaptive open-loop subframe pitch lag estimation and vector quantization
JP4121578B2 (ja) * 1996-10-18 2008-07-23 ソニー株式会社 音声分析方法、音声符号化方法および装置
US6233550B1 (en) * 1997-08-29 2001-05-15 The Regents Of The University Of California Method and apparatus for hybrid coding of speech at 4kbps
AU4975597A (en) 1997-09-30 1999-04-23 Siemens Aktiengesellschaft A method of encoding a speech signal
WO2000079519A1 (en) * 1999-06-18 2000-12-28 Koninklijke Philips Electronics N.V. Audio transmission system having an improved encoder
CN100387061C (zh) * 1999-11-29 2008-05-07 索尼公司 视频/音频信号处理方法和视频/音频信号处理设备
SE517156C2 (sv) * 1999-12-28 2002-04-23 Global Ip Sound Ab System för överföring av ljud över paketförmedlade nät
KR100861884B1 (ko) * 2000-06-20 2008-10-09 코닌클리케 필립스 일렉트로닉스 엔.브이. 정현파 코딩 방법 및 장치
JP2002062899A (ja) * 2000-08-23 2002-02-28 Sony Corp データ処理装置およびデータ処理方法、学習装置および学習方法、並びに記録媒体
JP3876781B2 (ja) * 2002-07-16 2007-02-07 ソニー株式会社 受信装置および受信方法、記録媒体、並びにプログラム
WO2005024783A1 (en) * 2003-09-05 2005-03-17 Koninklijke Philips Electronics N.V. Low bit-rate audio encoding
US7337108B2 (en) * 2003-09-10 2008-02-26 Microsoft Corporation System and method for providing high-quality stretching and compression of a digital audio signal
EP1849156B1 (en) * 2005-01-31 2012-08-01 Skype Method for weighted overlap-add
WO2007080212A1 (en) * 2006-01-09 2007-07-19 Nokia Corporation Controlling the decoding of binaural audio signals
US20090198491A1 (en) * 2006-05-12 2009-08-06 Panasonic Corporation Lsp vector quantization apparatus, lsp vector inverse-quantization apparatus, and their methods
JP5749462B2 (ja) * 2010-08-13 2015-07-15 株式会社Nttドコモ オーディオ復号装置、オーディオ復号方法、オーディオ復号プログラム、オーディオ符号化装置、オーディオ符号化方法、及び、オーディオ符号化プログラム
US8620660B2 (en) * 2010-10-29 2013-12-31 The United States Of America, As Represented By The Secretary Of The Navy Very low bit rate signal coder and decoder
US9275644B2 (en) * 2012-01-20 2016-03-01 Qualcomm Incorporated Devices for redundant frame coding and decoding

Also Published As

Publication number Publication date
EP2633625A2 (en) 2013-09-04
US20130214943A1 (en) 2013-08-22
WO2012058650A3 (en) 2012-09-27
US20120109653A1 (en) 2012-05-03
AU2011320141A1 (en) 2013-06-27
RU2565995C2 (ru) 2015-10-20
KR20130086234A (ko) 2013-07-31
KR101505341B1 (ko) 2015-03-23
MX2013004802A (es) 2014-05-09
IL226045A (en) 2016-05-31
US10084475B2 (en) 2018-09-25
US8620660B2 (en) 2013-12-31
US20180358981A1 (en) 2018-12-13
CN103348597A (zh) 2013-10-09
MX337311B (es) 2016-02-25
IL226045A0 (en) 2013-06-27
RU2013124363A (ru) 2014-12-10
JP2014502366A (ja) 2014-01-30
AU2011320141B2 (en) 2015-06-04
US10686465B2 (en) 2020-06-16
JP5815723B2 (ja) 2015-11-17
BR112013010518A2 (pt) 2016-08-02
EP2633625A4 (en) 2014-05-07
CN103348597B (zh) 2017-01-18

Similar Documents

Publication Publication Date Title
US10686465B2 (en) Low bit rate signal coder and decoder
EP3039676B1 (en) Adaptive bandwidth extension and apparatus for the same
EP2272062B1 (en) An audio signal classifier
RU2651187C2 (ru) Основанное на линейном предсказании кодирование аудио с использованием улучшенной оценки распределения вероятностей
KR102745244B1 (ko) 선형예측계수 양자화방법 및 장치와 역양자화 방법 및 장치
WO2007083933A1 (en) Apparatus and method for encoding and decoding signal
JP2019174834A (ja) 低または中ビットレートに対する知覚品質に基づくオーディオ分類
KR20050020728A (ko) 음성 처리 시스템, 음성 처리 방법 및 음성 프레임 평가방법
Prandoni et al. R/D optimal linear prediction
WO2006014677A1 (en) Apparatus and method for audio coding
EP0950238B1 (en) Speech coding and decoding system
AU2020365140A1 (en) Methods and system for waveform coding of audio signals with a generative model
Merazka Codebook Design Using Simulated Annealing Algorithm for Vector Quantization of Line Spectrum Pairs
Kwong et al. Design and implementation of a parametric speech coder
Pinagé et al. Prediction-Based Coding of Speech Signals Using Multiscale Recurrent Patterns
RU2022107245A (ru) Формат со множественным запаздыванием для кодирования звука
Merazka et al. Robust split vector quantization of LSP parameters at low bit rates
Negrescu et al. On Rationally DSP Implementation of the MP-MLQ/ACELP Dual Rate Speech Encoder for Multimedia Communications
HK1210316B (en) Linear prediction based audio coding using improved probability distribution estimation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11837236

Country of ref document: EP

Kind code of ref document: A2

ENP Entry into the national phase

Ref document number: 2013536900

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: 13882195

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: MX/A/2013/004802

Country of ref document: MX

WWE Wipo information: entry into national phase

Ref document number: 2011837236

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2013124363

Country of ref document: RU

Kind code of ref document: A

Ref document number: 20137013787

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2011320141

Country of ref document: AU

Date of ref document: 20111028

Kind code of ref document: A

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112013010518

Country of ref document: BR

ENP Entry into the national phase

Ref document number: 112013010518

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20130429