EP1593116B1 - Verfahren zur differenzierten digitalen Sprach- und Musikbearbeitung, Rauschfilterung, Erzeugung von Spezialeffekten und Einrichtung zum Ausführen des Verfahrens - Google Patents

Verfahren zur differenzierten digitalen Sprach- und Musikbearbeitung, Rauschfilterung, Erzeugung von Spezialeffekten und Einrichtung zum Ausführen des Verfahrens

Info

Publication number
EP1593116B1
EP1593116B1 (application EP04705433A)
Authority
EP
European Patent Office
Prior art keywords
signal
pitch
block
synthesis
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP04705433A
Other languages
English (en)
French (fr)
Other versions
EP1593116A1 (de)
Inventor
Jean-Luc Crebouw
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Publication of EP1593116A1 publication Critical patent/EP1593116A1/de
Application granted granted Critical
Publication of EP1593116B1 publication Critical patent/EP1593116B1/de
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0204Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003Changing voice quality, e.g. pitch or formants
    • G10L21/007Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/013Adapting to target pitch
    • G10L2021/0135Voice conversion or morphing

Definitions

  • The present invention relates to a method for the differentiated digital processing of voice and music, noise filtering and the creation of special effects, and to a device for implementing said method.
  • The voice signal is composed of a mixture of very complex transient signals (noises) and quasi-periodic signal parts (harmonic sounds).
  • The noises can be small bursts (plosives): P, B, T, D, K, GU; soft diffuse noise: F, V, J, Z; or intense noise: CH, S. As for harmonic sounds, their spectrum varies with the type of vowel and with the speaker.
  • The intensity ratios between noises and vowels change depending on whether the voice is conversational, conference-style, loud or sung.
  • The loud voice and the sung voice favour vowel sounds at the expense of noises.
  • The vocal signal simultaneously carries two types of message: a semantic message conveyed by the words, the verbal expression of thought, and an aesthetic message perceptible through the aesthetic qualities of the voice (timbre, intonation, delivery, etc.).
  • The semantic content of speech is practically independent of the qualities of the voice; it is conveyed by temporal acoustic forms. A whispered voice consists only of flow noises; an "intimate" or close-proximity voice consists of a mixture of harmonic sounds in the low frequencies and flow noises in the treble; the voice of a speaker or singer has a rich and intense harmonic vowel spectrum.
  • The musical range and the spectral content are not directly related: some instruments have their energy maxima inside the range; others have a well-circumscribed zone of maximum energy at the upper limit of the range and beyond; others, finally, have widely spread energy maxima that extend well beyond the upper limit of the tessitura.
  • The originality of digital technologies is to introduce as much determinism (that is, a priori knowledge) as possible at the level of the processed signals, so as to perform specific treatments that take the form of calculations.
  • This signal can then be processed without suffering degradations such as background noise, distortion or bandwidth limitation; moreover, it can be processed to create special effects such as voice transformation, suppression of the ambient noise, modification of the breathiness of the voice, or differentiation of voice and music.
  • Patent US 5,684,262 A discloses a method of multiplying the original voice by a tone in order to obtain a frequency offset and thereby a deeper or higher-pitched voice.
  • Bit-rate reduction methods are used mainly for digital storage (to reduce the bit volume) and for transmission (to reduce the required bit rate). These methods include processing before storage or transmission (coding) and processing on playback (decoding).
  • This process is based on the masking effect of human hearing, that is, the disappearance of weak sounds in the presence of loud sounds, equivalent to a shift of the hearing threshold caused by the loudest sound and depending on the frequency and level difference between the two sounds.
  • The number of bits per sample is set according to the masking effect, since the weak sounds and the quantization noise are inaudible.
  • The audio spectrum is divided into a number of sub-bands, which makes it possible to specify the masking level in each sub-band and to perform a bit allocation for each of them.
  • This technique consists in transmitting a variable bit rate according to the instantaneous composition of the sound.
  • This method is rather suited to the processing of music and not to the vocal signal; it does not detect the presence of voice or music, separate the voice or musical signal from the noise, modify the voice in real time to synthesize a different but realistic voice, synthesize breath (noise) to create special effects, code a voice signal with a single voice, or reduce ambient noise.
  • The object of the invention is therefore, more particularly, to eliminate these disadvantages.
  • This method of transforming the voice, the music and the ambient noise is as defined in claim 1.
  • The analysis of the voice signal and the coding of the parameters constitute the two functions of the analyzer (block A); likewise, the decoding of the parameters, the special effects and the synthesis constitute the functions of the synthesizer (block C).
  • Thresholds (blocks 4, 7, 8, 22) respectively make it possible to detect the presence of an inaudible signal, the presence of an inaudible frame, the presence of a pulse, and the presence of a mains disturbance signal (50 Hz or 60 Hz).
  • A fifth threshold makes it possible to perform the fast Fourier transform (FFT) on the unprocessed signal according to the characteristics of the "pitch" and its variation.
  • A sixth threshold makes it possible to return the result of the fast Fourier transform (FFT) with pre-processing, as a function of the signal-to-noise ratio.
  • Two frames are used in the audio signal analysis method: a so-called "current" frame, of fixed periodicity, containing a certain number of samples of the vocal signal, and a so-called "analysis" frame, whose number of samples equals that of the current frame or twice that number, and which can be shifted, according to the temporal interpolation, with respect to the aforesaid current frame.
  • The formatting of the input signal (block 1) consists in performing a high-pass filtering in order to improve the later coding of the frequency amplitudes by increasing their dynamics; said high-pass filter increases the dynamics of the frequency amplitudes by preventing a low audible frequency from occupying the whole dynamic range and thereby wiping out frequencies of small but nevertheless audible amplitude.
  • The filtered signal is then directed to block 2 for the determination of the time envelope.
  • The time offset to be applied to the analysis frame is then calculated by searching, on the one hand, for the maximum of the envelope in said frame and, on the other hand, for two indices corresponding to the values of the envelope that are lower, by a certain percentage, than that maximum.
  • The temporal interpolation detection (block 3) makes it possible to correct the two offset indices of the analysis frame found in the previous calculation, taking the past into account.
  • A first threshold detects whether or not an audible signal is present by measuring the maximum value of the envelope; if no audible signal is present, the analysis of the frame is complete; otherwise, processing continues.
  • The dynamics of the signal are then calculated (block 6) for its normalization, in order to reduce the computation noise; the normalization gain of the signal is calculated from the sample with the highest absolute value in the analysis frame.
  • A second threshold (block 7) detects whether or not the frame is inaudible because of the masking effect caused by the previous frames; if it is, the analysis is complete; otherwise, processing continues.
  • A third threshold (block 8) then detects the presence of a pulse; if a pulse is present, a specific treatment is carried out (blocks 9, 10); otherwise, the signal parameter calculations (block 11) for pre-processing the time signal (block 12) are performed.
  • The repetition of the pulse (block 9) is performed by creating an artificial "pitch", equal to the duration of the pulse, so as to avoid the masking of the useful frequencies during the fast Fourier transform (FFT).
  • The fast Fourier transform (FFT) (block 10) is then performed on the repeated pulse, keeping only the absolute value of the complex numbers and not the phase; the calculation of the frequencies and modules of the frequency data (block 20) is then performed.
  • The calculation of the "pitch" is first carried out by differentiating the signal of the analysis frame, followed by a low-pass filtering of the high-rank components and then by cubing the result of said filtering. The value of the "pitch" is determined by calculating the minimum distance between a portion of the high-energy signal and the subsequent signal, the aforesaid distance being the sum of the absolute values of the differences between the samples of the template and the samples to be correlated. The main part of one "pitch", centred around one and a half times the value of the "pitch", is then taken at the beginning of the analysis frame in order to compute the distance of this "pitch" portion over the whole of the analysis frame; the minima of the distance define the positions of the "pitch", the "pitch" being the average of the "pitches" detected. The variation of the "pitch" is then calculated using a line that minimizes the mean square error over the succession of detected "pitches"; the "pitch" estimated in this way is later refined by the frequency-domain "pitch" calculation. (A minimal sketch of this time-domain search follows below.)
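  • The sketch below illustrates, under assumptions, the minimum-distance ("sum of absolute differences") search described above; the smoothing filter order, template length and lag bounds are illustrative choices, not values taken from the patent.

```python
import numpy as np

def estimate_pitch_sad(frame, min_lag=32, max_lag=400, template_len=128):
    """Rough time-domain pitch estimate by a minimum-distance (sum of absolute
    differences) search, in the spirit of the analysis step described above.

    Sketch only: the smoothing filter, template length and lag bounds are
    assumed values, not taken from the patent.
    """
    x = np.asarray(frame, float)
    d = np.diff(x, prepend=x[0])                        # differentiate the frame
    d = np.convolve(d, np.ones(8) / 8.0, mode="same")   # crude low-pass smoothing
    d = d ** 3                                          # cube to emphasise strong parts

    template = d[:template_len]                         # high-energy portion as template
    best_lag, best_dist = None, np.inf
    for lag in range(min_lag, min(max_lag, len(d) - template_len)):
        dist = np.sum(np.abs(template - d[lag:lag + template_len]))
        if dist < best_dist:
            best_lag, best_dist = lag, dist
    return best_lag                                     # "pitch" period in samples
```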
  • The subtraction of the "pitch" variation consists in resampling the oversampled analysis frame with a sampling step that varies with the inverse of said "pitch" variation.
  • The oversampling of the analysis frame by a factor of two is performed by multiplying the result of the fast Fourier transform (FFT) of the analysis frame by the factor exp(-j*2*PI*k/(2*L_frame)), so as to add a half-sample delay to the time signal used to calculate the FFT; the inverse fast Fourier transform is then performed in order to obtain the time signal shifted by half a sample.
  • A frame of double length is thus produced by alternating a sample of the original frame with a sample of the frame shifted by half a sample.
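  • A minimal sketch of this factor-2 oversampling: the half-sample delay is applied as a phase ramp in the FFT domain and the two sequences are interleaved. The signed-frequency form of the phase ramp is used here so that the result stays real-valued; windowing and the exact handling of the Nyquist bin are simplified.

```python
import numpy as np

def oversample_x2(frame):
    """Factor-2 oversampling by interleaving the frame with a copy delayed by
    half a sample, the delay being applied as a phase ramp on the FFT."""
    frame = np.asarray(frame, float)
    n = len(frame)
    spectrum = np.fft.fft(frame)
    # exp(-j*2*pi*f*0.5): half-sample delay, with signed frequencies in cycles/sample
    delay = np.exp(-2j * np.pi * np.fft.fftfreq(n) * 0.5)
    shifted = np.fft.ifft(spectrum * delay).real

    out = np.empty(2 * n)
    out[0::2] = frame      # original samples
    out[1::2] = shifted    # samples delayed by half a sample
    return out
```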
  • The calculation of the signal-to-noise ratio is performed on the absolute value of the fast Fourier transform (FFT) result; the aforesaid ratio is in fact the ratio of the difference between the energy of the signal and that of the noise to the sum of the energy of the signal and that of the noise. The numerator of this ratio corresponds to the logarithmic difference between energy peaks of the signal and of the noise, an energy peak being a sample that is either greater than its four adjacent samples (harmonic signal) or smaller than its four adjacent samples (noise); the denominator is the sum of the logarithms of all signal and noise peaks. Moreover, the signal-to-noise ratio is calculated per sub-band; the sub-bands with the highest levels are averaged and give the desired ratio.
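  • A simplified sketch of this (S - N)/(S + N) measure on the FFT magnitudes: bins larger than their four neighbours are counted as harmonic peaks, bins smaller than their four neighbours as noise. The logarithmic weighting and the per-sub-band averaging of the patent are omitted here.

```python
import numpy as np

def snr_from_spectrum(mag):
    """Simplified (S - N) / (S + N) estimate on FFT magnitudes.

    Sketch only: peaks and valleys are detected over the four adjacent bins,
    but the log weighting and sub-band averaging are left out.
    """
    mag = np.asarray(mag, float)
    sig = noi = 0.0
    for k in range(2, len(mag) - 2):
        window = mag[k - 2:k + 3]
        if mag[k] == window.max():      # greater than the four adjacent bins
            sig += mag[k]
        elif mag[k] == window.min():    # smaller than the four adjacent bins
            noi += mag[k]
    total = sig + noi
    return (sig - noi) / total if total > 0 else 0.0
```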
  • The calculation of the signal-to-noise ratio, defined as the signal minus the noise divided by the signal plus the noise, performed in block 14, makes it possible to determine whether the signal analysed is a voiced signal or music (high ratio) or noise (low ratio).
  • The signal-to-noise ratio is then calculated in block 17, so as to transmit to block 20 the results of the fast Fourier transform (FFT) without pre-processing, in the case of a zero "pitch" variation, or, in the opposite case, to return the results of the FFT with pre-processing (block 19).
  • The fast Fourier transform (FFT), previously mentioned with reference to blocks 10, 13 and 16, is carried out, for example, on 256 samples in the case of a shifted frame or of a pulse, or on twice as many samples in the case of a centred frame without pulse.
  • A weighting of the samples located at the ends of the frame is carried out in the case of the FFT on n samples; on 2n samples, the HAMMING weighting window multiplied by the square root of the HAMMING window is used.
  • The ratio between two adjacent maximum values is calculated, each representing the product of the amplitude of the frequency component by a cardinal sine; by successive approximations, this ratio between the maximum values is compared with values contained in tables holding the same ratio for N frequencies (for example 32 or 64) distributed uniformly over half a sample of the fast Fourier transform (FFT).
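  • A sketch of this table look-up, assuming a plain cardinal-sine (Dirichlet) kernel; the patent's tables would be built for the analysis window actually used, and its "successive approximations" would amount to a binary search in this monotonic table rather than the direct nearest-entry lookup used here.

```python
import numpy as np

def fractional_bin_offset(mag_k, mag_k1, table_size=64):
    """Refine a peak frequency from the ratio of two adjacent FFT magnitudes
    by look-up in a table of that same ratio, precomputed for `table_size`
    offsets spread uniformly over half a bin. Sketch under a sinc-kernel
    assumption."""
    d = np.linspace(0.0, 0.5, table_size, endpoint=False)    # offsets in bins
    # model of |X[k+1]| / |X[k]| for a sinusoid located at bin k + d
    ratio_table = np.abs(np.sinc(1.0 - d)) / np.abs(np.sinc(d) + 1e-12)
    measured = mag_k1 / mag_k
    idx = int(np.argmin(np.abs(ratio_table - measured)))     # nearest table entry
    return d[idx]     # fractional offset to add to the integer bin index k
```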
  • The calculation of the frequencies and frequency-data modules of the fast Fourier transform (FFT) performed in block 20 also makes it possible to detect a DTMF (dual-tone multifrequency) signal in telephony.
  • TRF fast Fourier transform
  • the signal-to-noise ratio is the essential criterion that defines the type of signal.
  • Detection of the presence or absence of a mains disturbance signal at 50 Hz (or 60 Hz) is carried out in block 22; the level of the detection threshold is a function of the level of the desired signal, so as to avoid confusing the electromagnetic disturbance (50 or 60 Hz) with the fundamental of a musical instrument.
  • A computation of the dynamics of the amplitudes of the frequency components, or modules, is carried out in block 23; the aforesaid frequency dynamics are used for the coding as well as for the suppression of the inaudible signals subsequently carried out in block 25.
  • The frequency plane is subdivided into several parts, each of which has several amplitude ranges differentiated according to the type of signal detected at block 21.
  • The temporal interpolation and the frequency interpolation are removed at block 24; these had been introduced to optimize the quality of the signal.
  • The frequency interpolation depends on the variation of the "pitch"; it is removed according to the offset of a certain number of samples and the direction of the "pitch" variation.
  • The amplitudes below the lower limit of the amplitude range are eliminated, as are the frequencies whose separation is smaller than one frequency unit, a frequency unit being defined as the sampling frequency divided by the number of samples.
  • The inaudible components are eliminated by means of a test between the amplitude of the frequency component under test and the amplitudes of the adjacent components multiplied by an attenuation term that depends on the difference between their frequencies.
  • The number of frequency components is limited to a value beyond which the difference in the result obtained is not perceptible.
  • The calculation of the "pitch" on the frequency signal must make it possible to decide whether it should be used in the coding, knowing that the use of the "pitch" in the coding greatly reduces the amount of coded data and makes the synthesized voice more natural; it is also used by the noise filter.
  • The principle of the "pitch" calculation consists in synthesizing the signal as a sum of cosines whose phases at the origin are zero; the shape of the original signal is thus reconstituted without the disturbances of the envelope, the phases and the "pitch" variation.
  • The value of the frequency-domain "pitch" is given by the value of the temporal "pitch", which is the first synthesis lag whose maximum exceeds the product of a coefficient by the sum of the modules used for the local synthesis (the sum of the cosines of said modules); this coefficient is equal to the ratio of the energy of the signal considered as harmonic to the sum of the energy of the noise and the energy of the signal; the more the "pitch" to be detected is embedded in noise, the lower this coefficient; for example, a signal-to-noise ratio of 0 decibels corresponds to a coefficient of 0.5.
  • The validation information of the frequency-domain "pitch" is obtained from the ratio of the synthesis sample at the location of the "pitch" to the sum of the modules used for the local synthesis; this ratio, synonymous with the energy of the harmonic signal over the total energy of the signal, is corrected according to the approximate signal-to-noise ratio calculated in block 14; the "pitch" validation information depends on this ratio exceeding a threshold.
  • The values of said modules are not limited for the second local synthesis; only the number of frequencies is limited, by taking into account only those which have a significant modulus, in order to limit the noise.
  • A second method of calculating the "pitch" consists in selecting the "pitch" which gives the maximum energy for a synthesis sampling step equal to the sought "pitch"; this method is used for music or for a sound environment with multiple voices. (A sketch of the first, threshold-based method follows below.)
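  • A sketch of the threshold-based method described above: the frame is rebuilt as a sum of zero-phase cosines and the first lag whose synthesis value exceeds the coefficient times the sum of the modules is kept. The search bounds `f0_min`/`f0_max` and the default coefficient (0.5, i.e. a 0 dB signal-to-noise ratio) are illustrative assumptions.

```python
import numpy as np

def pitch_from_frequencies(freqs_hz, modules, fs, snr_coef=0.5,
                           f0_min=60.0, f0_max=500.0):
    """Frequency-domain "pitch" check by zero-phase cosine resynthesis.
    Sketch only: search bounds and the SNR-dependent coefficient are assumed."""
    freqs_hz = np.asarray(freqs_hz, float)
    modules = np.asarray(modules, float)
    lags = np.arange(int(fs / f0_max), int(fs / f0_min))     # candidate periods
    t = lags / fs
    synth = np.sum(modules[:, None] * np.cos(2 * np.pi * freqs_hz[:, None] * t),
                   axis=0)
    above = np.nonzero(synth > snr_coef * np.sum(modules))[0]
    if above.size == 0:
        return None                      # no valid "pitch": validation fails
    return float(fs) / lags[above[0]]    # "pitch" frequency in Hz
```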
  • The analysis ends with a processing step that attenuates the noise, in block 28, by decreasing the frequency components which are not multiples of the "pitch"; after attenuation of said frequency components, the suppression of the inaudible signal, as described previously for block 25, is carried out again.
  • The attenuation of said frequency components is a function of the type of signal as defined previously by block 21.
  • The formatting of the modules (block 31) consists in removing the attenuation introduced by the input filter of the analysis samples (block 1 of block A1) and in taking into account the direction of the "pitch" variation, because the synthesis is performed temporally by a phase increment of a sine.
  • The validation information of the "pitch" is deleted if the music synthesis option is enabled; this option improves the phase calculation of the frequencies by avoiding synchronizing the phases of the harmonics with one another according to the "pitch".
  • The noise reduction (block 32) is performed if it has not already been performed during the analysis (block 28 of block A1).
  • The signal level restoration (block 33) removes the normalization of the modules received from the analysis; it consists in multiplying the modules by the inverse of the normalization gain defined in the calculation of the signal dynamics (block 6 of block A1), and in multiplying said modules by 4 in order to eliminate the effect of the HAMMING window, only half of the frequency plane being used.
  • The saturation of the modules is performed if the sum of the modules is greater than the signal dynamics of the output samples; it consists in multiplying the modules by the ratio of the maximum allowed value of the sum of the modules to the actual sum of the modules, in the case where said ratio is less than 1.
  • The pulse is regenerated by computing the sum of sines over the pulse duration; the pulse parameters are modified (block 35) according to the variable synthesis speed.
  • The frequency phases are then calculated (block 36); this step aims to give phase continuity between the frequencies of successive frames or to re-synchronize the phases with one another; it also makes the voice more natural.
  • Phase continuity consists in looking, among the frequencies at the beginning of the current frame, for those which are closest to the frequencies at the end of the previous frame; the phase of each frequency then becomes equal to that of the nearest preceding frequency, knowing that the frequencies at the beginning of the current frame are calculated from the central value of the frequency modified by the "pitch" variation.
  • The phases of the harmonics are synchronized to that of the "pitch" by multiplying the phase of the "pitch" by the index of the harmonic of the "pitch"; as for phase continuity, the phase of the "pitch" at the end of the frame is calculated from its variation and from its phase at the origin of the frame; this phase is used for the beginning of the next frame.
  • A second solution is no longer to apply the "pitch" variation to the "pitch" in order to obtain the new phase; it suffices to take over the phase of the "pitch" at the end of the previous frame; moreover, during the synthesis, the "pitch" variation is applied to the interpolation of the synthesis carried out without "pitch" variation. (A sketch of this phase propagation follows below.)
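  • A small sketch of the phase handling described above, under the stated assumption that the frequency varies linearly over the frame: the "pitch" phase is propagated to the end of the frame, and the harmonic phases are locked onto it by multiplication by the harmonic index.

```python
import numpy as np

def propagate_pitch_phase(phi_start, f_start_hz, f_end_hz, frame_len, fs):
    """Phase of the "pitch" at the end of the frame for a linear frequency
    ramp; reused as the starting phase of the next frame. Sketch only."""
    f_mean = 0.5 * (f_start_hz + f_end_hz)
    return (phi_start + 2.0 * np.pi * f_mean * frame_len / fs) % (2.0 * np.pi)

def harmonic_phases(phi_pitch, n_harmonics):
    """Synchronise the harmonic phases on the "pitch": harmonic h gets h * phi."""
    h = np.arange(1, n_harmonics + 1)
    return (h * phi_pitch) % (2.0 * np.pi)
```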
  • the generation of the breath is then performed (block 37).
  • Any sound signal in the interval of a frame is the sum of sines of fixed amplitude whose frequency is linearly modulated as a function of time, this sum being temporally modulated by the envelope of the signal, the noise being added to this signal prior to said sum.
  • The principle of the noise calculation is based on filtering a white noise with a transversal filter whose coefficients are calculated as the sum of the sines of the frequencies of the signal, whose amplitudes are attenuated according to the values of their frequency and their amplitude.
  • A HAMMING window is then applied to the coefficients to reduce the side lobes.
  • The filtered noise is then saved in two separate parts.
  • A first part makes the link between two successive frames; the connection between two frames is made by overlapping these two frames, each weighted linearly and in opposite directions; said overlap is performed when the signal is sinusoidal; it does not apply to uncorrelated noise; the saved portion of the filtered noise is therefore added without weighting over the overlap area.
  • the second part is intended for the main body of the frame.
  • The link between two frames must, first, allow a smooth transition between the noise filters of two successive frames and, second, prolong the noise of the next frame beyond the overlapping part of the frames if the start of a word (or sound) is detected.
  • The smooth transition between two frames is achieved by summing the white noise filtered by the filter of the previous frame, weighted by a linearly decreasing slope, and the same white noise filtered by the noise filter of the current frame, weighted by the rising slope opposite to that of the previous frame's filter.
  • The energy of the noise is added to the energy of the sum of sines, according to the proposed method.
  • The generation of a pulse differs from that of a signal without pulse: in the case of the generation of a pulse, the sum of sines is carried out only on a part of the current frame, to which is added the sum of sines of the previous frame. (A sketch of the breath generation and cross-fade follows below.)
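  • A sketch of the breath (noise) generation described above: white noise is shaped by an FIR filter whose coefficients are a Hamming-windowed sum of sines built from the frame's frequency components, and a linear cross-fade hands over from the previous frame's filter. The tap count, output length and attenuation law are assumptions, not values from the patent.

```python
import numpy as np

def breath_noise(freqs_hz, amps, fs, n_taps=64, n_out=512, prev_coeffs=None):
    """Shape white noise with a transversal (FIR) filter built as a
    Hamming-windowed sum of sines; cross-fade with the previous frame's
    filter if one is given. Sketch with assumed parameters."""
    freqs = np.asarray(freqs_hz, float)
    # attenuate each component according to its frequency and amplitude (assumed law)
    gains = np.asarray(amps, float) / (1.0 + freqs / 1000.0)
    t = np.arange(n_taps) / fs
    coeffs = np.sum(gains[:, None] * np.sin(2 * np.pi * freqs[:, None] * t), axis=0)
    coeffs *= np.hamming(n_taps)                       # reduce the side lobes

    white = np.random.randn(n_out + n_taps)            # same noise for both filters
    current = np.convolve(white, coeffs, mode="valid")[:n_out]
    if prev_coeffs is not None:                        # smooth hand-over between frames
        previous = np.convolve(white, prev_coeffs, mode="valid")[:n_out]
        ramp = np.linspace(0.0, 1.0, n_out)
        current = ramp * current + (1.0 - ramp) * previous
    return current, coeffs                             # coeffs = next frame's prev_coeffs
```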
  • The synthesis with the new frequency data (block 39) consists in carrying out the sum of the sines of the frequency components of the current frame; varying the length of the frame makes it possible to perform a variable-speed synthesis; nevertheless the values of the frequencies at the beginning and at the end of the frame must be identical, whatever the length of the frame, for a given synthesis speed.
  • The phase associated with each sine, a function of the frequency, is calculated by iteration: at each iteration the sine is computed and multiplied by the module, and the result is summed for each sample over all the frequencies of the signal.
  • Another synthesis method is to carry out the inverse of the analysis by recreating the frequency domain from the cardinal sine built with the module, the frequency and the phase, then performing an inverse fast Fourier transform (FFT), followed by the product with the inverse of the HAMMING window, to obtain the time-domain signal.
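  • A minimal sketch of the first method (sum of sines with a linear frequency ramp per component, the phase being accumulated sample by sample); envelope application and the overlap with adjacent frames are handled elsewhere and are not shown.

```python
import numpy as np

def synthesize_frame(f_start_hz, f_end_hz, amps, phases, frame_len, fs):
    """Sum-of-sines synthesis for one frame: fixed amplitudes, frequencies
    moving linearly from start to end values. Sketch only."""
    f_start = np.asarray(f_start_hz, float)[:, None]
    f_end = np.asarray(f_end_hz, float)[:, None]
    amps = np.asarray(amps, float)[:, None]
    phases = np.asarray(phases, float)[:, None]

    n = np.arange(frame_len)[None, :]
    inst_freq = f_start + (f_end - f_start) * n / frame_len      # Hz, per sample
    # accumulate the phase increment 2*pi*f[n]/fs at every sample
    phase = phases + 2.0 * np.pi * np.cumsum(inst_freq, axis=1) / fs
    return np.sum(amps * np.sin(phase), axis=0)
```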
  • the phases at the origin of the frequency data are maintained at the value 0.
  • The calculation of the sum of sines is also performed on a portion preceding the frame and on the same portion following the frame; the parts at both ends of the frame are then summed with those of the adjacent frames by linear weighting.
  • The sum of sines is made over the generation interval of the pulse; in order to avoid creating spurious pulses due to discontinuities in the calculation of the sum of sines, a certain number of samples located at the beginning and at the end of the sequence are weighted respectively by a rising slope and a falling slope.
  • The synthesis by the sum of sines with the data of the previous frame (block 41) is performed when the current frame contains a pulse to be generated; indeed, in the case of music or noise, if the synthesis is not performed on the previous frame, serving as a background signal, the pulse would be generated over a silence, which is detrimental to the quality of the result obtained; moreover, the continuation of the previous frame is inaudible, even in the presence of a progression of the signal.
  • The application of the envelope to the synthesis signal (block 42) is performed from the sampled values of the envelope previously determined (block 2 of block A3); moreover, the connection between two successive frames is performed by the weighted sum, as indicated above; this weighting by the increasing and decreasing curves is not applied to the noise, because the noise is not overlapped between frames.
  • The length of the frame varies in steps in order to be consistent with the sampling of the envelope.
  • The overlap weighting between two frames is then performed (block 45) as indicated above.
  • The frame edge is saved (block 47) so that it can be added to the beginning of the next frame.
  • The coding of the parameters (block A2) calculated in the analysis (block A1) in the method according to the invention consists in limiting the quantity of useful information in order to reproduce at the synthesis (block C3), after decoding (block C1), an auditory equivalent of the original audio signal.
  • Each coded frame has its own number of information bits; the audio signal being variable, more or less information is coded.
  • The coding of the parameters can be either linear, the number of bits being a function of the number of values, or of the HUFFMAN type, the number of bits being a statistical function of the value to be encoded (the more frequent the value, the fewer bits it uses, and vice versa).
  • The type of signal, as defined during the analysis (block 21 of block A1), provides the noise-generation information and the coding quality to be used; the coding of the type of signal is carried out first (block 51).
  • A test is then performed (block 52) which, in the case of signal type 3 as defined in block 21 of the analysis (block A1), avoids coding the parameters; the synthesis then produces null samples.
  • The encoding of the type of compression (block 53) is used when the user wishes to act on the bit rate of the coded data, to the detriment of quality; this option can be advantageous in telecommunication mode, associated with a high compression ratio.
  • the coding of the normalization value (block 54) of the signal of the analysis frame is of the HUFFMAN type.
  • A test on the presence of a pulse (block 55) is then performed, making it possible, if a pulse is to be synthesized, to code the parameters of said pulse.
  • The coding of the parameters of said pulse (block 56) is performed on the beginning and the end of said pulse in the current frame.
  • The coding of the Doppler variation of the "pitch" (block 57) is done according to a logarithmic law, taking into account the sign of said variation; this coding is not performed in the presence of a pulse or if the type of signal is unvoiced.
  • A limitation of the number of frequencies to be coded (block 58) is then performed in order to prevent a high-value frequency from exceeding the dynamic range bounded by the sampling frequency, since the Doppler variation of the "pitch" shifts the frequencies during the synthesis.
  • The encoding of the sampling values of the envelope depends on the variation of the signal, the type of compression, the type of signal, the normalization value and the possible presence of a pulse; said coding consists in coding the variations and the minimum value of said sampling values.
  • The validation of the "pitch" is then coded (block 60), followed by a validation test (block 61) requiring, if positive, the harmonic frequencies to be coded (block 62) according to their index with respect to the frequency of the "pitch"; the non-harmonic frequencies are coded (block 63) according to their integer part.
  • The coding of the harmonic frequencies (block 62) consists in a logarithmic coding of the "pitch", in order to obtain the same relative precision for each harmonic frequency; the coding of said harmonic indices is performed according to their presence or absence, by packets of three indices, using HUFFMAN coding.
  • Frequencies that have not been detected as harmonic to the pitch frequency will be coded separately (block 63).
  • When a non-harmonic frequency changes position with respect to a harmonic frequency, the non-harmonic frequency that is too close to the harmonic frequency is suppressed, since it has less weight in the audible sense; the suppression thus takes place if the non-harmonic frequency is greater than the harmonic frequency and the truncation of the non-harmonic frequency due to the coding of its integer part makes said non-harmonic frequency lower than the nearby harmonic frequency.
  • The coding of the non-harmonic frequencies (block 63) consists in coding the number of non-harmonic frequencies, then the integer part of the frequencies, then the fractional parts when the modules are coded; concerning the coding of the integer part of the frequencies, only the gaps between said integer parts are coded; moreover, the smaller the module, the lower the precision on the fractional part, in order to decrease the bit rate. (A sketch of this delta coding follows below.)
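  • A small sketch of the delta coding of the integer parts described above; `max_gap` is an assumed bound on a single coded difference (compare the maximum number of differences mentioned next), and the fractional-part coding with module-dependent precision is not shown.

```python
def code_nonharmonic_frequencies(freqs_bins, max_gap=255):
    """Delta-code the integer parts of the non-harmonic frequencies: only the
    gaps between successive integer parts are kept. Sketch with an assumed
    `max_gap` bound."""
    ints = sorted(int(f) for f in freqs_bins)
    gaps, prev = [], 0
    for value in ints:
        gap = value - prev
        if gap > max_gap:
            raise ValueError("gap exceeds the assumed coding range")
        gaps.append(gap)          # one coded difference per frequency
        prev = value
    return len(ints), gaps        # count, then the gaps to be entropy-coded
```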
  • a maximum number of differences between two frequencies is defined.
  • the coding of the dynamics of the modules uses a HUFFMAN law as a function of the number of ranges defining said dynamics and the type of signal.
  • For one type of signal, the signal energy lies in the low frequencies; for the other types of signal, the energy is distributed uniformly over the frequency plane, with a decrease towards the high frequencies.
  • The coding of the highest module (block 65) consists in coding, according to a HUFFMAN law, the integer part of said highest module, taking into account the statistics of said highest module.
  • The coding of the modules (block 66) is performed only if the number of modules to be coded is greater than 1, since otherwise the only module is the highest one.
  • The suppression of the inaudible signal eliminates the modules lower than the product of a neighbouring module by the corresponding attenuation; a module is therefore necessarily located in a zone of the module/frequency plane that depends on the distance separating it from its two adjacent modules, as a function of the frequency deviation of said adjacent modules.
  • The value of the module is approximated relative to the preceding module as a function of the frequency difference and of the corresponding attenuation, which depends on the type of signal, the normalization value and the type of compression; said approximation of the value of the module is made with reference to a scale whose step varies according to a logarithmic law.
  • the coding of the attenuation (block 67) provided by the sample input filter is performed, followed by the deletion of the normalization (block 68) which makes it possible to recalculate the highest module and the corresponding frequency.
  • the coding of the frequency fractions of the non-harmonic frequencies completes the coding of the integer parts of said frequencies.
  • the coding of the number of coding bytes (block 70) is carried out after the coding of the various parameters mentioned above, stored in a dedicated coding memory.
  • the decoding phase of the parameters is represented by the block C1.
  • Decoding being the inverse of the coding, the exploitation of the coding bits of the various parameters mentioned above will make it possible to recover the original values of the parameters, with possible approximations.
  • The phase of noise filtering and special effects generation from the analysis, without going through the synthesis, is indicated by block D.
  • The noise filtering is performed from the voice parameters calculated in the analysis (block A1 of block A), following path IV indicated on the simplified flowchart of the method according to the invention.
  • The purpose of the noise filtering is therefore to reduce all kinds of noise, such as ambient noise from a car, an engine, a crowd, music, or other voices if they are weaker than the voice to be kept, as well as the computation noise of any vocoder (for example ADPCM, GSM, G723).
  • The majority of noises have their energy in the low frequencies; using the analysis signal, already filtered by the sample input filter, reduces the very low frequency noise by the same amount.
  • The noise filtering (block D) for a voiced signal consists in producing, for each sample, the sum of the original signal, of the original signal shifted by one "pitch" in the positive direction and of the original signal shifted by one "pitch" in the negative direction.
  • The two shifted signals are multiplied by the same coefficient, and the non-shifted original signal by a second coefficient; the sum of said first coefficient added to itself and of said second coefficient is equal to 1, reduced if necessary so as to maintain an equivalent level of the resulting signal.
  • The number of samples spaced by one temporal "pitch" is not limited to three; the more samples are used for the noise filter, the more the filter reduces the noise.
  • The number of three samples is suited to the highest temporal "pitch" encountered in the voice and to the filter delay.
  • The lower the temporal "pitch", the more samples shifted by one "pitch" can be used to carry out the filtering; this amounts to keeping the bandwidth around a harmonic almost constant; the higher the fundamental, the wider the attenuated bandwidth. (A sketch of the three-tap case follows below.)
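  • A three-tap sketch of the voiced-signal noise filter described above; the coefficient values are illustrative, chosen so that 2*shifted_coef + center_coef stays at (or just below) 1, and more taps spaced by one "pitch" period would attenuate the noise further.

```python
import numpy as np

def comb_denoise_voiced(signal, pitch_samples, shifted_coef=0.25, center_coef=0.5):
    """Noise filtering of a voiced signal: each output sample is the weighted
    sum of the current sample and of the samples one "pitch" period earlier
    and later. Three-tap sketch with assumed coefficients."""
    sig = np.asarray(signal, float)
    out = center_coef * sig
    out[pitch_samples:] += shifted_coef * sig[:-pitch_samples]   # copy shifted by +1 pitch
    out[:-pitch_samples] += shifted_coef * sig[pitch_samples:]   # copy shifted by -1 pitch
    return out
```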
  • the noise filtering does not concern signals in the form of pulses; it is therefore necessary to detect the presence of any pulses in the signal.
  • The noise filtering (block D) for an unvoiced signal consists in attenuating said signal by a coefficient of less than 1.
  • In the sum of the three signals mentioned above, the periodic content is correlated and adds coherently, whereas the noise contained in the original signal is not; the sum therefore attenuates its level.
  • The phase of noise filtering and special effects generation from the analysis, without going through the synthesis, may omit the calculation of the "pitch" variation; this makes it possible to obtain a hearing quality close to that obtained with the method described above; in this variant, the functions defined by blocks 11, 12, 15, 16, 17, 18, 19, 25 and 28 are deleted.
  • the "Transvoice” function consists in recreating the harmonic modules from the spectral envelope, the original harmonics are abandoned knowing that the non-harmonic frequencies are not modified; as such, said function “Transvoice” uses the function "Formant” which determines the formant.
  • the transformation of the voice is done realistically because the formant is preserved; a coefficient of multiplication of the harmonic frequencies higher than 1 rejuvenates the voice, even the feminization; conversely, a coefficient of multiplication of the harmonic frequencies lower than 1 makes the voice more serious.
  • the new amplitudes will be multiplied by the ratio of the sum of the input modules of said "Transvoice" function to the sum of the output modules.
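  • A sketch of this formant-preserving transformation: the harmonic frequencies are scaled, their amplitudes are re-read from the spectral envelope, and the overall level is rescaled to that of the input harmonics. The `envelope` argument is an assumption, a callable returning the spectral-envelope amplitude at a given frequency (e.g. an interpolator built from the analysed modules).

```python
import numpy as np

def transvoice(harm_freqs_hz, harm_amps, pitch_ratio, envelope):
    """"Transvoice"-style effect: scale harmonic frequencies by `pitch_ratio`
    and take their new amplitudes from the spectral envelope so the formants
    stay in place. Sketch; `envelope` is an assumed callable."""
    harm_amps = np.asarray(harm_amps, float)
    new_freqs = np.asarray(harm_freqs_hz, float) * pitch_ratio
    new_amps = np.array([envelope(f) for f in new_freqs])
    # keep the output level equal to the input harmonic level
    scale = np.sum(harm_amps) / max(np.sum(new_amps), 1e-12)
    return new_freqs, new_amps * scale
```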
  • Said "Formant” function can be applied during the coding of the modules, the frequencies, the amplitude ranges and the frequency fractions, by performing the said coding only on the essential parameters of the formant, the "pitch" being validated.
  • the frequencies and the modules are recalculated from the "pitch” and the spectral envelope, respectively.
  • the bit rate is reduced; nevertheless, this approach is applicable only to the voice.
  • this coefficient of multiplication is a function of the ratio between the new "pitch” and the real “pitch”
  • the voice will be characterized by a fixed "pitch” and a variable formant; it will thus be transformed into a robot voice associated with a spatial effect.
  • this multiplication coefficient varies periodically or randomly, at low frequency, the voice is aged associated with a very low frequency.
  • a last solution is to perform a fixed rate coding.
  • The type of signal is reduced to the voiced signal (types 0 and 2 with the "pitch" validation at 1) or to noise (types 1 and 2 with the "pitch" validation at 0).
  • Type 2 being reserved for music, it is eliminated in this case, since this coding can only encode the voice.
  • The "pitch" provides all the harmonics of the voice; their amplitudes are those of the formant.
  • The frequencies of the unvoiced signal are spaced apart from each other by an average value to which a random deviation is added; the amplitudes are those of the formant.
  • The device may include all the elements mentioned above in a professional or semi-professional version; some elements, such as the display, can be simplified in a basic version.
  • the device according to the invention can exploit the process of differentiated digital processing of voice and music, noise filtering and the creation of special effects.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Electrophonic Musical Instruments (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)
  • Noise Elimination (AREA)
  • Signal Processing Not Specific To The Method Of Recording And Reproducing (AREA)

Claims (21)

  1. Method for the differentiated digital processing of a sound signal which, in the interval of a frame, is formed by the sum of sines of fixed amplitude whose frequency is modulated linearly as a function of time, this sum being temporally modulated by an envelope, the noise of the sound signal being added to the signal before the sum,
    characterized in that it comprises:
    an analysis step which makes it possible to determine parameters representative of this sound signal, by
    • a calculation of the envelope of the signal,
    • a calculation of the "pitch" of the sound signal and of its variation,
    • an application of the inverse variation of the "pitch" to the time signal, which consists in performing a temporal sampling of the sound signal with a variable sampling step, this step varying with the inverse value of the variation of the "pitch",
    • a fast Fourier transform (FFT) on the pre-processed signal,
    • an extraction of the frequency components of the signal and of their amplitudes from the result of the fast Fourier transform,
    • a calculation of the "pitch" in the frequency domain and of its variation with respect to the previously calculated "pitch", so that the accuracy of the previously calculated "pitch" is improved.
  2. Method according to claim 1,
    characterized in that it further comprises a step of synthesis from the representative parameters, which makes it possible to reconstitute the sound signal.
  3. Method according to the preceding claims,
    characterized in that it further comprises a step of coding and decoding the parameters representative of the sound signal.
  4. Method according to the preceding claims,
    characterized in that it further comprises a step of noise filtering and a step of generating special effects from the analysis, without going through the synthesis.
  5. Method according to the preceding claims,
    characterized in that it further comprises a step of generating special effects associated with the synthesis.
  6. Method according to claim 2,
    characterized in that the synthesis step comprises:
    • a summation of the sines, the amplitude of the frequency components varying as a function of the envelope of the signal and their frequencies varying linearly,
    • a calculation of the phases as a function of the value of the frequencies and of the values of the phases and frequencies belonging to the previous frame,
    • a superposition of the noise,
    • an application of the envelope.
  7. Method according to claim 4,
    characterized in that the noise filtering step and the step of generating special effects from the analysis, without going through the synthesis, comprise a sum of the original signal, of the original signal shifted by one "pitch" in the positive direction and of the original signal shifted by one "pitch" in the negative direction.
  8. Method according to claim 7,
    characterized in that the shifted signals are multiplied by one and the same coefficient and the original signal by a second coefficient, the sum of the first coefficient added to itself and of the second coefficient being equal to 1, reduced so that an equivalent level of the resulting signal is maintained.
  9. Method according to claim 7,
    characterized in that the filtering step and the step of generating special effects from the analysis, without going through the synthesis, comprise:
    • a division of the temporal value of the "pitch" by two,
    • a modification of the amplitudes of the original signal and of the two shifted signals.
  10. Method according to claim 7,
    characterized in that the filtering step and the step of generating special effects from the analysis, without going through the synthesis, comprise:
    • a multiplication of each sample of the original voice by a cosine varying at the rate of half the fundamental (multiplication of the number of frequencies by two) or at the rate of one third of the fundamental (multiplication of the number of frequencies by three),
    • then an addition of the result obtained to the original voice.
  11. Method according to claim 5,
    characterized in that the step of generating special effects associated with the synthesis comprises:
    • a multiplication of all the frequencies of the frequency components of the original signal, taken individually, by a coefficient,
    • a regeneration of the modules of the harmonics from the spectral envelope of the original signal.
  12. Method according to claim 11,
    characterized in that
    the multiplication coefficient of the frequency components is:
    • a coefficient depending on the ratio between the new "pitch" and the real "pitch",
    • a coefficient varying periodically or randomly at low frequency.
  13. Device for the differentiated digital processing of a sound signal which, in the interval of a frame, is formed by the sum of sines of fixed amplitude whose frequency is modulated linearly as a function of time, this sum being temporally modulated by an envelope, the noise of the sound signal being added to the signal before the sum,
    characterized in that it comprises analysis means which make it possible to determine parameters representative of the sound signal,
    comprising:
    • means for calculating the envelope of the signal,
    • means for calculating the "pitch" and its variation,
    • means for applying the inverse variation of the "pitch" to the time signal, which consists in performing a temporal sampling of the sound signal with a variable sampling step, this step varying with the inverse value of the variation of the "pitch",
    • means for performing a fast Fourier transform (FFT) on the pre-processed signal,
    • means for extracting the frequency components of the signal and their amplitudes from the result of the fast Fourier transform,
    • means for calculating the "pitch" in the frequency domain and its variation with respect to the previously calculated "pitch", so that the accuracy of this previously calculated "pitch" is improved.
  14. Device according to claim 13, characterized in that it further comprises:
    - means for synthesis from the representative parameters, making it possible to reconstitute the sound signal, and/or
    - means for coding and decoding the parameters representative of the sound signal, and/or
    - means for filtering the noise and for generating special effects from the analysis, without going through the synthesis, and/or
    - means for generating special effects associated with the synthesis.
  15. Device according to claim 14,
    characterized in that the synthesis means comprise:
    • means for summing the sines, the amplitude of the frequency components varying as a function of the envelope of the signal,
    • means for calculating the phases as a function of the value of the frequencies and of the values of the phases and frequencies belonging to the previous frame,
    • means for superposing the noise,
    • means for applying the envelope.
  16. Device according to claim 13,
    characterized in that the means for filtering the noise and for generating special effects from the analysis, without going through the synthesis, comprise means for summing the original signal, the signal shifted by one "pitch" in the positive direction and the signal shifted by one "pitch" in the negative direction.
  17. Device according to claim 16,
    characterized in that the shifted signals are multiplied by one and the same coefficient and the original signal by a second coefficient, the sum of the first coefficient added to itself and of the second coefficient being equal to 1, reduced so that an equivalent level of the resulting signal is maintained.
  18. Device according to claim 14,
    characterized in that the means for filtering and for generating special effects from the analysis, without going through the synthesis, comprise:
    • means for dividing the temporal value of the "pitch" by two,
    • means for modifying the amplitudes of the original signal and of the two shifted signals.
  19. Device according to claim 14,
    characterized in that the means for filtering and for generating special effects from the analysis, without going through the synthesis, comprise:
    • means for multiplying each sample of the original voice by a cosine varying at the rate of half the fundamental (multiplication of the number of frequencies by two) or at the rate of one third of the fundamental (multiplication of the number of frequencies by three),
    • means for then adding the result obtained to the original voice.
  20. Device according to claim 14,
    characterized in that the means for generating special effects associated with the synthesis comprise:
    • means for multiplying all the frequencies of the frequency components of the original signal, taken individually, by a coefficient,
    • means for regenerating the modules of the harmonics from the spectral envelope of the original signal.
  21. Device according to claim 20,
    characterized in that the coefficient for multiplying the frequency components is:
    • a coefficient depending on the ratio between the new "pitch" and the real "pitch",
    • a coefficient varying periodically at low frequency.
EP04705433A 2003-01-30 2004-01-27 Verfahren zur differenzierten digitalen Sprach- und Musikbearbeitung, Rauschfilterung, Erzeugung von Spezialeffekten und Einrichtung zum Ausführen des Verfahrens Expired - Lifetime EP1593116B1 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
FR0301081 2003-01-30
FR0301081A FR2850781B1 (fr) 2003-01-30 2003-01-30 Procede pour le traitement numerique differencie de la voix et de la musique, le filtrage du bruit, la creation d'effets speciaux et dispositif pour la mise en oeuvre dudit procede
PCT/FR2004/000184 WO2004070705A1 (fr) 2003-01-30 2004-01-27 Procede pour le traitement numerique differencie de la voix et de la musique, le filtrage de bruit, la creation d’effets speciaux et dispositif pour la mise en oeuvre dudit procede

Publications (2)

Publication Number Publication Date
EP1593116A1 EP1593116A1 (de) 2005-11-09
EP1593116B1 true EP1593116B1 (de) 2010-03-10

Family

ID=32696232

Family Applications (1)

Application Number Title Priority Date Filing Date
EP04705433A Expired - Lifetime EP1593116B1 (de) 2003-01-30 2004-01-27 Verfahren zur differenzierten digitalen Sprach- und Musikbearbeitung, Rauschfilterung, Erzeugung von Spezialeffekten und Einrichtung zum Ausführen des Verfahrens

Country Status (7)

Country Link
US (1) US8229738B2 (de)
EP (1) EP1593116B1 (de)
AT (1) ATE460726T1 (de)
DE (1) DE602004025903D1 (de)
ES (1) ES2342601T3 (de)
FR (1) FR2850781B1 (de)
WO (1) WO2004070705A1 (de)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100547113B1 (ko) * 2003-02-15 2006-01-26 삼성전자주식회사 오디오 데이터 인코딩 장치 및 방법
US20050226601A1 (en) * 2004-04-08 2005-10-13 Alon Cohen Device, system and method for synchronizing an effect to a media presentation
JP2007114417A (ja) * 2005-10-19 2007-05-10 Fujitsu Ltd 音声データ処理方法及び装置
US7772478B2 (en) * 2006-04-12 2010-08-10 Massachusetts Institute Of Technology Understanding music
US7622665B2 (en) * 2006-09-19 2009-11-24 Casio Computer Co., Ltd. Filter device and electronic musical instrument using the filter device
FR2912249A1 (fr) * 2007-02-02 2008-08-08 France Telecom Codage/decodage perfectionnes de signaux audionumeriques.
CA2690433C (en) * 2007-06-22 2016-01-19 Voiceage Corporation Method and device for sound activity detection and sound signal classification
KR101410230B1 (ko) * 2007-08-17 2014-06-20 삼성전자주식회사 종지 정현파 신호와 일반적인 연속 정현파 신호를 다른방식으로 처리하는 오디오 신호 인코딩 방법 및 장치와오디오 신호 디코딩 방법 및 장치
US8315398B2 (en) 2007-12-21 2012-11-20 Dts Llc System for adjusting perceived loudness of audio signals
US20100329471A1 (en) * 2008-12-16 2010-12-30 Manufacturing Resources International, Inc. Ambient noise compensation system
US8670990B2 (en) * 2009-08-03 2014-03-11 Broadcom Corporation Dynamic time scale modification for reduced bit rate audio coding
WO2011019339A1 (en) * 2009-08-11 2011-02-17 Srs Labs, Inc. System for increasing perceived loudness of speakers
US8538042B2 (en) 2009-08-11 2013-09-17 Dts Llc System for increasing perceived loudness of speakers
US8204742B2 (en) 2009-09-14 2012-06-19 Srs Labs, Inc. System for processing an audio signal to enhance speech intelligibility
EP2492911B1 (de) * 2009-10-21 2017-08-16 Panasonic Intellectual Property Management Co., Ltd. Audiokodierungsvorrichtung, dekodierungsvorrichtung, verfahren, schaltung und programm
EP2737479B1 (de) 2011-07-29 2017-01-18 Dts Llc Adaptive sprachverständlichkeitsverbesserung
US9312829B2 (en) 2012-04-12 2016-04-12 Dts Llc System for adjusting loudness of audio signals in real time
US9318086B1 (en) * 2012-09-07 2016-04-19 Jerry A. Miller Musical instrument and vocal effects
JP5974369B2 (ja) * 2012-12-26 2016-08-23 カルソニックカンセイ株式会社 ブザー出力制御装置およびブザー出力制御方法
US9484044B1 (en) * 2013-07-17 2016-11-01 Knuedge Incorporated Voice enhancement and/or speech features extraction on noisy audio signals using successively refined transforms
US9530434B1 (en) 2013-07-18 2016-12-27 Knuedge Incorporated Reducing octave errors during pitch determination for noisy audio signals
US20150179181A1 (en) * 2013-12-20 2015-06-25 Microsoft Corporation Adapting audio based upon detected environmental accoustics
JP6402477B2 (ja) * 2014-04-25 2018-10-10 カシオ計算機株式会社 サンプリング装置、電子楽器、方法、およびプログラム
TWI569263B (zh) * 2015-04-30 2017-02-01 智原科技股份有限公司 聲頻訊號的訊號擷取方法與裝置
CN112908352B (zh) * 2021-03-01 2024-04-16 百果园技术(新加坡)有限公司 一种音频去噪方法、装置、电子设备及存储介质
US20230154480A1 (en) * 2021-11-18 2023-05-18 Tencent America LLC Adl-ufe: all deep learning unified front-end system
US20230289652A1 (en) * 2022-03-14 2023-09-14 Matthias THÖMEL Self-learning audio monitoring system

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4201105A (en) * 1978-05-01 1980-05-06 Bell Telephone Laboratories, Incorporated Real time digital sound synthesizer
US4357852A (en) * 1979-05-21 1982-11-09 Roland Corporation Guitar synthesizer
US5054072A (en) * 1987-04-02 1991-10-01 Massachusetts Institute Of Technology Coding of acoustic waveforms
JP3351905B2 (ja) * 1994-07-28 2002-12-03 ソニー株式会社 Speech signal processing device
WO1997017692A1 (en) * 1995-11-07 1997-05-15 Euphonics, Incorporated Parametric signal modeling musical synthesizer
US6031173A (en) * 1997-09-30 2000-02-29 Kawai Musical Inst. Mfg. Co., Ltd. Apparatus for generating musical tones using impulse response signals
US7072832B1 (en) * 1998-08-24 2006-07-04 Mindspeed Technologies, Inc. System for speech encoding having an adaptive encoding arrangement
US6240386B1 (en) * 1998-08-24 2001-05-29 Conexant Systems, Inc. Speech codec employing noise classification for noise compensation
JP2000082260A (ja) * 1998-09-04 2000-03-21 Sony Corp Audio signal reproducing device and method
AU2001241475A1 (en) * 2000-02-11 2001-08-20 Comsat Corporation Background noise reduction in sinusoidal based speech coding systems
US20020184009A1 (en) * 2001-05-31 2002-12-05 Heikkinen Ari P. Method and apparatus for improved voicing determination in speech signals containing high levels of jitter

Also Published As

Publication number Publication date
US8229738B2 (en) 2012-07-24
EP1593116A1 (de) 2005-11-09
ATE460726T1 (de) 2010-03-15
DE602004025903D1 (de) 2010-04-22
FR2850781A1 (fr) 2004-08-06
WO2004070705A1 (fr) 2004-08-19
ES2342601T3 (es) 2010-07-09
FR2850781B1 (fr) 2005-05-06
US20060130637A1 (en) 2006-06-22

Similar Documents

Publication Publication Date Title
EP1593116B1 (de) Method for differentiated digital voice and music processing, noise filtering, creation of special effects and device for carrying out said method
EP0002998B1 (de) Method and device for speech data compression
EP2002428B1 (de) Method for trained discrimination and attenuation of echoes of a digital signal in a decoder, and corresponding device
BE1005622A3 (fr) Methods for coding speech segments and adjusting pitch for speech synthesis systems.
EP1692689B1 (de) Optimized multiple coding method
EP1395981B1 (de) Device and method for processing an audio signal
Kumar Real-time performance evaluation of modified cascaded median-based noise estimation for speech enhancement system
EP0428445B1 (de) Method and device for coding prediction filters in very low bit-rate vocoders
EP1849157B1 (de) Method for measuring impairments caused by noise in an audio signal
EP2080194B1 (de) Attenuation of overvoicing, in particular for generating an excitation at a decoder in the absence of information
EP2795618B1 (de) Method for detecting a predetermined frequency band in an audio data signal, detection device and computer program therefor
EP1125283A1 (de) Method for quantizing the parameters of a speech coder
EP3138095A1 (de) Improved frame loss correction using speech information
EP0573358B1 (de) Method and device for variable-speed speech synthesis
EP1192619B1 (de) Audio coding and decoding by interpolation
EP1192618B1 (de) Audio coding with adaptive liftering
EP1192621B1 (de) Audio coding with harmonic components
EP1190414A1 (de) Audio coding and decoding with harmonic components and minimum phase
FR2760285A1 (fr) Method and device for generating a noise signal for the unvoiced output of a decoded speech signal
Kwon An Improved Weighting Function for Low-rate CELP Speech Coding
FR2737360A1 (fr) Methods for coding and decoding audio-frequency signals, and coder and decoder for implementing such methods
FR2739482A1 (fr) Method and device for evaluating the voicing of the speech signal by sub-bands in vocoders
EP1192620A1 (de) Audio signal encoder and decoder including non-harmonic components
EP1194923A1 (de) Method and system for audio analysis and synthesis
FR2847706A1 (fr) Analysis of voice signal quality according to quality criteria

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20050824

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL LT LV MK

DAX Request for extension of the european patent (deleted)
17Q First examination report despatched

Effective date: 20081001

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

RTI1 Title (correction)

Free format text: METHOD FOR DIFFERENTIATED DIGITAL VOICE AND MUSIC PROCESSING, NOISE FILTERING, CREATION OF SPECIAL EFFECTS AND DEVICE FOR CARRYING OUT SAID METHOD

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Free format text: NOT ENGLISH

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

REF Corresponds to:

Ref document number: 602004025903

Country of ref document: DE

Date of ref document: 20100422

Kind code of ref document: P

REG Reference to a national code

Ref country code: NL

Ref legal event code: VDEP

Effective date: 20100310

REG Reference to a national code

Ref country code: ES

Ref legal event code: FG2A

Ref document number: 2342601

Country of ref document: ES

Kind code of ref document: T3

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100310

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100310

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100310

REG Reference to a national code

Ref country code: IE

Ref legal event code: FD4D

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100310

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100310

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100310

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100310

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100611

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100310

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100610

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100310

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100310

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100310

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100712

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100310

26N No opposition filed

Effective date: 20101213

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20110131

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20110131

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20110131

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20110127

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100310

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20100310

REG Reference to a national code

Ref country code: DE

Ref legal event code: R082

Ref document number: 602004025903

Country of ref document: DE

Representative's name: GRAMM, LINS & PARTNER PATENT- UND RECHTSANWAEL, DE

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 13

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 14

REG Reference to a national code

Ref country code: FR

Ref legal event code: PLFP

Year of fee payment: 15

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: ES

Payment date: 20190719

Year of fee payment: 16

Ref country code: IT

Payment date: 20190730

Year of fee payment: 16

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: BE

Payment date: 20190718

Year of fee payment: 16

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20190719

Year of fee payment: 16

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20200127

REG Reference to a national code

Ref country code: BE

Ref legal event code: MM

Effective date: 20200131

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200127

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200131

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200127

REG Reference to a national code

Ref country code: ES

Ref legal event code: FD2A

Effective date: 20210604

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200128

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20220720

Year of fee payment: 19

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20230127

Year of fee payment: 20

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 602004025903

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20230801