CN108885875A - Apparatus and method for improving a transition from a concealed audio signal portion to a succeeding audio signal portion of an audio signal - Google Patents
- Publication number
- CN108885875A (application number CN201780020242.9A)
- Authority
- CN
- China
- Prior art keywords
- audio signal
- sample
- signal parts
- parts
- processor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/005—Correction of errors induced by the transmission channel, if related to the coding algorithm
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/12—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/26—Pre-filtering or post-filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/04—Time compression or expansion
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Signal Processing For Digital Recording And Reproducing (AREA)
- Soundproofing, Sound Blocking, And Sound Damping (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Stereophonic System (AREA)
Abstract
An apparatus (10) for improving a transition from a concealed audio signal portion of an audio signal to a succeeding audio signal portion of the audio signal is provided. The apparatus (10) comprises a processor (11) configured to generate a decoded audio signal portion of the audio signal depending on a first audio signal portion and on a second audio signal portion, wherein the first audio signal portion depends on the concealed audio signal portion and the second audio signal portion depends on the succeeding audio signal portion. The apparatus (10) further comprises an output interface (12) for outputting the decoded audio signal portion. Each of the first audio signal portion, the second audio signal portion and the decoded audio signal portion comprises a plurality of samples, wherein each of these samples is defined by a sample position out of a plurality of sample positions and by a sample value.
Description
Technical field
The present invention relates to audio signal processing and decoding, and in particular to an apparatus and a method for improving a transition from a concealed audio signal portion to a succeeding audio signal portion of an audio signal.
Background
On error-prone networks, every codec tries to mitigate the artifacts caused by losses. The prior art concentrates on concealing the lost information, using methods that range from simple muting or noise substitution to sophisticated approaches such as prediction based on past good frames. One major source of artifacts caused by packet loss, which is apparently neglected, lies in the recovery (the first few good frames after the loss).
Because long-term prediction is commonly used in speech codecs, recovery artifacts can be severe, and error propagation may affect several subsequent good frames. Some prior art attempts to mitigate this problem; see, for example, [1] and [2].
For general audio codecs (any codec working in the transform domain), a great deal of literature on concealing frame losses can be found (for example, [3]). However, the available prior art does not address the recovery frames. It is assumed that, due to the nature of transform-domain codecs, the overlap-add smooths transition artifacts. A good example is AAC-ELD (AAC-ELD = Advanced Audio Coding - Enhanced Low Delay; see [4]), which is used in FaceTime for communication over IP networks.
The first few frames after a frame loss are called "recovery frames". State-of-the-art transform-domain codecs do not appear to provide any special handling of the one or more recovery frames, and annoying artifacts sometimes occur. One example of a problem that may occur during recovery is the superposition of the concealed waveform and the good waveform in the overlap-add region, which sometimes causes an annoying energy boost.
Another problem is an abrupt pitch change at the frame boundary. For a speech signal, for example, when the pitch of the original signal is changing and a frame loss occurs, the concealment method may predict the pitch at the end of the frame slightly wrongly. Such a slightly wrong prediction can cause a pitch jump into the next good frame. Most known concealment methods do not even use prediction and simply use a fixed pitch based on the last valid pitch, which may cause an even larger mismatch with the first good frame. Some other methods use advanced prediction to reduce the drift, for example, the TD-TCX PLC (TD = time domain; TCX = transform coded excitation; PLC = packet loss concealment) in EVS (EVS = Enhanced Voice Services); see [5].
Prior-art methods for modifying the pitch of a speech signal, for example TD-PSOLA (TD-PSOLA = Time-Domain Pitch-Synchronous Overlap-Add; see [6] and [7]), apply prosodic modifications to the speech signal, such as expansion or contraction of the duration (known as time stretching) or a change of the fundamental frequency (pitch). This is done by decomposing the speech signal into short-term, pitch-synchronous analysis signals, which are then repositioned on the time axis and progressively overlap-added. However, when the pitch in the concealed frame differs from the pitch in the original signal, the signal in the recovery frame is corrupted by the overlap operation. The TD-PSOLA mechanism would merely reposition the artifact on the time axis, which is not suitable for recovery.
Summary of the invention
It is therefore an object of the present invention to provide improved concepts for audio signal processing and decoding. This object is achieved by an apparatus according to claim 1, by a method according to claim 35 and by a computer program according to claim 36.
An apparatus for improving a transition from a concealed audio signal portion of an audio signal to a succeeding audio signal portion of the audio signal is provided.
The apparatus comprises a processor configured to generate a decoded audio signal portion of the audio signal depending on a first audio signal portion and on a second audio signal portion, wherein the first audio signal portion depends on the concealed audio signal portion and the second audio signal portion depends on the succeeding audio signal portion.
Moreover, the apparatus comprises an output interface for outputting the decoded audio signal portion.
Each of the first audio signal portion, the second audio signal portion and the decoded audio signal portion comprises a plurality of samples. Each of these samples is defined by a sample position out of a plurality of sample positions and by a sample value. The plurality of sample positions is ordered such that, for each pair of a first sample position and a different second sample position of the plurality of sample positions, the first sample position is either a successor or a predecessor of the second sample position.
The processor is configured to determine a first sub-portion of the first audio signal portion such that the first sub-portion comprises fewer samples than the first audio signal portion.
The processor is configured to generate the decoded audio signal portion using the first sub-portion of the first audio signal portion and using the second audio signal portion or a second sub-portion of the second audio signal portion, such that, for each sample of two or more samples of the second audio signal portion, the sample position of said sample is equal to the sample position of one of the samples of the decoded audio signal portion, and the sample value of said sample differs from the sample value of said one of the samples of the decoded audio signal portion.
Moreover, a method for improving a transition from a concealed audio signal portion of an audio signal to a succeeding audio signal portion of the audio signal is provided. The method comprises:
Generating a decoded audio signal portion of the audio signal depending on a first audio signal portion and on a second audio signal portion, wherein the first audio signal portion depends on the concealed audio signal portion and the second audio signal portion depends on the succeeding audio signal portion. And:
Outputting the decoded audio signal portion.
Each of the first audio signal portion, the second audio signal portion and the decoded audio signal portion comprises a plurality of samples. Each of these samples is defined by a sample position out of a plurality of sample positions and by a sample value. The plurality of sample positions is ordered such that, for each pair of a first sample position and a different second sample position of the plurality of sample positions, the first sample position is either a successor or a predecessor of the second sample position.
Generating the decoded audio signal portion comprises determining a first sub-portion of the first audio signal portion such that the first sub-portion comprises fewer samples than the first audio signal portion.
Moreover, generating the decoded audio signal portion is conducted using the first sub-portion of the first audio signal portion and using the second audio signal portion or a second sub-portion of the second audio signal portion, such that, for each sample of two or more samples of the second audio signal portion, the sample position of said sample is equal to the sample position of one of the samples of the decoded audio signal portion, and the sample value of said sample differs from the sample value of said one of the samples of the decoded audio signal portion.
Moreover, a computer program configured to implement the above-described method when executed on a computer or signal processor is provided.
Some embodiments provide a recovery filter, a tool for smoothing and repairing the transition from a lost frame to the first good frame in (for example, block-based) audio codecs. According to embodiments, the recovery filter can be used to fix, within the first good frame of a speech signal, a pitch change that occurred during the concealed frame, and it can also be used to smooth the transition of noise-like signals.
In particular, some embodiments are based on the finding that the length of the signal modification is limited: it extends from the last samples at the end of the concealed frame to the last sample of the first good frame. The length could be extended beyond the last sample of the first good frame, but this would risk error propagation, which is difficult to handle in future frames. A fast recovery is therefore needed. To repair speech characteristics when the lost frame and the recovery frame do not match, the pitch of the signal in the recovery frame should change slowly from the pitch in the concealed frame to the pitch in the recovery frame, while the limit on the signal modification length is respected. Using the TD-PSOLA algorithm would be possible only if the pitch changed by integer multiples. Since this is a very rare case, TD-PSOLA cannot be applied here.
Brief description of the drawings
In the following, embodiments of the present invention are described in more detail with reference to the figures, in which:
Fig. 1a shows an apparatus for improving a transition from a concealed audio signal portion of an audio signal to a succeeding audio signal portion of the audio signal according to an embodiment.
Fig. 1b shows an apparatus for improving a transition from a concealed audio signal portion of an audio signal to a succeeding audio signal portion of the audio signal according to another embodiment, which implements a pitch-adaptive overlap concept.
Fig. 1c shows an apparatus for improving a transition from a concealed audio signal portion of an audio signal to a succeeding audio signal portion of the audio signal according to another embodiment, which implements an excitation overlap concept.
Fig. 1d shows an apparatus for improving a transition from a concealed audio signal portion of an audio signal to a succeeding audio signal portion of the audio signal according to another embodiment, which implements energy damping.
Fig. 1e shows an apparatus according to a further embodiment, wherein the apparatus further comprises a concealment unit.
Fig. 1f shows an apparatus according to a further embodiment, wherein the apparatus further comprises an activation unit for activating the concealment unit.
Fig. 1g shows an apparatus according to a further embodiment, wherein the activation unit is further configured to activate the processor.
Fig. 2 shows a Hamming-cosine window according to an embodiment.
Fig. 3 shows a concealed frame and a good frame according to such an embodiment.
Fig. 4 shows the generation of two prototypes for implementing the pitch-adaptive overlap according to an embodiment.
Fig. 5 shows the excitation overlap according to an embodiment.
Fig. 6 shows a concealed frame and a good frame according to an embodiment.
Fig. 7a shows a system according to an embodiment.
Fig. 7b shows a system according to another embodiment.
Fig. 7c shows a system according to a further embodiment.
Fig. 7d shows a system according to a further embodiment. And:
Fig. 7e shows a system according to a further embodiment.
Detailed description
Fig. 1a shows an apparatus 10 for improving a transition from a concealed audio signal portion of an audio signal to a succeeding audio signal portion of the audio signal according to an embodiment.
The apparatus 10 comprises a processor 11 configured to generate a decoded audio signal portion of the audio signal depending on a first audio signal portion and on a second audio signal portion, wherein the first audio signal portion depends on the concealed audio signal portion and the second audio signal portion depends on the succeeding audio signal portion.
In some embodiments, the first audio signal portion may, for example, be derived from the concealed audio signal portion but be different from it, and/or the second audio signal portion may, for example, be derived from the succeeding audio signal portion but be different from it.
In other embodiments, the first audio signal portion may, for example, be (equal to) the concealed audio signal portion, and the second audio signal portion may, for example, be the succeeding audio signal portion.
Moreover, the apparatus 10 comprises an output interface 12 for outputting the decoded audio signal portion.
Each of the first audio signal portion, the second audio signal portion and the decoded audio signal portion comprises a plurality of samples. Each of these samples is defined by a sample position out of a plurality of sample positions and by a sample value. The plurality of sample positions is ordered such that, for each pair of a first sample position and a different second sample position of the plurality of sample positions, the first sample position is either a successor or a predecessor of the second sample position.
For example, a sample is defined by a sample position and a sample value. In a two-dimensional coordinate system, the sample position may define the x-axis value (abscissa) of the sample, and the sample value may define its y-axis value (ordinate). Thus, for a particular sample, all samples located to its left in the coordinate system are predecessors of that sample (because their sample positions are smaller than its sample position), and all samples located to its right are successors of that sample (because their sample positions are greater than its sample position).
The processor 11 is configured to determine a first sub-portion of the first audio signal portion such that the first sub-portion comprises fewer samples than the first audio signal portion.
The processor 11 is configured to generate the decoded audio signal portion using the first sub-portion of the first audio signal portion and using the second audio signal portion or a second sub-portion of the second audio signal portion, such that, for each sample of two or more samples of the second audio signal portion, the sample position of said sample is equal to the sample position of one of the samples of the decoded audio signal portion, and the sample value of said sample differs from the sample value of said one of the samples of the decoded audio signal portion.
Thus, in some embodiments, the processor 11 is configured to generate the decoded audio signal portion using the first sub-portion and using the second audio signal portion.
In other embodiments, the processor 11 generates the decoded audio signal portion using the first sub-portion and using a second sub-portion of the second audio signal portion, where the second sub-portion comprises fewer samples than the second audio signal portion.
Embodiments are based on the finding that it is beneficial to improve the transition from the concealed audio signal portion of the audio signal to the succeeding audio signal portion of the audio signal by modifying the samples of the succeeding audio signal portion, and not only by adjusting the samples of the concealed audio signal portion. By also modifying samples of a correctly received frame, the transition from the concealed audio signal portion (for example, of a concealed audio signal frame) to the succeeding audio signal portion (for example, of a succeeding audio signal frame) can be improved.
Thus, the decoded audio signal portion is generated using the first audio signal portion and the second audio signal portion, but it comprises (at least two) samples whose sample positions also occur in the second audio signal portion (which depends on the succeeding audio signal portion) and whose sample values differ from the sample values at those positions in the second audio signal portion. In other words, for these samples the sample value of the corresponding sample is not used as is; it is modified to obtain the corresponding sample of the decoded audio signal portion.
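The property described above (the decoded portion shares sample positions with the second audio signal portion but carries modified sample values at two or more of them) can be illustrated with a small sketch. It assumes a plain linear cross-fade, which is only one possible way to combine the two portions and is not taken from the patent; the function name `crossfade_into_good_frame` and the example values are hypothetical.

```python
def crossfade_into_good_frame(first_sub, second):
    """Blend a first sub-portion into the start of the second portion.

    first_sub: sample values derived from the concealed side (assumed to be
               aligned with the first len(first_sub) positions of `second`)
    second:    sample values of the second audio signal portion (good frame)
    Returns decoded sample values at the same sample positions as `second`.
    """
    n = len(first_sub)
    if n > len(second):
        raise ValueError("first_sub must not be longer than second")
    decoded = list(second)
    for k in range(n):
        # ASSUMPTION: linear fade from the concealed side to the good frame.
        alpha = (k + 1) / (n + 1)
        decoded[k] = (1.0 - alpha) * first_sub[k] + alpha * second[k]
    return decoded


second = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
first_sub = [0.4, 0.8, 0.9]
decoded = crossfade_into_good_frame(first_sub, second)
# Sample positions at which the good-frame sample value was modified.
changed = [p for p, (d, s) in enumerate(zip(decoded, second)) if d != s]
```

In this example the first three sample values of the good frame are modified, while the remaining samples are passed through unchanged.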
This.
About the first audio signal parts and the second audio signal parts, processor 11 can for example receive the first audio letter
Number part and the second audio signal parts.
Alternatively, in another embodiment, for example, processor 11 can for example receive concealing audio signal section, and can
To determine the first audio signal parts according to concealing audio signal section, and processor 11 can for example receive subsequent audio
Signal section, and the second audio signal parts can be determined according to subsequent audio signal parts.
Alternatively, in another embodiment, for example, processor 11 can for example receive audio signal frame;For example, processor 11
It can determine that the first frame loss or first frame are destroyed.Then, processor 11 can execute hiding, and can be for example according to existing
There is technical concept to generate concealing audio signal section.In addition, processor 11 can for example receive the second audio signal frame, and
Subsequent audio signal parts can be obtained from the second audio signal frame.Fig. 1 e shows such embodiment.
In some embodiments, the first audio signal parts may, for example, be as relative to concealing audio signal section
The residual signals part of first residual signals of residual signals.In some embodiments, for example, the second audio signal parts can be with
It is the residual signals part of the second residual signals as the residual signals relative to subsequent audio signal parts.
In Fig. 1 e, device 10 further includes hidden unit 8, and hidden unit 8 is configured as to error or loss work as
Previous frame, which executes, to be hidden, to obtain concealing audio signal section.
According to the embodiment of Fig. 1 e, which further includes hidden unit 8.Hidden unit 8 can be for example configured as:If
Frame loss is destroyed, then is executed and hidden according to the prior art.Then, concealing audio signal section is delivered to by hidden unit 8
Processor 11.In such embodiments, concealing audio signal section may, for example, be being performed hiding error or lose
The concealing audio signal section of the frame of mistake.Subsequent audio signal frame, which may, for example, be, is not performed hiding (subsequent) audio signal
The subsequent audio signal parts of frame.Subsequent audio signal frame can be for example subsequent in time in error or loss frame.
Fig. 1 f shows embodiment, and wherein device 10 further includes activation unit 6, and activation unit 6 can be for example configured as
Whether detection present frame is loss or error.For example, if present frame is after last received frame not predetermined
It is reached in adopted time restriction, then activates unit 6 that can for example obtain the conclusion of current frame loss.Alternatively, for example, having than current
Another frame (for example, subsequent frame) of the big frame number of the frame number of frame reaches, then activates unit that can for example obtain current frame loss
Conclusion.If such as it is received verification and/or received check bit be not equal to by the calculated calculating of activation unit verification and/or
The check bit of calculating then activates unit 6 that can for example show that frame is the conclusion of error.
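The three detection rules above can be sketched as follows. This is only an illustration under stated assumptions: CRC-32 stands in for the unspecified checksum, frames are kept in a dictionary keyed by frame number, and all names (`classify_current_frame`, `received`, `time_limit`) are hypothetical rather than taken from the patent.

```python
import zlib

LOST, ERRONEOUS, GOOD, PENDING = "lost", "erroneous", "good", "pending"


def classify_current_frame(current_no, received, now, last_rx_time, time_limit):
    """Classify the current frame as good, erroneous, lost or still pending.

    received: dict mapping frame number -> (payload bytes, received checksum)
    now, last_rx_time, time_limit: times in seconds
    """
    if current_no in received:
        payload, checksum = received[current_no]
        # Erroneous: the received checksum differs from the calculated one.
        # ASSUMPTION: CRC-32 is used here purely as an example checksum.
        if zlib.crc32(payload) != checksum:
            return ERRONEOUS
        return GOOD
    # Lost: a frame with a greater frame number has already arrived.
    if any(no > current_no for no in received):
        return LOST
    # Lost: no arrival within the time limit after the last received frame.
    if now - last_rx_time > time_limit:
        return LOST
    return PENDING


payload = b"frame-1"
frames = {1: (payload, zlib.crc32(payload)), 3: (b"frame-3", zlib.crc32(b"frame-3"))}
status_of_frame_2 = classify_current_frame(2, frames, now=0.01, last_rx_time=0.0, time_limit=0.02)
```

Here frame 2 is classified as lost because frame 3, which has a greater frame number, has already arrived.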
The activation unit 6 of Fig. 1 f can be for example configured as:If present frame is to lose either error, activate
Hidden unit 8 is hidden with executing to present frame.
Fig. 1 g shows embodiment, wherein activation unit 6 can be for example configured as:If present frame be lose or
It is error, then detects whether the subsequent frame not malfunctioned reaches.In the embodiment of Fig. 1 g, activation unit 6 be can be configured as:
If present frame is to lose either error, and if the subsequent frame not malfunctioned reaches, active processor (8) is to produce
Raw decoding audio signal parts.
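The activation logic of Fig. 1f and Fig. 1g can be summarised in a few lines. The sketch below is an assumption about how such a dispatcher could look; the function name `units_to_activate` and the string labels are hypothetical.

```python
def units_to_activate(current_status, succeeding_frame_ok):
    """Return the units an activation unit would trigger for the current frame.

    current_status:      "good", "lost" or "erroneous"
    succeeding_frame_ok: True if a succeeding, non-erroneous frame has arrived
    """
    units = []
    if current_status in ("lost", "erroneous"):
        # Fig. 1f: conceal the lost or erroneous current frame.
        units.append("concealment_unit")
        if succeeding_frame_ok:
            # Fig. 1g: smooth the transition into the succeeding good frame.
            units.append("processor")
    return units


activated = units_to_activate("lost", succeeding_frame_ok=True)
```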
Fig. 1 b show according to another embodiment for improving from the concealing audio signal section of audio signal to audio
The device 100 of the conversion of the subsequent audio signal parts of signal.The device of Fig. 1 b realizes pitch adaptation overlapping design.
The device 100 of Fig. 1 b is the specific embodiment of the device 10 of Fig. 1 a.The processor 110 of Fig. 1 b is the processor of Fig. 1 a
11 specific embodiment.The output interface 120 of Fig. 1 b is the specific embodiment of the output interface 12 of Fig. 1 a.
In the embodiment of Fig. 1 b, processor 110 can be for example configured as:It is determined as the second audio signal parts
Second prototype signal part of the second subdivision, so that the second subdivision includes less sample compared with the second audio signal parts
This.
Processor 110 can for example be configured as the first prototype signal part and second by that will be used as the first subdivision
Prototype signal part is combined, and each of one or more intermediate prototype signal parts is determined, to determine one or more
A intermediate prototype signal part.
In Figure 1b, processor 110 can for example be configured with the first prototype signal part, using one or more
Intermediate prototype signal part and decoding audio signal parts are generated using the second prototype signal part.
According to embodiment, processor 110 can be for example configured as by by the first prototype signal part, one or more
Intermediate prototype signal part and the second prototype signal part are combined to generate decoding audio signal parts.
In embodiment, processor 110 is configured to determine that three or more marker samples positions, wherein three or more
Each of multiple marker samples positions are at least one of the first audio signal parts and the second audio signal parts
Sample position.In addition, processor 110 be configured as selection the second audio signal parts in, for the second audio signal parts
Any other sample any other sample position for be all subsequent sample sample position, as three or more
The final sample position of a marker samples position.In addition, processor 110 is configured as by according to the first audio signal parts
Correlation between first subdivision and the second subdivision of the second audio signal parts is selected from the first audio signal parts
Sample position, to determine the beginning sample position of three or more marker samples positions.In addition, processor 110 is configured as
According to the beginning sample position of three or more marker samples positions and according to three or more marker samples positions
Final sample position, to determine one or more intermediate sample positions of three or more marker samples positions.In addition, processing
Device 110 is configured as by being carried out the first prototype signal part and the second prototype signal part according to the intermediate sample position
It combines and is directed to each of one or more of intermediate sample positions in prototype signal part among one or more to determine
The intermediate prototype signal part of a intermediate sample position determines one or more intermediate prototype signal parts.
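The text above states only that the start sample position is selected depending on a correlation between the two sub-portions. One way this could be done is sketched below, assuming a search that slides a candidate segment over the first audio signal portion and keeps the offset with the highest normalized correlation against the second prototype. The search strategy and the names `normalized_correlation` and `find_start_position` are assumptions, not the patent's procedure.

```python
import math


def normalized_correlation(a, b):
    """Normalized cross-correlation of two equally long sample sequences."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0.0 else 0.0


def find_start_position(first_portion, second_prototype):
    """Return the offset in first_portion that best matches the prototype.

    ASSUMPTION: the candidate first sub-portion has the same length as the
    second prototype and the best offset is used as the start marker.
    """
    n = len(second_prototype)
    best_pos, best_corr = None, -2.0
    for pos in range(len(first_portion) - n + 1):
        corr = normalized_correlation(first_portion[pos:pos + n], second_prototype)
        if corr > best_corr:
            best_pos, best_corr = pos, corr
    return best_pos


first_portion = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
second_prototype = [1.0, 2.0, 1.0]
start_position = find_start_position(first_portion, second_prototype)
```

In this toy example the prototype shape occurs at offset 2 of the first portion, so that offset is returned as the start position.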
According to an embodiment, the processor 110 is configured to determine the one or more intermediate prototype signal parts by determining, for each of the one or more intermediate sample positions, an intermediate prototype signal part that combines the first prototype signal part and the second prototype signal part according to the following formula:
$$\mathrm{sig}_i = (1 - \alpha)\cdot \mathrm{sig}_{first} + \alpha\cdot \mathrm{sig}_{last}$$
Wherein:
wherein i is an integer with i >= 1, wherein nrOfMarkers is the number of the three or more marker sample positions minus 1, wherein sig_i is the i-th intermediate prototype signal part of the one or more intermediate prototype signal parts, wherein sig_first is the first prototype signal part, and wherein sig_last is the second prototype signal part.
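The linear combination above can be illustrated with a minimal Python sketch. The weight used here, alpha = i / nrOfMarkers, is an assumption made for illustration (it moves linearly from the first prototype towards the second one); the function name `intermediate_prototypes` is likewise hypothetical.

```python
def intermediate_prototypes(sig_first, sig_last, nr_of_markers):
    """Return the intermediate prototypes sig_1 .. sig_(nrOfMarkers-1).

    Each prototype is (1 - alpha) * sig_first + alpha * sig_last.
    ASSUMPTION: alpha = i / nr_of_markers (illustrative choice).
    """
    if len(sig_first) != len(sig_last):
        raise ValueError("both prototypes must have the same length")
    prototypes = []
    for i in range(1, nr_of_markers):
        alpha = i / nr_of_markers  # assumed weight for marker i
        prototypes.append(
            [(1.0 - alpha) * a + alpha * b for a, b in zip(sig_first, sig_last)]
        )
    return prototypes
```

With nrOfMarkers = 4 this yields three intermediate prototypes, one for each of the markers between the start and the final marker sample position.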
In an embodiment, the processor 110 is configured to determine the one or more intermediate sample positions of the three or more marker sample positions according to either of the following formulas:
Or
Wherein,
Wherein $\delta = x_1 - (x_0 + \mathrm{nrOfMarkers}\cdot T_c)$,
Wherein,
wherein i is an integer with i >= 1, wherein nrOfMarkers is the number of the three or more marker sample positions minus 1, wherein mark_i is the i-th intermediate sample position of the three or more marker sample positions, wherein mark_(i-1) is the (i-1)-th intermediate sample position of the three or more marker sample positions, wherein mark_(i+1) is the (i+1)-th intermediate sample position of the three or more marker sample positions, wherein x_0 is the start sample position of the three or more marker sample positions, wherein x_1 is the final sample position of the three or more marker sample positions, and wherein T_c indicates a pitch lag.
According to an embodiment, the processor 110 is configured to determine the first audio signal part according to the concealed audio signal part and according to a plurality of third filter coefficients, wherein the plurality of third filter coefficients depends on the concealed audio signal part and on the subsequent audio signal part, and wherein the processor 110 is configured to determine the second audio signal part according to the subsequent audio signal part and the plurality of third filter coefficients.
In an embodiment, the processor 110 may, e.g., comprise a filter, wherein the processor 110 is configured to apply the filter with the third filter coefficients to the concealed audio signal part to obtain the first audio signal part, and to apply the filter with the third filter coefficients to the subsequent audio signal part to obtain the second audio signal part.
According to embodiment, processor 110 is configured as determining multiple first filter systems according to concealing audio signal section
Number, wherein processor 110 is configured as determining multiple second filter coefficients according to subsequent audio signal parts, wherein processor
110 are configured as being determined according to the combination of one or more first filter coefficients and one or more second filter coefficients
Each third filter coefficient.
In an embodiment, the filter coefficients of the plurality of first filter coefficients, of the plurality of second filter coefficients and of the plurality of third filter coefficients are linear predictive coding (LPC) parameters of a linear prediction filter.
According to an embodiment, the processor 110 is configured to determine each filter coefficient of the third filter coefficients according to the following formula:
$$A = 0.5\cdot A_{conc} + 0.5\cdot A_{good}$$
wherein A indicates the value of said filter coefficient, wherein A_conc indicates the value of a filter coefficient of the plurality of first filter coefficients, and wherein A_good indicates the value of a filter coefficient of the plurality of second filter coefficients.
In an embodiment, the processor 110 is configured to apply a cosine window, defined by the following formula, to the concealed audio signal part to obtain a concealed windowed signal part:
wherein the processor 110 is configured to apply the cosine window to the subsequent audio signal part to obtain a subsequent windowed signal part, wherein the processor 110 is configured to determine the plurality of first filter coefficients according to the concealed windowed signal part, wherein the processor 110 is configured to determine the plurality of second filter coefficients according to the subsequent windowed signal part, and wherein each of x, x_1 and x_2 is a sample position of a plurality of sample positions.
According to an embodiment, the processor 110 may, e.g., be configured to select the first prototype signal part from a plurality of subdivision candidates of the first audio signal part, according to a plurality of correlations, each of which is the correlation between one of the subdivision candidates and the second subdivision of the second audio signal part. The processor 110 may, e.g., be configured to select, as the start sample position of the three or more marker sample positions, the sample position of the sample of the first prototype signal part that precedes every other sample of the first prototype signal part.
In an embodiment, the processor 110 may, e.g., be configured to select, as the first prototype signal part, the subdivision candidate whose correlation with the second subdivision is the highest among the plurality of correlations.
According to an embodiment, the processor 110 is configured to determine each correlation of the plurality of correlations according to the following formula:
wherein L_frame indicates the number of samples of the second audio signal part, which is equal to the number of samples of the first audio signal part, wherein r(2L_frame - i) indicates the sample value of the sample at sample position 2L_frame - i of the second audio signal part, wherein r(L_frame - i - Δ) indicates the sample value of the sample at sample position L_frame - i - Δ of the first audio signal part, and wherein, for each correlation between one of the plurality of subdivision candidates and the second subdivision, Δ indicates a number that depends on that subdivision candidate.
Pitch-adaptive overlap compensates for the pitch difference that may appear between the pitch at the beginning of the first good decoded frame after a frame loss and the pitch at the end of the frame concealed with TD PLC. The signal is processed in the LPC domain, so that the LPC synthesis filter smooths the constructed signal at the end of the algorithm. In the LPC domain, the instant of highest similarity is found by cross-correlation as described below, and the pitch of the signal evolves slowly from the last pitch lag T_c to the new pitch lag T_g to avoid abrupt pitch changes.
In the following, pitch-adaptive overlap according to particular embodiments is described.
An apparatus or method according to such an embodiment may, e.g., be implemented as follows:
The 16th-order LPC parameters A_conc and A_good are calculated on the pre-emphasized concealed signal s(0 : L_frame - 1) and on the first good frame s(L_frame : 2L_frame - 1), respectively, using a Hamming-cosine window, e.g., of the following form:
wherein, for a frame length of 480 samples, x_1 = 200 and x_2 = 40.
Fig. 2 shows such a Hamming-cosine window according to an embodiment. The shape of the window may, e.g., be designed such that the last samples of the analyzed signal part have the highest influence.
Interpolation in the LSP domain yields A = 0.5·A_conc + 0.5·A_good.
The LPC residual signal of the concealed frame is calculated using A:
and likewise the LPC residual signal of the first good frame:
The instant x_0 is found that represents the maximum similarity between the last part of the concealed frame and the last part of the good frame; x_1 is 2L_frame - 1.
Fig. 3 shows the concealment frames and good frame according to such embodiment.
x_0 is obtained by maximizing the normalized cross-correlation:
In general, normalization is done at the end of the correlation: for example, in a pitch search, normalization is applied after the correlation, once the pitch value has been found.
Here, normalization is done during the correlation, to be robust against energy fluctuations between the signals. For complexity reasons, the normalization term is calculated with an update scheme. Only for the initial value
with Δ = 0, the complete dot product may, e.g., be calculated. For the subsequent increments of Δ, the term may, e.g., be updated as follows:
$$\mathrm{norm}_{\Delta} = \mathrm{norm}_{\Delta-1} + r(L_{frame} - T_g - \Delta)^2 - r(L_{frame} - \Delta)^2, \quad \Delta = 1 \ldots T_c$$
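The update scheme can be sketched in Python as a sliding-window energy. This is a minimal illustration under one assumption: norm_Δ is taken to be the energy of the T_g samples r(L_frame - T_g - Δ) … r(L_frame - Δ - 1), which is the window that the update rule above implies. The function name `sliding_norms` is hypothetical.

```python
def sliding_norms(r, l_frame, t_g, t_c):
    """Return [norm_0, ..., norm_Tc] using the recursive update.

    ASSUMPTION: norm_delta is the energy of the t_g samples
    r[l_frame - t_g - delta : l_frame - delta].
    """
    if l_frame - t_g - t_c < 0 or l_frame > len(r):
        raise ValueError("window would leave the signal buffer")
    # Initial value (delta = 0): complete dot product over t_g samples.
    norm = sum(x * x for x in r[l_frame - t_g:l_frame])
    norms = [norm]
    for delta in range(1, t_c + 1):
        # Add the sample entering on the left, drop the one leaving on the right.
        norm += r[l_frame - t_g - delta] ** 2 - r[l_frame - delta] ** 2
        norms.append(norm)
    return norms
```

Each step costs two multiplications instead of a full T_g-sample dot product, which is the complexity saving that motivates the update scheme.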
To let the pitch lag evolve slowly from the last pitch lag T_c (at x_0) to the new pitch lag T_g (at x_1), time markers mark have to be set in between, wherein:
$$\mathrm{mark}_0 = x_0$$
$$\mathrm{mark}_{\mathrm{nrOfMarkers}} = x_1$$
If nrOfMarkers is lower than 1 or higher than 12, the algorithm switches to energy damping. Otherwise, if δ > 0 and T_c < T_g, or δ < 0 and T_c > T_g, wherein
$$\delta = x_1 - (x_0 + \mathrm{nrOfMarkers}\cdot T_c)$$
And
the markers are calculated from left to right as follows:
Otherwise, the markers are constructed from right to left:
It should be noted that nrOfMarkers is the number of all markers minus 1. In other words, nrOfMarkers is the number of all marker sample positions minus 1, because x_0 = mark_0 and x_1 = mark_nrOfMarkers are also marker sample positions. For example, if nrOfMarkers = 4, there are 5 marker sample positions, namely mark_0, mark_1, mark_2, mark_3 and mark_4.
To synthesize the signal, input segments are cut out by windowing and placed around the time markers mark (the segments are shifted in time so that they are centered on the time markers). To smooth slowly from the shape of the concealed signal to the shape of the good signal, each segment is a linear combination of two non-overlapping parts: the end part of the concealed frame and the end part of the good frame. These are referred to below as the prototypes sig_first and sig_last.
The length len of the prototypes is twice the minimum marker distance minus 1, to prevent a possible energy increase in the overlap-add synthesis operation. If the distance between two markers is not between T_c and T_g, problems arise at the boundaries. (Therefore, in particular embodiments, the algorithm may, e.g., stop in these cases and may, e.g., switch to energy damping. Energy damping is described below.)
Prototypes of lengths T_c and T_g are cut out of the excitation signal r(x) such that x_0 and x_1 lie at the midpoints of sig_first and sig_last, respectively (see step 1 in Fig. 4). The prototypes are then cyclically extended to reach the length len (see step 2 in Fig. 4). The prototypes are then windowed with a Hamming window (see step 3 in Fig. 4) to avoid artifacts in the overlap regions.
The prototype for marker i (see step 4 in Fig. 4) is calculated as follows:
$$\mathrm{sig}_i = (1 - \alpha)\cdot \mathrm{sig}_{first} + \alpha\cdot \mathrm{sig}_{last}$$
Wherein
Then, the prototypes are placed with their midpoints at the corresponding marker positions and are added up (see step 5 in Fig. 4).
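The placement and summation of the prototypes can be sketched as a plain overlap-add in Python. This is only an illustration under stated assumptions: the prototypes are taken to be already windowed and of odd length, samples that would fall outside the output buffer are simply dropped, and the function name `overlap_add` is hypothetical.

```python
def overlap_add(prototypes, marks, out_len):
    """Sum the prototypes into one signal, each centred on its marker.

    ASSUMPTIONS: every prototype has odd length (so it has a midpoint)
    and is already windowed; out-of-range samples are discarded.
    """
    out = [0.0] * out_len
    for proto, mark in zip(prototypes, marks):
        half = len(proto) // 2  # index of the midpoint sample
        for k, value in enumerate(proto):
            pos = mark - half + k
            if 0 <= pos < out_len:
                out[pos] += value
    return out
```

Because neighbouring prototypes overlap, their window shapes determine how the energy adds up in the overlap regions, which is why the prototype length is tied to the minimum marker distance.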
Finally, the constructed signal is first filtered with the LPC synthesis filter with filter parameters A and then with the de-emphasis filter, to return to the original signal domain.
The resulting signal is cross-faded with the original decoded signal to prevent artifacts at the frame boundaries.
Fig. 4 shows the generation of two prototypes according to such embodiment.
For safety reasons, energy damping, e.g., as described below, should be applied to the cross-faded signal, to eliminate the risk of high energy increases in the recovery frame.
Regarding the cutting-out of the prototypes for x_0 and x_1 described above: x_0 and x_1 are the time instants at which the two residual signals have the highest similarity. The prototypes sig_first and sig_last for x_0 and x_1 have the length len = "twice the minimum marker distance minus 1". The length is therefore always odd, so that sig_first and sig_last each have a midpoint. The residual signal of length T_c (of the concealed frame) and the residual signal of length T_g (of the good frame) are now arranged such that x_0 lies at the midpoint of sig_first and x_1 lies at the midpoint of sig_last. Afterwards, these residual signals may be cyclically extended to fill all samples from 1 to len of sig_first and sig_last.
In the following, it is described that excitation overlapping according to the embodiment.
Fig. 1 c show according to another embodiment for improving from the concealing audio signal section of audio signal to audio
The device 200 of the conversion of the subsequent audio signal parts of signal.The device of Fig. 1 c realizes excitation overlapping design.
The device 200 of Fig. 1 c is the specific embodiment of the device 10 of Fig. 1 a.The processor 210 of Fig. 1 c is the processor of Fig. 1 a
11 specific embodiment.The output interface 220 of Fig. 1 c is the specific embodiment of the output interface 12 of Fig. 1 a.
In figure 1 c, processor 210 can for example be configured as generating the first extension signal section according to the first subdivision,
So that the first extension signal section is different from the first audio signal parts, and the first extension signal section is had than first
The more samples of sample possessed by subdivision.
In addition, the processor 210 of Fig. 1 c can for example be configured with the first extension signal section and use the second sound
Frequency signal section generates decoding audio signal parts.
According to an embodiment, the processor 210 is configured to generate the decoded audio signal part by cross-fading the first extended signal part with the second audio signal part, to obtain a cross-faded signal part.
In embodiment, processor 210 can for example be configured as generating the first son according to the first audio signal parts
Part, so that the length of the first subdivision is equal to the pitch lag (T of the first audio signal partsc)。
According to an embodiment, the processor 210 may, e.g., be configured to generate the first extended signal part such that the number of samples of the first extended signal part is equal to the number of samples of the pitch lag of the first audio signal part plus the number of samples of the second audio signal part (T_c + number of samples of the second audio signal part).
In embodiment, processor 210 can be for example configured as according to concealing audio signal section and according to multiple
Filter coefficient determines the first audio signal parts, and plurality of filter coefficient depends on concealing audio signal section.This
Outside, processor 210 can for example be configured as determining the second audio according to subsequent audio signal parts and multiple filter coefficients
Signal section.
According to embodiment, processor 210 can be for example including filter.In addition, processor 210 can be for example configured as
Apply the filter with filter coefficient to obtain the first audio signal parts concealing audio signal section.In addition, processing
Device 210 can for example be configured as applying the filter with filter coefficient to obtain the second sound subsequent audio signal parts
Frequency signal section.
In embodiment, the filter coefficient of multiple filter coefficients may, for example, be the linear pre- of linear prediction filter
Survey coding parameter.
According to an embodiment, the processor 210 may, e.g., be configured to apply a cosine window, defined by the following formula, to the concealed audio signal part to obtain a concealed windowed signal part:
The processor 210 may, e.g., be configured to determine the plurality of filter coefficients according to the concealed windowed signal part, wherein each of x, x_1 and x_2 is a sample position of a plurality of sample positions.
Fig. 5 is shown to be overlapped according to the excitation of such embodiment.
An apparatus implementing excitation overlap performs a cross-fade in the excitation domain between the forward repetition of the concealed frame and the decoded signal, to smooth slowly between the two signals.
It can for example be realized as follows according to the device or method of such embodiment:
First, as in the pitch-adaptive overlap method, a 16th-order LPC analysis is performed on the pre-emphasized end part of the previous frame using a Hamming-cosine window (see step 1 in Fig. 5).
The LPC filter is applied to obtain the excitation signal of the concealed frame and the excitation signal of the first good frame (see step 2 in Fig. 5).
To construct the recovery frame, the last T_c samples of the excitation of the concealed frame are repeated forward to create a full frame length (see step 3 in Fig. 5). This extended excitation is used for the overlap with the first good frame.
The extended excitation is cross-faded with the excitation of the first good frame (see step 4 in Fig. 5).
Then, LPC synthesis is applied to the cross-faded signal, with the last pre-emphasized samples of the concealed frame as filter memory (see step 5 in Fig. 5), to smooth the transition between the concealed frame and the first good frame.
Finally, the de-emphasis filter is applied to the synthesized signal (see step 6 in Fig. 5) to bring the signal back to the original domain.
The newly constructed signal is cross-faded with the original decoded signal (see step 7 in Fig. 5) to prevent artifacts at the frame boundaries.
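Steps 3 and 4 of the excitation overlap (forward repetition and cross-fade) can be sketched in Python as follows. This is a minimal illustration, not the codec implementation: the linear fade shape is an assumption, since no fade shape is prescribed here, and the function names `extend_excitation` and `crossfade` are hypothetical.

```python
def extend_excitation(exc_concealed, t_c, length):
    """Repeat the last t_c samples of the concealed excitation forward."""
    if not 0 < t_c <= len(exc_concealed):
        raise ValueError("pitch lag must lie within the concealed frame")
    period = exc_concealed[-t_c:]
    return [period[n % t_c] for n in range(length)]


def crossfade(fade_out, fade_in):
    """Cross-fade from fade_out to fade_in.

    ASSUMPTION: linear weights; the incoming signal reaches weight 1
    at the last sample.
    """
    n = len(fade_in)
    out = []
    for k in range(n):
        w = (k + 1) / n  # weight of the incoming (good-frame) excitation
        out.append((1.0 - w) * fade_out[k] + w * fade_in[k])
    return out
```

In this sketch, `crossfade(extend_excitation(exc_concealed, t_c, len(exc_good)), exc_good)` would give the cross-faded excitation that is subsequently passed to LPC synthesis.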
In the following, it is described that energy damping according to the embodiment.
Fig. 1 d shows embodiment, wherein the first audio signal parts are concealing audio signal sections, wherein the second audio
Signal section is subsequent audio signal parts.
The device 300 of Fig. 1 d is the specific embodiment of the device 10 of Fig. 1 a.The processor 310 of Fig. 1 d is the processor of Fig. 1 a
11 specific embodiment.The output interface 320 of Fig. 1 d is the specific embodiment of the output interface 12 of Fig. 1 a.
The processor 310 of Fig. 1d may, e.g., be configured to determine a first subdivision of the concealed audio signal part (being the first subdivision of the first audio signal part), such that the first subdivision comprises one or more samples of the concealed audio signal part but fewer samples than the concealed audio signal part, and such that each sample position of the samples of the first subdivision succeeds every sample position of the concealed audio signal part that is not comprised in the first subdivision.
In addition, the processor 310 of Fig. 1d may, e.g., be configured to determine a third subdivision of the subsequent audio signal part, such that the third subdivision comprises one or more samples of the subsequent audio signal part but fewer samples than the subsequent audio signal part, and such that each sample position of the samples of the third subdivision succeeds every sample position of the subsequent audio signal part that is not comprised in the third subdivision.
In addition, the processor 310 of Fig. 1d may, e.g., be configured to determine a second subdivision of the subsequent audio signal part (being the second subdivision of the second audio signal part), such that every sample of the subsequent audio signal part that is not comprised in the third subdivision is comprised in the second subdivision.
In the embodiment according to Fig. 1 d, processor 310 can for example be configured as from concealing audio signal section
The first peak value sample is determined in the sample of one subdivision, so that the sample value of the first peak value sample is believed more than or equal to concealing audio
Any other sample value of any other sample of first subdivision of number part.The processor 310 of Fig. 1 d can for example be matched
It is set to from the sample of the second subdivision of subsequent audio signal parts and determines the second peak value sample, so that the second peak value sample
Sample value is greater than or equal to any other sample value of any other sample of the second subdivision of subsequent audio signal parts.This
Outside, the processor 310 of Fig. 1 d can for example be configured as determining from the sample of the third subdivision of subsequent audio signal parts
Third peak value sample, so that the sample value of third peak value sample is greater than or equal to the third subdivision of subsequent audio signal parts
Any other sample value of any other sample.
When a condition is met, the processor 310 of Fig. 1d may, e.g., be configured to modify each sample value of each sample of the subsequent audio signal part that precedes the second peak sample, to generate the decoded audio signal part.
The sample value that the condition may, for example, be the second peak value sample is greater than the sample value and the second peak of the first peak value sample
The sample value for being worth sample is greater than the sample value of third peak value sample.
Alternatively, the condition may, e.g., be that a first ratio between the sample value of the second peak sample and the sample value of the first peak sample is greater than a first threshold, and that a second ratio between the sample value of the second peak sample and the sample value of the third peak sample is greater than a second threshold.
According to embodiment, the sample value which may, for example, be the second peak value sample is greater than the sample of the first peak value sample
Value and the sample value of the second peak value sample are greater than the sample value of third peak value sample.
In embodiment, which may, for example, be the first ratio greater than first threshold and the second ratio is greater than the second threshold
Value.
According to embodiment, first threshold can be greater than 1.1, and second threshold can be greater than 1.1.
In embodiment, first threshold can be for example equal to second threshold.
According to an embodiment, when the condition is met, the processor 310 may, e.g., be configured to modify each sample value of each sample of the subsequent audio signal part that precedes the second peak sample according to the following formula:
$$s_{modified}(L_{frame} + i) = s(L_{frame} + i)\cdot \alpha_i$$
wherein L_frame indicates the sample position of the sample of the subsequent audio signal part that precedes every other sample of the subsequent audio signal part,
wherein L_frame + i is an integer indicating the sample position of the (i+1)-th sample of the subsequent audio signal part,
wherein 0 <= i <= I_max - 1, wherein I_max - 1 indicates the sample position of the second peak sample,
wherein s(L_frame + i) is the sample value of the (i+1)-th sample of the subsequent audio signal part before modification by the processor 310,
wherein s_modified(L_frame + i) is the sample value of the (i+1)-th sample of the subsequent audio signal part after modification by the processor 310,
wherein 0 < α_i < 1.
In embodiment,
wherein E_cmax is the sample value of the first peak sample, wherein E_max is the sample value of the second peak sample, and wherein E_gmax is the sample value of the third peak sample.
According to an embodiment, when the condition is met, the processor 310 may, e.g., be configured to modify the sample value of each of two or more samples of the plurality of samples of the subsequent audio signal part that succeed the second peak sample according to the following formula, to generate the decoded audio signal part:
$$s_{modified}(I_{max} + k) = s(I_{max} + k)\cdot \alpha_i$$
wherein I_max + k is an integer indicating the sample position of the (I_max + k + 1)-th sample of the subsequent audio signal part.
Fig. 6 is another diagram of concealment frames according to the embodiment and good frame.Especially, Fig. 6 shows concealing audio letter
Number part, subsequent audio signal parts, the first subdivision, the second subdivision and third subdivision.
Energy damping is used to remove high energy increases in the overlap part of the signal between the last concealed frame and the first good frame. This is done by slowly damping the signal region up to the peak amplitude value.
A method according to an embodiment may, e.g., be implemented as follows:
The maximum amplitude values are found in:
ο the last T_c samples of the last concealed frame: E_cmax
ο the last T_g samples of the first good frame: E_gmax
ο and the samples in between these regions: E_max
E_cmax is the first peak sample, E_max is the second peak sample, and E_gmax is the third peak sample.
If E_cmax < E_max > E_gmax, the decoded signal in the first good frame is damped.
In other examples, if meeting following formula, the first good frame will be damped:
For example, 1.1 < thresholdValue1 < 4 and 1.1 < thresholdValue2 < 4.
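The peak search and the damping condition can be sketched in Python. This is a minimal sketch under stated assumptions: the signal `s` is taken to hold the last concealed frame followed by the first good frame (2·L_frame samples), peaks are taken as maxima of absolute sample values, and the ratio condition is written as E_max > threshold · peak to avoid a division by zero. The function name `needs_energy_damping` is hypothetical.

```python
def needs_energy_damping(s, l_frame, t_c, t_g, threshold1=1.0, threshold2=1.0):
    """Decide whether the first good frame should be damped.

    ASSUMPTIONS: s = concealed frame + first good frame; peaks are
    maxima of absolute values. With both thresholds at 1.0 the test
    reduces to E_cmax < E_max > E_gmax.
    """
    if not (0 < t_c <= l_frame and 0 < t_g < l_frame and len(s) >= 2 * l_frame):
        raise ValueError("inconsistent frame length or pitch lags")
    e_cmax = max(abs(x) for x in s[l_frame - t_c:l_frame])           # end of concealed frame
    e_max = max(abs(x) for x in s[l_frame:2 * l_frame - t_g])        # region in between
    e_gmax = max(abs(x) for x in s[2 * l_frame - t_g:2 * l_frame])   # end of good frame
    return e_max > threshold1 * e_cmax and e_max > threshold2 * e_gmax
```

Thresholds in the range given above (between 1.1 and 4) make the decision more conservative, so that only a pronounced energy bump between the two regions triggers damping.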
The first part of decoded signal will be damped as follows:
wherein I_max is the index of E_max, and
Second part will be damped as follows:
Wherein
In a preferred embodiment, for safety reasons, energy damping may, e.g., be applied to the cross-faded signal, to eliminate the risk of high energy increases in the recovery frame.
Now, the combination of different improved conversion design according to the embodiment is provided.
Fig. 7 a shows according to the embodiment for improving from the concealing audio signal section of audio signal to audio signal
Subsequent audio signal parts conversion system.
The system include switching module 701, for realizing the energy damping above with reference to described in Fig. 1 d device 300, with
And the device 100 for realizing the adaptation overlapping of the pitch above with reference to described in Fig. 1 b.
Switching module 701 is configured as according to concealing audio signal section and according to subsequent audio signal parts, selection
A device in device 300 for realizing energy damping and the device 100 for realizing pitch adaptation overlapping, for producing
Raw decoding audio signal parts.
Fig. 7 b show according to another embodiment for improving from the concealing audio signal section of audio signal to audio
The system of the conversion of the subsequent audio signal parts of signal.
The system include switching module 702, for realizing the energy damping above with reference to described in Fig. 1 d device 300, with
And the device 200 for realizing the excitation overlapping above with reference to described in Fig. 1 c.
Switching module 702 is configured as according to concealing audio signal section and according to subsequent audio signal parts, selection
A device in device 300 for realizing energy damping and the device 200 for realizing excitation overlapping, for generating solution
Code audio signal parts.
Fig. 7 c show according to another embodiment for improving from the concealing audio signal section of audio signal to audio
The system of the conversion of the subsequent audio signal parts of signal.
The system includes switching module 703, for realizing the device of the adaptation overlapping of the pitch above with reference to described in Fig. 1 b
100 and for realizing above with reference to described in Fig. 1 c excitation overlapping device 200.
Switching module 703 is configured as according to concealing audio signal section and according to subsequent audio signal parts, selection
For realizing the device 100 of pitch adaptation overlapping and for realizing a device in the device 200 of excitation overlapping, for producing
Raw decoding audio signal parts.
Fig. 7 d show according to yet another embodiment for improving from the concealing audio signal section of audio signal to audio
The system of the conversion of the subsequent audio signal parts of signal.
The system includes a switching module 704, the device 300 for realizing the energy damping described above with reference to Fig. 1d, the device 100 for realizing the pitch-adaptive overlap described above with reference to Fig. 1b, and the device 200 for realizing the excitation overlap described above with reference to Fig. 1c.
The switching module 704 is configured to select, according to the concealed audio signal part and according to the subsequent audio signal part, one of the device 300 for realizing energy damping, the device 100 for realizing pitch-adaptive overlap and the device 200 for realizing excitation overlap, for generating the decoded audio signal part.
According to embodiment, switching module 704 can for example be configured to determine that concealing audio signal frame and subsequent audio letter
Whether at least one of number frame includes voice.In addition, switching module 704 can be for example configured as:If concealing audio is believed
Number frame and subsequent audio signal frame do not include voice, then device 300 of the selection for realizing energy damping generates decoding audio
Signal section.
In embodiment, switching module 704 can be for example configured as:According to the frame length of subsequent audio signal frame and
According at least one of the pitch of concealing audio signal section or the pitch of subsequent audio signal parts, to select for real
Existing pitch is adapted to the device 100 of overlapping, for realizing the device 200 of excitation overlapping and for realizing the device of energy damping
One device in 300, to decode audio signal parts for generating, wherein subsequent audio signal parts are subsequent audios
The audio signal parts of signal frame.
Fig. 7 e show according to another embodiment for improving from the concealing audio signal section of audio signal to audio
The system of the conversion of the subsequent audio signal parts of signal.
As in Fig. 7 c, the system of Fig. 7 e includes switching module 703, for realizing the sound above with reference to described in Fig. 1 b
The device 100 that high adaptation is overlapped and the device 200 for realizing the excitation overlapping above with reference to described in Fig. 1 c.
Switching module 703 is configured as according to concealing audio signal section and according to subsequent audio signal parts, selection
For realizing the device 100 of pitch adaptation overlapping and for realizing one in the device 200 of excitation overlapping, for generating solution
Code audio signal parts.
In addition, the system of Fig. 7 e further includes the device 300 for realizing the energy damping above with reference to described in Fig. 1 d.
The switching module 703 of Fig. 7 e can be for example configured as according to concealing audio signal section and according to subsequent audio
Signal section, select for realizing pitch adaptation overlapping device 100 and for realizing excitation overlapping device 200 in described in
One device, to generate intermediate audio signal parts.
In the embodiment of Fig. 7e, the device 300 for realizing energy damping may, e.g., be configured to process the intermediate audio signal part to generate the decoded audio signal part.
Now, particular embodiments are described. In particular, specific implementation concepts for the switching modules 701, 702, 703 and 704 are provided.
For example, a first embodiment providing a combination of the different improved transition concepts may, e.g., be used for any transform-domain codec:
The first step is to detect whether the signal is speech-like with a prominent pitch (for example, clean speech items, speech with background noise, or speech with musical accompaniment).
If the signal is speech-like:
ο find the pitch T_c in the last concealed frame
ο find the pitch T_g in the first good frame
ο if there is an energy increase in the overlap part with the last concealed frame:
   ■ if the pitch of the good frame differs by more than three samples from the concealed pitch:
      → execute the recovery filter
   ■ otherwise:
      → execute energy damping
Otherwise:
   → execute energy damping
If the recovery filter was selected above:
   If the concealed pitch T_c or the good pitch T_g is higher than the frame length L_frame:
      → execute energy damping
   Otherwise, if the concealed pitch or the good pitch is higher than half the frame length and the normalized cross-correlation value xCorr is smaller than a threshold:
      → execute excitation overlap
   Otherwise, if the concealed pitch or the good pitch is lower than half the frame length:
      → use pitch-adaptive overlap
For example, the concealed frame is first tested for the presence of speech (for example, whether speech is present may be known from the concealment technique). Afterwards, the good frame may, for example, also be tested for the presence of speech, for example using the normalized cross-correlation value xCorr.
For example, the overlapping part mentioned above may be the second subdivision shown, for example, in Fig. 6, which means that the overlapping part extends from the first sample of the good frame to sample "frame length minus T_g" of the good frame.
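The decision steps of the first embodiment can be written down as a short Python sketch. This is an illustration under stated assumptions rather than the implementation of the embodiment: the names `Tool`, `choose_in_recovery_filter` and `choose_tool_first_embodiment` are hypothetical, the xCorr threshold is passed in because the text gives no value, and cases that the listed rules do not cover are reported as `Tool.UNSPECIFIED` instead of being guessed.

```python
from enum import Enum


class Tool(Enum):
    ENERGY_DAMPING = "energy damping"
    EXCITATION_OVERLAP = "excitation overlapping"
    PITCH_ADAPTIVE_OVERLAP = "pitch adaptation overlapping"
    UNSPECIFIED = "not covered by the listed rules"  # assumption of this sketch


def choose_in_recovery_filter(t_c, t_g, frame_len, x_corr, x_corr_threshold):
    """Second decision stage, entered only when the recovery filter is selected."""
    half = frame_len / 2
    if t_c > frame_len or t_g > frame_len:
        return Tool.ENERGY_DAMPING
    if (t_c > half or t_g > half) and x_corr < x_corr_threshold:
        return Tool.EXCITATION_OVERLAP
    if t_c < half or t_g < half:
        return Tool.PITCH_ADAPTIVE_OVERLAP
    return Tool.UNSPECIFIED


def choose_tool_first_embodiment(is_speech_like, energy_increases,
                                 t_c, t_g, frame_len, x_corr, x_corr_threshold):
    """First decision stage; t_c and t_g are pitch lags in samples."""
    if not is_speech_like:
        return Tool.ENERGY_DAMPING
    if not energy_increases:
        # The listed rules state no action for this case.
        return Tool.UNSPECIFIED
    if abs(t_g - t_c) > 3:  # pitches differ by more than three samples
        return choose_in_recovery_filter(t_c, t_g, frame_len,
                                         x_corr, x_corr_threshold)
    return Tool.ENERGY_DAMPING
```

For instance, with a frame length of 512 samples, pitches of 300 and 310 samples and an xCorr below the chosen threshold, the sketch selects excitation overlapping.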
Now, a second embodiment that combines the different improved conversion concepts is provided. Such a second embodiment may, for example, be used for the AAC-ELD codec, where the two frame error concealment methods are a time-domain method and a frequency-domain method.
The time-domain method synthesizes the lost frame using pitch extrapolation and is referred to as TD PLC (see [8]).
The frequency-domain method is the prior-art concealment method for the AAC-ELD codec, referred to as noise substitution (NS), which uses a sign-scrambled copy of the previous good frame.
In the second embodiment, a first division is made according to the last concealment method:
If the last frame was concealed using TD PLC:
ο Find the pitch in the first good frame.
ο If the energy increases in the part overlapping with the last concealed frame:
■ If the pitch of the good frame differs from the concealed pitch by more than three samples:
→ apply the recovery filter
■ Otherwise:
→ apply energy damping
If the last frame was concealed using NS:
→ apply energy damping
In addition, in the second embodiment, the following second division is made within the recovery filter:
If the concealed pitch T_c (the pitch in the last concealed frame) or the good pitch T_g (the pitch in the first good frame) is higher than the frame length L_frame:
→ apply energy damping
If the concealed pitch or the good pitch is higher than half the frame length and the normalized cross-correlation value xCorr is smaller than a threshold:
→ apply excitation overlapping
If the concealed pitch or the good pitch is lower than half the frame length:
→ apply pitch adaptation overlapping.
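The first division of the second embodiment can be sketched in Python as follows. The function name `first_division`, the string labels for the concealment methods and the returned strings are assumptions chosen for illustration; the case without an energy increase is not covered by the listed rules and is therefore reported as "unspecified" rather than guessed. When the result is "recovery filter", the second division listed above would be applied next.

```python
def first_division(last_concealment, energy_increases, t_c, t_g):
    """Sketch of the first division; t_c and t_g are pitch lags in samples.

    last_concealment is "TD_PLC" or "NS" (hypothetical labels).
    """
    if last_concealment == "NS":
        # Frames concealed by noise substitution always get energy damping.
        return "energy damping"
    if last_concealment == "TD_PLC":
        if not energy_increases:
            # The listed rules state no action for this case.
            return "unspecified"
        if abs(t_g - t_c) > 3:  # pitches differ by more than three samples
            return "recovery filter"
        return "energy damping"
    raise ValueError(f"unknown concealment method: {last_concealment!r}")
```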
Multiple embodiments have been provided.
According to an embodiment, a filter is provided for improving the conversion between a concealed lost frame of a transform-domain coded signal and one or more frames of the transform-domain coded signal that follow the concealed lost frame.
In an embodiment, the filter may, for example, further be configured as described above.
According to an embodiment, a transform-domain decoder is provided that includes a filter according to one of the above embodiments.
Furthermore, a method performed by a transform-domain decoder as described above is provided.
Furthermore, a computer program for performing the method as described above is provided.
Although some aspects have been described in the context of a device, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Similarly, aspects described in the context of a method step also represent a description of a corresponding block, item or feature of a corresponding device. Some or all of the method steps may be executed by (or using) a hardware device (for example, a microprocessor, a programmable computer or an electronic circuit). In some embodiments, one or more of the most important method steps may be executed by such a device.
Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software, or at least partially in hardware or at least partially in software. The implementation can be performed using a digital storage medium (for example, a floppy disk, DVD, Blu-ray, CD, ROM, PROM, EPROM, EEPROM or flash memory) having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer-readable.
Some embodiments according to the invention include a data carrier having electronically readable control signals that are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.
In general, embodiments of the invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.
Other embodiments include the computer program for performing one of the methods described herein, stored on a machine-readable carrier.
In other words, an embodiment of the inventive method is therefore a computer program having a program code for performing one of the methods described herein when the computer program runs on a computer.
A further embodiment of the inventive method is therefore a data carrier (or a digital storage medium, or a computer-readable medium) having recorded thereon the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium is typically tangible and/or non-transitory.
A further embodiment of the inventive method is therefore a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection (for example, via the internet).
A further embodiment includes a processing device (for example, a computer or a programmable logic device) configured or adapted to perform one of the methods described herein.
A further embodiment includes a computer having installed thereon the computer program for performing one of the methods described herein.
A further embodiment according to the invention includes a device or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a storage device or the like. The device or system may, for example, include a file server for transferring the computer program to the receiver.
In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. In general, the methods are preferably performed by any hardware device.
The devices described herein may be implemented using a hardware device, or using a computer, or using a combination of a hardware device and a computer.
The methods described herein may be performed using a hardware device, or using a computer, or using a combination of a hardware device and a computer.
The above-described embodiments are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
Bibliography:
[1] Philippe Gournay: "Improved Frame Loss Recovery Using Closed-Loop Estimation of Very Low Bit Rate Side Information", Interspeech 2008, Brisbane, Australia, 22-26 September 2008.
[2] Mohamed Chibani, Roch Lefebvre, Philippe Gournay: "Resynchronization of the Adaptive Codebook in a Constrained CELP Codec after a frame erasure", 2006 International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), Toulouse, France, March 14-19, 2006.
[3] S.-U. Ryu, E. Choy, and K. Rose: "Encoder assisted frame loss concealment for MPEG-AAC decoder", ICASSP IEEE Int. Conf. Acoust. Speech Signal Process. Proc., vol. 5, pp. 169-172, May 2006.
[4] ISO/IEC 14496-3:2005/Amd 9:2008: Enhanced low delay AAC, available at: http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=46457
[5] J. Lecomte, et al.: "Enhanced time domain packet loss concealment in switched speech/audio codec", submitted to IEEE ICASSP, Brisbane, Australia, Apr. 2015.
[6] E. Moulines and J. Laroche: "Non-parametric techniques for pitch-scale and time-scale modification of speech", Speech Communication, vol. 16, pp. 175-205, 1995.
[7] European Patent EP 363233 B1: "Method and apparatus for speech synthesis by wave form overlapping and adding".
[8] International Patent Application WO 2015063045 A1: "Audio Decoder and Method for Providing a Decoded Audio Information using an Error Concealment Modifying a Time Domain Excitation Signal".
[9] Schnell, M.; Schmidt, M.; Jander, M.; Albert, T.; Geiger, R.; Ruoppila, V.; Ekstrand, P.; Grill, B.: "MPEG-4 enhanced low delay AAC - a new standard for high quality communication", Audio Engineering Society: 125th Audio Engineering Society Convention 2008; October 2-5, 2008, San Francisco, USA.
Claims (43)
1. a kind of for improving from the concealing audio signal section of audio signal to the subsequent audio signal portion of the audio signal
The device (10 of the conversion divided;100;200;300), wherein described device (10;100;200;300) include:
Processor (11;110;210;310) it, is configured as according to the first audio signal parts and according to the second audio signal portion
Divide the decoding audio signal parts for generating the audio signal, wherein first audio signal parts depend on the hiding sound
Frequency signal section, and wherein second audio signal parts depend on the subsequent audio signal parts, and
Output interface (12;120;220;320), for exporting the decoding audio signal parts,
Wherein each of the first audio signal parts, the second audio signal parts and the decoding audio signal parts includes multiple samples, wherein each sample of the multiple samples of the first audio signal parts, of the second audio signal parts and of the decoding audio signal parts is defined by a sample position of multiple sample positions and by a sample value, wherein the multiple sample positions are ordered such that, for each pair of a first sample position of the multiple sample positions and a second sample position of the multiple sample positions that differs from the first sample position, the first sample position is either subsequent to (a successor of) or leading (a predecessor of) the second sample position,
Wherein, the processor (11;110;210;310) it is configured to determine that the first sub-portion of first audio signal parts
Point, so that first subdivision includes less sample compared with first audio signal parts, and
Wherein, the processor (11;110;210;310) it is configured with the first sub-portion of first audio signal parts
Divide and generates the solution using the second subdivision of second audio signal parts or second audio signal parts
Code audio signal parts, so that for each sample in two or more samples of second audio signal parts, institute
The sample position for stating the sample in two or more samples of the second audio signal parts is believed equal to the decoding audio
The sample position of one sample of number part, and make in two or more samples of second audio signal parts
The sample value of the sample is different from the sample value of one sample of the decoding audio signal parts.
2. the apparatus according to claim 1 (100),
Wherein, the processor (110) is configured as:It is determined as the of the second subdivision of second audio signal parts
Diarch signal section, so that second subdivision includes less sample compared with second audio signal parts, with
And
Wherein, the processor (110) is configured as in the following manner to determine prototype signal part among one or more:
It will be combined as the first prototype signal part of first subdivision and second prototype signal part, to determine
State the intermediate prototype signal in each of prototype signal part part among one or more;
Wherein, the processor (110) is configured with first prototype signal part, using in one or more of
Between prototype signal part and the decoding audio signal parts are generated using second prototype signal part.
3. the apparatus of claim 2 (100), wherein the processor (110) is configured as:By described in combination
First prototype signal part, one or more of intermediate prototype signal parts and second prototype signal part generate
The decoding audio signal parts.
4. device (100) according to claim 2 or 3,
Wherein, the processor (110) is configured to determine that three or more marker samples positions, wherein described three or more
Each of multiple marker samples positions marker samples position is first audio signal parts and second audio letter
The sample position of at least one of number part,
Wherein, the processor (110) be configured as selecting it is in second audio signal parts, for second audio
It is all the sample position of subsequent sample for any other sample position of any other sample of signal section, as described
The final sample position of three or more marker samples positions,
Wherein, the processor (110) is configured as:Pass through the first subdivision according to first audio signal parts and institute
The correlation stated between the second subdivision of the second audio signal parts selects sample bit from first audio signal parts
It sets, determines the beginning sample position of three or more marker samples positions,
Wherein, the processor (110) is configured as:According to the beginning sample bit of three or more marker samples positions
It sets and according to the final sample position of three or more marker samples positions, three or more determining described labels
One or more intermediate sample positions of sample position, and
Wherein, the processor (110) is configured as:By in each for one or more of intermediate sample positions
Between sample position, according to the intermediate sample position by first prototype signal part and second prototype signal part into
Row combination to determine the intermediate prototype signal part in one or more of intermediate prototype signal parts, determine it is one or
Multiple intermediate prototype signal parts.
5. device (100) according to claim 4,
Wherein, the processor (110) is configured as:By in each for one or more of intermediate sample positions
Between sample position, according to the following formula by first prototype signal part and second prototype signal part be combined come
It determines the intermediate prototype signal part in one or more of intermediate prototype signal parts, determines one or more of centres
Prototype signal part:
sig_i = (1 - α) · sig_first + α · sig_last
Wherein
Wherein i is an integer, and i ≥ 1,
Wherein nrOfMarkers is the number of the three or more marker samples positions minus 1,
Wherein sig_i is the i-th intermediate prototype signal part of the one or more intermediate prototype signal parts,
Wherein sig_first is the first prototype signal part,
Wherein sig_last is the second prototype signal part.
6. device (100) according to claim 4 or 5,
Wherein, the processor (110) be configured as any of according to the following formula come determine it is described three or more
One or more intermediate sample positions of marker samples position:
Or
Wherein
Wherein δ = x_1 - (x_0 + nrOfMarkers · T_c),
Wherein
Wherein i is an integer, and i ≥ 1,
Wherein nrOfMarkers is the number of the three or more marker samples positions minus 1,
Wherein mark_i is the i-th intermediate sample position of the three or more marker samples positions,
Wherein mark_(i-1) is the (i-1)-th intermediate sample position of the three or more marker samples positions,
Wherein mark_(i+1) is the (i+1)-th intermediate sample position of the three or more marker samples positions,
Wherein x_0 is the beginning sample position of the three or more marker samples positions,
Wherein x_1 is the final sample position of the three or more marker samples positions,
Wherein T_c indicates the pitch lag.
7. the device according to any one of claim 4 to 6 (100),
Wherein, the processor (110) is configured as:According to every in multiple subdivision candidate items of first audio signal
Multiple correlations of a subdivision and second subdivision of second audio signal parts select the first audio letter
Subdivision in multiple subdivision candidate items of number part as first prototype signal part,
Wherein, the processor (110) is configured as:Select it is in multiple samples of first prototype signal part, for
All it is leading sample position for any other sample position of any other sample of first prototype signal part, makees
For the beginning sample position of three or more marker samples positions.
8. device (100) according to claim 7, wherein the processor (110) is configured as:Select the sub-portion
Dividing in candidate item with the correlation of second subdivision there is the subdivision of the highest correlation in the multiple correlation to come
As first prototype signal part.
9. device (100) according to claim 7 or 8,
Wherein, the processor (110) is configured as according to the following formula to determine for each of the multiple correlation
The correlation of correlation:
Wherein L_frame indicates the number of samples of the second audio signal parts, which is equal to the number of samples of the first audio signal parts,
Wherein r(2·L_frame - i) indicates the sample value of the sample at sample position 2·L_frame - i in the second audio signal parts,
Wherein r(L_frame - i - Δ) indicates the sample value of the sample at sample position L_frame - i - Δ in the first audio signal parts,
Wherein, for each correlation of the multiple correlations between a subdivision candidate item of the multiple subdivision candidate items and the second subdivision, Δ indicates a number that depends on the subdivision candidate item.
10. the device according to any one of claim 4 to 9 (100),
Wherein, the processor (110) is configured as filtering according to the concealing audio signal section and according to multiple thirds
Device coefficient determines first audio signal parts, wherein the multiple third filter coefficient depends on the concealing audio
Signal section and the subsequent audio signal parts, and
Wherein, the processor (110) is configured as according to the subsequent audio signal parts and the multiple third filter
Coefficient determines second audio signal parts.
11. device (100) according to claim 10,
Wherein, the processor (110) includes filter,
Wherein, the processor (110), which is configured as applying the concealing audio signal section, has the third filter
The filter of coefficient to obtain first audio signal parts, and
Wherein, the processor (110), which is configured as applying the subsequent audio signal parts, has the third filter
The filter of coefficient is to obtain second audio signal parts.
12. The device (100) according to claim 10 or 11,
Wherein, the processor (110) is configured as determining multiple first filter systems according to the concealing audio signal section
Number,
Wherein, the processor (110) is configured as determining multiple second filter systems according to the subsequent audio signal parts
Number,
Wherein, the processor (110) is configured as according to one or more filter systems in the first filter coefficient
It counts with the combinations of one or more filter coefficients in the second filter coefficient and determines the third filter coefficient
Each of filter coefficient.
13. device (100) according to claim 12, wherein the multiple first filter coefficient, the multiple second
Filter coefficient in filter coefficient and the multiple third filter coefficient is the linear prediction of linear prediction filter
Coding parameter.
14. device (100) according to claim 12 or 13,
Wherein the processor (110) is configured to determine each filter coefficient of the third filter coefficients according to the following formula:
A = 0.5 · A_conc + 0.5 · A_good
Wherein A indicates the filter coefficient value of the filter coefficient,
Wherein A_conc indicates the coefficient value of a filter coefficient of the multiple first filter coefficients, and
Wherein A_good indicates the coefficient value of a filter coefficient of the multiple second filter coefficients.
15. The device (100) according to any one of claims 12 to 14,
Wherein the processor (110) is configured to apply a cosine window defined by the following formula to the concealing audio signal section to obtain a hiding windowing signal part:
Wherein the processor (110) is configured to apply the cosine window to the subsequent audio signal parts to obtain a subsequent windowing signal part,
Wherein, the processor (110) is configured as determining the multiple first filtering according to the hiding windowing signal part
Device coefficient,
Wherein, the processor (110) is configured as determining the multiple second filtering according to the subsequent windowing signal part
Device coefficient, and
Wherein, x, x1And x2Each of be sample position in the multiple sample position.
16. the apparatus according to claim 1 (200),
Wherein, the processor (210) is configured as generating the first extension signal section according to first subdivision, so that institute
It is different from first audio signal parts to state the first extension signal section, and the first extension signal section is made to compare institute
Stating the first subdivision has more samples,
Wherein, the processor (210) is configured with the first extension signal section and is believed using second audio
Number part generates the decoding audio signal parts.
17. device (200) according to claim 16, wherein the processor (210) is configured as by described
One extension signal section and second audio signal parts execution are fade-in fade-out to obtain signal section of being fade-in fade-out, to generate
The decoding audio signal parts.
18. device (200) according to claim 16 or 17, wherein the processor (210) is configured as from described
First subdivision is generated in one audio signal parts, so that the length of first subdivision is believed equal to first audio
The pitch lag of number part.
19. device (200) according to claim 18, wherein the processor (210) is configured as generating described first
Signal section is extended, so that the quantity of the sample of the first extension signal section is equal to the institute of first audio signal parts
The sample size of pitch lag is stated plus the quantity of the sample of second audio signal parts.
20. The device (200) according to any one of claims 16 to 19,
Wherein, the processor (210) is configured as according to the concealing audio signal section and according to multiple filter systems
Number is to determine first audio signal parts, wherein the multiple filter coefficient depends on the concealing audio signal section
Point, and
Wherein, the processor (210) is configured as according to the subsequent audio signal parts and the multiple filter coefficient
Determine second audio signal parts.
21. device (200) according to claim 20,
Wherein, the processor (210) includes filter,
Wherein, the processor (210), which is configured as applying the concealing audio signal section, has the filter coefficient
Filter to obtain first audio signal parts, and
Wherein, the processor (210), which is configured as applying the subsequent audio signal parts, has the filter coefficient
Filter to obtain second audio signal parts.
22. device (200) according to claim 21, wherein the filter coefficient in the multiple filter coefficient is
The LPC parameters of linear prediction filter.
23. the device according to any one of claim 20 to 22 (200),
Wherein the processor (210) is configured to apply a cosine window defined by the following formula to the concealing audio signal section to obtain a hiding windowing signal part:
Wherein, the processor (210) is configured as determining the multiple filter system according to the hiding windowing signal part
Number,
Wherein, x, x1And x2Each of be sample position in the multiple sample position.
24. the apparatus according to claim 1 (300),
Wherein, first audio signal parts are the concealing audio signal sections, wherein second audio signal parts
It is the subsequent audio signal parts,
Wherein, the processor (310) is configured to determine that the first subdivision of the concealing audio signal section, as described
First subdivision of the first audio signal parts, so that first subdivision includes one of the concealing audio signal section
Or multiple samples, but include less sample compared with the concealing audio signal section, and make first subdivision
Each sample position of sample be in the concealing audio signal section, do not include any in first subdivision
Any sample position of sample it is subsequent,
Wherein, the processor (310) is configured to determine that the third subdivision of the subsequent audio signal parts, so that described
Third subdivision includes one or more samples of the subsequent audio signal parts, but with the subsequent audio signal parts
Compared to including less sample, and each sample position of each sample of the third subdivision is made to be the subsequent sound
Any sample position in frequency signal section, not including any sample in the third subdivision it is subsequent,
Wherein, the processor (310) is configured to determine that the second subdivision of the subsequent audio signal parts, as described
Second subdivision of the second audio signal parts, so that not being included in the third subdivision in the subsequent audio signal parts
Interior any sample include in the second subdivision of the subsequent audio signal parts,
Wherein, the processor (310) is configured as from the sample of the first subdivision of the concealing audio signal section really
Fixed first peak value sample, so that the sample value of the first peak value sample is greater than or equal to the of the concealing audio signal section
Any other sample value of any other sample of one subdivision, wherein the processor (310) is configured as from described subsequent
The second peak value sample is determined in the sample of second subdivision of audio signal parts, so that the sample value of the second peak value sample
More than or equal to any other sample value of any other sample of the second subdivision of the subsequent audio signal parts, wherein
The processor (310) is configured as determining third peak value from the sample of the third subdivision of the subsequent audio signal parts
Sample, so that the sample value of the third peak value sample is greater than or equal to the third subdivision of the subsequent audio signal parts
Any other sample value of any other sample,
Wherein, when meeting condition, the processor (310) is configured as modifying in the subsequent audio signal parts
, each sample value of leading each sample as the second peak value sample, to generate the decoding audio signal portion
Point,
Wherein, the condition is that the sample value of the second peak value sample is greater than sample value and the institute of the first peak value sample
The sample value for stating the second peak value sample is greater than the sample value of the third peak value sample, or
Wherein, the condition is between the sample value of the second peak value sample and the sample value of the first peak value sample
One ratio is greater than between first threshold and the sample value of the second peak value sample and the sample value of the third peak value sample
Second ratio is greater than second threshold.
25. device (300) according to claim 24, wherein the condition is the sample value of the second peak value sample
Greater than the sample value of the first peak value sample and the sample value of the second peak value sample is greater than the third peak value sample
Sample value.
26. device (300) according to claim 24, wherein the condition is that first ratio is greater than described first
Threshold value and second ratio are greater than the second threshold.
27. device (300) according to claim 26, wherein the first threshold is greater than 1.1, and wherein described the
Two threshold values are greater than 1.1.
28. the device according to claim 26 or 27 (300), wherein the first threshold is equal to the second threshold.
29. the device according to any one of claim 24 to 28 (300),
Wherein, when meeting the condition, the processor (310) be configured as modifying according to the following formula it is described after
After each sample value of leading each sample in audio signal parts, as the second peak value sample:
s_modified(L_frame + i) = s(L_frame + i) · α_i
Wherein L_frame indicates the sample position of the sample of the subsequent audio signal parts that is leading with respect to any other sample position of any other sample of the subsequent audio signal parts,
Wherein L_frame + i is an integer indicating the sample position of the (i+1)-th sample of the subsequent audio signal parts,
Wherein 0 ≤ i ≤ I_max - 1, wherein I_max - 1 indicates the sample position of the second peak value sample,
Wherein s(L_frame + i) is the sample value of the (i+1)-th sample of the subsequent audio signal parts before modification by the processor (310),
Wherein s_modified(L_frame + i) is the sample value of the (i+1)-th sample of the subsequent audio signal parts after modification by the processor (310),
Wherein 0 < α_i < 1.
30. device (300) according to claim 29,
Wherein
Wherein E_cmax is the sample value of the first peak value sample,
Wherein E_max is the sample value of the second peak value sample,
Wherein E_gmax is the sample value of the third peak value sample.
31. the device according to claim 29 or 30 (300),
Wherein, when the condition is met, the processor (310) is configured to modify, according to the following formula, the sample value of each sample of two or more samples of the multiple samples of the subsequent audio signal parts that are subsequent to the second peak value sample, to generate the decoding audio signal parts:
s_modified(I_max + k) = s(I_max + k) · α_i,
Wherein I_max + k is an integer indicating the sample position of the (I_max + k + 1)-th sample of the subsequent audio signal parts.
32. device (10 according to any one of the preceding claims;100;200;300), wherein described device (10;
100;200;It 300) further include hidden unit (8), the hidden unit (8) is configured as to error or loss present frame
It executes and hides, to obtain the concealing audio signal section.
33. The device (10; 100; 200; 300) according to claim 32,
Wherein, the device (10; 100; 200; 300) further comprises an activation unit (6), the activation unit (6) being configured to detect whether the current frame is lost or erroneous, wherein the activation unit (6) is configured to activate the concealment unit (8) to perform concealment for the current frame if the current frame is lost or erroneous.
34. The device (10; 100; 200; 300) according to claim 33,
Wherein, the activation unit (6) is configured to detect, if the current frame is lost or erroneous, whether a succeeding frame that is not erroneous arrives, and
Wherein, the activation unit (6) is configured to activate the processor (8) to generate the decoded audio signal portion if the current frame is lost or erroneous and if the succeeding frame that is not erroneous arrives.
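The control flow of claims 32 to 34 can be illustrated with a small sketch. The names `Action` and `activation_unit`, and the representation of reception status as a list of booleans, are assumptions made for illustration; the claims do not prescribe any data structure.

```python
from enum import Enum
from typing import List, Sequence


class Action(Enum):
    DECODE_NORMALLY = "decode"
    CONCEAL = "conceal"
    SMOOTH_TRANSITION = "smooth_transition"


def activation_unit(frame_ok_flags: Sequence[bool]) -> List[Action]:
    """Map per-frame reception flags to the action taken for each frame.

    frame_ok_flags[n] is True when frame n arrived without error.
    """
    actions = []
    previous_concealed = False
    for ok in frame_ok_flags:
        if not ok:
            # Lost or erroneous frame: activate the concealment unit (claim 33).
            actions.append(Action.CONCEAL)
            previous_concealed = True
        elif previous_concealed:
            # First error-free frame after a loss: activate the processor
            # that generates the decoded (transition) portion (claim 34).
            actions.append(Action.SMOOTH_TRANSITION)
            previous_concealed = False
        else:
            actions.append(Action.DECODE_NORMALLY)
    return actions


# One good frame, two lost frames, then two good frames.
schedule = activation_unit([True, False, False, True, True])
```

In this sketch only the first error-free frame after a loss triggers the transition processing; later good frames are decoded normally.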
35. A method for improving a transition from a concealed audio signal portion of an audio signal to a succeeding audio signal portion of the audio signal, wherein the method comprises:
Generating a decoded audio signal portion of the audio signal depending on a first audio signal portion and depending on a second audio signal portion, wherein the first audio signal portion depends on the concealed audio signal portion, and wherein the second audio signal portion depends on the succeeding audio signal portion, and
Outputting the decoded audio signal portion,
Wherein, each of the first audio signal portion, the second audio signal portion and the decoded audio signal portion comprises a plurality of samples, wherein each sample of the plurality of samples of the first audio signal portion, of the second audio signal portion and of the decoded audio signal portion is defined by a sample position of a plurality of sample positions and by a sample value, wherein the plurality of sample positions is ordered such that, for each pair of a first sample position of the plurality of sample positions and a second sample position of the plurality of sample positions that differs from the first sample position, the first sample position is either a successor or a predecessor of the second sample position,
Wherein, generating the decoded audio signal portion comprises determining a first sub-portion of the first audio signal portion, such that the first sub-portion comprises fewer samples than the first audio signal portion,
Wherein, generating the decoded audio signal portion is performed using the first sub-portion of the first audio signal portion and using the second audio signal portion or a second sub-portion of the second audio signal portion, such that for each sample of two or more samples of the second audio signal portion, the sample position of said sample is equal to the sample position of one of the samples of the decoded audio signal portion, and such that the sample value of said sample differs from the sample value of said one of the samples of the decoded audio signal portion.
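One way to satisfy the structural conditions of claim 35 is a short crossfade, sketched below. The claim itself does not prescribe a crossfade or any particular window, so the linear weights, the function name `crossfade_transition`, and the assumption that both portions are already aligned to the same sample positions are all hypothetical.

```python
from typing import List, Sequence


def crossfade_transition(first: Sequence[float], second: Sequence[float], overlap: int) -> List[float]:
    """Blend a sub-portion of `first` into the start of `second`.

    `first` stands for the portion derived from the concealed signal and
    `second` for the portion derived from the succeeding signal. The output
    has the same sample positions as `second`.
    """
    if not (2 <= overlap < len(first)) or overlap > len(second):
        raise ValueError("need 2 <= overlap < len(first) and overlap <= len(second)")
    sub = first[:overlap]  # first sub-portion: fewer samples than `first`
    out = list(second)
    for n in range(overlap):
        w = (n + 1) / (overlap + 1)  # weight of `second`, rising towards 1
        out[n] = (1.0 - w) * sub[n] + w * second[n]
    return out


# Hypothetical portions: the concealed-derived signal is all ones,
# the succeeding-derived signal is all zeros.
decoded = crossfade_transition([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], overlap=3)
```

Because `overlap` is at least 2, at least two output samples differ from the corresponding samples of `second` whenever the two portions disagree in the overlap region, which is the property the last clause of the claim describes.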
36. A computer program for implementing the method according to claim 35 when executed on a computer or signal processor.
37. A system for improving a transition from a concealed audio signal portion of an audio signal to a succeeding audio signal portion of the audio signal, wherein the system comprises:
A switching module (701);
The device (300) according to any one of claims 24 to 31, as a device (300) for implementing energy damping, and
The device (100) according to any one of claims 2 to 15, as a device (100) for implementing pitch-adaptive overlapping,
Wherein, the switching module (701) is configured to select, depending on the concealed audio signal portion and on the succeeding audio signal portion, one of the device (300) for implementing energy damping and the device (100) for implementing pitch-adaptive overlapping, for generating the decoded audio signal portion.
38. A system for improving a transition from a concealed audio signal portion of an audio signal to a succeeding audio signal portion of the audio signal, wherein the system comprises:
A switching module (702);
The device (300) according to any one of claims 24 to 31, as a device (300) for implementing energy damping, and
The device (200) according to any one of claims 16 to 23, as a device (200) for implementing excitation overlapping,
Wherein, the switching module (702) is configured to select, depending on the concealed audio signal portion and on the succeeding audio signal portion, one of the device (300) for implementing energy damping and the device (200) for implementing excitation overlapping, for generating the decoded audio signal portion.
39. A system for improving a transition from a concealed audio signal portion of an audio signal to a succeeding audio signal portion of the audio signal, wherein the system comprises:
A switching module (703);
The device (100) according to any one of claims 2 to 15, as a device (100) for implementing pitch-adaptive overlapping, and
The device (200) according to any one of claims 16 to 23, as a device (200) for implementing excitation overlapping,
Wherein, the switching module (703) is configured to select, depending on the concealed audio signal portion and on the succeeding audio signal portion, one of the device (100) for implementing pitch-adaptive overlapping and the device (200) for implementing excitation overlapping, for generating the decoded audio signal portion.
40. A system for improving a transition from a concealed audio signal portion of an audio signal to a succeeding audio signal portion of the audio signal, wherein the system comprises:
A switching module (704);
The device (100) according to any one of claims 2 to 15, as a device (100) for implementing pitch-adaptive overlapping,
The device (200) according to any one of claims 16 to 23, as a device (200) for implementing excitation overlapping, and
The device (300) according to any one of claims 24 to 31, as a device (300) for implementing energy damping,
Wherein, the switching module (704) is configured to select, depending on the concealed audio signal portion and on the succeeding audio signal portion, one of the device (100) for implementing pitch-adaptive overlapping, the device (200) for implementing excitation overlapping, and the device (300) for implementing energy damping, for generating the decoded audio signal portion.
41. The system according to claim 40,
Wherein, the switching module (704) is configured to determine whether at least one of a concealed audio signal frame and a succeeding audio signal frame comprises speech, and
Wherein, the switching module (704) is configured to select the device (300) for implementing energy damping to generate the decoded audio signal portion if the concealed audio signal frame and the succeeding audio signal frame do not comprise speech.
42. The system according to claim 40 or 41, wherein the switching module (704) is configured to select, depending on the frame length of a succeeding audio signal frame and depending on at least one of the pitch of the concealed audio signal portion and the pitch of the succeeding audio signal portion, said one of the device (100) for implementing pitch-adaptive overlapping, the device (200) for implementing excitation overlapping, and the device (300) for implementing energy damping, for generating the decoded audio signal portion, wherein the succeeding audio signal portion is an audio signal portion of the succeeding audio signal frame.
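The selection logic of claims 40 to 42 can be sketched as a small decision function. Only the non-speech rule comes from claim 41. Claim 42 states that frame length and pitch are inputs to the decision but does not give the rule, so the comparison used below, along with the names `Tool` and `select_tool` and the use of a pitch lag measured in samples, is purely a hypothetical placeholder.

```python
from enum import Enum


class Tool(Enum):
    PITCH_ADAPTIVE_OVERLAP = 100   # device (100)
    EXCITATION_OVERLAP = 200       # device (200)
    ENERGY_DAMPING = 300           # device (300)


def select_tool(concealed_is_speech: bool, succeeding_is_speech: bool,
                frame_length: int, pitch_lag: int) -> Tool:
    """Pick one transition tool for the current concealed/succeeding pair."""
    # Claim 41: if neither frame comprises speech, use energy damping.
    if not concealed_is_speech and not succeeding_is_speech:
        return Tool.ENERGY_DAMPING
    # ASSUMPTION (not from the claims): prefer pitch-adaptive overlapping
    # when one pitch cycle fits inside the succeeding frame.
    if pitch_lag <= frame_length:
        return Tool.PITCH_ADAPTIVE_OVERLAP
    return Tool.EXCITATION_OVERLAP


# Example call with made-up values: speech in both frames,
# a 320-sample frame and a 100-sample pitch lag.
choice = select_tool(True, True, frame_length=320, pitch_lag=100)
```

Keeping the decision in one pure function makes it easy to swap the placeholder rule for whatever criterion an actual decoder applies.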
43. The system according to claim 39,
Wherein, the system further comprises the device (300) according to any one of claims 24 to 31 as a device (300) for implementing energy damping,
Wherein, the switching module (703) is configured to select, depending on the concealed audio signal portion and on the succeeding audio signal portion, said one of the device (100) for implementing pitch-adaptive overlapping and the device (200) for implementing excitation overlapping, to generate an intermediate audio signal portion,
Wherein, the device (300) for implementing energy damping is configured to process the intermediate audio signal portion to generate the decoded audio signal portion.
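The two-stage arrangement of claim 43 can be sketched as a simple composition: an overlap tool produces an intermediate portion, and energy damping then processes it. The three stand-in tool functions below are deliberately trivial placeholders, not the devices (100), (200) and (300) themselves, and all names are assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Portion = List[float]
OverlapTool = Callable[[Sequence[float], Sequence[float]], Portion]
DampingTool = Callable[[Sequence[float]], Portion]


def two_stage_transition(concealed: Sequence[float], succeeding: Sequence[float],
                         use_pitch_adaptive: bool,
                         pitch_adaptive_overlap: OverlapTool,
                         excitation_overlap: OverlapTool,
                         energy_damping: DampingTool) -> Portion:
    """Select one overlap tool, then pass its output through energy damping."""
    overlap_tool = pitch_adaptive_overlap if use_pitch_adaptive else excitation_overlap
    intermediate = overlap_tool(concealed, succeeding)  # intermediate portion
    return energy_damping(intermediate)                 # decoded portion


# Placeholder tools, for demonstration only.
def average_overlap(c: Sequence[float], s: Sequence[float]) -> Portion:
    return [0.5 * (a + b) for a, b in zip(c, s)]


def passthrough_overlap(c: Sequence[float], s: Sequence[float]) -> Portion:
    return list(s)


def halve(x: Sequence[float]) -> Portion:
    return [0.5 * v for v in x]


decoded_portion = two_stage_transition([1.0, 1.0], [0.0, 0.0], True,
                                       average_overlap, passthrough_overlap, halve)
```

Passing the tools in as callables keeps the selection step separate from the processing steps, mirroring the split between the switching module and the devices in the claim.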
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP16153409.4 | 2016-01-29 | | |
EP16153409 | 2016-01-29 | | |
EPPCT/EP2016/060776 | 2016-05-12 | | |
PCT/EP2016/060776 WO2017129270A1 (en) | 2016-01-29 | 2016-05-12 | Apparatus and method for improving a transition from a concealed audio signal portion to a succeeding audio signal portion of an audio signal |
PCT/EP2017/051623 WO2017129665A1 (en) | 2016-01-29 | 2017-01-26 | Apparatus and method for improving a transition from a concealed audio signal portion to a succeeding audio signal portion of an audio signal |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108885875A (en) | 2018-11-23 |
CN108885875B (en) | 2023-10-13 |
Family
ID=55300366
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201780020242.9A Active CN108885875B (en) | 2016-01-29 | 2017-01-26 | Apparatus and method for improving conversion from hidden audio signal portions |
Country Status (11)
Country | Link |
---|---|
US (1) | US10762907B2 (en) |
EP (1) | EP3408852B1 (en) |
JP (1) | JP6789304B2 (en) |
KR (1) | KR102230089B1 (en) |
CN (1) | CN108885875B (en) |
BR (1) | BR112018015479A2 (en) |
CA (1) | CA3012547C (en) |
ES (1) | ES2843851T3 (en) |
MX (1) | MX2018009145A (en) |
RU (1) | RU2714238C1 (en) |
WO (1) | WO2017129270A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113544773A (en) * | 2019-02-13 | 2021-10-22 | 弗劳恩霍夫应用研究促进协会 | Decoder and decoding method for LC3 concealment including full and partial frame loss concealment |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108492832A (en) * | 2018-03-21 | 2018-09-04 | 北京理工大学 | High quality sound transform method based on wavelet transformation |
WO2020256491A1 (en) * | 2019-06-19 | 2020-12-24 | 한국전자통신연구원 | Method, apparatus, and recording medium for encoding/decoding image |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5327498A (en) * | 1988-09-02 | 1994-07-05 | Ministry Of Posts, Tele-French State Communications & Space | Processing device for speech synthesis by addition overlapping of wave forms |
US20030200083A1 (en) * | 2002-04-19 | 2003-10-23 | Masahiro Serizawa | Speech decoding device and speech decoding method |
CN101231849A (en) * | 2007-09-15 | 2008-07-30 | 华为技术有限公司 | Method and apparatus for concealing frame error of high belt signal |
WO2008151410A1 (en) * | 2007-06-14 | 2008-12-18 | Voiceage Corporation | Device and method for noise shaping in a multilayer embedded codec interoperable with the itu-t g.711 standard |
EP2040251A1 (en) * | 2006-07-12 | 2009-03-25 | Panasonic Corporation | Audio decoding device and audio encoding device |
US20110208517A1 (en) * | 2010-02-23 | 2011-08-25 | Broadcom Corporation | Time-warping of audio signals for packet loss concealment |
US20120010882A1 (en) * | 2006-08-15 | 2012-01-12 | Broadcom Corporation | Constrained and controlled decoding after packet loss |
WO2012070370A1 (en) * | 2010-11-22 | 2012-05-31 | 株式会社エヌ・ティ・ティ・ドコモ | Audio encoding device, method and program, and audio decoding device, method and program |
WO2015063045A1 (en) * | 2013-10-31 | 2015-05-07 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio decoder and method for providing a decoded audio information using an error concealment modifying a time domain excitation signal |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1323532C (en) * | 2001-11-15 | 2007-06-27 | 松下电器产业株式会社 | Method for error concealment apparatus |
WO2005086138A1 (en) | 2004-03-05 | 2005-09-15 | Matsushita Electric Industrial Co., Ltd. | Error conceal device and error conceal method |
US7831421B2 (en) * | 2005-05-31 | 2010-11-09 | Microsoft Corporation | Robust decoder |
US8255207B2 (en) * | 2005-12-28 | 2012-08-28 | Voiceage Corporation | Method and device for efficient frame erasure concealment in speech codecs |
US8731913B2 (en) * | 2006-08-03 | 2014-05-20 | Broadcom Corporation | Scaled window overlap add for mixed signals |
KR101291193B1 (en) * | 2006-11-30 | 2013-07-31 | 삼성전자주식회사 | The Method For Frame Error Concealment |
JP4708446B2 (en) | 2007-03-02 | 2011-06-22 | パナソニック株式会社 | Encoding device, decoding device and methods thereof |
JP5255358B2 (en) | 2008-07-25 | 2013-08-07 | パナソニック株式会社 | Audio transmission system |
ES2960089T3 (en) * | 2012-06-08 | 2024-02-29 | Samsung Electronics Co Ltd | Method and apparatus for concealing frame errors and method and apparatus for audio decoding |
CN103714821A (en) * | 2012-09-28 | 2014-04-09 | 杜比实验室特许公司 | Mixed domain data packet loss concealment based on position |
CA2964362C (en) * | 2013-06-21 | 2020-03-31 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Jitter buffer control, audio decoder, method and computer program |
EP3107096A1 (en) * | 2015-06-16 | 2016-12-21 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Downscaled decoding |
2016
- 2016-05-12 WO PCT/EP2016/060776 patent/WO2017129270A1/en active Application Filing
2017
- 2017-01-26 BR BR112018015479A patent/BR112018015479A2/en active Search and Examination
- 2017-01-26 KR KR1020187023876A patent/KR102230089B1/en active IP Right Grant
- 2017-01-26 CA CA3012547A patent/CA3012547C/en active Active
- 2017-01-26 JP JP2018539420A patent/JP6789304B2/en active Active
- 2017-01-26 RU RU2018130662A patent/RU2714238C1/en active
- 2017-01-26 CN CN201780020242.9A patent/CN108885875B/en active Active
- 2017-01-26 ES ES17707475T patent/ES2843851T3/en active Active
- 2017-01-26 MX MX2018009145A patent/MX2018009145A/en unknown
- 2017-01-26 EP EP17707475.4A patent/EP3408852B1/en active Active
2018
- 2018-07-27 US US16/048,166 patent/US10762907B2/en active Active
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5327498A (en) * | 1988-09-02 | 1994-07-05 | Ministry Of Posts, Tele-French State Communications & Space | Processing device for speech synthesis by addition overlapping of wave forms |
US20030200083A1 (en) * | 2002-04-19 | 2003-10-23 | Masahiro Serizawa | Speech decoding device and speech decoding method |
EP2040251A1 (en) * | 2006-07-12 | 2009-03-25 | Panasonic Corporation | Audio decoding device and audio encoding device |
US20120010882A1 (en) * | 2006-08-15 | 2012-01-12 | Broadcom Corporation | Constrained and controlled decoding after packet loss |
WO2008151410A1 (en) * | 2007-06-14 | 2008-12-18 | Voiceage Corporation | Device and method for noise shaping in a multilayer embedded codec interoperable with the itu-t g.711 standard |
CN101231849A (en) * | 2007-09-15 | 2008-07-30 | 华为技术有限公司 | Method and apparatus for concealing frame error of high belt signal |
US20110208517A1 (en) * | 2010-02-23 | 2011-08-25 | Broadcom Corporation | Time-warping of audio signals for packet loss concealment |
WO2012070370A1 (en) * | 2010-11-22 | 2012-05-31 | 株式会社エヌ・ティ・ティ・ドコモ | Audio encoding device, method and program, and audio decoding device, method and program |
WO2015063045A1 (en) * | 2013-10-31 | 2015-05-07 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio decoder and method for providing a decoded audio information using an error concealment modifying a time domain excitation signal |
CN105793924A (en) * | 2013-10-31 | 2016-07-20 | 弗朗霍夫应用科学研究促进协会 | Audio decoder and method for providing decoded audio information using error concealment modifying time domain excitation signal |
Non-Patent Citations (3)
Title |
---|
J. LECOMTE: "Enhanced time domain packet loss concealment in switched speech/audio codec", 《IEEE ICASSP》 * |
LUONG PHAM VAN: "Out-of-the-loop information hiding for HEVC video", 《2015 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING》 * |
王朝朋: "音频丢包补偿算法研究", 《中国优秀硕士学位论文全文数据库(信息科技)》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113544773A (en) * | 2019-02-13 | 2021-10-22 | 弗劳恩霍夫应用研究促进协会 | Decoder and decoding method for LC3 concealment including full and partial frame loss concealment |
US11875806B2 (en) | 2019-02-13 | 2024-01-16 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Multi-mode channel coding |
US12009002B2 (en) | 2019-02-13 | 2024-06-11 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio transmitter processor, audio receiver processor and related methods and computer programs |
Also Published As
Publication number | Publication date |
---|---|
KR102230089B1 (en) | 2021-03-19 |
CN108885875B (en) | 2023-10-13 |
CA3012547C (en) | 2021-12-28 |
JP6789304B2 (en) | 2020-11-25 |
CA3012547A1 (en) | 2017-08-03 |
EP3408852B1 (en) | 2020-12-02 |
RU2714238C1 (en) | 2020-02-13 |
US10762907B2 (en) | 2020-09-01 |
KR20180123664A (en) | 2018-11-19 |
BR112018015479A2 (en) | 2018-12-18 |
ES2843851T3 (en) | 2021-07-20 |
WO2017129270A1 (en) | 2017-08-03 |
JP2019510999A (en) | 2019-04-18 |
MX2018009145A (en) | 2018-12-06 |
US20190122672A1 (en) | 2019-04-25 |
EP3408852A1 (en) | 2018-12-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103620672B (en) | For the apparatus and method of the error concealing in low delay associating voice and audio coding (USAC) | |
AU2014283123B2 (en) | Audio decoding with reconstruction of corrupted or not received frames using TCX LTP | |
CN103493129B (en) | For using Transient detection and quality results by the apparatus and method of the code segment of audio signal | |
Janicki | Spoofing countermeasure based on analysis of linear prediction error. | |
CN109155133B (en) | Error concealment unit for audio frame loss concealment, audio decoder and related methods | |
TR201802808T4 (en) | The audio decoder and method for providing a decoded audio information using an error suppression based on a time domain excitation signal. | |
JP2004508597A (en) | Simulation of suppression of transmission error in audio signal | |
JP2017521728A (en) | Packet loss concealment method and apparatus, and decoding method and apparatus using the same | |
JP7490894B2 (en) | Real-time packet loss concealment using deep generative networks | |
CN108885875A (en) | Device and method for improving the conversion from the concealing audio signal section of audio signal to subsequent audio signal parts | |
KR20220045260A (en) | Improved frame loss correction with voice information | |
US20220180884A1 (en) | Methods and devices for detecting an attack in a sound signal to be coded and for coding the detected attack | |
US20220392458A1 (en) | Methods and system for waveform coding of audio signals with a generative model | |
CN117935840A (en) | Method and device for execution by a terminal device | |
WO2017129665A1 (en) | Apparatus and method for improving a transition from a concealed audio signal portion to a succeeding audio signal portion of an audio signal | |
MX2008008477A (en) | Method and device for efficient frame erasure concealment in speech codecs |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||