CN108885875A

CN108885875A - Device and method for improving the transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion

Info

Publication number: CN108885875A
Application number: CN201780020242.9A
Authority: CN
Inventors: Adrian Tomasek; Jérémie Lecomte
Original assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Current assignee: Fraunhofer Gesellschaft zur Forderung der Angewandten Forschung eV
Priority date: 2016-01-29
Filing date: 2017-01-26
Publication date: 2018-11-23
Anticipated expiration: 2037-01-26
Also published as: BR112018015479A2; CA3012547A1; RU2714238C1; CN108885875B; EP3408852A1; US10762907B2; JP2019510999A; US20190122672A1; KR102230089B1; ES2843851T3; WO2017129270A1; KR20180123664A; CA3012547C; EP3408852B1; MX2018009145A; JP6789304B2

Abstract

A device (10) is provided for improving the transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal. The device (10) includes a processor (11) configured to generate a decoded audio signal portion of the audio signal from a first audio signal portion and a second audio signal portion, wherein the first audio signal portion depends on the concealed audio signal portion and the second audio signal portion depends on the subsequent audio signal portion. The device (10) further includes an output interface (12) for outputting the decoded audio signal portion. Each of the first audio signal portion, the second audio signal portion and the decoded audio signal portion comprises a plurality of samples, and each of these samples is defined by a sample value and by a sample position among a plurality of sample positions.

Description

Device and method for improving the transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion

Technical field

The present invention relates to audio signal processing and decoding, and in particular to a device and method for improving the transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion.

Background art

In error-prone networks, every codec attempts to mitigate the artifacts caused by frame or packet losses. The prior art focuses on concealing the lost information by various methods, ranging from simple muting or noise substitution to sophisticated methods such as prediction based on past good frames. One major and evidently neglected source of artifacts caused by packet loss lies in the recovery, that is, in the first few good frames after the loss.

Because long-term prediction is commonly used in speech codecs, recovery artifacts may be very severe, and error propagation may affect multiple subsequent good frames. Some prior art attempts to mitigate this problem; see, for example, [1] and [2].

For general audio codecs (any codec working in the transform domain), plenty of literature can be found on concealing frame losses (e.g., in [3]). However, the available prior art does not focus on the recovery of frames. It is assumed that, owing to the nature of transform-domain codecs, the overlap-add smooths the transition artifacts. A good example is AAC-ELD (AAC-ELD = Advanced Audio Coding-Enhanced Low Delay; see [4]), which is used in FaceTime for communication over IP networks.

The first few frames after a frame loss are referred to as "recovery frames". Prior-art transform-domain codecs do not appear to provide any special treatment for the one or more recovery frames, and annoying artifacts sometimes occur. One example of a problem that may arise during recovery is the superposition of the concealed waveform and the good waveform in the overlap-add part, which sometimes results in an annoying energy boost.

Another problem is an abrupt pitch change at frame boundaries. An example for speech signals: when the pitch of the original signal is changing and a frame loss occurs, the concealment method may predict the pitch at the end of the frame slightly wrongly. This slightly erroneous prediction may cause a pitch jump into the next good frame. Most known concealment methods do not even use prediction, but only a fixed pitch based on the last valid pitch, which may lead to an even larger mismatch with the first good frame. Some other methods use advanced prediction to reduce the drift; see, for example, TD-TCX PLC (TD = time domain; TCX = transform coded excitation; PLC = packet loss concealment) in EVS (EVS = Enhanced Voice Services), see [5].

Prior-art methods for modifying the pitch of speech signals, e.g., TD-PSOLA (TD-PSOLA = Time-Domain Pitch-Synchronous Overlap-Add; see [6] and [7]), perform prosodic modifications on the speech signal, e.g., expansion or contraction of the duration (referred to as time stretching), or change the fundamental frequency (pitch). This is done by decomposing the speech signal into short-term, pitch-synchronous analysis signals, which are then repositioned on the time axis and progressively overlapped. However, when the pitch in the concealed frame differs from the pitch in the original signal, the signal in the recovery frame is corrupted after the overlap mechanism. The TD-PSOLA mechanism would merely reposition the artifacts on the time axis, which makes it unsuitable for recovery.

Summary of the invention

It is therefore an object of the present invention to provide improved concepts for audio signal processing and decoding.

The object of the present invention is achieved by a device according to claim 1, by a method according to claim 35 and by a computer program according to claim 36.

A device is provided for improving the transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal.

The device includes a processor configured to generate a decoded audio signal portion of the audio signal from a first audio signal portion and a second audio signal portion, wherein the first audio signal portion depends on the concealed audio signal portion and the second audio signal portion depends on the subsequent audio signal portion.

The device further includes an output interface for outputting the decoded audio signal portion.

Each of the first audio signal portion, the second audio signal portion and the decoded audio signal portion comprises a plurality of samples. Each of these samples is defined by a sample value and by a sample position among a plurality of sample positions. The sample positions are ordered such that, for each pair of a first sample position and a different second sample position among the plurality of sample positions, the first sample position is either a successor or a predecessor of the second sample position.

The processor is configured to determine a first sub-portion of the first audio signal portion such that the first sub-portion comprises fewer samples than the first audio signal portion.

The processor is configured to generate the decoded audio signal portion using the first sub-portion of the first audio signal portion and using the second audio signal portion or a second sub-portion of the second audio signal portion, such that, for each of two or more samples of the second audio signal portion, the sample position of that sample equals the sample position of one of the samples of the decoded audio signal portion, while the sample value of that sample differs from the sample value of said one of the samples of the decoded audio signal portion.

Furthermore, a method is provided for improving the transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal. The method comprises:

Generating a decoded audio signal portion of the audio signal from a first audio signal portion and a second audio signal portion, wherein the first audio signal portion depends on the concealed audio signal portion and the second audio signal portion depends on the subsequent audio signal portion; and:

Outputting the decoded audio signal portion.

Generating the decoded audio signal portion comprises determining a first sub-portion of the first audio signal portion such that the first sub-portion comprises fewer samples than the first audio signal portion.

Moreover, generating the decoded audio signal portion is performed using the first sub-portion of the first audio signal portion and using the second audio signal portion or a second sub-portion of the second audio signal portion, such that, for each of two or more samples of the second audio signal portion, the sample position of that sample equals the sample position of one of the samples of the decoded audio signal portion, while the sample value of that sample differs from the sample value of said one of the samples of the decoded audio signal portion.

Furthermore, a computer program is provided that is configured to implement the above method when executed on a computer or signal processor.

Some embodiments provide a recovery filter, a tool for smoothing and repairing the transition from a lost frame to the first good frame in a (e.g., block-based) audio codec. According to embodiments, the recovery filter may be used to repair, in the first good frame of a speech signal, pitch changes that occurred during the concealed frame, and it may also be used to smooth the transition of noise signals.

In particular, some embodiments are based on the finding that the length of the signal modification is limited: it extends from the last samples of the concealed frame to the last sample of the first good frame. The length could be extended beyond the last sample of the first good frame, but this would risk error propagation, which is hard to handle in future frames. Fast recovery is therefore needed. To repair the speech characteristics when the lost frame and the recovery frame do not match, the pitch of the signal in the recovery frame should change slowly from the pitch in the concealed frame to the pitch in the recovery frame, while the limit on the length of the signal modification is respected. Using the TD-PSOLA algorithm would be possible only if the pitch changed by an integer multiple. Since this is a very rare case, TD-PSOLA cannot be applied here.

Brief description of the drawings

Embodiments of the present invention are described in more detail below with reference to the drawings, in which:

Fig. 1a shows a device, according to an embodiment, for improving the transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal.

Fig. 1b shows a device for improving the transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal according to another embodiment, which implements the pitch-adaptive overlap concept.

Fig. 1c shows a device for improving the transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal according to another embodiment, which implements the excitation overlap concept.

Fig. 1d shows a device for improving the transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal according to another embodiment, which implements energy damping.

Fig. 1e shows a device according to another embodiment, wherein the device further includes a concealment unit.

Fig. 1f shows a device according to another embodiment, wherein the device further includes an activation unit for activating the concealment unit.

Fig. 1g shows a device according to another embodiment, wherein the activation unit is additionally configured to activate the processor.

Fig. 2 shows a Hamming cosine window according to an embodiment.

Fig. 3 shows a concealed frame and a good frame according to such an embodiment.

Fig. 4 shows the generation of two prototypes for implementing the pitch-adaptive overlap according to an embodiment.

Fig. 5 shows the excitation overlap according to an embodiment.

Fig. 6 shows a concealed frame and a good frame according to an embodiment.

Fig. 7a shows a system according to an embodiment.

Fig. 7b shows a system according to another embodiment.

Fig. 7c shows a system according to another embodiment.

Fig. 7d shows a system according to another embodiment; and

Fig. 7e shows a system according to another embodiment.

Detailed description of embodiments

Fig. 1a shows a device 10, according to an embodiment, for improving the transition from a concealed audio signal portion of an audio signal to a subsequent (succeeding) audio signal portion of the audio signal.

The device 10 includes a processor 11 configured to generate a decoded audio signal portion of the audio signal from a first audio signal portion and a second audio signal portion, wherein the first audio signal portion depends on the concealed audio signal portion and the second audio signal portion depends on the subsequent audio signal portion.

In some embodiments, the first audio signal portion may, for example, be derived from the concealed audio signal portion but differ from it, and/or the second audio signal portion may, for example, be derived from the subsequent audio signal portion but differ from it.

In other embodiments, the first audio signal portion may, for example, be (equal to) the concealed audio signal portion, and the second audio signal portion may, for example, be the subsequent audio signal portion.

The device 10 further includes an output interface 12 for outputting the decoded audio signal portion.

A sample is defined, for example, by a sample position and a sample value. In a two-dimensional coordinate system, for instance, the sample position may define the x-axis value (abscissa) of the sample and the sample value may define its y-axis value (ordinate). Hence, for a given sample, all samples to its left in the coordinate system are predecessors of that sample (because their sample positions are smaller), and all samples to its right are successors of that sample (because their sample positions are greater).

The processor 11 is configured to determine a first sub-portion of the first audio signal portion such that the first sub-portion comprises fewer samples than the first audio signal portion.

The processor 11 is configured to generate the decoded audio signal portion using the first sub-portion of the first audio signal portion and using the second audio signal portion or a second sub-portion of the second audio signal portion, such that, for each of two or more samples of the second audio signal portion, the sample position of that sample equals the sample position of one of the samples of the decoded audio signal portion, while the sample value of that sample differs from the sample value of said one of the samples of the decoded audio signal portion.

Thus, in some embodiments, the processor 11 is configured to generate the decoded audio signal portion using the first sub-portion and using the second audio signal portion.

In other embodiments, the processor 11 generates the decoded audio signal portion using the first sub-portion and using a second sub-portion of the second audio signal portion. The second sub-portion comprises fewer samples than the second audio signal portion.

Embodiments are based on the finding that it is beneficial to improve the transition from the concealed audio signal portion to the subsequent audio signal portion of the audio signal by modifying the samples of the subsequent audio signal portion, and not only by adjusting the samples of the concealed audio signal portion. By also modifying the samples of a correctly received frame, the transition from the concealed audio signal portion (e.g., of a concealed audio signal frame) to the subsequent audio signal portion (e.g., of a subsequent audio signal frame) can be improved.

The decoded audio signal portion is therefore generated using the first audio signal portion and the second audio signal portion, but it comprises (at least two) samples whose sample positions coincide with sample positions of the second audio signal portion (which depends on the subsequent audio signal portion) while their sample values differ. For these samples, the sample values of the corresponding samples are thus not used as they are, but are modified to obtain the corresponding samples of the decoded audio signal portion.

As to the first audio signal portion and the second audio signal portion, the processor 11 may, for example, receive both of them.

Alternatively, in another embodiment, the processor 11 may, for example, receive the concealed audio signal portion and determine the first audio signal portion from it, and may receive the subsequent audio signal portion and determine the second audio signal portion from it.

Alternatively, in a further embodiment, the processor 11 may, for example, receive audio signal frames and determine that a first frame is lost or corrupted. The processor 11 may then perform concealment and generate the concealed audio signal portion, for example according to prior-art concepts. Moreover, the processor 11 may, for example, receive a second audio signal frame and obtain the subsequent audio signal portion from it. Fig. 1e shows such an embodiment.

In some embodiments, the first audio signal portion may, for example, be a residual signal portion of a first residual signal, the first residual signal being a residual signal with respect to the concealed audio signal portion. Likewise, in some embodiments, the second audio signal portion may be a residual signal portion of a second residual signal, the second residual signal being a residual signal with respect to the subsequent audio signal portion.

In Fig. 1e, the device 10 further includes a concealment unit 8 configured to perform concealment for an erroneous or lost current frame to obtain the concealed audio signal portion.

According to the embodiment of Fig. 1e, the device thus further includes a concealment unit 8. The concealment unit 8 may, for example, be configured to perform concealment according to the prior art if a frame is lost or corrupted, and it then delivers the concealed audio signal portion to the processor 11. In such an embodiment, the concealed audio signal portion may, for example, be the concealed audio signal portion of the erroneous or lost frame for which concealment was performed. The subsequent audio signal portion may, for example, be an audio signal portion of a subsequent audio signal frame for which no concealment was performed. The subsequent audio signal frame may, for example, follow the erroneous or lost frame in time.

Fig. 1f shows an embodiment in which the device 10 further includes an activation unit 6, which may, for example, be configured to detect whether the current frame is lost or erroneous. For example, the activation unit 6 may conclude that the current frame is lost if it does not arrive within a predefined time limit after the last received frame. Alternatively, the activation unit 6 may conclude that the current frame is lost if another frame (e.g., a subsequent frame) with a frame number greater than that of the current frame arrives. The activation unit 6 may, for example, conclude that a frame is erroneous if a received checksum or received check bits do not match the checksum or check bits calculated by the activation unit.

The activation unit 6 of Fig. 1f may, for example, be configured to activate the concealment unit 8 to perform concealment for the current frame if the current frame is lost or erroneous.

Fig. 1g shows an embodiment in which the activation unit 6 may, for example, be configured to detect, if the current frame is lost or erroneous, whether a subsequent frame that is not erroneous arrives. In the embodiment of Fig. 1g, the activation unit 6 may be configured to activate the processor 11 to generate the decoded audio signal portion if the current frame is lost or erroneous and a subsequent frame that is not erroneous arrives.
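The decision logic described for the activation unit of Figs. 1f and 1g can be illustrated by the following minimal sketch. It is not taken from the embodiments: the `Frame` type, the function names, the status strings and the use of CRC-32 as the checksum are all assumptions chosen for illustration.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Frame:
    number: int      # frame number carried with the frame
    payload: bytes   # coded frame data
    checksum: int    # checksum sent with the frame (CRC-32 is an assumption)


def classify_current_frame(frame: Optional[Frame], waited_ms: float,
                           timeout_ms: float, newer_frame_arrived: bool) -> str:
    """Return 'good', 'erroneous', 'lost' or 'pending' for the current frame."""
    if frame is None:
        # Lost if the time limit expired or a frame with a greater number arrived.
        if waited_ms > timeout_ms or newer_frame_arrived:
            return "lost"
        return "pending"
    # Erroneous if the received checksum differs from the locally computed one.
    if zlib.crc32(frame.payload) != frame.checksum:
        return "erroneous"
    return "good"


def activate_concealment(current_status: str) -> bool:
    """Fig. 1f: conceal when the current frame is lost or erroneous."""
    return current_status in ("lost", "erroneous")


def activate_processor(current_status: str, next_status: str) -> bool:
    """Fig. 1g: run the processor once an error-free subsequent frame arrives."""
    return current_status in ("lost", "erroneous") and next_status == "good"
```

In this sketch the processor is only activated for the pair "concealed frame followed by good frame", which is the situation in which a transition has to be repaired.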

Fig. 1b shows a device 100, according to another embodiment, for improving the transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal. The device of Fig. 1b implements the pitch-adaptive overlap concept.

The device 100 of Fig. 1b is a specific embodiment of the device 10 of Fig. 1a. The processor 110 of Fig. 1b is a specific embodiment of the processor 11 of Fig. 1a. The output interface 120 of Fig. 1b is a specific embodiment of the output interface 12 of Fig. 1a.

In the embodiment of Fig. 1b, the processor 110 may, for example, be configured to determine a second prototype signal portion as the second sub-portion of the second audio signal portion, such that the second sub-portion comprises fewer samples than the second audio signal portion.

The processor 110 may, for example, be configured to determine one or more intermediate prototype signal portions by determining each of them by combining a first prototype signal portion, being the first sub-portion, with the second prototype signal portion.

In Fig. 1b, the processor 110 may, for example, be configured to generate the decoded audio signal portion using the first prototype signal portion, the one or more intermediate prototype signal portions and the second prototype signal portion.

According to an embodiment, the processor 110 may, for example, be configured to generate the decoded audio signal portion by combining the first prototype signal portion, the one or more intermediate prototype signal portions and the second prototype signal portion.

In embodiment, processor 110 is configured to determine that three or more marker samples positions, wherein three or more Each of multiple marker samples positions are at least one of the first audio signal parts and the second audio signal parts Sample position.In addition, processor 110 be configured as selection the second audio signal parts in, for the second audio signal parts Any other sample any other sample position for be all subsequent sample sample position, as three or more The final sample position of a marker samples position.In addition, processor 110 is configured as by according to the first audio signal parts Correlation between first subdivision and the second subdivision of the second audio signal parts is selected from the first audio signal parts Sample position, to determine the beginning sample position of three or more marker samples positions.In addition, processor 110 is configured as According to the beginning sample position of three or more marker samples positions and according to three or more marker samples positions Final sample position, to determine one or more intermediate sample positions of three or more marker samples positions.In addition, processing Device 110 is configured as by being carried out the first prototype signal part and the second prototype signal part according to the intermediate sample position It combines and is directed to each of one or more of intermediate sample positions in prototype signal part among one or more to determine The intermediate prototype signal part of a intermediate sample position determines one or more intermediate prototype signal parts.

According to an embodiment, the processor 110 is configured to determine the one or more intermediate prototype signal portions by determining, for each of the one or more intermediate sample positions, an intermediate prototype signal portion by combining the first prototype signal portion and the second prototype signal portion according to:

sig_i = (1 − α)·sig_first + α·sig_last

Wherein：

Where i is an integer with i ≥ 1, nrOfMarkers is the number of the three or more marker sample positions minus 1, sig_i is the i-th intermediate prototype signal portion of the one or more intermediate prototype signal portions, sig_first is the first prototype signal portion, and sig_last is the second prototype signal portion.
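The combination of the two prototypes can be sketched as follows. The expression for α is not given in the text above, so this sketch assumes a linear ramp α = i / nrOfMarkers; it further assumes that both prototypes have the same number of samples. The function name and parameters are illustrative only.

```python
from typing import List


def intermediate_prototypes(sig_first: List[float], sig_last: List[float],
                            nr_of_markers: int) -> List[List[float]]:
    """Blend sig_first into sig_last for i = 1 .. nr_of_markers - 1."""
    if len(sig_first) != len(sig_last):
        raise ValueError("this sketch assumes prototypes of equal length")
    prototypes = []
    for i in range(1, nr_of_markers):
        alpha = i / nr_of_markers  # ASSUMPTION: linear ramp, not stated in the text
        # sig_i = (1 - alpha) * sig_first + alpha * sig_last, sample by sample
        prototypes.append([(1.0 - alpha) * a + alpha * b
                           for a, b in zip(sig_first, sig_last)])
    return prototypes


# Four marker intervals give three intermediate prototypes.
print(intermediate_prototypes([0.0, 4.0], [4.0, 0.0], 4))
```

With nrOfMarkers + 1 marker sample positions, the first and the last marker carry sig_first and sig_last themselves, so only the nrOfMarkers − 1 positions in between need a blended prototype.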

In an embodiment, the processor 110 is configured to determine the one or more intermediate sample positions of the three or more marker sample positions according to either of the following formulas:

Or

Wherein,

Wherein, δ=x₁-(x₀+nrOfMarkers·T_c),

Wherein,

Where i is an integer with i ≥ 1, nrOfMarkers is the number of the three or more marker sample positions minus 1, mark_i is the i-th intermediate sample position of the three or more marker sample positions, mark_(i−1) is the (i−1)-th and mark_(i+1) the (i+1)-th intermediate sample position of the three or more marker sample positions, x₀ is the start sample position and x₁ the final sample position of the three or more marker sample positions, and T_c denotes a pitch lag.
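As a small worked example of the quantity δ defined above, the following sketch evaluates δ = x₁ − (x₀ + nrOfMarkers·T_c) for illustrative numbers. The function name and the numeric values are assumptions; the marker formulas themselves are not reproduced here.

```python
def pitch_drift(x0: int, x1: int, nr_of_markers: int, t_c: int) -> int:
    """delta = x1 - (x0 + nrOfMarkers * T_c).

    delta is the distance by which stepping from x0 in nr_of_markers steps
    of the old pitch lag t_c misses the final marker position x1.
    """
    return x1 - (x0 + nr_of_markers * t_c)


# Illustrative values: 7 steps of 78 samples starting at 400 end at 946,
# which is 13 samples short of x1 = 959.
print(pitch_drift(x0=400, x1=959, nr_of_markers=7, t_c=78))
```

A positive δ means that the marker spacing has to grow on the way from x₀ to x₁, and a negative δ means that it has to shrink.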

According to an embodiment, the processor 110 is configured to determine the first audio signal portion depending on the concealed audio signal portion and on a plurality of third filter coefficients, wherein the third filter coefficients depend on the concealed audio signal portion and on the subsequent audio signal portion, and wherein the processor 110 is configured to determine the second audio signal portion depending on the subsequent audio signal portion and on the third filter coefficients.

In an embodiment, the processor 110 may, for example, include a filter, wherein the processor 110 is configured to apply the filter with the third filter coefficients to the concealed audio signal portion to obtain the first audio signal portion, and to apply the filter with the third filter coefficients to the subsequent audio signal portion to obtain the second audio signal portion.
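If the filter is a linear prediction analysis filter, applying it to a signal portion yields a residual signal portion. The following is a minimal sketch of such a filter under stated assumptions: the sign convention A(z) = 1 + a_1·z⁻¹ + ... + a_p·z⁻ᵖ, zero signal history before the first sample, and the function name are all choices made for illustration and are not taken from the embodiments.

```python
from typing import List


def lpc_residual(signal: List[float], coeffs: List[float]) -> List[float]:
    """Apply the FIR analysis filter e[n] = s[n] + sum_k a_k * s[n - k].

    coeffs holds a_1 .. a_p (ASSUMED sign convention); samples before the
    start of `signal` are treated as zero.
    """
    residual = []
    for n, sample in enumerate(signal):
        acc = sample
        for k, a_k in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc += a_k * signal[n - k]
        residual.append(acc)
    return residual


# A first-order predictor that matches the decay of the signal removes
# everything except the first sample.
print(lpc_residual([1.0, 0.5, 0.25, 0.125], [-0.5]))
```

Using the same coefficients for both signal portions keeps the two residuals comparable, which matters when they are later correlated and overlapped.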

According to an embodiment, the processor 110 is configured to determine a plurality of first filter coefficients depending on the concealed audio signal portion and a plurality of second filter coefficients depending on the subsequent audio signal portion, and to determine each of the third filter coefficients depending on a combination of one or more of the first filter coefficients and one or more of the second filter coefficients.

In an embodiment, the filter coefficients of the plurality of first filter coefficients, of the plurality of second filter coefficients and of the plurality of third filter coefficients are linear predictive coding (LPC) parameters of a linear prediction filter.

According to an embodiment, the processor 110 is configured to determine each of the third filter coefficients according to:

A = 0.5·A_conc + 0.5·A_good

Where A denotes the filter coefficient value of said filter coefficient, A_conc denotes the coefficient value of a filter coefficient of the plurality of first filter coefficients, and A_good denotes the coefficient value of a filter coefficient of the plurality of second filter coefficients.
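The coefficient combination amounts to an element-wise average of two coefficient vectors, as in the sketch below. The function name is an assumption, and the sketch simply averages whatever representation it is given; the specific embodiment described further below performs this interpolation in the LSP domain.

```python
from typing import List


def combine_filter_coeffs(a_conc: List[float], a_good: List[float]) -> List[float]:
    """Return A = 0.5 * A_conc + 0.5 * A_good, coefficient by coefficient."""
    if len(a_conc) != len(a_good):
        raise ValueError("both coefficient sets must have the same order")
    return [0.5 * c + 0.5 * g for c, g in zip(a_conc, a_good)]


print(combine_filter_coeffs([1.0, -0.5], [0.0, 0.5]))
```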

In an embodiment, the processor 110 is configured to apply a cosine window, defined by the following formula, to the concealed audio signal portion to obtain a concealed windowed signal portion:

Wherein the processor 110 is configured to apply said cosine window to the subsequent audio signal portion to obtain a subsequent windowed signal portion, wherein the processor 110 is configured to determine the plurality of first filter coefficients depending on the concealed windowed signal portion and the plurality of second filter coefficients depending on the subsequent windowed signal portion, and wherein each of x, x₁ and x₂ is a sample position among the plurality of sample positions.

According to an embodiment, the processor 110 may, for example, be configured to select the first prototype signal portion from a plurality of sub-portion candidates of the first audio signal portion, depending on a plurality of correlations, each being the correlation of one of the sub-portion candidates with the second sub-portion of the second audio signal portion. The processor 110 may, for example, be configured to select, as the start sample position of the three or more marker sample positions, the sample position of that sample of the first prototype signal portion which is a predecessor of every other sample of the first prototype signal portion.

In an embodiment, the processor 110 may, for example, be configured to select, as the first prototype signal portion, the sub-portion candidate whose correlation with the second sub-portion has the highest correlation value among the plurality of correlations.

According to an embodiment, the processor 110 is configured to determine the correlation value of each of the plurality of correlations according to:

Where L_frame denotes the number of samples of the second audio signal portion, which equals the number of samples of the first audio signal portion, r(2L_frame − i) denotes the sample value of the sample at sample position 2L_frame − i of the second audio signal portion, r(L_frame − i − Δ) denotes the sample value of the sample at sample position L_frame − i − Δ of the first audio signal portion, and where, for each correlation of one of the sub-portion candidates with the second sub-portion, Δ denotes a number that depends on that sub-portion candidate.

The pitch-adaptive overlap is used to compensate for the pitch difference that may occur between the pitch at the beginning of the first good decoded frame after a frame loss and the pitch at the end of the frame concealed with TD PLC. The signal is operated on in the LPC domain, so that the constructed signal is smoothed by the LPC synthesis filter at the end of the algorithm. In the LPC domain, the time instant with the highest similarity is found by a cross-correlation as described below, and the pitch of the signal is evolved slowly from the last pitch lag T_c to the new pitch lag T_g to avoid abrupt pitch changes.

In the following, the pitch-adaptive overlap according to specific embodiments is described.

A device or method according to such embodiments may, for example, be implemented as follows:

16th-order LPC parameters A_conc and A_good are calculated on the pre-emphasized concealed signal s(0:L_frame-1) and on the first good frame s(L_frame:2L_frame-1), respectively, using a Hamming-cosine window, for example of the following form:

where x₁ = 200 and x₂ = 40 for a frame length of 480 samples.

Fig. 2 shows such a Hamming-cosine window according to an embodiment. The shape of the window may, for example, be designed such that the last samples of the analyzed signal portion have the highest influence.

Interpolation in the LSP domain yields A = 0.5·A_conc + 0.5·A_good.

The LPC residual signal of the concealed frame is calculated using A:

and, likewise, the LPC residual signal of the first good frame:

The time instant x₀ is then found, which indicates the greatest similarity between the last part of the concealed frame and the last part of the good frame; x₁ is 2L_frame-1.

x₀ is obtained by maximizing the normalized cross-correlation:

Usually, normalization is done at the end of a correlation: for example, in a pitch search, normalization is done after the correlation, once the pitch value has been found.

Here, normalization is done during the correlation, to be robust against energy fluctuations between the signals. For complexity reasons, the normalization term is calculated using an update scheme. Only for the initial value

where Δ = 0, the complete dot product may, for example, be calculated. For each following increment of Δ, the term may, for example, be updated as follows:

norm_Δ = norm_(Δ-1) + r(L_frame - T_g - Δ)² - r(L_frame - Δ)², Δ = 1...T_c
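The search with the running normalization term can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the function name find_best_shift is hypothetical, the correlation is assumed to run over the last T_g residual samples (i = 1...T_g, chosen so that the window matches the update rule above), and the result is reported as the shift Δ, because the mapping from Δ to x₀ is not spelled out in this text.

```python
import math


def find_best_shift(r, l_frame, t_c, t_g):
    """Return (delta, score) maximizing the normalized cross-correlation.

    r holds 2 * l_frame residual samples: concealed frame, then good frame.
    ASSUMPTION: the sums run over i = 1..t_g (not confirmed by the source).
    """
    if len(r) < 2 * l_frame or l_frame - t_g - t_c < 0:
        raise ValueError("signal too short for the given pitch lags")
    # Last t_g samples of the good frame; their energy is constant over delta.
    good = [r[2 * l_frame - i] for i in range(1, t_g + 1)]
    norm_good = sum(v * v for v in good)
    # The complete dot product is computed only for the initial value delta = 0.
    norm = sum(r[l_frame - i] ** 2 for i in range(1, t_g + 1))
    best_delta, best_score = 0, -math.inf
    for delta in range(t_c + 1):
        if delta > 0:
            # Running update: one sample enters the window, one leaves it.
            norm += r[l_frame - t_g - delta] ** 2 - r[l_frame - delta] ** 2
        num = sum(good[i - 1] * r[l_frame - i - delta] for i in range(1, t_g + 1))
        denom = math.sqrt(max(norm, 0.0) * norm_good)
        score = num / denom if denom > 0.0 else 0.0
        if score > best_score:
            best_delta, best_score = delta, score
    return best_delta, best_score
```

Normalizing inside the loop keeps the score bounded even if the concealed and the good frame differ strongly in energy, which is the motivation the text gives for normalizing during the correlation.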

To let the pitch lag evolve slowly from the last pitch lag T_c (at x₀) to the new pitch lag T_g (at x₁), time-instant markers mark have to be set in between, where:

mark₀=x₀

mark_nrOfMarkers=x₁

If nrOfMarkers is lower than 1 or higher than 12, the algorithm switches to energy damping. Otherwise, if δ > 0 and T_c < T_g, or δ < 0 and T_c > T_g, where

δ=x₁-(x₀+nrOfMarkers·T_c)

and

the markers are calculated from left to right as follows:

Otherwise, the markers are constructed from right to left:

It should be noted that nrOfMarkers is the number of all markers minus 1. Expressed differently, nrOfMarkers is the number of all marker sample positions minus 1, because x₀ = mark₀ and x₁ = mark_nrOfMarkers are also marker sample positions. For example, if nrOfMarkers = 4, there are 5 marker sample positions, namely mark₀, mark₁, mark₂, mark₃ and mark₄.

To synthesize the signal, input segments are cut out by windowing and placed around the time-instant markers mark (the segments are shifted in time so that they are centered on the markers). To smooth slowly from the concealed signal shape to the good signal shape, each segment is a linear combination of two parts: the end part of the concealed frame and the end part of the good frame. These are referred to in the following as prototypes sig_first and sig_last.

The length len of the prototypes is twice the smallest marker distance minus 1, to prevent a possible increase of energy in the overlap-add synthesis operation. If the distance between two markers does not lie between T_c and T_g, this leads to problems at the borders. (Therefore, in a specific embodiment, the algorithm may, for example, abort in these cases and may, for example, switch to energy damping. Energy damping is described below.)

The prototypes, having lengths T_c and T_g, are cut out of the excitation signal r(x) such that x₀ and x₁ lie at the midpoints of sig_first and sig_last, respectively (see step 1 in Fig. 4). The prototypes are then extended circularly to reach the length len (see step 2 in Fig. 4). Then, the prototypes are windowed with a Hamming window (see step 3 in Fig. 4), to avoid artifacts in the overlap regions.

The prototype for marker i is calculated as follows (see step 4 in Fig. 4):

sig_i=(1- α) sig_first+α·sig_last

Wherein

The prototypes are then placed with their midpoints at the corresponding marker positions and added up (see step 5 in Fig. 4).
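Steps 2 to 5 of Fig. 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the weight α is assumed to grow linearly as i / nrOfMarkers (the formula for α is not reproduced in this text), and the midpoint of a prototype of length n is taken as index n // 2.

```python
import math


def cyclic_extend(proto, length):
    """Cyclically extend `proto` to `length` samples, keeping midpoints aligned."""
    n = len(proto)
    mid_src, mid_dst = n // 2, length // 2
    return [proto[(mid_src + k - mid_dst) % n] for k in range(length)]


def hamming(length):
    """Symmetric Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (length - 1))
            for k in range(length)]


def overlap_add(sig_first, sig_last, marks, out_len):
    """Blend the two prototypes per marker and add them at the marker positions."""
    nr_of_markers = len(marks) - 1
    if nr_of_markers < 1 or len(sig_first) != len(sig_last):
        raise ValueError("need at least two markers and equal-length prototypes")
    half = len(sig_first) // 2
    out = [0.0] * out_len
    for i, mark in enumerate(marks):
        alpha = i / nr_of_markers  # ASSUMPTION: linear weight from 0 to 1
        for k in range(len(sig_first)):
            pos = mark - half + k  # the prototype midpoint lands on the marker
            if 0 <= pos < out_len:
                out[pos] += (1.0 - alpha) * sig_first[k] + alpha * sig_last[k]
    return out
```

In use, each prototype would first be passed through cyclic_extend with length len and multiplied sample by sample with hamming(len) before overlap_add is called; the odd length len guarantees a unique midpoint.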

Finally, the constructed signal is filtered first with the LPC synthesis filter having filter parameters A and then with the de-emphasis filter, to return to the original signal domain.

This signal is cross-faded with the original decoded signal, to prevent artifacts at the frame borders.

Fig. 4 illustrates the generation of two prototypes according to such an embodiment.

For safety reasons, energy damping, for example as described below, should be applied to the cross-faded signal, to remove the risk of high energy increases in the recovery frame.

Regarding the above-described cutting out of the prototypes for x₀ and x₁: x₀ and x₁ are the time instants at which the two residual signals have the highest similarity. The prototypes sig_first and sig_last for x₀ and x₁ have the length len = "twice the smallest marker distance - 1". The length is therefore always odd, so that sig_first and sig_last each have a midpoint. The residual signal of length T_c (of the concealed frame) and the residual signal of length T_g (of the good frame) are now arranged such that x₀ lies at the midpoint of sig_first and x₁ lies at the midpoint of sig_last. Afterwards, these residual signals may be extended circularly to fill all samples from 1 to len of sig_first and sig_last.

In the following, the excitation overlap according to embodiments is described.

Fig. 1c shows a device 200 for improving the transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal according to another embodiment. The device of Fig. 1c realizes the excitation overlap concept.

The device 200 of Fig. 1c is a specific embodiment of the device 10 of Fig. 1a. The processor 210 of Fig. 1c is a specific embodiment of the processor 11 of Fig. 1a. The output interface 220 of Fig. 1c is a specific embodiment of the output interface 12 of Fig. 1a.

In Fig. 1c, the processor 210 may, for example, be configured to generate a first extended signal portion depending on the first sub-portion, such that the first extended signal portion differs from the first audio signal portion and has more samples than the first sub-portion.

Moreover, the processor 210 of Fig. 1c may, for example, be configured to generate the decoded audio signal portion using the first extended signal portion and using the second audio signal portion.

According to an embodiment, the processor 210 is configured to generate the decoded audio signal portion by cross-fading the first extended signal portion with the second audio signal portion, to obtain a cross-faded signal portion.

In an embodiment, the processor 210 may, for example, be configured to generate the first sub-portion from the first audio signal portion such that the length of the first sub-portion equals the pitch lag (T_c) of the first audio signal portion.

According to an embodiment, the processor 210 may, for example, be configured to generate the first extended signal portion such that the number of samples of the first extended signal portion equals the number of samples of the pitch lag of the first audio signal portion plus the number of samples of the second audio signal portion (T_c + number of samples of the second audio signal portion).

In an embodiment, the processor 210 may, for example, be configured to determine the first audio signal portion depending on the concealed audio signal portion and depending on a plurality of filter coefficients, where the plurality of filter coefficients depends on the concealed audio signal portion. Moreover, the processor 210 may, for example, be configured to determine the second audio signal portion depending on the subsequent audio signal portion and depending on the plurality of filter coefficients.

According to an embodiment, the processor 210 may, for example, comprise a filter. Moreover, the processor 210 may, for example, be configured to apply the filter with the filter coefficients to the concealed audio signal portion to obtain the first audio signal portion. Moreover, the processor 210 may, for example, be configured to apply the filter with the filter coefficients to the subsequent audio signal portion to obtain the second audio signal portion.

In an embodiment, the filter coefficients of the plurality of filter coefficients may, for example, be linear predictive coding parameters of a linear prediction filter.

According to an embodiment, the processor 210 may, for example, be configured to apply a cosine window defined by the following formula to the concealed audio signal portion, to obtain a concealed windowed signal portion:

The processor 210 may, for example, be configured to determine the plurality of filter coefficients depending on the concealed windowed signal portion, where each of x, x₁ and x₂ is a sample position of the plurality of sample positions.

Fig. 5 illustrates the excitation overlap according to such an embodiment.

A device realizing the excitation overlap cross-fades, in the excitation domain, the forward repetition of the concealed frame with the decoded signal, to smooth slowly between the two signals.

First, as in the pitch adapt overlap approach, a 16th-order LPC analysis is performed on the pre-emphasized end part of the previous frame using a Hamming-cosine window (see step 1 in Fig. 5).

The LPC filter is applied to obtain the excitation signal of the concealed frame and the excitation signal of the first good frame (see step 2 in Fig. 5).

To construct the recovery frame, the last T_c samples of the excitation of the concealed frame are repeated forward to create a full frame length (see step 3 in Fig. 5). This is used for the overlap with the first good frame.

The extended excitation is cross-faded with the excitation of the first good frame (see step 4 in Fig. 5).

Then, the LPC synthesis, whose memory holds the last pre-emphasized samples of the concealed frame, is applied to the cross-faded signal (see step 5 in Fig. 5), to smooth the transition between the concealed frame and the first good frame.

Finally, the de-emphasis filter is applied to the synthesized signal (see step 6 in Fig. 5), to return the signal to the original domain.

The newly constructed signal is cross-faded with the original decoded signal (see step 7 in Fig. 5), to prevent artifacts at the frame borders.
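Steps 2 to 5 of Fig. 5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: all function names are hypothetical, the filter coefficients a are assumed to follow the convention a[0] = 1 with analysis filter r(n) = Σ a[k]·s(n-k), and a linear cross-fade is assumed because the text does not specify the fade shape. LPC analysis, pre-emphasis and de-emphasis are omitted.

```python
def lpc_residual(s, a, history):
    """Analysis filter: r[n] = sum_k a[k] * s[n - k], with a[0] == 1.

    `history` holds the samples that precede `s` (at least len(a) - 1 of them).
    """
    p = len(a) - 1
    if len(history) < p:
        raise ValueError("need at least len(a) - 1 history samples")
    ext = list(history) + list(s)
    off = len(history)
    return [sum(a[k] * ext[off + n - k] for k in range(p + 1))
            for n in range(len(s))]


def forward_repeat(exc_concealed, t_c, length):
    """Repeat the last t_c excitation samples forward to `length` samples."""
    last_period = exc_concealed[-t_c:]
    return [last_period[n % t_c] for n in range(length)]


def cross_fade(fade_out, fade_in):
    """Linear cross-fade (ASSUMPTION: the fade shape is not given in the text)."""
    n = len(fade_out)
    if n != len(fade_in):
        raise ValueError("signals must have equal length")
    if n == 1:
        return [float(fade_in[0])]
    return [(1.0 - k / (n - 1)) * fade_out[k] + (k / (n - 1)) * fade_in[k]
            for k in range(n)]


def lpc_synthesis(r, a, memory):
    """Synthesis filter 1/A(z); `memory` holds the preceding output samples."""
    p = len(a) - 1
    if len(memory) < p:
        raise ValueError("need at least len(a) - 1 memory samples")
    out = list(memory)
    off = len(out)
    for n in range(len(r)):
        out.append(r[n] - sum(a[k] * out[off + n - k] for k in range(1, p + 1)))
    return out[off:]
```

A recovery frame would then be obtained as lpc_synthesis(cross_fade(forward_repeat(exc_concealed, T_c, L_frame), exc_good), a, memory), with memory set to the last pre-emphasized samples of the concealed frame.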

In the following, energy damping according to embodiments is described.

Fig. 1d illustrates an embodiment in which the first audio signal portion is the concealed audio signal portion and the second audio signal portion is the subsequent audio signal portion.

The device 300 of Fig. 1d is a specific embodiment of the device 10 of Fig. 1a. The processor 310 of Fig. 1d is a specific embodiment of the processor 11 of Fig. 1a. The output interface 320 of Fig. 1d is a specific embodiment of the output interface 12 of Fig. 1a.

The processor 310 of Fig. 1d may, for example, be configured to determine a first sub-portion of the concealed audio signal portion (being the first sub-portion of the first audio signal portion), such that the first sub-portion comprises one or more samples of the concealed audio signal portion but fewer samples than the concealed audio signal portion, and such that each sample position of the samples of the first sub-portion succeeds every sample position of any sample of the concealed audio signal portion that is not comprised by the first sub-portion.

Moreover, the processor 310 of Fig. 1d may, for example, be configured to determine a third sub-portion of the subsequent audio signal portion, such that the third sub-portion comprises one or more samples of the subsequent audio signal portion but fewer samples than the subsequent audio signal portion, and such that each sample position of the samples of the third sub-portion succeeds every sample position of any sample of the subsequent audio signal portion that is not comprised by the third sub-portion.

Moreover, the processor 310 of Fig. 1d may, for example, be configured to determine a second sub-portion of the subsequent audio signal portion (being the second sub-portion of the second audio signal portion), such that every sample of the subsequent audio signal portion that is not comprised by the third sub-portion is comprised by the second sub-portion.

In embodiments according to Fig. 1d, the processor 310 may, for example, be configured to determine a first peak sample among the samples of the first sub-portion of the concealed audio signal portion, such that the sample value of the first peak sample is greater than or equal to any other sample value of any other sample of the first sub-portion of the concealed audio signal portion. The processor 310 of Fig. 1d may, for example, be configured to determine a second peak sample among the samples of the second sub-portion of the subsequent audio signal portion, such that the sample value of the second peak sample is greater than or equal to any other sample value of any other sample of the second sub-portion of the subsequent audio signal portion. Moreover, the processor 310 of Fig. 1d may, for example, be configured to determine a third peak sample among the samples of the third sub-portion of the subsequent audio signal portion, such that the sample value of the third peak sample is greater than or equal to any other sample value of any other sample of the third sub-portion of the subsequent audio signal portion.

If a condition is fulfilled, the processor 310 of Fig. 1d may, for example, be configured to modify each sample value of each sample of the subsequent audio signal portion that precedes the second peak sample, to generate the decoded audio signal portion.

The condition may, for example, be that the sample value of the second peak sample is greater than the sample value of the first peak sample and that the sample value of the second peak sample is greater than the sample value of the third peak sample.

Alternatively, the condition may, for example, be that a first ratio between the sample value of the second peak sample and the sample value of the first peak sample is greater than a first threshold and that a second ratio between the sample value of the second peak sample and the sample value of the third peak sample is greater than a second threshold.

According to an embodiment, the condition may, for example, be that the sample value of the second peak sample is greater than the sample value of the first peak sample and that the sample value of the second peak sample is greater than the sample value of the third peak sample.

In an embodiment, the condition may, for example, be that the first ratio is greater than the first threshold and that the second ratio is greater than the second threshold.

According to an embodiment, the first threshold may, for example, be greater than 1.1, and the second threshold may, for example, be greater than 1.1.

In an embodiment, the first threshold may, for example, be equal to the second threshold.

According to an embodiment, if the condition is fulfilled, the processor 310 may, for example, be configured to modify each sample value of each sample of the subsequent audio signal portion that precedes the second peak sample according to the following formula:

s_modified(L_frame + i) = s(L_frame + i) · α_i

where L_frame indicates the sample position of that sample of the subsequent audio signal portion which precedes every other sample of the subsequent audio signal portion,

where L_frame + i is an integer indicating the sample position of the (i+1)-th sample of the subsequent audio signal portion,

where 0 ≤ i ≤ I_max - 1, where I_max - 1 indicates the sample position of the second peak sample,

where s(L_frame + i) is the sample value of the (i+1)-th sample of the subsequent audio signal portion before being modified by the processor 310,

where s_modified(L_frame + i) is the sample value of the (i+1)-th sample of the subsequent audio signal portion after being modified by the processor 310,

where 0 < α_i < 1.

In an embodiment,

where E_cmax is the sample value of the first peak sample, where E_max is the sample value of the second peak sample, and where E_gmax is the sample value of the third peak sample.

According to an embodiment, if the condition is fulfilled, the processor 310 may, for example, be configured to modify, according to the following formula, the sample value of each of two or more samples of the plurality of samples of the subsequent audio signal portion that succeed the second peak sample, to generate the decoded audio signal portion:

s_modified(I_max + k) = s(I_max + k) · α_i

where I_max + k is an integer indicating the sample position of the (I_max + k + 1)-th sample of the subsequent audio signal portion.

Fig. 6 is a further illustration of a concealed frame and a good frame according to an embodiment. In particular, Fig. 6 illustrates the concealed audio signal portion, the subsequent audio signal portion, the first sub-portion, the second sub-portion and the third sub-portion.

Energy damping is used to remove high energy increases in the overlapping part of the signal between the last concealed frame and the first good frame. This is done by slowly damping the signal region towards the peak amplitude value.

A method according to an embodiment may, for example, be realized as follows:

The maximum amplitude values are found in:

ο the last T_c samples of the previous concealed frame: E_cmax

ο the last T_g samples of the first good frame: E_gmax

ο and the samples between these regions: E_max

E_cmax is the first peak sample, E_max is the second peak sample, and E_gmax is the third peak sample.

If E_cmax < E_max > E_gmax, the decoded signal in the first good frame is damped.

In other embodiments, the first good frame is damped if the following condition is fulfilled:

For example, 1.1 < thresholdValue1 < 4 and 1.1 < thresholdValue2 < 4.
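The peak search and the damping condition above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name peak_check is hypothetical, the amplitude is taken as the absolute sample value, and the ratio condition is written as E_max > threshold · E_cmax (and likewise for E_gmax) to avoid a division by zero; the damping gains themselves are not part of this sketch because their formulas are not reproduced in this text.

```python
def peak_check(concealed, good, t_c, t_g, threshold1=None, threshold2=None):
    """Return (damp, i_max): whether to damp, and the index of E_max in `good`.

    ASSUMPTION: peaks are measured as absolute sample values.
    """
    if not (0 < t_c <= len(concealed) and 0 < t_g < len(good)):
        raise ValueError("pitch lags do not fit the frames")
    e_cmax = max(abs(v) for v in concealed[-t_c:])  # last T_c concealed samples
    e_gmax = max(abs(v) for v in good[-t_g:])       # last T_g good samples
    between = [abs(v) for v in good[:len(good) - t_g]]  # samples in between
    e_max = max(between)
    i_max = between.index(e_max)
    if threshold1 is None or threshold2 is None:
        # Plain comparison: E_cmax < E_max > E_gmax
        damp = e_cmax < e_max > e_gmax
    else:
        # Ratio form, multiplied out so that zero peaks need no special case
        damp = e_max > threshold1 * e_cmax and e_max > threshold2 * e_gmax
    return damp, i_max
```

The returned index i_max splits the good frame into the part before the peak and the part after it, which are the two parts that the subsequent damping treats separately.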

The first part of the decoded signal is damped as follows:

where I_max is the index of E_max, and

the second part is damped as follows:

where

In a preferred embodiment, for safety reasons, energy damping may, for example, be applied to the cross-faded signal, to remove the risk of high energy increases in the recovery frame.

Now, combinations of the different improved transition concepts according to embodiments are provided.

Fig. 7a shows a system for improving the transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal according to an embodiment.

The system comprises a switching module 701, the device 300 for realizing energy damping as described above with reference to Fig. 1d, and the device 100 for realizing the pitch adapt overlap as described above with reference to Fig. 1b.

The switching module 701 is configured to select, depending on the concealed audio signal portion and depending on the subsequent audio signal portion, one of the device 300 for realizing energy damping and the device 100 for realizing the pitch adapt overlap, for generating the decoded audio signal portion.

Fig. 7b shows a system for improving the transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal according to another embodiment.

The system comprises a switching module 702, the device 300 for realizing energy damping as described above with reference to Fig. 1d, and the device 200 for realizing the excitation overlap as described above with reference to Fig. 1c.

The switching module 702 is configured to select, depending on the concealed audio signal portion and depending on the subsequent audio signal portion, one of the device 300 for realizing energy damping and the device 200 for realizing the excitation overlap, for generating the decoded audio signal portion.

Fig. 7c shows a system for improving the transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal according to another embodiment.

The system comprises a switching module 703, the device 100 for realizing the pitch adapt overlap as described above with reference to Fig. 1b, and the device 200 for realizing the excitation overlap as described above with reference to Fig. 1c.

The switching module 703 is configured to select, depending on the concealed audio signal portion and depending on the subsequent audio signal portion, one of the device 100 for realizing the pitch adapt overlap and the device 200 for realizing the excitation overlap, for generating the decoded audio signal portion.

Fig. 7d shows a system for improving the transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal according to yet another embodiment.

The system comprises a switching module 704, the device 300 for realizing energy damping as described above with reference to Fig. 1d, the device 100 for realizing the pitch adapt overlap as described above with reference to Fig. 1b, and the device 200 for realizing the excitation overlap as described above with reference to Fig. 1c.

The switching module 704 is configured to select, depending on the concealed audio signal portion and depending on the subsequent audio signal portion, one of the device 300 for realizing energy damping, the device 100 for realizing the pitch adapt overlap and the device 200 for realizing the excitation overlap, for generating the decoded audio signal portion.

According to an embodiment, the switching module 704 may, for example, be configured to determine whether at least one of a concealed audio signal frame and a subsequent audio signal frame comprises speech. Moreover, the switching module 704 may, for example, be configured to select the device 300 for realizing energy damping for generating the decoded audio signal portion if the concealed audio signal frame and the subsequent audio signal frame do not comprise speech.

In an embodiment, the switching module 704 may, for example, be configured to select one of the device 100 for realizing the pitch adapt overlap, the device 200 for realizing the excitation overlap and the device 300 for realizing energy damping, for generating the decoded audio signal portion, depending on the frame length of the subsequent audio signal frame and depending on at least one of the pitch of the concealed audio signal portion and the pitch of the subsequent audio signal portion, where the subsequent audio signal portion is an audio signal portion of the subsequent audio signal frame.

Fig. 7e shows a system for improving the transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal according to another embodiment.

As in Fig. 7c, the system of Fig. 7e comprises a switching module 703, the device 100 for realizing the pitch adapt overlap as described above with reference to Fig. 1b, and the device 200 for realizing the excitation overlap as described above with reference to Fig. 1c.

The switching module 703 is configured to select, depending on the concealed audio signal portion and depending on the subsequent audio signal portion, one of the device 100 for realizing the pitch adapt overlap and the device 200 for realizing the excitation overlap, for generating the decoded audio signal portion.

In addition, the system of Fig. 7e further comprises the device 300 for realizing energy damping as described above with reference to Fig. 1d.

The switching module 703 of Fig. 7e may, for example, be configured to select, depending on the concealed audio signal portion and depending on the subsequent audio signal portion, said one of the device 100 for realizing the pitch adapt overlap and the device 200 for realizing the excitation overlap, to generate an intermediate audio signal portion.

In the embodiment of Fig. 7e, the device 300 for realizing energy damping may, for example, be configured to process the intermediate audio signal portion to generate the decoded audio signal portion.

Now, specific embodiments are described. In particular, concepts for specific implementations of the switching modules 701, 702, 703 and 704 are provided.

For example, a first embodiment providing a combination of the different improved transition concepts may, for example, be employed for any transform-domain codec:

The first step is to detect whether the signal is speech-like with a prominent pitch (e.g., clean speech items, speech with background noise or speech over music).

If the signal is speech-like:

ο Find the pitch T_c in the last concealed frame

ο Find the pitch T_g in the first good frame

ο If there is an energy increase in the overlapping part with the last concealed frame:

■ If the pitch of the good frame differs by more than three samples from the concealed pitch

→ execute the recovery filter

■ Otherwise

→ execute energy damping

Otherwise

→ execute energy damping

If the recovery filter was selected above:

If the concealed pitch T_c or the good pitch T_g is higher than the frame length L_frame

→ execute energy damping

Otherwise, if the concealed pitch or the good pitch is higher than half the frame length and the normalized cross-correlation value xCorr is smaller than a threshold

→ execute the excitation overlap

Otherwise, if the concealed pitch or the good pitch is lower than half the frame length

→ execute the pitch adapt overlap
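The decision steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name select_transition and its parameters are hypothetical, the threshold for xCorr is passed in by the caller because its value is not given in the text, and the final fallback to energy damping covers a case the text leaves open (both pitches above half the frame length while xCorr is not below the threshold).

```python
ENERGY_DAMPING = "energy damping"
EXCITATION_OVERLAP = "excitation overlap"
PITCH_ADAPT_OVERLAP = "pitch adapt overlap"


def select_transition(is_speech_like, energy_increase, t_c, t_g,
                      l_frame, x_corr, x_corr_threshold):
    """Pick a transition method following the decision steps of the text."""
    # Not speech-like, or no energy increase in the overlapping part
    if not is_speech_like or not energy_increase:
        return ENERGY_DAMPING
    # The recovery filter is entered only if the pitches differ by more than 3
    if abs(t_g - t_c) <= 3:
        return ENERGY_DAMPING
    # Recovery filter: second decision stage
    if t_c > l_frame or t_g > l_frame:
        return ENERGY_DAMPING
    half = l_frame / 2
    if (t_c > half or t_g > half) and x_corr < x_corr_threshold:
        return EXCITATION_OVERLAP
    if t_c < half or t_g < half:
        return PITCH_ADAPT_OVERLAP
    # ASSUMPTION: case not covered by the text; fall back to energy damping
    return ENERGY_DAMPING
```

For example, with a frame length of 480 samples, pitches of 300 and 100 samples and xCorr below the threshold, the sketch selects the excitation overlap; with xCorr at or above the threshold it selects the pitch adapt overlap, because one of the pitches is below half the frame length.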

For example, it is first tested whether speech is present in the concealed frame (whether speech is present may, for example, be known from the concealment technique). Afterwards, it may, for example, also be tested whether speech is present in the good frame, for example using the normalized cross-correlation value xCorr.

The above-mentioned overlapping part may, for example, be the second sub-portion illustrated in Fig. 6, which means that the overlapping part is the good frame from the first sample up to the sample "frame length minus T_g".

Now, a second embodiment providing a combination of the different improved transition concepts is described. Such a second embodiment may, for example, be employed for the AAC-ELD codec, where the two frame error concealment methods are a time-domain method and a frequency-domain method.

The time-domain method synthesizes the lost frame using pitch extrapolation and is referred to as TD PLC (see [8]).

The frequency-domain method is the state-of-the-art concealment method for the AAC-ELD codec, referred to as noise substitution (NS), which uses a sign-scrambled copy of the previous good frame.

In the second embodiment, a first division is made depending on the last concealment method:

If the last frame was concealed using TD PLC:

ο Find the pitch in the first good frame

ο If there is an energy increase in the overlapping part with the last concealed frame

→ execute the recovery filter

■ Otherwise

→ execute energy damping

If the last frame was concealed using NS

→ execute energy damping

Moreover, in the second embodiment, the following second division is made within the recovery filter:

If the concealed pitch T_c (the pitch in the last concealed frame) or the good pitch T_g (the pitch in the first good frame) is higher than the frame length L_frame

→ execute energy damping

If the concealed pitch or the good pitch is higher than half the frame length and the normalized cross-correlation value xCorr is smaller than a threshold

→ execute the excitation overlap

If the concealed pitch or the good pitch is lower than half the frame length

→ execute the pitch adapt overlap.

Multiple embodiments have been provided.

According to an embodiment, a filter is provided for improving the transition between a concealed lost frame of a transform-domain coded signal and one or more frames of the transform-domain coded signal that succeed the concealed lost frame.

In embodiments, the filter may, for example, further be configured as described above.

According to an embodiment, a transform-domain decoder comprising a filter according to one of the above-described embodiments is provided.

Moreover, a method performed by a transform-domain decoder as described above is provided.

Moreover, a computer program for performing a method as described above is provided.

Although some aspects have been described in the context of a device, it is clear that these aspects also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block, item or feature of a corresponding device. Some or all of the method steps may be executed by (or using) a hardware device, for example a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such a device.

Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software, or at least partially in hardware or at least partially in software. The implementation can be performed using a digital storage medium (e.g., a floppy disk, a DVD, a Blu-ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a flash memory) having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer-readable.

Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system such that one of the methods described herein is performed.

Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.

Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier.

In other words, an embodiment of the inventive method is therefore a computer program having a program code for performing one of the methods described herein when the computer program runs on a computer.

A further embodiment of the inventive method is therefore a data carrier (or a digital storage medium, or a computer-readable medium) having recorded thereon the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium are typically tangible and/or non-transitory.

A further embodiment of the inventive method is therefore a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.

A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured or adapted to perform one of the methods described herein.

A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.

A further embodiment according to the invention comprises a device or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The device or system may, for example, comprise a file server for transferring the computer program to the receiver.

In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware device.

The device described herein may be implemented using a hardware device, or using a computer, or using a combination of a hardware device and a computer.

The methods described herein may be performed using a hardware device, or using a computer, or using a combination of a hardware device and a computer.

The above-described embodiments are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and the details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.

Bibliography:

[1] Philippe Gournay, "Improved Frame Loss Recovery Using Closed-Loop Estimation of Very Low Bit Rate Side Information", Interspeech 2008, Brisbane, Australia, 22-26 September 2008.

[2] Mohamed Chibani, Roch Lefebvre, Philippe Gournay, "Resynchronization of the Adaptive Codebook in a Constrained CELP Codec after a frame erasure", 2006 International Conference on Acoustics, Speech and Signal Processing (ICASSP 2006), Toulouse, France, March 14-19, 2006.

[3] S.-U. Ryu, E. Choy, and K. Rose, "Encoder assisted frame loss concealment for MPEG-AAC decoder", ICASSP IEEE Int. Conf. Acoust. Speech Signal Process. Proc., vol. 5, pp. 169-172, May 2006.

[4] ISO/IEC 14496-3:2005/Amd 9:2008: Enhanced low delay AAC, available at: http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=46457

[5] J. Lecomte, et al., "Enhanced time domain packet loss concealment in switched speech/audio codec", submitted to IEEE ICASSP, Brisbane, Australia, Apr. 2015.

[6] E. Moulines and J. Laroche, "Non-parametric techniques for pitch-scale and time-scale modification of speech", Speech Communication, vol. 16, pp. 175-205, 1995.

[7] European Patent EP 363233 B1: "Method and apparatus for speech synthesis by wave form overlapping and adding".

[8] International Patent Application WO 2015063045 A1: "Audio Decoder and Method for Providing a Decoded Audio Information using an Error Concealment Modifying a Time Domain Excitation Signal".

[9] Schnell, M.; Schmidt, M.; Jander, M.; Albert, T.; Geiger, R.; Ruoppila, V.; Ekstrand, P.; Grill, B., "MPEG-4 enhanced low delay AAC - a new standard for high quality communication", Audio Engineering Society: 125th Audio Engineering Society Convention 2008; October 2-5, 2008, San Francisco, USA.

Claims

1. A device (10; 100; 200; 300) for improving a transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal, wherein the device (10; 100; 200; 300) comprises:

a processor (11; 110; 210; 310) configured to generate a decoded audio signal portion of the audio signal depending on a first audio signal portion and depending on a second audio signal portion, wherein the first audio signal portion depends on the concealed audio signal portion, and wherein the second audio signal portion depends on the subsequent audio signal portion, and

an output interface (12; 120; 220; 320) for outputting the decoded audio signal portion,

wherein each of the first audio signal portion, the second audio signal portion and the decoded audio signal portion comprises a plurality of samples, wherein each sample of the plurality of samples of the first audio signal portion, of the second audio signal portion and of the decoded audio signal portion is defined by a sample position of a plurality of sample positions and by a sample value, wherein the plurality of sample positions is ordered such that, for each pair of a first sample position of the plurality of sample positions and a second sample position of the plurality of sample positions that differs from the first sample position, the first sample position is either a successor or a predecessor of the second sample position,

wherein the processor (11; 110; 210; 310) is configured to determine a first sub-portion of the first audio signal portion such that the first sub-portion comprises fewer samples than the first audio signal portion, and

wherein the processor (11; 110; 210; 310) is configured to generate the decoded audio signal portion using the first sub-portion of the first audio signal portion and using the second audio signal portion or a second sub-portion of the second audio signal portion, such that, for each sample of two or more samples of the second audio signal portion, the sample position of said sample is equal to the sample position of one of the samples of the decoded audio signal portion, and the sample value of said sample differs from the sample value of said one of the samples of the decoded audio signal portion.

2. The device (100) according to claim 1,

wherein the processor (110) is configured to determine a second prototype signal portion, being the second sub-portion of the second audio signal portion, such that the second sub-portion comprises fewer samples than the second audio signal portion, and

wherein the processor (110) is configured to determine one or more intermediate prototype signal portions by determining each of the one or more intermediate prototype signal portions by combining a first prototype signal portion, being the first sub-portion, with the second prototype signal portion,

wherein the processor (110) is configured to generate the decoded audio signal portion using the first prototype signal portion, using the one or more intermediate prototype signal portions, and using the second prototype signal portion.

3. The device (100) according to claim 2, wherein the processor (110) is configured to generate the decoded audio signal portion by combining the first prototype signal portion, the one or more intermediate prototype signal portions, and the second prototype signal portion.

4. The device (100) according to claim 2 or 3,

wherein the processor (110) is configured to determine three or more marker sample positions, wherein each of the three or more marker sample positions is a sample position of at least one of the first audio signal portion and the second audio signal portion,

wherein the processor (110) is configured to select, as the final sample position of the three or more marker sample positions, the sample position of the sample of the second audio signal portion that is a successor of the sample position of every other sample of the second audio signal portion,

wherein the processor (110) is configured to determine the start sample position of the three or more marker sample positions by selecting a sample position from the first audio signal portion depending on a correlation between the first sub-portion of the first audio signal portion and the second sub-portion of the second audio signal portion,

wherein the processor (110) is configured to determine one or more intermediate sample positions of the three or more marker sample positions depending on the start sample position and depending on the final sample position of the three or more marker sample positions, and

wherein the processor (110) is configured to determine the one or more intermediate prototype signal portions by determining, for each of the one or more intermediate sample positions, an intermediate prototype signal portion of the one or more intermediate prototype signal portions by combining the first prototype signal portion and the second prototype signal portion depending on said intermediate sample position.

5. The device (100) according to claim 4,

wherein the processor (110) is configured to determine the one or more intermediate prototype signal portions by determining, for each of the one or more intermediate sample positions, an intermediate prototype signal portion of the one or more intermediate prototype signal portions by combining the first prototype signal portion and the second prototype signal portion according to the following formula:

$sig_i = (1 - \alpha) \cdot sig_{first} + \alpha \cdot sig_{last}$

wherein $\alpha = \frac{i}{nrOfMarkers}$,

wherein $i$ is an integer with $i \ge 1$,

wherein $nrOfMarkers$ is the number of the three or more marker sample positions minus 1,

wherein $sig_i$ is the $i$-th intermediate prototype signal portion of the one or more intermediate prototype signal portions,

wherein $sig_{first}$ is the first prototype signal portion,

wherein $sig_{last}$ is the second prototype signal portion.
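The blending of claim 5 can be illustrated with a minimal Python sketch. It assumes that both prototypes have the same length and that the weight is α = i / nrOfMarkers; the function name `intermediate_prototypes` and the list-based signal representation are illustrative choices, not part of the claims.

```python
def intermediate_prototypes(sig_first, sig_last, nr_of_markers):
    """Blend two equal-length prototypes into nr_of_markers - 1 intermediate ones.

    nr_of_markers is the number of marker sample positions minus 1, so the
    intermediate markers are those with index i = 1 .. nr_of_markers - 1.
    """
    if len(sig_first) != len(sig_last):
        raise ValueError("prototypes must have the same length")
    prototypes = []
    for i in range(1, nr_of_markers):
        alpha = i / nr_of_markers  # assumed weight: grows linearly with i
        prototypes.append(
            [(1.0 - alpha) * a + alpha * b for a, b in zip(sig_first, sig_last)]
        )
    return prototypes
```

With four marker intervals, the three intermediate prototypes move in equal steps from the first prototype towards the second.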

6. The device (100) according to claim 4 or 5,

wherein the processor (110) is configured to determine the one or more intermediate sample positions of the three or more marker sample positions according to either of the following formulas:

or

wherein

wherein $\delta = x_1 - (x_0 + nrOfMarkers \cdot T_c)$,

wherein

wherein $i$ is an integer with $i \ge 1$,

wherein $mark_i$ is the $i$-th intermediate sample position of the three or more marker sample positions,

wherein $mark_{i-1}$ is the $(i-1)$-th intermediate sample position of the three or more marker sample positions,

wherein $mark_{i+1}$ is the $(i+1)$-th intermediate sample position of the three or more marker sample positions,

wherein $x_0$ is the start sample position of the three or more marker sample positions,

wherein $x_1$ is the final sample position of the three or more marker sample positions,

wherein $T_c$ indicates a pitch lag.
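One way to place the markers is sketched below in Python. The formulas of claim 6 are not reproduced in this text, so the sketch rests on assumptions: the number of intervals is taken as round((x1 - x0) / T_c), the residual δ = x1 - (x0 + nrOfMarkers·T_c) is spread evenly over the intervals, and each marker follows its predecessor by T_c + δ / nrOfMarkers. The function name `marker_positions` is hypothetical.

```python
def marker_positions(x0, x1, pitch_lag):
    """Return marker sample positions from x0 to x1, roughly one pitch lag apart."""
    # Assumed: number of intervals is the distance divided by the pitch lag, rounded.
    nr_of_markers = max(1, round((x1 - x0) / pitch_lag))
    # Residual left over after nr_of_markers whole pitch lags (as given in claim 6).
    delta = x1 - (x0 + nr_of_markers * pitch_lag)
    # Assumed: the residual is distributed evenly over all intervals.
    step = pitch_lag + delta / nr_of_markers
    marks = [float(x0)]
    for _ in range(1, nr_of_markers):
        marks.append(marks[-1] + step)  # each marker follows its predecessor
    marks.append(float(x1))
    return [round(m) for m in marks]
```

If the distance between x0 and x1 is shorter than about 1.5 pitch lags, the sketch returns only the two end markers, which falls outside the "three or more marker sample positions" of claim 4.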

7. The device (100) according to any one of claims 4 to 6,

wherein the processor (110) is configured to select, as the first prototype signal portion, a sub-portion from a plurality of sub-portion candidates of the first audio signal portion depending on a plurality of correlations, each being a correlation of one of the plurality of sub-portion candidates with the second sub-portion of the second audio signal portion,

wherein the processor (110) is configured to select, as the start sample position of the three or more marker sample positions, the sample position of the sample of the first prototype signal portion that is a predecessor of the sample position of every other sample of the first prototype signal portion.

8. The device (100) according to claim 7, wherein the processor (110) is configured to select, as the first prototype signal portion, the sub-portion candidate whose correlation with the second sub-portion has the highest correlation value among the plurality of correlations.

9. The device (100) according to claim 7 or 8,

wherein the processor (110) is configured to determine the correlation value of each correlation of the plurality of correlations according to the following formula:

wherein $L_{frame}$ indicates the number of samples of the second audio signal portion, which is equal to the number of samples of the first audio signal portion,

wherein $r(2L_{frame} - i)$ indicates the sample value of the sample of the second audio signal portion at sample position $2L_{frame} - i$,

wherein $r(L_{frame} - i - \Delta)$ indicates the sample value of the sample of the first audio signal portion at sample position $L_{frame} - i - \Delta$,

wherein, for each correlation of the plurality of correlations of a sub-portion candidate of the plurality of sub-portion candidates with the second sub-portion, $\Delta$ indicates a number that depends on said sub-portion candidate.
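The candidate search of claims 7 to 9 can be sketched in Python as follows. The correlation formula itself is not reproduced in this text, so a normalized cross-correlation is assumed here. The buffer `r` is assumed to hold the first audio signal portion at positions 0 to L_frame - 1 and the second at positions L_frame to 2·L_frame - 1. The summation range `length` and the function names are illustrative.

```python
import math


def normalized_correlation(r, l_frame, lag, length):
    """Correlate the end of the second portion with a lagged window of the first."""
    num = energy_second = energy_first = 0.0
    for i in range(1, length + 1):
        a = r[2 * l_frame - i]      # sample of the second audio signal portion
        b = r[l_frame - i - lag]    # sample of the first audio signal portion
        num += a * b
        energy_second += a * a
        energy_first += b * b
    denom = math.sqrt(energy_second * energy_first)
    return num / denom if denom > 0.0 else 0.0


def best_lag(r, l_frame, lags, length):
    """Pick the lag (candidate sub-portion) with the highest correlation value."""
    return max(lags, key=lambda d: normalized_correlation(r, l_frame, d, length))
```

The caller must keep `l_frame - length - lag` non-negative for every tested lag, otherwise the window would reach before the start of the buffer.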

10. The device (100) according to any one of claims 4 to 9,

wherein the processor (110) is configured to determine the first audio signal portion depending on the concealed audio signal portion and depending on a plurality of third filter coefficients, wherein the plurality of third filter coefficients depends on the concealed audio signal portion and on the subsequent audio signal portion, and

wherein the processor (110) is configured to determine the second audio signal portion depending on the subsequent audio signal portion and on the plurality of third filter coefficients.

11. The device (100) according to claim 10,

wherein the processor (110) comprises a filter,

wherein the processor (110) is configured to apply the filter with the third filter coefficients to the concealed audio signal portion to obtain the first audio signal portion, and

wherein the processor (110) is configured to apply the filter with the third filter coefficients to the subsequent audio signal portion to obtain the second audio signal portion.

12. The device (100) according to claim 10 or 11,

wherein the processor (110) is configured to determine a plurality of first filter coefficients depending on the concealed audio signal portion,

wherein the processor (110) is configured to determine a plurality of second filter coefficients depending on the subsequent audio signal portion,

wherein the processor (110) is configured to determine each of the third filter coefficients depending on a combination of one or more of the first filter coefficients and one or more of the second filter coefficients.

13. The device (100) according to claim 12, wherein the filter coefficients of the plurality of first filter coefficients, of the plurality of second filter coefficients and of the plurality of third filter coefficients are linear predictive coding parameters of a linear predictive filter.

14. The device (100) according to claim 12 or 13,

wherein the processor (110) is configured to determine each filter coefficient of the third filter coefficients according to the following formula:

$A = 0.5 \cdot A_{conc} + 0.5 \cdot A_{good}$

wherein $A$ indicates the filter coefficient value of said filter coefficient,

wherein $A_{conc}$ indicates the coefficient value of a filter coefficient of the plurality of first filter coefficients, and

wherein $A_{good}$ indicates the coefficient value of a filter coefficient of the plurality of second filter coefficients.
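The coefficient averaging of claim 14 amounts to an element-wise mean of two coefficient sets. A minimal Python sketch follows, assuming both sets have the same order and are paired index by index; the function name `combine_filter_coefficients` is illustrative.

```python
def combine_filter_coefficients(a_conc, a_good):
    """Average the coefficients of the concealed and the good frame pairwise."""
    if len(a_conc) != len(a_good):
        raise ValueError("both coefficient sets must have the same order")
    # A = 0.5 * A_conc + 0.5 * A_good for each coefficient index
    return [0.5 * c + 0.5 * g for c, g in zip(a_conc, a_good)]
```

Filtering both the concealed and the subsequent portion with the same averaged coefficients (claim 11) places both signals in a common domain before they are compared or overlapped.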

15. The device (100) according to any one of claims 12 to 14,

wherein the processor (110) is configured to apply a cosine window, defined by the following formula, to the concealed audio signal portion to obtain a concealed windowed signal portion:

wherein the processor (110) is configured to apply the cosine window to the subsequent audio signal portion to obtain a subsequent windowed signal portion,

wherein the processor (110) is configured to determine the plurality of first filter coefficients depending on the concealed windowed signal portion,

wherein the processor (110) is configured to determine the plurality of second filter coefficients depending on the subsequent windowed signal portion, and

wherein each of $x$, $x_1$ and $x_2$ is a sample position of the plurality of sample positions.

16. The device (200) according to claim 1,

wherein the processor (210) is configured to generate a first extended signal portion depending on the first sub-portion, such that the first extended signal portion differs from the first audio signal portion and such that the first extended signal portion has more samples than the first sub-portion,

wherein the processor (210) is configured to generate the decoded audio signal portion using the first extended signal portion and using the second audio signal portion.

17. The device (200) according to claim 16, wherein the processor (210) is configured to generate the decoded audio signal portion by cross-fading the first extended signal portion with the second audio signal portion to obtain a cross-faded signal portion.

18. The device (200) according to claim 16 or 17, wherein the processor (210) is configured to generate the first sub-portion from the first audio signal portion such that the length of the first sub-portion is equal to the pitch lag of the first audio signal portion.

19. The device (200) according to claim 18, wherein the processor (210) is configured to generate the first extended signal portion such that the number of samples of the first extended signal portion is equal to the number of samples of the pitch lag of the first audio signal portion plus the number of samples of the second audio signal portion.
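The extension and cross-fade of claims 16 to 19 can be sketched in Python under several assumptions that the claims do not fix: the first sub-portion is taken as the last pitch cycle of the first audio signal portion, it is extended by periodic repetition, and the part of the extension that overlaps the second audio signal portion is faded out linearly while the second portion is faded in. All function names are illustrative.

```python
def extend_periodically(sub_portion, total_length):
    """Repeat one pitch cycle until total_length samples are available."""
    return [sub_portion[n % len(sub_portion)] for n in range(total_length)]


def excitation_overlap(first_portion, second_portion, pitch_lag):
    """Cross-fade a periodic extension of the last pitch cycle into the second portion."""
    if not 1 <= pitch_lag <= len(first_portion):
        raise ValueError("pitch_lag must lie within the first portion")
    sub = first_portion[-pitch_lag:]              # assumed: last pitch cycle
    n = len(second_portion)
    extended = extend_periodically(sub, pitch_lag + n)  # length as in claim 19
    tail = extended[pitch_lag:]                   # part overlapping the second portion
    out = []
    for k, (old, new) in enumerate(zip(tail, second_portion)):
        w = (k + 1) / n                           # assumed linear fade-in weight
        out.append((1.0 - w) * old + w * new)
    return out
```

The output has the same sample positions as the second audio signal portion but different sample values wherever the fade weight is below 1, which matches the requirement at the end of claim 1.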

20. The device (200) according to any one of claims 16 to 19,

wherein the processor (210) is configured to determine the first audio signal portion depending on the concealed audio signal portion and depending on a plurality of filter coefficients, wherein the plurality of filter coefficients depends on the concealed audio signal portion, and

wherein the processor (210) is configured to determine the second audio signal portion depending on the subsequent audio signal portion and on the plurality of filter coefficients.

21. The device (200) according to claim 20,

wherein the processor (210) comprises a filter,

wherein the processor (210) is configured to apply the filter with the filter coefficients to the concealed audio signal portion to obtain the first audio signal portion, and

wherein the processor (210) is configured to apply the filter with the filter coefficients to the subsequent audio signal portion to obtain the second audio signal portion.

22. The device (200) according to claim 21, wherein the filter coefficients of the plurality of filter coefficients are linear predictive coding parameters of a linear predictive filter.

23. The device (200) according to any one of claims 20 to 22,

wherein the processor (210) is configured to apply a cosine window, defined by the following formula, to the concealed audio signal portion to obtain a concealed windowed signal portion:

wherein the processor (210) is configured to determine the plurality of filter coefficients depending on the concealed windowed signal portion.

24. The device (300) according to claim 1,

wherein the first audio signal portion is the concealed audio signal portion, and wherein the second audio signal portion is the subsequent audio signal portion,

wherein the processor (310) is configured to determine a first sub-portion of the concealed audio signal portion, being the first sub-portion of the first audio signal portion, such that the first sub-portion comprises one or more samples of the concealed audio signal portion but fewer samples than the concealed audio signal portion, and such that each sample position of the samples of the first sub-portion is a successor of any sample position of any sample of the concealed audio signal portion that is not comprised by the first sub-portion,

wherein the processor (310) is configured to determine a third sub-portion of the subsequent audio signal portion such that the third sub-portion comprises one or more samples of the subsequent audio signal portion but fewer samples than the subsequent audio signal portion, and such that each sample position of each sample of the third sub-portion is a successor of any sample position of any sample of the subsequent audio signal portion that is not comprised by the third sub-portion,

wherein the processor (310) is configured to determine a second sub-portion of the subsequent audio signal portion, being the second sub-portion of the second audio signal portion, such that any sample of the subsequent audio signal portion that is not comprised by the third sub-portion is comprised by the second sub-portion,

wherein the processor (310) is configured to determine a first peak sample from among the samples of the first sub-portion of the concealed audio signal portion such that the sample value of the first peak sample is greater than or equal to any other sample value of any other sample of the first sub-portion of the concealed audio signal portion, wherein the processor (310) is configured to determine a second peak sample from among the samples of the second sub-portion of the subsequent audio signal portion such that the sample value of the second peak sample is greater than or equal to any other sample value of any other sample of the second sub-portion of the subsequent audio signal portion, and wherein the processor (310) is configured to determine a third peak sample from among the samples of the third sub-portion of the subsequent audio signal portion such that the sample value of the third peak sample is greater than or equal to any other sample value of any other sample of the third sub-portion of the subsequent audio signal portion,

wherein the processor (310) is configured, when a condition is fulfilled, to generate the decoded audio signal portion by modifying each sample value of each sample of the subsequent audio signal portion that is a predecessor of the second peak sample,

wherein the condition is that the sample value of the second peak sample is greater than the sample value of the first peak sample and that the sample value of the second peak sample is greater than the sample value of the third peak sample, or

wherein the condition is that a first ratio between the sample value of the second peak sample and the sample value of the first peak sample is greater than a first threshold, and that a second ratio between the sample value of the second peak sample and the sample value of the third peak sample is greater than a second threshold.

25. The device (300) according to claim 24, wherein the condition is that the sample value of the second peak sample is greater than the sample value of the first peak sample and that the sample value of the second peak sample is greater than the sample value of the third peak sample.

26. The device (300) according to claim 24, wherein the condition is that the first ratio is greater than the first threshold and that the second ratio is greater than the second threshold.

27. The device (300) according to claim 26, wherein the first threshold is greater than 1.1, and wherein the second threshold is greater than 1.1.

28. The device (300) according to claim 26 or 27, wherein the first threshold is equal to the second threshold.
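The ratio condition of claims 24 and 26 to 28 can be sketched in Python as a simple predicate. Two assumptions are made here: peaks are measured on absolute sample values, and both thresholds default to 1.2, a hypothetical value chosen only because it exceeds 1.1. The function name `damping_condition` is illustrative.

```python
def damping_condition(first_sub, second_sub, third_sub,
                      threshold_1=1.2, threshold_2=1.2):
    """Return True if the peak of second_sub stands out against both neighbours."""
    # Assumption: peaks are taken over absolute values; the claim speaks only
    # of the largest sample value within each sub-portion.
    e_cmax = max(abs(v) for v in first_sub)    # peak at the end of the concealed portion
    e_max = max(abs(v) for v in second_sub)    # peak at the start of the subsequent portion
    e_gmax = max(abs(v) for v in third_sub)    # peak at the end of the subsequent portion
    # Written without division so that a silent neighbour (peak 0) is handled.
    return e_max > threshold_1 * e_cmax and e_max > threshold_2 * e_gmax
```

The predicate is true only when the start of the subsequent portion is louder than both the end of the concealed portion and the rest of the subsequent portion, which is the situation in which damping is applied.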

29. The device (300) according to any one of claims 24 to 28,

wherein the processor (310) is configured, when the condition is fulfilled, to modify each sample value of each sample of the subsequent audio signal portion that is a predecessor of the second peak sample according to the following formula:

$s_{modified}(L_{frame} + i) = s(L_{frame} + i) \cdot \alpha_i$

wherein $L_{frame}$ indicates the sample position of the sample of the subsequent audio signal portion that is a predecessor of the sample position of every other sample of the subsequent audio signal portion,

wherein $L_{frame} + i$ is an integer indicating the sample position of the $(i+1)$-th sample of the subsequent audio signal portion,

wherein $0 \le i \le I_{max} - 1$, wherein $I_{max} - 1$ indicates the sample position of the second peak sample,

wherein $s(L_{frame} + i)$ is the sample value of the $(i+1)$-th sample of the subsequent audio signal portion before modification by the processor (310),

wherein $s_{modified}(L_{frame} + i)$ is the sample value of the $(i+1)$-th sample of the subsequent audio signal portion after modification by the processor (310),

wherein $0 < \alpha_i < 1$.
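The sample-wise damping of claim 29 can be sketched in Python as below. The gains α_i are supplied by the caller because their defining formula is not reproduced in this text. The sketch damps only the samples that precede the peak, which is one reading of the claim. The function name `damp_before_peak` and the zero-based `peak_index` are illustrative.

```python
def damp_before_peak(subsequent, peak_index, gains):
    """Scale every sample that precedes the peak sample by its gain alpha_i."""
    if len(gains) < peak_index:
        raise ValueError("one gain is needed for each sample before the peak")
    out = list(subsequent)
    for i in range(peak_index):
        if not 0.0 < gains[i] < 1.0:
            raise ValueError("each gain must satisfy 0 < alpha_i < 1")
        out[i] = subsequent[i] * gains[i]  # s_modified = s * alpha_i
    return out
```

Samples from the peak onwards are returned unchanged by this sketch.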

30. The device (300) according to claim 29,

wherein

wherein $E_{cmax}$ is the sample value of the first peak sample,

wherein $E_{max}$ is the sample value of the second peak sample,

wherein $E_{gmax}$ is the sample value of the third peak sample.

31. The device (300) according to claim 29 or 30,

wherein the processor (310) is configured, when the condition is fulfilled, to generate the decoded audio signal portion by modifying, according to the following formula, the sample value of each of two or more samples of the plurality of samples of the subsequent audio signal portion that are successors of the second peak sample:

$s_{modified}(I_{max} + k) = s(I_{max} + k) \cdot \alpha_i$,

wherein $I_{max} + k$ is an integer indicating the sample position of the $(I_{max} + k + 1)$-th sample of the subsequent audio signal portion.

32. The device (10; 100; 200; 300) according to any one of the preceding claims, wherein the device (10; 100; 200; 300) further comprises a concealment unit (8) configured to conduct concealment for a current frame that is erroneous or lost, to obtain the concealed audio signal portion.

33. The device (10; 100; 200; 300) according to claim 32,

wherein the device (10; 100; 200; 300) further comprises an activation unit (6) configured to detect whether the current frame is lost or erroneous, wherein the activation unit (6) is configured to activate the concealment unit (8) to conduct concealment for the current frame if the current frame is lost or erroneous.

34. The device (10; 100; 200; 300) according to claim 33,

wherein the activation unit (6) is configured to detect, if the current frame is lost or erroneous, whether a subsequent frame that is not erroneous has arrived, and

wherein the activation unit (6) is configured to activate the processor (8) to generate the decoded audio signal portion if the current frame is lost or erroneous and if the subsequent frame that is not erroneous has arrived.

35. A method for improving a transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal, wherein the method comprises:

generating a decoded audio signal portion of the audio signal depending on a first audio signal portion and depending on a second audio signal portion, wherein the first audio signal portion depends on the concealed audio signal portion, and wherein the second audio signal portion depends on the subsequent audio signal portion, and

outputting the decoded audio signal portion,

wherein generating the decoded audio signal portion comprises determining a first sub-portion of the first audio signal portion such that the first sub-portion comprises fewer samples than the first audio signal portion,

wherein generating the decoded audio signal portion is performed using the first sub-portion of the first audio signal portion and using the second audio signal portion or a second sub-portion of the second audio signal portion, such that, for each sample of two or more samples of the second audio signal portion, the sample position of said sample is equal to the sample position of one of the samples of the decoded audio signal portion, and the sample value of said sample differs from the sample value of said one of the samples of the decoded audio signal portion.

36. A computer program for implementing the method according to claim 35 when executed on a computer or signal processor.

37. A system for improving a transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal, wherein the system comprises:

a switching module (701);

the device (300) according to any one of claims 24 to 31, serving as a device (300) for implementing energy damping, and

the device (100) according to any one of claims 2 to 15, serving as a device (100) for implementing pitch-adaptive overlap,

wherein the switching module (701) is configured to select, depending on the concealed audio signal portion and depending on the subsequent audio signal portion, one of the device (300) for implementing energy damping and the device (100) for implementing pitch-adaptive overlap for generating the decoded audio signal portion.

38. A system for improving a transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal, wherein the system comprises:

a switching module (702);

the device (200) according to any one of claims 16 to 23, serving as a device (200) for implementing excitation overlap,

wherein the switching module (702) is configured to select, depending on the concealed audio signal portion and depending on the subsequent audio signal portion, one of the device (300) for implementing energy damping and the device (200) for implementing excitation overlap for generating the decoded audio signal portion.

39. A system for improving a transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal, wherein the system comprises:

a switching module (703);

the device (100) according to any one of claims 2 to 15, serving as a device (100) for implementing pitch-adaptive overlap, and

wherein the switching module (703) is configured to select, depending on the concealed audio signal portion and depending on the subsequent audio signal portion, one of the device (100) for implementing pitch-adaptive overlap and the device (200) for implementing excitation overlap for generating the decoded audio signal portion.

40. A system for improving a transition from a concealed audio signal portion of an audio signal to a subsequent audio signal portion of the audio signal, wherein the system comprises:

a switching module (704);

the device (100) according to any one of claims 2 to 15, serving as a device (100) for implementing pitch-adaptive overlap,

the device (200) according to any one of claims 16 to 23, serving as a device (200) for implementing excitation overlap, and

the device (300) according to any one of claims 24 to 31, serving as a device (300) for implementing energy damping,

wherein the switching module (704) is configured to select, depending on the concealed audio signal portion and depending on the subsequent audio signal portion, one of the device (100) for implementing pitch-adaptive overlap, the device (200) for implementing excitation overlap, and the device (300) for implementing energy damping for generating the decoded audio signal portion.

41. The system according to claim 40,

wherein the switching module (704) is configured to determine whether at least one of a concealed audio signal frame and a subsequent audio signal frame comprises speech, and

wherein the switching module (704) is configured to select the device (300) for implementing energy damping to generate the decoded audio signal portion if neither the concealed audio signal frame nor the subsequent audio signal frame comprises speech.

42. The system according to claim 40 or 41, wherein the switching module (704) is configured to select said one of the device (100) for implementing pitch-adaptive overlap, the device (200) for implementing excitation overlap, and the device (300) for implementing energy damping for generating the decoded audio signal portion depending on a frame length of a subsequent audio signal frame and depending on at least one of a pitch of the concealed audio signal portion and a pitch of the subsequent audio signal portion, wherein the subsequent audio signal portion is an audio signal portion of the subsequent audio signal frame.
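The selection logic of claims 40 to 42 can be sketched in Python as follows. Only the speech test of claim 41 is taken from the text. The rule that compares the pitch lag against the frame length is a purely hypothetical stand-in, since claim 42 names the inputs of the decision but not the rule itself. The function name `select_device` and the returned labels are illustrative.

```python
def select_device(concealed_is_speech, subsequent_is_speech, frame_length, pitch_lag):
    """Choose which transition tool generates the decoded audio signal portion."""
    # Claim 41: neither frame contains speech -> energy damping.
    if not concealed_is_speech and not subsequent_is_speech:
        return "energy_damping"
    # Hypothetical rule (not from the claims): a pitch cycle that fits inside
    # the frame allows pitch-adaptive overlap; otherwise use excitation overlap.
    if pitch_lag < frame_length:
        return "pitch_adaptive_overlap"
    return "excitation_overlap"
```

A real switching module would presumably derive the speech flags and the pitch lag from the decoder state; here they are plain arguments so that the decision can be tested in isolation.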

43. The system according to claim 39,

wherein the system further comprises the device (300) according to any one of claims 24 to 31, serving as a device (300) for implementing energy damping,

wherein the switching module (703) is configured to select, depending on the concealed audio signal portion and depending on the subsequent audio signal portion, said one of the device (100) for implementing pitch-adaptive overlap and the device (200) for implementing excitation overlap to generate an intermediate audio signal portion,

wherein the device (300) for implementing energy damping is configured to process the intermediate audio signal portion to generate the decoded audio signal portion.