WO2011134415A1

WO2011134415A1 - Audio signal switching method and device

Info

Publication number: WO2011134415A1
Application number: PCT/CN2011/073479
Authority: WO
Inventors: 刘泽新; 苗磊; 胡晨; 吴文海; 郎玥; 张清
Original assignee: 华为技术有限公司
Priority date: 2010-04-28
Filing date: 2011-04-28
Publication date: 2011-11-03
Also published as: CN101964189A; ES2635212T3; JP2017033015A; EP3249648A1; JP2015045888A; BR112012013306A2; JP5667202B2; CN101964189B; AU2011247719A1; BR112012013306B1; BR112012013306B8; KR101377547B1; AU2011247719B2; ES2718947T3; EP2485029A1; JP2013512468A; EP2485029A4; EP3249648B1; JP6410777B2; JP6027081B2

Abstract

An audio signal switching method and device are provided. The audio signal switching method comprises the following steps: when an audio signal is switched, performing weighting processing on a first high-frequency band signal of the current frame of the audio signal and a second high-frequency band signal of the preceding M frames of the audio signal, so as to obtain a processed first high-frequency band signal (101); and synthesizing the processed first high-frequency band signal and a first low-frequency band signal of the current frame of the audio signal into a wideband signal (102).

Description

This application claims priority to Chinese patent application No. 201010163406.3, filed with the Chinese Patent Office on April 28, 2010 and entitled "Voice and audio signal switching method and device", the contents of which are incorporated herein by reference.

Technical field

The embodiments of the present invention relate to the field of communications technologies, and in particular, to a speech and audio signal switching method and apparatus.

Background

At present, when speech and audio signals are transmitted over a network, the network may, depending on its state, truncate the code stream sent from the encoding end. The decoding end then decodes speech and audio signals of different bandwidths from the truncated code stream.

In the prior art, because the speech and audio signals transmitted in the network have different bandwidths, switching occurs during transmission both from narrowband speech and audio signals to wideband speech and audio signals and from wideband speech and audio signals to narrowband speech and audio signals. The narrowband signal referred to in the present invention is a signal that has been converted, by upsampling and low-pass filtering, into a wideband signal that has only a low-band component and whose high-band component is empty, whereas a wideband speech and audio signal has both a low-band signal component and a high-band signal component. In the process of implementing the present invention, the inventors found at least the following problem in the prior art: because narrowband and wideband speech and audio signals differ in the high-band signal, an abrupt change in signal energy occurs when speech and audio signals of different bandwidths are switched, which is uncomfortable to listen to and degrades the quality of the audio signal received by the user.

Summary of the invention

Embodiments of the present invention provide a speech and audio signal switching method and apparatus that switch smoothly between speech and audio signals of different bandwidths and thereby improve the quality of the audio signal received by the user.

An embodiment of the present invention provides a speech and audio signal switching method, including:

when the speech and audio signal is switched, performing weighting processing on a first high-band signal of the current frame of the speech and audio signal and a second high-band signal of the preceding M frames of the speech and audio signal to obtain a processed first high-band signal, where M is greater than or equal to 1; and

synthesizing the processed first high-band signal and a first low-band signal of the current frame of the speech and audio signal into a wideband signal.

An embodiment of the present invention provides an audio signal switching apparatus, including:

a processing module, configured to perform, when the speech and audio signal is switched, weighting processing on a first high-band signal of the current frame of the speech and audio signal and a second high-band signal of the preceding M frames of the speech and audio signal, to obtain a processed first high-band signal, where M is greater than or equal to 1; and

a first synthesizing module, configured to synthesize the processed first high-band signal and a first low-band signal of the current frame of the speech and audio signal into a wideband signal.

The speech and audio signal switching method and apparatus of the embodiments of the present invention process the first high-band signal of the current frame according to the second high-band signal of the preceding M frames, so that the second high-band signal of the preceding M frames transitions smoothly to the processed first high-band signal, and synthesize the processed first high-band signal and the first low-band signal into a wideband signal. Switching between speech and audio signals of different bandwidths is therefore performed smoothly, the degradation in subjective listening quality caused by the abrupt energy change is reduced, and the quality of the audio signal received by the user is improved.

Description of the drawings

To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of Embodiment 1 of the speech and audio signal switching method of the present invention;

FIG. 2 is a flowchart of Embodiment 2 of the speech and audio signal switching method of the present invention;

FIG. 3 is a first flowchart of an embodiment of step 201 in FIG. 2;

FIG. 4 is a first flowchart of an embodiment of step 302 in FIG. 3;

FIG. 5 is a second flowchart of another embodiment of step 302 in FIG. 3;

FIG. 6 is a flowchart of an embodiment of step 202 in FIG. 2;

FIG. 7 is a second flowchart of another embodiment of step 201 in FIG. 2;

FIG. 8 is a third flowchart of another embodiment of step 201 in FIG. 2;

FIG. 9 is a schematic structural diagram of Embodiment 1 of the speech and audio signal switching apparatus of the present invention;

FIG. 10 is a schematic structural diagram of Embodiment 2 of the speech and audio signal switching apparatus of the present invention;

FIG. 11 is a schematic structural diagram of a processing module in Embodiment 2 of the speech and audio signal switching apparatus of the present invention;

FIG. 12 is a schematic structural diagram of a first module in Embodiment 2 of the speech and audio signal switching apparatus of the present invention;

FIG. 13a is a schematic structural diagram of a processing module in Embodiment 2 of the speech and audio signal switching apparatus of the present invention;

FIG. 13b is a schematic structural diagram of a processing module in Embodiment 2 of the speech and audio signal switching apparatus of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of the present invention.

FIG. 1 is a flowchart of Embodiment 1 of the speech and audio signal switching method of the present invention. As shown in FIG. 1, in the speech and audio signal switching method of this embodiment, when the speech and audio signal is switched, each frame after the switching frame is processed as follows:

Step 101: When the speech and audio signal is switched, perform weighting processing on the first high-band signal of the current frame of the speech and audio signal and the second high-band signal of the preceding M frames of the speech and audio signal to obtain the processed first high-band signal, where M is greater than or equal to 1.

Step 102: Synthesize the processed first high-band signal and the first low-band signal of the current frame of the speech and audio signal into a wideband signal.

In the embodiments of the present invention, the preceding M frames of the speech and audio signal are the M frames that precede the current frame. The L frames before switching are the L frames that precede the switching frame when the speech and audio signal is switched. If the current frame is a wideband signal and the previous frame is a narrowband signal, or the current frame is a narrowband signal and the previous frame is a wideband signal, the speech and audio signal is switched and the current frame is the switching frame.
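The definition of a switching frame can be sketched as follows. This is only an illustration: the function names and the representation of each frame as a boolean "is wideband" flag are assumptions, not part of the described method.

```python
def is_switching_frame(prev_is_wideband: bool, cur_is_wideband: bool) -> bool:
    """Return True when the current frame's bandwidth differs from the previous frame's."""
    return prev_is_wideband != cur_is_wideband


def find_switching_frames(bandwidth_flags):
    """Return the indices of frames whose bandwidth differs from the preceding frame.

    bandwidth_flags is a hypothetical per-frame list: True for wideband,
    False for narrowband.
    """
    return [
        i
        for i in range(1, len(bandwidth_flags))
        if is_switching_frame(bandwidth_flags[i - 1], bandwidth_flags[i])
    ]


# Wideband, wideband, narrowband, narrowband, wideband.
switch_indices = find_switching_frames([True, True, False, False, True])
```

Here frame 2 is a wideband-to-narrowband switching frame and frame 4 is a narrowband-to-wideband switching frame.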

The speech and audio signal switching method of this embodiment processes the first high-band signal of the current frame according to the second high-band signal of the preceding M frames, so that the second high-band signal of the preceding M frames transitions smoothly to the processed first high-band signal. The high-band signals of speech and audio signals of different bandwidths can therefore transition smoothly when such signals are switched. Finally, the processed first high-band signal and the first low-band signal are synthesized into a wideband signal, and the wideband signal is transmitted to the user terminal, so that the user receives a high-quality speech and audio signal. With the method of this embodiment, switching between speech and audio signals of different bandwidths is performed smoothly, the degradation in subjective listening quality caused by the abrupt energy change is reduced, and the quality of the audio signal received by the user is improved.

FIG. 2 is a flowchart of Embodiment 2 of a method for switching a voice signal according to the present invention. As shown in FIG. 2, the voice and audio signal switching method in this embodiment includes:

Step 200: Synthesize the first high frequency band signal of the current frame speech audio signal with the first low frequency band signal into a broadband signal when no switching occurs.

Specifically, the first-band speech and audio signal in this embodiment may be a wideband speech and audio signal or a narrowband speech and audio signal. During transmission, when the first-band speech and audio signal is not switched, it is handled in one of the following two ways: 1. if the first-band speech and audio signal is a wideband speech and audio signal, the low-band signal and the high-band signal of the wideband signal are synthesized into a wideband signal; 2. if the first-band speech and audio signal is a narrowband speech and audio signal, the low-band signal and the high-band signal of the narrowband signal are synthesized into a wideband signal, in which case the result is a wideband signal whose high band is empty and carries no information.
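The two no-switching cases can be sketched as below. The sketch assumes that each band is held as a list of frequency-domain coefficients and models synthesis as simple concatenation; a real decoder would use a synthesis filter bank or inverse transform, which the text does not specify. All names are hypothetical.

```python
def empty_high_band(size):
    """High band of a narrowband frame: present but carrying no information."""
    return [0.0] * size


def synthesize_wideband(low_band, high_band):
    """Model synthesis as low-band coefficients followed by high-band coefficients.

    This concatenation is an assumption made for illustration only.
    """
    return list(low_band) + list(high_band)


# Case 1: wideband frame, both bands carry information.
wide = synthesize_wideband([0.5, 0.4], [0.1, 0.05])

# Case 2: narrowband frame, the high band is empty.
narrow = synthesize_wideband([0.5, 0.4], empty_high_band(2))
```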

Step 201: When the speech and audio signal is switched, perform weighting processing on the first high-band signal of the current frame of the speech and audio signal and the second high-band signal of the preceding M frames of the speech and audio signal to obtain the processed first high-band signal, where M is greater than or equal to 1.

Specifically, when speech and audio signals of different bandwidths are switched, the first high-band signal of the current frame is processed according to the second high-band signal of the preceding M frames, so that the second high-band signal of the preceding M frames transitions smoothly to the processed first high-band signal. For example, when a wideband speech and audio signal is switched to a narrowband speech and audio signal, the high-band signal corresponding to the narrowband signal is empty, so the high-band components corresponding to the narrowband signal must be recovered for the switch to be smooth. When a narrowband speech and audio signal is switched to a wideband speech and audio signal, the high-band signal of the wideband signal is not empty, so the energy of the high-band signal in a number of consecutive wideband frames after the switch must be attenuated, so that the high-band signal of the wideband signal transitions gradually to the true high-band signal. Processing the current frame in step 201 lets the high-band signals of speech and audio signals of different bandwidths transition smoothly, avoids the listening discomfort caused by the abrupt energy change when switching between wideband and narrowband speech and audio signals, and lets the user receive a high-quality audio signal.

To simplify obtaining the processed first high-band signal, the first high-band signal may be weighted directly with the second high-band signal of the preceding M frames; the result of this weighting is the processed first high-band signal.
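A minimal sketch of the direct weighting follows. The text does not say how the preceding M frames are combined or what weights are used, so the coefficient-wise mean over the M frames, the single scalar weight `prev_weight`, and all names here are assumptions.

```python
def weight_high_band(current_high, previous_highs, prev_weight):
    """Blend the current frame's high band with the preceding M frames' high bands.

    current_high   -- first high-band signal of the current frame (list of floats)
    previous_highs -- second high-band signals of the preceding M frames (M >= 1)
    prev_weight    -- hypothetical weight in [0, 1] given to the preceding frames
    """
    if not previous_highs:
        raise ValueError("at least one preceding frame is required (M >= 1)")
    if not 0.0 <= prev_weight <= 1.0:
        raise ValueError("prev_weight must lie in [0, 1]")
    m = len(previous_highs)
    # Coefficient-wise mean over the preceding M frames (assumed combination rule).
    prev_mean = [
        sum(frame[k] for frame in previous_highs) / m
        for k in range(len(current_high))
    ]
    return [
        prev_weight * p + (1.0 - prev_weight) * c
        for p, c in zip(prev_mean, current_high)
    ]


# Wideband-to-narrowband switch: the current high band is empty, so the
# processed high band is a scaled copy of the previous frame's high band.
processed = weight_high_band([0.0, 0.0], [[0.8, 0.4]], prev_weight=0.5)
```

Lowering `prev_weight` frame by frame after the switch would let the high band fade out gradually instead of disappearing at once.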

Step 202: Synthesize the processed first high-band signal and the first low-band signal of the current frame of the speech and audio signal into a wideband signal.

Specifically, after the current frame has been processed in step 201, the second high-band signal of the preceding M frames transitions smoothly to the processed first high-band signal of the current frame. Step 202 then synthesizes the processed first high-band signal and the first low-band signal of the current frame into a wideband signal, so that the speech and audio signal received by the user is always a wideband signal. This achieves smooth switching between speech and audio signals of different bandwidths and helps improve the quality of the audio signal received by the user.

The speech and audio signal switching method of this embodiment processes the first high-band signal of the current frame according to the second high-band signal of the preceding M frames, so that the second high-band signal of the preceding M frames transitions smoothly to the processed first high-band signal. The high-band signals of speech and audio signals of different bandwidths can therefore transition smoothly when such signals are switched. Finally, the processed first high-band signal and the first low-band signal are synthesized into a wideband signal, and the wideband signal is transmitted to the user terminal, so that the user receives a high-quality speech and audio signal. With the method of this embodiment, switching between speech and audio signals of different bandwidths is performed smoothly, the degradation in subjective listening quality caused by the abrupt energy change is reduced, and the quality of the audio signal received by the user is improved. Further, when no switching between speech and audio signals of different bandwidths occurs, the first high-band signal and the first low-band signal of the current frame are synthesized into a wideband signal, so that the user still obtains a high-quality audio signal.

Based on the foregoing technical solution, optionally, when a wideband speech and audio signal is switched to a narrowband speech and audio signal, as shown in FIG. 3, step 201 in this embodiment includes:

Step 301: Predict the predicted fine structure information and the predicted envelope information corresponding to the first high frequency band signal.

Specifically, a speech and audio signal can be decomposed into two parts, fine structure information and envelope information, and can be restored from these two parts. When a wideband speech and audio signal is switched to a narrowband speech and audio signal, the narrowband signal contains only a low-band signal and its high-band signal is empty. To switch smoothly from the wideband signal to the narrowband signal, the high-band signal needed by the current narrowband frame must be recovered. Step 301 in this embodiment predicts the fine structure information and the envelope information corresponding to the first high-band signal of the narrowband speech and audio signal.
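The decomposition into envelope and fine structure can be illustrated as follows. The text does not define either quantity precisely; this sketch assumes that the envelope is the RMS value of each sub-band of frequency-domain coefficients and that the fine structure is the coefficients normalized by that value. The sub-band size and function names are hypothetical.

```python
import math


def decompose(coeffs, band_size):
    """Split coefficients into a per-sub-band RMS envelope and a normalized fine structure."""
    envelope, fine = [], []
    for start in range(0, len(coeffs), band_size):
        band = coeffs[start:start + band_size]
        rms = math.sqrt(sum(c * c for c in band) / len(band))
        envelope.append(rms)
        # An all-zero sub-band has no fine structure to normalize.
        fine.extend(c / rms if rms > 0.0 else 0.0 for c in band)
    return envelope, fine


def reconstruct(envelope, fine, band_size):
    """Restore the coefficients by scaling the fine structure with the envelope."""
    return [fine[i] * envelope[i // band_size] for i in range(len(fine))]


original = [3.0, -4.0, 0.0, 2.0]
env, fine = decompose(original, band_size=2)
rebuilt = reconstruct(env, fine, band_size=2)
```

Under this model, the envelope carries the energy of each sub-band, which is the part the later weighting steps smooth across frames, while the fine structure carries the detail within each sub-band.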

To predict the fine structure information and the envelope information corresponding to the current frame more accurately, step 301 may first classify the first low-band signal of the current frame and then predict the fine structure information and the envelope information corresponding to the first high-band signal according to the signal type of the first low-band signal. For example, the narrowband speech and audio signal of the current frame may be a harmonic signal, a non-harmonic signal, a transient signal, and so on. Knowing the signal type of the narrowband signal gives the fine structure information and envelope information typical of that type, so that the fine structure information and the envelope information corresponding to the high-band signal of the current frame can be predicted more accurately. The speech and audio signal switching method of the present invention does not limit the signal type of the narrowband speech and audio signal.

Step 302: Perform weighting processing on the predicted envelope information and the preceding-M-frame envelope information corresponding to the second high-band signal of the preceding M frames of the speech and audio signal, to obtain first envelope information corresponding to the first high-band signal.

Specifically, after the fine structure information and the envelope information corresponding to the first high-band signal of the current frame have been predicted in step 301, the first envelope information corresponding to the first high-band signal can be generated from the predicted envelope information and the preceding-M-frame envelope information corresponding to the second high-band signal of the preceding M frames.

Specifically, the process of generating the first envelope information corresponding to the first high-band signal in step 302 can be implemented in the following two manners, as follows:

Method 1: as shown in FIG. 4, an embodiment in which step 302 obtains the first envelope information may include:

Step 401: Calculate, according to the first low-band signal and the low-band signal of the preceding N frames of the speech and audio signal, a correlation coefficient between the first low-band signal and the low-band signal of the preceding N frames, where N is greater than or equal to 1.

Specifically, the first low-band signal of the current frame is compared with the low-band signal of the preceding N frames to obtain the correlation coefficient between them. For example, the correlation may be determined from the difference in energy, or the difference in information type, between a given frequency band of the first low-band signal of the current frame and the same frequency band of the low-band signal of the preceding N frames, and the required correlation coefficient is calculated accordingly. The preceding N frames may be narrowband speech and audio signals, wideband speech and audio signals, or a mixture of narrowband and wideband speech and audio signals.

Step 402: Determine whether the correlation coefficient is within a given first threshold range.

Specifically, after the correlation coefficient has been calculated in step 401, it is determined whether the coefficient is within the given first threshold range. The purpose of the correlation coefficient is to establish whether the current frame evolves gradually from the preceding N frames or changes abruptly, that is, whether their characteristics are the same, and hence to decide how much weight the high-band signal of the preceding frames should receive when the high-band signal of the current frame is predicted. For example, if the first low-band signal of the current frame has about the same energy as the low-band signal of the previous frame and is of the same type, the previous frame is highly correlated with the current frame. To restore the first envelope information corresponding to the current frame accurately, the high-band envelope information or the transition envelope information corresponding to the previous frame is then given a larger weight. Conversely, if the first low-band signal of the current frame differs greatly in energy from the low-band signal of the previous frame and is of a different type, the previous frame has a low correlation with the current frame. To restore the first envelope information corresponding to the current frame accurately, the high-band envelope information or the transition envelope information corresponding to the previous frame is then given a smaller weight.
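Steps 401 and 402 can be sketched as follows. The text only says that the correlation may be derived from the energy or type of the same band in the current and preceding frames, so the energy-ratio measure, the mean over the preceding N frames, and the threshold values below are all assumptions chosen for illustration.

```python
def band_energy(coeffs):
    """Energy of one frame's low-band coefficients."""
    return sum(c * c for c in coeffs)


def energy_correlation(current_low, previous_lows):
    """Hypothetical correlation measure in [0, 1].

    Ratio of the smaller to the larger of (a) the current low-band energy and
    (b) the mean low-band energy of the preceding N frames. A value near 1
    means similar energy; a value near 0 means an abrupt change.
    """
    e_cur = band_energy(current_low)
    e_prev = sum(band_energy(f) for f in previous_lows) / len(previous_lows)
    if e_cur == 0.0 and e_prev == 0.0:
        return 1.0  # two silent signals are treated as fully correlated
    return min(e_cur, e_prev) / max(e_cur, e_prev)


def within_threshold_range(value, low, high):
    """Step 402: is the correlation coefficient inside the given range?"""
    return low <= value <= high


FIRST_THRESHOLD_RANGE = (0.5, 1.0)  # hypothetical values, not from the text

similar = energy_correlation([1.0, 1.0], [[1.0, 1.0]])
abrupt = energy_correlation([1.0, 0.0], [[2.0, 0.0]])
```

With these example inputs, `similar` is inside the assumed range and `abrupt` is outside it.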

Step 403: If the correlation coefficient is not within the given first threshold range, perform weighting processing according to the preset first weight one and first weight two to calculate the first envelope information. The first weight one is the weight of the previous-frame envelope information corresponding to the high-band signal of the previous frame of the speech and audio signal, and the first weight two is the weight of the predicted envelope information.

Specifically, when step 402 determines that the correlation coefficient is not within the given first threshold range, the current frame has little correlation with the preceding N frames. The preceding-M-frame envelope information corresponding to the first-band speech and audio signal of the preceding M frames, the transition envelope information, or the high-band envelope information corresponding to the previous frame therefore has little influence on the first envelope information, and it receives a smaller weight when the first envelope information corresponding to the current frame is restored. The first envelope information of the current frame can therefore be calculated from the preset first weight one and first weight two. The first weight one is the weight of the envelope information corresponding to the high-band signal of the previous frame; the previous frame may be a wideband speech and audio signal or a processed narrowband speech and audio signal, and at the first switch it is the wideband speech and audio signal. The first weight two is the weight of the predicted envelope information. The product of the predicted envelope information and the first weight two is added to the product of the previous-frame envelope information and the first weight one, and the resulting weighted sum is the first envelope information of the current frame.

In addition, the first envelope information of subsequently transmitted speech and audio signals is restored in the same manner and with the same weights until the speech and audio signal is switched again.
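The weighted sum of step 403 can be written compactly as below. The symbols are introduced here for illustration and do not appear in the original text: $E_1(k)$ is the first envelope information of the current frame in sub-band $k$, $E_{\text{prev}}(k)$ is the previous-frame envelope information, $E_{\text{pred}}(k)$ is the predicted envelope information, and $w_{1,1}$, $w_{1,2}$ are the first weight one and the first weight two, which the description states are fixed constants that sum to one.

```latex
E_1(k) = w_{1,1}\, E_{\text{prev}}(k) + w_{1,2}\, E_{\text{pred}}(k),
\qquad w_{1,1} + w_{1,2} = 1 .
```

As a purely numerical illustration with made-up values, $w_{1,1} = 0.25$, $E_{\text{prev}}(k) = 0.8$ and $E_{\text{pred}}(k) = 0.4$ give $E_1(k) = 0.25 \cdot 0.8 + 0.75 \cdot 0.4 = 0.5$.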

Step 404: If the correlation coefficient is within the first threshold range, perform weighting processing according to the preset second weight one and second weight two to calculate transition envelope information. The second weight one is the weight of the envelope information of the L frames before switching, and the second weight two is the weight of the envelope information of the preceding M frames, where L is greater than or equal to 1.

Specifically, when step 402 determines that the correlation coefficient is within the given first threshold range, the current frame has characteristics similar to those of the preceding N consecutive frames, and the first envelope information corresponding to the current frame is strongly influenced by the envelope information of those frames. Taking into account the authenticity of the envelope information of the preceding M frames, the transition envelope information corresponding to the current frame is calculated from the envelope information of the preceding M frames and the envelope information of the L frames before switching, both of which receive a larger weight when the first envelope information of the current frame is restored; the first envelope information is then calculated from the transition envelope information. The second weight one is the weight of the pre-switching envelope information, and the second weight two is the weight of the preceding-M-frame envelope information. The product of the pre-switching envelope information and the second weight one is added to the product of the preceding-M-frame envelope information and the second weight two, and the resulting weighted sum is the transition envelope information.

Step 405: Decrease the second weight one by the first weight step, and increase the second weight two by the first weight step.

Specifically, as transmission continues, the subsequent narrowband speech and audio signals are influenced less and less by the wideband speech and audio signal before switching. To keep the calculated first envelope information accurate, the second weight one and the second weight two are adjusted adaptively. Because the influence of the L wideband frames before switching on subsequent frames decreases gradually, the second weight one gradually becomes smaller and the second weight two gradually becomes larger, which weakens the influence of the pre-switching envelope information on the first envelope information. Step 405 may modify the second weight one and the second weight two as follows: the new second weight one equals the old second weight one minus the first weight step, and the new second weight two equals the old second weight two plus the first weight step, where the first weight step is a preset value.
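Steps 404 and 405 can be sketched together as follows. Envelope information is modeled as a list of per-sub-band values, and the numeric weights and step are made-up example values; the text gives no concrete numbers and does not state a lower bound for the second weight one, so none is applied here.

```python
def transition_envelope(pre_switch_env, prev_m_env, w2_one, w2_two):
    """Step 404: weighted sum of pre-switching and preceding-M-frame envelopes.

    w2_one weights the envelope of the L frames before switching;
    w2_two weights the envelope of the preceding M frames.
    """
    return [w2_one * a + w2_two * b for a, b in zip(pre_switch_env, prev_m_env)]


def step_second_weights(w2_one, w2_two, first_weight_step):
    """Step 405: shift weight from the pre-switching envelope to the preceding M frames."""
    return w2_one - first_weight_step, w2_two + first_weight_step


# Hypothetical example values.
w2_one, w2_two = 0.5, 0.5
trans_env = transition_envelope([1.0, 2.0], [0.0, 1.0], w2_one, w2_two)
w2_one, w2_two = step_second_weights(w2_one, w2_two, first_weight_step=0.25)
```

Because one weight decreases by exactly the amount the other increases, the two weights keep summing to one, consistent with the constraint stated for the weight pairs.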

Step 406: Determine whether the preset third weight one is greater than the first weight one.

Specifically, the third weight one is the weight of the transition envelope information. Comparing the third weight one with the first weight one shows how strongly the first envelope information of the current frame is influenced by the transition envelope information. The transition envelope information is calculated from the envelope information of the preceding M frames and the pre-switching envelope information, so the third weight one in effect represents the degree to which the first envelope information is influenced by the pre-switching envelope information.

Step 407: If the third weight one is not greater than the first weight one, perform weighting processing according to the preset first weight one and first weight two to calculate the first envelope information.

Specifically, when step 406 determines that the third weight one is less than or equal to the first weight one, the current frame is far from the L frames before switching and the first envelope information is influenced mainly by the envelope information of the preceding M frames. The first envelope information of the current frame can therefore be calculated from the preset first weight one and first weight two.

Step 408: If the third weight one is greater than the first weight one, perform weighting processing according to the preset third weight one and third weight two to calculate the first envelope information. The third weight one is the weight of the transition envelope information, and the third weight two is the weight of the predicted envelope information.

Specifically, when step 406 determines that the third weight one is greater than the first weight one, the current frame is close to the L frames before switching and the first envelope information is strongly influenced by the pre-switching envelope information. The first envelope information of the current frame is therefore calculated from the transition envelope information. The third weight one is the weight of the transition envelope information, and the third weight two is the weight of the predicted envelope information. The product of the transition envelope information and the third weight one is added to the product of the predicted envelope information and the third weight two, and the resulting weighted sum is the first envelope information.

Step 409: Decrease the third weight one by the second weight step, and increase the third weight two by the second weight step, until the third weight one is equal to zero.

Specifically, the purpose of modifying the third weight one and the third weight two in step 409 is the same as the purpose of modifying the second weight one and the second weight two in step 405: as transmission continues, the influence of the L frames of speech/audio signal before the switching on subsequent frames gradually decreases, so the third weight one and the third weight two are adjusted accordingly to make the calculated first envelope information more accurate. Because that influence gradually decreases, the value of the third weight one gradually decreases and the value of the third weight two gradually increases, which weakens the effect of the envelope information before the switching on the first envelope information. Step 409 may modify the third weight one and the third weight two as follows: the new third weight one is equal to the old third weight one minus the second weight step, and the new third weight two is equal to the old third weight two plus the second weight step, where the second weight step is a set value.

The sum of the first weight one and the first weight two is one, the sum of the second weight one and the second weight two is one, and the sum of the third weight one and the third weight two is one; the initial value of the third weight one is greater than the initial value of the first weight one; the first weight one and the first weight two are fixed constants. Specifically, the weight one and the weight two in this embodiment actually represent the proportions in which the envelope information before the switching and the envelope information of the previous M frames make up the first envelope information of the current frame. The closer the current frame speech/audio signal is to the L frames of speech/audio signal before the switching, and the greater the correlation between them, the higher the proportion of the envelope information before the switching and, conversely, the lower the proportion of the envelope information of the previous M frames. When the current frame speech/audio signal is far from the L frames of speech/audio signal before the switching, the speech/audio signal has been transmitted stably in the network; when the correlation between the current frame speech/audio signal and the L frames of speech/audio signal before the switching is low, the characteristics of the current frame speech/audio signal have changed. In either case, the current frame speech/audio signal is less affected by the L frames of speech/audio signal before the switching, and the proportion of the envelope information before the switching is lower.
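Steps 406 to 409 can be sketched as a per-frame update. The Python sketch below is illustrative only and is not taken from the specification: the function name, the scalar envelope values, and the clamp at zero are assumptions, and it assumes that the first weight one applies to the envelope information of the previous M frames and the first weight two to the predicted envelope information.

```python
def manner1_first_envelope(trans_fenv, pre_m_fenv, fenv, a1, a3, step2):
    """One frame of steps 406-409 (hypothetical helper, scalar envelopes).

    trans_fenv: transition envelope information
    pre_m_fenv: envelope information of the previous M frames
    fenv:       predicted envelope information
    a1:         first weight one (fixed constant); first weight two is 1 - a1
    a3:         third weight one; third weight two is 1 - a3
    step2:      second weight step (a set value)
    Returns (first envelope information, updated third weight one).
    """
    if a3 <= a1:
        # Step 407: the pre-switch influence is small, use the fixed weights.
        b1 = 1.0 - a1
        return pre_m_fenv * a1 + fenv * b1, a3

    # Step 408: weight the transition envelope against the predicted envelope.
    b3 = 1.0 - a3
    cur_fenv = trans_fenv * a3 + fenv * b3

    # Step 409: shift weight from the transition envelope to the prediction.
    # Clamping at zero is an assumed guard so the weight stays non-negative.
    a3 = max(0.0, a3 - step2)
    return cur_fenv, a3
```

Calling the function once per frame and feeding the returned third weight one into the next call reproduces the gradual hand-over from step 408 to step 407 described above.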

In addition, the execution order of step 404 and step 405 in this embodiment may be interchanged; that is, the second weight one and the second weight two may be modified first, and the transition envelope information is then calculated according to the modified second weight one and second weight two. Similarly, the execution order of step 408 and step 409 in this embodiment may be interchanged; that is, the third weight one and the third weight two may be modified first, and the first envelope information is then calculated according to the modified third weight one and third weight two.

Manner 2: as shown in FIG. 5, another embodiment of obtaining the first envelope information in step 302 may include:

Step 501: Calculate a correlation coefficient between the first low frequency band signal of the current frame speech/audio signal and the low frequency band signal of the previous frame speech/audio signal.

Specifically, in order to obtain the first envelope information more accurately, the relationship between the energies of the same frequency band in the first low frequency band signal of the current frame speech/audio signal and in the low frequency band signal of the previous frame speech/audio signal is determined. In this embodiment, the correlation coefficient may be represented by "corr". The correlation coefficient corr between the first low frequency band signal of the current frame speech/audio signal and the low frequency band signal of the previous frame speech/audio signal is obtained from the energy relationship between the two signals: the smaller the energy difference, the larger corr is; conversely, the larger the energy difference, the smaller corr is. For the specific process, refer to the description of the correlation calculation for the previous N frames of speech/audio signal in step 401.

Step 502: Determine whether the correlation coefficient is within a given second threshold range.

Specifically, after the value of corr is calculated in step 501, it is determined whether the calculated corr is within a given second threshold range. For example, this embodiment can represent the second threshold range as c1~c2.

Step 503: If the correlation coefficient is not within the second threshold range, perform weighting processing according to the set first weight one and first weight two to calculate the first envelope information. The first weight one is the weight value of the previous frame envelope information corresponding to the high frequency band signal of the previous frame speech/audio signal, and the first weight two is the weight value of the predicted envelope information; the first weight one and the first weight two are fixed constants.

Specifically, when it is determined in step 502 that corr is smaller than c1 or greater than c2, the first envelope information corresponding to the current frame speech/audio signal is only slightly affected by the envelope information before the switching. Therefore, the first envelope information of the current frame can be calculated according to the set first weight one and first weight two. The first envelope information of the current frame is the weighted sum obtained by adding the product of the predicted envelope information and the first weight two to the product of the previous frame envelope information and the first weight one. In addition, for narrowband speech/audio signals transmitted afterwards, the corresponding first envelope information is recovered in this manner and with these weights until speech/audio signals of different bandwidths are switched again. For example, in this embodiment the first weight one may be represented by a1, the first weight two by b1, the previous frame envelope information by pre_fenv, the predicted envelope information by fenv, and the first envelope information by cur_fenv. Step 503 can then be expressed by the following formula: cur_fenv = pre_fenv * a1 + fenv * b1.

Step 504: If the correlation coefficient is within the second threshold range, determine whether the set second weight one is greater than the first weight one. The second weight one is the weight value of the envelope information before the switching corresponding to the high frequency band signal of the previous frame speech/audio signal.

Specifically, if c1 < corr < c2, comparing the second weight one with the first weight one shows the degree to which the first envelope information of the current frame is affected by the envelope information before the switching and by the envelope information of the previous frame.

Step 505: If the second weight one is not greater than the first weight one, calculate the first envelope information according to the set first weight one and first weight two.

Specifically, when it is determined in step 504 that the second weight one is not greater than the first weight one, the current frame speech/audio signal is far from the speech/audio signal before the switching, and the first envelope information is only slightly affected by the envelope information before the switching. Therefore, the first envelope information of the current frame can be calculated according to the set first weight one and first weight two. Step 505 can then be expressed by the following formula: cur_fenv = pre_fenv * a1 + fenv * b1.

Step 506: If the second weight one is greater than the first weight one, perform weighting processing according to the set second weight one and second weight two to calculate the first envelope information. The second weight two is the weight value of the predicted envelope information. For example, the second weight one can be represented by a2, and the second weight two can be represented by b2.

Specifically, when it is determined in step 504 that the second weight one is greater than the first weight one, the current frame speech/audio signal is close to the speech/audio signal before the switching, and the first envelope information is strongly affected by the envelope information before the switching. Therefore, the first envelope information of the current frame can be calculated according to the set second weight one and second weight two. The first envelope information of the current frame is the weighted sum obtained by adding the product of the predicted envelope information and the second weight two to the product of the envelope information before the switching and the second weight one. The envelope information before the switching can be represented by con_fenv, so step 506 can be expressed by the following formula: cur_fenv = con_fenv * a2 + fenv * b2.

Step 507: Decrease the second weight one by the second weight step, and increase the second weight two by the second weight step.

Specifically, as transmission of the speech/audio signal continues, the influence of the speech/audio signal before the switching on subsequent current frames gradually decreases, so the second weight one and the second weight two need to be adjusted accordingly to make the calculated first envelope information more accurate. Because the influence of the speech/audio signal before the switching gradually decreases while the influence of the frames adjacent to the current frame grows, the value of the second weight one gradually decreases and the value of the second weight two gradually increases. This weakens the influence of the envelope information before the switching on the first envelope information and strengthens the influence of the predicted envelope information on the first envelope information. Step 507 may modify the second weight one and the second weight two as follows: the new second weight one is equal to the old second weight one minus the first weight step, and the new second weight two is equal to the old second weight two plus the first weight step, where the first weight step is a set value.

The sum of the first weight one and the first weight two is one, and the sum of the second weight one and the second weight two is one; the initial value of the second weight one is greater than the initial value of the first weight one.
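The decision flow of steps 502 to 507 can be sketched as follows. This Python sketch is a minimal illustration under stated assumptions: the function name, the scalar envelope values, the single `step` parameter, and the clamp at zero are hypothetical, while the variable names a1, a2, pre_fenv, con_fenv, fenv, and cur_fenv follow the formulas given above. The correlation coefficient corr is taken as an input because its calculation is described in step 401.

```python
def manner2_first_envelope(corr, c1, c2, pre_fenv, con_fenv, fenv, a1, a2, step):
    """One frame of steps 502-507 (hypothetical helper, scalar envelopes).

    corr:     correlation coefficient from step 501
    c1, c2:   bounds of the second threshold range
    pre_fenv: previous frame envelope information
    con_fenv: envelope information before the switching
    fenv:     predicted envelope information
    a1:       first weight one (fixed); first weight two b1 = 1 - a1
    a2:       second weight one; second weight two b2 = 1 - a2
    step:     weight step applied in step 507 (a set value)
    Returns (cur_fenv, updated second weight one).
    """
    in_range = c1 < corr < c2              # step 502
    if not in_range or a2 <= a1:
        # Steps 503 and 505: cur_fenv = pre_fenv * a1 + fenv * b1
        b1 = 1.0 - a1
        return pre_fenv * a1 + fenv * b1, a2

    # Step 506: cur_fenv = con_fenv * a2 + fenv * b2
    b2 = 1.0 - a2
    cur_fenv = con_fenv * a2 + fenv * b2

    # Step 507: move weight from the pre-switch envelope to the prediction.
    # The clamp at zero is an assumed guard, not stated in the text.
    a2 = max(0.0, a2 - step)
    return cur_fenv, a2
```

Because b1 and b2 are derived as one minus a1 and a2, the sketch keeps each weight pair summing to one, as the embodiment requires.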

Step 303: Generate a processed first high frequency band signal according to the first envelope information and the predicted fine structure information.

Specifically, after the first envelope information of the current frame is obtained in step 302, the required processed first high frequency band signal can be generated according to the first envelope information and the predicted fine structure information, so that the second high frequency band signal can smoothly transition to the processed first high frequency band signal.

In the speech/audio signal switching method of this embodiment, in the process of switching from a wideband speech/audio signal to a narrowband speech/audio signal, the processed first high frequency band signal of the current frame is obtained from the predicted fine structure information and the first envelope information. As a result, the second high frequency band signal of the wideband speech/audio signal before the switching can smoothly transition to the processed first high frequency band signal corresponding to the narrowband speech/audio signal, which helps improve the quality of the speech/audio signal received by the user.

Based on the foregoing technical solution, optionally, as shown in FIG. 6, step 202 in this embodiment includes:

Step 601: Determine, according to the current frame speech/audio signal and the speech/audio signal of the frame before the switching, whether the processed first high frequency band signal needs to be attenuated.

Specifically, since the first high frequency band signal of the narrowband speech/audio signal is empty, the processed first high frequency band signal recovered for the narrowband speech/audio signal may have an adverse effect after a wideband speech/audio signal is switched to a narrowband speech/audio signal. To prevent this, once the number of frames for which the narrowband speech/audio signal has been extended to a wideband signal reaches a given number of frames, the energy of the processed first high frequency band signal is attenuated frame by frame until the attenuation factor reaches a given threshold. The interval between the current frame speech/audio signal and the frame before the switching can be determined from the current frame speech/audio signal and the previous frame speech/audio signal; for example, a counter can record the number of narrowband speech/audio frames transmitted. The given number of frames may be a predetermined value greater than or equal to zero.

Step 602: Combine the processed first high frequency band signal with the first low frequency band signal into a wideband signal if attenuation is not required.

Specifically, if it is determined in step 601 that the processed first high-band signal does not need to be attenuated, the processed first high-band signal and the first low-band signal are directly combined into a wide-band signal.

Step 603: If attenuation is required, determine whether the attenuation factor corresponding to the processed first high frequency band signal is greater than a threshold.

Specifically, the initial value of the attenuation factor is one; the threshold is less than one and greater than or equal to zero. If it is determined in step 601 that the processed first high frequency band signal needs to be attenuated, it is determined in step 603 whether the attenuation factor corresponding to the processed first high frequency band signal is greater than a given threshold.

Step 604: If the attenuation factor is not greater than the given threshold, multiply the processed first high frequency band signal by the threshold, and then synthesize it with the first low frequency band signal into a wideband signal.

Specifically, if it is determined in step 603 that the value of the attenuation factor is not greater than the given threshold, the energy of the processed first high frequency band signal has already been attenuated to a sufficient extent, the processed first high frequency band signal no longer has an adverse effect, and this attenuation ratio can be maintained from then on. The processed first high frequency band signal is therefore multiplied by the threshold and then synthesized with the first low frequency band signal into a wideband signal.

Step 605: If the attenuation factor is greater than the given threshold, multiply the processed first high frequency band signal by the attenuation factor, and then synthesize it with the first low frequency band signal into a wideband signal.

Specifically, if it is determined in step 603 that the value of the attenuation factor is greater than the given threshold, the processed first high frequency band signal may still cause a poor listening effect at this attenuation factor and needs to be attenuated further until the given threshold is reached. The processed first high frequency band signal is therefore multiplied by the attenuation factor and then synthesized with the first low frequency band signal into a wideband signal.

Step 606: Modify the attenuation factor to reduce the attenuation factor.

Specifically, as transmission of the speech/audio signal continues, the influence of the speech/audio signal before the switching on subsequent narrowband speech/audio signals gradually decreases, and the attenuation factor should correspondingly be reduced gradually.
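Steps 601 to 606 amount to a small per-frame gain schedule. The Python sketch below illustrates one possible reading; the function name, the frame counter `frames_since_switch`, the hold length `hold_frames`, and the multiplicative `decay` used to reduce the attenuation factor are all assumptions, since the text only states that the factor is reduced. Synthesis of the wideband signal from the low and high bands is omitted.

```python
def attenuate_high_band(high, frames_since_switch, hold_frames,
                        factor, threshold, decay):
    """Apply steps 601-606 to one frame of the processed first high band.

    high:                list of samples of the processed first high band
    frames_since_switch: narrowband frames counted since the switching
    hold_frames:         given number of frames before attenuation starts
    factor:              attenuation factor (initial value one)
    threshold:           given threshold, 0 <= threshold < 1
    decay:               assumed multiplicative reduction per frame (0 < decay < 1)
    Returns (scaled high band, updated attenuation factor).
    """
    if frames_since_switch < hold_frames:
        # Steps 601-602: no attenuation needed yet.
        return list(high), factor

    if factor <= threshold:
        # Steps 603-604: hold the gain at the threshold from now on.
        return [x * threshold for x in high], factor

    # Step 605: scale by the current attenuation factor.
    out = [x * factor for x in high]
    # Step 606: reduce the factor for the next frame (assumed schedule).
    return out, factor * decay
```

With an initial factor of one, the first attenuated frame is passed through unchanged and the gain then falls frame by frame until it is held at the threshold.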

Based on the above technical solution, optionally, when a narrowband speech/audio signal is switched to a wideband speech/audio signal, as shown in FIG. 7, an embodiment of obtaining the processed first high frequency band signal in step 201 includes:

Step 701: Perform weighting processing according to the set fourth weight one and fourth weight two to calculate the processed first high frequency band signal. The fourth weight 1 is a weight value of the second high frequency band signal, and the fourth weight 2 is a weight value of the first high frequency band signal of the current frame audio signal.

Specifically, in the process of switching from a narrowband speech/audio signal to a wideband speech/audio signal, the high frequency band signal in the wideband speech/audio signal is not empty, whereas the high frequency band signal corresponding to the narrowband speech/audio signal is either empty or a processed high frequency band signal. To switch smoothly from the narrowband speech/audio signal to the wideband speech/audio signal, the energy of the high frequency band signal in the wideband speech/audio signal therefore needs to be attenuated. The processed first high frequency band signal is the weighted sum obtained by adding the product of the second high frequency band signal and the fourth weight one to the product of the first high frequency band signal and the fourth weight two.

Step 702: Decrease the fourth weight one by the third weight step, and increase the fourth weight two by the third weight step, until the fourth weight one is equal to zero. The sum of the fourth weight one and the fourth weight two is one.

Specifically, as transmission of the speech/audio signal continues, the influence of the narrowband speech/audio signal before the switching on subsequent wideband speech/audio signals gradually decreases. Therefore, the fourth weight one gradually decreases and the fourth weight two gradually increases until the fourth weight one becomes zero and the fourth weight two becomes one; from that point on, the transmitted speech/audio signal is always a wideband speech/audio signal.
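Steps 701 and 702 describe a frame-by-frame crossfade between the second high frequency band signal and the first high frequency band signal. The following Python sketch shows the idea under simplifying assumptions: the function name and the representation of each high band as a list of samples are hypothetical, and the fourth weight two is derived as one minus the fourth weight one so that the pair always sums to one.

```python
def crossfade_high_band(second_hb, first_hb, a4, step3):
    """One frame of steps 701-702 (hypothetical helper).

    second_hb: second high band signal (from the previous M frames)
    first_hb:  first high band signal of the current frame
    a4:        fourth weight one; fourth weight two is 1 - a4
    step3:     third weight step (a set value)
    Returns (processed first high band, updated fourth weight one).
    """
    b4 = 1.0 - a4
    # Step 701: weighted sum of the two high band signals, sample by sample.
    processed = [p * a4 + c * b4 for p, c in zip(second_hb, first_hb)]
    # Step 702: reduce the fourth weight one until it reaches zero.
    a4 = max(0.0, a4 - step3)
    return processed, a4
```

Once the fourth weight one reaches zero, the function returns the first high frequency band signal unchanged, which corresponds to plain wideband output.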

Similarly, as shown in FIG. 8, another embodiment of obtaining the processed first high-band signal by step 201 in this embodiment may further include:

Step 801: Perform weighting processing according to the set fifth weight one and fifth weight two to calculate the processed first high frequency band signal. The fifth weight one is the weight value of a set fixed parameter, and the fifth weight two is the weight value of the first high frequency band signal of the current frame speech/audio signal.

Specifically, since the first high frequency band signal of the narrowband speech/audio signal is empty, a fixed parameter may be set in place of the high frequency band signal of the narrowband speech/audio signal, where the fixed parameter is a constant greater than or equal to zero and less than the energy value of the first high frequency band signal. The processed first high frequency band signal is the weighted sum obtained by adding the product of the fixed parameter and the fifth weight one to the product of the first high frequency band signal and the fifth weight two.

Step 802: Decrease the fifth weight one by the fourth weight step, and increase the fifth weight two by the fourth weight step, until the fifth weight one is equal to zero; the sum of the fifth weight one and the fifth weight two is one.

Specifically, as transmission of the speech/audio signal continues, the influence of the narrowband speech/audio signal before the switching on subsequent wideband speech/audio signals gradually decreases. Therefore, the fifth weight one gradually decreases and the fifth weight two gradually increases until the fifth weight one becomes zero and the fifth weight two becomes one; from that point on, the transmitted speech/audio signal is always a true wideband speech/audio signal.

In the speech/audio signal switching method of this embodiment, in the process of switching from a narrowband speech/audio signal to a wideband speech/audio signal, the high frequency band signal of the wideband speech/audio signal is attenuated to obtain a processed high frequency band signal. As a result, the high frequency band signal corresponding to the narrowband speech/audio signal before the switching can smoothly transition to the processed high frequency band signal corresponding to the wideband speech/audio signal, which helps improve the quality of the speech/audio signal heard by the user.

The envelope information in this embodiment may also be replaced by other parameters capable of representing a high-band signal, such as: Linear Predictive Coding (LPC) parameters, or amplitude parameters.

A person skilled in the art can understand that all or part of the steps of the above method embodiments may be implemented by hardware related to program instructions. The program may be stored in a computer readable storage medium and, when executed, performs the steps of the foregoing method embodiments. The storage medium includes any medium that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disk.

FIG. 9 is a schematic structural diagram of Embodiment 1 of a speech/audio signal switching apparatus according to the present invention. As shown in FIG. 9, the speech/audio signal switching apparatus of this embodiment includes a processing module 91 and a first synthesizing module 92.

The processing module 91 is configured to, when switching of the speech/audio signal occurs, perform weighting processing on the first high frequency band signal of the current frame speech/audio signal and the second high frequency band signal of the previous M frames of speech/audio signal to obtain the processed first high frequency band signal, where M is greater than or equal to 1.

The first synthesis module 92 is configured to synthesize the processed first high frequency band signal with the first low frequency band signal of the current frame speech audio signal into a wide frequency band signal.

In the speech/audio signal switching apparatus of this embodiment, the processing module processes the first high frequency band signal of the current frame speech/audio signal according to the second high frequency band signal of the previous M frames of speech/audio signal, so that the second high frequency band signal can smoothly transition to the processed first high frequency band signal. In this way, the high frequency band signals of speech/audio signals of different bandwidths can transition smoothly when such signals are switched. Finally, the first synthesizing module synthesizes the processed first high frequency band signal and the first low frequency band signal into a wideband signal, and the wideband signal is transmitted to the user terminal, so that the user enjoys a high quality speech/audio signal. The speech/audio signal switching apparatus of this embodiment can switch smoothly between speech/audio signals of different bandwidths, reduces the adverse effect of an abrupt energy change on the subjective listening quality of the speech/audio signal, and improves the quality of the speech/audio signal received by the user.

FIG. 10 is a schematic structural diagram of Embodiment 2 of a speech audio signal switching apparatus according to the present invention. As shown in FIG. 10, the audio signal switching apparatus of this embodiment is based on the first embodiment of the audio signal switching apparatus. The difference is that the audio signal switching apparatus of this embodiment further includes: a second combining module 103.

The second synthesizing module 103 is configured to synthesize the first high band signal and the first low band signal into a wide band signal when the switching of the speech signal does not occur.

In the speech/audio signal switching apparatus of this embodiment, the second synthesizing module is provided so that, when no switching between speech/audio signals of different bandwidths occurs, the first low frequency band signal and the first high frequency band signal of the current frame speech/audio signal can be synthesized into a wideband signal, which helps improve the quality of the speech/audio signal received by the user.

Based on the above technical solution, optionally, when a wideband speech/audio signal is switched to a narrowband speech/audio signal, as shown in FIG. 10 and FIG. 11, the processing module 101 in this embodiment includes:

The prediction module 1011 is configured to predict the predicted fine structure information and the predicted envelope information corresponding to the first high frequency band signal.

The first generation module 1012 is configured to perform weighting processing according to the predicted envelope information and the previous M frames' envelope information corresponding to the second high frequency band signal of the previous M frames of speech/audio signal, to obtain the first envelope information corresponding to the first high frequency band signal.

The second generation module 1013 is configured to generate the processed first high frequency band signal according to the first envelope information and the predicted fine structure information.

Further, the speech/audio signal switching apparatus of this embodiment may further include a classification module 1010, configured to perform signal classification on the first low frequency band signal of the current frame speech/audio signal; the prediction module 1011 is further configured to predict the predicted fine structure information and the predicted envelope information according to the signal type of the first low frequency band signal of the current frame speech/audio signal.

In the speech/audio signal switching apparatus of this embodiment, the prediction module predicts the predicted fine structure information and the predicted envelope information corresponding to the first high frequency band signal, so that the first generation module and the second generation module can accurately generate the processed first high frequency band signal. The second high frequency band signal can thus smoothly transition to the processed first high frequency band signal, which helps improve the quality of the speech/audio signal received by the user. In addition, the classification module classifies the first low frequency band signal of the current frame speech/audio signal, and the prediction module then obtains the predicted fine structure information and the predicted envelope information according to the signal type. This makes the predicted fine structure information and the predicted envelope information more accurate, so the quality of the speech/audio signal received by the user is higher.

Based on the foregoing technical solution, optionally, as shown in FIG. 10 and FIG. 12, the first synthesizing module 102 in this embodiment includes:

The first judging module 1021 is configured to judge whether the processed first high-band signal needs to be attenuated according to the current frame speech audio signal and the speech/audio signal of the previous frame.

The third synthesizing module 1022 is configured to: if the first judging module 1021 determines that the processed first high frequency band signal does not need to be attenuated, synthesize the processed first high frequency band signal and the first low frequency band signal into a wideband signal.

The second judging module 1023 is configured to: if the first judging module 1021 determines that the processed first high frequency band signal needs to be attenuated, determine whether the attenuation factor corresponding to the processed first high frequency band signal is greater than a given threshold.

The fourth synthesizing module 1024 is configured to: if the second judging module 1023 determines that the attenuation factor is not greater than the given threshold, multiply the processed first high frequency band signal by the threshold, and then synthesize it with the first low frequency band signal into a wideband signal.

The fifth synthesizing module 1025 is configured to: if the second judging module 1023 determines that the attenuation factor is greater than the given threshold, multiply the processed first high frequency band signal by the attenuation factor, and then synthesize it with the first low frequency band signal into a wideband signal.

The first modification module 1026 is for modifying the attenuation factor to reduce the attenuation factor.

Wherein, the initial value of the attenuation factor is one; the threshold is less than one and greater than or equal to zero.

In the speech/audio signal switching apparatus of this embodiment, attenuating the processed first high frequency band signal makes the wideband signal obtained by processing the current frame speech/audio signal more accurate, which helps improve the quality of the speech/audio signal heard by the user.

Based on the foregoing technical solution, optionally, when a narrowband speech/audio signal is switched to a wideband speech/audio signal, as shown in FIG. 10 and FIG. 13a, the processing module 101 in this embodiment includes:

The first calculating module 1011a is configured to perform weighting processing according to the set fourth weight one and fourth weight two to calculate the processed first high frequency band signal; wherein, the fourth weight one is the second high frequency a weight value with a signal, and a fourth weight 2 is a weight value of the first high frequency band signal;

The second modification module 1012a is configured to reduce the fourth weight one by a third weight step, and add the fourth weight two by a third weight step, until the fourth weight one is equal to zero; wherein, the fourth weight is one The sum of the fourth weight two is one.

Similarly, when the narrowband speech and audio signals are switched to the wideband audio and video signals, as shown in FIG. 10 and FIG. 13b, the processing module 101 in this embodiment may further include:

The second calculation module 101 lb is configured to perform the second weight and the fifth weight according to the set fifth weight Performing a weighting process to calculate a processed first high-band signal; wherein, the fifth weight one is a weight value of the fixed parameter that has been set, and the fifth weight two is a weight value of the first high-band signal; The third modification module 1012b is configured to reduce the fifth weight one by the fourth weight step, and add the fifth weight two by the fourth weight step, until the fifth weight one is equal to zero; wherein, the fifth weight one The sum of the fifth weight and the second weight is one; wherein, the fixed parameter is a constant greater than or equal to zero and less than the energy value of the first high frequency band signal.

In the process of switching from the narrowband speech/audio signal to the wideband speech/audio signal, the speech/audio signal switching apparatus of this embodiment attenuates the high-band signal of the wideband speech/audio signal to obtain a processed high-band signal. As a result, the high-band signal corresponding to the narrowband speech/audio signal before the switch transitions smoothly to the processed high-band signal corresponding to the wideband speech/audio signal, which helps improve the quality of the speech/audio signal heard by the user.

It should be noted that the above embodiments are only intended to explain the technical solutions of the present invention and are not intended to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims


1. A speech/audio signal switching method, comprising:

2. The speech/audio signal switching method according to claim 1, further comprising: when no switching of the speech/audio signal occurs, synthesizing the first high-band signal and the first low-band signal into the wideband signal.

3. The speech/audio signal switching method according to claim 1 or 2, wherein, when a wideband speech/audio signal is switched to a narrowband speech/audio signal,

the performing weighting processing on the first high-band signal of the current-frame speech/audio signal and the second high-band signal of the previous M frames of speech/audio signals to obtain the processed first high-band signal specifically comprises:

predicting predicted fine structure information and predicted envelope information corresponding to the first high-band signal of the current-frame speech/audio signal;

performing weighting processing on the predicted envelope information and the previous-M-frame envelope information corresponding to the second high-band signal of the previous M frames of speech/audio signals, to obtain first envelope information corresponding to the first high-band signal; and

generating the processed first high-band signal according to the first envelope information and the predicted fine structure information.
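The last two steps above can be sketched in Python as follows. The sketch assumes that the envelope information is a list of per-sub-band gains, that the fine structure information is a list of normalized coefficients split evenly across those sub-bands, and that the high-band signal is generated by multiplying the two; none of these representations is specified in the claim, and the function name and the weight `w_prev` are hypothetical.

```python
def generate_high_band(pred_fine, pred_env, prev_env, w_prev):
    """Weight two envelopes, then shape the predicted fine structure.

    pred_fine: predicted fine structure (assumed normalized coefficients)
    pred_env:  predicted envelope, one gain per sub-band (assumed layout)
    prev_env:  envelope of the previous M frames, same layout
    w_prev:    weight of prev_env; pred_env gets (1 - w_prev)
    """
    if len(pred_env) != len(prev_env) or len(pred_fine) % len(pred_env) != 0:
        raise ValueError("envelope and fine-structure sizes do not match")
    # First envelope information: weighted sum of old and predicted envelopes.
    first_env = [w_prev * p + (1.0 - w_prev) * q
                 for p, q in zip(prev_env, pred_env)]
    # Apply each sub-band gain to the coefficients of that sub-band.
    width = len(pred_fine) // len(first_env)
    return [first_env[i // width] * x for i, x in enumerate(pred_fine)]


high_band = generate_high_band([1.0, -1.0, 0.5, 0.5], [2.0, 4.0], [4.0, 8.0], 0.5)
```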

4. The speech/audio signal switching method according to claim 3, wherein the predicting the predicted fine structure information and the predicted envelope information corresponding to the first high-band signal specifically comprises:

performing signal classification on the first low-band signal of the current-frame speech/audio signal; and

predicting the predicted fine structure information and the predicted envelope information according to the signal type corresponding to the first low-band signal.

5. The speech/audio signal switching method according to claim 3, wherein the performing weighting processing on the predicted envelope information and the previous-M-frame envelope information corresponding to the second high-band signal of the previous M frames of speech/audio signals, to obtain the first envelope information corresponding to the first high-band signal, specifically comprises:

calculating a correlation coefficient between the first low-band signal and the low-band signal of the previous N frames of speech/audio signals according to the first low-band signal and the low-band signal of the previous N frames of speech/audio signals, where N is greater than or equal to 1;

determining whether the correlation coefficient is within a given first threshold range;

if the correlation coefficient is not within the first threshold range, performing weighting processing according to the set first weight one and first weight two to calculate the first envelope information, where the first weight one is the weight value of the previous-frame envelope information corresponding to the high-band signal of the previous frame of speech/audio signal, and the first weight two is the weight value of the predicted envelope information;

if the correlation coefficient is within the first threshold range, performing weighting processing according to the set second weight one and second weight two to calculate transition envelope information, where the second weight one is the weight value of the pre-switch envelope information corresponding to the high-band signal of the L frames of speech/audio signals before the switch, the second weight two is the weight value of the previous-M-frame envelope information, and L is greater than or equal to 1;

reducing the second weight one by a first weight step, and increasing the second weight two by the first weight step;

determining whether the set third weight one is greater than the first weight one;

if the third weight one is not greater than the first weight one, performing weighting processing according to the set first weight one and first weight two to calculate the first envelope information;

if the third weight one is greater than the first weight one, performing weighting processing according to the set third weight one and third weight two to calculate the first envelope information, where the third weight one is the weight value of the transition envelope information, and the third weight two is the weight value of the predicted envelope information; and

reducing the third weight one by a second weight step, and increasing the third weight two by the second weight step, until the third weight one is equal to zero;

wherein the sum of the first weight one and the first weight two is one, the sum of the second weight one and the second weight two is one, and the sum of the third weight one and the third weight two is one; the initial value of the third weight one is greater than the initial value of the first weight one; and the first weight one and the first weight two are fixed constants.
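The branching in this claim can be followed more easily in a small Python sketch. It is an illustration under assumptions: the class name, all default weights, steps and the threshold range are hypothetical, the correlation coefficient is taken as an already computed input, and envelopes are represented as plain lists.

```python
from dataclasses import dataclass


def mix(w, a, b):
    """Weighted sum w * a + (1 - w) * b, element by element."""
    return [w * x + (1.0 - w) * y for x, y in zip(a, b)]


@dataclass
class EnvelopeSmoother:
    w1: float = 0.5           # first weight one (fixed constant)
    w2: float = 1.0           # second weight one (decreases over time)
    w3: float = 1.0           # third weight one (must start above w1)
    step1: float = 0.25       # first weight step
    step2: float = 0.25       # second weight step
    corr_range: tuple = (0.5, 1.0)  # first threshold range (assumed)

    def first_envelope(self, corr, pred_env, prev_frame_env,
                       pre_switch_env, prev_m_env):
        lo, hi = self.corr_range
        if not lo <= corr <= hi:
            # Low correlation: use the fixed first weights.
            return mix(self.w1, prev_frame_env, pred_env)
        # High correlation: build the transition envelope first.
        transition = mix(self.w2, pre_switch_env, prev_m_env)
        self.w2 = max(0.0, self.w2 - self.step1)
        if self.w3 <= self.w1:
            return mix(self.w1, prev_frame_env, pred_env)
        env = mix(self.w3, transition, pred_env)
        self.w3 = max(0.0, self.w3 - self.step2)
        return env
```

Keeping the weights as object state lets them shrink from one frame to the next, so that the output drifts from the pre-switch envelope toward the predicted envelope.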

6. The speech/audio signal switching method according to claim 3, wherein the performing weighting processing on the predicted envelope information and the previous-M-frame envelope information corresponding to the second high-band signal of the previous M frames of speech/audio signals, to obtain the first envelope information corresponding to the first high-band signal, specifically comprises:

calculating a correlation coefficient between the first low-band signal of the current frame and the low-band signal of the previous frame of speech/audio signal according to the first low-band signal of the current frame and the low-band signal of the previous frame of speech/audio signal;

determining whether the correlation coefficient is within a given second threshold range;

if the correlation coefficient is not within the second threshold range, performing weighting processing according to the set first weight one and first weight two to calculate the first envelope information, where the first weight one is the weight value of the previous-frame envelope information corresponding to the high-band signal of the previous frame of speech/audio signal, the first weight two is the weight value of the predicted envelope information, and the first weight one and the first weight two are fixed constants;

if the correlation coefficient is within the second threshold range, determining whether the set second weight one is greater than the first weight one, where the second weight one is the weight value of the pre-switch envelope information corresponding to the high-band signal of the previous frame of speech/audio signal;

if the second weight one is not greater than the first weight one, performing weighting processing according to the set first weight one and first weight two to calculate the first envelope information;

if the second weight one is greater than the first weight one, performing weighting processing according to the set second weight one and second weight two to calculate the first envelope information, where the second weight two is the weight value of the predicted envelope information; and

reducing the second weight one by a second weight step, and increasing the second weight two by the second weight step;

wherein the sum of the first weight one and the first weight two is one, and the sum of the second weight one and the second weight two is one; and the initial value of the second weight one is greater than the initial value of the first weight one.

7. The speech/audio signal switching method according to claim 3, wherein the synthesizing the processed first high-band signal and the first low-band signal of the current-frame speech/audio signal into a wideband signal specifically comprises:

determining, according to the current-frame speech/audio signal and the speech/audio signal of the frame before the switch, whether the processed first high-band signal needs to be attenuated;

if attenuation is not required, synthesizing the processed first high-band signal and the first low-band signal into the wideband signal;

if attenuation is required, determining whether the attenuation factor corresponding to the processed first high-band signal is greater than a given threshold;

if the attenuation factor is not greater than the given threshold, multiplying the processed first high-band signal by the threshold, and then synthesizing the result with the first low-band signal into the wideband signal;

if the attenuation factor is greater than the given threshold, multiplying the processed first high-band signal by the attenuation factor, and then synthesizing the result with the first low-band signal into the wideband signal; and

modifying the attenuation factor to reduce the attenuation factor;

wherein the initial value of the attenuation factor is one, and the threshold is less than one and greater than or equal to zero.
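The attenuation rule can be sketched in Python as shown below. The sketch assumes that every frame passed in has already been judged to need attenuation and that the factor is reduced by a fixed subtractive step; the claim only states that the factor is reduced, so the step and the function name are hypothetical. Synthesis with the low-band signal is left out.

```python
def attenuate_frames(high_frames, threshold=0.5, step=0.25):
    """Scale successive processed high-band frames by a shrinking gain.

    The attenuation factor starts at one. While it is above the threshold
    it is used as the gain; afterwards the threshold itself is the gain.
    The subtractive update by `step` is an assumed reduction rule.
    """
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    factor = 1.0
    attenuated = []
    for frame in high_frames:
        gain = factor if factor > threshold else threshold
        attenuated.append([gain * x for x in frame])
        factor = max(0.0, factor - step)  # reduce the factor for the next frame
    return attenuated


scaled = attenuate_frames([[4.0]] * 5)
```

The threshold acts as a floor on the gain, so the high band is never attenuated below that level.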

8. The speech/audio signal switching method according to claim 1 or 2, wherein, when a narrowband speech/audio signal is switched to a wideband speech/audio signal,

the performing weighting processing on the first high-band signal of the current-frame speech/audio signal and the second high-band signal of the previous M frames of speech/audio signals to obtain the processed first high-band signal specifically comprises:

performing weighting processing according to the set fourth weight one and fourth weight two to calculate the processed first high-band signal, where the fourth weight one is the weight value of the second high-band signal, and the fourth weight two is the weight value of the first high-band signal; and

reducing the fourth weight one by a third weight step, and increasing the fourth weight two by the third weight step, until the fourth weight one is equal to zero, where the sum of the fourth weight one and the fourth weight two is one.

9. The speech/audio signal switching method according to claim 1 or 2, wherein, when a narrowband speech/audio signal is switched to a wideband speech/audio signal,

the performing weighting processing on the first high-band signal of the current-frame speech/audio signal and the second high-band signal of the previous M frames of speech/audio signals to obtain the processed first high-band signal specifically comprises:

performing weighting processing according to the set fifth weight one and fifth weight two to calculate the processed first high-band signal, where the fifth weight one is the weight value of a preset fixed parameter, and the fifth weight two is the weight value of the first high-band signal; and

reducing the fifth weight one by a fourth weight step, and increasing the fifth weight two by the fourth weight step, until the fifth weight one is equal to zero, where the sum of the fifth weight one and the fifth weight two is one;

wherein the fixed parameter is a constant greater than or equal to zero and less than the energy value of the first high-band signal.

10. A speech/audio signal switching device, comprising:

and a first synthesizing module, configured to synthesize the processed first high-band signal and the first low-band signal of the current-frame speech/audio signal into a wideband signal.

11. The speech/audio signal switching device according to claim 10, further comprising:

a second synthesizing module, configured to synthesize the first high-band signal and the first low-band signal into the wideband signal when no switching of the speech/audio signal occurs.

12. The speech/audio signal switching device according to claim 10 or 11, wherein, when a wideband speech/audio signal is switched to a narrowband speech/audio signal, the processing module includes:

a prediction module, configured to predict predicted fine structure information and predicted envelope information corresponding to the first high-band signal of the current-frame speech/audio signal;

a first generating module, configured to perform weighting processing on the predicted envelope information and the previous-M-frame envelope information corresponding to the second high-band signal of the previous M frames of speech/audio signals, to obtain first envelope information corresponding to the first high-band signal; and

a second generating module, configured to generate the processed first high-band signal according to the first envelope information and the predicted fine structure information.

13. The speech/audio signal switching device according to claim 12, further comprising:

a classification module, configured to perform signal classification on the first low-band signal of the current-frame speech/audio signal;

wherein the prediction module is further configured to predict the predicted fine structure information and the predicted envelope information according to the signal type corresponding to the first low-band signal.

14. The speech/audio signal switching device according to claim 12, wherein the first synthesizing module comprises:

a first determining module, configured to determine, according to the current-frame speech/audio signal and the speech/audio signal of the previous frame, whether the processed first high-band signal needs to be attenuated;

a third synthesizing module, configured to: if the first determining module determines that the processed first high-band signal does not need to be attenuated, synthesize the processed first high-band signal and the first low-band signal into the wideband signal;

a second determining module, configured to: if the first determining module determines that the processed first high-band signal needs to be attenuated, determine whether the attenuation factor corresponding to the processed first high-band signal is greater than a given threshold;

a fourth synthesizing module, configured to: if the second determining module determines that the attenuation factor is not greater than the given threshold, multiply the processed first high-band signal by the threshold, and then synthesize the result with the first low-band signal into the wideband signal;

a fifth synthesizing module, configured to: if the second determining module determines that the attenuation factor is greater than the given threshold, multiply the processed first high-band signal by the attenuation factor, and then synthesize the result with the first low-band signal into the wideband signal; and

a first modification module, configured to modify the attenuation factor to reduce the attenuation factor;

wherein the initial value of the attenuation factor is one, and the threshold is less than one and greater than or equal to zero.

15. The speech/audio signal switching device according to claim 10 or 11, wherein, when a narrowband speech/audio signal is switched to a wideband speech/audio signal, the processing module includes:

a first calculating module, configured to perform weighting processing according to the set fourth weight one and fourth weight two to calculate the processed first high-band signal, where the fourth weight one is the weight value of the second high-band signal, and the fourth weight two is the weight value of the first high-band signal; and

a second modification module, configured to reduce the fourth weight one by a third weight step and increase the fourth weight two by the third weight step until the fourth weight one is equal to zero, where the sum of the fourth weight one and the fourth weight two is one.

16. The speech/audio signal switching device according to claim 13, wherein, when a narrowband speech/audio signal is switched to a wideband speech/audio signal, the processing module comprises:

a second calculating module, configured to perform weighting processing according to the set fifth weight one and fifth weight two to calculate the processed first high-band signal, where the fifth weight one is the weight value of a preset fixed parameter, and the fifth weight two is the weight value of the first high-band signal; and

a third modification module, configured to reduce the fifth weight one by a fourth weight step and increase the fifth weight two by the fourth weight step until the fifth weight one is equal to zero, where the sum of the fifth weight one and the fifth weight two is one;

wherein the fixed parameter is a constant greater than or equal to zero and less than the energy value of the first high-band signal.