CN112951252B

CN112951252B - LC3 audio code stream sound mixing method, device, medium and equipment

Info

Publication number: CN112951252B
Application number: CN202110520172.1A
Authority: CN
Inventors: 李强; 王尧; 叶东翔; 朱勇
Original assignee: Barrot Wireless Co Ltd
Current assignee: Barrot Wireless Co Ltd
Priority date: 2021-05-13
Filing date: 2021-05-13
Publication date: 2021-08-03
Anticipated expiration: 2041-05-13
Also published as: CN112951252A

Abstract

The invention discloses a mixing method for LC3 audio code streams. Partial LC3 decoding is performed on each of a plurality of LC3 audio code streams to obtain the current frame audio spectral coefficients of each stream, and these coefficients are superimposed to obtain the current frame mixed audio spectral coefficients. Partial LC3 encoding is then performed on the current frame mixed audio spectral coefficients, with the pitch delay parameter set, to obtain a mixed LC3 audio code stream. The partial LC3 decoding comprises code stream parsing, arithmetic and residual decoding, noise filling, global gain, temporal noise shaping (TNS) decoding and spectral noise shaping (SNS) decoding; the partial LC3 encoding comprises SNS encoding, TNS encoding, quantization, noise level estimation, arithmetic and residual coding, and code stream encapsulation. The invention reduces the total computing power required by the mixing server and the amount of computation in encoding and decoding, saving power and cost without degrading voice quality.

Description

LC3 audio code stream sound mixing method, device, medium and equipment

Technical Field

The present application relates to the field of Bluetooth audio encoding and decoding, and in particular to a method, an apparatus, a medium and a device for mixing LC3 audio code streams.

Background

With the large-scale commercial deployment of the LC3 codec, it has come into wide use in conference calls. Conference-call audio mixing is usually done in either a centralized or a distributed mode. In the centralized mode, buffering, decoding, mixing and encoding of all audio streams must be completed on a central mixing server, so the computational demand on that server grows with the number of participants. In both distributed and centralized mixing systems, supporting more mixing channels requires a more powerful mixing server, which raises the cost of the conference equipment. In addition, these processes perform complete decoding and encoding, and the low-delay modified discrete cosine transform (LD-MDCT) and low-delay modified inverse discrete cosine transform (LD-IMDCT) steps they include are usually implemented in fixed-point arithmetic, whose limited word length causes a loss of precision and thus degrades sound quality.

Disclosure of Invention

The invention provides a mixing method for LC3 audio code streams that omits the long-term post-filtering (LTPF) step, the low-delay modified discrete cosine transform (LD-MDCT) step and the low-delay modified inverse discrete cosine transform (LD-IMDCT) step from the LC3 encoding and decoding process. This effectively reduces the total computing power required in the mixing server and the amount of computation in encoding and decoding, saving power and cost without degrading voice quality.

To solve the above problems, the present invention adopts the following technical solution: a method for mixing LC3 audio code streams is provided, the method comprising:

performing partial LC3 decoding on each of multiple LC3 audio code streams to obtain the current frame audio spectral coefficients of each LC3 audio code stream, and superimposing the current frame audio spectral coefficients of all the LC3 audio code streams to obtain the current frame mixed audio spectral coefficients;

performing partial LC3 encoding on the current frame mixed audio spectral coefficients, and setting a pitch delay parameter, to obtain a mixed LC3 audio code stream;

wherein the partial LC3 decoding comprises code stream parsing, arithmetic and residual decoding, noise filling, global gain, temporal noise shaping (TNS) decoding and spectral noise shaping (SNS) decoding;

and the partial LC3 encoding comprises spectral noise shaping (SNS) encoding, temporal noise shaping (TNS) encoding, quantization, noise level estimation, arithmetic and residual coding, and code stream encapsulation.

The invention adopts another technical solution: an apparatus for mixing LC3 audio code streams is provided, the apparatus comprising:

a module for performing partial LC3 decoding on each of multiple LC3 audio code streams to obtain the current frame audio spectral coefficients of each LC3 audio code stream, and superimposing the current frame audio spectral coefficients of all the LC3 audio code streams to obtain the current frame mixed audio spectral coefficients; and

a module for performing partial LC3 encoding on the current frame mixed audio spectral coefficients, and setting a pitch delay parameter, to obtain a mixed LC3 audio code stream.

Another technical solution of the present invention provides a computer-readable storage medium storing computer instructions which, when executed, perform the LC3 audio code stream mixing method of the above solution.

Another technical solution of the present application provides a computer device comprising a processor and a memory, the memory storing computer instructions, and the processor executing the computer instructions to perform the LC3 audio code stream mixing method of the above solution.

The technical solution of the invention achieves the following beneficial effects: the LC3 audio code stream mixing method omits the long-term post-filtering step, the low-delay modified discrete cosine transform step and the low-delay modified inverse discrete cosine transform step from the LC3 encoding and decoding process, which effectively reduces the total computing power required in the mixing server, reduces the amount of computation in encoding and decoding, and saves power and cost without degrading voice quality.

Drawings

To illustrate the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can derive other drawings from them without inventive effort.

Fig. 1 is a schematic diagram of an embodiment of the LC3 audio code stream mixing method according to the present invention;

Fig. 2 is a schematic flowchart of a specific example of the LC3 audio code stream mixing method according to the present invention;

Fig. 3 is a schematic structural diagram of an embodiment of the LC3 audio code stream mixing apparatus according to the present invention;

Fig. 4 is a schematic structural diagram of an embodiment of the LC3 audio code stream mixing apparatus according to the present invention;

Fig. 5 is a flowchart of a specific example of the voice activity detection process according to the present invention;

Fig. 6 is a flowchart of the LC3 decoding process;

Fig. 7 is a flowchart of the LC3 encoding process.

The above figures show specific embodiments of the present application, which are described in more detail below. These drawings and the written description are not intended to limit the scope of the inventive concept in any way, but to illustrate it to those skilled in the art by reference to specific embodiments.

Detailed Description

The preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, so that those skilled in the art can more easily understand the advantages and features of the invention and the scope of protection of the invention is clearly defined.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Application scenarios of the LC3 audio code stream mixing method include speech coding and decoding, teleconferencing, music coding and decoding, and the like.

Fig. 1 is a schematic diagram illustrating an embodiment of an LC3 audio stream mixing method according to the present invention.

In this embodiment, the LC3 audio code stream mixing method mainly comprises: process S101, performing partial LC3 decoding on each of multiple LC3 audio code streams to obtain the current frame audio spectral coefficients of each stream, and superimposing these coefficients to obtain the current frame mixed audio spectral coefficients; and process S102, performing partial LC3 encoding on the current frame mixed audio spectral coefficients and setting a pitch delay parameter to obtain a mixed LC3 audio code stream.
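The data flow of processes S101 and S102 can be illustrated with a minimal Python sketch. Here `partial_decode` and `partial_encode` are hypothetical callables standing in for the retained LC3 decoder stages (code stream parsing through spectral noise shaping decoding) and encoder stages (spectral noise shaping encoding through code stream encapsulation); they are not the API of any real LC3 library. The sketch only shows that mixing takes place between them, with no LD-IMDCT, LTPF or LD-MDCT in the path, and that the pitch-present flag handed to the encoder is 0.

```python
from typing import Callable, List, Sequence

Spectrum = List[float]

def mix_frame(
    frames: Sequence[bytes],
    partial_decode: Callable[[bytes], Spectrum],
    partial_encode: Callable[[Spectrum, int], bytes],
) -> bytes:
    """Mix one frame of several LC3 streams without leaving the spectral domain."""
    # S101: partial decoding of every stream. No LD-IMDCT or LTPF is run,
    # so each result is a spectrum rather than PCM data.
    spectra = [partial_decode(frame) for frame in frames]
    # Superimpose the spectra coefficient by coefficient.
    mixed = [sum(bins) for bins in zip(*spectra)]
    # S102: partial encoding. No LTPF analysis was run, so pitch_present = 0.
    return partial_encode(mixed, 0)
```

With toy stand-ins (a "decoder" that turns each byte into a coefficient and an "encoder" that serialises its inputs), two one-frame streams `b"\x01\x02"` and `b"\x03\x04"` mix to the spectrum [4.0, 6.0].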

In the specific embodiment shown in Fig. 1, the method includes process S101, in which multiple LC3 audio code streams each undergo partial LC3 decoding to obtain their current frame audio spectral coefficients, and these are superimposed to obtain the current frame mixed audio spectral coefficients. The partial LC3 decoding comprises code stream parsing, arithmetic and residual decoding, noise filling, global gain, temporal noise shaping (TNS) decoding and spectral noise shaping (SNS) decoding. This process decodes and mixes the LC3 audio code streams to obtain the mixed spectral coefficients, that is, the current frame mixed audio spectral coefficients, and omits the LD-IMDCT and LTPF decoding steps of LC3 decoding (the low-delay modified inverse discrete cosine transform step and the long-term post-filtering step), thereby reducing the decoding computation and hence the overall computation of the mixing server.

In a specific example, referring to the flowchart of a specific example of the LC3 audio code stream mixing method shown in Fig. 2, three LC3 audio code streams S1, S2 and S3 are input for mixing. LC3 decoding module 1 first decodes them to obtain their current frame audio spectral coefficients X1, X2 and X3.

Referring to the LC3 decoding flowchart of Fig. 6, LC3 decoding module 1 comprises code stream parsing, arithmetic and residual decoding, noise filling, global gain, temporal noise shaping decoding and spectral noise shaping decoding; it takes an audio code stream as input and outputs audio spectral coefficients. LC3 decoding modules 2 and 3 are omitted in the above process. LC3 decoding module 2 is the low-delay modified inverse discrete cosine transform step, i.e. the LD-IMDCT module, which takes audio spectral coefficients as input and outputs audio PCM data. It is omitted because the LD-IMDCT is usually implemented in fixed-point arithmetic on embedded systems, where the limited word length inevitably causes a loss of precision; skipping the module effectively reduces the impact of this loss on sound quality. LC3 decoding module 3 is the long-term post-filtering step, i.e. the LTPF module, which takes audio PCM data as input and outputs filtered audio PCM data. It is omitted because it noticeably improves subjective sound quality only at low bit rates and contributes very little at high bit rates; since the Bluetooth transmission rate is sufficient for speech and audio, disabling it does not significantly reduce sound quality, and since the LTPF module is computationally complex, disabling it reduces the amount of computation.

In one specific embodiment, the current frame audio spectral coefficients of the individual LC3 audio code streams are superimposed directly, without any further processing, to obtain the current frame mixed audio spectral coefficients.
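Direct superposition is a coefficient-wise sum. The sketch below assumes, as the text implies but does not state, that all streams use the same sampling rate and frame duration, so every spectrum has the same number of coefficients (400 at 48 kHz with 10 ms frames); it rejects mismatched inputs rather than guessing how to align them.

```python
from typing import List, Sequence

def superimpose(spectra: Sequence[Sequence[float]]) -> List[float]:
    """Xmix[k] = X1[k] + X2[k] + ... for every spectral index k."""
    if not spectra:
        raise ValueError("need at least one stream")
    n = len(spectra[0])
    # Assumption: identical frame configuration for all streams.
    if any(len(x) != n for x in spectra):
        raise ValueError("all spectra must have the same number of coefficients")
    return [sum(x[k] for x in spectra) for k in range(n)]
```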

In another specific embodiment, before the superposition, voice activity detection is performed on the current frame audio spectral coefficients of each LC3 audio code stream to obtain at least one set of current speech frame audio spectral coefficients and/or at least one set of current non-speech frame audio spectral coefficients, and all the current speech frame audio spectral coefficients are superimposed to obtain the current frame mixed audio spectral coefficients. This reduces the likelihood of saturation in subsequent modules and thereby improves sound quality.

Specifically, referring to the flowchart of Fig. 2, a VAD module performs voice activity detection (VAD) on the current frame audio spectral coefficients X1, X2 and X3. Whereas prior-art voice activity detection generally operates in the time domain, the present invention performs it on the frequency-domain signal, as follows (see the flowchart of a specific example of the VAD process in Fig. 5). Whether the current frame audio spectral coefficients are a speech signal is first decided from the pitch detection result obtained for that LC3 audio code stream during decoding. If a pitch is detected, the current frame audio spectral coefficients are determined to be current speech frame audio spectral coefficients; if no pitch is detected, the speech subband energy entropy of the current frame audio spectral coefficients is calculated, and whether they are a speech signal is decided by comparing this entropy with a preset threshold.

Pitch_present is an output of LC3 decoding module 1, i.e. the pitch delay parameter, which is carried in the LC3 audio code stream and obtained during decoding. Pitch_present = 1 indicates a strong pitch component, i.e. the signal is very likely speech; Pitch_present = 0 indicates that the signal is unlikely to be speech or that this cannot be determined, so the energy entropy step is used to decide further, which improves accuracy.

In this specific example, the VAD process above yields the spectral coefficients XV1, XV2 and XV3 that go on to be superimposed: if a stream contains valid speech, its current speech frame audio spectral coefficients are added to the mix. If X1, X2 and X3 are all determined to contain valid speech (for example, Pitch_present = 1 for all three), the current frame mixed audio spectral coefficients Xmix are obtained by superposition as follows:

Xmix=XV1+XV2+XV3=X1+X2+X3

In a specific embodiment based on the above, when at least one set of current non-speech frame audio spectral coefficients is obtained, it is attenuated, and all the current speech frame audio spectral coefficients and the attenuated current non-speech frame audio spectral coefficients are superimposed to obtain the current frame mixed audio spectral coefficients. This reduces the likelihood of saturation in subsequent modules while keeping sound quality intact.

Specifically, referring to the flowchart of Fig. 2, suppose that after VAD some of X1, X2 and X3 contain valid speech and some do not: for example, X1 and X2 contain valid speech and are therefore current speech frame audio spectral coefficients, while X3 contains no valid speech and is attenuated to obtain XV3. Referring to the VAD flowchart of Fig. 5, the speech subband energy entropy of current frame audio spectral coefficients for which no pitch can be detected is calculated as follows:

(1) Calculate the energy of the low-frequency speech subbands. Taking a sampling rate of 48 kHz as an example, the effective band of the LC3 codec is 20 Hz to 20 kHz and speech mainly occupies 300 Hz to 3500 Hz; for simplicity, only 200 Hz to 3600 Hz is used when calculating subband energies. The range 0 to 4000 Hz is divided into SUBBAND_NUM = 20 subbands, each of bandwidth SUBBAND_WIDTH = 4000/SUBBAND_NUM = 200 Hz. With a spectral resolution of 50 Hz, each subband energy is computed from 4 spectral coefficients:

$\mathrm{SUBBAND\_ENERGY}(j) = \sum_{k=4(j-1)}^{4j-1} X(k)^2, \quad j = 2, 3, \ldots, 18$

SUBBAND_ENERGY(1) = 0;

SUBBAND_ENERGY(19) = 0;

SUBBAND_ENERGY(20) = 0;

where X(k), k = 0, 1, 2, 3, …, 79, are the low-frequency spectral coefficients of the current audio frame (the full set of spectral coefficients is indexed 0, 1, 2, 3, …, 399).

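(2) Calculate the total energy of the low-frequency subbands:

$E_{\mathrm{total}} = \sum_{j=1}^{20} \mathrm{SUBBAND\_ENERGY}(j)$

The effective speech band is generally considered to be 300 Hz to 3500 Hz; the invention uses 200 Hz to 3600 Hz, so subbands 1, 19 and 20 contribute zero to this sum.

(3) Calculate the speech subband energy probabilities:

$P(j) = \mathrm{SUBBAND\_ENERGY}(j) / E_{\mathrm{total}}, \quad j = 1, 2, \ldots, 20$

(4) Calculate the speech subband energy entropy:

$H = -\sum_{j:\,P(j) > 0} P(j) \log P(j)$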
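The four steps above can be sketched in Python as follows. The sketch assumes a 48 kHz, 10 ms frame (400 coefficients at 50 Hz spacing, as in the text), takes subband energy as the sum of squared coefficients, and uses the natural logarithm. Normalising the entropy by ln 17 (the number of non-zeroed subbands) so that it lies between 0 and 1 is an assumption of this sketch; the text does not state the log base or normalisation for which the 0.8 threshold was tuned. The result returned for an all-zero frame is likewise a hypothetical convention.

```python
import math
from typing import List, Sequence

SUBBAND_NUM = 20                 # 0-4000 Hz split into 200 Hz subbands
COEFFS_PER_SUBBAND = 4           # 200 Hz / 50 Hz spectral resolution
ZEROED_SUBBANDS = {1, 19, 20}    # 1-based: 0-200 Hz and 3600-4000 Hz are excluded
ACTIVE_SUBBANDS = SUBBAND_NUM - len(ZEROED_SUBBANDS)  # 17 subbands, 200-3600 Hz

def subband_energies(x: Sequence[float]) -> List[float]:
    """Step (1): SUBBAND_ENERGY(1..20), returned as a 0-based list of 20 values."""
    if len(x) < SUBBAND_NUM * COEFFS_PER_SUBBAND:
        raise ValueError("need at least the 80 low-frequency coefficients")
    energies = []
    for j in range(1, SUBBAND_NUM + 1):
        if j in ZEROED_SUBBANDS:
            energies.append(0.0)
        else:
            start = COEFFS_PER_SUBBAND * (j - 1)
            energies.append(sum(c * c for c in x[start:start + COEFFS_PER_SUBBAND]))
    return energies

def subband_entropy(energies: Sequence[float], normalize: bool = True) -> float:
    """Steps (2)-(4): total energy, probabilities P(j), entropy -sum P(j) ln P(j)."""
    total = sum(energies)                      # step (2)
    max_entropy = math.log(ACTIVE_SUBBANDS)
    if total <= 0.0:
        # Hypothetical convention: treat an all-zero frame as maximally flat.
        return 1.0 if normalize else max_entropy
    h = 0.0
    for e in energies:
        if e > 0.0:
            p = e / total                      # step (3)
            h -= p * math.log(p)               # step (4)
    return h / max_entropy if normalize else h
```

A frame whose low-band energy sits in a single subband gives entropy 0 (speech-like under a 0.8 threshold), while equal energy in all 17 active subbands gives 1.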

Whether the current frame audio spectral coefficients are a speech signal is then decided from the speech subband energy entropy and a preset threshold, which is set to 0.8 on the basis of statistics over typical speech material. If the low-frequency subband energy entropy is less than the threshold, the current frame audio spectral coefficients are determined to be current speech frame audio spectral coefficients; otherwise they are determined to be current non-speech frame audio spectral coefficients.
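The Fig. 5 decision combines the pitch flag with the entropy test. In this sketch `entropy_of_frame` is a hypothetical zero-argument callable returning the speech subband energy entropy, so the entropy is computed only when Pitch_present is 0; this laziness is a design choice of the sketch, not something the text specifies.

```python
from typing import Callable

ENTROPY_THRESHOLD = 0.8  # value from the text, tuned on typical speech material

def vad_flag(pitch_present: int,
             entropy_of_frame: Callable[[], float],
             threshold: float = ENTROPY_THRESHOLD) -> int:
    """Return vadFlag: 1 = current speech frame, 0 = current non-speech frame."""
    if pitch_present == 1:
        # A strong pitch component is signalled in the code stream: speech.
        return 1
    # No pitch: fall back to the energy entropy test (only evaluated here).
    return 1 if entropy_of_frame() < threshold else 0
```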

Attenuating the current non-speech frame audio spectral coefficients consists of multiplying them by a preset non-zero attenuation factor, for example the empirical value 0.1. Attenuation lowers the total energy after mixing and reduces the chance of overflow, which improves the user experience, while still preserving some energy so that participants who are not speaking do not seem to have dropped out of the call.

As shown in Fig. 5, whether the current frame spectral coefficients are a speech signal is determined according to vadFlag.

vadFlag = 0: the current frame audio spectral coefficients contain no speech component (typically a noise or silence signal), so all spectral coefficients, here those of X3, are multiplied by the attenuation factor α:

$XV3(k) = \alpha \cdot X3(k), \quad k = 0, 1, 2, 3, \ldots, 399$

and the result is taken as the attenuated current non-speech frame audio spectral coefficients XV3 to be superimposed in the next step;

vadFlag = 1: the current frame audio spectral coefficients are output directly as current speech frame audio spectral coefficients for superposition in the next step.

All the current speech frame audio spectral coefficients and the attenuated current non-speech frame audio spectral coefficients are superimposed to obtain the current frame mixed audio spectral coefficients Xmix, calculated as follows:

Xmix=XV1+XV2+XV3=X1+X2+XV3
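The Fig. 2 example (X1 and X2 speech, X3 attenuated) generalises to any number of streams. A minimal sketch, assuming the per-stream vadFlag values have already been decided and using the example attenuation factor 0.1 from the text:

```python
from typing import List, Sequence

ATTENUATION_FACTOR = 0.1  # empirical example value from the text; must be non-zero

def mix_with_vad(spectra: Sequence[Sequence[float]],
                 vad_flags: Sequence[int],
                 attenuation: float = ATTENUATION_FACTOR) -> List[float]:
    """Xmix = sum of speech spectra + attenuation * sum of non-speech spectra."""
    if not spectra or len(spectra) != len(vad_flags):
        raise ValueError("need one VAD flag per stream and at least one stream")
    if attenuation == 0.0:
        raise ValueError("the attenuation factor must not be zero")
    n = len(spectra[0])
    mixed = [0.0] * n
    for x, flag in zip(spectra, vad_flags):
        if len(x) != n:
            raise ValueError("all spectra must have the same length")
        gain = 1.0 if flag == 1 else attenuation  # XVi = Xi or attenuation * Xi
        for k in range(n):
            mixed[k] += gain * x[k]
    return mixed
```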

In an embodiment, the LC3 audio code stream mixing method further includes, before the partial LC3 encoding, adjusting the current frame mixed audio spectral coefficients to a predetermined fixed-point spectral coefficient representation range to obtain adjusted current frame mixed audio spectral coefficients, and performing the partial LC3 encoding on the adjusted coefficients. This adjustment reduces the likelihood of saturation in subsequent modules and improves sound quality.

Specifically, referring to the flowchart of Fig. 2, in practice a maximum value limiting method, an average weight adjustment method or another mature method may be used to bring the current frame mixed audio spectral coefficients into the predetermined fixed-point spectral coefficient representation range and obtain the adjusted current frame mixed audio spectral coefficients; the present invention does not limit the choice. In this specific example, the spectral coefficient adjustment module uses the maximum value limiting method. In practice the predetermined range is set according to the processor word length and may be, for example, 32 bits, 40 bits or 60 bits; in this example the fixed-point spectral coefficients are represented with 32 bits. If the fixed-point representation of the current frame mixed audio spectral coefficients does not exceed the predetermined range, i.e. it needs no more than 32 bits, the coefficients are used unchanged as the adjusted current frame mixed audio spectral coefficients, Xadj in Fig. 2.
If the fixed-point representation exceeds the predetermined range, i.e. it needs more than 32 bits, the coefficients are adjusted until they fit within 32 bits; the result is the adjusted current frame mixed audio spectral coefficients Xadj in Fig. 2.
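The text leaves the exact maximum value limiting rule open. One plausible reading, assumed here, is plain saturation: coefficients that fit in the predetermined word length (32 bits in this example) pass through unchanged, and larger magnitudes are clipped to the largest representable value. The coefficients are assumed to be Python integers holding the fixed-point values before narrowing; `limit_to_word_length` is a hypothetical name.

```python
from typing import List, Sequence

def limit_to_word_length(xmix: Sequence[int], word_bits: int = 32) -> List[int]:
    """Saturate fixed-point coefficients to the signed range of `word_bits` bits."""
    lo = -(1 << (word_bits - 1))       # e.g. -2**31 for 32-bit words
    hi = (1 << (word_bits - 1)) - 1    # e.g.  2**31 - 1
    # In-range values pass through unchanged; out-of-range values are clipped.
    return [min(max(v, lo), hi) for v in xmix]
```

Saturation keeps in-range coefficients exact but distorts the clipped ones; scaling the whole frame down instead avoids clipping at the cost of changing the overall level, and the text also permits other methods such as the average weight adjustment it mentions.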

In the embodiment shown in Fig. 1, the method includes process S102: performing partial LC3 encoding on the current frame mixed audio spectral coefficients and setting a pitch delay parameter to obtain a mixed LC3 audio code stream, where the partial LC3 encoding comprises spectral noise shaping (SNS) encoding, temporal noise shaping (TNS) encoding, quantization, noise level estimation, arithmetic and residual coding, and code stream encapsulation. This process partially encodes the current frame mixed audio spectral coefficients to obtain the mixed LC3 audio code stream, omitting the LD-MDCT and LTPF encoding steps of LC3 encoding (the low-delay modified discrete cosine transform step and the long-term post-filtering step) and thereby reducing the encoding computation.

Referring to the LC3 encoding flowchart of Fig. 7, LC3 encoding module 3 comprises spectral noise shaping encoding, temporal noise shaping encoding, quantization, noise level estimation, arithmetic and residual coding, and code stream encapsulation; it takes audio spectral coefficients and LTPF parameters as input and outputs an audio code stream. LC3 encoding modules 1 and 2 are omitted. LC3 encoding module 1 comprises the resampling step and the long-term post-filtering step of LC3 encoding; the long-term post-filtering step, i.e. the LTPF module, takes audio PCM data as input and outputs LTPF parameters. Module 1 is omitted because it noticeably improves subjective sound quality only at low bit rates and contributes little at high bit rates; since the Bluetooth transmission rate is sufficient for speech and audio, disabling it does not significantly reduce sound quality, and since the LTPF module is computationally complex, disabling it reduces the amount of computation. LC3 encoding module 2 is the low-delay modified discrete cosine transform step, i.e. the LD-MDCT module, which takes audio PCM data as input and outputs audio spectral coefficients. It is omitted because the LD-MDCT is usually implemented in fixed-point arithmetic on embedded systems, where the limited word length inevitably causes a loss of precision; skipping the module effectively reduces the impact of this loss on sound quality.

In an embodiment, performing partial LC3 encoding on the current frame mixed audio spectral coefficients and setting the pitch delay parameter to obtain the mixed LC3 audio code stream includes setting the pitch delay parameter to 0. This ensures that the encoded mixed LC3 audio code stream conforms to the standard LC3 audio code stream syntax.

Specifically, referring to the flowchart of Fig. 2, LC3 encoding module 3, as described above, comprises spectral noise shaping encoding, temporal noise shaping encoding, quantization, noise level estimation, arithmetic and residual coding, and code stream encapsulation, and takes audio spectral coefficients and LTPF parameters as input and outputs an audio code stream. However, because the LTPF module of LC3 encoding is omitted, no LTPF parameters are available, so the pitch delay parameter must be set to ensure that the encoded mixed audio code stream Smix conforms to the standard LC3 audio code stream syntax.

The pitch delay parameter, pitch_present, indicates whether pitch delay information is included in the audio code stream: 0 means absent and 1 means present. According to the LC3 standard, when it is absent only one bit (pitch_present = 0) is written to the audio code stream; when it is present, 11 bits are written in total: 1 bit for pitch_present = 1, followed by 10 bits consisting of a 1-bit LTPF activation flag and a 9-bit pitch delay.
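The bit budget above can be illustrated by building the field as a list of bits. The bit order (most significant bit first), the list representation and the function name are assumptions of this sketch; a real LC3 encoder packs side information according to the bitstream rules of the LC3 specification, which are not reproduced here. The mixer described above always takes the 1-bit path.

```python
from typing import List

def ltpf_field_bits(pitch_present: int, ltpf_active: int = 0, pitch_delay: int = 0) -> List[int]:
    """Pitch/LTPF side information: 1 bit if absent, 1 + 1 + 9 = 11 bits if present."""
    if pitch_present == 0:
        return [0]                          # only pitch_present = 0 is written
    if not 0 <= pitch_delay < (1 << 9):
        raise ValueError("pitch delay must fit in 9 bits")
    bits = [1, ltpf_active & 1]             # pitch_present = 1, then LTPF activation
    # 9-bit pitch delay, most significant bit first in this sketch.
    bits.extend((pitch_delay >> shift) & 1 for shift in range(8, -1, -1))
    return bits
```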

In a specific embodiment, partial LC3 encoding is performed on the adjusted current frame mixed audio spectral coefficients and the pitch delay parameter is set, to obtain the mixed LC3 audio code stream. The encoding of the adjusted coefficients is the same as in process S102 above and is not repeated here.

In practical applications, to further reduce complexity, the impulse (attack) detection module and the bandwidth detection module of LC3 encoding may also be omitted. Impulse detection is enabled only at high sampling rates and bit rates, i.e. it needs to be invoked only when one of the following two conditions holds: the sampling rate is 32 kHz and the bit rate is 64 kbps or more; or the sampling rate is 44.1 kHz or 48 kHz and the bit rate is 80 kbps or more. Voice conference calls mostly use 8 kHz and 16 kHz sampling rates, so when the LC3 audio code stream mixing method is applied to them, omitting this module does not affect voice quality. Moreover, even if the application moves to high bit rates and high sampling rates, the invention still works, because its LC3 decoder accepts standard LC3 audio code streams: for the decoded audio spectral coefficients, if impulse detection was enabled and an impulse was detected, smoothing is applied in the frequency-domain (spectral) noise shaping decoding module, and if impulse detection was not enabled, or was enabled but detected no impulse, no smoothing is applied. The main role of the bandwidth detection module is to detect cases where the actual signal bandwidth is less than the Nyquist bandwidth; these usually arise in mobile phone scenarios and generally do not occur in conference phone scenarios, so the module can be omitted.
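The two enabling conditions can be written as a predicate. This sketch uses the thresholds exactly as stated above (in bits per second), does not model any dependence on frame duration, and the function name is hypothetical. It makes the argument for the conference scenario explicit: at 8 kHz and 16 kHz the detector is never active.

```python
def impulse_detection_enabled(sample_rate_hz: int, bitrate_bps: int) -> bool:
    """True if impulse (attack) detection would need to run for this configuration."""
    if sample_rate_hz == 32000:
        return bitrate_bps >= 64000
    if sample_rate_hz in (44100, 48000):
        return bitrate_bps >= 80000
    # Lower sampling rates (e.g. 8 kHz and 16 kHz in conference calls): never enabled.
    return False
```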

Fig. 3 is a schematic diagram illustrating an embodiment of an LC3 audio stream mixing apparatus according to the present invention.

In the specific embodiment shown in fig. 3, the LC3 audio stream mixing apparatus according to the present invention includes a module 301 and a module 302.

Module 301 in Fig. 3 is a decoding and mixing module, configured to perform the partial steps of LC3 decoding on each of the multiple LC3 audio code streams to obtain the current frame audio spectral coefficients of each stream, and to superimpose these coefficients to obtain the current frame mixed audio spectral coefficients. The partial steps of LC3 decoding comprise code stream parsing, arithmetic and residual decoding, noise filling, global gain, time domain noise shaping decoding, and transform domain noise shaping decoding. In Fig. 2, module 301 corresponds to LC3 decoding module 1 and the mixing module. The module decodes and mixes the multiple LC3 audio code streams to obtain the spectral coefficients of the mixed audio, i.e., the current frame mixed audio spectral coefficients. It omits the LD-IMDCT (low-delay inverse modified discrete cosine transform) step and the LTPF (long-term post-filter) decoding step of LC3 decoding, which reduces the decoding computation and thus the overall computation of the audio mixing server.
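Mixing in the spectral domain works because the MDCT is linear: for streams with the same sampling rate and frame duration and aligned frames, adding their spectra bin by bin gives the spectrum of the summed signals. A minimal sketch of the superposition step follows, assuming each stream has already been partially decoded into a list of floats; the name `mix_spectra` is hypothetical, and the partial decoder itself is not shown.

```python
from typing import List, Sequence


def mix_spectra(spectra: Sequence[Sequence[float]]) -> List[float]:
    """Superimpose per-stream current-frame spectra bin by bin.

    Each input is assumed to be the output of the partial LC3 decode
    (up to and including the noise shaping decoding steps); no inverse
    MDCT or LTPF is applied before or after mixing.
    """
    if not spectra:
        raise ValueError("need at least one stream")
    n = len(spectra[0])
    if any(len(s) != n for s in spectra):
        raise ValueError("all streams must use the same frame configuration")
    return [sum(bins) for bins in zip(*spectra)]


print(mix_spectra([[1.0, 2.0], [0.5, -2.0]]))  # [1.5, 0.0]
```

The length check reflects the linearity argument: streams with different frame sizes or sampling rates would produce spectra with different bin counts and could not be summed directly.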

In a specific embodiment of the present invention, the decoding and mixing module further includes a module that superimposes the current frame audio spectral coefficients of each LC3 audio code stream to obtain the current frame mixed audio spectral coefficients.

In an embodiment of the present invention, referring to the schematic diagram in Fig. 4 of an embodiment of the LC3 audio code stream mixing apparatus, the decoding and mixing module 401 further includes a voice activity detection sub-module, configured to perform voice activity detection on the current frame audio spectral coefficients of each LC3 audio code stream before they are superimposed into the current frame mixed audio spectral coefficients. This sub-module appears in Fig. 2 as the VAD module; it reduces the likelihood of saturation in subsequent modules and thereby improves sound quality. After voice activity detection, the decoding and mixing module obtains at least one set of current voice frame audio spectral coefficients and/or at least one set of current non-voice frame audio spectral coefficients. It may then either superimpose all current voice frame audio spectral coefficients to obtain the current frame mixed audio spectral coefficients, or attenuate the current non-voice frame audio spectral coefficients and superimpose them together with all current voice frame audio spectral coefficients to obtain the current frame mixed audio spectral coefficients.
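The patent does not say which voice activity detector is used or how strongly non-voice frames are attenuated. The sketch below is therefore only illustrative: it substitutes a toy mean-energy threshold for a real VAD, and `ENERGY_THRESHOLD`, `NON_VOICE_GAIN`, `is_voice_frame` and `mix_with_vad` are all hypothetical names and values.

```python
from typing import List, Sequence

# Hypothetical parameters: the text specifies neither the VAD algorithm
# nor the attenuation applied to non-voice frames.
ENERGY_THRESHOLD = 1e-3  # mean energy per bin below which a frame is non-voice
NON_VOICE_GAIN = 0.25    # attenuation applied to non-voice frames


def is_voice_frame(spectrum: Sequence[float],
                   threshold: float = ENERGY_THRESHOLD) -> bool:
    """Toy spectral-energy VAD standing in for a real voice activity detector."""
    energy = sum(x * x for x in spectrum) / len(spectrum)
    return energy >= threshold


def mix_with_vad(spectra: Sequence[Sequence[float]],
                 gain: float = NON_VOICE_GAIN) -> List[float]:
    """Attenuate non-voice streams, then superimpose all streams bin by bin."""
    mixed = [0.0] * len(spectra[0])
    for spectrum in spectra:
        g = 1.0 if is_voice_frame(spectrum) else gain
        for k, x in enumerate(spectrum):
            mixed[k] += g * x
    return mixed
```

Passing `gain=0.0` reproduces the first variant, in which only the current voice frame coefficients contribute to the mix; any gain between 0 and 1 corresponds to the attenuate-and-superimpose variant.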

Module 302 in Fig. 3 is an encoding module, configured to perform the partial steps of LC3 encoding using the current frame mixed audio spectral coefficients and to set a pitch lag parameter, to obtain a mixed LC3 audio code stream. The partial steps of LC3 encoding comprise transform domain noise shaping encoding, time domain noise shaping encoding, quantization, noise level estimation, arithmetic and residual coding, and code stream encapsulation. In Fig. 2, module 302 corresponds to LC3 encoding module 3. The module encodes the current frame mixed audio spectral coefficients to obtain the mixed LC3 audio code stream, and omits the LD-MDCT (low-delay modified discrete cosine transform) step and the LTPF (long-term post-filter) encoding step of LC3 encoding, which reduces the encoding computation.

In an embodiment of the present invention, when the encoding module performs the partial steps of LC3 encoding using the current frame mixed audio spectral coefficients and sets the pitch delay parameter, it sets the pitch delay parameter to zero. This ensures that the output mixed LC3 audio code stream conforms to the standard LC3 audio code stream syntax.
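One way to picture why a zero pitch parameter keeps the stream standard: as we understand the LC3 bitstream syntax, the LTPF side information begins with a 1-bit pitch-present flag, and only when that flag is 1 do an LTPF-active bit and a 9-bit pitch index follow. Treating "setting the pitch delay parameter to 0" as clearing that flag is our interpretation, not something the text states. The sketch models side-information fields as (name, value, width) tuples and does not reproduce LC3's actual bit packing; all names are hypothetical.

```python
from typing import List, Tuple

# (field name, value, width in bits); field names are illustrative.
Field = Tuple[str, int, int]

PITCH_INDEX_BITS = 9  # assumed width of the LTPF pitch index


def ltpf_side_info(pitch_present: int, ltpf_active: int = 0,
                   pitch_index: int = 0) -> List[Field]:
    """Return the pitch/LTPF side-information fields for one frame."""
    fields: List[Field] = [("pitch_present", pitch_present, 1)]
    if pitch_present:
        if not 0 <= pitch_index < (1 << PITCH_INDEX_BITS):
            raise ValueError("pitch index out of range")
        fields.append(("ltpf_active", ltpf_active, 1))
        fields.append(("pitch_index", pitch_index, PITCH_INDEX_BITS))
    return fields


def side_info_bits(fields: List[Field]) -> int:
    """Total number of bits the fields occupy in the frame."""
    return sum(width for _, _, width in fields)


# The mixer performs no pitch analysis, so it signals "no pitch":
mixer_fields = ltpf_side_info(pitch_present=0)
print(mixer_fields, side_info_bits(mixer_fields))  # [('pitch_present', 0, 1)] 1
```

With the flag cleared, a standard decoder reads a single bit and skips long-term post-filtering for the frame, so no pitch search is needed in the mixer.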

In an embodiment of the present invention, referring to Fig. 4, the LC3 audio code stream mixing apparatus includes a spectral coefficient adjusting module 402, configured to adjust the current frame mixed audio spectral coefficients to a predetermined fixed-point spectral coefficient representation range, yielding adjusted current frame mixed audio spectral coefficients. Module 402 appears in Fig. 2 as the spectral coefficient adjustment module; adjusting the mixed coefficients reduces the likelihood of saturation in subsequent modules and thereby improves sound quality.
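Summing several streams can push the mixed coefficients past what a fixed-point encoder can represent. The text names neither the target range nor the adjustment rule, so the following is one possible adjustment under stated assumptions: a hypothetical 16-bit-style limit `FIXED_POINT_LIMIT` and power-of-two downscaling, implemented in the hypothetical function `adjust_to_range`.

```python
from typing import List, Sequence, Tuple

# Hypothetical target range; the text only specifies "a predetermined
# fixed-point spectral coefficient representation range".
FIXED_POINT_LIMIT = (1 << 15) - 1


def adjust_to_range(mixed: Sequence[int],
                    limit: int = FIXED_POINT_LIMIT) -> Tuple[List[int], int]:
    """Scale the mixed spectrum down by powers of two until it fits the range.

    Magnitudes are shifted and the sign is restored, so rounding is toward
    zero and every output satisfies -limit <= x <= limit.
    Returns the adjusted coefficients and the number of bits shifted.
    """
    peak = max((abs(x) for x in mixed), default=0)
    shift = 0
    while (peak >> shift) > limit:
        shift += 1
    adjusted = [-(abs(x) >> shift) if x < 0 else x >> shift for x in mixed]
    return adjusted, shift


print(adjust_to_range([40000, -70000, 100]))  # ([10000, -17500, 25], 2)
```

Shifting magnitudes rather than signed values avoids the asymmetry of an arithmetic right shift on negative numbers, which could otherwise land one step outside the symmetric range. Returning the shift leaves room for compensating the gain elsewhere, for example in the global gain, if an implementation chose to.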

In an embodiment of the present invention, referring to Fig. 4, the LC3 audio code stream mixing apparatus includes an encoding module 403, configured to perform the partial steps of LC3 encoding using the adjusted current frame mixed audio spectral coefficients and to set a pitch delay parameter, to obtain a mixed LC3 audio code stream. Module 403 also corresponds to LC3 encoding module 3 in Fig. 2; the difference is that the input to module 302 is the current frame mixed audio spectral coefficients without adjustment, whereas the input to module 403 is the coefficients after spectral coefficient adjustment. Module 403 encodes the adjusted coefficients to obtain the mixed LC3 audio code stream and likewise omits the LD-MDCT (low-delay modified discrete cosine transform) step and the LTPF (long-term post-filter) encoding step of LC3 encoding, reducing the encoding computation.

The working contents of the module 403 and the module 302 are the same except for different inputs, and therefore, the description thereof is omitted here.

The LC3 audio code stream mixing apparatus mainly omits the long-term post-filtering steps, the low-delay modified discrete cosine transform step, and the low-delay inverse modified discrete cosine transform step of the LC3 encoding and decoding processes. This effectively lowers the total computing power required in the mixing server, reduces the computation of encoding and decoding, and saves power and cost, all without degrading voice quality.
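The split between stages that the mixer keeps and stages it skips can be made concrete by listing them. The sketch below paraphrases the stage names from the text; the full-pipeline orderings are a simplified, non-normative view of LC3, and the names `FULL_DECODER`, `FULL_ENCODER` and `mixer_path` are hypothetical. The optional set reflects the lower-complexity variant that also drops attack and bandwidth detection.

```python
from typing import List

# Stage names paraphrase the text; the orderings are a simplified,
# non-normative view of the LC3 decoder and encoder.
FULL_DECODER = [
    "bitstream parsing", "arithmetic and residual decoding", "noise filling",
    "global gain", "TNS decoding", "SNS decoding", "LD-IMDCT", "LTPF decoding",
]
FULL_ENCODER = [
    "LD-MDCT", "bandwidth detection", "attack detection", "SNS encoding",
    "TNS encoding", "LTPF analysis", "quantization", "noise level estimation",
    "arithmetic and residual coding", "bitstream packing",
]

# Always skipped by the mixer: the transforms and long-term post-filtering.
SKIPPED = {"LD-IMDCT", "LTPF decoding", "LD-MDCT", "LTPF analysis"}
# Additionally skipped in the lower-complexity variant for conference calls.
OPTIONALLY_SKIPPED = {"bandwidth detection", "attack detection"}


def mixer_path(stages: List[str], low_complexity: bool = True) -> List[str]:
    """Return the stages the mixer still runs, in pipeline order."""
    skip = (SKIPPED | OPTIONALLY_SKIPPED) if low_complexity else SKIPPED
    return [s for s in stages if s not in skip]


print(mixer_path(FULL_DECODER))
print(mixer_path(FULL_ENCODER))
```

The remaining decoder stages end at noise shaping decoding and the remaining encoder stages start at noise shaping encoding, which is why the mixed spectrum can pass straight from one to the other without any time-domain signal being reconstructed.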

The LC3 audio code stream mixing apparatus provided by the invention can be used to execute the LC3 audio code stream mixing method described in any of the above embodiments; its implementation principle and technical effects are similar and are not repeated here.

In another embodiment of the present invention, a computer-readable storage medium stores computer instructions operable to execute the LC3 audio code stream mixing method described in any embodiment. The steps of the method may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two.

A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium.

The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), another programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.

In one embodiment of the present application, a computer device includes a processor and a memory, the memory storing computer instructions, wherein: the processor operates the computer instructions to perform the method for mixing an LC3 audio bitstream described in any of the embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

The above embodiments are merely examples and are not intended to limit the scope of the present disclosure. All equivalent structural changes made using the contents of the specification and the drawings, whether applied directly or indirectly in other related technical fields, are likewise included in the scope of the present disclosure.

Claims

1. An LC3 audio code stream mixing method, characterized in that the method comprises:

performing partial steps of LC3 decoding on each of multiple LC3 audio code streams to obtain current frame audio spectral coefficients of each LC3 audio code stream, and superposing the current frame audio spectral coefficients of each LC3 audio code stream to obtain current frame mixed audio spectral coefficients;

performing partial steps of LC3 encoding using the current frame mixed audio spectral coefficients, and setting a pitch delay parameter, to obtain a mixed LC3 audio code stream;

wherein the partial steps of LC3 decoding comprise code stream analysis, arithmetic and residual decoding, noise filling, global gain, time domain noise shaping decoding and transform domain noise shaping decoding;

and the partial steps of LC3 encoding comprise transform domain noise shaping encoding, time domain noise shaping encoding, quantization, noise level estimation, arithmetic and residual coding, and code stream encapsulation.

2. The method of mixing LC3 audio streams according to claim 1,

before the partial step of LC3 coding by using the current frame mixed audio spectral coefficient, adjusting the current frame mixed audio spectral coefficient to a preset fixed point spectral coefficient representation range to obtain an adjusted current frame mixed audio spectral coefficient;

and performing partial steps in LC3 coding by using the adjusted current frame mixed audio spectrum coefficient, and setting the pitch delay parameter to obtain the mixed LC3 audio code stream.

3. The method of mixing LC3 audio code streams according to claim 1 or 2, wherein the process of obtaining current frame mixed audio spectral coefficients by superposing the current frame audio spectral coefficients of each LC3 audio code stream includes,

before the current frame audio spectral coefficients of each LC3 audio code stream are superposed to obtain current frame mixed audio spectral coefficients, performing voice activity detection on the current frame audio spectral coefficients of each LC3 audio code stream to obtain at least one current voice frame audio spectral coefficient and/or at least one current non-voice frame audio spectral coefficient;

and superposing all the audio spectral coefficients of the current voice frame to obtain the mixed audio spectral coefficient of the current frame.

4. The method of mixing LC3 audio streams according to claim 3,

when at least one audio spectral coefficient of the current non-voice frame is obtained, performing attenuation processing on at least one audio spectral coefficient of the current non-voice frame;

and superposing all the audio spectral coefficients of the current voice frame and the attenuated audio spectral coefficients of the current non-voice frame to obtain the mixed audio spectral coefficients of the current frame.

5. The method for mixing LC3 audio code streams according to claim 1 or 2, wherein the step of performing part of LC3 encoding by using the current frame mixed audio spectral coefficients and setting the pitch lag parameter to obtain the mixed LC3 audio code stream includes,

setting the pitch delay parameter to 0.

6. An LC3 audio code stream mixing device is characterized by comprising,

a decoding and sound mixing module, configured to perform partial steps in LC3 decoding on multiple LC3 audio code streams respectively to obtain a current frame audio spectral coefficient of each LC3 audio code stream, and superimpose the current frame audio spectral coefficients of each LC3 audio code stream to obtain a current frame mixed audio spectral coefficient;

a coding module, configured to perform part of the steps in LC3 coding using the current frame mixed audio spectral coefficient, and set a pitch delay parameter to obtain a mixed LC3 audio code stream;

7. The apparatus for mixing LC3 audio streams according to claim 6,

further comprising a spectral coefficient adjusting module, configured to adjust the current frame mixed audio spectral coefficients to a preset fixed-point spectral coefficient representation range to obtain adjusted current frame mixed audio spectral coefficients;

wherein the coding module is configured to perform partial steps of LC3 coding using the adjusted current frame mixed audio spectral coefficients, and to set the pitch delay parameter, to obtain the mixed LC3 audio code stream.

8. A computer readable storage medium storing computer instructions, wherein the computer instructions are operable to perform the method for mixing LC3 audio streams according to any one of claims 1-5.

9. A computer device comprising a processor and a memory, the memory storing computer instructions, wherein the processor operates the computer instructions to perform the method of mixing LC3 audio streams of any of claims 1-5.