WO2023197967A1 - Multi-channel audio mixing method, device, and medium - Google Patents

Multi-channel audio mixing method, device, and medium

Info

Publication number
WO2023197967A1
WO2023197967A1 (PCT/CN2023/087077)
Authority
WO
WIPO (PCT)
Prior art keywords
channel
energy
audio
frame
audio data
Prior art date
Application number
PCT/CN2023/087077
Other languages
English (en)
French (fr)
Inventor
周永强 (Zhou Yongqiang)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2023197967A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation, by changing the amplitude
    • G10L21/0324 Details of processing therefor
    • G10L21/034 Automatic adjustment
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/01 Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Definitions

  • This application relates to the field of audio processing technology, and specifically to a multi-channel mixing method, device, and medium.
  • One existing approach is the Dolby downmix solution, which performs a weighted summation of the data related to the left and right channels in the multi-channel audio data to obtain two-channel audio output.
  • However, for audio data that does not conform to the Dolby specifications, for example when the energy of the bass audio data is high, using the Dolby downmix solution for channel downmixing can produce broken sound, giving the user a poor listening experience.
  • Embodiments of the present application provide a multi-channel mixing method, device, and medium, which solve the problem in current channel downmixing solutions that the downmixed audio data breaks up and degrades the user's listening experience.
  • In a first aspect, embodiments of the present application provide a multi-channel mixing method, applied to an electronic device, including: acquiring first multi-channel audio data, where the first multi-channel audio data includes audio data of M channels to be mixed; determining that the first multi-channel audio data contains audio data whose energy meets a preset energy threshold, and performing energy reduction processing on the audio data in the first multi-channel audio data whose energy is greater than the preset energy threshold; obtaining second multi-channel audio data according to the energy reduction processing result; and downmixing the second multi-channel audio data to obtain mixing output data with N mixing channels, where M > N and N ≥ 1.
  • In some embodiments, the first multi-channel audio data is the input data of the energy reduction processing and of the overall channel downmixing flow.
  • In some embodiments, the second multi-channel audio data is the output data of the energy reduction processing, which then serves as the input for channel downmixing.
  • the preset energy threshold is a preset lowest energy value that may cause audio breakage after mixing. In some embodiments, the preset energy threshold is a preset energy value that may cause audio breakage after mixing and affect the user's listening experience. This application does not limit this.
  • the first multi-channel audio data may be multi-channel audio data such as 2.1 channel, 3.1 channel, 5.1 channel, 7.1 channel, etc.
  • the mixed output data may be mono audio data, two-channel audio data, etc.
  • The mixing output data may also be other multi-channel audio data whose number of channels does not exceed that of the first multi-channel audio data.
  • The multi-channel mixing method in this application mixes multi-channel audio data (i.e., the first multi-channel audio data) into audio data with a smaller number of channels, that is, the mixing output data.
  • Specifically, the audio data of each channel is energy tracked to determine the audio data that exceeds the preset energy threshold, and energy suppression is performed on it to obtain the second multi-channel audio data after energy suppression; channel downmixing is then performed on the second multi-channel audio data.
  • The multi-channel mixing method of the embodiments of the present application can fully adapt to and support channel downmixing of various multi-channel audio data, solves the broken-sound problem caused by excessive energy of some audio frames during channel downmixing, obtains more ideal channel downmixing results, and improves the user's listening experience. A minimal code sketch of this flow is given below.
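  • As an illustration only, the following minimal numpy sketch runs this first-aspect flow end to end (frame division, energy tracking, energy reduction, weighted-sum downmix). It is not the patent's implementation: the threshold, smoothing coefficients, channel order, and weights are invented placeholders, all channels are treated as one joint detection set for brevity, and the per-step formulas follow the reconstructions given later in this document.

```python
import numpy as np

def frame_energies(frames, alpha=0.3):
    """Smoothed frame energy for one channel; frames: (num_frames, frame_len)."""
    energies, e = np.zeros(len(frames)), 0.0
    for n, frame in enumerate(frames):
        for sample in frame:                      # recursive per-sample tracking
            e = alpha * e + (1 - alpha) * sample * sample
        energies[n] = e                           # frame energy at the last sample
    return energies

def downmix(x, weights, threshold=0.25, beta=0.1, frame_len=512):
    """x: (M, num_samples) float audio in [-1, 1]; weights: (N, M) matrix.
    Returns mixing output data of shape (N, num_frames * frame_len)."""
    M, num_samples = x.shape
    num_frames = num_samples // frame_len
    frames = x[:, :num_frames * frame_len].reshape(M, num_frames, frame_len).copy()
    energies = np.stack([frame_energies(frames[i]) for i in range(M)])

    for i in range(M):                            # energy reduction per channel
        g_prev = 1.0
        for n in range(num_frames):
            lo, hi = max(0, n - 1), min(num_frames, n + 2)
            e_max = energies[:, lo:hi].max()      # joint max around frame n
            target = 1.0 if e_max <= threshold else np.sqrt(threshold / e_max)
            g = beta * g_prev + (1 - beta) * target               # frame gain
            ramp = g_prev + np.arange(1, frame_len + 1) / frame_len * (g - g_prev)
            frames[i, n] *= ramp                  # per-sampling-point gains
            g_prev = g

    second = frames.reshape(M, -1)                # second multi-channel audio data
    return weights @ second                       # weighted-sum channel downmix

x = np.random.default_rng(0).uniform(-0.5, 0.5, (6, 4096))   # M = 6 channels
w = np.array([[1, .707, .707, .707, 0, 0],        # left out  <- L, Ls, bass, center
              [0, 0, .707, .707, 1, .707]])       # right out <- bass, center, R, Rs
print(downmix(x, w).shape)                        # (2, 4096): N = 2 mixing channels
```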
  • In some embodiments, determining that there is audio data with energy greater than a preset energy threshold in the first multi-channel audio data includes: performing frame division on the first multi-channel audio data to obtain multiple audio frames and determining the frame energies of the multiple audio frames; and determining that the first multi-channel audio data contains a high-energy audio frame whose frame energy is greater than the preset energy threshold.
  • audio frames in the first multi-channel audio data whose frame energy does not exceed the preset energy threshold are low-energy audio frames, and energy reduction processing may not be performed on low-energy audio frames.
  • In some embodiments, performing energy reduction processing on the audio data in the first multi-channel audio data whose energy is greater than the preset energy threshold to obtain the second multi-channel audio data includes: determining the target gain of the high-energy audio frame, and determining the frame gain of the high-energy audio frame based on the target gain; and determining the target audio frame corresponding to the high-energy audio frame after energy reduction processing based on the frame gain of the high-energy audio frame.
  • the target gain is an energy suppression factor when performing energy reduction processing on high-energy audio frames.
  • the energy suppression factor can be used to achieve energy reduction of high-energy audio frames.
  • the low-energy audio frame may also have a target gain.
  • the target gain of the low-energy audio frame is 1, that is, no energy reduction is performed on it.
  • In some embodiments, the frame energy of the high-energy audio frame is determined by the following formula (the original formula image is not reproduced here; a standard recursive-smoothing reconstruction consistent with the symbol definitions is):

  E_i^(n)(k) = α · E_i^(n)(k-1) + (1 - α) · (x_i^(n)(k))²,   with frame energy Ē_i^(n) = E_i^(n)(L-1)

  • where the high-energy audio frame includes L sampling points; α represents the frame energy smoothing coefficient; x_i^(n)(k) represents the audio data of the k-th sampling point in the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed; E_i^(n)(k) represents the energy of the k-th sampling point in the n-th audio frame of the i-th channel to be mixed; and Ē_i^(n) represents the frame energy of the n-th audio frame of the i-th channel to be mixed.
  • L can also be other values, which is not limited by this application.
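  • A minimal sketch of this reconstructed recursion (the recursion form and the α value are assumptions based on the symbol definitions above):

```python
import numpy as np

def frame_energy(frame, e_prev=0.0, alpha=0.3):
    """Recursively smoothed energy over one frame's L sampling points."""
    e = e_prev
    for sample in frame:
        e = alpha * e + (1 - alpha) * sample * sample  # per-sample energy update
    return e  # frame energy = energy tracked at the last sampling point

print(frame_energy(np.full(512, 0.5)))  # converges to ~0.25 for a constant 0.5 signal
```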
  • the preset energy threshold includes a first threshold and/or a second threshold;
  • In some embodiments, the high-energy audio frame includes at least one of the following: among the multiple audio frames of the M channels to be mixed, an audio frame for which the average frame energy of the same-index audio frames corresponding to the same mixing channel is greater than the first threshold is a high-energy audio frame; and, among the audio frames corresponding to the same mixing channel, an audio frame for which the maximum frame energy over at least two audio frames consecutive with it is greater than the second threshold is a high-energy audio frame.
  • The index of an audio frame is the sequence number of that audio frame within any one of the M channels to be mixed; for example, the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed has index n.
  • The maximum frame energy for each audio frame of the M channels to be mixed is determined as the frame energy of the highest-energy audio frame among the audio frames that correspond to the same mixing channel and have the same index as that audio frame.
  • the target gain of a high-energy audio frame is determined based on a preset energy threshold and the maximum frame energy of at least two audio frames consecutive to each high-energy audio frame.
  • In some embodiments, the frame gain is determined by the following formula (the original formula image is not reproduced; a standard smoothing reconstruction consistent with the symbol definitions is):

  g_i^(n) = β · g_i^(n-1) + (1 - β) · ĝ_i^(n)

  • where β represents the frame gain smoothing coefficient; ĝ_i^(n) represents the target gain of the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed; g_i^(n-1) represents the frame gain of the (n-1)-th audio frame of that channel; and g_i^(n) represents the frame gain of the n-th audio frame of that channel.
  • The frame gain of a low-energy audio frame, whose frame energy in the first multi-channel audio data does not exceed the preset energy threshold, can also be calculated using the above formula, with the target gain of the low-energy audio frame set to 1.
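  • A small sketch of this frame-gain smoothing (the β value and the target-gain sequence are illustrative, not values taken from this text):

```python
def frame_gain(g_prev, g_target, beta=0.1):
    """Smooth the frame gain toward the target gain (target = 1 for low-energy frames)."""
    return beta * g_prev + (1 - beta) * g_target

g = 1.0
for target in [1.0, 0.5, 0.5, 1.0]:   # e.g. two consecutive high-energy frames
    g = frame_gain(g, target)
    print(round(g, 3))                 # 1.0, 0.55, 0.505, 0.951
```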
  • In some embodiments, determining the target audio frame corresponding to the high-energy audio frame after energy reduction processing based on the frame gain of the high-energy audio frame includes: determining the sampling point gain of each sampling point in the high-energy audio frame based on the frame gain of the high-energy audio frame; performing energy reduction processing on the audio data of each sampling point in the high-energy audio frame according to the corresponding sampling point gain, to obtain the audio data of each sampling point of the target audio frame; and generating the target audio frame from the audio data of its sampling points.
  • In some embodiments, the gain of each sampling point is determined by the following formula (the original formula image is not reproduced; a standard linear-interpolation reconstruction consistent with the symbol definitions is):

  g_i^(n)(k) = g_i^(n-1) + ((k + 1) / FrameLen) · (g_i^(n) - g_i^(n-1))

  • where FrameLen represents the frame length of the target audio frame; g_i^(n-1) represents the frame gain of the (n-1)-th audio frame of the i-th channel to be mixed among the M channels to be mixed; g_i^(n) represents the frame gain of the n-th audio frame of that channel; g_i^(n-1)(k) represents the sampling point gain of the k-th sampling point of the (n-1)-th audio frame of that channel; and g_i^(n)(k) represents the sampling point gain of the k-th sampling point of the n-th audio frame of that channel.
  • The sampling point gain of a low-energy audio frame, whose frame energy in the first multi-channel audio data does not exceed the preset energy threshold, can also be calculated using the above formula, where the frame gain of the low-energy audio frame is calculated with a target gain of 1.
  • The audio data of each sampling point in the target audio frame is determined from the audio data of the corresponding sampling point of the high-energy audio frame and the corresponding sampling point gain.
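  • A sketch of computing the sampling point gains and applying them to one frame (the linear ramp follows the reconstruction above and is an assumption):

```python
import numpy as np

def apply_frame_gain(frame, g_prev, g_curr):
    """Ramp the gain linearly across the frame from the previous frame gain
    to the current frame gain, then scale each sampling point."""
    frame_len = len(frame)
    k = np.arange(1, frame_len + 1)
    sample_gains = g_prev + k / frame_len * (g_curr - g_prev)
    return frame * sample_gains        # the energy-reduced target audio frame

print(apply_frame_gain(np.ones(8), g_prev=1.0, g_curr=0.5).round(3))
# [0.938 0.875 0.812 0.75  0.688 0.625 0.562 0.5 ]
```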
  • In some embodiments, obtaining the second multi-channel audio data according to the energy reduction processing result includes: generating the second multi-channel audio data from the target audio frames and the low-energy audio frames in the first multi-channel audio data whose energy is not greater than the preset energy threshold.
  • In some embodiments, downmixing the second multi-channel audio data to obtain mixing output data with N mixing channels includes: performing a weighted summation of the target audio frames and low-energy audio frames corresponding to the same mixing channel to obtain the mixing output data.
  • The above process of obtaining the mixing output data performs channel downmixing on the second multi-channel audio data using the Dolby downmixing method, in which the weighting coefficients of the weighted summation are preset parameters. An illustrative sketch follows.
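  • A toy sketch of this weighted summation for one left mixing channel; the 0.707 (about -3 dB) weights are a common downmix convention used here only as placeholders for the preset parameters:

```python
import numpy as np

frame = np.linspace(-0.1, 0.1, 512)                  # stand-in audio frame
left, center, bass, left_surround = frame, frame, frame, frame

left_mix = 1.0 * left + 0.707 * (center + bass + left_surround)
print(left_mix.min(), left_mix.max())                # about -0.312 to 0.312
```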
  • Embodiments of the present application further provide an electronic device, including: one or more processors; and one or more memories storing one or more programs which, when executed by the one or more processors, cause the electronic device to perform the above multi-channel mixing method.
  • embodiments of the present application provide a computer-readable storage medium. Instructions are stored on the storage medium. When the instructions are executed on a computer, they cause the computer to perform the above-mentioned multi-channel mixing method.
  • embodiments of the present application provide a computer program product, which includes a computer program/instruction that implements the above multi-channel mixing method when executed by a processor.
  • Figure 1 shows a schematic diagram of a multi-channel mixing method provided by an embodiment of the present application
  • Figure 2 shows a schematic flow chart of a mixing method for downmixing six channels into two channels according to an embodiment of the present application
  • Figure 3 shows a schematic flow chart of a multi-channel mixing method provided by an embodiment of the present application
  • Figure 4 shows a schematic flow chart of another multi-channel mixing method provided by an embodiment of the present application.
  • Figure 5 shows a schematic flow chart of an energy suppression method provided by an embodiment of the present application
  • Figure 6 shows a schematic diagram of a code stream waveform of multi-channel audio data provided by an embodiment of the present application
  • Figure 7 shows a schematic diagram of the code stream waveform and energy spectrum of the mixed channel after channel downmixing
  • Figure 8 shows a schematic diagram of the hardware structure of a mobile phone provided by an embodiment of the present application.
  • In some cases, the audio data obtained by channel downmixing suffers from sound cracking (broken sound).
  • For example, when the energy of the bass channel audio data is too high, the mixed audio data obtained by the weighted summation will also have excessive energy; when that audio data is output, the user hears broken audio, and the user's listening experience suffers.
  • this application proposes a multi-channel mixing method.
  • The method includes: the electronic device determines the audio data that exceeds the preset energy threshold in each channel of the multi-channel audio data and performs energy suppression (i.e., energy reduction) on it, thereby obtaining the suppressed audio data of each channel. Then, the suppressed multi-channel audio data can be weighted and summed based on a preset channel downmixing algorithm to obtain the audio data after channel downmixing.
  • Specifically, energy suppression can be performed by calculating the frame gain of each audio frame in each channel and reducing the frame gain through the corresponding suppression factor to obtain the suppressed audio frames, from which the suppressed audio data is obtained.
  • The preset channel downmixing method specifies the correspondence between the channels before and after channel downmixing, and the weight coefficients of the weighted summation.
  • The six channels are the left channel, left surround channel, bass channel, center channel, right channel, and right surround channel.
  • For example, the preset channel downmixing method may be: the left channel, left surround, bass, and center channels are downmixed into the left output channel; and the right channel, right surround, bass, and center channels are downmixed into the right output channel.
  • The weight coefficients in the weighted summation are the default weight coefficients of the Dolby downmix solution; a sketch of such a mapping is given below.
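  • A sketch of such a preset correspondence plus weights (the mapping matches the example above; the coefficient values are illustrative placeholders, not the actual Dolby defaults):

```python
# Pre-mix channel indexes: 0=left, 1=left surround, 2=bass, 3=center,
# 4=right, 5=right surround.
DOWNMIX = {
    "left_out":  {0: 1.0, 1: 0.707, 2: 0.707, 3: 0.707},
    "right_out": {4: 1.0, 5: 0.707, 2: 0.707, 3: 0.707},
}

def mix_sample(samples, weights):
    """samples: one sample per pre-mix channel; weights: one DOWNMIX entry."""
    return sum(w * samples[i] for i, w in weights.items())

print(mix_sample([0.1] * 6, DOWNMIX["left_out"]))   # 0.1 + 3 * 0.0707 = ~0.3121
```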
  • multi-channel audio data can be divided into multiple audio frames, and then energy tracking and energy suppression are performed based on the audio frames.
  • Specifically, through frame division, the audio data is divided into multiple segments, each of which is regarded as an audio frame; adjacent audio frames may overlap, and the frame division is the same for every channel.
  • the audio data whose energy exceeds the preset energy threshold can also be energy suppressed in other ways, which is not limited in this application.
  • It can be understood that the multi-channel mixing method performs energy tracking on the audio frames of each channel before the weighted summation of the audio data of each channel, so as to determine the audio frames that exceed the preset energy threshold and perform energy suppression on them. It can therefore fully adapt to a variety of audio data, supports channel downmixing of a variety of multi-channel audio data, solves the broken-sound problem caused by excessive energy in some audio frames during channel downmixing, obtains more ideal channel downmixing results, and improves the user's listening experience.
  • The electronic devices in the embodiments of the present application include, but are not limited to, mobile phones (including folding-screen phones), tablet computers, laptop computers, desktop computers, servers, wearable devices, head-mounted displays, mobile email devices, in-vehicle devices, portable game consoles, portable music players, reader devices, televisions, and other electronic devices with one or more processors embedded or coupled therein.
  • This application is introduced below taking a mobile phone as an example of the electronic device.
  • Figure 1 shows a schematic diagram of the application scenario of the multi-channel mixing method.
  • the scene includes a mobile phone 100 and a Bluetooth headset 200.
  • The mobile phone 100 and the Bluetooth headset 200 are wirelessly connected via Bluetooth.
  • When the user puts on the Bluetooth headset 200 and plays multi-channel audio data on the mobile phone 100, the mobile phone 100 needs to downmix the multi-channel audio data into two-channel audio data consisting of a left channel and a right channel, and send the audio data to the Bluetooth headset 200 via Bluetooth. The Bluetooth headset 200 plays the audio data after receiving it.
  • Figure 2 shows a schematic flow chart of a mixing method for downmixing six channels into two channels.
  • The six channels include the left channel A1, left surround channel A2, bass channel A3, center channel A4, right channel A5, and right surround channel A6.
  • In some embodiments, the mobile phone 100 may first perform frame division on the six-channel audio data and determine the frame energy of each audio frame. When the frame energy of an audio frame of any channel is greater than the preset energy threshold, the energy suppression factor of that audio frame is determined and energy suppression is performed on it; the energy-suppressed six-channel audio data is then obtained from the suppressed audio frames. Finally, the energy-suppressed six-channel audio data is downmixed using the preset channel downmixing algorithm to obtain two-channel audio data.
  • the two-channel audio data includes the left channel a1 and the right channel a2. Then the mobile phone 100 can send the audio data of the left channel a1 to the left earphone of the Bluetooth headset 200 , and send the audio data of the right channel a2 to the right earphone of the Bluetooth headset 200 .
  • the multi-channel mixing method in this application can not only support the downmixing of the above six channels into two channels, but also support the downmixing of any M channels into N channels, where M>N.
  • FIG. 3 is a schematic flowchart of a multi-channel mixing method provided by an embodiment of the present application.
  • The multi-channel mixing method includes the following steps:
  • Different channels in the multi-channel audio data are of different channel types, and each channel is a channel to be mixed, such as the left channel, right channel, left surround channel, right surround channel, etc. Audio data of different channels can be output through different speakers, or can be output from the same channel through channel downmixing.
  • the multi-channel audio data may be 3.1, 5.1, 7.1, etc. multi-channel audio data.
  • The 3.1 channel layout includes the left channel, bass channel, center channel, and right channel.
  • The 5.1 channel layout includes the left channel, left surround channel, bass channel, center channel, right channel, and right surround channel.
  • The 7.1 channel layout includes the left channel, left surround channel, bass channel, center channel, right channel, right surround channel, left rear surround channel, and right rear surround channel.
  • the multi-channel audio data may also include more or fewer channels than in the above examples, and this application is not limited to this.
  • the obtained multi-channel audio data also includes a mask of the multi-channel audio data.
  • The mask is used to cover the multi-channel audio data so as to select or block part of the audio data.
  • In some embodiments, the correspondence between the channels before and after mixing of the multi-channel audio data can also be determined, that is, which pre-mix channels' audio data are mixed to form the audio data of each mixing channel.
  • the multi-channel audio data may also be initialized.
  • The initialization may include determining the preset channel downmixing algorithm. Specifically, it may include determining the correspondence between the channels before and after mixing, and the weight coefficients used in the weighted summation of the multi-channel audio data to obtain the audio data of the corresponding mixing channel. It can be understood that the channels of the multi-channel audio data that correspond to the same mixing channel after channel downmixing (i.e., the same output channel) can be taken together as a joint detection channel.
  • initialization may also include initializing parameters of formulas in a preset channel downmixing algorithm or energy suppression algorithm.
  • The characteristics of multi-channel audio data, and the parameters that characterize them, change over time; that is, audio data is time-varying. Therefore, the energy tracking and energy suppression below can be performed as short-time analysis: the multi-channel audio data is divided into multiple segments, each segment being an audio frame, within which the essential characteristics remain unchanged or relatively stable.
  • multi-channel audio data can be directly divided into frames.
  • the frame length of each audio frame is 10-30 ms.
  • For example, the multi-channel audio data can be sampled to convert the continuous audio into discrete audio data, and the audio data of 512 consecutive sampling points can be composed into one audio frame, as in the sketch below.
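  • A sketch of this framing step (non-overlapping 512-sample frames for simplicity; as noted above, practical frame division may use overlap):

```python
import numpy as np

def split_into_frames(channel, frame_len=512):
    """Drop the tail that does not fill a whole frame and reshape the rest
    into (num_frames, frame_len)."""
    num_frames = len(channel) // frame_len
    return channel[:num_frames * frame_len].reshape(num_frames, frame_len)

pcm = np.random.default_rng(0).uniform(-1, 1, 48000)  # 1 s of audio at 48 kHz
print(split_into_frames(pcm).shape)                   # (93, 512)
```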
  • Step 303: Determine whether the frame energy of the m-th audio frame is greater than the preset energy threshold.
  • The preset energy threshold is set in advance. If the frame energy of the m-th audio frame is greater than the preset energy threshold, it indicates that, after channel downmixing, the part of the audio corresponding to that audio frame may break, and energy suppression needs to be performed; that is, step 304 is performed.
  • If the frame energy of the m-th audio frame is less than or equal to the preset energy threshold, the part of the audio corresponding to that audio frame will not break after channel downmixing, and there is no need to perform energy suppression; that is, step 305 is performed.
  • The index of an audio frame is the sequence number of that audio frame within any one of the channels of the multi-channel audio data. For example, the m-th audio frame of the i-th channel to be mixed among the M channels to be mixed has index m.
  • the preset energy threshold may be determined based on the minimum frame energy value that may cause the audio data to break after the channel is downmixed.
  • the preset energy threshold is -6dB or -3dB, etc.
  • The specific value can be determined based on the electronic device, the audio output device, and the multi-channel audio data; this application does not limit it.
  • In some embodiments, step 303 may calculate the average frame energy of the m-th audio frames of the jointly detected channels and judge whether that average frame energy is greater than the set energy threshold.
  • In other embodiments, step 303 may separately calculate the frame energy of the m-th audio frame of each channel and judge whether that frame energy is greater than the set energy threshold.
  • In other embodiments, the maximum frame energy among the m-th audio frames of the jointly detected channels can be taken as the frame energy of the m-th audio frame, and it is then judged whether that frame energy is greater than the set energy threshold.
  • In some embodiments, the set energy threshold includes a preliminary energy threshold (i.e., the first threshold above) and a precise energy threshold (i.e., the second threshold above). Step 303 may first make a preliminary judgment on the frame energy of the m-th audio frame: the average frame energy of the m-th audio frames of the jointly detected channels is calculated and compared with the preliminary energy threshold; if it is greater, a further refined judgment is performed.
  • The precise judgment may take the maximum frame energy among the m-th audio frames of the jointly detected channels as the frame energy of the m-th audio frame, and then determine whether the maximum frame energy among at least two audio frames consecutive with the m-th audio frame is greater than the precise energy threshold. Step 303 is further introduced below in conjunction with the formulas, and a sketch of the two-stage judgment follows.
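  • A sketch of this two-stage judgment for a single frame index m (the thresholds are illustrative linear energy values; 0.25 corresponds to roughly -6 dB relative to full scale, and the frame energies are assumed precomputed as in Formula 1 below):

```python
import numpy as np

def is_high_energy(energies, m, joint, first_thr=0.25, second_thr=0.25):
    """energies: (num_channels, num_frames) precomputed frame energies;
    joint: indexes of the channels feeding the same mixing channel."""
    # Preliminary judgment: average energy of the same-index frames.
    if energies[joint, m].mean() <= first_thr:
        return False
    # Precise judgment: maximum energy over the frame and its neighbours.
    lo, hi = max(0, m - 1), min(energies.shape[1], m + 2)
    return energies[joint, lo:hi].max() > second_thr

e = np.array([[0.1, 0.9, 0.2],
              [0.1, 0.4, 0.2]])
print(is_high_energy(e, 1, joint=[0, 1]))  # True: avg 0.65, neighbourhood max 0.9
```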
  • The frame energy of an audio frame can also be obtained by performing a Fourier transform on the multi-channel audio data to obtain its energy spectrum, and calculating the frame energy of the audio frame composed of multiple consecutive sampling points from the energy values of those sampling points. The specific calculation method is introduced below with the formulas.
  • the energy suppression algorithm is an algorithm for determining the energy-suppressed target audio frame based on the frame energy of the m-th audio frame.
  • Specifically, the energy suppression factor can be calculated from the computed frame energy of the m-th audio frame using a preset formula, and energy suppression is then performed on the m-th audio frame based on that factor to obtain the m-th target audio frame.
  • The energy suppression factor is the target gain; the energy suppression may consist of calculating the frame gain of the m-th audio frame based on the target gain, and calculating the m-th target audio frame based on the frame gain.
  • Specifically, the frame gain of the m-th audio frame can be calculated through the energy suppression factor; the gain of each sampling point in the m-th audio frame (i.e., the sampling point gain) is calculated based on the frame gain; the audio data of each sampling point in the target audio frame is then determined based on the sampling point gains, and signal reconstruction is performed to determine the audio data of the m-th target audio frame.
  • In this case, the audio corresponding to the audio frame will not break on output due to excessive energy and will not affect the user's listening experience. Therefore, there is no need to suppress the energy of the audio frame, and the audio data of the original audio frame can be retained.
  • The preset channel downmixing rules include the channel correspondence before and after channel downmixing and the channel downmixing calculation formula. That is, the channels whose target audio frames participate in the calculation of the audio data of each mixing channel can be determined through the channel correspondence before and after downmixing. Then, the audio frames determined to correspond to the same mixing channel are substituted into the corresponding channel downmix formula to obtain the mixing output data of that mixing channel.
  • the mixing output data is the audio data of one channel output to the Bluetooth headset 200 as mentioned above.
  • the Dolby downmix algorithm can be used to perform channel downmixing to obtain the mixing output data of each mixing channel.
  • Specifically, the target audio frames with corresponding indexes in each joint detection channel can be weighted and summed to obtain the corresponding mixed audio frame of the mixing output data; the mixing output data of a mixing channel is then obtained from the mixed audio frames corresponding to that mixing channel.
  • The embodiment of the present application uses the above multi-channel mixing method to track the energy of the audio data in the multi-channel audio data, performs energy suppression on the audio data with larger energy, and then performs channel downmixing on the energy-suppressed multi-channel audio data.
  • The multi-channel mixing method in the embodiments of the present application can be applied to multi-channel audio data of Dolby standards as well as to multi-channel audio data of non-Dolby standards. That is, it can be applied to a variety of multi-channel downmixing scenarios and achieves adaptive channel downmixing without breaking the sound due to excessive energy in the audio data.
  • Moreover, the multi-channel mixing method in the embodiment of the present application only performs energy suppression on the part of the audio data with higher energy, without discarding any audio data. It solves the broken-sound problem of the mix while retaining the audio data of every channel, improving the user's listening experience.
  • FIG. 4 is a schematic flowchart of another multi-channel mixing method in an embodiment of the present application.
  • As shown in Figure 4, the multi-channel audio data includes a code stream of channel 1 audio data and a code stream of channel 2 audio data. After acquiring the audio data of channel 1 and channel 2, the electronic device performs frame division on the audio data of the two channels, and each channel yields 6 audio frames.
  • Among these, some audio frames are not smooth, i.e., their energy is excessive; for example, the second and fourth audio frames of channel 2, as well as the second and third audio frames of channel 1, are energy suppressed to obtain the code streams of the suppressed target audio frames of each channel.
  • Then, the Dolby downmix algorithm can be used to perform a weighted summation of the corresponding target audio frames in the two downmixed channels to obtain the corresponding mixed audio frames of the mixing channel; the 6 mixed audio frames make up the mixing output data.
  • Figure 5 shows a flow chart of an energy suppression method in an embodiment of the present application.
  • the method includes:
  • Step 501: Acquire multi-channel data.
  • the multi-channel data obtained in step 501 is the multi-channel audio data obtained in step 301.
  • Step 501 is similar to step 301, and will not be described in detail here.
  • channel arrangement refers to the type and number of channels in multi-channel data.
  • The initialization downmixing algorithm generates, based on the channel arrangement, the correspondence between the pre-mix channels and the mixing channels, together with the formulas for calculating the data of the mixing channels.
  • generating the initialization downmixing algorithm may include determining the correspondence between the M channels and the N channels through a mask of multi-channel data. For example, when downmixing six channels to two channels, the left channel, left surround, bass, and center are downmixed to the left channel, and the right channel, right surround, bass, and center are downmixed to the right channel.
  • generating the initialization downmix algorithm further includes calculating a formula for each mixing output data in the mixing channel and initializing the parameters in the formula.
  • The channels that generate the same mixing channel during channel downmixing can be represented as a joint detection channel; that is, after the data of each channel in the joint detection channel corresponding to a mixing channel is weighted and summed, the data to be output by that mixing channel is obtained.
  • Step 503: VAD detection (voice endpoint detection).
  • the condition for VAD detection may be that the frame energy of the audio data is greater than the set energy threshold.
  • the multi-channel data may be sampled and framed before performing VAD detection.
  • the frame dividing process may include, after sampling the multi-channel data, dividing some continuous sampling points into one audio frame, for example, taking 512 sampling points as one audio frame.
  • In some embodiments, the VAD detection condition may be to detect whether the frame energy of the audio frame is greater than a preset VAD detection threshold (i.e., the first threshold above), where the VAD detection threshold may be, for example, -6 dB.
  • If the VAD detection result is true, that is, the VAD detection result is 1, it indicates that the frame energy of the audio frame has exceeded the preset VAD detection threshold.
  • In that case, broken sound may occur after the audio channels are downmixed. It can be understood that VAD detection is the preliminary judgment mentioned above, and a further precise judgment can be made based on the preliminary result.
  • Specifically, it can be jointly detected whether the average frame energy of the same-index audio frames of the channels is greater than the preset VAD detection threshold. If the average frame energy is too high, the energy of each audio frame of each channel needs to be tracked, and energy suppression is performed on the audio frames that meet the energy suppression conditions.
  • If the VAD detection result is 1, the audio frame needs to be energy tracked.
  • If the VAD detection result is false (0), no broken sound will occur after channel downmixing due to excessive frame energy, and VAD detection can proceed to the next audio frame.
  • Step 504: Calculate the frame energy, and track the maximum energy of the jointly detected channels over the previous and subsequent frames.
  • In some embodiments, the frame energy of the audio frame can be calculated by the following formula (Formula 1; the original formula image is not reproduced, so the same recursive-smoothing reconstruction as above is used):

  E_i^(n)(k) = α · E_i^(n)(k-1) + (1 - α) · (x_i^(n)(k))²,   with frame energy Ē_i^(n) = E_i^(n)(L-1)

  • where α is the smoothing coefficient used when calculating the frame energy; x_i^(n)(k) represents the input data of the k-th sampling point of the n-th audio frame in the i-th channel, that is, part of the acquired multi-channel data; i ∈ {0, 1, 2, 3, 4, 5}, where the range of i is determined by the number of channels; k is an integer between 0 and L; and the range of n is determined by the number of audio frames after the multi-channel data is divided into frames. In some embodiments, the smoothing coefficient α = 0.3.
  • In step 504, by calculating the frame energy, the frame energy of each channel's data can be tracked; when the tracked frame energy of an audio frame is greater than the set detection threshold, step 505 can be performed.
  • Specifically, the audio frame with the highest energy among the corresponding audio frames of the joint detection channels can be determined, and the maximum energy value over the previous and subsequent audio frames can be determined through the following formula (reconstructed; the original image is not reproduced):

  E_max^(n) = max over i in the joint detection channels of max( Ē_i^(n-1), Ē_i^(n), Ē_i^(n+1) )
  • Step 505: Calculate the target gain and frame gain, then calculate the gain of each sampling point, multiply the samples by the corresponding gains, and output the result.
  • the output result is the input data when performing channel downmixing.
  • Specifically, it is determined whether the energy of the current n-th audio frame exceeds the set detection threshold (i.e., the second threshold above). If it does, the audio frame risks breaking after the channels are downmixed; energy suppression is required, and the target gain and frame gain of the audio frame can be calculated to determine the energy suppression factor.
  • The comparison against the set detection threshold is the precise judgment of each channel's frame energy, which determines whether audio frames with excessive energy need to be suppressed.
  • the target gain can be understood as an energy suppression factor, which is used to reduce the frame gain, thereby achieving the purpose of reducing the energy of the audio frame.
  • In some embodiments, the target gain can be calculated by the following formula (reconstructed from the description that the target gain depends on the set detection threshold E_thr and the maximum neighbouring frame energy E_max^(n); the square root is an assumption, used because the gain scales amplitude while the thresholds are energies):

  ĝ_i^(n) = 1 if E_max^(n) ≤ E_thr, otherwise sqrt( E_thr / E_max^(n) )

  • In some embodiments, the frame gain of the audio frame can be determined by the following formula (the same smoothing reconstruction as above):

  g_i^(n) = β · g_i^(n-1) + (1 - β) · ĝ_i^(n)

  • In some embodiments, the smoothing coefficient β = 0.1.
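  • A sketch of this target-gain computation under the square-root reconstruction above (the threshold and energy values are illustrative):

```python
import numpy as np

def target_gain(e_max, threshold):
    """Energy suppression factor: unity below the threshold, otherwise an
    attenuation that pulls the frame back toward the threshold energy."""
    return 1.0 if e_max <= threshold else float(np.sqrt(threshold / e_max))

print(target_gain(0.9, 0.5))   # ~0.745: suppress a hot frame
print(target_gain(0.3, 0.5))   # 1.0: leave a quiet frame untouched
```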
  • In some embodiments, the following formula (Formula 6; reconstructed as a linear interpolation between successive frame gains) can be used to calculate the sampling point gain of each sampling point in the audio frame:

  g_i^(n)(k) = g_i^(n-1) + ((k + 1) / FrameLen) · (g_i^(n) - g_i^(n-1))
  • FrameLen represents the frame length of the audio frame. For example, if 512 sampling points are taken as an audio frame, the frame length of the audio frame is 512.
  • The sampling point gain of each sampling point is related to the frame gain of its corresponding audio frame and to the sampling point gain at the corresponding index of the previous audio frame.
  • the index of each sampling point is the serial number of each sampling point in the corresponding audio frame. For example, the index of the k-th sampling point in an audio frame is k.
  • In some embodiments, the target audio frame calculated from the sampling point gains obtained by Formula 6 is given by (reconstructed):

  x̂_i^(n)(k) = g_i^(n)(k) · x_i^(n)(k)

  • where x̂_i^(n)(k) represents the audio data of the k-th sampling point of the n-th target audio frame of the i-th channel, and x_i^(n)(k) represents the audio data of the k-th sampling point of the n-th input audio frame of the i-th channel, that is, x_i^(n)(k) in Formula 1.
  • After the above processing, the audio data of the consecutive target audio frames can be obtained; the audio data of each target audio frame can then be used as input data for channel downmixing.
  • In some embodiments, the channel downmixing can be performed according to the following formula (reconstructed from the symbol definitions):

  a_j^(n) = Σ over i in A_j of w_{i,j}^(n) · x̂_i^(n)

  • where a_j^(n) represents the output result of the n-th audio frame of the j-th mixing channel; w_{i,j}^(n) represents the mixing weight of the n-th audio frame of the i-th channel for the j-th mixing channel; x̂_i^(n) represents the input data of the n-th audio frame of the i-th channel when performing channel downmixing; and A_j ⊆ {0, 1, 2, 3, 4, 5} represents the set of pre-mix channels corresponding to the j-th mixing channel.
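  • A compact sketch of this final weighted summation over the target audio frames (the channel order, weights, and array shapes are invented for illustration):

```python
import numpy as np

# x_hat: (M, num_frames, frame_len) energy-suppressed target audio frames.
rng = np.random.default_rng(0)
x_hat = rng.uniform(-0.1, 0.1, (6, 4, 512))

# w[j, i] is the mixing weight of channel i for mixing channel j
# (zero when channel i does not feed mixing channel j).
w = np.zeros((2, 6))
w[0, [0, 1, 2, 3]] = [1.0, 0.707, 0.707, 0.707]   # left out  <- L, Ls, bass, center
w[1, [4, 5, 2, 3]] = [1.0, 0.707, 0.707, 0.707]   # right out <- R, Rs, bass, center

a = np.einsum('ji,ink->jnk', w, x_hat)            # a_j = sum_i w_ij * x_hat_i
print(a.shape)                                    # (2, 4, 512): N = 2 mixing channels
```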
  • the multi-channel mixing method in the embodiment of the present application is simulated below with reference to Figures 6 and 7 .
  • The simulation software is CLion.
  • Figure 6 shows a schematic diagram of a code stream waveform of multi-channel audio data in an embodiment of the present application.
  • Figure 7 shows a schematic diagram of the code stream waveform and energy spectrum of the mixed channel after channel downmixing.
  • the six code stream waveforms in the figure represent the left channel, right channel, center channel, bass channel, left surround channel, and right surround channel from top to bottom.
  • In Figure 6, the abscissa represents time, and the ordinate represents the amplitude of the audio data.
  • In some embodiments, each sampling point is represented by 16 bits. Therefore, when a fixed-coefficient downmixing scheme is used, the weighted sum over the channels must stay within the range representable by 16 bits if the mixing result is not to break; otherwise the data will overflow and wrap around, producing data jumps that are heard as noise. A short demonstration follows.
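  • A small demonstration of that wraparound (numpy's int16 array arithmetic wraps silently, which is exactly the data jump described here):

```python
import numpy as np

left = np.array([30000], dtype=np.int16)
surround = np.array([10000], dtype=np.int16)
print(left + surround)   # [-25536]: 40000 overflows 16 bits and wraps, a loud click

mixed = np.clip(left.astype(np.int32) + surround, -32768, 32767).astype(np.int16)
print(mixed)             # [32767]: saturating instead of wrapping avoids the jump
```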
  • When the Dolby downmix coefficients are used for mixing, part of the data in the box in Figure 6 has a large energy peak.
  • To avoid overflow in that case, the channel downmix coefficients would need to be made very small. At the same time, since volume is positively related to energy, the volume of the corresponding mixing channel would then decrease, making the overall volume of the downmixed audio lower and affecting the user's listening experience.
  • the first line of code stream waveforms is the code stream waveform of the audio data of the left channel mixing channel obtained by using the multi-channel mixing method in this application.
  • The second line of code stream waveforms shows the code stream waveform of the audio data of the right mixing channel obtained using the Dolby mixing method.
  • In both, the abscissa represents time, and the ordinate represents the amplitude of the audio data.
  • The third row corresponds to the audio data of the left mixing channel in the first row and shows the energy spectrum of that audio data.
  • The fourth row corresponds to the right mixing channel audio data in the second row and shows the energy spectrum of the audio data of the right mixing channel.
  • the abscissa represents time and the ordinate represents energy.
  • As shown in Figure 7, the code stream waveform of the audio data processed by the multi-channel mixing method of the embodiment of the present application has smoother envelope changes and rarely shows large fluctuations; its energy stripes are also relatively stable and do not become too high or spread across the entire frequency domain.
  • In contrast, the code stream waveform of the audio data of the right mixing channel that was not processed by the multi-channel mixing method of this application has portions of larger energy, such as the audio data in the box in Figure 7, whose energy stripes spread across the entire frequency domain, causing obvious sound breakage and obvious noise.
  • In summary, the multi-channel mixing method in the embodiment of the present application tracks the energy of the audio data of each channel and suppresses the energy of audio data with excessive energy, without losing any audio data; this reduces the risk of broken sound in the mix and improves the user's listening experience.
  • FIG. 8 shows a schematic diagram of the hardware structure of a mobile phone 100 according to an embodiment of the present application.
  • The mobile phone 100 can execute the multi-channel mixing method provided by the embodiments of the present application.
  • The mobile phone 100 may include a processor 110, a power module 140, a memory 180, a camera 101, a mobile communication module 130, a wireless communication module 120, a sensor module 190, an audio module 150, an interface module 160, a display screen 102, etc.
  • the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the mobile phone 100 .
  • In some embodiments, the mobile phone 100 may include more or fewer components than shown in the figures, some components may be combined or separated, or the components may be arranged differently.
  • the components illustrated may be implemented in hardware, software, or a combination of software and hardware.
  • The processor 110 may include one or more processing units, for example processing modules or processing circuits such as a central processing unit (CPU), a graphics processing unit (GPU), an image signal processor (ISP), a digital signal processor (DSP), a micro-controller (Micro-programmed Control Unit, MCU), an artificial intelligence (AI) processor, or a programmable logic device such as a field-programmable gate array (FPGA). Different processing units can be independent devices or integrated into one or more processors.
  • the processor 110 can be used to determine whether the energy of the m-th audio frame is greater than the set energy threshold, and calculate the energy suppression factor.
  • the processor 110 may also be used to perform channel downmixing on the obtained target audio frame to obtain mixing output data.
  • The memory 180 can be used to store data, software programs, and modules. It can be volatile memory, such as random access memory (RAM); non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); a combination of the above types of memory; or a removable storage medium, such as a Secure Digital (SD) memory card.
  • the memory 180 is used to store multi-channel audio data of the mobile phone 100 and a preset channel downmixing algorithm.
  • Power module 140 may include a power supply, power management components, and the like.
  • the power source can be a battery.
  • the power management component is used to manage the charging of the power supply and the power supply from the power supply to other modules.
  • the charging management module is used to receive charging input from the charger; the power management module is used to connect the power supply, the charging management module and the processor 110 .
  • The mobile communication module 130 may include, but is not limited to, an antenna, a power amplifier, a filter, a low-noise amplifier (Low Noise Amplifier, LNA), etc.
  • the mobile communication module 130 can provide wireless communication solutions including 2G/3G/4G/5G applied on the mobile phone 100.
  • the mobile communication module 130 can receive electromagnetic waves through an antenna, perform filtering, amplification and other processing on the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 130 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves through the antenna for radiation.
  • at least part of the functional modules of the mobile communication module 130 may be disposed in the processor 110 .
  • at least part of the functional modules of the mobile communication module 130 may be provided in the same device as at least part of the modules of the processor 110 .
  • the wireless communication module 120 may include an antenna, and implements the transmission and reception of electromagnetic waves via the antenna.
  • The wireless communication module 120 can provide wireless communication solutions applied on the mobile phone 100, including Wireless Local Area Network (WLAN) (such as a Wireless Fidelity (Wi-Fi) network), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and others.
  • the mobile phone 100 can communicate with the network and other devices through wireless communication technology.
  • the mobile communication module 130 and the wireless communication module 120 of the mobile phone 100 may also be located in the same module.
  • Camera 101 is used to capture still images or video.
  • the object passes through the lens to produce an optical image that is projected onto the photosensitive element.
  • the photosensitive element converts the optical signal into an electrical signal, and then passes the electrical signal to the ISP (Image Signal Processor) to convert it into a digital image signal.
  • the mobile phone 100 can realize the shooting function through the ISP, camera 101, video codec, GPU (Graphic Processing Unit, graphics processor), display screen 102 and application processor.
  • In some embodiments, the camera 101 is used to collect face images and QR code images, which the mobile phone 100 uses to perform face recognition, QR code recognition, etc.
  • Display 102 includes a display panel.
  • The display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro OLED, quantum dot light-emitting diodes (QLED), or the like.
  • In some embodiments, the display screen 102 is used to display the various UI interfaces of the mobile phone 100 in landscape or portrait orientation, in modes such as split screen, parallel view, or a single app occupying the full screen.
  • the sensor module 190 may include a proximity light sensor, a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
  • the audio module 150 may convert digital audio information into an analog audio signal output, or convert an analog audio input into a digital audio signal. Audio module 150 may also be used to encode and decode audio signals. In some embodiments, the audio module 150 may be disposed in the processor 110 , or some functional modules of the audio module 150 may be disposed in the processor 110 .
  • the interface module 160 includes an external memory interface, a universal serial bus (Universal Serial Bus, USB) interface, a subscriber identification module (Subscriber Identification Module, SIM) card interface, etc.
  • the external memory interface can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the mobile phone 100.
  • the external memory card communicates with the processor 110 through the external memory interface to implement data storage functions.
  • the universal serial bus interface is used by the mobile phone 100 to communicate with other mobile phones.
  • the user identity module card interface is used to communicate with the SIM card installed in the mobile phone 100, such as reading the phone number stored in the SIM card, or writing the phone number into the SIM card.
  • the mobile phone 100 also includes buttons, motors, indicators, etc.
  • the keys may include volume keys, on/off keys, etc.
  • the motor is used to make the mobile phone 100 produce a vibration effect.
  • Indicators may include laser pointers, radio frequency indicators, LED indicators, etc.
  • Embodiments of the mechanisms disclosed in this application may be implemented in hardware, software, firmware, or a combination of these implementation methods.
  • Embodiments of the present application may be implemented as a computer program or program code executing on a programmable system including at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements) , at least one input device and at least one output device.
  • Program code may be applied to input instructions to perform the functions described herein and to generate output information.
  • Output information can be applied to one or more output devices in a known manner.
  • For the purposes of this application, a processing system includes any system having a processor such as, for example, a digital signal processor (DSP), a microcontroller, an application-specific integrated circuit (ASIC), or a microprocessor.
  • Program code may be implemented in a high-level procedural language or an object-oriented programming language to communicate with the processing system. When necessary, assembly language or machine language can also be used to implement the program code. In fact, the mechanisms described in this application are not limited in scope to any particular programming language. In either case, the language may be a compiled or an interpreted language. In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried on or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed over a network or through other computer-readable media.
  • Thus, machine-readable media may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy disks, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memory (ROM), random access memory (RAM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), magnetic or optical cards, flash memory, or tangible machine-readable storage used in transmitting information over the Internet via electrical, optical, acoustic, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.).
  • machine-readable media includes any type of machine-readable media suitable for storing or transmitting electronic instructions or information in a form readable by a machine (eg, computer).
  • the technical solution of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium stores instructions which, when executed on the electronic device 100, cause the electronic device 100 to execute the multi-channel mixing method provided by the technical solutions of this application.
  • the technical solution of this application also provides a computer program product.
  • the computer program product includes instructions, and the instructions are used to implement the multi-channel mixing method provided by the technical solutions of this application.
  • the technical solution of the present application also provides a chip device.
  • the chip device includes: a communication interface for inputting and/or outputting information; and a processor for executing computer-executable programs, so that a device equipped with the chip device executes the multi-channel mixing method provided by the technical solutions of this application.
  • each unit/module mentioned in the device embodiments of this application is a logical unit/module.
  • physically, a logical unit/module may be one physical unit/module, part of one physical unit/module, or implemented as a combination of multiple physical units/modules.
  • the physical implementation of these logical units/modules is not what matters most; the combination of functions implemented by these logical units/modules is the key to solving the technical problem raised by this application.
  • the above device embodiments of this application do not introduce units/modules that are not closely related to solving the technical problem raised by this application, which does not mean that other units/modules do not exist in the above device embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

This application relates to the field of audio processing technology, and specifically to a multi-channel audio mixing method, device, and medium. The method includes: obtaining first multi-channel audio data, the first multi-channel audio data including audio data of M channels to be mixed; determining that the first multi-channel audio data contains audio data whose energy meets a preset energy threshold, and performing energy-reduction processing on the audio data in the first multi-channel audio data whose energy is greater than the preset energy threshold; obtaining second multi-channel audio data according to the result of the energy-reduction processing; and downmixing the second multi-channel audio data to obtain mixed output data having N mixed channels, where M>N and N≥1. The multi-channel mixing method provided in the embodiments of this application can solve the clipping problem that arises during channel downmixing when the energy of some audio frames is too high, yielding a more desirable downmix result and improving the user's listening experience.

Description

Multi-channel audio mixing method, device, and medium
This application claims priority to the Chinese patent application No. 202210414876.5, filed with the Chinese Patent Office on April 15, 2022 and entitled "Multi-channel audio mixing method, device, and medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of audio processing technology, and specifically to a multi-channel audio mixing method, device, and medium.
Background
With the rapid development of modern technology, in many scenarios requiring audio playback, a mismatch between the number of channels of the audio data and of the audio output device often makes it necessary to perform real-time multi-channel mixing at output time, generally converting multi-channel audio data into audio data with fewer channels, i.e., channel downmixing. For example, when playing AiMax content on a large screen, multi-channel audio data such as 3.1, 5.1, or 7.1 may be present, but when the large-screen output device switches to Sony/Philips Digital Interface (S/PDIF), Audio Return Channel (ARC), or Bluetooth output, only two channels may be output. To retain as much of the audio stream's information as possible, the data of the multiple channels must be downmixed to generate two-channel data.
Currently, schemes for downmixing multi-channel audio to two channels generally adopt one of the following two approaches:
1) Use the first two channels of the multi-channel data as output and discard the center, surround, and bass channels. Because some vocal audio appears in the discarded channels, this approach loses vocals at output time, and using only two channels as output degrades the user's listening experience.
2) Use the Dolby downmix scheme, which performs a weighted summation of the data related to the left and right channels in the multi-channel audio data to obtain two-channel output. However, for audio data that does not conform to the Dolby specification, for example when the energy of the bass audio data is high, channel downmixing with the Dolby scheme produces clipping, giving the user a poor listening experience.
Summary
Embodiments of this application provide a multi-channel audio mixing method, device, and medium, solving the problem in current channel-downmix schemes that the downmixed audio data clips and degrades the user's listening experience.
In a first aspect, an embodiment of this application provides a multi-channel mixing method, applied to an electronic device, including: obtaining first multi-channel audio data, the first multi-channel audio data including audio data of M channels to be mixed; determining that the first multi-channel audio data contains audio data whose energy meets a preset energy threshold, and performing energy-reduction processing on the audio data in the first multi-channel audio data whose energy is greater than the preset energy threshold; obtaining second multi-channel audio data according to the result of the energy-reduction processing; and downmixing the second multi-channel audio data to obtain mixed output data having N mixed channels, where M>N and N≥1.
It can be understood that the first multi-channel audio data is the input data for channel downmixing, and the second multi-channel audio data is the energy-reduced data that serves as the input to the downmix.
In some embodiments, the preset energy threshold is a preconfigured lowest energy value that could cause clipping in the mixed audio. In some embodiments, the preset energy threshold is some other preconfigured energy value at which the mixed audio could clip and affect the user's listening experience. This application places no limitation on this.
In some embodiments, the first multi-channel audio data may be 2.1-channel, 3.1-channel, 5.1-channel, 7.1-channel, or other multi-channel audio data, and the mixed output data may be mono audio data, two-channel audio data, or other multi-channel audio data with no more channels than the first multi-channel audio data.
It can be understood that the multi-channel mixing method in this application mixes multi-channel audio data (i.e., the first multi-channel audio data) into audio data with fewer channels, i.e., the mixed output data. Before downmixing the channels, the energy of each channel's audio data is tracked to identify audio data exceeding the preset energy threshold, which is then energy-suppressed to obtain the energy-suppressed second multi-channel audio data; the second multi-channel audio data is then downmixed. The multi-channel mixing method of the embodiments of this application can fully adapt to and support channel downmixing of a wide variety of multi-channel audio data, can solve the clipping problem caused during channel downmix by excessively high energy in some audio frames, yields a more desirable downmix result, and improves the user's listening experience.
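As an illustrative, non-limiting sketch of the overall flow just described, the following Python code chains the three stages (energy tracking, suppression, weighted downmix) for one output channel; the function name, the linear-domain threshold, and the loop structure are assumptions for illustration, not part of this application:

```python
import numpy as np

def downmix_with_limiting(channels, weights, thresh=0.5, alpha=0.1, beta=0.3):
    """Energy-track each channel's frames, attenuate frames whose smoothed
    energy exceeds `thresh`, then weighted-sum the suppressed channels."""
    out = np.zeros_like(channels[0])
    for x, w in zip(channels, weights):          # x: (num_frames, frame_len)
        e_prev, g_prev = 0.0, 1.0
        y = np.empty_like(x)
        for n, frame in enumerate(x):
            e_prev = beta * e_prev + (1 - beta) * np.mean(frame ** 2)
            g_target = 1.0 if e_prev < thresh else thresh / e_prev
            g_prev = alpha * g_target + (1 - alpha) * g_prev  # smoothed gain
            y[n] = g_prev * frame
        out += w * y
    return out
```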
In a possible implementation of the first aspect above, determining that the first multi-channel audio data contains audio data with energy greater than the preset energy threshold includes: performing framing on the first multi-channel audio data to obtain multiple audio frames, and determining the frame energy of the multiple audio frames; and determining that the first multi-channel audio data contains high-energy audio frames whose frame energy is greater than the preset energy threshold.
It can be understood that, in some embodiments, audio frames in the first multi-channel audio data whose frame energy does not exceed the preset energy threshold are low-energy audio frames, and energy-reduction processing may be skipped for low-energy audio frames.
In a possible implementation of the first aspect above, performing energy-reduction processing on the audio data in the first multi-channel audio data whose energy is greater than the preset energy threshold to obtain the second multi-channel audio data includes: determining a target gain of a high-energy audio frame, and determining a frame gain of the high-energy audio frame according to the target gain; and determining, according to the frame gain of the high-energy audio frame, the target audio frame corresponding to the high-energy audio frame after energy-reduction processing.
It can be understood that the target gain is the energy-suppression factor used when performing energy-reduction processing on a high-energy audio frame; this factor is used to reduce the energy of the high-energy audio frame.
In some embodiments, a low-energy audio frame may also have a target gain; the target gain of a low-energy audio frame is 1, i.e., no energy reduction is applied to it.
In a possible implementation of the first aspect above, the frame energy of a high-energy audio frame is determined by the following formulas:
$e_i^{(n)}(k)=\big(x_i^{(n)}(k)\big)^2,\qquad E_i^{(n)}=\beta\,E_i^{(n-1)}+(1-\beta)\,\frac{1}{L}\sum_{k=0}^{L-1}e_i^{(n)}(k)$
where the high-energy audio frame includes L sampling points; β denotes the frame-energy smoothing coefficient; $x_i^{(n)}(k)$ denotes the audio data of the k-th sampling point in the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed; $e_i^{(n)}(k)$ denotes the energy of the k-th sampling point in the n-th audio frame of the i-th channel to be mixed; and $E_i^{(n)}$ denotes the frame energy of the n-th audio frame of the i-th channel to be mixed.
In some embodiments, each audio frame may include L=512 sampling points, i.e., the frame length of the audio frame is 512. In other embodiments, L may be another value; this application places no limitation on this.
In a possible implementation of the first aspect above, the preset energy threshold includes a first threshold and/or a second threshold; a high-energy audio frame includes at least one of the following: among the multiple audio frames of the M channels to be mixed, an audio frame for which the average frame energy of the at-least-one audio frame with the same index corresponding to the same mixed channel is greater than the first threshold is a high-energy audio frame; among the audio frames of the same channel to be mixed, an audio frame for which the maximum frame energy of at least two audio frames contiguous with the corresponding audio frame is greater than the second threshold is a high-energy audio frame.
It can be understood that the index of an audio frame is the sequence number of an audio frame within any one of the M channels to be mixed; for example, the index of the n-th audio frame of the i-th channel to be mixed among the M channels is n.
In a possible implementation of the first aspect above, the maximum frame energy of each audio frame of the M channels to be mixed is determined according to the frame energy of the audio frame with the largest frame energy among the audio frames that correspond to the same mixed channel as that audio frame and have the same index.
In a possible implementation of the first aspect above, the target gain of a high-energy audio frame is determined according to the preset energy threshold and the maximum frame energy of at least two audio frames contiguous with each high-energy audio frame.
In a possible implementation of the first aspect above, the frame gain is determined by the following formula:
$g_i^{(n)}=\alpha\,g_{i,\mathrm{target}}^{(n)}+(1-\alpha)\,g_i^{(n-1)}$
where α denotes the frame-gain smoothing coefficient; $g_{i,\mathrm{target}}^{(n)}$ denotes the target gain of the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed; $g_i^{(n-1)}$ denotes the frame gain of the (n−1)-th audio frame of the i-th channel to be mixed; and $g_i^{(n)}$ denotes the frame gain of the n-th audio frame of the i-th channel to be mixed.
In some embodiments, the frame gain of a low-energy audio frame in the first multi-channel audio data whose frame energy does not exceed the preset energy threshold may also be computed with the formula above, with the target gain of the low-energy audio frame being 1.
In a possible implementation of the first aspect above, determining, according to the frame gain of the high-energy audio frame, the target audio frame corresponding to the high-energy audio frame after energy-reduction processing includes: determining, according to the frame gain of the high-energy audio frame, the sampling-point gain of each sampling point in the high-energy audio frame; performing, according to each sampling-point gain, energy-reduction processing on the audio data of each sampling point in the high-energy audio frame to obtain the audio data of each sampling point in the target audio frame; and generating the target audio frame from the audio data of each sampling point of the target audio frame.
In a possible implementation of the first aspect above, each sampling-point gain is determined by the following formula:
$g_i^{(n)}(k)=g_i^{(n-1)}(k)+\frac{k+1}{\mathrm{FrameLen}}\left(g_i^{(n)}-g_i^{(n-1)}\right)$
where FrameLen denotes the frame length of the target audio frame; $g_i^{(n-1)}$ denotes the frame gain of the (n−1)-th audio frame of the i-th channel to be mixed among the M channels to be mixed; $g_i^{(n)}$ denotes the frame gain of the n-th audio frame of the i-th channel to be mixed; $g_i^{(n-1)}(k)$ denotes the sampling-point gain of the k-th sampling point of the (n−1)-th audio frame of the i-th channel to be mixed; and $g_i^{(n)}(k)$ denotes the sampling-point gain of the k-th sampling point of the n-th audio frame of the i-th channel to be mixed.
In some embodiments, the sampling-point gains of low-energy audio frames in the first multi-channel audio data whose frame energy does not exceed the preset energy threshold may also be computed with the formula above, where the frame gain of a low-energy audio frame is computed with its target gain taken as 1.
In a possible implementation of the first aspect above, the audio data of each sampling point in the target audio frame is determined from the audio data of each sampling point of the high-energy audio frame corresponding to the target audio frame and the corresponding sampling-point gain.
In a possible implementation of the first aspect above, obtaining the second multi-channel audio data according to the result of the energy-reduction processing includes: generating the second multi-channel audio data from the target audio frames and the low-energy audio frames in the first multi-channel audio data whose energy is not greater than the preset energy threshold.
In a possible implementation of the first aspect above, downmixing the second multi-channel audio data to obtain mixed output data having N second channels includes: performing a weighted summation of the target audio frames and low-energy audio frames in the second multi-channel audio data that correspond to the same second channel, to obtain the mixed output data.
It can be understood that the above process of obtaining the mixed output data downmixes the second multi-channel audio data using the Dolby downmix method, where the weight coefficients used in the weighted summation are preset parameters.
In a second aspect, an embodiment of this application provides an electronic device, including: one or more processors; one or more memories; the one or more memories store one or more programs which, when executed by the one or more processors, cause the electronic device to perform the above multi-channel mixing method.
In a third aspect, an embodiment of this application provides a computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the above multi-channel mixing method.
In a fourth aspect, an embodiment of this application provides a computer program product including a computer program/instructions which, when executed by a processor, implement the above multi-channel mixing method.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a scenario of the multi-channel mixing method provided by an embodiment of this application;
FIG. 2 is a schematic flowchart of a mixing method for downmixing six channels to two channels provided by an embodiment of this application;
FIG. 3 is a schematic flowchart of a multi-channel mixing method provided by an embodiment of this application;
FIG. 4 is a schematic flowchart of another multi-channel mixing method provided by an embodiment of this application;
FIG. 5 is a schematic flowchart of an energy-suppression method provided by an embodiment of this application;
FIG. 6 is a schematic diagram of the code-stream waveforms of multi-channel audio data provided by an embodiment of this application;
FIG. 7 shows the code-stream waveform and energy-spectrum diagrams of the mixed channels after channel downmixing;
FIG. 8 is a schematic diagram of the hardware structure of a mobile phone provided by an embodiment of this application.
Detailed Description
As described above, for audio data that does not conform to the Dolby specification, the audio data obtained by channel downmixing exhibits clipping. Specifically, for such audio data, because the energy of the bass channel's audio data is too high, the energy of the corresponding audio data obtained by weighted summation is also too high; consequently, when that audio data is output, the user hears clipped audio and the listening experience is poor.
To solve the problem in the above channel-downmix schemes that the downmixed audio data clips and degrades the user's listening experience, this application proposes a multi-channel mixing method. The method includes: the electronic device identifies the audio data in each channel of the multi-channel audio data that exceeds a preset energy threshold and applies energy suppression (i.e., energy reduction) to it, then computes the suppressed audio data of each channel. The suppressed multi-channel audio data can then be weighted-summed based on a preset channel-downmix algorithm to obtain the downmixed audio data.
It can be understood that, in some embodiments, energy suppression can be achieved by computing the frame gain of each audio frame in each channel and reducing the frame gain via a corresponding suppression factor to obtain suppressed audio frames, and thereby the energy-suppressed audio data.
It can be understood that the preset channel-downmix algorithm specifies the correspondence between channels before and after downmixing and the weight coefficients for the weighted summation. For example, in a scheme downmixing six channels to two, where the six channels are the left, left-surround, bass, center, right, and right-surround channels, the preset channel-downmix algorithm is: the left, left-surround, bass, and center channels downmix to the left channel, and the right, right-surround, bass, and center channels downmix to the right channel, with the weight coefficients of the weighted summation being the default weight coefficients of the Dolby downmix scheme.
It can be understood that, in some embodiments, the multi-channel audio data can be divided into multiple audio frames, and energy tracking and energy suppression can then be performed per frame. Framing divides the audio data into multiple audio data segments, each segment serving as one audio frame; adjacent audio frames overlap, and the frame division is the same for every channel. In other embodiments, energy suppression of audio data whose energy exceeds the preset energy threshold can also be performed in other ways; this application places no limitation on this.
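A minimal framing sketch in Python is given below; the frame length and hop size are assumptions for illustration (the text fixes neither the hop nor the amount of overlap):

```python
import numpy as np

def split_into_frames(x, frame_len=512, hop=256):
    """Slice one channel's samples (a 1-D NumPy array) into overlapping
    frames of frame_len samples, advancing by hop samples per frame so
    consecutive frames overlap."""
    n_frames = max(0, 1 + (len(x) - frame_len) // hop)
    if n_frames == 0:
        return np.empty((0, frame_len), dtype=x.dtype)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
```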
The multi-channel mixing method provided in the embodiments of this application tracks the energy of the audio frames of each channel before the weighted summation, identifies audio frames exceeding the preset energy threshold, and applies energy suppression. It can fully adapt to a wide variety of audio data, supports channel downmixing of many kinds of multi-channel audio data, solves the clipping problem caused during downmix by excessively high energy in some audio frames, yields a more desirable downmix result, and improves the user's listening experience.
It can be understood that the electronic device in the embodiments of this application includes, but is not limited to, mobile phones (including foldable phones), tablet computers, laptop computers, desktop computers, servers, wearable devices, head-mounted displays, mobile e-mail devices, in-vehicle devices, portable game consoles, portable music players, reader devices, televisions with one or more processors embedded or coupled therein, and other electronic devices. For convenience, this application is introduced below using a mobile phone as the electronic device.
The application scenario of the embodiments of this application is introduced below with reference to FIG. 1 and FIG. 2, taking audio output from mobile phone 100 to Bluetooth earphones 200 as an example.
FIG. 1 is a schematic diagram of an application scenario of the multi-channel mixing method.
As shown in FIG. 1, the scenario includes mobile phone 100 and Bluetooth earphones 200, connected wirelessly via Bluetooth.
When the user wears Bluetooth earphones 200 and plays multi-channel audio data on mobile phone 100, mobile phone 100 needs to downmix the multi-channel audio data into two-channel audio data including a left channel and a right channel, and send this audio data to Bluetooth earphones 200 over Bluetooth. Bluetooth earphones 200 play the audio data upon receiving it.
FIG. 2 is a schematic flowchart of a mixing method downmixing six channels to two.
Specifically, as shown in FIG. 2, taking six-channel audio data as an example, the six channels include left channel A1, left-surround channel A2, bass channel A3, center channel A4, right channel A5, and right-surround channel A6. Before downmixing, mobile phone 100 may first frame the six-channel audio data and determine the frame energy of each audio frame. When the frame energy of an audio frame in any channel is greater than the preset energy threshold, the energy-suppression factor of that audio frame is determined, energy suppression is applied to it, and the energy-suppressed six-channel audio data is computed from the suppressed audio frames. The energy-suppressed six-channel audio data is then downmixed using the preset channel-downmix algorithm to obtain two-channel audio data including left channel a1 and right channel a2. Mobile phone 100 can then send the left-channel a1 audio data to the left earphone of Bluetooth earphones 200 and the right-channel a2 audio data to the right earphone.
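The channel-to-output correspondence just described can be written as a small mapping table, as in the following sketch; the channel keys and the 0.707 weights are illustrative placeholders, not the actual default coefficients of the Dolby downmix scheme:

```python
# Hypothetical joint-detection mapping for the six-channel -> two-channel case.
DOWNMIX_MAP = {
    "a1_left":  {"A1_L": 1.0, "A2_Ls": 0.707, "A3_LFE": 0.707, "A4_C": 0.707},
    "a2_right": {"A5_R": 1.0, "A6_Rs": 0.707, "A3_LFE": 0.707, "A4_C": 0.707},
}
```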
It can be understood that the multi-channel mixing method in this application supports not only the above downmix of six channels to two, but also downmixing any M channels to N channels, where M>N.
The multi-channel mixing method in the embodiments of this application is further introduced below with reference to the drawings.
FIG. 3 is a schematic flowchart of the multi-channel mixing method provided by an embodiment of this application.
As shown in FIG. 3, the multi-channel mixing method includes:
301: Obtain multi-channel audio data.
It can be understood that the different channels in the multi-channel audio data (i.e., the aforementioned first multi-channel audio data) are of different channel types, each channel being a channel to be mixed, such as the left channel, right channel, left-surround channel, right-surround channel, etc. The audio data of different channels can be output through different speakers, or output through the same channel after channel downmixing.
In some embodiments, the multi-channel audio data may be 3.1-, 5.1-, or 7.1-channel audio data. The 3.1 channels include the left, bass, center, and right channels; the 5.1 channels include the left, left-surround, bass, center, right, and right-surround channels; the 7.1 channels include the left, left-surround, bass, center, right, right-surround, left-rear-surround, and right-rear-surround channels. In some embodiments, the multi-channel audio data may also have more or fewer channels than the above examples; this application places no limitation on this.
It can be understood that, in some embodiments, the obtained multi-channel audio data also includes a mask of the multi-channel audio data. The mask overlays the multi-channel audio data to select or shield part of the audio data. From the mask, the correspondence between channels before and after mixing can also be determined, i.e., which pre-mix channels' audio data are blended into each post-mix channel's audio data.
In some embodiments, after the multi-channel audio data is obtained, it may also be initialized. Initialization may include determining the preset channel-downmix algorithm, which may specifically include: determining the correspondence between channels before and after mixing, and the weight coefficients used to weighted-sum the multi-channel audio data into each post-mix channel's audio data. It can be understood that the audio data of the at-least-one channel in the multi-channel audio data corresponding to the same mixed channel after downmix (i.e., an output channel of the downmix) can serve as one joint-detection channel group. Then, when judging the energy of an audio frame, an initial judgment can first be made on the energy of the corresponding audio frames in the joint-detection channels, for example judging whether the average energy of the corresponding audio frames is greater than a set initial-judgment energy threshold. In some embodiments, initialization may also include initializing the parameters of the formulas in the preset channel-downmix algorithm or the energy-suppression algorithm.
302: Frame the multi-channel audio data to obtain multiple audio frames.
It can be understood that the characteristics of multi-channel audio data and the parameters characterizing its essential features change over time, i.e., audio data is time-varying; hence the energy tracking and energy suppression of multi-channel audio data described below can be performed on a short-time basis, i.e., via short-time analysis. Specifically, the multi-channel audio data can be divided into multiple segments, each segment being one audio frame, with the essential characteristics within one audio frame remaining unchanged or relatively stable.
Further, in some embodiments, the multi-channel audio data can be framed directly, e.g., with each audio frame 10–30 ms long.
In other embodiments, the multi-channel audio data can be sampled to convert continuous multi-channel audio data into discrete multi-channel audio data, with 512 consecutive sampling points forming one audio frame.
303: Determine whether the frame energy of the m-th audio frame is greater than the preset energy threshold, which is preconfigured. If the frame energy of the m-th audio frame is greater than the preset energy threshold, this indicates that after channel downmix the audio corresponding to this frame may clip, so energy suppression is needed, i.e., go to step 304. If the frame energy of the m-th audio frame is less than or equal to the preset energy threshold, the audio corresponding to this frame will not clip after downmix, so no energy suppression is needed, i.e., go to step 305.
It can be understood that the index of an audio frame is the sequence number of an audio frame within any one of the channels of the multi-channel audio data; for example, the index of the m-th audio frame of the i-th channel to be mixed among the M channels is m.
In some embodiments, the preset energy threshold may be determined from the minimum frame-energy value that could cause the downmixed audio data to clip, e.g., −6 dB or −3 dB. It may be determined specifically according to the electronic device, the audio output device, and the multi-channel audio data; this application places no limitation on this.
In some embodiments, step 303 may compute the average frame energy of the m-th audio frame across the channels in each joint-detection channel group and judge whether that average frame energy is greater than the set energy threshold.
In some embodiments, step 303 may simply compute the frame energy of the m-th audio frame of each channel separately and judge whether that frame energy is greater than the set energy threshold. In other embodiments, the maximum frame energy of the m-th audio frame across the channels in each joint-detection channel group may be taken as the frame energy of the m-th audio frame, and judged against the set energy threshold. Further, after that maximum is taken as the m-th frame's energy, it may be judged whether the largest frame energy among at least one audio frame near the m-th audio frame is greater than the set energy threshold, e.g., whether the larger of the frame energies of the m-th and (m−1)-th audio frames exceeds the set energy threshold.
In some embodiments, the set energy threshold includes an initial-judgment energy threshold (i.e., the aforementioned first threshold) and a fine-judgment energy threshold (i.e., the aforementioned second threshold). Step 303 may then first make an initial judgment on the m-th audio frame's energy. Specifically, the average frame energy of the m-th audio frame across the joint-detection channels is computed and judged against the initial-judgment energy threshold; if it is greater, a further fine judgment is made. Specifically, the fine judgment may take the maximum frame energy of the m-th audio frame across the joint-detection channels as the m-th frame's energy and then judge whether the largest frame energy among at least two consecutive audio frames near the m-th frame exceeds the fine-judgment energy threshold. Step 303 is further described below with formulas.
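The two-stage judgment can be sketched as follows; the array layout and the threshold values (−6 dB initial, −3 dB fine, following the examples in this text) are assumptions for illustration:

```python
import numpy as np

def needs_suppression(e_db, n, first_thresh=-6.0, second_thresh=-3.0):
    """e_db: (num_channels, num_frames) frame energies in dB for one
    joint-detection channel group; returns True if frame n needs suppression."""
    if e_db[:, n].mean() <= first_thresh:        # initial (coarse) judgment
        return False
    lo, hi = max(0, n - 1), min(e_db.shape[1], n + 2)
    return e_db[:, lo:hi].max() > second_thresh  # fine judgment over n-1..n+1
```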
In some embodiments, the frame energy of an audio frame can be obtained by Fourier-transforming the multi-channel audio data to obtain its energy spectrum and computing, from the energy values of multiple consecutive sampling points, the frame energy of the audio frame formed by those sampling points; the specific calculation is introduced below with formulas.
304: Apply energy suppression to the m-th audio frame using the energy-suppression algorithm to obtain the m-th target audio frame.
It can be understood that the energy-suppression algorithm determines the energy-suppressed target audio frame based on the frame energy of the m-th audio frame. In some embodiments, an energy-suppression factor can be computed from the computed frame energy of the m-th audio frame and a preset formula, and energy suppression is then applied to the m-th audio frame based on that factor to obtain the m-th target audio frame. In some embodiments, the energy-suppression factor is the target gain; energy suppression then consists of computing the m-th frame's frame gain from the target gain and computing the m-th target audio frame from that frame gain.
In some embodiments, when each audio frame includes multiple sampling points, the target audio frame can be computed as follows: use the energy-suppression factor to compute the m-th frame's frame gain, compute from it the gain of each sampling point in the m-th frame (i.e., the sampling-point gains), determine from those gains the audio data of each sampling point in the target audio frame, and perform signal reconstruction to determine the m-th target audio frame's audio data.
305: Take the m-th audio frame as the m-th target audio frame.
It can be understood that, in some embodiments, if the m-th audio frame's frame energy does not exceed the preset energy threshold, then after channel downmix the audio corresponding to this frame will not clip from excessive energy when output, and thus will not affect the user's listening experience; no energy suppression is needed, and the original frame's audio data can be retained.
306: Downmix the target audio frames of the multiple channels based on the preset channel-downmix algorithm to obtain the mixed output data.
It can be understood that the preset channel-downmix rule includes the channel correspondence before and after downmix and the downmix calculation formula. That is, the channel correspondence determines which channels' target audio frames participate in computing each mixed channel's audio data after downmix. The identified frames corresponding to the same mixed channel are then substituted into the corresponding downmix formula to obtain that mixed channel's mixed output data. The mixed output data is the audio data of one channel output to Bluetooth earphones 200 mentioned above.
It can be understood that, in some embodiments, after the target audio frames are obtained in step 305 or 304, the Dolby downmix algorithm can be used for channel downmix to obtain each mixed channel's mixed output data. Specifically, the target audio frames with corresponding indices in each joint-detection channel group can be weighted-summed to obtain the mixed audio frame in the mixed output data corresponding to those target frames; the mixed audio frames corresponding to the same mixed channel are then concatenated to obtain that mixed channel's output data.
Through the above multi-channel mixing method, the embodiments of this application track the energy of the audio data in the multi-channel audio data and apply energy suppression to high-energy audio data, then perform channel downmix based on the energy-suppressed multi-channel audio data. The multi-channel mixing method in the embodiments of this application is applicable both to Dolby-specification and to non-Dolby-specification multi-channel audio data, i.e., it suits many kinds of multi-channel downmix and achieves adaptive channel downmix without clipping caused by excessively high audio energy. Moreover, the method only suppresses the energy of some high-energy audio data and discards no audio data, so while solving downmix clipping it also retains every channel's audio data, improving the user's listening experience.
The multi-channel mixing method in the embodiments of this application is further introduced below with reference to FIG. 4, taking a two-channel-to-mono mixing method as an example.
FIG. 4 is a schematic flowchart of another multi-channel mixing method in an embodiment of this application.
As shown in FIG. 4, the multi-channel audio data includes the code stream of channel-1 audio data and the code stream of channel-2 audio data. After obtaining the channel-1 and channel-2 audio data, the electronic device frames the two channels' audio data, obtaining 6 audio data frames per channel.
From the framed code streams of each channel it can be seen that, among the audio frames of channel 1, the second and fourth audio frames are not very stable, and among the audio frames of channel 2, the second and third audio frames are not very stable. To avoid clipping in the downmixed data, energy suppression must therefore be applied to the second and fourth audio frames of channel 1 and to the second and third audio frames of channel 2, obtaining each channel's suppressed target-audio-frame code stream. The Dolby downmix algorithm can then be used to weighted-sum the corresponding target audio frames of the two channels to obtain the corresponding mixed audio frames of the mixed channel; the 6 mixed audio frames form the mixed output data.
An energy-suppression method in the embodiments of this application is further introduced below with reference to FIG. 5.
FIG. 5 is a flowchart of an energy-suppression method in an embodiment of this application.
As shown in FIG. 5, the method includes:
501: Multi-channel data. The multi-channel data obtained in step 501 is the multi-channel audio data obtained in step 301; step 501 is similar to step 301 and is not described again here.
502: Generate the initialized downmix algorithm according to the channel layout and determine the mixing channels.
It can be understood that the channel layout is the types and number of channels in the multi-channel data. The initialized downmix algorithm comprises the correspondence, generated from the channel layout, between pre-downmix channels and mixed channels, and the formulas for computing the mixed channels' data. Specifically, in some embodiments, generating the initialized downmix algorithm may include determining the correspondence between the M channels and the N channels from the multi-channel data's mask. For example, when six channels are downmixed to two, the left, left-surround, bass, and center channels downmix to the left channel, and the right, right-surround, bass, and center channels downmix to the right channel. In some embodiments, generating the initialized downmix algorithm also includes the formulas for computing each mixed output in the mixing channels and the initialization of the formulas' parameters. The channels that generate the same mixing channel (i.e., mixed channel) during downmix can be denoted a joint-detection channel group: the data of the channels in the joint-detection group corresponding to a mixing channel are weighted-summed to obtain the data that the mixing channel needs to output.
503: The VAD detection result is 1.
VAD stands for Voice Activity Detection, i.e., speech endpoint detection. The VAD detection condition can be that the frame energy of the audio data is greater than the set energy threshold.
In some embodiments, the multi-channel data can be sampled and framed before VAD detection. Framing may consist of, after sampling the multi-channel data, grouping runs of consecutive sampling points into one audio frame, e.g., 512 sampling points per frame. Then, in some embodiments, the VAD detection condition may be checking whether the audio frame's frame energy exceeds the preset VAD detection threshold (i.e., the aforementioned first threshold), e.g., frame energy greater than −6 dB. When the VAD decision is true, i.e., the VAD detection result is 1, the frame's energy has exceeded the preset VAD detection threshold and the frame may clip after channel downmix. It can be understood that VAD detection is the initial judgment mentioned above; based on its result, the fine judgment can proceed.
In some embodiments, it can be jointly checked whether the average frame energy of the corresponding audio frames across the joint-detection channels exceeds the preset VAD detection threshold. If the average frame energy is too high, the audio frames of each channel must be energy-tracked, and the frames meeting the energy-suppression condition suppressed.
It can be understood that energy tracking of an audio frame is needed only when the VAD decision result is 1; when the VAD decision is false, i.e., the VAD result is 0, the downmix will not clip from excessive frame energy, and VAD detection can move on to the next audio frame.
504: Compute the frame energy; track the maximum energy across the joint-detection channels and across the preceding and following frames.
It can be understood that when the VAD detection result is 1, energy suppression of the corresponding audio frame is required.
Specifically, with L sampling points per frame, the frame energy of an audio frame can be computed by the following formula:
$e_i^{(n)}(k)=\big(x_i^{(n)}(k)\big)^2,\qquad E_i^{(n)}=\beta\,E_i^{(n-1)}+(1-\beta)\,\frac{1}{L}\sum_{k=0}^{L-1}e_i^{(n)}(k)\qquad(1)$
where $E_i^{(n)}$ denotes the frame energy of the n-th audio frame in the i-th channel; β is the smoothing coefficient for frame-energy computation; and $x_i^{(n)}(k)$ denotes the input data of the k-th sampling point of the n-th audio frame in the i-th channel, i.e., part of the obtained multi-channel data. Here i = 0, 1, 2, 3, 4, 5, ..., with the range of i depending on the number of channels; k takes integer values from 0 to L−1; and n ranges over the number of audio frames obtained by framing the multi-channel data. In some embodiments, the smoothing coefficient β = 0.3.
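A sketch of formula (1) in code form is given below; representing the result in dB and the handling of the first frame are assumptions for illustration:

```python
import numpy as np

def frame_energies(frames, beta=0.3, eps=1e-12):
    """Per-frame smoothed energy: E[n] = beta*E[n-1] + (1-beta)*mean(x[n]**2).
    frames: (num_frames, frame_len) samples in [-1, 1]; returns dB values."""
    mean_sq = (frames ** 2).mean(axis=1)
    e = np.empty_like(mean_sq)
    prev = mean_sq[0] if len(mean_sq) else 0.0
    for n, m in enumerate(mean_sq):
        prev = beta * prev + (1.0 - beta) * m
        e[n] = prev
    return 10.0 * np.log10(e + eps)
```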
It can be understood that in step 504 above, computing the frame energy enables tracking the frame energy of each channel's data; when the tracked frame energy is judged greater than the set detection threshold, step 505 can be executed.
Specifically, the frame with the largest energy among the corresponding frames in the joint-detection channels can be determined as follows:
$E_{\max}^{(n)}=\max_i E_i^{(n)}\qquad(2)$
where the maximum is taken over the channels i of the joint-detection group, and $E_{\max}^{(n)}$ denotes the maximum frame-energy value of the n-th audio frame across the channel data of the joint-detection channels.
In some embodiments, after the current n-th frame's maximum frame-energy value is determined by formula (2), the largest energy among the preceding and following audio frames can be determined by the following formula:
$\bar{E}^{(n)}=\max\left(E_{\max}^{(n-1)},\,E_{\max}^{(n)},\,E_{\max}^{(n+1)}\right)\qquad(3)$
where $\bar{E}^{(n)}$ denotes the maximum energy among the n-th frame and its neighboring frames, $E_{\max}^{(n-1)}$ denotes the maximum frame-energy value of the (n−1)-th audio frame across the joint-detection channels, and $E_{\max}^{(n+1)}$ denotes that of the (n+1)-th audio frame.
In some embodiments, $\bar{E}^{(n)}$ can also be obtained as the maximum of the neighborhood maxima computed for the n-th and (n−1)-th audio frames.
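Formulas (2) and (3) together amount to a maximum over the joint-detection group and over the neighboring frames, as in the following sketch (clamping at the edge frames is an assumption for illustration):

```python
import numpy as np

def tracked_max_energy(e_db):
    """e_db: (num_channels, num_frames) frame energies for one
    joint-detection group. Per frame n: max over the group's channels
    (formula (2)), then max over frames n-1, n, n+1 (formula (3))."""
    per_frame = e_db.max(axis=0)                  # formula (2)
    padded = np.pad(per_frame, 1, mode="edge")
    return np.maximum(np.maximum(padded[:-2], padded[1:-1]), padded[2:])  # (3)
```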
505: Compute the target gain and the frame gain, then compute the gain of each sampling point, multiply by the fixed gain, and output the result.
It can be understood that the output result here is the input data for channel downmixing.
In some embodiments, after $\bar{E}^{(n)}$ is computed, it can first be judged whether the current n-th audio frame's energy exceeds the set detection threshold (i.e., the aforementioned second threshold). If it does, the frame risks clipping after channel downmix and needs energy suppression, and the frame's target gain and frame gain can then be computed to determine the energy-suppression factor.
It can be understood that the set-detection-threshold judgment makes a precise per-channel judgment of frame energy to determine whether excessively energetic audio frames need suppression.
It can be understood that the target gain can be understood as the energy-suppression factor, used to reduce the frame gain and thereby reduce the audio frame's energy.
In some embodiments, the target gain can be computed by the following formula:
$g_{i,\mathrm{target}}^{(n)}=\begin{cases}1,&\bar{E}^{(n)}<\mathrm{Threshold}\\\mathrm{Threshold}\,/\,\bar{E}^{(n)},&\bar{E}^{(n)}\geq\mathrm{Threshold}\end{cases}\qquad(4)$
It can be understood that Threshold denotes the set detection threshold, Threshold = −3 dB, and $g_{i,\mathrm{target}}^{(n)}$ denotes the target gain of the n-th audio frame of the i-th channel.
From formula (4): when the maximum frame-energy value over the neighboring frames is below the set detection threshold Threshold, the frame's energy is appropriate and no downmix clipping from excessive energy will occur. When that maximum is greater than or equal to Threshold, the frame's energy is too high and downmix clipping may occur, so its target gain must be computed to suppress the frame gain.
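A sketch of formula (4) follows; the dB-to-linear-amplitude mapping is an assumption for illustration, since the formula above is stated in energy terms:

```python
def target_gain(tracked_db, thresh_db=-3.0):
    """Unity gain while the tracked maximum energy stays below Threshold;
    otherwise the attenuation that pulls that energy back to Threshold."""
    if tracked_db < thresh_db:
        return 1.0
    return 10.0 ** ((thresh_db - tracked_db) / 20.0)
```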
In some embodiments, after the target gain is computed, the frame gain of the audio frame can be determined by the following formula:
$g_i^{(n)}=\alpha\,g_{i,\mathrm{target}}^{(n)}+(1-\alpha)\,g_i^{(n-1)}\qquad(5)$
where $g_i^{(n)}$ denotes the frame gain of the n-th audio frame of the i-th channel, $g_i^{(n-1)}$ denotes the frame gain of the (n−1)-th audio frame of the i-th channel, and α denotes the smoothing coefficient for frame-gain computation. In some embodiments, the smoothing coefficient α = 0.1.
In some embodiments, after the frame gain is computed by formula (5), the sampling-point gain of each sampling point in the audio frame can be computed by the following formula:
$g_i^{(n)}(k)=g_i^{(n-1)}(k)+\frac{k+1}{\mathrm{FrameLen}}\left(g_i^{(n)}-g_i^{(n-1)}\right)\qquad(6)$
where $g_i^{(n)}(k)$ denotes the sampling-point gain of the k-th sampling point in the n-th audio frame of the i-th channel, $g_i^{(n-1)}(k)$ denotes the sampling-point gain of the k-th sampling point in the (n−1)-th audio frame of the i-th channel, and FrameLen denotes the frame length of the audio frame. For example, with 512 sampling points per audio frame, the frame length of the audio frame is 512.
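Formulas (5) and (6), as reconstructed above, can be sketched together; the helper name and argument layout are assumptions for illustration:

```python
import numpy as np

def smoothed_gains(prev_sample_gains, g_prev_frame, g_target, alpha=0.1):
    """Smooth the frame gain toward the target (formula (5)), then ramp each
    sample's gain from its previous-frame value by a weight that grows across
    the frame (formula (6)), so the applied gain never jumps between samples."""
    g_frame = alpha * g_target + (1.0 - alpha) * g_prev_frame             # (5)
    w = np.arange(1, len(prev_sample_gains) + 1) / len(prev_sample_gains)
    sample_gains = prev_sample_gains + w * (g_frame - g_prev_frame)       # (6)
    return g_frame, sample_gains
```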
From formula (6), when computing sampling-point gains, each sampling point's gain is related to the frame gain of its own audio frame and to the sampling-point gain of the corresponding index in the previous audio frame. The index of a sampling point is its sequence number within its audio frame; for example, the index of the k-th sampling point in an audio frame is k.
Further, in some embodiments, the target audio frame is computed from the sampling-point gains of formula (6) by the following formula:
$\alpha_i^{(n)}(k)=g_i^{(n)}(k)\cdot x_i^{(n)}(k)\qquad(7)$
where $\alpha_i^{(n)}(k)$ denotes the audio data of the k-th sampling point of the n-th target audio frame of the i-th channel, and $x_i^{(n)}(k)$ denotes the audio data of the k-th sampling point of the n-th audio frame of the i-th channel, i.e., $x_i^{(n)}(k)$ in formula (1).
It can be understood that, in some embodiments, the continuous audio data of a target audio frame can be obtained from the discrete audio data of its sampling points; the audio data of each target audio frame can then be used as the input data for channel downmixing.
In some embodiments, after the energy-suppressed audio frames are computed, channel downmixing can be performed by the following formula:
$a_j^{(n)}=\sum_{i=0}^{J-1}w_{i,j}^{(n)}\,\alpha_i^{(n)}\qquad(8)$
where $a_j^{(n)}$ denotes the output result of the n-th audio frame of the j-th mixing channel, $w_{i,j}^{(n)}$ denotes the mixing weight of the n-th audio frame of the i-th channel in the j-th mixing channel, and $\alpha_i^{(n)}$ denotes the input data of the n-th audio frame of the i-th channel for channel downmixing. When downmixing to two channels, j = 0, 1; when downmixing to three channels, j = 0, 1, 2; and so on. A = 0, 1, 2, 3, 4, 5, ... denotes the multiple pre-mix channels; for example, A = 0, 1, 2, 3, 4, 5 denotes the left, left-surround, bass, center, right, and right-surround channels respectively. J denotes the number of pre-mix channels corresponding to the j-th mixing channel.
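A sketch of formula (8) in code form; it reuses the DOWNMIX_MAP layout sketched earlier, whose weights are placeholders:

```python
import numpy as np

def weighted_downmix(suppressed, mix_map):
    """Each mixing channel is the weighted sum of the suppressed frames of
    its joint-detection channels. `suppressed` maps channel name ->
    (num_frames, frame_len) arrays; `mix_map` maps output channel ->
    {input channel: weight}."""
    out = {}
    for out_ch, weights in mix_map.items():
        acc = None
        for in_ch, w in weights.items():
            term = w * suppressed[in_ch]
            acc = term if acc is None else acc + term
        out[out_ch] = acc
    return out
```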
To more clearly demonstrate the positive effects of the multi-channel mixing method in the embodiments of this application, the method is simulated below with reference to FIG. 6 and FIG. 7. The simulation software is CLion. The simulation conditions are: 5.1-channel multi-channel audio data, smoothing coefficients α = 0.1 and β = 0.3, and set detection threshold Threshold = −3 dB.
FIG. 6 is a schematic diagram of the code-stream waveforms of multi-channel audio data in an embodiment of this application.
FIG. 7 shows the code-stream waveform and energy-spectrum diagrams of the mixed channels after channel downmixing.
As shown in FIG. 6, the six code-stream waveforms from top to bottom represent the left, right, center, bass, left-surround, and right-surround channels; the horizontal axis is time and the vertical axis is the amplitude of the audio data. During code-stream passthrough, each sampling point is represented with 16 bits, so under a fixed-coefficient downmix scheme, if the mix result is to avoid clipping, the sum of the channels must stay within the range 16 bits can represent; otherwise the data overflows and wraps around, producing data jumps and hence audible noise. As FIG. 6 shows, if Dolby downmix coefficients are used for mixing, some of the data inside the boxes in FIG. 6 has large energy peaks, so to guarantee a clip-free downmix the downmix coefficients would have to become very small. Since loudness is positively correlated with energy, the corresponding mixed channels' loudness would all decrease, lowering the overall volume of the downmixed audio and degrading the user's listening experience.
As shown in FIG. 7, the first-row code-stream waveform is that of the left mixed channel's audio data obtained with the multi-channel mixing method of this application, and the second-row waveform is that of the right mixed channel's audio data obtained with the Dolby mixing method. In both waveform rows, the horizontal axis is time and the vertical axis is the amplitude of the audio data. The third row, corresponding to the first row's left mixed channel, is the energy spectrum of the left mixed channel's audio data; the fourth row, corresponding to the second row's right mixed channel, is the energy spectrum of the right mixed channel's audio data. In the third- and fourth-row energy spectra, the horizontal axis is time and the vertical axis is energy.
As FIG. 7 shows, the code-stream waveform of the channel audio data processed by the multi-channel mixing method of the embodiments of this application has a more stable envelope, rarely exhibits large fluctuations, and its energy stripes are relatively stable, never becoming so high as to spread across the whole frequency range. In contrast, the waveform of the right mixed channel's audio data not processed by this application's method shows, in its high-energy portions, e.g., the audio data boxed in FIG. 7, energy stripes spreading across the whole frequency range, clear clipping, and noticeable noise.
It can be seen that the multi-channel mixing method of the embodiments of this application tracks the energy of each channel's audio data and thereby suppresses excessively energetic audio data, reducing the risk of mix clipping without losing any audio data and improving the user's listening experience.
FIG. 8 shows a schematic diagram of the hardware structure of a mobile phone 100 according to an embodiment of this application.
Mobile phone 100 can perform the multi-channel mixing method provided by the embodiments of this application. In FIG. 8, similar components bear the same reference numerals. As shown in FIG. 8, mobile phone 100 may include a processor 110, a power module 140, a memory 180, a camera 101, a mobile communication module 130, a wireless communication module 120, a sensor module 190, an audio module 150, an interface module 160, a display screen 102, etc.
It can be understood that the structure illustrated in this embodiment of the invention does not constitute a specific limitation on mobile phone 100. In other embodiments of this application, mobile phone 100 may include more or fewer components than shown, combine certain components, split certain components, or arrange components differently. The illustrated components can be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, for example processing modules or processing circuits such as a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a Digital Signal Processor (DSP), a microcontroller (Micro-programmed Control Unit, MCU), an Artificial Intelligence (AI) processor, or a programmable logic device (Field Programmable Gate Array, FPGA). Different processing units may be independent devices or integrated into one or more processors. For example, in some instances of this application, processor 110 may be used to judge whether the energy of the m-th audio frame is greater than the set energy threshold and to compute the energy-suppression factor. In some embodiments, processor 110 may also be used to downmix the obtained target audio frames to obtain the mixed output data.
Memory 180 may be used to store data, software programs, and modules, and may be volatile memory such as Random-Access Memory (RAM); or non-volatile memory such as Read-Only Memory (ROM), flash memory, a Hard Disk Drive (HDD), or a Solid-State Drive (SSD); or a combination of the above kinds of memory; or a removable storage medium such as a Secure Digital (SD) memory card. In some embodiments of this application, memory 180 is used to store mobile phone 100's multi-channel audio data and the preset channel-downmix algorithm.
Power module 140 may include a power supply, power management components, etc. The power supply may be a battery. The power management components manage the charging of the power supply and its supply of power to other modules. The charging management module receives charging input from a charger; the power management module connects the power supply and the charging management module with processor 110.
Mobile communication module 130 may include, but is not limited to, an antenna, a power amplifier, a filter, a Low Noise Amplifier (LNA), etc. Mobile communication module 130 can provide solutions for wireless communication, including 2G/3G/4G/5G, applied on mobile phone 100. It can receive electromagnetic waves via the antenna, filter and amplify the received electromagnetic waves, and pass them to the modem processor for demodulation; it can also amplify signals modulated by the modem processor and convert them into electromagnetic waves radiated via the antenna. In some embodiments, at least some functional modules of mobile communication module 130 may be disposed in processor 110. In some embodiments, at least some functional modules of mobile communication module 130 may be disposed in the same device as at least some modules of processor 110.
Wireless communication module 120 may include an antenna and transmit and receive electromagnetic waves via the antenna. Wireless communication module 120 can provide solutions for wireless communication applied on mobile phone 100, including Wireless Local Area Networks (WLAN) (such as Wireless Fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), and Infrared (IR). Mobile phone 100 can communicate with networks and other devices via wireless communication technology.
In some embodiments, mobile communication module 130 and wireless communication module 120 of mobile phone 100 may also be located in the same module.
Camera 101 is used to capture still images or video. An object projects an optical image through the lens onto the photosensitive element, which converts the optical signal into an electrical signal and passes it to the ISP (Image Signal Processor) for conversion into a digital image signal. Mobile phone 100 can implement the shooting function via the ISP, camera 101, the video codec, the GPU (Graphics Processing Unit), display screen 102, the application processor, etc. For example, in some embodiments of this application, camera 101 captures face images and QR-code images for mobile phone 100 to perform face recognition, QR-code recognition, etc.
Display screen 102 includes a display panel. The display panel may be a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), an Active-Matrix Organic Light-Emitting Diode (AMOLED), a Flexible Light-Emitting Diode (FLED), a Mini LED, a Micro LED, a Micro OLED, Quantum Dot Light-Emitting Diodes (QLED), etc. For example, display screen 102 displays the UI of mobile phone 100 in landscape/portrait modes such as split screen, parallel windows, or a single app occupying the screen.
Sensor module 190 may include a proximity light sensor, a pressure sensor, a gyroscope sensor, a barometric sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, etc.
Audio module 150 can convert digital audio information into an analog audio signal for output, or convert analog audio input into a digital audio signal. Audio module 150 can also encode and decode audio signals. In some embodiments, audio module 150 may be disposed in processor 110, or some of its functional modules may be disposed in processor 110.
Interface module 160 includes an external-memory interface, a Universal Serial Bus (USB) interface, a Subscriber Identification Module (SIM) card interface, etc. The external-memory interface can connect an external memory card, such as a Micro SD card, to expand mobile phone 100's storage capacity; the external memory card communicates with processor 110 through the external-memory interface to implement the data storage function. The USB interface is used for mobile phone 100 to communicate with other phones. The SIM card interface communicates with a SIM card installed in mobile phone 100, e.g., reading a phone number stored in the SIM card or writing a phone number into the SIM card.
In some embodiments, mobile phone 100 also includes buttons, a motor, indicators, etc. The buttons may include volume buttons, a power button, etc. The motor is used to make mobile phone 100 produce vibration effects. The indicators may include laser indicators, radio-frequency indicators, LED indicators, etc.
Embodiments of the mechanisms disclosed in this application may be implemented in hardware, software, firmware, or a combination of these implementation approaches. Embodiments of this application may be implemented as a computer program or program code executing on a programmable system comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described in this application and to generate output information. The output information may be applied to one or more output devices in a known manner. For the purposes of this application, a processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
Program code may be implemented in a high-level procedural or object-oriented programming language to communicate with the processing system. When needed, program code may also be implemented in assembly or machine language. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language; in any case the language may be a compiled or interpreted language. In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, instructions may be distributed over a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy disks, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, Read-Only Memory (ROM), Random Access Memory (RAM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), magnetic or optical cards, flash memory, or tangible machine-readable storage used to transmit information over the Internet via electrical, optical, acoustic, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In addition, the technical solutions of this application also provide a computer-readable storage medium having instructions stored thereon that, when executed on electronic device 100, cause electronic device 100 to perform the multi-channel mixing method provided by the technical solutions of this application.
In addition, the technical solutions of this application also provide a computer program product including instructions used to implement the multi-channel mixing method provided by the technical solutions of this application.
In addition, the technical solutions of this application also provide a chip apparatus, the chip apparatus including: a communication interface for inputting and/or outputting information; and a processor for executing a computer-executable program, so that a device equipped with the chip apparatus performs the multi-channel mixing method provided by the technical solutions of this application.
In the drawings, some structural or methodological features may be shown in a particular arrangement and/or order. However, it should be understood that such a particular arrangement and/or order may not be required. Rather, in some embodiments these features may be arranged in a manner and/or order different from that shown in the illustrative drawings. In addition, the inclusion of a structural or methodological feature in a particular drawing does not imply that all embodiments require such a feature; in some embodiments these features may be omitted or combined with other features.
It should be noted that each unit/module mentioned in the device embodiments of this application is a logical unit/module. Physically, one logical unit/module may be one physical unit/module, part of one physical unit/module, or implemented as a combination of multiple physical units/modules. The physical implementation of these logical units/modules is not what matters most; the combination of functions implemented by these logical units/modules is the key to solving the technical problem raised by this application. Furthermore, to highlight the innovative part of this application, the device embodiments above do not introduce units/modules not closely related to solving the technical problem raised by this application; this does not mean that other units/modules do not exist in the above device embodiments.
It should be noted that, in the examples and description of this patent, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Although this application has been illustrated and described with reference to certain preferred embodiments thereof, those of ordinary skill in the art will understand that various changes may be made in form and detail without departing from the spirit and scope of this application.

Claims (16)

  1. A multi-channel mixing method, applied to an electronic device, characterized by comprising:
    obtaining first multi-channel audio data, the first multi-channel audio data comprising audio data of M channels to be mixed;
    determining that the first multi-channel audio data contains audio data with energy greater than a preset energy threshold, and performing energy-reduction processing on the audio data in the first multi-channel audio data whose energy is greater than the preset energy threshold;
    obtaining second multi-channel audio data according to a result of the energy-reduction processing;
    downmixing the second multi-channel audio data to obtain mixed output data having N mixed channels, wherein M>N and N≥1.
  2. The multi-channel mixing method according to claim 1, characterized in that determining that the first multi-channel audio data contains audio data with energy greater than the preset energy threshold comprises:
    performing framing on the first multi-channel audio data to obtain a plurality of audio frames, and determining frame energies of the plurality of audio frames;
    determining that the first multi-channel audio data contains a high-energy audio frame whose frame energy is greater than the preset energy threshold.
  3. The multi-channel mixing method according to claim 2, characterized in that the frame energy of the high-energy audio frame is determined by the following formulas:
    $e_i^{(n)}(k)=\big(x_i^{(n)}(k)\big)^2,\qquad E_i^{(n)}=\beta\,E_i^{(n-1)}+(1-\beta)\,\frac{1}{L}\sum_{k=0}^{L-1}e_i^{(n)}(k)$
    wherein the high-energy audio frame comprises L sampling points;
    β denotes a frame-energy smoothing coefficient;
    $x_i^{(n)}(k)$ denotes the audio data of the k-th sampling point in the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed;
    $e_i^{(n)}(k)$ denotes the energy of the k-th sampling point in the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed;
    $E_i^{(n)}$ denotes the frame energy of the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed.
  4. The multi-channel mixing method according to claim 2, characterized in that the preset energy threshold comprises a first threshold and/or a second threshold;
    the high-energy audio frame comprises at least one of the following:
    among the plurality of audio frames of the M channels to be mixed, an audio frame for which the average frame energy of the at least one audio frame with the same index corresponding to a same mixed channel is greater than the first threshold is the high-energy audio frame;
    among the audio frames of a same channel to be mixed, an audio frame for which the maximum frame energy of at least two audio frames contiguous with the corresponding audio frame is greater than the second threshold is the high-energy audio frame.
  5. The multi-channel mixing method according to claim 4, characterized in that the maximum frame energy of each audio frame of the M channels to be mixed is determined according to the frame energy of the audio frame having the largest frame energy among the audio frames that correspond to the same mixed channel as each audio frame and have the same index.
  6. The multi-channel mixing method according to claim 2, characterized in that performing energy-reduction processing on the audio data in the first multi-channel audio data whose energy is greater than the preset energy threshold comprises:
    determining a target gain of the high-energy audio frame, and determining a frame gain of the high-energy audio frame according to the target gain;
    determining, according to the frame gain of the high-energy audio frame, a target audio frame corresponding to the high-energy audio frame after energy-reduction processing.
  7. The multi-channel mixing method according to claim 6, characterized in that the target gain of the high-energy audio frame is determined according to the preset energy threshold and the maximum frame energy of at least two audio frames contiguous with each high-energy audio frame.
  8. The multi-channel mixing method according to claim 7, characterized in that the frame gain is determined by the following formula:
    $g_i^{(n)}=\alpha\,g_{i,\mathrm{target}}^{(n)}+(1-\alpha)\,g_i^{(n-1)}$
    wherein,
    α denotes a frame-gain smoothing coefficient;
    $g_{i,\mathrm{target}}^{(n)}$ denotes the target gain of the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed;
    $g_i^{(n-1)}$ denotes the frame gain of the (n−1)-th audio frame of the i-th channel to be mixed among the M channels to be mixed;
    $g_i^{(n)}$ denotes the frame gain of the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed.
  9. The multi-channel mixing method according to claim 6, characterized in that determining, according to the frame gain of the high-energy audio frame, the target audio frame corresponding to the high-energy audio frame after energy-reduction processing comprises:
    determining, according to the frame gain of the high-energy audio frame, a sampling-point gain of each sampling point in the high-energy audio frame;
    performing, according to each sampling-point gain, energy-reduction processing on the audio data of each sampling point in the high-energy audio frame, to obtain audio data of each sampling point in the target audio frame;
    generating the target audio frame from the audio data of each sampling point of the target audio frame.
  10. The multi-channel mixing method according to claim 9, characterized in that each sampling-point gain is determined by the following formula:
    $g_i^{(n)}(k)=g_i^{(n-1)}(k)+\frac{k+1}{\mathrm{FrameLen}}\left(g_i^{(n)}-g_i^{(n-1)}\right)$
    wherein,
    FrameLen denotes the frame length of the target audio frame;
    $g_i^{(n-1)}$ denotes the frame gain of the (n−1)-th audio frame of the i-th channel to be mixed among the M channels to be mixed;
    $g_i^{(n)}$ denotes the frame gain of the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed;
    $g_i^{(n-1)}(k)$ denotes the sampling-point gain of the k-th sampling point of the (n−1)-th audio frame of the i-th channel to be mixed among the M channels to be mixed;
    $g_i^{(n)}(k)$ denotes the sampling-point gain of the k-th sampling point of the n-th audio frame of the i-th channel to be mixed among the M channels to be mixed.
  11. The multi-channel mixing method according to claim 9, characterized in that the audio data of each sampling point in the target audio frame is determined from the audio data of each sampling point of the high-energy audio frame corresponding to the target audio frame and the corresponding sampling-point gain.
  12. The multi-channel mixing method according to claim 6, characterized in that obtaining the second multi-channel audio data according to the result of the energy-reduction processing comprises:
    generating the second multi-channel audio data from the target audio frame and low-energy audio frames in the first multi-channel audio data whose energy is not greater than the preset energy threshold.
  13. The multi-channel mixing method according to claim 12, characterized in that downmixing the second multi-channel audio data to obtain mixed output data having N second channels comprises:
    performing a weighted summation of the target audio frames and the low-energy audio frames in the second multi-channel audio data that correspond to a same second channel, to obtain the mixed output data.
  14. An electronic device, characterized by comprising:
    a memory for storing instructions to be executed by one or more processors of the electronic device, and
    a processor, being one of the processors of the electronic device, for controlling execution of the multi-channel mixing method according to any one of claims 1 to 13.
  15. A computer-readable storage medium, characterized in that instructions are stored on the storage medium which, when executed on a computer, cause the computer to perform the multi-channel mixing method according to any one of claims 1 to 13.
  16. A computer program product, characterized in that the computer program product comprises instructions used to implement the multi-channel mixing method according to any one of claims 1 to 13.
PCT/CN2023/087077 2022-04-15 2023-04-07 多通道的混音方法、设备及介质 WO2023197967A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210414876.5 2022-04-15
CN202210414876.5A CN116962955A (zh) 2022-04-15 2022-04-15 多通道的混音方法、设备及介质

Publications (1)

Publication Number Publication Date
WO2023197967A1 true WO2023197967A1 (zh) 2023-10-19

Family

ID=88329055

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/087077 WO2023197967A1 (zh) 2022-04-15 2023-04-07 多通道的混音方法、设备及介质

Country Status (2)

Country Link
CN (1) CN116962955A (zh)
WO (1) WO2023197967A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103188595A (zh) * 2011-12-31 2013-07-03 展讯通信(上海)有限公司 处理多声道音频信号的方法和系统
US20140236604A1 (en) * 2004-04-16 2014-08-21 Dolby International Ab Apparatus and method for generating a level parameter and apparatus and method for generating a multi-channel representation
US20190259395A1 (en) * 2016-11-08 2019-08-22 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for downmixing or upmixing a multichannel signal using phase compensation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140236604A1 (en) * 2004-04-16 2014-08-21 Dolby International Ab Apparatus and method for generating a level parameter and apparatus and method for generating a multi-channel representation
CN103188595A (zh) * 2011-12-31 2013-07-03 展讯通信(上海)有限公司 处理多声道音频信号的方法和系统
US20190259395A1 (en) * 2016-11-08 2019-08-22 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for downmixing or upmixing a multichannel signal using phase compensation

Also Published As

Publication number Publication date
CN116962955A (zh) 2023-10-27

Similar Documents

Publication Publication Date Title
US10237651B2 (en) Audio signal processing method and electronic device for supporting the same
AU2015284970B2 (en) Operating method for microphones and electronic device supporting the same
WO2020034779A1 (zh) 音频处理方法、存储介质及电子设备
US20200053464A1 (en) User interface for controlling audio zones
CN107240396B (zh) 说话人自适应方法、装置、设备及存储介质
CN114203163A (zh) 音频信号处理方法及装置
US11412341B2 (en) Electronic apparatus and controlling method thereof
CN110941415B (zh) 一种音频文件的处理方法、装置、电子设备及存储介质
CN109964272B (zh) 声场表示的代码化
WO2021114808A1 (zh) 音频处理方法、装置、电子设备和存储介质
US10387101B2 (en) Electronic device for providing content and control method therefor
CN110600041A (zh) 一种声纹识别的方法及设备
US11822854B2 (en) Automatic volume adjustment method and apparatus, medium, and device
CN110827808A (zh) 语音识别方法、装置、电子设备和计算机可读存储介质
US20230260525A1 (en) Transform ambisonic coefficients using an adaptive network for preserving spatial direction
CN111508510A (zh) 音频处理方法、装置、存储介质及电子设备
KR102565447B1 (ko) 청각 인지 속성에 기반하여 디지털 오디오 신호의 이득을 조정하는 전자 장치 및 방법
US20220329966A1 (en) Electronic apparatus and controlling method thereof
CN114845212A (zh) 音量优化方法、装置、电子设备及可读存储介质
CN108829370B (zh) 有声资源播放方法、装置、计算机设备及存储介质
WO2023197967A1 (zh) 多通道的混音方法、设备及介质
WO2023202522A1 (zh) 播放速度控制方法和电子设备
CN114040317B (zh) 音响的声道补偿方法及装置、电子设备和存储介质
KR20160079577A (ko) 메타 데이터를 이용하여 컨텐츠를 재생하는 방법 및 장치
KR20200078184A (ko) 전자 장치 및 그 제어 방법

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23787611

Country of ref document: EP

Kind code of ref document: A1