WO2019200996A1 - Multi-channel audio processing method, apparatus, and computer-readable storage medium - Google Patents

Multi-channel audio processing method, apparatus, and computer-readable storage medium

Info

Publication number
WO2019200996A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
characteristic
channel
processed
channel audio
Prior art date
Application number
PCT/CN2019/073021
Other languages
English (en)
French (fr)
Inventor
黄传增
Original Assignee
北京微播视界科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京微播视界科技有限公司
Publication of WO2019200996A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00: Signal processing covered by H04R, not provided for in its groups

Definitions

  • the present invention relates to the field of audio technology, and more particularly to a multi-channel audio processing method, apparatus, and computer readable storage medium.
  • audio is increasingly used as an interactive information dissemination carrier.
  • users are beginning to pay more and more attention to the audio experience.
  • the prior art generally processes mono audio only.
  • methods for processing mono audio do not take into account the characteristics of each channel's audio in multi-channel audio; as a result, applying the existing mono processing methods to multi-channel audio does not give a good user experience.
  • the present invention addresses the above shortcomings of the prior art and proposes a multi-channel audio processing method that provides a good user experience and effectively overcomes the above problems.
  • the main object of the present invention is to provide a multi-channel audio processing method that at least partially solves the technical problem of how to obtain a good user experience; in addition, a multi-channel audio processing apparatus, a multi-channel audio processing hardware device, and a computer-readable storage medium are also provided.
  • a multi-channel audio processing method comprising:
  • processing the multi-channel audio to be processed according to the audio characteristics of each channel's audio.
  • if the multi-channel audio to be processed is offline audio, detecting the overall audio characteristics of each channel's audio in the multi-channel audio to be processed;
  • if the multi-channel audio to be processed is online audio, detecting the local audio characteristics of each channel's audio in the multi-channel audio to be processed.
  • if the multi-channel audio to be processed is offline audio, the step of processing the multi-channel audio to be processed according to the audio characteristics of each channel's audio includes:
  • the multi-channel audio to be processed is processed based on the first audio processing parameter.
  • the overall audio characteristics include pitch characteristics, sound formant characteristics, and transient sound pulse characteristics
  • determining the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, and the transient sound pulse characteristic; wherein the first audio processing parameter includes a fundamental frequency amplitude, a formant amplitude, and a transient pulse;
  • the step of processing the multi-channel audio to be processed based on the first audio processing parameter includes:
  • the fundamental frequency amplitude is adjusted, and the formant amplitude is smoothed, and the transient pulse is clipped.
  • the overall audio characteristics include pitch characteristics and sound formant characteristics
  • the first audio processing parameter comprises a fundamental frequency amplitude and a formant amplitude
  • the step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes:
  • the fundamental frequency amplitude is adjusted and the formant amplitude is smoothed.
  • the overall audio characteristics include pitch characteristics and transient sound pulse characteristics
  • the first audio processing parameter comprises a fundamental frequency amplitude and a transient pulse
  • the step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes:
  • the fundamental frequency amplitude is adjusted and the transient pulse is clipped.
  • the overall audio characteristics include pitch characteristics, sound formant characteristics, transient sound pulse characteristics, and audio phase characteristics;
  • determining the first audio processing parameter based on the pitch characteristic, the sound formant characteristic, the transient sound pulse characteristic, and the audio phase characteristic; wherein the first audio processing parameter includes a fundamental frequency amplitude, a formant amplitude, a transient pulse, and an audio phase;
  • the step of processing the multi-channel audio to be processed based on the first audio processing parameter includes:
  • the fundamental frequency amplitude is adjusted, and the formant amplitude is smoothed, and the transient pulse is clipped and the audio phase is adjusted.
  • the overall audio characteristics include multi-channel audio downmix characteristics and primary side channel characteristics
  • the first audio processing parameter comprises: strong audio correlation, fundamental frequency amplitude, and formant amplitude
  • the step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes:
  • the multi-channel audio to be processed is processed based on the second audio processing parameter.
  • the present invention also provides a multi-channel audio processing device, including:
  • a receiving module configured to receive multi-channel audio to be processed
  • a detecting module configured to detect audio characteristics of each channel audio in the multi-channel audio to be processed
  • a processing module configured to process the multi-channel audio to be processed according to audio characteristics of the audio of each channel.
  • the detection module includes:
  • a first detecting unit configured to detect an overall audio characteristic of each channel audio in the to-be-processed multi-channel audio in a case where the to-be-processed multi-channel audio is offline audio;
  • a second detecting unit configured to detect a local audio characteristic of each channel audio in the to-be-processed multi-channel audio in a case where the to-be-processed multi-channel audio is online audio.
  • the processing module includes:
  • a first determining unit configured to determine a first audio processing parameter according to the overall audio characteristic
  • a first processing unit configured to process the multi-channel audio to be processed based on the first audio processing parameter.
  • the overall audio characteristics include pitch characteristics, sound formant characteristics, and transient sound pulse characteristics
  • the first determining unit is specifically configured to determine the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, and the transient sound pulse characteristic; wherein the first audio processing parameter includes a fundamental frequency amplitude, a formant amplitude, and a transient pulse;
  • the first processing unit is specifically configured to adjust the fundamental frequency amplitude, smooth the formant amplitude, and perform clipping processing on the transient pulse.
  • the overall audio characteristics include pitch characteristics and sound formant characteristics
  • the first determining unit is specifically configured to determine the first audio processing parameter according to the pitch characteristic and the sound formant characteristic; wherein the first audio processing parameter includes a fundamental frequency amplitude and a formant amplitude;
  • the first processing unit is specifically configured to adjust the fundamental frequency amplitude and smooth the formant amplitude.
  • the overall audio characteristics include pitch characteristics and transient sound pulse characteristics
  • the first determining unit is specifically configured to determine the first audio processing parameter according to the pitch characteristic and the transient sound pulse characteristic; wherein the first audio processing parameter includes a fundamental frequency amplitude and a transient pulse;
  • the first processing unit is specifically configured to adjust the fundamental frequency amplitude and perform clipping processing on the transient pulse.
  • the overall audio characteristics include pitch characteristics, sound formant characteristics, transient sound pulse characteristics, and audio phase characteristics;
  • the first determining unit is specifically configured to determine the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, the transient sound pulse characteristic, and the audio phase characteristic; wherein the first audio processing parameter includes a fundamental frequency amplitude, a formant amplitude, a transient pulse, and an audio phase;
  • the first processing unit is specifically configured to adjust the fundamental frequency amplitude, smooth the formant amplitude, perform clipping processing on the transient pulse, and adjust the audio phase.
  • the overall audio characteristics include multi-channel audio downmix characteristics and primary side channel characteristics
  • the first determining unit is specifically configured to determine the first audio processing parameter according to the multi-channel audio downmix characteristic and the primary side channel characteristic; wherein the first audio processing parameter comprises: strong audio correlation, fundamental frequency amplitude, and formant amplitude;
  • the first processing unit is specifically configured to perform joint processing on all channel audio in the multi-channel audio to be processed, adjust the fundamental frequency amplitude, and smooth the formant amplitude.
  • if the multi-channel audio to be processed is online audio, the processing module further includes:
  • a second determining unit configured to determine a second audio processing parameter according to the local audio characteristic
  • a second processing unit configured to process the multi-channel audio to be processed based on the second audio processing parameter.
  • the present invention also provides a multi-channel audio processing hardware device, including:
  • a memory for storing non-transitory computer readable instructions
  • a processor for executing the computer-readable instructions such that, when they are executed, the processor implements the multi-channel audio processing method described above.
  • the present invention further provides a computer-readable storage medium for storing non-transitory computer-readable instructions which, when executed by a computer, cause the computer to execute the multi-channel audio processing method described above.
  • Embodiments of the present invention provide a multi-channel audio processing method, apparatus, and computer readable storage medium.
  • the multi-channel audio processing method includes: receiving multi-channel audio to be processed; detecting the audio characteristics of each channel's audio in the multi-channel audio to be processed; and processing the multi-channel audio to be processed according to the audio characteristics of each channel's audio.
  • the embodiment of the present invention adopts the above technical solution, and performs corresponding processing according to the audio characteristics of each channel audio in the multi-channel audio to be processed, thereby obtaining a good user experience.
  • FIG. 1 is a flow chart showing a multi-channel audio processing method according to an embodiment of the present invention
  • FIG. 2 is a schematic flow chart of detecting offline audio and online audio, respectively, according to an embodiment of the present invention
  • FIG. 3 is a schematic flow chart of processing for offline audio according to an embodiment of the present invention.
  • FIG. 4 is a schematic flow chart of a multi-channel audio processing method according to an embodiment of the present invention.
  • FIG. 5 is a schematic flowchart of processing for online audio according to an embodiment of the present invention.
  • FIG. 6 is a schematic flow chart of a multi-channel audio processing method according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a multi-channel audio processing apparatus according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a processing module according to an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of a processing module according to another embodiment of the present invention.
  • FIG. 10 is a schematic structural diagram of a multi-channel audio processing hardware device according to an embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of a computer readable storage medium according to an embodiment of the present invention.
  • FIG. 12 is a schematic structural diagram of a multi-channel audio processing terminal according to an embodiment of the present invention.
  • FIG. 13 is a schematic structural diagram of a multi-channel audio processing terminal according to another embodiment of the present invention.
  • the embodiment of the present invention provides a multi-channel audio processing method. As shown in FIG. 1, the method may include the following steps S1 to S3:
  • Step S1: receiving the multi-channel audio to be processed.
  • the multi-channel audio to be processed may be offline audio or online audio, which is not limited by the present invention.
  • multi-channel audio includes but is not limited to 3.1 channel audio, 5.1 channel audio, 7.1 channel audio, and the like.
  • Step S2: detecting the audio characteristics of each channel's audio in the multi-channel audio to be processed.
  • audio characteristics include, but are not limited to, pitch characteristics, sound formant characteristics, transient sound pulse characteristics, audio phase characteristics, multi-channel audio downmix characteristics, main side channel characteristics, and the like.
  • one or several audio characteristics can be detected.
  • Step S3: processing the multi-channel audio to be processed according to the detection result.
  • this step processes the multi-channel audio to be processed according to the one or more audio characteristics detected for each channel's audio.
  • the manner of processing the multi-channel audio to be processed includes, but is not limited to, joint processing, separation processing, smoothing processing, audio phase processing, fundamental frequency processing, zeroing processing, spectrum stretching processing, clipping processing, and the like.
  • the above joint processing refers to processing the audio of all channels together
  • the above separation processing refers to processing the audio of each channel separately
  • the smoothing process filters out abrupt data points in the frequency domain, that is, it smooths the spike data at the peaks of the spectrum. In a specific implementation, the neighborhood averaging method, the Gaussian smoothing method, the parabolic smoothing method, and so on may be adopted.
  • taking the neighborhood averaging method as an example, a sliding window is used to smooth the amplitude of the frequency signal in the spectrum; the weights are calculated in the form of a Gaussian distribution function, and linear smoothing is performed with these weights. This smoothing may be applied to the full frequency band of the audio or to part of the frequency band. Smoothing the formants of the audio achieves a tone-change effect;
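The smoothing described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the window width, and the `sigma` value are assumptions, and the input is taken to be a plain list of spectral magnitudes.

```python
import math

def neighborhood_average(magnitudes, window=3):
    """Smooth a magnitude spectrum with a centered sliding-window mean.

    The window is truncated at the edges of the spectrum.
    """
    half = window // 2
    n = len(magnitudes)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        smoothed.append(sum(magnitudes[lo:hi]) / (hi - lo))
    return smoothed

def gaussian_smooth(magnitudes, window=3, sigma=1.0):
    """Smooth with sliding-window weights shaped like a Gaussian.

    Weights are renormalized at the edges so a flat spectrum stays flat.
    """
    half = window // 2
    offsets = range(-half, half + 1)
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    n = len(magnitudes)
    smoothed = []
    for i in range(n):
        acc = 0.0
        norm = 0.0
        for k, w in zip(offsets, weights):
            j = i + k
            if 0 <= j < n:
                acc += w * magnitudes[j]
                norm += w
        smoothed.append(acc / norm)
    return smoothed

# A spectrum with one sharp peak (a stand-in for a formant spike).
spectrum = [1.0, 1.0, 9.0, 1.0, 1.0]
print(neighborhood_average(spectrum))
print(gaussian_smooth(spectrum))
```

Both functions lower the spike at index 2; the Gaussian weights keep more of the peak than the flat average because the center bin is weighted most heavily. To smooth only part of the band, the same functions could be applied to a slice of the spectrum.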
  • the above-mentioned fundamental frequency processing refers to adjusting the fundamental frequency of the audio, thereby realizing the effect of transposition
  • the audio phase processing refers to adjusting the phase of the audio, and specifically, adjusting according to the audio phase corresponding to the predetermined sound effect;
  • the above zeroing process refers to the elimination of the spectrum corresponding to the transient pulse in the entire frequency band of the audio.
  • the above-mentioned spectrum stretching processing refers to stretching the spectrum by interpolating or extracting the audio spectrum; this processing can achieve the shifting effect.
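As a rough sketch of spectrum stretching by interpolation or extraction, the following resamples a magnitude spectrum with linear interpolation. The function name `stretch_spectrum`, the `factor` parameter, and the choice to keep the output the same length as the input are assumptions made for illustration only.

```python
def stretch_spectrum(magnitudes, factor):
    """Resample a magnitude spectrum so content at bin k moves to about bin k * factor.

    factor > 1 stretches the spectrum (values are interpolated between bins);
    factor < 1 compresses it (bins are effectively extracted at a coarser step).
    Output bins that map beyond the last input bin are set to 0.0.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n = len(magnitudes)
    if n < 2:
        return list(magnitudes)
    out = []
    for k in range(n):
        src = k / factor              # fractional position in the input spectrum
        if src > n - 1:
            out.append(0.0)
            continue
        i = min(int(src), n - 2)      # left neighbor used for interpolation
        frac = src - i
        out.append((1.0 - frac) * magnitudes[i] + frac * magnitudes[i + 1])
    return out

peak_at_2 = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(stretch_spectrum(peak_at_2, 2.0))   # peak moves from bin 2 to bin 4
print(stretch_spectrum(peak_at_2, 0.5))   # peak moves from bin 2 to bin 1
```

This works on magnitudes only; a full implementation would also have to handle phase before converting back to the time domain, which is outside this sketch.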
  • the above clipping process refers to reducing the amplitude of the transient pulse.
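A minimal sketch of clipping follows. The source does not say how a transient pulse is identified, so this example assumes a simple rule: any sample whose magnitude exceeds `ratio` times the median magnitude is treated as a transient and reduced to that limit. The function name and the `ratio` parameter are hypothetical.

```python
import statistics

def clip_transients(samples, ratio=4.0):
    """Reduce the amplitude of outlier samples, keeping their sign.

    The limit is ratio * median(|sample|); samples within the limit are unchanged.
    Raises statistics.StatisticsError if `samples` is empty.
    """
    limit = ratio * statistics.median(abs(s) for s in samples)
    return [max(-limit, min(limit, s)) for s in samples]

signal = [0.1, -0.1, 0.1, 2.0, -0.1]
print(clip_transients(signal))
```

Here only the 2.0 sample is reduced; the rest of the signal passes through untouched.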
  • the embodiment of the present invention may adopt one or more of the foregoing processing manners.
  • speed mode, also called processing-speed priority
  • quality mode, also called high-sound-quality priority
  • balance mode, which balances processing speed and high sound quality
  • it is also possible to achieve effects such as changing both speed and pitch, changing speed without changing pitch, and changing pitch without changing speed.
  • the embodiment of the present invention adopts the above technical solution, and performs corresponding processing according to the audio characteristics of each channel audio in the multi-channel audio to be processed, thereby obtaining a good user experience.
  • the step S2 may specifically include:
  • Step S21: if the multi-channel audio to be processed is offline audio, detecting the overall audio characteristics of each channel's audio in the multi-channel audio to be processed;
  • Step S22: if the multi-channel audio to be processed is online audio, detecting the local audio characteristics of each channel's audio in the multi-channel audio to be processed.
  • for online audio, the received audio is one segment of the audio; therefore, the characteristics detected for online audio are local audio characteristics.
  • for offline audio, the characteristics detected are overall audio characteristics, to ensure a good user experience after audio processing.
  • the overall audio characteristics include, but are not limited to, pitch characteristics, sound formant characteristics, transient sound pulse characteristics, audio phase characteristics, multi-channel audio downmix characteristics, main side channel characteristics, and the like.
  • the local audio characteristics include all or some of the overall audio characteristics listed above and are not described again here.
  • the embodiment of the present invention detects local audio characteristics for an online sound source and overall audio characteristics for an offline sound source, thereby realizing adaptive audio characteristic detection and adaptive processing for different sound sources, which can improve the user experience.
  • the multi-channel audio processing method may further include:
  • the determination may be made from the respective characteristics of offline audio and online audio.
  • for example, offline audio is complete audio,
  • whereas online audio may arrive as successive packets transmitted through a real-time message transmission protocol; on this basis it can be determined whether the multi-channel audio to be processed is offline audio or online audio. An identification flag may also be added in advance to indicate whether the multi-channel audio to be processed is offline audio or online audio.
  • the invention is not limited thereto.
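The two ways of telling offline audio from online audio described above (an identification flag added in advance, or the nature of the source itself) could be combined as in this small sketch. The `AudioInput` type, its field names, and the `"offline"`/`"online"` strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AudioInput:
    is_complete: bool                 # True when the whole recording is available
    mode_flag: Optional[str] = None   # hypothetical pre-attached tag: "offline" or "online"

def classify_source(audio: AudioInput) -> str:
    """Return "offline" or "online" for the given input.

    An explicit flag wins when present; otherwise complete audio is treated
    as offline and segment-by-segment audio as online.
    """
    if audio.mode_flag in ("offline", "online"):
        return audio.mode_flag
    return "offline" if audio.is_complete else "online"

print(classify_source(AudioInput(is_complete=True)))
print(classify_source(AudioInput(is_complete=False)))
print(classify_source(AudioInput(is_complete=True, mode_flag="online")))
```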
  • the embodiments of the present invention can adapt to different multi-channel audio application scenarios by performing corresponding processing on the offline audio and the online audio respectively, thereby obtaining a better user experience.
  • the step S3 specifically includes:
  • Step S31: determining a first audio processing parameter according to the overall audio characteristics
  • Step S32: processing the multi-channel audio to be processed according to the first audio processing parameter.
  • the first audio processing parameters include, but are not limited to, the audio correlation strength between the channels, the fundamental frequency amplitude, the formant amplitude, the transient pulse, the audio envelope, and the like.
  • if the audio correlation is strong, the audio of all channels in the multi-channel audio to be processed is processed jointly; if the audio correlation is weak, the audio of each channel in the multi-channel audio to be processed is processed separately (that is, separation processing).
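The choice between joint and separation processing can be sketched as below. The source does not define how correlation strength is measured, so this example assumes the Pearson correlation coefficient between channel sample sequences and an arbitrary threshold of 0.8; both are illustrative assumptions, as are the function names.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def choose_strategy(channels, threshold=0.8):
    """Return "joint" if every pair of channels is strongly correlated, else "separate"."""
    for i in range(len(channels)):
        for j in range(i + 1, len(channels)):
            if abs(pearson(channels[i], channels[j])) < threshold:
                return "separate"
    return "joint"

left = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
right = [0.5 * s for s in left]                       # same content at lower level
other = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]   # shifted by a quarter period

print(choose_strategy([left, right]))
print(choose_strategy([left, other]))
```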
  • the step of determining the first audio processing parameter specifically includes: determining the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, and the transient sound pulse characteristic, wherein the first audio processing parameter includes a fundamental frequency amplitude, a formant amplitude, and a transient pulse; the step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes: adjusting the fundamental frequency amplitude, smoothing the formant amplitude, and clipping the transient pulse.
  • the step of determining the first audio processing parameter according to the overall audio characteristics comprises: determining the first audio processing parameter according to the pitch characteristic and the sound formant characteristic, wherein the first audio processing parameter includes a fundamental frequency amplitude and a formant amplitude; the step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes: adjusting the fundamental frequency amplitude and smoothing the formant amplitude.
  • the step of determining the first audio processing parameter based on the overall audio characteristics comprises: determining the first audio processing parameter based on the pitch characteristic and the transient sound pulse characteristic, wherein the first audio processing parameter includes a fundamental frequency amplitude and a transient pulse; the step of processing the multi-channel audio to be processed based on the first audio processing parameter comprises: adjusting the fundamental frequency amplitude and clipping the transient pulse.
  • the step of determining the first audio processing parameter may specifically include: determining the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, the transient sound pulse characteristic, and the audio phase characteristic, wherein the first audio processing parameter includes a fundamental frequency amplitude, a formant amplitude, a transient pulse, and an audio phase; the step of processing the multi-channel audio to be processed according to the first audio processing parameter may specifically include: adjusting the fundamental frequency amplitude, smoothing the formant amplitude, clipping the transient pulse, and adjusting the audio phase.
  • the step of determining the first audio processing parameter comprises: determining the first audio processing parameter according to the multi-channel audio downmix characteristic and the primary side channel characteristic, wherein the first audio processing parameter comprises: strong audio correlation, fundamental frequency amplitude, and formant amplitude; the step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes: performing joint processing on all channel audio in the multi-channel audio to be processed, adjusting the fundamental frequency amplitude, and smoothing the formant amplitude.
  • whether the audio correlation is strong can be determined from factors including, but not limited to, the spectral characteristics of each channel's audio, the sound quality of each channel's audio source, and the way each channel's audio is collected.
  • joint processing may be adopted when processing the multi-channel audio to be processed; if the audio of each channel is collected by a separate microphone, separation processing may be adopted when processing the multi-channel audio to be processed; if the spectral characteristics of each channel's audio are good, joint processing may be adopted; if they are not, separation processing may be adopted; if the formant amplitude is greater than a formant threshold, the formants contained in the multi-channel audio to be processed are smoothed; if the audio envelope is offset, the amplitudes of the fundamental frequency and the formants in the frequency domain of the multi-channel audio to be processed are adjusted.
  • in this embodiment, the corresponding first audio processing parameter is determined according to the overall audio characteristics of the offline multi-channel audio to be processed, and adaptive processing is then performed according to the determined first audio processing parameter to achieve different audio effects. For example, adjusting the fundamental frequency amplitude achieves an effect on the tone of the sound; smoothing the formant amplitude achieves a tone effect; and offsetting the audio envelope achieves a transposition effect. The technical effect of adaptively transforming the audio is thereby achieved, so that a good user experience can be obtained by the embodiment of the present invention.
  • Step Sa1: receiving the multi-channel audio to be processed
  • Step Sa2: if the multi-channel audio to be processed is offline audio, detecting the overall audio characteristics of each channel's audio in the multi-channel audio to be processed;
  • Step Sa3: determining strong-correlation audio processing parameters according to the overall audio characteristics
  • Step Sa4: jointly processing the multi-channel audio to be processed according to the strong-correlation audio processing parameters.
  • in this embodiment, the offline multi-channel audio to be processed is received and its overall audio characteristics are detected; the strong-correlation audio processing parameters are then determined as the processing parameters for the multi-channel audio to be processed; finally, the corresponding joint processing is performed according to the strong-correlation audio processing parameters, thereby implementing adaptive processing and obtaining a good user experience.
  • the step S3 specifically includes:
  • Step S33: determining a second audio processing parameter according to the local audio characteristics
  • Step S34: processing the multi-channel audio to be processed according to the second audio processing parameter.
  • the second audio processing parameter may be part or all of the above first audio processing parameters.
  • by adopting the above technical solution, the embodiment of the present invention determines the second audio processing parameter corresponding to the local audio characteristics of the online multi-channel audio to be processed, and then performs adaptive processing according to the determined second audio processing parameter, so that different audio effects can be obtained. For example, adjusting the fundamental frequency amplitude achieves an effect on the tone of the sound; smoothing the formant amplitude achieves a tone effect; and offsetting the audio envelope achieves a transposition effect. The technical effect of adaptively transforming the audio is thereby achieved, so that a good user experience can be obtained by the embodiment of the present invention.
  • an embodiment of the present invention provides a multi-channel audio processing method, including:
  • Step Sb1: receiving the multi-channel audio to be processed
  • Step Sb2: determining whether the multi-channel audio to be processed is offline audio or online audio; if it is offline audio, performing step Sb3; if it is online audio, performing step Sb4;
  • Step Sb3: detecting the overall audio characteristics of each channel's audio in the multi-channel audio to be processed, and performing step Sb5;
  • Step Sb4: detecting the local audio characteristics of each channel's audio in the multi-channel audio to be processed, and performing step Sb7;
  • Step Sb5: determining the first audio processing parameter according to the overall audio characteristics, and performing step Sb6;
  • Step Sb6: processing the multi-channel audio to be processed according to the first audio processing parameter
  • Step Sb7: determining the second audio processing parameter according to the local audio characteristics, and performing step Sb8;
  • Step Sb8: processing the multi-channel audio to be processed according to the second audio processing parameter.
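The branching in steps Sb1 to Sb8 can be sketched as follows. To keep the example small, the "audio characteristic" is just the peak magnitude of each channel and the "processing parameter" is a per-channel gain; these stand-ins, the function names, and the `target` level are assumptions and not the characteristics or parameters named in the text.

```python
def detect_peaks(channels):
    """Peak magnitude of each channel (stand-in for an audio characteristic)."""
    return [max((abs(s) for s in ch), default=0.0) for ch in channels]

def gains_from_peaks(peaks, target=0.5):
    """Per-channel gain that brings peaks above `target` down to `target`."""
    return [target / p if p > target else 1.0 for p in peaks]

def apply_gains(channels, gains):
    return [[s * g for s in ch] for ch, g in zip(channels, gains)]

def process_offline(channels):
    overall = detect_peaks(channels)         # Sb3: overall characteristics
    params = gains_from_peaks(overall)       # Sb5: first processing parameter
    return apply_gains(channels, params)     # Sb6

def process_online(segments):
    processed = []
    for segment in segments:                 # each segment holds all channels
        local = detect_peaks(segment)        # Sb4: local characteristics
        params = gains_from_peaks(local)     # Sb7: second processing parameter
        processed.append(apply_gains(segment, params))  # Sb8
    return processed

def process_multichannel(audio, is_offline):
    """Sb1/Sb2: receive the audio and dispatch on its source type."""
    return process_offline(audio) if is_offline else process_online(audio)

# Two channels, two samples each; offline sees the whole file at once.
print(process_multichannel([[0.25, 1.0], [0.1, 0.2]], is_offline=True))
# The same audio arriving as two one-sample segments.
print(process_multichannel([[[0.25], [0.1]], [[1.0], [0.2]]], is_offline=False))
```

The first sample of channel 0 is scaled in the offline case, because the gain is derived from the whole channel, but left unchanged in the online case, where only the current segment is visible. That difference is what "overall" versus "local" characteristics amounts to in this toy setting.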
  • the embodiment of the present invention separately determines and processes the corresponding audio processing parameters according to the overall audio characteristics and the local audio characteristics for the offline audio source and the online audio source, thereby implementing adaptive audio processing. This results in a good user experience.
  • the following is a device embodiment of the present invention.
  • the device embodiment of the present invention is used to perform the steps of implementing the method embodiment of the present invention.
  • for specific technical details not disclosed here, refer to the method embodiment of the present invention.
  • the embodiment of the present invention further provides a multi-channel audio processing device based on the same technical concept as the above method embodiment.
  • the apparatus includes: a receiving module 71, a detecting module 72, and a processing module 73.
  • the receiving module 71 is configured to receive multi-channel audio to be processed.
  • the detecting module 72 is configured to detect audio characteristics of each channel audio in the multi-channel audio to be processed.
  • the processing module 73 is configured to process the multi-channel audio to be processed according to the audio characteristics of each channel audio.
  • the embodiment of the present invention adopts the above technical solution: the processing module 73 performs corresponding processing according to the audio characteristics, detected by the detecting module 72, of each channel's audio in the multi-channel audio to be processed, thereby obtaining a good user experience.
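The three-module structure (receiving module 71, detecting module 72, processing module 73) can be pictured as three small classes wired together. This is only a structural sketch: the class and method names are hypothetical, and RMS level with gain normalization stands in for the real characteristics and processing.

```python
import math

class ReceivingModule:
    def receive(self, channels):
        """Take ownership of the incoming multi-channel audio (copy each channel)."""
        return [list(ch) for ch in channels]

class DetectingModule:
    def detect(self, channels):
        """RMS level per channel, standing in for an audio characteristic."""
        return [math.sqrt(sum(s * s for s in ch) / len(ch)) if ch else 0.0
                for ch in channels]

class ProcessingModule:
    def __init__(self, target_rms=0.1):
        self.target_rms = target_rms

    def process(self, channels, levels):
        """Scale each non-silent channel to the target RMS level."""
        out = []
        for ch, level in zip(channels, levels):
            gain = self.target_rms / level if level > 0 else 1.0
            out.append([s * gain for s in ch])
        return out

class MultiChannelAudioDevice:
    def __init__(self):
        self.receiver = ReceivingModule()
        self.detector = DetectingModule()
        self.processor = ProcessingModule()

    def run(self, channels):
        received = self.receiver.receive(channels)
        levels = self.detector.detect(received)
        return self.processor.process(received, levels)

device = MultiChannelAudioDevice()
print(device.run([[0.2, -0.2], [0.0, 0.0]]))
```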
  • the detecting module may specifically include:
  • a first detecting unit configured to detect an overall audio characteristic of each channel audio in the multi-channel audio to be processed if the multi-channel audio to be processed is offline audio;
  • a second detecting unit configured to detect local audio characteristics of each channel audio in the multi-channel audio to be processed in a case where the multi-channel audio to be processed is online audio.
  • because online audio is streaming media, the received audio arrives segment by segment. Therefore, the characteristics detected for this online audio are local audio characteristics.
  • because offline audio is complete, pre-encoded audio, the characteristics detected for the offline audio are overall audio characteristics, which ensures a good user experience after audio processing.
  • the overall audio characteristics include, but are not limited to, pitch characteristics, sound formant characteristics, transient sound pulse characteristics, audio phase characteristics, multi-channel audio downmix characteristics, main/side channel characteristics, and the like.
  • the local audio characteristics include all or part of the overall audio characteristics, and details are not described herein.
  • the first detecting unit and the second detecting unit detect the overall audio characteristics when the multi-channel audio to be processed is offline audio and the local audio characteristics when it is online audio, so as to implement adaptive processing of multiple sound sources and give the user a good experience.
  • the processing module specifically includes a first determining unit 81 and a first processing unit 82.
  • the first determining unit 81 is configured to determine the first audio processing parameter according to the overall audio characteristic.
  • the first processing unit 82 is configured to process the multi-channel audio to be processed based on the first audio processing parameters.
  • the first audio processing parameters include, but are not limited to, the audio correlation strength between the channels, the fundamental frequency amplitude, the formant amplitude, the transient pulse, the audio envelope, and the like.
  • for example, if the audio correlation is strong, the audio of each channel in the multi-channel audio to be processed is processed jointly; if the audio correlation is weak, the audio of each channel in the multi-channel audio to be processed is processed separately (i.e., separation processing).
  • the overall audio characteristics include pitch characteristics, sound formant characteristics, and transient sound pulse characteristics; the first determining unit 81 is specifically configured to determine the first audio processing parameter according to the pitch characteristics, sound formant characteristics, and transient sound pulse characteristics; wherein the first audio processing parameter includes a fundamental frequency amplitude, a formant amplitude, and a transient pulse; the first processing unit 82 is specifically configured to adjust the fundamental frequency amplitude, smooth the formant amplitude, and clip the transient pulse.
  • the overall audio characteristics include a pitch characteristic and a sound formant characteristic; the first determining unit 81 may also be specifically configured to determine the first audio processing parameter according to the pitch characteristic and the sound formant characteristic; wherein the first audio processing parameter includes a fundamental frequency amplitude and a formant amplitude; the first processing unit 82 may also be specifically configured to adjust the fundamental frequency amplitude and smooth the formant amplitude.
  • the overall audio characteristics include a pitch characteristic and a transient sound pulse characteristic; the first determining unit 81 may also be specifically configured to determine the first audio processing parameter according to the pitch characteristic and the transient sound pulse characteristic; wherein the first audio processing parameter includes a fundamental frequency amplitude and a transient pulse; the first processing unit 82 may also be specifically configured to adjust the fundamental frequency amplitude and clip the transient pulse.
  • the overall audio characteristics include pitch characteristics, sound formant characteristics, transient sound pulse characteristics, and audio phase characteristics; the first determining unit 81 may also be specifically configured to determine the first audio processing parameter according to the pitch characteristics, sound formant characteristics, transient sound pulse characteristics, and audio phase characteristics; wherein the first audio processing parameter includes a fundamental frequency amplitude, a formant amplitude, a transient pulse, and an audio phase; the first processing unit 82 may also be specifically configured to adjust the fundamental frequency amplitude, smooth the formant amplitude, clip the transient pulse, and adjust the audio phase.
  • the overall audio characteristics include a multi-channel audio downmix characteristic and a main/side channel characteristic; the first determining unit 81 may also be specifically configured to determine the first audio processing parameter according to the multi-channel audio downmix characteristic and the main/side channel characteristic; wherein the first audio processing parameter includes strong audio correlation, a fundamental frequency amplitude, and a formant amplitude; the first processing unit 82 may also be specifically configured to jointly process the audio of all channels in the multi-channel audio to be processed, adjusting the fundamental frequency amplitude and smoothing the formant amplitude.
  • the processing module further includes a second determining unit 91 and a second processing unit 92.
  • the second determining unit 91 is configured to determine the second audio processing parameter according to the local audio characteristic.
  • the second processing unit 92 is configured to process the multi-channel audio to be processed based on the second audio processing parameter.
  • the second audio processing parameter may be part or all of the above first audio processing parameters.
  • FIG. 10 shows a schematic structural diagram of a multi-channel audio processing hardware device according to an embodiment of the present disclosure.
  • the multi-channel audio processing hardware device 10 includes a memory 101 and a processor 102.
  • the memory 101 is configured to store non-transitory computer readable instructions; the processor 102 is configured to execute the computer readable instructions such that the processor implements the multi-channel audio processing method embodiments described above.
  • the memory 101 is used to store non-transitory computer readable instructions.
  • memory 101 may include one or more computer program products, which may include various forms of computer readable storage media, such as volatile memory and/or nonvolatile memory.
  • the volatile memory may include, for example, random access memory (RAM) and/or cache or the like.
  • the nonvolatile memory may include, for example, a read only memory (ROM), a hard disk, a flash memory, or the like.
  • the processor 102 can be a central processing unit (CPU) or another form of processing unit with data processing capabilities and/or instruction execution capabilities, and can control other components in the multi-channel audio processing hardware device 10 to perform desired functions.
  • the processor 102 is configured to execute the computer readable instructions stored in the memory 101 such that the multi-channel audio processing hardware device 10 performs all or part of the steps of the multi-channel audio processing method of the foregoing embodiments of the present disclosure.
  • the present embodiment may also include well-known structures such as a communication bus and an interface, and these well-known structures should also be included within the protection scope of the present invention.
  • the embodiment of the present invention adopts the above technical solution, and performs corresponding processing according to the audio characteristics of each channel audio in the multi-channel audio to be processed, thereby obtaining a good user experience.
  • the embodiment of the present invention further provides a computer readable storage medium based on the same technical concept as the multi-channel audio processing method embodiment.
  • the computer readable storage medium 11 is configured to store non-transitory computer readable instructions 111 that, when executed by a computer, cause the computer to perform the steps described in the multi-channel audio processing method embodiments above.
  • the above computer readable storage medium 11 includes, but is not limited to, an optical storage medium (for example, CD-ROM and DVD), a magneto-optical storage medium (for example, MO), a magnetic storage medium (for example, a magnetic tape or a mobile hard disk), a medium having a built-in rewritable nonvolatile memory (for example, a memory card), and a medium having a built-in ROM (for example, a ROM cartridge).
  • the embodiment of the present invention adopts the above technical solution, and performs corresponding processing according to the audio characteristics of each channel audio in the multi-channel audio to be processed, thereby obtaining a good user experience.
  • the embodiment of the present invention further provides a multi-channel audio processing terminal based on the same technical concept as the multi-channel audio processing method embodiment.
  • Fig. 12 exemplarily shows a structural diagram of a multi-channel audio processing terminal. As shown in FIG. 12, the multi-channel audio processing terminal 12 includes the above-described multi-channel audio processing device 121.
  • the terminal 12 described above may be implemented in various forms, and the terminal in the present disclosure may include, but is not limited to, mobile terminal devices such as a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet), a PMP (portable multimedia player), a navigation device, an in-vehicle terminal device, an in-vehicle display terminal, and an in-vehicle electronic rearview mirror, as well as fixed terminal devices such as a digital TV and a desktop computer.
  • the multi-channel audio processing terminal may also include other components.
  • the multi-channel audio processing terminal 13 may include a power supply unit 131, a wireless communication unit 132, an A/V (audio/video) input unit 133, a user input unit 134, a sensing unit 135, an interface unit 136, a controller 137, an output unit 138, a memory 139, and the like.
  • Figure 13 illustrates a terminal having various components, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
  • the wireless communication unit 132 allows radio communication between the terminal 13 and a wireless communication system or network.
  • the A/V input unit 133 is for receiving an audio or video signal.
  • the user input unit 134 can generate key input data in accordance with a command input by the user to control various operations of the terminal device.
  • the sensing unit 135 detects the current state of the terminal 13, the position of the terminal 13, the presence or absence of a user's touch input to the terminal 13, the orientation of the terminal 13, the acceleration or deceleration movement and direction of the terminal 13, and the like, and generates commands or signals for controlling the operation of the terminal 13.
  • the interface unit 136 serves as an interface through which at least one external device can be connected to the terminal 13.
  • Output unit 138 is configured to provide an output signal in a visual, audio, and/or tactile manner.
  • the memory 139 may store a software program or the like for processing and control operations performed by the controller 137, or may temporarily store data that has been output or is to be output.
  • Memory 139 can include at least one type of storage medium.
  • the terminal 13 can cooperate with a network storage device that performs a storage function of the memory 139 through a network connection.
  • the controller 137 typically controls the overall operation of the terminal device. Additionally, the controller 137 can include a multimedia module for reproducing or playing back multimedia data.
  • the controller 137 can perform a pattern recognition process to recognize a handwriting input or a picture drawing input performed on the touch screen as a character or an image.
  • the power supply unit 131 receives external power or internal power under the control of the controller 137 and provides the appropriate power required to operate the various elements and components.
  • Various embodiments of the multi-channel audio processing method proposed by the present disclosure may be implemented in a computer readable medium using, for example, computer software, hardware, or any combination thereof.
  • for hardware implementation, various implementations of the comparison method of video features proposed by the present disclosure may be implemented using at least one of an application specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, and an electronic unit designed to perform the functions described herein.
  • in some cases, various embodiments of the multi-channel audio processing method proposed by the present disclosure may be implemented in the controller 137.
  • for software implementation, various implementations of the comparison method of video features proposed by the present disclosure can be implemented with separate software modules that allow for the execution of at least one function or operation.
  • the software code can be implemented by a software application (or program) written in any suitable programming language, which can be stored in memory 138 and executed by controller 137.
  • the embodiment of the present invention adopts the above technical solution, and performs corresponding processing according to the audio characteristics of each channel audio in the multi-channel audio to be processed, thereby obtaining a good user experience.
  • the wording "exemplary" does not mean that the described examples are preferred or better than other examples.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

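The present invention provides a multi-channel audio processing method, apparatus, and computer-readable storage medium. The multi-channel audio processing method includes: receiving multi-channel audio to be processed; detecting the audio characteristics of the audio of each channel in the multi-channel audio to be processed; and processing the multi-channel audio to be processed according to the audio characteristics of the audio of each channel. By adopting this technical solution, embodiments of the present invention perform processing that corresponds to the audio characteristics of the audio of each channel in the multi-channel audio to be processed, thereby solving the technical problem of how to obtain a good user experience.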

Description

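Multi-channel audio processing method, apparatus, and computer-readable storage medium
Cross-reference to related applications
This application claims priority to Chinese patent application No. 201810356546.9, filed on April 19, 2018, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of audio technology, and in particular to a multi-channel audio processing method, apparatus, and computer-readable storage medium.
Background
As audio interaction becomes popular, audio is increasingly used as the information carrier for such interaction. To obtain a good interactive experience, users are paying more and more attention to the audio experience.
At present, the prior art generally processes mono audio. Methods for processing mono audio do not take into account the characteristics of the audio of each channel in multi-channel audio; therefore, applying existing mono audio processing methods to multi-channel audio does not yield a good user experience.
In view of the above, the present invention addresses these shortcomings of the prior art and proposes a multi-channel audio processing method that can provide a good user experience, so as to effectively overcome the above problems.
Summary of the invention
The main purpose of the present invention is to provide a multi-channel audio processing method that at least partially solves the technical problem of how to obtain a good user experience; a multi-channel audio processing apparatus, a multi-channel audio processing hardware device, and a computer-readable storage medium are also provided.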
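A multi-channel audio processing method includes:
receiving multi-channel audio to be processed;
detecting the audio characteristics of the audio of each channel in the multi-channel audio to be processed;
processing the multi-channel audio to be processed according to the audio characteristics of the audio of each channel.
The step of detecting the audio characteristics of the audio of each channel in the multi-channel audio to be processed includes:
if the multi-channel audio to be processed is offline audio, detecting the overall audio characteristics of the audio of each channel in the multi-channel audio to be processed;
if the multi-channel audio to be processed is online audio, detecting the local audio characteristics of the audio of each channel in the multi-channel audio to be processed.
If the multi-channel audio to be processed is offline audio, the step of processing the multi-channel audio to be processed according to the audio characteristics of the audio of each channel includes:
determining a first audio processing parameter according to the overall audio characteristics;
processing the multi-channel audio to be processed based on the first audio processing parameter.
The overall audio characteristics include a pitch characteristic, a sound formant characteristic, and a transient sound pulse characteristic;
the step of determining the first audio processing parameter according to the overall audio characteristics specifically includes:
determining the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, and the transient sound pulse characteristic, where the first audio processing parameter includes a fundamental frequency amplitude, a formant amplitude, and a transient pulse;
the step of processing the multi-channel audio to be processed based on the first audio processing parameter includes:
adjusting the fundamental frequency amplitude, smoothing the formant amplitude, and clipping the transient pulse.
The overall audio characteristics include a pitch characteristic and a sound formant characteristic;
the step of determining the first audio processing parameter according to the overall audio characteristics includes:
determining the first audio processing parameter according to the pitch characteristic and the sound formant characteristic, where the first audio processing parameter includes a fundamental frequency amplitude and a formant amplitude;
the step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes:
adjusting the fundamental frequency amplitude and smoothing the formant amplitude.
The overall audio characteristics include a pitch characteristic and a transient sound pulse characteristic;
the step of determining the first audio processing parameter according to the overall audio characteristics includes:
determining the first audio processing parameter according to the pitch characteristic and the transient sound pulse characteristic, where the first audio processing parameter includes a fundamental frequency amplitude and a transient pulse;
the step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes:
adjusting the fundamental frequency amplitude and clipping the transient pulse.
The overall audio characteristics include a pitch characteristic, a sound formant characteristic, a transient sound pulse characteristic, and an audio phase characteristic;
the step of determining the first audio processing parameter according to the overall audio characteristics includes:
determining the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, the transient sound pulse characteristic, and the audio phase characteristic, where the first audio processing parameter includes a fundamental frequency amplitude, a formant amplitude, a transient pulse, and an audio phase;
the step of processing the multi-channel audio to be processed based on the first audio processing parameter includes:
adjusting the fundamental frequency amplitude, smoothing the formant amplitude, clipping the transient pulse, and adjusting the audio phase.
The overall audio characteristics include a multi-channel audio downmix characteristic and a main/side channel characteristic;
the step of determining the first audio processing parameter according to the overall audio characteristics includes:
determining the first audio processing parameter according to the multi-channel audio downmix characteristic and the main/side channel characteristic, where the first audio processing parameter includes strong audio correlation, a fundamental frequency amplitude, and a formant amplitude;
the step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes:
jointly processing the audio of all channels in the multi-channel audio to be processed, adjusting the fundamental frequency amplitude and smoothing the formant amplitude.
If the multi-channel audio to be processed is online audio, the step of processing the multi-channel audio to be processed according to the audio characteristics of the audio of each channel specifically includes:
determining a second audio processing parameter according to the local audio characteristics;
processing the multi-channel audio to be processed based on the second audio processing parameter.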
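To achieve the above purpose, the present invention further proposes a multi-channel audio processing apparatus, including:
a receiving module, configured to receive multi-channel audio to be processed;
a detecting module, configured to detect the audio characteristics of the audio of each channel in the multi-channel audio to be processed;
a processing module, configured to process the multi-channel audio to be processed according to the audio characteristics of the audio of each channel.
The detecting module includes:
a first detecting unit, configured to detect the overall audio characteristics of the audio of each channel in the multi-channel audio to be processed when the multi-channel audio to be processed is offline audio;
a second detecting unit, configured to detect the local audio characteristics of the audio of each channel in the multi-channel audio to be processed when the multi-channel audio to be processed is online audio.
If the multi-channel audio to be processed is offline audio, the processing module includes:
a first determining unit, configured to determine a first audio processing parameter according to the overall audio characteristics;
a first processing unit, configured to process the multi-channel audio to be processed based on the first audio processing parameter.
The overall audio characteristics include a pitch characteristic, a sound formant characteristic, and a transient sound pulse characteristic;
the first determining unit is specifically configured to determine the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, and the transient sound pulse characteristic, where the first audio processing parameter includes a fundamental frequency amplitude, a formant amplitude, and a transient pulse;
the first processing unit is specifically configured to adjust the fundamental frequency amplitude, smooth the formant amplitude, and clip the transient pulse.
The overall audio characteristics include a pitch characteristic and a sound formant characteristic;
the first determining unit is specifically configured to determine the first audio processing parameter according to the pitch characteristic and the sound formant characteristic, where the first audio processing parameter includes a fundamental frequency amplitude and a formant amplitude;
the first processing unit is specifically configured to adjust the fundamental frequency amplitude and smooth the formant amplitude.
The overall audio characteristics include a pitch characteristic and a transient sound pulse characteristic;
the first determining unit is specifically configured to determine the first audio processing parameter according to the pitch characteristic and the transient sound pulse characteristic, where the first audio processing parameter includes a fundamental frequency amplitude and a transient pulse;
the first processing unit is specifically configured to adjust the fundamental frequency amplitude and clip the transient pulse.
The overall audio characteristics include a pitch characteristic, a sound formant characteristic, a transient sound pulse characteristic, and an audio phase characteristic;
the first determining unit is specifically configured to determine the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, the transient sound pulse characteristic, and the audio phase characteristic, where the first audio processing parameter includes a fundamental frequency amplitude, a formant amplitude, a transient pulse, and an audio phase;
the first processing unit is specifically configured to adjust the fundamental frequency amplitude, smooth the formant amplitude, clip the transient pulse, and adjust the audio phase.
The overall audio characteristics include a multi-channel audio downmix characteristic and a main/side channel characteristic;
the first determining unit is specifically configured to determine the first audio processing parameter according to the multi-channel audio downmix characteristic and the main/side channel characteristic, where the first audio processing parameter includes strong audio correlation, a fundamental frequency amplitude, and a formant amplitude;
the first processing unit is specifically configured to jointly process the audio of all channels in the multi-channel audio to be processed, adjusting the fundamental frequency amplitude and smoothing the formant amplitude.
If the multi-channel audio to be processed is online audio, the processing module further includes:
a second determining unit, configured to determine a second audio processing parameter according to the local audio characteristics;
a second processing unit, configured to process the multi-channel audio to be processed based on the second audio processing parameter.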
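To achieve the above purpose, the present invention further proposes a multi-channel audio processing hardware device, including:
a memory, configured to store non-transitory computer-readable instructions; and
a processor, configured to run the computer-readable instructions such that, when executed by the processor, the above multi-channel audio processing method is implemented.
To achieve the above purpose, the present invention further proposes a computer-readable storage medium for storing non-transitory computer-readable instructions which, when executed by a computer, cause the computer to perform the above multi-channel audio processing method.
The beneficial effects of the present invention are:
Embodiments of the present invention provide a multi-channel audio processing method, apparatus, and computer-readable storage medium. The multi-channel audio processing method includes: receiving multi-channel audio to be processed; detecting the audio characteristics of the audio of each channel in the multi-channel audio to be processed; and processing the multi-channel audio to be processed according to the audio characteristics of the audio of each channel. By adopting this technical solution, embodiments of the present invention perform processing that corresponds to the audio characteristics of the audio of each channel in the multi-channel audio to be processed, thereby obtaining a good user experience.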
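Brief description of the drawings
FIG. 1 is a schematic flowchart of a multi-channel audio processing method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of detection performed separately for offline audio and online audio according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of processing offline audio according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a multi-channel audio processing method according to an embodiment of the present invention;
FIG. 5 is a schematic flowchart of processing online audio according to an embodiment of the present invention;
FIG. 6 is a schematic flowchart of a multi-channel audio processing method according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a multi-channel audio processing apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a processing module according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a processing module according to another embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a multi-channel audio processing hardware device according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of a multi-channel audio processing terminal according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of a multi-channel audio processing terminal according to another embodiment of the present invention.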
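Detailed description
The following describes embodiments of the present invention through specific examples; those skilled in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The present invention may also be implemented or applied through other specific embodiments, and the details in this specification may be modified or changed in various ways based on different viewpoints and applications without departing from the spirit of the present invention. It should be noted that, where no conflict arises, the following embodiments and the features in them may be combined with one another. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
It should be noted that various aspects of embodiments within the scope of the appended claims are described below. It should be apparent that the aspects described herein may be embodied in a wide variety of forms, and that any specific structure and/or function described herein is merely illustrative. Based on the present invention, those skilled in the art should understand that an aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such an apparatus may be implemented and/or such a method practiced using structures and/or functionality other than one or more of the aspects set forth herein.
It should also be noted that the illustrations provided in the following embodiments only illustrate the basic concept of the present invention schematically. The drawings show only the components related to the present invention and are not drawn according to the number, shape, and size of components in an actual implementation; in an actual implementation the form, number, and proportion of each component may be changed arbitrarily, and the component layout may also be more complex.
In addition, specific details are provided in the following description to facilitate a thorough understanding of the examples. However, those skilled in the art will understand that the aspects may be practiced without these specific details.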
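To solve the technical problem of how to obtain a good user experience, an embodiment of the present invention provides a multi-channel audio processing method. As shown in FIG. 1, the method may include the following steps S1 to S3:
Step S1: receive multi-channel audio to be processed.
The multi-channel audio to be processed may be offline or online; the present invention does not limit this. Multi-channel audio includes, but is not limited to, 3.1-channel audio, 5.1-channel audio, 7.1-channel audio, and the like.
Step S2: detect the audio characteristics of the audio of each channel in the multi-channel audio to be processed.
The audio characteristics include, but are not limited to: pitch characteristics, sound formant characteristics, transient sound pulse characteristics, audio phase characteristics, multi-channel audio downmix characteristics, main/side channel characteristics, and the like.
In this step, one or several audio characteristics may be detected.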
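Step S3: process the multi-channel audio to be processed according to the detection result.
This step processes the multi-channel audio to be processed according to the one or more detected audio characteristics of the audio of each channel.
In this step, the ways of processing the multi-channel audio to be processed include, but are not limited to: joint processing, separate processing, smoothing, audio phase processing, fundamental frequency processing, zeroing, spectrum stretching, limiting, and the like.
For ease of understanding, each of these processing methods is described in detail below:
Joint processing means processing the audio of all channels together;
Separate processing means processing the audio of each channel individually;
Smoothing means filtering out abrupt frequency-domain data points, that is, smoothing the peak values of spectral peaks in the spectrum. In a specific implementation, methods such as neighborhood averaging, Gaussian smoothing, or parabolic smoothing may be used. Neighborhood averaging, for example, is based on the principle of convolution and uses a sliding window to smooth the amplitudes of the frequency components in the spectrum; Gaussian smoothing, as another example, computes weights according to the shape of the Gaussian distribution function and performs linear smoothing with those weights. Smoothing may be applied to the full band of the audio or to part of the band. Smoothing the formants of the audio can achieve a pitch-shifting effect;
Fundamental frequency processing means adjusting the fundamental frequency of the audio to achieve a pitch-shifting effect;
Audio phase processing means adjusting the phase of the audio; specifically, the phase may be adjusted according to the audio phase corresponding to a predetermined sound effect;
Zeroing means eliminating, over the full band of the audio, the spectrum corresponding to transient pulses.
Spectrum stretching means stretching or compressing the spectrum by interpolating or decimating the audio spectrum; this processing can achieve a speed-change effect.
Limiting means reducing the amplitude of transient pulses.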
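As a minimal sketch of the smoothing and limiting operations described above, the following Python code applies neighborhood averaging and Gaussian smoothing to a list of spectral magnitudes and hard-limits time-domain samples. The function names, the window sizes, `sigma`, and the use of a hard ceiling for limiting are illustrative assumptions; the text does not specify them.

```python
import math


def neighborhood_average(mags, window=3):
    """Sliding-window mean; near the edges only the available bins are averaged."""
    half = window // 2
    out = []
    for i in range(len(mags)):
        lo, hi = max(0, i - half), min(len(mags), i + half + 1)
        out.append(sum(mags[lo:hi]) / (hi - lo))
    return out


def gaussian_weights(window=5, sigma=1.0):
    """Weights sampled from a Gaussian curve and normalised to sum to 1."""
    half = window // 2
    raw = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-half, half + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def gaussian_smooth(mags, window=5, sigma=1.0):
    """Weighted linear smoothing; weights falling outside the list are dropped
    and the remaining ones renormalised."""
    weights = gaussian_weights(window, sigma)
    half = window // 2
    out = []
    for i in range(len(mags)):
        acc, norm = 0.0, 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(mags):
                acc += w * mags[j]
                norm += w
        out.append(acc / norm)
    return out


def limit(samples, ceiling):
    """Assumed hard limiter: clamp every sample to [-ceiling, +ceiling]."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]
```

Both smoothers lower an isolated spectral peak while leaving a flat spectrum unchanged, which matches the stated aim of smoothing peak values; a production limiter would more likely use a soft knee than a hard clamp.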
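For the one or more detected audio characteristics, embodiments of the present invention may use one or more of the above processing methods. In practice, using one or several processing methods makes it possible to implement a speed mode (also called processing-speed priority), a quality mode (also called high-sound-quality priority), and a balanced mode (which balances processing speed and high sound quality), as well as effects such as changing speed without changing pitch, changing pitch without changing speed, and changing both speed and pitch.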
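The spectrum stretching operation mentioned among the processing methods can be sketched in a similar way. The code below resamples a list of spectral magnitudes to a different number of bins, which amounts to interpolation when the target is larger and decimation when it is smaller. `resample_spectrum` and its parameters are hypothetical names, and linear interpolation is an assumed choice; the text only states that the spectrum is interpolated or decimated.

```python
def resample_spectrum(mags, n_out):
    """Resample a magnitude list to n_out bins using linear interpolation.

    The first and last input bins map to the first and last output bins.
    """
    if not mags or n_out <= 0:
        return []
    if len(mags) == 1 or n_out == 1:
        return [mags[0]] * n_out
    out = []
    for i in range(n_out):
        # Fractional position of output bin i on the input axis.
        pos = i * (len(mags) - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, len(mags) - 1)
        frac = pos - lo
        out.append(mags[lo] * (1 - frac) + mags[hi] * frac)
    return out
```

Turning a stretched spectrum back into audio with a changed speed also needs phase handling and resynthesis, which this sketch deliberately leaves out.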
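By adopting the above technical solution, this embodiment of the present invention performs processing that corresponds to the audio characteristics of the audio of each channel in the multi-channel audio to be processed, thereby obtaining a good user experience.
To process multiple sound sources such as online and offline sources adaptively, in an optional embodiment, as shown in FIG. 2, step S2 may specifically include:
Step S21: if the multi-channel audio to be processed is offline audio, detect the overall audio characteristics of the audio of each channel in the multi-channel audio to be processed;
Step S22: if the multi-channel audio to be processed is online audio, detect the local audio characteristics of the audio of each channel in the multi-channel audio to be processed.
In this embodiment, because online audio is streaming media, the received audio arrives segment by segment. Therefore, the characteristics detected for online audio are local audio characteristics. Offline audio, by contrast, is complete, pre-encoded audio, so the characteristics detected for offline audio are overall audio characteristics, which ensures a good user experience after audio processing.
The overall audio characteristics include, but are not limited to: pitch characteristics, sound formant characteristics, transient sound pulse characteristics, audio phase characteristics, multi-channel audio downmix characteristics, main/side channel characteristics, and the like.
The local audio characteristics include all or some of the overall audio characteristics, and are not described again here.
By adopting the above technical solution, this embodiment detects local audio characteristics for online sources and overall audio characteristics for offline sources, thereby implementing adaptive audio characteristic detection, which facilitates adaptive processing for different sources and can improve the user experience.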
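The difference between overall and local detection can be illustrated with a short sketch. Here RMS level and peak amplitude stand in for the characteristics named in the text (pitch, formants, and so on), purely to keep the example small; the function names and the data layout (a list of channels, each a list of samples) are assumptions.

```python
import math


def rms(samples):
    """Root-mean-square level of one channel; 0.0 for an empty channel."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def overall_characteristics(channels):
    """Offline case: each channel is complete, so one measurement covers it all."""
    return [
        {"rms": rms(ch), "peak": max((abs(s) for s in ch), default=0.0)}
        for ch in channels
    ]


def local_characteristics(segments):
    """Online case: measure each received segment on its own, as it arrives.

    `segments` is any iterable of segments; each segment is a list of channels.
    """
    for segment in segments:
        yield overall_characteristics(segment)
```

Because `local_characteristics` is a generator, results become available per segment without waiting for the stream to end, which mirrors the segment-by-segment arrival described for online audio.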
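It should be noted that whether the multi-channel audio to be processed is offline or online audio may be known in advance. Of course, it may also be unknown in advance.
For this case, preferably, after step S1 the above multi-channel audio processing method may further include:
determining whether the multi-channel audio to be processed is offline audio or online audio.
In this embodiment, the determination may be made from the respective features of offline and online audio. For example, offline audio is complete audio, whereas online audio may arrive as successive packets transmitted over a real-time messaging protocol; on this basis it can be determined whether the multi-channel audio to be processed is offline or online. Alternatively, an identification tag may be added in advance to indicate whether the multi-channel audio to be processed is offline or online. The present invention does not limit this.
By processing offline audio and online audio in their respective ways, embodiments of the present invention can adapt to different multi-channel audio application scenarios and thus obtain a better user experience.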
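In an optional embodiment, building on the above embodiment that handles different sources, if the multi-channel audio to be processed is offline audio, then, as shown in FIG. 3, step S3 specifically includes:
Step S31: determine a first audio processing parameter according to the overall audio characteristics;
Step S32: process the multi-channel audio to be processed according to the first audio processing parameter.
The first audio processing parameters include, but are not limited to, the strength of the audio correlation between channels, the fundamental frequency amplitude, the formant amplitude, the transient pulse, the audio envelope, and the like.
For example, if the audio correlation is strong, the audio of the channels in the multi-channel audio to be processed is processed jointly; if the audio correlation is weak, the audio of each channel is processed individually (that is, separate processing).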
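One way to picture the correlation-based choice is the sketch below, which measures the Pearson correlation between every pair of channels and selects joint processing only when the weakest pair still exceeds a threshold. Using Pearson correlation, the 0.5 threshold, and the names `pearson` and `choose_mode` are all assumptions made for illustration; the text does not say how correlation strength is measured.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length channels; 0.0 if either is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def choose_mode(channels, threshold=0.5):
    """Return "joint" when every channel pair is strongly correlated, else "separate"."""
    pairs = list(combinations(channels, 2))
    if not pairs:
        return "separate"  # a single channel has nothing to be joined with
    weakest = min(abs(pearson(a, b)) for a, b in pairs)
    return "joint" if weakest >= threshold else "separate"
```

The absolute value treats phase-inverted channels as correlated; whether that is desirable depends on the intended effect and is left open here.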
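In a preferred embodiment, if the overall audio characteristics include a pitch characteristic, a sound formant characteristic, and a transient sound pulse characteristic, the step of determining the first audio processing parameter according to the overall audio characteristics specifically includes: determining the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, and the transient sound pulse characteristic, where the first audio processing parameter includes a fundamental frequency amplitude, a formant amplitude, and a transient pulse; and the step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes: adjusting the fundamental frequency amplitude, smoothing the formant amplitude, and clipping the transient pulse.
In a preferred embodiment, if the overall audio characteristics include a pitch characteristic and a sound formant characteristic, the step of determining the first audio processing parameter according to the overall audio characteristics specifically includes: determining the first audio processing parameter according to the pitch characteristic and the sound formant characteristic, where the first audio processing parameter includes a fundamental frequency amplitude and a formant amplitude; and the step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes: adjusting the fundamental frequency amplitude and smoothing the formant amplitude.
In a preferred embodiment, if the overall audio characteristics include a pitch characteristic and a transient sound pulse characteristic, the step of determining the first audio processing parameter according to the overall audio characteristics includes: determining the first audio processing parameter according to the pitch characteristic and the transient sound pulse characteristic, where the first audio processing parameter includes a fundamental frequency amplitude and a transient pulse; and the step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes: adjusting the fundamental frequency amplitude and clipping the transient pulse.
In a preferred embodiment, if the overall audio characteristics include a pitch characteristic, a sound formant characteristic, a transient sound pulse characteristic, and an audio phase characteristic, the step of determining the first audio processing parameter according to the overall audio characteristics may specifically include: determining the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, the transient sound pulse characteristic, and the audio phase characteristic, where the first audio processing parameter includes a fundamental frequency amplitude, a formant amplitude, a transient pulse, and an audio phase; and the step of processing the multi-channel audio to be processed based on the first audio processing parameter may specifically include: adjusting the fundamental frequency amplitude, smoothing the formant amplitude, clipping the transient pulse, and adjusting the audio phase.
In a preferred embodiment, if the overall audio characteristics include a multi-channel audio downmix characteristic and a main/side channel characteristic, the step of determining the first audio processing parameter according to the overall audio characteristics specifically includes: determining the first audio processing parameter according to the multi-channel audio downmix characteristic and the main/side channel characteristic, where the first audio processing parameter includes strong audio correlation, a fundamental frequency amplitude, and a formant amplitude; and the step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically includes: jointly processing the audio of all channels in the multi-channel audio to be processed, adjusting the fundamental frequency amplitude and smoothing the formant amplitude.
Whether the audio correlation is strong may be determined from factors such as the spectral characteristics of the audio of each channel, the sound quality of each channel's audio source, and the way each channel's audio was captured, but is by no means limited to these. Specifically, if the audio sources of the channels were captured jointly, joint processing may be used when processing the multi-channel audio to be processed; if the audio of each channel was captured by an independent microphone, separate processing may be used; if the spectral characteristics of the audio of each channel are good, joint processing may be used; if they are poor, separate processing may be used; if the amplitude of a formant is greater than a formant threshold, the formants contained in the multi-channel audio to be processed are smoothed; if the audio envelope is offset, the amplitudes of the fundamental frequency and formants in the frequency domain of the multi-channel audio to be processed are adjusted.
It can be seen that, by adopting the above technical solution, this embodiment determines the first audio processing parameter corresponding to the overall audio characteristics of the offline multi-channel audio to be processed, and then performs adaptive processing according to the determined first audio processing parameter, so that different audio effects can be obtained. For example, adjusting the fundamental frequency amplitude can change the pitch of the sound; smoothing the formant amplitude can change the pitch of the sound; offsetting the audio envelope can change the pitch of the sound; the technical effect of adaptively pitch-shifting the audio is thereby achieved. Embodiments of the present invention can thus obtain a good user experience.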
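The present invention is described in further detail below with a specific embodiment, with reference to FIG. 4.
Step Sa1: receive multi-channel audio to be processed;
Step Sa2: if the multi-channel audio to be processed is offline audio, detect the overall audio characteristics of the audio of each channel in the multi-channel audio to be processed;
Step Sa3: determine a strong-correlation audio processing parameter according to the overall audio characteristics;
Step Sa4: jointly process the multi-channel audio to be processed according to the strong-correlation audio processing parameter.
This embodiment detects the overall audio characteristics of the received offline multi-channel audio to be processed, then determines a strong-correlation audio processing parameter as the processing parameter for the multi-channel audio to be processed, and finally performs the joint processing corresponding to that parameter, thereby implementing adaptive processing and obtaining a good user experience.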
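In an optional embodiment, building on the above embodiment that handles different sources, if the multi-channel audio to be processed is online audio, then, as shown in FIG. 5, step S3 specifically includes:
Step S33: determine a second audio processing parameter according to the local audio characteristics;
Step S34: process the multi-channel audio to be processed according to the second audio processing parameter.
The second audio processing parameter may be some or all of the above first audio processing parameters.
For a description of this embodiment, refer to the corresponding description of the embodiment shown in FIG. 3; it is not repeated here.
By adopting the above technical solution, this embodiment determines the second audio processing parameter corresponding to the local audio characteristics of the online multi-channel audio to be processed, and then performs adaptive processing according to the determined second audio processing parameter, so that different audio effects can be obtained. For example, adjusting the fundamental frequency amplitude can change the pitch of the sound; smoothing the formant amplitude can change the pitch of the sound; offsetting the audio envelope can change the pitch of the sound; the technical effect of adaptively pitch-shifting the audio is thereby achieved. Embodiments of the present invention can thus obtain a good user experience.
For obvious variations or equivalent substitutions of the embodiment for processing online audio, refer also to the foregoing embodiments for processing offline audio; they are not described again here.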
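To facilitate a better understanding of the present invention, it is described in detail below with a specific embodiment, with reference to FIG. 6.
As shown in FIG. 6, an embodiment of the present invention provides a multi-channel audio processing method, including:
Step Sb1: receive multi-channel audio to be processed;
Step Sb2: determine whether the multi-channel audio to be processed is offline audio or online audio; if it is offline audio, perform step Sb3; if it is online audio, perform step Sb4;
Step Sb3: detect the overall audio characteristics of the audio of each channel in the multi-channel audio to be processed, and perform step Sb5;
Step Sb4: detect the local audio characteristics of the audio of each channel in the multi-channel audio to be processed, and perform step Sb7;
Step Sb5: determine a first audio processing parameter according to the overall audio characteristics, and perform step Sb6;
Step Sb6: process the multi-channel audio to be processed according to the first audio processing parameter;
Step Sb7: determine a second audio processing parameter according to the local audio characteristics, and perform step Sb8;
Step Sb8: process the multi-channel audio to be processed according to the second audio processing parameter.
By adopting the above technical solution, this embodiment determines the corresponding audio processing parameters from the overall audio characteristics for offline audio sources and from the local audio characteristics for online audio sources, and processes the audio accordingly, thereby implementing adaptive audio processing and obtaining a good user experience.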
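The control flow of steps Sb1 to Sb8 can be sketched as a small dispatcher. The `MultiChannelAudio` container, the `offline` flag (standing in for the identification tag mentioned earlier), and the four callables passed to `run_pipeline` are hypothetical; the sketch only shows the branching, not any particular detection or processing algorithm.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class MultiChannelAudio:
    channels: List[List[float]]  # one list of samples per channel
    offline: bool                # assumed tag: True for offline, False for online


def run_pipeline(
    audio: MultiChannelAudio,                   # Sb1: the received audio
    detect_overall: Callable[[list], Any],
    detect_local: Callable[[list], Any],
    to_params: Callable[[Any], Any],
    apply: Callable[[list, Any], Any],
) -> Any:
    # Sb2: branch on whether the audio is offline or online.
    if audio.offline:
        characteristics = detect_overall(audio.channels)  # Sb3
    else:
        characteristics = detect_local(audio.channels)    # Sb4
    params = to_params(characteristics)                   # Sb5 or Sb7
    return apply(audio.channels, params)                  # Sb6 or Sb8
```

Passing the detection and processing steps in as callables keeps the offline and online branches symmetric, which fits the statement that the second audio processing parameter may be some or all of the first.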
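Although the steps of the multi-channel audio processing method embodiments are described above in the given order, those skilled in the art should understand that the steps in embodiments of the present invention need not be performed in that order; they may also be performed in reverse, in parallel, interleaved, or in other orders. Moreover, on the basis of the above steps, those skilled in the art may add other steps or remove some of the steps. These obvious variations or equivalent substitutions should also fall within the scope of protection of the present invention and are not described further here.
The following are apparatus embodiments of the present invention, which are used to perform the steps implemented by the method embodiments. For ease of description, only the parts related to the embodiments of the present invention are shown; for specific technical details that are not disclosed, refer to the method embodiments of the present invention.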
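Based on the same technical concept as the above method embodiments, an embodiment of the present invention further provides a multi-channel audio processing apparatus. As shown in FIG. 7, the apparatus includes a receiving module 71, a detecting module 72, and a processing module 73. The receiving module 71 is configured to receive multi-channel audio to be processed. The detecting module 72 is configured to detect the audio characteristics of the audio of each channel in the multi-channel audio to be processed. The processing module 73 is configured to process the multi-channel audio to be processed according to the audio characteristics of the audio of each channel.
By adopting the above technical solution, in this embodiment the processing module 73 performs processing that corresponds to the audio characteristics, detected by the detecting module 72, of the audio of each channel in the multi-channel audio to be processed, thereby obtaining a good user experience.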
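The three-module structure of FIG. 7 can be mirrored in code as a rough sketch. The class names follow the module names in the text, but their methods are invented for illustration: the detecting module measures only the peak amplitude of each channel, and the processing module scales any channel whose peak exceeds an assumed ceiling. Neither choice comes from the text.

```python
class ReceivingModule:
    def receive(self, channels):
        """Take ownership of the incoming channels by copying them."""
        return [list(ch) for ch in channels]


class DetectingModule:
    def detect(self, channels):
        """Stand-in characteristic: peak absolute amplitude of each channel."""
        return [max((abs(s) for s in ch), default=0.0) for ch in channels]


class ProcessingModule:
    def __init__(self, ceiling=1.0):
        self.ceiling = ceiling  # assumed target peak, not taken from the text

    def process(self, channels, peaks):
        """Scale down each channel whose detected peak exceeds the ceiling."""
        out = []
        for ch, peak in zip(channels, peaks):
            gain = self.ceiling / peak if peak > self.ceiling else 1.0
            out.append([s * gain for s in ch])
        return out


class MultiChannelAudioDevice:
    """Wires the three modules together in the order receive, detect, process."""

    def __init__(self):
        self.receiving = ReceivingModule()
        self.detecting = DetectingModule()
        self.processing = ProcessingModule()

    def run(self, channels):
        received = self.receiving.receive(channels)
        peaks = self.detecting.detect(received)
        return self.processing.process(received, peaks)
```

Here each channel is processed from its own detected value, which corresponds to the separate-processing case; a joint variant would derive a single gain from all channels.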
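In an optional embodiment, the detecting module may specifically include:
a first detecting unit, configured to detect the overall audio characteristics of the audio of each channel in the multi-channel audio to be processed when the multi-channel audio to be processed is offline audio;
a second detecting unit, configured to detect the local audio characteristics of the audio of each channel in the multi-channel audio to be processed when the multi-channel audio to be processed is online audio.
In this embodiment, because online audio is streaming media, the received audio arrives segment by segment. Therefore, the characteristics detected for online audio are local audio characteristics. Offline audio, by contrast, is complete, pre-encoded audio, so the characteristics detected for offline audio are overall audio characteristics, which ensures a good user experience after audio processing.
The overall audio characteristics include, but are not limited to: pitch characteristics, sound formant characteristics, transient sound pulse characteristics, audio phase characteristics, multi-channel audio downmix characteristics, main/side channel characteristics, and the like.
The local audio characteristics include all or some of the overall audio characteristics, and are not described again here.
In this embodiment, the first detecting unit and the second detecting unit detect the overall audio characteristics when the multi-channel audio to be processed is offline audio and the local audio characteristics when it is online audio, which facilitates adaptive processing of multiple sound sources and enables the user to obtain a good experience.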
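Two of the characteristics listed above, the multi-channel audio downmix characteristic and the main/side channel characteristic, are named in the text but not defined. The sketch below shows one common reading, offered only as an assumption: the mid signal is the half-sum of two channels, the side signal is their half-difference, and a downmix is the per-sample average of all channels. The helper names are hypothetical.

```python
def mid_side(left, right):
    """Common mid/side convention: M = (L + R) / 2, S = (L - R) / 2."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def downmix(channels):
    """Average all channels sample by sample into a single mono signal."""
    n = len(channels)
    return [sum(frame) / n for frame in zip(*channels)]


def energy(signal):
    """Sum of squared samples."""
    return sum(x * x for x in signal)


def side_to_mid_ratio(left, right):
    """Small values suggest the two channels carry nearly the same content."""
    mid, side = mid_side(left, right)
    e_mid = energy(mid)
    return energy(side) / e_mid if e_mid > 0 else float("inf")
```

Under this reading, a low side-to-mid ratio would be one possible indicator of the strong audio correlation that the text associates with joint processing.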
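In an optional embodiment, as shown in FIG. 8, if the multi-channel audio to be processed is offline audio, the processing module specifically includes a first determining unit 81 and a first processing unit 82. The first determining unit 81 is configured to determine a first audio processing parameter according to the overall audio characteristics. The first processing unit 82 is configured to process the multi-channel audio to be processed based on the first audio processing parameter.
The first audio processing parameters include, but are not limited to, the strength of the audio correlation between channels, the fundamental frequency amplitude, the formant amplitude, the transient pulse, the audio envelope, and the like.
For example, if the audio correlation is strong, the audio of the channels in the multi-channel audio to be processed is processed jointly; if the audio correlation is weak, the audio of each channel is processed individually (that is, separate processing).
In an optional embodiment, the overall audio characteristics include a pitch characteristic, a sound formant characteristic, and a transient sound pulse characteristic; the first determining unit 81 is specifically configured to determine the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, and the transient sound pulse characteristic, where the first audio processing parameter includes a fundamental frequency amplitude, a formant amplitude, and a transient pulse; the first processing unit 82 is specifically configured to adjust the fundamental frequency amplitude, smooth the formant amplitude, and clip the transient pulse.
In an optional embodiment, the overall audio characteristics include a pitch characteristic and a sound formant characteristic; the first determining unit 81 may also be specifically configured to determine the first audio processing parameter according to the pitch characteristic and the sound formant characteristic, where the first audio processing parameter includes a fundamental frequency amplitude and a formant amplitude; the first processing unit 82 may also be specifically configured to adjust the fundamental frequency amplitude and smooth the formant amplitude.
In an optional embodiment, the overall audio characteristics include a pitch characteristic and a transient sound pulse characteristic; the first determining unit 81 may also be specifically configured to determine the first audio processing parameter according to the pitch characteristic and the transient sound pulse characteristic, where the first audio processing parameter includes a fundamental frequency amplitude and a transient pulse; the first processing unit 82 may also be specifically configured to adjust the fundamental frequency amplitude and clip the transient pulse.
In an optional embodiment, the overall audio characteristics include a pitch characteristic, a sound formant characteristic, a transient sound pulse characteristic, and an audio phase characteristic; the first determining unit 81 may also be specifically configured to determine the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, the transient sound pulse characteristic, and the audio phase characteristic, where the first audio processing parameter includes a fundamental frequency amplitude, a formant amplitude, a transient pulse, and an audio phase; the first processing unit 82 may also be specifically configured to adjust the fundamental frequency amplitude, smooth the formant amplitude, clip the transient pulse, and adjust the audio phase.
In an optional embodiment, the overall audio characteristics include a multi-channel audio downmix characteristic and a main/side channel characteristic; the first determining unit 81 may also be specifically configured to determine the first audio processing parameter according to the multi-channel audio downmix characteristic and the main/side channel characteristic, where the first audio processing parameter includes strong audio correlation, a fundamental frequency amplitude, and a formant amplitude; the first processing unit 82 may also be specifically configured to jointly process the audio of all channels in the multi-channel audio to be processed, adjusting the fundamental frequency amplitude and smoothing the formant amplitude.
In an optional embodiment, as shown in FIG. 9, if the multi-channel audio to be processed is online audio, the processing module further includes a second determining unit 91 and a second processing unit 92. The second determining unit 91 is configured to determine a second audio processing parameter according to the local audio characteristics. The second processing unit 92 is configured to process the multi-channel audio to be processed based on the second audio processing parameter.
The second audio processing parameter may be some or all of the above first audio processing parameters.
For a description of this embodiment, refer to the corresponding description in the foregoing embodiments; it is not repeated here.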
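Based on the same technical concept as the above multi-channel audio processing method embodiments, an embodiment of the present invention further provides a multi-channel audio processing hardware device. FIG. 10 shows a schematic structural diagram of a multi-channel audio processing hardware device according to an embodiment of the present disclosure. As shown in FIG. 10, the multi-channel audio processing hardware device 10 includes a memory 101 and a processor 102. The memory 101 is configured to store non-transitory computer-readable instructions; the processor 102 is configured to run the computer-readable instructions such that, when executed by the processor, the above multi-channel audio processing method embodiments are implemented.
The memory 101 is configured to store non-transitory computer-readable instructions. Specifically, the memory 101 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like.
The processor 102 may be a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability, and may control other components in the multi-channel audio processing hardware device 10 to perform desired functions. In an embodiment of the present disclosure, the processor 102 is configured to run the computer-readable instructions stored in the memory 101 so that the multi-channel audio processing hardware device 10 performs all or some of the steps of the multi-channel audio processing method of the foregoing embodiments of the present disclosure.
Those skilled in the art should understand that, to solve the technical problem of how to obtain a good user experience, this embodiment may also include well-known structures such as a communication bus and an interface, and these well-known structures should also fall within the scope of protection of the present invention.
For a detailed description of this embodiment, refer to the corresponding descriptions in the foregoing embodiments; it is not repeated here.
By adopting the above technical solution, this embodiment of the present invention performs processing that corresponds to the audio characteristics of the audio of each channel in the multi-channel audio to be processed, thereby obtaining a good user experience.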
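Based on the same technical concept as the above multi-channel audio processing method embodiments, an embodiment of the present invention further provides a computer-readable storage medium. As shown in FIG. 11, the computer-readable storage medium 11 is configured to store non-transitory computer-readable instructions 111 which, when executed by a computer, cause the computer to perform the steps described in the above multi-channel audio processing method embodiments.
The computer-readable storage medium 11 includes, but is not limited to: optical storage media (for example, CD-ROM and DVD), magneto-optical storage media (for example, MO), magnetic storage media (for example, magnetic tape or a removable hard disk), media with built-in rewritable non-volatile memory (for example, a memory card), and media with built-in ROM (for example, a ROM cartridge).
For a detailed description of this embodiment, refer to the corresponding descriptions in the foregoing embodiments; it is not repeated here.
By adopting the above technical solution, this embodiment of the present invention performs processing that corresponds to the audio characteristics of the audio of each channel in the multi-channel audio to be processed, thereby obtaining a good user experience.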
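Based on the same technical concept as the above multi-channel audio processing method embodiments, an embodiment of the present invention further provides a multi-channel audio processing terminal. FIG. 12 shows, by way of example, a schematic structural diagram of a multi-channel audio processing terminal. As shown in FIG. 12, the multi-channel audio processing terminal 12 includes the above multi-channel audio processing apparatus 121.
The terminal 12 may be implemented in various forms. The terminal in the present disclosure may include, but is not limited to, mobile terminal devices such as mobile phones, smart phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), navigation devices, in-vehicle terminal devices, in-vehicle display terminals, and in-vehicle electronic rearview mirrors, as well as fixed terminal devices such as digital TVs and desktop computers.
As an equivalent alternative embodiment, the multi-channel audio processing terminal may also include other components. As shown in FIG. 13, the multi-channel audio processing terminal 13 may include a power supply unit 131, a wireless communication unit 132, an A/V (audio/video) input unit 133, a user input unit 134, a sensing unit 135, an interface unit 136, a controller 137, an output unit 138, a memory 139, and the like. FIG. 13 shows a terminal with various components, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
The wireless communication unit 132 allows radio communication between the terminal 13 and a wireless communication system or network. The A/V input unit 133 is used to receive audio or video signals. The user input unit 134 may generate key input data according to commands entered by the user to control various operations of the terminal device. The sensing unit 135 detects the current state of the terminal 13, the position of the terminal 13, the presence or absence of a user's touch input to the terminal 13, the orientation of the terminal 13, the acceleration or deceleration movement and direction of the terminal 13, and the like, and generates commands or signals for controlling the operation of the terminal 13. The interface unit 136 serves as an interface through which at least one external device can connect to the terminal 13. The output unit 138 is configured to provide output signals in a visual, audio, and/or tactile manner. The memory 139 may store software programs for the processing and control operations performed by the controller 137, or may temporarily store data that has been output or is to be output. The memory 139 may include at least one type of storage medium. Moreover, the terminal 13 may cooperate with a network storage device that performs the storage function of the memory 139 over a network connection. The controller 137 generally controls the overall operation of the terminal device. In addition, the controller 137 may include a multimedia module for reproducing or playing back multimedia data. The controller 137 may perform pattern recognition processing to recognize handwriting input or picture drawing input performed on a touch screen as characters or images. The power supply unit 131 receives external or internal power under the control of the controller 137 and provides the appropriate power required to operate the various elements and components.
The various embodiments of the multi-channel audio processing method proposed by the present disclosure may be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof. For hardware implementation, the various embodiments of the video feature comparison method proposed by the present disclosure may be implemented using at least one of an application-specific integrated circuit (ASIC), a digital signal processor (DSP), a digital signal processing device (DSPD), a programmable logic device (PLD), a field-programmable gate array (FPGA), a processor, a controller, a microcontroller, a microprocessor, and an electronic unit designed to perform the functions described herein; in some cases, the various embodiments of the multi-channel audio processing method proposed by the present disclosure may be implemented in the controller 137. For software implementation, the various embodiments of the video feature comparison method proposed by the present disclosure may be implemented with separate software modules that allow at least one function or operation to be performed. The software code may be implemented by a software application (or program) written in any suitable programming language, and may be stored in the memory 138 and executed by the controller 137.
For a detailed description of this embodiment, refer to the corresponding descriptions in the foregoing embodiments; it is not repeated here.
By adopting the above technical solution, this embodiment of the present invention performs processing that corresponds to the audio characteristics of the audio of each channel in the multi-channel audio to be processed, thereby obtaining a good user experience.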
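The basic principles of the present disclosure have been described above in connection with specific embodiments. However, it should be pointed out that the merits, advantages, effects, and the like mentioned in the present disclosure are only examples and not limitations, and they should not be regarded as required by every embodiment of the present disclosure. In addition, the specific details disclosed above are for the purpose of example and ease of understanding only, not limitation; they do not restrict the present disclosure to being implemented with those specific details.
The block diagrams of devices, apparatuses, equipment, and systems in the present disclosure are only illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As those skilled in the art will recognize, these devices, apparatuses, equipment, and systems may be connected, arranged, or configured in any manner. Words such as "including", "comprising", and "having" are open-ended terms that mean "including but not limited to" and may be used interchangeably with it. The words "or" and "and" as used herein mean "and/or" and may be used interchangeably with it, unless the context clearly indicates otherwise. The phrase "such as" as used herein means "such as but not limited to" and may be used interchangeably with it.
In addition, as used herein, "or" used in a list of items beginning with "at least one of" indicates a disjunctive list, so that, for example, a list of "at least one of A, B, or C" means A or B or C, or AB or AC or BC, or ABC (that is, A and B and C). Furthermore, the wording "exemplary" does not mean that the described example is preferred or better than other examples.
It should also be pointed out that, in the systems and methods of the present disclosure, the components or steps may be decomposed and/or recombined. Such decompositions and/or recombinations should be regarded as equivalent solutions of the present disclosure.
Various changes, substitutions, and alterations may be made to the techniques described herein without departing from the techniques taught by the appended claims. Furthermore, the scope of the claims of the present disclosure is not limited to the specific aspects of the processes, machines, manufacture, composition of events, means, methods, and actions described above. Processes, machines, manufacture, compositions of events, means, methods, or actions that currently exist or are later developed and that perform substantially the same function or achieve substantially the same result as the corresponding aspects described herein may be used. Accordingly, the appended claims include within their scope such processes, machines, manufacture, compositions of events, means, methods, or actions.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the present disclosure. Therefore, the present disclosure is not intended to be limited to the aspects shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The above description has been given for the purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the present disclosure to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.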

Claims (21)

  1. 一种多声道音频处理方法,包括:
    接收待处理多声道音频;
    检测所述待处理多声道音频中各声道音频的音频特性;
    根据所述各声道音频的音频特性,对所述待处理多声道音频进行处理。
  2. 如权利要求1所述的多声道音频处理方法,其中检测所述待处理多声道音频中各声道音频的音频特性步骤,包括:
    响应于所述待处理多声道音频为离线音频,检测所述待处理多声道音频中各声道音频的整体音频特性;
    响应于所述待处理多声道音频为在线音频,检测所述待处理多声道音频中各声道音频的局部音频特性。
  3. 根据权利要求2所述的方法,其中响应于所述待处理多声道音频为离线音频,根据所述各声道音频的音频特性对所述待处理多声道音频进行处理的步骤,包括:
    根据所述整体音频特性,确定第一音频处理参数;
    基于所述第一音频处理参数,对所述待处理多声道音频进行处理。
  4. 根据权利要求3所述的方法,其中所述整体音频特性包括音高特性、声音共振峰特性和瞬态声音脉冲特性;
    所述根据所述整体音频特性,确定第一音频处理参数的步骤具体包括:
    根据所述音高特性、所述声音共振峰特性和所述瞬态声音脉冲特性,确定所述第一音频处理参数;其中,所述第一音频处理参数包括基频幅值、共振峰幅值和瞬态脉冲;
    所述基于所述第一音频处理参数,对所述待处理多声道音频进行处理的步骤,包括:
    调整所述基频幅值,且平滑所述共振峰幅值,并对所述瞬态脉冲进行削波处理。
  5. 根据权利要求3所述的方法,其中所述整体音频特性包括音高特性和声音共振峰特性;
    所述根据所述整体音频特性,确定第一音频处理参数的步骤,包括:
    根据所述音高特性和所述声音共振峰特性,确定所述第一音频处理参数;其中,所述第一音频处理参数包括基频幅值和共振峰幅值;
    所述基于所述第一音频处理参数,对所述待处理多声道音频进行处理的 步骤具体包括:
    调整所述基频幅值并平滑所述共振峰幅值。
  6. 根据权利要求3所述的方法,其中所述整体音频特性包括音高特性和瞬态声音脉冲特性;
    所述根据所述整体音频特性,确定第一音频处理参数的步骤,包括:
    根据所述音高特性和所述瞬态声音脉冲特性,确定所述第一音频处理参数;其中,所述第一音频处理参数包括基频幅值和瞬态脉冲;
    所述基于所述第一音频处理参数,对所述待处理多声道音频进行处理的步骤具体包括:
    调整所述基频幅值并对所述瞬态脉冲进行削波处理。
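  7. The method of claim 3, wherein the overall audio characteristic comprises a pitch characteristic, a sound formant characteristic, a transient sound pulse characteristic, and an audio phase characteristic;
    the step of determining the first audio processing parameter according to the overall audio characteristic comprises:
    determining the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, the transient sound pulse characteristic, and the audio phase characteristic, wherein the first audio processing parameter comprises a fundamental frequency amplitude, a formant amplitude, a transient pulse, and an audio phase; and
    the step of processing the multi-channel audio to be processed based on the first audio processing parameter comprises:
    adjusting the fundamental frequency amplitude, smoothing the formant amplitude, clipping the transient pulse, and adjusting the audio phase.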
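  8. The method of claim 3, wherein the overall audio characteristic comprises a multi-channel audio downmix characteristic and a mid/side channel characteristic;
    the step of determining the first audio processing parameter according to the overall audio characteristic comprises:
    determining the first audio processing parameter according to the multi-channel audio downmix characteristic and the mid/side channel characteristic, wherein the first audio processing parameter comprises a strong audio correlation, a fundamental frequency amplitude, and a formant amplitude; and
    the step of processing the multi-channel audio to be processed based on the first audio processing parameter specifically comprises:
    jointly processing all channels of audio in the multi-channel audio to be processed, and, for those channels, adjusting the fundamental frequency amplitude and smoothing the formant amplitude.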
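  9. The method of claim 2, wherein, in response to the multi-channel audio to be processed being online audio, the step of processing the multi-channel audio to be processed according to the audio characteristics of the channels of audio comprises:
    determining a second audio processing parameter according to the local audio characteristic; and
    processing the multi-channel audio to be processed based on the second audio processing parameter.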
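  10. A multi-channel audio processing apparatus, comprising:
    a receiving module configured to receive multi-channel audio to be processed;
    a detecting module configured to detect an audio characteristic of each channel of audio in the multi-channel audio to be processed; and
    a processing module configured to process the multi-channel audio to be processed according to the audio characteristics of the channels of audio.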
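  11. The apparatus of claim 10, wherein the detecting module comprises:
    a first detecting unit configured to detect, when the multi-channel audio to be processed is offline audio, an overall audio characteristic of each channel of audio in the multi-channel audio to be processed; and
    a second detecting unit configured to detect, when the multi-channel audio to be processed is online audio, a local audio characteristic of each channel of audio in the multi-channel audio to be processed.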
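  12. The apparatus of claim 11, wherein, if the multi-channel audio to be processed is offline audio, the processing module comprises:
    a first determining unit configured to determine a first audio processing parameter according to the overall audio characteristic; and
    a first processing unit configured to process the multi-channel audio to be processed based on the first audio processing parameter.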
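  13. The apparatus of claim 12, wherein the overall audio characteristic comprises at least one of a pitch characteristic, a sound formant characteristic, and a transient sound pulse characteristic;
    the first determining unit is specifically configured to determine the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, and the transient sound pulse characteristic, wherein the first audio processing parameter comprises at least one of a fundamental frequency amplitude, a formant amplitude, and a transient pulse; and
    the first processing unit is specifically configured to adjust the fundamental frequency amplitude, smooth the formant amplitude, and clip the transient pulse.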
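  14. The apparatus of claim 12, wherein the overall audio characteristic comprises a pitch characteristic and a sound formant characteristic;
    the first determining unit is specifically configured to determine the first audio processing parameter according to the pitch characteristic and the sound formant characteristic, wherein the first audio processing parameter comprises a fundamental frequency amplitude and a formant amplitude; and
    the first processing unit is specifically configured to adjust the fundamental frequency amplitude and smooth the formant amplitude.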
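  15. The apparatus of claim 12, wherein the overall audio characteristic comprises a pitch characteristic and a transient sound pulse characteristic;
    the first determining unit is specifically configured to determine the first audio processing parameter according to the pitch characteristic and the transient sound pulse characteristic, wherein the first audio processing parameter comprises a fundamental frequency amplitude and a transient pulse; and
    the first processing unit is specifically configured to adjust the fundamental frequency amplitude and clip the transient pulse.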
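  16. The apparatus of claim 12, wherein the overall audio characteristic comprises a pitch characteristic, a sound formant characteristic, a transient sound pulse characteristic, and an audio phase characteristic;
    the first determining unit is configured to determine the first audio processing parameter according to the pitch characteristic, the sound formant characteristic, the transient sound pulse characteristic, and the audio phase characteristic, wherein the first audio processing parameter comprises a fundamental frequency amplitude, a formant amplitude, a transient pulse, and an audio phase; and
    the first processing unit is configured to adjust the fundamental frequency amplitude, smooth the formant amplitude, clip the transient pulse, and adjust the audio phase.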
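  17. The apparatus of claim 12, wherein the overall audio characteristic comprises a multi-channel audio downmix characteristic and a mid/side channel characteristic;
    the first determining unit is specifically configured to determine the first audio processing parameter according to the multi-channel audio downmix characteristic and the mid/side channel characteristic, wherein the first audio processing parameter comprises a strong audio correlation, a fundamental frequency amplitude, and a formant amplitude; and
    the first processing unit is specifically configured to jointly process all channels of audio in the multi-channel audio to be processed, and, for those channels, to adjust the fundamental frequency amplitude and smooth the formant amplitude.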
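  18. The apparatus of claim 11, wherein, if the multi-channel audio to be processed is online audio, the processing module further comprises:
    a second determining unit configured to determine a second audio processing parameter according to the local audio characteristic; and
    a second processing unit configured to process the multi-channel audio to be processed based on the second audio processing parameter.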
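  19. A multi-channel audio processing hardware apparatus, comprising:
    a memory configured to store non-transitory computer-readable instructions; and
    a processor configured to execute the computer-readable instructions such that, when they are executed, the processor implements the multi-channel audio processing method of any one of claims 1 to 9.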
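  20. A computer-readable storage medium for storing non-transitory computer-readable instructions that, when executed by a computer, cause the computer to perform the multi-channel audio processing method of any one of claims 1 to 9.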
  21. A multi-channel audio processing terminal, comprising the multi-channel audio processing apparatus of any one of claims 10 to 18.
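
As a non-limiting illustration of the flow recited in claims 1 to 3 and 9, the following minimal sketch receives multi-channel audio, detects a characteristic for each channel, and processes each channel according to that characteristic. It rests on assumptions that are not part of the claims: the function names `peak`, `limit_gain`, and `process`, the parameters `limit` and `frame`, and the use of peak amplitude with simple gain limiting are all hypothetical stand-ins for the pitch, formant, and transient pulse processing described in this disclosure.

```python
from typing import List

Channel = List[float]  # one channel of audio as a list of samples


def peak(samples: Channel) -> float:
    """Largest absolute sample value (0.0 for an empty channel)."""
    return max((abs(s) for s in samples), default=0.0)


def limit_gain(samples: Channel, limit: float) -> Channel:
    """Scale the samples down so that their peak does not exceed `limit`."""
    p = peak(samples)
    if p <= limit:
        return list(samples)
    gain = limit / p
    return [s * gain for s in samples]


def process(channels: List[Channel], offline: bool,
            limit: float = 0.8, frame: int = 4) -> List[Channel]:
    """Detect a per-channel characteristic, then process each channel with it."""
    processed = []
    for ch in channels:
        if offline:
            # Offline audio: the whole channel is available, so the
            # characteristic (here, the peak) is measured over all samples.
            processed.append(limit_gain(ch, limit))
        else:
            # Online audio: only the current frame is available, so the
            # characteristic is measured frame by frame.
            out: Channel = []
            for start in range(0, len(ch), frame):
                out.extend(limit_gain(ch[start:start + frame], limit))
            processed.append(out)
    return processed


channels = [
    [0.1, 1.0, 0.2, 0.1, 0.1, 0.2, 0.1, 0.1],  # channel with one loud sample
    [0.5] * 8,                                 # channel already below the limit
]
offline_out = process(channels, offline=True)
online_out = process(channels, offline=False)
```

With these inputs, the offline path attenuates the whole first channel, whereas the online path attenuates only the frame that contains the loud sample; the second channel is left unchanged in both cases.
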
PCT/CN2019/073021 2018-04-19 2019-01-24 Multi-channel audio processing method, apparatus, and computer-readable storage medium WO2019200996A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810356546.9A CN108495234B (zh) 2018-04-19 2018-04-19 Multi-channel audio processing method, apparatus, and computer-readable storage medium
CN201810356546.9 2018-04-19

Publications (1)

Publication Number Publication Date
WO2019200996A1 (zh) 2019-10-24

Family

ID=63313626

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/073021 WO2019200996A1 (zh) 2018-04-19 2019-01-24 Multi-channel audio processing method, apparatus, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN108495234B (zh)
WO (1) WO2019200996A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
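CN108495234B (zh) * 2018-04-19 2020-01-07 北京微播视界科技有限公司 Multi-channel audio processing method, apparatus, and computer-readable storage medium
CN115914973B (zh) * 2023-02-10 2023-12-01 浙江华创视讯科技有限公司 Microphone channel detection method, apparatus, computer device, and storage medium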

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
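WO2014166243A1 (zh) * 2013-08-19 2014-10-16 中兴通讯股份有限公司 Multi-terminal multi-channel independent playback method and apparatus
CN105208426A (zh) * 2015-09-24 2015-12-30 福州瑞芯微电子股份有限公司 Method and system for synchronous audio and video speed change
CN105682000A (zh) * 2016-01-11 2016-06-15 北京时代拓灵科技有限公司 Audio processing method and system
CN106797523A (zh) * 2014-08-01 2017-05-31 史蒂文·杰伊·博尼 Audio device
CN108495234A (zh) * 2018-04-19 2018-09-04 北京微播视界科技有限公司 Multi-channel audio processing method, apparatus, and computer-readable storage medium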

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7508947B2 (en) * 2004-08-03 2009-03-24 Dolby Laboratories Licensing Corporation Method for combining audio signals using auditory scene analysis
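KR100608005B1 (ko) * 2004-09-06 2006-08-02 삼성전자주식회사 Method and apparatus for phase correction of a subwoofer channel signal
CN103262159B (zh) * 2010-10-05 2016-06-08 华为技术有限公司 Method and apparatus for encoding/decoding a multi-channel audio signal
US9219460B2 (en) * 2014-03-17 2015-12-22 Sonos, Inc. Audio settings based on environment
CN105120398B (zh) * 2015-09-09 2019-07-12 海信集团有限公司 Loudspeaker and loudspeaker system
CN106255008A (zh) * 2016-08-11 2016-12-21 乐视控股(北京)有限公司 Output correction method and output correction apparatus for a two-channel audio system
CN106686520B (zh) * 2017-01-03 2019-04-02 南京地平线机器人技术有限公司 Multi-channel audio system capable of tracking a user, and device including the same
CN106851488B (zh) * 2017-03-30 2020-06-30 重庆辉烨通讯技术有限公司 Audio output control method, apparatus, and circuit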

Also Published As

Publication number Publication date
CN108495234A (zh) 2018-09-04
CN108495234B (zh) 2020-01-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19788810

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 25.01.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19788810

Country of ref document: EP

Kind code of ref document: A1