WO2021212287A1 - Audio signal processing method, audio processing device, and recording apparatus - Google Patents

Audio signal processing method, audio processing device, and recording apparatus Download PDF

Info

Publication number
WO2021212287A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
audio signal
frequency
sound
sound source
Prior art date
Application number
PCT/CN2020/085719
Other languages
French (fr)
Chinese (zh)
Inventor
莫品西
边云锋
薛政
刘洋
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to CN202080038422.1A priority Critical patent/CN113875265A/en
Priority to PCT/CN2020/085719 priority patent/WO2021212287A1/en
Publication of WO2021212287A1 publication Critical patent/WO2021212287A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 Speech synthesis; Text to speech systems
    • G10L 13/02 Methods for producing synthetic speech; Speech synthesisers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 1/00 Two-channel systems

Definitions

  • This application relates to the field of signal processing technology, and in particular to an audio signal processing method, audio processing device, recording device, and computer-readable storage medium.
  • Nowadays, the recording function is provided on many electronic products, such as voice recorders, cameras, and camcorders.
  • These electronic devices with recording functions usually simulate the human ears by placing microphones on the left and right sides for recording.
  • Figure 1 is a schematic structural diagram of an existing digital camera. It can be seen that the digital camera is equipped with dual microphones.
  • Although left and right microphones are provided for recording, there is still a significant difference between the structural layout of the dual microphones and that of the human ears. As a result, there is a large gap between the stereo perception of the recorded audio and the sound actually heard by the human ears: when playing back the recorded audio, the user cannot accurately distinguish the direction of the sound, and the stereo perception is poor.
  • the embodiments of the present application provide an audio signal processing method, an audio processing device, a recording device, and a computer-readable storage medium to solve the technical problem of poor stereo perception of audio recorded by the existing recording device.
  • The first aspect of the embodiments of the present application provides an audio signal processing method, including:
  • according to the determined difference information of the sound-reception response, modulating the audio component of the corresponding frequency in the to-be-processed audio signal to generate a first audio signal to be played to the first sound-receiving organ and a second audio signal to be played to the second sound-receiving organ.
  • A second aspect of the embodiments of the present application provides an audio processing device, including:
  • a memory, used to store a computer program;
  • a processor, configured to call the computer program; when the computer program is executed by the processor, the following steps are implemented:
  • according to the determined difference information of the sound-reception response, modulating the audio component of the corresponding frequency in the to-be-processed audio signal to generate a first audio signal to be played to the first sound-receiving organ and a second audio signal to be played to the second sound-receiving organ.
  • A third aspect of the embodiments of the present application provides a recording device, including at least two microphones, a memory, and a processor;
  • the microphones are used to collect audio signals;
  • the memory is used to store a computer program;
  • the processor is configured to call the computer program; when the processor executes the computer program, the following steps are implemented:
  • the difference information of the sound-reception response includes the amplitude response difference and/or the phase response difference of the first sound-receiving organ and the second sound-receiving organ to audio components of the same frequency transmitted in the same direction;
  • according to the determined difference information of the sound-reception response, modulating the audio component of the corresponding frequency in the to-be-processed audio signal to generate a first audio signal to be played to the first sound-receiving organ and a second audio signal to be played to the second sound-receiving organ.
  • A fourth aspect of the embodiments of the present application provides a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the audio signal processing method described above is implemented.
  • In the embodiments of the present application, each audio component of the audio signal to be processed is modulated according to the sound-reception response difference information corresponding to that component, where the difference information represents the amplitude response difference and/or phase response difference of the first sound-receiving organ and the second sound-receiving organ for audio components of the same frequency transmitted in the same direction.
  • Therefore, after modulation according to this difference information, the generated first audio signal and second audio signal also reflect the difference in amplitude response and/or phase response. Through this response difference, a sufficient sense of space can be constructed, so that the human ear can clearly distinguish the direction of the sound.
  • FIG. 1 is a schematic diagram of the structure of an existing digital camera.
  • Figure 2 is a schematic diagram of a scene of recording sound through an artificial head.
  • Fig. 3 is a flowchart of an exemplary audio signal processing method provided by an embodiment of the present application.
  • Fig. 4 is an algorithm block diagram of an exemplary audio signal processing method provided by an embodiment of the present application.
  • Fig. 5 is a schematic structural diagram of an exemplary audio processing device provided by an embodiment of the present application.
  • Fig. 6 is a schematic structural diagram of an exemplary recording device provided by an embodiment of the present application.
  • As mentioned above, existing electronic equipment with a recording function is equipped with two microphones, on the left and right, to imitate the human ears.
  • However, because the structural layout of dual microphones is still very different from that of human ears, the audio recorded by dual microphones does not have a sufficient stereoscopic effect and cannot accurately restore the direction of the actual sound.
  • A feasible solution is to set up microphones at both ears of a real person, an artificial head, or a similar artificial-head device for recording. Because the microphones are then in the same acoustic environment as the human ears, real stereo sound can be recorded. However, this approach is inconvenient, poorly portable, and costly.
  • Fig. 2 is a schematic diagram of a scene of recording sound through an artificial head.
  • The embodiments of the present application provide an audio signal processing method. After the recorded audio signal is processed with this method, an audio signal with enhanced stereo perception can be obtained, and the method does not require the cooperation of real persons or artificial heads; it therefore avoids the shortcomings of inconvenience, poor portability, and high cost.
  • FIG. 3 is a flowchart of an exemplary audio signal processing method provided by an embodiment of the present application. The method includes the following steps:
  • S301: Acquire an audio signal to be processed.
  • the audio signal to be processed includes audio components of multiple frequencies.
  • S302: Determine the sound source direction corresponding to each of the multiple audio components.
  • S303: According to the determined sound source directions, determine the sound-reception response difference information corresponding to the audio component of each frequency. The difference information includes the amplitude response difference and/or phase response difference of the first sound-receiving organ and the second sound-receiving organ to audio components of the same frequency transmitted in the same direction.
  • S304: According to the determined difference information of the sound-reception response, modulate the audio component of the corresponding frequency in the audio signal to be processed to generate a first audio signal for playing to the first sound-receiving organ and a second audio signal for playing to the second sound-receiving organ.
  • the sound-receiving organ may specifically be a human ear.
  • the first sound-receiving organ may be the left ear
  • the second sound-receiving organ may be the right ear.
  • the sound-receiving organ can also be an electronic device that mimics the human ear, such as artificial ears and hearing aids.
  • the sound receiving organ may correspond to one or more channels of the playback device.
  • the playback device is an earphone
  • the first sound receiving organ may correspond to the left channel of the earphone
  • the second sound receiving organ may correspond to the right channel of the earphone.
  • The first sound-receiving organ can also correspond to the left-channel speaker in a speaker set (for a four-channel system, to the left front and left rear speakers), and the second sound-receiving organ can correspond to the right-channel speaker in the speaker set (for a four-channel system, to the right front and right rear speakers).
  • In other words, any playback device with two or more sound channels has playback channels corresponding to the first sound-receiving organ and the second sound-receiving organ described in this application.
  • When the human ear distinguishes the direction of a sound, it mainly relies on the difference in the responses of the two ears to that sound.
  • Both the left and right ears can hear the sound, but because their positions differ, their responses differ as well. For example, if the sound source is on the left, the left ear will hear the sound before the right ear, and the amplitude heard by the left ear may be greater than that heard by the right ear.
  • This response difference is reflected in the difference between the two audio signals received at the left and right ears.
  • The response difference between the ears can be divided into an amplitude response difference and a phase response difference.
  • For sounds of different frequencies, the main cue used to distinguish direction may be a different kind of response difference.
  • Low-frequency sound, owing to its strong diffraction ability, easily bends around obstacles to reach both ears, so the amplitude difference between the two ears is small.
  • Therefore, the human ear mainly relies on the phase difference when distinguishing the direction of low-frequency sound.
  • For high-frequency sound, the phase difference produced at the two ears is aliased, so it is difficult to distinguish direction through the phase difference.
  • However, the diffraction ability of high-frequency sound is weak: it is blocked and reflected by the head, shoulders, and other parts of the body, so a noticeable amplitude difference arises between the ears, and direction is distinguished mainly through the amplitude difference.
  • In step S301, the frequency components of the audio signal to be processed can be analyzed to determine the audio components of the multiple frequencies it contains, so as to facilitate different subsequent processing of audio components of different frequencies.
  • There are many feasible ways to determine the frequency components of the audio signal to be processed.
  • For example, a Fourier transform can be performed on the audio signal to be processed to obtain its frequency spectrum, from which the audio component of each frequency can be determined.
  • Alternatively, a filtering method, a sub-band analysis method, or the like can also be used to determine the audio components of the multiple frequencies contained in the to-be-processed audio signal.
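A minimal sketch of the Fourier-transform route described above, assuming numpy; the function name `audio_components` is illustrative, not from the application:

```python
import numpy as np

def audio_components(x, fs):
    """Decompose one (short, quasi-stationary) audio frame into its
    frequency components via the discrete Fourier transform.

    Returns the bin frequencies in Hz and the complex spectrum; entry
    X[k] is the audio component at frequency freqs[k].
    """
    X = np.fft.rfft(x)                              # bins 0 .. N/2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, X

# A 440 Hz tone sampled at 8 kHz shows a spectral peak near 440 Hz.
fs = 8000
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 440 * t)
freqs, X = audio_components(x, fs)
peak_hz = freqs[np.argmax(np.abs(X))]
```

The peak lands on the bin nearest 440 Hz; the bin spacing here is fs/N = 7.8125 Hz.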
  • In step S302, the corresponding sound source direction may be determined for the audio component of each frequency of the audio signal to be processed.
  • In an implementation, a sound source localization algorithm can be applied to the audio signals collected by at least two microphones (for ease of reference, a microphone used to determine the sound source direction is referred to below as a directional microphone).
  • Specifically, the sound source direction of each frequency can be determined from the audio components of the corresponding frequency in the audio signals collected by the directional microphones.
  • Usable sound source localization algorithms include the beamforming algorithm, the time-difference-of-arrival estimation algorithm, and the differential microphone array algorithm.
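Of the localization algorithms listed, time-difference-of-arrival estimation is perhaps the simplest to sketch. The following is a toy two-microphone version assuming numpy; `tdoa_samples` is a hypothetical name, and a practical implementation would typically use a robust variant such as GCC-PHAT:

```python
import numpy as np

def tdoa_samples(x1, x2, max_lag):
    """Estimate the arrival-time difference (in samples) between two
    microphone signals as the lag maximising their cross-correlation.
    A positive result means the sound reaches mic 1 before mic 2."""
    lags = np.arange(-max_lag, max_lag + 1)
    corr = [np.dot(x1[max(0, -l):len(x1) - max(0, l)],
                   x2[max(0, l):len(x2) - max(0, -l)]) for l in lags]
    return int(lags[np.argmax(corr)])

# Synthetic check: mic 2 receives the same waveform 5 samples later.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(256)
x2 = np.concatenate([np.zeros(5), x1[:-5]])
delay = tdoa_samples(x1, x2, max_lag=20)
```

The estimated delay, combined with the microphone spacing and the speed of sound, then yields the direction of arrival.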
  • In step S303, according to the determined sound source direction of each audio component, the sound-reception response difference information corresponding to that component can be determined; the determined difference information is then used in step S304 to modulate the audio component of the corresponding frequency.
  • The sound-reception response difference information characterizes the difference in amplitude response and/or phase response of the first sound-receiving organ and the second sound-receiving organ to audio components of the same frequency transmitted in the same direction. It can take a variety of forms; in one implementation, it may include a first transfer coefficient and a second transfer coefficient.
  • The first transfer coefficient may be determined according to the sound source direction and frequency corresponding to the audio component, together with a first correspondence relationship; specifically, the determined sound source direction and frequency may be substituted into the first correspondence relationship to calculate the first transfer coefficient.
  • The second transfer coefficient may be determined according to the sound source direction and frequency corresponding to the audio component, together with a second correspondence relationship; specifically, the determined sound source direction and frequency may be substituted into the second correspondence relationship to calculate the second transfer coefficient.
  • The audio component of the corresponding frequency in the audio signal to be processed may then be modulated by the first transfer coefficient to generate the first audio signal for playing to the first sound-receiving organ,
  • and modulated by the second transfer coefficient to generate the second audio signal for playing to the second sound-receiving organ.
  • The difference between the above-mentioned first correspondence and second correspondence reflects the difference in amplitude response and/or phase response of the first sound-receiving organ and the second sound-receiving organ to audio components of the same frequency transmitted in the same direction.
  • Accordingly, the difference between the first transfer coefficient determined from the first correspondence and the second transfer coefficient determined from the second correspondence also reflects this response difference between the two sound-receiving organs.
  • Furthermore, the difference between the first audio signal generated with the first transfer coefficient and the second audio signal generated with the second transfer coefficient likewise reflects this response difference.
  • When the first audio signal is played to the first sound-receiving organ and the second audio signal to the second, a clear three-dimensional impression can therefore be produced.
  • The first correspondence may be the correspondence between the sound source direction, the frequency, and the first transfer coefficient;
  • the second correspondence may be the correspondence between the sound source direction, the frequency, and the second transfer coefficient.
  • For example, in one implementation, if both correspondence relationships are expressed as functions, both may take the form H(φ, θ, k), where φ and θ are parameters characterizing the sound source direction (φ may represent the pitch angle and θ the circumferential angle), and k corresponds to the frequency number in the spectrum; that is, k substantially corresponds to the different frequencies.
  • By substituting the sound source direction and frequency into these functions, the transfer coefficient corresponding to the audio component can be determined.
  • The first correspondence and the second correspondence may also be expressed in terms of amplitude and phase.
  • In one example, the second transfer coefficient can be HR(φ, θ, k) = α · e^(jΔψ) · HL(φ, θ, k), where α is used to represent the amplitude gain of the right ear relative to the left ear, and Δψ is used to indicate the phase difference between the ears.
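Under the amplitude/phase form above, deriving the right-ear coefficient from the left-ear one is a single complex multiplication. A sketch assuming numpy; the function name and the concrete numbers are illustrative only:

```python
import numpy as np

def right_from_left(H_L, alpha, dpsi):
    """Derive the right-ear transfer coefficient for one frequency bin
    from the left-ear coefficient H_L, the interaural amplitude gain
    `alpha`, and the interaural phase difference `dpsi` (both are
    direction- and frequency-dependent in practice)."""
    return alpha * np.exp(1j * dpsi) * H_L

# Illustrative values: the right ear receives half the amplitude and a
# quarter-cycle phase shift relative to the left ear.
H_L = 0.8 + 0.0j
H_R = right_from_left(H_L, alpha=0.5, dpsi=np.pi / 2)
```

Here |H_R| = 0.4 and H_R is shifted 90 degrees in phase relative to H_L.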
  • In an implementation, the correspondence relationships can be obtained by measuring the sound-reception responses of the sound-receiving organs to test signals of various frequencies emitted by sound sources in different directions.
  • For example, microphones can be placed at the two ear canals of an artificial head or a real person, and test signals of various frequencies can be played from different directions.
  • For the test signal of each direction and frequency, the response signals collected by the two microphones are recorded separately.
  • The ratio of the response signal to the test signal can then be used as the dependent variable, with the sound source direction and frequency as independent variables, to fit the correspondence between sound source direction, frequency, and transfer coefficient.
  • The correspondence fitted from the response signals collected by the microphone located at the first sound-receiving organ can be regarded as the first correspondence,
  • and the correspondence fitted from the response signals collected by the microphone located at the second sound-receiving organ can be regarded as the second correspondence.
  • In another implementation, the correspondence relationships can also be determined from a propagation model of sound from the source to the human ears.
  • Specifically, characteristic parameters considered to affect the propagation of sound from the source to the ears can be selected in advance, including but not limited to: binaural distance, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters. From the pre-selected characteristic parameters, combined with the principles of acoustic propagation, the first correspondence and the second correspondence can be derived.
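One classical building block for such a propagation-model derivation (not named in the application itself, so treat this as an illustrative assumption) is the Woodworth spherical-head approximation of the interaural time difference; the interaural phase difference at angular frequency ω would then be ω times this delay. A sketch assuming numpy:

```python
import numpy as np

C = 343.0  # speed of sound in air, m/s

def woodworth_itd(theta, head_radius=0.0875):
    """Spherical-head (Woodworth) approximation of the interaural time
    difference, in seconds, for a far-field source at azimuth `theta`
    (radians, 0 = straight ahead; valid for |theta| <= pi/2). The head
    radius would follow from the binaural-distance parameter above."""
    return (head_radius / C) * (theta + np.sin(theta))

itd_front = woodworth_itd(0.0)        # source straight ahead: no delay
itd_side = woodworth_itd(np.pi / 2)   # source at 90 degrees: maximum delay
```

For an average head this gives a maximum delay of roughly 0.66 ms at 90 degrees.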
  • In step S304, the first transfer coefficient may be multiplied by the audio component of the corresponding frequency to obtain a new audio component of that frequency;
  • this new audio component may be referred to as the first audio component.
  • Similarly, the audio component of the corresponding frequency in the audio signal to be processed can be modulated by the second transfer coefficient to obtain another new audio component of the corresponding frequency, which may be called the second audio component.
  • The first transfer coefficient and the second transfer coefficient can each be used to modulate the audio components in the frequency domain, by multiplication.
  • Alternatively, the modulation can also be performed in the time domain.
  • Specifically, the first transfer coefficient and the second transfer coefficient can be converted into their time-domain forms,
  • the time-domain form of the first transfer coefficient is convolved with the audio signal to be processed,
  • and the time-domain form of the second transfer coefficient is convolved with the audio signal to be processed, to generate the required first audio signal and second audio signal.
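The equivalence between the two routes just described (per-bin multiplication in the frequency domain versus convolution with the transformed coefficients in the time domain) is the convolution theorem, and can be checked numerically. A sketch assuming numpy, using circular convolution on a single frame:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64
x = rng.standard_normal(N)                 # one audio frame
H = np.fft.fft(rng.standard_normal(N))     # transfer coefficients, one per bin

# Route 1: modulate in the frequency domain (multiply each bin).
y_freq = np.fft.ifft(np.fft.fft(x) * H)

# Route 2: convert the coefficients to the time domain, then circularly
# convolve them with the frame.
h = np.fft.ifft(H)
y_time = np.array([sum(x[m] * h[(n - m) % N] for m in range(N))
                   for n in range(N)])
```

Both routes produce the same frame, up to floating-point error.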
  • The audio signal processing method provided by the embodiments of the present application can be applied in a recording scene in which at least two microphones record at the same time, for example, a microphone array including multiple microphones. It can be understood that although the audio signals recorded by the microphones correspond to the same content (that is, they actually come from the same sound sources), the recorded audio signals differ because the microphone positions differ.
  • The audio signal to be processed, as the object on which the transfer coefficients act, can be selected relatively flexibly.
  • the audio signal to be processed may be determined based on the audio signal collected by a directional microphone (that is, the aforementioned microphone for determining the direction of the sound source).
  • the audio signal to be processed may be determined based on the audio signal collected by a microphone other than a directional microphone.
  • the microphone array may include 6 microphones, 3 of which may be selected as directional microphones, and the audio signal to be processed may be determined based on the audio signals collected by the other 3 microphones.
  • the audio signal to be processed may also be determined based on audio signals collected by other microphones outside the microphone array, and the other microphones may also be microphones on other devices.
  • In an implementation, the audio signal to be processed can be selected as a recorded audio signal with a relatively high signal-to-noise ratio.
  • For example, it may be the audio signal with the highest signal-to-noise ratio among those collected by the microphones. Since the transfer coefficients act directly on the audio components of the corresponding frequencies, in another optional implementation the audio components of the same frequency in the signals collected by the directional microphones can be linearly combined, so as to obtain components of each frequency with a relatively high signal-to-noise ratio for combination with the first transfer coefficient and the second transfer coefficient.
  • After modulation, the first audio components of each frequency can be transformed from the frequency domain to the time domain to obtain the first audio signal,
  • and the second audio components of each frequency can likewise be transformed to obtain the second audio signal.
  • The transformation from the frequency domain to the time domain can be implemented using the inverse Fourier transform.
  • The acquired audio signal to be processed is not necessarily a stationary signal, and the Fourier transform used in the subsequent spectrum analysis requires a stationary input.
  • Therefore, the audio signal to be processed may be divided into audio frames according to a set frame length. Because each frame is short, the signal within an audio frame can be considered stationary.
  • The subsequent processing is then performed frame by frame.
  • Specifically, each audio frame can be analyzed to obtain the audio components of each frequency; the first audio components of each frequency are transformed to obtain a new first audio frame, and the second audio components of each frequency are transformed to obtain a new second audio frame. Further, the new first audio frames are synthesized to obtain the first audio signal, and the new second audio frames are synthesized to obtain the second audio signal.
  • The frame length of an audio frame can usually be set flexibly based on experience. For example, if the sampling frequency of the microphone is fs and the number of sampling points in a frame is N, the frame length N can be chosen in the range 0.005·fs ≤ N ≤ fs. In an optional implementation, N is set to a power of 2, i.e., the number of sampling points in the audio frame is a power of 2, so that in the subsequent spectrum analysis the fast Fourier transform (FFT) can be used to accelerate the calculation.
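The frame-length rule above can be sketched as follows, assuming numpy; `frame_signal` is a hypothetical helper, and a 50% hop is chosen to match the overlap-add synthesis described later:

```python
import numpy as np

def frame_signal(x, N, hop):
    """Split signal x into frames of N samples at the given hop size;
    trailing samples that do not fill a whole frame are dropped."""
    starts = range(0, len(x) - N + 1, hop)
    return np.stack([x[s:s + N] for s in starts])

fs = 48000
# Choose N as a power of two inside the suggested range
# 0.005*fs <= N <= fs (here 0.005*fs = 240, so N = 256), so that the
# later spectrum analysis can use the FFT.
N = 1 << (int(np.log2(0.005 * fs)) + 1)
frames = frame_signal(np.arange(fs, dtype=float), N, hop=N // 2)
```

Each row of `frames` is one audio frame; consecutive frames overlap by N/2 samples.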
  • In addition, the audio frame can be modulated toward a periodic signal before the Fourier transform is performed.
  • The specific method may be to apply an analysis window to the audio frame, that is, to multiply the audio frame by the window function of the analysis window.
  • The window function of the analysis window can be a sine window, a Hanning window, etc., which is not specifically limited here.
  • When synthesizing the new audio frames into an audio signal, the overlap-add method can be used: the overlapping portions of consecutive audio frames are superimposed, and the superimposed frames are combined to obtain the required audio signal. Further, because directly overlapping and accumulating the frames may cause sudden amplitude changes at the overlapped parts, the amplitudes at both ends of each audio frame can first be tapered so that the audio signal obtained after overlap-add is smooth.
  • That is, each new audio frame obtained by the transformation may be windowed with a synthesis window, and the windowed audio frames are then used for the overlap-add.
  • The window function of the synthesis window may likewise be a sine window, a Hanning window, or the like.
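The analysis-window / synthesis-window / overlap-add pipeline just described can be sketched end to end. A minimal sketch assuming numpy, using the sine window mentioned above for both stages; this particular pair makes the overlapped squared windows sum to one at 50% overlap, so with identity modulation the interior of the signal is reconstructed exactly:

```python
import numpy as np

N, hop = 8, 4                        # frame length and 50 % overlap
n = np.arange(N)
w = np.sin(np.pi * (n + 0.5) / N)    # sine window (analysis and synthesis)

x = np.arange(32, dtype=float)       # toy "audio" signal
out = np.zeros_like(x)

for s in range(0, len(x) - N + 1, hop):
    frame = x[s:s + N] * w           # analysis window before the FFT
    spec = np.fft.rfft(frame)        # frequency-domain modulation would
    new_frame = np.fft.irfft(spec)   # happen here (identity in this sketch)
    out[s:s + N] += new_frame * w    # synthesis window, then overlap-add
```

Only the first and last few samples, covered by a single frame, deviate; every interior sample is recovered exactly.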
  • FIG. 4 is an algorithm block diagram of an exemplary audio signal processing method provided in an embodiment of the present application.
  • Each microphone in the microphone array can be used as a directional microphone.
  • First, the time-domain audio signal xm(t) collected by each microphone is divided into frames:
  • N sampling points are extracted as one audio frame to obtain the time-domain audio frame xm(n)l, where n is the sample index within the frame and l is the frame index.
  • The frequency spectrum Xm(k)l corresponding to each microphone is input into the sound source localization module, and the sound source direction corresponding to the audio component of each frequency is determined there through a microphone-array-based sound source localization algorithm.
  • For example, the beamforming algorithm can be used, where the sound source direction includes the circumferential angle θ and the pitch angle φ.
  • The most basic beamforming algorithm can be expressed as P(θi, k) = |w^H(θi, k) X(k)|², where:
  • θi represents a candidate sound source direction;
  • w(θi, k) is the weight vector, also called the steering vector, corresponding to direction θi at discrete frequency k.
  • The elements of the steering vector can be expressed as wm(θi, k) = e^(−jωτm(θi)), where ω is the sound source angular frequency corresponding to the discrete frequency k,
  • and τm is the difference between the time at which sound from direction θi reaches the m-th microphone and the time at which it reaches the reference point. The sound source direction corresponding to the k-th discrete-frequency audio component is then θ(k), that is, the direction in which the beamforming output is largest at that frequency.
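A toy numerical version of this direction scan, assuming numpy, a two-microphone far-field array, and hypothetical geometry (mic spacing, source frequency); a real implementation would scan both angles and process all frequency bins:

```python
import numpy as np

C = 343.0   # speed of sound, m/s
d = 0.1     # spacing of a hypothetical 2-mic linear array, m

def steering_delays(theta):
    """Far-field arrival-time offsets (s) of each microphone relative to
    mic 0 for a source at azimuth theta (0 = broadside)."""
    return np.array([0.0, d * np.sin(theta) / C])

def doa_for_bin(X, omega, thetas):
    """Scan candidate directions and return the one maximising the
    delay-and-sum beamformer output |w(theta)^H X|^2 for this bin."""
    powers = [abs(np.vdot(np.exp(-1j * omega * steering_delays(th)), X)) ** 2
              for th in thetas]
    return thetas[int(np.argmax(powers))]

# Simulate one frequency bin of a 500 Hz source at 30 degrees, then
# recover the direction by scanning in 1-degree steps.
omega = 2 * np.pi * 500
X = np.exp(-1j * omega * steering_delays(np.deg2rad(30.0)))
thetas = np.deg2rad(np.linspace(-90, 90, 181))
est_deg = np.rad2deg(doa_for_bin(X, omega, thetas))
```

The scan peaks exactly at the simulated 30-degree direction, since at 500 Hz the interaural phase stays well below one cycle and no aliasing occurs.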
  • The determined sound source direction θ(k), together with the pitch angle, and the discrete frequency k are used as input quantities to the transfer coefficient determination module.
  • In the transfer coefficient determination module, these input quantities are substituted into the first correspondence relationship and the second correspondence relationship, respectively, to obtain the first transfer coefficient HL(k)l and the second transfer coefficient HR(k)l (in this embodiment, the first correspondence relationship corresponds to the left channel, and the second correspondence relationship corresponds to the right channel).
  • The first transfer coefficient HL(k)l can then be multiplied by the audio component Xref(k)l of the corresponding frequency in the audio signal to be processed to obtain the new first audio component XL(k)l of each frequency, and the second transfer coefficient HR(k)l can be multiplied by the audio component Xref(k)l of the corresponding frequency to obtain the new second audio component XR(k)l of each frequency.
  • In this embodiment, the audio signal to be processed is a linear combination of the audio components of the same frequency in the audio signals collected by the microphones, which can be expressed as Xref(k)l = Σm wm · Xm(k)l,
  • where wm represents the weight of the m-th microphone and can be a real number or a complex number.
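The weighted combination above reduces to a one-line matrix product, assuming numpy; `reference_component` is a hypothetical name, and how the weights wm are chosen (e.g., to raise the signal-to-noise ratio) is left open, as in the text:

```python
import numpy as np

def reference_component(X_mics, weights):
    """Linearly combine per-microphone spectra.

    X_mics: (M, K) array, the K-bin spectrum of each of M microphones.
    weights: length-M sequence, real or complex.
    Returns the length-K combined reference spectrum Xref(k)."""
    return np.asarray(weights) @ np.asarray(X_mics)

# Equal weights over two mics observing the same spectrum leave it unchanged.
X = np.array([1 + 1j, 2.0, 0.5j])
Xref = reference_component(np.stack([X, X]), [0.5, 0.5])
```

With uncorrelated noise across microphones, such averaging lowers the noise power while preserving the common signal.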
  • For each first audio frame x'L(n)l obtained by the inverse transformation, windowing with the synthesis window is performed to obtain x"L(n)l; the windowed frames x"L(n)l are input to the overlap-add module, which produces the l-th frame x"'L(n)l of the first audio signal, and the frames x"'L(n)l are combined to obtain the complete first audio signal xL(t) for playing to the first sound-receiving organ.
  • Likewise, each second audio frame x'R(n)l is windowed with the synthesis window to obtain x"R(n)l; the windowed frames x"R(n)l are input to the overlap-add module to obtain the l-th frame x"'R(n)l of the second audio signal, and the frames x"'R(n)l are combined to obtain the complete second audio signal xR(t) for playing to the second sound-receiving organ.
  • the difference information of the reception response of the audio component has a corresponding relationship with the sound source direction and frequency of the audio component. But considering that the three-dimensional sense of sound is not only reflected in the clear direction, but also in the distance of the sound source, and the distance between the sound source is different, the response between the ears is also different, so it can be in the corresponding relationship Add the sound source distance as a variable, that is, the corresponding relationship can be form.
  • the sound source distance of each audio component may be determined, and then the difference information of the sound reception response of the audio component may be determined according to the determined sound source distance, sound source direction, and frequency.
  • the variables in the correspondence relationship may also include designated characteristic parameters, that is, the difference information of the audio component's reception response may be determined according to the designated characteristic parameters, the sound source direction, the sound source distance, and the frequency.
  • the specified characteristic parameters may include one or more of the following: distance between ears, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  • the device can interact with the user to enable the user to input relevant information, thereby obtaining the user's designated characteristic parameters.
  • image recognition technology can be used to identify the user's image to obtain the required specified characteristic parameters. For example, the distance between the ears of the user can be obtained through image ranging, and the characteristic parameters of the ear canal of the user can be captured through image feature recognition.
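The image-ranging idea can be sketched as follows; given pixel coordinates of the two ear landmarks produced by some recognition step, and a reference object of known physical size in the same image, the inter-ear distance is scaled from pixels to millimetres. All names, and the assumption that the ears and reference lie at comparable depth, are hypothetical:

```python
import math

def inter_ear_distance_mm(left_ear, right_ear, ref_px, ref_mm):
    """Estimate the inter-ear distance from image landmarks.

    left_ear, right_ear: (x, y) pixel coordinates of the two ear landmarks.
    ref_px, ref_mm: pixel span and physical span (mm) of a reference object.
    """
    dx = right_ear[0] - left_ear[0]
    dy = right_ear[1] - left_ear[1]
    dist_px = math.hypot(dx, dy)          # landmark distance in pixels
    return dist_px * (ref_mm / ref_px)    # convert pixels -> millimetres
```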
  • each audio component of the audio signal to be processed is modulated according to the difference information of the sound reception response corresponding to that audio component, where the difference information of the sound reception response represents the amplitude response difference and/or phase response difference of the first sound receiving organ and the second sound receiving organ to audio components of the same frequency transmitted in the same direction. After modulation according to this difference information, the generated first audio signal and second audio signal can also reflect this amplitude response difference and/or phase response difference; through this difference in response, a sufficient sense of space can be constructed, so that the human ear can clearly distinguish the direction of the sound.
  • Please refer to FIG. 5, which is a schematic structural diagram of an exemplary audio processing device provided by an embodiment of the present application.
  • the device includes:
  • the memory 501 is used to store computer programs
  • the processor 502 is configured to call the computer program, and when the computer program is executed by the processor, the following steps are implemented:
  • the difference information of the sound reception response corresponding to each of the audio components is determined, where the difference information of the sound reception response includes: the amplitude response difference and/or phase response difference of the first sound receiving organ and the second sound receiving organ to audio components of the same frequency transmitted in the same direction;
  • according to the difference information of the sound reception response, the audio component of the corresponding frequency in the to-be-processed audio signal is modulated to generate a first audio signal to be played at the first sound receiving organ and a second audio signal to be played at the second sound receiving organ.
  • the difference information of the sound reception response includes a first transfer coefficient and a second transfer coefficient
  • the first transfer coefficient is determined according to a sound source direction, frequency, and a first correspondence relationship corresponding to the audio component.
  • the second transfer coefficient is determined according to the sound source direction and frequency corresponding to the audio component, and a second correspondence.
  • the difference between the first correspondence and the second correspondence is used to characterize the difference in sound reception response between the first sound receiving organ and the second sound receiving organ.
  • the first correspondence relationship is obtained by measuring the reception response of the first sound receiving organ for test signals of each frequency emitted by sound sources in different directions;
  • the second correspondence relationship is obtained by measuring the sound reception response of the second sound receiving organ for test signals of various frequencies emitted by sound sources in different directions.
  • the first correspondence and the second correspondence are determined according to a propagation model from the sound source to the human ears, and the propagation model is established according to preselected characteristic parameters.
  • the preselected characteristic parameters include one or more of the following: distance between ears, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  • the sound source direction corresponding to each of the audio components is determined by a sound source localization algorithm using audio signals collected by at least two microphones.
  • the sound source direction corresponding to the audio component is determined according to the audio component of the corresponding frequency in the audio signal collected by each microphone.
  • the sound source localization algorithm is any one of the following: a beamforming algorithm, an arrival time difference estimation algorithm, and a differential microphone array algorithm.
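For the arrival-time-difference option, a minimal two-microphone sketch might look like the following; the plain cross-correlation peak search, the array spacing `d`, and the speed of sound `c` are illustrative assumptions (practical systems often use refinements such as GCC-PHAT):

```python
import numpy as np

def estimate_direction(sig1, sig2, fs, d, c=343.0):
    """Estimate the arrival angle (degrees from broadside) for a pair of
    microphones spaced d metres apart, from the time difference of arrival."""
    corr = np.correlate(sig1, sig2, mode="full")
    lag = np.argmax(corr) - (len(sig2) - 1)       # samples sig1 lags behind sig2
    tau = lag / fs                                # delay in seconds
    sin_theta = np.clip(c * tau / d, -1.0, 1.0)   # geometry: c*tau = d*sin(theta)
    return np.degrees(np.arcsin(sin_theta))
```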
  • the audio signal to be processed is obtained based on the audio signal collected by the microphone.
  • the audio signal to be processed is an audio signal with the highest signal-to-noise ratio among the audio signals collected by each microphone.
  • the audio components of each frequency of the audio signal to be processed are linear combinations of audio components of corresponding frequencies in the audio signals collected by each microphone.
  • the to-be-processed audio signal is obtained based on audio signals collected by a microphone other than the aforementioned microphones.
  • the audio component of the corresponding frequency is modulated according to the first transfer coefficient to obtain a new first audio component of each frequency, and the first audio signal is obtained by transforming the first audio component of each frequency;
  • the audio component of the corresponding frequency is modulated according to the second transfer coefficient to obtain a new second audio component of each frequency, and the second audio signal is obtained by transforming the second audio component of each frequency.
  • the audio signal to be processed is divided into multiple audio frames, and the audio component is an audio component contained in the audio frame;
  • a new first audio frame is obtained by transforming the first audio component of each frequency, and the first audio signal is synthesized using each of the first audio frames;
  • a new second audio frame is obtained by transforming the second audio component of each frequency, and the second audio signal is synthesized by using each of the second audio frames.
  • the number of sampling points included in the audio frame is a power of two.
  • the audio components corresponding to the frequencies contained in the audio frame are determined by the fast Fourier transform (FFT).
  • the processor is further configured to modulate the audio frame into a periodic signal before determining the audio components corresponding to each frequency contained in the audio frame.
  • when the processor modulates the audio frame into a periodic signal, it is specifically configured to add an analysis window to the audio frame.
  • when the processor uses each of the first audio frames to synthesize the first audio signal, it is specifically configured to combine each of the first audio frames through overlap-add processing to obtain the first audio signal;
  • when the processor uses each of the second audio frames to synthesize the second audio signal, it is specifically configured to combine each of the second audio frames through overlap-add processing to obtain the second audio signal.
  • the processor is further configured to, before performing processing by the overlap-add method, eliminate the amplitude distortion at both ends of the first audio frame and the second audio frame, respectively.
  • when respectively eliminating the amplitude distortion at both ends of the first audio frame and the second audio frame, the processor is specifically configured to separately add a synthesis window to the first audio frame and the second audio frame.
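The analysis-window/synthesis-window pairing can be illustrated with a round-trip sketch: using a square-root periodic Hann window both before the FFT and before overlap-add, the overlap-added window products sum to one at a 50% hop, so frame-boundary amplitude distortion cancels away from the signal edges. The specific window and hop are illustrative assumptions:

```python
import numpy as np

def stft_roundtrip(x, N=512):
    """Frame, analysis-window, FFT, inverse FFT, synthesis-window, and
    overlap-add a signal; away from the edges the output equals the input."""
    hop = N // 2
    n = np.arange(N)
    # square root of the periodic Hann window, used for analysis and synthesis
    win = np.sqrt(0.5 * (1.0 - np.cos(2.0 * np.pi * n / N)))
    y = np.zeros(len(x))
    for start in range(0, len(x) - N + 1, hop):
        frame = x[start:start + N] * win       # analysis window -> periodic frame
        spec = np.fft.rfft(frame)              # frequency-domain components
        rec = np.fft.irfft(spec, n=N) * win    # synthesis window
        y[start:start + N] += rec              # overlap-add
    return y
```

Since win² is the periodic Hann window, win²(n) + win²(n + N/2) = 1 exactly, which is why the overlap-added frames reconstruct the interior of the signal without amplitude ripple.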
  • the sound source direction includes: a circumferential angle and/or a pitch angle.
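A circumferential (azimuth) angle plus a pitch (elevation) angle fully determines a direction; a hypothetical helper converting the pair to a unit vector (the axis convention, azimuth in the horizontal plane from the x-axis and pitch above it, is assumed):

```python
import math

def direction_vector(azimuth_deg, pitch_deg):
    """Convert a (circumferential angle, pitch angle) pair, in degrees,
    to a unit direction vector (x, y, z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(pitch_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))
```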
  • the processor is further configured to determine a sound source distance of each of the audio components, where the sound source distance is used to determine the difference information of the reception response of the audio component of the corresponding frequency.
  • the processor is further configured to obtain designated characteristic parameters of the user's ears and the area around the ears, where the designated characteristic parameters are used to determine the difference information of the reception response of the audio component of the corresponding frequency.
  • the specified characteristic parameter is obtained by performing image recognition on the user.
  • the designated characteristic parameters include one or more of the following: distance between ears, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  • FIG. 6 is a schematic structural diagram of an exemplary recording device provided by an embodiment of the present application.
  • the recording device includes: at least two microphones 601, a memory 602, and a processor 603;
  • the microphone 601 is used to collect audio signals
  • the memory 602 is used to store computer programs
  • the processor 603 is configured to call the computer program, and when the processor executes the computer program, the following steps are implemented:
  • the difference information of the sound reception response corresponding to each of the audio components is determined, where the difference information of the sound reception response includes: the amplitude response difference and/or phase response difference of the first sound receiving organ and the second sound receiving organ to audio components of the same frequency transmitted in the same direction;
  • according to the difference information of the sound reception response, the audio component of the corresponding frequency in the to-be-processed audio signal is modulated to generate a first audio signal to be played at the first sound receiving organ and a second audio signal to be played at the second sound receiving organ.
  • the difference information of the sound reception response includes a first transfer coefficient and a second transfer coefficient
  • the first transfer coefficient is determined according to a sound source direction, frequency, and a first correspondence relationship corresponding to the audio component.
  • the second transfer coefficient is determined according to the sound source direction and frequency corresponding to the audio component, and a second correspondence.
  • the difference between the first correspondence and the second correspondence is used to characterize the difference in sound reception response between the first sound receiving organ and the second sound receiving organ.
  • the first correspondence relationship is obtained by respectively measuring the sound reception response of the first sound receiving organ for test signals of each frequency emitted by sound sources in different directions;
  • the second correspondence relationship is obtained by measuring the sound reception response of the second sound receiving organ for test signals of various frequencies emitted by sound sources in different directions.
  • the first correspondence and the second correspondence are determined according to a propagation model from the sound source to the human ears, and the propagation model is established according to preselected characteristic parameters.
  • the preselected characteristic parameters include one or more of the following: distance between ears, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  • the sound source direction corresponding to the audio component is determined according to the audio component of the corresponding frequency in the audio signal collected by each microphone.
  • the sound source localization algorithm is any one of the following: a beamforming algorithm, an arrival time difference estimation algorithm, and a differential microphone array algorithm.
  • the audio signal to be processed is obtained based on the audio signal collected by the microphone.
  • the audio signal to be processed is an audio signal with the highest signal-to-noise ratio among the audio signals collected by each microphone.
  • the audio components of each frequency of the audio signal to be processed are linear combinations of audio components of corresponding frequencies in the audio signals collected by each microphone.
  • the to-be-processed audio signal is obtained based on audio signals collected by a microphone other than the aforementioned microphones.
  • the audio component of the corresponding frequency is modulated according to the first transfer coefficient to obtain a new first audio component of each frequency, and the first audio signal is obtained by transforming the first audio component of each frequency;
  • the audio component of the corresponding frequency is modulated according to the second transfer coefficient to obtain a new second audio component of each frequency, and the second audio signal is obtained by transforming the second audio component of each frequency.
  • the audio signal to be processed is divided into multiple audio frames, and the audio component is an audio component contained in the audio frame;
  • a new first audio frame is obtained by transforming the first audio component of each frequency, and the first audio signal is synthesized using each of the first audio frames;
  • a new second audio frame is obtained by transforming the second audio component of each frequency, and the second audio signal is synthesized by using each of the second audio frames.
  • the number of sampling points included in the audio frame is a power of two.
  • the audio components corresponding to each frequency contained in the audio frame are determined by the fast Fourier transform (FFT).
  • the processor is further configured to modulate the audio frame into a periodic signal before determining the audio components corresponding to each frequency contained in the audio frame.
  • when the processor modulates the audio frame into a periodic signal, it is specifically configured to add an analysis window to the audio frame.
  • when the processor uses each of the first audio frames to synthesize the first audio signal, it is specifically configured to combine each of the first audio frames through overlap-add processing to obtain the first audio signal;
  • when the processor uses each of the second audio frames to synthesize the second audio signal, it is specifically configured to combine each of the second audio frames through overlap-add processing to obtain the second audio signal.
  • the processor is further configured to, before performing processing by the overlap-add method, eliminate the amplitude distortion at both ends of the first audio frame and the second audio frame, respectively.
  • when respectively eliminating the amplitude distortion at both ends of the first audio frame and the second audio frame, the processor is specifically configured to separately add a synthesis window to the first audio frame and the second audio frame.
  • the sound source direction includes: a circumferential angle and/or a pitch angle.
  • the processor is further configured to determine a sound source distance of each of the audio components, where the sound source distance is used to determine the difference information of the reception response of the audio component of the corresponding frequency.
  • the processor is further configured to obtain designated characteristic parameters of the user's ears and the area around the ears, where the designated characteristic parameters are used to determine the difference information of the reception response of the audio component of the corresponding frequency.
  • the specified characteristic parameter is obtained by performing image recognition on the user.
  • the designated characteristic parameters include one or more of the following: distance between ears, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  • the recording device can be provided with a port for connecting with other external devices; by connecting to other devices through this port, audio signals can be obtained from those devices, and the to-be-processed audio signal can be determined based on the audio signals obtained from the other devices.
  • the recording device is an electronic device with a recording function, which can specifically be any of the following: a mobile phone, a camera, a video camera, a sports camera, a pan-tilt camera, a speaker, a VR headset, a monitor, a voice recorder, and a microphone.
  • the embodiments of the present application also provide a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the audio signal processing method under any of the implementation manners provided in the embodiments of the present application can be implemented.
  • the embodiments of the present application may adopt the form of a computer program product implemented on one or more storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing program codes.
  • Computer usable storage media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology.
  • the information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to: phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by computing devices.


Abstract

An audio signal processing method, an audio processing device, and a recording apparatus. The method comprises: acquiring an audio signal to be processed (S301), the audio signal comprising audio components of multiple frequencies; determining a corresponding sound source direction for each of the multiple audio components (S302); determining, according to the sound source directions, sound-reception response difference information corresponding to the respective audio components (S303), wherein the reception response difference information comprises an amplitude response difference and/or a phase response difference between a first sound reception device and a second sound reception device when responding to audio components of the same frequency transmitted from the same direction; and modulating, according to the sound-reception response difference information, the respective audio components of the corresponding frequencies in the audio signal, and generating a first audio signal to be played by the first sound reception device and a second audio signal to be played by the second sound reception device (S304). The audio signal processing method solves the technical problem in which audio recorded by existing recording apparatuses has poor stereo performance.

Description

Audio signal processing method, audio processing device, and recording apparatus

Technical field
This application relates to the field of signal processing technology, and in particular to an audio signal processing method, audio processing device, recording device, and computer-readable storage medium.
Background
The recording function is provided on many electronic products, such as voice recorders, cameras, and camcorders. In order to enable the recorded audio to restore the stereoscopic sense of the sound heard by human ears in reality, electronic devices with a recording function usually imitate human ears by placing microphones on the left and right sides for recording. As shown in FIG. 1, a schematic structural diagram of an existing digital camera, the digital camera is equipped with dual microphones.
However, even with left and right microphones set up for recording, the structural layout of the dual microphones still differs significantly from that of the human ears. Therefore, there remains a large gap in stereoscopic sense between the recorded audio and the sound actually heard by the human ears: when playing back the recorded audio, the user cannot accurately distinguish the direction of the sound, and the stereo perception is poor.
Summary of the invention
In order to solve the above-mentioned problems, the embodiments of the present application provide an audio signal processing method, an audio processing device, a recording device, and a computer-readable storage medium, to solve the technical problem of poor stereo perception in audio recorded by existing recording devices.
The first aspect of the embodiments of the present application provides an audio signal processing method, including:
acquiring an audio signal to be processed, where the audio signal to be processed includes audio components of multiple frequencies;
determining the sound source direction corresponding to each of the multiple audio components; determining, according to the sound source direction, the sound reception response difference information corresponding to each audio component, where the sound reception response difference information includes: the amplitude response difference and/or phase response difference of a first sound receiving organ and a second sound receiving organ to audio components of the same frequency transmitted in the same direction;
modulating, according to the sound reception response difference information, the audio components of the corresponding frequencies in the audio signal to be processed, to generate a first audio signal to be played at the first sound receiving organ and a second audio signal to be played at the second sound receiving organ.
A second aspect of the embodiments of the present application provides an audio processing device, including:
a memory, configured to store a computer program;
a processor, configured to call the computer program; when the computer program is executed by the processor, the following steps are implemented:
acquiring an audio signal to be processed, where the audio signal to be processed includes audio components of multiple frequencies;
determining the sound source direction corresponding to each of the multiple audio components; determining, according to the sound source direction, the sound reception response difference information corresponding to each audio component, where the sound reception response difference information includes: the amplitude response difference and/or phase response difference of a first sound receiving organ and a second sound receiving organ to audio components of the same frequency transmitted in the same direction;
modulating, according to the sound reception response difference information, the audio components of the corresponding frequencies in the audio signal to be processed, to generate a first audio signal to be played at the first sound receiving organ and a second audio signal to be played at the second sound receiving organ.
A third aspect of the embodiments of the present application provides a recording device, including: at least two microphones, a memory, and a processor;
the microphones are configured to collect audio signals;
the memory is configured to store a computer program;
the processor is configured to call the computer program; when the processor executes the computer program, the following steps are implemented:
acquiring an audio signal to be processed, where the audio signal to be processed includes audio components of multiple frequencies;
using the audio signals collected by the microphones, determining the sound source direction corresponding to each of the multiple audio components through a sound source localization algorithm; determining, according to the sound source direction, the sound reception response difference information corresponding to each audio component, where the sound reception response difference information includes: the amplitude response difference and/or phase response difference of a first sound receiving organ and a second sound receiving organ to audio components of the same frequency transmitted in the same direction;
modulating, according to the sound reception response difference information, the audio components of the corresponding frequencies in the audio signal to be processed, to generate a first audio signal to be played at the first sound receiving organ and a second audio signal to be played at the second sound receiving organ.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, any one of the audio signal processing methods provided in the first aspect above is implemented.
In the audio signal processing method provided by the embodiments of the present application, each audio component of the audio signal to be processed is modulated according to the sound reception response difference information corresponding to that audio component, where the sound reception response difference information represents the amplitude response difference and/or phase response difference of the first sound receiving organ and the second sound receiving organ to audio components of the same frequency transmitted in the same direction. Therefore, after modulation according to the sound reception response difference information, this amplitude response difference and/or phase response difference can also be reflected between the generated first audio signal and second audio signal; through this difference in response, a sufficient sense of space can be constructed, so that the human ear can clearly distinguish the direction of the sound.
Description of the drawings
In order to more clearly describe the technical solutions in the embodiments of the present application, the following briefly introduces the drawings needed in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative labor.
FIG. 1 is a schematic structural diagram of an existing digital camera.
FIG. 2 is a schematic diagram of a scene of recording sound through an artificial head.
FIG. 3 is a flowchart of an exemplary audio signal processing method provided by an embodiment of the present application.
FIG. 4 is an algorithm block diagram of an exemplary audio signal processing method provided by an embodiment of the present application.
FIG. 5 is a schematic structural diagram of an exemplary audio processing device provided by an embodiment of the present application.
FIG. 6 is a schematic structural diagram of an exemplary recording device provided by an embodiment of the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.
In order to record audio with a stereoscopic sense, existing electronic devices with a recording function are provided with left and right microphones that imitate human ears. However, because the structural layout of the dual microphones is still very different from that of the human ears, the audio recorded by the dual microphones does not have a sufficient stereoscopic sense and cannot accurately and clearly restore the actual direction of the sound.
为解决上述问题,一种可行的方案是,可以在真人、人工头或类人工头装置的双耳处设置麦克风进行录音,如此,由于麦克风所处的环境与人耳相同,因此能够录制到有真实立体感的立体声。可以参见图2,图2是通过人工头录制声音的场景示意图。To solve the above problems, one feasible solution is to place microphones at both ears of a real person, an artificial head, or an artificial-head-like device for recording. In this way, because the microphones are in the same acoustic environment as human ears, stereo sound with a realistic sense of space can be recorded. Refer to Fig. 2, which is a schematic diagram of a scene of recording sound through an artificial head.
上述的方案虽然可行,但其需要借助真人、人工头或类人工头装置,因此录音的简便程度大大下降,且如人工头等装置也存在便携性差、使用成本高的缺点。Although the above solution is feasible, it requires the help of a real person, an artificial head or a similar artificial head device, so the ease of recording is greatly reduced, and devices such as artificial heads also have the disadvantages of poor portability and high cost of use.
基于上述问题,本申请实施例提供了一种音频信号处理方法,使用该方法对录制的音频信号进行处理后,可以得到立体感增强的音频信号,并且该方法不需要真人、人工头等装置的配合,因此也没有不简便、便携性差、成本高的缺点。可以参见图3,图3是本申请实施例提供的一种示例性的音频信号处理方法的流程图。该方法包括以下步骤:Based on the above problems, the embodiments of the present application provide an audio signal processing method. After using this method to process the recorded audio signal, an audio signal with enhanced stereo perception can be obtained, and the method does not require the cooperation of real persons, artificial heads, etc. Therefore, there are no shortcomings of inconvenience, poor portability, and high cost. Refer to FIG. 3, which is a flowchart of an exemplary audio signal processing method provided by an embodiment of the present application. The method includes the following steps:
S301:获取待处理音频信号。S301: Acquire an audio signal to be processed.
其中,待处理音频信号包括多个频率的音频分量。Among them, the audio signal to be processed includes audio components of multiple frequencies.
S302:确定多个音频分量中每一个对应的声源方向。S302: Determine the sound source direction corresponding to each of the multiple audio components.
S303:根据确定出的声源方向,确定每一个音频分量对应的收音响应差异信息。S303: According to the determined sound source direction, determine the difference information of the reception response corresponding to each audio component.
其中,收音响应差异信息包括:第一收音器官和第二收音器官对同一方向传来的同一频率的音频分量的幅度响应差异和/或相位响应差异。Wherein, the difference in sound reception response information includes: difference in amplitude response and/or difference in phase response of the first sound receiving organ and the second sound receiving organ to audio components of the same frequency transmitted in the same direction.
S304:根据确定出的收音响应差异信息,对待处理音频信号中相应频率的音频分量进行调制,以生成用于在第一收音器官播放的第一音频信号,和用于在第二收音器官播放的第二音频信号。S304: According to the determined sound reception response difference information, modulate the audio components of the corresponding frequencies in the audio signal to be processed, so as to generate a first audio signal for playback at the first sound-receiving organ and a second audio signal for playback at the second sound-receiving organ.
需要说明的是,收音器官具体可以是人的耳朵,比如第一收音器官可以是左耳,第二收音器官可以是右耳。当然,收音器官也可以是人工耳、助听器等仿人耳的电子设备。而在具体场景中,收音器官可以对应播放设备的一个或多个声道。比如,若播放设备是耳机,第一收音器官可以对应耳机的左声道,第二收音器官可以对应耳机的右声道。若播放设备是音响套组,第一收音器官还可以对应音箱套组中的左声道音箱(若是四声道立体声,第一收音器官可以对应左前和左后音箱),第二收音器官可以对应音箱套组中的右声道音箱(若是四声道立体声,第二收音器官可以对应右前和右后音箱)。只要具有两个或以上声道的播放设备,都具有与本申请所述的第一收音器官与第二收音器官对应的播放声道。It should be noted that a sound-receiving organ may specifically be a human ear; for example, the first sound-receiving organ may be the left ear and the second sound-receiving organ may be the right ear. Of course, a sound-receiving organ may also be an electronic device imitating the human ear, such as an artificial ear or a hearing aid. In a specific scenario, a sound-receiving organ may correspond to one or more channels of a playback device. For example, if the playback device is an earphone, the first sound-receiving organ may correspond to the left channel of the earphone and the second sound-receiving organ to the right channel. If the playback device is a speaker set, the first sound-receiving organ may correspond to the left-channel speaker in the set (for four-channel stereo, the front-left and rear-left speakers), and the second sound-receiving organ may correspond to the right-channel speaker in the set (for four-channel stereo, the front-right and rear-right speakers). Any playback device having two or more channels thus has playback channels corresponding to the first sound-receiving organ and the second sound-receiving organ described in this application.
申请人发现,人耳在分辨声音的方向时,主要依靠双耳对声音的响应差异。当一个声音传播过来时,人的左右耳都能收听到该声音,而由于左右耳所处的位置不同,因此两者之间的响应也有不同。比如,若声源在左边,左耳会先于右耳收听到声音,并且左耳听到的声音幅度可能也大于右耳。通过左右耳之间的响应的不同,人可以分辨出声音的方向,相应的,若能够将这种响应差异体现在两个音频的差异上,并且将两个音频分别在人的左耳和右耳播放,则人在收听时,可以感知到清晰的立体感,能够准确分辨出其中声音的方向。The applicant found that in judging the direction of a sound, the human ear relies mainly on the difference between the two ears' responses to that sound. When a sound arrives, both the left and right ears hear it, but because the two ears are located at different positions, their responses differ. For example, if the sound source is on the left, the left ear hears the sound before the right ear, and the amplitude heard by the left ear may also be greater. Through this difference between the responses of the left and right ears, a person can judge the direction of the sound. Correspondingly, if this response difference can be embodied in the difference between two audio signals, and the two signals are played at the person's left ear and right ear respectively, the listener perceives a clear stereoscopic effect and can accurately judge the direction of the sounds in the audio.
申请人还发现,由于人双耳之间的响应差异可以分为幅值响应差异与相位响应差异,而对于不同频率的声音,人耳在分辨其方向时,作为主要依据的可能是不同的响应差异。比如,对于低频声音,由于其绕射能力强,低频声音容易绕过障碍物到达双耳,因此双耳之间的幅度差异较小,人耳在分辨低频声音的方向时主要依靠相位差异。而对于高频声音,由于其在双耳处产生的相位差已经混叠,因此难以通过相位差异分辨其方向,但高频声音的绕射能力弱,且对人的头部、肩膀等部分有特定的衍射效果,因此可以通过幅度差异来分辨其方向。可见,为增强音频信号的立体感,对不同频率的音频信号可以做不同处理,具体在本申请实施例中,可以针对音频信号不同频率的分量做不同的处理。The applicant also found that the binaural response difference can be divided into an amplitude response difference and a phase response difference, and that for sounds of different frequencies, the human ear may rely primarily on different kinds of response difference when judging direction. For example, low-frequency sound diffracts strongly and easily bends around obstacles to reach both ears, so the amplitude difference between the two ears is small, and the human ear mainly relies on the phase difference when judging the direction of low-frequency sound. For high-frequency sound, the phase difference produced at the two ears is already aliased, so its direction is hard to judge from phase; however, high-frequency sound diffracts weakly and interacts in a characteristic way with the head, shoulders and other parts of the body, so its direction can be judged from the amplitude difference. It follows that, to enhance the stereoscopic effect of an audio signal, audio signals of different frequencies can be processed differently; specifically, in the embodiments of the present application, different processing can be applied to the components of different frequencies of the audio signal.
在步骤S301中,获取待处理音频信号之后,可以分析该待处理音频信号的频率成分,确定出该待处理音频信号中包括的多个频率的音频分量,以便于后续针对不同频率的音频分量进行不同的处理。确定待处理音频信号的频率成分,有多种可行的方式。在一种实施中,可以对该待处理音频信号进行傅里叶变换,得到该待处理音频信号的频谱,确定出该待处理音频信号中的各频率的音频分量。除了使用傅里叶变换之外,还可以采用滤波法、子带分析法等,同样可以确定出该待处理音频信号中包括的多个频率的音频分量。In step S301, after the audio signal to be processed is acquired, its frequency composition can be analyzed to determine the audio components of the multiple frequencies it contains, so that different processing can subsequently be applied to the audio components of different frequencies. There are several feasible ways to determine the frequency composition of the audio signal to be processed. In one implementation, a Fourier transform can be applied to the audio signal to be processed to obtain its spectrum and thereby determine the audio component of each frequency. Besides the Fourier transform, filtering methods, subband analysis and the like can also be used to determine the audio components of the multiple frequencies included in the audio signal to be processed.
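As an illustration of the spectrum-analysis step described above, the following minimal Python sketch obtains the per-frequency audio components of a signal with a Fourier transform. The sampling rate and the two test tones are invented for the example and are not part of the patent:

```python
import numpy as np

fs = 8000                       # sampling rate (assumed for illustration)
t = np.arange(fs) / fs          # 1 second of samples
# toy "audio signal to be processed": a 440 Hz tone mixed with a 1000 Hz tone
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

X = np.fft.rfft(x)                           # one-sided spectrum
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)  # frequency of each bin

# the two strongest frequency components recovered from the spectrum
top2 = sorted(freqs[np.argsort(np.abs(X))[-2:]])
print(top2)
```

With tones that fall exactly on FFT bins, as here, the two mixed frequencies are recovered directly from the spectrum magnitudes.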
在步骤S302中,可以针对待处理音频信号的各频率的音频分量,确定其对应的声源方向。在确定声源方向时,可以利用至少两个麦克风(如麦克风阵列)采集的音频信号,通过声源定位算法确定(为方便指代,下面将用于确定声源方向的麦克风称为定向麦克风)。具体的,可以根据各个定向麦克风采集的音频信号中相应频率的音频分量进行确定。可选的声源定位算法有多种,比如有波束形成算法、到达时间差估计算法、差分麦克风阵列算法。In step S302, the corresponding sound source direction can be determined for the audio component of each frequency of the audio signal to be processed. The sound source direction can be determined from the audio signals collected by at least two microphones (such as a microphone array) using a sound source localization algorithm (for ease of reference, the microphones used to determine the sound source direction are referred to below as directional microphones). Specifically, the determination can be made from the audio components of the corresponding frequency in the audio signals collected by each directional microphone. Various sound source localization algorithms are available, such as beamforming algorithms, time-difference-of-arrival estimation algorithms, and differential microphone array algorithms.
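One of the localization approaches named above, time-difference-of-arrival estimation, can be sketched for a two-microphone case as follows. The sampling rate, microphone spacing, and sample delay are invented for illustration; a broadband signal is used, whereas the method in the text would apply such a direction estimate per frequency component:

```python
import numpy as np

fs = 8000            # sampling rate (assumed)
c = 343.0            # speed of sound, m/s
d = 0.2              # microphone spacing, m (assumed geometry)

rng = np.random.default_rng(0)
s = rng.standard_normal(4096)            # broadband source signal

delay = 3                                # samples: the source is closer to mic 1
x1 = s
x2 = np.concatenate([np.zeros(delay), s[:-delay]])

# TDOA estimate: lag of the cross-correlation peak between the two mics
corr = np.correlate(x2, x1, mode="full")
lag = int(np.argmax(corr)) - (len(x1) - 1)    # positive: x2 lags x1

tau = lag / fs                                # time difference in seconds
# far-field model: tau = d * cos(theta) / c  ->  direction angle
theta = np.degrees(np.arccos(np.clip(tau * c / d, -1.0, 1.0)))
print(lag, round(theta, 1))
```

The cross-correlation peak recovers the 3-sample delay, from which the far-field bearing follows; with these assumed numbers the angle comes out near 50 degrees.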
在步骤S303中,可以根据确定出的音频分量的声源方向,确定该音频分量对应的收音响应差异信息,而确定出的收音响应差异信息可以用于在步骤S304中对相应频率的音频分量进行调制。In step S303, the sound reception response difference information corresponding to an audio component can be determined from that component's determined sound source direction, and the determined difference information can then be used in step S304 to modulate the audio component of the corresponding frequency.
收音响应差异信息是用于表征第一收音器官和第二收音器官对同一方向传来的同一频率的音频分量的幅度响应差异和/或相位响应差异的信息,其可以有多种表现形式,比如,在一种实施中,收音响应差异信息可以包括第一传递系数与第二传递系数。The sound reception response difference information characterizes the difference in amplitude response and/or phase response of the first sound-receiving organ and the second sound-receiving organ to audio components of the same frequency arriving from the same direction. It can take various forms; for example, in one implementation, it may include a first transfer coefficient and a second transfer coefficient.
其中,第一传递系数可以根据音频分量对应的声源方向、频率与第一对应关系确定,具体的,可以将确定的声源方向与相应的频率代入第一对应关系,计算得到第一传递系数。第二传递系数可以根据音频分量对应的声源方向、频率与第二对应关系确定,具体的,可以将确定的声源方向与相应的频率代入第二对应关系,计算得到第二传递系数。The first transfer coefficient may be determined from the sound source direction and frequency corresponding to the audio component together with a first correspondence; specifically, the determined sound source direction and the corresponding frequency may be substituted into the first correspondence to calculate the first transfer coefficient. Likewise, the second transfer coefficient may be determined from the sound source direction and frequency corresponding to the audio component together with a second correspondence; specifically, the determined sound source direction and the corresponding frequency may be substituted into the second correspondence to calculate the second transfer coefficient.
进一步的,在步骤S304中,可以通过第一传递系数对待处理音频信号中相应频率的音频分量进行调制,以生成用于在第一收音器官播放的第一音频信号,通过第二传递系数对待处理音频信号中相应频率的音频分量进行调制,以生成用于在第二收音器官播放的第二音频信号。Further, in step S304, the audio component of the corresponding frequency in the audio signal to be processed may be modulated by the first transfer coefficient to generate a first audio signal for playback at the first sound-receiving organ, and modulated by the second transfer coefficient to generate a second audio signal for playback at the second sound-receiving organ.
需要说明的是,上述的第一对应关系与第二对应关系之间的差别可以体现第一收音器官和第二收音器官对同一方向传来的同一频率的音频分量的幅度响应差异和/或相位响应差异。如此,基于第一对应关系确定的第一传递系数与基于第二对应关系确定的第二传递系数,两传递系数之间的差异也可以体现出上述的第一收音器官与第二收音器官之间的响应差异。进而,基于第一传递系数生成的第一音频信号与基于第二传递系数生成的第二音频信号,两音频信号之间的差异也可以体现这种第一收音器官与第二收音器官之间的响应差异,则当第一音频信号被第一收音器官收听,第二音频信号被第二收音器官收听时,可以产生清晰的立体感。It should be noted that the difference between the above first correspondence and second correspondence can reflect the difference in amplitude response and/or phase response of the first sound-receiving organ and the second sound-receiving organ to audio components of the same frequency arriving from the same direction. Accordingly, the difference between the first transfer coefficient determined from the first correspondence and the second transfer coefficient determined from the second correspondence can likewise reflect this binaural response difference. In turn, the difference between the first audio signal generated from the first transfer coefficient and the second audio signal generated from the second transfer coefficient can also embody this response difference, so that when the first audio signal is heard by the first sound-receiving organ and the second audio signal by the second sound-receiving organ, a clear stereoscopic effect is produced.
关于第一对应关系与第二对应关系,具体的,第一对应关系可以是声源方向、频率与第一传递系数的对应关系,第二对应关系可以是声源方向、频率与第二传递系数的对应关系。比如,在一种实施中,若第一对应关系与第二对应关系均用函数的形式表示,则第一对应关系与第二对应关系均可以是 T(φ, θ, k) 的形式,其中 φ 与 θ 属于表征声源方向的参数,φ 可以表示俯仰角,θ 可以表示周角,而 k 可以对应频谱中的频率序号,即 k 实质上对应不同的频率。在确定某一频率的音频分量的传递系数时,输入该音频分量的声源方向 φ 与 θ 以及对应的频率序号 k,即可确定该音频分量对应的传递系数。Regarding the first correspondence and the second correspondence: specifically, the first correspondence may relate the sound source direction and frequency to the first transfer coefficient, and the second correspondence may relate the sound source direction and frequency to the second transfer coefficient. For example, in one implementation, if both correspondences are expressed as functions, each may take the form T(φ, θ, k), where φ and θ are parameters characterizing the sound source direction (φ may denote the pitch angle and θ the azimuth angle), and k corresponds to the frequency index in the spectrum, i.e., k essentially corresponds to a particular frequency. When determining the transfer coefficient of an audio component of a certain frequency, inputting that component's sound source direction φ and θ together with the corresponding frequency index k yields the transfer coefficient for that component.
与双耳的响应差异包括相位差异与幅值差异相对应,第一对应关系与第二对应关系也可以包括幅值与相位两部分。在一个实施例中,若将第一对应关系对应左耳,且认为左耳的响应是标准响应,则某一频率的音频分量的第一传递系数可以是 T_L = 1,而其对应右耳的第二传递系数,在一个例子中,可以是 T_R = α·e^{jΔφ},其中,α 用于表示右耳相对于左耳的幅度增益,Δφ 用于表示双耳之间的相位差。Corresponding to the binaural response difference comprising a phase difference and an amplitude difference, the first correspondence and the second correspondence may likewise comprise an amplitude part and a phase part. In one embodiment, if the first correspondence corresponds to the left ear and the left ear's response is taken as the reference, the first transfer coefficient of an audio component of a certain frequency may be T_L = 1, and the corresponding second transfer coefficient for the right ear may, in one example, be T_R = α·e^{jΔφ}, where α denotes the amplitude gain of the right ear relative to the left ear and Δφ denotes the phase difference between the two ears.
对于第一对应关系与第二对应关系的确定,有多种可选的实施方式。在一种实施中,可以针对不同方向的声源发出的各频率的测试信号分别测量收音器官的收音响应得到。比如,在一个具体的例子中,可以在人工头或真人的两个耳道处设置麦克风,在各个不同的方向分别播放各个频率的测试信号,针对每个方向、每个频率的测试信号,分别记录两个麦克风采集的响应信号。进一步的,可以将响应信号与测试信号的比值作为因变量,将声源方向和频率作为自变量,拟合出声源方向、频率与传递系数的对应关系。可以将声源方向、频率与位于第一收音器官的麦克风采集的响应信号的对应关系作为第一对应关系,将声源方向、频率与位于第二收音器官的麦克风采集的响应信号的对应关系作为第二对应关系。There are several optional ways to determine the first correspondence and the second correspondence. In one implementation, they can be obtained by measuring the reception responses of the sound-receiving organs to test signals of each frequency emitted by sound sources in different directions. For example, in a specific example, microphones can be placed in the two ear canals of an artificial head or a real person, and test signals of each frequency can be played from each of several directions; for the test signal of each direction and each frequency, the response signals collected by the two microphones are recorded separately. Further, taking the ratio of the response signal to the test signal as the dependent variable and the sound source direction and frequency as the independent variables, the correspondence between sound source direction, frequency and transfer coefficient can be fitted. The correspondence between sound source direction, frequency and the response signal collected by the microphone at the first sound-receiving organ can be taken as the first correspondence, and the correspondence between sound source direction, frequency and the response signal collected by the microphone at the second sound-receiving organ as the second correspondence.
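The measurement procedure above can be sketched numerically. Here the two ear responses are simulated with an invented gain `alpha` and phase shift `dphi` (in a real measurement they would come from microphones in the ear canals); the transfer coefficient at a bin is then the ratio of the response spectrum to the test-signal spectrum:

```python
import numpy as np

fs, N = 8000, 1024
k = 100                              # frequency bin of the test tone
f = k * fs / N                       # tone frequency, exactly on bin k
n = np.arange(N)
test = np.sin(2 * np.pi * f * n / fs)        # test signal from one direction

# hypothetical measured responses for this direction and frequency:
# left ear taken as the reference; right ear attenuated and phase-shifted
alpha, dphi = 0.8, 0.3
left = test.copy()
right = alpha * np.sin(2 * np.pi * f * n / fs - dphi)

# transfer coefficient at bin k = response spectrum / test-signal spectrum
T_L = np.fft.rfft(left)[k] / np.fft.rfft(test)[k]
T_R = np.fft.rfft(right)[k] / np.fft.rfft(test)[k]

print(abs(T_L), abs(T_R), np.angle(T_R))
```

The fitted magnitudes and phases recover the simulated gain and phase shift, which is exactly the amplitude/phase split of the transfer coefficients described in the preceding paragraphs.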
在另一种实施中,也可以通过声源到人双耳的传播模型确定对应关系。比如,可以预先选出被认为会对声音从声源到人双耳的传播过程造成影响的特征参数,这些特征参数包括但不限于以下:双耳间距、耳廓特征参数、耳道特征参数、肩膀特征参数、脸颊特征参数、头发特征参数。根据预选的特征参数,结合声学传播原理,可以推导出第一对应关系与第二对应关系。In another implementation, the correspondences can also be determined from a propagation model from the sound source to the two human ears. For example, characteristic parameters considered to affect the propagation of sound from the source to the ears can be selected in advance; these include, but are not limited to, the inter-ear distance, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters. From the preselected characteristic parameters, combined with the principles of acoustic propagation, the first correspondence and the second correspondence can be derived.
在通过传递系数对相应频率的音频分量进行调制时,具体的,在一种可选的实施方式中,可以将第一传递系数与相应频率的音频分量相乘,得到该相应频率的新的音频分量,该新的音频分量可以称为第一音频分量。相应的,可以通过第二传递系数对待处理音频信号中相应频率的音频分量进行调制,也可以得到相应频率的新的音频分量,该通过第二传递系数得到的新的音频分量可以称为第二音频分量。其中,得到的各个第一音频分量可以用于后续生成第一音频信号,得到的各个第二音频分量可以用于后续生成第二音频信号,具体的生成过程在后文中有详细说明。When modulating the audio component of the corresponding frequency by a transfer coefficient, specifically, in an optional implementation, the first transfer coefficient can be multiplied by the audio component of the corresponding frequency to obtain a new audio component at that frequency, which may be called the first audio component. Correspondingly, the audio component of the corresponding frequency in the audio signal to be processed can be modulated by the second transfer coefficient to obtain another new audio component at that frequency, which may be called the second audio component. The obtained first audio components can be used subsequently to generate the first audio signal, and the obtained second audio components to generate the second audio signal; the specific generation process is described in detail below.
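A minimal frequency-domain sketch of this per-bin multiplication follows. The coefficient values are invented for illustration; in the method they would come from the first and second correspondences for the determined source direction:

```python
import numpy as np

N = 512
x = np.random.default_rng(1).standard_normal(N)   # one frame of the signal
X = np.fft.rfft(x)
K = len(X)

# hypothetical per-bin transfer coefficients: left ear as reference (T_L = 1),
# right ear attenuated and phase-shifted, the shift varying with frequency
T_L = np.ones(K)
T_R = 0.7 * np.exp(-1j * np.linspace(0.0, np.pi / 4, K))

first_components = T_L * X    # "first audio components" of each frequency
second_components = T_R * X   # "second audio components" of each frequency

# back to the time domain (the inverse-transform step described later)
first = np.fft.irfft(first_components, n=N)
second = np.fft.irfft(second_components, n=N)
```

With T_L = 1 the first signal reproduces the input frame exactly, while the second carries the attenuation and phase shift that encode the source direction.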
需要说明的是,上述在通过传递系数对音频分量进行调制时,由于是在频域上进行处理,因此可以分别用第一传递系数与第二传递系数与待处理音频信号中相应频率的音频分量相乘。但在另一种实施方式中,也可以在时域上进行调制,比如,可以将第一传递系数与第二传递系数分别转换为时域的第一传递系数和时域的第二传递系数,再进一步通过时域的第一传递系数与待处理音频信号进行卷积计算,通过时域的第二传递系数与待处理音频信号进行卷积计算,生成需要的第一音频信号与第二音频信号。It should be noted that, when modulating the audio components by transfer coefficients as described above, because the processing is performed in the frequency domain, the first transfer coefficient and the second transfer coefficient can each be multiplied by the audio component of the corresponding frequency in the audio signal to be processed. In another implementation, however, the modulation can also be performed in the time domain: for example, the first and second transfer coefficients can be converted into a time-domain first transfer coefficient and a time-domain second transfer coefficient, and the time-domain first transfer coefficient can then be convolved with the audio signal to be processed, and the time-domain second transfer coefficient convolved with the audio signal to be processed, to generate the required first audio signal and second audio signal.
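The equivalence between the two routes can be checked numerically. The sketch below uses one invented set of per-bin coefficients (a gain with a five-sample delay) and circular convolution over a single frame; a practical implementation would zero-pad so that the convolution is linear:

```python
import numpy as np

N = 256
x = np.random.default_rng(2).standard_normal(N)   # frame to be processed

# hypothetical per-bin transfer coefficients: gain 0.9 plus a 5-sample delay
K = N // 2 + 1
T = 0.9 * np.exp(-2j * np.pi * np.arange(K) * 5 / N)

# frequency-domain route: multiply per bin, then inverse-transform
y_freq = np.fft.irfft(T * np.fft.rfft(x), n=N)

# time-domain route: impulse response of T, then circular convolution
h = np.fft.irfft(T, n=N)
y_time = np.zeros(N)
for i in range(N):
    for m in range(N):
        y_time[i] += h[m] * x[(i - m) % N]
```

Both routes produce the same output: 0.9 times the frame, circularly delayed by five samples.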
需要说明的是,本申请实施例提供的音频信号处理方法,可以应用在录音场景下,并且,在该录音场景下,需要至少两个麦克风同时进行录制,比如,可以通过包括多个麦克风的麦克风阵列进行录制。可以理解,尽管各个麦克风录制的音频信号对应的内容相同(即实际中来自相同的声源),但由于各个麦克风所处的位置不同,因此所录制的音频信号也有所不同。It should be noted that the audio signal processing method provided by the embodiments of the present application can be applied in a recording scene, in which at least two microphones record simultaneously; for example, recording can be performed with a microphone array comprising multiple microphones. It can be understood that although the audio signals recorded by the microphones correspond to the same content (that is, they actually come from the same sound sources), the recorded audio signals differ because the microphones are located at different positions.
待处理音频信号作为传递系数的作用对象,在选择时可以相对灵活。比如,在一种实施中,待处理音频信号可以是根据定向麦克风(即上述的用于确定声源方向的麦克风)采集的音频信号确定的。而在另一种实施中,待处理音频信号可以是根据定向麦克风以外的其他麦克风采集的音频信号确定的。比如,在一种具体例子中,麦克风阵列可以包括6个麦克风,可以选择其中的3个麦克风作为定向麦克风,而待处理音频信号可以是根据另外3个麦克风采集的音频信号确定的。又比如,在另一个例子中,待处理音频信号还可以是根据麦克风阵列外的其它麦克风采集的音频信号确定的,其他的麦克风还可以是其他设备上的麦克风。The audio signal to be processed, i.e., the signal on which the transfer coefficients act, can be chosen relatively flexibly. For example, in one implementation, it may be determined from the audio signals collected by the directional microphones (the microphones used above to determine the sound source direction). In another implementation, it may be determined from audio signals collected by microphones other than the directional microphones. In a specific example, the microphone array may include six microphones, three of which are selected as directional microphones, while the audio signal to be processed is determined from the audio signals collected by the other three. In yet another example, the audio signal to be processed may be determined from audio signals collected by microphones outside the microphone array, which may be microphones on other devices.
而为了能够得到质量较高的第一音频信号与第二音频信号,待处理音频信号可以选择所录制的音频信号中信噪比较高的。一种可选的实施方式是,待处理音频信号可以选择麦克风中采集的音频信号中信噪比最高的音频信号。由于传递系数的直接作用对象是相应频率的音频分量,因此,在另一种可选的实施方式中,可以将各个定向麦克风采集的音频信号中相同频率的音频分量进行线性组合,从而得到信噪比较高的各频率的音频分量,用于与第一传递系数和第二传递系数结合。In order to obtain a first audio signal and a second audio signal of higher quality, the audio signal to be processed can be chosen as one of the recorded audio signals with a relatively high signal-to-noise ratio. In one optional implementation, the audio signal to be processed may be the audio signal with the highest signal-to-noise ratio among those collected by the microphones. Since the transfer coefficients act directly on the audio components of the corresponding frequencies, in another optional implementation the audio components of the same frequency in the audio signals collected by the directional microphones can be linearly combined, so as to obtain audio components of each frequency with a higher signal-to-noise ratio for combination with the first transfer coefficient and the second transfer coefficient.
在通过第一传递系数进行调制得到第一音频分量、通过第二传递系数进行调制得到第二音频分量后,为了进一步生成时域上的第一音频信号和第二音频信号,需要进行频域到时域的变换。具体的,可以利用各频率的第一音频分量进行频域到时域的变换,得到第一音频信号,利用各频率的第二音频分量进行频域到时域的变换,得到第二音频信号。而从频域到时域的变换,具体实施时,可以使用傅里叶逆变换。After the first audio components are obtained by modulation with the first transfer coefficients and the second audio components by modulation with the second transfer coefficients, a frequency-domain to time-domain transform is required in order to generate the first audio signal and the second audio signal in the time domain. Specifically, the first audio components of the various frequencies can be transformed from the frequency domain to the time domain to obtain the first audio signal, and the second audio components of the various frequencies can be transformed likewise to obtain the second audio signal. In a specific implementation, the transform from the frequency domain to the time domain can be an inverse Fourier transform.
考虑到获取的待处理音频信号并不一定是平稳的信号,而在后续分析其频谱时,尤其是通过傅里叶变换分析频谱时,由于傅里叶变换要求输入信号是平稳的,因此,在一种实施方式中,可以按照设定的帧长,对待处理音频信号进行分帧处理,将待处理音频信号分为一个个音频帧。对于每一个音频帧而言,由于其帧长较短,因此可以认为音频帧内信号是平稳的。Considering that the acquired audio signal to be processed is not necessarily a stationary signal, and that the subsequent spectrum analysis, in particular Fourier-transform-based spectrum analysis, requires the input signal to be stationary, in one implementation the audio signal to be processed can be divided into audio frames according to a set frame length. Because each audio frame is short, the signal within a frame can be regarded as stationary.
在待处理音频信号被分帧处理成音频帧后,相应的,后续的处理也将针对音频帧进行,比如可以对音频帧进行频谱分析,得到各频率的音频分量;利用各频率的第一音频分量进行变换得到的是新的第一音频帧;利用各频率的第二音频分量进行变换得到的是新的第二音频帧。进一步的,将得到的各个新的第一音频帧进行合成,可以得到第一音频信号,将得到的各个新的第二音频帧进行合成,可以得到第二音频信号。After the audio signal to be processed has been divided into audio frames, the subsequent processing is accordingly performed on the audio frames: for example, spectrum analysis can be performed on each audio frame to obtain the audio components of the various frequencies; transforming the first audio components of the various frequencies yields a new first audio frame, and transforming the second audio components yields a new second audio frame. Further, synthesizing the new first audio frames yields the first audio signal, and synthesizing the new second audio frames yields the second audio signal.
在设定音频帧的帧长时,通常可以根据经验较为灵活的设定。比如,若麦克风的采样频率是fs,一个音频帧中包括的采样点的个数可以用N表示,帧长N可以在0.005fs<N<fs的范围中选取。在一种可选的实施方式中,可以设定的N是2的幂次方,即使得音频帧中包含的采样点的个数是2的幂次方,如此,在后续的频谱分析时,可以采用快速傅里叶变换FFT进行加速计算。The frame length of an audio frame can usually be set fairly flexibly based on experience. For example, if the sampling frequency of the microphone is fs and the number of sampling points in an audio frame is denoted N, the frame length N can be selected in the range 0.005fs < N < fs. In an optional implementation, N can be set to a power of two, i.e., such that the number of sampling points in an audio frame is a power of two; in this way, the fast Fourier transform (FFT) can be used in the subsequent spectrum analysis to speed up the computation.
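The framing step can be sketched as follows. The sampling rate, the power-of-two frame length and the 50 % overlap are example choices consistent with the constraints stated above:

```python
import numpy as np

fs = 16000
x = np.arange(2 * fs)                 # a 2-second toy "signal" of sample indices

N = 1024                              # power-of-two frame length
assert 0.005 * fs < N < fs            # the range suggested in the text

hop = N // 2                          # 50% overlap between neighbouring frames
n_frames = 1 + (len(x) - N) // hop
frames = np.stack([x[i * hop : i * hop + N] for i in range(n_frames)])
print(frames.shape)
```

Each row of `frames` is one audio frame; consecutive rows share N/2 samples, which is what the later overlap-add synthesis relies on.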
考虑到在对音频信号进行分帧(分帧操作也可称为信号截断)时,往往无法做到周期性截断,即截断后的音频帧往往是非周期性信号,此时若直接进行傅里叶变换将出现频谱泄漏的现象。因此,在一种实施中,可以在进行傅里叶变换之前,将音频帧调制为周期性信号。而调制为周期性信号的具体做法,可以是对音频帧加分析窗,即将音频帧与分析窗的窗函数相乘。该分析窗的窗函数可以是正弦窗、汉宁窗等,在此不做具体限定。Considering that when an audio signal is divided into frames (the framing operation may also be called signal truncation), periodic truncation is usually not achievable, that is, the truncated audio frames are usually non-periodic signals, applying the Fourier transform directly in this case would cause spectral leakage. Therefore, in one implementation, the audio frame can be shaped toward a periodic signal before the Fourier transform is performed. A specific way to do this is to apply an analysis window to the audio frame, i.e., to multiply the audio frame by the window function of the analysis window. The window function of the analysis window may be a sine window, a Hann window, or the like, which is not specifically limited here.
而在将变换得到的新的音频帧合成新的音频信号时,由于相邻音频帧之间有重叠的部分,因此,可以通过重叠相加法Overlap-add对各个新的音频帧进行处理,具体的,即可以在前后音频帧重叠的位置进行信号叠加,叠加后的各个音频帧可以直接组合,从而得到需要的音频信号。进一步的,考虑到直接对音频帧进行重叠累加,重叠的部分可能有幅值突变,为使重叠相加后得到的音频信号是平滑的,在对各个新的音频帧进行Overlap-add处理之前,可以先对音频帧两端的幅值进行畸变消除。在具体实施时,可以对变换得到的新的音频帧进行合成窗的加窗处理,利用加合成窗后的音频帧进行重叠累加。合成窗的窗函数也有多种选择,比如是正弦窗或者汉宁窗等。When the new audio frames obtained by the transform are synthesized into a new audio signal, because adjacent audio frames overlap, each new audio frame can be processed by the overlap-add method: specifically, the signals are summed at the positions where consecutive audio frames overlap, and the resulting frames can then be joined directly to obtain the required audio signal. Further, considering that directly overlap-adding the audio frames may produce amplitude discontinuities in the overlapping parts, in order that the overlap-added audio signal be smooth, the amplitude distortion at the two ends of each audio frame can be suppressed before the overlap-add processing. In a specific implementation, a synthesis window can be applied to each new audio frame obtained by the transform, and the windowed frames are then overlap-added. The window function of the synthesis window can also be chosen from several options, such as a sine window or a Hann window.
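A numerical sketch of the analysis-window, synthesis-window and overlap-add chain described above. The square root of a periodic Hann window is used for both windows; this is one common choice that satisfies the overlap-add condition at 50 % overlap, whereas the text itself only requires a sine window, a Hann window, or the like:

```python
import numpy as np

N, hop = 512, 256                         # frame length, 50% overlap
win = np.sin(np.pi * np.arange(N) / N)    # sqrt of a periodic Hann window

x = np.random.default_rng(3).standard_normal(hop * 40)

# analysis: cut overlapping frames and apply the analysis window
n_frames = (len(x) - N) // hop + 1
frames = [win * x[i * hop : i * hop + N] for i in range(n_frames)]
# (the per-frame spectral processing would happen here)

# synthesis: apply the synthesis window, then overlap-add
y = np.zeros(len(x))
for i, frame in enumerate(frames):
    y[i * hop : i * hop + N] += win * frame

err = np.max(np.abs(y[N:-N] - x[N:-N]))   # interior samples reconstruct exactly
print(err)
```

Because the squared window and its half-frame shift sum to one, the interior of the signal is reconstructed without the amplitude discontinuities the paragraph warns about; only the first and last half-frames lack full window coverage.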
下面提供一个相对详尽的实施例,可以参见图4,图4是本申请实施例提供的一种示例性的音频信号处理方法的算法框图。A relatively detailed embodiment is provided below, which can be referred to FIG. 4, which is an algorithm block diagram of an exemplary audio signal processing method provided in an embodiment of the present application.
在录音场景下,通过麦克风阵列可以采集到多个音频信号。比如麦克风阵列中包含M个麦克风,M≥2,则第m个麦克风采集到的时域的音频信号可以用xm(t)表示,其中m为麦克风序号,m=1,2,…,M,t为采样离散时间序列,t=1,2,…。In the recording scene, multiple audio signals can be collected through the microphone array. For example, the microphone array contains M microphones, and M≥2, the time domain audio signal collected by the m-th microphone can be represented by xm(t), where m is the microphone number, m=1, 2,...,M, t is the sampled discrete time sequence, t=1, 2,....
可以将麦克风阵列中的各个麦克风都作为定向麦克风。分别对各个麦克风采集的时域音频信号xm(t)进行分帧,分帧时,可以提取N个采样点作为一个音频帧,得到时域音频帧xm(n)l,其中,n是一个音频帧内的时间序列,n=1,2,…,N;l是帧序列,l=1,2,…。对时域音频帧xm(n)l进行分析窗的加窗处理,得到x'm(n)l。将加分析窗后的x'm(n)l输入FFT模块,得到时域音频帧的频谱Xm(k)l,其中,k表示离散频谱序列,k=1,2,…,N。Each microphone in the microphone array can serve as a directional microphone. The time-domain audio signal xm(t) collected by each microphone is divided into frames; when framing, N sampling points can be taken as one audio frame, giving the time-domain audio frame xm(n)l, where n is the time index within a frame, n = 1, 2, …, N, and l is the frame index, l = 1, 2, …. An analysis window is applied to the time-domain audio frame xm(n)l to obtain x'm(n)l. The windowed x'm(n)l is input into the FFT module to obtain the spectrum Xm(k)l of the audio frame, where k denotes the discrete frequency index, k = 1, 2, …, N.
The spectrum Xm(k)l corresponding to each microphone is input into the sound source localization module, in which a sound source localization algorithm based on the microphone array determines the sound source direction corresponding to the audio component of each frequency. Specifically, in this embodiment a beamforming algorithm can be used, and the sound source direction includes an azimuth angle θ and a pitch angle φ. In this embodiment, the sound sources can be assumed to come from all around in the horizontal plane, without distinguishing up from down, i.e. the pitch angle can be taken as φ = 0. The most basic beamforming algorithm can be expressed as:

B(θi, k) = w(θi, k)^H · X(k)l, where X(k)l = [X1(k)l, X2(k)l, …, XM(k)l]^T is the vector of microphone spectra at discrete frequency k.
Here θi denotes the sound source direction, and w(θi, k) is the weight vector, also called the steering vector, for a sound source in direction θi at discrete frequency k. In the classic beamforming algorithm, the steering vector can be expressed as

w(θi, k) = [e^(-jωΔτ1), e^(-jωΔτ2), …, e^(-jωΔτM)]^T

where ω is the sound source angular frequency corresponding to the discrete frequency k, and Δτm is the difference between the time at which the sound from direction θi reaches the m-th microphone and the time at which it reaches the reference point. The sound source direction corresponding to the audio component at the k-th discrete frequency is therefore θ(k), i.e. the direction in which the beamforming output is largest at that frequency:
θ(k) = arg max over θi of |w(θi, k)^H · X(k)l|
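The per-bin direction search above can be sketched for a hypothetical two-microphone array. This is an illustrative sketch, not the claimed implementation: the microphone spacing, sampling rate, FFT length, and search grid are all assumed values, and the "observation" is generated noise-free from the same steering-vector model.

```python
import numpy as np

c = 343.0        # speed of sound in air, m/s
fs = 8000        # sampling rate, Hz (assumed)
d = 0.10         # microphone spacing, m (hypothetical 2-mic array)
N = 256          # FFT length

def steering_vector(theta, k):
    """w(theta, k) for a far-field source at azimuth theta: per-microphone
    phase delays e^(-j*omega*dtau_m), with mic 1 as the reference point."""
    omega = 2.0 * np.pi * k * fs / N
    dtau = np.array([0.0, d * np.cos(theta) / c])  # arrival-time differences
    return np.exp(-1j * omega * dtau)

def doa_per_bin(Xk, k, grid):
    """theta(k): the grid direction maximizing |w(theta, k)^H X(k)|."""
    power = [abs(np.vdot(steering_vector(th, k), Xk)) for th in grid]
    return grid[int(np.argmax(power))]

# Simulate bin k = 32 for a source at 60 degrees: the noise-free mic
# spectra differ only by the propagation phases of the model itself.
k = 32
theta_true = np.deg2rad(60.0)
Xk = steering_vector(theta_true, k)
grid = np.deg2rad(np.arange(0.0, 181.0, 1.0))
theta_hat = doa_per_bin(Xk, k, grid)
print(round(float(np.rad2deg(theta_hat))))   # 60
```

At this bin frequency (1 kHz) the maximum inter-microphone phase shift stays below π, so the argmax over the grid is unambiguous; at higher frequencies or wider spacings, spatial aliasing would need to be handled.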
The determined sound source direction θ(k) and pitch angle φ(k), together with the discrete frequency k, are input into the transfer coefficient determination module. There, these inputs are substituted into the first correspondence and the second correspondence respectively, yielding the first transfer coefficient HL(θ(k), φ(k), k) and the second transfer coefficient HR(θ(k), φ(k), k) (in this embodiment, the first correspondence corresponds to the left channel and the second correspondence to the right channel). In the stereo reconstruction module, the first transfer coefficient HL(θ(k), φ(k), k) is multiplied by the audio component Xref(k)l of the corresponding frequency in the audio signal to be processed, giving the new first audio component XL(k)l of each frequency; the second transfer coefficient HR(θ(k), φ(k), k) is multiplied by the audio component Xref(k)l of the corresponding frequency, giving the new second audio component XR(k)l of each frequency.
In this embodiment, the audio signal to be processed is a linear combination of the audio components of the same frequency in the audio signals collected by the microphones, which can be expressed by the following formula:

Xref(k)l = Σ (m = 1 to M) wm · Xm(k)l

where wm denotes the weight of the m-th microphone, which can be a real number or a complex number.
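A minimal numeric sketch of the linear combination and the per-frequency modulation follows. The weights and the stand-in transfer coefficients HL and HR are hypothetical placeholders; in the described embodiment the transfer coefficients would be looked up from the first and second correspondences using θ(k), φ(k), and k.

```python
import numpy as np

# Spectra of M = 2 microphones for one frame l: rows = mics, columns = bins k.
N = 8
Xm = np.array([np.arange(N, dtype=complex),
               2.0 * np.arange(N, dtype=complex)])

# Xref(k)l = sum_m wm * Xm(k)l -- a linear combination of the microphone
# spectra; the weights wm may be real or complex (here simple averaging).
wm = np.array([0.5, 0.5])
Xref = wm @ Xm

# Hypothetical per-bin transfer coefficients (stand-ins for the left- and
# right-ear responses taken from the two correspondences).
HL = np.exp(-1j * 0.1 * np.arange(N))    # stand-in left channel
HR = np.exp(+1j * 0.1 * np.arange(N))    # stand-in right channel

XL = HL * Xref                           # new first audio components
XR = HR * Xref                           # new second audio components

# These stand-ins are pure phase shifts (|HL| = |HR| = 1), so magnitudes
# are preserved while the inter-channel phase difference is imposed.
print(np.allclose(np.abs(XL), np.abs(Xref)))   # True
```

A real correspondence would generally change both magnitude and phase per bin, producing the amplitude and phase response differences the method relies on.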
The new first audio component XL(k)l of each frequency is input to the inverse fast Fourier transform (IFFT) module and transformed from the frequency domain back to the time domain, yielding a new first audio frame x'L(n)l; the new second audio component XR(k)l of each frequency is input to the IFFT module, yielding a new second audio frame x'R(n)l. Each first audio frame x'L(n)l is windowed with the synthesis window to obtain x″L(n)l, and the windowed first audio frames x″L(n)l are input to the overlap-add module, which corrects them by the overlap-add method to obtain the audio frame x″′L(n)l of the l-th frame of the first audio signal; the audio frames x″′L(n)l are combined to obtain the complete first audio signal xL(t) for playing at the first sound-receiving organ. Likewise, each second audio frame x'R(n)l is windowed with the synthesis window to obtain x″R(n)l, the windowed second audio frames x″R(n)l are input to the overlap-add module to obtain the audio frame x″′R(n)l of the l-th frame of the second audio signal, and the audio frames x″′R(n)l are combined to obtain the complete second audio signal xR(t) for playing at the second sound-receiving organ.
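The IFFT, synthesis-window, and overlap-add chain can be illustrated end to end for one channel. As a sanity check the sketch uses identity "transfer coefficients", under which the analysis/synthesis chain should reproduce the input; the square-root Hann windows at 50% overlap are an assumed choice satisfying the constant-overlap-add condition, not a requirement of the application.

```python
import numpy as np

N, hop = 64, 32
n = np.arange(N)
# sqrt-Hann, used both as analysis window and as synthesis window
w = np.sqrt(0.5 * (1.0 - np.cos(2.0 * np.pi * n / N)))

x = np.sin(2 * np.pi * np.arange(8 * N) / 50.0)   # toy mono input

out = np.zeros_like(x)
for s in range(0, len(x) - N + 1, hop):
    Xk = np.fft.fft(w * x[s:s + N])      # analysis window + FFT
    XL = 1.0 * Xk                        # identity "transfer coefficients"
    frame = np.real(np.fft.ifft(XL))     # IFFT back to the time domain
    out[s:s + N] += w * frame            # synthesis window + overlap-add

# Interior samples are reconstructed exactly (edges lack full overlap).
print(np.allclose(out[N:-N], x[N:-N]))   # True
```

In the described embodiment the same loop runs twice, once with HL(k) and once with HR(k) in place of the identity factor, producing xL(t) and xR(t).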
In the implementation provided above, the sound-reception response difference information of an audio component has a correspondence with the sound source direction and frequency of that audio component. However, the stereo impression of sound is reflected not only in a clear direction but also in the distance of the sound source, and the response between the two ears differs with the distance of the sound source. The sound source distance can therefore be added to the correspondence as a variable, i.e. the correspondence can take the form H(θ, φ, r, k), where r is the sound source distance. In a specific implementation, the sound source distance can be determined for each audio component, and the sound-reception response difference information of that audio component is then determined from the determined sound source distance, sound source direction, and frequency.
On the other hand, the response difference between the two ears is in fact related to a variety of characteristic parameters. For example, for an audio signal of the same frequency emitted by a sound source in the same direction and heard at the same position, different listeners hear different sounds, because the structure of each person's ears and of the regions around the ears (such as shoulders, hair, and cheeks) differs individually. Therefore, in one embodiment, the variables in the correspondence may further include designated characteristic parameters; that is, the sound-reception response difference information of the audio component can be determined from the designated characteristic parameters, the sound source distance, and the frequency. The designated characteristic parameters may include one or more of the following: inter-ear distance, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
There are multiple ways to obtain a user's designated characteristic parameters. In one implementation, the user can be prompted through interaction to input the relevant information, thereby obtaining the user's designated characteristic parameters. In another implementation, image recognition technology can be applied to an image of the user to obtain the required designated characteristic parameters. For example, the user's inter-ear distance can be obtained through image-based ranging, and the user's ear canal characteristic parameters can be extracted through image feature recognition.
The foregoing describes the audio signal processing method provided by the embodiments of the present application. In this method, each audio component of the audio signal to be processed is modulated according to the sound-reception response difference information corresponding to that audio component, where the sound-reception response difference information characterizes the amplitude response difference and/or phase response difference of the first sound-receiving organ and the second sound-receiving organ to an audio component of the same frequency arriving from the same direction. Therefore, after modulation according to the sound-reception response difference information, the generated first audio signal and second audio signal also exhibit this amplitude response difference and/or phase response difference; through this response difference, a sufficient sense of space can be constructed, so that the human ear can clearly distinguish the direction of the sound.
下面请参见图5是本申请实施例提供的一种示例性的音频处理装置的结构示意图。该装置包括:Please refer to FIG. 5 below for a schematic structural diagram of an exemplary audio processing device provided by an embodiment of the present application. The device includes:
存储器501,用于存储计算机程序;The memory 501 is used to store computer programs;
处理器502,用于调用所述计算机程序,当所述计算机程序被所述处理器执行时,实现以下步骤:The processor 502 is configured to call the computer program, and when the computer program is executed by the processor, the following steps are implemented:
获取待处理音频信号,所述待处理音频信号包括多个频率的音频分量;Acquiring a to-be-processed audio signal, where the to-be-processed audio signal includes audio components of multiple frequencies;
确定多个所述音频分量中每一个对应的声源方向;Determining a sound source direction corresponding to each of the multiple audio components;
The sound-reception response difference information corresponding to each audio component is determined according to the sound source direction, the sound-reception response difference information including: the amplitude response difference and/or phase response difference of a first sound-receiving organ and a second sound-receiving organ to an audio component of the same frequency arriving from the same direction;

According to the sound-reception response difference information, the audio component of the corresponding frequency in the audio signal to be processed is modulated, so as to generate a first audio signal for playing at the first sound-receiving organ and a second audio signal for playing at the second sound-receiving organ.
Optionally, the sound-reception response difference information includes a first transfer coefficient and a second transfer coefficient; the first transfer coefficient is determined according to the sound source direction and frequency corresponding to the audio component and a first correspondence, and the second transfer coefficient is determined according to the sound source direction and frequency corresponding to the audio component and a second correspondence; the difference between the first correspondence and the second correspondence is used to characterize the amplitude response difference and/or phase response difference of the first sound-receiving organ and the second sound-receiving organ to an audio component of the same frequency arriving from the same direction.
可选的,所述第一对应关系是针对不同方向的声源发出的各频率的测试信号分别测量所述第一收音器官的收音响应得到的;Optionally, the first correspondence relationship is obtained by measuring the reception response of the first sound receiving organ for test signals of each frequency emitted by sound sources in different directions;
所述第二对应关系是针对不同方向的声源发出的各频率的测试信号分别测量所述第二收音器官的收音响应得到的。The second correspondence relationship is obtained by measuring the sound reception response of the second sound receiving organ for test signals of various frequencies emitted by sound sources in different directions.
可选的,所述第一对应关系与所述第二对应关系是根据声源到人双耳的传播模型确定的,所述传播模型是根据预选的特征参数建立的。Optionally, the first correspondence and the second correspondence are determined according to a propagation model from the sound source to the human ears, and the propagation model is established according to preselected characteristic parameters.
可选的,所述预选的特征参数包括以下一种或多种:双耳间距、耳廓特征参数、耳道特征参数、肩膀特征参数、脸颊特征参数、头发特征参数。Optionally, the preselected characteristic parameters include one or more of the following: distance between ears, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
可选的,各个所述音频分量对应的声源方向是利用至少两个麦克风采集的音频信号,通过声源定位算法确定的。Optionally, the sound source direction corresponding to each of the audio components is determined by a sound source localization algorithm using audio signals collected by at least two microphones.
可选的,所述音频分量对应的声源方向是根据各个所述麦克风采集的音频信号中相应频率的音频分量确定的。Optionally, the sound source direction corresponding to the audio component is determined according to the audio component of the corresponding frequency in the audio signal collected by each microphone.
可选的,所述声源定位算法是以下任一种:波束形成算法、到达时间差估计算法、差分麦克风阵列算法。Optionally, the sound source localization algorithm is any one of the following: a beamforming algorithm, an arrival time difference estimation algorithm, and a differential microphone array algorithm.
可选的,所述待处理音频信号是根据所述麦克风采集的音频信号得到的。Optionally, the audio signal to be processed is obtained based on the audio signal collected by the microphone.
可选的,所述待处理音频信号是各个所述麦克风采集的音频信号中信噪比最高的音频信号。Optionally, the audio signal to be processed is an audio signal with the highest signal-to-noise ratio among the audio signals collected by each microphone.
可选的,所述待处理音频信号的各频率的音频分量是各个所述麦克风采集的音频 信号中相应频率的音频分量的线性组合。Optionally, the audio components of each frequency of the audio signal to be processed are linear combinations of audio components of corresponding frequencies in the audio signals collected by each microphone.
可选的,所述待处理音频信号是根据所述麦克风以外的其它麦克风采集的音频信号得到的。Optionally, the to-be-processed audio signal is obtained based on audio signals collected by a microphone other than the microphone.
Optionally, the audio component of the corresponding frequency is modulated according to the first transfer coefficient to obtain new first audio components of the respective frequencies, and the first audio signal is obtained by transforming the first audio components of the respective frequencies;

The audio component of the corresponding frequency is modulated according to the second transfer coefficient to obtain new second audio components of the respective frequencies, and the second audio signal is obtained by transforming the second audio components of the respective frequencies.
可选的,所述待处理音频信号被分为多个音频帧,所述音频分量是所述音频帧包含的音频分量;Optionally, the audio signal to be processed is divided into multiple audio frames, and the audio component is an audio component contained in the audio frame;
利用各频率的所述第一音频分量变换得到的是新的第一音频帧,所述第一音频信号是利用各个所述第一音频帧合成得到的;A new first audio frame obtained by transforming the first audio component of each frequency is a new first audio frame, and the first audio signal is synthesized using each of the first audio frames;
利用各频率的所述第二音频分量变换得到的是新的第二音频帧,所述第二音频信号是利用各个所述第二音频帧合成得到的。A new second audio frame is obtained by transforming the second audio component of each frequency, and the second audio signal is synthesized by using each of the second audio frames.
可选的,所述音频帧包含的采样点的个数为2的幂次方。Optionally, the number of sampling points included in the audio frame is a power of two.
可选的,所述音频帧包含的对应各频率的音频分量是通过快速傅里叶变换FFT确定的。Optionally, the audio components corresponding to the frequencies contained in the audio frame are determined by fast Fourier transform FFT.
可选的,所述处理器还用于,在确定所述音频帧包含的对应各频率的音频分量之前,将所述音频帧调制为周期性信号。Optionally, the processor is further configured to modulate the audio frame into a periodic signal before determining the audio components corresponding to each frequency contained in the audio frame.
可选的,所述处理器在将所述音频帧调制为周期性信号时,具体用于对所述音频帧加分析窗。Optionally, when the processor modulates the audio frame into a periodic signal, it is specifically configured to add an analysis window to the audio frame.
Optionally, when synthesizing the first audio signal from the first audio frames, the processor is specifically configured to process the first audio frames by the overlap-add method and then combine them to obtain the first audio signal;

When synthesizing the second audio signal from the second audio frames, the processor is specifically configured to process the second audio frames by the overlap-add method and then combine them to obtain the second audio signal.
可选的,所述处理器还用于,在通过所述重叠相加法Overlap-add进行处理之前,分别消除所述第一音频帧与所述第二音频帧两端幅值的畸变。Optionally, the processor is further configured to, before performing processing by the overlap-add method, eliminate the amplitude distortion at both ends of the first audio frame and the second audio frame, respectively.
Optionally, when respectively eliminating the amplitude distortion at both ends of the first audio frames and the second audio frames, the processor is specifically configured to apply a synthesis window to the first audio frames and the second audio frames, respectively.
可选的,所述声源方向包括:周角和/或俯仰角。Optionally, the sound source direction includes: a circumferential angle and/or a pitch angle.
可选的,所述处理器还用于,确定各个所述音频分量的声源距离,所述声源距离用于确定相应频率的所述音频分量的所述收音响应差异信息。Optionally, the processor is further configured to determine a sound source distance of each of the audio components, where the sound source distance is used to determine the difference information of the reception response of the audio component of the corresponding frequency.
可选的,所述处理器还用于,获取用户的双耳及双耳周边的指定特征参数,所述指定特征参数用于确定相应频率的所述音频分量的所述收音响应差异信息。Optionally, the processor is further configured to obtain designated characteristic parameters of the user's binaural and binaural periphery, where the designated characteristic parameter is used to determine the difference information of the reception response of the audio component of the corresponding frequency.
可选的,所述指定特征参数是通过对用户进行图像识别得到的。Optionally, the specified characteristic parameter is obtained by performing image recognition on the user.
可选的,所述指定特征参数包括以下一种或多种:双耳间距、耳廓特征参数、耳道特征参数、肩膀特征参数、脸颊特征参数、头发特征参数。Optionally, the designated characteristic parameters include one or more of the following: distance between ears, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
For the specific implementation of each embodiment of the audio processing device provided above, refer to the description of the audio signal processing method provided by the embodiments of this application; details are not repeated here.
下面请参见图6,图6是本申请实施例提供的一种示例性的录音设备的结构示意图。该录音设备包括:至少两个麦克风601、存储器602及处理器603;Please refer to FIG. 6 below. FIG. 6 is a schematic structural diagram of an exemplary recording device provided by an embodiment of the present application. The recording device includes: at least two microphones 601, a memory 602, and a processor 603;
所述麦克风601用于,采集音频信号;The microphone 601 is used to collect audio signals;
所述存储器602用于,存储计算机程序;The memory 602 is used to store computer programs;
所述处理器603用于,调用所述计算机程序,所述处理器执行所述计算程序时,实现以下步骤:The processor 603 is configured to call the computer program, and when the processor executes the calculation program, the following steps are implemented:
获取待处理音频信号,所述待处理音频信号包括多个频率的音频分量;Acquiring a to-be-processed audio signal, where the to-be-processed audio signal includes audio components of multiple frequencies;
利用所述麦克风采集的音频信号,通过声源定位算法确定多个所述音频分量中每一个对应的声源方向;Using the audio signal collected by the microphone to determine the sound source direction corresponding to each of the multiple audio components through a sound source localization algorithm;
The sound-reception response difference information corresponding to each audio component is determined according to the sound source direction, the sound-reception response difference information including: the amplitude response difference and/or phase response difference of a first sound-receiving organ and a second sound-receiving organ to an audio component of the same frequency arriving from the same direction;

According to the sound-reception response difference information, the audio component of the corresponding frequency in the audio signal to be processed is modulated, so as to generate a first audio signal for playing at the first sound-receiving organ and a second audio signal for playing at the second sound-receiving organ.
Optionally, the sound-reception response difference information includes a first transfer coefficient and a second transfer coefficient; the first transfer coefficient is determined according to the sound source direction and frequency corresponding to the audio component and a first correspondence, and the second transfer coefficient is determined according to the sound source direction and frequency corresponding to the audio component and a second correspondence; the difference between the first correspondence and the second correspondence is used to characterize the amplitude response difference and/or phase response difference of the first sound-receiving organ and the second sound-receiving organ to an audio component of the same frequency arriving from the same direction.
可选的,所述第一对应关系是针对不同方向的声源发出的各频率的测试信号分别 测量所述第一收音器官的收音响应得到的;Optionally, the first correspondence relationship is obtained by respectively measuring the sound reception response of the first sound receiving organ for test signals of each frequency emitted by sound sources in different directions;
所述第二对应关系是针对不同方向的声源发出的各频率的测试信号分别测量所述第二收音器官的收音响应得到的。The second correspondence relationship is obtained by measuring the sound reception response of the second sound receiving organ for test signals of various frequencies emitted by sound sources in different directions.
可选的,所述第一对应关系与所述第二对应关系是根据声源到人双耳的传播模型确定的,所述传播模型是根据预选的特征参数建立的。Optionally, the first correspondence and the second correspondence are determined according to a propagation model from the sound source to the human ears, and the propagation model is established according to preselected characteristic parameters.
可选的,所述预选的特征参数包括以下一种或多种:双耳间距、耳廓特征参数、耳道特征参数、肩膀特征参数、脸颊特征参数、头发特征参数。Optionally, the preselected characteristic parameters include one or more of the following: distance between ears, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
可选的,所述音频分量对应的声源方向是根据各个所述麦克风采集的音频信号中相应频率的音频分量确定的。Optionally, the sound source direction corresponding to the audio component is determined according to the audio component of the corresponding frequency in the audio signal collected by each microphone.
可选的,所述声源定位算法是以下任一种:波束形成算法、到达时间差估计算法、差分麦克风阵列算法。Optionally, the sound source localization algorithm is any one of the following: a beamforming algorithm, an arrival time difference estimation algorithm, and a differential microphone array algorithm.
可选的,所述待处理音频信号是根据所述麦克风采集的音频信号得到的。Optionally, the audio signal to be processed is obtained based on the audio signal collected by the microphone.
可选的,所述待处理音频信号是各个所述麦克风采集的音频信号中信噪比最高的音频信号。Optionally, the audio signal to be processed is an audio signal with the highest signal-to-noise ratio among the audio signals collected by each microphone.
可选的,所述待处理音频信号的各频率的音频分量是各个所述麦克风采集的音频信号中相应频率的音频分量的线性组合。Optionally, the audio components of each frequency of the audio signal to be processed are linear combinations of audio components of corresponding frequencies in the audio signals collected by each microphone.
可选的,所述待处理音频信号是根据所述麦克风以外的其它麦克风采集的音频信号得到的。Optionally, the to-be-processed audio signal is obtained based on audio signals collected by a microphone other than the microphone.
Optionally, the audio component of the corresponding frequency is modulated according to the first transfer coefficient to obtain new first audio components of the respective frequencies, and the first audio signal is obtained by transforming the first audio components of the respective frequencies;

The audio component of the corresponding frequency is modulated according to the second transfer coefficient to obtain new second audio components of the respective frequencies, and the second audio signal is obtained by transforming the second audio components of the respective frequencies.
可选的,所述待处理音频信号被分为多个音频帧,所述音频分量是所述音频帧包含的音频分量;Optionally, the audio signal to be processed is divided into multiple audio frames, and the audio component is an audio component contained in the audio frame;
利用各频率的所述第一音频分量变换得到的是新的第一音频帧,所述第一音频信号是利用各个所述第一音频帧合成得到的;A new first audio frame obtained by transforming the first audio component of each frequency is a new first audio frame, and the first audio signal is synthesized using each of the first audio frames;
利用各频率的所述第二音频分量变换得到的是新的第二音频帧,所述第二音频信号是利用各个所述第二音频帧合成得到的。A new second audio frame is obtained by transforming the second audio component of each frequency, and the second audio signal is synthesized by using each of the second audio frames.
可选的,所述音频帧包含的采样点的个数为2的幂次方。Optionally, the number of sampling points included in the audio frame is a power of two.
可选的,所述音频帧包含的对应各频率的音频分量是通过快速傅里叶变换FFT确 定的。Optionally, the audio components corresponding to each frequency contained in the audio frame are determined by fast Fourier transform FFT.
可选的,所述处理器还用于,在确定所述音频帧包含的对应各频率的音频分量之前,将所述音频帧调制为周期性信号。Optionally, the processor is further configured to modulate the audio frame into a periodic signal before determining the audio components corresponding to each frequency contained in the audio frame.
可选的,所述处理器在将所述音频帧调制为周期性信号时,具体用于对所述音频帧加分析窗。Optionally, when the processor modulates the audio frame into a periodic signal, it is specifically configured to add an analysis window to the audio frame.
Optionally, when synthesizing the first audio signal from the first audio frames, the processor is specifically configured to process the first audio frames by the overlap-add method and then combine them to obtain the first audio signal;

When synthesizing the second audio signal from the second audio frames, the processor is specifically configured to process the second audio frames by the overlap-add method and then combine them to obtain the second audio signal.
可选的,所述处理器还用于,在通过所述重叠相加法Overlap-add进行处理之前,分别消除所述第一音频帧与所述第二音频帧两端幅值的畸变。Optionally, the processor is further configured to, before performing processing by the overlap-add method, eliminate the amplitude distortion at both ends of the first audio frame and the second audio frame, respectively.
Optionally, when respectively eliminating the amplitude distortion at both ends of the first audio frames and the second audio frames, the processor is specifically configured to apply a synthesis window to the first audio frames and the second audio frames, respectively.
可选的,所述声源方向包括:周角和/或俯仰角。Optionally, the sound source direction includes: a circumferential angle and/or a pitch angle.
可选的,所述处理器还用于,确定各个所述音频分量的声源距离,所述声源距离用于确定相应频率的所述音频分量的所述收音响应差异信息。Optionally, the processor is further configured to determine a sound source distance of each of the audio components, where the sound source distance is used to determine the difference information of the reception response of the audio component of the corresponding frequency.
可选的,所述处理器还用于,获取用户的双耳及双耳周边的指定特征参数,所述指定特征参数用于确定相应频率的所述音频分量的所述收音响应差异信息。Optionally, the processor is further configured to obtain designated characteristic parameters of the user's binaural and binaural periphery, where the designated characteristic parameter is used to determine the difference information of the reception response of the audio component of the corresponding frequency.
可选的,所述指定特征参数是通过对用户进行图像识别得到的。Optionally, the specified characteristic parameter is obtained by performing image recognition on the user.
可选的,所述指定特征参数包括以下一种或多种:双耳间距、耳廓特征参数、耳道特征参数、肩膀特征参数、脸颊特征参数、头发特征参数。Optionally, the designated characteristic parameters include one or more of the following: distance between ears, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
Optionally, the recording apparatus may be provided with a port for connecting to other external devices; by connecting to another device through this port, an audio signal can be obtained from that device, and the audio signal to be processed may be determined from the audio signal so obtained.
The recording apparatus is an electronic device with a recording function, and may specifically be any of the following: a mobile phone, a camera, a video camera, an action camera, a gimbal camera, a speaker, a head-mounted VR device, a surveillance device, a voice recorder, or a microphone.
For the specific implementation of each recording device embodiment provided above, reference may be made to the description of the audio signal processing method provided in the embodiments of this application, which is not repeated here.
The embodiments of this application further provide a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the audio signal processing method of any implementation provided in the embodiments of this application can be carried out.
Provided that no conflict or contradiction arises, those skilled in the art may combine the technical features of the above embodiments according to the actual situation to form various further embodiments. For reasons of space, this application does not describe all such combinations, but it should be understood that they also fall within the scope disclosed by the embodiments of this application.
The embodiments of this application may take the form of a computer program product implemented on one or more storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing program code. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to: phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette, magnetic tape or magnetic disk storage or other magnetic storage device, or any other non-transmission medium that can be used to store information accessible by a computing device.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. The terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Absent further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
The methods, apparatuses, and devices provided by the embodiments of the present invention have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present invention. The description of the above embodiments is intended only to help in understanding the method of the present invention and its core ideas. A person of ordinary skill in the art may, in accordance with the ideas of the present invention, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (80)

  1. An audio signal processing method, comprising:
    acquiring an audio signal to be processed, the audio signal to be processed comprising audio components at a plurality of frequencies;
    determining a sound source direction corresponding to each of the plurality of audio components;
    determining, according to the sound source direction, sound-pickup response difference information corresponding to each audio component, the sound-pickup response difference information comprising an amplitude response difference and/or a phase response difference between a first sound-receiving organ and a second sound-receiving organ for an audio component of the same frequency arriving from the same direction; and
    modulating, according to the sound-pickup response difference information, the audio component of the corresponding frequency in the audio signal to be processed, so as to generate a first audio signal for playback at the first sound-receiving organ and a second audio signal for playback at the second sound-receiving organ.
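For illustration only (this sketch is not part of the claims), the per-frequency modulation of claim 1 can be expressed in a few lines of Python. The function name and the lookup callables `h_left` and `h_right` are hypothetical stand-ins for the first and second transfer coefficients; multiplying a frequency bin by a complex coefficient applies both the amplitude response difference and the phase response difference at once.

```python
import numpy as np

def binaural_modulate(spectrum, directions, h_left, h_right):
    """Generate left/right spectra from a mono spectrum (claim 1, sketched).

    spectrum   : complex frequency components of the signal to be processed
    directions : estimated sound source direction (e.g. azimuth) per bin
    h_left, h_right : hypothetical lookups (direction, bin) -> complex
        transfer coefficient for the first / second sound-receiving organ
    """
    left = np.empty(len(spectrum), dtype=complex)
    right = np.empty(len(spectrum), dtype=complex)
    for k, (x, d) in enumerate(zip(spectrum, directions)):
        # A complex coefficient scales the magnitude and shifts the phase,
        # i.e. it applies the amplitude and phase response differences.
        left[k] = x * h_left(d, k)
        right[k] = x * h_right(d, k)
    return left, right
```

Inverse-transforming `left` and `right` then yields the first and second audio signals for the two ears.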
  2. The audio signal processing method according to claim 1, wherein the sound-pickup response difference information comprises a first transfer coefficient and a second transfer coefficient; the first transfer coefficient is determined according to the sound source direction and frequency of the audio component and a first correspondence; the second transfer coefficient is determined according to the sound source direction and frequency of the audio component and a second correspondence; and the difference between the first correspondence and the second correspondence characterizes the amplitude response difference and/or phase response difference between the first sound-receiving organ and the second sound-receiving organ for an audio component of the same frequency arriving from the same direction.
  3. The audio signal processing method according to claim 2, wherein the first correspondence is obtained by measuring the sound-pickup response of the first sound-receiving organ to test signals of various frequencies emitted by sound sources in different directions; and
    the second correspondence is obtained by measuring the sound-pickup response of the second sound-receiving organ to test signals of various frequencies emitted by sound sources in different directions.
  4. The audio signal processing method according to claim 2, wherein the first correspondence and the second correspondence are determined according to a propagation model from a sound source to a person's two ears, the propagation model being established according to preselected characteristic parameters.
  5. The audio signal processing method according to claim 4, wherein the preselected characteristic parameters comprise one or more of the following: inter-ear distance, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  6. The audio signal processing method according to claim 1, wherein the sound source direction corresponding to each audio component is determined by a sound source localization algorithm using audio signals collected by at least two microphones.
  7. The audio signal processing method according to claim 6, wherein the sound source direction corresponding to an audio component is determined according to the audio components of the corresponding frequency in the audio signals collected by the respective microphones.
  8. The audio signal processing method according to claim 6, wherein the sound source localization algorithm is any one of the following: a beamforming algorithm, a time-difference-of-arrival estimation algorithm, or a differential microphone array algorithm.
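A minimal sketch of the time-difference-of-arrival option named in claim 8, assuming a two-microphone pair, a far-field source, and free-field propagation; the function names are illustrative, and practical systems typically use more robust variants (e.g. generalized cross-correlation) than this plain cross-correlation peak.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def estimate_tdoa(sig_a, sig_b, fs):
    """Delay of sig_a relative to sig_b, in seconds, from the peak of the
    cross-correlation (positive: the sound reached microphone b first)."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)
    return lag / fs

def tdoa_to_azimuth(tdoa, mic_spacing):
    """Far-field direction of arrival implied by the time difference."""
    s = np.clip(tdoa * SPEED_OF_SOUND / mic_spacing, -1.0, 1.0)
    return float(np.arcsin(s))
```

With the delay in hand, `arcsin(tdoa * c / d)` maps it to an azimuth for a pair spaced `d` metres apart; a zero delay corresponds to a broadside source.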
  9. The audio signal processing method according to claim 6, wherein the audio signal to be processed is obtained from the audio signals collected by the microphones.
  10. The audio signal processing method according to claim 9, wherein the audio signal to be processed is the audio signal with the highest signal-to-noise ratio among the audio signals collected by the respective microphones.
  11. The audio signal processing method according to claim 9, wherein the audio component at each frequency of the audio signal to be processed is a linear combination of the audio components of the corresponding frequency in the audio signals collected by the respective microphones.
  12. The audio signal processing method according to claim 6, wherein the audio signal to be processed is obtained from an audio signal collected by a microphone other than the at least two microphones.
  13. The audio signal processing method according to claim 2, wherein modulating the audio component of the corresponding frequency according to the first transfer coefficient yields a new first audio component at each frequency, and the first audio signal is obtained by transforming the first audio components of the respective frequencies; and
    modulating the audio component of the corresponding frequency according to the second transfer coefficient yields a new second audio component at each frequency, and the second audio signal is obtained by transforming the second audio components of the respective frequencies.
  14. The audio signal processing method according to claim 13, wherein the audio signal to be processed is divided into a plurality of audio frames, and the audio components are the audio components contained in an audio frame;
    transforming the first audio components of the respective frequencies yields a new first audio frame, and the first audio signal is synthesized from the respective first audio frames; and
    transforming the second audio components of the respective frequencies yields a new second audio frame, and the second audio signal is synthesized from the respective second audio frames.
  15. The audio signal processing method according to claim 14, wherein the number of sampling points contained in each audio frame is a power of two.
  16. The audio signal processing method according to claim 15, wherein the audio components corresponding to the respective frequencies contained in an audio frame are determined by a fast Fourier transform (FFT).
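The framing and FFT analysis of claims 14 through 16 can be sketched as follows. The frame length of 1024 samples (a power of two, as claim 15 suggests) and the 50% hop are illustrative choices, not values taken from the application.

```python
import numpy as np

FRAME_LEN = 1024      # power of two, so the FFT is efficient
HOP = FRAME_LEN // 2  # 50% overlap between consecutive frames

def frames_to_spectra(signal):
    """Split the signal into overlapping power-of-two frames and return the
    audio components per frequency of each frame via a real FFT."""
    spectra = []
    for start in range(0, len(signal) - FRAME_LEN + 1, HOP):
        frame = signal[start:start + FRAME_LEN]
        spectra.append(np.fft.rfft(frame))  # one complex bin per frequency
    return spectra
```

Each spectrum can be modulated per bin and inverse-transformed with `np.fft.irfft` to produce the new first or second audio frame of the claims.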
  17. The audio signal processing method according to claim 14, wherein before the audio components corresponding to the respective frequencies contained in an audio frame are determined, the method further comprises:
    modulating the audio frame into a periodic signal.
  18. The audio signal processing method according to claim 17, wherein modulating the audio frame into a periodic signal comprises:
    applying an analysis window to the audio frame.
  19. The audio signal processing method according to claim 14, wherein synthesizing the first audio signal from the respective first audio frames comprises:
    processing the respective first audio frames by the overlap-add method and combining them to obtain the first audio signal; and
    synthesizing the second audio signal from the respective second audio frames comprises:
    processing the respective second audio frames by the overlap-add method and combining them to obtain the second audio signal.
  20. The audio signal processing method according to claim 19, wherein before the processing by the overlap-add method, the method further comprises:
    eliminating amplitude distortion at the two ends of the first audio frame and of the second audio frame, respectively.
  21. The audio signal processing method according to claim 20, wherein eliminating the amplitude distortion at the two ends of the first audio frame and of the second audio frame comprises:
    applying a synthesis window to the first audio frame and to the second audio frame, respectively.
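Claims 17 through 21 describe analysis windowing, synthesis windowing, and overlap-add reconstruction. One common concrete choice (an assumption for this sketch, not stated in the application) is a square-root periodic Hann window used for both the analysis and the synthesis window: at 50% overlap the two windows multiply to a periodic Hann window, whose shifted copies sum to one, so overlap-add reconstructs interior samples without amplitude distortion at the frame edges.

```python
import numpy as np

def sqrt_periodic_hann(n):
    """Square-root periodic Hann window; the analysis and synthesis windows
    multiply to a periodic Hann window, which sums to one at 50% overlap."""
    return np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n))

def overlap_add(frames, hop):
    """Combine processed (and synthesis-windowed) frames by overlap-add."""
    frame_len = len(frames[0])
    out = np.zeros((len(frames) - 1) * hop + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
    return out
```

With identity processing between the analysis and synthesis windows, the overlap-added output matches the input signal everywhere except the first and last half frame, which receive only one window's contribution.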
  22. The audio signal processing method according to claim 1, wherein the sound source direction comprises an azimuth angle and/or a pitch angle.
  23. The audio signal processing method according to claim 1, further comprising:
    determining a sound source distance of each audio component, the sound source distance being used to determine the sound-pickup response difference information of the audio component of the corresponding frequency.
  24. The audio signal processing method according to claim 1, further comprising:
    acquiring specified characteristic parameters of the user's two ears and their surroundings, the specified characteristic parameters being used to determine the sound-pickup response difference information of the audio component of the corresponding frequency.
  25. The audio signal processing method according to claim 24, wherein the specified characteristic parameters are obtained by performing image recognition on the user.
  26. The audio signal processing method according to claim 24, wherein the specified characteristic parameters comprise one or more of the following: inter-ear distance, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  27. An audio processing apparatus, comprising:
    a memory configured to store a computer program; and
    a processor configured to call the computer program, wherein when the computer program is executed by the processor, the following steps are implemented:
    acquiring an audio signal to be processed, the audio signal to be processed comprising audio components at a plurality of frequencies;
    determining a sound source direction corresponding to each of the plurality of audio components;
    determining, according to the sound source direction, sound-pickup response difference information corresponding to each audio component, the sound-pickup response difference information comprising an amplitude response difference and/or a phase response difference between a first sound-receiving organ and a second sound-receiving organ for an audio component of the same frequency arriving from the same direction; and
    modulating, according to the sound-pickup response difference information, the audio component of the corresponding frequency in the audio signal to be processed, so as to generate a first audio signal for playback at the first sound-receiving organ and a second audio signal for playback at the second sound-receiving organ.
  28. The audio processing apparatus according to claim 27, wherein the sound-pickup response difference information comprises a first transfer coefficient and a second transfer coefficient; the first transfer coefficient is determined according to the sound source direction and frequency of the audio component and a first correspondence; the second transfer coefficient is determined according to the sound source direction and frequency of the audio component and a second correspondence; and the difference between the first correspondence and the second correspondence characterizes the amplitude response difference and/or phase response difference between the first sound-receiving organ and the second sound-receiving organ for an audio component of the same frequency arriving from the same direction.
  29. The audio processing apparatus according to claim 28, wherein the first correspondence is obtained by measuring the sound-pickup response of the first sound-receiving organ to test signals of various frequencies emitted by sound sources in different directions; and
    the second correspondence is obtained by measuring the sound-pickup response of the second sound-receiving organ to test signals of various frequencies emitted by sound sources in different directions.
  30. The audio processing apparatus according to claim 28, wherein the first correspondence and the second correspondence are determined according to a propagation model from a sound source to a person's two ears, the propagation model being established according to preselected characteristic parameters.
  31. The audio processing apparatus according to claim 30, wherein the preselected characteristic parameters comprise one or more of the following: inter-ear distance, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  32. The audio processing apparatus according to claim 27, wherein the sound source direction corresponding to each audio component is determined by a sound source localization algorithm using audio signals collected by at least two microphones.
  33. The audio processing apparatus according to claim 32, wherein the sound source direction corresponding to an audio component is determined according to the audio components of the corresponding frequency in the audio signals collected by the respective microphones.
  34. The audio processing apparatus according to claim 32, wherein the sound source localization algorithm is any one of the following: a beamforming algorithm, a time-difference-of-arrival estimation algorithm, or a differential microphone array algorithm.
  35. The audio processing apparatus according to claim 32, wherein the audio signal to be processed is obtained from the audio signals collected by the microphones.
  36. The audio processing apparatus according to claim 35, wherein the audio signal to be processed is the audio signal with the highest signal-to-noise ratio among the audio signals collected by the respective microphones.
  37. The audio processing apparatus according to claim 35, wherein the audio component at each frequency of the audio signal to be processed is a linear combination of the audio components of the corresponding frequency in the audio signals collected by the respective microphones.
  38. The audio processing apparatus according to claim 32, wherein the audio signal to be processed is obtained from an audio signal collected by a microphone other than the at least two microphones.
  39. The audio processing apparatus according to claim 28, wherein modulating the audio component of the corresponding frequency according to the first transfer coefficient yields a new first audio component at each frequency, and the first audio signal is obtained by transforming the first audio components of the respective frequencies; and
    modulating the audio component of the corresponding frequency according to the second transfer coefficient yields a new second audio component at each frequency, and the second audio signal is obtained by transforming the second audio components of the respective frequencies.
  40. The audio processing apparatus according to claim 39, wherein the audio signal to be processed is divided into a plurality of audio frames, and the audio components are the audio components contained in an audio frame;
    transforming the first audio components of the respective frequencies yields a new first audio frame, and the first audio signal is synthesized from the respective first audio frames; and
    transforming the second audio components of the respective frequencies yields a new second audio frame, and the second audio signal is synthesized from the respective second audio frames.
  41. The audio processing apparatus according to claim 40, wherein the number of sampling points contained in each audio frame is a power of two.
  42. The audio processing apparatus according to claim 41, wherein the audio components corresponding to the respective frequencies contained in an audio frame are determined by a fast Fourier transform (FFT).
  43. The audio processing apparatus according to claim 40, wherein the processor is further configured to modulate the audio frame into a periodic signal before the audio components corresponding to the respective frequencies contained in the audio frame are determined.
  44. The audio processing apparatus according to claim 43, wherein when modulating the audio frame into a periodic signal, the processor is specifically configured to apply an analysis window to the audio frame.
  45. The audio processing apparatus according to claim 40, wherein when synthesizing the first audio signal from the respective first audio frames, the processor is specifically configured to process the respective first audio frames by the overlap-add method and combine them to obtain the first audio signal; and
    when synthesizing the second audio signal from the respective second audio frames, the processor is specifically configured to process the respective second audio frames by the overlap-add method and combine them to obtain the second audio signal.
  46. The audio processing apparatus according to claim 45, wherein the processor is further configured to eliminate amplitude distortion at the two ends of the first audio frame and of the second audio frame, respectively, before the processing by the overlap-add method.
  47. The audio processing apparatus according to claim 46, wherein when eliminating the amplitude distortion at the two ends of the first audio frame and of the second audio frame, the processor is specifically configured to apply a synthesis window to the first audio frame and to the second audio frame, respectively.
  48. The audio processing apparatus according to claim 27, wherein the sound source direction comprises an azimuth angle and/or a pitch angle.
  49. The audio processing apparatus according to claim 27, wherein the processor is further configured to determine a sound source distance of each audio component, the sound source distance being used to determine the sound-pickup response difference information of the audio component of the corresponding frequency.
  50. The audio processing apparatus according to claim 27, wherein the processor is further configured to acquire specified characteristic parameters of the user's two ears and their surroundings, the specified characteristic parameters being used to determine the sound-pickup response difference information of the audio component of the corresponding frequency.
  51. The audio processing apparatus according to claim 50, wherein the specified characteristic parameters are obtained by performing image recognition on the user.
  52. The audio processing apparatus according to claim 50, wherein the specified characteristic parameters comprise one or more of the following: inter-ear distance, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  53. A recording device, comprising: at least two microphones, a memory, and a processor;
    the microphones being configured to collect audio signals;
    the memory being configured to store a computer program; and
    the processor being configured to call the computer program, wherein when the processor executes the computer program, the following steps are implemented:
    acquiring an audio signal to be processed, the audio signal to be processed comprising audio components at a plurality of frequencies;
    determining, by a sound source localization algorithm using the audio signals collected by the microphones, a sound source direction corresponding to each of the plurality of audio components;
    determining, according to the sound source direction, sound-pickup response difference information corresponding to each audio component, the sound-pickup response difference information comprising an amplitude response difference and/or a phase response difference between a first sound-receiving organ and a second sound-receiving organ for an audio component of the same frequency arriving from the same direction; and
    modulating, according to the sound-pickup response difference information, the audio component of the corresponding frequency in the audio signal to be processed, so as to generate a first audio signal for playback at the first sound-receiving organ and a second audio signal for playback at the second sound-receiving organ.
  54. 根据权利要求53所述的录音设备,其特征在于,所述收音响应差异信息包括第一传递系数与第二传递系数,所述第一传递系数是根据所述音频分量对应的声源方向、频率和第一对应关系确定的,所述第二传递系数是根据所述音频分量对应的声源方向、频率和第二对应关系确定的,所述第一对应关系与所述第二对应关系之间的差别用于表征所述第一收音器官和所述第二收音器官对同一方向传来的同一频率的音频分量的幅度响应差异和/或相位响应差异。The recording device according to claim 53, wherein the difference information of the radio response includes a first transfer coefficient and a second transfer coefficient, and the first transfer coefficient is based on the sound source direction and frequency corresponding to the audio component. And the first correspondence is determined, the second transfer coefficient is determined according to the sound source direction and frequency corresponding to the audio component, and a second correspondence between the first correspondence and the second correspondence The difference of is used to characterize the difference in amplitude response and/or phase response of the first sound-receiving organ and the second sound-receiving organ to the audio components of the same frequency transmitted in the same direction.
  55. 根据权利要求54所述的录音设备,其特征在于,所述第一对应关系是针对不同方向的声源发出的各频率的测试信号分别测量所述第一收音器官的收音响应得到的;The recording device according to claim 54, wherein the first corresponding relationship is obtained by measuring the sound reception response of the first sound receiving organ for test signals of each frequency emitted by sound sources in different directions;
    所述第二对应关系是针对不同方向的声源发出的各频率的测试信号分别测量所述第二收音器官的收音响应得到的。The second correspondence relationship is obtained by measuring the sound reception response of the second sound receiving organ for test signals of various frequencies emitted by sound sources in different directions.
  56. The recording device according to claim 54, wherein the first correspondence and the second correspondence are determined according to a propagation model of sound from the sound source to the two human ears, the propagation model being built from preselected characteristic parameters.
  57. The recording device according to claim 56, wherein the preselected characteristic parameters comprise one or more of the following: inter-ear distance, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
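One plausible instance of the claim-56 propagation model, built from only the first parameter in claim 57 (inter-ear distance), is a rigid spherical-head model with Woodworth's interaural-time-difference formula. The sketch below is illustrative, not the application's model; a practical model would also add frequency-dependent magnitude (level-difference) terms:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, at roughly room temperature

def woodworth_itd(azimuth_rad, inter_ear_distance=0.18):
    """Interaural time difference (seconds) from Woodworth's
    spherical-head formula: tau = (r/c) * (sin(theta) + theta)."""
    r = inter_ear_distance / 2.0
    return (r / SPEED_OF_SOUND) * (np.sin(azimuth_rad) + azimuth_rad)

def transfer_coefficients(azimuth_rad, freq_hz, inter_ear_distance=0.18):
    """First/second (near-/far-ear) transfer coefficients as pure phase
    factors derived from the ITD, the delay split symmetrically."""
    itd = woodworth_itd(azimuth_rad, inter_ear_distance)
    h_near = np.exp(+1j * np.pi * freq_hz * itd)  # near ear advanced
    h_far = np.exp(-1j * np.pi * freq_hz * itd)   # far ear delayed
    return h_near, h_far
```

For a source straight ahead the ITD is zero; for a source at 90° azimuth and an 18 cm inter-ear distance it is roughly 0.67 ms, consistent with published binaural-hearing figures.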
  58. The recording device according to claim 53, wherein the sound source direction corresponding to the audio component is determined according to the audio components of the corresponding frequency in the audio signals collected by each of the microphones.
  59. The recording device according to claim 53, wherein the sound source localization algorithm is any one of the following: a beamforming algorithm, a time-difference-of-arrival estimation algorithm, or a differential microphone array algorithm.
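Of the three localization options in claim 59, time-difference-of-arrival estimation is the simplest to illustrate. A minimal sketch (an assumption-laden toy, not the application's algorithm): the lag between two microphone channels is found at the peak of their cross-correlation, then converted to an arrival angle under a far-field assumption.

```python
import numpy as np

def tdoa_samples(x_ref, x_delayed):
    """Estimate the lag (in samples) of x_delayed relative to x_ref by
    locating the peak of their full cross-correlation."""
    corr = np.correlate(x_delayed, x_ref, mode="full")
    return int(np.argmax(corr)) - (len(x_ref) - 1)

def azimuth_from_tdoa(lag, fs, mic_spacing, c=343.0):
    """Convert a sample lag into an arrival angle (degrees) for a
    two-microphone pair, far-field: sin(theta) = c * tau / d."""
    tau = lag / fs
    s = np.clip(c * tau / mic_spacing, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))
```

Production systems typically use a generalized cross-correlation variant (e.g. GCC-PHAT) for robustness to reverberation, but the peak-picking step is the same.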
  60. The recording device according to claim 53, wherein the to-be-processed audio signal is obtained from the audio signals collected by the microphones.
  61. The recording device according to claim 60, wherein the to-be-processed audio signal is the audio signal with the highest signal-to-noise ratio among the audio signals collected by the microphones.
  62. The recording device according to claim 60, wherein the audio components of each frequency of the to-be-processed audio signal are linear combinations of the audio components of the corresponding frequencies in the audio signals collected by the microphones.
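The per-frequency linear combination of claim 62 can be sketched as a single weighted sum over microphones, bin by bin. The weights here are placeholders; in practice they could be, for example, the steering phase terms of a frequency-domain delay-and-sum beamformer:

```python
import numpy as np

def combine_bins(mic_spectra, weights):
    """Form each frequency bin of the to-be-processed signal as a linear
    combination of the corresponding bins across microphones.

    mic_spectra: complex array, shape (n_mics, n_bins) -- per-mic FFT bins
    weights:     complex array, shape (n_mics, n_bins) -- combination weights
    """
    return np.sum(weights * mic_spectra, axis=0)
```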
  63. The recording device according to claim 53, wherein the to-be-processed audio signal is obtained from audio signals collected by microphones other than the aforementioned microphones.
  64. The recording device according to claim 63, wherein the recording device is connected to another external recording device, and the to-be-processed audio signal is obtained from the audio signal collected by the other recording device.
  65. The recording device according to claim 54, wherein modulating the audio components of the corresponding frequencies according to the first transfer coefficient yields new first audio components of each frequency, and the first audio signal is obtained by transforming the first audio components of each frequency;
    and modulating the audio components of the corresponding frequencies according to the second transfer coefficient yields new second audio components of each frequency, and the second audio signal is obtained by transforming the second audio components of each frequency.
  66. The recording device according to claim 65, wherein the to-be-processed audio signal is divided into a plurality of audio frames, and the audio components are the audio components contained in an audio frame;
    transforming the first audio components of each frequency yields a new first audio frame, and the first audio signal is synthesized from the first audio frames;
    and transforming the second audio components of each frequency yields a new second audio frame, and the second audio signal is synthesized from the second audio frames.
  67. The recording device according to claim 66, wherein the number of sampling points contained in an audio frame is a power of two.
  68. The recording device according to claim 67, wherein the audio components corresponding to each frequency contained in the audio frame are determined by a fast Fourier transform (FFT).
  69. The recording device according to claim 66, wherein the processor is further configured to, before determining the audio components corresponding to each frequency contained in the audio frame, modulate the audio frame into a periodic signal.
  70. The recording device according to claim 69, wherein, when modulating the audio frame into a periodic signal, the processor is specifically configured to apply an analysis window to the audio frame.
  71. The recording device according to claim 66, wherein, when synthesizing the first audio signal from the first audio frames, the processor is specifically configured to combine the first audio frames after overlap-add processing to obtain the first audio signal;
    and, when synthesizing the second audio signal from the second audio frames, the processor is specifically configured to combine the second audio frames after overlap-add processing to obtain the second audio signal.
  72. The recording device according to claim 71, wherein the processor is further configured to, before the overlap-add processing, eliminate amplitude distortion at the two ends of the first audio frames and of the second audio frames, respectively.
  73. The recording device according to claim 72, wherein, when eliminating the amplitude distortion at the two ends of the first audio frames and the second audio frames, the processor is specifically configured to apply a synthesis window to the first audio frames and the second audio frames, respectively.
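Claims 66-73 together describe a standard short-time Fourier transform analysis-synthesis chain: power-of-two framing, an analysis window, an FFT, per-bin modulation, an inverse FFT, a synthesis window, and overlap-add. A minimal sketch, assuming 50% overlap and square-root-Hann analysis and synthesis windows (one common choice, not stated in the application), under which the overlapped window products sum to one and the chain reconstructs its input away from the first and last half-frame:

```python
import numpy as np

def hann_periodic(n):
    # Periodic Hann window: copies shifted by n/2 sum exactly to 1.
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * np.arange(n) / n))

def stft_modulate_ola(x, frame=1024, coeffs=None):
    """Frame -> analysis window -> FFT -> optional per-bin modulation ->
    inverse FFT -> synthesis window -> overlap-add."""
    hop = frame // 2
    win = np.sqrt(hann_periodic(frame))        # analysis == synthesis window
    n_frames = 1 + (len(x) - frame) // hop
    y = np.zeros(len(x))
    for i in range(n_frames):
        start = i * hop
        seg = x[start:start + frame] * win     # analysis window (claim 70)
        spec = np.fft.rfft(seg)                # FFT (claim 68)
        if coeffs is not None:
            spec = spec * coeffs               # per-bin modulation (claim 65)
        rec = np.fft.irfft(spec, frame) * win  # synthesis window (claim 73)
        y[start:start + frame] += rec          # overlap-add (claims 71-72)
    return y
```

The synthesis window tapers the two ends of each reconstructed frame toward zero, which is one way to realize the end-distortion elimination of claims 72-73 before the frames are overlap-added.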
  74. The recording device according to claim 53, wherein the sound source direction comprises an azimuth angle and/or a pitch angle.
  75. The recording device according to claim 53, wherein the processor is further configured to determine a sound source distance of each audio component, the sound source distance being used to determine the sound-reception response difference information of the audio component of the corresponding frequency.
  76. The recording device according to claim 53, wherein the processor is further configured to obtain specified characteristic parameters of the user's two ears and the areas around them, the specified characteristic parameters being used to determine the sound-reception response difference information of the audio components of the corresponding frequencies.
  77. The recording device according to claim 76, wherein the specified characteristic parameters are obtained by performing image recognition on the user.
  78. The recording device according to claim 76, wherein the specified characteristic parameters include one or more of the following: inter-ear distance, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  79. The recording device according to claim 53, wherein the recording device is specifically any one of the following: a mobile phone, a still camera, a video camera, an action camera, a gimbal camera, a speaker, a head-mounted VR device, a surveillance device, a voice recorder, or a microphone.
  80. A computer-readable storage medium having a computer program stored thereon, wherein, when the computer program is executed by a processor, the audio signal processing method according to any one of claims 1 to 26 is implemented.
PCT/CN2020/085719 2020-04-20 2020-04-20 Audio signal processing method, audio processing device, and recording apparatus WO2021212287A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080038422.1A CN113875265A (en) 2020-04-20 2020-04-20 Audio signal processing method, audio processing device and recording equipment
PCT/CN2020/085719 WO2021212287A1 (en) 2020-04-20 2020-04-20 Audio signal processing method, audio processing device, and recording apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/085719 WO2021212287A1 (en) 2020-04-20 2020-04-20 Audio signal processing method, audio processing device, and recording apparatus

Publications (1)

Publication Number Publication Date
WO2021212287A1 2021-10-28

Family

ID=78271023

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/085719 WO2021212287A1 (en) 2020-04-20 2020-04-20 Audio signal processing method, audio processing device, and recording apparatus

Country Status (2)

Country Link
CN (1) CN113875265A (en)
WO (1) WO2021212287A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024124561A1 (en) * 2022-12-16 2024-06-20 北京小米移动软件有限公司 Analysis method and apparatus for audio collection

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101155440A (en) * 2007-09-17 2008-04-02 昊迪移通(北京)技术有限公司 Three-dimensional around sound effect technology aiming at double-track audio signal
CN102456351A (en) * 2010-10-14 2012-05-16 清华大学 Voice enhancement system
US20130343551A1 (en) * 2008-02-20 2013-12-26 Samsung Electronics Co., Ltd. Method and apparatus for encoding/decoding stereo audio
CN106797525A (en) * 2014-08-13 2017-05-31 三星电子株式会社 For generating the method and apparatus with playing back audio signal
CN106973355A (en) * 2016-01-14 2017-07-21 腾讯科技(深圳)有限公司 surround sound implementation method and device
CN107889044A (en) * 2017-12-19 2018-04-06 维沃移动通信有限公司 The processing method and processing device of voice data
CN109410912A (en) * 2018-11-22 2019-03-01 深圳市腾讯信息技术有限公司 Method, apparatus, electronic equipment and the computer readable storage medium of audio processing
CN110082724A (en) * 2019-05-31 2019-08-02 浙江大华技术股份有限公司 A kind of sound localization method, device and storage medium
CN110972053A (en) * 2019-11-25 2020-04-07 腾讯音乐娱乐科技(深圳)有限公司 Method and related apparatus for constructing a listening scene

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20050060789A (en) * 2003-12-17 2005-06-22 삼성전자주식회사 Apparatus and method for controlling virtual sound
JP2005223713A (en) * 2004-02-06 2005-08-18 Sony Corp Apparatus and method for acoustic reproduction
JP4580210B2 (en) * 2004-10-19 2010-11-10 ソニー株式会社 Audio signal processing apparatus and audio signal processing method
JP5867799B2 (en) * 2011-06-23 2016-02-24 国立研究開発法人産業技術総合研究所 Sound collecting / reproducing apparatus, program, and sound collecting / reproducing method
CN106358135A (en) * 2016-10-14 2017-01-25 广州酷狗计算机科技有限公司 Stereo reducing method and device
CN108156561B (en) * 2017-12-26 2020-08-04 广州酷狗计算机科技有限公司 Audio signal processing method and device and terminal


Also Published As

Publication number Publication date
CN113875265A (en) 2021-12-31

Similar Documents

Publication Publication Date Title
US10382849B2 (en) Spatial audio processing apparatus
KR101547035B1 (en) Three-dimensional sound capturing and reproducing with multi-microphones
JP5533248B2 (en) Audio signal processing apparatus and audio signal processing method
JP4343845B2 (en) Audio data processing method and sound collector for realizing the method
JP6613078B2 (en) Signal processing apparatus and control method thereof
US11122381B2 (en) Spatial audio signal processing
US10979846B2 (en) Audio signal rendering
JP5611970B2 (en) Converter and method for converting audio signals
CN112492445B (en) Method and processor for realizing signal equalization by using ear-covering type earphone
KR20220038478A (en) Apparatus, method or computer program for processing a sound field representation in a spatial transformation domain
US20200029153A1 (en) Audio signal processing method and device
US20130243201A1 (en) Efficient control of sound field rotation in binaural spatial sound
WO2021212287A1 (en) Audio signal processing method, audio processing device, and recording apparatus
WO2020036077A1 (en) Signal processing device, signal processing method, and program
Shabtai et al. Spherical array beamforming for binaural sound reproduction
JP5163685B2 (en) Head-related transfer function measurement method, head-related transfer function convolution method, and head-related transfer function convolution device
WO2018066376A1 (en) Signal processing device, method, and program
WO2023173285A1 (en) Audio processing method and apparatus, electronic device, and computer-readable storage medium
Ahrens et al. Authentic auralization of acoustic spaces based on spherical microphone array recordings
WO2023085186A1 (en) Information processing device, information processing method, and information processing program
Bai et al. An integrated analysis-synthesis array system for spatial sound fields
US10659902B2 (en) Method and system of broadcasting a 360° audio signal
CN116261086A (en) Sound signal processing method, device, equipment and storage medium
JP2022034267A (en) Binaural reproduction device and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20932164

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20932164

Country of ref document: EP

Kind code of ref document: A1