WO2021212287A1 - Audio signal processing method, audio processing device, and recording apparatus - Google Patents

Audio signal processing method, audio processing device, and recording apparatus Download PDF

Info

Publication number
WO2021212287A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
audio signal
frequency
sound
sound source
Prior art date
Application number
PCT/CN2020/085719
Other languages
French (fr)
Chinese (zh)
Inventor
莫品西
边云锋
薛政
刘洋
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to CN202080038422.1A priority Critical patent/CN113875265A/en
Priority to PCT/CN2020/085719 priority patent/WO2021212287A1/en
Publication of WO2021212287A1 publication Critical patent/WO2021212287A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 Speech synthesis; Text to speech systems
    • G10L 13/02 Methods for producing synthetic speech; Speech synthesisers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S 1/00 Two-channel systems

Definitions

  • This application relates to the field of signal processing technology, and in particular to an audio signal processing method, audio processing device, recording device, and computer-readable storage medium.
  • Nowadays, the recording function is provided on many electronic products, such as voice recorders, cameras, and camcorders.
  • These electronic devices with recording functions usually simulate the human ears by placing microphones on the left and right sides for recording.
  • Figure 1 is a schematic structural diagram of an existing digital camera. It can be seen that the digital camera is equipped with dual microphones.
  • Although left and right microphones are provided for recording, there is still a significant difference between the structural layout of the dual microphones and that of the human ears. As a result, there is a large gap between the stereo perception of the recorded audio and the sound actually heard by the human ears: when playing back the recorded audio, the user cannot accurately distinguish the direction of the sound, and the stereo perception is poor.
  • the embodiments of the present application provide an audio signal processing method, an audio processing device, a recording device, and a computer-readable storage medium to solve the technical problem of poor stereo perception of audio recorded by the existing recording device.
  • The first aspect of the embodiments of the present application provides an audio signal processing method, including:
  • according to the determined difference information of the sound-reception response, modulating the audio component of the corresponding frequency in the to-be-processed audio signal to generate a first audio signal to be played to the first sound-receiving organ and a second audio signal to be played to the second sound-receiving organ.
  • A second aspect of the embodiments of the present application provides an audio processing device, including:
  • a memory, used to store a computer program;
  • a processor, configured to call the computer program; when the computer program is executed by the processor, the following steps are implemented:
  • according to the determined difference information of the sound-reception response, modulating the audio component of the corresponding frequency in the to-be-processed audio signal to generate a first audio signal to be played to the first sound-receiving organ and a second audio signal to be played to the second sound-receiving organ.
  • A third aspect of the embodiments of the present application provides a recording device, including at least two microphones, a memory, and a processor;
  • the microphones are used to collect audio signals;
  • the memory is used to store a computer program;
  • the processor is configured to call the computer program; when the processor executes the computer program, the following steps are implemented:
  • the difference information of the sound-reception response includes the amplitude response difference and/or the phase response difference of the first sound-receiving organ and the second sound-receiving organ to audio components of the same frequency transmitted in the same direction;
  • according to the determined difference information of the sound-reception response, modulating the audio component of the corresponding frequency in the to-be-processed audio signal to generate a first audio signal to be played to the first sound-receiving organ and a second audio signal to be played to the second sound-receiving organ.
  • A fourth aspect of the embodiments of the present application provides a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the audio signal processing method described above is implemented.
  • In the embodiments of the present application, each audio component of the audio signal to be processed is modulated according to the sound-reception response difference information corresponding to that component, where the difference information represents the amplitude response difference and/or phase response difference of the first sound-receiving organ and the second sound-receiving organ for audio components of the same frequency transmitted in the same direction.
  • Therefore, after modulation according to this difference information, the generated first audio signal and second audio signal also reflect the difference in amplitude response and/or phase response. Through this response difference, a sufficient sense of space can be constructed, so that the human ear can clearly distinguish the direction of the sound.
  • FIG. 1 is a schematic diagram of the structure of an existing digital camera.
  • Figure 2 is a schematic diagram of a scene of recording sound through an artificial head.
  • Fig. 3 is a flowchart of an exemplary audio signal processing method provided by an embodiment of the present application.
  • Fig. 4 is an algorithm block diagram of an exemplary audio signal processing method provided by an embodiment of the present application.
  • Fig. 5 is a schematic structural diagram of an exemplary audio processing device provided by an embodiment of the present application.
  • Fig. 6 is a schematic structural diagram of an exemplary recording device provided by an embodiment of the present application.
  • As mentioned above, existing electronic equipment with a recording function is equipped with two microphones, on the left and right, to imitate the human ears.
  • However, because the structural layout of dual microphones is still very different from that of human ears, the audio recorded by dual microphones does not have a sufficient stereoscopic effect and cannot accurately restore the direction of the actual sound.
  • A feasible solution is to set up microphones at both ears of a real person, an artificial head, or a similar artificial-head device for recording. Because the microphones are then in the same acoustic environment as the human ears, real stereo sound can be recorded. However, this approach is inconvenient, poorly portable, and costly.
  • Fig. 2 is a schematic diagram of a scene of recording sound through an artificial head.
  • The embodiments of the present application provide an audio signal processing method. After the recorded audio signal is processed with this method, an audio signal with enhanced stereo perception can be obtained, and the method does not require the cooperation of real persons or artificial heads; it therefore avoids the shortcomings of inconvenience, poor portability, and high cost.
  • FIG. 3 is a flowchart of an exemplary audio signal processing method provided by an embodiment of the present application. The method includes the following steps:
  • S301: Acquire an audio signal to be processed.
  • the audio signal to be processed includes audio components of multiple frequencies.
  • S302: Determine the sound source direction corresponding to each of the multiple audio components.
  • S303: According to the determined sound source directions, determine the sound-reception response difference information corresponding to the audio component of each frequency. The difference information includes the amplitude response difference and/or phase response difference of the first sound-receiving organ and the second sound-receiving organ to audio components of the same frequency transmitted in the same direction.
  • S304: According to the determined difference information of the sound-reception response, modulate the audio component of the corresponding frequency in the audio signal to be processed to generate a first audio signal for playing to the first sound-receiving organ and a second audio signal for playing to the second sound-receiving organ.
  • the sound-receiving organ may specifically be a human ear.
  • the first sound-receiving organ may be the left ear
  • the second sound-receiving organ may be the right ear.
  • the sound-receiving organ can also be an electronic device that mimics the human ear, such as artificial ears and hearing aids.
  • the sound receiving organ may correspond to one or more channels of the playback device.
  • the playback device is an earphone
  • the first sound receiving organ may correspond to the left channel of the earphone
  • the second sound receiving organ may correspond to the right channel of the earphone.
  • The first sound-receiving organ can also correspond to the left-channel speaker in a speaker set (for a four-channel system, to the left front and left rear speakers), and the second sound-receiving organ can correspond to the right-channel speaker in the speaker set (for a four-channel system, to the right front and right rear speakers).
  • In other words, any playback device with two or more sound channels has playback channels corresponding to the first sound-receiving organ and the second sound-receiving organ described in this application.
  • When the human ear distinguishes the direction of a sound, it mainly relies on the difference in the responses of the two ears to that sound.
  • Both the left and right ears can hear the sound, but because their positions differ, their responses differ as well. For example, if the sound source is on the left, the left ear will hear the sound before the right ear, and the amplitude heard by the left ear may be greater than that heard by the right ear.
  • This response difference is reflected in the difference between the two audio signals received at the left and right ears.
  • The response difference between the ears can be divided into an amplitude response difference and a phase response difference.
  • For sounds of different frequencies, the main cue used to distinguish direction may be a different kind of response difference.
  • Low-frequency sound, owing to its strong diffraction ability, easily bends around obstacles to reach both ears, so the amplitude difference between the two ears is small.
  • Therefore, the human ear mainly relies on the phase difference when distinguishing the direction of low-frequency sound.
  • For high-frequency sound, the phase difference produced at the two ears is aliased, so it is difficult to distinguish direction through the phase difference.
  • However, the diffraction ability of high-frequency sound is weak: it is blocked and reflected by the head, shoulders, and other parts of the body, so a noticeable amplitude difference arises between the ears, and direction is distinguished mainly through the amplitude difference.
  • In step S301, the frequency components of the audio signal to be processed can be analyzed to determine the audio components of the multiple frequencies it contains, so as to facilitate different subsequent processing of audio components of different frequencies.
  • There are many feasible ways to determine the frequency components of the audio signal to be processed.
  • For example, a Fourier transform can be performed on the audio signal to be processed to obtain its frequency spectrum, from which the audio component of each frequency can be determined.
  • Alternatively, a filtering method, a sub-band analysis method, or the like can also be used to determine the audio components of the multiple frequencies contained in the to-be-processed audio signal.
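A minimal sketch of the Fourier-transform route described above, assuming numpy; the function name `audio_components` is illustrative, not from the application:

```python
import numpy as np

def audio_components(x, fs):
    """Decompose one (short, quasi-stationary) audio frame into its
    frequency components via the discrete Fourier transform.

    Returns the bin frequencies in Hz and the complex spectrum; entry
    X[k] is the audio component at frequency freqs[k].
    """
    X = np.fft.rfft(x)                              # bins 0 .. N/2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, X

# A 440 Hz tone sampled at 8 kHz shows a spectral peak near 440 Hz.
fs = 8000
t = np.arange(1024) / fs
x = np.sin(2 * np.pi * 440 * t)
freqs, X = audio_components(x, fs)
peak_hz = freqs[np.argmax(np.abs(X))]
```

The peak lands on the bin nearest 440 Hz; the bin spacing here is fs/N = 7.8125 Hz.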
  • In step S302, the corresponding sound source direction may be determined for the audio component of each frequency of the audio signal to be processed.
  • In an implementation, a sound source localization algorithm can be applied to the audio signals collected by at least two microphones (for ease of reference, a microphone used to determine the sound source direction is referred to below as a directional microphone).
  • Specifically, the sound source direction of each frequency can be determined from the audio components of the corresponding frequency in the audio signals collected by the directional microphones.
  • Usable sound source localization algorithms include the beamforming algorithm, the time-difference-of-arrival estimation algorithm, and the differential microphone array algorithm.
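Of the localization algorithms listed, time-difference-of-arrival estimation is perhaps the simplest to sketch. The following is a toy two-microphone version assuming numpy; `tdoa_samples` is a hypothetical name, and a practical implementation would typically use a robust variant such as GCC-PHAT:

```python
import numpy as np

def tdoa_samples(x1, x2, max_lag):
    """Estimate the arrival-time difference (in samples) between two
    microphone signals as the lag maximising their cross-correlation.
    A positive result means the sound reaches mic 1 before mic 2."""
    lags = np.arange(-max_lag, max_lag + 1)
    corr = [np.dot(x1[max(0, -l):len(x1) - max(0, l)],
                   x2[max(0, l):len(x2) - max(0, -l)]) for l in lags]
    return int(lags[np.argmax(corr)])

# Synthetic check: mic 2 receives the same waveform 5 samples later.
rng = np.random.default_rng(0)
x1 = rng.standard_normal(256)
x2 = np.concatenate([np.zeros(5), x1[:-5]])
delay = tdoa_samples(x1, x2, max_lag=20)
```

The estimated delay, combined with the microphone spacing and the speed of sound, then yields the direction of arrival.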
  • In step S303, according to the determined sound source direction of each audio component, the sound-reception response difference information corresponding to that component can be determined; the determined difference information is then used in step S304 to modulate the audio component of the corresponding frequency.
  • The sound-reception response difference information characterizes the difference in amplitude response and/or phase response of the first sound-receiving organ and the second sound-receiving organ to audio components of the same frequency transmitted in the same direction. It can take a variety of forms; in one implementation, it may include a first transfer coefficient and a second transfer coefficient.
  • The first transfer coefficient may be determined according to the sound source direction and frequency corresponding to the audio component, together with a first correspondence relationship; specifically, the determined sound source direction and frequency may be substituted into the first correspondence relationship to calculate the first transfer coefficient.
  • The second transfer coefficient may be determined according to the sound source direction and frequency corresponding to the audio component, together with a second correspondence relationship; specifically, the determined sound source direction and frequency may be substituted into the second correspondence relationship to calculate the second transfer coefficient.
  • The audio component of the corresponding frequency in the audio signal to be processed may then be modulated by the first transfer coefficient to generate the first audio signal for playing to the first sound-receiving organ,
  • and modulated by the second transfer coefficient to generate the second audio signal for playing to the second sound-receiving organ.
  • The difference between the above-mentioned first correspondence and second correspondence reflects the difference in amplitude response and/or phase response of the first sound-receiving organ and the second sound-receiving organ to audio components of the same frequency transmitted in the same direction.
  • Accordingly, the difference between the first transfer coefficient determined from the first correspondence and the second transfer coefficient determined from the second correspondence also reflects this response difference between the two sound-receiving organs.
  • Furthermore, the difference between the first audio signal generated with the first transfer coefficient and the second audio signal generated with the second transfer coefficient likewise reflects this response difference.
  • When the first audio signal is played to the first sound-receiving organ and the second audio signal to the second, a clear three-dimensional impression can therefore be produced.
  • The first correspondence may be the correspondence between the sound source direction, the frequency, and the first transfer coefficient;
  • the second correspondence may be the correspondence between the sound source direction, the frequency, and the second transfer coefficient.
  • For example, in one implementation, if both correspondence relationships are expressed as functions, both may take the form H(φ, θ, k), where φ and θ are parameters characterizing the sound source direction (φ may represent the pitch angle and θ the circumferential angle), and k corresponds to the frequency number in the spectrum; that is, k substantially corresponds to the different frequencies.
  • By substituting the sound source direction and frequency into these functions, the transfer coefficient corresponding to the audio component can be determined.
  • The first correspondence and the second correspondence may also be expressed in terms of amplitude and phase.
  • In one example, the second transfer coefficient can be HR(φ, θ, k) = α · e^(jΔψ) · HL(φ, θ, k), where α is used to represent the amplitude gain of the right ear relative to the left ear, and Δψ is used to indicate the phase difference between the ears.
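Under the amplitude/phase form above, deriving the right-ear coefficient from the left-ear one is a single complex multiplication. A sketch assuming numpy; the function name and the concrete numbers are illustrative only:

```python
import numpy as np

def right_from_left(H_L, alpha, dpsi):
    """Derive the right-ear transfer coefficient for one frequency bin
    from the left-ear coefficient H_L, the interaural amplitude gain
    `alpha`, and the interaural phase difference `dpsi` (both are
    direction- and frequency-dependent in practice)."""
    return alpha * np.exp(1j * dpsi) * H_L

# Illustrative values: the right ear receives half the amplitude and a
# quarter-cycle phase shift relative to the left ear.
H_L = 0.8 + 0.0j
H_R = right_from_left(H_L, alpha=0.5, dpsi=np.pi / 2)
```

Here |H_R| = 0.4 and H_R is shifted 90 degrees in phase relative to H_L.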
  • In an implementation, the correspondence relationships can be obtained by measuring the sound-reception responses of the sound-receiving organs to test signals of various frequencies emitted by sound sources in different directions.
  • For example, microphones can be placed at the two ear canals of an artificial head or a real person, and test signals of various frequencies can be played from different directions.
  • For the test signal of each direction and frequency, the response signals collected by the two microphones are recorded separately.
  • The ratio of the response signal to the test signal can then be used as the dependent variable, with the sound source direction and frequency as independent variables, to fit the correspondence between sound source direction, frequency, and transfer coefficient.
  • The correspondence fitted from the response signals collected by the microphone located at the first sound-receiving organ can be regarded as the first correspondence,
  • and the correspondence fitted from the response signals collected by the microphone located at the second sound-receiving organ can be regarded as the second correspondence.
  • In another implementation, the correspondence relationships can also be determined from a propagation model of sound from the source to the human ears.
  • Specifically, characteristic parameters considered to affect the propagation of sound from the source to the ears can be selected in advance, including but not limited to: binaural distance, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters. From the pre-selected characteristic parameters, combined with the principles of acoustic propagation, the first correspondence and the second correspondence can be derived.
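One classical building block for such a propagation-model derivation (not named in the application itself, so treat this as an illustrative assumption) is the Woodworth spherical-head approximation of the interaural time difference; the interaural phase difference at angular frequency ω would then be ω times this delay. A sketch assuming numpy:

```python
import numpy as np

C = 343.0  # speed of sound in air, m/s

def woodworth_itd(theta, head_radius=0.0875):
    """Spherical-head (Woodworth) approximation of the interaural time
    difference, in seconds, for a far-field source at azimuth `theta`
    (radians, 0 = straight ahead; valid for |theta| <= pi/2). The head
    radius would follow from the binaural-distance parameter above."""
    return (head_radius / C) * (theta + np.sin(theta))

itd_front = woodworth_itd(0.0)        # source straight ahead: no delay
itd_side = woodworth_itd(np.pi / 2)   # source at 90 degrees: maximum delay
```

For an average head this gives a maximum delay of roughly 0.66 ms at 90 degrees.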
  • In step S304, the first transfer coefficient may be multiplied by the audio component of the corresponding frequency to obtain a new audio component of that frequency;
  • this new audio component may be referred to as the first audio component.
  • Similarly, the audio component of the corresponding frequency in the audio signal to be processed can be modulated by the second transfer coefficient to obtain another new audio component of the corresponding frequency, which may be called the second audio component.
  • The first transfer coefficient and the second transfer coefficient can each be used to modulate the audio components in the frequency domain, by multiplication.
  • Alternatively, the modulation can also be performed in the time domain.
  • Specifically, the first transfer coefficient and the second transfer coefficient can be converted into their time-domain forms,
  • the time-domain form of the first transfer coefficient is convolved with the audio signal to be processed,
  • and the time-domain form of the second transfer coefficient is convolved with the audio signal to be processed, to generate the required first audio signal and second audio signal.
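The equivalence between the two routes just described (per-bin multiplication in the frequency domain versus convolution with the transformed coefficients in the time domain) is the convolution theorem, and can be checked numerically. A sketch assuming numpy, using circular convolution on a single frame:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64
x = rng.standard_normal(N)                 # one audio frame
H = np.fft.fft(rng.standard_normal(N))     # transfer coefficients, one per bin

# Route 1: modulate in the frequency domain (multiply each bin).
y_freq = np.fft.ifft(np.fft.fft(x) * H)

# Route 2: convert the coefficients to the time domain, then circularly
# convolve them with the frame.
h = np.fft.ifft(H)
y_time = np.array([sum(x[m] * h[(n - m) % N] for m in range(N))
                   for n in range(N)])
```

Both routes produce the same frame, up to floating-point error.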
  • The audio signal processing method provided by the embodiments of the present application can be applied in a recording scene in which at least two microphones record at the same time, for example, a microphone array including multiple microphones. It can be understood that although the audio signals recorded by the microphones correspond to the same content (that is, they actually come from the same sound sources), the recorded audio signals differ because the microphone positions differ.
  • The audio signal to be processed, as the object on which the transfer coefficients act, can be selected relatively flexibly.
  • the audio signal to be processed may be determined based on the audio signal collected by a directional microphone (that is, the aforementioned microphone for determining the direction of the sound source).
  • the audio signal to be processed may be determined based on the audio signal collected by a microphone other than a directional microphone.
  • the microphone array may include 6 microphones, 3 of which may be selected as directional microphones, and the audio signal to be processed may be determined based on the audio signals collected by the other 3 microphones.
  • the audio signal to be processed may also be determined based on audio signals collected by other microphones outside the microphone array, and the other microphones may also be microphones on other devices.
  • In an implementation, the audio signal to be processed can be selected as a recorded audio signal with a relatively high signal-to-noise ratio.
  • For example, it may be the audio signal with the highest signal-to-noise ratio among those collected by the microphones. Since the transfer coefficients act directly on the audio components of the corresponding frequencies, in another optional implementation the audio components of the same frequency in the signals collected by the directional microphones can be linearly combined, so as to obtain components of each frequency with a relatively high signal-to-noise ratio for combination with the first transfer coefficient and the second transfer coefficient.
  • After modulation, the first audio components of each frequency can be transformed from the frequency domain to the time domain to obtain the first audio signal,
  • and the second audio components of each frequency can likewise be transformed to obtain the second audio signal.
  • The transformation from the frequency domain to the time domain can be implemented using the inverse Fourier transform.
  • The acquired audio signal to be processed is not necessarily a stationary signal, and the Fourier transform used in the subsequent spectrum analysis requires a stationary input.
  • Therefore, the audio signal to be processed may be divided into audio frames according to a set frame length. Because each frame is short, the signal within an audio frame can be considered stationary.
  • The subsequent processing is then performed frame by frame.
  • Specifically, each audio frame can be analyzed to obtain the audio components of each frequency; the first audio components of each frequency are transformed to obtain a new first audio frame, and the second audio components of each frequency are transformed to obtain a new second audio frame. Further, the new first audio frames are synthesized to obtain the first audio signal, and the new second audio frames are synthesized to obtain the second audio signal.
  • The frame length of an audio frame can usually be set flexibly based on experience. For example, if the sampling frequency of the microphone is fs and the number of sampling points in a frame is N, the frame length N can be chosen in the range 0.005·fs ≤ N ≤ fs. In an optional implementation, N is set to a power of 2, i.e., the number of sampling points in the audio frame is a power of 2, so that in the subsequent spectrum analysis the fast Fourier transform (FFT) can be used to accelerate the calculation.
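The frame-length rule above can be sketched as follows, assuming numpy; `frame_signal` is a hypothetical helper, and a 50% hop is chosen to match the overlap-add synthesis described later:

```python
import numpy as np

def frame_signal(x, N, hop):
    """Split signal x into frames of N samples at the given hop size;
    trailing samples that do not fill a whole frame are dropped."""
    starts = range(0, len(x) - N + 1, hop)
    return np.stack([x[s:s + N] for s in starts])

fs = 48000
# Choose N as a power of two inside the suggested range
# 0.005*fs <= N <= fs (here 0.005*fs = 240, so N = 256), so that the
# later spectrum analysis can use the FFT.
N = 1 << (int(np.log2(0.005 * fs)) + 1)
frames = frame_signal(np.arange(fs, dtype=float), N, hop=N // 2)
```

Each row of `frames` is one audio frame; consecutive frames overlap by N/2 samples.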
  • In addition, the audio frame can be modulated toward a periodic signal before the Fourier transform is performed.
  • The specific method may be to apply an analysis window to the audio frame, that is, to multiply the audio frame by the window function of the analysis window.
  • The window function of the analysis window can be a sine window, a Hanning window, etc., which is not specifically limited here.
  • When synthesizing the new audio frames into an audio signal, the overlap-add method can be used: the overlapping portions of consecutive audio frames are superimposed, and the superimposed frames are combined to obtain the required audio signal. Further, because directly overlapping and accumulating the frames may cause sudden amplitude changes at the overlapped parts, the amplitudes at both ends of each audio frame can first be tapered so that the audio signal obtained after overlap-add is smooth.
  • That is, each new audio frame obtained by the transformation may be windowed with a synthesis window, and the windowed audio frames are then used for the overlap-add.
  • The window function of the synthesis window may likewise be a sine window, a Hanning window, or the like.
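The analysis-window / synthesis-window / overlap-add pipeline just described can be sketched end to end. A minimal sketch assuming numpy, using the sine window mentioned above for both stages; this particular pair makes the overlapped squared windows sum to one at 50% overlap, so with identity modulation the interior of the signal is reconstructed exactly:

```python
import numpy as np

N, hop = 8, 4                        # frame length and 50 % overlap
n = np.arange(N)
w = np.sin(np.pi * (n + 0.5) / N)    # sine window (analysis and synthesis)

x = np.arange(32, dtype=float)       # toy "audio" signal
out = np.zeros_like(x)

for s in range(0, len(x) - N + 1, hop):
    frame = x[s:s + N] * w           # analysis window before the FFT
    spec = np.fft.rfft(frame)        # frequency-domain modulation would
    new_frame = np.fft.irfft(spec)   # happen here (identity in this sketch)
    out[s:s + N] += new_frame * w    # synthesis window, then overlap-add
```

Only the first and last few samples, covered by a single frame, deviate; every interior sample is recovered exactly.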
  • FIG. 4 is an algorithm block diagram of an exemplary audio signal processing method provided in an embodiment of the present application.
  • Each microphone in the microphone array can be used as a directional microphone.
  • First, the time-domain audio signal xm(t) collected by each microphone is divided into frames:
  • N sampling points are extracted as one audio frame to obtain the time-domain audio frame xm(n)l, where n is the sample index within the frame and l is the frame index.
  • The frequency spectrum Xm(k)l corresponding to each microphone is input into the sound source localization module, and the sound source direction corresponding to the audio component of each frequency is determined there through a microphone-array-based sound source localization algorithm.
  • For example, the beamforming algorithm can be used, where the sound source direction includes the circumferential angle θ and the pitch angle φ.
  • The most basic beamforming algorithm can be expressed as P(θi, k) = |w^H(θi, k) X(k)|², where:
  • θi represents a candidate sound source direction;
  • w(θi, k) is the weight vector, also called the steering vector, corresponding to direction θi at discrete frequency k.
  • The elements of the steering vector can be expressed as wm(θi, k) = e^(−jωτm(θi)), where ω is the sound source angular frequency corresponding to the discrete frequency k,
  • and τm is the difference between the time at which sound from direction θi reaches the m-th microphone and the time at which it reaches the reference point. The sound source direction corresponding to the k-th discrete-frequency audio component is then θ(k), that is, the direction in which the beamforming output is largest at that frequency.
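A toy numerical version of this direction scan, assuming numpy, a two-microphone far-field array, and hypothetical geometry (mic spacing, source frequency); a real implementation would scan both angles and process all frequency bins:

```python
import numpy as np

C = 343.0   # speed of sound, m/s
d = 0.1     # spacing of a hypothetical 2-mic linear array, m

def steering_delays(theta):
    """Far-field arrival-time offsets (s) of each microphone relative to
    mic 0 for a source at azimuth theta (0 = broadside)."""
    return np.array([0.0, d * np.sin(theta) / C])

def doa_for_bin(X, omega, thetas):
    """Scan candidate directions and return the one maximising the
    delay-and-sum beamformer output |w(theta)^H X|^2 for this bin."""
    powers = [abs(np.vdot(np.exp(-1j * omega * steering_delays(th)), X)) ** 2
              for th in thetas]
    return thetas[int(np.argmax(powers))]

# Simulate one frequency bin of a 500 Hz source at 30 degrees, then
# recover the direction by scanning in 1-degree steps.
omega = 2 * np.pi * 500
X = np.exp(-1j * omega * steering_delays(np.deg2rad(30.0)))
thetas = np.deg2rad(np.linspace(-90, 90, 181))
est_deg = np.rad2deg(doa_for_bin(X, omega, thetas))
```

The scan peaks exactly at the simulated 30-degree direction, since at 500 Hz the interaural phase stays well below one cycle and no aliasing occurs.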
  • The determined sound source direction θ(k), together with the pitch angle, and the discrete frequency k are used as input quantities to the transfer coefficient determination module.
  • In the transfer coefficient determination module, these input quantities are substituted into the first correspondence relationship and the second correspondence relationship, respectively, to obtain the first transfer coefficient HL(k)l and the second transfer coefficient HR(k)l (in this embodiment, the first correspondence relationship corresponds to the left channel, and the second correspondence relationship corresponds to the right channel).
  • The first transfer coefficient HL(k)l can then be multiplied by the audio component Xref(k)l of the corresponding frequency in the audio signal to be processed to obtain the new first audio component XL(k)l of each frequency, and the second transfer coefficient HR(k)l can be multiplied by the audio component Xref(k)l of the corresponding frequency to obtain the new second audio component XR(k)l of each frequency.
  • In this embodiment, the audio signal to be processed is a linear combination of the audio components of the same frequency in the audio signals collected by the microphones, which can be expressed as Xref(k)l = Σm wm · Xm(k)l,
  • where wm represents the weight of the m-th microphone and can be a real number or a complex number.
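The weighted combination above reduces to a one-line matrix product, assuming numpy; `reference_component` is a hypothetical name, and how the weights wm are chosen (e.g., to raise the signal-to-noise ratio) is left open, as in the text:

```python
import numpy as np

def reference_component(X_mics, weights):
    """Linearly combine per-microphone spectra.

    X_mics: (M, K) array, the K-bin spectrum of each of M microphones.
    weights: length-M sequence, real or complex.
    Returns the length-K combined reference spectrum Xref(k)."""
    return np.asarray(weights) @ np.asarray(X_mics)

# Equal weights over two mics observing the same spectrum leave it unchanged.
X = np.array([1 + 1j, 2.0, 0.5j])
Xref = reference_component(np.stack([X, X]), [0.5, 0.5])
```

With uncorrelated noise across microphones, such averaging lowers the noise power while preserving the common signal.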
  • For each first audio frame x'L(n)l obtained by the inverse transformation, windowing with the synthesis window is performed to obtain x"L(n)l; the windowed frames x"L(n)l are input to the overlap-add module, which produces the l-th frame x"'L(n)l of the first audio signal, and the frames x"'L(n)l are combined to obtain the complete first audio signal xL(t) for playing to the first sound-receiving organ.
  • Likewise, each second audio frame x'R(n)l is windowed with the synthesis window to obtain x"R(n)l; the windowed frames x"R(n)l are input to the overlap-add module to obtain the l-th frame x"'R(n)l of the second audio signal, and the frames x"'R(n)l are combined to obtain the complete second audio signal xR(t) for playing to the second sound-receiving organ.
  • the difference information of the reception response of the audio component has a corresponding relationship with the sound source direction and frequency of the audio component. But considering that the three-dimensional sense of sound is not only reflected in the clear direction, but also in the distance of the sound source, and the distance between the sound source is different, the response between the ears is also different, so it can be in the corresponding relationship Add the sound source distance as a variable, that is, the corresponding relationship can be form.
  • the sound source distance of each audio component may be determined, and then the difference information of the sound reception response of the audio component may be determined according to the determined sound source distance, sound source direction, and frequency.
  • the variables in the correspondence relationship may also include designated characteristic parameters, that is, the difference information of the audio component's reception response may be determined according to the designated characteristic parameters, the sound source direction, the sound source distance, and the frequency.
  • the specified characteristic parameters may include one or more of the following: distance between ears, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  • the device can interact with the user to enable the user to input relevant information, thereby obtaining the user's designated characteristic parameters.
  • image recognition technology can be used to identify the user's image to obtain the required specified characteristic parameters. For example, the distance between the ears of the user can be obtained through image ranging, and the characteristic parameters of the ear canal of the user can be captured through image feature recognition.
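The image-ranging idea can be sketched as follows; given pixel coordinates of the two ear landmarks produced by some recognition step, and a reference object of known physical size in the same image, the inter-ear distance is scaled from pixels to millimetres. All names, and the assumption that the ears and reference lie at comparable depth, are hypothetical:

```python
import math

def inter_ear_distance_mm(left_ear, right_ear, ref_px, ref_mm):
    """Estimate the inter-ear distance from image landmarks.

    left_ear, right_ear: (x, y) pixel coordinates of the two ear landmarks.
    ref_px, ref_mm: pixel span and physical span (mm) of a reference object.
    """
    dx = right_ear[0] - left_ear[0]
    dy = right_ear[1] - left_ear[1]
    dist_px = math.hypot(dx, dy)          # landmark distance in pixels
    return dist_px * (ref_mm / ref_px)    # convert pixels -> millimetres
```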
  • each audio component of the audio signal to be processed is modulated according to the difference information of the sound reception response corresponding to that audio component, where the difference information of the sound reception response represents the amplitude response difference and/or phase response difference of the first sound receiving organ and the second sound receiving organ to audio components of the same frequency transmitted in the same direction. After modulation according to this difference information, the generated first audio signal and second audio signal can also reflect this amplitude response difference and/or phase response difference; through this difference in response, a sufficient sense of space can be constructed, so that the human ear can clearly distinguish the direction of the sound.
  • Please refer to FIG. 5, which is a schematic structural diagram of an exemplary audio processing device provided by an embodiment of the present application.
  • the device includes:
  • the memory 501 is used to store computer programs
  • the processor 502 is configured to call the computer program, and when the computer program is executed by the processor, the following steps are implemented:
  • the difference information of the sound reception response corresponding to each of the audio components is determined, where the difference information of the sound reception response includes: the amplitude response difference and/or phase response difference of the first sound receiving organ and the second sound receiving organ to audio components of the same frequency transmitted in the same direction;
  • according to the difference information of the sound reception response, the audio component of the corresponding frequency in the to-be-processed audio signal is modulated to generate a first audio signal to be played at the first sound receiving organ and a second audio signal to be played at the second sound receiving organ.
  • the difference information of the sound reception response includes a first transfer coefficient and a second transfer coefficient
  • the first transfer coefficient is determined according to a sound source direction, frequency, and a first correspondence relationship corresponding to the audio component.
  • the second transfer coefficient is determined according to the sound source direction and frequency corresponding to the audio component, and a second correspondence.
  • the difference between the first correspondence and the second correspondence is used to characterize the difference in sound reception response between the first sound receiving organ and the second sound receiving organ.
  • the first correspondence relationship is obtained by measuring the reception response of the first sound receiving organ for test signals of each frequency emitted by sound sources in different directions;
  • the second correspondence relationship is obtained by measuring the sound reception response of the second sound receiving organ for test signals of various frequencies emitted by sound sources in different directions.
  • the first correspondence and the second correspondence are determined according to a propagation model from the sound source to the human ears, and the propagation model is established according to preselected characteristic parameters.
  • the preselected characteristic parameters include one or more of the following: distance between ears, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  • the sound source direction corresponding to each of the audio components is determined by a sound source localization algorithm using audio signals collected by at least two microphones.
  • the sound source direction corresponding to the audio component is determined according to the audio component of the corresponding frequency in the audio signal collected by each microphone.
  • the sound source localization algorithm is any one of the following: a beamforming algorithm, an arrival time difference estimation algorithm, and a differential microphone array algorithm.
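For the arrival-time-difference option, a minimal two-microphone sketch might look like the following; the plain cross-correlation peak search, the array spacing `d`, and the speed of sound `c` are illustrative assumptions (practical systems often use refinements such as GCC-PHAT):

```python
import numpy as np

def estimate_direction(sig1, sig2, fs, d, c=343.0):
    """Estimate the arrival angle (degrees from broadside) for a pair of
    microphones spaced d metres apart, from the time difference of arrival."""
    corr = np.correlate(sig1, sig2, mode="full")
    lag = np.argmax(corr) - (len(sig2) - 1)       # samples sig1 lags behind sig2
    tau = lag / fs                                # delay in seconds
    sin_theta = np.clip(c * tau / d, -1.0, 1.0)   # geometry: c*tau = d*sin(theta)
    return np.degrees(np.arcsin(sin_theta))
```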
  • the audio signal to be processed is obtained based on the audio signal collected by the microphone.
  • the audio signal to be processed is an audio signal with the highest signal-to-noise ratio among the audio signals collected by each microphone.
  • the audio components of each frequency of the audio signal to be processed are linear combinations of audio components of corresponding frequencies in the audio signals collected by each microphone.
  • the to-be-processed audio signal is obtained based on audio signals collected by a microphone other than the aforementioned microphones.
  • the audio component of the corresponding frequency is modulated according to the first transfer coefficient to obtain a new first audio component of each frequency, and the first audio signal is obtained by transforming the first audio component of each frequency;
  • the audio component of the corresponding frequency is modulated according to the second transfer coefficient to obtain a new second audio component of each frequency, and the second audio signal is obtained by transforming the second audio component of each frequency.
  • the audio signal to be processed is divided into multiple audio frames, and the audio component is an audio component contained in the audio frame;
  • a new first audio frame is obtained by transforming the first audio component of each frequency, and the first audio signal is synthesized using each of the first audio frames;
  • a new second audio frame is obtained by transforming the second audio component of each frequency, and the second audio signal is synthesized by using each of the second audio frames.
  • the number of sampling points included in the audio frame is a power of two.
  • the audio components corresponding to the frequencies contained in the audio frame are determined by the fast Fourier transform (FFT).
  • the processor is further configured to modulate the audio frame into a periodic signal before determining the audio components corresponding to each frequency contained in the audio frame.
  • when the processor modulates the audio frame into a periodic signal, it is specifically configured to add an analysis window to the audio frame.
  • when the processor uses each of the first audio frames to synthesize the first audio signal, it is specifically configured to combine each of the first audio frames through overlap-add processing to obtain the first audio signal;
  • when the processor uses each of the second audio frames to synthesize the second audio signal, it is specifically configured to combine each of the second audio frames through overlap-add processing to obtain the second audio signal.
  • the processor is further configured to, before performing processing by the overlap-add method, eliminate the amplitude distortion at both ends of the first audio frame and the second audio frame, respectively.
  • when respectively eliminating the amplitude distortion at both ends of the first audio frame and the second audio frame, the processor is specifically configured to separately add a synthesis window to the first audio frame and the second audio frame.
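The analysis-window/synthesis-window pairing can be illustrated with a round-trip sketch: using a square-root periodic Hann window both before the FFT and before overlap-add, the overlap-added window products sum to one at a 50% hop, so frame-boundary amplitude distortion cancels away from the signal edges. The specific window and hop are illustrative assumptions:

```python
import numpy as np

def stft_roundtrip(x, N=512):
    """Frame, analysis-window, FFT, inverse FFT, synthesis-window, and
    overlap-add a signal; away from the edges the output equals the input."""
    hop = N // 2
    n = np.arange(N)
    # square root of the periodic Hann window, used for analysis and synthesis
    win = np.sqrt(0.5 * (1.0 - np.cos(2.0 * np.pi * n / N)))
    y = np.zeros(len(x))
    for start in range(0, len(x) - N + 1, hop):
        frame = x[start:start + N] * win       # analysis window -> periodic frame
        spec = np.fft.rfft(frame)              # frequency-domain components
        rec = np.fft.irfft(spec, n=N) * win    # synthesis window
        y[start:start + N] += rec              # overlap-add
    return y
```

Since win² is the periodic Hann window, win²(n) + win²(n + N/2) = 1 exactly, which is why the overlap-added frames reconstruct the interior of the signal without amplitude ripple.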
  • the sound source direction includes: a circumferential angle and/or a pitch angle.
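A circumferential (azimuth) angle plus a pitch (elevation) angle fully determines a direction; a hypothetical helper converting the pair to a unit vector (the axis convention, azimuth in the horizontal plane from the x-axis and pitch above it, is assumed):

```python
import math

def direction_vector(azimuth_deg, pitch_deg):
    """Convert a (circumferential angle, pitch angle) pair, in degrees,
    to a unit direction vector (x, y, z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(pitch_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))
```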
  • the processor is further configured to determine a sound source distance of each of the audio components, where the sound source distance is used to determine the difference information of the reception response of the audio component of the corresponding frequency.
  • the processor is further configured to obtain designated characteristic parameters of the user's ears and the area around the ears, where the designated characteristic parameters are used to determine the difference information of the reception response of the audio component of the corresponding frequency.
  • the specified characteristic parameter is obtained by performing image recognition on the user.
  • the designated characteristic parameters include one or more of the following: distance between ears, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  • FIG. 6 is a schematic structural diagram of an exemplary recording device provided by an embodiment of the present application.
  • the recording device includes: at least two microphones 601, a memory 602, and a processor 603;
  • the microphone 601 is used to collect audio signals
  • the memory 602 is used to store computer programs
  • the processor 603 is configured to call the computer program, and when the processor executes the computer program, the following steps are implemented:
  • the difference information of the sound reception response corresponding to each of the audio components is determined, where the difference information of the sound reception response includes: the amplitude response difference and/or phase response difference of the first sound receiving organ and the second sound receiving organ to audio components of the same frequency transmitted in the same direction;
  • according to the difference information of the sound reception response, the audio component of the corresponding frequency in the to-be-processed audio signal is modulated to generate a first audio signal to be played at the first sound receiving organ and a second audio signal to be played at the second sound receiving organ.
  • the difference information of the sound reception response includes a first transfer coefficient and a second transfer coefficient
  • the first transfer coefficient is determined according to a sound source direction, frequency, and a first correspondence relationship corresponding to the audio component.
  • the second transfer coefficient is determined according to the sound source direction and frequency corresponding to the audio component, and a second correspondence.
  • the difference between the first correspondence and the second correspondence is used to characterize the difference in sound reception response between the first sound receiving organ and the second sound receiving organ.
  • the first correspondence relationship is obtained by respectively measuring the sound reception response of the first sound receiving organ for test signals of each frequency emitted by sound sources in different directions;
  • the second correspondence relationship is obtained by measuring the sound reception response of the second sound receiving organ for test signals of various frequencies emitted by sound sources in different directions.
  • the first correspondence and the second correspondence are determined according to a propagation model from the sound source to the human ears, and the propagation model is established according to preselected characteristic parameters.
  • the preselected characteristic parameters include one or more of the following: distance between ears, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  • the sound source direction corresponding to the audio component is determined according to the audio component of the corresponding frequency in the audio signal collected by each microphone.
  • the sound source localization algorithm is any one of the following: a beamforming algorithm, an arrival time difference estimation algorithm, and a differential microphone array algorithm.
  • the audio signal to be processed is obtained based on the audio signal collected by the microphone.
  • the audio signal to be processed is an audio signal with the highest signal-to-noise ratio among the audio signals collected by each microphone.
  • the audio components of each frequency of the audio signal to be processed are linear combinations of audio components of corresponding frequencies in the audio signals collected by each microphone.
  • the to-be-processed audio signal is obtained based on audio signals collected by a microphone other than the aforementioned microphones.
  • the audio component of the corresponding frequency is modulated according to the first transfer coefficient to obtain a new first audio component of each frequency, and the first audio signal is obtained by transforming the first audio component of each frequency;
  • the audio component of the corresponding frequency is modulated according to the second transfer coefficient to obtain a new second audio component of each frequency, and the second audio signal is obtained by transforming the second audio component of each frequency.
  • the audio signal to be processed is divided into multiple audio frames, and the audio component is an audio component contained in the audio frame;
  • a new first audio frame is obtained by transforming the first audio component of each frequency, and the first audio signal is synthesized using each of the first audio frames;
  • a new second audio frame is obtained by transforming the second audio component of each frequency, and the second audio signal is synthesized by using each of the second audio frames.
  • the number of sampling points included in the audio frame is a power of two.
  • the audio components corresponding to each frequency contained in the audio frame are determined by the fast Fourier transform (FFT).
  • the processor is further configured to modulate the audio frame into a periodic signal before determining the audio components corresponding to each frequency contained in the audio frame.
  • when the processor modulates the audio frame into a periodic signal, it is specifically configured to add an analysis window to the audio frame.
  • when the processor uses each of the first audio frames to synthesize the first audio signal, it is specifically configured to combine each of the first audio frames through overlap-add processing to obtain the first audio signal;
  • when the processor uses each of the second audio frames to synthesize the second audio signal, it is specifically configured to combine each of the second audio frames through overlap-add processing to obtain the second audio signal.
  • the processor is further configured to, before performing processing by the overlap-add method, eliminate the amplitude distortion at both ends of the first audio frame and the second audio frame, respectively.
  • when respectively eliminating the amplitude distortion at both ends of the first audio frame and the second audio frame, the processor is specifically configured to separately add a synthesis window to the first audio frame and the second audio frame.
  • the sound source direction includes: a circumferential angle and/or a pitch angle.
  • the processor is further configured to determine a sound source distance of each of the audio components, where the sound source distance is used to determine the difference information of the reception response of the audio component of the corresponding frequency.
  • the processor is further configured to obtain designated characteristic parameters of the user's ears and the area around the ears, where the designated characteristic parameters are used to determine the difference information of the reception response of the audio component of the corresponding frequency.
  • the specified characteristic parameter is obtained by performing image recognition on the user.
  • the designated characteristic parameters include one or more of the following: distance between ears, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  • the recording device can be provided with a port for connecting with other external devices; by connecting to other devices through this port, audio signals can be obtained from those devices, and the to-be-processed audio signal can be determined based on the audio signals obtained from the other devices.
  • the recording device is an electronic device with a recording function, which can specifically be any of the following: a mobile phone, a camera, a video camera, a sports camera, a pan-tilt camera, a speaker, a VR headset, a monitor, a voice recorder, and a microphone.
  • the embodiments of the present application also provide a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the audio signal processing method under any of the implementation manners provided in the embodiments of the present application can be implemented.
  • the embodiments of the present application may adopt the form of a computer program product implemented on one or more storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing program codes.
  • Computer usable storage media include permanent and non-permanent, removable and non-removable media, and information storage can be realized by any method or technology.
  • the information can be computer-readable instructions, data structures, program modules, or other data.
  • Examples of computer storage media include, but are not limited to: phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible by computing devices.


Abstract

An audio signal processing method, an audio processing device, and a recording apparatus. The method comprises: acquiring an audio signal to be processed (S301), the audio signal comprising audio components of multiple frequencies; determining a corresponding sound source direction for each of the multiple audio components (S302); determining, according to the sound source directions, sound-reception response difference information corresponding to the respective audio components (S303), wherein the reception response difference information comprises an amplitude response difference and/or a phase response difference between a first sound reception device and a second sound reception device when responding to audio components of the same frequency transmitted from the same direction; and modulating, according to the sound-reception response difference information, the respective audio components of the corresponding frequencies in the audio signal, and generating a first audio signal to be played by the first sound reception device and a second audio signal to be played by the second sound reception device (S304). The audio signal processing method solves the technical problem in which audio recorded by existing recording apparatuses has poor stereo performance.

Description

Audio signal processing method, audio processing device, and recording apparatus

Technical field
This application relates to the field of signal processing technology, and in particular to an audio signal processing method, audio processing device, recording device, and computer-readable storage medium.
Background
The recording function is provided on many electronic products, such as voice recorders, cameras, and camcorders. In order to enable the recorded audio to restore the stereoscopic sense of the sound heard by human ears in reality, electronic devices with a recording function usually imitate human ears by placing microphones on the left and right sides for recording. As shown in FIG. 1, a schematic structural diagram of an existing digital camera, the digital camera is equipped with dual microphones.
However, even with left and right microphones set up for recording, the structural layout of the dual microphones still differs significantly from that of the human ears. Therefore, there remains a large gap in stereoscopic sense between the recorded audio and the sound actually heard by the human ears: when playing back the recorded audio, the user cannot accurately distinguish the direction of the sound, and the stereo perception is poor.
Summary of the invention
In order to solve the above-mentioned problems, the embodiments of the present application provide an audio signal processing method, an audio processing device, a recording device, and a computer-readable storage medium, to solve the technical problem of poor stereo perception in audio recorded by existing recording devices.
The first aspect of the embodiments of the present application provides an audio signal processing method, including:
acquiring an audio signal to be processed, where the audio signal to be processed includes audio components of multiple frequencies;
determining the sound source direction corresponding to each of the multiple audio components; determining, according to the sound source direction, the sound reception response difference information corresponding to each audio component, where the sound reception response difference information includes: the amplitude response difference and/or phase response difference of a first sound receiving organ and a second sound receiving organ to audio components of the same frequency transmitted in the same direction;
modulating, according to the sound reception response difference information, the audio components of the corresponding frequencies in the audio signal to be processed, to generate a first audio signal to be played at the first sound receiving organ and a second audio signal to be played at the second sound receiving organ.
A second aspect of the embodiments of the present application provides an audio processing device, including:
a memory, configured to store a computer program;
a processor, configured to call the computer program; when the computer program is executed by the processor, the following steps are implemented:
acquiring an audio signal to be processed, where the audio signal to be processed includes audio components of multiple frequencies;
determining the sound source direction corresponding to each of the multiple audio components; determining, according to the sound source direction, the sound reception response difference information corresponding to each audio component, where the sound reception response difference information includes: the amplitude response difference and/or phase response difference of a first sound receiving organ and a second sound receiving organ to audio components of the same frequency transmitted in the same direction;
modulating, according to the sound reception response difference information, the audio components of the corresponding frequencies in the audio signal to be processed, to generate a first audio signal to be played at the first sound receiving organ and a second audio signal to be played at the second sound receiving organ.
A third aspect of the embodiments of the present application provides a recording device, including: at least two microphones, a memory, and a processor;
the microphones are configured to collect audio signals;
the memory is configured to store a computer program;
the processor is configured to call the computer program; when the processor executes the computer program, the following steps are implemented:
acquiring an audio signal to be processed, where the audio signal to be processed includes audio components of multiple frequencies;
using the audio signals collected by the microphones, determining the sound source direction corresponding to each of the multiple audio components through a sound source localization algorithm; determining, according to the sound source direction, the sound reception response difference information corresponding to each audio component, where the sound reception response difference information includes: the amplitude response difference and/or phase response difference of a first sound receiving organ and a second sound receiving organ to audio components of the same frequency transmitted in the same direction;
modulating, according to the sound reception response difference information, the audio components of the corresponding frequencies in the audio signal to be processed, to generate a first audio signal to be played at the first sound receiving organ and a second audio signal to be played at the second sound receiving organ.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, any one of the audio signal processing methods provided in the first aspect above is implemented.
In the audio signal processing method provided by the embodiments of the present application, each audio component of the audio signal to be processed is modulated according to the sound reception response difference information corresponding to that audio component, where the sound reception response difference information represents the amplitude response difference and/or phase response difference of the first sound receiving organ and the second sound receiving organ to audio components of the same frequency transmitted in the same direction. Therefore, after modulation according to the sound reception response difference information, this amplitude response difference and/or phase response difference can also be reflected between the generated first audio signal and second audio signal; through this difference in response, a sufficient sense of space can be constructed, so that the human ear can clearly distinguish the direction of the sound.
Description of the drawings
In order to more clearly describe the technical solutions in the embodiments of the present application, the following briefly introduces the drawings needed in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative labor.
FIG. 1 is a schematic structural diagram of an existing digital camera.
FIG. 2 is a schematic diagram of a scene of recording sound through an artificial head.
FIG. 3 is a flowchart of an exemplary audio signal processing method provided by an embodiment of the present application.
FIG. 4 is an algorithm block diagram of an exemplary audio signal processing method provided by an embodiment of the present application.
FIG. 5 is a schematic structural diagram of an exemplary audio processing device provided by an embodiment of the present application.
FIG. 6 is a schematic structural diagram of an exemplary recording device provided by an embodiment of the present application.
Detailed description of the embodiments
The technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.
In order to record audio with a stereoscopic sense, existing electronic devices with a recording function are provided with left and right microphones that imitate human ears. However, because the structural layout of the dual microphones is still very different from that of the human ears, the audio recorded by the dual microphones does not have a sufficient stereoscopic sense and cannot accurately and clearly restore the actual direction of the sound.
为解决上述问题,一种可行的方案是,可以在真人、人工头或类人工头装置的双耳处设置麦克风进行录音,如此,由于麦克风所处的环境与人耳相同,因此能够录制到有真实立体感的立体声。可以参见图2,图2是通过人工头录制声音的场景示意图。To solve the above problems, one feasible solution is to place microphones at both ears of a real person, an artificial head, or an artificial-head-like device for recording. In this way, because the microphones are in the same acoustic environment as human ears, stereo sound with a realistic sense of space can be recorded. Refer to Fig. 2, which is a schematic diagram of a scene of recording sound through an artificial head.
上述的方案虽然可行,但其需要借助真人、人工头或类人工头装置,因此录音的简便程度大大下降,且如人工头等装置也存在便携性差、使用成本高的缺点。Although the above solution is feasible, it requires the help of a real person, an artificial head or a similar artificial head device, so the ease of recording is greatly reduced, and devices such as artificial heads also have the disadvantages of poor portability and high cost of use.
基于上述问题,本申请实施例提供了一种音频信号处理方法,使用该方法对录制的音频信号进行处理后,可以得到立体感增强的音频信号,并且该方法不需要真人、人工头等装置的配合,因此也没有不简便、便携性差、成本高的缺点。可以参见图3,图3是本申请实施例提供的一种示例性的音频信号处理方法的流程图。该方法包括以下步骤:Based on the above problems, the embodiments of the present application provide an audio signal processing method. After using this method to process the recorded audio signal, an audio signal with enhanced stereo perception can be obtained, and the method does not require the cooperation of real persons, artificial heads, etc. Therefore, there are no shortcomings of inconvenience, poor portability, and high cost. Refer to FIG. 3, which is a flowchart of an exemplary audio signal processing method provided by an embodiment of the present application. The method includes the following steps:
S301:获取待处理音频信号。S301: Acquire an audio signal to be processed.
其中,待处理音频信号包括多个频率的音频分量。Among them, the audio signal to be processed includes audio components of multiple frequencies.
S302:确定多个音频分量中每一个对应的声源方向。S302: Determine the sound source direction corresponding to each of the multiple audio components.
S303:根据确定出的声源方向,确定每一个音频分量对应的收音响应差异信息。S303: According to the determined sound source direction, determine the difference information of the reception response corresponding to each audio component.
其中,收音响应差异信息包括:第一收音器官和第二收音器官对同一方向传来的同一频率的音频分量的幅度响应差异和/或相位响应差异。Wherein, the difference in sound reception response information includes: difference in amplitude response and/or difference in phase response of the first sound receiving organ and the second sound receiving organ to audio components of the same frequency transmitted in the same direction.
S304:根据确定出的收音响应差异信息,对待处理音频信号中相应频率的音频分量进行调制,以生成用于在第一收音器官播放的第一音频信号,和用于在第二收音器官播放的第二音频信号。S304: According to the determined sound reception response difference information, modulate the audio components of the corresponding frequencies in the audio signal to be processed, so as to generate a first audio signal for playback at the first sound-receiving organ and a second audio signal for playback at the second sound-receiving organ.
需要说明的是,收音器官具体可以是人的耳朵,比如第一收音器官可以是左耳,第二收音器官可以是右耳。当然,收音器官也可以是人工耳、助听器等仿人耳的电子设备。而在具体场景中,收音器官可以对应播放设备的一个或多个声道。比如,若播放设备是耳机,第一收音器官可以对应耳机的左声道,第二收音器官可以对应耳机的右声道。若播放设备是音响套组,第一收音器官还可以对应音箱套组中的左声道音箱(若是四声道立体声,第一收音器官可以对应左前和左后音箱),第二收音器官可以对应音箱套组中的右声道音箱(若是四声道立体声,第二收音器官可以对应右前和右后音箱)。只要具有两个或以上声道的播放设备,都具有与本申请所述的第一收音器官与第二收音器官对应的播放声道。It should be noted that a sound-receiving organ may specifically be a human ear; for example, the first sound-receiving organ may be the left ear and the second sound-receiving organ may be the right ear. Of course, a sound-receiving organ may also be an electronic device imitating the human ear, such as an artificial ear or a hearing aid. In a specific scenario, a sound-receiving organ may correspond to one or more channels of a playback device. For example, if the playback device is an earphone, the first sound-receiving organ may correspond to the left channel of the earphone and the second sound-receiving organ to the right channel. If the playback device is a speaker set, the first sound-receiving organ may correspond to the left-channel speaker in the set (for four-channel stereo, the front-left and rear-left speakers), and the second sound-receiving organ may correspond to the right-channel speaker in the set (for four-channel stereo, the front-right and rear-right speakers). Any playback device having two or more channels thus has playback channels corresponding to the first sound-receiving organ and the second sound-receiving organ described in this application.
申请人发现,人耳在分辨声音的方向时,主要依靠双耳对声音的响应差异。当一个声音传播过来时,人的左右耳都能收听到该声音,而由于左右耳所处的位置不同,因此两者之间的响应也有不同。比如,若声源在左边,左耳会先于右耳收听到声音,并且左耳听到的声音幅度可能也大于右耳。通过左右耳之间的响应的不同,人可以分辨出声音的方向,相应的,若能够将这种响应差异体现在两个音频的差异上,并且将两个音频分别在人的左耳和右耳播放,则人在收听时,可以感知到清晰的立体感,能够准确分辨出其中声音的方向。The applicant found that in judging the direction of a sound, the human ear relies mainly on the difference between the two ears' responses to that sound. When a sound arrives, both the left and right ears hear it, but because the two ears are located at different positions, their responses differ. For example, if the sound source is on the left, the left ear hears the sound before the right ear, and the amplitude heard by the left ear may also be greater. Through this difference between the responses of the left and right ears, a person can judge the direction of the sound. Correspondingly, if this response difference can be embodied in the difference between two audio signals, and the two signals are played at the person's left ear and right ear respectively, the listener perceives a clear stereoscopic effect and can accurately judge the direction of the sounds in the audio.
申请人还发现,由于人双耳之间的响应差异可以分为幅值响应差异与相位响应差异,而对于不同频率的声音,人耳在分辨其方向时,作为主要依据的可能是不同的响应差异。比如,对于低频声音,由于其绕射能力强,低频声音容易绕过障碍物到达双耳,因此双耳之间的幅度差异较小,人耳在分辨低频声音的方向时主要依靠相位差异。而对于高频声音,由于其在双耳处产生的相位差已经混叠,因此难以通过相位差异分辨其方向,但高频声音的绕射能力弱,且对人的头部、肩膀等部分有特定的衍射效果,因此可以通过幅度差异来分辨其方向。可见,为增强音频信号的立体感,对不同频率的音频信号可以做不同处理,具体在本申请实施例中,可以针对音频信号不同频率的分量做不同的处理。The applicant also found that the binaural response difference can be divided into an amplitude response difference and a phase response difference, and that for sounds of different frequencies, the human ear may rely primarily on different kinds of response difference when judging direction. For example, low-frequency sound diffracts strongly and easily bends around obstacles to reach both ears, so the amplitude difference between the two ears is small, and the human ear mainly relies on the phase difference when judging the direction of low-frequency sound. For high-frequency sound, the phase difference produced at the two ears is already aliased, so its direction is hard to judge from phase; however, high-frequency sound diffracts weakly and interacts in a characteristic way with the head, shoulders and other parts of the body, so its direction can be judged from the amplitude difference. It follows that, to enhance the stereoscopic effect of an audio signal, audio signals of different frequencies can be processed differently; specifically, in the embodiments of the present application, different processing can be applied to the components of different frequencies of the audio signal.
在步骤S301中,获取待处理音频信号之后,可以分析该待处理音频信号的频率成分,确定出该待处理音频信号中包括的多个频率的音频分量,以便于后续针对不同频率的音频分量进行不同的处理。确定待处理音频信号的频率成分,有多种可行的方式。在一种实施中,可以对该待处理音频信号进行傅里叶变换,得到该待处理音频信号的频谱,确定出该待处理音频信号中的各频率的音频分量。除了使用傅里叶变换之外,还可以采用滤波法、子带分析法等,同样可以确定出该待处理音频信号中包括的多个频率的音频分量。In step S301, after the audio signal to be processed is acquired, its frequency composition can be analyzed to determine the audio components of the multiple frequencies it contains, so that different processing can subsequently be applied to the audio components of different frequencies. There are several feasible ways to determine the frequency composition of the audio signal to be processed. In one implementation, a Fourier transform can be applied to the audio signal to be processed to obtain its spectrum and thereby determine the audio component of each frequency. Besides the Fourier transform, filtering methods, subband analysis and the like can also be used to determine the audio components of the multiple frequencies included in the audio signal to be processed.
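As an illustration of the spectrum-analysis step described above, the following minimal Python sketch obtains the per-frequency audio components of a signal with a Fourier transform. The sampling rate and the two test tones are invented for the example and are not part of the patent:

```python
import numpy as np

fs = 8000                       # sampling rate (assumed for illustration)
t = np.arange(fs) / fs          # 1 second of samples
# toy "audio signal to be processed": a 440 Hz tone mixed with a 1000 Hz tone
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

X = np.fft.rfft(x)                           # one-sided spectrum
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)  # frequency of each bin

# the two strongest frequency components recovered from the spectrum
top2 = sorted(freqs[np.argsort(np.abs(X))[-2:]])
print(top2)
```

With tones that fall exactly on FFT bins, as here, the two mixed frequencies are recovered directly from the spectrum magnitudes.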
在步骤S302中,可以针对待处理音频信号的各频率的音频分量,确定其对应的声源方向。在确定声源方向时,可以利用至少两个麦克风(如麦克风阵列)采集的音频信号,通过声源定位算法确定(为方便指代,下面将用于确定声源方向的麦克风称为定向麦克风)。具体的,可以根据各个定向麦克风采集的音频信号中相应频率的音频分量进行确定。可选的声源定位算法有多种,比如有波束形成算法、到达时间差估计算法、差分麦克风阵列算法。In step S302, the corresponding sound source direction can be determined for the audio component of each frequency of the audio signal to be processed. The sound source direction can be determined from the audio signals collected by at least two microphones (such as a microphone array) using a sound source localization algorithm (for ease of reference, the microphones used to determine the sound source direction are referred to below as directional microphones). Specifically, the determination can be made from the audio components of the corresponding frequency in the audio signals collected by each directional microphone. Various sound source localization algorithms are available, such as beamforming algorithms, time-difference-of-arrival estimation algorithms, and differential microphone array algorithms.
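One of the localization approaches named above, time-difference-of-arrival estimation, can be sketched for a two-microphone case as follows. The sampling rate, microphone spacing, and sample delay are invented for illustration; a broadband signal is used, whereas the method in the text would apply such a direction estimate per frequency component:

```python
import numpy as np

fs = 8000            # sampling rate (assumed)
c = 343.0            # speed of sound, m/s
d = 0.2              # microphone spacing, m (assumed geometry)

rng = np.random.default_rng(0)
s = rng.standard_normal(4096)            # broadband source signal

delay = 3                                # samples: the source is closer to mic 1
x1 = s
x2 = np.concatenate([np.zeros(delay), s[:-delay]])

# TDOA estimate: lag of the cross-correlation peak between the two mics
corr = np.correlate(x2, x1, mode="full")
lag = int(np.argmax(corr)) - (len(x1) - 1)    # positive: x2 lags x1

tau = lag / fs                                # time difference in seconds
# far-field model: tau = d * cos(theta) / c  ->  direction angle
theta = np.degrees(np.arccos(np.clip(tau * c / d, -1.0, 1.0)))
print(lag, round(theta, 1))
```

The cross-correlation peak recovers the 3-sample delay, from which the far-field bearing follows; with these assumed numbers the angle comes out near 50 degrees.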
在步骤S303中,可以根据确定出的音频分量的声源方向,确定该音频分量对应的收音响应差异信息,而确定出的收音响应差异信息可以用于在步骤S304中对相应频率的音频分量进行调制。In step S303, the sound reception response difference information corresponding to an audio component can be determined from that component's determined sound source direction, and the determined difference information can then be used in step S304 to modulate the audio component of the corresponding frequency.
收音响应差异信息是用于表征第一收音器官和第二收音器官对同一方向传来的同一频率的音频分量的幅度响应差异和/或相位响应差异的信息,其可以有多种表现形式,比如,在一种实施中,收音响应差异信息可以包括第一传递系数与第二传递系数。The sound reception response difference information characterizes the difference in amplitude response and/or phase response of the first sound-receiving organ and the second sound-receiving organ to audio components of the same frequency arriving from the same direction. It can take various forms; for example, in one implementation, it may include a first transfer coefficient and a second transfer coefficient.
其中,第一传递系数可以根据音频分量对应的声源方向、频率与第一对应关系确定,具体的,可以将确定的声源方向与相应的频率代入第一对应关系,计算得到第一传递系数。第二传递系数可以根据音频分量对应的声源方向、频率与第二对应关系确定,具体的,可以将确定的声源方向与相应的频率代入第二对应关系,计算得到第二传递系数。The first transfer coefficient may be determined from the sound source direction and frequency corresponding to the audio component together with a first correspondence; specifically, the determined sound source direction and the corresponding frequency may be substituted into the first correspondence to calculate the first transfer coefficient. Likewise, the second transfer coefficient may be determined from the sound source direction and frequency corresponding to the audio component together with a second correspondence; specifically, the determined sound source direction and the corresponding frequency may be substituted into the second correspondence to calculate the second transfer coefficient.
进一步的,在步骤S304中,可以通过第一传递系数对待处理音频信号中相应频率的音频分量进行调制,以生成用于在第一收音器官播放的第一音频信号,通过第二传递系数对待处理音频信号中相应频率的音频分量进行调制,以生成用于在第二收音器官播放的第二音频信号。Further, in step S304, the audio component of the corresponding frequency in the audio signal to be processed may be modulated by the first transfer coefficient to generate a first audio signal for playback at the first sound-receiving organ, and modulated by the second transfer coefficient to generate a second audio signal for playback at the second sound-receiving organ.
需要说明的是,上述的第一对应关系与第二对应关系之间的差别可以体现第一收音器官和第二收音器官对同一方向传来的同一频率的音频分量的幅度响应差异和/或相位响应差异。如此,基于第一对应关系确定的第一传递系数与基于第二对应关系确定的第二传递系数,两传递系数之间的差异也可以体现出上述的第一收音器官与第二收音器官之间的响应差异。进而,基于第一传递系数生成的第一音频信号与基于第二传递系数生成的第二音频信号,两音频信号之间的差异也可以体现这种第一收音器官与第二收音器官之间的响应差异,则当第一音频信号被第一收音器官收听,第二音频信号被第二收音器官收听时,可以产生清晰的立体感。It should be noted that the difference between the above first correspondence and second correspondence can reflect the difference in amplitude response and/or phase response of the first sound-receiving organ and the second sound-receiving organ to audio components of the same frequency arriving from the same direction. Accordingly, the difference between the first transfer coefficient determined from the first correspondence and the second transfer coefficient determined from the second correspondence can likewise reflect this binaural response difference. In turn, the difference between the first audio signal generated from the first transfer coefficient and the second audio signal generated from the second transfer coefficient can also embody this response difference, so that when the first audio signal is heard by the first sound-receiving organ and the second audio signal by the second sound-receiving organ, a clear stereoscopic effect is produced.
关于第一对应关系与第二对应关系,具体的,第一对应关系可以是声源方向、频率与第一传递系数的对应关系,第二对应关系可以是声源方向、频率与第二传递系数的对应关系。比如,在一种实施中,若第一对应关系与第二对应关系均用函数的形式表示,则第一对应关系与第二对应关系均可以是 T(φ, θ, k) 的形式,其中 φ 与 θ 属于表征声源方向的参数,φ 可以表示俯仰角,θ 可以表示周角,而 k 可以对应频谱中的频率序号,即 k 实质上对应不同的频率。在确定某一频率的音频分量的传递系数时,输入该音频分量的声源方向 φ 与 θ 以及对应的频率序号 k,即可确定该音频分量对应的传递系数。Regarding the first correspondence and the second correspondence: specifically, the first correspondence may relate the sound source direction and frequency to the first transfer coefficient, and the second correspondence may relate the sound source direction and frequency to the second transfer coefficient. For example, in one implementation, if both correspondences are expressed as functions, each may take the form T(φ, θ, k), where φ and θ are parameters characterizing the sound source direction (φ may denote the pitch angle and θ the azimuth angle), and k corresponds to the frequency index in the spectrum, i.e., k essentially corresponds to a particular frequency. When determining the transfer coefficient of an audio component of a certain frequency, inputting that component's sound source direction φ and θ together with the corresponding frequency index k yields the transfer coefficient for that component.
与双耳的响应差异包括相位差异与幅值差异相对应,第一对应关系与第二对应关系也可以包括幅值与相位两部分。在一个实施例中,若将第一对应关系对应左耳,且认为左耳的响应是标准响应,则某一频率的音频分量的第一传递系数可以是 T_L = 1,而其对应右耳的第二传递系数,在一个例子中,可以是 T_R = α·e^{jΔφ},其中,α 用于表示右耳相对于左耳的幅度增益,Δφ 用于表示双耳之间的相位差。Corresponding to the binaural response difference comprising a phase difference and an amplitude difference, the first correspondence and the second correspondence may likewise comprise an amplitude part and a phase part. In one embodiment, if the first correspondence corresponds to the left ear and the left ear's response is taken as the reference, the first transfer coefficient of an audio component of a certain frequency may be T_L = 1, and the corresponding second transfer coefficient for the right ear may, in one example, be T_R = α·e^{jΔφ}, where α denotes the amplitude gain of the right ear relative to the left ear and Δφ denotes the phase difference between the two ears.
对于第一对应关系与第二对应关系的确定,有多种可选的实施方式。在一种实施中,可以针对不同方向的声源发出的各频率的测试信号分别测量收音器官的收音响应得到。比如,在一个具体的例子中,可以在人工头或真人的两个耳道处设置麦克风,在各个不同的方向分别播放各个频率的测试信号,针对每个方向、每个频率的测试信号,分别记录两个麦克风采集的响应信号。进一步的,可以将响应信号与测试信号的比值作为因变量,将声源方向和频率作为自变量,拟合出声源方向、频率与传递系数的对应关系。可以将声源方向、频率与位于第一收音器官的麦克风采集的响应信号的对应关系作为第一对应关系,将声源方向、频率与位于第二收音器官的麦克风采集的响应信号的对应关系作为第二对应关系。There are several optional ways to determine the first correspondence and the second correspondence. In one implementation, they can be obtained by measuring the reception responses of the sound-receiving organs to test signals of each frequency emitted by sound sources in different directions. For example, in a specific example, microphones can be placed in the two ear canals of an artificial head or a real person, and test signals of each frequency can be played from each of several directions; for the test signal of each direction and each frequency, the response signals collected by the two microphones are recorded separately. Further, taking the ratio of the response signal to the test signal as the dependent variable and the sound source direction and frequency as the independent variables, the correspondence between sound source direction, frequency and transfer coefficient can be fitted. The correspondence between sound source direction, frequency and the response signal collected by the microphone at the first sound-receiving organ can be taken as the first correspondence, and the correspondence between sound source direction, frequency and the response signal collected by the microphone at the second sound-receiving organ as the second correspondence.
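The measurement procedure above can be sketched numerically. Here the two ear responses are simulated with an invented gain `alpha` and phase shift `dphi` (in a real measurement they would come from microphones in the ear canals); the transfer coefficient at a bin is then the ratio of the response spectrum to the test-signal spectrum:

```python
import numpy as np

fs, N = 8000, 1024
k = 100                              # frequency bin of the test tone
f = k * fs / N                       # tone frequency, exactly on bin k
n = np.arange(N)
test = np.sin(2 * np.pi * f * n / fs)        # test signal from one direction

# hypothetical measured responses for this direction and frequency:
# left ear taken as the reference; right ear attenuated and phase-shifted
alpha, dphi = 0.8, 0.3
left = test.copy()
right = alpha * np.sin(2 * np.pi * f * n / fs - dphi)

# transfer coefficient at bin k = response spectrum / test-signal spectrum
T_L = np.fft.rfft(left)[k] / np.fft.rfft(test)[k]
T_R = np.fft.rfft(right)[k] / np.fft.rfft(test)[k]

print(abs(T_L), abs(T_R), np.angle(T_R))
```

The fitted magnitudes and phases recover the simulated gain and phase shift, which is exactly the amplitude/phase split of the transfer coefficients described in the preceding paragraphs.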
在另一种实施中,也可以通过声源到人双耳的传播模型确定对应关系。比如,可以预先选出被认为会对声音从声源到人双耳的传播过程造成影响的特征参数,这些特征参数包括但不限于以下:双耳间距、耳廓特征参数、耳道特征参数、肩膀特征参数、脸颊特征参数、头发特征参数。根据预选的特征参数,结合声学传播原理,可以推导出第一对应关系与第二对应关系。In another implementation, the correspondences can also be determined from a propagation model from the sound source to the two human ears. For example, characteristic parameters considered to affect the propagation of sound from the source to the ears can be selected in advance; these include, but are not limited to, the inter-ear distance, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters. From the preselected characteristic parameters, combined with the principles of acoustic propagation, the first correspondence and the second correspondence can be derived.
在通过传递系数对相应频率的音频分量进行调制时,具体的,在一种可选的实施方式中,可以将第一传递系数与相应频率的音频分量相乘,得到该相应频率的新的音频分量,该新的音频分量可以称为第一音频分量。相应的,可以通过第二传递系数对待处理音频信号中相应频率的音频分量进行调制,也可以得到相应频率的新的音频分量,该通过第二传递系数得到的新的音频分量可以称为第二音频分量。其中,得到的各个第一音频分量可以用于后续生成第一音频信号,得到的各个第二音频分量可以用于后续生成第二音频信号,具体的生成过程在后文中有详细说明。When modulating the audio component of the corresponding frequency by a transfer coefficient, specifically, in an optional implementation, the first transfer coefficient can be multiplied by the audio component of the corresponding frequency to obtain a new audio component at that frequency, which may be called the first audio component. Correspondingly, the audio component of the corresponding frequency in the audio signal to be processed can be modulated by the second transfer coefficient to obtain another new audio component at that frequency, which may be called the second audio component. The obtained first audio components can be used subsequently to generate the first audio signal, and the obtained second audio components to generate the second audio signal; the specific generation process is described in detail below.
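A minimal frequency-domain sketch of this per-bin multiplication follows. The coefficient values are invented for illustration; in the method they would come from the first and second correspondences for the determined source direction:

```python
import numpy as np

N = 512
x = np.random.default_rng(1).standard_normal(N)   # one frame of the signal
X = np.fft.rfft(x)
K = len(X)

# hypothetical per-bin transfer coefficients: left ear as reference (T_L = 1),
# right ear attenuated and phase-shifted, the shift varying with frequency
T_L = np.ones(K)
T_R = 0.7 * np.exp(-1j * np.linspace(0.0, np.pi / 4, K))

first_components = T_L * X    # "first audio components" of each frequency
second_components = T_R * X   # "second audio components" of each frequency

# back to the time domain (the inverse-transform step described later)
first = np.fft.irfft(first_components, n=N)
second = np.fft.irfft(second_components, n=N)
```

With T_L = 1 the first signal reproduces the input frame exactly, while the second carries the attenuation and phase shift that encode the source direction.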
需要说明的是,上述在通过传递系数对音频分量进行调制时,由于是在频域上进行处理,因此可以分别用第一传递系数与第二传递系数与待处理音频信号中相应频率的音频分量相乘。但在另一种实施方式中,也可以在时域上进行调制,比如,可以将第一传递系数与第二传递系数分别转换为时域的第一传递系数和时域的第二传递系数,再进一步通过时域的第一传递系数与待处理音频信号进行卷积计算,通过时域的第二传递系数与待处理音频信号进行卷积计算,生成需要的第一音频信号与第二音频信号。It should be noted that, when modulating the audio components by transfer coefficients as described above, because the processing is performed in the frequency domain, the first transfer coefficient and the second transfer coefficient can each be multiplied by the audio component of the corresponding frequency in the audio signal to be processed. In another implementation, however, the modulation can also be performed in the time domain: for example, the first and second transfer coefficients can be converted into a time-domain first transfer coefficient and a time-domain second transfer coefficient, and the time-domain first transfer coefficient can then be convolved with the audio signal to be processed, and the time-domain second transfer coefficient convolved with the audio signal to be processed, to generate the required first audio signal and second audio signal.
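The equivalence between the two routes can be checked numerically. The sketch below uses one invented set of per-bin coefficients (a gain with a five-sample delay) and circular convolution over a single frame; a practical implementation would zero-pad so that the convolution is linear:

```python
import numpy as np

N = 256
x = np.random.default_rng(2).standard_normal(N)   # frame to be processed

# hypothetical per-bin transfer coefficients: gain 0.9 plus a 5-sample delay
K = N // 2 + 1
T = 0.9 * np.exp(-2j * np.pi * np.arange(K) * 5 / N)

# frequency-domain route: multiply per bin, then inverse-transform
y_freq = np.fft.irfft(T * np.fft.rfft(x), n=N)

# time-domain route: impulse response of T, then circular convolution
h = np.fft.irfft(T, n=N)
y_time = np.zeros(N)
for i in range(N):
    for m in range(N):
        y_time[i] += h[m] * x[(i - m) % N]
```

Both routes produce the same output: 0.9 times the frame, circularly delayed by five samples.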
需要说明的是,本申请实施例提供的音频信号处理方法,可以应用在录音场景下,并且,在该录音场景下,需要至少两个麦克风同时进行录制,比如,可以通过包括多个麦克风的麦克风阵列进行录制。可以理解,尽管各个麦克风录制的音频信号对应的内容相同(即实际中来自相同的声源),但由于各个麦克风所处的位置不同,因此所录制的音频信号也有所不同。It should be noted that the audio signal processing method provided by the embodiments of the present application can be applied in a recording scene, in which at least two microphones record simultaneously; for example, recording can be performed with a microphone array comprising multiple microphones. It can be understood that although the audio signals recorded by the microphones correspond to the same content (that is, they actually come from the same sound sources), the recorded audio signals differ because the microphones are located at different positions.
待处理音频信号作为传递系数的作用对象,在选择时可以相对灵活。比如,在一种实施中,待处理音频信号可以是根据定向麦克风(即上述的用于确定声源方向的麦克风)采集的音频信号确定的。而在另一种实施中,待处理音频信号可以是根据定向麦克风以外的其他麦克风采集的音频信号确定的。比如,在一种具体例子中,麦克风阵列可以包括6个麦克风,可以选择其中的3个麦克风作为定向麦克风,而待处理音频信号可以是根据另外3个麦克风采集的音频信号确定的。又比如,在另一个例子中,待处理音频信号还可以是根据麦克风阵列外的其它麦克风采集的音频信号确定的,其他的麦克风还可以是其他设备上的麦克风。The audio signal to be processed, i.e., the signal on which the transfer coefficients act, can be chosen relatively flexibly. For example, in one implementation, it may be determined from the audio signals collected by the directional microphones (the microphones used above to determine the sound source direction). In another implementation, it may be determined from audio signals collected by microphones other than the directional microphones. In a specific example, the microphone array may include six microphones, three of which are selected as directional microphones, while the audio signal to be processed is determined from the audio signals collected by the other three. In yet another example, the audio signal to be processed may be determined from audio signals collected by microphones outside the microphone array, which may be microphones on other devices.
而为了能够得到质量较高的第一音频信号与第二音频信号,待处理音频信号可以选择所录制的音频信号中信噪比较高的。一种可选的实施方式是,待处理音频信号可以选择麦克风中采集的音频信号中信噪比最高的音频信号。由于传递系数的直接作用对象是相应频率的音频分量,因此,在另一种可选的实施方式中,可以将各个定向麦克风采集的音频信号中相同频率的音频分量进行线性组合,从而得到信噪比较高的各频率的音频分量,用于与第一传递系数和第二传递系数结合。In order to obtain a first audio signal and a second audio signal of higher quality, the audio signal to be processed can be chosen as one of the recorded audio signals with a relatively high signal-to-noise ratio. In one optional implementation, the audio signal to be processed may be the audio signal with the highest signal-to-noise ratio among those collected by the microphones. Since the transfer coefficients act directly on the audio components of the corresponding frequencies, in another optional implementation the audio components of the same frequency in the audio signals collected by the directional microphones can be linearly combined, so as to obtain audio components of each frequency with a higher signal-to-noise ratio for combination with the first transfer coefficient and the second transfer coefficient.
在通过第一传递系数进行调制得到第一音频分量、通过第二传递系数进行调制得到第二音频分量后,为了进一步生成时域上的第一音频信号和第二音频信号,需要进行频域到时域的变换。具体的,可以利用各频率的第一音频分量进行频域到时域的变换,得到第一音频信号,利用各频率的第二音频分量进行频域到时域的变换,得到第二音频信号。而从频域到时域的变换,具体实施时,可以使用傅里叶逆变换。After the first audio components are obtained by modulation with the first transfer coefficients and the second audio components by modulation with the second transfer coefficients, a frequency-domain to time-domain transform is required in order to generate the first audio signal and the second audio signal in the time domain. Specifically, the first audio components of the various frequencies can be transformed from the frequency domain to the time domain to obtain the first audio signal, and the second audio components of the various frequencies can be transformed likewise to obtain the second audio signal. In a specific implementation, the transform from the frequency domain to the time domain can be an inverse Fourier transform.
考虑到获取的待处理音频信号并不一定是平稳的信号,而在后续分析其频谱时,尤其是通过傅里叶变换分析频谱时,由于傅里叶变换要求输入信号是平稳的,因此,在一种实施方式中,可以按照设定的帧长,对待处理音频信号进行分帧处理,将待处理音频信号分为一个个音频帧。对于每一个音频帧而言,由于其帧长较短,因此可以认为音频帧内信号是平稳的。Considering that the acquired audio signal to be processed is not necessarily a stationary signal, and that the subsequent spectrum analysis, in particular Fourier-transform-based spectrum analysis, requires the input signal to be stationary, in one implementation the audio signal to be processed can be divided into audio frames according to a set frame length. Because each audio frame is short, the signal within a frame can be regarded as stationary.
在待处理音频信号被分帧处理成音频帧后,相应的,后续的处理也将针对音频帧进行,比如可以对音频帧进行频谱分析,得到各频率的音频分量;利用各频率的第一音频分量进行变换得到的是新的第一音频帧;利用各频率的第二音频分量进行变换得到的是新的第二音频帧。进一步的,将得到的各个新的第一音频帧进行合成,可以得到第一音频信号,将得到的各个新的第二音频帧进行合成,可以得到第二音频信号。After the audio signal to be processed has been divided into audio frames, the subsequent processing is accordingly performed on the audio frames: for example, spectrum analysis can be performed on each audio frame to obtain the audio components of the various frequencies; transforming the first audio components of the various frequencies yields a new first audio frame, and transforming the second audio components yields a new second audio frame. Further, synthesizing the new first audio frames yields the first audio signal, and synthesizing the new second audio frames yields the second audio signal.
在设定音频帧的帧长时,通常可以根据经验较为灵活的设定。比如,若麦克风的采样频率是fs,一个音频帧中包括的采样点的个数可以用N表示,帧长N可以在0.005fs<N<fs的范围中选取。在一种可选的实施方式中,可以设定的N是2的幂次方,即使得音频帧中包含的采样点的个数是2的幂次方,如此,在后续的频谱分析时,可以采用快速傅里叶变换FFT进行加速计算。The frame length of an audio frame can usually be set fairly flexibly based on experience. For example, if the sampling frequency of the microphone is fs and the number of sampling points in an audio frame is denoted N, the frame length N can be selected in the range 0.005fs < N < fs. In an optional implementation, N can be set to a power of two, i.e., such that the number of sampling points in an audio frame is a power of two; in this way, the fast Fourier transform (FFT) can be used in the subsequent spectrum analysis to speed up the computation.
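The framing step can be sketched as follows. The sampling rate, the power-of-two frame length and the 50 % overlap are example choices consistent with the constraints stated above:

```python
import numpy as np

fs = 16000
x = np.arange(2 * fs)                 # a 2-second toy "signal" of sample indices

N = 1024                              # power-of-two frame length
assert 0.005 * fs < N < fs            # the range suggested in the text

hop = N // 2                          # 50% overlap between neighbouring frames
n_frames = 1 + (len(x) - N) // hop
frames = np.stack([x[i * hop : i * hop + N] for i in range(n_frames)])
print(frames.shape)
```

Each row of `frames` is one audio frame; consecutive rows share N/2 samples, which is what the later overlap-add synthesis relies on.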
考虑到在对音频信号进行分帧(分帧操作也可称为信号截断)时,往往无法做到周期性截断,即截断后的音频帧往往是非周期性信号,此时若直接进行傅里叶变换将出现频谱泄漏的现象。因此,在一种实施中,可以在进行傅里叶变换之前,将音频帧调制为周期性信号。而调制为周期性信号的具体做法,可以是对音频帧加分析窗,即将音频帧与分析窗的窗函数相乘。该分析窗的窗函数可以是正弦窗、汉宁窗等,在此不做具体限定。Considering that when an audio signal is divided into frames (the framing operation may also be called signal truncation), periodic truncation is usually not achievable, that is, the truncated audio frames are usually non-periodic signals, applying the Fourier transform directly in this case would cause spectral leakage. Therefore, in one implementation, the audio frame can be shaped toward a periodic signal before the Fourier transform is performed. A specific way to do this is to apply an analysis window to the audio frame, i.e., to multiply the audio frame by the window function of the analysis window. The window function of the analysis window may be a sine window, a Hann window, or the like, which is not specifically limited here.
而在将变换得到的新的音频帧合成新的音频信号时,由于相邻音频帧之间有重叠的部分,因此,可以通过重叠相加法Overlap-add对各个新的音频帧进行处理,具体的,即可以在前后音频帧重叠的位置进行信号叠加,叠加后的各个音频帧可以直接组合,从而得到需要的音频信号。进一步的,考虑到直接对音频帧进行重叠累加,重叠的部分可能有幅值突变,为使重叠相加后得到的音频信号是平滑的,在对各个新的音频帧进行Overlap-add处理之前,可以先对音频帧两端的幅值进行畸变消除。在具体实施时,可以对变换得到的新的音频帧进行合成窗的加窗处理,利用加合成窗后的音频帧进行重叠累加。合成窗的窗函数也有多种选择,比如是正弦窗或者汉宁窗等。When the new audio frames obtained by the transform are synthesized into a new audio signal, because adjacent audio frames overlap, each new audio frame can be processed by the overlap-add method: specifically, the signals are summed at the positions where consecutive audio frames overlap, and the resulting frames can then be joined directly to obtain the required audio signal. Further, considering that directly overlap-adding the audio frames may produce amplitude discontinuities in the overlapping parts, in order that the overlap-added audio signal be smooth, the amplitude distortion at the two ends of each audio frame can be suppressed before the overlap-add processing. In a specific implementation, a synthesis window can be applied to each new audio frame obtained by the transform, and the windowed frames are then overlap-added. The window function of the synthesis window can also be chosen from several options, such as a sine window or a Hann window.
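A numerical sketch of the analysis-window, synthesis-window and overlap-add chain described above. The square root of a periodic Hann window is used for both windows; this is one common choice that satisfies the overlap-add condition at 50 % overlap, whereas the text itself only requires a sine window, a Hann window, or the like:

```python
import numpy as np

N, hop = 512, 256                         # frame length, 50% overlap
win = np.sin(np.pi * np.arange(N) / N)    # sqrt of a periodic Hann window

x = np.random.default_rng(3).standard_normal(hop * 40)

# analysis: cut overlapping frames and apply the analysis window
n_frames = (len(x) - N) // hop + 1
frames = [win * x[i * hop : i * hop + N] for i in range(n_frames)]
# (the per-frame spectral processing would happen here)

# synthesis: apply the synthesis window, then overlap-add
y = np.zeros(len(x))
for i, frame in enumerate(frames):
    y[i * hop : i * hop + N] += win * frame

err = np.max(np.abs(y[N:-N] - x[N:-N]))   # interior samples reconstruct exactly
print(err)
```

Because the squared window and its half-frame shift sum to one, the interior of the signal is reconstructed without the amplitude discontinuities the paragraph warns about; only the first and last half-frames lack full window coverage.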
下面提供一个相对详尽的实施例,可以参见图4,图4是本申请实施例提供的一种示例性的音频信号处理方法的算法框图。A relatively detailed embodiment is provided below, which can be referred to FIG. 4, which is an algorithm block diagram of an exemplary audio signal processing method provided in an embodiment of the present application.
在录音场景下,通过麦克风阵列可以采集到多个音频信号。比如麦克风阵列中包含M个麦克风,M≥2,则第m个麦克风采集到的时域的音频信号可以用xm(t)表示,其中m为麦克风序号,m=1,2,…,M,t为采样离散时间序列,t=1,2,…。In the recording scene, multiple audio signals can be collected through the microphone array. For example, the microphone array contains M microphones, and M≥2, the time domain audio signal collected by the m-th microphone can be represented by xm(t), where m is the microphone number, m=1, 2,...,M, t is the sampled discrete time sequence, t=1, 2,....
可以将麦克风阵列中的各个麦克风都作为定向麦克风。分别对各个麦克风采集的时域音频信号xm(t)进行分帧,分帧时,可以提取N个采样点作为一个音频帧,得到时域音频帧xm(n)l,其中,n是一个音频帧内的时间序列,n=1,2,…,N;l是帧序列,l=1,2,…。对时域音频帧xm(n)l进行分析窗的加窗处理,得到x'm(n)l。将加分析窗后的x'm(n)l输入FFT模块,得到时域音频帧的频谱Xm(k)l,其中,k表示离散频谱序列,k=1,2,…,N。Each microphone in the microphone array can serve as a directional microphone. The time-domain audio signal xm(t) collected by each microphone is divided into frames; when framing, N sampling points can be taken as one audio frame, giving the time-domain audio frame xm(n)l, where n is the time index within a frame, n = 1, 2, …, N, and l is the frame index, l = 1, 2, …. An analysis window is applied to the time-domain audio frame xm(n)l to obtain x'm(n)l. The windowed x'm(n)l is input into the FFT module to obtain the spectrum Xm(k)l of the audio frame, where k denotes the discrete frequency index, k = 1, 2, …, N.
The spectrum Xm(k)l corresponding to each microphone is input into the sound source localization module, in which a sound source localization algorithm based on the microphone array determines the sound source direction corresponding to the audio component of each frequency. Specifically, in this embodiment a beamforming algorithm can be used, and the sound source direction includes an azimuth angle θ and a pitch angle φ. In this embodiment, the sound sources can be assumed to come from all around in the horizontal plane, without distinguishing up from down, i.e. the pitch angle can be taken as φ = 0. The most basic beamforming algorithm can be expressed as:

B(θi, k) = w(θi, k)^H · X(k)l, where X(k)l = [X1(k)l, X2(k)l, …, XM(k)l]^T is the vector of microphone spectra at discrete frequency k.
Here θi denotes the sound source direction, and w(θi, k) is the weight vector, also called the steering vector, for a sound source in direction θi at discrete frequency k. In the classic beamforming algorithm, the steering vector can be expressed as

w(θi, k) = [e^(-jωΔτ1), e^(-jωΔτ2), …, e^(-jωΔτM)]^T

where ω is the sound source angular frequency corresponding to the discrete frequency k, and Δτm is the difference between the time at which the sound from direction θi reaches the m-th microphone and the time at which it reaches the reference point. The sound source direction corresponding to the audio component at the k-th discrete frequency is therefore θ(k), i.e. the direction in which the beamforming output is largest at that frequency:
θ(k) = arg max over θi of |w(θi, k)^H · X(k)l|
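The per-bin direction search above can be sketched for a hypothetical two-microphone array. This is an illustrative sketch, not the claimed implementation: the microphone spacing, sampling rate, FFT length, and search grid are all assumed values, and the "observation" is generated noise-free from the same steering-vector model.

```python
import numpy as np

c = 343.0        # speed of sound in air, m/s
fs = 8000        # sampling rate, Hz (assumed)
d = 0.10         # microphone spacing, m (hypothetical 2-mic array)
N = 256          # FFT length

def steering_vector(theta, k):
    """w(theta, k) for a far-field source at azimuth theta: per-microphone
    phase delays e^(-j*omega*dtau_m), with mic 1 as the reference point."""
    omega = 2.0 * np.pi * k * fs / N
    dtau = np.array([0.0, d * np.cos(theta) / c])  # arrival-time differences
    return np.exp(-1j * omega * dtau)

def doa_per_bin(Xk, k, grid):
    """theta(k): the grid direction maximizing |w(theta, k)^H X(k)|."""
    power = [abs(np.vdot(steering_vector(th, k), Xk)) for th in grid]
    return grid[int(np.argmax(power))]

# Simulate bin k = 32 for a source at 60 degrees: the noise-free mic
# spectra differ only by the propagation phases of the model itself.
k = 32
theta_true = np.deg2rad(60.0)
Xk = steering_vector(theta_true, k)
grid = np.deg2rad(np.arange(0.0, 181.0, 1.0))
theta_hat = doa_per_bin(Xk, k, grid)
print(round(float(np.rad2deg(theta_hat))))   # 60
```

At this bin frequency (1 kHz) the maximum inter-microphone phase shift stays below π, so the argmax over the grid is unambiguous; at higher frequencies or wider spacings, spatial aliasing would need to be handled.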
The determined sound source direction θ(k) and pitch angle φ(k), together with the discrete frequency k, are input into the transfer coefficient determination module. There, these inputs are substituted into the first correspondence and the second correspondence respectively, yielding the first transfer coefficient HL(θ(k), φ(k), k) and the second transfer coefficient HR(θ(k), φ(k), k) (in this embodiment, the first correspondence corresponds to the left channel and the second correspondence to the right channel). In the stereo reconstruction module, the first transfer coefficient HL(θ(k), φ(k), k) is multiplied by the audio component Xref(k)l of the corresponding frequency in the audio signal to be processed, giving the new first audio component XL(k)l of each frequency; the second transfer coefficient HR(θ(k), φ(k), k) is multiplied by the audio component Xref(k)l of the corresponding frequency, giving the new second audio component XR(k)l of each frequency.
In this embodiment, the audio signal to be processed is a linear combination of the audio components of the same frequency in the audio signals collected by the microphones, which can be expressed by the following formula:

Xref(k)l = Σ (m = 1 to M) wm · Xm(k)l

where wm denotes the weight of the m-th microphone, which can be a real number or a complex number.
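A minimal numeric sketch of the linear combination and the per-frequency modulation follows. The weights and the stand-in transfer coefficients HL and HR are hypothetical placeholders; in the described embodiment the transfer coefficients would be looked up from the first and second correspondences using θ(k), φ(k), and k.

```python
import numpy as np

# Spectra of M = 2 microphones for one frame l: rows = mics, columns = bins k.
N = 8
Xm = np.array([np.arange(N, dtype=complex),
               2.0 * np.arange(N, dtype=complex)])

# Xref(k)l = sum_m wm * Xm(k)l -- a linear combination of the microphone
# spectra; the weights wm may be real or complex (here simple averaging).
wm = np.array([0.5, 0.5])
Xref = wm @ Xm

# Hypothetical per-bin transfer coefficients (stand-ins for the left- and
# right-ear responses taken from the two correspondences).
HL = np.exp(-1j * 0.1 * np.arange(N))    # stand-in left channel
HR = np.exp(+1j * 0.1 * np.arange(N))    # stand-in right channel

XL = HL * Xref                           # new first audio components
XR = HR * Xref                           # new second audio components

# These stand-ins are pure phase shifts (|HL| = |HR| = 1), so magnitudes
# are preserved while the inter-channel phase difference is imposed.
print(np.allclose(np.abs(XL), np.abs(Xref)))   # True
```

A real correspondence would generally change both magnitude and phase per bin, producing the amplitude and phase response differences the method relies on.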
The new first audio component XL(k)l of each frequency is input to the inverse fast Fourier transform (IFFT) module and transformed from the frequency domain back to the time domain, yielding a new first audio frame x'L(n)l; the new second audio component XR(k)l of each frequency is input to the IFFT module, yielding a new second audio frame x'R(n)l. Each first audio frame x'L(n)l is windowed with the synthesis window to obtain x″L(n)l, and the windowed first audio frames x″L(n)l are input to the overlap-add module, which corrects them by the overlap-add method to obtain the audio frame x″′L(n)l of the l-th frame of the first audio signal; the audio frames x″′L(n)l are combined to obtain the complete first audio signal xL(t) for playing at the first sound-receiving organ. Likewise, each second audio frame x'R(n)l is windowed with the synthesis window to obtain x″R(n)l, the windowed second audio frames x″R(n)l are input to the overlap-add module to obtain the audio frame x″′R(n)l of the l-th frame of the second audio signal, and the audio frames x″′R(n)l are combined to obtain the complete second audio signal xR(t) for playing at the second sound-receiving organ.
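The IFFT, synthesis-window, and overlap-add chain can be illustrated end to end for one channel. As a sanity check the sketch uses identity "transfer coefficients", under which the analysis/synthesis chain should reproduce the input; the square-root Hann windows at 50% overlap are an assumed choice satisfying the constant-overlap-add condition, not a requirement of the application.

```python
import numpy as np

N, hop = 64, 32
n = np.arange(N)
# sqrt-Hann, used both as analysis window and as synthesis window
w = np.sqrt(0.5 * (1.0 - np.cos(2.0 * np.pi * n / N)))

x = np.sin(2 * np.pi * np.arange(8 * N) / 50.0)   # toy mono input

out = np.zeros_like(x)
for s in range(0, len(x) - N + 1, hop):
    Xk = np.fft.fft(w * x[s:s + N])      # analysis window + FFT
    XL = 1.0 * Xk                        # identity "transfer coefficients"
    frame = np.real(np.fft.ifft(XL))     # IFFT back to the time domain
    out[s:s + N] += w * frame            # synthesis window + overlap-add

# Interior samples are reconstructed exactly (edges lack full overlap).
print(np.allclose(out[N:-N], x[N:-N]))   # True
```

In the described embodiment the same loop runs twice, once with HL(k) and once with HR(k) in place of the identity factor, producing xL(t) and xR(t).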
In the implementation provided above, the sound-reception response difference information of an audio component has a correspondence with the sound source direction and frequency of that audio component. However, the stereo impression of sound is reflected not only in a clear direction but also in the distance of the sound source, and the response between the two ears differs with the distance of the sound source. The sound source distance can therefore be added to the correspondence as a variable, i.e. the correspondence can take the form H(θ, φ, r, k), where r is the sound source distance. In a specific implementation, the sound source distance can be determined for each audio component, and the sound-reception response difference information of that audio component is then determined from the determined sound source distance, sound source direction, and frequency.
On the other hand, the response difference between the two ears is in fact related to a variety of characteristic parameters. For example, for an audio signal of the same frequency emitted by a sound source in the same direction and heard at the same position, different listeners hear different sounds, because the structure of each person's ears and of the regions around the ears (such as shoulders, hair, and cheeks) differs individually. Therefore, in one embodiment, the variables in the correspondence may further include designated characteristic parameters; that is, the sound-reception response difference information of the audio component can be determined from the designated characteristic parameters, the sound source distance, and the frequency. The designated characteristic parameters may include one or more of the following: inter-ear distance, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
There are multiple ways to obtain a user's designated characteristic parameters. In one implementation, the user can be prompted through interaction to input the relevant information, thereby obtaining the user's designated characteristic parameters. In another implementation, image recognition technology can be applied to an image of the user to obtain the required designated characteristic parameters. For example, the user's inter-ear distance can be obtained through image-based ranging, and the user's ear canal characteristic parameters can be extracted through image feature recognition.
The foregoing describes the audio signal processing method provided by the embodiments of the present application. In this method, each audio component of the audio signal to be processed is modulated according to the sound-reception response difference information corresponding to that audio component, where the sound-reception response difference information characterizes the amplitude response difference and/or phase response difference of the first sound-receiving organ and the second sound-receiving organ to an audio component of the same frequency arriving from the same direction. Therefore, after modulation according to the sound-reception response difference information, the generated first audio signal and second audio signal also exhibit this amplitude response difference and/or phase response difference; through this response difference, a sufficient sense of space can be constructed, so that the human ear can clearly distinguish the direction of the sound.
下面请参见图5是本申请实施例提供的一种示例性的音频处理装置的结构示意图。该装置包括:Please refer to FIG. 5 below for a schematic structural diagram of an exemplary audio processing device provided by an embodiment of the present application. The device includes:
存储器501,用于存储计算机程序;The memory 501 is used to store computer programs;
处理器502,用于调用所述计算机程序,当所述计算机程序被所述处理器执行时,实现以下步骤:The processor 502 is configured to call the computer program, and when the computer program is executed by the processor, the following steps are implemented:
获取待处理音频信号,所述待处理音频信号包括多个频率的音频分量;Acquiring a to-be-processed audio signal, where the to-be-processed audio signal includes audio components of multiple frequencies;
确定多个所述音频分量中每一个对应的声源方向;Determining a sound source direction corresponding to each of the multiple audio components;
The sound-reception response difference information corresponding to each audio component is determined according to the sound source direction, the sound-reception response difference information including: the amplitude response difference and/or phase response difference of a first sound-receiving organ and a second sound-receiving organ to an audio component of the same frequency arriving from the same direction;

According to the sound-reception response difference information, the audio component of the corresponding frequency in the audio signal to be processed is modulated, so as to generate a first audio signal for playing at the first sound-receiving organ and a second audio signal for playing at the second sound-receiving organ.
Optionally, the sound-reception response difference information includes a first transfer coefficient and a second transfer coefficient; the first transfer coefficient is determined according to the sound source direction and frequency corresponding to the audio component and a first correspondence, and the second transfer coefficient is determined according to the sound source direction and frequency corresponding to the audio component and a second correspondence; the difference between the first correspondence and the second correspondence is used to characterize the amplitude response difference and/or phase response difference of the first sound-receiving organ and the second sound-receiving organ to an audio component of the same frequency arriving from the same direction.
可选的,所述第一对应关系是针对不同方向的声源发出的各频率的测试信号分别测量所述第一收音器官的收音响应得到的;Optionally, the first correspondence relationship is obtained by measuring the reception response of the first sound receiving organ for test signals of each frequency emitted by sound sources in different directions;
所述第二对应关系是针对不同方向的声源发出的各频率的测试信号分别测量所述第二收音器官的收音响应得到的。The second correspondence relationship is obtained by measuring the sound reception response of the second sound receiving organ for test signals of various frequencies emitted by sound sources in different directions.
可选的,所述第一对应关系与所述第二对应关系是根据声源到人双耳的传播模型确定的,所述传播模型是根据预选的特征参数建立的。Optionally, the first correspondence and the second correspondence are determined according to a propagation model from the sound source to the human ears, and the propagation model is established according to preselected characteristic parameters.
可选的,所述预选的特征参数包括以下一种或多种:双耳间距、耳廓特征参数、耳道特征参数、肩膀特征参数、脸颊特征参数、头发特征参数。Optionally, the preselected characteristic parameters include one or more of the following: distance between ears, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
可选的,各个所述音频分量对应的声源方向是利用至少两个麦克风采集的音频信号,通过声源定位算法确定的。Optionally, the sound source direction corresponding to each of the audio components is determined by a sound source localization algorithm using audio signals collected by at least two microphones.
可选的,所述音频分量对应的声源方向是根据各个所述麦克风采集的音频信号中相应频率的音频分量确定的。Optionally, the sound source direction corresponding to the audio component is determined according to the audio component of the corresponding frequency in the audio signal collected by each microphone.
可选的,所述声源定位算法是以下任一种:波束形成算法、到达时间差估计算法、差分麦克风阵列算法。Optionally, the sound source localization algorithm is any one of the following: a beamforming algorithm, an arrival time difference estimation algorithm, and a differential microphone array algorithm.
可选的,所述待处理音频信号是根据所述麦克风采集的音频信号得到的。Optionally, the audio signal to be processed is obtained based on the audio signal collected by the microphone.
可选的,所述待处理音频信号是各个所述麦克风采集的音频信号中信噪比最高的音频信号。Optionally, the audio signal to be processed is an audio signal with the highest signal-to-noise ratio among the audio signals collected by each microphone.
可选的,所述待处理音频信号的各频率的音频分量是各个所述麦克风采集的音频 信号中相应频率的音频分量的线性组合。Optionally, the audio components of each frequency of the audio signal to be processed are linear combinations of audio components of corresponding frequencies in the audio signals collected by each microphone.
可选的,所述待处理音频信号是根据所述麦克风以外的其它麦克风采集的音频信号得到的。Optionally, the to-be-processed audio signal is obtained based on audio signals collected by a microphone other than the microphone.
Optionally, the audio component of the corresponding frequency is modulated according to the first transfer coefficient to obtain new first audio components of the respective frequencies, and the first audio signal is obtained by transforming the first audio components of the respective frequencies;

The audio component of the corresponding frequency is modulated according to the second transfer coefficient to obtain new second audio components of the respective frequencies, and the second audio signal is obtained by transforming the second audio components of the respective frequencies.
可选的,所述待处理音频信号被分为多个音频帧,所述音频分量是所述音频帧包含的音频分量;Optionally, the audio signal to be processed is divided into multiple audio frames, and the audio component is an audio component contained in the audio frame;
利用各频率的所述第一音频分量变换得到的是新的第一音频帧,所述第一音频信号是利用各个所述第一音频帧合成得到的;A new first audio frame obtained by transforming the first audio component of each frequency is a new first audio frame, and the first audio signal is synthesized using each of the first audio frames;
利用各频率的所述第二音频分量变换得到的是新的第二音频帧,所述第二音频信号是利用各个所述第二音频帧合成得到的。A new second audio frame is obtained by transforming the second audio component of each frequency, and the second audio signal is synthesized by using each of the second audio frames.
可选的,所述音频帧包含的采样点的个数为2的幂次方。Optionally, the number of sampling points included in the audio frame is a power of two.
可选的,所述音频帧包含的对应各频率的音频分量是通过快速傅里叶变换FFT确定的。Optionally, the audio components corresponding to the frequencies contained in the audio frame are determined by fast Fourier transform FFT.
可选的,所述处理器还用于,在确定所述音频帧包含的对应各频率的音频分量之前,将所述音频帧调制为周期性信号。Optionally, the processor is further configured to modulate the audio frame into a periodic signal before determining the audio components corresponding to each frequency contained in the audio frame.
可选的,所述处理器在将所述音频帧调制为周期性信号时,具体用于对所述音频帧加分析窗。Optionally, when the processor modulates the audio frame into a periodic signal, it is specifically configured to add an analysis window to the audio frame.
Optionally, when synthesizing the first audio signal from the first audio frames, the processor is specifically configured to process the first audio frames by the overlap-add method and then combine them to obtain the first audio signal;

When synthesizing the second audio signal from the second audio frames, the processor is specifically configured to process the second audio frames by the overlap-add method and then combine them to obtain the second audio signal.
可选的,所述处理器还用于,在通过所述重叠相加法Overlap-add进行处理之前,分别消除所述第一音频帧与所述第二音频帧两端幅值的畸变。Optionally, the processor is further configured to, before performing processing by the overlap-add method, eliminate the amplitude distortion at both ends of the first audio frame and the second audio frame, respectively.
Optionally, when respectively eliminating the amplitude distortion at both ends of the first audio frames and the second audio frames, the processor is specifically configured to apply a synthesis window to the first audio frames and the second audio frames, respectively.
可选的,所述声源方向包括:周角和/或俯仰角。Optionally, the sound source direction includes: a circumferential angle and/or a pitch angle.
可选的,所述处理器还用于,确定各个所述音频分量的声源距离,所述声源距离用于确定相应频率的所述音频分量的所述收音响应差异信息。Optionally, the processor is further configured to determine a sound source distance of each of the audio components, where the sound source distance is used to determine the difference information of the reception response of the audio component of the corresponding frequency.
可选的,所述处理器还用于,获取用户的双耳及双耳周边的指定特征参数,所述指定特征参数用于确定相应频率的所述音频分量的所述收音响应差异信息。Optionally, the processor is further configured to obtain designated characteristic parameters of the user's binaural and binaural periphery, where the designated characteristic parameter is used to determine the difference information of the reception response of the audio component of the corresponding frequency.
可选的,所述指定特征参数是通过对用户进行图像识别得到的。Optionally, the specified characteristic parameter is obtained by performing image recognition on the user.
可选的,所述指定特征参数包括以下一种或多种:双耳间距、耳廓特征参数、耳道特征参数、肩膀特征参数、脸颊特征参数、头发特征参数。Optionally, the designated characteristic parameters include one or more of the following: distance between ears, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
For the specific implementation of each embodiment of the audio processing device provided above, refer to the description of the audio signal processing method provided by the embodiments of this application; details are not repeated here.
下面请参见图6,图6是本申请实施例提供的一种示例性的录音设备的结构示意图。该录音设备包括:至少两个麦克风601、存储器602及处理器603;Please refer to FIG. 6 below. FIG. 6 is a schematic structural diagram of an exemplary recording device provided by an embodiment of the present application. The recording device includes: at least two microphones 601, a memory 602, and a processor 603;
所述麦克风601用于,采集音频信号;The microphone 601 is used to collect audio signals;
所述存储器602用于,存储计算机程序;The memory 602 is used to store computer programs;
所述处理器603用于,调用所述计算机程序,所述处理器执行所述计算程序时,实现以下步骤:The processor 603 is configured to call the computer program, and when the processor executes the calculation program, the following steps are implemented:
获取待处理音频信号,所述待处理音频信号包括多个频率的音频分量;Acquiring a to-be-processed audio signal, where the to-be-processed audio signal includes audio components of multiple frequencies;
利用所述麦克风采集的音频信号,通过声源定位算法确定多个所述音频分量中每一个对应的声源方向;Using the audio signal collected by the microphone to determine the sound source direction corresponding to each of the multiple audio components through a sound source localization algorithm;
The sound-reception response difference information corresponding to each audio component is determined according to the sound source direction, the sound-reception response difference information including: the amplitude response difference and/or phase response difference of a first sound-receiving organ and a second sound-receiving organ to an audio component of the same frequency arriving from the same direction;

According to the sound-reception response difference information, the audio component of the corresponding frequency in the audio signal to be processed is modulated, so as to generate a first audio signal for playing at the first sound-receiving organ and a second audio signal for playing at the second sound-receiving organ.
Optionally, the sound-reception response difference information includes a first transfer coefficient and a second transfer coefficient; the first transfer coefficient is determined according to the sound source direction and frequency corresponding to the audio component and a first correspondence, and the second transfer coefficient is determined according to the sound source direction and frequency corresponding to the audio component and a second correspondence; the difference between the first correspondence and the second correspondence is used to characterize the amplitude response difference and/or phase response difference of the first sound-receiving organ and the second sound-receiving organ to an audio component of the same frequency arriving from the same direction.
可选的,所述第一对应关系是针对不同方向的声源发出的各频率的测试信号分别 测量所述第一收音器官的收音响应得到的;Optionally, the first correspondence relationship is obtained by respectively measuring the sound reception response of the first sound receiving organ for test signals of each frequency emitted by sound sources in different directions;
所述第二对应关系是针对不同方向的声源发出的各频率的测试信号分别测量所述第二收音器官的收音响应得到的。The second correspondence relationship is obtained by measuring the sound reception response of the second sound receiving organ for test signals of various frequencies emitted by sound sources in different directions.
可选的,所述第一对应关系与所述第二对应关系是根据声源到人双耳的传播模型确定的,所述传播模型是根据预选的特征参数建立的。Optionally, the first correspondence and the second correspondence are determined according to a propagation model from the sound source to the human ears, and the propagation model is established according to preselected characteristic parameters.
可选的,所述预选的特征参数包括以下一种或多种:双耳间距、耳廓特征参数、耳道特征参数、肩膀特征参数、脸颊特征参数、头发特征参数。Optionally, the preselected characteristic parameters include one or more of the following: distance between ears, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
可选的,所述音频分量对应的声源方向是根据各个所述麦克风采集的音频信号中相应频率的音频分量确定的。Optionally, the sound source direction corresponding to the audio component is determined according to the audio component of the corresponding frequency in the audio signal collected by each microphone.
可选的,所述声源定位算法是以下任一种:波束形成算法、到达时间差估计算法、差分麦克风阵列算法。Optionally, the sound source localization algorithm is any one of the following: a beamforming algorithm, an arrival time difference estimation algorithm, and a differential microphone array algorithm.
可选的,所述待处理音频信号是根据所述麦克风采集的音频信号得到的。Optionally, the audio signal to be processed is obtained based on the audio signal collected by the microphone.
可选的,所述待处理音频信号是各个所述麦克风采集的音频信号中信噪比最高的音频信号。Optionally, the audio signal to be processed is an audio signal with the highest signal-to-noise ratio among the audio signals collected by each microphone.
可选的,所述待处理音频信号的各频率的音频分量是各个所述麦克风采集的音频信号中相应频率的音频分量的线性组合。Optionally, the audio components of each frequency of the audio signal to be processed are linear combinations of audio components of corresponding frequencies in the audio signals collected by each microphone.
可选的,所述待处理音频信号是根据所述麦克风以外的其它麦克风采集的音频信号得到的。Optionally, the to-be-processed audio signal is obtained based on audio signals collected by a microphone other than the microphone.
Optionally, the audio component of the corresponding frequency is modulated according to the first transfer coefficient to obtain new first audio components of the respective frequencies, and the first audio signal is obtained by transforming the first audio components of the respective frequencies;

The audio component of the corresponding frequency is modulated according to the second transfer coefficient to obtain new second audio components of the respective frequencies, and the second audio signal is obtained by transforming the second audio components of the respective frequencies.
可选的,所述待处理音频信号被分为多个音频帧,所述音频分量是所述音频帧包含的音频分量;Optionally, the audio signal to be processed is divided into multiple audio frames, and the audio component is an audio component contained in the audio frame;
利用各频率的所述第一音频分量变换得到的是新的第一音频帧,所述第一音频信号是利用各个所述第一音频帧合成得到的;A new first audio frame obtained by transforming the first audio component of each frequency is a new first audio frame, and the first audio signal is synthesized using each of the first audio frames;
利用各频率的所述第二音频分量变换得到的是新的第二音频帧,所述第二音频信号是利用各个所述第二音频帧合成得到的。A new second audio frame is obtained by transforming the second audio component of each frequency, and the second audio signal is synthesized by using each of the second audio frames.
可选的,所述音频帧包含的采样点的个数为2的幂次方。Optionally, the number of sampling points included in the audio frame is a power of two.
可选的,所述音频帧包含的对应各频率的音频分量是通过快速傅里叶变换FFT确 定的。Optionally, the audio components corresponding to each frequency contained in the audio frame are determined by fast Fourier transform FFT.
可选的,所述处理器还用于,在确定所述音频帧包含的对应各频率的音频分量之前,将所述音频帧调制为周期性信号。Optionally, the processor is further configured to modulate the audio frame into a periodic signal before determining the audio components corresponding to each frequency contained in the audio frame.
可选的,所述处理器在将所述音频帧调制为周期性信号时,具体用于对所述音频帧加分析窗。Optionally, when the processor modulates the audio frame into a periodic signal, it is specifically configured to add an analysis window to the audio frame.
Optionally, when synthesizing the first audio signal from the first audio frames, the processor is specifically configured to process the first audio frames by the overlap-add method and then combine them to obtain the first audio signal;

When synthesizing the second audio signal from the second audio frames, the processor is specifically configured to process the second audio frames by the overlap-add method and then combine them to obtain the second audio signal.
可选的,所述处理器还用于,在通过所述重叠相加法Overlap-add进行处理之前,分别消除所述第一音频帧与所述第二音频帧两端幅值的畸变。Optionally, the processor is further configured to, before performing processing by the overlap-add method, eliminate the amplitude distortion at both ends of the first audio frame and the second audio frame, respectively.
Optionally, when respectively eliminating the amplitude distortion at both ends of the first audio frames and the second audio frames, the processor is specifically configured to apply a synthesis window to the first audio frames and the second audio frames, respectively.
可选的,所述声源方向包括:周角和/或俯仰角。Optionally, the sound source direction includes: a circumferential angle and/or a pitch angle.
可选的,所述处理器还用于,确定各个所述音频分量的声源距离,所述声源距离用于确定相应频率的所述音频分量的所述收音响应差异信息。Optionally, the processor is further configured to determine a sound source distance of each of the audio components, where the sound source distance is used to determine the difference information of the reception response of the audio component of the corresponding frequency.
可选的,所述处理器还用于,获取用户的双耳及双耳周边的指定特征参数,所述指定特征参数用于确定相应频率的所述音频分量的所述收音响应差异信息。Optionally, the processor is further configured to obtain designated characteristic parameters of the user's binaural and binaural periphery, where the designated characteristic parameter is used to determine the difference information of the reception response of the audio component of the corresponding frequency.
可选的,所述指定特征参数是通过对用户进行图像识别得到的。Optionally, the specified characteristic parameter is obtained by performing image recognition on the user.
可选的,所述指定特征参数包括以下一种或多种:双耳间距、耳廓特征参数、耳道特征参数、肩膀特征参数、脸颊特征参数、头发特征参数。Optionally, the designated characteristic parameters include one or more of the following: distance between ears, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
Optionally, the recording apparatus may be provided with a port for connecting to other external devices; by connecting to another device through this port, an audio signal can be obtained from that device, and the audio signal to be processed may be determined from the audio signal so obtained.
The recording apparatus is an electronic device with a recording function, and may specifically be any of the following: a mobile phone, a camera, a video camera, an action camera, a gimbal camera, a speaker, a head-mounted VR device, a surveillance device, a voice recorder, or a microphone.
For the specific implementation of each recording device embodiment provided above, reference may be made to the description of the audio signal processing method provided in the embodiments of this application, which is not repeated here.
The embodiments of this application further provide a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the audio signal processing method of any implementation provided in the embodiments of this application can be carried out.
Provided that no conflict or contradiction arises, those skilled in the art may combine the technical features of the above embodiments according to the actual situation to form various further embodiments. For reasons of space, this application does not describe all such combinations, but it should be understood that they also fall within the scope disclosed by the embodiments of this application.
The embodiments of this application may take the form of a computer program product implemented on one or more storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing program code. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to: phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette, magnetic tape or magnetic disk storage or other magnetic storage device, or any other non-transmission medium that can be used to store information accessible by a computing device.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. The terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Absent further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
The methods, apparatuses, and devices provided by the embodiments of the present invention have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present invention. The description of the above embodiments is intended only to help in understanding the method of the present invention and its core ideas. A person of ordinary skill in the art may, in accordance with the ideas of the present invention, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims (80)

  1. An audio signal processing method, comprising:
    acquiring an audio signal to be processed, the audio signal to be processed comprising audio components at a plurality of frequencies;
    determining a sound source direction corresponding to each of the plurality of audio components;
    determining, according to the sound source direction, sound-pickup response difference information corresponding to each audio component, the sound-pickup response difference information comprising an amplitude response difference and/or a phase response difference between a first sound-receiving organ and a second sound-receiving organ for an audio component of the same frequency arriving from the same direction; and
    modulating, according to the sound-pickup response difference information, the audio component of the corresponding frequency in the audio signal to be processed, so as to generate a first audio signal for playback at the first sound-receiving organ and a second audio signal for playback at the second sound-receiving organ.
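For illustration only (this sketch is not part of the claims), the per-frequency modulation of claim 1 can be expressed in a few lines of Python. The function name and the lookup callables `h_left` and `h_right` are hypothetical stand-ins for the first and second transfer coefficients; multiplying a frequency bin by a complex coefficient applies both the amplitude response difference and the phase response difference at once.

```python
import numpy as np

def binaural_modulate(spectrum, directions, h_left, h_right):
    """Generate left/right spectra from a mono spectrum (claim 1, sketched).

    spectrum   : complex frequency components of the signal to be processed
    directions : estimated sound source direction (e.g. azimuth) per bin
    h_left, h_right : hypothetical lookups (direction, bin) -> complex
        transfer coefficient for the first / second sound-receiving organ
    """
    left = np.empty(len(spectrum), dtype=complex)
    right = np.empty(len(spectrum), dtype=complex)
    for k, (x, d) in enumerate(zip(spectrum, directions)):
        # A complex coefficient scales the magnitude and shifts the phase,
        # i.e. it applies the amplitude and phase response differences.
        left[k] = x * h_left(d, k)
        right[k] = x * h_right(d, k)
    return left, right
```

Inverse-transforming `left` and `right` then yields the first and second audio signals for the two ears.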
  2. The audio signal processing method according to claim 1, wherein the sound-pickup response difference information comprises a first transfer coefficient and a second transfer coefficient; the first transfer coefficient is determined according to the sound source direction and frequency of the audio component and a first correspondence; the second transfer coefficient is determined according to the sound source direction and frequency of the audio component and a second correspondence; and the difference between the first correspondence and the second correspondence characterizes the amplitude response difference and/or phase response difference between the first sound-receiving organ and the second sound-receiving organ for an audio component of the same frequency arriving from the same direction.
  3. The audio signal processing method according to claim 2, wherein the first correspondence is obtained by measuring the sound-pickup response of the first sound-receiving organ to test signals of various frequencies emitted by sound sources in different directions; and
    the second correspondence is obtained by measuring the sound-pickup response of the second sound-receiving organ to test signals of various frequencies emitted by sound sources in different directions.
  4. The audio signal processing method according to claim 2, wherein the first correspondence and the second correspondence are determined according to a propagation model from a sound source to a person's two ears, the propagation model being established according to preselected characteristic parameters.
  5. The audio signal processing method according to claim 4, wherein the preselected characteristic parameters comprise one or more of the following: inter-ear distance, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  6. The audio signal processing method according to claim 1, wherein the sound source direction corresponding to each audio component is determined by a sound source localization algorithm using audio signals collected by at least two microphones.
  7. The audio signal processing method according to claim 6, wherein the sound source direction corresponding to an audio component is determined according to the audio components of the corresponding frequency in the audio signals collected by the respective microphones.
  8. The audio signal processing method according to claim 6, wherein the sound source localization algorithm is any one of the following: a beamforming algorithm, a time-difference-of-arrival estimation algorithm, or a differential microphone array algorithm.
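A minimal sketch of the time-difference-of-arrival option named in claim 8, assuming a two-microphone pair, a far-field source, and free-field propagation; the function names are illustrative, and practical systems typically use more robust variants (e.g. generalized cross-correlation) than this plain cross-correlation peak.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def estimate_tdoa(sig_a, sig_b, fs):
    """Delay of sig_a relative to sig_b, in seconds, from the peak of the
    cross-correlation (positive: the sound reached microphone b first)."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)
    return lag / fs

def tdoa_to_azimuth(tdoa, mic_spacing):
    """Far-field direction of arrival implied by the time difference."""
    s = np.clip(tdoa * SPEED_OF_SOUND / mic_spacing, -1.0, 1.0)
    return float(np.arcsin(s))
```

With the delay in hand, `arcsin(tdoa * c / d)` maps it to an azimuth for a pair spaced `d` metres apart; a zero delay corresponds to a broadside source.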
  9. The audio signal processing method according to claim 6, wherein the audio signal to be processed is obtained from the audio signals collected by the microphones.
  10. The audio signal processing method according to claim 9, wherein the audio signal to be processed is the audio signal with the highest signal-to-noise ratio among the audio signals collected by the respective microphones.
  11. The audio signal processing method according to claim 9, wherein the audio component at each frequency of the audio signal to be processed is a linear combination of the audio components of the corresponding frequency in the audio signals collected by the respective microphones.
  12. The audio signal processing method according to claim 6, wherein the audio signal to be processed is obtained from an audio signal collected by a microphone other than the at least two microphones.
  13. The audio signal processing method according to claim 2, wherein modulating the audio component of the corresponding frequency according to the first transfer coefficient yields a new first audio component at each frequency, and the first audio signal is obtained by transforming the first audio components of the respective frequencies; and
    modulating the audio component of the corresponding frequency according to the second transfer coefficient yields a new second audio component at each frequency, and the second audio signal is obtained by transforming the second audio components of the respective frequencies.
  14. The audio signal processing method according to claim 13, wherein the audio signal to be processed is divided into a plurality of audio frames, and the audio components are the audio components contained in an audio frame;
    transforming the first audio components of the respective frequencies yields a new first audio frame, and the first audio signal is synthesized from the respective first audio frames; and
    transforming the second audio components of the respective frequencies yields a new second audio frame, and the second audio signal is synthesized from the respective second audio frames.
  15. The audio signal processing method according to claim 14, wherein the number of sampling points contained in each audio frame is a power of two.
  16. The audio signal processing method according to claim 15, wherein the audio components corresponding to the respective frequencies contained in an audio frame are determined by a fast Fourier transform (FFT).
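The framing and FFT analysis of claims 14 through 16 can be sketched as follows. The frame length of 1024 samples (a power of two, as claim 15 suggests) and the 50% hop are illustrative choices, not values taken from the application.

```python
import numpy as np

FRAME_LEN = 1024      # power of two, so the FFT is efficient
HOP = FRAME_LEN // 2  # 50% overlap between consecutive frames

def frames_to_spectra(signal):
    """Split the signal into overlapping power-of-two frames and return the
    audio components per frequency of each frame via a real FFT."""
    spectra = []
    for start in range(0, len(signal) - FRAME_LEN + 1, HOP):
        frame = signal[start:start + FRAME_LEN]
        spectra.append(np.fft.rfft(frame))  # one complex bin per frequency
    return spectra
```

Each spectrum can be modulated per bin and inverse-transformed with `np.fft.irfft` to produce the new first or second audio frame of the claims.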
  17. The audio signal processing method according to claim 14, wherein before the audio components corresponding to the respective frequencies contained in an audio frame are determined, the method further comprises:
    modulating the audio frame into a periodic signal.
  18. The audio signal processing method according to claim 17, wherein modulating the audio frame into a periodic signal comprises:
    applying an analysis window to the audio frame.
  19. The audio signal processing method according to claim 14, wherein synthesizing the first audio signal from the respective first audio frames comprises:
    processing the respective first audio frames by the overlap-add method and combining them to obtain the first audio signal; and
    synthesizing the second audio signal from the respective second audio frames comprises:
    processing the respective second audio frames by the overlap-add method and combining them to obtain the second audio signal.
  20. The audio signal processing method according to claim 19, wherein before the processing by the overlap-add method, the method further comprises:
    eliminating amplitude distortion at the two ends of the first audio frame and of the second audio frame, respectively.
  21. The audio signal processing method according to claim 20, wherein eliminating the amplitude distortion at the two ends of the first audio frame and of the second audio frame comprises:
    applying a synthesis window to the first audio frame and to the second audio frame, respectively.
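Claims 17 through 21 describe analysis windowing, synthesis windowing, and overlap-add reconstruction. One common concrete choice (an assumption for this sketch, not stated in the application) is a square-root periodic Hann window used for both the analysis and the synthesis window: at 50% overlap the two windows multiply to a periodic Hann window, whose shifted copies sum to one, so overlap-add reconstructs interior samples without amplitude distortion at the frame edges.

```python
import numpy as np

def sqrt_periodic_hann(n):
    """Square-root periodic Hann window; the analysis and synthesis windows
    multiply to a periodic Hann window, which sums to one at 50% overlap."""
    return np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(n) / n))

def overlap_add(frames, hop):
    """Combine processed (and synthesis-windowed) frames by overlap-add."""
    frame_len = len(frames[0])
    out = np.zeros((len(frames) - 1) * hop + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
    return out
```

With identity processing between the analysis and synthesis windows, the overlap-added output matches the input signal everywhere except the first and last half frame, which receive only one window's contribution.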
  22. The audio signal processing method according to claim 1, wherein the sound source direction comprises an azimuth angle and/or a pitch angle.
  23. The audio signal processing method according to claim 1, further comprising:
    determining a sound source distance of each audio component, the sound source distance being used to determine the sound-pickup response difference information of the audio component of the corresponding frequency.
  24. The audio signal processing method according to claim 1, further comprising:
    acquiring specified characteristic parameters of the user's two ears and their surroundings, the specified characteristic parameters being used to determine the sound-pickup response difference information of the audio component of the corresponding frequency.
  25. The audio signal processing method according to claim 24, wherein the specified characteristic parameters are obtained by performing image recognition on the user.
  26. The audio signal processing method according to claim 24, wherein the specified characteristic parameters comprise one or more of the following: inter-ear distance, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  27. An audio processing apparatus, comprising:
    a memory configured to store a computer program; and
    a processor configured to call the computer program, wherein when the computer program is executed by the processor, the following steps are implemented:
    acquiring an audio signal to be processed, the audio signal to be processed comprising audio components at a plurality of frequencies;
    determining a sound source direction corresponding to each of the plurality of audio components;
    determining, according to the sound source direction, sound-pickup response difference information corresponding to each audio component, the sound-pickup response difference information comprising an amplitude response difference and/or a phase response difference between a first sound-receiving organ and a second sound-receiving organ for an audio component of the same frequency arriving from the same direction; and
    modulating, according to the sound-pickup response difference information, the audio component of the corresponding frequency in the audio signal to be processed, so as to generate a first audio signal for playback at the first sound-receiving organ and a second audio signal for playback at the second sound-receiving organ.
  28. The audio processing apparatus according to claim 27, wherein the sound-pickup response difference information comprises a first transfer coefficient and a second transfer coefficient; the first transfer coefficient is determined according to the sound source direction and frequency of the audio component and a first correspondence; the second transfer coefficient is determined according to the sound source direction and frequency of the audio component and a second correspondence; and the difference between the first correspondence and the second correspondence characterizes the amplitude response difference and/or phase response difference between the first sound-receiving organ and the second sound-receiving organ for an audio component of the same frequency arriving from the same direction.
  29. The audio processing apparatus according to claim 28, wherein the first correspondence is obtained by measuring the sound-pickup response of the first sound-receiving organ to test signals of various frequencies emitted by sound sources in different directions; and
    the second correspondence is obtained by measuring the sound-pickup response of the second sound-receiving organ to test signals of various frequencies emitted by sound sources in different directions.
  30. The audio processing apparatus according to claim 28, wherein the first correspondence and the second correspondence are determined according to a propagation model from a sound source to a person's two ears, the propagation model being established according to preselected characteristic parameters.
  31. The audio processing apparatus according to claim 30, wherein the preselected characteristic parameters comprise one or more of the following: inter-ear distance, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  32. The audio processing apparatus according to claim 27, wherein the sound source direction corresponding to each audio component is determined by a sound source localization algorithm using audio signals collected by at least two microphones.
  33. The audio processing apparatus according to claim 32, wherein the sound source direction corresponding to an audio component is determined according to the audio components of the corresponding frequency in the audio signals collected by the respective microphones.
  34. The audio processing apparatus according to claim 32, wherein the sound source localization algorithm is any one of the following: a beamforming algorithm, a time-difference-of-arrival estimation algorithm, or a differential microphone array algorithm.
  35. The audio processing apparatus according to claim 32, wherein the audio signal to be processed is obtained from the audio signals collected by the microphones.
  36. The audio processing apparatus according to claim 35, wherein the audio signal to be processed is the audio signal with the highest signal-to-noise ratio among the audio signals collected by the respective microphones.
  37. The audio processing apparatus according to claim 35, wherein the audio component at each frequency of the audio signal to be processed is a linear combination of the audio components of the corresponding frequency in the audio signals collected by the respective microphones.
  38. The audio processing apparatus according to claim 32, wherein the audio signal to be processed is obtained from an audio signal collected by a microphone other than the at least two microphones.
  39. The audio processing apparatus according to claim 28, wherein modulating the audio component of the corresponding frequency according to the first transfer coefficient yields a new first audio component at each frequency, and the first audio signal is obtained by transforming the first audio components of the respective frequencies; and
    modulating the audio component of the corresponding frequency according to the second transfer coefficient yields a new second audio component at each frequency, and the second audio signal is obtained by transforming the second audio components of the respective frequencies.
  40. The audio processing apparatus according to claim 39, wherein the audio signal to be processed is divided into a plurality of audio frames, and the audio components are the audio components contained in an audio frame;
    transforming the first audio components of the respective frequencies yields a new first audio frame, and the first audio signal is synthesized from the respective first audio frames; and
    transforming the second audio components of the respective frequencies yields a new second audio frame, and the second audio signal is synthesized from the respective second audio frames.
  41. The audio processing apparatus according to claim 40, wherein the number of sampling points contained in each audio frame is a power of two.
  42. The audio processing apparatus according to claim 41, wherein the audio components corresponding to the respective frequencies contained in an audio frame are determined by a fast Fourier transform (FFT).
  43. The audio processing apparatus according to claim 40, wherein the processor is further configured to modulate the audio frame into a periodic signal before the audio components corresponding to the respective frequencies contained in the audio frame are determined.
  44. The audio processing apparatus according to claim 43, wherein when modulating the audio frame into a periodic signal, the processor is specifically configured to apply an analysis window to the audio frame.
  45. The audio processing apparatus according to claim 40, wherein when synthesizing the first audio signal from the respective first audio frames, the processor is specifically configured to process the respective first audio frames by the overlap-add method and combine them to obtain the first audio signal; and
    when synthesizing the second audio signal from the respective second audio frames, the processor is specifically configured to process the respective second audio frames by the overlap-add method and combine them to obtain the second audio signal.
  46. The audio processing apparatus according to claim 45, wherein the processor is further configured to eliminate amplitude distortion at the two ends of the first audio frame and of the second audio frame, respectively, before the processing by the overlap-add method.
  47. The audio processing apparatus according to claim 46, wherein when eliminating the amplitude distortion at the two ends of the first audio frame and of the second audio frame, the processor is specifically configured to apply a synthesis window to the first audio frame and to the second audio frame, respectively.
  48. The audio processing apparatus according to claim 27, wherein the sound source direction comprises an azimuth angle and/or a pitch angle.
  49. The audio processing apparatus according to claim 27, wherein the processor is further configured to determine a sound source distance of each audio component, the sound source distance being used to determine the sound-pickup response difference information of the audio component of the corresponding frequency.
  50. The audio processing apparatus according to claim 27, wherein the processor is further configured to acquire specified characteristic parameters of the user's two ears and their surroundings, the specified characteristic parameters being used to determine the sound-pickup response difference information of the audio component of the corresponding frequency.
  51. The audio processing apparatus according to claim 50, wherein the specified characteristic parameters are obtained by performing image recognition on the user.
  52. The audio processing apparatus according to claim 50, wherein the specified characteristic parameters comprise one or more of the following: inter-ear distance, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  53. A recording device, comprising: at least two microphones, a memory, and a processor;
    the microphones being configured to collect audio signals;
    the memory being configured to store a computer program; and
    the processor being configured to call the computer program, wherein when the processor executes the computer program, the following steps are implemented:
    acquiring an audio signal to be processed, the audio signal to be processed comprising audio components at a plurality of frequencies;
    determining, by a sound source localization algorithm using the audio signals collected by the microphones, a sound source direction corresponding to each of the plurality of audio components;
    determining, according to the sound source direction, sound-pickup response difference information corresponding to each audio component, the sound-pickup response difference information comprising an amplitude response difference and/or a phase response difference between a first sound-receiving organ and a second sound-receiving organ for an audio component of the same frequency arriving from the same direction; and
    modulating, according to the sound-pickup response difference information, the audio component of the corresponding frequency in the audio signal to be processed, so as to generate a first audio signal for playback at the first sound-receiving organ and a second audio signal for playback at the second sound-receiving organ.
  54. 根据权利要求53所述的录音设备,其特征在于,所述收音响应差异信息包括第一传递系数与第二传递系数,所述第一传递系数是根据所述音频分量对应的声源方向、频率和第一对应关系确定的,所述第二传递系数是根据所述音频分量对应的声源方向、频率和第二对应关系确定的,所述第一对应关系与所述第二对应关系之间的差别用于表征所述第一收音器官和所述第二收音器官对同一方向传来的同一频率的音频分量的幅度响应差异和/或相位响应差异。The recording device according to claim 53, wherein the difference information of the radio response includes a first transfer coefficient and a second transfer coefficient, and the first transfer coefficient is based on the sound source direction and frequency corresponding to the audio component. And the first correspondence is determined, the second transfer coefficient is determined according to the sound source direction and frequency corresponding to the audio component, and a second correspondence between the first correspondence and the second correspondence The difference of is used to characterize the difference in amplitude response and/or phase response of the first sound-receiving organ and the second sound-receiving organ to the audio components of the same frequency transmitted in the same direction.
  55. 根据权利要求54所述的录音设备,其特征在于,所述第一对应关系是针对不同方向的声源发出的各频率的测试信号分别测量所述第一收音器官的收音响应得到的;The recording device according to claim 54, wherein the first corresponding relationship is obtained by measuring the sound reception response of the first sound receiving organ for test signals of each frequency emitted by sound sources in different directions;
    所述第二对应关系是针对不同方向的声源发出的各频率的测试信号分别测量所述第二收音器官的收音响应得到的。The second correspondence relationship is obtained by measuring the sound reception response of the second sound receiving organ for test signals of various frequencies emitted by sound sources in different directions.
  56. The recording device according to claim 54, wherein the first correspondence and the second correspondence are determined according to a propagation model of sound from the sound source to the two human ears, the propagation model being built from preselected characteristic parameters.
  57. The recording device according to claim 56, wherein the preselected characteristic parameters comprise one or more of the following: inter-ear distance, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
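One plausible instance of the claim-56 propagation model, built from only the first parameter in claim 57 (inter-ear distance), is a rigid spherical-head model with Woodworth's interaural-time-difference formula. The sketch below is illustrative, not the application's model; a practical model would also add frequency-dependent magnitude (level-difference) terms:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, at roughly room temperature

def woodworth_itd(azimuth_rad, inter_ear_distance=0.18):
    """Interaural time difference (seconds) from Woodworth's
    spherical-head formula: tau = (r/c) * (sin(theta) + theta)."""
    r = inter_ear_distance / 2.0
    return (r / SPEED_OF_SOUND) * (np.sin(azimuth_rad) + azimuth_rad)

def transfer_coefficients(azimuth_rad, freq_hz, inter_ear_distance=0.18):
    """First/second (near-/far-ear) transfer coefficients as pure phase
    factors derived from the ITD, the delay split symmetrically."""
    itd = woodworth_itd(azimuth_rad, inter_ear_distance)
    h_near = np.exp(+1j * np.pi * freq_hz * itd)  # near ear advanced
    h_far = np.exp(-1j * np.pi * freq_hz * itd)   # far ear delayed
    return h_near, h_far
```

For a source straight ahead the ITD is zero; for a source at 90° azimuth and an 18 cm inter-ear distance it is roughly 0.67 ms, consistent with published binaural-hearing figures.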
  58. The recording device according to claim 53, wherein the sound source direction corresponding to the audio component is determined according to the audio components of the corresponding frequency in the audio signals collected by each of the microphones.
  59. The recording device according to claim 53, wherein the sound source localization algorithm is any one of the following: a beamforming algorithm, a time-difference-of-arrival estimation algorithm, or a differential microphone array algorithm.
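Of the three localization options in claim 59, time-difference-of-arrival estimation is the simplest to illustrate. A minimal sketch (an assumption-laden toy, not the application's algorithm): the lag between two microphone channels is found at the peak of their cross-correlation, then converted to an arrival angle under a far-field assumption.

```python
import numpy as np

def tdoa_samples(x_ref, x_delayed):
    """Estimate the lag (in samples) of x_delayed relative to x_ref by
    locating the peak of their full cross-correlation."""
    corr = np.correlate(x_delayed, x_ref, mode="full")
    return int(np.argmax(corr)) - (len(x_ref) - 1)

def azimuth_from_tdoa(lag, fs, mic_spacing, c=343.0):
    """Convert a sample lag into an arrival angle (degrees) for a
    two-microphone pair, far-field: sin(theta) = c * tau / d."""
    tau = lag / fs
    s = np.clip(c * tau / mic_spacing, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))
```

Production systems typically use a generalized cross-correlation variant (e.g. GCC-PHAT) for robustness to reverberation, but the peak-picking step is the same.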
  60. The recording device according to claim 53, wherein the to-be-processed audio signal is obtained from the audio signals collected by the microphones.
  61. The recording device according to claim 60, wherein the to-be-processed audio signal is the audio signal with the highest signal-to-noise ratio among the audio signals collected by the microphones.
  62. The recording device according to claim 60, wherein the audio components of each frequency of the to-be-processed audio signal are linear combinations of the audio components of the corresponding frequencies in the audio signals collected by the microphones.
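The per-frequency linear combination of claim 62 can be sketched as a single weighted sum over microphones, bin by bin. The weights here are placeholders; in practice they could be, for example, the steering phase terms of a frequency-domain delay-and-sum beamformer:

```python
import numpy as np

def combine_bins(mic_spectra, weights):
    """Form each frequency bin of the to-be-processed signal as a linear
    combination of the corresponding bins across microphones.

    mic_spectra: complex array, shape (n_mics, n_bins) -- per-mic FFT bins
    weights:     complex array, shape (n_mics, n_bins) -- combination weights
    """
    return np.sum(weights * mic_spectra, axis=0)
```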
  63. The recording device according to claim 53, wherein the to-be-processed audio signal is obtained from audio signals collected by microphones other than the aforementioned microphones.
  64. The recording device according to claim 63, wherein the recording device is connected to another external recording device, and the to-be-processed audio signal is obtained from the audio signal collected by the other recording device.
  65. The recording device according to claim 54, wherein modulating the audio components of the corresponding frequencies according to the first transfer coefficient yields new first audio components of each frequency, and the first audio signal is obtained by transforming the first audio components of each frequency;
    and modulating the audio components of the corresponding frequencies according to the second transfer coefficient yields new second audio components of each frequency, and the second audio signal is obtained by transforming the second audio components of each frequency.
  66. The recording device according to claim 65, wherein the to-be-processed audio signal is divided into a plurality of audio frames, and the audio components are the audio components contained in an audio frame;
    transforming the first audio components of each frequency yields a new first audio frame, and the first audio signal is synthesized from the first audio frames;
    and transforming the second audio components of each frequency yields a new second audio frame, and the second audio signal is synthesized from the second audio frames.
  67. The recording device according to claim 66, wherein the number of sampling points contained in an audio frame is a power of two.
  68. The recording device according to claim 67, wherein the audio components corresponding to each frequency contained in the audio frame are determined by a fast Fourier transform (FFT).
  69. The recording device according to claim 66, wherein the processor is further configured to, before determining the audio components corresponding to each frequency contained in the audio frame, modulate the audio frame into a periodic signal.
  70. The recording device according to claim 69, wherein, when modulating the audio frame into a periodic signal, the processor is specifically configured to apply an analysis window to the audio frame.
  71. The recording device according to claim 66, wherein, when synthesizing the first audio signal from the first audio frames, the processor is specifically configured to combine the first audio frames after overlap-add processing to obtain the first audio signal;
    and, when synthesizing the second audio signal from the second audio frames, the processor is specifically configured to combine the second audio frames after overlap-add processing to obtain the second audio signal.
  72. The recording device according to claim 71, wherein the processor is further configured to, before the overlap-add processing, eliminate amplitude distortion at the two ends of the first audio frames and of the second audio frames, respectively.
  73. The recording device according to claim 72, wherein, when eliminating the amplitude distortion at the two ends of the first audio frames and the second audio frames, the processor is specifically configured to apply a synthesis window to the first audio frames and the second audio frames, respectively.
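Claims 66-73 together describe a standard short-time Fourier transform analysis-synthesis chain: power-of-two framing, an analysis window, an FFT, per-bin modulation, an inverse FFT, a synthesis window, and overlap-add. A minimal sketch, assuming 50% overlap and square-root-Hann analysis and synthesis windows (one common choice, not stated in the application), under which the overlapped window products sum to one and the chain reconstructs its input away from the first and last half-frame:

```python
import numpy as np

def hann_periodic(n):
    # Periodic Hann window: copies shifted by n/2 sum exactly to 1.
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * np.arange(n) / n))

def stft_modulate_ola(x, frame=1024, coeffs=None):
    """Frame -> analysis window -> FFT -> optional per-bin modulation ->
    inverse FFT -> synthesis window -> overlap-add."""
    hop = frame // 2
    win = np.sqrt(hann_periodic(frame))        # analysis == synthesis window
    n_frames = 1 + (len(x) - frame) // hop
    y = np.zeros(len(x))
    for i in range(n_frames):
        start = i * hop
        seg = x[start:start + frame] * win     # analysis window (claim 70)
        spec = np.fft.rfft(seg)                # FFT (claim 68)
        if coeffs is not None:
            spec = spec * coeffs               # per-bin modulation (claim 65)
        rec = np.fft.irfft(spec, frame) * win  # synthesis window (claim 73)
        y[start:start + frame] += rec          # overlap-add (claims 71-72)
    return y
```

The synthesis window tapers the two ends of each reconstructed frame toward zero, which is one way to realize the end-distortion elimination of claims 72-73 before the frames are overlap-added.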
  74. The recording device according to claim 53, wherein the sound source direction comprises an azimuth angle and/or a pitch angle.
  75. The recording device according to claim 53, wherein the processor is further configured to determine a sound source distance of each audio component, the sound source distance being used to determine the sound-reception response difference information of the audio component of the corresponding frequency.
  76. The recording device according to claim 53, wherein the processor is further configured to obtain specified characteristic parameters of the user's two ears and the areas around them, the specified characteristic parameters being used to determine the sound-reception response difference information of the audio components of the corresponding frequencies.
  77. The recording device according to claim 76, wherein the specified characteristic parameters are obtained by performing image recognition on the user.
  78. The recording device according to claim 76, wherein the specified characteristic parameters include one or more of the following: inter-ear distance, auricle characteristic parameters, ear canal characteristic parameters, shoulder characteristic parameters, cheek characteristic parameters, and hair characteristic parameters.
  79. The recording device according to claim 53, wherein the recording device is specifically any one of the following: a mobile phone, a still camera, a video camera, an action camera, a gimbal camera, a speaker, a head-mounted VR device, a surveillance device, a voice recorder, or a microphone.
  80. A computer-readable storage medium having a computer program stored thereon, wherein, when the computer program is executed by a processor, the audio signal processing method according to any one of claims 1 to 26 is implemented.
PCT/CN2020/085719 2020-04-20 2020-04-20 Audio signal processing method, audio processing device, and recording apparatus WO2021212287A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080038422.1A CN113875265A (en) 2020-04-20 2020-04-20 Audio signal processing method, audio processing device and recording equipment
PCT/CN2020/085719 WO2021212287A1 (en) 2020-04-20 2020-04-20 Audio signal processing method, audio processing device, and recording apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/085719 WO2021212287A1 (en) 2020-04-20 2020-04-20 Audio signal processing method, audio processing device, and recording apparatus

Publications (1)

Publication Number Publication Date
WO2021212287A1 2021-10-28

Family

ID=78271023

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/085719 WO2021212287A1 (en) 2020-04-20 2020-04-20 Audio signal processing method, audio processing device, and recording apparatus

Country Status (2)

Country Link
CN (1) CN113875265A (en)
WO (1) WO2021212287A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024124561A1 (en) * 2022-12-16 2024-06-20 北京小米移动软件有限公司 Analysis method and apparatus for audio collection

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101155440A (en) * 2007-09-17 2008-04-02 昊迪移通(北京)技术有限公司 Three-dimensional around sound effect technology aiming at double-track audio signal
CN102456351A (en) * 2010-10-14 2012-05-16 清华大学 Voice enhancement system
US20130343551A1 (en) * 2008-02-20 2013-12-26 Samsung Electronics Co., Ltd. Method and apparatus for encoding/decoding stereo audio
CN106797525A (en) * 2014-08-13 2017-05-31 三星电子株式会社 For generating the method and apparatus with playing back audio signal
CN106973355A (en) * 2016-01-14 2017-07-21 腾讯科技(深圳)有限公司 surround sound implementation method and device
CN107889044A (en) * 2017-12-19 2018-04-06 维沃移动通信有限公司 The processing method and processing device of voice data
CN109410912A (en) * 2018-11-22 2019-03-01 深圳市腾讯信息技术有限公司 Method, apparatus, electronic equipment and the computer readable storage medium of audio processing
CN110082724A (en) * 2019-05-31 2019-08-02 浙江大华技术股份有限公司 A kind of sound localization method, device and storage medium
CN110972053A (en) * 2019-11-25 2020-04-07 腾讯音乐娱乐科技(深圳)有限公司 Method and related apparatus for constructing a listening scene

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20050060789A (en) * 2003-12-17 2005-06-22 삼성전자주식회사 Apparatus and method for controlling virtual sound
JP2005223713A (en) * 2004-02-06 2005-08-18 Sony Corp Apparatus and method for acoustic reproduction
JP4580210B2 (en) * 2004-10-19 2010-11-10 ソニー株式会社 Audio signal processing apparatus and audio signal processing method
JP5867799B2 (en) * 2011-06-23 2016-02-24 国立研究開発法人産業技術総合研究所 Sound collecting / reproducing apparatus, program, and sound collecting / reproducing method
CN106358135A (en) * 2016-10-14 2017-01-25 广州酷狗计算机科技有限公司 Stereo reducing method and device
CN108156561B (en) * 2017-12-26 2020-08-04 广州酷狗计算机科技有限公司 Audio signal processing method and device and terminal


Also Published As

Publication number Publication date
CN113875265A (en) 2021-12-31

Similar Documents

Publication Publication Date Title
US10382849B2 (en) Spatial audio processing apparatus
KR101547035B1 (en) Three-dimensional sound capturing and reproducing with multi-microphones
JP5533248B2 (en) Audio signal processing apparatus and audio signal processing method
JP4343845B2 (en) Audio data processing method and sound collector for realizing the method
JP6613078B2 (en) Signal processing apparatus and control method thereof
US11122381B2 (en) Spatial audio signal processing
US10979846B2 (en) Audio signal rendering
JP5611970B2 (en) Converter and method for converting audio signals
CN112492445B (en) Method and processor for realizing signal equalization by using ear-covering type earphone
KR20220038478A (en) Apparatus, method or computer program for processing a sound field representation in a spatial transformation domain
US20200029153A1 (en) Audio signal processing method and device
US20130243201A1 (en) Efficient control of sound field rotation in binaural spatial sound
WO2021212287A1 (en) Audio signal processing method, audio processing device, and recording apparatus
WO2020036077A1 (en) Signal processing device, signal processing method, and program
Shabtai et al. Spherical array beamforming for binaural sound reproduction
JP5163685B2 (en) Head-related transfer function measurement method, head-related transfer function convolution method, and head-related transfer function convolution device
WO2018066376A1 (en) Signal processing device, method, and program
WO2023173285A1 (en) Audio processing method and apparatus, electronic device, and computer-readable storage medium
Ahrens et al. Authentic auralization of acoustic spaces based on spherical microphone array recordings
WO2023085186A1 (en) Information processing device, information processing method, and information processing program
Bai et al. An integrated analysis-synthesis array system for spatial sound fields
US10659902B2 (en) Method and system of broadcasting a 360° audio signal
CN116261086A (en) Sound signal processing method, device, equipment and storage medium
JP2022034267A (en) Binaural reproduction device and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20932164

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20932164

Country of ref document: EP

Kind code of ref document: A1