CN104064191B - Sound mixing method and device - Google Patents

Sound mixing method and device

Info

Publication number: CN104064191B (other versions: CN104064191A)
Application number: CN201410256380.5A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: sound, channels, audio data, source, frequency
Legal status: Active
Inventor: Tian Biao (田彪)
Current Assignee: Shenzhen Taile Culture Technology Co ltd
Original Assignee: Beijing Yinzhibang Culture Technology Co ltd
Application filed by Beijing Yinzhibang Culture Technology Co ltd; priority to CN201410256380.5A

Landscapes

  • Stereophonic System (AREA)

Abstract

The invention provides a sound mixing method and device. In an embodiment of the invention, the original audio data of each of at least two acquired sound sources is transformed into the frequency domain, yielding the frequency-domain data corresponding to each source's original audio data. At least two preset azimuth settings are then used to filter, respectively, each source's frequency-domain data, yielding the filtered data corresponding to each source's original audio data, so that the filtered data of all the sources can be mixed. Because the audio signals of the sources to be mixed are filtered using designated azimuth settings, the sound image of each source can be placed at a different position rather than all at one location; each source therefore remains acoustically clear after mixing, which improves the quality of the mixed audio.

Description

Sound mixing method and device
【Technical field】
The present invention relates to audio signal processing techniques, and more particularly to a sound mixing method and device.
【Background technology】
With the development of communication technology, platforms for multi-party interaction have emerged, such as video conferencing and live-streaming platforms. Audio processing on such platforms must mix multiple audio signals so that they sound like a single stream, with the effect of hearing several people speak at once. In the prior art, this is typically done by linear superposition: each audio signal is first quantized, and the resulting data are then summed sample by sample. The quality of the audio mixed this way is not high.
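For contrast, the prior-art linear-superposition approach described above can be sketched as follows. This is a minimal illustration, not code from the patent; the 16-bit sample range is an assumption:

```python
import numpy as np

def mix_linear(sources):
    """Prior-art mixing: quantize each signal, then add all the data together."""
    total = np.sum(np.stack(sources).astype(np.int64), axis=0)
    # Summing several full-scale 16-bit signals overflows the sample range,
    # so the result must be clipped -- one reason this simple approach
    # yields low-quality mixed audio.
    return np.clip(total, -32768, 32767).astype(np.int16)

a = np.array([20000, -20000, 15000], dtype=np.int16)
b = np.array([20000, -20000, 15000], dtype=np.int16)
print(mix_linear([a, b]))  # the +/-40000 sums clip to the int16 limits
```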
【Summary of the invention】
Several aspects of the present invention provide a sound mixing method and device that improve the quality of mixed audio.
In one aspect, the present invention provides a sound mixing method, including:
obtaining the original audio data of at least two sound sources;
performing frequency-domain transform processing on the original audio data of each of the at least two sound sources, to obtain the frequency-domain data corresponding to each source's original audio data; and
using at least two preset azimuth settings to filter, respectively, the frequency-domain data corresponding to each source's original audio data, to obtain the filtered data corresponding to each source's original audio data, and mixing the filtered data corresponding to each source's original audio data.
In the aspect above, and in any possible implementation thereof, a further implementation is provided in which obtaining the original audio data of at least two sound sources includes:
parsing the header of a target audio file to determine the target audio file's channel count, decoding the data blocks of the target audio file to obtain a source's original audio data, and obtaining, from the channel count and the source's original audio data, the channel audio data corresponding to each target channel; and/or
sampling, quantizing, and encoding the audio signal of at least one target channel, to obtain the channel audio data corresponding to each of the at least one target channels.
In the aspect above, and in any possible implementation thereof, a further implementation is provided in which performing frequency-domain transform processing on the original audio data of each of the at least two sound sources includes:
determining at least two target channels to be mixed, to serve as mixing channels;
framing the channel audio data corresponding to each mixing channel, to obtain at least one frame of audio data for each mixing channel; and
performing frequency-domain transform processing on each mixing channel's frames of audio data, to obtain the frequency-domain data corresponding to each mixing channel.
In the aspect above, and in any possible implementation thereof, a further implementation is provided in which the filtering and mixing steps include:
obtaining each mixing channel's frequency response parameter from that channel's azimuth setting;
obtaining each mixing channel's filtered data from that channel's frequency response parameter and frequency-domain data; and
mixing the filtered data of the mixing channels.
In the aspect above, and in any possible implementation thereof, a further implementation is provided in which the frequency-domain transform is the fast Fourier transform (FFT), and each mixing channel's frequency response parameter is obtained from its azimuth via the formula t(k, m) = round(N × f_k × τ(θ_m) + 0.5), where f_k = k × f_s / N and τ(θ_m) = 0.2 × sin(θ_m) / v, and where:
k is the frequency-bin index, with range [0, N-1];
t(k, m) is the frequency response parameter value at the k-th bin;
f_s is the sample rate;
f_k is the frequency of the k-th bin;
N is the number of FFT points;
θ_m is the azimuth of the m-th mixing channel, m = 1, 2, ..., M, where M is the number of mixing channels;
v is the speed of sound, 340 m/s;
round(x) denotes the integer closest to x.
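The frequency response parameter above can be computed directly from the formula; the sketch below assumes the azimuth θ_m is in radians and uses example values for N and f_s, since the patent fixes neither:

```python
import math

def freq_response_param(k, theta_m, N=1024, fs=44100, v=340.0):
    """t(k, m) = round(N * f_k * tau(theta_m) + 0.5), per the patent's formula."""
    f_k = k * fs / N                    # frequency of the k-th FFT bin
    tau = 0.2 * math.sin(theta_m) / v   # delay for azimuth theta_m
    return int(round(N * f_k * tau + 0.5))

# Example: bin k = 10 at a 90-degree (pi/2 radian) azimuth.
print(freq_response_param(10, math.pi / 2))  # -> 260
```

Note that Python's round applies banker's rounding at exact .5 ties, whereas the patent asks for the nearest integer; for non-tie values such as those above the two agree.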
In another aspect, the present invention provides a sound mixing device, including:
an acquiring unit, for obtaining the original audio data of at least two sound sources;
a converter unit, for performing frequency-domain transform processing on the original audio data of each of the at least two sound sources, to obtain the frequency-domain data corresponding to each source's original audio data; and
a downmixing unit, for using at least two preset azimuth settings to filter, respectively, the frequency-domain data corresponding to each source's original audio data, to obtain the filtered data corresponding to each source's original audio data, and for mixing the filtered data corresponding to each source's original audio data.
In the aspect above, and in any possible implementation thereof, a further implementation is provided in which the acquiring unit is specifically configured to:
parse the header of a target audio file to determine the target audio file's channel count, decode the data blocks of the target audio file to obtain a source's original audio data, and obtain, from the channel count and the source's original audio data, the channel audio data corresponding to each target channel; and/or
sample, quantize, and encode the audio signal of at least one target channel, to obtain the channel audio data corresponding to each of the at least one target channels.
In the aspect above, and in any possible implementation thereof, a further implementation is provided in which the converter unit is configured to:
determine at least two target channels to be mixed, to serve as mixing channels;
frame the channel audio data corresponding to each mixing channel, to obtain at least one frame of audio data for each mixing channel; and
perform frequency-domain transform processing on each mixing channel's frames of audio data, to obtain the frequency-domain data corresponding to each mixing channel.
In the aspect above, and in any possible implementation thereof, a further implementation is provided in which the downmixing unit is specifically configured to:
obtain each mixing channel's frequency response parameter from that channel's azimuth setting;
obtain each mixing channel's filtered data from that channel's frequency response parameter and frequency-domain data; and
mix the filtered data of the mixing channels.
In the aspect above, and in any possible implementation thereof, a further implementation is provided in which the frequency-domain transform is the fast Fourier transform (FFT), and the downmixing unit is specifically configured to obtain each mixing channel's frequency response parameter from its azimuth via the formula t(k, m) = round(N × f_k × τ(θ_m) + 0.5), where f_k = k × f_s / N and τ(θ_m) = 0.2 × sin(θ_m) / v, and where:
k is the frequency-bin index, with range [0, N-1];
t(k, m) is the frequency response parameter value at the k-th bin;
f_s is the sample rate;
f_k is the frequency of the k-th bin;
N is the number of FFT points;
θ_m is the azimuth of the m-th mixing channel, m = 1, 2, ..., M, where M is the number of mixing channels;
v is the speed of sound, 340 m/s;
round(x) denotes the integer closest to x.
As can be seen from the technical solutions above, embodiments of the present invention perform frequency-domain transform processing on the original audio data of each of at least two acquired sound sources to obtain the frequency-domain data corresponding to each source's original audio data, and then use at least two preset azimuth settings to filter, respectively, the frequency-domain data corresponding to each source's original audio data, obtaining the filtered data corresponding to each source's original audio data, so that the filtered data of all the sources can be mixed. Because the audio signals of the sources to be mixed are filtered using designated azimuth settings, the sound image of each source can be placed at a different position rather than all at one location; each source therefore remains acoustically clear after mixing, which improves the quality of the mixed audio.
【Brief description of the drawings】
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for the description of the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the sound mixing method provided by one embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the sound mixing device provided by another embodiment of the present invention.
【Embodiments】
To make the purposes, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be noted that the terminals involved in the embodiments of the present invention may include, but are not limited to, mobile phones, personal digital assistants (Personal Digital Assistant, PDA), wireless handheld devices, wireless netbooks, portable computers, personal computers (Personal Computer, PC), MP3 players, MP4 players, and the like.
In addition, the term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. The character "/" herein generally indicates an "or" relationship between the objects before and after it.
Fig. 1 is a schematic flowchart of a sound mixing method provided by one embodiment of the present invention, as shown in Fig. 1.
101. Obtain the original audio data of at least two sound sources.
102. Perform frequency-domain transform processing on the original audio data of each of the at least two sound sources, to obtain the frequency-domain data corresponding to each source's original audio data.
103. Using at least two preset azimuth settings, filter, respectively, the frequency-domain data corresponding to each source's original audio data, to obtain the filtered data corresponding to each source's original audio data, and mix the filtered data corresponding to each source's original audio data.
The azimuth setting indicates the sound-image position of a source relative to some reference point, for example either of a listener's ears or the midpoint between the two ears.
It should be noted that the entity executing 101 to 103 may be a processing device. The processing device may be a local application (Application, App), for example the Baidu Music player, or may be located in a network-side server, or may be split so that one part is in a local application and another part is in a network-side server.
It can be understood that the application may be a native application program (native App) installed in a terminal, or may be a web page (web App) of a browser in a terminal; any concrete form capable of performing the audio data processing is acceptable, and this embodiment imposes no limitation in this regard.
A sound source is simply the origin of an audio signal. An audio signal is a continuously varying analog signal. An audio processing device can sample, quantize, and encode a captured audio signal to obtain pulse code modulation (Pulse Code Modulation, PCM) data, and then apply a compression algorithm to the PCM data to obtain audio files in different compressed formats.
The audio file may be in any compressed format known in the prior art, for example Moving Picture Experts Group (MPEG) Layer-3 (MP3) audio files, WMA (Windows Media Audio) audio files, Advanced Audio Coding (AAC) audio files, Free Lossless Audio Codec (FLAC) files, or APE audio files; this embodiment imposes no particular limitation in this regard.
Optionally, in one possible implementation of this embodiment, in 101 the processing device may parse the header of a target audio file to determine the target audio file's channel count; decode the data blocks of the target audio file to obtain a source's original audio data, i.e. PCM data; and obtain, from the channel count and the source's original audio data, the channel audio data corresponding to each target channel.
Optionally, in one possible implementation of this embodiment, in 101 the processing device may sample, quantize, and encode the audio signal (i.e. the analog speech signal) of at least one target channel, to obtain the channel audio data, i.e. PCM data, corresponding to each of the at least one target channels.
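As an illustration of the first option, the sketch below parses an audio file header for the channel count and de-interleaves the decoded PCM into per-channel data. It is a minimal stand-in using Python's standard wave module and uncompressed WAV; the patent itself also covers compressed formats such as MP3 and AAC, whose decoding is not shown here.

```python
import io
import wave

import numpy as np

def read_channels(wav_bytes):
    """Parse the header for the channel count, read the PCM data block,
    and split it into the channel audio data of each target channel
    (16-bit samples assumed)."""
    with wave.open(io.BytesIO(wav_bytes)) as wf:
        n_channels = wf.getnchannels()
        pcm = np.frombuffer(wf.readframes(wf.getnframes()), dtype=np.int16)
    # WAV stores samples interleaved across channels: de-interleave them.
    return [pcm[c::n_channels] for c in range(n_channels)]

# Build a tiny 2-channel file in memory and split it.
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(2)
    wf.setsampwidth(2)
    wf.setframerate(8000)
    wf.writeframes(np.array([1, 2, 3, 4], dtype=np.int16).tobytes())
left, right = read_channels(buf.getvalue())
print(list(left), list(right))  # -> [1, 3] [2, 4]
```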
Optionally, in one possible implementation of this embodiment, in 102 the processing device may determine at least two target channels to be mixed, to serve as mixing channels. The processing device then frames the channel audio data corresponding to each mixing channel to obtain at least one frame of audio data per mixing channel, and performs frequency-domain transform processing on each mixing channel's frames to obtain the frequency-domain data corresponding to each mixing channel.
Specifically, because the mixing operation is performed channel by channel, the processing device may determine the at least two target channels to be mixed, i.e. the mixing channels, according to the channel count of each source.
For example, if source 1 has 1 channel, denoted L11, and source 2 also has 1 channel, denoted L21, the processing device may determine that L11 and L21 are the mixing channels.
As another example, if source 1 has 2 channels, denoted L12 and L13, and source 2 also has 2 channels, denoted L22 and L23, the processing device may determine that L12 and L22 form one group of mixing channels and that L13 and L23 form another group.
As a further example, suppose source 1 has 1 channel, denoted L14, and source 2 has 2 channels, denoted L24 and L25. The processing device may then determine the mixing channels in either of two ways.
One way is for the processing device to process the original audio data of the two channels of source 2 to obtain the original audio data of a single channel, L26. The processing device may use a prior-art method to convert the 2-channel original audio data into 1-channel original audio data; for details, refer to the related prior art, which is not repeated here. L14 and L26 can then be determined to be the mixing channels.
The other way is for the processing device to process the original audio data of the single channel of source 1 to obtain the original audio data of two channels, L15 and L16. The processing device may use a prior-art method to convert the 1-channel original audio data into 2-channel original audio data; for details, refer to the related prior art, which is not repeated here. L15 and L24 can then be determined to be one group of mixing channels, and L16 and L25 another group.
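The two channel-count conversions deferred to the prior art above are commonly done by averaging and by duplication. A minimal sketch under that assumption (the patent does not specify the exact conversion method):

```python
import numpy as np

def stereo_to_mono(left, right):
    """Downmix two channels into one (as when deriving L26 from L24 and L25),
    here by simple averaging."""
    return (left.astype(np.float64) + right.astype(np.float64)) / 2.0

def mono_to_stereo(mono):
    """Upmix one channel into two (as when deriving L15 and L16 from L14),
    here by duplication."""
    return mono.copy(), mono.copy()

l24 = np.array([100.0, -200.0])
l25 = np.array([300.0, 400.0])
print(stereo_to_mono(l24, l25))  # -> [200. 100.]
```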
Specifically, the frequency-domain transform processing may include, but is not limited to, the fast Fourier transform (Fast Fourier Transform, FFT).
For example, the processing device may frame the channel audio data corresponding to each mixing channel at a preset interval, for example 20 ms, with partial data overlap between consecutive frames, for example 50% overlap, thereby obtaining at least one frame of audio data for each mixing channel. The processing device may then apply FFT processing to each mixing channel's frames to obtain the frequency-domain data corresponding to each mixing channel, denoted A(i, j), where i is the frequency-bin number, j is the frame number, and A(i, j) is the frequency-domain data of the j-th frame at the i-th bin.
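The framing-plus-FFT step can be sketched as follows. The 20 ms frame length and 50% overlap follow the example above, while the sample rate and the windowless framing are assumptions:

```python
import numpy as np

def frame_and_fft(x, fs=1000, frame_ms=20, overlap=0.5):
    """Frame the channel audio data and FFT each frame, yielding A with
    A[i, j] = frequency-domain data of frame j at bin i."""
    frame_len = int(fs * frame_ms / 1000)
    hop = int(frame_len * (1 - overlap))  # 50% overlap between consecutive frames
    frames = [x[s:s + frame_len] for s in range(0, len(x) - frame_len + 1, hop)]
    A = np.fft.fft(np.stack(frames), axis=1)
    return A.T  # transpose so rows index bins and columns index frames

A = frame_and_fft(np.ones(100))
print(A.shape)  # -> (20, 9): 20 bins, 9 overlapping frames
```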
Optionally, in one possible implementation of this embodiment, in 103 the processing device may obtain each mixing channel's frequency response parameter from that channel's azimuth setting, then obtain each mixing channel's filtered data from its frequency response parameter and frequency-domain data, and finally mix the filtered data of the mixing channels.
For example, when the frequency-domain transform processing is FFT processing, the processing device may obtain each mixing channel's frequency response parameter from its azimuth via the formula t(k, m) = round(N × f_k × τ(θ_m) + 0.5), where f_k = k × f_s / N and τ(θ_m) = 0.2 × sin(θ_m) / v, and where:
k is the frequency-bin index, with range [0, N-1];
t(k, m) is the frequency response parameter value at the k-th bin;
f_s is the sample rate;
f_k is the frequency of the k-th bin;
N is the number of FFT points;
θ_m is the azimuth of the m-th mixing channel, m = 1, 2, ..., M, where M is the number of mixing channels;
v is the speed of sound, 340 m/s;
round(x) denotes the integer closest to x.
Specifically, θ_m can be set flexibly according to the number of mixing channels M, so that, as far as possible, the sound image of each source is placed at a different position.
For example, suppose source 1 has 1 channel, denoted L11, source 2 also has 1 channel, denoted L21, and the processing device determines that L11 and L21 are the mixing channels. Denote the frequency-domain data corresponding to L11 as H11 and the frequency-domain data corresponding to L21 as H21.
Then the filtered data of L11 may be H11 × t(k, 1), and the filtered data of L21 may be H21 × t(k, 2), with θ_1 ≠ θ_2.
Next, the processing device may apply inverse FFT processing to the filtered data of L11 and the filtered data of L21, respectively, to obtain the virtual audio data of L11 and the virtual audio data of L21. Finally, the processing device may use a prior-art sound mixing method to mix the virtual audio data of L11 and the virtual audio data of L21; for details, refer to the related prior art, which is not repeated here.
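The single-channel example above, from filtering through inverse FFT to the final mix, can be sketched as follows. The per-bin parameter arrays t1 and t2 would come from the frequency response formula; summation is used here as a simple stand-in for the unspecified prior-art mixing step:

```python
import numpy as np

def spatialize_and_mix(h11, h21, t1, t2):
    """Filter each mixing channel's frequency-domain data by its frequency
    response parameters, inverse-FFT to get virtual audio data, then mix."""
    v11 = np.fft.ifft(h11 * t1).real  # virtual audio data of L11
    v21 = np.fft.ifft(h21 * t2).real  # virtual audio data of L21
    return v11 + v21

h = np.fft.fft(np.array([1.0, 0.0, 0.0, 0.0]))  # unit impulse in each channel
mixed = spatialize_and_mix(h, h, np.ones(4), np.ones(4))
print(mixed)  # -> [2. 0. 0. 0.]
```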
As another example, suppose source 1 has 2 channels, denoted L12 and L13, source 2 also has 2 channels, denoted L22 and L23, and the processing device determines that L12 and L22 form one group of mixing channels and that L13 and L23 form another. Denote the frequency-domain data corresponding to L12 and L13 as H12 and H13, and the frequency-domain data corresponding to L22 and L23 as H22 and H23.
Then the filtered data of L12 may be H12 × t(k, 1) + H13 × t(k, 1), and the filtered data of L13 may be H12 × t(k, 1′) + H13 × t(k, 1′), with θ_1′ ≠ 360° − θ_1; the filtered data of L22 may be H22 × t(k, 2) + H23 × t(k, 2), and the filtered data of L23 may be H22 × t(k, 2′) + H23 × t(k, 2′), with θ_2′ ≠ 360° − θ_2; and θ_1 ≠ θ_2.
Next, the processing device may apply inverse FFT processing to the filtered data of L12 and the filtered data of L22, respectively, to obtain the virtual audio data of L12 and the virtual audio data of L22, and likewise to the filtered data of L13 and the filtered data of L23, to obtain the virtual audio data of L13 and the virtual audio data of L23.
Finally, the processing device may use a prior-art sound mixing method to mix the virtual audio data of L12 with the virtual audio data of L22, and the virtual audio data of L13 with the virtual audio data of L23, and then recombine the two mixed parts into 2-channel audio data. For the details of the mixing, refer to the related prior art, which is not repeated here.
In this embodiment, frequency-domain transform processing is performed on the original audio data of each of the at least two acquired sound sources to obtain the frequency-domain data corresponding to each source's original audio data, and at least two preset azimuth settings are then used to filter, respectively, the frequency-domain data corresponding to each source's original audio data, to obtain the filtered data corresponding to each source's original audio data, so that the filtered data of all the sources can be mixed. Because the audio signals of the sources to be mixed are filtered using designated azimuth settings, the sound image of each source can be placed at a different position rather than all at one location; each source therefore remains acoustically clear after mixing, which improves the quality of the mixed audio.
It should be noted that, for brevity, the foregoing method embodiments are described as series of action combinations; however, those skilled in the art will know that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art will also know that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the embodiments above, the descriptions of the embodiments have different emphases; for a part not described in detail in one embodiment, refer to the related description of another embodiment.
Fig. 2 is a schematic structural diagram of the sound mixing device provided by another embodiment of the present invention, as shown in Fig. 2. The sound mixing device of this embodiment may include an acquiring unit 21, a converter unit 22, and a downmixing unit 23. The acquiring unit 21 obtains the original audio data of at least two sound sources; the converter unit 22 performs frequency-domain transform processing on the original audio data of each of the at least two sound sources, to obtain the frequency-domain data corresponding to each source's original audio data; and the downmixing unit 23 uses at least two preset azimuth settings to filter, respectively, the frequency-domain data corresponding to each source's original audio data, to obtain the filtered data corresponding to each source's original audio data, and mixes the filtered data corresponding to each source's original audio data.
The azimuth setting indicates the sound-image position of a source relative to some reference point, for example either of a listener's ears or the midpoint between the two ears.
It should be noted that the sound mixing device provided by this embodiment may be a processing device, which may be a local application (Application, App), for example the Baidu Music player, or may be located in a network-side server, or may be split so that some functional units are in a local application and others are in a network-side server.
It can be understood that the application may be a native application program (native App) installed in a terminal, or may be a web page (web App) of a browser in a terminal; any concrete form capable of performing the audio data processing is acceptable, and this embodiment imposes no limitation in this regard.
The method in the embodiment corresponding to Fig. 1 may be implemented by the sound mixing device provided by this embodiment; for details, refer to the related content of the embodiment corresponding to Fig. 1.
Alternatively, in a possible implementation of the present embodiment, the acquiring unit 21 specifically can be used for pair The frame head of target audio file is parsed, to determine the target channels number of the target audio file;To the target sound The data block of frequency file is decoded, to obtain the original audio data of source of sound;And according to the target channels number and institute The original audio data of source of sound is stated, obtains the channel audio data corresponding to each target channels.
Optionally, in a possible implementation of this embodiment, the acquiring unit 21 may be specifically configured to sample, quantize, and encode the audio signal of at least one target channel, to obtain the channel audio data corresponding to each target channel of the at least one target channel.
Optionally, in a possible implementation of this embodiment, the converter unit 22 may be specifically configured to determine at least two target channels to be mixed, to serve as mixing channels; perform framing processing on the channel audio data corresponding to each mixing channel, to obtain at least one frame of audio data for each mixing channel; and perform frequency-domain transform processing on the at least one frame of audio data of each mixing channel, to obtain the frequency-domain data corresponding to each mixing channel.
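The framing and frequency-domain transform steps above can be sketched as follows (the frame length and hop size are illustrative assumptions; the patent does not fix them):

```python
import numpy as np


def frame_and_transform(samples, frame_len=1024, hop=512):
    # Split one mixing channel's audio data into (possibly overlapping)
    # frames, then FFT each frame to obtain its frequency-domain data.
    n_frames = 1 + (len(samples) - frame_len) // hop
    frames = np.stack([samples[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])
    return np.fft.fft(frames, axis=1)  # one complex spectrum per frame
```
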
Optionally, in a possible implementation of this embodiment, the downmixing unit 23 may specifically obtain the frequency response parameter of each mixing channel according to the azimuth information of that mixing channel; obtain the filtered data of each mixing channel according to the frequency response parameter of that mixing channel and the frequency-domain data corresponding to that mixing channel; and perform mixing processing on the filtered data of each mixing channel.
For example, if the frequency-domain transform processing is FFT processing, the downmixing unit 23 may be specifically configured to obtain the frequency response parameter of each mixing channel according to its azimuth information, using the formula t(k, m) = round(N × f_k × τ(θ_m) + 0.5), where f_k = k × f_s / N and τ(θ_m) = 0.2 × sin(θ_m) / v, and where:
k is the frequency bin index, with value range [0, N−1];
t(k, m) is the frequency response parameter value of the k-th frequency bin for the m-th mixing channel;
f_s is the sample rate;
f_k is the frequency of the k-th bin;
N is the number of points of the fast Fourier transform;
θ_m is the azimuth information of the m-th mixing channel, m = 1, 2, …, M, where M is the number of mixing channels;
v is the speed of sound, 340 meters per second;
round(x) denotes the integer closest to x.
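The formula above can be computed directly; a minimal sketch (the function name is illustrative, and NumPy's nearest-integer rounding stands in for round(x)):

```python
import numpy as np


def frequency_response_params(N, fs, thetas, v=340.0):
    # t(k, m) = round(N * f_k * tau(theta_m) + 0.5), with
    # f_k = k * fs / N and tau(theta_m) = 0.2 * sin(theta_m) / v.
    k = np.arange(N)
    fk = k * fs / N                              # frequency of bin k
    tau = 0.2 * np.sin(np.asarray(thetas)) / v   # per-channel delay, seconds
    return np.round(N * fk[None, :] * tau[:, None] + 0.5).astype(int)
```

For example, with N = 1024, f_s = 44100 Hz, and a single channel at θ = π/2, the bin k = 1 gives t = round(44100 × 0.2/340 + 0.5) = 26.
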
Specifically, θ_m may be set flexibly according to the number M of mixing channels, so that, as far as possible, the sound image of each sound source is located at a different position.
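One simple way to set θ_m for M mixing channels, purely as an illustration (the patent leaves the choice of θ_m open; the symmetric spread and the span value here are assumptions):

```python
import numpy as np


def spread_azimuths(M, span=np.pi / 2):
    # Place M mixing channels symmetrically within [-span/2, +span/2]
    # so each source's sound image lands at a different position.
    return np.linspace(-span / 2, span / 2, M)
```
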
In this embodiment, the converter unit performs frequency-domain transform processing on the original audio data of each sound source among the original audio data of the at least two sound sources acquired by the acquiring unit, to obtain the frequency-domain data corresponding to the original audio data of each sound source; the downmixing unit then filters, using at least two pieces of preset azimuth information, the frequency-domain data corresponding to the original audio data of each sound source respectively, to obtain the filtered data corresponding to the original audio data of each sound source, so that mixing processing can be performed on the filtered data corresponding to the original audio data of each sound source. Because the audio signals of the sound sources to be mixed are filtered using specified azimuth information, the sound image of each sound source can be located at a different position rather than all at one position; each sound source therefore sounds clearly distinct after mixing, which improves the quality of the mixed audio.
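Putting the pieces together: the patent does not spell out how t(k, m) is applied to the frequency-domain data, so the sketch below makes the explicit assumption that it acts as a per-bin delay realised as a phase rotation before the inverse transform and summation. It is one plausible reading under that assumption, not the definitive implementation:

```python
import numpy as np


def mix_frames(spectra, fs, thetas, v=340.0):
    # spectra: (M, N) array, one FFT frame per mixing channel.
    # Each channel is filtered according to its azimuth-dependent
    # parameter t(k, m), then the filtered signals are summed (mixed).
    M, N = spectra.shape
    k = np.arange(N)
    fk = k * fs / N
    mixed = np.zeros(N)
    for m, theta in enumerate(thetas):
        tau = 0.2 * np.sin(theta) / v
        t_km = np.round(N * fk * tau + 0.5)
        # Assumption: t(k, m) is applied as a phase shift (a delay).
        phase = np.exp(-2j * np.pi * k * t_km / N)
        mixed += np.real(np.fft.ifft(spectra[m] * phase))
    return mixed
```

With all azimuths at zero the delays vanish and the function degenerates to a plain sum of the time-domain frames, which gives a quick sanity check.
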
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, for the specific working processes of the systems, devices, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Further, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, an audio processing engine, a network device, or the like) or a processor to execute part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some of the technical features; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

  1. A sound mixing method, characterized by comprising:
    obtaining the original audio data of at least two sound sources;
    performing frequency-domain transform processing on the original audio data of each sound source among the original audio data of the at least two sound sources, to obtain the frequency-domain data corresponding to the original audio data of each sound source;
    filtering, using at least two pieces of preset azimuth information, the frequency-domain data corresponding to the original audio data of each sound source respectively, to obtain the filtered data corresponding to the original audio data of each sound source, and performing mixing processing on the filtered data corresponding to the original audio data of each sound source; wherein each piece of azimuth information among the at least two pieces is used to indicate the sound image position of a sound source.
  2. The method according to claim 1, characterized in that obtaining the original audio data of at least two sound sources comprises:
    parsing the header of a target audio file to determine the number of target channels of the target audio file; decoding the data blocks of the target audio file to obtain the original audio data of the sound sources; and obtaining, according to the number of target channels and the original audio data of the sound sources, the channel audio data corresponding to each target channel; and/or
    sampling, quantizing, and encoding the audio signal of at least one target channel, to obtain the channel audio data corresponding to each target channel of the at least one target channel.
  3. The method according to claim 2, characterized in that performing frequency-domain transform processing on the original audio data of each sound source among the original audio data of the at least two sound sources, to obtain the frequency-domain data corresponding to the original audio data of each sound source, comprises:
    determining at least two target channels to be mixed, to serve as mixing channels;
    performing framing processing on the channel audio data corresponding to each mixing channel, to obtain at least one frame of audio data for each mixing channel;
    performing frequency-domain transform processing on the at least one frame of audio data of each mixing channel, to obtain the frequency-domain data corresponding to each mixing channel.
  4. The method according to claim 2 or 3, characterized in that filtering, using the at least two pieces of preset azimuth information, the frequency-domain data corresponding to the original audio data of each sound source respectively, to obtain the filtered data corresponding to the original audio data of each sound source, and performing mixing processing on the filtered data corresponding to the original audio data of each sound source, comprises:
    obtaining the frequency response parameter of each mixing channel according to the azimuth information of that mixing channel;
    obtaining the filtered data of each mixing channel according to the frequency response parameter of that mixing channel and the frequency-domain data corresponding to that mixing channel;
    performing mixing processing on the filtered data of each mixing channel.
  5. The method according to claim 4, characterized in that the frequency-domain transform processing is a fast Fourier transform method, and obtaining the frequency response parameter of each mixing channel according to the azimuth information of that mixing channel comprises:
    obtaining the frequency response parameter of each mixing channel according to its azimuth information, using the formula t(k, m) = round(N × f_k × τ(θ_m) + 0.5), where f_k = k × f_s / N and τ(θ_m) = 0.2 × sin(θ_m) / v, and where:
    k is the frequency bin index, with value range [0, N−1];
    t(k, m) is the frequency response parameter value of the k-th frequency bin for the m-th mixing channel;
    f_s is the sample rate;
    f_k is the frequency of the k-th bin;
    N is the number of points of the fast Fourier transform;
    θ_m is the azimuth information of the m-th mixing channel, m = 1, 2, …, M, where M is the number of mixing channels;
    v is the speed of sound, 340 meters per second;
    round(x) denotes the integer closest to x.
  6. A sound mixing device, characterized by comprising:
    an acquiring unit, configured to obtain the original audio data of at least two sound sources;
    a converter unit, configured to perform frequency-domain transform processing on the original audio data of each sound source among the original audio data of the at least two sound sources, to obtain the frequency-domain data corresponding to the original audio data of each sound source;
    a downmixing unit, configured to filter, using at least two pieces of preset azimuth information, the frequency-domain data corresponding to the original audio data of each sound source respectively, to obtain the filtered data corresponding to the original audio data of each sound source, and to perform mixing processing on the filtered data corresponding to the original audio data of each sound source; wherein each piece of azimuth information among the at least two pieces is used to indicate the sound image position of a sound source.
  7. The device according to claim 6, characterized in that the acquiring unit is specifically configured to:
    parse the header of a target audio file to determine the number of target channels of the target audio file; decode the data blocks of the target audio file to obtain the original audio data of the sound sources; and obtain, according to the number of target channels and the original audio data of the sound sources, the channel audio data corresponding to each target channel; and/or
    sample, quantize, and encode the audio signal of at least one target channel, to obtain the channel audio data corresponding to each target channel of the at least one target channel.
  8. The device according to claim 7, characterized in that the converter unit is configured to:
    determine at least two target channels to be mixed, to serve as mixing channels;
    perform framing processing on the channel audio data corresponding to each mixing channel, to obtain at least one frame of audio data for each mixing channel; and
    perform frequency-domain transform processing on the at least one frame of audio data of each mixing channel, to obtain the frequency-domain data corresponding to each mixing channel.
  9. The device according to claim 7 or 8, characterized in that the downmixing unit is specifically configured to:
    obtain the frequency response parameter of each mixing channel according to the azimuth information of that mixing channel;
    obtain the filtered data of each mixing channel according to the frequency response parameter of that mixing channel and the frequency-domain data corresponding to that mixing channel; and
    perform mixing processing on the filtered data of each mixing channel.
  10. The device according to claim 9, characterized in that the frequency-domain transform processing is a fast Fourier transform method; the downmixing unit is specifically configured to:
    obtain the frequency response parameter of each mixing channel according to its azimuth information, using the formula t(k, m) = round(N × f_k × τ(θ_m) + 0.5), where f_k = k × f_s / N and τ(θ_m) = 0.2 × sin(θ_m) / v, and where:
    k is the frequency bin index, with value range [0, N−1];
    t(k, m) is the frequency response parameter value of the k-th frequency bin for the m-th mixing channel;
    f_s is the sample rate;
    f_k is the frequency of the k-th bin;
    N is the number of points of the fast Fourier transform;
    θ_m is the azimuth information of the m-th mixing channel, m = 1, 2, …, M, where M is the number of mixing channels;
    v is the speed of sound, 340 meters per second;
    round(x) denotes the integer closest to x.
CN201410256380.5A 2014-06-10 2014-06-10 Sound mixing method and device Active CN104064191B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410256380.5A CN104064191B (en) 2014-06-10 2014-06-10 Sound mixing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410256380.5A CN104064191B (en) 2014-06-10 2014-06-10 Sound mixing method and device

Publications (2)

Publication Number Publication Date
CN104064191A CN104064191A (en) 2014-09-24
CN104064191B true CN104064191B (en) 2017-12-15

Family

ID=51551869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410256380.5A Active CN104064191B (en) 2014-06-10 2014-06-10 Sound mixing method and device

Country Status (1)

Country Link
CN (1) CN104064191B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106878230A (en) * 2015-12-10 2017-06-20 中国电信股份有限公司 Audio-frequency processing method, server and system in network telephone conference
CN106231489A (en) * 2016-07-25 2016-12-14 深圳市米尔声学科技发展有限公司 The treating method and apparatus of audio frequency
CN108111474B (en) * 2016-11-25 2019-05-17 视联动力信息技术股份有限公司 A kind of sound mixing method and device
CN109309845A (en) * 2017-07-28 2019-02-05 北京陌陌信息技术有限公司 The display methods and device of video, computer readable storage medium
CN107506409B (en) * 2017-08-09 2021-01-08 浪潮金融信息技术有限公司 Method for processing multi-audio data
CN107818790B (en) * 2017-11-16 2020-08-11 苏州麦迪斯顿医疗科技股份有限公司 Multi-channel audio mixing method and device

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1770256A (en) * 2004-11-02 2006-05-10 北京中科信利技术有限公司 Digital audio frequency mixing method based on transform domain
CN1778143A (en) * 2003-09-08 2006-05-24 松下电器产业株式会社 Audio image control device design tool and audio image control device
CN101065990A (en) * 2004-09-16 2007-10-31 松下电器产业株式会社 Sound image localizer
CN101459797A (en) * 2007-12-14 2009-06-17 深圳Tcl新技术有限公司 Sound positioning method and system
CN102056053A (en) * 2010-12-17 2011-05-11 中兴通讯股份有限公司 Multi-microphone audio mixing method and device
EP2421182A1 (en) * 2010-08-20 2012-02-22 Mediaproducción, S.L. Method and device for automatically controlling audio digital mixers
CN102986254A (en) * 2010-07-12 2013-03-20 华为技术有限公司 Audio signal generator
CN103037300A (en) * 2011-10-07 2013-04-10 索尼公司 Audio-signal processing device, audio-signal processing method, program and recording medium
CN103069481A (en) * 2010-07-20 2013-04-24 华为技术有限公司 Audio signal synthesizer
CN103379424A (en) * 2012-04-24 2013-10-30 华为技术有限公司 Sound mixing method and multi-point control server
CN103686544A (en) * 2013-09-04 2014-03-26 张家港保税区润桐电子技术研发有限公司 A synthetic method for audio signals
CN105556990A (en) * 2013-08-30 2016-05-04 共荣工程株式会社 Sound processing apparatus, sound processing method, and sound processing program

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4426159B2 (en) * 2002-08-28 2010-03-03 ヤマハ株式会社 Mixing equipment
CN102222503B (en) * 2010-04-14 2013-08-28 华为终端有限公司 Mixed sound processing method, device and system of audio signal

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1778143A (en) * 2003-09-08 2006-05-24 松下电器产业株式会社 Audio image control device design tool and audio image control device
CN101065990A (en) * 2004-09-16 2007-10-31 松下电器产业株式会社 Sound image localizer
CN1770256A (en) * 2004-11-02 2006-05-10 北京中科信利技术有限公司 Digital audio frequency mixing method based on transform domain
CN101459797A (en) * 2007-12-14 2009-06-17 深圳Tcl新技术有限公司 Sound positioning method and system
CN102986254A (en) * 2010-07-12 2013-03-20 华为技术有限公司 Audio signal generator
CN103069481A (en) * 2010-07-20 2013-04-24 华为技术有限公司 Audio signal synthesizer
EP2421182A1 (en) * 2010-08-20 2012-02-22 Mediaproducción, S.L. Method and device for automatically controlling audio digital mixers
WO2012079459A1 (en) * 2010-12-17 2012-06-21 中兴通讯股份有限公司 Method and apparatus for audio mixing of multiple microphones
CN102056053A (en) * 2010-12-17 2011-05-11 中兴通讯股份有限公司 Multi-microphone audio mixing method and device
CN103037300A (en) * 2011-10-07 2013-04-10 索尼公司 Audio-signal processing device, audio-signal processing method, program and recording medium
CN103379424A (en) * 2012-04-24 2013-10-30 华为技术有限公司 Sound mixing method and multi-point control server
CN105556990A (en) * 2013-08-30 2016-05-04 共荣工程株式会社 Sound processing apparatus, sound processing method, and sound processing program
CN103686544A (en) * 2013-09-04 2014-03-26 张家港保税区润桐电子技术研发有限公司 A synthetic method for audio signals

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"论立体声节目制作中的声像定位";王智;《音响技术》;20071231;全文 *
"音乐混音中重塑空间纵深感的技术手段";刘志晟;《演艺科技》;20131112(第12期);全文 *

Also Published As

Publication number Publication date
CN104064191A (en) 2014-09-24

Similar Documents

Publication Publication Date Title
CN104064191B (en) Sound mixing method and device
CN103348703B (en) In order to utilize the reference curve calculated in advance to decompose the apparatus and method of input signal
CN107731238B (en) Coding method and coder for multi-channel signal
WO2021196905A1 (en) Voice signal dereverberation processing method and apparatus, computer device and storage medium
EP4011099A1 (en) System and method for assisting selective hearing
JP2021515277A (en) Audio signal processing system and how to convert the input audio signal
CN104934036B (en) Audio coding apparatus, method and audio decoding apparatus, method
Tan et al. SAGRNN: Self-attentive gated RNN for binaural speaker separation with interaural cue preservation
CA2907595A1 (en) Method and apparatus for compressing and decompressing a higher order ambisonics representation
CN105900455A (en) Method and apparatus for processing audio signal
CN103403800A (en) Determining the inter-channel time difference of a multi-channel audio signal
CN105580070A (en) Apparatus and method for decoding and encoding an audio signal using adaptive spectral tile selection
CN106797526B (en) Apparatus for processing audio, method and computer readable recording medium
CN104036788B (en) The acoustic fidelity identification method of audio file and device
CN104718572A (en) Audio encoding method and device, audio decoding method and device, and multimedia device employing same
JP2023055951A (en) Method and encoder for encoding multi-channel signal
CN107346664A (en) A kind of ears speech separating method based on critical band
WO2018201112A1 (en) Audio coder window sizes and time-frequency transformations
CN106233112B (en) Coding method and equipment and signal decoding method and equipment
CN106033671B (en) Method and apparatus for determining inter-channel time difference parameters
Quan et al. Multichannel speech separation with narrow-band conformer
US20120195435A1 (en) Method, Apparatus and Computer Program for Processing Multi-Channel Signals
Lin et al. Focus on the sound around you: Monaural target speaker extraction via distance and speaker information
CN107464569A (en) Vocoder
RU2495504C1 (en) Method of reducing transmission rate of linear prediction low bit rate voders

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20160316

Address after: 2108, floor 2, building 23, No. 18, Anningzhuang East Road, Qinghe, Haidian District, Beijing 100027

Applicant after: BEIJING YINZHIBANG CULTURE TECHNOLOGY Co.,Ltd.

Address before: Baidu Building, No. 10, Shangdi 10th Street, Haidian District, Beijing 100085

Applicant before: BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY Co.,Ltd.

GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220520

Address after: 518057 3305, floor 3, building 1, aerospace building, No. 51, Gaoxin South ninth Road, high tech Zone community, Yuehai street, Nanshan District, Shenzhen, Guangdong

Patentee after: Shenzhen Taile Culture Technology Co.,Ltd.

Address before: 2108, floor 2, building 23, No. 18, anningzhuang East Road, Qinghe, Haidian District, Beijing 100027

Patentee before: BEIJING YINZHIBANG CULTURE TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right