CN105405448B - Sound effect processing method and device - Google Patents

Sound effect processing method and device

Publication number: CN105405448B (application CN201410472853.5A)
Authority: CN (China)
Prior art keywords: sound signal, voice, sound type
Legal status: Active (an assumption, not a legal conclusion)
Other versions: CN105405448A (in Chinese (zh))
Inventors: 王影, 孙见青, 江源, 胡国平, 胡郁, 刘庆峰
Original and current assignee: iFlytek Co Ltd
Application filed by iFlytek Co Ltd; priority to CN201410472853.5A; published as CN105405448A; granted and published as CN105405448B
Abstract

The invention discloses a sound effect processing method and device. The method comprises: collecting a sound signal; determining the sound type of the sound signal; and selecting a sound effect processing method corresponding to that sound type to perform sound effect processing on the sound signal. The invention can effectively improve the result of sound effect processing.

Description

Sound effect processing method and device
Technical field
This application relates to the field of signal processing technology, and in particular to a sound effect processing method and device.
Background art
With the rapid development of the Internet and the widespread adoption of terminal applications, more and more users can conveniently sing karaoke online on their terminal devices, which greatly improves the convenience of singing and the user experience. However, online karaoke applications on terminal devices complicate the singing environment: the voice input collected by the system may contain not only the singer's voice, but also externally played accompaniment and various ambient noises. For example, when a user records with earphones, the background music cannot be picked up from outside, so the collected sound is pure voice; conversely, when the user plays the music through an external speaker, a small amount of accompaniment is mixed into the collected sound; and in a noisy environment, the collected sound also contains noise interference. The complexity of the sound input makes sound effect processing very difficult, and it also degrades the user experience of karaoke products.
Currently, the sound effect processing methods used by mobile terminals optimize the collected voice input according to a uniform principle, i.e., they apply the same preset sound effect processing method to different types of voice input. However, different voice inputs, such as pure voice and voice with accompaniment, differ greatly in their data distributions; applying a single unified processing method to all of them is obviously too coarse and cannot achieve an optimal sound effect.
Summary of the invention
To solve the above technical problems, the embodiments of the present application provide a sound effect processing method and device that can improve the result of sound effect processing. The technical solution is as follows:
A sound effect processing method, comprising:
collecting a sound signal;
determining the sound type of the sound signal;
selecting a sound effect processing method corresponding to the sound type to perform sound effect processing on the sound signal.
Preferably, determining the sound type of the sound signal comprises:
performing framing on the sound signal;
performing endpoint detection on each frame of the sound signal to obtain the voiced segments and noise segments of the sound signal;
calculating the signal-to-noise ratio of the voiced segments of the sound signal;
determining the sound type of the sound signal according to the signal-to-noise ratio.
Preferably, determining the sound type of the sound signal comprises:
extracting the acoustic features of the sound signal;
calculating the likelihood values of the acoustic features with respect to each sound type model obtained in advance;
taking the sound type of the model with the maximum likelihood value as the sound type of the sound signal.
Preferably, before calculating the likelihood values of the acoustic features with respect to each sound type model obtained in advance, the method further comprises:
obtaining each sound type model;
where obtaining each sound type model comprises:
collecting multiple groups of training data, the training data comprising different types of sound signals corresponding to a standard sound signal;
extracting the acoustic features of the training data;
performing model training according to the acoustic features of the training data to obtain the sound type models corresponding to the different types of sound signals.
Preferably, before selecting a sound effect processing method corresponding to the sound type to perform sound effect processing on the sound signal, the method further comprises:
obtaining the signal-to-noise ratio of the sound signal;
where selecting a sound effect processing method corresponding to the sound type to perform sound effect processing on the sound signal comprises:
selecting a sound effect processing method according to the signal-to-noise ratio and the sound type of the sound signal, and performing sound effect processing on the sound signal with the selected method.
Preferably, selecting a sound effect processing method corresponding to the sound type to perform sound effect processing on the sound signal comprises:
when the signal-to-noise ratio of the sound signal is less than a first signal-to-noise-ratio threshold and the sound type is pure voice, reducing the low-frequency part of the sound signal and boosting the mid-low and mid-high-frequency parts;
when the signal-to-noise ratio of the sound signal is greater than or equal to the first threshold and the sound type is pure voice, reducing the difference between the maximum level and the average level of the vocal track of the sound signal, and boosting the high-frequency part of the sound signal;
when the signal-to-noise ratio of the sound signal is less than the first threshold and the sound type is voice with accompaniment, boosting the mid-high and high-frequency parts of the sound signal, and shortening the reverberation time;
when the signal-to-noise ratio of the sound signal is greater than or equal to the first threshold and the sound type is voice with accompaniment, boosting the high-frequency part of the sound signal, compensating the high frequencies, and removing the low-frequency part.
Preferably, the method further comprises:
performing mixing processing on the sound signal after the sound effect processing.
Preferably, performing mixing processing on the sound signal after the sound effect processing comprises:
determining the song type to which the sound signal belongs;
looking up the first energy ratio condition of voice to accompaniment corresponding to that song type;
calculating the second energy ratio of voice to accompaniment in the sound signal;
when the second energy ratio does not satisfy the first energy ratio condition, adjusting the voice or the accompaniment in the sound signal according to the first energy ratio condition;
mixing the adjusted voice and accompaniment of the sound signal.
A sound effect processing device, comprising:
a signal collection unit, configured to collect a sound signal;
a type determination unit, configured to determine the sound type of the sound signal;
a sound effect processing unit, configured to select a sound effect processing method corresponding to the sound type and perform sound effect processing on the sound signal.
Preferably, the type determination unit comprises:
a framing subunit, configured to perform framing on the sound signal;
a detection subunit, configured to perform endpoint detection on each frame of the sound signal to obtain the voiced segments and noise segments of the sound signal;
a first computation subunit, configured to calculate the signal-to-noise ratio of the voiced segments of the sound signal;
a first determination subunit, configured to determine the sound type of the sound signal according to the signal-to-noise ratio.
Preferably, the type determination unit comprises:
an extraction subunit, configured to extract the acoustic features of the sound signal;
a second computation subunit, configured to calculate the likelihood values of the acoustic features with respect to each sound type model obtained in advance;
a second determination subunit, configured to take the sound type of the model with the maximum likelihood value as the sound type of the sound signal.
Preferably, the device further comprises:
a model acquisition unit, configured to obtain each sound type model before the second computation subunit calculates the likelihood values;
where the model acquisition unit comprises:
a collection subunit, configured to collect multiple groups of training data, the training data comprising different types of sound signals corresponding to a standard sound signal;
a feature extraction subunit, configured to extract the acoustic features of the training data;
a model training subunit, configured to perform model training according to the acoustic features of the training data to obtain the sound type models corresponding to the different types of sound signals.
Preferably, the device further comprises:
a data acquisition unit, configured to obtain the signal-to-noise ratio of the sound signal before the sound effect processing unit performs sound effect processing;
where the sound effect processing unit is specifically configured to select a sound effect processing method according to the signal-to-noise ratio and the sound type of the sound signal and perform sound effect processing on the sound signal.
Preferably, the sound effect processing unit is specifically configured to: when the signal-to-noise ratio of the sound signal is less than a first signal-to-noise-ratio threshold and the sound type is pure voice, reduce the low-frequency part of the sound signal and boost the mid-low and mid-high-frequency parts; when the signal-to-noise ratio is greater than or equal to the first threshold and the sound type is pure voice, reduce the difference between the maximum level and the average level of the vocal track and boost the high-frequency part; when the signal-to-noise ratio is less than the first threshold and the sound type is voice with accompaniment, boost the mid-high and high-frequency parts and shorten the reverberation time; when the signal-to-noise ratio is greater than or equal to the first threshold and the sound type is voice with accompaniment, boost the high-frequency part, compensate the high frequencies, and remove the low-frequency part.
Preferably, the device further comprises:
a mixing processing unit, configured to perform mixing processing on the sound signal after the sound effect processing.
Preferably, the mixing processing unit comprises:
a third determination subunit, configured to determine the song type to which the sound signal belongs;
a lookup subunit, configured to look up the first energy ratio condition of voice to accompaniment corresponding to that song type;
a third computation subunit, configured to calculate the second energy ratio of voice to accompaniment in the sound signal;
an adjustment subunit, configured to adjust the voice or the accompaniment in the sound signal according to the first energy ratio condition when the second energy ratio does not satisfy it;
a mixing subunit, configured to mix the adjusted voice and accompaniment of the sound signal.
The embodiments of the present invention have at least the following advantages:
By distinguishing among the collected sound signals and applying different processing methods to different sound types, the embodiments of the present invention refine sound effect processing and thereby obtain a better sound effect.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments recorded in the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a sound effect processing method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for determining the sound type of a sound signal in an embodiment of the present invention;
Fig. 3 is a flowchart of another method for determining the sound type of a sound signal in an embodiment of the present invention;
Fig. 4 is a flowchart of a method for obtaining sound type models in an embodiment of the present invention;
Fig. 5 is a flowchart of a method for performing mixing processing on the sound signal after sound effect processing in an embodiment of the present invention;
Fig. 6 is a structural diagram of a sound effect processing device according to an embodiment of the present invention;
Fig. 7 is a structural diagram of a type determination unit in an embodiment of the present invention;
Fig. 8 is a structural diagram of another type determination unit in an embodiment of the present invention;
Fig. 9 is a structural diagram of another sound effect processing device according to an embodiment of the present invention;
Fig. 10 is a structural diagram of a model acquisition unit in an embodiment of the present invention;
Fig. 11 is a structural diagram of yet another sound effect processing device according to an embodiment of the present invention;
Fig. 12 is a structural diagram of a mixing processing unit in an embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the technical solutions in the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art without creative effort, based on the embodiments in the present application, fall within the scope of protection of the present application.
To make the above objects, features, and advantages of the present application clearer and easier to understand, the present application is further described in detail below with reference to the accompanying drawings and specific implementations.
Referring to Fig. 1, a flowchart of a sound effect processing method according to an embodiment of the present invention.
This method may include:
Step 101: collect a sound signal.
The sound signal may contain not only voice but also other ambient sounds, such as externally played accompaniment during singing and various ambient noises.
Step 102: determine the sound type of the sound signal.
In the embodiments of the present invention, various methods such as signal analysis or statistical modeling can be used to determine the sound type of the sound signal from its acoustic features. Specifically, the signal analysis method determines the sound type by analyzing features of the input sound such as its signal-to-noise ratio or energy; the statistical modeling method extracts acoustic features of the input sound, such as MFCC (Mel-frequency cepstral coefficients), and determines the sound type through statistical modeling. This application performs statistical modeling with a DNN-based method, described in detail in the subsequent embodiments. The sound types may include pure voice, voice with accompaniment, and so on.
Step 103: select a sound effect processing method corresponding to the sound type and perform sound effect processing on the sound signal.
A sound effect processing method is preset for each sound type. After the sound type of the sound signal has been determined in the above step, the corresponding sound effect processing method is triggered to process the sound signal. The preset sound effect processing methods may include methods for pure voice, for voice with accompaniment, and so on.
By distinguishing among the collected sound signals and applying different processing methods to different sound types, the embodiments of the present invention refine sound effect processing and thereby obtain a better sound effect.
In one embodiment of the present invention, the method for determining the sound type of the sound signal can be based on signal analysis. As shown in Fig. 2, the method may comprise:
Step 201: perform framing on the sound signal.
The sound signal is first divided into multiple speech frames.
Step 202: perform endpoint detection on each frame of the sound signal to obtain the voiced segments and noise segments of the sound signal.
In this step, the endpoint detection for each frame comprises:
First, calculate the short-time average energy and short-time average zero-crossing rate of each frame. These are common signal analysis measures in speech signal processing: the short-time energy is obtained by framing the signal and then computing the energy of each frame; voiced sounds have the largest short-time energy, unvoiced sounds come next, and silence has the smallest. The short-time zero-crossing rate is the number of times the waveform of a frame crosses the horizontal axis (zero level).
Then, compare the short-time average energy of each frame against a preset upper energy threshold T1, and the short-time average zero-crossing rate of each frame against a preset upper zero-crossing-rate threshold T2. In general, voiced segments have higher energy, a larger short-time average energy, and a smaller short-time average zero-crossing rate, while noise segments have a smaller short-time average energy and a higher short-time average zero-crossing rate. Therefore, when the short-time average energy of a frame is greater than T1 and its short-time average zero-crossing rate is less than T2, the frame belongs to a voiced segment; when the short-time average energy is less than T1 and the short-time average zero-crossing rate is greater than T2, the frame belongs to a noise segment. In this way the voiced segments and noise segments of the sound signal are obtained.
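As a rough illustration of the voiced/noise decision described above, the following Python sketch applies the two-threshold rule to pre-framed samples. The function names, the list-of-lists frame format, and the threshold values are illustrative assumptions, not part of the patent.

```python
def short_time_energy(frame):
    # Sum of squared samples in the frame.
    return sum(s * s for s in frame)

def zero_crossing_rate(frame):
    # Number of sign changes between consecutive samples.
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def classify_frames(frames, energy_thresh, zcr_thresh):
    """Label each frame 'voiced' or 'noise' per the rule above:
    high energy and low zero-crossing rate -> voiced, otherwise noise."""
    labels = []
    for frame in frames:
        if (short_time_energy(frame) > energy_thresh
                and zero_crossing_rate(frame) < zcr_thresh):
            labels.append("voiced")
        else:
            labels.append("noise")
    return labels
```

In practice the thresholds T1 and T2 would be tuned empirically on recorded material.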
Step 203: calculate the signal-to-noise ratio of the voiced segments of the sound signal.
In general, a larger signal-to-noise ratio means a stronger signal and less noise, while a smaller signal-to-noise ratio means a weaker signal and more noise; when the signal-to-noise ratio reaches 60 dB or above, the noise is essentially acceptable. The signal-to-noise ratio is computed as in the prior art, SNR (dB) = 10·log10(S/N), and is not described in detail here.
Step 204: determine the sound type of the sound signal according to the signal-to-noise ratio.
Specifically, a second signal-to-noise-ratio threshold can be set: if the signal-to-noise ratio of the current sound signal is less than this second threshold, the sound type of the current sound signal is judged to be voice with accompaniment; otherwise the sound type is pure voice.
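The SNR computation of step 203 and the second-threshold decision of step 204 can be sketched together as follows; the 20 dB default threshold, the frame format, and all names are assumed for illustration, since the patent leaves the second threshold to the implementer.

```python
import math

def snr_db(voiced_frames, noise_frames):
    # SNR (dB) = 10 * log10(signal power / noise power),
    # with power averaged per sample over each set of frames.
    sig = (sum(s * s for f in voiced_frames for s in f)
           / sum(len(f) for f in voiced_frames))
    noi = (sum(s * s for f in noise_frames for s in f)
           / sum(len(f) for f in noise_frames))
    return 10 * math.log10(sig / noi)

def sound_type_by_snr(snr, second_threshold_db=20.0):
    # Below the (assumed) second threshold -> voice with accompaniment.
    return "voice_with_accompaniment" if snr < second_threshold_db else "pure_voice"
```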
The above signal analysis method is generally sensitive to the environment, and its result is easily affected by the environment and prone to errors. In another embodiment of the present invention, the method for determining the sound type of the sound signal can be based on statistical modeling. As shown in Fig. 3, the method may comprise:
Step 301: extract the acoustic features of the sound signal.
A common acoustic feature that can be extracted in this step is the Mel-frequency cepstral coefficients, abbreviated MFCC.
Step 302: calculate the likelihood values of the acoustic features with respect to each sound type model obtained in advance.
Before this step is executed, a large amount of data is collected in advance, acoustic features are extracted from it, and distribution models of the different sound types are trained, as described in the subsequent embodiment. The distribution models of the different sound types actually characterize the acoustic features of those sound types.
In this step, the likelihood value between the acoustic features of the sound signal and the distribution model of each sound type is computed separately. The computation of the likelihood values is the same as in the prior art and is not described in detail here.
Step 303: take the sound type of the model with the maximum likelihood value as the sound type of the sound signal.
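Steps 301-303 amount to a maximum-likelihood classification over pretrained per-type models. The sketch below substitutes a trivial one-dimensional Gaussian for each sound-type model purely to make the argmax concrete; the patent's actual models are DNN-based, and all names here are assumptions.

```python
import math

def gaussian_loglik(features, mean, var):
    # Log-likelihood of i.i.d. scalar features under a 1-D Gaussian model.
    return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
               for x in features)

def classify(features, models):
    # models: {type_name: (mean, var)}; pick the type with maximum likelihood.
    return max(models, key=lambda t: gaussian_loglik(features, *models[t]))
```

With real models, `gaussian_loglik` would be replaced by the likelihood (or posterior) computed by the trained distribution model of each sound type.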
Before the sound type of the sound signal is determined with the above statistical-modeling method, each sound type model must be obtained in advance. The sound type models can be built under various model assumptions, such as GMM or HMM models; in general, the more complex the model, the higher the precision with which it simulates the distribution. To this end, this application proposes a neural-network-based method of constructing sound distribution models, used to simulate the feature distributions of pure voice and voice with accompaniment. As shown in Fig. 4, the method for obtaining the sound type models may comprise:
Step 401: collect multiple groups of training data, the training data comprising different types of sound signals corresponding to a standard sound signal.
In this step, different types of sound signals corresponding to a standard sound signal are first collected as training data. In this embodiment the sound signals are songs: two different types of data are collected for each of N songs, namely pure voice and voice containing accompaniment recorded separately for the same song, giving 2N songs of training data in total. Balancing training efficiency and precision, this application selects N = 500 with a sampling rate of 16 kHz.
Step 402: extract the acoustic features of the training data.
This step frames the training data and extracts 13-dimensional MFCC spectral features for each frame of speech; the 13-dimensional MFCC features and their dynamic parameters for the current frame and the 5 frames before and after it are used as the input of the DNN, so as to take contextual information into account.
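The 11-frame context window described in this step (the current frame plus 5 frames on each side, each frame carrying 39 parameters once dynamic features are appended) can be sketched as follows. Edge handling by repeating the boundary frame is an assumption, since the patent does not specify it.

```python
def stack_context(frames, left=5, right=5):
    """Concatenate each frame with its `left` preceding and `right`
    following neighbours, clamping indices at the edges so boundary
    frames are repeated."""
    out = []
    n = len(frames)
    for i in range(n):
        ctx = []
        for j in range(i - left, i + right + 1):
            ctx.extend(frames[min(max(j, 0), n - 1)])
        out.append(ctx)
    return out
```

With 39-dimensional frames this yields the 429-dimensional (39 × 11) input vectors mentioned in the next step.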
Step 403: perform model training according to the acoustic features of the training data to obtain the sound type models corresponding to the different types of sound signals.
Training a DNN model is taken as the example. It specifically comprises:
First, determine the DNN model topology. In this step the DNN input layer is set to 429 nodes (13 × 3 × 11: with static and dynamic parameters the feature dimension of each frame is 39, and with contextual information the features of the current frame and its 5 preceding and 5 following frames are used, giving 11 frames of feature parameters in total), used to receive the acoustic features. The output layer represents the class information, encoded as 0 and 1, where 0 denotes pure voice and 1 denotes voice containing accompaniment, and comprises 2 nodes. Three hidden layers are used, each with 2048 nodes.
Then, the DNN topology is trained on the training data to obtain the model parameters, i.e., the weight coefficients of the DNN. On a dedicated development set, a suitable number of parameter update passes, e.g. 20, is chosen for model optimization.
Finally, the process of carrying out the model training and obtaining the model is the same as in the prior art and is not described in detail here.
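A forward pass through the stated topology (429 inputs, three hidden layers of 2048 nodes, 2 outputs) can be sketched with NumPy as below. The ReLU hidden activation, the softmax output, and the random initial weights are assumptions of this sketch; the actual training of the weights (e.g. by backpropagation) is out of scope here.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_dnn(sizes):
    # One (weights, bias) pair per layer; weights randomly initialised,
    # standing in for the trained parameters of the patent's model.
    return [(rng.standard_normal((m, n)) * 0.01, np.zeros(n))
            for m, n in zip(sizes, sizes[1:])]

def forward(layers, x):
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)        # hidden activation (ReLU assumed)
    e = np.exp(x - x.max())               # softmax over the 2 output classes
    return e / e.sum()

# Topology from the patent: 429 inputs, three hidden layers of 2048, 2 outputs.
dnn = make_dnn([429, 2048, 2048, 2048, 2])
probs = forward(dnn, rng.standard_normal(429))
```

Class 0 would correspond to pure voice and class 1 to voice containing accompaniment, per the output encoding above.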
By the above signal analysis or statistical modeling methods, the specific sound type of the sound signal can be determined; the sound signal is then processed with the preset processing method of the sound effect template corresponding to that sound type, yielding an optimized sound effect.
In another embodiment, considering that the sound effect processing applied to a sound signal should differ considerably under different noise environments, this application further subdivides the sound effect processing methods according to the signal-to-noise-ratio range of the sound signal: a sound effect processing method is selected according to the signal-to-noise ratio and the sound type of the sound signal, and is applied to the sound signal. This specifically includes:
When the signal-to-noise ratio of the sound signal is less than the first threshold and the sound type is pure voice, an equalizer is used to moderately reduce the 0-80 Hz low-frequency part of the sound signal, e.g. by about 5 dB, and to boost the 150-500 Hz mid-low part, e.g. by about 5 dB, to improve the strength and loudness of the sound; the 2 kHz-5 kHz mid-high part can then be boosted by about 2 to 3 dB to improve the penetration of the sound.
When the signal-to-noise ratio of the sound signal is greater than or equal to the first threshold and the sound type is pure voice, a compressor is used to control the dynamic range of the vocal track, i.e., to reduce the difference between the maximum level and the average level of the vocal track, so that the vocal track blends better with the whole piece and the sound is fuller and stronger. The compression ratio is usually set between 2:1 and 8:1, and the threshold parameter usually between -5 dB and -20 dB, determined from the level and dynamic range of the collected sound signal. An equalizer is also used to moderately boost the 7 kHz-8 kHz high part, e.g. by about 2 to 3 dB, to improve the sense of layering of the sound.
When the signal-to-noise ratio of the sound signal is less than the first threshold and the sound type is voice with accompaniment, the situation differs from the pure-voice case: under an accompaniment environment, because of the influence of the accompaniment, the actual noise is not as large, so the treatment of the voice should be adjusted accordingly by moderately enhancing it, e.g. boosting the 2 kHz-5 kHz mid-high part of the voice by about 5 dB to improve its penetration, and boosting the 7 kHz-8 kHz high part by about 2 dB to increase its clarity. During sound effect processing the reverberation time should be set smaller, e.g. T60 = 1.2, to prevent the small amount of accompaniment present in the voice from interfering with the pure accompaniment, which would otherwise delay the accompaniment after mixing and make it sound unclear. During mixing, the ratio of voice to accompaniment is moderately increased to enhance the voice. The 500 Hz-2 kHz part contains the low-order harmonics and overtones of most instruments, and a moderate boost, e.g. 2 dB, can make the sound transparent and bright.
When the signal-to-noise ratio of the sound signal is greater than or equal to the first threshold and the sound type is voice with accompaniment, an equalizer is used to boost the 6 kHz-8 kHz high part of the sound signal to compensate the high frequencies, while the part below 80 Hz can be removed to increase the clarity of the sound.
The above first signal-to-noise-ratio threshold can be set according to the application's needs, e.g. 50 dB. In addition, it should be noted that the first and second signal-to-noise-ratio thresholds are in general not equal, and each can be set according to the needs of the practical application.
It should also be noted that in the embodiments of the present invention, the frequency range can be divided as follows: 0-150 Hz is low frequency, 150-500 Hz is mid-low frequency, 500 Hz-2 kHz is mid frequency, 2 kHz-5 kHz is mid-high frequency, and above 5 kHz is high frequency. Of course, this division is not fixed and can be adjusted according to the actual situation. Likewise, the frequency bands used for sound effect processing in the embodiments of the present invention are not fixed and can be adjusted according to the practical application.
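The band layout and the four SNR/type cases above can be summarized in a small dispatch table. The band names, the 50 dB default threshold, and the returned values are illustrative paraphrases of the figures quoted in the text, not a normative implementation of the patent's equalizer or compressor.

```python
# Band layout described in the text (Hz); boundaries are adjustable in practice.
BANDS = {
    "low":      (0, 150),
    "mid_low":  (150, 500),
    "mid":      (500, 2000),
    "mid_high": (2000, 5000),
    "high":     (5000, None),   # 5 kHz and above
}

def select_eq_plan(snr_db, sound_type, first_threshold_db=50.0):
    """Return the gain adjustments (dB) sketched in the four cases above."""
    noisy = snr_db < first_threshold_db
    if sound_type == "pure_voice":
        if noisy:
            return {"0-80Hz": -5, "150-500Hz": +5, "2k-5kHz": +2.5}
        return {"compress_vocal_track": True, "7k-8kHz": +2.5}
    # voice with accompaniment
    if noisy:
        return {"2k-5kHz": +5, "7k-8kHz": +2, "500-2kHz": +2, "reverb_T60": 1.2}
    return {"6k-8kHz": "boost", "below_80Hz": "cut"}
```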
In another embodiment, the method can also perform mixing processing on the sound signal after the sound effect processing, according to the application's needs.
Taking a song as the example of the sound signal, mixing can be applied to beautify the song. The mixing takes into account that in different types of songs the energy ratio of voice to accompaniment differs, and that the sound signal may or may not contain accompaniment, so two cases must be distinguished. For a sound signal without accompaniment, mixing can be done with the normal energy proportion of voice to accompaniment; for a sound signal that carries accompaniment, the energy share of the accompaniment needs to be turned down, to prevent the accompaniment from being too loud and the voice too quiet after mixing, which would harm the listening experience. The energy ratio of voice to accompaniment differs for different song types. Specifically, the method of performing mixing processing on the sound signal after sound effect processing, as shown in Fig. 5, comprises:
Step 501: determine the song type to which the sound signal belongs.
The song type refers to the style of the song, such as ballad, rock, and so on.
Step 502: look up the first energy ratio condition of vocal to accompaniment corresponding to the song type of the sound signal.
The vocal-to-accompaniment energy ratio conditions for different song types can be preset in the system according to experimental results. The first energy ratio condition may specifically be a ratio range, or some other condition.
For example, for a ballad the first energy ratio of accompaniment to vocal is usually 1.1, i.e. the accompaniment is slightly louder; for a rock song the first energy ratio of vocal to accompaniment is 10/9, i.e. the accompaniment is slightly quieter; and so on.
In this step, the preset first energy ratio condition corresponding to the sound signal is looked up, and then step 504 is executed.
Step 503: calculate the second energy ratio of vocal to accompaniment in the sound signal.
Energy ratio of vocal to accompaniment = vocal energy / accompaniment energy; the energy calculation itself is the same as in the prior art.
Step 504: judge whether the second energy ratio satisfies the first energy ratio condition.
If it does, execute step 505; if not, execute step 506.
Step 505: mix the accompaniment and the vocal directly.
The signals are not altered during this mixing; the two signals are simply superimposed.
Step 506: adjust the vocal or the accompaniment in the sound signal according to the first energy ratio condition, then mix.
For example, if the preset first energy ratio of accompaniment to vocal for a certain song is 0.9 and the currently calculated second energy ratio of accompaniment to vocal is 1.2, the accompaniment is somewhat loud and the vocal needs to be boosted: adjusted vocal energy = accompaniment energy / 0.9. The adjusted vocal is then mixed with the accompaniment.
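Steps 503 to 506 can be sketched as follows. This is a hypothetical illustration, not the patent's implementation: following the worked example, the ratio is taken as accompaniment energy over vocal energy, energy is computed as the sum of squared samples, and the target ratio and tolerance are assumed values.

```python
import numpy as np

def mix(vocal: np.ndarray, accomp: np.ndarray,
        target_ratio: float = 0.9, tol: float = 0.05) -> np.ndarray:
    """Mix vocal and accompaniment toward accompaniment/vocal energy == target_ratio."""
    e_vocal = float(np.sum(vocal ** 2))
    e_accomp = float(np.sum(accomp ** 2))
    ratio = e_accomp / e_vocal                  # second energy ratio (step 503)
    if abs(ratio - target_ratio) > tol:         # condition not satisfied (step 504)
        # Rescale the vocal so that e_accomp / e_vocal' equals target_ratio.
        # The amplitude gain is a square root because energy scales as amplitude^2.
        vocal = vocal * np.sqrt(ratio / target_ratio)   # step 506
    return vocal + accomp                       # direct superposition (step 505)
```

If the measured ratio already satisfies the condition, the branch is skipped and the two signals are superimposed unchanged, matching step 505.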
In this embodiment, differentiated mixing is applied to the sound signal after audio optimization to obtain a beautified song, improving the entertainment function of karaoke.
The above describes the method embodiments of the present invention; devices implementing the above methods are introduced below.
Referring to Figure 6, a structural schematic diagram of a sound effect processing device according to an embodiment of the present invention.
The device may comprise:
a signal acquisition unit 601 for acquiring a sound signal;
a type determining unit 602 for determining the sound type corresponding to the sound signal;
an audio effect processing unit 603 for selecting a sound effect processing method corresponding to the sound type to perform audio effect processing on the sound signal.
Through the above units, the device distinguishes between acquired sound signals and applies different processing methods to different sound types, so that the audio effect processing is further refined and a better audio effect is obtained.
In one embodiment, as shown in Figure 7, the type determining unit 602 may specifically comprise:
a framing subunit 701 for framing the sound signal;
a detection subunit 702 for performing endpoint detection on each frame of the sound signal to obtain the voiced segments and noise segments of the sound signal;
a first computation subunit 703 for calculating the signal-to-noise ratio of the voiced segments of the sound signal;
a first determination subunit 704 for determining the sound type corresponding to the sound signal according to the signal-to-noise ratio.
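A minimal sketch of the pipeline performed by subunits 701-704: frame the signal, split frames into voiced and noise segments, compute the SNR, and compare it against a threshold. The simple energy gate (standing in for real endpoint detection) and the 50 dB threshold are illustrative assumptions, not the patent's method.

```python
import numpy as np

def classify_by_snr(signal: np.ndarray, frame_len: int = 256,
                    snr_threshold_db: float = 50.0) -> str:
    """Label a signal 'high-SNR' or 'low-SNR' from its voiced/noise frame energies."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)  # framing (701)
    energies = np.mean(frames ** 2, axis=1)
    gate = 0.1 * energies.max()                 # crude voiced/noise split (702)
    voiced = energies[energies >= gate]
    noise = energies[energies < gate]
    if len(noise) == 0:
        return "high-SNR"                       # no detectable noise segment
    snr_db = 10.0 * np.log10(np.mean(voiced) / np.mean(noise))  # SNR (703)
    return "high-SNR" if snr_db >= snr_threshold_db else "low-SNR"  # decision (704)
```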
In another embodiment, as shown in Figure 8, the type determining unit 602 may specifically comprise:
an extraction subunit 801 for extracting acoustic features of the sound signal;
a second computation subunit 802 for calculating likelihood values of the acoustic features with respect to each sound type model obtained in advance;
a second determination subunit 803 for taking the sound type model with the maximum likelihood value as the sound type corresponding to the sound signal.
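An illustrative sketch of the model-based decision in subunits 801-803: score a feature vector against per-type models and pick the type with the highest log-likelihood. The diagonal-Gaussian model family used here is a common choice but an assumption; the patent does not fix the model family.

```python
import numpy as np

def log_likelihood(x: np.ndarray, mean: np.ndarray, var: np.ndarray) -> float:
    """Log density of feature vector x under a diagonal Gaussian."""
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

def classify(features: np.ndarray, models: dict) -> str:
    """models: dict mapping sound-type name -> (mean, var) arrays; returns best type."""
    scores = {name: log_likelihood(features, m, v)
              for name, (m, v) in models.items()}
    return max(scores, key=scores.get)          # maximum-likelihood type (803)
```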
When the type determining unit 602 uses the statistical-model analysis method to determine the sound type corresponding to the sound signal, as shown in Figure 9, the device may further comprise, in addition to the signal acquisition unit 601, the type determining unit 602, and the audio effect processing unit 603:
a model acquiring unit 901 for obtaining each sound type model before the second computation subunit 802 calculates the likelihood values;
a data acquisition unit 902 for obtaining the signal-to-noise ratio of the sound signal before the audio effect processing unit 603 performs audio effect processing.
In this case, the audio effect processing unit 603 is specifically configured to select the sound effect processing method according to the signal-to-noise ratio of the sound signal and the sound type of the sound signal, and to perform audio effect processing on the sound signal.
As shown in Figure 10, the model acquiring unit 901 may further comprise:
a collection subunit 1001 for collecting multiple groups of training data, the training data comprising different types of sound signals corresponding to standard sound signals;
a feature extraction subunit 1002 for extracting acoustic features of the training data;
a model training subunit 1003 for performing model training according to the acoustic features of the training data to obtain the sound type models corresponding to the different types of sound signals.
In another embodiment, the audio effect processing unit 603 is specifically configured to: when the signal-to-noise ratio of the sound signal is less than the first SNR threshold and the sound type is the pure-vocal sound type, attenuate the low-frequency part of the sound signal and boost the mid-low and mid-high frequency parts; when the signal-to-noise ratio of the sound signal is greater than or equal to the first SNR threshold and the sound type is the pure-vocal sound type, reduce the difference between the maximum level and the average level of the vocal track of the sound signal and boost the high-frequency part of the sound signal; when the signal-to-noise ratio of the sound signal is less than the first SNR threshold and the sound type is the sound type with accompaniment, boost the mid-high and high-frequency parts of the sound signal and reduce the reverberation time; when the signal-to-noise ratio of the sound signal is greater than or equal to the first SNR threshold and the sound type is the sound type with accompaniment, boost the high-frequency part of the sound signal, compensate the high-frequency part, and remove the low-frequency part.
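The four cases above amount to a dispatch on the pair (SNR vs. threshold, sound type). A minimal sketch follows; the recipe strings merely summarize the described actions (the actual equalization and effects chain is not shown), and the 50 dB threshold is the example value from the description.

```python
SNR_THRESHOLD_DB = 50.0  # example value from the description

# Keyed by (snr >= threshold, sound type); values summarize the four rules.
RECIPES = {
    (False, "pure-vocal"): "cut lows; boost mid-lows and mid-highs",
    (True,  "pure-vocal"): "compress vocal track levels; boost highs",
    (False, "with-accompaniment"): "boost mid-highs and highs; shorten reverb",
    (True,  "with-accompaniment"): "boost and compensate highs; remove lows",
}

def select_recipe(snr_db: float, sound_type: str) -> str:
    """Pick the processing recipe for a measured SNR and detected sound type."""
    return RECIPES[(snr_db >= SNR_THRESHOLD_DB, sound_type)]
```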
In another embodiment, as shown in Figure 11, the device may further comprise, in addition to the signal acquisition unit 601, the type determining unit 602, and the audio effect processing unit 603:
a mixing processing unit 1101 for performing mixing processing on the sound signal after audio effect processing.
As shown in Figure 12, the mixing processing unit 1101 may specifically comprise:
a third determination subunit 1201 for determining the song type to which the sound signal belongs;
a lookup subunit 1202 for looking up the first energy ratio condition of vocal to accompaniment corresponding to the song type of the sound signal;
a third computation subunit 1203 for calculating the second energy ratio of vocal to accompaniment in the sound signal;
an adjustment subunit 1204 for adjusting the vocal or the accompaniment in the sound signal according to the first energy ratio condition when the second energy ratio does not satisfy the first energy ratio condition;
a mixing subunit 1205 for mixing the adjusted vocal in the sound signal with the accompaniment.
In this embodiment, the above units perform differentiated mixing on the sound signal after audio optimization to obtain a beautified song, improving the entertainment function of karaoke.
For the specific implementation of each unit in the above device, refer to the description of the foregoing method embodiments; details are not repeated here.
For convenience of description, the above device has been described in terms of units divided by function. Of course, when implementing the present application, the functions of the units may be realized in one or more pieces of software and/or hardware.
From the above description of the embodiments, those skilled in the art can clearly understand that the present application can be realized by means of software plus the necessary general hardware platform. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, magnetic disk, or optical disc, and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application or in certain parts of the embodiments.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively simple; for the relevant parts, refer to the description of the method embodiments. The system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solution of the embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments including any of the above systems or devices, and so on.
The present application can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform specific tasks or implement specific abstract data types. The present application can also be practiced in distributed computing environments, in which tasks are executed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in local and remote computer storage media including storage devices.
The above are only specific embodiments of the present application. It should be noted that, for those of ordinary skill in the art, several improvements and modifications can be made without departing from the principles of the present application, and these improvements and modifications should also be regarded as falling within the protection scope of the present application.

Claims (16)

1. A sound effect processing method, characterized by comprising:
acquiring a sound signal;
determining a sound type corresponding to the sound signal, the sound type including pure vocal and vocal with accompaniment;
selecting a sound effect processing method corresponding to the sound type to perform audio effect processing on the sound signal.
2. The method according to claim 1, characterized in that determining the sound type corresponding to the sound signal comprises:
framing the sound signal;
performing endpoint detection on each frame of the sound signal to obtain voiced segments and noise segments of the sound signal;
calculating a signal-to-noise ratio of the voiced segments of the sound signal;
determining the sound type corresponding to the sound signal according to the signal-to-noise ratio.
3. The method according to claim 1, characterized in that determining the sound type corresponding to the sound signal comprises:
extracting acoustic features of the sound signal;
calculating likelihood values of the acoustic features with respect to each sound type model obtained in advance;
taking the sound type model with the maximum likelihood value as the sound type corresponding to the sound signal.
4. The method according to claim 3, characterized in that, before calculating the likelihood values of the acoustic features with respect to each sound type model obtained in advance, the method further comprises:
obtaining each sound type model;
wherein obtaining each sound type model comprises:
collecting multiple groups of training data, the training data comprising different types of sound signals corresponding to standard sound signals;
extracting acoustic features of the training data;
performing model training according to the acoustic features of the training data to obtain the sound type models corresponding to the different types of sound signals.
5. The method according to claim 3, characterized in that, before selecting the sound effect processing method corresponding to the sound type to perform audio effect processing on the sound signal, the method further comprises:
obtaining a signal-to-noise ratio of the sound signal;
wherein selecting the sound effect processing method corresponding to the sound type to perform audio effect processing on the sound signal comprises:
selecting the sound effect processing method according to the signal-to-noise ratio of the sound signal and the sound type of the sound signal to perform audio effect processing on the sound signal.
6. The method according to claim 2 or 5, characterized in that selecting the sound effect processing method corresponding to the sound type to perform audio effect processing on the sound signal comprises:
when the signal-to-noise ratio of the sound signal is less than a first SNR threshold and the sound type is the pure-vocal sound type, attenuating the low-frequency part of the sound signal and boosting the mid-low and mid-high frequency parts;
when the signal-to-noise ratio of the sound signal is greater than or equal to the first SNR threshold and the sound type is the pure-vocal sound type, reducing the difference between the maximum level and the average level of the vocal track of the sound signal and boosting the high-frequency part of the sound signal;
when the signal-to-noise ratio of the sound signal is less than the first SNR threshold and the sound type is the sound type with accompaniment, boosting the mid-high and high-frequency parts of the sound signal and reducing the reverberation time;
when the signal-to-noise ratio of the sound signal is greater than or equal to the first SNR threshold and the sound type is the sound type with accompaniment, boosting the high-frequency part of the sound signal, compensating the high-frequency part, and removing the low-frequency part.
7. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
performing mixing processing on the sound signal after audio effect processing.
8. The method according to claim 7, characterized in that performing mixing processing on the sound signal after audio effect processing comprises:
determining a song type to which the sound signal belongs;
looking up a first energy ratio condition of vocal to accompaniment corresponding to the song type of the sound signal;
calculating a second energy ratio of vocal to accompaniment in the sound signal;
when the second energy ratio does not satisfy the first energy ratio condition, adjusting the vocal or the accompaniment in the sound signal according to the first energy ratio condition;
mixing the adjusted vocal in the sound signal with the accompaniment.
9. A sound effect processing device, characterized by comprising:
a signal acquisition unit for acquiring a sound signal;
a type determining unit for determining a sound type corresponding to the sound signal, the sound type including pure vocal and vocal with accompaniment;
an audio effect processing unit for selecting a sound effect processing method corresponding to the sound type to perform audio effect processing on the sound signal.
10. The device according to claim 9, characterized in that the type determining unit comprises:
a framing subunit for framing the sound signal;
a detection subunit for performing endpoint detection on each frame of the sound signal to obtain voiced segments and noise segments of the sound signal;
a first computation subunit for calculating the signal-to-noise ratio of the voiced segments of the sound signal;
a first determination subunit for determining the sound type corresponding to the sound signal according to the signal-to-noise ratio.
11. The device according to claim 10, characterized in that the type determining unit comprises:
an extraction subunit for extracting acoustic features of the sound signal;
a second computation subunit for calculating likelihood values of the acoustic features with respect to each sound type model obtained in advance;
a second determination subunit for taking the sound type model with the maximum likelihood value as the sound type corresponding to the sound signal.
12. The device according to claim 11, characterized by further comprising:
a model acquiring unit for obtaining each sound type model before the second computation subunit calculates the likelihood values;
wherein the model acquiring unit comprises:
a collection subunit for collecting multiple groups of training data, the training data comprising different types of sound signals corresponding to standard sound signals;
a feature extraction subunit for extracting acoustic features of the training data;
a model training subunit for performing model training according to the acoustic features of the training data to obtain the sound type models corresponding to the different types of sound signals.
13. The device according to claim 11, characterized in that the device further comprises:
a data acquisition unit for obtaining the signal-to-noise ratio of the sound signal before the audio effect processing unit performs audio effect processing;
wherein the audio effect processing unit is specifically configured to select the sound effect processing method according to the signal-to-noise ratio of the sound signal and the sound type of the sound signal to perform audio effect processing on the sound signal.
14. The device according to claim 10 or 13, characterized in that
the audio effect processing unit is specifically configured to: when the signal-to-noise ratio of the sound signal is less than a first SNR threshold and the sound type is the pure-vocal sound type, attenuate the low-frequency part of the sound signal and boost the mid-low and mid-high frequency parts; when the signal-to-noise ratio of the sound signal is greater than or equal to the first SNR threshold and the sound type is the pure-vocal sound type, reduce the difference between the maximum level and the average level of the vocal track of the sound signal and boost the high-frequency part of the sound signal; when the signal-to-noise ratio of the sound signal is less than the first SNR threshold and the sound type is the sound type with accompaniment, boost the mid-high and high-frequency parts of the sound signal and reduce the reverberation time; when the signal-to-noise ratio of the sound signal is greater than or equal to the first SNR threshold and the sound type is the sound type with accompaniment, boost the high-frequency part of the sound signal, compensate the high-frequency part, and remove the low-frequency part.
15. The device according to any one of claims 9 to 13, characterized in that the device further comprises:
a mixing processing unit for performing mixing processing on the sound signal after audio effect processing.
16. The device according to claim 15, characterized in that the mixing processing unit comprises:
a third determination subunit for determining the song type to which the sound signal belongs;
a lookup subunit for looking up the first energy ratio condition of vocal to accompaniment corresponding to the song type of the sound signal;
a third computation subunit for calculating the second energy ratio of vocal to accompaniment in the sound signal;
an adjustment subunit for adjusting the vocal or the accompaniment in the sound signal according to the first energy ratio condition when the second energy ratio does not satisfy the first energy ratio condition;
a mixing subunit for mixing the adjusted vocal in the sound signal with the accompaniment.
CN201410472853.5A 2014-09-16 2014-09-16 A kind of sound effect treatment method and device Active CN105405448B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410472853.5A CN105405448B (en) 2014-09-16 2014-09-16 A kind of sound effect treatment method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410472853.5A CN105405448B (en) 2014-09-16 2014-09-16 A kind of sound effect treatment method and device

Publications (2)

Publication Number Publication Date
CN105405448A CN105405448A (en) 2016-03-16
CN105405448B true CN105405448B (en) 2019-09-03

Family

ID=55470891

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410472853.5A Active CN105405448B (en) 2014-09-16 2014-09-16 A kind of sound effect treatment method and device

Country Status (1)

Country Link
CN (1) CN105405448B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106486132A (en) * 2015-08-26 2017-03-08 天津三星电子有限公司 A kind of method adjusting display terminal volume and its display terminal
CN107978318A (en) * 2016-10-21 2018-05-01 咪咕音乐有限公司 A kind of real-time sound mixing method and device
CN107527611A (en) * 2017-08-23 2017-12-29 武汉斗鱼网络科技有限公司 MFCC audio recognition methods, storage medium, electronic equipment and system
WO2019169551A1 (en) * 2018-03-06 2019-09-12 深圳市沃特沃德股份有限公司 Voice processing method and device, and electronic apparatus
CN109151702B (en) * 2018-09-21 2021-10-08 歌尔科技有限公司 Sound effect adjusting method of audio equipment, audio equipment and readable storage medium
CN111048108B (en) * 2018-10-12 2022-06-24 北京微播视界科技有限公司 Audio processing method and device
CN109087634A (en) * 2018-10-30 2018-12-25 四川长虹电器股份有限公司 A kind of sound quality setting method based on audio classification
CN111385688A (en) * 2018-12-29 2020-07-07 安克创新科技股份有限公司 Active noise reduction method, device and system based on deep learning
CN109828740B (en) * 2019-01-21 2021-06-08 北京小唱科技有限公司 Audio adjusting method and device
CN109830244A (en) * 2019-01-21 2019-05-31 北京小唱科技有限公司 Dynamic reverberation processing method and processing device for audio
CN109920397B (en) * 2019-01-31 2021-06-01 李奕君 System and method for making audio function in physics
CN110072181B (en) * 2019-03-27 2021-03-19 广州飞达音响股份有限公司 Bass intensifying method and device
CN110047514B (en) * 2019-05-30 2021-05-28 腾讯音乐娱乐科技(深圳)有限公司 Method for evaluating purity of accompaniment and related equipment
CN110288983B (en) * 2019-06-26 2021-10-01 上海电机学院 Voice processing method based on machine learning
CN110677716B (en) * 2019-08-20 2022-02-01 咪咕音乐有限公司 Audio processing method, electronic device, and storage medium
CN112489664B (en) * 2020-11-30 2023-08-01 广州趣丸网络科技有限公司 Sound mixing method and device
CN112669811B (en) * 2020-12-23 2024-02-23 腾讯音乐娱乐科技(深圳)有限公司 Song processing method and device, electronic equipment and readable storage medium
CN112992167A (en) * 2021-02-08 2021-06-18 歌尔科技有限公司 Audio signal processing method and device and electronic equipment
WO2023044608A1 (en) * 2021-09-22 2023-03-30 京东方科技集团股份有限公司 Audio adjustment method, apparatus and device, and storage medium
CN117198321B (en) * 2023-11-08 2024-01-05 方图智能(深圳)科技集团股份有限公司 Composite audio real-time transmission method and system based on deep learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1457216A (en) * 2002-05-10 2003-11-19 日本先锋公司 Digital set style echo effect decoder
CN1491018A (en) * 2002-10-14 2004-04-21 中国科学院声学研究所 Echo cancellating and phonetic testing method and apparatus for dialogue interactive front end
CN103559876A (en) * 2013-11-07 2014-02-05 安徽科大讯飞信息科技股份有限公司 Sound effect processing method and sound effect processing system
CN103744666A (en) * 2013-12-23 2014-04-23 乐视致新电子科技(天津)有限公司 Method and device for adjusting audio frequencies in Android device
CN103927146A (en) * 2014-04-30 2014-07-16 深圳市中兴移动通信有限公司 Sound effect self-adapting method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101046956A (en) * 2006-03-28 2007-10-03 国际商业机器公司 Interactive audio effect generating method and system

Also Published As

Publication number Publication date
CN105405448A (en) 2016-03-16

Similar Documents

Publication Publication Date Title
CN105405448B (en) A kind of sound effect treatment method and device
CN104538011B (en) A kind of tone adjusting method, device and terminal device
US20190115032A1 (en) Analysing speech signals
CN101627427B (en) Voice emphasis device and voice emphasis method
CN104080024B (en) Volume leveller controller and control method and audio classifiers
US20210256971A1 (en) Detection of replay attack
WO2020181824A1 (en) Voiceprint recognition method, apparatus and device, and computer-readable storage medium
CN106782504A (en) Audio recognition method and device
US20050143997A1 (en) Method and apparatus using spectral addition for speaker recognition
CN105405439A (en) Voice playing method and device
CN104700843A (en) Method and device for identifying ages
US20210335364A1 (en) Computer program, server, terminal, and speech signal processing method
CN103915093B (en) A kind of method and apparatus for realizing singing of voice
CN104123933A (en) Self-adaptive non-parallel training based voice conversion method
Pillos et al. A Real-Time Environmental Sound Recognition System for the Android OS.
Maruri et al. V-speech: Noise-robust speech capturing glasses using vibration sensors
US10854182B1 (en) Singing assisting system, singing assisting method, and non-transitory computer-readable medium comprising instructions for executing the same
CN104707331B (en) A kind of game body-sensing production method and device
CN110400571A (en) Audio-frequency processing method, device, storage medium and electronic equipment
CN110148419A (en) Speech separating method based on deep learning
Eklund Data augmentation techniques for robust audio analysis
CN109300470A (en) Audio mixing separation method and audio mixing separator
US10839810B2 (en) Speaker enrollment
US11528571B1 (en) Microphone occlusion detection
Zouhir et al. A bio-inspired feature extraction for robust speech recognition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant