CN105405448A - Sound effect processing method and apparatus - Google Patents

Sound effect processing method and apparatus

Info

Publication number
CN105405448A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410472853.5A
Other languages
Chinese (zh)
Other versions
CN105405448B (English)
Inventor
王影
孙见青
江源
胡国平
胡郁
刘庆峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority: CN201410472853.5A (granted as CN105405448B)
Publication of application: CN105405448A
Application granted
Publication of grant: CN105405448B
Legal status: Active

Abstract

The invention discloses a sound effect processing method and apparatus. The method comprises the steps of collecting a sound signal; determining the sound type corresponding to the sound signal; and selecting the sound effect processing method corresponding to that sound type to process the sound signal. The invention effectively improves the quality of sound effect processing.

Description

Sound effect processing method and apparatus
Technical field
The present application relates to the field of signal processing, and in particular to a sound effect processing method and apparatus.
Background technology
With the rapid development of the Internet and the proliferation of terminal applications, more and more users can conveniently sing karaoke online on their terminal devices, which greatly improves the convenience of singing and the user experience. However, online karaoke applications run in complex acoustic environments: the sound input collected by the system may contain not only the user's voice but also externally played accompaniment and various ambient noises. For example, when the user records with earphones, the background music is not picked up from the environment, and the collected sound is pure human voice; conversely, when the user plays the music out loud, for instance through a speaker, the collected sound is mixed with a small amount of accompaniment; and in a noisy environment, the collected sound also contains noise and other interference. This complexity of the sound source makes sound effect processing difficult and degrades the user experience of karaoke products.
At present, sound effect processing on mobile terminals optimizes the collected sound input under a single uniform strategy: every type of sound input is processed with the same preset sound effect processing method. However, different sound inputs, such as pure human voice and voice with accompaniment, differ greatly in their data distributions, so applying one uniform method to all inputs is too coarse and cannot achieve optimal sound effects.
Summary of the invention
To solve the above technical problem, embodiments of the present application provide a sound effect processing method and apparatus that can improve the quality of sound effect processing. The technical solution is as follows:
A sound effect processing method, comprising:
collecting a sound signal;
determining the sound type corresponding to the sound signal; and
selecting the sound effect processing method corresponding to the sound type to perform sound effect processing on the sound signal.
Preferably, determining the sound type corresponding to the sound signal comprises:
dividing the sound signal into frames;
performing endpoint detection on each frame to obtain the voiced segments and noise segments of the sound signal;
calculating the signal-to-noise ratio (SNR) of the voiced segments; and
determining the sound type corresponding to the sound signal according to the SNR.
Preferably, determining the sound type corresponding to the sound signal comprises:
extracting an acoustic feature from the sound signal;
calculating the likelihood of the acoustic feature under each pre-trained sound type model; and
taking the sound type whose model yields the maximum likelihood as the sound type corresponding to the sound signal.
Preferably, before calculating the likelihood of the acoustic feature under each pre-trained sound type model, the method further comprises:
obtaining each sound type model, which comprises:
collecting multiple sets of training data, the training data comprising the different types of sound signal corresponding to a standard sound signal;
extracting acoustic features from the training data; and
performing model training on the acoustic features of the training data to obtain the sound type model corresponding to each type of sound signal.
Preferably, before selecting the sound effect processing method corresponding to the sound type, the method further comprises:
obtaining the signal-to-noise ratio of the sound signal;
and selecting the sound effect processing method corresponding to the sound type comprises:
selecting the sound effect processing method according to both the signal-to-noise ratio and the sound type of the sound signal.
Preferably, selecting the sound effect processing method corresponding to the sound type to perform sound effect processing comprises:
when the SNR of the sound signal is below a first SNR threshold and the sound type is pure human voice, reducing the low-frequency part of the sound signal and boosting the mid-low and mid-high frequency parts;
when the SNR is at or above the first SNR threshold and the sound type is pure human voice, reducing the difference between the maximum level and the average level of the vocal track, and boosting the high-frequency part of the sound signal;
when the SNR is below the first SNR threshold and the sound type is voice with accompaniment, boosting the mid-high and high-frequency parts of the sound signal and shortening the reverberation time;
when the SNR is at or above the first SNR threshold and the sound type is voice with accompaniment, boosting and compensating the high-frequency part of the sound signal and removing the low-frequency part.
Preferably, the method further comprises:
performing mixing on the sound signal after sound effect processing.
Preferably, performing mixing on the sound signal after sound effect processing comprises:
determining the song type to which the sound signal belongs;
looking up the first energy ratio condition of voice to accompaniment corresponding to that song type;
calculating the second energy ratio of voice to accompaniment in the sound signal;
when the second energy ratio does not satisfy the first energy ratio condition, adjusting the voice or the accompaniment in the sound signal according to the first energy ratio condition; and
mixing the adjusted voice and accompaniment.
A sound effect processing apparatus, comprising:
a signal collection unit for collecting a sound signal;
a type determination unit for determining the sound type corresponding to the sound signal; and
a sound effect processing unit for selecting the sound effect processing method corresponding to the sound type and performing sound effect processing on the sound signal.
Preferably, the type determination unit comprises:
a framing subunit for dividing the sound signal into frames;
a detection subunit for performing endpoint detection on each frame to obtain the voiced segments and noise segments of the sound signal;
a first calculation subunit for calculating the SNR of the voiced segments; and
a first determination subunit for determining the sound type corresponding to the sound signal according to the SNR.
Preferably, the type determination unit comprises:
an extraction subunit for extracting an acoustic feature from the sound signal;
a second calculation subunit for calculating the likelihood of the acoustic feature under each pre-trained sound type model; and
a second determination subunit for taking the sound type whose model yields the maximum likelihood as the sound type corresponding to the sound signal.
Preferably, the apparatus further comprises:
a model acquisition unit for obtaining each sound type model before the second calculation subunit calculates the likelihoods;
the model acquisition unit comprising:
a collection subunit for collecting multiple sets of training data, the training data comprising the different types of sound signal corresponding to a standard sound signal;
a feature extraction subunit for extracting acoustic features from the training data; and
a model training subunit for performing model training on the acoustic features of the training data to obtain the sound type model corresponding to each type of sound signal.
Preferably, the apparatus further comprises:
a data acquisition unit for obtaining the SNR of the sound signal before the sound effect processing unit performs sound effect processing;
the sound effect processing unit being specifically configured to select the sound effect processing method according to both the SNR and the sound type of the sound signal.
Preferably, the sound effect processing unit is specifically configured to: when the SNR of the sound signal is below a first SNR threshold and the sound type is pure human voice, reduce the low-frequency part of the sound signal and boost the mid-low and mid-high frequency parts; when the SNR is at or above the first SNR threshold and the sound type is pure human voice, reduce the difference between the maximum level and the average level of the vocal track and boost the high-frequency part; when the SNR is below the first SNR threshold and the sound type is voice with accompaniment, boost the mid-high and high-frequency parts and shorten the reverberation time; and when the SNR is at or above the first SNR threshold and the sound type is voice with accompaniment, boost and compensate the high-frequency part and remove the low-frequency part.
Preferably, the apparatus further comprises:
a mixing unit for performing mixing on the sound signal after sound effect processing.
Preferably, the mixing unit comprises:
a third determination subunit for determining the song type to which the sound signal belongs;
a lookup subunit for looking up the first energy ratio condition of voice to accompaniment corresponding to that song type;
a third calculation subunit for calculating the second energy ratio of voice to accompaniment in the sound signal;
an adjustment subunit for adjusting the voice or the accompaniment in the sound signal according to the first energy ratio condition when the second energy ratio does not satisfy it; and
a mixing subunit for mixing the adjusted voice and accompaniment.
Embodiments of the present invention provide at least the following benefit:
by distinguishing the types of the collected sound signals and applying a different processing method to each sound type, embodiments of the present invention refine sound effect processing and thereby achieve better sound effects.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present application or of the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below are only some embodiments of the present application; those of ordinary skill in the art can derive further drawings from them without creative effort.
Fig. 1 is a flowchart of a sound effect processing method according to an embodiment of the present invention;
Fig. 2 is a flowchart of one method of determining the sound type corresponding to a sound signal according to an embodiment of the present invention;
Fig. 3 is a flowchart of another method of determining the sound type corresponding to a sound signal according to an embodiment of the present invention;
Fig. 4 is a flowchart of a method of obtaining sound type models according to an embodiment of the present invention;
Fig. 5 is a flowchart of a method of mixing the sound signal after sound effect processing according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a sound effect processing apparatus according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of one type determination unit according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of another type determination unit according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of another sound effect processing apparatus according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a model acquisition unit according to an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of yet another sound effect processing apparatus according to an embodiment of the present invention;
Fig. 12 is a schematic structural diagram of a mixing unit according to an embodiment of the present invention.
Detailed description
To enable those skilled in the art to better understand the technical solutions of the present application, they are described below clearly and completely with reference to the drawings of the embodiments. The described embodiments are only some, not all, of the embodiments of the present application; all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
To make the above purposes, features and advantages of the present application more apparent, the present application is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a flowchart of a sound effect processing method according to an embodiment of the present invention.
The method may comprise:
Step 101: collect a sound signal.
The sound signal may contain not only the user's voice but also other sounds, such as externally played accompaniment while singing and various ambient noises.
Step 102: determine the sound type corresponding to the sound signal.
Embodiments of the present invention may determine the sound type from the acoustic characteristics of the signal using various analyses, such as signal analysis or statistical modeling. Specifically, signal analysis determines the sound type from characteristics of the input such as its signal-to-noise ratio or energy, while the statistical modeling approach extracts acoustic features of the input, such as MFCCs (Mel-frequency cepstral coefficients), and determines the sound type with a statistical model; this case adopts statistical modeling based on a DNN, as described in the subsequent embodiments. The sound types may include pure human voice, voice with accompaniment, and so on.
Step 103: select the sound effect processing method corresponding to the sound type and perform sound effect processing on the sound signal.
A sound effect processing method is preset for each sound type; once the sound type of the sound signal has been determined in the preceding step, the corresponding method is triggered to process the signal. The preset methods may include methods for pure human voice, for voice with accompaniment, and so on.
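The select-and-dispatch flow of steps 101-103 can be sketched as a lookup table mapping each sound type to its preset processing routine. All function and type names, and the placeholder gain values, are illustrative, not from the patent:

```python
# Sketch of steps 101-103: classify the signal, then dispatch to the
# processing routine registered for that sound type.

def process_pure_vocal(signal):
    return [s * 1.05 for s in signal]   # placeholder per-type processing

def process_with_accompaniment(signal):
    return [s * 0.95 for s in signal]   # placeholder per-type processing

EFFECT_METHODS = {
    "pure_vocal": process_pure_vocal,
    "with_accompaniment": process_with_accompaniment,
}

def apply_sound_effect(signal, sound_type):
    """Select and apply the effect method preset for this sound type."""
    return EFFECT_METHODS[sound_type](signal)
```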
By distinguishing the types of the collected sound signals and applying a different processing method to each sound type, embodiments of the present invention refine sound effect processing and thereby achieve better sound effects.
In one embodiment of the present invention, the sound type corresponding to the sound signal can be determined by signal analysis, as shown in Fig. 2. The method may comprise:
Step 201: divide the sound signal into frames.
The sound signal is first divided into multiple speech frames.
Step 202: perform endpoint detection on each frame to obtain the voiced segments and noise segments of the sound signal.
In this step, endpoint detection of each frame comprises:
First, calculate the short-time average energy and short-time average zero-crossing rate of each frame. These are common signal analysis quantities in speech processing: the short-time energy is obtained by framing the signal and computing the energy of each frame separately. Voiced sound has the largest short-time energy, unvoiced sound less, and silence the least. The short-time zero-crossing rate is the number of times the waveform of a frame crosses the horizontal axis (zero level).
Then, compare the short-time average energy of each frame with a preset upper energy threshold T1, and the short-time average zero-crossing rate of each frame with a preset upper zero-crossing threshold T2. Voiced segments generally have higher energy, so their short-time average energy is larger and their zero-crossing rate smaller, while noise segments have smaller short-time average energy and a higher zero-crossing rate. Therefore, a frame whose short-time average energy exceeds T1 and whose zero-crossing rate is below T2 belongs to a voiced segment; a frame whose short-time average energy is below T1 and whose zero-crossing rate exceeds T2 belongs to a noise segment. The voiced and noise segments of the sound signal are thereby obtained.
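The framing and threshold test above can be sketched as follows. The frame length and the thresholds T1 and T2 here are illustrative values, and the energy and zero-crossing formulas are the standard textbook definitions:

```python
# Sketch of steps 201-202: frame the signal, compute short-time energy
# and zero-crossing rate per frame, and label frames as voiced or noise
# against thresholds T1 and T2.

def frames(signal, frame_len):
    """Non-overlapping frames of frame_len samples."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def short_time_energy(frame):
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Number of sign changes across the frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)

def endpoint_detect(signal, frame_len, t1, t2):
    voiced, noise = [], []
    for f in frames(signal, frame_len):
        if short_time_energy(f) > t1 and zero_crossing_rate(f) < t2:
            voiced.append(f)      # high energy, few crossings -> voiced
        else:
            noise.append(f)       # otherwise treated as noise segment
    return voiced, noise
```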
Step 203: calculate the signal-to-noise ratio of the voiced segments of the sound signal.
In general, a larger SNR indicates a stronger signal and less noise. When the SNR reaches 60 dB or more, the noise is essentially acceptable. As in the prior art, SNR (dB) = 10·log10(S/N); the computation is not repeated here.
Step 204: determine the sound type corresponding to the sound signal according to the SNR.
Specifically, a second SNR threshold can be set: if the SNR of the current sound signal is below this threshold, the sound type of the current signal is judged to be voice with accompaniment; otherwise, the sound type is pure human voice.
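Steps 203-204 reduce to the decibel formula above plus one threshold comparison. A minimal sketch, with an assumed threshold value:

```python
import math

# Sketch of steps 203-204: SNR in dB = 10 * log10(signal_power / noise_power),
# then compare against the second SNR threshold to pick the sound type.
# The default threshold here is illustrative.

def snr_db(signal_power, noise_power):
    return 10 * math.log10(signal_power / noise_power)

def classify_by_snr(signal_power, noise_power, threshold_db=50.0):
    if snr_db(signal_power, noise_power) < threshold_db:
        return "with_accompaniment"   # below threshold -> accompanied voice
    return "pure_vocal"               # otherwise -> pure human voice
```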
The above signal analysis method is generally sensitive to the environment, and its results are easily disturbed by it. In another embodiment of the present invention, the sound type corresponding to the sound signal can therefore be determined by statistical model analysis, as shown in Fig. 3. The method may comprise:
Step 301: extract an acoustic feature from the sound signal.
A common acoustic feature, the Mel-frequency cepstral coefficient (MFCC), can be extracted in this step.
Step 302: calculate the likelihood of the acoustic feature under each pre-trained sound type model.
Before this step, a large amount of data is collected in advance, acoustic features are extracted from it, and distribution models of the different sound types are trained, as described in the subsequent embodiment. Each distribution model in effect characterizes the acoustic features of one sound type.
In this step, the likelihood between the acoustic feature of the sound signal and the distribution model of each sound type is computed. The likelihood computation is as in the prior art and is not repeated here.
Step 303: take the sound type whose model yields the maximum likelihood as the sound type corresponding to the sound signal.
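A minimal sketch of steps 301-303, with a single diagonal Gaussian per type standing in for the trained distribution models (the patent's actual models are the DNN described later); all names are illustrative:

```python
import math

# Sketch of steps 301-303: score the extracted feature under each
# sound-type "model" and return the type with the maximum likelihood.
# Each stand-in model is a per-dimension (mean, variance) Gaussian.

def log_likelihood(feature, means, variances):
    ll = 0.0
    for x, m, v in zip(feature, means, variances):
        ll += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return ll

def classify(feature, models):
    """models: {type_name: (means, variances)}; returns the argmax type."""
    return max(models, key=lambda t: log_likelihood(feature, *models[t]))
```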
The sound type models have to be obtained in advance before the statistical-model-based determination above can be used. Model construction can adopt various model hypotheses, such as a GMM or an HMM; in general, the more complex the model, the more precisely it can fit the distribution. This case therefore proposes a neural-network-based construction of the sound distribution models, used to model the feature distributions of pure human voice and of voice with accompaniment. The method of obtaining the sound type models, shown in Fig. 4, may comprise:
Step 401: collect multiple sets of training data, the training data comprising the different types of sound signal corresponding to a standard sound signal.
In this step, the different types of sound signal corresponding to a standard sound signal are first collected as training data. Taking songs as the example, two types of data are collected for each of N songs; that is, for each song both pure human voice and voice with accompaniment are gathered, giving 2N song recordings in total. Balancing training efficiency against precision, this case selects N = 500 at a sampling rate of 16 kHz.
Step 402: extract acoustic features from the training data.
This step divides the training data into frames and extracts 13-dimensional MFCC spectral features for each frame; the 13-dimensional MFCC features of the current frame and of the 5 frames before and after it, together with their dynamic parameters, serve as the DNN input, so as to take context information into account.
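The input assembly described here can be sketched as follows: 13 static MFCCs plus two orders of dynamic parameters give 39 dimensions per frame, and stacking 11 frames of context yields the 429-dimensional DNN input. The delta computation below is a simplified first-difference stand-in for the actual dynamic-parameter formula:

```python
# Sketch of the step 402 input assembly: 13 static MFCCs + deltas +
# delta-deltas = 39 dims per frame; current frame + 5 frames of context
# on each side (11 frames) = 13 * 3 * 11 = 429 DNN input dimensions.

def deltas(seq):
    """Simple first-order difference as a stand-in delta computation."""
    return [b - a for a, b in zip(seq, seq[1:])] + [0.0]

def frame_features(mfcc):                # mfcc: 13 static coefficients
    d1 = deltas(mfcc)
    d2 = deltas(d1)
    return mfcc + d1 + d2                # 39 dimensions

def stack_context(frame_feats, i, context=5):
    """Concatenate features of frames i-5 .. i+5 (edges repeat)."""
    n = len(frame_feats)
    out = []
    for j in range(i - context, i + context + 1):
        out.extend(frame_feats[min(max(j, 0), n - 1)])
    return out                           # 39 * 11 = 429 dimensions
```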
Step 403: perform model training on the acoustic features of the training data to obtain the sound type model corresponding to each type of sound signal.
Training a DNN model is taken as the example. It specifically comprises:
First, determine the DNN topology. In this step the DNN input layer is set to 429 nodes (13 × 3 × 11: with static and dynamic parameters the feature dimension per frame is 39, and with context, the features of the current frame and of the 5 frames on each side, 11 frames of parameters in total, are used) to receive the acoustic features. The output layer represents the class information, coded as 0 or 1, where 0 denotes pure human voice and 1 denotes voice with accompaniment, and comprises 2 nodes. Three hidden layers are used, each with 2048 nodes.
Then, train this DNN topology on the training data to obtain the model parameters, i.e. the DNN weights. In particular, a development set is used in this case to tune a suitable number of parameter update iterations, e.g. 20, for model optimization.
The process of model training and of obtaining the final model is prior art and is not repeated here.
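Ignoring training, the stated topology reduces to a feed-forward pass: 429 inputs, hidden layers, and a 2-node softmax output. A minimal sketch with random placeholder weights and configurable layer sizes; the sigmoid activation is an assumption, since the patent does not specify one:

```python
import math
import random

def layer(x, weights, biases, act):
    """One fully connected layer: act(W.x + b)."""
    return [act(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def make_dnn(sizes, rng):
    """Random placeholder weights for consecutive layer sizes."""
    return [([[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]

def forward(dnn, x):
    """Sigmoid hidden layers; linear output passed through softmax."""
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    for weights, biases in dnn[:-1]:
        x = layer(x, weights, biases, sigmoid)
    weights, biases = dnn[-1]
    return softmax(layer(x, weights, biases, lambda v: v))

# The topology in the text would be sizes = [429, 2048, 2048, 2048, 2];
# output index 0 = pure human voice, index 1 = voice with accompaniment.
```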
The sound type corresponding to the sound signal can thus be determined by signal analysis, statistical model analysis or similar methods, and the sound signal is then processed with the processing method of the sound effect template preset for that sound type, yielding an optimized result.
In another embodiment, considering that the appropriate processing differs considerably between noise environments, this case further subdivides the processing methods by the SNR range of the sound signal: the processing method is selected according to both the SNR and the sound type of the sound signal. Specifically:
When the SNR of the sound signal is below the first SNR threshold and the sound type is pure human voice, the low-frequency part (0-80 Hz) of the signal is moderately reduced with an equalizer, e.g. by about 5 dB; the mid-low frequencies (150-500 Hz) are boosted, e.g. by about 5 dB, to improve the dynamics and loudness of the sound; and the mid-high frequencies (2-5 kHz) can then be boosted by about 2 to 3 dB to improve the penetration of the sound.
When the SNR of the sound signal is at or above the first SNR threshold and the sound type is pure human voice, the dynamic range of the vocal track is controlled with a compressor; that is, the difference between the maximum level and the average level of the vocal track is reduced, which makes the vocal track blend better with the whole piece and sound fuller and stronger. The compression ratio is usually set between 2:1 and 8:1, and the threshold between -5 dB and -20 dB, the exact values depending on the level and dynamic range of the collected signal. The high frequencies (7-8 kHz) are moderately boosted with an equalizer, by about 2 to 3 dB, to improve the layering of the sound.
When the SNR of the sound signal is below the first SNR threshold and the sound type is voice with accompaniment, the situation differs from a pure-voice environment: because of the accompaniment, the actual noise should not be as large, and the processing of the voice is adjusted accordingly. The voice should be moderately enhanced, e.g. the mid-high frequencies (2-5 kHz) boosted by about 5 dB to improve its penetration, and the high frequencies (7-8 kHz) by about 2 dB to increase its clarity. The reverberation time should be set rather small, e.g. T60 = 1.2, to prevent an offset between the slight accompaniment mixed into the voice and the pure accompaniment, which would delay the accompaniment after mixing and make it sound muddy. During mixing, the voice-to-accompaniment ratio is moderately increased to strengthen the voice. The 500 Hz-2 kHz band contains the low-order harmonics and overtones of most instruments, and a moderate boost, e.g. 2 dB, makes the sound clear and bright.
When the SNR of the sound signal is at or above the first SNR threshold and the sound type is voice with accompaniment, the high frequencies (6-8 kHz) of the signal are boosted with an equalizer to compensate the high end, and the part below 80 Hz can be removed at the same time to increase the clarity of the sound.
The first SNR threshold can be set according to the needs of the application, e.g. 50 dB. Note also that the first and second SNR thresholds are generally unequal; both can be set according to the practical requirements.
Note that in embodiments of the present invention the spectrum can be divided into low frequency at 0-150 Hz, mid-low at 150-500 Hz, mid at 500 Hz-2 kHz, mid-high at 2-5 kHz, and high above 5 kHz. This division is of course not fixed and can be adjusted to the actual situation; likewise, the frequency bands used in processing are not fixed and can be adjusted to the practical requirements.
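The four processing branches above can be organized as a preset table keyed on (SNR band, sound type). The gain figures below follow the examples in the text; the band names, tuple format and default threshold are illustrative:

```python
# Sketch of the SNR/type-subdivided processing: each entry lists
# (operation, frequency band, approximate gain in dB or parameter).

EQ_PRESETS = {
    ("low_snr", "pure_vocal"): [
        ("cut",   "0-80Hz",   -5),          # reduce low frequency
        ("boost", "150-500Hz", 5),          # mid-low: dynamics and loudness
        ("boost", "2-5kHz",    2),          # mid-high: penetration
    ],
    ("high_snr", "pure_vocal"): [
        ("compress", "vocal_track", None),  # narrow the level range
        ("boost",    "7-8kHz",      2),     # high: layering
    ],
    ("low_snr", "with_accompaniment"): [
        ("boost", "2-5kHz", 5),             # penetration of the voice
        ("boost", "7-8kHz", 2),             # clarity of the voice
        ("reverb_time", "T60", 1.2),        # short reverberation time
    ],
    ("high_snr", "with_accompaniment"): [
        ("boost", "6-8kHz", 2),             # compensate the high end
        ("cut",   "0-80Hz", None),          # remove below 80 Hz
    ],
}

def select_preset(snr_db, sound_type, threshold_db=50.0):
    band = "low_snr" if snr_db < threshold_db else "high_snr"
    return EQ_PRESETS[(band, sound_type)]
```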
In another embodiment, the method can also, according to the demands of the application, perform mixing on the sound signal after sound effect processing.
Taking a song as the sound signal, mixing the song yields a beautified song. During mixing it must be considered that the energy ratio of voice to accompaniment differs between song types, and whether the sound signal itself contains accompaniment requires handling two cases. A sound signal without accompaniment can be mixed at the normal voice-to-accompaniment energy ratio; for a sound signal with accompaniment, the energy share of the accompaniment must be turned down, to prevent the accompaniment from being too loud and the voice too quiet after mixing, which would hurt the listening experience. The voice-to-accompaniment energy ratio differs per song type. Specifically, the method of mixing the sound signal after sound effect processing, shown in Fig. 5, comprises:
Step 501: determine the song type to which the voice signal belongs.
The song type refers to the style of the song, such as ballad or rock.
Step 502: look up the first energy ratio condition of voice to accompaniment corresponding to the song type of the voice signal.
The energy ratio conditions of voice to accompaniment for different song types can be preset in the system based on experimental results. The first energy ratio condition may specifically be a proportional range, or it may be some other condition.
For example, the first energy ratio of accompaniment to voice for a ballad is generally about 1.1, with the accompaniment slightly louder; for a rock song, the first energy ratio of voice to accompaniment is about 10/9, with the accompaniment slightly quieter.
In this step, the preset first energy ratio condition corresponding to the voice signal is looked up; then step 504 is performed.
Step 503: calculate the second energy ratio of voice to accompaniment in the voice signal.
The energy ratio of voice to accompaniment = vocal energy / accompaniment energy; the energy itself can be calculated as in the prior art.
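A minimal sketch of this energy ratio, using mean squared amplitude as the energy measure (one common prior-art choice; the text does not fix a specific one):

```python
def energy_ratio(vocal, accompaniment):
    """Second energy ratio from step 503: vocal energy / accompaniment energy.

    Energy is taken as the mean squared sample amplitude.
    """
    vocal_energy = sum(x * x for x in vocal) / len(vocal)
    accomp_energy = sum(x * x for x in accompaniment) / len(accompaniment)
    return vocal_energy / accomp_energy
```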
Step 504: judge whether the second energy ratio satisfies the first energy ratio condition.
If it does, perform step 505; if not, perform step 506.
Step 505: mix the accompaniment and the voice directly.
Neither signal is changed; the two signals are simply superimposed during mixing.
Step 506: adjust the voice or the accompaniment in the voice signal according to the first energy ratio condition, then perform the mixing.
For example, suppose the preset first energy ratio of accompaniment to voice for a certain song is 0.9, while the actually computed second energy ratio of accompaniment to voice is 1.2. This indicates the accompaniment is somewhat loud, so the voice needs to be boosted: adjusted vocal energy = accompaniment energy / 0.9. The voice and the accompaniment in the adjusted voice signal are then mixed.
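Steps 504-506 can be sketched as follows, assuming for illustration that the first energy ratio condition is a single target value for the vocal-to-accompaniment energy ratio (function and parameter names are hypothetical; note that energy scales with the square of an amplitude gain):

```python
import math

def adjust_and_mix(vocal, accompaniment, target_ratio):
    """Steps 504-506 as a sketch: if the measured vocal/accompaniment
    energy ratio misses the preset target, rescale the vocal so that
    vocal_energy = accompaniment_energy * target_ratio, then superimpose."""
    accomp_energy = sum(x * x for x in accompaniment) / len(accompaniment)
    vocal_energy = sum(x * x for x in vocal) / len(vocal)
    measured = vocal_energy / accomp_energy
    if abs(measured - target_ratio) > 1e-9:
        gain = math.sqrt(target_ratio / measured)   # amplitude gain for the energy target
        vocal = [gain * v for v in vocal]
    return [v + a for v, a in zip(vocal, accompaniment)]
```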
In this embodiment, mixing the sound-optimized voice signal in a song-type-aware way yields a beautified song and enhances the entertainment value of karaoke.
The above describes the method embodiments of the present invention; the apparatus implementing the method is introduced below.
Fig. 6 is a structural diagram of a sound effect processing apparatus according to an embodiment of the present invention.
The apparatus may comprise:
Signal acquisition unit 601, configured to collect a voice signal.
Type determination unit 602, configured to determine the sound type corresponding to the voice signal.
Sound effect processing unit 603, configured to select the sound effect processing method corresponding to the sound type and perform sound effect processing on the voice signal.
Through these units the apparatus distinguishes the collected voice signals and applies different processing methods to different sound types, refining the sound effect processing and thereby obtaining a better audio effect.
In one embodiment, as shown in Fig. 7, the type determination unit 602 may specifically comprise:
Framing subunit 701, configured to frame the voice signal.
Detection subunit 702, configured to perform endpoint detection on each frame of the voice signal to obtain the voiced segments and noise segments of the voice signal.
First computation subunit 703, configured to calculate the signal-to-noise ratio of the voiced segments of the voice signal.
First determination subunit 704, configured to determine the sound type corresponding to the voice signal according to the signal-to-noise ratio.
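The computation performed by the first computation subunit 703 can be sketched as follows, assuming endpoint detection has already separated the voiced and noise samples (the power-based dB formula is the standard definition; the text does not prescribe a particular estimator):

```python
import math

def segment_snr(voiced, noise):
    """SNR in dB between the voiced segments and the noise segments
    returned by endpoint detection.  Endpoint detection itself is a
    separate step and is not shown here."""
    signal_power = sum(x * x for x in voiced) / len(voiced)
    noise_power = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(signal_power / noise_power)
```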
In another embodiment, as shown in Fig. 8, the type determination unit 602 may specifically comprise:
Extraction subunit 801, configured to extract the acoustic features of the voice signal.
Second computation subunit 802, configured to calculate the likelihood values of the acoustic features against each sound type model obtained in advance.
Second determination subunit 803, configured to take the sound type of the model with the maximum likelihood value as the sound type corresponding to the voice signal.
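The selection performed by the second determination subunit 803 reduces to an arg-max over the per-model likelihood values; a minimal sketch with a hypothetical dict-of-scores interface:

```python
def pick_sound_type(model_likelihoods):
    """Return the sound type whose model scored the highest likelihood
    for the extracted acoustic features.

    model_likelihoods: dict mapping sound-type name -> (log-)likelihood.
    """
    return max(model_likelihoods, key=model_likelihoods.get)
```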
When the type determination unit 602 uses the statistical-model analysis approach to determine the sound type corresponding to the voice signal, as shown in Fig. 9, the apparatus may further comprise, in addition to the signal acquisition unit 601, the type determination unit 602, and the sound effect processing unit 603:
Model acquisition unit 901, configured to obtain each sound type model before the second computation subunit 802 calculates the likelihood values.
Data acquisition unit 902, configured to obtain the signal-to-noise ratio of the voice signal before the sound effect processing unit 603 performs sound effect processing.
Sound effect processing unit 603, specifically configured to select a sound effect processing method according to the signal-to-noise ratio of the voice signal and the sound type of the voice signal, and perform sound effect processing on the voice signal.
As shown in Fig. 10, the model acquisition unit 901 may further comprise:
Collection subunit 1001, configured to collect multiple sets of training data, the training data comprising standard voice signals corresponding to the different types of voice signal.
Feature extraction subunit 1002, configured to extract the acoustic features of the training data.
Model training subunit 1003, configured to perform model training according to the acoustic features of the training data to obtain the sound type models corresponding to the different types of voice signal.
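A minimal sketch of the per-type training and scoring this unit implies, fitting one diagonal Gaussian per sound type. This is an illustrative stand-in: the patent does not specify the model family, and a real system would likely use Gaussian mixture models or similar; only the training-loop structure is shown.

```python
import math

def train_type_models(training_data):
    """Fit one diagonal Gaussian (mean, variance per dimension) per
    sound type over its acoustic feature vectors.

    training_data: dict mapping sound-type name -> list of feature vectors.
    """
    models = {}
    for sound_type, feats in training_data.items():
        n = len(feats)
        dims = len(feats[0])
        mean = [sum(f[d] for f in feats) / n for d in range(dims)]
        # small floor keeps the variance strictly positive
        var = [sum((f[d] - mean[d]) ** 2 for f in feats) / n + 1e-6
               for d in range(dims)]
        models[sound_type] = (mean, var)
    return models

def log_likelihood(model, feat):
    """Gaussian log-likelihood of one feature vector under a type model."""
    mean, var = model
    return -0.5 * sum(math.log(2 * math.pi * v) + (x - m) ** 2 / v
                      for x, m, v in zip(feat, mean, var))
```

At classification time, the feature vector is scored against every trained model and the highest-scoring sound type wins, matching the maximum-likelihood selection described above.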
In another embodiment, the sound effect processing unit 603 is specifically configured to: when the signal-to-noise ratio of the voice signal is less than the first SNR threshold and the sound type is the pure voice sound type, reduce the low-frequency part of the voice signal and boost the mid-low and mid-high frequency parts; when the signal-to-noise ratio of the voice signal is greater than or equal to the first SNR threshold and the sound type is the pure voice sound type, reduce the difference between the maximum level and the average level of the voice track of the voice signal, and boost the high-frequency part of the voice signal; when the signal-to-noise ratio of the voice signal is less than the first SNR threshold and the sound type is the sound type with accompaniment, boost the mid-high and high-frequency parts of the voice signal and shorten the reverberation time; and when the signal-to-noise ratio of the voice signal is greater than or equal to the first SNR threshold and the sound type is the sound type with accompaniment, boost the high-frequency part of the voice signal to compensate it, and remove the low-frequency part.
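The four SNR-by-sound-type branches above form a small decision table; the sketch below only names the branch to apply. The 50 dB default is the example threshold value from the text, and the type labels and returned strings are illustrative.

```python
def select_processing(snr_db, sound_type, snr_threshold=50.0):
    """Decision table implemented by the sound effect processing unit:
    choose a processing recipe from the signal's SNR and sound type."""
    if sound_type == "pure_voice":
        if snr_db < snr_threshold:
            return "cut low; boost mid-low and mid-high"
        return "narrow voice-track level dynamics; boost high"
    # sound type with accompaniment
    if snr_db < snr_threshold:
        return "boost mid-high and high; shorten reverberation"
    return "boost and compensate high (6-8 kHz); remove low (< 80 Hz)"
```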
In another embodiment, as shown in Fig. 11, the apparatus may further comprise, in addition to the signal acquisition unit 601, the type determination unit 602, and the sound effect processing unit 603:
Mixing processing unit 1101, configured to perform mixing processing on the voice signal after sound effect processing.
As shown in Fig. 12, the mixing processing unit 1101 may specifically comprise:
Third determination subunit 1201, configured to determine the song type to which the voice signal belongs.
Lookup subunit 1202, configured to look up the first energy ratio condition of voice to accompaniment corresponding to the song type of the voice signal.
Third computation subunit 1203, configured to calculate the second energy ratio of voice to accompaniment in the voice signal.
Adjustment subunit 1204, configured to, when the second energy ratio does not satisfy the first energy ratio condition, adjust the voice or the accompaniment in the voice signal according to the first energy ratio condition.
Mixing subunit 1205, configured to mix the voice and the accompaniment in the adjusted voice signal.
Through these units, this embodiment mixes the sound-optimized voice signal in a song-type-aware way to obtain a beautified song, enhancing the entertainment value of karaoke.
For the specific implementation of each unit in the above apparatus, refer to the description of the preceding method embodiments; it is not repeated here.
For convenience of description, the above apparatus has been described in terms of units divided by function. Of course, when implementing the present application, the functions of the units may be realized in one or more pieces of software and/or hardware.
From the description of the embodiments above, those skilled in the art can clearly understand that the present application can be implemented by software plus the necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present application, or the part of it that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, magnetic disk, or optical disc, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present application or in certain parts thereof.
The embodiments in this specification are described in a progressive manner; for identical or similar parts the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively brief, and the relevant parts refer to the description of the method embodiments. The system embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units, i.e. they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The present application can be used in numerous general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments including any of the above systems or devices.
The present application can be described in the general context of computer-executable instructions, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The present application can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media, including memory storage devices.
The above are only embodiments of the present application. It should be pointed out that those skilled in the art can make improvements and modifications without departing from the principles of the present application, and such improvements and modifications should also be regarded as falling within the scope of protection of the present application.

Claims (16)

1. A sound effect processing method, characterized by comprising:
collecting a voice signal;
determining the sound type corresponding to the voice signal;
selecting the sound effect processing method corresponding to the sound type to perform sound effect processing on the voice signal.
2. The method according to claim 1, characterized in that determining the sound type corresponding to the voice signal comprises:
framing the voice signal;
performing endpoint detection on each frame of the voice signal to obtain the voiced segments and noise segments of the voice signal;
calculating the signal-to-noise ratio of the voiced segments of the voice signal;
determining the sound type corresponding to the voice signal according to the signal-to-noise ratio.
3. The method according to claim 1, characterized in that determining the sound type corresponding to the voice signal comprises:
extracting the acoustic features of the voice signal;
calculating the likelihood values of the acoustic features against each sound type model obtained in advance;
taking the sound type of the model with the maximum likelihood value as the sound type corresponding to the voice signal.
4. The method according to claim 3, characterized in that, before calculating the likelihood values of the acoustic features against each sound type model obtained in advance, the method further comprises:
obtaining each sound type model;
wherein obtaining each sound type model comprises:
collecting multiple sets of training data, the training data comprising standard voice signals corresponding to the different types of voice signal;
extracting the acoustic features of the training data;
performing model training according to the acoustic features of the training data to obtain the sound type models corresponding to the different types of voice signal.
5. The method according to claim 3, characterized in that, before selecting the sound effect processing method corresponding to the sound type to perform sound effect processing on the voice signal, the method further comprises:
obtaining the signal-to-noise ratio of the voice signal;
wherein selecting the sound effect processing method corresponding to the sound type to perform sound effect processing on the voice signal comprises:
selecting a sound effect processing method according to the signal-to-noise ratio of the voice signal and the sound type of the voice signal to perform sound effect processing on the voice signal.
6. The method according to claim 2 or 5, characterized in that selecting the sound effect processing method corresponding to the sound type to perform sound effect processing on the voice signal comprises:
when the signal-to-noise ratio of the voice signal is less than a first SNR threshold and the sound type is the pure voice sound type, reducing the low-frequency part of the voice signal and boosting the mid-low and mid-high frequency parts;
when the signal-to-noise ratio of the voice signal is greater than or equal to the first SNR threshold and the sound type is the pure voice sound type, reducing the difference between the maximum level and the average level of the voice track of the voice signal, and boosting the high-frequency part of the voice signal;
when the signal-to-noise ratio of the voice signal is less than the first SNR threshold and the sound type is the sound type with accompaniment, boosting the mid-high and high-frequency parts of the voice signal and shortening the reverberation time;
when the signal-to-noise ratio of the voice signal is greater than or equal to the first SNR threshold and the sound type is the sound type with accompaniment, boosting the high-frequency part of the voice signal to compensate it, and removing the low-frequency part.
7. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
performing mixing processing on the voice signal after sound effect processing.
8. The method according to claim 7, characterized in that performing mixing processing on the voice signal after sound effect processing comprises:
determining the song type to which the voice signal belongs;
looking up the first energy ratio condition of voice to accompaniment corresponding to the song type of the voice signal;
calculating the second energy ratio of voice to accompaniment in the voice signal;
when the second energy ratio does not satisfy the first energy ratio condition, adjusting the voice or the accompaniment in the voice signal according to the first energy ratio condition;
mixing the voice and the accompaniment in the adjusted voice signal.
9. A sound effect processing apparatus, characterized by comprising:
a signal acquisition unit, configured to collect a voice signal;
a type determination unit, configured to determine the sound type corresponding to the voice signal;
a sound effect processing unit, configured to select the sound effect processing method corresponding to the sound type to perform sound effect processing on the voice signal.
10. The apparatus according to claim 9, characterized in that the type determination unit comprises:
a framing subunit, configured to frame the voice signal;
a detection subunit, configured to perform endpoint detection on each frame of the voice signal to obtain the voiced segments and noise segments of the voice signal;
a first computation subunit, configured to calculate the signal-to-noise ratio of the voiced segments of the voice signal;
a first determination subunit, configured to determine the sound type corresponding to the voice signal according to the signal-to-noise ratio.
11. The apparatus according to claim 10, characterized in that the type determination unit comprises:
an extraction subunit, configured to extract the acoustic features of the voice signal;
a second computation subunit, configured to calculate the likelihood values of the acoustic features against each sound type model obtained in advance;
a second determination subunit, configured to take the sound type of the model with the maximum likelihood value as the sound type corresponding to the voice signal.
12. The apparatus according to claim 11, characterized by further comprising:
a model acquisition unit, configured to obtain each sound type model before the second computation subunit calculates the likelihood values;
the model acquisition unit comprising:
a collection subunit, configured to collect multiple sets of training data, the training data comprising standard voice signals corresponding to the different types of voice signal;
a feature extraction subunit, configured to extract the acoustic features of the training data;
a model training subunit, configured to perform model training according to the acoustic features of the training data to obtain the sound type models corresponding to the different types of voice signal.
13. The apparatus according to claim 11, characterized in that the apparatus further comprises:
a data acquisition unit, configured to obtain the signal-to-noise ratio of the voice signal before the sound effect processing unit performs sound effect processing;
the sound effect processing unit being specifically configured to select a sound effect processing method according to the signal-to-noise ratio of the voice signal and the sound type of the voice signal to perform sound effect processing on the voice signal.
14. The apparatus according to claim 10 or 13, characterized in that
the sound effect processing unit is specifically configured to: when the signal-to-noise ratio of the voice signal is less than a first SNR threshold and the sound type is the pure voice sound type, reduce the low-frequency part of the voice signal and boost the mid-low and mid-high frequency parts; when the signal-to-noise ratio of the voice signal is greater than or equal to the first SNR threshold and the sound type is the pure voice sound type, reduce the difference between the maximum level and the average level of the voice track of the voice signal and boost the high-frequency part of the voice signal; when the signal-to-noise ratio of the voice signal is less than the first SNR threshold and the sound type is the sound type with accompaniment, boost the mid-high and high-frequency parts of the voice signal and shorten the reverberation time; and when the signal-to-noise ratio of the voice signal is greater than or equal to the first SNR threshold and the sound type is the sound type with accompaniment, boost the high-frequency part of the voice signal to compensate it, and remove the low-frequency part.
15. The apparatus according to any one of claims 9 to 13, characterized in that the apparatus further comprises:
a mixing processing unit, configured to perform mixing processing on the voice signal after sound effect processing.
16. The apparatus according to claim 15, characterized in that the mixing processing unit comprises:
a third determination subunit, configured to determine the song type to which the voice signal belongs;
a lookup subunit, configured to look up the first energy ratio condition of voice to accompaniment corresponding to the song type of the voice signal;
a third computation subunit, configured to calculate the second energy ratio of voice to accompaniment in the voice signal;
an adjustment subunit, configured to, when the second energy ratio does not satisfy the first energy ratio condition, adjust the voice or the accompaniment in the voice signal according to the first energy ratio condition;
a mixing subunit, configured to mix the voice and the accompaniment in the adjusted voice signal.
CN201410472853.5A 2014-09-16 2014-09-16 A kind of sound effect treatment method and device Active CN105405448B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410472853.5A CN105405448B (en) 2014-09-16 2014-09-16 A kind of sound effect treatment method and device


Publications (2)

Publication Number Publication Date
CN105405448A true CN105405448A (en) 2016-03-16
CN105405448B CN105405448B (en) 2019-09-03

Family

ID=55470891

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410472853.5A Active CN105405448B (en) 2014-09-16 2014-09-16 A kind of sound effect treatment method and device

Country Status (1)

Country Link
CN (1) CN105405448B (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106486132A (en) * 2015-08-26 2017-03-08 天津三星电子有限公司 A kind of method adjusting display terminal volume and its display terminal
CN107978318A (en) * 2016-10-21 2018-05-01 咪咕音乐有限公司 A kind of real-time sound mixing method and device
CN109087634A (en) * 2018-10-30 2018-12-25 四川长虹电器股份有限公司 A kind of sound quality setting method based on audio classification
CN109151702A (en) * 2018-09-21 2019-01-04 歌尔科技有限公司 Effect adjusting method, audio frequency apparatus and the readable storage medium storing program for executing of audio frequency apparatus
WO2019037426A1 (en) * 2017-08-23 2019-02-28 武汉斗鱼网络科技有限公司 Mfcc voice recognition method, storage medium, electronic device, and system
CN109830244A (en) * 2019-01-21 2019-05-31 北京小唱科技有限公司 Dynamic reverberation processing method and processing device for audio
CN109828740A (en) * 2019-01-21 2019-05-31 北京小唱科技有限公司 Voice frequency regulating method and device
CN109920397A (en) * 2019-01-31 2019-06-21 李奕君 A kind of physics sound intermediate frequency function manufacturing system and production method
CN110047514A (en) * 2019-05-30 2019-07-23 腾讯音乐娱乐科技(深圳)有限公司 A kind of accompaniment degree of purity appraisal procedure and relevant device
CN110072181A (en) * 2019-03-27 2019-07-30 广州飞达音响股份有限公司 Bass process for increasing sensitivity and device
WO2019169551A1 (en) * 2018-03-06 2019-09-12 深圳市沃特沃德股份有限公司 Voice processing method and device, and electronic apparatus
CN110288983A (en) * 2019-06-26 2019-09-27 上海电机学院 A kind of method of speech processing based on machine learning
CN110677716A (en) * 2019-08-20 2020-01-10 咪咕音乐有限公司 Audio processing method, electronic device, and storage medium
CN111048108A (en) * 2018-10-12 2020-04-21 北京微播视界科技有限公司 Audio processing method and device
CN111385688A (en) * 2018-12-29 2020-07-07 安克创新科技股份有限公司 Active noise reduction method, device and system based on deep learning
CN112489664A (en) * 2020-11-30 2021-03-12 广州趣丸网络科技有限公司 Sound mixing method and device
CN112669811A (en) * 2020-12-23 2021-04-16 腾讯音乐娱乐科技(深圳)有限公司 Song processing method and device, electronic equipment and readable storage medium
CN112992167A (en) * 2021-02-08 2021-06-18 歌尔科技有限公司 Audio signal processing method and device and electronic equipment
CN114566191A (en) * 2022-02-25 2022-05-31 腾讯音乐娱乐科技(深圳)有限公司 Sound correcting method for recording and related device
WO2023044608A1 (en) * 2021-09-22 2023-03-30 京东方科技集团股份有限公司 Audio adjustment method, apparatus and device, and storage medium
CN117198321A (en) * 2023-11-08 2023-12-08 方图智能(深圳)科技集团股份有限公司 Composite audio real-time transmission method and system based on deep learning

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1457216A (en) * 2002-05-10 2003-11-19 日本先锋公司 Digital set style echo effect decoder
CN1491018A (en) * 2002-10-14 2004-04-21 中国科学院声学研究所 Echo cancellating and phonetic testing method and apparatus for dialogue interactive front end
US20070233494A1 (en) * 2006-03-28 2007-10-04 International Business Machines Corporation Method and system for generating sound effects interactively
CN103559876A (en) * 2013-11-07 2014-02-05 安徽科大讯飞信息科技股份有限公司 Sound effect processing method and sound effect processing system
CN103744666A (en) * 2013-12-23 2014-04-23 乐视致新电子科技(天津)有限公司 Method and device for adjusting audio frequencies in Android device
CN103927146A (en) * 2014-04-30 2014-07-16 深圳市中兴移动通信有限公司 Sound effect self-adapting method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yang Yongfeng et al.: "Application of Empirical Mode Decomposition in Vibration Analysis", 30 November 2013, National Defense Industry Press *


Also Published As

Publication number Publication date
CN105405448B (en) 2019-09-03

Similar Documents

Publication Publication Date Title
CN105405448A (en) Sound effect processing method and apparatus
CN104538011B (en) A kind of tone adjusting method, device and terminal device
CN109087669B (en) Audio similarity detection method and device, storage medium and computer equipment
CN101236742B (en) Music/ non-music real-time detection method and device
CN103440862B (en) A kind of method of voice and music synthesis, device and equipment
CN110019931B (en) Audio classification method and device, intelligent equipment and storage medium
CN103559876B (en) Sound effect treatment method and system
US7133826B2 (en) Method and apparatus using spectral addition for speaker recognition
CN105788592A (en) Audio classification method and apparatus thereof
CN105405439A (en) Voice playing method and device
CN103377651B (en) The automatic synthesizer of voice and method
Dressler Pitch estimation by the pair-wise evaluation of spectral peaks
CN101023469A (en) Digital filtering method, digital filtering equipment
CN103050126A (en) Audio signal processing apparatus, audio signal processing method and a program
CN101627427A (en) Voice emphasis device and voice emphasis method
US20210335364A1 (en) Computer program, server, terminal, and speech signal processing method
US9646592B2 (en) Audio signal analysis
CN106302987A (en) A kind of audio frequency recommends method and apparatus
CN110728991B (en) Improved recording equipment identification algorithm
CN104992712A (en) Music reorganization-based music score automatic formation method
CN104707331B (en) A kind of game body-sensing production method and device
CN103903625A (en) Audio sound mixing method and device
CN112712816A (en) Training method and device of voice processing model and voice processing method and device
CN104364845A (en) Processing apparatus, processing method, program, computer readable information recording medium and processing system
CN105895079A (en) Voice data processing method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant