CN105989835B

CN105989835B - Speech recognition apparatus and speech recognition method

Info

Publication number: CN105989835B
Application number: CN201510060494.7A
Authority: CN
Inventors: 杜博仁; 张嘉仁; 曾凯盟
Original assignee: Acer Inc
Current assignee: Acer Inc
Priority date: 2015-02-05
Filing date: 2015-02-05
Publication date: 2019-08-13
Anticipated expiration: 2035-02-05
Also published as: CN105989835A

Abstract

The present invention provides a speech recognition apparatus and a speech recognition method. According to at least one of the ratio of the low-pass sampled signal energy to the original speech sampled signal energy and the second consonant band signal energy ratio, the invention determines whether the original speech sampled signal of the corresponding target speech frame is a consonant signal. The invention can improve the recognition accuracy of consonant signals.

Description

Speech recognition apparatus and speech recognition method

Technical field

The invention relates to a recognition apparatus, and in particular to a speech recognition apparatus and a speech recognition method.

Background art

For hearing-impaired people, higher-frequency speech signals, such as consonant signals, often cannot be received clearly, whereas low-frequency speech signals can be heard clearly. Existing consonant signal detection performs signal processing in the frequency domain and falls into two main types: non-real-time consonant signal detection and real-time consonant signal detection. Non-real-time detection relies mainly on energy and zero-crossing rate. Real-time detection determines whether a speech signal is a consonant signal mainly according to whether the ratio of the high-frequency signal energy to the total energy is greater than a fixed value and whether the ratio of the low-frequency signal energy to the total energy is less than a fixed value. Although existing approaches can distinguish consonant signals from noise, their accuracy still falls short of practical needs.

Summary of the invention

The present invention provides a speech recognition apparatus and a speech recognition method that can improve the recognition accuracy of consonant signals.

The speech recognition apparatus of the invention includes a filter unit and a processing unit. The filter unit performs low-pass filtering, band-pass filtering for a first consonant band, and band-pass filtering for a second consonant band on a speech signal, to generate a low-pass filtered signal, a first band-pass filtered signal, and a second band-pass filtered signal, respectively. The processing unit is coupled to the filter unit and divides the speech signal, the low-pass filtered signal, the first band-pass filtered signal, and the second band-pass filtered signal into a plurality of speech frames, where each speech frame includes N sampled signals and N is a positive integer. The processing unit calculates the energy of the sampled signals in a target speech frame to obtain an original speech sampled signal energy, a low-pass sampled signal energy, a first consonant band signal energy, and a second consonant band signal energy. It then calculates a second consonant band signal energy ratio from the second consonant band signal energy, the original speech sampled signal energy, and the low-pass sampled signal energy, and determines whether the original speech sampled signal of the target speech frame is a consonant signal according to at least one of the ratio of the low-pass sampled signal energy to the original speech sampled signal energy and the second consonant band signal energy ratio.

In an embodiment of the invention, the processing unit further determines whether the original speech sampled signal of the target speech frame is noise according to the ratio of the first consonant band signal energy to the second consonant band signal energy, the ratio of the first consonant band signal energy to the original speech sampled signal energy, and the ratio of the second consonant band signal energy to the original speech sampled signal energy.

In an embodiment of the invention, the processing unit further determines whether the ratio of the first consonant band signal energy to the second consonant band signal energy, the ratio of the first consonant band signal energy to the original speech sampled signal energy, and the ratio of the second consonant band signal energy to the original speech sampled signal energy each fall within a corresponding preset ratio range. If all three ratios fall within their corresponding preset ratio ranges, the original speech sampled signal of the target speech frame is a noise signal.

In an embodiment of the invention, the processing unit further calculates an energy difference by subtracting the low-pass sampled signal energy from the original speech sampled signal energy, and calculates the ratio of the second consonant band signal energy to the energy difference to obtain the second consonant band signal energy ratio.

In an embodiment of the invention, the processing unit further determines whether the original speech sampled signal of the target speech frame is a consonant signal according to whether the ratio of the low-pass sampled signal energy to the original speech sampled signal energy is less than a first preset ratio, and according to whether that ratio lies within a preset energy ratio range while the second consonant band signal energy ratio is greater than a second preset ratio.

In an embodiment of the invention, if the ratio of the low-pass sampled signal energy to the original speech sampled signal energy is less than the first preset ratio, or if that ratio lies within the preset energy ratio range and the second consonant band signal energy ratio is greater than the second preset ratio, the processing unit further calculates an energy weighted average of a plurality of original speech sampled signals previously determined to be noise signals, to obtain a noise signal energy weighted average. The processing unit then determines whether the original speech sampled signal of the target speech frame is a consonant signal according to whether the original speech sampled signal energy of the target speech frame is greater than the noise signal energy weighted average.

In an embodiment of the invention, the weight assigned to each speech frame whose original speech sampled signal was determined to be a noise signal varies with the interval between that speech frame and the target speech frame.

In an embodiment of the invention, the processing unit further calculates the average, over the target speech frame and a plurality of speech frames preceding it, of the ratio of the low-pass sampled signal energy to the original speech sampled signal energy, to obtain a low-pass sampled signal energy ratio average. The processing unit then determines whether the original speech sampled signal of the target speech frame is a consonant signal according to whether the low-pass sampled signal energy ratio average is less than a preset average.

In an embodiment of the invention, the processing unit further calculates a weighted average of the sums of the first consonant band signal energy and the second consonant band signal energy of a plurality of speech frames whose original speech sampled signals were previously determined to be noise signals, to obtain a consonant band energy sum weighted average. The processing unit then determines whether the original speech sampled signal of the target speech frame is a consonant signal according to whether the difference obtained by subtracting the low-pass sampled signal energy from the original speech sampled signal energy of the target speech frame is greater than the consonant band energy sum weighted average.

In an embodiment of the invention, the weight assigned to the sum of the first consonant band signal energy and the second consonant band signal energy of each speech frame whose original speech sampled signal was determined to be a noise signal varies with the interval between that speech frame and the target speech frame.

In an embodiment of the invention, the processing unit further determines whether the original speech sampled signal of the target speech frame is a consonant signal according to whether the original speech sampled signal energy is greater than or equal to a lower limit.

In an embodiment of the invention, the processing unit further calculates a first zero-crossing rate, a second zero-crossing rate, and a third zero-crossing rate of the original speech sampled signal, and calculates the average zero-crossing rates of the original speech sampled signals over the target speech frame and a plurality of speech frames preceding it, to obtain a first average zero-crossing rate, a second average zero-crossing rate, and a third average zero-crossing rate. The processing unit determines whether the original speech sampled signal of the target speech frame is a consonant signal according to whether the first, second, and third average zero-crossing rates are each greater than or equal to a corresponding preset average zero-crossing rate. The first, second, and third zero-crossing rates are the numbers of times the original speech sampled signal in the target speech frame passes through a first preset value, a second preset value, and a third preset value, respectively, where the second preset value is less than the first preset value and greater than the third preset value.

In an embodiment of the invention, the processing unit further determines whether the original speech sampled signal of the target speech frame is a consonant signal according to whether the second zero-crossing rate is greater than or equal to a preset zero-crossing rate.

The speech recognition method of the invention includes the following steps: performing low-pass filtering, band-pass filtering for a first consonant band, and band-pass filtering for a second consonant band on a speech signal, to generate a low-pass filtered signal, a first band-pass filtered signal, and a second band-pass filtered signal, respectively; dividing the speech signal, the low-pass filtered signal, the first band-pass filtered signal, and the second band-pass filtered signal into a plurality of speech frames, where each speech frame includes N sampled signals and N is a positive integer; calculating the energy of the sampled signals in a target speech frame to obtain an original speech sampled signal energy, a low-pass sampled signal energy, a first consonant band signal energy, and a second consonant band signal energy; calculating a second consonant band signal energy ratio from the second consonant band signal energy, the original speech sampled signal energy, and the low-pass sampled signal energy; and determining whether the original speech sampled signal of the target speech frame is a consonant signal according to at least one of the ratio of the low-pass sampled signal energy to the original speech sampled signal energy and the second consonant band signal energy ratio.

In an embodiment of the invention, the speech recognition method further includes determining whether the original speech sampled signal of the target speech frame is noise according to the ratio of the first consonant band signal energy to the second consonant band signal energy, the ratio of the first consonant band signal energy to the original speech sampled signal energy, and the ratio of the second consonant band signal energy to the original speech sampled signal energy.

In an embodiment of the invention, the speech recognition method further includes the following steps: determining whether the ratio of the first consonant band signal energy to the second consonant band signal energy, the ratio of the first consonant band signal energy to the original speech sampled signal energy, and the ratio of the second consonant band signal energy to the original speech sampled signal energy each fall within a corresponding preset ratio range; and, if all three ratios fall within their corresponding preset ratio ranges, determining that the original speech sampled signal of the target speech frame is a noise signal.

In an embodiment of the invention, the speech recognition method further includes the following steps: calculating an energy difference by subtracting the low-pass sampled signal energy from the original speech sampled signal energy; and calculating the ratio of the second consonant band signal energy to the energy difference to obtain the second consonant band signal energy ratio.

In an embodiment of the invention, the speech recognition method further includes determining whether the original speech sampled signal of the target speech frame is a consonant signal according to whether the ratio of the low-pass sampled signal energy to the original speech sampled signal energy is less than a first preset ratio, and according to whether that ratio lies within a preset energy ratio range while the second consonant band signal energy ratio is greater than a second preset ratio.

In an embodiment of the invention, if the ratio of the low-pass sampled signal energy to the original speech sampled signal energy is less than the first preset ratio, or if that ratio lies within the preset energy ratio range and the second consonant band signal energy ratio is greater than the second preset ratio, the speech recognition method further includes the following steps: calculating an energy weighted average of a plurality of original speech sampled signals previously determined to be noise signals, to obtain a noise signal energy weighted average; and determining whether the original speech sampled signal of the target speech frame is a consonant signal according to whether the original speech sampled signal energy of the target speech frame is greater than the noise signal energy weighted average.

In an embodiment of the invention, the weight assigned to each speech frame whose original speech sampled signal was determined to be a noise signal varies with the interval between that speech frame and the target speech frame.

In an embodiment of the invention, the speech recognition method further includes the following steps: calculating the average, over the target speech frame and a plurality of speech frames preceding it, of the ratio of the low-pass sampled signal energy to the original speech sampled signal energy, to obtain a low-pass sampled signal energy ratio average; and determining whether the original speech sampled signal of the target speech frame is a consonant signal according to whether the low-pass sampled signal energy ratio average is less than a preset average.

In an embodiment of the invention, the speech recognition method further includes the following steps: calculating a weighted average of the sums of the first consonant band signal energy and the second consonant band signal energy of a plurality of speech frames whose original speech sampled signals were previously determined to be noise signals, to obtain a consonant band energy sum weighted average; and determining whether the original speech sampled signal of the target speech frame is a consonant signal according to whether the difference obtained by subtracting the low-pass sampled signal energy from the original speech sampled signal energy of the target speech frame is greater than the consonant band energy sum weighted average.

In an embodiment of the invention, the speech recognition method further includes determining whether the original speech sampled signal of the target speech frame is a consonant signal according to whether the original speech sampled signal energy is greater than or equal to a lower limit.

In an embodiment of the invention, the speech recognition method further includes the following steps. A first zero-crossing rate, a second zero-crossing rate, and a third zero-crossing rate of the original speech sampled signal are calculated, and the average zero-crossing rates of the original speech sampled signals over the target speech frame and a plurality of speech frames preceding it are calculated, to obtain a first average zero-crossing rate, a second average zero-crossing rate, and a third average zero-crossing rate. The first, second, and third zero-crossing rates are the numbers of times the original speech sampled signal in the target speech frame passes through a first preset value, a second preset value, and a third preset value, respectively, where the second preset value is less than the first preset value and greater than the third preset value. Whether the original speech sampled signal of the target speech frame is a consonant signal is then determined according to whether the first, second, and third average zero-crossing rates are each greater than or equal to a corresponding preset average zero-crossing rate.

In an embodiment of the invention, the speech recognition method further includes determining whether the original speech sampled signal of the target speech frame is a consonant signal according to whether the second zero-crossing rate is greater than or equal to a preset zero-crossing rate.

Based on the above, embodiments of the invention determine whether the original speech sampled signal of the target speech frame is a consonant signal according to at least one of the ratio of the low-pass sampled signal energy to the original speech sampled signal energy and the second consonant band signal energy ratio. This reduces the cases in which an original speech sampled signal is misjudged as a consonant signal, and thereby improves the recognition accuracy of consonant signals.

To make the foregoing features and advantages of the invention clearer and easier to understand, embodiments are described in detail below with reference to the accompanying drawings.

Brief description of the drawings

Fig. 1 is a schematic diagram of a speech recognition apparatus according to an embodiment of the invention;

Figs. 2A to 2C are flowcharts of a speech recognition method according to an embodiment of the invention.

Description of reference numerals:

102: filter unit;

104: processing unit;

S1: speech signal;

S2: first band-pass filtered signal;

S3: second band-pass filtered signal;

S4: low-pass filtered signal;

S202~S238: steps.

Detailed description of the embodiments

Fig. 1 is a schematic diagram of a speech recognition apparatus according to an embodiment of the invention. Referring to Fig. 1, the speech recognition apparatus includes a filter unit 102 and a processing unit 104, and the filter unit 102 is coupled to the processing unit 104. The filter unit 102 may perform low-pass filtering, band-pass filtering for a first consonant band, and band-pass filtering for a second consonant band on a speech signal S1, to generate a low-pass filtered signal S4, a first band-pass filtered signal S2, and a second band-pass filtered signal S3, respectively. The filter unit 102 may include, for example, a low-pass filter and band-pass filters, and the processing unit 104 may be implemented, for example, with a central processing unit. In this embodiment, the low-pass filtering passes 0 to 2 kHz, and the first consonant band and the second consonant band are 2 kHz to 4 kHz and 4 kHz to 10 kHz, respectively, but the invention is not limited thereto.
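The band split performed by the filter unit 102 can be illustrated with a minimal sketch. This is not the filter design of the embodiment: it swaps in a brick-wall mask on a naive discrete Fourier transform, chosen only because it needs no external library. The function name `band_split`, the `BANDS` table, and the 20 kHz sampling rate in the usage note are assumptions made for illustration; only the band edges come from the text.

```python
import cmath
import math

# Band edges in Hz from the embodiment: S4 (low-pass), S2 and S3 (consonant bands).
BANDS = {"S4": (0.0, 2000.0), "S2": (2000.0, 4000.0), "S3": (4000.0, 10000.0)}

def band_split(x, fs, bands=BANDS):
    """Split x into band-limited signals with a brick-wall DFT mask.

    Illustrative stand-in for the low-pass and band-pass filters;
    O(n^2), so only suitable for short demo signals.
    """
    n = len(x)
    # Forward DFT of the whole signal.
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    outputs = {}
    for name, (lo, hi) in bands.items():
        # Keep bins whose (folded) frequency lies in [lo, hi); zero the rest.
        masked = [
            spectrum[k] if lo <= min(k, n - k) * fs / n < hi else 0j
            for k in range(n)
        ]
        # Inverse DFT; the mask is symmetric, so the result is real.
        outputs[name] = [
            sum(masked[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)
        ]
    return outputs
```

With an assumed sampling rate of 20 kHz, a 1 kHz tone passed to `band_split` ends up almost entirely in `S4`, while `S2` and `S3` stay near zero. A practical implementation would more likely use causal IIR or FIR filters so that frames can be processed as they arrive.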

The processing unit 104 may sample the speech signal S1, the low-pass filtered signal S4, the first band-pass filtered signal S2, and the second band-pass filtered signal S3, and divide them into a plurality of speech frames, where each speech frame may include N sampled signals of the speech signal S1, N sampled signals of the low-pass filtered signal S4, N sampled signals of the first band-pass filtered signal S2, and N sampled signals of the second band-pass filtered signal S3. The processing unit 104 may also calculate the energy of the sampled signals in each speech frame to obtain the original speech sampled signal energy, the low-pass sampled signal energy, the first consonant band signal energy, and the second consonant band signal energy, which correspond respectively to the energies of the sampled signals of the speech signal S1, the low-pass filtered signal S4, the first band-pass filtered signal S2, and the second band-pass filtered signal S3 in the speech frame. After obtaining these four energies, the processing unit 104 may determine whether the original speech sampled signal of each speech frame is noise according to the ratio of the first consonant band signal energy to the second consonant band signal energy, the ratio of the first consonant band signal energy to the original speech sampled signal energy, and the ratio of the second consonant band signal energy to the original speech sampled signal energy.
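The framing and per-frame energy calculation can be sketched as follows. The text does not define the energy measure or say whether frames overlap, so this sketch assumes non-overlapping frames and energy as the sum of squared samples; `split_frames` and `frame_energy` are hypothetical helper names.

```python
def split_frames(samples, n):
    """Split samples into consecutive frames of n samples each.

    Assumption: frames do not overlap, and a trailing partial frame is dropped.
    """
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]

def frame_energy(frame):
    """Energy of one frame, assumed here to be the sum of squared samples."""
    return sum(v * v for v in frame)

def frame_energies(s1, s4, s2, s3, n):
    """Per-frame energies (E, EL, EB1, EB2) for signals S1, S4, S2, S3."""
    return [
        tuple(frame_energy(f) for f in group)
        for group in zip(*(split_frames(s, n) for s in (s1, s4, s2, s3)))
    ]
```

Each tuple returned by `frame_energies` holds, for one speech frame, the original speech sampled signal energy, the low-pass sampled signal energy, and the first and second consonant band signal energies, in that order.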

Specifically, the processing unit 104 may determine whether the ratio of the first consonant band signal energy to the second consonant band signal energy, the ratio of the first consonant band signal energy to the original speech sampled signal energy, and the ratio of the second consonant band signal energy to the original speech sampled signal energy each fall within a corresponding preset ratio range. If all three ratios fall within their corresponding preset ratio ranges, the original speech sampled signal of the target speech frame is a noise signal.

For example, the processing unit 104 may determine whether the original speech sampled signal of a target speech frame (for example, the m-th speech frame, where m is a positive integer) is noise by using the following formulas:

where EB1_m is the first consonant band signal energy, EB2_m is the second consonant band signal energy, and E_m is the original speech sampled signal energy. When formulas (1), (2), and (3) are all satisfied, the processing unit 104 determines that the original speech sampled signal of the m-th speech frame is a noise signal.
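The three-ratio noise test can be sketched as below. The numeric bounds of formulas (1) to (3) are not reproduced in this text, so the ranges are passed in as parameters and the values in the usage note are placeholders, not the embodiment's thresholds. The names `is_noise` and `in_range`, and the choice of inclusive bounds, are assumptions.

```python
def in_range(value, bounds):
    """True if value lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= value <= high

def is_noise(e, eb1, eb2, r12_range, r1_range, r2_range):
    """Noise decision for one frame from the three energy ratios.

    e: original speech sampled signal energy E_m
    eb1, eb2: first and second consonant band signal energies EB1_m, EB2_m
    r12_range, r1_range, r2_range: preset ranges for EB1/EB2, EB1/E, EB2/E
    """
    if e <= 0 or eb2 <= 0:
        # Ratios are undefined; this sketch treats the frame as "not noise".
        return False
    return (
        in_range(eb1 / eb2, r12_range)
        and in_range(eb1 / e, r1_range)
        and in_range(eb2 / e, r2_range)
    )
```

For instance, with placeholder ranges `(0.5, 2.0)`, `(0.1, 0.3)`, and `(0.1, 0.3)`, a frame with `e=100`, `eb1=20`, `eb2=20` is flagged as noise because all three ratios fall inside their ranges, whereas `eb1=60` fails the first test.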

After the noise determination for the target speech frame, the processing unit 104 may further calculate the energy weighted average of a plurality of speech frames, preceding the target speech frame, whose original speech sampled signals were determined to be noise signals, to obtain a noise signal energy weighted average. The processing unit 104 then determines whether the original speech sampled signal of the target speech frame is a consonant signal according to whether the original speech sampled signal energy of the target speech frame is greater than the noise signal energy weighted average.

For example, the noise signal energy weighted average may be obtained by calculating the energy weighted average of the 3 speech frames, preceding the target speech frame, whose original speech sampled signals were determined to be noise signals. Assuming that the three speech frames most recently determined to be noise before the m-th speech frame are the (m-10)-th, (m-12)-th, and (m-20)-th speech frames, the noise signal energy weighted average AK_m for the m-th speech frame can be expressed by the following formula:

where E_{m-10}, E_{m-12}, and E_{m-20} are the original speech sampled signal energies of the (m-10)-th, (m-12)-th, and (m-20)-th speech frames, respectively, and a0, a1, and a2 are the weights corresponding to the (m-10)-th, (m-12)-th, and (m-20)-th speech frames, respectively. The weights a0, a1, and a2 may be fixed or variable. For example, the weight of each speech frame whose original speech sampled signal was determined to be a noise signal may vary with the interval between that speech frame and the target speech frame; in this embodiment, the weights a0, a1, and a2 may vary with the interval between the corresponding speech frame and the m-th speech frame. When the noise signal energy weighted average AK_m satisfies the following formula, the original speech sampled signal of the m-th speech frame can be determined to be a consonant signal:

E_m > AK_m (5)
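The comparison in formula (5) can be sketched as follows. The exact form of the weighted average is not reproduced in this text, so dividing by the sum of the weights is an assumption, as are the names `weighted_average` and `exceeds_noise_floor`. If the weights already sum to 1, the normalization has no effect.

```python
def weighted_average(values, weights):
    """Weighted average, normalized by the sum of the weights (assumed form)."""
    total = sum(weights)
    if len(values) != len(weights) or total == 0:
        raise ValueError("need one weight per value and a non-zero weight sum")
    return sum(v * w for v, w in zip(values, weights)) / total

def exceeds_noise_floor(e_m, noise_energies, weights):
    """Formula (5): E_m > AK_m, where AK_m averages recent noise-frame energies."""
    ak_m = weighted_average(noise_energies, weights)
    return e_m > ak_m
```

Here `noise_energies` would hold E_{m-10}, E_{m-12}, and E_{m-20} from the example, and `weights` would hold a0, a1, and a2. Giving a larger weight to the most recent noise frame is one way to make the weights depend on the interval to the target frame, though the text does not fix a particular rule.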

In addition, the processing unit 104 may calculate a weighted average of the sums of the first consonant band signal energy and the second consonant band signal energy of a plurality of speech frames whose original speech sampled signals were previously determined to be noise signals, to obtain a consonant band energy sum weighted average, and determine whether the original speech sampled signal of the target speech frame is a consonant signal according to whether the difference obtained by subtracting the low-pass sampled signal energy from the original speech sampled signal energy of the target speech frame is greater than the consonant band energy sum weighted average. For example, the consonant band energy sum weighted average may be obtained by calculating the weighted average of the sums of the first consonant band signal energy and the second consonant band signal energy of the 3 speech frames, preceding the target speech frame, whose original speech sampled signals were determined to be noise signals. Assuming that the three speech frames most recently determined to be noise before the m-th speech frame are the (m-10)-th, (m-12)-th, and (m-20)-th speech frames, the consonant band energy sum weighted average AS_m for the m-th speech frame can be expressed by the following formula:

where EB1_{m-10}, EB1_{m-12}, and EB1_{m-20} are the first consonant band signal energies of the (m-10)-th, (m-12)-th, and (m-20)-th speech frames, respectively; EB2_{m-10}, EB2_{m-12}, and EB2_{m-20} are the second consonant band signal energies of the (m-10)-th, (m-12)-th, and (m-20)-th speech frames, respectively; and c0, c1, and c2 are the weights corresponding to the (m-10)-th, (m-12)-th, and (m-20)-th speech frames, respectively. The weights c0, c1, and c2 may be fixed or variable. For example, the weight of the sum of the first consonant band signal energy and the second consonant band signal energy of each speech frame whose original speech sampled signal was determined to be a noise signal may vary with the interval between that speech frame and the target speech frame; in this embodiment, the weights c0, c1, and c2 may vary with the interval between the corresponding speech frame and the m-th speech frame. When the consonant band energy sum weighted average AS_m satisfies the following formula, the original speech sampled signal of the m-th speech frame can be determined to be a consonant signal:

E_m - EL_m > AS_m (7)

where EL_m is the low-pass sampled signal energy of the m-th speech frame.
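The test in formula (7) can be sketched in the same way. As with AK_m, the exact form of AS_m is not reproduced in this text, so normalizing by the weight sum is an assumption, and the function names are hypothetical.

```python
def consonant_band_sum_average(eb1_list, eb2_list, weights):
    """AS_m: weighted average of EB1 + EB2 over recent noise frames (assumed form)."""
    total = sum(weights)
    if not (len(eb1_list) == len(eb2_list) == len(weights)) or total == 0:
        raise ValueError("need matching lengths and a non-zero weight sum")
    sums = [b1 + b2 for b1, b2 in zip(eb1_list, eb2_list)]
    return sum(s * w for s, w in zip(sums, weights)) / total

def exceeds_consonant_band_floor(e_m, el_m, as_m):
    """Formula (7): E_m - EL_m > AS_m."""
    return (e_m - el_m) > as_m
```

The left-hand side of formula (7) is the energy remaining after the low-pass band is removed, so the test asks whether the current frame carries more energy above the low-pass band than recent noise frames carried in the two consonant bands.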

In addition, the processing unit 104 may also calculate the average, over the target speech frame and a plurality of speech frames preceding it, of the ratio of the low-pass sampled signal energy to the original speech sampled signal energy, to obtain a low-pass sampled signal energy ratio average. For example, for the m-th speech frame, the low-pass sampled signal energy ratio average AU_m can be expressed by the following formula:

where EL_m and EL_{m-1} are the low-pass sampled signal energies of the m-th and (m-1)-th speech frames, respectively, and E_m and E_{m-1} are the original speech sampled signal energies of the m-th and (m-1)-th speech frames, respectively. The processing unit 104 may determine whether the original speech sampled signal of the target speech frame is a consonant signal according to whether the low-pass sampled signal energy ratio average is less than a preset average. For example, for the m-th speech frame, this determination can be expressed by the following formula:

AU_m < 0.6 (9)

In this embodiment, the preset average is 0.6, but the invention is not limited thereto; the preset average may be adjusted to other values according to the actual situation. In addition, the number of speech frames used to calculate the low-pass sampled signal energy ratio average AU_m is likewise not limited to this embodiment.
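The averaging behind formula (9) can be sketched as follows, assuming AU_m is the plain arithmetic mean of EL/E over the target frame and the preceding frames supplied by the caller. The function names and the handling of zero-energy frames are assumptions; the 0.6 threshold is the preset average of this embodiment.

```python
def low_pass_ratio_average(el_history, e_history):
    """AU_m: mean of EL/E over the target frame and preceding frames.

    Frames with zero total energy are skipped (an assumption of this sketch).
    Returns None if no frame has usable energy.
    """
    ratios = [el / e for el, e in zip(el_history, e_history) if e > 0]
    if not ratios:
        return None
    return sum(ratios) / len(ratios)

def passes_low_pass_average(el_history, e_history, preset_average=0.6):
    """Formula (9): AU_m < preset average (0.6 in this embodiment)."""
    au_m = low_pass_ratio_average(el_history, e_history)
    return au_m is not None and au_m < preset_average
```

For the two-frame case in the text, `el_history` would be `[EL_{m-1}, EL_m]` and `e_history` would be `[E_{m-1}, E_m]`. Averaging over more than one frame smooths out a single frame whose low-frequency share happens to spike.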

Furthermore, the processing unit 104 may calculate the second consonant band signal energy ratio from the second consonant band signal energy, the original speech sampled signal energy, and the low-pass sampled signal energy, and determine whether the original speech sampled signal of the target speech frame is a consonant signal according to at least one of the ratio of the low-pass sampled signal energy to the original speech sampled signal energy and the second consonant band signal energy ratio. For example, the processing unit 104 may calculate an energy difference by subtracting the low-pass sampled signal energy from the original speech sampled signal energy, and calculate the ratio of the second consonant band signal energy to the energy difference to obtain the second consonant band signal energy ratio. After calculating the second consonant band signal energy ratio, the processing unit 104 may determine whether the original speech sampled signal of the target speech frame is a consonant signal according to whether the ratio of the low-pass sampled signal energy to the original speech sampled signal energy is less than a first preset ratio, and according to whether that ratio lies within a preset energy ratio range while the second consonant band signal energy ratio is greater than a second preset ratio.

For example, for m-th of speech frame, above-mentioned judgment mode can be with following formula subrepresentation:

In the present embodiment, the first default ratio is 0.5, the second default ratio is 1.3, and the preset energy ratio range is 0.5~0.6, but they are not limited thereto; in some embodiments the first default ratio, the second default ratio and the preset energy ratio range can also be adjusted to other values according to the actual situation.
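The ratio test described above can be sketched in Python as follows. The thresholds are the values given for this embodiment; the grouping of the conditions (first condition alone, or the range condition together with the second-band condition) follows the flow description of step S214 and is an interpretation, as are the inclusive range bounds and all function names.

```python
FIRST_DEFAULT_RATIO = 0.5        # first default ratio of this embodiment
SECOND_DEFAULT_RATIO = 1.3       # second default ratio of this embodiment
ENERGY_RATIO_RANGE = (0.5, 0.6)  # preset energy ratio range (bounds assumed inclusive)


def second_band_energy_ratio(e_band2, e_raw, e_low):
    """Second consonant band energy divided by (raw energy - low pass energy)."""
    diff = e_raw - e_low
    if diff <= 0:
        return 0.0  # assumption: no usable difference means the test cannot pass
    return e_band2 / diff


def passes_ratio_test(e_raw, e_low, e_band2):
    """Assumed reading: ratio < first, or (ratio in range and band ratio > second)."""
    low_ratio = e_low / e_raw  # assumes e_raw > 0
    if low_ratio < FIRST_DEFAULT_RATIO:
        return True
    lo, hi = ENERGY_RATIO_RANGE
    in_range = lo <= low_ratio <= hi
    return in_range and second_band_energy_ratio(e_band2, e_raw, e_low) > SECOND_DEFAULT_RATIO
```

For instance, with a raw energy of 100 and a low pass energy of 55 the low pass ratio lies inside the range, so the outcome depends on the second consonant band energy alone.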

In addition, processing unit 104 can also judge whether the raw tone sampled signal corresponding to the target voice frame is a consonant signal according to whether the raw tone sampled signal energy is greater than or equal to a lower limit value. For example, for the m-th speech frame, this judgment can be expressed by the following formula:

E_m≥50 (13)

In the present embodiment, the lower limit value is 50, but it is not limited thereto; in some embodiments the lower limit value can also be adjusted according to the actual situation.

Since consonant signals may differ in energy, a consonant signal with relatively small energy may be mistaken for noise. To avoid this situation, in addition to judging according to energy as described above, processing unit 104 can also judge whether the raw tone sampled signal is a consonant signal according to zero-crossing rates. Processing unit 104 can calculate a first zero-crossing rate, a second zero-crossing rate and a third zero-crossing rate of the raw tone sampled signal, and calculate the average zero-crossing rates of the raw tone sampled signals of the target voice frame and of multiple speech frames before the target voice frame, to obtain a first Average zero-crossing rate, a second Average zero-crossing rate and a third Average zero-crossing rate. It then judges whether the raw tone sampled signal corresponding to the target voice frame is a consonant signal according to whether the first Average zero-crossing rate, the second Average zero-crossing rate and the third Average zero-crossing rate are respectively greater than or equal to their corresponding default Average zero-crossing rates. The first zero-crossing rate, the second zero-crossing rate and the third zero-crossing rate are respectively the number of times the raw tone sampled signal in the target voice frame passes through a first preset value, a second preset value and a third preset value, where the second preset value is less than the first preset value and greater than the third preset value.
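Counting how often a frame passes through three different preset values can be sketched as below. The crossing rule used here (a strict sign change of the sample minus the level between consecutive samples) is an assumption, since the exact formula is not reproduced in this passage; the level values and function names are hypothetical.

```python
def level_crossings(samples, level):
    """Count consecutive sample pairs that lie strictly on opposite sides of level."""
    count = 0
    for prev, cur in zip(samples, samples[1:]):
        if (prev - level) * (cur - level) < 0:
            count += 1
    return count


def three_zero_crossing_rates(samples, first_level, second_level, third_level):
    """Return the (first, second, third) crossing counts for one speech frame."""
    if not first_level > second_level > third_level:
        raise ValueError("levels must satisfy first > second > third")
    levels = (first_level, second_level, third_level)
    return tuple(level_crossings(samples, lv) for lv in levels)


# A toy frame that swings well past all three illustrative levels.
frame = [3.0, -3.0, 3.0, -3.0]
rates = three_zero_crossing_rates(frame, 1.0, 0.0, -1.0)
```

Using levels above and below the middle one means that low-amplitude fluctuations around the middle level contribute only to the second count, which is one plausible reason for tracking three rates.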

For the m-th speech frame, the original zero-crossing rate can be expressed as follows:

Here N is a positive integer representing the number of sampled signals in the m-th speech frame, mL is an amplitude threshold value, and the samples are those of the raw tone sampled signal in the m-th speech frame. Processing unit 104 can judge whether the raw tone sampled signal is a consonant signal according to whether this zero-crossing rate is greater than or equal to a default zero-crossing rate, for example according to the following formula:

The default zero-crossing rate is not limited to 22; in some embodiments its value can also be adjusted according to the actual situation. In addition, processing unit 104 can also judge whether the raw tone sampled signal is a consonant signal according to a zero-crossing rate of the raw tone sampled signal that additionally includes an energy condition. This zero-crossing rate can be expressed as follows:

The term used in it can be expressed by the following formula:

In the present embodiment, the value of α_x is 0.5, but it is not limited thereto; in some embodiments its value can also be adjusted according to the actual situation. By adjusting the benchmark used to calculate the zero-crossing rate in this way, whether the raw tone sampled signal is a consonant signal can be judged more accurately. Processing unit 104 can also judge whether the raw tone sampled signal is a consonant signal according to the Average zero-crossing rate of multiple speech frames. For example, for the m-th speech frame, the judgment can be made according to the average of its zero-crossing rate and those of the two most recent speech frames (namely the (m-1)-th and (m-2)-th speech frames), and the judgment formula can be as follows:
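Averaging the zero-crossing rate over the m-th speech frame and its two predecessors can be sketched with a small rolling window. The class name, the sample rates and the preset passed to the comparison are hypothetical, because the judgment formula itself is not reproduced here.

```python
from collections import deque


class RollingZeroCrossing:
    """Keeps the most recent zero-crossing rates and returns their mean."""

    def __init__(self, window=3):
        # window=3 covers frames m, m-1 and m-2 as in the example above
        self._rates = deque(maxlen=window)

    def push(self, rate):
        """Add the newest frame's rate and return the current average."""
        self._rates.append(rate)
        return sum(self._rates) / len(self._rates)


def meets_average_preset(average, preset):
    """True when the averaged rate is greater than or equal to the preset."""
    return average >= preset


tracker = RollingZeroCrossing(window=3)
averages = [tracker.push(rate) for rate in (18, 24, 30, 12)]
```

Until three frames have been seen, the sketch averages over the frames available so far; how the first frames are handled in practice is a design choice the text does not specify.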

As described in the embodiments above, processing unit 104 can judge whether the raw tone sampled signal is a consonant signal according to at least one of energy and zero-crossing rate; that is, processing unit 104 can combine at least one of the conditions in the above formulas to judge whether the raw tone sampled signal of the corresponding target voice frame is a consonant signal. For example, processing unit 104 can judge whether formulas (5), (7), (9), (10), (13), (15), (20), (21) and (22) are satisfied simultaneously, and judge that the raw tone sampled signal of the corresponding target voice frame is a consonant signal only if they are. As another example, processing unit 104 can also judge whether formulas (5), (7), (9), (11), (12), (13), (15), (20), (21) and (22) are satisfied simultaneously, and judge that the raw tone sampled signal of the corresponding target voice frame is a consonant signal only if they are.

Fig. 2A~2C show a flow diagram of the speech identifying method of one embodiment of the invention; please refer to Fig. 2A~2C. As can be seen from the above embodiments, the speech identifying method of the voice identification apparatus may include the following steps. First, low-pass filtering and bandpass filtering of the first consonant frequency range and the second consonant frequency range are performed on the voice signal, to respectively generate a low-pass filter signal, a first bandpass filtered signal and a second bandpass filtered signal (step S202). Then, the voice signal, the low-pass filter signal, the first bandpass filtered signal and the second bandpass filtered signal are divided into multiple speech frames (step S204), wherein each speech frame includes N sampled signals and N is a positive integer. Then, the energy of the sampled signals in the target voice frame is calculated, to obtain a raw tone sampled signal energy, a low pass sampled signal energy, a first consonant frequency band signals energy and a second consonant frequency band signals energy (step S206). Afterwards, whether the raw tone sampled signal of the corresponding target voice frame is noise is judged according to the ratio of the first consonant frequency band signals energy to the second consonant frequency band signals energy, the ratio of the first consonant frequency band signals energy to the raw tone sampled signal energy, and the ratio of the second consonant frequency band signals energy to the raw tone sampled signal energy (step S208). For example, it can be judged whether these three ratios respectively fall within their corresponding default ratio ranges; if all three do, the raw tone sampled signal of the target voice frame is a noise signal.
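Steps S204 to S208 can be outlined in Python as follows. This is only a sketch: it assumes non-overlapping frames, takes the frame energy to be the sum of squared samples, and leaves the three default ratio ranges as caller-supplied parameters because their values are not given in this passage. All names are hypothetical.

```python
def split_frames(samples, n):
    """Split a signal into consecutive, non-overlapping frames of n samples."""
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


def frame_energy(frame):
    """Assumed energy measure: sum of squared sample values."""
    return sum(x * x for x in frame)


def in_range(value, bounds):
    lo, hi = bounds
    return lo <= value <= hi


def is_noise(e_raw, e_band1, e_band2, r12_range, r1_range, r2_range):
    """Noise when all three energy ratios fall inside their own ranges (S208)."""
    if e_raw <= 0 or e_band2 <= 0:
        return False  # assumption: ratios are undefined, so do not flag as noise
    return (in_range(e_band1 / e_band2, r12_range)
            and in_range(e_band1 / e_raw, r1_range)
            and in_range(e_band2 / e_raw, r2_range))


frames = split_frames(list(range(10)), 4)  # the trailing partial frame is dropped
```

The same framing would be applied to the voice signal and to each of the three filtered signals, so that every frame index has four energies available.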

Next, the second consonant frequency band signals energy ratio is calculated from the second consonant frequency band signals energy, the raw tone sampled signal energy and the low pass sampled signal energy, and whether the raw tone sampled signal of the corresponding target voice frame is a consonant signal is judged according to at least one of the ratio of the low pass sampled signal energy to the raw tone sampled signal energy and the second consonant frequency band signals energy ratio. As shown in Fig. 2A~2C, the energy difference obtained by subtracting the low pass sampled signal energy from the raw tone sampled signal energy can be calculated first (step S210), and then the ratio of the second consonant frequency band signals energy to the energy difference is calculated, to obtain the second consonant frequency band signals energy ratio (step S212). Afterwards, it is judged whether the ratio of the low pass sampled signal energy to the raw tone sampled signal energy is less than the first default ratio, whether that ratio lies within the preset energy ratio range, and whether the second consonant frequency band signals energy ratio is greater than the second default ratio (step S214). If the ratio of the low pass sampled signal energy to the raw tone sampled signal energy is not less than the first default ratio, or that ratio does not lie within the preset energy ratio range, or the second consonant frequency band signals energy ratio is not greater than the second default ratio, the raw tone sampled signal corresponding to the target voice frame is judged not to be a consonant signal (step S216).

Conversely, if the ratio of the low pass sampled signal energy to the raw tone sampled signal energy is less than the first default ratio, or that ratio lies within the preset energy ratio range and the second consonant frequency band signals energy ratio is greater than the second default ratio, the weighted average of the energies of multiple speech frames whose raw tone sampled signals were previously judged to be noise signals is calculated, to obtain a noise signal energy weighted average (step S218). It is then judged whether the raw tone sampled signal energy corresponding to the target voice frame is greater than the noise signal energy weighted average (step S220), wherein the weighted value of each speech frame whose raw tone sampled signal was judged to be a noise signal varies with the interval length between that speech frame and the target voice frame. If the raw tone sampled signal energy corresponding to the target voice frame is not greater than the noise signal energy weighted average, the raw tone sampled signal corresponding to the target voice frame is judged not to be a consonant signal (step S216).
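Steps S218 and S220 can be sketched as a distance-weighted average over earlier noise frames. The text only states that the weight changes with the interval between a noise frame and the target voice frame; the inverse-distance weight used below, and the choice that nearer frames count more, are assumptions made for illustration, as are the function names.

```python
def weighted_noise_energy(noise_frames, target_index):
    """Weighted mean of earlier noise-frame energies.

    noise_frames is a list of (frame_index, energy) pairs for frames already
    judged to be noise. The weight 1 / (target_index - frame_index) is a
    hypothetical choice that favours frames close to the target frame.
    """
    weights = []
    for index, _ in noise_frames:
        if index >= target_index:
            raise ValueError("noise frames must precede the target frame")
        weights.append(1.0 / (target_index - index))
    if not weights:
        return 0.0  # assumption: with no noise history the floor is zero
    weighted_sum = sum(w * energy for w, (_, energy) in zip(weights, noise_frames))
    return weighted_sum / sum(weights)


def exceeds_noise_floor(e_raw, noise_frames, target_index):
    """Step S220: raw energy must be greater than the weighted noise average."""
    return e_raw > weighted_noise_energy(noise_frames, target_index)


history = [(8, 10.0), (9, 20.0)]  # made-up noise frames before target frame 10
noise_avg = weighted_noise_energy(history, 10)
```

The same weighting pattern could be reused for the weighted average of the summed consonant band energies in step S226, with the per-frame energy replaced by that sum.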

Conversely, if the raw tone sampled signal energy corresponding to the target voice frame is greater than the noise signal energy weighted average, the average of the ratios of the low pass sampled signal energy to the raw tone sampled signal energy for the target voice frame and multiple speech frames before the target voice frame is calculated, to obtain the low pass sampled signal energy proportion average value (step S222). It is then judged whether the low pass sampled signal energy proportion average value is less than the default average value (step S224). If the low pass sampled signal energy proportion average value is not less than the default average value, the raw tone sampled signal corresponding to the target voice frame is not a consonant signal (step S216). Conversely, if the low pass sampled signal energy proportion average value is less than the default average value, the weighted average of the sums of the first consonant frequency band signals energy and the second consonant frequency band signals energy corresponding to multiple speech frames whose raw tone sampled signals were previously judged to be noise signals is calculated, to obtain a consonant band energy summation weighted average (step S226), wherein the weighted value of the sum of the first consonant frequency band signals energy and the second consonant frequency band signals energy corresponding to each such speech frame varies with the interval length between that speech frame and the target voice frame. It is then judged whether the difference obtained by subtracting the low pass sampled signal energy from the raw tone sampled signal energy corresponding to the target voice frame is greater than the consonant band energy summation weighted average (step S228). If this difference is not greater than the consonant band energy summation weighted average, the raw tone sampled signal corresponding to the target voice frame is not a consonant signal (step S216).

Conversely, if the difference obtained by subtracting the low pass sampled signal energy from the raw tone sampled signal energy corresponding to the target voice frame is greater than the consonant band energy summation weighted average, it is judged whether the raw tone sampled signal energy is greater than or equal to the lower limit value (step S230). If the raw tone sampled signal energy is not greater than or equal to the lower limit value, the raw tone sampled signal corresponding to the target voice frame is not a consonant signal (step S216). Conversely, if the raw tone sampled signal energy is greater than or equal to the lower limit value, the first zero-crossing rate, the second zero-crossing rate and the third zero-crossing rate of the raw tone sampled signal are calculated, and the average zero-crossing rates of the raw tone sampled signals of the target voice frame and of multiple speech frames before the target voice frame are calculated, to obtain a first Average zero-crossing rate, a second Average zero-crossing rate and a third Average zero-crossing rate (step S232). The first zero-crossing rate, the second zero-crossing rate and the third zero-crossing rate are respectively the number of times the raw tone sampled signal in the target voice frame passes through the first preset value, the second preset value and the third preset value, where the second preset value is less than the first preset value and greater than the third preset value. It is then judged whether the first Average zero-crossing rate, the second Average zero-crossing rate and the third Average zero-crossing rate are respectively greater than or equal to their corresponding default Average zero-crossing rates (step S234). If the first Average zero-crossing rate, the second Average zero-crossing rate and the third Average zero-crossing rate are not all greater than or equal to their corresponding default Average zero-crossing rates, the raw tone sampled signal corresponding to the target voice frame is not a consonant signal (step S216).

Conversely, if the first Average zero-crossing rate, the second Average zero-crossing rate and the third Average zero-crossing rate are all greater than or equal to their corresponding default Average zero-crossing rates, it is judged whether the second zero-crossing rate is greater than or equal to the default zero-crossing rate (step S236). If the second zero-crossing rate is not greater than or equal to the default zero-crossing rate, the raw tone sampled signal corresponding to the target voice frame is not a consonant signal (step S216). Conversely, if the second zero-crossing rate is greater than or equal to the default zero-crossing rate, the raw tone sampled signal corresponding to the target voice frame is a consonant signal (step S238).
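The chain of checks from step S214 to step S238 can be summarized as one early-exit function over precomputed features. This is a sketch under stated assumptions: the noise test of step S208 is left out, the numeric defaults are the embodiment values quoted in the description, the default Average zero-crossing rates are caller-supplied because they are not given in this passage, and the grouping of the S214 conditions follows the "conversely" branch of the flow. The class and field names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass
class FrameFeatures:
    e_raw: float               # raw tone sampled signal energy
    e_low: float               # low pass sampled signal energy
    e_band2: float             # second consonant band energy
    noise_energy_avg: float    # weighted noise energy average (S218)
    low_ratio_avg: float       # low pass energy proportion average (S222)
    consonant_band_avg: float  # weighted consonant band energy sum (S226)
    avg_zcr: tuple             # (first, second, third) average zero-crossing rates
    zcr2: int                  # second zero-crossing rate of the target frame


def is_consonant(f, avg_zcr_presets, lower_limit=50.0, zcr_preset=22):
    """Return True only if every check from S214 to S236 passes."""
    if f.e_raw <= 0:
        return False
    low_ratio = f.e_low / f.e_raw
    diff = f.e_raw - f.e_low
    band2_ratio = f.e_band2 / diff if diff > 0 else 0.0
    if not (low_ratio < 0.5 or (0.5 <= low_ratio <= 0.6 and band2_ratio > 1.3)):
        return False  # S214 fails
    if not f.e_raw > f.noise_energy_avg:
        return False  # S220 fails
    if not f.low_ratio_avg < 0.6:
        return False  # S224 fails
    if not diff > f.consonant_band_avg:
        return False  # S228 fails
    if f.e_raw < lower_limit:
        return False  # S230 fails
    if not all(a >= p for a, p in zip(f.avg_zcr, avg_zcr_presets)):
        return False  # S234 fails
    return f.zcr2 >= zcr_preset  # S236


# Made-up feature values and hypothetical average zero-crossing presets.
features = FrameFeatures(100.0, 30.0, 40.0, 20.0, 0.35, 50.0, (10.0, 25.0, 10.0), 24)
presets = (8.0, 20.0, 8.0)
result = is_consonant(features, presets)
```

Because each failed check returns immediately, the more expensive zero-crossing statistics only need to be computed for frames that survive the energy tests, which mirrors the order of the flow diagram.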

In conclusion, the present invention can combine at least one of the conditions in the above formulas to judge whether the raw tone sampled signal of the corresponding target voice frame is a consonant signal, so as to improve the identification precision of consonant signals. For example, whether the raw tone sampled signal of the corresponding target voice frame is a consonant signal can be judged according to at least one of the ratio of the low pass sampled signal energy to the raw tone sampled signal energy and the second consonant frequency band signals energy ratio, which reduces the cases in which a raw tone sampled signal is misjudged as a consonant signal and thereby improves the identification precision of consonant signals.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A voice identification apparatus, characterized by comprising:

a filter unit, which performs low-pass filtering and bandpass filtering of a first consonant frequency range and a second consonant frequency range on a voice signal, to respectively generate a low-pass filter signal, a first bandpass filtered signal and a second bandpass filtered signal; and

a processing unit, coupled to the filter unit, which divides the voice signal, the low-pass filter signal, the first bandpass filtered signal and the second bandpass filtered signal into multiple speech frames, wherein each speech frame includes N sampled signals and N is a positive integer; calculates the energy of the sampled signals in a target voice frame, to obtain a raw tone sampled signal energy, a low pass sampled signal energy, a first consonant frequency band signals energy and a second consonant frequency band signals energy; calculates a second consonant frequency band signals energy ratio from the second consonant frequency band signals energy, the raw tone sampled signal energy and the low pass sampled signal energy; and judges whether the raw tone sampled signal of the corresponding target voice frame is a consonant signal according to at least one of the ratio of the low pass sampled signal energy to the raw tone sampled signal energy and the second consonant frequency band signals energy ratio,

wherein the processing unit calculates an energy difference obtained by subtracting the low pass sampled signal energy from the raw tone sampled signal energy, and calculates the ratio of the second consonant frequency band signals energy to the energy difference, to obtain the second consonant frequency band signals energy ratio.

2. The voice identification apparatus according to claim 1, characterized in that the processing unit also judges whether the raw tone sampled signal of the corresponding target voice frame is noise according to the ratio of the first consonant frequency band signals energy to the second consonant frequency band signals energy, the ratio of the first consonant frequency band signals energy to the raw tone sampled signal energy, and the ratio of the second consonant frequency band signals energy to the raw tone sampled signal energy.

3. The voice identification apparatus according to claim 2, characterized in that the processing unit also judges whether the ratio of the first consonant frequency band signals energy to the second consonant frequency band signals energy, the ratio of the first consonant frequency band signals energy to the raw tone sampled signal energy, and the ratio of the second consonant frequency band signals energy to the raw tone sampled signal energy respectively fall within corresponding default ratio ranges; if these three ratios respectively fall within the corresponding default ratio ranges, the raw tone sampled signal of the target voice frame is a noise signal.

4. The voice identification apparatus according to claim 1, characterized in that the processing unit also judges whether the raw tone sampled signal of the corresponding target voice frame is a consonant signal according to whether the ratio of the low pass sampled signal energy to the raw tone sampled signal energy is less than a first default ratio, whether the ratio of the low pass sampled signal energy to the raw tone sampled signal energy lies within a preset energy ratio range, and whether the second consonant frequency band signals energy ratio is greater than a second default ratio.

5. The voice identification apparatus according to claim 4, characterized in that, if the ratio of the low pass sampled signal energy to the raw tone sampled signal energy is less than the first default ratio, or the ratio of the low pass sampled signal energy to the raw tone sampled signal energy lies within the preset energy ratio range and the second consonant frequency band signals energy ratio is greater than the second default ratio, the processing unit also calculates the energy weighted average of multiple raw tone sampled signals previously judged to be noise signals, to obtain a noise signal energy weighted average, and judges whether the raw tone sampled signal corresponding to the target voice frame is a consonant signal according to whether the raw tone sampled signal energy corresponding to the target voice frame is greater than the noise signal energy weighted average.

6. The voice identification apparatus according to claim 5, characterized in that the weighted value of the speech frame corresponding to each raw tone sampled signal judged to be a noise signal varies with the interval length between that speech frame and the target voice frame.

7. The voice identification apparatus according to claim 5, characterized in that the processing unit also calculates the average of the ratios of the low pass sampled signal energy to the raw tone sampled signal energy for the target voice frame and multiple speech frames before the target voice frame, to obtain a low pass sampled signal energy proportion average value, and judges whether the raw tone sampled signal corresponding to the target voice frame is a consonant signal according to whether the low pass sampled signal energy proportion average value is less than a default average value.

8. The voice identification apparatus according to claim 7, characterized in that the processing unit also calculates the weighted average of the sums of the first consonant frequency band signals energy and the second consonant frequency band signals energy corresponding to the speech frames of multiple raw tone sampled signals previously judged to be noise signals, to obtain a consonant band energy summation weighted average, and judges whether the raw tone sampled signal corresponding to the target voice frame is a consonant signal according to whether the difference obtained by subtracting the low pass sampled signal energy from the raw tone sampled signal energy corresponding to the target voice frame is greater than the consonant band energy summation weighted average.

9. The voice identification apparatus according to claim 8, characterized in that the weighted value of the sum of the first consonant frequency band signals energy and the second consonant frequency band signals energy corresponding to the speech frame of each raw tone sampled signal judged to be a noise signal varies with the interval length between that raw tone sampled signal and the target voice frame.

10. The voice identification apparatus according to claim 8, characterized in that the processing unit also judges whether the raw tone sampled signal corresponding to the target voice frame is a consonant signal according to whether the raw tone sampled signal energy is greater than or equal to a lower limit value.

11. The voice identification apparatus according to claim 10, characterized in that the processing unit also calculates a first zero-crossing rate, a second zero-crossing rate and a third zero-crossing rate of the raw tone sampled signal, and calculates the average zero-crossing rates of the raw tone sampled signals of the target voice frame and of multiple speech frames before the target voice frame, to obtain a first Average zero-crossing rate, a second Average zero-crossing rate and a third Average zero-crossing rate, and judges whether the raw tone sampled signal corresponding to the target voice frame is a consonant signal according to whether the first Average zero-crossing rate, the second Average zero-crossing rate and the third Average zero-crossing rate are respectively greater than or equal to their corresponding default Average zero-crossing rates, wherein the first zero-crossing rate, the second zero-crossing rate and the third zero-crossing rate are respectively the number of times the raw tone sampled signal in the target voice frame passes through a first preset value, a second preset value and a third preset value, and the second preset value is less than the first preset value and greater than the third preset value.

12. The voice identification apparatus according to claim 11, characterized in that the processing unit also judges whether the raw tone sampled signal corresponding to the target voice frame is a consonant signal according to whether the second zero-crossing rate is greater than or equal to a default zero-crossing rate.

13. A speech identifying method, characterized by comprising:

performing low-pass filtering and bandpass filtering of a first consonant frequency range and a second consonant frequency range on a voice signal, to respectively generate a low-pass filter signal, a first bandpass filtered signal and a second bandpass filtered signal;

dividing the voice signal, the low-pass filter signal, the first bandpass filtered signal and the second bandpass filtered signal into multiple speech frames, wherein each speech frame includes N sampled signals and N is a positive integer;

calculating the energy of the sampled signals in a target voice frame, to obtain a raw tone sampled signal energy, a low pass sampled signal energy, a first consonant frequency band signals energy and a second consonant frequency band signals energy;

calculating an energy difference obtained by subtracting the low pass sampled signal energy from the raw tone sampled signal energy;

calculating the ratio of the second consonant frequency band signals energy to the energy difference, to obtain a second consonant frequency band signals energy ratio; and

judging whether the raw tone sampled signal of the corresponding target voice frame is a consonant signal according to at least one of the ratio of the low pass sampled signal energy to the raw tone sampled signal energy and the second consonant frequency band signals energy ratio.

14. The speech identifying method according to claim 13, characterized by further comprising:

judging whether the raw tone sampled signal of the corresponding target voice frame is noise according to the ratio of the first consonant frequency band signals energy to the second consonant frequency band signals energy, the ratio of the first consonant frequency band signals energy to the raw tone sampled signal energy, and the ratio of the second consonant frequency band signals energy to the raw tone sampled signal energy.

15. The speech identifying method according to claim 14, characterized by further comprising:

judging whether the ratio of the first consonant frequency band signals energy to the second consonant frequency band signals energy, the ratio of the first consonant frequency band signals energy to the raw tone sampled signal energy, and the ratio of the second consonant frequency band signals energy to the raw tone sampled signal energy respectively fall within corresponding default ratio ranges; and

if these three ratios respectively fall within the corresponding default ratio ranges, determining that the raw tone sampled signal of the target voice frame is a noise signal.

16. The speech identifying method according to claim 13, characterized by further comprising:

judging whether the raw tone sampled signal of the corresponding target voice frame is a consonant signal according to whether the ratio of the low pass sampled signal energy to the raw tone sampled signal energy is less than a first default ratio, whether the ratio of the low pass sampled signal energy to the raw tone sampled signal energy lies within a preset energy ratio range, and whether the second consonant frequency band signals energy ratio is greater than a second default ratio.

17. The speech identifying method according to claim 16, characterized in that, if the ratio of the low pass sampled signal energy to the raw tone sampled signal energy is less than the first default ratio, or the ratio of the low pass sampled signal energy to the raw tone sampled signal energy lies within the preset energy ratio range and the second consonant frequency band signals energy ratio is greater than the second default ratio, the speech identifying method further comprises:

calculating the energy weighted average of multiple raw tone sampled signals previously judged to be noise signals, to obtain a noise signal energy weighted average; and

judging whether the raw tone sampled signal corresponding to the target voice frame is a consonant signal according to whether the raw tone sampled signal energy corresponding to the target voice frame is greater than the noise signal energy weighted average.

18. The speech identifying method according to claim 17, characterized in that the weighted value of the speech frame corresponding to each raw tone sampled signal judged to be a noise signal varies with the interval length between that speech frame and the target voice frame.

19. The speech identifying method according to claim 17, characterized by further comprising:

calculating the average of the ratios of the low pass sampled signal energy to the raw tone sampled signal energy for the target voice frame and multiple speech frames before the target voice frame, to obtain a low pass sampled signal energy proportion average value; and

judging whether the raw tone sampled signal corresponding to the target voice frame is a consonant signal according to whether the low pass sampled signal energy proportion average value is less than a default average value.

20. The speech identifying method according to claim 19, characterized by further comprising:

calculating a weighted average of the sums of the first consonant frequency band signal energy and the second consonant frequency band signal energy corresponding to the speech frames of a plurality of original speech sampled signals previously judged to be noise signals, to obtain a consonant band energy sum weighted average; and

judging whether the original speech sampled signal corresponding to the target voice frame is a consonant signal according to whether the difference obtained by subtracting the low pass sampled signal energy from the original speech sampled signal energy corresponding to the target voice frame is greater than the consonant band energy sum weighted average.

21. The speech identifying method according to claim 20, characterized in that the weight applied to the sum of the first consonant frequency band signal energy and the second consonant frequency band signal energy corresponding to the speech frame of each original speech sampled signal judged to be a noise signal varies with the length of the interval between that speech frame and the target voice frame.
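
The comparison in claims 20 and 21 can be sketched in Python as follows. For each frame previously judged to be noise, the two consonant band energies are summed, and the sums are averaged with interval-dependent weights. The exponential `decay` weighting and the function names are assumptions for illustration; the claims say only that the weight changes with the interval.

```python
def consonant_band_sum_weighted_average(noise_frames, target_index, decay=0.5):
    """Weighted average of (first band + second band) energy over noise frames.

    noise_frames: list of (frame_index, first_band_energy, second_band_energy).
    decay: hypothetical per-frame decay factor; nearer frames weigh more.
    """
    weights = [decay ** (target_index - index) for index, _, _ in noise_frames]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one noise frame is required")
    weighted_sum = sum(w * (first + second)
                       for w, (_, first, second) in zip(weights, noise_frames))
    return weighted_sum / total_weight


def high_band_residual_exceeds_noise(original_energy, low_pass_energy,
                                     noise_frames, target_index, decay=0.5):
    """True when (original - low pass) energy exceeds the weighted average."""
    average = consonant_band_sum_weighted_average(noise_frames, target_index, decay)
    return (original_energy - low_pass_energy) > average
```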

22. The speech identifying method according to claim 20, characterized by further comprising:

judging whether the original speech sampled signal corresponding to the target voice frame is a consonant signal according to whether the original speech sampled signal energy is greater than or equal to a lower limit value.

23. The speech identifying method according to claim 22, characterized by further comprising:

calculating a first zero-crossing rate, a second zero-crossing rate, and a third zero-crossing rate of the original speech sampled signal, and calculating the average zero-crossing rates of the original speech sampled signals of the target voice frame and of a plurality of speech frames before the target voice frame, to obtain a first average zero-crossing rate, a second average zero-crossing rate, and a third average zero-crossing rate, wherein the first zero-crossing rate, the second zero-crossing rate, and the third zero-crossing rate are respectively the number of times the original speech sampled signal crosses a first preset value, a second preset value, and a third preset value within the target voice frame, and the second preset value is less than the first preset value and greater than the third preset value; and

judging whether the original speech sampled signal corresponding to the target voice frame is a consonant signal according to whether the first average zero-crossing rate, the second average zero-crossing rate, and the third average zero-crossing rate are each greater than or equal to their corresponding preset average zero-crossing rates.
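
The three-level crossing count of claim 23 can be illustrated with a short Python sketch. It counts how often a frame's samples cross a given preset value, averages the counts over the target voice frame and the preceding frames, and compares each average with its own threshold. The function names, the handling of samples that lie exactly on a level, and the requirement that all three comparisons hold together are assumptions; the claim does not specify them.

```python
def level_crossings(samples, level):
    """Number of times the samples cross the given level within one frame."""
    count = 0
    previous_sign = 0
    for sample in samples:
        difference = sample - level
        sign = (difference > 0) - (difference < 0)
        if sign == 0:
            continue  # assumption: a sample exactly on the level is skipped
        if previous_sign != 0 and sign != previous_sign:
            count += 1
        previous_sign = sign
    return count


def average_level_crossings(frames, level):
    """Mean crossing count over frames; the last frame is the target voice frame."""
    if not frames:
        raise ValueError("at least one frame is required")
    return sum(level_crossings(frame, level) for frame in frames) / len(frames)


def passes_average_crossing_test(frames, levels, preset_averages):
    """True when every average crossing count reaches its preset average.

    levels: (first, second, third) preset values with third < second < first.
    """
    first, second, third = levels
    if not third < second < first:
        raise ValueError("levels must satisfy third < second < first")
    return all(average_level_crossings(frames, level) >= preset
               for level, preset in zip(levels, preset_averages))
```

Using a second preset value of zero with symmetric first and third values, for example `(0.5, 0.0, -0.5)`, makes the second count an ordinary zero-crossing count, while the outer two counts ignore oscillations too small to reach those levels.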

24. The speech identifying method according to claim 23, characterized by further comprising:

judging whether the original speech sampled signal corresponding to the target voice frame is a consonant signal according to whether the second zero-crossing rate is greater than or equal to a preset zero-crossing rate.