CN105989835B - Voice identification apparatus and speech identifying method - Google Patents
- Publication number
- CN105989835B (application CN201510060494.7A)
- Authority
- CN
- China
- Prior art keywords
- energy
- sampled signal
- signal
- raw tone
- ratio
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Telephonic Communication Services (AREA)
Abstract
The present invention provides a voice identification apparatus and a speech identifying method. According to at least one of the ratio of the low-pass sampled signal energy to the raw speech sampled signal energy and the second consonant frequency band signal energy ratio, the invention judges whether the raw speech sampled signal of the corresponding target speech frame is a consonant signal. The invention can improve the identification accuracy of consonant signals.
Description
Technical field
The invention relates to an identification apparatus, and in particular to a voice identification apparatus and a speech identifying method.
Background technique
Hearing-impaired people often cannot clearly perceive higher-frequency speech signals, such as consonant signals, yet can clearly hear low-frequency speech signals. Existing consonant signal judgment processes the signal in the frequency domain and falls into two main approaches: non-real-time consonant signal judgment and real-time consonant signal judgment. Non-real-time judgment relies mainly on energy and zero-crossing rate. Real-time judgment decides whether a speech signal is a consonant signal mainly according to whether the ratio of the high-frequency signal energy to the total energy is greater than a fixed value and whether the ratio of the low-frequency signal energy to the total energy is less than a fixed value. Although existing approaches can distinguish consonant signals from noise, their accuracy still cannot meet practical needs.
Summary of the invention
The present invention provides a voice identification apparatus and a speech identifying method that can improve the identification accuracy of consonant signals.
The voice identification apparatus of the invention includes a filter unit and a processing unit. The filter unit performs low-pass filtering, band-pass filtering of a first consonant frequency band, and band-pass filtering of a second consonant frequency band on a speech signal to generate a low-pass filtered signal, a first band-pass filtered signal and a second band-pass filtered signal, respectively. The processing unit is coupled to the filter unit and divides the speech signal, the low-pass filtered signal, the first band-pass filtered signal and the second band-pass filtered signal into a plurality of speech frames, where each speech frame includes N sampled signals and N is a positive integer. The processing unit calculates the energy of the sampled signals in a target speech frame to obtain a raw speech sampled signal energy, a low-pass sampled signal energy, a first consonant frequency band signal energy and a second consonant frequency band signal energy. It calculates a second consonant frequency band signal energy ratio from the second consonant frequency band signal energy, the raw speech sampled signal energy and the low-pass sampled signal energy, and judges whether the raw speech sampled signal of the target speech frame is a consonant signal according to at least one of the ratio of the low-pass sampled signal energy to the raw speech sampled signal energy and the second consonant frequency band signal energy ratio.
In an embodiment of the invention, the processing unit further judges whether the raw speech sampled signal of the target speech frame is noise according to the ratio of the first consonant frequency band signal energy to the second consonant frequency band signal energy, the ratio of the first consonant frequency band signal energy to the raw speech sampled signal energy, and the ratio of the second consonant frequency band signal energy to the raw speech sampled signal energy.
In an embodiment of the invention, the processing unit further judges whether the ratio of the first consonant frequency band signal energy to the second consonant frequency band signal energy, the ratio of the first consonant frequency band signal energy to the raw speech sampled signal energy, and the ratio of the second consonant frequency band signal energy to the raw speech sampled signal energy each fall within a corresponding preset ratio range. If all three ratios fall within their corresponding preset ratio ranges, the raw speech sampled signal of the target speech frame is a noise signal.
In an embodiment of the invention, the processing unit further calculates an energy difference obtained by subtracting the low-pass sampled signal energy from the raw speech sampled signal energy, and calculates the ratio of the second consonant frequency band signal energy to the energy difference to obtain the second consonant frequency band signal energy ratio.
In an embodiment of the invention, the processing unit further judges whether the raw speech sampled signal of the target speech frame is a consonant signal according to whether the ratio of the low-pass sampled signal energy to the raw speech sampled signal energy is less than a first preset ratio, whether that ratio lies within a preset energy ratio range, and whether the second consonant frequency band signal energy ratio is greater than a second preset ratio.
In an embodiment of the invention, if the ratio of the low-pass sampled signal energy to the raw speech sampled signal energy is less than the first preset ratio, or if that ratio lies within the preset energy ratio range and the second consonant frequency band signal energy ratio is greater than the second preset ratio, the processing unit further calculates an energy weighted average of a plurality of raw speech sampled signals previously judged to be noise signals to obtain a noise signal energy weighted average, and judges whether the raw speech sampled signal of the target speech frame is a consonant signal according to whether the raw speech sampled signal energy of the target speech frame is greater than the noise signal energy weighted average.
In an embodiment of the invention, the weight of each speech frame whose raw speech sampled signal was judged to be a noise signal varies with the length of the interval between that speech frame and the target speech frame.
In an embodiment of the invention, the processing unit further calculates the average of the ratios of the low-pass sampled signal energy to the raw speech sampled signal energy for the target speech frame and a plurality of speech frames preceding it to obtain a low-pass sampled signal energy ratio average, and judges whether the raw speech sampled signal of the target speech frame is a consonant signal according to whether the low-pass sampled signal energy ratio average is less than a preset average value.
In an embodiment of the invention, the processing unit further calculates a weighted average of the sums of the first consonant frequency band signal energy and the second consonant frequency band signal energy of a plurality of speech frames whose raw speech sampled signals were previously judged to be noise signals, to obtain a consonant frequency band energy sum weighted average, and judges whether the raw speech sampled signal of the target speech frame is a consonant signal according to whether the difference obtained by subtracting the low-pass sampled signal energy from the raw speech sampled signal energy of the target speech frame is greater than the consonant frequency band energy sum weighted average.
In an embodiment of the invention, the weight applied to the sum of the first consonant frequency band signal energy and the second consonant frequency band signal energy of each speech frame whose raw speech sampled signal was judged to be a noise signal varies with the length of the interval between that speech frame and the target speech frame.
In an embodiment of the invention, the processing unit further judges whether the raw speech sampled signal of the target speech frame is a consonant signal according to whether the raw speech sampled signal energy is greater than or equal to a lower limit value.
In an embodiment of the invention, the processing unit further calculates a first zero-crossing rate, a second zero-crossing rate and a third zero-crossing rate of the raw speech sampled signal, and averages each of them over the target speech frame and a plurality of speech frames preceding it to obtain a first average zero-crossing rate, a second average zero-crossing rate and a third average zero-crossing rate. It judges whether the raw speech sampled signal of the target speech frame is a consonant signal according to whether the first, second and third average zero-crossing rates are each greater than or equal to a corresponding preset average zero-crossing rate. The first, second and third zero-crossing rates are the numbers of times the raw speech sampled signal in the target speech frame passes through a first preset value, a second preset value and a third preset value, respectively, where the second preset value is less than the first preset value and greater than the third preset value.
In an embodiment of the invention, the processing unit further judges whether the raw speech sampled signal of the target speech frame is a consonant signal according to whether the second zero-crossing rate is greater than or equal to a preset zero-crossing rate.
The speech identifying method of the invention includes the following steps: performing low-pass filtering, band-pass filtering of a first consonant frequency band, and band-pass filtering of a second consonant frequency band on a speech signal to generate a low-pass filtered signal, a first band-pass filtered signal and a second band-pass filtered signal, respectively; dividing the speech signal, the low-pass filtered signal, the first band-pass filtered signal and the second band-pass filtered signal into a plurality of speech frames, where each speech frame includes N sampled signals and N is a positive integer; calculating the energy of the sampled signals in a target speech frame to obtain a raw speech sampled signal energy, a low-pass sampled signal energy, a first consonant frequency band signal energy and a second consonant frequency band signal energy; calculating a second consonant frequency band signal energy ratio from the second consonant frequency band signal energy, the raw speech sampled signal energy and the low-pass sampled signal energy; and judging whether the raw speech sampled signal of the target speech frame is a consonant signal according to at least one of the ratio of the low-pass sampled signal energy to the raw speech sampled signal energy and the second consonant frequency band signal energy ratio.
In an embodiment of the invention, the speech identifying method further includes judging whether the raw speech sampled signal of the target speech frame is noise according to the ratio of the first consonant frequency band signal energy to the second consonant frequency band signal energy, the ratio of the first consonant frequency band signal energy to the raw speech sampled signal energy, and the ratio of the second consonant frequency band signal energy to the raw speech sampled signal energy.
In an embodiment of the invention, the speech identifying method further includes the following steps: judging whether the ratio of the first consonant frequency band signal energy to the second consonant frequency band signal energy, the ratio of the first consonant frequency band signal energy to the raw speech sampled signal energy, and the ratio of the second consonant frequency band signal energy to the raw speech sampled signal energy each fall within a corresponding preset ratio range; and, if all three ratios fall within their corresponding preset ratio ranges, determining that the raw speech sampled signal of the target speech frame is a noise signal.
In an embodiment of the invention, the speech identifying method further includes the following steps: calculating an energy difference obtained by subtracting the low-pass sampled signal energy from the raw speech sampled signal energy; and calculating the ratio of the second consonant frequency band signal energy to the energy difference to obtain the second consonant frequency band signal energy ratio.
In an embodiment of the invention, the speech identifying method further includes judging whether the raw speech sampled signal of the target speech frame is a consonant signal according to whether the ratio of the low-pass sampled signal energy to the raw speech sampled signal energy is less than a first preset ratio, whether that ratio lies within a preset energy ratio range, and whether the second consonant frequency band signal energy ratio is greater than a second preset ratio.
In an embodiment of the invention, if the ratio of the low-pass sampled signal energy to the raw speech sampled signal energy is less than the first preset ratio, or if that ratio lies within the preset energy ratio range and the second consonant frequency band signal energy ratio is greater than the second preset ratio, the speech identifying method further includes the following steps: calculating an energy weighted average of a plurality of raw speech sampled signals previously judged to be noise signals to obtain a noise signal energy weighted average; and judging whether the raw speech sampled signal of the target speech frame is a consonant signal according to whether the raw speech sampled signal energy of the target speech frame is greater than the noise signal energy weighted average.
In an embodiment of the invention, the weight of each speech frame whose raw speech sampled signal was judged to be a noise signal varies with the length of the interval between that speech frame and the target speech frame.
In an embodiment of the invention, the speech identifying method further includes the following steps: calculating the average of the ratios of the low-pass sampled signal energy to the raw speech sampled signal energy for the target speech frame and a plurality of speech frames preceding it to obtain a low-pass sampled signal energy ratio average; and judging whether the raw speech sampled signal of the target speech frame is a consonant signal according to whether the low-pass sampled signal energy ratio average is less than a preset average value.
In an embodiment of the invention, the speech identifying method further includes the following steps: calculating a weighted average of the sums of the first consonant frequency band signal energy and the second consonant frequency band signal energy of a plurality of speech frames whose raw speech sampled signals were previously judged to be noise signals, to obtain a consonant frequency band energy sum weighted average; and judging whether the raw speech sampled signal of the target speech frame is a consonant signal according to whether the difference obtained by subtracting the low-pass sampled signal energy from the raw speech sampled signal energy of the target speech frame is greater than the consonant frequency band energy sum weighted average.
In an embodiment of the invention, the weight applied to the sum of the first consonant frequency band signal energy and the second consonant frequency band signal energy of each speech frame whose raw speech sampled signal was judged to be a noise signal varies with the length of the interval between that speech frame and the target speech frame.
In an embodiment of the invention, the speech identifying method further includes judging whether the raw speech sampled signal of the target speech frame is a consonant signal according to whether the raw speech sampled signal energy is greater than or equal to a lower limit value.
In an embodiment of the invention, the speech identifying method further includes the following steps. A first zero-crossing rate, a second zero-crossing rate and a third zero-crossing rate of the raw speech sampled signal are calculated, and each is averaged over the target speech frame and a plurality of speech frames preceding it to obtain a first average zero-crossing rate, a second average zero-crossing rate and a third average zero-crossing rate. The first, second and third zero-crossing rates are the numbers of times the raw speech sampled signal in the target speech frame passes through a first preset value, a second preset value and a third preset value, respectively, where the second preset value is less than the first preset value and greater than the third preset value. Whether the raw speech sampled signal of the target speech frame is a consonant signal is then judged according to whether the first, second and third average zero-crossing rates are each greater than or equal to a corresponding preset average zero-crossing rate.
In an embodiment of the invention, the speech identifying method further includes judging whether the raw speech sampled signal of the target speech frame is a consonant signal according to whether the second zero-crossing rate is greater than or equal to a preset zero-crossing rate.
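The three zero-crossing rates described above can be sketched as level-crossing counts. This is only an illustration under stated assumptions: the function names, the three level values and the thresholds in the usage note are hypothetical, and a sample that lies exactly on a level is not counted as a crossing, a detail the text does not specify.

```python
from typing import List, Sequence, Tuple


def level_crossings(frame: Sequence[float], level: float) -> int:
    """Count how often consecutive samples lie on opposite sides of `level`."""
    samples = list(frame)
    count = 0
    for prev, cur in zip(samples, samples[1:]):
        # A strictly negative product means prev and cur straddle the level.
        if (prev - level) * (cur - level) < 0:
            count += 1
    return count


def crossing_rates(frame: Sequence[float],
                   levels: Tuple[float, float, float]) -> Tuple[int, int, int]:
    """First, second and third rates for levels (first, second, third).

    The text requires third < second < first; that ordering is checked here.
    """
    first, second, third = levels
    if not third < second < first:
        raise ValueError("levels must satisfy third < second < first")
    return (level_crossings(frame, first),
            level_crossings(frame, second),
            level_crossings(frame, third))


def average_crossing_rates(history: List[Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Average each rate over the target frame and the frames before it."""
    count = len(history)
    if count == 0:
        raise ValueError("history must contain at least one frame")
    return tuple(sum(r[i] for r in history) / count for i in range(3))


def meets_average_thresholds(averages: Sequence[float],
                             thresholds: Sequence[float]) -> bool:
    """True when every average rate is >= its preset average rate."""
    return all(a >= t for a, t in zip(averages, thresholds))
```

For instance, `crossing_rates(frame, (1.0, 0.0, -1.0))` would use a hypothetical middle level of zero with symmetric outer levels; suitable levels and thresholds would depend on the signal scale.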
Based on the above, embodiments of the invention judge whether the raw speech sampled signal of the target speech frame is a consonant signal according to at least one of the ratio of the low-pass sampled signal energy to the raw speech sampled signal energy and the second consonant frequency band signal energy ratio, so as to reduce cases in which a raw speech sampled signal is misjudged as a consonant signal and thereby improve the identification accuracy of consonant signals.
To make the foregoing features and advantages of the invention clearer and easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
Fig. 1 is a schematic diagram of the voice identification apparatus according to an embodiment of the invention;
Figs. 2A to 2C are flow diagrams of the speech identifying method according to an embodiment of the invention.
Description of symbols:
102: filter unit;
104: processing unit;
S1: voice signal;
S2: the first bandpass filtered signal;
S3: the second bandpass filtered signal;
S4: low-pass filter signal;
S202~S238: step.
Specific embodiment
Fig. 1 is a schematic diagram of the voice identification apparatus according to an embodiment of the invention. Referring to Fig. 1, the voice identification apparatus includes a filter unit 102 and a processing unit 104, and the filter unit 102 is coupled to the processing unit 104. The filter unit 102 performs low-pass filtering, band-pass filtering of the first consonant frequency band, and band-pass filtering of the second consonant frequency band on the speech signal S1 to generate a low-pass filtered signal S4, a first band-pass filtered signal S2 and a second band-pass filtered signal S3, respectively. The filter unit 102 may include, for example, a low-pass filter and band-pass filters, and the processing unit 104 may be implemented, for example, with a central processing unit. In this embodiment, the passband of the low-pass filtering is 0 to 2 kHz, and the first consonant frequency band and the second consonant frequency band are 2 kHz to 4 kHz and 4 kHz to 10 kHz, respectively, but the invention is not limited thereto.
The processing unit 104 samples the speech signal S1, the low-pass filtered signal S4, the first band-pass filtered signal S2 and the second band-pass filtered signal S3, and divides them into a plurality of speech frames, where each speech frame may include N sampled signals of the speech signal S1, N sampled signals of the low-pass filtered signal S4, N sampled signals of the first band-pass filtered signal S2 and N sampled signals of the second band-pass filtered signal S3. The processing unit 104 also calculates the energy of the sampled signals in each speech frame to obtain the raw speech sampled signal energy, the low-pass sampled signal energy, the first consonant frequency band signal energy and the second consonant frequency band signal energy, which correspond respectively to the energies of the sampled signals of the speech signal S1, the low-pass filtered signal S4, the first band-pass filtered signal S2 and the second band-pass filtered signal S3 in the speech frame. After obtaining these four energies, the processing unit 104 judges whether the raw speech sampled signal of each speech frame is noise according to the ratio of the first consonant frequency band signal energy to the second consonant frequency band signal energy, the ratio of the first consonant frequency band signal energy to the raw speech sampled signal energy, and the ratio of the second consonant frequency band signal energy to the raw speech sampled signal energy.
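The framing and energy calculation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, energy is assumed to be the sum of squared samples (the text does not define it), frames are assumed not to overlap, and a trailing partial frame is dropped.

```python
from typing import Dict, List, Sequence


def split_into_frames(samples: Sequence[float], n: int) -> List[List[float]]:
    """Split samples into consecutive, non-overlapping frames of n samples."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    # Assumption: a trailing partial frame is discarded.
    return [list(samples[i:i + n]) for i in range(0, len(samples) - n + 1, n)]


def frame_energy(frame: Sequence[float]) -> float:
    """Energy of one frame, assumed here to be the sum of squared samples."""
    return float(sum(x * x for x in frame))


def frame_energies(s1: Sequence[float], s4: Sequence[float],
                   s2: Sequence[float], s3: Sequence[float],
                   n: int) -> List[Dict[str, float]]:
    """Per-frame energies for the four signals named in the text.

    s1: speech signal, s4: low-pass filtered signal,
    s2: first band-pass filtered signal, s3: second band-pass filtered signal.
    """
    frames = zip(split_into_frames(s1, n), split_into_frames(s4, n),
                 split_into_frames(s2, n), split_into_frames(s3, n))
    return [{"E": frame_energy(f1), "EL": frame_energy(f4),
             "EB1": frame_energy(f2), "EB2": frame_energy(f3)}
            for f1, f4, f2, f3 in frames]
```

The keys `E`, `EL`, `EB1` and `EB2` follow the symbols the description uses for the m-th speech frame.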
Specifically, the processing unit 104 judges whether the ratio of the first consonant frequency band signal energy to the second consonant frequency band signal energy, the ratio of the first consonant frequency band signal energy to the raw speech sampled signal energy, and the ratio of the second consonant frequency band signal energy to the raw speech sampled signal energy each fall within their corresponding preset ratio ranges. If all three ratios fall within their corresponding preset ratio ranges, the raw speech sampled signal of the target speech frame is a noise signal.
For example, whether the raw speech sampled signal of a target speech frame (for example the m-th speech frame, where m is a positive integer) is noise can be judged by the processing unit 104 with formulas (1), (2) and (3), where EB1_m is the first consonant frequency band signal energy, EB2_m is the second consonant frequency band signal energy, and E_m is the raw speech sampled signal energy. When formulas (1), (2) and (3) are all satisfied, the processing unit 104 judges that the raw speech sampled signal of the m-th speech frame is a noise signal.
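The three-ratio noise test can be sketched as below. The preset ratio ranges are not given in this text, so they are passed in as parameters; the function name, the inclusive bounds and the handling of zero energies are assumptions made for illustration.

```python
from typing import Tuple

Range = Tuple[float, float]


def in_range(value: float, bounds: Range) -> bool:
    """True when value lies within bounds (inclusive ends are an assumption)."""
    low, high = bounds
    return low <= value <= high


def is_noise_frame(eb1: float, eb2: float, e: float,
                   band_ratio_range: Range,
                   first_band_range: Range,
                   second_band_range: Range) -> bool:
    """Noise when EB1/EB2, EB1/E and EB2/E all fall within their preset ranges."""
    # Assumption: a frame with a zero denominator is not classified as noise.
    if e <= 0 or eb2 <= 0:
        return False
    return (in_range(eb1 / eb2, band_ratio_range)
            and in_range(eb1 / e, first_band_range)
            and in_range(eb2 / e, second_band_range))
```

A call such as `is_noise_frame(2.0, 2.0, 10.0, (0.5, 2.0), (0.1, 0.3), (0.1, 0.3))` uses purely hypothetical ranges; the actual ranges would come from the corresponding formulas of the embodiment.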
After the raw tone sampled signal for judging target voice frame is noise signal, processing unit 104 is also calculated
The energy weighted average of multiple speech frames of the raw tone sampled signal of noise signal is judged as before target voice frame,
To obtain noise signal energy weighted average, and according to raw tone sampled signal energy corresponding to target voice frame whether
Judge whether raw tone sampled signal corresponding to target voice frame is consonant greater than noise signal energy weighted average
Signal.
For example, the noise signal energy weighted average can be obtained by calculating the energy weighted average of 3 speech frames preceding the target speech frame whose raw speech sampled signals were judged to be noise signals. Assume that, before the m-th speech frame, the three speech frames most recently judged to be noise are the (m-10)-th, (m-12)-th and (m-20)-th speech frames. The noise signal energy weighted average AK_m corresponding to the m-th speech frame can then be expressed by formula (4), where E_{m-10}, E_{m-12} and E_{m-20} are the raw speech sampled signal energies of the (m-10)-th, (m-12)-th and (m-20)-th speech frames, respectively, and a0, a1 and a2 are the weights corresponding to the (m-10)-th, (m-12)-th and (m-20)-th speech frames, respectively. The weights a0, a1 and a2 may be fixed values or variable values. For example, the weight of each speech frame whose raw speech sampled signal was judged to be a noise signal may vary with the length of the interval between that speech frame and the target speech frame. In this embodiment, the weights a0, a1 and a2 may vary with the length of the interval between the respective speech frame and the m-th speech frame. When the noise signal energy weighted average AK_m satisfies the following formula, the raw speech sampled signal of the m-th speech frame can be judged to be a consonant signal:
E_m > AK_m (5)
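A sketch of the comparison in formula (5) follows. Because formula (4) is not reproduced in this text, the weighted average is assumed to be normalised by the sum of the weights, and the geometric decay used to derive interval-dependent weights is a hypothetical choice; all names are illustrative.

```python
from typing import List, Sequence


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average normalised by the weight sum (normalisation assumed)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * v for w, v in zip(weights, values)) / total


def interval_weights(target_index: int, noise_indices: Sequence[int],
                     decay: float = 0.9) -> List[float]:
    """Hypothetical weights that shrink as a noise frame lies further back."""
    return [decay ** (target_index - i) for i in noise_indices]


def exceeds_noise_average(e_m: float, noise_energies: Sequence[float],
                          weights: Sequence[float]) -> bool:
    """Formula (5): E_m > AK_m, with AK_m computed from earlier noise frames."""
    return e_m > weighted_average(noise_energies, weights)
```

With the example from the text, `interval_weights(m, [m - 10, m - 12, m - 20])` would give a0 > a1 > a2, so more recent noise frames count more; fixed weights work equally well with `weighted_average`.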
In addition, the processing unit 104 can calculate a weighted average of the sums of the first consonant frequency band signal energy and the second consonant frequency band signal energy of a plurality of speech frames whose raw speech sampled signals were previously judged to be noise signals, to obtain a consonant frequency band energy sum weighted average, and judge whether the raw speech sampled signal of the target speech frame is a consonant signal according to whether the difference obtained by subtracting the low-pass sampled signal energy from the raw speech sampled signal energy of the target speech frame is greater than the consonant frequency band energy sum weighted average. For example, the consonant frequency band energy sum weighted average can be obtained by calculating the weighted average of the sums of the first and second consonant frequency band signal energies of 3 speech frames preceding the target speech frame whose raw speech sampled signals were judged to be noise signals. Assume that, before the m-th speech frame, the three speech frames most recently judged to be noise are the (m-10)-th, (m-12)-th and (m-20)-th speech frames. The consonant frequency band energy sum weighted average AS_m corresponding to the m-th speech frame can then be expressed by formula (6), where EB1_{m-10}, EB1_{m-12} and EB1_{m-20} are the first consonant frequency band signal energies of the (m-10)-th, (m-12)-th and (m-20)-th speech frames, respectively, EB2_{m-10}, EB2_{m-12} and EB2_{m-20} are the second consonant frequency band signal energies of those speech frames, respectively, and c0, c1 and c2 are the weights corresponding to the (m-10)-th, (m-12)-th and (m-20)-th speech frames, respectively. The weights c0, c1 and c2 may be fixed values or variable values. For example, the weight applied to the sum of the first and second consonant frequency band signal energies of each speech frame whose raw speech sampled signal was judged to be a noise signal may vary with the length of the interval between that speech frame and the target speech frame. In this embodiment, the weights c0, c1 and c2 may vary with the length of the interval between the respective speech frame and the m-th speech frame. When the consonant frequency band energy sum weighted average AS_m satisfies the following formula, the raw speech sampled signal of the m-th speech frame can be judged to be a consonant signal:
E_m - EL_m > AS_m (7)
where EL_m is the low-pass sampled signal energy of the m-th speech frame.
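The comparison in formula (7) can be sketched in the same way. Formula (6) is not reproduced in this text, so normalising by the sum of the weights is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def consonant_band_sum_average(eb1_values: Sequence[float],
                               eb2_values: Sequence[float],
                               weights: Sequence[float]) -> float:
    """Weighted average of (EB1 + EB2) over earlier noise frames.

    Assumption: the average is normalised by the sum of the weights.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    weighted = sum(w * (b1 + b2)
                   for w, b1, b2 in zip(weights, eb1_values, eb2_values))
    return weighted / total


def exceeds_consonant_band_average(e_m: float, el_m: float, as_m: float) -> bool:
    """Formula (7): E_m - EL_m > AS_m."""
    return (e_m - el_m) > as_m
```

The lists would hold the band energies of the noise frames in the same order as their weights, for example the (m-10)-th, (m-12)-th and (m-20)-th frames with c0, c1 and c2.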
In addition, the processing unit 104 can also calculate the average of the ratios of the low-pass sampled signal energy to the raw speech sampled signal energy for the target speech frame and a plurality of speech frames preceding it, to obtain a low-pass sampled signal energy ratio average. For example, for the m-th speech frame, the low-pass sampled signal energy ratio average AU_m is expressed by formula (8), where EL_m and EL_{m-1} are the low-pass sampled signal energies of the m-th and (m-1)-th speech frames, respectively, and E_m and E_{m-1} are the raw speech sampled signal energies of the m-th and (m-1)-th speech frames, respectively. The processing unit 104 can judge whether the raw speech sampled signal of the target speech frame is a consonant signal according to whether the low-pass sampled signal energy ratio average is less than a preset average value. For example, for the m-th speech frame, this judgment can be expressed by the following formula:
AU_m < 0.6 (9)
In this embodiment, the preset average value is 0.6, but the invention is not limited thereto, and the preset average value can be adjusted to other values according to the actual situation. The number of speech frames used to calculate the low-pass sampled signal energy ratio average AU_m is likewise not limited to that of this embodiment.
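The averaging behind formula (9) can be sketched as follows. Formula (8) is not reproduced in this text; the sketch assumes a plain arithmetic mean of the per-frame ratios EL/E over the most recent frames, with a default window of two frames (m and m-1) as the symbol list suggests. The names and the skipping of zero-energy frames are assumptions.

```python
from typing import Sequence


def low_pass_ratio_average(el_history: Sequence[float],
                           e_history: Sequence[float],
                           window: int = 2) -> float:
    """Mean of EL/E over the last `window` frames, target frame last."""
    pairs = list(zip(el_history, e_history))[-window:]
    # Assumption: frames with zero raw energy are left out of the average.
    ratios = [el / e for el, e in pairs if e > 0]
    if not ratios:
        raise ValueError("no frame with positive raw energy in the window")
    return sum(ratios) / len(ratios)


def passes_low_pass_average_test(au_m: float, preset_average: float = 0.6) -> bool:
    """Formula (9): AU_m < 0.6 in this embodiment."""
    return au_m < preset_average
```

A larger `window` corresponds to averaging over more preceding frames, which the description explicitly allows.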
Furthermore, the processing unit 104 can calculate a second consonant band signal energy ratio from the second consonant band signal energy, the original speech sampled signal energy and the low-pass sampled signal energy, and judge whether the original speech sampled signal corresponding to the target speech frame is a consonant signal according to at least one of the ratio of the low-pass sampled signal energy to the original speech sampled signal energy and the second consonant band signal energy ratio. For example, the processing unit 104 can calculate the energy difference obtained by subtracting the low-pass sampled signal energy from the original speech sampled signal energy, and then calculate the ratio of the second consonant band signal energy to this energy difference, to obtain the second consonant band signal energy ratio. After calculating the second consonant band signal energy ratio, the processing unit 104 can judge whether the original speech sampled signal corresponding to the target speech frame is a consonant signal according to whether the ratio of the low-pass sampled signal energy to the original speech sampled signal energy is less than a first preset ratio, and according to whether that ratio lies within a preset energy ratio range while the second consonant band signal energy ratio is greater than a second preset ratio.
For example, for the m-th speech frame, the above judgment can be expressed by the following formulas:
In the present embodiment the first preset ratio is 0.5, the second preset ratio is 1.3, and the preset energy ratio range is 0.5 to 0.6, but they are not limited thereto; in some embodiments the first preset ratio, the second preset ratio and the preset energy ratio range can also be adjusted to other values according to the actual situation.
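A minimal Python sketch of this energy ratio test is given below. It assumes that the two alternatives are combined with a logical OR, that the preset energy ratio range is inclusive at both ends, and that the original-signal energy is positive. The symbol `eh2_m` for the second consonant band signal energy and the function name are hypothetical.

```python
def energy_ratio_test(e_m, el_m, eh2_m,
                      first_ratio=0.5, second_ratio=1.3,
                      ratio_range=(0.5, 0.6)):
    """Return True when the frame passes the low-pass / consonant-band ratio test.

    e_m   : original speech sampled signal energy of frame m
    el_m  : low-pass sampled signal energy of frame m
    eh2_m : second consonant band signal energy of frame m (hypothetical name)
    """
    if e_m <= 0:
        raise ValueError("original signal energy must be positive")
    low_ratio = el_m / e_m
    # First alternative: little of the energy lies in the low-pass band.
    if low_ratio < first_ratio:
        return True
    # Second alternative: ratio inside the preset range (bounds assumed inclusive)...
    lower, upper = ratio_range
    if not (lower <= low_ratio <= upper):
        return False
    # ...and the second consonant band dominates the energy difference E_m - EL_m.
    band_ratio = eh2_m / (e_m - el_m)
    return band_ratio > second_ratio
```

The energy difference is only used inside the range branch, where the low-pass ratio is at most 0.6 with the default range, so the divisor stays positive.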
In addition, the processing unit 104 can also judge whether the original speech sampled signal corresponding to the target speech frame is a consonant signal according to whether the original speech sampled signal energy is greater than or equal to a lower limit value. For example, for the m-th speech frame, this judgment can be expressed by the following formula:
E_m ≥ 50 (13)
In the present embodiment the lower limit value is 50, but it is not limited thereto; in some embodiments the lower limit value can also be adjusted according to the actual situation.
Because the energy of consonant signals can vary in magnitude, a consonant signal with relatively low energy may be mistaken for noise. To avoid this, in addition to judging by energy as described above, the processing unit 104 can also judge whether the original speech sampled signal is a consonant signal according to zero-crossing rates. The processing unit 104 can calculate a first zero-crossing rate, a second zero-crossing rate and a third zero-crossing rate of the original speech sampled signal, and calculate the average zero-crossing rates of the original speech sampled signals of the target speech frame and multiple speech frames preceding it, to obtain a first average zero-crossing rate, a second average zero-crossing rate and a third average zero-crossing rate. It then judges whether the original speech sampled signal corresponding to the target speech frame is a consonant signal according to whether the first, second and third average zero-crossing rates are each greater than or equal to their corresponding preset average zero-crossing rates. The first, second and third zero-crossing rates are the numbers of times the original speech sampled signal passes through a first preset value, a second preset value and a third preset value, respectively, within the target speech frame, where the second preset value is less than the first preset value and greater than the third preset value.
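Counting how often a frame passes through each of the three preset values can be sketched as follows. This is a hedged illustration: the function names are hypothetical, a crossing is assumed to be a strict sign change of (sample minus level) between consecutive samples, and samples lying exactly on the level are not counted.

```python
def crossing_count(samples, level):
    """Number of times consecutive samples pass through the given level."""
    count = 0
    for previous, current in zip(samples, samples[1:]):
        # A strict sign change of (sample - level) is treated as one crossing.
        if (previous - level) * (current - level) < 0:
            count += 1
    return count


def three_crossing_counts(samples, first_level, second_level, third_level):
    """Crossing counts for the first, second and third preset values."""
    if not (third_level < second_level < first_level):
        raise ValueError("expected third_level < second_level < first_level")
    return tuple(crossing_count(samples, level)
                 for level in (first_level, second_level, third_level))
```

For instance, `three_crossing_counts(frame, 1, 0, -1)` counts crossings of an upper level, a middle level and a lower level; these particular values are illustrative only.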
For the m-th speech frame, the original zero-crossing rate can be expressed as follows:
where N is a positive integer representing the number of sampled signals in the m-th speech frame, mL is an amplitude threshold value, and the remaining symbol denotes the original speech sampled signal in the m-th speech frame. The processing unit 104 can judge whether the original speech sampled signal is a consonant signal according to whether this zero-crossing rate is greater than or equal to a preset zero-crossing rate, for example according to the following formula:
The preset zero-crossing rate is not limited to 22; in some embodiments its value can also be adjusted according to the actual situation.
In addition, the processing unit 104 can also judge whether the original speech sampled signal is a consonant signal according to a zero-crossing rate of the original speech sampled signal that incorporates an energy condition. This zero-crossing rate can be expressed as follows:
where the term used in the above expression can be expressed by the following formula:
In the present embodiment the value of α_x is 0.5, but it is not limited thereto; in some embodiments its value can also be adjusted according to the actual situation. By adjusting the benchmark used to calculate the zero-crossing rate in this way, whether the original speech sampled signal is a consonant signal can be judged more accurately. The processing unit 104 can also judge whether the original speech sampled signal is a consonant signal according to the average zero-crossing rate of multiple speech frames. For example, for the m-th speech frame, the judgment can be made according to the average of its zero-crossing rate and those of the two most recent speech frames (namely the (m-1)-th and (m-2)-th speech frames), and the judgment formulas can be as follows:
As described in the above embodiments, the processing unit 104 can judge whether the original speech sampled signal is a consonant signal according to at least one of energy and zero-crossing rate; that is, the processing unit 104 can combine at least one of the conditions in the above formulas to judge whether the original speech sampled signal corresponding to the target speech frame is a consonant signal. For example, the processing unit 104 can judge whether formulas (5), (7), (9), (10), (13), (15), (20), (21) and (22) are satisfied simultaneously, and judge the original speech sampled signal corresponding to the target speech frame to be a consonant signal only if they are. As another example, the processing unit 104 can judge whether formulas (5), (7), (9), (11), (12), (13), (15), (20), (21) and (22) are satisfied simultaneously, and judge the original speech sampled signal corresponding to the target speech frame to be a consonant signal only if they are.
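The two example combinations can be sketched as a lookup over per-formula results. Treating the two sets as alternatives joined by a logical OR is an assumption made here for illustration; the text presents them as separate examples. The names `COMMON_FORMULAS` and `combined_decision` are hypothetical.

```python
# Formula numbers shared by both example combinations in the text.
COMMON_FORMULAS = (5, 7, 9, 13, 15, 20, 21, 22)


def combined_decision(results):
    """results maps a formula number to True/False for the target frame.

    A missing formula number is treated as not satisfied.
    """
    def all_hold(numbers):
        return all(results.get(number, False) for number in numbers)

    first_set = COMMON_FORMULAS + (10,)       # first example combination
    second_set = COMMON_FORMULAS + (11, 12)   # second example combination
    return all_hold(first_set) or all_hold(second_set)
```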
Figs. 2A to 2C show a flow diagram of the speech identifying method according to an embodiment of the invention; please refer to Figs. 2A to 2C. As can be seen from the above embodiments, the speech identifying method of the voice identification apparatus may include the following steps. First, low-pass filtering, bandpass filtering in a first consonant band and bandpass filtering in a second consonant band are performed on a speech signal to generate a low-pass filtered signal, a first bandpass filtered signal and a second bandpass filtered signal, respectively (step S202). Next, the speech signal, the low-pass filtered signal, the first bandpass filtered signal and the second bandpass filtered signal are divided into multiple speech frames (step S204), where each speech frame includes N sampled signals and N is a positive integer. Then, the energies of the sampled signals in the target speech frame are calculated to obtain an original speech sampled signal energy, a low-pass sampled signal energy, a first consonant band signal energy and a second consonant band signal energy (step S206). Afterwards, whether the original speech sampled signal corresponding to the target speech frame is noise is judged according to the ratio of the first consonant band signal energy to the second consonant band signal energy, the ratio of the first consonant band signal energy to the original speech sampled signal energy, and the ratio of the second consonant band signal energy to the original speech sampled signal energy (step S208). For example, it can be judged whether these three ratios each fall within a corresponding preset ratio range; if all three do, the original speech sampled signal of the target speech frame is a noise signal.
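Steps S204 to S208 can be sketched in Python as shown below. Several details are assumptions made for illustration: frames are non-overlapping, frame energy is taken as the sum of squared samples, the preset ratio ranges are inclusive and supplied by the caller, and all names are hypothetical. The filtering of step S202 is assumed to have been done beforehand.

```python
def split_into_frames(samples, n):
    """Step S204: split a signal into non-overlapping frames of n samples.

    Trailing samples that do not fill a whole frame are dropped.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


def frame_energy(frame):
    """Step S206: frame energy, assumed here to be the sum of squared samples."""
    return sum(x * x for x in frame)


def is_noise(e_m, eh1_m, eh2_m, band_range, first_range, second_range):
    """Step S208: noise when all three ratios fall inside their preset ranges.

    e_m, eh1_m, eh2_m are the original, first-band and second-band energies.
    Each range is a (lower, upper) pair; the actual values are not given here.
    """
    if e_m <= 0 or eh2_m <= 0:
        raise ValueError("energies used as divisors must be positive")
    checks = (
        (eh1_m / eh2_m, band_range),   # first band vs second band
        (eh1_m / e_m, first_range),    # first band vs original signal
        (eh2_m / e_m, second_range),   # second band vs original signal
    )
    return all(lower <= ratio <= upper for ratio, (lower, upper) in checks)
```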
Next, a second consonant band signal energy ratio is calculated from the second consonant band signal energy, the original speech sampled signal energy and the low-pass sampled signal energy, and whether the original speech sampled signal corresponding to the target speech frame is a consonant signal is judged according to at least one of the ratio of the low-pass sampled signal energy to the original speech sampled signal energy and the second consonant band signal energy ratio. As shown in Figs. 2A to 2C, the energy difference obtained by subtracting the low-pass sampled signal energy from the original speech sampled signal energy can be calculated first (step S210), and then the ratio of the second consonant band signal energy to the energy difference is calculated to obtain the second consonant band signal energy ratio (step S212). It is then judged whether the ratio of the low-pass sampled signal energy to the original speech sampled signal energy is less than the first preset ratio, or whether that ratio lies within the preset energy ratio range and the second consonant band signal energy ratio is greater than the second preset ratio (step S214). If the ratio of the low-pass sampled signal energy to the original speech sampled signal energy is not less than the first preset ratio and, in addition, either that ratio does not lie within the preset energy ratio range or the second consonant band signal energy ratio is not greater than the second preset ratio, the original speech sampled signal corresponding to the target speech frame is judged not to be a consonant signal (step S216).
Conversely, if the ratio of the low-pass sampled signal energy to the original speech sampled signal energy is less than the first preset ratio, or if that ratio lies within the preset energy ratio range and the second consonant band signal energy ratio is greater than the second preset ratio, the weighted average of the energies of the speech frames of multiple original speech sampled signals previously judged to be noise signals is calculated, to obtain a noise signal energy weighted average (step S218). It is then judged whether the original speech sampled signal energy corresponding to the target speech frame is greater than the noise signal energy weighted average (step S220), where the weight of the speech frame of each original speech sampled signal judged to be a noise signal can vary with the interval length between that speech frame and the target speech frame. If the original speech sampled signal energy corresponding to the target speech frame is not greater than the noise signal energy weighted average, the original speech sampled signal corresponding to the target speech frame is judged not to be a consonant signal (step S216).
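A distance-dependent weighted average of this kind can be sketched as follows. The text only says that the weight varies with the interval length; the inverse-distance default used here is a hypothetical choice, as are the function names.

```python
def distance_weighted_average(noise_frames, target_index, weight_fn=None):
    """Weighted average of energies from frames previously judged to be noise.

    noise_frames : iterable of (frame_index, energy) pairs
    target_index : index of the target speech frame
    weight_fn    : maps the frame distance to a weight; the default of
                   1 / distance is an assumption, not taken from the text
    """
    if weight_fn is None:
        weight_fn = lambda distance: 1.0 / distance
    weighted_sum = 0.0
    weight_total = 0.0
    for frame_index, energy in noise_frames:
        distance = target_index - frame_index
        if distance <= 0:
            raise ValueError("noise frames must precede the target frame")
        weight = weight_fn(distance)
        weighted_sum += weight * energy
        weight_total += weight
    if weight_total == 0:
        raise ValueError("no usable noise frames")
    return weighted_sum / weight_total


def exceeds_noise_average(e_m, noise_average):
    """Step S220: the frame stays a candidate only if E_m > weighted average."""
    return e_m > noise_average
```

Any other monotone weighting (for example an exponential decay) can be passed in through `weight_fn` without changing the rest of the sketch.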
Conversely, if the original speech sampled signal energy corresponding to the target speech frame is greater than the noise signal energy weighted average, the average of the ratios of low-pass sampled signal energy to original speech sampled signal energy for the target speech frame and multiple speech frames preceding it is calculated, to obtain a low-pass sampled signal energy proportion average (step S222). It is then judged whether the low-pass sampled signal energy proportion average is less than the preset average value (step S224). If the low-pass sampled signal energy proportion average is not less than the preset average value, the original speech sampled signal corresponding to the target speech frame is not a consonant signal (step S216). Conversely, if the low-pass sampled signal energy proportion average is less than the preset average value, the weighted average of the sums of the first consonant band signal energy and the second consonant band signal energy corresponding to the speech frames of multiple original speech sampled signals previously judged to be noise signals is calculated, to obtain a consonant band energy sum weighted average (step S226), where the weight of each such sum varies with the interval length between the corresponding original speech sampled signal judged to be a noise signal and the target speech frame. It is then judged whether the difference obtained by subtracting the low-pass sampled signal energy from the original speech sampled signal energy corresponding to the target speech frame is greater than the consonant band energy sum weighted average (step S228). If this difference is not greater than the consonant band energy sum weighted average, the original speech sampled signal corresponding to the target speech frame is not a consonant signal (step S216).
Conversely, if the difference obtained by subtracting the low-pass sampled signal energy from the original speech sampled signal energy corresponding to the target speech frame is greater than the consonant band energy sum weighted average, it is judged whether the original speech sampled signal energy is greater than or equal to the lower limit value (step S230). If the original speech sampled signal energy is not greater than or equal to the lower limit value, the original speech sampled signal corresponding to the target speech frame is not a consonant signal (step S216). Conversely, if the original speech sampled signal energy is greater than or equal to the lower limit value, the first zero-crossing rate, the second zero-crossing rate and the third zero-crossing rate of the original speech sampled signal are calculated, and the average zero-crossing rates of the original speech sampled signals of the target speech frame and multiple speech frames preceding it are calculated, to obtain a first average zero-crossing rate, a second average zero-crossing rate and a third average zero-crossing rate (step S232). The first, second and third zero-crossing rates are the numbers of times the original speech sampled signal passes through the first preset value, the second preset value and the third preset value, respectively, within the target speech frame, where the second preset value is less than the first preset value and greater than the third preset value. It is then judged whether the first, second and third average zero-crossing rates are each greater than or equal to their corresponding preset average zero-crossing rates (step S234). If the first, second and third average zero-crossing rates are not all greater than or equal to their corresponding preset average zero-crossing rates, the original speech sampled signal corresponding to the target speech frame is not a consonant signal (step S216).
Conversely, if the first, second and third average zero-crossing rates are each greater than or equal to their corresponding preset average zero-crossing rates, it is then judged whether the second zero-crossing rate is greater than or equal to the preset zero-crossing rate (step S236). If the second zero-crossing rate is not greater than or equal to the preset zero-crossing rate, the original speech sampled signal corresponding to the target speech frame is not a consonant signal (step S216). Conversely, if the second zero-crossing rate is greater than or equal to the preset zero-crossing rate, the original speech sampled signal corresponding to the target speech frame is a consonant signal (step S238).
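The decision cascade from step S214 to step S238 can be sketched as a chain of early exits over precomputed frame features. This is an illustrative sketch only: the class and field names are hypothetical, the preset energy ratio range is assumed to be inclusive, the frame energy is assumed to be positive, and the preset average zero-crossing rates must be supplied by the caller because the text does not state them here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FrameFeatures:
    energy: float                 # original speech sampled signal energy E_m
    low_pass_energy: float        # low-pass sampled signal energy EL_m
    second_band_energy: float     # second consonant band signal energy
    noise_energy_average: float   # result of step S218
    low_pass_ratio_average: float # result of step S222
    band_sum_average: float       # result of step S226
    average_zcr: Tuple[float, float, float]  # result of step S232
    second_zcr: float             # second zero-crossing rate of the frame


@dataclass
class Thresholds:
    preset_average_zcr: Tuple[float, float, float]  # not given in the text
    first_ratio: float = 0.5
    second_ratio: float = 1.3
    ratio_range: Tuple[float, float] = (0.5, 0.6)
    preset_average: float = 0.6
    lower_limit: float = 50.0
    preset_zcr: float = 22.0


def is_consonant_frame(f: FrameFeatures, t: Thresholds) -> bool:
    """Walk the flow of steps S214 to S238; f.energy is assumed positive."""
    difference = f.energy - f.low_pass_energy          # step S210
    ratio = f.low_pass_energy / f.energy
    in_range = t.ratio_range[0] <= ratio <= t.ratio_range[1]
    band_ok = in_range and f.second_band_energy / difference > t.second_ratio
    if not (ratio < t.first_ratio or band_ok):         # step S214
        return False
    if not f.energy > f.noise_energy_average:          # step S220
        return False
    if not f.low_pass_ratio_average < t.preset_average:  # step S224
        return False
    if not difference > f.band_sum_average:            # step S228
        return False
    if not f.energy >= t.lower_limit:                  # step S230
        return False
    if not all(a >= p for a, p in zip(f.average_zcr, t.preset_average_zcr)):
        return False                                   # step S234
    return f.second_zcr >= t.preset_zcr                # step S236
```

Every `return False` corresponds to reaching step S216, and the final `True` corresponds to step S238.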
In summary, the invention can combine at least one of the conditions in the above formulas to judge whether the original speech sampled signal corresponding to the target speech frame is a consonant signal, thereby improving the identification precision of consonant signals. For example, the judgment can be made according to at least one of the ratio of the low-pass sampled signal energy to the original speech sampled signal energy and the second consonant band signal energy ratio, which reduces the occurrence of original speech sampled signals being misjudged as consonant signals and thus improves the identification precision of consonant signals.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the invention.
Claims (24)
1. A voice identification apparatus, characterized by comprising:
a filter unit, which performs low-pass filtering, bandpass filtering in a first consonant band and bandpass filtering in a second consonant band on a speech signal, to generate a low-pass filtered signal, a first bandpass filtered signal and a second bandpass filtered signal, respectively; and
a processing unit, coupled to the filter unit, which divides the speech signal, the low-pass filtered signal, the first bandpass filtered signal and the second bandpass filtered signal into multiple speech frames, wherein each speech frame includes N sampled signals and N is a positive integer; calculates the energies of the sampled signals in a target speech frame to obtain an original speech sampled signal energy, a low-pass sampled signal energy, a first consonant band signal energy and a second consonant band signal energy; calculates a second consonant band signal energy ratio from the second consonant band signal energy, the original speech sampled signal energy and the low-pass sampled signal energy; and judges whether the original speech sampled signal corresponding to the target speech frame is a consonant signal according to at least one of the ratio of the low-pass sampled signal energy to the original speech sampled signal energy and the second consonant band signal energy ratio,
wherein the processing unit calculates the energy difference obtained by subtracting the low-pass sampled signal energy from the original speech sampled signal energy, and calculates the ratio of the second consonant band signal energy to the energy difference, to obtain the second consonant band signal energy ratio.
2. The voice identification apparatus according to claim 1, characterized in that the processing unit also judges whether the original speech sampled signal corresponding to the target speech frame is noise according to the ratio of the first consonant band signal energy to the second consonant band signal energy, the ratio of the first consonant band signal energy to the original speech sampled signal energy, and the ratio of the second consonant band signal energy to the original speech sampled signal energy.
3. The voice identification apparatus according to claim 2, characterized in that the processing unit also judges whether the ratio of the first consonant band signal energy to the second consonant band signal energy, the ratio of the first consonant band signal energy to the original speech sampled signal energy, and the ratio of the second consonant band signal energy to the original speech sampled signal energy each fall within a corresponding preset ratio range, and if all three ratios fall within their corresponding preset ratio ranges, the original speech sampled signal of the target speech frame is a noise signal.
4. The voice identification apparatus according to claim 1, characterized in that the processing unit also judges whether the original speech sampled signal corresponding to the target speech frame is a consonant signal according to whether the ratio of the low-pass sampled signal energy to the original speech sampled signal energy is less than a first preset ratio, and according to whether that ratio lies within a preset energy ratio range and whether the second consonant band signal energy ratio is greater than a second preset ratio.
5. The voice identification apparatus according to claim 4, characterized in that, if the ratio of the low-pass sampled signal energy to the original speech sampled signal energy is less than the first preset ratio, or if that ratio lies within the preset energy ratio range and the second consonant band signal energy ratio is greater than the second preset ratio, the processing unit also calculates the weighted average of the energies of multiple original speech sampled signals previously judged to be noise signals, to obtain a noise signal energy weighted average, and judges whether the original speech sampled signal corresponding to the target speech frame is a consonant signal according to whether the original speech sampled signal energy corresponding to the target speech frame is greater than the noise signal energy weighted average.
6. The voice identification apparatus according to claim 5, characterized in that the weight of the speech frame of each original speech sampled signal judged to be a noise signal varies with the interval length between that speech frame and the target speech frame.
7. The voice identification apparatus according to claim 5, characterized in that the processing unit also calculates the average of the ratios of low-pass sampled signal energy to original speech sampled signal energy for the target speech frame and multiple speech frames preceding the target speech frame, to obtain a low-pass sampled signal energy proportion average, and judges whether the original speech sampled signal corresponding to the target speech frame is a consonant signal according to whether the low-pass sampled signal energy proportion average is less than a preset average value.
8. The voice identification apparatus according to claim 7, characterized in that the processing unit also calculates the weighted average of the sums of the first consonant band signal energy and the second consonant band signal energy corresponding to the speech frames of multiple original speech sampled signals previously judged to be noise signals, to obtain a consonant band energy sum weighted average, and judges whether the original speech sampled signal corresponding to the target speech frame is a consonant signal according to whether the difference obtained by subtracting the low-pass sampled signal energy from the original speech sampled signal energy corresponding to the target speech frame is greater than the consonant band energy sum weighted average.
9. The voice identification apparatus according to claim 8, characterized in that the weight of the sum of the first consonant band signal energy and the second consonant band signal energy corresponding to the speech frame of each original speech sampled signal judged to be a noise signal varies with the interval length between that original speech sampled signal judged to be a noise signal and the target speech frame.
10. The voice identification apparatus according to claim 8, characterized in that the processing unit also judges whether the original speech sampled signal corresponding to the target speech frame is a consonant signal according to whether the original speech sampled signal energy is greater than or equal to a lower limit value.
11. The voice identification apparatus according to claim 10, characterized in that the processing unit also calculates a first zero-crossing rate, a second zero-crossing rate and a third zero-crossing rate of the original speech sampled signal, calculates the average zero-crossing rates of the original speech sampled signals of the target speech frame and multiple speech frames preceding the target speech frame, to obtain a first average zero-crossing rate, a second average zero-crossing rate and a third average zero-crossing rate, and judges whether the original speech sampled signal corresponding to the target speech frame is a consonant signal according to whether the first average zero-crossing rate, the second average zero-crossing rate and the third average zero-crossing rate are each greater than or equal to their corresponding preset average zero-crossing rates, wherein the first zero-crossing rate, the second zero-crossing rate and the third zero-crossing rate are the numbers of times the original speech sampled signal passes through a first preset value, a second preset value and a third preset value, respectively, within the target speech frame, and the second preset value is less than the first preset value and greater than the third preset value.
12. The voice identification apparatus according to claim 11, characterized in that the processing unit also judges whether the original speech sampled signal corresponding to the target speech frame is a consonant signal according to whether the second zero-crossing rate is greater than or equal to a preset zero-crossing rate.
13. A speech identifying method, characterized by comprising:
performing low-pass filtering, bandpass filtering in a first consonant band and bandpass filtering in a second consonant band on a speech signal, to generate a low-pass filtered signal, a first bandpass filtered signal and a second bandpass filtered signal, respectively;
dividing the speech signal, the low-pass filtered signal, the first bandpass filtered signal and the second bandpass filtered signal into multiple speech frames, wherein each speech frame includes N sampled signals and N is a positive integer;
calculating the energies of the sampled signals in a target speech frame, to obtain an original speech sampled signal energy, a low-pass sampled signal energy, a first consonant band signal energy and a second consonant band signal energy;
calculating the energy difference obtained by subtracting the low-pass sampled signal energy from the original speech sampled signal energy;
calculating the ratio of the second consonant band signal energy to the energy difference, to obtain a second consonant band signal energy ratio; and
judging whether the original speech sampled signal corresponding to the target speech frame is a consonant signal according to at least one of the ratio of the low-pass sampled signal energy to the original speech sampled signal energy and the second consonant band signal energy ratio.
14. The speech identifying method according to claim 13, characterized by further comprising:
judging whether the original speech sampled signal corresponding to the target speech frame is noise according to the ratio of the first consonant band signal energy to the second consonant band signal energy, the ratio of the first consonant band signal energy to the original speech sampled signal energy, and the ratio of the second consonant band signal energy to the original speech sampled signal energy.
15. The speech identifying method according to claim 14, characterized by further comprising:
judging whether the ratio of the first consonant band signal energy to the second consonant band signal energy, the ratio of the first consonant band signal energy to the original speech sampled signal energy, and the ratio of the second consonant band signal energy to the original speech sampled signal energy each fall within a corresponding preset ratio range; and
if all three ratios fall within their corresponding preset ratio ranges, the original speech sampled signal of the target speech frame is a noise signal.
16. speech identifying method according to claim 13, which is characterized in that further include:
According to the ratio of the low pass sampled signal energy and raw tone sampled signal energy ratio whether default less than one first
Whether the ratio of value and the low pass sampled signal energy and the raw tone sampled signal energy is located at a preset energy ratio
In range and whether the second consonant frequency band signals energy ratio is greater than one second default ratio, to judge to correspond to the target language
The raw tone sampled signal of sound frame whether supplemented by sound signal.
17. speech identifying method according to claim 16, which is characterized in that if the low pass sampled signal energy and the original
The ratio of beginning phonetic sampling signal energy is less than the first default ratio or the low pass sampled signal energy takes with the raw tone
The ratio of sample signal energy is located in the preset energy ratio range and the second consonant frequency band signals energy ratio is greater than this
Second default ratio, the speech identifying method further include:
The energy weighted average for calculating multiple raw tone sampled signals for being judged as noise signal before, is made an uproar with obtaining one
Acoustical signal energy weighted average;And
It is put down according to whether raw tone sampled signal energy corresponding to the target voice frame is greater than noise signal energy weighting
Mean value come judge raw tone sampled signal corresponding to the target voice frame whether supplemented by sound signal.
18. The speech identifying method according to claim 17, characterized in that the weight of the speech frame corresponding to each raw speech sampled signal judged to be a noise signal varies with the length of the interval between that speech frame and the target voice frame.
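The noise-floor comparison of claims 17 and 18 can be sketched as below. The claims say only that each weight changes with the distance between a noise frame and the target voice frame; the exponential weight `decay ** distance`, the function names, and the behavior when no noise history exists are assumptions chosen for this example.

```python
def noise_energy_weighted_average(noise_frames, target_index, decay=0.9):
    """Weighted average of the energies of earlier frames judged to be noise.

    noise_frames: list of (frame_index, energy) pairs for those frames.
    The weight decay ** distance is an assumed scheme, not taken from the claims.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for index, energy in noise_frames:
        distance = target_index - index
        if distance <= 0:
            continue  # only frames before the target frame are used
        weight = decay ** distance
        weighted_sum += weight * energy
        weight_total += weight
    return weighted_sum / weight_total if weight_total > 0 else None


def exceeds_noise_floor(target_energy, noise_frames, target_index, decay=0.9):
    """True when the target frame energy is above the weighted noise average."""
    average = noise_energy_weighted_average(noise_frames, target_index, decay)
    if average is None:
        return False  # assumption: with no noise history, make no positive decision
    return target_energy > average
```

With a decay below 1, recent noise frames dominate the average, so the threshold follows a slowly changing noise level.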
19. The speech identifying method according to claim 17, characterized by further comprising:
calculating the average of the ratios of the low-pass sampled signal energy to the raw speech sampled signal energy for the target voice frame and a plurality of speech frames preceding the target voice frame, to obtain a low-pass sampled signal energy ratio average; and
judging whether the raw speech sampled signal corresponding to the target voice frame is a consonant signal according to whether the low-pass sampled signal energy ratio average is less than a preset average value.
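A running average over the target voice frame and the frames before it can be sketched as follows. The class name, the window size, the preset average of 0.3, and the fallback ratio used for a zero-energy frame are all hypothetical; the claim specifies only that the ratios of several frames are averaged and compared with a preset average value.

```python
from collections import deque


class LowPassRatioAverager:
    """Running mean of the low-pass energy ratio over the most recent frames."""

    def __init__(self, window=5):
        # window counts the target frame plus the frames kept before it;
        # the size is a hypothetical choice.
        self.ratios = deque(maxlen=window)

    def update(self, e_lowpass, e_raw):
        """Add the current (target) frame and return the mean ratio."""
        # Assumed fallback: a zero-energy frame counts as fully low-pass.
        ratio = e_lowpass / e_raw if e_raw > 0 else 1.0
        self.ratios.append(ratio)
        return sum(self.ratios) / len(self.ratios)


def below_preset_average(mean_ratio, preset_average=0.3):
    """True when the averaged ratio is below the preset average value."""
    return mean_ratio < preset_average
```

Averaging over several frames smooths out a single frame whose low-pass ratio dips by chance, at the cost of a short delay in reacting to a real consonant onset.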
20. The speech identifying method according to claim 19, characterized by further comprising:
calculating a weighted average of the sums of the first consonant frequency band signal energy and the second consonant frequency band signal energy corresponding to the speech frames of a plurality of previous raw speech sampled signals that were judged to be noise signals, to obtain a consonant band energy sum weighted average; and
judging whether the raw speech sampled signal corresponding to the target voice frame is a consonant signal according to whether the difference obtained by subtracting the low-pass sampled signal energy from the raw speech sampled signal energy corresponding to the target voice frame is greater than the consonant band energy sum weighted average.
21. The speech identifying method according to claim 20, characterized in that the weight applied to the sum of the first consonant frequency band signal energy and the second consonant frequency band signal energy corresponding to the speech frame of each raw speech sampled signal judged to be a noise signal varies with the length of the interval between that speech frame and the target voice frame.
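The comparison in claims 20 and 21 can be sketched in the same spirit. The weight `1 / distance` is an assumed scheme (the claims require only that the weight vary with the distance to the target voice frame), and the function names and the tuple layout of `noise_history` are hypothetical.

```python
def consonant_band_sum_average(noise_history, target_index):
    """Weighted average of (band 1 energy + band 2 energy) over earlier noise frames.

    noise_history: list of (frame_index, e_band1, e_band2) for frames judged noise.
    Each frame is weighted by 1 / distance to the target frame (assumed scheme).
    """
    pairs = [(1.0 / (target_index - i), e1 + e2)
             for i, e1, e2 in noise_history if i < target_index]
    total = sum(w for w, _ in pairs)
    if total == 0:
        return None  # no earlier noise frames available
    return sum(w * s for w, s in pairs) / total


def high_band_exceeds_noise(e_raw, e_lowpass, noise_history, target_index):
    """Compare (raw energy - low-pass energy) with the weighted band-sum average."""
    average = consonant_band_sum_average(noise_history, target_index)
    if average is None:
        return False  # assumption: no decision without noise history
    return (e_raw - e_lowpass) > average
```

The difference between the raw energy and the low-pass energy approximates the energy above the low-pass cutoff, so the test asks whether the target frame carries more high-band energy than recent noise frames did.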
22. The speech identifying method according to claim 20, characterized by further comprising:
judging whether the raw speech sampled signal corresponding to the target voice frame is a consonant signal according to whether the raw speech sampled signal energy is greater than or equal to a lower limit value.
23. The speech identifying method according to claim 22, characterized by further comprising:
calculating a first zero-crossing rate, a second zero-crossing rate, and a third zero-crossing rate of the raw speech sampled signal, and calculating the average zero-crossing rates of the raw speech sampled signals of the target voice frame and a plurality of speech frames preceding the target voice frame, to obtain a first average zero-crossing rate, a second average zero-crossing rate, and a third average zero-crossing rate, wherein the first zero-crossing rate, the second zero-crossing rate, and the third zero-crossing rate are respectively the numbers of times the raw speech sampled signal crosses a first preset value, a second preset value, and a third preset value within the target voice frame, and the second preset value is less than the first preset value and greater than the third preset value; and
judging whether the raw speech sampled signal corresponding to the target voice frame is a consonant signal according to whether the first average zero-crossing rate, the second average zero-crossing rate, and the third average zero-crossing rate are each greater than or equal to their corresponding preset average zero-crossing rates.
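The three crossing counts and their averages can be sketched as follows. The function names, the example levels, and the rule that a sample lying exactly on a level does not count as a crossing are assumptions made for this illustration; the claim defines each rate only as the number of times the signal crosses the corresponding preset value within a frame.

```python
def level_crossings(samples, level):
    """Count how often consecutive samples lie on opposite sides of level."""
    count = 0
    for previous, current in zip(samples, samples[1:]):
        # A sample exactly on the level gives a product of 0 and is not counted.
        if (previous - level) * (current - level) < 0:
            count += 1
    return count


def three_level_crossings(samples, first_level, second_level, third_level):
    """Crossing counts for three levels with third < second < first."""
    if not third_level < second_level < first_level:
        raise ValueError("levels must satisfy third < second < first")
    return tuple(level_crossings(samples, lv)
                 for lv in (first_level, second_level, third_level))


def averages_meet_presets(per_frame_counts, preset_averages):
    """Check the three averaged counts against their preset averages.

    per_frame_counts: one (c1, c2, c3) tuple per frame, target frame included,
    so the list is expected to be non-empty.
    """
    n = len(per_frame_counts)
    averages = [sum(frame[k] for frame in per_frame_counts) / n for k in range(3)]
    return all(avg >= preset for avg, preset in zip(averages, preset_averages))
```

Counting crossings at a positive and a negative level in addition to the middle level helps distinguish a strong, rapidly oscillating signal from low-amplitude noise that only crosses the middle level.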
24. The speech identifying method according to claim 23, characterized by further comprising:
judging whether the raw speech sampled signal corresponding to the target voice frame is a consonant signal according to whether the second zero-crossing rate is greater than or equal to a preset zero-crossing rate.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510060494.7A CN105989835B (en) | 2015-02-05 | 2015-02-05 | Voice identification apparatus and speech identifying method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510060494.7A CN105989835B (en) | 2015-02-05 | 2015-02-05 | Voice identification apparatus and speech identifying method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105989835A (en) | 2016-10-05 |
CN105989835B (en) | 2019-08-13 |
Family
ID=57036220
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510060494.7A Active CN105989835B (en) | 2015-02-05 | 2015-02-05 | Voice identification apparatus and speech identifying method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105989835B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107331387A (en) * | 2017-06-29 | 2017-11-07 | 上海青声网络科技有限公司 | A kind of determination method and device of phonetic Chinese character fragment |
CN113038318B (en) * | 2019-12-25 | 2022-06-07 | 荣耀终端有限公司 | Voice signal processing method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100227097B1 (en) * | 1997-06-30 | 1999-10-15 | 전주범 | A voice recognition system and a controlling method using a delta-delta energy account |
EP1180764A1 (en) * | 2000-04-24 | 2002-02-20 | Lucent Technologies Inc. | Confidence score in decoded signal for speech recognition over wireless transmission channels |
CN102982801A (en) * | 2012-11-12 | 2013-03-20 | 中国科学院自动化研究所 | Phonetic feature extracting method for robust voice recognition |
Also Published As
Publication number | Publication date |
---|---|
CN105989835A (en) | 2016-10-05 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
TWI557728B (en) | Speech recognition apparatus and speech recognition method | |
EP3696814A1 (en) | Speech enhancement method and apparatus, device and storage medium | |
KR100770839B1 (en) | Method and apparatus for estimating harmonic information, spectrum information and degree of voicing information of audio signal | |
CN104464722B (en) | Voice activity detection method and apparatus based on time domain and frequency domain | |
JP2006079079A (en) | Distributed speech recognition system and its method | |
EP1229520A2 (en) | Silence insertion descriptor (sid) frame detection with human auditory perception compensation | |
CN105989834B (en) | Voice recognition device and voice recognition method | |
JP2010112996A (en) | Voice processing device, voice processing method and program | |
CN106653062A (en) | Spectrum-entropy improvement based speech endpoint detection method in low signal-to-noise ratio environment | |
JP2007041593A (en) | Method and apparatus for extracting voiced/unvoiced classification information using harmonic component of voice signal | |
JP4816711B2 (en) | Call voice processing apparatus and call voice processing method | |
CN105872910A (en) | Audio signal squeaking detection method | |
CN112017682B (en) | Single-channel voice simultaneous noise reduction and reverberation removal system | |
CN105989835B (en) | Voice identification apparatus and speech identifying method | |
CN106991998A (en) | The detection method of sound end under noise circumstance | |
WO2010086020A1 (en) | Audio signal quality prediction | |
TWI566242B (en) | Speech recognition apparatus and speech recognition method | |
CN110310669A (en) | A kind of method and device and readable storage medium storing program for executing detecting mute frame | |
CN106504760B (en) | Broadband ambient noise and speech Separation detection system and method | |
WO2016004757A1 (en) | Noise detection method and apparatus | |
CN110211596A (en) | One kind composing entropy cetacean whistle signal detection method based on Mel subband | |
CN105916090A (en) | Hearing aid system based on intelligent speech recognition technology | |
US7818168B1 (en) | Method of measuring degree of enhancement to voice signal | |
CN110689901B (en) | Voice noise reduction method and device, electronic equipment and readable storage medium | |
WO2017128910A1 (en) | Method, apparatus and electronic device for determining speech presence probability | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |