CN106486133B - Howling scene recognition method and device - Google Patents
Howling scene recognition method and device
- Publication number
- CN106486133B (application CN201510532929.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
This application discloses a howling scene recognition method, comprising: for each speech frame in a detection window, judging according to a howling frame condition whether a howling feature is present and, if so, determining that the frame is a howling frame; judging whether the current detection window satisfies a howling scene condition and, if so, deciding that the current scene is a howling scene, otherwise deciding that the current scene is a non-howling scene. This application also discloses a howling scene recognition device. The disclosed technical solution can improve the accuracy of howling detection so as to support subsequent howling suppression processing.
Description
Technical field
This application relates to the field of communication technology, and in particular to a howling scene recognition method and device.
Background
The voice services of industry terminals are mainly trunking mode, direct mode operation (DMO), and similar services, and these services mainly use loudspeaker playback. Because industry terminals mostly operate outdoors or in workshops with high ambient noise, a high volume is required, so the uplink and downlink volume gains of the terminal are usually set high. After the sound is amplified by the loop gain, energy accumulates continuously and forms howling. Howling seriously affects normal use of the voice service and causes great discomfort to the user, so recognizing howling scenes is of great significance.
However, solutions for howling scene recognition on industry terminals are still exploratory and immature. Most recognition schemes suffer from low efficiency and inaccurate recognition, which seriously affects the overall performance of howling suppression.
Summary of the invention
This application provides a howling scene recognition method and device to improve the accuracy of howling detection.
The howling scene recognition method provided by this application comprises:
for each speech frame in a detection window, judging according to a howling frame condition whether a howling feature is present and, if so, determining that the frame is a howling frame;
judging whether the current detection window satisfies a howling scene condition and, if so, deciding that the current scene is a howling scene; otherwise, deciding that the current scene is a non-howling scene.
Preferably, judging according to the howling frame condition whether a howling feature is present comprises:
A. judging whether the frequency of the peak point in the speech frame is greater than a set first threshold; if so, continuing with B; otherwise, ending the judgment;
B. denoting the position of the peak point as Po_peak, defining a Peak_window window of a set width centered on Po_peak, and defining a before_window window and an after_window window on the two sides of the Peak_window window, where the widths of the before_window and after_window windows may be the same as or different from that of the Peak_window window;
C. judging whether the power of Po_peak and the mean power of the before_window and after_window windows satisfy:
if so, continuing with D; otherwise, ending the judgment; where Pv is a preset value;
D. judging whether the mean power of the Peak_window window and the mean power of the before_window and after_window windows satisfy:
if so, determining that a howling feature is present in the speech frame.
Preferably, the howling scene condition is: the number of howling frames in the detection window is greater than or equal to a set quantity threshold.
Preferably, the howling scene condition is divided into a howling scene condition under a long detection window mechanism and one under a short detection window mechanism, where the detection window of the long mechanism is wider than that of the short mechanism.
Preferably, the quantity threshold is proportional to the number of speech frames contained in the detection window, and the quantity threshold is less than or equal to the number of speech frames contained in the detection window.
This application also provides a howling scene recognition device, comprising a howling frame decision module and a howling scene decision module, where:
the howling frame decision module is configured to judge, for each speech frame in a detection window and according to a howling frame condition, whether a howling feature is present and, if so, to determine that the frame is a howling frame;
the howling scene decision module is configured to judge whether the current detection window satisfies a howling scene condition and, if so, to decide that the current scene is a howling scene; otherwise, to decide that the current scene is a non-howling scene.
As can be seen from the above technical solution, the howling scene recognition method and device provided by this application first judge, according to the howling frame condition, whether a howling feature is present in each speech frame in the detection window and, if so, determine that the frame is a howling frame; they then judge whether the current detection window satisfies the howling scene condition and, if so, decide that the current scene is a howling scene, otherwise a non-howling scene. The technical solution of this application can effectively identify the speech features of howling and improve the accuracy of howling detection, so as to support subsequent howling suppression processing.
Detailed description of the invention
Fig. 1 is a flowchart of a preferred howling scene recognition method of the present invention;
Fig. 2 is a schematic time-domain waveform in the presence of howling;
Fig. 3 is a schematic frequency-domain waveform in the presence of howling;
Fig. 4 is a schematic diagram of how the present invention judges a howling point;
Fig. 5 is a schematic structural diagram of a preferred device of the present invention.
Specific embodiment
To make the objects, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments.
Fig. 1 is a flowchart of a preferred howling scene recognition method of the present invention. The method comprises:
First, for each speech frame in the detection window, judging according to the howling frame condition whether a howling feature is present and, if so, determining that the frame is a howling frame;
Then, judging whether the current detection window satisfies the howling scene condition and, if so, deciding that the current scene is a howling scene; otherwise, deciding that the current scene is a non-howling scene.
In general, the energy of howling is relatively concentrated in the time domain and shows saturation, and it is mainly concentrated in a relatively narrow frequency band, as shown by the elliptical region in Fig. 2. Fig. 3 shows two howling points. In Fig. 2, the horizontal axis is time in seconds and the vertical axis is power in mW; in Fig. 3, the horizontal axis is frequency in Hz and the vertical axis is power in dB. Based on the feature that the energy of a howling point is concentrated in one to several frequency bins, this application identifies howling frames and proposes that a howling frame must satisfy the following conditions:
(1) The frequency of the howling point is greater than the set threshold min_frequency.
(2) The power of the howling point is the maximum within the Peak_window window centered on the howling point; the position of the howling point is denoted Po_peak.
(3) The power of the howling point and the mean power of the before_window and after_window windows satisfy:
where Pd is a preset value with a recommended value of 10.
(4) The mean power of the Peak_window window and the mean power of the before_window and after_window windows satisfy:
The relationship among the Peak_window, before_window, and after_window windows is shown in Fig. 4. In the example of Fig. 4, the three windows have the same width, and the before_window and after_window windows lie on the two sides of the Peak_window window. In practice, the widths of the three windows may be the same or different; the recommended width range is 5 to 12 sampling points.
If the current speech frame satisfies the above conditions, it is judged to contain a howling point and can be judged to be a howling frame.
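The window layout and condition (2) can be illustrated with a minimal Python sketch. It assumes, as in the Fig. 4 example, that the three windows share one width and are directly adjacent; the function names `window_slices` and `is_window_maximum`, the clipping at the spectrum edges, and the use of a NumPy power array indexed by frequency bin are illustrative assumptions, not part of the application.

```python
import numpy as np

def window_slices(po_peak, width, n_bins):
    """Return (before, peak, after) slices around bin index po_peak.

    Assumption: equal-width, adjacent windows (the Fig. 4 example),
    clipped to the valid bin range [0, n_bins).
    """
    half = width // 2
    peak_lo = max(po_peak - half, 0)
    peak_hi = min(peak_lo + width, n_bins)
    before = slice(max(peak_lo - width, 0), peak_lo)
    after = slice(peak_hi, min(peak_hi + width, n_bins))
    return before, slice(peak_lo, peak_hi), after

def is_window_maximum(power, po_peak, width):
    """Condition (2): the candidate bin has the largest power in Peak_window."""
    _, peak, _ = window_slices(po_peak, width, len(power))
    return bool(power[po_peak] >= power[peak].max())
```

With a width of 8 bins (inside the recommended 5 to 12 range) and Po_peak at bin 30, the sketch places Peak_window on bins 26 to 33, before_window on bins 18 to 25, and after_window on bins 34 to 41.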
Based on the above howling frame condition, the detailed procedure for judging whether a howling feature is present in a given speech frame is:
A. judging whether the frequency of the peak point in the speech frame is greater than the set first threshold (that is, min_frequency as described above); if so, continuing with B; otherwise, ending the judgment;
B. denoting the position of the peak point as Po_peak, defining a Peak_window window of a set width centered on Po_peak, and defining a before_window window and an after_window window on the two sides of the Peak_window window, where the widths of the before_window and after_window windows may be the same as or different from that of the Peak_window window, and the recommended width range is 5 to 12 sampling points;
C. judging whether the power of Po_peak and the mean power of the before_window and after_window windows satisfy:
if so, continuing with D; otherwise, ending the judgment; where Pv is a preset value with a recommended value of 5;
D. judging whether the mean power of the Peak_window window and the mean power of the before_window and after_window windows satisfy:
if so, determining that a howling feature is present in the speech frame, that is, the speech frame is a howling frame.
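Steps A to D can be sketched in Python as follows. The inequalities for steps C and D are not reproduced in the text above, so the ratio tests used here are assumptions, as are the choice of Pd for step C and Pv for step D, the 1000 Hz default for min_frequency, the FFT-based power spectrum, and the reading of the peak point as the maximum-power frequency bin. The function name `is_howling_frame` is hypothetical.

```python
import numpy as np

def is_howling_frame(frame, sample_rate, min_frequency=1000.0,
                     width=8, p_d=10.0, p_v=5.0):
    """Steps A-D on one speech frame (1-D array of samples).

    ASSUMPTIONS (not from the source): the min_frequency default, the
    ratio form of tests C and D, and which preset each test uses.
    """
    power = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    # A: the frequency of the peak bin must exceed the first threshold.
    po_peak = int(np.argmax(power))
    if freqs[po_peak] <= min_frequency:
        return False

    # B: Peak_window centred on Po_peak, with adjacent side windows.
    half = width // 2
    lo = max(po_peak - half, 0)
    hi = min(lo + width, len(power))
    before = power[max(lo - width, 0):lo]
    after = power[hi:hi + width]
    if before.size == 0 or after.size == 0:
        return False  # assumed: no decision without both side windows

    # Small constant avoids division-like blow-ups on silent neighbours.
    side_mean = (before.mean() + after.mean()) / 2.0 + 1e-12

    # C (assumed form): the peak power dominates the side windows.
    if power[po_peak] < p_d * side_mean:
        return False

    # D (assumed form): the Peak_window mean dominates the side windows.
    return bool(power[lo:hi].mean() >= p_v * side_mean)
```

The early returns mirror the "ending the judgment" branches of steps A and C, so later steps run only when the earlier ones pass.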
In a howling scene, howling is generated continuously, and the howling feature is present in multiple consecutive speech frames; this is the time-domain feature described above. Based on the analysis of this feature, this application proposes a sliding-window-based howling scene decision method that uses both a long detection window mechanism and a short detection window mechanism. The short detection window mechanism analyzes the proportion of speech frames containing a howling point within a short period to decide whether to enter the howling scene, and is mainly used to detect sudden strong howling. The long detection window mechanism analyzes the proportion of speech frames containing a howling point within a long period to decide whether to enter the howling scene, and is mainly used to detect slowly varying howling.
The algorithms and processing of the long and short detection window mechanisms are essentially the same; the main differences are the threshold and the size of the detection window. The short detection window mechanism is used as the example here. The short detection window uses a sliding window mechanism. Assume the sliding window size is HORING_DURATION_SHORT, so that the sliding window contains the most recent HORING_DURATION_SHORT speech frames. This application first judges, for each of these HORING_DURATION_SHORT speech frames, whether it is a howling frame, and then judges whether the number of howling frames among the HORING_DURATION_SHORT speech frames satisfies the following condition:
Number of valid howling speech frames ≥ PEAK_NUM_THD_SHORT
If the condition is satisfied, the howling scene is entered; otherwise, the howling scene is not entered. The quantity threshold PEAK_NUM_THD_SHORT is proportional to HORING_DURATION_SHORT and must satisfy PEAK_NUM_THD_SHORT ≤ HORING_DURATION_SHORT.
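The sliding-window decision can be sketched in Python as below. The class name `HowlingSceneDetector`, the proportionality constant `ratio`, the concrete window sizes (10 and 50 frames), and the rule that either mechanism alone may trigger the howling scene are all illustrative assumptions; the source only states that the threshold is proportional to the window size and does not exceed it.

```python
from collections import deque

class HowlingSceneDetector:
    """Sliding-window decision over per-frame howling flags."""

    def __init__(self, duration, ratio):
        # duration plays the role of HORING_DURATION_SHORT (or its long
        # counterpart); ratio is a hypothetical proportionality constant.
        if not 0.0 < ratio <= 1.0:
            raise ValueError("ratio must be in (0, 1]")
        self.flags = deque(maxlen=duration)
        # Threshold proportional to the window size, never larger than it.
        self.threshold = max(1, round(ratio * duration))

    def update(self, is_howling_frame):
        """Push the newest frame flag; return True if in a howling scene."""
        self.flags.append(bool(is_howling_frame))
        return sum(self.flags) >= self.threshold


# Hypothetical settings: short window for sudden strong howling,
# long window for slowly varying howling.
short = HowlingSceneDetector(duration=10, ratio=0.8)
long_ = HowlingSceneDetector(duration=50, ratio=0.5)

def in_howling_scene(flag):
    # Assumed combination: either mechanism may trigger the decision.
    # Both detectors are updated on every frame before combining.
    s = short.update(flag)
    l = long_.update(flag)
    return s or l
```

A bounded `deque` drops the oldest flag automatically, so each update keeps exactly the most recent frames in the window without manual index bookkeeping.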
Corresponding to the above method, this application also provides a howling scene recognition device whose structure is shown in Fig. 5. It comprises a howling frame decision module and a howling scene decision module, where:
the howling frame decision module is configured to judge, for each speech frame in a detection window and according to a howling frame condition, whether a howling feature is present and, if so, to determine that the frame is a howling frame;
the howling scene decision module is configured to judge whether the current detection window satisfies a howling scene condition and, if so, to decide that the current scene is a howling scene; otherwise, to decide that the current scene is a non-howling scene.
The above are merely preferred embodiments of this application and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of this application shall fall within its scope of protection.
Claims (5)
- 1. A howling scene recognition method, characterized by comprising: for each speech frame in a detection window, judging according to a howling frame condition whether a howling feature is present and, if so, determining that the speech frame is a howling frame; judging whether the current detection window satisfies a howling scene condition and, if so, deciding that the current scene is a howling scene, otherwise deciding that the current scene is a non-howling scene; wherein judging according to the howling frame condition whether a howling feature is present comprises: A. judging whether the frequency of the peak point in the speech frame is greater than a set first threshold; if so, continuing with B; otherwise, ending the judgment; B. denoting the position of the peak point as Po_peak, defining a Peak_window window of a set width centered on Po_peak, and defining a before_window window and an after_window window on the two sides of the Peak_window window, where the widths of the before_window and after_window windows are the same as or different from that of the Peak_window window; C. judging whether the power of Po_peak and the mean power of the before_window and after_window windows satisfy: if so, continuing with D; otherwise, ending the judgment; where Pv is a preset value; D. judging whether the mean power of the Peak_window window and the mean power of the before_window and after_window windows satisfy: if so, determining that a howling feature is present in the speech frame.
- 2. The method according to claim 1, characterized in that the howling scene condition is: the number of howling frames in the detection window is greater than or equal to a set quantity threshold.
- 3. The method according to claim 2, characterized in that the howling scene condition is divided into a howling scene condition under a long detection window mechanism and one under a short detection window mechanism, where the detection window of the long detection window mechanism is wider than that of the short detection window mechanism.
- 4. The method according to claim 2, characterized in that the quantity threshold is proportional to the number of speech frames contained in the detection window, and the quantity threshold is less than or equal to the number of speech frames contained in the detection window.
- 5. A howling scene recognition device, characterized by comprising a howling frame decision module and a howling scene decision module, where: the howling frame decision module is configured to judge, for each speech frame in a detection window and according to a howling frame condition, whether a howling feature is present and, if so, to determine that the frame is a howling frame; the howling scene decision module is configured to judge whether the current detection window satisfies a howling scene condition and, if so, to decide that the current scene is a howling scene, otherwise to decide that the current scene is a non-howling scene; wherein judging according to the howling frame condition whether a howling feature is present comprises: A. judging whether the frequency of the peak point in the speech frame is greater than a set first threshold; if so, continuing with B; otherwise, ending the judgment; B. denoting the position of the peak point as Po_peak, defining a Peak_window window of a set width centered on Po_peak, and defining a before_window window and an after_window window on the two sides of the Peak_window window, where the widths of the before_window and after_window windows are the same as or different from that of the Peak_window window; C. judging whether the power of Po_peak and the mean power of the before_window and after_window windows satisfy: if so, continuing with D; otherwise, ending the judgment; where Pv is a preset value; D. judging whether the mean power of the Peak_window window and the mean power of the before_window and after_window windows satisfy: if so, determining that a howling feature is present in the speech frame.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510532929.3A CN106486133B (en) | 2015-08-27 | 2015-08-27 | Howling scene recognition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106486133A (en) | 2017-03-08 |
CN106486133B (en) | 2019-11-15 |
Family
ID=58234495
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||