CN106486133B

CN106486133B - Howling scene recognition method and device

Info

Publication number: CN106486133B
Application number: CN201510532929.3A
Authority: CN
Inventors: 徐绍君; 王亮; 鲜柯
Original assignee: Chengdu Dingqiao Communication Technology Co Ltd
Current assignee: Chengdu Dingqiao Communication Technology Co Ltd
Priority date: 2015-08-27
Filing date: 2015-08-27
Publication date: 2019-11-15
Anticipated expiration: 2035-08-27
Also published as: CN106486133A

Abstract

This application discloses a howling scene recognition method, comprising: for each speech frame in a detection window, determining according to a howling frame condition whether a howling feature is present and, if so, determining that the frame is a howling frame; and determining whether the current detection window satisfies a howling scene condition and, if so, deciding that the current scene is a howling scene, otherwise deciding that it is a non-howling scene. This application also discloses a howling scene recognition device. The disclosed technical solution can improve the accuracy of howling detection and thereby support subsequent howling suppression processing.

Description

Howling scene recognition method and device

Technical field

This application relates to the field of communication technology, and in particular to a howling scene recognition method and device.

Background

The voice services of industry terminals are mainly trunking mode, direct mode operation (DMO), and similar services, and these services mainly use loudspeaker playback. Because industry terminals mostly operate outdoors or in workshops with high ambient noise, a high volume is required, so the uplink and downlink volume gains of the terminal are usually set high. After the sound is amplified by the loop gain, its energy accumulates continuously and forms howling. Howling seriously disrupts normal use of the voice service and causes great discomfort to users, so recognizing howling scenes is of great importance.

However, solutions for howling scene recognition on industry terminals are still at an exploratory stage and remain immature. Most existing recognition schemes suffer from low efficiency and inaccurate recognition, which seriously degrades the overall performance of howling suppression.

Summary of the invention

This application provides a howling scene recognition method and device to improve the accuracy of howling detection.

The howling scene recognition method provided by this application comprises:

for each speech frame in a detection window, determining according to a howling frame condition whether a howling feature is present and, if so, determining that the frame is a howling frame;

determining whether the current detection window satisfies a howling scene condition and, if so, deciding that the current scene is a howling scene; otherwise, deciding that the current scene is a non-howling scene.

Preferably, determining according to the howling frame condition whether a howling feature is present comprises:

A. Determining whether the frequency of the peak point in the speech frame is greater than a set first threshold; if so, proceeding to step B; otherwise, ending the determination;

B. Denoting the position of the peak point as Po_peak, delimiting a Peak_window window of a set width centered on Po_peak, and delimiting a before_window window and an after_window window on the two sides of the Peak_window window, where the widths of the before_window and after_window windows may be the same as or different from that of the Peak_window window;

C. Determining whether the power at Po_peak and the mean power of the before_window and after_window windows satisfy:

If so, proceeding to step D; otherwise, ending the determination; where P_v is a preset value;

D. Determining whether the mean power of the Peak_window window and the mean power of the before_window and after_window windows satisfy:

If so, determining that a howling feature is present in the speech frame.

Preferably, the howling scene condition is: the number of howling frames in the detection window is greater than or equal to a set quantity threshold.

Preferably, the howling scene condition is divided into a howling scene condition under a long detection window mechanism and one under a short detection window mechanism, where the detection window of the long detection window mechanism is wider than that of the short detection window mechanism.

Preferably, the quantity threshold is proportional to the number of speech frames contained in the detection window, and the quantity threshold is less than or equal to the number of speech frames contained in the detection window.

This application also provides a howling scene recognition device, comprising a howling frame determination module and a howling scene determination module, wherein:

the howling frame determination module is configured to determine, for each speech frame in a detection window and according to a howling frame condition, whether a howling feature is present and, if so, to determine that the frame is a howling frame;

the howling scene determination module is configured to determine whether the current detection window satisfies a howling scene condition and, if so, to decide that the current scene is a howling scene; otherwise, to decide that the current scene is a non-howling scene.

As can be seen from the above technical solution, the howling scene recognition method and device provided by this application first determine, according to the howling frame condition, whether a howling feature is present in each speech frame in the detection window and, if so, determine that the frame is a howling frame. They then determine whether the current detection window satisfies the howling scene condition; if so, the current scene is decided to be a howling scene, and otherwise a non-howling scene. The technical solution of this application can effectively identify the speech features of howling and improve the accuracy of howling detection, thereby supporting subsequent howling suppression processing.

Brief description of the drawings

Fig. 1 is a flow diagram of a preferred howling scene recognition method of this application;

Fig. 2 is a schematic time-domain waveform in which howling is present;

Fig. 3 is a schematic frequency-domain waveform in which howling is present;

Fig. 4 is a schematic diagram of howling point determination in this application;

Fig. 5 is a schematic diagram of the structure of a preferred device of this application.

Detailed description of embodiments

To make the objects, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments.

Fig. 1 is a flow diagram of a preferred howling scene recognition method of this application. The method comprises:

First, for each speech frame in the detection window, determining according to the howling frame condition whether a howling feature is present and, if so, determining that the frame is a howling frame;

Then, determining whether the current detection window satisfies the howling scene condition and, if so, deciding that the current scene is a howling scene; otherwise, deciding that the current scene is a non-howling scene.

In general, the energy of howling is relatively concentrated in the time domain, saturation occurs, and the energy lies mainly in a relatively narrow frequency band, as shown by the elliptical region in Fig. 2. Fig. 3 shows two howling points. In Fig. 2, the horizontal axis is time in seconds and the vertical axis is power in mW; in Fig. 3, the horizontal axis is frequency in Hz and the vertical axis is power in dB. Based on the feature that the energy of a howling point is concentrated at one or a few frequency points, this application identifies howling frames and proposes that a howling frame must satisfy the following conditions:

(1) The frequency of the howling point is greater than a set threshold min_frequency.

(2) Within the Peak_window window centered on the howling point, the power of the howling point is the maximum; the position of the howling point is denoted Po_peak.

(3) The power of the howling point and the mean power of the before_window and after_window windows satisfy:

where P_d is a preset value; the recommended value is 10.

(4) The mean power of the Peak_window window and the mean power of the before_window and after_window windows satisfy:

The relationship between the Peak_window, before_window, and after_window windows is shown in Fig. 4. In the example of Fig. 4, the three windows have the same width, and the before_window and after_window windows lie on the two sides of the Peak_window window. In practice, the widths of the Peak_window, before_window, and after_window windows may be the same or different; the recommended width is 5 to 12 sampling points.

If the current speech frame satisfies the above conditions, it is determined that a howling point is present in the frame, and the frame can be judged to be a howling frame.
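The frequency-domain view of Fig. 3 and conditions (1) and (2) can be illustrated with a short sketch. The Python below is a minimal illustration under stated assumptions, not the implementation described in this application: it computes a per-frame power spectrum in dB with a naive DFT and takes the highest-power bin of the frame as the candidate howling point Po_peak. The function names, the 8 kHz sample rate, the 64-sample frame, and the choice of the global maximum as the peak point are all hypothetical.

```python
import cmath
import math


def power_spectrum_db(frame, sample_rate):
    """Return (freqs_hz, power_db) for the non-negative frequency bins of one frame."""
    n = len(frame)
    freqs_hz, power_db = [], []
    for k in range(n // 2 + 1):
        # Naive DFT of bin k; adequate for a sketch, an FFT would be used in practice.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        power = abs(acc) ** 2 / n
        freqs_hz.append(k * sample_rate / n)
        # The small constant avoids log10(0) for empty bins.
        power_db.append(10.0 * math.log10(power + 1e-12))
    return freqs_hz, power_db


def find_peak(power_db):
    """Index of the highest-power bin (assumed here to be the candidate Po_peak)."""
    return max(range(len(power_db)), key=power_db.__getitem__)


# Hypothetical input: a pure 1 kHz tone sampled at 8 kHz, standing in for a howling frame.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(64)]
freqs_hz, power_db = power_spectrum_db(frame, sample_rate)
po_peak = find_peak(power_db)
print(po_peak, freqs_hz[po_peak])
```

With such a spectrum in hand, condition (1) reduces to comparing `freqs_hz[po_peak]` against min_frequency.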

Based on the above howling frame condition, the detailed procedure for determining whether a howling feature is present in a speech frame is as follows:

A. Determine whether the frequency of the peak point in the speech frame is greater than the set first threshold (that is, min_frequency as described above); if so, proceed to step B; otherwise, end the determination;

B. Denote the position of the peak point as Po_peak, delimit a Peak_window window of a set width centered on Po_peak, and delimit a before_window window and an after_window window on the two sides of the Peak_window window, where the widths of the before_window and after_window windows may be the same as or different from that of the Peak_window window; the recommended width is 5 to 12 sampling points;

C. Determine whether the power at Po_peak and the mean power of the before_window and after_window windows satisfy:

If so, proceed to step D; otherwise, end the determination; where P_v is a preset value, and the recommended value is 5;

D. Determine whether the mean power of the Peak_window window and the mean power of the before_window and after_window windows satisfy:

If so, determine that a howling feature is present in the speech frame, that is, the speech frame is a howling frame.
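Steps A to D can be sketched in code, with one important caveat: the inequalities of steps C and D are not reproduced in this text, so the sketch below assumes a plausible form in which a power must exceed the mean power of the side windows by a margin in dB. The function and parameter names, the combined averaging over both side windows, the averaging of dB values, the rejection of peaks too close to the spectrum edge, and the mapping of the recommended values 10 and 5 to the two margins are all assumptions made for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def has_howling_feature(freqs_hz, power_db, min_frequency, width=5,
                        peak_margin_db=10.0, window_margin_db=5.0):
    """Return True if the spectrum passes steps A-D (assumed dB-margin form)."""
    # Step A: the peak point is taken to be the highest-power bin (assumption).
    po_peak = max(range(len(power_db)), key=power_db.__getitem__)
    if freqs_hz[po_peak] <= min_frequency:
        return False

    # Step B: Peak_window centred on Po_peak, with equal-width windows on both sides.
    half = width // 2
    peak_lo, peak_hi = po_peak - half, po_peak - half + width
    before_lo, after_hi = peak_lo - width, peak_hi + width
    if before_lo < 0 or after_hi > len(power_db):
        return False  # assumption: reject peaks whose windows leave the spectrum

    side_mean = mean(power_db[before_lo:peak_lo] + power_db[peak_hi:after_hi])

    # Step C (assumed form): the peak must stand out from the side windows.
    if power_db[po_peak] - side_mean < peak_margin_db:
        return False

    # Step D (assumed form): so must the mean power of Peak_window.
    return mean(power_db[peak_lo:peak_hi]) - side_mean >= window_margin_db


# Hypothetical spectrum: 33 bins spaced 125 Hz apart with a narrow peak at bin 8 (1 kHz).
freqs_hz = [k * 125.0 for k in range(33)]
power_db = [-60.0] * 33
power_db[6:11] = [-40.0, -20.0, 0.0, -20.0, -40.0]
print(has_howling_feature(freqs_hz, power_db, min_frequency=500.0))
```

The early returns mirror the "end the determination" branches of steps A and C, so the cheaper checks run first.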

In a howling scene, howling occurs continuously, and the howling feature is present in multiple consecutive speech frames; this is the time-domain characteristic described above. Based on analysis of this characteristic, this application proposes a sliding-window-based howling scene decision method that uses both a long detection window mechanism and a short detection window mechanism. The short detection window mechanism analyzes the probability of speech frames containing howling points within a short period to decide whether to enter the howling scene, and is mainly used to detect sudden, strong howling. The long detection window mechanism analyzes the probability of speech frames containing howling points within a long period to decide whether to enter the howling scene, and is mainly used to detect slowly varying howling.

The algorithm and processing of the long and short detection window mechanisms are essentially the same; they differ mainly in the threshold and the detection window size. The short detection window mechanism is taken as the example here. The short detection window uses a sliding window mechanism. Assuming the sliding window size is HORING_DURATION_SHORT, the sliding window contains the most recent HORING_DURATION_SHORT speech frames. This application first determines, for each of these HORING_DURATION_SHORT speech frames, whether it is a howling frame, and then determines whether the number of howling frames among them satisfies the following condition:

Number of valid howling speech frames ≥ PEAK_NUM_THD_SHORT

If the condition is satisfied, the howling scene is entered; otherwise, it is not. The quantity threshold PEAK_NUM_THD_SHORT is proportional to HORING_DURATION_SHORT and must satisfy PEAK_NUM_THD_SHORT ≤ HORING_DURATION_SHORT.
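The sliding-window decision can be sketched as a small counter over the most recent per-frame results. This is a minimal illustration, not the implementation of this application. HORING_DURATION_SHORT and PEAK_NUM_THD_SHORT are the names used in the text; the class name, the long-window constants (named by analogy), all numeric values, and the choice to combine the two mechanisms with a logical OR are assumptions.

```python
from collections import deque

# Hypothetical values; the text only requires threshold <= window size.
HORING_DURATION_SHORT, PEAK_NUM_THD_SHORT = 10, 8
HORING_DURATION_LONG, PEAK_NUM_THD_LONG = 50, 25  # names assumed by analogy


class HowlingSceneDetector:
    """Counts howling frames among the most recent `duration` speech frames."""

    def __init__(self, duration, peak_num_thd):
        if not 0 < peak_num_thd <= duration:
            raise ValueError("threshold must be in (0, duration]")
        self.peak_num_thd = peak_num_thd
        # A bounded deque drops the oldest frame automatically, which is the slide.
        self.flags = deque(maxlen=duration)

    def update(self, is_howling_frame):
        """Record one frame decision and return True if the scene condition holds."""
        self.flags.append(bool(is_howling_frame))
        return sum(self.flags) >= self.peak_num_thd


short_detector = HowlingSceneDetector(HORING_DURATION_SHORT, PEAK_NUM_THD_SHORT)
long_detector = HowlingSceneDetector(HORING_DURATION_LONG, PEAK_NUM_THD_LONG)


def in_howling_scene(is_howling_frame):
    """Feed one frame decision to both windows (OR-combination is an assumption)."""
    short_hit = short_detector.update(is_howling_frame)
    long_hit = long_detector.update(is_howling_frame)
    return short_hit or long_hit
```

Both detectors are updated on every frame before the results are combined, so neither window falls behind when the other one fires.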

Corresponding to the above method, this application also provides a howling scene recognition device, whose structure is shown in Fig. 5. It comprises a howling frame determination module and a howling scene determination module, whose functions are as described above.

The foregoing describes only preferred embodiments of this application and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of this application shall fall within its scope of protection.

Claims

1. A howling scene recognition method, characterized by comprising:

for each speech frame in a detection window, determining according to a howling frame condition whether a howling feature is present and, if so, determining that the speech frame is a howling frame;

determining whether the current detection window satisfies a howling scene condition and, if so, deciding that the current scene is a howling scene; otherwise, deciding that the current scene is a non-howling scene;

wherein determining according to the howling frame condition whether a howling feature is present comprises:

A. determining whether the frequency of the peak point in the speech frame is greater than a set first threshold; if so, proceeding to step B; otherwise, ending the determination;

B. denoting the position of the peak point as Po_peak, delimiting a Peak_window window of a set width centered on Po_peak, and delimiting a before_window window and an after_window window on the two sides of the Peak_window window, where the widths of the before_window and after_window windows are the same as or different from that of the Peak_window window;

C. determining whether the power at Po_peak and the mean power of the before_window and after_window windows satisfy:

if so, proceeding to step D; otherwise, ending the determination; where P_v is a preset value;

D. determining whether the mean power of the Peak_window window and the mean power of the before_window and after_window windows satisfy:

if so, determining that a howling feature is present in the speech frame.
2. The method according to claim 1, characterized in that:

the howling scene condition is: the number of howling frames in the detection window is greater than or equal to a set quantity threshold.

3. The method according to claim 2, characterized in that:

the howling scene condition is divided into a howling scene condition under a long detection window mechanism and one under a short detection window mechanism, where the detection window of the long detection window mechanism is wider than that of the short detection window mechanism.

4. The method according to claim 2, characterized in that:

the quantity threshold is proportional to the number of speech frames contained in the detection window, and the quantity threshold is less than or equal to the number of speech frames contained in the detection window.
5. A howling scene recognition device, characterized by comprising a howling frame determination module and a howling scene determination module, wherein:

the howling frame determination module is configured to determine, for each speech frame in a detection window and according to a howling frame condition, whether a howling feature is present and, if so, to determine that the frame is a howling frame;

the howling scene determination module is configured to determine whether the current detection window satisfies a howling scene condition and, if so, to decide that the current scene is a howling scene; otherwise, to decide that the current scene is a non-howling scene;

wherein determining according to the howling frame condition whether a howling feature is present comprises:

A. determining whether the frequency of the peak point in the speech frame is greater than a set first threshold; if so, proceeding to step B; otherwise, ending the determination;

B. denoting the position of the peak point as Po_peak, delimiting a Peak_window window of a set width centered on Po_peak, and delimiting a before_window window and an after_window window on the two sides of the Peak_window window, where the widths of the before_window and after_window windows are the same as or different from that of the Peak_window window;

C. determining whether the power at Po_peak and the mean power of the before_window and after_window windows satisfy:

if so, proceeding to step D; otherwise, ending the determination; where P_v is a preset value;

D. determining whether the mean power of the Peak_window window and the mean power of the before_window and after_window windows satisfy:

if so, determining that a howling feature is present in the speech frame.