CN1760973A - Method of speech recognition based on qualitative mapping - Google Patents

Method of speech recognition based on qualitative mapping

Info

Publication number
CN1760973A
Authority
CN
China
Prior art keywords
similarity
pattern
memory
sample
oscillogram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2004100670847A
Other languages
Chinese (zh)
Inventor
方红峰
冯嘉礼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Maritime University filed Critical Shanghai Maritime University
Priority to CNA2004100670847A priority Critical patent/CN1760973A/en
Publication of CN1760973A publication Critical patent/CN1760973A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The method learns the image of the voice (the pattern) regarded as a set of points, and at recognition time integrates the features of the points into features of the whole pattern, so patterns can be learned and recognized quickly and with little difficulty. A degree-of-conversion function is used to obtain the similarity, which gives the generation and recognition processes a fuzziness mirroring the fuzziness of human thought. The method generates a concrete, visible memory pattern. Compared with the prior art, the invention needs neither a large sample set nor a structurally complex recognition system; for speech it has already learned, recognition is fast and the recognition rate is high.

Description

Speech recognition method based on qualitative mapping
Technical field
The present invention relates to a speech recognition method, and in particular to a speech recognition method based on qualitative mapping.
Background technology
In the prior art, speech recognition is usually performed with a dictation machine. A dictation machine uses the topology of a hidden Markov model (HMM) built on an acoustic model and a language model, and operates as large-vocabulary, speaker-independent, continuous speech recognition.
Also in the prior art, man-machine spoken dialogue is usually realized with interactive methods. The dialogue systems so built are typically restricted to a narrow domain with a limited vocabulary.
Both of these modes of speech recognition require a recognition system with a large sample set and a complex structure; their recognition speed is slow and their recognition rate is low.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a speech recognition method based on qualitative mapping, one that needs neither a large sample set nor a structurally complex recognition or dialogue system. The method is based on qualitative mapping and the degree-of-conversion function; for speech it has already learned, recognition is fast and the recognition rate is high.
To achieve this object, the technical scheme adopted by the present invention is: first convert the speech into a waveform diagram; then use the qualitative-mapping method to extract the pattern and recognize it as a whole; and during recognition use the degree-of-conversion function to obtain the similarity.
The qualitative mapping used for pattern extraction stands in contrast to quantitative mapping. The qualitative-mapping method treats the pattern as a whole for both learning and recognition. During learning, the waveform diagram is memorized as a whole in a three-dimensional vector, one dimension of which holds the file number. During recognition, the pattern to be recognized is treated as a whole and, following the quantity-to-quality transformation relation, a qualitative result is produced: the similarities are compared and the maximum similarity is taken as the result. A quantitative mapping, by contrast, is a quantity-to-quantity correspondence that yields a result for every quantity; using such a mapping in this kind of speech recognition is clearly impractical, and also unnecessary.
The main formula of qualitative mapping is usually written as

τ(x, (α_i, β_i)) = x ⊥ (α_i, β_i) = τ_i(x)

where

τ_i(x) = 1 if x ∈ (α_i, β_i), and 0 otherwise

is the truth value of the property p_i(o) corresponding to the quantity x. The operator ⊥ tests the relation "x ∈ (α_i, β_i)" (in the symbol "⊥", the horizontal stroke represents the interval (α_i, β_i) and the vertical stroke represents x lying within it); it is also called the quantity-quality transformation operator, or the qualitative operator of the quality feature (property). Since "x ∈ (α_i, β_i)" is equivalent to "α_i < x < β_i", the operator ⊥ can be realized with the simplest arithmetic comparisons ">" and "<" (or "≥" and "≤").
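The qualitative mapping thus reduces to a simple interval test. A minimal sketch in Python (the function name and the example values are illustrative):

```python
def tau(x, alpha, beta):
    """Qualitative mapping tau_i(x): truth value of the judgement
    'x lies in the qualitative criterion interval (alpha, beta)',
    realized with the plain comparisons '<' and '>'."""
    return 1 if alpha < x < beta else 0

# The quantity x is mapped to a qualitative result (1 or 0),
# not to another quantity.
print(tau(5.0, 0.0, 10.0))   # 1: inside the interval
print(tau(12.0, 0.0, 10.0))  # 0: outside
```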
The degree-of-conversion function exists because different quantities convert into a quality to different degrees, a difference that is prevalent in attribute quantity-to-quality conversion. The degree-of-conversion function is introduced precisely to capture this difference.
In general, the boundary points α_i and β_i of the qualitative criterion (α_i, β_i) correspond to two properties p_i(α_i), p_i(β_i) ∈ p_i(o) that convert most easily into other quality features (properties) p_j(o) or p_k(o); they may be called the critical properties of the class p_i(o). The midpoint ξ_i corresponds to the most stable property p_i(ξ_i), the one least likely to convert into another quality feature and the one that best embodies the essence of the class p_i(o); p_i(ξ_i) may therefore be called the intrinsic property of p_i(o), and ξ_i its intrinsic point. If

k_1(x) = (p_i(ξ_i) − p_i(x)) / (p_i(ξ_i) − p_i(α_i))   (for x < ξ_i)

and

k_2(x) = (p_i(x) − p_i(ξ_i)) / (p_i(β_i) − p_i(ξ_i))   (for x > ξ_i)

are the degrees to which the property p_i(x) corresponding to x departs from p_i(ξ_i), then

η_i(x) = −(1 − k_1(x))   for x < ξ_i
η_i(x) = 1 − k_2(x)      for x > ξ_i

can be called the degree to which p_i(x) approaches the intrinsic property p_i(ξ_i), i.e. the degree to which p_i(x) embodies its quality-feature class p_i(o) (or p_i(ξ_i)), or the degree of conversion into the quality-feature class p_i(o).
Definition of the degree-of-conversion function η_i(x): since k_1(x), k_2(x) ∈ [0, 1], we have η_i(x) ∈ [−1, 1]. The mapping η: X × Γ → [−1, 1] (where Γ is the set of neighbourhoods N(ξ_i, δ_i) of the intrinsic points) is called the degree function with which p_i(x) embodies its quality-feature class if, for every (x, N(ξ_i, δ_i)) ∈ X × Γ, there exists an η_i(x) ∈ [−1, 1] such that

η(x, ξ_i, δ_i) = |x − ξ_i| ⊥ δ_i = η_i(x).

The mathematical essence of the degree-of-conversion function η_i(x) is thus to compare |x − ξ_i| with a given limit δ_i: the size of the ratio |x − ξ_i| / δ_i reflects the difference between the property p_i(x) corresponding to the quantity x and the intrinsic property p_i(ξ_i).
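Taking p_i(x) = x and placing the intrinsic point ξ_i at the midpoint of the interval (both simplifying assumptions for illustration), the degree-of-conversion function can be sketched as:

```python
def eta(x, alpha, beta):
    """Degree-of-conversion function for the criterion (alpha, beta),
    with the intrinsic point xi taken as the interval midpoint and
    p_i(x) = x (simplifications for illustration).  |eta| is 1 at the
    intrinsic point and 0 at the boundaries; the sign records on which
    side of xi the quantity x lies."""
    xi = (alpha + beta) / 2.0
    if x < xi:
        k1 = (xi - x) / (xi - alpha)   # departure toward alpha, in [0, 1]
        return -(1.0 - k1)
    if x > xi:
        k2 = (x - xi) / (beta - xi)    # departure toward beta, in [0, 1]
        return 1.0 - k2
    return 1.0  # at xi the formula is two-sided; take the right-hand limit

print(eta(2.5, 0.0, 10.0))   # -0.5: halfway between alpha and xi
print(eta(7.5, 0.0, 10.0))   #  0.5: halfway between xi and beta
```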
Pattern recognition, as used here, means recognizing a set of patterns that share some common property. A pattern can be thought of as a description of the quantities and structure of some object, and a set of patterns with a common property is called a pattern class; pattern recognition in the present invention therefore means recognition of a pattern class.
The advantages of the speech recognition method of the present invention are significant.
As described above, the present invention is based on feature-pattern extraction and recognition by qualitative mapping (QM). This qualitative-mapping recognition method learns the image of the voice (the pattern) as a set of points and, at recognition time, integrates the features of the points into features of the pattern. Patterns can therefore be learned and recognized quickly: the learning difficulty is low and the speed is high. Using the degree-of-conversion function to obtain the similarity gives the generation and recognition processes a fuzziness, which mirrors the fuzziness inherent in the thinking process of the human brain. The method generates a concrete, visible memory pattern and does not need the large sample set or structurally complex recognition system required in the prior art. For speech it has already learned, the method is not only fast but also achieves a high recognition rate.
Description of drawings
Fig. 1 is a flow chart of the first step of the speech recognition method of the present invention.
Fig. 2 is a flow chart of the learning process, the second step of the speech recognition method of the present invention.
Fig. 3 is a flow chart of the recognition process, the third step of the speech recognition method of the present invention.
Embodiment
The method of the present invention is further described below with reference to the drawings.
As shown in Fig. 1, Fig. 2 and Fig. 3, the concrete steps of the method are:
<1> collect the speech signal and convert the speech pattern into a waveform diagram;
<2> learn from the set of points in the waveform diagram to obtain the memory pattern of the speech;
<3> recognize against the learned memory patterns: use the degree-of-conversion function to obtain the similarity between memory pattern and reference pattern, compare the similarities, and take the maximum similarity as the recognition result.
The flow of Fig. 1 is the first step described above: converting the speech pattern, i.e. the audio file, into a waveform diagram. First the sound channel is determined; if the voice file is stored in hexadecimal, it is read section by section in hexadecimal, and connecting the values read with line segments yields the waveform diagram.
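The patent does not fix the audio encoding; assuming the file holds raw signed little-endian 16-bit PCM samples (an assumption, since the text only says the file is read section by section in hexadecimal), the conversion to waveform points can be sketched as:

```python
import struct

def pcm_to_waveform(data, width=2):
    """Decode a byte stream of signed little-endian 16-bit PCM samples
    into a list of amplitudes; plotting (index, amplitude) and joining
    consecutive points with line segments gives the waveform diagram."""
    n = len(data) // width
    return list(struct.unpack('<%dh' % n, data[:n * width]))

# Two samples: bytes 00 01 -> 256, bytes ff ff -> -1.
print(pcm_to_waveform(b'\x00\x01\xff\xff'))  # [256, -1]
```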
The flow of Fig. 2 is the second step described above, i.e. the learning process. Learning means memorizing the waveform diagram as a whole in a three-dimensional vector, one dimension of which holds the file number. The concrete learning process is:
A. first build the three-dimensional array;
B. then read the sample diagram;
C. then collect the valid points in the sample diagram and assign them to the three-dimensional array;
D. then find the key points (the highs and lows);
E. then determine the centre row and the starting point;
F. finally record the memory sample pattern.
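Steps A–F can be sketched as follows; the data layout (a list of (file number, index, amplitude) triples plus key points, centre and start) is one illustrative reading of the patent's three-dimensional vector, not its exact format:

```python
def key_points(wave):
    """Local maxima and minima of a waveform -- the 'key points'
    (highs and lows) recorded in step D of the learning process."""
    return [(i, wave[i]) for i in range(1, len(wave) - 1)
            if (wave[i] > wave[i - 1] and wave[i] > wave[i + 1])
            or (wave[i] < wave[i - 1] and wave[i] < wave[i + 1])]

def learn(file_id, wave):
    """Sketch of steps A-F: memorize the waveform as a set of
    three-dimensional points (file number, sample index, amplitude),
    plus its key points, centre row and starting point.  The mean
    amplitude as 'centre row' is an assumption for illustration."""
    return {
        'points': [(file_id, i, a) for i, a in enumerate(wave)],
        'keys':   key_points(wave),
        'centre': sum(wave) / len(wave),
        'start':  wave[0],
    }

m = learn(1, [0, 3, 1, 4, 0])
print(m['keys'])   # [(1, 3), (2, 1), (3, 4)]
```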
The flow of Fig. 3 is the third step described above, i.e. the recognition process, which is:
A. first select the sample pattern to be recognized, and read the diagram according to the memory sample patterns recorded during learning;
B. then assign two dimensions (dropping the file-number dimension of the three);
C. then use the key-point-vector comparison method to compare the key-point vector of the reference sample pattern with that of each memory sample pattern;
D. if the key-point vectors of a memory sample pattern and the reference sample pattern reach the required number of match points, that memory pattern is the recognition result;
E. if there is no such match, preprocess the memory sample patterns with the degree-of-conversion function, then compute the similarity between each memory sample pattern and the reference sample pattern;
F. compare the similarities; the maximum similarity gives the recognition result.
The concrete process of recognition with the degree-of-conversion function is: first compute the amplitude multiple between the learned waveform diagrams and equalize their amplitudes; then compute the distance differences of the starting points and centre rows, and find the closest points and their weights; then find and assign the nearest match point of each point, and compute and assign the similarity; finally compare the similarities, and the maximum similarity is the recognition result.
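A toy version of this similarity comparison, with an ad-hoc point distance standing in for the patent's exact weighting (the tolerance delta and the combined index/amplitude distance are assumptions, not values given by the patent):

```python
def similarity(memory, reference, delta=3.0):
    """Scale the memory waveform to the reference amplitude (the
    'amplitude multiple'), then score each point by how close its
    nearest reference point is, using a conversion-degree-style
    weight 1 - d/delta clipped at 0."""
    scale = (max(reference) or 1) / (max(memory) or 1)
    scaled = [a * scale for a in memory]
    score = 0.0
    for i, a in enumerate(scaled):
        d = min(abs(a - r) + abs(i - j) for j, r in enumerate(reference))
        score += max(0.0, 1.0 - d / delta)
    return score / len(scaled)

def recognize(memory, references):
    """Compare against every reference pattern; the maximum
    similarity gives the recognition result."""
    return max(references, key=lambda name: similarity(memory, references[name]))

print(recognize([0, 2, 1], {'a': [0, 2, 1], 'b': [5, 0, 5]}))  # a
```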

Claims (4)

1. A speech recognition method based on qualitative mapping, characterized in that the speech is first converted into a waveform diagram; the qualitative-mapping method is then used to extract the pattern and recognize it as a whole; and during recognition the degree-of-conversion function is used to obtain the similarity.
2. The speech recognition method based on qualitative mapping according to claim 1, characterized in that the concrete method steps are:
<1> collect the speech signal and convert the speech pattern into a waveform diagram;
<2> learn from the set of points in the waveform diagram to obtain the memory pattern of the speech;
<3> recognize against the learned memory patterns: use the degree-of-conversion function to obtain the similarity between memory pattern and reference pattern, compare the similarities, and take the maximum similarity as the recognition result.
3. The speech recognition method based on qualitative mapping according to claim 1 or 2, characterized in that said learning memorizes the waveform diagram as a whole in a three-dimensional vector, one dimension of which holds the file number, and that the concrete learning process is:
A. first build the three-dimensional array;
B. then read the sample diagram;
C. then collect the valid points in the sample diagram and assign them to the three-dimensional array;
D. then find the key points (the highs and lows);
E. then determine the centre row and the starting point;
F. finally record the memory sample pattern.
4. The speech recognition method based on qualitative mapping according to claim 1 or 2, characterized in that said recognition process is:
A. first select the sample pattern to be recognized, and read the diagram according to the memory sample patterns recorded during learning;
B. then assign two dimensions (dropping the file-number dimension of the three);
C. then use the key-point-vector comparison method to compare the key-point vector of the reference sample pattern with that of each memory sample pattern;
D. if the key-point vectors of a memory sample pattern and the reference sample pattern reach the required number of match points, that memory pattern is the recognition result;
E. if there is no such match, preprocess the memory sample patterns with the degree-of-conversion function, then compute the similarity between each memory sample pattern and the reference sample pattern;
F. compare the similarities; the maximum similarity gives the recognition result.
CNA2004100670847A 2004-10-12 2004-10-12 Method of speech recognition based on qualitative mapping Pending CN1760973A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2004100670847A CN1760973A (en) 2004-10-12 2004-10-12 Method of speech recognition based on qualitative mapping


Publications (1)

Publication Number Publication Date
CN1760973A true CN1760973A (en) 2006-04-19

Family

ID=36707015

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2004100670847A Pending CN1760973A (en) 2004-10-12 2004-10-12 Method of speech recognition based on qualitative mapping

Country Status (1)

Country Link
CN (1) CN1760973A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106888392A (en) * 2017-02-14 2017-06-23 广东九联科技股份有限公司 A kind of Set Top Box automatic translation system and method
CN107393538A (en) * 2017-07-26 2017-11-24 上海与德通讯技术有限公司 Robot interactive method and system
CN108520752A (en) * 2018-04-25 2018-09-11 西北工业大学 A kind of method for recognizing sound-groove and device
CN108520752B (en) * 2018-04-25 2021-03-12 西北工业大学 Voiceprint recognition method and device


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication