CN1760973A - Method of speech recognition based on qualitative mapping - Google Patents

Method of speech recognition based on qualitative mapping

Info

Publication number
CN1760973A
Authority
CN
China
Prior art keywords
similarity
pattern
memory
sample
oscillogram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2004100670847A
Other languages
Chinese (zh)
Inventor
方红峰
冯嘉礼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Maritime University
Original Assignee
Shanghai Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Maritime University filed Critical Shanghai Maritime University
Priority to CNA2004100670847A priority Critical patent/CN1760973A/en
Publication of CN1760973A publication Critical patent/CN1760973A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

The method learns the image of the voice (the pattern) regarded as a set of points, and at recognition time integrates the features of the points into features of the whole pattern, so patterns can be learned and recognized quickly and with little difficulty. A degree-of-conversion function is used to obtain the similarity, which gives the generation and recognition processes a fuzziness mirroring the fuzziness of human thought. The method generates a concrete, visible memory pattern. Compared with the prior art, the invention needs neither a large sample set nor a structurally complex recognition system; for speech it has already learned, recognition is fast and the recognition rate is high.

Description

Speech recognition method based on qualitative mapping
Technical field
The present invention relates to a speech recognition method, and in particular to a speech recognition method based on qualitative mapping.
Background technology
In the prior art, speech recognition is usually performed with a dictation machine. A dictation machine uses the topology of a hidden Markov model (HMM) built on an acoustic model and a language model, and operates as large-vocabulary, speaker-independent, continuous speech recognition.
Also in the prior art, man-machine spoken dialogue is usually realized with interactive methods. The dialogue systems so built are typically restricted to a narrow domain with a limited vocabulary.
Both of these modes of speech recognition require a recognition system with a large sample set and a complex structure; their recognition speed is slow and their recognition rate is low.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art by providing a speech recognition method based on qualitative mapping, one that needs neither a large sample set nor a structurally complex recognition or dialogue system. The method is based on qualitative mapping and the degree-of-conversion function; for speech it has already learned, recognition is fast and the recognition rate is high.
To achieve this object, the technical scheme adopted by the present invention is: first convert the speech into a waveform diagram; then use the qualitative-mapping method to extract the pattern and recognize it as a whole; and during recognition use the degree-of-conversion function to obtain the similarity.
The qualitative mapping used for pattern extraction stands in contrast to quantitative mapping. The qualitative-mapping method treats the pattern as a whole for both learning and recognition. During learning, the waveform diagram is memorized as a whole in a three-dimensional vector, one dimension of which holds the file number. During recognition, the pattern to be recognized is treated as a whole and, following the quantity-to-quality transformation relation, a qualitative result is produced: the similarities are compared and the maximum similarity is taken as the result. A quantitative mapping, by contrast, is a quantity-to-quantity correspondence that yields a result for every quantity; using such a mapping in this kind of speech recognition is clearly impractical, and also unnecessary.
The main formula of qualitative mapping is usually written as

τ(x, (α_i, β_i)) = x ⊥ (α_i, β_i) = τ_i(x)

where

τ_i(x) = 1 if x ∈ (α_i, β_i), and 0 otherwise

is the truth value of the property p_i(o) corresponding to the quantity x. The operator ⊥ tests the relation "x ∈ (α_i, β_i)" (in the symbol "⊥", the horizontal stroke represents the interval (α_i, β_i) and the vertical stroke represents x lying within it); it is also called the quantity-quality transformation operator, or the qualitative operator of the quality feature (property). Since "x ∈ (α_i, β_i)" is equivalent to "α_i < x < β_i", the operator ⊥ can be realized with the simplest arithmetic comparisons ">" and "<" (or "≥" and "≤").
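The qualitative mapping thus reduces to a simple interval test. A minimal sketch in Python (the function name and the example values are illustrative):

```python
def tau(x, alpha, beta):
    """Qualitative mapping tau_i(x): truth value of the judgement
    'x lies in the qualitative criterion interval (alpha, beta)',
    realized with the plain comparisons '<' and '>'."""
    return 1 if alpha < x < beta else 0

# The quantity x is mapped to a qualitative result (1 or 0),
# not to another quantity.
print(tau(5.0, 0.0, 10.0))   # 1: inside the interval
print(tau(12.0, 0.0, 10.0))  # 0: outside
```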
The degree-of-conversion function exists because different quantities convert into a quality to different degrees, a difference that is prevalent in attribute quantity-to-quality conversion. The degree-of-conversion function is introduced precisely to capture this difference.
In general, the boundary points α_i and β_i of the qualitative criterion (α_i, β_i) correspond to two properties p_i(α_i), p_i(β_i) ∈ p_i(o) that convert most easily into other quality features (properties) p_j(o) or p_k(o); they may be called the critical properties of the class p_i(o). The midpoint ξ_i corresponds to the most stable property p_i(ξ_i), the one least likely to convert into another quality feature and the one that best embodies the essence of the class p_i(o); p_i(ξ_i) may therefore be called the intrinsic property of p_i(o), and ξ_i its intrinsic point. If

k_1(x) = (p_i(ξ_i) − p_i(x)) / (p_i(ξ_i) − p_i(α_i))   (for x < ξ_i)

and

k_2(x) = (p_i(x) − p_i(ξ_i)) / (p_i(β_i) − p_i(ξ_i))   (for x > ξ_i)

are the degrees to which the property p_i(x) corresponding to x departs from p_i(ξ_i), then

η_i(x) = −(1 − k_1(x))   for x < ξ_i
η_i(x) = 1 − k_2(x)      for x > ξ_i

can be called the degree to which p_i(x) approaches the intrinsic property p_i(ξ_i), i.e. the degree to which p_i(x) embodies its quality-feature class p_i(o) (or p_i(ξ_i)), or the degree of conversion into the quality-feature class p_i(o).
Definition of the degree-of-conversion function η_i(x): since k_1(x), k_2(x) ∈ [0, 1], we have η_i(x) ∈ [−1, 1]. The mapping η: X × Γ → [−1, 1] (where Γ is the set of neighbourhoods N(ξ_i, δ_i) of the intrinsic points) is called the degree function with which p_i(x) embodies its quality-feature class if, for every (x, N(ξ_i, δ_i)) ∈ X × Γ, there exists an η_i(x) ∈ [−1, 1] such that

η(x, ξ_i, δ_i) = |x − ξ_i| ⊥ δ_i = η_i(x).

The mathematical essence of the degree-of-conversion function η_i(x) is thus to compare |x − ξ_i| with a given limit δ_i: the size of the ratio |x − ξ_i| / δ_i reflects the difference between the property p_i(x) corresponding to the quantity x and the intrinsic property p_i(ξ_i).
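Taking p_i(x) = x and placing the intrinsic point ξ_i at the midpoint of the interval (both simplifying assumptions for illustration), the degree-of-conversion function can be sketched as:

```python
def eta(x, alpha, beta):
    """Degree-of-conversion function for the criterion (alpha, beta),
    with the intrinsic point xi taken as the interval midpoint and
    p_i(x) = x (simplifications for illustration).  |eta| is 1 at the
    intrinsic point and 0 at the boundaries; the sign records on which
    side of xi the quantity x lies."""
    xi = (alpha + beta) / 2.0
    if x < xi:
        k1 = (xi - x) / (xi - alpha)   # departure toward alpha, in [0, 1]
        return -(1.0 - k1)
    if x > xi:
        k2 = (x - xi) / (beta - xi)    # departure toward beta, in [0, 1]
        return 1.0 - k2
    return 1.0  # at xi the formula is two-sided; take the right-hand limit

print(eta(2.5, 0.0, 10.0))   # -0.5: halfway between alpha and xi
print(eta(7.5, 0.0, 10.0))   #  0.5: halfway between xi and beta
```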
Pattern recognition, as used here, means recognizing a set of patterns that share some common property. A pattern can be thought of as a description of the quantities and structure of some object, and a set of patterns with a common property is called a pattern class; pattern recognition in the present invention therefore means recognition of a pattern class.
The advantages of the speech recognition method of the present invention are significant.
As described above, the present invention is based on feature-pattern extraction and recognition by qualitative mapping (QM). This qualitative-mapping recognition method learns the image of the voice (the pattern) as a set of points and, at recognition time, integrates the features of the points into features of the pattern. Patterns can therefore be learned and recognized quickly: the learning difficulty is low and the speed is high. Using the degree-of-conversion function to obtain the similarity gives the generation and recognition processes a fuzziness, which mirrors the fuzziness inherent in the thinking process of the human brain. The method generates a concrete, visible memory pattern and does not need the large sample set or structurally complex recognition system required in the prior art. For speech it has already learned, the method is not only fast but also achieves a high recognition rate.
Description of drawings
Fig. 1 is a flow chart of the first step of the speech recognition method of the present invention.
Fig. 2 is a flow chart of the learning process, the second step of the speech recognition method of the present invention.
Fig. 3 is a flow chart of the recognition process, the third step of the speech recognition method of the present invention.
Embodiment
The method of the present invention is further described below with reference to the drawings.
As shown in Fig. 1, Fig. 2 and Fig. 3, the concrete steps of the method are:
<1> collect the speech signal and convert the speech pattern into a waveform diagram;
<2> learn from the set of points in the waveform diagram to obtain the memory pattern of the speech;
<3> recognize against the learned memory patterns: use the degree-of-conversion function to obtain the similarity between memory pattern and reference pattern, compare the similarities, and take the maximum similarity as the recognition result.
The flow of Fig. 1 is the first step described above: converting the speech pattern, i.e. the audio file, into a waveform diagram. First the sound channel is determined; if the voice file is stored in hexadecimal, it is read section by section in hexadecimal, and connecting the values read with line segments yields the waveform diagram.
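The patent does not fix the audio encoding; assuming the file holds raw signed little-endian 16-bit PCM samples (an assumption, since the text only says the file is read section by section in hexadecimal), the conversion to waveform points can be sketched as:

```python
import struct

def pcm_to_waveform(data, width=2):
    """Decode a byte stream of signed little-endian 16-bit PCM samples
    into a list of amplitudes; plotting (index, amplitude) and joining
    consecutive points with line segments gives the waveform diagram."""
    n = len(data) // width
    return list(struct.unpack('<%dh' % n, data[:n * width]))

# Two samples: bytes 00 01 -> 256, bytes ff ff -> -1.
print(pcm_to_waveform(b'\x00\x01\xff\xff'))  # [256, -1]
```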
The flow of Fig. 2 is the second step described above, i.e. the learning process. Learning means memorizing the waveform diagram as a whole in a three-dimensional vector, one dimension of which holds the file number. The concrete learning process is:
A. first build the three-dimensional array;
B. then read the sample diagram;
C. then collect the valid points in the sample diagram and assign them to the three-dimensional array;
D. then find the key points (the highs and lows);
E. then determine the centre row and the starting point;
F. finally record the memory sample pattern.
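Steps A–F can be sketched as follows; the data layout (a list of (file number, index, amplitude) triples plus key points, centre and start) is one illustrative reading of the patent's three-dimensional vector, not its exact format:

```python
def key_points(wave):
    """Local maxima and minima of a waveform -- the 'key points'
    (highs and lows) recorded in step D of the learning process."""
    return [(i, wave[i]) for i in range(1, len(wave) - 1)
            if (wave[i] > wave[i - 1] and wave[i] > wave[i + 1])
            or (wave[i] < wave[i - 1] and wave[i] < wave[i + 1])]

def learn(file_id, wave):
    """Sketch of steps A-F: memorize the waveform as a set of
    three-dimensional points (file number, sample index, amplitude),
    plus its key points, centre row and starting point.  The mean
    amplitude as 'centre row' is an assumption for illustration."""
    return {
        'points': [(file_id, i, a) for i, a in enumerate(wave)],
        'keys':   key_points(wave),
        'centre': sum(wave) / len(wave),
        'start':  wave[0],
    }

m = learn(1, [0, 3, 1, 4, 0])
print(m['keys'])   # [(1, 3), (2, 1), (3, 4)]
```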
The flow of Fig. 3 is the third step described above, i.e. the recognition process, which is:
A. first select the sample pattern to be recognized, and read the diagram according to the memory sample patterns recorded during learning;
B. then assign two dimensions (dropping the file-number dimension of the three);
C. then use the key-point-vector comparison method to compare the key-point vector of the reference sample pattern with that of each memory sample pattern;
D. if the key-point vectors of a memory sample pattern and the reference sample pattern reach the required number of match points, that memory pattern is the recognition result;
E. if there is no such match, preprocess the memory sample patterns with the degree-of-conversion function, then compute the similarity between each memory sample pattern and the reference sample pattern;
F. compare the similarities; the maximum similarity gives the recognition result.
The concrete process of recognition with the degree-of-conversion function is: first compute the amplitude multiple between the learned waveform diagrams and equalize their amplitudes; then compute the distance differences of the starting points and centre rows, and find the closest points and their weights; then find and assign the nearest match point of each point, and compute and assign the similarity; finally compare the similarities, and the maximum similarity is the recognition result.
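A toy version of this similarity comparison, with an ad-hoc point distance standing in for the patent's exact weighting (the tolerance delta and the combined index/amplitude distance are assumptions, not values given by the patent):

```python
def similarity(memory, reference, delta=3.0):
    """Scale the memory waveform to the reference amplitude (the
    'amplitude multiple'), then score each point by how close its
    nearest reference point is, using a conversion-degree-style
    weight 1 - d/delta clipped at 0."""
    scale = (max(reference) or 1) / (max(memory) or 1)
    scaled = [a * scale for a in memory]
    score = 0.0
    for i, a in enumerate(scaled):
        d = min(abs(a - r) + abs(i - j) for j, r in enumerate(reference))
        score += max(0.0, 1.0 - d / delta)
    return score / len(scaled)

def recognize(memory, references):
    """Compare against every reference pattern; the maximum
    similarity gives the recognition result."""
    return max(references, key=lambda name: similarity(memory, references[name]))

print(recognize([0, 2, 1], {'a': [0, 2, 1], 'b': [5, 0, 5]}))  # a
```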

Claims (4)

1. A speech recognition method based on qualitative mapping, characterized in that the speech is first converted into a waveform diagram; the qualitative-mapping method is then used to extract the pattern and recognize it as a whole; and during recognition the degree-of-conversion function is used to obtain the similarity.
2. The speech recognition method based on qualitative mapping according to claim 1, characterized in that the concrete method steps are:
<1> collect the speech signal and convert the speech pattern into a waveform diagram;
<2> learn from the set of points in the waveform diagram to obtain the memory pattern of the speech;
<3> recognize against the learned memory patterns: use the degree-of-conversion function to obtain the similarity between memory pattern and reference pattern, compare the similarities, and take the maximum similarity as the recognition result.
3. The speech recognition method based on qualitative mapping according to claim 1 or 2, characterized in that said learning memorizes the waveform diagram as a whole in a three-dimensional vector, one dimension of which holds the file number, and that the concrete learning process is:
A. first build the three-dimensional array;
B. then read the sample diagram;
C. then collect the valid points in the sample diagram and assign them to the three-dimensional array;
D. then find the key points (the highs and lows);
E. then determine the centre row and the starting point;
F. finally record the memory sample pattern.
4. The speech recognition method based on qualitative mapping according to claim 1 or 2, characterized in that said recognition process is:
A. first select the sample pattern to be recognized, and read the diagram according to the memory sample patterns recorded during learning;
B. then assign two dimensions (dropping the file-number dimension of the three);
C. then use the key-point-vector comparison method to compare the key-point vector of the reference sample pattern with that of each memory sample pattern;
D. if the key-point vectors of a memory sample pattern and the reference sample pattern reach the required number of match points, that memory pattern is the recognition result;
E. if there is no such match, preprocess the memory sample patterns with the degree-of-conversion function, then compute the similarity between each memory sample pattern and the reference sample pattern;
F. compare the similarities; the maximum similarity gives the recognition result.
CNA2004100670847A 2004-10-12 2004-10-12 Method of speech recognition based on qualitative mapping Pending CN1760973A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2004100670847A CN1760973A (en) 2004-10-12 2004-10-12 Method of speech recognition based on qualitative mapping


Publications (1)

Publication Number Publication Date
CN1760973A true CN1760973A (en) 2006-04-19

Family

ID=36707015

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2004100670847A Pending CN1760973A (en) 2004-10-12 2004-10-12 Method of speech recognition based on qualitative mapping

Country Status (1)

Country Link
CN (1) CN1760973A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106888392A (en) * 2017-02-14 2017-06-23 广东九联科技股份有限公司 A kind of Set Top Box automatic translation system and method
CN107393538A (en) * 2017-07-26 2017-11-24 上海与德通讯技术有限公司 Robot interactive method and system
CN108520752A (en) * 2018-04-25 2018-09-11 西北工业大学 A kind of method for recognizing sound-groove and device
CN108520752B (en) * 2018-04-25 2021-03-12 西北工业大学 Voiceprint recognition method and device


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication