CN101609672A - Speech recognition semantic confidence feature extraction method and device - Google Patents


Info

Publication number
CN101609672A
CN101609672A · CN200910088867A · CNA2009100888676A
Authority
CN
China
Prior art keywords
word
recognition result
topic
anchor word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2009100888676A
Other languages
Chinese (zh)
Other versions
CN101609672B (en)
Inventor
陈伟
刘刚
郭军
国玉晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN2009100888676A priority Critical patent/CN101609672B/en
Publication of CN101609672A publication Critical patent/CN101609672A/en
Application granted granted Critical
Publication of CN101609672B publication Critical patent/CN101609672B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

An embodiment of the invention discloses a method for extracting semantic confidence features for speech recognition, comprising: performing inference on the speech recognition result with a topic model to obtain the topic structure of the recognition result; computing the topic distribution of each word from the inference result; selecting from the recognition result a number of words whose acoustic posterior probability exceeds a threshold and whose topicality is strong as anchor words (Anchor Words); computing the benchmark topic distribution of the whole recognition result from the topic distributions of the anchor words; and comparing the topic distribution of each word in the recognition result with the benchmark topic distribution, taking the similarity between them as the word's semantic confidence feature. A corresponding speech recognition semantic confidence feature extraction apparatus is also disclosed. The semantic confidence feature supplies high-level semantic guidance to confidence annotation, so that the speech recognition result can be described and analyzed more accurately and the precision of confidence annotation is improved.

Description

Speech recognition semantic confidence feature extraction method and device
Technical field
The present invention relates to the field of speech recognition, and in particular to a semantic confidence feature extraction method and device.
Background technology
Speech recognition confidence features are the key to evaluating the reliability of recognition results in speech recognition post-processing, and are mainly used to solve the speech recognition confidence annotation problem.
Confidence annotation generally labels each confidence unit in the recognition result as correct or incorrect based on one confidence feature or a combination of features, thereby estimating the reliability of the recognition result. The confidence unit is usually the word, but speech frames, phonemes, sentences, and so on can also be used.
At present, speech recognition confidence features are mainly derived from information inside the decoder. However, as Huang Zengyang notes in his 1998 book on HNC (Hierarchical Network of Concepts) theory, published by Tsinghua University Press, auditory experiments show that human auditory pre-processing captures only about 70% of the syllables in continuous speech; when pronunciation is ambiguous, people rely on grammatical and semantic knowledge to guide understanding. The effectiveness of speech recognition likewise depends on the disambiguation and error-correction ability of the post-processing system, so high-level information such as grammar and semantics is very important for speech recognition post-processing. However, extracting syntactic and semantic confidence features effectively in speech recognition post-processing remains relatively difficult.
In realizing the present invention, the inventors found at least the following problems in the prior art:
The confidence features extracted by existing methods all come from the decoder; the information source is too narrow, and such features cannot draw on high-level semantic information to guide the evaluation of recognition results effectively.
The present invention is based on statistical topic models (Statistical Topic Models). Given a recognition result, a topic model extracts the implicit, human-interpretable, and relatively stable latent semantic structure of the recognition result, providing a semantic-level description of it, from which the semantic features of words (or other confidence units) in the recognition result are extracted. Topic models include Latent Dirichlet Allocation (LDA), Probabilistic Latent Semantic Analysis (PLSA), and others.
Summary of the invention
In view of this, one or more embodiments of the present invention aim to provide a semantic confidence feature extraction method and device that enrich the information sources of confidence features with semantic knowledge, so that recognition results can be described and analyzed more accurately and the precision of confidence annotation is improved.
An embodiment of the invention provides a speech recognition semantic confidence feature extraction method, comprising:
performing inference on the speech recognition result with a topic model to obtain the topic structure of the recognition result;
computing the topic distribution of each word from the inference result; selecting from the recognition result a number of words whose acoustic posterior probability exceeds a threshold and whose topicality is strong as anchor words (Anchor Words); and computing the benchmark topic distribution of the whole recognition result from the topic distributions of the anchor words;
comparing the topic distribution of each word in the recognition result with the benchmark topic distribution of the recognition result, and taking the similarity between them as the word's semantic confidence feature.
An embodiment also discloses a speech recognition semantic confidence feature extraction apparatus, comprising:
a topic analysis device, configured to perform inference analysis on the recognition result with a topic model and obtain the topic structure of the recognition result;
a posterior probability generating device, configured to compute the acoustic posterior probability of each word in the recognition result from the detailed decoding information recorded during speech recognition;
a word topic distribution generating device, configured to compute the topic distribution of each word from the topic structure of the recognition result obtained by the topic analysis device;
a document benchmark topic distribution generating device, configured to determine the anchor words: based on the topic structure obtained by the topic analysis device and the acoustic posterior probabilities obtained by the posterior probability generating device, it selects from the recognition result a number of words whose acoustic posterior probability exceeds a threshold and whose topicality is strong as anchor words, and then computes the benchmark topic distribution of the whole recognition result from the topic distributions of the anchor words;
a semantic feature extraction device, configured to compare the topic distribution of each word in the recognition result with the benchmark topic distribution of the recognition result, taking the similarity between them as the word's semantic confidence feature.
Compared with the prior art, the speech recognition semantic confidence feature proposed by the embodiments of the invention supplies high-level semantic guidance to confidence annotation, so the recognition result can be described and analyzed more accurately and the precision of confidence annotation is improved.
Description of drawings
To illustrate the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a structural block diagram of an embodiment of the invention;
Fig. 2 is a flowchart of generating the benchmark topic distribution of the recognition result in an embodiment of the invention;
Fig. 2-1 is a flowchart of the anchor word search method of an embodiment of the invention;
Fig. 2-2 is a schematic diagram of how annotation precision varies with the anchor word search parameters, taking confidence annotation with the combination of the acoustic posterior probability and the semantic confidence feature of the invention as an example;
Fig. 3 is a block diagram of the semantic confidence feature extraction apparatus of an embodiment of the invention.
Embodiment
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art without creative effort based on the embodiments of the invention fall within the protection scope of the invention.
The semantic confidence feature extraction scheme provided by the embodiments of the invention rests on one basic premise: correctly recognized words in the recognition result conform to semantic rules better than misrecognized words. It is under this premise that the inventors conceived the embodiments of the invention.
In the embodiments of the invention, the semantic confidence feature extraction function can be divided as follows:
The first functional unit trains the topic model on a large document collection.
The second functional unit performs speech recognition, outputs the final recognition result, and records the entire decoding process in detail.
The third functional unit extracts the semantic confidence features of the words in the recognition result under the guidance of the information generated by the first and second functional units. It performs inference analysis on the speech recognition result with the topic model generated by the first functional unit to obtain the topic structure of the recognition result, and computes the acoustic posterior probability of each word in the recognition result from the detailed decoding information recorded by the second functional unit. Guided by this information, it computes the topic distribution of each word; selects from the recognition result a number of words whose acoustic posterior probability exceeds a threshold and whose topicality is strong as anchor words; computes the benchmark topic distribution of the whole recognition result from the topic distributions of the anchor words; and compares the topic distribution of each word with the benchmark topic distribution of the recognition result, taking the similarity between them as the word's semantic confidence feature.
It should be noted that the above division into functional modules is relative; it mainly helps those skilled in the art understand the principle of the invention as a whole. The embodiments of the invention can also realize the principle of the invention with other functional modules and combinations thereof and achieve the same technical effect, all without departing from the protection scope of the invention.
Fig. 1 shows a structural block diagram of an embodiment of the invention, comprising:
a first functional unit 101, a second functional unit 102, and a third functional unit 103, the third functional unit being connected to the first and second functional units respectively. The first functional unit 101 comprises a document collection 1011, a topic model training module 1012, and a topic model 1013; the second functional unit 102 comprises a speech data input module 1021, a speech recognition module 1022, a speech recognition result 1023, and speech recognition decoding information 1024; the third functional unit comprises a topic model analysis module 1031, a posterior probability generation module 1032, a word topic distribution generation module 1033, a document benchmark topic distribution generation module 1034, and a semantic feature extraction module 1035.
Taking LDA as an example, the topic model analysis module 1031 and the word topic distribution generation module 1033 are introduced below.
The LDA model is an unsupervised topic model, proposed in recent years, that can extract the latent topics of text. It is a generative probabilistic model with a three-level word-topic-document structure. Suppose the document collection used to train the LDA model contains M documents and V distinct words, and the number of LDA topics is K, i.e. $\vec{z} = (z_1, z_2, \ldots, z_K)$; the number of words in the current recognition result d is $N_d$, with word sequence $\vec{w} = (w_1, w_2, \ldots, w_{N_d})$.
The topic model analysis module 1031 obtains the topic structure of the current recognition result d through LDA inference, i.e. the probability of word w under a given topic j and the probability of topic j under the current recognition result d: $\Phi_j(w) = P(w \mid z = j)$ and $\theta_j(d) = P(z = j \mid d)$.
The word topic distribution generation module 1033 uses the information obtained by the topic model analysis module 1031 to compute the topic distribution $\mathrm{Topic\_dis}(w_i)$ of each word, where $w_i$ is a word in the recognition result d and $\mathrm{Topic\_dis}(w_i)$ is a K-dimensional vector, given by:
$\mathrm{Topic\_dis}(w_i) = (H(w_i, z_1), H(w_i, z_2), \ldots, H(w_i, z_K))$;
where
$H(w_i, z_j) = P(z_j \mid w_i) = \dfrac{P(w_i \mid z_j)\,P(z_j)}{P(w_i)} = \dfrac{\Phi_j(w_i)\,P(z_j)}{P(w_i)}$;
$P(z_j) = \sum_{i=1}^{M} P(z_j, d_i) = \sum_{i=1}^{M} P(z_j \mid d_i)\,P(d_i) = P(d) \sum_{i=1}^{M} \theta_j(d_i)$;
(Note: the prior probability of a document is taken as uniform, i.e. $P(d_i) = P(d)$, $i = 1 \ldots M$.)
$P(w_i) = \sum_{j=1}^{K} P(w_i, z_j) = \sum_{j=1}^{K} P(w_i \mid z_j)\,P(z_j) = \sum_{j=1}^{K} \Phi_j(w_i)\,P(z_j)$;
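The word topic distribution computation above can be sketched in a few lines of Python. The numbers used for the LDA outputs phi and theta below are hypothetical toy values, not data from the patent; the sketch only illustrates the Bayes'-rule step that turns them into Topic_dis(w).

```python
# Toy dimensions (hypothetical, for illustration only): K = 2 topics,
# V = 3 vocabulary words, M = 2 training documents.
K, V, M = 2, 3, 2

# phi[j][w] = P(w | z_j): per-topic word distributions from LDA inference.
phi = [[0.5, 0.3, 0.2],
       [0.1, 0.2, 0.7]]

# theta[i][j] = P(z_j | d_i): per-document topic distributions.
theta = [[0.8, 0.2],
         [0.4, 0.6]]

# P(z_j) = P(d) * sum_i theta_j(d_i), with a uniform document prior P(d) = 1/M.
p_z = [sum(theta[i][j] for i in range(M)) / M for j in range(K)]

# P(w) = sum_j Phi_j(w) * P(z_j).
p_w = [sum(phi[j][w] * p_z[j] for j in range(K)) for w in range(V)]

def topic_dis(w):
    """Topic_dis(w) = (H(w, z_1), ..., H(w, z_K)) with H(w, z_j) = P(z_j | w)."""
    return [phi[j][w] * p_z[j] / p_w[w] for j in range(K)]  # Bayes' rule

# Each Topic_dis(w) is a proper probability distribution over the K topics.
print(round(sum(topic_dis(0)), 10))  # 1.0
```

Because phi and theta are normalized, each Topic_dis(w) automatically sums to one, which is what makes the later similarity comparison against the benchmark distribution well defined.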
Taking LDA as an example, the method of the document benchmark topic distribution generation module 1034 in Fig. 1 is explained below with reference to Fig. 2 through Fig. 2-2.
Fig. 2 is a flowchart of the recognition result benchmark topic distribution generation module in an embodiment of the invention, comprising:
201. Perform topic model inference on the current recognition result to obtain its topic structure.
202. Search for anchor words in the recognition result using the inference result and the posterior probabilities. The words in the recognition result d should be consistent with the topic expressed by the document as a whole, but the topic distribution of d is determined mainly by a few strongly topical words in d. To compute the benchmark topic distribution of the recognition result, these decisive words, called anchor words (Anchor Words), must be found. Because the recognition result contains misrecognized words, the anchor word selection must first ensure that each anchor word is very likely to be correctly recognized, i.e. that its acoustic posterior probability is sufficiently large, and must also ensure that the anchor words are strongly topical. The concrete anchor word search method is shown in Fig. 2-1, the flowchart of the anchor word search method of an embodiment of the invention:
2021. Compute the acoustic posterior probability of each word in the recognition result from the detailed decoding information recorded during speech recognition.
2022. Set a posterior probability threshold, named PPThresh. When a word's posterior probability is greater than this threshold, add the word to the credible class, named CClass; otherwise discard it.
2023. Count the number of words in the credible class CClass, named C_num.
2024. Judge whether the credible class CClass contains any word, i.e. whether C_num is 0.
2025. If CClass contains no word, i.e. C_num equals 0, change the posterior probability threshold PPThresh and select words into the credible class again.
2026. If CClass contains words, i.e. C_num is not 0, compute $\mathrm{Topic\_dis}(w_i)$ for each word in CClass and record the maximum of the corresponding $H(w_i, z_j)$, i.e. $max\_prob(w_i) = \max_{j=1 \ldots K} H(w_i, z_j)$; this maximum measures the topicality of the word.
2027. Set the ratio of anchor words to choose, named Aratio. The number of anchor words is $L = INT(C\_num \times Aratio) + 1$, where INT() is the integer-truncation function. From CClass, select the L words with the largest $max\_prob(w_i)$ as the anchor words of the current document.
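Steps 2021-2027 can be sketched as a small Python function. All input data below is hypothetical; in particular, the factor 0.9 used to relax PPThresh when the credible class is empty is an assumption for illustration, since the patent only says the threshold is changed in that case.

```python
def find_anchor_words(words, posteriors, topic_dis, pp_thresh=0.9, aratio=0.5):
    """Anchor word search per steps 2021-2027 (illustrative sketch)."""
    # 2022: credible class CClass = words whose posterior exceeds PPThresh.
    cclass = [i for i, p in enumerate(posteriors) if p > pp_thresh]
    # 2024-2025: if CClass is empty, relax the threshold and retry
    # (the 0.9 decay factor is an assumption, not from the patent).
    while not cclass:
        pp_thresh *= 0.9
        cclass = [i for i, p in enumerate(posteriors) if p > pp_thresh]
    # 2026: max_prob(w_i) = max_j H(w_i, z_j), the word's topicality.
    max_prob = {i: max(topic_dis[i]) for i in cclass}
    # 2027: L = INT(C_num * Aratio) + 1; keep the L most topical words.
    L = int(len(cclass) * aratio) + 1
    ranked = sorted(cclass, key=lambda i: max_prob[i], reverse=True)
    return [words[i] for i in ranked[:L]]

# Hypothetical recognition result: five words with acoustic posteriors and
# topic distributions Topic_dis(w_i) over K = 3 topics.
words      = ["w0", "w1", "w2", "w3", "w4"]
posteriors = [0.95, 0.40, 0.91, 0.88, 0.99]
topic_dis  = [[0.7, 0.2, 0.1],
              [0.4, 0.3, 0.3],
              [0.1, 0.1, 0.8],
              [0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2]]

print(find_anchor_words(words, posteriors, topic_dis))  # ['w2', 'w0']
```

With PPThresh = 0.9, only w0, w2, and w4 enter the credible class; w2 and w0 then win on topicality, which mirrors the two-stage filter (posterior first, topicality second) described above.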
203. After the anchor words in the recognition result are found in step 202, collect their topic distributions. Suppose there are L anchor words, with the corresponding sequence $\vec{A} = (A_1, A_2, \ldots, A_L)$; the topic distribution of anchor word $A_i$ is then $\mathrm{Topic\_dis}(A_i)$, $i = 1 \ldots L$.
204. Compute the benchmark topic distribution of the recognition result d from the topic distributions of the anchor words, named $\mathrm{Topic\_dis}(d)$, a K-dimensional vector given by:
$\mathrm{Topic\_dis}(d) = (L(d, z_1), L(d, z_2), \ldots, L(d, z_K))$
where
$L(d, z_j) = Com(H(A_1, z_j), H(A_2, z_j), \ldots, H(A_L, z_j))$;
where Com() is a function that combines the probability values of the anchor words under a given topic. Using the arithmetic mean, for example,
$L(d, z_j) = \dfrac{1}{L} \sum_{i=1}^{L} H(A_i, z_j)$
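Step 204 with Com() as the arithmetic mean is a one-line average over the anchor words' topic distributions. The three anchor distributions below are hypothetical toy values:

```python
# Hypothetical topic distributions Topic_dis(A_i) of L = 3 anchor words
# over K = 3 topics (each row sums to 1).
anchor_dis = [[0.7, 0.2, 0.1],
              [0.5, 0.3, 0.2],
              [0.6, 0.3, 0.1]]

L, K = len(anchor_dis), len(anchor_dis[0])

# Com() as the arithmetic mean: L(d, z_j) = (1/L) * sum_i H(A_i, z_j).
benchmark = [sum(row[j] for row in anchor_dis) / L for j in range(K)]

print([round(x, 4) for x in benchmark])  # [0.6, 0.2667, 0.1333]
print(round(sum(benchmark), 10))         # 1.0
```

Averaging probability distributions yields another probability distribution, so Topic_dis(d) can be compared against each word's Topic_dis(w_i) with a divergence measure in the next step.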
Therefore, the semantic feature extraction module 1035 of Fig. 1 can compare the word topic distribution $\mathrm{Topic\_dis}(w_i)$ with the document benchmark topic distribution $\mathrm{Topic\_dis}(d)$ and take their similarity as the semantic confidence feature of the word in the recognition result, i.e.
$Sem(w_i) = Similarity(\mathrm{Topic\_dis}(w_i), \mathrm{Topic\_dis}(d))$
where $Sem(w_i)$ is the semantic confidence feature of word $w_i$. There are many ways to measure the similarity Similarity(), for example the symmetric K-L divergence:
Let M1 denote $\mathrm{Topic\_dis}(w_i)$ and M2 denote $\mathrm{Topic\_dis}(d)$.
The K-L divergence of M1 from M2, with M2 as the reference model, can then be defined as
$D_{KL}(M1 \| M2) = \sum_{j=1}^{K} H(w_i, z_j) \log \dfrac{H(w_i, z_j)}{L(d, z_j)}$
To avoid depending on the choice of reference model, the symmetric K-L divergence is used as the measure, so the semantic confidence feature of the word is the similarity
$Sem(w_i) = \dfrac{1}{2} \left\{ D_{KL}(M1 \| M2) + D_{KL}(M2 \| M1) \right\}$
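The symmetric K-L divergence above can be sketched directly from its definition. The benchmark and word distributions below are hypothetical; the sketch only demonstrates that a word whose topic distribution is close to the benchmark gets a smaller (more semantically consistent) Sem value:

```python
from math import log

def kl(p, q):
    """D_KL(p || q) = sum_j p_j * log(p_j / q_j)."""
    return sum(pj * log(pj / qj) for pj, qj in zip(p, q))

def sem(word_dis, benchmark_dis):
    """Sem(w_i) = 1/2 * (D_KL(M1 || M2) + D_KL(M2 || M1)): the symmetric
    K-L divergence between Topic_dis(w_i) (M1) and Topic_dis(d) (M2)."""
    return 0.5 * (kl(word_dis, benchmark_dis) + kl(benchmark_dis, word_dis))

# Hypothetical distributions over K = 3 topics.
benchmark = [0.6, 0.3, 0.1]
near_word = [0.5, 0.35, 0.15]
far_word  = [0.1, 0.1, 0.8]

print(sem(near_word, benchmark) < sem(far_word, benchmark))  # True
```

Note that as a divergence, Sem is zero for identical distributions and grows as the word drifts off-topic, so a downstream confidence annotator would treat small values as evidence of correct recognition.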
Fig. 2-2 is a schematic diagram of how annotation precision varies with the anchor word search parameters, taking confidence annotation with the combination of the acoustic posterior probability and the semantic confidence feature of the invention as an example.
Fig. 2-2 compares performing the anchor word search without an acoustic posterior probability threshold, i.e. PPThresh = 0, with using the threshold PPThresh = 0.88. Using PPThresh performs better, which proves that anchor word selection needs words that are very likely correctly recognized, i.e. words whose acoustic posterior probability exceeds the threshold. It can also be seen that when the threshold is used, annotation performance varies considerably with the anchor word selection ratio Aratio, which shows the necessity of tuning the Aratio parameter. This again shows that anchor word selection must first ensure that the anchor words are very likely correctly recognized, i.e. that their acoustic posterior probability is sufficiently large, and must also ensure that the anchor words are strongly topical; only then can high-performance semantic confidence features be extracted.
As shown in Fig. 3, an embodiment of the invention also provides a speech recognition semantic confidence feature extraction apparatus, comprising:
a topic analysis device 301, configured to perform inference analysis on the recognition result with a topic model and obtain the topic structure of the recognition result; that is, supposing the number of topics is K, i.e. $\vec{z} = (z_1, z_2, \ldots, z_K)$, it obtains the probability of word w under a given topic j and the probability of topic j under the current recognition result d: $\Phi_j(w) = P(w \mid z = j)$ and $\theta_j(d) = P(z = j \mid d)$;
a posterior probability generating device 302, configured to compute the acoustic posterior probability of each word in the recognition result from the detailed decoding information recorded during speech recognition;
a word topic distribution generating device 303, configured to compute the topic distribution $\mathrm{Topic\_dis}(w_i)$ of each word from the topic structure of the recognition result obtained by the topic analysis device 301, according to the formula
$\mathrm{Topic\_dis}(w_i) = (H(w_i, z_1), H(w_i, z_2), \ldots, H(w_i, z_K))$;
where
$H(w_i, z_j) = P(z_j \mid w_i) = \dfrac{P(w_i \mid z_j)\,P(z_j)}{P(w_i)} = \dfrac{\Phi_j(w_i)\,P(z_j)}{P(w_i)}$;
$P(z_j) = \sum_{i=1}^{M} P(z_j, d_i) = \sum_{i=1}^{M} P(z_j \mid d_i)\,P(d_i) = P(d) \sum_{i=1}^{M} \theta_j(d_i)$;
(Note: the prior probability of a document is taken as uniform, i.e. $P(d_i) = P(d)$, $i = 1 \ldots M$.)
$P(w_i) = \sum_{j=1}^{K} P(w_i, z_j) = \sum_{j=1}^{K} P(w_i \mid z_j)\,P(z_j) = \sum_{j=1}^{K} \Phi_j(w_i)\,P(z_j)$;
a document benchmark topic distribution generating device 304, configured to determine the anchor words: based on the topic structure of the recognition result obtained by the topic analysis device 301 and the acoustic posterior probabilities obtained by the posterior probability generating device 302, it selects from the recognition result a number of words whose acoustic posterior probability exceeds a threshold and whose topicality is strong as anchor words, and then computes the benchmark topic distribution of the whole recognition result from the topic distributions of the anchor words. Suppose there are L anchor words, with the corresponding sequence $\vec{A} = (A_1, A_2, \ldots, A_L)$, $i = 1 \ldots L$. The benchmark topic distribution of the recognition result d, named $\mathrm{Topic\_dis}(d)$, a K-dimensional vector, is computed from the topic distributions of the anchor words by the formula:
$\mathrm{Topic\_dis}(d) = (L(d, z_1), L(d, z_2), \ldots, L(d, z_K))$;
where
$L(d, z_j) = Com(H(A_1, z_j), H(A_2, z_j), \ldots, H(A_L, z_j))$;
where Com() is a function that combines the probability values of the anchor words under a given topic;
a semantic feature extraction device 305, configured to compare the topic distribution of each word in the recognition result with the benchmark topic distribution of the recognition result, taking the similarity between them as the word's semantic confidence feature, by the formula
$Sem(w_i) = Similarity(\mathrm{Topic\_dis}(w_i), \mathrm{Topic\_dis}(d))$
where $Sem(w_i)$ is the semantic confidence feature of word $w_i$ and
Similarity() is a similarity measurement method.
The apparatus embodiments of the invention have the same technical effects as the method embodiments and are not repeated here.
From the above description of the embodiments, those skilled in the art can clearly understand that the invention can be realized by software plus a necessary general-purpose hardware platform, or of course by hardware, although the former is the better implementation in many cases. Based on this understanding, the part of the technical solution of the invention that in essence contributes over the prior art can be embodied in the form of a software product. The computer software product is stored in a storage medium and comprises instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method described in each embodiment of the invention.
The embodiments of the invention described above do not limit the protection scope of the invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the invention shall be included in the protection scope of the invention.

Claims (11)

1. A speech recognition semantic confidence feature extraction method, characterized by comprising:
performing inference on the speech recognition result with a topic model to obtain the topic structure of the recognition result;
computing the topic distribution of each word from the inference result;
selecting from the recognition result a number of words whose acoustic posterior probability exceeds a threshold and whose topicality is strong as anchor words (Anchor Words), and then computing the benchmark topic distribution of the recognition result from the topic distributions of the anchor words;
comparing the topic distribution of each word in the recognition result with the benchmark topic distribution of the recognition result, and taking the similarity between them as the word's semantic confidence feature.
2. The method of claim 1, characterized in that performing inference on the speech recognition result with a topic model to obtain the topic structure of the recognition result comprises:
supposing the number of topics is K, i.e. $\vec{z} = (z_1, z_2, \ldots, z_K)$, obtaining through topic model inference the topic structure of the current recognition result d, i.e. the probability of word w under a given topic j and the probability of topic j under the current recognition result d: $\Phi_j(w) = P(w \mid z = j)$ and $\theta_j(d) = P(z = j \mid d)$.
3. The method of claim 1, characterized in that computing the topic distribution of each word from the inference result comprises:
using the information obtained by topic model inference to compute the topic distribution $\mathrm{Topic\_dis}(w_i)$ of each word, where $w_i$ is a word in the recognition result d and $\mathrm{Topic\_dis}(w_i)$ is a K-dimensional vector, given by:
$\mathrm{Topic\_dis}(w_i) = (H(w_i, z_1), H(w_i, z_2), \ldots, H(w_i, z_K))$;
where
$H(w_i, z_j) = P(z_j \mid w_i) = \dfrac{P(w_i \mid z_j)\,P(z_j)}{P(w_i)} = \dfrac{\Phi_j(w_i)\,P(z_j)}{P(w_i)}$;
$P(z_j) = \sum_{i=1}^{M} P(z_j, d_i) = \sum_{i=1}^{M} P(z_j \mid d_i)\,P(d_i) = P(d) \sum_{i=1}^{M} \theta_j(d_i)$;
(Note: the prior probability of a document is taken as uniform, i.e. $P(d_i) = P(d)$, $i = 1 \ldots M$.)
$P(w_i) = \sum_{j=1}^{K} P(w_i, z_j) = \sum_{j=1}^{K} P(w_i \mid z_j)\,P(z_j) = \sum_{j=1}^{K} \Phi_j(w_i)\,P(z_j)$.
4. The method of claim 1, characterized in that selecting from the recognition result a number of words whose acoustic posterior probability exceeds a threshold and whose topicality is strong as anchor words (Anchor Words), and then computing the benchmark topic distribution of the recognition result from the topic distributions of the anchor words, comprises:
determining the anchor words mainly through the following steps:
computing the acoustic posterior probability of each word in the recognition result from the detailed decoding information recorded during speech recognition;
setting a posterior probability threshold; when a word's posterior probability in the recognition result is greater than this threshold, adding the word to a credible class, and otherwise discarding it;
counting the number of words in the credible class, named C_num;
judging whether the credible class contains any word; if not, changing the posterior probability threshold and selecting words into the credible class again;
if the credible class contains words, computing $\mathrm{Topic\_dis}(w_i)$ for each word in the credible class and recording the maximum of the corresponding $H(w_i, z_j)$, i.e. $max\_prob(w_i) = \max_{j=1 \ldots K} H(w_i, z_j)$, this maximum measuring the topicality of the word;
setting the ratio Aratio of anchor words to choose, the number of anchor words being $L = INT(C\_num \times Aratio) + 1$, where INT() is the integer-truncation function, and selecting from the credible class the L words with the largest $max\_prob(w_i)$ as the anchor words of the current recognition result;
after the anchor words in the recognition result are obtained, computing the benchmark topic distribution of the recognition result mainly through the following steps:
collecting the topic distributions of the anchor words; supposing there are L anchor words, with the corresponding sequence $\vec{A} = (A_1, A_2, \ldots, A_L)$, the topic distribution of anchor word $A_i$ being $\mathrm{Topic\_dis}(A_i)$, $i = 1 \ldots L$;
Benchmark theme according to the theme Distribution calculation recognition result d of anchor point speech distributes, and called after Topic_dis (d) is the vector of K dimension, specifically formula as follows:
Topic_dis(d)=(L(d,z 1),L(d,z 2)...L(d,z K));
Wherein,
L(d,z j)=Com(H(A 1,z j),H(A 2,z j)...,H(A L,z j));
Wherein, the function of Com () for the probable value of each anchor point speech under j theme made up, the form of Com () has a lot, such as asking arithmetic mean value etc.
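As an illustration, taking Com() to be the arithmetic mean (one of the forms suggested above), the baseline topic distribution can be computed as follows; variable names are ours:

```python
def baseline_topic_distribution(anchor_topic_dists):
    """Combine the anchor words' topic distributions topic by topic:
    L(d, z_j) = Com(H(A_1, z_j), ..., H(A_L, z_j)), with Com = arithmetic mean."""
    num_anchors = len(anchor_topic_dists)    # L anchor words
    num_topics = len(anchor_topic_dists[0])  # K topics
    return [sum(dist[j] for dist in anchor_topic_dists) / num_anchors
            for j in range(num_topics)]
```

Because each anchor distribution sums to one and the mean is taken per topic, the result is again a valid K-dimensional distribution.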
5, the method for claim 1 is characterized in that, uses the theme of speech in the recognition result to distribute, relatively its with the benchmark theme distribution of recognition result between similarity, the semantic confidence feature as speech comprises:
By making word theme distribution Topic_dis (w i), the similarity between itself and the recognition result benchmark theme distribution Topic_dis (d) relatively is as the semantic confidence feature of speech in the recognition result, promptly
Sem(w i)=Similarity(Topic_dis(w i),Topic_dis(d))
Wherein, Sem (w i) be speech w iThe semantic confidence feature, Similarity () is the similarity measurement function, similarity measurement function commonly used has a lot, such as symmetrical K-L divergence etc.
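For illustration, using the symmetric K-L divergence mentioned above as the similarity measure, the semantic confidence feature could be computed roughly as follows; the smoothing term and the conversion of the divergence into a similarity-style score are our own choices, not prescribed by the patent:

```python
import math

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric K-L divergence: KL(p||q) + KL(q||p), with additive smoothing."""
    kl = lambda a, b: sum(x * math.log((x + eps) / (y + eps)) for x, y in zip(a, b))
    return kl(p, q) + kl(q, p)

def semantic_confidence(word_dist, baseline_dist):
    # Map the divergence to (0, 1]: identical distributions score 1.0,
    # distributions far from the baseline score close to 0.
    return 1.0 / (1.0 + symmetric_kl(word_dist, baseline_dist))
```

A word whose topic distribution is close to the baseline of the whole recognition result thus receives a high semantic confidence feature.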
6. A speech recognition semantic confidence feature extraction device, comprising:
a topic analysis unit, configured to apply a topic model to perform inference on the recognition result and obtain the topic structure of the recognition result;
a posterior probability generation unit, configured to compute the acoustic posterior probability of each word in the recognition result from the detailed decoding information recorded during speech recognition;
a word topic distribution generation unit, configured to compute the topic distribution of each word from the topic structure of the recognition result obtained by the topic analysis unit;
a document baseline topic distribution generation unit, configured to determine the anchor words, specifically to use the topic structure obtained by the topic analysis unit and the acoustic posterior probabilities obtained by the posterior probability generation unit to select from the recognition result a number of words whose acoustic posterior probability exceeds a threshold and whose topicality is strong as anchor words, and then to compute the baseline topic distribution of the recognition result from the topic distributions of the anchor words;
a semantic feature extraction unit, configured to compare the topic distribution of each word in the recognition result with the baseline topic distribution of the recognition result and take the similarity between them as the semantic confidence feature of the word.
7. The device of claim 6, wherein the topic analysis unit is configured to apply a topic model to perform inference on the recognition result and obtain its topic structure; that is, assuming the number of topics is K, i.e. z = (z_1, z_2, ..., z_K), it obtains the probability of word w under a given topic j and the probability of topic j under the current recognition result d: Φ_j(w) = P(w | z = j) and θ_j(d) = P(z = j | d).
8. The device of claim 6, wherein the posterior probability generation unit is configured to compute the acoustic posterior probability of each word in the recognition result from the detailed decoding information recorded during speech recognition.
9. The device of claim 6, wherein the word topic distribution generation unit is configured to compute the topic distribution Topic_dis(w_i) of each word from the topic structure of the recognition result obtained by the topic analysis unit, according to the formula
Topic_dis(w_i) = (H(w_i, z_1), H(w_i, z_2), ..., H(w_i, z_K));
where
H(w_i, z_j) = P(z_j | w_i) = P(w_i | z_j) * P(z_j) / P(w_i) = Φ_j(w_i) * P(z_j) / P(w_i);
P(z_j) = Σ_{i=1...M} P(z_j, d_i) = Σ_{i=1...M} P(z_j | d_i) * P(d_i) = P(d) * Σ_{i=1...M} θ_j(d_i);
(Note: the prior probability of each document is taken to be uniform, i.e. P(d_i) = P(d), i = 1...M.)
P(w_i) = Σ_{j=1...K} P(w_i, z_j) = Σ_{j=1...K} P(w_i | z_j) * P(z_j) = Σ_{j=1...K} Φ_j(w_i) * P(z_j).
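The Bayes computation of claim 9 can be sketched as follows, assuming `phi_w[j]` holds Φ_j(w) from the topic model and `theta[i][j]` holds θ_j(d_i) over M training documents; variable names are illustrative:

```python
def word_topic_distribution(phi_w, theta):
    """phi_w[j] = P(w | z_j); theta[i][j] = P(z_j | d_i).
    Returns (H(w, z_1), ..., H(w, z_K))."""
    m = len(theta)   # number of documents M
    k = len(phi_w)   # number of topics K
    # P(z_j) = P(d) * sum_i theta_j(d_i), with the uniform prior P(d) = 1/M.
    p_z = [sum(theta[i][j] for i in range(m)) / m for j in range(k)]
    # P(w) = sum_j Phi_j(w) * P(z_j)
    p_w = sum(phi_w[j] * p_z[j] for j in range(k))
    # H(w, z_j) = Phi_j(w) * P(z_j) / P(w)
    return [phi_w[j] * p_z[j] / p_w for j in range(k)]
```

Since P(w) is exactly the normalizer of the numerators, the returned vector is a proper distribution over topics.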
10. The device of claim 6, wherein the document baseline topic distribution generation unit is configured to determine the anchor words, specifically to use the topic structure of the recognition result obtained by the topic analysis unit and the acoustic posterior probabilities of the words obtained by the posterior probability generation unit to select from the recognition result a number of words whose acoustic posterior probability exceeds a threshold and whose topicality is strong as anchor words, and then to compute the baseline topic distribution of the whole recognition result from the topic distributions of the anchor words. Suppose there are L anchor words, forming the sequence A = (A_1, A_2, ..., A_L); the baseline topic distribution of the recognition result d, denoted Topic_dis(d), a K-dimensional vector, is computed from the topic distributions of the anchor words by the formula:
Topic_dis(d) = (L(d, z_1), L(d, z_2), ..., L(d, z_K));
where
L(d, z_j) = Com(H(A_1, z_j), H(A_2, z_j), ..., H(A_L, z_j));
and Com() is a function that combines the probability values of the anchor words under a given topic; Com() can take many forms, such as the arithmetic mean.
11. The device of claim 6, wherein the semantic feature extraction unit is configured to compare the topic distribution of each word in the recognition result with the baseline topic distribution of the recognition result and take the similarity between them as the semantic confidence feature of the word, specifically by the formula
Sem(w_i) = Similarity(Topic_dis(w_i), Topic_dis(d))
where Sem(w_i) is the semantic confidence feature of word w_i, and Similarity() is a similarity measurement function; many common similarity measures can be used, such as the symmetric K-L divergence.
CN2009100888676A 2009-07-21 2009-07-21 Speech recognition semantic confidence feature extraction method and device Expired - Fee Related CN101609672B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100888676A CN101609672B (en) 2009-07-21 2009-07-21 Speech recognition semantic confidence feature extraction method and device


Publications (2)

Publication Number Publication Date
CN101609672A true CN101609672A (en) 2009-12-23
CN101609672B CN101609672B (en) 2011-09-07

Family

ID=41483397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100888676A Expired - Fee Related CN101609672B (en) 2009-07-21 2009-07-21 Speech recognition semantic confidence feature extraction method and device

Country Status (1)

Country Link
CN (1) CN101609672B (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9263042B1 (en) * 2014-07-25 2016-02-16 Google Inc. Providing pre-computed hotword models

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1123863C (en) * 2000-11-10 2003-10-08 清华大学 Information check method based on speed recognition
CN1223985C (en) * 2002-10-17 2005-10-19 中国科学院声学研究所 Phonetic recognition confidence evaluating method, system and dictation device therewith
CN101013421B (en) * 2007-02-02 2012-06-27 清华大学 Rule-based automatic analysis method of Chinese basic block
CN101030369B (en) * 2007-03-30 2011-06-29 清华大学 Built-in speech discriminating method based on sub-word hidden Markov model

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894549A (en) * 2010-06-24 2010-11-24 中国科学院声学研究所 Method for fast calculating confidence level in speech recognition application field
CN103177721A (en) * 2011-12-26 2013-06-26 中国电信股份有限公司 Voice recognition method and system
CN103177721B (en) * 2011-12-26 2015-08-19 中国电信股份有限公司 Audio recognition method and system
CN103700368A (en) * 2014-01-13 2014-04-02 联想(北京)有限公司 Speech recognition method, speech recognition device and electronic equipment
CN103700368B (en) * 2014-01-13 2017-01-18 联想(北京)有限公司 Speech recognition method, speech recognition device and electronic equipment
CN105529028A (en) * 2015-12-09 2016-04-27 百度在线网络技术(北京)有限公司 Voice analytical method and apparatus
CN107195299A (en) * 2016-03-14 2017-09-22 株式会社东芝 Train the method and apparatus and audio recognition method and device of neutral net acoustic model
CN109389983A (en) * 2017-08-10 2019-02-26 奥迪股份公司 For handling the method and switching equipment of the recognition result of automatic online-speech recognition device of mobile terminal device
CN109389983B (en) * 2017-08-10 2023-07-07 奥迪股份公司 Method for processing recognition results of an automatic online voice recognizer of a mobile terminal and switching device
WO2022121257A1 (en) * 2020-12-11 2022-06-16 平安科技(深圳)有限公司 Model training method and apparatus, speech recognition method and apparatus, device, and storage medium
CN115376499A (en) * 2022-08-18 2022-11-22 东莞市乐移电子科技有限公司 Learning monitoring means applied to intelligent earphone in learning field
CN115376499B (en) * 2022-08-18 2023-07-28 东莞市乐移电子科技有限公司 Learning monitoring method of intelligent earphone applied to learning field

Also Published As

Publication number Publication date
CN101609672B (en) 2011-09-07

Similar Documents

Publication Publication Date Title
CN101609672B (en) Speech recognition semantic confidence feature extraction method and device
Chung et al. Speech2vec: A sequence-to-sequence framework for learning word embeddings from speech
CN103400577B (en) The acoustic model method for building up of multilingual speech recognition and device
Chen et al. Structure-aware abstractive conversation summarization via discourse and action graphs
CN110457432B (en) Interview scoring method, interview scoring device, interview scoring equipment and interview scoring storage medium
CN106328147A (en) Speech recognition method and device
Tran et al. Parsing speech: a neural approach to integrating lexical and acoustic-prosodic information
EP3594940B1 (en) Training method for voice data set, computer device and computer readable storage medium
Potash et al. Towards debate automation: a recurrent model for predicting debate winners
CN105374352A (en) Voice activation method and system
Kim et al. Gated embeddings in end-to-end speech recognition for conversational-context fusion
Van Dalen et al. Improving multiple-crowd-sourced transcriptions using a speech recogniser
Ganesan et al. N-best ASR transformer: Enhancing SLU performance using multiple ASR hypotheses
CN103559289A (en) Language-irrelevant keyword search method and system
Bowen Word order detection in English classroom teaching based on improved genetic algorithm of block coding
CN113779190B (en) Event causal relationship identification method, device, electronic equipment and storage medium
CN113609264B (en) Data query method and device for power system nodes
Dan et al. Enhancing class understanding via prompt-tuning for zero-shot text classification
Mitra et al. Feature fusion for high-accuracy keyword spotting
Shrivastava et al. Retrieve-and-fill for scenario-based task-oriented semantic parsing
CN112131879A (en) Relationship extraction system, method and device
Ranaldi et al. Modeling easiness for training transformers with curriculum learning
Liu et al. Hybrid models for sentence readability assessment
CN115129819A (en) Text abstract model production method and device, equipment and medium thereof
Kim et al. A composite kernel approach for dialog topic tracking with structured domain knowledge from wikipedia

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110907

Termination date: 20140721

EXPY Termination of patent right or utility model