CN101609672B - Speech recognition semantic confidence feature extraction method and device - Google Patents

Info

Publication number
CN101609672B
CN101609672B (application numbers CN2009100888676A, CN200910088867A)
Authority
CN
China
Prior art keywords
speech
recognition result
theme
topic
anchor point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009100888676A
Other languages
Chinese (zh)
Other versions
CN101609672A (en)
Inventor
陈伟
刘刚
郭军
国玉晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN2009100888676A
Publication of CN101609672A
Application granted
Publication of CN101609672B
Legal status: Expired - Fee Related

Landscapes

  • Machine Translation (AREA)

Abstract

The embodiment of the invention discloses a speech recognition semantic confidence feature extraction method. The method performs topic-model inference on a speech recognition result to obtain its topic structure; uses the inference result to compute the topic distribution of each word; selects from the recognition result a number of words whose acoustic posterior probability is greater than a threshold and whose topicality is strong as anchor words; uses the topic distributions of the anchor words to compute a reference topic distribution for the whole recognition result; and compares the topic distribution of each word in the recognition result with the reference topic distribution, taking the similarity between them as the semantic confidence feature of the word. The invention further discloses a speech recognition semantic confidence feature extraction device. The invention provides semantic high-level guidance for confidence annotation, so that the speech recognition result can be described and analyzed more accurately and the precision of confidence annotation is improved.

Description

Speech recognition semantic confidence feature extraction method and device
Technical field
The present invention relates to the field of speech recognition, and in particular to a semantic confidence feature extraction method and device.
Background technology
Speech recognition confidence features are the key to evaluating the reliability of recognition results in speech recognition post-processing, and are mainly used to solve the speech recognition confidence annotation problem.
Speech recognition confidence annotation generally labels the confidence annotation units in a recognition result as one of two classes, correct or wrong, based on different confidence features or feature combinations, thereby evaluating the reliability of the recognition result. The annotation unit is generally the word, though speech frames, phonemes, sentences and so on can also be used.
At present, speech recognition confidence features are mainly derived from information in the decoder. However, as Huang Zengyang notes in his 1998 book "HNC (Hierarchical Network of Concepts) Theory", published by Tsinghua University Press, human auditory experiments show that human auditory pre-processing can capture only 70% of the syllables in a continuous speech stream; when pronunciation is ambiguous, people use grammatical, semantic and other knowledge to guide their understanding of the speech. The quality of speech recognition likewise depends on the disambiguation and error-correction capability of the post-processing system, so high-level information such as grammar and semantics is very important for speech recognition post-processing. It remains relatively difficult, however, to extract effective syntactic and semantic confidence features in speech recognition post-processing.
In realizing the present invention, the inventors found at least the following problems in the prior art:
The confidence features extracted by existing methods all derive from the decoder; the source of feature information is thus rather narrow, and the evaluation of recognition results cannot be effectively guided by confidence features extracted from high-level semantic information.
The present invention is based on statistical topic models (Statistical Topic Models). Given a recognition result, a topic model extracts the implicit topic structure of the result, a relatively stable implicit semantic structure that people can understand, and seeks a semantic-level description of the recognition result, from which the semantic features of words (or other confidence annotation units) in the recognition result are extracted. Topic models include Latent Dirichlet Allocation (LDA), Probabilistic Latent Semantic Analysis (PLSA) and others.
Summary of the invention
In view of this, the purpose of one or more embodiments of the present invention is to provide a semantic confidence feature extraction method and device, so as to enlarge the source of confidence-feature information with semantic and other knowledge, describe and analyze the speech recognition result more accurately, and improve the precision of confidence annotation.
An embodiment of the invention provides a speech recognition semantic confidence feature extraction method, comprising:
performing topic-model inference on the speech recognition result to obtain the topic structure of the recognition result;
using the inference result to compute the topic distribution of each word; selecting from the recognition result a number of words whose acoustic posterior probability is greater than a threshold and whose topicality is strong as anchor words (Anchor Words); and using the topic distributions of the anchor words to compute the reference topic distribution of the whole recognition result;
comparing the topic distribution of each word in the recognition result with the reference topic distribution of the recognition result, and taking the similarity between them as the semantic confidence feature of the word.
An embodiment of the invention also discloses a speech recognition semantic confidence feature extraction device, comprising:
a topic analysis device, for performing topic-model inference on the recognition result to obtain the topic structure of the recognition result;
a posterior probability generating device, for computing the acoustic posterior probability of each word in the recognition result from the detailed decoding information recorded during speech recognition;
a word topic distribution generating device, for computing the topic distribution of each word from the topic structure of the recognition result obtained by the topic analysis device;
a document reference topic distribution generating device, for determining the anchor words: it uses the topic structure of the recognition result obtained by the topic analysis device and the acoustic posterior probabilities of the words obtained by the posterior probability generating device to select from the recognition result a number of words whose acoustic posterior probability is greater than a threshold and whose topicality is strong as anchor words, and then computes the reference topic distribution of the whole recognition result from the topic distributions of the anchor words;
a semantic feature extraction device, for comparing the topic distribution of each word in the recognition result with the reference topic distribution of the recognition result, and taking the similarity between them as the semantic confidence feature of the word.
Compared with the prior art, the speech recognition semantic confidence feature proposed in the embodiments of the invention provides semantic high-level guidance for confidence annotation, so that the speech recognition result can be described and analyzed more accurately and the precision of confidence annotation is improved.
Description of drawings
To illustrate the embodiments of the invention or the technical schemes of the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative work.
Fig. 1 is a structural block diagram of an embodiment of the invention;
Fig. 2 is a flowchart of generating the reference topic distribution of the recognition result in an embodiment of the invention;
Fig. 2-1 is a flowchart of the anchor-word search method of an embodiment of the invention;
Fig. 2-2 is a schematic diagram of how annotation precision varies with the anchor-word search parameters, taking confidence annotation with the combination of acoustic posterior probability and the semantic confidence feature of the invention as an example;
Fig. 3 is a block diagram of the semantic confidence feature extraction device of an embodiment of the invention.
Embodiment
The technical scheme in the embodiments of the invention is described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative work fall within the protection scope of the invention.
The semantic confidence feature extraction scheme provided by the embodiments of the invention rests on a basic premise: correctly recognized words in a recognition result conform to semantic rules better than misrecognized words. The inventors conceived the embodiments of the invention under this premise.
In the embodiments of the invention, the semantic confidence feature extraction function can be divided as follows:
The first functional unit of the embodiment mainly uses a large document set to train the topic model.
The second functional unit mainly performs speech recognition, outputs the final recognition result, and records the whole decoding process in detail.
The third functional unit mainly extracts the semantic confidence feature of each word in the recognition result under the guidance of the information generated by the first and second functional units. It uses the topic model produced by the first functional unit to perform inference on the speech recognition result and obtain its topic structure, and uses the detailed decoding information recorded by the second functional unit to compute the acoustic posterior probability of each word in the recognition result. Under the guidance of this information, it computes the topic distribution of each word; selects from the recognition result a number of words whose acoustic posterior probability is greater than a threshold and whose topicality is strong as anchor words; computes the reference topic distribution of the whole recognition result from the topic distributions of the anchor words; and finally compares the topic distribution of each word in the recognition result with the reference topic distribution, taking the similarity between them as the word's semantic confidence feature.
It should be noted that the above division into functional modules is relative and is mainly intended to help those skilled in the art understand the principle of the invention as a whole; embodiments of the invention can also realize the principle of the invention, with the same technical effect, using other functional modules and combinations thereof, all without exceeding the protection scope of the invention.
As shown in Fig. 1, a structural block diagram of an embodiment of the invention comprises:
a first functional unit 101, a second functional unit 102 and a third functional unit 103, the third functional unit being connected to the first and second functional units respectively. The first functional unit 101 comprises a document set 1011, a topic model training module 1012 and a topic model 1013; the second functional unit 102 comprises a speech data input module 1021, a speech recognition module 1022, a speech recognition result 1023 and speech recognition decoding information 1024; the third functional unit comprises a topic model analysis module 1031, a posterior probability generation module 1032, a word topic distribution generation module 1033, a document reference topic distribution generation module 1034 and a semantic feature extraction module 1035.
Taking LDA as an example, the topic model analysis module 1031 and the word topic distribution generation module 1033 are introduced below.
The LDA model, proposed in recent years, is an unsupervised topic model that can extract the implicit topics of a text. It is a generative probabilistic model with a three-layer structure of words, topics and documents. Suppose the document set used to train LDA contains M documents and V distinct words, and the number of LDA topics is K, i.e. the topic set is {z_1, z_2, ..., z_K}. Let the number of words in the current recognition result d be N_d, with word sequence w = (w_1, w_2, ..., w_N_d).
The topic model analysis module 1031 obtains, by LDA inference, the topic structure of the current recognition result d, namely the probability of word w under a given topic j and the probability of topic j under the current recognition result d: Φ_j(w) = P(w|z=j) and θ_j(d) = P(z=j|d).
The word topic distribution generation module 1033 uses the information obtained by the topic model analysis module 1031 to compute the topic distribution Topic_dis(w_i) of each word, where w_i is a word in the recognition result d and Topic_dis(w_i) is a K-dimensional vector, given by the following formulas:
Topic_dis(w_i) = (H(w_i, z_1), H(w_i, z_2), ..., H(w_i, z_K));
where
H(w_i, z_j) = P(z_j|w_i) = P(w_i|z_j) * P(z_j) / P(w_i) = Φ_j(w_i) * P(z_j) / P(w_i);
P(z_j) = Σ_{i=1..M} P(z_j, d_i) = Σ_{i=1..M} P(z_j|d_i) * P(d_i) = P(d) * Σ_{i=1..M} θ_j(d_i);
(Note: the prior probability of each document is taken as uniform, i.e. P(d_i) = P(d) for i = 1...M.)
P(w_i) = Σ_{j=1..K} P(w_i, z_j) = Σ_{j=1..K} P(w_i|z_j) * P(z_j) = Σ_{j=1..K} Φ_j(w_i) * P(z_j);
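As an illustration only, and not part of the patent, the relations above can be sketched in Python with hypothetical toy values for Φ and θ; all dimensions and numbers here are invented for the example:

```python
import numpy as np

# Hypothetical toy dimensions: K topics, M training documents, V vocabulary words.
K, M, V = 4, 100, 50
rng = np.random.default_rng(0)

# phi[j, w] = P(w | z=j): per-topic word distributions from LDA training.
phi = rng.dirichlet(np.ones(V), size=K)    # shape (K, V), rows sum to 1
# theta[i, j] = P(z=j | d_i): per-document topic mixtures over the training set.
theta = rng.dirichlet(np.ones(K), size=M)  # shape (M, K), rows sum to 1

# P(z_j) = P(d) * sum_i theta_j(d_i) with uniform document prior P(d) = 1/M,
# i.e. simply the mean topic mixture over the training documents.
p_z = theta.mean(axis=0)                   # shape (K,)
# P(w_i) = sum_j Phi_j(w_i) * P(z_j)
p_w = phi.T @ p_z                          # shape (V,)

def topic_dis(word_id):
    """Topic distribution of a word: H(w, z_j) = Phi_j(w) * P(z_j) / P(w)."""
    return phi[:, word_id] * p_z / p_w[word_id]

h = topic_dis(7)   # a K-dimensional vector that sums to 1, as in the text
```

Because H(w_i, ·) is a posterior over topics, each Topic_dis(w_i) is itself a valid probability distribution, which is what allows it to be compared with the reference distribution later by a divergence measure.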
Taking LDA as an example, the method of the document reference topic distribution generation module 1034 in Fig. 1 is explained below in conjunction with Fig. 2 to Fig. 2-2.
As shown in Fig. 2, the flow of the recognition result reference topic distribution generation module in the embodiment of the invention comprises:
201. Perform topic-model inference on the current recognition result to obtain the topic structure of the recognition result.
202. Search for anchor words in the recognition result using the inference result and the posterior probabilities. The words in the recognition result d should be consistent with the topic expressed by the whole document, but the topic distribution of d is mainly determined by a few strongly topical words in d; to compute the reference topic distribution of the recognition result we must therefore find the words that play a decisive role in the topic distribution, called anchor words (Anchor Words). Because the recognition result contains misrecognized words, when selecting anchor words we must first ensure that each anchor word is very likely to be correctly recognized, i.e. that its acoustic posterior probability is large enough, while also ensuring that the anchor words are strongly topical. The concrete anchor-word search procedure is shown in Fig. 2-1, the flowchart of the anchor-word search method of the embodiment of the invention:
2021. Compute the acoustic posterior probability of each word in the recognition result from the detailed decoding information recorded during speech recognition.
2022. Set a posterior probability threshold, named PPThresh. When a word's posterior probability is greater than this threshold, add the word to the credible class, named CClass; if it is below the threshold, discard it.
2023. Count the number of words in the credible class CClass, named C_num.
2024. Judge whether the credible class CClass contains any words, i.e. whether C_num is 0.
2025. If CClass contains no words, i.e. C_num equals 0, change the posterior probability threshold PPThresh and select words into the credible class again.
2026. If CClass contains words, i.e. C_num is not 0, compute Topic_dis(w_i) for each word in CClass, and record the maximum of the corresponding H(w_i, z_j), i.e. max_prob(w_i) = max_{j=1..K} H(w_i, z_j); this maximum measures the word's topical strength.
2027. Set the ratio of anchor words to be chosen, named Aratio; the number of anchors is L = INT(C_num * Aratio) + 1, where INT() is the integer-part function. From the credible class CClass, select the L words with the largest max_prob(w_i), in descending order, as the anchor words of the current document.
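Steps 2021 to 2027 can be sketched as follows; the function and parameter names (`select_anchors`, `pp_thresh`, `aratio`) and the toy word list are illustrative assumptions, with the default threshold 0.88 borrowed from the value later discussed for Fig. 2-2:

```python
def select_anchors(words, posteriors, topic_dis_fn, pp_thresh=0.88, aratio=0.3):
    """Pick anchor words: high acoustic posterior first, then strong topicality."""
    # 2022: credible class CClass = words whose posterior exceeds PPThresh.
    cclass = [w for w, p in zip(words, posteriors) if p > pp_thresh]
    # 2025: if CClass is empty, relax the threshold and select again.
    while not cclass and pp_thresh > 0:
        pp_thresh -= 0.1
        cclass = [w for w, p in zip(words, posteriors) if p > pp_thresh]
    # 2026: max_prob(w) = max_j H(w, z_j) measures the word's topical strength.
    ranked = sorted(cclass, key=lambda w: max(topic_dis_fn(w)), reverse=True)
    # 2027: keep L = INT(C_num * Aratio) + 1 anchors, strongest topicality first.
    num_anchors = int(len(cclass) * aratio) + 1
    return ranked[:num_anchors]

# Toy recognition result with made-up posteriors and topic distributions (K = 3).
dists = {"stocks": [0.8, 0.1, 0.1], "the": [0.34, 0.33, 0.33], "market": [0.7, 0.2, 0.1]}
anchors = select_anchors(["stocks", "the", "market"], [0.95, 0.99, 0.90],
                         lambda w: dists[w])
# All three words pass PPThresh = 0.88, so C_num = 3 and L = INT(3*0.3)+1 = 1;
# "stocks" has the largest max_prob and is kept as the single anchor.
```

Note how the highly frequent but topically flat word "the" would be rejected at the ranking stage even though its acoustic posterior is the highest, which matches the requirement that anchors be both reliably recognized and strongly topical.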
203. Having found the anchor words in the recognition result in step 202, collect their topic distributions. Suppose there are L anchor words, forming the sequence A_1, A_2, ..., A_L; then the topic distribution of anchor word A_i is Topic_dis(A_i), i = 1...L.
204. Compute the reference topic distribution of the recognition result d from the topic distributions of the anchor words, named Topic_dis(d), a K-dimensional vector given by the following formulas:
Topic_dis(d) = (L(d, z_1), L(d, z_2), ..., L(d, z_K))
where
L(d, z_j) = Com(H(A_1, z_j), H(A_2, z_j), ..., H(A_L, z_j));
where Com() is a function combining the probability values of the anchor words under a given topic; using the arithmetic mean, for example, gives
L(d, z_j) = (1/L) * Σ_{i=1..L} H(A_i, z_j)
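With the arithmetic mean as Com(), the reference distribution is simply the column-wise mean of the anchor words' topic distributions. A minimal sketch with hypothetical numbers (K = 4 topics, L = 3 anchors; the values are invented for illustration):

```python
import numpy as np

# Hypothetical H(A_i, z_j) values for L = 3 anchor words over K = 4 topics.
anchor_dis = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.60, 0.20, 0.10, 0.10],
    [0.65, 0.15, 0.10, 0.10],
])

# L(d, z_j) = (1/L) * sum_i H(A_i, z_j): average each topic column over anchors.
ref_dis = anchor_dis.mean(axis=0)  # approximately [0.65, 0.15, 0.10, 0.10]
```

Since each anchor distribution sums to 1, their mean is again a valid probability distribution over the K topics.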
The semantic feature extraction module 1035 of Fig. 1 can therefore compare the word topic distribution Topic_dis(w_i) with the document reference topic distribution Topic_dis(d), taking their similarity as the semantic confidence feature of the word in the recognition result, i.e.
Sem(w_i) = Similarity(Topic_dis(w_i), Topic_dis(d))
where Sem(w_i) is the semantic confidence feature of word w_i. There are many ways to measure the similarity Similarity(); one is the symmetric K-L divergence:
Let M1 = Topic_dis(w_i) and M2 = Topic_dis(d);
then the K-L divergence of M1 from M2, with M2 as the reference model, can be defined as
D_KL(M1||M2) = Σ_{j=1..K} H(w_i, z_j) * log(H(w_i, z_j) / L(d, z_j))
To remove the dependence on which model is taken as the reference, the symmetric K-L divergence is defined as the similarity measure, so the semantic confidence feature of the word is
Sem(w_i) = (1/2) * {D_KL(M1||M2) + D_KL(M2||M1)}
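The symmetric K-L divergence above can be sketched directly; the small smoothing constant `eps` is an assumption of this example, added to guard against zero probabilities, and the toy distributions are invented. A smaller Sem(w_i) means the word's topic distribution is closer to the reference distribution, i.e. the word is more likely on-topic:

```python
import numpy as np

def sym_kl(p, q, eps=1e-12):
    """Sem = 0.5 * (D_KL(p||q) + D_KL(q||p)) between two topic distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    d_pq = np.sum(p * np.log(p / q))   # D_KL(M1||M2)
    d_qp = np.sum(q * np.log(q / p))   # D_KL(M2||M1)
    return 0.5 * (d_pq + d_qp)

word_dis = [0.70, 0.10, 0.10, 0.10]   # Topic_dis(w_i), hypothetical values
ref_dis  = [0.60, 0.20, 0.10, 0.10]   # Topic_dis(d), hypothetical values
score = sym_kl(word_dis, ref_dis)     # small positive value: w_i is near-on-topic
```

Symmetrizing makes the measure independent of which of the two distributions is taken as the reference model, as the text notes.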
As shown in Fig. 2-2, taking confidence annotation with the combination of acoustic posterior probability and the semantic confidence feature of the invention as an example, a schematic diagram shows how annotation precision varies with the anchor-word search parameters.
It can be seen from Fig. 2-2 that, compared with not using an acoustic posterior probability threshold in the anchor-word search (i.e. PPThresh = 0), using the threshold PPThresh = 0.88 gives a visibly better result, which proves that anchor-word selection must pick words that are very likely to be correctly recognized, i.e. words whose acoustic posterior probability is greater than the threshold. It can also be seen that when the acoustic posterior probability threshold is used, annotation performance varies considerably with the anchor-word selection ratio Aratio, which shows the necessity of tuning the Aratio parameter. This in turn shows that anchor-word selection must first ensure that the anchor words are very likely to be correctly recognized, i.e. that their acoustic posterior probability is large enough, while also ensuring that they are strongly topical, before a high-performance semantic confidence feature can be extracted.
As shown in Fig. 3, an embodiment of the invention also provides a speech recognition semantic confidence feature extraction device, comprising:
a topic analysis device 301, for performing topic-model inference on the recognition result to obtain the topic structure of the recognition result: supposing the number of topics is K, i.e. the topic set is {z_1, z_2, ..., z_K}, it obtains the probability of word w under a given topic j and the probability of topic j under the current recognition result d:
Φ_j(w) = P(w|z=j) and θ_j(d) = P(z=j|d);
a posterior probability generating device 302, for computing the acoustic posterior probability of each word in the recognition result from the detailed decoding information recorded during speech recognition;
a word topic distribution generating device 303, for computing the topic distribution Topic_dis(w_i) of each word from the topic structure of the recognition result obtained by the topic analysis device 301, according to the formulas
Topic_dis(w_i) = (H(w_i, z_1), H(w_i, z_2), ..., H(w_i, z_K));
where
H(w_i, z_j) = P(z_j|w_i) = P(w_i|z_j) * P(z_j) / P(w_i) = Φ_j(w_i) * P(z_j) / P(w_i);
P(z_j) = Σ_{i=1..M} P(z_j, d_i) = Σ_{i=1..M} P(z_j|d_i) * P(d_i) = P(d) * Σ_{i=1..M} θ_j(d_i);
(Note: the prior probability of each document is taken as uniform, i.e. P(d_i) = P(d) for i = 1...M.)
P(w_i) = Σ_{j=1..K} P(w_i, z_j) = Σ_{j=1..K} P(w_i|z_j) * P(z_j) = Σ_{j=1..K} Φ_j(w_i) * P(z_j);
a document reference topic distribution generating device 304, for determining the anchor words: it uses the topic structure of the recognition result obtained by the topic analysis device 301 and the acoustic posterior probabilities of the words obtained by the posterior probability generating device 302 to select from the recognition result a number of words whose acoustic posterior probability is greater than a threshold and whose topicality is strong as anchor words, and then computes the reference topic distribution of the whole recognition result from the topic distributions of the anchor words. Suppose there are L anchor words, forming the sequence A_1, A_2, ..., A_L, with topic distributions Topic_dis(A_i), i = 1...L. The reference topic distribution of the recognition result d, named Topic_dis(d), a K-dimensional vector, is computed from the anchor-word topic distributions by the formula:
Topic_dis(d) = (L(d, z_1), L(d, z_2), ..., L(d, z_K));
where
L(d, z_j) = Com(H(A_1, z_j), H(A_2, z_j), ..., H(A_L, z_j));
where Com() is a function combining the probability values of the anchor words under a given topic;
a semantic feature extraction device 305, for comparing the topic distribution of each word in the recognition result with the reference topic distribution of the recognition result, and taking the similarity between them as the semantic confidence feature of the word, by the formula
Sem(w_i) = Similarity(Topic_dis(w_i), Topic_dis(d))
where Sem(w_i) is the semantic confidence feature of word w_i and Similarity() is a similarity measurement method.
The device embodiments of the invention have the same technical effects as the method embodiments and are not repeated here.
From the above description of the embodiments, those skilled in the art can clearly understand that the invention can be realized by software plus a necessary general hardware platform, or of course by hardware, though in many cases the former is the better embodiment. On this understanding, the part of the technical scheme of the invention that contributes over the prior art can be embodied in the form of a software product stored on a storage medium and comprising instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods described in the embodiments of the invention.
The embodiments described above do not limit the protection scope of the invention. Any modifications, equivalent replacements and improvements made within the spirit and principles of the invention shall be included within the protection scope of the invention.

Claims (10)

1. A speech recognition semantic confidence feature extraction method, characterized by comprising:
performing topic-model inference on the speech recognition result to obtain the topic structure of the recognition result;
using the inference result to compute the topic distribution of each word;
selecting from the recognition result a number of words whose acoustic posterior probability is greater than a threshold and whose topicality is strong as anchor words (Anchor Words), and then computing the reference topic distribution of the recognition result from the topic distributions of the anchor words;
comparing the topic distribution of each word in the recognition result with the reference topic distribution of the recognition result, and taking the similarity between them as the semantic confidence feature of the word.
2. the method for claim 1 is characterized in that, by topic model voice identification result is carried out reasoning, obtains the thematic structure of recognition result, comprising:
Suppose that number of topics is K, promptly Obtain by the topic model reasoning, the thematic structure on the current recognition result d, the probability of theme j under the probability of speech w and the current recognition result d under the promptly given theme j:
Figure FSB00000530734200012
And
Figure FSB00000530734200013
3. The method of claim 2, characterized in that using the inference result to compute the topic distribution of each word comprises:
using Φ_j(w) = P(w|z=j) and θ_j(d) = P(z=j|d) to compute the topic distribution Topic_dis(w_i), where w_i is a word in the recognition result d and Topic_dis(w_i) is a K-dimensional vector, by the following formulas:
Topic_dis(w_i) = (H(w_i, z_1), H(w_i, z_2), ..., H(w_i, z_K));
where
H(w_i, z_j) = P(z_j|w_i) = P(w_i|z_j) * P(z_j) / P(w_i) = Φ_j(w_i) * P(z_j) / P(w_i);
P(z_j) = Σ_{i=1..M} P(z_j, d_i) = Σ_{i=1..M} P(z_j|d_i) * P(d_i) = P(d) * Σ_{i=1..M} θ_j(d_i);
where M is the number of documents used to train the topic model, d_i is the i-th training document, and the prior probability of each document is taken as uniform, i.e. P(d_i) = P(d) for i = 1...M; and
P(w_i) = Σ_{j=1..K} P(w_i, z_j) = Σ_{j=1..K} P(w_i|z_j) * P(z_j) = Σ_{j=1..K} Φ_j(w_i) * P(z_j).
4. The method of claim 3, characterized in that selecting from the recognition result a number of words whose acoustic posterior probability is greater than a threshold and whose topicality is strong as anchor words (Anchor Words), and then computing the reference topic distribution of the recognition result from the topic distributions of the anchor words, comprises:
computing the acoustic posterior probability of each word in the recognition result from the detailed decoding information recorded during speech recognition;
setting a posterior probability threshold, adding a word in the recognition result to the credible class when its posterior probability is greater than this threshold, and discarding it when it is below the threshold;
counting the number of words in the credible class, named C_num;
judging whether the credible class contains any words, and, if not, changing the posterior probability threshold and selecting words into the credible class again;
if the credible class contains words, computing Topic_dis(w_i) for each word in the credible class, and recording the maximum of the corresponding H(w_i, z_j), i.e. max_prob(w_i) = max_{j=1..K} H(w_i, z_j), this maximum measuring the word's topical strength;
setting the ratio Aratio of anchor words to be chosen, the number of anchors being L = INT(C_num * Aratio) + 1, where INT() is the integer-part function, and selecting from the credible class the L words with the largest max_prob(w_i), in descending order, as the anchor words of the current recognition result;
collecting the topic distributions of the anchor words, supposing there are L anchor words forming the sequence A_1, A_2, ..., A_L, so that the topic distribution of anchor word A_i is Topic_dis(A_i), i = 1...L;
computing the reference topic distribution of the recognition result d from the topic distributions of the anchor words, named Topic_dis(d), a K-dimensional vector given by the following formulas:
Topic_dis(d) = (L(d, z_1), L(d, z_2), ..., L(d, z_K));
where
L(d, z_j) = Com(H(A_1, z_j), H(A_2, z_j), ..., H(A_L, z_j));
where Com() is the function that takes the arithmetic mean of the probability values of the anchor words under the j-th topic.
5. The method of claim 4, wherein using the topic distribution of each word in the recognition result and comparing its similarity with the reference topic distribution of the recognition result as the semantic confidence feature of the word comprises:
comparing the word topic distribution Topic_dis(w_i) with the reference topic distribution Topic_dis(d) of the recognition result and taking the similarity as the semantic confidence feature of the word in the recognition result, i.e.
Sem(w_i) = Similarity(Topic_dis(w_i), Topic_dis(d))
wherein Sem(w_i) is the semantic confidence feature of word w_i, and Similarity() is the similarity measurement function, using the symmetric K-L divergence.
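The similarity measure named in claim 5, the symmetric K-L divergence, can be sketched as follows. The eps smoothing term and the sign convention (negating the divergence so that larger values mean higher confidence) are choices made for this example, not specified by the patent:

```python
import numpy as np

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric Kullback-Leibler divergence D(p||q) + D(q||p).
    eps guards against log(0) for zero probability entries."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def semantic_confidence(word_dist, doc_dist):
    """Sem(w_i) = Similarity(Topic_dis(w_i), Topic_dis(d)).
    Small divergence means high similarity, so the negated divergence
    serves as the feature value here."""
    return -symmetric_kl(word_dist, doc_dist)
```

A word whose topic distribution matches the reference distribution thus receives a higher semantic confidence value than a word whose distribution diverges from it.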
6. A speech recognition semantic confidence feature extraction device, comprising:
a topic analysis device, configured to perform inference on the recognition result with a topic model and obtain the topic structure of the recognition result;
a posterior probability generating device, configured to calculate the acoustic posterior probability of each word in the recognition result from the detailed decoding information recorded during speech recognition;
a word topic distribution generating device, configured to calculate the topic distribution of each word according to the topic structure of the recognition result obtained by the topic analysis device;
a document reference topic distribution generating device, configured to determine the anchor words: using the topic structure of the recognition result obtained by the topic analysis device and the acoustic posterior probabilities of the words obtained by the posterior probability generating device, it selects from the recognition result a number of words whose acoustic posterior probability exceeds a threshold and whose topicality is strong as anchor words, and then uses the topic distributions of the anchor words to calculate the reference topic distribution of the recognition result; and
a semantic feature extraction device, configured to use the topic distribution of each word in the recognition result, compare its similarity with the reference topic distribution of the recognition result, and take the similarity as the semantic confidence feature of the word.
7. The device of claim 6, wherein the topic analysis device is configured to use a topic model to perform inference on the recognition result and obtain the topic structure of the recognition result; supposing the number of topics is K, i.e. the topic set is {z_1, z_2, ..., z_K}, the model gives the probability of word w under a given topic j and the probability of topic j under the current recognition result d:
Φ_j(w) = P(w|z_j)
and
θ_j(d) = P(z_j|d).
8. The device of claim 7, wherein the word topic distribution generating device is configured to use
Φ_j(w) = P(w|z_j)
and
θ_j(d) = P(z_j|d)
to calculate the topic distribution Topic_dis(w_i) of each word according to the formula Topic_dis(w_i) = (H(w_i,Z_1), H(w_i,Z_2), ..., H(w_i,Z_K)); wherein
H(w_i,z_j) = P(z_j|w_i) = P(w_i|z_j)*P(z_j)/P(w_i) = Φ_j(w_i)*P(z_j)/P(w_i);
P(z_j) = Σ_{i=1...M} P(z_j,d_i) = Σ_{i=1...M} P(z_j|d_i)*P(d_i) = P(d)*Σ_{i=1...M} θ_j(d_i);
wherein M is the number of training documents of the topic model, d_i is the i-th training document, and the prior probability of each document is taken as uniform, i.e. P(d_i) = P(d) for i=1...M; then
P(w_i) = Σ_{j=1...K} P(w_i,z_j) = Σ_{j=1...K} P(w_i|z_j)*P(z_j) = Σ_{j=1...K} Φ_j(w_i)*P(z_j).
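The Bayes-rule computation of claim 8 maps directly onto matrix operations. In the sketch below, phi and theta hold Φ_j(w) and θ_j(d_i) as arrays; the array shapes and names are illustrative assumptions:

```python
import numpy as np

def word_topic_distributions(phi, theta):
    """Compute H(w, z_j) = P(z_j | w) for every vocabulary word by Bayes' rule.

    phi   : (K, V) array, phi[j, w]   = P(w | z_j)   (topic-word probabilities)
    theta : (M, K) array, theta[i, j] = P(z_j | d_i) over M training documents
    Returns a (V, K) array whose row w is Topic_dis(w)."""
    M = theta.shape[0]
    # Uniform document prior P(d_i) = 1/M gives P(z_j) = (1/M) * sum_i theta[i, j].
    p_z = theta.sum(axis=0) / M                 # shape (K,)
    # P(w) = sum_j phi[j, w] * P(z_j).
    p_w = phi.T @ p_z                           # shape (V,)
    # H(w, z_j) = phi[j, w] * P(z_j) / P(w).
    return (phi.T * p_z) / p_w[:, None]         # shape (V, K)
```

Each output row sums to one, since H(w, ·) is the posterior distribution over topics for that word.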
9. The device of claim 8, wherein the document reference topic distribution generating device is configured to: using the topic structure of the recognition result obtained by the topic analysis device and the acoustic posterior probabilities of the words obtained by the posterior probability generating device, select from the recognition result a number of words whose acoustic posterior probability exceeds a threshold and whose topicality is strong as anchor words; and then use the topic distributions of the anchor words to calculate the reference topic distribution of the whole recognition result: supposing there are L anchor words in total, with corresponding word sequence {A_1, A_2, ..., A_L}, i=1...L, the reference topic distribution of the recognition result d, denoted Topic_dis(d), a K-dimensional vector, is calculated from the topic distributions of the anchor words by the formula:
Topic_dis(d) = (L(d,Z_1), L(d,Z_2), ..., L(d,Z_K));
wherein
L(d,Z_j) = Com(H(A_1,Z_j), H(A_2,Z_j), ..., H(A_L,Z_j));
and Com() is the function that takes the arithmetic mean of the probability values of the anchor words under the j-th topic.
10. The device of claim 9, wherein the semantic feature extraction device is configured to use the topic distribution of each word in the recognition result and compare its similarity with the reference topic distribution of the recognition result as the semantic confidence feature of the word, specifically by the formula
Sem(w_i) = Similarity(Topic_dis(w_i), Topic_dis(d))
wherein Sem(w_i) is the semantic confidence feature of word w_i, and Similarity() is the similarity measurement function, using the symmetric K-L divergence.
CN2009100888676A 2009-07-21 2009-07-21 Speech recognition semantic confidence feature extraction method and device Expired - Fee Related CN101609672B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100888676A CN101609672B (en) 2009-07-21 2009-07-21 Speech recognition semantic confidence feature extraction method and device


Publications (2)

Publication Number Publication Date
CN101609672A CN101609672A (en) 2009-12-23
CN101609672B true CN101609672B (en) 2011-09-07

Family

ID=41483397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100888676A Expired - Fee Related CN101609672B (en) 2009-07-21 2009-07-21 Speech recognition semantic confidence feature extraction method and device

Country Status (1)

Country Link
CN (1) CN101609672B (en)


Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894549A (en) * 2010-06-24 2010-11-24 中国科学院声学研究所 Method for fast calculating confidence level in speech recognition application field
CN103177721B (en) * 2011-12-26 2015-08-19 中国电信股份有限公司 Audio recognition method and system
CN103700368B (en) * 2014-01-13 2017-01-18 联想(北京)有限公司 Speech recognition method, speech recognition device and electronic equipment
CN105529028B (en) * 2015-12-09 2019-07-30 百度在线网络技术(北京)有限公司 Speech analysis method and apparatus
CN107195299A (en) * 2016-03-14 2017-09-22 株式会社东芝 Method and apparatus for training a neural network acoustic model, and speech recognition method and device
DE102017213946B4 (en) * 2017-08-10 2022-11-10 Audi Ag Method for processing a recognition result of an automatic online speech recognizer for a mobile terminal
CN112435656B (en) * 2020-12-11 2024-03-01 平安科技(深圳)有限公司 Model training method, voice recognition method, device, equipment and storage medium
CN115376499B (en) * 2022-08-18 2023-07-28 东莞市乐移电子科技有限公司 Learning monitoring method of intelligent earphone applied to learning field

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1293428A (en) * 2000-11-10 2001-05-02 清华大学 Information check method based on speech recognition
CN1490786A (en) * 2002-10-17 2004-04-21 中国科学院声学研究所 Phonetic recognition confidence evaluating method, system and dictation device therewith
CN101013421A (en) * 2007-02-02 2007-08-08 清华大学 Rule-based automatic analysis method of Chinese basic block
CN101030369A (en) * 2007-03-30 2007-09-05 清华大学 Built-in speech discriminating method based on sub-word hidden Markov model


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Cox, S. J.; Dasmahapatra, S. "High-level Approaches to Confidence Estimation in Speech Recognition." IEEE Transactions on Speech and Audio Processing, 2002, pp. 460-471. *
Inkpen, Diana; Desilets, Alain. "Semantic Similarity for Detecting Recognition Errors in Automatic Speech Transcripts." Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, 2005, pp. 49-56. *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106062868A (en) * 2014-07-25 2016-10-26 谷歌公司 Providing pre-computed hotword models
CN106062868B (en) * 2014-07-25 2019-10-29 谷歌有限责任公司 Providing pre-computed hotword models



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110907

Termination date: 20140721

EXPY Termination of patent right or utility model