CN101609672B - Speech recognition semantic confidence feature extraction method and device - Google Patents

Info

Publication number
CN101609672B
CN101609672B (application numbers CN2009100888676A, CN200910088867A)
Authority
CN
China
Prior art keywords
speech
recognition result
theme
topic
anchor point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2009100888676A
Other languages
Chinese (zh)
Other versions
CN101609672A (en)
Inventor
陈伟
刘刚
郭军
国玉晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN2009100888676A
Publication of CN101609672A
Application granted
Publication of CN101609672B
Legal status: Expired - Fee Related

Landscapes

  • Machine Translation (AREA)

Abstract

The embodiment of the invention discloses a speech recognition semantic confidence feature extraction method. The method performs topic-model inference on a speech recognition result to obtain its topic structure; uses the inference result to compute the topic distribution of each word; selects from the recognition result a number of words whose acoustic posterior probability is greater than a threshold and whose topicality is strong as anchor words; uses the topic distributions of the anchor words to compute a reference topic distribution for the whole recognition result; and compares the topic distribution of each word in the recognition result with the reference topic distribution, taking the similarity between them as the semantic confidence feature of the word. The invention further discloses a speech recognition semantic confidence feature extraction device. The invention provides semantic high-level guidance for confidence annotation, so that the speech recognition result can be described and analyzed more accurately and the precision of confidence annotation is improved.

Description

Speech recognition semantic confidence feature extraction method and device
Technical field
The present invention relates to the field of speech recognition, and in particular to a semantic confidence feature extraction method and device.
Background technology
Speech recognition confidence features are the key to evaluating the reliability of recognition results in speech recognition post-processing, and are mainly used to solve the speech recognition confidence annotation problem.
Speech recognition confidence annotation generally labels the confidence annotation units in a recognition result as one of two classes, correct or wrong, based on different confidence features or feature combinations, thereby evaluating the reliability of the recognition result. The annotation unit is generally the word, though speech frames, phonemes, sentences and so on can also be used.
At present, speech recognition confidence features are mainly derived from information in the decoder. However, as Huang Zengyang notes in his 1998 book "HNC (Hierarchical Network of Concepts) Theory", published by Tsinghua University Press, human auditory experiments show that human auditory pre-processing can capture only 70% of the syllables in a continuous speech stream; when pronunciation is ambiguous, people use grammatical, semantic and other knowledge to guide their understanding of the speech. The quality of speech recognition likewise depends on the disambiguation and error-correction capability of the post-processing system, so high-level information such as grammar and semantics is very important for speech recognition post-processing. It remains relatively difficult, however, to extract effective syntactic and semantic confidence features in speech recognition post-processing.
In realizing the present invention, the inventors found at least the following problems in the prior art:
The confidence features extracted by existing methods all derive from the decoder; the source of feature information is thus rather narrow, and the evaluation of recognition results cannot be effectively guided by confidence features extracted from high-level semantic information.
The present invention is based on statistical topic models (Statistical Topic Models). Given a recognition result, a topic model extracts the implicit topic structure of the result, a relatively stable implicit semantic structure that people can understand, and seeks a semantic-level description of the recognition result, from which the semantic features of words (or other confidence annotation units) in the recognition result are extracted. Topic models include Latent Dirichlet Allocation (LDA), Probabilistic Latent Semantic Analysis (PLSA) and others.
Summary of the invention
In view of this, the purpose of one or more embodiments of the present invention is to provide a semantic confidence feature extraction method and device, so as to enlarge the source of confidence-feature information with semantic and other knowledge, describe and analyze the speech recognition result more accurately, and improve the precision of confidence annotation.
An embodiment of the invention provides a speech recognition semantic confidence feature extraction method, comprising:
performing topic-model inference on the speech recognition result to obtain the topic structure of the recognition result;
using the inference result to compute the topic distribution of each word; selecting from the recognition result a number of words whose acoustic posterior probability is greater than a threshold and whose topicality is strong as anchor words (Anchor Words); and using the topic distributions of the anchor words to compute the reference topic distribution of the whole recognition result;
comparing the topic distribution of each word in the recognition result with the reference topic distribution of the recognition result, and taking the similarity between them as the semantic confidence feature of the word.
An embodiment of the invention also discloses a speech recognition semantic confidence feature extraction device, comprising:
a topic analysis device, for performing topic-model inference on the recognition result to obtain the topic structure of the recognition result;
a posterior probability generating device, for computing the acoustic posterior probability of each word in the recognition result from the detailed decoding information recorded during speech recognition;
a word topic distribution generating device, for computing the topic distribution of each word from the topic structure of the recognition result obtained by the topic analysis device;
a document reference topic distribution generating device, for determining the anchor words: it uses the topic structure of the recognition result obtained by the topic analysis device and the acoustic posterior probabilities of the words obtained by the posterior probability generating device to select from the recognition result a number of words whose acoustic posterior probability is greater than a threshold and whose topicality is strong as anchor words, and then computes the reference topic distribution of the whole recognition result from the topic distributions of the anchor words;
a semantic feature extraction device, for comparing the topic distribution of each word in the recognition result with the reference topic distribution of the recognition result, and taking the similarity between them as the semantic confidence feature of the word.
Compared with the prior art, the speech recognition semantic confidence feature proposed in the embodiments of the invention provides semantic high-level guidance for confidence annotation, so that the speech recognition result can be described and analyzed more accurately and the precision of confidence annotation is improved.
Description of drawings
To illustrate the embodiments of the invention or the technical schemes of the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative work.
Fig. 1 is a structural block diagram of an embodiment of the invention;
Fig. 2 is a flowchart of generating the reference topic distribution of the recognition result in an embodiment of the invention;
Fig. 2-1 is a flowchart of the anchor-word search method of an embodiment of the invention;
Fig. 2-2 is a schematic diagram of how annotation precision varies with the anchor-word search parameters, taking confidence annotation with the combination of acoustic posterior probability and the semantic confidence feature of the invention as an example;
Fig. 3 is a block diagram of the semantic confidence feature extraction device of an embodiment of the invention.
Embodiment
The technical scheme in the embodiments of the invention is described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative work fall within the protection scope of the invention.
The semantic confidence feature extraction scheme provided by the embodiments of the invention rests on a basic premise: correctly recognized words in a recognition result conform to semantic rules better than misrecognized words. The inventors conceived the embodiments of the invention under this premise.
In the embodiments of the invention, the semantic confidence feature extraction function can be divided as follows:
The first functional unit of the embodiment mainly uses a large document set to train the topic model.
The second functional unit mainly performs speech recognition, outputs the final recognition result, and records the whole decoding process in detail.
The third functional unit mainly extracts the semantic confidence feature of each word in the recognition result under the guidance of the information generated by the first and second functional units. It uses the topic model produced by the first functional unit to perform inference on the speech recognition result and obtain its topic structure, and uses the detailed decoding information recorded by the second functional unit to compute the acoustic posterior probability of each word in the recognition result. Under the guidance of this information, it computes the topic distribution of each word; selects from the recognition result a number of words whose acoustic posterior probability is greater than a threshold and whose topicality is strong as anchor words; computes the reference topic distribution of the whole recognition result from the topic distributions of the anchor words; and finally compares the topic distribution of each word in the recognition result with the reference topic distribution, taking the similarity between them as the word's semantic confidence feature.
It should be noted that the above division into functional modules is relative and is mainly intended to help those skilled in the art understand the principle of the invention as a whole; embodiments of the invention can also realize the principle of the invention, with the same technical effect, using other functional modules and combinations thereof, all without exceeding the protection scope of the invention.
As shown in Fig. 1, a structural block diagram of an embodiment of the invention comprises:
a first functional unit 101, a second functional unit 102 and a third functional unit 103, the third functional unit being connected to the first and second functional units respectively. The first functional unit 101 comprises a document set 1011, a topic model training module 1012 and a topic model 1013; the second functional unit 102 comprises a speech data input module 1021, a speech recognition module 1022, a speech recognition result 1023 and speech recognition decoding information 1024; the third functional unit comprises a topic model analysis module 1031, a posterior probability generation module 1032, a word topic distribution generation module 1033, a document reference topic distribution generation module 1034 and a semantic feature extraction module 1035.
Taking LDA as an example, the topic model analysis module 1031 and the word topic distribution generation module 1033 are introduced below.
The LDA model, proposed in recent years, is an unsupervised topic model that can extract the implicit topics of a text. It is a generative probabilistic model with a three-layer structure of words, topics and documents. Suppose the document set used to train LDA contains M documents and V distinct words, and the number of LDA topics is K, i.e. the topic set is {z_1, z_2, ..., z_K}. Let the number of words in the current recognition result d be N_d, with word sequence w = (w_1, w_2, ..., w_N_d).
The topic model analysis module 1031 obtains, by LDA inference, the topic structure of the current recognition result d, namely the probability of word w under a given topic j and the probability of topic j under the current recognition result d: Φ_j(w) = P(w|z=j) and θ_j(d) = P(z=j|d).
The word topic distribution generation module 1033 uses the information obtained by the topic model analysis module 1031 to compute the topic distribution Topic_dis(w_i) of each word, where w_i is a word in the recognition result d and Topic_dis(w_i) is a K-dimensional vector, given by the following formulas:
Topic_dis(w_i) = (H(w_i, z_1), H(w_i, z_2), ..., H(w_i, z_K));
where
H(w_i, z_j) = P(z_j|w_i) = P(w_i|z_j) * P(z_j) / P(w_i) = Φ_j(w_i) * P(z_j) / P(w_i);
P(z_j) = Σ_{i=1..M} P(z_j, d_i) = Σ_{i=1..M} P(z_j|d_i) * P(d_i) = P(d) * Σ_{i=1..M} θ_j(d_i);
(Note: the prior probability of each document is taken as uniform, i.e. P(d_i) = P(d) for i = 1...M.)
P(w_i) = Σ_{j=1..K} P(w_i, z_j) = Σ_{j=1..K} P(w_i|z_j) * P(z_j) = Σ_{j=1..K} Φ_j(w_i) * P(z_j);
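As an illustration only, and not part of the patent, the relations above can be sketched in Python with hypothetical toy values for Φ and θ; all dimensions and numbers here are invented for the example:

```python
import numpy as np

# Hypothetical toy dimensions: K topics, M training documents, V vocabulary words.
K, M, V = 4, 100, 50
rng = np.random.default_rng(0)

# phi[j, w] = P(w | z=j): per-topic word distributions from LDA training.
phi = rng.dirichlet(np.ones(V), size=K)    # shape (K, V), rows sum to 1
# theta[i, j] = P(z=j | d_i): per-document topic mixtures over the training set.
theta = rng.dirichlet(np.ones(K), size=M)  # shape (M, K), rows sum to 1

# P(z_j) = P(d) * sum_i theta_j(d_i) with uniform document prior P(d) = 1/M,
# i.e. simply the mean topic mixture over the training documents.
p_z = theta.mean(axis=0)                   # shape (K,)
# P(w_i) = sum_j Phi_j(w_i) * P(z_j)
p_w = phi.T @ p_z                          # shape (V,)

def topic_dis(word_id):
    """Topic distribution of a word: H(w, z_j) = Phi_j(w) * P(z_j) / P(w)."""
    return phi[:, word_id] * p_z / p_w[word_id]

h = topic_dis(7)   # a K-dimensional vector that sums to 1, as in the text
```

Because H(w_i, ·) is a posterior over topics, each Topic_dis(w_i) is itself a valid probability distribution, which is what allows it to be compared with the reference distribution later by a divergence measure.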
Taking LDA as an example, the method of the document reference topic distribution generation module 1034 in Fig. 1 is explained below in conjunction with Fig. 2 to Fig. 2-2.
As shown in Fig. 2, the flow of the recognition result reference topic distribution generation module in the embodiment of the invention comprises:
201. Perform topic-model inference on the current recognition result to obtain the topic structure of the recognition result.
202. Search for anchor words in the recognition result using the inference result and the posterior probabilities. The words in the recognition result d should be consistent with the topic expressed by the whole document, but the topic distribution of d is mainly determined by a few strongly topical words in d; to compute the reference topic distribution of the recognition result we must therefore find the words that play a decisive role in the topic distribution, called anchor words (Anchor Words). Because the recognition result contains misrecognized words, when selecting anchor words we must first ensure that each anchor word is very likely to be correctly recognized, i.e. that its acoustic posterior probability is large enough, while also ensuring that the anchor words are strongly topical. The concrete anchor-word search procedure is shown in Fig. 2-1, the flowchart of the anchor-word search method of the embodiment of the invention:
2021. Compute the acoustic posterior probability of each word in the recognition result from the detailed decoding information recorded during speech recognition.
2022. Set a posterior probability threshold, named PPThresh. When a word's posterior probability is greater than this threshold, add the word to the credible class, named CClass; if it is below the threshold, discard it.
2023. Count the number of words in the credible class CClass, named C_num.
2024. Judge whether the credible class CClass contains any words, i.e. whether C_num is 0.
2025. If CClass contains no words, i.e. C_num equals 0, change the posterior probability threshold PPThresh and select words into the credible class again.
2026. If CClass contains words, i.e. C_num is not 0, compute Topic_dis(w_i) for each word in CClass, and record the maximum of the corresponding H(w_i, z_j), i.e. max_prob(w_i) = max_{j=1..K} H(w_i, z_j); this maximum measures the word's topical strength.
2027. Set the ratio of anchor words to be chosen, named Aratio; the number of anchors is L = INT(C_num * Aratio) + 1, where INT() is the integer-part function. From the credible class CClass, select the L words with the largest max_prob(w_i), in descending order, as the anchor words of the current document.
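Steps 2021 to 2027 can be sketched as follows; the function and parameter names (`select_anchors`, `pp_thresh`, `aratio`) and the toy word list are illustrative assumptions, with the default threshold 0.88 borrowed from the value later discussed for Fig. 2-2:

```python
def select_anchors(words, posteriors, topic_dis_fn, pp_thresh=0.88, aratio=0.3):
    """Pick anchor words: high acoustic posterior first, then strong topicality."""
    # 2022: credible class CClass = words whose posterior exceeds PPThresh.
    cclass = [w for w, p in zip(words, posteriors) if p > pp_thresh]
    # 2025: if CClass is empty, relax the threshold and select again.
    while not cclass and pp_thresh > 0:
        pp_thresh -= 0.1
        cclass = [w for w, p in zip(words, posteriors) if p > pp_thresh]
    # 2026: max_prob(w) = max_j H(w, z_j) measures the word's topical strength.
    ranked = sorted(cclass, key=lambda w: max(topic_dis_fn(w)), reverse=True)
    # 2027: keep L = INT(C_num * Aratio) + 1 anchors, strongest topicality first.
    num_anchors = int(len(cclass) * aratio) + 1
    return ranked[:num_anchors]

# Toy recognition result with made-up posteriors and topic distributions (K = 3).
dists = {"stocks": [0.8, 0.1, 0.1], "the": [0.34, 0.33, 0.33], "market": [0.7, 0.2, 0.1]}
anchors = select_anchors(["stocks", "the", "market"], [0.95, 0.99, 0.90],
                         lambda w: dists[w])
# All three words pass PPThresh = 0.88, so C_num = 3 and L = INT(3*0.3)+1 = 1;
# "stocks" has the largest max_prob and is kept as the single anchor.
```

Note how the highly frequent but topically flat word "the" would be rejected at the ranking stage even though its acoustic posterior is the highest, which matches the requirement that anchors be both reliably recognized and strongly topical.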
203. Having found the anchor words in the recognition result in step 202, collect their topic distributions. Suppose there are L anchor words, forming the sequence A_1, A_2, ..., A_L; then the topic distribution of anchor word A_i is Topic_dis(A_i), i = 1...L.
204. Compute the reference topic distribution of the recognition result d from the topic distributions of the anchor words, named Topic_dis(d), a K-dimensional vector given by the following formulas:
Topic_dis(d) = (L(d, z_1), L(d, z_2), ..., L(d, z_K))
where
L(d, z_j) = Com(H(A_1, z_j), H(A_2, z_j), ..., H(A_L, z_j));
where Com() is a function combining the probability values of the anchor words under a given topic; using the arithmetic mean, for example, gives
L(d, z_j) = (1/L) * Σ_{i=1..L} H(A_i, z_j)
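With the arithmetic mean as Com(), the reference distribution is simply the column-wise mean of the anchor words' topic distributions. A minimal sketch with hypothetical numbers (K = 4 topics, L = 3 anchors; the values are invented for illustration):

```python
import numpy as np

# Hypothetical H(A_i, z_j) values for L = 3 anchor words over K = 4 topics.
anchor_dis = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.60, 0.20, 0.10, 0.10],
    [0.65, 0.15, 0.10, 0.10],
])

# L(d, z_j) = (1/L) * sum_i H(A_i, z_j): average each topic column over anchors.
ref_dis = anchor_dis.mean(axis=0)  # approximately [0.65, 0.15, 0.10, 0.10]
```

Since each anchor distribution sums to 1, their mean is again a valid probability distribution over the K topics.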
The semantic feature extraction module 1035 of Fig. 1 can therefore compare the word topic distribution Topic_dis(w_i) with the document reference topic distribution Topic_dis(d), taking their similarity as the semantic confidence feature of the word in the recognition result, i.e.
Sem(w_i) = Similarity(Topic_dis(w_i), Topic_dis(d))
where Sem(w_i) is the semantic confidence feature of word w_i. There are many ways to measure the similarity Similarity(); one is the symmetric K-L divergence:
Let M1 = Topic_dis(w_i) and M2 = Topic_dis(d);
then the K-L divergence of M1 from M2, with M2 as the reference model, can be defined as
D_KL(M1||M2) = Σ_{j=1..K} H(w_i, z_j) * log(H(w_i, z_j) / L(d, z_j))
To remove the dependence on which model is taken as the reference, the symmetric K-L divergence is defined as the similarity measure, so the semantic confidence feature of the word is
Sem(w_i) = (1/2) * {D_KL(M1||M2) + D_KL(M2||M1)}
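The symmetric K-L divergence above can be sketched directly; the small smoothing constant `eps` is an assumption of this example, added to guard against zero probabilities, and the toy distributions are invented. A smaller Sem(w_i) means the word's topic distribution is closer to the reference distribution, i.e. the word is more likely on-topic:

```python
import numpy as np

def sym_kl(p, q, eps=1e-12):
    """Sem = 0.5 * (D_KL(p||q) + D_KL(q||p)) between two topic distributions."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    d_pq = np.sum(p * np.log(p / q))   # D_KL(M1||M2)
    d_qp = np.sum(q * np.log(q / p))   # D_KL(M2||M1)
    return 0.5 * (d_pq + d_qp)

word_dis = [0.70, 0.10, 0.10, 0.10]   # Topic_dis(w_i), hypothetical values
ref_dis  = [0.60, 0.20, 0.10, 0.10]   # Topic_dis(d), hypothetical values
score = sym_kl(word_dis, ref_dis)     # small positive value: w_i is near-on-topic
```

Symmetrizing makes the measure independent of which of the two distributions is taken as the reference model, as the text notes.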
As shown in Fig. 2-2, taking confidence annotation with the combination of acoustic posterior probability and the semantic confidence feature of the invention as an example, a schematic diagram shows how annotation precision varies with the anchor-word search parameters.
It can be seen from Fig. 2-2 that, compared with not using an acoustic posterior probability threshold in the anchor-word search (i.e. PPThresh = 0), using the threshold PPThresh = 0.88 gives a visibly better result, which proves that anchor-word selection must pick words that are very likely to be correctly recognized, i.e. words whose acoustic posterior probability is greater than the threshold. It can also be seen that when the acoustic posterior probability threshold is used, annotation performance varies considerably with the anchor-word selection ratio Aratio, which shows the necessity of tuning the Aratio parameter. This in turn shows that anchor-word selection must first ensure that the anchor words are very likely to be correctly recognized, i.e. that their acoustic posterior probability is large enough, while also ensuring that they are strongly topical, before a high-performance semantic confidence feature can be extracted.
As shown in Fig. 3, an embodiment of the invention also provides a speech recognition semantic confidence feature extraction device, comprising:
a topic analysis device 301, for performing topic-model inference on the recognition result to obtain the topic structure of the recognition result: supposing the number of topics is K, i.e. the topic set is {z_1, z_2, ..., z_K}, it obtains the probability of word w under a given topic j and the probability of topic j under the current recognition result d:
Φ_j(w) = P(w|z=j) and θ_j(d) = P(z=j|d);
a posterior probability generating device 302, for computing the acoustic posterior probability of each word in the recognition result from the detailed decoding information recorded during speech recognition;
a word topic distribution generating device 303, for computing the topic distribution Topic_dis(w_i) of each word from the topic structure of the recognition result obtained by the topic analysis device 301, according to the formulas
Topic_dis(w_i) = (H(w_i, z_1), H(w_i, z_2), ..., H(w_i, z_K));
where
H(w_i, z_j) = P(z_j|w_i) = P(w_i|z_j) * P(z_j) / P(w_i) = Φ_j(w_i) * P(z_j) / P(w_i);
P(z_j) = Σ_{i=1..M} P(z_j, d_i) = Σ_{i=1..M} P(z_j|d_i) * P(d_i) = P(d) * Σ_{i=1..M} θ_j(d_i);
(Note: the prior probability of each document is taken as uniform, i.e. P(d_i) = P(d) for i = 1...M.)
P(w_i) = Σ_{j=1..K} P(w_i, z_j) = Σ_{j=1..K} P(w_i|z_j) * P(z_j) = Σ_{j=1..K} Φ_j(w_i) * P(z_j);
a document reference topic distribution generating device 304, for determining the anchor words: it uses the topic structure of the recognition result obtained by the topic analysis device 301 and the acoustic posterior probabilities of the words obtained by the posterior probability generating device 302 to select from the recognition result a number of words whose acoustic posterior probability is greater than a threshold and whose topicality is strong as anchor words, and then computes the reference topic distribution of the whole recognition result from the topic distributions of the anchor words. Suppose there are L anchor words, forming the sequence A_1, A_2, ..., A_L, with topic distributions Topic_dis(A_i), i = 1...L. The reference topic distribution of the recognition result d, named Topic_dis(d), a K-dimensional vector, is computed from the anchor-word topic distributions by the formula:
Topic_dis(d) = (L(d, z_1), L(d, z_2), ..., L(d, z_K));
where
L(d, z_j) = Com(H(A_1, z_j), H(A_2, z_j), ..., H(A_L, z_j));
where Com() is a function combining the probability values of the anchor words under a given topic;
a semantic feature extraction device 305, for comparing the topic distribution of each word in the recognition result with the reference topic distribution of the recognition result, and taking the similarity between them as the semantic confidence feature of the word, by the formula
Sem(w_i) = Similarity(Topic_dis(w_i), Topic_dis(d))
where Sem(w_i) is the semantic confidence feature of word w_i and Similarity() is a similarity measurement method.
The device embodiments of the invention have the same technical effects as the method embodiments and are not repeated here.
From the above description of the embodiments, those skilled in the art can clearly understand that the invention can be realized by software plus a necessary general hardware platform, or of course by hardware, though in many cases the former is the better embodiment. On this understanding, the part of the technical scheme of the invention that contributes over the prior art can be embodied in the form of a software product stored on a storage medium and comprising instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods described in the embodiments of the invention.
The embodiments described above do not limit the protection scope of the invention. Any modifications, equivalent replacements and improvements made within the spirit and principles of the invention shall be included within the protection scope of the invention.

Claims (10)

1. A speech recognition semantic confidence feature extraction method, characterized by comprising:
performing topic-model inference on the speech recognition result to obtain the topic structure of the recognition result;
using the inference result to compute the topic distribution of each word;
selecting from the recognition result a number of words whose acoustic posterior probability is greater than a threshold and whose topicality is strong as anchor words (Anchor Words), and then computing the reference topic distribution of the recognition result from the topic distributions of the anchor words;
comparing the topic distribution of each word in the recognition result with the reference topic distribution of the recognition result, and taking the similarity between them as the semantic confidence feature of the word.
2. the method for claim 1 is characterized in that, by topic model voice identification result is carried out reasoning, obtains the thematic structure of recognition result, comprising:
Suppose that number of topics is K, promptly Obtain by the topic model reasoning, the thematic structure on the current recognition result d, the probability of theme j under the probability of speech w and the current recognition result d under the promptly given theme j:
Figure FSB00000530734200012
And
Figure FSB00000530734200013
3. The method of claim 2, characterized in that using the inference result to compute the topic distribution of each word comprises:
using Φ_j(w) = P(w|z=j) and θ_j(d) = P(z=j|d) to compute the topic distribution Topic_dis(w_i), where w_i is a word in the recognition result d and Topic_dis(w_i) is a K-dimensional vector, by the following formulas:
Topic_dis(w_i) = (H(w_i, z_1), H(w_i, z_2), ..., H(w_i, z_K));
where
H(w_i, z_j) = P(z_j|w_i) = P(w_i|z_j) * P(z_j) / P(w_i) = Φ_j(w_i) * P(z_j) / P(w_i);
P(z_j) = Σ_{i=1..M} P(z_j, d_i) = Σ_{i=1..M} P(z_j|d_i) * P(d_i) = P(d) * Σ_{i=1..M} θ_j(d_i);
where M is the number of documents used to train the topic model, d_i is the i-th training document, and the prior probability of each document is taken as uniform, i.e. P(d_i) = P(d) for i = 1...M; and
P(w_i) = Σ_{j=1..K} P(w_i, z_j) = Σ_{j=1..K} P(w_i|z_j) * P(z_j) = Σ_{j=1..K} Φ_j(w_i) * P(z_j).
4. The method of claim 3, characterized in that selecting from the recognition result a number of words whose acoustic posterior probability is greater than a threshold and whose topicality is strong as anchor words (Anchor Words), and then computing the reference topic distribution of the recognition result from the topic distributions of the anchor words, comprises:
computing the acoustic posterior probability of each word in the recognition result from the detailed decoding information recorded during speech recognition;
setting a posterior probability threshold, adding a word in the recognition result to the credible class when its posterior probability is greater than this threshold, and discarding it when it is below the threshold;
counting the number of words in the credible class, named C_num;
judging whether the credible class contains any words, and, if not, changing the posterior probability threshold and selecting words into the credible class again;
if the credible class contains words, computing Topic_dis(w_i) for each word in the credible class, and recording the maximum of the corresponding H(w_i, z_j), i.e. max_prob(w_i) = max_{j=1..K} H(w_i, z_j), this maximum measuring the word's topical strength;
setting the ratio Aratio of anchor words to be chosen, the number of anchors being L = INT(C_num * Aratio) + 1, where INT() is the integer-part function, and selecting from the credible class the L words with the largest max_prob(w_i), in descending order, as the anchor words of the current recognition result;
collecting the topic distributions of the anchor words, supposing there are L anchor words forming the sequence A_1, A_2, ..., A_L, so that the topic distribution of anchor word A_i is Topic_dis(A_i), i = 1...L;
computing the reference topic distribution of the recognition result d from the topic distributions of the anchor words, named Topic_dis(d), a K-dimensional vector given by the following formulas:
Topic_dis(d) = (L(d, z_1), L(d, z_2), ..., L(d, z_K));
where
L(d, z_j) = Com(H(A_1, z_j), H(A_2, z_j), ..., H(A_L, z_j));
where Com() is the function that takes the arithmetic mean of the probability values of the anchor words under the j-th topic.
5. The method of claim 4, wherein using the topic distribution of each word in the recognition result and comparing its similarity with the reference topic distribution of the recognition result as the semantic confidence feature of the word comprises:
comparing the word topic distribution Topic_dis(w_i) with the reference topic distribution Topic_dis(d) of the recognition result and taking the similarity as the semantic confidence feature of the word in the recognition result, i.e.
Sem(w_i) = Similarity(Topic_dis(w_i), Topic_dis(d))
wherein Sem(w_i) is the semantic confidence feature of word w_i, and Similarity() is the similarity measurement function, using the symmetric K-L divergence.
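The similarity measure named in claim 5, the symmetric K-L divergence, can be sketched as follows. The eps smoothing term and the sign convention (negating the divergence so that larger values mean higher confidence) are choices made for this example, not specified by the patent:

```python
import numpy as np

def symmetric_kl(p, q, eps=1e-12):
    """Symmetric Kullback-Leibler divergence D(p||q) + D(q||p).
    eps guards against log(0) for zero probability entries."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def semantic_confidence(word_dist, doc_dist):
    """Sem(w_i) = Similarity(Topic_dis(w_i), Topic_dis(d)).
    Small divergence means high similarity, so the negated divergence
    serves as the feature value here."""
    return -symmetric_kl(word_dist, doc_dist)
```

A word whose topic distribution matches the reference distribution thus receives a higher semantic confidence value than a word whose distribution diverges from it.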
6. A speech recognition semantic confidence feature extraction device, comprising:
a topic analysis device, configured to perform inference on the recognition result with a topic model and obtain the topic structure of the recognition result;
a posterior probability generating device, configured to calculate the acoustic posterior probability of each word in the recognition result from the detailed decoding information recorded during speech recognition;
a word topic distribution generating device, configured to calculate the topic distribution of each word according to the topic structure of the recognition result obtained by the topic analysis device;
a document reference topic distribution generating device, configured to determine the anchor words: using the topic structure of the recognition result obtained by the topic analysis device and the acoustic posterior probabilities of the words obtained by the posterior probability generating device, it selects from the recognition result a number of words whose acoustic posterior probability exceeds a threshold and whose topicality is strong as anchor words, and then uses the topic distributions of the anchor words to calculate the reference topic distribution of the recognition result; and
a semantic feature extraction device, configured to use the topic distribution of each word in the recognition result, compare its similarity with the reference topic distribution of the recognition result, and take the similarity as the semantic confidence feature of the word.
7. The device of claim 6, wherein the topic analysis device is configured to use a topic model to perform inference on the recognition result and obtain the topic structure of the recognition result; supposing the number of topics is K, i.e. the topic set is {z_1, z_2, ..., z_K}, the model gives the probability of word w under a given topic j and the probability of topic j under the current recognition result d:
Φ_j(w) = P(w|z_j)
and
θ_j(d) = P(z_j|d).
8. The device of claim 7, wherein the word topic distribution generating device is configured to use
Φ_j(w) = P(w|z_j)
and
θ_j(d) = P(z_j|d)
to calculate the topic distribution Topic_dis(w_i) of each word according to the formula Topic_dis(w_i) = (H(w_i,Z_1), H(w_i,Z_2), ..., H(w_i,Z_K)); wherein
H(w_i,z_j) = P(z_j|w_i) = P(w_i|z_j)*P(z_j)/P(w_i) = Φ_j(w_i)*P(z_j)/P(w_i);
P(z_j) = Σ_{i=1...M} P(z_j,d_i) = Σ_{i=1...M} P(z_j|d_i)*P(d_i) = P(d)*Σ_{i=1...M} θ_j(d_i);
wherein M is the number of training documents of the topic model, d_i is the i-th training document, and the prior probability of each document is taken as uniform, i.e. P(d_i) = P(d) for i=1...M; then
P(w_i) = Σ_{j=1...K} P(w_i,z_j) = Σ_{j=1...K} P(w_i|z_j)*P(z_j) = Σ_{j=1...K} Φ_j(w_i)*P(z_j).
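The Bayes-rule computation of claim 8 maps directly onto matrix operations. In the sketch below, phi and theta hold Φ_j(w) and θ_j(d_i) as arrays; the array shapes and names are illustrative assumptions:

```python
import numpy as np

def word_topic_distributions(phi, theta):
    """Compute H(w, z_j) = P(z_j | w) for every vocabulary word by Bayes' rule.

    phi   : (K, V) array, phi[j, w]   = P(w | z_j)   (topic-word probabilities)
    theta : (M, K) array, theta[i, j] = P(z_j | d_i) over M training documents
    Returns a (V, K) array whose row w is Topic_dis(w)."""
    M = theta.shape[0]
    # Uniform document prior P(d_i) = 1/M gives P(z_j) = (1/M) * sum_i theta[i, j].
    p_z = theta.sum(axis=0) / M                 # shape (K,)
    # P(w) = sum_j phi[j, w] * P(z_j).
    p_w = phi.T @ p_z                           # shape (V,)
    # H(w, z_j) = phi[j, w] * P(z_j) / P(w).
    return (phi.T * p_z) / p_w[:, None]         # shape (V, K)
```

Each output row sums to one, since H(w, ·) is the posterior distribution over topics for that word.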
9. The device of claim 8, wherein the document reference topic distribution generating device is configured to: using the topic structure of the recognition result obtained by the topic analysis device and the acoustic posterior probabilities of the words obtained by the posterior probability generating device, select from the recognition result a number of words whose acoustic posterior probability exceeds a threshold and whose topicality is strong as anchor words; and then use the topic distributions of the anchor words to calculate the reference topic distribution of the whole recognition result: supposing there are L anchor words in total, with corresponding word sequence {A_1, A_2, ..., A_L}, i=1...L, the reference topic distribution of the recognition result d, denoted Topic_dis(d), a K-dimensional vector, is calculated from the topic distributions of the anchor words by the formula:
Topic_dis(d) = (L(d,Z_1), L(d,Z_2), ..., L(d,Z_K));
wherein
L(d,Z_j) = Com(H(A_1,Z_j), H(A_2,Z_j), ..., H(A_L,Z_j));
and Com() is the function that takes the arithmetic mean of the probability values of the anchor words under the j-th topic.
10. The device of claim 9, wherein the semantic feature extraction device is configured to use the topic distribution of each word in the recognition result and compare its similarity with the reference topic distribution of the recognition result as the semantic confidence feature of the word, specifically by the formula
Sem(w_i) = Similarity(Topic_dis(w_i), Topic_dis(d))
wherein Sem(w_i) is the semantic confidence feature of word w_i, and Similarity() is the similarity measurement function, using the symmetric K-L divergence.
CN2009100888676A 2009-07-21 2009-07-21 Speech recognition semantic confidence feature extraction method and device Expired - Fee Related CN101609672B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100888676A CN101609672B (en) 2009-07-21 2009-07-21 Speech recognition semantic confidence feature extraction method and device


Publications (2)

Publication Number Publication Date
CN101609672A CN101609672A (en) 2009-12-23
CN101609672B true CN101609672B (en) 2011-09-07

Family

ID=41483397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100888676A Expired - Fee Related CN101609672B (en) 2009-07-21 2009-07-21 Speech recognition semantic confidence feature extraction method and device

Country Status (1)

Country Link
CN (1) CN101609672B (en)


Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894549A (en) * 2010-06-24 2010-11-24 中国科学院声学研究所 Method for fast calculating confidence level in speech recognition application field
CN103177721B (en) * 2011-12-26 2015-08-19 中国电信股份有限公司 Audio recognition method and system
CN103700368B (en) * 2014-01-13 2017-01-18 联想(北京)有限公司 Speech recognition method, speech recognition device and electronic equipment
CN105529028B (en) * 2015-12-09 2019-07-30 百度在线网络技术(北京)有限公司 Speech analysis method and apparatus
CN107195299A (en) * 2016-03-14 2017-09-22 株式会社东芝 Method and apparatus for training a neural network acoustic model, and speech recognition method and device
DE102017213946B4 (en) * 2017-08-10 2022-11-10 Audi Ag Method for processing a recognition result of an automatic online speech recognizer for a mobile terminal
CN112435656B (en) * 2020-12-11 2024-03-01 平安科技(深圳)有限公司 Model training method, voice recognition method, device, equipment and storage medium
CN115376499B (en) * 2022-08-18 2023-07-28 东莞市乐移电子科技有限公司 Learning monitoring method of intelligent earphone applied to learning field

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1293428A (en) * 2000-11-10 2001-05-02 清华大学 Information check method based on speech recognition
CN1490786A (en) * 2002-10-17 2004-04-21 中国科学院声学研究所 Phonetic recognition confidence evaluating method, system and dictation device therewith
CN101013421A (en) * 2007-02-02 2007-08-08 清华大学 Rule-based automatic analysis method of Chinese basic block
CN101030369A (en) * 2007-03-30 2007-09-05 清华大学 Built-in speech discriminating method based on sub-word hidden Markov model


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Cox, S. J.; Dasmahapatra, S. "High-level Approaches to Confidence Estimation in Speech Recognition." IEEE Transactions on Speech and Audio Processing, 2002, pp. 460-471. *
Inkpen, Diana; Desilets, Alain. "Semantic Similarity for Detecting Recognition Errors in Automatic Speech Transcripts." Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, 2005, pp. 49-56. *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106062868A (en) * 2014-07-25 2016-10-26 谷歌公司 Providing pre-computed hotword models
CN106062868B (en) * 2014-07-25 2019-10-29 谷歌有限责任公司 Providing pre-computed hotword models



Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110907

Termination date: 20140721

EXPY Termination of patent right or utility model