CN105161098A - Speech recognition method and speech recognition device for interaction system - Google Patents

Speech recognition method and speech recognition device for interaction system

Info

Publication number
CN105161098A
CN105161098A (application CN201510463527.2A)
Authority
CN
China
Prior art keywords
sample
expected
answer
voice signal
sample group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510463527.2A
Other languages
Chinese (zh)
Inventor
齐路
韩笑
苑一时
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd, Qizhi Software Beijing Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201510463527.2A priority Critical patent/CN105161098A/en
Publication of CN105161098A publication Critical patent/CN105161098A/en
Priority to PCT/CN2016/092412 priority patent/WO2017020794A1/en
Pending legal-status Critical Current

Links

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Abstract

The invention discloses a speech recognition method and a speech recognition device for an interaction system. The method comprises the following steps: pre-determining multiple expected samples corresponding to an interaction state and an expected answer under the interaction state in a speech recognition sample library according to the interaction state and the expected answer; dividing the multiple expected samples corresponding to the expected answer into at least two sample groups, wherein each sample group at least includes one expected sample; acquiring a voice signal of a user under the interaction state; and matching the voice signal with the expected sample(s) in one of the at least two sample groups. By adopting the technical scheme of the invention, the speech recognition speed and accuracy of the interaction system can be improved.

Description

Speech recognition method and device for an interactive system
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a speech recognition method and device for an interactive system.
Background technology
With the development of multimedia technology, many interactive systems have adopted voice interaction to improve the efficiency and enjoyment of user interaction. A question answering system, for example, first presents a question to the user by voice or on-screen display, and the user then answers by voice. Similarly, some display systems require the user to issue a voice instruction to select which catalogue of content to display. In these scenarios the user's speech must be recognised accurately: only then can the question answering system judge whether the spoken answer is correct, and the display system determine which catalogue the user actually selected, so that the content of the corresponding catalogue can be shown.
Recognising the user's speech in an interactive system both accurately and quickly is therefore a problem in urgent need of a solution.
Summary of the invention
In view of the above problems, the present invention is proposed in order to provide a speech recognition method and device for an interactive system that overcome, or at least partly solve, the above problems.
According to one aspect of the present invention, a speech recognition method for an interactive system is provided, the method comprising:
according to an interactive state and an expected answer under the interactive state, determining in advance, in a speech recognition sample library, multiple expected samples corresponding to the interactive state and the expected answer;
dividing the multiple expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample;
collecting a voice signal of the user under the interactive state;
matching the voice signal against the expected samples in one of the at least two sample groups.
Optionally, the method further comprises:
if a matching expected sample is found in the sample group, determining that the answer given by the user is the expected answer.
Optionally, the method further comprises:
if no matching expected sample is found in the sample group, matching the voice signal against the expected samples in another of the at least two sample groups.
Optionally, the method further comprises:
if a matching expected sample is found in the other sample group, determining that the answer given by the user is the expected answer.
Optionally, the method further comprises:
if no matching expected sample is found in the other sample group, determining that the user did not give the expected answer.
Optionally, the method further comprises:
calculating a match degree value between the voice signal and an expected sample; if the match degree value reaches a preset value, determining that the voice signal matches that expected sample, and otherwise determining that the voice signal does not match it.
Optionally, dividing the multiple expected samples corresponding to the expected answer into at least two sample groups comprises:
dividing the multiple expected samples corresponding to the expected answer into at least two sample groups according to their degree of similarity to the expected answer, or according to the probability that the user will reply with each expected sample.
Optionally, matching the voice signal against the expected samples in one of the at least two sample groups comprises:
matching the voice signal against the expected samples in the sample group whose similarity to the expected answer is highest, or against the sample group containing the expected sample the user is most likely to reply with.
Optionally, matching the voice signal against the expected samples in one of the at least two sample groups comprises:
matching the voice signal first against the highest-priority expected sample in the selected sample group.
Optionally, the method further comprises:
according to the historical voice signals collected under the interactive state, correspondingly expanding the number of sample groups corresponding to the expected answer, the number of expected samples contained in a sample group corresponding to the expected answer, or the number of samples in the speech recognition sample library.
Optionally, before the voice signal of the user under the interactive state is collected, the method further comprises:
presenting the interactive state in any one or more of the forms of voice, image and video.
According to another embodiment of the present invention, a speech recognition device for an interactive system is disclosed, the device comprising:
an expected sample determining unit, adapted to determine in advance, in a speech recognition sample library, multiple expected samples corresponding to an interactive state and an expected answer under the interactive state;
a grouping unit, adapted to divide the multiple expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample;
a collecting unit, adapted to collect a voice signal of the user under the interactive state;
a matching processing unit, adapted to match the voice signal against the expected samples in one of the at least two sample groups.
Optionally, the matching processing unit is adapted to determine, when a matching expected sample is found in the sample group, that the answer given by the user is the expected answer.
Optionally, the matching processing unit is adapted to match, when no matching expected sample is found in the sample group, the voice signal against the expected samples in another of the at least two sample groups.
Optionally, the matching processing unit is adapted to determine, when a matching expected sample is found in the other sample group, that the answer given by the user is the expected answer.
Optionally, the matching processing unit is adapted to determine, when no matching expected sample is found in the other sample group, that the user did not give the expected answer.
Optionally, the matching processing unit is adapted to calculate a match degree value between the voice signal and an expected sample, to determine that the voice signal matches that expected sample if the match degree value reaches a preset value, and to determine that it does not match otherwise.
Optionally, the grouping unit is adapted to divide the multiple expected samples corresponding to the expected answer into at least two sample groups according to their degree of similarity to the expected answer, or according to the probability that the user will reply with each expected sample.
Optionally, the matching processing unit is adapted to match the voice signal first against the expected samples in the sample group whose similarity to the expected answer is highest, or against the sample group containing the expected sample the user is most likely to reply with.
Optionally, the matching processing unit is adapted to match the voice signal first against the highest-priority expected sample in the selected sample group.
Optionally, the device further comprises:
an expansion unit, adapted to expand, according to the historical voice signals collected under the interactive state, the number of sample groups corresponding to the expected answer, the number of expected samples contained in a sample group corresponding to the expected answer, or the number of samples in the speech recognition sample library.
Optionally, the device further comprises:
a presentation unit, adapted to present the interactive state in any one or more of the forms of voice, image and video.
In this technical solution of the present invention, the multiple expected samples corresponding to an interactive state and the expected answer under that state are determined in advance in a speech recognition sample library and divided into at least two sample groups, each containing at least one expected sample; the user's voice signal collected under the interactive state is then matched directly against the expected samples in one of the groups. Because the expected samples corresponding to the expected answer are identified and grouped in advance, the matching range is narrowed to one small expected group, which not only increases the speed of speech recognition in the interactive system but also improves its accuracy.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented in accordance with this specification, and in order that the above and other objects, features and advantages of the present invention may become more apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The accompanying drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, identical parts are denoted by identical reference symbols. In the drawings:
Fig. 1 shows a flowchart of a speech recognition method for an interactive system according to an embodiment of the present invention;
Fig. 2 shows a structural diagram of a speech recognition device for an interactive system according to an embodiment of the present invention; and
Fig. 3 shows a structural diagram of a speech recognition device for an interactive system according to another embodiment of the present invention.
Embodiments
Exemplary embodiments of the present disclosure are described below in more detail with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be appreciated that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. On the contrary, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be conveyed completely to those skilled in the art.
Fig. 1 shows a flowchart of a speech recognition method for an interactive system according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step S110: according to an interactive state and an expected answer under the interactive state, determine in advance, in a speech recognition sample library, multiple expected samples corresponding to the interactive state and the expected answer.
In this embodiment of the invention, an interactive state refers to a specific interactive scene, for example a specific question-and-answer scene in a question answering system, or a scene in a display system in which particular content is shown.
Take a specific question-and-answer scene in a question answering system as an example. The system asks: "Can celery leaves be eaten?" The corresponding expected answer is "yes". Given this expected answer, the replies "yes", "sure" and "can" are all correct, so the three samples "yes", "sure" and "can" are selected from the speech recognition sample library as the multiple expected samples corresponding to this interactive state.
Step S120: divide the multiple expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample.
For example, still taking the question "Can celery leaves be eaten?" as the interactive state, the three corresponding expected samples are divided into two groups: "yes" forms the first sample group, while "sure" and "can" form the second sample group.
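The grouping of step S120 can be sketched as follows. This is a minimal illustration, not code from the patent: short strings stand in for stored voice samples, and the helper `group_samples` is a hypothetical name.

```python
def group_samples(samples, expected_answer):
    """Divide expected samples into two groups: exact matches to the
    expected answer first, all remaining paraphrases second."""
    exact = [s for s in samples if s == expected_answer]
    rest = [s for s in samples if s != expected_answer]
    return [exact, rest]

# Three expected samples for the answer "yes", split into two groups.
groups = group_samples(["yes", "sure", "can"], "yes")
# groups[0] is the first sample group, groups[1] the second
```

Grouping by answer probability instead, as described later in the specification, would only change the key used to split the list.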
Step S130: collect the voice signal of the user under the interactive state.
For example, after the question answering system outputs the question, the user's voice signal answering that question is collected, generally by means of a microphone.
Step S140: match the voice signal against the expected samples in one of the at least two sample groups.
For example, the collected voice signal is matched against the expected sample "yes" in the first sample group, or against the expected samples "sure" and "can" in the second sample group.
In the method shown in Fig. 1, the expected samples corresponding to the expected answer are identified and grouped in advance, and the user's voice signal is matched directly against one of the groups. The matching range is therefore narrowed to one small expected group, which not only increases the speed of speech recognition in the interactive system but also improves its accuracy.
In one embodiment of the invention, the method shown in Fig. 1 further comprises: if a matching expected sample is found in the sample group, determining that the answer given by the user is the expected answer. That is, no matter which sample group the voice signal is matched against, as soon as a matching expected sample is found, the answer given by the user is determined to be the expected answer, i.e. the correct answer.
In one embodiment of the invention, the method shown in Fig. 1 further comprises: if no matching expected sample is found in the sample group, matching the voice signal against the expected samples in another of the at least two sample groups. If a matching expected sample is found in that other sample group, the answer given by the user is determined to be the expected answer; if not, the user is determined not to have given the expected answer.
In other words, if no matching expected sample is found in the sample group selected first, another sample group is selected and the voice signal is matched against the expected samples in that second group. If a matching expected sample is found in the second group, the user is determined to have given the correct answer; otherwise, if there are further unselected sample groups, the voice signal is matched against their expected samples in turn, and if none matches, the user is determined not to have given the correct answer.
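The group-by-group search described above might be sketched as below; this is an illustrative outline only, with a `matches` predicate standing in for whatever acoustic comparison the system actually performs.

```python
def recognize(voice_signal, sample_groups, matches):
    """Try each sample group in turn; return True as soon as any
    expected sample in a group matches the voice signal."""
    for group in sample_groups:
        for sample in group:
            if matches(voice_signal, sample):
                return True   # the user gave the expected answer
    return False              # no group contained a matching sample

# Example with string equality standing in for acoustic matching:
groups = [["yes"], ["sure", "can"]]
print(recognize("sure", groups, lambda v, s: v == s))  # True
print(recognize("no", groups, lambda v, s: v == s))    # False
```

Because the first group holds the most likely answers, most recognitions terminate after comparing against a single small group.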
In one embodiment of the invention, the above method further comprises: calculating a match degree value between the voice signal and an expected sample; if the match degree value reaches a preset value, the voice signal is determined to match that expected sample, and otherwise it is determined not to match.
That is, to judge whether the voice signal matches an expected sample, a preset value is set in advance; during matching, the match degree value between the voice signal and the expected sample is calculated and compared with the preset value. The match degree value between the voice signal and the expected sample may be the similarity value of the two voice signals.
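A hedged illustration of this threshold comparison follows. Here `difflib.SequenceMatcher` on strings stands in for a real acoustic similarity measure (a real system would compare feature vectors such as MFCCs), and 0.8 is an arbitrary stand-in for the preset value.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.8  # the "preset value"; the actual value is implementation-specific

def match_degree(signal_a, signal_b):
    """Stand-in similarity score in [0, 1] between two signals."""
    return SequenceMatcher(None, signal_a, signal_b).ratio()

def is_match(voice_signal, expected_sample, threshold=THRESHOLD):
    """The voice signal matches iff the match degree reaches the preset value."""
    return match_degree(voice_signal, expected_sample) >= threshold
```

The design point is simply that matching is a scored comparison against a threshold, not an exact-equality test.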
In one embodiment of the invention, dividing the multiple expected samples corresponding to the expected answer into at least two sample groups in the method shown in Fig. 1 comprises: dividing the multiple expected samples into at least two sample groups according to their degree of similarity to the expected answer, or according to the probability that the user will reply with each expected sample.
For example, the expected answer to "Can celery leaves be eaten?" is "yes". According to this expected answer, the three samples "yes", "sure" and "can" are selected as the three expected samples of this interactive state. The similarity of "yes" to the expected answer is 100%, so it is placed in the first sample group, while "sure" and "can" are not 100% similar and are placed in the second sample group. Alternatively, the probability that the user answers "yes" is 70%, so it is placed in the first sample group, while the probabilities that the user answers "sure" and "can" are 16% and 14% respectively, so they are placed in the second sample group.
In one embodiment of the invention, matching the voice signal against the expected samples in one of the at least two sample groups in the method shown in Fig. 1 comprises: matching the voice signal against the expected samples in the sample group whose similarity to the expected answer is highest, or against the sample group containing the expected sample the user is most likely to reply with.
For example, "yes" in the first sample group is the answer the user is most likely to give (say 70% of users answer "yes"), so the voice signal is first matched against the expected sample "yes" in the first sample group.
In one embodiment of the invention, matching the voice signal against the expected samples in one of the at least two sample groups in the method shown in Fig. 1 comprises: matching the voice signal first against the highest-priority expected sample in the selected sample group.
For example, suppose the voice signal is currently being matched against the expected samples in the aforesaid second sample group, in which the probability that the user answers "sure" is 16% and the probability that the user answers "can" is 14%; "sure" is therefore given a higher priority than "can". The voice signal is matched first against the expected sample "sure" and, if it does not match, then against the expected sample "can".
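The within-group priority ordering can be sketched as follows; the probabilities are the ones from the example above, and the function name and `matches` predicate are illustrative stand-ins, not from the patent.

```python
def match_by_priority(voice_signal, group_with_probs, matches):
    """Try the samples in one group in descending answer probability;
    return the first matching sample, or None if none matches."""
    ordered = sorted(group_with_probs, key=lambda pair: -pair[1])
    for sample, _prob in ordered:
        if matches(voice_signal, sample):
            return sample
    return None

# Second sample group: "sure" (16%) is tried before "can" (14%).
group = [("can", 0.14), ("sure", 0.16)]
print(match_by_priority("sure", group, lambda v, s: v == s))  # sure
```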
In one embodiment of the invention, the method shown in Fig. 1 further comprises: according to the historical voice signals collected under the interactive state, correspondingly expanding the number of sample groups corresponding to the expected answer, the number of expected samples contained in a sample group corresponding to the expected answer, or the number of samples in the speech recognition sample library.
This is because the preset speech recognition sample library may not cover all samples corresponding to the expected answer, so the sample library or the selected sample groups can be supplemented through learning. For example, learning may reveal that some users answer "uh-huh" or "OK", which also express "yes" in human language, so these two samples can be added to the speech recognition sample library, added to a selected sample group, or placed in a newly created sample group. For example, "uh-huh" and "OK" may be added to the aforesaid second sample group, or divided into a third sample group.
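One minimal way to realise this learning step is sketched below, assuming the history is available as recognised and unrecognised answer strings; the function name and the frequency threshold are illustrative assumptions, not details from the patent.

```python
from collections import Counter

def expand_from_history(history_signals, sample_library, recognised, min_count=3):
    """Add frequently observed, previously unrecognised answers from the
    collected history to the sample library."""
    unknown = Counter(s for s in history_signals if s not in recognised)
    for signal, count in unknown.items():
        if count >= min_count and signal not in sample_library:
            sample_library.append(signal)
    return sample_library
```

The same idea applies whether the new samples are appended to the library, to an existing sample group, or to a newly created group.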
In one embodiment of the invention, before the voice signal of the user under the interactive state is collected, the method shown in Fig. 1 further comprises: presenting the interactive state in any one or more of the forms of voice, image and video. For example, the question to be asked or the content to be shown may be presented in any one or more of the forms of voice, image and video.
Fig. 2 shows a structural diagram of a speech recognition device for an interactive system according to an embodiment of the present invention. As shown in Fig. 2, the speech recognition device 200 of the interactive system comprises:
an expected sample determining unit 210, adapted to determine in advance, in a speech recognition sample library, multiple expected samples corresponding to an interactive state and an expected answer under the interactive state;
a grouping unit 220, adapted to divide the multiple expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample;
a collecting unit 230, adapted to collect a voice signal of the user under the interactive state;
a matching processing unit 240, adapted to match the voice signal against the expected samples in one of the at least two sample groups.
In the device shown in Fig. 2, the expected samples corresponding to the expected answer are identified and grouped in advance, and the user's voice signal is matched directly against one of the groups. The matching range is therefore narrowed to one small expected group, which not only increases the speed of speech recognition in the interactive system but also improves its accuracy.
Fig. 3 shows a structural diagram of a speech recognition device for an interactive system according to another embodiment of the present invention. As shown in Fig. 3, the speech recognition device 300 of the interactive system comprises:
an expected sample determining unit 310, adapted to determine in advance, in a speech recognition sample library, multiple expected samples corresponding to an interactive state and an expected answer under the interactive state;
a grouping unit 320, adapted to divide the multiple expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample; the at least two sample groups are stored in the grouping unit 320;
a collecting unit 330, adapted to collect a voice signal of the user under the interactive state;
a matching processing unit 340, adapted to match the voice signal against the expected samples in one of the at least two sample groups.
In one embodiment of the invention, the matching processing unit 340 is adapted to determine, when a matching expected sample is found in the sample group, that the answer given by the user is the expected answer. That is, no matter which sample group the voice signal is matched against, as soon as a matching expected sample is found, the answer given by the user is determined to be the expected answer, i.e. the correct answer.
In one embodiment of the invention, the matching processing unit 340 is adapted to match, when no matching expected sample is found in the sample group, the voice signal against the expected samples in another of the at least two sample groups.
In one embodiment of the invention, the matching processing unit 340 is adapted to determine, when a matching expected sample is found in the other sample group, that the answer given by the user is the expected answer.
In one embodiment of the invention, the matching processing unit 340 is adapted to determine, when no matching expected sample is found in the other sample group, that the user did not give the expected answer.
In other words, if no matching expected sample is found in the sample group selected first, another sample group is selected and the voice signal is matched against the expected samples in that second group. If a matching expected sample is found in the second group, the user is determined to have given the correct answer; otherwise, if there are further unselected sample groups, the voice signal is matched against their expected samples in turn, and if none matches, the user is determined not to have given the correct answer.
In one embodiment of the invention, the matching processing unit 340 is adapted to calculate a match degree value between the voice signal and an expected sample, to determine that the voice signal matches that expected sample if the match degree value reaches a preset value, and to determine that it does not match otherwise. That is, to judge whether the voice signal matches an expected sample, a preset value is set in advance; during matching, the match degree value between the voice signal and the expected sample is calculated and compared with the preset value. The match degree value may be the similarity value of the two voice signals.
In one embodiment of the invention, the grouping unit 320 is adapted to divide the multiple expected samples corresponding to the expected answer into at least two sample groups according to their degree of similarity to the expected answer, or according to the probability that the user will reply with each expected sample. For example, the expected answer to "Can celery leaves be eaten?" is "yes". According to this expected answer, the three samples "yes", "sure" and "can" are selected as the three expected samples of this interactive state. The similarity of "yes" to the expected answer is 100%, so it is placed in the first sample group, while "sure" and "can" are not 100% similar and are placed in the second sample group. Alternatively, the probability that the user answers "yes" is 70%, so it is placed in the first sample group, while the probabilities that the user answers "sure" and "can" are 16% and 14% respectively, so they are placed in the second sample group.
In one embodiment of the invention, the matching processing unit 340 is adapted to match the voice signal first against the expected samples in the sample group whose similarity to the expected answer is highest, or against the sample group containing the expected sample the user is most likely to reply with. For example, "yes" in the first sample group is the answer the user is most likely to give (say 70% of users answer "yes"), so the voice signal is first matched against the expected sample "yes" in the first sample group.
In one embodiment of the invention, the matching processing unit 340 is adapted to match the voice signal first against the highest-priority expected sample in the selected sample group. For example, suppose the voice signal is currently being matched against the expected samples in the aforesaid second sample group, in which the probability that the user answers "sure" is 16% and the probability that the user answers "can" is 14%; "sure" is therefore given a higher priority than "can". The voice signal is matched first against the expected sample "sure" and, if it does not match, then against the expected sample "can".
In one embodiment of the invention, the device 300 further comprises an expansion unit 350, adapted to expand, according to collected historical voice signals under the interaction state, the number of sample groups corresponding to the expected answer, or the number of expected samples contained in a sample group corresponding to the expected answer, or the number of samples in the speech recognition sample library. This is because the preset speech recognition sample library may not cover every sample corresponding to an expected answer, so the library or the selected sample groups can be supplemented through learning. For example, learning may reveal that some users answer "uh-huh" or "OK", which in human linguistic context also convey the meaning of "yes"; these two samples can then be added to the speech recognition sample library, appended to a selected sample group, or placed in a newly created sample group. For instance, "uh-huh" and "OK" may be added to the aforementioned second sample group, or split off into a third sample group.
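A minimal sketch of this expansion step, under the assumption that the library is a flat list of samples and the groups are ordered lists; the learned words ("uh-huh", "OK") come from the example in the text, and the choice between extending the last group or opening a new one mirrors the two alternatives described above.

```python
def expand_sample_library(library, groups, learned_samples, new_group=False):
    """Add newly learned replies to the sample library, then either append
    them to the last existing sample group or open a new sample group."""
    for s in learned_samples:
        if s not in library:
            library.append(s)
    if new_group:
        groups.append(list(learned_samples))
    else:
        groups[-1].extend(learned_samples)
    return library, groups

library = ["yes", "can", "able"]
groups = [["yes"], ["can", "able"]]
library, groups = expand_sample_library(library, groups, ["uh-huh", "OK"], new_group=True)
print(groups)  # [['yes'], ['can', 'able'], ['uh-huh', 'OK']]
```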
In one embodiment of the invention, the device 300 further comprises a presentation unit 360, adapted to present the interaction state in the form of any one or more of voice, image, and video. For example, a question, or the content to be displayed, may be presented through any combination of voice, image, and video.
It should be noted that:
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The structure required to construct such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It should be understood that a variety of programming languages may be used to implement the teachings of the invention as described herein, and the above description of a specific language was given in order to disclose the best mode of the invention.
In the specification provided herein, numerous specific details are set forth. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the devices of an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the speech recognition device of the interaction system according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (e.g., a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-described embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
The invention discloses the audio recognition method of A1, a kind of interactive system, wherein, the method comprises:
According to the expection answer under interactive state and described interactive state, in speech recognition Sample Storehouse, pre-determine the multiple expection samples corresponding with interactive state and described expection answer;
Multiple expection samples corresponding for described expection answer are divided at least two sample groups, in each sample group, at least comprise an expection sample;
Gather the voice signal of the user under described interactive state;
Described voice signal is mated with the expection sample in a sample group at least two sample groups.
A2, method as described in A1, wherein, the method comprises further:
The expection sample mated if find in described sample group, then determine that the answer that user provides is this expection answer.
A3, method as described in A1, wherein, the method comprises further:
The expection sample mated if do not find in described sample group, then mate described voice signal with the expection sample in another sample group in described at least two sample groups.
A4, method as described in A3, wherein, the method comprises further:
The expection sample mated if find in another sample group described, then determine that the answer that user provides is this expection answer.
A5, method as described in A3, wherein, the method comprises further:
The expection sample mated if do not find in another sample group described, then determine that user does not provide expection answer.
A6, method according to any one of A2-A5, wherein, the method comprises further:
Calculate described voice signal with expection sample mate angle value, if coupling angle value reaches preset value, then determine described voice signal and this expection sample matches, if instead coupling angle value does not reach preset value, then determine that described voice signal does not mate with this expection sample.
Multiple expection samples corresponding for described expection answer wherein, are describedly divided at least two sample groups and comprise by A7, method as described in A1:
By multiple expection samples corresponding for described expection answer, be at least two sample groups according to the different demarcation of the similarity degree with described expection answer, or the different demarcation of the probability of the expection answer that may reply according to user is at least two sample groups.
A8, method as described in A7, wherein, described voice signal is carried out mating comprising with the expection sample in a sample group at least two sample groups:
Described voice signal is mated with the expection sample in the sample group the highest with the similarity degree of described expection answer at least two sample groups, or described voice signal is mated with the sample group comprising the highest expection answer of probability that user may reply at least two sample groups.
A9, method as described in A1, wherein, described voice signal is carried out mating comprising with the expection sample in a sample group at least two sample groups:
Expection sample the highest with a sample group medium priority at least two sample groups for described voice signal is mated.
A10, method as described in A1, wherein, the method comprises further:
According to the history voice signal under this gathered interactive state, the sample group quantity that corresponding expansion described expection answer is corresponding, or, the expection sample size that a sample group corresponding to corresponding expansion described expection answer comprises, or the sample size in the described speech recognition Sample Storehouse of corresponding expansion.
A11, method as described in A1, wherein, before the voice signal gathering the user under described interactive state, the method comprises further:
By in conjunction with any one or more form in voice, image and video, represent interactive state.
The invention also discloses B12, a speech recognition device for an interaction system, wherein the device comprises:
an expected-sample determining unit, adapted to predetermine, according to an interaction state and an expected answer under the interaction state, multiple expected samples corresponding to the interaction state and the expected answer in a speech recognition sample library;
a grouping unit, adapted to divide the multiple expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample;
a collection unit, adapted to collect a voice signal of a user under the interaction state;
a matching processing unit, adapted to match the voice signal against the expected samples in one of the at least two sample groups.
B13, the device of B12, wherein:
the matching processing unit is adapted to determine, when a matching expected sample is found in said sample group, that the answer given by the user is the expected answer.
B14, the device of B12, wherein:
the matching processing unit is adapted to match, when no matching expected sample is found in said sample group, the voice signal against the expected samples in another of the at least two sample groups.
B15, the device of B14, wherein:
the matching processing unit is adapted to determine, when a matching expected sample is found in said other sample group, that the answer given by the user is the expected answer.
B16, the device of B14, wherein:
the matching processing unit is adapted to determine, when no matching expected sample is found in said other sample group, that the user has not given the expected answer.
B17, the device of any one of B13-B16, wherein:
the matching processing unit is adapted to calculate a match degree value between the voice signal and an expected sample; if the match degree value reaches a preset value, it determines that the voice signal matches the expected sample; conversely, if the match degree value does not reach the preset value, it determines that the voice signal does not match the expected sample.
B18, the device of B12, wherein:
the grouping unit is adapted to divide the multiple expected samples corresponding to the expected answer into at least two sample groups according to their differing degrees of similarity to the expected answer, or into at least two sample groups according to the differing probabilities that the user will reply with each expected answer.
B19, the device of B18, wherein:
the matching processing unit is adapted to first match the voice signal against the expected samples in the sample group, among the at least two sample groups, whose degree of similarity to the expected answer is highest, or against the sample group, among the at least two sample groups, containing the expected answer the user is most likely to reply with.
B20, the device of B12, wherein:
the matching processing unit is adapted to first match the voice signal against the highest-priority expected sample within one of the at least two sample groups.
B21, the device of B12, wherein the device further comprises:
an expansion unit, adapted to expand, according to collected historical voice signals under the interaction state, the number of sample groups corresponding to the expected answer, or the number of expected samples contained in a sample group corresponding to the expected answer, or the number of samples in the speech recognition sample library.
B22, the device of B12, wherein the device further comprises:
a presentation unit, adapted to present the interaction state in the form of any one or more of voice, image, and video.
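The threshold test described in A6/B17 — match only when a computed match degree reaches a preset value — can be sketched as follows. The patent leaves the degree computation unspecified, so a string-similarity ratio stands in for it here, and the 0.8 threshold is an illustrative assumption.

```python
from difflib import SequenceMatcher

def match_degree(recognized_text, expected_sample):
    """Toy match-degree computation: string similarity between the
    recognized text and the expected sample stands in for the real
    (unspecified) acoustic/textual comparison. Returns a value in [0, 1]."""
    return SequenceMatcher(None, recognized_text, expected_sample).ratio()

def is_match(recognized_text, expected_sample, preset=0.8):
    # A6/B17: the signal matches only if the degree value reaches the preset value.
    return match_degree(recognized_text, expected_sample) >= preset

print(is_match("yes", "yes"))  # True  (degree 1.0)
print(is_match("yet", "no"))   # False (no overlap, degree 0.0)
```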

Claims (10)

1. A speech recognition method for an interaction system, wherein the method comprises:
predetermining, according to an interaction state and an expected answer under the interaction state, multiple expected samples corresponding to the interaction state and the expected answer in a speech recognition sample library;
dividing the multiple expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample;
collecting a voice signal of a user under the interaction state;
matching the voice signal against the expected samples in one of the at least two sample groups.
2. The method of claim 1, wherein the method further comprises:
if a matching expected sample is found in said sample group, determining that the answer given by the user is the expected answer.
3. The method of claim 1, wherein the method further comprises:
if no matching expected sample is found in said sample group, matching the voice signal against the expected samples in another of the at least two sample groups.
4. The method of claim 3, wherein the method further comprises:
if a matching expected sample is found in said other sample group, determining that the answer given by the user is the expected answer.
5. The method of claim 3, wherein the method further comprises:
if no matching expected sample is found in said other sample group, determining that the user has not given the expected answer.
6. A speech recognition device for an interaction system, wherein the device comprises:
an expected-sample determining unit, adapted to predetermine, according to an interaction state and an expected answer under the interaction state, multiple expected samples corresponding to the interaction state and the expected answer in a speech recognition sample library;
a grouping unit, adapted to divide the multiple expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample;
a collection unit, adapted to collect a voice signal of a user under the interaction state;
a matching processing unit, adapted to match the voice signal against the expected samples in one of the at least two sample groups.
7. The device of claim 6, wherein:
the matching processing unit is adapted to determine, when a matching expected sample is found in said sample group, that the answer given by the user is the expected answer.
8. The device of claim 6, wherein:
the matching processing unit is adapted to match, when no matching expected sample is found in said sample group, the voice signal against the expected samples in another of the at least two sample groups.
9. The device of claim 8, wherein:
the matching processing unit is adapted to determine, when a matching expected sample is found in said other sample group, that the answer given by the user is the expected answer.
10. The device of claim 8, wherein:
the matching processing unit is adapted to determine, when no matching expected sample is found in said other sample group, that the user has not given the expected answer.
CN201510463527.2A 2015-07-31 2015-07-31 Speech recognition method and speech recognition device for interaction system Pending CN105161098A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510463527.2A CN105161098A (en) 2015-07-31 2015-07-31 Speech recognition method and speech recognition device for interaction system
PCT/CN2016/092412 WO2017020794A1 (en) 2015-07-31 2016-07-29 Voice recognition method applicable to interactive system and device utilizing same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510463527.2A CN105161098A (en) 2015-07-31 2015-07-31 Speech recognition method and speech recognition device for interaction system

Publications (1)

Publication Number Publication Date
CN105161098A true CN105161098A (en) 2015-12-16

Family

ID=54801931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510463527.2A Pending CN105161098A (en) 2015-07-31 2015-07-31 Speech recognition method and speech recognition device for interaction system

Country Status (2)

Country Link
CN (1) CN105161098A (en)
WO (1) WO2017020794A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105771234A (en) * 2016-04-02 2016-07-20 深圳市熙龙玩具有限公司 Riddle guessing toy and implementation method thereof
WO2017020794A1 (en) * 2015-07-31 2017-02-09 北京奇虎科技有限公司 Voice recognition method applicable to interactive system and device utilizing same
CN110706536A (en) * 2019-10-25 2020-01-17 北京猿力未来科技有限公司 Voice answering method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1458645A (en) * 2002-05-15 2003-11-26 日本先锋公司 Voice identification equipment and voice identification program
CN103280218A (en) * 2012-12-31 2013-09-04 威盛电子股份有限公司 Voice recognition-based selection method and mobile terminal device and information system thereof
US20140093056A1 (en) * 2011-03-18 2014-04-03 Fujitsu Limited Call evaluation device and call evaluation method
CN104021786A (en) * 2014-05-15 2014-09-03 北京中科汇联信息技术有限公司 Speech recognition method and speech recognition device
CN104064062A (en) * 2014-06-23 2014-09-24 中国石油大学(华东) On-line listening learning method and system based on voiceprint and voice recognition
CN104424290A (en) * 2013-09-02 2015-03-18 佳能株式会社 Voice based question-answering system and method for interactive voice system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10308611A1 (en) * 2003-02-27 2004-09-16 Siemens Ag Determination of the likelihood of confusion between vocabulary entries in phoneme-based speech recognition
US10319363B2 (en) * 2012-02-17 2019-06-11 Microsoft Technology Licensing, Llc Audio human interactive proof based on text-to-speech and semantics
CN102881284B (en) * 2012-09-03 2014-07-09 江苏大学 Unspecific human voice and emotion recognition method and system
CN103794214A (en) * 2014-03-07 2014-05-14 联想(北京)有限公司 Information processing method, device and electronic equipment
CN104809103B (en) * 2015-04-29 2018-03-30 北京京东尚科信息技术有限公司 A kind of interactive semantic analysis and system
CN105161098A (en) * 2015-07-31 2015-12-16 北京奇虎科技有限公司 Speech recognition method and speech recognition device for interaction system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017020794A1 (en) * 2015-07-31 2017-02-09 北京奇虎科技有限公司 Voice recognition method applicable to interactive system and device utilizing same
CN105771234A (en) * 2016-04-02 2016-07-20 深圳市熙龙玩具有限公司 Riddle guessing toy and implementation method thereof
CN110706536A (en) * 2019-10-25 2020-01-17 北京猿力未来科技有限公司 Voice answering method and device
CN110706536B (en) * 2019-10-25 2021-10-01 北京猿力教育科技有限公司 Voice answering method and device

Also Published As

Publication number Publication date
WO2017020794A1 (en) 2017-02-09

Similar Documents

Publication Publication Date Title
AU2018200165A1 (en) Phrasecut: segmentation using natural language inputs
CN103491205A (en) Related resource address push method and device based on video retrieval
CN108172213B (en) Surge audio identification method, surge audio identification device, surge audio identification equipment and computer readable medium
CN105515900B (en) A kind of method and device obtaining terminal presence
CN108875486A (en) Recongnition of objects method, apparatus, system and computer-readable medium
CN105161098A (en) Speech recognition method and speech recognition device for interaction system
CN105893390B (en) Application processing method and electronic equipment
US11544543B2 (en) Apparatus and method for sparse training acceleration in neural networks
US10629205B2 (en) Identifying an accurate transcription from probabilistic inputs
CN103488787B (en) A kind of method for pushing and device of the online broadcasting entrance object based on video search
US11282502B2 (en) Method for utterance generation, smart device, and computer readable storage medium
CN104933747A (en) Method and device for converting vector animation into bitmap animation
CN103605696B (en) Method and device for acquiring audio-video file addresses
CN104036259A (en) Face similarity recognition method and system
CN103942264A (en) Method and device for pushing webpages containing news information
CN105893548A (en) Naming method and terminal
CN105617657A (en) Intelligent game recommendation method and device
CN107562847A (en) Information processing method and related product
CN102968445B (en) Based on the application call method and apparatus of browser input
CN104781698A (en) System and method for isolating signal in seismic data
CN105161105A (en) Speech recognition method and speech recognition device for interaction system
CN109509028A (en) A kind of advertisement placement method and device, storage medium, computer equipment
CN106503010A (en) A kind of method and device of database change write subregion
CN105224649A (en) A kind of data processing method and device
CN110580255A (en) method and system for storing and retrieving data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20151216

RJ01 Rejection of invention patent application after publication