CN105161098A - Speech recognition method and speech recognition device for interaction system - Google Patents

Speech recognition method and speech recognition device for interaction system

Info

Publication number
CN105161098A
CN105161098A (application CN201510463527.2A)
Authority
CN
China
Prior art keywords
sample
expected
answer
voice signal
sample group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510463527.2A
Other languages
Chinese (zh)
Inventor
齐路
韩笑
苑一时
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd, Qizhi Software Beijing Co Ltd filed Critical Beijing Qihoo Technology Co Ltd
Priority to CN201510463527.2A priority Critical patent/CN105161098A/en
Publication of CN105161098A publication Critical patent/CN105161098A/en
Priority to PCT/CN2016/092412 priority patent/WO2017020794A1/en
Pending legal-status Critical Current

Links

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Abstract

The invention discloses a speech recognition method and a speech recognition device for an interaction system. The method comprises the following steps: pre-determining multiple expected samples corresponding to an interaction state and an expected answer under the interaction state in a speech recognition sample library according to the interaction state and the expected answer; dividing the multiple expected samples corresponding to the expected answer into at least two sample groups, wherein each sample group at least includes one expected sample; acquiring a voice signal of a user under the interaction state; and matching the voice signal with the expected sample(s) in one of the at least two sample groups. By adopting the technical scheme of the invention, the speech recognition speed and accuracy of the interaction system can be improved.

Description

Speech recognition method and device for an interactive system
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a speech recognition method and device for an interactive system.
Background technology
With the development of multimedia technology, many interactive systems have adopted voice interaction to improve the efficiency and enjoyment of user interaction. A question answering system, for example, first presents a question to the user by voice or on-screen display, and the user then answers by voice. Similarly, some display systems require the user to issue a voice instruction to select which catalogue of content to display. In these scenarios the user's speech must be recognised accurately: only then can the question answering system judge whether the spoken answer is correct, and the display system determine which catalogue the user actually selected, so that the content of the corresponding catalogue can be shown.
Recognising the user's speech in an interactive system both accurately and quickly is therefore a problem in urgent need of a solution.
Summary of the invention
In view of the above problems, the present invention is proposed in order to provide a speech recognition method and device for an interactive system that overcome, or at least partly solve, the above problems.
According to one aspect of the present invention, a speech recognition method for an interactive system is provided, the method comprising:
according to an interactive state and an expected answer under the interactive state, determining in advance, in a speech recognition sample library, multiple expected samples corresponding to the interactive state and the expected answer;
dividing the multiple expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample;
collecting a voice signal of the user under the interactive state;
matching the voice signal against the expected samples in one of the at least two sample groups.
Optionally, the method further comprises:
if a matching expected sample is found in the sample group, determining that the answer given by the user is the expected answer.
Optionally, the method further comprises:
if no matching expected sample is found in the sample group, matching the voice signal against the expected samples in another of the at least two sample groups.
Optionally, the method further comprises:
if a matching expected sample is found in the other sample group, determining that the answer given by the user is the expected answer.
Optionally, the method further comprises:
if no matching expected sample is found in the other sample group, determining that the user did not give the expected answer.
Optionally, the method further comprises:
calculating a match degree value between the voice signal and an expected sample; if the match degree value reaches a preset value, determining that the voice signal matches that expected sample, and otherwise determining that the voice signal does not match it.
Optionally, dividing the multiple expected samples corresponding to the expected answer into at least two sample groups comprises:
dividing the multiple expected samples corresponding to the expected answer into at least two sample groups according to their degree of similarity to the expected answer, or according to the probability that the user will reply with each expected sample.
Optionally, matching the voice signal against the expected samples in one of the at least two sample groups comprises:
matching the voice signal against the expected samples in the sample group whose similarity to the expected answer is highest, or against the sample group containing the expected sample the user is most likely to reply with.
Optionally, matching the voice signal against the expected samples in one of the at least two sample groups comprises:
matching the voice signal first against the highest-priority expected sample in the selected sample group.
Optionally, the method further comprises:
according to the historical voice signals collected under the interactive state, correspondingly expanding the number of sample groups corresponding to the expected answer, the number of expected samples contained in a sample group corresponding to the expected answer, or the number of samples in the speech recognition sample library.
Optionally, before the voice signal of the user under the interactive state is collected, the method further comprises:
presenting the interactive state in any one or more of the forms of voice, image and video.
According to another embodiment of the present invention, a speech recognition device for an interactive system is disclosed, the device comprising:
an expected sample determining unit, adapted to determine in advance, in a speech recognition sample library, multiple expected samples corresponding to an interactive state and an expected answer under the interactive state;
a grouping unit, adapted to divide the multiple expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample;
a collecting unit, adapted to collect a voice signal of the user under the interactive state;
a matching processing unit, adapted to match the voice signal against the expected samples in one of the at least two sample groups.
Optionally, the matching processing unit is adapted to determine, when a matching expected sample is found in the sample group, that the answer given by the user is the expected answer.
Optionally, the matching processing unit is adapted to match, when no matching expected sample is found in the sample group, the voice signal against the expected samples in another of the at least two sample groups.
Optionally, the matching processing unit is adapted to determine, when a matching expected sample is found in the other sample group, that the answer given by the user is the expected answer.
Optionally, the matching processing unit is adapted to determine, when no matching expected sample is found in the other sample group, that the user did not give the expected answer.
Optionally, the matching processing unit is adapted to calculate a match degree value between the voice signal and an expected sample, to determine that the voice signal matches that expected sample if the match degree value reaches a preset value, and to determine that it does not match otherwise.
Optionally, the grouping unit is adapted to divide the multiple expected samples corresponding to the expected answer into at least two sample groups according to their degree of similarity to the expected answer, or according to the probability that the user will reply with each expected sample.
Optionally, the matching processing unit is adapted to match the voice signal first against the expected samples in the sample group whose similarity to the expected answer is highest, or against the sample group containing the expected sample the user is most likely to reply with.
Optionally, the matching processing unit is adapted to match the voice signal first against the highest-priority expected sample in the selected sample group.
Optionally, the device further comprises:
an expansion unit, adapted to expand, according to the historical voice signals collected under the interactive state, the number of sample groups corresponding to the expected answer, the number of expected samples contained in a sample group corresponding to the expected answer, or the number of samples in the speech recognition sample library.
Optionally, the device further comprises:
a presentation unit, adapted to present the interactive state in any one or more of the forms of voice, image and video.
In this technical solution of the present invention, the multiple expected samples corresponding to an interactive state and the expected answer under that state are determined in advance in a speech recognition sample library and divided into at least two sample groups, each containing at least one expected sample; the user's voice signal collected under the interactive state is then matched directly against the expected samples in one of the groups. Because the expected samples corresponding to the expected answer are identified and grouped in advance, the matching range is narrowed to one small expected group, which not only increases the speed of speech recognition in the interactive system but also improves its accuracy.
The above description is only an overview of the technical solution of the present invention. In order that the technical means of the present invention may be understood more clearly and implemented in accordance with this specification, and in order that the above and other objects, features and advantages of the present invention may become more apparent, specific embodiments of the present invention are set out below.
Brief description of the drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The accompanying drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of the present invention. Throughout the drawings, identical parts are denoted by identical reference symbols. In the drawings:
Fig. 1 shows a flowchart of a speech recognition method for an interactive system according to an embodiment of the present invention;
Fig. 2 shows a structural diagram of a speech recognition device for an interactive system according to an embodiment of the present invention; and
Fig. 3 shows a structural diagram of a speech recognition device for an interactive system according to another embodiment of the present invention.
Embodiments
Exemplary embodiments of the present disclosure are described below in more detail with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be appreciated that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. On the contrary, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be conveyed completely to those skilled in the art.
Fig. 1 shows a flowchart of a speech recognition method for an interactive system according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step S110: according to an interactive state and an expected answer under the interactive state, determine in advance, in a speech recognition sample library, multiple expected samples corresponding to the interactive state and the expected answer.
In this embodiment of the invention, an interactive state refers to a specific interactive scene, for example a specific question-and-answer scene in a question answering system, or a scene in a display system in which particular content is shown.
Take a specific question-and-answer scene in a question answering system as an example. The system asks: "Can celery leaves be eaten?" The corresponding expected answer is "yes". Given this expected answer, the replies "yes", "sure" and "can" are all correct, so the three samples "yes", "sure" and "can" are selected from the speech recognition sample library as the multiple expected samples corresponding to this interactive state.
Step S120: divide the multiple expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample.
For example, still taking the question "Can celery leaves be eaten?" as the interactive state, the three corresponding expected samples are divided into two groups: "yes" forms the first sample group, while "sure" and "can" form the second sample group.
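The grouping of step S120 can be sketched as follows. This is a minimal illustration, not code from the patent: short strings stand in for stored voice samples, and the helper `group_samples` is a hypothetical name.

```python
def group_samples(samples, expected_answer):
    """Divide expected samples into two groups: exact matches to the
    expected answer first, all remaining paraphrases second."""
    exact = [s for s in samples if s == expected_answer]
    rest = [s for s in samples if s != expected_answer]
    return [exact, rest]

# Three expected samples for the answer "yes", split into two groups.
groups = group_samples(["yes", "sure", "can"], "yes")
# groups[0] is the first sample group, groups[1] the second
```

Grouping by answer probability instead, as described later in the specification, would only change the key used to split the list.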
Step S130: collect the voice signal of the user under the interactive state.
For example, after the question answering system outputs the question, the user's voice signal answering that question is collected, generally by means of a microphone.
Step S140: match the voice signal against the expected samples in one of the at least two sample groups.
For example, the collected voice signal is matched against the expected sample "yes" in the first sample group, or against the expected samples "sure" and "can" in the second sample group.
In the method shown in Fig. 1, the expected samples corresponding to the expected answer are identified and grouped in advance, and the user's voice signal is matched directly against one of the groups. The matching range is therefore narrowed to one small expected group, which not only increases the speed of speech recognition in the interactive system but also improves its accuracy.
In one embodiment of the invention, the method shown in Fig. 1 further comprises: if a matching expected sample is found in the sample group, determining that the answer given by the user is the expected answer. That is, no matter which sample group the voice signal is matched against, as soon as a matching expected sample is found, the answer given by the user is determined to be the expected answer, i.e. the correct answer.
In one embodiment of the invention, the method shown in Fig. 1 further comprises: if no matching expected sample is found in the sample group, matching the voice signal against the expected samples in another of the at least two sample groups. If a matching expected sample is found in that other sample group, the answer given by the user is determined to be the expected answer; if not, the user is determined not to have given the expected answer.
In other words, if no matching expected sample is found in the sample group selected first, another sample group is selected and the voice signal is matched against the expected samples in that second group. If a matching expected sample is found in the second group, the user is determined to have given the correct answer; otherwise, if there are further unselected sample groups, the voice signal is matched against their expected samples in turn, and if none matches, the user is determined not to have given the correct answer.
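The group-by-group search described above might be sketched as below; this is an illustrative outline only, with a `matches` predicate standing in for whatever acoustic comparison the system actually performs.

```python
def recognize(voice_signal, sample_groups, matches):
    """Try each sample group in turn; return True as soon as any
    expected sample in a group matches the voice signal."""
    for group in sample_groups:
        for sample in group:
            if matches(voice_signal, sample):
                return True   # the user gave the expected answer
    return False              # no group contained a matching sample

# Example with string equality standing in for acoustic matching:
groups = [["yes"], ["sure", "can"]]
print(recognize("sure", groups, lambda v, s: v == s))  # True
print(recognize("no", groups, lambda v, s: v == s))    # False
```

Because the first group holds the most likely answers, most recognitions terminate after comparing against a single small group.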
In one embodiment of the invention, the above method further comprises: calculating a match degree value between the voice signal and an expected sample; if the match degree value reaches a preset value, the voice signal is determined to match that expected sample, and otherwise it is determined not to match.
That is, to judge whether the voice signal matches an expected sample, a preset value is set in advance; during matching, the match degree value between the voice signal and the expected sample is calculated and compared with the preset value. The match degree value between the voice signal and the expected sample may be the similarity value of the two voice signals.
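A hedged illustration of this threshold comparison follows. Here `difflib.SequenceMatcher` on strings stands in for a real acoustic similarity measure (a real system would compare feature vectors such as MFCCs), and 0.8 is an arbitrary stand-in for the preset value.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.8  # the "preset value"; the actual value is implementation-specific

def match_degree(signal_a, signal_b):
    """Stand-in similarity score in [0, 1] between two signals."""
    return SequenceMatcher(None, signal_a, signal_b).ratio()

def is_match(voice_signal, expected_sample, threshold=THRESHOLD):
    """The voice signal matches iff the match degree reaches the preset value."""
    return match_degree(voice_signal, expected_sample) >= threshold
```

The design point is simply that matching is a scored comparison against a threshold, not an exact-equality test.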
In one embodiment of the invention, dividing the multiple expected samples corresponding to the expected answer into at least two sample groups in the method shown in Fig. 1 comprises: dividing the multiple expected samples into at least two sample groups according to their degree of similarity to the expected answer, or according to the probability that the user will reply with each expected sample.
For example, the expected answer to "Can celery leaves be eaten?" is "yes". According to this expected answer, the three samples "yes", "sure" and "can" are selected as the three expected samples of this interactive state. The similarity of "yes" to the expected answer is 100%, so it is placed in the first sample group, while "sure" and "can" are not 100% similar and are placed in the second sample group. Alternatively, the probability that the user answers "yes" is 70%, so it is placed in the first sample group, while the probabilities that the user answers "sure" and "can" are 16% and 14% respectively, so they are placed in the second sample group.
In one embodiment of the invention, matching the voice signal against the expected samples in one of the at least two sample groups in the method shown in Fig. 1 comprises: matching the voice signal against the expected samples in the sample group whose similarity to the expected answer is highest, or against the sample group containing the expected sample the user is most likely to reply with.
For example, "yes" in the first sample group is the answer the user is most likely to give (say 70% of users answer "yes"), so the voice signal is first matched against the expected sample "yes" in the first sample group.
In one embodiment of the invention, matching the voice signal against the expected samples in one of the at least two sample groups in the method shown in Fig. 1 comprises: matching the voice signal first against the highest-priority expected sample in the selected sample group.
For example, suppose the voice signal is currently being matched against the expected samples in the aforesaid second sample group, in which the probability that the user answers "sure" is 16% and the probability that the user answers "can" is 14%; "sure" is therefore given a higher priority than "can". The voice signal is matched first against the expected sample "sure" and, if it does not match, then against the expected sample "can".
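The within-group priority ordering can be sketched as follows; the probabilities are the ones from the example above, and the function name and `matches` predicate are illustrative stand-ins, not from the patent.

```python
def match_by_priority(voice_signal, group_with_probs, matches):
    """Try the samples in one group in descending answer probability;
    return the first matching sample, or None if none matches."""
    ordered = sorted(group_with_probs, key=lambda pair: -pair[1])
    for sample, _prob in ordered:
        if matches(voice_signal, sample):
            return sample
    return None

# Second sample group: "sure" (16%) is tried before "can" (14%).
group = [("can", 0.14), ("sure", 0.16)]
print(match_by_priority("sure", group, lambda v, s: v == s))  # sure
```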
In one embodiment of the invention, the method shown in Fig. 1 further comprises: according to the historical voice signals collected under the interactive state, correspondingly expanding the number of sample groups corresponding to the expected answer, the number of expected samples contained in a sample group corresponding to the expected answer, or the number of samples in the speech recognition sample library.
This is because the preset speech recognition sample library may not cover all samples corresponding to the expected answer, so the sample library or the selected sample groups can be supplemented through learning. For example, learning may reveal that some users answer "uh-huh" or "OK", which also express "yes" in human language, so these two samples can be added to the speech recognition sample library, added to a selected sample group, or placed in a newly created sample group. For example, "uh-huh" and "OK" may be added to the aforesaid second sample group, or divided into a third sample group.
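One minimal way to realise this learning step is sketched below, assuming the history is available as recognised and unrecognised answer strings; the function name and the frequency threshold are illustrative assumptions, not details from the patent.

```python
from collections import Counter

def expand_from_history(history_signals, sample_library, recognised, min_count=3):
    """Add frequently observed, previously unrecognised answers from the
    collected history to the sample library."""
    unknown = Counter(s for s in history_signals if s not in recognised)
    for signal, count in unknown.items():
        if count >= min_count and signal not in sample_library:
            sample_library.append(signal)
    return sample_library
```

The same idea applies whether the new samples are appended to the library, to an existing sample group, or to a newly created group.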
In one embodiment of the invention, before the voice signal of the user under the interactive state is collected, the method shown in Fig. 1 further comprises: presenting the interactive state in any one or more of the forms of voice, image and video. For example, the question to be asked or the content to be shown may be presented in any one or more of the forms of voice, image and video.
Fig. 2 shows a structural diagram of a speech recognition device for an interactive system according to an embodiment of the present invention. As shown in Fig. 2, the speech recognition device 200 of the interactive system comprises:
an expected sample determining unit 210, adapted to determine in advance, in a speech recognition sample library, multiple expected samples corresponding to an interactive state and an expected answer under the interactive state;
a grouping unit 220, adapted to divide the multiple expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample;
a collecting unit 230, adapted to collect a voice signal of the user under the interactive state;
a matching processing unit 240, adapted to match the voice signal against the expected samples in one of the at least two sample groups.
In the device shown in Fig. 2, the expected samples corresponding to the expected answer are identified and grouped in advance, and the user's voice signal is matched directly against one of the groups. The matching range is therefore narrowed to one small expected group, which not only increases the speed of speech recognition in the interactive system but also improves its accuracy.
Fig. 3 shows a structural diagram of a speech recognition device for an interactive system according to another embodiment of the present invention. As shown in Fig. 3, the speech recognition device 300 of the interactive system comprises:
an expected sample determining unit 310, adapted to determine in advance, in a speech recognition sample library, multiple expected samples corresponding to an interactive state and an expected answer under the interactive state;
a grouping unit 320, adapted to divide the multiple expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample; the at least two sample groups are stored in the grouping unit 320;
a collecting unit 330, adapted to collect a voice signal of the user under the interactive state;
a matching processing unit 340, adapted to match the voice signal against the expected samples in one of the at least two sample groups.
In one embodiment of the invention, the matching processing unit 340 is adapted to determine, when a matching expected sample is found in the sample group, that the answer given by the user is the expected answer. That is, no matter which sample group the voice signal is matched against, as soon as a matching expected sample is found, the answer given by the user is determined to be the expected answer, i.e. the correct answer.
In one embodiment of the invention, the matching processing unit 340 is adapted to match, when no matching expected sample is found in the sample group, the voice signal against the expected samples in another of the at least two sample groups.
In one embodiment of the invention, the matching processing unit 340 is adapted to determine, when a matching expected sample is found in the other sample group, that the answer given by the user is the expected answer.
In one embodiment of the invention, the matching processing unit 340 is adapted to determine, when no matching expected sample is found in the other sample group, that the user did not give the expected answer.
In other words, if no matching expected sample is found in the sample group selected first, another sample group is selected and the voice signal is matched against the expected samples in that second group. If a matching expected sample is found in the second group, the user is determined to have given the correct answer; otherwise, if there are further unselected sample groups, the voice signal is matched against their expected samples in turn, and if none matches, the user is determined not to have given the correct answer.
In one embodiment of the invention, the matching processing unit 340 is adapted to calculate a match degree value between the voice signal and an expected sample, to determine that the voice signal matches that expected sample if the match degree value reaches a preset value, and to determine that it does not match otherwise. That is, to judge whether the voice signal matches an expected sample, a preset value is set in advance; during matching, the match degree value between the voice signal and the expected sample is calculated and compared with the preset value. The match degree value may be the similarity value of the two voice signals.
In one embodiment of the invention, the grouping unit 320 is adapted to divide the multiple expected samples corresponding to the expected answer into at least two sample groups according to their degree of similarity to the expected answer, or according to the probability that the user will reply with each expected sample. For example, the expected answer to "Can celery leaves be eaten?" is "yes". According to this expected answer, the three samples "yes", "sure" and "can" are selected as the three expected samples of this interactive state. The similarity of "yes" to the expected answer is 100%, so it is placed in the first sample group, while "sure" and "can" are not 100% similar and are placed in the second sample group. Alternatively, the probability that the user answers "yes" is 70%, so it is placed in the first sample group, while the probabilities that the user answers "sure" and "can" are 16% and 14% respectively, so they are placed in the second sample group.
In one embodiment of the invention, the matching processing unit 340 is adapted to match the voice signal first against the expected samples in the sample group whose similarity to the expected answer is highest, or against the sample group containing the expected sample the user is most likely to reply with. For example, "yes" in the first sample group is the answer the user is most likely to give (say 70% of users answer "yes"), so the voice signal is first matched against the expected sample "yes" in the first sample group.
In one embodiment of the invention, the matching processing unit 340 is adapted to match the voice signal first against the highest-priority expected sample in the selected sample group. For example, suppose the voice signal is currently being matched against the expected samples in the aforesaid second sample group, in which the probability that the user answers "sure" is 16% and the probability that the user answers "can" is 14%; "sure" is therefore given a higher priority than "can". The voice signal is matched first against the expected sample "sure" and, if it does not match, then against the expected sample "can".
In one embodiment of the invention, the device 300 further comprises an expansion unit 350, adapted to expand, according to collected historical voice signals under the interaction state, the number of sample groups corresponding to the expected answer, or the number of expected samples contained in a sample group corresponding to the expected answer, or the number of samples in the speech recognition sample library. This is because the preset speech recognition sample library may not cover every sample corresponding to an expected answer, so the library or the selected sample groups can be supplemented through learning. For example, learning may reveal that some users answer "uh-huh" or "OK", which in human linguistic context also convey the meaning of "yes"; these two samples can then be added to the speech recognition sample library, appended to a selected sample group, or placed in a newly created sample group. For instance, "uh-huh" and "OK" may be added to the aforementioned second sample group, or split off into a third sample group.
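A minimal sketch of this expansion step, under the assumption that the library is a flat list of samples and the groups are ordered lists; the learned words ("uh-huh", "OK") come from the example in the text, and the choice between extending the last group or opening a new one mirrors the two alternatives described above.

```python
def expand_sample_library(library, groups, learned_samples, new_group=False):
    """Add newly learned replies to the sample library, then either append
    them to the last existing sample group or open a new sample group."""
    for s in learned_samples:
        if s not in library:
            library.append(s)
    if new_group:
        groups.append(list(learned_samples))
    else:
        groups[-1].extend(learned_samples)
    return library, groups

library = ["yes", "can", "able"]
groups = [["yes"], ["can", "able"]]
library, groups = expand_sample_library(library, groups, ["uh-huh", "OK"], new_group=True)
print(groups)  # [['yes'], ['can', 'able'], ['uh-huh', 'OK']]
```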
In one embodiment of the invention, the device 300 further comprises a presentation unit 360, adapted to present the interaction state in the form of any one or more of voice, image, and video. For example, a question, or the content to be displayed, may be presented through any combination of voice, image, and video.
It should be noted that:
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The structure required to construct such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It should be understood that a variety of programming languages may be used to implement the teachings of the invention as described herein, and the above description of a specific language was given in order to disclose the best mode of the invention.
In the specification provided herein, numerous specific details are set forth. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the devices of an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules, units, or components of an embodiment may be combined into one module, unit, or component, and may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the speech recognition device of the interaction system according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (e.g., a computer program and a computer program product) for performing part or all of the methods described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-described embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
The invention discloses the audio recognition method of A1, a kind of interactive system, wherein, the method comprises:
According to the expection answer under interactive state and described interactive state, in speech recognition Sample Storehouse, pre-determine the multiple expection samples corresponding with interactive state and described expection answer;
Multiple expection samples corresponding for described expection answer are divided at least two sample groups, in each sample group, at least comprise an expection sample;
Gather the voice signal of the user under described interactive state;
Described voice signal is mated with the expection sample in a sample group at least two sample groups.
A2, method as described in A1, wherein, the method comprises further:
The expection sample mated if find in described sample group, then determine that the answer that user provides is this expection answer.
A3, method as described in A1, wherein, the method comprises further:
The expection sample mated if do not find in described sample group, then mate described voice signal with the expection sample in another sample group in described at least two sample groups.
A4, method as described in A3, wherein, the method comprises further:
The expection sample mated if find in another sample group described, then determine that the answer that user provides is this expection answer.
A5, method as described in A3, wherein, the method comprises further:
The expection sample mated if do not find in another sample group described, then determine that user does not provide expection answer.
A6, method according to any one of A2-A5, wherein, the method comprises further:
Calculate described voice signal with expection sample mate angle value, if coupling angle value reaches preset value, then determine described voice signal and this expection sample matches, if instead coupling angle value does not reach preset value, then determine that described voice signal does not mate with this expection sample.
Multiple expection samples corresponding for described expection answer wherein, are describedly divided at least two sample groups and comprise by A7, method as described in A1:
By multiple expection samples corresponding for described expection answer, be at least two sample groups according to the different demarcation of the similarity degree with described expection answer, or the different demarcation of the probability of the expection answer that may reply according to user is at least two sample groups.
A8, method as described in A7, wherein, described voice signal is carried out mating comprising with the expection sample in a sample group at least two sample groups:
Described voice signal is mated with the expection sample in the sample group the highest with the similarity degree of described expection answer at least two sample groups, or described voice signal is mated with the sample group comprising the highest expection answer of probability that user may reply at least two sample groups.
A9, method as described in A1, wherein, described voice signal is carried out mating comprising with the expection sample in a sample group at least two sample groups:
Expection sample the highest with a sample group medium priority at least two sample groups for described voice signal is mated.
A10, method as described in A1, wherein, the method comprises further:
According to the history voice signal under this gathered interactive state, the sample group quantity that corresponding expansion described expection answer is corresponding, or, the expection sample size that a sample group corresponding to corresponding expansion described expection answer comprises, or the sample size in the described speech recognition Sample Storehouse of corresponding expansion.
A11, method as described in A1, wherein, before the voice signal gathering the user under described interactive state, the method comprises further:
By in conjunction with any one or more form in voice, image and video, represent interactive state.
The invention also discloses B12, a speech recognition device for an interaction system, wherein the device comprises:
an expected-sample determining unit, adapted to predetermine, according to an interaction state and an expected answer under the interaction state, multiple expected samples corresponding to the interaction state and the expected answer in a speech recognition sample library;
a grouping unit, adapted to divide the multiple expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample;
a collection unit, adapted to collect a voice signal of a user under the interaction state;
a matching processing unit, adapted to match the voice signal against the expected samples in one of the at least two sample groups.
B13, the device of B12, wherein:
the matching processing unit is adapted to determine, when a matching expected sample is found in said sample group, that the answer given by the user is the expected answer.
B14, the device of B12, wherein:
the matching processing unit is adapted to match, when no matching expected sample is found in said sample group, the voice signal against the expected samples in another of the at least two sample groups.
B15, the device of B14, wherein:
the matching processing unit is adapted to determine, when a matching expected sample is found in said other sample group, that the answer given by the user is the expected answer.
B16, the device of B14, wherein:
the matching processing unit is adapted to determine, when no matching expected sample is found in said other sample group, that the user has not given the expected answer.
B17, the device of any one of B13-B16, wherein:
the matching processing unit is adapted to calculate a match degree value between the voice signal and an expected sample; if the match degree value reaches a preset value, it determines that the voice signal matches the expected sample; conversely, if the match degree value does not reach the preset value, it determines that the voice signal does not match the expected sample.
B18, the device of B12, wherein:
the grouping unit is adapted to divide the multiple expected samples corresponding to the expected answer into at least two sample groups according to their differing degrees of similarity to the expected answer, or into at least two sample groups according to the differing probabilities that the user will reply with each expected answer.
B19, the device of B18, wherein:
the matching processing unit is adapted to first match the voice signal against the expected samples in the sample group, among the at least two sample groups, whose degree of similarity to the expected answer is highest, or against the sample group, among the at least two sample groups, containing the expected answer the user is most likely to reply with.
B20, the device of B12, wherein:
the matching processing unit is adapted to first match the voice signal against the highest-priority expected sample within one of the at least two sample groups.
B21, the device of B12, wherein the device further comprises:
an expansion unit, adapted to expand, according to collected historical voice signals under the interaction state, the number of sample groups corresponding to the expected answer, or the number of expected samples contained in a sample group corresponding to the expected answer, or the number of samples in the speech recognition sample library.
B22, the device of B12, wherein the device further comprises:
a presentation unit, adapted to present the interaction state in the form of any one or more of voice, image, and video.
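The threshold test described in A6/B17 — match only when a computed match degree reaches a preset value — can be sketched as follows. The patent leaves the degree computation unspecified, so a string-similarity ratio stands in for it here, and the 0.8 threshold is an illustrative assumption.

```python
from difflib import SequenceMatcher

def match_degree(recognized_text, expected_sample):
    """Toy match-degree computation: string similarity between the
    recognized text and the expected sample stands in for the real
    (unspecified) acoustic/textual comparison. Returns a value in [0, 1]."""
    return SequenceMatcher(None, recognized_text, expected_sample).ratio()

def is_match(recognized_text, expected_sample, preset=0.8):
    # A6/B17: the signal matches only if the degree value reaches the preset value.
    return match_degree(recognized_text, expected_sample) >= preset

print(is_match("yes", "yes"))  # True  (degree 1.0)
print(is_match("yet", "no"))   # False (no overlap, degree 0.0)
```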

Claims (10)

1. A speech recognition method for an interaction system, wherein the method comprises:
predetermining, according to an interaction state and an expected answer under the interaction state, multiple expected samples corresponding to the interaction state and the expected answer in a speech recognition sample library;
dividing the multiple expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample;
collecting a voice signal of a user under the interaction state;
matching the voice signal against the expected samples in one of the at least two sample groups.
2. The method of claim 1, wherein the method further comprises:
if a matching expected sample is found in said sample group, determining that the answer given by the user is the expected answer.
3. The method of claim 1, wherein the method further comprises:
if no matching expected sample is found in said sample group, matching the voice signal against the expected samples in another of the at least two sample groups.
4. The method of claim 3, wherein the method further comprises:
if a matching expected sample is found in said other sample group, determining that the answer given by the user is the expected answer.
5. The method of claim 3, wherein the method further comprises:
if no matching expected sample is found in said other sample group, determining that the user has not given the expected answer.
6. A speech recognition device for an interaction system, wherein the device comprises:
an expected-sample determining unit, adapted to predetermine, according to an interaction state and an expected answer under the interaction state, multiple expected samples corresponding to the interaction state and the expected answer in a speech recognition sample library;
a grouping unit, adapted to divide the multiple expected samples corresponding to the expected answer into at least two sample groups, each sample group containing at least one expected sample;
a collection unit, adapted to collect a voice signal of a user under the interaction state;
a matching processing unit, adapted to match the voice signal against the expected samples in one of the at least two sample groups.
7. The device of claim 6, wherein:
the matching processing unit is adapted to determine, when a matching expected sample is found in said sample group, that the answer given by the user is the expected answer.
8. The device of claim 6, wherein:
the matching processing unit is adapted to match, when no matching expected sample is found in said sample group, the voice signal against the expected samples in another of the at least two sample groups.
9. The device of claim 8, wherein:
the matching processing unit is adapted to determine, when a matching expected sample is found in said other sample group, that the answer given by the user is the expected answer.
10. The device of claim 8, wherein:
the matching processing unit is adapted to determine, when no matching expected sample is found in said other sample group, that the user has not given the expected answer.
CN201510463527.2A 2015-07-31 2015-07-31 Speech recognition method and speech recognition device for interaction system Pending CN105161098A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510463527.2A CN105161098A (en) 2015-07-31 2015-07-31 Speech recognition method and speech recognition device for interaction system
PCT/CN2016/092412 WO2017020794A1 (en) 2015-07-31 2016-07-29 Voice recognition method applicable to interactive system and device utilizing same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510463527.2A CN105161098A (en) 2015-07-31 2015-07-31 Speech recognition method and speech recognition device for interaction system

Publications (1)

Publication Number Publication Date
CN105161098A true CN105161098A (en) 2015-12-16

Family

ID=54801931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510463527.2A Pending CN105161098A (en) 2015-07-31 2015-07-31 Speech recognition method and speech recognition device for interaction system

Country Status (2)

Country Link
CN (1) CN105161098A (en)
WO (1) WO2017020794A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105771234A (en) * 2016-04-02 2016-07-20 深圳市熙龙玩具有限公司 Riddle guessing toy and implementation method thereof
WO2017020794A1 (en) * 2015-07-31 2017-02-09 北京奇虎科技有限公司 Voice recognition method applicable to interactive system and device utilizing same
CN110706536A (en) * 2019-10-25 2020-01-17 北京猿力未来科技有限公司 Voice answering method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1458645A (en) * 2002-05-15 2003-11-26 日本先锋公司 Voice identification equipment and voice identification program
CN103280218A (en) * 2012-12-31 2013-09-04 威盛电子股份有限公司 Voice recognition-based selection method and mobile terminal device and information system thereof
US20140093056A1 (en) * 2011-03-18 2014-04-03 Fujitsu Limited Call evaluation device and call evaluation method
CN104021786A (en) * 2014-05-15 2014-09-03 北京中科汇联信息技术有限公司 Speech recognition method and speech recognition device
CN104064062A (en) * 2014-06-23 2014-09-24 中国石油大学(华东) On-line listening learning method and system based on voiceprint and voice recognition
CN104424290A (en) * 2013-09-02 2015-03-18 佳能株式会社 Voice based question-answering system and method for interactive voice system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10308611A1 (en) * 2003-02-27 2004-09-16 Siemens Ag Determination of the likelihood of confusion between vocabulary entries in phoneme-based speech recognition
US10319363B2 (en) * 2012-02-17 2019-06-11 Microsoft Technology Licensing, Llc Audio human interactive proof based on text-to-speech and semantics
CN102881284B (en) * 2012-09-03 2014-07-09 江苏大学 Unspecific human voice and emotion recognition method and system
CN103794214A (en) * 2014-03-07 2014-05-14 联想(北京)有限公司 Information processing method, device and electronic equipment
CN104809103B (en) * 2015-04-29 2018-03-30 北京京东尚科信息技术有限公司 A kind of interactive semantic analysis and system
CN105161098A (en) * 2015-07-31 2015-12-16 北京奇虎科技有限公司 Speech recognition method and speech recognition device for interaction system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017020794A1 (en) * 2015-07-31 2017-02-09 北京奇虎科技有限公司 Voice recognition method applicable to interactive system and device utilizing same
CN105771234A (en) * 2016-04-02 2016-07-20 深圳市熙龙玩具有限公司 Riddle guessing toy and implementation method thereof
CN110706536A (en) * 2019-10-25 2020-01-17 北京猿力未来科技有限公司 Voice answering method and device
CN110706536B (en) * 2019-10-25 2021-10-01 北京猿力教育科技有限公司 Voice answering method and device

Also Published As

Publication number Publication date
WO2017020794A1 (en) 2017-02-09

Similar Documents

Publication Publication Date Title
AU2018200165A1 (en) Phrasecut: segmentation using natural language inputs
CN103491205A (en) Related resource address push method and device based on video retrieval
CN108172213B (en) Surge audio identification method, surge audio identification device, surge audio identification equipment and computer readable medium
CN105515900B (en) A kind of method and device obtaining terminal presence
CN108875486A (en) Recongnition of objects method, apparatus, system and computer-readable medium
CN105161098A (en) Speech recognition method and speech recognition device for interaction system
CN105893390B (en) Application processing method and electronic equipment
US11544543B2 (en) Apparatus and method for sparse training acceleration in neural networks
US10629205B2 (en) Identifying an accurate transcription from probabilistic inputs
CN103488787B (en) A kind of method for pushing and device of the online broadcasting entrance object based on video search
US11282502B2 (en) Method for utterance generation, smart device, and computer readable storage medium
CN104933747A (en) Method and device for converting vector animation into bitmap animation
CN103605696B (en) Method and device for acquiring audio-video file addresses
CN104036259A (en) Face similarity recognition method and system
CN103942264A (en) Method and device for pushing webpages containing news information
CN105893548A (en) Naming method and terminal
CN105617657A (en) Intelligent game recommendation method and device
CN107562847A (en) Information processing method and related product
CN102968445B (en) Based on the application call method and apparatus of browser input
CN104781698A (en) System and method for isolating signal in seismic data
CN105161105A (en) Speech recognition method and speech recognition device for interaction system
CN109509028A (en) A kind of advertisement placement method and device, storage medium, computer equipment
CN106503010A (en) A kind of method and device of database change write subregion
CN105224649A (en) A kind of data processing method and device
CN110580255A (en) method and system for storing and retrieving data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20151216

RJ01 Rejection of invention patent application after publication