CN103971681A - Voice recognition method and system - Google Patents

Voice recognition method and system

Info

Publication number
CN103971681A
CN103971681A
Authority
CN
China
Prior art keywords
model
voice
data
voice data
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410168436.1A
Other languages
Chinese (zh)
Inventor
穆向禹
彭守业
刘思成
贾磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201410168436.1A
Publication of CN103971681A
Legal status: Pending

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

An embodiment of the invention provides a voice recognition method comprising: collecting first audio data; and performing voice recognition on the first audio data using a first model and a second model to obtain a voice recognition result, wherein the first model is used to recognize second audio data played by a client that is contained in the first audio data, and the second model is used to recognize third audio data contained in the first audio data other than the second audio data played by the client. An embodiment of the invention further provides a voice recognition system. The method and system can increase the success rate of voice wake-up in a voice recognition system.

Description

Voice recognition method and system
[Technical field]
The present invention relates to speech recognition technology, and in particular to a voice recognition method and system.
[Background]
Speech recognition technology has made marked progress in recent years and is entering fields such as industry, home appliances, communications, automotive electronics, medical care, home services, and consumer electronics. For example, speech recognition is often applied to navigation: because it is inconvenient for a user to operate a navigation client by hand while driving, voice input is a good interaction mode. While in a listening state, the navigation client monitors the user's voice commands and performs voice recognition on them to obtain a recognition result; when the result meets a wake-up condition, the client's voice navigation function is woken up and traffic information is provided to the user in audio form.
However, the navigation client sometimes needs to play traffic information frequently, so the voice commands the client hears from the user are often mixed with audio data played by the client itself. The user's voice command then fails to wake the navigation client effectively, and the probability of a failed wake-up is high.
[Summary]
In view of this, embodiments of the present invention provide a voice recognition method and system that can improve the success rate of voice wake-up in a voice recognition system.
An embodiment of the present invention provides a voice recognition method, comprising:
collecting first audio data; and
performing voice recognition on the first audio data using a first model and a second model to obtain a voice recognition result;
wherein the first model is used to recognize second audio data played by a client that is contained in the first audio data, and the second model is used to recognize third audio data contained in the first audio data other than the second audio data played by the client.
In the above method, before performing voice recognition on the first audio data using the first model and the second model to obtain the voice recognition result, the method further comprises:
obtaining the text corresponding to the second audio data played by the client;
segmenting the text to obtain M characters, M being an integer greater than or equal to 2;
clustering or screening the M characters to obtain N characters, N being a positive integer less than or equal to M; and
obtaining the first model from the N characters.
In the above method, the third audio data is the user's voice command; the first model is a voice rejection model, and the second model is a voice wake-up model.
In the above method, performing voice recognition on the first audio data using the first model and the second model to obtain the voice recognition result comprises:
performing echo cancellation on the collected first audio data; and
performing voice recognition, using the first model and the second model, on the first audio data obtained after echo cancellation to obtain the voice recognition result.
In the above method, performing echo cancellation on the collected first audio data comprises:
obtaining the start position of the third audio data relative to the second audio data;
converting the third audio data into first frequency-domain data, and converting the second audio data after the start position into second frequency-domain data; and
filtering the first frequency-domain data according to the second frequency-domain data.
An embodiment of the present invention also provides a voice recognition system, comprising:
a data input unit, for collecting first audio data; and
a data recognition unit, for performing voice recognition on the first audio data using a first model and a second model to obtain a voice recognition result;
wherein the first model is used to recognize second audio data played by a client that is contained in the first audio data, and the second model is used to recognize third audio data contained in the first audio data other than the second audio data played by the client.
In the above system, the system further comprises:
a model generation unit, for obtaining the text corresponding to the second audio data played by the client; segmenting the text to obtain M characters, M being an integer greater than or equal to 2; clustering or screening the M characters to obtain N characters, N being a positive integer less than or equal to M; and obtaining the first model from the N characters.
In the above system, the third audio data is the user's voice command; the first model is a voice rejection model, and the second model is a voice wake-up model.
In the above system, the data recognition unit is specifically for:
performing echo cancellation on the collected first audio data; and
performing voice recognition, using the first model and the second model, on the first audio data obtained after echo cancellation to obtain the voice recognition result.
In the above system, the data recognition unit performing echo cancellation on the collected first audio data specifically comprises:
obtaining the start position of the third audio data relative to the second audio data;
converting the third audio data into first frequency-domain data, and converting the second audio data after the start position into second frequency-domain data; and
filtering the first frequency-domain data according to the second frequency-domain data.
As can be seen from the above technical solutions, embodiments of the present invention have the following beneficial effect:
The client uses the first model to recognize, within the collected audio data, the audio data played by the client itself. A model dedicated to recognizing the client's own playback can thus be used to identify the interfering audio, which reduces the influence of the playback's recognition result on the final recognition result and reduces the probability that the playback's recognition result is used to decide whether to wake up. This improves the success rate of voice wake-up in the voice recognition system.
[Brief description of the drawings]
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the navigation client used by the technical solution provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of the voice recognition method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the first model provided by an embodiment of the present invention;
Fig. 4 is an example of a client using the first model and the second model to perform voice recognition, provided by an embodiment of the present invention;
Fig. 5 is a functional block diagram of the voice recognition system provided by an embodiment of the present invention.
[Detailed description]
For a better understanding of the technical solutions of the present invention, the embodiments of the present invention are described in detail below with reference to the drawings.
It should be clear that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
The terms used in the embodiments of the present invention are for describing specific embodiments only and are not intended to limit the present invention. The singular forms "a", "said" and "the" used in the embodiments of the present invention and the appended claims are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present invention to describe various audio data and frequency-domain data, the audio data and frequency-domain data should not be limited by these terms. These terms are only used to distinguish audio data and frequency-domain data from one another.
Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected" or "in response to detecting (the stated condition or event)".
Taking a navigation client as an example, the navigation client used by the technical solution provided by an embodiment of the present invention is shown in Fig. 1. It consists mainly of a voice recognition system and a voice navigation system. The method and system provided by the embodiment are implemented in the voice recognition system of the navigation client and are mainly used to wake up the voice navigation system, so that the voice navigation system can provide a voice navigation service to the user and realize the client's voice navigation function.
In the embodiments of the present invention, besides a navigation client, the client may be any client that provides information in audio form to the user through voice interaction. The client may reside on a navigation terminal, a smart TV, or a user device; the user device may include a personal computer (PC), a laptop, a mobile phone, a tablet, and so on.
An embodiment of the present invention provides a voice recognition method. Fig. 2 is a flow diagram of the method provided by the embodiment; as shown in the figure, the method comprises the following steps:
S201: collect first audio data.
Specifically, the client collects the first audio data.
Preferably, the first audio data may include second audio data played by the client itself and third audio data other than the second audio data played by the client.
Preferably, if the client is a navigation client, the second audio data played by the client itself may be audio data generated by text-to-speech (TTS), such as traffic information played by the client. For example, "There is a speed camera 500 meters ahead" played by the client may be such second audio data. As another example, the third audio data other than the second audio data played by the client may be the voice command the user sends when the user wants to use the voice navigation function; this voice command is used to wake up the client's voice navigation function.
Preferably, the client may use an audio collection device to collect the first audio data. For example, when the client resides on a mobile phone or tablet, the client may use a microphone to collect the first audio data.
S202: perform voice recognition on the first audio data using a first model and a second model to obtain a voice recognition result, wherein the first model is used to recognize the second audio data played by the client that is contained in the first audio data, and the second model is used to recognize the third audio data contained in the first audio data other than the second audio data played by the client.
Specifically, after collecting the first audio data, the client performs voice recognition on it using the first model and the second model to obtain the voice recognition result.
Preferably, before the client performs voice recognition on the first audio data using the first model and the second model to obtain the voice recognition result, the first model and the second model need to be set up in the client in advance. The first model may be a voice rejection model, which is what the embodiments of the present invention set up in the client; the second model may be a voice wake-up model, which is already set up in the client in the prior art.
For example, if the first model is a voice rejection model, the method of generating the first model in the client may be as follows:
First, obtain the text corresponding to the second audio data played by the client. For example, if the client is a navigation client, when it plays second audio data it first determines, from a preset announcement text library, the text to announce, then converts the text into second audio data using TTS, and finally plays the second audio data through a loudspeaker. The client in the embodiments may keep a playback history, so it can count the number of times each piece of second audio data has been played and then obtain the text corresponding to the second audio data whose play count exceeds a preset threshold. There is no need to obtain all texts in the announcement library; obtaining only the texts of the most frequently played second audio data reduces the amount of data processed when generating the first model. For example, if "There is a speed camera 500 meters ahead" and "Turn right on the road ahead" are played frequently, the texts corresponding to these two pieces of second audio data can be obtained.
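The play-count filtering step above can be sketched as follows. This is a minimal illustration, not code from the patent: the function name, the history format (one text string per playback event), and the threshold value are all assumptions.

```python
from collections import Counter

def frequent_prompt_texts(play_history, min_plays):
    """Count how often each TTS prompt text was played and keep the
    texts whose play count reaches the preset threshold."""
    counts = Counter(play_history)
    return sorted(text for text, n in counts.items() if n >= min_plays)

# Example: two prompts played often, one played only once.
history = (["speed camera 500m ahead"] * 3
           + ["turn right ahead"] * 2
           + ["rerouting"])
print(frequent_prompt_texts(history, min_plays=2))
# ['speed camera 500m ahead', 'turn right ahead']
```

Only the frequently played texts are then passed on to the segmentation step, keeping the rejection model small.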
Next, segment the obtained texts to obtain M characters, M being an integer greater than or equal to 2. For example, after obtaining the texts corresponding to the second audio data played by the client, segment each text so that it is cut into R characters, each character being an independent word; then remove the digits among the R characters and deduplicate them to obtain M characters. Deduplication merges identical characters among the R characters. R is an integer greater than or equal to 2, and M is an integer greater than or equal to 2 and less than or equal to R.
For example, segmenting the texts "There is a speed camera 500 meters ahead" (前方道路500米处有超速摄像头) and "Turn right on the road ahead" (前方道路右转) yields the characters: 前, 方, 道, 路, 500, 米, 处, 有, 超, 速, 摄, 像, 头, 前, 方, 道, 路, 右, 转. Preferably, the digits "500" may also be converted into the corresponding Chinese characters 五百. Keeping only one copy of each repeated character, the final characters are: 前, 方, 道, 路, 五, 百, 米, 处, 有, 超, 速, 摄, 像, 头, 右, 转.
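The segmentation and deduplication step above can be sketched as follows. This is an illustrative reading of the embodiment, not the patent's implementation: the per-character digit map is a simplified stand-in for full digit-to-Chinese conversion, and the toy example uses Latin letters in place of Chinese characters.

```python
def segment_and_dedup(texts, digit_map):
    """Cut each prompt text into single characters, map digits through
    a (simplified, per-character) replacement table, and keep only the
    first occurrence of each character -- the R-to-M dedup step."""
    seen, kept = set(), []
    for text in texts:
        for ch in text:
            ch = digit_map.get(ch, ch)
            if ch and ch not in seen:  # empty mapping drops the digit
                seen.add(ch)
                kept.append(ch)
    return kept

# Toy example; "5" maps to a word, "0" is dropped.
chars = segment_and_dedup(["road500m", "roadright"], {"5": "five", "0": ""})
print(chars)  # ['r', 'o', 'a', 'd', 'five', 'm', 'i', 'g', 'h', 't']
```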
Then, cluster or screen the M characters to obtain N characters, N being a positive integer less than or equal to M. Preferably, clustering the M characters may work as follows: each individual character can serve as a category, and to reduce the number of categories, similar categories are merged. For example, the pinyin of each of the M characters can be obtained and the similarity of two characters computed from their pinyin; two characters whose similarity exceeds a preset threshold are merged into one character, e.g. one of the two characters is chosen arbitrarily, the chosen character is kept, and the other is removed. Preferably, screening the M characters may work as follows: keep every other one of the M characters and screen out the rest. For example, if the M characters are 前, 方, 道, 路, 五, 百, 米, 处, 有, 超, 速, 摄, 像, 头, 右, 转, screening them yields: 前, 道, 五, 米, 有, 速, 像, 右. The purpose of clustering or screening the M characters is to reduce the number of characters.
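Both reduction variants above can be sketched as follows. The patent does not specify the pinyin similarity measure; `SequenceMatcher` over romanised pinyin strings and the threshold value are assumptions, and the small pinyin table is illustrative.

```python
from difflib import SequenceMatcher

def screen_characters(chars):
    """Screening variant: keep every other character."""
    return chars[::2]

def cluster_characters(chars, pinyin_of, threshold=0.75):
    """Clustering variant: merge characters whose pinyin is similar,
    keeping the first character of each merged group."""
    kept = []
    for ch in chars:
        similar = any(
            SequenceMatcher(None, pinyin_of[ch], pinyin_of[k]).ratio() > threshold
            for k in kept
        )
        if not similar:
            kept.append(ch)
    return kept

print(screen_characters(list("abcdefgh")))  # ['a', 'c', 'e', 'g']
# 道 and 到 share the pinyin "dao", so they merge into one category.
pinyin = {"道": "dao", "到": "dao", "米": "mi"}
print(cluster_characters(["道", "到", "米"], pinyin))  # ['道', '米']
```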
Finally, obtain the first model from the N characters. It should be understood that characters can be related to one another: one character can transition to another, there is a transition probability between every two characters, and each character's transition probabilities to other characters differ. Therefore, from the various ordered combinations of the N characters, at least one character string can be obtained, each character string containing at least two characters. The first model can then be obtained from the at least one character string; the first model may contain all character strings obtained from the N characters, or only the character strings with the largest weights among them. The weight of a character string may equal the product of the transition probabilities between every two adjacent characters in the string; the transition probabilities can be obtained from a preset acoustic model, which is a probability model that may include the probabilities of initials and finals occurring together, the transition probabilities between characters, and so on.
For example, Fig. 3 is a schematic diagram of the first model provided by an embodiment of the present invention. As shown in the figure, the 14 obtained characters are 前, 方, 面, 道, 路, 有, 左, 右, 直, 行, 摄, 像, 头, 转, and 4 character strings can be obtained from them as shown in Fig. 3: "go straight on the road ahead" (前方道路直行), "there is a camera ahead" (前面有摄像头), "turn left ahead" (前方左转), and "turn right ahead" (前方右转).
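The string-weighting scheme described above (weight equals the product of adjacent transition probabilities, keeping only the heaviest strings) can be sketched as follows. The transition table is invented for illustration; the patent's transition probabilities come from a preset acoustic model.

```python
def string_weight(s, transitions):
    """Weight of a character string: the product of the transition
    probabilities between each pair of adjacent characters."""
    w = 1.0
    for a, b in zip(s, s[1:]):
        w *= transitions.get((a, b), 0.0)
    return w

def build_first_model(candidates, transitions, keep_top):
    """Keep the candidate strings with the largest weights (the patent
    allows keeping all strings or only the heaviest ones)."""
    ranked = sorted(candidates, key=lambda s: string_weight(s, transitions),
                    reverse=True)
    return ranked[:keep_top]

# Made-up transition probabilities for two candidate strings.
trans = {("前", "方"): 0.9, ("方", "右"): 0.5, ("右", "转"): 0.8,
         ("方", "左"): 0.3, ("左", "转"): 0.8}
print(build_first_model(["前方右转", "前方左转"], trans, keep_top=1))
# ['前方右转']  (weight 0.36 vs 0.216)
```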
Preferably, the client may first perform echo cancellation on the collected first audio data, and then use the first model and the second model to perform voice recognition on the first audio data obtained after echo cancellation, to obtain the voice recognition result. In this way, before performing voice recognition on the first audio data, the client can use echo cancellation to filter out part of the second audio data played by the client.
For example, the method by which the client performs echo cancellation on the collected first audio data may be as follows:
First, the client obtains the start position of the third audio data relative to the second audio data. Since the client plays the second audio data to the user, it has access to the audio it plays. For example, the client may use a correlation algorithm to correlate the collected first audio data with the second audio data played by the client, to obtain the start position, relative to the second audio data, of the third audio data contained in the first audio data.
Then, the client performs echo cancellation on the collected first audio data according to the obtained start position. For example, the client converts the collected first audio data into first frequency-domain data and converts the second audio data after the start position into second frequency-domain data. The client feeds the first and second frequency-domain data into a filter, which filters the first frequency-domain data according to the second frequency-domain data, thereby using echo cancellation to filter out, from the collected first audio data, the second audio data played by the client.
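The alignment-then-subtract idea above can be sketched as follows. This is a deliberately simplified stand-in: the lag is found by brute-force cross-correlation, and the cancellation is a fixed-gain time-domain subtraction rather than the adaptive frequency-domain filtering the embodiment describes.

```python
def best_lag(mic, ref):
    """Estimate where the played-back reference starts inside the
    microphone capture by maximising the cross-correlation."""
    scores = [
        (sum(mic[lag + i] * r for i, r in enumerate(ref)), lag)
        for lag in range(len(mic) - len(ref) + 1)
    ]
    return max(scores)[1]

def cancel_echo(mic, ref, gain=1.0):
    """Subtract the gain-scaled reference at the estimated lag.
    A real canceller would adapt the filter per frequency bin."""
    lag = best_lag(mic, ref)
    out = list(mic)
    for i, r in enumerate(ref):
        out[lag + i] -= gain * r
    return out

mic = [0.0, 0.0, 1.0, 2.0, 3.0, 0.5]  # capture: playback starts at index 2
ref = [1.0, 2.0, 3.0]                  # what the client played
print(cancel_echo(mic, ref))  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.5]
```

After cancellation, only the residual (here the trailing 0.5, standing in for the user's voice) remains for recognition.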
It should be noted that performing echo cancellation on the first audio data is a preferred embodiment; the client may also skip echo cancellation and perform voice recognition on the first audio data directly.
For example, the method by which the client performs voice recognition on the first audio data using the first model and the second model to obtain the voice recognition result may be as follows. Fig. 4 is an example, provided by an embodiment of the present invention, of a client using the first model and the second model to perform voice recognition. As shown in the figure, the client performs voice recognition on the first audio data using the first model to obtain a first recognition result. Because the first model is built from the text corresponding to the second audio data played by the client, when it processes first audio data that contains second audio data it can recognize the second audio data played by the client. As shown in Fig. 4, because the characters in the first model have been clustered or screened, the first recognition result contains only part of the characters in the text corresponding to the second audio data, so the recognition rate is low; the recognition rate equals the ratio of the number of characters in the recognition result to the total number of characters in the audio data, and the weight of the first recognition result is proportional to the recognition rate, so the weight of the first recognition result is low. Meanwhile, the client performs voice recognition on the first audio data using the second model to obtain a second recognition result. Because the second model is a voice wake-up model containing at least one wake keyword (such as "Baidu Navigation" in Fig. 4), performing voice recognition on the first audio data with the second model yields the second recognition result corresponding to the third audio data (e.g. the user's voice command) contained in the first audio data. The weight of the second recognition result is compared with the weight of the first recognition result, and the recognition result with the larger weight is taken as the final voice recognition result.
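The weight comparison and wake decision described above can be sketched as follows. The (text, weight) pair representation and the example weights are illustrative assumptions.

```python
def final_result(scored_results):
    """Pick the recognition result with the largest weight from the
    per-model (text, weight) pairs."""
    return max(scored_results, key=lambda r: r[1])[0]

def should_wake(text, wake_keywords):
    """Wake up only if the winning result contains a preset wake keyword."""
    return any(kw in text for kw in wake_keywords)

# The rejection model scores a low-weight partial match of the playback;
# the wake-up model scores the user's command higher.
results = [("前 路 五 米", 0.2), ("百度导航", 0.9)]
winner = final_result(results)
print(winner, should_wake(winner, ["百度导航"]))  # 百度导航 True
```

Because clustering or screening depresses the rejection model's weight, the wake-up model's result usually wins the comparison when the user actually speaks a wake keyword.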
Optionally, after obtaining the final voice recognition result, the client may check whether it contains the preset wake keyword. If it does, the client wakes up its voice navigation function so that the client can provide a voice navigation service to the user, realizing the client's voice navigation function. Otherwise, if the result does not contain the wake keyword, the client does not wake up the voice navigation function.
It should be noted that in the prior art the first model is a generic rejection model, not a rejection model built specifically for the second audio data played by the client. In practice, when the client performs voice recognition on the first audio data it collects using a generic rejection model and a voice wake-up model respectively, the weight of the rejection model's recognition result is in most cases greater than or equal to that of the wake-up model's. The client then takes the rejection model's result as the final recognition result and checks whether it contains the preset wake keyword; since a generic rejection model usually does not contain the user's preset wake keyword, waking up the voice navigation function fails. With the method above, an embodiment of the present invention builds the first model from the text corresponding to the second audio data played by the client and uses it as the rejection model for voice recognition on the first audio data, while clustering or screening the characters to lower the first model's recognition rate on the second audio data contained in the first audio data. This lowers the weight of the recognition result obtained with the first model, so the client is more likely to output the second recognition result, obtained with the second model, as the final recognition result. Because the second recognition result is obtained from the user's voice command, it generally contains the wake keyword, so the voice navigation function can be woken up successfully. This improves the success rate of voice wake-up under interference from audio data played by the client.
In the embodiments of the present invention, to wake up the voice navigation function of the navigation client, the client needs to recognize a preset wake keyword in the collected audio data. The model used to recognize wake keywords in collected audio data is the voice wake-up model above; it may contain at least one preset wake keyword, and if the collected audio data hits one of the wake keywords in the voice wake-up model, the voice navigation function can be woken up successfully. For audio data other than the user's voice command, some non-wake keywords can be defined; the model used to recognize non-wake keywords in collected audio data is the rejection model above, which may contain at least one preset non-wake keyword. A non-wake keyword hit by the collected audio data cannot wake up the voice navigation function.
The embodiment of the present invention further provides an apparatus embodiment implementing the steps and methods of the above method embodiment.
Please refer to Fig. 5, a functional block diagram of the speech recognition system provided by the embodiment of the present invention. As shown in the figure, the system comprises:
a data input unit 501, configured to collect first audio data; and
a data identification unit 502, configured to perform speech recognition on the first audio data using a first model and a second model, to obtain a speech recognition result;
wherein the first model is used to recognize second audio data, played by the client, contained in the first audio data, and the second model is used to recognize third audio data, other than the second audio data played by the client, contained in the first audio data.
Preferably, the system further comprises:
a model generation unit 503, configured to: obtain text information corresponding to the second audio data played by the client; segment the text information to obtain M characters, M being an integer greater than or equal to 2; perform clustering or screening on the M characters to obtain N characters, N being a positive integer less than or equal to M; and obtain the first model according to the N characters.
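The three steps performed by the model generation unit 503 (cutting, clustering or screening, model construction) can be sketched as follows. The frequency-based screening rule and the set of characters standing in for "the first model" are assumptions; the embodiment allows either clustering or screening without fixing the criterion.

```python
from collections import Counter

def build_first_model(played_text, keep_n):
    """Segment the text of the client-played audio into M characters,
    then screen them down to N characters (N <= M) and build the
    first (rejection) model from them.

    The most-common screening rule is illustrative only; the embodiment
    allows either clustering or screening without fixing the criterion.
    """
    chars = list(played_text.replace(" ", ""))   # cutting: M characters
    m = len(chars)
    assert m >= 2                                # M is an integer >= 2
    kept = [c for c, _ in Counter(chars).most_common(keep_n)]
    assert 1 <= len(kept) <= m                   # N is a positive integer <= M
    return set(kept)                             # stand-in for the first model

model = build_first_model("turn left in two hundred meters", keep_n=8)
assert len(model) <= 8
```
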
Preferably, the third audio data is a voice instruction of the user; the first model is a voice rejection model, and the second model is a voice wake-up model.
Preferably, the data identification unit 502 is specifically configured to: perform echo cancellation on the collected first audio data; and perform speech recognition, using the first model and the second model, on the first audio data obtained after echo cancellation, to obtain the speech recognition result.
Preferably, the data identification unit 502 performs echo cancellation on the collected first audio data by: obtaining a reference position of the third audio data relative to the second audio data; converting the third audio data into first frequency-domain data, and converting the second audio data after the reference position into second frequency-domain data; and filtering the first frequency-domain data according to the second frequency-domain data.
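One possible realization of the frequency-domain filtering described here is magnitude spectral subtraction, sketched below. This is an assumption: the embodiment only states that the first frequency-domain data is filtered according to the second, without naming the filter. The naive DFT stands in for an FFT, and the reference-position alignment is assumed to have been applied before framing.

```python
import cmath

def dft(frame):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def echo_suppress(mic_frame, ref_frame, alpha=1.0):
    """Filter the collected frame's spectrum according to the playback
    reference's spectrum: subtract the reference magnitude while keeping
    the microphone phase (magnitude spectral subtraction)."""
    out = []
    for m, r in zip(dft(mic_frame), dft(ref_frame)):
        mag = max(abs(m) - alpha * abs(r), 0.0)
        out.append(cmath.rect(mag, cmath.phase(m)))
    return idft(out)

# A frame that is pure playback echo is suppressed to (near) zero.
echo = [0.5, -0.2, 0.3, 0.1]
assert max(abs(v) for v in echo_suppress(echo, echo)) < 1e-9
```
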
Since each unit in this embodiment can execute the method shown in Fig. 2, for parts not described in detail here, reference may be made to the related description of Fig. 2.
The technical scheme of the embodiment of the present invention has the following beneficial effects:
The client uses the first model, built to recognize the audio data played by the client, to identify the interfering audio in the collected data. The interference of the recognition result corresponding to the client-played audio with the final recognition result is thereby reduced, as is the probability that that result is used to decide whether to wake up, improving the voice wake-up success rate of the speech recognition system.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A speech recognition method, characterized in that the method comprises:
collecting first audio data;
performing speech recognition on the first audio data using a first model and a second model, to obtain a speech recognition result;
wherein the first model is used to recognize second audio data, played by a client, contained in the first audio data, and the second model is used to recognize third audio data, other than the second audio data played by the client, contained in the first audio data.
2. The method according to claim 1, characterized in that, before performing speech recognition on the first audio data using the first model and the second model to obtain the speech recognition result, the method further comprises:
obtaining text information corresponding to the second audio data played by the client;
segmenting the text information to obtain M characters, wherein M is an integer greater than or equal to 2;
performing clustering or screening on the M characters to obtain N characters, wherein N is a positive integer less than or equal to M;
obtaining the first model according to the N characters.
3. The method according to claim 1 or 2, characterized in that
the third audio data is a voice instruction of a user;
the first model is a voice rejection model, and the second model is a voice wake-up model.
4. The method according to claim 1 or 2, characterized in that performing speech recognition on the first audio data using the first model and the second model, to obtain the speech recognition result, comprises:
performing echo cancellation on the collected first audio data;
performing speech recognition, using the first model and the second model, on the first audio data obtained after echo cancellation, to obtain the speech recognition result.
5. The method according to claim 4, characterized in that performing echo cancellation on the collected first audio data comprises:
obtaining a reference position of the third audio data relative to the second audio data;
converting the third audio data into first frequency-domain data, and converting the second audio data after the reference position into second frequency-domain data;
filtering the first frequency-domain data according to the second frequency-domain data.
6. A speech recognition system, characterized in that the system comprises:
a data input unit, configured to collect first audio data;
a data identification unit, configured to perform speech recognition on the first audio data using a first model and a second model, to obtain a speech recognition result;
wherein the first model is used to recognize second audio data, played by a client, contained in the first audio data, and the second model is used to recognize third audio data, other than the second audio data played by the client, contained in the first audio data.
7. The system according to claim 6, characterized in that the system further comprises:
a model generation unit, configured to: obtain text information corresponding to the second audio data played by the client; segment the text information to obtain M characters, wherein M is an integer greater than or equal to 2; perform clustering or screening on the M characters to obtain N characters, wherein N is a positive integer less than or equal to M; and obtain the first model according to the N characters.
8. The system according to claim 6 or 7, characterized in that
the third audio data is a voice instruction of a user;
the first model is a voice rejection model, and the second model is a voice wake-up model.
9. The system according to claim 6 or 7, characterized in that the data identification unit is specifically configured to:
perform echo cancellation on the collected first audio data;
perform speech recognition, using the first model and the second model, on the first audio data obtained after echo cancellation, to obtain the speech recognition result.
10. The system according to claim 9, characterized in that the data identification unit performs echo cancellation on the collected first audio data by:
obtaining a reference position of the third audio data relative to the second audio data;
converting the third audio data into first frequency-domain data, and converting the second audio data after the reference position into second frequency-domain data;
filtering the first frequency-domain data according to the second frequency-domain data.
CN201410168436.1A 2014-04-24 2014-04-24 Voice recognition method and system Pending CN103971681A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410168436.1A CN103971681A (en) 2014-04-24 2014-04-24 Voice recognition method and system


Publications (1)

Publication Number Publication Date
CN103971681A true CN103971681A (en) 2014-08-06

Family

ID=51241099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410168436.1A Pending CN103971681A (en) 2014-04-24 2014-04-24 Voice recognition method and system

Country Status (1)

Country Link
CN (1) CN103971681A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1235332A (en) * 1998-04-02 1999-11-17 日本电气株式会社 Speech recognition noise removing system and speech recognition noise removing method
CN1337670A (en) * 2001-09-28 2002-02-27 北京安可尔通讯技术有限公司 Fast voice identifying method for Chinese phrase of specific person
CN1397062A (en) * 2000-12-29 2003-02-12 祖美和 Voice-controlled television set and control method thereof
CN1542734A * 2003-05-02 2004-11-03 Voice recognition system and method
CN101238511A (en) * 2005-08-11 2008-08-06 旭化成株式会社 Sound source separating device, speech recognizing device, portable telephone, and sound source separating method, and program
US20080270131A1 (en) * 2007-04-27 2008-10-30 Takashi Fukuda Method, preprocessor, speech recognition system, and program product for extracting target speech by removing noise
US20100254539A1 (en) * 2009-04-07 2010-10-07 Samsung Electronics Co., Ltd. Apparatus and method for extracting target sound from mixed source sound
CN101964192A (en) * 2009-07-22 2011-02-02 索尼公司 Sound processing device, sound processing method, and program
CN102097099A (en) * 2009-12-11 2011-06-15 冲电气工业株式会社 Source sound separator with spectrum analysis through linear combination and method therefor
CN102111468A (en) * 2010-12-20 2011-06-29 上海华勤通讯技术有限公司 Denoising calling mobile phone and method thereof
US20110231185A1 (en) * 2008-06-09 2011-09-22 Kleffner Matthew D Method and apparatus for blind signal recovery in noisy, reverberant environments
CN102918592A (en) * 2010-05-25 2013-02-06 日本电气株式会社 Signal processing method, information processing device, and signal processing program
CN103366740A (en) * 2012-03-27 2013-10-23 联想(北京)有限公司 Voice command recognition method and voice command recognition device

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105469806A (en) * 2014-09-12 2016-04-06 联想(北京)有限公司 Sound processing method, device and system
CN105469806B (en) * 2014-09-12 2020-02-21 联想(北京)有限公司 Sound processing method, device and system
CN106796786A (en) * 2014-09-30 2017-05-31 三菱电机株式会社 Speech recognition system
CN104535071B (en) * 2014-12-05 2018-12-14 百度在线网络技术(北京)有限公司 A kind of phonetic navigation method and device
CN104535071A (en) * 2014-12-05 2015-04-22 百度在线网络技术(北京)有限公司 Voice navigation method and device
WO2016127550A1 (en) * 2015-02-13 2016-08-18 百度在线网络技术(北京)有限公司 Method and device for human-machine voice interaction
CN105096939A (en) * 2015-07-08 2015-11-25 百度在线网络技术(北京)有限公司 Voice wake-up method and device
CN105427866A (en) * 2015-10-29 2016-03-23 北京云知声信息技术有限公司 Voice processing method and device, and pickup circuit
WO2017071183A1 (en) * 2015-10-29 2017-05-04 北京云知声信息技术有限公司 Voice processing method and device, and pickup circuit
CN106782547A (en) * 2015-11-23 2017-05-31 芋头科技(杭州)有限公司 A kind of robot semantics recognition system based on speech recognition
CN105575386A (en) * 2015-12-18 2016-05-11 百度在线网络技术(北京)有限公司 Method and device for voice recognition
CN105575386B (en) * 2015-12-18 2019-07-30 百度在线网络技术(北京)有限公司 Audio recognition method and device
CN105681579A (en) * 2016-03-11 2016-06-15 广东欧珀移动通信有限公司 Terminal, and screen control method and device for terminal in navigation state
CN108062213A (en) * 2017-10-20 2018-05-22 沈阳美行科技有限公司 A kind of methods of exhibiting and device at quick search interface
CN108090112A (en) * 2017-10-20 2018-05-29 沈阳美行科技有限公司 The exchange method and device of a kind of search interface
CN108847222A (en) * 2018-06-19 2018-11-20 Oppo广东移动通信有限公司 Speech recognition modeling generation method, device, storage medium and electronic equipment
CN109065036A (en) * 2018-08-30 2018-12-21 出门问问信息科技有限公司 Method, apparatus, electronic equipment and the computer readable storage medium of speech recognition
CN110895936A (en) * 2018-09-13 2020-03-20 珠海格力电器股份有限公司 Voice processing method and device based on household appliance
CN109489803A (en) * 2018-10-17 2019-03-19 浙江大学医学院附属邵逸夫医院 A kind of environmental noise intellectual analysis and alarm set
CN109489803B (en) * 2018-10-17 2020-09-01 浙江大学医学院附属邵逸夫医院 Intelligent environmental noise analysis and reminding device
CN109801491A (en) * 2019-01-18 2019-05-24 深圳壹账通智能科技有限公司 Intelligent navigation method, device, equipment and storage medium based on risk assessment
WO2021008534A1 (en) * 2019-07-15 2021-01-21 华为技术有限公司 Voice wakeup method and electronic device

Similar Documents

Publication Publication Date Title
CN103971681A (en) Voice recognition method and system
CN107147618A (en) A kind of user registering method, device and electronic equipment
CN110570840B (en) Intelligent device awakening method and device based on artificial intelligence
CN110349579B (en) Voice wake-up processing method and device, electronic equipment and storage medium
CN106935253A (en) The method of cutting out of audio file, device and terminal device
CN103700370A (en) Broadcast television voice recognition method and system
US9424743B2 (en) Real-time traffic detection
CN104217717A (en) Language model constructing method and device
CN103903621A (en) Method for voice recognition and electronic equipment
CN110047481A (en) Method for voice recognition and device
CN105374352A (en) Voice activation method and system
CN110503944B (en) Method and device for training and using voice awakening model
CN105938399A (en) Text input identification method of intelligent equipment based on acoustics
CN108595406B (en) User state reminding method and device, electronic equipment and storage medium
CN106936991A (en) The method and terminal of a kind of automatic regulating volume
CN110875045A (en) Voice recognition method, intelligent device and intelligent television
CN106531195B (en) A kind of dialogue collision detection method and device
CN113436611B (en) Test method and device for vehicle-mounted voice equipment, electronic equipment and storage medium
CN106228047A (en) A kind of application icon processing method and terminal unit
CN105869622B (en) Chinese hot word detection method and device
CN104282303A (en) Method for conducting voice recognition by voiceprint recognition and electronic device thereof
CN113658586B (en) Training method of voice recognition model, voice interaction method and device
CN110992953A (en) Voice data processing method, device, system and storage medium
CN112466328B (en) Breath sound detection method and device and electronic equipment
CN108231074A (en) A kind of data processing method, voice assistant equipment and computer readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20140806

RJ01 Rejection of invention patent application after publication