CN104766608A - Voice control method and voice control device

Info

Publication number
CN104766608A
CN104766608A
Authority
CN
China
Prior art keywords
key data
data
keyword
speech
key
Legal status
Withdrawn
Application number
CN201410007018.4A
Other languages
Chinese (zh)
Inventor
周宁 (Zhou Ning)
Current Assignee
Shenzhen ZTE Microelectronics Technology Co Ltd
Original Assignee
Shenzhen ZTE Microelectronics Technology Co Ltd
Priority date: 2014-01-07
Filing date: 2014-01-07
Publication date: 2015-07-08
Application filed by Shenzhen ZTE Microelectronics Technology Co Ltd filed Critical Shenzhen ZTE Microelectronics Technology Co Ltd
Priority to CN201410007018.4A
Priority to PCT/CN2014/078463 (published as WO2015103836A1)
Publication of CN104766608A


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting
    • G10L2015/223 Execution procedure of a spoken command

Abstract

The invention discloses a voice control method and a voice control device. The method comprises: acquiring voice data after a trigger operation by a user; performing voice recognition on the voice data, performing keyword matching in a predetermined manner, and obtaining recognized keyword data from the voice data; and triggering the sending of a keyword control command, the recognized keyword data serving as the control command that responds to the user operation, so as to realize voice control.

Description

Voice control method and device
Technical field
The present invention relates to voice technology, and in particular to a voice control method and a voice control device.
Background technology
In the course of working out the technical solution of the embodiments of the present application, the inventor found at least the following technical problem in the prior art:
In visual communication scenarios, as speech recognition technology comes into large-scale commercial use, users increasingly want to issue control commands by voice instead of by manual operation. At present, control schemes in the visual communication field are single in function, are all built on simple manual operation, and offer no novel practical functions. The prior art provides no effective solution to this problem.
Summary of the invention
To solve the problems existing in the prior art, embodiments of the present invention provide a voice control method and a voice control device that send control commands by voice, which is convenient for the user to operate and frees the user's hands.
A voice control method, the method comprising:
obtaining speech data after a user operation is triggered;
performing speech recognition on the speech data, performing keyword matching in a predetermined manner, and obtaining recognized keyword data from the speech data;
triggering the sending of a keyword control command, and responding to the user operation with the recognized keyword data as the control command, so as to realize voice control.
Preferably, performing speech recognition on the speech data, performing keyword matching in a predetermined manner, and obtaining the recognized keyword data from the speech data comprises:
when keyword matching is performed in a predetermined manner based on hidden Markov model (HMM) modeling, extracting MFCC feature parameters as the acoustic feature parameters for speech recognition of the speech data, and using the recognition result as reference data for keyword matching to obtain the recognized keyword data.
Preferably, the method further comprises: after the recognized keyword data is obtained, performing keyword matching optimization in a predetermined manner based on a shortest distance.
Preferably, performing keyword matching optimization in a predetermined manner based on a shortest distance comprises:
establishing a keyword data speech library;
extracting MFCC feature parameters as the acoustic feature parameters of the recognized keyword data, and performing data clustering in the keyword data speech library using vector quantization (VQ) to obtain a representative vector of each class;
obtaining, according to the representative vector of each class, the shortest distance between the MFCC feature parameters of the recognized keyword data and the representative vector of each class;
when the shortest distance successfully matches an empirical threshold, obtaining the keyword data recognized after the keyword matching optimization.
Preferably, the method further comprises:
judging, by comparing energy information of the keyword data, whether the control command has finished executing; if so, ending the current keyword matching and performing speech recognition on the speech data again.
Preferably, the keyword data comprises at least one item of basic control command information among: incoming call, call out, answer, and hang up.
A voice control device, the device comprising:
a voice acquiring unit, configured to obtain speech data after a user operation is triggered;
a keyword recognition unit, configured to perform speech recognition on the speech data, perform keyword matching in a predetermined manner, and obtain recognized keyword data from the speech data;
a voice control unit, configured to trigger the sending of a keyword control command and respond to the user operation with the recognized keyword data as the control command, so as to realize voice control.
Preferably, the keyword recognition unit is further configured to, when keyword matching is performed in a predetermined manner based on hidden Markov model (HMM) modeling, extract MFCC feature parameters as the acoustic feature parameters for speech recognition of the speech data, and use the recognition result as reference data for keyword matching to obtain the recognized keyword data.
Preferably, the keyword recognition unit is further configured to, after the recognized keyword data is obtained, perform keyword matching optimization in a predetermined manner based on a shortest distance.
Preferably, the keyword recognition unit is further configured to, when keyword matching optimization is performed in a predetermined manner based on a shortest distance: establish a keyword data speech library; extract MFCC feature parameters as the acoustic feature parameters of the recognized keyword data, and perform data clustering in the keyword data speech library using vector quantization (VQ) to obtain a representative vector of each class; obtain, according to the representative vector of each class, the shortest distance between the MFCC feature parameters of the recognized keyword data and the representative vector of each class; and, when the shortest distance successfully matches an empirical threshold, obtain the keyword data recognized after the keyword matching optimization.
Preferably, the keyword recognition unit is further configured to judge, by comparing energy information of the keyword data, whether the control command has finished executing, and if so, to end the current keyword matching and perform speech recognition on the speech data again.
Preferably, the keyword data comprises at least one item of basic control command information among: incoming call, call out, answer, and hang up.
The method of the embodiments of the present invention comprises: obtaining speech data after a user operation is triggered; performing speech recognition on the speech data, performing keyword matching in a predetermined manner, and obtaining recognized keyword data from the speech data; and triggering the sending of a keyword control command, with the recognized keyword data responding to the user operation as the control command, so as to realize voice control. Because the recognized keyword data triggers the sending of the keyword control command that responds to the user operation, the automatic matching and sending of control commands in the embodiments of the present invention replaces the existing manual operation, which is convenient for the user and frees the user's hands.
Brief description of the drawings
Fig. 1 is a flowchart of the method of an embodiment of the present invention;
Fig. 2 is a structural diagram of the device of an embodiment of the present invention;
Fig. 3 is a flowchart of an application scenario of an embodiment of the present invention;
Fig. 4 is a schematic diagram of a vector quantization example of an embodiment of the present invention;
Figs. 5-7 are flowcharts of the operation of the basic modules of the device in an application scenario of an embodiment of the present invention.
Detailed description of the embodiments
The implementation of the technical solution is described in further detail below with reference to the accompanying drawings.
The solution of the embodiments of the present invention applies speech recognition technology to keyword recognition and thereby realizes voice control. It can be used in application scenarios such as visual communication systems and calls or short messages between terminal devices. An automatically matched control command is obtained by recognizing keywords in the speech data, replacing the current manual control; as an auxiliary means, the embodiments of the present invention allow the user to carry out various control operations in a more user-friendly way.
The voice control method of the embodiment of the present invention, as shown in Fig. 1, comprises:
Step 101: obtain speech data after a user operation is triggered.
Step 102: perform speech recognition on the speech data, perform keyword matching in a predetermined manner, and obtain recognized keyword data from the speech data.
Step 103: trigger the sending of a keyword control command, and respond to the user operation with the recognized keyword data as the control command, so as to realize voice control.
Here, the keyword data comprises at least one item of basic control command information among: incoming call, call out, answer, and hang up.
Here, in step 102 speech recognition is performed on the speech data and keyword matching is performed in a predetermined manner; if recognized keyword data is obtained from the speech data, step 103 can be performed; if there is no match and no recognized keyword data can be obtained, the speech data can be sent as ordinary data.
The voice control device of the embodiment of the present invention, as shown in Fig. 2, comprises:
a voice acquiring unit 11, configured to obtain speech data after a user operation is triggered; a keyword recognition unit 12, configured to perform speech recognition on the speech data, perform keyword matching in a predetermined manner, and obtain recognized keyword data from the speech data; and a voice control unit 13, configured to trigger the sending of a keyword control command and respond to the user operation with the recognized keyword data as the control command, so as to realize voice control.
The embodiments of the present invention can be used in application scenarios such as visual communication systems and calls or short messages between terminal devices; the visual communication scenario is described in detail below.
As shown in Fig. 3, in the visual communication application scenario, the embodiment of the present invention comprises the following steps:
Step 201: obtain speech data after the user triggers a visual communication operation.
Step 202: perform keyword matching and recognition on the speech data; if a match is found, perform step 203, otherwise perform step 204.
Step 203: respond to the user operation and send a keyword control command, realizing voice control of the visual communication operation.
Step 204: send the speech data as RTP packets.
It should be noted here that in the embodiment of the present invention a voice acquiring unit is embedded mainly before RTP voice packets are sent. When step 201 is performed, a voice signal can be collected and sampled by a voice input device such as a microphone; the pre-processing of step 202 is then performed, that is, keyword matching and recognition is performed by the keyword recognition unit. If keyword data is matched, the user operation is responded to; otherwise, this segment of speech data is packed and sent. In other words, as long as the recognition result of the speech data is not a control command matching a keyword, the speech data is simply sent; if the recognition result is a control command matching a keyword, the sending of the keyword control command is triggered, and this control command is used to control the visual communication operation.
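A minimal Python sketch of this pre-send dispatch, with hypothetical helper names (capture_frame, recognize_keyword, send_control_command, send_rtp_packet) standing in for the device's own functions:

    # Speech captured from the microphone is first passed through keyword
    # recognition; only non-command speech is packed and sent as ordinary RTP data.
    def process_outgoing_speech(capture_frame, recognize_keyword,
                                send_control_command, send_rtp_packet):
        speech = capture_frame()                # step 201: sampled voice data
        keyword = recognize_keyword(speech)     # step 202: keyword matching
        if keyword is not None:
            send_control_command(keyword)       # step 203: respond to the user operation
        else:
            send_rtp_packet(speech)             # step 204: ordinary RTP speech packet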
For the keyword recognition unit, keyword matching and recognition of the speech data can adopt either: 1) a first-level predetermined-manner algorithm alone; or 2) a combination of the first-level and second-level predetermined-manner algorithms, where option 2) applies matching optimization to option 1) so as to finally obtain a more accurate keyword and send it as the final control command. The first-level predetermined-manner algorithm performs modeling based on a hidden Markov model (HMM) to realize keyword recognition; the second-level predetermined-manner algorithm realizes keyword recognition by shortest-distance matching. Option 2), the combined algorithm, is described in detail below.
To further improve keyword recognition performance, the embodiment of the present invention can adopt a combined two-layer recognition flow. The first layer uses an existing HMM-based method for modeling: the acoustic feature parameters extracted from the speech data for speech recognition are Mel-frequency cepstral coefficient (MFCC) feature parameters, and the recognized keyword data serves as the first-layer reference. The second layer establishes a keyword data speech library, extracts the MFCC acoustic feature parameters of the keyword data recognized by the first layer, performs data clustering in the keyword data speech library using vector quantization (VQ), and obtains the representative vector of each class (also called a cell). The shortest distance between the MFCC feature parameters of the recognized keyword data and the representative vector of each class is then computed and compared with an empirically obtained threshold; if a predetermined criterion is met, the candidate is taken as the final recognition result, that is, a more accurate keyword is finally obtained and sent as the final control command.
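A minimal Python sketch of the second-layer check, in which hmm_recognize, mfcc and the codebooks dictionary are hypothetical stand-ins for the first-layer recognizer, the feature extractor and the keyword speech library:

    import numpy as np

    # The first-layer (HMM) result selects a candidate command; its MFCC frames are
    # compared against the VQ codebook (representative vectors) of that command, and
    # the candidate is accepted only if the shortest distance is below an empirical
    # threshold.
    def verify_keyword(speech, hmm_recognize, mfcc, codebooks, threshold):
        candidate = hmm_recognize(speech)          # first layer: HMM-based result
        if candidate not in codebooks:
            return None
        feats = mfcc(speech)                       # (n_frames, n_coeffs) MFCC matrix
        centroids = codebooks[candidate]           # representative vector of each class
        # distance from every frame to its nearest representative vector
        d = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=-1)
        shortest = d.min(axis=1).mean()            # average shortest distance
        return candidate if shortest < threshold else None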
The algorithms used by the keyword recognition unit are described as follows:
The implementation of a speech recognition algorithm generally includes: 1) receiving the voice signal; 2) parameter extraction; 3) modeling and statistical analysis; 4) decision logic; 5) outputting the recognition result.
In speech recognition, different parameters of the voice can be extracted in order to achieve the best recognition effect.
For 2), parameter extraction, the embodiment of the present invention extracts MFCC parameters. Applying this pre-processing to the received voice signal removes redundant information that is unimportant for speech recognition and extracts the information that is useful for it. The main steps are the existing ones: pre-emphasis, framing, windowing, fast Fourier transform, and triangular band-pass filtering.
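A simplified Python sketch of such an MFCC front end, using NumPy and SciPy; the frame length, hop and filter-bank size are typical values chosen for illustration, not taken from the disclosure:

    import numpy as np
    from scipy.fftpack import dct

    # Pre-emphasis, framing, windowing, FFT power spectrum, triangular mel filter
    # bank, log, DCT. Assumes the signal is at least one frame long.
    def mfcc(signal, fs=8000, frame_len=200, hop=80, n_filt=26, n_ceps=13, nfft=256):
        signal = np.asarray(signal, dtype=float)
        emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])  # pre-emphasis
        n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
        idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
        frames = emphasized[idx] * np.hamming(frame_len)                    # framing + window
        power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft               # power spectrum
        # triangular mel filter bank
        mel = np.linspace(0, 2595 * np.log10(1 + fs / 2 / 700), n_filt + 2)
        hz = 700 * (10 ** (mel / 2595) - 1)
        bins = np.floor((nfft + 1) * hz / fs).astype(int)
        fbank = np.zeros((n_filt, nfft // 2 + 1))
        for m in range(1, n_filt + 1):
            l, c, r = bins[m - 1], bins[m], bins[m + 1]
            fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
            fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
        feat = np.log(np.dot(power, fbank.T) + 1e-10)                       # log mel energies
        return dct(feat, type=2, axis=1, norm='ortho')[:, :n_ceps]          # cepstral coeffs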
For 3), modeling and statistical analysis, after the parameters are extracted, these parameter features are analysed by modeling to obtain the recognition result. When the present embodiment performs modeling analysis based on the first-layer reference, it adopts a parametric representation, the hidden Markov model (HMM), a probability model describing the statistical characteristics of a stochastic process. In HMM-based speech recognition, each word generates a corresponding HMM and each observation sequence consists of the speech of one word; a word is recognized by evaluating the HMMs and selecting the one whose pronunciation is most likely to have generated the observation sequence.
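An illustrative Python sketch of this per-keyword HMM scoring, assuming the third-party hmmlearn package is available; the number of states and training settings are arbitrary example choices:

    import numpy as np
    from hmmlearn import hmm   # third-party package, assumed available

    def train_keyword_models(training_data, n_states=5):
        """training_data: dict mapping keyword -> list of MFCC arrays (frames x coeffs)."""
        models = {}
        for word, samples in training_data.items():
            X = np.concatenate(samples)
            lengths = [len(s) for s in samples]
            m = hmm.GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
            m.fit(X, lengths)            # one HMM per keyword
            models[word] = m
        return models

    def hmm_recognize(models, feats):
        """Return the keyword whose HMM best explains the MFCC sequence `feats`."""
        scores = {word: m.score(feats) for word, m in models.items()}
        return max(scores, key=scores.get)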
The vector quantization of the second layer is a further optimization of the first-layer reference and is also an existing modeling and statistical analysis technique. Its basic principle is that several scalar values (for example, the feature vector extracted from one frame of speech data) are grouped into a vector and quantized as a whole in a multidimensional space, so that data can be compressed with little loss of information. Here the scalars can be understood as the extracted parameters.
A concrete example of the first-layer reference and the second-layer vector quantization is shown in Fig. 4. Suppose the recognition result obtained by the first layer is a1; the corresponding control command b1 can be looked up in the keyword speech library using a1, and this control command used for voice control also has its own MFCC parameters. Vector quantization (VQ) is then used to cluster the recognition result a1 and the control command b1, giving a vector a corresponding to a1 and a vector b corresponding to b1. The similarity of vector a and vector b, i.e. the shortest distance, is then compared: vector b is subtracted from vector a to obtain vector c, and the closer the amplitude of vector c is to 0 (or its phase to 180°), the more similar the recognition result a1 is to the control command b1.
The empirical threshold involved in the second-layer vector quantization expresses the degree of similarity between the recognition result a1 and the control command b1 and depends on the vector c obtained; its value has to be determined by repeated experiments.
The predetermined criterion involved in the second-layer vector quantization can also be understood as the decision logic. It is the core of the whole speech recognition system and mainly comprises distance measures and expert knowledge (such as word-formation rules, syntax rules and semantic rules); it computes the similarity between the input features and the patterns in the pattern library (such as matching distance or likelihood probability), judges the semantic information of the input speech, and produces the recognition result.
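A possible Python sketch of building such a VQ codebook with k-means clustering from SciPy; reference_mfccs is a hypothetical list of MFCC matrices for one control command in the keyword speech library:

    import numpy as np
    from scipy.cluster.vq import kmeans2   # SciPy vector-quantization helpers

    # MFCC frames of the reference recordings of one command are clustered with VQ,
    # and the cluster centroids serve as the representative vectors of each class.
    def build_codebook(reference_mfccs, n_classes=8):
        frames = np.vstack(reference_mfccs)                 # pool all MFCC frames
        centroids, _ = kmeans2(frames, n_classes, minit="points")
        return centroids                                    # representative vector per class

The empirical threshold is then obtained offline, by measuring the shortest distances of known-correct and known-incorrect recognitions against this codebook and choosing a separating value, in line with the repeated experiments described above.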
During the keyword recognition process, as soon as a match with a set keyword is detected, the control action corresponding to the user operation is triggered.
In a preferred implementation of the embodiment of the present invention, after the keyword control command has been triggered and has finished executing, it must be decided how keywords are recognized next. For this purpose, in the visual communication of the embodiment of the present invention, the voice signal is collected by the voice input device and recognized by the keyword recognition unit. When keyword data is recognized, the energy information of the recognized keyword can be computed and compared with the energy information of the 20 frames before and after it; if the average energy of the recognized keyword is twice the average energy of those 20 surrounding frames, it is determined that keyword data requiring control has indeed been obtained and can be used as the control command, and control operations such as incoming call, call out, answer and hang up can then be performed according to the set keyword.
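A small Python sketch of this energy confirmation, assuming the speech has already been split into frames; the factor of 2 and the 20-frame context follow the description above, while the helper names are illustrative only:

    import numpy as np

    def frame_energy(frames):
        return np.mean(frames.astype(np.float64) ** 2, axis=1)   # energy per frame

    # The detected keyword segment is accepted as a control command only if its
    # average frame energy is at least twice that of the 20 frames before and after it.
    def confirm_keyword(frames, start, end, context=20, factor=2.0):
        """frames: (n_frames, frame_len) array; [start, end) is the keyword span."""
        e = frame_energy(frames)
        keyword_energy = e[start:end].mean()
        before = e[max(0, start - context):start]
        after = e[end:end + context]
        context_energy = np.concatenate([before, after]).mean()
        return keyword_energy >= factor * context_energy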
The device of the embodiment of the present invention adds a voice acquiring unit and a keyword recognition unit on top of the existing modules. Keyword matching and recognition is realized mainly by the keyword recognition unit, and the recognized keyword is responded to as a control command.
For the visual communication application scenario, the existing modules comprise: a proxy module 21, a SIP protocol stack module 22, a component communication function library integration module 23, a signaling control scheduling module 24, a media processing scheduling module 25 and a media video display module 26. The proxy module 21 comprises a message agent 211 and a friend proxy module 212; the SIP protocol stack module 22 is responsible for receiving and sending interaction messages with the SIP server; the component communication function library integration module 23 contains the required functions and provides function calls; the signaling control scheduling module 24 processes call control commands and manages the audio and video input and output devices; the media processing scheduling module 25 is mainly used to receive the collected audio and video data; and the media video display module 26 is mainly used to display the collected video data.
Figs. 5-7 are, respectively, flowcharts of the operation of the signaling control scheduling module 24, the media processing scheduling module 25 and the media video display module 26. The proxy module 21, the SIP protocol stack module 22 and the component communication function library integration module 23 are mainly used for sending and exchanging speech data; since the embodiment of the present invention focuses on the processing of speech data, these modules have little bearing on it and are not emphasized here.
Fig. 5 is a flowchart of the operation of the signaling control scheduling module 24, which processes call control commands. Its flow comprises the following steps:
Step 401: receive a control command.
Step 402: according to the voice control command table, call the software API of the corresponding operation, so that all manual operations can be replaced.
Step 403: after the software API of the corresponding operation is called, perform the corresponding operation: open devices, handle specific transactions, or close devices.
Here, 1) various devices can be opened, for example commands to make a call (enter the dialing interface) or to open the camera or microphone; 2) specific transactions can be handled, for example calling a particular contact or switching windows, i.e. operations that would otherwise require manual action; 3) various devices can be closed, for example shutting down, going to standby, or closing the camera or microphone.
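A minimal Python sketch of such a voice-command table, with hypothetical handler callables standing in for the software APIs of the corresponding operations:

    # Step 402: the table maps each recognized keyword to the software API that
    # replaces the corresponding manual operation; step 403 then invokes it.
    def build_command_table(place_call, answer_call, hang_up, open_camera, close_camera):
        return {
            "call out": place_call,        # e.g. enter the dialing interface
            "answer": answer_call,
            "hang up": hang_up,
            "open camera": open_camera,
            "close camera": close_camera,
        }

    def dispatch(command_table, keyword, *args):
        handler = command_table.get(keyword)
        if handler is None:
            return False                   # not a control command; ignore here
        handler(*args)                     # perform the corresponding operation
        return True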
Fig. 6 is a flowchart of the operation of the media processing scheduling module 25, which collects audio data that has been judged not to be a control command. Its flow comprises the following steps:
Step 501: collect audio data and video data respectively.
Step 502: encode the collected audio data and video data respectively using open-source software.
Step 503: multiplex the encoded streams, integrating the audio data and video data into audio-video data.
Step 504: transmit this audio-video data over the network.
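An abstract Python sketch of this send-side pipeline; every callable is a hypothetical placeholder, since the disclosure names only "open-source software" rather than specific encoder or transport APIs:

    # Steps 501-504: capture audio and video, encode both, multiplex, and send.
    def media_send_loop(capture_audio, capture_video, encode_audio, encode_video,
                        mux, send, running):
        while running():
            audio = capture_audio()                  # step 501: collect audio data
            video = capture_video()                  #           and video data
            packets = mux(encode_audio(audio),       # step 502: encode both streams
                          encode_video(video))       # step 503: multiplex into A/V data
            send(packets)                            # step 504: transmit over the network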
Fig. 7 is a flowchart of the operation of the media video display module 26, which displays the collected video data. On the one hand, it decodes the video data compressed by the media processing scheduling module 25 and delivers it for display; on the other hand, it decodes the audio data with the third-party open-source software FFmpeg (a free cross-platform audio and video streaming solution providing recording, conversion and streaming of audio and video) and delivers it to the sound card so that the audio corresponding to the video data is played back. Its flow comprises the following steps:
Step 601: receive the multiplexed audio-video data.
Step 602: demultiplex, parsing the audio-video data into audio data and video data.
Step 603: obtain audio packets and video packets respectively.
Step 604: decode the audio packets and video packets respectively: decode the audio packets with open-source software to obtain pulse-code modulation (PCM) data, and decode the video packets with the in-house hardware decoder chip to obtain pictures.
Step 605: send the decoded audio data to the sound card for output, and send the decoded video data to the hardware display buffer to wait for hardware display.
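An abstract Python sketch of the receive side, mirroring steps 601-605; again every callable is a hypothetical placeholder for the real demuxer, decoders and output devices:

    # Received stream -> demultiplex -> audio decoded to PCM for the sound card,
    # video decoded by the hardware decoder into the display buffer.
    def media_receive_loop(receive, demux, decode_audio_to_pcm, decode_video_hw,
                           play_pcm, push_to_display, running):
        while running():
            stream = receive()                           # step 601: muxed A/V data
            audio_pkt, video_pkt = demux(stream)         # steps 602-603: split into packets
            play_pcm(decode_audio_to_pcm(audio_pkt))     # steps 604-605: PCM to sound card
            push_to_display(decode_video_hw(video_pkt))  # decoded frames to display buffer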
The basic visual communication modules further include a speech recognition keyword information module and a keyword processing module. By integrating these two modules into the basic modules, certain operations of the visual communication can be voice-controlled, such as incoming call, call out, answer and hang up. In the present invention, several basic commands are first set as keywords: incoming call, call out, answer and hang up. Then, during the call flow, the speech passes through the speech recognition keyword information module; when keyword information is recognized, the keyword processing module judges, by a certain criterion, whether the recognized keyword information needs further processing. If it does, the voice-controlled operation is fed back; otherwise, no control operation is performed. By this method, speech recognition technology can be applied accurately to visual communication, realizing voice-controlled operations such as incoming call, call out, answer and hang up.
In summary, the embodiments of the present invention are mainly applicable to application scenarios involving speech recognition and input, such as visual communication, dialing a phone, and sending short messages on a mobile phone. A voice acquiring module and a keyword recognition unit are added on the basis of the existing modules described above. After voice input, the required keyword is detected mainly by this keyword recognition unit, and the command triggered by this keyword then controls the operations in the visual communication, such as call out, hang up and incoming call. The keyword recognition unit adopts two-layer speech recognition control: one layer is the HMM-based speech recognition method and the other is the shortest-distance matching method. Through the two layers of recognition control, accurate keyword data is obtained; at the same time, the energy of the keyword data is compared with that of the 20 frames before and after it, so that matching determines whether the keyword data is needed.
If the integrated modules described in the embodiments of the present invention are implemented in the form of software function modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on this understanding, the part of the technical solution of the embodiments of the present invention that in essence contributes to the prior art may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a portable hard drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc. Thus, the embodiments of the present invention are not limited to any specific combination of hardware and software.
Correspondingly, an embodiment of the present invention also provides a computer storage medium storing a computer program, the computer program being used to perform the voice control method of the embodiments of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention.

Claims (12)

1. A voice control method, characterized in that the method comprises:
obtaining speech data after a user operation is triggered;
performing speech recognition on the speech data, performing keyword matching in a predetermined manner, and obtaining recognized keyword data from the speech data;
triggering the sending of a keyword control command, and responding to the user operation with the recognized keyword data as the control command, so as to realize voice control.
2. The method according to claim 1, characterized in that performing speech recognition on the speech data, performing keyword matching in a predetermined manner, and obtaining the recognized keyword data from the speech data comprises:
when keyword matching is performed in a predetermined manner based on hidden Markov model (HMM) modeling, extracting MFCC feature parameters as the acoustic feature parameters for speech recognition of the speech data, and using the recognition result as reference data for keyword matching to obtain the recognized keyword data.
3. The method according to claim 2, characterized in that the method further comprises: after the recognized keyword data is obtained, performing keyword matching optimization in a predetermined manner based on a shortest distance.
4. The method according to claim 3, characterized in that performing keyword matching optimization in a predetermined manner based on a shortest distance comprises:
establishing a keyword data speech library;
extracting MFCC feature parameters as the acoustic feature parameters of the recognized keyword data, and performing data clustering in the keyword data speech library using vector quantization (VQ) to obtain a representative vector of each class;
obtaining, according to the representative vector of each class, the shortest distance between the MFCC feature parameters of the recognized keyword data and the representative vector of each class;
when the shortest distance successfully matches an empirical threshold, obtaining the keyword data recognized after the keyword matching optimization.
5. The method according to any one of claims 2 to 4, characterized in that the method further comprises:
judging, by comparing energy information of the keyword data, whether the control command has finished executing; if so, ending the current keyword matching and performing speech recognition on the speech data again.
6. The method according to claim 1, characterized in that the keyword data comprises at least one item of basic control command information among: incoming call, call out, answer, and hang up.
7. A voice control device, characterized in that the device comprises:
a voice acquiring unit, configured to obtain speech data after a user operation is triggered;
a keyword recognition unit, configured to perform speech recognition on the speech data, perform keyword matching in a predetermined manner, and obtain recognized keyword data from the speech data;
a voice control unit, configured to trigger the sending of a keyword control command and respond to the user operation with the recognized keyword data as the control command, so as to realize voice control.
8. The device according to claim 7, characterized in that the keyword recognition unit is further configured to, when keyword matching is performed in a predetermined manner based on hidden Markov model (HMM) modeling, extract MFCC feature parameters as the acoustic feature parameters for speech recognition of the speech data, and use the recognition result as reference data for keyword matching to obtain the recognized keyword data.
9. The device according to claim 8, characterized in that the keyword recognition unit is further configured to, after the recognized keyword data is obtained, perform keyword matching optimization in a predetermined manner based on a shortest distance.
10. The device according to claim 9, characterized in that the keyword recognition unit is further configured to, when keyword matching optimization is performed in a predetermined manner based on a shortest distance: establish a keyword data speech library; extract MFCC feature parameters as the acoustic feature parameters of the recognized keyword data, and perform data clustering in the keyword data speech library using vector quantization (VQ) to obtain a representative vector of each class; obtain, according to the representative vector of each class, the shortest distance between the MFCC feature parameters of the recognized keyword data and the representative vector of each class; and, when the shortest distance successfully matches an empirical threshold, obtain the keyword data recognized after the keyword matching optimization.
11. The device according to any one of claims 8 to 10, characterized in that the keyword recognition unit is further configured to judge, by comparing energy information of the keyword data, whether the control command has finished executing, and if so, to end the current keyword matching and perform speech recognition on the speech data again.
12. The device according to claim 7, characterized in that the keyword data comprises at least one item of basic control command information among: incoming call, call out, answer, and hang up.
CN201410007018.4A 2014-01-07 2014-01-07 Voice control method and voice control device Withdrawn CN104766608A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410007018.4A CN104766608A (en) 2014-01-07 2014-01-07 Voice control method and voice control device
PCT/CN2014/078463 WO2015103836A1 (en) 2014-01-07 2014-05-26 Voice control method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410007018.4A CN104766608A (en) 2014-01-07 2014-01-07 Voice control method and voice control device

Publications (1)

Publication Number Publication Date
CN104766608A true CN104766608A (en) 2015-07-08

Family

ID=53523498

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410007018.4A Withdrawn CN104766608A (en) 2014-01-07 2014-01-07 Voice control method and voice control device

Country Status (2)

Country Link
CN (1) CN104766608A (en)
WO (1) WO2015103836A1 (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105139858A (en) * 2015-07-27 2015-12-09 联想(北京)有限公司 Information processing method and electronic equipment
CN105898496A (en) * 2015-11-18 2016-08-24 乐视网信息技术(北京)股份有限公司 HLS stream hardware decoding method based on Android device and device
CN106488286A (en) * 2015-08-28 2017-03-08 上海欢众信息科技有限公司 High in the clouds Information Collection System
CN106686445A (en) * 2015-11-05 2017-05-17 北京中广上洋科技股份有限公司 Method of carrying out on-demand jump on multimedia file
CN107249116A (en) * 2017-08-09 2017-10-13 成都全云科技有限公司 Noise echo eliminating device based on video conference
CN108242241A (en) * 2016-12-23 2018-07-03 中国农业大学 A kind of pure voice rapid screening method and its device
CN108702411A (en) * 2017-03-21 2018-10-23 华为技术有限公司 A kind of method and device of control call
WO2018219023A1 (en) * 2017-05-27 2018-12-06 腾讯科技(深圳)有限公司 Speech keyword identification method and device, terminal and server
CN109003604A (en) * 2018-06-20 2018-12-14 恒玄科技(上海)有限公司 A kind of audio recognition method that realizing low-power consumption standby and system
CN109887512A (en) * 2019-03-15 2019-06-14 深圳市奥迪信科技有限公司 Wisdom hotel guest room control method and system
CN110174924A (en) * 2018-09-30 2019-08-27 广东小天才科技有限公司 A kind of making friends method and wearable device based on wearable device
CN112086091A (en) * 2020-09-18 2020-12-15 南京孝德智能科技有限公司 Intelligent endowment service system and method
CN112687269A (en) * 2020-12-18 2021-04-20 山东盛帆蓝海电气有限公司 Building management robot voice automatic identification method and system
CN113709545A (en) * 2021-04-13 2021-11-26 腾讯科技(深圳)有限公司 Video processing method and device, computer equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN2136559Y (en) * 1992-07-09 1993-06-16 陈康 Voice-operated automatic dialing device for telephone
CN1615508A (en) * 2001-12-17 2005-05-11 旭化成株式会社 Speech recognition method, remote controller, information terminal, telephone communication terminal and speech recognizer
CN101345668A (en) * 2008-08-22 2009-01-14 中兴通讯股份有限公司 Control method and apparatus for monitoring equipment
CN101516005A (en) * 2008-02-23 2009-08-26 华为技术有限公司 Speech recognition channel selecting system, method and channel switching device
CN101923857A (en) * 2009-06-17 2010-12-22 复旦大学 Extensible audio recognition method based on man-machine interaction
CN102568478A (en) * 2012-02-07 2012-07-11 合一网络技术(北京)有限公司 Video play control method and system based on voice recognition
CN102938811A (en) * 2012-10-15 2013-02-20 华南理工大学 Household mobile phone communication system based on voice recognition

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6141641A (en) * 1998-04-15 2000-10-31 Microsoft Corporation Dynamically configurable acoustic model for speech recognition system
CN1190772C (en) * 2002-09-30 2005-02-23 中国科学院声学研究所 Voice identifying system and compression method of characteristic vector set for voice identifying system
US7027979B2 (en) * 2003-01-14 2006-04-11 Motorola, Inc. Method and apparatus for speech reconstruction within a distributed speech recognition system
CN101154379B (en) * 2006-09-27 2011-11-23 夏普株式会社 Method and device for locating keywords in voice and voice recognition system
CN101673112A (en) * 2009-09-17 2010-03-17 李华东 Intelligent home voice controller
CN103366743A (en) * 2012-03-30 2013-10-23 北京千橡网景科技发展有限公司 Voice-command operation method and device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN2136559Y (en) * 1992-07-09 1993-06-16 陈康 Voice-operated automatic dialing device for telephone
CN1615508A (en) * 2001-12-17 2005-05-11 旭化成株式会社 Speech recognition method, remote controller, information terminal, telephone communication terminal and speech recognizer
CN101516005A (en) * 2008-02-23 2009-08-26 华为技术有限公司 Speech recognition channel selecting system, method and channel switching device
CN101345668A (en) * 2008-08-22 2009-01-14 中兴通讯股份有限公司 Control method and apparatus for monitoring equipment
CN101923857A (en) * 2009-06-17 2010-12-22 复旦大学 Extensible audio recognition method based on man-machine interaction
CN102568478A (en) * 2012-02-07 2012-07-11 合一网络技术(北京)有限公司 Video play control method and system based on voice recognition
CN102938811A (en) * 2012-10-15 2013-02-20 华南理工大学 Household mobile phone communication system based on voice recognition

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105139858A (en) * 2015-07-27 2015-12-09 联想(北京)有限公司 Information processing method and electronic equipment
CN105139858B (en) * 2015-07-27 2019-07-26 联想(北京)有限公司 A kind of information processing method and electronic equipment
CN106488286A (en) * 2015-08-28 2017-03-08 上海欢众信息科技有限公司 High in the clouds Information Collection System
CN106686445A (en) * 2015-11-05 2017-05-17 北京中广上洋科技股份有限公司 Method of carrying out on-demand jump on multimedia file
CN106686445B (en) * 2015-11-05 2019-06-11 北京中广上洋科技股份有限公司 The method that multimedia file is jumped on demand
CN105898496A (en) * 2015-11-18 2016-08-24 乐视网信息技术(北京)股份有限公司 HLS stream hardware decoding method based on Android device and device
CN108242241A (en) * 2016-12-23 2018-07-03 中国农业大学 A kind of pure voice rapid screening method and its device
CN108702411A (en) * 2017-03-21 2018-10-23 华为技术有限公司 A kind of method and device of control call
US10938978B2 (en) 2017-03-21 2021-03-02 Huawei Technologies Co., Ltd. Call control method and apparatus
WO2018219023A1 (en) * 2017-05-27 2018-12-06 腾讯科技(深圳)有限公司 Speech keyword identification method and device, terminal and server
CN107249116B (en) * 2017-08-09 2020-05-05 成都全云科技有限公司 Noise echo eliminating device based on video conference
CN107249116A (en) * 2017-08-09 2017-10-13 成都全云科技有限公司 Noise echo eliminating device based on video conference
CN109003604A (en) * 2018-06-20 2018-12-14 恒玄科技(上海)有限公司 A kind of audio recognition method that realizing low-power consumption standby and system
CN110174924A (en) * 2018-09-30 2019-08-27 广东小天才科技有限公司 A kind of making friends method and wearable device based on wearable device
CN110174924B (en) * 2018-09-30 2021-03-30 广东小天才科技有限公司 Friend making method based on wearable device and wearable device
CN109887512A (en) * 2019-03-15 2019-06-14 深圳市奥迪信科技有限公司 Wisdom hotel guest room control method and system
CN112086091A (en) * 2020-09-18 2020-12-15 南京孝德智能科技有限公司 Intelligent endowment service system and method
CN112687269A (en) * 2020-12-18 2021-04-20 山东盛帆蓝海电气有限公司 Building management robot voice automatic identification method and system
CN113709545A (en) * 2021-04-13 2021-11-26 腾讯科技(深圳)有限公司 Video processing method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
WO2015103836A1 (en) 2015-07-16

Similar Documents

Publication Publication Date Title
CN104766608A (en) Voice control method and voice control device
CN110049270B (en) Multi-person conference voice transcription method, device, system, equipment and storage medium
AU2021277642A1 (en) Method and apparatus for detecting spoofing conditions
WO2021051506A1 (en) Voice interaction method and apparatus, computer device and storage medium
JP6469252B2 (en) Account addition method, terminal, server, and computer storage medium
CN108182944A (en) Control the method, apparatus and intelligent terminal of intelligent terminal
US9293140B2 (en) Speaker-identification-assisted speech processing systems and methods
CN111341325A (en) Voiceprint recognition method and device, storage medium and electronic device
CN102543071A (en) Voice recognition system and method used for mobile equipment
CN108010513B (en) Voice processing method and device
CN103347070B (en) Push method, terminal, server and the system of speech data
CN110995943B (en) Multi-user streaming voice recognition method, system, device and medium
CN113436609B (en) Voice conversion model, training method thereof, voice conversion method and system
CN113724718B (en) Target audio output method, device and system
CN113345473B (en) Voice endpoint detection method, device, electronic equipment and storage medium
US9454959B2 (en) Method and apparatus for passive data acquisition in speech recognition and natural language understanding
CN103514882A (en) Voice identification method and system
CN110570847A (en) Man-machine interaction system and method for multi-person scene
CN113299306B (en) Echo cancellation method, echo cancellation device, electronic equipment and computer-readable storage medium
CN110556114B (en) Speaker identification method and device based on attention mechanism
JP6448950B2 (en) Spoken dialogue apparatus and electronic device
CN113823303A (en) Audio noise reduction method and device and computer readable storage medium
CN107886940A (en) Voiced translation processing method and processing device
CN113345423B (en) Voice endpoint detection method, device, electronic equipment and storage medium
CN115762500A (en) Voice processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20150708

WW01 Invention patent application withdrawn after publication