CN101419797A - Method for enhancing speech identification efficiency and speech identification apparatus - Google Patents

Method for enhancing speech identification efficiency and speech identification apparatus

Info

Publication number
CN101419797A
Authority
CN
China
Prior art keywords
voice
wave band
identification data
characteristic parameter
voice wave
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2008102326005A
Other languages
Chinese (zh)
Inventor
赵仁宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inventec Besta Xian Co Ltd
Original Assignee
Inventec Besta Xian Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inventec Besta Xian Co Ltd filed Critical Inventec Besta Xian Co Ltd
Priority to CNA2008102326005A priority Critical patent/CN101419797A/en
Publication of CN101419797A publication Critical patent/CN101419797A/en
Pending legal-status Critical Current

Abstract

The invention relates to a method for improving speech recognition efficiency and to a speech recognition apparatus using the method. The method comprises the steps of: 1) providing at least one item of identification data, in which a first characteristic parameter is set; 2) receiving a voice signal; 3) detecting the start and end positions of the voice signal with a voice activity detection program, thereby obtaining a first voice wave band; 4) distinguishing the speech region and the non-speech region in the first voice wave band, and setting a second characteristic parameter in the speech region of the first voice wave band, the second characteristic parameter corresponding to the first characteristic parameter; 5) deleting the non-speech region from the first voice wave band with the voice activity detection program, thereby generating a second voice wave band; 6) comparing the first characteristic parameter in the identification data with the second characteristic parameter in the second voice wave band to judge whether the second voice wave band matches the identification data, thereby completing speech recognition. The invention requires little computation and achieves high recognition accuracy.

Description

A method for improving speech recognition efficiency and a speech recognition apparatus using the method
Technical field
The present invention relates to a method for improving speech recognition efficiency and to a speech recognition apparatus using the method, and in particular to a method and apparatus that raise speech recognition accuracy by deleting non-speech regions.
Background
The most basic definition of speech recognition is that "a computer can understand the statements or commands spoken by a human and carry out the corresponding work". That is, if a computer is equipped with a speech recognition function, sound is converted by a conversion device into a voice signal and input into an electronic device. Once the signal has been stored, the speech recognition program begins comparing the input sound sample with sound samples stored in advance. When the comparison is finished, the electronic device outputs the sequence number of the sound sample it considers most similar, so that it can recognize what the sound just uttered means and then execute the command.
Building a speech recognition package with a high recognition rate, however, is very difficult. For example, to recognize ten words, the sounds of those ten words are first read into the computer and saved as ten reference samples. During recognition, the received voice signal only needs to be compared one by one with the ten prerecorded reference samples; finding the reference sample most similar to the test sample identifies the test sample. But the length, tone, and frequency of the speech that each user reads into the computer all differ. Even when the same user reads the same sound every time, the waveforms are not identical, and in a noisy environment the situation is worse still. Many people are therefore studying how to solve this problem.
To address this problem, some have tried techniques such as the Fourier transform and cepstral parameters, but the results remain unsatisfactory.
In addition, the more accurate a speech recognition package is, the more computation it requires, so high-accuracy speech recognition could not previously be achieved on portable devices with low-performance processors. Yet portable devices are now very common, and almost everyone uses one every day. In view of these problems, the present invention proposes a method and a speech recognition apparatus that require little computation and achieve high recognition accuracy.
Summary of the invention
To overcome the shortcomings of the prior art mentioned above, namely low recognition accuracy and a high demand on processor computing power at a time when most devices are portable, the invention provides a speech recognition method, and a speech recognition apparatus using it, that require little computation and achieve high recognition accuracy.
Technical solution of the invention: the invention is a method for improving speech recognition efficiency, characterized in that the method comprises the following steps:
1) providing at least one item of identification data and setting a first characteristic parameter in it;
2) receiving a voice signal;
3) using a voice activity detection program to detect the start and end positions of the voice signal, thereby obtaining a first voice wave band;
4) distinguishing the speech region and the non-speech region in the first voice wave band, and setting a second characteristic parameter in the speech region of the first voice wave band, the second characteristic parameter corresponding to the first characteristic parameter;
5) using the voice activity detection program to delete the non-speech region from the first voice wave band, thereby producing a second voice wave band;
6) comparing the first characteristic parameter in the identification data with the second characteristic parameter in the second voice wave band to judge whether the second voice wave band matches the identification data, thereby completing speech recognition.
The above identification data are identification data from which non-speech regions have been deleted by the voice activity detection program.
The above non-speech region is silence or noise.
The above identification data and second voice wave band are both digital signals.
The above identification data are voice data prerecorded by the user, or voice data stored in the electronic device in advance by the manufacturer.
The above voice data is a voice command.
The above voice signal is a voice command.
A speech recognition apparatus using the above method for improving speech recognition efficiency is characterized in that it comprises: a storage unit for storing at least one item of identification data that has been processed by deleting non-speech regions; a receiving unit for receiving sound and converting it into a voice signal; a processing unit for detecting the start and end positions of the voice signal to obtain the first voice wave band, deleting the non-speech region from the first voice wave band to produce the second voice wave band, and then comparing the identification data with the second voice wave band to judge whether the second voice wave band matches the identification data; and a judgement unit for distinguishing the speech region and the non-speech region in the first voice wave band. The receiving unit is connected to the processing unit, and the processing unit is connected to the judgement unit and to the storage unit.
The present invention has the following advantages:
(1) The invention uses the voice activity detection program to determine the positions at which the voice signal begins and ends during speech recognition. After the first voice wave band to be recognized (for example, a voice passage) is obtained, a second pass removes the non-speech regions (silence or noise) from the first voice wave band to produce a second voice wave band containing no non-speech region. This second voice wave band is then recognized against a plurality of items of identification data, which improves recognition efficiency.
(2) The method of the invention only needs to process the speech portion, which reduces the system load and removes the need for a higher-performance microprocessor (CPU).
Description of drawings
Fig. 1 is a flow chart of the steps of the method of the invention for improving speech recognition efficiency;
Fig. 2 is a block diagram of an embodiment of the speech recognition apparatus of the invention;
Fig. 3 is a first schematic diagram of an embodiment of the invention;
Fig. 4 is a second schematic diagram of an embodiment of the invention.
Description of reference numerals: 20-first voice wave band, 201-speech region, 202-non-speech region, 21-start key, 22-end key, 23-singer recognition key, 24-singer menu, 25-progress bar, 26-second voice wave band, 31-storage unit, 32-receiving unit, 33-processing unit, 34-judgement unit, 311-identification data, 321-voice signal, 331-first voice wave band, 332-second voice wave band.
Embodiment
Fig. 1 is a flow chart of the steps of the method of the invention for improving speech recognition efficiency. The method comprises the following steps:
S11: Provide at least one item of identification data from which non-speech regions (silence or noise) have been deleted by the voice activity detection program. The identification data are voice data prerecorded by the user, or voice data stored in the electronic device in advance by the manufacturer; the identification data may be voice commands. A first characteristic parameter is set in the identification data;
S12: Receive a voice signal, for example a voice command input by the user;
S13: Use the voice activity detection (VAD) program to detect the start and end positions of the voice signal and obtain the first voice wave band;
S14: Distinguish the speech region and the non-speech region (silence or noise) in the first voice wave band, and set a second characteristic parameter in the speech region of the first voice wave band, the second characteristic parameter corresponding to the first characteristic parameter;
S15: Use the voice activity detection (VAD) program to delete the non-speech region from the first voice wave band and produce the second voice wave band;
S16: Compare the identification data with the second voice wave band, that is, compare the first characteristic parameter in the identification data with the second characteristic parameter in the second voice wave band, to judge whether the second voice wave band matches the identification data, thereby completing speech recognition.
If the identification data match the second voice wave band, the command corresponding to that identification data is executed, which achieves the effect of entering commands by voice.
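Steps S11 to S16 can be sketched end to end as follows. This is only a minimal illustration under stated assumptions: the patent does not specify a VAD algorithm or what the characteristic parameters are, so the sketch assumes a frame-energy threshold as the speech/non-speech test and uses per-frame energy as the characteristic parameter. All names (`speech_frame_energies`, `matches`, `recognize`) and all numeric values are hypothetical.

```python
def frames(samples, n):
    """Split samples into full, non-overlapping frames of n samples."""
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


def energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def speech_frame_energies(signal, n=4, threshold=0.01):
    """S13-S15 (assumed VAD): keep only frames whose energy exceeds the
    threshold and return their energies as the second characteristic
    parameters. Dropping every low-energy frame removes leading, trailing,
    and internal non-speech regions in one pass."""
    return [e for e in (energy(f) for f in frames(signal, n)) if e > threshold]


def matches(first_params, second_params, tolerance=0.02):
    """S16 (assumed rule): same length and every pair within tolerance."""
    return len(first_params) == len(second_params) and all(
        abs(a - b) <= tolerance for a, b in zip(first_params, second_params)
    )


def recognize(signal, identification_data):
    """Return the command whose stored parameters match the input, else None."""
    second_params = speech_frame_energies(signal)
    for command, first_params in identification_data.items():
        if matches(first_params, second_params):
            return command
    return None


# S11: identification data, already stripped of non-speech regions
# (made-up values for illustration).
identification_data = {
    "play": [0.25, 0.25, 0.16],
    "stop": [0.09, 0.09],
}

# S12: a received signal with silence before, inside, and after the speech.
signal = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 4 + [0.4, -0.4] * 2 + [0.0] * 8

print(recognize(signal, identification_data))  # prints "play"
```

A real system would replace the exact-length comparison with something tolerant of speaking rate, but the sketch shows where each step of the flow chart sits.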
The purpose of the voice activity detection (VAD) program is to determine where speech begins and ends. It plays an important role in speech processing and recognition, and how effectively VAD is used has a great influence on speech recognition efficiency.
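As a rough illustration of endpoint detection, the sketch below finds the start and end of speech with a short-time energy threshold. The patent does not say which VAD technique is used, so the energy criterion, the function names, and the `frame_len` and `threshold` values are all assumptions made for this example.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_endpoints(samples, frame_len=4, threshold=0.01):
    """Return (start, end) sample indices of the first voice wave band.

    start is the first sample of the first frame whose energy exceeds the
    threshold; end is one past the last sample of the last such frame.
    Returns None when no frame exceeds the threshold.
    """
    energies = frame_energies(samples, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# 8 silent samples, 8 loud, 4 silent, 4 loud, 8 silent (made-up data).
signal = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 4 + [0.4, -0.4] * 2 + [0.0] * 8

print(detect_endpoints(signal))  # prints (8, 24)
```

Note that the band from 8 to 24 still contains the silent gap at samples 16 to 19; endpoint detection alone only trims the leading and trailing silence.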
More specifically, when the user begins to speak, the voice activity detection program starts recording the sound as a voice signal, and it stops recording once it detects that the utterance has ended; this yields the first voice wave band. The speech region of the first voice wave band contains the second characteristic parameter, and when the non-speech region of the first voice wave band is deleted to produce the second voice wave band, the second voice wave band retains that second characteristic parameter. The identification data contain the first characteristic parameter, to which the second characteristic parameter corresponds, so the identification data held in the storage unit for recognition are the data corresponding to the second voice wave band. The identification data are stored in the storage unit.
Deleting the non-speech region of the first voice wave band shortens the characteristic parameters used in the recognition comparison: for example, when the length of the voice wave band to be recorded shrinks, the second characteristic parameters to be recorded shrink with it, which speeds up speech recognition. Accordingly, during speech recognition the speech input by the user is first recorded and converted into the first voice wave band, the non-speech region in the first voice wave band is then deleted to produce the second voice wave band, and the second voice wave band is compared with the identification data in the storage unit. When the first characteristic parameter in the identification data corresponds to the second characteristic parameter in the second voice wave band, the content of the second voice wave band can be determined and recognition is complete. This method improves recognition accuracy and reduces the use of program resources.
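The effect on parameter length can be seen in a small sketch. It assumes, purely for illustration, that one characteristic parameter (frame energy) is computed per frame and that a frame counts as non-speech when its energy is at or below a threshold; neither choice comes from the patent, and the function names are hypothetical.

```python
def split_frames(samples, frame_len):
    """Full, non-overlapping frames; a trailing partial frame is dropped."""
    return [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def remove_non_speech(first_band, frame_len=4, threshold=0.01):
    """Concatenate the frames of first_band judged to be speech,
    giving the second voice wave band."""
    second_band = []
    for frame in split_frames(first_band, frame_len):
        if energy(frame) > threshold:
            second_band.extend(frame)
    return second_band


# A first voice wave band with a silent gap in the middle (made-up data).
first_band = [0.5, -0.5] * 4 + [0.0] * 4 + [0.4, -0.4] * 2
second_band = remove_non_speech(first_band)

params_before = [energy(f) for f in split_frames(first_band, 4)]
params_after = [energy(f) for f in split_frames(second_band, 4)]

print(len(first_band), len(second_band))      # prints 16 12
print(len(params_before), len(params_after))  # prints 4 3
```

Fewer frames mean fewer parameters to store and compare, which is the saving the paragraph above describes.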
The second voice wave band can also be converted into a digital signal corresponding to the identification data stored in the storage unit. Converting sound into a voice signal, or converting the second voice wave band into a digital signal, is a known technique familiar to those skilled in the art and is not elaborated here.
Referring to Fig. 2, the speech recognition apparatus comprises a storage unit 31, a receiving unit 32, a processing unit 33, and a judgement unit 34. The storage unit 31 stores at least one item of identification data 311, which has been processed by deleting non-speech regions. The receiving unit 32 receives sound and converts it into a voice signal 321; it may be a microphone or another sound receiving end. The processing unit 33 detects the start and end positions of the voice signal 321 to obtain the first voice wave band 331, and the judgement unit 34 distinguishes the speech region and the non-speech region of the first voice wave band 331. The processing unit 33 then deletes the non-speech region of the first voice wave band 331 to produce the second voice wave band 332 and compares the identification data 311 with the second voice wave band 332 to judge whether the second voice wave band 332 matches the identification data 311. The speech region of the first voice wave band 331 contains the second characteristic parameter, and when the non-speech region of the first voice wave band 331 is deleted to produce the second voice wave band 332, the second voice wave band 332 retains that second characteristic parameter. The identification data 311 contain the first characteristic parameter, to which the second characteristic parameter corresponds, so the identification data 311 held in the storage unit 31 for recognition are the data corresponding to the second voice wave band 332. Deleting the non-speech region of the first voice wave band 331 shortens the characteristic parameters used in the recognition comparison: for example, when the length of the voice wave band to be recorded shrinks, the second characteristic parameters to be recorded shrink with it, which speeds up speech recognition.
The processing unit 33 uses the voice activity detection program to detect the start and end positions of the voice signal 321 and to delete its non-speech regions. The second voice wave band 332 can also be converted into a digital signal corresponding to the identification data 311 stored in the storage unit 31.
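One way to picture how the four units of Fig. 2 fit together is the sketch below. It is a hypothetical software arrangement, not the patent's hardware: the class names mirror the unit names, the receiving unit is assumed to deliver 16-bit integer samples, and the energy threshold and tolerance comparison are assumptions chosen only to make the example run.

```python
class StorageUnit:
    """Holds identification data: label -> first characteristic parameters."""

    def __init__(self, identification_data):
        self.identification_data = identification_data


class ReceivingUnit:
    """Turns raw 16-bit integer samples into a float voice signal."""

    def receive(self, pcm16):
        return [s / 32768 for s in pcm16]


class JudgementUnit:
    """Decides whether each frame of a voice wave band is speech."""

    def __init__(self, frame_len=4, threshold=0.01):
        self.frame_len = frame_len
        self.threshold = threshold

    def frames(self, band):
        n = self.frame_len
        return [band[i:i + n] for i in range(0, len(band) - n + 1, n)]

    def is_speech(self, frame):
        return sum(s * s for s in frame) / len(frame) > self.threshold


class ProcessingUnit:
    """Builds the second voice wave band and compares it with stored data."""

    def __init__(self, storage, judgement, tolerance=0.02):
        self.storage = storage
        self.judgement = judgement
        self.tolerance = tolerance

    def recognize(self, signal):
        # Keep only the frames the judgement unit labels as speech.
        speech = [f for f in self.judgement.frames(signal)
                  if self.judgement.is_speech(f)]
        # Assumed second characteristic parameters: one energy per frame.
        params = [sum(s * s for s in f) / len(f) for f in speech]
        for label, stored in self.storage.identification_data.items():
            if len(stored) == len(params) and all(
                abs(a - b) <= self.tolerance for a, b in zip(stored, params)
            ):
                return label
        return None


receiver = ReceivingUnit()
processor = ProcessingUnit(StorageUnit({"play": [0.25, 0.25]}), JudgementUnit())

pcm16 = [0] * 8 + [16384, -16384] * 4 + [0] * 8  # made-up input
print(processor.recognize(receiver.receive(pcm16)))  # prints "play"
```

The wiring follows the description: the receiving unit feeds the processing unit, which consults the judgement unit and the storage unit.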
In Fig. 3, the speech recognition apparatus presents a speech recognition interface to the user. When the user presses the start key 21, the sound receiving end receives the sound the user utters and converts it into a voice signal. The voice activity detection (VAD) program then detects the start and end positions of the voice signal to obtain the first voice wave band 20, which contains the second characteristic parameter, and the speech region 201 and non-speech region 202 of the first voice wave band 20 are then distinguished.
In Fig. 4, the non-speech region 202 is deleted using VAD, giving the second voice wave band 26. The second voice wave band 26 is the first voice wave band 20 with its non-speech region 202 deleted and only its speech region 201 retained, so the second voice wave band 26 also retains the second characteristic parameter of the speech region 201 of the first voice wave band 20. This reduces the signal length that the speech recognition apparatus has to process.
The speech recognition apparatus recognizes the second voice wave band 26 according to at least one item of identification data in the storage unit. These identification data contain the first characteristic parameter and have been processed by deleting non-speech regions. In particular, because the second characteristic parameter corresponds to the first characteristic parameter, these identification data can be regarded as the data corresponding to the second voice wave band. Moreover, deleting the non-speech region 202 of the first voice wave band 20 shortens the characteristic parameters used in the recognition comparison: for example, when the length of the voice wave band to be recorded shrinks, the second characteristic parameters to be recorded shrink with it.
During speech recognition, the speech input by the user is first recorded and converted into the first voice wave band 20, the non-speech region in the first voice wave band 20 is then deleted to produce the second voice wave band 26, and the second voice wave band 26 is compared with the identification data in the storage unit. When the second characteristic parameter in the second voice wave band 26 corresponds to the first characteristic parameter, the content of the second voice wave band can be determined and recognition is complete. This method improves recognition accuracy and reduces the waste of program resources.
Note that methods of defining characteristic parameters are well known to those skilled in the art and are not described further here. The second voice wave band 26 can also be converted into a digital signal corresponding to the identification data stored in the storage unit; converting sound into a voice signal, or converting the second voice wave band 26 into a digital signal, is a known technique familiar to those skilled in the art and is not elaborated here.
If the user wishes to carry out another command during speech recognition, the speech recognition interface provides an end key 22 with which the user can terminate recognition. The interface also includes a progress bar 25 that lets the user follow the progress of recognition. In addition, when requesting a song by singer, the user can select the singer recognition key 23 in the speech recognition interface and input a voice signal (such as the singer's name); the speech recognition apparatus then recognizes the matching singers and presents them in a singer menu 24 for the user to choose from.
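The singer menu can be pictured as a shortlist of the closest matches rather than a single result. The sketch below is an assumption about how such a menu might be filled: it ranks stored entries by mean absolute difference between characteristic parameters and keeps those under a cutoff. The function names, the distance measure, the cutoff, and the data are all hypothetical.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two equal-length parameter lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def singer_shortlist(second_params, singers, max_diff=0.1, limit=3):
    """Return up to `limit` singer names whose stored parameters are within
    max_diff of the input, closest first. Entries whose parameter count
    differs from the input are skipped."""
    scored = [
        (mean_abs_diff(params, second_params), name)
        for name, params in singers.items()
        if params and len(params) == len(second_params)
    ]
    return [name for diff, name in sorted(scored) if diff <= max_diff][:limit]


# Made-up identification data for four singers.
singers = {
    "Singer A": [0.25, 0.25, 0.16],
    "Singer B": [0.24, 0.30, 0.10],
    "Singer C": [0.90, 0.80, 0.70],
    "Singer D": [0.25, 0.25],
}

print(singer_shortlist([0.25, 0.26, 0.15], singers))
# prints ['Singer A', 'Singer B']
```

Returning several candidates leaves the final choice to the user, which matches the role of the singer menu 24 in the interface.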

Claims (8)

1. A method for improving speech recognition efficiency, characterized in that the method comprises the following steps:
1) providing at least one item of identification data and setting a first characteristic parameter in it;
2) receiving a voice signal;
3) using a voice activity detection program to detect the start and end positions of the voice signal, thereby obtaining a first voice wave band;
4) distinguishing the speech region and the non-speech region in the first voice wave band, and setting a second characteristic parameter in the speech region of the first voice wave band, the second characteristic parameter corresponding to the first characteristic parameter;
5) using the voice activity detection program to delete the non-speech region from the first voice wave band, thereby producing a second voice wave band;
6) comparing the first characteristic parameter in the identification data with the second characteristic parameter in the second voice wave band to judge whether the second voice wave band matches the identification data, thereby completing speech recognition.
2. The method for improving speech recognition efficiency according to claim 1, characterized in that the identification data are identification data from which non-speech regions have been deleted by the voice activity detection program.
3. The method for improving speech recognition efficiency according to claim 1, characterized in that the non-speech region is silence or noise.
4. The method for improving speech recognition efficiency according to claim 1, characterized in that the identification data and the second voice wave band are both digital signals.
5. The method for improving speech recognition efficiency according to claim 1, characterized in that the identification data are voice data prerecorded by the user, or voice data stored in the electronic device in advance by the manufacturer.
6. The method for improving speech recognition efficiency according to claim 5, characterized in that the voice data is a voice command.
7. The method for improving speech recognition efficiency according to claim 1, characterized in that the voice signal is a voice command.
8. A speech recognition apparatus using the method for improving speech recognition efficiency according to claim 1, characterized in that the apparatus comprises: a storage unit for storing at least one item of identification data that has been processed by deleting non-speech regions; a receiving unit for receiving sound and converting it into a voice signal; a processing unit for detecting the start and end positions of the voice signal to obtain the first voice wave band, deleting the non-speech region from the first voice wave band to produce the second voice wave band, and then comparing the identification data with the second voice wave band to judge whether the second voice wave band matches the identification data; and a judgement unit for distinguishing the speech region and the non-speech region in the first voice wave band; the receiving unit is connected to the processing unit, and the processing unit is connected to the judgement unit and to the storage unit.
CNA2008102326005A 2008-12-05 2008-12-05 Method for enhancing speech identification efficiency and speech identification apparatus Pending CN101419797A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2008102326005A CN101419797A (en) 2008-12-05 2008-12-05 Method for enhancing speech identification efficiency and speech identification apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNA2008102326005A CN101419797A (en) 2008-12-05 2008-12-05 Method for enhancing speech identification efficiency and speech identification apparatus

Publications (1)

Publication Number Publication Date
CN101419797A true CN101419797A (en) 2009-04-29

Family

ID=40630562

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2008102326005A Pending CN101419797A (en) 2008-12-05 2008-12-05 Method for enhancing speech identification efficiency and speech identification apparatus

Country Status (1)

Country Link
CN (1) CN101419797A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104934043A (en) * 2015-06-17 2015-09-23 广东欧珀移动通信有限公司 Audio processing method and device
WO2017045429A1 (en) * 2015-09-18 2017-03-23 广州酷狗计算机科技有限公司 Audio data detection method and system and storage medium
CN108091334A (en) * 2016-11-17 2018-05-29 株式会社东芝 Identification device, recognition methods and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20090429