CN101419797A

CN101419797A - Method for enhancing speech identification efficiency and speech identification apparatus

Info

Publication number: CN101419797A
Application number: CNA2008102326005A
Authority: CN
Inventors: 赵仁宏
Original assignee: Inventec Besta Xian Co Ltd
Current assignee: Inventec Besta Xian Co Ltd
Priority date: 2008-12-05
Filing date: 2008-12-05
Publication date: 2009-04-29

Abstract

The invention relates to a method for improving speech recognition efficiency and a speech recognition device thereof. The method comprises the following steps: 1) at least one identification data is provided, and a first characteristic parameter is set in the identification data; 2) a voice signal is received; 3) the beginning position and the ending position of the voice signal are detected by a voice activity detection program, thereby obtaining a first voice wave band; 4) the speech region and the non-speech region in the first voice wave band are distinguished, and a second characteristic parameter, corresponding to the first characteristic parameter, is set in the speech region of the first voice wave band; 5) the non-speech region in the first voice wave band is deleted by the voice activity detection program, thus generating a second voice wave band; 6) the first characteristic parameter in the identification data is compared with the second characteristic parameter in the second voice wave band to judge whether the second voice wave band matches the identification data, thus completing speech recognition. The invention requires little computation and achieves high recognition accuracy.

Description

A method for improving speech recognition efficiency and a speech recognition apparatus thereof

Technical field

The present invention relates to a method for improving speech recognition efficiency and a speech recognition apparatus thereof, and in particular to a method and apparatus that raise speech recognition accuracy by deleting non-speech regions.

Background technology

The most basic definition of speech recognition is that "a computer can understand the statements or commands spoken by a human and perform the corresponding work." That is, if a computer is equipped with a speech recognition function, sound is converted into a voice signal by a conversion device and input into the electronic device. After the signal is stored, the speech recognition program compares the input sound sample with the sound samples stored in advance. When the comparison is finished, the electronic device outputs the sequence number of the sound sample it considers most similar, so that it can identify what the sound just uttered means and then execute the command.

However, building a speech recognition system with truly high recognition ability is very difficult. For example, to recognize ten words, the sounds of these ten words are first read into the computer and saved as ten reference samples. During recognition, the received voice signal (the test sample) only needs to be compared one by one with the ten prerecorded reference samples; finding the reference sample most similar to the test sample identifies the test sample. But the length, pitch, and frequency of the speech that each user reads into the computer all differ. Even when the same user reads the same sound each time, the waveform is not quite the same, and in a noisy environment the situation is even worse. Many people are therefore researching how to solve this problem.
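The one-by-one comparison against reference samples described above can be sketched as follows. This is only an illustration: the fixed-length feature vectors, the Euclidean distance, and the names `distance`, `recognize`, and `references` are assumptions, not something the patent specifies.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize(test_sample, references):
    """Return the label of the reference sample closest to test_sample."""
    return min(references, key=lambda label: distance(test_sample, references[label]))

# Hypothetical reference samples for three vocabulary items.
references = {
    "play": [0.9, 0.1, 0.3],
    "stop": [0.1, 0.8, 0.5],
    "next": [0.4, 0.4, 0.9],
}

best = recognize([0.2, 0.7, 0.5], references)
print(best)
```

A fixed-length comparison like this breaks down exactly where the text says it does: utterances of different length, pitch, or noise level no longer line up with the stored samples.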

To address this problem, some have tried using the Fourier transform, cepstral parameters, and the like, but the results are still unsatisfactory.

In addition, the more accurate a speech recognition system is, the more computation it requires, so in the past high-accuracy speech recognition could not be realized on portable devices with low-powered processors. Yet portable devices are now very common, and almost everyone uses one every day. In view of the above problems, the present invention proposes a method and a speech recognition apparatus with low computational load and high recognition accuracy.

Summary of the invention

To overcome the shortcomings of the prior art, namely low recognition accuracy and the need for high processor computing power at a time when most devices in use are portable, the invention provides a speech recognition method and a speech recognition apparatus thereof with low computational load and high recognition accuracy.

Technical solution of the present invention: the present invention is a method for improving speech recognition efficiency, characterized in that the method comprises the following steps:

1) provide at least one identification data and set a first characteristic parameter therein;

2) receive a voice signal;

3) use the voice activity detection program to detect the positions where the voice signal begins and ends, thereby obtaining the first voice wave band;

4) distinguish the speech region and the non-speech region in the first voice wave band, and set a second characteristic parameter in the speech region of the first voice wave band, the second characteristic parameter corresponding to the first characteristic parameter;

5) use the voice activity detection program to delete the non-speech region in the first voice wave band, thereby producing the second voice wave band;

6) compare the first characteristic parameter in the identification data with the second characteristic parameter in the second voice wave band to judge whether the second voice wave band matches the identification data, completing speech recognition.

The above identification data is identification data that has been processed by the voice activity detection program to delete its non-speech regions.

The above non-speech region refers to silence or noise.

The above identification data and second voice wave band are both digital signals.

The above identification data is speech data prerecorded by the user, or speech data stored in the electronic device in advance by the manufacturer.

The above speech data is a voice command.

The above voice signal is a voice command.

A speech recognition apparatus using the above method for improving speech recognition efficiency is characterized in that the apparatus comprises: a storage unit for storing at least one identification data that has undergone non-speech region deletion; a receiving unit for receiving sound and converting it into a voice signal; a processing unit for detecting the positions where the voice signal begins and ends to obtain the first voice wave band, deleting the non-speech region in the first voice wave band to produce the second voice wave band, and then comparing the identification data with the second voice wave band to judge whether the second voice wave band matches the identification data; and a judgement unit for distinguishing the speech region and the non-speech region in the first voice wave band. The receiving unit is connected to the processing unit, and the processing unit is connected to the judgement unit and to the storage unit respectively.

The present invention has the following advantages:

(1) The present invention uses the voice activity detection program to determine the positions where the voice signal begins and ends during speech recognition. After the first voice wave band to be recognized (for example, a speech passage) is obtained, secondary processing is performed: the non-speech regions (silence or noise) in the first voice wave band are removed to produce a second voice wave band without non-speech regions, and this second voice wave band is recognized against a plurality of identification data, thereby improving recognition efficiency.

(2) The method of the present invention only needs to process the speech portion of the signal, which reduces the load on the system and avoids the need for a higher-performance microprocessor (CPU).

Description of drawings

Fig. 1 is a flow chart of the steps of the method for improving speech recognition efficiency according to the present invention;

Fig. 2 is a block diagram of an embodiment of the speech recognition apparatus of the present invention;

Fig. 3 is the first schematic diagram of an embodiment of the present invention;

Fig. 4 is the second schematic diagram of an embodiment of the present invention.

Description of reference numerals: 20 - first voice wave band, 201 - speech region, 202 - non-speech region, 21 - start key, 22 - end key, 23 - singer identification key, 24 - singer menu, 25 - progress bar, 26 - second voice wave band, 31 - storage unit, 32 - receiving unit, 33 - processing unit, 34 - judgement unit, 311 - identification data, 321 - voice signal, 331 - first voice wave band, 332 - second voice wave band.

Embodiment

Fig. 1 shows the flow chart of the steps of the method for improving speech recognition efficiency according to the present invention. The method comprises the following steps:

S11: Provide at least one identification data that has been processed by the voice activity detection program to delete its non-speech regions (silence or noise). The identification data is speech data prerecorded by the user, or speech data stored in the electronic device in advance by the manufacturer, and may be a voice command. A first characteristic parameter is set in the identification data;

S12: Receive a voice signal, for example a voice command input by the user;

S13: Use the voice activity detection (VAD) program to detect the positions where the voice signal begins and ends, to obtain the first voice wave band;

S14: Distinguish the speech region and the non-speech region (silence or noise) in the first voice wave band, and set a second characteristic parameter in the speech region of the first voice wave band, the second characteristic parameter corresponding to the first characteristic parameter;

S15: Use the VAD program to delete the non-speech regions in the first voice wave band to produce the second voice wave band;

S16: Compare the identification data with the second voice wave band; that is, compare the first characteristic parameter in the identification data with the second characteristic parameter in the second voice wave band to judge whether the second voice wave band matches the identification data, completing speech recognition.

If the identification data matches the second voice wave band, the command corresponding to that identification data is executed, achieving the effect of command input by voice.
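Steps S11 to S16 can be sketched end to end as below. The patent does not define the characteristic parameters or the comparison rule, so the per-frame energy feature, the energy threshold, the tolerance, and all function names here are hypothetical choices made only to keep the example runnable.

```python
def speech_frames(samples, frame_len=4, threshold=0.01):
    """S13-S15: keep only frames whose energy exceeds the threshold (a crude VAD)."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    return [f for f in frames if sum(s * s for s in f) / len(f) > threshold]

def features(frames):
    """Hypothetical characteristic parameter: the energy of each frame."""
    return [sum(s * s for s in f) / len(f) for f in frames]

def matches(first_param, second_param, tolerance=0.05):
    """S16: compare the two characteristic parameters frame by frame."""
    if len(first_param) != len(second_param):
        return False
    return all(abs(a - b) <= tolerance for a, b in zip(first_param, second_param))

def recognize(signal, identification_data):
    """Return the command whose stored parameter matches the input, else None."""
    second_param = features(speech_frames(signal))
    for command, first_param in identification_data.items():
        if matches(first_param, second_param):
            return command
    return None

# S11: identification data, stored with non-speech regions already removed.
identification_data = {
    "play": features(speech_frames([0.5, -0.5, 0.5, -0.5, 0.4, -0.4, 0.4, -0.4])),
    "stop": features(speech_frames([0.9, -0.9, 0.9, -0.9])),
}

# S12: an input with leading, internal, and trailing silence.
signal = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4 + [0.4, -0.4, 0.4, -0.4] + [0.0] * 4
command = recognize(signal, identification_data)
print(command)
```

Because both the stored data and the input pass through the same non-speech deletion, pauses in the user's utterance do not change the parameter sequence that is compared.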

The purpose of the voice activity detection (VAD) program is to determine the positions where speech begins and ends. It plays an important role in speech processing and recognition, and how effectively VAD is used greatly affects speech recognition efficiency.

More specifically, when the user begins to speak, the voice activity detection program begins recording the sound as a voice signal, and stops recording when it detects that the utterance has ended, thus obtaining the first voice wave band. The speech region of the first voice wave band contains the second characteristic parameter; when the non-speech regions of the first voice wave band are deleted to produce the second voice wave band, the second voice wave band retains the second characteristic parameter of the first voice wave band. The identification data contains the first characteristic parameter, and the second characteristic parameter corresponds to the first characteristic parameter, so the identification data held in the storage unit used for recognition is the data corresponding to the second voice wave band. This identification data is stored in the storage unit.
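As an illustration of locating where the utterance begins and ends, the sketch below marks frames by short-time energy and returns the span from the first to the last active frame. The patent does not say how its VAD program decides; the energy rule, `frame_len`, and `threshold` are assumptions.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def detect_endpoints(samples, frame_len=4, threshold=0.01):
    """Return (start, end) sample indices of the first voice wave band, or None."""
    energies = frame_energies(samples, frame_len)
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0] * frame_len, (active[-1] + 1) * frame_len

# Silence, speech, a short pause, more speech, silence.
signal = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 4 + [0.4, -0.4] * 2 + [0.0] * 8
endpoints = detect_endpoints(signal)
print(endpoints)
```

The returned span still contains the internal pause, which is why the method goes on to delete non-speech regions inside the first voice wave band.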

Deleting the non-speech regions of the first voice wave band shortens the characteristic parameters used in the speech recognition comparison. For example, shortening the voice wave band that must be recorded reduces the second characteristic parameters that must be recorded accordingly, which speeds up speech recognition. Accordingly, during speech recognition, the normal speech input by the user is first recorded and converted into the first voice wave band; the non-speech regions in the first voice wave band are then deleted to produce the second voice wave band; and the second voice wave band is then compared with the identification data in the storage unit. When the first characteristic parameter in the identification data corresponds to the second characteristic parameter in the second voice wave band, what the second voice wave band represents can be determined, which means recognition is complete. This method improves recognition accuracy and reduces the use of program resources.
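The shortening effect can be seen in a small sketch that drops low-energy frames from a first voice wave band. The frame length, the energy threshold, and the name `remove_non_speech` are assumptions for illustration only.

```python
def remove_non_speech(first_band, frame_len=4, threshold=0.01):
    """Concatenate only the frames whose energy exceeds the threshold."""
    second_band = []
    for i in range(0, len(first_band), frame_len):
        frame = first_band[i:i + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy > threshold:
            second_band.extend(frame)
    return second_band

# First voice wave band: two speech frames, one silent frame, one speech frame.
first_band = [0.5, -0.5] * 4 + [0.0] * 4 + [0.4, -0.4] * 2
second_band = remove_non_speech(first_band)
print(len(first_band), len(second_band))
```

Here the band shrinks from four frames to three, so any per-frame characteristic parameter, and any comparison against each identification data, shrinks in the same proportion.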

The second voice wave band can also be converted into a digital signal and matched against the identification data stored in the storage unit. The techniques for converting sound into a voice signal and for converting the second voice wave band into a digital signal are known to those skilled in the art and are not elaborated here.

Referring to Fig. 2, the speech recognition apparatus comprises a storage unit 31, a receiving unit 32, a processing unit 33, and a judgement unit 34. The storage unit 31 stores at least one identification data 311, which has undergone non-speech region deletion. The receiving unit 32 receives sound and converts it into a voice signal 321; the receiving unit 32 may be a microphone or another sound receiving end. The processing unit 33 detects the positions where the voice signal 321 begins and ends to obtain the first voice wave band 331, and the judgement unit 34 distinguishes the speech region and the non-speech region of the first voice wave band 331. The processing unit 33 then deletes the non-speech region of the first voice wave band 331 to produce the second voice wave band 332, and compares the identification data 311 with the second voice wave band 332 to judge whether the second voice wave band 332 matches the identification data 311. The speech region of the first voice wave band 331 contains the second characteristic parameter; when the non-speech region of the first voice wave band 331 is deleted to produce the second voice wave band 332, the second voice wave band 332 retains the second characteristic parameter of the first voice wave band 331. The identification data 311 contains the first characteristic parameter, and the second characteristic parameter corresponds to the first characteristic parameter, so the identification data 311 held in the storage unit 31 used for recognition is the data corresponding to the second voice wave band 332. Deleting the non-speech region of the first voice wave band 331 shortens the characteristic parameters used in the speech recognition comparison. For example, shortening the voice wave band that must be recorded reduces the second characteristic parameters that must be recorded accordingly, which speeds up speech recognition.
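One way to picture how the four units of Fig. 2 cooperate is the sketch below. The class names follow the reference numerals, but everything inside them is an assumption: the energy-based speech test, the frame length, and especially the exact-match comparison, which stands in for the unspecified characteristic-parameter comparison.

```python
from dataclasses import dataclass, field

@dataclass
class StorageUnit:
    """Storage unit 31: holds identification data 311, already stripped of non-speech."""
    identification_data: dict = field(default_factory=dict)

class ReceivingUnit:
    """Receiving unit 32: stands in for a microphone and returns voice signal 321."""
    def receive(self, sound):
        return list(sound)

class JudgementUnit:
    """Judgement unit 34: labels a frame as speech or non-speech (energy rule assumed)."""
    def __init__(self, threshold=0.01):
        self.threshold = threshold

    def is_speech(self, frame):
        return sum(s * s for s in frame) / len(frame) > self.threshold

class ProcessingUnit:
    """Processing unit 33: endpoint detection, non-speech deletion, comparison."""
    def __init__(self, storage, judgement, frame_len=4):
        self.storage = storage
        self.judgement = judgement
        self.frame_len = frame_len

    def second_band(self, signal):
        n = self.frame_len
        frames = [signal[i:i + n] for i in range(0, len(signal), n)]
        flags = [self.judgement.is_speech(f) for f in frames]
        if True not in flags:
            return []
        start = flags.index(True)
        end = len(flags) - flags[::-1].index(True)
        first_band = list(zip(frames[start:end], flags[start:end]))  # band 331
        return [s for frame, keep in first_band if keep for s in frame]  # band 332

    def recognize(self, signal):
        band = self.second_band(signal)
        for name, stored in self.storage.identification_data.items():
            if stored == band:  # exact match, used only to keep the sketch short
                return name
        return None

storage = StorageUnit({"play": [0.5, -0.5, 0.5, -0.5, 0.4, -0.4, 0.4, -0.4]})
processing = ProcessingUnit(storage, JudgementUnit())
signal = ReceivingUnit().receive(
    [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4 + [0.4, -0.4, 0.4, -0.4] + [0.0] * 4
)
result = processing.recognize(signal)
print(result)
```

The wiring mirrors the description: the receiving unit feeds the processing unit, which consults the judgement unit for the speech/non-speech decision and the storage unit for the identification data.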

The processing unit 33 uses the voice activity detection program to detect the positions where the voice signal 321 begins and ends and to delete its non-speech regions. The second voice wave band 332 can also be converted into a digital signal and matched against the identification data 311 stored in the storage unit 31.

In Fig. 3, the speech recognition apparatus provides a speech recognition interface to the user. When the user presses the start key 21, the sound receiving end receives the sound uttered by the user and converts it into a voice signal. The voice activity detection (VAD) program then detects the positions where the voice signal begins and ends to obtain the first voice wave band 20, which contains the second characteristic parameter, and the speech region 201 and the non-speech region 202 of the first voice wave band 20 are then distinguished.

In Fig. 4, VAD is used to delete the non-speech region 202, yielding the second voice wave band 26. The second voice wave band 26 is the first voice wave band 20 with its non-speech region 202 deleted and only its speech region 201 retained, so the second voice wave band 26 also retains the second characteristic parameter of the speech region 201 of the first voice wave band 20. This reduces the signal length that the speech recognition apparatus needs to process.

The speech recognition apparatus recognizes the second voice wave band 26 according to at least one identification data in the storage unit. The identification data contains the first characteristic parameter and has undergone non-speech region deletion. In particular, because the second characteristic parameter corresponds to the first characteristic parameter, the identification data can be regarded as the data corresponding to the second voice wave band. In addition, deleting the non-speech region 202 of the first voice wave band 20 shortens the characteristic parameters used in the speech recognition comparison: for example, shortening the voice wave band that must be recorded reduces the second characteristic parameters that must be recorded accordingly.

During speech recognition, the normal speech input by the user is first recorded and converted into the first voice wave band 20; the non-speech regions in the first voice wave band 20 are then deleted to produce the second voice wave band 26; and the second voice wave band 26 is then compared with the identification data in the storage unit. When the second characteristic parameter in the second voice wave band 26 corresponds to the first characteristic parameter, what the second voice wave band represents can be determined, and recognition is complete. This method improves recognition accuracy and reduces the waste of program resources.

Note that methods of defining characteristic parameters are known to those skilled in the art and are not described here. The second voice wave band 26 can also be converted into a digital signal and matched against the identification data stored in the database. The techniques for converting sound into a voice signal and for converting the second voice wave band 26 into a digital signal are well known to those skilled in the art and are not elaborated here.

If the user wishes to execute another command during speech recognition, the speech recognition interface also provides the end key 22 for the user to terminate speech recognition. Furthermore, the speech recognition interface includes a progress bar 25 that lets the user follow the progress of speech recognition. In addition, when the user wants to request a song by singer, the user can select the singer identification key 23 in the speech recognition interface and input a voice signal (such as the singer's name); the speech recognition apparatus recognizes it and presents the matching singer menu 24 for the user to choose from.

Claims

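1. A method for improving speech recognition efficiency, characterized in that the method comprises the following steps:

1) provide at least one identification data and set a first characteristic parameter therein;

2) receive a voice signal;

3) use the voice activity detection program to detect the positions where the voice signal begins and ends, thereby obtaining the first voice wave band;

4) distinguish the speech region and the non-speech region in the first voice wave band, and set a second characteristic parameter in the speech region of the first voice wave band, the second characteristic parameter corresponding to the first characteristic parameter;

5) use the voice activity detection program to delete the non-speech region in the first voice wave band, thereby producing the second voice wave band;

6) compare the first characteristic parameter in the identification data with the second characteristic parameter in the second voice wave band to judge whether the second voice wave band matches the identification data, completing speech recognition.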

2. The method for improving speech recognition efficiency according to claim 1, characterized in that the identification data is identification data that has been processed by the voice activity detection program to delete its non-speech regions.

3. The method for improving speech recognition efficiency according to claim 1, characterized in that the non-speech region refers to silence or noise.

4. The method for improving speech recognition efficiency according to claim 1, characterized in that the identification data and the second voice wave band are both digital signals.

5. The method for improving speech recognition efficiency according to claim 1, characterized in that the identification data is speech data prerecorded by the user, or speech data stored in the electronic device in advance by the manufacturer.

6. The method for improving speech recognition efficiency according to claim 5, characterized in that the speech data is a voice command.

7. The method for improving speech recognition efficiency according to claim 1, characterized in that the voice signal is a voice command.

8. A speech recognition apparatus using the method for improving speech recognition efficiency according to claim 1, characterized in that the apparatus comprises: a storage unit for storing at least one identification data that has undergone non-speech region deletion; a receiving unit for receiving sound and converting it into a voice signal; a processing unit for detecting the positions where the voice signal begins and ends to obtain the first voice wave band, deleting the non-speech region in the first voice wave band to produce the second voice wave band, and then comparing the identification data with the second voice wave band to judge whether the second voice wave band matches the identification data; and a judgement unit for distinguishing the speech region and the non-speech region in the first voice wave band. The receiving unit is connected to the processing unit, and the processing unit is connected to the judgement unit and to the storage unit respectively.