CN101000767A

CN101000767A - Speech recognition equipment and method

Info

Publication number: CN101000767A
Application number: CNA2006100005316A
Authority: CN
Inventors: 陈刚; 陈骧; 吕凡; 王强
Original assignee: HANGZHOU SHIDAO SCIENCE AND TECHNOLOGY Co Ltd
Current assignee: HANGZHOU SHIDAO SCIENCE AND TECHNOLOGY Co Ltd
Priority date: 2006-01-09
Filing date: 2006-01-09
Publication date: 2007-07-18

Abstract

A speech recognition method includes: extracting speech feature information from a user's speech information with a speech feature extraction device; sending the speech feature information to a speech feature recognition device; and determining, by the speech feature recognition device, whether the user's speech information is valid according to the received speech feature information. A speech recognition apparatus for implementing the method is also disclosed.

Description

Speech recognition apparatus and method thereof

Technical field

The present invention relates to a speech recognition apparatus and method, and more specifically to a speech recognition apparatus and method for use in the communications field.

Background technology

In a large number of existing telecommunication services, the operator side needs a large amount of rich and varied voice information to interact with users, and almost all of this voice information is produced by manual recording. This greatly increases the operator's cost and lengthens the time needed to publish and update information, making it difficult to meet users' changing and varied needs in a timely manner. Most recordings are prerecorded rather than generated by a speech synthesis system because traditional speech synthesis systems tend to have a flat timbre, a strongly mechanical sound, and a lack of character.

Meanwhile, when selecting information by the traditional method, the user must listen to each voice prompt to the end and then repeatedly press keys to move to the next prompt; only after a series of tedious key operations can the user obtain the desired information. Such multi-level, complex operation often annoys users, and it is both time-consuming and inefficient.

Therefore, the traditional method suffers from problems such as high cost, cumbersome technical means, and inconvenient use.

Summary of the invention

The present invention aims to overcome one or more of the above problems of the prior art. To this end, a speech recognition apparatus and method are provided. The method uses TTS (text-to-speech, i.e., speech synthesis) technology to build a standard speech information database within the network, so that the user can select information over the communication network by means of speech recognition.

To achieve the above object, the invention provides a speech recognition apparatus comprising: a standard speech information database for storing standard speech information obtained by converting text information with TTS technology; a speech feature extraction device for extracting speech features from the user's speech; and a speech feature recognition device for recognizing the user's speech information. Optionally, the speech recognition apparatus further comprises a speech feature information memory for storing the speech feature information extracted by the speech feature extraction device.

Preferably, the speech recognition apparatus according to the present invention comprises: a domain-specific language speech synthesis device for performing speech synthesis separately for each specific domain, so that the synthesized speech sounds more professional while remaining natural, fluent, and close to a real human voice; and a background music adding device for adding background music to the standard speech information during speech synthesis, making the communication process richer and more varied.

To achieve the above object, the present invention also provides a speech recognition method. The method performs speech recognition using an established standard speech information database and comprises the following steps. First, the standard speech information database is built using TTS (text-to-speech) technology. When the user inputs information by voice, the speech feature extraction device extracts speech feature information from the user's speech and then sends this speech feature information to the speech feature recognition device; alternatively, the speech feature extraction device stores the extracted speech feature information in the speech feature information memory, and the speech feature recognition device reads the speech feature information from that memory. The speech feature recognition device then searches the standard speech information database for the information corresponding to the received (or read) speech feature information, thereby recognizing the user's speech. Finally, the information is provided to the user over the communication network.

The beneficial effects of the present invention are as follows. The speech recognition apparatus converts text information into speech information, so that the user can find the required information quickly and easily. In addition, the method overcomes the limitations of the telephone numeric keypad and spares the user inconvenient and tedious key presses. The use of speech technology gives the user a convenient mode of interaction and improves the dynamism, timeliness, simplicity, and entertainment value of that interaction, while also paving the way for new telecommunication services.

Description of drawings

Fig. 1 is a block diagram of the speech recognition apparatus of the present invention;

Fig. 2 is a flow chart of the speech recognition process;

Fig. 3 is the main flow chart according to the first embodiment of the present invention;

Fig. 4 is a flow chart of selecting a song by saying the song title according to the first embodiment of the present invention;

Fig. 5 is a flow chart of selecting a song by saying the singer's name according to the first embodiment of the present invention;

Fig. 6 is a flow chart of reporting the result according to the first embodiment of the present invention.

Embodiment

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Fig. 1 is a block diagram of the speech recognition apparatus of the present invention. The speech feature extraction device 102, which extracts speech feature information from the user's speech information, is connected to the speech feature recognition device 104, which determines whether the user's speech information is valid, and the speech feature recognition device 104 is connected to the standard speech information database 106. Optionally, the speech recognition apparatus according to the present invention comprises a speech feature information memory 108 for storing the speech feature information extracted by the speech feature extraction device 102; the speech feature recognition device 104 reads the speech feature information from this memory.

Fig. 2 shows the process of the speech recognition method according to the present invention. When the user inputs information by voice, the speech feature extraction device 102 extracts speech feature information from the user's speech information and sends the extracted speech feature information to the speech feature recognition device 104. Alternatively, the speech feature extraction device 102 stores the extracted speech feature information in the speech feature information memory 108, and the speech feature recognition device 104 reads the speech feature information from that memory. The speech feature recognition device 104 then compares the received (or read) speech feature information with the standard speech information in the standard speech information database 106, using the speech feature information as a keyword to search the standard speech information database 106 for the information corresponding to that keyword, and thereby determines whether the user's speech information is valid. In other words, if the speech feature recognition device 104 finds information corresponding to the speech feature information in the standard speech information database 106, the user's speech information is valid; otherwise, it is invalid.
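The validity check described for Fig. 2 can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `extract_features`, `StandardSpeechDatabase`, and `recognize` are hypothetical, and the "feature" is simply a normalized text string standing in for real acoustic features, which the source does not specify.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def extract_features(utterance: str) -> str:
    """Hypothetical stand-in for the speech feature extraction device (102).

    A real extractor would process audio; here the "feature" is only a
    normalized transcript so that the lookup logic can be shown end to end.
    """
    return " ".join(utterance.lower().split())


@dataclass
class StandardSpeechDatabase:
    """Hypothetical stand-in for the standard speech information database (106)."""
    entries: Dict[str, str] = field(default_factory=dict)

    def add(self, text: str, info: str) -> None:
        # Key each entry by the same feature representation used at query time.
        self.entries[extract_features(text)] = info

    def search(self, features: str) -> Optional[str]:
        # The feature information acts as the search keyword.
        return self.entries.get(features)


def recognize(utterance: str, db: StandardSpeechDatabase) -> Tuple[bool, Optional[str]]:
    """Role of the recognition device (104): valid only if the database has a match."""
    info = db.search(extract_features(utterance))
    return (info is not None, info)


db = StandardSpeechDatabase()
db.add("Moon River", "song #17")
print(recognize("  moon   RIVER ", db))  # (True, 'song #17')
print(recognize("Unknown Title", db))    # (False, None)
```

Keying the database by the same representation that the extractor produces is the design point here: the recognition step reduces to a keyword lookup, and a failed lookup is exactly the "invalid" outcome.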

According to Embodiment 1 of the present invention, the standard speech information database 106 is a song database, preferably with song titles and singer names as keywords. According to Embodiment 2 of the present invention, the standard speech information database 106 is a phone book database.

Fig. 3 shows the main flow chart of voice song selection using the speech recognition apparatus according to the present invention. After the user connects to the operator's terminal, the song selection flow shown in Fig. 3 begins: the system immediately prompts the user to select a song either by song title or by singer name.

When the user selects a song by song title, that is, when the user says a song title, the speech feature extraction device 102 extracts the song title feature information from the user's speech information and sends it to the speech feature recognition device 104. The speech feature recognition device 104 then uses the received song title feature information as a keyword to search the song database 106 for the corresponding song title. If a corresponding song title is found, the user's speech information is recognized as valid and the result is reported; if no corresponding song title is found, the user's speech information is recognized as invalid.

When the user selects a song by singer name, that is, when the user says a singer's name, the flow for selecting a song by singer name shown in Fig. 5 is entered. The speech feature extraction device 102 extracts the singer name feature information from the user's speech information and sends it to the speech feature recognition device 104. The speech feature recognition device 104 then uses the received singer name feature information as a keyword to search the song database 106 for the corresponding singer name. If a corresponding singer name is found, the user's speech information is recognized as valid; if no corresponding singer name is found, the user's speech information is recognized as invalid.

When the result reporting step of the song-title flow shown in Fig. 4 is carried out and the speech feature recognition device 104 finds more than one result in the song database 106, the step preferably proceeds as follows: the system informs the user that there are multiple results and prompts the user to choose among them by pressing different number keys (step 4-1).

When the result reporting step of the singer-name flow shown in Fig. 5 is carried out, the step preferably proceeds as follows: the system determines whether the singer has more than one song; if so, step 4-1 shown in Fig. 4 is entered; if not, the result is reported directly.
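The song selection and result reporting flow described above can be sketched as follows. This is only an illustration under stated assumptions: the sample catalogue `SONGS`, the function names, and the limit of nine menu entries (one per single digit key) are hypothetical and do not come from the source, and spoken input is again represented by text.

```python
# Hypothetical sample catalogue standing in for the song database (106).
SONGS = [
    {"title": "Blue Sky", "singer": "Singer A"},
    {"title": "Blue Sky", "singer": "Singer B"},
    {"title": "Red River", "singer": "Singer A"},
    {"title": "Lone Star", "singer": "Singer C"},
]


def search_songs(catalogue, key_field, spoken):
    """Search by 'title' or 'singer', using the spoken text as the keyword."""
    key = " ".join(spoken.lower().split())
    return [song for song in catalogue if song[key_field].lower() == key]


def report(results):
    """Result reporting: invalid, report directly, or offer a number-key menu."""
    if not results:
        return "invalid", None
    if len(results) == 1:
        return "report", results[0]
    # Step 4-1: more than one result, so map number keys to candidates.
    # Assumption: at most nine candidates are offered, one per digit key 1-9.
    menu = {str(i): song for i, song in enumerate(results[:9], start=1)}
    return "choose", menu


def choose(menu, pressed_key):
    """Return the song bound to the pressed number key, or None if unbound."""
    return menu.get(pressed_key)


action, menu = report(search_songs(SONGS, "title", "blue sky"))
print(action)                       # choose
print(choose(menu, "2")["singer"])  # Singer B
print(report(search_songs(SONGS, "singer", "Singer C"))[0])  # report
print(report(search_songs(SONGS, "title", "No Such Song")))  # ('invalid', None)
```

The same `report` function serves both flows: a title search and a singer search differ only in the field used as the keyword, which mirrors how the singer-name flow re-enters step 4-1 of the song-title flow.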

As this embodiment shows, the speech recognition apparatus and speech recognition method of the present invention allow the user to find the required information quickly and simply, eliminating complicated operating steps and complicated key selection operations.

Embodiment 2: a personal voice phone book built with speech recognition

At present, mobile phones are replaced quite frequently, and the rate at which they are lost or damaged is also fairly high, reaching 10%. When changing phones, the user has to re-enter or import the address book, which is very inconvenient. A personal voice phone book built with speech recognition gives the user an address book that can never be lost. With this personal voice phone book, the user can save all address book contents in the system and, when a call needs to be made, log in to the system via WAP, SMS, voice, or similar means to look up the required number or place the call directly.

Embodiment 2 is similar to Embodiment 1, except that a storage area is allocated to each user in a personal phone book database, with the calling user's mobile phone number as the keyword. Using TTS technology, the speech features associated with the user are stored in that user's personal phone book storage area according to the names and telephone numbers the user has spoken. When the user calls the service number and says a name, the speech feature extraction device 102 extracts the name speech feature from the user's speech information and sends it to the speech feature recognition device 104. The speech feature recognition device 104 then searches the database of that personal phone book storage area for the name corresponding to the received name speech feature information, and the telephone number corresponding to that name is either sent to the user by SMS or dialed directly, according to the user's request.
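The per-user storage scheme of Embodiment 2 can be sketched in Python as a two-level lookup: the caller's mobile number selects the storage area, and the spoken name selects the entry. The class name `PersonalPhoneBooks`, its methods, and the sample numbers are hypothetical, and the name "feature" is again a normalized text string rather than a real acoustic feature.

```python
from collections import defaultdict
from typing import Dict, Optional


class PersonalPhoneBooks:
    """Hypothetical personal phone book database with one storage area per caller."""

    def __init__(self) -> None:
        # caller's mobile number -> {name feature -> stored telephone number}
        self._areas: Dict[str, Dict[str, str]] = defaultdict(dict)

    @staticmethod
    def _name_feature(spoken_name: str) -> str:
        # Stand-in for the name speech feature produced by the extraction device.
        return " ".join(spoken_name.lower().split())

    def store(self, caller: str, spoken_name: str, phone: str) -> None:
        """Save an entry in the storage area keyed by the caller's number."""
        self._areas[caller][self._name_feature(spoken_name)] = phone

    def lookup(self, caller: str, spoken_name: str) -> Optional[str]:
        """Search only the caller's own storage area; None means no match."""
        # .get on the outer mapping avoids creating an empty area on lookup.
        return self._areas.get(caller, {}).get(self._name_feature(spoken_name))


books = PersonalPhoneBooks()
books.store("13800000000", "Zhang Wei", "0571-5550100")
print(books.lookup("13800000000", "zhang  wei"))  # 0571-5550100
print(books.lookup("13900000000", "Zhang Wei"))   # None
```

Because every search is confined to the caller's own storage area, two users can store the same name without conflict, and the number returned by `lookup` is what would then be sent by SMS or dialed.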

The above are merely preferred embodiments of the present invention and are not intended to limit the present invention. The present invention can also be implemented in many different ways; for example, it can be used for voice SMS, voice mail, voice secretary services, and the like. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A speech recognition apparatus for the communications field, characterized by comprising:

a standard speech information database for storing standard speech information converted from text information;

a speech feature extraction device for extracting speech feature information from a user's speech information; and

a speech feature recognition device, connected to the standard speech information database, for receiving the speech feature information from the speech feature extraction device and comparing the received speech feature information with the standard speech information stored in the standard speech information database, thereby determining whether the user's speech information is valid.

2. The speech recognition apparatus according to claim 1, characterized in that the speech feature recognition device uses the speech feature information received from the speech feature extraction device as a keyword and searches the standard speech information database for the standard speech information related to the keyword.

3. The speech recognition apparatus according to claim 1, characterized by further comprising a speech feature information memory for storing the speech feature information extracted by the speech feature extraction device, wherein the speech feature recognition device can read the speech feature information from the speech feature information memory.

4. The speech recognition apparatus according to claim 1, characterized by further comprising:

a domain-specific language speech synthesis device for performing speech synthesis separately for each specific domain, so that the synthesized speech sounds more professional while remaining natural and fluent.

5. The speech recognition apparatus according to any one of claims 1 to 4, characterized by further comprising:

a background music adding device for adding background music to the standard speech information in the standard speech information database during speech synthesis.

6. A speech recognition method for the communications field, characterized by comprising the following steps:

an extraction step of using a speech feature extraction device to extract speech feature information from a user's speech information and sending the extracted speech feature information to a speech feature recognition device; and

a recognition step of using the speech feature recognition device to determine, according to the received speech feature information, whether the user's speech information is valid.

7. The speech recognition method according to claim 6, characterized by further comprising, before the extraction step and the recognition step, the following step:

a storing step of using a standard speech information database to store standard speech information converted from text information.

8. The speech recognition method according to claim 7, characterized by further comprising the following step:

a comparison step in which the speech feature recognition device compares the received speech feature information with the standard speech information stored in the standard speech information database, thereby determining whether the user's speech information is valid.

9. The speech recognition method according to claim 6, characterized in that the storing step further comprises the following step:

a domain-specific language synthesis step of using a domain-specific language speech synthesis device to perform speech synthesis separately for each specific domain.

10. The speech recognition method according to any one of claims 6 to 9, characterized by further comprising the following step:

a background music adding step of using a background music adding device to add background music to the standard speech information in the standard speech information database during speech synthesis.

11. The speech recognition method according to claim 6, characterized by further comprising the following step:

a speech feature information storing step of using a speech feature information memory to store the speech feature information extracted in the extraction step, wherein the speech feature recognition device reads the stored speech feature information from the speech feature information memory.

12. The speech recognition method according to claim 6, characterized in that the recognition step further comprises the following step: the speech feature recognition device uses the received speech feature information as a keyword and searches the standard speech information database for the standard speech information related to the keyword.

13. The speech recognition method according to claim 6, characterized in that: when the speech feature recognition device finds, in the standard speech information database, standard speech information corresponding to the received speech feature information, the user's speech information is recognized as valid; conversely, when the speech feature recognition device does not find, in the standard speech information database, standard speech information corresponding to the received speech feature information, the user's speech information is recognized as invalid.