CN1217313C

CN1217313C - Method for recognizing tone changes in a Chinese speech recognition system

Info

Publication number: CN1217313C
Application number: CN021447640A
Authority: CN
Inventors: 林宽农; 陈秋涌
Original assignee: Individual
Current assignee: Individual
Priority date: 2002-12-10
Filing date: 2002-12-10
Publication date: 2005-08-31
Anticipated expiration: 2022-12-10
Also published as: CN1416111A

Abstract

The present invention discloses a method for recognizing tone changes in a Chinese speech recognition system. When consecutive third-tone characters or identical reduplicated characters occur together in spoken Chinese, the words and phrases whose tones change can be recognized automatically from the relationship between character strings and word boundaries, which improves the recognition accuracy of the system.

Description

Method for recognizing tone changes in a Chinese speech recognition system

Technical field

The present invention relates to a method for recognizing tone changes in a Chinese speech recognition system, and in particular to a method that automatically uses character-string relationships and word boundaries to recognize the tone-changed words and phrases produced in spoken Chinese when consecutive third-tone characters or identical reduplicated characters are joined, thereby improving the recognition accuracy of the Chinese speech recognition system.

Background art

Speech recognition systems at home and abroad have developed in many directions. Most of them address the differences between speech sounds and word meanings in each language and provide further recognition methods to improve system accuracy. In a Chinese speech recognition system, however, besides the ordinary difficulty caused by polyphonic characters, whose spoken sound and dictionary pronunciation (phonetic notation) may differ, there are also special cases of tone change, and these cases are often a significant obstacle to recognition. Every Chinese character has an assigned pronunciation in one of four tones (the first, second, third, and fourth tones) or in the neutral tone. In most Chinese sentences, reading each character with the tone given by its phonetic notation causes no problem, and the spoken sound matches the dictionary pronunciation. When a sentence contains a word with consecutive third-tone characters, or an identical reduplicated word used as a form of address, however, the tone changes automatically in speech, so that the spoken sound differs from the dictionary pronunciation. The tone change also depends on the number of characters in the word. Several common cases are described below; tones are written as digits (for example, "tones 3-3" means two third-tone characters):

(1) In a two-character string, if both characters are third tone, the first character is usually read in the second tone. Examples include two-character strings such as "hello" and "thinking of you": the phonetic notation of "thinking of you" is (ㄒㄧㄤ ㄋㄧ) with tones 3-3, but in speech it is read as (ㄒㄧㄤ ㄋㄧ) with tones 2-3.

(2) In a three-character string, if two consecutive characters are third tone, the first of those third-tone characters must be read in the second tone; if all three characters are third tone, the first and second characters must be read in the second tone. For example, the phonetic notation of the three-character string "submarine" is (ㄑㄧㄢ ㄕㄨㄟ ㄊㄧㄥ) with tones 2-3-3, but it is read with tones 2-2-3. Likewise, the phonetic notation of the three-character string "president's prize" is (ㄗㄨㄥ ㄊㄨㄥ ㄐㄧㄤ) with tones 3-3-3, but it is read with tones 2-2-3.

(3) In a four-character string, if all four characters are third tone, the first and third characters must be read in the second tone. For example, the phonetic notation of the four-character string "very few" is (ㄌㄧㄠ ㄌㄧㄠ ㄎㄜ ㄕㄨ) with tones 3-3-3-3, but it is read with tones 2-3-2-3.

(4) In a five-character string, if all five characters are third tone, the first, third, and fourth characters must be read in the second tone. For example, the phonetic notation of the five-character string "99999" (this case also covers strings of digits) is (ㄐㄧㄡ ㄐㄧㄡ ㄐㄧㄡ ㄐㄧㄡ ㄐㄧㄡ) with tones 3-3-3-3-3, but it is read with tones 2-3-2-2-3.

(5) If the number of consecutive third-tone characters is even and is six or more, the characters are grouped in pairs from the front, and each pair is read according to the rule for two consecutive third tones. For example, the phonetic notation of the six-character string "999999" is (ㄐㄧㄡ ㄐㄧㄡ ㄐㄧㄡ ㄐㄧㄡ ㄐㄧㄡ ㄐㄧㄡ) with tones 3-3-3-3-3-3, but it is read with tones 2-3-2-3-2-3.

(6) If the number of consecutive third-tone characters is odd and is seven or more, the characters are grouped in pairs from the front and each pair is read according to the rule for two consecutive third tones, except that the last group has three characters and is read according to the rule for three consecutive third tones. For example, the phonetic notation of "5555555" is (ㄨ ㄨ ㄨ ㄨ ㄨ ㄨ ㄨ) with tones 3-3-3-3-3-3-3, but it is read with tones 2-3-2-3-2-2-3.
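As an illustration only, rules (1) through (6) can be sketched in Python. The function names `sandhi_positions` and `apply_sandhi`, and the encoding of tones as the integers 1 to 4, are assumptions made for this sketch and do not come from the patent; word boundaries are ignored here.

```python
from typing import List


def sandhi_positions(n: int) -> List[int]:
    """Return the 1-based positions read in the second tone within a run
    of n consecutive third-tone characters."""
    if n < 2:
        return []
    positions = []
    start = 1
    # Pair characters off from the front while more than three remain.
    while n - start + 1 > 3:
        positions.append(start)
        start += 2
    if n - start + 1 == 2:
        # Final pair: only its first character changes.
        positions.append(start)
    else:
        # Final group of three: its first two characters change.
        positions.extend([start, start + 1])
    return positions


def apply_sandhi(tones: List[int]) -> List[int]:
    """Apply the rule to every maximal run of third tones in a tone list."""
    result = list(tones)
    i = 0
    while i < len(tones):
        if tones[i] != 3:
            i += 1
            continue
        j = i
        while j < len(tones) and tones[j] == 3:
            j += 1
        for p in sandhi_positions(j - i):
            result[i + p - 1] = 2
        i = j
    return result
```

For instance, `apply_sandhi([2, 3, 3])` models the "submarine" example, where only the first of the two trailing third tones changes.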

As shown above, these tone-change rules apply to ordinary words and phrases and to proper nouns, and also to combinations of digits and ordinary words. When a three-character string contains a word boundary, the string must first be divided at the word boundary before the rules above are applied. The rules also apply to personal names, subject to the following additional rules:

(1) When recognizing a personal name, the surname and the given name must be separated. Take "Yo-yo Ma" or "Liu Shuibian": all three characters of each name are third tone, with phonetic notations (ㄇㄚ ㄧㄡ ㄧㄡ) and (ㄌㄧㄡ ㄕㄨㄟ ㄅㄧㄢ), tones 3-3-3. Because "Ma" and "Liu" are surnames, they do not change tone and are still read in the third tone. The two-character given names "Yo-yo" and "Shuibian" follow the rule for two consecutive third tones, so the first of the two characters is read in the second tone. The names are therefore read as (ㄇㄚ ㄧㄡ ㄧㄡ) and (ㄌㄧㄡ ㄕㄨㄟ ㄅㄧㄢ) with tones 3-2-3.

(2) Other three-character strings may also contain a word boundary, as in "President Jiang" or "president's prize". All three characters of "president's prize" are third tone, with phonetic notation (ㄗㄨㄥ ㄊㄨㄥ ㄐㄧㄤ), tones 3-3-3. Because the string contains no surname, the rule above still applies and the first two characters are read in the second tone, so it is read as (ㄗㄨㄥ ㄊㄨㄥ ㄐㄧㄤ) with tones 2-2-3. All three characters of "President Jiang" are also third tone, with phonetic notation (ㄐㄧㄤ ㄗㄨㄥ ㄊㄨㄥ), tones 3-3-3, but "Jiang" is a surname and not part of the word, so it is still read in the third tone. The following two characters, "president", follow the rule for two consecutive third tones, so the first of them is read in the second tone, and the string is read as (ㄐㄧㄤ ㄗㄨㄥ ㄊㄨㄥ) with tones 3-2-3.
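The effect of a word boundary can be sketched as follows. This is a hedged illustration, not the patent's implementation: the helper names `second_tone_offsets`, `read_word`, and `read_phrase` are hypothetical, tones are encoded as integers, and a phrase is assumed to arrive already split into words (for example, surname and given name).

```python
from typing import List


def second_tone_offsets(n: int) -> List[int]:
    """0-based offsets read in the second tone in a run of n third tones:
    pairs from the front, with a final group of three when n is odd."""
    offsets = list(range(0, n - 1, 2))
    if n >= 3 and n % 2 == 1:
        offsets.append(n - 2)
    return offsets


def read_word(tones: List[int]) -> List[int]:
    """Apply the consecutive-third-tone rule inside a single word."""
    out = list(tones)
    i = 0
    while i < len(tones):
        j = i
        while j < len(tones) and tones[j] == 3:
            j += 1
        for k in second_tone_offsets(j - i):
            out[i + k] = 2
        i = max(j, i + 1)
    return out


def read_phrase(words: List[List[int]]) -> List[int]:
    """Apply the rule word by word, so a run never crosses a word boundary."""
    spoken: List[int] = []
    for word in words:
        spoken.extend(read_word(word))
    return spoken
```

With the surname split off, `read_phrase([[3], [3, 3]])` gives the 3-2-3 reading described for "Yo-yo Ma" and "President Jiang", while the unsplit `read_phrase([[3, 3, 3]])` gives the 2-2-3 reading of "president's prize".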

Another case is a form of address (appellation) that is a reduplicated word, that is, a Chinese appellation made up of two identical characters. Even if the two characters are third tone, the spoken sound differs from the dictionary pronunciation: the first character of the reduplicated word keeps its original tone, and the second character is read in the neutral tone. Examples: "grandfather", "grandmother", "father", "mother", "elder brother", "elder sister", "younger brother", "younger sister", "orangutan", and so on.
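The reduplication rule can be sketched in a few lines. Everything named here is an assumption for illustration: the constant `NEUTRAL` (5) as the code for the neutral tone, the function name `reduplication_reading`, and the character set `APPELLATION_CHARS`, which is back-translated from the English examples and is not taken from the patent text.

```python
from typing import List

NEUTRAL = 5  # assumed code for the neutral tone

# Hypothetical set of characters that form reduplicated appellations.
APPELLATION_CHARS = set("爷奶爸妈哥姐弟妹")


def reduplication_reading(chars: str, tones: List[int]) -> List[int]:
    """Read the second character of a reduplicated appellation in the
    neutral tone; the first character keeps its dictionary tone."""
    out = list(tones)
    for i in range(1, len(chars)):
        if chars[i] == chars[i - 1] and chars[i] in APPELLATION_CHARS:
            out[i] = NEUTRAL
    return out
```

Restricting the rule to a known set of appellation characters keeps a doubled given name from being treated as a form of address.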

In summary, tone change in Chinese words and phrases is unavoidable, and the resulting difference between the spoken sound and the dictionary pronunciation clearly increases the difficulty and error rate of speech recognition and causes trouble in practical use. In a typical Chinese speech recognition system, a Chinese database is first built using dictionary pronunciations. A Chinese word or phrase (or character string) is then input into the system by voice, and the system's recognition process retrieves the correct Chinese word, phrase, or signal from the database and outputs it, providing the convenience of voice control. In practice, however, the spoken sound of a given Chinese word or phrase (or character string) often differs from its dictionary pronunciation because of tone change, so the correct word, phrase, or signal cannot be output easily. This causes recognition difficulty or errors and also affects subsequent operations. Practical examples are described below:

(1) A company telephone system that forwards calls to extensions by voice: such a telephone system is equipped with a speech recognition system, and voice data spoken by the caller, such as a name, enters the speech recognition system through a microphone. The company's speech recognition system has generally built its Chinese database in advance using dictionary pronunciations, for example the name of each person in the company and that person's extension. In use, the speech recognition system must recognize the caller's voice data in order to retrieve the correct Chinese word, phrase, or signal from the database for output. Suppose the caller means "Yo-yo Ma": the voice data actually input is (ㄇㄚ ㄧㄡ ㄧㄡ) with tones 3-2-3, whereas the database stores "Yo-yo Ma" with the pronunciation (ㄇㄚ ㄧㄡ ㄧㄡ), tones 3-3-3. The input voice data (tones 3-2-3) therefore cannot be recognized easily, which causes trouble for subsequent operations: for example, because the name cannot be recognized correctly, the call cannot be forwarded automatically and quickly to the correct extension, which reduces the usefulness of the voice forwarding system.

(2) A voice registration system in a hospital: such a system is equipped with a speech recognition system, and voice data spoken by the caller, such as a name, enters the speech recognition system through a microphone. The hospital's speech recognition system has generally built its Chinese database in advance using dictionary pronunciations, for example a person's name and the registration form under that name. In use, the speech recognition system must recognize the caller's voice data in order to retrieve the correct Chinese word, phrase, or signal from the database for output. Suppose the caller means "Yo-yo Ma" together with an identification number (such as a registration form number): the voice data actually input is (ㄇㄚ ㄧㄡ ㄧㄡ) with tones 3-2-3, whereas the database stores "Yo-yo Ma" with the pronunciation (ㄇㄚ ㄧㄡ ㄧㄡ), tones 3-3-3. The input voice data (tones 3-2-3) therefore cannot easily be recognized to output the correct Chinese word, phrase, or signal, which causes trouble for subsequent operations: for example, because the hospital cannot recognize the name correctly, registration by voice fails. (A long character string such as the registration form number may also undergo tone change.)

(3) A voice forwarding system for hotel guest-room extensions, and the like: such a system is equipped with a speech recognition system, and voice data spoken by the caller, such as a name or a guest-room extension, enters the speech recognition system through a microphone. The hotel's speech recognition system has generally built its Chinese database in advance using dictionary pronunciations, for example a guest-room number and the name of the guest who has checked in. In use, the speech recognition system must recognize the caller's voice data in order to retrieve the correct Chinese word, phrase, or signal from the database for output. Suppose the caller means "Yo-yo Ma" together with an identification number (such as a guest-room number): the voice data actually input is (ㄇㄚ ㄧㄡ ㄧㄡ) with tones 3-2-3, whereas the database stores "Yo-yo Ma" with the pronunciation (ㄇㄚ ㄧㄡ ㄧㄡ), tones 3-3-3. The input voice data (tones 3-2-3) therefore cannot easily be recognized to output the correct Chinese word, phrase, or signal, which causes trouble for subsequent operations: for example, because the hotel cannot correctly recognize the guest's name in the caller's speech, the call cannot be forwarded automatically to the guest room. In a typical hotel telephone system, a caller who wants to be transferred to a guest room must state both the guest-room extension and the guest's correct name, and the call is forwarded automatically only if both match, in order to guard against improper solicitation.

Summary of the invention

The purpose of the present invention is to provide a method for recognizing tone changes in a Chinese speech recognition system, so as to increase the recognition accuracy of the system.

The method of the present invention comprises the following steps:

Using a keyboard or other input device, input Chinese data into a data buffer;

Using a word-and-phrase judgment and decomposition processing unit, process the Chinese data in the data buffer and identify consecutive third tones or reduplicated appellations according to word-boundary rules;

For reduplicated appellations in the Chinese data, generate reduplicated-appellation data and store it in a storage medium;

For consecutive third tones in the Chinese data, generate consecutive-third-tone data and store it in the storage medium;

Connect the storage medium to a signal fusing processor;

Receive signal-encoded Chinese speech data from a voice input;

Using the signal fusing processor, recognize the Chinese speech data from the voice input according to the data in the storage medium;

Output the comparison result.

The word-and-phrase judgment and decomposition processing unit decomposes the input Chinese data according to word-boundary rules, dividing it into general comparison, consecutive-third-tone comparison, and reduplicated-appellation comparison.

The word-and-phrase judgment and decomposition processing unit can build the general comparison data in the storage medium to form a database.

The word-and-phrase judgment and decomposition processing unit can classify consecutive third tones by the number of characters, such as 2, 3, 4, 5, 6, or 7 consecutive third tones, and build different comparison data for each count: for 2 consecutive third tones, second-tone comparison data is created for the 1st character; for 3 consecutive third tones, for the 1st and 2nd characters; for 4 consecutive third tones, for the 1st and 3rd characters; for 5 consecutive third tones, for the 1st, 3rd, and 4th characters; for 6 consecutive third tones, for the 1st, 3rd, and 5th characters; and for 7 consecutive third tones, for the 1st, 3rd, 5th, and 6th characters. All of the tone-changed comparison data is built in the storage medium to form a database.

The word-and-phrase judgment and decomposition processing unit can create neutral-tone comparison data for the 2nd character of each reduplicated appellation in the input Chinese data, and build the tone-changed comparison data in the storage medium to form a database.

The storage medium includes a hard disk or memory.

When the signal fusing processor recognizes the Chinese speech data from the voice input according to the data in the storage medium, it can first perform a general speech comparison. If that comparison fails, it then recognizes the input using the tone-converted speech, according to character-string relationships and word boundaries.
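The two-stage comparison described above can be sketched as a lookup with a fallback. This is a simplified illustration under stated assumptions: a real recognizer compares acoustic signals, whereas this sketch compares exact sequences of (bopomofo, tone) pairs, and the names `Entry`, `recognize`, and `DATABASE` are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple

Syllable = Tuple[str, int]  # (bopomofo letters, tone number)


class Entry(NamedTuple):
    label: str
    dictionary: Tuple[Syllable, ...]  # dictionary pronunciation
    changed: Tuple[Syllable, ...]     # reading after tone change


def recognize(heard: List[Syllable],
              database: List[Entry]) -> Optional[Tuple[str, str]]:
    """Try the general comparison first; fall back to the tone-changed
    comparison data only if the general comparison fails."""
    key = tuple(heard)
    for entry in database:
        if key == entry.dictionary:
            return entry.label, "general"
    for entry in database:
        if key == entry.changed:
            return entry.label, "tone-changed"
    return None


DATABASE = [
    Entry("Yo-yo Ma",
          (("ㄇㄚ", 3), ("ㄧㄡ", 3), ("ㄧㄡ", 3)),
          (("ㄇㄚ", 3), ("ㄧㄡ", 2), ("ㄧㄡ", 3))),
]
```

A caller who says the name with tones 3-2-3 misses the general comparison but is matched by the fallback, which is the situation the background section describes.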

Description of drawings

Fig. 1 is a process block diagram of the present invention.

Fig. 2 is a schematic diagram of how tone changes arise naturally in spoken Chinese when consecutive third-tone characters or identical reduplicated characters are pronounced together.

Fig. 3 is a block diagram of the actual operation of an embodiment of the present invention.

Fig. 4 is a block diagram of the actual operation of another embodiment of the present invention.

Embodiment

Referring to Fig. 1, the method of the present invention comprises the following steps:

To build the Chinese database of the speech recognition system, Chinese data is input into the data buffer 2 using a keyboard or other input device 1, and the word-and-phrase judgment and decomposition processing unit 3 processes and judges the Chinese data in the data buffer 2. If the Chinese data is judged to be a reduplicated appellation, reduplicated-appellation data 4 is generated and stored in a storage medium 6 such as a hard disk or memory. If the Chinese data is judged to contain consecutive third tones, consecutive-third-tone data 5 is generated and stored in the storage medium 6. A Chinese database is thereby built in the storage medium 6. The storage medium 6 is then connected with the signal fusing processor 7 to form an operating system.

In use, Chinese speech data is input via a microphone or other speech input device 8. The Chinese speech data first undergoes signal encoding 9 and is then recognized by the signal fusing processor 7, and the comparison result is output 10, so that the follow-up system receives the correct signal and carries out subsequent operations.

Referring to Fig. 2, a schematic diagram of how tone changes arise naturally in spoken Chinese when consecutive third-tone characters or identical reduplicated characters are pronounced together: after the comparison data is input, it first enters the decomposition program 11, which works according to word-boundary rules and divides the comparison data among the general comparison program 12, the consecutive-third-tone comparison program 13, and the reduplicated-appellation comparison program 14. In the general comparison program 12, the comparison data is built directly into the database 15. In the consecutive-third-tone comparison program 13, the comparison data enters the classification program 16, which sorts it by the number of consecutive third-tone characters into 2 consecutive third tones 17, 3 consecutive third tones 18, 4 consecutive third tones 19, 5 consecutive third tones 20, 6 consecutive third tones 21, 7 consecutive third tones 22, and so on, and different comparison data is built for each count: for 2 consecutive third tones 17, second-tone comparison data 23 is created for the 1st character; for 3 consecutive third tones 18, second-tone comparison data 24 for the 1st and 2nd characters; for 4 consecutive third tones 19, second-tone comparison data 25 for the 1st and 3rd characters; for 5 consecutive third tones 20, second-tone comparison data 26 for the 1st, 3rd, and 4th characters; for 6 consecutive third tones 21, second-tone comparison data 27 for the 1st, 3rd, and 5th characters; and for 7 consecutive third tones 22, second-tone comparison data 28 for the 1st, 3rd, 5th, and 6th characters. All of the tone-changed comparison data 23, 24, 25, 26, 27, 28 is built into the database 15. In the reduplicated-appellation comparison program 14, neutral-tone comparison data 29 is created for the 2nd character, and the tone-changed comparison data 29 is built into the database 15. The database 15 thus holds complete comparison data from the general comparison program 12, the consecutive-third-tone comparison program 13, and the reduplicated-appellation comparison program 14, in particular for the tone changes most likely to arise between the spoken sound and the dictionary pronunciation. When a speech signal 30 is input for recognition, a speech comparison 31 is performed against the complete comparison data stored in the database 15, and the comparison result is output 32, so that the follow-up system receives the correct signal and carries out subsequent operations.
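The dispatch performed by the decomposition program 11 and the classification program 16 might be sketched as below. This is only a sketch under assumptions: the function name `classify`, the category strings, and the character set `APPELLATION_CHARS` (back-translated from the English examples) are hypothetical, and the input is assumed to be already split into words of (character, tone) pairs. Checking reduplicated appellations first reflects the statement that their rule applies even when both characters are third tone.

```python
from typing import List, Tuple

# Hypothetical set of characters that form reduplicated appellations.
APPELLATION_CHARS = set("爷奶爸妈哥姐弟妹")

Word = List[Tuple[str, int]]  # [(character, tone), ...]


def classify(words: List[Word]) -> Tuple[str, int]:
    """Return (category, run_length), where category is "reduplication",
    "consecutive_third", or "general"; run_length is the longest run of
    third tones inside one word, or 0 when it does not apply."""
    for word in words:
        for (c1, _), (c2, _) in zip(word, word[1:]):
            if c1 == c2 and c1 in APPELLATION_CHARS:
                return ("reduplication", 0)
    longest = 0
    for word in words:
        run = 0  # runs do not cross word boundaries
        for _, tone in word:
            run = run + 1 if tone == 3 else 0
            longest = max(longest, run)
    if longest >= 2:
        return ("consecutive_third", longest)
    return ("general", 0)
```

The returned run length corresponds to the count (2 through 7) used to choose which characters receive second-tone comparison data.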

To help those skilled in the art better understand the features of the present invention, an embodiment of a voice call-forwarding telephone system is used to illustrate the practical use of the invention:

Referring to Fig. 3: the user first places a call 40; the system answers automatically and plays a greeting 41, such as "Hello, this is OO Company. Whom would you like to reach?"; the user then says (by voice) the person to be reached 42, such as "President Chen" (ㄔㄣ ㄗㄨㄥ ㄊㄨㄥ, spoken with tones 2-2-3); a general comparison 43 is then performed, comparing the spoken sound of "President Chen" (ㄔㄣ ㄗㄨㄥ ㄊㄨㄥ, tones 2-2-3) with the dictionary pronunciation (ㄔㄣ ㄗㄨㄥ ㄊㄨㄥ, tones 2-3-3); if this comparison fails (because the spoken sound and the dictionary pronunciation differ), the consecutive-third-tone comparison 44 is performed, comparing the spoken sound of "President Chen" (tones 2-2-3) with the converted reading (ㄔㄣ ㄗㄨㄥ ㄊㄨㄥ, tones 2-2-3); the comparison succeeds (accuracy is improved over the general comparison 43), and the system reads out the comparison result by voice and handles the call 45, such as "You are looking for President Chen (ㄔㄣ ㄗㄨㄥ ㄊㄨㄥ); I will transfer you right away"; the call is transferred and speech recognition is completed successfully.

Referring to Fig. 4: the user first places a call 47; the system answers automatically and plays a greeting 48, such as "This is the Dotey family. Whom would you like to reach?"; the user then says (by voice) the person to be reached 49, such as "little sister Chen" (ㄔㄣ ㄒㄧㄠ ㄇㄟ ㄇㄟ, spoken with tones 2-3-4-neutral); a general comparison 50 is then performed, comparing the spoken sound of "little sister Chen" (ㄔㄣ ㄒㄧㄠ ㄇㄟ ㄇㄟ, tones 2-3-4-neutral) with the dictionary pronunciation (ㄔㄣ ㄒㄧㄠ ㄇㄟ ㄇㄟ, tones 2-3-4-4); if this comparison fails (because the spoken sound and the dictionary pronunciation differ), the reduplicated-appellation comparison 51 is performed, comparing the spoken sound of "little sister Chen" (tones 2-3-4-neutral) with the converted reading (ㄔㄣ ㄒㄧㄠ ㄇㄟ ㄇㄟ, tones 2-3-4-neutral); the comparison succeeds (accuracy is improved over the general comparison 50), and the system reads out the comparison result by voice and handles the call 52, such as "Please hold; I will ask little sister Chen (ㄔㄣ ㄒㄧㄠ ㄇㄟ ㄇㄟ) to take the call"; the call is transferred and speech recognition is completed successfully 53.

In summary, for any Chinese input, the Chinese speech recognition system of the present invention can perform a general speech comparison, such as 43 or 50, and, when that comparison fails, can automatically use character-string relationships and word boundaries to recognize tone-changed words and phrases from the converted speech, as in the consecutive-third-tone comparison 44 or the reduplicated-appellation comparison 51. This effectively reduces the recognition difficulty caused by consecutive third-tone characters or identical reduplicated characters in speech and increases the recognition accuracy of the system.

In conclusion, the method for recognizing tone changes in a Chinese speech recognition system of the present invention can achieve the described effects by means of the method disclosed above.

Claims

1. A method for recognizing tone changes in a Chinese speech recognition system, comprising the following steps:

Using a keyboard or other input device, inputting Chinese data into a data buffer; using a word-and-phrase judgment and decomposition processing unit, processing the Chinese data in the data buffer and identifying consecutive third tones or reduplicated appellations according to word-boundary rules;

Connecting a storage medium to a signal fusing processor;

Outputting the comparison result.

2. The method for recognizing tone changes in a Chinese speech recognition system according to claim 1, characterized in that: the word-and-phrase judgment and decomposition processing unit decomposes the input Chinese data according to word-boundary rules, dividing it into general comparison, consecutive-third-tone comparison, and reduplicated-appellation comparison.

3. The method for recognizing tone changes in a Chinese speech recognition system according to claim 2, characterized in that: the word-and-phrase judgment and decomposition processing unit can build the general comparison data in the storage medium to form a database.

4. The method for recognizing tone changes in a Chinese speech recognition system according to claim 1 or 2, characterized in that: the word-and-phrase judgment and decomposition processing unit can classify consecutive third tones by the number of characters, such as 2, 3, 4, 5, 6, or 7 consecutive third tones, and build different comparison data for each count: for 2 consecutive third tones, second-tone comparison data is created for the 1st character; for 3 consecutive third tones, for the 1st and 2nd characters; for 4 consecutive third tones, for the 1st and 3rd characters; for 5 consecutive third tones, for the 1st, 3rd, and 4th characters; for 6 consecutive third tones, for the 1st, 3rd, and 5th characters; and for 7 consecutive third tones, for the 1st, 3rd, 5th, and 6th characters; and all of the tone-changed comparison data is built in the storage medium to form a database.

5. The method for recognizing tone changes in a Chinese speech recognition system according to claim 1 or 2, characterized in that: the word-and-phrase judgment and decomposition processing unit can create neutral-tone comparison data for the 2nd character of each reduplicated appellation in the input Chinese data, and build the tone-changed comparison data in the storage medium to form a database.

6. The method for recognizing tone changes in a Chinese speech recognition system according to claim 1 or 2, characterized in that: when the signal fusing processor recognizes the Chinese speech data from the voice input according to the data in the storage medium, it can first perform a general speech comparison and, if that comparison fails, then perform recognition using the tone-changed comparison data according to character-string relationships and word boundaries.

7. The method for recognizing tone changes in a Chinese speech recognition system according to claim 1, characterized in that: the storage medium includes a hard disk or memory.