CN1384421A

CN1384421A - Text pronunciation digital coding method

Info

Publication number: CN1384421A
Application number: CN 01115811
Authority: CN
Inventors: 刘东华
Original assignee: Individual
Current assignee: Individual
Priority date: 2001-04-30
Filing date: 2001-04-30
Publication date: 2002-12-11

Abstract

The text pronunciation digital coding method encodes the pronunciation syllable and tone of each character, the emotion-expressing pronunciation elements of each character, and, further, the natural sounds that make reading more vivid. Together these codes constitute the pronunciation text of an article. The invention is intended for electronic talking books.

Description

A digital coding method for text pronunciation

The invention belongs to the field of digital processing of acoustic information, and in particular relates to a digital coding method for text pronunciation.

Most people still remember the period from the late 1970s to the early 1980s, when listeners followed storytelling broadcasts with near-obsessive enthusiasm, to the point that the whole town would turn out at storytelling time. This shows how much people enjoy "listening to" books. People are often not content with reading by themselves; hearing a work read aloud vividly, with expressive cadence, is an enjoyment in its own right.

When travelling or commuting to and from work, poor lighting, vehicle vibration or crowding often make it impossible to read a newspaper or novel normally. People wish to make full use of this time to listen to a newspaper, to listen to teaching material for study, or to listen to a novel for entertainment.

At present, books for the blind are published by the Braille Press and must be produced with special equipment on very thick paperboard. They are both difficult to make and very heavy, so works of all kinds cannot be published promptly and comprehensively. Moreover, Braille must be learned before it can be read, and at present only some blind people can read it. Many blind people therefore cannot receive up-to-date information as sighted people do, which is a great obstacle to their education and entertainment.

Children cannot read before they have learned characters, and parents and teachers have limited time for teaching. A supplementary teaching tool is needed that can tell stories and introduce knowledge at length with accurate pronunciation.

In addition, educational works such as Mandarin instruction, explanations of classical Chinese and political education, as well as guides at tourist attractions and shopping guides, also need talking books with accurate pronunciation and clear explanation.

The facts above show that people need talking books.

Mankind has invented storage media such as tape and CD, and speech compression methods such as ADPCM, LPC, MELP and CELP, which can record speech carrying personal emotion and character fairly well. To guarantee a certain sound quality, however, these methods achieve only limited compression ratios, so they all require large-capacity storage media (such as tape, CD or semiconductor memory). Because the capacity of the medium is limited (or because price restricts the capacity that can be used), generally only a few hours of speech can be recorded, and a long-playing talking book cannot be realized.

Acoustic information that expresses human emotion, such as language, can now be recorded with phonetic notation, and some patents have proposed digital coding schemes for phonetic notation. For example, the invention patent with publication number CN 1087439A proposes digital coding schemes for the initials and finals of Chinese pinyin and the syllables they form, for English phonetic symbols and the syllables they form, and for Japanese kana and the syllables they form (hereinafter collectively referred to as speech sounds). However, phonetic notation records only the syllable and tone of a speech sound. Language elements that carry emotion during pronunciation, such as sound intensity, pronunciation length, intonation height and whether erhua (r-suffixation) applies, cannot be recorded, so speech synthesized from phonetic notation alone is mechanical, stiff and devoid of emotion.

We note that music is acoustic information full of human emotion, yet beautiful and varied music can be recorded and propagated with nothing more than a simple score. Wherever they are, people can faithfully play the same graceful, expressive melody from the score.

For this reason, people urgently need a speech coding method that, like a musical score, stores speech in short codes, so that the coded speech text is compact and tens or even hundreds of hours of speech can be stored on an extremely light and small medium.

The purpose of the present invention is to provide a text pronunciation coding method that meets the above requirements. Using the pronunciation text formed by this method as a "score", a lightweight electronic talking-book device can synthesize emotionally expressive speech at any time and place, just as music is played from a score.

To achieve the above purpose, the technical solution of the present invention is as follows. The pronunciation text of an electronic talking book is formed by digitally coding three things: the basic pronunciation elements of each character (its pronunciation syllable and tone); the emotional pronunciation elements the character should have when read aloud (sound intensity, pronunciation length, intonation height, whether erhua (r-suffixation) applies, and so on); and the natural sounds that make reading more vivid. Recording the basic and emotional pronunciation elements of each character in the article digitally, with a standard word length and format, yields the pronunciation code words of the original text; each pronunciation code word comprises digital codes for the pronunciation syllable, tone, sound intensity, pronunciation length, intonation height, erhua and other pronunciation elements. Digitally coding the natural sounds that make reading vivid yields natural-sound code words; each natural-sound code word comprises the digital code of the natural sound together with digital codes for sounding elements such as the intensity, duration and pitch of that sound during reading. The pronunciation code words corresponding to the characters and the natural-sound code words together constitute the pronunciation text.

This coding method makes it possible to synthesize speech with emotion. It serves sighted people who wish to listen to novels, newspaper news, articles and teaching material, as well as study by the blind and primary, secondary and preschool education.

The present invention is described in detail below with reference to an embodiment.

The core of this coding method is the set of rules by which each element of text pronunciation is digitally coded. These rules are explained in turn below.

Take Chinese as an example. Although there are tens of thousands of Chinese characters, the total number of pronunciation syllables, counting the four tones and the neutral tone, is only 1,334, far fewer than the number of characters. Moreover, a listener only needs to recognize the pronunciation, without seeing the characters, to understand what is being read. We therefore digitally code the 1,334 syllables as the basis of the text pronunciation digital coding method. To produce the pronunciation text of an article, each Chinese character is simply replaced with the code of its pronunciation syllable, which converts the written text into a pronunciation text based on character syllables. Speech synthesized from such a text is sufficient for understanding the content, but because it is flat and emotionless, listeners soon grow bored. To meet the needs of electronic talking books, we therefore add to each character's syllable code the digital codes of the emotion-expressing pronunciation elements it should carry in the article, such as sound intensity, pronunciation length, intonation height and whether erhua (r-suffixation) applies. This forms the pronunciation code word of each character, and with such code words, reading with emotion can be synthesized.
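The character-to-syllable substitution described above can be sketched as a simple table lookup. This is only an illustration: the patent does not publish actual code values, so the tables `CHAR_TO_SYLLABLE` and `SYLLABLE_CODE`, the pinyin-with-tone-digit keys and the function name `to_syllable_codes` are all hypothetical.

```python
# Hypothetical miniature tables. A real table would cover all 1,334
# tone-bearing syllables; the numeric values here are invented.
SYLLABLE_CODE = {"ni3": 0, "hao3": 1, "shi4": 2, "jie4": 3}

# Several characters can share one syllable (here both 世 and 事 are
# "shi4"), which is why there are far fewer syllables than characters.
CHAR_TO_SYLLABLE = {
    "你": "ni3",
    "好": "hao3",
    "世": "shi4",
    "事": "shi4",
    "界": "jie4",
}


def to_syllable_codes(text):
    """Replace each character with the code of its pronunciation syllable."""
    return [SYLLABLE_CODE[CHAR_TO_SYLLABLE[ch]] for ch in text]


codes = to_syllable_codes("你好世界")
```

A sequence produced this way carries only syllable and tone, which is exactly the flat, emotionless text the paragraph above says must be extended with emotional elements.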

The 1,334 syllables, together with the text-record marks (division marks for chapters, paragraphs and the like, punctuation marks, the end-of-text mark, and marks distinguishing the reader's sex, age group and so on), the silent syllables that represent pauses in pronunciation, and the auxiliary syllables (such as pronunciation syllables of foreign-language characters and other necessary auxiliary syllables), total fewer than 2,048 codes.

On this basis, to increase the expressiveness of the electronic talking book and make it more vivid, codes representing natural sounds are added, and a 12-bit binary number (4,096 values in total) is used to encode both text pronunciation and natural sounds. The natural sounds should include:

1. Ambient sounds: wind, rain, thunder and lightning, doors, windows, various vehicles, explosions, zinging sounds, machines, etc.

2. Animal sounds: oxen, pigs, horses, sheep, donkeys, dogs, chickens, ducks, geese and various other animals, and the movements and calls of various birds, etc.

3. Other sounds that need to be presented in the electronic talking book.
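The 12-bit code space can be pictured as a partition into ranges. The sketch below assumes one possible layout: syllables first, then marks and auxiliary syllables up to 2,048, then natural sounds in the upper half. The text only states that the first group fits within 2,048 codes and that 12 bits are used overall; the ordering of the ranges and the function name `classify` are assumptions made for illustration.

```python
CODE_BITS = 12
CODE_SPACE = 1 << CODE_BITS   # 4,096 possible codes
SYLLABLE_COUNT = 1334         # tone-bearing Chinese syllables (from the text)
TEXT_CODE_LIMIT = 2048        # syllables + marks + auxiliaries fit below this


def classify(code):
    """Return which group a 12-bit code belongs to under the assumed layout."""
    if not 0 <= code < CODE_SPACE:
        raise ValueError(f"code {code} does not fit in {CODE_BITS} bits")
    if code < SYLLABLE_COUNT:
        return "syllable"
    if code < TEXT_CODE_LIMIT:
        # Assumed range for text-record marks, silent and auxiliary syllables.
        return "mark_or_auxiliary"
    # Assumed range for ambient, animal and other natural sounds.
    return "natural_sound"
```

Under this assumed layout, 714 codes remain for marks and auxiliary syllables and 2,048 codes for natural sounds.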

The pronunciation code word may be 4 to 5 bytes long (8 bits per byte). For example, when 4 bytes are chosen as the code word length, each pronunciation code word has 32 bits in total.

The codes for syllables, marks and natural sounds described above require 12 bits in total; the remaining 20 bits are used to encode the other pronunciation elements that express emotion.

Many specific bit allocations are possible. The following is one that can be used in practice:

1. Sound intensity: encoded with 4-6 bits;

2. Pronunciation length: encoded with 4-6 bits;

3. Intonation height: encoded with 4-6 bits;

4. Whether erhua (r-suffixation) applies: encoded with 1 bit;

5. The remaining 1-7 bits are reserved.
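The allocation above can be sketched as bit packing into a 32-bit word. The sketch assumes the widest option (6 bits each for intensity, length and intonation height), which leaves 1 reserved bit. The field order, the field names and the functions `pack` and `unpack` are illustrative assumptions; the text fixes only the bit counts.

```python
# Assumed field order, most significant first: 12 + 6 + 6 + 6 + 1 + 1 = 32.
FIELDS = [
    ("code", 12),      # syllable, mark or natural-sound code
    ("intensity", 6),  # sound intensity
    ("length", 6),     # pronunciation length
    ("pitch", 6),      # intonation height
    ("erhua", 1),      # 1 if r-suffixation applies, else 0
    ("reserved", 1),   # spare bit
]


def pack(**values):
    """Pack named field values into one 32-bit pronunciation code word."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Split a 32-bit pronunciation code word back into its fields."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


word = pack(code=1333, intensity=40, length=20, pitch=33, erhua=1)
stored = word.to_bytes(4, "big")  # 4 bytes per code word, as in the text
```

Choosing 4 or 5 bits for the three graded fields instead would simply widen the reserved field, up to the 7 spare bits mentioned in item 5.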

The standard word length of the GB Chinese character code is two bytes (16 bits), and the word length of the pronunciation code word is 4 bytes (32 bits). The pronunciation text of an article therefore has only about twice as many bits as its character-coded text, which achieves high-efficiency coding of speech and makes it easy to store long works in semiconductor memory. A 16-megabit semiconductor memory the size of a fingernail can store the pronunciation text of a novel of about 500,000 characters. At an average reading speed of 4 characters per second, this corresponds to about 36 hours of reading, which creates the conditions for high-capacity storage of speech and for practical electronic talking books.
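The capacity figures above can be checked with a short calculation. It assumes that "16 megabits" means 16 × 2^20 bits, which is the usual convention for semiconductor memory but is not stated in the text; the function name `capacity` is hypothetical.

```python
def capacity(memory_bits, word_bits, chars_per_second):
    """Return (code words stored, hours of reading) for a given memory size."""
    code_words = memory_bits // word_bits
    hours = code_words / chars_per_second / 3600
    return code_words, hours


# Assumption: one megabit is 2**20 bits.
code_words, hours = capacity(
    memory_bits=16 * 2**20,
    word_bits=32,
    chars_per_second=4,
)
```

With these inputs the memory holds 524,288 code words, in line with the "about 500,000 characters" figure, and the reading time comes out between 36 and 37 hours.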

Claims

1. A digital coding method for text pronunciation, characterized in that: the pronunciation text of an electronic talking book is formed by digitally coding the basic pronunciation elements of characters, such as the pronunciation syllable and tone, the emotional pronunciation elements that the characters should have when read aloud, such as sound intensity, pronunciation length, intonation height and whether erhua (r-suffixation) applies, and the natural sounds that make reading more vivid; the basic pronunciation elements and emotional pronunciation elements of the characters in an article are digitally recorded with a standard word length and format to form the pronunciation code words of the original text, each pronunciation code word comprising digital codes for pronunciation elements such as the pronunciation syllable, tone, sound intensity, pronunciation length, intonation height and erhua of the original text; the natural sounds that make reading vivid are digitally coded to form natural-sound code words, each natural-sound code word comprising the digital code of the natural sound and digital codes for sounding elements such as the intensity, duration and pitch of that sound during reading; and the pronunciation code words corresponding to the characters and the natural-sound code words together constitute the pronunciation text.

2. The digital coding method for text pronunciation according to claim 1, characterized in that: the basic pronunciation elements are coded by assigning 12-bit binary codes to the 1,334 syllables of Chinese character pronunciation, the auxiliary pronunciation syllables, the text-record marks and the natural sounds, each code corresponding to one pronunciation syllable, text-record mark or natural sound.

3. The digital coding method for text pronunciation according to claim 1, characterized in that: the emotional pronunciation elements, such as sound intensity, pronunciation length, intonation height and whether erhua (r-suffixation) applies, are digitally coded by binary-coding each kind of pronunciation element so that they record the pronunciation features and emotion of the pronunciation syllable, and each is combined with the code of the basic pronunciation elements to form the pronunciation code word of a character; the pronunciation code words corresponding to the characters, the text-record marks and the natural-sound code words constitute the pronunciation text.