CN1032391C - Chinese character-phonetics transfer method and system edited based on waveform - Google Patents

Chinese character-phonetics transfer method and system edited based on waveform

Info

Publication number
CN1032391C
Authority
CN
China
Prior art keywords
speech
word
sound
string
waveform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 94103372
Other languages
Chinese (zh)
Other versions
CN1099165A (en)
Inventor
蔡莲红
魏华武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN 94103372 priority Critical patent/CN1032391C/en
Publication of CN1099165A publication Critical patent/CN1099165A/en
Application granted granted Critical
Publication of CN1032391C publication Critical patent/CN1032391C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The present invention provides a Chinese text-to-speech conversion method based on waveform editing, which belongs to the technical field of signal processing. The method comprises the following steps: a character string input to a computer is divided into a string of words or phrases by a speech-oriented word segmentation method; sound-change and tone-change processing is applied to generate a sound-unit waveform index string; and the corresponding sound-unit waveforms are fetched from a sound-unit waveform database and edited to obtain a speech data sequence. A text-to-speech conversion system using this method consists of a general-purpose computer, a speech output board, a loudspeaker, and software implementing the method. The invention offers highly natural and intelligible output speech and high processing speed.

Description

Chinese text-to-speech conversion method and system based on waveform editing
The invention belongs to the field of signal processing technology, and in particular relates to Chinese text-to-speech conversion.
Speech is an instrument of information exchange; natural speech (human pronunciation) is highly intelligible and easy to understand. In human-machine interaction, conveying information by voice is convenient and natural, and demand for it grows day by day. Research on machine speech synthesis and on text-to-speech conversion has been carried out both in China and abroad. Text-to-speech conversion is an extension of speech synthesis: it first converts a text string into a speech parameter control sequence, and then uses speech synthesis to output sound from a machine or computer.
An existing Chinese text-to-speech conversion method is shown in Figure 1. The input character string first passes through character-to-parameter conversion to become a speech parameter control string. Speech parameters (commonly LPC coefficients or formant parameters) are then taken from a speech parameter library and, driven by the excitation signal of an excitation source, a speech data sequence is obtained by speech synthesis. Finally, the sequence passes through D/A conversion and is output as a speech stream.
Speech output by this parameter-based (LPC or formant) method has poor naturalness, because a limited set of parameters can hardly follow the subtle variations of speech; in particular, the method's description of the formant parameters of initials and of the excitation signal is imperfect. Modifying formant parameters is rather complicated and hard to do in real time. Moreover, this text-to-speech conversion does not include a speech-oriented word segmentation algorithm, which impairs the naturalness and intelligibility of the output speech.
The object of the invention is to overcome these shortcomings of prior-art text-to-speech conversion by proposing a Chinese text-to-speech conversion method based on waveform editing, and a system using it, whose output speech is highly natural and intelligible.
The Chinese text-to-speech conversion method based on waveform editing proposed by the invention, shown in Figure 2, is characterized by the following steps:
(1) the input Chinese-character internal-code string or pinyin string is first divided into a string of words or phrases by the speech-oriented word segmentation method (21), and a pause symbol is inserted between words;
(2) the word or phrase string is then processed by sound-change and tone-change rules (22) to generate a sound-unit waveform index string carrying speech-characteristic marks;
(3) a sound-unit waveform database (23) is established, and the corresponding sound-unit waveforms are taken from it according to the sound-unit waveform index string;
(4) the sound-unit waveforms are then edited (24), that is, their intensity, pitch, and duration are modified, to obtain a speech data sequence.
The speech-oriented word segmentation method of the invention, shown in Figure 3, comprises forward-scan maximum matching (31), the "two-three rule" for grouping consecutive monosyllabic words (32), function-word attachment (34), and reverse longest-word matching for ambiguous word strings (33). Forward-scan maximum matching (31) builds a segmentation dictionary and scans the input internal-code or pinyin string character by character from the start of the sentence toward its end, hypothesizing split points to form candidate words and matching them against the dictionary entries. The two-three rule (32) applies when the number of consecutive monosyllabic words exceeds four: the monosyllabic words are grouped into units of two or three. Function-word attachment (34) builds a function-word library that records an attachment rule for each function word, and joins each function word in the word string to the word before or after it according to that rule. Reverse longest-word matching (33) starts from the last character of an ambiguous word string and adds characters one at a time toward the front, matching as it goes, to find the end point of the longest word.
The sound-change and tone-change processing (22) of the invention comprises the following steps: 1) text replacement, in which text in the segmented character string is replaced by its Chinese-character reading; 2) polyphone processing, in which a polyphone dictionary is used to mark the correct pronunciation of polyphonic characters in the string; 3) generation of a sound-unit index string carrying pronunciation-characteristic marks, using the established sound-change and tone-change rule base.
The invention realizes text-to-speech conversion for Chinese on the basis of waveform editing. It comprises several parts: speech-oriented word segmentation, sound-change and tone-change processing, construction of the sound-unit waveform database, and sound-unit waveform editing, as shown in Figure 2. Each part is described in detail below.
Semantic word segmentation is a basic task in Chinese natural language understanding. The speech-oriented word segmentation method (21), designed with the requirements of Chinese text-to-speech conversion in mind, has its own technical features; its workflow is shown in Figure 3. Its first feature is that segmentation uses forward-scan maximum matching (31). Its second feature is that, after segmentation, function-word attachment (34) is applied to compose spoken phrases. Its third feature is reverse longest-word matching (33) within ambiguous word strings. Its fourth feature is the handling of unregistered words (words not found in the dictionary).
Forward-scan maximum matching (31) scans from the start of the sentence toward its end; word boundaries are determined by matching the longest word starting from the current character, so that the sentence is divided into a string of words or phrases. Its workflow is shown in Figure 4, and the steps are as follows. When segmentation of a sentence begins, the buffer that holds the temporary character string is cleared and the pointer is set to 0 (41). A Chinese character is then read in and placed in the buffer (42). Next, the method checks whether the end of the sentence has been reached (43); the end of a sentence is marked by a punctuation mark (full stop, enumeration comma, comma, etc.) or by a space character (a manual segmentation mark). If it is the end of the sentence, segmentation finishes. Otherwise, the characters in the buffer are matched against the segmentation dictionary by forward scanning (44); that is, following the forward-scan order, the longest dictionary word matching the string to be segmented is sought. The method then checks whether the match succeeded (45). If it succeeded, another character is read in (42) and, with the last character of the candidate word taken as the first character of a new candidate, matching is repeated (44) until a match fails (45); the positions of the first and last characters of the word (or string) are then recorded and a pause symbol is inserted (46). To accompany this segmentation method, a segmentation dictionary is built containing two-character, three-character, and four-character words. "Scanning" here means hypothesizing a split point character by character, forming a candidate word, and matching it against the dictionary entries.
"Maximum matching" means that, during scanning, the strings that cannot be divided further are determined on the maximum-matching principle. For example, for the sentence 原子在反应中化合成分子 ("atoms combine into molecules in the reaction"), the scan-and-match process under forward-scan maximum matching is as follows:
Compared string    Result
原子               原子 ("atom") is a word
原子在             原子在 and 子在 are not words; 原子 is confirmed as a word
在反               在反 is not a word
在反应             在反应 is not a word, but 反应 ("reaction") is; 在 is confirmed as a monosyllabic word
反应中             反应中 is not a word; 反应 is confirmed as a word
中化               中化 is not a word
中化合             中化合 is not a word, but 化合 ("combine") is; 中 is confirmed as a monosyllabic word
化合成             化合成 ("combine into") and 合成 ("synthesize") are words
化合成分           化合成分 and 合成分 are not words; 化合 and 成分 ("component") are words
化合成分子         化合成分子 and 合成分子 are not words; 化合, 合成, 成分, and 分子 ("molecule") are words, so 化合成分子 is an ambiguous word string obtained by maximum matching
The segmentation result is: 原子 U 在 U 反应 U 中 U 化合成分子, where U marks a word boundary. 原子 and 反应 are two two-character words, 在 and 中 are two monosyllabic words, and 化合成分子 is an ambiguous word string that is left undivided for the time being.
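The scan traced above can be sketched in Python. This is a minimal sketch under stated assumptions, not the patented implementation: the name `forward_max_match`, the toy dictionary `TOY_DICT`, and the Chinese sentence (a back-translation of the example) are illustrative, and the way an ambiguous string is detected (chaining any dictionary word that starts inside the current match and runs past its end) is only one reading of the trace.

```python
# Toy segmentation dictionary (hypothetical; the real one holds 2-4 character words).
TOY_DICT = {"原子", "反应", "化合", "合成", "成分", "分子", "化合成"}


def forward_max_match(sentence, dictionary, max_len=4):
    """Return (segment, is_ambiguous) pairs in sentence order."""
    result = []
    i, n = 0, len(sentence)
    while i < n:
        # Longest dictionary word starting at i; otherwise a single character.
        end = i + 1
        for length in range(min(max_len, n - i), 1, -1):
            if sentence[i:i + length] in dictionary:
                end = i + length
                break
        first_end = end
        # Assumed ambiguity test: a dictionary word that starts inside the
        # current span and runs past its end extends the span.
        j = i + 1
        while j < end:
            for length in range(min(max_len, n - j), 1, -1):
                if j + length > end and sentence[j:j + length] in dictionary:
                    end = j + length
                    break
            j += 1
        result.append((sentence[i:end], end > first_end))
        i = end
    return result


segments = forward_max_match("原子在反应中化合成分子", TOY_DICT)
```

With this toy dictionary, `segments` holds 原子, 在, 反应, 中 as unambiguous items and flags 化合成分子 as ambiguous, which agrees with the segmentation result stated above.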
Function-word attachment (34) joins certain function words to an adjacent word to form a phrase, so that the speech stream pauses in suitable places and sounds better. For example, one function word in the sentence above should be pronounced together with the word before it, which better matches human pronunciation habits. The invention builds a function-word library in which the attachment rule of each function word is marked (attach to the preceding word, or attach to the following word).
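A function-word library with per-word attachment rules might look like the following sketch. The library entries, the rule labels `"prev"` and `"next"`, and the function name are assumptions chosen for illustration; the source does not list the actual library contents.

```python
# Hypothetical function-word library: word -> attachment rule.
FUNCTION_WORDS = {"的": "prev", "把": "next"}


def attach_function_words(words, rules):
    """Join each function word to the word before or after it, per `rules`."""
    out = []
    carry = ""  # function words waiting to be glued onto the next word
    for word in words:
        rule = rules.get(word)
        if rule == "prev" and out and not carry:
            out[-1] += word          # glue onto the preceding word
        elif rule == "next":
            carry += word            # hold until the following word arrives
        else:
            out.append(carry + word)
            carry = ""
    if carry:                        # trailing function word with no follower
        out.append(carry)
    return out
```

For instance, `attach_function_words(["我", "的", "书"], FUNCTION_WORDS)` joins 的 to the word before it, so the pause falls after the phrase rather than inside it.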
Reverse longest-word matching for ambiguous word strings (33) is applied after forward-scan maximum matching has identified an ambiguous word string: the string is divided into words by matching the longest word in the reverse direction. Starting from the last character of the ambiguous string, characters are added one at a time toward the front and matched until the end point of the longest word is found; that longest word is then split off. In the example above, 化合成分子 is divided into the two words 化合成 ("combine into") and 分子 ("molecule").
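The reverse pass can be sketched as follows. This is an illustrative sketch, assuming a toy dictionary and a maximum word length of four characters; the name `reverse_longest_match` and the back-translated Chinese example are not from the source.

```python
# Toy dictionary for the ambiguous string (hypothetical entries).
AMBIG_DICT = {"化合", "合成", "成分", "分子", "化合成"}


def reverse_longest_match(text, dictionary, max_len=4):
    """Split `text` from its end, taking the longest dictionary word each time."""
    words = []
    end = len(text)
    while end > 0:
        start = end - 1  # fall back to a single character
        # Try the earliest start first, i.e. the longest candidate ending at `end`.
        for cand in range(max(0, end - max_len), end - 1):
            if text[cand:end] in dictionary:
                start = cand
                break
        words.append(text[start:end])
        end = start
    return words[::-1]


parts = reverse_longest_match("化合成分子", AMBIG_DICT)
```

Here `parts` is the two-word split 化合成 and 分子 described above: 分子 is the longest word ending at the last character, and the remainder 化合成 is itself a dictionary word.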
Handling of unregistered words means that words not in the dictionary are split into monosyllabic words during segmentation. For these, the invention provides the "two-three rule": when the number of consecutive monosyllabic words exceeds four, they are grouped into words by the two-three rule and then read out. If the number of consecutive monosyllabic words is N, the number W of words formed by the two-three rule is:
Figure C9410337200061
For example:
N     W    Grouping       Number of groups
5     2    2,3            2 + 0
6     2    3,3            2 + round[0.3 × 0.2]
7     3    2,3,2          2 + round[0.3 × 0.4]
8     3    2,3,3          2 + round[0.3 × 0.6]
9     4    2,3,2,2        2 + round[0.3 × 0.8]
10    4    2,3,2,3        4
11    4    2,3,3,3        4 + round[0.3 × 0.2]
12    5    2,3,2,3,2      4 + round[0.3 × 0.4]
13    5    2,3,2,3,3      4 + round[0.3 × 0.6]
14    6    2,3,2,3,2,2    4 + round[0.3 × 0.8]
15    6    2,3,2,3,2,3    6
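The groupings in the table can be reproduced with a small function. This is a sketch that fits the tabulated rows for N = 5 to 15 and, as an assumption, continues the same pattern for larger N; the name `two_three_groups` is illustrative, and the closed formula for W is not used here.

```python
def two_three_groups(n):
    """Group sizes for n consecutive monosyllabic words (n > 4)."""
    if n < 5:
        raise ValueError("the two-three rule applies only when n exceeds four")
    blocks, rem = divmod(n, 5)
    sizes = [2, 3] * blocks      # each full block of five becomes 2 + 3
    if rem == 1:
        sizes[-2] = 3            # widen the last 2 into a 3
    elif rem == 2:
        sizes.append(2)
    elif rem == 3:
        sizes.append(3)
    elif rem == 4:
        sizes += [2, 2]
    return sizes
```

The length of the returned list is W, and the sizes always sum to N, so every monosyllabic word lands in exactly one group.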
The speech-oriented segmentation method of the invention is fast, segmenting several thousand words per second. Because it takes speech output into account, it lays a good foundation for the subsequent sound-change and tone-change processing and contributes greatly to the naturalness of the output speech stream.
Sound-change and tone-change processing (22) comprises text replacement, polyphone processing, sound change, tone change, and so on.
Text replacement replaces text with its Chinese-character reading, for example "%" with "percent" and "÷" with "divided by". The purpose of polyphone processing is to give a character with several readings its correct pronunciation in the word where it occurs. For example, 正 is read zhèng in 正好 ("just right") and zhēng in 正月 ("the first lunar month"). Polyphone processing works as follows: a polyphone dictionary is built first, listing the words that contain polyphonic characters and marking their pronunciations. In the text-to-speech conversion system, after the character string has been segmented, the correct pronunciation is looked up in the polyphone dictionary. Sound change and tone change refer to the changes in sound that occur in the speech stream according to pronunciation rules. These changes are of the following kinds:
(1) Tone change: every isolated syllable has a definite tone, but within a phrase the tone changes under the influence of adjacent syllables. For example, when two third-tone syllables are adjacent, the first is pronounced approximately as a second tone; when three third-tone syllables are adjacent, the first two are pronounced approximately as second tones.
(2) Reduction: reduction means that certain syllables in the speech stream are read in the neutral tone, such as 们 in 你们 ("you", plural), 子 in 杯子 ("cup"), and the second 爸 in 爸爸 ("father").
(3) Strengthening: strengthening means that certain syllables in the speech stream are emphasized and read with stress, such as "hard" in "study hard".
(4) Erhua: a syllable in Chinese may take the retroflex ending -r on its final, producing an r-colored final, as in 花儿 ("flower").
(5) Coarticulatory sound change: in the speech stream, adjacent consonants, vowels, and syllables influence one another during articulation, and the sound changes.
(6) Intonation and syllable rhythm: Chinese has several sentence moods, such as interrogative (?), imperative (。), declarative (。), and exclamatory (!). A single mood can in turn carry different emotions, and changes of mood and emotion are reflected in the tonal variation of each syllable in the sentence.
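The tone-change rule in item (1) can be illustrated with a short sketch. It assumes tones are given as integers 1 to 4 (with 3 the third tone) for the syllables of one phrase, and it covers only the two cases the text states, runs of two and of three third tones; longer runs are left unchanged because the text gives no rule for them. The function name is illustrative.

```python
def third_tone_sandhi(tones):
    """Apply the stated third-tone rule to a phrase given as tone numbers."""
    out = list(tones)
    i = 0
    while i < len(tones):
        j = i
        while j < len(tones) and tones[j] == 3:
            j += 1                   # j ends just past a run of third tones
        if (j - i) in (2, 3):
            for k in range(i, j - 1):
                out[k] = 2           # all but the last become second tone
        i = max(j, i + 1)
    return out
```

For example, a two-syllable phrase with tones `[3, 3]` becomes `[2, 3]`, and `[3, 3, 3]` becomes `[2, 2, 3]`.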
The above lists only some examples of sound-change and tone-change rules; a large body of rules and examples has been studied and summarized by linguists. One feature of the invention is that it applies those rules in a Chinese text-to-speech conversion system. A second feature is the way the sound-change and tone-change rules are implemented. Parametric speech synthesis mostly modifies LPC or formant parameters. In the invention, by contrast, a rule base for sound change and tone change is established, and a processing program then generates a sound-unit index string carrying pronunciation-characteristic marks. A sound-unit index is the address, in the sound-unit waveform database, of the sound unit selected by the pronunciation rules. Pronunciation-characteristic marks are characters or digits inserted into the sound-unit index string to describe pronunciation features. They comprise:
A. Pause marks: pauses are divided into five kinds: between initial and final, between syllables, between words, between sentences, and between paragraphs.
B. Basic pronunciation marks: these mark the intensity, speed, and pitch of pronunciation, which are mutually independent parameters.
C. Sound-change marks: these mark emphasis, reduction, erhua, and coarticulatory sound change. Emphasis refers to a head word or a stressed syllable.
D. Intonation marks: these mark sentence intonation. The intonation mark is determined from the punctuation mark at the end of the sentence.
Sentence-final punctuation    Mood                       Intonation       Example
?                             interrogative              strong rising    Are you Mr. Zhang?
。                            declarative, imperative    level            This is Mr. Zhang.
!                             imperative, exclamatory    weak rising      Halt!
                                                                          The weather is very nice!
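Deriving the intonation mark from the sentence-final punctuation amounts to a table lookup, sketched below. The mark names, the inclusion of both full-width and ASCII punctuation, and the fallback to "level" for other endings are assumptions made for illustration.

```python
# Sentence-final punctuation -> intonation mark (labels are illustrative).
INTONATION_BY_PUNCTUATION = {
    "?": "strong rising", "？": "strong rising",
    ".": "level", "。": "level",
    "!": "weak rising", "！": "weak rising",
}


def intonation_mark(sentence, default="level"):
    """Return the intonation mark implied by the last non-space character."""
    stripped = sentence.rstrip()
    if not stripped:
        return default
    return INTONATION_BY_PUNCTUATION.get(stripped[-1], default)
```

A call such as `intonation_mark("Are you Mr. Zhang?")` then selects the strong rising intonation used for questions.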
A sound unit in the invention is a data item in the sound-unit waveform database. It can be the waveform data of a syllable, or the data of an initial, a final, or a phonetic transition segment. It is the basic unit of sound-unit waveform editing.
The first feature of the sound-unit waveform database (23) is that, when sound-unit waveforms are stored syllable by syllable, it contains for each Chinese syllable the normal isolated pronunciation, the neutral-tone pronunciation, the pronunciation as the first syllable of a two-character word, and the pronunciation as the second syllable of a two-character word. When speech is to be output, the corresponding syllable waveform data are fetched according to the pronunciation rules and then edited and spliced.
The second feature of the sound-unit waveform database is that, when the sound unit is smaller than a syllable, a sound-unit waveform may be a final demisyllable, an initial demisyllable, or a final-to-initial transition segment. When speech is to be output, the corresponding sound-unit waveform data are fetched according to the segmentation result and the speech-characteristic marks, and are edited and spliced by the sound-unit waveform editing method.
The sound-unit waveform editing method (24) of the invention modifies, according to the pronunciation-characteristic marks, the intensity (soft/loud), pitch (low/high), and duration (short/long) of sound-unit waveforms, in order to improve the naturalness of the output speech stream.
The specific operations are as follows:
A. Setting the basic speech parameters. Before sound-unit waveform editing is carried out, the basic speech parameters are set: the intensity, speed, and pitch of the speech are set according to system requirements or the basic pronunciation marks. Each parameter can usually be divided into M levels (M is a positive integer, e.g. M = 10), and the system sets it to any integer in the range 1 to M.
B. Inserting silent intervals. Silent intervals are inserted between sound-unit waveforms according to the pause marks. As described above, the invention distinguishes five kinds of pause. In order of increasing silent interval they are: between initial and final, between syllables, between words, between sentences, and between paragraphs.
C. Sound-change processing. According to the sound-change marks, the intensity, duration, pitch, and temporal intensity envelope of the sound-unit waveforms in a word or phrase are modified. This is called the "sound-unit waveform sound-change method". The marks and the corresponding processing are as follows:
Mark                           Processing
Head word, stress              raise intensity and pitch; lengthen duration
Reduction                      lower intensity and pitch; shorten duration
Erhua                          modify the temporal envelope of the waveform
Coarticulatory sound change    read the designated waveform from the sound-unit waveform database as the sound change requires, and modify it
D. Intonation correction. Sound-unit waveforms are modified according to the intonation marks. This is called the "sound-unit waveform intonation-change algorithm".
Changes in Chinese intonation are reflected in the tonal variation of each syllable. People pause at word or phrase boundaries when speaking and take the phrase or sentence as the unit of understanding. The intonation correction method of the invention therefore modifies, according to the intonation determined by the sentence-final punctuation mark, the pitch of the 3 to 5 syllables before the end of the sentence, that is, the pitch of the last two words or phrases counting back from the end of the sentence. The specific method is:
Intonation        Correction method
Strong rising     raise the pitch of the corrected syllables successively toward the sentence end by Δ%
Weak rising       raise the pitch of the corrected syllables successively toward the sentence end by Δ/2 %
Falling           lower the pitch of the corrected syllables successively toward the sentence end by Δ/2 %
If the level intonation is marked 6, the pitch range is 10, and the number of corrected syllables is N, then the pitch increment is $\Delta = \frac{4}{N}$.
E. Smoothing filter. Waveform editing involves cutting and splicing waveforms, followed by smoothing. Smoothing filters the newly spliced speech data so that the speech characteristics do not change abruptly; this is called the "characteristic-continuity waveform correction method". Taking intensity smoothing as an example, N speech samples are taken before and N after the splice point n, and their average amplitudes are computed:

$$\bar{M}_F = \frac{1}{N}\sum_{i=n-N+1}^{n} S(i), \qquad \bar{M}_B = \frac{1}{N}\sum_{i=n+1}^{n+N} S(i)$$

Their average relative amplitude difference is then

$$E = \frac{\bar{M}_F - \bar{M}_B}{\bar{M}_F + \bar{M}_B}$$

If E is positive and $E > \frac{1}{3}$, the waveform before the splice point is corrected; if E is negative and $E < -\frac{1}{3}$, the waveform after the splice point is corrected. The correction is

$$S(i) = (1 - |E|)\,S(i), \qquad \begin{cases} n-N+1 \le i \le n, & \text{when } E > \frac{1}{3} \\ n+1 \le i \le n+N, & \text{when } E < -\frac{1}{3} \end{cases}$$
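The intonation correction in item D can be sketched numerically. The sketch assumes that "successively" means each corrected syllable moves one further step than the one before it, and that Δ is expressed in the same units as the pitch levels; under those assumptions Δ = 4/N makes the last syllable of a strong rising sentence reach the top of the range. The function name and the intonation labels are illustrative.

```python
def tail_pitches(intonation, n, level=6.0, pitch_range=10.0):
    """Pitch levels for the last n syllables before the sentence end."""
    if n <= 0:
        raise ValueError("n must be a positive syllable count")
    delta = (pitch_range - level) / n   # 4 / N for level 6 and range 10
    step = {
        "strong rising": delta,
        "weak rising": delta / 2,
        "falling": -delta / 2,
        "level": 0.0,
    }[intonation]
    # Assumed reading: the k-th corrected syllable is shifted by k steps.
    return [level + step * k for k in range(1, n + 1)]
```

With four corrected syllables, a strong rising sentence climbs from 7 to 10, a weak rising one from 6.5 to 8, and a falling one from 5.5 down to 4.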
The speech generation method of the invention is no longer built on parametric synthesis; instead it edits the time-domain waveform data of speech directly, so its computational load is small and the naturalness of the speech is high. Chinese syllables have distinct characteristics, the sound-change and tone-change rules are complex and varied, and coarticulation strongly affects the naturalness and intelligibility of the speech stream; the sound-unit waveform database of the invention therefore uses syllables, demisyllables, and phonetic transition segments as sound units.
The speech-oriented word segmentation of the invention differs from semantic segmentation in that it takes into account how humans understand what they hear. It handles the segmentation algorithm, ambiguous word strings, consecutive monosyllabic words, and phrase composition well, so that the segmentation result lays a good foundation for speech comprehension and for sound-change and tone-change processing.
The sound-change and tone-change processing of the invention takes full account of human speaking habits and determines the correct pronunciation of each syllable in the speech stream. The sound-unit waveform editing algorithm modifies sound-unit waveforms in software to obtain a continuous, natural speech stream.
The invention also provides a Chinese text-to-speech conversion system using the described method, characterized in that it consists of a general-purpose computer, a speech output board connected through a computer interface, and a loudspeaker; the speech output board consists of a digital-to-analog converter, a filter, a power amplifier, and the text-to-speech conversion program stored in firmware.
Brief description of the drawings:
Fig. 1 is a flow diagram of the prior-art Chinese text-to-speech conversion method.
Fig. 2 is a flow diagram of the Chinese text-to-speech conversion method of the invention.
Fig. 3 is a workflow diagram of the speech-oriented word segmentation method of the invention.
Fig. 4 is a flow diagram of the forward-scan maximum matching method of the invention.
Fig. 5 is a structural block diagram of the Chinese text-to-speech conversion system of the invention.
Fig. 6 is a block diagram of the speech output hardware used with the invention.
The invention provides a Chinese text-to-speech conversion system that uses the syllable-waveform-editing-based Chinese text-to-speech conversion method of the invention. Its system block diagram is shown in Figure 5. It comprises a general-purpose computer (51); a software program (52), written according to the method of the invention and stored on the computer's hard disk or in memory; and a speech output board (53) connected to the computer interface. In this embodiment, the segmentation dictionary that accompanies the speech-oriented segmentation method contains two-character, three-character, and four-character words, 60,000 words in total. A polyphone dictionary is also built, to determine the correct pronunciation of a polyphonic character in different words. For each polyphonic character the dictionary holds a sub-vocabulary that records a pronunciation number for each word; for example, the sub-vocabulary for the character 行 is:
行: 银 1, 走 2, 道 3
The actual pronunciations are 银行 ("bank", háng), 行走 ("walk", xíng), and 道行 (héng).
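A polyphone sub-vocabulary of this kind can be modelled as a nested mapping, as in the sketch below. The data layout, the names `POLYPHONE_DICT` and `reading_in_word`, and the use of whole words as keys are assumptions for illustration; only the three readings of 行 come from the example above.

```python
# Hypothetical layout: character -> numbered readings and word -> reading number.
POLYPHONE_DICT = {
    "行": {
        "readings": {1: "háng", 2: "xíng", 3: "héng"},
        "words": {"银行": 1, "行走": 2, "道行": 3},
    },
}


def reading_in_word(char, word, default=None):
    """Look up the reading of polyphonic `char` inside the segmented `word`."""
    entry = POLYPHONE_DICT.get(char)
    if entry is None:
        return default               # not a listed polyphonic character
    number = entry["words"].get(word)
    if number is None:
        return default               # word not in the sub-vocabulary
    return entry["readings"][number]
```

Because the lookup key is the segmented word, this step depends on segmentation having run first, which matches the order of processing described in the text.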
Sound-change and tone-change processing is as described above for the invention. Neutral-tone handling is illustrated as follows. If the current character is identical to the preceding character and is one of the characters glossed as "milk", "sister-in-law", "elder sister", "father", "mother", "elder brother", "younger brother", "grandfather", "baby", "see", "look", "jump", "hop", and so on (for example 爸爸, "father"), the current character is changed to the neutral tone. If the current character is not word-initial, does not form a reduplicated word, is one of a set of suffix and particle characters such as 子, and is word-final, it is read in the neutral tone. As another example, the special handling of 一 ("one") is as follows.
Handling of 一:
When 一 is the last character of a word, it is read in the first tone, as in "first" (yī).
When the character that follows 一 in the same word is read in the fourth (falling) tone, 一 is read in the second (rising) tone, as in 一个 (yí).
Otherwise it is read in the fourth (falling) tone, as in "one (yì) group".
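The three-way rule for 一 can be written as a small decision function. This sketch assumes tones are numbered 1 to 4 and that the caller already knows whether 一 is word-final and what tone follows it; the function name and parameters are illustrative.

```python
def yi_tone(is_word_final, next_tone=None):
    """Tone number for the character 一 under the three cases listed above."""
    if is_word_final:
        return 1          # yī, e.g. at the end of "first"
    if next_tone == 4:
        return 2          # yí before a fourth-tone syllable
    return 4              # yì in all other cases
```

The checks are ordered as in the text, so the word-final case takes priority over the tone of any following syllable.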
In this embodiment the input is a string of Chinese-character internal codes. After segmentation and sound-change processing, an index string into the sound-unit waveform database, carrying marks, is obtained. Each entry in the string gives the offset address of the pronunciation in the database. The marks here are mainly pause marks and an end mark. Pauses fall into three classes: pauses between characters, between words, and between sentences are given different marks and different pause durations. During speech output, the routine returns as soon as the end mark is encountered.
In this embodiment the sound units in the sound-unit waveform database are single syllables, including first syllables of two-character words, second syllables of two-character words, neutral-tone syllables, and some syllables with special pronunciations. The data of each syllable are stored in compressed form. All syllable data are merged into one file, and an index file is built that records the offset address of each pinyin code within the file.
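The merged data file and its offset index can be sketched as follows. This is a simplified illustration: the pinyin codes, the byte payloads, and the function names are hypothetical, an in-memory stream stands in for the file on disk, and compression is omitted because the source does not say which scheme is used.

```python
import io


def build_store(syllables):
    """Merge per-syllable byte strings into one blob plus an offset index."""
    blob = bytearray()
    index = {}
    for code, data in syllables.items():
        index[code] = (len(blob), len(data))   # (offset, length)
        blob.extend(data)
    return bytes(blob), index


def read_syllable(stream, index, code):
    """Fetch one syllable's data by seeking to its recorded offset."""
    offset, length = index[code]
    stream.seek(offset)
    return stream.read(length)


# Hypothetical pinyin codes and payloads, for demonstration only.
blob, index = build_store({"ma1": b"\x01\x02\x03", "ma3": b"\x04\x05"})
stream = io.BytesIO(blob)
```

Storing the length next to the offset is a choice made here so that a read knows where to stop; the source mentions only the offset address.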
After the corresponding syllable data have been taken from the database, their intensity is corrected with the "characteristic-continuity waveform correction method", and the speech is finally output.
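The intensity correction at a splice point can be sketched as below. Two assumptions are made and should be read as such: "average amplitude" is taken as the mean absolute sample value, and the relative difference E is normalised by the sum of the two averages. Indices are zero-based, with `n` the number of samples before the splice point; the function name is illustrative.

```python
def smooth_splice(samples, n, N):
    """Attenuate the louder side of a splice when the level jump is large."""
    if n - N < 0 or n + N > len(samples):
        raise ValueError("need N samples on each side of the splice point")
    out = list(samples)
    front = range(n - N, n)          # N samples before the splice point
    back = range(n, n + N)           # N samples after the splice point
    m_f = sum(abs(out[i]) for i in front) / N
    m_b = sum(abs(out[i]) for i in back) / N
    if m_f + m_b == 0:
        return out                   # silence on both sides, nothing to do
    e = (m_f - m_b) / (m_f + m_b)    # assumed normalisation by the sum
    if e > 1 / 3:
        target = front               # front is much louder: scale it down
    elif e < -1 / 3:
        target = back                # back is much louder: scale it down
    else:
        return out
    for i in target:
        out[i] *= 1 - abs(e)
    return out
```

Only the louder side is scaled, and only when the relative difference exceeds one third in magnitude, so splices between units of similar level pass through unchanged.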
The speech output hardware used with this software method comprises a D/A converter (61), a filter (62), a power amplifier (63), and a loudspeaker (64), as shown in Figure 6. Chinese characters, digits, and symbols typed on the computer keyboard, or text stored in the computer, can all serve as input to the Chinese text-to-speech conversion method. The sound-unit waveform database and the software program are stored on the computer's hard disk or in memory. The program is executed to process the character string; the speech output hardware then converts the speech data to an analog signal, which is filtered, power-amplified, and output to the loudspeaker. The speech output board is inserted into an expansion slot of the computer (65). A general-purpose sound card available on the market, such as the Sound Blaster or a compatible card, can also be used as the speech output board.

Claims (3)

1, a kind of Chinese written language-phonetics transfer method based on waveform compilation is characterized in that may further comprise the steps:
(1) at first Hanzi internal code or the pinyin string note cent morphology of importing is divided into speech or phrase string, between speech, inserts the symbol that pauses;
(2) said speech or phrase string generate the sound unit waveform index string of band characteristics of speech sounds mark again by the change of tune, modulation rule treatments;
(3) set up sound unit Wave data storehouse, in the first Wave data of said sound storehouse, take out the first waveform of corresponding sound according to the first waveform index of said sound string;
(4) the more first waveform of said sound is edited, promptly revised loudness of a sound, pitch, the duration of a sound of sound unit waveform, obtain speech-sound data sequence.
2, Chinese written language-phonetics transfer method as claimed in claim 1, it is characterized in that said voice divide morphology to comprise the very big matching method of forward scan, " rule of two and three " of monosyllabic word organizes morphology continuously, the sticking group of word morphology, the reverse major term coupling of ambiguity speech string is divided morphology, the very big matching method of said forward scan is to set up the participle dictionary, with the Hanzi internal code of said input or pinyin string backward word for word from beginning of the sentence, suppose cut-point, form terminology match in speech and the dictionary, " rule of two and three " group morphology of said continuous monosyllabic word, be meant when continuous monosyllabic word number surpasses four, press rule of two and three group speech, the sticking group of said word morphology is to set up the function word storehouse that indicates the sticking speech rule of function word, press and glues the speech rule in the function word storehouse with before the function word in the vocabulary string and its, after speech stick together, the reverse major term coupling of said ambiguity speech string divides morphology to be meant from last byte of ambiguity speech string, progressively add word coupling forward, find the end point of long word.
3. The Chinese character-to-speech conversion method as claimed in claim 1, characterized in that the said sound-change and tone-change processing comprises the following steps: 1) text replacement, in which the character string obtained after word segmentation is replaced with the pronunciations of its Chinese characters; 2) polyphone processing, in which a polyphone dictionary is used to mark the said character string with the correct pronunciations; 3) the established sound-change and tone-change rule base is used to generate, from the said character string, the sound-unit index string carrying pronunciation-feature marks.
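The three steps of claim 3 might look like the sketch below. The tables `DEFAULT_READING` and `POLYPHONE_WORDS`, the `PAUSE` symbol, and all function names are hypothetical. The claim does not list the contents of its rule base, so the only rule shown is the well-known Mandarin third-tone sandhi (a third tone before another third tone is read as a second tone), applied within a word, as a stand-in for whatever rules the patent actually uses.

```python
from typing import Dict, List

# Hypothetical default reading per character (pinyin with tone digit).
DEFAULT_READING: Dict[str, str] = {
    "你": "ni3", "好": "hao3", "银": "yin2", "行": "xing2",
}

# Hypothetical polyphone dictionary: word -> correct reading per character.
POLYPHONE_WORDS: Dict[str, List[str]] = {"银行": ["yin2", "hang2"]}

PAUSE = "|"  # hypothetical pause symbol between words


def to_pinyin(words: List[str]) -> List[List[str]]:
    """Steps 1 and 2: replace characters by readings, fixing polyphones."""
    result = []
    for word in words:
        if word in POLYPHONE_WORDS:
            result.append(list(POLYPHONE_WORDS[word]))
        else:
            result.append([DEFAULT_READING[ch] for ch in word])
    return result


def third_tone_sandhi(syllables: List[str]) -> List[str]:
    """Example tone-change rule: tone 3 before tone 3 becomes tone 2."""
    out = list(syllables)
    for i in range(len(out) - 1):
        if out[i].endswith("3") and out[i + 1].endswith("3"):
            out[i] = out[i][:-1] + "2"
    return out


def to_index_string(words: List[str]) -> List[str]:
    """Step 3: apply the rule per word and join words with pause symbols."""
    index: List[str] = []
    for n, syllables in enumerate(to_pinyin(words)):
        if n > 0:
            index.append(PAUSE)
        index.extend(third_tone_sandhi(syllables))
    return index
```

With these toy tables, `to_index_string(["你好", "银行"])` reads 行 as `hang2` through the polyphone dictionary and changes `ni3` to `ni2` through the sandhi rule.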
CN 94103372 1994-04-01 1994-04-01 Chinese character-phonetics transfer method and system edited based on waveform Expired - Fee Related CN1032391C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 94103372 CN1032391C (en) 1994-04-01 1994-04-01 Chinese character-phonetics transfer method and system edited based on waveform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 94103372 CN1032391C (en) 1994-04-01 1994-04-01 Chinese character-phonetics transfer method and system edited based on waveform

Publications (2)

Publication Number Publication Date
CN1099165A (en) 1995-02-22
CN1032391C (en) 1996-07-24

Family

ID=5031030

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 94103372 Expired - Fee Related CN1032391C (en) 1994-04-01 1994-04-01 Chinese character-phonetics transfer method and system edited based on waveform

Country Status (1)

Country Link
CN (1) CN1032391C (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7453369B2 (en) 2005-11-25 2008-11-18 Hong Fu Jin Precision Industry (Shenzhen) Co., Ltd. System and method for voice alarm in measuring a workpiece

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6885987B2 (en) 2001-02-09 2005-04-26 Fastmobile, Inc. Method and apparatus for encoding and decoding pause information
CN1320482C (en) * 2003-09-29 2007-06-06 摩托罗拉公司 Natural voice pause in identification text strings
CN100347741C (en) * 2005-09-02 2007-11-07 清华大学 Mobile speech synthesis method
CN101000764B (en) * 2006-12-18 2011-05-18 黑龙江大学 Speech synthetic text processing method based on rhythm structure
CN101226741B (en) * 2007-12-28 2011-06-15 无敌科技(西安)有限公司 Method for detecting movable voice endpoint
JP6520108B2 (en) * 2014-12-22 2019-05-29 カシオ計算機株式会社 Speech synthesizer, method and program
JP6784022B2 (en) * 2015-12-18 2020-11-11 ヤマハ株式会社 Speech synthesis method, speech synthesis control method, speech synthesis device, speech synthesis control device and program
CN107871495A (en) * 2016-09-27 2018-04-03 晨星半导体股份有限公司 Text-to-speech method and system
CN110781651A (en) * 2019-10-22 2020-02-11 合肥名阳信息技术有限公司 Method for inserting pause from text to voice

Also Published As

Publication number Publication date
CN1099165A (en) 1995-02-22

Similar Documents

Publication Publication Date Title
US8219398B2 (en) Computerized speech synthesizer for synthesizing speech from text
US9761219B2 (en) System and method for distributed text-to-speech synthesis and intelligibility
CN1169115C (en) Prosodic databases holding fundamental frequency templates for use in speech synthesis
CN1889170B (en) Method and system for generating synthesized speech based on recorded speech template
CN101064104A (en) Emotion voice creating method based on voice conversion
CN1032391C (en) Chinese character-phonetics transfer method and system edited based on waveform
CN1259631C (en) Chinese test to voice joint synthesis system and method using rhythm control
CN1811912A (en) Minor sound base phonetic synthesis method
CN1956057A (en) Voice time premeauring device and method based on decision tree
Demuynck et al. Automatic generation of phonetic transcriptions for large speech corpora.
CN1118493A (en) Language and speech converting system with synchronous fundamental tone waves
KR100373329B1 (en) Apparatus and method for text-to-speech conversion using phonetic environment and intervening pause duration
CN109859746B (en) TTS-based voice recognition corpus generation method and system
CN1787072A (en) Method for synthesizing pronunciation based on rhythm model and parameter selecting voice
CN1113330C (en) Phoneme regulating method for phoneme synthesis
Williams A Welsh speech database: preliminary results.
Ngugi et al. Swahili text-to-speech system
Ghone et al. TBT (Toolkit to Build TTS): A High Performance Framework to Build Multiple Language HTS Voice.
Javkin et al. A multilingual text-to-speech system
Iyanda et al. Development of a Yorúbà Text-to-Speech System Using Festival
KR0134707B1 (en) Voice synthesizer
CN1979636B (en) Method for converting phonetic symbol to speech
CN1547192A (en) An audio synthesis method
Akinwonmi et al. A prosodic text-to-speech system for yorùbá language
Nandwani et al. Speech Synthesis for Punjabi Language Using Festival

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee