CN103854648A

CN103854648A - Chinese and foreign language voiced image data bidirectional reversible voice converting and subtitle labeling method

Info

Publication number: CN103854648A
Application number: CN201210523616.8A
Authority: CN
Inventors: 苗玉水
Original assignee: Shanghai Nenggan Epc System Network Co Ltd
Current assignee: QINGHAI HANLA INFORMATION TECHNOLOGY CO., LTD.
Priority date: 2012-12-08
Filing date: 2012-12-08
Publication date: 2014-06-11

Abstract

The invention provides a bidirectional reversible speech conversion and subtitle annotation method for Chinese and foreign-language audiovisual data, and belongs to the technical field of sound and image data processing in computer systems. To convert Chinese audiovisual data into audiovisual data with foreign-language speech and subtitles, the speech signal is extracted and passed to a Chinese speech recognition module, which recognizes it as Chinese phonetic code or Chinese characters. A machine translation module translates the Chinese phonetic code or Chinese characters into a designated foreign language. The foreign-language text, alone or as side-by-side bilingual subtitles, is passed to a conventional video or image-frame subtitle overlay device, which superimposes the subtitle information on the video or image frames. A speech synthesis module synthesizes the translated text into the corresponding foreign-language speech, and the subtitled video or image frames are combined with the foreign-language speech and stored or output synchronously. Foreign-language audiovisual data can be converted into Chinese audiovisual data in the same way. The method makes it convenient for Chinese audiences to learn a foreign language from foreign audiovisual material and for foreign audiences to learn Chinese from Chinese audiovisual material.

Description

Bidirectional reversible speech conversion and subtitle annotation method for Chinese and foreign-language audiovisual data

Technical field

This technical solution belongs to the field of sound and image data processing in embedded and non-embedded computer systems. In the description below, embedded and non-embedded computer systems are referred to collectively as computer systems.

Background technology

At present, subtitles in Chinese characters, in a foreign language, or in both side by side are generally added to Chinese or foreign-language audiovisual material by hand: the spoken Chinese or foreign language is transcribed manually into text, and the text is passed to a video-picture or image-frame subtitle overlay device, which superimposes the subtitles on the video pictures or image frames. Where a translation is required, a translator renders the Chinese into the foreign language or the foreign language into Chinese, and voice actors then dub the speech onto the synchronized video pictures or image frames. Because a very large amount of Chinese and foreign-language audiovisual material exists around the world, including television recordings and films, manual conversion alone is very time-consuming. With the emergence of digital audiovisual technology, and in particular of computer systems for processing video data, there is a growing need for a technique that automatically converts Chinese or foreign-language speech into the other language and annotates it with subtitles. Such a technique should run not only on computer systems with a Chinese-character system but also on computer systems without one, that is, on the 128-character ASCII systems typical of Western countries such as the United States. This would meet the needs created by the increasingly widespread use of the Internet, the emergence of cloud computing and the Internet of Things, worldwide enthusiasm for learning Chinese, and increasingly frequent cultural exchange between China and the West.

Summary of the invention

This technical solution is proposed to solve the problems described above. Specifically, it solves them with the following bidirectional reversible speech conversion and subtitle annotation method for Chinese and foreign-language audiovisual data:

To convert Chinese audiovisual data into audiovisual data with foreign-language speech and subtitles, conventional computer software first marks the video pictures or image frames and the corresponding speech audio signal with synchronization marks. The synchronization marks may be produced by the existing method of giving the video signal and the audio signal identical timestamps; the same applies throughout. The speech audio signal carrying the synchronization marks is then extracted and passed to the Chinese speech recognition module in the computer, which recognizes the Chinese speech as Chinese phonetic code written with the 26 Latin letters and carrying the same synchronization marks as the recognized speech. The machine translation module then translates the Chinese phonetic code into sentences of the designated foreign language, written with the 26 Latin letters and carrying the same synchronization marks as the corresponding Chinese phonetic code. The Chinese phonetic code subtitles, the foreign-language subtitles, or the two side by side, all carrying synchronization marks, are passed to a conventional video-picture or image-frame subtitle overlay device, which superimposes the subtitle information on the video pictures or image frames according to the correspondence between the synchronization marks of the subtitles and those of the pictures or frames. At the same time, the speech synthesis module synthesizes the translated foreign-language text into foreign-language speech carrying the corresponding synchronization marks, and this speech is combined with the subtitled video pictures or image frames that carry the same synchronization marks and is stored or output synchronously.

To convert foreign-language audiovisual data into audiovisual data with Chinese speech and subtitles, conventional computer software first marks the video pictures or image frames and the corresponding speech audio signal with synchronization marks. The speech audio signal carrying the synchronization marks is then extracted and passed to the foreign-language speech recognition module in the computer, which recognizes the foreign-language speech as foreign-language text written with the 26 Latin letters and carrying the same synchronization marks as the recognized speech. The machine translation module then translates the foreign-language text into Chinese phonetic code sentences, written with the 26 Latin letters and carrying the same synchronization marks as the corresponding foreign-language sentences. The Chinese phonetic code subtitles, the foreign-language subtitles, or the two side by side, all carrying synchronization marks, are passed to a conventional video-picture or image-frame subtitle overlay device, which superimposes the subtitle information on the video pictures or image frames according to the correspondence between the synchronization marks of the subtitles and those of the pictures or frames. At the same time, the Chinese speech synthesis module synthesizes the translated Chinese phonetic code sentences into Chinese speech carrying the corresponding synchronization marks, and this speech is combined with the subtitled video pictures or image frames that carry the same synchronization marks and is stored or output synchronously.
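The flow common to both directions can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Segment` and `Output` types and the `convert` function are hypothetical names, and the recognition, translation, and synthesis modules are stood in for by simple lookups. The point it shows is that each segment's timestamp (the synchronization mark) travels unchanged from the extracted audio to the subtitle and the synthesized speech.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """One stretch of speech plus the timestamp it shares with the video."""
    start: float   # seconds; plays the role of the synchronization mark
    end: float
    audio: str     # placeholder for the extracted audio signal


@dataclass
class Output:
    start: float
    end: float
    subtitle: str      # text handed to the subtitle overlay device
    dubbed_audio: str  # placeholder for the synthesized speech


def convert(segments: List[Segment],
            recognize: Callable[[str], str],
            translate: Callable[[str], str],
            synthesize: Callable[[str], str],
            bilingual: bool = True) -> List[Output]:
    """Recognize, translate, and synthesize each segment, keeping its timestamp."""
    results = []
    for seg in segments:
        source_text = recognize(seg.audio)
        target_text = translate(source_text)
        subtitle = f"{source_text}\n{target_text}" if bilingual else target_text
        results.append(Output(seg.start, seg.end, subtitle, synthesize(target_text)))
    return results


# Hypothetical stand-ins for the three modules (assumptions, not real engines).
RECOGNIZED = {"clip-001": "wovmno mwvtisa xrvydu laadqawnv."}
TRANSLATED = {"wovmno mwvtisa xrvydu laadqawnv.": "We use Latin every day."}

outputs = convert(
    [Segment(12.0, 14.5, "clip-001")],
    recognize=RECOGNIZED.__getitem__,
    translate=TRANSLATED.__getitem__,
    synthesize=lambda text: f"tts({text})",
)
print(outputs[0])
```

Swapping in an English recognizer, an English-to-Chinese translator, and a Chinese synthesizer would give the reverse direction with the same function.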

The Chinese phonetic code described above takes the word as its unit, with a single Chinese character treated as a one-syllable word. For each syllable of a word, the pinyin given in the "Scheme for the Chinese Phonetic Alphabet" is taken, and its initial, medial, final, and tone are each encoded using only the 26 Latin letters. The codes are then spelled in the order "initial code + medial code + final code + tone code, which doubles as the syllable separator", and the resulting Chinese phonetic code expresses the Chinese information directly. When Chinese information is expressed directly in Chinese phonetic code, punctuation is used as in English; the syllables of one word are encoded consecutively without spaces, and words are separated from one another by spaces.

Because this technical solution expresses Chinese information in Chinese phonetic code written with the 26 Latin letters, and because punctuation is used as in English, Chinese information expressed in this way, punctuation included, consists entirely of ASCII characters and is 100% compatible with ASCII. The Chinese speech recognition module, machine translation module, and speech synthesis module described above therefore process Chinese information that is fully consistent with ASCII, so these modules can run on computers with an ASCII character system. Since every module of the system can run on such computers, so can the system as a whole.
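The compatibility claim can be checked mechanically. The short Python sketch below (the function name `is_ascii_compatible` is an assumption made for illustration) confirms that the example sentence in Chinese phonetic code encodes as 7-bit ASCII, whereas the same sentence in pinyin with tone marks does not.

```python
def is_ascii_compatible(text: str) -> bool:
    """True if every character falls in the 128-character ASCII range."""
    return text.isascii()


coded = "wovmno mwvtisa xrvydu laadqawnv."   # Chinese phonetic code from the example
pinyin = "wǒmen měitiān shǐyòng"             # pinyin with tone marks, for contrast

print(is_ascii_compatible(coded))    # the coded sentence uses only letters, spaces, and "."
print(is_ascii_compatible(pinyin))   # tone-marked vowels lie outside ASCII

# Encoding raises UnicodeEncodeError only for text that is not ASCII-compatible.
payload = coded.encode("ascii")
```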

With this technical solution, Chinese information can be transmitted and processed without obstruction both in computer information systems that use a Chinese-character internal code and in ASCII systems that do not. Given the increasingly widespread use of the Internet, the emergence of cloud computing and the Internet of Things, and worldwide enthusiasm for learning Chinese, this greatly facilitates the mutual viewing and exchange of audiovisual material between China and the other countries of the world, represented by the English-speaking countries. In particular, it makes it convenient for Chinese audiences to learn foreign languages from foreign television material and for foreign audiences to learn Chinese from Chinese television material, so that Chinese can spread more widely around the world and exchange between Chinese culture and world culture is promoted.

In addition to being output as Chinese phonetic code, the Chinese information can, when needed, be converted into Chinese characters by a Chinese character conversion module on a computer with a Chinese-character system. Chinese phonetic code or Chinese characters can be displayed, stored, and output alone, or the Chinese phonetic code can be shown side by side with Chinese characters, pinyin, and foreign-language text of the same meaning.

Embodiment

The specific embodiments of the invention are further described below with reference to examples.

(1) The initial, final, and tone of each syllable in the Chinese phonetic code used in this technical solution are encoded as follows:

Note: the symbols in brackets are pinyin symbols from the "Scheme for the Chinese Phonetic Alphabet", referred to below simply as pinyin symbols. The letters outside brackets are the code symbols this solution uses for the initial, final, and tone of each syllable. The correspondence tables below are referred to as the code table.

1. The initial code uses letters that are largely the same as the initials of the Scheme for the Chinese Phonetic Alphabet, for example in the following form:

b：（b） p：（p） m：（m） f：（f） d：（d） t：（t）

n：（n） l：（l） g：（g） k：（k） h：（h）

j：（zh），（j） q：（ch），（q） x：（sh），（x） r：（r）

z：（z） c：（c） s：（s） y：（y） w：（w）

2. The pinyin medial (ü) is represented by one of the 26 Latin letters, for example in the following medial code:

i：（i） u：（u） y：（ü）

3. In the final code, the simple vowel (ü) is represented by one of the 26 Latin letters, and the other simple vowels use the same letters as pinyin. The compound finals of pinyin may be encoded with any consonant letters, for example as follows:

a：（a） o：（o） e：（e） i：（i） u：（u） y：（ü）

k：（ao） c：（ai） s：（an） x：（ou） w：（ei） n：（en）

z：（ua） l：（uo） b：（ang） d：（ong） p：（eng）

q：（ing） g：（ng） er：（er）

r：（i）[used only in combination with (zh), (ch), (sh)]

4. In the tone code, the consonant letter v, which Chinese pinyin does not otherwise use, represents the third tone (∨); the other tones are represented by vowel letters, for example as follows:

a: (-) first tone (high level)   e: (/) second tone (rising)   v: (∨) third tone   u: (\) fourth tone (falling)

o: (unmarked) neutral tone
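As an illustration of how the code table is applied, the following Python sketch encodes one syllable from its pinyin parts. The function name `encode_syllable` and the decision to pass the initial, medial, final, and tone as separate arguments are assumptions made for the example; splitting a pinyin syllable into those parts is left to the caller, and the tables contain only the entries listed above.

```python
# Code tables transcribed from the lists above (pinyin symbol -> code letter).
INITIALS = {p: p for p in "b p m f d t n l g k h j q x r z c s y w".split()}
INITIALS.update({"zh": "j", "ch": "q", "sh": "x"})

MEDIALS = {"i": "i", "u": "u", "ü": "y"}

FINALS = {
    "a": "a", "o": "o", "e": "e", "i": "i", "u": "u", "ü": "y",
    "ao": "k", "ai": "c", "an": "s", "ou": "x", "ei": "w", "en": "n",
    "ua": "z", "uo": "l", "ang": "b", "ong": "d", "eng": "p",
    "ing": "q", "ng": "g", "er": "er",
}

# Tone number -> tone letter; 0 stands for the neutral tone.
TONES = {1: "a", 2: "e", 3: "v", 4: "u", 0: "o"}

RETROFLEX_INITIALS = {"zh", "ch", "sh"}


def encode_syllable(initial: str, medial: str, final: str, tone: int) -> str:
    """Spell one syllable as initial code + medial code + final code + tone code."""
    if final == "i" and initial in RETROFLEX_INITIALS:
        final_code = "r"   # the special code for (i) after zh, ch, sh
    else:
        final_code = FINALS[final]
    initial_code = INITIALS[initial] if initial else ""
    medial_code = MEDIALS[medial] if medial else ""
    return initial_code + medial_code + final_code + TONES[tone]


print(encode_syllable("t", "i", "an", 1))   # tiān
print(encode_syllable("sh", "", "i", 3))    # shǐ
```

A syllable whose parts are missing from the tables raises `KeyError`, which makes gaps in the table visible rather than silently guessing a code.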

(2) Chinese information is represented in the Chinese phonetic code defined above as follows:

The word is the unit, and a single Chinese character is treated as a one-syllable word. Each syllable of the word is encoded from its pinyin in the "Scheme for the Chinese Phonetic Alphabet" in the order "initial code + medial code + final code + tone code, which doubles as the syllable separator". The syllables of one word are written together without spaces, and words are separated by spaces. When Chinese information is represented in Chinese phonetic code, the six kinds of stops, the seven kinds of marks, and the hyphen take the same form as in English.

Because a Chinese character used on its own is treated as a one-syllable word, the method of encoding a Chinese character in this invention is the same as the method of encoding a single Chinese syllable, and the code of a word is obtained by writing the codes of its syllables together. A group of several words is called a phrase. Phrases are encoded in the same way as Chinese sentences: since words make up both phrases and sentences, the codes of phrases and sentences are built from the codes of words, and no separate code is needed for them. When a whole sentence or text represents Chinese information word by word, the reader generally does not need to choose among homophones; in principle, a sentence that is unambiguous when heard is also unambiguous when written in code.

The implementation steps of this technical solution are described below, taking the speech of one Chinese sentence and the speech of one English sentence as examples.

Step one. Conventional computer software first marks the video pictures or image frames and the corresponding speech audio signal with synchronization marks. The speech audio signal carrying the synchronization marks is then extracted and passed to the Chinese speech recognition module in the computer, which recognizes the Mandarin speech as Chinese phonetic code written with the 26 Latin letters and carrying the same synchronization marks as the recognized speech. In other words, the Chinese speech recognition module first converts the Chinese speech into Chinese phonetic code. The synchronization marks may be the existing timestamps used to keep audio and video synchronized.

When the Chinese phonetic code speech recognition module performs Chinese speech recognition, it uses the Chinese syllable as the recognition unit. It looks up the Chinese syllable sound templates and the table of Chinese syllable codes stored in advance in the computer system and, after matching, identifies the corresponding syllable code. Continuous speech input thus yields a continuous string of syllable codes. This string is segmented into words by looking it up in a word dictionary. Where several segmentations are possible, the choice is made using means such as Chinese lexical and syntactic context and statistical regularities. In the segmented result, the syllables of one word are written together and words are separated by spaces.

The following example shows Chinese speech being recognized as Chinese phonetic code by the method of the invention:

1. Converting Chinese speech into Chinese phonetic code:

Suppose we extract from the audiovisual data the Chinese speech of the following Chinese sentence:

"We use Latin every day." (in pinyin: wǒmen měitiān shǐyòng lādīngwěn)

(1) By looking up the Chinese syllable sound templates and the table of Chinese syllable codes stored in advance in the computer system, the corresponding string of syllable codes is identified after matching:

wov mno mwv tisa xrv ydu laa dqa wnv. (with spaces between syllables)

or wovmnomwvtisaxrvydulaadqawnv. (without spaces between syllables)

(Once the reader is practiced, the neutral-tone symbol o, as in mno, may be omitted where this does not cause syllables to be confused; the same applies throughout.)

To make this easy to see, the letters representing tones are underlined here. The tone letter in the phonetic code also serves to separate syllables. In actual phonetic code the tone letters are not underlined, and a reader practiced in Chinese phonetic code can readily pick out the symbols that mark the tone and separate the syllables.
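Because every syllable ends in one of the five tone letters, an unspaced code string can be cut back into syllables. The Python sketch below does this with a regular expression built from the code table; the pattern and the function name `split_syllables` are assumptions made for illustration. It takes the first reading the regular expression finds and makes no claim that every code string has only one possible reading.

```python
import re
from typing import List

# One syllable: optional initial, optional medial, final, then a tone letter.
SYLLABLE = re.compile(
    r"[bpmfdtnlgkhjqxrzcsyw]?"          # initial code (optional)
    r"[iuy]?"                           # medial code (optional)
    r"(?:er|[aoeiuykcsxwnzlbdpqgr])"    # final code
    r"[aevuo]"                          # tone code, which closes the syllable
)


def split_syllables(code: str) -> List[str]:
    """Cut an unspaced syllable-code string into syllable codes."""
    syllables, pos = [], 0
    while pos < len(code):
        match = SYLLABLE.match(code, pos)
        if match is None:
            raise ValueError(f"no syllable code found at position {pos}")
        syllables.append(match.group())
        pos = match.end()
    return syllables


print(split_syllables("wovmnomwvtisaxrvydulaadqawnv"))
```

The function expects letters only, so sentence punctuation should be stripped before it is called.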

This completes a pure speech recognition process in which the complexity of the system does not depend on the size of its dictionary.

(2) The phonetic code string is segmented into words, completing the conversion into phonetic code with the word as the unit.

By looking up the Chinese phonetic code word dictionary, stored in advance in the computer system with words already segmented, the syllables of each word are written together and words are separated by spaces, which gives the Chinese phonetic code finally required:

wovmno mwvtisa xrvydu laadqawnv.
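The dictionary lookup in step (2) can be sketched in Python as follows. The sketch uses forward maximum matching, a simple strategy chosen here for illustration; the text above states only that a word dictionary is consulted and that ambiguous cases are resolved with context and statistics, which this sketch does not attempt. The `LEXICON` contents and the three-syllable limit are assumptions.

```python
from typing import List, Set

# Hypothetical word dictionary: words written as joined syllable codes.
LEXICON: Set[str] = {"wovmno", "mwvtisa", "xrvydu", "laadqawnv"}
MAX_WORD_SYLLABLES = 3   # longest word in this toy dictionary


def segment(syllables: List[str],
            lexicon: Set[str] = LEXICON,
            max_len: int = MAX_WORD_SYLLABLES) -> List[str]:
    """Group syllable codes into words, preferring the longest dictionary match."""
    words, i = [], 0
    while i < len(syllables):
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = "".join(syllables[i:i + n])
            # A single syllable is always accepted so that unknown text still passes through.
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


syllable_codes = "wov mno mwv tisa xrv ydu laa dqa wnv".split()
print(" ".join(segment(syllable_codes)) + ".")
```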

Step two. The bidirectional conversion module between Chinese phonetic code and foreign languages is called, and the Chinese information expressed in Chinese phonetic code is converted into a foreign language, chiefly English.

(Note: Chinese characters shown alongside Chinese phonetic code, above and below, are there only to make the meaning of the code easier to follow; they do not appear when the method runs on a pure ASCII system. The same applies throughout.)

For example, for the Chinese information in Chinese phonetic code obtained above, wovmno mwvtisa xrvydu laadqawnv., calling the bidirectional translation module between Chinese phonetic code and foreign languages, chiefly English, produces the following translation steps and finally the English sentence corresponding to the Chinese phonetic code:

1. wovmno mwvtisa xrvydu laadqawnv. (Chinese information expressed in Chinese phonetic code)

We use Latin every day. (the same Chinese information as expressed in Chinese characters, given here in English)

A) Look up the Chinese dictionary stored in advance in the computer system, in which each word is marked with its part of speech, and build the part-of-speech string of the sentence (the text in brackets is the part of speech; the same applies below):

wovmno (personal pronoun 1) + mwvtisa (time noun 1) + xrvydu (verb 1) + laadqawnv (noun 2).

we (personal pronoun 1) + every day (time noun 1) + use (verb 1) + Latin (noun 2).

B) Using the part-of-speech string obtained above, look up the table stored in advance in the computer system to obtain the stored Chinese sentence pattern:

(A sentence pattern is the string of each word's part of speech together with the sentence component that the word serves as; the same applies below.)

wovmno (personal pronoun 1 as subject) + mwvtisa (time noun 1 as time adverbial) + xrvydu (verb 1 as predicate) + laadqawnv (noun 2 as object)

we (personal pronoun 1 as subject) + every day (time noun 1 as time adverbial) + use (verb 1 as predicate) + Latin (noun 2 as object)

C) Using the Chinese sentence pattern obtained above, look up the table to obtain the corresponding stored English sentence pattern:

wovmno (personal pronoun 1 as subject) + xrvydu (verb 1 as predicate) + laadqawnv (noun 2 as object) + mwvtisa (time noun 1 as time adverbial)

we (personal pronoun 1 as subject) + use (verb 1 as predicate) + Latin (noun 2 as object) + every day (time noun 1 as time adverbial)

At this point, looking up the Chinese-English dictionary stored in advance in the computer system to convert the meaning of each word or phrase, and outputting the results in the order of this sentence pattern, already completes the translation from Chinese into English. To show that this machine translation process is reversible in both directions, the conversion is carried further below:

D) Using the English sentence pattern obtained above, look up the table to obtain the stored part-of-speech string that matches the parts of speech of the corresponding English words or phrases (this part-of-speech string can also be extracted from the target-language sentence pattern already obtained; the same applies below):

wovmno (personal pronoun 1) + xrvydu (verb 1) + laadqawnv (noun 2) + mwvtisa (time noun 1).

we (personal pronoun 1) + use (verb 1) + Latin (noun 2) + every day (time noun 1).

E) Look up the Chinese-English dictionary stored in advance in the computer system to convert the meaning of each word or phrase, and output the results in the order of the English sentence pattern obtained above:

we (personal pronoun 1) use (verb 1) latin (noun 2) every day (time noun 1).

we use latin every day.
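Steps A) to E) amount to a chain of table lookups, which the following Python sketch reproduces for the example sentence. All four tables are hypothetical miniatures holding just the entries the example needs, and the function name `translate_zh_en` is an assumption; a real system would need far larger dictionaries and pattern tables.

```python
# A) word -> part of speech (hypothetical miniature dictionary)
ZH_POS = {
    "wovmno": "personal pronoun",
    "mwvtisa": "time noun",
    "xrvydu": "verb",
    "laadqawnv": "noun",
}

# B) part-of-speech string -> Chinese sentence pattern (the role of each word)
ZH_PATTERNS = {
    ("personal pronoun", "time noun", "verb", "noun"):
        ("subject", "time adverbial", "predicate", "object"),
}

# C) Chinese sentence pattern -> order of the same roles in English
EN_ORDER = {
    ("subject", "time adverbial", "predicate", "object"):
        ("subject", "predicate", "object", "time adverbial"),
}

# E) Chinese-English dictionary
ZH_EN = {"wovmno": "we", "mwvtisa": "every day", "xrvydu": "use", "laadqawnv": "latin"}


def translate_zh_en(sentence: str) -> str:
    """Translate one coded Chinese sentence into English by table lookup."""
    words = sentence.rstrip(" .").split()
    pos_string = tuple(ZH_POS[w] for w in words)      # step A
    roles = ZH_PATTERNS[pos_string]                    # step B
    word_by_role = dict(zip(roles, words))
    english_roles = EN_ORDER[roles]                    # steps C and D
    return " ".join(ZH_EN[word_by_role[r]] for r in english_roles) + "."  # step E


print(translate_zh_en("wovmno mwvtisa xrvydu laadqawnv."))
```

A sentence whose words or part-of-speech string are missing from the tables raises `KeyError`, which marks where the tables would have to be extended.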

This completes the translation from Chinese into English. The Chinese phonetic code subtitles, the foreign-language subtitles, or the two side by side, all carrying the synchronization marks obtained above, are then passed to a conventional video-picture or image-frame subtitle overlay device, which superimposes the subtitle information on the video pictures or image frames according to the correspondence between the synchronization marks of the subtitles and those of the pictures or frames.

At the same time, the speech synthesis module synthesizes the translated foreign-language text carrying synchronization marks, "We use latin every day.", into foreign-language speech carrying the corresponding synchronization marks, and this speech is combined with the subtitled video pictures or image frames that carry the same synchronization marks and is stored or output synchronously. In this way, the method converts Chinese audiovisual data into audiovisual data with English speech and subtitles. The same process and result can be achieved for other foreign languages by the same method, which is not described for each language here.

In the same way, we can convert English speech into Chinese speech with Chinese phonetic code subtitles. For example, suppose we extract from the audiovisual data the speech of the following English sentence:

We use latin every day.

The English speech recognition module is called first, and yields the English sentence above:

We use latin every day.

The Chinese-English bidirectional translation module is then called, giving the following translation steps and result:

1. "We use latin every day." (the English sentence obtained by English speech recognition)

C) Look up the English dictionary stored in advance in the computer system, in which each word or phrase is marked with its part of speech, and build the part-of-speech string of the words or phrases:

we (personal pronoun 1) + use (verb 1) + latin (noun 1) + every day (time noun 2).

D) Using the part-of-speech string obtained above, look up the table to obtain the stored English sentence pattern:

we (personal pronoun 1 as subject) + use (verb 1 as predicate) + latin (noun 1 as object) + every day (time noun 2 as time adverbial)

E) Using the English sentence pattern obtained above, look up the table to obtain the corresponding stored Chinese sentence pattern:

we (personal pronoun 1 as subject) + every day (time noun 2 as time adverbial) + use (verb 1 as predicate) + latin (noun 1 as object)

At this point, looking up the bidirectional Chinese-English and English-Chinese dictionary stored in advance in the computer system to convert the meaning of each word or phrase, and outputting the results in the order of this sentence pattern, already completes the translation from English into Chinese. To show that this machine translation process is reversible in both directions, the conversion is carried further below:

F) Using the Chinese sentence pattern obtained above, look up the table to obtain the stored part-of-speech string that matches the parts of speech of the corresponding Chinese words or phrases:

we (personal pronoun 1) + every day (time noun 2) + use (verb 1) + latin (noun 1)

G) Look up the bidirectional Chinese-English and English-Chinese dictionary stored in advance in the computer system to convert the meaning of each word or phrase, and output the results in the order of the Chinese sentence pattern obtained above:

wovmno (personal pronoun 1) + mwvtisa (time noun 2) + xrvydu (verb 1) + laadqawnv (noun 1).

Finally we obtain:

wovmno mwvtisa xrvydu laadqawnv. (Chinese information expressed in Chinese phonetic code)

Thus, by reversing the Chinese-to-English translation process, we recover exactly the Chinese sentence that was originally given to the system for translation into English, which shows that this machine translation method is reversible in both directions. Complex sentences can be translated in both directions by the same method, which is not described again here.
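The reversibility claim can be illustrated with a round trip in Python. This is a simplified sketch under stated assumptions: a single hypothetical table, `LEXICON`, pairs each Chinese phonetic code word with its English equivalent and its sentence role, standing in for the separate dictionary, part-of-speech, and sentence-pattern lookups, and the two word orders are fixed to the one pattern used in the example.

```python
# (Chinese phonetic code, English, role); the role stands in for the
# part-of-speech and sentence-pattern lookups (an assumption of this sketch).
LEXICON = [
    ("wovmno", "we", "subject"),
    ("mwvtisa", "every day", "time adverbial"),
    ("xrvydu", "use", "predicate"),
    ("laadqawnv", "latin", "object"),
]
ZH_ORDER = ("subject", "time adverbial", "predicate", "object")
EN_ORDER = ("subject", "predicate", "object", "time adverbial")


def _convert(sentence: str, src: int, dst: int, dst_order: tuple) -> str:
    """Map each source word or phrase to its role, then emit in target order."""
    remaining = sentence.rstrip(" .").lower()
    by_role = {}
    while remaining:
        # Longest entry matching the start of the text, so phrases such as
        # "every day" are taken whole; an unknown word raises ValueError.
        entry = max(
            (e for e in LEXICON
             if remaining == e[src] or remaining.startswith(e[src] + " ")),
            key=lambda e: len(e[src]),
        )
        by_role[entry[2]] = entry[dst]
        remaining = remaining[len(entry[src]):].lstrip()
    return " ".join(by_role[role] for role in dst_order) + "."


def zh_to_en(sentence: str) -> str:
    return _convert(sentence, 0, 1, EN_ORDER)


def en_to_zh(sentence: str) -> str:
    return _convert(sentence, 1, 0, ZH_ORDER)


original = "wovmno mwvtisa xrvydu laadqawnv."
print(en_to_zh(zh_to_en(original)) == original)
```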

This completes the translation from English into Chinese. The Chinese phonetic code subtitles, the foreign-language subtitles, or the two side by side, all carrying the synchronization marks obtained above, are then passed to a conventional video-picture or image-frame subtitle overlay device, which superimposes the subtitle information on the video pictures or image frames according to the correspondence between the synchronization marks of the subtitles and those of the pictures or frames.

Step three. Finally, the computer calls the Chinese speech synthesis module to convert the Chinese phonetic code above into Chinese speech and, when needed, outputs along with it the Chinese sentence in Chinese phonetic code obtained by translation. This completes the computer's conversion of English speech into Chinese speech and subtitles. The Chinese speech synthesis module converts Chinese phonetic code into Chinese speech by the following steps:

Take again the following sentence in Chinese phonetic code as the example:

wovmno mwvtisa xrvydu laadqawnv.

Its meaning, expressed in Chinese characters, is:

"We use Latin every day."

When Chinese speech is synthesized from Chinese information expressed in Chinese phonetic code, one of the following three methods can generally be used as required:

1. Speech synthesis by looking up the table that maps Chinese phonetic code syllables to syllable speech files:

Looking up the table of Chinese phonetic codes and syllable speech files stored in advance in the computer system gives the sound file of the Chinese speech corresponding to each phonetic code. (For ease of description, each sound file is written here as "pinyin of the corresponding syllable.wav". In practice no pinyin symbols are involved; the file is simply a sound file of the corresponding syllable's Chinese speech, stored in advance in the computer system and playable by some sound-playing software.)

wov (wǒ.wav) mno (men.wav) mwv (měi.wav) tisa (tiān.wav) xrv (shǐ.wav) ydu (yòng.wav) laa (lā.wav) dqa (dīng.wav) wnv (wěn.wav).

The sound files found for the syllables are played in order by the sound-playing software, with a longer pause between words than between the syllables of one word. The result sounds closer to reading word by word and better matches how people are used to hearing speech.
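The first method can be sketched in Python as a lookup that yields a playlist of sound files with a pause after each. The file names, the pause lengths, and the function name `build_playlist` are assumptions made for illustration; actual playback is left to whatever sound-playing software is available.

```python
from typing import Dict, List, Tuple

# Hypothetical table: syllable code -> stored sound file.
SYLLABLE_FILES: Dict[str, str] = {
    code: f"{code}.wav"
    for code in ["wov", "mno", "mwv", "tisa", "xrv", "ydu", "laa", "dqa", "wnv"]
}

SYLLABLE_GAP_MS = 40    # assumed pause between syllables of one word
WORD_GAP_MS = 160       # assumed, longer pause between words


def build_playlist(words: List[List[str]]) -> List[Tuple[str, int]]:
    """Return (sound file, pause after it in ms) for every syllable in order."""
    playlist = []
    for word in words:
        for k, syllable in enumerate(word):
            is_last_in_word = k == len(word) - 1
            gap = WORD_GAP_MS if is_last_in_word else SYLLABLE_GAP_MS
            playlist.append((SYLLABLE_FILES[syllable], gap))
    return playlist


sentence = [["wov", "mno"], ["mwv", "tisa"], ["xrv", "ydu"], ["laa", "dqa", "wnv"]]
for sound_file, pause in build_playlist(sentence):
    print(sound_file, pause)
```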

2. Speech synthesis by looking up the table that maps whole-word Chinese phonetic codes to word speech files:

Looking up the table of whole-word Chinese phonetic codes and word speech files stored in advance in the computer system gives the stored sound file, in units of words, of the Chinese speech corresponding to each whole-word phonetic code. (For ease of description, each word-level sound file is written here as "pinyin of the corresponding word.wav". In practice no pinyin symbols are involved; the file is simply a sound file of the corresponding word's Chinese speech, stored in advance in the computer system and playable by some sound-playing software.)

wovmno (wǒmen.wav) mwvtisa (měitiān.wav) xrvydu (shǐyòng.wav) laadqawnv (lādīngwěn.wav).

The word-level sound files found are played in order by the sound-playing software, with a longer pause between words than between the syllables of one word. The result sounds closer to reading word by word and better matches how people are used to hearing speech.

3. Speech synthesis by looking up the Chinese phonetic code string in the maximum-matching passage-to-Chinese-speech synthesis file lookup table:

This method uses maximum matching: the corresponding Chinese speech is output by looking up the Chinese phonetic code string, in units of the longest passage, in the passage-to-Chinese-speech synthesis file lookup table stored in advance in the computer system. For example, if the lookup finds that the longest passages stored in advance in the computer system are "wovmno mwvtisa xrvydu" (wǒmen měitiān shǐyòng, "we use ... every day") and "hsuyyv laadqawnv" (hànyǔ lādīngwěn), then Chinese speech synthesis proceeds as follows:

wovmno mwvtisa xrvydu (wǒmen měitiān shǐyòng.wav) hsuyyv laadqawnv (hànyǔ lādīngwěn.wav).

(For convenience of description, each passage-level sound file above is written as "pinyin of the corresponding passage.wav". In practice the file carries no pinyin symbols; it is simply a sound file, stored in advance in the computer system, that represents the Chinese speech of the corresponding passage and can be played by sound playback software.)
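The maximum-matching lookup of method 3 can be illustrated with a greedy forward longest-match over word codes. This is a sketch under stated assumptions: the table `PASSAGE_FILES` holds only the two passages from the example, `match_passages` is a hypothetical helper, and the text does not say whether the actual method matches forward, backward, or in some other way.

```python
# Hypothetical passage table: space-separated code string -> pre-stored file.
PASSAGE_FILES = {
    "wovmno mwvtisa xrvydu": "wǒmen měitiān shǐyòng.wav",
    "hsuyyv laadqawnv": "hànyǔ lādīngwěn.wav",
}
# Longest stored passage, measured in words; bounds the search window.
MAX_WORDS = max(len(key.split()) for key in PASSAGE_FILES)


def match_passages(code_text):
    """Greedy forward maximum matching over whitespace-separated word codes."""
    words = code_text.split()
    files, pos = [], 0
    while pos < len(words):
        # Try the longest candidate first, then shrink it one word at a time.
        for size in range(min(MAX_WORDS, len(words) - pos), 0, -1):
            candidate = " ".join(words[pos:pos + size])
            if candidate in PASSAGE_FILES:
                files.append(PASSAGE_FILES[candidate])
                pos += size
                break
        else:
            # No stored passage covers this position (assumed error handling).
            raise KeyError(f"no stored passage starts at word {words[pos]!r}")
    return files


result = match_passages("wovmno mwvtisa xrvydu hsuyyv laadqawnv")
```

In a fuller system the fallback for an unmatched position would presumably be the word-level or syllable-level table of methods 2 and 1 rather than an error.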

Of the three speech synthesis methods above, the first requires the least storage space for voice files in the computer system, and the third requires the most.

Sometimes, for convenience of proofreading, the punctuation marks and the line-break hyphen in the Chinese phonetic codes need to be read aloud, which requires speech synthesis for them as well. To keep the Chinese information expressed by the Chinese phonetic codes 100% compatible with ASCII, it is specially provided here that the punctuation marks and the line-break hyphen in the Chinese phonetic codes are identical to the English punctuation marks and hyphen respectively. In actual speech synthesis it is enough to retrieve the sound files of the corresponding punctuation marks and line-break hyphen stored in advance in the computer system and play them with sound playback software, for example:

Six stop marks: full stop "." (jùhào.wav), question mark "?" (wènhào.wav), exclamation mark "!" (gǎntànhào.wav), comma "," (dòuhào.wav), colon ":" (màohào.wav), semicolon ";" (fēnhào.wav).

Seven marking symbols: quotation marks " " (yǐnhào.wav), brackets () (kuòhào.wav), dash "-" (pòzhéhào.wav), ellipsis ... (shěnglüèhào.wav), emphasis mark . (zhuózhònghào.wav), title marks (()) (shūmínghào.wav), separation dot . (jiàngéhào.wav).

Line-break hyphen: "-" (yíhánghào.wav).

Listed above are the six stop marks, seven marking symbols and the line-break hyphen of the invention, which are identical to those of English. The ".wav" file in brackets is the speech synthesis file that pronounces the corresponding punctuation mark or line-break hyphen. When this file is a Chinese speech synthesis file, the punctuation mark or line-break hyphen is read aloud by its Chinese name.
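Reading punctuation aloud for proofreading amounts to one more table lookup, which might look like the sketch below. The table `PUNCTUATION_FILES` covers only the six stop marks, the helper `proofreading_sequence` and the small `word_files` table are hypothetical, and the sketch assumes punctuation appears only at the end of a whitespace-separated token.

```python
# Stop marks -> sound file names as listed in the text (names are notational).
PUNCTUATION_FILES = {
    ".": "jùhào.wav", "?": "wènhào.wav", "!": "gǎntànhào.wav",
    ",": "dòuhào.wav", ":": "màohào.wav", ";": "fēnhào.wav",
}


def proofreading_sequence(code_text, word_files):
    """List the sound files to play, including trailing punctuation marks."""
    marks = "".join(PUNCTUATION_FILES)
    sequence = []
    for token in code_text.split():
        word = token.rstrip(marks)        # word code without trailing marks
        trailing = token[len(word):]      # the marks that were stripped
        if word:
            sequence.append(word_files[word])
        sequence.extend(PUNCTUATION_FILES[mark] for mark in trailing)
    return sequence


# Hypothetical word-level table for the usage example.
word_files = {"xrvydu": "shǐyòng.wav", "laadqawnv": "lādīngwěn.wav"}
spoken = proofreading_sequence("xrvydu laadqawnv.", word_files)
```

Because the marks are plain ASCII characters, the same code text remains readable by ordinary English-oriented software, which is the compatibility point made above.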

Finally, the Chinese speech synthesis module synthesizes the corresponding Chinese speech carrying synchronization signal marks, and this speech is combined with the subtitled video pictures or image frames carrying the same synchronization signal marks and is stored or output synchronously. In this way the above method converts English-speech image data into image data with Chinese speech and annotated subtitles. The same method achieves the same process and result for other foreign languages, which are not described one by one here.

Four. Further, after the Chinese phonetic codes are obtained, they can be converted to Chinese characters when needed by the Chinese phonetic code to Chinese character conversion module; in that case the whole system must run on a computer with a Chinese character system. The Chinese phonetic codes or the Chinese characters can be displayed, stored and output on their own, or side by side with the Chinese phonetic codes, Chinese characters, Chinese pinyin and the foreign language of the same meaning.

The computer converts Chinese phonetic codes to Chinese characters by calling the bidirectional Chinese phonetic code and Chinese character conversion module, through the following steps:

Chinese phonetic codes can easily be converted to Chinese characters and Chinese pinyin by looking up the word-level tables that map Chinese phonetic codes to Chinese characters and to Chinese pinyin, for example:

wovmno yields wǒmen by looking up the table that maps Chinese phonetic code syllables (initial code, medial code, final code and tone code) to Chinese pinyin, or a table of syllables or words against pinyin syllables or words generated from that table; the word-level Chinese characters are then found from wǒmen. Once the word-level phonetic code has been linked to the word-level Chinese characters through the word-level pinyin, the pinyin step is no longer needed: the word-level phonetic code can be mapped directly to the word-level Chinese characters and converted accordingly. For example, wovmno can be converted to wǒmen, and wǒmen to the Chinese characters for "we" (我们), so wovmno and "we" are directly linked. When needed, bidirectional reversible conversion between wovmno and "we" is carried out directly, without going through the pinyin wǒmen.
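The step of linking phonetic codes to Chinese characters through pinyin, and then dropping the pinyin step, is a composition of two lookup tables. The sketch below is illustrative only: the table contents, the Chinese characters chosen for each word, and the function names are assumptions, and it presumes a one-to-one mapping (homonyms need the context-based selection the description mentions for that case).

```python
# Hypothetical word-level tables for the example sentence.
CODE_TO_PINYIN = {
    "wovmno": "wǒmen", "mwvtisa": "měitiān",
    "xrvydu": "shǐyòng", "laadqawnv": "lādīngwěn",
}
PINYIN_TO_HANZI = {
    "wǒmen": "我们", "měitiān": "每天",
    "shǐyòng": "使用", "lādīngwěn": "拉丁文",
}

# Compose the two tables once, so later conversions skip the pinyin step.
CODE_TO_HANZI = {code: PINYIN_TO_HANZI[py] for code, py in CODE_TO_PINYIN.items()}
# Invert the direct table for the reverse direction (assumes no homonyms).
HANZI_TO_CODE = {hanzi: code for code, hanzi in CODE_TO_HANZI.items()}


def codes_to_hanzi(code_text):
    """Convert space-separated word codes to a Chinese character string."""
    return "".join(CODE_TO_HANZI[word] for word in code_text.split())


def hanzi_words_to_codes(hanzi_words):
    """Convert already-segmented Chinese words back to word codes."""
    return " ".join(HANZI_TO_CODE[word] for word in hanzi_words)


hanzi = codes_to_hanzi("wovmno mwvtisa xrvydu laadqawnv")
codes = hanzi_words_to_codes(["我们", "每天", "使用", "拉丁文"])
```

The reverse direction takes pre-segmented words because Chinese text has no spaces; word segmentation is outside the scope of this sketch.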

When homonyms are encountered, the word-level Chinese characters are selected after discrimination by means such as lexical, syntactic and contextual relations in Chinese and statistical regularities. For example: "Mailbags were loaded on the ysvlune." and "Crude oil was loaded on the ysvlune." From the context it can be determined that "ysvlune" in the first sentence means a cruise ship and "ysvlune" in the second sentence means an oil tanker, so the two sentences can be converted to "Mailbags were loaded on the cruise ship" and "Crude oil was loaded on the oil tanker" respectively. Other words are handled in the same way.
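One very simple way to picture context-based homonym selection is to score each candidate by how many of its cue words occur in the sentence. This is only a toy sketch: the text does not specify the discrimination algorithm, and the `CANDIDATES` table, its cue words, the English glosses standing in for Chinese words, and the function `choose_sense` are all assumptions.

```python
# Hypothetical candidates for one homophonous code, each with context cues.
CANDIDATES = {
    "ysvlune": [
        ("cruise ship", {"mailbags"}),
        ("oil tanker", {"crude", "oil"}),
    ],
}


def choose_sense(code, sentence):
    """Pick the candidate whose cue words overlap most with the sentence."""
    context = {word.lower().strip(".,") for word in sentence.split()}
    # max() keeps the first candidate on ties, so list order acts as a prior.
    best = max(CANDIDATES[code], key=lambda cand: len(cand[1] & context))
    return best[0]


first = choose_sense("ysvlune", "Mailbags were loaded on the ysvlune.")
second = choose_sense("ysvlune", "Crude oil was loaded on the ysvlune.")
```

A statistical approach of the kind the text alludes to would replace the hand-written cue sets with co-occurrence counts gathered from a corpus.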

The result of the above bidirectional reversible conversion can be displayed either on its own or side by side, for example:

Original sentence: "We use Chinese characters and Latin script every day." With the method of the invention the computer can reversibly convert it into the following forms:

1.“Wǒmen měitiān shǐyòng lādīngwěn。”

2.“wovmno mwvtisa xrvydu laadqawnv.”

3.“Wǒmen měitiān shǐyòng lādīngwěn。”

We use Latin every day.

4.“wovmno mwvtisa xrvydu laadqawnv.”

We use Latin every day.

5.“Wǒmen měitiān shǐyòng lādīngwěn。”

“wovmno mwvtisa xrvydu laadqawnv.”

To help foreigners or members of China's ethnic minorities better understand the meaning of the Chinese and learn Chinese, corresponding foreign-language or minority-language words can also be inserted next to each contrasted word, for example by adding the corresponding English word below each word as a gloss of the Chinese meaning:

“wovmno Wǒmen mwvtisa měitiān xrvydu shǐyòng laadqawnv lādīngwěn。”

我们 We 每天 every day 使用 use 拉丁文 Latin.

By analogy, with the above method the computer achieves bidirectional reversible conversion between Chinese speech and English speech and their corresponding subtitles and, using existing character signal superimposing technology for video pictures or image frames, superimposes the converted speech and subtitles onto the synchronously corresponding video pictures or image frames for storage or output. By further analogy, the same method can achieve bidirectional reversible conversion of speech and corresponding subtitles between Chinese and other foreign languages, or between one foreign language and another, likewise superimposed onto the synchronously corresponding video pictures or image frames for storage or output.
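The role of the synchronization signal marks in the final superimposing step can be sketched as a join on a shared key. The sketch assumes the marks are integer millisecond timestamps; the classes `Frame` and `Segment`, the function `superimpose`, and the placeholder strings standing in for picture and audio data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    sync_mark: int    # assumed: timestamp in milliseconds
    picture: str      # placeholder for the image data


@dataclass
class Segment:
    sync_mark: int    # same kind of mark as carried by the frames
    subtitle: str     # converted subtitle text for this mark
    audio_file: str   # synthesized speech for this mark


def superimpose(frames, segments):
    """Attach the subtitle and speech with a matching sync mark to each frame."""
    by_mark = {segment.sync_mark: segment for segment in segments}
    output = []
    for frame in frames:
        segment = by_mark.get(frame.sync_mark)
        subtitle = segment.subtitle if segment else ""
        audio = segment.audio_file if segment else None
        output.append((frame.sync_mark, frame.picture, subtitle, audio))
    return output


frames = [Frame(0, "frame0"), Frame(1000, "frame1")]
segments = [Segment(1000, "We use Latin every day.", "segment1.wav")]
combined = superimpose(frames, segments)
```

Frames with no matching segment pass through with an empty subtitle, which corresponds to stretches of video without speech.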

Claims

1. A Chinese and foreign language voiced image data bidirectional reversible voice converting and subtitle labeling method, characterized in that: when Chinese-speech image data is converted into image data with foreign-language speech and annotated subtitles, conventional computer software first applies synchronization signal marks to the video pictures or image frames in the image data and to the audio signal of the corresponding spoken language; the audio signal carrying the synchronization signal marks is then extracted and passed to the Chinese speech recognition module in the computer, which recognizes the Chinese speech as Chinese phonetic codes expressed with the 26 Latin letters and carrying the same synchronization signal marks as the recognized Chinese speech; the machine translation module then translates the Chinese phonetic codes into the specified foreign language, expressed with the 26 Latin letters and carrying the same synchronization signal marks as the corresponding Chinese phonetic code sentences; the Chinese phonetic code subtitles, the foreign-language subtitles, or their contrasting text subtitles, all carrying synchronization signal marks, are then passed to a conventional video picture or image frame subtitle superimposing machine, which superimposes the subtitle information on the video pictures or image frames according to the correspondence between the synchronization signal marks of the subtitles and those of the video pictures or image frames; at the same time, the speech synthesis module synthesizes the translated specified foreign language carrying synchronization signal marks into the corresponding foreign-language speech carrying synchronization signal marks, which is combined with the subtitled video pictures or image frames carrying the same synchronization signal marks and is stored or output synchronously.

2. The Chinese and foreign language voiced image data bidirectional reversible voice converting and subtitle labeling method as claimed in claim 1, characterized in that: the Chinese phonetic code takes the word as its unit, each individual Chinese character being regarded as one syllable; according to the pinyin, under the "Scheme for the Chinese Phonetic Alphabet", of each syllable composing the word, and using only the 26 Latin letters, the initial, medial, final and tone of the Chinese pinyin are first encoded and then spelled in the order "initial code + medial code + final code + tone code doubling as syllable separator"; Chinese information is expressed directly by the resulting phonetic codes; when the phonetic codes directly express Chinese information, punctuation usage is the same as in English; in encoding, the syllables of the same word are encoded consecutively without spaces, and words are separated from each other by spaces.

3. The Chinese and foreign language voiced image data bidirectional reversible voice converting and subtitle labeling method as claimed in claim 1 or 2, characterized in that: the initials of the Chinese phonetic code are all represented by consonant Latin letters; among the initials of the phonetic code used to express Chinese information, the initials (zh), (ch), (sh) are represented by the three consonant Latin letters j, q, x respectively, and the remaining initials are represented by the same consonant Latin letters as in Chinese pinyin; (zhi), (chi), (shi) of Chinese pinyin are represented by jr, qr, xr of the phonetic code respectively, and (er) of Chinese pinyin is represented by er of the phonetic code; for keyboard input, jr, qr, xr and er are entered by pressing the two keys J and R, Q and R, X and R, and E and R respectively.

4. The Chinese and foreign language voiced image data bidirectional reversible voice converting and subtitle labeling method as claimed in claim 1 or 2, characterized in that: the Chinese phonetic code represents (ü) among the single vowels and medials of the original Chinese pinyin with one of the 26 letters, and the remaining single vowels and medials are encoded with the same symbols as the single vowels and medials of Chinese pinyin.

5. The Chinese and foreign language voiced image data bidirectional reversible voice converting and subtitle labeling method as claimed in claim 1 or 2, characterized in that: the Chinese phonetic code uses y to represent (ü) among the single vowels and medials of the original Chinese pinyin.

6. The Chinese and foreign language voiced image data bidirectional reversible voice converting and subtitle labeling method as claimed in claim 1 or 2, characterized in that: in the Chinese phonetic code, except for some compound finals containing a medial, the final codes of the remaining compound finals are represented by consonant letters.

7. The Chinese and foreign language voiced image data bidirectional reversible voice converting and subtitle labeling method as claimed in claim 6, characterized in that: the Chinese phonetic code uses the Latin letters k, c, s, x, w, n, z, l, b, d, p, q, g (when there is no initial or final) to represent (ao), (ai), (an), (ou), (ei), (en), (ua), (uo), (ang), (ong), (eng), (ing), (ng) of Chinese pinyin respectively.

8. The Chinese and foreign language voiced image data bidirectional reversible voice converting and subtitle labeling method as claimed in claim 1 or 2, characterized in that: the tone codes of the Chinese phonetic code are represented by four vowel letters and the letter v, which Chinese pinyin does not use; the Latin letters a, e, v, u, o represent respectively the first (high level) tone (ˉ), the second (rising) tone (ˊ), the third (falling-rising) tone (ˇ), the fourth (falling) tone (ˋ) and the neutral tone (unmarked) of Chinese pinyin.

9. The Chinese and foreign language voiced image data bidirectional reversible voice converting and subtitle labeling method as claimed in claim 1 or 2, characterized in that: when needed, the Chinese phonetic codes can be converted to Chinese characters by the Chinese character conversion module on a computer with a Chinese character system; the Chinese phonetic codes or the Chinese characters can be displayed, stored and output on their own, or side by side with the Chinese phonetic codes, Chinese characters, Chinese pinyin and the foreign language of the same meaning.

10. The Chinese and foreign language voiced image data bidirectional reversible voice converting and subtitle labeling method as claimed in claim 1, characterized in that: the synchronization signal marks may use the timestamp marks already used in producing video pictures or image frames to synchronize them with audio.