CN105786802B - Method and device for transliterating a foreign language - Google Patents

Publication number
CN105786802B
Authority
CN
China
Prior art keywords
phonetic symbol
phonetic
transliteration
file
characters string
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410835020.0A
Other languages
Chinese (zh)
Other versions
CN105786802A (en
Inventor
梁捷
杨淑敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Guangzhou I9Game Information Technology Co Ltd

Landscapes

  • Machine Translation (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The application discloses a method and device for transliterating a foreign language. After a transliteration file (a file that needs to be transliterated) is acquired, the phonetic symbols corresponding to the file are obtained, where the phonetic symbols of each word in the file constitute one phonetic symbol string. Each phonetic symbol string is then divided into multiple substrings according to a phonetic symbol regular expression, and the target-language text corresponding to the pronunciation of each substring is obtained according to a correspondence between phonetic symbol pronunciations and the target language. In this way, the pronunciation of a foreign-language transliteration file can be converted into the corresponding target language, so that the pronunciation of the file can be learned through the target language. This solves the prior-art problem that a person facing a foreign language cannot express themselves in it because they do not know how it is pronounced.

Description

Method and device for transliterating a foreign language
Technical field
The present disclosure relates to the field of language processing technology, and in particular to a method and device for transliterating a foreign language.
Background
Foreign languages appear more and more often in daily life and work. For example, many pleasant foreign-language songs have reached the domestic market, with lyrics in English, Japanese, Korean and other languages. In addition, with economic development, more and more people travel abroad and so come into contact with a variety of foreign languages.
However, most people have not mastered many foreign languages. When they encounter one, they do not know how it is pronounced and therefore cannot express themselves in it. For example, overseas travel often involves situations that require communication in a foreign language, such as asking for directions or ordering food. A tourist who has not mastered the language may carry a foreign-language dictionary and still be unable to read out the pronunciation, and so cannot communicate.
Summary of the invention
To overcome the problems in the related art, the present disclosure provides a method and device for transliterating a foreign language.
To solve the above technical problem, the embodiments of the invention disclose the following technical solutions:
According to a first aspect of the embodiments of the present disclosure, a method for transliterating a foreign language is provided, comprising:
after acquiring a transliteration file that needs to be transliterated, obtaining the phonetic symbols corresponding to the transliteration file, where the phonetic symbols of each word in the transliteration file constitute one phonetic symbol string;
dividing the phonetic symbol string into multiple substrings according to a phonetic symbol regular expression, where each substring corresponds to one phonetic symbol pronunciation;
obtaining, according to a correspondence between phonetic symbol pronunciations and a target language, the target-language text corresponding to the pronunciation of each substring.
With reference to the first aspect, in a first possible implementation of the first aspect, the method further comprises:
after obtaining the target-language text corresponding to the pronunciation of each substring, separating that text according to the beat of the transliteration file.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, before the phonetic symbol string is divided into multiple substrings according to the phonetic symbol regular expression, the method further comprises:
determining the phonetic symbol rule of the transliteration file according to the language type of the transliteration file;
drawing the corresponding deterministic finite automaton (DFA) transition diagram according to the phonetic symbol rule;
obtaining the corresponding phonetic symbol regular expression according to the DFA transition diagram.
With reference to the first aspect, or the first or the second possible implementation of the first aspect, in a third possible implementation of the first aspect, if the transliteration file is an English file, the phonetic symbol regular expression is:
$\eta = a^n \mid b^m a^n$
where $\eta$ denotes a phonetic symbol pronunciation, $a$ denotes a vowel phonetic symbol, $b$ denotes a consonant phonetic symbol, and $m$ and $n$ are positive integers greater than 0.
With reference to the first aspect, in a fourth possible implementation of the first aspect, dividing the phonetic symbol string into multiple substrings according to the phonetic symbol regular expression comprises:
51) determining the current set of phonetic symbols to be processed, where the current set is initially the first phonetic symbol in the phonetic symbol string;
52) judging whether the current set conforms to the phonetic symbol regular expression; if it does not, performing step 53); if it does, performing step 54);
53) letting the last phonetic symbol in the current set be the k-th phonetic symbol in the phonetic symbol string, splitting the first k-1 phonetic symbols of the current set off from the phonetic symbol string as a substring, taking the first phonetic symbol of the remaining phonetic symbol string as the current set, treating the remaining phonetic symbol string as the new phonetic symbol string, and returning to step 52);
54) letting the current set extend to the k-th phonetic symbol in the phonetic symbol string, and judging whether k equals s; if it does, performing step 55); if it does not, performing step 56), where s denotes the number of phonetic symbols in the phonetic symbol string;
55) determining that the current set is one substring, and ending the segmentation;
56) building the first k+1 phonetic symbols in the phonetic symbol string into the current set, and returning to step 52).
According to a second aspect of the embodiments of the present disclosure, a device for transliterating a foreign language is provided, comprising:
a translation module, configured to obtain, after a transliteration file that needs to be transliterated is acquired, the phonetic symbols corresponding to the transliteration file, where the phonetic symbols of each word in the transliteration file constitute one phonetic symbol string;
a segmentation module, configured to divide the phonetic symbol string into multiple substrings according to a phonetic symbol regular expression, where each substring corresponds to one phonetic symbol pronunciation;
a target language acquisition module, configured to obtain, according to a correspondence between phonetic symbol pronunciations and a target language, the target-language text corresponding to the pronunciation of each substring.
With reference to the second aspect, in a first possible implementation of the second aspect, the device further comprises:
a separation module, configured to separate, after the target-language text corresponding to the pronunciation of each substring is obtained, that text according to the beat of the transliteration file.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the device further comprises:
a phonetic symbol regular expression acquisition module, configured to determine the phonetic symbol rule of the transliteration file according to the language type of the transliteration file, draw the corresponding deterministic finite automaton (DFA) transition diagram according to the phonetic symbol rule, and obtain the corresponding phonetic symbol regular expression according to the DFA transition diagram.
With reference to the second aspect, or the first or the second possible implementation of the second aspect, in a third possible implementation of the second aspect, if the transliteration file is an English file, the phonetic symbol regular expression is:
$\eta = a^n \mid b^m a^n$
where $\eta$ denotes a phonetic symbol pronunciation, $a$ denotes a vowel phonetic symbol, $b$ denotes a consonant phonetic symbol, and $m$ and $n$ are positive integers greater than 0.
With reference to the second aspect, in a fourth possible implementation of the second aspect, the segmentation module comprises a determination unit, a judging unit, a first processing unit, a second processing unit, a substring determination unit and a construction unit, wherein:
the determination unit is configured to determine the current set of phonetic symbols to be processed, where the current set is initially the first phonetic symbol in the phonetic symbol string;
the judging unit is configured to judge whether the current set conforms to the phonetic symbol regular expression; if it does not, the first processing unit performs the corresponding operation; if it does, the second processing unit performs the corresponding operation;
the first processing unit is configured to let the last phonetic symbol in the current set be the k-th phonetic symbol in the phonetic symbol string, split the first k-1 phonetic symbols of the current set off from the phonetic symbol string as a substring, take the first phonetic symbol of the remaining phonetic symbol string as the current set, treat the remaining phonetic symbol string as the new phonetic symbol string, and pass the new phonetic symbol string to the judging unit, which performs the corresponding operation;
the second processing unit is configured to let the current set extend to the k-th phonetic symbol in the phonetic symbol string and judge whether k equals s; if it does, the substring determination unit performs the corresponding operation; if it does not, the construction unit performs the corresponding operation, where s denotes the number of phonetic symbols in the phonetic symbol string;
the substring determination unit is configured to determine that the current set is one substring and end the segmentation;
the construction unit is configured to build the first k+1 phonetic symbols in the phonetic symbol string into the current set, which is then obtained by the judging unit for the corresponding operation.
The technical solutions provided by the embodiments of the present disclosure can have the following benefits:
With the present application, the pronunciation of transliteration files in various foreign languages can be converted into the corresponding target language, so that the pronunciation of a transliteration file can be learned conveniently through the target language. This solves the prior-art problem that a person facing a foreign language cannot express themselves in it because they do not know how it is pronounced.
It should be understood that the general description above and the detailed description below are merely exemplary and explanatory and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings are incorporated into and form part of this specification. They show embodiments consistent with the invention and, together with the specification, serve to explain the principles of the invention.
Fig. 1 is a workflow diagram of a method for transliterating a foreign language according to an exemplary embodiment;
Fig. 2 is a workflow diagram of another method for transliterating a foreign language according to an exemplary embodiment;
Fig. 3 is a workflow diagram of a further method for transliterating a foreign language according to an exemplary embodiment;
Fig. 4 is a schematic diagram of the DFA transition diagram in a method for transliterating a foreign language according to an exemplary embodiment;
Fig. 5 is a schematic diagram of the correspondence between phonetic symbol pronunciations and the target language in a method for transliterating a foreign language according to an exemplary embodiment;
Fig. 6 is a structural diagram of a device for transliterating a foreign language according to an exemplary embodiment.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the invention. Rather, they are merely examples of devices and methods that are consistent with some aspects of the invention as detailed in the appended claims.
To solve the prior-art problem that a person facing a foreign language cannot express themselves in it because they do not know how it is pronounced, the present application discloses a method for transliterating a foreign language.
Fig. 1 is a workflow diagram of a method for transliterating a foreign language according to an exemplary embodiment. The method includes:
Step S11: after a transliteration file that needs to be transliterated is acquired, obtain the phonetic symbols corresponding to the transliteration file, where the phonetic symbols of each word in the transliteration file constitute one phonetic symbol string.
The transliteration file may be in any of various languages, such as English, Japanese or French; the application places no limit on this. If the transliteration file is in English, a word in the file is an English word.
Step S12: divide the phonetic symbol string into multiple substrings according to a phonetic symbol regular expression, where each substring corresponds to one phonetic symbol pronunciation.
A regular expression uses a single string to describe and match a family of strings that satisfy a syntactic rule. In the present application, the purpose of the phonetic symbol regular expression is to construct the character strings of phonetic symbol pronunciations.
Step S13: according to the correspondence between phonetic symbol pronunciations and the target language, obtain the target-language text corresponding to the pronunciation of each substring.
In the present application, the correspondence between phonetic symbol pronunciations and the target language is obtained in advance. Once the substrings have been obtained, the phonetic symbol pronunciation of each substring is matched against the target language, which yields the target-language text corresponding to that pronunciation.
The target language is a language different from that of the transliteration file. For example, if the transliteration file is in English and the user is familiar only with Chinese, the target language can be set to Chinese. In that case, step S13 yields the Chinese text corresponding to the English transliteration file.
Steps S11 to S13 disclose a method for transliterating a foreign language. In this method, after a transliteration file that needs to be transliterated is acquired, the phonetic symbols corresponding to the file are obtained, where the phonetic symbols of each word constitute one phonetic symbol string. The phonetic symbol string is then divided into multiple substrings according to the phonetic symbol regular expression, and the target-language text corresponding to the pronunciation of each substring is obtained according to the correspondence between phonetic symbol pronunciations and the target language.
In this way, the pronunciation of a foreign-language transliteration file can be converted into the corresponding target language, so that the pronunciation of the file can be learned through the target language. This solves the prior-art problem that a person facing a foreign language cannot express themselves in it because they do not know how it is pronounced.
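The data flow of steps S11 to S13 can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `transliterate`, its parameters, the pass-through of substrings that have no table entry, and the toy data (the ASCII transcription "mai" and the character "买") are all hypothetical and are not taken from the patent.

```python
from typing import Callable, Dict, List


def transliterate(
    words: List[str],
    phonetic_of: Callable[[str], str],
    segment: Callable[[str], List[str]],
    target_of: Dict[str, str],
) -> List[List[str]]:
    """Run S11-S13 per word; return one list of target-language units per word."""
    result = []
    for word in words:
        phonetic = phonetic_of(word)      # S11: word -> phonetic symbol string
        substrings = segment(phonetic)    # S12: string -> pronounceable substrings
        # S13: look up each substring; unknown substrings pass through (assumption)
        result.append([target_of.get(sub, sub) for sub in substrings])
    return result


# Toy stand-ins for the three stages (hypothetical data).
toy_phonetics = {"my": "mai"}
toy_targets = {"mai": "买"}
print(transliterate(["my"], toy_phonetics.__getitem__, lambda p: [p], toy_targets))
```

Passing the three stages in as parameters keeps the sketch independent of any particular phonetic dictionary, segmentation routine or correspondence table.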
Further, the method disclosed in the present application also includes:
after obtaining the target-language text corresponding to the pronunciation of each substring, separating that text according to the beat of the transliteration file.
Some transliteration files, such as songs, have a rhythm. A user sometimes comes across an English song they like but, not having mastered the language, wants to sing along without knowing how and cannot keep up with the timing. For this case, the method further includes a step of spacing the transliteration according to the beat of the song. In this step, after the target-language text corresponding to the pronunciation of each substring is obtained, the target-language pronunciations are separated according to the beat of the song, so that the pronunciations are distributed in time and meet the user's need to sing along.
The beat of the transliteration file can be obtained in several ways. In one, the beat is input by the user. In another, if the transliteration file is a set of lyrics, the beat is obtained from the numbered musical notation of the song.
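As a rough illustration of separating the transliteration by beat, the sketch below groups transliterated units into beats. The patent does not specify how beats are represented, so the `units_per_beat` list (how many units fall on each beat), the separator and the function name are assumptions made only for this example.

```python
from typing import List


def separate_by_beat(units: List[str], units_per_beat: List[int], sep: str = " | ") -> str:
    """Group transliterated units into beats and join the groups with a separator."""
    if sum(units_per_beat) != len(units):
        raise ValueError("beat pattern does not cover all units")
    groups = []
    pos = 0
    for count in units_per_beat:
        groups.append("".join(units[pos:pos + count]))
        pos += count
    return sep.join(groups)


# Hypothetical beat pattern: one unit, one unit, then two units on the last beat.
print(separate_by_beat(["se", "li", "brei", "t"], [1, 1, 2]))
# se | li | breit
```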
In addition, referring to the workflow diagram shown in Fig. 2, before the phonetic symbol string is divided into multiple substrings according to the phonetic symbol regular expression, the method disclosed in the present application further includes:
Step S21: determine the phonetic symbol rule of the transliteration file according to the language type of the transliteration file.
For example, if the language type of the transliteration file is English, the phonetic symbol rule is: vowel + vowel = vowel, consonant + consonant = consonant.
Step S22: draw the corresponding deterministic finite automaton (DFA) transition diagram according to the phonetic symbol rule.
A DFA (deterministic finite automaton) is an automaton that implements state transitions.
Step S23: obtain the corresponding phonetic symbol regular expression according to the DFA transition diagram.
Steps S21 to S23 yield the phonetic symbol regular expression, which is used later to split the phonetic symbol string.
The phonetic symbol regular expression differs according to the language type of the transliteration file. If the transliteration file is an English file, the phonetic symbol regular expression is:
$\eta = a^n \mid b^m a^n$
where $\eta$ denotes a phonetic symbol pronunciation, $a$ denotes a vowel phonetic symbol, $b$ denotes a consonant phonetic symbol, and $m$ and $n$ are positive integers greater than 0.
The purpose of the phonetic symbol regular expression is to construct the character strings of phonetic symbol pronunciations. For an English file, the expression states that a phonetic symbol pronunciation is built either from one or more vowel phonetic symbols, or from $m$ consonants followed by $n$ vowels, where $m$ and $n$ are positive integers greater than 0.
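To make the expression concrete, here is a small sketch using Python's standard `re` module. It assumes a simplified ASCII alphabet in which the letters a, e, i, o, u stand in for vowel phonetic symbols and every other character counts as a consonant; real phonetic symbols would need a fuller vowel set. The names `VOWELS`, `PRONUNCIATION` and `is_pronunciation` are hypothetical.

```python
import re

VOWELS = "aeiou"  # assumption: simplified stand-in for the vowel phonetic symbols

# a^n | b^m a^n with m, n > 0: one or more vowels, or consonants followed by vowels
PRONUNCIATION = re.compile(rf"[{VOWELS}]+|[^{VOWELS}]+[{VOWELS}]+")


def is_pronunciation(symbols: str) -> bool:
    """True if the whole string is one phonetic symbol pronunciation."""
    return PRONUNCIATION.fullmatch(symbols) is not None


for candidate in ["ei", "brei", "fi", "t", "sel"]:
    print(candidate, is_pronunciation(candidate))
# ei True
# brei True
# fi True
# t False
# sel False
```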
Step S12 discloses dividing the phonetic symbol string into multiple substrings according to the phonetic symbol regular expression. Referring to the workflow diagram shown in Fig. 3, this division includes the following steps:
Step S31: determine the current set of phonetic symbols to be processed, where the current set is initially the first phonetic symbol in the phonetic symbol string.
Step S32: judge whether the current set conforms to the phonetic symbol regular expression. If it does not, perform step S33; if it does, perform step S34.
Step S33: let the last phonetic symbol in the current set be the k-th phonetic symbol in the phonetic symbol string. If the current set does not conform to the phonetic symbol regular expression, split the first k-1 phonetic symbols of the current set off from the phonetic symbol string as a substring, take the first phonetic symbol of the remaining phonetic symbol string as the current set, treat the remaining phonetic symbol string as the new phonetic symbol string, and return to step S32.
If the current set does not conform to the phonetic symbol regular expression, the current set cannot be uttered as one phonetic symbol pronunciation. The last phonetic symbol in the current set is therefore retained, and the first k-1 phonetic symbols, which together can be uttered as one phonetic symbol pronunciation, are split off from the phonetic symbol string as a substring.
Step S34: let the current set extend to the k-th phonetic symbol in the phonetic symbol string. If the current set conforms to the phonetic symbol regular expression, judge whether k equals s. If it does, perform step S35; if it does not, perform step S36, where s denotes the number of phonetic symbols in the phonetic symbol string.
Step S35: if k equals s, determine that the current set is one substring and end the segmentation.
If k equals s, the current set is the whole of the phonetic symbol string, so the current set is taken as one substring and the segmentation ends.
Step S36: if k does not equal s, build the first k+1 phonetic symbols in the phonetic symbol string into the current set and return to step S32.
If k does not equal s, then k is less than s, and the phonetic symbol string contains further phonetic symbols beyond the current set. A new current set is therefore built and step S32 is repeated to judge whether the newly built set conforms to the phonetic symbol regular expression.
Through steps S31 to S36, the phonetic symbol string is divided into multiple substrings, each corresponding to one phonetic symbol pronunciation, so that the corresponding target-language text can then be obtained from the substrings.
In addition, after the target-language text corresponding to the pronunciation of each substring is obtained, some further processing can be applied. For example, if the transliteration file is in English, consonants are pronounced lightly and generally carry stress only in combination with a vowel, so a single consonant may be left unread. After the target-language text corresponding to the pronunciation of each substring is obtained, single consonants can therefore also be deleted.
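The optional clean-up of single consonants can be sketched as a simple filter. This is a minimal illustration, assuming the same simplified ASCII vowel set as a stand-in for vowel phonetic symbols and treating any substring without a vowel as a lone consonant; `drop_lone_consonants` is a hypothetical name.

```python
from typing import List

VOWELS = set("aeiou")  # assumption: simplified stand-in for vowel phonetic symbols


def drop_lone_consonants(substrings: List[str]) -> List[str]:
    """Keep only substrings that contain at least one vowel symbol."""
    return [sub for sub in substrings if any(ch in VOWELS for ch in sub)]


print(drop_lone_consonants(["se", "li", "brei", "t"]))
# ['se', 'li', 'brei']
```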
The method disclosed in the present application is illustrated below with a practical example. In this example, an English song is to be transliterated into Chinese; that is, the transliteration file is a set of English lyrics and the target language is Chinese.
In this example, the transliteration file must first be obtained. The transliteration file is the English lyrics, and the English lyric "Tonight I celebrate my love" is taken as the example.
The phonetic symbols corresponding to the English lyric are then obtained, where the phonetic symbols of each word in the lyric constitute one phonetic symbol string.
According to the English phonetic symbol table, English has 48 phonetic symbols, of which 20 are vowels and 28 are consonants.
In addition, in this example the phonetic symbol regular expression is obtained on the basis of the deterministic finite automaton (DFA). A DFA is usually a 5-tuple consisting of:
a non-empty finite set of states Q;
an input alphabet Σ (a non-empty finite set of characters);
a transition function δ: Q × Σ → Q (for example, δ(q, σ) = p, where p, q ∈ Q and σ ∈ Σ);
a start state s ∈ Q;
a set of accepting states F ⊆ Q.
In the present application, Q = (S1, S2), where S1 and S2 each denote a pronounceable phonetic symbol string: S1 denotes a phonetic symbol string made up of a vowel or a set of vowels, and S2 denotes a phonetic symbol string made up of a consonant or a set of consonants. Σ = {a, b}, where Σ denotes the characters that make up phonetic symbols, divided into vowels and consonants: a denotes a vowel and b denotes a consonant. s = S is the null character and denotes the start state. F = (S1, S2) denotes the accepting states.
Since the phonetic symbol rule for English is that a vowel combined with a vowel is a vowel and a consonant combined with a consonant is a consonant, the following state transition relations are obtained:
S + a = S1 (the null character combined with a vowel gives a vowel);
S + b = S2 (the null character combined with a consonant gives a consonant);
S1 + S1 = S1 (a vowel combined with a vowel gives a vowel);
S1 + S2 = S (a vowel combined with a consonant gives the null character);
S2 + a = S (a consonant combined with a vowel gives the null character);
S2 + S = S2 (a consonant combined with the null character gives a consonant).
The state transition relations above characterize the transition function of the DFA, from which the DFA transition diagram can be drawn. The DFA transition diagram is shown in Fig. 4.
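For readers who want to experiment, the sketch below encodes a conventional textbook DFA that recognizes the English expression (one or more vowels, or consonants followed by vowels). It is not a transcription of Fig. 4 or of the relations listed above: the state names `start`, `consonant_run` and `vowel_run`, the single accepting state, and the ASCII vowel set are assumptions chosen for this illustration.

```python
from typing import Optional

VOWELS = set("aeiou")  # assumption: simplified stand-in for vowel phonetic symbols

# (state, symbol class) -> next state; a missing pair means the input is rejected
TRANSITIONS = {
    ("start", "a"): "vowel_run",
    ("start", "b"): "consonant_run",
    ("consonant_run", "b"): "consonant_run",
    ("consonant_run", "a"): "vowel_run",
    ("vowel_run", "a"): "vowel_run",
}
ACCEPTING = {"vowel_run"}


def symbol_class(ch: str) -> str:
    """Map a symbol to the alphabet {a, b}: a = vowel, b = consonant."""
    return "a" if ch in VOWELS else "b"


def run_dfa(symbols: str) -> Optional[str]:
    """Return the state reached after reading symbols, or None if rejected."""
    state: Optional[str] = "start"
    for ch in symbols:
        state = TRANSITIONS.get((state, symbol_class(ch)))
        if state is None:
            return None
    return state


def accepts(symbols: str) -> bool:
    return run_dfa(symbols) in ACCEPTING


print(accepts("brei"), accepts("br"), run_dfa("sel"))
# True False None
```

Reading "sel" is rejected at the consonant that follows a vowel, which is exactly the point at which the segmentation procedure closes one substring and starts the next.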
The corresponding phonetic symbol regular expression can be obtained from the DFA transition diagram. In this example the transliteration file is in English, and the phonetic symbol regular expression obtained is:
$\eta = a^n \mid b^m a^n$
where $\eta$ denotes a phonetic symbol pronunciation, $a$ denotes a vowel phonetic symbol, $b$ denotes a consonant phonetic symbol, and $m$ and $n$ are positive integers greater than 0. The expression states that a phonetic symbol pronunciation is built either from one or more vowel phonetic symbols, or from $m$ consonants followed by $n$ vowels. For example, $a^n$ can be ei or ai, and $b^m a^n$ can be brei, p3 or fi.
The phonetic symbol string is then split according to the phonetic symbol regular expression. The rule followed when splitting is: if a set of phonetic symbols no longer satisfies the phonetic symbol regular expression once one more phonetic symbol is appended to it, the set is split off and forms a new substring.
The splitting of the phonetic symbol string follows the operations disclosed in steps S31 to S36 and can be expressed as pseudocode, for example in the C language.
In the pseudocode, Y denotes the complete phonetic symbol input of one word, c denotes the length of the phonetic symbols, y denotes the valid phonetic symbol pronunciations that have been separated out, k indicates that the k-th character of the phonetic symbol string is currently being processed, and i denotes the index at which the remaining phonetic symbol string starts after a split.
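The following Python sketch is one possible reading of steps S31 to S36, reusing the variable names Y, c, y, k and i described above. It is not the patent's pseudocode. Two assumptions are made: each character is one phonetic symbol, with a, e, i, o, u standing in for the vowels; and "conforms to the regular expression" is taken to mean "can still be extended into a match" (consonants followed by vowels), since a lone leading consonant such as "s" would otherwise be rejected before its vowel is read.

```python
from typing import List

VOWELS = set("aeiou")  # assumption: simplified stand-in for vowel phonetic symbols


def is_viable(chunk: str) -> bool:
    """True if no consonant follows a vowel, i.e. chunk is a prefix of a match."""
    seen_vowel = False
    for ch in chunk:
        if ch in VOWELS:
            seen_vowel = True
        elif seen_vowel:
            return False
    return True


def segment(Y: str) -> List[str]:
    """Split the phonetic symbol string Y of one word into substrings."""
    y: List[str] = []   # substrings separated out so far
    i = 0               # index where the remaining phonetic symbol string starts
    c = len(Y)          # number of phonetic symbols
    for k in range(1, c + 1):        # k-th symbol is being processed (S34/S36)
        if not is_viable(Y[i:k]):    # S32 fails, so S33 applies
            y.append(Y[i:k - 1])     # split off everything before symbol k
            i = k - 1                # symbol k starts the new current set
    if i < c:
        y.append(Y[i:])              # S35: what remains is the last substring
    return y


print(segment("selibreit"))
# ['se', 'li', 'brei', 't']
```

Because a single symbol is always viable, every split leaves a non-empty substring, and the loop reads each symbol once.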
Taking the phonetic transcription of "celebrate" as an example, the program output proceeds as follows:
Finally, the phonetic string of "celebrate" is split into the following substrings:
se li brei t.
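The splitting rule can be sketched as follows. This is a minimal Python illustration rather than the patent's own C pseudocode: it assumes one character per phonetic symbol, a hypothetical `VOWELS` set, and treats every other character as a consonant. Under the rule $\eta = a^{n} \mid b^{m} a^{n}$, appending a symbol breaks the pattern exactly when a consonant follows a vowel, so that is where the sketch cuts. The names `i` and `k` follow the roles described above for the remaining-string index and the current character.

```python
# Hypothetical vowel inventory; anything else is treated as a consonant.
VOWELS = set("aeiouIʌəɜɔæ")


def split_pronunciations(phonetic: str) -> list[str]:
    """Split a phonetic string into substrings, one pronunciation each."""
    parts = []
    i = 0  # index where the not-yet-split remainder begins
    for k in range(1, len(phonetic)):
        prev_is_vowel = phonetic[k - 1] in VOWELS
        cur_is_vowel = phonetic[k] in VOWELS
        # A consonant right after a vowel cannot extend the current set,
        # so everything from i up to k is split off as one substring.
        if prev_is_vowel and not cur_is_vowel:
            parts.append(phonetic[i:k])
            i = k
    if phonetic:
        parts.append(phonetic[i:])  # the last set, possibly a lone consonant
    return parts


print(split_pronunciations("selibreit"))  # ['se', 'li', 'brei', 't']
```

A trailing lone consonant such as `t` is returned as its own substring; whether it is pronounced is a separate decision.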
The segmentation of each word is as follows:
Transliteration file: Tonight I celebrate my love
Phonetic strings: [aI] [lʌv]
Substrings: se li lʌv
Through the above steps, the phonetic string corresponding to the transliteration file can be split into multiple substrings, each of which corresponds to one phonetic pronunciation. Next, the target language corresponding to the pronunciation of each substring is obtained according to the correspondence between phonetic pronunciations and the target language.
The correspondence between phonetic pronunciations and the target language needs to be set in advance. In this example, the correspondence can be represented by Fig. 5.
In Fig. 5, the vowels and consonants form a two-dimensional array; by looking up this table, the target language corresponding to the pronunciation of each substring can be obtained.
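A table lookup of this kind might look like the sketch below. The row and column labels and every cell are hypothetical placeholders chosen for illustration; the actual contents of Fig. 5 are not reproduced here. The sketch also assumes one character per phonetic symbol and a hypothetical `VOWELS` set.

```python
VOWELS = set("aeiouIʌəɜɔæ")  # hypothetical vowel inventory

# Hypothetical excerpt of a consonant-by-vowel table (not the real Fig. 5).
ROWS = ["", "s", "l", "m"]   # consonant part; "" means no consonant
COLS = ["e", "i", "ai"]      # vowel part
GRID = [
    ["诶", "伊", "艾"],       # (no consonant)
    ["瑟", "西", "赛"],       # s
    ["勒", "李", "莱"],       # l
    ["梅", "米", "买"],       # m
]


def split_consonants_vowels(sub: str) -> tuple[str, str]:
    """Split a substring into its leading consonants and trailing vowels."""
    cut = next((i for i, ch in enumerate(sub) if ch in VOWELS), len(sub))
    return sub[:cut], sub[cut:]


def lookup(sub: str):
    """Return the target-language text for a substring, or None if absent."""
    cons, vow = split_consonants_vowels(sub)
    if cons not in ROWS or vow not in COLS:
        return None
    return GRID[ROWS.index(cons)][COLS.index(vow)]
```

With this placeholder table, `lookup("se")` returns `"瑟"` and `lookup("t")` returns `None`, since a lone consonant has no vowel column.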
The original sentence is transliterated as follows:
Te Naiteaise Lifibrate buys peppery husband.
In addition, because consonants in English are pronounced lightly and generally carry stress only in combination with a vowel, a lone consonant may optionally be left unread. The target-language text corresponding to the pronunciation is processed accordingly, and the final result is as follows:
Then, the numbered musical notation of the original song is obtained, the beat is determined from the notation, and the above transliteration is separated according to the beat, so that the user keeps time when singing.
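These two post-processing steps can be sketched as follows. The sketch is an assumption-laden illustration: `VOWELS` is a hypothetical inventory, and `per_beat` (the number of syllables sung on each beat) is assumed to have been read off the numbered musical notation by some other means that is not shown.

```python
VOWELS = set("aeiouIʌəɜɔæ")  # hypothetical vowel inventory


def drop_lone_consonants(substrings: list[str]) -> list[str]:
    """Keep only substrings that contain at least one vowel."""
    return [s for s in substrings if any(ch in VOWELS for ch in s)]


def group_by_beat(syllables: list[str], per_beat: list[int], sep: str = " | ") -> str:
    """Join syllables into beat groups; per_beat[j] syllables fall on beat j."""
    groups = []
    pos = 0
    for count in per_beat:
        chunk = "".join(syllables[pos:pos + count])
        if chunk:
            groups.append(chunk)
        pos += count
    if pos < len(syllables):
        # Syllables left over after the last listed beat form a final group.
        groups.append("".join(syllables[pos:]))
    return sep.join(groups)


kept = drop_lone_consonants(["se", "li", "brei", "t"])
print(group_by_beat(kept, [1, 2]))  # se | librei
```

The same grouping applies unchanged when the list holds target-language characters instead of phonetic substrings.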
With the method disclosed herein, the pronunciation of a foreign-language transliteration file can be converted into the corresponding target language, so that the pronunciation of the transliteration file can be learned through the target language. This solves the problem in the prior art that a person facing a foreign language cannot express it verbally because its pronunciation is unknown.
Correspondingly, and referring to the structural schematic diagram shown in Fig. 6, the present application also discloses a transliteration device for a foreign language. The device comprises a translation module 100, a segmentation module 200 and a target-language obtaining module 300.
The translation module 100 is configured to obtain, after the transliteration file to be transliterated has been acquired, the phonetic symbols corresponding to the transliteration file, wherein the phonetic symbols of each word in the transliteration file constitute one phonetic string.
The segmentation module 200 is configured to split the phonetic string into multiple substrings according to the phonetic-symbol regular expression, wherein each substring corresponds to one phonetic pronunciation.
In the present application, the phonetic-symbol regular expression defines the character strings that constitute a phonetic pronunciation.
The target-language obtaining module 300 is configured to obtain the target language corresponding to the pronunciation of each substring according to the correspondence between phonetic pronunciations and the target language.
In the present application, the correspondence between phonetic pronunciations and the target language is obtained in advance. Once each substring has been obtained, its phonetic pronunciation is matched against the target language, which yields the target language corresponding to the pronunciation of each substring.
The target language is a language different from that of the transliteration file. For example, if the transliteration file is in English and the user is familiar only with Chinese, the target language may be set to Chinese; in this case, the Chinese corresponding to the English transliteration file can be obtained through the operation of step S13.
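The three modules can be pictured as a simple pipeline. The sketch below is one hypothetical way to wire them together in Python; the class name, the tiny `PHONETICS` and `TARGET` dictionaries, and the regex-based splitter are all stand-ins invented for the example and do not reproduce the patent's dictionary, Fig. 5, or its segmentation steps.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TransliterationDevice:
    translate: Callable[[str], List[str]]  # module 100: text -> one phonetic string per word
    segment: Callable[[str], List[str]]    # module 200: phonetic string -> substrings
    to_target: Callable[[str], str]        # module 300: substring -> target-language text

    def run(self, text: str) -> List[List[str]]:
        """Return, for each word, the target-language text of each substring."""
        return [
            [self.to_target(sub) for sub in self.segment(phonetic)]
            for phonetic in self.translate(text)
        ]


# Hypothetical stand-in data for the example only.
PHONETICS = {"my": "mai", "love": "lʌv"}
TARGET = {"mai": "买", "lʌ": "辣", "v": "夫"}
V = "aeiouIʌəɜɔæ"  # hypothetical vowel inventory
SYLLABLE = re.compile(f"[^{V}]*[{V}]+|[^{V}]+")

device = TransliterationDevice(
    translate=lambda text: [PHONETICS[w.lower()] for w in text.split()],
    segment=SYLLABLE.findall,
    to_target=lambda sub: TARGET.get(sub, sub),
)
print(device.run("my love"))  # [['买'], ['辣', '夫']]
```

Passing the modules in as callables keeps each one replaceable, for instance to swap in a different target language table.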
Further, the transliteration device for a foreign language also comprises:
A separation module, configured to separate, after the target language corresponding to the pronunciation of each substring has been obtained, that target language according to the beat of the transliteration file.
Some transliteration files, such as songs, have a corresponding rhythm. A user sometimes encounters an English song he or she likes but, not having mastered the language, wants to sing it without knowing how and cannot keep up with the timing. For this case, the transliteration device disclosed in the present application further comprises the separation module: after the target language corresponding to the pronunciation of each substring has been obtained, the separation module separates the target-language pronunciation according to the beat of the song, so that the pronunciation is distributed in time and meets the user's need to sing.
Further, the transliteration device for a foreign language also comprises:
A phonetic-symbol regular expression obtaining module, configured to determine the phonetic rules of the transliteration file according to its language type, draw the corresponding deterministic finite automaton (DFA) transition diagram according to the phonetic rules, and obtain the corresponding phonetic-symbol regular expression according to the DFA transition diagram.
If the transliteration file is an English file, the phonetic-symbol regular expression is:
$\eta = a^{n} \mid b^{m} a^{n}$
where η denotes a phonetic pronunciation, a denotes a vowel phonetic symbol, b denotes a consonant phonetic symbol, and m and n are positive integers.
Further, the segmentation module comprises a determination unit, a judging unit, a first processing unit, a second processing unit, a substring determination unit and a construction unit, wherein:
The determination unit is configured to determine the current to-be-processed phonetic-symbol set, wherein the current to-be-processed phonetic-symbol set is the first phonetic symbol in the phonetic string.
The judging unit is configured to judge whether the current to-be-processed phonetic-symbol set satisfies the phonetic-symbol regular expression; if not, the corresponding operation is executed by the first processing unit, and if so, the corresponding operation is executed by the second processing unit.
The first processing unit is configured to set the last phonetic symbol in the current to-be-processed phonetic-symbol set as the k-th phonetic symbol in the phonetic string, split the first k-1 phonetic symbols of the current to-be-processed phonetic-symbol set off from the phonetic string as a substring, set the first phonetic symbol in the remaining phonetic string as the current to-be-processed phonetic-symbol set, the remaining phonetic string being the new phonetic string, and pass the new phonetic string to the judging unit, which executes the corresponding operation.
The second processing unit is configured to set the last phonetic symbol of the current to-be-processed phonetic-symbol set as the k-th phonetic symbol in the phonetic string and to judge whether k equals s; if so, the corresponding operation is executed by the substring determination unit, and if not, the corresponding operation is executed by the construction unit, where s denotes the number of phonetic symbols in the phonetic string.
The substring determination unit is configured to determine that the current to-be-processed phonetic-symbol set is a substring and to end the segmentation.
The construction unit is configured to build the first k+1 phonetic symbols in the phonetic string into the current to-be-processed phonetic-symbol set, which is then obtained by the judging unit so that the corresponding operation is executed.
With the device disclosed in the present application, the pronunciation of a foreign-language transliteration file can be converted into the corresponding target language, so that the pronunciation of the transliteration file can be learned through the target language. This solves the problem in the prior art that a person facing a foreign language cannot express it verbally because its pronunciation is unknown.
As for the device in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not elaborated here.
Other embodiments of the invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses or adaptations of the invention that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The description and examples are to be regarded as illustrative only, with the true scope and spirit of the invention indicated by the following claims.
It should be understood that the invention is not limited to the precise structure described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the invention is limited only by the appended claims.

Claims (8)

1. A transliteration method for a foreign language, characterized by comprising:
after acquiring a transliteration file to be transliterated, obtaining the phonetic symbols corresponding to the transliteration file, wherein the phonetic symbols of each word in the transliteration file constitute one phonetic string;
splitting the phonetic string into multiple substrings according to a phonetic-symbol regular expression, wherein each substring corresponds to one phonetic pronunciation;
obtaining the target language corresponding to the pronunciation of each substring according to the correspondence between phonetic pronunciations and the target language;
wherein splitting the phonetic string into multiple substrings according to the phonetic-symbol regular expression comprises:
51) determining the current to-be-processed phonetic-symbol set, wherein the current to-be-processed phonetic-symbol set is the first phonetic symbol in the phonetic string;
52) judging whether the current to-be-processed phonetic-symbol set satisfies the phonetic-symbol regular expression; if not, executing the operation of step 53), and if so, executing the operation of step 54);
53) setting the last phonetic symbol in the current to-be-processed phonetic-symbol set as the k-th phonetic symbol in the phonetic string, splitting the first k-1 phonetic symbols of the current to-be-processed phonetic-symbol set off from the phonetic string as a substring, setting the first phonetic symbol in the remaining phonetic string as the current to-be-processed phonetic-symbol set, the remaining phonetic string being the new phonetic string, and returning to the operation of step 52);
54) setting the last phonetic symbol of the current to-be-processed phonetic-symbol set as the k-th phonetic symbol in the phonetic string and judging whether k equals s; if so, executing the operation of step 55), and if not, executing the operation of step 56), where s denotes the number of phonetic symbols in the phonetic string;
55) determining that the current to-be-processed phonetic-symbol set is a substring, and ending the segmentation;
56) building the first k+1 phonetic symbols in the phonetic string into the current to-be-processed phonetic-symbol set, and returning to the operation of step 52).
2. The method according to claim 1, characterized in that the transliteration method for a foreign language further comprises:
after the target language corresponding to the pronunciation of each substring has been obtained, separating that target language according to the beat of the transliteration file.
3. The method according to claim 1, characterized in that, before splitting the phonetic string into multiple substrings according to the phonetic-symbol regular expression, the method further comprises:
determining the phonetic rules of the transliteration file according to the language type of the transliteration file;
drawing the corresponding deterministic finite automaton (DFA) transition diagram according to the phonetic rules;
obtaining the corresponding phonetic-symbol regular expression according to the DFA transition diagram.
4. The method according to any one of claims 1 to 3, characterized in that, if the transliteration file is an English file, the phonetic-symbol regular expression is:
$\eta = a^{n} \mid b^{m} a^{n}$
where η denotes a phonetic pronunciation, a denotes a vowel phonetic symbol, b denotes a consonant phonetic symbol, and m and n are positive integers.
5. A transliteration device for a foreign language, characterized by comprising:
a translation module, configured to obtain, after a transliteration file to be transliterated has been acquired, the phonetic symbols corresponding to the transliteration file, wherein the phonetic symbols of each word in the transliteration file constitute one phonetic string;
a segmentation module, configured to split the phonetic string into multiple substrings according to a phonetic-symbol regular expression, wherein each substring corresponds to one phonetic pronunciation;
a target-language obtaining module, configured to obtain the target language corresponding to the pronunciation of each substring according to the correspondence between phonetic pronunciations and the target language;
the segmentation module comprising a determination unit, a judging unit, a first processing unit, a second processing unit, a substring determination unit and a construction unit, wherein:
the determination unit is configured to determine the current to-be-processed phonetic-symbol set, wherein the current to-be-processed phonetic-symbol set is the first phonetic symbol in the phonetic string;
the judging unit is configured to judge whether the current to-be-processed phonetic-symbol set satisfies the phonetic-symbol regular expression; if not, the corresponding operation is executed by the first processing unit, and if so, the corresponding operation is executed by the second processing unit;
the first processing unit is configured to set the last phonetic symbol in the current to-be-processed phonetic-symbol set as the k-th phonetic symbol in the phonetic string, split the first k-1 phonetic symbols of the current to-be-processed phonetic-symbol set off from the phonetic string as a substring, set the first phonetic symbol in the remaining phonetic string as the current to-be-processed phonetic-symbol set, the remaining phonetic string being the new phonetic string, and pass the new phonetic string to the judging unit, which executes the corresponding operation;
the second processing unit is configured to set the last phonetic symbol of the current to-be-processed phonetic-symbol set as the k-th phonetic symbol in the phonetic string and to judge whether k equals s; if so, the corresponding operation is executed by the substring determination unit, and if not, the corresponding operation is executed by the construction unit, where s denotes the number of phonetic symbols in the phonetic string;
the substring determination unit is configured to determine that the current to-be-processed phonetic-symbol set is a substring and to end the segmentation;
the construction unit is configured to build the first k+1 phonetic symbols in the phonetic string into the current to-be-processed phonetic-symbol set, which is then obtained by the judging unit so that the corresponding operation is executed.
6. The device according to claim 5, characterized in that the transliteration device for a foreign language further comprises:
a separation module, configured to separate, after the target language corresponding to the pronunciation of each substring has been obtained, that target language according to the beat of the transliteration file.
7. The device according to claim 5, characterized in that the transliteration device for a foreign language further comprises:
a phonetic-symbol regular expression obtaining module, configured to determine the phonetic rules of the transliteration file according to the language type of the transliteration file, draw the corresponding deterministic finite automaton (DFA) transition diagram according to the phonetic rules, and obtain the corresponding phonetic-symbol regular expression according to the DFA transition diagram.
8. The device according to any one of claims 5 to 7, characterized in that, if the transliteration file is an English file, the phonetic-symbol regular expression is: $\eta = a^{n} \mid b^{m} a^{n}$; where η denotes a phonetic pronunciation, a denotes a vowel phonetic symbol, b denotes a consonant phonetic symbol, and m and n are positive integers.
CN201410835020.0A 2014-12-26 2014-12-26 A kind of transliteration method and device of foreign language Active CN105786802B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410835020.0A CN105786802B (en) 2014-12-26 2014-12-26 A kind of transliteration method and device of foreign language

Publications (2)

Publication Number Publication Date
CN105786802A CN105786802A (en) 2016-07-20
CN105786802B true CN105786802B (en) 2019-04-12

Family

ID=56389079

Country Status (1)

Country Link
CN (1) CN105786802B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11200881B2 (en) 2019-07-26 2021-12-14 International Business Machines Corporation Automatic translation using deep learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101520903A (en) * 2009-04-23 2009-09-02 北京水晶石数字科技有限公司 Method for matching Chinese mouth shape of cartoon role
CN101593173A (en) * 2008-05-28 2009-12-02 中国科学院自动化研究所 A kind of reverse Chinese-English transliteration method and device
CN102023858A (en) * 2010-12-03 2011-04-20 上海交通大学 Software and hardware collaborative character matching system and matching method thereof
CN102193643A (en) * 2010-03-15 2011-09-21 北京搜狗科技发展有限公司 Word input method and input method system having translation function
CN102262450A (en) * 2010-05-27 2011-11-30 北京搜狗科技发展有限公司 Method and device for converting characters based on mixed input character string
CN103810993A (en) * 2012-11-14 2014-05-21 北京百度网讯科技有限公司 Text phonetic notation method and device
CN104239289A (en) * 2013-06-24 2014-12-24 富士通株式会社 Syllabication method and syllabication device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8521761B2 (en) * 2008-07-18 2013-08-27 Google Inc. Transliteration for query expansion
JP5090547B2 (en) * 2011-03-04 2012-12-05 楽天株式会社 Transliteration processing device, transliteration processing program, computer-readable recording medium recording transliteration processing program, and transliteration processing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
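Research on letter-to-sound conversion in German speech synthesis; Wang Yongsheng; Computer Engineering and Applications; 2009-12-31; Vol. 45, No. 35; Section 2, Fig. 1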

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C41 Transfer of patent application or patent right or utility model
TA01 Transfer of patent application right

Effective date of registration: 20160902

Address after: 510627 Guangdong city of Guangzhou province Whampoa Tianhe District Road No. 163 Xiping Yun Lu Yun Ping radio square B tower 13 floor 02 unit self

Applicant after: GUANGZHOU I9GAME INFORMATION TECHNOLOGY CO., LTD.

Address before: 510627 Guangdong city of Guangzhou province Whampoa Tianhe District Road No. 163 Xiping Yun Lu Yun Ping B radio 14 floor tower square

Applicant before: Guangzhou Dongjing Computer Technology Co., Ltd.

GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200526

Address after: 310052 room 508, floor 5, building 4, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Patentee after: Alibaba (China) Co.,Ltd.

Address before: 510627 Guangdong city of Guangzhou province Whampoa Tianhe District Road No. 163 Xiping Yun Lu Yun Ping radio square B tower 13 floor 02 unit self

Patentee before: GUANGZHOU UCWEB COMPUTER TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right