CN106649291B - Korean transliteration method and device - Google Patents

Publication number
CN106649291B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611207837.9A
Other languages
Chinese (zh)
Other versions
CN106649291A (en)
Inventor
陶县俊
邱宇扬
黄卓腾
姜宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Kugou Computer Technology Co Ltd
Original Assignee
Guangzhou Kugou Computer Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a Korean transliteration method and device, belonging to the field of language processing. The method comprises: splitting Korean information to obtain a plurality of Korean characters; querying a character library for the phonetic notation segment corresponding to each Korean character, wherein the character library stores the correspondence between Korean characters and phonetic notation segments; and splicing the queried phonetic notation segments in the order in which the Korean characters appear in the Korean information, to obtain phonetic notation information corresponding to the Korean information. Because the character library stores the correspondence between Korean characters and phonetic notation segments in advance, and segments are looked up character by character, the phonetic notation segment of every Korean character can be found even when the Korean information to be transliterated contains a rare phrase, a new phrase popular on the Internet, or a coined phrase. Phonetic notation is therefore accurate, and the accuracy of the transliteration result is improved.

Description

Korean transliteration method and device
Technical Field
Embodiments of the invention relate to the field of language processing, and in particular to a Korean transliteration method and device.
Background
Transliteration renders words of one language as words or phonetic notations of another language that have a similar pronunciation. The Korean transliteration technology in wide use today is based on a phrase library.
The core idea of phrase-library-based Korean transliteration is as follows: commonly used Korean phrases, together with the phonetic notation segment sequence corresponding to each phrase, are collected manually in advance to build a phrase library. The server splits the Korean information to be transliterated into several groups of Korean phrases, selects from the phrase library, for each group, a phonetic notation segment sequence whose matching degree is higher than a threshold, and splices the selected sequences in the order of the corresponding Korean phrases to obtain the phonetic notation information corresponding to the input Korean information.
In this approach the phrase library stores only common Korean phrases, and these are collected manually, so the library cannot cover all Korean phrases. When a Korean phrase to be transliterated is absent from the phrase library, the segment sequence selected by matching degree is not the accurate phonetic notation of that phrase, and the accuracy of the transliteration result is low.
Disclosure of Invention
To solve the problem that current Korean transliteration technology produces results of low accuracy, embodiments of the invention provide a Korean transliteration method and device. The technical solution is as follows:
In a first aspect, a Korean transliteration method is provided, which includes:
splitting Korean information to obtain a plurality of Korean characters;
querying a character library for the phonetic notation segments corresponding to the Korean characters, wherein the character library stores the correspondence between Korean characters and phonetic notation segments;
and splicing the queried phonetic notation segments in the order of the Korean characters in the Korean information to obtain phonetic notation information corresponding to the Korean information.
Optionally, splitting the Korean information to obtain a plurality of Korean characters includes:
detecting whether, between two adjacent Korean characters in the Korean information, there is a Korean character to undergo sound change;
if there is a Korean character to undergo sound change, replacing it with the Korean character after sound change;
and obtaining the plurality of Korean characters corresponding to the Korean information according to the Korean character after sound change.
Optionally, detecting whether there is a Korean character to undergo sound change between two adjacent Korean characters in the Korean information includes:
splitting the Korean information into a plurality of groups of Korean phrases, using a predetermined marker as the splitting position, the predetermined marker including at least one of a space symbol and a punctuation symbol;
and detecting whether there is a Korean character to undergo sound change between two consecutive Korean characters in a Korean phrase.
Optionally, detecting whether there is a Korean character to undergo sound change between two consecutive Korean characters in the Korean phrase includes:
acquiring a first monosyllable sequence of a first Korean character and a second monosyllable sequence of a second Korean character, wherein the first Korean character and the second Korean character are two adjacent Korean characters in the Korean phrase;
extracting the tail syllable of the first monosyllable sequence and the head syllable of the second monosyllable sequence;
detecting whether the tail syllable and the head syllable belong to a sound-change syllable combination;
if the tail syllable and the head syllable belong to a sound-change syllable combination, determining that there is a Korean character to undergo sound change.
Optionally, replacing the Korean character to undergo sound change with the Korean character after sound change includes:
when the first Korean character is the Korean character to undergo sound change, changing the tail syllable of the first monosyllable sequence, recombining a third Korean character from the first monosyllable sequence after sound change, and replacing the first Korean character with the third Korean character; and/or
when the second Korean character is the Korean character to undergo sound change, changing the head syllable of the second monosyllable sequence, recombining a fourth Korean character from the second monosyllable sequence after sound change, and replacing the second Korean character with the fourth Korean character.
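The optional phrase-splitting step above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the name `split_into_phrases` is hypothetical, and treating every run of non-word characters as a "predetermined marker" is an assumption, since the text only says the marker includes spaces and punctuation without listing the symbols.

```python
import re

# Assumption: any run of whitespace or other non-word characters counts as
# a "predetermined marker"; the source does not list the exact symbols.
_MARKER = re.compile(r"\W+")


def split_into_phrases(korean_info):
    """Split Korean information into phrases at spaces and punctuation."""
    return [phrase for phrase in _MARKER.split(korean_info) if phrase]
```

Sound-change detection would then run only on adjacent characters inside each returned phrase, so a tail syllable is never paired with a head syllable across a space or punctuation mark.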
In a second aspect, a Korean transliteration device is provided, which includes:
a splitting module, configured to split Korean information to obtain a plurality of Korean characters;
a query module, configured to query a character library for the phonetic notation segments corresponding to the Korean characters, the character library storing the correspondence between Korean characters and phonetic notation segments;
and a splicing module, configured to splice the queried phonetic notation segments in the order of the Korean characters in the Korean information to obtain phonetic notation information corresponding to the Korean information.
Optionally, the splitting module includes:
a detection unit, a replacement unit and an obtaining unit;
the detection unit is configured to detect whether, between two adjacent Korean characters in the Korean information, there is a Korean character to undergo sound change;
the replacement unit is configured to replace the Korean character to undergo sound change with the Korean character after sound change, if such a character exists;
and the obtaining unit is configured to obtain the plurality of Korean characters corresponding to the Korean information according to the Korean character after sound change.
Optionally, the detection unit includes:
a splitting subunit and a detection subunit;
the splitting subunit is configured to split the Korean information into a plurality of groups of Korean phrases, using a predetermined marker as the splitting position, the predetermined marker including at least one of a space symbol and a punctuation symbol;
and the detection subunit is configured to detect whether there is a Korean character to undergo sound change between two consecutive Korean characters in a Korean phrase.
Optionally, the detection subunit is further configured to: obtain a first monosyllable sequence of a first Korean character and a second monosyllable sequence of a second Korean character, where the first Korean character and the second Korean character are two adjacent Korean characters in the Korean phrase; extract the tail syllable of the first monosyllable sequence and the head syllable of the second monosyllable sequence; detect whether the tail syllable and the head syllable belong to a sound-change syllable combination; and, if they do, determine that there is a Korean character to undergo sound change.
Optionally, the replacement unit includes:
a first replacement subunit and/or a second replacement subunit;
the first replacement subunit is configured to, when the first Korean character is the Korean character to undergo sound change, change the tail syllable of the first monosyllable sequence, recombine a third Korean character from the first monosyllable sequence after sound change, and replace the first Korean character with the third Korean character;
and the second replacement subunit is configured to, when the second Korean character is the Korean character to undergo sound change, change the head syllable of the second monosyllable sequence, recombine a fourth Korean character from the second monosyllable sequence after sound change, and replace the second Korean character with the fourth Korean character.
The technical solution provided by the embodiments of the invention has the following beneficial effects:
The correspondence between Korean characters and phonetic notation segments is stored in a character library in advance. Korean information is split to obtain a plurality of Korean characters, the phonetic notation segments corresponding to the Korean characters are queried from the character library, and the queried segments are spliced in the order of the Korean characters in the Korean information to obtain the corresponding phonetic notation information. Therefore, even when the Korean information to be transliterated contains a rare phrase, a new phrase popular on the Internet, or a coined phrase, the phonetic notation segment of every Korean character can still be found, so that phonetic notation is accurate and the accuracy of the transliteration result is improved.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a Korean transliteration method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a Korean transliteration method according to another embodiment of the present invention;
FIG. 3 is a schematic diagram of a Korean transliteration method according to another embodiment of the present invention;
FIG. 4 is a flowchart of a Korean transliteration method according to another embodiment of the present invention;
FIG. 5 is a block diagram of a Korean transliteration device according to an embodiment of the present invention;
FIG. 6 is a block diagram of a Korean transliteration device according to another embodiment of the present invention;
FIG. 7 is a block diagram of a terminal provided by an embodiment of the present invention;
FIG. 8 is a structural block diagram of a server according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Please refer to FIG. 1, which shows a flowchart of a Korean transliteration method according to an embodiment of the present invention. The method may be performed by a server or a terminal having Korean processing capability; the following embodiments take the server as the executing entity. The Korean transliteration method comprises the following steps:
Step 101, splitting the Korean information to obtain a plurality of Korean characters.
Optionally, the server acquires the Korean information to be transliterated and splits it to obtain a plurality of Korean characters. The Korean information is information whose character type is Korean; it may be a phrase, a sentence, a passage of text, or an article, which is not limited in this embodiment.
For example, the Korean information to be transliterated is [image: Korean text]. The server splits this Korean information into four Korean characters, [image: Korean characters] and [image: Korean character].
Step 102, querying a character library for the phonetic notation segments corresponding to the Korean characters, wherein the character library stores the correspondence between Korean characters and phonetic notation segments.
Optionally, since there are 11172 Korean characters in total, the server deconstructs each Korean character in advance, according to a predetermined encoding rule, into a corresponding monosyllable sequence. The monosyllable sequence includes at least one monosyllable (also called a monosyllable symbol or monosyllable stroke) constituting the Korean character. For each Korean character, the server generates the corresponding phonetic notation segment from its monosyllable sequence. Illustratively, the predetermined encoding rule is based on the Unicode character set, in which the code range of Korean characters is AC00 to D7AF and the code range of the monosyllables constituting Korean characters is 1100 to 11FF.
The predetermined encoding rule may also be based on the GB18030 character set, the UCS (Universal Character Set), or another character set that supports Korean characters; this embodiment does not limit the type of the predetermined encoding rule or the code range of Korean characters within it.
Optionally, the server establishes the character library in advance from all the Korean characters, the monosyllable sequence corresponding to each Korean character, and the corresponding phonetic notation segment. Illustratively, the phonetic notation type of a phonetic notation segment is Roman phonetic notation and/or Chinese-character phonetic notation; this embodiment does not limit the phonetic notation type.
Referring to Table 1, the character library established by the server stores the correspondence among Korean characters, monosyllable sequences, Roman phonetic notation, and Chinese-character phonetic notation.
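For the Unicode case, the deconstruction of a Korean character into its monosyllable sequence can be done with index arithmetic, because the 11172 precomposed characters starting at AC00 are laid out as 19 initial × 21 medial × 28 final (including "no final") combinations. The following Python sketch is illustrative only; the function name `decompose` is hypothetical, and the choice to emit the conjoining forms in the 1100 to 11FF range (which Unicode calls jamo) follows the range named in the text.

```python
HANGUL_BASE = 0xAC00
NUM_INITIALS, NUM_VOWELS, NUM_FINALS = 19, 21, 28  # 19 * 21 * 28 == 11172


def decompose(ch):
    """Return the monosyllable sequence of one precomposed Korean character."""
    index = ord(ch) - HANGUL_BASE
    if not 0 <= index < NUM_INITIALS * NUM_VOWELS * NUM_FINALS:
        raise ValueError(f"not a precomposed Korean character: {ch!r}")
    initial, rest = divmod(index, NUM_VOWELS * NUM_FINALS)
    vowel, final = divmod(rest, NUM_FINALS)
    sequence = [chr(0x1100 + initial), chr(0x1161 + vowel)]
    if final:  # final index 0 means the character has no tail consonant
        sequence.append(chr(0x11A7 + final))
    return sequence
```

The last element of the returned list is the "tail syllable" and the first element is the "head syllable" used later for sound-change detection.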
For example, when the Korean character is [image: Korean character], its monosyllable sequence is [image: monosyllable sequence], its Roman phonetic notation is "bek", and its Chinese-character phonetic notation is "back"; when the Korean character is [image: Korean character], its monosyllable sequence is [image: monosyllable sequence], its Roman phonetic notation is "ba", and its Chinese-character phonetic notation is "bar"; when the Korean character is [image: Korean character], its monosyllable sequence is [image: monosyllable sequence], its Roman phonetic notation is "da", and its Chinese-character phonetic notation is "answer".
Table 1
[image: table of Korean characters, monosyllable sequences, Roman phonetic notation, and Chinese-character phonetic notation]
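A character library of the kind shown in Table 1 could be generated programmatically, as in the Python sketch below. Everything in it is an assumption made for illustration: the function name `build_character_library` is hypothetical, only the Roman phonetic notation column is produced, and the per-monosyllable spellings loosely follow the Revised Romanization of Korean rather than the patent's own table, which is an image and whose spellings (for example "bek") evidently differ.

```python
HANGUL_BASE = 0xAC00

# Hypothetical Roman spellings per monosyllable (roughly Revised
# Romanization); the patent's own Table 1 values are not reproduced.
INITIALS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
            "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]
VOWELS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
          "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
FINALS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l", "l",
          "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t", "p", "t"]


def build_character_library():
    """Map each Korean character to (monosyllable sequence, Roman segment)."""
    library = {}
    for i, initial in enumerate(INITIALS):
        for v, vowel in enumerate(VOWELS):
            for f, final in enumerate(FINALS):
                code = HANGUL_BASE + (i * len(VOWELS) + v) * len(FINALS) + f
                sequence = chr(0x1100 + i) + chr(0x1161 + v)
                if f:  # index 0 means no tail consonant
                    sequence += chr(0x11A7 + f)
                library[chr(code)] = (sequence, initial + vowel + final)
    return library
```

Because the table is built once for all 11172 characters, a lookup at transliteration time never misses for a rare or newly coined phrase, which is the benefit the text attributes to a per-character library.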
For example, when the Korean characters obtained by the server are [image: Korean characters] and [image: Korean character], the server finds in the character library that the phonetic notation segment corresponding to the Korean character [image: Korean character] is "go", that corresponding to [image: Korean character] is "ma", that corresponding to [image: Korean character] is "wa", and that corresponding to [image: Korean character] is "yo".
For another example, when the Korean characters obtained by the server are [image: Korean characters] and [image: Korean character], the server finds in the character library that the phonetic notation segment corresponding to the Korean character [image: Korean character] is "groove", that corresponding to [image: Korean character] is "Dou", that corresponding to [image: Korean character] is "Wa", and that corresponding to [image: Korean character] is "C".
Step 103, splicing the queried phonetic notation segments in the order of the Korean characters in the Korean information to obtain phonetic notation information corresponding to the Korean information.
Optionally, the server splices the queried phonetic notation segments in the order of the Korean characters in the Korean information to obtain the phonetic notation information corresponding to the Korean information.
For example, according to the order of the Korean characters [image: Korean characters] and [image: Korean character] in the Korean information [image: Korean text], the server splices the phonetic notation segments "go", "ma", "wa" and "yo" corresponding to the four Korean characters, and obtains the phonetic notation information "go ma wa yo" corresponding to the Korean information [image: Korean text].
For another example, according to the order of the Korean characters [image: Korean characters] and [image: Korean character] in the Korean information [image: Korean text], the server splices the phonetic notation segments "groove", "Doma", "Wa" and "C" corresponding to the four Korean characters, and obtains the phonetic notation information "mojawa" corresponding to the Korean information [image: Korean text].
In summary, in this embodiment the Korean information is split to obtain a plurality of Korean characters, the phonetic notation segments corresponding to the Korean characters are queried from the character library, and the queried segments are spliced in the order of the Korean characters in the Korean information to obtain the corresponding phonetic notation information. Because the correspondence between Korean characters and phonetic notation segments is stored in the character library in advance, the phonetic notation segment of every Korean character can still be found when the Korean information to be transliterated contains a rare phrase, a new phrase popular on the Internet, or a coined phrase, so that phonetic notation is accurate and the accuracy of the transliteration result is improved.
In Korean, however, if within one Korean phrase the tail syllable of a Korean character and the head syllable of the next adjacent character form a sound-change syllable combination, then there is a Korean character to undergo sound change among the two: its actual pronunciation is not the phonetic notation segment of the character itself, but the phonetic notation segment of the Korean character after sound change.
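Steps 101 to 103 amount to a split, a per-character lookup, and a join. The Python sketch below shows this with a deliberately tiny hand-written library. The function name `transliterate`, the space separator, and the four Korean characters are assumptions chosen so that the output matches the "go ma wa yo" example; the actual Korean text of that example is an image in the source and is not taken from it.

```python
def transliterate(korean_info, library, separator=" "):
    """Split into characters, look each one up, and join in original order."""
    characters = [ch for ch in korean_info if not ch.isspace()]  # step 101
    segments = [library[ch] for ch in characters]                # step 102
    return separator.join(segments)                              # step 103


# Hypothetical four-entry character library, for illustration only.
SAMPLE_LIBRARY = {"고": "go", "마": "ma", "와": "wa", "요": "yo"}
```

Calling `transliterate("고마와요", SAMPLE_LIBRARY)` joins the four looked-up segments in input order. A character missing from the library raises `KeyError` here; how the method should treat non-Korean characters is not stated in the text.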
For example, in the Korean information [image: Korean text], the tail syllable of the first Korean character [image: Korean character] is [image: syllable], and the head syllable of the second Korean character [image: Korean character] is [image: syllable]. The tail syllable [image: syllable] and the head syllable [image: syllable] form the sound-change syllable combination [image: syllable combination]. According to the Korean sound-change rule, the tail syllable [image: syllable] of the Korean character to undergo sound change [image: Korean character] is replaced by [image: syllable], yielding the Korean character after sound change [image: Korean character]. That is, in [image: Korean text], the actual pronunciation of [image: Korean character] is no longer its own phonetic notation segment "bad" but the phonetic notation segment "ban" of the Korean character after sound change [image: Korean character]. To this end, the invention also provides the following embodiments.
Please refer to FIG. 2, which shows a flowchart of a Korean transliteration method according to another embodiment of the present invention. The method may be performed by a server or a terminal having Korean processing capability, and includes:
Step 201, detecting whether, between two adjacent Korean characters in the Korean information, there is a Korean character to undergo sound change.
Optionally, the server detects whether, between two adjacent Korean characters in the Korean information, there is a Korean character to undergo sound change.
In one possible implementation, the detection comprises the following steps:
1. The server obtains a first monosyllable sequence of a first Korean character and a second monosyllable sequence of a second Korean character, the first Korean character and the second Korean character being two adjacent Korean characters in the Korean information.
The first monosyllable sequence includes at least one monosyllable constituting the first Korean character, and the second monosyllable sequence includes at least one monosyllable constituting the second Korean character.
As shown in Table 1, the server establishes in advance a character library that stores the correspondence between Korean characters and monosyllable sequences. When the server obtains Korean information to be transliterated, it looks up in the character library the monosyllable sequence corresponding to each Korean character in the Korean information.
Illustratively, the server obtains the first monosyllable sequence [image: monosyllable sequence] of the first Korean character [image: Korean character] and the second monosyllable sequence [image: monosyllable sequence] of the second Korean character [image: Korean character].
2. The server extracts the tail syllable of the first monosyllable sequence and the head syllable of the second monosyllable sequence.
Illustratively, the server extracts the tail syllable [image: syllable] of the first monosyllable sequence [image: monosyllable sequence] and the head syllable [image: syllable] of the second monosyllable sequence [image: monosyllable sequence].
3. The server detects whether the tail syllable and the head syllable belong to a sound-change syllable combination.
Optionally, as shown in Table 2, the server stores in advance a sound-change syllable combination library created according to the sound-change rules of Korean, which stores the correspondence between sound-change syllable combinations and the syllable combinations after sound change. For example, when the sound-change syllable combination is [image: syllable combination], the corresponding syllable combination after sound change is [image: syllable combination]; when the sound-change syllable combination is [image: syllable combination], the corresponding syllable combination after sound change is [image: syllable combination]; when the sound-change syllable combination is [image: syllable combination], the corresponding syllable combination after sound change is [image: syllable combination].
Table 2
[image: table of sound-change syllable combinations and the corresponding syllable combinations after sound change]
Illustratively, based on the tail syllable [image: syllable] and the head syllable [image: syllable], the server queries the sound-change syllable combination library for the combination [image: syllable combination]. If the combination exists in the library, the server detects that [image: syllable combination] belongs to the sound-change syllable combinations.
4. If the tail syllable and the head syllable belong to a sound-change syllable combination, the server determines that there is a Korean character to undergo sound change.
When the tail syllable and the head syllable belong to a sound-change syllable combination, three cases are possible: only the tail syllable needs to change, so the first Korean character is the one to undergo sound change; only the head syllable needs to change, so the second Korean character is the one to undergo sound change; or both need to change, so both the first and the second Korean characters are to undergo sound change.
Illustratively, if the tail syllable [image: syllable] and the head syllable [image: syllable] belong to a sound-change syllable combination, the server determines that there is a Korean character to undergo sound change among the first Korean character [image: Korean character] and the second Korean character [image: Korean character].
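Detection steps 1 to 4 can be sketched in Python as below. The sketch makes several assumptions: the names `SOUND_CHANGE_COMBINATIONS` and `find_sound_change` are hypothetical; the library entries are a few standard Korean nasalization pairs picked for illustration, because Table 2 is an image whose entries are not reproduced in the text; and the monosyllable sequence is obtained with Unicode NFD normalization instead of a lookup in the character library.

```python
import unicodedata

# Conjoining forms from the 1100-11FF range: T_* are tail (final) consonants,
# L_* are head (initial) consonants.
T_GIYEOK, T_NIEUN, T_DIGEUT = "\u11a8", "\u11ab", "\u11ae"
T_MIEUM, T_BIEUP, T_IEUNG = "\u11b7", "\u11b8", "\u11bc"
L_NIEUN, L_MIEUM = "\u1102", "\u1106"

# Hypothetical sound-change combination library:
# (tail, head) before sound change -> (tail, head) after sound change.
SOUND_CHANGE_COMBINATIONS = {
    (T_DIGEUT, L_NIEUN): (T_NIEUN, L_NIEUN),
    (T_GIYEOK, L_NIEUN): (T_IEUNG, L_NIEUN),
    (T_GIYEOK, L_MIEUM): (T_IEUNG, L_MIEUM),
    (T_BIEUP, L_NIEUN): (T_MIEUM, L_NIEUN),
}


def find_sound_change(first, second):
    """Return the (tail, head) pair after sound change, or None if none applies."""
    first_sequence = unicodedata.normalize("NFD", first)    # step 1
    second_sequence = unicodedata.normalize("NFD", second)
    tail, head = first_sequence[-1], second_sequence[0]     # step 2
    return SOUND_CHANGE_COMBINATIONS.get((tail, head))      # steps 3 and 4
```

When the first character has no tail consonant, the last element of its sequence is a vowel, the pair is not in the library, and the function returns `None`.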
Step 202, if there is a Korean character to undergo sound change, replacing it with the Korean character after sound change.
Optionally, the server replaces the sound-change syllable combination with the syllable combination after sound change found in the library, and replaces the first Korean character and the second Korean character according to the syllable combination after sound change.
Optionally, the server compares the tail syllable of the sound-change syllable combination with the tail syllable of the syllable combination after sound change found in the library. If they are the same, the server determines that the first Korean character is not a Korean character to undergo sound change. If they differ, the server determines that the first Korean character is to undergo sound change, changes the tail syllable of the first monosyllable sequence, recombines a third Korean character from the first monosyllable sequence after sound change, and replaces the first Korean character with the third Korean character.
Optionally, the server compares the head syllable of the sound-change syllable combination with the head syllable of the syllable combination after sound change found in the library. If they are the same, the server determines that the second Korean character is not a Korean character to undergo sound change. If they differ, the server determines that the second Korean character is to undergo sound change, changes the head syllable of the second monosyllable sequence, recombines a fourth Korean character from the second monosyllable sequence after sound change, and replaces the second Korean character with the fourth Korean character.
For example, the first Korean character is
Figure BDA0001190362970000091
The second Korean character is
Figure BDA0001190362970000092
The syllable combination of the inflexion is as
Figure BDA0001190362970000093
Combined with the searched syllable after inflexion into
Figure BDA0001190362970000094
In contrast, the server determines the tail syllable
Figure BDA0001190362970000095
Different, but head syllables
Figure BDA0001190362970000096
If the first Korean character is the Korean character to be changed, the second Korean character is not the Korean character to be changed. For the tail syllable of the first monosyllabic sequence
Figure BDA0001190362970000097
the server performs inflection and, according to the inflected first monosyllabic sequence
Figure BDA0001190362970000098
Recombining the third Korean character
Figure BDA0001190362970000099
Using the third Korean character
Figure BDA00011903629700000910
Replace the first Korean character
Figure BDA00011903629700000911
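The replacement step above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the names `decompose`, `compose`, `INFLECTION_LIBRARY`, and `apply_inflection` are hypothetical, the library holds a single rule (tail ㅂ followed by head ㄴ is pronounced as ㅁ followed by ㄴ), and the sample characters are chosen for illustration because the characters in the figures above are not recoverable here. The decomposition relies on the Unicode Hangul syllable arithmetic, which is an assumption about how the monosyllabic sequence could be obtained; the text itself obtains it from a character library.

```python
# Unicode Hangul syllables: code point = 0xAC00 + (lead * 21 + vowel) * 28 + tail
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAILS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
         "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]

# Hypothetical inflection library: (tail, head) -> (tail, head) after inflection.
INFLECTION_LIBRARY = {("ㅂ", "ㄴ"): ("ㅁ", "ㄴ")}


def decompose(ch):
    """Return the monosyllabic sequence [head, vowel, tail] of one Hangul syllable."""
    index = ord(ch) - 0xAC00
    if not 0 <= index < 19 * 21 * 28:
        raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
    lead, rest = divmod(index, 21 * 28)
    vowel, tail = divmod(rest, 28)
    return [LEADS[lead], VOWELS[vowel], TAILS[tail]]


def compose(seq):
    """Recombine a [head, vowel, tail] sequence into one Hangul syllable."""
    lead, vowel, tail = seq
    return chr(0xAC00 + (LEADS.index(lead) * 21 + VOWELS.index(vowel)) * 28
               + TAILS.index(tail))


def apply_inflection(first, second):
    """Replace whichever of two adjacent characters changes under inflection."""
    seq1, seq2 = decompose(first), decompose(second)
    before = (seq1[2], seq2[0])
    after = INFLECTION_LIBRARY.get(before)
    if after is None:           # not an inflected syllable combination
        return first, second
    if after[0] != before[0]:   # tail differs: the first character is replaced
        seq1[2] = after[0]
        first = compose(seq1)
    if after[1] != before[1]:   # head differs: the second character is replaced
        seq2[0] = after[1]
        second = compose(seq2)
    return first, second
```

With this single rule, `apply_inflection("합", "니")` replaces only the first character, mirroring the case above in which the tail syllables differ and the head syllables are the same.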
Step 203, obtaining a plurality of Korean characters corresponding to the Korean information according to the changed Korean characters.
Optionally, the server obtains the plurality of Korean characters corresponding to the Korean information according to the inflected Korean character. Illustratively, based on the third Korean character
Figure BDA00011903629700000912
Obtaining and compiling Korean information
Figure BDA00011903629700000913
A plurality of corresponding Korean characters are respectively
Figure BDA00011903629700000914
And
Figure BDA00011903629700000915
Step 204, searching phonetic notation segments corresponding to the Korean characters from a character library, wherein the character library stores the corresponding relation between the Korean characters and the phonetic notation segments.
Optionally, the server queries phonetic notation segments corresponding to the korean characters from a character library, wherein the character library stores the corresponding relationship between the korean characters and the phonetic notation segments; the phonetic notation type of the phonetic notation fragment includes at least one of a roman phonetic notation type and a chinese phonetic notation type, and may further include other non-korean phonetic notation types, which is not limited in this embodiment.
Step 205, splicing the searched phonetic notation segments according to the sequence of the Korean characters in the Korean information to obtain phonetic notation information corresponding to the Korean information.
Optionally, the server splices the searched phonetic notation fragments according to the sequence of the korean characters in the korean information to obtain phonetic notation information corresponding to the korean information.
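Steps 204 and 205 amount to a per-character table lookup followed by an ordered join, which the sketch below illustrates. The table contents, the `"roman"` key, the space separator, and the function name `annotate` are assumptions made for this example; the text only states that the character library maps Korean characters to phonetic notation segments of one or more notation types.

```python
# Hypothetical character library: notation type -> {Korean character: segment}.
CHARACTER_LIBRARY = {
    "roman": {"한": "han", "국": "guk", "어": "eo"},
}


def annotate(characters, notation_type="roman"):
    """Look up each character's segment and splice the segments in order."""
    table = CHARACTER_LIBRARY[notation_type]
    # A character missing from the library raises KeyError in this sketch.
    segments = [table[ch] for ch in characters]
    return " ".join(segments)


print(annotate(["한", "국", "어"]))  # han guk eo
```

Because the lookup is per character, the result does not depend on whether the whole phrase is known in advance, which is the property the summary below relies on.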
Optionally, the server sends the phonetic notation information to the terminal after obtaining the phonetic notation information corresponding to the korean information; correspondingly, the terminal receives the phonetic notation information, automatically determines the phonetic notation type or receives the phonetic notation type selected by the user, and outputs the phonetic notation information corresponding to the Korean information according to the phonetic notation type.
In summary, in the embodiment, the korean information is split to obtain a plurality of korean characters, phonetic notation fragments corresponding to the korean characters are searched from the character library, and the searched phonetic notation fragments are spliced according to the sequence of the korean characters in the korean information to obtain phonetic notation information corresponding to the korean information; because the corresponding relation between the Korean characters and the phonetic transcription segments is stored in the character library in advance, when the Korean information to be transliterated comprises a rare phrase or a new network popular phrase or an artificial phrase, the phonetic transcription segment corresponding to each Korean character in the Korean information can be still inquired, so that accurate phonetic transcription is performed, and the accuracy of the transliteration result is improved.
The embodiment also comprises the steps of detecting whether Korean characters to be subjected to sound variation exist between two adjacent Korean characters in the Korean information, replacing the Korean characters to be subjected to sound variation with the Korean characters subjected to sound variation if the Korean characters to be subjected to sound variation exist, and obtaining a plurality of Korean characters corresponding to the Korean information according to the Korean characters subjected to sound variation; therefore, when the Korean information contains the Korean character needing sound change, the Korean character to be sound changed can be replaced by the Korean character after sound change, and the phonetic notation segment corresponding to the Korean character after sound change is inquired from the word stock, so that the actual pronunciation of the Korean character to be sound changed is accurately marked.
In the embodiment, the phonetic notation type comprises at least one of a roman phonetic notation type and a Chinese phonetic notation type, the terminal receives the phonetic notation information, automatically determines the phonetic notation type or receives the phonetic notation type selected by the user, and outputs the phonetic notation information corresponding to the Korean information according to the phonetic notation type; the terminal can selectively output the phonetic notation information according to the determined phonetic notation type, and the flexibility of the phonetic notation mode is improved.
In a specific example, as shown in fig. 3, the server acquires korean information
Figure BDA0001190362970000101
According to the character library, the server deconstructs the Korean information
Figure BDA0001190362970000102
to obtain the Korean character
Figure BDA0001190362970000103
A single syllable sequence of
Figure BDA0001190362970000104
Figure BDA0001190362970000105
Korean characters
Figure BDA0001190362970000106
A single syllable sequence of
Figure BDA0001190362970000107
And Korean characters
Figure BDA0001190362970000108
A single syllable sequence of
Figure BDA0001190362970000109
The server then extracts the monosyllabic sequence
Figure BDA00011903629700001010
Tail syllable of
Figure BDA00011903629700001011
And monosyllabic sequences
Figure BDA00011903629700001012
the head syllable
Figure BDA00011903629700001013
Using the inflected syllable combination library, the server detects that
Figure BDA00011903629700001014
Belongs to the syllable combination of inflexion, and the syllable combination after inquiring the corresponding inflexion is
Figure BDA00011903629700001015
Server re-extracts monosyllabic sequences
Figure BDA00011903629700001016
Tail syllable of
Figure BDA00011903629700001017
And monosyllabic sequences
Figure BDA00011903629700001018
the head syllable
Figure BDA00011903629700001019
Detecting from a library of inflexion syllable combinations
Figure BDA00011903629700001020
Not belonging to inflexion syllable combination, namely, inflexion is not needed; then, the server processes the Korean characters
Figure BDA00011903629700001021
Corresponding tail syllable
Figure BDA00011903629700001022
Performing sound change, based on the changed monosyllabic sequence
Figure BDA00011903629700001023
Recombining to obtain Korean characters
Figure BDA00011903629700001024
Using Korean characters
Figure BDA00011903629700001025
Substitution of Korean characters
Figure BDA00011903629700001026
Obtaining and compiling Korean information
Figure BDA00011903629700001027
A plurality of corresponding Korean characters for phonetic notation are respectively
Figure BDA00011903629700001028
And
Figure BDA00011903629700001029
Finally, the server searches the character library for Korean characters
Figure BDA00011903629700001030
The phonetic segment is "ban" and Korean
Figure BDA00011903629700001031
The phonetic segment is "nen" and Korean
Figure BDA00011903629700001032
The phonetic notation fragment is 'da', and the inquired phonetic notation fragments 'ban', 'nen' and 'da' are spliced in sequence to obtain the Korean information
Figure BDA00011903629700001033
Corresponding phonetic notation information "ban nen da".
In some possible cases, when the Korean information to be transliterated includes an identifier that is not a Korean character, such as a space symbol or a punctuation symbol, between two Korean characters, the server does not perform the step of detecting whether there is a Korean character to be inflected between those two Korean characters. Even if the tail syllable of the first monosyllabic sequence and the head syllable of the second monosyllabic sequence corresponding to the two Korean characters belong to an inflected syllable combination, no replacement is performed.
Please refer to fig. 4, which illustrates a flowchart of a korean transliteration method according to another embodiment of the present invention. The korean transliteration method may be performed by a server or a terminal having korean processing capability, and includes:
Step 401, splitting the Korean information into a plurality of groups of Korean phrases, using a predetermined identifier as the splitting position. The predetermined identifier includes at least one of a space symbol and a punctuation symbol.
Optionally, the server sets the predetermined identifier in advance. When the server acquires the Korean information to be transliterated, it splits the Korean information into a plurality of groups of Korean phrases according to a predetermined encoding rule, using the predetermined identifier as the splitting position. Illustratively, the predetermined encoding rule is based on the Unicode character set, in which the space symbol and the punctuation symbols occupy the encoding interval 4000 to 403F. This embodiment does not limit the type of the predetermined encoding rule or the encoding range of the space symbol and punctuation symbols within it.
Optionally, the predetermined identifier further comprises at least one of characters of other languages, graphic symbols, mathematical symbols, and control symbols.
For example, the Korean information to be transliterated is
Figure BDA0001190362970000111
The server detects punctuation marks in the Korean information, and takes the punctuation marks as splitting positions to split the Korean information into two groups of Korean phrases which are respectively Korean phrases
Figure BDA0001190362970000112
And
Figure BDA0001190362970000113
As another example, the Korean information to be transliterated is
Figure BDA0001190362970000114
The server detects the space symbol in the Korean information, and divides the Korean information into two groups of Korean phrases with the space symbol as the dividing position
Figure BDA0001190362970000115
And
Figure BDA0001190362970000116
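The splitting in step 401 can be sketched as below. This is an illustration under assumptions: instead of a fixed encoding interval, it treats any character whose Unicode general category is a separator (`Z*`) or punctuation (`P*`) as a predetermined identifier, using Python's standard `unicodedata` module. The function names and the sample string are hypothetical and are not the examples shown in the figures.

```python
import unicodedata
from itertools import groupby


def is_predetermined_identifier(ch):
    """Treat Unicode separators (Z*) and punctuation (P*) as splitting positions."""
    category = unicodedata.category(ch)
    return category.startswith("Z") or category.startswith("P")


def split_into_phrases(korean_information):
    """Split the text at runs of identifiers and keep only the Korean phrases."""
    return [
        "".join(group)
        for is_identifier, group in groupby(korean_information,
                                            key=is_predetermined_identifier)
        if not is_identifier
    ]


print(split_into_phrases("안녕, 한국어 사랑"))  # ['안녕', '한국어', '사랑']
```

Grouping consecutive identifiers into one run means that a comma followed by a space yields a single splitting position rather than an empty phrase.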
Optionally, after the server splits the Korean information into several groups of Korean phrases, steps 402 to 406 are performed for each group of Korean phrases; for specific details, refer to the embodiment provided in fig. 2.
Step 402, detecting whether there is a Korean character to be inflected between two adjacent Korean characters in the Korean phrase.
Alternatively, the detecting step can be implemented as several steps as follows:
1. The server obtains a first monosyllabic sequence of a first Korean character and a second monosyllabic sequence of a second Korean character, wherein the first Korean character and the second Korean character are two adjacent Korean characters in a Korean phrase.
2. The server extracts the tail syllable of the first monosyllabic sequence and the head syllable of the second monosyllabic sequence.
3. The server detects whether the tail and head syllables belong to inflected syllable combinations.
4. If the tail syllable and the head syllable belong to the inflexion syllable combination, the server determines that there is a Korean character to be inflexion.
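The four detection steps above can be sketched as a scan over adjacent character pairs. The sketch assumes that every character in the phrase is a precomposed Hangul syllable and that tail and head syllables can be read off the Unicode code point; `INFLECTED_COMBINATIONS` is a hypothetical one-entry stand-in for the inflected syllable combination library, and the function names are illustrative.

```python
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
TAILS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
         "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]

# Hypothetical library of (tail, head) pairs that trigger inflection.
INFLECTED_COMBINATIONS = {("ㅂ", "ㄴ")}


def head_of(ch):
    """Head syllable of a precomposed Hangul syllable (21 vowels * 28 tails = 588)."""
    return LEADS[(ord(ch) - 0xAC00) // 588]


def tail_of(ch):
    """Tail syllable of a precomposed Hangul syllable ('' when there is none)."""
    return TAILS[(ord(ch) - 0xAC00) % 28]


def find_inflection_pairs(phrase):
    """Return each index i where phrase[i] and phrase[i + 1] form an inflected combination."""
    return [
        i
        for i in range(len(phrase) - 1)
        if (tail_of(phrase[i]), head_of(phrase[i + 1])) in INFLECTED_COMBINATIONS
    ]


print(find_inflection_pairs("합니다"))  # [0]
```

Running the scan per phrase, after splitting, keeps characters separated by a space or punctuation symbol from ever being compared with each other.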
Step 403, replacing the korean characters to be voice-changed with the voice-changed korean characters.
Optionally, when the first korean character is a korean character to be inflected, the server inflects a tail syllable of the first monosyllabic sequence, recombines a third korean character according to the inflected first monosyllabic sequence, and replaces the first korean character with the third korean character.
Optionally, when the second Korean character is a Korean character to be inflected, the server inflects the head syllable of the second monosyllabic sequence, recombines a fourth Korean character according to the inflected second monosyllabic sequence, and replaces the second Korean character with the fourth Korean character.
Step 404, obtaining a plurality of Korean characters corresponding to the Korean information according to the changed Korean characters.
Optionally, the server obtains a plurality of korean characters corresponding to the korean information according to the vocalized korean characters.
Step 405, searching phonetic notation segments corresponding to the Korean characters from a character library, wherein the character library stores the corresponding relation between the Korean characters and the phonetic notation segments.
Optionally, the server searches a phonetic notation fragment corresponding to the korean character from a character library, and the character library stores a corresponding relationship between the korean character and the phonetic notation fragment.
Step 406, splicing the searched phonetic notation segments according to the sequence of the Korean characters in the Korean information to obtain phonetic notation information corresponding to the Korean information.
Optionally, the server splices the phonetic notation segments found for each group of Korean phrases according to the sequence of the Korean characters in that phrase, and then splices the resulting per-phrase phonetic notation according to the sequence of the Korean phrases in the Korean information, to obtain the phonetic notation information corresponding to the Korean information.
In summary, in the embodiment, the korean information is split to obtain a plurality of korean characters, phonetic notation fragments corresponding to the korean characters are searched from the character library, and the searched phonetic notation fragments are spliced according to the sequence of the korean characters in the korean information to obtain phonetic notation information corresponding to the korean information; because the corresponding relation between the Korean characters and the phonetic transcription segments is stored in the character library in advance, when the Korean information to be transliterated comprises a rare phrase or a new network popular phrase or an artificial phrase, the phonetic transcription segment corresponding to each Korean character in the Korean information can be still inquired, so that accurate phonetic transcription is performed, and the accuracy of the transliteration result is improved.
The embodiment also splits the Korean information into a plurality of groups of Korean phrases, using the predetermined identifier as the splitting position, and detects whether there is a Korean character to be inflected between two adjacent Korean characters within each Korean phrase. When the Korean information contains the predetermined identifier, it is split at that identifier, so that two Korean characters that belong to adjacent Korean phrases and are separated by the predetermined identifier are not inflected against each other, and the obtained phonetic notation information is therefore more accurate.
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.
Please refer to fig. 5, which illustrates a schematic structural diagram of a korean transliteration apparatus according to an embodiment of the present invention. The Korean transliteration device includes:
the splitting module 520 is configured to split the korean information to obtain a plurality of korean characters;
the query module 540 is configured to query phonetic notation segments corresponding to the korean characters from a character library, where the character library stores a corresponding relationship between the korean characters and the phonetic notation segments;
and the splicing module 560 is configured to splice the searched phonetic notation fragments according to the sequence of the korean characters in the korean information to obtain phonetic notation information corresponding to the korean information.
In summary, in the embodiment, the korean information is split to obtain a plurality of korean characters, phonetic notation fragments corresponding to the korean characters are searched from the character library, and the searched phonetic notation fragments are spliced according to the sequence of the korean characters in the korean information to obtain phonetic notation information corresponding to the korean information; because the corresponding relation between the Korean characters and the phonetic transcription segments is stored in the character library in advance, when the Korean information to be transliterated comprises a rare phrase or a new network popular phrase or an artificial phrase, the phonetic transcription segment corresponding to each Korean character in the Korean information can be still inquired, so that accurate phonetic transcription is performed, and the accuracy of the transliteration result is improved.
Please refer to fig. 6, which illustrates a schematic structural diagram of a korean transliteration apparatus according to another embodiment of the present invention. The Korean transliteration device includes:
a splitting module 520 comprising:
a detection unit 521, a replacement unit 522, and an obtaining unit 523;
a detecting unit 521, configured to detect whether there is a korean character to be vocalized between two adjacent korean characters in the korean information;
a replacing unit 522, configured to replace the korean characters to be voice-changed with the voice-changed korean characters if there are korean characters to be voice-changed;
an obtaining unit 523, configured to obtain a plurality of Korean characters corresponding to the Korean information according to the inflected Korean characters.
A detection unit 521, including:
a splitting subunit 521a and a detecting subunit 521b;
a splitting subunit 521a, configured to split the korean information into a plurality of groups of korean phrases by using the predetermined identifier as a splitting position; the predetermined mark comprises at least one of a space symbol and a punctuation symbol;
the detecting subunit 521b is configured to detect whether there is a korean character to be vocalized between two consecutive korean characters in the korean phrase.
The detecting subunit 521b is further configured to obtain a first monosyllabic sequence of the first Korean character and a second monosyllabic sequence of the second Korean character, where the first Korean character and the second Korean character are two adjacent Korean characters in the Korean phrase; extract the tail syllable of the first monosyllabic sequence and the head syllable of the second monosyllabic sequence; detect whether the tail syllable and the head syllable belong to an inflected syllable combination; and, if the tail syllable and the head syllable belong to an inflected syllable combination, determine that there is a Korean character to be inflected.
A replacement unit 522, comprising:
a first replacement subunit 522a and/or a second replacement subunit 522b;
a first replacement subunit 522a, configured to, when the first Korean character is a Korean character to be inflected, inflect the tail syllable of the first monosyllabic sequence, recombine a third Korean character according to the inflected first monosyllabic sequence, and replace the first Korean character with the third Korean character;
a second replacement subunit 522b, configured to, when the second Korean character is a Korean character to be inflected, inflect the head syllable of the second monosyllabic sequence, recombine a fourth Korean character according to the inflected second monosyllabic sequence, and replace the second Korean character with the fourth Korean character.
In summary, in the embodiment, the korean information is split to obtain a plurality of korean characters, phonetic notation fragments corresponding to the korean characters are searched from the character library, and the searched phonetic notation fragments are spliced according to the sequence of the korean characters in the korean information to obtain phonetic notation information corresponding to the korean information; because the corresponding relation between the Korean characters and the phonetic transcription segments is stored in the character library in advance, when the Korean information to be transliterated comprises a rare phrase or a new network popular phrase or an artificial phrase, the phonetic transcription segment corresponding to each Korean character in the Korean information can be still inquired, so that accurate phonetic transcription is performed, and the accuracy of the transliteration result is improved.
Referring to fig. 7, a block diagram of a terminal 700 according to an embodiment of the invention is shown. Specifically, the device 700 may include RF (Radio Frequency) circuitry 710, memory 720 including one or more computer-readable storage media, input unit 730, display unit 740, sensors 750, audio circuitry 760, WiFi (wireless fidelity) module 770, processor 780 including one or more processing cores, and power supply 790. Those skilled in the art will appreciate that the configuration shown in fig. 7 does not limit the device, which may include more or fewer components than those shown, combine some components, or arrange the components differently. Wherein:
RF circuit 710 may be used for receiving and transmitting signals during a message transmission or call; in particular, it receives downlink information from a base station and passes it to one or more processors 780 for processing, and it transmits uplink data to the base station. In general, RF circuit 710 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuit 710 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), email, SMS (Short Messaging Service), etc.
Memory 720 may be used to store software programs and modules. The processor 780 executes various functional applications and data processing by executing software programs and modules stored in the memory 720. The memory 720 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the device 700, and the like. Further, the memory 720 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
Accordingly, memory 720 may also include a memory controller to provide access to memory 720 by processor 780 and input unit 730.
The input unit 730 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, the input unit 730 may include a touch-sensitive surface 731 as well as other input devices 732. Touch-sensitive surface 731, also referred to as a touch display screen or touch pad, can collect touch operations by a user on or near touch-sensitive surface 731 (e.g., operations by a user on or near touch-sensitive surface 731 using a finger, stylus, or any other suitable object or attachment) and drive the corresponding connection device according to a predetermined program. Alternatively, the touch sensitive surface 731 may comprise two parts, a touch detection means and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts it to touch point coordinates, and sends the touch point coordinates to the processor 780, and can receive and execute commands from the processor 780. In addition, the touch-sensitive surface 731 can be implemented in a variety of types, including resistive, capacitive, infrared, and surface acoustic wave. The input unit 730 may also include other input devices 732 in addition to the touch-sensitive surface 731. In particular, other input devices 732 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.
The display unit 740 may be used to display information input by or provided to the user, as well as various graphical user interfaces of the device 700, which may be made up of graphics, text, icons, video, and any combination thereof. The display unit 740 may include a display panel 741, and optionally, the display panel 741 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, touch-sensitive surface 731 can be overlaid on display panel 741, such that when touch-sensitive surface 731 detects a touch operation on or near it, the operation is passed to processor 780 to determine the type of touch event, and processor 780 then provides a corresponding visual output on display panel 741 based on the type of touch event. Although in fig. 7 the touch-sensitive surface 731 and the display panel 741 are implemented as two separate components to implement input and output functions, in some embodiments the touch-sensitive surface 731 and the display panel 741 may be integrated to implement input and output functions.
The device 700 may also include at least one sensor 750, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that may adjust the brightness of the display panel 741 according to the brightness of ambient light, and a proximity sensor that may turn off the display panel 741 and/or a backlight when the device 700 is moved to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when the mobile phone is stationary, and can be used for applications of recognizing the posture of the mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which may be further configured to the device 700, the detailed description is omitted here.
Audio circuitry 760, speaker 721, and microphone 722 may provide an audio interface between a user and device 700. The audio circuit 760 may convert received audio data into an electrical signal and transmit it to the speaker 721, which converts the electrical signal into a sound signal for output; conversely, the microphone 722 converts collected sound signals into electrical signals, which the audio circuit 760 receives and converts into audio data. The audio data is then output to the processor 780 for processing and either sent via the RF circuit 710 to, for example, another device, or output to the memory 720 for further processing. The audio circuitry 760 may also include an earbud jack to provide communication between a peripheral headset and the device 700.
WiFi belongs to short-range wireless transmission technology, and the device 700 can help the user send and receive e-mail, browse web pages, access streaming media, etc. through the WiFi module 770, which provides wireless broadband internet access for the user. Although fig. 7 shows WiFi module 770, it is understood that it does not belong to the essential constitution of device 700 and may be omitted entirely as needed within the scope not changing the essence of the invention.
The processor 780 is the control center for the device 700, connects the various parts of the overall device using various interfaces and lines, and performs various functions of the device 700 and processes data by running or executing software programs and/or modules stored in the memory 720 and calling up data stored in the memory 720, thereby monitoring the device as a whole. Alternatively, processor 780 may include one or more processing cores; alternatively, processor 780 may integrate an application processor that handles primarily the operating system, user interface, applications, etc. and a modem processor that handles primarily wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 780.
The device 700 also includes a power supply 790 (e.g., a battery) for powering the various components, which may preferably be logically coupled to the processor 780 via a power management system that may be used to manage charging, discharging, and power consumption. The power supply 790 may also include any component including one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
Although not shown, the device 700 may also include a camera, a bluetooth module, etc., which are not described in detail herein.
The apparatus 700 also includes a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, such that the device 700 is capable of performing the aforementioned method of transliteration of korean performed by the terminal.
Referring to fig. 8, a structural framework diagram of a server according to an embodiment of the present invention is shown. Specifically, the server 800 includes a Central Processing Unit (CPU) 801, a system memory 804 including a Random Access Memory (RAM) 802 and a Read Only Memory (ROM) 803, and a system bus 805 connecting the system memory 804 and the central processing unit 801. The server 800 also includes a basic input/output system (I/O system) 806, which facilitates transfer of information between devices within the computer, and a mass storage device 807 for storing an operating system 813, application programs 814, and other program modules 815.
The basic input/output system 806 includes a display 808 for displaying information and an input device 809 such as a mouse, keyboard, etc. for user input of information. Wherein the display 808 and the input device 809 are connected to the central processing unit 801 through an input output controller 810 connected to the system bus 805. The basic input/output system 806 may also include an input/output controller 810 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input-output controller 810 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 807 is connected to the central processing unit 801 through a mass storage controller (not shown) connected to the system bus 805. The mass storage device 807 and its associated computer-readable media provide non-volatile storage for the server 800. That is, the mass storage device 807 may include a computer-readable medium (not shown) such as a hard disk or CD-ROM drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 804 and mass storage 807 described above may be collectively referred to as memory.
According to various embodiments of the invention, the server 800 may also operate through a remote computer connected to a network, such as the Internet. That is, the server 800 may be connected to the network 812 through the network interface unit 811 coupled to the system bus 805, or may be connected to other types of networks or remote computer systems (not shown) using the network interface unit 811.
The memory further stores one or more programs, the one or more programs including instructions for performing the steps executed by the server in the Korean transliteration method provided by the embodiments of the present invention.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (4)

1. A Korean transliteration method, the method comprising:
splitting Korean information into a plurality of groups of Korean phrases by taking a preset mark as a splitting position, wherein the preset mark comprises at least one of a space and a punctuation mark;
acquiring a first monosyllabic sequence of a first Korean character and a second monosyllabic sequence of a second Korean character, extracting a tail syllable of the first monosyllabic sequence and a head syllable of the second monosyllabic sequence, detecting whether the tail syllable and the head syllable belong to a sound-change syllable combination, and determining that a Korean character to be sound-changed exists if the tail syllable and the head syllable belong to the sound-change syllable combination, wherein each Korean character is composed of a monosyllabic sequence, and the first Korean character and the second Korean character are two adjacent Korean characters in a Korean phrase;
if a Korean character to be sound-changed exists, replacing the Korean character to be sound-changed with the sound-changed Korean character;
obtaining a plurality of Korean characters corresponding to the Korean information according to the sound-changed Korean characters;
searching a character library for the Roman phonetic notation and Chinese phonetic notation corresponding to each monosyllabic sequence in the Korean characters, wherein the character library stores the correspondence among the Korean characters, the monosyllabic sequences, the Roman phonetic notation and the Chinese phonetic notation;
and splicing the retrieved Roman phonetic notation and Chinese phonetic notation respectively, according to the order of the monosyllabic sequence within the Korean characters and the order of the Korean characters within the Korean information, to obtain phonetic notation information corresponding to the Korean information.
2. The method according to claim 1, wherein the replacing the Korean character to be sound-changed with the sound-changed Korean character comprises:
when the first Korean character is the Korean character to be sound-changed, applying the sound change to the tail syllable of the first monosyllabic sequence, recombining a third Korean character from the sound-changed first monosyllabic sequence, and replacing the first Korean character with the third Korean character; and/or,
when the second Korean character is the Korean character to be sound-changed, applying the sound change to the head syllable of the second monosyllabic sequence, recombining a fourth Korean character from the sound-changed second monosyllabic sequence, and replacing the second Korean character with the fourth Korean character.
3. A Korean transliteration device, the device comprising:
a splitting module comprising: a detection unit, a replacement unit and an obtaining unit;
the detection unit includes: a splitting subunit and a detecting subunit;
the splitting subunit is configured to split Korean information into a plurality of groups of Korean phrases by taking a preset mark as a splitting position, wherein the preset mark comprises at least one of a space and a punctuation mark;
the detecting subunit is configured to acquire a first monosyllabic sequence of a first Korean character and a second monosyllabic sequence of a second Korean character, extract a tail syllable of the first monosyllabic sequence and a head syllable of the second monosyllabic sequence, detect whether the tail syllable and the head syllable belong to a sound-change syllable combination, and determine that a Korean character to be sound-changed exists if the tail syllable and the head syllable belong to the sound-change syllable combination, wherein each Korean character is composed of a monosyllabic sequence, and the first Korean character and the second Korean character are two adjacent Korean characters in a Korean phrase;
the replacement unit is configured to, if a Korean character to be sound-changed exists, replace the Korean character to be sound-changed with the sound-changed Korean character;
the obtaining unit is configured to obtain a plurality of Korean characters corresponding to the Korean information according to the sound-changed Korean characters;
a query module, configured to query a character library for the Roman phonetic notation and Chinese character phonetic notation corresponding to each monosyllabic sequence in the Korean characters, wherein the character library stores the correspondence among the Korean characters, the monosyllabic sequences, the Roman phonetic notation and the Chinese character phonetic notation;
and a splicing module, configured to splice the retrieved Roman phonetic notation and Chinese phonetic notation respectively, according to the order of the monosyllabic sequence within the Korean characters and the order of the Korean characters within the Korean information, to obtain phonetic notation information corresponding to the Korean information.
4. The device according to claim 3, wherein the replacement unit comprises:
a first replacement subunit and/or a second replacement subunit;
the first replacement subunit is configured to, when the first Korean character is the Korean character to be sound-changed, apply the sound change to the tail syllable of the first monosyllabic sequence, recombine a third Korean character from the sound-changed first monosyllabic sequence, and replace the first Korean character with the third Korean character;
and the second replacement subunit is configured to, when the second Korean character is the Korean character to be sound-changed, apply the sound change to the head syllable of the second monosyllabic sequence, recombine a fourth Korean character from the sound-changed second monosyllabic sequence, and replace the second Korean character with the fourth Korean character.
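
As an illustration only, the claimed flow (split into phrases, detect a sound-change combination between adjacent characters, recombine the changed characters, look up notation, splice) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation. The sound-change table and the Roman notation tables are small hypothetical stand-ins for the character library, all function and table names are invented for this example, and the Chinese phonetic notation lookup, which would follow the same pattern, is omitted. The input is assumed to contain only precomposed Hangul syllables, whitespace, and the listed punctuation marks. Decomposition and recomposition use the standard Unicode arithmetic for Hangul syllables.

```python
import re

# Jamo orderings used by the Unicode Hangul syllable block (U+AC00..U+D7A3).
HEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAILS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
         "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]

# Hypothetical sound-change combinations: (tail, head) -> (new tail, new head).
# Only a few liaison and nasalization cases are listed for illustration.
SOUND_CHANGES = {
    ("ㄱ", "ㅇ"): ("", "ㄱ"),
    ("ㄴ", "ㅇ"): ("", "ㄴ"),
    ("ㄹ", "ㅇ"): ("", "ㄹ"),
    ("ㅁ", "ㅇ"): ("", "ㅁ"),
    ("ㅂ", "ㄴ"): ("ㅁ", "ㄴ"),
}

# Hypothetical, partial Roman notation tables standing in for the character library.
ROMAN_HEAD = {"ㄱ": "g", "ㄴ": "n", "ㄹ": "r", "ㅁ": "m", "ㅂ": "b", "ㅇ": ""}
ROMAN_VOWEL = {"ㅏ": "a", "ㅣ": "i", "ㅡ": "eu"}
ROMAN_TAIL = {"": "", "ㄱ": "k", "ㄴ": "n", "ㄹ": "l", "ㅁ": "m", "ㅂ": "p"}


def decompose(ch):
    """Return the [head, vowel, tail] sequence of one precomposed Hangul syllable."""
    idx = ord(ch) - 0xAC00
    return [HEADS[idx // 588], VOWELS[idx % 588 // 28], TAILS[idx % 28]]


def compose(head, vowel, tail):
    """Recombine a [head, vowel, tail] sequence into one Hangul syllable."""
    return chr(0xAC00 + (HEADS.index(head) * 21 + VOWELS.index(vowel)) * 28
               + TAILS.index(tail))


def apply_sound_changes(phrase):
    """Replace adjacent characters whose tail/head pair is a sound-change combination."""
    seqs = [decompose(ch) for ch in phrase]
    for first, second in zip(seqs, seqs[1:]):
        change = SOUND_CHANGES.get((first[2], second[0]))
        if change:
            first[2], second[0] = change
    return "".join(compose(*seq) for seq in seqs)


def annotate(text):
    """Split on spaces/punctuation, apply sound changes per phrase, splice Roman notation."""
    phrases = [p for p in re.split(r"[\s.,!?]+", text) if p]
    annotated = []
    for phrase in phrases:
        changed = apply_sound_changes(phrase)
        annotated.append("".join(
            ROMAN_HEAD[h] + ROMAN_VOWEL[v] + ROMAN_TAIL[t]
            for h, v, t in map(decompose, changed)))
    return " ".join(annotated)


print(apply_sound_changes("음악"))  # 으막
print(annotate("음악"))             # eumak
print(annotate("음 악"))            # eum ak
```

Because the text is split before adjacent characters are compared, the sound change is applied only inside a phrase: "음악" is annotated as "eumak", whereas "음 악", with a space between the two characters, is annotated as "eum ak".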
CN201611207837.9A 2016-12-23 2016-12-23 Korean transliteration method and device Active CN106649291B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611207837.9A CN106649291B (en) 2016-12-23 2016-12-23 Korean transliteration method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611207837.9A CN106649291B (en) 2016-12-23 2016-12-23 Korean transliteration method and device

Publications (2)

Publication Number Publication Date
CN106649291A CN106649291A (en) 2017-05-10
CN106649291B true CN106649291B (en) 2020-10-09

Family

ID=58826912

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611207837.9A Active CN106649291B (en) 2016-12-23 2016-12-23 Korean transliteration method and device

Country Status (1)

Country Link
CN (1) CN106649291B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080027311A (en) * 2008-03-06 2008-03-26 이영춘 Method of transformation of korean to roman spelling and computer memory device recording computer program of the method
KR20090066067A (en) * 2007-12-18 2009-06-23 한국전자통신연구원 Method and apparatus for providing hybrid automatic translation
CN101571854A (en) * 2009-06-11 2009-11-04 四川国腾通讯股份有限公司 System and method for exchange and communication among multinational languages

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101539909A (en) * 2009-04-10 2009-09-23 无敌科技(西安)有限公司 Method and device for translating Thai into Romanization
CN103810993B (en) * 2012-11-14 2020-07-10 北京百度网讯科技有限公司 Text phonetic notation method and device
KR20150105075A (en) * 2014-03-07 2015-09-16 한국전자통신연구원 Apparatus and method for automatic interpretation

Also Published As

Publication number Publication date
CN106649291A (en) 2017-05-10

Similar Documents

Publication Publication Date Title
CN108885614B (en) Text and voice information processing method and terminal
US9977779B2 (en) Automatic supplementation of word correction dictionaries
US20160110332A1 (en) Character string input control method and apparatus
CN104852885B (en) Method, device and system for verifying verification code
US20110041056A1 (en) Electronic device with touch-sensitive display and method of facilitating input at the electronic device
US20090198691A1 (en) Device and method for providing fast phrase input
CN106203235B (en) Living body identification method and apparatus
US8996995B2 (en) Method and apparatus for phrase replacement
CN104462058B (en) Character string identification method and device
CN110889265A (en) Information processing apparatus, information processing method, and computer program
CN104281568B (en) Paraphrasing display method and paraphrasing display device
CN109543014B (en) Man-machine conversation method, device, terminal and server
US10666783B2 (en) Method and apparatus for storing telephone numbers in a portable terminal
CN110335629B (en) Pitch recognition method and device of audio file and storage medium
CN112989148A (en) Error correction word ordering method and device, terminal equipment and storage medium
CN109992753B (en) Translation processing method and terminal equipment
CN105320858B (en) Method and device for rapidly displaying application icons
CN116955610A (en) Text data processing method and device and storage medium
CN116795780A (en) Document format conversion method and device, storage medium and electronic equipment
CN106649291B (en) Korean transliteration method and device
KR102322606B1 (en) Method for correcting typographical error and mobile terminal using the same
KR20030086425A (en) System and method for filtering far east language
CN115187999A (en) Text recognition method and device, electronic equipment and computer readable storage medium
CN108093124B (en) Audio positioning method and device and mobile terminal
CN106201011B (en) Communication information retrieval method and device and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510660 Guangzhou City, Guangzhou, Guangdong, Whampoa Avenue, No. 315, self - made 1-17

Applicant after: Guangzhou KuGou Networks Co., Ltd.

Address before: 510000 B1, building, No. 16, rhyme Road, Guangzhou, Guangdong, China 13F

Applicant before: Guangzhou KuGou Networks Co., Ltd.

GR01 Patent grant