CN106649291B

CN106649291B - Korean transliteration method and device

Info

Publication number: CN106649291B
Application number: CN201611207837.9A
Authority: CN
Inventors: 陶县俊; 邱宇扬; 黄卓腾; 姜宁
Original assignee: Guangzhou Kugou Computer Technology Co Ltd
Current assignee: Guangzhou Kugou Computer Technology Co Ltd
Priority date: 2016-12-23
Filing date: 2016-12-23
Publication date: 2020-10-09
Anticipated expiration: 2036-12-23
Also published as: CN106649291A

Abstract

The invention discloses a Korean transliteration method and device, belonging to the field of language processing. The method comprises: splitting Korean information to obtain a plurality of Korean characters; querying a character library for the phonetic notation segments corresponding to the Korean characters, the character library storing the correspondence between Korean characters and phonetic notation segments; and splicing the queried phonetic notation segments in the order of the Korean characters in the Korean information to obtain the phonetic notation information corresponding to the Korean information. Because the character library stores the correspondence between Korean characters and phonetic notation segments in advance, the phonetic notation segment of every Korean character can be found even when the Korean information to be transliterated contains a rare phrase, a new phrase popular on the Internet, or a self-coined phrase, so the phonetic notation is accurate and the accuracy of the transliteration result is improved.

Description

Korean transliteration method and device

Technical Field

The embodiment of the invention relates to the field of language processing, in particular to a Korean transliteration method and device.

Background

Transliteration renders the words of one language as words or phonetic notations of another language that have a similar pronunciation. The Korean transliteration technology in wide use today is based on a phrase library.

The core idea of the Korean transliteration technology based on the phrase library is as follows: manually collecting commonly used phrases in Korean and phonetic notation fragment sequences corresponding to each commonly used phrase in advance, and establishing a phrase library; the server splits the Korean information to be transliterated to obtain a plurality of groups of Korean phrases, selects a phonetic notation fragment sequence with the matching degree higher than a threshold value from a phrase library for each group of Korean phrases, and splices the selected plurality of groups of phonetic notation fragments according to the sequence corresponding to the Korean phrases to obtain phonetic notation information corresponding to the input Korean information.

In this method, the phrase library stores only common Korean phrases, and these are collected manually, so the phrase library cannot cover every Korean phrase. When the Korean phrase to be transliterated is not in the phrase library, the phonetic notation segment sequence selected by matching degree is not the accurate phonetic notation of that phrase, and the accuracy of the transliteration result is low.

Disclosure of Invention

In order to solve the problem of low accuracy of the transliteration result of the current korean transliteration technology, the embodiment of the invention provides a korean transliteration method and device. The technical scheme is as follows:

In a first aspect, a Korean transliteration method is provided, which includes:

splitting the Korean information to obtain a plurality of Korean characters;

inquiring phonetic notation fragments corresponding to the Korean characters from a character library, wherein the character library stores the corresponding relation between the Korean characters and the phonetic notation fragments;

and splicing the inquired phonetic notation segments according to the sequence of the Korean characters in the Korean information to obtain phonetic notation information corresponding to the Korean information.

Optionally, splitting the Korean information to obtain a plurality of Korean characters includes:

detecting whether a Korean character requiring sound change exists between two adjacent Korean characters in the Korean information;

if a Korean character requiring sound change exists, replacing it with the sound-changed Korean character;

and obtaining the plurality of Korean characters corresponding to the Korean information according to the sound-changed Korean character.

Optionally, detecting whether a Korean character requiring sound change exists between two adjacent Korean characters in the Korean information includes:

splitting the Korean information into a plurality of groups of Korean phrases, using a predetermined mark as the splitting position, the predetermined mark comprising at least one of a space symbol and a punctuation symbol;

and detecting whether a Korean character requiring sound change exists between two adjacent Korean characters in a Korean phrase.

Optionally, detecting whether a Korean character requiring sound change exists between two adjacent Korean characters in the Korean phrase includes:

acquiring a first monosyllabic sequence of a first Korean character and a second monosyllabic sequence of a second Korean character, wherein the first Korean character and the second Korean character are two adjacent Korean characters in the Korean phrase;

extracting the tail syllable of the first monosyllabic sequence and the head syllable of the second monosyllabic sequence;

detecting whether the tail syllable and the head syllable belong to an inflexion syllable combination;

if the tail syllable and the head syllable belong to an inflexion syllable combination, determining that a Korean character requiring sound change exists.

Optionally, replacing the Korean character requiring sound change with the sound-changed Korean character includes:

when the first Korean character is the Korean character requiring sound change, performing sound change on the tail syllable of the first monosyllabic sequence, recombining a third Korean character from the sound-changed first monosyllabic sequence, and replacing the first Korean character with the third Korean character; and/or

when the second Korean character is the Korean character requiring sound change, performing sound change on the head syllable of the second monosyllabic sequence, recombining a fourth Korean character from the sound-changed second monosyllabic sequence, and replacing the second Korean character with the fourth Korean character.

In a second aspect, there is provided a korean transliteration apparatus, including:

the splitting module is used for splitting the Korean information to obtain a plurality of Korean characters;

the query module is used for querying phonetic notation segments corresponding to the Korean characters from a character library, and the character library stores the corresponding relation between the Korean characters and the phonetic notation segments;

and the splicing module is used for splicing the inquired phonetic notation segments according to the sequence of the Korean characters in the Korean information to obtain phonetic notation information corresponding to the Korean information.

Optionally, the splitting module includes:

a detection unit, a replacement unit and an obtaining unit;

the detection unit is used for detecting whether a Korean character requiring sound change exists between two adjacent Korean characters in the Korean information;

the replacement unit is used for replacing the Korean character requiring sound change with the sound-changed Korean character if such a character exists;

and the obtaining unit is used for obtaining the plurality of Korean characters corresponding to the Korean information according to the sound-changed Korean character.

Optionally, the detection unit comprises:

a splitting subunit and a detection subunit;

the splitting subunit is used for splitting the Korean information into a plurality of groups of Korean phrases, using a predetermined mark as the splitting position, the predetermined mark comprising at least one of a space symbol and a punctuation symbol;

and the detection subunit is used for detecting whether a Korean character requiring sound change exists between two adjacent Korean characters in a Korean phrase.

Optionally, the detection subunit is further configured to: obtain a first monosyllabic sequence of a first Korean character and a second monosyllabic sequence of a second Korean character, where the first Korean character and the second Korean character are two adjacent Korean characters in the Korean phrase; extract the tail syllable of the first monosyllabic sequence and the head syllable of the second monosyllabic sequence; detect whether the tail syllable and the head syllable belong to an inflexion syllable combination; and, if they do, determine that a Korean character requiring sound change exists.

Optionally, the replacement unit comprises:

a first replacement subunit and/or a second replacement subunit;

the first replacement subunit is configured to, when the first Korean character is the Korean character requiring sound change, perform sound change on the tail syllable of the first monosyllabic sequence, recombine a third Korean character from the sound-changed first monosyllabic sequence, and replace the first Korean character with the third Korean character;

and the second replacement subunit is configured to, when the second Korean character is the Korean character requiring sound change, perform sound change on the head syllable of the second monosyllabic sequence, recombine a fourth Korean character from the sound-changed second monosyllabic sequence, and replace the second Korean character with the fourth Korean character.

The technical scheme provided by the embodiment of the invention has the following beneficial effects:

the method comprises the steps that the corresponding relation between Korean characters and phonetic notation fragments is stored in a character library in advance, Korean information is split to obtain a plurality of Korean characters, the phonetic notation fragments corresponding to the Korean characters are inquired from the character library, and the inquired phonetic notation fragments are spliced according to the sequence of the Korean characters in the Korean information to obtain phonetic notation information corresponding to the Korean information; therefore, when the Korean information to be transliterated comprises the obscure phrases or the network popular new phrases or the self-created phrases, the phonetic notation segment corresponding to each Korean character in the Korean information can be inquired, so that the phonetic notation is accurately performed, and the accuracy of the transliteration result is improved.

Drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a Korean transliteration method according to an embodiment of the present invention;

Fig. 2 is a flowchart of a Korean transliteration method according to another embodiment of the present invention;

Fig. 3 is a schematic diagram of a Korean transliteration method according to another embodiment of the present invention;

Fig. 4 is a flowchart of a Korean transliteration method according to another embodiment of the present invention;

Fig. 5 is a block diagram of a Korean transliteration device according to an embodiment of the present invention;

Fig. 6 is a block diagram of a Korean transliteration device according to another embodiment of the present invention;

Fig. 7 is a block diagram of a terminal provided by an embodiment of the present invention;

Fig. 8 is a structural framework diagram of a server according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Please refer to Fig. 1, which shows a flowchart of a Korean transliteration method according to an embodiment of the present invention. The method may be performed by a server or by a terminal having Korean processing capability; the following embodiments are described with the server as the executing entity. The Korean transliteration method comprises the following steps:

step 101, splitting the Korean information to obtain a plurality of Korean characters.

Optionally, the server acquires the Korean information to be transliterated, and splits the Korean information to obtain a plurality of Korean characters; the korean information is information of which the character type is korean, and the information is a phrase or a sentence or a segment of characters or an article, which is not limited in this embodiment.

For example, the Korean information to be transliterated is [Korean text]. The server splits the Korean information [Korean text] to obtain four Korean characters: [Korean character], [Korean character], [Korean character] and [Korean character].

step 102, searching phonetic notation fragments corresponding to the Korean characters from a character library, wherein the character library stores the corresponding relation between the Korean characters and the phonetic notation fragments.

Optionally, since there are 11,172 Korean characters in total, the server deconstructs each Korean character in advance, according to a predetermined encoding rule, into a corresponding monosyllabic sequence. The sequence comprises at least one monosyllable (also called a monosyllable symbol or monosyllable stroke) that makes up the Korean character. For each Korean character, the server generates the corresponding phonetic notation segment from its monosyllabic sequence. Illustratively, the predetermined encoding rule is based on the Unicode character set, in which the code range of Korean characters is AC00 to D7AF and the code range of the monosyllables that make up Korean characters is 1100 to 11FF.
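The deconstruction of a Korean character into its monosyllabic sequence can be sketched with plain Unicode arithmetic. This is a minimal sketch that assumes the Unicode-based encoding rule; the function name `decompose` and the constants are illustrative and do not come from the patent. It relies on the standard layout of precomposed Hangul syllables: 19 initial, 21 medial and 28 final slots, which yields the 11,172 characters.

```python
HANGUL_BASE = 0xAC00            # first precomposed Hangul syllable
HANGUL_COUNT = 19 * 21 * 28     # 11172 precomposed syllables
LEAD_BASE, VOWEL_BASE, TAIL_BASE = 0x1100, 0x1161, 0x11A7

def decompose(ch):
    """Return the monosyllabic (jamo) sequence of one Korean character."""
    index = ord(ch) - HANGUL_BASE
    if not 0 <= index < HANGUL_COUNT:
        raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
    lead, rest = divmod(index, 21 * 28)
    vowel, tail = divmod(rest, 28)
    sequence = [chr(LEAD_BASE + lead), chr(VOWEL_BASE + vowel)]
    if tail:  # tail index 0 means the character has no final consonant
        sequence.append(chr(TAIL_BASE + tail))
    return sequence
```

The result uses code points from the 1100 to 11FF range, so a character with a final consonant yields three monosyllables and one without yields two.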

The predetermined encoding rule may also be based on the GB18030 character set, the UCS (Universal Character Set), or another character set that supports Korean characters. This embodiment limits neither the type of the predetermined encoding rule nor the code range of Korean characters within it.

Optionally, the server establishes a character library in advance from all the Korean characters, the monosyllabic sequence corresponding to each Korean character, and the corresponding phonetic notation segment. Illustratively, the phonetic notation type of a phonetic notation segment is a Roman phonetic notation type and/or a Chinese phonetic notation type. This embodiment does not limit the phonetic notation type of the phonetic notation segment.

Referring to Table 1, the character library established by the server stores the correspondence among Korean characters, monosyllabic sequences, Roman phonetic notation and Chinese phonetic notation.

For example, when the Korean character is [Korean character], its corresponding monosyllabic sequence is [monosyllabic sequence], its Roman phonetic notation is 'bek', and its Chinese phonetic notation is 'back'; when the Korean character is [Korean character], its corresponding monosyllabic sequence is [monosyllabic sequence], its Roman phonetic notation is 'ba', and its Chinese phonetic notation is 'bar'; when the Korean character is [Korean character], its corresponding monosyllabic sequence is [monosyllabic sequence], its Roman phonetic notation is 'da', and its Chinese phonetic notation is 'answer'.

Table 1
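A character library of this kind can be generated programmatically rather than by hand. The sketch below is only an assumption about how such a library might be built: the romanization tables are a simplified, Revised-Romanization-style choice made here for illustration, so their output differs from segments such as 'bek' quoted in this document, and `build_library` is a hypothetical name.

```python
import unicodedata

# Hypothetical romanization tables, indexed in Unicode jamo order.
LEADS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s",
         "ss", "", "j", "jj", "ch", "k", "t", "p", "h"]
VOWELS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
          "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
TAILS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l", "l",
         "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t", "p", "t"]

def build_library():
    """Map each Korean character to (monosyllabic sequence, Roman segment)."""
    library = {}
    for lead in range(19):
        for vowel in range(21):
            for tail in range(28):
                ch = chr(0xAC00 + (lead * 21 + vowel) * 28 + tail)
                # NFD normalization yields the conjoining jamo of the syllable.
                jamo = unicodedata.normalize("NFD", ch)
                library[ch] = (jamo, LEADS[lead] + VOWELS[vowel] + TAILS[tail])
    return library
```

Because every one of the 11,172 characters receives an entry, a lookup cannot miss for any precomposed Korean character, which is the property the method relies on.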

For example, when the Korean characters obtained by the server are [Korean character], [Korean character], [Korean character] and [Korean character], the server finds in the character library that the phonetic notation segment corresponding to [Korean character] is "go", that corresponding to [Korean character] is "ma", that corresponding to [Korean character] is "wa", and that corresponding to [Korean character] is "yo".

For another example, when the Korean characters obtained by the server are [Korean character], [Korean character], [Korean character] and [Korean character], the server finds in the character library that the phonetic notation segment corresponding to [Korean character] is 'groove', that corresponding to [Korean character] is 'Dou', that corresponding to [Korean character] is 'Wa', and that corresponding to [Korean character] is "C".

Step 103, splicing the queried phonetic notation segments according to the order of the Korean characters in the Korean information to obtain the phonetic notation information corresponding to the Korean information.

Optionally, the server splices the searched phonetic notation fragments according to the sequence of the korean characters in the korean information to obtain phonetic notation information corresponding to the korean information.

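For example, following the order of the Korean characters [Korean character], [Korean character], [Korean character] and [Korean character] in the Korean information [Korean text], the server splices the phonetic notation segments "go", "ma", "wa" and "yo" corresponding to the four Korean characters to obtain the phonetic notation information "go ma wa yo" corresponding to the Korean information [Korean text].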

For another example, following the order of the Korean characters [Korean character], [Korean character], [Korean character] and [Korean character] in the Korean information [Korean text], the server splices the phonetic notation segments 'groove', 'Doma', 'Wa' and 'C' corresponding to the four Korean characters to obtain the phonetic notation information "mojawa" corresponding to the Korean information [Korean text].

In summary, in the embodiment, the korean information is split to obtain a plurality of korean characters, phonetic notation fragments corresponding to the korean characters are searched from the character library, and the searched phonetic notation fragments are spliced according to the sequence of the korean characters in the korean information to obtain phonetic notation information corresponding to the korean information; because the corresponding relation between the Korean characters and the phonetic transcription segments is stored in the character library in advance, when the Korean information to be transliterated comprises a rare phrase or a new network popular phrase or an artificial phrase, the phonetic transcription segment corresponding to each Korean character in the Korean information can be still inquired, so that accurate phonetic transcription is performed, and the accuracy of the transliteration result is improved.
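Steps 101 to 103 can be sketched in a few lines. This is a minimal sketch under stated assumptions: the four Korean characters and the library excerpt are chosen here to match the segments "go", "ma", "wa" and "yo", and `CHARACTER_LIBRARY` and `transliterate` are hypothetical names.

```python
# Hypothetical excerpt of the character library:
# Korean character -> phonetic notation segment.
CHARACTER_LIBRARY = {"고": "go", "마": "ma", "와": "wa", "요": "yo"}

def transliterate(korean_info, library=CHARACTER_LIBRARY):
    """Split, look up and splice, keeping the original character order."""
    characters = list(korean_info)                  # step 101: split
    segments = [library[ch] for ch in characters]   # step 102: query library
    return " ".join(segments)                       # step 103: splice

print(transliterate("고마와요"))  # go ma wa yo
```

Because the lookup is per character rather than per phrase, any ordering of known characters can be transliterated, including phrases that no phrase library would contain.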

In Korean, however, if within one Korean phrase the tail syllable of a Korean character and the head syllable of the next adjacent Korean character form an inflexion syllable combination, a Korean character requiring sound change exists between the two Korean characters. The actual pronunciation of that character is not its own phonetic notation segment but the phonetic notation segment of the sound-changed Korean character.

For example, in the Korean information [Korean text], the tail syllable of the first Korean character [Korean character] is [syllable], and the head syllable of the second Korean character [Korean character] is [syllable]. The tail syllable [syllable] and the head syllable [syllable] belong to the inflexion syllable combination [syllable combination]. According to the Korean sound-change rule, the tail syllable [syllable] of the Korean character requiring sound change [Korean character] is replaced by [syllable], giving the sound-changed Korean character [Korean character]. That is, in [Korean text], the actual pronunciation of [Korean character] is no longer its own phonetic notation segment 'bad' but the phonetic notation segment "ban" of the sound-changed Korean character [Korean character]. To this end, the invention also provides the following embodiments.

Please refer to fig. 2, which shows a flowchart of a korean transliteration method according to another embodiment of the present invention. The korean transliteration method may be performed by a server or a terminal having korean processing capability, and includes:

step 201, detecting whether there is a korean character to be voice-changed between two adjacent korean characters in the korean information.

Optionally, the server detects whether there is a korean character to be inflected between two adjacent korean characters in the korean information.

In one possible implementation, the detecting step can be implemented as the following steps:

1. The server obtains a first monosyllabic sequence of a first Korean character and a second monosyllabic sequence of a second Korean character, the first Korean character and the second Korean character being two adjacent Korean characters in the Korean information.

The first monosyllabic sequence comprises at least one monosyllable making up the first Korean character, and the second monosyllabic sequence comprises at least one monosyllable making up the second Korean character.

As shown in Table 1, the server establishes a character library in advance that stores the correspondence between Korean characters and monosyllabic sequences. When the server obtains the Korean information to be transliterated, it looks up in the character library the monosyllabic sequence of each Korean character in the Korean information.

Illustratively, the server obtains the first monosyllabic sequence [monosyllabic sequence] of the first Korean character [Korean character] and the second monosyllabic sequence [monosyllabic sequence] of the second Korean character [Korean character].

2. The server extracts the tail syllable of the first monosyllabic sequence and the head syllable of the second monosyllabic sequence.

Illustratively, the server extracts the tail syllable [syllable] of the first monosyllabic sequence [monosyllabic sequence] and the head syllable [syllable] of the second monosyllabic sequence [monosyllabic sequence].

3. The server detects whether the tail and head syllables belong to inflected syllable combinations.

Optionally, as shown in Table 2, the server stores in advance an inflexion syllable combination library created according to the sound-change rules of Korean. The library stores the correspondence between inflexion syllable combinations and the syllable combinations after sound change. For example, when the inflexion syllable combination is [syllable combination], the corresponding syllable combination after sound change is [syllable combination]; when the inflexion syllable combination is [syllable combination], the corresponding syllable combination after sound change is [syllable combination]; when the inflexion syllable combination is [syllable combination], the corresponding syllable combination after sound change is [syllable combination].

Table 2

Illustratively, based on the tail syllable [syllable] and the head syllable [syllable], the server queries the inflexion syllable combination library for the combination [syllable combination]. If the combination exists in the library, the server determines that [syllable combination] is an inflexion syllable combination.

4. If the tail syllable and the head syllable belong to the inflexion syllable combination, the server determines that there is a Korean character to be inflexion.

When the tail syllable and the head syllable belong to an inflexion syllable combination, three cases are possible: only the tail syllable needs sound change, in which case the first Korean character is the Korean character requiring sound change; only the head syllable needs sound change, in which case the second Korean character is the Korean character requiring sound change; or both syllables need sound change, in which case both the first and the second Korean characters require sound change.

Illustratively, if the tail syllable [syllable] and the head syllable [syllable] belong to an inflexion syllable combination, the server determines that a Korean character requiring sound change exists between the first Korean character [Korean character] and the second Korean character [Korean character].
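The four detection steps can be sketched as a dictionary lookup keyed on the (tail, head) pair. The table contents below are an assumption: they encode three well-known Korean nasalization cases and are not taken from Table 2. All names are hypothetical.

```python
HANGUL_BASE = 0xAC00

# Hypothetical excerpt of an inflexion syllable combination library.
# Keys and values are (tail index, head index) pairs in Unicode jamo order:
# tails 1=ㄱ, 4=ㄴ, 7=ㄷ, 16=ㅁ, 17=ㅂ, 21=ㅇ; head 2=ㄴ.
INFLEXION_COMBINATIONS = {
    (1, 2): (21, 2),   # tail ㄱ + head ㄴ -> tail ㅇ + head ㄴ
    (7, 2): (4, 2),    # tail ㄷ + head ㄴ -> tail ㄴ + head ㄴ
    (17, 2): (16, 2),  # tail ㅂ + head ㄴ -> tail ㅁ + head ㄴ
}

def tail_index(ch):
    """Index of the final consonant (0 means none)."""
    return (ord(ch) - HANGUL_BASE) % 28

def head_index(ch):
    """Index of the initial consonant."""
    return (ord(ch) - HANGUL_BASE) // (21 * 28)

def find_inflexion(first, second):
    """Return the post-change (tail, head) pair, or None if no change applies."""
    key = (tail_index(first), head_index(second))
    return INFLEXION_COMBINATIONS.get(key)
```

A `None` result means the pair is not an inflexion syllable combination, so neither character requires sound change.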

Step 202, if there is a korean character to be inflected, replacing the korean character to be inflected with the inflected korean character.

Optionally, according to the syllable combination after sound change found in the library, the server replaces the inflexion syllable combination with the syllable combination after sound change, and replaces the first Korean character and/or the second Korean character accordingly.

Optionally, the server judges whether the tail syllable of the inflexion syllable combination is the same as the tail syllable of the syllable combination after sound change. If they are the same, the server determines that the first Korean character is not a Korean character requiring sound change. If they differ, the server determines that the first Korean character is the Korean character requiring sound change, performs sound change on the tail syllable of the first monosyllabic sequence, recombines a third Korean character from the sound-changed first monosyllabic sequence, and replaces the first Korean character with the third Korean character.

Optionally, the server judges whether the head syllable of the inflexion syllable combination is the same as the head syllable of the syllable combination after sound change. If they are the same, the server determines that the second Korean character is not a Korean character requiring sound change. If they differ, the server determines that the second Korean character is the Korean character requiring sound change, performs sound change on the head syllable of the second monosyllabic sequence, recombines a fourth Korean character from the sound-changed second monosyllabic sequence, and replaces the second Korean character with the fourth Korean character.

For example, the first Korean character is [Korean character], the second Korean character is [Korean character], and the inflexion syllable combination is [syllable combination]. Comparing it with the syllable combination after sound change found in the library, [syllable combination], the server determines that the tail syllables [syllable] differ but the head syllables [syllable] are the same, so the first Korean character is the Korean character requiring sound change and the second Korean character is not. The server performs sound change on the tail syllable [syllable] of the first monosyllabic sequence, recombines the third Korean character [Korean character] from the sound-changed first monosyllabic sequence [monosyllabic sequence], and replaces the first Korean character [Korean character] with the third Korean character [Korean character].

Step 203, obtaining a plurality of Korean characters corresponding to the Korean information according to the changed Korean characters.

Optionally, the server obtains the plurality of Korean characters corresponding to the Korean information according to the sound-changed Korean character. Illustratively, based on the third Korean character [Korean character], the server obtains the Korean characters corresponding to the Korean information [Korean text], namely [Korean character], [Korean character] and [Korean character].

step 204, searching phonetic notation fragments corresponding to the Korean characters from a character library, wherein the character library stores the corresponding relation between the Korean characters and the phonetic notation fragments.

Optionally, the server queries phonetic notation segments corresponding to the korean characters from a character library, wherein the character library stores the corresponding relationship between the korean characters and the phonetic notation segments; the phonetic notation type of the phonetic notation fragment includes at least one of a roman phonetic notation type and a chinese phonetic notation type, and may further include other non-korean phonetic notation types, which is not limited in this embodiment.

And step 205, splicing the searched phonetic notation segments according to the sequence of the Korean characters in the Korean information to obtain phonetic notation information corresponding to the Korean information.

Optionally, the server sends the phonetic notation information to the terminal after obtaining the phonetic notation information corresponding to the korean information; correspondingly, the terminal receives the phonetic notation information, automatically determines the phonetic notation type or receives the phonetic notation type selected by the user, and outputs the phonetic notation information corresponding to the Korean information according to the phonetic notation type.

The embodiment also comprises the steps of detecting whether Korean characters to be subjected to sound variation exist between two adjacent Korean characters in the Korean information, replacing the Korean characters to be subjected to sound variation with the Korean characters subjected to sound variation if the Korean characters to be subjected to sound variation exist, and obtaining a plurality of Korean characters corresponding to the Korean information according to the Korean characters subjected to sound variation; therefore, when the Korean information contains the Korean character needing sound change, the Korean character to be sound changed can be replaced by the Korean character after sound change, and the phonetic notation segment corresponding to the Korean character after sound change is inquired from the word stock, so that the actual pronunciation of the Korean character to be sound changed is accurately marked.

In the embodiment, the phonetic notation type comprises at least one of a roman phonetic notation type and a Chinese phonetic notation type, the terminal receives the phonetic notation information, automatically determines the phonetic notation type or receives the phonetic notation type selected by the user, and outputs the phonetic notation information corresponding to the Korean information according to the phonetic notation type; the terminal can selectively output the phonetic notation information according to the determined phonetic notation type, and the flexibility of the phonetic notation mode is improved.

In a specific example, as shown in Fig. 3, the server acquires the Korean information [Korean text] and, according to the character library, deconstructs it to obtain the monosyllabic sequence [monosyllabic sequence] of the Korean character [Korean character], the monosyllabic sequence [monosyllabic sequence] of the Korean character [Korean character], and the monosyllabic sequence [monosyllabic sequence] of the Korean character [Korean character].

The server then extracts the tail syllable [syllable] of the monosyllabic sequence [monosyllabic sequence] and the head syllable [syllable] of the monosyllabic sequence [monosyllabic sequence]. According to the inflexion syllable combination library, it detects that [syllable combination] is an inflexion syllable combination and finds that the corresponding syllable combination after sound change is [syllable combination]. The server next extracts the tail syllable [syllable] of the monosyllabic sequence [monosyllabic sequence] and the head syllable [syllable] of the monosyllabic sequence [monosyllabic sequence], and detects from the library that [syllable combination] is not an inflexion syllable combination, so no sound change is needed.

Then the server performs sound change on the tail syllable [syllable] of the Korean character [Korean character], recombines the sound-changed monosyllabic sequence [monosyllabic sequence] to obtain the Korean character [Korean character], and replaces the Korean character [Korean character] with the Korean character [Korean character]. The Korean characters to be annotated for the Korean information [Korean text] are therefore [Korean character], [Korean character] and [Korean character].

Finally, the server finds in the character library that the phonetic notation segment of the Korean character [Korean character] is "ban", that of [Korean character] is "nen", and that of [Korean character] is 'da'. It splices the queried segments "ban", "nen" and "da" in order to obtain the phonetic notation information "ban nen da" corresponding to the Korean information [Korean text].

In some possible cases, the Korean information to be transliterated includes an identifier that is not a Korean character, such as a space symbol or a punctuation symbol, between two Korean characters. In that case the server does not perform the step of detecting whether a Korean character to be sound-changed exists between the two Korean characters: even if the tail syllable of the first monosyllabic sequence and the head syllable of the second monosyllabic sequence corresponding to the two Korean characters belong to an inflexion syllable combination, no replacement is needed.

Please refer to fig. 4, which illustrates a flowchart of a Korean transliteration method according to another embodiment of the present invention. The Korean transliteration method may be performed by a server or a terminal having Korean processing capability, and includes:

Step 401, taking the predetermined identifier as a splitting position, splitting the Korean information into a plurality of groups of Korean phrases. The predetermined identifier includes at least one of a space symbol and a punctuation symbol.

Optionally, the server sets the predetermined identifier in advance; when the server acquires the Korean information to be transliterated, it splits the Korean information into a plurality of groups of Korean phrases according to a predetermined encoding rule, taking the predetermined identifier as the splitting position. Illustratively, the predetermined encoding rule is an encoding rule based on the Unicode character set, and the space symbol and the punctuation symbol have encoding intervals of 4000 to 403F in the Unicode character set. The present embodiment does not limit the type of the predetermined encoding rule or the encoding range of the space symbol and the punctuation symbol in the predetermined encoding rule.

Optionally, the predetermined identifier further includes at least one of characters of other languages, graphic symbols, mathematical symbols, and control symbols.

For example, the Korean information to be transliterated is […]. The server detects the punctuation symbol in the Korean information and, taking the punctuation symbol as the splitting position, splits the Korean information into two groups of Korean phrases, namely the Korean phrases […] and […].

As another example, the Korean information to be transliterated is […]. The server detects the space symbol in the Korean information and, taking the space symbol as the splitting position, splits the Korean information into two groups of Korean phrases, namely […] and […].

Optionally, after the server splits the Korean information into several groups of Korean phrases, steps 402 to 406 are performed for each group of Korean phrases; for specific details, refer to the embodiment provided in fig. 2.
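The splitting of step 401 can be sketched as follows. This is only an illustration: it treats whitespace and any character whose Unicode general category is punctuation or separator as the predetermined identifier, which is an assumption and not the encoding interval named in the description. The function names and sample strings are hypothetical.

```python
import unicodedata


def is_predetermined_identifier(ch):
    """Assumed test: whitespace, or a Unicode punctuation/separator character."""
    return ch.isspace() or unicodedata.category(ch)[0] in ("P", "Z")


def split_into_phrases(korean_info):
    """Split the text at every predetermined identifier, dropping empty groups."""
    phrases, current = [], []
    for ch in korean_info:
        if is_predetermined_identifier(ch):
            if current:                      # close the phrase collected so far
                phrases.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:                              # trailing phrase without a delimiter
        phrases.append("".join(current))
    return phrases


print(split_into_phrases("안녕, 세상"))
```

Because each phrase is processed on its own afterwards, two characters separated by a space or punctuation symbol are never examined as an adjacent pair.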

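Step 402, detecting whether a Korean character to be sound-changed exists between two adjacent Korean characters in the Korean phrase.

Optionally, the detecting step can be implemented as the following steps:

1. The server obtains a first monosyllabic sequence of a first Korean character and a second monosyllabic sequence of a second Korean character, wherein the first Korean character and the second Korean character are two adjacent Korean characters in a Korean phrase.

2. The server extracts the tail syllable of the first monosyllabic sequence and the head syllable of the second monosyllabic sequence.

3. The server detects whether the tail syllable and the head syllable belong to an inflexion syllable combination.

4. If the tail syllable and the head syllable belong to an inflexion syllable combination, the server determines that a Korean character to be sound-changed exists.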
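A minimal sketch of this detection, assuming the monosyllabic sequence of a character is obtained with the standard Unicode arithmetic for precomposed Hangul syllables rather than from the character library. The set `INFLEXION_COMBINATIONS` is a tiny hypothetical stand-in for the inflexion syllable combination library, and all function names are illustrative.

```python
HANGUL_BASE = 0xAC00  # first precomposed Hangul syllable
LEADS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")            # 19 head consonants
VOWELS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")       # 21 vowels
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 tails


def decompose(ch):
    """Return (head, vowel, tail) for one precomposed Hangul syllable."""
    idx = ord(ch) - HANGUL_BASE
    if not 0 <= idx < 19 * 21 * 28:
        raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
    return LEADS[idx // 588], VOWELS[(idx % 588) // 28], TAILS[idx % 28]


# Hypothetical excerpt of an inflexion library: (tail of first, head of second).
INFLEXION_COMBINATIONS = {("ㄷ", "ㄴ"), ("ㄱ", "ㄴ"), ("ㄱ", "ㅁ"), ("ㅂ", "ㄴ"), ("ㅂ", "ㅁ")}


def needs_sound_change(first, second):
    """True if the tail of `first` and the head of `second` form a listed combination."""
    tail = decompose(first)[2]
    head = decompose(second)[0]
    return (tail, head) in INFLEXION_COMBINATIONS


print(needs_sound_change("받", "는"), needs_sound_change("는", "다"))
```

A set of pairs keeps the membership test constant-time; a production library would need far more entries than the five shown.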

Step 403, replacing the Korean character to be sound-changed with the sound-changed Korean character.

Optionally, when the first Korean character is the Korean character to be sound-changed, the server changes the tail syllable of the first monosyllabic sequence, recombines a third Korean character according to the changed first monosyllabic sequence, and replaces the first Korean character with the third Korean character.

Optionally, when the second Korean character is the Korean character to be sound-changed, the server changes the head syllable of the second monosyllabic sequence, recombines a fourth Korean character according to the changed second monosyllabic sequence, and replaces the second Korean character with the fourth Korean character.
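The replacement of step 403 can be sketched as below, again using Unicode arithmetic to recombine a character from its changed monosyllabic sequence. The three entries in `SOUND_CHANGE_RULES` are well-known Korean assimilation patterns chosen for illustration; they are an assumption and not the contents of the library described in this document.

```python
HANGUL_BASE = 0xAC00
LEADS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")
VOWELS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def decompose(ch):
    """Split a precomposed Hangul syllable into (head, vowel, tail)."""
    idx = ord(ch) - HANGUL_BASE
    if not 0 <= idx < 19 * 21 * 28:
        raise ValueError(f"not a precomposed Hangul syllable: {ch!r}")
    return LEADS[idx // 588], VOWELS[(idx % 588) // 28], TAILS[idx % 28]


def compose(head, vowel, tail=""):
    """Recombine (head, vowel, tail) into one precomposed Hangul syllable."""
    return chr(HANGUL_BASE
               + (LEADS.index(head) * 21 + VOWELS.index(vowel)) * 28
               + TAILS.index(tail))


# Hypothetical rules: (tail, head) before the change -> (tail, head) after it.
SOUND_CHANGE_RULES = {
    ("ㄷ", "ㄴ"): ("ㄴ", "ㄴ"),  # tail changes  (first character is replaced)
    ("ㄱ", "ㅁ"): ("ㅇ", "ㅁ"),  # tail changes  (first character is replaced)
    ("ㅁ", "ㄹ"): ("ㅁ", "ㄴ"),  # head changes  (second character is replaced)
}


def apply_sound_change(first, second):
    """Return the pair with sound-changed characters substituted where a rule applies."""
    h1, v1, t1 = decompose(first)
    h2, v2, t2 = decompose(second)
    rule = SOUND_CHANGE_RULES.get((t1, h2))
    if rule is None:
        return first, second           # no listed combination: nothing to replace
    new_tail, new_head = rule
    return compose(h1, v1, new_tail), compose(new_head, v2, t2)


print(apply_sound_change("받", "는"), apply_sound_change("심", "리"))
```

Storing both the new tail and the new head in one rule covers the "first character", "second character" and "both" cases with the same lookup.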

Step 404, obtaining a plurality of Korean characters corresponding to the Korean information according to the sound-changed Korean characters.

Optionally, the server obtains the plurality of Korean characters corresponding to the Korean information according to the sound-changed Korean characters.

Step 405, searching phonetic notation segments corresponding to the Korean characters from a character library, wherein the character library stores the corresponding relation between the Korean characters and the phonetic notation segments.

Optionally, the server searches the character library for the phonetic notation segment corresponding to each Korean character; the character library stores the correspondence between Korean characters and phonetic notation segments.

Step 406, splicing the searched phonetic notation segments according to the sequence of the Korean characters in the Korean information to obtain phonetic notation information corresponding to the Korean information.

Optionally, the server splices the phonetic notation segments searched for each group of Korean phrases according to the sequence of the Korean characters in that Korean phrase, and then splices the resulting per-phrase phonetic notations according to the sequence of the Korean phrases in the Korean information to obtain the phonetic notation information corresponding to the Korean information.
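Steps 405 and 406 amount to a per-character lookup followed by two levels of joining, sketched below. `CHARACTER_LIBRARY` holds a few hypothetical entries, the space separator is an assumption, and leaving an unknown character unchanged is a design choice of this sketch rather than behaviour stated in the source.

```python
# Hypothetical character library: Korean character -> phonetic notation segment.
CHARACTER_LIBRARY = {"한": "han", "국": "guk", "사": "sa", "람": "ram"}


def annotate_phrase(phrase, library):
    """Look up every character of one phrase and join the segments in order."""
    # Characters missing from the library are passed through unchanged.
    return " ".join(library.get(ch, ch) for ch in phrase)


def transliterate(phrases, library):
    """Join the per-phrase notations in the order the phrases appear."""
    return " ".join(annotate_phrase(p, library) for p in phrases)


print(transliterate(["한국", "사람"], CHARACTER_LIBRARY))
```

Because the lookup is per character, a phrase that has never been seen as a whole can still be annotated as long as each of its characters is in the library.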

This embodiment further takes the predetermined identifier as a splitting position, splits the Korean information into a plurality of groups of Korean phrases, and detects whether a Korean character to be sound-changed exists between two adjacent Korean characters within each Korean phrase. When the Korean information contains the predetermined identifier, the Korean information is split at that identifier, which prevents a sound change from being applied to two Korean characters that belong to two adjacent Korean phrases and are separated by the predetermined identifier, so that the obtained phonetic notation information is more accurate.

The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.

Please refer to fig. 5, which illustrates a schematic structural diagram of a Korean transliteration apparatus according to an embodiment of the present invention. The Korean transliteration device includes:

the splitting module 520, configured to split the Korean information to obtain a plurality of Korean characters;

the query module 540, configured to query phonetic notation segments corresponding to the Korean characters from a character library, where the character library stores the correspondence between the Korean characters and the phonetic notation segments;

and the splicing module 560, configured to splice the queried phonetic notation segments according to the sequence of the Korean characters in the Korean information to obtain phonetic notation information corresponding to the Korean information.

Please refer to fig. 6, which illustrates a schematic structural diagram of a Korean transliteration apparatus according to another embodiment of the present invention. The Korean transliteration device includes:

a splitting module 520 comprising:

a detecting unit 521, a replacing unit 522, and an obtaining unit 523;

the detecting unit 521, configured to detect whether a Korean character to be sound-changed exists between two adjacent Korean characters in the Korean information;

the replacing unit 522, configured to replace the Korean character to be sound-changed with the sound-changed Korean character if a Korean character to be sound-changed exists;

the obtaining unit 523, configured to obtain a plurality of Korean characters corresponding to the Korean information according to the sound-changed Korean characters.

The detecting unit 521 includes:

a splitting subunit 521a and a detecting subunit 521b;

the splitting subunit 521a, configured to split the Korean information into a plurality of groups of Korean phrases by using the predetermined identifier as a splitting position; the predetermined identifier comprises at least one of a space symbol and a punctuation symbol;

the detecting subunit 521b, configured to detect whether a Korean character to be sound-changed exists between two adjacent Korean characters in the Korean phrase.

The detecting subunit 521b is further configured to: obtain a first monosyllabic sequence of a first Korean character and a second monosyllabic sequence of a second Korean character, where the first Korean character and the second Korean character are two adjacent Korean characters in the Korean phrase; extract the tail syllable of the first monosyllabic sequence and the head syllable of the second monosyllabic sequence; detect whether the tail syllable and the head syllable belong to an inflexion syllable combination; and, if the tail syllable and the head syllable belong to an inflexion syllable combination, determine that a Korean character to be sound-changed exists.

The replacing unit 522 includes:

a first replacing subunit 522a and/or a second replacing subunit 522b;

the first replacing subunit 522a, configured to, when the first Korean character is the Korean character to be sound-changed, change the tail syllable of the first monosyllabic sequence, recombine a third Korean character according to the changed first monosyllabic sequence, and replace the first Korean character with the third Korean character;

the second replacing subunit 522b, configured to, when the second Korean character is the Korean character to be sound-changed, change the head syllable of the second monosyllabic sequence, recombine a fourth Korean character according to the changed second monosyllabic sequence, and replace the second Korean character with the fourth Korean character.

Referring to fig. 7, a block diagram of a terminal 700 according to an embodiment of the invention is shown. Specifically, the device 700 may include RF (Radio Frequency) circuitry 710, memory 720 including one or more computer-readable storage media, input unit 730, display unit 740, sensors 750, audio circuitry 760, WiFi (wireless fidelity) module 770, processor 780 including one or more processing cores, and power supply 790. Those skilled in the art will appreciate that the configuration shown in fig. 7 does not limit the device, which may include more or fewer components than those shown, combine some components, or use a different arrangement of components. Wherein:

The RF circuit 710 may be used to receive and transmit signals while sending and receiving information or during a call; in particular, it receives downlink information from a base station and passes it to one or more processors 780 for processing, and it transmits uplink data to the base station. In general, the RF circuit 710 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuit 710 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), email, SMS (Short Messaging Service), etc.

The memory 720 may be used to store software programs and modules. The processor 780 executes various functional applications and data processing by running the software programs and modules stored in the memory 720. The memory 720 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as audio data, a phonebook, etc.) created according to the use of the device 700, and the like. Further, the memory 720 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. Accordingly, the memory 720 may also include a memory controller to provide the processor 780 and the input unit 730 with access to the memory 720.

The input unit 730 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, the input unit 730 may include a touch-sensitive surface 731 as well as other input devices 732. The touch-sensitive surface 731, also referred to as a touch display screen or touch pad, can collect touch operations by a user on or near it (e.g., operations by a user on or near the touch-sensitive surface 731 using a finger, stylus, or any other suitable object or attachment) and drive the corresponding connection device according to a predetermined program. Optionally, the touch-sensitive surface 731 may comprise two parts, a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it to touch point coordinates, sends the touch point coordinates to the processor 780, and can receive and execute commands from the processor 780. In addition, the touch-sensitive surface 731 can be implemented in a variety of types, including resistive, capacitive, infrared, and surface acoustic wave. The input unit 730 may also include other input devices 732 in addition to the touch-sensitive surface 731. In particular, the other input devices 732 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like.

The display unit 740 may be used to display information input by or provided to the user, as well as various graphical user interfaces of the device 700, which may be made up of graphics, text, icons, video, and any combination thereof. The display unit 740 may include a display panel 741, and optionally, the display panel 741 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface 731 can be overlaid on the display panel 741, such that when the touch-sensitive surface 731 detects a touch operation on or near it, the operation is passed to the processor 780 to determine the type of touch event, and the processor 780 then provides a corresponding visual output on the display panel 741 based on the type of touch event. Although in fig. 7 the touch-sensitive surface 731 and the display panel 741 are implemented as two separate components to implement input and output functions, in some embodiments the touch-sensitive surface 731 and the display panel 741 may be integrated to implement input and output functions.

The device 700 may also include at least one sensor 750, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor that may adjust the brightness of the display panel 741 according to the brightness of ambient light, and a proximity sensor that may turn off the display panel 741 and/or a backlight when the device 700 is moved to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when the mobile phone is stationary, and can be used for applications of recognizing the posture of the mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which may be further configured to the device 700, the detailed description is omitted here.

The audio circuit 760, the speaker 721, and the microphone 722 may provide an audio interface between a user and the device 700. The audio circuit 760 may transmit the electrical signal converted from received audio data to the speaker 721, which converts the electrical signal into a sound signal for output; conversely, the microphone 722 converts collected sound signals into electrical signals, which the audio circuit 760 receives and converts into audio data. The audio data is then output to the processor 780 for processing, after which it is either sent through the RF circuit 710 to another device or output to the memory 720 for further processing. The audio circuit 760 may also include an earphone jack to allow peripheral headphones to communicate with the device 700.

WiFi belongs to short-range wireless transmission technology, and the device 700 can help the user send and receive e-mail, browse web pages, access streaming media, etc. through the WiFi module 770, which provides wireless broadband internet access for the user. Although fig. 7 shows WiFi module 770, it is understood that it does not belong to the essential constitution of device 700 and may be omitted entirely as needed within the scope not changing the essence of the invention.

The processor 780 is the control center for the device 700, connects the various parts of the overall device using various interfaces and lines, and performs various functions of the device 700 and processes data by running or executing software programs and/or modules stored in the memory 720 and calling up data stored in the memory 720, thereby monitoring the device as a whole. Alternatively, processor 780 may include one or more processing cores; alternatively, processor 780 may integrate an application processor that handles primarily the operating system, user interface, applications, etc. and a modem processor that handles primarily wireless communications. It will be appreciated that the modem processor described above may not be integrated into processor 780.

The device 700 also includes a power supply 790 (e.g., a battery) for powering the various components, which may preferably be logically coupled to the processor 780 via a power management system that may be used to manage charging, discharging, and power consumption. The power supply 790 may also include any component including one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.

Although not shown, the device 700 may also include a camera, a bluetooth module, etc., which are not described in detail herein.

The device 700 also includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, so that the device 700 can perform the aforementioned Korean transliteration method performed by the terminal.

Referring to fig. 8, a structural framework diagram of a server according to an embodiment of the present invention is shown. Specifically, the server 800 includes a Central Processing Unit (CPU) 801, a system memory 804 including a Random Access Memory (RAM) 802 and a Read Only Memory (ROM) 803, and a system bus 805 connecting the system memory 804 and the central processing unit 801. The server 800 also includes a basic input/output system (I/O system) 806, which facilitates transfer of information between devices within the computer, and a mass storage device 807 for storing an operating system 813, application programs 814, and other program modules 815.

The basic input/output system 806 includes a display 808 for displaying information and an input device 809 such as a mouse, keyboard, etc. for user input of information. Wherein the display 808 and the input device 809 are connected to the central processing unit 801 through an input output controller 810 connected to the system bus 805. The basic input/output system 806 may also include an input/output controller 810 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input-output controller 810 also provides output to a display screen, a printer, or other type of output device.

The mass storage device 807 is connected to the central processing unit 801 through a mass storage controller (not shown) connected to the system bus 805. The mass storage device 807 and its associated computer-readable media provide non-volatile storage for the server 800. That is, the mass storage device 807 may include a computer-readable medium (not shown) such as a hard disk or CD-ROM drive.

Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media is not limited to the foregoing. The system memory 804 and mass storage 807 described above may be collectively referred to as memory.

According to various embodiments of the invention, the server 800 may also be run by a remote computer connected through a network, such as the Internet. That is, the server 800 may be connected to the network 812 through the network interface unit 811 coupled to the system bus 805; the network interface unit 811 may also be used to connect to other types of networks or remote computer systems (not shown).

The memory further stores one or more programs, which include instructions for performing the steps executed by the server in the Korean transliteration method provided by the embodiments of the present invention.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A Korean transliteration method, the method comprising:

splitting the Korean information into a plurality of groups of Korean phrases by taking a predetermined identifier as a splitting position, wherein the predetermined identifier comprises at least one of a space symbol and a punctuation symbol;

acquiring a first monosyllabic sequence of a first Korean character and a second monosyllabic sequence of a second Korean character, extracting a tail syllable of the first monosyllabic sequence and a head syllable of the second monosyllabic sequence, detecting whether the tail syllable and the head syllable belong to an inflected syllable combination, and determining that there is a Korean character to be inflected if the tail syllable and the head syllable belong to the inflected syllable combination, wherein the Korean character is composed of the monosyllabic sequence, and the first Korean character and the second Korean character are two adjacent Korean characters in a Korean phrase;

if the Korean characters to be subjected to sound variation exist, replacing the Korean characters to be subjected to sound variation with the Korean characters subjected to sound variation;

obtaining a plurality of Korean characters corresponding to the Korean information according to the Korean characters after the sound change;

searching a word stock for Roman phonetic notation and Chinese phonetic notation corresponding to each single syllable sequence in the Korean characters, wherein the word stock stores the corresponding relation among the Korean characters, the single syllable sequence, the Roman phonetic notation and the Chinese phonetic notation;

and splicing the searched Roman phonetic notation and Chinese phonetic notation respectively according to the sequence of the single syllable sequence in the Korean characters and the sequence of the Korean characters in the Korean information to obtain phonetic notation information corresponding to the Korean information.

2. The method according to claim 1, wherein the replacing the Korean characters to be inflected with the inflected Korean characters comprises:

when the first Korean character is the Korean character to be inflected, inflecting the tail syllable of the first monosyllabic sequence, recombining a third Korean character according to the inflected first monosyllabic sequence, and replacing the first Korean character with the third Korean character; and/or

when the second Korean character is the Korean character to be inflected, inflecting the head syllable of the second monosyllabic sequence, recombining a fourth Korean character according to the inflected second monosyllabic sequence, and replacing the second Korean character with the fourth Korean character.

3. A Korean transliteration device, the device comprising:

a splitting module comprising: a detection unit, a replacement unit and an obtaining unit;

the detection unit includes: a splitting subunit and a detecting subunit;

the splitting subunit is configured to split the Korean information into a plurality of groups of Korean phrases by using a predetermined identifier as a splitting position, where the predetermined identifier includes at least one of a space symbol and a punctuation symbol;

the detecting subunit is configured to obtain a first monosyllabic sequence of a first Korean character and a second monosyllabic sequence of a second Korean character, extract a tail syllable of the first monosyllabic sequence and a head syllable of the second monosyllabic sequence, detect whether the tail syllable and the head syllable belong to an inflexion syllable combination, and determine that there is a Korean character to be inflected if the tail syllable and the head syllable belong to the inflexion syllable combination, where the Korean character is composed of the monosyllabic sequence, and the first Korean character and the second Korean character are two adjacent Korean characters in a Korean phrase;

the replacing unit is used for replacing the Korean characters to be subjected to sound variation with the Korean characters subjected to sound variation if the Korean characters to be subjected to sound variation exist;

the obtaining unit is used for obtaining a plurality of Korean characters corresponding to the Korean information according to the Korean characters after the sound change;

the query module is used for querying Roman phonetic notation and Chinese character phonetic notation corresponding to each single syllable sequence in the Korean characters from a character library, and the character library stores the corresponding relation among the Korean characters, the single syllable sequences, the Roman phonetic notation and the Chinese character phonetic notation;

and the splicing module is used for splicing the searched Roman phonetic notation and Chinese phonetic notation respectively according to the sequence of the single syllable sequence in the Korean characters and the sequence of the Korean characters in the Korean information to obtain phonetic notation information corresponding to the Korean information.

4. The apparatus of claim 3, wherein the replacement unit comprises:

a first replacement subunit and/or a second replacement subunit;

the first replacing subunit is configured to, when the first Korean character is the Korean character to be inflected, inflect a tail syllable of the first monosyllabic sequence, recombine a third Korean character according to the inflected first monosyllabic sequence, and replace the first Korean character with the third Korean character;

and the second replacing subunit is configured to, when the second Korean character is the Korean character to be inflected, inflect the head syllable of the second monosyllabic sequence, recombine a fourth Korean character according to the inflected second monosyllabic sequence, and replace the second Korean character with the fourth Korean character.