CN111460809B - Arabic place name proper name transliteration method and device, translation equipment and storage medium - Google Patents


Info

Publication number
CN111460809B
Authority
CN
China
Prior art keywords
arabic
transliteration
transliterated
romanized
vowel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010234562.8A
Other languages
Chinese (zh)
Other versions
CN111460809A (en
Inventor
毛曦
马维军
王继周
岳振华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese Academy of Surveying and Mapping
Original Assignee
Chinese Academy of Surveying and Mapping
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese Academy of Surveying and Mapping filed Critical Chinese Academy of Surveying and Mapping
Priority to CN202010234562.8A priority Critical patent/CN111460809B/en
Publication of CN111460809A publication Critical patent/CN111460809A/en
Priority to AU2021100730A priority patent/AU2021100730A4/en
Application granted granted Critical
Publication of CN111460809B publication Critical patent/CN111460809B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29 Geographical information databases

Abstract

The invention relates to a method, a device, translation equipment and a storage medium for transliteration of Arabic geographical names, wherein the method comprises the following steps: romanizing the Arabic place name to be transliterated to obtain the romanized Arabic place name to be transliterated; preprocessing the standard transliteration table to obtain a target transliteration table; inputting the romanized Arabic place names to be transliterated into a target transliteration table for matching to obtain a transliteration result. Compared with manual translation, the translation efficiency in the Arabic place name proper name translation is improved, the labor cost is reduced, and translation errors are easy to check.

Description

Arabic place name proper name transliteration method and device, translation equipment and storage medium
Technical Field
The invention relates to the technical field of place name translation in geographic information, in particular to a method and a device for transliterating a proper name of an Arabic place name, translation equipment and a storage medium.
Background
Place name translation refers to converting the expression of a geographic entity in one language into an expression in another language. The proper name of a place name is the word that distinguishes a geographic entity from others of the same kind, and it is one of the two main components of a place name. The general rules for transliterating the proper names of place names are specified in the guide for the Chinese-character translation of foreign-language place names (GB/T17693.3-2009).
Arabic place names are translated in two ways: either the Arabic original is translated directly into Chinese, or the translation is carried out through the romanized Arabic place name. At present, the proper names of Arabic place names are mainly transliterated manually, by mechanically matching the proper name against a transliteration table. In large-scale operations, however, this approach is inefficient, labor-intensive, and hard to check for errors. In addition, translation must follow the rules specified for different regions, different languages, and different types of place names. Existing machine translation methods therefore cannot solve the transliteration of Arabic place names independently and efficiently.
Disclosure of Invention
In view of the above, a method, an apparatus, a translation device, and a storage medium for transliterating the proper names of Arabic place names are provided to solve the problems of low efficiency, high cost, and difficulty in checking errors that arise when Arabic place names are translated manually in the prior art.
The invention adopts the following technical scheme:
in a first aspect, an embodiment of the present application provides a method for transliterating the proper names of Arabic place names, including:
romanizing the Arabic place name to be transliterated to obtain the romanized Arabic place name to be transliterated;
preprocessing the standard transliteration table to obtain a target transliteration table;
inputting the romanized Arabic place names to be transliterated into the target transliteration table for matching to obtain a transliteration result.
In a second aspect, an embodiment of the present application provides a device for transliterating the proper names of Arabic place names, including:
the romanization module is used for romanizing the Arabic place name to be transliterated to obtain the romanized Arabic place name to be transliterated;
the transliteration table preprocessing module is used for preprocessing the standard transliteration table to obtain a target transliteration table;
and the transliteration module is used for inputting the romanized Arabic names to be transliterated into the target transliteration table for matching to obtain a transliteration result.
In a third aspect, an embodiment of the present application provides a translation apparatus, including:
a processor, and a memory coupled to the processor;
the memory is used for storing a computer program, and the computer program is at least used for executing the Arabic place name transliteration method in the first aspect of the embodiment of the application;
the processor is used for calling and executing the computer program in the memory.
In a fourth aspect, an embodiment of the present application provides a storage medium, where the storage medium stores a computer program, and when the computer program is executed by a processor, the computer program implements the steps in the Arabic place name proper name transliteration method according to the first aspect.
According to the above technical scheme, the Arabic place name to be transliterated is romanized to obtain the romanized Arabic place name to be transliterated; the standard transliteration table is preprocessed to obtain a target transliteration table; and the romanized Arabic place name to be transliterated is input into the target transliteration table for matching to obtain a transliteration result. Arabic place names are thus transliterated automatically and reasonably according to their specific pronunciation characteristics, the automatic transliteration problem for Arabic place names is solved independently and efficiently, labor cost is reduced, and errors are easy to check.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of an Arabic place name proper name transliteration method according to an embodiment of the present invention;
FIG. 2 is a flowchart of another Arabic place name proper name transliteration method according to an embodiment of the present invention;
FIG. 3 is a transliteration example diagram suitable for use in embodiments of the present invention;
FIG. 4 is a schematic structural diagram of an Arabic place name proper name transliteration device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a translation device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.
Examples
FIG. 1 is a flowchart of an Arabic place name proper name transliteration method according to an embodiment of the present invention. The method may be executed by an Arabic place name proper name transliteration device according to an embodiment of the present invention, and the device may be implemented in software and/or hardware. Referring to FIG. 1, the method may specifically include the following steps:
S101, romanizing the Arabic place name to be transliterated to obtain the romanized Arabic place name to be transliterated.
Romanization, also called Latinization, is a linguistic term for converting a writing system that does not use the Latin (Roman) alphabet into Latin letters. That is, according to the rules and transcription table of a transcription system, each non-Latin character of the source system (including its diacritics and two-letter combinations that stand for a single sound) is mapped faithfully, one to one, to Latin characters of the target system. In addition, for convenience of description, the Arabic language is referred to simply as "Arabic" in the embodiments of the present application.
In the embodiment of the application, the Arabic place name to be transliterated is romanized to obtain the romanized Arabic place name to be transliterated. The Arabic place name here mainly refers to the proper name of the place name.
S102, preprocessing the standard transliteration table to obtain a target transliteration table.
The standard transliteration table is the transliteration table consulted in the prior art when transliterating the proper names of Arabic place names; for example, it can be the Arabic-Chinese transliteration table in the Arabic part of the guide for the Chinese-character translation of foreign-language place names. Specifically, the vowel and consonant combinations in the standard transliteration table and their corresponding Chinese characters are persisted to obtain the target transliteration table used in the embodiment of the application. Compared with the original transliteration table, the target transliteration table is better suited to fast and exact lookup during machine translation.
S103, inputting the romanized Arabic place names to be transliterated into a target transliteration table for matching to obtain transliteration results.
Specifically, a forward maximum matching algorithm is applied: the syllables in the romanized Arabic place name to be transliterated are matched against the vowel units and consonant units of the table according to the maximum matching algorithm, the corresponding Chinese characters are obtained, and these characters form the transliteration result.
According to the above technical scheme, the Arabic place name to be transliterated is romanized to obtain the romanized Arabic place name to be transliterated; the standard transliteration table is preprocessed to obtain a target transliteration table; and the romanized Arabic place name to be transliterated is input into the target transliteration table for matching to obtain a transliteration result. Arabic place names are thus transliterated automatically and reasonably according to their specific pronunciation characteristics, the automatic transliteration problem for Arabic place names is solved independently and efficiently, labor cost is reduced, and errors are easy to check.
FIG. 2 is a flowchart of an Arabic place name proper name transliteration method according to another embodiment of the present invention, which is implemented on the basis of the foregoing embodiment. Referring to FIG. 2, the method may specifically include the following steps:
S201, romanizing the Arabic place name to be transliterated to obtain the romanized Arabic place name to be transliterated.
S202, extracting the vowel letters and consonant letters in the header of the standard transliteration table.
The Arabic part of the guide for the Chinese-character translation of foreign-language place names, compiled by the national technical committee for place name standardization and others, together with its Arabic-Chinese transliteration table, guides the Chinese-character translation of Arabic place names. This Arabic-Chinese transliteration table is the standard transliteration table in the embodiment of the present application. In the standard transliteration table, the horizontal header contains the Arabic consonant letters and the corresponding consonant letters after romanized transcription, the vertical header contains the Arabic vowel signs and the corresponding vowel letters after transcription, and the cell where a row and a column intersect holds the Chinese character for that combination of vowel and consonant. When a consonant carries the silent (vowelless) sign, that is, when only a single consonant letter remains after romanization, the Chinese character in the first row of the table is used during transliteration. Specifically, the vowel letters and consonant letters in the header of the standard transliteration table are extracted and stored.
S203, obtaining the corresponding Chinese characters according to the corresponding relation of the rows and the columns of the standard transliteration table.
And S204, respectively recording vowel letters, consonant letters and corresponding Chinese characters in a preset table to obtain a target transliteration table.
The preset table can be an empty Excel table. The header vowel letters and consonant letters of the standard transliteration table are recorded in the Excel table, and the corresponding Chinese characters are then recorded according to the row-column correspondence, which yields the target transliteration table.
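Steps S202 to S204 can be sketched in Python as follows. This is a minimal illustration, assuming the standard table is already available as a column header, a row header, and a grid of cells. The names `build_target_table`, `consonants`, `vowels`, and `grid` are hypothetical, the sample characters are toy data rather than entries copied from the standard, and a plain dictionary stands in for the Excel table described above.

```python
def build_target_table(consonants, vowels, grid):
    """Flatten a header-plus-grid table into {romanized unit: Chinese character}.

    consonants: column header (romanized consonant letters).
    vowels:     row header (romanized vowel letters); "" marks the first row,
                used when a consonant stands alone without a vowel.
    grid:       grid[r][c] is the character for consonants[c] + vowels[r].
    """
    table = {}
    for r, vowel in enumerate(vowels):
        for c, consonant in enumerate(consonants):
            hanzi = grid[r][c]
            if hanzi:  # skip empty cells
                table[consonant + vowel] = hanzi
    return table


# Toy data for illustration only (assumed, not taken from the standard table).
consonants = ["b", "n", "m"]
vowels = ["", "a", "i"]
grid = [
    ["卜", "恩", "姆"],  # first row: bare consonant, no vowel
    ["巴", "纳", "马"],
    ["比", "尼", "米"],
]
target_table = build_target_table(consonants, vowels, grid)
print(target_table["ba"], target_table["n"])
```

Keying the dictionary by the romanized unit makes each later lookup a single hash access, which is one way to obtain the fast search the target table is meant to provide.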
S205, traversing the whole letter sequence of the romanized Arabic place name to be transliterated, and finding the position of each vowel letter.
Before syllable segmentation is described, the Arabic alphabet system is briefly introduced. The Arabic alphabet has no vowel letters of its own; pronunciation is indicated by vowel signs, i.e., pronunciation marks written on the consonants. Apart from the doubling (gemination) sign, there are 12 pronunciation signs in total, chiefly vowel signs. The short vowel signs include the open-mouth sign (fatḥa, Figure BDA0002430550250000051), the spread-lip sign (kasra, Figure BDA0002430550250000052), and the closed-mouth sign (ḍamma, Figure BDA0002430550250000053). Combined with different consonant letters, they represent different syllables. The long-vowel letters, however, only mark vowels and are not phonetic symbols in themselves: when the 3 long-vowel letters are combined with consonants, they do not always produce the corresponding long vowel. For example, the letter in Figure BDA0002430550250000061 combined with the consonant letter in Figure BDA0002430550250000062 is read "haira", whereas combined with other consonants it is read ā. In addition, every Arabic syllable begins with a consonant that is combined with a vowel, so the basic "consonant + vowel" structure is fixed.
In one specific example, syllables generally take the following 4 forms: (1) open syllables, i.e., consonant + short vowel; all 3 syllables in na-shi-ba are open syllables; (2) long open syllables, i.e., consonant + long vowel, such as shī in na-shī-ba; (3) short closed syllables, i.e., consonant + short vowel + consonant, such as man in man-sha-'a; (4) long closed syllables, i.e., consonant + long vowel + consonant, such as wān in nash-wān.
Specifically, in the embodiment of the present application, the whole letter sequence of the romanized Arabic place name to be transliterated is traversed, and the position of each vowel letter is found.
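The 4 syllable forms can be told apart by two properties: whether the vowel is long, and whether a consonant follows it. The Python sketch below illustrates this under stated assumptions: the romanization is assumed to write short vowels as a, i, u and long vowels with precomposed macron letters ā, ī, ū, and the function name `classify_syllable` and its return labels are hypothetical.

```python
SHORT_VOWELS = set("aiu")   # assumed romanized short vowels
LONG_VOWELS = set("āīū")    # assumed romanized long vowels (macron letters)


def classify_syllable(syllable):
    """Label a romanized syllable by vowel length and by open/closed shape."""
    for pos, ch in enumerate(syllable):
        if ch in SHORT_VOWELS or ch in LONG_VOWELS:
            break  # pos and ch now describe the first vowel
    else:
        return "no vowel"
    length = "long" if ch in LONG_VOWELS else "short"
    # A syllable is closed when at least one letter follows the vowel.
    shape = "closed" if pos < len(syllable) - 1 else "open"
    return f"{length} {shape}"


for s in ["na", "shī", "man", "wān"]:
    print(s, "->", classify_syllable(s))
```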
And S206, traversing from right to left according to the position of each vowel letter, and determining the consonant letter matched with the current vowel letter.
Specifically, after the position of each vowel is determined, traversal is started from right to left according to the position of each vowel, and the consonant letters matched with each vowel are determined.
Optionally, the consonant letters may be determined as follows: traversing from right to left according to the position of each vowel letter; if the number of consonant letters is 1, directly combining the consonant letter and the vowel letter into a syllable; if the number of consonant letters is 2, assigning the left consonant to the left syllable and the right consonant to the right syllable; if the number of consonant letters is 3 or more, assigning each consonant to a syllable according to the proximity principle; and thereby determining each consonant letter paired with the current vowel letter.
That is, the traversal runs from right to left starting at the position of each vowel. If the number of consonant letters is 1, the consonant letter and the current vowel letter are directly combined into a syllable. If the number of consonant letters is 2, the syllable boundary falls between them: the left consonant is assigned to the left syllable, and the right consonant is assigned to the right syllable. If the number of consonant letters is 3 or more, the consonants are assigned according to the proximity principle, under which each consonant letter joins the syllable of the vowel letter nearest to it.
S207, completing traversal, and obtaining a romanized syllable dividing result of the Arabic place name to be transliterated.
Thus, traversing from right to left in turn to obtain all syllable dividing results.
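The syllable division of S205 to S207 can be sketched in Python as follows. This is an illustration under assumptions, not the patented implementation: the vowel inventory, the list of two-letter consonants in `DIGRAPHS`, and the tie-break for an odd run of 3 or more consonants (the middle consonant stays with the left syllable) are all hypothetical choices, as are the function names.

```python
VOWELS = set("aiuāīū")                       # assumed romanized vowel letters
DIGRAPHS = ("sh", "kh", "gh", "th", "dh")    # assumed two-letter consonants


def tokenize(name):
    """Split a romanized name into letters, keeping digraphs as one unit."""
    tokens, i = [], 0
    while i < len(name):
        step = 2 if name[i:i + 2] in DIGRAPHS else 1
        tokens.append(name[i:i + step])
        i += step
    return tokens


def split_syllables(name):
    """Divide a romanized name into syllables around its vowel positions."""
    tokens = tokenize(name.lower())
    vowel_pos = [i for i, t in enumerate(tokens) if t in VOWELS]
    if not vowel_pos:
        return ["".join(tokens)] if tokens else []
    starts = [0] * len(vowel_pos)            # first token index of each syllable
    for k in range(len(vowel_pos) - 1, 0, -1):   # right-to-left traversal
        gap = vowel_pos[k] - vowel_pos[k - 1] - 1    # consonants between vowels
        if gap <= 2:
            onset = min(gap, 1)   # 1 consonant: joins this vowel; 2: split 1 + 1
        else:
            onset = gap // 2      # 3 or more: nearest-vowel rule (assumed tie-break)
        starts[k] = vowel_pos[k] - onset
    bounds = starts + [len(tokens)]
    return ["".join(tokens[bounds[k]:bounds[k + 1]]) for k in range(len(starts))]


print(split_syllables("nashiba"))
print(split_syllables("nashwān"))
```

Consonants before the first vowel and after the last vowel are attached to the first and last syllable respectively, which the text does not spell out and is assumed here.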
And S208, inputting the romanized Arabic place name to be transliterated into the target transliteration table for matching by taking the syllable as a unit according to the romanized syllable dividing result of the Arabic place name to be transliterated, so as to obtain a transliteration result.
The basic idea of the forward maximum matching algorithm is to take the whole text, or a substring of it, from left to right and match it against the target transliteration table. If the match succeeds, the current string is cut off as a segment; otherwise, one character is removed and matching continues, or, for a string that still fails to match, its syllable is removed and matching continues with the remainder.
Optionally, the matching process may be specifically implemented by the following method: dividing each syllable into a plurality of strings; inputting each character string in the current syllable into a target transliteration table for matching aiming at each syllable from left to right to obtain a corresponding Chinese character; until all the strings in all syllables are matched.
Maximum matching algorithms mainly include the forward maximum matching algorithm, the reverse maximum matching algorithm, the bidirectional matching algorithm, and the like. The main principle is to cut out a candidate string and compare it with a lexicon: if the string is an entry, it is recorded; otherwise, one character is added or removed and the comparison continues. The process terminates when only a single character remains, and a string that cannot be segmented is treated as unregistered.
In a specific example, the segmented syllables are transliterated one by one using a maximum matching algorithm. First, syllables are selected from left to right. Assuming the length of the current syllable is n, the syllable is matched one by one against the elements of the syllable set formed by the consonant and vowel combinations. If the matching succeeds, the Chinese character corresponding to the current syllable is obtained, packaged, and stored; the current syllable is removed from the whole syllable entry, and the next syllable is selected and matched, until the whole syllable entry has been processed. If the matching fails, the first n-1 characters of the current syllable, taken from left to right, are selected and matched against the syllable set again. Finally, the transliteration results of all syllables are spliced, sorted, and output. Illustratively, n is a positive integer greater than 1.
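The per-syllable matching described above can be sketched in Python as follows. This is a minimal illustration under assumptions: `transliterate_syllable`, `transliterate`, and the contents of `toy_table` are hypothetical, the table entries are toy data rather than entries from the standard, and bracketing a letter that matches nothing is only one possible way to flag an unregistered string.

```python
def transliterate_syllable(syllable, table):
    """Forward maximum matching inside one syllable.

    Try the longest remaining prefix first; on failure drop one character
    from the right (n, n-1, ...) until a table entry is found.
    """
    out, i = [], 0
    while i < len(syllable):
        for j in range(len(syllable), i, -1):
            piece = syllable[i:j]
            if piece in table:
                out.append(table[piece])
                i = j
                break
        else:
            # No prefix matched: flag the letter as unregistered (assumed convention).
            out.append(f"[{syllable[i]}]")
            i += 1
    return "".join(out)


def transliterate(syllables, table):
    """Transliterate syllables from left to right and splice the results."""
    return "".join(transliterate_syllable(s, table) for s in syllables)


# Toy entries for illustration only (assumed, not taken from the standard table).
toy_table = {"na": "纳", "shi": "希", "ba": "巴", "sh": "什", "wa": "瓦"}
print(transliterate(["na", "shi", "ba"], toy_table))
print(transliterate(["nash", "wa"], toy_table))
```

In the second call, "nash" is not an entry, so the sketch falls back to "na" and then matches the leftover "sh" on its own, which mirrors the n-1 fallback in the text.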
FIG. 3 shows an example of transliteration (Figure BDA0002430550250000071). In FIG. 3, the transliteration result for "hassayerh" is obtained by first dividing the romanized Arabic place name to be transliterated into syllables and then performing maximum matching.
It should be noted that, the processes S202 to S204 are processes for obtaining the target transliteration table, and the processes S205 to S207 are processes for dividing syllables, and there is no obvious precedence relationship between the two processes, which is only an example in fig. 2.
In the embodiment of the application, the vowel letters, the consonant letters, and the corresponding Chinese characters in the standard transliteration table are first recorded to obtain a target transliteration table, so that syllable matching can be carried out on the basis of the target transliteration table. Then, by traversing the position of each vowel letter, the consonant letters matched with each vowel letter are determined, and the whole letter sequence of the romanized Arabic place name to be transliterated is divided into syllables. The romanized Arabic place name to be transliterated is then input into the target transliteration table for matching, syllable by syllable, to obtain the transliteration result. Applying a maximum matching algorithm improves matching efficiency and accuracy. Therefore, compared with manual transliteration in the prior art, the method saves cost and improves transliteration efficiency and accuracy.
FIG. 4 is a schematic structural diagram of an Arabic place name proper name transliteration device according to an embodiment of the present invention, which is suitable for executing the Arabic place name proper name transliteration method according to an embodiment of the present invention. As shown in FIG. 4, the device may specifically include: a romanization module 401, a transliteration table preprocessing module 402, and a transliteration module 403.
The romanization module 401 is configured to romanize the Arabic place name to be transliterated to obtain a romanized Arabic place name to be transliterated; the transliteration table preprocessing module 402 is configured to preprocess the standard transliteration table to obtain a target transliteration table; and the transliteration module 403 is configured to input the romanized Arabic place name to be transliterated into the target transliteration table for matching, so as to obtain a transliteration result.
According to the above technical scheme, the Arabic place name to be transliterated is romanized to obtain the romanized Arabic place name to be transliterated; the standard transliteration table is preprocessed to obtain a target transliteration table; and the romanized Arabic place name to be transliterated is input into the target transliteration table for matching to obtain a transliteration result. Arabic place names are thus transliterated automatically and reasonably according to their specific pronunciation characteristics, the automatic transliteration problem for Arabic place names is solved independently and efficiently, labor cost is reduced, and errors are easy to check.
Optionally, the transliteration table preprocessing module 402 is specifically configured to:
extracting vowel letters and consonant letters of a header of a standard transliteration table;
obtaining corresponding Chinese characters according to the corresponding relation of rows and columns of the standard transliteration table;
and respectively recording vowel letters, consonant letters and corresponding Chinese characters in a preset table to obtain a target transliteration table.
Optionally, the device further comprises a syllable dividing module, configured to perform the following before the romanized Arabic place name to be transliterated is input into the target transliteration table for matching:
traversing the whole letter sequence of the romanized Arabic place name to be transliterated, and finding the position of each vowel letter;
starting traversal from right to left according to the position of each vowel letter, and determining the consonant letters matched with the current vowel letter;
completing the traversal to obtain a syllable dividing result of the romanized Arabic place name to be transliterated;
correspondingly, the transliteration module is specifically configured to:
and inputting the romanized Arabic place name to be transliterated into a target transliteration table for matching by taking the syllable as a unit according to the syllable dividing result of the romanized Arabic place name to be transliterated.
Optionally, the syllable dividing module is specifically configured to:
starting traversal from right to left according to the position of each vowel letter;
if the number of consonant letters is 1, directly combining the consonant letters and vowel letters into a syllable;
if the number of consonant letters is 2, dividing the current left consonant into left syllables, and dividing the current right consonant into right syllables;
if the number of the consonant letters is 3 or more, dividing the current consonants into the corresponding syllables according to the proximity principle;
each consonant letter paired with the current vowel letter is determined.
Optionally, the transliteration module is specifically configured to:
dividing each syllable into a plurality of strings;
inputting each character string in the current syllable into a target transliteration table for matching aiming at each syllable from left to right to obtain a corresponding Chinese character;
until all the strings in all the syllables are matched.
The Arabic place name proper name transliteration device provided by the embodiment of the invention can execute the Arabic place name proper name transliteration method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
An embodiment of the present invention further provides a translation device. Referring to FIG. 5, which is a schematic structural diagram of the translation device, the translation device includes: a processor 510, and a memory 520 coupled to the processor 510. The memory 520 is used for storing a computer program, which is at least used for executing the Arabic place name proper name transliteration method in the embodiment of the present invention; the processor 510 is used to invoke and execute the computer program in the memory. The Arabic place name proper name transliteration method at least comprises the following steps: romanizing the Arabic place name to be transliterated to obtain the romanized Arabic place name to be transliterated; preprocessing the standard transliteration table to obtain a target transliteration table; and inputting the romanized Arabic place name to be transliterated into the target transliteration table for matching to obtain a transliteration result.
The embodiment of the invention also provides a storage medium, wherein the storage medium stores a computer program, and when the computer program is executed by a processor, the steps in the Arabic place name proper name transliteration method in the embodiment of the invention are realized: romanizing the Arabic place name to be transliterated to obtain the romanized Arabic place name to be transliterated; preprocessing the standard transliteration table to obtain a target transliteration table; and inputting the romanized Arabic place name to be transliterated into the target transliteration table for matching to obtain a transliteration result.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as a stand-alone product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," "some examples," and the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are exemplary and not to be construed as limiting the present invention, and that changes, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
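By way of non-limiting illustration, the table preprocessing recited in the claims (extracting the romanized header letters of the standard transliteration table and recording them with the Chinese character at each row and column intersection) might be sketched in Python as follows. The function name `build_target_table`, the argument layout, and the toy grid values are assumptions introduced for illustration only; they are not taken from the standard transliteration table or from the embodiments.

```python
from typing import Dict, List, Tuple


def build_target_table(
    consonants: List[str],
    vowels: List[str],
    cells: List[List[str]],
) -> Dict[Tuple[str, str], str]:
    """Flatten a consonant-by-vowel grid into a (consonant, vowel) lookup.

    consonants: romanized consonant letters from the horizontal header.
    vowels:     romanized vowel letters from the vertical header.
    cells:      cells[r][c] is the Chinese character where vowel row r
                meets consonant column c (empty string if the cell is blank).
    """
    target: Dict[Tuple[str, str], str] = {}
    for r, vowel in enumerate(vowels):
        for c, consonant in enumerate(consonants):
            char = cells[r][c]
            if char:  # skip blank cells
                target[(consonant, vowel)] = char
    return target


# Toy 2 x 2 grid; the characters are placeholders, not the standard table.
toy_consonants = ["b", "d"]
toy_vowels = ["a", "i"]
toy_cells = [
    ["巴", "达"],  # vowel "a" with consonants "b", "d"
    ["比", "迪"],  # vowel "i" with consonants "b", "d"
]
target_table = build_target_table(toy_consonants, toy_vowels, toy_cells)
print(target_table[("d", "i")])
```

Storing the grid as a flat mapping means each later lookup needs only the consonant and vowel of a string, without re-reading the row and column headers.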

Claims (8)

1. A method for transliterating Arabic place names, characterized by comprising the following steps:
romanizing the Arabic place name to be transliterated to obtain the romanized Arabic place name to be transliterated;
preprocessing the standard transliteration table to obtain a target transliteration table;
inputting the romanized Arabic place name to be transliterated into the target transliteration table for matching to obtain a transliteration result;
wherein, in the standard transliteration table, the horizontal header row comprises the Arabic consonant letters and the corresponding romanized consonant letters, the vertical header column comprises the Arabic vowel symbols and the corresponding transliterated vowel letters, and each cell at the intersection of a row and a column holds the Chinese character corresponding to the combination of that vowel and consonant;
wherein preprocessing the standard transliteration table to obtain the target transliteration table comprises:
extracting the romanized vowel letters and consonant letters of the header of the standard transliteration table;
obtaining corresponding Chinese characters according to the corresponding relation of the rows and the columns of the standard transliteration table;
respectively recording the romanized vowel letters, the consonant letters and the corresponding Chinese characters in a preset table to obtain a target transliteration table;
wherein, before inputting the romanized Arabic place name to be transliterated into the target transliteration table for matching, the method further comprises:
traversing the whole letter sequence of the romanized Arabic place name to be transliterated, and searching the position of each vowel letter;
traversing from right to left according to the position of each vowel letter, and determining a consonant letter matched with the current vowel letter;
completing traversal to obtain a syllable dividing result of the romanized Arabic place name to be transliterated;
correspondingly, inputting the romanized Arabic place name to be transliterated into the target transliteration table for matching comprises:
and inputting the romanized Arabic place name to be transliterated into the target transliteration table for matching by taking syllables as units according to the syllable dividing result of the romanized Arabic place name to be transliterated.
2. The method of claim 1, wherein traversing from right to left according to the position of each vowel letter and determining the consonant letter matched with the current vowel letter comprises:
traversing from right to left according to the position of each vowel letter;
if the number of consonant letters is 1, directly combining the consonant letter and the vowel letter into a syllable;
if the number of consonant letters is 2, assigning the left consonant to the syllable on the left and the right consonant to the syllable on the right;
if the number of consonant letters is 3 or more, assigning each consonant to the corresponding syllable according to the proximity principle;
thereby determining each consonant letter paired with the current vowel letter.
3. The method of claim 1, wherein inputting the romanized Arabic place name to be transliterated into the target transliteration table for matching in units of syllables comprises:
dividing each syllable into a plurality of character strings;
for each syllable, from left to right, inputting each character string in the current syllable into the target transliteration table for matching to obtain the corresponding Chinese character;
until all the character strings in all the syllables are matched.
4. An Arabic place name proper name transliteration device, characterized by comprising:
the romanization module is used for romanizing the Arabic place name to be transliterated to obtain the romanized Arabic place name to be transliterated;
the transliteration table preprocessing module is used for preprocessing the standard transliteration table to obtain a target transliteration table;
the transliteration module is used for inputting the romanized Arabic place name to be transliterated into the target transliteration table for matching to obtain a transliteration result;
wherein, in the standard transliteration table, the horizontal header row comprises the Arabic consonant letters and the corresponding romanized consonant letters, the vertical header column comprises the Arabic vowel symbols and the corresponding transliterated vowel letters, and each cell at the intersection of a row and a column holds the Chinese character corresponding to the combination of that vowel and consonant;
wherein preprocessing the standard transliteration table to obtain the target transliteration table comprises:
extracting romanized vowel letters and consonant letters of the header of the standard transliteration table;
obtaining corresponding Chinese characters according to the corresponding relation of the rows and the columns of the standard transliteration table;
respectively recording the romanized vowel letters, the consonant letters and the corresponding Chinese characters in a preset table to obtain a target transliteration table;
wherein, before the romanized Arabic place name to be transliterated is input into the target transliteration table for matching, the following is further performed:
traversing the whole letter sequence of the romanized Arabic place name to be transliterated, and searching the position of each vowel letter;
starting traversal from right to left according to the position of each vowel letter, and determining consonant letters matched with the current vowel letters;
completing traversal to obtain a syllable dividing result of the romanized Arabic place name to be transliterated;
correspondingly, inputting the romanized Arabic place name to be transliterated into the target transliteration table for matching comprises:
and inputting the romanized Arabic place name to be transliterated into the target transliteration table for matching by taking syllables as units according to the syllable dividing result of the romanized Arabic place name to be transliterated.
5. The apparatus of claim 4, wherein the transliteration table preprocessing module is specifically configured to:
extracting vowel letters and consonant letters of the header of the standard transliteration table;
obtaining corresponding Chinese characters according to the corresponding relation of the row and the column of the standard transliteration table;
and respectively recording the vowel letters, the consonant letters and the corresponding Chinese characters in a preset table to obtain a target transliteration table.
6. The apparatus of claim 4, further comprising a syllabization module configured to, before the romanized Arabic place name to be transliterated is input into the target transliteration table for matching, perform the following:
traversing the whole letter sequence of the romanized Arabic place name to be transliterated, and searching the position of each vowel letter;
traversing from right to left according to the position of each vowel letter, and determining a consonant letter matched with the current vowel letter;
completing traversal to obtain a syllable dividing result of the romanized Arabic place name to be transliterated;
correspondingly, the transliteration module is specifically configured to:
and inputting the romanized Arabic place name to be transliterated into the target transliteration table for matching by taking syllables as units according to the syllable dividing result of the romanized Arabic place name to be transliterated.
7. A translation apparatus, comprising:
a processor, and a memory coupled to the processor;
the memory is adapted to store a computer program adapted to perform at least the Arabic place name transliteration method of any one of claims 1-3;
the processor is used for calling and executing the computer program in the memory.
8. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the steps of the Arabic place name proper name transliteration method according to any one of claims 1 to 3.
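The syllable division and syllable-by-syllable matching recited in claims 1 to 3 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `split_syllables`, `split_strings`, `transliterate`, and `TOY_TABLE` are hypothetical; only the plain vowels a, e, i, o, u are recognised; digraphs such as "sh" are not treated as single consonants; for three or more consonants the middle consonant of an odd-sized cluster is assumed to go to the left syllable; and the table characters are placeholders rather than entries of the standard transliteration table.

```python
from typing import Dict, List

VOWELS = set("aeiou")  # assumption: plain ASCII vowels only


def split_syllables(name: str) -> List[str]:
    """Divide a romanized name into syllables around its vowel letters."""
    name = name.lower()
    vowel_positions = [i for i, ch in enumerate(name) if ch in VOWELS]
    if not vowel_positions:
        return [name] if name else []
    starts = [0]  # leading consonants join the first syllable
    for prev, cur in zip(vowel_positions, vowel_positions[1:]):
        cluster = cur - prev - 1  # consonants between the two vowels
        # Looking leftwards from the current vowel: take the single
        # consonant, the right one of two, or the nearer ones of three or
        # more (assumed tie-break: the middle consonant goes left).
        onset = cluster if cluster <= 1 else cluster // 2
        starts.append(cur - onset)
    ends = starts[1:] + [len(name)]  # trailing consonants stay in the last syllable
    return [name[s:e] for s, e in zip(starts, ends)]


def split_strings(syllable: str) -> List[str]:
    """Split a syllable into lookup strings: one consonant+vowel pair,
    with every other consonant standing alone."""
    for i, ch in enumerate(syllable):
        if ch in VOWELS:
            first = max(i - 1, 0)  # consonant adjacent to the vowel, if any
            return (
                list(syllable[:first])
                + [syllable[first : i + 1]]
                + list(syllable[i + 1 :])
            )
    return list(syllable)  # no vowel: every consonant stands alone


def transliterate(name: str, table: Dict[str, str]) -> str:
    """Match each string of each syllable, left to right, against the table."""
    out = []
    for syllable in split_syllables(name):
        for key in split_strings(syllable):
            out.append(table.get(key, "?"))  # "?" marks a string missing from the table
    return "".join(out)


# Placeholder entries for illustration; not the standard transliteration table.
TOY_TABLE = {"ma": "马", "di": "迪", "na": "纳", "s": "斯", "qa": "卡", "t": "特"}

print(split_syllables("madina"))  # ['ma', 'di', 'na']
print(split_syllables("masqat"))  # ['mas', 'qat']
print(transliterate("masqat", TOY_TABLE))
```

In this sketch a two-consonant cluster such as "sq" in "masqat" is split so that "s" closes the left syllable and "q" opens the right one, which corresponds to the two-consonant rule of claim 2.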
CN202010234562.8A 2020-03-30 2020-03-30 Arabic place name proper name transliteration method and device, translation equipment and storage medium Active CN111460809B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010234562.8A CN111460809B (en) 2020-03-30 2020-03-30 Arabic place name proper name transliteration method and device, translation equipment and storage medium
AU2021100730A AU2021100730A4 (en) 2020-03-30 2021-02-05 Method and apparatus for transliterating special term of arabic geographical name, translation device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010234562.8A CN111460809B (en) 2020-03-30 2020-03-30 Arabic place name proper name transliteration method and device, translation equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111460809A CN111460809A (en) 2020-07-28
CN111460809B true CN111460809B (en) 2023-03-10

Family

ID=71684984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010234562.8A Active CN111460809B (en) 2020-03-30 2020-03-30 Arabic place name proper name transliteration method and device, translation equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111460809B (en)
AU (1) AU2021100730A4 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011135A (en) * 2021-03-03 2021-06-22 科大讯飞股份有限公司 Arabic vowel recovery method, device, equipment and storage medium
CN113361288B (en) * 2021-06-30 2024-03-12 民政部地名研究所 Automatic foreign language place name Chinese character translation writing method based on word group

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101002455B (en) * 2004-06-04 2011-12-28 B·F·加萨比安 Device and method to enhance data entry in mobile and fixed environment
AR051014A1 (en) * 2004-08-24 2006-12-13 Geneva Software Technologies Ltd SYSTEM AND METHOD FOR MIGRATION OF A PRODUCT IN VARIOUS LANGUAGES
EG25474A (en) * 2007-05-21 2012-01-11 Sherikat Link Letatweer Elbarmaguey At Sae Method for translitering and suggesting arabic replacement for a given user input
CN108628846A (en) * 2017-03-22 2018-10-09 湖南本来文化发展有限公司 Based on ES Expert System Models to the interpretation method of Sichuan accent and Arabic

Also Published As

Publication number Publication date
CN111460809A (en) 2020-07-28
AU2021100730A4 (en) 2021-04-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant